Senior AI Engineer (GenAI + Data Platform, AWS)
We are seeking a Senior AI Engineer to design, build, and scale a production-grade Generative AI and Data Platform on AWS. The role focuses on enabling LLM-powered capabilities through vector search, graph-based knowledge systems, and governed data pipelines. The ideal candidate will own end-to-end delivery across the AI lifecycle, including:
Data ingestion and knowledge curation
Embeddings and retrieval systems
Backend services and APIs
CI/CD pipelines and deployment
This role partners closely with product and engineering teams to operationalize AI capabilities in externally facing applications and to drive the platform's evolution toward agentic AI systems.
Key Responsibilities:
GenAI Enablement & Integration
Build and operationalize LLM-powered applications using Retrieval-Augmented Generation (RAG), embedding pipelines, and prompt orchestration and evaluation frameworks.
Build vector search systems using Amazon OpenSearch.
Build graph-based knowledge systems using Amazon Neptune for relationships, lineage, and explainability.
Integrate supporting infrastructure: Amazon ElastiCache (Redis) for session state and caching, and DynamoDB for scalable, low-latency data access.
Implement agentic workflows using frameworks such as LangGraph, AutoGen, or CrewAI (or equivalent).
Integrate with LLM frameworks such as LangChain and LlamaIndex (tool calling, retrieval orchestration, context management), and define standards for tool integration and context-sharing patterns (MCP-style designs).
Evaluate LLMs and retrieval strategies across latency, cost, accuracy, and context-window limitations.
Data Pipelines & Knowledge Engineering
Design and build scalable data pipelines using Databricks and Apache Spark.
Implement pipelines for data ingestion and transformation, document processing (chunking, metadata tagging), and embedding generation and indexing.
Ensure high data quality standards: validation, completeness, consistency, and monitoring.
Implement data governance frameworks: data classification and access controls, retention policies, and auditability and lineage tracking.
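To illustrate the kind of document processing this covers, here is a minimal sketch of chunking with metadata tagging. It assumes simple word-window chunking with overlap; the names Chunk and chunk_document, and the metadata fields, are hypothetical and not part of any existing platform or library.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """A piece of a document plus the metadata needed to trace it back."""
    text: str
    metadata: dict = field(default_factory=dict)


def chunk_document(text: str, source: str,
                   max_words: int = 200, overlap: int = 20) -> list[Chunk]:
    """Split text into overlapping word windows and tag each window.

    Hypothetical helper: real pipelines often chunk by tokens or by
    document structure rather than by whitespace-separated words.
    """
    if overlap >= max_words:
        raise ValueError("overlap must be smaller than max_words")
    words = text.split()
    step = max_words - overlap  # how far each window advances
    chunks: list[Chunk] = []
    for start in range(0, len(words), step):
        window = words[start:start + max_words]
        chunks.append(Chunk(
            text=" ".join(window),
            metadata={
                "source": source,              # where the chunk came from
                "chunk_index": len(chunks),    # position within the document
                "start_word": start,           # offset for lineage tracking
                "word_count": len(window),
            },
        ))
        if start + max_words >= len(words):
            break  # the last window already reached the end of the text
    return chunks
```

Each chunk would then be passed to an embedding model and indexed together with its metadata, so retrieved passages can be traced back to their source.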
Backend Services & APIs
Develop backend services exposing AI capabilities through secure and scalable APIs.
Define best practices for API contracts and versioning, and for reliability (retry logic, circuit breakers, idempotency).
Enable reusability of platform capabilities across teams and applications.
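As an example of the reliability patterns named in this section, the sketch below combines retry logic with an idempotency key. It is a minimal illustration under stated assumptions: TransientError and call_with_retry are hypothetical names, and the downstream service is assumed to deduplicate requests that carry the same key.

```python
import time
import uuid
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for failures that are safe to retry."""


def call_with_retry(operation: Callable[[str], T],
                    max_attempts: int = 3,
                    base_delay: float = 0.1,
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Call operation(idempotency_key), retrying transient failures.

    The same key is sent on every attempt so that a service which
    deduplicates by key (an assumption here) applies the request once.
    """
    idempotency_key = str(uuid.uuid4())
    for attempt in range(1, max_attempts + 1):
        try:
            return operation(idempotency_key)
        except TransientError:
            if attempt == max_attempts:
                raise  # retries exhausted: surface the failure
            # Exponential backoff: base_delay, 2x, 4x, ...
            sleep(base_delay * 2 ** (attempt - 1))
    raise ValueError("max_attempts must be at least 1")
```

A circuit breaker would sit one level above this, stopping calls altogether once failures pass a threshold; production versions typically also add jitter to the backoff delay.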
Deployment, MLOps & Operational Excellence
Build and manage CI/CD pipelines for AI and data workloads.
Deploy production systems using Docker (containerization) and Kubernetes (orchestration).
Implement deployment strategies: blue/green deployments, canary releases, rollback strategies, and feature flags.
Ensure system reliability through monitoring (latency, failures, cost, data freshness), alerting and observability, and secrets management with least-privilege access.
Optimize platform performance and cost.
LLM Observability, Evaluation & Quality
Define and track GenAI quality metrics: grounding/faithfulness, retrieval relevance, response consistency, and latency and cost per request.
Implement prompt/version tracking, offline evaluation pipelines, and continuous improvement workflows.
LLM Security, Safety & Compliance
Implement secure AI systems with access control and authentication, data protection policies, and responsible AI guardrails.
Ensure compliance with best practices in AI safety, data privacy, monitoring, and auditability.
Required Skills:
Strong experience in Generative AI / LLM systems (RAG, embeddings, prompt engineering)
Hands-on experience with the AWS ecosystem
Expertise in OpenSearch (vector search), Neptune (graph databases), DynamoDB, and Redis (ElastiCache)
Experience with LangChain / LlamaIndex and agentic AI frameworks (LangGraph, AutoGen, CrewAI)
Strong programming skills (Python preferred)
Experience with Databricks and Apache Spark
Solid understanding of data pipelines, distributed systems, and API design
Preferred Skills:
Experience with model evaluation frameworks and LLM observability tools
Knowledge of AI governance and compliance frameworks
Experience with Kubernetes and advanced MLOps practices
Familiarity with Model Context Protocol (MCP) patterns and agent-based architectures
Qualifications:
Bachelor's or Master's degree in Computer Science / Data Science / AI / related field
Proven experience building production-grade AI platforms and systems
Strong background in end-to-end AI/ML lifecycle delivery
Soft Skills:
Strong problem-solving and analytical thinking
Ability to communicate complex AI concepts clearly
Collaborative and cross-functional mindset
Ownership-driven and proactive execution