Principal AI Engineer

Flashii
Full-time
On-site
San Francisco, California, United States
Location: Onsite – San Francisco, CA
Employment Type: Full-Time

About the Role: We are seeking an experienced Principal Agentic AI Engineer to lead the design, architecture, and delivery of an enterprise-scale Agentic AI platform. This individual will drive the technical vision for multi-agent AI systems, Retrieval-Augmented Generation (RAG), MCP-based tool integrations, and scalable microservices that let enterprises compose, govern, and operate domain-specific agents at scale. The role also promotes reusability by codifying patterns into shared skills and sub-agents across the Agentic Development Lifecycle (ADLC).

Responsibilities:
Agentic AI Architecture: Own end-to-end design of multi-agent systems using LangChain, LangGraph, and Model Context Protocol (MCP), including planner-executor patterns, sub-agent hierarchies, tool routing, retries, and cost-aware token budgeting.
RAG & Knowledge Systems: Architect production-grade RAG pipelines with vector databases (pgvector, Qdrant), hybrid retrieval, re-ranking, and document-aware chunking to ground agents in enterprise knowledge.
Solution Architecture: Design reference architectures and solution blueprints for enterprise clients across regulated and consumer-facing industries, translating business outcomes into agentic AI roadmaps and reusable accelerators.
Scalable Microservices: Build event-driven microservices on Kafka, polyglot data layers with PostgreSQL and vector DBs, and Kubernetes-based deployment topologies for high-throughput inference workloads.
MLOps & Model Lifecycle: Establish practices spanning training, fine-tuning, prompt and config versioning, structured evaluations against golden datasets, drift detection, and automated rollback when output quality degrades.
Traceability & Observability: Instrument agent reasoning traces, tool-call audit trails, token spend, and quality signals with Prometheus, Grafana, and OpenTelemetry, enabling policy enforcement and human-in-the-loop oversight.
Reusable Engineering Standards: Codify AI engineering patterns (RAG retrievers, agent loops, eval harnesses, traceability spans) into reusable skills, sub-agents, and platform components consumed across multiple product lines.
Rapid Engineering in Agentic Development Lifecycle: Roll out AI-led developer tools and sub-agents (Claude Code, Playwright MCP) across planning, code generation, code review, test authoring, and release validation, accelerating delivery while standardizing quality.
Presales & Client Engagement: Partner with sales, presales, and customer success on enterprise pursuits: authoring solution designs, leading technical workshops, and shaping agentic AI roadmaps for prospects and existing clients.

Qualifications:
8+ years of software engineering and solution architecture experience
3+ years of hands-on experience designing and deploying LLM-based or Agentic AI systems in production environments
Deep expertise with LangChain, LangGraph, Retrieval-Augmented Generation (RAG), MCP / AI tool orchestration, prompt engineering, context engineering, and token optimization
Strong programming experience in Python and TypeScript (Java preferred)
Experience building scalable microservices using FastAPI, Spring Boot, Node.js, or related frameworks
Hands-on experience with AWS, Azure, or GCP, as well as Kubernetes, Docker, Terraform, and CI/CD pipelines
Strong understanding of MLOps, AI model lifecycle management, evaluation frameworks, drift detection, and AI observability
Proven enterprise solution architecture experience translating business requirements into scalable AI solutions