We are seeking a highly experienced engineer with deep expertise in agentic architectures and enterprise-scale AI systems. The role requires a self-directed engineer who can take full ownership of a critical microservice from day one: identifying gaps, proposing solutions, and driving initiatives end-to-end with minimal oversight.
Required Skills & Qualifications
5+ years of software engineering experience, with a strong focus on AI/ML engineering and backend systems
Expert-level proficiency in Python and Pydantic for validation and type-safe development
Deep hands-on experience with LangChain, LangGraph, and multi-agent/agentic system design
Strong understanding of Large Language Models (LLMs), including architecture, behavior, strengths, and limitations
Proven experience building RAG-based systems with vector database integration (Elasticsearch preferred)
Experience designing scalable AI workflows for big data processing
Strong experience building and consuming REST APIs with FastAPI
Conceptual understanding of LLM fine-tuning and training (hands-on experience not required)
Demonstrated ability to work independently with minimal direction, own systems end-to-end, and proactively identify and drive workstreams
Prior work experience at the client or in the client's industry
Preferred Skills & Qualifications
Frontend experience with React, HTML, and TypeScript
Background in Data Science or analytics pipelines
Experience with LangSmith (LLM tracing, debugging, evaluation) and Galileo (LLM observability and data quality monitoring)
Experience working in enterprise-scale or financial services environments
Exposure to distributed systems and cloud platforms (AWS/GCP/Azure)
Day-to-Day Responsibilities
Own and lead the design, development, and lifecycle management of an enterprise-scale agentic microservice
Architect and implement LangChain / LangGraph-based multi-agent systems for complex enterprise use cases
Design and build backend APIs and orchestration services using Python and FastAPI
Develop scalable agentic workflows capable of processing large-scale data (batch and streaming pipelines)
Design and optimize Retrieval-Augmented Generation (RAG) workflows integrating vector databases and relational data stores
Apply advanced prompt engineering and context management techniques to improve LLM reliability and output quality
Implement observability, telemetry, and monitoring frameworks to ensure system reliability and performance
Enforce high code quality standards, using Python best practices and Pydantic for validation and schema management
Identify and own technical debt, architectural gaps, and new initiatives, driving them through to completion independently
Company Benefits & Culture
Comprehensive health, dental, and vision insurance
Competitive 401(k) matching and retirement planning
Flexible work arrangements and a supportive work-life balance culture