Location: Jersey City, NJ / NYC
Duration: Full Time OR Contract
Job Description: We are seeking a Senior AI Engineer with strong AI system design and agentic reasoning expertise to build and scale GenAI-powered agentic solutions for complex production environments. The role focuses on designing agents that can diagnose, reason, plan, and safely act to reduce operational risk, cost, and incident resolution time in large-scale systems. This position requires deep architectural thinking and agent planning capability that goes beyond hands-on familiarity with GenAI frameworks.
Key Responsibilities:
Design and implement end-to-end agentic AI systems for production environments, including planning, orchestration, execution, and feedback loops
Build tool-calling agents that combine retrieval, structured reasoning, and secure action execution (function calling, change orchestration, policy enforcement)
Define and implement agent planning architectures, including plan generation, structuring, validation, and re-planning
Productionize LLMs by building evaluation frameworks, RAG pipelines, prompt synthesis, response validation, and self-correction loops
Integrate agents with observability, incident management, deployment, and runbook systems for automated diagnostics and remediation
Design safety, reliability, and governance mechanisms such as guardrails, validator models, adversarial prompt handling, circuit breakers, and rollback strategies
Optimize cost, latency, and performance through context management, caching, model routing, batching, and parallel execution
Collaborate with production and application teams to translate operational challenges into measurable, auditable AI outcomes
Drive architectural design reviews and mentor engineers on agent design, evaluation, and safe deployment patterns
Essential Skills & Experience:
5+ years of software development experience (Python, Java, Go, or C/C++), with strong production systems exposure
3+ years designing and deploying production ML/AI systems, including deployment, monitoring, and evaluation
Strong hands-on experience with LLM-based systems, including:
Retrieval-Augmented Generation (RAG)
Tool-using / function-calling agents
Prompt engineering and adaptation
Deep understanding of agentic AI concepts, especially:
Agent planning vs execution
Multi-step reasoning and decision flows
Handling ambiguity, partial state, and failure modes
Solid foundation in applied statistics, ML fundamentals, algorithms, and data structures
Strong analytical thinking, ownership mindset, and ability to communicate complex designs clearly
Preferred Experience:
Experience with agentic frameworks and orchestrators (e.g., LangChain, ADK, or similar)
Cloud infrastructure experience (AWS preferred): ECS/EKS, Lambda, S3, DynamoDB, Redshift, Step Functions, SageMaker
Infrastructure-as-code using Terraform or CloudFormation
Mandatory Skills: Java-J2EE.