Senior Software Test Automation Engineer, AI Engineering

TWG Global AI

1 day ago

Full-time

On-site

Santa Monica, California, United States

Senior AI Software Engineer In Test

TWG Global is seeking a Senior AI Software Engineer in Test to join our AI Engineering team building commercial-grade AI products. This is a software engineering role focused on test automation. You won't just write test cases; you'll design and build the frameworks, harnesses, evaluation infrastructure, and tooling that make testing AI agents and LLM-powered applications possible at scale.

Our agents are written in LangGraph and run on Azure on the TWG side, with a parallel Vercel-based stack on the Palantir side. You'll write eval sets against both, and you'll validate the surfaces our users actually touch (iOS apps, plugins, and Chrome extensions), not just the model layer. You'll work shoulder-to-shoulder with AI engineers and data scientists, contributing production-quality code to shared repositories.

The ideal candidate is a strong coder, fluent in Python and Java, who has shipped automated test infrastructure in a production environment and has hands-on experience evaluating LLM and agentic systems.

Key Responsibilities

Framework and harness engineering

Design and build scalable, reusable test automation frameworks for AI agents, LLM-powered applications, and underlying APIs. Write clean, maintainable Python for test harnesses, eval pipelines, synthetic data generation utilities, and internal tooling. Treat test code as production code: code review, type hints, documentation, library design.

Evaluation infrastructure

Build evaluation infrastructure for benchmarking agent performance against SOTA LLMs, competitors, and internal baselines. Own regression suites, golden datasets, rubric-based evals, and metric dashboards. Build tooling for synthetic test data generation, edge-case discovery, and adversarial testing.

Resilience and load

Design and run release, system, performance, and load tests against streaming, stateful, and async systems. Build chaos and fault injection tooling for token expiry, connection pool exhaustion, provider failover, and cache pressure scenarios. Drive contract testing across LLM providers (Bedrock, Anthropic, OpenAI) to catch parity drift.

CI/CD and observability

Integrate automated tests into CI/CD so every model, prompt, and code change is validated before it ships. Build trace-based assertions on LangGraph state, tool calls, and agent decisions: debugging an agent failure means replaying graph state, not re-running a prompt. Make observability a first-class testing surface (LangSmith, audit logs).

Human-in-the-loop and partnership

Implement HIL review workflows where automation alone cannot validate quality, then push the automation boundary outward. Partner with AI engineers and data scientists on model evaluation, training and eval data prep, and root-cause debugging of complex end-to-end failures. Champion quality engineering practices across the team: code review, coverage standards, observability, reproducibility. Ensure user-centric validation so AI outputs are accurate, reliable, and meet real-world application needs.
