AI Engineer - Classifiers, Media Intelligence & Voice R&D
Insight Global
Job Description
Studio produces and manages massive volumes of AI-generated media: avatars, images, audio, and video. We're hiring an AI Engineer to build the "intelligence layer" on top of that content: classifying it, tagging it, organizing it, and making it searchable and useful at scale.
This role also includes meaningful R&D work in AI voice/audio and image intelligence that will directly shape next-generation product capabilities. You'll operate at the intersection of applied ML, production infrastructure, and product impact, taking models from research and prototypes all the way into reliable production APIs.
Responsibilities:
Design, train, and deploy classification models across Studio's content pipeline (style detection, quality scoring, semantic categorization, filtering, moderation signals)
Build automated tagging and organization systems for our media library:
attribute extraction and feature detection
clustering and similarity grouping
intelligent search and discovery signals
Create and maintain training data pipelines:
dataset curation and QA
annotation workflows and tooling
active learning loops to improve labeling efficiency and model performance
Lead voice/audio R&D: evaluate frontier voice models (TTS, voice cloning, audio synthesis), prototype integrations, and define a practical path from research to production features
Research and prototype image intelligence capabilities relevant to avatars and media tools: face/body analysis, pose estimation, image-to-image consistency and visual similarity, and style understanding
Build evaluation frameworks to track classifier accuracy and calibration as well as model drift and performance over time
Optimize inference for cost and latency (batching, quantization, caching, serving strategy)
Integrate with GPU infrastructure and ship models behind production APIs, with monitoring and operational discipline
We are a company committed to creating diverse and inclusive environments where people can bring their full, authentic selves to work every day. We are an equal opportunity/affirmative action employer that believes everyone matters. Qualified candidates will receive consideration for employment regardless of their race, color, ethnicity, religion, sex (including pregnancy), sexual orientation, gender identity and expression, marital status, national origin, ancestry, genetic factors, age, disability, protected veteran status, military or uniformed service member status, or any other status or characteristic protected by applicable laws, regulations, and ordinances. If you need assistance and/or a reasonable accommodation due to a disability during the application or recruiting process, please send a request to HR@insightglobal.com. To learn more about how we collect, keep, and process your private information, please review Insight Global's Workforce Privacy Policy: https://insightglobal.com/workforce-privacy-policy/.
Skills and Requirements
Tech Stack: Python, PyTorch, GPU compute clusters (EOK), REST/webhook APIs, TypeScript (NestJS integration), PostgreSQL (labels/metadata), Redis, Temporal (pipeline orchestration)
3+ years building and deploying ML systems in production, especially classification / tagging / content understanding
Hands-on model training experience (not just API usage): you've curated datasets, tuned hyperparameters, debugged training runs, and shipped the result
Strong background in computer vision (CNNs, ViTs, CLIP-style embeddings, or similar)
Demonstrable interest or experience in voice/audio AI (TTS, voice cloning, audio classification, speech synthesis); production work, research, or serious side projects all count
Proficiency in Python with PyTorch (or TensorFlow) and comfort reading/adapting research code
Experience building labeling pipelines, annotation processes, or active learning systems
Understanding of production model serving: REST APIs, batching, latency constraints, monitoring, and drift detection
Familiarity with embeddings, vector search, or semantic similarity systems for retrieval/discovery features
Experience with diffusion models, GANs, or generative audio techniques
Experience publishing research, open-sourcing ML tooling, or contributing to ML community work