AI Engineer - Classifiers, Media Intelligence & Voice R&D

Insight Global
7 hours ago
Full-time
On-site
Job Description

Studio produces and manages massive volumes of AI-generated media: avatars, images, audio, and video. We're hiring an AI Engineer to build the "intelligence layer" on top of that content: classifying it, tagging it, organizing it, and making it searchable and useful at scale.

This role also includes meaningful R&D work in AI voice/audio and image intelligence that will directly shape next-generation product capabilities. You'll operate at the intersection of applied ML, production infrastructure, and product impact, taking models from research and prototypes all the way into reliable production APIs.

Responsibilities:

Design, train, and deploy classification models across Studio's content pipeline (style detection, quality scoring, semantic categorization, filtering, moderation signals)

Build automated tagging and organization systems for our media library:

attribute extraction and feature detection

clustering and similarity grouping

intelligent search and discovery signals

Create and maintain training data pipelines:

dataset curation and QA

annotation workflows and tooling

active learning loops to improve labeling efficiency and model performance

Lead voice/audio R&D: evaluate frontier voice models (TTS, voice cloning, audio synthesis), prototype integrations, and define a practical path from research to production features

Research and prototype image intelligence capabilities relevant to avatars and media tools: face/body analysis, pose estimation, image-to-image consistency and visual similarity, and style understanding

Build evaluation frameworks to track classifier accuracy and calibration as well as model drift and performance over time

Optimize inference for cost and latency (batching, quantization, caching, serving strategy)

Integrate with GPU infrastructure and ship models behind production APIs, with monitoring and operational discipline

We are a company committed to creating diverse and inclusive environments where people can bring their full, authentic selves to work every day. We are an equal opportunity/affirmative action employer that believes everyone matters. Qualified candidates will receive consideration for employment regardless of their race, color, ethnicity, religion, sex (including pregnancy), sexual orientation, gender identity and expression, marital status, national origin, ancestry, genetic factors, age, disability, protected veteran status, military or uniformed service member status, or any other status or characteristic protected by applicable laws, regulations, and ordinances. If you need assistance and/or a reasonable accommodation due to a disability during the application or recruiting process, please send a request to HR@insightglobal.com. To learn more about how we collect, keep, and process your private information, please review Insight Global's Workforce Privacy Policy: https://insightglobal.com/workforce-privacy-policy/.

Skills and Requirements

Tech Stack: Python, PyTorch, GPU compute clusters (EOK), REST/webhook APIs, TypeScript (NestJS integration), PostgreSQL (labels/metadata), Redis, Temporal (pipeline orchestration)

3+ years building and deploying ML systems in production, especially classification / tagging / content understanding

Hands-on model training experience (not just API usage): you've curated datasets, tuned hyperparameters, debugged training runs, and shipped the result

Strong background in computer vision (CNNs, ViTs, CLIP-style embeddings, or similar)

Demonstrable interest or experience in voice/audio AI (TTS, voice cloning, audio classification, speech synthesis); production work, research, or serious side projects all count

Proficiency in Python with PyTorch (or TensorFlow) and comfort reading/adapting research code

Experience building labeling pipelines, annotation processes, or active learning systems

Understanding of production model serving: REST APIs, batching, latency constraints, monitoring, and drift detection

Familiarity with embeddings, vector search, or semantic similarity systems for retrieval/discovery features

Experience with diffusion models, GANs, or generative audio techniques

Experience publishing research, open-sourcing ML tooling, or contributing to ML community work