building reusable, enterprise-wide anomaly detection solutions. This role blends hands-on AI/ML engineering, observability expertise, and automation to proactively detect system issues and improve production reliability.
The ideal candidate has strong Python-based ML experience and a solid grasp of observability principles (logs, metrics, traces), and has worked closely with Infrastructure, SRE, and Engineering teams to implement scalable observability solutions across complex systems.
This is a senior individual contributor role requiring independence, initiative, and subject-matter expertise.
Key Responsibilities
AI/ML & Observability Engineering
Design, build, and deploy AI/ML models for anomaly detection across telemetry data (logs, metrics, traces, KPIs)
Translate early-stage use cases into generalized, reusable observability solutions
Modify and extend models to support multiple applications and teams
Apply ML techniques to predict system anomalies before production impact
Telemetry & System Monitoring
Analyze and correlate logs, metrics, traces, and system KPIs
Identify early warning signals of instability or degradation
Build dashboards and alerts using observability platforms
Collaboration & Strategy
Work closely with Infrastructure, SRE, Developers, and Architects
Contribute to enterprise observability strategy
Act as a subject matter expert for AI-driven observability
Operate independently within a small, high-impact team
Automation & Cloud
Develop automation to support end-to-end observability workflows
Deploy solutions in cloud environments
Leverage OpenTelemetry standards for instrumentation and data collection
Required Qualifications
6+ years of experience in AI/ML engineering, SRE, or observability-focused roles
Strong expertise in Python for data processing and ML development
Hands-on experience building ML models for anomaly detection
Solid understanding of observability principles (logs, metrics, traces)
Experience with observability tools such as Grafana (preferred), Splunk, and Dynatrace
Familiarity with OpenTelemetry
Strong automation skills (pipelines, workflows, reusable components)
Experience working in cloud environments
Excellent problem-solving and communication skills
Preferred Qualifications
Experience designing predictive models for system reliability
Background supporting production systems in large-scale environments
Experience building reusable ML platforms or shared services
Exposure to enterprise-wide monitoring or observability programs
Ideal Candidate Profile
Senior-level, hands-on engineer
Strong ownership mindset; able to drive work end-to-end
Comfortable operating with limited supervision
Strategic thinker with pragmatic execution skills
Passionate about reliability, automation, and proactive problem detection
Dexian stands at the forefront of Talent + Technology solutions with a presence spanning more than 70 locations worldwide and a team exceeding 10,000 professionals. As one of the largest technology and professional staffing companies and one of the largest minority-owned staffing companies in the United States, Dexian combines over 30 years of industry expertise with cutting-edge technologies to deliver comprehensive global services and support.
Dexian connects the right talent and the right technology with the right organizations to deliver trajectory-changing results that help everyone achieve their ambitions and goals. To learn more, please visit https://dexian.com/.
Dexian is an Equal Opportunity Employer that recruits and hires qualified candidates without regard to race, religion, sex, sexual orientation, gender identity, age, national origin, ancestry, citizenship, disability, or veteran status.