
LLM Serving Engineer (Cloud AI Engineering), Senior / Staff Engineer

Qualcomm
Full-time
On-site
Austin
$158,400 - $237,600 USD yearly
Overview

LLM Serving Engineer (Cloud AI Engineering) at Qualcomm Technologies, Inc. Qualcomm is investing in Deep Learning and Cloud AI, developing hardware and software solutions for Inference Acceleration. This role spans the full product lifecycle, from cutting-edge research and development to commercial deployment, and demands strategic thinking, strong execution, and excellent communication skills.

Responsibilities

Build a scalable LLM inference platform using inference techniques (e.g. disaggregated serving and KV-cache management, advanced parallelism, speculative algorithms, model optimization, specialized kernels).
Contribute to the development of LLM serving packages (e.g. vLLM, SGLang, TGI, Triton Inference Server, Dynamo, llm-d).
Work closely with customers to drive solutions by collaborating with internal compiler, firmware and platform teams.
Work at the forefront of GenAI by understanding advanced algorithms (e.g. attention mechanisms, MoEs) and numerics to identify new optimization opportunities.
Drive efficient serving through smart autoscaling, load balancing and routing.
Engage with open-source serving communities to evolve the framework.

Qualifications

Hands-on experience with one or more of the following LLM serving/orchestration packages: Triton Inference Server, vLLM, SGLang, Ollama, llm-d, KServe, LMCache, Mooncake.
Deep understanding of foundational LLMs, VLMs, SLMs, and transformer-based architectures.
Strong experience in developing language models using PyTorch.
Strong computer science fundamentals: algorithms, data structures, parallel and distributed programming.
Understanding of computer architecture, ML accelerators, in-memory processing and distributed systems.
Strong Python development skills for large-scale projects, with a passion for software engineering.
Experience in analyzing, profiling, and optimizing deep learning workloads.
Proactive learning about the latest inference optimization techniques.
Excellent communication and problem-solving skills, with the ability to thrive in a fast-paced and collaborative environment.
MS in Computer Science, Machine Learning, Computer Engineering or Electrical Engineering.

Bonus Skills

Open-source contribution to any GenAI package.
Experience architecting and developing large-scale distributed systems.
High-level kernel design experience (PyTorch, CUDA, Triton).
Knowledge of torch.compile or TorchDynamo.
PhD in Computer Science, Computer Engineering or Machine Learning.

Minimum Qualifications

Bachelor's degree in Computer Science, Engineering, Information Systems, or related field and 4+ years of Hardware Engineering, Software Engineering, Systems Engineering, or related work experience.
OR
Master's degree in Computer Science, Engineering, Information Systems, or related field and 3+ years of related experience.
OR
PhD in Computer Science, Engineering, Information Systems, or related field and 2+ years of related experience.

Qualcomm is an equal opportunity employer. If you are an individual with a disability and need an accommodation during the application/hiring process, Qualcomm will provide reasonable accommodations. Email disability-accomodations@qualcomm.com or call Qualcomm.

Qualcomm is committed to an accessible hiring process and workplace for individuals with disabilities.

Pay Range and Benefits

$158,400.00 - $237,600.00

The pay range reflects the broad range for this job code and location. Salary is one component of total compensation, which also includes a discretionary bonus program and potential RSU grants. Details about US benefits are available from your recruiter. For more information about this role, please contact Qualcomm Careers.
