Generative AI Engineer (LLM Expert - AWS Focus)

Saviance
2 hours ago
Full-time
On-site
East New York, New York, United States

Location: Remote
Employment Type: Ongoing Contract

About BigRio

BigRio is a Boston-based, remote-first technology consulting firm specializing in advanced data, cloud, and software engineering solutions. We partner with forward-thinking organizations to deliver scalable, secure, and high-performance technologies, with deep expertise in AI/ML, data engineering, and AWS-native architectures. Our clients span healthcare, life sciences, government, and enterprise sectors, and we're known for tackling complex, high-impact challenges with cutting-edge innovation and measurable results.

About the Role

We're seeking a hands-on Generative AI Engineer (LLM Expert) who combines strong AWS development experience (60%) with deep expertise in applied LLM engineering (40%). This role is ideal for an engineer who has built real-world applications using OpenAI APIs and retrieval-augmented generation (RAG), not someone focused on traditional ML or model training. You'll work with BigRio's internal AI team and client partners to design, build, and optimize LLM-powered features, integrating them into cloud-native, production-ready systems.

This is a senior technical role, not a research or experimental position. The focus is on building, shipping, and scaling LLM applications using OpenAI models, LangChain, and AWS infrastructure.

Key Responsibilities

- Design, develop, and deploy AWS-based applications (Lambda, API Gateway, ECS, RDS, S3, Secrets Manager) that integrate LLM-powered features.
- Implement OpenAI-driven workflows, leveraging reasoning and non-reasoning models, temperature settings, and model versioning best practices.
- Apply prompt engineering and prompt chaining techniques to improve LLM accuracy and performance for production workloads.
- Build retrieval-augmented generation (RAG) pipelines using LangChain, ChromaDB, or similar frameworks.
- Develop FastAPI or Flask-based backends that connect to OpenAI APIs and vector databases.
- Build interactive front-ends and tools using Gradio or Streamlit for rapid prototyping and testing.
- Ensure secure, containerized deployments using Docker and integrate SSO and role-based access controls.
- Automate data pipelines and document workflows via Google Drive, AWS SDKs, or REST APIs.
- Write production-grade Python code, following clean architecture, documentation, and CI/CD best practices.
- Collaborate closely with AI engineers, DevOps teams, and clients to deliver enterprise-ready LLM applications.

Required Qualifications

- 5+ years of experience in professional software development, with a strong focus on AWS cloud and backend systems.
- 3+ years of direct experience working with OpenAI APIs, GPT models, and LLM application development.
- Proven ability to build and deploy LLM-powered applications, not just experiment with models.
- Knowledge of vector databases like Pinecone or FAISS is required.
- Expertise in Python, FastAPI, and API-driven architecture.
- Strong practical experience with LangChain, ChromaDB, RAG, and prompt engineering.
- Proficiency in Docker, AWS IAM, and secure deployment practices.
- Excellent communication skills: ability to explain LLM behavior, tradeoffs, and reasoning clearly to both technical and non-technical teams.
- Comfortable working independently in a fast-paced, client-facing environment across time zones.

Nice to Have

- Experience with LangGraph or other LLM orchestration frameworks.
- Familiarity with MLOps, CI/CD pipelines, and observability for LLM workloads.
- Exposure to healthcare, biotech, or regulated data environments.
- Demonstrated experience explaining and documenting AI system design and decision-making for non-AI stakeholders.

What This Role is Not

- Classical machine learning or model training (e.g., TensorFlow, PyTorch-based model design).
- Research, experimentation, or theoretical AI.
- Low-code or no-code chatbot builders.

This is a pure LLM engineering and AWS application development role: building scalable, production-quality AI systems using OpenAI and related frameworks.