Amazon
E-commerce and cloud computing company that operates AWS, the largest cloud platform by market share.
5 Rounds
~28 Days
Very Hard
The Interview Loop
Recruiter Screen (30 min)
Standard fit check, behavioral questions, and resume overview.
Technical Loop (3-4 Rounds)
Deep dive into domain knowledge, coding, and system design.
Interview Question Bank
ML Engineer • Behavioral • medium
Describe a model you deployed to production. What were the biggest challenges?
#Deployment
#Challenges
ML Engineer • Behavioral • hard
Tell me about a time you had to optimize a model for latency without sacrificing too much accuracy.
#Latency
#Accuracy
ML Engineer • Behavioral • medium
Describe how you collaborated with data scientists to productionize their research code.
#Research to Production
ML Engineer • Behavioral • hard
Tell me about a time an ML model caused an unexpected real-world impact.
#Responsibility
#AI Safety
ML Engineer • Behavioral • easy
How do you keep up with the rapidly evolving ML landscape?
#Continuous Learning
ML Engineer • Behavioral • hard
Describe a time you had to re-architect a system because the original ML approach didn't scale.
#Scalability
ML Engineer • Behavioral • medium
Tell me about a disagreement you had with a researcher. How did you resolve it?
#Communication
ML Engineer • Behavioral • medium
How do you decide when a model is 'good enough' to ship?
#Quality
#Judgment
ML Engineer • Behavioral • medium
Tell me about a time you demonstrated customer obsession in an ML project. (Leadership Principle)
#Customer Obsession
ML Engineer • Coding • hard
Implement a K-means clustering algorithm from scratch in Python.
#K-Means
#Clustering
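One way to approach the K-means question is Lloyd's algorithm: alternate between assigning points to the nearest centroid and moving each centroid to the mean of its members. The sketch below is a minimal version under stated assumptions, not a reference answer: the function name `kmeans`, its parameters, and the random-sample initialization are illustrative choices (k-means++ initialization would be a common refinement).

```python
import numpy as np

def kmeans(X, k, n_iters=100, tol=1e-6, seed=0):
    """Minimal Lloyd's algorithm. Returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Assumption: initialise centroids by sampling k distinct data points.
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)

    for _ in range(n_iters):
        # Assignment step: squared Euclidean distance of every point to every centroid, shape (n, k).
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)

        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = centroids.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # leave an empty cluster's centroid where it is
                new_centroids[j] = members.mean(axis=0)

        shift = np.linalg.norm(new_centroids - centroids)
        centroids = new_centroids
        if shift < tol:  # centroids stopped moving
            break

    # Recompute labels so they match the final centroids.
    dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return centroids, dists.argmin(axis=1)
```

In an interview setting it is worth mentioning the trade-offs this sketch skips: sensitivity to initialization, how to pick k, and the O(n·k·d) cost of each assignment step.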
ML Engineer • Coding • hard
Implement logistic regression with gradient descent in NumPy.
#Logistic Regression
#NumPy
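A possible starting point for the logistic regression question is full-batch gradient descent on the mean binary cross-entropy loss. The names `fit_logistic_regression` and `predict`, the learning rate, and the iteration count below are assumptions made for illustration; regularization and convergence checks are left out to keep the sketch short.

```python
import numpy as np

def sigmoid(z):
    # Clip to avoid overflow in exp for very large |z|.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -500, 500)))

def fit_logistic_regression(X, y, lr=0.1, n_iters=1000):
    """Full-batch gradient descent on mean binary cross-entropy. Returns (w, b)."""
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(n_iters):
        p = sigmoid(X @ w + b)      # predicted probabilities, shape (n,)
        error = p - y               # derivative of the loss w.r.t. the logits
        w -= lr * (X.T @ error) / n
        b -= lr * error.mean()
    return w, b

def predict(X, w, b, threshold=0.5):
    """Hard 0/1 predictions from the fitted parameters."""
    return (sigmoid(X @ w + b) >= threshold).astype(int)
```

The gradient takes the simple form `X.T @ (p - y) / n` because the sigmoid and the cross-entropy loss cancel neatly when differentiated together.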
ML Engineer • Coding • hard
Write a custom PyTorch Dataset and DataLoader for irregular time series data.
#PyTorch
#DataLoader
ML Engineer • Coding • medium
Implement a sliding window approach to detect anomalies in a time series.
#Anomaly Detection
#Time Series
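The sliding-window question can be answered in several ways; the sketch below uses one common choice, a trailing-window z-score, and flags a point when it sits more than `threshold` standard deviations from the mean of the preceding `window` points. The function name, the default window size, and the threshold are assumptions for illustration only.

```python
import numpy as np

def sliding_window_anomalies(series, window=20, threshold=3.0, eps=1e-8):
    """Return indices whose value deviates from the trailing-window mean
    by more than `threshold` trailing-window standard deviations."""
    x = np.asarray(series, dtype=float)
    anomalies = []
    for i in range(window, len(x)):
        past = x[i - window:i]          # the `window` points before index i
        mean = past.mean()
        std = max(past.std(), eps)      # floor avoids division by zero on flat windows
        if abs(x[i] - mean) / std > threshold:
            anomalies.append(i)
    return anomalies

# Example: an alternating 0/1 signal with a single spike at index 30.
signal = [0, 1] * 20
signal[30] = 50
print(sliding_window_anomalies(signal, window=10))  # prints [30]
```

Note that a detected spike stays inside the window for the next `window` steps and inflates the standard deviation there, which can mask follow-up anomalies; a median/MAD variant is a common, more robust alternative.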
ML Engineer • Coding • hard
How would you write a batched inference pipeline using Python and NVIDIA Triton Inference Server?
#Triton
#Batching
ML Engineer • System Design • hard
Design a CI/CD pipeline for ML models.
#CI/CD
#Deployment
ML Engineer • System Design • hard
What is a feature store? Design one from scratch.
#Feature Engineering
#MLOps
ML Engineer • System Design • hard
How would you serve a model that needs to respond in under 10ms?
#Low Latency
#Serving
ML Engineer • System Design • hard
Design a system to retrain models automatically when performance degrades.
#Retraining
#Automation
ML Engineer • System Design • hard
Design YouTube's video recommendation system end to end.
#Recommendations
#Ranking
ML Engineer • System Design • hard
Design a real-time content moderation system.
#NLP
#Real-Time
ML Engineer • System Design • hard
Design a search ranking system for an e-commerce platform.
#Ranking
#Relevance
ML Engineer • System Design • hard
Design a training and serving architecture for a large language model at scale.
#Infrastructure
#Scale
ML Engineer • System Design • hard
How would you build a personalized ad targeting system?
#Targeting
#ML Systems
ML Engineer • Technical • easy
What is the difference between a data scientist and an ML engineer?
#Roles
#MLOps
ML Engineer • Technical • medium
Explain the model training pipeline from raw data to deployment.
#Pipeline
#Training
ML Engineer • Technical • medium
What is the difference between online learning and offline learning?
#Online Learning
#Batch Learning
ML Engineer • Technical • medium
How do you handle missing data in ML model features?
#Imputation
#Missing Data
ML Engineer • Technical • medium
Explain gradient descent variants: batch, stochastic, and mini-batch.
#Gradient Descent
#Optimization
ML Engineer • Technical • medium
What are learning rate schedulers and why are they important?
#Learning Rate
#Training
ML Engineer • Technical • hard
Explain the attention mechanism in transformers with mathematical detail.
#Attention
#Transformers
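For the attention question, the core formula from the original transformer design is Attention(Q, K, V) = softmax(QKᵀ / √d_k) V, where d_k is the key dimension. The NumPy sketch below illustrates that single formula; the function names and the boolean `mask` convention (True means "may attend") are assumptions made here, and multi-head projection is omitted.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V, mask=None):
    """Q: (..., n_q, d_k), K: (..., n_k, d_k), V: (..., n_k, d_v).
    Returns (output, attention_weights)."""
    d_k = Q.shape[-1]
    # Similarity of every query with every key, scaled so the variance
    # of the scores does not grow with d_k.
    scores = Q @ K.swapaxes(-1, -2) / np.sqrt(d_k)
    if mask is not None:
        # Assumption: mask is boolean, True where attention is allowed.
        scores = np.where(mask, scores, -1e9)
    weights = softmax(scores, axis=-1)   # each row sums to 1
    return weights @ V, weights
```

A lower-triangular mask such as `np.tril(np.ones((n, n), dtype=bool))` turns this into causal self-attention, where position i can only attend to positions up to i.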
ML Engineer • Technical • hard
What is quantization in neural networks? How does it reduce inference cost?
#Quantization
#Inference
ML Engineer • Technical • hard
Explain knowledge distillation. When would you use it?
#Distillation
#Compression
ML Engineer • Technical • hard
What is the difference between model parallelism and data parallelism in distributed training?
#Parallelism
#Training
ML Engineer • Technical • medium
How do you version ML models and datasets? What tools do you use?
#Versioning
#DVC
#MLflow
ML Engineer • Technical • hard
Explain blue-green deployment vs canary deployment for ML models.
#Blue-Green
#Canary
ML Engineer • Technical • hard
How do you detect data drift vs model drift? How do you respond to each?
#Drift
#Production
ML Engineer • Technical • medium
What is shadow mode deployment in ML?
#Shadow Mode
#A/B Testing
ML Engineer • Technical • medium
Explain model serialization formats: ONNX, TorchScript, SavedModel.
#ONNX
#Serialization
ML Engineer • Technical • medium
What is Kubernetes? How is it used for ML model serving?
#Kubernetes
#Serving
ML Engineer • Technical • hard
How do you optimize GPU utilization during training?
#GPU
#Performance
ML Engineer • Technical • hard
Explain mixed precision training (FP16/BF16). What are the risks?
#Mixed Precision
#Performance
ML Engineer • Technical • medium
What are the differences between PyTorch and TensorFlow for production?
#PyTorch
#TensorFlow
ML Engineer • Technical • medium
How do you profile and debug a slow training run?
#Profiling
#Debugging
ML Engineer • Technical • hard
Explain the RLHF (Reinforcement Learning from Human Feedback) training approach.
#RLHF
#Fine-Tuning
ML Engineer • Technical • hard
What is LoRA (Low-Rank Adaptation)? How does it reduce fine-tuning costs?
#LoRA
#Fine-Tuning
ML Engineer • Technical • hard
What is RAG (Retrieval-Augmented Generation)? Describe its architecture.
#RAG
#Vector Search
ML Engineer • Technical • hard
How would you evaluate an LLM for a production use case?
#Evaluation
#Benchmarking
ML Engineer • Technical • medium
Explain vector databases. What are FAISS, Pinecone, and Weaviate?
#Vector DB
#Embeddings
ML Engineer • Technical • medium
What is model ensembling? When does it help, and when does it hurt?
#Ensembling
#Performance
ML Engineer • Technical • hard
How would you use SageMaker for end-to-end MLOps?
#SageMaker
#AWS
ML Engineer • Technical • hard
Explain how Amazon Personalize works internally.
#Personalize
#AWS
ML Engineer • Technical • hard
How would you deploy a fraud detection model on AWS Lambda?
#Lambda
#Fraud
Difficulty Radar
Based on recent AI-sourced data.
Meet Your Interviewers
The "Standard" Interviewer
Senior Engineer
Focuses on core competencies, system constraints, and clear communication.
Unwritten Rules
Think Out Loud
Always explain your thought process before writing code or drawing architecture.