ML Engineer • Behavioral • medium

Describe a model you deployed to production. What were the biggest challenges?

#Deployment #Challenges

ML Engineer • Behavioral • hard

Tell me about a time you had to optimize a model for latency without sacrificing too much accuracy.

#Latency #Accuracy

ML Engineer • Behavioral • medium

Describe how you collaborated with data scientists to productionize their research code.

#Research to Production

ML Engineer • Behavioral • hard

Tell me about a time an ML model caused an unexpected real-world impact.

#Responsibility #AI Safety

ML Engineer • Behavioral • easy

How do you keep up with the rapidly evolving ML landscape?

#Continuous Learning

ML Engineer • Behavioral • hard

Describe a time you had to re-architect a system because the original ML approach didn't scale.

#Scalability

ML Engineer • Behavioral • medium

Tell me about a disagreement you had with a researcher. How did you resolve it?

#Communication

ML Engineer • Behavioral • medium

How do you decide when a model is 'good enough' to ship?

#Quality #Judgment

ML Engineer • Behavioral • medium

What is Meta's approach to responsible AI?

#Responsible AI #Fairness

ML Engineer • Coding • hard

Implement a K-means clustering algorithm from scratch in Python.

#K-Means #Clustering
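
One possible starting point for this question is a minimal sketch of Lloyd's algorithm in NumPy. The function name `kmeans`, the helper `_assign`, and the parameters `n_iters`, `tol`, and `seed` are illustrative choices, not part of the question. The sketch assumes random initialization from the data points; an interviewer may also expect k-means++ initialization or multiple restarts.

```python
import numpy as np

def _assign(X, centroids):
    # Squared Euclidean distance from every point to every centroid: shape (n, k).
    dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)

def kmeans(X, k, n_iters=100, tol=1e-6, seed=0):
    """Lloyd's algorithm. Returns (centroids, labels)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialize centroids as k distinct data points chosen at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iters):
        # Assignment step: attach each point to its nearest centroid.
        labels = _assign(X, centroids)
        # Update step: move each centroid to the mean of its members.
        new_centroids = centroids.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                new_centroids[j] = members.mean(axis=0)
        shift = np.linalg.norm(new_centroids - centroids)
        centroids = new_centroids
        if shift < tol:  # stop once the centroids barely move
            break
    # Recompute labels so they match the final centroids.
    return centroids, _assign(X, centroids)
```

The broadcasted distance computation uses O(n * k * d) memory, which is fine for interview-sized inputs but would need chunking for large datasets.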

ML Engineer • Coding • hard

Implement logistic regression with gradient descent in NumPy.

#Logistic Regression #NumPy
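
A minimal sketch of one way to answer this, assuming binary labels in {0, 1}, full-batch gradient descent on the mean cross-entropy loss, and no regularization. The names `fit_logistic_regression` and `predict` and the default hyperparameters are hypothetical choices for illustration.

```python
import numpy as np

def sigmoid(z):
    # Clip the logits so np.exp does not overflow for extreme values.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -500, 500)))

def fit_logistic_regression(X, y, lr=0.1, n_iters=1000):
    """Full-batch gradient descent on mean binary cross-entropy."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(n_iters):
        p = sigmoid(X @ w + b)            # predicted probabilities
        error = p - y                     # gradient of the loss w.r.t. the logits
        grad_w = X.T @ error / n_samples  # average gradient for the weights
        grad_b = error.mean()             # average gradient for the bias
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def predict(X, w, b, threshold=0.5):
    # Threshold the predicted probability to get a class label.
    return (sigmoid(np.asarray(X, dtype=float) @ w + b) >= threshold).astype(int)
```

The gradient takes the simple form `X.T @ (p - y) / n` because the derivative of the cross-entropy loss with respect to the logit is `p - y`; being able to derive that step is often part of the question.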

ML Engineer • Coding • hard

Write a custom PyTorch Dataset and DataLoader for irregular time series data.

#PyTorch #DataLoader

ML Engineer • Coding • medium

Implement a sliding window approach to detect anomalies in a time series.

#Anomaly Detection #Time Series
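
One common reading of this question is a rolling z-score detector: compare each point with the mean and standard deviation of the window that precedes it. The sketch below assumes that interpretation; the function name `rolling_zscore_anomalies` and the defaults `window=20` and `threshold=3.0` are illustrative, not given by the question.

```python
import numpy as np

def rolling_zscore_anomalies(series, window=20, threshold=3.0):
    """Return indices whose value deviates from the trailing window
    by more than `threshold` standard deviations."""
    x = np.asarray(series, dtype=float)
    anomalies = []
    for i in range(window, len(x)):
        past = x[i - window:i]  # the `window` points just before index i
        mean = past.mean()
        std = past.std()
        if std == 0:
            # A flat window has no scale for a z-score; skip it in this sketch.
            continue
        if abs(x[i] - mean) / std > threshold:
            anomalies.append(i)
    return anomalies
```

Because the window only looks backward, the detector can run on streaming data. A detected spike stays in later windows and inflates their standard deviation, so a follow-up discussion might cover robust statistics such as the median and MAD.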

ML Engineer • Coding • hard

How would you write a batched inference pipeline using Python and Triton server?

#Triton #Batching

ML Engineer • System Design • hard

Design a CI/CD pipeline for ML models.

#CI/CD #Deployment

ML Engineer • System Design • hard

What is a feature store? Design one from scratch.

#Feature Engineering #MLOps

ML Engineer • System Design • hard

How would you serve a model that needs to respond in under 10ms?

#Low Latency #Serving

ML Engineer • System Design • hard

Design a system to retrain models automatically when performance degrades.

#Retraining #Automation

ML Engineer • System Design • hard

Design YouTube's video recommendation system end to end.

#Recommendations #Ranking

ML Engineer • System Design • hard

Design a real-time content moderation system.

#NLP #Real-Time

ML Engineer • System Design • hard

Design a search ranking system for an e-commerce platform.

#Ranking #Relevance

ML Engineer • System Design • hard

Design a training and serving architecture for a large language model at scale.

#Infrastructure #Scale

ML Engineer • System Design • hard

How would you build a personalized ad targeting system?

#Targeting #ML Systems

ML Engineer • Technical • easy

What is the difference between a data scientist and an ML engineer?

#Roles #MLOps

ML Engineer • Technical • medium

Explain the model training pipeline from raw data to deployment.

#Pipeline #Training

ML Engineer • Technical • medium

What is the difference between online learning and offline learning?

#Online Learning #Batch Learning

ML Engineer • Technical • medium

How do you handle missing data in ML model features?

#Imputation #Missing Data

ML Engineer • Technical • medium

Explain gradient descent variants: batch, stochastic, and mini-batch.

#Gradient Descent #Optimization

ML Engineer • Technical • medium

What are learning rate schedulers and why are they important?

#Learning Rate #Training

ML Engineer • Technical • hard

Explain the attention mechanism in transformers with mathematical detail.

#Attention #Transformers

ML Engineer • Technical • hard

What is quantization in neural networks? How does it reduce inference cost?

#Quantization #Inference

ML Engineer • Technical • hard

Explain knowledge distillation. When would you use it?

#Distillation #Compression

ML Engineer • Technical • hard

What is the difference between model parallelism and data parallelism in distributed training?

#Parallelism #Training

ML Engineer • Technical • medium

How do you version ML models and datasets? What tools do you use?

#Versioning #DVC #MLflow

ML Engineer • Technical • hard

Explain blue-green deployment vs canary deployment for ML models.

#Blue-Green #Canary

ML Engineer • Technical • hard

How do you detect data drift vs model drift? How do you respond to each?

#Drift #Production

ML Engineer • Technical • medium

What is shadow mode deployment in ML?

#Shadow Mode #A/B Testing

ML Engineer • Technical • medium

Explain model serialization formats: ONNX, TorchScript, SavedModel.

#ONNX #Serialization

ML Engineer • Technical • medium

What is Kubernetes? How is it used for ML model serving?

#Kubernetes #Serving

ML Engineer • Technical • hard

How do you optimize GPU utilization during training?

#GPU #Performance

ML Engineer • Technical • hard

Explain mixed precision training (FP16/BF16). What are the risks?

#Mixed Precision #Performance

ML Engineer • Technical • medium

What are the differences between PyTorch and TensorFlow for production?

#PyTorch #TensorFlow

ML Engineer • Technical • medium

How do you profile and debug a slow training run?

#Profiling #Debugging

ML Engineer • Technical • hard

Explain the RLHF (Reinforcement Learning from Human Feedback) training approach.

#RLHF #Fine-Tuning

ML Engineer • Technical • hard

What is LoRA (Low-Rank Adaptation)? How does it reduce fine-tuning costs?

#LoRA #Fine-Tuning

ML Engineer • Technical • hard

What is RAG (Retrieval-Augmented Generation)? Describe its architecture.

#RAG #Vector Search

ML Engineer • Technical • hard

How would you evaluate an LLM for a production use case?

#Evaluation #Benchmarking

ML Engineer • Technical • medium

Explain vector databases. What are FAISS, Pinecone, and Weaviate?

#Vector DB #Embeddings

ML Engineer • Technical • medium

What is model ensembling? When does it help, and when does it hurt?

#Ensembling #Performance

ML Engineer • Technical • hard

Explain how Meta's DLRM (Deep Learning Recommendation Model) works.

#DLRM #Embeddings

ML Engineer • Technical • hard

How does PyTorch Distributed work for large-scale model training at Meta?

#PyTorch #DDP

Meta

The Interview Loop

Recruiter Screen (30 min)

Technical Loop (3-4 Rounds)

Interview Question Bank

Difficulty Radar

Meet Your Interviewers

The "Standard" Interviewer

Unwritten Rules