Meta
Social media and metaverse company behind Facebook, Instagram, and WhatsApp.
4 Rounds
~21 Days
Very Hard
The Interview Loop
Recruiter Screen (30 min)
Standard fit check, behavioral questions, and resume overview.
Technical Loop (3-4 Rounds)
Deep dive into domain knowledge, coding, and system design.
Interview Question Bank
Machine Learning Engineer • Behavioral • medium
Tell me about a time you had a fundamental disagreement with a cross-functional partner, such as a Product Manager, regarding the choice of an ML metric versus a business metric.
#Conflict Resolution
#Cross-functional Collaboration
#Business Acumen
Machine Learning Engineer • Behavioral • medium
Describe a situation where a machine learning model you deployed degraded in production. How did you detect the degradation, and what steps did you take to resolve it?
#Model Monitoring
#Incident Response
#Ownership
Machine Learning Engineer • Behavioral • medium
Tell me about a time you had to make a difficult trade-off between model accuracy and inference latency. How did you approach the decision?
#Trade-offs
#Optimization
#System Constraints
Machine Learning Engineer • Behavioral • medium
Give an example of a project where you had to pivot your technical strategy halfway through due to changing business requirements or unexpected technical roadblocks.
#Agility
#Problem Solving
#Resilience
Machine Learning Engineer • Coding • medium
Given two sparse vectors represented as arrays of non-zero elements and their indices, write a function to compute their dot product. Optimize for both time and space complexity.
#Arrays
#Hash Table
#Two Pointers
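One possible answer is a two-pointer merge over the index arrays. The sketch below assumes each vector arrives as two parallel arrays (indices and values) with indices sorted in ascending order; the function name and signature are illustrative, not part of the question.

```python
from typing import Sequence


def sparse_dot(
    idx_a: Sequence[int],
    val_a: Sequence[float],
    idx_b: Sequence[int],
    val_b: Sequence[float],
) -> float:
    """Dot product of two sparse vectors stored as (indices, values) pairs.

    Assumes both index arrays are sorted in ascending order.
    """
    i = j = 0
    total = 0.0
    while i < len(idx_a) and j < len(idx_b):
        if idx_a[i] == idx_b[j]:
            # Only positions present in both vectors contribute.
            total += val_a[i] * val_b[j]
            i += 1
            j += 1
        elif idx_a[i] < idx_b[j]:
            i += 1
        else:
            j += 1
    return total
```

This runs in O(n + m) time with O(1) extra space. If one vector is far shorter than the other, binary searching the longer index array or keeping a hash map of one vector are common alternatives worth discussing.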
Machine Learning Engineer • Coding • medium
Write a function to sample a batch of data from a large dataset on disk without loading the entire dataset into memory. Implement a custom PyTorch-like DataLoader class with __iter__ and __next__ methods that supports shuffling and batching.
#Object-Oriented Design
#Data Structures
#Generators
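A minimal sketch of one way to approach this, assuming the dataset is a UTF-8 text file with one record per line. The class name LineDataLoader is hypothetical. It keeps only a list of byte offsets in memory (one integer per record) and seeks to each record when a batch is requested.

```python
import random
from typing import List, Optional


class LineDataLoader:
    """Batches lines from a text file while holding only byte offsets in memory."""

    def __init__(self, path: str, batch_size: int, shuffle: bool = True,
                 seed: Optional[int] = None) -> None:
        if batch_size <= 0:
            raise ValueError("batch_size must be positive")
        self.path = path
        self.batch_size = batch_size
        self.shuffle = shuffle
        self._rng = random.Random(seed)
        # One pass over the file to record where each line starts.
        self._offsets: List[int] = []
        with open(path, "rb") as f:
            while True:
                pos = f.tell()
                if not f.readline():
                    break
                self._offsets.append(pos)
        self._order: List[int] = []
        self._cursor = 0

    def __iter__(self) -> "LineDataLoader":
        # Each new pass (epoch) gets a fresh order.
        self._order = list(self._offsets)
        if self.shuffle:
            self._rng.shuffle(self._order)
        self._cursor = 0
        return self

    def __next__(self) -> List[str]:
        if self._cursor >= len(self._order):
            raise StopIteration
        chunk = self._order[self._cursor:self._cursor + self.batch_size]
        self._cursor += len(chunk)
        batch = []
        with open(self.path, "rb") as f:
            for pos in chunk:
                f.seek(pos)
                batch.append(f.readline().decode("utf-8").rstrip("\r\n"))
        return batch
```

Shuffling offsets gives a uniform shuffle but causes random disk reads. For very large files, a follow-up discussion could cover shuffling within a bounded buffer or shuffling at the shard level instead.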
Machine Learning Engineer • Coding • medium
Given a binary tree, write an algorithm to find the lowest common ancestor (LCA) of two given nodes. Assume each node has a pointer to its parent.
#Trees
#Pointers
#Hash Table
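With parent pointers, the problem resembles finding the intersection of two linked lists. The sketch below uses that two-pointer idea; the Node class is a hypothetical minimal definition, since the question does not specify one.

```python
from typing import Optional


class Node:
    """Hypothetical tree node that only needs a parent pointer here."""

    def __init__(self, val: int, parent: Optional["Node"] = None) -> None:
        self.val = val
        self.parent = parent


def lowest_common_ancestor(p: Node, q: Node) -> Optional[Node]:
    """Return the LCA of p and q, or None if they are in different trees."""
    a: Optional[Node] = p
    b: Optional[Node] = q
    # Each pointer walks to the root, then restarts from the other node.
    # Both cover the same total distance, so they align at the LCA.
    while a is not b:
        a = a.parent if a is not None else q
        b = b.parent if b is not None else p
    return a
```

This uses O(h) time and O(1) extra space, where h is the tree height. The hash-table alternative stores every ancestor of p in a set and returns the first ancestor of q found in it, which is simpler to explain but costs O(h) space.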
Machine Learning Engineer • Coding • medium
Implement a function to calculate the Intersection over Union (IoU) of two bounding boxes. Extend this to implement Non-Maximum Suppression (NMS) for a list of bounding boxes and their confidence scores.
#Computer Vision
#Geometry
#Sorting
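A minimal sketch of both parts, assuming boxes are given as (x1, y1, x2, y2) corner coordinates with x1 <= x2 and y1 <= y2. The 0.5 default threshold is an illustrative choice, not something the question specifies.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], scores: Sequence[float],
        iou_threshold: float = 0.5) -> List[int]:
    """Greedy NMS. Returns indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep
```

The greedy loop is O(n^2) in the worst case after an O(n log n) sort. In a multi-class detector, NMS is usually applied per class, which is a natural extension to mention.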
Machine Learning Engineer • System Design • hard
Design the machine learning architecture for Instagram Reels recommendations. How would you structure the funnel from candidate generation to final ranking?
#Recommendation Systems
#Two-Tower Models
#Ranking
#Candidate Generation
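As a toy illustration of the funnel shape only, the sketch below retrieves candidates cheaply by embedding dot product (the two-tower idea) and then re-orders that short list with a more expensive scorer. All names here are hypothetical, and a production system would use approximate nearest-neighbor search and a learned ranking model rather than these stand-ins.

```python
import heapq
from typing import Callable, Dict, List, Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def retrieve(user_vec: Sequence[float],
             item_vecs: Dict[str, Sequence[float]], n: int) -> List[str]:
    """Candidate generation: top-n items by user/item embedding dot product."""
    return heapq.nlargest(n, item_vecs, key=lambda item: dot(user_vec, item_vecs[item]))


def rank(candidates: Sequence[str], score_fn: Callable[[str], float],
         k: int) -> List[str]:
    """Final ranking: re-score the small candidate set with a heavier model."""
    return sorted(candidates, key=score_fn, reverse=True)[:k]


# Usage with made-up embeddings and a made-up ranking score.
item_vecs = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.9, 0.1], "d": [-1.0, 0.0]}
candidates = retrieve([1.0, 0.0], item_vecs, n=2)
predicted_watch = {"a": 0.2, "c": 0.7}  # stand-in for a ranking model's output
final = rank(candidates, lambda item: predicted_watch[item], k=2)
```

The point of the split is cost: the retrieval stage must be cheap enough to scan a huge corpus, while the ranking stage can afford richer features because it only sees a few hundred or thousand candidates.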
Machine Learning Engineer • System Design • hard
Design an Ads Click-Through Rate (CTR) prediction system for Meta's news feed. How do you handle the extreme class imbalance and delayed feedback in ad clicks?
#Ads Ranking
#Imbalanced Data
#Streaming Pipelines
#DLRM
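One commonly discussed way to handle the imbalance is to downsample negatives during training and then recalibrate the predicted probabilities at serving time. The sketch below shows the recalibration arithmetic, assuming negatives are kept with probability keep_fraction and all positives are kept; the function names are illustrative.

```python
def downsampled_rate(true_rate: float, keep_fraction: float) -> float:
    """Positive rate seen in training data after negative downsampling."""
    pos = true_rate
    neg = (1.0 - true_rate) * keep_fraction
    return pos / (pos + neg)


def recalibrate(p: float, keep_fraction: float) -> float:
    """Map a prediction made in the downsampled space back to the original space.

    q = p / (p + (1 - p) / w), where w is the fraction of negatives kept.
    """
    return p / (p + (1.0 - p) / keep_fraction)


# A 0.1% CTR looks like roughly 9% after keeping 1% of negatives,
# and recalibration maps it back.
p_train = downsampled_rate(0.001, 0.01)
p_served = recalibrate(p_train, 0.01)
```

Calibration matters for ads in particular because predicted CTR typically feeds an auction, so a model that ranks well but overestimates click probability would misprice bids.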
Machine Learning Engineer • System Design • hard
Design a multimodal content moderation system to detect hate speech in Facebook posts containing both text and images. How do you fuse the modalities?
#Multimodal ML
#Classification
#NLP
#Computer Vision
Machine Learning Engineer • System Design • hard
Design the 'People You May Know' (PYMK) feature. How would you scale the graph traversals and ML inference to billions of users?
#Graph Neural Networks
#Link Prediction
#Batch Processing
#Scalability
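A simple baseline for candidate generation is friends-of-friends ranked by mutual friend count. The sketch below assumes an in-memory adjacency map from user ID to a set of friend IDs, which is only realistic for a toy graph. At the scale the question describes, this step would be a sharded batch job, and the counts would be one feature among many for a learned link-prediction model.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple


def pymk_candidates(graph: Dict[str, Set[str]], user: str,
                    top_k: int = 10) -> List[Tuple[str, int]]:
    """Return (candidate, mutual_friend_count) pairs for two-hop neighbors."""
    friends = graph.get(user, set())
    mutual: Counter = Counter()
    for friend in friends:
        for fof in graph.get(friend, set()):
            # Skip the user and anyone they are already connected to.
            if fof != user and fof not in friends:
                mutual[fof] += 1
    # Sort by count descending, then by ID for a deterministic order.
    return sorted(mutual.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


# Usage on a hypothetical five-person graph.
graph = {
    "ann": {"bob", "cat"},
    "bob": {"ann", "dan", "eve"},
    "cat": {"ann", "dan"},
    "dan": {"bob", "cat"},
    "eve": {"bob"},
}
suggestions = pymk_candidates(graph, "ann")
```

Restricting candidates to two hops keeps the search space tractable, since scoring every non-friend pair is infeasible with billions of users.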
Machine Learning Engineer • Technical • medium
In a deep learning recommendation model (DLRM), how do you handle the explosion of vocabulary size for categorical features like user IDs or item IDs?
#Embeddings
#Hashing
#Memory Optimization
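One standard answer is the hashing trick: map each raw ID to one of a fixed number of buckets so the embedding table size no longer grows with the vocabulary. The sketch below is a plain-Python illustration. The class name HashedEmbedding and the initialization scale are assumptions, and a real model would use a framework embedding layer.

```python
import hashlib
import random
from typing import List


def hash_bucket(raw_id: str, num_buckets: int, seed: int = 0) -> int:
    """Stable bucket index for an ID (Python's built-in hash() is salted per process)."""
    digest = hashlib.sha256(f"{seed}:{raw_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "little") % num_buckets


class HashedEmbedding:
    """Fixed-size embedding table indexed by hashed IDs."""

    def __init__(self, num_buckets: int, dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.num_buckets = num_buckets
        self.table: List[List[float]] = [
            [rng.gauss(0.0, 0.01) for _ in range(dim)] for _ in range(num_buckets)
        ]

    def lookup(self, raw_id: str) -> List[float]:
        # Unseen IDs need no special handling: they hash to an existing row.
        return self.table[hash_bucket(raw_id, self.num_buckets)]
```

The trade-off is collisions: unrelated IDs can share a row. Typical follow-ups include using several hash functions and combining the looked-up rows, giving frequent IDs dedicated rows, or factorizing the table to shrink it further.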
Machine Learning Engineer • Technical • medium
Explain the difference between Contrastive Loss and Triplet Loss. In what scenarios would you choose one over the other for training a retrieval model?
#Loss Functions
#Metric Learning
#Retrieval
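To make the comparison concrete, here is one common formulation of each loss for a single example, using Euclidean distance and a margin of 1.0 (both are assumptions; other distances and scaling constants appear in practice). Contrastive loss works on labeled pairs, while triplet loss works on an anchor, a positive, and a negative together.

```python
import math
from typing import Sequence


def euclidean(u: Sequence[float], v: Sequence[float]) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def contrastive_loss(u: Sequence[float], v: Sequence[float],
                     is_similar: bool, margin: float = 1.0) -> float:
    """Pairwise loss: pull similar pairs together, push dissimilar pairs
    apart until they are at least `margin` away."""
    d = euclidean(u, v)
    if is_similar:
        return d ** 2
    return max(0.0, margin - d) ** 2


def triplet_loss(anchor: Sequence[float], positive: Sequence[float],
                 negative: Sequence[float], margin: float = 1.0) -> float:
    """Relative loss: the positive must be closer to the anchor than the
    negative by at least `margin`."""
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)
```

The key difference visible in the code is that contrastive loss constrains absolute distances, while triplet loss constrains only the relative ordering. That makes triplet loss a natural fit when the training signal is a ranking, at the cost of needing a triplet mining strategy.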
Machine Learning Engineer • Technical • hard
You are training a large PyTorch model across multiple GPUs using DistributedDataParallel (DDP) and notice that the GPU utilization is consistently low (around 30%). How do you diagnose and fix this?
#PyTorch
#Distributed Training
#Performance Profiling
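A reasonable first diagnostic is to check whether the training loop is waiting on data rather than computing. The framework-agnostic sketch below times the two phases separately. The function name and the sync_fn hook are hypothetical. On GPU, kernels launch asynchronously, so a synchronization call such as torch.cuda.synchronize would need to be passed as sync_fn for the compute timing to be meaningful.

```python
import time
from typing import Any, Callable, Dict, Iterable, Optional


def profile_input_pipeline(
    batches: Iterable[Any],
    step_fn: Callable[[Any], None],
    max_steps: int = 50,
    sync_fn: Optional[Callable[[], None]] = None,
) -> Dict[str, float]:
    """Split wall-clock time between fetching batches and running steps."""
    data_s = 0.0
    compute_s = 0.0
    steps = 0
    it = iter(batches)
    while steps < max_steps:
        t0 = time.perf_counter()
        try:
            batch = next(it)  # time spent waiting on the data loader
        except StopIteration:
            break
        t1 = time.perf_counter()
        step_fn(batch)  # forward, backward, and optimizer step
        if sync_fn is not None:
            sync_fn()  # wait for asynchronous device work to finish
        t2 = time.perf_counter()
        data_s += t1 - t0
        compute_s += t2 - t1
        steps += 1
    total = data_s + compute_s
    return {
        "steps": float(steps),
        "data_s": data_s,
        "compute_s": compute_s,
        "data_fraction": data_s / total if total > 0 else 0.0,
    }
```

A high data_fraction points at the input pipeline (too few loader workers, slow decoding, slow storage). A low one shifts suspicion toward small batch sizes, gradient synchronization overhead, or a straggling rank, which a full profiler trace can confirm.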
Meet Your Interviewers
The "Standard" Interviewer
Senior Engineer
Focuses on core competencies, system constraints, and clear communication.
Unwritten Rules
Think Out Loud
Always explain your thought process before writing code or drawing architecture.