Twitter / X
Real-time social platform with petabyte-scale data and ML ranking systems.
4 Rounds
~14 Days
Hard
The Interview Loop
Recruiter Screen (30 min)
Standard fit check, behavioral questions, and resume overview.
Technical Loop (3-4 Rounds)
Deep dive into domain knowledge, coding, and system design.
Interview Question Bank
Data Scientist • System Design • hard
How would you build a machine learning model to detect spam or bot accounts in real-time as they register or tweet?
#Anomaly Detection
#Streaming
#Classification
Data Scientist • System Design • hard
Design a Graph ML system to power the 'Who to Follow' recommendations.
#Graph Neural Networks
#Link Prediction
#Scalability
Data Scientist • System Design • medium
How would you design a system to rank 'Trending Topics' in real-time?
#Ranking
#Time Decay
#NLP
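One possible starting point for the time-decay part of this question is an exponentially decayed mention count compared against a topic's usual volume. The sketch below is only an illustration: the function names, the one-hour half-life, and the `+ 1.0` smoothing term are assumptions, not a description of how X ranks trends.

```python
import math


def decayed_count(event_times, now, half_life_s=3600.0):
    """Sum of mentions, each weighted by 0.5 ** (age / half_life)."""
    return sum(0.5 ** ((now - t) / half_life_s) for t in event_times if t <= now)


def trend_score(event_times, baseline_rate_per_s, now, half_life_s=3600.0):
    """Decayed recent volume relative to the volume expected at the usual rate."""
    recent = decayed_count(event_times, now, half_life_s)
    # A steady stream at rate r has an expected decayed count of
    # r * half_life / ln(2), the integral of r * 0.5 ** (age / half_life).
    expected = baseline_rate_per_s * half_life_s / math.log(2)
    # The +1.0 is an assumed smoothing term so rare topics do not divide by ~0.
    return recent / (expected + 1.0)
```

Dividing by the expected volume is what separates "trending" from "always popular": the same burst of 100 mentions scores far higher for a topic that normally sees almost none.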
Data Scientist • System Design • hard
Design a recommendation system for the 'For You' timeline. How do you balance chronological relevance with algorithmic personalization?
#Recommender Systems
#Ranking
#Two-Tower Models
Data Scientist • Technical • medium
How would you build an NLP model to classify and hide highly toxic replies in a tweet thread?
#NLP
#Classification
#Trust & Safety
Data Scientist • Technical • medium
How would you predict user churn for X Premium subscribers? What features would be most important?
#Classification
#Survival Analysis
#Feature Engineering
Data Scientist • Technical • hard
How would you optimize the creator ad revenue sharing model to ensure fairness while maximizing overall platform content creation?
#Optimization
#Allocation
#Economics
Machine Learning Engineer • Technical • hard
Explain how you would implement distributed training for a multi-billion parameter language model (like Grok).
#LLMs
#Distributed Training
#Deep Learning
Machine Learning Engineer • Technical • medium
How do you address position bias in the Twitter feed ranking model?
#Bias Mitigation
#Ranking
#Data Science
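One common answer to this question is inverse propensity weighting under the examination hypothesis (a click happens only if the slot was examined and the item was relevant). The sketch below assumes the per-position examination probabilities are already known, for example from a small randomization experiment; the function name and the propensity values in the usage note are hypothetical.

```python
def ipw_relevance(impressions, propensity):
    """Estimate P(relevant) for one item from position-biased click logs.

    impressions: list of (position, clicked) pairs, clicked in {0, 1}.
    propensity:  dict mapping position -> assumed P(slot was examined).
    """
    # Each click is up-weighted by 1 / P(examined), so clicks earned in
    # rarely examined slots count for more than clicks in the top slot.
    weighted_clicks = sum(clicked / propensity[pos] for pos, clicked in impressions)
    return weighted_clicks / len(impressions)
```

For an item shown four times at position 3 with one click, the naive CTR is 0.25; with an assumed propensity of 0.5 for that slot, the estimate becomes 0.5. The same weights can be applied per example in the training loss instead of in an offline estimate.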
Machine Learning Engineer • Technical • hard
How would you use Reinforcement Learning to optimize long-term user engagement on the platform?
#Reinforcement Learning
#Recommendation Systems
#Optimization
Machine Learning Engineer • Technical • hard
Explain how you would train a Graph Neural Network (GNN) on the Twitter follower graph to generate user embeddings.
#Graph Neural Networks
#Embeddings
#Distributed Training
Machine Learning Engineer • Technical • medium
How do you handle severe class imbalance when training a spam detection model where spam is less than 0.1% of all tweets?
#Imbalanced Data
#Classification
#Loss Functions
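Two loss-level options often raised for this question are a class-weighted binary cross-entropy and focal loss. The sketch below shows both for a single example; the default `gamma` and `alpha` values and the clipping constant are illustrative assumptions, and resampling or threshold tuning are equally valid directions not shown here.

```python
import math

_EPS = 1e-12  # assumed clipping constant to keep log() finite


def weighted_bce(p, y, pos_weight):
    """Binary cross-entropy with the positive (spam) term scaled by pos_weight."""
    p = min(max(p, _EPS), 1.0 - _EPS)
    return -(pos_weight * y * math.log(p) + (1 - y) * math.log(1.0 - p))


def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Focal loss: down-weights examples the model already classifies well."""
    p = min(max(p, _EPS), 1.0 - _EPS)
    p_t = p if y == 1 else 1.0 - p          # probability assigned to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
```

With `gamma=0` and `alpha=1`, focal loss on a positive example reduces to plain cross-entropy, which makes it easy to sanity-check against an existing baseline.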
Machine Learning Engineer • Technical • hard
What techniques would you use to reduce the inference latency of a deep learning ranking model in production from 100ms to 20ms?
#Model Optimization
#Inference
#Efficiency
Machine Learning Engineer • Technical • medium
Explain the difference between offline evaluation (e.g., NDCG, MAP) and online evaluation (A/B testing) for the home timeline. Why might they disagree?
#Evaluation Metrics
#A/B Testing
#Data Science
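For the offline side of this question, it can help to have the NDCG definition at hand. The sketch below uses the exponential-gain form, `(2^rel - 1) / log2(rank + 1)`; the function names are illustrative, and some teams use linear gain instead.

```python
import math


def dcg_at_k(rels, k):
    """Discounted cumulative gain over the top-k graded relevance labels."""
    # enumerate() is 0-based, so rank = i + 1 and the discount is log2(rank + 1).
    return sum((2 ** r - 1) / math.log2(i + 2) for i, r in enumerate(rels[:k]))


def ndcg_at_k(rels, k):
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(rels, reverse=True), k)
    return dcg_at_k(rels, k) / ideal if ideal > 0 else 0.0
```

Note that the labels come from logged interactions under the old ranker, which is one reason an offline NDCG gain may not carry over to an A/B test.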
Machine Learning Engineer • Technical • medium
How do you handle the cold start problem for new users in the 'For You' feed?
#Cold Start
#Recommendation Systems
#Heuristics
Machine Learning Engineer • Technical • medium
Contrast Two-Tower models with Cross-Attention models. Why do we use Two-Tower for candidate generation and Cross-Attention for final ranking?
#Deep Learning
#Information Retrieval
#Model Architecture
Machine Learning Engineer • Technical • hard
How do you evaluate a Generative LLM used for summarizing long Twitter threads or generating Grok responses?
#LLMs
#Evaluation Metrics
#NLP
Machine Learning Engineer • Technical • hard
Explain the contrastive loss function used in training user-tweet embeddings. How do you select hard negatives?
#Loss Functions
#Representation Learning
#Embeddings
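A typical way to frame this answer is an in-batch softmax (InfoNCE-style) loss, where each user's engaged tweet is the positive and the other tweets in the batch act as negatives. The sketch below is a plain-Python illustration with assumed names and an assumed temperature of 0.1; it is not a claim about the loss used in production at X.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def info_nce(user_vecs, tweet_vecs, temperature=0.1):
    """Mean in-batch contrastive loss; tweet_vecs[i] is the positive for user_vecs[i]."""
    losses = []
    for i, u in enumerate(user_vecs):
        logits = [dot(u, t) / temperature for t in tweet_vecs]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        losses.append(log_z - logits[i])  # equals -log softmax(logits)[i]
    return sum(losses) / len(losses)


def hardest_negatives(user_vec, candidates, positive_idx, n):
    """Indices of the n non-positive candidates the model currently scores highest."""
    scored = [(dot(user_vec, t), j) for j, t in enumerate(candidates) if j != positive_idx]
    scored.sort(reverse=True)
    return [j for _, j in scored[:n]]
```

Taking the very top-scored candidates risks pulling in false negatives (tweets the user would have liked but never saw), so a common hedge is to sample from a band of ranks rather than the absolute hardest.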
Machine Learning Engineer • Technical • medium
How would you detect hate speech or toxic replies in real-time under strict latency constraints?
#NLP
#Classification
#Real-time ML
#Efficiency
Machine Learning Engineer • Technical • hard
What are the trade-offs between using FAISS (IVF-PQ) vs. HNSW for approximate nearest neighbor search in tweet retrieval?
#Vector Databases
#ANN
#Information Retrieval
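Memory is usually the first trade-off to quantify here. The sketch below is back-of-the-envelope accounting only: it assumes 8-byte vector ids for IVF-PQ, float32 vectors with up to `2 * M` 4-byte neighbor ids per node for HNSW, and it ignores coarse centroids, PQ codebooks, and HNSW upper layers. The function names and example parameters are hypothetical.

```python
def ivfpq_bytes_per_vector(m_subquantizers, nbits=8, id_bytes=8):
    """PQ code (m * nbits bits, rounded up to whole bytes) plus the stored id."""
    return (m_subquantizers * nbits + 7) // 8 + id_bytes


def hnsw_flat_bytes_per_vector(dim, m_links, float_bytes=4, link_bytes=4):
    """Full-precision vector plus base-layer links (up to 2 * M neighbors)."""
    return dim * float_bytes + 2 * m_links * link_bytes


if __name__ == "__main__":
    n = 1_000_000_000  # assumed corpus size for illustration
    print("IVF-PQ (m=16):    ", ivfpq_bytes_per_vector(16) * n / 1e9, "GB")
    print("HNSW (d=128, M=32):", hnsw_flat_bytes_per_vector(128, 32) * n / 1e9, "GB")
```

Under these assumptions the gap is roughly 24 bytes versus 768 bytes per vector, which is why compressed IVF-PQ tends to be discussed for very large corpora and HNSW for cases where recall and latency matter more than memory.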
Difficulty Radar
Based on recent AI-sourced data.
Meet Your Interviewers
The "Standard" Interviewer
Senior Engineer
Focuses on core competencies, system constraints, and clear communication.
Unwritten Rules
Think Out Loud
Always explain your thought process before writing code or drawing architecture.