Twitter / X
Real-time social platform with petabyte-scale data and ML ranking systems.
4 Rounds
~14 Days
Hard
The Interview Loop
Recruiter Screen (30 min)
Standard fit check, behavioral questions, and resume overview.
Technical Loop (3-4 Rounds)
Deep dive into domain knowledge, coding, and system design.
Interview Question Bank
Data Engineer • Behavioral • medium
Tell me about a time you had to ship a critical data pipeline under an extremely tight, almost impossible deadline.
#Time Management
#Prioritization
#Hardcore Work Ethic
Data Engineer • Behavioral • medium
Describe a situation where you identified a massive cost inefficiency in cloud infrastructure and the steps you took to fix it.
#Cost Optimization
#Proactivity
#Cloud Architecture
Data Engineer • Behavioral • medium
How do you handle working in an environment with high ambiguity, minimal documentation, and rapidly changing product requirements?
#Adaptability
#Ambiguity
#Communication
Data Engineer • Behavioral • medium
Tell me about a time you disagreed with a senior engineer about a technical architecture decision. How was it resolved?
#Conflict Resolution
#Technical Communication
#Ego
Data Engineer • Behavioral • medium
Walk me through a time when a data pipeline you owned failed in production, causing downstream impact. How did you debug and resolve it?
#Incident Management
#Debugging
#Accountability
Data Engineer • Coding • medium
Parse a log file of Twitter events to find the top 10 most active users in a given 1-hour window.
#Log Parsing
#Hash Maps
#Sorting
#Priority Queue
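One possible approach to the log-parsing question, as a minimal sketch. The log format is an assumption (the question does not specify one): each line is taken to be `ISO-8601 timestamp<TAB>user_id<TAB>event_type`, and the function name `top_active_users` is hypothetical.

```python
import heapq
from collections import Counter
from datetime import datetime, timedelta


def top_active_users(lines, window_start, k=10):
    """Return the k most active (user_id, count) pairs in [window_start, window_start + 1h)."""
    window_end = window_start + timedelta(hours=1)
    counts = Counter()
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 3:
            continue  # skip malformed lines instead of failing the whole scan
        ts_raw, user_id, _event = parts
        try:
            ts = datetime.fromisoformat(ts_raw)
        except ValueError:
            continue  # unparseable timestamp
        if window_start <= ts < window_end:
            counts[user_id] += 1
    # nlargest maintains a heap of size k: O(n log k) rather than a full sort.
    return heapq.nlargest(k, counts.items(), key=lambda item: (item[1], item[0]))
```

The hash map does the counting and the bounded heap does the ranking, which matches the tags on this question. For logs too large for one machine, the same count-then-rank shape would typically be applied per partition and merged.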
Data Engineer • Coding • medium
Implement a sliding window algorithm to count the number of tweets containing a specific hashtag in the last 5 minutes.
#Sliding Window
#Queues
#Real-time Processing
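A minimal sketch of the sliding-window counter, under two stated assumptions: timestamps are plain seconds that arrive in non-decreasing order, and a hashtag matches only as a whole whitespace-separated token. The class name `HashtagWindowCounter` is hypothetical.

```python
from collections import deque


class HashtagWindowCounter:
    """Counts tweets containing one hashtag within the last `window_seconds`."""

    def __init__(self, hashtag, window_seconds=300):
        self.hashtag = hashtag.lower()
        self.window_seconds = window_seconds
        self._timestamps = deque()  # timestamps of matching tweets, oldest first

    def _evict(self, now):
        # Drop entries that fell out of the window (now - window, now].
        cutoff = now - self.window_seconds
        while self._timestamps and self._timestamps[0] <= cutoff:
            self._timestamps.popleft()

    def record(self, text, timestamp):
        # Assumption: exact token match, case-insensitive.
        if self.hashtag in text.lower().split():
            self._timestamps.append(timestamp)
        self._evict(timestamp)

    def count(self, now):
        self._evict(now)
        return len(self._timestamps)
```

Each timestamp is appended once and popped once, so the amortized cost per tweet is O(1). If events can arrive out of order, the deque would need to be replaced with time buckets or a sorted structure.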
Data Engineer • Coding • hard
Given K sorted streams of tweet IDs (in chronological order), merge them into a single sorted stream.
#Heaps
#Pointers
#Stream Processing
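A sketch of the K-way merge using a min-heap that holds one head element per stream. It assumes tweet IDs are integers whose numeric order matches chronological order, and that no stream yields `None`. The standard library's `heapq.merge` does the same job; the manual version is shown because interviewers usually want the mechanics.

```python
import heapq


def merge_sorted_streams(streams):
    """Lazily merge K individually sorted iterables of tweet IDs."""
    iterators = [iter(s) for s in streams]
    heap = []
    for idx, it in enumerate(iterators):
        first = next(it, None)
        if first is not None:
            heap.append((first, idx))  # (tweet_id, index of source stream)
    heapq.heapify(heap)
    while heap:
        tweet_id, idx = heapq.heappop(heap)
        yield tweet_id
        # Refill from the stream we just consumed from.
        nxt = next(iterators[idx], None)
        if nxt is not None:
            heapq.heappush(heap, (nxt, idx))
```

The heap never holds more than K entries, so memory stays O(K) and each emitted ID costs O(log K), regardless of stream length.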
Data Engineer • Coding • hard
Find the shortest path between two users in the Twitter follower graph.
#Graphs
#BFS
#Bidirectional Search
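A baseline sketch using plain breadth-first search. The graph representation is an assumption: a dict mapping each user to the users they follow (directed edges). Bidirectional search, which the tags mention, would additionally need a reverse-edge index (followers of each user) and is not shown here.

```python
from collections import deque


def shortest_follow_path(follows, source, target):
    """Return the shortest list of users from source to target, or None if unreachable."""
    if source == target:
        return [source]
    parent = {source: None}  # doubles as the visited set
    queue = deque([source])
    while queue:
        user = queue.popleft()
        for followee in follows.get(user, ()):
            if followee in parent:
                continue
            parent[followee] = user
            if followee == target:
                # Walk parent pointers back to the source.
                path = [target]
                while parent[path[-1]] is not None:
                    path.append(parent[path[-1]])
                return path[::-1]
            queue.append(followee)
    return None
```

On a real follower graph the frontier grows very quickly, which is why searching from both ends and meeting in the middle is the usual follow-up discussion.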
Data Engineer • Coding • medium
Implement a rate limiter for the Twitter API using a token bucket algorithm.
#Concurrency
#System Design
#Object-Oriented Design
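A single-process sketch of a token bucket. The class name, the `clock` parameter (injected so the limiter can be tested without sleeping), and the `cost` parameter are all illustrative choices, not part of any real Twitter API.

```python
import threading
import time


class TokenBucket:
    """Allows bursts up to `capacity`, refilling at `refill_rate` tokens per second."""

    def __init__(self, capacity, refill_rate, clock=time.monotonic):
        self.capacity = float(capacity)
        self.refill_rate = float(refill_rate)
        self._clock = clock
        self._tokens = float(capacity)  # start with a full bucket
        self._last = clock()
        self._lock = threading.Lock()   # guards token state across threads

    def allow(self, cost=1.0):
        with self._lock:
            now = self._clock()
            elapsed = now - self._last
            self._last = now
            # Lazy refill: add tokens for the elapsed time, capped at capacity.
            self._tokens = min(self.capacity, self._tokens + elapsed * self.refill_rate)
            if self._tokens >= cost:
                self._tokens -= cost
                return True
            return False
```

Refilling lazily on each call avoids a background timer thread. A distributed limiter would typically keep one bucket per API key in a shared store, which raises atomicity questions that this sketch does not address.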
Data Engineer • Coding • easy
Given a list of trending search terms, group the anagrams together.
#Strings
#Hash Maps
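A short sketch of the usual hash-map approach: two terms are anagrams exactly when their sorted characters match, so the sorted string serves as the grouping key. Treating terms case-insensitively is an assumption.

```python
from collections import defaultdict


def group_anagrams(terms):
    """Group terms that are anagrams of each other, preserving first-seen order."""
    groups = defaultdict(list)
    for term in terms:
        key = "".join(sorted(term.lower()))  # canonical form shared by all anagrams
        groups[key].append(term)
    return list(groups.values())
```

Sorting each term costs O(L log L) for a term of length L. A character-count tuple is an alternative key that brings this down to O(L) for a fixed alphabet.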
Data Engineer • Coding • medium
Design an algorithm to find the Top K frequent words in a continuous stream of tweets (Heavy Hitters problem).
#Count-Min Sketch
#Heaps
#Streaming Algorithms
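A rough sketch of one way to combine a Count-Min Sketch with a small candidate set. The widths, depths, and names (`CountMinSketch`, `top_k_words`) are illustrative assumptions. Estimates from the sketch can overcount because of hash collisions but never undercount, so the result is approximate by design.

```python
import hashlib
import heapq


class CountMinSketch:
    """Fixed-memory frequency estimator: depth rows of width counters."""

    def __init__(self, width=2048, depth=4):
        self.width = width
        self.depth = depth  # must stay below 256 for the one-byte salt used here
        self.table = [[0] * width for _ in range(depth)]

    def _cells(self, item):
        for row in range(self.depth):
            # One independent hash per row, derived by salting with the row index.
            digest = hashlib.blake2b(
                item.encode("utf-8"), digest_size=8, salt=bytes([row])
            ).digest()
            yield row, int.from_bytes(digest, "big") % self.width

    def add(self, item):
        for row, col in self._cells(item):
            self.table[row][col] += 1

    def estimate(self, item):
        return min(self.table[row][col] for row, col in self._cells(item))


def top_k_words(stream, k, sketch=None):
    """Return approximately the k most frequent words as (word, estimate) pairs."""
    if sketch is None:
        sketch = CountMinSketch()
    candidates = {}
    for word in stream:
        sketch.add(word)
        candidates[word] = sketch.estimate(word)
        if len(candidates) > k:
            # Evict the weakest candidate; it can re-enter later because
            # the sketch keeps its running count.
            del candidates[min(candidates, key=candidates.get)]
    return heapq.nlargest(k, candidates.items(), key=lambda kv: kv[1])
```

The linear scan for the weakest candidate is O(k) per word. A production version would more likely maintain a min-heap with lazy updates, and would also decay or window the counts so that old trends age out.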
Data Engineer • Coding • medium
Implement an LRU Cache to store recently accessed user profiles.
#Linked Lists
#Hash Maps
#Caching
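A compact sketch built on `collections.OrderedDict`, which pairs a hash map with an ordering that can be updated in O(1). Interviewers may ask for the doubly linked list to be written by hand; this version shows the behavior to reproduce. Returning `None` on a miss is an assumption.

```python
from collections import OrderedDict


class LRUCache:
    """Keeps at most `capacity` user profiles, evicting the least recently used."""

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._items = OrderedDict()  # oldest entry first, newest last

    def get(self, user_id):
        if user_id not in self._items:
            return None
        self._items.move_to_end(user_id)  # mark as most recently used
        return self._items[user_id]

    def put(self, user_id, profile):
        if user_id in self._items:
            self._items.move_to_end(user_id)
        self._items[user_id] = profile
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the least recently used entry
```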
Data Engineer • Coding • medium
Implement a Trie data structure to support Twitter search autocomplete.
#Trees
#Trie
#String Manipulation
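A minimal Trie sketch with prefix lookup. Returning suggestions in lexicographic order with a `limit` is an assumption; a real autocomplete would more likely rank by popularity, which this example does not attempt.

```python
class TrieNode:
    def __init__(self):
        self.children = {}       # char -> TrieNode
        self.is_terminal = False  # True if a stored term ends at this node


class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, term):
        node = self.root
        for ch in term:
            if ch not in node.children:
                node.children[ch] = TrieNode()
            node = node.children[ch]
        node.is_terminal = True

    def autocomplete(self, prefix, limit=5):
        """Return up to `limit` stored terms starting with `prefix`, in sorted order."""
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []
        results = []
        stack = [(node, prefix)]
        while stack and len(results) < limit:
            current, text = stack.pop()
            if current.is_terminal:
                results.append(text)
            # Push children in reverse so the smallest character is popped first.
            for ch in sorted(current.children, reverse=True):
                stack.append((current.children[ch], text + ch))
        return results
```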
Data Engineer • Coding • easy
Write a function to validate a JSON payload representing a Tweet object, ensuring all required fields are present and correctly typed.
#JSON
#Type Checking
#Error Handling
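A small validation sketch. The required fields and their types in `REQUIRED_FIELDS` are hypothetical, since the question does not define the Tweet schema. The function collects every problem instead of stopping at the first, which tends to be more useful in pipeline logs.

```python
import json

# Hypothetical schema: field name -> expected Python type after JSON decoding.
REQUIRED_FIELDS = {
    "id": str,
    "author_id": str,
    "text": str,
    "created_at": str,
    "like_count": int,
}


def validate_tweet(payload):
    """Return a list of error strings for a JSON string; an empty list means valid."""
    try:
        tweet = json.loads(payload)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(tweet, dict):
        return ["top-level value must be an object"]
    errors = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in tweet:
            errors.append(f"missing field: {field}")
            continue
        value = tweet[field]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            errors.append(f"wrong type for {field}: expected {expected.__name__}")
    return errors
```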
Data Engineer • System Design • medium
Design a relational data model for Twitter Spaces analytics, tracking hosts, listeners, and duration.
#Entity-Relationship
#Normalization
#Fact/Dimension Tables
Data Engineer • System Design • hard
Design the data pipeline for Twitter's View Count feature, ensuring real-time updates and high throughput.
#Stream Processing
#Kafka
#Redis
#Event Sourcing
Data Engineer • System Design • hard
Design a real-time trending topics system capable of processing millions of tweets per second.
#Heavy Hitters
#Stream Processing
#Distributed Systems
Data Engineer • System Design • hard
How would you architect the migration of a massive on-premise Hadoop cluster to GCP BigQuery with zero downtime?
#Cloud Migration
#BigQuery
#Dual Writes
#Data Validation
Data Engineer • System Design • hard
Design an ad-click attribution pipeline that handles late-arriving events and ensures exactly-once processing.
#Exactly-Once Semantics
#Watermarks
#Data Pipelines
Data Engineer • System Design • hard
Architect a streaming system to detect spam and bot activity in real-time as tweets are published.
#Machine Learning Pipelines
#Real-time Streaming
#Feature Engineering
Data Engineer • System Design • hard
Design a data lake architecture for storing, partitioning, and querying 10PB of daily tweet logs efficiently.
#Data Lake
#Partitioning
#Parquet
#Iceberg/Hudi
Data Engineer • System Design • hard
How would you design the batch and streaming data pipelines to generate features for the 'For You' timeline recommendation engine?
#Feature Store
#Lambda Architecture
#Graph Processing
Data Engineer • Technical • medium
Write a SQL query to calculate the 7-day rolling average of tweets per user.
#Window Functions
#Aggregations
#Time Series
Data Engineer • Technical • medium
Write a SQL query to find the top 3 trending hashtags per country on a given day using window functions.
#Window Functions
#Ranking
#CTEs
Data Engineer • Technical • medium
Write a SQL query to find users who have retweeted a specific tweet but do not follow the original author.
#Joins
#Subqueries
#Set Operations
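One way to phrase this query, run against an in-memory SQLite database so the example is self-contained. The schema (`tweets`, `retweets`, `follows` and their columns) is hypothetical; the real tables at any company will differ. The anti-join is written with `NOT EXISTS`.

```python
import sqlite3

QUERY = """
SELECT DISTINCT r.user_id
FROM retweets AS r
JOIN tweets AS t ON t.tweet_id = r.tweet_id
WHERE r.tweet_id = ?
  AND NOT EXISTS (
      SELECT 1
      FROM follows AS f
      WHERE f.follower_id = r.user_id
        AND f.followee_id = t.author_id
  )
ORDER BY r.user_id
"""


def build_demo_db():
    """Create a tiny in-memory dataset using the assumed schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE tweets (tweet_id INTEGER PRIMARY KEY, author_id TEXT);
        CREATE TABLE retweets (user_id TEXT, tweet_id INTEGER);
        CREATE TABLE follows (follower_id TEXT, followee_id TEXT);
        INSERT INTO tweets VALUES (100, 'a1'), (200, 'a2');
        INSERT INTO retweets VALUES ('u1', 100), ('u2', 100), ('u3', 100), ('u2', 200);
        INSERT INTO follows VALUES ('u1', 'a1'), ('u2', 'a2');
    """)
    return conn


def retweeters_not_following(conn, tweet_id):
    return [row[0] for row in conn.execute(QUERY, (tweet_id,))]
```

A `LEFT JOIN ... WHERE f.follower_id IS NULL` or an `EXCEPT` between two sets would express the same result; which one performs best depends on the engine and is worth raising in the interview.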
Data Engineer • Technical • hard
Write a SQL query to calculate the conversion rate of ad impressions to clicks within a 1-hour window for each ad campaign.
#Time-based Joins
#Aggregations
#Performance Tuning
Data Engineer • Technical • easy
Write a SQL query to identify potential bots by finding users who tweeted more than 100 times in a single minute.
#GROUP BY
#HAVING
#Date Truncation
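A sketch of the truncate-group-filter pattern, run on SQLite for self-containment. The `tweets(user_id, created_at)` schema is an assumption, and `strftime` is SQLite's way of truncating to the minute; other engines would use something like `DATE_TRUNC` or `TIMESTAMP_TRUNC` instead.

```python
import sqlite3

QUERY = """
SELECT user_id,
       strftime('%Y-%m-%d %H:%M', created_at) AS minute_bucket,
       COUNT(*) AS tweet_count
FROM tweets
GROUP BY user_id, minute_bucket
HAVING COUNT(*) > ?
ORDER BY user_id, minute_bucket
"""


def build_demo_db():
    """One account posting 101 times in a single minute, one posting 3 times."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tweets (user_id TEXT, created_at TEXT)")
    bot_rows = [("bot_1", f"2024-01-01 12:00:{i % 60:02d}") for i in range(101)]
    human_rows = [("human_1", f"2024-01-01 12:00:{i:02d}") for i in range(3)]
    conn.executemany("INSERT INTO tweets VALUES (?, ?)", bot_rows + human_rows)
    return conn


def find_bots(conn, threshold=100):
    return conn.execute(QUERY, (threshold,)).fetchall()
```

Note that fixed calendar-minute buckets can miss a burst that straddles two minutes; a rolling 60-second window is a reasonable follow-up to mention.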
Data Engineer • Technical • medium
Write a SQL query to find the median number of followers for users who joined X in 2023.
#Percentiles
#Window Functions
#Statistics
Data Engineer • Technical • medium
Given a table of user follows, write a SQL query to find all mutuals (users who follow each other).
#Self Joins
#Filtering
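A self-join sketch for mutual follows, again on in-memory SQLite. The `follows(follower_id, followee_id)` schema is assumed. The `<` comparison keeps each pair once and also drops any self-follow rows.

```python
import sqlite3

QUERY = """
SELECT a.follower_id AS user_a, a.followee_id AS user_b
FROM follows AS a
JOIN follows AS b
  ON a.follower_id = b.followee_id
 AND a.followee_id = b.follower_id
WHERE a.follower_id < a.followee_id
ORDER BY user_a, user_b
"""


def build_demo_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE follows (follower_id INTEGER, followee_id INTEGER)")
    conn.executemany(
        "INSERT INTO follows VALUES (?, ?)",
        [(1, 2), (2, 1), (1, 3), (3, 4), (4, 3), (5, 5)],
    )
    return conn


def find_mutuals(conn):
    return conn.execute(QUERY).fetchall()
```

On a table with billions of edges the self-join is the expensive part, so it is worth mentioning that both sides should be partitioned or indexed on the join keys.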
Data Engineer • Technical • hard
Explain how you would optimize a PySpark job that is suffering from severe data skew due to a viral tweet from Elon Musk.
#Spark
#Data Skew
#Salting
#Broadcast Joins
Data Engineer • Technical • medium
How does Kafka handle message ordering, and how would you ensure ordered processing of a single user's tweets across partitions?
#Kafka
#Partitioning
#Message Ordering
Data Engineer • Technical • medium
Compare Apache Flink and Spark Streaming. Which would you choose for calculating real-time engagement metrics at X, and why?
#Flink
#Spark Streaming
#Micro-batching vs Native Streaming
Data Engineer • Technical • easy
Explain the differences between Parquet and Avro file formats. When would you use each in our data ecosystem?
#File Formats
#Parquet
#Avro
#Columnar vs Row-based
Data Engineer • Technical • hard
How would you handle exactly-once processing semantics in a Kafka to BigQuery streaming pipeline?
#Exactly-Once
#Kafka
#BigQuery
#Idempotency
Meet Your Interviewers
The "Standard" Interviewer
Senior Engineer
Focuses on core competencies, system constraints, and clear communication.
Unwritten Rules
Think Out Loud
Always explain your thought process before writing code or drawing architecture.