Uber
Ride-hailing and delivery platform with massive real-time data challenges.
4 Rounds
~21 Days
Hard
The Interview Loop
Recruiter Screen (30 min)
Standard fit check, behavioral questions, and resume overview.
Technical Loop (3-4 Rounds)
Deep dive into domain knowledge, coding, and system design.
Interview Question Bank
Data Engineer • Behavioral • medium
Tell me about a time you discovered a critical data quality issue in a production pipeline. How did you troubleshoot it, and what safeguards did you put in place to prevent recurrence?
#Data Quality
#Incident Management
#Ownership
Data Engineer • Behavioral • medium
Describe a time when you had to push back on a stakeholder or Product Manager regarding a data engineering requirement, such as an unrealistic deadline or a fundamentally flawed metric definition.
#Communication
#Stakeholder Management
#Prioritization
Data Engineer • Coding • medium
Write a SQL query to find the top 3 drivers by total earnings in each city over the last 30 days. Only include completed trips and account for potential ties in earnings.
#Window Functions
#Aggregations
#Date/Time Functions
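One possible answer, sketched in Python with SQLite standing in for the warehouse. The `trips` table and its columns (`driver_id`, `city`, `fare`, `status`, `completed_at`) are assumptions made for illustration, and the `as_of` parameter replaces "today" so the query can be tested deterministically. `DENSE_RANK` keeps tied drivers at the same rank, so a city can return more than three rows. Window functions need SQLite 3.25 or newer.

```python
import sqlite3

# Hypothetical schema: trips(driver_id, city, fare, status, completed_at)
TOP_DRIVERS_SQL = """
WITH earnings AS (
    SELECT city, driver_id, SUM(fare) AS total_earnings
    FROM trips
    WHERE status = 'completed'
      AND completed_at >= date(:as_of, '-30 days')
      AND completed_at <  date(:as_of, '+1 day')
    GROUP BY city, driver_id
),
ranked AS (
    SELECT city, driver_id, total_earnings,
           -- DENSE_RANK gives tied drivers the same rank without gaps
           DENSE_RANK() OVER (
               PARTITION BY city ORDER BY total_earnings DESC
           ) AS earnings_rank
    FROM earnings
)
SELECT city, driver_id, total_earnings, earnings_rank
FROM ranked
WHERE earnings_rank <= 3
ORDER BY city, earnings_rank, driver_id
"""


def top_drivers(conn, as_of):
    """Return (city, driver_id, total_earnings, rank) rows for the top 3 ranks per city."""
    return conn.execute(TOP_DRIVERS_SQL, {"as_of": as_of}).fetchall()
```

Whether ties should expand the result (`DENSE_RANK`), skip ranks (`RANK`), or be broken arbitrarily (`ROW_NUMBER`) is worth confirming with the interviewer before writing the query.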
Data Engineer • Coding • medium
Given a list of driver online sessions as [start_time, end_time] pairs, write a Python function to merge all overlapping sessions and return the total time the driver was online and available for dispatch.
#Arrays
#Sorting
#Intervals
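A minimal sketch of the usual sort-then-sweep approach. It assumes session times are plain numbers (for example epoch seconds) and treats sessions that touch at an endpoint as overlapping; both choices are assumptions to confirm in the interview.

```python
def merge_sessions(sessions):
    """Merge overlapping [start, end] sessions; returns a new sorted list."""
    merged = []
    for start, end in sorted(sessions):
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous session: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def total_online_time(sessions):
    """Total time covered by the sessions, counting overlaps only once."""
    return sum(end - start for start, end in merge_sessions(sessions))
```

For example, `merge_sessions([[1, 3], [2, 6], [8, 10]])` yields `[[1, 6], [8, 10]]`, so the total online time is 7. Sorting dominates the cost at O(n log n).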
Data Engineer • Coding • medium
Write a SQL query to calculate the rolling 7-day average of canceled rides per rider. The output should include the rider_id, date, and the 7-day moving average.
#Window Functions
#Moving Averages
#Time Series
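One way to approach this, sketched with SQLite through Python. The `rides` table and its columns (`rider_id`, `requested_at`, `status`) are assumed for illustration. The window is defined over calendar days with a `RANGE` frame, so days with no rides count as zero cancellations, and the sum is divided by 7 rather than by the number of rows present. That definition of "7-day average" is itself an assumption worth stating out loud. `RANGE` frames with numeric offsets need SQLite 3.28 or newer.

```python
import sqlite3

# Hypothetical schema: rides(rider_id, requested_at, status)
ROLLING_CANCEL_SQL = """
WITH daily AS (
    SELECT rider_id,
           date(requested_at) AS ride_date,
           SUM(status = 'canceled') AS canceled_rides
    FROM rides
    GROUP BY rider_id, date(requested_at)
)
SELECT rider_id,
       ride_date,
       -- Frame covers the current day and the 6 calendar days before it
       SUM(canceled_rides) OVER (
           PARTITION BY rider_id
           ORDER BY julianday(ride_date)
           RANGE BETWEEN 6 PRECEDING AND CURRENT ROW
       ) / 7.0 AS canceled_7d_avg
FROM daily
ORDER BY rider_id, ride_date
"""


def rolling_cancel_avg(conn):
    """Return (rider_id, ride_date, canceled_7d_avg) rows."""
    return conn.execute(ROLLING_CANCEL_SQL).fetchall()
```

A `ROWS BETWEEN 6 PRECEDING AND CURRENT ROW` frame is simpler but only correct when every rider has a row for every date.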
Data Engineer • Coding • easy
Write a Python script to parse a large, nested JSON log file containing raw GPS pings from driver apps, flatten the structure, and filter out pings with a horizontal accuracy worse than 50 meters.
#Python
#JSON Parsing
#Data Cleaning
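A minimal sketch, assuming the log is newline-delimited JSON (one ping per line) so it can be streamed without loading the whole file. The field name `location.horizontal_accuracy_m` is hypothetical; the real key would come from the log schema. Smaller accuracy values mean a tighter fix, so pings are kept when the value is at most 50.

```python
import json


def flatten(record, parent_key="", sep="."):
    """Flatten nested dicts into a single dict with dotted keys."""
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


def accurate_pings(lines,
                   accuracy_key="location.horizontal_accuracy_m",  # hypothetical field
                   max_accuracy_m=50.0):
    """Yield flattened pings whose horizontal accuracy is within the threshold."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed lines instead of aborting the run
        if not isinstance(record, dict):
            continue
        ping = flatten(record)
        accuracy = ping.get(accuracy_key)
        if isinstance(accuracy, (int, float)) and accuracy <= max_accuracy_m:
            yield ping
```

Because `accurate_pings` is a generator, passing it an open file handle (`with open(path) as f: ...`) keeps memory use flat regardless of file size.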
Data Engineer • Coding • medium
Write a SQL query to find riders who have taken both an UberX ride and placed an Uber Eats order within the exact same 24-hour window.
#Joins
#Date/Time Math
#Correlated Subqueries
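The phrase "same 24-hour window" is ambiguous, so the sketch below picks one reading and labels it: the ride and the order happened within 24 hours of each other. The tables `rides(rider_id, product, requested_at)` and `eats_orders(rider_id, ordered_at)` are assumed for illustration, with SQLite standing in for the warehouse.

```python
import sqlite3

# Hypothetical schemas: rides(rider_id, product, requested_at),
#                       eats_orders(rider_id, ordered_at)
RIDE_AND_EATS_SQL = """
SELECT DISTINCT r.rider_id
FROM rides AS r
WHERE r.product = 'UberX'
  AND EXISTS (
      SELECT 1
      FROM eats_orders AS e
      WHERE e.rider_id = r.rider_id
        -- julianday() differences are in days, so 1.0 means 24 hours
        AND ABS(julianday(e.ordered_at) - julianday(r.requested_at)) <= 1.0
  )
ORDER BY r.rider_id
"""


def riders_with_ride_and_order(conn):
    """Return rider_ids with an UberX ride and an Eats order within 24 hours."""
    return [row[0] for row in conn.execute(RIDE_AND_EATS_SQL)]
```

If the interviewer means the same calendar day instead, the correlated condition becomes an equality on `date(...)` of the two timestamps.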
Data Engineer • Coding • hard
Given a continuous stream of trip distances, design a data structure and write the Python code to return the median trip distance at any given time efficiently.
#Heaps
#Data Streams
#Design
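A common answer is the two-heap structure: a max-heap for the smaller half of the values and a min-heap for the larger half. The sketch below is one way to write it; the class and method names are illustrative.

```python
import heapq


class RunningMedian:
    """Median of a stream: O(log n) per insert, O(1) per median query."""

    def __init__(self):
        self._low = []   # max-heap of the smaller half (values stored negated)
        self._high = []  # min-heap of the larger half

    def add(self, distance):
        # Route the value through the low heap so the largest of the small
        # half always moves up, then rebalance so low is never smaller.
        heapq.heappush(self._low, -distance)
        heapq.heappush(self._high, -heapq.heappop(self._low))
        if len(self._high) > len(self._low):
            heapq.heappush(self._low, -heapq.heappop(self._high))

    def median(self):
        if not self._low:
            raise ValueError("no trip distances recorded yet")
        if len(self._low) > len(self._high):
            return float(-self._low[0])
        return (-self._low[0] + self._high[0]) / 2.0
```

This keeps every value in memory. For an unbounded stream, an approximate quantile sketch is the usual trade-off to raise with the interviewer.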
Data Engineer • System Design • hard
Design a real-time data pipeline to calculate surge pricing multipliers. The system needs to aggregate ride requests and available drivers per H3 hexagon (resolution 9) every 30 seconds.
#Stream Processing
#Kafka
#Flink
#Geospatial Data
#State Management
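The core of the aggregation is a tumbling 30-second window keyed by hexagon. The plain-Python sketch below shows only that windowing logic under simplifying assumptions: events arrive as `(timestamp_seconds, hex_id, kind, entity_id)` tuples, the H3 cell ID is already attached, and the event kinds `"request"` and `"driver_available"` are hypothetical names. A production design would express the same keyed window in a stream processor such as Flink, with watermarks for late events and checkpointed state.

```python
from collections import defaultdict

WINDOW_SECONDS = 30


def window_start(timestamp_s):
    """Align a timestamp to the start of its 30-second tumbling window."""
    return int(timestamp_s // WINDOW_SECONDS) * WINDOW_SECONDS


def aggregate(events):
    """Count ride requests and distinct available drivers per (window, hex)."""
    requests = defaultdict(int)
    drivers = defaultdict(set)
    for timestamp_s, hex_id, kind, entity_id in events:
        key = (window_start(timestamp_s), hex_id)
        if kind == "request":
            requests[key] += 1
        elif kind == "driver_available":
            # A driver may ping several times per window; count them once.
            drivers[key].add(entity_id)
    return {
        key: {"requests": requests[key], "drivers": len(drivers[key])}
        for key in set(requests) | set(drivers)
    }
```

How the two counts turn into a multiplier is a pricing decision outside this sketch.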
Data Engineer • System Design • hard
Design the data architecture and ETL pipelines to generate the daily payout reports for Uber Eats restaurants. Consider that restaurants can have different timezone cutoffs and complex, tiered commission structures.
#ETL/ELT
#Batch Processing
#Airflow
#Data Warehousing
Data Engineer • System Design • hard
Design a system to ingest and process telemetry data (GPS, speed, heading) from millions of active drivers in real-time to power the live map view and feed ETA machine learning models.
#High Throughput Ingestion
#Kafka
#Data Partitioning
#NoSQL
Data Engineer • Technical • hard
How would you resolve a severe data skew issue in an Apache Spark job where joining a massive 'trips' table with a 'cities' table causes OutOfMemory errors specifically for the partition handling New York City data?
#Apache Spark
#Performance Tuning
#Data Skew
#Distributed Computing
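Because `cities` is small, a broadcast join is usually the first thing to try, since it avoids shuffling `trips` by city at all. When that is not an option, key salting is the standard fallback. The plain-Python sketch below illustrates only the salting idea, not Spark itself: the partition count, salt count, and CRC32 partitioner are assumptions chosen to make the effect visible.

```python
import zlib
from collections import Counter

NUM_PARTITIONS = 8  # assumed shuffle partition count
NUM_SALTS = 8       # assumed number of salt buckets for the hot key


def partition_for(key):
    """Stand-in for a hash partitioner."""
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS


def partition_sizes(keys):
    """Rows per partition for a list of join keys."""
    return Counter(partition_for(k) for k in keys)


def salt_trips(trips):
    """Append a salt to each trip's city key; trips are (trip_id, city)."""
    return [(f"{city}#{trip_id % NUM_SALTS}", trip_id) for trip_id, city in trips]


def explode_cities(cities):
    """Replicate each city row once per salt so every salted key finds a match."""
    return {f"{city}#{salt}": attrs
            for city, attrs in cities.items()
            for salt in range(NUM_SALTS)}


def salted_join(trips, cities):
    """Join salted trips to the exploded city table."""
    lookup = explode_cities(cities)
    return [(trip_id, lookup[key]) for key, trip_id in salt_trips(trips) if key in lookup]
```

Unsalted, every `nyc` row hashes to one partition. Salted, the same rows spread across several partitions while the join result is unchanged. The cost is that the small table grows by a factor of `NUM_SALTS`.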
Data Engineer • Technical • medium
Design a dimensional data model (Star Schema) to track Uber Eats order lifecycle events (created, accepted by restaurant, picked up, delivered, canceled) to support historical bottleneck and delivery time analysis.
#Dimensional Modeling
#Fact Tables
#Slowly Changing Dimensions
#Event Logging
Data Engineer • Technical • hard
Explain how you would achieve exactly-once processing semantics in a financial data pipeline moving Uber driver payout events from Kafka to a data warehouse using Spark Streaming or Flink.
#Kafka
#Exactly-Once Semantics
#Idempotency
#Two-Phase Commit
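End-to-end exactly-once usually comes down to replayable input plus a sink that is either transactional or idempotent. The sketch below shows the sink side only, with SQLite standing in for the warehouse: payouts are keyed by a unique `event_id`, and the consumed offset is stored in the same transaction as the data. Table and column names are assumptions. In a real pipeline the engine's checkpointing (and, for Flink, a two-phase-commit sink) would play the role of the offset table.

```python
import sqlite3


def init_sink(conn):
    """Create the hypothetical payout and offset tables."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS payouts ("
        "event_id TEXT PRIMARY KEY, driver_id TEXT, amount_cents INTEGER)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS offsets ("
        "partition_id INTEGER PRIMARY KEY, next_offset INTEGER)"
    )


def write_batch(conn, partition_id, batch):
    """Write (offset, event_id, driver_id, amount_cents) rows atomically.

    Replaying the same batch is harmless: duplicate event_ids are ignored,
    and the offset only advances if the data write commits with it.
    """
    if not batch:
        return
    with conn:  # one transaction: commit on success, roll back on error
        conn.executemany(
            "INSERT OR IGNORE INTO payouts VALUES (?, ?, ?)",
            [(event_id, driver_id, amount) for _, event_id, driver_id, amount in batch],
        )
        conn.execute(
            "INSERT OR REPLACE INTO offsets VALUES (?, ?)",
            (partition_id, batch[-1][0] + 1),
        )
```

Storing amounts as integer cents avoids floating-point drift in financial totals.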
Data Engineer • Technical • medium
How do you ensure your Airflow DAGs are idempotent? Why is this specifically critical when backfilling historical trip data after a logic change in the fare calculation model?
#Apache Airflow
#Idempotency
#Backfilling
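One common pattern is to make each task own exactly one date partition and overwrite it on every run. The sketch below shows that delete-then-insert step with SQLite standing in for the warehouse; the `daily_fares` table is hypothetical, and `ds` plays the role of the logical date that Airflow passes to a task (the `{{ ds }}` template variable). No Airflow code is shown here, only the write that the task would perform.

```python
import sqlite3


def load_fares_for_date(conn, ds, fares):
    """Replace the partition for logical date `ds` with freshly computed fares.

    `fares` is a list of (trip_id, fare_cents). Running this twice for the
    same `ds` leaves exactly one copy of the data, so reruns and backfills
    never double-count.
    """
    with conn:  # delete and insert commit together or not at all
        conn.execute("DELETE FROM daily_fares WHERE trip_date = ?", (ds,))
        conn.executemany(
            "INSERT INTO daily_fares (trip_date, trip_id, fare_cents) VALUES (?, ?, ?)",
            [(ds, trip_id, fare_cents) for trip_id, fare_cents in fares],
        )
```

With an append-only load, backfilling after a fare-logic change would leave old and new fares side by side for every historical date. The overwrite keeps only the recomputed values.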
Meet Your Interviewers
The "Standard" Interviewer
Senior Engineer
Focuses on core competencies, system constraints, and clear communication.
Unwritten Rules
Think Out Loud
Always explain your thought process before writing code or drawing architecture.