Meta
Social media and metaverse company behind Facebook, Instagram, and WhatsApp.
4 Rounds
~21 Days
Very Hard
The Interview Loop
Recruiter Screen (30 min)
Standard fit check, behavioral questions, and resume overview.
Technical Loop (3-4 Rounds)
Deep dive into domain knowledge, coding, and system design.
Interview Question Bank
Data Engineer
•
Behavioral
•
medium
Tell me about a time you simplified a complex data platform decision across multiple teams.
#Communication
#Stakeholders
Data Engineer
•
Behavioral
•
medium
Describe a situation where a data pipeline you owned went down in production. How did you handle it?
#On-Call
#Problem Solving
Data Engineer
•
Behavioral
•
medium
How do you handle disagreements with data analysts or scientists who want features that compromise pipeline reliability?
#Conflict Resolution
Data Engineer
•
Behavioral
•
medium
Tell me about a time you significantly improved the performance of a data system.
#Performance
#Optimization
Data Engineer
•
Behavioral
•
hard
Describe how you've balanced technical debt vs. new feature development in a data platform.
#Prioritization
Data Engineer
•
Behavioral
•
medium
Tell me about a time you onboarded a new data source that had significant quality issues.
#Problem Solving
Data Engineer
•
Behavioral
•
easy
Describe your experience mentoring junior data engineers.
#Mentoring
#Collaboration
Data Engineer
•
Behavioral
•
easy
How do you stay current with rapidly evolving data engineering tools and practices?
#Growth Mindset
Data Engineer
•
Behavioral
•
medium
Tell me about a time you had a disagreement with a cross-functional partner (like a Data Scientist or Product Manager) regarding the definition of a metric or a data pipeline requirement. How did you resolve it?
#Conflict Resolution
#Communication
#Cross-functional Collaboration
Data Engineer
•
Behavioral
•
medium
Tell me about a time you identified a major bottleneck or inefficiency in an existing data pipeline. What steps did you take to optimize it, and what was the impact?
#Impact
#Proactivity
#Optimization
Data Engineer
•
Coding
•
medium
Write a SQL query to find the second highest salary per department.
#Window Functions
#SQL
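One possible approach to the question above, sketched in Python with the standard-library `sqlite3` module so the SQL can be run locally. The `employees` table, its columns, and the sample rows are assumptions invented for illustration. `DENSE_RANK()` is used so that ties for the top salary do not hide the second distinct value; departments with only one distinct salary return no row.

```python
import sqlite3

# Hypothetical sample rows: (department, name, salary).
ROWS = [
    ("eng", "ana", 150), ("eng", "bo", 180), ("eng", "cy", 180), ("eng", "di", 120),
    ("ops", "ed", 90), ("ops", "flo", 110),
    ("hr", "gil", 70),
]

SECOND_HIGHEST_QUERY = """
SELECT department, salary
FROM (
    SELECT DISTINCT department, salary,
           DENSE_RANK() OVER (
               PARTITION BY department ORDER BY salary DESC
           ) AS rnk
    FROM employees
)
WHERE rnk = 2
ORDER BY department
"""


def second_highest_salaries(rows):
    """Load rows into an in-memory table and return (department, salary) pairs."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employees (department TEXT, name TEXT, salary INTEGER)")
    conn.executemany("INSERT INTO employees VALUES (?, ?, ?)", rows)
    result = conn.execute(SECOND_HIGHEST_QUERY).fetchall()
    conn.close()
    return result
```

Whether ties should count once (`DENSE_RANK`) or consume positions (`RANK`, `ROW_NUMBER`) is worth clarifying with the interviewer before writing the query.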
Data Engineer
•
Coding
•
medium
Write a SQL query to compute a 7-day rolling average of daily sales.
#Window Functions
#Analytics
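A minimal sketch of a window-frame solution, again run through `sqlite3`. The `daily_sales` table and its `sale_date` and `amount` columns are assumed names, and the sketch assumes exactly one row per calendar day with no gaps. With missing days, a row-based frame would span more than seven calendar days, so a date spine or a range-based frame would be needed instead.

```python
import sqlite3

ROLLING_AVG_QUERY = """
SELECT sale_date,
       AVG(amount) OVER (
           ORDER BY sale_date
           ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
       ) AS avg_7d
FROM daily_sales
ORDER BY sale_date
"""


def rolling_average(rows):
    """rows: iterable of (ISO date string, amount). Returns (date, 7-row average)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE daily_sales (sale_date TEXT, amount REAL)")
    conn.executemany("INSERT INTO daily_sales VALUES (?, ?)", rows)
    result = conn.execute(ROLLING_AVG_QUERY).fetchall()
    conn.close()
    return result
```

Note that the first six rows are averaged over fewer than seven days; some interviewers expect those rows to be filtered out or flagged.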
Data Engineer
•
Coding
•
medium
Given a `user_logins` table with `user_id` and `login_date`, write a SQL query to calculate the 7-day rolling average of Daily Active Users (DAU) for the last 30 days.
#Window Functions
#Rolling Averages
#DAU
Data Engineer
•
Coding
•
easy
Given a `friend_requests` table (sender_id, receiver_id, date, status) and an `acceptances` table, write a SQL query to find the overall acceptance rate of friend requests by date.
#Joins
#Aggregations
#Ratios
Data Engineer
•
Coding
•
medium
Write a Python function to merge overlapping user session intervals. Given an array of intervals where intervals[i] = [start_i, end_i], merge all overlapping intervals and return an array of the non-overlapping intervals.
#Arrays
#Sorting
#Intervals
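A common way to approach this is to sort by start time and extend the last merged interval whenever the next one overlaps it. The sketch below assumes intervals that touch at an endpoint (for example `[1, 4]` and `[4, 5]`) count as overlapping, which is worth confirming with the interviewer.

```python
def merge_intervals(intervals):
    """Merge overlapping [start, end] intervals; returns a new list of lists."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous interval: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged
```

Sorting dominates the cost, so the function runs in O(n log n) time and uses O(n) extra space for the output.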
Data Engineer
•
Coding
•
medium
Given a list of dictionaries representing Facebook post interactions (user_id, post_id, interaction_type, timestamp), write a Python script to return the top 3 most engaged posts for each interaction type.
#Dictionaries
#Heaps
#Data Aggregation
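One way to sketch this, assuming "most engaged" means the highest number of interaction events of that type (the question does not define it). Counts are accumulated per interaction type, then `heapq.nsmallest` selects the top `k` without fully sorting each group. Breaking ties by `post_id` is an added assumption to keep the output deterministic.

```python
import heapq
from collections import Counter, defaultdict


def top_posts_by_type(interactions, k=3):
    """Return {interaction_type: [(post_id, count), ...]} with up to k posts each."""
    counts = defaultdict(Counter)
    for event in interactions:
        counts[event["interaction_type"]][event["post_id"]] += 1
    return {
        itype: heapq.nsmallest(k, counter.items(), key=lambda kv: (-kv[1], kv[0]))
        for itype, counter in counts.items()
    }
```

For example, a list with three `like` events on post `p1` and two on `p2` yields `{"like": [("p1", 3), ("p2", 2)]}`.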
Data Engineer
•
Coding
•
medium
Write a SQL query to find users who have interacted with a Meta ad and subsequently made a purchase on the advertiser's website within 24 hours. You have an `ad_clicks` table and a `conversions` table.
#Joins
#Date/Time Functions
#Attribution
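A hedged sketch of the join, run through `sqlite3`. The question names only the two tables, so the columns `user_id`, `click_time`, and `purchase_time` are assumptions, as is storing timestamps as `YYYY-MM-DD HH:MM:SS` text (which lets string comparison match chronological order). The 24-hour window is treated as inclusive of the upper bound.

```python
import sqlite3

ATTRIBUTION_QUERY = """
SELECT DISTINCT c.user_id
FROM ad_clicks AS c
JOIN conversions AS v
  ON v.user_id = c.user_id
 AND v.purchase_time > c.click_time
 AND v.purchase_time <= datetime(c.click_time, '+24 hours')
ORDER BY c.user_id
"""


def attributed_users(clicks, purchases):
    """clicks / purchases: iterables of (user_id, timestamp text). Returns user ids."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ad_clicks (user_id INTEGER, click_time TEXT)")
    conn.execute("CREATE TABLE conversions (user_id INTEGER, purchase_time TEXT)")
    conn.executemany("INSERT INTO ad_clicks VALUES (?, ?)", clicks)
    conn.executemany("INSERT INTO conversions VALUES (?, ?)", purchases)
    ids = [row[0] for row in conn.execute(ATTRIBUTION_QUERY)]
    conn.close()
    return ids
```

`DISTINCT` matters here because a user with several clicks or purchases inside the window would otherwise appear once per matching pair.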
Data Engineer
•
Coding
•
easy
Given an array of integers representing the number of likes on a user's posts, write a Python function to move all zeros (posts with zero likes) to the end of the array while maintaining the relative order of the non-zero elements. Do this in-place.
#Arrays
#Two Pointers
#In-place Manipulation
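A minimal two-pointer sketch: `write` marks the next slot for a non-zero value while `read` scans the list. Swapping rather than overwriting keeps the operation in-place and preserves the relative order of the non-zero elements.

```python
def move_zeros(likes):
    """Move all zeros to the end of `likes` in place; returns the same list."""
    write = 0
    for read in range(len(likes)):
        if likes[read] != 0:
            likes[write], likes[read] = likes[read], likes[write]
            write += 1
    return likes
```

This takes a single pass (O(n) time) and O(1) extra space.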
Data Engineer
•
Coding
•
hard
Write a SQL query to calculate the retention rate of new users on a 1-day, 7-day, and 30-day basis. You are given a `user_activity` table with `user_id` and `activity_date`.
#Cohort Analysis
#Retention
#Self Joins
#Conditional Aggregation
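One possible reading of this question, sketched with `sqlite3`: a user's first `activity_date` is treated as the signup date, and a user counts as retained on day N if they have activity exactly N days later. Both definitions are assumptions; interviewers may instead want "active on or after day N" or per-cohort rates. The sketch also ignores whether each user has existed long enough to reach day 30.

```python
import sqlite3

RETENTION_QUERY = """
WITH first_seen AS (
    SELECT user_id, MIN(activity_date) AS signup_date
    FROM user_activity
    GROUP BY user_id
)
SELECT
    COUNT(DISTINCT f.user_id) AS new_users,
    COUNT(DISTINCT CASE WHEN a.activity_date = date(f.signup_date, '+1 day')
                        THEN f.user_id END) * 1.0 / COUNT(DISTINCT f.user_id) AS d1,
    COUNT(DISTINCT CASE WHEN a.activity_date = date(f.signup_date, '+7 days')
                        THEN f.user_id END) * 1.0 / COUNT(DISTINCT f.user_id) AS d7,
    COUNT(DISTINCT CASE WHEN a.activity_date = date(f.signup_date, '+30 days')
                        THEN f.user_id END) * 1.0 / COUNT(DISTINCT f.user_id) AS d30
FROM first_seen AS f
JOIN user_activity AS a ON a.user_id = f.user_id
"""


def retention_rates(rows):
    """rows: iterable of (user_id, ISO date). Returns (new_users, d1, d7, d30)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE user_activity (user_id INTEGER, activity_date TEXT)")
    conn.executemany("INSERT INTO user_activity VALUES (?, ?)", rows)
    result = conn.execute(RETENTION_QUERY).fetchone()
    conn.close()
    return result
```

The conditional `COUNT(DISTINCT CASE ...)` pattern computes all three rates in one pass over the joined rows instead of three separate self joins.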
Data Engineer
•
Coding
•
medium
Write a Python script to parse a massive JSONL file (100GB+) containing WhatsApp message metadata. Calculate the total number of messages sent per country code. You cannot load the entire file into memory.
#File I/O
#Memory Management
#Generators
#JSON
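A sketch of the streaming approach: iterate line by line so that only one record is in memory at a time, and keep just the running counts. The field name `country_code` is an assumption, since the question does not specify the record schema. Malformed lines are counted rather than raising, which is one of several reasonable policies.

```python
import json
from collections import Counter


def count_messages_by_country(lines, field="country_code"):
    """Count records per country code from an iterable of JSONL lines.

    Returns (counts, bad_lines). `lines` can be an open file object, which
    Python reads lazily one line at a time.
    """
    counts = Counter()
    bad_lines = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            bad_lines += 1
            continue
        if not isinstance(record, dict):
            bad_lines += 1
            continue
        counts[record.get(field, "<missing>")] += 1
    return counts, bad_lines
```

In practice this would be called as `with open(path, encoding="utf-8") as fh: counts, bad = count_messages_by_country(fh)`, where `path` is a placeholder. Memory use is bounded by the number of distinct country codes, not by file size.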
Data Engineer
•
System Design
•
hard
Design an ETL pipeline that ingests 10TB of raw clickstream data daily.
#ETL
#Batch Processing
Data Engineer
•
System Design
•
hard
How would you design a data pipeline that needs exactly-once delivery guarantees?
#Exactly-Once
#Kafka
Data Engineer
•
System Design
•
hard
How would you design a real-time anomaly detection pipeline for 100K events/sec?
#Real-Time
#Anomaly Detection
Data Engineer
•
System Design
•
hard
Design a data model for an e-commerce platform tracking orders, users, and products.
#ER Modeling
#Dimensional Modeling
Data Engineer
•
System Design
•
hard
How would you design a data warehouse for a ride-sharing company from scratch?
#Architecture
#Design
Data Engineer
•
System Design
•
hard
How would you design Meta's data pipeline for News Feed ranking signals?
#Ranking
#Pipeline
Data Engineer
•
System Design
•
hard
Design an ad delivery data pipeline that tracks impressions at 10M/sec.
#Streaming
#Scale
Data Engineer
•
System Design
•
hard
Design a data pipeline to process and store telemetry data for Instagram Reels. The pipeline needs to support real-time dashboarding for creators and batch processing for machine learning recommendations.
#Lambda Architecture
#Kafka
#Stream Processing
#Data Warehousing
Data Engineer
•
System Design
•
hard
Design a system to detect ad-click fraud in real-time. The system processes billions of events per day and needs to flag suspicious IPs or user accounts within seconds.
#Real-time Processing
#Fraud Detection
#Distributed Systems
#Caching
Data Engineer
•
Technical
•
medium
Describe how you'd implement circuit breakers in a data pipeline.
#Circuit Breakers
#Fault Tolerance
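As a talking point for this question, here is a minimal circuit-breaker sketch. The class name, thresholds, and the injectable `clock` parameter are all illustrative choices, not a reference to any particular library. After `failure_threshold` consecutive failures the breaker opens and rejects calls; once `reset_timeout` seconds pass it allows one trial call (the half-open state) and closes again only if that call succeeds.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit open; skipping call")
            # Timeout elapsed: fall through and allow one trial call (half-open).
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            # Re-open on a failed trial call, or open once the threshold is hit.
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        # Success closes the circuit and resets the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

In a pipeline, `func` might be a write to a downstream sink; while the breaker is open, the caller could buffer records or route them to a dead-letter location instead of retrying against a failing dependency.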
Data Engineer
•
Technical
•
medium
Explain the difference between OLAP and OLTP systems. When would you use each?
#OLAP
#OLTP
#Databases
Data Engineer
•
Technical
•
hard
What is a slowly changing dimension (SCD)? Describe SCD Type 1, 2, and 3 with examples.
#SCD
#Dimensional Modeling
Data Engineer
•
Technical
•
hard
How would you optimize a SQL query that is running slowly on a 1 billion row table?
#Query Optimization
#Indexing
Data Engineer
•
Technical
•
medium
Explain the difference between RANK(), DENSE_RANK(), and ROW_NUMBER().
#Window Functions
#SQL
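The difference is easiest to see on data with ties. The sketch below runs all three functions over a hypothetical single-column table `t` using `sqlite3`; the table and column names are invented for illustration.

```python
import sqlite3

RANK_QUERY = """
SELECT score,
       ROW_NUMBER() OVER (ORDER BY score DESC) AS row_num,
       RANK()       OVER (ORDER BY score DESC) AS rnk,
       DENSE_RANK() OVER (ORDER BY score DESC) AS dense_rnk
FROM t
ORDER BY score DESC, row_num
"""


def rank_comparison(scores):
    """Return (score, row_number, rank, dense_rank) tuples for the given scores."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (score INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?)", [(s,) for s in scores])
    rows = conn.execute(RANK_QUERY).fetchall()
    conn.close()
    return rows
```

For scores `[100, 90, 90, 80]`, the tied 90s both get `RANK` 2 and `DENSE_RANK` 2 but distinct `ROW_NUMBER` values 2 and 3; the 80 then gets `RANK` 4 (a gap) and `DENSE_RANK` 3 (no gap).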
Data Engineer
•
Technical
•
medium
What is a materialized view? How does it differ from a regular view?
#Materialized Views
#Performance
Data Engineer
•
Technical
•
hard
Describe partitioning strategies in a data warehouse. When would you use range vs hash partitioning?
#Partitioning
#Performance
Data Engineer
•
Technical
•
medium
What are CTEs (Common Table Expressions) and how do they differ from subqueries?
#CTEs
#SQL
Data Engineer
•
Technical
•
medium
Explain ACID properties. Which databases sacrifice ACID for performance and why?
#ACID
#Distributed Systems
Data Engineer
•
Technical
•
hard
How do you handle late-arriving data in a streaming pipeline?
#Kafka
#Watermarks
Data Engineer
•
Technical
•
medium
What is idempotency and why is it critical in data pipelines?
#Idempotency
#Data Quality
Data Engineer
•
Technical
•
hard
Explain the Lambda architecture. What are its tradeoffs vs Kappa architecture?
#Lambda
#Kappa
#Streaming
Data Engineer
•
Technical
•
hard
What is backfilling? How do you handle a backfill of 2 years of historical data without impacting production?
#Backfill
#Airflow
Data Engineer
•
Technical
•
medium
How do you monitor data pipeline health in production? What metrics do you track?
#Monitoring
#Alerting
Data Engineer
•
Technical
•
medium
What is Apache Airflow? How does it differ from Prefect or Dagster?
#Airflow
#Prefect
#Dagster
Data Engineer
•
Technical
•
easy
Explain the difference between push-based and pull-based data ingestion.
#Push
#Pull
#CDC
Data Engineer
•
Technical
•
hard
Explain how Apache Spark's execution model works. What is a DAG in Spark?
#Spark
#DAG
#Distributed Computing
Data Engineer
•
Technical
•
hard
What is data skew in Spark? How do you diagnose and fix it?
#Data Skew
#Performance
Data Engineer
•
Technical
•
hard
Explain the difference between map-side and reduce-side joins in MapReduce/Spark.
#Joins
#MapReduce
Data Engineer
•
Technical
•
medium
What is Apache Kafka? Explain topics, partitions, consumer groups, and offsets.
#Kafka
#Streaming
Data Engineer
•
Technical
•
medium
How does Kafka handle message ordering guarantees?
#Ordering
#Partitions
Data Engineer
•
Technical
•
medium
What is the CAP theorem? Give an example of a real-world system tradeoff.
#CAP
#Consistency
#Availability
Data Engineer
•
Technical
•
medium
Explain how Parquet and ORC file formats work and when you'd use each.
#Parquet
#ORC
#Columnar
Data Engineer
•
Technical
•
hard
What is Delta Lake? How does it provide ACID transactions on data lakes?
#Delta Lake
#ACID
#Time Travel
Data Engineer
•
Technical
•
medium
Explain compaction in Delta Lake / Iceberg. Why is it important?
#Compaction
#Performance
Data Engineer
•
Technical
•
medium
What is the difference between a star schema and a snowflake schema? When would you use each?
#Star Schema
#Snowflake Schema
Data Engineer
•
Technical
•
hard
What is Data Vault methodology? How does it differ from Kimball?
#Data Vault
#Kimball
Data Engineer
•
Technical
•
medium
Explain the concept of a data lakehouse. What are its advantages over a traditional data warehouse?
#Data Lakehouse
#Data Warehouse
Data Engineer
•
Technical
•
hard
How do you handle schema evolution in a data pipeline without breaking downstream consumers?
#Schema Evolution
#Backward Compatibility
Data Engineer
•
Technical
•
medium
What is a medallion architecture (Bronze/Silver/Gold)?
#Medallion
#Data Lake
Data Engineer
•
Technical
•
medium
How do you implement data quality checks in a production pipeline?
#Great Expectations
#Data Validation
Data Engineer
•
Technical
•
medium
What is data lineage and why is it important? How do you implement it?
#Lineage
#Metadata
Data Engineer
•
Technical
•
hard
How would you detect and handle data drift in a production system?
#Data Drift
#Monitoring
Data Engineer
•
Technical
•
medium
What is PII (Personally Identifiable Information) and how do you handle it in a data pipeline?
#PII
#Privacy
#Compliance
Data Engineer
•
Technical
•
medium
Explain the concept of a data catalog. What tools have you used?
#Data Catalog
#Metadata
Data Engineer
•
Technical
•
hard
Compare Amazon Redshift, Google BigQuery, and Snowflake for a petabyte-scale warehouse.
#Redshift
#BigQuery
#Snowflake
Data Engineer
•
Technical
•
hard
How does BigQuery handle large joins efficiently? What is its columnar storage approach?
#BigQuery
#Columnar Storage
Data Engineer
•
Technical
•
medium
Explain the difference between S3, HDFS, and GCS for data storage.
#S3
#HDFS
#GCS
Data Engineer
•
Technical
•
medium
How would you reduce costs in a cloud-based data platform?
#Cloud
#Cost
Data Engineer
•
Technical
•
medium
What is infrastructure as code (IaC)? Have you used Terraform for data infrastructure?
#Terraform
#IaC
Data Engineer
•
Technical
•
hard
What is Presto? How does Meta use it at scale?
#Presto
#SQL
Data Engineer
•
Technical
•
hard
Explain how Meta uses Scribe for structured logging at petabyte scale.
#Scribe
#Infrastructure
Data Engineer
•
Technical
•
hard
How would you handle data consistency across Meta's globally sharded MySQL deployment?
#Sharding
#Consistency
Data Engineer
•
Technical
•
medium
Design the data model for Facebook Marketplace. We need to track users, product listings, categories, and transactions. How would you structure the fact and dimension tables to allow product managers to analyze daily sales volume by category and user demographics?
#Dimensional Modeling
#Star Schema
#Fact Tables
#Dimension Tables
Data Engineer
•
Technical
•
hard
How would you handle late-arriving data in a daily ETL pipeline that computes Facebook's Daily Active Users (DAU)? Assume the pipeline runs at 2 AM UTC, but mobile clients might upload offline logs days later.
#ETL
#Late-Arriving Data
#Idempotency
#Backfilling
Data Engineer
•
Technical
•
medium
How do you design a Slowly Changing Dimension (SCD) Type 2 table for Facebook user profiles? Explain how you would handle updates to a user's 'current_city' while preserving the history of their previous locations.
#SCD Type 2
#Data Warehousing
#Historical Tracking
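To make the Type 2 mechanics concrete, here is a small in-memory sketch. The column names `valid_from`, `valid_to`, and `is_current` are a common convention but are assumptions here, as is representing the dimension as a list of dicts. A change closes the current row and appends a new one, so earlier cities remain queryable by date range.

```python
def apply_scd2_update(history, user_id, new_city, change_date):
    """Apply a current_city change to an SCD Type 2 history list, in place."""
    current = next(
        (row for row in history if row["user_id"] == user_id and row["is_current"]),
        None,
    )
    if current is not None:
        if current["current_city"] == new_city:
            return history  # no change, so no new version is written
        # Close out the existing version.
        current["valid_to"] = change_date
        current["is_current"] = False
    history.append({
        "user_id": user_id,
        "current_city": new_city,
        "valid_from": change_date,
        "valid_to": None,  # open-ended until the next change
        "is_current": True,
    })
    return history
```

In a warehouse the same logic is usually expressed as a merge or upsert against the dimension table, typically with a surrogate key per version so that fact rows can join to the version that was current at event time.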
Difficulty Radar
Based on recent AI-sourced data.
Meet Your Interviewers
The "Standard" Interviewer
Senior Engineer
Focuses on core competencies, system constraints, and clear communication.
Unwritten Rules
Think Out Loud
Always explain your thought process before writing code or drawing architecture.