Deloitte
Multinational professional services network with offices in over 150 countries.
4 Rounds
~21 Days
Medium
The Interview Loop
Recruiter Screen (30 min)
Standard fit check, behavioral questions, and resume overview.
Technical Loop (3-4 Rounds)
Deep dive into domain knowledge, coding, and system design.
Interview Question Bank
Data Engineer • Behavioral • medium
Tell me about a time when a client drastically changed the requirements of a data pipeline midway through the sprint. How did you handle it?
#Client Management
#Agile
#Adaptability
Data Engineer • Behavioral • easy
As a consultant, you often work with non-technical business stakeholders. Give an example of how you explained a complex data architecture concept to a non-technical client.
#Consulting
#Stakeholder Management
#Communication
Data Engineer • Behavioral • medium
Describe a situation where you faced scope creep on a data engineering project. How did you manage it while keeping the client satisfied?
#Project Management
#Consulting
#Negotiation
Data Engineer • Behavioral • medium
Tell me about a time you had to deliver a critical data project under a very tight deadline. How did you prioritize your tasks?
#Time Management
#Prioritization
#Stress Management
Data Engineer • Behavioral • medium
Describe a time when you disagreed with a Senior Architect or Manager regarding a technical design. How did you handle the situation?
#Conflict Resolution
#Communication
#Teamwork
Data Engineer • Behavioral • medium
Tell me about a time you identified an inefficiency in a process or architecture and optimized it, resulting in cost savings for the client or your company.
#Cost Optimization
#Initiative
#Value Delivery
Data Engineer • Behavioral • easy
Consultants often have to learn new tools on the fly. Tell me about a time you had to quickly adapt to a new technology stack for a project. How did you get up to speed?
#Continuous Learning
#Adaptability
Data Engineer • Coding • medium
Write a SQL query to find the top 3 highest-paid employees in each department. If there is a tie, they should have the same rank.
#Window Functions
#DENSE_RANK
#Joins
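One possible answer, sketched with Python's built-in `sqlite3` module so it can be run as-is. The `employees` and `departments` tables, their columns, and the sample rows are hypothetical, and window functions require SQLite 3.25 or newer. `DENSE_RANK()` gives tied salaries the same rank without leaving gaps, so a department can return more than three people.

```python
import sqlite3

# Hypothetical schema and data, invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE departments (id INTEGER, name TEXT);
    CREATE TABLE employees (name TEXT, salary INTEGER, dept_id INTEGER);
    INSERT INTO departments VALUES (1, 'Audit'), (2, 'Tax');
    INSERT INTO employees VALUES
        ('Ana', 900, 1), ('Ben', 900, 1), ('Cal', 800, 1),
        ('Dee', 700, 1), ('Eli', 600, 1),
        ('Fay', 500, 2), ('Gus', 400, 2);
""")

TOP3_SQL = """
WITH ranked AS (
    SELECT d.name AS department,
           e.name AS employee,
           e.salary,
           DENSE_RANK() OVER (
               PARTITION BY e.dept_id ORDER BY e.salary DESC
           ) AS salary_rank
    FROM employees AS e
    JOIN departments AS d ON d.id = e.dept_id
)
SELECT department, employee, salary, salary_rank
FROM ranked
WHERE salary_rank <= 3
ORDER BY department, salary_rank, employee;
"""

top3 = conn.execute(TOP3_SQL).fetchall()
for row in top3:
    print(row)
```

In the sample data Ana and Ben tie at rank 1, so the Audit department returns four rows for its top three ranks.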
Data Engineer • Coding • medium
Write a Python script to parse a deeply nested JSON file containing client transaction data, flatten it, and convert it into a Pandas DataFrame.
#Data Manipulation
#JSON
#Pandas
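A minimal sketch of the flattening step, using only the standard library so it runs without extra installs. The transaction fields shown are hypothetical, and the function only descends into nested dictionaries; lists are left as values. The final conversion to a DataFrame is shown as a comment because it assumes pandas is installed.

```python
import json


def flatten(record, parent_key="", sep="."):
    """Flatten nested dicts into one dict whose keys are dotted paths."""
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested objects, carrying the path so far.
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


# Hypothetical transaction payload, invented for illustration.
raw = json.loads("""
[
  {"id": 1, "client": {"name": "Acme", "address": {"city": "Oslo"}}, "amount": 12.5},
  {"id": 2, "client": {"name": "Blix", "address": {"city": "Lyon"}}, "amount": 7.0}
]
""")

rows = [flatten(record) for record in raw]
print(rows[0])

# With pandas installed, the flat rows load directly:
#     import pandas as pd
#     df = pd.DataFrame(rows)
# pandas.json_normalize(raw) yields the same dotted column names.
```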
Data Engineer • Coding • medium
Write a SQL query to calculate the cumulative sum of revenue per month for the year 2023.
#Window Functions
#Aggregations
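One way to approach this, sketched with `sqlite3`: aggregate to one row per month first, then apply a windowed `SUM` over the months. The `sales` table, its columns, and the rows are hypothetical, and `strftime` is SQLite's date function; other databases would use their own month-truncation function.

```python
import sqlite3

# Hypothetical table and data, invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (sale_date TEXT, revenue INTEGER);
    INSERT INTO sales VALUES
        ('2022-12-30', 999),
        ('2023-01-05', 100), ('2023-01-20', 50),
        ('2023-02-11', 200),
        ('2023-03-02', 25);
""")

CUMULATIVE_SQL = """
WITH monthly AS (
    SELECT strftime('%Y-%m', sale_date) AS month,
           SUM(revenue) AS monthly_revenue
    FROM sales
    WHERE sale_date >= '2023-01-01' AND sale_date < '2024-01-01'
    GROUP BY strftime('%Y-%m', sale_date)
)
SELECT month,
       monthly_revenue,
       SUM(monthly_revenue) OVER (ORDER BY month) AS cumulative_revenue
FROM monthly
ORDER BY month;
"""

running = conn.execute(CUMULATIVE_SQL).fetchall()
for row in running:
    print(row)
```

Filtering with a half-open date range keeps the 2022 row out of the running total and avoids wrapping the column in a function inside the `WHERE` clause.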
Data Engineer • Coding • medium
You have a table with millions of rows and no primary key. Write a SQL query to delete all duplicate rows, keeping only one instance of each.
#Data Cleansing
#CTEs
#ROW_NUMBER
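A sketch of the `ROW_NUMBER()` approach, run through `sqlite3`. It relies on SQLite's implicit `rowid` as a physical row identifier, which is an assumption specific to this demo; the `events` table and its rows are hypothetical. Rows are numbered within each group of identical values, and everything numbered above 1 is deleted.

```python
import sqlite3

# Hypothetical table with no primary key, invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (user_id INTEGER, action TEXT);
    INSERT INTO events VALUES
        (1, 'login'), (1, 'login'), (1, 'login'),
        (2, 'login'), (2, 'logout'), (2, 'logout');
""")

DEDUP_SQL = """
WITH numbered AS (
    SELECT rowid AS rid,
           ROW_NUMBER() OVER (
               PARTITION BY user_id, action ORDER BY rowid
           ) AS rn
    FROM events
)
DELETE FROM events
WHERE rowid IN (SELECT rid FROM numbered WHERE rn > 1);
"""

conn.execute(DEDUP_SQL)
remaining = conn.execute(
    "SELECT user_id, action FROM events ORDER BY user_id, action"
).fetchall()
print(remaining)
```

On engines without a row identifier, a common alternative is to write `SELECT DISTINCT *` into a new table and swap it in for the original.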
Data Engineer • Coding • hard
Write a recursive CTE in SQL to output a company's organizational chart, showing each employee's name, their manager's name, and their depth level in the hierarchy.
#Recursive CTE
#Hierarchical Data
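A possible shape for the recursive CTE, sketched with `sqlite3`. The `employees` table with a self-referencing `manager_id` column is an assumed schema, and the names are invented. The anchor member selects employees with no manager at depth 0; the recursive member joins each employee to a manager already in the result and adds 1 to the depth.

```python
import sqlite3

# Hypothetical self-referencing table, invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (id INTEGER, name TEXT, manager_id INTEGER);
    INSERT INTO employees VALUES
        (1, 'Rosa', NULL),
        (2, 'Sam', 1), (3, 'Tia', 1),
        (4, 'Uma', 2);
""")

ORG_CHART_SQL = """
WITH RECURSIVE org AS (
    -- Anchor: the top of the hierarchy has no manager.
    SELECT id, name, NULL AS manager_name, 0 AS depth
    FROM employees
    WHERE manager_id IS NULL
    UNION ALL
    -- Recursive step: attach direct reports of rows found so far.
    SELECT e.id, e.name, org.name, org.depth + 1
    FROM employees AS e
    JOIN org ON e.manager_id = org.id
)
SELECT name, manager_name, depth
FROM org
ORDER BY depth, name;
"""

org_chart = conn.execute(ORG_CHART_SQL).fetchall()
for row in org_chart:
    print(row)
```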
Data Engineer • Coding • easy
Write a Python function to merge two sorted lists of integers into a single sorted list without using the built-in sorted() function or the list.sort() method.
#Two Pointers
#Arrays
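A minimal two-pointer sketch: walk both lists at once, always taking the smaller head element, then append whatever remains. The function name `merge_sorted` is illustrative.

```python
def merge_sorted(left, right):
    """Merge two ascending lists into one ascending list in O(n + m) time."""
    merged = []
    i = j = 0
    # Take the smaller head element until one list is exhausted.
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    # At most one of these slices is non-empty.
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


print(merge_sorted([1, 4, 9], [2, 4, 10, 11]))
```

Using `<=` keeps the merge stable: on ties, elements from the first list come first.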
Data Engineer • System Design • medium
Design a Slowly Changing Dimension (SCD) Type 2 process for a client's customer dimension table. How would you implement this in a cloud data warehouse like Snowflake or Redshift?
#SCD Type 2
#Data Modeling
#ETL
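The core of an SCD Type 2 load can be sketched as two statements: expire the current row of every customer whose tracked attribute changed, then insert a new current row for changed and new customers. The sketch below runs on `sqlite3` purely for demonstration; the table names, the columns (`valid_from`, `valid_to`, `is_current`), and the data are assumptions, and in a cloud warehouse the same logic is often written as a `MERGE`.

```python
import sqlite3

# Hypothetical dimension and staging tables, invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_customer (
        customer_id INTEGER, city TEXT,
        valid_from TEXT, valid_to TEXT, is_current INTEGER
    );
    CREATE TABLE stg_customer (customer_id INTEGER, city TEXT);
    INSERT INTO dim_customer VALUES
        (1, 'Oslo', '2023-01-01', NULL, 1),
        (2, 'Lyon', '2023-01-01', NULL, 1);
    INSERT INTO stg_customer VALUES (1, 'Bergen'), (2, 'Lyon'), (3, 'Rome');
""")

params = {"load_date": "2024-06-01"}

# Step 1: close the current row of every customer whose city changed.
conn.execute("""
    UPDATE dim_customer
    SET valid_to = :load_date, is_current = 0
    WHERE is_current = 1
      AND EXISTS (
          SELECT 1 FROM stg_customer AS s
          WHERE s.customer_id = dim_customer.customer_id
            AND s.city <> dim_customer.city
      )
""", params)

# Step 2: insert a new current row for every staged customer that has no
# current row, which covers both changed and brand-new customers.
conn.execute("""
    INSERT INTO dim_customer (customer_id, city, valid_from, valid_to, is_current)
    SELECT s.customer_id, s.city, :load_date, NULL, 1
    FROM stg_customer AS s
    LEFT JOIN dim_customer AS d
           ON d.customer_id = s.customer_id AND d.is_current = 1
    WHERE d.customer_id IS NULL
""", params)

history = conn.execute(
    "SELECT * FROM dim_customer ORDER BY customer_id, valid_from"
).fetchall()
for row in history:
    print(row)
```

Customer 1 ends up with an expired Oslo row and a current Bergen row, customer 2 is untouched, and customer 3 is inserted as new. A production design would typically also add a surrogate key and compare a hash of all tracked columns rather than a single attribute.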
Data Engineer • System Design • medium
A healthcare client wants to build a Data Lakehouse on Azure/Databricks. Design a Medallion Architecture (Bronze, Silver, Gold) for their patient records and claims data.
#Data Lakehouse
#Medallion Architecture
#Databricks
Data Engineer • System Design • hard
A large retail client wants to migrate their on-premise Hadoop cluster to AWS. Walk me through your migration strategy, including tool selection and risk mitigation.
#Cloud Migration
#AWS
#Hadoop
Data Engineer • System Design • hard
Design a real-time streaming pipeline to detect fraudulent credit card transactions. The system must process 10,000 events per second with sub-second latency.
#Streaming
#Kafka
#Spark Streaming
#Fraud Detection
Data Engineer • System Design • hard
How do you handle late-arriving dimensions (also called early-arriving facts) in a data warehouse, where the fact record arrives before its corresponding dimension record?
#ETL
#Dimensional Modeling
#Data Integrity
Data Engineer • System Design • medium
Design an ELT pipeline for a retail company that receives daily CSV dumps from 50 different vendors via SFTP. The data needs to be loaded into Snowflake for reporting.
#ELT
#Cloud Architecture
#Data Ingestion
Data Engineer • System Design • medium
You are extracting data from a third-party REST API that has a strict rate limit of 100 requests per minute. How do you design your Python extraction script to handle this?
#API Integration
#Python
#Rate Limiting
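One way to stay under the limit is a client-side sliding-window limiter that blocks before each request once 100 calls have been made in the last 60 seconds. The sketch below is an assumption-laden illustration: `RateLimiter`, `extract`, and the `fetch` callable are hypothetical names, and the clock and sleep functions are injectable so the logic can be tested without waiting.

```python
import time
from collections import deque


class RateLimiter:
    """Allow at most max_calls calls in any rolling window of period seconds."""

    def __init__(self, max_calls=100, period=60.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock
        self.sleep = sleep
        self._stamps = deque()  # timestamps of calls still inside the window

    def _evict(self, now):
        # Drop timestamps that have aged out of the window.
        while self._stamps and now - self._stamps[0] >= self.period:
            self._stamps.popleft()

    def acquire(self):
        """Block until one more call is allowed, then record it."""
        now = self.clock()
        self._evict(now)
        while len(self._stamps) >= self.max_calls:
            # Wait until the oldest recorded call leaves the window.
            self.sleep(self.period - (now - self._stamps[0]))
            now = self.clock()
            self._evict(now)
        self._stamps.append(now)


def extract(urls, fetch, limiter):
    """Call the hypothetical fetch(url) for each URL, throttled by limiter."""
    results = []
    for url in urls:
        limiter.acquire()
        results.append(fetch(url))
    return results
```

A fuller answer would usually also handle HTTP 429 responses by honoring the `Retry-After` header and retrying with backoff, since the server's view of the window may differ from the client's.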
Data Engineer • Technical • hard
In a recent client project, you had to process a massive dataset using PySpark, but one of the tasks took significantly longer than the others. How do you identify and resolve data skew in Spark?
#PySpark
#Performance Tuning
#Data Skew
Data Engineer • Technical • medium
Explain how micro-partitions work in Snowflake. How would you choose a clustering key for a table containing billions of rows of transactional data?
#Snowflake
#Architecture
#Performance Optimization
Data Engineer • Technical • medium
You have an Apache Airflow DAG with 10 tasks. Task 5 fails intermittently due to an external API timeout. How do you handle this robustly?
#Airflow
#Error Handling
#Retries
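A common starting point is to give the flaky task its own retry settings instead of changing the whole DAG. The sketch below only builds the keyword arguments, so it runs without Airflow installed; the keys are standard operator arguments (`retries`, `retry_delay`, `retry_exponential_backoff`, `max_retry_delay`, `execution_timeout`), while the values and the name `flaky_task_args` are illustrative assumptions.

```python
from datetime import timedelta

# Task-level settings for the one task that calls the external API.
# All values are illustrative, not recommendations.
flaky_task_args = {
    "retries": 4,                              # retry up to 4 times
    "retry_delay": timedelta(seconds=30),      # base wait between attempts
    "retry_exponential_backoff": True,         # grow the wait on each retry
    "max_retry_delay": timedelta(minutes=10),  # cap the backoff
    "execution_timeout": timedelta(minutes=5), # fail a hung attempt
}

# In a DAG file these would be passed to the operator for task 5, e.g.
#     PythonOperator(task_id="call_api", python_callable=call_api,
#                    **flaky_task_args)
# where call_api is a hypothetical function wrapping the API request.
print(flaky_task_args)
```

Retries only help if the task is idempotent, so the API call and any writes it triggers should be safe to repeat. Alerting on final failure is a separate concern from these settings.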
Data Engineer • Technical • medium
What is a Broadcast Hash Join in Spark? When would you use it, and what are its limitations?
#Spark SQL
#Joins
#Optimization
Data Engineer • Technical • medium
How do you handle PII (Personally Identifiable Information) and PHI (Protected Health Information) in a data pipeline to ensure compliance with GDPR/HIPAA?
#Security
#Compliance
#Data Masking
Data Engineer • Technical • hard
You are running a PySpark job that keeps failing with an 'OutOfMemoryError: Java heap space'. What steps do you take to debug and fix this?
#PySpark
#Troubleshooting
#Memory Management
Data Engineer • Technical • easy
Explain the difference between a Star Schema and a Snowflake Schema. When would you recommend one over the other to a client?
#Data Warehousing
#Dimensional Modeling
Data Engineer • Technical • medium
How do you implement CI/CD for data engineering pipelines? What tools do you use and what does the workflow look like?
#CI/CD
#Git
#Testing
Data Engineer • Technical • easy
In Python, what is the difference between a list comprehension and a generator expression? When would you use a generator in a data pipeline?
#Memory Management
#Generators
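A short illustration of the difference: a list comprehension materializes every element up front, while a generator expression produces elements one at a time as they are consumed. The `read_amounts` function and its sample input are hypothetical stand-ins for a pipeline step that streams records.

```python
import sys

squares_list = [n * n for n in range(100_000)]  # builds all elements now
squares_gen = (n * n for n in range(100_000))   # builds elements on demand

# getsizeof counts only the container, yet the gap is already large.
list_size = sys.getsizeof(squares_list)
gen_size = sys.getsizeof(squares_gen)
print(list_size, gen_size)


def read_amounts(lines):
    """Yield one parsed amount at a time so the input is never held in full."""
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines
            yield float(line)


# Any iterable of lines works here, including an open file object.
total = sum(read_amounts(["10.5\n", "\n", "4.5\n"]))
print(total)
```

The trade-off is that a generator can be consumed only once and does not support indexing or `len()`, so a list is still the better fit when the data is small and needs to be reused.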
Data Engineer • Technical • medium
What are the key features of Delta Lake, and how does it solve the limitations of traditional data lakes?
#Delta Lake
#ACID Transactions
#Databricks
Data Engineer • Technical • medium
What is a factless fact table? Provide a real-world business use case where you would implement one.
#Dimensional Modeling
#Fact Tables
Data Engineer • Technical • medium
A client complains that a specific reporting query is taking 30 minutes to run. Walk me through your step-by-step approach to optimize this SQL query.
#Performance Tuning
#Query Optimization
Data Engineer • Technical • medium
Data quality is a major focus at Deloitte. How do you implement automated data quality checks in your ETL pipelines?
#Data Quality
#Testing
#Great Expectations
Data Engineer • Technical • easy
Explain the difference between RANK(), DENSE_RANK(), and ROW_NUMBER() in SQL. Provide a brief example of when to use each.
#Window Functions
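The three functions differ only in how they treat ties, which is easiest to see side by side. This sketch uses `sqlite3` (SQLite 3.25 or newer) with a hypothetical `scores` table in which two players tie for first place.

```python
import sqlite3

# Hypothetical table and data, invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE scores (player TEXT, points INTEGER);
    INSERT INTO scores VALUES ('A', 50), ('B', 50), ('C', 40), ('D', 30);
""")

COMPARE_SQL = """
SELECT player,
       points,
       ROW_NUMBER() OVER (ORDER BY points DESC, player) AS row_num,
       RANK()       OVER (ORDER BY points DESC)         AS rnk,
       DENSE_RANK() OVER (ORDER BY points DESC)         AS dense_rnk
FROM scores
ORDER BY points DESC, player;
"""

comparison = conn.execute(COMPARE_SQL).fetchall()
for row in comparison:
    print(row)
```

`ROW_NUMBER()` numbers rows 1, 2, 3, 4 regardless of ties, which suits deduplication or picking exactly one row per group. `RANK()` gives 1, 1, 3, 4, leaving a gap after the tie, as in competition standings. `DENSE_RANK()` gives 1, 1, 2, 3 with no gap, which suits "top N distinct values" questions.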
Data Engineer • Technical • medium
How do you handle schema evolution in a data lake environment when upstream source systems add, remove, or change the data types of columns?
#Schema Evolution
#Parquet
#Delta Lake
Difficulty Radar
Based on recent AI-sourced data.
Meet Your Interviewers
The "Standard" Interviewer
Senior Engineer
Focuses on core competencies, system constraints, and clear communication.
Unwritten Rules
Think Out Loud
Always explain your thought process before writing code or drawing architecture.