KPMG
A multinational professional services network and one of the Big Four accounting organizations.
4 Rounds
~21 Days
Medium
The Interview Loop
Recruiter Screen (30 min)
Standard fit check, behavioral questions, and resume overview.
Technical Loop (3-4 Rounds)
Deep dive into domain knowledge, coding, and system design.
Interview Question Bank
Data Engineer
•
Behavioral
•
medium
Tell me about a time you had to explain a complex data engineering concept to a non-technical client stakeholder.
#Stakeholder Management
#Communication
#Consulting
Data Engineer
•
Behavioral
•
medium
Describe a situation where a client changed the requirements of a data pipeline midway through the sprint. How did you handle it?
#Agile
#Scope Creep
#Client Management
Data Engineer
•
Behavioral
•
medium
Tell me about a time you discovered a critical data quality issue right before a client deliverable was due.
#Data Quality
#Crisis Management
#Integrity
Data Engineer
•
Behavioral
•
easy
How do you prioritize tasks when working on multiple client engagements with competing deadlines?
#Prioritization
#Consulting
#Organization
Data Engineer
•
Behavioral
•
medium
Describe a time you disagreed with a senior architect or manager on a technical design. How was it resolved?
#Teamwork
#Technical Disagreement
#Professionalism
Data Engineer
•
Coding
•
medium
Write a SQL query to find the top 3 highest-paid employees in each department, handling ties appropriately.
#Window Functions
#DENSE_RANK
#Aggregations
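One possible answer, sketched here against an in-memory SQLite database (the table and column names `employees`, `department`, `salary` are assumptions, not given by the prompt): DENSE_RANK handles ties without skipping ranks, so "top 3" includes every employee tied at ranks 1 through 3.

```python
import sqlite3

# DENSE_RANK gives tied salaries the same rank and never skips ranks,
# so rnk <= 3 returns every employee in the top three salary bands.
QUERY = """
SELECT department, name, salary
FROM (
    SELECT department, name, salary,
           DENSE_RANK() OVER (
               PARTITION BY department ORDER BY salary DESC
           ) AS rnk
    FROM employees
)
WHERE rnk <= 3
ORDER BY department, salary DESC;
"""

def top_paid(rows):
    """Run the query against an in-memory SQLite DB seeded with `rows`."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE employees (name TEXT, department TEXT, salary INT)")
    con.executemany("INSERT INTO employees VALUES (?, ?, ?)", rows)
    return con.execute(QUERY).fetchall()
```

With two employees tied at the third-highest salary, both appear in the result, which is usually the behavior interviewers are probing for versus ROW_NUMBER.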
Data Engineer
•
Coding
•
medium
Calculate a rolling 7-day average of daily transactions for a financial client using SQL.
#Window Functions
#Time Series
#Financial Data
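A sketch of one approach (table name `daily_txn` and columns are assumed). A `ROWS BETWEEN 6 PRECEDING` frame equals a 7-day window only when there is exactly one row per calendar day; with gaps you would join to a date spine or use a RANGE frame instead, which is worth saying out loud in the interview.

```python
import sqlite3

# A 7-row frame approximates a 7-day window only if the table holds
# exactly one row per calendar day (pre-aggregated daily totals).
QUERY = """
SELECT txn_date,
       AVG(amount) OVER (
           ORDER BY txn_date
           ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
       ) AS rolling_7d_avg
FROM daily_txn
ORDER BY txn_date;
"""

def rolling_avg(rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE daily_txn (txn_date TEXT, amount REAL)")
    con.executemany("INSERT INTO daily_txn VALUES (?, ?)", rows)
    return con.execute(QUERY).fetchall()
```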
Data Engineer
•
Coding
•
hard
Write a SQL query to identify overlapping date ranges in a client's software subscription dataset.
#Self Joins
#Date Functions
#Complex Logic
Data Engineer
•
Coding
•
medium
Write a Python function to parse a deeply nested JSON file from a REST API and flatten it into a tabular pandas DataFrame.
#JSON Parsing
#Pandas
#Data Transformation
Data Engineer
•
Coding
•
medium
Write a Python script to process and merge multiple large CSV files (50GB+) that do not fit into memory.
#Chunking
#Generators
#Memory Management
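A minimal generator-based sketch, assuming the files share a schema: stream rows one at a time so peak memory is a single row regardless of total file size. (For the pandas route, `read_csv(chunksize=...)` gives the same effect per file.)

```python
import csv

def iter_rows(paths):
    """Lazily yield rows from each CSV in turn; only one row is ever
    held in memory, so total file size does not matter."""
    for path in paths:
        with open(path, newline="") as fh:
            yield from csv.DictReader(fh)

def merge_csvs(in_paths, out_path, fieldnames):
    """Stream-merge many same-schema CSVs into one output file."""
    with open(out_path, "w", newline="") as out:
        writer = csv.DictWriter(out, fieldnames=fieldnames)
        writer.writeheader()
        for row in iter_rows(in_paths):
            writer.writerow(row)
```

If the merge must also deduplicate or sort, that changes the answer (external sort or a tool like DuckDB), which is a good trade-off to raise.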
Data Engineer
•
Coding
•
easy
Implement a Python function to detect and remove duplicate records based on a composite key, keeping the most recently updated record.
#Deduplication
#Pandas
#Data Cleaning
Data Engineer
•
Coding
•
easy
Given a list of dictionaries representing financial transactions, write a Python function to aggregate total spend by category without using external libraries.
#Data Structures
#Dictionaries
#Aggregation
Data Engineer
•
Coding
•
medium
Write a SQL query to identify gaps in sequential invoice numbers for an audit client.
#Audit Data
#Sequential Gaps
#LEAD/LAG
Data Engineer
•
Coding
•
medium
Write a Python algorithm to find the longest consecutive sequence of days a user logged into a client portal.
#Python
#Arrays
#Logic
Data Engineer
•
System Design
•
hard
Design a batch ETL pipeline to ingest daily transaction data from 50 different regional banks into a centralized Azure Data Lake.
#Batch Processing
#Azure
#Data Ingestion
#Scalability
Data Engineer
•
System Design
•
hard
Design a real-time fraud detection data pipeline for a credit card company.
#Streaming
#Kafka
#Real-time Processing
#Fraud Detection
Data Engineer
•
System Design
•
medium
How would you design a data model for a retail client's Customer 360 dashboard?
#Dimensional Modeling
#Customer 360
#Star Schema
Data Engineer
•
System Design
•
hard
Design a system to migrate on-premise legacy SQL Server data to a cloud-native Snowflake environment with minimal downtime.
#Cloud Migration
#Snowflake
#Change Data Capture (CDC)
Data Engineer
•
System Design
•
medium
Architect a logging, alerting, and monitoring solution for a complex data pipeline to ensure data quality and pipeline reliability.
#Observability
#Monitoring
#Data Quality
Data Engineer
•
Technical
•
hard
How do you optimize a slow-running SQL query with multiple joins and aggregations that is timing out on a client's database?
#Query Optimization
#Indexing
#Execution Plans
Data Engineer
•
Technical
•
easy
Explain the difference between RANK(), DENSE_RANK(), and ROW_NUMBER(). Give a specific use case for each in an audit context.
#Window Functions
#Data Ranking
Data Engineer
•
Technical
•
medium
How do you handle missing, null, or corrupted data in a large dataset before loading it into a data warehouse?
#Data Cleansing
#Imputation
#ETL
Data Engineer
•
Technical
•
medium
Explain how Spark handles data partitioning and why it matters for pipeline performance.
#PySpark
#Partitioning
#Distributed Computing
Data Engineer
•
Technical
•
hard
How do you handle data skewness in a PySpark join operation where one key has millions of records and others have very few?
#PySpark
#Data Skew
#Performance Tuning
Data Engineer
•
Technical
•
medium
What is a broadcast join in Spark, and when would you use it in a client's ETL pipeline?
#PySpark
#Joins
#Optimization
Data Engineer
•
Technical
•
easy
Explain the difference between transformations and actions in Spark. Give examples of each.
#PySpark
#Lazy Evaluation
Data Engineer
•
Technical
•
hard
How would you troubleshoot and optimize a PySpark job that is failing with an OutOfMemory (OOM) error on the driver node?
#PySpark
#Troubleshooting
#Memory Management
Data Engineer
•
Technical
•
medium
Describe how you would set up a CI/CD pipeline for Databricks notebooks and data pipelines using Azure DevOps.
#CI/CD
#Databricks
#Azure DevOps
Data Engineer
•
Technical
•
easy
What is the difference between a Data Warehouse, a Data Lake, and a Data Lakehouse? Why are clients moving towards Lakehouses?
#Data Lakehouse
#Data Warehouse
#Delta Lake
Data Engineer
•
Technical
•
medium
How do you implement Slowly Changing Dimensions (SCD Type 2) in Snowflake or Databricks?
#SCD Type 2
#Data Warehousing
#Snowflake
#Databricks
Data Engineer
•
Technical
•
medium
Explain the architecture of Azure Data Factory (ADF). How do you use it to orchestrate complex ETL pipelines?
#Azure Data Factory
#Orchestration
#ETL
Data Engineer
•
Technical
•
hard
How do you ensure data security, masking, and governance when building a cloud data platform for a highly regulated healthcare or financial client?
#Data Security
#PII/PHI
#RBAC
#Data Governance
Data Engineer
•
Technical
•
medium
Explain the concept of Delta Lake and its advantages over traditional Parquet files in a data lake.
#Delta Lake
#ACID Transactions
#Time Travel
Data Engineer
•
Technical
•
medium
How would you implement incremental loading in an ETL pipeline using a watermark column?
#Incremental Load
#Watermarking
#Data Integration
Data Engineer
•
Technical
•
medium
What strategies do you use for testing data pipelines before deploying them to production?
#Data Testing
#Unit Testing
#Integration Testing
Meet Your Interviewers
The "Standard" Interviewer
Senior Engineer
Focuses on core competencies, system constraints, and clear communication.
Unwritten Rules
Think Out Loud
Always explain your thought process before writing code or drawing architecture.