KPMG

A multinational professional services network and one of the Big Four accounting organizations.

4 rounds, ~21 days, medium difficulty

The Interview Loop

Recruiter Screen (30 min)

Standard fit check, behavioral questions, and resume overview.

Technical Loop (3-4 Rounds)

Deep dive into domain knowledge, coding, and system design.

Interview Question Bank

Data Scientist • Behavioral • medium

Describe a time you had to push back on a client's request because the data did not support their hypothesis.

#Stakeholder Management #Communication #Conflict Resolution

Data Scientist • Behavioral • easy

Tell me about a time you had to clean and process a severely messy dataset from a client. What steps did you take?

#Data Cleaning #Problem Solving #Attention to Detail

Data Scientist • Behavioral • easy

Why do you want to work in Data Science consulting at KPMG specifically, rather than a traditional tech company?

#Motivation #Consulting #Company Knowledge

Data Scientist • Behavioral • medium

Describe a situation where you had to meet a tight deadline for a client deliverable but encountered a major technical roadblock.

#Time Management #Problem Solving #Resilience

Data Scientist • Behavioral • hard

Tell me about a time you discovered a significant error in your analysis after you had already presented the preliminary findings to a stakeholder.

#Integrity #Accountability #Communication

Data Scientist • Behavioral • easy

Describe a time when you had to learn a completely new technology or tool on the fly to complete a client project.

#Adaptability #Continuous Learning #Consulting

Data Scientist • Coding • medium

Write a SQL query to calculate the 7-day rolling average of daily transactions for a client's retail dataset.

#Window Functions #Time Series #Aggregation
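
One possible answer, sketched under assumptions: the table name `transactions` and its columns `txn_date` and `amount` are hypothetical, and the data is assumed to have one or more rows for every calendar day (a row-based window would span more than 7 days if dates were missing). The query is run through Python's built-in `sqlite3` module only so that it can be tried locally.

```python
import sqlite3

# Hypothetical schema: transactions(txn_date TEXT, amount REAL).
ROLLING_AVG_SQL = """
WITH daily AS (
    -- Collapse individual transactions into one total per day.
    SELECT txn_date, SUM(amount) AS daily_total
    FROM transactions
    GROUP BY txn_date
)
SELECT txn_date,
       daily_total,
       -- Current day plus the 6 preceding days = 7-day window.
       AVG(daily_total) OVER (
           ORDER BY txn_date
           ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
       ) AS rolling_avg_7d
FROM daily
ORDER BY txn_date;
"""


def rolling_average(rows):
    """Load (txn_date, amount) rows into an in-memory DB and run the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE transactions (txn_date TEXT, amount REAL)")
    conn.executemany("INSERT INTO transactions VALUES (?, ?)", rows)
    result = conn.execute(ROLLING_AVG_SQL).fetchall()
    conn.close()
    return result
```

For the first six days the window holds fewer than seven rows, so the average covers only the days seen so far; it is worth stating in the interview whether that is acceptable.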

Data Scientist • Coding • easy

Given an array of integers, write a Python function to return the indices of the two numbers that add up to a specific target.

#Arrays #Hash Maps #Optimization
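
A minimal sketch of the hash-map approach the tags point to. The function name `two_sum` and the choice to return `None` when no pair exists are assumptions; the question does not specify either.

```python
def two_sum(nums, target):
    """Return indices [i, j] with nums[i] + nums[j] == target, or None."""
    seen = {}  # value -> index where it was first seen
    for i, value in enumerate(nums):
        complement = target - value
        if complement in seen:
            return [seen[complement], i]
        seen[value] = i
    return None
```

This makes a single pass over the array, trading O(n) extra memory for O(n) time instead of the O(n^2) nested-loop version.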

Data Scientist • Coding • medium

Write a SQL query using window functions to find the top 3 highest-paid employees in each department.

#Window Functions #Ranking #CTEs
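
A hedged sketch of one way to answer this, assuming a hypothetical table `employees(name, department, salary)`. It uses `DENSE_RANK`, so employees tied on salary share a rank and a department can return more than three rows; `ROW_NUMBER` would be the alternative if exactly three rows are required. The query is wrapped in `sqlite3` purely for local experimentation.

```python
import sqlite3

# Hypothetical schema: employees(name TEXT, department TEXT, salary INTEGER).
TOP3_SQL = """
WITH ranked AS (
    SELECT name,
           department,
           salary,
           DENSE_RANK() OVER (
               PARTITION BY department
               ORDER BY salary DESC
           ) AS salary_rank
    FROM employees
)
SELECT department, name, salary
FROM ranked
WHERE salary_rank <= 3
ORDER BY department, salary DESC, name;
"""


def top_three_per_department(rows):
    """Load (name, department, salary) rows and return the top earners."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER)"
    )
    conn.executemany("INSERT INTO employees VALUES (?, ?, ?)", rows)
    result = conn.execute(TOP3_SQL).fetchall()
    conn.close()
    return result
```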

Data Scientist • Coding • medium

Write a Python function to merge overlapping time intervals. This is often used when analyzing user session logs.

#Arrays #Sorting #Intervals
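
A short sketch of the usual sort-then-sweep solution. The name `merge_intervals` and the convention that intervals touching at an endpoint (such as `[1, 4]` and `[4, 5]`) count as overlapping are assumptions that should be confirmed with the interviewer.

```python
def merge_intervals(intervals):
    """Merge overlapping [start, end] intervals and return them sorted."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps the last merged interval: extend its end if needed.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged
```

Sorting dominates the cost at O(n log n); the sweep itself is linear.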

Data Scientist • Coding • easy

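Given a string containing just the characters '(', ')', '{', '}', '[' and ']', determine whether the input string is valid, meaning every opening bracket is closed by a bracket of the same type and in the correct order.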

#Stacks #String Parsing
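
A minimal stack-based sketch. The function name `is_valid_brackets` is hypothetical, and treating the empty string as valid is an assumption.

```python
def is_valid_brackets(s):
    """Return True if every bracket in s is closed by its matching type, in order."""
    pairs = {")": "(", "}": "{", "]": "["}
    stack = []
    for ch in s:
        if ch in pairs:
            # A closer must match the most recent unclosed opener.
            if not stack or stack.pop() != pairs[ch]:
                return False
        else:
            stack.append(ch)
    # Any opener left on the stack was never closed.
    return not stack
```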

Data Scientist • Coding • easy

Write a Pandas script to find the percentage of missing values in each column of a DataFrame and drop columns where the missing percentage exceeds 40%.

#Pandas #Data Cleaning
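
One way this could look, as a sketch rather than a definitive script. It assumes pandas is installed; the helper name `drop_sparse_columns` and its `threshold` parameter are hypothetical, and "missing" is taken to mean whatever `DataFrame.isna()` reports (NaN, None, NaT).

```python
import pandas as pd


def drop_sparse_columns(df, threshold=40.0):
    """Return (missing percentage per column, DataFrame without sparse columns)."""
    # isna() gives a boolean frame; the column mean is the missing fraction.
    missing_pct = df.isna().mean() * 100
    # Keep columns at or below the threshold; drop those that exceed it.
    keep = missing_pct[missing_pct <= threshold].index
    return missing_pct, df[keep]
```

A possible call: `missing_pct, cleaned = drop_sparse_columns(df)`, after which `missing_pct` can be printed or logged before the dropped columns are discarded for good.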

Data Scientist • Coding • hard

Write a SQL query to find all clients who have made a purchase in every single month of the year 2023.

#Aggregation #Filtering #Date Functions
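
A possible approach, sketched under assumptions: the table `purchases(client_id, purchase_date)` is hypothetical, dates are stored as ISO `YYYY-MM-DD` text, and `strftime('%m', ...)` is SQLite's way of extracting the month (other databases would use `EXTRACT(MONTH FROM ...)` or similar). The idea is to filter to 2023, then keep clients with 12 distinct months.

```python
import sqlite3

# Hypothetical schema: purchases(client_id INTEGER, purchase_date TEXT).
EVERY_MONTH_SQL = """
SELECT client_id
FROM purchases
WHERE purchase_date >= '2023-01-01'
  AND purchase_date <  '2024-01-01'
GROUP BY client_id
HAVING COUNT(DISTINCT strftime('%m', purchase_date)) = 12
ORDER BY client_id;
"""


def clients_active_every_month(rows):
    """Load (client_id, purchase_date) rows and return qualifying client ids."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE purchases (client_id INTEGER, purchase_date TEXT)"
    )
    conn.executemany("INSERT INTO purchases VALUES (?, ?)", rows)
    result = [row[0] for row in conn.execute(EVERY_MONTH_SQL)]
    conn.close()
    return result
```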

Data Scientist • Coding • medium

Write a Python function to group a list of strings into anagrams.

#Strings #Hash Maps
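
A minimal sketch using a hash map keyed on each word's sorted letters. The function name `group_anagrams` is an assumption, and matching is case-sensitive as written.

```python
from collections import defaultdict


def group_anagrams(words):
    """Group words that are anagrams of each other, preserving input order."""
    groups = defaultdict(list)
    for word in words:
        # Anagrams share the same letters, so their sorted form is identical.
        key = "".join(sorted(word))
        groups[key].append(word)
    return list(groups.values())
```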

Data Scientist • Coding • medium

Write a SQL query to calculate the cumulative sum of revenue per client, ordered by the transaction date.

#Window Functions #Cumulative Sum
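
A hedged sketch of a running-total query, assuming a hypothetical table `revenue(client_id, txn_date, amount)`. If a client can have several transactions on the same date, a tie-breaking column (for example a transaction id) would be needed in the `ORDER BY` to make the running total deterministic. `sqlite3` is used here only to make the query runnable.

```python
import sqlite3

# Hypothetical schema: revenue(client_id INTEGER, txn_date TEXT, amount INTEGER).
CUMULATIVE_SQL = """
SELECT client_id,
       txn_date,
       amount,
       SUM(amount) OVER (
           PARTITION BY client_id
           ORDER BY txn_date
           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
       ) AS cumulative_revenue
FROM revenue
ORDER BY client_id, txn_date;
"""


def cumulative_revenue(rows):
    """Load (client_id, txn_date, amount) rows and return running totals."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE revenue (client_id INTEGER, txn_date TEXT, amount INTEGER)"
    )
    conn.executemany("INSERT INTO revenue VALUES (?, ?, ?)", rows)
    result = conn.execute(CUMULATIVE_SQL).fetchall()
    conn.close()
    return result
```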

Data Scientist • System Design • hard

Design an end-to-end machine learning pipeline to automatically extract and classify entities from unstructured tax documents.

#NLP #OCR #Pipeline Design #Azure ML

Data Scientist • System Design • hard

Design a credit risk scoring system for a regional bank. What data would you need, and what models would you evaluate?

#Credit Risk #Classification #Feature Engineering #Explainability

Data Scientist • System Design • hard

Design a churn prediction architecture for a telecommunications client. Include data ingestion, modeling, and deployment on a cloud platform like Azure.

#Churn Prediction #Cloud Architecture #Azure ML #MLOps

Data Scientist • System Design • hard

Design an anomaly detection system to identify potentially fraudulent expense claims within an organization's internal audit data.

#Anomaly Detection #Audit #Fraud #Unsupervised Learning

Data Scientist • System Design • hard

Design a recommendation system for a retail client to suggest products to users based on their browsing history and past purchases.

#Recommendation Engines #Collaborative Filtering #Matrix Factorization #Cold Start

Data Scientist • Technical • medium

Explain how a Random Forest model works to a non-technical audit partner.

#Random Forest #Communication #Ensemble Methods

Data Scientist • Technical • medium

How do you handle highly imbalanced datasets when building a fraud detection model for a financial services client?

#Imbalanced Data #Fraud Detection #SMOTE #Class Weights

Data Scientist • Technical • medium

What is the difference between L1 (Lasso) and L2 (Ridge) regularization, and when would you use each in a risk scoring model?

#Regularization #Regression #Feature Selection

Data Scientist • Technical • hard

How would you approach a time series forecasting problem to predict next quarter's revenue for a manufacturing client?

#Time Series #Forecasting #ARIMA #Prophet

Data Scientist • Technical • medium

Explain the trade-off between bias and variance. How do you identify if your model is suffering from high bias or high variance?

#Model Evaluation #Bias-Variance Tradeoff #Overfitting/Underfitting

Data Scientist • Technical • easy

In SQL, explain the difference between a LEFT JOIN and an INNER JOIN, and provide a scenario where you would strictly use a LEFT JOIN.

#Joins #Relational Databases #Data Manipulation

Data Scientist • Technical • medium

How do you evaluate the performance of an unsupervised learning model, such as K-Means clustering used for customer segmentation?

#Clustering #Unsupervised Learning #Evaluation Metrics

Data Scientist • Technical • medium

A client notices a sudden 15% drop in user engagement on their platform. Walk me through your analytical approach to find the root cause.

#Root Cause Analysis #Metrics #Hypothesis Testing

Data Scientist • Technical • medium

Explain the concept of a p-value to a business stakeholder who is deciding whether to launch a new marketing campaign based on your A/B test results.

#A/B Testing #Hypothesis Testing #Communication
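
To ground the explanation in a number, here is a small sketch of a two-sided two-proportion z-test using only the standard library. The choice of this test, the function name `two_proportion_p_value`, and its parameters are assumptions for illustration; the question does not say which test was run.

```python
import math


def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for a difference in conversion rates (z-test)."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both groups convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0  # No variation at all: no evidence of a difference.
    z = (p_b - p_a) / se
    # For a standard normal Z, P(|Z| >= |z|) equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))
```

For a stakeholder, the returned value can be phrased as: "if the campaign truly made no difference, this is how often we would see a gap at least this large by chance alone."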

Data Scientist • Technical • medium

What is the curse of dimensionality, and how do you handle it when working with high-dimensional client datasets?

#Dimensionality Reduction #PCA #Feature Selection

Data Scientist • Technical • medium

How does a Gradient Boosting Machine (GBM) differ from a Random Forest? When would you choose one over the other?

#Ensemble Methods #Trees #GBM

Data Scientist • Technical • medium

Explain how you would deploy a trained machine learning model into production using Docker and an API framework like FastAPI or Flask.

#Model Deployment #Docker #API #FastAPI

Data Scientist • Technical • easy

What evaluation metrics would you use for a highly imbalanced classification problem, and why is accuracy a poor choice?

#Evaluation Metrics #Precision #Recall #F1-Score
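
A small illustration of why accuracy misleads on imbalanced data, written in plain Python so that the arithmetic is visible. The helper names and the convention of returning 0.0 when a denominator is zero are assumptions; labels are taken to be 1 for the rare positive class and 0 otherwise.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision_recall_f1(y_true, y_pred):
    """Precision, recall and F1 for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1
```

With 5 positives in 100 samples, a model that always predicts the negative class scores 95% accuracy while its recall is zero, which is the core of the argument against accuracy here.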

Data Scientist • Technical • hard

How do you ensure that your machine learning models are fair and unbiased, especially when dealing with sensitive attributes in financial lending?

#AI Ethics #Bias Mitigation #Fairness #Explainability

Data Scientist • Technical • medium

Given a dataset of client feedback text, how would you approach building a sentiment analysis model from scratch?

#Sentiment Analysis #Text Preprocessing #NLP #TF-IDF

Meet Your Interviewers

The "Standard" Interviewer

Senior Engineer

Focuses on core competencies, system constraints, and clear communication.

Unwritten Rules

Think Out Loud

Always explain your thought process before writing code or drawing architecture.
