KPMG

A multinational professional services network and one of the Big Four accounting organizations.

4 Rounds · ~21 Days · Medium

The Interview Loop

Recruiter Screen (30 min)

Standard fit check, behavioral questions, and resume overview.

Technical Loop (3-4 Rounds)

Deep dive into domain knowledge, coding, and system design.

Interview Question Bank

Data Scientist Behavioral medium

Describe a time you had to push back on a client's request because the data did not support their hypothesis.

#Stakeholder Management #Communication #Conflict Resolution
Data Scientist Behavioral easy

Tell me about a time you had to clean and process a severely messy dataset from a client. What steps did you take?

#Data Cleaning #Problem Solving #Attention to Detail
Data Scientist Behavioral easy

Why do you want to work in Data Science consulting at KPMG specifically, rather than a traditional tech company?

#Motivation #Consulting #Company Knowledge
Data Scientist Behavioral medium

Describe a situation where you had to meet a tight deadline for a client deliverable but encountered a major technical roadblock.

#Time Management #Problem Solving #Resilience
Data Scientist Behavioral hard

Tell me about a time you discovered a significant error in your analysis after you had already presented the preliminary findings to a stakeholder.

#Integrity #Accountability #Communication
Data Scientist Behavioral easy

Describe a time when you had to learn a completely new technology or tool on the fly to complete a client project.

#Adaptability #Continuous Learning #Consulting
Data Scientist Coding medium

Write a SQL query to calculate the 7-day rolling average of daily transactions for a client's retail dataset.

#Window Functions #Time Series #Aggregation
Data Scientist Coding easy

Given an array of integers, write a Python function to return the indices of the two numbers that add up to a specific target.

#Arrays #Hash Maps #Optimization
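
One possible approach to the two-sum question, sketched under the assumption that at most one valid pair needs to be returned and that the same element may not be used twice. The function name `two_sum` and the choice to return `None` when no pair exists are illustrative, not part of the original question.

```python
from __future__ import annotations


def two_sum(nums: list[int], target: int) -> tuple[int, int] | None:
    """Return indices (i, j), i < j, with nums[i] + nums[j] == target, or None."""
    seen: dict[int, int] = {}  # value -> index where it was first seen
    for i, value in enumerate(nums):
        complement = target - value
        if complement in seen:
            # The complement appeared earlier, so the pair is complete.
            return seen[complement], i
        # Keep the earliest index for each value.
        seen.setdefault(value, i)
    return None
```

The hash map trades O(n) extra memory for a single pass over the array, instead of the O(n^2) comparison of every pair.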
Data Scientist Coding medium

Write a SQL query using window functions to find the top 3 highest-paid employees in each department.

#Window Functions #Ranking #CTEs
Data Scientist Coding medium

Write a Python function to merge overlapping time intervals. This is often used when analyzing user session logs.

#Arrays #Sorting #Intervals
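
A minimal sketch of the interval-merging question, assuming each interval is a `(start, end)` pair with `start <= end` and that intervals which merely touch (for example, one ends at 4 and the next starts at 4) count as overlapping. The name `merge_intervals` is illustrative.

```python
def merge_intervals(intervals):
    """Merge overlapping (start, end) intervals and return them sorted by start."""
    merged = []
    # Sorting by start guarantees any overlap is with the most recent merged interval.
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlap: extend the last merged interval if this one reaches further.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged
```

Sorting dominates the cost at O(n log n); the merge itself is a single linear pass.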
Data Scientist Coding easy

Given a string containing just the characters '(', ')', '{', '}', '[' and ']', determine if the input string is valid, meaning every opening bracket is closed by a bracket of the same type and in the correct order.

#Stacks #String Parsing
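
A stack-based sketch for the bracket-validation question, assuming "valid" means every opening bracket is closed by the same type of bracket in the correct order, and that the empty string counts as valid. The name `is_valid` is illustrative.

```python
def is_valid(s: str) -> bool:
    """Return True if every bracket in s is closed by a matching bracket in order."""
    closing_to_opening = {")": "(", "}": "{", "]": "["}
    stack = []
    for ch in s:
        if ch in closing_to_opening:
            # A closing bracket must match the most recent unclosed opener.
            if not stack or stack.pop() != closing_to_opening[ch]:
                return False
        else:
            stack.append(ch)
    # Any opener still on the stack was never closed.
    return not stack
```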
Data Scientist Coding easy

Write a Pandas script to find the percentage of missing values in each column of a DataFrame and drop columns where the missing percentage exceeds 40%.

#Pandas #Data Cleaning
Data Scientist Coding hard

Write a SQL query to find all clients who have made a purchase in every single month of the year 2023.

#Aggregation #Filtering #Date Functions
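
One way to express the "every month of 2023" query, run here through Python's built-in `sqlite3` module so it can be tried locally. The table name `purchases`, the columns `client_id` and `purchase_date` (ISO `YYYY-MM-DD` text), and the helper `clients_active_every_month` are all assumptions for illustration; date functions differ between SQL dialects, so `strftime` would need adapting outside SQLite.

```python
import sqlite3

# Keep only 2023 purchases, then require 12 distinct calendar months per client.
QUERY = """
SELECT client_id
FROM purchases
WHERE purchase_date >= '2023-01-01' AND purchase_date < '2024-01-01'
GROUP BY client_id
HAVING COUNT(DISTINCT strftime('%m', purchase_date)) = 12
ORDER BY client_id
"""


def clients_active_every_month(rows):
    """Load (client_id, purchase_date) rows into an in-memory table and run QUERY."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE purchases (client_id INTEGER, purchase_date TEXT)"
        )
        conn.executemany("INSERT INTO purchases VALUES (?, ?)", rows)
        return [row[0] for row in conn.execute(QUERY)]
    finally:
        conn.close()
```

Counting distinct months, rather than counting rows, is what prevents a client with twelve purchases in a single month from qualifying.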
Data Scientist Coding medium

Write a Python function to group a list of strings into anagrams.

#Strings #Hash Maps
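
A short sketch for the anagram-grouping question, assuming comparison is case-sensitive and that groups should appear in the order their first member is encountered. The name `group_anagrams` is illustrative.

```python
from collections import defaultdict


def group_anagrams(words):
    """Group words that are anagrams of each other, preserving first-seen order."""
    groups = defaultdict(list)
    for word in words:
        # Anagrams share the same letters, so their sorted letters form a common key.
        key = "".join(sorted(word))
        groups[key].append(word)
    return list(groups.values())
```

Sorting each word costs O(k log k) for a word of length k; a letter-count tuple could serve as the key instead when words are long and the alphabet is small.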
Data Scientist Coding medium

Write a SQL query to calculate the cumulative sum of revenue per client, ordered by the transaction date.

#Window Functions #Cumulative Sum
Data Scientist System Design hard

Design an end-to-end machine learning pipeline to automatically extract and classify entities from unstructured tax documents.

#NLP #OCR #Pipeline Design #Azure ML
Data Scientist System Design hard

Design a credit risk scoring system for a regional bank. What data would you need, and what models would you evaluate?

#Credit Risk #Classification #Feature Engineering #Explainability
Data Scientist System Design hard

Design a churn prediction architecture for a telecommunications client. Include data ingestion, modeling, and deployment on a cloud platform like Azure.

#Churn Prediction #Cloud Architecture #Azure ML #MLOps
Data Scientist System Design hard

Design an anomaly detection system to identify potentially fraudulent expense claims within an organization's internal audit data.

#Anomaly Detection #Audit #Fraud #Unsupervised Learning
Data Scientist System Design hard

Design a recommendation system for a retail client to suggest products to users based on their browsing history and past purchases.

#Recommendation Engines #Collaborative Filtering #Matrix Factorization #Cold Start
Data Scientist Technical medium

Explain how a Random Forest model works to a non-technical audit partner.

#Random Forest #Communication #Ensemble Methods
Data Scientist Technical medium

How do you handle highly imbalanced datasets when building a fraud detection model for a financial services client?

#Imbalanced Data #Fraud Detection #SMOTE #Class Weights
Data Scientist Technical medium

What is the difference between L1 (Lasso) and L2 (Ridge) regularization, and when would you use each in a risk scoring model?

#Regularization #Regression #Feature Selection
Data Scientist Technical hard

How would you approach a time series forecasting problem to predict next quarter's revenue for a manufacturing client?

#Time Series #Forecasting #ARIMA #Prophet
Data Scientist Technical medium

Explain the trade-off between bias and variance. How do you identify if your model is suffering from high bias or high variance?

#Model Evaluation #Bias-Variance Tradeoff #Overfitting/Underfitting
Data Scientist Technical easy

In SQL, explain the difference between a LEFT JOIN and an INNER JOIN, and provide a scenario where a LEFT JOIN is required.

#Joins #Relational Databases #Data Manipulation
Data Scientist Technical medium

How do you evaluate the performance of an unsupervised learning model, such as K-Means clustering used for customer segmentation?

#Clustering #Unsupervised Learning #Evaluation Metrics
Data Scientist Technical medium

A client notices a sudden 15% drop in user engagement on their platform. Walk me through your analytical approach to find the root cause.

#Root Cause Analysis #Metrics #Hypothesis Testing
Data Scientist Technical medium

Explain the concept of a p-value to a business stakeholder who is deciding whether to launch a new marketing campaign based on your A/B test results.

#A/B Testing #Hypothesis Testing #Communication
Data Scientist Technical medium

What is the curse of dimensionality, and how do you handle it when working with high-dimensional client datasets?

#Dimensionality Reduction #PCA #Feature Selection
Data Scientist Technical medium

How does a Gradient Boosting Machine (GBM) differ from a Random Forest? When would you choose one over the other?

#Ensemble Methods #Trees #GBM
Data Scientist Technical medium

Explain how you would deploy a trained machine learning model into production using Docker and an API framework like FastAPI or Flask.

#Model Deployment #Docker #API #FastAPI
Data Scientist Technical easy

What evaluation metrics would you use for a highly imbalanced classification problem, and why is accuracy a poor choice?

#Evaluation Metrics #Precision #Recall #F1-Score
Data Scientist Technical hard

How do you ensure that your machine learning models are fair and unbiased, especially when dealing with sensitive attributes in financial lending?

#AI Ethics #Bias Mitigation #Fairness #Explainability
Data Scientist Technical medium

Given a dataset of client feedback text, how would you approach building a sentiment analysis model from scratch?

#Sentiment Analysis #Text Preprocessing #NLP #TF-IDF

Difficulty Radar

Based on recent AI-sourced data.

Meet Your Interviewers

The "Standard" Interviewer

Senior Engineer

Focuses on core competencies, system constraints, and clear communication.

Unwritten Rules

Think Out Loud

Always explain your thought process before writing code or drawing architecture.
