KPMG

KPMG is a multinational professional services network and one of the Big Four accounting organizations.

4 rounds, about 21 days, medium difficulty

The Interview Loop

Recruiter Screen (30 min)

Standard fit check, behavioral questions, and resume overview.

Technical Loop (3-4 Rounds)

Deep dive into domain knowledge, coding, and system design.

Interview Question Bank

Roles covered: Data Scientist (10 questions), Machine Learning Engineer (11 questions)

Data Scientist • Technical • medium

Explain how a Random Forest model works to a non-technical audit partner.

#Random Forest #Communication #Ensemble Methods

Data Scientist • Technical • medium

How do you handle highly imbalanced datasets when building a fraud detection model for a financial services client?

#Imbalanced Data #Fraud Detection #SMOTE #Class Weights
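
A minimal sketch of the class-weighting option, assuming labels coded 0 (legitimate) and 1 (fraud). The weight formula n_samples / (n_classes * class_count) is the same heuristic scikit-learn uses for `class_weight="balanced"`; the helper names below are hypothetical.

```python
import math
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


def weighted_log_loss(y_true, p_fraud, weights, eps=1e-12):
    """Binary cross-entropy where each example's loss is scaled by its class weight."""
    total = 0.0
    for y, p in zip(y_true, p_fraud):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        loss = -math.log(p) if y == 1 else -math.log(1.0 - p)
        total += weights[y] * loss
    return total / len(y_true)


labels = [0] * 990 + [1] * 10  # hypothetical 1% fraud rate
weights = balanced_class_weights(labels)
print(weights)  # fraud class gets 50.0, legitimate class about 0.505
```

With these weights a misclassified fraud case costs about 99 times as much as a misclassified legitimate transaction, which removes the model's incentive to predict "legitimate" everywhere. Resampling methods such as SMOTE are the main alternative; either way, evaluate on data that keeps the original class balance.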

Data Scientist • Technical • medium

What is the difference between L1 (Lasso) and L2 (Ridge) regularization, and when would you use each in a risk scoring model?

#Regularization #Regression #Feature Selection
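
One way to see the difference is the textbook special case of an orthonormal design matrix (XᵀX = I), where both penalties have closed-form solutions in terms of the ordinary least squares (OLS) coefficients. This sketch assumes the objectives ½‖y − Xw‖² + λ‖w‖₁ for Lasso and ½‖y − Xw‖² + (λ/2)‖w‖² for Ridge; libraries scale λ differently, so the numbers are illustrative only.

```python
def soft_threshold(b, lam):
    """Lasso solution for one coefficient when X^T X = I: shrink by lam, clip at zero."""
    if b > lam:
        return b - lam
    if b < -lam:
        return b + lam
    return 0.0


def lasso_orthonormal(ols_coefs, lam):
    return [soft_threshold(b, lam) for b in ols_coefs]


def ridge_orthonormal(ols_coefs, lam):
    # Ridge rescales every coefficient by the same factor; none becomes exactly zero.
    return [b / (1.0 + lam) for b in ols_coefs]


ols = [3.0, 0.5, -2.0]  # hypothetical OLS coefficients for three risk drivers
print(lasso_orthonormal(ols, 1.0))  # [2.0, 0.0, -1.0]
print(ridge_orthonormal(ols, 1.0))  # [1.5, 0.25, -1.0]
```

Lasso drops the weak driver entirely, which suits a risk score that must rest on a short, explainable list of factors; Ridge keeps every driver and is usually the safer choice when predictors are strongly correlated.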

Data Scientist • Technical • hard

How would you approach a time series forecasting problem to predict next quarter's revenue for a manufacturing client?

#Time Series #Forecasting #ARIMA #Prophet
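
Before fitting ARIMA or Prophet it helps to have simple baselines that any candidate model must beat in a backtest. The sketch below shows two common ones, simple exponential smoothing and a seasonal naive forecast; the revenue figures and the smoothing factor are hypothetical.

```python
def simple_exponential_smoothing(series, alpha):
    """One-step-ahead forecast: the final smoothed level of the series."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    level = series[0]
    for value in series[1:]:
        level = alpha * value + (1.0 - alpha) * level
    return level


def seasonal_naive(series, season_length=4):
    """Forecast the next period as the value one season (here 4 quarters) earlier."""
    return series[-season_length]


quarterly_revenue = [120.0, 135.0, 150.0, 180.0, 126.0, 142.0, 158.0, 190.0]  # hypothetical
print(simple_exponential_smoothing(quarterly_revenue, alpha=0.3))
print(seasonal_naive(quarterly_revenue))  # 126.0, Q1 of the previous year
```

Exponential smoothing ignores seasonality, so on strongly seasonal quarterly data the seasonal naive baseline is often the harder one to beat.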

Data Scientist • Technical • medium

Explain the trade-off between bias and variance. How do you identify whether your model is suffering from high bias or high variance?

#Model Evaluation #Bias-Variance Tradeoff #Overfitting/Underfitting
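
The usual diagnosis compares training error against an acceptable target (for example, a business tolerance) and against validation error. A rough heuristic sketch follows; `target_error` and `gap_tolerance` are hypothetical thresholds that would be set per project.

```python
def diagnose_fit(train_error, val_error, target_error, gap_tolerance=0.05):
    """Heuristic reading of train/validation error; both thresholds are assumptions."""
    high_bias = train_error > target_error                     # cannot fit even the training data
    high_variance = (val_error - train_error) > gap_tolerance  # fits training data, fails to generalize
    if high_bias and high_variance:
        return "high bias and high variance"
    if high_bias:
        return "high bias (underfitting)"
    if high_variance:
        return "high variance (overfitting)"
    return "acceptable"


print(diagnose_fit(train_error=0.02, val_error=0.20, target_error=0.05))  # high variance (overfitting)
print(diagnose_fit(train_error=0.30, val_error=0.32, target_error=0.05))  # high bias (underfitting)
```

Plotting both errors against training-set size (a learning curve) gives the same signal: high bias shows as two curves converging at a high error, high variance as a persistent gap.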

Data Scientist • Technical • medium

How do you evaluate the performance of an unsupervised learning model, such as K-Means clustering used for customer segmentation?

#Clustering #Unsupervised Learning #Evaluation Metrics
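
Without ground-truth labels, internal metrics such as the silhouette score measure how tight and well separated the clusters are. A pure-Python sketch, assuming customer features have already been standardized and that a singleton cluster scores 0:

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette over all points; values near 1 mean tight, well-separated clusters."""
    if len(set(labels)) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for i, p in enumerate(points):
        # Distances from point i to every other point, grouped by cluster label.
        by_cluster = {}
        for j, q in enumerate(points):
            if j != i:
                by_cluster.setdefault(labels[j], []).append(math.dist(p, q))
        own = by_cluster.pop(labels[i], [])
        if not own:  # singleton cluster
            scores.append(0.0)
            continue
        a = sum(own) / len(own)                                 # cohesion
        b = min(sum(d) / len(d) for d in by_cluster.values())  # separation from nearest cluster
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


points = [(0, 0), (0, 1), (10, 0), (10, 1)]
print(silhouette_score(points, [0, 0, 1, 1]))  # good segmentation, close to 0.9
print(silhouette_score(points, [0, 1, 0, 1]))  # poor segmentation, negative
```

For customer segmentation, this score is usually paired with business checks: whether the segments stay stable across samples and whether stakeholders can act on them.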

Data Scientist • Technical • medium

What is the curse of dimensionality, and how do you handle it when working with high-dimensional client datasets?

#Dimensionality Reduction #PCA #Feature Selection
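
One concrete symptom of the curse of dimensionality is distance concentration: as dimensions grow, the nearest and farthest neighbours of a point become almost equally far away, which undermines k-NN, clustering, and other distance-based methods. A small simulation on uniform random data (an assumption; real client data differ) makes this visible:

```python
import math
import random


def relative_contrast(n_points, dim, seed=0):
    """(farthest - nearest) / nearest distance from a random query to random points."""
    rng = random.Random(seed)
    points = [[rng.random() for _ in range(dim)] for _ in range(n_points)]
    query = [rng.random() for _ in range(dim)]
    distances = [math.dist(query, p) for p in points]
    nearest = min(distances)
    return (max(distances) - nearest) / nearest


for dim in (2, 10, 100, 1000):
    print(dim, round(relative_contrast(200, dim), 3))
```

The contrast shrinks steadily with dimension, which is why reducing dimensionality through PCA or feature selection before distance-based modelling often helps.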

Data Scientist • Technical • medium

How does a Gradient Boosting Machine (GBM) differ from a Random Forest? When would you choose one over the other?

#Ensemble Methods #Trees #GBM
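
The key contrast: a Random Forest trains deep trees independently on bootstrap samples and averages them (mainly reducing variance), while gradient boosting trains shallow trees sequentially, each fitting the errors left by the ensemble so far (mainly reducing bias). A minimal boosting sketch for one-feature regression with squared loss and depth-1 trees (stumps); the data and hyperparameters are hypothetical:

```python
def fit_stump(xs, residuals):
    """Fit a depth-1 regression tree (one split) to the residuals by least squares."""
    best = None
    unique_x = sorted(set(xs))
    for lo, hi in zip(unique_x, unique_x[1:]):
        threshold = (lo + hi) / 2
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean, right_mean = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - left_mean) ** 2 for r in left) + sum((r - right_mean) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:
        raise ValueError("need at least two distinct x values")
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def gradient_boost(xs, ys, n_rounds=50, learning_rate=0.1):
    """Squared-loss gradient boosting: each stump fits the current residuals."""
    base = sum(ys) / len(ys)
    preds = [base] * len(xs)
    stumps = []
    for _ in range(n_rounds):
        # For squared loss the negative gradient is simply y - prediction.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump(x) for p, x in zip(preds, xs)]
    return lambda x: base + learning_rate * sum(s(x) for s in stumps)


def mse(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


xs = list(range(10))
ys = [0.0] * 5 + [5.0] * 5
print(mse(gradient_boost(xs, ys, n_rounds=1), xs, ys))
print(mse(gradient_boost(xs, ys, n_rounds=50), xs, ys))
```

Because each round corrects the previous ones, boosting usually reaches higher accuracy on tabular data but is more sensitive to the learning rate and number of rounds; a Random Forest is harder to overfit and needs less tuning.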

Data Scientist • Technical • easy

What evaluation metrics would you use for a highly imbalanced classification problem, and why is accuracy a poor choice?

#Evaluation Metrics #Precision #Recall #F1-Score
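
The accuracy paradox is easy to demonstrate: on a 1% positive dataset, a model that never flags anything is 99% accurate yet catches nothing. A sketch computing the standard metrics from confusion-matrix counts (precision is treated as 0 when nothing is flagged, one common convention):

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = positive/fraud)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0  # undefined when nothing is flagged; 0 here
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision, "recall": recall, "f1": f1}


y_true = [0] * 990 + [1] * 10
always_negative = [0] * 1000  # a "model" that never flags anything
print(classification_metrics(y_true, always_negative))
# {'accuracy': 0.99, 'precision': 0.0, 'recall': 0.0, 'f1': 0.0}
```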

Data Scientist • Technical • hard

How do you ensure that your machine learning models are fair and unbiased, especially when dealing with sensitive attributes in financial lending?

#AI Ethics #Bias Mitigation #Fairness #Explainability
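
A first fairness check is to compare approval rates across groups defined by a sensitive attribute, which means keeping that attribute available for auditing even when it is excluded from the model's inputs. A sketch of two common group metrics; the decisions and group labels are hypothetical, and the 0.8 cut-off is the "four-fifths" rule of thumb that originated in US employment guidelines and is often borrowed as a screening threshold:

```python
def selection_rates(decisions, groups):
    """Approval rate per group (decision 1 = approved)."""
    totals, approvals = {}, {}
    for decision, group in zip(decisions, groups):
        totals[group] = totals.get(group, 0) + 1
        approvals[group] = approvals.get(group, 0) + decision
    return {g: approvals[g] / totals[g] for g in totals}


def fairness_report(decisions, groups):
    rates = selection_rates(decisions, groups)
    highest, lowest = max(rates.values()), min(rates.values())
    return {
        "selection_rates": rates,
        "parity_difference": highest - lowest,                          # 0 means demographic parity
        "disparate_impact_ratio": lowest / highest if highest else 1.0,  # below 0.8 warrants review
    }


decisions = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["A"] * 4 + ["B"] * 4
print(fairness_report(decisions, groups))
```

Simply dropping the sensitive attribute does not guarantee fairness, because other features (postcode, for example) can act as proxies; group metrics like these detect that effect in the model's outputs.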

Machine Learning Engineer • Technical • medium

In the context of credit card fraud detection for a financial client, how would you handle a highly imbalanced dataset where fraudulent transactions represent less than 0.1% of the data?

#Imbalanced Data #Sampling Techniques #Evaluation Metrics
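
One sampling option is random undersampling of the majority class, applied to the training split only. A sketch assuming fraud is labelled 1; the `ratio` of 10 legitimate rows per fraud row is a hypothetical choice to tune:

```python
import random


def random_undersample(rows, labels, ratio=10, seed=0):
    """Keep every fraud row (label 1) and at most `ratio` legitimate rows per fraud row."""
    rng = random.Random(seed)
    fraud = [i for i, y in enumerate(labels) if y == 1]
    legit = [i for i, y in enumerate(labels) if y == 0]
    n_legit = min(len(legit), ratio * len(fraud))
    keep = fraud + rng.sample(legit, n_legit)
    rng.shuffle(keep)
    return [rows[i] for i in keep], [labels[i] for i in keep]


rows = list(range(10_000))       # stand-ins for feature vectors
labels = [1] * 10 + [0] * 9_990  # 0.1% fraud, as in the question
train_rows, train_labels = random_undersample(rows, labels)
print(len(train_labels), sum(train_labels))  # 110 10
```

After resampling, predicted probabilities no longer reflect the true 0.1% base rate, so the decision threshold should be tuned (or probabilities recalibrated) on an untouched validation set, scored with precision-recall metrics rather than accuracy.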

Machine Learning Engineer • Technical • hard

KPMG often works with highly regulated clients. How do you ensure model explainability (XAI) for a complex deep learning model?

#Explainable AI #SHAP #LIME
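
SHAP is built on Shapley values: each feature's attribution is its average marginal contribution over all orderings of features. The brute-force sketch below is not the `shap` library; it assumes "absent" features are replaced by a single baseline value, and it is only feasible for a handful of features because its cost grows exponentially. The model is hypothetical.

```python
from itertools import combinations
from math import factorial


def exact_shapley(model, x, baseline):
    """Exact Shapley values; features outside a coalition are set to their baseline value."""
    n = len(x)

    def value(present):
        return model([x[i] if i in present else baseline[i] for i in range(n)])

    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi += weight * (value(s | {i}) - value(s))
        phis.append(phi)
    return phis


def credit_model(z):  # hypothetical model with one interaction term
    return 2.0 * z[0] + 3.0 * z[1] + z[0] * z[2]


x, baseline = [1.0, 2.0, 4.0], [0.0, 0.0, 0.0]
phis = exact_shapley(credit_model, x, baseline)
print(phis)  # attributions sum to credit_model(x) - credit_model(baseline)
```

The interaction term's contribution is split between the two features involved, and the attributions always add up to the difference between the prediction and the baseline prediction, a property that makes the explanation easy to reconcile for auditors.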

Machine Learning Engineer • Technical • medium

Explain the difference between Random Forest and Gradient Boosting. Which would you prefer for modeling tabular financial risk data, and why?

#Ensemble Methods #Decision Trees #Model Selection

Machine Learning Engineer • Technical • medium

How do you evaluate an NLP model used for extracting specific regulatory clauses from lengthy legal contracts?

#NLP #Information Extraction #Evaluation Metrics
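
Clause extraction is usually scored at the span level: a predicted clause counts as correct only if its boundaries and type match an annotated clause. A strict exact-match sketch follows; the character offsets and clause types are hypothetical, and partial-overlap or token-level scoring are common relaxations.

```python
def span_prf(predicted, gold):
    """Exact-match precision/recall/F1 over (start, end, clause_type) spans."""
    pred, true = set(predicted), set(gold)
    tp = len(pred & true)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(true) if true else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


gold = [(120, 410, "termination"), (900, 1350, "indemnity")]  # hypothetical annotations
predicted = [(120, 410, "termination"), (1500, 1620, "indemnity")]
print(span_prf(predicted, gold))  # (0.5, 0.5, 0.5)
```

In regulatory review a missed clause is often costlier than a false alarm, so recall per clause type is typically reported alongside the overall F1.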

Machine Learning Engineer • Technical • hard

What is data leakage, and how do you prevent it specifically in time-series forecasting models?

#Time-series #Data Leakage #Cross-validation
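
The core rule is that a model must never see information from after the period it is predicting, so random k-fold splits are replaced by walk-forward (expanding-window) validation. A sketch of that idea, similar in spirit to scikit-learn's `TimeSeriesSplit`; the split sizes are hypothetical:

```python
def expanding_window_splits(n_samples, n_splits, test_size, gap=0):
    """Yield (train_indices, test_indices); training data always precedes test data.

    `gap` drops observations between train and test so that features built from
    lagged targets cannot straddle the boundary.
    """
    first_test_start = n_samples - n_splits * test_size
    if first_test_start - gap <= 0:
        raise ValueError("not enough observations for the requested splits")
    for k in range(n_splits):
        test_start = first_test_start + k * test_size
        train = list(range(test_start - gap))
        test = list(range(test_start, test_start + test_size))
        yield train, test


for train, test in expanding_window_splits(n_samples=20, n_splits=3, test_size=4, gap=1):
    print(f"train 0..{train[-1]}  test {test[0]}..{test[-1]}")
```

Every preprocessing step (scalers, imputers, target encodings, rolling statistics) should also be fitted inside each training window; fitting it on the full series is a subtler form of the same leakage.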

Machine Learning Engineer • Technical • hard

Explain how you would use Retrieval-Augmented Generation (RAG) to build a secure Q&A bot for internal tax policy documents.

#LLMs #RAG #Vector Databases
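
A RAG pipeline retrieves the most relevant passages, then asks the LLM to answer using only those passages. For internal documents, the "secure" part means filtering by the user's access rights before ranking, so restricted text never reaches the prompt. The sketch below uses bag-of-words cosine similarity as a stand-in for embeddings and a vector database; the documents, group names, and prompt wording are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, documents, user_groups, k=2):
    # Enforce access control BEFORE ranking so restricted passages are never candidates.
    allowed = [d for d in documents if d["acl"] & user_groups]
    q_vec = Counter(tokenize(query))
    ranked = sorted(allowed, key=lambda d: cosine(q_vec, Counter(tokenize(d["text"]))), reverse=True)
    return ranked[:k]


def build_prompt(query, passages):
    context = "\n".join(f"[{p['id']}] {p['text']}" for p in passages)
    return (
        "Answer using only the context below and cite passage ids. "
        "If the context does not contain the answer, say so.\n\n"
        f"{context}\n\nQuestion: {query}"
    )


documents = [  # hypothetical placeholder corpus
    {"id": "vat-01", "text": "Placeholder guidance on VAT registration for subsidiaries", "acl": {"tax-staff"}},
    {"id": "tp-07", "text": "Placeholder guidance on transfer pricing documentation deadlines", "acl": {"tax-partners"}},
]
passages = retrieve("transfer pricing deadlines", documents, {"tax-partners"})
print(build_prompt("What are the transfer pricing deadlines?", passages))
```

Instructing the model to cite passage ids and to decline when the context is silent makes answers auditable and reduces hallucinated tax advice.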

Machine Learning Engineer • Technical • easy

How do you choose between L1 (Lasso) and L2 (Ridge) regularization? When would you use Elastic Net?

#Regularization #Feature Selection #Linear Models
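
Elastic Net mixes both penalties. Under the textbook orthonormal-design assumption (XᵀX = I) and one common parameterization, ½‖y − Xw‖² + λα‖w‖₁ + (λ(1 − α)/2)‖w‖², each coefficient is soft-thresholded (the L1 part) and then shrunk proportionally (the L2 part). Libraries scale λ and α differently, so this is illustrative only.

```python
import math


def elastic_net_orthonormal(ols_coefs, lam, alpha):
    """Closed form when X^T X = I; alpha=1 gives Lasso, alpha=0 gives Ridge."""
    l1, l2 = lam * alpha, lam * (1.0 - alpha)
    result = []
    for b in ols_coefs:
        shrunk = max(abs(b) - l1, 0.0)                        # L1 part: soft-threshold
        result.append(math.copysign(shrunk, b) / (1.0 + l2))  # L2 part: proportional shrink
    return result


ols = [3.0, 0.5, -2.0]  # hypothetical OLS coefficients
for alpha in (1.0, 0.5, 0.0):
    print(alpha, elastic_net_orthonormal(ols, lam=1.0, alpha=alpha))
```

In practice Elastic Net is the usual choice when there are many correlated predictors: pure Lasso tends to keep one of a correlated group arbitrarily, while the L2 component lets Elastic Net keep or drop such groups together.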

Machine Learning Engineer • Technical • medium

Explain the vanishing gradient problem in deep neural networks and discuss methods to mitigate it.

#Deep Learning #Neural Networks #Optimization
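
Backpropagation multiplies one local derivative per layer, and the sigmoid's derivative never exceeds 0.25, so the gradient reaching early layers can shrink geometrically. The toy sketch below models a one-unit-wide chain of identical layers (a deliberate simplification that fixes the weight and pre-activation) to compare sigmoid, ReLU, and a residual connection.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def local_derivative(activation, z):
    if activation == "sigmoid":
        s = sigmoid(z)
        return s * (1.0 - s)          # at most 0.25, reached at z = 0
    if activation == "relu":
        return 1.0 if z > 0 else 0.0  # exactly 1 on the active side
    raise ValueError(f"unknown activation: {activation}")


def gradient_scale(n_layers, activation, z=0.0, weight=1.0, residual=False):
    """Factor multiplying the gradient after backprop through n_layers of a 1-unit chain."""
    per_layer = weight * local_derivative(activation, z)
    if residual:
        per_layer += 1.0  # d/dx [x + f(x)] = 1 + f'(x): the identity path carries gradient
    return per_layer ** n_layers


print(gradient_scale(10, "sigmoid"))                 # 0.25 ** 10, below 1e-6
print(gradient_scale(10, "relu", z=1.0))             # 1.0
print(gradient_scale(10, "sigmoid", residual=True))  # above 1, no longer vanishing
```

Real networks combine these remedies (ReLU-family activations, residual connections, He or Xavier initialization, normalization layers, and gated units such as LSTMs) because each addresses a different part of the problem; residual paths without normalization can instead make gradients grow.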

Machine Learning Engineer • Technical • medium

What metrics would you use to evaluate a classification model where false positives are extremely costly (e.g., flagging a compliant client as high-risk)?

#Evaluation Metrics #Precision vs Recall #Business Impact
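
When false positives dominate the cost, precision (or the false positive rate) becomes the headline metric, and the decision threshold should be chosen from explicit business costs rather than left at 0.5. A sketch that picks the threshold minimizing total cost; the scores, labels, and cost ratios are hypothetical.

```python
def best_threshold(scores, labels, cost_fp, cost_fn):
    """Return (threshold, total_cost) minimizing cost_fp*FP + cost_fn*FN.

    A client is flagged high-risk when its score >= threshold.
    """
    candidates = sorted(set(scores)) + [float("inf")]  # inf = flag nobody
    best = None
    for t in candidates:
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
        cost = cost_fp * fp + cost_fn * fn
        if best is None or cost < best[1]:
            best = (t, cost)
    return best


scores = [0.1, 0.4, 0.6, 0.8, 0.9]  # hypothetical model scores
labels = [0, 0, 1, 0, 1]            # 1 = genuinely high-risk
print(best_threshold(scores, labels, cost_fp=1, cost_fn=1))   # (0.6, 1)
print(best_threshold(scores, labels, cost_fp=10, cost_fn=1))  # (0.9, 1)
```

Raising the cost of a false positive pushes the threshold up, trading recall for precision; the cost ratio itself should be agreed with the client rather than assumed.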

Machine Learning Engineer • Technical • hard

How do you handle missing values in a dataset where the missingness is not at random (MNAR)?

#Data Imputation #Statistics #Data Quality
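
Under MNAR the probability of a value being missing depends on the unobserved value itself (for example, if applicants with lower income were less likely to report it), so no imputation learned from the observed values alone is guaranteed to be unbiased. Two practical steps are a missingness-indicator feature, which lets the model use the missingness pattern as a signal, and a delta-adjustment sensitivity analysis that shifts the imputed values and checks whether conclusions change. A sketch with hypothetical data and delta values:

```python
import statistics


def impute_with_indicator(values, delta=0.0):
    """Median-impute missing values (None), shift the fill by `delta`, and flag them."""
    observed = [v for v in values if v is not None]
    fill = statistics.median(observed) + delta
    imputed = [fill if v is None else v for v in values]
    was_missing = [int(v is None) for v in values]
    return imputed, was_missing


income = [52.0, None, 61.0, None, 48.0]  # hypothetical, in thousands
for delta in (0.0, -10.0, -20.0):
    imputed, was_missing = impute_with_indicator(income, delta)
    print(delta, imputed, was_missing)
```

If model performance or key coefficients move materially across the delta values, the results depend on untestable assumptions about the missing data, and that dependence should be reported; where possible, domain knowledge or follow-up data collection should narrow the plausible range.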

Machine Learning Engineer • Technical • hard

Describe the attention mechanism in Transformer models. Why is it more effective than RNNs for processing long documents?

#Transformers #NLP #Deep Learning
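
Scaled dot-product attention lets every position compare itself with every other position in a single step, so the path between two distant tokens has length one instead of growing with their distance as in an RNN, and all positions can be computed in parallel. A single-head, unmasked sketch on plain lists:

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """softmax(Q K^T / sqrt(d_k)) V for lists of vectors (single head, no masking)."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)  # how strongly this position attends to each position
        outputs.append([sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))])
    return outputs


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # hypothetical token embeddings
print(scaled_dot_product_attention(tokens, tokens, tokens))  # self-attention: Q = K = V
```

The trade-off is that the score matrix grows quadratically with sequence length, which is why very long documents are often chunked or processed with efficient-attention variants.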

Meet Your Interviewers

The "Standard" Interviewer

Senior Engineer

Focuses on core competencies, system constraints, and clear communication.


Unwritten Rules

Think Out Loud

Always explain your thought process before writing code or drawing architecture.
