KPMG
Multinational professional services network, and one of the Big Four accounting organizations.
4 Rounds
~21 Days
Medium
The Interview Loop
Recruiter Screen (30 min)
Standard fit check, behavioral questions, and resume overview.
Technical Loop (3-4 Rounds)
Deep dive into domain knowledge, coding, and system design.
Interview Question Bank
Data Scientist • Technical • medium
Explain how a Random Forest model works to a non-technical audit partner.
#Random Forest
#Communication
#Ensemble Methods
Data Scientist • Technical • medium
How do you handle highly imbalanced datasets when building a fraud detection model for a financial services client?
#Imbalanced Data
#Fraud Detection
#SMOTE
#Class Weights
Data Scientist • Technical • medium
What is the difference between L1 (Lasso) and L2 (Ridge) regularization, and when would you use each in a risk scoring model?
#Regularization
#Regression
#Feature Selection
Data Scientist • Technical • hard
How would you approach a time series forecasting problem to predict next quarter's revenue for a manufacturing client?
#Time Series
#Forecasting
#ARIMA
#Prophet
Data Scientist • Technical • medium
Explain the trade-off between bias and variance. How do you identify if your model is suffering from high bias or high variance?
#Model Evaluation
#Bias-Variance Tradeoff
#Overfitting/Underfitting
Data Scientist • Technical • medium
How do you evaluate the performance of an unsupervised learning model, such as K-Means clustering used for customer segmentation?
#Clustering
#Unsupervised Learning
#Evaluation Metrics
Data Scientist • Technical • medium
What is the curse of dimensionality, and how do you handle it when working with high-dimensional client datasets?
#Dimensionality Reduction
#PCA
#Feature Selection
Data Scientist • Technical • medium
How does a Gradient Boosting Machine (GBM) differ from a Random Forest? When would you choose one over the other?
#Ensemble Methods
#Trees
#GBM
Data Scientist • Technical • easy
What evaluation metrics would you use for a highly imbalanced classification problem, and why is accuracy a poor choice?
#Evaluation Metrics
#Precision
#Recall
#F1-Score
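As a rough illustration of why accuracy misleads on imbalanced data, the sketch below compares a classifier that always predicts the majority class with one that actually finds most positives. The data (1,000 labels, 10 of them positive) and the helper name `classification_metrics` are assumptions made for illustration only.

```python
def classification_metrics(y_true, y_pred):
    """Return accuracy, precision, recall, and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when nothing is predicted or labeled positive.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical data: 990 negatives followed by 10 positives.
y_true = [0] * 990 + [1] * 10

# Baseline: always predict the majority (negative) class.
baseline = classification_metrics(y_true, [0] * 1000)

# Hypothetical model: 20 false alarms, 8 of the 10 positives caught.
y_pred = [1] * 20 + [0] * 970 + [1] * 8 + [0] * 2
model = classification_metrics(y_true, y_pred)

print("baseline:", baseline)
print("model:   ", model)
```

The baseline scores 99% accuracy while catching no positives at all, whereas the second model has slightly lower accuracy but non-zero precision, recall, and F1, which is the behavior the question is probing.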
Data Scientist • Technical • hard
How do you ensure that your machine learning models are fair and unbiased, especially when dealing with sensitive attributes in financial lending?
#AI Ethics
#Bias Mitigation
#Fairness
#Explainability
Machine Learning Engineer • Technical • medium
In the context of credit card fraud detection for a financial client, how would you handle a highly imbalanced dataset where fraudulent transactions represent less than 0.1% of the data?
#Imbalanced Data
#Sampling Techniques
#Evaluation Metrics
Machine Learning Engineer • Technical • hard
KPMG often works with highly regulated clients. How do you ensure model explainability (XAI) for a complex deep learning model?
#Explainable AI
#SHAP
#LIME
Machine Learning Engineer • Technical • medium
Explain the difference between Random Forest and Gradient Boosting. Which would you prefer for modeling tabular financial risk data, and why?
#Ensemble Methods
#Decision Trees
#Model Selection
Machine Learning Engineer • Technical • medium
How do you evaluate an NLP model used for extracting specific regulatory clauses from lengthy legal contracts?
#NLP
#Information Extraction
#Evaluation Metrics
Machine Learning Engineer • Technical • hard
What is data leakage, and how do you prevent it specifically in time-series forecasting models?
#Time-series
#Data Leakage
#Cross-validation
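One common safeguard against leakage in time-series work is to validate only on observations that come after the training window. The sketch below is a minimal, dependency-free version of an expanding-window split; the function name `expanding_window_splits` and its parameters are hypothetical, and scikit-learn's `TimeSeriesSplit` provides a comparable splitter.

```python
def expanding_window_splits(n_samples, n_splits, test_size):
    """Yield (train_indices, test_indices) pairs for time-ordered data.

    Every training index precedes every test index, so no future
    observation is visible while fitting.
    """
    first_test_start = n_samples - n_splits * test_size
    if first_test_start <= 0:
        raise ValueError("not enough samples for the requested splits")
    for k in range(n_splits):
        test_start = first_test_start + k * test_size
        train = list(range(0, test_start))
        test = list(range(test_start, test_start + test_size))
        yield train, test


# Hypothetical example: 12 monthly observations, 3 folds of 2 months each.
splits = list(expanding_window_splits(n_samples=12, n_splits=3, test_size=2))
for train, test in splits:
    print(f"train 0..{train[-1]}  test {test}")
```

Splitting by time is only part of the answer: scalers, imputers, and lag or rolling features should also be fitted or computed using the training window alone, otherwise information from the test period can still leak in.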
Machine Learning Engineer • Technical • hard
Explain how you would use Retrieval-Augmented Generation (RAG) to build a secure Q&A bot for internal tax policy documents.
#LLMs
#RAG
#Vector Databases
Machine Learning Engineer • Technical • easy
How do you choose between L1 (Lasso) and L2 (Ridge) regularization? When would you use Elastic Net?
#Regularization
#Feature Selection
#Linear Models
Machine Learning Engineer • Technical • medium
Explain the vanishing gradient problem in deep neural networks and discuss methods to mitigate it.
#Deep Learning
#Neural Networks
#Optimization
Machine Learning Engineer • Technical • medium
What metrics would you use to evaluate a classification model where false positives are extremely costly (e.g., flagging a compliant client as high-risk)?
#Evaluation Metrics
#Precision vs Recall
#Business Impact
Machine Learning Engineer • Technical • hard
How do you handle missing values in a dataset where the data are missing not at random (MNAR)?
#Data Imputation
#Statistics
#Data Quality
Machine Learning Engineer • Technical • hard
Describe the attention mechanism in Transformer models. Why is it more effective than RNNs for processing long documents?
#Transformers
#NLP
#Deep Learning
Difficulty Radar
Based on recent AI-sourced data.
Meet Your Interviewers
The "Standard" Interviewer
Senior Engineer
Focuses on core competencies, system constraints, and clear communication.
Unwritten Rules
Think Out Loud
Always explain your thought process before writing code or drawing architecture.