KPMG
A multinational professional services network and one of the Big Four accounting organizations.
4 Rounds • ~21 Days • Medium difficulty
The Interview Loop
Recruiter Screen (30 min)
Standard fit check, behavioral questions, and resume overview.
Technical Loop (3-4 Rounds)
Deep dive into domain knowledge, coding, and system design.
Interview Question Bank
Machine Learning Engineer • Behavioral • medium
Imagine you are presenting the results of a complex predictive model to a non-technical audit partner at KPMG. How do you explain the model's predictions and build trust in the system?
#Stakeholder Management
#Explainable AI
#Consulting
Machine Learning Engineer • Behavioral • medium
Tell me about a time you had to push back on a client or stakeholder who had unrealistic expectations about what AI/ML could achieve.
#Client Management
#Scope Management
#Expectation Setting
Machine Learning Engineer • Behavioral • hard
KPMG places a high value on ethical AI. How do you ensure your machine learning models do not introduce or amplify bias, especially in loan approval risk assessments?
#Ethical AI
#Bias Mitigation
#Fairness
Machine Learning Engineer • Behavioral • medium
Describe a time you had to build a model using messy, unstructured, or incomplete client data. How did you handle it?
#Data Cleaning
#Problem Solving
#Resilience
Machine Learning Engineer • Behavioral • medium
Tell me about a situation where a deployed model's performance degraded over time. How did you diagnose and resolve the issue?
#Troubleshooting
#Production ML
#Continuous Improvement
Machine Learning Engineer • Behavioral • medium
Describe a time you collaborated with a cross-functional team (data engineers, SMEs, business analysts) to deliver an end-to-end ML solution.
#Teamwork
#Cross-functional Collaboration
#Project Delivery
Machine Learning Engineer • Coding • medium
Write a SQL query to identify the top 3 highest-value suspicious transactions for each corporate client over the past 30 days.
#Window Functions
#Aggregations
#Time-series
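A minimal sketch of one way to answer this, run through Python's built-in `sqlite3` so it is self-contained. The schema is hypothetical (not from the source): `transactions(txn_id, client_id, amount, txn_date, is_suspicious)` with ISO `YYYY-MM-DD` date strings and a 0/1 suspicious flag. `ROW_NUMBER()` partitioned by client ranks each client's suspicious transactions by amount; filtering `rn <= 3` keeps the top three.

```python
import sqlite3
from datetime import date, timedelta

# Hypothetical schema (assumption, not from the source):
# transactions(txn_id, client_id, amount, txn_date TEXT 'YYYY-MM-DD', is_suspicious 0/1)
TOP3_SUSPICIOUS_SQL = """
WITH ranked AS (
    SELECT client_id, txn_id, amount, txn_date,
           ROW_NUMBER() OVER (
               PARTITION BY client_id
               ORDER BY amount DESC, txn_id   -- txn_id breaks ties deterministically
           ) AS rn
    FROM transactions
    WHERE is_suspicious = 1
      AND txn_date > :cutoff      -- last 30 days, ending on :as_of inclusive
      AND txn_date <= :as_of
)
SELECT client_id, txn_id, amount, txn_date
FROM ranked
WHERE rn <= 3
ORDER BY client_id, rn;
"""


def top3_suspicious(conn: sqlite3.Connection, as_of: date):
    """Return (client_id, txn_id, amount, txn_date) rows, at most 3 per client."""
    params = {
        "cutoff": (as_of - timedelta(days=30)).isoformat(),
        "as_of": as_of.isoformat(),
    }
    return conn.execute(TOP3_SUSPICIOUS_SQL, params).fetchall()
```

`ROW_NUMBER()` returns exactly three rows per client even when amounts tie; swapping in `RANK()` or `DENSE_RANK()` would keep all tied rows instead, which may be preferable for an audit review.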
Machine Learning Engineer • Coding • easy
Write a Python function to calculate the 7-day moving average of a time series array representing daily transaction volumes.
#Arrays
#Sliding Window
#Python
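One possible sliding-window answer, sketched under the assumption that the output should contain one trailing average per full 7-day window (no partial windows at the start). Keeping a running sum makes it O(n) instead of re-summing each window.

```python
def moving_average(values, window=7):
    """Trailing moving average; one output per full window of `window` values."""
    if window <= 0:
        raise ValueError("window must be positive")
    if len(values) < window:
        return []
    running = sum(values[:window])          # sum of the first full window
    result = [running / window]
    for i in range(window, len(values)):
        # Slide the window: add the newest value, drop the oldest one.
        running += values[i] - values[i - window]
        result.append(running / window)
    return result
```

With floats, a long-running sum can accumulate rounding error; periodically recomputing the window sum is a simple mitigation if that matters.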
Machine Learning Engineer • Coding • medium
Given a Pandas dataframe of audit system logs, write code to efficiently group by user and calculate the time difference between consecutive logins.
#Pandas
#Data Wrangling
#Time-series
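A minimal pandas sketch, assuming hypothetical log columns `user_id`, `event_type`, and `timestamp` (the source does not specify a schema). Sorting once and calling `groupby(...).diff()` keeps the work vectorized instead of looping over users in Python.

```python
import pandas as pd


def time_between_logins(logs: pd.DataFrame) -> pd.DataFrame:
    """Add the gap since each user's previous login (NaT for a user's first login)."""
    # Assumed schema: user_id, event_type ("login", "logout", ...), timestamp.
    logins = logs.loc[logs["event_type"] == "login", ["user_id", "timestamp"]].copy()
    logins["timestamp"] = pd.to_datetime(logins["timestamp"])
    # Sort so that diff() compares each login with the user's previous one.
    logins = logins.sort_values(["user_id", "timestamp"])
    logins["time_since_prev_login"] = logins.groupby("user_id")["timestamp"].diff()
    return logins.reset_index(drop=True)
```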
Machine Learning Engineer • Coding • medium
Implement a Python function to compute the cosine similarity between two sparse vectors represented as dictionaries.
#Math
#Data Structures
#Sparse Matrices
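A sketch assuming each vector is a `dict` mapping index to non-zero value. The dot product only needs the keys both vectors share, so iterating over the smaller dictionary keeps the cost proportional to its number of non-zeros. Returning 0.0 for a zero vector is a convention chosen here, since cosine similarity is undefined in that case.

```python
import math


def sparse_cosine(a: dict, b: dict) -> float:
    """Cosine similarity of two sparse vectors stored as {index: value} dicts."""
    if len(a) > len(b):
        a, b = b, a                      # iterate over the smaller vector
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0                       # convention for zero vectors
    return dot / (norm_a * norm_b)
```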
Machine Learning Engineer • Coding • medium
Write a SQL query using window functions to calculate the cumulative sum of revenue per department, ordered by transaction date.
#Window Functions
#Cumulative Sum
#Data Aggregation
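A minimal sketch, again via `sqlite3` so it runs as-is, on a hypothetical table `revenue_txns(txn_id, department, txn_date, revenue)`. `SUM(...) OVER (PARTITION BY department ORDER BY ...)` produces a running total per department.

```python
import sqlite3

# Hypothetical table (assumption): revenue_txns(txn_id, department, txn_date, revenue)
CUMULATIVE_REVENUE_SQL = """
SELECT department,
       txn_date,
       revenue,
       SUM(revenue) OVER (
           PARTITION BY department
           ORDER BY txn_date, txn_id                       -- txn_id orders same-day rows
           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
       ) AS cumulative_revenue
FROM revenue_txns
ORDER BY department, txn_date, txn_id;
"""


def cumulative_revenue(conn: sqlite3.Connection):
    """Return (department, txn_date, revenue, cumulative_revenue) rows."""
    return conn.execute(CUMULATIVE_REVENUE_SQL).fetchall()
```

The explicit `ROWS` frame matters: with only `ORDER BY txn_date`, the default `RANGE` frame treats same-date rows as peers and gives them all the same end-of-day total.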
Machine Learning Engineer • Coding • easy
Given a string of unstructured text from a financial report, write a Python script using regex to extract all monetary values (e.g., '$1,000.50', '€500').
#Regex
#Python
#Text Processing
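A regex sketch that assumes the currency set is `$`, `€`, and `£` (extend the character class for others) and that amounts use commas as thousands separators and a dot for decimals. European formats such as `1.000,50` would need a different pattern.

```python
import re

MONEY_RE = re.compile(
    r"""
    [$€£]                           # currency symbol (assumed set)
    \s?                             # optional space after the symbol
    (?:\d{1,3}(?:,\d{3})+|\d+)      # 1,000,000 style, or plain digits
    (?:\.\d{1,2})?                  # optional cents
    """,
    re.VERBOSE,
)


def extract_monetary_values(text: str) -> list:
    """Return every currency amount found in `text`, in order of appearance."""
    # All groups are non-capturing, so findall returns the full matches.
    return MONEY_RE.findall(text)
```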
Machine Learning Engineer • Coding • hard
Write an algorithm to detect a cycle in a directed graph. Cycle detection is often used in anti-money laundering (AML) to flag circular financial transactions.
#Graphs
#DFS
#Cycle Detection
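A sketch of DFS cycle detection with white/gray/black coloring, assuming the graph is an adjacency dict where accounts are nodes and transfers are directed edges (an illustrative modeling choice). An edge to a gray node (one still on the DFS stack) is a back edge and closes a cycle. The DFS is iterative so very large transaction graphs do not hit Python's recursion limit, and it returns one cycle so an analyst can see the circular chain.

```python
from typing import Dict, Hashable, Iterable, List, Optional

WHITE, GRAY, BLACK = 0, 1, 2   # unvisited, on the current DFS path, finished


def find_cycle(graph: Dict[Hashable, Iterable[Hashable]]) -> Optional[List[Hashable]]:
    """Return one directed cycle as [n0, n1, ..., nk] (nk -> n0 closes it), or None."""
    color: Dict[Hashable, int] = {}
    parent: Dict[Hashable, Hashable] = {}
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)          # include nodes that only appear as targets

    for start in nodes:
        if color.get(start, WHITE) != WHITE:
            continue
        color[start] = GRAY
        stack = [(start, iter(graph.get(start, ())))]
        while stack:
            node, neighbors = stack[-1]
            advanced = False
            for nxt in neighbors:
                state = color.get(nxt, WHITE)
                if state == WHITE:
                    color[nxt] = GRAY
                    parent[nxt] = node
                    stack.append((nxt, iter(graph.get(nxt, ()))))
                    advanced = True
                    break
                if state == GRAY:
                    # Back edge node -> nxt: walk parent links from node back to nxt.
                    cycle = [node]
                    while cycle[-1] != nxt:
                        cycle.append(parent[cycle[-1]])
                    cycle.reverse()
                    return cycle
            if not advanced:
                color[node] = BLACK    # all descendants explored, no cycle through node
                stack.pop()
    return None
```

Each node and edge is processed once, so the running time is O(V + E).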
Machine Learning Engineer • Coding • medium
How would you optimize a Pandas script that is running out of memory on a 10GB dataset?
#Pandas
#Memory Optimization
#Python
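A sketch of two common memory levers in pandas: shrinking dtypes (downcasting numerics, converting low-cardinality strings to `category`) and streaming the file in chunks with only the needed columns. The column names `client_id` and `amount` and the 50% uniqueness cutoff for `category` are illustrative assumptions.

```python
import io

import pandas as pd


def shrink_dtypes(df: pd.DataFrame, max_unique_ratio: float = 0.5) -> pd.DataFrame:
    """Downcast numeric columns and turn low-cardinality text columns into categories."""
    out = df.copy()
    for col in out.columns:
        s = out[col]
        if pd.api.types.is_integer_dtype(s):
            out[col] = pd.to_numeric(s, downcast="integer")
        elif pd.api.types.is_float_dtype(s):
            out[col] = pd.to_numeric(s, downcast="float")
        elif (pd.api.types.is_object_dtype(s) or pd.api.types.is_string_dtype(s)) \
                and s.nunique() < max_unique_ratio * len(s):
            out[col] = s.astype("category")
    return out


def aggregate_in_chunks(csv_source, chunksize: int = 100_000) -> pd.Series:
    """Sum `amount` per `client_id` without loading the whole file at once."""
    totals = None
    # usecols avoids parsing columns the aggregation never touches.
    for chunk in pd.read_csv(csv_source, usecols=["client_id", "amount"], chunksize=chunksize):
        part = chunk.groupby("client_id")["amount"].sum()
        totals = part if totals is None else totals.add(part, fill_value=0)
    return totals
```

Downcasting to `float32` trades precision for memory, which may be unacceptable for monetary totals; keeping amounts as `float64` (or integer cents) is the safer default there.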
Machine Learning Engineer • System Design • hard
Design an end-to-end machine learning system to automatically extract and categorize line items from millions of scanned tax documents and receipts.
#OCR
#NLP
#Batch Processing
#Cloud Architecture
Machine Learning Engineer • System Design • hard
Design a real-time fraud detection API. What are the latency requirements, and how do you ensure the model meets them under high load?
#Real-time Inference
#API Design
#Latency Optimization
Machine Learning Engineer • System Design • medium
Explain your approach to monitoring data drift and concept drift in a production ML environment.
#Model Monitoring
#Data Drift
#Concept Drift
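For the data-drift half of this answer, one widely used statistic is the Population Stability Index (PSI), which compares a feature's binned distribution in production against a reference (training) sample: PSI = Σ (actual% − expected%) · ln(actual% / expected%). The sketch below uses equal-width bins over the reference range and a small `eps` to avoid log(0); quantile bins are an equally common choice. Often-quoted heuristics treat PSI above roughly 0.1 as worth investigating and above 0.25 as a significant shift; these are conventions, not standards. Concept drift (a changing relationship between features and labels) needs label-based monitoring, such as tracking precision and recall once ground truth arrives.

```python
import math


def population_stability_index(expected, actual, bins=10, eps=1e-6):
    """PSI between a reference sample and a production sample of one numeric feature."""
    lo, hi = min(expected), max(expected)
    if lo == hi:
        raise ValueError("reference sample has zero range")
    width = (hi - lo) / bins

    def proportions(sample):
        counts = [0] * bins
        for x in sample:
            idx = int((x - lo) / width)
            idx = min(max(idx, 0), bins - 1)   # clamp out-of-range values into edge bins
            counts[idx] += 1
        n = len(sample)
        return [max(c / n, eps) for c in counts]   # eps keeps log() finite

    e = proportions(expected)
    a = proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ai, ei in zip(a, e))
```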
Machine Learning Engineer • System Design • medium
How do you manage machine learning experiments, track hyperparameters, and handle model versioning in a collaborative team setting?
#Experiment Tracking
#Model Registry
#MLflow
Machine Learning Engineer • System Design • hard
Design a system to continuously retrain a financial forecasting model as new transaction data arrives weekly.
#CI/CD for ML
#Automated Retraining
#Pipeline Orchestration
Machine Learning Engineer • System Design • medium
What are the trade-offs between batch inference and real-time inference? Give an example of a KPMG use case for each.
#Batch Processing
#Real-time Processing
#Architecture
Machine Learning Engineer • System Design • hard
How would you design a scalable data pipeline using PySpark to process terabytes of transaction logs for feature engineering?
#PySpark
#Big Data
#Distributed Computing
Machine Learning Engineer • System Design • hard
Explain how you would secure sensitive Personally Identifiable Information (PII) within an ML training pipeline.
#Data Privacy
#Security
#Compliance
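One building block for this answer is pseudonymizing identifiers before data reaches training: a keyed hash (HMAC-SHA256) maps the same ID to the same token, so records can still be joined, while direct identifiers the model does not need are dropped. A plain unkeyed hash is weaker because low-entropy IDs can be recovered by brute force. The field names and the idea that the key comes from a secrets manager are assumptions for illustration. Pseudonymized data generally still counts as personal data under regulations such as GDPR, so access controls, encryption, and audit logging are still required.

```python
import hashlib
import hmac


def pseudonymize(value: str, key: bytes) -> str:
    """Deterministic keyed token for an identifier (HMAC-SHA256, hex-encoded)."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def pseudonymize_record(record: dict, key: bytes, id_fields, drop_fields=()) -> dict:
    """Tokenize join keys, drop direct identifiers, pass other fields through."""
    # `key` is assumed to be fetched from a secrets manager, never stored with the data.
    out = {}
    for field, value in record.items():
        if field in drop_fields:
            continue                      # e.g. names or emails the model never needs
        if field in id_fields and value is not None:
            out[field] = pseudonymize(str(value), key)
        else:
            out[field] = value
    return out
```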
Machine Learning Engineer • System Design • medium
How would you approach building a predictive model to identify advisory clients at risk of churn?
#Predictive Modeling
#Feature Engineering
#Business Strategy
Machine Learning Engineer • System Design • medium
Walk me through how you would deploy a trained PyTorch model as a scalable web service using Azure Machine Learning.
#Azure ML
#Model Deployment
#Cloud Computing
Machine Learning Engineer • Technical • medium
In the context of credit card fraud detection for a financial client, how would you handle a highly imbalanced dataset where fraudulent transactions represent less than 0.1% of the data?
#Imbalanced Data
#Sampling Techniques
#Evaluation Metrics
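Two quick illustrations for this answer. First, at a 0.1% fraud rate, a model that never flags fraud already scores 99.9% accuracy, which is why precision, recall, and PR-AUC are more informative here. Second, one alternative or complement to resampling is weighting classes inversely to their frequency; the formula below, n_samples / (n_classes × class_count), is the same heuristic scikit-learn uses for `class_weight="balanced"`. The function names are illustrative.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


def majority_baseline_accuracy(labels):
    """Accuracy of always predicting the most common class (a sanity check, not a goal)."""
    _, most_common_count = Counter(labels).most_common(1)[0]
    return most_common_count / len(labels)
```

With 9,990 legitimate and 10 fraudulent transactions, each fraud example gets weight 500 and each legitimate one roughly 0.5, so both classes contribute equally to the weighted loss.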
Machine Learning Engineer • Technical • medium
Explain the difference between Random Forest and Gradient Boosting. Which would you prefer for modeling tabular financial risk data, and why?
#Ensemble Methods
#Decision Trees
#Model Selection
Machine Learning Engineer • Technical • medium
How do you evaluate an NLP model used for extracting specific regulatory clauses from lengthy legal contracts?
#NLP
#Information Extraction
#Evaluation Metrics
Machine Learning Engineer • Technical • hard
What is data leakage, and how do you prevent it specifically in time-series forecasting models?
#Time-series
#Data Leakage
#Cross-validation
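A core part of preventing leakage in time series is never letting a model train on data from after the period it is evaluated on. A minimal walk-forward split, similar in spirit to scikit-learn's `TimeSeriesSplit`, is sketched below, assuming rows are already sorted by time. The optional `gap` drops rows just before each test window, which helps when lag features or labels overlap that boundary. Preprocessing such as scaling should also be fit inside each training fold only.

```python
from typing import Iterator, List, Tuple


def walk_forward_splits(n_samples: int, n_splits: int, test_size: int,
                        gap: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) pairs where all training rows precede the test rows."""
    first_test_start = n_samples - n_splits * test_size
    if first_test_start - gap <= 0:
        raise ValueError("not enough samples for the requested splits")
    for i in range(n_splits):
        test_start = first_test_start + i * test_size
        train_end = test_start - gap      # leave `gap` rows out before the test window
        yield list(range(train_end)), list(range(test_start, test_start + test_size))
```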
Machine Learning Engineer • Technical • hard
Explain how you would use Retrieval-Augmented Generation (RAG) to build a secure Q&A bot for internal tax policy documents.
#LLMs
#RAG
#Vector Databases
Machine Learning Engineer • Technical • easy
How do you choose between L1 (Lasso) and L2 (Ridge) regularization? When would you use Elastic Net?
#Regularization
#Feature Selection
#Linear Models
Machine Learning Engineer • Technical • medium
Explain the vanishing gradient problem in deep neural networks and discuss methods to mitigate it.
#Deep Learning
#Neural Networks
#Optimization
Machine Learning Engineer • Technical • medium
What metrics would you use to evaluate a classification model where false positives are extremely costly (e.g., flagging a compliant client as high-risk)?
#Evaluation Metrics
#Precision vs Recall
#Business Impact
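When false positives are the expensive error, precision is the primary metric, and F-beta with beta < 1 (for example F0.5) summarizes the trade-off while weighting precision more than recall. A practical companion is choosing the decision threshold from a precision target rather than defaulting to 0.5. The sketch below assumes binary 0/1 labels; the threshold search is O(n²) and meant only for illustration.

```python
def precision_recall_fbeta(y_true, y_pred, beta=0.5):
    """Precision, recall, and F-beta for binary 0/1 labels; beta < 1 favors precision."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    b2 = beta * beta
    denom = b2 * precision + recall
    fbeta = (1 + b2) * precision * recall / denom if denom else 0.0
    return precision, recall, fbeta


def threshold_for_precision(y_true, scores, min_precision):
    """Smallest score threshold reaching `min_precision`, or None if none does."""
    # Recall can only fall as the threshold rises, so the smallest qualifying
    # threshold keeps the most recall while meeting the precision target.
    for thr in sorted(set(scores)):
        preds = [1 if s >= thr else 0 for s in scores]
        precision, _, _ = precision_recall_fbeta(y_true, preds)
        if precision >= min_precision:
            return thr
    return None
```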
Machine Learning Engineer • Technical • hard
How do you handle missing values in a dataset where the missingness is not at random (MNAR)?
#Data Imputation
#Statistics
#Data Quality
Machine Learning Engineer • Technical • hard
Describe the attention mechanism in Transformer models. Why is it more effective than RNNs for processing long documents?
#Transformers
#NLP
#Deep Learning
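A minimal NumPy sketch of scaled dot-product attention, softmax(QKᵀ / √d_k) V, for a single head. It illustrates the core argument: every position attends to every other position in one step, so distant tokens interact directly instead of through a long chain of recurrent updates, and all positions are computed in parallel. The trade-off is that the score matrix grows quadratically with sequence length, which is why very long documents are often chunked or handled with more efficient attention variants.

```python
import numpy as np


def scaled_dot_product_attention(q, k, v, mask=None):
    """Single-head attention. q, k: (..., n, d_k); v: (..., n, d_v); mask: True = keep."""
    d_k = q.shape[-1]
    scores = q @ k.swapaxes(-1, -2) / np.sqrt(d_k)   # (..., n, n) pairwise similarities
    if mask is not None:
        scores = np.where(mask, scores, -1e9)         # block disallowed positions
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerically stable softmax
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights
```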
Machine Learning Engineer • Technical • hard
KPMG often works with highly regulated clients. How do you ensure model explainability (XAI) for a complex deep learning model?
#Explainable AI
#SHAP
#LIME
[Figure: Difficulty Radar, based on recent AI-sourced data]
Meet Your Interviewers
The "Standard" Interviewer
Senior Engineer: Focuses on core competencies, system constraints, and clear communication.
Unwritten Rules
Think Out Loud
Always explain your thought process before writing code or drawing architecture.