PwC


PricewaterhouseCoopers, a multinational professional services network.

4 rounds · ~21 days · Medium difficulty

The Interview Loop

Recruiter Screen (30 min)

Standard fit check, behavioral questions, and resume overview.

Technical Loop (3-4 Rounds)

Deep dive into domain knowledge, coding, and system design.

Interview Question Bank

Data Scientist Behavioral medium

Explain how a Random Forest model works to a non-technical Audit Partner who is skeptical about using AI for risk assessment.

#Stakeholder Management #Model Explainability #Consulting Skills
Data Scientist Behavioral medium

Tell me about a time you discovered a significant data quality issue in a client's dataset right before a major deliverable. How did you handle it?

#Data Quality #Time Management #Client Communication
Data Scientist Behavioral hard

You are building a credit risk model. The client insists on using a complex deep learning model, but regulations require strict explainability. How do you proceed?

#Model Explainability #Client Management #Regulatory Compliance
Data Scientist Behavioral easy

Describe a time when you had to learn a completely new technology or framework to deliver a project on a tight deadline.

#Continuous Learning #Agile #Time Management
Data Scientist Behavioral medium

Tell me about a time you disagreed with a senior team member or manager about the technical approach to a data science problem.

#Conflict Resolution #Leadership #Communication
Data Scientist Behavioral hard

Describe a situation where a model you built failed or underperformed in production. What was the root cause and how did you fix it?

#Failure #Debugging #Continuous Improvement
Data Scientist Behavioral easy

Why PwC? With your technical background, why are you interested in consulting rather than working for a tech company or a startup?

#Career Goals #Consulting #Motivation
Data Scientist Behavioral medium

You are managing a data science project where the client keeps adding new feature requests (scope creep). How do you manage this while keeping the client happy?

#Scope Management #Client Communication #Agile
Data Scientist Coding medium

Write a SQL query to find the top 3 highest-value transactions for each client in our audit database, including ties.

#Window Functions #DENSE_RANK #Data Aggregation
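A minimal sketch of one possible answer, run through Python's built-in `sqlite3` module so it is self-contained (window functions need SQLite 3.25 or later). The `transactions` table and its `client_id`, `txn_id`, and `amount` columns are hypothetical. Following the `DENSE_RANK` tag, "top 3" is read as the three highest distinct amounts per client, so every tied row appears; swapping in `RANK()` would instead keep roughly three rows plus any ties at the cutoff.

```python
import sqlite3

# Hypothetical schema: transactions(client_id TEXT, txn_id INTEGER, amount REAL).
TOP3_SQL = """
WITH ranked AS (
    SELECT
        client_id,
        txn_id,
        amount,
        -- Tied amounts share a rank and no rank numbers are skipped.
        DENSE_RANK() OVER (PARTITION BY client_id ORDER BY amount DESC) AS rnk
    FROM transactions
)
SELECT client_id, txn_id, amount, rnk
FROM ranked
WHERE rnk <= 3
ORDER BY client_id, rnk, txn_id
"""


def top3_per_client(rows):
    """Load (client_id, txn_id, amount) tuples into an in-memory DB and run TOP3_SQL."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE transactions (client_id TEXT, txn_id INTEGER, amount REAL)")
        conn.executemany("INSERT INTO transactions VALUES (?, ?, ?)", rows)
        return conn.execute(TOP3_SQL).fetchall()
    finally:
        conn.close()


# Hypothetical sample data.
sample = [("A", 1, 500), ("A", 2, 500), ("A", 3, 300), ("A", 4, 200), ("A", 5, 100), ("B", 6, 50)]
for row in top3_per_client(sample):
    print(row)
```

Client A's two 500 transactions share rank 1, so four of its five rows come back; only the 100 transaction (rank 4) is dropped.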
Data Scientist Coding medium

Write a Python function using Pandas to identify and merge duplicate client records based on fuzzy matching of company names and exact matching of tax IDs.

#Pandas #Fuzzy Matching #Data Cleaning
Data Scientist Coding medium

Given a table of employee timesheets, write a SQL query to calculate the rolling 7-day average of billable hours per consultant.

#Window Functions #Time Series #Moving Averages
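A sketch under stated assumptions: a hypothetical `timesheets(consultant, work_date, hours)` table with ISO `YYYY-MM-DD` dates, run through `sqlite3` (the numeric `RANGE ... PRECEDING` frame needs SQLite 3.28 or later). Ordering by `julianday(work_date)` with a `RANGE` frame makes the window span 7 calendar days even when a consultant skips days, which a `ROWS` frame would not. Because "rolling 7-day average" is ambiguous, the query returns both the average per logged day and the 7-day total divided by 7.

```python
import sqlite3

# Hypothetical schema: timesheets(consultant TEXT, work_date TEXT 'YYYY-MM-DD', hours REAL).
ROLLING_SQL = """
WITH daily AS (
    -- Collapse multiple entries on the same day into one row per consultant-day.
    SELECT consultant, work_date, SUM(hours) AS hours
    FROM timesheets
    GROUP BY consultant, work_date
)
SELECT
    consultant,
    work_date,
    hours,
    ROUND(AVG(hours) OVER w, 2)       AS avg_per_logged_day,
    ROUND(SUM(hours) OVER w / 7.0, 2) AS avg_per_calendar_day
FROM daily
WINDOW w AS (
    PARTITION BY consultant
    -- Current day plus the 6 calendar days before it, regardless of gaps.
    ORDER BY julianday(work_date)
    RANGE BETWEEN 6 PRECEDING AND CURRENT ROW
)
ORDER BY consultant, work_date
"""


def rolling_billable_hours(rows):
    """rows: (consultant, work_date, hours) tuples; returns the query result rows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE timesheets (consultant TEXT, work_date TEXT, hours REAL)")
        conn.executemany("INSERT INTO timesheets VALUES (?, ?, ?)", rows)
        return conn.execute(ROLLING_SQL).fetchall()
    finally:
        conn.close()


# Hypothetical entries: C1 logs two entries on Jan 2 and skips Jan 3-7.
entries = [
    ("C1", "2024-01-01", 8), ("C1", "2024-01-02", 4), ("C1", "2024-01-02", 2),
    ("C1", "2024-01-08", 4), ("C2", "2024-01-01", 5),
]
for row in rolling_billable_hours(entries):
    print(row)
```

For C1 on 2024-01-08, the window covers Jan 2 to Jan 8, so Jan 1 drops out even though it is only two rows back.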
Data Scientist Coding medium

Write a Python script to parse a directory of JSON files containing nested audit logs, extract specific error codes, and output a flattened CSV.

#Python #JSON Parsing #File I/O #Data Flattening
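A standard-library sketch. It assumes each file holds either one JSON log object or a list of them, and that the error code sits under a key named `error_code` at any depth; both that key name and the `extract_errors` helper are hypothetical. Nested objects and arrays are flattened into dotted column names such as `event.tags.0`, and the CSV header is the union of keys across all matching records.

```python
import csv
import json
from pathlib import Path


def flatten(obj, parent_key="", sep="."):
    """Flatten nested dicts/lists into one dict keyed by dotted paths."""
    items = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            new_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
            items.update(flatten(value, new_key, sep))
    elif isinstance(obj, list):
        for index, value in enumerate(obj):
            new_key = f"{parent_key}{sep}{index}" if parent_key else str(index)
            items.update(flatten(value, new_key, sep))
    else:
        items[parent_key] = obj  # leaf value (str, number, bool or None)
    return items


def extract_errors(log_dir, out_csv, target_codes, code_field="error_code"):
    """Write every log record containing one of target_codes to out_csv; return the row count.

    code_field is a hypothetical key name; adjust it to the real log schema.
    """
    targets = set(target_codes)
    rows = []
    for path in sorted(Path(log_dir).glob("*.json")):
        with path.open(encoding="utf-8") as fh:
            data = json.load(fh)
        # Accept either a single log object or a list of objects per file.
        records = data if isinstance(data, list) else [data]
        for record in records:
            flat = flatten(record)
            # Match code_field at any depth, e.g. "event.error_code".
            codes = {v for k, v in flat.items() if k.split(".")[-1] == code_field}
            if codes & targets:
                flat["source_file"] = path.name
                rows.append(flat)
    # Header = union of keys; records missing a column get an empty cell.
    fieldnames = sorted({key for row in rows for key in row})
    with open(out_csv, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

A call such as `extract_errors("audit_logs", "errors.csv", {"E42", "E57"})` (hypothetical directory and codes) would write one CSV row per matching record. For very large directories, streaming rows to disk instead of collecting them in memory would be the next refinement, at the cost of needing the header up front.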
Data Scientist Coding medium

Write a SQL query to find the month-over-month percentage growth in revenue for each product category.

#LAG Function #CTEs #Percentage Calculation
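One possible answer, again run via `sqlite3`, assuming a hypothetical `sales(sale_date, category, revenue)` table. The first CTE rolls revenue up to calendar months, `LAG` fetches the previous month's figure within each category, and `NULLIF` avoids dividing by a zero base. Note that `LAG` returns the previous row, so a category with a missing month would be compared against an older month unless the month series is filled in first.

```python
import sqlite3

# Hypothetical schema: sales(sale_date TEXT 'YYYY-MM-DD', category TEXT, revenue REAL).
MOM_SQL = """
WITH monthly AS (
    SELECT category, strftime('%Y-%m', sale_date) AS month, SUM(revenue) AS revenue
    FROM sales
    GROUP BY category, month
),
with_prev AS (
    SELECT
        category,
        month,
        revenue,
        LAG(revenue) OVER (PARTITION BY category ORDER BY month) AS prev_revenue
    FROM monthly
)
SELECT
    category,
    month,
    revenue,
    prev_revenue,
    -- NULL for a category's first month or when the previous month had zero revenue.
    ROUND(100.0 * (revenue - prev_revenue) / NULLIF(prev_revenue, 0), 2) AS mom_growth_pct
FROM with_prev
ORDER BY category, month
"""


def monthly_growth(rows):
    """rows: (sale_date, category, revenue) tuples; returns the query result rows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE sales (sale_date TEXT, category TEXT, revenue REAL)")
        conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
        return conn.execute(MOM_SQL).fetchall()
    finally:
        conn.close()


# Hypothetical sales records.
sales = [
    ("2024-01-05", "Hardware", 100), ("2024-01-20", "Hardware", 100),
    ("2024-02-10", "Hardware", 250), ("2024-03-01", "Hardware", 200),
    ("2024-01-15", "Software", 0), ("2024-02-15", "Software", 50),
]
for row in monthly_growth(sales):
    print(row)
```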
Data Scientist Coding easy

Write a Python function that takes a list of strings representing financial document titles and returns the longest common prefix among them.

#Strings #Arrays #Optimization
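A short sketch of one common approach: compare only the lexicographically smallest and largest titles, since any prefix they share is also shared by every title that sorts between them. The `min`/`max` passes cost O(n·L) character comparisons in the worst case and the final scan O(L), where L is the title length. The sample titles are made up.

```python
def longest_common_prefix(titles):
    """Return the longest prefix shared by every string in titles ('' if none)."""
    if not titles:
        return ""
    # Any prefix shared by the lexicographically smallest and largest strings
    # is shared by every string that sorts between them.
    lo, hi = min(titles), max(titles)
    i = 0
    while i < len(lo) and lo[i] == hi[i]:
        i += 1
    return lo[:i]


print(repr(longest_common_prefix([
    "FY2023 Audit Report Q1",
    "FY2023 Audit Report Q2",
    "FY2023 Audit Plan",
])))  # prints 'FY2023 Audit '
```

The loop only bounds-checks `lo`: if `hi` were a strict prefix of `lo`, it would sort before `lo`, which contradicts `hi` being the maximum.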
Data Scientist Coding hard

Write a SQL query to identify 'island' periods of continuous active subscription for users, given a table of start and end dates that may overlap.

#Gaps and Islands #Advanced SQL #Self Joins
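A sketch of the running-maximum variant of gaps and islands, which handles overlapping and fully contained intervals (the simpler `ROW_NUMBER` date-difference trick does not) and uses window functions rather than a self join. It assumes a hypothetical `subscriptions(user_id, start_date, end_date)` table with inclusive ISO dates, so string comparison matches date order. An interval starts a new island when it begins after every earlier interval for that user has ended, and a running `SUM` of those flags numbers the islands. Back-to-back periods (one ending on the 10th, the next starting on the 11th) stay separate here; comparing against `date(prev_max_end, '+1 day')` instead would merge them.

```python
import sqlite3

# Hypothetical schema: subscriptions(user_id INTEGER, start_date TEXT, end_date TEXT).
ISLANDS_SQL = """
WITH ordered AS (
    SELECT
        user_id, start_date, end_date,
        -- Latest end date among all earlier intervals for this user (NULL for the first).
        MAX(end_date) OVER (
            PARTITION BY user_id
            ORDER BY start_date, end_date
            ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
        ) AS prev_max_end
    FROM subscriptions
),
flagged AS (
    SELECT *,
        CASE WHEN prev_max_end IS NULL OR start_date > prev_max_end THEN 1 ELSE 0 END AS new_island
    FROM ordered
),
numbered AS (
    SELECT *,
        -- Running count of island starts = island number.
        SUM(new_island) OVER (
            PARTITION BY user_id
            ORDER BY start_date, end_date
            ROWS UNBOUNDED PRECEDING
        ) AS island_id
    FROM flagged
)
SELECT user_id, island_id, MIN(start_date) AS island_start, MAX(end_date) AS island_end
FROM numbered
GROUP BY user_id, island_id
ORDER BY user_id, island_start
"""


def subscription_islands(rows):
    """rows: (user_id, start_date, end_date) tuples; returns one row per island."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE subscriptions (user_id INTEGER, start_date TEXT, end_date TEXT)")
        conn.executemany("INSERT INTO subscriptions VALUES (?, ?, ?)", rows)
        return conn.execute(ISLANDS_SQL).fetchall()
    finally:
        conn.close()


# Hypothetical data: overlapping, chained and fully contained periods.
subs = [
    (1, "2024-01-01", "2024-01-10"), (1, "2024-01-05", "2024-01-15"),
    (1, "2024-01-12", "2024-01-20"), (1, "2024-02-01", "2024-02-10"),
    (2, "2024-03-01", "2024-03-31"), (2, "2024-03-02", "2024-03-05"),
]
for row in subscription_islands(subs):
    print(row)
```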
Data Scientist Coding medium

Write a Python script using PySpark to read a 10TB CSV file from an S3 bucket, filter out invalid records, and aggregate total sales by region.

#PySpark #Distributed Computing #Data Aggregation
Data Scientist Coding hard

Write a SQL query to find the median salary of employees in each department without using the built-in MEDIAN() function.

#Percentiles #Window Functions #Math
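A sketch that works in SQLite (run here through Python's `sqlite3`) and in other engines with window functions, assuming a hypothetical `employees(department, salary)` table. `ROW_NUMBER` orders salaries within each department and `COUNT(*) OVER` gives the group size. With integer division, positions `(cnt + 1) / 2` and `(cnt + 2) / 2` point to the same middle row when the count is odd and to the two middle rows when it is even, so averaging them yields the median in both cases.

```python
import sqlite3

# Hypothetical schema: employees(department TEXT, salary INTEGER).
MEDIAN_SQL = """
WITH ranked AS (
    SELECT
        department,
        salary,
        ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary) AS rn,
        COUNT(*)     OVER (PARTITION BY department)                 AS cnt
    FROM employees
)
SELECT department, AVG(salary) AS median_salary
FROM ranked
-- Integer division: odd cnt keeps one middle row, even cnt keeps two.
WHERE rn IN ((cnt + 1) / 2, (cnt + 2) / 2)
GROUP BY department
ORDER BY department
"""


def department_medians(rows):
    """rows: (department, salary) tuples; returns (department, median) pairs."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE employees (department TEXT, salary INTEGER)")
        conn.executemany("INSERT INTO employees VALUES (?, ?)", rows)
        return conn.execute(MEDIAN_SQL).fetchall()
    finally:
        conn.close()


# Hypothetical salaries (in thousands): odd, even and single-row departments.
staff = [
    ("Audit", 50), ("Audit", 60), ("Audit", 70), ("Audit", 80), ("Audit", 90),
    ("Tax", 40), ("Tax", 100), ("Tax", 60), ("Tax", 90),
    ("Advisory", 120),
]
print(department_medians(staff))
```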
Data Scientist Coding easy

Implement a binary search algorithm in Python to find a specific transaction ID in a sorted list of 10 million records.

#Binary Search #Time Complexity #Python
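A minimal iterative sketch. Each step halves the search interval, so a sorted list of 10 million IDs needs at most 24 comparisons (2^24 is about 16.8 million). The `find_transaction` name is illustrative, and the IDs are assumed to be sorted ascending and mutually comparable. In production Python, the standard library's `bisect.bisect_left` performs the same search.

```python
def find_transaction(sorted_ids, target):
    """Return the index of target in sorted_ids (ascending), or -1 if absent. O(log n)."""
    lo, hi = 0, len(sorted_ids) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if sorted_ids[mid] == target:
            return mid
        if sorted_ids[mid] < target:
            lo = mid + 1  # target can only be in the upper half
        else:
            hi = mid - 1  # target can only be in the lower half
    return -1


# A range supports len() and indexing, so it stands in for 10 million
# sorted (hypothetical) transaction IDs without materializing a list.
ids = range(0, 20_000_000, 2)
print(find_transaction(ids, 123456), find_transaction(ids, 123457))
```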
Data Scientist System Design hard

Design an end-to-end document extraction system using Generative AI and RAG to pull key clauses from thousands of PDF vendor contracts.

#NLP #RAG #LLMs #Vector Databases #OCR
Data Scientist System Design medium

Design a churn prediction pipeline for a telecommunications client. How do you ensure the model's predictions are actionable for their marketing team?

#Classification #Pipeline Design #Actionable Insights #SHAP
Data Scientist System Design hard

Design a scalable architecture on Azure to process daily batches of 50GB of retail transaction data, run a forecasting model, and update a Power BI dashboard.

#Azure #Data Factory #Databricks #Batch Processing
Data Scientist System Design medium

Design a recommendation engine for a wealth management firm to suggest investment products to high-net-worth individuals.

#Recommendation Systems #Collaborative Filtering #Cold Start Problem
Data Scientist System Design hard

Design an anomaly detection system for a client's IT network to identify potential cybersecurity breaches in real-time.

#Anomaly Detection #Streaming Data #Kafka #Unsupervised Learning
Data Scientist System Design medium

A client wants to use a Large Language Model (LLM) to automatically draft responses to customer complaints. What are the primary risks, and how do you mitigate them?

#LLMs #Generative AI #Risk Management #Hallucinations
Data Scientist System Design hard

Design a system to match millions of incoming bank transactions to open invoices in an ERP system to automate account reconciliation.

#Record Linkage #Optimization #ERP #Heuristics
Data Scientist Technical hard

Given a dataset of financial transactions with a 0.1% fraud rate, how would you build and evaluate a machine learning model to detect fraudulent activities for a banking client?

#Imbalanced Data #Fraud Detection #Evaluation Metrics #SMOTE
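To illustrate why accuracy is a poor headline metric at a 0.1% fraud rate, this pure-Python sketch uses hypothetical labels to compare a model that never flags fraud with one that flags 20 of 10,000 transactions and catches 8 of the 10 frauds. The first scores higher on accuracy yet catches nothing; precision and recall expose the difference.

```python
def fraud_metrics(y_true, y_pred):
    """Accuracy, precision and recall for binary labels where 1 = fraud."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Guard against division by zero when nothing is flagged or no fraud exists.
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical data: 10,000 transactions, 10 of them fraudulent (0.1%).
y_true = [1] * 10 + [0] * 9_990
never_flag = [0] * 10_000
# Flags 8 of the 10 frauds plus 12 legitimate transactions.
some_flags = [1] * 8 + [0] * 2 + [1] * 12 + [0] * 9_978

print(fraud_metrics(y_true, never_flag))
print(fraud_metrics(y_true, some_flags))
```

In practice the same comparison would be run across many score thresholds (a precision-recall curve) and weighed against the client's relative cost of missed fraud versus false alerts.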
Data Scientist Technical medium

What is the difference between L1 and L2 regularization, and in what specific consulting scenario would you choose one over the other?

#Regularization #Lasso #Ridge #Feature Selection
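A one-dimensional sketch of why the two penalties behave differently, assuming an orthonormal design so that each coefficient can be solved independently from its unpenalized (OLS) estimate. L2 (ridge) scales every coefficient toward zero but never reaches it, while L1 (lasso) soft-thresholds, setting small coefficients exactly to zero. That exact-zero behavior is what makes L1 attractive when a client wants a short list of drivers, and L2 a safer default when many correlated features should all keep some weight. The coefficient values and λ = 0.5 are made up.

```python
import math


def ridge_coef(b_ols, lam):
    # Minimizer of 0.5 * (b - b_ols)**2 + 0.5 * lam * b**2:
    # every coefficient is scaled by 1 / (1 + lam), so it shrinks but never hits 0.
    return b_ols / (1 + lam)


def lasso_coef(b_ols, lam):
    # Minimizer of 0.5 * (b - b_ols)**2 + lam * abs(b) (soft-thresholding):
    # coefficients with |b_ols| <= lam become exactly 0, the rest move lam toward 0.
    shrunk = max(abs(b_ols) - lam, 0.0)
    return math.copysign(shrunk, b_ols) if shrunk > 0 else 0.0


ols = [3.0, 0.4, -0.2, -2.5]  # hypothetical unpenalized estimates
lam = 0.5
for b in ols:
    print(f"OLS {b:+.2f}  ridge {ridge_coef(b, lam):+.3f}  lasso {lasso_coef(b, lam):+.3f}")
```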
Data Scientist Technical easy

How would you explain the concept of a p-value to a client who wants to know if their new marketing campaign was successful?

#Hypothesis Testing #A/B Testing #Communication
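A worked sketch that can anchor the explanation, using hypothetical numbers: 200 of 10,000 control customers convert versus 250 of 10,000 customers who saw the campaign. The two-proportion z-test (a normal approximation, reasonable at these sample sizes) asks how often a gap at least this large would appear by chance if the campaign had no effect; with these numbers that is about 1.7% of the time, and that probability is the p-value. It says nothing about whether the lift is large enough to pay for the campaign.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates (A = control, B = campaign)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled conversion rate under the null hypothesis of no campaign effect.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value: erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


z, p = two_proportion_z_test(200, 10_000, 250, 10_000)  # hypothetical campaign results
print(f"z = {z:.2f}, p = {p:.4f}")
```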
Data Scientist Technical medium

What are the assumptions of linear regression? How would you test for them, and what would you do if the homoscedasticity assumption is violated?

#Linear Regression #Statistical Assumptions #Heteroscedasticity
Data Scientist Technical hard

How does Gradient Boosting differ from AdaBoost? Explain the mathematical intuition behind how XGBoost optimizes its objective function.

#Ensemble Methods #XGBoost #Optimization #Mathematics
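A compact version of the standard second-order derivation, with notation assumed here: l is a differentiable loss, f_t the tree added at round t, T its number of leaves, w_j its leaf weights, I_j the samples in leaf j, and γ, λ the regularization terms. Unlike AdaBoost, which reweights misclassified samples under an exponential loss, gradient boosting fits each new tree to the gradient of an arbitrary differentiable loss; XGBoost additionally uses the second derivative and penalizes tree complexity, which yields closed-form leaf weights and a split-gain score.

```latex
\mathcal{L}^{(t)} = \sum_{i=1}^{n} l\big(y_i,\ \hat{y}_i^{(t-1)} + f_t(x_i)\big) + \Omega(f_t),
\qquad \Omega(f) = \gamma T + \tfrac{1}{2}\lambda \sum_{j=1}^{T} w_j^2

\mathcal{L}^{(t)} \approx \sum_{i=1}^{n} \Big[ g_i f_t(x_i) + \tfrac{1}{2} h_i f_t(x_i)^2 \Big] + \Omega(f_t) + \text{const},
\qquad g_i = \frac{\partial l(y_i, \hat{y})}{\partial \hat{y}}\Big|_{\hat{y}=\hat{y}_i^{(t-1)}},\quad
h_i = \frac{\partial^2 l(y_i, \hat{y})}{\partial \hat{y}^2}\Big|_{\hat{y}=\hat{y}_i^{(t-1)}}

% Group samples by leaf: G_j = \sum_{i \in I_j} g_i,\ H_j = \sum_{i \in I_j} h_i
\tilde{\mathcal{L}}^{(t)} = \sum_{j=1}^{T} \Big[ G_j w_j + \tfrac{1}{2}(H_j + \lambda) w_j^2 \Big] + \gamma T
\;\Rightarrow\; w_j^{*} = -\frac{G_j}{H_j + \lambda},\qquad
\tilde{\mathcal{L}}^{*} = -\tfrac{1}{2} \sum_{j=1}^{T} \frac{G_j^2}{H_j + \lambda} + \gamma T

\text{Gain} = \tfrac{1}{2}\left[ \frac{G_L^2}{H_L + \lambda} + \frac{G_R^2}{H_R + \lambda}
  - \frac{(G_L + G_R)^2}{H_L + H_R + \lambda} \right] - \gamma
```

A split is worth making only when the gain is positive, which is how γ acts as a built-in pruning threshold.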
Data Scientist Technical medium

A client provides you with a dataset where 40% of the values in a critical column are missing. Walk me through your strategy for handling this.

#Missing Data #Imputation #EDA
Data Scientist Technical medium

Explain the concept of Data Drift and Concept Drift. How would you monitor for these in a deployed pricing optimization model?

#Model Monitoring #Data Drift #Concept Drift #MLOps
Data Scientist Technical medium

How do you evaluate the performance of an unsupervised clustering model, such as K-Means, when you don't have ground truth labels?

#Clustering #Unsupervised Learning #Evaluation Metrics
Data Scientist Technical medium

What is the curse of dimensionality, and how does it affect distance-based algorithms like KNN? How do you mitigate it?

#Dimensionality Reduction #KNN #PCA #Feature Engineering
Data Scientist Technical hard

Explain the architecture and mathematical intuition behind Transformers. Why have they largely replaced RNNs/LSTMs in NLP tasks?

#Transformers #Attention Mechanism #NLP #Deep Learning
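A toy, single-head sketch of the scaled dot-product attention at the core of the Transformer, softmax(Q K^T / sqrt(d_k)) V, written in pure Python. Learned projections, masking, positional encodings and the multi-head split are all omitted as simplifications. Every query attends to every key in one step, so all positions can be processed in parallel, whereas an RNN or LSTM must step through the sequence and carry information forward through a hidden state.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(Q, K, V):
    """Q: n x d_k, K: m x d_k, V: m x d_v (lists of lists). Returns n x d_v."""
    d_k = len(K[0])
    out = []
    for q in Q:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)  # non-negative, sums to 1
        # Output is the attention-weighted average of the value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))])
    return out


# Hypothetical 2-dimensional example: the query is closer to the first key.
print(scaled_dot_product_attention([[1.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]], [[10.0, 0.0], [0.0, 10.0]]))
```

The sqrt(d_k) scaling keeps dot products from growing with dimension, which would otherwise push the softmax toward near one-hot weights with vanishing gradients.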

Meet Your Interviewers

The "Standard" Interviewer

Senior Engineer

Focuses on core competencies, system constraints, and clear communication.

Unwritten Rules

Think Out Loud

Always explain your thought process before writing code or drawing architecture.
