EY
Ernst & Young Global Limited, a multinational professional services partnership.
4 Rounds
~21 Days
Medium
The Interview Loop
Recruiter Screen (30 min)
Standard fit check, behavioral questions, and resume overview.
Technical Loop (3-4 Rounds)
Deep dive into domain knowledge, coding, and system design.
Interview Question Bank
Data Scientist • Behavioral • medium
Tell me about a time you had to explain a highly technical data science concept or model limitation to a client or senior stakeholder who had no technical background.
#Stakeholder Management
#Communication
#Consulting Skills
Data Scientist • Behavioral • medium
Describe a situation where a client provided you with data that was incredibly messy, incomplete, or structurally flawed. How did you handle it to deliver the project on time?
#Data Cleaning
#Client Management
#Adaptability
Data Scientist • Behavioral • hard
Tell me about a time you disagreed with a senior manager or partner regarding the technical approach to a data science problem. How did you resolve it?
#Conflict Resolution
#Communication
#Influence
Data Scientist • Behavioral • medium
Describe a time when a machine learning model you deployed failed or degraded in production. How did you diagnose and fix the issue?
#MLOps
#Troubleshooting
#Accountability
Data Scientist • Behavioral • easy
Why do you want to work at EY, and how do you see the role of a Data Scientist differing in a Big 4 consulting firm compared to a traditional tech company?
#Company Knowledge
#Career Goals
#Consulting Mindset
Data Scientist • Behavioral • medium
Tell me about a time you had to manage multiple competing priorities across different client engagements. How did you ensure nothing fell through the cracks?
#Prioritization
#Organization
#Consulting Skills
Data Scientist • Behavioral • medium
Describe a time when you realized halfway through a project that your initial machine learning approach was not going to work. How did you pivot?
#Problem Solving
#Agile
#Resilience
Data Scientist • Coding • medium
Write a SQL query using window functions to find the top 3 clients by revenue in each of EY's global service lines (Assurance, Consulting, Tax, Strategy and Transactions) for the last fiscal year.
#SQL
#Window Functions
#Data Aggregation
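One possible answer, sketched with Python's built-in `sqlite3` so it can be run locally. The `revenue` table, its columns, and the sample rows are hypothetical, and the fiscal year is passed as a parameter because the question does not define it. `DENSE_RANK` keeps ties, so a service line can return more than three clients; `ROW_NUMBER` would be the choice if exactly three rows are required. Window functions need SQLite 3.25 or later.

```python
import sqlite3

# Aggregate revenue per client first, then rank within each service line.
TOP_CLIENTS_SQL = """
WITH totals AS (
    SELECT service_line, client, SUM(amount) AS total_revenue
    FROM revenue
    WHERE fiscal_year = ?
    GROUP BY service_line, client
),
ranked AS (
    SELECT service_line, client, total_revenue,
           DENSE_RANK() OVER (
               PARTITION BY service_line
               ORDER BY total_revenue DESC
           ) AS rnk
    FROM totals
)
SELECT service_line, client, total_revenue
FROM ranked
WHERE rnk <= 3
ORDER BY service_line, rnk, client
"""


def top_clients(conn, fiscal_year):
    """Return (service_line, client, total_revenue) rows for the top 3 clients."""
    return conn.execute(TOP_CLIENTS_SQL, (fiscal_year,)).fetchall()


# Hypothetical sample data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE revenue (client TEXT, service_line TEXT, "
    "fiscal_year INTEGER, amount INTEGER)"
)
conn.executemany(
    "INSERT INTO revenue VALUES (?, ?, ?, ?)",
    [
        ("A", "Tax", 2024, 100), ("A", "Tax", 2024, 50),
        ("B", "Tax", 2024, 90), ("C", "Tax", 2024, 80),
        ("D", "Tax", 2024, 70), ("D", "Tax", 2023, 999),
        ("X", "Assurance", 2024, 10), ("Y", "Assurance", 2024, 20),
    ],
)
top_clients_result = top_clients(conn, 2024)
```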
Data Scientist • Coding • easy
Given an array of transaction amounts and a target fraudulent sum, write a function to return the indices of the two transactions that add up exactly to the target sum.
#Arrays
#Hash Maps
#Time Complexity
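A minimal sketch of the hash-map approach this question is usually after: one pass, O(n) time, O(n) extra space. The function name is illustrative, and amounts are assumed to be integers (for example, cents), since exact equality on floating-point sums is unreliable.

```python
from typing import Optional


def find_pair_indices(amounts: list[int], target: int) -> Optional[tuple[int, int]]:
    """Return indices of two transactions that sum to target, or None."""
    seen: dict[int, int] = {}  # amount -> index where it first appeared
    for i, amount in enumerate(amounts):
        complement = target - amount
        if complement in seen:
            return seen[complement], i
        # Keep the earliest index for each amount.
        seen.setdefault(amount, i)
    return None
```

For example, `find_pair_indices([1200, 4550, 300, 700], 1000)` pairs the transactions at indices 2 and 3.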
Data Scientist • Coding • medium
Write a Python function using Pandas to merge a client CRM dataset with a transaction log dataset, ensuring that missing values in the 'industry' column are imputed with the most frequent industry for that region.
#Python
#Pandas
#Data Cleaning
#Imputation
Data Scientist • Coding • medium
Write a SQL query to calculate the month-over-month percentage growth in billable hours for each consultant.
#SQL
#Window Functions
#Time Series Data
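A hedged sketch of one answer using `LAG`, run through Python's built-in `sqlite3`. The `billable_hours` table, its columns, and the sample rows are hypothetical. It assumes months are stored as `YYYY-MM` text and that each consultant has a row for every month; with gaps, `LAG` compares against the previous recorded month rather than the previous calendar month. `NULLIF` avoids division by zero when the prior month had no hours.

```python
import sqlite3

MOM_GROWTH_SQL = """
WITH monthly AS (
    SELECT consultant, month, SUM(hours) AS hours
    FROM billable_hours
    GROUP BY consultant, month
),
with_prev AS (
    SELECT consultant, month, hours,
           LAG(hours) OVER (
               PARTITION BY consultant ORDER BY month
           ) AS prev_hours
    FROM monthly
)
SELECT consultant, month, hours,
       ROUND(100.0 * (hours - prev_hours) / NULLIF(prev_hours, 0), 1)
           AS pct_growth
FROM with_prev
ORDER BY consultant, month
"""

# Hypothetical sample data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE billable_hours (consultant TEXT, month TEXT, hours INTEGER)"
)
conn.executemany(
    "INSERT INTO billable_hours VALUES (?, ?, ?)",
    [
        ("ana", "2024-01", 100), ("ana", "2024-02", 120),
        ("ana", "2024-03", 90),
        ("ben", "2024-01", 0), ("ben", "2024-02", 40),
    ],
)
mom_growth_result = conn.execute(MOM_GROWTH_SQL).fetchall()
```

The first month for each consultant has no prior value, so its growth is NULL (`None` in Python).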
Data Scientist • Coding • easy
Given a string containing just the characters '(', ')', '{', '}', '[' and ']', determine if the input string is valid. This is useful for parsing nested JSON logs from Azure.
#Stacks
#String Parsing
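A minimal sketch of the stack-based check, assuming (as the question states) that the input contains only bracket characters. The function name is illustrative.

```python
def is_valid_brackets(s: str) -> bool:
    """Return True if every bracket is closed by the right type, in order."""
    pairs = {")": "(", "}": "{", "]": "["}
    stack = []
    for ch in s:
        if ch in pairs:
            # A closer must match the most recent unclosed opener.
            if not stack or stack.pop() != pairs[ch]:
                return False
        else:
            stack.append(ch)
    # Any opener left on the stack was never closed.
    return not stack
```

Each character is pushed or popped at most once, so the check runs in O(n) time with O(n) worst-case space.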
Data Scientist • Coding • medium
Write a Python script to find the Kth largest transaction amount in an unsorted array of millions of transactions.
#Heaps
#Sorting
#Optimization
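One way to approach this, sketched with a size-k min-heap from the standard library: O(n log k) time and O(k) memory, which avoids sorting millions of values. The function name is illustrative, and duplicates are counted as separate positions (an assumption the interviewer may want to change).

```python
import heapq
from typing import Iterable


def kth_largest(amounts: Iterable[float], k: int) -> float:
    """Return the k-th largest amount, counting duplicates separately."""
    if k < 1:
        raise ValueError("k must be at least 1")
    heap: list[float] = []  # min-heap holding the k largest values seen so far
    for amount in amounts:
        if len(heap) < k:
            heapq.heappush(heap, amount)
        elif amount > heap[0]:
            # Replace the smallest of the current top k.
            heapq.heapreplace(heap, amount)
    if len(heap) < k:
        raise ValueError("fewer than k amounts were provided")
    return heap[0]
```

Because it accepts any iterable, the same function works on a generator that streams transactions from disk.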
Data Scientist • Coding • medium
You have a table of employee project assignments with start and end dates. Write a SQL query to find any overlapping project assignments for the same employee.
#SQL
#Self Joins
#Date Functions
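A sketch of the self-join answer, run through Python's built-in `sqlite3`. The `assignments` table, its columns, and the sample rows are hypothetical. It assumes ISO-formatted date text (which compares correctly as strings) and inclusive end dates, so an assignment ending on the day another starts counts as an overlap. The `a.id < b.id` condition reports each pair once.

```python
import sqlite3

# Two ranges overlap when each one starts on or before the other ends.
OVERLAPS_SQL = """
SELECT a.employee, a.project, b.project
FROM assignments AS a
JOIN assignments AS b
  ON a.employee = b.employee
 AND a.id < b.id
 AND a.start_date <= b.end_date
 AND b.start_date <= a.end_date
ORDER BY a.employee, a.id, b.id
"""

# Hypothetical sample data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE assignments (id INTEGER PRIMARY KEY, employee TEXT, "
    "project TEXT, start_date TEXT, end_date TEXT)"
)
conn.executemany(
    "INSERT INTO assignments VALUES (?, ?, ?, ?, ?)",
    [
        (1, "ana", "P1", "2024-01-01", "2024-03-31"),
        (2, "ana", "P2", "2024-03-15", "2024-06-30"),
        (3, "ana", "P3", "2024-07-01", "2024-09-30"),
        (4, "ben", "P1", "2024-01-01", "2024-02-01"),
        (5, "ben", "P4", "2024-02-02", "2024-03-01"),
    ],
)
overlap_result = conn.execute(OVERLAPS_SQL).fetchall()
```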
Data Scientist • Coding • medium
Given a list of intervals representing meeting times, write a function to merge all overlapping intervals. This is useful for calculating continuous billable periods.
#Arrays
#Sorting
#Intervals
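A minimal sketch of the usual sort-then-sweep solution, O(n log n) overall. The function name is illustrative, and intervals that merely touch (one ends exactly when the next starts) are treated as overlapping, which is an assumption worth confirming with the interviewer.

```python
def merge_intervals(intervals: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Merge overlapping (start, end) intervals and return them sorted."""
    merged: list[list[int]] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the last merged interval: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(start, end) for start, end in merged]
```

For example, meetings `(9, 11)`, `(10, 12)`, `(13, 15)`, and `(15, 16)` collapse into two continuous periods.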
Data Scientist • Coding • hard
Write a Python function to calculate the TF-IDF scores for a corpus of documents from scratch (without using scikit-learn).
#NLP
#Math
#Python
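A minimal from-scratch sketch, assuming documents are already tokenized into lists of terms. It uses one common textbook definition: term frequency as count divided by document length, and inverse document frequency as the natural log of N over document frequency, with no smoothing. Libraries often use smoothed variants, so their numbers may differ from these.

```python
import math
from collections import Counter


def tf_idf(corpus: list[list[str]]) -> list[dict[str, float]]:
    """Return one {term: score} dict per document in the corpus."""
    n_docs = len(corpus)

    # Document frequency: in how many documents does each term appear?
    doc_freq: Counter = Counter()
    for doc in corpus:
        doc_freq.update(set(doc))

    scores = []
    for doc in corpus:
        counts = Counter(doc)
        total_terms = len(doc)
        scores.append(
            {
                term: (count / total_terms) * math.log(n_docs / doc_freq[term])
                for term, count in counts.items()
            }
        )
    return scores
```

Under this definition, a term that appears in every document scores zero, because its inverse document frequency is log(1).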
Data Scientist • Coding • easy
Write a SQL query to find the 3rd highest salary from an Employee table. If there is no 3rd highest salary, return null.
#SQL
#Subqueries
#LIMIT/OFFSET
Data Scientist • Coding • easy
Given a dataset of historical stock prices, write a Python algorithm to calculate the maximum profit you could achieve from a single buy and a single sell transaction.
#Arrays
#Dynamic Programming
#Optimization
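A minimal single-pass sketch: track the lowest price seen so far and the best profit from selling at the current price. It assumes prices are in chronological order and that making no trade (profit 0) is allowed. The function name is illustrative.

```python
def max_profit(prices: list[float]) -> float:
    """Return the best profit from one buy followed by one later sell."""
    best = 0
    lowest = float("inf")  # cheapest buy price seen so far
    for price in prices:
        lowest = min(lowest, price)
        best = max(best, price - lowest)
    return best
```

This runs in O(n) time and O(1) space, compared with O(n²) for checking every buy/sell pair.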
Data Scientist • System Design • hard
Design an end-to-end machine learning pipeline on Microsoft Azure to predict client churn for our tax advisory services. Walk me through data ingestion, model training, deployment, and monitoring.
#Azure ML
#MLOps
#Pipeline Design
#Model Deployment
Data Scientist • System Design • hard
We are conducting due diligence for an M&A deal and have thousands of PDF contracts. How would you design an NLP solution to extract specific liability clauses and summarize them?
#NLP
#OCR
#Information Extraction
#LLMs
Data Scientist • System Design • hard
Design a real-time anomaly detection system for a global bank's SWIFT transactions to prevent money laundering.
#Real-time Processing
#Anomaly Detection
#Streaming
#Kafka
Data Scientist • System Design • hard
Design a recommendation engine to cross-sell EY's advisory services to existing audit clients, ensuring strict compliance with independence rules.
#Recommendation Systems
#Collaborative Filtering
#Data Privacy
#Business Logic
Data Scientist • System Design • hard
Design a scalable data architecture to ingest, process, and analyze daily point-of-sale data from 10,000 retail locations for a supply chain optimization project.
#Big Data
#Data Warehousing
#ETL/ELT
#Cloud Architecture
Data Scientist • System Design • medium
Design a system to automatically redact Personally Identifiable Information (PII) from millions of client emails and documents before they are used for model training.
#Data Privacy
#NLP
#Data Engineering
#Security
Data Scientist • Technical • medium
In financial crime consulting, we often deal with highly imbalanced datasets (e.g., 0.01% fraud cases). How would you approach building and evaluating a machine learning model for this scenario?
#Imbalanced Data
#Fraud Detection
#Evaluation Metrics
#SMOTE
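A small illustration of why accuracy is a poor metric at this level of imbalance, which is usually the first point a strong answer makes. The helper below is a hypothetical sketch, not a library API. With 1 fraud case in 10,000 (0.01%), a model that never flags fraud is 99.99% accurate yet has zero recall.

```python
def confusion_metrics(y_true: list[int], y_pred: list[int]) -> dict[str, float]:
    """Compute accuracy, precision, and recall for binary labels (1 = fraud)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Define the ratio as 0.0 when its denominator is empty.
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# A "model" that predicts no fraud at all on 10,000 transactions.
y_true = [1] + [0] * 9999
y_pred = [0] * 10000
baseline_metrics = confusion_metrics(y_true, y_pred)
```

Precision, recall, and threshold-independent summaries such as the precision-recall curve are therefore the more informative starting point here.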
Data Scientist • Technical • hard
Explain how you would use SHAP or LIME to explain a complex XGBoost credit risk model to a non-technical audit partner who needs to sign off on its regulatory compliance.
#Model Interpretability
#SHAP
#LIME
#Stakeholder Communication
Data Scientist • Technical • medium
What is the bias-variance tradeoff? How does it apply when tuning hyperparameters for a Random Forest model predicting audit anomalies?
#Model Theory
#Bias-Variance Tradeoff
#Random Forest
#Hyperparameter Tuning
Data Scientist • Technical • medium
Explain the difference between L1 (Lasso) and L2 (Ridge) regularization. In what EY consulting scenario would you prefer L1 over L2?
#Regularization
#Linear Models
#Feature Selection
Data Scientist • Technical • medium
How do you check for stationarity in a time series dataset, and why is it a necessary step before building an ARIMA model for supply chain demand forecasting?
#Time Series Forecasting
#ARIMA
#Stationarity
#Statistical Tests
Data Scientist • Technical • hard
What is multicollinearity? How do you detect it, and how does it impact a logistic regression model used for credit scoring?
#Regression
#Statistics
#Multicollinearity
#VIF
Data Scientist • Technical • hard
Explain the mathematical intuition behind Gradient Boosting. How does it differ from AdaBoost?
#Ensemble Methods
#Gradient Boosting
#Algorithm Theory
Data Scientist • Technical • medium
How would you evaluate the performance of an Unsupervised Learning model, specifically a K-Means clustering algorithm used to segment retail customers?
#Unsupervised Learning
#Clustering
#Evaluation Metrics
Data Scientist • Technical • medium
Explain the concept of Data Drift and Concept Drift. How would you implement monitoring for these in a deployed Azure ML model?
#Model Monitoring
#Data Drift
#Concept Drift
#Azure ML
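One platform-independent way to quantify data drift on a single numeric feature is the Population Stability Index (PSI), sketched below in plain Python. This is not an Azure ML API; the function name, the equal-width binning on the training range, and the `eps` floor for empty bins are all assumptions made for illustration. A monitoring job could compute a statistic like this on a schedule and alert when it exceeds an agreed threshold.

```python
import math


def population_stability_index(
    expected: list[float],
    actual: list[float],
    n_bins: int = 10,
    eps: float = 1e-6,
) -> float:
    """PSI between a training-time sample and a production sample."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / n_bins or 1.0  # guard against a constant feature

    def proportions(values: list[float]) -> list[float]:
        counts = [0] * n_bins
        for v in values:
            idx = int((v - lo) / width)
            # Clamp values outside the training range into the edge bins.
            counts[min(max(idx, 0), n_bins - 1)] += 1
        # Floor at eps so empty bins do not cause log(0) or division by zero.
        return [max(c / len(values), eps) for c in counts]

    p = proportions(expected)
    q = proportions(actual)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))
```

Identical distributions give a PSI of zero, and the value grows as the production distribution moves away from the training one. Concept drift, by contrast, changes the relationship between features and the label, so it is usually tracked through model performance on newly labelled data rather than through input statistics alone.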
Data Scientist • Technical • medium
What are Word Embeddings? Compare Word2Vec with transformer-based embeddings like BERT in the context of analyzing financial sentiment in news articles.
#NLP
#Word Embeddings
#BERT
#Word2Vec
Data Scientist • Technical • hard
How do you ensure that a machine learning model used for HR recruiting or loan approvals does not exhibit algorithmic bias?
#Algorithmic Fairness
#Bias Mitigation
#Responsible AI
Meet Your Interviewers
The "Standard" Interviewer
Senior Engineer
Focuses on core competencies, system constraints, and clear communication.
Unwritten Rules
Think Out Loud
Always explain your thought process before writing code or drawing architecture.