EY

Ernst & Young Global Limited, a multinational professional services partnership.

4 Rounds | ~21 Days | Medium difficulty

The Interview Loop

Recruiter Screen (30 min)

Standard fit check, behavioral questions, and resume overview.

Technical Loop (3-4 Rounds)

Deep dive into domain knowledge, coding, and system design.

Interview Question Bank

Data Scientist Behavioral medium

Tell me about a time you had to explain a highly technical data science concept or model limitation to a client or senior stakeholder who had no technical background.

#Stakeholder Management #Communication #Consulting Skills
Data Scientist Behavioral medium

Describe a situation where a client provided you with data that was incredibly messy, incomplete, or structurally flawed. How did you handle it to deliver the project on time?

#Data Cleaning #Client Management #Adaptability
Data Scientist Behavioral hard

Tell me about a time you disagreed with a senior manager or partner regarding the technical approach to a data science problem. How did you resolve it?

#Conflict Resolution #Communication #Influence
Data Scientist Behavioral medium

Describe a time when a machine learning model you deployed failed or degraded in production. How did you diagnose and fix the issue?

#MLOps #Troubleshooting #Accountability
Data Scientist Behavioral easy

Why do you want to work at EY, and how do you see the role of a Data Scientist differing in a Big 4 consulting firm compared to a traditional tech company?

#Company Knowledge #Career Goals #Consulting Mindset
Data Scientist Behavioral medium

Tell me about a time you had to manage multiple competing priorities across different client engagements. How did you ensure nothing fell through the cracks?

#Prioritization #Organization #Consulting Skills
Data Scientist Behavioral medium

Describe a time when you realized halfway through a project that your initial machine learning approach was not going to work. How did you pivot?

#Problem Solving #Agile #Resilience
Data Scientist Coding medium

Write a SQL query using window functions to find the top 3 clients by revenue in each of EY's global service lines (Assurance, Consulting, Tax, Strategy and Transactions) for the last fiscal year.

#SQL #Window Functions #Data Aggregation
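One way to answer, sketched in Python against an in-memory SQLite database so it runs on its own. Several things here are assumptions:

- The `client_revenue` table, its columns, and the sample rows are hypothetical.
- "Last fiscal year" is taken to mean the latest `fiscal_year` present.
- SQLite 3.25 or newer is required for window-function support.

Revenue is summed per client first and then ranked within each service line.

```python
import sqlite3

# Hypothetical schema: one row per invoice, tagged with service line and fiscal year.
QUERY = """
WITH yearly AS (
    SELECT service_line, client, SUM(revenue) AS total_revenue
    FROM client_revenue
    WHERE fiscal_year = (SELECT MAX(fiscal_year) FROM client_revenue)
    GROUP BY service_line, client
),
ranked AS (
    SELECT service_line, client, total_revenue,
           DENSE_RANK() OVER (
               PARTITION BY service_line ORDER BY total_revenue DESC
           ) AS rnk
    FROM yearly
)
SELECT service_line, client, total_revenue, rnk
FROM ranked
WHERE rnk <= 3
ORDER BY service_line, rnk, client;
"""


def top_clients(rows):
    """Load sample rows into an in-memory DB and return the top-3 clients per service line."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE client_revenue (client TEXT, service_line TEXT, fiscal_year INT, revenue REAL)"
    )
    conn.executemany("INSERT INTO client_revenue VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


sample = [
    ("Acme", "Tax", 2024, 500.0), ("Acme", "Tax", 2024, 200.0),
    ("Beta", "Tax", 2024, 650.0), ("Gamma", "Tax", 2024, 300.0),
    ("Delta", "Tax", 2024, 100.0), ("Omega", "Tax", 2023, 9999.0),
    ("Acme", "Assurance", 2024, 50.0),
]
for row in top_clients(sample):
    print(row)
```

`DENSE_RANK` lets tied clients share a rank, so a service line may return more than three rows. `ROW_NUMBER` would force exactly three.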
Data Scientist Coding easy

Given an array of transaction amounts and a target sum (for example, a total flagged as potentially fraudulent), write a function that returns the indices of the two transactions that add up exactly to the target.

#Arrays #Hash Maps #Time Complexity
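A single pass with a hash map from amount to index finds the pair in O(n) time and O(n) extra space, instead of checking all O(n²) pairs. Amounts are assumed to be integers (for example, cents), because exact equality on floating-point currency values is unreliable.

```python
def two_sum(amounts, target):
    """Return indices (i, j), i < j, of two amounts summing to target, or None if no pair exists."""
    seen = {}  # amount -> index of its first occurrence
    for j, amount in enumerate(amounts):
        complement = target - amount
        if complement in seen:
            return seen[complement], j
        seen[amount] = j
    return None


print(two_sum([120, 45, 300, 80], 125))  # (1, 3)
```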
Data Scientist Coding medium

Write a Python function using Pandas to merge a client CRM dataset with a transaction log dataset, ensuring that missing values in the 'industry' column are imputed with the most frequent industry for that region.

#Python #Pandas #Data Cleaning #Imputation
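A minimal pandas sketch. It assumes the CRM table has one row per `client_id` with `region` and `industry` columns, and that the transaction log references clients by `client_id`. All column names except `industry` are assumptions.

Imputation runs on the CRM table before the merge, so each client counts once in the regional mode rather than once per transaction.

```python
import pandas as pd


def merge_and_impute(crm: pd.DataFrame, transactions: pd.DataFrame) -> pd.DataFrame:
    """Fill missing CRM industries with the most frequent industry in the same region,
    then left-join client attributes onto every transaction."""
    crm = crm.copy()
    # Most frequent known industry per region (Series.mode sorts ties, so the first is deterministic).
    mode_by_region = (
        crm.dropna(subset=["industry"])
        .groupby("region")["industry"]
        .agg(lambda s: s.mode().iloc[0])
    )
    # Only NaN industries are filled; rows with an unknown region stay NaN.
    crm["industry"] = crm["industry"].fillna(crm["region"].map(mode_by_region))
    # validate= raises if client_id is not unique in the CRM table.
    return transactions.merge(crm, on="client_id", how="left", validate="many_to_one")
```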
Data Scientist Coding medium

Write a SQL query to calculate the month-over-month percentage growth in billable hours for each consultant.

#SQL #Window Functions #Time Series Data
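A sketch using `LAG`, run through SQLite so it is self-contained. The following are assumptions:

- The `timesheets(consultant, work_date, hours)` table and the sample data are hypothetical.
- Dates are ISO-8601 strings.
- SQLite 3.25 or newer is required for window functions.

```python
import sqlite3

QUERY = """
WITH monthly AS (
    SELECT consultant,
           strftime('%Y-%m', work_date) AS month,
           SUM(hours) AS total_hours
    FROM timesheets
    GROUP BY consultant, strftime('%Y-%m', work_date)
),
lagged AS (
    SELECT consultant, month, total_hours,
           LAG(total_hours) OVER (PARTITION BY consultant ORDER BY month) AS prev_hours
    FROM monthly
)
SELECT consultant, month, total_hours,
       ROUND(100.0 * (total_hours - prev_hours) / NULLIF(prev_hours, 0), 2) AS mom_growth_pct
FROM lagged
ORDER BY consultant, month;
"""


def mom_growth(rows):
    """Return (consultant, month, hours, growth %) rows; growth is NULL/None for a consultant's first month."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE timesheets (consultant TEXT, work_date TEXT, hours REAL)")
    conn.executemany("INSERT INTO timesheets VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


sample = [
    ("Ana", "2024-01-03", 8.0), ("Ana", "2024-01-15", 12.0), ("Ana", "2024-02-10", 25.0),
    ("Ben", "2024-01-05", 10.0), ("Ben", "2024-02-05", 5.0),
]
for row in mom_growth(sample):
    print(row)
```

`LAG` returns the previous month on record, which is not necessarily the previous calendar month. If consultants can have months with no hours, join against a calendar table first. `NULLIF` turns a zero-hour prior month into NULL, which avoids a division-by-zero error in engines that raise one.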
Data Scientist Coding easy

Given a string containing only the characters '(', ')', '{', '}', '[' and ']', determine whether it is valid: every opening bracket must be closed by a bracket of the same type, in the correct order. This kind of check is useful when parsing nested JSON logs from Azure.

#Stacks #String Parsing
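A stack-based sketch. The function treats a string as valid when every opening bracket is closed by the same type in the correct order. It pushes each opening bracket and, on a closing bracket, pops and checks for a match. The string is valid if no mismatch occurs and the stack ends empty.

Non-bracket characters are skipped here, which goes beyond the original prompt. For real JSON, brackets inside string literals would also need to be ignored, and this sketch does not handle that.

```python
def is_valid(s: str) -> bool:
    """Return True if every bracket in s is closed by the same type in the correct order."""
    pairs = {")": "(", "]": "[", "}": "{"}  # closing bracket -> expected opening bracket
    stack = []
    for ch in s:
        if ch in "([{":
            stack.append(ch)
        elif ch in pairs:
            # A closer with nothing open, or the wrong opener on top, is invalid.
            if not stack or stack.pop() != pairs[ch]:
                return False
    return not stack  # leftover openers mean unclosed brackets


print(is_valid("{[()]}"), is_valid("([)]"))  # True False
```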
Data Scientist Coding medium

Write a Python script to find the Kth largest transaction amount in an unsorted array of millions of transactions.

#Heaps #Sorting #Optimization
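Sorting millions of values costs O(n log n) and keeps everything in memory. A size-k min-heap needs O(n log k) time and only O(k) memory, and it also works on a stream (for example, rows read in chunks). A sketch with the standard-library `heapq` module:

```python
import heapq
from typing import Iterable


def kth_largest(amounts: Iterable[float], k: int) -> float:
    """Return the k-th largest value using a size-k min-heap: O(n log k) time, O(k) memory."""
    if k < 1:
        raise ValueError("k must be >= 1")
    heap = []  # holds the k largest values seen so far; heap[0] is the smallest of them
    for x in amounts:
        if len(heap) < k:
            heapq.heappush(heap, x)
        elif x > heap[0]:
            heapq.heapreplace(heap, x)  # pop the smallest, push x
    if len(heap) < k:
        raise ValueError("fewer than k amounts")
    return heap[0]
```

For a one-off call on an in-memory list, `heapq.nlargest(k, amounts)[-1]` is equivalent. Quickselect gives O(n) average time but needs the whole array in memory.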
Data Scientist Coding medium

You have a table of employee project assignments with start and end dates. Write a SQL query to find any overlapping project assignments for the same employee.

#SQL #Self Joins #Date Functions
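A self-join sketch, executed with SQLite for self-containment. It assumes a hypothetical table `assignments(assignment_id, employee_id, project, start_date, end_date)` with inclusive end dates. Two ranges overlap exactly when each one starts on or before the other ends.

```python
import sqlite3

QUERY = """
SELECT a.employee_id,
       a.assignment_id, a.project,
       b.assignment_id, b.project
FROM assignments AS a
JOIN assignments AS b
  ON a.employee_id = b.employee_id
 AND a.assignment_id < b.assignment_id   -- skip self-matches, report each pair once
 AND a.start_date <= b.end_date
 AND b.start_date <= a.end_date
ORDER BY a.employee_id, a.assignment_id, b.assignment_id;
"""


def find_overlaps(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE assignments (assignment_id INTEGER, employee_id TEXT, "
        "project TEXT, start_date TEXT, end_date TEXT)"
    )
    conn.executemany("INSERT INTO assignments VALUES (?, ?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


sample = [
    (1, "E1", "Audit", "2024-01-01", "2024-03-31"),
    (2, "E1", "Tax", "2024-03-15", "2024-05-31"),
    (3, "E1", "Risk", "2024-06-01", "2024-06-30"),
    (4, "E2", "Audit", "2024-01-01", "2024-02-28"),
    (5, "E2", "Deals", "2024-03-01", "2024-04-30"),
]
print(find_overlaps(sample))
```

ISO-8601 date strings compare correctly as text in SQLite. In other engines, use native DATE columns.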
Data Scientist Coding medium

Given a list of intervals representing meeting times, write a function to merge all overlapping intervals. This is useful for calculating continuous billable periods.

#Arrays #Sorting #Intervals
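The usual approach sorts by start time and then sweeps once, extending the last merged interval whenever the next one starts before it ends. The cost is O(n log n) overall. Intervals are treated as closed here, so back-to-back meetings (one ending at 10, the next starting at 10) merge. That boundary choice is an assumption.

```python
def merge_intervals(intervals):
    """Merge overlapping [start, end] intervals; touching intervals are merged as well."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)  # extend the current block
        else:
            merged.append([start, end])  # start a new block
    return merged


blocks = merge_intervals([[1, 3], [2, 6], [8, 10], [15, 18]])
print(blocks)  # [[1, 6], [8, 10], [15, 18]]
print(sum(end - start for start, end in blocks))  # 10
```

Summing `end - start` over the merged list gives total continuous billable time without double-counting overlaps.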
Data Scientist Coding hard

Write a Python function to calculate the TF-IDF scores for a corpus of documents from scratch (without using scikit-learn).

#NLP #Math #Python
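A from-scratch sketch using raw term frequency normalized by document length and the unsmoothed IDF ln(N / df). Tokenization is naive lowercase whitespace splitting. Other weighting variants exist: scikit-learn's `TfidfVectorizer`, for instance, smooths IDF and normalizes vectors by default, so its numbers will differ.

```python
import math
from collections import Counter


def tf_idf(corpus):
    """Return one {term: score} dict per document.

    tf(t, d) = count of t in d / number of tokens in d
    idf(t)   = ln(N / df(t)), where df(t) is the number of documents containing t
    """
    docs = [doc.lower().split() for doc in corpus]
    n_docs = len(docs)
    df = Counter(term for tokens in docs for term in set(tokens))
    idf = {term: math.log(n_docs / count) for term, count in df.items()}
    scores = []
    for tokens in docs:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({t: (c / total) * idf[t] for t, c in counts.items()})
    return scores


corpus = ["the audit report", "the tax report", "the audit"]
print(tf_idf(corpus)[1])
```

A term that appears in every document ("the") gets IDF ln(1) = 0, so it contributes nothing.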
Data Scientist Coding easy

Write a SQL query to find the third-highest distinct salary in an Employee table. If there is no third-highest salary, return NULL.

#SQL #Subqueries #LIMIT/OFFSET
Data Scientist Coding easy

Given a dataset of historical stock prices, write a Python algorithm to calculate the maximum profit you could achieve from a single buy and a single sell transaction.

#Arrays #Dynamic Programming #Optimization
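Tracking the lowest price seen so far, together with the best profit from selling at the current price, gives an O(n) time, O(1) space solution. This is the single-transaction case, so no full dynamic-programming table is needed. Returning 0 when prices only fall (no trade is made) is an assumption.

```python
def max_profit(prices):
    """Best profit from one buy followed by one later sell; 0 if no profitable trade exists."""
    min_price = float("inf")
    best = 0
    for price in prices:
        min_price = min(min_price, price)         # cheapest buy up to today
        best = max(best, price - min_price)       # sell today against that buy
    return best


print(max_profit([7, 1, 5, 3, 6, 4]))  # 5
```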
Data Scientist System Design hard

Design an end-to-end machine learning pipeline on Microsoft Azure to predict client churn for our tax advisory services. Walk me through data ingestion, model training, deployment, and monitoring.

#Azure ML #MLOps #Pipeline Design #Model Deployment
Data Scientist System Design hard

We are conducting due diligence for an M&A deal and have thousands of PDF contracts. How would you design an NLP solution to extract specific liability clauses and summarize them?

#NLP #OCR #Information Extraction #LLMs
Data Scientist System Design hard

Design a real-time anomaly detection system for a global bank's SWIFT transactions to prevent money laundering.

#Real-time Processing #Anomaly Detection #Streaming #Kafka
Data Scientist System Design hard

Design a recommendation engine to cross-sell EY's advisory services to existing audit clients, ensuring strict compliance with independence rules.

#Recommendation Systems #Collaborative Filtering #Data Privacy #Business Logic
Data Scientist System Design hard

Design a scalable data architecture to ingest, process, and analyze daily point-of-sale data from 10,000 retail locations for a supply chain optimization project.

#Big Data #Data Warehousing #ETL/ELT #Cloud Architecture
Data Scientist System Design medium

Design a system to automatically redact Personally Identifiable Information (PII) from millions of client emails and documents before they are used for model training.

#Data Privacy #NLP #Data Engineering #Security
Data Scientist Technical medium

In financial crime consulting, we often deal with highly imbalanced datasets (e.g., 0.01% fraud cases). How would you approach building and evaluating a machine learning model for this scenario?

#Imbalanced Data #Fraud Detection #Evaluation Metrics #SMOTE
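At a 0.01% fraud rate, accuracy is nearly meaningless, so evaluation usually centers on precision, recall, and the area under the precision-recall curve. Training-side remedies such as class weights or SMOTE are typically applied to training folds only. The dependency-free sketch below shows the accuracy trap and two of those metrics.

The function names are illustrative. `average_precision` is a from-scratch version that assumes untied scores; in practice, scikit-learn's `average_precision_score` covers this.

```python
def confusion_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = fraud)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision, "recall": recall, "f1": f1}


def average_precision(y_true, scores):
    """Mean of precision@rank taken at each true fraud case, ranking by descending score."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    n_pos = sum(y_true)
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    return total / n_pos if n_pos else 0.0


# One fraud case in 10,000: a model that never flags anything looks excellent on accuracy.
y_true = [0] * 9_999 + [1]
report = confusion_metrics(y_true, [0] * 10_000)
print(report["accuracy"], report["recall"])  # 0.9999 0.0
```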
Data Scientist Technical hard

Explain how you would use SHAP or LIME to explain a complex XGBoost credit risk model to a non-technical audit partner who needs to sign off on its regulatory compliance.

#Model Interpretability #SHAP #LIME #Stakeholder Communication
Data Scientist Technical medium

What is the bias-variance tradeoff? How does it apply when tuning hyperparameters for a Random Forest model predicting audit anomalies?

#Model Theory #Bias-Variance Tradeoff #Random Forest #Hyperparameter Tuning
Data Scientist Technical medium

Explain the difference between L1 (Lasso) and L2 (Ridge) regularization. In what EY consulting scenario would you prefer L1 over L2?

#Regularization #Linear Models #Feature Selection
Data Scientist Technical medium

How do you check for stationarity in a time series dataset, and why is it a necessary step before building an ARIMA model for supply chain demand forecasting?

#Time Series Forecasting #ARIMA #Stationarity #Statistical Tests
Data Scientist Technical medium

What is multicollinearity? How do you detect it, and how does it impact a logistic regression model used for credit scoring?

#Regression #Statistics #Multicollinearity #VIF
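The variance inflation factor of feature j is 1 / (1 - R_j²). Here R_j² comes from regressing feature j on all the other features. Values above roughly 5 to 10 are a common rule-of-thumb warning sign.

The NumPy sketch below computes VIF directly. statsmodels also provides `variance_inflation_factor` in `statsmodels.stats.outliers_influence`, which expects you to include the constant column yourself.

```python
import numpy as np


def vif(X) -> np.ndarray:
    """Variance inflation factor for each column of X (n_samples x n_features).

    VIF_j = 1 / (1 - R_j^2), with R_j^2 from an OLS regression of column j
    on all other columns plus an intercept.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ beta
        r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
        out[j] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return out


rng = np.random.default_rng(0)
a = rng.normal(size=500)
b = rng.normal(size=500)
c = a + 0.05 * rng.normal(size=500)  # nearly a copy of a
print(vif(np.column_stack([a, b, c])))
```

In a credit-scoring logistic regression, high VIFs mean inflated coefficient standard errors and unstable signs, even when predictive accuracy looks fine.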
Data Scientist Technical hard

Explain the mathematical intuition behind Gradient Boosting. How does it differ from AdaBoost?

#Ensemble Methods #Gradient Boosting #Algorithm Theory
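For squared-error loss, the negative gradient with respect to the current prediction is just the residual. Each boosting round therefore fits a weak learner to what the ensemble still gets wrong, and adds a shrunken copy of it.

The minimal NumPy sketch below uses one-feature regression stumps as the weak learners. This is an illustrative simplification: production libraries grow deeper trees and support other losses.

AdaBoost, by contrast, reweights the training examples after each round and weights each learner by its accuracy. It can be viewed as the special case of stagewise boosting with exponential loss.

```python
import numpy as np


def fit_stump(x, r):
    """Least-squares regression stump on a 1-D feature: returns (threshold, left_value, right_value)."""
    order = np.argsort(x)
    xs, rs = x[order], r[order]
    best_sse, best = np.inf, (xs[0], rs.mean(), rs.mean())
    for i in range(1, len(xs)):
        if xs[i] == xs[i - 1]:
            continue  # cannot split between equal feature values
        left, right = rs[:i], rs[i:]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best_sse:
            best_sse, best = sse, ((xs[i - 1] + xs[i]) / 2, left.mean(), right.mean())
    return best


def gradient_boost(x, y, n_rounds=100, learning_rate=0.1):
    """Gradient boosting with squared loss: each stump is fitted to the current residuals."""
    f0 = float(np.mean(y))
    pred = np.full(len(y), f0)
    stumps = []
    for _ in range(n_rounds):
        residual = y - pred  # negative gradient of 0.5 * (y - F)^2 with respect to F
        threshold, left_val, right_val = fit_stump(x, residual)
        pred += learning_rate * np.where(x <= threshold, left_val, right_val)
        stumps.append((threshold, left_val, right_val))
    return f0, learning_rate, stumps


def predict(model, x):
    f0, learning_rate, stumps = model
    out = np.full(len(x), f0)
    for threshold, left_val, right_val in stumps:
        out += learning_rate * np.where(x <= threshold, left_val, right_val)
    return out
```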
Data Scientist Technical medium

How would you evaluate the performance of an Unsupervised Learning model, specifically a K-Means clustering algorithm used to segment retail customers?

#Unsupervised Learning #Clustering #Evaluation Metrics
Data Scientist Technical medium

Explain the concepts of data drift and concept drift. How would you implement monitoring for each in a deployed Azure ML model?

#Model Monitoring #Data Drift #Concept Drift #Azure ML
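Data drift (a shift in the input distribution) can be measured without labels. Concept drift (a shift in the relationship between inputs and target) generally becomes visible only once ground-truth labels arrive, as falling performance metrics.

A widely used data-drift statistic is the Population Stability Index (PSI). What follows is a framework-agnostic NumPy sketch, not Azure ML's own API. Bin edges are taken from the baseline's deciles, which is an assumption.

```python
import numpy as np


def psi(expected, actual, n_bins=10, eps=1e-6):
    """Population Stability Index of `actual` (e.g. recent scoring data) versus a baseline.

    Bin edges are quantiles of the baseline; eps keeps empty bins from producing log(0).
    """
    expected = np.asarray(expected, dtype=float)
    actual = np.asarray(actual, dtype=float)
    inner_edges = np.quantile(expected, np.linspace(0, 1, n_bins + 1)[1:-1])
    # searchsorted maps each value to a bin index in 0..n_bins-1
    e_counts = np.bincount(np.searchsorted(inner_edges, expected, side="right"), minlength=n_bins)
    a_counts = np.bincount(np.searchsorted(inner_edges, actual, side="right"), minlength=n_bins)
    e_frac = np.clip(e_counts / len(expected), eps, None)
    a_frac = np.clip(a_counts / len(actual), eps, None)
    return float(np.sum((a_frac - e_frac) * np.log(a_frac / e_frac)))


rng = np.random.default_rng(42)
baseline = rng.normal(0.0, 1.0, 10_000)
print(psi(baseline, rng.normal(0.0, 1.0, 10_000)))  # small: no drift
print(psi(baseline, rng.normal(0.5, 1.0, 10_000)))  # larger: mean has shifted
```

A scheduled job could compute PSI per feature against the training data and alert at the rule-of-thumb thresholds of about 0.1 (moderate shift) and 0.25 (major shift). Those thresholds are conventions, not fixed standards.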
Data Scientist Technical medium

What are Word Embeddings? Compare Word2Vec with transformer-based embeddings like BERT in the context of analyzing financial sentiment in news articles.

#NLP #Word Embeddings #BERT #Word2Vec
Data Scientist Technical hard

How do you ensure that a machine learning model used for HR recruiting or loan approvals does not exhibit algorithmic bias?

#Algorithmic Fairness #Bias Mitigation #Responsible AI
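Bias checks usually start by measuring outcomes per protected group. The dependency-free sketch below computes two per-group quantities:

- the selection rate, used for demographic parity;
- the true-positive rate, used for equal opportunity.

It also computes the disparate impact ratio behind the US "four-fifths rule", under which a ratio below 0.8 is treated as evidence of adverse impact. Function names are illustrative; libraries such as Fairlearn provide maintained implementations.

```python
from collections import defaultdict


def group_rates(y_true, y_pred, groups):
    """Per-group selection rate P(pred=1) and true-positive rate P(pred=1 | y=1)."""
    stats = defaultdict(lambda: {"n": 0, "selected": 0, "pos": 0, "tp": 0})
    for t, p, g in zip(y_true, y_pred, groups):
        s = stats[g]
        s["n"] += 1
        s["selected"] += int(p == 1)
        s["pos"] += int(t == 1)
        s["tp"] += int(t == 1 and p == 1)
    return {
        g: {
            "selection_rate": s["selected"] / s["n"],
            "tpr": s["tp"] / s["pos"] if s["pos"] else float("nan"),
        }
        for g, s in stats.items()
    }


def disparate_impact_ratio(rates):
    """Lowest group selection rate divided by the highest; values below 0.8 fail the four-fifths rule."""
    sel = [r["selection_rate"] for r in rates.values()]
    return min(sel) / max(sel) if max(sel) > 0 else float("nan")


rates = group_rates(
    y_true=[1, 1, 0, 0, 1, 1, 0, 0],
    y_pred=[1, 1, 1, 0, 1, 0, 0, 0],
    groups=["A"] * 4 + ["B"] * 4,
)
print(rates, disparate_impact_ratio(rates))
```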

Meet Your Interviewers

The "Standard" Interviewer

Senior Engineer

Focuses on core competencies, system constraints, and clear communication.

Unwritten Rules

Think Out Loud

Always explain your thought process before writing code or drawing architecture.
