EY
Ernst & Young Global Limited, a multinational professional services partnership.
4 Rounds • ~21 Days • Medium Difficulty
The Interview Loop
Recruiter Screen (30 min)
Standard fit check, behavioral questions, and resume overview.
Technical Loop (3-4 Rounds)
Deep dive into domain knowledge, coding, and system design.
Interview Question Bank
Data Engineer • Behavioral • medium
Tell me about a time you had to explain a complex data pipeline failure or technical issue to a non-technical client partner.
#Communication
#Client Management
#Consulting
Data Engineer • Behavioral • medium
Describe a situation where a client changed the data requirements halfway through a sprint. How did you handle it?
#Agile
#Adaptability
#Stakeholder Management
Data Engineer • Behavioral • easy
Why do you want to work at EY? How do your career goals align with our mission of 'Building a better working world'?
#Company Knowledge
#Motivation
Data Engineer • Behavioral • medium
Tell me about a time you found a critical data discrepancy in a production environment. What was your troubleshooting process?
#Problem Solving
#Incident Management
#Accountability
Data Engineer • Behavioral • medium
Describe a time you disagreed with a senior engineer or architect's design on a client project. How did you resolve the disagreement?
#Conflict Resolution
#Teamwork
#Professionalism
Data Engineer • Behavioral • medium
Describe a time you optimized a slow-running ETL pipeline. What specific metrics did you improve, and what was the business impact?
#Performance Optimization
#Impact
#Technical Leadership
Data Engineer • Coding • medium
Write a SQL query using window functions to calculate the 7-day rolling average of daily transaction volumes for our financial audit clients.
#Window Functions
#Data Aggregation
#Time Series
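One way to approach this question, sketched with `sqlite3` so it runs anywhere (the table name `daily_volumes` and its schema are hypothetical). The frame `ROWS BETWEEN 6 PRECEDING AND CURRENT ROW` assumes exactly one row per day; with calendar gaps you would join against a date spine or use a `RANGE` frame instead.

```python
import sqlite3

# Hypothetical schema: daily_volumes(txn_date TEXT, volume REAL)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily_volumes (txn_date TEXT, volume REAL)")
conn.executemany(
    "INSERT INTO daily_volumes VALUES (?, ?)",
    [(f"2024-01-{d:02d}", 100.0 + d) for d in range(1, 15)],
)

# 7-day rolling average: the current day plus the 6 preceding days.
query = """
SELECT txn_date,
       AVG(volume) OVER (
           ORDER BY txn_date
           ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
       ) AS rolling_avg_7d
FROM daily_volumes
ORDER BY txn_date
"""
rows = conn.execute(query).fetchall()
```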
Data Engineer • Coding • hard
Given a table of user logins, write a SQL query to find the maximum number of consecutive days each user logged in.
#Advanced SQL
#Gaps and Islands Problem
#CTEs
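A sketch of the classic gaps-and-islands trick, runnable via `sqlite3` (the `logins` table and sample data are hypothetical): subtracting each row's per-user row number from its date yields a constant anchor date within any unbroken streak, so grouping by that anchor isolates each island.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logins (user_id INTEGER, login_date TEXT)")
conn.executemany("INSERT INTO logins VALUES (?, ?)", [
    (1, "2024-01-01"), (1, "2024-01-02"), (1, "2024-01-03"),
    (1, "2024-01-05"),
    (2, "2024-01-10"), (2, "2024-01-11"),
])

# date - row_number is constant within a consecutive-day streak.
query = """
WITH dedup AS (
    SELECT DISTINCT user_id, login_date FROM logins
),
grouped AS (
    SELECT user_id, login_date,
           DATE(login_date,
                '-' || ROW_NUMBER() OVER (
                    PARTITION BY user_id ORDER BY login_date
                ) || ' days') AS grp
    FROM dedup
)
SELECT user_id, MAX(cnt) AS max_streak
FROM (
    SELECT user_id, grp, COUNT(*) AS cnt
    FROM grouped
    GROUP BY user_id, grp
)
GROUP BY user_id
ORDER BY user_id
"""
result = conn.execute(query).fetchall()
```

Deduplicating first matters: multiple logins on the same day would otherwise break the row-number arithmetic.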
Data Engineer • Coding • medium
Write a SQL query to identify and delete duplicate records in a massive transaction table without using the DISTINCT keyword.
#Data Cleansing
#CTEs
#Window Functions
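One common pattern for this, shown against `sqlite3` (the `txns` schema is hypothetical): number each duplicate group with `ROW_NUMBER()` and delete every row ranked past 1, with no `DISTINCT` anywhere. In engines without a physical row id you would partition on the full business key and order by a surrogate key or load timestamp instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE txns (client_id INTEGER, amount REAL, txn_date TEXT)")
conn.executemany("INSERT INTO txns VALUES (?, ?, ?)", [
    (1, 100.0, "2024-01-01"),
    (1, 100.0, "2024-01-01"),  # duplicate
    (2, 50.0, "2024-01-02"),
    (1, 100.0, "2024-01-01"),  # duplicate
])

# Keep the first physical copy of each duplicate group; delete the rest.
conn.execute("""
DELETE FROM txns
WHERE rowid IN (
    SELECT rowid FROM (
        SELECT rowid,
               ROW_NUMBER() OVER (
                   PARTITION BY client_id, amount, txn_date
                   ORDER BY rowid
               ) AS rn
        FROM txns
    )
    WHERE rn > 1
)
""")
remaining = conn.execute("SELECT COUNT(*) FROM txns").fetchall()[0][0]
```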
Data Engineer • Coding • medium
Write a PySpark script to read a CSV file from Azure Data Lake, filter out records with null client IDs, and write the output to Parquet format partitioned by transaction date.
#Data I/O
#DataFrames
#Partitioning
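The Spark answer is roughly `spark.read.csv(path, header=True).filter(col("client_id").isNotNull()).write.partitionBy("txn_date").parquet(out_path)`. Since that only runs on a cluster, here is the same filter-then-partition logic sketched with the stdlib (the column names are hypothetical), grouping rows into per-date buckets the way `partitionBy` produces `txn_date=.../` directories.

```python
import csv
import io
from collections import defaultdict

raw = io.StringIO(
    "client_id,amount,txn_date\n"
    "1,100.0,2024-01-01\n"
    ",75.0,2024-01-01\n"      # null client_id: should be dropped
    "2,50.0,2024-01-02\n"
)

partitions = defaultdict(list)
for row in csv.DictReader(raw):
    if row["client_id"]:                  # filter out null client IDs
        partitions[row["txn_date"]].append(row)
# Each key maps to one output partition,
# e.g. txn_date=2024-01-01/part-0.parquet in the Spark version.
```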
Data Engineer • Coding • easy
Write a PySpark snippet to perform a left anti join. Explain a business use case for this operation in an audit context.
#Joins
#Data Validation
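In PySpark the operation is `ledger_df.join(bank_df, "txn_id", "left_anti")`. Its semantics, illustrated with plain Python (the ledger/bank-statement data is a hypothetical audit scenario): keep only left-side rows with no match on the right, which is exactly how you surface ledger entries that never hit the bank statement.

```python
# Left anti join: rows of the left table with no match in the right.
ledger = [
    {"txn_id": 1, "amount": 100.0},
    {"txn_id": 2, "amount": 50.0},
    {"txn_id": 3, "amount": 75.0},
]
bank_statement = [{"txn_id": 1}, {"txn_id": 3}]

matched = {row["txn_id"] for row in bank_statement}
unmatched = [row for row in ledger if row["txn_id"] not in matched]
# unmatched now holds ledger entries absent from the bank statement.
```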
Data Engineer • Coding • hard
Write a Python function to flatten a deeply nested JSON object representing complex financial records from a client API.
#Data Manipulation
#Recursion
#JSON
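A minimal recursive sketch (the key separator and the sample record are arbitrary choices): dicts contribute their keys, lists contribute their indices, and leaves land in the output under a dotted path.

```python
def flatten(obj, parent_key="", sep="."):
    """Recursively flatten nested dicts and lists into dotted keys."""
    items = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            new_key = f"{parent_key}{sep}{key}" if parent_key else key
            items.update(flatten(value, new_key, sep))
    elif isinstance(obj, list):
        for i, value in enumerate(obj):
            new_key = f"{parent_key}{sep}{i}" if parent_key else str(i)
            items.update(flatten(value, new_key, sep))
    else:
        items[parent_key] = obj
    return items

record = {"client": {"id": 7, "accounts": [{"iban": "DE00"}, {"iban": "FR00"}]}}
flat = flatten(record)
# {'client.id': 7, 'client.accounts.0.iban': 'DE00', 'client.accounts.1.iban': 'FR00'}
```

Worth mentioning in an interview: very deep payloads can hit Python's recursion limit, at which point an explicit stack is the safer variant.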
Data Engineer • Coding • medium
Write a Python script using boto3 or azure-storage-blob to upload a local file to cloud storage, including basic error handling and logging.
#Cloud SDKs
#Error Handling
#I/O
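A sketch of the error-handling-and-logging shape such a script takes. The client is injected so the function is testable; it is assumed to expose `upload_file(Filename, Bucket, Key)`, which matches the signature of boto3's S3 client (azure-storage-blob's `BlobClient.upload_blob` would be wired in analogously).

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("uploader")

def upload_with_logging(client, local_path, bucket, key):
    """Upload a local file to cloud storage, logging success or failure."""
    try:
        client.upload_file(Filename=local_path, Bucket=bucket, Key=key)
        logger.info("Uploaded %s to %s/%s", local_path, bucket, key)
        return True
    except Exception:
        logger.exception("Upload of %s failed", local_path)
        return False

class StubClient:
    """Stand-in for boto3.client("s3") so the sketch runs offline."""
    def __init__(self):
        self.calls = []
    def upload_file(self, Filename, Bucket, Key):
        self.calls.append((Filename, Bucket, Key))

stub = StubClient()
ok = upload_with_logging(stub, "report.csv", "audit-bucket", "2024/report.csv")
```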
Data Engineer • Coding • medium
Write a SQL query to find the second highest salary in each department. If a department has fewer than two employees, return null for that department.
#Window Functions
#CTEs
#Aggregations
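One approach, runnable via `sqlite3` (the `employees` schema is hypothetical): rank salaries per department with `DENSE_RANK()`, then aggregate so departments with no rank-2 row naturally produce NULL. Note this treats ties as one rank, so "second highest" means second highest distinct salary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, department TEXT, salary REAL)")
conn.executemany("INSERT INTO employees VALUES (?, ?, ?)", [
    ("Ana", "Audit", 100.0), ("Ben", "Audit", 100.0), ("Cy", "Audit", 90.0),
    ("Dee", "Tax", 80.0),   # lone employee: second highest is NULL
])

query = """
SELECT department,
       MAX(CASE WHEN rn = 2 THEN salary END) AS second_highest
FROM (
    SELECT department, salary,
           DENSE_RANK() OVER (
               PARTITION BY department ORDER BY salary DESC
           ) AS rn
    FROM employees
)
GROUP BY department
ORDER BY department
"""
result = conn.execute(query).fetchall()
```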
Data Engineer • System Design • hard
Design a batch processing pipeline to ingest daily financial audit logs from 50 different client on-premise systems into a centralized Azure Data Lake.
#Azure Data Factory
#Batch Processing
#Data Integration
Data Engineer • System Design • hard
Design a real-time streaming pipeline for detecting fraudulent credit card transactions for a banking client.
#Stream Processing
#Kafka/Event Hubs
#Fraud Detection
Data Engineer • System Design • hard
How would you design a data migration strategy from an on-premise Oracle database to Azure Synapse Analytics with minimal downtime?
#Cloud Migration
#Azure Synapse
#Change Data Capture (CDC)
Data Engineer • System Design • hard
Design a data reconciliation process to ensure data integrity between a source ERP system and a target cloud data warehouse.
#Data Quality
#Reconciliation
#Audit
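The core of most reconciliation designs is a keyed comparison: hash each row on both sides, then diff on the primary key. A minimal sketch under assumed extracts (the ERP/warehouse rows here are invented), illustrating the three standard findings: missing rows, extra rows, and value drift.

```python
import hashlib

def row_hash(row):
    """Stable fingerprint of a row's values for source/target comparison."""
    joined = "|".join(str(v) for v in row)
    return hashlib.sha256(joined.encode()).hexdigest()

# Hypothetical extracts keyed by primary key; values are full row tuples.
source = {1: (1, "ACME", 100.0), 2: (2, "Globex", 50.0), 3: (3, "Initech", 75.0)}
target = {1: (1, "ACME", 100.0), 2: (2, "Globex", 55.0)}  # 2 drifted, 3 missing

missing_in_target = sorted(source.keys() - target.keys())
extra_in_target = sorted(target.keys() - source.keys())
mismatched = sorted(
    k for k in source.keys() & target.keys()
    if row_hash(source[k]) != row_hash(target[k])
)
```

At scale the same idea runs as a distributed join on `(key, hash)` pairs rather than in-memory dicts, but the control totals and exception categories stay identical.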
Data Engineer • Technical • easy
Explain the difference between RANK(), DENSE_RANK(), and ROW_NUMBER(). Provide a scenario in a client reporting dashboard where you would choose DENSE_RANK() over RANK().
#Window Functions
#Data Ranking
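The difference is easiest to see side by side on tied values; a small `sqlite3` demo (the `scores` data is invented). `RANK` leaves a gap after a tie, `DENSE_RANK` does not, and `ROW_NUMBER` breaks the tie arbitrarily. In a "top 3 clients by revenue" dashboard tile, `DENSE_RANK` keeps rank 2 populated even when two clients tie for first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (client TEXT, revenue REAL)")
conn.executemany("INSERT INTO scores VALUES (?, ?)",
                 [("A", 300), ("B", 300), ("C", 200)])

rows = conn.execute("""
SELECT client,
       RANK()       OVER (ORDER BY revenue DESC) AS rnk,
       DENSE_RANK() OVER (ORDER BY revenue DESC) AS drnk,
       ROW_NUMBER() OVER (ORDER BY revenue DESC) AS rn
FROM scores
ORDER BY revenue DESC, client
""").fetchall()
# A and B tie: RANK gives 1,1 then skips to 3; DENSE_RANK gives 1,1,2;
# ROW_NUMBER assigns 1,2 arbitrarily between the tied rows.
```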
Data Engineer • Technical • hard
How do you handle data skewness when performing a join in PySpark on a massive dataset of retail transactions?
#Performance Tuning
#Data Skew
#Distributed Computing
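One standard remedy is key salting. The mechanics, sketched in plain Python rather than Spark (the hot-key data is invented): append a random suffix to the skewed key so its rows spread over several partitions, and replicate the small side once per suffix so every salted partition can still complete the join.

```python
import random

random.seed(42)  # reproducible sketch

NUM_SALTS = 4
hot_rows = [("HOT_CLIENT", amount) for amount in range(1000)]

# Fact side: scatter the hot key across NUM_SALTS sub-keys.
salted = [(f"{key}_{random.randrange(NUM_SALTS)}", amount)
          for key, amount in hot_rows]

# Dimension side: duplicate the hot key's row under every salt value.
dim = {f"HOT_CLIENT_{s}": {"region": "EMEA"} for s in range(NUM_SALTS)}

# Each salted partition now joins locally without one executor
# receiving all 1000 rows.
joined = [(k, amount, dim[k]["region"]) for k, amount in salted]
```

In PySpark the same effect also comes for free from adaptive query execution's skew-join handling, or from broadcasting the small side when it fits in memory.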
Data Engineer • Technical • medium
Explain the difference between narrow and wide transformations in Spark. Why is this distinction important for optimizing ETL pipelines?
#Spark Architecture
#Transformations
#Shuffling
Data Engineer • Technical • easy
What is the difference between repartition() and coalesce() in PySpark? When would you use one over the other?
#Partitioning
#Performance Tuning
Data Engineer • Technical • medium
Explain the Medallion Architecture (Bronze, Silver, Gold). How have you implemented this in Databricks for a client project?
#Data Lakehouse
#Databricks
#Data Modeling
Data Engineer • Technical • medium
Explain the architecture of Azure Data Factory. What is the role of an Integration Runtime, and when would you use a Self-Hosted IR?
#Azure Data Factory
#Cloud Architecture
Data Engineer • Technical • hard
How do you secure data at rest and in transit in Azure Data Lake Storage Gen2? How do you manage access for different client teams?
#Cloud Security
#Azure
#IAM
Data Engineer • Technical • medium
In Azure Databricks, how do you manage secrets and credentials securely without hardcoding them in your notebooks?
#Security
#Databricks
#Azure Key Vault
Data Engineer • Technical • medium
What is Delta Lake? Explain how it achieves ACID transactions on top of cloud object storage.
#Delta Lake
#ACID
#Data Lakehouse
Data Engineer • Technical • easy
Explain the Parquet file format. Why is it preferred over CSV or JSON in big data processing pipelines?
#File Formats
#Performance
Data Engineer • Technical • medium
What are Slowly Changing Dimensions (SCD)? Explain the difference between Type 1, Type 2, and Type 3 with examples.
#Data Warehousing
#Dimensional Modeling
#SCD
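Type 2 is the one worth being able to sketch on a whiteboard: expire the current row and insert a new version, preserving history. A minimal in-memory illustration (the customer dimension and column names are invented); a warehouse implementation does the same with an `UPDATE` plus `INSERT`, or a `MERGE`.

```python
from datetime import date

# Hypothetical SCD Type 2 dimension: one current row per customer.
dim_customer = [
    {"customer_id": 1, "city": "London", "valid_from": date(2023, 1, 1),
     "valid_to": None, "is_current": True},
]

def apply_scd2(dim, customer_id, new_city, change_date):
    """Expire the current version and append a new one on change."""
    for row in dim:
        if row["customer_id"] == customer_id and row["is_current"]:
            if row["city"] == new_city:
                return  # no attribute change: nothing to do
            row["valid_to"] = change_date
            row["is_current"] = False
    dim.append({"customer_id": customer_id, "city": new_city,
                "valid_from": change_date, "valid_to": None,
                "is_current": True})

apply_scd2(dim_customer, 1, "Manchester", date(2024, 6, 1))
```

Type 1 would simply overwrite `city` in place; Type 3 would keep a single `previous_city` column instead of full history.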
Data Engineer • Technical • medium
Explain the difference between a Star Schema and a Snowflake Schema. Which is generally preferred in modern cloud data warehouses like Snowflake or Synapse, and why?
#Data Warehousing
#Schema Design
Data Engineer • Technical • medium
How do you manage memory efficiently when processing large datasets in Python (e.g., 10GB CSV) without using distributed frameworks like Spark?
#Memory Management
#Pandas
#Generators
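The usual answer is streaming: iterate the file in chunks via a generator so memory holds one chunk, not the whole dataset (pandas offers the same pattern through `read_csv(..., chunksize=N)`). A stdlib sketch, using an in-memory file in place of the hypothetical 10GB CSV:

```python
import csv
import io

def rows_in_chunks(file_obj, chunk_size):
    """Lazily yield lists of rows; only one chunk is in memory at a time."""
    reader = csv.DictReader(file_obj)
    chunk = []
    for row in reader:
        chunk.append(row)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk

# Stand-in for open("transactions.csv") on a file too big for memory.
big_file = io.StringIO("id,amount\n1,10\n2,20\n3,30\n4,40\n5,50\n")

total = 0.0
for chunk in rows_in_chunks(big_file, chunk_size=2):
    total += sum(float(r["amount"]) for r in chunk)  # aggregate per chunk
```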
Data Engineer • Technical • medium
How do you handle dependency management and failure retries in Apache Airflow?
#Apache Airflow
#DAGs
#Error Handling
Data Engineer • Technical • medium
Explain the concept of XComs in Airflow. What are their limitations, and how do you pass large datasets between tasks?
#Apache Airflow
#Data Passing
Data Engineer • Technical • medium
What is dbt (data build tool)? How does it fit into the modern data stack, and what are the benefits of using it for transformations?
#dbt
#ELT
#Data Transformation
Data Engineer • Technical • medium
How do you ensure CI/CD (Continuous Integration / Continuous Deployment) in your data engineering projects? Describe the tools and workflow.
#CI/CD
#Git
#Azure DevOps/GitHub Actions
Difficulty Radar
Based on recent AI-sourced data.
Meet Your Interviewers
The "Standard" Interviewer
Senior Engineer. Focuses on core competencies, system constraints, and clear communication.
Unwritten Rules
Think Out Loud
Always explain your thought process before writing code or drawing architecture.