EY

Ernst & Young Global Limited, a multinational professional services network.

4 rounds · ~21 days · Medium difficulty
The Interview Loop

Recruiter Screen (30 min)

Standard fit check, behavioral questions, and resume overview.

Technical Loop (3-4 Rounds)

Deep dive into domain knowledge, coding, and system design.

Interview Question Bank

Data Engineer Behavioral medium

Tell me about a time you had to explain a complex data pipeline failure or technical issue to a non-technical client partner.

#Communication #Client Management #Consulting
Data Engineer Behavioral medium

Describe a situation where a client changed the data requirements halfway through a sprint. How did you handle it?

#Agile #Adaptability #Stakeholder Management
Data Engineer Behavioral easy

Why do you want to work at EY? How do your career goals align with our mission of 'Building a better working world'?

#Company Knowledge #Motivation
Data Engineer Behavioral medium

Tell me about a time you found a critical data discrepancy in a production environment. What was your troubleshooting process?

#Problem Solving #Incident Management #Accountability
Data Engineer Behavioral medium

Describe a time you disagreed with a senior engineer or architect's design on a client project. How did you resolve the disagreement?

#Conflict Resolution #Teamwork #Professionalism
Data Engineer Behavioral medium

Describe a time you optimized a slow-running ETL pipeline. What specific metrics did you improve, and what was the business impact?

#Performance Optimization #Impact #Technical Leadership
Data Engineer Coding medium

Write a SQL query using window functions to calculate the 7-day rolling average of daily transaction volumes for our financial audit clients.

#Window Functions #Data Aggregation #Time Series
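A minimal sketch of one possible answer, run here against SQLite through Python's sqlite3 module. The daily_volumes table, its columns, and the one-row-per-calendar-day assumption are illustrative, not part of the question:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE daily_volumes (txn_date TEXT, volume REAL)")
con.executemany(
    "INSERT INTO daily_volumes VALUES (?, ?)",
    [(f"2024-01-{d:02d}", float(d)) for d in range(1, 11)],
)

# Rolling 7-day average: with exactly one row per calendar day,
# a frame of 6 PRECEDING rows plus the current row spans 7 days.
rows = con.execute("""
    SELECT txn_date,
           AVG(volume) OVER (
               ORDER BY txn_date
               ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
           ) AS rolling_avg_7d
    FROM daily_volumes
    ORDER BY txn_date
""").fetchall()

for txn_date, avg in rows:
    print(txn_date, round(avg, 2))
```

If dates can be missing, a RANGE frame over a date offset (or a calendar spine join) is the safer variant.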
Data Engineer Coding hard

Given a table of user logins, write a SQL query to find the maximum number of consecutive days each user logged in.

#Advanced SQL #Gaps and Islands Problem #CTEs
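One common solution sketch, demonstrated on SQLite via Python. The logins table and its columns are assumed names; the core trick is the same on any engine with window functions:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE logins (user_id INTEGER, login_date TEXT)")
con.executemany("INSERT INTO logins VALUES (?, ?)", [
    (1, "2024-01-01"), (1, "2024-01-02"), (1, "2024-01-03"),
    (1, "2024-01-05"),
    (2, "2024-01-01"), (2, "2024-01-03"), (2, "2024-01-04"),
])

# Gaps-and-islands: (date - row_number) is constant within a run of
# consecutive days, so it serves as a group key for each streak.
rows = con.execute("""
    WITH numbered AS (
        SELECT user_id,
               login_date,
               julianday(login_date)
                 - ROW_NUMBER() OVER (
                       PARTITION BY user_id ORDER BY login_date
                   ) AS grp
        FROM (SELECT DISTINCT user_id, login_date FROM logins)
    ),
    streaks AS (
        SELECT user_id, COUNT(*) AS streak_len
        FROM numbered
        GROUP BY user_id, grp
    )
    SELECT user_id, MAX(streak_len) AS max_consecutive_days
    FROM streaks
    GROUP BY user_id
    ORDER BY user_id
""").fetchall()

print(rows)
```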
Data Engineer Coding medium

Write a SQL query to identify and delete duplicate records in a massive transaction table without using the DISTINCT keyword.

#Data Cleansing #CTEs #Window Functions
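A sketch of the ROW_NUMBER approach, shown on SQLite via Python. SQLite's implicit rowid stands in for the surrogate key you would normally use; on engines such as SQL Server you could instead delete directly from the windowed CTE:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE transactions (txn_id INTEGER, amount REAL)")
con.executemany("INSERT INTO transactions VALUES (?, ?)", [
    (1, 100.0), (1, 100.0), (2, 50.0), (2, 50.0), (2, 50.0), (3, 75.0),
])

# ROW_NUMBER tags each copy within a duplicate group; every row with
# rn > 1 is a duplicate, and no DISTINCT is needed anywhere.
con.execute("""
    DELETE FROM transactions
    WHERE rowid IN (
        SELECT rowid FROM (
            SELECT rowid,
                   ROW_NUMBER() OVER (
                       PARTITION BY txn_id, amount ORDER BY rowid
                   ) AS rn
            FROM transactions
        )
        WHERE rn > 1
    )
""")

remaining = con.execute(
    "SELECT txn_id, amount FROM transactions ORDER BY txn_id"
).fetchall()
print(remaining)
```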
Data Engineer Coding medium

Write a PySpark script to read a CSV file from Azure Data Lake, filter out records with null client IDs, and write the output to Parquet format partitioned by transaction date.

#Data I/O #DataFrames #Partitioning
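A hedged sketch of such a script. The abfss paths, storage account placeholder, and column names (client_id, transaction_date) are illustrative, and a Spark cluster already configured with ADLS Gen2 credentials is assumed:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("client_txn_ingest").getOrCreate()

# Placeholder paths -- substitute the real storage account and containers.
source = "abfss://raw@<storage-account>.dfs.core.windows.net/transactions/*.csv"
target = "abfss://curated@<storage-account>.dfs.core.windows.net/transactions_parquet"

df = spark.read.csv(source, header=True, inferSchema=True)

# Drop records with a null client ID before writing.
clean = df.filter(df["client_id"].isNotNull())

# Partition the Parquet output by transaction date for partition pruning.
clean.write.mode("overwrite").partitionBy("transaction_date").parquet(target)
```

In an interview, also mention supplying an explicit schema instead of inferSchema for production pipelines.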
Data Engineer Coding easy

Write a PySpark snippet to perform a left anti join. Explain a business use case for this operation in an audit context.

#Joins #Data Validation
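A minimal sketch (table names and values invented for illustration). A left anti join returns only the left-side rows with no match on the right, which maps naturally to reconciliation in an audit:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("anti_join_demo").getOrCreate()

ledger = spark.createDataFrame(
    [(1, 100.0), (2, 250.0), (3, 75.0)], ["txn_id", "amount"]
)
bank_feed = spark.createDataFrame([(1,), (3,)], ["txn_id"])

# left_anti keeps ledger rows with NO match in bank_feed -- in an audit
# context, booked transactions missing from the bank statement.
unmatched = ledger.join(bank_feed, on="txn_id", how="left_anti")
unmatched.show()
```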
Data Engineer Coding hard

Write a Python function to flatten a deeply nested JSON object representing complex financial records from a client API.

#Data Manipulation #Recursion #JSON
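One recursive sketch; the dotted-key convention and the sample record are illustrative choices, not a fixed requirement:

```python
def flatten_json(obj, parent_key="", sep="."):
    """Recursively flatten nested dicts (and lists) into a single-level
    dict with dotted keys, e.g. {"a": {"b": 1}} -> {"a.b": 1}."""
    items = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            new_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
            items.update(flatten_json(value, new_key, sep))
    elif isinstance(obj, list):
        # Index list elements so repeated records keep distinct keys.
        for i, value in enumerate(obj):
            new_key = f"{parent_key}{sep}{i}" if parent_key else str(i)
            items.update(flatten_json(value, new_key, sep))
    else:
        items[parent_key] = obj
    return items

record = {
    "client": {"id": "C-42", "name": "Acme"},
    "entries": [{"amount": 100.5}, {"amount": -20.0}],
}
flat = flatten_json(record)
print(flat)
```

Worth noting in the interview: very deep payloads can hit Python's recursion limit, in which case an explicit stack is the iterative alternative.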
Data Engineer Coding medium

Write a Python script using boto3 or azure-storage-blob to upload a local file to cloud storage, including basic error handling and logging.

#Cloud SDKs #Error Handling #I/O
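A hedged boto3 sketch (the azure-storage-blob version is analogous with BlobClient.upload_blob). Bucket, key, and path are placeholders, and AWS credentials are assumed to come from the environment:

```python
import logging

import boto3
from botocore.exceptions import BotoCoreError, ClientError

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


def upload_to_s3(local_path: str, bucket: str, key: str) -> bool:
    """Upload a local file to S3; return True on success, False on failure."""
    s3 = boto3.client("s3")
    try:
        s3.upload_file(local_path, bucket, key)
        logger.info("Uploaded %s to s3://%s/%s", local_path, bucket, key)
        return True
    except (BotoCoreError, ClientError):
        # logger.exception records the full traceback for the ops team.
        logger.exception("Upload of %s failed", local_path)
        return False
```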
Data Engineer Coding medium

Write a SQL query to find the second highest salary in each department. If a department has fewer than two employees, return NULL for that department.

#Window Functions #CTEs #Aggregations
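One possible answer, demonstrated on SQLite via Python; the employees table and sample data are invented. DENSE_RANK handles tied top salaries, and the conditional aggregate naturally yields NULL when no rank-2 salary exists:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE employees (department TEXT, salary INTEGER)")
con.executemany("INSERT INTO employees VALUES (?, ?)", [
    ("Audit", 90000), ("Audit", 80000), ("Audit", 80000),
    ("Tax", 70000),
])

# DENSE_RANK: tied top salaries share rank 1, the next distinct salary
# is rank 2; MAX(CASE ...) over a group with no rank-2 row returns NULL.
rows = con.execute("""
    WITH ranked AS (
        SELECT department,
               salary,
               DENSE_RANK() OVER (
                   PARTITION BY department ORDER BY salary DESC
               ) AS rnk
        FROM employees
    )
    SELECT department,
           MAX(CASE WHEN rnk = 2 THEN salary END) AS second_highest
    FROM ranked
    GROUP BY department
    ORDER BY department
""").fetchall()

print(rows)
```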
Data Engineer System Design hard

Design a batch processing pipeline to ingest daily financial audit logs from 50 different client on-premise systems into a centralized Azure Data Lake.

#Azure Data Factory #Batch Processing #Data Integration
Data Engineer System Design hard

Design a real-time streaming pipeline for detecting fraudulent credit card transactions for a banking client.

#Stream Processing #Kafka/Event Hubs #Fraud Detection
Data Engineer System Design hard

How would you design a data migration strategy from an on-premise Oracle database to Azure Synapse Analytics with minimal downtime?

#Cloud Migration #Azure Synapse #Change Data Capture (CDC)
Data Engineer System Design hard

Design a data reconciliation process to ensure data integrity between a source ERP system and a target cloud data warehouse.

#Data Quality #Reconciliation #Audit
Data Engineer Technical easy

Explain the difference between RANK(), DENSE_RANK(), and ROW_NUMBER(). Provide a scenario in a client reporting dashboard where you would choose DENSE_RANK() over RANK().

#Window Functions #Data Ranking
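A small demonstration of the three functions on tied values, run on SQLite via Python with an invented scores table. In a "top 3 revenue tiers" dashboard widget, DENSE_RANK is usually the right choice because it leaves no gaps after ties:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE scores (client TEXT, revenue INTEGER)")
con.executemany("INSERT INTO scores VALUES (?, ?)",
                [("A", 300), ("B", 300), ("C", 200)])

rows = con.execute("""
    SELECT client,
           revenue,
           ROW_NUMBER() OVER (ORDER BY revenue DESC) AS row_num,
           RANK()       OVER (ORDER BY revenue DESC) AS rnk,
           DENSE_RANK() OVER (ORDER BY revenue DESC) AS dense_rnk
    FROM scores
    ORDER BY revenue DESC, client
""").fetchall()

# ROW_NUMBER: 1, 2, 3 (ties broken arbitrarily)
# RANK:       1, 1, 3 (gap after the tie)
# DENSE_RANK: 1, 1, 2 (no gap)
print(rows)
```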
Data Engineer Technical hard

How do you handle data skewness when performing a join in PySpark on a massive dataset of retail transactions?

#Performance Tuning #Data Skew #Distributed Computing
Data Engineer Technical medium

Explain the difference between narrow and wide transformations in Spark. Why is this distinction important for optimizing ETL pipelines?

#Spark Architecture #Transformations #Shuffling
Data Engineer Technical easy

What is the difference between repartition() and coalesce() in PySpark? When would you use one over the other?

#Partitioning #Performance Tuning
Data Engineer Technical medium

Explain the Medallion Architecture (Bronze, Silver, Gold). How have you implemented this in Databricks for a client project?

#Data Lakehouse #Databricks #Data Modeling
Data Engineer Technical medium

Explain the architecture of Azure Data Factory. What is the role of an Integration Runtime, and when would you use a Self-Hosted IR?

#Azure Data Factory #Cloud Architecture
Data Engineer Technical hard

How do you secure data at rest and in transit in Azure Data Lake Storage Gen2? How do you manage access for different client teams?

#Cloud Security #Azure #IAM
Data Engineer Technical medium

In Azure Databricks, how do you manage secrets and credentials securely without hardcoding them in your notebooks?

#Security #Databricks #Azure Key Vault
Data Engineer Technical medium

What is Delta Lake? Explain how it achieves ACID transactions on top of cloud object storage.

#Delta Lake #ACID #Data Lakehouse
Data Engineer Technical easy

Explain the Parquet file format. Why is it preferred over CSV or JSON in big data processing pipelines?

#File Formats #Performance
Data Engineer Technical medium

What are Slowly Changing Dimensions (SCD)? Explain the difference between Type 1, Type 2, and Type 3 with examples.

#Data Warehousing #Dimensional Modeling #SCD
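A minimal SCD Type 2 sketch, run on SQLite via Python with an invented dim_client table: expire the current row, then insert a new version, so full history is preserved (Type 1 would simply overwrite the address in place; Type 3 would keep one "previous address" column):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE dim_client (
        client_id TEXT, address TEXT,
        valid_from TEXT, valid_to TEXT, is_current INTEGER
    )
""")
con.execute(
    "INSERT INTO dim_client VALUES "
    "('C-1', 'Old St', '2023-01-01', '9999-12-31', 1)"
)

# The client moves: close out the current version, insert the new one.
change_date = "2024-06-01"
con.execute(
    "UPDATE dim_client SET valid_to = ?, is_current = 0 "
    "WHERE client_id = 'C-1' AND is_current = 1",
    (change_date,),
)
con.execute(
    "INSERT INTO dim_client VALUES ('C-1', 'New Ave', ?, '9999-12-31', 1)",
    (change_date,),
)

history = con.execute(
    "SELECT address, is_current FROM dim_client ORDER BY valid_from"
).fetchall()
print(history)
```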
Data Engineer Technical medium

Explain the difference between a Star Schema and a Snowflake Schema. Which is generally preferred in modern cloud data warehouses like Snowflake or Synapse, and why?

#Data Warehousing #Schema Design
Data Engineer Technical medium

How do you manage memory efficiently when processing large datasets in Python (e.g., 10GB CSV) without using distributed frameworks like Spark?

#Memory Management #Pandas #Generators
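One generator-based sketch using only the standard library (an io.StringIO stands in for the 10GB file; pandas read_csv with chunksize is the other common answer). Only one row is ever held in memory:

```python
import csv
import io
from collections import defaultdict


def stream_rows(fileobj):
    """Yield CSV rows one at a time so memory use is constant,
    regardless of file size."""
    for row in csv.DictReader(fileobj):
        yield row


# Aggregate totals per client without loading the whole file at once.
sample = io.StringIO("client_id,amount\nC-1,100\nC-2,50\nC-1,25\n")
totals = defaultdict(float)
for row in stream_rows(sample):
    totals[row["client_id"]] += float(row["amount"])

print(dict(totals))
```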
Data Engineer Technical medium

How do you handle dependency management and failure retries in Apache Airflow?

#Apache Airflow #DAGs #Error Handling
Data Engineer Technical medium

Explain the concept of XComs in Airflow. What are their limitations, and how do you pass large datasets between tasks?

#Apache Airflow #Data Passing
Data Engineer Technical medium

What is dbt (data build tool)? How does it fit into the modern data stack, and what are the benefits of using it for transformations?

#dbt #ELT #Data Transformation
Data Engineer Technical medium

How do you ensure CI/CD (Continuous Integration / Continuous Deployment) in your data engineering projects? Describe the tools and workflow.

#CI/CD #Git #Azure DevOps/GitHub Actions

Meet Your Interviewers

The "Standard" Interviewer

Senior Engineer

Focuses on core competencies, system constraints, and clear communication.

Unwritten Rules

Think Out Loud

Always explain your thought process before writing code or drawing architecture.