Accenture
Global professional services company with leading capabilities in digital, cloud, and security.
Rounds: 4 · Timeline: ~21 Days · Difficulty: Medium
The Interview Loop
Recruiter Screen (30 min)
Standard fit check, behavioral questions, and resume overview.
Technical Loop (3-4 Rounds)
Deep dive into domain knowledge, coding, and system design.
Interview Question Bank
Data Engineer • Behavioral • Medium
Tell me about a time you had to push back on a client's unrealistic technical requirement or deadline. How did you handle it?
#Communication
#Stakeholder Management
#Consulting
Data Engineer • Behavioral • Medium
Tell me about a time your ETL pipeline failed in production during off-hours. What was your troubleshooting process?
#Incident Management
#Troubleshooting
#Reliability
Data Engineer • Behavioral • Medium
Tell me about a time you optimized an existing data pipeline and significantly reduced cloud compute costs.
#Performance Tuning
#Cloud Costs
#FinOps
Data Engineer • Behavioral • Medium
Explain a complex data engineering concept (like MapReduce or Data Lakehouse) to a non-technical business stakeholder.
#Stakeholder Management
#Consulting
#Communication
Data Engineer • Behavioral • Easy
Describe a situation where you had to learn a new cloud technology or tool very quickly to deliver a project for a client.
#Continuous Learning
#Consulting
#Agile
Data Engineer • Coding • Medium
Explain the difference between ROW_NUMBER(), RANK(), and DENSE_RANK() in SQL. Can you write a query to find the 3rd highest salary in each department using one of these?
#Window Functions
#Data Aggregation
#SQL Queries
Data Engineer • Coding • Medium
Write a Python function to flatten a deeply nested JSON object or dictionary into a single-level dictionary with concatenated keys.
#Python
#Data Structures
#Recursion
#JSON
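A recursive sketch (the dot separator for concatenated keys is a common convention, not part of the question):

```python
def flatten(d, parent_key="", sep="."):
    """Recursively flatten nested dicts into a single level,
    joining path segments with `sep`."""
    items = {}
    for k, v in d.items():
        key = f"{parent_key}{sep}{k}" if parent_key else k
        if isinstance(v, dict):
            items.update(flatten(v, key, sep))
        else:
            items[key] = v
    return items

nested = {"a": 1, "b": {"c": 2, "d": {"e": 3}}}
print(flatten(nested))  # {'a': 1, 'b.c': 2, 'b.d.e': 3}
```

Follow-ups often ask about lists inside the JSON; a common extension is to treat list indices as key segments.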
Data Engineer • Coding • Easy
Write a SQL query to find all employees who earn more than their direct managers.
#Self Joins
#SQL Queries
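The classic self-join answer, runnable here via in-memory SQLite (schema and sample rows are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (id INT, name TEXT, salary INT, manager_id INT);
INSERT INTO employees VALUES
  (1,'Ann',90,NULL), (2,'Bob',95,1), (3,'Cy',60,1), (4,'Di',70,2);
""")
# Join the table to itself: e is the employee, m is their manager.
rows = conn.execute("""
SELECT e.name
FROM employees e
JOIN employees m ON e.manager_id = m.id
WHERE e.salary > m.salary
""").fetchall()
print(rows)  # [('Bob',)]
```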
Data Engineer • Coding • Medium
You have a large CSV file (50GB) that cannot fit into memory. Write a Python script (without using Pandas/Spark) to read the file, count the occurrences of a specific value in a column, and output the result.
#Python
#Memory Management
#File I/O
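A sketch using only the standard library: `csv.DictReader` iterates the file lazily, so only one row is in memory at a time regardless of file size. The tiny demo file below stands in for the 50GB case, which streams identically.

```python
import csv

def count_value(path, column, value):
    """Stream the CSV row by row; O(1) memory in the file size."""
    count = 0
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            if row.get(column) == value:
                count += 1
    return count

# Tiny illustrative file (a real run would point at the 50GB file).
with open("demo.csv", "w", newline="") as f:
    f.write("user,country\nu1,DE\nu2,US\nu3,DE\n")

print(count_value("demo.csv", "country", "DE"))  # 2
```

Interviewers usually probe the follow-up: why this works (iterators, buffered reads) and what breaks it (loading `f.read()` or a list of all rows).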
Data Engineer • Coding • Medium
Write a PySpark script to read a parquet file, drop duplicate rows based on 'user_id' keeping the row with the most recent 'update_timestamp', and write back to a Delta table.
#PySpark
#Window Functions
#Delta Lake
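In PySpark this is typically a `Window.partitionBy("user_id").orderBy(col("update_timestamp").desc())`, keeping `row_number() == 1`, then `df.write.format("delta")`. The keep-latest core of that answer, sketched in plain Python so it runs without a Spark cluster (field names follow the question):

```python
rows = [
    {"user_id": 1, "update_timestamp": "2024-01-01", "v": "old"},
    {"user_id": 1, "update_timestamp": "2024-03-01", "v": "new"},
    {"user_id": 2, "update_timestamp": "2024-02-01", "v": "only"},
]

# Keep the most recent row per user_id -- the same effect as
# row_number() over (partition by user_id order by ts desc) = 1.
latest = {}
for r in rows:
    cur = latest.get(r["user_id"])
    if cur is None or r["update_timestamp"] > cur["update_timestamp"]:
        latest[r["user_id"]] = r

deduped = sorted(latest.values(), key=lambda r: r["user_id"])
print([r["v"] for r in deduped])  # ['new', 'only']
```

ISO-8601 timestamps compare correctly as strings here; real Spark code would compare timestamp columns directly.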
Data Engineer • Coding • Medium
Write a SQL query to calculate the rolling 7-day average of sales for each product.
#Window Functions
#Time Series
#SQL Queries
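A window-frame sketch via in-memory SQLite, assuming one row per product per day (with gaps in the calendar, a date-based `RANGE` frame or a calendar spine is needed instead of `ROWS`):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (product TEXT, day INT, amount REAL);
INSERT INTO sales VALUES ('p1',1,10), ('p1',2,20), ('p1',3,30);
""")
# The frame covers the current row plus the 6 preceding rows,
# i.e. a trailing 7-row (7-day) window per product.
rows = conn.execute("""
SELECT product, day,
       AVG(amount) OVER (
         PARTITION BY product ORDER BY day
         ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
       ) AS avg_7d
FROM sales
ORDER BY product, day
""").fetchall()
print(rows)  # [('p1', 1, 10.0), ('p1', 2, 15.0), ('p1', 3, 20.0)]
```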
Data Engineer • Coding • Easy
Given an array of integers, write a Python function to return the indices of the two numbers that add up to a specific target. (Two Sum)
#Python
#Arrays
#Hash Maps
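The standard one-pass hash-map answer:

```python
def two_sum(nums, target):
    """One pass with a hash map of value -> index: O(n) time, O(n) space."""
    seen = {}
    for i, n in enumerate(nums):
        if target - n in seen:
            return [seen[target - n], i]
        seen[n] = i
    return None  # no pair sums to target

print(two_sum([2, 7, 11, 15], 9))  # [0, 1]
```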
Data Engineer • Coding • Medium
Write a SQL query to delete duplicate rows from a table without using a temporary table, keeping only the row with the lowest ID.
#Data Cleansing
#SQL Queries
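One common pattern, runnable via in-memory SQLite (the duplicate key column `email` is illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (id INT PRIMARY KEY, email TEXT);
INSERT INTO t VALUES (1,'a@x.com'), (2,'a@x.com'), (3,'b@x.com'), (4,'b@x.com');
""")
# Keep the lowest id per email; delete every other row in place,
# with no temporary table.
conn.execute("""
DELETE FROM t
WHERE id NOT IN (SELECT MIN(id) FROM t GROUP BY email)
""")
rows = conn.execute("SELECT id, email FROM t ORDER BY id").fetchall()
print(rows)  # [(1, 'a@x.com'), (3, 'b@x.com')]
```

On engines that support it, a `ROW_NUMBER()`-in-CTE delete is the other answer interviewers usually expect.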
Data Engineer • Coding • Medium
Write a Python script to merge two dictionaries. If there are overlapping keys, sum their values.
#Python
#Dictionaries
#Data Manipulation
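Two equivalent answers (sample dicts are illustrative; note `Counter` addition silently drops keys whose summed value is not positive, so the comprehension is safer for arbitrary numbers):

```python
from collections import Counter

a = {"x": 1, "y": 2}
b = {"y": 3, "z": 4}

# Counter addition sums values for shared keys (positive values assumed).
merged = dict(Counter(a) + Counter(b))
print(merged)  # {'x': 1, 'y': 5, 'z': 4}

# Explicit version that handles any numeric values, including negatives.
merged2 = {k: a.get(k, 0) + b.get(k, 0) for k in a.keys() | b.keys()}
assert merged2 == merged
```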
Data Engineer • Coding • Medium
Write a SQL query to pivot rows into columns. For example, turning a table of 'Employee, Month, Salary' into 'Employee, Jan_Salary, Feb_Salary, etc.'
#Data Transformation
#SQL Queries
#Pivot
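The portable answer is conditional aggregation (one `CASE` per target column), sketched via in-memory SQLite; engines with a native `PIVOT` clause can do the same more tersely:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE payroll (employee TEXT, month TEXT, salary INT);
INSERT INTO payroll VALUES ('Ann','Jan',100), ('Ann','Feb',110), ('Bob','Jan',90);
""")
# Each CASE picks out one month; SUM ignores the NULLs from other months.
rows = conn.execute("""
SELECT employee,
       SUM(CASE WHEN month = 'Jan' THEN salary END) AS Jan_Salary,
       SUM(CASE WHEN month = 'Feb' THEN salary END) AS Feb_Salary
FROM payroll
GROUP BY employee
ORDER BY employee
""").fetchall()
print(rows)  # [('Ann', 100, 110), ('Bob', 90, None)]
```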
Data Engineer • System Design • Hard
Design a batch ETL pipeline to migrate 10TB of historical data from an on-premise Oracle database to Snowflake on Azure. What tools would you use and how would you orchestrate it?
#Azure Data Factory
#Snowflake
#ETL
#Data Migration
Data Engineer • System Design • Hard
Design a real-time streaming pipeline for an e-commerce client to process clickstream data and detect fraudulent transactions within seconds.
#Kafka
#Spark Streaming
#Real-time Processing
#Fraud Detection
Data Engineer • System Design • Medium
How do you ensure data quality and handle bad records in your ETL pipelines?
#Data Governance
#ETL
#Error Handling
Data Engineer • System Design • Medium
Design a data reconciliation framework to verify that all data from a source SQL Server made it accurately to the target Snowflake data warehouse.
#Data Reconciliation
#Data Quality
#ETL
Data Engineer • System Design • Hard
How would you design a Data Lake architecture for a healthcare client ensuring strict PII and HIPAA compliance?
#Data Security
#Healthcare
#Data Lake
#Compliance
Data Engineer • System Design • Medium
How do you handle late-arriving data in a daily batch ETL pipeline?
#Data Engineering
#Batch Processing
#Data Quality
Data Engineer • Technical • Hard
How do you handle data skewness in PySpark when joining a very large fact table with a dimension table?
#PySpark
#Performance Tuning
#Data Skewness
#Salting
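The expected answer is usually key salting (alongside broadcast joins and AQE skew handling). A toy sketch of the salting idea in plain Python — key names and bucket count are illustrative: append a random salt to the hot key on the fact side, and replicate the small dimension side once per salt value so every salted key still finds its match.

```python
import random

N = 4  # number of salt buckets (tuned to the skew in practice)

# Hypothetical skewed fact table: 'user_42' dominates.
fact_keys = ['user_42'] * 8 + ['user_7'] * 2
salted_fact = [(f"{k}#{random.randrange(N)}", k) for k in fact_keys]

# Replicate the dimension once per salt bucket.
dim = {'user_42': 'DE', 'user_7': 'US'}
salted_dim = {f"{k}#{i}": v for k, v in dim.items() for i in range(N)}

# The join now spreads 'user_42' rows across up to N partitions.
joined = [(k, salted_dim[sk]) for sk, k in salted_fact]
print(len(joined))  # 10
```

In Spark the salt is a literal column (`concat(col("key"), lit("#"), (rand()*N).cast("int"))`) and the replication is an `explode` over a salt array.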
Data Engineer • Technical • Easy
What is the difference between REPARTITION and COALESCE in PySpark? When would you use one over the other?
#PySpark
#Data Partitioning
#Performance Optimization
Data Engineer • Technical • Medium
Explain Slowly Changing Dimensions (SCD). How would you implement an SCD Type 2 in a data warehouse using SQL or PySpark?
#Data Warehousing
#SCD
#Dimensional Modeling
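A minimal SCD Type 2 sketch via in-memory SQLite — schema, dates, and the expire-then-insert pattern are illustrative; warehouses with `MERGE` (or Delta Lake's `MERGE INTO`) can do both steps in one statement:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (
  customer_id INT, city TEXT,
  valid_from TEXT, valid_to TEXT, is_current INT
);
INSERT INTO dim_customer VALUES (1, 'London', '2023-01-01', '9999-12-31', 1);
""")
# Incoming change: customer 1 moved to Paris on 2024-06-01.
# Step 1: close out (expire) the current version of the row.
conn.execute("""
UPDATE dim_customer
SET valid_to = '2024-05-31', is_current = 0
WHERE customer_id = 1 AND is_current = 1
""")
# Step 2: insert the new version as the current row.
conn.execute("""
INSERT INTO dim_customer VALUES (1, 'Paris', '2024-06-01', '9999-12-31', 1)
""")
rows = conn.execute(
    "SELECT city, is_current FROM dim_customer ORDER BY valid_from"
).fetchall()
print(rows)  # [('London', 0), ('Paris', 1)]
```

The point interviewers look for: history is preserved as versioned rows, with `valid_from`/`valid_to` (or a surrogate key plus flag) identifying the current one.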
Data Engineer • Technical • Hard
How does the Catalyst Optimizer work in Spark? Explain the logical and physical plan generation.
#Spark Internals
#Catalyst Optimizer
#Query Execution
Data Engineer • Technical • Medium
Describe the architecture of Snowflake. What are virtual warehouses and micro-partitions?
#Snowflake
#Cloud Architecture
#Storage and Compute
Data Engineer • Technical • Medium
In Azure Data Factory, how do you pass parameters dynamically between a pipeline and a dataset?
#Azure Data Factory
#Parameterization
#CI/CD
Data Engineer • Technical • Medium
What is the 'small files problem' in Hadoop/Spark, and how do you resolve it?
#HDFS
#Spark
#Performance Tuning
#File Formats
Data Engineer • Technical • Easy
Explain the difference between a Star Schema and a Snowflake Schema. Which one performs better in a modern columnar cloud data warehouse?
#Data Warehousing
#Star Schema
#Snowflake Schema
Data Engineer • Technical • Medium
What are ACID transactions, and how does Delta Lake implement them on top of cloud object storage like S3 or ADLS?
#Delta Lake
#ACID
#Databricks
Data Engineer • Technical • Medium
Explain how Airflow architecture works. What are DAGs, Operators, and XComs?
#Apache Airflow
#Data Orchestration
#DAGs
Data Engineer • Technical • Medium
How do Kafka consumer groups work? What happens if you have more consumers in a group than partitions in a topic?
#Apache Kafka
#Streaming
#Distributed Systems
Data Engineer • Technical • Medium
What is a Factless Fact Table? Give a real-world business scenario where you would use one.
#Data Warehousing
#Dimensional Modeling
Data Engineer • Technical • Easy
What is the difference between an RDD, a DataFrame, and a Dataset in Spark?
#Spark
#Data Structures
Data Engineer • Technical • Medium
Compare AWS Glue and Amazon EMR. When would you choose one over the other for an ETL workload?
#AWS
#ETL
#Big Data
Meet Your Interviewers
The "Standard" Interviewer
Senior Engineer. Focuses on core competencies, system constraints, and clear communication.
Unwritten Rules
Think Out Loud
Always explain your thought process before writing code or drawing architecture.