Accenture

Global professional services company with leading capabilities in digital, cloud and security.

4 rounds · ~21 days · Medium difficulty

The Interview Loop

Recruiter Screen (30 min)

Standard fit check, behavioral questions, and resume overview.

Technical Loop (3-4 Rounds)

Deep dive into domain knowledge, coding, and system design.

Interview Question Bank

Data Engineer Behavioral medium

Tell me about a time you had to push back on a client's unrealistic technical requirement or deadline. How did you handle it?

#Communication #Stakeholder Management #Consulting
Data Engineer Behavioral medium

Tell me about a time your ETL pipeline failed in production during off-hours. What was your troubleshooting process?

#Incident Management #Troubleshooting #Reliability
Data Engineer Behavioral medium

Tell me about a time you optimized an existing data pipeline and significantly reduced cloud compute costs.

#Performance Tuning #Cloud Costs #FinOps
Data Engineer Behavioral medium

Explain a complex data engineering concept (like MapReduce or Data Lakehouse) to a non-technical business stakeholder.

#Stakeholder Management #Consulting #Communication
Data Engineer Behavioral easy

Describe a situation where you had to learn a new cloud technology or tool very quickly to deliver a project for a client.

#Continuous Learning #Consulting #Agile
Data Engineer Coding medium

Explain the difference between ROW_NUMBER(), RANK(), and DENSE_RANK() in SQL. Can you write a query to find the 3rd highest salary in each department using one of these?

#Window Functions #Data Aggregation #SQL Queries
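One way to sketch an answer, using Python's built-in sqlite3 module so the query is runnable (SQLite 3.25+ supports window functions; the employees table and its values are invented for the demo). DENSE_RANK is the natural fit here: tied salaries share a rank with no gaps, so rank 3 is the third-distinct salary in each department, whereas ROW_NUMBER would arbitrarily break ties and RANK would skip ranks after a tie.

```python
import sqlite3

# Hypothetical employees table for the demo.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER);
INSERT INTO employees VALUES
  ('Ann', 'Eng', 120), ('Bob', 'Eng', 110), ('Cy', 'Eng', 100),
  ('Dee', 'Eng', 100), ('Eve', 'Sales', 90), ('Fay', 'Sales', 80),
  ('Gil', 'Sales', 70);
""")
rows = conn.execute("""
SELECT department, name, salary
FROM (
  SELECT department, name, salary,
         DENSE_RANK() OVER (
           PARTITION BY department ORDER BY salary DESC
         ) AS rnk
  FROM employees
)
WHERE rnk = 3          -- 3rd highest distinct salary per department
ORDER BY department, name
""").fetchall()
print(rows)  # [('Eng', 'Cy', 100), ('Eng', 'Dee', 100), ('Sales', 'Gil', 70)]
```

Note that in Eng both Cy and Dee are returned: they share the third-highest salary, which is exactly the tie behaviour that distinguishes DENSE_RANK from ROW_NUMBER.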
Data Engineer Coding medium

Write a Python function to flatten a deeply nested JSON object or dictionary into a single-level dictionary with concatenated keys.

#Python #Data Structures #Recursion #JSON
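A recursive sketch of the flattening function (the dot separator is a common convention, not something the question mandates):

```python
def flatten(d, parent_key="", sep="."):
    """Recursively flatten a nested dict into a single level,
    joining key paths with `sep`."""
    items = {}
    for key, value in d.items():
        new_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested dicts, carrying the accumulated key path.
            items.update(flatten(value, new_key, sep))
        else:
            items[new_key] = value
    return items

print(flatten({"a": 1, "b": {"c": 2, "d": {"e": 3}}}))
# {'a': 1, 'b.c': 2, 'b.d.e': 3}
```

A strong follow-up to anticipate: how to handle lists inside the JSON (e.g. encode the index into the key) and how deep recursion could hit Python's recursion limit on pathologically nested input.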
Data Engineer Coding easy

Write a SQL query to find all employees who earn more than their direct managers.

#Self Joins #SQL Queries
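A runnable sketch via sqlite3, assuming the classic schema where each employee row carries its manager's id (the table and data are invented for the demo):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (id INTEGER, name TEXT, salary INTEGER, manager_id INTEGER);
INSERT INTO employees VALUES
  (1, 'Ada', 90, NULL),
  (2, 'Ben', 95, 1),
  (3, 'Cal', 70, 1),
  (4, 'Dot', 80, 2);
""")
rows = conn.execute("""
SELECT e.name
FROM employees e
JOIN employees m ON e.manager_id = m.id   -- self join: employee row to manager row
WHERE e.salary > m.salary
""").fetchall()
print(rows)  # [('Ben',)]
```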
Data Engineer Coding medium

You have a large CSV file (50GB) that cannot fit into memory. Write a Python script (without using Pandas/Spark) to read the file, count the occurrences of a specific value in a column, and output the result.

#Python #Memory Management #File I/O
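The key insight is that csv.reader pulls one line at a time from the file object, so memory stays constant no matter the file size. A sketch (the demo writes a tiny temp file, but the same code streams a 50GB file unchanged):

```python
import csv
import os
import tempfile

def count_value(path, column, target):
    """Stream the CSV row by row; memory use is O(1) in file size."""
    count = 0
    with open(path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)          # read the header once
        idx = header.index(column)     # locate the target column
        for row in reader:             # one row in memory at a time
            if row[idx] == target:
                count += 1
    return count

# Tiny demo file standing in for the 50GB input.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False, newline="") as f:
    f.write("id,status\n1,ok\n2,fail\n3,ok\n")
    demo_path = f.name

result = count_value(demo_path, "status", "ok")
print(result)  # 2
os.unlink(demo_path)
```

Worth mentioning in the interview: this handles quoted fields correctly (unlike naive `line.split(',')`), and for even more throughput you could process the file in chunks across multiple processes.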
Data Engineer Coding medium

Write a PySpark script to read a parquet file, drop duplicate rows based on 'user_id' keeping the row with the most recent 'update_timestamp', and write back to a Delta table.

#PySpark #Window Functions #Delta Lake
Data Engineer Coding medium

Write a SQL query to calculate the rolling 7-day average of sales for each product.

#Window Functions #Time Series #SQL Queries
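A runnable sketch via sqlite3. This version assumes exactly one sales row per product per day, so a ROWS frame of the previous 6 rows plus the current row covers 7 calendar days; with gaps in the calendar you would switch to a RANGE frame over the date instead (the sales table is invented for the demo):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (product TEXT, sale_date TEXT, amount REAL);
INSERT INTO sales VALUES
  ('widget', '2024-01-01', 10), ('widget', '2024-01-02', 20),
  ('widget', '2024-01-03', 30), ('widget', '2024-01-04', 40),
  ('widget', '2024-01-05', 50), ('widget', '2024-01-06', 60),
  ('widget', '2024-01-07', 70), ('widget', '2024-01-08', 80);
""")
rows = conn.execute("""
SELECT product, sale_date,
       AVG(amount) OVER (
         PARTITION BY product
         ORDER BY sale_date
         ROWS BETWEEN 6 PRECEDING AND CURRENT ROW  -- current day + 6 prior
       ) AS rolling_avg_7d
FROM sales
""").fetchall()
print(rows[-1])  # ('widget', '2024-01-08', 50.0) — average of days 2 through 8
```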
Data Engineer Coding easy

Given an array of integers, write a Python function to return the indices of the two numbers that add up to a specific target. (Two Sum)

#Python #Arrays #Hash Maps
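The expected pattern is a single pass with a hash map, trading O(n) extra space for O(n) time instead of the brute-force O(n²) double loop:

```python
def two_sum(nums, target):
    """For each value, check whether its complement was already seen."""
    seen = {}  # value -> index of where it appeared
    for i, n in enumerate(nums):
        if target - n in seen:
            return [seen[target - n], i]
        seen[n] = i
    return []  # no pair sums to target

print(two_sum([2, 7, 11, 15], 9))  # [0, 1]
```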
Data Engineer Coding medium

Write a SQL query to delete duplicate rows from a table without using a temporary table, keeping only the row with the lowest ID.

#Data Cleansing #SQL Queries
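One standard pattern, shown runnable via sqlite3 (the contacts table is invented for the demo): keep the MIN(id) per duplicate group and delete everything else, with no temporary table involved.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE contacts (id INTEGER PRIMARY KEY, email TEXT);
INSERT INTO contacts VALUES
  (1, 'a@x.com'), (2, 'b@x.com'), (3, 'a@x.com'), (4, 'a@x.com');
""")
# Delete every row whose id is not the lowest within its email group.
conn.execute("""
DELETE FROM contacts
WHERE id NOT IN (SELECT MIN(id) FROM contacts GROUP BY email)
""")
rows = conn.execute("SELECT id, email FROM contacts ORDER BY id").fetchall()
print(rows)  # [(1, 'a@x.com'), (2, 'b@x.com')]
```

In engines that support it, a common alternative is a ROW_NUMBER() CTE and deleting where the row number exceeds 1.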
Data Engineer Coding medium

Write a Python script to merge two dictionaries. If there are overlapping keys, sum their values.

#Python #Dictionaries #Data Manipulation
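A straightforward sketch:

```python
def merge_sum(d1, d2):
    """Merge two dicts; overlapping keys get their values summed."""
    merged = dict(d1)
    for key, value in d2.items():
        merged[key] = merged.get(key, 0) + value
    return merged

print(merge_sum({"a": 1, "b": 2}, {"b": 3, "c": 4}))
# {'a': 1, 'b': 5, 'c': 4}
```

A one-liner alternative worth mentioning is `dict(Counter(d1) + Counter(d2))`, with the caveat that Counter addition silently drops keys whose summed value is zero or negative.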
Data Engineer Coding medium

Write a SQL query to pivot rows into columns. For example, turning a table of 'Employee, Month, Salary' into 'Employee, Jan_Salary, Feb_Salary, etc.'

#Data Transformation #SQL Queries #Pivot
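The portable technique is conditional aggregation: one CASE expression per target column, wrapped in MAX (or SUM) and grouped by the row key. A runnable sketch via sqlite3 (table and values invented for the demo); engines with a native PIVOT keyword, such as SQL Server or Snowflake, can express the same thing more tersely.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE payroll (employee TEXT, month TEXT, salary INTEGER);
INSERT INTO payroll VALUES
  ('Ann', 'Jan', 100), ('Ann', 'Feb', 110),
  ('Bob', 'Jan', 90),  ('Bob', 'Feb', 95);
""")
rows = conn.execute("""
SELECT employee,
       MAX(CASE WHEN month = 'Jan' THEN salary END) AS Jan_Salary,
       MAX(CASE WHEN month = 'Feb' THEN salary END) AS Feb_Salary
FROM payroll
GROUP BY employee
ORDER BY employee
""").fetchall()
print(rows)  # [('Ann', 100, 110), ('Bob', 90, 95)]
```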
Data Engineer System Design hard

Design a batch ETL pipeline to migrate 10TB of historical data from an on-premise Oracle database to Snowflake on Azure. What tools would you use and how would you orchestrate it?

#Azure Data Factory #Snowflake #ETL #Data Migration
Data Engineer System Design hard

Design a real-time streaming pipeline for an e-commerce client to process clickstream data and detect fraudulent transactions within seconds.

#Kafka #Spark Streaming #Real-time Processing #Fraud Detection
Data Engineer System Design medium

How do you ensure data quality and handle bad records in your ETL pipelines?

#Data Governance #ETL #Error Handling
Data Engineer System Design medium

Design a data reconciliation framework to verify that all data from a source SQL Server made it accurately to the target Snowflake data warehouse.

#Data Reconciliation #Data Quality #ETL
Data Engineer System Design hard

How would you design a Data Lake architecture for a healthcare client ensuring strict PII and HIPAA compliance?

#Data Security #Healthcare #Data Lake #Compliance
Data Engineer System Design medium

How do you handle late-arriving data in a daily batch ETL pipeline?

#Data Engineering #Batch Processing #Data Quality
Data Engineer Technical hard

How do you handle data skewness in PySpark when joining a very large fact table with a dimension table?

#PySpark #Performance Tuning #Data Skewness #Salting
Data Engineer Technical easy

What is the difference between REPARTITION and COALESCE in PySpark? When would you use one over the other?

#PySpark #Data Partitioning #Performance Optimization
Data Engineer Technical medium

Explain Slowly Changing Dimensions (SCD). How would you implement an SCD Type 2 in a data warehouse using SQL or PySpark?

#Data Warehousing #SCD #Dimensional Modeling
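A minimal SCD Type 2 sketch, runnable via sqlite3 (the dimension schema and column names here are assumptions; real implementations would typically use a MERGE statement or PySpark/Delta Lake equivalents). The core mechanic is two steps: expire the current version of a changed row, then insert the new version as current.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (
  customer_id INTEGER, city TEXT,
  valid_from TEXT, valid_to TEXT, is_current INTEGER
);
INSERT INTO dim_customer VALUES (1, 'London', '2024-01-01', NULL, 1);
""")

def apply_scd2(conn, customer_id, new_city, change_date):
    row = conn.execute(
        "SELECT city FROM dim_customer WHERE customer_id=? AND is_current=1",
        (customer_id,)).fetchone()
    if row and row[0] != new_city:
        # Step 1: close out the current version.
        conn.execute(
            "UPDATE dim_customer SET valid_to=?, is_current=0 "
            "WHERE customer_id=? AND is_current=1",
            (change_date, customer_id))
        # Step 2: insert the new version as current.
        conn.execute(
            "INSERT INTO dim_customer VALUES (?, ?, ?, NULL, 1)",
            (customer_id, new_city, change_date))

apply_scd2(conn, 1, 'Paris', '2024-06-01')
rows = conn.execute(
    "SELECT city, valid_to, is_current FROM dim_customer ORDER BY valid_from"
).fetchall()
print(rows)  # [('London', '2024-06-01', 0), ('Paris', None, 1)]
```

The history is preserved: the London row survives with an end date, and queries for "current state" simply filter on is_current = 1.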
Data Engineer Technical hard

How does the Catalyst Optimizer work in Spark? Explain the logical and physical plan generation.

#Spark Internals #Catalyst Optimizer #Query Execution
Data Engineer Technical medium

Describe the architecture of Snowflake. What are virtual warehouses and micro-partitions?

#Snowflake #Cloud Architecture #Storage and Compute
Data Engineer Technical medium

In Azure Data Factory, how do you pass parameters dynamically between a pipeline and a dataset?

#Azure Data Factory #Parameterization #CI/CD
Data Engineer Technical medium

What is the 'small files problem' in Hadoop/Spark, and how do you resolve it?

#HDFS #Spark #Performance Tuning #File Formats
Data Engineer Technical easy

Explain the difference between a Star Schema and a Snowflake Schema. Which one performs better in a modern columnar cloud data warehouse?

#Data Warehousing #Star Schema #Snowflake Schema
Data Engineer Technical medium

What are ACID transactions, and how does Delta Lake implement them on top of cloud object storage like S3 or ADLS?

#Delta Lake #ACID #Databricks
Data Engineer Technical medium

Explain how Airflow architecture works. What are DAGs, Operators, and XComs?

#Apache Airflow #Data Orchestration #DAGs
Data Engineer Technical medium

How do Kafka consumer groups work? What happens if you have more consumers in a group than partitions in a topic?

#Apache Kafka #Streaming #Distributed Systems
Data Engineer Technical medium

What is a Factless Fact Table? Give a real-world business scenario where you would use one.

#Data Warehousing #Dimensional Modeling
Data Engineer Technical easy

What is the difference between an RDD, a DataFrame, and a Dataset in Spark?

#Spark #Data Structures
Data Engineer Technical medium

Compare AWS Glue and Amazon EMR. When would you choose one over the other for an ETL workload?

#AWS #ETL #Big Data

Meet Your Interviewers

The "Standard" Interviewer

Senior Engineer

Focuses on core competencies, system constraints, and clear communication.

Unwritten Rules

Think Out Loud

Always explain your thought process before writing code or drawing architecture.
