LTIMindtree
Global technology consulting and digital solutions company.
4 Rounds
~21 Days
Medium
The Interview Loop
Recruiter Screen (30 min)
Standard fit check, behavioral questions, and resume overview.
Technical Loop (3-4 Rounds)
Deep dive into domain knowledge, coding, and system design.
Interview Question Bank
Data Engineer
•
Behavioral
•
medium
Tell me about a time your data pipeline failed in production. How did you troubleshoot, resolve it, and ensure it didn't happen again?
#Incident Management
#Root Cause Analysis
#Continuous Improvement
Data Engineer
•
Behavioral
•
medium
Describe a situation where you had to explain a complex technical data architecture or pipeline issue to a non-technical stakeholder.
#Stakeholder Management
#Communication
#Business Acumen
Data Engineer
•
Behavioral
•
medium
Tell me about a time you had to handle a sudden change in project requirements from the client right before a sprint deadline.
#Agile Methodology
#Time Management
#Client Management
Data Engineer
•
Behavioral
•
hard
Describe a time when you disagreed with a senior engineer or architect regarding the choice of a tool or design pattern for a data pipeline.
#Conflict Resolution
#Collaboration
#Technical Leadership
Data Engineer
•
Behavioral
•
easy
Why do you want to join LTIMindtree, and how does this Data Engineer role align with your long-term career goals?
#Career Goals
#Company Culture
#Self-Awareness
Data Engineer
•
Behavioral
•
medium
Tell me about a time your data pipeline failed in production. How did you troubleshoot it and communicate the delay to business stakeholders?
#Incident Management
#Communication
#Problem Solving
Data Engineer
•
Behavioral
•
easy
Describe your experience working in an Agile environment. How do you estimate story points for a complex data engineering task?
#Agile
#Scrum
#Estimation
Data Engineer
•
Behavioral
•
medium
Tell me about a time you had to optimize a slow-running query or pipeline. What steps did you take?
#Performance Tuning
#Problem Solving
#Impact
Data Engineer
•
Coding
•
medium
Write a SQL query to find the second highest salary per department without using the LIMIT keyword.
#Window Functions
#DENSE_RANK
#Subqueries
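A hedged sketch of one common approach using `DENSE_RANK()`; the `employees` table, its columns, and the sample rows are assumptions for illustration, and SQLite is used only to make the query runnable:

```python
import sqlite3

# In-memory sample data; schema and values are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER);
INSERT INTO employees VALUES
  ('Asha', 'Sales', 90000), ('Ben', 'Sales', 80000), ('Cara', 'Sales', 80000),
  ('Dev',  'HR',    60000), ('Elle', 'HR',   55000);
""")

# DENSE_RANK() handles ties cleanly: two people sharing a salary get the same
# rank, so rank 2 always points at the true second-highest value.
QUERY = """
SELECT DISTINCT department, salary AS second_highest
FROM (
    SELECT department, salary,
           DENSE_RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS rnk
    FROM employees
)
WHERE rnk = 2;
"""
print(sorted(conn.execute(QUERY).fetchall()))
```

`DISTINCT` collapses duplicate rows when several employees share the second-highest salary; a correlated subquery with `COUNT(DISTINCT salary)` is an alternative if window functions are off the table too.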
Data Engineer
•
Coding
•
medium
Write a Python function to flatten a deeply nested JSON dictionary representing API responses.
#Recursion
#Data Structures
#JSON Parsing
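A minimal recursive sketch; the dot separator and the sample response shape are assumptions, and list handling is omitted for brevity:

```python
def flatten(d, parent_key="", sep="."):
    """Recursively flatten a nested dict into a flat dict with dotted keys."""
    items = {}
    for key, value in d.items():
        new_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested dicts, carrying the accumulated key path.
            items.update(flatten(value, new_key, sep=sep))
        else:
            items[new_key] = value
    return items

# Hypothetical API response, for illustration only.
resp = {"user": {"id": 1, "address": {"city": "Pune", "zip": "411001"}}, "active": True}
print(flatten(resp))
```

In an interview, mention the edge cases explicitly: lists inside the JSON, key collisions after flattening, and very deep nesting hitting the recursion limit.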
Data Engineer
•
Coding
•
hard
Write a SQL query to find consecutive days where daily sales exceeded $10,000 (Gaps and Islands problem).
#Advanced SQL
#Window Functions
#Gaps and Islands
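A hedged sketch of the standard gaps-and-islands trick (the `daily_sales` table and sample rows are assumptions; SQLite makes it runnable):

```python
import sqlite3

# Illustrative sample data: Jan 3 breaks the streak with sales under $10,000.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE daily_sales (day TEXT, amount INTEGER);
INSERT INTO daily_sales VALUES
  ('2024-01-01', 12000), ('2024-01-02', 15000), ('2024-01-03', 9000),
  ('2024-01-04', 11000), ('2024-01-05', 13000), ('2024-01-06', 14000);
""")

# Rows in the same island share the value (date - row_number): on consecutive
# qualifying days both advance by one, so the difference stays constant.
QUERY = """
WITH high AS (
    SELECT day FROM daily_sales WHERE amount > 10000
),
grp AS (
    SELECT day,
           julianday(day) - ROW_NUMBER() OVER (ORDER BY day) AS island
    FROM high
)
SELECT MIN(day) AS start_day, MAX(day) AS end_day, COUNT(*) AS num_days
FROM grp
GROUP BY island
ORDER BY start_day;
"""
print(conn.execute(QUERY).fetchall())
```

On a warehouse with native DATE types, `day - ROW_NUMBER() * INTERVAL '1 day'` plays the role of `julianday`.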
Data Engineer
•
Coding
•
easy
Given a table of employee logins, write a query to find the first and last login time for each employee per day.
#Aggregation
#GROUP BY
#Date Functions
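A straightforward sketch; table and column names (`logins`, `employee_id`, `login_ts`) are assumptions for illustration:

```python
import sqlite3

# Sample login events, one employee with three logins in a day and one with a
# single login (so first and last coincide).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE logins (employee_id INTEGER, login_ts TEXT);
INSERT INTO logins VALUES
  (1, '2024-01-01 09:00:00'), (1, '2024-01-01 12:15:00'),
  (1, '2024-01-01 18:30:00'), (2, '2024-01-01 08:45:00');
""")

# Group by employee and calendar day; MIN/MAX on the timestamp give the
# first and last login within that day.
QUERY = """
SELECT employee_id,
       DATE(login_ts) AS login_day,
       MIN(login_ts)  AS first_login,
       MAX(login_ts)  AS last_login
FROM logins
GROUP BY employee_id, DATE(login_ts)
ORDER BY employee_id, login_day;
"""
print(conn.execute(QUERY).fetchall())
```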
Data Engineer
•
Coding
•
medium
Write a Python script using Pandas to read a 10GB CSV file in chunks, filter rows based on a condition, and append to a Parquet file.
#Pandas
#Memory Management
#Parquet
Data Engineer
•
Coding
•
hard
Write a SQL query to calculate the cumulative sum of revenue by month, but reset the sum at the start of each financial year (April).
#Window Functions
#Cumulative Sum
#PARTITION BY
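One hedged way to reset the running total each April is to partition the window by a derived financial-year value (the `monthly_revenue` table and sample rows are assumptions; SQLite's `strftime` stands in for your warehouse's date functions):

```python
import sqlite3

# Sample rows straddling an April financial-year boundary.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE monthly_revenue (month_start TEXT, revenue INTEGER);
INSERT INTO monthly_revenue VALUES
  ('2024-02-01', 100), ('2024-03-01', 200),
  ('2024-04-01', 50),  ('2024-05-01', 75);
""")

# Financial year (April-March): months before April belong to the previous FY.
# Partitioning the running SUM by that value resets it every April.
QUERY = """
SELECT month_start, revenue,
       SUM(revenue) OVER (
           PARTITION BY CAST(strftime('%Y', month_start) AS INTEGER)
                        - (strftime('%m', month_start) < '04')
           ORDER BY month_start
       ) AS fy_running_total
FROM monthly_revenue
ORDER BY month_start;
"""
print(conn.execute(QUERY).fetchall())
```

Feb and Mar 2024 accumulate within FY2023, then April 2024 starts a fresh total.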
Data Engineer
•
Coding
•
medium
Write a Python program to find the longest common prefix string amongst an array of strings.
#Strings
#Arrays
#Loops
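A minimal sketch of the shrinking-prefix approach (O(S) over total characters):

```python
def longest_common_prefix(strs):
    """Shrink a candidate prefix until every string starts with it."""
    if not strs:
        return ""
    prefix = strs[0]
    for s in strs[1:]:
        # Trim the prefix from the right until it matches this string too.
        while not s.startswith(prefix):
            prefix = prefix[:-1]
            if not prefix:
                return ""
    return prefix

print(longest_common_prefix(["flower", "flow", "flight"]))  # "fl"
print(longest_common_prefix(["dog", "racecar", "car"]))     # ""
```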
Data Engineer
•
Coding
•
hard
Write a SQL query to find the top 3 selling products in each category, along with their percentage contribution to the category's total sales.
#Window Functions
#CTEs
#Math Operations
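A hedged sketch combining a ranking window with a partition-wide total (the `product_sales` table and sample rows are assumptions; SQLite makes it runnable):

```python
import sqlite3

# Sample data: four Electronics products (so one is cut by the top-3 filter)
# and two Grocery products.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_sales (category TEXT, product TEXT, sales INTEGER);
INSERT INTO product_sales VALUES
  ('Electronics', 'Phone',  60), ('Electronics', 'Laptop', 30),
  ('Electronics', 'Mouse',  10), ('Electronics', 'Cable',   5),
  ('Grocery',     'Rice',   40), ('Grocery',     'Tea',    10);
""")

# ROW_NUMBER() picks the top 3 per category; SUM() OVER the same partition
# supplies the denominator for the percentage contribution.
QUERY = """
WITH ranked AS (
    SELECT category, product, sales,
           ROW_NUMBER() OVER (PARTITION BY category ORDER BY sales DESC) AS rn,
           SUM(sales)   OVER (PARTITION BY category) AS category_total
    FROM product_sales
)
SELECT category, product, sales,
       ROUND(100.0 * sales / category_total, 1) AS pct_of_category
FROM ranked
WHERE rn <= 3
ORDER BY category, rn;
"""
print(conn.execute(QUERY).fetchall())
```

Note the percentage is computed against the full category total, including products outside the top 3; swap `ROW_NUMBER()` for `DENSE_RANK()` if tied products should all be kept.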
Data Engineer
•
System Design
•
hard
Design a scalable data ingestion framework that can handle schema evolution (e.g., new columns added, data types changed) from upstream APIs without breaking downstream reporting.
Data Engineer
•
System Design
•
medium
Design a batch processing architecture on Azure/AWS to ingest 500GB of daily log data from multiple sources, transform it, and load it into a centralized Data Lakehouse.
Data Engineer
•
System Design
•
hard
Design a real-time streaming data pipeline to process e-commerce transaction logs, detect fraudulent activities, and update a live dashboard.
Data Engineer
•
System Design
•
hard
How would you design a data pipeline to migrate on-premise legacy SQL Server database tables to a cloud data warehouse, ensuring zero data loss and handling incremental loads?
Data Engineer
•
System Design
•
medium
Design an architecture to handle Slowly Changing Dimensions (SCD Type 2) for a customer dimension table containing 50 million records in a cloud data warehouse.
Data Engineer
•
System Design
•
hard
Design a real-time fraud detection data pipeline for a banking client using the Azure or AWS stack.
#Streaming
#Kafka
#Event Hubs
#Databricks
#Architecture
Data Engineer
•
System Design
•
medium
How would you design a batch ingestion pipeline to load 500 GB of daily incremental data from an on-premise Oracle DB to Azure Data Lake?
#Azure
#Data Ingestion
#Incremental Load
#Self-Hosted Integration Runtime
Data Engineer
•
System Design
•
hard
Design a data model and pipeline for a retail client to track inventory levels across 1000+ stores in near real-time.
#Data Modeling
#Real-time Processing
#Retail Domain
Data Engineer
•
System Design
•
medium
A client wants to migrate their legacy on-premise Hadoop cluster to AWS. Walk me through your migration strategy and the AWS services you would use.
#Cloud Migration
#AWS
#Hadoop
#EMR
#S3
Data Engineer
•
Technical
•
medium
Write a SQL query to find the 3rd highest salary in each department using window functions. What happens if there are ties?
#Window Functions
#DENSE_RANK
#CTEs
Data Engineer
•
Technical
•
medium
Explain the difference between narrow and wide transformations in PySpark. Give two examples of each and explain how they impact the DAG.
#Spark Architecture
#Transformations
#Shuffling
Data Engineer
•
Technical
•
hard
How do you handle data skewness while performing joins in PySpark? Explain the techniques you would use to optimize the job.
#Salting
#Broadcast Joins
#Spark Optimization
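The salting technique can be illustrated without a cluster. This is a plain-Python conceptual model, not the PySpark API: the skewed side gets a random suffix appended to its join key, and the small side is replicated once per possible suffix so every salted key still finds its match.

```python
import random
random.seed(0)  # deterministic for the demo

NUM_SALTS = 4

# Skewed "large" side: one hot customer dominates the rows.
large = [("cust_1", amt) for amt in range(8)] + [("cust_2", 100)]
# Small dimension side (the one you would otherwise broadcast).
small = [("cust_1", "Gold"), ("cust_2", "Silver")]

# Salt the large side: each row gets a random suffix 0..NUM_SALTS-1, spreading
# the hot key across NUM_SALTS partitions.
salted_large = [(f"{k}_{random.randrange(NUM_SALTS)}", v) for k, v in large]
# Explode the small side: replicate each row once per possible salt.
salted_small = [(f"{k}_{s}", v) for k, v in small for s in range(NUM_SALTS)]

# Hash join on the salted key; stripping the salt recovers the original key,
# and the result matches an unsalted join.
dim = dict(salted_small)
joined = sorted((k.rsplit("_", 1)[0], amt, dim[k]) for k, amt in salted_large)
print(joined)
```

In real PySpark the same idea uses `rand()` to add the salt column and `explode()` on a salt array for the small side; for a genuinely small table, a broadcast join sidesteps the shuffle entirely.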
Data Engineer
•
Technical
•
medium
Write a Python script using Pandas or PySpark to read a large CSV file, drop rows where a specific column has null values, and write the output partitioned by date to a Parquet file.
#Data Cleaning
#File Formats
#Partitioning
Data Engineer
•
Technical
•
medium
Explain the difference between a Star Schema and a Snowflake Schema. In what scenario would you prefer Snowflake over Star?
#Dimensional Modeling
#Normalization
#Data Warehousing
Data Engineer
•
Technical
•
hard
How do you handle data skewness in PySpark when joining a massive transaction table with a customer table?
#PySpark
#Performance Tuning
#Salting
#Broadcast Joins
Data Engineer
•
Technical
•
medium
Explain the difference between Copy Activity and Mapping Data Flow in Azure Data Factory. When would you use each?
#Azure Data Factory
#ETL
#Data Integration
Data Engineer
•
Technical
•
medium
Explain the difference between repartition() and coalesce() in PySpark. When should you use which?
#PySpark
#Partitioning
#Optimization
Data Engineer
•
Technical
•
medium
How does Snowflake handle clustering, and when would you define a custom clustering key instead of relying on micro-partitions?
#Snowflake
#Performance Tuning
#Micro-partitions
Data Engineer
•
Technical
•
hard
Explain the Catalyst Optimizer in Spark. How does it generate the physical plan from a logical plan?
#Spark Internals
#Catalyst Optimizer
#Query Execution
Data Engineer
•
Technical
•
easy
What is the difference between a Star Schema and a Snowflake Schema? Which one do you prefer for a modern cloud data warehouse?
#Dimensional Modeling
#Star Schema
#Snowflake Schema
Data Engineer
•
Technical
•
hard
How do you ensure exactly-once processing semantics in a Kafka to Spark Streaming pipeline?
#Kafka
#Spark Streaming
#Exactly-Once Semantics
Data Engineer
•
Technical
•
medium
How do you pass data between tasks in Apache Airflow? Explain XComs and their limitations.
#Airflow
#XCom
#Task Dependencies
Data Engineer
•
Technical
•
medium
What is a Broadcast Join in Spark? What is the default threshold, and what happens if the broadcasted table exceeds driver memory?
#PySpark
#Joins
#Optimization
Data Engineer
•
Technical
•
easy
Explain the concept of lazy evaluation in Spark. Can you give an example of an action vs. a transformation?
#Spark Basics
#Lazy Evaluation
#Transformations vs Actions
Data Engineer
•
Technical
•
hard
Explain Time Travel and Fail-safe in Snowflake. How do they impact storage costs?
#Snowflake
#Architecture
#Cost Management
Data Engineer
•
Technical
•
medium
What is Delta Lake? Explain the ACID transaction capabilities it brings to Apache Spark.
#Databricks
#Delta Lake
#ACID Transactions
Data Engineer
•
Technical
•
medium
Explain the difference between RANK(), DENSE_RANK(), and ROW_NUMBER() with an example.
#Window Functions
#Ranking
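A compact runnable example of the three ranking functions side by side (the `scores` table is an assumption; ties on 90 show the differences):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE scores (score INTEGER);
INSERT INTO scores VALUES (100), (90), (90), (80);
""")

# RANK() skips positions after ties (1,2,2,4), DENSE_RANK() does not
# (1,2,2,3), and ROW_NUMBER() numbers every row uniquely (1,2,3,4).
QUERY = """
SELECT score,
       RANK()       OVER (ORDER BY score DESC) AS rnk,
       DENSE_RANK() OVER (ORDER BY score DESC) AS dense_rnk,
       ROW_NUMBER() OVER (ORDER BY score DESC) AS row_num
FROM scores
ORDER BY row_num;
"""
print(conn.execute(QUERY).fetchall())
```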
Data Engineer
•
Technical
•
hard
How do you optimize a Spark job that is failing with an OutOfMemory (OOM) error during a groupByKey operation?
#PySpark
#OOM
#Optimization
#reduceByKey
Data Engineer
•
Technical
•
medium
In Azure Synapse Analytics, what is the difference between Dedicated SQL pools and Serverless SQL pools?
#Azure Synapse
#Data Warehousing
#Compute
Data Engineer
•
Technical
•
easy
How do you manage dependencies and virtual environments in a Python data engineering project?
#Python
#Environment Management
#CI/CD
Data Engineer
•
Technical
•
medium
What are accumulators in Spark? Provide a real-world use case for using them.
#Spark Internals
#Accumulators
#Shared Variables
Data Engineer
•
Technical
•
medium
What is a Slowly Changing Dimension (SCD)? Explain the difference between SCD Type 1, Type 2, and Type 3.
#Data Warehousing
#SCD
#Dimensional Modeling
Data Engineer
•
Technical
•
hard
How do you handle dynamic task generation in Airflow based on a configuration file or database table?
#Airflow
#Dynamic DAGs
#Python
Data Engineer
•
Technical
•
hard
Explain the Z-Ordering optimization in Delta Lake. When should you use it instead of partitioning?
#Databricks
#Delta Lake
#Z-Ordering
#Performance Tuning
Difficulty Radar
Based on recent AI-sourced data.
Meet Your Interviewers
The "Standard" Interviewer
Senior Engineer
Focuses on core competencies, system constraints, and clear communication.
Unwritten Rules
Think Out Loud
Always explain your thought process before writing code or drawing architecture.