LTIMindtree
Global technology consulting and digital solutions company.
4 Rounds
~21 Days
Medium
The Interview Loop
Recruiter Screen (30 min)
Standard fit check, behavioral questions, and resume overview.
Technical Loop (3-4 Rounds)
Deep dive into domain knowledge, coding, and system design.
Interview Question Bank
Data Engineer
•
Behavioral
•
medium
Tell me about a time your data pipeline failed in production. How did you troubleshoot, resolve it, and ensure it didn't happen again?
#Incident Management
#Root Cause Analysis
#Continuous Improvement
Data Engineer
•
Behavioral
•
medium
Describe a situation where you had to explain a complex technical data architecture or pipeline issue to a non-technical stakeholder.
#Stakeholder Management
#Communication
#Business Acumen
Data Engineer
•
Behavioral
•
medium
Tell me about a time you had to handle a sudden change in project requirements from the client right before a sprint deadline.
#Agile Methodology
#Time Management
#Client Management
Data Engineer
•
Behavioral
•
hard
Describe a time when you disagreed with a senior engineer or architect regarding the choice of a tool or design pattern for a data pipeline.
#Conflict Resolution
#Collaboration
#Technical Leadership
Data Engineer
•
Behavioral
•
easy
Why do you want to join LTIMindtree, and how does this Data Engineer role align with your long-term career goals?
#Career Goals
#Company Culture
#Self-Awareness
Data Engineer
•
Behavioral
•
medium
Tell me about a time your data pipeline failed in production. How did you troubleshoot it and communicate the delay to business stakeholders?
#Incident Management
#Communication
#Problem Solving
Data Engineer
•
Behavioral
•
easy
Describe your experience working in an Agile environment. How do you estimate story points for a complex data engineering task?
#Agile
#Scrum
#Estimation
Data Engineer
•
Behavioral
•
medium
Tell me about a time you had to optimize a slow-running query or pipeline. What steps did you take?
#Performance Tuning
#Problem Solving
#Impact
Data Engineer
•
Coding
•
medium
Write a SQL query to find the second highest salary per department without using the LIMIT keyword.
#Window Functions
#DENSE_RANK
#Subqueries
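A hedged sketch of one common approach using `DENSE_RANK()`; the `employees` table, its columns, and the sample rows are assumptions for illustration, and SQLite is used only to make the query runnable:

```python
import sqlite3

# In-memory sample data; schema and values are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER);
INSERT INTO employees VALUES
  ('Asha', 'Sales', 90000), ('Ben', 'Sales', 80000), ('Cara', 'Sales', 80000),
  ('Dev',  'HR',    60000), ('Elle', 'HR',   55000);
""")

# DENSE_RANK() handles ties cleanly: two people sharing a salary get the same
# rank, so rank 2 always points at the true second-highest value.
QUERY = """
SELECT DISTINCT department, salary AS second_highest
FROM (
    SELECT department, salary,
           DENSE_RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS rnk
    FROM employees
)
WHERE rnk = 2;
"""
print(sorted(conn.execute(QUERY).fetchall()))
```

`DISTINCT` collapses duplicate rows when several employees share the second-highest salary; a correlated subquery with `COUNT(DISTINCT salary)` is an alternative if window functions are off the table too.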
Data Engineer
•
Coding
•
medium
Write a Python function to flatten a deeply nested JSON dictionary representing API responses.
#Recursion
#Data Structures
#JSON Parsing
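A minimal recursive sketch; the dot separator and the sample response shape are assumptions, and list handling is omitted for brevity:

```python
def flatten(d, parent_key="", sep="."):
    """Recursively flatten a nested dict into a flat dict with dotted keys."""
    items = {}
    for key, value in d.items():
        new_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested dicts, carrying the accumulated key path.
            items.update(flatten(value, new_key, sep=sep))
        else:
            items[new_key] = value
    return items

# Hypothetical API response, for illustration only.
resp = {"user": {"id": 1, "address": {"city": "Pune", "zip": "411001"}}, "active": True}
print(flatten(resp))
```

In an interview, mention the edge cases explicitly: lists inside the JSON, key collisions after flattening, and very deep nesting hitting the recursion limit.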
Data Engineer
•
Coding
•
hard
Write a SQL query to find consecutive days where daily sales exceeded $10,000 (Gaps and Islands problem).
#Advanced SQL
#Window Functions
#Gaps and Islands
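A hedged sketch of the standard gaps-and-islands trick (the `daily_sales` table and sample rows are assumptions; SQLite makes it runnable):

```python
import sqlite3

# Illustrative sample data: Jan 3 breaks the streak with sales under $10,000.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE daily_sales (day TEXT, amount INTEGER);
INSERT INTO daily_sales VALUES
  ('2024-01-01', 12000), ('2024-01-02', 15000), ('2024-01-03', 9000),
  ('2024-01-04', 11000), ('2024-01-05', 13000), ('2024-01-06', 14000);
""")

# Rows in the same island share the value (date - row_number): on consecutive
# qualifying days both advance by one, so the difference stays constant.
QUERY = """
WITH high AS (
    SELECT day FROM daily_sales WHERE amount > 10000
),
grp AS (
    SELECT day,
           julianday(day) - ROW_NUMBER() OVER (ORDER BY day) AS island
    FROM high
)
SELECT MIN(day) AS start_day, MAX(day) AS end_day, COUNT(*) AS num_days
FROM grp
GROUP BY island
ORDER BY start_day;
"""
print(conn.execute(QUERY).fetchall())
```

On a warehouse with native DATE types, `day - ROW_NUMBER() * INTERVAL '1 day'` plays the role of `julianday`.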
Data Engineer
•
Coding
•
easy
Given a table of employee logins, write a query to find the first and last login time for each employee per day.
#Aggregation
#GROUP BY
#Date Functions
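A straightforward sketch; table and column names (`logins`, `employee_id`, `login_ts`) are assumptions for illustration:

```python
import sqlite3

# Sample login events, one employee with three logins in a day and one with a
# single login (so first and last coincide).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE logins (employee_id INTEGER, login_ts TEXT);
INSERT INTO logins VALUES
  (1, '2024-01-01 09:00:00'), (1, '2024-01-01 12:15:00'),
  (1, '2024-01-01 18:30:00'), (2, '2024-01-01 08:45:00');
""")

# Group by employee and calendar day; MIN/MAX on the timestamp give the
# first and last login within that day.
QUERY = """
SELECT employee_id,
       DATE(login_ts) AS login_day,
       MIN(login_ts)  AS first_login,
       MAX(login_ts)  AS last_login
FROM logins
GROUP BY employee_id, DATE(login_ts)
ORDER BY employee_id, login_day;
"""
print(conn.execute(QUERY).fetchall())
```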
Data Engineer
•
Coding
•
medium
Write a Python script using Pandas to read a 10GB CSV file in chunks, filter rows based on a condition, and append to a Parquet file.
#Pandas
#Memory Management
#Parquet
Data Engineer
•
Coding
•
hard
Write a SQL query to calculate the cumulative sum of revenue by month, but reset the sum at the start of each financial year (April).
#Window Functions
#Cumulative Sum
#PARTITION BY
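One hedged way to reset the running total each April is to partition the window by a derived financial-year value (the `monthly_revenue` table and sample rows are assumptions; SQLite's `strftime` stands in for your warehouse's date functions):

```python
import sqlite3

# Sample rows straddling an April financial-year boundary.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE monthly_revenue (month_start TEXT, revenue INTEGER);
INSERT INTO monthly_revenue VALUES
  ('2024-02-01', 100), ('2024-03-01', 200),
  ('2024-04-01', 50),  ('2024-05-01', 75);
""")

# Financial year (April-March): months before April belong to the previous FY.
# Partitioning the running SUM by that value resets it every April.
QUERY = """
SELECT month_start, revenue,
       SUM(revenue) OVER (
           PARTITION BY CAST(strftime('%Y', month_start) AS INTEGER)
                        - (strftime('%m', month_start) < '04')
           ORDER BY month_start
       ) AS fy_running_total
FROM monthly_revenue
ORDER BY month_start;
"""
print(conn.execute(QUERY).fetchall())
```

Feb and Mar 2024 accumulate within FY2023, then April 2024 starts a fresh total.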
Data Engineer
•
Coding
•
medium
Write a Python program to find the longest common prefix string amongst an array of strings.
#Strings
#Arrays
#Loops
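A minimal sketch of the shrinking-prefix approach (O(S) over total characters):

```python
def longest_common_prefix(strs):
    """Shrink a candidate prefix until every string starts with it."""
    if not strs:
        return ""
    prefix = strs[0]
    for s in strs[1:]:
        # Trim the prefix from the right until it matches this string too.
        while not s.startswith(prefix):
            prefix = prefix[:-1]
            if not prefix:
                return ""
    return prefix

print(longest_common_prefix(["flower", "flow", "flight"]))  # "fl"
print(longest_common_prefix(["dog", "racecar", "car"]))     # ""
```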
Data Engineer
•
Coding
•
hard
Write a SQL query to find the top 3 selling products in each category, along with their percentage contribution to the category's total sales.
#Window Functions
#CTEs
#Math Operations
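A hedged sketch combining a ranking window with a partition-wide total (the `product_sales` table and sample rows are assumptions; SQLite makes it runnable):

```python
import sqlite3

# Sample data: four Electronics products (so one is cut by the top-3 filter)
# and two Grocery products.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_sales (category TEXT, product TEXT, sales INTEGER);
INSERT INTO product_sales VALUES
  ('Electronics', 'Phone',  60), ('Electronics', 'Laptop', 30),
  ('Electronics', 'Mouse',  10), ('Electronics', 'Cable',   5),
  ('Grocery',     'Rice',   40), ('Grocery',     'Tea',    10);
""")

# ROW_NUMBER() picks the top 3 per category; SUM() OVER the same partition
# supplies the denominator for the percentage contribution.
QUERY = """
WITH ranked AS (
    SELECT category, product, sales,
           ROW_NUMBER() OVER (PARTITION BY category ORDER BY sales DESC) AS rn,
           SUM(sales)   OVER (PARTITION BY category) AS category_total
    FROM product_sales
)
SELECT category, product, sales,
       ROUND(100.0 * sales / category_total, 1) AS pct_of_category
FROM ranked
WHERE rn <= 3
ORDER BY category, rn;
"""
print(conn.execute(QUERY).fetchall())
```

Note the percentage is computed against the full category total, including products outside the top 3; swap `ROW_NUMBER()` for `DENSE_RANK()` if tied products should all be kept.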
Data Engineer
•
System Design
•
hard
Design a scalable data ingestion framework that can handle schema evolution (e.g., new columns added, data types changed) from upstream APIs without breaking downstream reporting.
Data Engineer
•
System Design
•
medium
Design a batch processing architecture on Azure/AWS to ingest 500GB of daily log data from multiple sources, transform it, and load it into a centralized Data Lakehouse.
Data Engineer
•
System Design
•
hard
Design a real-time streaming data pipeline to process e-commerce transaction logs, detect fraudulent activities, and update a live dashboard.
Data Engineer
•
System Design
•
hard
How would you design a data pipeline to migrate on-premise legacy SQL Server database tables to a cloud data warehouse, ensuring zero data loss and handling incremental loads?
Data Engineer
•
System Design
•
medium
Design an architecture to handle Slowly Changing Dimensions (SCD Type 2) for a customer dimension table containing 50 million records in a cloud data warehouse.
Data Engineer
•
System Design
•
hard
Design a real-time fraud detection data pipeline for a banking client using the Azure or AWS stack.
#Streaming
#Kafka
#Event Hubs
#Databricks
#Architecture
Data Engineer
•
System Design
•
medium
How would you design a batch ingestion pipeline to load 500 GB of daily incremental data from an on-premise Oracle DB to Azure Data Lake?
#Azure
#Data Ingestion
#Incremental Load
#Self-Hosted Integration Runtime
Data Engineer
•
System Design
•
hard
Design a data model and pipeline for a retail client to track inventory levels across 1000+ stores in near real-time.
#Data Modeling
#Real-time Processing
#Retail Domain
Data Engineer
•
System Design
•
medium
A client wants to migrate their legacy on-premise Hadoop cluster to AWS. Walk me through your migration strategy and the AWS services you would use.
#Cloud Migration
#AWS
#Hadoop
#EMR
#S3
Data Engineer
•
Technical
•
medium
Write a SQL query to find the 3rd highest salary in each department using window functions. What happens if there are ties?
#Window Functions
#DENSE_RANK
#CTEs
Data Engineer
•
Technical
•
medium
Explain the difference between narrow and wide transformations in PySpark. Give two examples of each and explain how they impact the DAG.
#Spark Architecture
#Transformations
#Shuffling
Data Engineer
•
Technical
•
hard
How do you handle data skewness while performing joins in PySpark? Explain the techniques you would use to optimize the job.
#Salting
#Broadcast Joins
#Spark Optimization
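The salting technique can be illustrated without a cluster. This is a plain-Python conceptual model, not the PySpark API: the skewed side gets a random suffix appended to its join key, and the small side is replicated once per possible suffix so every salted key still finds its match.

```python
import random
random.seed(0)  # deterministic for the demo

NUM_SALTS = 4

# Skewed "large" side: one hot customer dominates the rows.
large = [("cust_1", amt) for amt in range(8)] + [("cust_2", 100)]
# Small dimension side (the one you would otherwise broadcast).
small = [("cust_1", "Gold"), ("cust_2", "Silver")]

# Salt the large side: each row gets a random suffix 0..NUM_SALTS-1, spreading
# the hot key across NUM_SALTS partitions.
salted_large = [(f"{k}_{random.randrange(NUM_SALTS)}", v) for k, v in large]
# Explode the small side: replicate each row once per possible salt.
salted_small = [(f"{k}_{s}", v) for k, v in small for s in range(NUM_SALTS)]

# Hash join on the salted key; stripping the salt recovers the original key,
# and the result matches an unsalted join.
dim = dict(salted_small)
joined = sorted((k.rsplit("_", 1)[0], amt, dim[k]) for k, amt in salted_large)
print(joined)
```

In real PySpark the same idea uses `rand()` to add the salt column and `explode()` on a salt array for the small side; for a genuinely small table, a broadcast join sidesteps the shuffle entirely.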
Data Engineer
•
Technical
•
medium
Write a Python script using Pandas or PySpark to read a large CSV file, drop rows where a specific column has null values, and write the output partitioned by date to a Parquet file.
#Data Cleaning
#File Formats
#Partitioning
Data Engineer
•
Technical
•
medium
Explain the difference between a Star Schema and a Snowflake Schema. In what scenario would you prefer Snowflake over Star?
#Dimensional Modeling
#Normalization
#Data Warehousing
Data Engineer
•
Technical
•
hard
How do you handle data skewness in PySpark when joining a massive transaction table with a customer table?
#PySpark
#Performance Tuning
#Salting
#Broadcast Joins
Data Engineer
•
Technical
•
medium
Explain the difference between Copy Activity and Mapping Data Flow in Azure Data Factory. When would you use each?
#Azure Data Factory
#ETL
#Data Integration
Data Engineer
•
Technical
•
medium
Explain the difference between repartition() and coalesce() in PySpark. When should you use which?
#PySpark
#Partitioning
#Optimization
Data Engineer
•
Technical
•
medium
How does Snowflake handle clustering, and when would you define a custom clustering key instead of relying on micro-partitions?
#Snowflake
#Performance Tuning
#Micro-partitions
Data Engineer
•
Technical
•
hard
Explain the Catalyst Optimizer in Spark. How does it generate the physical plan from a logical plan?
#Spark Internals
#Catalyst Optimizer
#Query Execution
Data Engineer
•
Technical
•
easy
What is the difference between a Star Schema and a Snowflake Schema? Which one do you prefer for a modern cloud data warehouse?
#Dimensional Modeling
#Star Schema
#Snowflake Schema
Data Engineer
•
Technical
•
hard
How do you ensure exactly-once processing semantics in a Kafka to Spark Streaming pipeline?
#Kafka
#Spark Streaming
#Exactly-Once Semantics
Data Engineer
•
Technical
•
medium
How do you pass data between tasks in Apache Airflow? Explain XComs and their limitations.
#Airflow
#XCom
#Task Dependencies
Data Engineer
•
Technical
•
medium
What is a Broadcast Join in Spark? What is the default threshold, and what happens if the broadcasted table exceeds driver memory?
#PySpark
#Joins
#Optimization
Data Engineer
•
Technical
•
easy
Explain the concept of lazy evaluation in Spark. Can you give an example of an action vs. a transformation?
#Spark Basics
#Lazy Evaluation
#Transformations vs Actions
Data Engineer
•
Technical
•
hard
Explain Time Travel and Fail-safe in Snowflake. How do they impact storage costs?
#Snowflake
#Architecture
#Cost Management
Data Engineer
•
Technical
•
medium
What is Delta Lake? Explain the ACID transaction capabilities it brings to Apache Spark.
#Databricks
#Delta Lake
#ACID Transactions
Data Engineer
•
Technical
•
medium
Explain the difference between RANK(), DENSE_RANK(), and ROW_NUMBER() with an example.
#Window Functions
#Ranking
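A compact runnable example of the three ranking functions side by side (the `scores` table is an assumption; ties on 90 show the differences):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE scores (score INTEGER);
INSERT INTO scores VALUES (100), (90), (90), (80);
""")

# RANK() skips positions after ties (1,2,2,4), DENSE_RANK() does not
# (1,2,2,3), and ROW_NUMBER() numbers every row uniquely (1,2,3,4).
QUERY = """
SELECT score,
       RANK()       OVER (ORDER BY score DESC) AS rnk,
       DENSE_RANK() OVER (ORDER BY score DESC) AS dense_rnk,
       ROW_NUMBER() OVER (ORDER BY score DESC) AS row_num
FROM scores
ORDER BY row_num;
"""
print(conn.execute(QUERY).fetchall())
```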
Data Engineer
•
Technical
•
hard
How do you optimize a Spark job that is failing with an OutOfMemory (OOM) error during a groupByKey operation?
#PySpark
#OOM
#Optimization
#reduceByKey
Data Engineer
•
Technical
•
medium
In Azure Synapse Analytics, what is the difference between Dedicated SQL pools and Serverless SQL pools?
#Azure Synapse
#Data Warehousing
#Compute
Data Engineer
•
Technical
•
easy
How do you manage dependencies and virtual environments in a Python data engineering project?
#Python
#Environment Management
#CI/CD
Data Engineer
•
Technical
•
medium
What are accumulators in Spark? Provide a real-world use case for using them.
#Spark Internals
#Accumulators
#Shared Variables
Data Engineer
•
Technical
•
medium
What is a Slowly Changing Dimension (SCD)? Explain the difference between SCD Type 1, Type 2, and Type 3.
#Data Warehousing
#SCD
#Dimensional Modeling
Data Engineer
•
Technical
•
hard
How do you handle dynamic task generation in Airflow based on a configuration file or database table?
#Airflow
#Dynamic DAGs
#Python
Data Engineer
•
Technical
•
hard
Explain the Z-Ordering optimization in Delta Lake. When should you use it instead of partitioning?
#Databricks
#Delta Lake
#Z-Ordering
#Performance Tuning
Difficulty Radar
Based on recent AI-sourced data.
Meet Your Interviewers
The "Standard" Interviewer
Senior Engineer
Focuses on core competencies, system constraints, and clear communication.
Unwritten Rules
Think Out Loud
Always explain your thought process before writing code or drawing architecture.