Tech Mahindra

Multinational IT services and consulting company.

4 Rounds · ~21 Days · Medium Difficulty

The Interview Loop

Recruiter Screen (30 min)

Standard fit check, behavioral questions, and resume overview.

Technical Loop (3-4 Rounds)

Deep dive into domain knowledge, coding, and system design.

Interview Question Bank

Data Engineer Behavioral medium

Tell me about a time you had to push back on a client or stakeholder who demanded an unrealistic deadline for a data delivery.

#Stakeholder Management #Communication
Data Engineer Behavioral medium

How do you handle a situation where the upstream source system changes its data format (e.g., adding/removing columns) without notifying the data engineering team?

#Problem Solving #Resilience
Data Engineer Behavioral hard

Describe the most challenging bug you have faced in a production data pipeline. How did you troubleshoot and resolve it?

#Troubleshooting #Experience
Data Engineer Behavioral easy

Working at an IT services company often means handling multiple client deliverables at once. How do you prioritize your tasks?

#Time Management #Prioritization
Data Engineer Behavioral medium

Explain the concept of 'Data Partitioning' to a non-technical business stakeholder.

#Communication #Mentoring
Data Engineer Coding medium

Write a SQL query to find the 3rd highest salary from an Employee table without using the LIMIT keyword.

#Subqueries #Correlated Queries #Window Functions
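
One way to answer this is a correlated subquery that counts how many distinct salaries are greater than or equal to each row's salary. The sketch below runs the query against an in-memory SQLite database; the `Employee` table and its sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (id INTEGER, name TEXT, salary INTEGER)")
conn.executemany("INSERT INTO Employee VALUES (?, ?, ?)",
                 [(1, "A", 90), (2, "B", 80), (3, "C", 80), (4, "D", 70), (5, "E", 60)])

# Correlated subquery: a salary is the 3rd highest when exactly 3
# distinct salaries are greater than or equal to it.
query = """
SELECT DISTINCT salary
FROM Employee e1
WHERE 3 = (SELECT COUNT(DISTINCT e2.salary)
           FROM Employee e2
           WHERE e2.salary >= e1.salary)
"""
third = conn.execute(query).fetchone()[0]
print(third)  # distinct salaries are 90, 80, 70, 60 -> 3rd highest is 70
```

A DENSE_RANK() window function filtered on rank = 3 is the other common answer and usually performs better on large tables.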
Data Engineer Coding medium

Given a table of Telecom Call Detail Records (CDRs), write a SQL query to calculate the rolling 7-day cumulative data usage for each user.

#Window Functions #Time Series #Data Aggregation
Data Engineer Coding medium

How do you find and delete duplicate records in a massive SQL table without creating a temporary table?

#Data Cleansing #CTEs #DELETE Statements
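
A common pattern is to keep one row per duplicate group by its internal row identifier and delete the rest, with no temporary table. The sketch below uses SQLite's `rowid`; the `customers` table is hypothetical, and on SQL Server or Postgres the same idea is usually written as a CTE over `ROW_NUMBER() ... PARTITION BY` the duplicate-defining columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, email TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)", [
    ("Asha", "asha@example.com"),
    ("Asha", "asha@example.com"),   # duplicate
    ("Ravi", "ravi@example.com"),
    ("Ravi", "ravi@example.com"),   # duplicate
    ("Meena", "meena@example.com"),
])

# Keep the lowest rowid per (name, email) group; delete everything else.
conn.execute("""
    DELETE FROM customers
    WHERE rowid NOT IN (
        SELECT MIN(rowid) FROM customers GROUP BY name, email
    )
""")
remaining = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
print(remaining)  # 3
```

On a genuinely massive table, the follow-up point interviewers look for is running the delete in batches to keep transaction logs and lock durations manageable.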
Data Engineer Coding medium

Write a Python script using Pandas or PySpark to read a 10GB CSV file, drop rows where the 'customer_id' is null, and write the output partitioned by 'region' into Parquet format.

#Data I/O #Data Cleaning #Partitioning
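
With plain pandas a 10 GB file cannot be loaded at once, so the usual answer is chunked reading (PySpark would instead distribute the read). A minimal sketch, using a tiny in-memory CSV in place of the real file:

```python
import io
import pandas as pd

# Stand-in for the 10 GB file; in practice pass the file path.
csv_data = io.StringIO(
    "customer_id,region,amount\n"
    "1,north,100\n"
    ",south,50\n"      # null customer_id -> dropped
    "2,south,75\n"
    "3,north,20\n"
)

# Stream the file in chunks so memory stays bounded.
clean_chunks = []
for chunk in pd.read_csv(csv_data, chunksize=2):
    clean_chunks.append(chunk.dropna(subset=["customer_id"]))

clean = pd.concat(clean_chunks, ignore_index=True)
print(len(clean))  # 3 rows survive

# Partitioned Parquet output (requires pyarrow):
# clean.to_parquet("output/", partition_cols=["region"])
#
# PySpark equivalent, which handles the size natively:
# spark.read.csv(path, header=True).dropna(subset=["customer_id"]) \
#      .write.partitionBy("region").parquet("output/")
```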
Data Engineer Coding easy

Write a Python function to find the first non-repeating character in a given string. Optimize it for time complexity.

#Strings #Hash Maps
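
The expected optimization is two linear passes with a hash map, giving O(n) time instead of the naive O(n²) scan:

```python
from collections import Counter

def first_non_repeating(s: str):
    """Return the first character that occurs exactly once, or None.

    Pass 1 counts every character; pass 2 scans in original order
    for the earliest character with count 1. O(n) time, O(k) space.
    """
    counts = Counter(s)
    for ch in s:
        if counts[ch] == 1:
            return ch
    return None

print(first_non_repeating("swiss"))  # 'w'
print(first_non_repeating("aabb"))   # None
```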
Data Engineer Coding medium

You need to extract data from a third-party REST API for a client project. The API limits responses to 100 records per request. Write a Python snippet to handle pagination and extract all records.

#API Integration #Pagination #Requests Library
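
Pagination schemes vary (offset/limit, page tokens, `next` links), so the sketch below assumes a simple offset/limit API and injects the page-fetching call so it can run without a live endpoint. The stub data and parameter names are hypothetical.

```python
def fetch_all(fetch_page, page_size=100):
    """Collect every record from a paginated endpoint.

    `fetch_page(offset, limit)` is assumed to return at most `limit`
    records; a short page signals the end. With the `requests` library
    the callable would wrap something like:
        requests.get(url, params={"offset": offset, "limit": limit}).json()
    """
    records, offset = [], 0
    while True:
        page = fetch_page(offset, page_size)
        records.extend(page)
        if len(page) < page_size:   # last page reached
            break
        offset += page_size
    return records

# Stub API with 250 records, capped at 100 per request.
DATA = list(range(250))
def stub_fetch(offset, limit):
    return DATA[offset:offset + limit]

print(len(fetch_all(stub_fetch)))  # 250
```

Production answers should also mention retry/backoff on rate-limit responses.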
Data Engineer Coding easy

Given a list of dictionaries representing employee data (id, name, department), write Python code to group the employees by department.

#Data Manipulation #Dictionaries #Collections
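
A `collections.defaultdict` keeps the grouping to a few lines; the sample records here are hypothetical:

```python
from collections import defaultdict

employees = [
    {"id": 1, "name": "Asha", "department": "Data"},
    {"id": 2, "name": "Ravi", "department": "Cloud"},
    {"id": 3, "name": "Meena", "department": "Data"},
]

# defaultdict(list) avoids the explicit "key not seen yet" check.
by_dept = defaultdict(list)
for emp in employees:
    by_dept[emp["department"]].append(emp["name"])

print(dict(by_dept))  # {'Data': ['Asha', 'Meena'], 'Cloud': ['Ravi']}
```

`itertools.groupby` is the alternative answer, but it requires the input to be pre-sorted by department.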
Data Engineer Coding hard

Write a PySpark snippet to merge new incoming data into an existing Delta Lake table, updating existing records and inserting new ones (Upsert).

#Delta Lake #PySpark #Upserts
Data Engineer Coding medium

Write a SQL query to pivot a table containing 'Year', 'Month', and 'Revenue' so that each Month becomes a column with the corresponding Revenue.

#Pivot #Data Transformation
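
Conditional aggregation (`SUM(CASE WHEN ...)`) is the portable way to pivot; engines such as SQL Server also offer a dedicated `PIVOT` clause. A runnable sketch against SQLite, with a hypothetical `sales` table and two months shown for brevity:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (year INTEGER, month TEXT, revenue INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", [
    (2023, "Jan", 100), (2023, "Feb", 150),
    (2024, "Jan", 120), (2024, "Feb", 180),
])

# One CASE expression per target column turns rows into columns.
rows = conn.execute("""
    SELECT year,
           SUM(CASE WHEN month = 'Jan' THEN revenue ELSE 0 END) AS Jan,
           SUM(CASE WHEN month = 'Feb' THEN revenue ELSE 0 END) AS Feb
    FROM sales
    GROUP BY year
    ORDER BY year
""").fetchall()
print(rows)  # [(2023, 100, 150), (2024, 120, 180)]
```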
Data Engineer Coding easy

Write a Python script to connect to an AWS S3 bucket, list all files with a '.json' extension, and print their sizes.

#Boto3 #AWS #Scripting
Data Engineer System Design medium

Design an ETL pipeline on AWS to ingest daily Call Detail Records (CDRs) from an SFTP server, transform them, and load them into Redshift for reporting.

#AWS #ETL Architecture #Data Warehousing
Data Engineer System Design hard

Design a real-time streaming pipeline to process IoT sensor data from manufacturing plants, detect anomalies, and store the results.

#Streaming #Kafka #Spark Streaming #NoSQL
Data Engineer System Design hard

A healthcare client wants to move from a traditional data warehouse to a Data Lakehouse architecture. How would you design this using Databricks?

#Data Lakehouse #Databricks #Medallion Architecture
Data Engineer System Design hard

Design a batch processing pipeline to ingest 500GB of transactional data daily. How do you handle incremental loads?

#Batch Processing #Incremental Load #Architecture
Data Engineer Technical medium

How do you schedule and monitor your data pipelines? Explain the core components of Apache Airflow.

#Airflow #Orchestration
Data Engineer Technical easy

Explain the difference between ROW_NUMBER(), RANK(), and DENSE_RANK(). Provide a scenario where you would specifically choose DENSE_RANK() over RANK().

#Window Functions #Data Ranking
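
All three functions differ only in how they treat ties: ROW_NUMBER() never repeats, RANK() repeats and then leaves a gap, DENSE_RANK() repeats with no gap (so it suits "top N distinct score bands" questions). A runnable comparison against SQLite, with a hypothetical `scores` table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (name TEXT, score INTEGER)")
conn.executemany("INSERT INTO scores VALUES (?, ?)",
                 [("A", 95), ("B", 90), ("C", 90), ("D", 85)])

rows = conn.execute("""
    SELECT name,
           ROW_NUMBER() OVER w AS rn,    -- always 1,2,3,4
           RANK()       OVER w AS rnk,   -- ties share, then gap: 1,2,2,4
           DENSE_RANK() OVER w AS drnk   -- ties share, no gap:  1,2,2,3
    FROM scores
    WINDOW w AS (ORDER BY score DESC)
""").fetchall()
print(rows)
```

After the tie on 90, D gets RANK 4 but DENSE_RANK 3, which is why DENSE_RANK() is the right choice when you need the "3rd highest salary" rather than the salary in the 3rd row.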
Data Engineer Technical hard

A client complains that a critical reporting query reading from a 50-million-row table is running too slowly. Walk me through your step-by-step approach to optimizing it.

#Query Optimization #Indexing #Execution Plans
Data Engineer Technical medium

Explain the internal architecture of Apache Spark. What happens under the hood when you submit a Spark job?

#Spark Architecture #Driver #Executors #Cluster Manager
Data Engineer Technical hard

During a data migration project, your PySpark job is running extremely slow and some tasks are taking much longer than others. How do you identify and resolve data skewness?

#Performance Tuning #Data Skew #Salting
Data Engineer Technical medium

What is the difference between a Broadcast Hash Join and a Sort Merge Join in Spark? When would you force a Broadcast join?

#Spark Joins #Optimization
Data Engineer Technical easy

Explain the concept of Lazy Evaluation in Spark. Why is it beneficial for performance?

#Spark Core #Transformations vs Actions
Data Engineer Technical hard

Your Spark job fails with an OutOfMemory (OOM) error on the executor side. What parameters would you tweak or what code changes would you make?

#Troubleshooting #Memory Management #Spark Configuration
Data Engineer Technical medium

In Azure Data Factory (ADF), how do you design a dynamic pipeline that can copy data from 50 different on-premise SQL Server tables to Azure Data Lake without creating 50 separate copy activities?

#Azure Data Factory #Dynamic Pipelines #Metadata Driven ETL
Data Engineer Technical medium

Explain the architecture of Snowflake. How does its separation of storage and compute benefit a multi-tenant client environment?

#Snowflake #Cloud Architecture
Data Engineer Technical medium

What is Slowly Changing Dimension (SCD) Type 2? Explain how you would implement it in a data warehouse.

#Dimensional Modeling #SCD
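
SCD Type 2 preserves history by expiring the current dimension row and inserting a new version with fresh validity dates. A minimal two-step sketch against SQLite, assuming a hypothetical `dim_customer` dimension with `valid_from`/`valid_to`/`is_current` tracking columns and a `staging` table of incoming records:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_customer (
        customer_id INTEGER, city TEXT,
        valid_from TEXT, valid_to TEXT, is_current INTEGER
    );
    INSERT INTO dim_customer VALUES (1, 'Pune', '2023-01-01', '9999-12-31', 1);

    CREATE TABLE staging (customer_id INTEGER, city TEXT);
    INSERT INTO staging VALUES (1, 'Mumbai');   -- customer moved
""")

load_date = "2024-06-01"

# Step 1: expire the current row when a tracked attribute changed.
conn.execute("""
    UPDATE dim_customer
    SET valid_to = ?, is_current = 0
    WHERE is_current = 1
      AND customer_id IN (
          SELECT s.customer_id FROM staging s
          JOIN dim_customer d
            ON d.customer_id = s.customer_id AND d.is_current = 1
          WHERE d.city <> s.city)
""", (load_date,))

# Step 2: insert new versions (changed rows and brand-new customers).
conn.execute("""
    INSERT INTO dim_customer
    SELECT s.customer_id, s.city, ?, '9999-12-31', 1
    FROM staging s
    LEFT JOIN dim_customer d
      ON d.customer_id = s.customer_id AND d.is_current = 1
    WHERE d.customer_id IS NULL OR d.city <> s.city
""", (load_date,))

history = conn.execute(
    "SELECT city, is_current FROM dim_customer WHERE customer_id = 1 ORDER BY valid_from"
).fetchall()
print(history)  # [('Pune', 0), ('Mumbai', 1)]
```

In Delta Lake or Snowflake the two steps usually collapse into a single MERGE statement.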
Data Engineer Technical easy

Compare Star Schema and Snowflake Schema. If a client prioritizes query read performance over storage space, which would you recommend and why?

#Data Modeling #Schema Design
Data Engineer Technical medium

How do you ensure data quality and integrity in your ETL pipelines? What specific checks do you automate?

#Data Validation #Testing
Data Engineer Technical easy

Explain your Git workflow for deploying data engineering code across Development, QA, and Production environments.

#CI/CD #Version Control
Data Engineer Technical medium

What is 'Idempotency' in the context of data engineering? Why is it critical for data pipelines?

#Pipeline Design #Reliability
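
An idempotent pipeline produces the same end state no matter how many times a run is retried, which is what makes failure recovery safe. One common pattern is delete-then-insert on the target partition inside a single transaction, sketched here with SQLite and a hypothetical `daily_sales` table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily_sales (load_date TEXT, amount INTEGER)")

def load_partition(conn, load_date, rows):
    """Idempotent load: wipe the date partition, then rewrite it.

    Running this twice for the same date leaves the table identical,
    so a failed-and-retried pipeline run cannot double-count.
    """
    with conn:   # one transaction: delete + insert succeed or fail together
        conn.execute("DELETE FROM daily_sales WHERE load_date = ?", (load_date,))
        conn.executemany("INSERT INTO daily_sales VALUES (?, ?)",
                         [(load_date, a) for a in rows])

load_partition(conn, "2024-06-01", [100, 200])
load_partition(conn, "2024-06-01", [100, 200])   # retry: no duplicates

total = conn.execute("SELECT SUM(amount) FROM daily_sales").fetchone()[0]
print(total)  # 300, not 600
```

The same idea appears as `INSERT OVERWRITE` in Spark/Hive or a MERGE keyed on a natural key in warehouse loads.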
Data Engineer Technical easy

In PySpark, what is the difference between repartition() and coalesce()? When should you use which?

#PySpark #Partitioning

Meet Your Interviewers

The "Standard" Interviewer

Senior Engineer

Focuses on core competencies, system constraints, and clear communication.

Unwritten Rules

Think Out Loud

Always explain your thought process before writing code or drawing architecture.