Tech Mahindra
Multinational IT services and consulting company.
4 Rounds • ~21 Days • Medium Difficulty
The Interview Loop
Recruiter Screen (30 min)
Standard fit check, behavioral questions, and resume overview.
Technical Loop (3-4 Rounds)
Deep dive into domain knowledge, coding, and system design.
Interview Question Bank
Data Engineer • Behavioral • medium
Tell me about a time you had to push back on a client or stakeholder who demanded an unrealistic deadline for a data delivery.
#Stakeholder Management
#Communication
Data Engineer • Behavioral • medium
How do you handle a situation where the upstream source system changes its data format (e.g., adding/removing columns) without notifying the data engineering team?
#Problem Solving
#Resilience
Data Engineer • Behavioral • hard
Describe the most challenging bug you have faced in a production data pipeline. How did you troubleshoot and resolve it?
#Troubleshooting
#Experience
Data Engineer • Behavioral • easy
Working at an IT services company often means handling multiple client deliverables at once. How do you prioritize your tasks?
#Time Management
#Prioritization
Data Engineer • Behavioral • medium
Explain the concept of 'Data Partitioning' to a non-technical business stakeholder.
#Communication
#Mentoring
Data Engineer • Coding • medium
Write a SQL query to find the 3rd highest salary from an Employee table without using the LIMIT keyword.
#Subqueries
#Correlated Queries
#Window Functions
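One classic LIMIT-free answer is a correlated subquery that counts how many distinct salaries are higher. A minimal sketch, run against a hypothetical `Employee` table in SQLite (the same SQL works on most engines):

```python
import sqlite3

# Hypothetical Employee table for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?)",
    [("a", 90), ("b", 80), ("c", 80), ("d", 70), ("e", 60)],
)

# A salary is the 3rd highest when exactly 2 distinct salaries exceed it.
query = """
SELECT DISTINCT salary
FROM Employee e1
WHERE 2 = (SELECT COUNT(DISTINCT e2.salary)
           FROM Employee e2
           WHERE e2.salary > e1.salary)
"""
third_highest = conn.execute(query).fetchone()[0]
```

A `DENSE_RANK()` window function filtered to rank 3 is the other common answer and usually reads better on large tables.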
Data Engineer • Coding • medium
Given a table of Telecom Call Detail Records (CDRs), write a SQL query to calculate the rolling 7-day cumulative data usage for each user.
#Window Functions
#Time Series
#Data Aggregation
Data Engineer • Coding • medium
How do you find and delete duplicate records in a massive SQL table without creating a temporary table?
#Data Cleansing
#CTEs
#DELETE Statements
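The idea is to delete every row whose internal id is not the first of its duplicate group, so no temporary table is needed. A sketch using SQLite's `rowid` (in SQL Server or Postgres the same idea is written as a CTE over `ROW_NUMBER()` followed by `DELETE` of rows where `rn > 1`):

```python
import sqlite3

# Hypothetical table with one duplicated row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (email TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [("a@x.com", "A"), ("a@x.com", "A"), ("b@x.com", "B")],
)

# Keep the first physical copy of each duplicate group, delete the rest.
conn.execute("""
DELETE FROM customers
WHERE rowid NOT IN (
    SELECT MIN(rowid) FROM customers GROUP BY email, name
)
""")
remaining = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
```

On a genuinely massive table, the interviewer will also expect you to mention batching the deletes to keep transaction logs and locks manageable.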
Data Engineer • Coding • medium
Write a Python script using Pandas or PySpark to read a 10GB CSV file, drop rows where the 'customer_id' is null, and write the output partitioned by 'region' into Parquet format.
#Data I/O
#Data Cleaning
#Partitioning
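The real answer uses PySpark (or pandas with `chunksize`) so the 10GB file is never fully in memory, and `df.write.partitionBy("region").parquet(...)` for the output. As a stdlib-only sketch of the same streaming pattern, with in-memory lists standing in for the per-region Parquet files:

```python
import csv
import io
import collections

# Stand-in for the 10GB file: a small in-memory CSV with one null customer_id.
src = io.StringIO(
    "customer_id,region,amount\n"
    "1,north,10\n"
    ",south,20\n"
    "2,south,30\n"
)

# Stream row by row (nothing is fully loaded), drop null customer_id,
# and bucket output by region — the shape partitionBy() gives you in Spark.
partitions = collections.defaultdict(list)
for row in csv.DictReader(src):
    if row["customer_id"]:  # skip null/empty customer_id
        partitions[row["region"]].append(row)
```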
Data Engineer • Coding • easy
Write a Python function to find the first non-repeating character in a given string. Optimize it for time complexity.
#Strings
#Hash Maps
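The optimal approach is two O(n) passes with a hash map: count every character, then return the first whose count is 1. A minimal sketch:

```python
from collections import Counter

def first_unique_char(s):
    """Return the first character of s that occurs exactly once, else None."""
    counts = Counter(s)  # first pass: character frequencies
    return next((ch for ch in s if counts[ch] == 1), None)  # second pass
```

This is O(n) time and O(k) space for k distinct characters, versus the naive O(n²) of scanning the string once per character.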
Data Engineer • Coding • medium
You need to extract data from a third-party REST API for a client project. The API limits responses to 100 records per request. Write a Python snippet to handle pagination and extract all records.
#API Integration
#Pagination
#Requests Library
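The loop matters more than the HTTP call: keep requesting until a page comes back smaller than the limit (or empty). In production `fetch_page` would be a `requests.get` with an offset/page parameter; here it is a hypothetical stand-in so the pagination loop itself is runnable:

```python
PAGE_SIZE = 100
_FAKE_DATA = list(range(250))  # pretend the API holds 250 records

def fetch_page(offset, limit=PAGE_SIZE):
    """Hypothetical stand-in for requests.get(url, params={...}).json()."""
    return _FAKE_DATA[offset:offset + limit]

def fetch_all():
    records, offset = [], 0
    while True:
        page = fetch_page(offset)
        records.extend(page)
        if len(page) < PAGE_SIZE:  # short or empty page => last page
            break
        offset += PAGE_SIZE
    return records

all_records = fetch_all()
```

Mentioning rate-limit handling (HTTP 429 with retry/backoff) usually earns extra credit here.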
Data Engineer • Coding • easy
Given a list of dictionaries representing employee data (id, name, department), write Python code to group the employees by department.
#Data Manipulation
#Dictionaries
#Collections
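A `defaultdict(list)` is the idiomatic one-pass answer (with `itertools.groupby` as the alternative, which requires pre-sorting). Sample data here is illustrative:

```python
from collections import defaultdict

employees = [
    {"id": 1, "name": "Asha", "department": "Data"},
    {"id": 2, "name": "Ravi", "department": "Data"},
    {"id": 3, "name": "Meera", "department": "QA"},
]

# Single pass: append each employee to their department's bucket.
by_dept = defaultdict(list)
for emp in employees:
    by_dept[emp["department"]].append(emp["name"])
```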
Data Engineer • Coding • hard
Write a PySpark snippet to merge new incoming data into an existing Delta Lake table, updating existing records and inserting new ones (Upsert).
#Delta Lake
#PySpark
#Upserts
Data Engineer • Coding • medium
Write a SQL query to pivot a table containing 'Year', 'Month', and 'Revenue' so that each Month becomes a column with the corresponding Revenue.
#Pivot
#Data Transformation
Data Engineer • Coding • easy
Write a Python script to connect to an AWS S3 bucket, list all files with a '.json' extension, and print their sizes.
#Boto3
#AWS
#Scripting
Data Engineer • System Design • medium
Design an ETL pipeline on AWS to ingest daily Call Detail Records (CDRs) from an SFTP server, transform them, and load them into Redshift for reporting.
#AWS
#ETL Architecture
#Data Warehousing
Data Engineer • System Design • hard
Design a real-time streaming pipeline to process IoT sensor data from manufacturing plants, detect anomalies, and store the results.
#Streaming
#Kafka
#Spark Streaming
#NoSQL
Data Engineer • System Design • hard
A healthcare client wants to move from a traditional data warehouse to a Data Lakehouse architecture. How would you design this using Databricks?
#Data Lakehouse
#Databricks
#Medallion Architecture
Data Engineer • System Design • hard
Design a batch processing pipeline to ingest 500GB of transactional data daily. How do you handle incremental loads?
#Batch Processing
#Incremental Load
#Architecture
Data Engineer • Technical • medium
How do you schedule and monitor your data pipelines? Explain the core components of Apache Airflow.
#Airflow
#Orchestration
Data Engineer • Technical • easy
Explain the difference between ROW_NUMBER(), RANK(), and DENSE_RANK(). Provide a scenario where you would specifically choose DENSE_RANK() over RANK().
#Window Functions
#Data Ranking
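The difference shows up only on ties: `ROW_NUMBER()` numbers every row uniquely, `RANK()` leaves gaps after ties, and `DENSE_RANK()` does not — which is why `DENSE_RANK()` is the right choice for questions like "find the Nth highest salary", where a gap would skip a value. A runnable demo against a small scores table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (name TEXT, score INTEGER)")
conn.executemany("INSERT INTO scores VALUES (?, ?)",
                 [("a", 90), ("b", 90), ("c", 80)])

# a and b tie at 90; watch how each function ranks c.
query = """
SELECT name,
       ROW_NUMBER() OVER (ORDER BY score DESC) AS rn,
       RANK()       OVER (ORDER BY score DESC) AS rk,
       DENSE_RANK() OVER (ORDER BY score DESC) AS drk
FROM scores
"""
rows = conn.execute(query).fetchall()
```

Here `c` gets `RANK() = 3` (the tie consumes two ranks) but `DENSE_RANK() = 2`.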
Data Engineer • Technical • hard
A client complains that a critical reporting query reading from a 50-million-row table is running too slow. Walk me through your step-by-step approach to optimizing it.
#Query Optimization
#Indexing
#Execution Plans
Data Engineer • Technical • medium
Explain the internal architecture of Apache Spark. What happens under the hood when you submit a Spark job?
#Spark Architecture
#Driver
#Executors
#Cluster Manager
Data Engineer • Technical • hard
During a data migration project, your PySpark job is running extremely slow and some tasks are taking much longer than others. How do you identify and resolve data skewness?
#Performance Tuning
#Data Skew
#Salting
Data Engineer • Technical • medium
What is the difference between a Broadcast Hash Join and a Sort Merge Join in Spark? When would you force a Broadcast join?
#Spark Joins
#Optimization
Data Engineer • Technical • easy
Explain the concept of Lazy Evaluation in Spark. Why is it beneficial for performance?
#Spark Core
#Transformations vs Actions
Data Engineer • Technical • hard
Your Spark job fails with an OutOfMemory (OOM) error on the executor side. What parameters would you tweak or what code changes would you make?
#Troubleshooting
#Memory Management
#Spark Configuration
Data Engineer • Technical • medium
In Azure Data Factory (ADF), how do you design a dynamic pipeline that can copy data from 50 different on-premise SQL Server tables to Azure Data Lake without creating 50 separate copy activities?
#Azure Data Factory
#Dynamic Pipelines
#Metadata Driven ETL
Data Engineer • Technical • medium
Explain the architecture of Snowflake. How does its separation of storage and compute benefit a multi-tenant client environment?
#Snowflake
#Cloud Architecture
Data Engineer • Technical • medium
What is Slowly Changing Dimension (SCD) Type 2? Explain how you would implement it in a data warehouse.
#Dimensional Modeling
#SCD
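The core of an SCD Type 2 implementation is: when a tracked attribute changes, expire the current dimension row (set its end date and current flag) and insert a new current row, preserving full history. A minimal sketch in SQLite — column names (`start_date`, `end_date`, `is_current`) are common conventions, not a fixed standard:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE dim_customer (
    customer_id INTEGER, city TEXT,
    start_date TEXT, end_date TEXT, is_current INTEGER
)""")
conn.execute("INSERT INTO dim_customer VALUES (1, 'Pune', '2023-01-01', NULL, 1)")

new_city, change_date = "Mumbai", "2024-06-01"

# Step 1: expire the current row if the tracked attribute changed.
conn.execute("""
UPDATE dim_customer
SET end_date = ?, is_current = 0
WHERE customer_id = 1 AND is_current = 1 AND city <> ?
""", (change_date, new_city))

# Step 2: insert the new current version.
conn.execute("INSERT INTO dim_customer VALUES (1, ?, ?, NULL, 1)",
             (new_city, change_date))

history = conn.execute(
    "SELECT city, is_current FROM dim_customer ORDER BY start_date"
).fetchall()
```

In a warehouse this pair of statements is typically expressed as a single `MERGE` (or Delta Lake `merge`) keyed on the business key plus the current flag.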
Data Engineer • Technical • easy
Compare Star Schema and Snowflake Schema. If a client prioritizes query read performance over storage space, which would you recommend and why?
#Data Modeling
#Schema Design
Data Engineer • Technical • medium
How do you ensure data quality and integrity in your ETL pipelines? What specific checks do you automate?
#Data Validation
#Testing
Data Engineer • Technical • easy
Explain your Git workflow for deploying data engineering code across Development, QA, and Production environments.
#CI/CD
#Version Control
Data Engineer • Technical • medium
What is 'Idempotency' in the context of data engineering? Why is it critical for data pipelines?
#Pipeline Design
#Reliability
Data Engineer • Technical • easy
In PySpark, what is the difference between repartition() and coalesce()? When should you use which?
#PySpark
#Partitioning
Meet Your Interviewers
The "Standard" Interviewer
Senior Engineer. Focuses on core competencies, system constraints, and clear communication.
Unwritten Rules
Think Out Loud
Always explain your thought process before writing code or drawing architecture.