TCS
Large multinational IT services and consulting enterprise based in India.
Rounds: 3
Timeline: ~14 days
Difficulty: Medium
The Interview Loop
Recruiter Screen (30 min)
Standard fit check, behavioral questions, and resume overview.
Technical Loop (3-4 Rounds)
Deep dive into domain knowledge, coding, and system design.
Interview Question Bank
Data Engineer • Behavioral • medium
Tell me about a time when a client changed the requirements of a data pipeline in the middle of a sprint. How did you handle it?
#Agile
#Client Management
#Adaptability
Data Engineer • Behavioral • hard
A critical production data pipeline fails at 2 AM on a Saturday, and the client SLA is at risk. Walk me through your immediate actions.
#Incident Management
#Communication
#Problem Solving
Data Engineer • Behavioral • medium
How do you explain a complex technical issue, like a Spark data skew causing pipeline delays, to a non-technical business stakeholder?
#Communication
#Stakeholder Management
Data Engineer • Behavioral • medium
Describe a situation where you had a disagreement with the QA/Testing team regarding a bug they raised in your ETL code. How was it resolved?
#Conflict Resolution
#Teamwork
Data Engineer • Behavioral • easy
TCS places a heavy emphasis on continuous learning through platforms like Elevate and Xplore. How do you keep your technical skills updated in the rapidly changing Data Engineering landscape?
#Continuous Learning
#TCS Values
Data Engineer • Behavioral • easy
Why do you want to join TCS as a Data Engineer, and how does your previous experience align with our focus on enterprise-scale digital transformations?
#Motivation
#Company Knowledge
Data Engineer • Coding • medium
Write a SQL query to find the second highest salary in each department without using the MAX() function.
#SQL
#Window Functions
#Aggregations
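One possible answer, sketched against a hypothetical employees(dept, salary) table. DENSE_RANK() avoids MAX() and handles salary ties; the demo runs on Python's built-in sqlite3 (SQLite 3.25+ supports window functions):

```python
import sqlite3

# Hypothetical demo table; names and values are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (name TEXT, dept TEXT, salary INTEGER);
    INSERT INTO employees VALUES
        ('a', 'IT', 90000), ('b', 'IT', 80000), ('c', 'IT', 80000),
        ('d', 'HR', 60000), ('e', 'HR', 50000);
""")

# DENSE_RANK() gives tied salaries the same rank, so rank 2 is the
# second-highest DISTINCT salary in each department -- no MAX() needed.
rows = conn.execute("""
    SELECT DISTINCT dept, salary
    FROM (
        SELECT dept, salary,
               DENSE_RANK() OVER (PARTITION BY dept ORDER BY salary DESC) AS rnk
        FROM employees
    ) ranked
    WHERE rnk = 2
    ORDER BY dept
""").fetchall()
```

A correlated subquery (`salary < (SELECT MAX...)`) is the classic alternative, but it is disallowed here and scans worse on large tables.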
Data Engineer • Coding • medium
How do you delete duplicate records from a massive table in SQL while keeping exactly one instance of each duplicate?
#SQL
#Data Cleansing
#CTEs
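A common pattern: in engines with ROW_NUMBER(), delete rows where the row number within each duplicate group exceeds 1. SQLite (used here so the sketch is runnable) exposes an internal rowid that serves the same purpose; table and data are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER, email TEXT);
    INSERT INTO customers VALUES
        (1, 'a@x.com'), (1, 'a@x.com'),
        (2, 'b@x.com'), (2, 'b@x.com'), (2, 'b@x.com'),
        (3, 'c@x.com');
""")

# Keep only the row with the smallest rowid per duplicate group.
# In SQL Server you would instead DELETE from a ROW_NUMBER() CTE WHERE rn > 1.
conn.execute("""
    DELETE FROM customers
    WHERE rowid NOT IN (
        SELECT MIN(rowid) FROM customers GROUP BY customer_id, email
    )
""")
remaining = conn.execute(
    "SELECT customer_id, email FROM customers ORDER BY customer_id"
).fetchall()
```

For a truly massive table, interviewers often expect the follow-up: a single DELETE may bloat the transaction log, so batching the delete (or rewriting the distinct rows into a new table and swapping) is worth mentioning.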
Data Engineer • Coding • medium
Write a Python generator function to read a 50GB CSV file chunk by chunk to prevent Out-Of-Memory (OOM) errors.
#Python
#Memory Management
#Generators
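A minimal sketch using only the standard library: the generator materializes at most one chunk of rows at a time, so memory stays bounded regardless of file size. Function name and chunk size are illustrative:

```python
import csv
from itertools import islice
from typing import Iterator, List

def read_csv_chunks(path: str, chunk_size: int = 100_000) -> Iterator[List[dict]]:
    """Yield lists of row-dicts, holding at most chunk_size rows in memory."""
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        while True:
            # islice pulls the next chunk_size rows lazily from the reader.
            chunk = list(islice(reader, chunk_size))
            if not chunk:
                break
            yield chunk

# Usage: each iteration processes one bounded chunk, never the whole file.
# for chunk in read_csv_chunks("sales.csv"):
#     process(chunk)
```

pandas offers the same idea via `pd.read_csv(..., chunksize=...)`; writing the generator by hand shows you understand why it avoids OOM.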
Data Engineer • Coding • medium
Write a Python decorator that calculates and logs the execution time of any data transformation function it is applied to.
#Python
#Decorators
#Logging
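One possible shape for the answer; `functools.wraps` (to preserve the wrapped function's metadata) and `time.perf_counter` are the details interviewers usually look for. The decorated transformation is a made-up example:

```python
import functools
import logging
import time

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("etl.timing")

def log_execution_time(func):
    """Log how long the wrapped transformation takes to run."""
    @functools.wraps(func)  # keep __name__/__doc__ of the wrapped function
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Log even if the transformation raises.
            elapsed = time.perf_counter() - start
            logger.info("%s took %.3fs", func.__name__, elapsed)
    return wrapper

@log_execution_time
def deduplicate(rows):
    # Hypothetical transformation: drop duplicates, preserving order.
    return list(dict.fromkeys(rows))
```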
Data Engineer • Coding • medium
Write a PySpark snippet to flatten a deeply nested JSON structure containing arrays of structs.
#PySpark
#Data Manipulation
#JSON
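In PySpark the answer combines `explode()` for arrays of structs with dotted selections like `col("user.name")`. To keep this sketch runnable without a Spark session, the same flatten-structs/explode-arrays logic is shown on plain Python dicts; the recursion mirrors what you would do column-by-column in Spark:

```python
from typing import Any, Dict, List

def flatten(record: Dict[str, Any], prefix: str = "") -> List[Dict[str, Any]]:
    """Flatten nested structs into dotted keys and explode arrays into
    extra rows, mirroring PySpark's col("a.b") selection plus explode()."""
    rows: List[Dict[str, Any]] = [{}]
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            # Struct: recurse and merge its flattened fields into every row.
            sub_rows = flatten(value, prefix=f"{name}.")
            rows = [{**r, **s} for r in rows for s in sub_rows]
        elif isinstance(value, list):
            # Array: each element becomes its own row (the "explode" step).
            exploded = []
            for item in value:
                if isinstance(item, dict):
                    exploded.extend(flatten(item, prefix=f"{name}."))
                else:
                    exploded.append({name: item})
            rows = [{**r, **e} for r in rows for e in exploded]
        else:
            rows = [{**r, name: value} for r in rows]
    return rows
```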
Data Engineer • Coding • medium
Explain Slowly Changing Dimension (SCD) Type 2. Write a conceptual SQL merge statement to implement it.
#Data Modeling
#Data Warehousing
#SQL
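SCD Type 2 preserves history: instead of overwriting a changed dimension row, the current row is expired (valid_to set, is_current flipped) and a new current version is inserted. Engines like Databricks and Snowflake express this as a single MERGE INTO; SQLite lacks MERGE, so this runnable sketch (hypothetical tables and dates) shows the same logic as an expire-then-insert pair:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_customer (
        customer_id INTEGER, city TEXT,
        valid_from TEXT, valid_to TEXT, is_current INTEGER
    );
    INSERT INTO dim_customer VALUES (1, 'Mumbai', '2024-01-01', '9999-12-31', 1);

    CREATE TABLE staging (customer_id INTEGER, city TEXT);
    INSERT INTO staging VALUES (1, 'Pune');  -- customer 1 moved
""")

# Step 1 (MERGE's WHEN MATCHED branch): expire the current row for any
# customer whose tracked attribute changed.
conn.execute("""
    UPDATE dim_customer
    SET valid_to = '2024-06-01', is_current = 0
    WHERE is_current = 1
      AND customer_id IN (
          SELECT s.customer_id FROM staging s
          JOIN dim_customer d ON d.customer_id = s.customer_id
          WHERE d.is_current = 1 AND d.city <> s.city
      )
""")

# Step 2 (WHEN NOT MATCHED branch): insert the new version as current.
conn.execute("""
    INSERT INTO dim_customer
    SELECT s.customer_id, s.city, '2024-06-01', '9999-12-31', 1
    FROM staging s
    LEFT JOIN dim_customer d
      ON d.customer_id = s.customer_id AND d.is_current = 1
    WHERE d.customer_id IS NULL OR d.city <> s.city
""")

history = conn.execute(
    "SELECT customer_id, city, is_current FROM dim_customer ORDER BY valid_from"
).fetchall()
```

After the run, both versions exist: the expired Mumbai row and the current Pune row.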
Data Engineer • Coding • medium
Write a PySpark script to read a CSV file, drop rows with null values in the 'customer_id' column, fill nulls in 'age' with the average age, and write to Parquet.
#PySpark
#Data Cleansing
#Coding
Data Engineer • Coding • medium
What are Window functions in PySpark? Write a PySpark code snippet to calculate the running total of sales per region ordered by date.
#PySpark
#Window Functions
#Coding
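In PySpark the answer is a window spec, `Window.partitionBy("region").orderBy("date")`, applied as `F.sum("sales").over(w)`. The SQL window clause below is the exact equivalent, demonstrated via sqlite3 so it runs anywhere (table and figures are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, sale_date TEXT, amount INTEGER);
    INSERT INTO sales VALUES
        ('East', '2024-01-01', 100), ('East', '2024-01-02', 50),
        ('West', '2024-01-01', 200), ('West', '2024-01-03', 25);
""")

# Same window spec as Window.partitionBy("region").orderBy("date"):
# PARTITION BY resets the sum per region, ORDER BY makes it cumulative.
running = conn.execute("""
    SELECT region, sale_date, amount,
           SUM(amount) OVER (
               PARTITION BY region ORDER BY sale_date
           ) AS running_total
    FROM sales
    ORDER BY region, sale_date
""").fetchall()
```

Worth mentioning in the interview: with ORDER BY present, the default frame is RANGE up to the current row, so tied dates share a running total unless you tighten the frame to ROWS.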
Data Engineer • System Design • medium
Design an Azure Data Factory (ADF) pipeline to perform a daily incremental load (Delta load) from an on-premise SQL Server to Azure Synapse Analytics.
#Azure Data Factory
#ETL
#Incremental Load
Data Engineer • System Design • medium
Explain the Medallion Architecture in Databricks. What specific transformations happen between the Bronze, Silver, and Gold layers?
#Databricks
#Data Lakehouse
#Architecture
Data Engineer • System Design • hard
Design a data migration strategy for a client moving 500TB of historical data from an on-premise Hadoop cluster to AWS S3 / Azure Data Lake.
#Cloud Migration
#AWS
#Azure
#Architecture
Data Engineer • System Design • hard
Design an end-to-end data pipeline for a retail client that generates 10TB of POS transaction data daily. The data needs to be available for BI reporting by 8 AM every day.
#ETL
#Batch Processing
#Architecture
Data Engineer • System Design • hard
How do you handle 'late-arriving data' in a daily batch ETL pipeline? For example, data from Monday arrives on Wednesday.
#ETL
#Data Quality
#Architecture
Data Engineer • System Design • medium
Explain the difference between Lambda and Kappa architectures. Which one is more prevalent in modern cloud data engineering?
#Architecture
#Streaming
#Batch Processing
Data Engineer • Technical • medium
Explain the difference between a CTE and a Temporary Table. In a scenario where you are processing 50 million rows for a client report, which one would you choose and why?
#SQL
#Performance Tuning
#Database Architecture
Data Engineer • Technical • medium
In PySpark, what is the difference between a Broadcast Hash Join and a Sort Merge Join? When does Spark automatically choose a Broadcast Join?
#PySpark
#Joins
#Optimization
Data Engineer • Technical • hard
You are joining a massive fact table with a dimension table in PySpark, and the job is stuck at 99% on the last task. How do you identify and resolve this issue?
#PySpark
#Data Skewness
#Performance Tuning
Data Engineer • Technical • easy
Explain the exact difference between repartition() and coalesce() in PySpark. If you are writing data to an Azure Data Lake, which one do you use to reduce the number of output files?
#PySpark
#Partitioning
#Shuffling
Data Engineer • Technical • hard
Your PySpark ETL job running on an AWS EMR cluster is failing with an 'Out Of Memory (OOM)' error. Walk me through your step-by-step debugging process.
#PySpark
#Troubleshooting
#Memory Management
Data Engineer • Technical • medium
What is the difference between cache() and persist() in Spark? What are the different storage levels available?
#PySpark
#Caching
#Optimization
Data Engineer • Technical • medium
Why are Python UDFs (User Defined Functions) considered bad for performance in PySpark, and what is the modern alternative?
#PySpark
#UDFs
#Optimization
Data Engineer • Technical • hard
How do you optimize a slow-running Copy Activity in Azure Data Factory that is transferring 5TB of data?
#Azure Data Factory
#Performance Tuning
Data Engineer • Technical • hard
How does Delta Lake implement ACID transactions on top of cloud object storage (like S3 or ADLS)?
#Databricks
#Delta Lake
#ACID
Data Engineer • Technical • medium
In Snowflake, what are micro-partitions and how do clustering keys improve query performance on large tables?
#Snowflake
#Architecture
#Performance Tuning
Data Engineer • Technical • easy
Explain the concept of 'Time Travel' in Snowflake or Delta Lake. How would you recover a table that was accidentally dropped yesterday?
#Snowflake
#Delta Lake
#Data Recovery
Data Engineer • Technical • hard
How do you achieve exactly-once processing semantics in an Apache Kafka to Spark Structured Streaming pipeline?
#Kafka
#Spark Streaming
#Architecture
Data Engineer • Technical • medium
You notice a high consumer lag in your Kafka topic. What are the potential reasons, and how would you resolve it?
#Kafka
#Troubleshooting
#Streaming
Data Engineer • Technical • medium
What is a Factless Fact Table? Give a real-world business scenario where you would use one.
#Data Modeling
#Data Warehousing
Data Engineer • Technical • easy
Compare Star Schema and Snowflake Schema. If a client prioritizes fast read performance for a BI dashboard, which one do you recommend and why?
#Data Modeling
#Data Warehousing
Difficulty Radar
Based on recent AI-sourced data.
Meet Your Interviewers
The "Standard" Interviewer
Senior Engineer. Focuses on core competencies, system constraints, and clear communication.
Unwritten Rules
Think Out Loud
Always explain your thought process before writing code or drawing architecture.