TCS

Tata Consultancy Services (TCS) is a large multinational IT services and consulting company headquartered in Mumbai, India.

Interview process: 3 rounds · ~14 days · Medium difficulty

The Interview Loop

Recruiter Screen (30 min)

Standard fit check, behavioral questions, and resume overview.

Technical Loop (3-4 Rounds)

Deep dive into domain knowledge, coding, and system design.

Interview Question Bank

Data Engineer Behavioral medium

Tell me about a time when a client changed the requirements of a data pipeline in the middle of a sprint. How did you handle it?

#Agile #Client Management #Adaptability
Data Engineer Behavioral hard

A critical production data pipeline fails at 2 AM on a Saturday, and the client SLA is at risk. Walk me through your immediate actions.

#Incident Management #Communication #Problem Solving
Data Engineer Behavioral medium

How do you explain a complex technical issue, like a Spark data skew causing pipeline delays, to a non-technical business stakeholder?

#Communication #Stakeholder Management
Data Engineer Behavioral medium

Describe a situation where you had a disagreement with the QA/Testing team regarding a bug they raised in your ETL code. How was it resolved?

#Conflict Resolution #Teamwork
Data Engineer Behavioral easy

TCS places a heavy emphasis on continuous learning through platforms like Elevate and Xplore. How do you keep your technical skills updated in the rapidly changing Data Engineering landscape?

#Continuous Learning #TCS Values
Data Engineer Behavioral easy

Why do you want to join TCS as a Data Engineer, and how does your previous experience align with our focus on enterprise-scale digital transformations?

#Motivation #Company Knowledge
Data Engineer Coding medium

Write a SQL query to find the second highest salary in each department without using the MAX() function.

#SQL #Window Functions #Aggregations
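A common answer replaces MAX() with DENSE_RANK(). The sketch below runs against an in-memory SQLite database with a hypothetical employees table (names and salaries are illustrative); the same query works on most engines that support window functions, and DENSE_RANK() also handles salary ties within a department.

```python
import sqlite3

# Hypothetical employees table -- schema and data are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER);
    INSERT INTO employees VALUES
        ('Asha',  'Eng', 90000), ('Ravi',  'Eng', 80000), ('Meena', 'Eng', 70000),
        ('Kiran', 'HR',  60000), ('Divya', 'HR',  55000);
""")

# DENSE_RANK() avoids MAX() entirely: rank salaries per department,
# then keep the rows ranked second.
rows = conn.execute("""
    SELECT department, salary AS second_highest
    FROM (
        SELECT department, salary,
               DENSE_RANK() OVER (
                   PARTITION BY department ORDER BY salary DESC
               ) AS rnk
        FROM employees
    )
    WHERE rnk = 2
    ORDER BY department;
""").fetchall()
print(rows)  # [('Eng', 80000), ('HR', 55000)]
```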
Data Engineer Coding medium

How do you delete duplicate records from a massive table in SQL while keeping exactly one instance of each duplicate?

#SQL #Data Cleansing #CTEs
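One widely used pattern keeps the row with the smallest internal row identifier per duplicate group. The SQLite sketch below (table and columns are hypothetical) uses `rowid`; on SQL Server or PostgreSQL the same idea is usually written as a ROW_NUMBER() CTE followed by `DELETE WHERE rn > 1`, which scales better on very large tables because it avoids a `NOT IN` over the whole table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (customer_id INTEGER, product TEXT);
    INSERT INTO orders VALUES
        (1, 'A'), (1, 'A'), (2, 'B'), (2, 'B'), (2, 'B'), (3, 'C');
""")

# Keep exactly one row per (customer_id, product) group: the one with
# the smallest rowid. Everything else in the group is deleted.
conn.execute("""
    DELETE FROM orders
    WHERE rowid NOT IN (
        SELECT MIN(rowid) FROM orders GROUP BY customer_id, product
    );
""")
remaining = conn.execute("SELECT * FROM orders ORDER BY customer_id").fetchall()
print(remaining)  # [(1, 'A'), (2, 'B'), (3, 'C')]
```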
Data Engineer Coding medium

Write a Python generator function to read a 50GB CSV file chunk by chunk to prevent Out-Of-Memory (OOM) errors.

#Python #Memory Management #Generators
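A generator keeps at most one chunk of rows in memory regardless of file size. The sketch below uses an in-memory file to stand in for the 50GB CSV; the chunk size and column names are illustrative.

```python
import csv
import io
from typing import Iterator, List

def read_csv_in_chunks(fileobj, chunk_size: int = 10_000) -> Iterator[List[dict]]:
    """Yield lists of row dicts, holding at most chunk_size rows in memory."""
    reader = csv.DictReader(fileobj)
    chunk = []
    for row in reader:
        chunk.append(row)
        if len(chunk) >= chunk_size:
            yield chunk
            chunk = []
    if chunk:          # don't lose the trailing partial chunk
        yield chunk

# Demo: a tiny in-memory file standing in for the 50GB CSV.
data = io.StringIO("id,value\n1,a\n2,b\n3,c\n4,d\n5,e\n")
chunks = list(read_csv_in_chunks(data, chunk_size=2))
print([len(c) for c in chunks])  # [2, 2, 1]
```

Because each chunk is yielded and then discarded, peak memory is bounded by `chunk_size` rows rather than by the file size.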
Data Engineer Coding medium

Write a Python decorator that calculates and logs the execution time of any data transformation function it is applied to.

#Python #Decorators #Logging
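A minimal sketch of such a decorator; the wrapped `deduplicate` transformation is just a stand-in example. `functools.wraps` preserves the wrapped function's name and docstring, which matters when many transformations share the decorator.

```python
import functools
import logging
import time

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("etl")

def log_execution_time(func):
    """Log how long the wrapped transformation takes to run."""
    @functools.wraps(func)        # keep func's metadata on the wrapper
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        logger.info("%s finished in %.4f s", func.__name__, elapsed)
        return result
    return wrapper

@log_execution_time
def deduplicate(rows):
    return list(dict.fromkeys(rows))   # order-preserving dedup

print(deduplicate([3, 1, 3, 2, 1]))    # [3, 1, 2]
```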
Data Engineer Coding medium

Write a PySpark snippet to flatten a deeply nested JSON structure containing arrays of structs.

#PySpark #Data Manipulation #JSON
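In PySpark this is typically done with `F.explode()` for arrays plus selecting `"struct.*"` columns, often looping over the schema until no nested types remain. Since that needs a running Spark session, the sketch below shows the same flatten-and-explode logic in plain Python on dicts; the record shape and field names are made up.

```python
from typing import Any, Dict, List

def flatten(record: Dict[str, Any], prefix: str = "") -> List[Dict[str, Any]]:
    """Flatten nested dicts into dotted keys; explode lists into extra rows,
    mirroring PySpark's select("struct.*") and F.explode()."""
    rows = [{}]
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            # struct: recurse, merging its flattened fields into each row
            sub_rows = flatten(value, name + ".")
            rows = [dict(r, **s) for r in rows for s in sub_rows]
        elif isinstance(value, list):
            # array: each element becomes its own output row (explode)
            exploded = []
            for element in value:
                if isinstance(element, dict):
                    exploded.extend(flatten(element, name + "."))
                else:
                    exploded.append({name: element})
            rows = [dict(r, **e) for r in rows for e in exploded]
        else:
            rows = [dict(r, **{name: value}) for r in rows]
    return rows

record = {
    "order_id": 1,
    "customer": {"name": "Asha", "city": "Pune"},
    "items": [{"sku": "A", "qty": 2}, {"sku": "B", "qty": 1}],
}
result = flatten(record)
print(result[0])
# {'order_id': 1, 'customer.name': 'Asha', 'customer.city': 'Pune',
#  'items.sku': 'A', 'items.qty': 2}
```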
Data Engineer Coding medium

Explain Slowly Changing Dimension (SCD) Type 2. Write a conceptual SQL merge statement to implement it.

#Data Modeling #Data Warehousing #SQL
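SCD Type 2 preserves history: instead of overwriting a changed attribute, the current row is "expired" and a new current row is inserted. SQLite has no MERGE, so this sketch shows the semantics as the classic two-step expire-then-insert; on engines with MERGE (Snowflake, Synapse, Databricks) the two steps collapse into one `MERGE ... WHEN MATCHED ... WHEN NOT MATCHED` statement. The schema and dates are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_customer (
        customer_id INTEGER, city TEXT,
        valid_from TEXT, valid_to TEXT, is_current INTEGER
    );
    INSERT INTO dim_customer VALUES (101, 'Pune', '2024-01-01', '9999-12-31', 1);
    CREATE TABLE staging (customer_id INTEGER, city TEXT);
    INSERT INTO staging VALUES (101, 'Mumbai');  -- changed attribute
""")

# Step 1 (WHEN MATCHED in a real MERGE): expire the current version
# whose tracked attribute differs from the incoming record.
conn.execute("""
    UPDATE dim_customer
    SET valid_to = '2024-06-01', is_current = 0
    WHERE is_current = 1
      AND EXISTS (
          SELECT 1 FROM staging s
          WHERE s.customer_id = dim_customer.customer_id
            AND s.city <> dim_customer.city
      );
""")
# Step 2 (WHEN NOT MATCHED): insert the new version as the current row.
conn.execute("""
    INSERT INTO dim_customer
    SELECT s.customer_id, s.city, '2024-06-01', '9999-12-31', 1
    FROM staging s
    WHERE NOT EXISTS (
        SELECT 1 FROM dim_customer d
        WHERE d.customer_id = s.customer_id AND d.is_current = 1
    );
""")
history = conn.execute("""
    SELECT city, is_current FROM dim_customer
    WHERE customer_id = 101 ORDER BY valid_from;
""").fetchall()
print(history)  # [('Pune', 0), ('Mumbai', 1)]
```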
Data Engineer Coding medium

Write a PySpark script to read a CSV file, drop rows with null values in the 'customer_id' column, fill nulls in 'age' with the average age, and write to Parquet.

#PySpark #Data Cleansing #Coding
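The PySpark answer (sketched in the comments below, not executed here) chains `spark.read.csv`, `na.drop`, `na.fill`, and `write.parquet`. To keep the example runnable without a Spark cluster, the same cleansing logic is shown in plain Python on a made-up list of rows.

```python
# PySpark sketch (assumed paths and column names; not executed here):
#   df = spark.read.csv("input.csv", header=True, inferSchema=True)
#   df = df.na.drop(subset=["customer_id"])
#   avg_age = df.agg(F.avg("age")).first()[0]
#   df = df.na.fill({"age": avg_age})
#   df.write.mode("overwrite").parquet("output/")

# The same logic in plain Python so this block actually runs:
rows = [
    {"customer_id": 1,    "age": 30},
    {"customer_id": None, "age": 40},   # dropped: null customer_id
    {"customer_id": 2,    "age": None}, # filled with the average age
]
rows = [r for r in rows if r["customer_id"] is not None]     # na.drop
known = [r["age"] for r in rows if r["age"] is not None]
avg_age = sum(known) / len(known)
for r in rows:                                               # na.fill
    if r["age"] is None:
        r["age"] = avg_age
print(rows)  # [{'customer_id': 1, 'age': 30}, {'customer_id': 2, 'age': 30.0}]
```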
Data Engineer Coding medium

What are Window functions in PySpark? Write a PySpark code snippet to calculate the running total of sales per region ordered by date.

#PySpark #Window Functions #Coding
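PySpark's Window API mirrors SQL window functions, so the running total can be shown with SQLite (which this sketch executes) while the equivalent PySpark calls sit in the comments. Table, regions, and amounts are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, sale_date TEXT, amount INTEGER);
    INSERT INTO sales VALUES
        ('South', '2024-01-01', 100), ('South', '2024-01-02', 50),
        ('North', '2024-01-01', 200), ('North', '2024-01-03', 25);
""")

# PySpark equivalent (sketch):
#   w = (Window.partitionBy("region").orderBy("sale_date")
#              .rowsBetween(Window.unboundedPreceding, Window.currentRow))
#   df.withColumn("running_total", F.sum("amount").over(w))
rows = conn.execute("""
    SELECT region, sale_date, amount,
           SUM(amount) OVER (
               PARTITION BY region ORDER BY sale_date
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           ) AS running_total
    FROM sales
    ORDER BY region, sale_date;
""").fetchall()
print(rows)
```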
Data Engineer System Design medium

Design an Azure Data Factory (ADF) pipeline to perform a daily incremental load (Delta load) from an on-premise SQL Server to Azure Synapse Analytics.

#Azure Data Factory #ETL #Incremental Load
Data Engineer System Design medium

Explain the Medallion Architecture in Databricks. What specific transformations happen between the Bronze, Silver, and Gold layers?

#Databricks #Data Lakehouse #Architecture
Data Engineer System Design hard

Design a data migration strategy for a client moving 500TB of historical data from an on-premise Hadoop cluster to AWS S3 / Azure Data Lake.

#Cloud Migration #AWS #Azure #Architecture
Data Engineer System Design hard

Design an end-to-end data pipeline for a retail client that generates 10TB of POS transaction data daily. The data needs to be available for BI reporting by 8 AM every day.

#ETL #Batch Processing #Architecture
Data Engineer System Design hard

How do you handle 'late-arriving data' in a daily batch ETL pipeline? For example, data from Monday arrives on Wednesday.

#ETL #Data Quality #Architecture
Data Engineer System Design medium

Explain the difference between Lambda and Kappa architectures. Which one is more prevalent in modern cloud data engineering?

#Architecture #Streaming #Batch Processing
Data Engineer Technical medium

Explain the difference between a CTE and a Temporary Table. In a scenario where you are processing 50 million rows for a client report, which one would you choose and why?

#SQL #Performance Tuning #Database Architecture
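A CTE is inlined into the single statement that uses it, while a temp table is materialized once, can be indexed, and can be reused across statements; for a 50M-row intermediate set that is read more than once, the temp table usually wins. A minimal SQLite illustration of the two forms (table and data are hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE txns (id INTEGER, amount INTEGER);
    INSERT INTO txns VALUES (1, 500), (2, 1500), (3, 2500);
""")

# CTE: scoped to this one statement; nothing is materialized for reuse.
cte_count = conn.execute("""
    WITH big_txns AS (SELECT * FROM txns WHERE amount > 1000)
    SELECT COUNT(*) FROM big_txns;
""").fetchone()

# Temp table: materialized once, indexable, reusable across statements --
# the usual choice when a large intermediate result feeds several queries.
conn.execute("CREATE TEMP TABLE big_txns AS SELECT * FROM txns WHERE amount > 1000;")
conn.execute("CREATE INDEX idx_big_amount ON big_txns(amount);")
tmp_count = conn.execute("SELECT COUNT(*) FROM big_txns;").fetchone()
print(cte_count, tmp_count)  # (2,) (2,)
```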
Data Engineer Technical medium

In PySpark, what is the difference between a Broadcast Hash Join and a Sort Merge Join? When does Spark automatically choose a Broadcast Join?

#PySpark #Joins #Optimization
Data Engineer Technical hard

You are joining a massive fact table with a dimension table in PySpark, and the job is stuck at 99% on the last task. How do you identify and resolve this issue?

#PySpark #Data Skewness #Performance Tuning
Data Engineer Technical easy

Explain the exact difference between repartition() and coalesce() in PySpark. If you are writing data to an Azure Data Lake, which one do you use to reduce the number of output files?

#PySpark #Partitioning #Shuffling
Data Engineer Technical hard

Your PySpark ETL job running on an AWS EMR cluster is failing with an 'Out Of Memory (OOM)' error. Walk me through your step-by-step debugging process.

#PySpark #Troubleshooting #Memory Management
Data Engineer Technical medium

What is the difference between Cache() and Persist() in Spark? What are the different storage levels available?

#PySpark #Caching #Optimization
Data Engineer Technical medium

Why are Python UDFs (User Defined Functions) considered bad for performance in PySpark, and what is the modern alternative?

#PySpark #UDFs #Optimization
Data Engineer Technical hard

How do you optimize a slow-running Copy Activity in Azure Data Factory that is transferring 5TB of data?

#Azure Data Factory #Performance Tuning
Data Engineer Technical hard

How does Delta Lake implement ACID transactions on top of cloud object storage (like S3 or ADLS)?

#Databricks #Delta Lake #ACID
Data Engineer Technical medium

In Snowflake, what are micro-partitions and how do clustering keys improve query performance on large tables?

#Snowflake #Architecture #Performance Tuning
Data Engineer Technical easy

Explain the concept of 'Time Travel' in Snowflake or Delta Lake. How would you recover a table that was accidentally dropped yesterday?

#Snowflake #Delta Lake #Data Recovery
Data Engineer Technical hard

How do you achieve exactly-once processing semantics in an Apache Kafka to Spark Structured Streaming pipeline?

#Kafka #Spark Streaming #Architecture
Data Engineer Technical medium

You notice a high consumer lag in your Kafka topic. What are the potential reasons, and how would you resolve it?

#Kafka #Troubleshooting #Streaming
Data Engineer Technical medium

What is a Factless Fact Table? Give a real-world business scenario where you would use one.

#Data Modeling #Data Warehousing
Data Engineer Technical easy

Compare Star Schema and Snowflake Schema. If a client prioritizes fast read performance for a BI dashboard, which one do you recommend and why?

#Data Modeling #Data Warehousing


Meet Your Interviewers

The "Standard" Interviewer

Senior Engineer

Focuses on core competencies, system constraints, and clear communication.


Unwritten Rules

Think Out Loud

Always explain your thought process before writing code or drawing architecture.
