TCS
Large multinational IT services and consulting enterprise based in India.
Rounds: 3
Timeline: ~14 days
Difficulty: Medium
The Interview Loop
Recruiter Screen (30 min)
Standard fit check, behavioral questions, and resume overview.
Technical Loop (3-4 Rounds)
Deep dive into domain knowledge, coding, and system design.
Interview Question Bank
Data Engineer • Behavioral • medium
Tell me about a time when a client changed the requirements of a data pipeline in the middle of a sprint. How did you handle it?
#Agile
#Client Management
#Adaptability
Data Engineer • Behavioral • hard
A critical production data pipeline fails at 2 AM on a Saturday, and the client SLA is at risk. Walk me through your immediate actions.
#Incident Management
#Communication
#Problem Solving
Data Engineer • Behavioral • medium
How do you explain a complex technical issue, like a Spark data skew causing pipeline delays, to a non-technical business stakeholder?
#Communication
#Stakeholder Management
Data Engineer • Behavioral • medium
Describe a situation where you had a disagreement with the QA/Testing team regarding a bug they raised in your ETL code. How was it resolved?
#Conflict Resolution
#Teamwork
Data Engineer • Behavioral • easy
TCS places a heavy emphasis on continuous learning through platforms like Elevate and Xplore. How do you keep your technical skills updated in the rapidly changing Data Engineering landscape?
#Continuous Learning
#TCS Values
Data Engineer • Behavioral • easy
Why do you want to join TCS as a Data Engineer, and how does your previous experience align with our focus on enterprise-scale digital transformations?
#Motivation
#Company Knowledge
Data Engineer • Coding • medium
Write a SQL query to find the second highest salary in each department without using the MAX() function.
#SQL
#Window Functions
#Aggregations
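One possible answer, sketched against a hypothetical employees(dept, salary) table. DENSE_RANK() avoids MAX() and handles salary ties; the demo runs on Python's built-in sqlite3 (SQLite 3.25+ supports window functions):

```python
import sqlite3

# Hypothetical demo table; names and values are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (name TEXT, dept TEXT, salary INTEGER);
    INSERT INTO employees VALUES
        ('a', 'IT', 90000), ('b', 'IT', 80000), ('c', 'IT', 80000),
        ('d', 'HR', 60000), ('e', 'HR', 50000);
""")

# DENSE_RANK() gives tied salaries the same rank, so rank 2 is the
# second-highest DISTINCT salary in each department -- no MAX() needed.
rows = conn.execute("""
    SELECT DISTINCT dept, salary
    FROM (
        SELECT dept, salary,
               DENSE_RANK() OVER (PARTITION BY dept ORDER BY salary DESC) AS rnk
        FROM employees
    ) ranked
    WHERE rnk = 2
    ORDER BY dept
""").fetchall()
```

A correlated subquery (`salary < (SELECT MAX...)`) is the classic alternative, but it is disallowed here and scans worse on large tables.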
Data Engineer • Coding • medium
How do you delete duplicate records from a massive table in SQL while keeping exactly one instance of each duplicate?
#SQL
#Data Cleansing
#CTEs
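A common pattern: in engines with ROW_NUMBER(), delete rows where the row number within each duplicate group exceeds 1. SQLite (used here so the sketch is runnable) exposes an internal rowid that serves the same purpose; table and data are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER, email TEXT);
    INSERT INTO customers VALUES
        (1, 'a@x.com'), (1, 'a@x.com'),
        (2, 'b@x.com'), (2, 'b@x.com'), (2, 'b@x.com'),
        (3, 'c@x.com');
""")

# Keep only the row with the smallest rowid per duplicate group.
# In SQL Server you would instead DELETE from a ROW_NUMBER() CTE WHERE rn > 1.
conn.execute("""
    DELETE FROM customers
    WHERE rowid NOT IN (
        SELECT MIN(rowid) FROM customers GROUP BY customer_id, email
    )
""")
remaining = conn.execute(
    "SELECT customer_id, email FROM customers ORDER BY customer_id"
).fetchall()
```

For a truly massive table, interviewers often expect the follow-up: a single DELETE may bloat the transaction log, so batching the delete (or rewriting the distinct rows into a new table and swapping) is worth mentioning.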
Data Engineer • Coding • medium
Write a Python generator function to read a 50GB CSV file chunk by chunk to prevent Out-Of-Memory (OOM) errors.
#Python
#Memory Management
#Generators
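A minimal sketch using only the standard library: the generator materializes at most one chunk of rows at a time, so memory stays bounded regardless of file size. Function name and chunk size are illustrative:

```python
import csv
from itertools import islice
from typing import Iterator, List

def read_csv_chunks(path: str, chunk_size: int = 100_000) -> Iterator[List[dict]]:
    """Yield lists of row-dicts, holding at most chunk_size rows in memory."""
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        while True:
            # islice pulls the next chunk_size rows lazily from the reader.
            chunk = list(islice(reader, chunk_size))
            if not chunk:
                break
            yield chunk

# Usage: each iteration processes one bounded chunk, never the whole file.
# for chunk in read_csv_chunks("sales.csv"):
#     process(chunk)
```

pandas offers the same idea via `pd.read_csv(..., chunksize=...)`; writing the generator by hand shows you understand why it avoids OOM.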
Data Engineer • Coding • medium
Write a Python decorator that calculates and logs the execution time of any data transformation function it is applied to.
#Python
#Decorators
#Logging
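One possible shape for the answer; `functools.wraps` (to preserve the wrapped function's metadata) and `time.perf_counter` are the details interviewers usually look for. The decorated transformation is a made-up example:

```python
import functools
import logging
import time

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("etl.timing")

def log_execution_time(func):
    """Log how long the wrapped transformation takes to run."""
    @functools.wraps(func)  # keep __name__/__doc__ of the wrapped function
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Log even if the transformation raises.
            elapsed = time.perf_counter() - start
            logger.info("%s took %.3fs", func.__name__, elapsed)
    return wrapper

@log_execution_time
def deduplicate(rows):
    # Hypothetical transformation: drop duplicates, preserving order.
    return list(dict.fromkeys(rows))
```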
Data Engineer • Coding • medium
Write a PySpark snippet to flatten a deeply nested JSON structure containing arrays of structs.
#PySpark
#Data Manipulation
#JSON
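In PySpark the answer combines `explode()` for arrays of structs with dotted selections like `col("user.name")`. To keep this sketch runnable without a Spark session, the same flatten-structs/explode-arrays logic is shown on plain Python dicts; the recursion mirrors what you would do column-by-column in Spark:

```python
from typing import Any, Dict, List

def flatten(record: Dict[str, Any], prefix: str = "") -> List[Dict[str, Any]]:
    """Flatten nested structs into dotted keys and explode arrays into
    extra rows, mirroring PySpark's col("a.b") selection plus explode()."""
    rows: List[Dict[str, Any]] = [{}]
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            # Struct: recurse and merge its flattened fields into every row.
            sub_rows = flatten(value, prefix=f"{name}.")
            rows = [{**r, **s} for r in rows for s in sub_rows]
        elif isinstance(value, list):
            # Array: each element becomes its own row (the "explode" step).
            exploded = []
            for item in value:
                if isinstance(item, dict):
                    exploded.extend(flatten(item, prefix=f"{name}."))
                else:
                    exploded.append({name: item})
            rows = [{**r, **e} for r in rows for e in exploded]
        else:
            rows = [{**r, name: value} for r in rows]
    return rows
```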
Data Engineer • Coding • medium
Explain Slowly Changing Dimension (SCD) Type 2. Write a conceptual SQL merge statement to implement it.
#Data Modeling
#Data Warehousing
#SQL
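SCD Type 2 preserves history: instead of overwriting a changed dimension row, the current row is expired (valid_to set, is_current flipped) and a new current version is inserted. Engines like Databricks and Snowflake express this as a single MERGE INTO; SQLite lacks MERGE, so this runnable sketch (hypothetical tables and dates) shows the same logic as an expire-then-insert pair:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_customer (
        customer_id INTEGER, city TEXT,
        valid_from TEXT, valid_to TEXT, is_current INTEGER
    );
    INSERT INTO dim_customer VALUES (1, 'Mumbai', '2024-01-01', '9999-12-31', 1);

    CREATE TABLE staging (customer_id INTEGER, city TEXT);
    INSERT INTO staging VALUES (1, 'Pune');  -- customer 1 moved
""")

# Step 1 (MERGE's WHEN MATCHED branch): expire the current row for any
# customer whose tracked attribute changed.
conn.execute("""
    UPDATE dim_customer
    SET valid_to = '2024-06-01', is_current = 0
    WHERE is_current = 1
      AND customer_id IN (
          SELECT s.customer_id FROM staging s
          JOIN dim_customer d ON d.customer_id = s.customer_id
          WHERE d.is_current = 1 AND d.city <> s.city
      )
""")

# Step 2 (WHEN NOT MATCHED branch): insert the new version as current.
conn.execute("""
    INSERT INTO dim_customer
    SELECT s.customer_id, s.city, '2024-06-01', '9999-12-31', 1
    FROM staging s
    LEFT JOIN dim_customer d
      ON d.customer_id = s.customer_id AND d.is_current = 1
    WHERE d.customer_id IS NULL OR d.city <> s.city
""")

history = conn.execute(
    "SELECT customer_id, city, is_current FROM dim_customer ORDER BY valid_from"
).fetchall()
```

After the run, both versions exist: the expired Mumbai row and the current Pune row.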
Data Engineer • Coding • medium
Write a PySpark script to read a CSV file, drop rows with null values in the 'customer_id' column, fill nulls in 'age' with the average age, and write to Parquet.
#PySpark
#Data Cleansing
#Coding
Data Engineer • Coding • medium
What are Window functions in PySpark? Write a PySpark code snippet to calculate the running total of sales per region ordered by date.
#PySpark
#Window Functions
#Coding
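In PySpark the answer is a window spec, `Window.partitionBy("region").orderBy("date")`, applied as `F.sum("sales").over(w)`. The SQL window clause below is the exact equivalent, demonstrated via sqlite3 so it runs anywhere (table and figures are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, sale_date TEXT, amount INTEGER);
    INSERT INTO sales VALUES
        ('East', '2024-01-01', 100), ('East', '2024-01-02', 50),
        ('West', '2024-01-01', 200), ('West', '2024-01-03', 25);
""")

# Same window spec as Window.partitionBy("region").orderBy("date"):
# PARTITION BY resets the sum per region, ORDER BY makes it cumulative.
running = conn.execute("""
    SELECT region, sale_date, amount,
           SUM(amount) OVER (
               PARTITION BY region ORDER BY sale_date
           ) AS running_total
    FROM sales
    ORDER BY region, sale_date
""").fetchall()
```

Worth mentioning in the interview: with ORDER BY present, the default frame is RANGE up to the current row, so tied dates share a running total unless you tighten the frame to ROWS.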
Data Engineer • System Design • medium
Design an Azure Data Factory (ADF) pipeline to perform a daily incremental load (Delta load) from an on-premise SQL Server to Azure Synapse Analytics.
#Azure Data Factory
#ETL
#Incremental Load
Data Engineer • System Design • medium
Explain the Medallion Architecture in Databricks. What specific transformations happen between the Bronze, Silver, and Gold layers?
#Databricks
#Data Lakehouse
#Architecture
Data Engineer • System Design • hard
Design a data migration strategy for a client moving 500TB of historical data from an on-premise Hadoop cluster to AWS S3 / Azure Data Lake.
#Cloud Migration
#AWS
#Azure
#Architecture
Data Engineer • System Design • hard
Design an end-to-end data pipeline for a retail client that generates 10TB of POS transaction data daily. The data needs to be available for BI reporting by 8 AM every day.
#ETL
#Batch Processing
#Architecture
Data Engineer • System Design • hard
How do you handle 'late-arriving data' in a daily batch ETL pipeline? For example, data from Monday arrives on Wednesday.
#ETL
#Data Quality
#Architecture
Data Engineer • System Design • medium
Explain the difference between Lambda and Kappa architectures. Which one is more prevalent in modern cloud data engineering?
#Architecture
#Streaming
#Batch Processing
Data Engineer • Technical • medium
Explain the difference between a CTE and a Temporary Table. In a scenario where you are processing 50 million rows for a client report, which one would you choose and why?
#SQL
#Performance Tuning
#Database Architecture
Data Engineer • Technical • medium
In PySpark, what is the difference between a Broadcast Hash Join and a Sort Merge Join? When does Spark automatically choose a Broadcast Join?
#PySpark
#Joins
#Optimization
Data Engineer • Technical • hard
You are joining a massive fact table with a dimension table in PySpark, and the job is stuck at 99% on the last task. How do you identify and resolve this issue?
#PySpark
#Data Skewness
#Performance Tuning
Data Engineer • Technical • easy
Explain the exact difference between repartition() and coalesce() in PySpark. If you are writing data to an Azure Data Lake, which one do you use to reduce the number of output files?
#PySpark
#Partitioning
#Shuffling
Data Engineer • Technical • hard
Your PySpark ETL job running on an AWS EMR cluster is failing with an 'Out Of Memory (OOM)' error. Walk me through your step-by-step debugging process.
#PySpark
#Troubleshooting
#Memory Management
Data Engineer • Technical • medium
What is the difference between cache() and persist() in Spark? What are the different storage levels available?
#PySpark
#Caching
#Optimization
Data Engineer • Technical • medium
Why are Python UDFs (User Defined Functions) considered bad for performance in PySpark, and what is the modern alternative?
#PySpark
#UDFs
#Optimization
Data Engineer • Technical • hard
How do you optimize a slow-running Copy Activity in Azure Data Factory that is transferring 5TB of data?
#Azure Data Factory
#Performance Tuning
Data Engineer • Technical • hard
How does Delta Lake implement ACID transactions on top of cloud object storage (like S3 or ADLS)?
#Databricks
#Delta Lake
#ACID
Data Engineer • Technical • medium
In Snowflake, what are micro-partitions and how do clustering keys improve query performance on large tables?
#Snowflake
#Architecture
#Performance Tuning
Data Engineer • Technical • easy
Explain the concept of 'Time Travel' in Snowflake or Delta Lake. How would you recover a table that was accidentally dropped yesterday?
#Snowflake
#Delta Lake
#Data Recovery
Data Engineer • Technical • hard
How do you achieve exactly-once processing semantics in an Apache Kafka to Spark Structured Streaming pipeline?
#Kafka
#Spark Streaming
#Architecture
Data Engineer • Technical • medium
You notice a high consumer lag in your Kafka topic. What are the potential reasons, and how would you resolve it?
#Kafka
#Troubleshooting
#Streaming
Data Engineer • Technical • medium
What is a Factless Fact Table? Give a real-world business scenario where you would use one.
#Data Modeling
#Data Warehousing
Data Engineer • Technical • easy
Compare Star Schema and Snowflake Schema. If a client prioritizes fast read performance for a BI dashboard, which one do you recommend and why?
#Data Modeling
#Data Warehousing
Difficulty Radar
Based on recent AI-sourced data.
Meet Your Interviewers
The "Standard" Interviewer
Senior Engineer. Focuses on core competencies, system constraints, and clear communication.
Unwritten Rules
Think Out Loud
Always explain your thought process before writing code or drawing architecture.