Infosys

Global leader in next-generation digital services and consulting.

3 Rounds · ~14 Days · Medium difficulty
The Interview Loop

Recruiter Screen (30 min)

Standard fit check, behavioral questions, and resume overview.

Technical Loop (3-4 Rounds)

Deep dive into domain knowledge, coding, and system design.

Interview Question Bank

Data Engineer Behavioral medium

Tell me about a time you had a disagreement with a client regarding a technical architecture choice. How did you resolve it?

#Client Communication #Conflict Resolution #Consulting
Data Engineer Behavioral medium

Describe a situation where you had to quickly learn a new technology to deliver an urgent project requirement.

#Adaptability #Continuous Learning #Delivery
Data Engineer Behavioral medium

How do you manage communication and ensure alignment when working in a distributed model with onshore and offshore teams?

#Communication #Agile #Global Delivery Model
Data Engineer Coding medium

Write a SQL query to find the 3rd highest salary from an Employee table without using the LIMIT keyword.

#Window Functions #Subqueries #DENSE_RANK
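
One way to sketch an answer, run here through Python's built-in sqlite3 so the SQL is verifiable (the Employee schema and sample rows are assumptions for illustration):

```python
import sqlite3

# Hypothetical Employee table; column names are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (id INTEGER, name TEXT, salary INTEGER)")
conn.executemany("INSERT INTO Employee VALUES (?, ?, ?)",
                 [(1, "A", 100), (2, "B", 200), (3, "C", 300),
                  (4, "D", 300), (5, "E", 400)])

# DENSE_RANK lets tied salaries share a rank, so "3rd highest" means
# the 3rd distinct salary -- and no LIMIT keyword is needed.
query = """
SELECT DISTINCT salary
FROM (
    SELECT salary, DENSE_RANK() OVER (ORDER BY salary DESC) AS rnk
    FROM Employee
)
WHERE rnk = 3
"""
third_highest = conn.execute(query).fetchone()[0]
print(third_highest)  # 200
```

An alternative without window functions is a correlated subquery counting distinct higher salaries, but DENSE_RANK is usually the cleaner interview answer.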
Data Engineer Coding medium

Write a Python function to flatten a deeply nested JSON object/dictionary into a single-level dictionary.

#Python #Recursion #Data Structures #JSON
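
A minimal recursive sketch; the dot-separated key convention is an assumption, since the question leaves the naming scheme open:

```python
def flatten(d, parent_key="", sep="."):
    """Recursively flatten a nested dict into a single level,
    joining path segments with `sep` (naming scheme assumed)."""
    items = {}
    for key, value in d.items():
        new_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested dicts, carrying the key path down.
            items.update(flatten(value, new_key, sep))
        else:
            items[new_key] = value
    return items

nested = {"a": 1, "b": {"c": 2, "d": {"e": 3}}}
print(flatten(nested))  # {'a': 1, 'b.c': 2, 'b.d.e': 3}
```

A strong follow-up is how to handle lists inside the JSON (index-based keys vs. leaving them as values) and whether very deep nesting warrants an iterative stack instead of recursion.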
Data Engineer Coding medium

Write a SQL query to delete duplicate rows from a table, keeping only the record with the lowest ID.

#CTEs #ROW_NUMBER() #Data Cleansing
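
A sketch using a CTE with ROW_NUMBER, executed via sqlite3 so it can be checked (the Person table and the `email` duplicate key are assumptions):

```python
import sqlite3

# Hypothetical table with duplicate emails; schema is illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Person (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany("INSERT INTO Person VALUES (?, ?)",
                 [(1, "a@x.com"), (2, "b@x.com"),
                  (3, "a@x.com"), (4, "a@x.com")])

# ROW_NUMBER over each email group ordered by id: rn = 1 marks the
# lowest-id row, so deleting rn > 1 keeps exactly that record.
conn.execute("""
WITH ranked AS (
    SELECT id, ROW_NUMBER() OVER (PARTITION BY email ORDER BY id) AS rn
    FROM Person
)
DELETE FROM Person WHERE id IN (SELECT id FROM ranked WHERE rn > 1)
""")
remaining = [row[0] for row in conn.execute("SELECT id FROM Person ORDER BY id")]
print(remaining)  # [1, 2]
```

On engines where DELETE cannot reference a CTE directly, the same logic works as `DELETE ... WHERE id NOT IN (SELECT MIN(id) ... GROUP BY email)`.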
Data Engineer Coding medium

Write a Python script to read a large CSV file (10GB) that doesn't fit into memory, filter rows based on a condition, and write to a new file.

#Python #Generators #File I/O #Memory Management
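
The key idea is streaming: iterating over a `csv` reader pulls one row at a time, so memory stays flat no matter the file size. A sketch, with a tiny demo file standing in for the 10GB input (file and column names are assumptions):

```python
import csv

def filter_large_csv(src, dst, predicate):
    """Stream a CSV row by row; the reader is a lazy iterator,
    so only one row is held in memory at a time."""
    with open(src, newline="") as fin, open(dst, "w", newline="") as fout:
        reader = csv.DictReader(fin)
        writer = csv.DictWriter(fout, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:  # lazy iteration -- constant memory
            if predicate(row):
                writer.writerow(row)

# Tiny stand-in for the 10GB file (names are hypothetical).
with open("sales.csv", "w", newline="") as f:
    f.write("region,amount\neast,50\nwest,200\neast,300\n")

filter_large_csv("sales.csv", "big_sales.csv",
                 lambda r: int(r["amount"]) > 100)
```

Worth mentioning in the interview: for heavier transformations, chunked reading (e.g. pandas `read_csv(chunksize=...)`) or moving the job to Spark are natural escalations.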
Data Engineer Coding easy

Write a SQL query to find employees who earn more than their direct managers.

#Self Join #Filtering
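
The classic self-join answer, run through sqlite3 for verification (table and column names are assumptions):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE Employee
                (id INTEGER, name TEXT, salary INTEGER, manager_id INTEGER)""")
conn.executemany("INSERT INTO Employee VALUES (?, ?, ?, ?)",
                 [(1, "Ann", 90, None), (2, "Bob", 100, 1), (3, "Cam", 60, 1)])

# Self join: alias e is the employee row, m is that employee's manager.
rows = conn.execute("""
SELECT e.name
FROM Employee e
JOIN Employee m ON e.manager_id = m.id
WHERE e.salary > m.salary
""").fetchall()
print([r[0] for r in rows])  # ['Bob']
```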
Data Engineer Coding medium

Given an array of strings, write a Python function to group anagrams together.

#Python #Hash Maps #Strings #Sorting
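
A standard hash-map sketch: the sorted letters of a word form a canonical key, and all anagrams share that key:

```python
from collections import defaultdict

def group_anagrams(words):
    """Group anagrams via a hash map keyed on sorted letters.
    O(n * k log k) for n words of length k."""
    groups = defaultdict(list)
    for word in words:
        groups["".join(sorted(word))].append(word)
    return list(groups.values())

print(group_anagrams(["eat", "tea", "tan", "ate", "nat", "bat"]))
# [['eat', 'tea', 'ate'], ['tan', 'nat'], ['bat']]
```

A common follow-up is replacing the sorted-string key with a 26-element letter-count tuple to drop the per-word sort to O(k).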
Data Engineer Coding medium

Write a SQL query to calculate the cumulative sum of sales per region, ordered by date.

#Window Functions #SUM() OVER #Aggregations
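
A sketch using `SUM() OVER`, run via sqlite3 (the Sales schema and rows are assumptions):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sales (region TEXT, sale_date TEXT, amount INTEGER)")
conn.executemany("INSERT INTO Sales VALUES (?, ?, ?)",
                 [("east", "2024-01-01", 10), ("east", "2024-01-02", 20),
                  ("west", "2024-01-01", 5), ("west", "2024-01-03", 15)])

# PARTITION BY restarts the running total per region; ORDER BY
# sale_date defines the cumulative frame within each partition.
rows = conn.execute("""
SELECT region, sale_date,
       SUM(amount) OVER (PARTITION BY region ORDER BY sale_date) AS running_total
FROM Sales
ORDER BY region, sale_date
""").fetchall()
print(rows)
# [('east', '2024-01-01', 10), ('east', '2024-01-02', 30),
#  ('west', '2024-01-01', 5), ('west', '2024-01-03', 20)]
```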
Data Engineer Coding easy

Write a Python program to find the missing number in an array containing integers from 1 to N.

#Python #Math #Arrays
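
The arithmetic-series trick: the expected sum of 1..N minus the actual sum is the missing value, in O(n) time and O(1) space:

```python
def missing_number(nums, n):
    """Expected sum n(n+1)/2 minus the actual sum yields the one
    missing integer; no sorting or extra storage required."""
    return n * (n + 1) // 2 - sum(nums)

print(missing_number([1, 2, 4, 5], 5))  # 3
```

XOR-ing 1..N against the array gives the same answer and avoids any overflow concern in fixed-width-integer languages.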
Data Engineer Coding medium

Write a PySpark script to read a CSV, drop rows with nulls in a specific column, group by another column, and write the output as Parquet.

#PySpark #DataFrame API #Data Cleaning
Data Engineer Coding medium

Write a SQL query to find the top 3 selling products in each category.

#Window Functions #RANK() #PARTITION BY
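
A sketch with `RANK() OVER (PARTITION BY ...)`, checked via sqlite3 (the Products schema and data are assumptions):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Products (category TEXT, product TEXT, units_sold INTEGER)")
conn.executemany("INSERT INTO Products VALUES (?, ?, ?)",
                 [("toys", "car", 50), ("toys", "doll", 70), ("toys", "ball", 30),
                  ("toys", "kite", 10), ("food", "tea", 90), ("food", "jam", 40)])

# RANK() restarts per category; ties share a rank, so a tie at rank 3
# can return more than three rows (often the desired behavior --
# use ROW_NUMBER() for exactly three).
rows = conn.execute("""
SELECT category, product
FROM (
    SELECT category, product,
           RANK() OVER (PARTITION BY category ORDER BY units_sold DESC) AS rnk
    FROM Products
)
WHERE rnk <= 3
ORDER BY category, rnk
""").fetchall()
print(rows)
```

Being able to explain the RANK vs. DENSE_RANK vs. ROW_NUMBER tie-handling difference is usually what interviewers are probing here.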
Data Engineer System Design medium

Design an ETL pipeline to migrate on-premise SQL Server data to Azure Synapse Analytics for a retail client.

#Azure Data Factory #Azure Synapse #Data Migration #ETL
Data Engineer System Design hard

Design a real-time streaming pipeline to process clickstream data and generate hourly aggregations.

#Kafka #Spark Structured Streaming #Real-time Processing #Cloud Architecture
Data Engineer System Design hard

Design a batch data pipeline to process 10TB of daily transaction logs, ensuring idempotency and fault tolerance.

#Batch Processing #Idempotency #Fault Tolerance #Data Lakehouse
Data Engineer System Design hard

Design a data quality framework for a newly built data lakehouse. What checks would you implement?

#Data Quality #Data Governance #Lakehouse
Data Engineer Technical hard

How do you handle data skewness in a PySpark join operation?

#PySpark #Performance Tuning #Data Skewness #Salting
Data Engineer Technical easy

Explain the difference between repartition() and coalesce() in PySpark. When would you use one over the other?

#PySpark #Data Partitioning #Shuffle
Data Engineer Technical medium

What is a Slowly Changing Dimension (SCD) Type 2? How would you implement it using PySpark?

#Data Warehousing #SCD #PySpark #ETL
Data Engineer Technical medium

Explain the internal execution hierarchy of a Spark application.

#Spark Architecture #Jobs #Stages #Tasks
Data Engineer Technical hard

How do you handle Out of Memory (OOM) errors in a PySpark application?

#PySpark #Troubleshooting #Memory Management
Data Engineer Technical easy

What is the difference between Star Schema and Snowflake Schema? Which one performs better in a modern cloud data warehouse?

#Data Warehousing #Star Schema #Snowflake Schema #Normalization
Data Engineer Technical medium

Explain the architecture of Snowflake. How does it separate storage and compute?

#Snowflake #Cloud Data Warehouse #Architecture
Data Engineer Technical medium

How do you implement incremental data loading in Azure Data Factory (ADF)?

#Azure Data Factory #ETL #Incremental Load #Watermarking
Data Engineer Technical hard

What is the Catalyst Optimizer in Spark? Explain its phases.

#Spark Internals #Catalyst Optimizer #Query Plans
Data Engineer Technical medium

How do you pass data between different tasks in an Apache Airflow DAG?

#Airflow #Orchestration #XComs
Data Engineer Technical medium

What are Delta Lake's ACID properties? How does it handle concurrent writes?

#Databricks #Delta Lake #ACID #Concurrency
Data Engineer Technical medium

Explain the differences between Parquet, ORC, and Avro file formats. When would you choose Parquet over Avro?

#File Formats #Storage Optimization #Parquet #Avro
Data Engineer Technical easy

What is the difference between cache() and persist() in PySpark?

#PySpark #Memory Management #Caching
Data Engineer Technical hard

How do you ensure exactly-once processing semantics in an Apache Kafka streaming application?

#Kafka #Streaming #Exactly-once Semantics
Data Engineer Technical medium

What is a Factless Fact table? Provide a real-world example of when you would use one.

#Data Warehousing #Fact Tables #Dimensional Modeling
Data Engineer Technical medium

Explain Time Travel and Fail-safe features in Snowflake. How do they differ?

#Snowflake #Data Recovery #Architecture
Data Engineer Technical medium

Compare AWS Glue and Amazon EMR. When would you recommend one over the other to a client?

#AWS #Glue #EMR #Serverless
Data Engineer Technical medium

What are Broadcast Variables and Accumulators in Spark? Give use cases for each.

#PySpark #Shared Variables #Optimization

Meet Your Interviewers

The "Standard" Interviewer

Senior Engineer

Focuses on core competencies, system constraints, and clear communication.


Unwritten Rules

Think Out Loud

Always explain your thought process before writing code or drawing architecture.
