Cognizant
Cognizant is an American multinational information technology services and consulting company.
4 Rounds • ~21 Days • Medium difficulty
The Interview Loop
Recruiter Screen (30 min)
Standard fit check, behavioral questions, and resume overview.
Technical Loop (3-4 Rounds)
Deep dive into domain knowledge, coding, and system design.
Interview Question Bank
Data Engineer • Behavioral • medium
Tell me about a time when a client changed the data requirements in the middle of a sprint. How did you handle the pipeline refactoring and communicate the impact?
#Agile
#Client Communication
#Adaptability
Data Engineer • Behavioral • hard
Tell me about a complex data pipeline you built from scratch. What were the biggest technical challenges and how did you overcome them?
#Project Experience
#Problem Solving
#End-to-End Delivery
Data Engineer • Behavioral • medium
Have you ever disagreed with a Data Architect or Lead regarding a pipeline design? How did you handle the disagreement?
#Conflict Resolution
#Communication
#Teamwork
Data Engineer • Behavioral • medium
Describe a time you had to optimize cloud costs for a data engineering project at a client site.
#Cost Optimization
#Cloud
#Client Delivery
Data Engineer • Coding • medium
Write a SQL query to find the top 3 highest earning employees in each department, handling ties appropriately.
#Window Functions
#DENSE_RANK
#PARTITION BY
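One common shape for this query, sketched here in SQLite so it runs standalone (table name, columns, and data are illustrative; the window-function syntax carries over to most warehouses). DENSE_RANK handles ties without skipping ranks, which also means a department with ties can return more than three people:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Ann", "IT", 90), ("Bob", "IT", 90), ("Cal", "IT", 80),
     ("Dee", "IT", 70), ("Eve", "HR", 60), ("Fay", "HR", 50)],
)

# DENSE_RANK gives both 90-salary employees rank 1 and the 80-salary
# employee rank 2 (no gap), so salaries 90, 90, 80, 70 rank 1, 1, 2, 3.
top3 = conn.execute("""
    SELECT name, department, salary
    FROM (
        SELECT name, department, salary,
               DENSE_RANK() OVER (
                   PARTITION BY department ORDER BY salary DESC
               ) AS rnk
        FROM employees
    )
    WHERE rnk <= 3
    ORDER BY department, salary DESC
""").fetchall()
```

Worth calling out in the interview: because of the tie at 90, all four IT employees fall in the top three salary ranks here; with RANK instead, Dee (rank 4) would be excluded.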
Data Engineer • Coding • medium
Write a PySpark script to read a CSV file from S3, drop rows with null values in a specific column, group by another column to find the average, and write the output back to S3 as Parquet.
#DataFrames
#I/O
#Aggregations
Data Engineer • Coding • easy
Write a Python function to check if two strings are anagrams of each other. Optimize it for time complexity.
#Python
#Strings
#Hash Maps
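A typical O(n) answer counts characters with a hash map rather than sorting both strings (which would be O(n log n)); a minimal sketch:

```python
from collections import Counter

def are_anagrams(a: str, b: str) -> bool:
    # Counting each character once is O(n) time and O(k) space for k
    # distinct characters; sorting both strings would cost O(n log n).
    return Counter(a) == Counter(b)

print(are_anagrams("listen", "silent"))  # True
print(are_anagrams("hello", "world"))    # False
```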
Data Engineer • Coding • medium
Write a SQL query to find the cumulative sum of sales per day for the current month.
#Window Functions
#Aggregations
#Date Functions
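A runnable SQLite sketch of the pattern (table, columns, and a fixed month are illustrative; in production the filter would use the engine's current-date function). Aggregate per day first, then take a running SUM() over the ordered days:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sale_date TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("2024-05-01", 10), ("2024-05-01", 5), ("2024-05-02", 20), ("2024-04-30", 99)],
)

# Inner query: one total per day for the target month.
# Outer window: cumulative sum across the ordered days.
rows = conn.execute("""
    SELECT sale_date,
           daily_total,
           SUM(daily_total) OVER (ORDER BY sale_date) AS running_total
    FROM (
        SELECT sale_date, SUM(amount) AS daily_total
        FROM sales
        WHERE strftime('%Y-%m', sale_date) = '2024-05'
        GROUP BY sale_date
    )
    ORDER BY sale_date
""").fetchall()
# rows == [('2024-05-01', 15, 15), ('2024-05-02', 20, 35)]
```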
Data Engineer • Coding • easy
Given a list of dictionaries representing employee data, write a Python script using list comprehensions to extract the names of employees who belong to the 'IT' department and have a salary > 80000.
#List Comprehensions
#Data Manipulation
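A direct one-pass answer (the sample records are illustrative):

```python
employees = [
    {"name": "Asha", "department": "IT", "salary": 95000},
    {"name": "Ben",  "department": "HR", "salary": 85000},
    {"name": "Chen", "department": "IT", "salary": 70000},
]

# Filter on both conditions, project just the name, all in one comprehension.
it_high_earners = [
    e["name"]
    for e in employees
    if e["department"] == "IT" and e["salary"] > 80000
]
# it_high_earners == ['Asha']
```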
Data Engineer • Coding • medium
Write a SQL query to delete duplicate rows from a table, keeping only the record with the lowest ID.
#Data Cleaning
#CTEs
#DELETE
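A portable form of the answer, sketched in SQLite (table and data are illustrative): keep the MIN(id) per duplicate key and delete everything else. On engines like SQL Server, the same idea is often written with a CTE plus ROW_NUMBER:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO contacts VALUES (?, ?)",
    [(1, "a@x.com"), (2, "a@x.com"), (3, "b@x.com"), (4, "a@x.com")],
)

# Keep the lowest id per email; delete every other copy.
conn.execute("""
    DELETE FROM contacts
    WHERE id NOT IN (
        SELECT MIN(id) FROM contacts GROUP BY email
    )
""")
remaining = conn.execute("SELECT id, email FROM contacts ORDER BY id").fetchall()
# remaining == [(1, 'a@x.com'), (3, 'b@x.com')]
```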
Data Engineer • Coding • medium
Write a Python generator function that reads a massive log file line by line and yields lines containing the word 'ERROR'. Why use a generator here?
#Generators
#Memory Management
#File I/O
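A sketch of the expected answer (an in-memory stream stands in for a real log file here). The "why": a generator yields one line at a time, so memory stays constant no matter how large the file is, whereas `readlines()` or a list comprehension would load the whole file:

```python
import io

def error_lines(log):
    """Yield only the lines containing 'ERROR', one at a time."""
    for line in log:
        if "ERROR" in line:
            yield line.rstrip("\n")

# In real use this would be: with open("app.log") as log: ...
log = io.StringIO("INFO boot\nERROR disk full\nWARN slow\nERROR timeout\n")
found = list(error_lines(log))
# found == ['ERROR disk full', 'ERROR timeout']
```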
Data Engineer • Coding • medium
Write a PySpark DataFrame query to pivot a table. You have columns: 'Store', 'Month', and 'Revenue'. Pivot the 'Month' column so each month is a separate column showing the revenue.
#Pivot
#Data Aggregation
Data Engineer • Coding • easy
Write a SQL query to find all employees who earn more than their direct managers. The table 'Employee' has columns: Id, Name, Salary, ManagerId.
#Self Join
#Filtering
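The classic self-join answer, runnable here via SQLite (the schema matches the question; the sample rows are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee (Id INTEGER, Name TEXT, Salary INTEGER, ManagerId INTEGER)"
)
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?, ?)",
    [(1, "Joe", 70000, 3), (2, "Henry", 80000, 4),
     (3, "Sam", 60000, None), (4, "Max", 90000, None)],
)

# Join the table to itself: e is the employee, m is that employee's manager.
rows = conn.execute("""
    SELECT e.Name
    FROM Employee e
    JOIN Employee m ON e.ManagerId = m.Id
    WHERE e.Salary > m.Salary
""").fetchall()
# rows == [('Joe',)]  -- Joe (70000) out-earns his manager Sam (60000)
```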
Data Engineer • Coding • hard
Write a Python script to flatten a deeply nested JSON object representing a client's API response into a flat dictionary.
#Recursion
#JSON
#Data Parsing
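A recursive sketch using dotted keys (the payload, separator, and key convention are illustrative choices; list handling, shown here by index, is the usual follow-up question):

```python
import json

def flatten(obj, parent_key="", sep="."):
    """Recursively flatten nested dicts and lists into one flat dict."""
    items = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            new_key = f"{parent_key}{sep}{key}" if parent_key else key
            items.update(flatten(value, new_key, sep))
    elif isinstance(obj, list):
        # Lists are flattened by position: tags.0, tags.1, ...
        for i, value in enumerate(obj):
            new_key = f"{parent_key}{sep}{i}" if parent_key else str(i)
            items.update(flatten(value, new_key, sep))
    else:
        items[parent_key] = obj
    return items

payload = json.loads('{"user": {"id": 7, "tags": ["a", "b"]}, "ok": true}')
flat = flatten(payload)
# flat == {'user.id': 7, 'user.tags.0': 'a', 'user.tags.1': 'b', 'ok': True}
```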
Data Engineer • System Design • hard
Design a batch ETL pipeline to migrate 500GB of daily transactional data from an on-premise Oracle database to Snowflake on AWS. What tools and architecture would you use?
#AWS
#Snowflake
#Data Migration
#ETL Architecture
Data Engineer • System Design • hard
Design a real-time streaming pipeline to process clickstream data from a retail client's website and update a live dashboard.
#Streaming
#Kafka
#Spark Streaming
#Real-time Analytics
Data Engineer • System Design • hard
Design a system to ingest and process daily healthcare claims data (HIPAA compliant). The data arrives as CSVs in an SFTP server.
#Healthcare
#Security
#ETL
#Cloud Architecture
Data Engineer • System Design • medium
Design a CI/CD pipeline for deploying Data Engineering assets (Airflow DAGs, Snowflake SQL scripts, PySpark code).
#CI/CD
#DevOps
#Git
#Jenkins/GitHub Actions
Data Engineer • Technical • easy
Explain the difference between ROW_NUMBER(), RANK(), and DENSE_RANK(). In what client reporting scenario would you choose DENSE_RANK over RANK?
#Window Functions
#Data Analysis
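The three functions differ only in how they number ties, which a tiny tied dataset makes concrete (sketched in SQLite; data is illustrative). RANK leaves a gap after a tie while DENSE_RANK does not, so DENSE_RANK fits client reports like "top 3 price tiers" where a tie should not push later values out of range:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (name TEXT, score INTEGER)")
conn.executemany(
    "INSERT INTO scores VALUES (?, ?)",
    [("A", 100), ("B", 100), ("C", 90)],
)

rows = conn.execute("""
    SELECT name,
           ROW_NUMBER() OVER (ORDER BY score DESC) AS row_num,
           RANK()       OVER (ORDER BY score DESC) AS rnk,
           DENSE_RANK() OVER (ORDER BY score DESC) AS dense_rnk
    FROM scores
""").fetchall()
# A and B (tied at 100): ROW_NUMBER gives them arbitrary distinct numbers 1 and 2,
# while RANK and DENSE_RANK both give 1.
# C (90): RANK jumps to 3 (gap after the tie), DENSE_RANK stays at 2.
```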
Data Engineer • Technical • hard
How do you handle data skewness in PySpark? Walk me through the exact steps you would take if a join operation is taking too long due to a skewed key.
#Performance Tuning
#Data Skew
#Salting
#Broadcast Join
Data Engineer • Technical • easy
What is the difference between a narrow and wide transformation in Spark? Give examples of each.
#Spark Architecture
#Transformations
#Shuffling
Data Engineer • Technical • medium
Explain the architecture of Snowflake. How does its separation of compute and storage benefit a multi-tenant consulting project?
#Snowflake
#Architecture
#Virtual Warehouses
Data Engineer • Technical • medium
How do you implement Slowly Changing Dimension (SCD) Type 2 in a data warehouse? Describe the necessary columns and the update logic.
#SCD
#Data Warehousing
#ETL
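The usual columns are a surrogate or business key, the tracked attributes, `valid_from`/`valid_to` dates, and an `is_current` flag; the update logic is "expire the old version, insert the new one". A minimal SQLite sketch of that two-step logic (table, column names, and dates are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE dim_customer (
        customer_id INTEGER,
        city        TEXT,
        valid_from  TEXT,
        valid_to    TEXT,
        is_current  INTEGER
    )
""")
conn.execute("INSERT INTO dim_customer VALUES (1, 'Pune', '2023-01-01', '9999-12-31', 1)")

def apply_scd2(conn, customer_id, new_city, change_date):
    # Step 1: close out the current version, but only if the tracked
    # attribute actually changed.
    cur = conn.execute("""
        UPDATE dim_customer
        SET valid_to = ?, is_current = 0
        WHERE customer_id = ? AND is_current = 1 AND city <> ?
    """, (change_date, customer_id, new_city))
    # Step 2: insert the new version as the open-ended current row.
    if cur.rowcount:
        conn.execute(
            "INSERT INTO dim_customer VALUES (?, ?, ?, '9999-12-31', 1)",
            (customer_id, new_city, change_date),
        )

apply_scd2(conn, 1, "Mumbai", "2024-06-01")
history = conn.execute(
    "SELECT city, valid_from, valid_to, is_current FROM dim_customer ORDER BY valid_from"
).fetchall()
# history: the Pune row is closed on 2024-06-01; Mumbai is the current row.
```

In a real warehouse the same logic is typically one MERGE statement or a dbt snapshot rather than two hand-written statements.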
Data Engineer • Technical • hard
You have a PySpark job failing with an OutOfMemory (OOM) error on the executor side. What are the potential causes and how do you troubleshoot it?
#Troubleshooting
#Memory Management
#OOM
Data Engineer • Technical • medium
What is the difference between repartition() and coalesce() in PySpark? When would you use one over the other?
#Partitioning
#Performance Tuning
Data Engineer • Technical • medium
Explain the concept of a Data Mesh. How does it differ from a traditional centralized Data Lake architecture?
#Data Mesh
#Data Lake
#Decentralization
Data Engineer • Technical • medium
Describe your experience with Apache Airflow. How do you pass data between tasks in an Airflow DAG?
#Airflow
#XComs
#DAGs
Data Engineer • Technical • easy
What are Parquet files? Why are they preferred over CSV or JSON in Big Data processing?
#File Formats
#Parquet
#Columnar Storage
Data Engineer • Technical • medium
How does Spark handle fault tolerance? Explain the role of the DAG and RDD lineage.
#Fault Tolerance
#Lineage
#DAG
Data Engineer • Technical • easy
What is the difference between a Left Join and an Inner Join? What happens to the result set if the right table has multiple matching rows for a single row in the left table?
#Joins
#Data Duplication
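The multiple-match behavior is easiest to show on two tiny tables (sketched in SQLite; data is illustrative): a left-table row is repeated once per matching right-table row, and rows with no match survive a LEFT JOIN with NULLs but vanish from an INNER JOIN:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER, name TEXT);
    CREATE TABLE orders (customer_id INTEGER, item TEXT);
    INSERT INTO customers VALUES (1, 'Ana'), (2, 'Raj');
    INSERT INTO orders VALUES (1, 'pen'), (1, 'ink');
""")

left = conn.execute("""
    SELECT c.name, o.item
    FROM customers c LEFT JOIN orders o ON c.id = o.customer_id
    ORDER BY c.name, o.item
""").fetchall()
# Ana is duplicated (one output row per matching order);
# Raj has no orders but survives with a NULL item:
# left == [('Ana', 'ink'), ('Ana', 'pen'), ('Raj', None)]

inner = conn.execute("""
    SELECT c.name, o.item
    FROM customers c JOIN orders o ON c.id = o.customer_id
""").fetchall()
# The INNER JOIN still duplicates Ana but drops Raj entirely: 2 rows.
```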
Data Engineer • Technical • medium
Explain the concept of Predicate Pushdown in Spark and Snowflake. How does it improve query performance?
#Predicate Pushdown
#Query Optimization
Data Engineer • Technical • medium
In AWS, what is the difference between Amazon Redshift and Amazon Athena? When would you use Athena for a client project?
#AWS
#Redshift
#Athena
#Serverless
Data Engineer • Technical • medium
What are User Defined Functions (UDFs) in PySpark? Why are Python UDFs generally discouraged, and what is the alternative?
#UDFs
#Performance Tuning
#Pandas UDFs
Data Engineer • Technical • hard
Explain the CAP theorem. How does it apply to choosing a NoSQL database like Cassandra vs MongoDB for a specific use case?
#CAP Theorem
#NoSQL
#System Architecture
Data Engineer • Technical • medium
What is the difference between Star Schema and Snowflake Schema? Which one is preferred in modern columnar data warehouses and why?
#Star Schema
#Snowflake Schema
#Dimensional Modeling
Difficulty Radar (chart based on recent AI-sourced data)
Meet Your Interviewers
The "Standard" Interviewer
Senior Engineer. Focuses on core competencies, system constraints, and clear communication.
Unwritten Rules
Think Out Loud
Always explain your thought process before writing code or drawing architecture.