Cognizant

Cognizant is an American multinational information technology services and consulting company.

4 rounds · ~21 days · Medium difficulty

The Interview Loop

Recruiter Screen (30 min)

Standard fit check, behavioral questions, and resume overview.

Technical Loop (3-4 Rounds)

Deep dive into domain knowledge, coding, and system design.

Interview Question Bank

Data Engineer Behavioral medium

Tell me about a time when a client changed the data requirements in the middle of a sprint. How did you handle the pipeline refactoring and communicate the impact?

#Agile #Client Communication #Adaptability
Data Engineer Behavioral hard

Tell me about a complex data pipeline you built from scratch. What were the biggest technical challenges and how did you overcome them?

#Project Experience #Problem Solving #End-to-End Delivery
Data Engineer Behavioral medium

Have you ever disagreed with a Data Architect or Lead regarding a pipeline design? How did you handle the disagreement?

#Conflict Resolution #Communication #Teamwork
Data Engineer Behavioral medium

Describe a time you had to optimize cloud costs for a data engineering project at a client site.

#Cost Optimization #Cloud #Client Delivery
Data Engineer Coding medium

Write a SQL query to find the top 3 highest earning employees in each department, handling ties appropriately.

#Window Functions #DENSE_RANK #PARTITION BY
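A minimal sketch of one possible answer, run against an in-memory SQLite database (table and column names are illustrative, not from the question). `DENSE_RANK` keeps tied salaries on the same rank, so everyone tied at rank 3 is returned:

```python
import sqlite3

# Hypothetical sample data for demonstration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER);
INSERT INTO employees VALUES
  ('Ann','IT',95000),('Bob','IT',90000),('Cara','IT',90000),
  ('Dan','IT',80000),('Eve','HR',70000),('Finn','HR',60000);
""")

# Rank within each department, then keep ranks 1-3.
# DENSE_RANK (unlike RANK) leaves no gaps after ties, so tied
# employees share a rank and rank 3 is still reachable.
query = """
SELECT department, name, salary
FROM (
  SELECT department, name, salary,
         DENSE_RANK() OVER (PARTITION BY department
                            ORDER BY salary DESC) AS rnk
  FROM employees
)
WHERE rnk <= 3
ORDER BY department, salary DESC;
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Note that with `DENSE_RANK`, the two 90,000 salaries in IT both get rank 2, so the 80,000 salary is rank 3 and is included.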
Data Engineer Coding medium

Write a PySpark script to read a CSV file from S3, drop rows with null values in a specific column, group by another column to find the average, and write the output back to S3 as Parquet.

#DataFrames #I/O #Aggregations
Data Engineer Coding easy

Write a Python function to check if two strings are anagrams of each other. Optimize it for time complexity.

#Python #Strings #Hash Maps
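One common O(n) approach is counting characters with a hash map instead of sorting (which would be O(n log n)). A sketch:

```python
from collections import Counter

def are_anagrams(a: str, b: str) -> bool:
    """O(n) anagram check: compare character counts.

    Sorting both strings works too, but costs O(n log n);
    a length check first lets us exit early on the common case.
    """
    if len(a) != len(b):
        return False
    return Counter(a) == Counter(b)

print(are_anagrams("listen", "silent"))  # → True
print(are_anagrams("hello", "world"))    # → False
```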
Data Engineer Coding medium

Write a SQL query to find the cumulative sum of sales per day for the current month.

#Window Functions #Aggregations #Date Functions
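A sketch using sqlite3 with hypothetical sample data and a hard-coded month filter for determinism (in production you might derive the month from the current date, e.g. `strftime('%Y-%m','now')` in SQLite). The pattern is: aggregate per day, then apply `SUM() OVER` an ordered window:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (sale_date TEXT, amount INTEGER);
INSERT INTO sales VALUES
  ('2024-05-01',100),('2024-05-01',50),
  ('2024-05-02',200),('2024-05-03',25);
""")

# Inner query: total sales per day for the chosen month.
# Outer query: running total via a window ordered by date.
query = """
SELECT sale_date,
       SUM(daily_total) OVER (ORDER BY sale_date) AS cumulative_sales
FROM (
  SELECT sale_date, SUM(amount) AS daily_total
  FROM sales
  WHERE sale_date LIKE '2024-05%'
  GROUP BY sale_date
)
ORDER BY sale_date;
"""
rows = conn.execute(query).fetchall()
print(rows)
# → [('2024-05-01', 150), ('2024-05-02', 350), ('2024-05-03', 375)]
```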
Data Engineer Coding easy

Given a list of dictionaries representing employee data, write a Python script using list comprehensions to extract the names of employees who belong to the 'IT' department and have a salary > 80000.

#List Comprehensions #Data Manipulation
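A minimal sketch with made-up employee records; the dictionary keys are assumptions, since the question doesn't specify them:

```python
employees = [
    {"name": "Asha", "department": "IT", "salary": 95000},
    {"name": "Ben",  "department": "IT", "salary": 75000},
    {"name": "Chen", "department": "HR", "salary": 90000},
]

# Single comprehension: filter on department and salary, project the name.
it_high_earners = [
    e["name"]
    for e in employees
    if e["department"] == "IT" and e["salary"] > 80000
]
print(it_high_earners)  # → ['Asha']
```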
Data Engineer Coding medium

Write a SQL query to delete duplicate rows from a table, keeping only the record with the lowest ID.

#Data Cleaning #CTEs #DELETE
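A sketch against sqlite3 with illustrative table and column names. SQLite doesn't support `DELETE` through a CTE the way SQL Server does with `ROW_NUMBER()`, so this version keeps the lowest id per group via a `MIN(id)` subquery, which expresses the same logic:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE contacts (id INTEGER PRIMARY KEY, email TEXT);
INSERT INTO contacts VALUES
  (1,'a@x.com'),(2,'a@x.com'),(3,'b@x.com'),(4,'b@x.com'),(5,'c@x.com');
""")

# Delete every row whose id is not the minimum id for its email group.
conn.execute("""
DELETE FROM contacts
WHERE id NOT IN (
  SELECT MIN(id) FROM contacts GROUP BY email
);
""")
remaining = [r[0] for r in conn.execute("SELECT id FROM contacts ORDER BY id")]
print(remaining)  # → [1, 3, 5]
```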
Data Engineer Coding medium

Write a Python generator function that reads a massive log file line by line and yields lines containing the word 'ERROR'. Why use a generator here?

#Generators #Memory Management #File I/O
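A sketch of one possible answer. The key point for the "why" part: a generator streams the file lazily, so memory use stays constant no matter how large the log is, whereas `readlines()` or a list comprehension would load everything at once. The tiny temporary file below stands in for a massive log:

```python
import os
import tempfile

def error_lines(path):
    """Yield lines containing 'ERROR' one at a time.

    Iterating the file object reads one line per step, so only a
    single line is ever held in memory regardless of file size.
    """
    with open(path) as f:
        for line in f:
            if "ERROR" in line:
                yield line.rstrip("\n")

# Demo: write a small stand-in log, then stream the ERROR lines.
with tempfile.NamedTemporaryFile("w", suffix=".log", delete=False) as tmp:
    tmp.write("INFO start\nERROR disk full\nINFO done\nERROR timeout\n")

errors = list(error_lines(tmp.name))
os.unlink(tmp.name)
print(errors)  # → ['ERROR disk full', 'ERROR timeout']
```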
Data Engineer Coding medium

Write a PySpark DataFrame query to pivot a table. You have columns: 'Store', 'Month', and 'Revenue'. Pivot the 'Month' column so each month is a separate column showing the revenue.

#Pivot #Data Aggregation
Data Engineer Coding easy

Write a SQL query to find all employees who earn more than their direct managers. The table 'Employee' has columns: Id, Name, Salary, ManagerId.

#Self Join #Filtering
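A sketch using the table and column names from the question, run via sqlite3 with made-up rows. The self join matches each employee row `e` to its manager row `m`, then filters on salary:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employee (Id INTEGER, Name TEXT,
                       Salary INTEGER, ManagerId INTEGER);
INSERT INTO Employee VALUES
  (1,'Joe',70000,3),(2,'Henry',80000,4),
  (3,'Sam',60000,NULL),(4,'Max',90000,NULL);
""")

# Join the table to itself: e is the employee, m is the manager.
rows = conn.execute("""
SELECT e.Name
FROM Employee e
JOIN Employee m ON e.ManagerId = m.Id
WHERE e.Salary > m.Salary;
""").fetchall()
print(rows)  # → [('Joe',)]
```

Joe (70,000) out-earns his manager Sam (60,000); Henry does not out-earn Max, and top-level managers are excluded by the inner join on `ManagerId`.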
Data Engineer Coding hard

Write a Python script to flatten a deeply nested JSON object representing a client's API response into a flat dictionary.

#Recursion #JSON #Data Parsing
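A minimal recursive sketch. The separator, key style, and the sample API response are all illustrative choices, not part of the question; lists are flattened using their indices:

```python
def flatten(obj, parent_key="", sep="."):
    """Recursively flatten nested dicts/lists into one flat dict,
    joining keys with `sep`, e.g. {'a': {'b': 1}} -> {'a.b': 1}."""
    items = {}
    if isinstance(obj, dict):
        for k, v in obj.items():
            key = f"{parent_key}{sep}{k}" if parent_key else str(k)
            items.update(flatten(v, key, sep))
    elif isinstance(obj, list):
        # Lists use positional indices as keys.
        for i, v in enumerate(obj):
            key = f"{parent_key}{sep}{i}" if parent_key else str(i)
            items.update(flatten(v, key, sep))
    else:
        items[parent_key] = obj
    return items

response = {"user": {"id": 7, "tags": ["vip", "beta"]}, "active": True}
print(flatten(response))
# → {'user.id': 7, 'user.tags.0': 'vip', 'user.tags.1': 'beta', 'active': True}
```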
Data Engineer System Design hard

Design a batch ETL pipeline to migrate 500GB of daily transactional data from an on-premise Oracle database to Snowflake on AWS. What tools and architecture would you use?

#AWS #Snowflake #Data Migration #ETL Architecture
Data Engineer System Design hard

Design a real-time streaming pipeline to process clickstream data from a retail client's website and update a live dashboard.

#Streaming #Kafka #Spark Streaming #Real-time Analytics
Data Engineer System Design hard

Design a system to ingest and process daily healthcare claims data (HIPAA compliant). The data arrives as CSVs in an SFTP server.

#Healthcare #Security #ETL #Cloud Architecture
Data Engineer System Design medium

Design a CI/CD pipeline for deploying Data Engineering assets (Airflow DAGs, Snowflake SQL scripts, PySpark code).

#CI/CD #DevOps #Git #Jenkins/GitHub Actions
Data Engineer Technical easy

Explain the difference between ROW_NUMBER(), RANK(), and DENSE_RANK(). In what client reporting scenario would you choose DENSE_RANK over RANK?

#Window Functions #Data Analysis
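The difference is easiest to see side by side on tied values. A sqlite3 sketch with illustrative data: `ROW_NUMBER` numbers every row uniquely (breaking ties arbitrarily), `RANK` shares ranks but skips after ties (1, 2, 2, 4), and `DENSE_RANK` shares ranks without gaps (1, 2, 2, 3). For a client report like "top 3 salary bands", `DENSE_RANK` is the right choice because a gap would otherwise drop the third band:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE scores (name TEXT, score INTEGER);
INSERT INTO scores VALUES ('A',100),('B',90),('C',90),('D',80);
""")

rows = conn.execute("""
SELECT name, score,
       ROW_NUMBER() OVER (ORDER BY score DESC) AS row_num,
       RANK()       OVER (ORDER BY score DESC) AS rnk,
       DENSE_RANK() OVER (ORDER BY score DESC) AS dense_rnk
FROM scores
ORDER BY score DESC;
""").fetchall()
for r in rows:
    print(r)
```

Here 'D' gets `RANK` 4 (gap after the tie) but `DENSE_RANK` 3, so a `WHERE dense_rnk <= 3` filter still includes it.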
Data Engineer Technical hard

How do you handle data skewness in PySpark? Walk me through the exact steps you would take if a join operation is taking too long due to a skewed key.

#Performance Tuning #Data Skew #Salting #Broadcast Join
Data Engineer Technical easy

What is the difference between a narrow and wide transformation in Spark? Give examples of each.

#Spark Architecture #Transformations #Shuffling
Data Engineer Technical medium

Explain the architecture of Snowflake. How does its separation of compute and storage benefit a multi-tenant consulting project?

#Snowflake #Architecture #Virtual Warehouses
Data Engineer Technical medium

How do you implement Slowly Changing Dimension (SCD) Type 2 in a data warehouse? Describe the necessary columns and the update logic.

#SCD #Data Warehousing #ETL
Data Engineer Technical hard

You have a PySpark job failing with an OutOfMemory (OOM) error on the executor side. What are the potential causes and how do you troubleshoot it?

#Troubleshooting #Memory Management #OOM
Data Engineer Technical medium

What is the difference between repartition() and coalesce() in PySpark? When would you use one over the other?

#Partitioning #Performance Tuning
Data Engineer Technical medium

Explain the concept of a Data Mesh. How does it differ from a traditional centralized Data Lake architecture?

#Data Mesh #Data Lake #Decentralization
Data Engineer Technical medium

Describe your experience with Apache Airflow. How do you pass data between tasks in an Airflow DAG?

#Airflow #XComs #DAGs
Data Engineer Technical easy

What are Parquet files? Why are they preferred over CSV or JSON in Big Data processing?

#File Formats #Parquet #Columnar Storage
Data Engineer Technical medium

How does Spark handle fault tolerance? Explain the role of the DAG and RDD lineage.

#Fault Tolerance #Lineage #DAG
Data Engineer Technical easy

What is the difference between a Left Join and an Inner Join? What happens to the result set if the right table has multiple matching rows for a single row in the left table?

#Joins #Data Duplication
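A sqlite3 sketch with hypothetical tables showing both effects at once: the inner join drops customers without orders, the left join keeps them with `NULL`, and in both cases a left-side row is duplicated once per matching right-side row:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER, name TEXT);
CREATE TABLE orders (customer_id INTEGER, item TEXT);
INSERT INTO customers VALUES (1,'Ann'),(2,'Bob');
INSERT INTO orders VALUES (1,'book'),(1,'pen');
""")

# INNER JOIN: only customers with at least one order.
inner = conn.execute("""
SELECT c.name, o.item FROM customers c
JOIN orders o ON o.customer_id = c.id;
""").fetchall()

# LEFT JOIN: every customer; unmatched rows get NULL for order columns.
left = conn.execute("""
SELECT c.name, o.item FROM customers c
LEFT JOIN orders o ON o.customer_id = c.id;
""").fetchall()

print(inner)  # Ann appears twice: one row per matching order
print(left)   # same two Ann rows, plus Bob with a NULL item
```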
Data Engineer Technical medium

Explain the concept of Predicate Pushdown in Spark and Snowflake. How does it improve query performance?

#Predicate Pushdown #Query Optimization
Data Engineer Technical medium

In AWS, what is the difference between Amazon Redshift and Amazon Athena? When would you use Athena for a client project?

#AWS #Redshift #Athena #Serverless
Data Engineer Technical medium

What are User Defined Functions (UDFs) in PySpark? Why are Python UDFs generally discouraged, and what is the alternative?

#UDFs #Performance Tuning #Pandas UDFs
Data Engineer Technical hard

Explain the CAP theorem. How does it apply to choosing a NoSQL database like Cassandra vs MongoDB for a specific use case?

#CAP Theorem #NoSQL #System Architecture
Data Engineer Technical medium

What is the difference between Star Schema and Snowflake Schema? Which one is preferred in modern columnar data warehouses and why?

#Star Schema #Snowflake Schema #Dimensional Modeling

Difficulty Radar

Based on recent AI-sourced data.

Meet Your Interviewers

The "Standard" Interviewer

Senior Engineer

Focuses on core competencies, system constraints, and clear communication.

Unwritten Rules

Think Out Loud

Always explain your thought process before writing code or drawing architecture.
