Wipro

Global information technology, consulting and business process services company.

4 Rounds • ~21 Days • Medium

The Interview Loop

Recruiter Screen (30 min)

Standard fit check, behavioral questions, and resume overview.

Technical Loop (3-4 Rounds)

Deep dive into domain knowledge, coding, and system design.

Interview Question Bank

Topics: SQL (7), Big Data (7), PySpark (4), System Design (4), Culture Fit (3), Python (3), Data Warehousing (2), Cloud (1)

Data Engineer • Behavioral • medium

Describe a time when client requirements were highly ambiguous. How did you ensure the data pipeline you built met their actual business needs?

#Client Communication #Agile #Requirement Gathering

Data Engineer • Behavioral • medium

Tell me about a time you optimized a data pipeline that resulted in significant cost savings for the client.

#Cost Optimization #Client Impact #Performance Tuning

Data Engineer • Behavioral • medium

Describe a challenging production bug you encountered in a data pipeline. How did you troubleshoot and resolve it under strict SLA pressure?

#Troubleshooting #Incident Management #SLA

Data Engineer • Behavioral • hard

How do you ensure data quality and governance in a multi-tenant client environment?

#Data Governance #Security #Multi-tenancy

Data Engineer • Coding • medium

Write a SQL query to find all employees who earn more than their direct managers.

#Self Join #Filtering
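
One possible answer is a self join that pairs each employee row with its manager's row. The sketch below runs the query through Python's built-in sqlite3 module so it can be tried locally. The schema Employee(id, name, salary, manager_id), the sample rows, and the helper names are assumptions made for illustration, not part of the question.

```python
import sqlite3

# Hypothetical schema: Employee(id, name, salary, manager_id).
# manager_id points at the id of another Employee row (NULL for the top manager).
QUERY = """
SELECT e.name
FROM Employee AS e
JOIN Employee AS m ON e.manager_id = m.id   -- self join: employee -> manager
WHERE e.salary > m.salary
ORDER BY e.name
"""


def build_demo_db():
    """Create an in-memory database with a few illustrative rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Employee ("
        "id INTEGER PRIMARY KEY, name TEXT, salary INTEGER, manager_id INTEGER)"
    )
    conn.executemany(
        "INSERT INTO Employee VALUES (?, ?, ?, ?)",
        [
            (1, "Asha", 90000, None),   # top-level manager
            (2, "Ben", 95000, 1),       # earns more than Asha
            (3, "Chitra", 70000, 1),
            (4, "Dev", 72000, 3),       # earns more than Chitra
        ],
    )
    return conn


def employees_out_earning_manager(conn):
    """Return the names of employees whose salary exceeds their manager's."""
    return [row[0] for row in conn.execute(QUERY)]
```

The inner join drops employees without a manager, which is usually what an interviewer expects here; a LEFT JOIN would keep them with NULL manager columns, and the salary comparison would then filter them out anyway.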

Data Engineer • Coding • medium

Write a SQL query to find the 3rd highest salary from an Employee table without using the LIMIT keyword.

#Window Functions #DENSE_RANK #Subqueries
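
A common approach, matching the tags above, is to rank salaries with DENSE_RANK() and filter on the rank instead of using LIMIT. The sketch below assumes a hypothetical Employee(id, salary) table and runs the query through sqlite3, which needs SQLite 3.25 or newer for window functions. The function name is illustrative.

```python
import sqlite3

# DENSE_RANK gives tied salaries the same rank and leaves no gaps,
# so rank 3 is the third distinct salary from the top.
QUERY = """
SELECT DISTINCT salary
FROM (
    SELECT salary,
           DENSE_RANK() OVER (ORDER BY salary DESC) AS rnk
    FROM Employee
) AS ranked
WHERE rnk = 3
"""


def third_highest_salary(salaries):
    """Load salaries into a throwaway table and return the 3rd highest, or None."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Employee (id INTEGER PRIMARY KEY, salary INTEGER)")
    conn.executemany(
        "INSERT INTO Employee (salary) VALUES (?)", [(s,) for s in salaries]
    )
    row = conn.execute(QUERY).fetchone()
    conn.close()
    # No row comes back when there are fewer than three distinct salaries.
    return row[0] if row else None
```

DENSE_RANK is chosen over ROW_NUMBER so that duplicate salaries do not push the answer down the list.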

Data Engineer • Coding • easy

Write a PySpark script to read a CSV file, filter out records where 'status' is 'inactive', group by 'department', and write the output to a Parquet file.

#DataFrame API #I/O Operations #Transformations

Data Engineer • Coding • medium

Write a Python function to read a 50GB JSON file and extract specific fields without running out of memory.

#Generators #Memory Management #File I/O
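
A minimal sketch of the generator approach, assuming the file is in JSON Lines format (one JSON object per line). The question does not say this; if the file is a single large JSON array, an incremental parser such as the third-party ijson package would be needed instead. The function names and the choice to return None for missing fields are illustrative.

```python
import json
from typing import Iterable, Iterator


def extract_fields(lines: Iterable[str], fields: tuple) -> Iterator[dict]:
    """Yield one small dict per input line, keeping only the requested fields.

    Only the current line is held in memory, so memory use stays flat
    regardless of how many lines the source contains.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        # Missing fields come back as None (an assumption, not a requirement).
        yield {name: record.get(name) for name in fields}


def extract_from_file(path: str, fields: tuple) -> Iterator[dict]:
    """Stream records from a JSON Lines file on disk."""
    # Iterating over a file object reads it line by line, not all at once.
    with open(path, "r", encoding="utf-8") as handle:
        yield from extract_fields(handle, fields)
```

Because both functions are generators, the caller can write each result out (or aggregate it) as it arrives instead of building a list of every record.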

Data Engineer • Coding • easy

Write a Python script using Pandas to merge two datasets on a common key, handle missing values by filling them with the column mean, and drop duplicate rows.

#Pandas #Data Cleaning #Data Manipulation

Data Engineer • Coding • medium

Write a SQL query to calculate the cumulative sum of sales per month for each product category.

#Window Functions #Aggregations
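
One way to answer this is to aggregate sales to one row per category and month, then apply SUM() as a window function partitioned by category. The sketch assumes a hypothetical Sales(category, sale_month, amount) table with months stored as 'YYYY-MM' text, and uses sqlite3 (SQLite 3.25 or newer) only to make the query runnable.

```python
import sqlite3

QUERY = """
WITH monthly AS (
    -- Collapse individual sales into one total per category and month.
    SELECT category, sale_month, SUM(amount) AS month_total
    FROM Sales
    GROUP BY category, sale_month
)
SELECT category,
       sale_month,
       month_total,
       SUM(month_total) OVER (
           PARTITION BY category
           ORDER BY sale_month
           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
       ) AS running_total
FROM monthly
ORDER BY category, sale_month
"""


def cumulative_sales(rows):
    """rows: iterable of (category, sale_month, amount) tuples."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Sales (category TEXT, sale_month TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO Sales VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result
```

The running total restarts for each category because of PARTITION BY; months with no sales simply do not appear, which may or may not be acceptable depending on the reporting requirement.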

Data Engineer • Coding • medium

Given an array of integers, write a Python function to move all zeros to the end of the array while maintaining the relative order of the non-zero elements. Do this in-place.

#Arrays #Two Pointers #Python
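
A minimal two-pointer sketch: one index scans the list while the other marks where the next non-zero value belongs. The function name is illustrative.

```python
def move_zeros(nums):
    """Move all zeros to the end of nums in place, keeping non-zero order."""
    write = 0  # position where the next non-zero element should go
    for read in range(len(nums)):
        if nums[read] != 0:
            # Swap the non-zero value forward; a zero (if any) moves back.
            nums[write], nums[read] = nums[read], nums[write]
            write += 1
```

This runs in O(n) time with O(1) extra space. The function mutates its argument and returns None, following the usual Python convention for in-place operations.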

Data Engineer • Coding • medium

Write a PySpark UDF (User Defined Function) to mask the first 12 digits of a 16-digit credit card number.

#UDF #Data Security #String Manipulation
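
The masking logic itself can be sketched as a plain Python function, which keeps it testable without a Spark session. In PySpark it could then be wrapped with `pyspark.sql.functions.udf(mask_card_number, StringType())`. The function name, the tolerance for separators such as dashes and spaces, and the choice to return None for anything that is not exactly 16 digits are all assumptions.

```python
def mask_card_number(card_number):
    """Replace the first 12 digits of a 16-digit card number with '*'.

    Returns None for missing or malformed input (an assumed policy).
    """
    if card_number is None:
        return None  # Spark passes SQL NULLs to Python UDFs as None
    # Drop separators like spaces or dashes and keep only digits.
    digits = "".join(ch for ch in card_number if ch.isdigit())
    if len(digits) != 16:
        return None
    return "*" * 12 + digits[-4:]
```

Python UDFs add serialization overhead, so in an interview it is worth mentioning that built-in column functions may be a faster alternative when the input format is fixed.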

Data Engineer • Coding • hard

Write a SQL query to find the cancellation rate of unbanned users on a specific date. (Similar to LeetCode 'Trips and Users').

#Joins #Aggregations #Case Statements

Data Engineer • Coding • medium

Write a Python decorator that calculates and prints the execution time of a function.

#Decorators #Performance Monitoring
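
A minimal sketch of such a decorator, using time.perf_counter for the measurement and functools.wraps to preserve the wrapped function's metadata. The names timed and slow_add are illustrative.

```python
import functools
import time


def timed(func):
    """Print how long each call to func takes."""

    @functools.wraps(func)  # keep func.__name__ and its docstring
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Runs even if func raises, so failed calls are timed too.
            elapsed = time.perf_counter() - start
            print(f"{func.__name__} took {elapsed:.6f} s")

    return wrapper


@timed
def slow_add(a, b):
    """Example function: add two numbers after a short pause."""
    time.sleep(0.01)
    return a + b
```

Calling slow_add(2, 3) returns 5 and prints one timing line; the exact duration varies from run to run.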

Data Engineer • Coding • medium

Write a SQL query to identify duplicate records in a table based on 'email' and 'username', and keep only the record with the latest 'created_at' date.

#CTEs #Window Functions #Data Cleansing
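
One possible approach, in line with the tags above, numbers the rows within each (email, username) group from newest to oldest and deletes everything after the first. The sketch assumes a hypothetical Users(id, email, username, created_at) table with ISO-formatted dates and runs through sqlite3 (SQLite 3.25 or newer); DELETE syntax with a CTE varies between databases, so this is not portable as written.

```python
import sqlite3

DEDUPE = """
WITH ranked AS (
    SELECT id,
           ROW_NUMBER() OVER (
               PARTITION BY email, username
               ORDER BY created_at DESC, id DESC   -- id breaks ties on equal dates
           ) AS rn
    FROM Users
)
DELETE FROM Users
WHERE id IN (SELECT id FROM ranked WHERE rn > 1)
"""


def dedupe_users(rows):
    """rows: (id, email, username, created_at) tuples. Returns surviving ids."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Users ("
        "id INTEGER PRIMARY KEY, email TEXT, username TEXT, created_at TEXT)"
    )
    conn.executemany("INSERT INTO Users VALUES (?, ?, ?, ?)", rows)
    conn.execute(DEDUPE)
    survivors = [r[0] for r in conn.execute("SELECT id FROM Users ORDER BY id")]
    conn.close()
    return survivors
```

Running the inner SELECT on its own (filtering on rn > 1) identifies the duplicates without deleting anything, which is a sensible first step before a destructive operation.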

Data Engineer • Coding • medium

Write a PySpark code snippet to perform a Broadcast Hash Join between a massive 'transactions' table and a small 'currency_conversion' table.

#Broadcast Join #Optimization #DataFrame API

Data Engineer • System Design • medium

Design an ETL pipeline using Azure Data Factory (ADF) and Databricks to ingest daily incremental sales data from an on-premises SQL Server to Azure Data Lake.

#Azure Data Factory #Databricks #Incremental Load #Cloud Architecture

Data Engineer • System Design • hard

Design a real-time streaming architecture to process website clickstream data, detect anomalies, and store the results for reporting.

#Kafka #Spark Streaming #Real-time Processing #Architecture

Data Engineer • System Design • hard

Design a data migration strategy to move 100TB of historical data from an on-premises Hadoop cluster to AWS S3 / Databricks with minimal downtime.

#Data Migration #AWS #Hadoop #Strategy

Data Engineer • System Design • medium

Design a data model for a ride-sharing application (like Uber) to support analytical queries such as 'average revenue per driver per city'.

#Data Modeling #Fact and Dimension Tables #Analytics

Data Engineer • Technical • medium

How does Spark handle fault tolerance? Explain the role of RDD lineage.

#Spark Architecture #Fault Tolerance #RDD

Data Engineer • Technical • easy

Explain the difference between ROW_NUMBER(), RANK(), and DENSE_RANK() with a real-world data engineering example.

#Window Functions #Ranking

Data Engineer • Technical • hard

How would you optimize a slow-running PySpark job that is suffering from data skew during a join operation?

#PySpark #Performance Tuning #Data Skew

Data Engineer • Technical • medium

What is the difference between repartition() and coalesce() in PySpark, and when would you use each?

#PySpark #Partitioning #Shuffling

Data Engineer • Technical • medium

Explain Slowly Changing Dimensions (SCD). How do you implement SCD Type 2 in a data warehouse?

#SCD #Dimensional Modeling #ETL

Data Engineer • Technical • medium

What are Broadcast variables and Accumulators in Spark? Provide a use case for each.

#PySpark #Shared Variables #Optimization

Data Engineer • Technical • medium

Explain the architecture of Snowflake. How does its separation of storage and compute benefit data engineering workloads?

#Snowflake #Cloud Architecture #Scaling

Data Engineer • Technical • medium

How do you handle corrupt or malformed records when reading a JSON or CSV file in PySpark?

#Data Quality #Error Handling #PySpark

Data Engineer • Technical • medium

What is Delta Lake? Explain its ACID properties and how it implements time travel.

#Delta Lake #Databricks #ACID #Data Lakehouse

Data Engineer • Technical • easy

Explain the difference between a Star Schema and a Snowflake Schema. When would you choose one over the other?

#Data Modeling #Schema Design

Data Engineer • Technical • easy

What are the different types of triggers available in Azure Data Factory (ADF)?

#Azure Data Factory #Orchestration

Data Engineer • Technical • medium

How do you handle late-arriving data in a batch ETL pipeline?

#ETL #Data Quality #Batch Processing

Data Engineer • Technical • medium

Explain the concept of 'Predicate Pushdown' in Spark and Parquet.

#Spark #Parquet #Optimization

Data Engineer • Technical • easy

What is the difference between an Inner Join, Left Join, and Full Outer Join? How do NULL values affect these joins?

#Joins #Relational Algebra

Data Engineer • Technical • easy

Explain the difference between Managed and External tables in Hive/Databricks.

#Hive #Databricks #Storage

Difficulty Radar (chart not shown; based on recent AI-sourced data)

Meet Your Interviewers

The "Standard" Interviewer

Senior Engineer

Focuses on core competencies, system constraints, and clear communication.

Unwritten Rules

Think Out Loud

Always explain your thought process before writing code or drawing architecture.
