Wipro

Global information technology, consulting and business process services company.

4 Rounds | ~21 Days | Medium
The Interview Loop

Recruiter Screen (30 min)

Standard fit check, behavioral questions, and resume overview.

Technical Loop (3-4 Rounds)

Deep dive into domain knowledge, coding, and system design.

Interview Question Bank

Data Engineer Behavioral medium

Describe a time when client requirements were highly ambiguous. How did you ensure the data pipeline you built met their actual business needs?

#Client Communication #Agile #Requirement Gathering
Data Engineer Behavioral medium

Tell me about a time you optimized a data pipeline that resulted in significant cost savings for the client.

#Cost Optimization #Client Impact #Performance Tuning
Data Engineer Behavioral medium

Describe a challenging production bug you encountered in a data pipeline. How did you troubleshoot and resolve it under strict SLA pressure?

#Troubleshooting #Incident Management #SLA
Data Engineer Behavioral hard

How do you ensure data quality and governance in a multi-tenant client environment?

#Data Governance #Security #Multi-tenancy
Data Engineer Coding medium

Write a SQL query to find all employees who earn more than their direct managers.

#Self Join #Filtering
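
Sample approach (a sketch, not the only accepted answer): join the table to itself so each employee row is paired with its manager's row, then compare salaries. The Employee table layout (id, name, salary, manager_id) and the sample rows are assumptions for illustration; the query is run through Python's built-in sqlite3 module so it can be tried locally.

```python
import sqlite3

# Assumed schema: Employee(id, name, salary, manager_id); manager_id points at Employee.id.
MORE_THAN_MANAGER_SQL = """
    SELECT e.name
    FROM Employee AS e
    JOIN Employee AS m ON e.manager_id = m.id   -- self join: m is e's direct manager
    WHERE e.salary > m.salary
    ORDER BY e.name
"""

def employees_earning_more_than_manager(conn):
    """Return the names of employees paid more than their direct manager."""
    return [row[0] for row in conn.execute(MORE_THAN_MANAGER_SQL)]

# Illustrative data only.
emp_conn = sqlite3.connect(":memory:")
emp_conn.execute(
    "CREATE TABLE Employee (id INTEGER PRIMARY KEY, name TEXT, salary INTEGER, manager_id INTEGER)"
)
emp_conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?, ?)",
    [(1, "Asha", 90000, None), (2, "Ben", 95000, 1), (3, "Chen", 70000, 1), (4, "Dev", 80000, 3)],
)
higher_paid = employees_earning_more_than_manager(emp_conn)
```

Employees without a manager (NULL manager_id) drop out of the inner join, which is usually the intended behavior for this question.
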
Data Engineer Coding medium

Write a SQL query to find the 3rd highest salary from an Employee table without using the LIMIT keyword.

#Window Functions #DENSE_RANK #Subqueries
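
Sample approach (a sketch under assumptions): rank salaries with DENSE_RANK() so that ties share a rank, then filter on rank 3. The Employee(id, salary) layout and the sample rows are hypothetical; the query runs through Python's sqlite3 module, which needs an SQLite build with window-function support (3.25 or later).

```python
import sqlite3

# DENSE_RANK gives tied salaries the same rank and leaves no gaps,
# so rank 3 is the third distinct salary. No LIMIT is used.
THIRD_HIGHEST_SQL = """
    SELECT DISTINCT salary
    FROM (
        SELECT salary,
               DENSE_RANK() OVER (ORDER BY salary DESC) AS salary_rank
        FROM Employee
    ) AS ranked
    WHERE salary_rank = 3
"""

def third_highest_salary(conn):
    """Return the third highest distinct salary, or None if there are fewer than three."""
    row = conn.execute(THIRD_HIGHEST_SQL).fetchone()
    return row[0] if row else None

# Illustrative data only.
salary_conn = sqlite3.connect(":memory:")
salary_conn.execute("CREATE TABLE Employee (id INTEGER PRIMARY KEY, salary INTEGER)")
salary_conn.executemany(
    "INSERT INTO Employee (salary) VALUES (?)",
    [(100,), (200,), (200,), (300,), (400,)],
)
third = third_highest_salary(salary_conn)
```
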
Data Engineer Coding easy

Write a PySpark script to read a CSV file, filter out records where 'status' is 'inactive', group by 'department', and write the output to a Parquet file.

#DataFrame API #I/O Operations #Transformations
Data Engineer Coding medium

Write a Python function to read a 50GB JSON file and extract specific fields without running out of memory.

#Generators #Memory Management #File I/O
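
Sample approach (a sketch that assumes the file is in JSON Lines format, one JSON object per line; the question does not say so): read the file lazily and yield only the requested fields, so memory use stays proportional to a single record rather than to the whole file. The function name and parameters are hypothetical.

```python
import json
from typing import Iterable, Iterator

def extract_fields(path: str, fields: Iterable[str]) -> Iterator[dict]:
    """Lazily yield the selected fields from each record of a JSON Lines file."""
    wanted = tuple(fields)
    with open(path, "r", encoding="utf-8") as handle:
        for line in handle:              # the file object streams one line at a time
            line = line.strip()
            if not line:                 # skip blank lines
                continue
            record = json.loads(line)    # only one record is held in memory
            yield {name: record.get(name) for name in wanted}
```

If the file is instead one giant JSON array or object, line-by-line reading does not apply and an incremental parser is needed; that is a different design and worth raising with the interviewer.
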
Data Engineer Coding easy

Write a Python script using Pandas to merge two datasets on a common key, handle missing values by filling them with the column mean, and drop duplicate rows.

#Pandas #Data Cleaning #Data Manipulation
Data Engineer Coding medium

Write a SQL query to calculate the cumulative sum of sales per month for each product category.

#Window Functions #Aggregations
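
Sample approach (a sketch under assumptions): aggregate sales to one row per category and month, then apply a running SUM() window partitioned by category. The sales(category, sale_month, amount) table, the 'YYYY-MM' month format, and the sample rows are hypothetical; the query runs through Python's sqlite3 module (SQLite 3.25 or later).

```python
import sqlite3

CUMULATIVE_SALES_SQL = """
    WITH monthly AS (
        SELECT category, sale_month, SUM(amount) AS monthly_sales
        FROM sales
        GROUP BY category, sale_month
    )
    SELECT category,
           sale_month,
           monthly_sales,
           SUM(monthly_sales) OVER (
               PARTITION BY category
               ORDER BY sale_month
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           ) AS cumulative_sales
    FROM monthly
    ORDER BY category, sale_month
"""

def cumulative_sales(conn):
    """Return (category, month, monthly total, running total) rows."""
    return conn.execute(CUMULATIVE_SALES_SQL).fetchall()

# Illustrative data only; sale_month is text in 'YYYY-MM' form so it sorts chronologically.
sales_conn = sqlite3.connect(":memory:")
sales_conn.execute("CREATE TABLE sales (category TEXT, sale_month TEXT, amount INTEGER)")
sales_conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("Books", "2024-01", 100), ("Books", "2024-01", 50), ("Books", "2024-02", 25),
     ("Toys", "2024-01", 10), ("Toys", "2024-03", 5)],
)
running_totals = cumulative_sales(sales_conn)
```
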
Data Engineer Coding medium

Given an array of integers, write a Python function to move all zeros to the end of the array while maintaining the relative order of the non-zero elements. Do this in-place.

#Arrays #Two Pointers #Python
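
Sample approach (a sketch of the two-pointer technique named in the tags): one index scans the list while a second marks where the next non-zero value belongs. Swapping keeps the non-zero values in their original order and uses no extra list. The function name is hypothetical.

```python
def move_zeros(nums):
    """Move all zeros to the end of nums in place, preserving non-zero order."""
    write = 0  # position where the next non-zero element should go
    for read in range(len(nums)):
        if nums[read] != 0:
            # Swap the non-zero element forward; zeros drift toward the end.
            nums[write], nums[read] = nums[read], nums[write]
            write += 1

values = [0, 1, 0, 3, 12]
move_zeros(values)  # values is now [1, 3, 12, 0, 0]
```

This runs in O(n) time with O(1) extra space, which is typically what the "in-place" constraint is asking for.
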
Data Engineer Coding medium

Write a PySpark UDF (User Defined Function) to mask the first 12 digits of a 16-digit credit card number.

#UDF #Data Security #String Manipulation
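
Sample approach (a sketch under assumptions): keep the masking logic in a plain Python function so it can be unit tested without a Spark session, then wrap it with pyspark.sql.functions.udf. Stripping separators and returning None for values that are not 16 digits are choices made for this example, not requirements from the question; the helper names are hypothetical.

```python
import re

def mask_card_number(card_number):
    """Replace the first 12 digits of a 16-digit card number with asterisks."""
    if card_number is None:
        return None
    digits = re.sub(r"\D", "", card_number)   # drop spaces, dashes, etc.
    if len(digits) != 16:
        return None                            # assumption: reject malformed input
    return "*" * 12 + digits[-4:]

def build_mask_udf():
    """Wrap the masking function as a PySpark UDF (requires pyspark to be installed)."""
    from pyspark.sql.functions import udf
    from pyspark.sql.types import StringType
    return udf(mask_card_number, StringType())

# Usage inside a Spark job, with a hypothetical DataFrame df and column name:
#   mask_udf = build_mask_udf()
#   masked_df = df.withColumn("card_number_masked", mask_udf(df["card_number"]))
```

Python UDFs are slower than built-in column functions, so it is worth mentioning in the interview that the same masking could be expressed with native Spark string functions.
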
Data Engineer Coding hard

Write a SQL query to find the cancellation rate of unbanned users on a specific date. (Similar to LeetCode 'Trips and Users').

#Joins #Aggregations #Case Statements
Data Engineer Coding medium

Write a Python decorator that calculates and prints the execution time of a function.

#Decorators #Performance Monitoring
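
Sample approach (a minimal sketch; the names timed and add are hypothetical): wrap the function, record time.perf_counter() before and after the call, and print the difference. functools.wraps preserves the wrapped function's name and docstring.

```python
import functools
import time

def timed(func):
    """Decorator that prints how long each call to func takes."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Runs even if func raises, so failures are timed too.
            elapsed = time.perf_counter() - start
            print(f"{func.__name__} took {elapsed:.6f} s")
    return wrapper

@timed
def add(a, b):
    return a + b
```
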
Data Engineer Coding medium

Write a SQL query to identify duplicate records in a table based on 'email' and 'username', and keep only the record with the latest 'created_at' date.

#CTEs #Window Functions #Data Cleansing
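
Sample approach (a sketch under assumptions): number the rows within each (email, username) group from newest to oldest with ROW_NUMBER(), treat every row numbered above 1 as a stale duplicate, and delete those. The users(id, email, username, created_at) layout, the ISO date strings, and the tie-break on id are assumptions for illustration; the SQL runs through Python's sqlite3 module (SQLite 3.25 or later).

```python
import sqlite3

STALE_DUPLICATES_SQL = """
    WITH ranked AS (
        SELECT id,
               ROW_NUMBER() OVER (
                   PARTITION BY email, username
                   ORDER BY created_at DESC, id DESC   -- newest row gets rn = 1
               ) AS rn
        FROM users
    )
    SELECT id FROM ranked WHERE rn > 1 ORDER BY id
"""

def remove_stale_duplicates(conn):
    """Delete all but the latest row per (email, username); return the deleted ids."""
    stale_ids = [row[0] for row in conn.execute(STALE_DUPLICATES_SQL)]
    conn.executemany("DELETE FROM users WHERE id = ?", [(i,) for i in stale_ids])
    return stale_ids

# Illustrative data only.
users_conn = sqlite3.connect(":memory:")
users_conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, username TEXT, created_at TEXT)"
)
users_conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?, ?)",
    [(1, "a@x.com", "a", "2024-01-01"), (2, "a@x.com", "a", "2024-03-01"),
     (3, "b@x.com", "b", "2024-02-01"), (4, "a@x.com", "a", "2024-02-01")],
)
removed_ids = remove_stale_duplicates(users_conn)
remaining_ids = [row[0] for row in users_conn.execute("SELECT id FROM users ORDER BY id")]
```
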
Data Engineer Coding medium

Write a PySpark code snippet to perform a Broadcast Hash Join between a massive 'transactions' table and a small 'currency_conversion' table.

#Broadcast Join #Optimization #DataFrame API
Data Engineer System Design medium

Design an ETL pipeline using Azure Data Factory (ADF) and Databricks to ingest daily incremental sales data from an on-premise SQL Server to Azure Data Lake.

#Azure Data Factory #Databricks #Incremental Load #Cloud Architecture
Data Engineer System Design hard

Design a real-time streaming architecture to process website clickstream data, detect anomalies, and store the results for reporting.

#Kafka #Spark Streaming #Real-time Processing #Architecture
Data Engineer System Design hard

Design a data migration strategy to move 100TB of historical data from an on-premise Hadoop cluster to AWS S3 / Databricks with minimal downtime.

#Data Migration #AWS #Hadoop #Strategy
Data Engineer System Design medium

Design a data model for a ride-sharing application (like Uber) to support analytical queries such as 'average revenue per driver per city'.

#Data Modeling #Fact and Dimension Tables #Analytics
Data Engineer Technical medium

How does Spark handle fault tolerance? Explain the role of RDD lineage.

#Spark Architecture #Fault Tolerance #RDD
Data Engineer Technical easy

Explain the difference between ROW_NUMBER(), RANK(), and DENSE_RANK() with a real-world data engineering example.

#Window Functions #Ranking
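
Sample illustration (a sketch with hypothetical data): ranking pipeline jobs by runtime shows how the three functions treat ties. ROW_NUMBER() always assigns unique consecutive numbers, RANK() gives ties the same number and then skips ahead, and DENSE_RANK() gives ties the same number without leaving a gap. The job_runs table and its rows are invented for this example; the query runs through Python's sqlite3 module (SQLite 3.25 or later).

```python
import sqlite3

RANKING_SQL = """
    SELECT job_name,
           ROW_NUMBER() OVER (ORDER BY runtime_sec DESC, job_name) AS row_num,
           RANK()       OVER (ORDER BY runtime_sec DESC)           AS rnk,
           DENSE_RANK() OVER (ORDER BY runtime_sec DESC)           AS dense_rnk
    FROM job_runs
    ORDER BY row_num
"""

# Illustrative data only: two jobs tie at 240 seconds.
jobs_conn = sqlite3.connect(":memory:")
jobs_conn.execute("CREATE TABLE job_runs (job_name TEXT, runtime_sec INTEGER)")
jobs_conn.executemany(
    "INSERT INTO job_runs VALUES (?, ?)",
    [("load_orders", 300), ("load_customers", 240), ("load_items", 240), ("load_regions", 120)],
)
ranking_rows = jobs_conn.execute(RANKING_SQL).fetchall()
# Each row is (job_name, row_num, rnk, dense_rnk); the tied jobs share
# rnk 2 and dense_rnk 2, and the last job gets rnk 4 but dense_rnk 3.
```
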
Data Engineer Technical hard

How would you optimize a slow-running PySpark job that suffers from data skew during a join operation?

#PySpark #Performance Tuning #Data Skew
Data Engineer Technical medium

What is the difference between repartition() and coalesce() in PySpark, and when would you use each?

#PySpark #Partitioning #Shuffling
Data Engineer Technical medium

Explain Slowly Changing Dimensions (SCD). How do you implement SCD Type 2 in a data warehouse?

#SCD #Dimensional Modeling #ETL
Data Engineer Technical medium

What are Broadcast variables and Accumulators in Spark? Provide a use case for each.

#PySpark #Shared Variables #Optimization
Data Engineer Technical medium

Explain the architecture of Snowflake. How does its separation of storage and compute benefit data engineering workloads?

#Snowflake #Cloud Architecture #Scaling
Data Engineer Technical medium

How do you handle corrupt or malformed records when reading a JSON or CSV file in PySpark?

#Data Quality #Error Handling #PySpark
Data Engineer Technical medium

What is Delta Lake? Explain its ACID properties and how it implements time travel.

#Delta Lake #Databricks #ACID #Data Lakehouse
Data Engineer Technical easy

Explain the difference between a Star Schema and a Snowflake Schema. When would you choose one over the other?

#Data Modeling #Schema Design
Data Engineer Technical easy

What are the different types of triggers available in Azure Data Factory (ADF)?

#Azure Data Factory #Orchestration
Data Engineer Technical medium

How do you handle late-arriving data in a batch ETL pipeline?

#ETL #Data Quality #Batch Processing
Data Engineer Technical medium

Explain the concept of 'Predicate Pushdown' in Spark and Parquet.

#Spark #Parquet #Optimization
Data Engineer Technical easy

What is the difference between an Inner Join, Left Join, and Full Outer Join? How do NULL values affect these joins?

#Joins #Relational Algebra
Data Engineer Technical easy

Explain the difference between Managed and External tables in Hive/Databricks.

#Hive #Databricks #Storage

Meet Your Interviewers

The "Standard" Interviewer

Senior Engineer

Focuses on core competencies, system constraints, and clear communication.

Unwritten Rules

Think Out Loud

Always explain your thought process before writing code or drawing architecture.