The Interview Loop
Recruiter Screen (30 min)
Standard fit check, behavioral questions, and resume overview.
Technical Loop (3-4 Rounds)
Deep dive into domain knowledge, coding, and system design.
Interview Question Bank
Data Engineer • Coding • medium
Write PySpark code to explode an array column into multiple rows.
#PySpark
#DataFrames
#Functions
Data Engineer • Coding • easy
Write a PySpark script to read a massive CSV file, filter out rows with null values in a specific column, group by another column to find the count, and write the output to Parquet format.
#PySpark
#DataFrames
#I/O
Data Engineer • Technical • easy
Explain the concept of lazy evaluation in Spark. What are its benefits?
#PySpark
#Spark Architecture
#DAG
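As a rough illustration of the lazy-evaluation question above, the toy class below records transformations and runs them only when an action is called. `LazyDataset` and its methods are hypothetical names for this sketch, not Spark APIs; Spark itself builds a DAG and optimizes it before execution, which this analogy does not attempt.

```python
from typing import Any, Callable, Iterable, List, Tuple


class LazyDataset:
    """Toy stand-in for a lazily evaluated dataset (hypothetical, not a Spark API)."""

    def __init__(self, source: Iterable[Any],
                 plan: Tuple[Tuple[str, Callable], ...] = ()):
        self._source = source
        self._plan = plan  # recorded transformations, in order

    def map(self, fn: Callable) -> "LazyDataset":
        # Transformation: only extends the plan, touches no data.
        return LazyDataset(self._source, self._plan + (("map", fn),))

    def filter(self, pred: Callable) -> "LazyDataset":
        # Transformation: also just recorded.
        return LazyDataset(self._source, self._plan + (("filter", pred),))

    def collect(self) -> List[Any]:
        # Action: the recorded plan finally runs, in one pass over the source.
        out = []
        for item in self._source:
            keep = True
            for kind, fn in self._plan:
                if kind == "map":
                    item = fn(item)
                elif not fn(item):
                    keep = False
                    break
            if keep:
                out.append(item)
        return out


calls = []  # records every time the mapped function actually runs


def double(x):
    calls.append(x)
    return x * 2


ds = LazyDataset(range(5)).map(double).filter(lambda x: x > 4)
calls_before_action = len(calls)  # nothing has executed yet
result = ds.collect()             # the action triggers the whole plan
```

Because the whole plan is known before anything runs, an engine can fuse steps into a single pass and skip work whose result is never requested, which is the benefit the question asks about.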
Data Engineer • Technical • hard
How do you troubleshoot and resolve an OutOfMemory (OOM) error in a PySpark application?
#PySpark
#Debugging
#Memory Management
Data Engineer • Technical • hard
How does Apache Spark handle memory management? Explain the difference between execution memory and storage memory.
#PySpark
#Memory Management
#Spark Architecture
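For the memory-management question above, a back-of-the-envelope calculation can make the regions concrete. This sketch assumes the unified memory model with its documented defaults (`spark.memory.fraction` = 0.6, `spark.memory.storageFraction` = 0.5) and 300 MB of reserved memory; it covers on-heap memory only and ignores off-heap memory and container overhead. The function name is hypothetical.

```python
RESERVED_MB = 300  # assumed fixed reservation in the unified memory manager


def unified_memory_mb(heap_mb, memory_fraction=0.6, storage_fraction=0.5):
    """Estimate on-heap regions for one executor (illustrative sketch only)."""
    available = heap_mb - RESERVED_MB
    unified = available * memory_fraction   # shared by execution and storage
    storage = unified * storage_fraction    # portion of cache protected from eviction
    return {
        "unified": unified,
        "storage": storage,
        "execution": unified - storage,
        "user": available - unified,        # user data structures, UDF objects
    }


regions = unified_memory_mb(4096)  # a 4 GB executor heap
```

The execution/storage boundary is soft: execution (shuffles, joins, sorts, aggregations) can borrow from storage and evict cached blocks down to the protected storage portion, while storage (cached data, broadcast variables) can use idle execution memory but cannot evict it.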
Data Engineer • Technical • medium
Explain Broadcast Hash Join vs. Sort Merge Join in Spark. When would you use a Broadcast Join?
#PySpark
#Joins
#Optimization
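The two join strategies in the question above can be sketched in plain Python to show the core idea of each. This is a single-process illustration with hypothetical function names, not Spark's implementation: in Spark the small side of a broadcast join is copied to every executor, and a sort merge join first shuffles both sides by key.

```python
from collections import defaultdict


def broadcast_hash_join(large, small):
    """Build a hash table from the small side, then stream the large side."""
    table = defaultdict(list)
    for key, val in small:
        table[key].append(val)
    # The large side is never shuffled or sorted; each row is just a lookup.
    return [(k, lv, sv) for k, lv in large for sv in table.get(k, [])]


def sort_merge_join(left, right):
    """Sort both sides by key, then advance two cursors in step."""
    left = sorted(left, key=lambda r: r[0])
    right = sorted(right, key=lambda r: r[0])
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of equal keys on each side and emit all pairs.
            i_end = i
            while i_end < len(left) and left[i_end][0] == lk:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            for _, lv in left[i:i_end]:
                for _, rv in right[j:j_end]:
                    out.append((lk, lv, rv))
            i, j = i_end, j_end
    return out


orders = [(2, "o1"), (1, "o2"), (2, "o3"), (3, "o4")]  # stands in for the large side
countries = [(1, "DE"), (2, "FR")]                      # small lookup side
bhj = broadcast_hash_join(orders, countries)
smj = sort_merge_join(orders, countries)
```

Both strategies return the same rows. The broadcast variant avoids shuffling and sorting the large side, so it fits best when one side is small enough to sit in each executor's memory; Spark picks it automatically below `spark.sql.autoBroadcastJoinThreshold` (10 MB by default).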
Data Engineer • Technical • hard
A PySpark job is taking unusually long, and you notice that one task accounts for 90% of the run time while the others finish quickly. What is the issue, and how do you fix it?
#PySpark
#Data Skewness
#Performance Tuning
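The symptom in the question above usually points to data skew: one key holds most of the rows, so one partition does most of the work. A common fix is key salting, simulated below in plain Python. The helper names, the key `"hot"`, the partition count, and the salt count are all assumptions for illustration; `zlib.crc32` stands in for Spark's hash partitioner only to keep the sketch deterministic.

```python
import zlib
from collections import Counter


def partition_of(key, num_partitions):
    # Deterministic stand-in for a hash partitioner.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def partition_loads(keys, num_partitions):
    # Number of rows that land in each partition.
    counts = Counter(partition_of(k, num_partitions) for k in keys)
    return [counts.get(p, 0) for p in range(num_partitions)]


def salt_keys(keys, num_salts):
    # Append a rotating suffix so one hot key becomes num_salts distinct keys.
    return [f"{k}_{i % num_salts}" for i, k in enumerate(keys)]


# 900 of 1,000 rows share a single key: a heavily skewed dataset.
keys = ["hot"] * 900 + [f"k{i}" for i in range(100)]

before = partition_loads(keys, 8)
after = partition_loads(salt_keys(keys, 16), 8)
```

Comparing `max(before)` with `max(after)` shows the hot key's rows spreading across partitions. In a real job the salted key is used for a first aggregation or join, and a second step strips the salt and combines the partial results.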
Data Engineer • Technical • easy
What is the difference between repartition() and coalesce() in PySpark? When should you use each?
#PySpark
#Partitions
#Shuffling
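To illustrate the contrast in the question above, the sketch below models partitions as Python lists. It is a simplification with hypothetical function bodies: Spark's `coalesce()` groups existing partitions (taking locality into account) without a full shuffle, whereas `repartition()` always shuffles and redistributes rows.

```python
def coalesce(partitions, n):
    """Merge whole partitions into n groups; no row is moved individually."""
    if n >= len(partitions):
        # Without a shuffle, coalesce cannot increase the partition count.
        return [list(p) for p in partitions]
    merged = [[] for _ in range(n)]
    for idx, part in enumerate(partitions):
        merged[idx % n].extend(part)  # simplified grouping rule (assumption)
    return merged


def repartition(partitions, n):
    """Full shuffle: every row is reassigned, here round-robin."""
    out = [[] for _ in range(n)]
    i = 0
    for part in partitions:
        for row in part:
            out[i % n].append(row)
            i += 1
    return out


# Four partitions with sizes 97, 1, 1, 1.
parts = [list(range(0, 97)), [97], [98], [99]]

coalesced_sizes = [len(p) for p in coalesce(parts, 2)]
repartitioned_sizes = [len(p) for p in repartition(parts, 2)]
```

Merging keeps the imbalance of the input, while the shuffle evens it out at the cost of moving every row. That is why `coalesce()` suits cheaply reducing the partition count (for example before a write), and `repartition()` suits increasing the count or rebalancing uneven partitions.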
Data Engineer • Technical • easy
What is the difference between an external table and a managed table in Hive or Databricks?
#Hive
#Databricks
#Data Storage
Difficulty Radar
Based on recent AI-sourced data.
Meet Your Interviewers
The "Standard" Interviewer
Senior Engineer
Focuses on core competencies, system constraints, and clear communication.
Unwritten Rules
Think Out Loud
Always explain your thought process before writing code or drawing architecture.