DXC Technology
American multinational B2B IT services provider.
4 Rounds
~21 Days
Medium
The Interview Loop
Recruiter Screen (30 min)
Standard fit check, behavioral questions, and resume overview.
Technical Loop (3-4 Rounds)
Deep dive into domain knowledge, coding, and system design.
Interview Question Bank
Data Engineer • Behavioral • medium
Tell me about a time you had a disagreement with a client or stakeholder regarding technical requirements. How did you resolve it?
#Communication
#Conflict Resolution
#Consulting
Data Engineer • Behavioral • medium
How do you manage scope creep in the middle of a complex data migration project?
#Project Management
#Client Management
#Agile
Data Engineer • Behavioral • medium
Describe a time you optimized a data pipeline and saved the company or client money.
#Optimization
#Cost Reduction
#Impact
Data Engineer • Behavioral • easy
How do you explain complex data engineering concepts, like data lakes or distributed computing, to non-technical business stakeholders?
#Communication
#Stakeholder Management
Data Engineer • Behavioral • medium
Tell me about a time you failed to meet a project deadline. What happened and what did you learn?
#Accountability
#Continuous Improvement
Data Engineer • Behavioral • easy
Why do you want to work at DXC Technology as a Data Engineer?
#Company Knowledge
#Motivation
Data Engineer • Behavioral • medium
How do you ensure data quality and governance in the pipelines you build?
#Data Quality
#Governance
#Best Practices
Data Engineer • Behavioral • easy
Describe your experience working in an Agile/Scrum environment with globally distributed teams.
#Agile
#Teamwork
#Remote Work
Data Engineer • Coding • medium
Write a SQL query to find the second highest salary from an Employee table without using the LIMIT or TOP keywords.
#SQL
#Subqueries
#Aggregations
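One possible answer, sketched in Python with the standard-library `sqlite3` module so it can be run as-is. The `Employee` table layout (`id`, `salary`) and the sample rows are assumptions for illustration. The query avoids LIMIT and TOP by taking the maximum of all salaries strictly below the overall maximum.

```python
import sqlite3

# Nested-MAX approach: the highest salary that is strictly below the top salary.
SECOND_HIGHEST_SQL = """
    SELECT MAX(salary)
    FROM Employee
    WHERE salary < (SELECT MAX(salary) FROM Employee)
"""

def second_highest_salary(conn):
    """Return the second highest distinct salary, or None if there is none."""
    return conn.execute(SECOND_HIGHEST_SQL).fetchone()[0]

# Hypothetical sample data; note the duplicated top salary.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (id INTEGER PRIMARY KEY, salary INTEGER)")
conn.executemany(
    "INSERT INTO Employee (salary) VALUES (?)",
    [(100,), (300,), (200,), (300,)],
)
print(second_highest_salary(conn))  # prints 200
```

Because the filter compares against the overall maximum, duplicate top salaries are skipped, and a table with fewer than two distinct salaries yields NULL (`None` in Python).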
Data Engineer • Coding • medium
Write a SQL query to calculate the cumulative sum of sales by month for each region.
#SQL
#Window Functions
#Data Aggregation
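A minimal sketch of one way to answer this, run through Python's `sqlite3` module. It assumes a hypothetical `sales(region, month, amount)` table where `month` is stored as `YYYY-MM` text, and a SQLite build with window-function support (3.25 or later). Monthly totals are computed first, then a running `SUM` is taken per region.

```python
import sqlite3

# Step 1 (CTE): total sales per region and month.
# Step 2: running total per region, ordered by month.
CUMULATIVE_SALES_SQL = """
    WITH monthly AS (
        SELECT region, month, SUM(amount) AS monthly_sales
        FROM sales
        GROUP BY region, month
    )
    SELECT region,
           month,
           monthly_sales,
           SUM(monthly_sales) OVER (
               PARTITION BY region
               ORDER BY month
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           ) AS cumulative_sales
    FROM monthly
    ORDER BY region, month
"""

def cumulative_sales(conn):
    """Return (region, month, monthly_sales, cumulative_sales) rows."""
    return conn.execute(CUMULATIVE_SALES_SQL).fetchall()

# Hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, month TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("East", "2024-01", 100),
        ("East", "2024-01", 50),
        ("East", "2024-02", 70),
        ("West", "2024-01", 30),
        ("West", "2024-03", 20),
    ],
)
for row in cumulative_sales(conn):
    print(row)
```

The `PARTITION BY region` clause restarts the running total for each region; the explicit `ROWS` frame makes the accumulation order unambiguous.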
Data Engineer • Coding • medium
Write a Python function to parse a large server log file and return the top 5 most frequent IP addresses.
#Python
#File I/O
#Hash Maps
#Collections
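One way this could be approached, as a sketch: stream the file line by line and count with `collections.Counter`. It assumes the IP address is the first whitespace-separated field on each line (as in the Common Log Format); the function name `top_ips` is illustrative.

```python
from collections import Counter

def top_ips(path, n=5):
    """Return the n most frequent IPs as (ip, count) pairs, most frequent first.

    Assumes the IP is the first whitespace-separated field of each line.
    """
    counts = Counter()
    # Iterating over the file object reads one line at a time,
    # so memory use depends on the number of distinct IPs, not file size.
    with open(path, "r", encoding="utf-8", errors="replace") as log:
        for line in log:
            fields = line.split(maxsplit=1)
            if fields:  # skip blank lines
                counts[fields[0]] += 1
    return counts.most_common(n)
```

A call such as `top_ips("access.log")` (a hypothetical path) returns a list of up to five `(ip, count)` tuples.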
Data Engineer • Coding • easy
Write a Python script using Pandas or PySpark to read a CSV file, filter out rows where the 'status' column is 'failed', and write the output to a Parquet file.
#Python
#Pandas
#PySpark
#ETL
Data Engineer • Coding • medium
Write a PySpark snippet to group a dataframe by 'department' and calculate the average salary and total employee count for each.
#PySpark
#Aggregations
#DataFrames
Data Engineer • Coding • medium
Write a Python generator function that reads a file and yields one line at a time. Why is this useful?
#Python
#Generators
#Memory Management
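A minimal sketch of such a generator follows; the name `read_lines` and the choice to strip the trailing newline are assumptions. It is useful because lines are produced on demand, so only one line is held in memory at a time and the caller can stop early without reading the rest of the file.

```python
def read_lines(path, encoding="utf-8"):
    """Yield one line at a time from a text file, without the trailing newline."""
    with open(path, "r", encoding=encoding) as handle:
        for line in handle:  # the file object itself is read lazily
            yield line.rstrip("\n")
```

Typical use would be `for line in read_lines("events.txt"): ...` (a hypothetical path); nothing is read until iteration starts.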
Data Engineer • Coding • easy
Given a list of dictionaries representing employee records, write a Python function to sort the list by the 'salary' key in descending order.
#Python
#Sorting
#Data Structures
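A short sketch of one answer, assuming every record has a numeric `salary` key; the function name is illustrative. It returns a new list rather than sorting in place.

```python
def sort_by_salary_desc(employees):
    """Return a new list of employee dicts sorted by 'salary', highest first."""
    return sorted(employees, key=lambda record: record["salary"], reverse=True)

# Hypothetical sample records.
staff = [
    {"name": "Ann", "salary": 50000},
    {"name": "Bob", "salary": 72000},
    {"name": "Cy", "salary": 61000},
]
print([e["name"] for e in sort_by_salary_desc(staff)])  # prints ['Bob', 'Cy', 'Ann']
```

If some records might lack the key, `record.get("salary", 0)` is one possible fallback, at the cost of silently ranking those records last.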
Data Engineer • System Design • hard
Design an ETL pipeline to migrate daily transactional data from an on-premise SQL Server to an AWS S3 data lake, and then to Redshift for reporting.
#AWS
#ETL Architecture
#Data Lake
#Redshift
Data Engineer • System Design • hard
How would you design a real-time streaming pipeline to process IoT sensor data and generate alerts for anomalies?
#Streaming
#Kafka
#Spark Streaming
#Real-time Processing
Data Engineer • System Design • medium
Explain the Lambda Architecture. What are its pros and cons compared to the Kappa Architecture?
#Data Architecture
#Batch Processing
#Stream Processing
Data Engineer • System Design • hard
Design a Data Lakehouse architecture for a large retail client. How does it differ from a traditional Data Lake?
#Data Lakehouse
#Databricks
#Delta Lake
#Architecture
Data Engineer • Technical • easy
Explain the difference between RANK(), DENSE_RANK(), and ROW_NUMBER() in SQL. Provide a scenario where you would use each.
#SQL
#Window Functions
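The difference is easiest to see on tied values. The sketch below runs all three functions side by side through Python's `sqlite3` module; the `scores` table and its rows are assumptions, and window functions need SQLite 3.25 or later. `ROW_NUMBER()` always numbers rows uniquely, `RANK()` gives ties the same rank and then skips, and `DENSE_RANK()` gives ties the same rank without skipping.

```python
import sqlite3

RANKING_SQL = """
    SELECT name,
           score,
           ROW_NUMBER() OVER (ORDER BY score DESC, name) AS row_num,
           RANK()       OVER (ORDER BY score DESC)       AS rnk,
           DENSE_RANK() OVER (ORDER BY score DESC)       AS dense_rnk
    FROM scores
    ORDER BY score DESC, name
"""

# Hypothetical sample data with a tie at the top.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (name TEXT, score INTEGER)")
conn.executemany(
    "INSERT INTO scores VALUES (?, ?)",
    [("Ann", 90), ("Bob", 90), ("Cy", 80)],
)
rows = conn.execute(RANKING_SQL).fetchall()
for row in rows:
    print(row)
# ('Ann', 90, 1, 1, 1)
# ('Bob', 90, 2, 1, 1)
# ('Cy', 80, 3, 3, 2)
```

As example scenarios: `ROW_NUMBER()` suits deduplication (keep row 1 per key), `RANK()` suits competition-style leaderboards where gaps after ties are expected, and `DENSE_RANK()` suits "top N distinct values" questions.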
Data Engineer • Technical • medium
You have a query that is taking too long to execute. Walk me through the steps you would take to optimize it.
#Performance Tuning
#Execution Plans
#Indexing
Data Engineer • Technical • medium
How would you handle processing a 50GB file in Python on a machine with only 8GB of RAM?
#Python
#Generators
#Memory Management
#Chunking
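One common approach is to stream the file and keep only running aggregates in memory. The sketch below assumes the file is a CSV with a header row and that the task is a per-key sum; the function and column names are hypothetical. Memory use then scales with the number of distinct keys rather than with file size.

```python
import csv
from collections import defaultdict

def total_by_key(path, key_column, value_column):
    """Stream a CSV row by row and return {key: sum of value_column}."""
    totals = defaultdict(float)
    with open(path, newline="", encoding="utf-8") as handle:
        # DictReader pulls one row at a time from the underlying file object.
        for row in csv.DictReader(handle):
            totals[row[key_column]] += float(row[value_column])
    return dict(totals)
```

When the aggregate itself does not fit in memory (for example, very high key cardinality), this pattern is no longer enough and an external sort, a database, or a distributed engine would be the usual next step.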
Data Engineer • Technical • medium
Explain the difference between repartition() and coalesce() in PySpark. When would you use one over the other?
#PySpark
#Data Partitioning
#Performance Optimization
Data Engineer • Technical • medium
What is a Broadcast Join in Spark? How does it improve performance compared to a Sort Merge Join?
#PySpark
#Joins
#Distributed Computing
Data Engineer • Technical • hard
During a PySpark job, you notice that one task takes significantly longer than the others, causing a bottleneck. What is the likely cause and how do you fix it?
#PySpark
#Data Skew
#Troubleshooting
Data Engineer • Technical • easy
Explain lazy evaluation in Apache Spark. Why is it beneficial?
#Spark Architecture
#DAG
#Transformations vs Actions
Data Engineer • Technical • medium
What is the difference between a Star Schema and a Snowflake Schema? Which one would you prefer for a modern cloud data warehouse?
#Data Warehousing
#Dimensional Modeling
Data Engineer • Technical • medium
Explain Slowly Changing Dimensions (SCD). How do you implement an SCD Type 2 in an ETL pipeline?
#Data Warehousing
#ETL
#SCD
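The core of SCD Type 2 is "expire the old row, insert a new version". The sketch below shows that logic in plain Python on an in-memory list of dicts; in a real pipeline the same steps would typically be a MERGE or an update-plus-insert against the warehouse. The column names (`customer_id`, `city`, `valid_from`, `valid_to`, `is_current`) and the function name are assumptions.

```python
from datetime import date

def apply_scd2(dimension, incoming, effective_date,
               key="customer_id", tracked=("city",)):
    """Apply SCD Type 2 changes to `dimension` in place and return it.

    Unchanged records are skipped; changed records have their current row
    expired and a new current row appended; unseen keys are inserted.
    """
    current = {row[key]: row for row in dimension if row["is_current"]}
    for record in incoming:
        existing = current.get(record[key])
        if existing is not None and all(existing[c] == record[c] for c in tracked):
            continue  # no change in tracked attributes
        if existing is not None:
            # Close out the old version instead of overwriting it.
            existing["valid_to"] = effective_date
            existing["is_current"] = False
        new_row = {
            key: record[key],
            **{c: record[c] for c in tracked},
            "valid_from": effective_date,
            "valid_to": None,
            "is_current": True,
        }
        dimension.append(new_row)
        current[record[key]] = new_row
    return dimension

# Hypothetical example: customer 1 moves, customer 2 is new.
dim = [{"customer_id": 1, "city": "Austin", "valid_from": date(2023, 1, 1),
        "valid_to": None, "is_current": True}]
apply_scd2(dim, [{"customer_id": 1, "city": "Dallas"},
                 {"customer_id": 2, "city": "Reno"}], date(2024, 6, 1))
print(len(dim))  # prints 3
```

Keeping `valid_from`/`valid_to` alongside `is_current` lets queries reconstruct the dimension as of any date, which is the point of Type 2 over a Type 1 overwrite.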
Data Engineer • Technical • medium
In AWS, when would you choose to use AWS Glue versus Amazon EMR for your data transformation workloads?
#AWS
#Glue
#EMR
#Serverless
Data Engineer • Technical • medium
How does Snowflake handle data storage and indexing? Explain the concept of micro-partitions.
#Snowflake
#Cloud Data Warehousing
#Micro-partitions
Data Engineer • Technical • medium
What is the optimal file size for storing data in Amazon S3 for querying with Athena or Spark? Why?
#AWS S3
#Big Data Storage
#Performance Optimization
Data Engineer • Technical • medium
How do you handle task dependencies and retries in Apache Airflow?
#Airflow
#Orchestration
#DAGs
Data Engineer • Technical • medium
What is the difference between a Common Table Expression (CTE) and a Temporary Table? When would you use one over the other?
#SQL
#Database Architecture
#Performance
Data Engineer • Technical • hard
Explain the concept of 'salting' in PySpark. Write a conceptual code snippet showing how you would implement it to fix a skewed join.
#PySpark
#Data Skew
#Advanced Optimization
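Salting spreads a hot join key across several partitions by appending a random suffix on the skewed side and replicating the other side once per suffix. The sketch below simulates that idea in plain Python, not the PySpark API, so it runs without a cluster; all function names and the salt count are assumptions. In PySpark the same steps would roughly correspond to adding a random salt column to the large DataFrame, exploding the small DataFrame over all salt values, and joining on key plus salt.

```python
import random
from collections import defaultdict

NUM_SALTS = 4  # assumed number of salt buckets

def salt_large_side(rows, num_salts=NUM_SALTS, seed=0):
    """Attach a random salt to each key on the skewed (large) side."""
    rng = random.Random(seed)
    return [((key, rng.randrange(num_salts)), value) for key, value in rows]

def explode_small_side(rows, num_salts=NUM_SALTS):
    """Replicate each small-side row once per salt value so every salted key matches."""
    return [((key, salt), value) for key, value in rows for salt in range(num_salts)]

def hash_join(left, right):
    """Plain inner hash join on the first tuple element."""
    index = defaultdict(list)
    for key, value in right:
        index[key].append(value)
    return [(key, lv, rv) for key, lv in left for rv in index.get(key, [])]

def salted_join(large, small, num_salts=NUM_SALTS):
    """Join on (key, salt), then drop the salt from the result."""
    joined = hash_join(salt_large_side(large, num_salts),
                       explode_small_side(small, num_salts))
    return [(key, lv, rv) for (key, _salt), lv, rv in joined]

# Hypothetical skewed input: one hot key dominates the large side.
large = [("hot", i) for i in range(100)] + [("cold", 0)]
small = [("hot", "H"), ("cold", "C")]
assert sorted(salted_join(large, small)) == sorted(hash_join(large, small))
```

The trade-off is that the small side grows by a factor of the salt count, so salting is usually applied only to the known hot keys or with a modest number of salts.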
Data Engineer • Technical • medium
Compare Azure Data Factory (ADF) and Azure Databricks. In a modern data stack, how do they complement each other?
#Azure
#ADF
#Databricks
#ETL
Difficulty Radar (chart not shown; based on recent AI-sourced data)
Meet Your Interviewers
The "Standard" Interviewer
Senior Engineer
Focuses on core competencies, system constraints, and clear communication.
Unwritten Rules
Think Out Loud
Always explain your thought process before writing code or drawing architecture.