Capgemini

Global leader in partnering with companies to transform and manage their business by harnessing the power of technology.

4 rounds · ~21 days · Medium difficulty


The Interview Loop

Recruiter Screen (30 min)

Standard fit check, behavioral questions, and resume overview.

Technical Loop (3-4 Rounds)

Deep dive into domain knowledge, coding, and system design.

Interview Question Bank

Roles (35 questions each): Backend Engineer, Cloud Engineer, Data Engineer, Data Scientist, DevOps Engineer, Frontend Engineer, Full Stack Engineer, Machine Learning Engineer, Product Manager, Software Engineer

Topics: Machine Learning (8), SQL (4), System Design (4), Algorithms (3), Python (2), Communication (2), Generative AI (2), MLOps (2)

Data Scientist • Behavioral • medium

Tell me about a time you had to explain a complex machine learning model's predictions to a non-technical client stakeholder.

#Stakeholder Management #Model Interpretability #Consulting


Data Scientist • Behavioral • medium

Describe a situation where a client provided very messy, undocumented, or incomplete data. How did you proceed?

#Problem Solving #Client Handling #Data Quality


Data Scientist • Behavioral • medium

Explain the concept of a p-value to a business stakeholder who has no statistical background.

#Statistics #Stakeholder Management #A/B Testing


Data Scientist • Behavioral • medium

Tell me about a time you disagreed with a team member or a lead on the choice of an algorithm or architecture. How did you resolve it?

#Conflict Resolution #Teamwork #Decision Making


Data Scientist • Behavioral • easy

How do you prioritize tasks when working on multiple client deliverables with tight deadlines?

#Prioritization #Consulting #Agile


Data Scientist • Behavioral • hard

Describe a time when a model you built failed in production or didn't meet client expectations. What did you learn?

#Failure #Continuous Improvement #Production ML


Data Scientist • Coding • medium

Write a SQL query using window functions to calculate the 7-day rolling average of sales for each product category.

#SQL #Window Functions #Data Aggregation
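
A minimal sketch of one possible answer. It runs against an in-memory SQLite database so that it is self-contained (window functions need SQLite 3.25 or later). The `sales` table and its `category`, `sale_date` and `amount` columns are hypothetical. The query first collapses sales to one row per category per day. It then averages the current row and the six rows before it, which equals a true 7-day window only if every category has a row for every day.

```python
import sqlite3

# Hypothetical schema: sales(category TEXT, sale_date TEXT 'YYYY-MM-DD', amount REAL).
ROLLING_AVG_SQL = """
WITH daily AS (
    -- Collapse raw sales to one row per category per day.
    SELECT category, sale_date, SUM(amount) AS daily_sales
    FROM sales
    GROUP BY category, sale_date
)
SELECT category,
       sale_date,
       daily_sales,
       AVG(daily_sales) OVER (
           PARTITION BY category
           ORDER BY sale_date
           ROWS BETWEEN 6 PRECEDING AND CURRENT ROW  -- assumes no missing days
       ) AS rolling_7d_avg
FROM daily
ORDER BY category, sale_date;
"""


def rolling_average(rows):
    """Load (category, sale_date, amount) tuples into SQLite and run the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (category TEXT, sale_date TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    result = conn.execute(ROLLING_AVG_SQL).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    sample = [("toys", f"2024-01-{day:02d}", float(day)) for day in range(1, 9)]
    for row in rolling_average(sample):
        print(row)
```

If days can be missing, a frame defined on dates rather than rows is safer. One example is `RANGE BETWEEN INTERVAL '6 days' PRECEDING AND CURRENT ROW` in PostgreSQL 11 or later.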


Data Scientist • Coding • easy

Write a Python function using Pandas to merge two datasets on a common key, and explain how you would handle missing values in the resulting DataFrame.

#Pandas #Data Manipulation #Data Cleaning
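
A minimal sketch built on several assumptions:

- An outer join is wanted, so that no rows are silently dropped.
- Median imputation suits the numeric columns.
- An `"unknown"` placeholder suits everything else.

The right policy depends on why values are missing. The function name `merge_and_clean` is hypothetical. `pd.merge(..., indicator=True)` adds pandas' `_merge` column, which records whether each row came from the left frame, the right frame, or both.

```python
import pandas as pd


def merge_and_clean(left: pd.DataFrame, right: pd.DataFrame, key: str) -> pd.DataFrame:
    """Outer-join two frames on `key`, then impute gaps (one reasonable policy, not the only one)."""
    # indicator=True adds `_merge`: 'left_only', 'right_only' or 'both'.
    merged = pd.merge(left, right, on=key, how="outer", indicator=True)

    value_cols = [c for c in merged.columns if c not in (key, "_merge")]
    numeric_cols = [c for c in value_cols if pd.api.types.is_numeric_dtype(merged[c])]
    other_cols = [c for c in value_cols if c not in numeric_cols]

    for col in numeric_cols:
        # Keep a flag so downstream models can still see which values were imputed.
        merged[f"{col}_was_missing"] = merged[col].isna()
        merged[col] = merged[col].fillna(merged[col].median())

    for col in other_cols:
        merged[col] = merged[col].fillna("unknown")

    return merged


if __name__ == "__main__":
    customers = pd.DataFrame({"id": [1, 2, 3], "age": [30.0, None, 50.0]})
    cities = pd.DataFrame({"id": [2, 3, 4], "city": ["Paris", "Pune", None]})
    print(merge_and_clean(customers, cities, key="id"))
```

Checking the `_merge` counts before imputing is a quick way to spot key mismatches, such as IDs stored as strings on one side and integers on the other.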


Data Scientist • Coding • medium

Write a SQL query to find the second highest salary by department without using the LIMIT keyword.

#SQL #Subqueries #Window Functions
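
Two hedged sketches, both run on a hypothetical `employees(department, salary)` table in in-memory SQLite:

- `DENSE_RANK()` gives tied salaries the same rank, so "second highest" means the second distinct salary.
- The correlated-subquery version avoids window functions entirely.

In both queries, a department with only one distinct salary returns no row.

```python
import sqlite3

WINDOW_SQL = """
SELECT DISTINCT department, salary
FROM (
    SELECT department,
           salary,
           DENSE_RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS salary_rank
    FROM employees
) AS ranked
WHERE salary_rank = 2
ORDER BY department;
"""

# No window functions: the highest salary strictly below each department's maximum.
SUBQUERY_SQL = """
SELECT e.department, MAX(e.salary) AS second_highest
FROM employees AS e
WHERE e.salary < (
    SELECT MAX(e2.salary) FROM employees AS e2 WHERE e2.department = e.department
)
GROUP BY e.department
ORDER BY e.department;
"""


def run(query, rows):
    """Load (department, salary) rows into SQLite and return the query result."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employees (department TEXT, salary INTEGER)")
    conn.executemany("INSERT INTO employees VALUES (?, ?)", rows)
    result = conn.execute(query).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    staff = [("eng", 100), ("eng", 100), ("eng", 90), ("eng", 80),
             ("sales", 70), ("sales", 60), ("hr", 50)]
    print(run(WINDOW_SQL, staff))
    print(run(SUBQUERY_SQL, staff))
```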


Data Scientist • Coding • medium

Given a list of strings, write a Python program to group anagrams together.

#Python #Hash Maps #String Manipulation
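
A minimal sketch. Anagrams share the same sorted letters, so the sorted string works as a dictionary key. The cost is O(n · k log k) for n words of length up to k. For lowercase ASCII input, a 26-slot letter-count tuple would make each key O(k) instead.

```python
from collections import defaultdict


def group_anagrams(words):
    """Group words that are anagrams of each other, preserving first-seen order."""
    groups = defaultdict(list)
    for word in words:
        # Anagrams have identical sorted letters, so that string is a shared key.
        groups["".join(sorted(word))].append(word)
    return list(groups.values())


if __name__ == "__main__":
    print(group_anagrams(["eat", "tea", "tan", "ate", "nat", "bat"]))
```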


Data Scientist • Coding • medium

Write a Python script to scrape data from a paginated REST API, handle rate limits, and store the results in a SQL database.

#API Integration #Data Engineering #Python
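
A minimal, standard-library-only sketch. The response shape is an assumption made for illustration: each page is JSON with a `results` list of objects carrying an integer `id`, plus a `next` URL that is `null` on the last page. HTTP 429 responses are retried using the `Retry-After` header when present (treated as seconds), and with exponential backoff otherwise. `fetch` and `sleep` are injectable, so the logic can be tested without network access. The table name `records` is hypothetical.

```python
import json
import sqlite3
import time
import urllib.error
import urllib.request


def http_get_json(url):
    """Fetch a URL and decode the JSON body using only the standard library."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def get_with_backoff(fetch, url, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Call fetch(url), retrying only on HTTP 429 (rate limited)."""
    for attempt in range(max_retries):
        try:
            return fetch(url)
        except urllib.error.HTTPError as err:
            if err.code != 429:
                raise
            retry_after = err.headers.get("Retry-After") if err.headers is not None else None
            # Assumes Retry-After is in seconds; servers may also send an HTTP date.
            delay = float(retry_after) if retry_after else base_delay * 2 ** attempt
            sleep(delay)
    raise RuntimeError(f"still rate limited after {max_retries} attempts: {url}")


def scrape_to_sqlite(start_url, conn, fetch=http_get_json, sleep=time.sleep):
    """Follow the assumed `next` links and upsert every item; returns rows written."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS records (id INTEGER PRIMARY KEY, payload TEXT NOT NULL)"
    )
    url, written = start_url, 0
    while url:
        page = get_with_backoff(fetch, url, sleep=sleep)
        rows = [(item["id"], json.dumps(item)) for item in page["results"]]
        # INSERT OR REPLACE keys on id, so re-running the scrape does not duplicate rows.
        conn.executemany("INSERT OR REPLACE INTO records (id, payload) VALUES (?, ?)", rows)
        conn.commit()  # commit per page so a crash mid-run keeps finished pages
        written += len(rows)
        url = page.get("next")
    return written
```

A real call would look like `scrape_to_sqlite("https://api.example.com/items", sqlite3.connect("items.db"))`, where the URL is a placeholder. Storing the raw JSON payload keeps the loader simple; flattening into typed columns can happen in a later transformation step.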


Data Scientist • Coding • hard

Implement a Python function to calculate the TF-IDF scores for a given corpus of documents from scratch (without using scikit-learn).

#Python #NLP #Math Implementation
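
A from-scratch sketch using the textbook definitions:

- Term frequency is the term's count divided by the document's length.
- Inverse document frequency is ln(N / df), where df is the number of documents containing the term.

Lowercasing followed by whitespace tokenization is a simplifying assumption. scikit-learn's `TfidfVectorizer` differs by default: it uses raw counts, a smoothed idf of ln((1 + N) / (1 + df)) + 1, and L2 row normalization. Its numbers will therefore not match these.

```python
import math
from collections import Counter


def tf_idf(corpus):
    """Return one {term: tf-idf} dict per document.

    tf(t, d) = count of t in d / number of tokens in d
    idf(t)   = ln(N / df(t)), with df(t) = number of documents containing t
    """
    docs = [text.lower().split() for text in corpus]  # naive whitespace tokenizer (assumption)
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))  # count each term at most once per document
    idf = {term: math.log(n_docs / df) for term, df in doc_freq.items()}
    scores = []
    for tokens in docs:
        counts = Counter(tokens)
        scores.append({term: (c / len(tokens)) * idf[term] for term, c in counts.items()})
    return scores


if __name__ == "__main__":
    for doc_scores in tf_idf(["the cat sat", "the dog sat", "the cat ran"]):
        print({term: round(score, 3) for term, score in doc_scores.items()})
```

A term that appears in every document, such as "the" here, gets an idf of zero, so it contributes nothing.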


Data Scientist • Coding • hard

Write a SQL query to find users who have logged into an application on 3 consecutive days.

#SQL #Advanced Window Functions #Date Manipulation
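
A sketch of the usual "gaps and islands" approach, run on a hypothetical `logins(user_id, login_ts)` table in in-memory SQLite. It works in three steps:

1. Deduplicate to one row per user per day.
2. Subtract each day's row number (in days) from its date. Every day in an unbroken run maps to the same anchor date.
3. Group by that anchor and keep groups of three or more.

```python
import sqlite3

STREAK_SQL = """
WITH days AS (
    -- One row per user per calendar day, however many times they logged in.
    SELECT DISTINCT user_id, date(login_ts) AS login_day
    FROM logins
),
numbered AS (
    SELECT user_id,
           login_day,
           ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY login_day) AS rn
    FROM days
)
SELECT DISTINCT user_id
FROM numbered
-- login_day minus rn days is constant across an unbroken run of days.
GROUP BY user_id, date(login_day, '-' || rn || ' days')
HAVING COUNT(*) >= 3
ORDER BY user_id;
"""

# Hypothetical sample: user 1 has a 3-day run, user 2 has a gap, user 3's run crosses a month boundary.
SAMPLE_LOGINS = [
    (1, "2024-01-01 08:00:00"), (1, "2024-01-01 20:00:00"),
    (1, "2024-01-02 09:00:00"), (1, "2024-01-03 10:00:00"),
    (2, "2024-01-01 09:00:00"), (2, "2024-01-02 09:00:00"), (2, "2024-01-04 09:00:00"),
    (3, "2024-01-30 12:00:00"), (3, "2024-01-31 12:00:00"), (3, "2024-02-01 12:00:00"),
]


def users_with_streak(rows):
    """Return user_ids with at least 3 consecutive login days."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE logins (user_id INTEGER, login_ts TEXT)")
    conn.executemany("INSERT INTO logins VALUES (?, ?)", rows)
    users = [row[0] for row in conn.execute(STREAK_SQL)]
    conn.close()
    return users


if __name__ == "__main__":
    print(users_with_streak(SAMPLE_LOGINS))
```

Date arithmetic syntax is engine-specific. In PostgreSQL, for example, the anchor is usually written as `login_day - rn * INTERVAL '1 day'`.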


Data Scientist • Coding • medium

Write a Python function to find the longest palindromic substring in a given string.

#Python #Dynamic Programming #String Manipulation
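
A sketch that uses expand-around-center rather than the dynamic-programming table the tag suggests. Both approaches take O(n²) time, but this one needs only O(1) extra space instead of an n × n table. Every palindrome is centered either on a character (odd length) or between two adjacent characters (even length), so the code grows outward from each possible center.

```python
def longest_palindrome(s):
    """Longest palindromic substring via expand-around-center, O(n^2) time, O(1) extra space."""
    if not s:
        return ""
    best_start, best_len = 0, 1
    for center in range(len(s)):
        # Odd-length palindromes center on s[center]; even-length ones sit between center and center + 1.
        for left, right in ((center, center), (center, center + 1)):
            while left >= 0 and right < len(s) and s[left] == s[right]:
                left -= 1
                right += 1
            length = right - left - 1  # the loop overshoots by one on each side
            if length > best_len:
                best_start, best_len = left + 1, length
    return s[best_start:best_start + best_len]


if __name__ == "__main__":
    print(longest_palindrome("forgeeksskeegfor"))
```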


Data Scientist • System Design • hard

Design an end-to-end architecture for deploying a churn prediction model on Azure for a telecommunications client.

#Azure #MLOps #Model Deployment


Data Scientist • System Design • hard

Design a recommendation engine for an e-commerce client. What data would you need, and what algorithms would you use?

#Recommendation Systems #Collaborative Filtering #System Architecture


Data Scientist • System Design • medium

How would you design a system to automatically classify and route incoming customer support emails using NLP?

#NLP #Text Classification #System Architecture


Data Scientist • System Design • hard

Design a fraud detection system for real-time credit card transactions. Focus on the latency requirements and feature store architecture.

#Real-time Processing #Fraud Detection #Feature Store #Streaming


Data Scientist • Technical • easy

Explain the difference between Bagging and Boosting. Give an example of an algorithm for each.

#Ensemble Methods #Random Forest #XGBoost


Data Scientist • Technical • medium

How do you handle highly imbalanced datasets in a fraud detection project? What metrics would you use to evaluate your model?

#Imbalanced Data #SMOTE #Evaluation Metrics
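
A small worked example of why accuracy misleads on imbalanced data. The counts are made up for illustration: 1,000 transactions, of which 10 are fraudulent. A model that never flags fraud still scores 99% accuracy, while precision and recall expose the difference between the two models.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = fraud)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels: 990 legitimate (0) followed by 10 fraudulent (1).
y_true = [0] * 990 + [1] * 10
# Model A never flags fraud.
always_legit = [0] * 1000
# Model B raises 20 false alarms on legitimate rows and catches 8 of the 10 frauds.
catches_most = [1] * 20 + [0] * 970 + [1] * 8 + [0] * 2

if __name__ == "__main__":
    print(classification_metrics(y_true, always_legit))
    print(classification_metrics(y_true, catches_most))
```

In practice, better choices include PR-AUC, recall at a fixed alert budget, or a cost-weighted metric. Resampling such as SMOTE, or class weighting, should be applied only to the training folds so that evaluation data keeps its real class balance.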


Data Scientist • Technical • medium

Explain the architecture of a Retrieval-Augmented Generation (RAG) system. Why is it preferred over fine-tuning for certain enterprise use cases?

#NLP #LLMs #RAG #Vector Databases


Data Scientist • Technical • medium

What is the curse of dimensionality, and how does Principal Component Analysis (PCA) help mitigate it?

#Dimensionality Reduction #PCA #Feature Engineering
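
A minimal NumPy sketch of PCA, computed from a singular value decomposition of the centered data. PCA helps with high-dimensional data when features are correlated, so that most of the variance lies along a few directions. The synthetic data is an assumption for illustration: 10 observed features generated from only 2 latent factors plus small noise. Two components should therefore retain almost all of the variance.

```python
import numpy as np


def pca(X, n_components):
    """Project X onto its top principal components; also return explained-variance ratios."""
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing singular value.
    _, singular_values, Vt = np.linalg.svd(X_centered, full_matrices=False)
    variance = singular_values ** 2 / (X.shape[0] - 1)
    ratio = variance / variance.sum()
    components = Vt[:n_components]
    return X_centered @ components.T, ratio[:n_components]


rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))     # 2 hidden factors (synthetic)
mixing = rng.normal(size=(2, 10))      # spread them across 10 observed features
X = latent @ mixing + 0.01 * rng.normal(size=(200, 10))

if __name__ == "__main__":
    Z, ratio = pca(X, n_components=2)
    print(Z.shape, ratio, ratio.sum())
```

On real data, the number of components is usually chosen from the cumulative explained-variance curve. Features should be standardized first when they are on different scales.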


Data Scientist • Technical • medium

How do you evaluate a clustering model when ground truth labels are not available?

#Unsupervised Learning #Clustering #Evaluation Metrics
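
A from-scratch sketch of the silhouette coefficient, one common internal metric; scikit-learn also provides `sklearn.metrics.silhouette_score`. For each point, it compares the mean distance to the point's own cluster (a) with the mean distance to the nearest other cluster (b), as (b - a) / max(a, b). The loop is O(n²), so treat it as an illustration. Alternatives include the Davies-Bouldin index, the Calinski-Harabasz index, and checking cluster stability across resamples.

```python
import math


def _mean_distance(points, labels, i, cluster):
    """Mean distance from point i to the other members of `cluster`."""
    members = [j for j, lab in enumerate(labels) if lab == cluster and j != i]
    return sum(math.dist(points[i], points[j]) for j in members) / len(members)


def silhouette_score(points, labels):
    """Mean silhouette: near 1 = tight, well-separated clusters; negative = likely mis-assigned."""
    labels = list(labels)
    cluster_ids = set(labels)
    if len(cluster_ids) < 2:
        raise ValueError("silhouette needs at least two clusters")
    total = 0.0
    for i, own in enumerate(labels):
        if labels.count(own) == 1:
            continue  # singleton cluster: silhouette is defined as 0
        a = _mean_distance(points, labels, i, own)  # cohesion
        b = min(_mean_distance(points, labels, i, c) for c in cluster_ids if c != own)  # separation
        denom = max(a, b)
        total += (b - a) / denom if denom else 0.0
    return total / len(points)


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (10, 10), (10, 11)]
    print(silhouette_score(pts, [0, 0, 1, 1]))  # sensible clustering
    print(silhouette_score(pts, [0, 1, 0, 1]))  # deliberately bad clustering
```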


Data Scientist • Technical • easy

What are the key differences between L1 (Lasso) and L2 (Ridge) regularization? When would you use one over the other?

#Regularization #Linear Models #Feature Selection
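
A sketch of the key difference in the simplest setting. It assumes orthonormal features (XᵀX = I), where both penalties have closed-form solutions in terms of the ordinary least-squares coefficients. The objectives are ½‖y - Xβ‖² + λ‖β‖₁ for lasso and ½‖y - Xβ‖² + (λ/2)‖β‖₂² for ridge. Under these assumptions, lasso soft-thresholds each coefficient and sets small ones exactly to zero. Ridge instead shrinks every coefficient proportionally and never reaches zero.

```python
def soft_threshold(b, lam):
    """Lasso solution for one coefficient under orthonormal features."""
    if b > lam:
        return b - lam
    if b < -lam:
        return b + lam
    return 0.0  # small coefficients are removed entirely


def lasso_orthonormal(beta_ols, lam):
    return [soft_threshold(b, lam) for b in beta_ols]


def ridge_orthonormal(beta_ols, lam):
    """Proportional shrinkage: every coefficient stays non-zero."""
    return [b / (1.0 + lam) for b in beta_ols]


if __name__ == "__main__":
    beta_ols = [3.0, 0.5, -0.2]  # hypothetical least-squares coefficients
    print(lasso_orthonormal(beta_ols, lam=1.0))
    print(ridge_orthonormal(beta_ols, lam=1.0))
```

This is why L1 suits cases where sparse, interpretable feature selection is wanted. L2 suits many correlated features that each carry some signal. Elastic net combines both penalties.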


Data Scientist • Technical • hard

How do you detect and handle data drift in a production machine learning model?

#Model Monitoring #Data Drift #Production ML
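
A from-scratch sketch of one common drift check, the population stability index (PSI). PSI compares how a feature's values are distributed across bins in a reference window (for example, the training data) and in recent production data. Several choices here are conventions rather than standards:

- equal-width binning;
- the `eps` floor for empty bins;
- the often-quoted thresholds of below 0.1 for stable and above 0.25 for a significant shift.

Alternatives include a two-sample Kolmogorov-Smirnov test, or monitoring prediction and label distributions directly.

```python
import bisect
import math


def population_stability_index(reference, current, bins=10, eps=1e-6):
    """PSI = sum over bins of (cur% - ref%) * ln(cur% / ref%)."""
    lo, hi = min(reference), max(reference)
    # Interior cut points of equal-width bins over the reference range.
    edges = [lo + (hi - lo) * i / bins for i in range(1, bins)]

    def bin_shares(values):
        counts = [0] * bins
        for v in values:
            counts[bisect.bisect_right(edges, v)] += 1  # out-of-range values land in the end bins
        # eps avoids log(0) when a bin is empty on one side.
        return [max(c / len(values), eps) for c in counts]

    ref, cur = bin_shares(reference), bin_shares(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


reference = [i / 100 for i in range(1000)]  # hypothetical training-time values, 0.00 to 9.99
shifted = [x + 3 for x in reference]        # hypothetical production values, shifted by +3

if __name__ == "__main__":
    print(population_stability_index(reference, reference))
    print(population_stability_index(reference, shifted))
```

Handling drift usually means alerting when the index crosses a threshold, investigating upstream data changes, and retraining on recent data if the shift is real.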


Data Scientist • Technical • hard

How does the self-attention mechanism work in Transformer models?

#NLP #Transformers #Attention Mechanism
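
A minimal single-head NumPy sketch of scaled dot-product self-attention, softmax(QKᵀ / √d_k) V. Here Q, K and V are linear projections of the same input sequence. The random weight matrices stand in for learned parameters. Multi-head splitting, masking and positional encodings are omitted.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along `axis`."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def self_attention(X, W_q, W_k, W_v):
    """Single-head scaled dot-product self-attention over a (seq_len, d_model) input."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)     # (seq_len, seq_len) token-to-token affinities
    weights = softmax(scores, axis=-1)  # each row is a distribution over tokens
    return weights @ V, weights         # each output row is a weighted mix of value vectors


rng = np.random.default_rng(0)
seq_len, d_model, d_k = 4, 8, 4
X = rng.normal(size=(seq_len, d_model))  # e.g. 4 token embeddings
W_q, W_k, W_v = (rng.normal(size=(d_model, d_k)) for _ in range(3))  # stand-ins for learned weights

if __name__ == "__main__":
    output, weights = self_attention(X, W_q, W_k, W_v)
    print(output.shape)
    print(weights.round(3))
```

Dividing by √d_k keeps the dot products from growing with the dimension, which would otherwise push the softmax into near one-hot outputs with tiny gradients.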


Data Scientist • Technical • easy

Explain the difference between RANK(), DENSE_RANK(), and ROW_NUMBER() with a practical example.

#SQL #Window Functions
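
A practical example on a hypothetical `scores(name, score)` table, run in in-memory SQLite. Ben and Chen tie at 90, and the three functions treat the tie differently:

- `ROW_NUMBER()` gives them 2 and 3, broken here by name.
- `RANK()` gives both 2 and then jumps to 4 for Dev.
- `DENSE_RANK()` gives both 2 and continues at 3.

```python
import sqlite3

RANKING_SQL = """
SELECT name,
       score,
       ROW_NUMBER() OVER (ORDER BY score DESC, name) AS row_num,    -- always unique
       RANK()       OVER (ORDER BY score DESC)       AS rnk,        -- ties share, then a gap
       DENSE_RANK() OVER (ORDER BY score DESC)       AS dense_rnk   -- ties share, no gap
FROM scores
ORDER BY score DESC, name;
"""


def rank_scores(rows):
    """Load (name, score) rows into SQLite and return the three rankings side by side."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE scores (name TEXT, score INTEGER)")
    conn.executemany("INSERT INTO scores VALUES (?, ?)", rows)
    result = conn.execute(RANKING_SQL).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in rank_scores([("Asha", 95), ("Ben", 90), ("Chen", 90), ("Dev", 85)]):
        print(row)
```

A typical use: `DENSE_RANK()` for "top N distinct values", `RANK()` for competition-style standings, and `ROW_NUMBER()` for deduplication, such as keeping the latest row per key.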


Data Scientist • Technical • medium

Why would you choose XGBoost over a Random Forest for a tabular dataset?

#XGBoost #Random Forest #Model Selection


Data Scientist • Technical • hard

What are the trade-offs between fine-tuning an open-source LLM (like Llama 3) versus using a prompt-engineered proprietary API (like OpenAI GPT-4)?

#LLMs #Fine-tuning #Prompt Engineering #Cloud Architecture


Data Scientist • Technical • medium

Explain the ROC curve and AUC. When would you use Precision-Recall AUC instead of ROC-AUC?

#Evaluation Metrics #Classification
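
A from-scratch sketch with made-up scores: 10 fraud cases scored 0.9, 50 legitimate cases scored 0.95, and 950 legitimate cases scored 0.1. Two metrics are computed:

- ROC-AUC, as the probability that a random positive outranks a random negative.
- PR-AUC, estimated as average precision.

ROC-AUC comes out at 0.95, because most negatives rank below the frauds. Average precision is only about 0.1, because all 10 frauds rank below 50 false alarms. This gap is why PR-AUC is usually preferred when positives are rare and false positives are costly.

```python
def roc_auc(y_true, scores):
    """Probability that a random positive outranks a random negative (ties count half)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true, scores):
    """Mean of the precision measured at the rank of each positive (a PR-AUC estimate)."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    hits, precision_sum = 0, 0.0
    for k, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            precision_sum += hits / k
    return precision_sum / sum(y_true)


# Hypothetical data: 10 frauds at 0.9, 50 legitimate at 0.95, 950 legitimate at 0.1.
y_true = [1] * 10 + [0] * 50 + [0] * 950
scores = [0.9] * 10 + [0.95] * 50 + [0.1] * 950

if __name__ == "__main__":
    print(roc_auc(y_true, scores), average_precision(y_true, scores))
```

The pairwise ROC-AUC loop is O(n²) and meant only to show the definition; library implementations sort the scores instead.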


Data Scientist • Technical • easy

What is the difference between batch inference and real-time inference? Give a Capgemini-style consulting use case for each.

#Model Deployment #Inference #Architecture


Data Scientist • Technical • medium

What is A/B testing, and how do you determine the required sample size for an experiment?

#A/B Testing #Hypothesis Testing #Statistical Significance
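
A sketch of the standard normal-approximation formula for comparing two conversion rates with a two-sided test. It needs a significance level alpha, a desired power, and the smallest lift worth detecting. The 10% baseline and 12% target are hypothetical. `statistics.NormalDist` (Python 3.8+) supplies the z-values.

```python
import math
from statistics import NormalDist


def sample_size_per_group(p_baseline, p_target, alpha=0.05, power=0.80):
    """Users needed in EACH arm to detect p_baseline -> p_target with a two-sided z-test.

    n = (z_{1-alpha/2} + z_{power})^2 * [p1(1-p1) + p2(1-p2)] / (p2 - p1)^2
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_power = NormalDist().inv_cdf(power)          # about 0.84 for 80% power
    variance = p_baseline * (1 - p_baseline) + p_target * (1 - p_target)
    effect = p_target - p_baseline                 # minimum detectable effect (absolute)
    return math.ceil((z_alpha + z_power) ** 2 * variance / effect ** 2)


if __name__ == "__main__":
    print(sample_size_per_group(0.10, 0.12))
```

For 10% versus 12% at alpha = 0.05 and 80% power, this comes to a little over 3,800 users per group. Halving the detectable lift roughly quadruples the required sample.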


Data Scientist • Technical • medium

How does a Support Vector Machine (SVM) handle non-linear data?

#SVM #Kernel Trick #Math
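
A small numeric check of the kernel trick. It assumes 2-D inputs and the degree-2 polynomial kernel k(x, z) = (x · z)². That kernel equals an ordinary dot product after the explicit mapping φ(x) = (x₁², √2·x₁x₂, x₂²). An SVM using k therefore finds a quadratic boundary in the original space without ever computing φ. The RBF kernel works the same way, but its feature map is infinite-dimensional, so only the kernel form is computed.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def poly2_feature_map(x):
    """Explicit feature map phi for 2-D inputs with k(x, z) = (x . z)^2."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


def poly2_kernel(x, z):
    """Same inner product computed directly in the original 2-D space."""
    return dot(x, z) ** 2


def rbf_kernel(x, z, gamma=1.0):
    """RBF kernel: its feature map is infinite-dimensional, so only this form is usable."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


if __name__ == "__main__":
    x, z = (2.0, 3.0), (4.0, 5.0)
    print(poly2_kernel(x, z), dot(poly2_feature_map(x), poly2_feature_map(z)))
    print(rbf_kernel(x, z, gamma=0.1))
```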


Data Scientist • Technical • medium

What is target leakage in machine learning, and how do you prevent it during feature engineering?

#Data Leakage #Feature Engineering #Model Validation
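
A sketch of one leakage-prone feature-engineering step: target (mean) encoding of a categorical column. Here each row is encoded with target means computed only from the other folds, so no row's own label leaks into its feature. The round-robin fold assignment assumes the rows are already shuffled; for time-ordered data, encode using only earlier periods. The function name is hypothetical.

```python
from collections import defaultdict


def out_of_fold_target_encoding(categories, targets, n_folds=5):
    """Encode each row with the mean target of its category, computed from other folds only."""
    n = len(categories)
    encoded = [0.0] * n
    for fold in range(n_folds):
        sums, counts = defaultdict(float), defaultdict(int)
        fit_rows = [i for i in range(n) if i % n_folds != fold]
        for i in fit_rows:
            sums[categories[i]] += targets[i]
            counts[categories[i]] += 1
        # Fallback for categories unseen in the fitting folds: their overall target mean.
        fallback = sum(targets[i] for i in fit_rows) / len(fit_rows) if fit_rows else 0.0
        for i in range(fold, n, n_folds):  # rows belonging to the held-out fold
            cat = categories[i]
            encoded[i] = sums[cat] / counts[cat] if counts[cat] else fallback
    return encoded


if __name__ == "__main__":
    print(out_of_fold_target_encoding(["a", "b", "c", "d"], [1, 0, 1, 0], n_folds=2))
```

A naive encoding computed on the full dataset would assign each unique category exactly its own label, which is target leakage. The same principle applies to scalers, imputers and feature selection: fit them on training data only, ideally inside a pipeline evaluated with cross-validation.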


Data Scientist • Technical • medium

Explain the concept of Word2Vec. What is the difference between the CBOW and Skip-gram architectures?

#NLP #Word Embeddings #Word2Vec
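
A sketch of how the two architectures frame training examples from the same sentence, assuming a symmetric context window:

- Skip-gram predicts each context word from the center word, giving one pair per neighbor.
- CBOW predicts the center word from its combined context.

Only example generation is shown. Real Word2Vec then trains shallow embedding layers on these examples, typically with negative sampling. gensim's `Word2Vec` switches between the two architectures with its `sg` parameter.

```python
def context_window(tokens, i, window):
    """Neighbors of position i within `window` words on each side."""
    return [tokens[j] for j in range(max(0, i - window), min(len(tokens), i + window + 1)) if j != i]


def skipgram_pairs(tokens, window=2):
    """Skip-gram: one (center -> context word) training pair per neighbor."""
    return [(center, ctx) for i, center in enumerate(tokens) for ctx in context_window(tokens, i, window)]


def cbow_examples(tokens, window=2):
    """CBOW: one (context words -> center) example per position."""
    return [(context_window(tokens, i, window), center) for i, center in enumerate(tokens)]


if __name__ == "__main__":
    sentence = "the cat sat on the mat".split()
    print(skipgram_pairs(sentence, window=1))
    print(cbow_examples(sentence, window=1))
```

Skip-gram produces more, finer-grained training pairs and tends to do better on rare words. CBOW averages the context, so it trains faster and smooths over frequent words.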



Meet Your Interviewers

The "Standard" Interviewer

Senior Engineer

Focuses on core competencies, system constraints, and clear communication.


Unwritten Rules

Think Out Loud

Always explain your thought process before writing code or drawing architecture.
