Infosys

Global leader in next-generation digital services and consulting.

3 Rounds | ~14 Days | Medium
The Interview Loop

Recruiter Screen (30 min)

Standard fit check, behavioral questions, and resume overview.

Technical Loop (3-4 Rounds)

Deep dive into domain knowledge, coding, and system design.

Interview Question Bank

Data Scientist Behavioral easy

Can you explain p-values and confidence intervals to a non-technical business stakeholder?

#Statistics #Stakeholder Management #Hypothesis Testing
Data Scientist Behavioral medium

Tell me about a time you had to push back on a client's unrealistic expectations regarding a machine learning model's accuracy.

#Stakeholder Management #Communication #Expectation Setting
Data Scientist Behavioral medium

Describe a situation where your model performed well in training but failed in production. How did you troubleshoot and fix it?

#Debugging #Production Issues #Overfitting #Data Leakage
Data Scientist Behavioral medium

Infosys often works with legacy systems. How would you approach extracting, cleaning, and modeling data from an outdated, poorly documented mainframe system?

#Legacy Systems #Data Cleaning #Adaptability #Consulting
Data Scientist Behavioral easy

Tell me about a time you had to learn a completely new technology stack or framework within a few weeks to deliver a client project.

#Adaptability #Continuous Learning #Agile
Data Scientist Behavioral medium

A client wants to use an expensive GenAI solution to solve a business problem, but you realize a simple rule-based system or basic regression would be more effective and cheaper. How do you convince them?

#Consulting #Integrity #Communication #Cost-Benefit Analysis
Data Scientist Coding easy

Write a Python function using Pandas to find the second highest salary from an employee dataset, handling cases where multiple employees might have the same salary.

#Python #Pandas #Data Cleaning
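
One possible answer, as a minimal sketch: de-duplicate the salary values first so that ties at the top count once, then take the second largest. The column name `salary` and the sample rows are assumptions made for illustration.

```python
import pandas as pd


def second_highest_salary(df: pd.DataFrame, column: str = "salary"):
    """Return the second highest distinct value in `column`, or None if there is none."""
    # Drop missing values and duplicates so several employees sharing
    # the top salary are counted as a single salary level.
    distinct = df[column].dropna().drop_duplicates()
    if len(distinct) < 2:
        return None
    # nlargest(2) yields the two largest distinct values in descending order.
    return distinct.nlargest(2).iloc[-1]


# Hypothetical sample data: two employees share the top salary.
employees = pd.DataFrame(
    {
        "name": ["Asha", "Ben", "Chen", "Dev"],
        "salary": [90000, 120000, 120000, 75000],
    }
)
print(second_highest_salary(employees))
```

Returning `None` when fewer than two distinct salaries exist is a design choice; an interviewer may prefer raising an error instead.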
Data Scientist Coding medium

Given a list of unstructured client transaction strings, write a Python function using regex to extract the transaction ID, date, and amount, and return them as a structured dictionary.

#Python #Regex #String Parsing
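
The question does not specify the string format, so the sketch below assumes one: IDs that look like `TXN-48213`, ISO dates (`YYYY-MM-DD`), and amounts prefixed with `$`. All three patterns are hypothetical and would need adjusting to the client's real data.

```python
import re

# Assumed (hypothetical) formats; adapt these to the actual data.
TXN_ID = re.compile(r"\bTXN-\d+\b")
DATE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")
AMOUNT = re.compile(r"\$\s?(\d[\d,]*(?:\.\d+)?)")


def parse_transaction(text: str) -> dict:
    """Extract transaction ID, date, and amount; missing fields become None."""
    txn = TXN_ID.search(text)
    date = DATE.search(text)
    amount = AMOUNT.search(text)
    return {
        "transaction_id": txn.group(0) if txn else None,
        "date": date.group(0) if date else None,
        # Strip thousands separators before converting to a number.
        "amount": float(amount.group(1).replace(",", "")) if amount else None,
    }


def parse_transactions(lines: list[str]) -> list[dict]:
    """Parse each raw string independently so one bad line does not stop the rest."""
    return [parse_transaction(line) for line in lines]


sample = "Payment TXN-48213 on 2024-03-15 for $1,250.00 received"
print(parse_transaction(sample))
```

Searching for each field separately, rather than with one large pattern, tolerates fields appearing in any order.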
Data Scientist Coding medium

Write a SQL query to find the top 3 products by revenue in each region. The table contains product_id, region, and revenue.

#Window Functions #RANK() #DENSE_RANK() #Aggregation
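
A sketch of one possible answer, run through Python's built-in `sqlite3` module so it is self-contained (window functions need SQLite 3.25 or later). The table name `sales` and the sample rows are assumptions. Revenue is summed per product first, in case a product has several rows.

```python
import sqlite3

TOP3_SQL = """
WITH totals AS (
    SELECT region, product_id, SUM(revenue) AS total_revenue
    FROM sales
    GROUP BY region, product_id
),
ranked AS (
    SELECT region, product_id, total_revenue,
           DENSE_RANK() OVER (
               PARTITION BY region ORDER BY total_revenue DESC
           ) AS rnk
    FROM totals
)
SELECT region, product_id, total_revenue
FROM ranked
WHERE rnk <= 3
ORDER BY region, total_revenue DESC, product_id;
"""


def top_products_by_region(rows):
    """Load (product_id, region, revenue) rows into memory and run the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales (product_id TEXT, region TEXT, revenue INTEGER)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    result = conn.execute(TOP3_SQL).fetchall()
    conn.close()
    return result


# Hypothetical sample data.
sample_rows = [
    ("A", "East", 100), ("A", "East", 50), ("B", "East", 120),
    ("C", "East", 90), ("D", "East", 60),
    ("A", "West", 200), ("B", "West", 80),
]
for row in top_products_by_region(sample_rows):
    print(row)
```

`DENSE_RANK()` keeps every product tied within the top three revenue levels, so a region can return more than three rows. `ROW_NUMBER()` would cap the result at exactly three, and `RANK()` would leave gaps after ties.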
Data Scientist Coding hard

Given a table of user logins (user_id, login_date), write a SQL query to find users who logged in on 3 consecutive days.

#Self Joins #Window Functions #Date/Time Functions
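
One way to approach this, sketched with Python's built-in `sqlite3` module: de-duplicate logins per day, then use `LAG(login_date, 2)` to compare each date with the date two rows earlier for the same user. If that earlier date is exactly two days back, the three rows are consecutive days. The table name `logins` and ISO-formatted text dates are assumptions.

```python
import sqlite3

STREAK_SQL = """
WITH days AS (
    SELECT DISTINCT user_id, login_date FROM logins
),
lagged AS (
    SELECT user_id, login_date,
           LAG(login_date, 2) OVER (
               PARTITION BY user_id ORDER BY login_date
           ) AS two_back
    FROM days
)
SELECT DISTINCT user_id
FROM lagged
WHERE two_back = date(login_date, '-2 days')
ORDER BY user_id;
"""


def users_with_three_day_streak(rows):
    """Return user_ids with logins on at least 3 consecutive calendar days."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE logins (user_id INTEGER, login_date TEXT)")
    conn.executemany("INSERT INTO logins VALUES (?, ?)", rows)
    result = [r[0] for r in conn.execute(STREAK_SQL)]
    conn.close()
    return result


# Hypothetical sample data: user 1 has a duplicate day, user 2 has a gap,
# user 3 has a streak that crosses a month boundary.
sample_logins = [
    (1, "2024-03-01"), (1, "2024-03-02"), (1, "2024-03-02"), (1, "2024-03-03"),
    (2, "2024-03-01"), (2, "2024-03-02"), (2, "2024-03-04"),
    (3, "2024-03-30"), (3, "2024-03-31"), (3, "2024-04-01"),
]
print(users_with_three_day_streak(sample_logins))
```

The `DISTINCT` in the first CTE matters: without it, two logins on the same day would shift the lag and hide a real streak.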
Data Scientist Coding medium

Write a SQL query to calculate the month-over-month growth rate of active users from a daily activity log.

#Window Functions #CTEs #Aggregation
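
A minimal sketch of one answer, again using Python's built-in `sqlite3` module. It assumes a table named `activity` with `user_id` and an ISO-formatted `activity_date`, and defines "active" as at least one row in the month. Both are assumptions to confirm with the interviewer.

```python
import sqlite3

MOM_SQL = """
WITH monthly AS (
    SELECT strftime('%Y-%m', activity_date) AS month,
           COUNT(DISTINCT user_id) AS active_users
    FROM activity
    GROUP BY month
),
with_prev AS (
    SELECT month, active_users,
           LAG(active_users) OVER (ORDER BY month) AS prev_users
    FROM monthly
)
SELECT month, active_users,
       ROUND(100.0 * (active_users - prev_users) / prev_users, 1) AS growth_pct
FROM with_prev
ORDER BY month;
"""


def monthly_growth(rows):
    """Return (month, active_users, growth_pct) tuples from (user_id, date) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE activity (user_id INTEGER, activity_date TEXT)")
    conn.executemany("INSERT INTO activity VALUES (?, ?)", rows)
    result = conn.execute(MOM_SQL).fetchall()
    conn.close()
    return result


# Hypothetical sample data: 2 active users in January, 3 in February, 6 in March.
sample_activity = [
    (1, "2024-01-03"), (1, "2024-01-04"), (2, "2024-01-15"),
    (1, "2024-02-01"), (2, "2024-02-02"), (3, "2024-02-20"),
] + [(u, "2024-03-10") for u in range(1, 7)]
for row in monthly_growth(sample_activity):
    print(row)
```

The first month has no predecessor, so its growth is `NULL`. A month with no activity at all produces no row, so this query would compare the months on either side of the gap directly. Joining against a calendar table is one way to handle that.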
Data Scientist Coding easy

Write a Python script to perform cross-validation on a dataset using Scikit-Learn, and explain why cross-validation is preferred over a simple train-test split.

#Python #Scikit-Learn #Cross-Validation
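
A sketch of one possible script. The synthetic dataset, the logistic-regression model, and the choice of 5 stratified folds are all assumptions made for illustration. Putting the scaler inside the pipeline means it is refit on each training fold, which avoids leaking information from the validation fold.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic binary classification data (stand-in for a real client dataset).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Preprocessing lives inside the pipeline so it is fit per training fold.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Stratified folds keep the class balance similar in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")

print("Fold accuracies:", scores)
print("Mean:", scores.mean(), "Std:", scores.std())
```

A single train-test split gives one score that depends on which rows happened to land in the test set. Cross-validation averages over several splits, so every row is used for validation once and the spread across folds gives a rough sense of how stable the estimate is.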
Data Scientist System Design hard

A retail client wants to forecast inventory demand across 500 stores. How do you approach building a scalable time-series forecasting model?

#ARIMA #Prophet #Forecasting #Scalability
Data Scientist System Design medium

How would you design an NLP pipeline to automatically categorize incoming IT support tickets into different resolution queues?

#Text Classification #TF-IDF #Transformers #Pipeline Design
Data Scientist System Design hard

A client wants to implement a Retrieval-Augmented Generation (RAG) system for their internal HR documents. Walk me through the architecture.

#RAG #LLMs #Vector Databases #Embeddings
Data Scientist System Design hard

Design a recommendation system for an e-commerce client to suggest products based on user browsing history and past purchases.

#Collaborative Filtering #Content-Based Filtering #Matrix Factorization
Data Scientist System Design hard

Design a system to predict equipment failure in a manufacturing plant using IoT sensor data. How do you handle the high-frequency streaming data?

#IoT #Streaming Data #Predictive Maintenance #Kafka
Data Scientist Technical medium

How would you handle a dataset with 50 million rows in Python if it exceeds your available RAM during a client engagement?

#Memory Management #Dask #Chunking #PySpark
Data Scientist Technical medium

Explain the difference between Random Forest and Gradient Boosting. Which one would you prefer for predicting client churn for a telecom client and why?

#Ensemble Methods #Bagging #Boosting #Classification
Data Scientist Technical medium

How do you handle highly imbalanced datasets in a fraud detection model for a banking client? What metrics would you use?

#Imbalanced Data #SMOTE #Precision-Recall #F1-Score
Data Scientist Technical medium

What is the curse of dimensionality, and how do you address it when working with high-dimensional enterprise data?

#PCA #Feature Selection #Dimensionality Reduction
Data Scientist Technical hard

Explain the mathematical intuition behind Support Vector Machines (SVM). What is the kernel trick and when do you use it?

#SVM #Mathematics #Kernels
Data Scientist Technical medium

Explain L1 and L2 regularization. When would you use Lasso over Ridge regression in a predictive maintenance model?

#Regularization #Regression #Feature Selection
Data Scientist Technical hard

How do you detect and handle data drift in a machine learning model deployed in production for a financial client?

#Data Drift #Model Monitoring #Evidently AI
Data Scientist Technical medium

What is the difference between K-Means and Hierarchical clustering? How do you determine the optimal number of clusters?

#Clustering #Unsupervised Learning #Elbow Method #Silhouette Score
Data Scientist Technical hard

Explain the architecture of a Transformer model. Why has it largely replaced RNNs and LSTMs in modern NLP tasks?

#Transformers #Attention Mechanism #NLP
Data Scientist Technical hard

How do you fine-tune an open-source LLM like Llama-3 for a specific enterprise use case while minimizing compute costs?

#LoRA #PEFT #Fine-tuning #Quantization
Data Scientist Technical medium

What are word embeddings? Compare traditional embeddings like Word2Vec with contextual embeddings like BERT.

#Embeddings #Word2Vec #BERT
Data Scientist Technical medium

How do you handle vanishing and exploding gradients in deep neural networks?

#Neural Networks #Optimization #Activation Functions
Data Scientist Technical easy

Explain the difference between a Star schema and a Snowflake schema in data warehousing.

#Data Modeling #Warehousing #Schema Design
Data Scientist Technical medium

How do you optimize a slow-running SQL query that joins multiple large tables in a client's database?

#Query Optimization #Indexing #Execution Plan
Data Scientist Technical medium

Walk me through how you would deploy a Scikit-learn model as a REST API using FastAPI and Dockerize it for deployment on Azure.

#Model Deployment #FastAPI #Docker #Azure
Data Scientist Technical medium

How do you ensure data privacy and compliance, such as GDPR, when building predictive models using sensitive customer data?

#Data Privacy #GDPR #Anonymization #PII
Data Scientist Technical medium

Explain Continuous Integration and Continuous Deployment (CI/CD) in the context of Machine Learning, including how Continuous Training (CT) fits in.

#CI/CD #Automation #ML Pipelines #Continuous Training
Data Scientist Technical easy

What is the ROC curve and AUC? How would you explain an AUC of 0.5 to a project manager?

#Evaluation Metrics #ROC-AUC #Classification
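
One way to make the AUC question concrete is the pairwise interpretation: AUC is the probability that a randomly chosen positive example gets a higher score than a randomly chosen negative one. The helper below is a plain-Python illustration of that definition, not a library API, and the sample labels and scores are made up.

```python
def pairwise_auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties count as half a win. Runs in O(P * N), so it is for illustration only.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


labels = [0, 0, 1, 1]

# A model with some ranking ability: 3 of the 4 pairs are ordered correctly.
print(pairwise_auc(labels, [0.1, 0.4, 0.35, 0.8]))

# A model that gives everyone the same score cannot rank at all.
print(pairwise_auc(labels, [0.5, 0.5, 0.5, 0.5]))
```

For a project manager, an AUC of 0.5 means the model separates the two groups no better than a coin flip.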

Meet Your Interviewers

The "Standard" Interviewer

Senior Engineer

Focuses on core competencies, system constraints, and clear communication.

Unwritten Rules

Think Out Loud

Always explain your thought process before writing code or drawing architecture.
