Nvidia

Hardware and AI software leader powering the global generative AI revolution.

4 rounds, ~25 days, difficulty: Very Hard

The Interview Loop

Recruiter Screen (30 min)

Standard fit check, behavioral questions, and resume overview.

Technical Loop (3-4 Rounds)

Deep dive into domain knowledge, coding, and system design.

Interview Question Bank

Questions by role: Cloud Engineer (3), Data Engineer (8), Data Scientist (8), Machine Learning Engineer (9), Software Engineer (8)

Questions by topic: System Design (41), Algorithms (36), Culture Fit (26), Deep Learning (18), SQL (14), Product Strategy (8), Big Data Frameworks (6), Machine Learning (6)

Cloud Engineer • Coding • hard

Implement a concurrent job scheduler in Go that limits the number of active workers to N. Jobs have different priorities and dependencies. Ensure that dependencies are respected and that, among jobs whose dependencies are already satisfied, higher-priority jobs are executed first.

#Concurrency #Go #Graph Algorithms
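The following is a minimal sketch of the scheduling logic. It is written in Python rather than Go so that all examples here use one language. A Kahn-style dependency count feeds a max-priority "ready" heap, and a `ThreadPoolExecutor` caps concurrency at N. The function name `run_jobs` and its dictionary inputs are illustrative assumptions, not a required interface. In Go, the same structure would typically use goroutines plus a buffered channel or semaphore as the worker limit.

```python
import heapq
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait

def run_jobs(jobs, deps, priority, max_workers):
    """Run callables in `jobs` (name -> fn) with at most `max_workers` running at once.

    deps: name -> iterable of prerequisite names (all assumed to be keys of `jobs`).
    priority: name -> int, higher starts first among ready jobs (default 0).
    Returns job names in the order they were started.
    """
    indegree = {name: 0 for name in jobs}
    dependents = {name: [] for name in jobs}
    for name, prereqs in deps.items():
        for p in prereqs:
            indegree[name] += 1
            dependents[p].append(name)

    # Min-heap on negated priority acts as a max-priority ready queue.
    ready = [(-priority.get(n, 0), n) for n, d in indegree.items() if d == 0]
    heapq.heapify(ready)
    started, running = [], {}

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while ready or running:
            # Fill free worker slots with the highest-priority ready jobs.
            while ready and len(running) < max_workers:
                _, name = heapq.heappop(ready)
                started.append(name)
                running[pool.submit(jobs[name])] = name
            done, _ = wait(running, return_when=FIRST_COMPLETED)
            for fut in done:
                name = running.pop(fut)
                fut.result()  # re-raise the job's exception, if any
                # A finished job may unblock its dependents.
                for child in dependents[name]:
                    indegree[child] -= 1
                    if indegree[child] == 0:
                        heapq.heappush(ready, (-priority.get(child, 0), child))

    if len(started) != len(jobs):
        raise ValueError("cyclic dependencies: some jobs can never start")
    return started
```

With `max_workers=1` the start order is deterministic. For example, if `b` has priority 5, `a` has priority 1, and `c` (priority 10) depends on `a`, then `b` starts first, and `c` still waits for `a`. Priority only reorders jobs whose prerequisites have finished.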

Cloud Engineer • Coding • medium

Write a script to parse a large distributed-system log file (e.g., 50 GB), find all instances of a specific OOM (out-of-memory) error, group them by node ID, and output the top 5 nodes with the most errors. Optimize for memory usage.

#File I/O #Data Structures #Scripting
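One memory-friendly approach is to stream the file line by line and keep only a counter per node. Memory then grows with the number of distinct nodes rather than with the file size. The sketch below assumes a hypothetical log format in which the node ID appears as `node=<id>` and the error as the literal text `OutOfMemoryError`. The regex and marker would need to match the real format.

```python
import re
from collections import Counter

# Hypothetical log format: "... node=<id> ... OutOfMemoryError ..."
NODE_RE = re.compile(r"\bnode=(\S+)")
OOM_MARKER = "OutOfMemoryError"

def top_oom_nodes(lines, top_n=5):
    """Count OOM lines per node from any iterable of lines; return the top_n (node, count) pairs."""
    counts = Counter()
    for line in lines:
        if OOM_MARKER not in line:  # cheap substring test before running the regex
            continue
        match = NODE_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts.most_common(top_n)

def top_oom_nodes_in_file(path, top_n=5):
    # Iterating a file object reads one line at a time, so the 50 GB file is never loaded whole.
    with open(path, "r", encoding="utf-8", errors="replace") as f:
        return top_oom_nodes(f, top_n)
```

A single pass like this is usually I/O-bound. A common extension is to split the file into byte ranges handled by separate worker processes and then sum their `Counter`s.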

Cloud Engineer • Coding • medium

Design and implement a thread-safe token bucket rate limiter in Python or Go. How would you scale this across multiple distributed API servers handling requests for Nvidia's NGC container registry?

#Concurrency #Distributed Systems #Python/Go
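Below is a minimal single-process sketch in Python. The bucket holds up to `capacity` tokens, refills continuously at `rate` tokens per second, and guards its state with a lock so concurrent threads cannot double-spend tokens. The `clock` parameter is an assumption added so the refill logic can be tested with a fake clock.

```python
import threading
import time

class TokenBucket:
    """Thread-safe token bucket: up to `capacity` tokens, refilled at `rate` tokens/second."""

    def __init__(self, capacity, rate, clock=time.monotonic):
        if capacity <= 0 or rate <= 0:
            raise ValueError("capacity and rate must be positive")
        self.capacity = float(capacity)
        self.rate = float(rate)
        self._clock = clock
        self._tokens = float(capacity)  # start with a full bucket
        self._last = clock()
        self._lock = threading.Lock()

    def allow(self, cost=1.0):
        """Take `cost` tokens if available; return True if the request may proceed."""
        with self._lock:
            now = self._clock()
            elapsed = max(0.0, now - self._last)
            # Lazy refill: add tokens for the time elapsed since the last call, capped at capacity.
            self._tokens = min(self.capacity, self._tokens + elapsed * self.rate)
            self._last = now
            if self._tokens >= cost:
                self._tokens -= cost
                return True
            return False
```

Across many API servers, an in-process bucket only limits each server individually. One common approach is to keep the bucket state in a shared store such as Redis and run the refill-and-take step atomically there, for example in a Lua script. Another is to give each server a local bucket holding a fraction of the global rate, trading accuracy for fewer network round trips.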

Data Engineer • Coding • medium

Given a massive log file containing billions of error codes, write a Python program to find the top K most frequent error codes. The file is too large to fit in memory.

#Python #Heaps #External Sorting #Generators
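If the distinct error codes fit in memory, one `Counter` pass is enough. The sketch below assumes they might not, and uses hash partitioning. Each code is written to one of several temporary files chosen by its hash, so every occurrence of a code lands in the same partition. Each partition is then counted on its own, while a size-K min-heap tracks the global top K. The one-code-per-line input and the default partition count are assumptions.

```python
import heapq
import os
import tempfile
from collections import Counter

def top_k_error_codes(lines, k, num_partitions=16):
    """Return the k most frequent codes from an iterable of lines (one code per line)."""
    if k <= 0:
        return []
    with tempfile.TemporaryDirectory() as tmp_dir:
        paths = [os.path.join(tmp_dir, f"part_{i}.txt") for i in range(num_partitions)]
        # Pass 1: hash-partition codes to disk so each partition's counts fit in memory.
        outs = [open(p, "w", encoding="utf-8") for p in paths]
        try:
            for line in lines:
                code = line.strip()
                if code:
                    outs[hash(code) % num_partitions].write(code + "\n")
        finally:
            for f in outs:
                f.close()
        # Pass 2: count one partition at a time; keep a size-k min-heap of (count, code).
        heap = []
        for p in paths:
            with open(p, encoding="utf-8") as f:
                counts = Counter(line.rstrip("\n") for line in f)
            for code, n in counts.items():
                if len(heap) < k:
                    heapq.heappush(heap, (n, code))
                elif (n, code) > heap[0]:
                    heapq.heapreplace(heap, (n, code))
    return [code for _, code in sorted(heap, reverse=True)]
```

Passing an open file object as `lines` keeps the input streaming. Python's `hash()` for strings is consistent within one process, which is all the partitioning step needs.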

Data Engineer • Coding • hard

Implement an LRU (Least Recently Used) cache in Python. This is often used to cache database lookups in our ingestion layer.

#Python #Data Structures #Hash Maps #Linked Lists
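Here is a sketch of the classic hash map plus doubly linked list design. The dict gives O(1) lookup by key. The list keeps entries in recency order (most recent right after the `head` sentinel), so moving and evicting nodes are also O(1). The class and method names are illustrative. In production Python, `collections.OrderedDict` with `move_to_end` is a shorter equivalent.

```python
class _Node:
    __slots__ = ("key", "value", "prev", "next")

    def __init__(self, key=None, value=None):
        self.key, self.value = key, value
        self.prev = self.next = None

class LRUCache:
    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.map = {}
        # Sentinels: head.next is the most recently used, tail.prev the least.
        self.head, self.tail = _Node(), _Node()
        self.head.next, self.tail.prev = self.tail, self.head

    def _unlink(self, node):
        node.prev.next = node.next
        node.next.prev = node.prev

    def _push_front(self, node):
        node.prev, node.next = self.head, self.head.next
        self.head.next.prev = node
        self.head.next = node

    def get(self, key, default=None):
        node = self.map.get(key)
        if node is None:
            return default
        self._unlink(node)
        self._push_front(node)  # mark as most recently used
        return node.value

    def put(self, key, value):
        node = self.map.get(key)
        if node is not None:
            node.value = value
            self._unlink(node)
            self._push_front(node)
            return
        if len(self.map) >= self.capacity:
            lru = self.tail.prev  # evict the least recently used entry
            self._unlink(lru)
            del self.map[lru.key]
        node = _Node(key, value)
        self.map[key] = node
        self._push_front(node)
```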

Data Engineer • Coding • medium

Write a Python function that implements a rate limiter using the token bucket algorithm. This is used to throttle API requests to our internal data services.

#Python #System Design Concepts #Concurrency

Data Engineer • Coding • hard

Given a list of task dependencies (e.g., Task A must finish before Task B), write a Python function to determine a valid execution order for the tasks. If there is a circular dependency, return an error.

#Graphs #Topological Sort #Python
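A minimal sketch using Kahn's algorithm (BFS topological sort) follows. Tasks with no unmet prerequisites are emitted first, and each emitted task decrements its dependents' counts. If some tasks are never emitted, they sit on a cycle, and the sketch raises `ValueError` (the "error" the question asks for). The input shape, a list of `(before, after)` pairs, is an assumption.

```python
from collections import defaultdict, deque

def execution_order(tasks, dependencies):
    """Return a valid order for `tasks`; dependencies are (before, after) pairs."""
    indegree = {t: 0 for t in tasks}
    children = defaultdict(list)
    for before, after in dependencies:
        indegree.setdefault(before, 0)
        indegree.setdefault(after, 0)
        children[before].append(after)
        indegree[after] += 1

    queue = deque(t for t, d in indegree.items() if d == 0)
    order = []
    while queue:
        task = queue.popleft()
        order.append(task)
        for nxt in children[task]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)

    if len(order) != len(indegree):
        raise ValueError("circular dependency detected")
    return order
```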

Data Engineer • Coding • medium

Design and implement a Least Recently Used (LRU) cache in Python. This is often used in our data access layers to cache frequently queried model metadata.

#Data Structures #Hash Map #Doubly Linked List

Data Engineer • Coding • medium

Given a massive log file of error codes generated by our DGX systems that cannot fit into memory, write a Python script to find the top K most frequent error codes.

#Python #Heaps #File I/O #Memory Management

Data Engineer • Coding • medium

Given a list of intervals representing GPU job execution times (start_time, end_time), write a Python function to merge all overlapping intervals.

#Python #Arrays #Sorting
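The standard approach is to sort by start time and then sweep once. If a job starts before the last merged interval ends, extend that interval; otherwise start a new one. This costs O(n log n) for the sort. The sketch assumes that intervals which merely touch (one ends exactly when the next starts) should also be merged.

```python
def merge_intervals(intervals):
    """Merge overlapping [start, end] intervals; returns a new sorted list of [start, end]."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous interval: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged
```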

Data Engineer • Coding • medium

Given an array of GPU job execution intervals where intervals[i] = [start_i, end_i], merge all overlapping intervals and return an array of the non-overlapping intervals that cover all the jobs.

#Arrays #Sorting #Python

Data Scientist • Coding • easy

Given an array of integers representing GPU memory allocations in MB, find the indices of two allocations that sum up exactly to a specific target memory limit.

#Hash Maps #Arrays
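A one-pass hash map solution runs in O(n) time. For each allocation, check whether its complement (`target - mb`) was seen earlier; if not, record the current value's index. The sketch returns the first matching index pair, or `None` if no pair exists (an assumed convention).

```python
def two_sum(allocations, target):
    """Return (i, j) with i < j and allocations[i] + allocations[j] == target, else None."""
    seen = {}  # value -> index of an earlier occurrence
    for i, mb in enumerate(allocations):
        j = seen.get(target - mb)
        if j is not None:
            return (j, i)
        seen[mb] = i
    return None
```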

Data Scientist • Coding • medium

Given a string, write a function to find the length of the longest substring without repeating characters.

#Strings #Sliding Window #Hash Map
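Here is a sliding-window sketch. `start` marks the left edge of the current window of unique characters, and a map stores each character's last index. When a repeated character falls inside the window, the left edge jumps just past its previous occurrence, giving O(n) time overall.

```python
def longest_unique_substring(s):
    """Length of the longest substring of s with no repeated characters."""
    last_seen = {}
    start = best = 0
    for i, ch in enumerate(s):
        if ch in last_seen and last_seen[ch] >= start:
            start = last_seen[ch] + 1  # shrink window past the previous occurrence
        last_seen[ch] = i
        best = max(best, i - start + 1)
    return best
```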

Data Scientist • Coding • hard

Given a Directed Acyclic Graph (DAG) representing dependencies of CUDA kernels, write a function to find the critical path (the path with the longest total execution time).

#Graphs #Dynamic Programming #Topological Sort
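A minimal sketch follows, assuming each kernel has an execution time and each edge `(u, v)` means `u` must finish before `v`. Processing nodes in topological order lets a DP compute `finish[v]`, the longest total time of any path ending at `v`. Parent pointers then recover the path itself. The input shapes are assumptions, and every node in `edges` must appear in `durations`.

```python
from collections import defaultdict, deque

def critical_path(durations, edges):
    """Return (total_time, path) for the longest-duration path in a DAG."""
    children = defaultdict(list)
    indegree = {n: 0 for n in durations}
    for u, v in edges:
        children[u].append(v)
        indegree[v] += 1

    finish = dict(durations)               # a path may start at any node
    parent = {n: None for n in durations}
    queue = deque(n for n, d in indegree.items() if d == 0)
    processed = 0
    while queue:
        u = queue.popleft()
        processed += 1
        for v in children[u]:
            # Relax: extend the best path ending at u by v's duration.
            if finish[u] + durations[v] > finish[v]:
                finish[v] = finish[u] + durations[v]
                parent[v] = u
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)
    if processed != len(durations):
        raise ValueError("graph has a cycle")

    end = max(finish, key=finish.get)
    total, path = finish[end], []
    while end is not None:
        path.append(end)
        end = parent[end]
    return total, path[::-1]
```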

Data Scientist • Coding • hard

Write an algorithm to schedule a computational Directed Acyclic Graph (DAG) representing neural network layers across multiple GPUs to minimize cross-device communication overhead.

#Graphs #Topological Sort #Dynamic Programming
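Exact placement that minimizes communication is generally treated as NP-hard, since it generalizes balanced graph partitioning, so interview answers usually propose a heuristic. Below is one simple greedy sketch, not an optimal algorithm. It visits layers in topological order and puts each one on the GPU that minimizes bytes copied from its parents' devices. A per-GPU layer cap keeps the load balanced. The edge weights (bytes of activations passed) and the equal cost per layer are assumptions.

```python
import math
from collections import defaultdict, deque

def greedy_placement(layers, edges, num_gpus):
    """Heuristically place a layer DAG on GPUs.

    edges: (src, dst, nbytes) where src's output of nbytes is consumed by dst.
    Returns (placement dict layer -> gpu, total cross-GPU bytes).
    """
    parents, children = defaultdict(list), defaultdict(list)
    indegree = {layer: 0 for layer in layers}
    for src, dst, nbytes in edges:
        parents[dst].append((src, nbytes))
        children[src].append(dst)
        indegree[dst] += 1

    cap = math.ceil(len(layers) / num_gpus)  # simple balance constraint: layers per GPU
    load = [0] * num_gpus
    placement = {}

    def transfer_cost(layer, gpu):
        # Bytes that would cross devices if `layer` ran on `gpu`.
        return sum(nbytes for src, nbytes in parents[layer] if placement[src] != gpu)

    queue = deque(layer for layer in layers if indegree[layer] == 0)
    while queue:
        layer = queue.popleft()
        open_gpus = [g for g in range(num_gpus) if load[g] < cap]
        # Prefer lowest transfer cost, then the less-loaded GPU, then the lower index.
        gpu = min(open_gpus, key=lambda g: (transfer_cost(layer, g), load[g], g))
        placement[layer] = gpu
        load[gpu] += 1
        for child in children[layer]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)

    if len(placement) != len(layers):
        raise ValueError("graph contains a cycle")
    cross = sum(n for src, dst, n in edges if placement[src] != placement[dst])
    return placement, cross
```

Greedy choices like this can be refined afterwards with local-search passes such as Kernighan-Lin style swaps between devices.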

Data Scientist • Coding • medium

Given an M x N matrix representing a single image from a batch, write a function to perform a 2D convolution with a given K x K kernel without using external libraries such as SciPy or PyTorch.

#Arrays #Matrix Manipulation #Computer Vision
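The pure-Python sketch below assumes "valid" mode (no padding) and stride 1, so the output is (M - K + 1) x (N - K + 1). Like deep-learning frameworks such as PyTorch, it computes cross-correlation, meaning the kernel is not flipped. For textbook convolution, flip the kernel on both axes first. For a batch, call it once per image.

```python
def conv2d(image, kernel):
    """Valid-mode, stride-1 2D cross-correlation of an M x N image with a K x K kernel."""
    m, n = len(image), len(image[0])
    k = len(kernel)
    out_h, out_w = m - k + 1, n - k + 1
    if out_h <= 0 or out_w <= 0:
        raise ValueError("kernel is larger than the image")
    out = [[0] * out_w for _ in range(out_h)]
    for i in range(out_h):
        for j in range(out_w):
            acc = 0
            # Dot product of the kernel with the K x K window anchored at (i, j).
            for di in range(k):
                row, krow = image[i + di], kernel[di]
                for dj in range(k):
                    acc += row[j + dj] * krow[dj]
            out[i][j] = acc
    return out
```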

Data Scientist • Coding • medium

Write a Python function that estimates Pi with a Monte Carlo simulation. Then explain and write a vectorized version using NumPy or CuPy.

#Simulation #Vectorization #Math
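This works because the chance that a uniform point in the unit square lands inside the quarter circle x² + y² ≤ 1 is π/4. Four times the hit fraction therefore estimates π. The loop version draws one point at a time. The vectorized version draws all points as arrays and replaces the Python loop with a single array expression. The seeds are arbitrary and only make runs reproducible. CuPy aims to mirror much of the NumPy API, so a GPU variant would look similar, but check CuPy's random-module documentation for its exact generator API.

```python
import random

import numpy as np

def estimate_pi_loop(n, seed=0):
    """Pure-Python Monte Carlo estimate of pi from n random points."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(n):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / n

def estimate_pi_vectorized(n, seed=0):
    """Same estimate, computed with NumPy arrays instead of a Python loop."""
    rng = np.random.default_rng(seed)
    x = rng.random(n)
    y = rng.random(n)
    inside = np.count_nonzero(x * x + y * y <= 1.0)
    return 4.0 * inside / n
```

The error shrinks roughly as 1/sqrt(n), so each extra decimal digit of accuracy needs about 100 times more samples. That is why vectorized or GPU execution matters here.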

Data Scientist • Coding • medium

Implement a Trie (Prefix Tree) data structure to efficiently store and search through millions of generated text tokens from an LLM.

#Trees #Trie #Strings
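A minimal trie sketch follows. Each node maps a character to a child and flags whether a complete token ends there, so insert and lookup cost O(length of the token) regardless of how many tokens are stored. The `words_with_prefix` method also covers the autocomplete-style "all words starting with a prefix" query. Method names are illustrative.

```python
class TrieNode:
    __slots__ = ("children", "is_word")

    def __init__(self):
        self.children = {}
        self.is_word = False

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        node = self.root
        for ch in word:
            child = node.children.get(ch)
            if child is None:
                child = node.children[ch] = TrieNode()
            node = child
        node.is_word = True

    def _find(self, prefix):
        """Return the node reached by following `prefix`, or None."""
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return None
        return node

    def search(self, word):
        node = self._find(word)
        return node is not None and node.is_word

    def starts_with(self, prefix):
        return self._find(prefix) is not None

    def words_with_prefix(self, prefix):
        """All stored words beginning with `prefix`, in sorted order."""
        node = self._find(prefix)
        if node is None:
            return []
        results, stack = [], [(node, prefix)]
        while stack:  # iterative DFS avoids recursion limits on long tokens
            cur, path = stack.pop()
            if cur.is_word:
                results.append(path)
            for ch, child in cur.children.items():
                stack.append((child, path + ch))
        return sorted(results)
```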

Data Scientist • Coding • medium

Implement a sliding window algorithm to find the maximum GPU temperature over a rolling 5-minute window given a continuous stream of timestamped telemetry data.

#Sliding Window #Queues #Time Series
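A monotonic deque gives amortized O(1) updates. The deque keeps readings in strictly decreasing temperature order, so the current maximum is always at the front. Readings older than the window fall off the front, and a new reading discards any older, cooler readings from the back because they can never be the maximum again. The sketch assumes timestamps are in seconds, arrive in non-decreasing order, and that the window is (now - 300 s, now].

```python
from collections import deque

class RollingMax:
    """Maximum temperature over a rolling time window for an in-order telemetry stream."""

    def __init__(self, window_seconds=300):
        self.window = window_seconds
        self._dq = deque()  # (timestamp, temp) with temps strictly decreasing

    def add(self, timestamp, temp):
        """Ingest one reading and return the max over the current window."""
        # Older readings that are not hotter can never be the max again.
        while self._dq and self._dq[-1][1] <= temp:
            self._dq.pop()
        self._dq.append((timestamp, temp))
        # Evict readings that have left the window.
        while self._dq[0][0] <= timestamp - self.window:
            self._dq.popleft()
        return self._dq[0][1]
```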

Machine Learning Engineer • Coding • medium

Find the Kth largest element in an unsorted array. Optimize for average time complexity.

#QuickSelect #Heap #Sorting
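Randomized quickselect gives O(n) average time, versus O(n log k) for a size-k heap. The sketch below uses a three-way (Dutch national flag) partition so that many duplicate values do not degrade performance. It works on a copy, so the caller's list is left unchanged.

```python
import random

def kth_largest(nums, k, rng=random):
    """Return the k-th largest element (k = 1 is the maximum) in O(n) average time."""
    if not 1 <= k <= len(nums):
        raise ValueError("k out of range")
    arr = list(nums)
    target = len(arr) - k  # index of the answer in ascending sorted order
    lo, hi = 0, len(arr) - 1
    while True:
        pivot = arr[rng.randint(lo, hi)]
        # Partition arr[lo..hi] into < pivot | == pivot | > pivot.
        lt, i, gt = lo, lo, hi
        while i <= gt:
            if arr[i] < pivot:
                arr[lt], arr[i] = arr[i], arr[lt]
                lt += 1
                i += 1
            elif arr[i] > pivot:
                arr[i], arr[gt] = arr[gt], arr[i]
                gt -= 1
            else:
                i += 1
        if target < lt:
            hi = lt - 1
        elif target > gt:
            lo = gt + 1
        else:
            return pivot
```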

Machine Learning Engineer • Coding • medium

Find the Lowest Common Ancestor (LCA) of two nodes in a Binary Tree.

#Trees #Recursion #DFS
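The recursive DFS sketch below assumes both target nodes exist in the tree and are compared by identity. A subtree returns a target if it contains one. The first node whose left and right subtrees each return a target is the lowest common ancestor. If one target is an ancestor of the other, that ancestor is returned directly.

```python
class TreeNode:
    def __init__(self, val, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right

def lowest_common_ancestor(root, p, q):
    """LCA of nodes p and q in a binary tree (both assumed present)."""
    if root is None or root is p or root is q:
        return root
    left = lowest_common_ancestor(root.left, p, q)
    right = lowest_common_ancestor(root.right, p, q)
    if left is not None and right is not None:
        return root  # p and q are in different subtrees
    return left if left is not None else right
```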

Machine Learning Engineer • Coding • medium

Implement a sparse matrix multiplication algorithm. Assume the matrices are too large to fit into memory in a dense format.

#Arrays #Math #Data Structures
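A sketch under one simple assumption: each matrix is stored as a dict of rows, `{row: {col: value}}`, holding only nonzero entries. This is a hypothetical in-memory format similar in spirit to CSR. The work is then proportional to the number of nonzero products rather than to the dense dimensions. For data truly beyond memory, the same row-by-row loop could stream rows of A from disk while B stays indexed by row.

```python
from collections import defaultdict

def sparse_matmul(a, b):
    """Multiply sparse matrices given as {row: {col: value}} dicts of nonzeros."""
    result = {}
    for i, a_row in a.items():
        acc = defaultdict(int)
        for k, a_ik in a_row.items():
            b_row = b.get(k)
            if not b_row:
                continue  # row k of B is all zeros
            for j, b_kj in b_row.items():
                acc[j] += a_ik * b_kj
        nonzero = {j: v for j, v in acc.items() if v != 0}  # drop cancelled entries
        if nonzero:
            result[i] = nonzero
    return result
```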

Machine Learning Engineer • Coding • hard

You are given an array of k linked lists, each sorted in ascending order. Merge all of them into one sorted linked list and return it.

#Linked Lists #Heaps #Divide and Conquer
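A min-heap sketch follows. The heap holds the current head of each list, so each pop yields the next smallest node, and that node's successor is pushed in its place. The total cost is O(N log k) for N nodes. A counter breaks ties so that `ListNode` objects are never compared directly. The `from_list`/`to_list` helpers are only for building and checking examples.

```python
import heapq
from itertools import count

class ListNode:
    def __init__(self, val, next_node=None):
        self.val = val
        self.next = next_node

def merge_k_lists(lists):
    """Merge k sorted linked lists into one sorted list by relinking nodes."""
    tie = count()  # unique tie-breaker for equal values
    heap = [(node.val, next(tie), node) for node in lists if node is not None]
    heapq.heapify(heap)
    dummy = tail = ListNode(0)
    while heap:
        _, _, node = heapq.heappop(heap)
        tail.next = node
        tail = node
        if node.next is not None:
            heapq.heappush(heap, (node.next.val, next(tie), node.next))
    return dummy.next

def from_list(values):
    dummy = tail = ListNode(0)
    for v in values:
        tail.next = ListNode(v)
        tail = tail.next
    return dummy.next

def to_list(node):
    out = []
    while node is not None:
        out.append(node.val)
        node = node.next
    return out
```

A divide-and-conquer alternative merges lists pairwise in rounds and also runs in O(N log k).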

Machine Learning Engineer • Coding • medium

Given a Directed Acyclic Graph (DAG) representing a neural network computation graph, write an algorithm to find the longest path (critical path) from the input node to the output node.

#Graphs #Dynamic Programming #Topological Sort

Machine Learning Engineer • Coding • medium

Implement an autocomplete system using a Trie data structure. Include methods to insert a word and return all words that start with a given prefix.

#Trees #Tries #Strings

Machine Learning Engineer • Coding • hard

Write a function to perform matrix multiplication. Optimize it for cache locality using tiling (blocking).

#Matrix Operations #Cache Optimization #C++
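The sketch below shows the loop structure of a blocked multiply. It is written in Python for consistency with the other examples, although the question targets C++. The i, k, j loops are split into `tile`-sized blocks so a block of A, B, and C can stay in cache while it is reused. Inside each block, the i-k-j order walks rows of B and C contiguously. In CPython this illustrates the technique but should not be expected to run faster. The real gains come from compiled code, where a tile size is typically tuned so the working set fits the L1 or L2 cache.

```python
def matmul_tiled(a, b, tile=32):
    """Blocked (tiled) multiply of an n x m matrix a by an m x p matrix b."""
    n, m = len(a), len(a[0])
    if len(b) != m:
        raise ValueError("inner dimensions do not match")
    p = len(b[0])
    c = [[0] * p for _ in range(n)]
    for ii in range(0, n, tile):
        for kk in range(0, m, tile):
            for jj in range(0, p, tile):
                # Multiply one tile of A by one tile of B, accumulating into a tile of C.
                for i in range(ii, min(ii + tile, n)):
                    a_row, c_row = a[i], c[i]
                    for k in range(kk, min(kk + tile, m)):
                        a_ik, b_row = a_row[k], b[k]
                        for j in range(jj, min(jj + tile, p)):
                            c_row[j] += a_ik * b_row[j]
    return c
```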

Machine Learning Engineer • Coding • medium

Given a 2D grid map of '1's (land) and '0's (water), count the number of islands. (Context: Autonomous Vehicle occupancy grid analysis).

#Graph Theory #DFS #BFS
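Here is a BFS flood-fill sketch. Each time an unvisited land cell is found, the island count is incremented, and the whole connected component is marked as visited. It assumes 4-directional connectivity (diagonal cells do not join islands) and cells stored as the strings "1" and "0". A BFS with an explicit queue avoids Python's recursion limit on large occupancy grids.

```python
from collections import deque

def num_islands(grid):
    """Count 4-connected groups of '1' cells in a 2D grid."""
    if not grid:
        return 0
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    islands = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == "1" and not seen[r][c]:
                islands += 1
                seen[r][c] = True
                queue = deque([(r, c)])
                while queue:  # flood-fill this island
                    cr, cc = queue.popleft()
                    for nr, nc in ((cr + 1, cc), (cr - 1, cc), (cr, cc + 1), (cr, cc - 1)):
                        if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == "1" and not seen[nr][nc]:
                            seen[nr][nc] = True
                            queue.append((nr, nc))
    return islands
```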

Machine Learning Engineer • Coding • hard

Merge K sorted linked lists into one sorted linked list.

#Linked Lists #Divide and Conquer #Heap

Software Engineer • Coding • easy

Find the maximum subarray sum (Kadane's Algorithm).

#Arrays #Dynamic Programming
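Kadane's algorithm keeps `current`, the best sum of a subarray ending at the current element. That subarray either extends the previous one or starts fresh at `x`. `best` tracks the maximum seen so far, giving O(n) time and O(1) space. The sketch assumes a non-empty input and returns the largest element when all values are negative.

```python
def max_subarray_sum(nums):
    """Maximum sum of a non-empty contiguous subarray (Kadane's algorithm)."""
    if not nums:
        raise ValueError("nums must be non-empty")
    best = current = nums[0]
    for x in nums[1:]:
        current = max(x, current + x)  # extend the run or restart at x
        best = max(best, current)
    return best
```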

Software Engineer • Coding • medium

Design and implement an LRU (Least Recently Used) cache in C++.

#Hash Map #Doubly Linked List #C++

Software Engineer • Coding • easy

Given an integer, write a function to determine if it is a power of two using bitwise operators.

#Bit Manipulation #Math
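A power of two has exactly one set bit. Subtracting 1 flips that bit and sets all lower bits, so `n & (n - 1)` clears the lowest set bit and is zero only for powers of two. The `n > 0` check excludes zero and negative numbers.

```python
def is_power_of_two(n: int) -> bool:
    """True if n is a positive power of two (exactly one bit set)."""
    return n > 0 and (n & (n - 1)) == 0
```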

Software Engineer • Coding • hard

You have K sorted streams of telemetry data coming from different sensors. Write an algorithm to merge them into a single sorted stream in real-time.

#Heap #Priority Queue #Linked List
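The sketch below is a lazy k-way merge over iterators. A min-heap holds one pending item per stream, and each time the smallest item is emitted, the next item is pulled from the same stream, so each output costs O(log k). Items could be `(timestamp, sensor_id, reading)` tuples, which is an assumption about the telemetry format. Python's standard `heapq.merge` implements the same idea. In a real-time setting, the merge cannot safely emit anything until every live stream has a pending item, so a slow sensor stalls the output. Production systems usually add timeouts or watermarks for that case.

```python
import heapq

_DONE = object()  # sentinel marking an exhausted stream

def merge_sorted_streams(streams):
    """Lazily merge already-sorted iterables into one sorted iterator."""
    iterators = [iter(s) for s in streams]
    heap = []
    for idx, it in enumerate(iterators):
        first = next(it, _DONE)
        if first is not _DONE:
            heap.append((first, idx))  # stream index breaks ties between equal items
    heapq.heapify(heap)
    while heap:
        value, idx = heap[0]
        yield value
        nxt = next(iterators[idx], _DONE)
        if nxt is _DONE:
            heapq.heappop(heap)  # this stream is finished
        else:
            heapq.heapreplace(heap, (nxt, idx))
```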

Software Engineer • Coding • medium

Given an array of n + 1 integers where each integer is in the range [1, n] inclusive, find the single repeated number without modifying the array and using only O(1) extra space.

#Two Pointers #Array
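One well-known O(n) time, O(1) space answer is Floyd's cycle detection (the "two pointers" in the tags). Treat each value as a pointer to an index, i → nums[i]. Because every value is in [1, n] and there are n + 1 slots, following these pointers from index 0 must enter a cycle, and the cycle's entry point is the duplicated value. The array is only read, never modified.

```python
def find_duplicate(nums):
    """Find the repeated value in nums (n + 1 values in [1, n]) using Floyd's cycle detection."""
    # Phase 1: slow moves one step, fast moves two, until they meet inside the cycle.
    slow = fast = nums[0]
    while True:
        slow = nums[slow]
        fast = nums[nums[fast]]
        if slow == fast:
            break
    # Phase 2: restart one pointer; moving both one step at a time, they meet at the cycle entry.
    slow = nums[0]
    while slow != fast:
        slow = nums[slow]
        fast = nums[fast]
    return slow
```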

Software Engineer • Coding • medium

Write a function to multiply two dense matrices. Then, optimize it for CPU cache locality.

#Arrays #Math #Cache Optimization

Software Engineer • Coding • hard

Merge K sorted linked lists.

#Heaps #Linked Lists #Divide and Conquer

Software Engineer • Coding • medium

Given an array of integers, return the indices of the two numbers that add up to a specific target. How would you optimize this for a highly parallel architecture?

#Parallel Computing #Hash Maps #Arrays

Difficulty Radar (chart not reproduced; based on recent AI-sourced data)

Meet Your Interviewers

The "Standard" Interviewer

Senior Engineer

Focuses on core competencies, system constraints, and clear communication.

Unwritten Rules

Think Out Loud

Always explain your thought process before writing code or drawing architecture.