Google

Google

Leading technology company specializing in search, cloud, and AI.

4 Rounds ~21 Days Very Hard
Start Mock Interview

The Interview Loop

Recruiter Screen (30 min)

Standard fit check, behavioral questions, and resume overview.

Technical Loop (3-4 Rounds)

Deep dive into domain knowledge, coding, and system design.

Interview Question Bank

AI Engineer Behavioral hard

Tell me about an AI project where you had to balance innovation with reliability.

#Reliability #Innovation
AI Engineer Behavioral medium

How do you handle stakeholder uncertainty around AI capabilities and limitations?

#Stakeholders #Expectations
AI Engineer Behavioral medium

Tell me about a time you optimized an LLM application for cost or latency.

#Cost #Latency
AI Engineer Behavioral medium

Describe a time you had to choose between using an AI model and a simpler rule-based system.

#Tradeoffs #Pragmatism
AI Engineer Behavioral easy

How do you stay current with the fast-moving AI/ML research landscape?

#Research #Continuous Learning
AI Engineer Behavioral hard

Tell me about a time an AI system you built produced unexpected or harmful outputs.

#Responsibility #Ethics
AI Engineer Behavioral hard

Describe an AI product you built from scratch. What were the key technical decisions?

#Product Development
AI Engineer Behavioral hard

Describe a situation where you had to debug a hard-to-reproduce AI model failure.

#Problem Solving
AI Engineer Coding hard

Implement a semantic chunking strategy for long documents.

#Chunking #Embeddings
AI Engineer Coding medium

Write a Python class to manage conversation history for a multi-turn chatbot.

#Chatbot #Memory
AI Engineer Coding hard

Implement a simple RAG pipeline using Python, LangChain, and FAISS.

#RAG #Python
AI Engineer Coding medium

Write a retry mechanism with exponential backoff for LLM API calls.

#Reliability #APIs
AI Engineer System Design hard

How would you architect an AI platform that supports 1000 concurrent LLM requests?

#Scaling #LLM Serving
AI Engineer System Design hard

Design an AI-powered customer support chatbot for an e-commerce platform.

#Chatbot #LLM
AI Engineer System Design hard

Design a document question-answering system using RAG.

#RAG #Vector Search
AI Engineer System Design hard

Design an AI code review system that integrates with GitHub PRs.

#Code Review #LLM
AI Engineer System Design hard

How would you build a multi-modal AI system that processes both text and images?

#Multi-Modal #Vision
AI Engineer System Design hard

Design a real-time AI safety filter for user-generated content.

#Content Moderation #Real-Time
AI Engineer System Design hard

Design an AI agent system that can autonomously browse the web and complete tasks.

#Agents #Tool Use
AI Engineer Technical medium

How do you choose the right embedding model for a domain-specific search task?

#Embedding Models #Search
AI Engineer Technical hard

Explain positional encoding in transformers. What are the differences between absolute and rotary position embeddings?

#Positional Encoding #RoPE
AI Engineer Technical hard

What is hallucination in LLMs? How do you detect and mitigate it?

#Hallucination #Safety
AI Engineer Technical medium

Explain the difference between autoregressive and masked language modeling.

#Autoregressive #Masked LM
AI Engineer Technical hard

What is a mixture of experts (MoE) architecture? How does it scale?

#MoE #Scaling
AI Engineer Technical hard

What is Constitutional AI? How does Anthropic use it?

#Constitutional AI #Anthropic
AI Engineer Technical hard

How do you red-team an AI system?

#Red Teaming #Security
AI Engineer Technical medium

What are guardrails in LLM applications? How do they work?

#Guardrails #Output Filtering
AI Engineer Technical medium

How do you integrate OpenAI API or Gemini API into a production application?

#OpenAI #Gemini
AI Engineer Technical medium

What is LangChain? What are its key components (Chains, Agents, Tools)?

#LangChain #Agents
AI Engineer Technical medium

What is streaming response from an LLM API? How do you implement it in a web app?

#Streaming #API
AI Engineer Technical medium

Explain structured output generation from LLMs (JSON mode, Instructor library).

#Structured Output #JSON
AI Engineer Technical hard

Explain how vector similarity search works. What are HNSW and IVF indices?

#HNSW #Similarity Search
AI Engineer Technical medium

Compare vector databases: Pinecone, Weaviate, Qdrant, and pgvector.

#Vector DB #Embeddings
AI Engineer Technical medium

What is semantic search? How does it differ from keyword-based search?

#Semantic Search #NLP
AI Engineer Technical hard

Explain the difference between dense and sparse retrieval in RAG.

#Dense Retrieval #BM25
AI Engineer Technical hard

How do you evaluate retrieval quality in a RAG system?

#Evaluation #Retrieval
AI Engineer Technical hard

How do you evaluate the quality of an LLM-generated response?

#LLM Evaluation #RAGAS
AI Engineer Technical hard

What is AI alignment? What are the key safety concerns with large-scale AI deployment?

#Alignment #Safety
AI Engineer Technical hard

Explain the concept of AI bias. How do you detect and mitigate it in production?

#Bias #Fairness
AI Engineer Technical medium

How do you manage LLM API rate limits and costs in production?

#Rate Limiting #Cost
AI Engineer Technical hard

Explain function calling / tool use in LLMs. How do you implement it?

#Function Calling #Tool Use
AI Engineer Technical hard

Explain the difference between GPT, BERT, and T5 architectures.

#GPT #BERT #T5
AI Engineer Technical medium

What is prompt engineering? What are few-shot, zero-shot, and chain-of-thought prompting?

#Prompt Engineering #Few-Shot
AI Engineer Technical hard

Explain how RLHF (Reinforcement Learning from Human Feedback) improves LLMs.

#RLHF #Alignment
AI Engineer Technical hard

What is RAG (Retrieval-Augmented Generation)? When would you use it over fine-tuning?

#RAG #Fine-Tuning
AI Engineer Technical medium

Explain the difference between fine-tuning and in-context learning.

#Fine-Tuning #ICL
AI Engineer Technical medium

What is token context window? How do you handle documents longer than the context limit?

#Context Window #Chunking
Cloud Engineer Behavioral medium

Tell me about a time you significantly reduced cloud infrastructure costs.

#FinOps #Impact
Cloud Engineer Behavioral medium

Describe a situation where a critical production system went down, and there was no runbook. How did you handle it?

#Incident Management #Ambiguity #Ownership #SRE
Cloud Engineer Behavioral medium

Tell me about a time you had to work with a difficult stakeholder or team member who strongly disagreed with your technical approach. How did you resolve it?

#Conflict Resolution #Communication #Teamwork #Influence
Cloud Engineer Behavioral medium

Tell me about a time you had to push back on a customer's architectural choice because you knew it would lead to scalability issues down the line.

#Customer Empathy #Communication #Pushback #Consulting
Cloud Engineer Behavioral medium

Describe your experience with incident post-mortems. What do you include?

#Post-Mortem #Learning
Cloud Engineer Behavioral medium

How do you communicate a complex cloud architecture to non-technical stakeholders?

#Stakeholders
Cloud Engineer Behavioral medium

Tell me about a time you improved the reliability of a cloud-based data system.

#SRE #Impact
Cloud Engineer Behavioral medium

Describe a situation where you had to choose between two cloud architectures. How did you decide?

#Architecture #Tradeoffs
Cloud Engineer Behavioral hard

Tell me about a major cloud outage you experienced. How did you respond?

#Outage #On-Call
Cloud Engineer Behavioral hard

Describe a time you migrated a critical workload to the cloud with zero downtime.

#Cloud Migration
Cloud Engineer Behavioral easy

How do you stay updated with new cloud services and features?

#Continuous Learning
Cloud Engineer Coding medium

Write a function to validate if a given string is a valid IPv4 address, and then extend it to check if it belongs to a specific CIDR block.

#String Manipulation #Bitwise Operations #Networking
Cloud Engineer Coding medium

Given a list of log entries with timestamps and error codes, write a function to find the top 3 most frequent error codes within a sliding window of 5 minutes.

#Sliding Window #Hash Map #Queue #Data Structures
Cloud Engineer Coding medium

Write a Python script to find and delete all unattached persistent disks in a GCP project that are older than 30 days to save costs.

#Python #GCP API #Cost Optimization #Scripting
Cloud Engineer System Design hard

How would you set up a streaming data pipeline on GCP using Pub/Sub and Dataflow?

#GCP #Pub/Sub #Dataflow
Cloud Engineer System Design hard

Design a data lake on AWS using S3, Glue, and Athena.

#AWS #S3 #Athena
Cloud Engineer System Design hard

Design a real-time streaming data pipeline on GCP to ingest, process, and analyze millions of IoT sensor events per second.

#Pub/Sub #Dataflow #BigQuery #IoT
Cloud Engineer System Design hard

Design a highly available, globally distributed web application on GCP that handles sudden, massive spikes in traffic (e.g., a viral news site).

#Global Load Balancer #Cloud CDN #Cloud Run #Cloud Spanner
Cloud Engineer System Design hard

A customer wants to migrate a monolithic on-premise application backed by an Oracle database to GCP. Walk me through your migration strategy.

#Cloud Migration #Strangler Fig #Database Migration Service #Bare Metal Solution
Cloud Engineer System Design hard

How do you implement disaster recovery for a cloud data warehouse?

#DR #RTO #RPO
Cloud Engineer System Design hard

How would you architect a data platform that reduces spend by 40% without impacting performance?

#FinOps #Cloud
Cloud Engineer Technical medium

What is the shared responsibility model in cloud security?

#Cloud Security #IAM
Cloud Engineer Technical hard

What is a VPC (Virtual Private Cloud)? How do you design a secure VPC architecture?

#VPC #Security
Cloud Engineer Technical easy

Explain the difference between regions, availability zones, and edge locations.

#Regions #AZs
Cloud Engineer Technical medium

How does auto-scaling work? What are the different scaling strategies?

#Auto-Scaling #EC2
Cloud Engineer Technical medium

What is a cloud-native application? How does it differ from a lifted-and-shifted one?

#Cloud Native #Migration
Cloud Engineer Technical hard

Explain multi-cloud vs hybrid cloud architectures and their tradeoffs.

#Multi-Cloud #Hybrid
Cloud Engineer Technical hard

Explain Kubernetes architecture: control plane, nodes, pods, and services.

#K8s #Containers
Cloud Engineer Technical hard

What is a Kubernetes Operator and when would you build one?

#Operators #CRD
Cloud Engineer Technical hard

How does container networking work in Kubernetes?

#Networking #CNI
Cloud Engineer Technical hard

Explain how you would design a cross-project IAM strategy for a large enterprise using Shared VPCs and least privilege principles.

#IAM #Shared VPC #Security #Resource Hierarchy
Cloud Engineer Technical hard

What happens exactly when you type `ls -l` in a Linux terminal? Go as deep into the OS level as possible.

#Linux #Syscalls #File Systems #Process Management
Cloud Engineer Technical medium

A customer complains that their GKE pods cannot reach an external API. Walk me through your troubleshooting steps.

#GKE #Networking #VPC #Cloud NAT
Cloud Engineer Technical medium

How would you implement a zero-downtime deployment strategy for a microservice running on Cloud Run?

#Cloud Run #CI/CD #Traffic Splitting #SRE
Cloud Engineer Technical medium

Explain the difference between a Readiness probe and a Liveness probe in Kubernetes. What happens if you misconfigure them?

#Kubernetes #GKE #Reliability #Microservices
Cloud Engineer Technical easy

Compare and contrast Cloud Storage, Persistent Disk, and Filestore. Give specific use cases for when you would choose one over the others.

#Storage #GCS #Block Storage #File Storage
Cloud Engineer Technical medium

What is OpenTelemetry? How does it standardize observability?

#OpenTelemetry #Tracing
Cloud Engineer Technical medium

How would you set up CloudWatch dashboards for a data pipeline?

#CloudWatch #AWS
Cloud Engineer Technical medium

Explain the three pillars of observability: logs, metrics, and traces.

#Logs #Metrics #Traces
Cloud Engineer Technical easy

What is a runbook? How do you create effective runbooks for data infrastructure?

#Runbook #On-Call
Cloud Engineer Technical medium

How do you do capacity planning for a cloud data platform?

#Scaling #Planning
Cloud Engineer Technical hard

Explain chaos engineering. How would you implement it for a data pipeline?

#Chaos Engineering #Fault Injection
Cloud Engineer Technical medium

What are SLOs, SLAs, and SLIs? How do you define them for a data platform?

#SLO #Reliability
Cloud Engineer Technical hard

How would you implement network segmentation for a multi-tier application?

#Security #Subnets
Cloud Engineer Technical medium

What is AWS PrivateLink? When would you use it?

#PrivateLink #VPC
Cloud Engineer Technical medium

How do cloud IAM roles and policies work? Explain least-privilege principle.

#IAM #Permissions
Cloud Engineer Technical medium

Explain TLS/SSL termination in a cloud load balancer.

#TLS #Load Balancer
Cloud Engineer Technical hard

What is zero-trust networking? How do you implement it on cloud?

#Zero Trust #Networking
Cloud Engineer Technical medium

How does AWS Glue Data Catalog work with Athena?

#Glue #Athena
Cloud Engineer Technical medium

Explain AWS S3 storage classes and lifecycle policies.

#S3 #Cost
Cloud Engineer Technical hard

What is BigQuery Slots? How do you optimize BigQuery query costs?

#GCP #Cost
Cloud Engineer Technical medium

Explain the difference between AWS Lambda and EC2 for data processing.

#Lambda #Serverless
Cloud Engineer Technical hard

Compare AWS EMR, GCP Dataproc, and Azure HDInsight for Spark workloads.

#EMR #Dataproc #Spark
Cloud Engineer Technical hard

How do you handle Terraform state across multiple teams?

#State Management #Collaboration
Cloud Engineer Technical medium

Explain idempotency in infrastructure provisioning.

#Idempotency #Terraform
Cloud Engineer Technical medium

How do you manage secrets in cloud infrastructure? (HashiCorp Vault, AWS Secrets Manager)

#Secrets Management #Vault
Cloud Engineer Technical medium

What is the difference between Terraform and Pulumi?

#Terraform #Pulumi
Cloud Engineer Technical hard

Explain Terraform's state management. What happens if the state file is corrupted?

#IaC #State
Cloud Engineer Technical medium

How does a Kubernetes Ingress controller work?

#Ingress #Load Balancing
Cloud Engineer Technical medium

Explain the difference between Docker and containerd.

#Docker #containerd
Cloud Engineer Technical hard

How would you set up horizontal pod autoscaling based on custom metrics?

#HPA #Custom Metrics
Cloud Engineer Technical hard

What is a service mesh? Explain how Istio works.

#Istio #Service Mesh
Cloud Engineer Technical medium

Explain Kubernetes resource requests vs limits. What happens if a pod exceeds its memory limit?

#Resources #OOM
Cloud Engineer Technical hard

Compare AWS, GCP, and Azure for a data-intensive workload. What are the key differentiators?

#AWS #GCP #Azure
Cloud Engineer Technical easy

Explain IaaS, PaaS, and SaaS with examples.

#IaaS #PaaS #SaaS
Data Analyst Behavioral medium

Tell me about an analysis that changed a major business decision.

#Business Impact #Influence
Data Analyst Behavioral medium

How do you handle a situation where a stakeholder challenges your analysis?

#Stakeholders #Confidence
Data Analyst Behavioral medium

Describe a time you found an insight that was counterintuitive.

#Curiosity
Data Analyst Behavioral hard

Tell me about a time you had incomplete data but still needed to deliver analysis.

#Ambiguity
Data Analyst Behavioral easy

How do you ensure your analyses are reproducible?

#Reproducibility
Data Analyst Behavioral medium

Tell me about a time you discovered data quality issues mid-analysis. What did you do?

#Problem Solving
Data Analyst Behavioral medium

How do you prioritize analytical requests when multiple teams need you?

#Time Management
Data Analyst Behavioral medium

Describe a dashboard you built that was widely adopted. What made it successful?

#Visualization
Data Analyst Coding hard

Write a SQL query to find customers who made purchases in both January and February but not March.

#Set Operations
Data Analyst Coding easy

Explain how groupby and agg work in pandas with an example.

#Pandas #GroupBy
Data Analyst Coding hard

What is a funnel query? Write one for a 3-step user onboarding flow.

#Funnel Analysis
Data Analyst Coding medium

Explain window functions. Write a query using LAG() to compute day-over-day change.

#Window Functions
Data Analyst Coding hard

Write a SQL query to calculate the rolling 28-day average session duration per user.

#Rolling Average #Sessions
Data Analyst Coding hard

How would you detect anomalies in a daily revenue time series using SQL?

#Anomaly Detection #SQL
Data Analyst Coding medium

What is a pivot table in SQL? How would you implement it without native PIVOT support?

#Pivot #Data Transformation
Data Analyst Coding medium

How would you merge two large DataFrames efficiently in pandas?

#Pandas #Merging
Data Analyst Coding medium

Describe how to detect and handle outliers in a dataset using Python.

#Outliers #Data Cleaning
Data Analyst Coding easy

Write Python code to load a CSV, clean missing values, and compute summary statistics.

#Data Cleaning #Pandas
Data Analyst Coding medium

Write a SQL query to calculate month-over-month revenue growth.

#Revenue #Growth Analytics
Data Analyst Coding hard

How would you build a cohort analysis for user retention in SQL?

#Cohort Analysis #Retention
Data Analyst Coding medium

How would you use pandas to compute a 7-day rolling average of sessions?

#Pandas #Time Series
Data Analyst Technical medium

Describe your process for creating an executive-level analytics presentation.

#Executive Reporting
Data Analyst Technical easy

How do you choose between a bar chart, line chart, and scatter plot?

#Charts #Design
Data Analyst Technical easy

Explain the difference between a HAVING clause and a WHERE clause.

#SQL Basics
Data Analyst Technical medium

How do you handle timezone conversions in SQL analytics?

#Timezones #Analytics
Data Analyst Technical hard

Daily Active Users dropped 15% yesterday. Walk me through how you'd investigate.

#Root Cause Analysis #Metrics
Data Analyst Technical medium

What is customer lifetime value (LTV)? How would you calculate it?

#LTV #Retention
Data Analyst Technical easy

Explain the difference between DAU, WAU, and MAU. Which is most useful and when?

#Engagement #KPIs
Data Analyst Technical medium

How would you measure the success of a new feature launch?

#Feature Success #Metrics
Data Analyst Technical easy

What is ARPU (Average Revenue Per User)? How do you segment ARPU analysis?

#ARPU #Revenue
Data Analyst Technical hard

Explain the concept of attribution modeling. What are last-click vs multi-touch models?

#Marketing Analytics
Data Analyst Technical medium

How would you build a dashboard to monitor e-commerce funnel health?

#Visualization #Funnel
Data Analyst Technical hard

What metrics would you use to measure the health of a marketplace?

#Marketplace #Supply & Demand
Data Analyst Technical easy

What is net promoter score (NPS)? How do you analyse NPS trends?

#NPS #Customer Satisfaction
Data Analyst Technical hard

How would you measure the impact of a pricing change on revenue?

#Pricing #A/B Test
Data Analyst Technical hard

Explain how you'd set up an A/B test to validate a new checkout flow.

#A/B Testing #Statistics
Data Analyst Technical hard

What sample size do you need for an A/B test? How do you calculate it?

#Sample Size #Power
Data Analyst Technical hard

A/B test shows p=0.04, but the effect size is tiny. Would you ship?

#Practical Significance #Decision Making
Data Analyst Technical medium

What is a novelty effect in experimentation? How do you account for it?

#Novelty Effect #Bias
Data Analyst Technical hard

How do you handle multiple metrics in an A/B test (metric tradeoffs)?

#Multiple Metrics #Tradeoffs
Data Analyst Technical medium

What makes a good data visualization? Walk me through your design principles.

#Design #Communication
Data Analyst Technical medium

How would you explain statistical significance to a non-technical product manager?

#Storytelling #Statistics
Data Analyst Technical easy

What tools do you use for dashboarding? Compare Tableau, Looker, and Metabase.

#Tableau #Looker
Data Engineer Behavioral medium

Describe a situation where you disagreed with a senior engineer or product manager on a technical design choice. How did you resolve it?

#Conflict Resolution #Collaboration #Googleyness
Data Engineer Behavioral medium

Tell me about a time you simplified a complex data platform decision across multiple teams.

#Communication #Stakeholders
Data Engineer Behavioral medium

Describe a situation where a data pipeline you owned went down in production. How did you handle it?

#On-Call #Problem Solving
Data Engineer Behavioral medium

How do you handle disagreements with data analysts or scientists who want features that compromise pipeline reliability?

#Conflict Resolution
Data Engineer Behavioral medium

Tell me about a time you significantly improved the performance of a data system.

#Performance #Optimization
Data Engineer Behavioral hard

Describe how you've balanced technical debt vs. new feature development in a data platform.

#Prioritization
Data Engineer Behavioral medium

Tell me about a time you onboarded a new data source that had significant quality issues.

#Problem Solving
Data Engineer Behavioral easy

Describe your experience mentoring junior data engineers.

#Mentoring #Collaboration
Data Engineer Behavioral easy

How do you stay current with rapidly evolving data engineering tools and practices?

#Growth Mindset
Data Engineer Behavioral medium

Tell me about a time you had to design a data pipeline with highly ambiguous requirements. How did you figure out what to build?

#Ambiguity #Googleyness #Communication
Data Engineer Behavioral medium

Tell me about a time a data pipeline you owned failed in production. What was the business impact, and what steps did you take to fix it and prevent it from happening again?

#Incident Management #Ownership #Post-mortem
Data Engineer Coding medium

Given a list of user session time intervals on YouTube represented as [start_time, end_time], write a Python function to merge all overlapping sessions and return the consolidated active viewing periods.

#Arrays #Sorting #Intervals
Data Engineer Coding hard

Implement a rate limiter in Python for an API using a sliding window approach. The rate limiter should allow a maximum of N requests per minute per user.

#Queues #Sliding Window #Concurrency
Data Engineer Coding medium

Write a SQL query to calculate the rolling 7-day average of daily video views per category on YouTube, ensuring days with zero views are still accounted for in the average.

#Window Functions #Aggregation #Rolling Averages
Data Engineer Coding medium

Write a Python function to parse a massive (100GB+) log file of Google Search queries and return the top K most frequent IP addresses. You have limited RAM.

#Heaps #Hash Maps #Log Parsing #Big O
Data Engineer Coding medium

Write a SQL query to find the top 3 highest-grossing apps in each region, but only include regions that have at least 100 active apps.

#CTEs #Window Functions #Filtering
Data Engineer Coding medium

Write a SQL query to find the second highest salary per department.

#Window Functions #SQL
Data Engineer Coding medium

Write a SQL query to compute a 7-day rolling average of daily sales.

#Window Functions #Analytics
Data Engineer Coding hard

Write a SQL query to find the longest streak of consecutive days a user has logged into Google Workspace. The input table has user_id and login_date.

#Window Functions #Gaps and Islands #CTEs
Data Engineer System Design hard

Design the data warehouse architecture for Google Play Store analytics. Stakeholders need daily reports on app downloads, revenue, and crash rates by region and device type.

#Data Warehousing #BigQuery #Schema Design #ETL
Data Engineer System Design hard

Design a data pipeline for Google Search query logs at 100K QPS.

#Scale #Google
Data Engineer System Design hard

Design a data model for an e-commerce platform tracking orders, users, and products.

#ER Modeling #Dimensional Modeling
Data Engineer System Design hard

Design an ETL pipeline that ingests 10TB of raw clickstream data daily.

#ETL #Batch Processing
Data Engineer System Design hard

How would you design a data pipeline that needs exactly-once delivery guarantees?

#Exactly-Once #Kafka
Data Engineer System Design hard

How would you design a real-time anomaly detection pipeline for 100K events/sec?

#Real-Time #Anomaly Detection
Data Engineer System Design hard

Design a real-time streaming data pipeline to detect click fraud in Google Ads. How would you ingest, process, and store the data to flag fraudulent clicks within seconds?

#Streaming #Pub/Sub #Dataflow #Fraud Detection
Data Engineer System Design hard

How would you design a data warehouse for a ride-sharing company from scratch?

#Architecture #Design
Data Engineer System Design hard

Design a batch processing pipeline to update Google Maps ETA prediction models based on daily historical traffic data. The data volume is petabytes per day.

#Batch Processing #MapReduce #DAGs #Orchestration
Data Engineer Technical medium

Explain the difference between S3, HDFS, and GCS for data storage.

#S3 #HDFS #GCS
Data Engineer Technical medium

Explain the concept of a data lakehouse. What are its advantages over a traditional data warehouse?

#Data Lakehouse #Data Warehouse
Data Engineer Technical hard

How do you handle schema evolution in a data pipeline without breaking downstream consumers?

#Schema Evolution #Backward Compatibility
Data Engineer Technical medium

What is a medallion architecture (Bronze/Silver/Gold)?

#Medallion #Data Lake
Data Engineer Technical medium

How do you implement data quality checks in a production pipeline?

#Great Expectations #Data Validation
Data Engineer Technical medium

What is data lineage and why is it important? How do you implement it?

#Lineage #Metadata
Data Engineer Technical hard

How would you detect and handle data drift in a production system?

#Data Drift #Monitoring
Data Engineer Technical medium

What is PII (Personally Identifiable Information) and how do you handle it in a data pipeline?

#PII #Privacy #Compliance
Data Engineer Technical medium

Explain the concept of a data catalog. What tools have you used?

#Data Catalog #Metadata
Data Engineer Technical hard

Compare AWS Redshift, Google BigQuery, and Snowflake for a petabyte-scale warehouse.

#Redshift #BigQuery #Snowflake
Data Engineer Technical hard

How does BigQuery handle large joins efficiently? What is its columnar storage approach?

#BigQuery #Columnar Storage
Data Engineer Technical medium

How would you reduce costs in a cloud-based data platform?

#Cloud #Cost
Data Engineer Technical medium

What is infrastructure as code (IaC)? Have you used Terraform for data infrastructure?

#Terraform #IaC
Data Engineer Technical hard

What is Data Vault methodology? How does it differ from Kimball?

#Data Vault #Kimball
Data Engineer Technical hard

You have a PySpark job running on Dataproc that joins a massive user table with a smaller transaction table. The job is taking hours and failing with OOM errors due to data skew. How do you optimize it?

#Spark #Data Skew #Salting #Performance Tuning
Data Engineer Technical medium

Explain the difference between partitioning and clustering in BigQuery. When would you use one over the other, and when would you use both?

#BigQuery #Partitioning #Clustering #Optimization
Data Engineer Technical medium

In Apache Beam or Google Cloud Dataflow, how do you handle late-arriving data in a windowed streaming pipeline?

#Streaming #Watermarks #Late Data #Apache Beam
Data Engineer Technical medium

What is the star schema vs snowflake schema? When would you use each?

#Star Schema #Snowflake Schema
Data Engineer Technical medium

Explain compaction in Delta Lake / Iceberg. Why is it important?

#Compaction #Performance
Data Engineer Technical hard

What is Delta Lake? How does it provide ACID transactions on data lakes?

#Delta Lake #ACID #Time Travel
Data Engineer Technical medium

Explain how Parquet and ORC file formats work and when you'd use each.

#Parquet #ORC #Columnar
Data Engineer Technical medium

What is the CAP theorem? Give an example of a real-world system tradeoff.

#CAP #Consistency #Availability
Data Engineer Technical medium

How does Kafka handle message ordering guarantees?

#Ordering #Partitions
Data Engineer Technical medium

What is Apache Kafka? Explain topics, partitions, consumer groups, and offsets.

#Kafka #Streaming
Data Engineer Technical hard

Explain the difference between map-side and reduce-side joins in MapReduce/Spark.

#Joins #MapReduce
Data Engineer Technical hard

What is data skew in Spark? How do you diagnose and fix it?

#Data Skew #Performance
Data Engineer Technical hard

Explain how Apache Spark's execution model works. What is a DAG in Spark?

#Spark #DAG #Distributed Computing
Data Engineer Technical easy

Explain the difference between push-based and pull-based data ingestion.

#Push #Pull #CDC
Data Engineer Technical medium

What is Apache Airflow? How does it differ from Prefect or Dagster?

#Airflow #Prefect #Dagster
Data Engineer Technical medium

How do you monitor data pipeline health in production? What metrics do you track?

#Monitoring #Alerting
Data Engineer Technical medium

Describe how you'd implement circuit breakers in a data pipeline.

#Circuit Breakers #Fault Tolerance
Data Engineer Technical hard

What is backfilling? How do you handle a backfill of 2 years of historical data without impacting production?

#Backfill #Airflow
Data Engineer Technical hard

Explain the Lambda architecture. What are its tradeoffs vs Kappa architecture?

#Lambda #Kappa #Streaming
Data Engineer Technical medium

What is idempotency and why is it critical in data pipelines?

#Idempotency #Data Quality
Data Engineer Technical hard

How do you handle late-arriving data in a streaming pipeline?

#Kafka #Watermarks
Data Engineer Technical medium

Explain ACID properties. Which databases sacrifice ACID for performance and why?

#ACID #Distributed Systems
Data Engineer Technical medium

What are CTEs (Common Table Expressions) and how do they differ from subqueries?

#CTEs #SQL
Data Engineer Technical hard

Describe partitioning strategies in a data warehouse. When would you use range vs hash partitioning?

#Partitioning #Performance
Data Engineer Technical medium

What is a materialized view? How does it differ from a regular view?

#Materialized Views #Performance
Data Engineer Technical medium

Explain the difference between RANK(), DENSE_RANK(), and ROW_NUMBER().

#Window Functions #SQL
Data Engineer Technical hard

How would you optimize a SQL query that is running slowly on a 1 billion row table?

#Query Optimization #Indexing
Data Engineer Technical hard

What is a slowly changing dimension (SCD)? Describe SCD Type 1, 2, and 3 with examples.

#SCD #Dimensional Modeling
Data Engineer Technical medium

Explain the difference between OLAP and OLTP systems. When would you use each?

#OLAP #OLTP #Databases
Data Engineer Technical hard

How would you model and optimize a BigQuery dataset for petabyte-scale ad-click attribution?

#BigQuery #Attribution
Data Engineer Technical hard

Explain how Google Spanner achieves global consistency with TrueTime.

#Spanner #TrueTime
Data Engineer Technical hard

How would you use Dataflow (Apache Beam) for a streaming aggregation job?

#Dataflow #Beam
Data Engineer Technical hard

What is Bigtable? When would you choose it over BigQuery?

#Bigtable #GCP
Data Engineer Technical medium

How do you optimize BigQuery costs for ad-hoc analytical queries?

#Cost #Optimization
Data Scientist Behavioral medium

Describe a project where you had to iterate significantly on your initial approach.

#Iteration #Learning
Data Scientist Behavioral medium

How do you prioritize between multiple data science requests from different teams?

#Stakeholder Management
Data Scientist Behavioral medium

How do you approach ethical considerations in ML model building?

#Fairness #Bias
Data Scientist Behavioral hard

Tell me about a time your model failed in production. What did you learn?

#Production #MLOps
Data Scientist Behavioral hard

Describe a time you used data to challenge a widely held assumption in your organization.

#Influence #Analytics
Data Scientist Behavioral medium

Tell me about a data science project where the results surprised you. What did you do?

#Analytical Thinking
Data Scientist Behavioral medium

Describe how you communicated a complex model result to a non-technical stakeholder.

#Storytelling
Data Scientist Behavioral hard

Tell me about a time you had to push back on a business request for an analysis that would be misleading.

#Ethics #Communication
Data Scientist Coding hard

Write a SQL query to calculate 30-day user retention.

#Retention #Analytics
Data Scientist Coding hard

How would you write a funnel analysis query in SQL?

#Funnel #Analytics
Data Scientist Coding medium

Write a query to identify duplicate records and deduplicate them.

#Deduplication #Data Quality
Data Scientist System Design hard

Design a feature store. What are its key components?

#Feature Store #MLOps
Data Scientist System Design hard

How would you build and deploy a churn prediction model?

#Churn #MLOps
Data Scientist System Design hard

Design a real-time fraud detection system for a payments platform.

#Fraud Detection #Real-Time ML
Data Scientist System Design hard

How would you build a recommendation system? Compare collaborative vs content-based filtering.

#Collaborative Filtering #Content-Based
Data Scientist Technical medium

How do you choose a north star metric for a product?

#Metrics #Product Strategy
Data Scientist Technical medium

How do you handle class imbalance in a classification problem?

#Imbalanced Data #SMOTE
Data Scientist Technical medium

What is regularization? Explain L1 vs L2 regularization and their effects.

#Regularization #L1 #L2
Data Scientist Technical medium

How does a Random Forest work? What are its hyperparameters and how do you tune them?

#Random Forest #Hyperparameter Tuning
Data Scientist Technical hard

Explain gradient boosting. How does XGBoost differ from a standard gradient boosting machine?

#Gradient Boosting #XGBoost
Data Scientist Technical medium

How would you detect and handle multicollinearity in a regression model?

#Multicollinearity #Regression
Data Scientist Technical hard

Explain the curse of dimensionality and its implications for ML models.

#Dimensionality #Feature Engineering
Data Scientist Technical medium

What is a confidence interval? How does it differ from a prediction interval?

#Confidence Interval #Intervals
Data Scientist Technical hard

Explain Bayesian vs Frequentist statistics. When would you use each?

#Bayesian #Frequentist
Data Scientist Technical hard

How do you evaluate the quality of a search ranking change at Google's scale?

#Search Ranking #Evaluation
Data Scientist Technical hard

How do you design an A/B test for a new product feature?

#A/B Testing #Statistics
Data Scientist Technical easy

What is the difference between Type I and Type II errors?

#Hypothesis Testing #Errors
Data Scientist Technical medium

Explain the central limit theorem and its importance in data science.

#CLT #Sampling
Data Scientist Technical medium

What is a p-value? Why is a p-value of 0.05 not always sufficient?

#Hypothesis Testing #p-value
Data Scientist Technical hard

Explain how Google's NDCG metric works for search relevance.

#NDCG #Relevance
Data Scientist Technical medium

Explain the bias-variance tradeoff. How does it influence model selection?

#Bias-Variance #Model Selection
Data Scientist Technical hard

What statistical techniques would you use to analyse Search CTR experiments?

#CTR #Statistics
Data Scientist Technical hard

What is the multiple testing problem? How do you correct for it?

#Bonferroni #FDR
Data Scientist Technical medium

Explain the ROC curve and AUC metric. When would you prefer AUC over accuracy?

#ROC #AUC #Metrics
Data Scientist Technical hard

What is a network effect in experimentation? How do you handle SUTVA violation?

#SUTVA #Network Effects
Data Scientist Technical hard

How would you design an experiment to measure the impact of a new ranking algorithm?

#Experimentation #Metrics
Data Scientist Technical medium

How would you detect and mitigate overfitting in a neural network?

#Overfitting #Dropout #Regularization
Data Scientist Technical medium

Explain batch normalization and why it helps training.

#Batch Normalization #Training
Data Scientist Technical medium

What is embedding? How do word embeddings like Word2Vec and GloVe work?

#Embeddings #Word2Vec
Data Scientist Technical medium

How would you approach an NLP problem like sentiment analysis from scratch?

#Sentiment Analysis #Text Classification
Data Scientist Technical medium

What is transfer learning? How would you fine-tune a pre-trained model?

#Transfer Learning #Fine-Tuning
Data Scientist Technical hard

Explain the transformer architecture. What are attention mechanisms?

#Transformers #Attention #BERT
Data Scientist Technical hard

What is the vanishing gradient problem? How do LSTM and ResNet address it?

#LSTM #ResNet #Gradients
Data Scientist Technical medium

Explain how backpropagation works.

#Backpropagation #Neural Networks
Data Scientist Technical medium

What is principal component analysis (PCA)? What are its limitations?

#PCA #SVD
Data Scientist Technical medium

Explain the difference between bagging and boosting.

#Bagging #Boosting
Data Scientist Technical medium

How do you approach feature selection?

#Feature Selection #LASSO
Data Scientist Technical medium

What is cross-validation? Explain k-fold and stratified k-fold.

#Cross Validation #k-Fold
Data Scientist Technical hard

How do you monitor model performance in production? What is model drift?

#Model Drift #Monitoring
Data Scientist Technical easy

Explain the difference between INNER JOIN, LEFT JOIN, and CROSS JOIN.

#Joins #SQL
Data Scientist Technical easy

What is an experiment holdout group?

#Holdout #Control Group
Data Scientist Technical hard

How would you identify the root cause of a sudden 20% drop in DAU?

#Root Cause Analysis #Debugging
Data Scientist Technical easy

Explain the difference between a leading indicator and a lagging indicator.

#Metrics #KPIs
Machine Learning Engineer Behavioral medium

Describe a situation where you had to push back on a Product Manager who wanted to launch an ML feature that achieved high accuracy but failed to meet the required P99 latency SLA.

#Conflict Resolution #Prioritization #Cross-functional Collaboration
Machine Learning Engineer Behavioral medium

Tell me about a time you discovered a significant bias or data leakage in your ML model right before deployment. How did you handle it, and how did you communicate the delay to stakeholders?

#Googleyness #Communication #Model Debugging #Ethics
Machine Learning Engineer Coding medium

Given a 2D grid representing a cluster of TPU v5e pods where '1' is an active pod and '0' is inactive, write an algorithm to find the maximum area of connected active pods. Pods are connected horizontally or vertically.

#Graph Theory #Depth-First Search #Breadth-First Search #Matrix
Machine Learning Engineer Coding medium

Implement a custom sparse matrix-vector multiplication (SpMV) algorithm. Assume the sparse matrix is provided in Compressed Sparse Row (CSR) format.

#Linear Algebra #Data Structures #Performance Optimization
Machine Learning Engineer Coding hard

Write a function to sample a random node from a massive, distributed graph where you only have access to an API `get_neighbors(node_id)`. You do not know the total number of nodes.

#Randomized Algorithms #Graph Theory #Reservoir Sampling #Markov Chains
Machine Learning Engineer Coding hard

Given an array of integers representing the execution times of ML training jobs and an integer K representing the number of available GPUs, partition the jobs to minimize the maximum execution time on any single GPU. Jobs must be scheduled in contiguous subarrays.

#Binary Search #Greedy Algorithms #Dynamic Programming
Machine Learning Engineer System Design hard

Design an autocomplete/typeahead system for Google Docs using a neural language model. The system must run within strict latency constraints (<50ms). How do you optimize the model and serving infrastructure?

#Low Latency Serving #Model Quantization #Sequence-to-Sequence #Edge ML
Machine Learning Engineer System Design hard

Design the recommendation system for YouTube Shorts. Specifically, how would you handle the cold-start problem for new creators and optimize for real-time engagement metrics like watch time and swipe-aways?

#Recommendation Systems #Two-Tower Models #Cold Start #Real-time Streaming
Machine Learning Engineer System Design hard

Design a system to predict Ad Click-Through Rate (CTR) for Google Search. How do you handle categorical features with massive cardinality, and how do you update the model with fresh data throughout the day?

#CTR Prediction #Feature Engineering #Continuous Training #Embeddings
Machine Learning Engineer System Design medium

Design a system to detect policy-violating images (e.g., hate speech, extreme violence) uploaded to Google Drive. The system must process millions of images per minute with extreme precision to avoid false positives on user data.

#Computer Vision #High Throughput #Active Learning #Anomaly Detection
Machine Learning Engineer Technical medium

Explain the mathematical difference between Layer Normalization and Batch Normalization. Why is Layer Normalization almost exclusively used in Transformer architectures instead of Batch Normalization?

#Normalization #Transformers #Mathematics
Machine Learning Engineer Technical hard

Explain how you would implement KV-caching in a Transformer model during autoregressive inference. What are the memory bottlenecks, and how do techniques like PagedAttention address them?

#Transformers #LLM Inference #Memory Optimization #Attention Mechanisms
Machine Learning Engineer Technical medium

How would you evaluate the quality of a Retrieval-Augmented Generation (RAG) system built for Google Cloud enterprise search? What specific metrics would you use for the retrieval component vs. the generation component?

#RAG #LLM Evaluation #Information Retrieval #Metrics
Machine Learning Engineer Technical medium

You are training a multimodal model (text and image) using a contrastive loss similar to CLIP. You notice the text loss converges much faster than the image loss, leading to poor alignment. How do you diagnose and fix this?

#Multimodal ML #Loss Optimization #Contrastive Learning #Debugging
Machine Learning Engineer Technical hard

How does Distributed Data Parallel (DDP) differ from Fully Sharded Data Parallel (FSDP) or ZeRO optimization when training large language models? When would you choose one over the other?

#Distributed Training #Model Parallelism #Data Parallelism #Memory Management
ML Engineer Behavioral medium

Tell me about a disagreement you had with a researcher. How did you resolve it?

#Communication
ML Engineer Behavioral easy

How do you keep up with the rapidly evolving ML landscape?

#Continuous Learning
ML Engineer Behavioral hard

Tell me about a time an ML model caused an unexpected real-world impact.

#Responsibility #AI Safety
ML Engineer Behavioral medium

Describe how you collaborated with data scientists to productionize their research code.

#Research to Production
ML Engineer Behavioral hard

Tell me about a time you had to optimize a model for latency without sacrificing too much accuracy.

#Latency #Accuracy
ML Engineer Behavioral medium

Describe a model you deployed to production. What were the biggest challenges?

#Deployment #Challenges
ML Engineer Behavioral medium

How do you decide when a model is 'good enough' to ship?

#Quality #Judgment
ML Engineer Behavioral hard

Describe a time you had to re-architecture a system because the original ML approach didn't scale.

#Scalability
ML Engineer Coding hard

Write a custom PyTorch Dataset and DataLoader for irregular time series data.

#PyTorch #DataLoader
ML Engineer Coding hard

Implement logistic regression with gradient descent in NumPy.

#Logistic Regression #NumPy
ML Engineer Coding hard

Implement a K-means clustering algorithm from scratch in Python.

#K-Means #Clustering
ML Engineer Coding hard

How would you write a batched inference pipeline using Python and Triton server?

#Triton #Batching
ML Engineer Coding medium

Implement a sliding window approach to detect anomalies in a time series.

#Anomaly Detection #Time Series
ML Engineer System Design hard

Design a system to retrain models automatically when performance degrades.

#Retraining #Automation
ML Engineer System Design hard

Design YouTube's video recommendation system end to end.

#Recommendations #Ranking
ML Engineer System Design hard

How would you build a personalized ad targeting system?

#Targeting #ML Systems
ML Engineer System Design hard

Design a training and serving architecture for a large language model at scale.

#Infrastructure #Scale
ML Engineer System Design hard

Design a search ranking system for an e-commerce platform.

#Ranking #Relevance
ML Engineer System Design hard

What is a feature store? Design one from scratch.

#Feature Engineering #MLOps
ML Engineer System Design hard

Design a CI/CD pipeline for ML models.

#CI/CD #Deployment
ML Engineer System Design hard

Design a real-time content moderation system.

#NLP #Real-Time
ML Engineer System Design hard

How would you serve a model that needs to respond in under 10ms?

#Low Latency #Serving
ML Engineer Technical hard

How do you detect data drift vs model drift? How do you respond to each?

#Drift #Production
ML Engineer Technical medium

How would you deploy a model with Vertex AI Predictions?

#Vertex AI #GCP
ML Engineer Technical medium

What is model ensembling? When does it help, and when does it hurt?

#Ensembling #Performance
ML Engineer Technical medium

Explain vector databases. What are FAISS, Pinecone, and Weaviate?

#Vector DB #Embeddings
ML Engineer Technical hard

How would you evaluate an LLM for a production use case?

#Evaluation #Benchmarking
ML Engineer Technical medium

What is Vertex AI? How does it compare to SageMaker?

#Vertex AI #SageMaker
ML Engineer Technical hard

What is quantization in neural networks? How does it reduce inference cost?

#Quantization #Inference
ML Engineer Technical hard

Explain knowledge distillation. When would you use it?

#Distillation #Compression
ML Engineer Technical hard

What is the difference between model parallelism and data parallelism in distributed training?

#Parallelism #Training
ML Engineer Technical medium

How do you version ML models and datasets? What tools do you use?

#Versioning #DVC #MLflow
ML Engineer Technical hard

Explain blue-green deployment vs canary deployment for ML models.

#Blue-Green #Canary
ML Engineer Technical medium

What is shadow mode deployment in ML?

#Shadow Mode #A/B Testing
ML Engineer Technical hard

What is RAG (Retrieval-Augmented Generation)? Describe its architecture.

#RAG #Vector Search
ML Engineer Technical hard

What is LoRA (Low-Rank Adaptation)? How does it reduce fine-tuning costs?

#LoRA #Fine-Tuning
ML Engineer Technical hard

Explain the RLHF (Reinforcement Learning from Human Feedback) training approach.

#RLHF #Fine-Tuning
ML Engineer Technical medium

How do you profile and debug a slow training run?

#Profiling #Debugging
ML Engineer Technical medium

What are the differences between PyTorch and TensorFlow for production?

#PyTorch #TensorFlow
ML Engineer Technical hard

Explain mixed precision training (FP16/BF16). What are the risks?

#Mixed Precision #Performance
ML Engineer Technical hard

How do you optimize GPU utilization during training?

#GPU #Performance
ML Engineer Technical medium

What is Kubernetes? How is it used for ML model serving?

#Kubernetes #Serving
ML Engineer Technical medium

Explain model serialization formats: ONNX, TorchScript, SavedModel.

#ONNX #Serialization
ML Engineer Technical hard

Explain how Google's Two-Tower model works for recommendations.

#Two-Tower #Embeddings
ML Engineer Technical hard

How does Google's TensorFlow Extended (TFX) pipeline work?

#TFX #Pipelines
ML Engineer Technical easy

What is the difference between a data scientist and an ML engineer?

#Roles #MLOps
ML Engineer Technical medium

Explain the model training pipeline from raw data to deployment.

#Pipeline #Training
ML Engineer Technical medium

What is the difference between online learning and offline learning?

#Online Learning #Batch Learning
ML Engineer Technical medium

How do you handle missing data in ML model features?

#Imputation #Missing Data
ML Engineer Technical medium

Explain gradient descent variants: batch, stochastic, and mini-batch.

#Gradient Descent #Optimization
ML Engineer Technical medium

What are learning rate schedulers and why are they important?

#Learning Rate #Training
ML Engineer Technical hard

Explain the attention mechanism in transformers with mathematical detail.

#Attention #Transformers
Product Manager Behavioral medium

Tell me about a time you had to align conflicting stakeholders across engineering and design on a tight deadline.

#Stakeholder Management #Conflict Resolution #Cross-functional Collaboration
Product Manager Behavioral hard

Describe a situation where you had to convince an engineering team to build a feature they strongly disagreed with.

#Influence without Authority #Engineering Collaboration #Communication
Product Manager Behavioral medium

Tell me about a product or feature you launched that failed. What metrics indicated it failed, and what did you learn?

#Post-mortem #Resilience #Data-driven
Product Manager Coding easy

Write a SQL query to find the top 3 most watched YouTube video categories in the last 30 days, given a 'views' table and a 'videos' table.

#SQL #Data Analysis #YouTube
Product Manager System Design hard

Explain how Google Search autocomplete works at a high level and how you would scale it for a newly supported language.

#Google Search #Latency #Scalability #Data Structures
Product Manager System Design hard

Design the backend architecture for a real-time collaborative editing feature in Google Docs.

#Google Docs #Concurrency #Distributed Systems
Product Manager Technical medium

You are the PM for Google Chrome. You have a proposed feature that increases page load speed by 5% but increases memory usage by 15%. Do you launch it?

#Google Chrome #Trade-offs #Performance Metrics
Product Manager Technical medium

Design a product for Google Maps that helps users find parking using AI.

#Google Maps #Artificial Intelligence #User Experience
Product Manager Technical hard

Microsoft is aggressively integrating ChatGPT into Bing. What should be Google Search's strategic response over the next 2 years?

#Google Search #Competitive Analysis #Generative AI #Gemini
Product Manager Technical medium

YouTube Shorts engagement has dropped by 10% week-over-week. How do you investigate and resolve this?

#YouTube Shorts #Root Cause Analysis #Data Analytics
Product Manager Technical hard

How would you integrate Gemini (Google's LLM) into Google Workspace (Docs/Sheets) specifically for enterprise B2B users?

#Generative AI #Google Workspace #B2B Enterprise
Product Manager Technical medium

Estimate the total bandwidth consumed by Google Photos backups globally in a single day.

#Google Photos #Fermi Problem #Data Storage
Product Manager Technical easy

Design a Google Nest smart display device specifically for the elderly.

#Google Nest #Accessibility #Hardware PM
Product Manager Technical medium

How would you monetize Google Maps further without showing intrusive ads on the core map interface?

#Google Maps #Monetization #B2B APIs
Product Manager Technical hard

Should Google spin off YouTube into a completely separate company? Walk me through your strategic reasoning.

#YouTube #Business Strategy #Market Dynamics
Software Engineer Behavioral medium

Tell me about a time you pushed back on a product manager's feature request because you believed it would negatively impact system latency or reliability. How did you resolve the conflict?

#Communication #Googlyness #Prioritization #Conflict Resolution
Software Engineer Behavioral medium

Describe a time when you had to lead a project across multiple timezones or distributed teams. How did you ensure alignment and handle communication breakdowns?

#Cross-functional Collaboration #Leadership #Project Management
Software Engineer Behavioral easy

Tell me about a time you discovered a critical bug or security vulnerability in a production system. What was your immediate action, and how did you ensure it wouldn't happen again?

#Incident Management #Post-mortem #Ownership
Software Engineer Behavioral medium

Tell me about a time you had to pivot your technical approach completely because of changing business requirements from product managers. How did you manage the transition and team morale?

#Adaptability #Conflict Resolution #Team Dynamics
Software Engineer Behavioral medium

Describe a situation where you had to navigate a highly ambiguous project with no clear technical direction. How did you define the milestones and deliver the solution?

#Ambiguity #Project Management #Googlyness
Software Engineer Behavioral medium

Tell me about a time you discovered a significant flaw in a system's architecture right before a major launch. How did you balance the need to ship on time with the need to fix the technical debt?

#Googlyness #Decision Making #Risk Management #Communication
Software Engineer Coding medium

You are given a stream of Google Ads click events. Implement a sliding window counter that returns the number of clicks in the exact last 5 minutes. The stream is highly concurrent.

#Concurrency #Sliding Window #Queues #Data Streams
Software Engineer Coding medium

Given a list of Google Calendar events for 'n' users, where each event consists of a start and end time, and a required meeting duration 'k', find all available time slots where all 'n' users can attend a meeting.

#Intervals #Two Pointers #Sorting
Software Engineer Coding medium

Implement a system for Google Calendar that takes in a list of N users' daily schedules (lists of busy intervals) and their working hours, and returns all available time slots of duration T where all N users can meet.

#Intervals #Sorting #Two Pointers #Arrays
Software Engineer Coding hard

Given a list of daily budgets and expected returns for various Google Ads campaigns, write an algorithm to allocate a total budget B to maximize the overall return. You can allocate partial budgets to campaigns, but returns diminish non-linearly.

#Dynamic Programming #Greedy Algorithms #Optimization #Math
Software Engineer Coding hard

Design a data structure for Google Search Autocomplete that supports adding new search queries, updating the frequency of a query, and retrieving the top 'k' most frequent queries that start with a given prefix in real-time.

#Trie #Heap #Hash Map #Design
Software Engineer Coding hard

You are given a map represented as a weighted directed graph. You need to route an electric vehicle from point A to point B. The EV has a maximum battery capacity, and certain nodes have charging stations. Find the shortest path such that the EV never runs out of battery.

#Graphs #Dijkstra's Algorithm #State Space Search
Software Engineer Coding medium

You are given a map represented as a 2D grid where 0 is a road, 1 is a building, and 2 is an EV charging station. Given a starting position, find the shortest path to an EV charging station. Follow-up: How would you optimize this if we have millions of queries per second for different starting positions on a static map?

#Graph Theory #BFS #Dynamic Programming #Caching
Software Engineer Coding medium

Implement a thread-safe LRU (Least Recently Used) cache. This cache will be used in a high-throughput microservice. Explain how you would minimize lock contention.

#Concurrency #Data Structures #Linked List #Hash Map
Software Engineer Coding hard

You are building a feature for Google Docs. Given a string representing a document and a list of operations (insert, delete, replace at specific indices), apply the operations efficiently. How do you handle overlapping operations?

#String Manipulation #Operational Transformation #Design
Software Engineer Coding medium

Given a stream of user watch events on YouTube (user_id, video_id, timestamp, duration_watched), write a function to find the longest contiguous sequence of videos a user watched where they completed at least 90% of each video.

#Sliding Window #Hash Map #Stream Processing
Software Engineer Coding hard

Design a data structure for Google Search autocomplete. It must support inserting a string with a frequency, and querying the top K most frequent strings that start with a given prefix. Optimize for query latency.

#Trie #Priority Queue #Design #String Manipulation
Software Engineer Coding medium

Implement a function to evaluate a mathematical expression given as a string (e.g., '3 + 5 / (2 - 1) * 4'). The expression can contain parentheses, and you must follow standard order of operations. This is used in Google Search's calculator widget.

#Stacks #String Parsing #Math
Software Engineer Coding medium

You are given an array of integers representing the CPU load of a Google server cluster over time. Find the maximum contiguous subarray sum, but you are allowed to delete exactly one element to maximize the sum.

#Dynamic Programming #Arrays #Kadane's Algorithm
Software Engineer Coding hard

Given a 2D grid representing a Google Maps satellite image where '1' is land and '0' is water, find the minimum number of days to connect two disconnected islands. You can change one '0' to '1' per day.

#BFS #DFS #Matrix
Software Engineer System Design medium

Design a distributed rate limiter for Google Cloud API Gateway that can handle millions of requests per second with minimal latency overhead.

#Redis #Token Bucket #Distributed Systems #Hashing
Software Engineer System Design hard

Design the real-time view counter for YouTube live streams. The system must handle massive spikes in traffic (e.g., a Super Bowl stream) and provide eventually consistent view counts to the UI with sub-second latency.

#Stream Processing #Event Sourcing #Scalability #Data Aggregation
Software Engineer System Design hard

Design Google Photos' auto-backup feature for mobile devices. How do you handle intermittent network connectivity, deduplication, and efficient storage?

#Blob Storage #Checksum #Mobile Sync #Resumable Uploads
Software Engineer System Design hard

Design Google Search's ranking pipeline.

#Ranking #Scale
Software Engineer System Design hard

Design the backend for Google Docs collaborative editing. How do you handle concurrent edits from multiple users offline and online to ensure eventual consistency?

#Operational Transformation #CRDTs #WebSockets #Concurrency
Software Engineer System Design hard

Design a distributed, highly available rate limiter for Google Cloud APIs. It needs to support millions of requests per second, enforce limits per customer per API, and add minimal latency to the critical path.

#Distributed Caching #Token Bucket #Redis #High Availability
Software Engineer System Design hard

How would you design a distributed key-value store like Bigtable?

#Key-Value Store
Software Engineer System Design hard

Design the block-level file synchronization mechanism for Google Drive. How do you handle concurrent edits offline, minimize bandwidth for large files, and resolve conflicts when the client reconnects?

#Distributed Systems #Data Synchronization #Concurrency #Network Optimization
Software Engineer System Design hard

Design the video recommendation feed for YouTube. Focus on how you would fetch, rank, and serve the recommendations at scale within a 200ms latency budget.

#Machine Learning Infra #Caching #Microservices #Recommendation Systems
Software Engineer System Design hard

Design a distributed web crawler for Google Search. How do you handle DNS resolution bottlenecks, avoid crawler traps, prioritize high-quality domains, and ensure you don't DDoS the target servers?

#Distributed Systems #Graph Traversal #Politeness Policies #Queueing
Software Engineer Technical medium

Implement a thread-safe LRU cache in your language of choice. It must support get() and put() in O(1) time, and handle concurrent access from multiple threads without race conditions or deadlocks.

#Concurrency #Hash Map #Doubly Linked List #Mutex
Software Engineer Technical medium

Given two tables: `search_logs` (query_id, user_id, query_string, timestamp, region) and `clicks` (query_id, url_clicked, rank_position), write an optimized SQL query to find the top 3 queries with the highest click-through rate in the 'US' region over the last 7 days, partitioned by day.

#Window Functions #Joins #Aggregations #Performance Optimization
Software Engineer Technical hard

What is MapReduce? How does it work at Google's scale?

#MapReduce

Difficulty Radar

Based on recent AI-sourced data.

Meet Your Interviewers

The "Standard" Interviewer

Senior Engineer

Focuses on core competencies, system constraints, and clear communication.

Simulate

Unwritten Rules

Think Out Loud

Always explain your thought process before writing code or drawing architecture.

Practice Now