Let's be honest. You didn't get into data science to spend hours writing the same repetitive SQL queries over and over again.
You wanted to build machine learning models. Uncover hidden insights. Create stunning visualizations.
But instead, you're stuck trying to remember whether it's WHERE column IN value
or WHERE column = value
for the hundredth time.
What if I told you that learning a few simple SQL tricks could cut your query time in half? That you could stop fighting with syntax and start getting answers?
I've been analyzing data for over a decade at companies you've definitely heard of, and I'm going to show you the exact techniques that helped me go from SQL struggler to query master.
The 3-SQL Rule That Changed Everything
Early in my career, a senior engineer dropped this wisdom on me: "If your query is more than three levels deep, you're probably doing it wrong."
I didn't believe him until I spent an entire afternoon debugging a 15-line monstrosity that was supposed to find our top-selling products.
The query was so complex that when it finally ran, it returned... nothing. Empty. Zilch.
After rewriting it using the techniques I'll show you, it ran in seconds and returned exactly what we needed.
Here's what I learned:
1. Stop Using OR When You Could Use IN
The painful way:
SELECT *
FROM users
WHERE country = 'USA'
OR country = 'Canada'
OR country = 'UK'
OR country = 'Australia';
The elegant way:
SELECT *
FROM users
WHERE country IN ('USA', 'Canada', 'UK', 'Australia');
Not only is the second query cleaner, it's also significantly faster. The database can optimize IN
statements much better than multiple OR
conditions.
2. Become Best Friends WITH CTEs
Common Table Expressions (CTEs) are like creating mini-tables within your query. They make complex logic readable.
The confusing way:
SELECT
o.order_id,
o.total_amount,
u.name,
(SELECT COUNT(*) FROM orders o2 WHERE o2.user_id = o.user_id) as total_orders
FROM orders o
JOIN users u ON o.user_id = u.user_id
WHERE o.create_date > '2024-01-01'
AND u.country IN ('USA', 'Canada')
AND o.status = 'completed'
ORDER BY o.total_amount DESC
LIMIT 100;
The clear way:
WITH user_orders AS (
SELECT
user_id,
COUNT(*) as total_orders
FROM orders
WHERE status = 'completed'
GROUP BY user_id
),
recent_orders AS (
SELECT *
FROM orders
WHERE create_date > '2024-01-01'
AND status = 'completed'
)
SELECT
ro.order_id,
ro.total_amount,
u.name,
uo.total_orders
FROM recent_orders ro
JOIN users u ON ro.user_id = u.user_id
JOIN user_orders uo ON ro.user_id = uo.user_id
WHERE u.country IN ('USA', 'Canada')
ORDER BY ro.total_amount DESC
LIMIT 100;
The second query might be longer, but it's dramatically easier to understand, debug, and modify. Each section does one clear thing.
3. Stop Writing the Same JOINs Over and Over
If you find yourself joining the same tables repeatedly across different queries, you're working too hard.
The repetitive way (across multiple queries):
-- In query 1
SELECT u.name, o.order_date
FROM users u
JOIN orders o ON u.user_id = o.user_id;
-- In query 2
SELECT u.country, COUNT(o.order_id)
FROM users u
JOIN orders o ON u.user_id = o.user_id
GROUP BY u.country;
-- In query 3
SELECT u.name, o.total_amount
FROM users u
JOIN orders o ON u.user_id = o.user_id
WHERE o.total_amount > 100;
The smart way: Create a view
CREATE VIEW user_orders_view AS
SELECT
u.*,
o.order_id,
o.order_date,
o.total_amount,
o.status
FROM users u
JOIN orders o ON u.user_id = o.user_id;
Now all your future queries become simpler:
-- Query 1 becomes:
SELECT name, order_date FROM user_orders_view;
-- Query 2 becomes:
SELECT country, COUNT(order_id) FROM user_orders_view GROUP BY country;
-- Query 3 becomes:
SELECT name, total_amount FROM user_orders_view WHERE total_amount > 100;
The One Feature That Will Make You Feel Like a SQL Wizard
Window functions were my "I've leveled up" moment in SQL. They look intimidating but are incredibly powerful.
The problem: Find each user's most recent order
The old way (slow and complicated):
SELECT
o1.*
FROM orders o1
WHERE o1.order_date = (
SELECT MAX(o2.order_date)
FROM orders o2
WHERE o2.user_id = o1.user_id
);
The magical window function way:
SELECT *
FROM (
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY order_date DESC) as rn
FROM orders
) ranked_orders
WHERE rn = 1;
The window function approach is not just cleaner - it's significantly faster on large datasets because it avoids correlated subqueries.
Your SQL Cheat Sheet (Copy/Paste Ready)
Bookmark these. I use them daily:
-- Find duplicates (life-saver for data cleaning)
SELECT email, COUNT(*)
FROM users
GROUP BY email
HAVING COUNT(*) > 1;
-- Calculate running totals (perfect for time series)
SELECT
date,
sales,
SUM(sales) OVER (ORDER BY date) as running_total
FROM daily_sales;
-- Compare row to previous row
SELECT
date,
sales,
sales - LAG(sales) OVER (ORDER BY date) as change_from_previous
FROM daily_sales;
-- Calculate percent of total
SELECT
category,
sales,
sales / SUM(sales) OVER () * 100 as percent_of_total
FROM category_sales;
From SQL Struggler to Query Master
The difference between struggling with SQL and mastering it isn't about memorizing every function. It's about:
- Writing readable queries (your future self will thank you)
- Letting the database do the heavy lifting (use its optimization powers)
- Building a toolkit of patterns you can adapt for different problems
The best way to learn? Stop reading and start practicing.
Want to try these techniques right now? We've built an interactive SQL playground where you can run these exact queries against real datasets. No setup required.
👉 Try Our SQL Playground Here
This is part of our Data Engineer Career Path, where we take you from SQL basics to advanced optimization techniques that will make you the go-to data person on your team.
What's your biggest SQL headache? Reply to this on Twitter and I might just solve it in our next post.
Comments (0)
No comments yet. Be the first to comment!
Please login to leave a comment.