The Lazy Data Scientist's Guide to SQL: Write Less Code, Get Better Results

Let's be honest. You didn't get into data science to spend hours writing the same repetitive SQL queries over and over again.

You wanted to build machine learning models. Uncover hidden insights. Create stunning visualizations.

But instead, you're stuck trying to remember whether it's WHERE column IN value or WHERE column = value for the hundredth time.

What if I told you that learning a few simple SQL tricks could cut your query time in half? That you could stop fighting with syntax and start getting answers?

I've been analyzing data for over a decade at companies you've definitely heard of, and I'm going to show you the exact techniques that helped me go from SQL struggler to query master.

The 3-SQL Rule That Changed Everything

Early in my career, a senior engineer dropped this wisdom on me: "If your query is more than three levels deep, you're probably doing it wrong."

I didn't believe him until I spent an entire afternoon debugging a 15-line monstrosity that was supposed to find our top-selling products.

The query was so complex that when it finally ran, it returned... nothing. Empty. Zilch.

After rewriting it using the techniques I'll show you, it ran in seconds and returned exactly what we needed.

Here's what I learned:

1. Stop Using OR When You Could Use IN

The painful way:

SELECT *
FROM users
WHERE country = 'USA' 
   OR country = 'Canada'
   OR country = 'UK'
   OR country = 'Australia';

The elegant way:

SELECT *
FROM users
WHERE country IN ('USA', 'Canada', 'UK', 'Australia');

Not only is the second query cleaner, it's also significantly faster. The database can optimize IN statements much better than multiple OR conditions.

2. Become Best Friends WITH CTEs

Common Table Expressions (CTEs) are like creating mini-tables within your query. They make complex logic readable.

The confusing way:

SELECT 
  o.order_id,
  o.total_amount,
  u.name,
  (SELECT COUNT(*) FROM orders o2 WHERE o2.user_id = o.user_id) as total_orders
FROM orders o
JOIN users u ON o.user_id = u.user_id
WHERE o.create_date > '2024-01-01'
AND u.country IN ('USA', 'Canada')
AND o.status = 'completed'
ORDER BY o.total_amount DESC
LIMIT 100;

The clear way:

WITH user_orders AS (
  SELECT 
    user_id,
    COUNT(*) as total_orders
  FROM orders
  WHERE status = 'completed'
  GROUP BY user_id
),
recent_orders AS (
  SELECT *
  FROM orders
  WHERE create_date > '2024-01-01'
  AND status = 'completed'
)

SELECT 
  ro.order_id,
  ro.total_amount,
  u.name,
  uo.total_orders
FROM recent_orders ro
JOIN users u ON ro.user_id = u.user_id
JOIN user_orders uo ON ro.user_id = uo.user_id
WHERE u.country IN ('USA', 'Canada')
ORDER BY ro.total_amount DESC
LIMIT 100;

The second query might be longer, but it's dramatically easier to understand, debug, and modify. Each section does one clear thing.

3. Stop Writing the Same JOINs Over and Over

If you find yourself joining the same tables repeatedly across different queries, you're working too hard.

The repetitive way (across multiple queries):

-- In query 1
SELECT u.name, o.order_date
FROM users u
JOIN orders o ON u.user_id = o.user_id;

-- In query 2  
SELECT u.country, COUNT(o.order_id)
FROM users u
JOIN orders o ON u.user_id = o.user_id
GROUP BY u.country;

-- In query 3
SELECT u.name, o.total_amount
FROM users u
JOIN orders o ON u.user_id = o.user_id
WHERE o.total_amount > 100;

The smart way: Create a view

CREATE VIEW user_orders_view AS
SELECT 
  u.*,
  o.order_id,
  o.order_date, 
  o.total_amount,
  o.status
FROM users u
JOIN orders o ON u.user_id = o.user_id;

Now all your future queries become simpler:

-- Query 1 becomes:
SELECT name, order_date FROM user_orders_view;

-- Query 2 becomes:
SELECT country, COUNT(order_id) FROM user_orders_view GROUP BY country;

-- Query 3 becomes:  
SELECT name, total_amount FROM user_orders_view WHERE total_amount > 100;

The One Feature That Will Make You Feel Like a SQL Wizard

Window functions were my "I've leveled up" moment in SQL. They look intimidating but are incredibly powerful.

The problem: Find each user's most recent order

The old way (slow and complicated):

SELECT 
  o1.*
FROM orders o1
WHERE o1.order_date = (
  SELECT MAX(o2.order_date)
  FROM orders o2
  WHERE o2.user_id = o1.user_id
);

The magical window function way:

SELECT *
FROM (
  SELECT 
    *,
    ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY order_date DESC) as rn
  FROM orders
) ranked_orders
WHERE rn = 1;

The window function approach is not just cleaner - it's significantly faster on large datasets because it avoids correlated subqueries.

Your SQL Cheat Sheet (Copy/Paste Ready)

Bookmark these. I use them daily:

-- Find duplicates (life-saver for data cleaning)
SELECT email, COUNT(*)
FROM users
GROUP BY email
HAVING COUNT(*) > 1;

-- Calculate running totals (perfect for time series)
SELECT 
  date,
  sales,
  SUM(sales) OVER (ORDER BY date) as running_total
FROM daily_sales;

-- Compare row to previous row
SELECT 
  date,
  sales,
  sales - LAG(sales) OVER (ORDER BY date) as change_from_previous
FROM daily_sales;

-- Calculate percent of total
SELECT 
  category,
  sales,
  sales / SUM(sales) OVER () * 100 as percent_of_total
FROM category_sales;

From SQL Struggler to Query Master

The difference between struggling with SQL and mastering it isn't about memorizing every function. It's about:

  1. Writing readable queries (your future self will thank you)
  2. Letting the database do the heavy lifting (use its optimization powers)
  3. Building a toolkit of patterns you can adapt for different problems

The best way to learn? Stop reading and start practicing.

Want to try these techniques right now? We've built an interactive SQL playground where you can run these exact queries against real datasets. No setup required.

👉 Try Our SQL Playground Here

This is part of our Data Engineer Career Path, where we take you from SQL basics to advanced optimization techniques that will make you the go-to data person on your team.


What's your biggest SQL headache? Reply to this on Twitter and I might just solve it in our next post.

Back to Blog
techyvia_admin

About techyvia_admin

No bio available for this author.

Comments (0)

No comments yet. Be the first to comment!