Building on Foundations: Mastering Intermediate SQL Concepts
Aggregate Functions
What Are Aggregate Functions?
Aggregate functions help summarize large amounts of data into meaningful insights. They answer questions like:
- How many books are in the library?
- What’s the total price of all books?
- What’s the average book price?
Here are the most commonly used aggregate functions:
COUNT
- Counts the number of rows.SUM
- Adds up the values in a column.AVG
- Calculates the average value in a column.MIN
- Finds the smallest value in a column.MAX
- Finds the largest value in a column.
1. COUNT: Counting Rows
The COUNT
function counts the number of rows that match a condition.
Syntax:
SELECT COUNT(column_name)
FROM table_name
WHERE condition;
Example 1: Count All Rows
Count the total number of books:
SELECT COUNT(*) AS total_books
FROM books;
Example 2: Count with a Condition
Count books published after 2000:
SELECT COUNT(*) AS books_after_2000
FROM books
WHERE year_published > 2000;
2. SUM: Adding Up Values
The SUM
function calculates the total value of a numeric column.
Syntax:
SELECT SUM(column_name)
FROM table_name
WHERE condition;
Example 1: Sum All Values
Calculate the total price of all books:
SELECT SUM(price) AS total_price
FROM books;
3. AVG: Calculating Averages
The AVG
function calculates the average value of a numeric column.
Syntax:
SELECT AVG(column_name)
FROM table_name
WHERE condition;
Example 1: Average Price of All Books
Find the average price of books:
SELECT AVG(price) AS average_price
FROM books;
4. MIN: Finding the Minimum Value
The MIN
function finds the smallest value in a column.
Syntax:
SELECT MIN(column_name)
FROM table_name
WHERE condition;
Example 1: Find the Cheapest Book
Get the price of the cheapest book:
SELECT MIN(price) AS cheapest_book
FROM books;
5. MAX: Finding the Maximum Value
The MAX
function finds the largest value in a column.
Syntax:
SELECT MAX(column_name)
FROM table_name
WHERE condition;
Example 1: Find the Most Expensive Book
Get the price of the most expensive book:
SELECT MAX(price) AS most_expensive_book
FROM books;
Combining Aggregate Functions
You can use multiple aggregate functions in a single query.
Example: Find the total books, average price, and most expensive book
SELECT
COUNT(*) AS total_books,
AVG(price) AS average_price,
MAX(price) AS most_expensive_book
FROM books;
Common Mistakes to Avoid
- Using Aggregate Functions Without GROUP BY: If used with other columns,
GROUP BY
is required (covered in the next section). - Forgetting Aliases: Use
AS
for readability. - Mixing Data Types: Ensure the column used has the correct data type (e.g., don’t use
SUM
on a text column).
What’s Next?
Now that you know how to use aggregate functions, it’s time to go further! In the next section, we’ll explore **Grouping Data with GROUP BY** to summarize data for specific groups.
Interactive Practice
Try these challenges in the SQL editor:
Count books published before 1950:
SELECT COUNT(*) AS books_before_1950
FROM books
WHERE year_published < 1950;
Find the total price of books by George Orwell:
SELECT SUM(price) AS george_orwell_total_price
FROM books
WHERE author = 'George Orwell';
Joins: Combining Data from Multiple Tables
What Are Joins?
Joins allow you to combine data from multiple tables based on a related column.
For example, if you have a books
table and an authors
table,
you can join them to see book titles along with their respective authors.
Types of SQL Joins:
INNER JOIN
: Returns only matching rows from both tables.LEFT JOIN
(LEFT OUTER JOIN): Returns all rows from the left table and matching rows from the right.RIGHT JOIN
(RIGHT OUTER JOIN): Returns all rows from the right table and matching rows from the left.FULL JOIN
(FULL OUTER JOIN): Returns all rows where there is a match in either table.CROSS JOIN
: Returns every possible combination of rows from both tables.
1. INNER JOIN
An INNER JOIN
returns only rows that have matching values in both tables.
Syntax:
SELECT columns
FROM table1
INNER JOIN table2
ON table1.column = table2.column;
Example: Books and Authors
Retrieve book titles along with their authors:
SELECT books.title, authors.author_name
FROM books
INNER JOIN authors
ON books.author_id = authors.author_id;
2. LEFT JOIN (LEFT OUTER JOIN)
A LEFT JOIN
returns all rows from the left table and matching rows from the right table.
Syntax:
SELECT columns
FROM table1
LEFT JOIN table2
ON table1.column = table2.column;
Example: Books and Authors
Retrieve all books, even if the author information is missing:
SELECT books.title, authors.author_name
FROM books
LEFT JOIN authors
ON books.author_id = authors.author_id;
3. RIGHT JOIN (RIGHT OUTER JOIN)
A RIGHT JOIN
returns all rows from the right table and matching rows from the left table.
Syntax:
SELECT columns
FROM table1
RIGHT JOIN table2
ON table1.column = table2.column;
Example: Books and Authors
Retrieve all authors, even if they haven’t written any books:
SELECT books.title, authors.author_name
FROM books
RIGHT JOIN authors
ON books.author_id = authors.author_id;
4. FULL JOIN (FULL OUTER JOIN)
A FULL JOIN
returns all rows where there is a match in either table.
Syntax:
SELECT columns
FROM table1
FULL JOIN table2
ON table1.column = table2.column;
Example: Books and Authors
Retrieve all books and all authors, even if there’s no match:
SELECT books.title, authors.author_name
FROM books
FULL JOIN authors
ON books.author_id = authors.author_id;
5. CROSS JOIN
A CROSS JOIN
returns the Cartesian product of both tables (every row from one table combined
with every row from the other).
Syntax:
SELECT columns
FROM table1
CROSS JOIN table2;
Example: Books and Authors
Combine every book with every author:
SELECT books.title, authors.author_name
FROM books
CROSS JOIN authors;
Common Mistakes to Avoid
- Forgetting the ON Clause: Always specify the condition for joining tables to avoid a Cartesian product.
- Using the Wrong Join Type: Choose the right join based on your needs (e.g.,
INNER JOIN
for exact matches,LEFT JOIN
for all left-side records). - Ambiguous Column Names: When both tables have columns with the same name, use table aliases to avoid confusion.
What’s Next?
Now that you know how to join tables, it's time to explore **Subqueries**. Subqueries allow you to nest one query inside another, making your SQL even more powerful.
Interactive Practice
Try these challenges in the SQL editor:
Join books and authors to get the title and author_name:
SELECT books.title, authors.author_name
FROM books
INNER JOIN authors
ON books.author_id = authors.author_id;
Use a LEFT JOIN to retrieve all books and their authors, even if author info is missing:
SELECT books.title, authors.author_name
FROM books
LEFT JOIN authors
ON books.author_id = authors.author_id;
Subqueries: Queries Within Queries
A subquery is a query nested inside another SQL query. It allows you to perform calculations and filter results dynamically by using one query's result within another.
Types of Subqueries
- Single-Row Subqueries: Return only one row of results.
- Multi-Row Subqueries: Return multiple rows of results.
- Correlated Subqueries: Depend on the outer query for their values.
1. Single-Row Subqueries
Single-row subqueries return only one row of results and are used with comparison operators such as
=
, >
, and <
.
Example: Books More Expensive Than the Average Price
To find books that cost more than the average book price:
SELECT title, price
FROM books
WHERE price > (SELECT AVG(price) FROM books);
2. Multi-Row Subqueries
Multi-row subqueries return multiple rows and are used with IN
, ANY
, or
ALL
operators.
Example: Books by Authors with More Than 2 Books
To list books written by authors who have written more than two books:
SELECT title, author_id
FROM books
WHERE author_id IN (
SELECT author_id
FROM books
GROUP BY author_id
HAVING COUNT(*) > 2
);
3. Correlated Subqueries
Correlated subqueries run once for each row in the outer query and depend on its values.
Example: Books More Expensive Than the Author’s Average Price
To find books that are more expensive than the average price of books by the same author:
SELECT title, price
FROM books b1
WHERE price > (
SELECT AVG(price)
FROM books b2
WHERE b2.author_id = b1.author_id
);
Common Mistakes to Avoid
- Forgetting Parentheses: Always enclose subqueries in parentheses.
- Using the Wrong Operator: Use
IN
for multi-row subqueries and=
for single-row subqueries. - Performance Issues: Correlated subqueries can be slow on large datasets. Optimize where possible.
What’s Next?
Now that you understand subqueries, the next step is learning about Window Functions, which allow you to perform calculations across a set of related rows.
Interactive Practice
Try these challenges in your SQL editor:
- Find books more expensive than the average price of all books.
- Find authors who have written more than 2 books.
- Find books published in the same year as any book by J.K. Rowling.
Set Operations: Combining and Comparing Results
Set operations allow you to combine or compare results from multiple SELECT
queries, similar to
mathematical set operations. These operations are useful for merging datasets, finding common elements, or
identifying unique records.
Types of Set Operations in SQL
- UNION: Combines results from two queries and removes duplicates.
- UNION ALL: Combines results but retains duplicates.
- INTERSECT: Returns only rows that appear in both queries.
- EXCEPT (or MINUS): Returns rows from the first query that are not in the second.
1. UNION: Combining Results Without Duplicates
The UNION
operator merges two result sets while eliminating duplicate rows.
Syntax
SELECT column1, column2 FROM table1
UNION
SELECT column1, column2 FROM table2;
Example: Combining Customer Lists
To merge customer lists from two years without duplicates:
SELECT customer_name FROM customers_2022
UNION
SELECT customer_name FROM customers_2023;
2. UNION ALL: Combining Results with Duplicates
Unlike UNION
, the UNION ALL
operator retains duplicate records.
Example: Merging Sales Data
To combine sales records from two quarters, including duplicates:
SELECT sale_id, amount FROM sales_q1
UNION ALL
SELECT sale_id, amount FROM sales_q2;
3. INTERSECT: Finding Common Rows
The INTERSECT
operator returns only rows that appear in both result sets.
Example: Identifying Returning Customers
To find customers who made purchases in both 2022 and 2023:
SELECT customer_name FROM customers_2022
INTERSECT
SELECT customer_name FROM customers_2023;
4. EXCEPT (or MINUS): Finding Unique Rows
The EXCEPT
(or MINUS
in some databases) operator returns rows from the first query
that do not appear in the second.
Example: Identifying Customers Who Shopped Only in 2022
To find customers who made purchases in 2022 but not in 2023:
SELECT customer_name FROM customers_2022
EXCEPT
SELECT customer_name FROM customers_2023;
Common Mistakes to Avoid
- Mismatched Columns: Both queries must have the same number of columns and matching data types.
- Ignoring Duplicates: Use
UNION
to remove duplicates, andUNION ALL
to retain them. - Confusing INTERSECT and EXCEPT:
INTERSECT
finds common rows, whileEXCEPT
finds unique rows from the first query.
What’s Next?
Now that you understand set operations, the next step is learning about subqueries, which allow you to nest one query inside another for more complex SQL queries. Stay tuned!
Interactive Practice
Try these challenges in your SQL editor:
- Combine the
product_name
s from two warehouses usingUNION
. - Find customers who made purchases in both 2022 and 2023 using
INTERSECT
. - Identify books available only in
library_a
usingEXCEPT
.
String and Date Functions in SQL
String and Date functions help in manipulating text and handling date/time values efficiently in SQL. These functions are essential for data transformation, cleaning, and analysis.
String Functions
1. CONCAT: Combining Strings
The CONCAT
function merges multiple strings into one.
SELECT CONCAT(first_name, ' ', last_name) AS full_name
FROM employees;
2. SUBSTRING: Extracting Part of a String
The SUBSTRING
function extracts a portion of a string.
SELECT SUBSTRING(product_code, 1, 3) AS short_code
FROM products;
3. LENGTH: Finding the Length of a String
The LENGTH
function returns the number of characters in a string.
SELECT product_name, LENGTH(product_name) AS name_length
FROM products;
4. UPPER and LOWER: Changing Case
The UPPER
function converts text to uppercase, while LOWER
converts it to
lowercase.
SELECT UPPER(product_name) AS upper_name, LOWER(category) AS lower_category
FROM products;
5. TRIM: Removing Spaces
The TRIM
function removes leading and trailing spaces.
SELECT TRIM(product_name) AS trimmed_name
FROM products;
6. REPLACE: Replacing Text
The REPLACE
function substitutes occurrences of a substring within a string.
SELECT REPLACE(product_name, 'Old', 'New') AS updated_name
FROM products;
Date Functions
1. NOW(): Current Date and Time
The NOW
function retrieves the current date and time.
SELECT NOW() AS current_datetime;
2. CURDATE(): Current Date
The CURDATE
function returns the current date.
SELECT CURDATE() AS current_date;
3. DATE_ADD: Adding Time
The DATE_ADD
function adds a time interval to a date.
SELECT DATE_ADD(order_date, INTERVAL 7 DAY) AS new_date
FROM orders;
4. DATE_SUB: Subtracting Time
The DATE_SUB
function subtracts a time interval from a date.
SELECT DATE_SUB(order_date, INTERVAL 3 MONTH) AS new_date
FROM orders;
5. DATEDIFF: Calculating Date Differences
The DATEDIFF
function finds the difference between two dates.
SELECT DATEDIFF(delivery_date, order_date) AS days_diff
FROM orders;
6. DATE_FORMAT: Formatting Dates
The DATE_FORMAT
function formats a date into a readable string.
SELECT DATE_FORMAT(order_date, '%Y-%m-%d') AS formatted_date
FROM orders;
Common Mistakes to Avoid
- Mismatched Data Types: Ensure correct data types when using string or date functions.
- Incorrect Format Strings: Use correct format patterns for
DATE_FORMAT
. - Ignoring Time Zones: Consider time zone adjustments when working with datetime values.
What’s Next?
Now that you understand string and date functions, the next step is learning about Aggregate Functions like
COUNT
, SUM
, AVG
, MIN
, and MAX
for
summarizing data.
Interactive Practice
Try these challenges in your SQL editor:
- Combine
first_name
andlast_name
into a full name. - Extract the year from
order_date
. - Calculate the number of days between
start_date
andend_date
.
Constraints: Enforcing Data Integrity
Constraints are rules applied to database columns to ensure data integrity and consistency. They prevent invalid data entry and help maintain database reliability.
Types of Constraints
- NOT NULL: Ensures a column cannot have NULL values.
- UNIQUE: Ensures all values in a column are unique.
- PRIMARY KEY: Uniquely identifies each row in a table.
- FOREIGN KEY: Ensures referential integrity by linking tables.
- CHECK: Ensures a column meets a specified condition.
- DEFAULT: Assigns a default value to a column if none is provided.
1. NOT NULL Constraint
The NOT NULL
constraint ensures that a column cannot contain NULL values.
CREATE TABLE customers (
customer_id INT PRIMARY KEY,
customer_name VARCHAR(100) NOT NULL,
email VARCHAR(100) NOT NULL
);
2. UNIQUE Constraint
The UNIQUE
constraint ensures that all values in a column are distinct.
CREATE TABLE users (
user_id INT PRIMARY KEY,
username VARCHAR(50) UNIQUE,
password VARCHAR(50)
);
3. PRIMARY KEY Constraint
The PRIMARY KEY
uniquely identifies each row in a table by combining NOT NULL
and
UNIQUE
.
CREATE TABLE orders (
order_id INT PRIMARY KEY,
order_date DATE,
customer_id INT
);
4. FOREIGN KEY Constraint
The FOREIGN KEY
constraint ensures that a column's value matches a value in another table’s
column.
CREATE TABLE orders (
order_id INT PRIMARY KEY,
order_date DATE,
customer_id INT,
FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
);
5. CHECK Constraint
The CHECK
constraint ensures that a column meets a specified condition.
CREATE TABLE products (
product_id INT PRIMARY KEY,
product_name VARCHAR(100),
price DECIMAL(10, 2) CHECK (price > 0)
);
6. DEFAULT Constraint
The DEFAULT
constraint sets a default value for a column if no value is provided.
CREATE TABLE orders (
order_id INT PRIMARY KEY,
order_date DATE DEFAULT CURDATE(),
customer_id INT
);
Common Mistakes to Avoid
- Forgetting PRIMARY KEY: Every table should have a primary key.
- Mismatched Data Types: Ensure data types match when using foreign keys.
- Overusing Constraints: Apply constraints selectively to avoid unnecessary complexity.
What’s Next?
Now that you understand constraints, the next step is learning about Indexes, which improve query performance by optimizing data retrieval.
Interactive Practice
Try these challenges in your SQL editor:
- Create a
customers
table where theemail
column cannot be NULL. - Create a
products
table with aCHECK
constraint ensuring the price is greater than 0. - Create an
orders
table with aDEFAULT
constraint on theorder_date
column.
Indexes: Speeding Up Data Retrieval
An index in SQL is a data structure that improves query performance by allowing the database to find rows faster, similar to an index in a book.
Why Are Indexes Important?
- Speed: Queries run faster by reducing the number of rows scanned.
- Efficiency: Reduces workload on the database, especially for read-heavy operations.
- Scalability: Helps maintain performance as data grows.
1. Creating and Dropping Indexes
Creating an Index
You can create an index on one or more columns to speed up queries.
CREATE INDEX idx_email
ON customers (email);
Dropping an Index
If an index is no longer needed, you can drop it.
DROP INDEX idx_email
ON customers;
2. Types of Indexes
Single-Column Index
An index on a single column improves search performance for that column.
CREATE INDEX idx_last_name
ON employees (last_name);
Composite Index
A composite index is created on multiple columns.
CREATE INDEX idx_full_name
ON employees (first_name, last_name);
Unique Index
A unique index ensures all values in a column are distinct.
CREATE UNIQUE INDEX idx_username
ON users (username);
Primary Key Index
Automatically created when a primary key is defined.
CREATE TABLE students (
student_id INT PRIMARY KEY,
student_name VARCHAR(100),
enrollment_date DATE
);
3. Pros and Cons of Indexing
Pros of Indexing
- Faster Queries: Reduces search time, especially in large tables.
- Improved Performance: Benefits queries with
WHERE
,JOIN
,ORDER BY
, andGROUP BY
. - Efficient Data Access: Reduces row scans for improved efficiency.
Cons of Indexing
- Slower Writes: Insert, update, and delete operations take longer.
- Storage Overhead: Indexes consume additional disk space.
- Maintenance: Indexes require optimization over time.
When to Use Indexes
- Frequently Queried Columns: Index columns used in WHERE, JOIN, ORDER BY, or GROUP BY.
- Large Tables: Best suited for tables with thousands or millions of rows.
- High Selectivity: Index columns with many unique values.
When NOT to Use Indexes
- Small Tables: Indexing small tables may not improve performance.
- Frequently Updated Columns: Avoid indexing columns that change often.
- Low Selectivity: Indexing columns with many duplicate values may not help.
Common Mistakes to Avoid
- Over-Indexing: Too many indexes can slow down write operations.
- Ignoring Query Patterns: Create indexes based on query needs.
- Not Maintaining Indexes: Use
ANALYZE
orREINDEX
to keep indexes optimized.
What’s Next?
Now that you know how to use indexes, the next step is learning about Query Optimization Techniques to further improve database performance.
Interactive Practice
Try these challenges in your SQL editor:
- Create an index on the
order_date
column in theorders
table. - Create a composite index on
first_name
andlast_name
in theemployees
table. - Drop the
idx_order_date
index from theorders
table.