#20 Query Optimization
SQL Crash Course #20
All the course material is stored in the SQL Crash Course repository.
Hi everyone! Josep and Cornellius Yudha Wijaya from Non-Brand Data here ๐๐ป
As promised, today we are publishing the next two issues of our SQL Crash Course โ From Zero to Hero! ๐
I am sure you are here to continue our SQL Crash Course Journey!๐
If this is your first time or youโve forgotten what weโll cover, we will examine seven key SQL topics, each divided into multiple posts across Non-Brand Data and DataBites.
๐ Remember last time we already saw:
๐ #17 โ Indexing and Optimization
๐ #18 โ Generating Modular Code
In todayโs topic, we will finish up the Crafting Good Query topic with:
๐ #19 โ SQL Execution Order by Josep Ferrer in DataBites
๐ #20 โ Query Optimization, which is the current article you are reading!
So donโt miss outโletโs keep the SQL momentum going!
๐งฑ What is SQL Query Optimization?
SQL Query Optimization is a method that involves refining SQL queries.
Without optimization, SQL Query could:
Waste server resources
Crash under heavy loads
Frustrate users with delays
Thatโs why SQL Query Optimization comes to help you to:
Improve performance
Reduce resource consumption
Ensure efficient data retrieval
Many methods exist to optimize SQL queries, including the indexing method we have learned previously.
Although this article will further explore the SQL queries, you will still learn many new things.
โ
When to Optimize SQL Queries?
You don't always have to optimize your SQL queries. Here are situations when optimization may be necessary:
๐ Queries are slow or timing out
๐ Handling large datasets or complex joins
๐ Database load affects application performance
๐ Before scaling infrastructure to ensure resource efficiency
Letโs explore a few practical strategies to optimize SQL Queries! ๐
1๏ธโฃ Use EXISTS Instead of JOIN + DISTINCT
/* ๐ซ Bad */
SELECT DISTINCT n.*
FROM newsletters n
JOIN posts p ON n.id = p.newsletter_id
WHERE p.name = 'DataBites';
/* โ
Good */
SELECT *
FROM newsletters n
WHERE EXISTS (
SELECT 1
FROM posts p
WHERE p.newsletter_id = n.id
AND p.name = 'DataBites'
); ๐ง Why this matters:EXISTS stops scanning at the first match, while JOIN + DISTINCT processes all matching rows before deduping. This reduces I/O operations and avoids unnecessary data shuffling.
2๏ธโฃ Select Only What You Need
/* ๐ซ Bad */
SELECT * FROM posts WHERE published_at >= '2024-01-01';
/* โ
Good */
SELECT id, name, published_at
FROM posts
WHERE published_at >= '2024-01-01'; ๐ง Why this matters:
Unused columns bloat memory usage and network transfer. Explicit column selection can cut data volume by 60 %+ in large tables.
3๏ธโฃ Index Strategically
/* ๐ซ Bad */
SELECT name FROM posts
WHERE published_at BETWEEN '2024-01-01' AND '2024-03-31';
/* โ
Good */
CREATE INDEX idx_posts_date ON posts(published_at); ๐ง Why this matters:
Indexes act like a bookโs index, they let the database jump directly to relevant rows instead of scanning every page. Date-range queries become 10x faster.
4๏ธโฃ Break Queries into CTEs
/* ๐ซ Bad */
SELECT p.id
FROM posts p
JOIN interactions i ON p.id = i.post_id
GROUP BY p.id
HAVING SUM(i.points) > (
SELECT AVG(total) FROM (
SELECT SUM(points) AS total FROM interactions GROUP BY post_id
) sub
);
/* โ
Good */
WITH post_totals AS (
SELECT post_id, SUM(points) AS total
FROM interactions
GROUP BY post_id
)
SELECT p.id
FROM posts p
JOIN post_totals pt ON p.id = pt.post_id
WHERE pt.total > (SELECT AVG(total) FROM post_totals); ๐ง Why this matters:
CTEs turn nested spaghetti logic into reusable building blocks. You calculate post totals once and reference them twice โ no repeated aggregation.
5๏ธโฃ Avoid SELECT * in Subqueries
/* ๐ซ Bad */
SELECT n.name,
(SELECT * FROM posts p WHERE p.newsletter_id = n.id) AS posts
FROM newsletters n;
/* โ
Good */
SELECT n.name,
(SELECT COUNT(id) FROM posts p WHERE p.newsletter_id = n.id) AS post_count
FROM newsletters n; ๐ง Why this matters:
Subqueries with SELECT * pull entire rows even if you only need one value. This is like ordering a pizza to eat just one slice, wasteful and slow.
6๏ธโฃ Use JOIN Instead of Subqueries for Filters
/* ๐ซ Bad */
SELECT id FROM newsletters
WHERE id IN (SELECT newsletter_id FROM posts WHERE name LIKE '%SQL%');
/* โ
Good */
SELECT DISTINCT n.id
FROM newsletters n
JOIN posts p ON n.id = p.newsletter_id
WHERE p.name LIKE '%SQL%'; ๐ง Why this matters:
Most databases optimize JOIN filters better than IN subqueries. Joins can leverage indexes on posts.name, while subqueries often trigger full scans.
Putting It All Together
๐ You now have a query that:
Runs in milliseconds instead of minutes
Uses 70% less memory by avoiding
SELECT *bloatScales gracefully with strategic indexing
Stays readable through CTEs and modular logic
๐ Conclusion
SQL optimization is iterative, not one-time. Start with:
The slowest query in your system
The most frequently run query
The query with the most extensive dataset
Remember: A 10% improvement on a query running 1000x/day saves more resources than a 50% improvement on a query running once a week.
Once you start optimizing, your SQL Query will become much better.
Once you understand it, you won't revert.
๐๐ป SQL playground with the optimization example
How to Get Started ๐
Over the previous weeks, we already going through:
โ
SQL Fundamentals
โ
Intermediate SQL
โ
Advanced SQL
โ
Database Operations
โ
Writing Efficient Queries
In the next issues, we will start working on real-world projects and solving hands-on data problems.
Whatโs Next? โก๏ธ
To get the full experience and fully immersed in the learning:
๐ Subscribe to Databites.tech (By Josep)
๐ Subscribe to Non-Brand Data (By Cornellius)
๐ Check out the SQL Crash Course GitHub repo
๐ Share with your friend and whoever needs it!
๐๏ธ Every Thursday, you will have two new issues in your inbox!
Letโs dive in and make SQL less scary, more fun, and way more useful! ๐
Josep & Cornellius





