#20 Query Optimization

SQL Crash Course #20

Cornellius Yudha Wijaya and Josep Ferrer

May 16, 2025

All the course material is stored in the SQL Crash Course repository.

Hi everyone! Josep and Cornellius Yudha Wijaya from Non-Brand Data here 👋🏻

As promised, today we are publishing the next two issues of our SQL Crash Course – From Zero to Hero! 🚀

I am sure you are here to continue our SQL Crash Course Journey!📚

If this is your first time or you’ve forgotten what we’ll cover, we will examine seven key SQL topics, each divided into multiple posts across Non-Brand Data and DataBites.

📚 Remember last time we already saw:

📌 #17 – Indexing and Optimization
📌 #18 – Generating Modular Code

In today’s topic, we will finish up the Crafting Good Query topic with:

📌 #19 – SQL Execution Order by Josep Ferrer in DataBites

📌 #20 – Query Optimization, which is the current article you are reading!

So don’t miss out—let’s keep the SQL momentum going!

🧱 What is SQL Query Optimization?

SQL Query Optimization is a method that involves refining SQL queries.

Without optimization, SQL Query could:

Waste server resources
Crash under heavy loads
Frustrate users with delays

That’s why SQL Query Optimization comes to help you to:

Improve performance
Reduce resource consumption
Ensure efficient data retrieval

Many methods exist to optimize SQL queries, including the indexing method we have learned previously.

#17 Indexing and Optimization

Cornellius Yudha Wijaya and Josep Ferrer

May 8, 2025

Read full story

Although this article will further explore the SQL queries, you will still learn many new things.

✅ When to Optimize SQL Queries?

You don't always have to optimize your SQL queries. Here are situations when optimization may be necessary:

👉 Queries are slow or timing out
👉 Handling large datasets or complex joins
👉 Database load affects application performance
👉 Before scaling infrastructure to ensure resource efficiency

Let’s explore a few practical strategies to optimize SQL Queries! 👇

1️⃣ Use `EXISTS` Instead of `JOIN + DISTINCT`

/* 🚫 Bad */  
SELECT DISTINCT n.*  
FROM newsletters n  
JOIN posts p ON n.id = p.newsletter_id  
WHERE p.name = 'DataBites';  

/* ✅ Good */  
SELECT *  
FROM newsletters n  
WHERE EXISTS (  
  SELECT 1  
  FROM posts p  
  WHERE p.newsletter_id = n.id  
  AND p.name = 'DataBites'  
);

🧠 Why this matters:
EXISTS stops scanning at the first match, while JOIN + DISTINCT processes all matching rows before deduping. This reduces I/O operations and avoids unnecessary data shuffling.

2️⃣ Select Only What You Need

/* 🚫 Bad */  
SELECT * FROM posts WHERE published_at >= '2024-01-01';  

/* ✅ Good */  
SELECT id, name, published_at  
FROM posts  
WHERE published_at >= '2024-01-01';

🧠 Why this matters:
Unused columns bloat memory usage and network transfer. Explicit column selection can cut data volume by 60 %+ in large tables.

3️⃣ Index Strategically

/* 🚫 Bad */  
SELECT name FROM posts  
WHERE published_at BETWEEN '2024-01-01' AND '2024-03-31';  

/* ✅ Good */  
CREATE INDEX idx_posts_date ON posts(published_at);

🧠 Why this matters:
Indexes act like a book’s index, they let the database jump directly to relevant rows instead of scanning every page. Date-range queries become 10x faster.

4️⃣ Break Queries into CTEs

/* 🚫 Bad */  
SELECT p.id  
FROM posts p  
JOIN interactions i ON p.id = i.post_id  
GROUP BY p.id  
HAVING SUM(i.points) > (  
  SELECT AVG(total) FROM (  
    SELECT SUM(points) AS total FROM interactions GROUP BY post_id  
  ) sub  
);  

/* ✅ Good */  
WITH post_totals AS (  
  SELECT post_id, SUM(points) AS total  
  FROM interactions  
  GROUP BY post_id  
)  
SELECT p.id  
FROM posts p  
JOIN post_totals pt ON p.id = pt.post_id  
WHERE pt.total > (SELECT AVG(total) FROM post_totals);

🧠 Why this matters:
CTEs turn nested spaghetti logic into reusable building blocks. You calculate post totals once and reference them twice – no repeated aggregation.

5️⃣ Avoid `SELECT *` in Subqueries

/* 🚫 Bad */  
SELECT n.name,  
  (SELECT * FROM posts p WHERE p.newsletter_id = n.id) AS posts  
FROM newsletters n;  

/* ✅ Good */  
SELECT n.name,  
  (SELECT COUNT(id) FROM posts p WHERE p.newsletter_id = n.id) AS post_count  
FROM newsletters n;

🧠 Why this matters:
Subqueries with SELECT * pull entire rows even if you only need one value. This is like ordering a pizza to eat just one slice, wasteful and slow.

6️⃣ Use `JOIN` Instead of Subqueries for Filters

/* 🚫 Bad */  
SELECT id FROM newsletters  
WHERE id IN (SELECT newsletter_id FROM posts WHERE name LIKE '%SQL%');  

/* ✅ Good */  
SELECT DISTINCT n.id  
FROM newsletters n  
JOIN posts p ON n.id = p.newsletter_id  
WHERE p.name LIKE '%SQL%';

🧠 Why this matters:
Most databases optimize JOIN filters better than IN subqueries. Joins can leverage indexes on posts.name, while subqueries often trigger full scans.

🔍 You now have a query that:

Runs in milliseconds instead of minutes
Uses 70% less memory by avoiding SELECT * bloat
Scales gracefully with strategic indexing
Stays readable through CTEs and modular logic

🏁 Conclusion

SQL optimization is iterative, not one-time. Start with:

The slowest query in your system
The most frequently run query
The query with the most extensive dataset

Remember: A 10% improvement on a query running 1000x/day saves more resources than a 50% improvement on a query running once a week.

Once you start optimizing, your SQL Query will become much better.

Once you understand it, you won't revert.

👉🏻 SQL playground with the optimization example

How to Get Started 🚀

Over the previous weeks, we already going through:

✅ SQL Fundamentals
✅ Intermediate SQL
✅ Advanced SQL
✅ Database Operations
✅ Writing Efficient Queries

In the next issues, we will start working on real-world projects and solving hands-on data problems.

What’s Next? ➡️

To get the full experience and fully immersed in the learning:

👉 Subscribe to Databites.tech (By Josep)

👉 Subscribe to Non-Brand Data (By Cornellius)

👉 Check out the SQL Crash Course GitHub repo

👉 Share with your friend and whoever needs it!

🗓️ Every Thursday, you will have two new issues in your inbox!

Let’s dive in and make SQL less scary, more fun, and way more useful! 🚀

Josep & Cornellius

Non-Brand Data

#17 Indexing and Optimization

Discussion about this post

Ready for more?

Non-Brand Data

#20 Query Optimization

SQL Crash Course #20

🧱 What is SQL Query Optimization?

#17 Indexing and Optimization

✅ When to Optimize SQL Queries?

1️⃣ Use EXISTS Instead of JOIN + DISTINCT

2️⃣ Select Only What You Need

3️⃣ Index Strategically

4️⃣ Break Queries into CTEs

5️⃣ Avoid SELECT * in Subqueries

6️⃣ Use JOIN Instead of Subqueries for Filters

Putting It All Together

🔍 You now have a query that:

🏁 Conclusion

How to Get Started 🚀

What’s Next? ➡️

Discussion about this post

Ready for more?

1️⃣ Use `EXISTS` Instead of `JOIN + DISTINCT`

5️⃣ Avoid `SELECT *` in Subqueries

6️⃣ Use `JOIN` Instead of Subqueries for Filters