Non-Brand Data

Non-Brand Data

Home
Podcast
Chat
LinkedIn
👨‍💼Personalized Mentorship …
AI 🤖
Python 🐍
SQL crash course 🔢
RAG-To-Know🔍
Data Resource 🗂️
Like a Pro🏆
Data Career💼
About Non-Brand Data👇
Archive

Share this post

Non-Brand Data
Non-Brand Data
#20 Query Optimization

#20 Query Optimization

SQL Crash Course #20

Cornellius Yudha Wijaya's avatar
Josep Ferrer's avatar
Cornellius Yudha Wijaya
and
Josep Ferrer
May 16, 2025
8

Share this post

Non-Brand Data
Non-Brand Data
#20 Query Optimization
1
Share
Cross-post from Non-Brand Data
Let's keep our SQL learning on fire with the latest issue, understanding how to optimize queries! Go check it out now 👇🏻 -
Josep Ferrer

All the course material is stored in the SQL Crash Course repository.

Hi everyone! Josep and Cornellius Yudha Wijaya from Non-Brand Data here 👋🏻

As promised, today we are publishing the next two issues of our SQL Crash Course – From Zero to Hero! 🚀

I am sure you are here to continue our SQL Crash Course Journey!📚

If this is your first time or you’ve forgotten what we’ll cover, we will examine seven key SQL topics, each divided into multiple posts across Non-Brand Data and DataBites.

📚 Remember last time we already saw:

📌 #17 – Indexing and Optimization
📌 #18 – Generating Modular Code

In today’s topic, we will finish up the Crafting Good Query topic with:

📌 #19 – SQL Execution Order by

Josep Ferrer
in
DataBites

📌 #20 – Query Optimization, which is the current article you are reading!

So don’t miss out—let’s keep the SQL momentum going!


🧱 What is SQL Query Optimization?

SQL Query Optimization is a method that involves refining SQL queries.

Without optimization, SQL Query could:

  • Waste server resources

  • Crash under heavy loads

  • Frustrate users with delays

That’s why SQL Query Optimization comes to help you to:

  • Improve performance

  • Reduce resource consumption

  • Ensure efficient data retrieval

Many methods exist to optimize SQL queries, including the indexing method we have learned previously.

#17 Indexing and Optimization

#17 Indexing and Optimization

Cornellius Yudha Wijaya and Josep Ferrer
·
May 8
Read full story

Although this article will further explore the SQL queries, you will still learn many new things.

✅ When to Optimize SQL Queries?

You don't always have to optimize your SQL queries. Here are situations when optimization may be necessary:

👉 Queries are slow or timing out
👉 Handling large datasets or complex joins
👉 Database load affects application performance
👉 Before scaling infrastructure to ensure resource efficiency

Let’s explore a few practical strategies to optimize SQL Queries! 👇

1️⃣ Use EXISTS Instead of JOIN + DISTINCT

/* 🚫 Bad */  
SELECT DISTINCT n.*  
FROM newsletters n  
JOIN posts p ON n.id = p.newsletter_id  
WHERE p.name = 'DataBites';  

/* ✅ Good */  
SELECT *  
FROM newsletters n  
WHERE EXISTS (  
  SELECT 1  
  FROM posts p  
  WHERE p.newsletter_id = n.id  
  AND p.name = 'DataBites'  
);  

🧠 Why this matters:
EXISTS stops scanning at the first match, while JOIN + DISTINCT processes all matching rows before deduping. This reduces I/O operations and avoids unnecessary data shuffling.


2️⃣ Select Only What You Need

/* 🚫 Bad */  
SELECT * FROM posts WHERE published_at >= '2024-01-01';  

/* ✅ Good */  
SELECT id, name, published_at  
FROM posts  
WHERE published_at >= '2024-01-01';  

🧠 Why this matters:
Unused columns bloat memory usage and network transfer. Explicit column selection can cut data volume by 60 %+ in large tables.


3️⃣ Index Strategically

/* 🚫 Bad */  
SELECT name FROM posts  
WHERE published_at BETWEEN '2024-01-01' AND '2024-03-31';  

/* ✅ Good */  
CREATE INDEX idx_posts_date ON posts(published_at);  

🧠 Why this matters:
Indexes act like a book’s index, they let the database jump directly to relevant rows instead of scanning every page. Date-range queries become 10x faster.


4️⃣ Break Queries into CTEs

/* 🚫 Bad */  
SELECT p.id  
FROM posts p  
JOIN interactions i ON p.id = i.post_id  
GROUP BY p.id  
HAVING SUM(i.points) > (  
  SELECT AVG(total) FROM (  
    SELECT SUM(points) AS total FROM interactions GROUP BY post_id  
  ) sub  
);  

/* ✅ Good */  
WITH post_totals AS (  
  SELECT post_id, SUM(points) AS total  
  FROM interactions  
  GROUP BY post_id  
)  
SELECT p.id  
FROM posts p  
JOIN post_totals pt ON p.id = pt.post_id  
WHERE pt.total > (SELECT AVG(total) FROM post_totals);  

🧠 Why this matters:
CTEs turn nested spaghetti logic into reusable building blocks. You calculate post totals once and reference them twice – no repeated aggregation.


5️⃣ Avoid SELECT * in Subqueries

/* 🚫 Bad */  
SELECT n.name,  
  (SELECT * FROM posts p WHERE p.newsletter_id = n.id) AS posts  
FROM newsletters n;  

/* ✅ Good */  
SELECT n.name,  
  (SELECT COUNT(id) FROM posts p WHERE p.newsletter_id = n.id) AS post_count  
FROM newsletters n;  

🧠 Why this matters:
Subqueries with SELECT * pull entire rows even if you only need one value. This is like ordering a pizza to eat just one slice, wasteful and slow.


6️⃣ Use JOIN Instead of Subqueries for Filters

/* 🚫 Bad */  
SELECT id FROM newsletters  
WHERE id IN (SELECT newsletter_id FROM posts WHERE name LIKE '%SQL%');  

/* ✅ Good */  
SELECT DISTINCT n.id  
FROM newsletters n  
JOIN posts p ON n.id = p.newsletter_id  
WHERE p.name LIKE '%SQL%';  

🧠 Why this matters:
Most databases optimize JOIN filters better than IN subqueries. Joins can leverage indexes on posts.name, while subqueries often trigger full scans.

Share


Putting It All Together

🔍 You now have a query that:

  1. Runs in milliseconds instead of minutes

  2. Uses 70% less memory by avoiding SELECT * bloat

  3. Scales gracefully with strategic indexing

  4. Stays readable through CTEs and modular logic

🏁 Conclusion

SQL optimization is iterative, not one-time. Start with:

  1. The slowest query in your system

  2. The most frequently run query

  3. The query with the most extensive dataset

Remember: A 10% improvement on a query running 1000x/day saves more resources than a 50% improvement on a query running once a week.

Once you start optimizing, your SQL Query will become much better.

Once you understand it, you won't revert.

👉🏻 SQL playground with the optimization example


How to Get Started 🚀

Over the previous weeks, we already going through:

✅ SQL Fundamentals
✅ Intermediate SQL
✅ Advanced SQL
✅ Database Operations
✅ Writing Efficient Queries

In the next issues, we will start working on real-world projects and solving hands-on data problems.


What’s Next? ➡️

To get the full experience and fully immersed in the learning:

👉 Subscribe to Databites.tech (By Josep)

👉 Subscribe to Non-Brand Data (By Cornellius)

👉 Check out the SQL Crash Course GitHub repo

👉 Share with your friend and whoever needs it!

🗓️ Every Thursday, you will have two new issues in your inbox!

Let’s dive in and make SQL less scary, more fun, and way more useful! 🚀

Josep & Cornellius

8

Share this post

Non-Brand Data
Non-Brand Data
#20 Query Optimization
1
Share

No posts

© 2025 Cornellius Yudha Wijaya
Privacy ∙ Terms ∙ Collection notice
Start writingGet the app
Substack is the home for great culture

Share