A Fundamental Guide to SQL Query Optimization

Koushik Thota
4 min read · Mar 20, 2022

Best practices which made me write performant queries


SQL query optimization is the process of choosing the most efficient way to execute a SQL statement. It is an essential part of any application, since faster queries translate directly into a better user experience.

Even small changes to a SQL query can improve performance drastically. Over time I have settled on a few best practices that have helped me improve query performance, and I hope they will help anyone looking to write performant queries.

Prefer UNION ALL to UNION

UNION removes duplicate rows from the combined result, which typically requires an extra sort or hash of all the data. If duplicates are acceptable, or cannot occur in the first place, use UNION ALL: it skips the de-duplication step entirely, so the query is more efficient.
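For example, when combining two yearly archive tables whose rows are known to be disjoint (table and column names here are illustrative):

```sql
-- UNION sorts/hashes the combined result to drop duplicates:
SELECT order_id FROM orders_2020
UNION
SELECT order_id FROM orders_2021;

-- If the two sets cannot overlap (or duplicates are fine),
-- UNION ALL returns the rows as-is, with no de-duplication pass:
SELECT order_id FROM orders_2020
UNION ALL
SELECT order_id FROM orders_2021;
```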

Use TRUNCATE instead of DELETE to delete all rows in a table

The TRUNCATE statement removes all rows from a table in one operation; depending on the database, it either deallocates the table's data pages or drops and recreates the table. Either way, it performs far better than DELETE, which removes and logs rows one at a time.
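A minimal comparison, assuming a hypothetical staging table you want to empty completely:

```sql
-- DELETE logs every row individually (and fires row-level
-- delete triggers), which is slow on large tables:
DELETE FROM staging_events;

-- TRUNCATE clears the whole table in one cheap operation.
-- Note it cannot take a WHERE clause: it is all-or-nothing.
TRUNCATE TABLE staging_events;
```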

Drop index before loading bulk data

Inserting thousands of rows into a table that has indexes slows down ingestion, because every index must be updated for each inserted row.

In such cases it's preferable to drop the indexes before loading the table. Once the load is done, recreate the indexes on the table.
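The pattern looks like this (index, table, and column names are illustrative, and the exact DROP INDEX syntax varies slightly between databases):

```sql
-- 1. Remove the index so inserts don't have to maintain it:
DROP INDEX idx_sales_customer;

-- 2. Bulk load the data with no index overhead:
INSERT INTO sales
SELECT * FROM sales_staging;

-- 3. Rebuild the index once, over the full data set:
CREATE INDEX idx_sales_customer ON sales (customer_id);
```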

Avoid too many subquery or CTE joins

Joining many subqueries or CTEs can force the database to hold all of their rows in memory, constantly spilling data to disk to carry out the join, so the query takes a long time to finish.

One solution is to load the subqueries/CTEs into intermediate (temp) tables first and then join those tables. The join then works from pre-computed data already on disk instead of re-evaluating every subquery and holding the results in memory.
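As a sketch of the pattern (PostgreSQL-style temp-table syntax; all names are illustrative):

```sql
-- Materialize the expensive aggregation once, up front:
CREATE TEMP TABLE daily_totals AS
SELECT customer_id, SUM(amount) AS total
FROM orders
GROUP BY customer_id;

-- Then join against the pre-computed temp table instead of
-- repeating the aggregation inside a CTE or subquery:
SELECT c.name, d.total
FROM customers c
JOIN daily_totals d ON d.customer_id = c.id;
```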

This one change helped me cut a query's execution time from 7+ hours down to just 10 minutes. That's just ridiculous!

Use Data Compression Whenever Possible

This works really well for columnar data warehouses like Redshift and Snowflake, and also for row-based databases like SQL Server.

Most of the time, reading from the disk subsystem takes longer than decompressing the data. If the bottleneck is at the I/O level, compression helps reduce the size of the table or index on disk and speeds up query execution.
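In SQL Server, for instance, compression can be enabled when rebuilding a table (the table name here is illustrative):

```sql
-- SQL Server: rebuild the table with page-level compression,
-- trading a little CPU on reads for much less disk I/O.
ALTER TABLE big_fact_table
REBUILD WITH (DATA_COMPRESSION = PAGE);
```

Columnar warehouses like Redshift and Snowflake apply their own column encodings largely automatically, so there is usually nothing extra to configure.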

Use Same Datatype on JOIN and WHERE Clauses

When joining or comparing two fields with different datatypes, the database must convert one of them on the fly before it can compare them, even if the fields are indexed. This conversion costs resources and can prevent an index from being used. If mismatched datatypes are unavoidable, cast explicitly, and where possible cast the side that is not indexed so the index remains usable.
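A sketch of the idea, assuming a hypothetical orders table whose customer_id column is VARCHAR (exact conversion behavior varies by database):

```sql
-- The integer literal may force an implicit conversion of the
-- indexed VARCHAR column, preventing an index seek:
SELECT * FROM orders WHERE customer_id = 42;

-- Casting the literal (the non-indexed side) keeps the comparison
-- in the column's own type, so the index can still be used:
SELECT * FROM orders WHERE customer_id = CAST(42 AS VARCHAR(10));
```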

Make sure your queries take advantage of INDEXING

One of the most common things folks run into when experiencing performance issues with database queries is a lack of adequate indexing.

Which columns you should index usually depends on the columns you’re filtering by (i.e., which columns typically end up in your WHERE clauses). If you find that you’re always filtering by a common set of columns, you should consider indexing those columns.
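For instance, if your queries routinely filter on the same pair of columns (names below are illustrative):

```sql
-- A query that always filters on status and created_at ...
SELECT * FROM orders
WHERE status = 'shipped' AND created_at >= '2022-01-01';

-- ... benefits from a composite index on those columns, letting
-- the database seek to matching rows instead of scanning the table:
CREATE INDEX idx_orders_status_created
ON orders (status, created_at);
```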

Avoid correlated subqueries unless required

A correlated subquery depends on values from the outer query, so it must be re-executed once per outer row, which decreases the overall speed of the process.

Avoid

SELECT c.Name, c.City,
(
SELECT CompanyName
FROM Company
WHERE ID = c.CompanyID
) AS CompanyName
FROM Customer c

Prefer

SELECT c.Name, c.City, co.CompanyName
FROM Customer c
LEFT JOIN Company co
ON c.CompanyID = co.ID

Use Equals Operator (=) instead of LIKE clause

Both '=' and LIKE are used to match rows against a particular value. The main difference is that LIKE supports wildcards such as % for matching partial strings, whereas the equals operator '=' looks only for exact matches.

If you can choose between the two, prefer the equals operator ('='): with an exact value, the database can use an indexed column to seek straight to the matching rows. Use LIKE only when you genuinely need pattern matching, and be aware that a pattern with a leading wildcard (e.g. '%abc') cannot use an index at all.
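The three cases, assuming a hypothetical users table with an index on email:

```sql
-- Exact match: an index seek straight to the row.
SELECT * FROM users WHERE email = 'alex@example.com';

-- Trailing wildcard: the index can still be used as a range scan.
SELECT * FROM users WHERE email LIKE 'alex%';

-- Leading wildcard: the index cannot help; this scans every row.
SELECT * FROM users WHERE email LIKE '%example.com';
```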

Avoid concatenation of columns in WHERE clause

Avoid concatenating multiple columns in a WHERE clause whenever possible. If a query contains such a concatenation, break it into separate conditions on the individual columns, so that any indexes on those columns can still be used.

Avoid

SELECT name, surname
FROM class
WHERE name || surname = 'Alexmercer'

Prefer

SELECT name, surname
FROM class
WHERE name = 'Alex'
AND surname = 'mercer'

Avoid SELECT DISTINCT for large tables

The SELECT DISTINCT clause returns unique entries from a query by removing duplicates. However, that de-duplication is computationally expensive on large result sets, so avoid it unless it's truly necessary.
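Often DISTINCT is only papering over a join that multiplies rows; in that case an EXISTS check can return the same result without a de-duplication pass (table names here are illustrative):

```sql
-- DISTINCT here only compensates for the row-multiplying join:
SELECT DISTINCT c.name
FROM customers c
JOIN orders o ON o.customer_id = c.id;

-- EXISTS returns each customer at most once, with no extra
-- de-duplication step over a large joined result:
SELECT c.name
FROM customers c
WHERE EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.id);
```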
