Optimising PostgreSQL query performance — Understanding Explain Analyse (A beginning)
You are sitting peacefully sipping your favourite flavour of Caffeine when suddenly you get a notification that your Backend Server’s error rate is surging. There are a lot of timeouts, you got up from that comfy chair, almost spilling the coffee. You recollected that Marketing had just increased their budget and there is sudden load on the Application server and your DB. Duty is calling, you need to get up and start working on the resolution.
Sounds familiar? Well that’s a usual scenario in today’s fast paced world of Build Fast, Break Faster.
This article is a beginning in the series of DB optimisations based on Queries, Indexing, Parameter manipulations, and what not! Most of you working in Backend for a long time will already be knowing it, but I would like to share my learnings so that it will help someone someday struggling to figure out why their DB is behaving in a manner it is not intended to.
Issue
A query on one of our servers was taking 3.9 seconds to execute. Well 3.9 is a number which infers different meaning in different context. If you are Usain Bolt, and you took 3.9 seconds to run that 100 meters, BRAVO! But if you are a Database Administrator and you saw a query taking the same amount of time, you will begin to sweat. Sadly I am not Usain Bolt, and I was in the latter’s position. My task was cut out that day. I need to figure out what was the issue that was causing the havoc.
Investigation
I started by explain analysing the query and I got the following -
(NOTE: The name of the tables and columns are changed for obvious reasons.)
We can see from explain analyse of the query that bulk of the time is taken by Bitmap Heap Scan, approximately ~2.7 Seconds.
Our job is cut out to work on Bitmap Heap Scan part of the query analyser. It says the Recheck condition is working on two columns — india_four and xray_charlie. The DB was checking these two columns for the Recheck condition, maybe we can leverage by using a composite index on these two columns.
Solution
Because of the recheck condition on the two columns, we can create a composite index on india_four and xray_charlie. The following are the results obtained after creating the index -
Time taken by the query -
As you can see that there’s a change from Bitmap Heap Scan to Index Scan and the query execution time drastically reduced from 3.9 seconds to 181 ms.
The magic potion, the elixir was the composite index lurking and hiding in the 100s of lines of explain analyse. It’s important for a DBA/Backend Engineer to understand how to read the explain analyse and do the optimisations based on them. A good resource —
Understanding Bitmap Heap Scan and Index Scan -
- https://www.cybertec-postgresql.com/en/postgresql-indexing-index-scan-vs-bitmap-scan-vs-sequential-scan-basics/
- https://dba.stackexchange.com/questions/119386/understanding-bitmap-heap-scan-and-bitmap-index-scan
- https://www.youtube.com/watch?v=sMbQW7XNUZs
There are a lot of other optimisations that needs to be done over the query, but this is just a starting.
Conclusion
Understanding explain analyse is one of the most important part of a DBA/Backend Engineer job. It’s like reading between the lines, it can be daunting at times, but you just need to pay special attention to those lines and you will eventually get to THE ANSWER.
Engineering is all about finding such beautiful optimisations on a day to day basis that eventually helps somebody out there in the world relying on us to do our best. So give your best and world will be a beautiful place.
PS: I hope you like the article, please suggest if I am wrong anywhere, happy to learn :)