Optimizing queries for not so big data in PostgreSQL

Speaker(s) Stephanie Mifsud

Sub Community: PyData

Hotjar’s user recordings count above 400 million, with supporting tables containing 4.5 billion records. This 5TB data fits nicely into Postgres and doesn’t quite merit the full big data suite of tools. However, at the rate of 1000 recordings per minute, and overall request rate of 750K per minute, the penalty of inefficient queries and updates can quickly cause nasty performance spikes if not thought out well. This talk is about the challenges we faced at the lower end of big data: the good decisions which helped keep our application running and other lessons we had to learn the hard way.

Considerations for Database Design

Design entities for the domain
Balance normalization with performance
Sharding later has big migration costs, consider designing for this early

Speak to the database from your Web Application

Why use ORMs and at which level of abstraction?
Stored Procedures are fast, should we have more of those?

Bringing data closer to the application

Materialize Views
Defer aggregations
Application Level Caching

Handling Operational Troubles

Explain(analyze, buffers) is your friend
Detect and manage Index Bloat
Reduce Deadlocks

Reducing Impact of Background Maintenance Jobs

Keep impact on database low with cursors and streaming
Plan data retention policies early, so cleaning can be an ongoing process

in on Thursday 13 July at 15:45 See schedule

Do you have some questions on this talk?

New comment

Comment:

Name:

Email address:

URL:

Captcha: