Saturday, February 08, 2020

Report from Prague PostgreSQL Developer Day 2020

Another year is behind us and the next P2D2 conference of PostgreSQL users and developers is ahead of us. Well, actually behind us now as well. Slides should soon be available at https://p2d2.cz/rocnik-2020/prednasky, but let me share what I personally found interesting.

What's new in PostgreSQL 12 (and 13)

The first talk is traditionally given by Tomas Vondra and is about what's new in the latest version of PostgreSQL, this time v12 and the upcoming v13. Among the things that seem interesting to me are improvements in GSSAPI support, which now supports new authentication mechanisms, and new progress-reporting views for vacuuming and index creation that allow admins to watch these resource-demanding operations. Performance can also be improved by reindexing concurrently. With PostgreSQL v12, users can also turn checksums on when they suspect disk trouble (or turn them off in case of performance issues).
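For example, assuming a table with bloated indexes (the table name here is made up), the rebuild can be done in v12 without blocking writes, and its progress watched from another session:

    -- Rebuild all indexes of a table without blocking writes (PostgreSQL 12+)
    REINDEX TABLE CONCURRENTLY orders;

    -- In another session, watch the progress of the index build
    SELECT pid, phase, blocks_done, blocks_total
    FROM pg_stat_progress_create_index;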
The pluggable table access method API will allow implementing and using different storage engines, similar to what MySQL offers. So we can slowly start looking forward to columnar storage or other innovative solutions.
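The syntax side of it is already visible in v12; heap is the only built-in table access method so far, so this sketch just spells out the default:

    -- Explicitly pick the table access method (PostgreSQL 12+);
    -- extensions could provide alternatives such as columnar engines
    CREATE TABLE events (
        id      bigint,
        payload text
    ) USING heap;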
Generated columns might be useful when developers need to store some calculated values.
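A minimal sketch of a stored generated column, with a made-up table:

    -- total is computed automatically from the other columns (PostgreSQL 12+)
    CREATE TABLE order_items (
        quantity   int     NOT NULL,
        unit_price numeric NOT NULL,
        total      numeric GENERATED ALWAYS AS (quantity * unit_price) STORED
    );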
The experience of working with JSON is improved as well: developers can now use new functions to work with JSON paths. And many more enhancements are coming in v13.
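For instance, the new SQL/JSON path functions can filter inside a document; the sample data here is invented:

    -- Return all prices above 100 using a jsonpath filter (PostgreSQL 12+)
    SELECT jsonb_path_query(
        '{"items": [{"price": 50}, {"price": 150}]}'::jsonb,
        '$.items[*].price ? (@ > 100)'
    );
    -- returns 150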

JavaScript inside PostgreSQL

Simone Sanfratello then talked about why and how to use JavaScript inside the PostgreSQL server, using plv8. Some examples of reasonable use are sanitizing values and doing more advanced string operations like URL encoding and decoding. It can be extra useful in combination with JSON, which is more and more common in the PostgreSQL world. What was a bit surprising is that, compared to plpython3, plv8 was faster in the testing scenarios and a bit closer to the native PostgreSQL implementation.
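A minimal sketch of what such a function can look like, assuming the plv8 extension is installed:

    CREATE EXTENSION IF NOT EXISTS plv8;

    -- URL-decode a value using JavaScript's built-in decodeURIComponent
    CREATE FUNCTION url_decode(input text) RETURNS text AS $$
        return decodeURIComponent(input);
    $$ LANGUAGE plv8 IMMUTABLE STRICT;

    SELECT url_decode('caf%C3%A9');  -- café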

Is JSONB a Silver Bullet?

Angus Dippenaar continued on the JSON and JSONB topic, first comparing the benefits of JSON with the cases where JSONB is better. Thanks to parsing and indexing, JSONB is better when we need to work with JSON data effectively and make PostgreSQL understand it. On the other hand, JSON is more efficient when we only need to store and load it as a string, because it is only validated for correct syntax.
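A small illustration of the difference: json stores the input text verbatim, while jsonb parses it into a binary form, so duplicate keys collapse and key order is not preserved:

    SELECT '{"b": 1, "a": 2, "a": 3}'::json;
    -- {"b": 1, "a": 2, "a": 3}

    SELECT '{"b": 1, "a": 2, "a": 3}'::jsonb;
    -- {"a": 3, "b": 1}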
The most interesting part of the talk was the slightly more practical one, where Angus showed performance data from working with Twitter data stored in PostgreSQL as JSON or JSONB. The comparison of the speed of storing and searching, as well as of the resulting table sizes, was very interesting.
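The searching side typically relies on a GIN index; a sketch with a made-up tweets table (the document structure loosely follows the Twitter API, but treat it as an assumption):

    -- GIN index to speed up containment queries on JSONB
    CREATE INDEX tweets_doc_idx ON tweets USING gin (doc jsonb_path_ops);

    -- Count tweets carrying a given hashtag via the containment operator @>
    SELECT count(*)
    FROM tweets
    WHERE doc @> '{"entities": {"hashtags": [{"text": "postgresql"}]}}';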

PostGIS in practice

Michal Páleník started after lunch, but didn't let the audience sleep long, because his demonstration of map projections on a mandarin was really fun. Since he was talking about PostGIS, the necessary geography basics came in handy, but most people probably appreciated the queries that showed how to solve several tasks with PostGIS -- for example, which Prague pub is the closest or the most remote; pretty useful. He connected that with the other side of the web app: with simple JavaScript it is possible to display geographic data quite easily. I also was not aware that besides OpenStreetMap there is http://project-osrm.org, a project that offers an open source routing machine.
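The closest-pub kind of query can look roughly like this, assuming a pubs table with a geometry column in SRID 4326 (names and coordinates are made up); the KNN operator <-> orders by distance and can use a GiST index:

    -- Five pubs nearest to a point in the centre of Prague
    SELECT name
    FROM pubs
    ORDER BY geom <-> ST_SetSRID(ST_MakePoint(14.42076, 50.08804), 4326)
    LIMIT 5;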

Experiences gained from optimizing an SQL query, and the consequences for testing the application

Aleš Zelený, as he likes to, shared real experiences from practice, to be concrete, experiences from disasters. He explained how a very simple task, at least that was his understanding at the beginning, turned into a real horror story: invalid data, real-world issues, weird behaviour of the PostgreSQL connector driver for R, and a non-trivial amount of data (dozens of GB). He obviously learned a lot on this task, and it was very interesting to hear the experiences he shared. One for all -- write meaningful comments, not ones that describe the code, but ones that describe the thinking of the author. And never trust data just because a sample looks OK. What also helped was decomposing the big query and analyzing and optimizing it part by part. The fact that the final SQL is big does not actually mean it is wrong; it may just not be readable and understandable. It is also handy to know the latest features of PostgreSQL, as covering indexes improved the overall performance as well. Overall, a very interesting talk.
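Covering indexes (PostgreSQL 11+) are worth a sketch: the INCLUDE columns are stored in the index without being part of the key, so a query touching only those columns can be answered by an index-only scan. Table and column names here are invented:

    CREATE INDEX orders_customer_idx
        ON orders (customer_id) INCLUDE (status, total);

    -- Can be satisfied from the index alone, without visiting the table
    SELECT status, total FROM orders WHERE customer_id = 42;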

Professional PostgreSQL scheduling made easy

Pavlo Golub then talked about their solution, aka yet another scheduler for PostgreSQL. This one has a Go backend and looked pretty interesting.

Monitoring PostgreSQL

Roman Fišer talked about monitoring in general and what makes sense to monitor -- basically a handy 101 for beginning PostgreSQL admins. Then he showed Prometheus, which seems to be one of the leading implementations in this field.
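Whatever the tooling, the numbers usually come from the built-in statistics views; two typical health checks as a sketch (the threshold is just a rule of thumb):

    -- Connections in use versus the configured limit
    SELECT count(*) AS connections,
           current_setting('max_connections') AS max_connections
    FROM pg_stat_activity;

    -- Cache hit ratio per database; values far below ~0.99 may deserve a look
    SELECT datname,
           round(blks_hit::numeric / nullif(blks_hit + blks_read, 0), 4) AS hit_ratio
    FROM pg_stat_database;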

PL/pgSQL profiling tools

Pavel Stehule also shared some experiences from the field, from what he has met on the customers' side. For example, customers sometimes do not use the tools that already exist. Some of the tools are available by default, with no need to install anything extra, so it's good to know about them -- for example the EXPLAIN ANALYZE statement, or the auto_explain module. These are later used by external profiling tools that can present pretty simple results, readable even for a less skilled admin. At the end, Pavel gave some extra tips on why queries might be too slow: for example, PostgreSQL doesn't keep statistics for functions the way MSSQL or Oracle do, and it doesn't have a plan cache. For complex queries, we can tell PostgreSQL to spend more time on optimizing, when it makes sense. There is also a way to see how much indexes are used, which might be a sign of either a wrong query or a really unneeded index that just slows down data-manipulation queries.
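A sketch of both tips, with an arbitrary threshold: enabling auto_explain for slow statements, and listing indexes that no scan has ever used:

    -- Log plans of statements slower than 500 ms
    -- (these settings can also go into postgresql.conf)
    LOAD 'auto_explain';
    SET auto_explain.log_min_duration = '500ms';
    SET auto_explain.log_analyze = on;

    -- Indexes never used by any scan are candidates for review
    SELECT schemaname, relname, indexrelname
    FROM pg_stat_user_indexes
    WHERE idx_scan = 0;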

So, again as every year, a lot of food for thought. A really great one-day conference.