Friday, February 13, 2015

Thoughts and notes from Prague PostgreSQL Developers Day 2015

The day started with the news in PostgreSQL 9.4, presented by Tomáš Vondra, now working for 2ndQuadrant. He talked about improvements in replication, GIN indexes, numeric aggregates, refreshing materialized views, altering system variables directly from the daemon, and more.

Then Pavel Stěhule from GoodData gave us a somewhat deeper review of storing unstructured data within PostgreSQL (from historic implementations to the great new JSONB type in 9.4). It was quite interesting to see that even traditional relational DB users think seriously about NoSQL solutions like MongoDB. It seems even advocates of traditional SQL solutions see some advantages in the new way of storing data (NoSQL concepts).

It was also mentioned that NoSQL is understood not only as "not SQL" but more often as "not only SQL", and that while relational databases implement features to get closer to NoSQL, NoSQL databases implement features to get a bit closer to traditional SQL solutions (for example better ACID guarantees and reliability).

Vratislav Beneš from OptiSolutions gave a talk about unstructured and sensor data, and about big data challenges in general. He presented a comparison between PostgreSQL 9.4 (with JSONB) and MongoDB for the same type of work with unstructured data. His testing showed no big differences in performance; PostgreSQL is, however, considerably better in space utilization (~3 times less data on disk). He closed his talk with the thought that everybody should choose a database carefully, according to the specific use case (e.g. whether we care about other NoSQL features like out-of-the-box sharding/replication, or we're fine with a somewhat more heavy-footed but more reliable SQL solution).

Aliaksandr Aliashkevich gave quite a basic overview of sharding solutions for PostgreSQL. Just read his shards.io website for more information; the presentation had basically the same content.
A similar overview, this time of other open-source tools (dumping, upgrading, partitioning, logical replication), was given by Keith Fiske, the author of those tools. They are all available under his username on GitHub, so just look there or check his site keithf4.com for more information.

Marc Balmer from micro systems spoke about securing PostgreSQL applications, emphasizing the need to limit access not only at the application level but also at the database level. He did give a basic overview of the main database vulnerabilities using specific examples (like SQL injection), but most of the presentation was about ways to protect data within a database from being abused by users who shouldn't have access to it. I think nobody who was there and cares about security will ever use a single superuser role to access a DB again. Hopefully the slides will soon be on the p2d2 site.

Petr Jelínek from 2ndQuadrant talked about a feature that is heading to PostgreSQL 9.next (not sure yet whether it makes it into 9.5) -- BDR, Bi-Directional Replication. It basically implements multi-master capabilities and is already a working solution, which is heading to PostgreSQL upstream patch by patch. We know upstream is careful about any new feature, so in this case too it goes rather slowly, because upstream needs to be sure the feature is good for everybody (not only for specific use cases).

Štěpán Bechynský showed Amazon's AWS in practice, especially the steps needed to get a PostgreSQL machine in the cloud (basically the PaaS variant). It must have been interesting for anybody who hasn't seen how provisioning in the cloud works from the user's point of view, but since I had already seen OpenStack in action, I didn't actually learn much new. I heard from other attendees as well that some practical experience was missing -- for example performance numbers or the specific issues someone new to the cloud world runs into.

Since PostgreSQL now supports foreign data wrappers, a couple of interesting wrappers are already available. They add new functionality to the daemon itself and remind me a bit of MySQL's storage engine architecture. Jan Holcapek introduced one of those wrappers -- cstore_fdw, which adds columnar storage capabilities to PostgreSQL.

It was interesting to see that even on some uncomplicated queries the EXPLAIN command showed several times fewer I/O operations compared to native PostgreSQL storage. For more complicated use cases the difference may be even bigger, since going through columns directly for aggregation is much more efficient than reading all the rows, where most of the data read is unnecessary for the specific query.
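
To make the row-versus-column argument concrete, here is a toy Python sketch. It is not how cstore_fdw works internally; the table, the column names and the "values read" counter are all made up for illustration. It only shows that summing one column touches far fewer values when the data is laid out by column.

```python
# Toy comparison of row-oriented and column-oriented layouts.
# Everything here is illustrative; it does not model cstore_fdw itself.

# Row layout: each record keeps all of its columns together.
rows = [
    {"id": i, "customer": f"customer-{i % 10}", "note": "x" * 20, "amount": i * 3}
    for i in range(1000)
]

# Column layout: one list of values per column.
columns = {name: [row[name] for row in rows] for name in rows[0]}


def sum_amount_row_store(rows):
    """Sum 'amount', counting every value a row store has to read."""
    total = 0
    values_read = 0
    for row in rows:
        values_read += len(row)  # the whole row is fetched, not just 'amount'
        total += row["amount"]
    return total, values_read


def sum_amount_column_store(columns):
    """Sum 'amount' by reading only that single column."""
    values = columns["amount"]
    return sum(values), len(values)


row_total, row_reads = sum_amount_row_store(rows)
col_total, col_reads = sum_amount_column_store(columns)
print(row_total == col_total, row_reads, col_reads)
```

With four columns the row layout reads four times as many values for the same result; with wider tables the gap grows accordingly.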

Even though this particular foreign data wrapper doesn't support insert/update/delete, it seemed very promising. It was also interesting when Jan asked the audience which column database they use. Since MonetDB was mentioned more than once, it seems to be a good candidate for packaging into Fedora. Who is volunteering for this?

Tuesday, February 03, 2015

Fosdem notes, or what's happening in the open-source world

I had a great opportunity to be at Fosdem, an enormous conference with several thousand attendees that took place last weekend in Brussels. The following are some notes from the talks I attended.

Python & PostgreSQL, a Wonderful Wedding

This was more or less a summary of the various ways Python and PostgreSQL may interact. A couple of Python libraries were introduced (psycopg2, SQLAlchemy, Alembic), followed by some Python-related PostgreSQL extensions (like PL/Python and Multicorn).
Multicorn especially gives Python developers quite a wide range of options on top of PostgreSQL. It basically allows you to write a foreign data wrapper in Python, so whatever data you can get with Python (be it a text file, embedded DB, XML document or another database), you can then access within the database using SQL.
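
As a rough idea of what such a wrapper looks like, here is a minimal sketch of a Multicorn-style class that exposes the lines of a text file as rows. The class name, the `path` option and the two column names are my own assumptions, not something shown in the talk. The fallback base class exists only so the sketch runs outside PostgreSQL, where the `multicorn` module is not importable.

```python
import os
import tempfile

try:
    from multicorn import ForeignDataWrapper
except ImportError:
    # Stand-in so the sketch can run without PostgreSQL and Multicorn.
    class ForeignDataWrapper:
        def __init__(self, options, columns):
            pass


class TextFileWrapper(ForeignDataWrapper):
    """Hypothetical wrapper: every line of a text file becomes one row."""

    def __init__(self, options, columns):
        super().__init__(options, columns)
        self.path = options["path"]  # 'path' is an assumed option name
        self.columns = columns

    def execute(self, quals, columns):
        # Yield one dict per row; filtering by quals is left to PostgreSQL.
        with open(self.path) as handle:
            for number, line in enumerate(handle, start=1):
                yield {"line_number": number, "content": line.rstrip("\n")}


# Exercise the class directly, without any database involved.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("first\nsecond\n")

wrapper = TextFileWrapper({"path": tmp.name}, {"line_number": None, "content": None})
result = list(wrapper.execute(quals=[], columns={"line_number", "content"}))
os.remove(tmp.name)
print(result)
```

Inside PostgreSQL, a class like this would be referenced from a foreign server definition and then queried through a foreign table like any other table.
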
Schedule entry here.

Nix, NixOS, NixOps

This seemed to be a somewhat chaotic demo of the Nix* tools and OS. I expected to get some ideas about solving the problem of packaging multiple versions, but the talk was more focused on Vagrant-like provisioning of virtual environments.
It was also a bit surprising that the speaker didn't know environment modules exist, so obviously he was not able to say how the two differ.
But hearing some guys from the audience say that environment modules solve many issues for them was quite interesting for me, especially in relation to Software Collections.
Schedule entry here.

Upstream Downstream: The relationship between developer and package maintainer

The talk was given by a guy from Oracle who is, by the way, responsible for maintaining relationships with MySQL package maintainers in other distros. He tried to show that Oracle is not that evil and that they want to fix bugs if they know about them (which is not always the case).
It was a somewhat less technical talk, but some interesting numbers were mentioned -- the number of developers on the MySQL project has recently doubled and the number of QE engineers has tripled.
When I asked him directly what will happen with MySQL in the future, I got quite a straight answer: I shouldn't be afraid, it won't become any more closed.
We also talked with Otto, the MySQL/MariaDB Debian maintainer, and Colin, the MariaDB architect, about the need to bring packages in the various distros closer together, so that we unify the users' experience.
Schedule entry here.

Modern SQL in PostgreSQL

This was a half technical, half motivational talk meant to convince developers to use the latest features of SQL databases and of the SQL standard. On a few examples we saw how a single keyword that most DB developers don't know may help readability or even performance.
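
These notes don't record which keywords were shown, so as an illustration only I'm assuming `WITH` (a common table expression), which lets a query name an intermediate result instead of repeating a subquery. To keep the sketch runnable without a PostgreSQL server it uses SQLite from Python's standard library; the table and data are made up.

```python
import sqlite3

# In-memory database with a small, invented orders table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (customer TEXT, amount INTEGER);
    INSERT INTO orders VALUES
        ('alice', 10), ('alice', 30), ('bob', 5), ('carol', 50);
""")

# WITH names the per-customer totals once, so the outer query can use them
# twice (in FROM and in the subquery) without repeating the aggregation.
query = """
    WITH totals AS (
        SELECT customer, SUM(amount) AS total
        FROM orders
        GROUP BY customer
    )
    SELECT customer, total
    FROM totals
    WHERE total > (SELECT AVG(total) FROM totals)
    ORDER BY customer
"""
big_spenders = conn.execute(query).fetchall()
print(big_spenders)
conn.close()
```

The query text uses only standard SQL, so the same statement should work in PostgreSQL as well.
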
It was interesting to see when some of the latest SQL standard features were implemented in various databases. IBM's DB2 or Oracle Database usually supported the features even before, or shortly after, they were introduced in the SQL standard, while PostgreSQL and MS SQL Server usually implemented them a few years later.
MySQL has supported almost none of the mentioned features, which showed that MySQL is better suited to simple use cases and less advanced (traditional) usage. PostgreSQL and MS SQL Server, on the other hand, follow the development of Oracle DB and DB2, being a few years behind.
Schedule entry here.

ProxySQL : High Availability and High Performance Proxy for MySQL

ProxySQL is a promising tool that may help with setting up even complicated deployments with various replication schemes, since the proxy server communicates with either a MySQL server or another ProxySQL daemon using the pure MySQL protocol. Thanks to that, it can treat different queries differently, according to defined regular expressions.
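
The idea of regex-driven routing can be sketched in a few lines of Python. This is not ProxySQL's configuration format or code; the rules, the destination names and the first-match-wins policy are assumptions made for the example.

```python
import re

# Hypothetical routing rules: (pattern, destination). First match wins.
RULES = [
    # Locking reads have to go to the server that accepts writes.
    (re.compile(r"^\s*SELECT\b.*\bFOR\s+UPDATE\b", re.IGNORECASE | re.DOTALL), "primary"),
    # Plain reads can be served by a replica.
    (re.compile(r"^\s*SELECT\b", re.IGNORECASE), "replica"),
]
DEFAULT_DESTINATION = "primary"  # everything else is treated as a write


def route(query):
    """Return the destination name for a query based on the regex rules."""
    for pattern, destination in RULES:
        if pattern.search(query):
            return destination
    return DEFAULT_DESTINATION


for sql in (
    "SELECT * FROM t",
    "select id from t where x = 1 for update",
    "INSERT INTO t VALUES (1)",
):
    print(route(sql), "<-", sql)
```

Ordering matters in such a scheme: the more specific locking-read rule has to come before the generic `SELECT` rule, otherwise it would never be reached.
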
It was almost the same talk as last year, except that some bugs have probably been fixed. Anyway, the tool seems worth packaging for Fedora and playing with at some point.
Schedule entry here.

User-land and developer-land chat

An interesting talk about communication within an open source project -- designers vs. devels vs. users. It was nice to see the other side of the relationship, for example that users often don't understand why bug reports get closed or why some design changes happen, which often leads to frustration.
Simple things may help, like indicating briefly when closing a bug that the devels still want to collaborate and just don't have enough data. It may be that simple to keep users from feeling ignored.
Another example, Gnome 3 with its initially problematic and later quite wide adoption, showed that users may be happy even when devels make rapid changes in the GUI, but only after the needed plugins were introduced, which brought some of their favorite features back.
Another issue was that even if upstream is open to collaboration, people still need some boundaries, so setting up guidelines similar to the Tango Icon Theme Guidelines may help produce consistent output, and collaboration works much better with them.
The last thought was dedicated to the need for non-technical people, since devels are often reluctant to write the necessary documentation, maintain a proper social network presence, etc.
These are all topics that everyone involved in OSS may want to think about.
Schedule entry here.