Sunday, November 22, 2015

Thoughts from PGConf.eu, PostgreSQL Europe Conference 2015, Day 1

It was my first PGConf.eu and it was awesome. My bad that I waited so long to share my thoughts, but I'm fixing that now with a summary. Links to the presentations are located at https://wiki.postgresql.org/wiki/PostgreSQL_Conference_Europe_Talks_2015.

The keynote by Tamara Atanasoska called The bigger picture: More than just code mentioned that a community project is first about people; code comes second. She also stressed how important it is to be open to newcomers and users. A great eye-opener for anybody involved in open source.

Upcoming PostgreSQL 9.5 Features talk by Bruce Momjian covered these new features: INSERT ... ON CONFLICT (upsert), aggregate function enhancements, in-memory performance improvements, JSON manipulation and operations, and improvements in foreign data wrappers, which can now be part of a sharding setup. I really can't wait for PostgreSQL 9.5 GA.
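Just to illustrate the headline feature, here is a minimal upsert sketch in the 9.5 syntax (the counters table is made up for the example):

```sql
-- Hypothetical table used only for this illustration
CREATE TABLE counters (name text PRIMARY KEY, hits bigint NOT NULL DEFAULT 0);

-- Insert a row, or bump the counter if the key already exists
INSERT INTO counters (name, hits)
VALUES ('homepage', 1)
ON CONFLICT (name) DO UPDATE
    SET hits = counters.hits + EXCLUDED.hits;
```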

Dockerizing a Larger PostgreSQL Installation: What could possibly go wrong? by Mladen Marinović was really something I was looking for, because containers are topic #1 for me right now. Mladen demonstrated using containers mostly as VMs; the basics of Docker were also introduced, since only half of the room was familiar with it. Problems with using the cache during `docker build` and with locales inside containers were mentioned, but the solutions seemed rather hacky to me (using `date` to trick the daemon instead of the straightforward `--no-cache`). The problem of sending signals to the processes was also mentioned, which is something we were also looking at while creating containers for OpenShift.

Mladen uses Ansible for building images and uses several containers for specific actions -- e.g. a separate container for tools (dump/load). They support replication -- hot_standby, restore_command (run in a separate container) -- and run on two physical servers. Backups (pg_basebackup + compression) according to a retention policy are handled by another container.

It was also mentioned that taking a backup from a slave is not that easy. Monitoring that all containers are alive is done by HA software. Problems with the OOM killer were also mentioned -- the system may kill your container processes; the solution was to limit max_connections. A problem with a freezing server was then solved by using a timeout for every command. Transparent huge pages were mentioned as something not to use -- use normal huge pages instead. Finally, upgrading to new major versions was mentioned as always being a tough point.
See more at: https://bitbucket.org/marin/pgconfeu2015/src

DBA's toolbelt by Kaarel Moppel was more of a list of tools to look at when you are serious about administering PostgreSQL. So just repeating the list may be interesting for someone:
  • docs + pgconfig for configuring
  • pgbench + pgbench-tools for benchmarking
  • pg_activity (top-like) + pg_view + pgstats + pg_stat_statements + pgstattuple + plugins for monitoring systems (Nagios) + postgresql-metrics (Spotify) for monitoring
  • pgBadger + pg_loggrep for log analysis
  • postgresql-toolkit - a Victorinox (Swiss Army knife) for the PostgreSQL DBA
  • pg-utils, pgx_scripts (https://github.com/pgexperts/pgx_scripts), acid-tools
  • pg_locks, pg_stat_activity (locks)
  • pgObserver, pgCluu, many others, based on what we need to do
  • pgadmin4 on the way (backend and web frontend)
  • developer: pgloader, pg_partman, pgpool-II, PL/Proxy
Managing PostgreSQL with Ansible by Gulcin Yildirim was more of an introduction to Ansible.
https://github.com/gulcin/pgconfeu2015

Let’s turn your PostgreSQL into columnar store with cstore_fdw by Jan Holčapek introduced an interesting plugin that can turn the classic row-based PostgreSQL storage into a column store by utilizing the foreign data wrapper concept. In some cases this may help performance a lot, once it is ready.
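A rough sketch of how such a columnar table is wired in (the table and its columns are made up; the compression option follows the cstore_fdw README as I remember it):

```sql
-- cstore_fdw must be installed and loaded via shared_preload_libraries first
CREATE EXTENSION cstore_fdw;
CREATE SERVER cstore_server FOREIGN DATA WRAPPER cstore_fdw;

-- A column-oriented table accessed through the FDW interface
CREATE FOREIGN TABLE measurements (
    sensor_id int,
    taken_at  timestamptz,
    value     numeric
) SERVER cstore_server
  OPTIONS (compression 'pglz');
```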

Performance improvements in 9.5 and beyond by Tomas Vondra was not only an interesting insight into the particular areas PostgreSQL hackers are looking at, but also a nice motivation to look at upgrading to 9.5. For example, a sorting speed-up of up to 3-4x compared to 9.4 in some cases is something I wouldn't really have expected. Another comparison, BRIN vs. BTREE indexes, showed that performance is quite similar, but BRIN is much, much smaller. Another set of graphs showed how parallel scan can cut select times down to half.
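To show what the BRIN comparison is about: both index types are created the same way and their on-disk sizes can be compared directly (the events table is hypothetical):

```sql
-- Same column, two index types (PostgreSQL 9.5+)
CREATE INDEX events_created_btree ON events USING btree (created_at);
CREATE INDEX events_created_brin  ON events USING brin  (created_at);

-- Compare how much space each index takes
SELECT relname, pg_size_pretty(pg_relation_size(oid))
FROM pg_class
WHERE relname IN ('events_created_btree', 'events_created_brin');
```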

Notes from the second day are here and from the third day here.

Saturday, November 21, 2015

How we've fixed upgrade path for CentOS 6 Software Collections

Short message for those who don't have time:
Software Collections for CentOS 6 are now ready to be upgraded from the older rebuilds.

Now the full story for those who care. Btw. all of this is related to work done by the SCLo SIG group that is part of CentOS (read more at http://wiki.centos.org/SpecialInterestGroup/SCLo).

A bit of history to begin with. Shortly after the first RHSCL 1.0 release, CentOS rebuilds were prepared, and since then they have been available under:
http://mirror.centos.org/centos/6/SCL/x86_64/

However, keeping these rebuilds in sync with the RHSCL content hasn't been an easy task. With the introduction of Java packages in the collections, this task became even trickier, which means these collections were not updated for a long time. With that said, one would expect there wouldn't be a problem with the upgrade path -- in other words, that the new RPMs the SCLo SIG group is about to release would update the older RPMs smoothly.

Well, not always. The original RPMs used ".el6.centos.alt" as the %dist tag, while the new builds use just ".el6", and that results in cases where python27-python-bson-2.5.2-4.el6.centos.alt.x86_64 > python27-python-bson-2.5.2-4.el6.x86_64, even though both packages have the same Release tag in the RPM SPEC. That obviously means the packages won't update smoothly.

The solution is quite simple in this case -- use a higher Release in the RPM SPEC. In some packages this was already done, because some packages have received updates since their original inclusion. In the other cases we solve it by adding a ".scX" suffix (X is a number) to the Release tag. The ".scX" suffix was chosen deliberately, since ".scX.el6" sorts higher (alphabetically) than ".el6".

Btw. for cases where we need to build a package several times before the final build (bootstrapping), we use the ".bsX" suffix, which means we can build the final package without any Release suffix in the end, because ".bsX.el6" < ".el6".

Anyway, this post was meant to let you know that the upgrade path of el6 packages from the originally built RPMs is something we care about.

To verify it works, I installed all the packages from the original repository and then ran "yum update", which resulted in a proper update of all packages. I take that as proof it should work fine in your case as well. If there are still some issues, let us know.

Enjoy Software Collections on CentOS!

Friday, February 13, 2015

Thoughts and notes from Prague PostgreSQL Developers Day 2015

The day started with news in PostgreSQL 9.4, presented by Tomáš Vondra, now working for 2ndQuadrant. He talked about improvements in replication, GIN indexes, numeric aggregates, refreshing materialized views, altering system variables directly from the daemon, ...

Then Pavel Stěhule from GoodData gave us a bit deeper review of storing non-structured data within PostgreSQL (from historic implementations to the great new JSONB in 9.4). It was quite interesting to see that even traditional relational DB users think seriously about NoSQL solutions like MongoDB. It seems even advocates of the traditional SQL solutions see some advantages in the new way to store data (NoSQL concepts).

It was also mentioned that NoSQL is understood not only as not-SQL, but more often as not-only-SQL, and that while relational databases implement features to get closer to NoSQL, NoSQL databases implement features to get a bit closer to traditional SQL solutions (better ACID, reliability, ...).

Vratislav Beneš from OptiSolutions had a talk about unstructured and sensor data, generally about big data challenges. He presented a comparison between PostgreSQL 9.4 (with JSONB) and MongoDB for the same kind of work with unstructured data. His testing showed there are no big differences in performance; PostgreSQL is, however, quite a bit better in space utilization (~3 times less data on disk). He closed his talk with the thought that everybody should choose a database carefully, according to the specific use case (e.g. whether we care about other NoSQL features like out-of-the-box sharding/replication, or whether we're fine with a bit more heavy-footed, but more reliable SQL solutions).
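For anyone who has not tried the JSONB side of such a comparison yet, a minimal sketch looks like this (the table and the data are made up):

```sql
-- JSONB column with a GIN index (PostgreSQL 9.4+)
CREATE TABLE documents (id serial PRIMARY KEY, doc jsonb);
CREATE INDEX documents_doc_idx ON documents USING gin (doc);

INSERT INTO documents (doc) VALUES ('{"type": "sensor", "value": 42}');

-- Containment query that can use the GIN index
SELECT id, doc->>'value' AS value
FROM documents
WHERE doc @> '{"type": "sensor"}';
```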

Aliaksandr Aliashkevich gave a quite basic overview of sharding solutions for PostgreSQL; just read his shards.io web for more information, the presentation had basically similar content.
A similar overview, this time of other open-source tools (dumping, upgrading, partitioning, logical replication), was given by Keith Fiske, the actual author of the tools. They are all available under his username on GitHub, so just look there or check his site keithf4.com for more information.

Marc Balmer from micro systems spoke about securing PostgreSQL applications, where he emphasized the need to limit access not only on the application level, but also on the database level. He didn't omit a basic overview of the main database vulnerabilities with some specific examples (like SQL injection), but most of the presentation was about ways to secure data within a database from being abused by users who shouldn't have access to it. I think nobody who was there and pays attention to security will ever use a single superuser role to access the DB again. Hopefully the slides will soon be on the p2d2 site.

Petr Jelínek from 2ndQuadrant talked about a feature that is heading to PostgreSQL 9.next (not sure yet if it makes 9.5) -- BDR, Bi-Directional Replication. This basically implements multi-master capabilities and it is already a working solution, which is heading patch by patch to PostgreSQL upstream. We know upstream is careful about any new feature, so even in this case it goes rather slowly, because upstream needs to be sure it is good for everybody (not only for specific use cases).

Štěpán Bechynský showed Amazon's AWS in practice, especially the steps to get a PostgreSQL machine in the cloud (basically the PaaS variant). It must have been interesting for anybody who hasn't seen how provisioning in the cloud works from the user's point of view, but for me, as I had already seen OpenStack in action, there wasn't actually much new to learn. I heard from other attendees as well that some specific experiences were missing -- like performance experiences, or what specific issues someone new to the cloud world runs into, etc.

Since PostgreSQL now supports foreign data wrappers, a couple of interesting wrappers are already available. They add some new functionality to the daemon itself and remind me a bit of MySQL's storage engine architecture. Jan Holcapek introduced one of those wrappers -- cstore_fdw, which adds columnar storage capabilities to PostgreSQL.

It was interesting to see that even on some uncomplicated queries the EXPLAIN command showed several times fewer IO operations compared to native PostgreSQL storage. For more complicated use cases that may be even better, since going through columns directly for aggregation is much more effective than reading whole rows where most of the data is unnecessary for the specific query.

Even though this particular foreign data wrapper doesn't support insert/update/delete yet, it seemed very promising. It was also interesting when Jan asked the audience what column database they use. Since MonetDB was mentioned more than once, it seems to be a good candidate for packaging into Fedora. Who is volunteering for this?

Tuesday, February 03, 2015

Fosdem notes alias what's happening in open-source world

I had a great opportunity to be at FOSDEM, an enormous conference with several thousand attendees that took place last weekend in Brussels. The following are some notes from talks I attended.

Python & PostgreSQL, a Wonderful Wedding

This was more or less a summary of the various ways Python and PostgreSQL may interact. A couple of Python libraries were introduced (psycopg2, SQLAlchemy, Alembic), then some Python-related PostgreSQL extensions (like PL/Python, Multicorn).
Especially Multicorn gives Python developers a quite wide range of options on top of PostgreSQL: it basically allows writing a foreign data wrapper in Python, so whatever data you can get with Python (be it a text file, an embedded DB, an XML document or another database), you can then access within the database using SQL.
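From the SQL side, using a Multicorn-based wrapper looks roughly like this sketch with the CSV wrapper that ships with Multicorn (the class path and options are from the Multicorn docs as far as I remember them; the file and columns are made up):

```sql
CREATE EXTENSION multicorn;

-- The server points at a Python class implementing the wrapper
CREATE SERVER csv_srv FOREIGN DATA WRAPPER multicorn
    OPTIONS (wrapper 'multicorn.csvfdw.CsvFdw');

-- The foreign table is backed by a plain CSV file on disk
CREATE FOREIGN TABLE people (
    name text,
    age  int
) SERVER csv_srv
  OPTIONS (filename '/tmp/people.csv', delimiter ',');

SELECT * FROM people WHERE age > 30;
```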
Schedule entry here.

Nix, NixOS, NixOps

This seemed to be a bit of a chaotic demo of the Nix* tools and OS. I expected to get some ideas about solving the packaging-more-versions issue, but the talk was more focused on Vagrant-like provisioning of virtual environments.
It also seemed a bit odd that the speaker didn't know about the existence of environment modules, so obviously he was not able to tell what the difference is.
But hearing from some guys in the audience that environment modules solve many issues for them was quite interesting for me, especially in relation to Software Collections.
Schedule entry here.

Upstream Downstream: The relationship between developer and package maintainer

A talk given by an Oracle guy who is, by the way, responsible for maintaining relationships with MySQL package maintainers in other distros. He tried to show that Oracle is not that evil and that they want to fix bugs if they know about them (which is not happening in some cases).
It was a bit less technical talk, where some interesting numbers were also mentioned -- the developer count on the MySQL project has doubled and the QA count has tripled recently.
Even after my straight personal question about what happens with MySQL in the future, I got a quite straight answer that I shouldn't be afraid, it won't get more closed.
We also talked, together with the MySQL/MariaDB Debian maintainer Otto and MariaDB architect Colin, about the need to bring the packages in various distros closer together, so we unify the users' experience.
Schedule entry here.

Modern SQL in PostgreSQL

This was a half technical and half motivational talk that was supposed to convince developers to use the latest features of SQL databases and of the SQL standard. On a few examples we saw how a single keyword that is unknown to the majority of DB developers may help readability or even performance.
It was interesting to see when some of the latest SQL standard features were implemented in various databases. Usually IBM's DB2 or the Oracle database supported a feature even before, or shortly after, its introduction in the SQL standard, while PostgreSQL and MS SQL Server usually implemented those features a few years later.
And MySQL has hardly ever supported the mentioned features, which proved that MySQL is better suited to simple use cases and less advanced (traditional) usage. On the other hand, PostgreSQL and MS SQL Server follow Oracle DB's and DB2's development, a few years behind.
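As one illustration of the kind of feature the talk had in mind (my own example, not taken from the slides): a window function and the standard FETCH FIRST clause replace a self-join and an engine-specific LIMIT, and read quite naturally:

```sql
-- Rank employees by salary within each department (hypothetical table)
SELECT department, employee, salary,
       rank() OVER (PARTITION BY department ORDER BY salary DESC) AS salary_rank
FROM employees
ORDER BY department, salary_rank
FETCH FIRST 10 ROWS ONLY;
```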
Schedule entry here.

ProxySQL : High Availability and High Performance Proxy for MySQL

ProxySQL is a promising tool that may help with setting up even complicated deployments with various replication schemes, since the proxy server communicates either with the MySQL server or with another ProxySQL daemon using the pure MySQL protocol. Thanks to that, it can behave differently for various queries, according to defined regular expressions.
It was almost the same talk as last year, except some bugs have probably been fixed since then. Anyway, the tool seems worth packaging for Fedora and playing with at some point.
Schedule entry here.

User-land and developer-land chat

An interesting talk about communication within an open source project -- designers vs. developers vs. users. It was nice to see the opposite side of the relationship, for example that users often don't understand why bug reports get closed and why some changes in design happen, which often leads to their frustration.
Simple things help, like briefly indicating, when closing a bug, that the developers still want to collaborate, they just don't have enough data. It may be that simple, and users won't feel like they are ignored.
Another example, Gnome 3 with its initially problematic and later quite wide adoption, showed that even if developers make rapid changes in the GUI, users may end up feeling good, but only after some needed plugins have been introduced, which brought some of their favourite features back.
Another point was that even if upstream is open to collaboration, people still need some boundaries, so setting up guidelines similar to the Tango Icon Theme Guidelines may help produce consistent output, and collaboration then works much better.
The last thought was dedicated to the need for non-technical people, since developers are often reluctant to write necessary documentation, maintain the proper social networks, etc.
These are all topics everyone involved in OSS may think about.
Schedule entry here.

Sunday, November 09, 2014

Notes and thoughts from MongoDB Day in London

MongoDB Day in London was held on Nov 06 and I'm still excited I could take part in it. This post is meant to spread some of the news learnt there a bit further.

What surprised me in the first place was how many people came to this one-day event -- quite a few hundred, which proved that the MongoDB user base is still growing and is not small at all. If you're wondering who those visitors were, I can share one observation -- when one speaker asked who already had some MongoDB application in production, less than half raised their hands, which means most of them are in the middle of developing some MongoDB application right now. At least that was my understanding. But let's focus on the technologies already.

MongoDB Management System


Not just in one talk, quite a lot of focus was given to MMS, MongoDB's new Management System for managing replicas and shards in the cloud. I was wondering -- and so I asked one of the developers -- whether there are any plans to open-source it (in contrast to the other closed-source tools MongoDB provides on top of the main DB engine), but I got a negative answer.

MongoDB's strategy here is to deliver open-source tools for start-ups and basic applications, but they still plan to earn money on the tools necessary for enterprise use. Anyway, back to MMS -- where I see a difference compared to some general tools (Kubernetes, Cockpit) is that MMS is specialized specifically for MongoDB, so it allows organizing servers into replica sets and shards, while also configuring them in one web-based system. For deployment they focus on AWS, OpenStack or even on-premise usage.

Notes from keynotes


The keynote mentioned the main features MongoDB developers have focused on during the last year -- better single-server scalability, removing unnecessary locks (document-level locking) and, of course, generally improving performance.

Another keynote closed the day in a similar way and added some future features users are asking for, hopefully coming in 2.8.x -- multi-document transactions, joins and query-able backups. The biggest change in 2.8, though, will be the pluggable storage API, which will offer several engines (similar to MySQL), RocksDB as a built-in one, and we'll probably be able to use Tokutek's fractal-tree-based high-performance engine (well, I can't wait to see the performance in practice).

MEAN and Meteor application stacks above MongoDB


The MEAN stack presentation combined MongoDB, the Express web framework, the Angular HTML framework and NodeJS for the server logic -- all together as a modern base for Internet of Things applications. The same presentation also included an example of a real application, where we could see which mistakes the authors made during development, so we may learn a lesson:
  • prototyping with too little data, only finding the performance issues with real data later
  • frameworks do a lot of work for developers, which has pros but also cons, and sometimes we even need hacks to improve performance, for instance
  • normalization is not always effective, some level of pre-aggregation is fine
  • tuning a real deployment is more like a whack-a-mole game -- solving one bottleneck moves the bottleneck to a different component (from the number of NodeJS servers, to HAProxy, to the latency between the client application and HAProxy)
Another interesting stack presented is called Meteor, which is a set of many little pieces; in this presentation the speaker focused on a full-JS stack. This framework includes jQuery as the client JS, MongoDB as the backend and another mini-MongoDB implementation on the client side, again in JS.

What was really nice is that some code is shared between client and server, and the framework works transparently like an instant-messaging tool. That means that as soon as something changes in the backend, all clients get updated instantly. It is not a general-purpose framework, but it might work fine for some special single-page web services.

Some other random thoughts


Since we already hear some voices that would like to see MongoDB running on POWER arches, I tried to ask whether there are some similar requests already. I got a negative answer again, but the MongoDB guys assured me they can change their mind if there is some bigger official demand.

To sum it up, MongoDB is being developed quite heavily and we should not curse it at all, even if someone might observe some issues that were solved in relational databases a few years back already.

However, even though they offer the basic tools as open-source products, their approach of keeping the interesting parts closed-source and directing most of the development from within the MongoDB folks reminds me of Oracle's approach more than I'd like to see. Hopefully they won't turn evil as well.

Monday, August 25, 2014

SQL part of your work? Read SQL Performance Explained

Simply said, this book by Markus Winand should not be missing from the shelf of any database engineer. It does not explain basic knowledge of SQL, which you might learn from many other SQL-oriented books or from the Internet, but it gives a great background on the inner database implementation.

Why is knowing this background so crucial for database programmers? A general rule says that if somebody programs in some language, he should know the basics of what is happening two layers down. For example, a Python programmer should know at least the basics of programming in C and should also know something about assembly language. Moving this theory to the databases world, a programmer who works with SQL should know about the inner design of a DBMS and should also know at least the basics of C. And the book from the title gives you that knowledge of the DBMS.

SQL Performance Explained does not prioritize any DBMS, but tries to cover the most famous DBMSs out there -- Oracle DB, MySQL, SQL Server and (my favourite) PostgreSQL. The reader learns the magic behind database index implementation at the beginning, since that is the key to good performance in the first place. Later, we learn how the created indexes are actually used in WHERE or ORDER BY clauses and how to help the DBMS actually use them.

Since the book is quite fresh, it covers even some newer features that have become part of DBMSs in the latest versions, like index-only scans. As icing on the cake, readers will find some very concrete examples useful. One example for all: Markus explains how a slightly more modern way of paginating web pages, using a "later" link instead of classic numbered pagination, can make quite a big performance difference in some cases.
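A small sketch of that difference, as my own illustration of the technique (hypothetical news table, placeholder parameters):

```sql
-- Offset pagination: the database still reads and throws away 10000 rows
SELECT * FROM news
ORDER BY published_at DESC, id DESC
OFFSET 10000 LIMIT 10;

-- Keyset ("later") pagination: continue right after the last row already shown
SELECT * FROM news
WHERE (published_at, id) < (:last_published_at, :last_id)
ORDER BY published_at DESC, id DESC
LIMIT 10;
```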

Well, the not-even-200-pages book is not the cheapest one, which is usual for a book aimed at quite a narrow set of readers, though. But it is definitely worth reading and coming back to from time to time.

Good work, Markus, and thanks for that!

Monday, February 10, 2014

Fosdem report with some thoughts from P2D2

(TL;DR version) Issues in cloud-based deployments totally dominated the database topics at both the FOSDEM and P2D2 conferences. Whether it is high availability or scalability of the data stack, usually possible to handle with in-house features, people recently choose more and more to address those challenges using external tools, which seems more appropriate in some particular cases. Internal features simply involve way too many manual steps, so they're not good enough in an environment where full automation is necessary. Any such extra effort that distributions can eliminate counts.

A bit more verbose version follows.

This is my report from the FOSDEM conference last weekend, and I have also included some thoughts from Prague PostgreSQL Developers Day 2014, which I attended on Thursday. As databases and general packaging are my concerns, I focused on the database rooms at FOSDEM, which means I alternated between the rooms focused on MySQL/MariaDB, PostgreSQL, NoSQL and Distributions.

Generally speaking about the direction of databases, there were really a lot of discussions about deploying databases in cloud-based environments.

The problems we face in that field can be summarized by several topics: replication for high availability, scaling to utilize more nodes effectively, and automation to limit manpower in deployments where the number of nodes has more than 3 digits. Quite a lot of interesting smaller or larger projects that aim to solve some of these issues were introduced in the lectures, and I'd like to mention the most interesting of them.

Some interesting tools for recent challenges

proxysql, currently a single-developer project, seemed like a generally usable light proxy server that can be configured to offer load balancing, partitioning or even some more complicated tasks like query rewriting or selecting the proper server based on region priorities. The lack of a community around the project is its biggest con right now.

For PostgreSQL there is a similar project, Postgres-XC, which also allows accessing multiple PostgreSQL instances through one single proxy for a large cluster. PgBouncer then offers a solution by pooling connections for deployments where the number of client connections approaches thousands and thus reaches the limit.

A similarly thin layer between client and server is the community-developed Galera clustering project, which originated as an alternative to Oracle's MySQL cluster solution. Even though it is considered a standard for MariaDB now, the support in MariaDB itself is not merged yet, and the necessary patch kind of blocks wider adoption in distributions, since an extra MariaDB binary package has to be maintained.

The Connect engine for MariaDB seems like a really nice help in heterogeneous environments, since it allows working with external data from MariaDB the same as if it were a native table. A similar feature is now included in the most recent stable version of PostgreSQL, 9.2.

More about replication

An interesting comparison between an external HA solution, Slony, and the out-of-the-box streaming replication that new PostgreSQL offers (and which was evaluated as better) showed that PostgreSQL upstream is still doing very well. Also, the number of new features focused on HA, scalability and NoSQL-like usage of this DBMS promises a good future for PostgreSQL. For example, the new JSON document type in the most recent PostgreSQL has even better performance than pure-JSON MongoDB, according to one of the benchmarks out there; and as a bonus it doesn't have to give up the ACID features, like MongoDB does.

To sum it up -- relational databases offer some basic features, but do not include good enough solutions for all distributed deployments in a simple way. This can change for PostgreSQL in the next releases, but since one size doesn't fit all, we will still need some little help from plug-ins, proxies, etc.

Where distributions, and especially support companies, can also help is in offering some supported scenarios, because as a speaker from Avast described, it was a real PITA to choose the best-fit database system to store their big data, especially when the management can't give specific-enough requirements.

What is up in NoSQL world?

The situation is a bit different for NoSQL databases, which include some good enough automatic solutions out of the box or even by definition. The drawbacks of NoSQL databases are rather the missing ACID transactions and the fact that they require a totally different approach to database design than relational databases.

However, the really full NoSQL rooms should be a clear sign that the NoSQL world will be very important in the coming years' cloud-oriented market.

During the devops-related talks I've seen many desires to configure systems using some automation tool like Puppet or Chef, which is what companies use in the end, and there is definitely some space to support these deployment tools better. So generally, no more of the admins' work can be manual any more, not in a cloud environment. Automation then can't be done without versioning, logging, module testing, integration testing or continuous integration.

What is not happening any more

An interesting thing for me was the relative lack of single-server performance talks. It seems database servers have reached a point where improving performance of a single server is not so important, especially when we have so many challenges in scaling parallel deployments or increasing the limits on the amount of data that can be stored efficiently.

To sum it up, the databases world might have been quite boring in recent years, but that is definitely changing now. Clouds are simply database-enabled.

Sunday, February 09, 2014

How to glue Copr, Software Collections, MariaDB and Docker?

We've heard about exciting technologies at DevConf this weekend, like Software Collections, which are an alternative way to package multiple versions of packages, or Docker, which offers a containerized environment for software delivery.

We've also heard about Copr, which is a new build system that allows building software collections. What I didn't hear here was MariaDB, a community-developed fork of MySQL, the default MySQL implementation in Fedora, developed by most of the original developers of MySQL, who left Sun or Oracle after things started to change. That was kind of my mistake and I'll definitely try to come up with something database-related next year. But now, how do all these things and technologies come together?

Imagine you take care of a public cloud environment and you're supposed to provide various versions of the awesome MariaDB database and, for some non-disclosed reasons, also the original MySQL from Oracle. This service would be used on demand for some short time, so it should start as fast as possible.

Problem #1 -- packages of various versions of MariaDB and also MySQL conflict with each other, so they cannot be installed on one system. Usually. That could complicate your situation, since you would be able to offer only one database version on one particular machine at a time. Solution -- we can use the databases packaged as software collections, so the files are properly put away under /opt/something.

Problem #2 -- my system, which is Fedora, does not include any SCL packages I need. Solution -- we'll build them in Copr. It's enough to edit the buildroot (we need to include the metapackage and scl-utils-build in the buildroot, so we have all the macros available at the time the SCL-ized package is parsed by RPM), add the prepared SRPMs and wait a while for the stuff to build.

Problem #3 -- full virtualization is too big an overhead if we want to provide just one little (well, "little" is questionable) daemon. Solution -- we'll use Docker.

So, this is a workflow that could work in practice. First, we convert the MariaDB package into a software collection package. Second, we build this package in Copr and save the repository from Copr into the containerized system in Docker.

Firing up a new database instance then means just creating a data directory for the new instance and running the daemon, while redirecting the port to the hosting system. It will take only a couple of seconds and you'll have any version of the database you want. Awesome, right?

Saturday, January 25, 2014

The best database, PostgreSQL 9.3

Fedora 20 was released a few days before Christmas with only a slight delay, and you can now enjoy many new bleeding-edge applications; one of them is the newest stable PostgreSQL, 9.3. What we can find in this new shiny database engine is shortly summarized in this post.

Something in the data definition field

One of the most significant pieces of news, materialized views, brings users the possibility to create a real table instead of a virtual view, and you can even create indexes on such a view. However, after the underlying data changes you need to call REFRESH explicitly, since this feature is not totally complete yet. Still worth using already, though.
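A minimal sketch (the sales table is made up for the example):

```sql
-- Materialized view with its own index (PostgreSQL 9.3)
CREATE MATERIALIZED VIEW monthly_sales AS
    SELECT date_trunc('month', sold_at) AS month, sum(amount) AS total
    FROM sales
    GROUP BY 1;

CREATE INDEX monthly_sales_month_idx ON monthly_sales (month);

-- The data is a snapshot; refresh it explicitly when needed
REFRESH MATERIALIZED VIEW monthly_sales;
```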

Auto-updatable views might already have been kind of hacked together in previous versions, but now we get this cool feature by default. Well, such views have to be quite simple: they need to have only one table in the FROM clause and they can't modify the relation using clauses like DISTINCT or UNION, but still cool, eh? For more complicated views you can use INSTEAD OF triggers.
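For example (hypothetical users table with a boolean active column):

```sql
-- A simple single-table view is automatically updatable in 9.3
CREATE VIEW active_users AS
    SELECT id, name, email FROM users WHERE active;

-- Both statements modify the underlying users table directly
UPDATE active_users SET email = 'new@example.com' WHERE id = 1;
DELETE FROM active_users WHERE id = 2;
```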

Event triggers allow you to define triggers as a reaction to some DDL query, i.e. queries like CREATE TABLE. This can be used for some complicated setup of users, privileges and so on, where the default settings don't work sufficiently.
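A small sketch of an event trigger that just reports CREATE TABLE commands (the function body is only illustrative):

```sql
CREATE OR REPLACE FUNCTION log_ddl() RETURNS event_trigger AS $$
BEGIN
    RAISE NOTICE 'DDL command executed: %', tg_tag;
END;
$$ LANGUAGE plpgsql;

CREATE EVENT TRIGGER log_create_table
    ON ddl_command_end
    WHEN TAG IN ('CREATE TABLE')
    EXECUTE PROCEDURE log_ddl();
```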

The COPY command can now be used for faster import/export of data into/out of the database where there is not just a file but even a program (a pipe) on the other end. The command is executed with the rights of PostgreSQL's superuser, and yes, only the superuser can do this, so nothing bad should happen. Another modification of the COPY command is FREEZE, which means rows are loaded already frozen during copying, instead of having to run the quite slow VACUUM FREEZE later.
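A hedged sketch of both variants (the table, paths and the gzip pipeline are only illustrative; FREEZE requires the table to have been created or truncated in the same transaction):

```sql
-- COPY piping data through a program instead of a plain file (superuser only)
COPY raw_data TO PROGRAM 'gzip > /tmp/raw_data.csv.gz' WITH (FORMAT csv);

-- Bulk load with FREEZE into a table truncated in the same transaction
BEGIN;
TRUNCATE raw_data;
COPY raw_data FROM '/tmp/raw_data.csv' WITH (FORMAT csv, FREEZE);
COMMIT;
```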

Performance is not without change

Since version 9.3 you can use GiST indexes on unlogged tables, which don't use the transaction log and thus are not persistent across a crash. Other new indexes, this time for regular expressions, will surely be beneficial for the performance of many applications.

Performance improvements can also be noticed in numeric operations, parallel pg_dump, and working with the new range data type (which was introduced in 9.2), where upstream improved performance using histograms; the same data type can also be made faster using SP-GiST indexes, which was not possible before.

Other performance changes were introduced in shared memory, which is now implemented using mmap; large objects can now be 4TB, while until now the limit was only 2GB. Locks can now have a limited duration, which can be explicitly specified (the new lock_timeout setting).

Checksums of data pages are not a performance improvement, rather the opposite, but some paranoid applications can benefit from them. For such applications we can also use the new style of error messages, which now include easily parseable items like SCHEMA NAME, TABLE NAME, CONSTRAINT NAME or LOCATION.

Something for plugin writers

Foreign Data Wrappers (FDW) bring a new way to work with data other than the classic PostgreSQL data stack, and that data doesn't need to come only from different PostgreSQL databases. We can read or even store data from/into almost any data storage, if the technology allows it. The new version of PostgreSQL also brings a new API which offers a way to write new background processes, so-called background worker processes. What this will mean in practice we'll see in the future, because there are no other plugins out there so far.

Another new way to interact with the PostgreSQL daemon is offered by pg_xlogdump, which basically prints the transaction log in an easily parseable format. What can we do with it? It seems it is limited only by your imagination.

News in writing SQL queries

I find it hard to explain what the new LATERAL JOIN means, so I'll borrow a quite simple description from [1] -- put simply, a LATERAL JOIN enables a subquery in the FROM part of a clause to reference columns from preceding items in the FROM list. A deeper description is beyond this post, so feel free to use upstream's documentation [2].
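A small example of the kind of query LATERAL enables (hypothetical customers and orders tables): for each customer, fetch the three most recent orders.

```sql
SELECT c.name, o.ordered_at, o.amount
FROM customers c,
     LATERAL (SELECT ordered_at, amount
              FROM orders
              WHERE orders.customer_id = c.id   -- references the outer FROM item
              ORDER BY ordered_at DESC
              LIMIT 3) o;
```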

The first steps supporting the JSON data type were introduced in the previous version, but now we finally have enough to work with it in a real application. The thing is, the new version brings the possibility to use aggregation, and we can parse JSON values now, which means we can pick only some values out of a JSON document. JSON can also be converted to a record data type, and some basic values can easily be converted to a JSON document. When working with arrays, we can use the new functions array_remove and array_replace.
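A few of these additions in one throwaway example:

```sql
-- Extract a nested value with the new -> and ->> operators
SELECT '{"user": {"name": "Anna"}, "tags": ["sql", "json"]}'::json -> 'user' ->> 'name';

-- json_agg builds a JSON array out of a set of rows
SELECT json_agg(t) FROM (VALUES (1, 'a'), (2, 'b')) AS t(id, label);

-- The new array helpers
SELECT array_remove(ARRAY[1, 2, 2, 3], 2), array_replace(ARRAY[1, 2, 3], 2, 20);
```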

Some extra bonuses

Replication can now be set up so that after a master server is lost, a new master can be chosen from one of the remaining slaves, while until now we had to set up all slaves from scratch. PostgreSQL is therefore a bit closer to a cloud-oriented database system, where automatic recovery is crucial.

Not only during systemd boot, where services are started in parallel if possible, do we need to know that some service is ready. PostgreSQL now includes a new utility, pg_isready, that can easily be used in these cases. It does nothing more than tell you whether a PostgreSQL daemon is ready to accept connections. But believe me, it'll help not only package maintainers a lot.

There is some news in the psql console as well -- the backslash commands \gset and \watch. The server now offers the possibility to listen on more unix sockets, and the option unix_socket_directory has been renamed to unix_socket_directories. Well, this last feature is quite important for me personally, since I'm very proud to be recognized as the author of this patch. Well, I should admit that the final edits were done by Tom Lane, the most active upstream committer. Anyway, it's great to see my name in such a wonderful project.

And since the year 2014 has been here for some time, I'd like to wish all the guys involved in PostgreSQL upstream a wonderful, peaceful time and much energy for PostgreSQL 9.4. Thank you for all your work during the year 2013.

The article is based on the following articles:
[1] http://wiki.postgresql.org/wiki/What's_new_in_PostgreSQL_9.3
[2] http://www.postgresql.org/docs/devel/static/queries-table-expressions.html#QUERIES-LATERAL
[3] http://www.postgresql.org/docs/devel/static/release-9-3.html
[4] http://www.root.cz/clanky/novinky-v-pripravovanem-postgresql-9-3/

Wednesday, January 01, 2014

Happy New Year Fedora and what to do when upgrading to Heisenbug doesn't work as expected

The old year 2013 was about to end and I realized I didn't want to start the new one with an old Fedora. Having Fedora 19 still on my personal and also on my work computer, I began. These are my hints and experiences.

My fails, but final success with fedup

First, on my personal computer, I used dnf (a new, much faster successor-to-be of yum; I recommend trying it out) to update all packages to the latest F19 versions, and then ran the fedup utility with the "--network 20 --nogpgcheck" options. Turning off GPG checking was the fastest way to work around some signature issues. fedup reported some unavailable repositories, so I just disabled those for the sake of the upgrade.

After downloading the packages and installing the new fedup dracut, I restarted the computer and it proceeded without problems. And it still works fine.

So, to sum it up, on my personal computer it went very well; only some unavailable repositories and GPG issues were encountered during the whole upgrade, but those were easily worked around by using the appropriate options.

What to do if fedup is not an option

The day after, I proceeded with the same steps on my work computer, which is a Lenovo T510. I saw the same issues with unavailable repositories, but no GPG issues this time. So I disabled the repositories and restarted the computer after all the packages had been downloaded, the same as I did on my personal computer.

When the fedup dracut started, I saw a kernel panic after 7s (similar to RHBZ #873459, but I can't reproduce it anymore, so I'll try it again on one of my virtual machines -- well, quite a paradox that I got kind of a heisenbug in the process of getting Fedora Heisenbug), so basically at a very early stage. What is more, the same happened when I tried to boot the newest kernel-3.12.x. But I was still able to boot with the older kernel-3.11.x. So I wasn't totally screwed, but I was not able to finish the fedup upgrade even after repeated attempts.

My solution was to go with another way of upgrading -- upgrading using yum. I put dnf aside, since I wasn't sure whether it was able to deal with the upgrade (now I regret that I didn't try it), and using the distribution-synchronization option it went very well. After some hours (yeah, I have too many packages) I ran restorecon on the whole root to be sure SELinux would work fine, and rebooted. A bit surprised, I saw the new Fedora Heisenbug successfully booted and didn't notice any issues.

Now I'm happy I could welcome the New Year with the bleeding-edge features served by Fedora 20 Heisenbug, and I would like to wish all (not only) Fedora maintainers many solved issues, more interesting challenges and a lot of satisfied users.

Guys, thank you all for your contribution in 2013!