Tuesday, February 23, 2016

Notes from Prague PostgreSQL Developer Day

What a surprise that I didn't hear about containers at all during the whole conference; a really special thing today. Database experts are probably quite reluctant to adopt such hyped technologies. Anyway, it was a very interesting one-day conference around PostgreSQL, so I'm sharing my notes, since they might be an interesting starting point for further study for others. Most of the slides are here: http://p2d2.cz/rocnik-2016/prednasky

Hlavní novinky v PostgreSQL 9.5 (Main News in PostgreSQL 9.5)

The first presentation, by Pavel Stehule as a last-minute replacement for Tomas Vondra, was about news in PostgreSQL 9.5, and even without slides it was a very interesting summary with enough detail at the same time. PostgreSQL 9.5 turned out to be wanted by quite a lot of attendees.
Most of the main PostgreSQL upstream developers do it as a full-time job, which means it is professional development, even though it is still true open source and always will be. On the other hand, as the project gets more and more mature, some features stay in development for as long as three years.
As for version 9.5, many users are interested in UPSERT; performance is not that different from previous versions. UPSERT is similar to MERGE from ANSI SQL, but since MERGE was not implemented 100%, it is called differently.
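As a sketch, the 9.5 UPSERT syntax looks like this (the counters table is hypothetical):

```sql
-- Insert a new counter, or bump it if the name already exists:
INSERT INTO counters (name, hits)
VALUES ('home', 1)
ON CONFLICT (name)
DO UPDATE SET hits = counters.hits + EXCLUDED.hits;
```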
Grouping sets allow creating multidimensional views on data.
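A quick hedged example of grouping sets (the sales table is hypothetical): one query computes aggregates on several levels at once:

```sql
SELECT region, product, SUM(amount) AS total
FROM sales
GROUP BY GROUPING SETS ((region), (product), ());  -- per region, per product, grand total
```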
Locks were reworked a lot, and arrays in PL/pgSQL mean that some complicated scenarios may get much faster. Sorting with the C locale is usually much quicker than with other locales; locales simply sort slowly, and in 9.5 even locale-aware sorting is much better.
Bitmap indexes for columns with a lot of duplicates do not exist like in Oracle. Something new was created, though: originally called minmax, now the BRIN index, which is much smaller than other indexes and is primarily meant for append-only data.
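A minimal BRIN sketch, assuming a hypothetical append-only log table:

```sql
-- BRIN stores only min/max per block range, so it stays tiny
-- and works well when created_at correlates with physical row order:
CREATE INDEX logs_created_brin ON logs USING brin (created_at);
```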
The pg_rewind tool avoids re-cloning the master: during fail-over we can turn the old master, even with slightly stale data, back into a slave.
Two things where PostgreSQL is behind commercial DBMSs are multiprocessor utilization and partitioning. Today the HDD is not the bottleneck anymore; with RAM disks and many CPUs, the limit is the speed of one CPU. In 9.6 there are already a couple of commits for parallel queries, which is so great that we might even call 9.6 a 10.0. There are also plans to have logical or BDR replication, plus some more patches for making planner guesses better using correlations, especially useful for users who do not understand query execution and cannot use EXPLAIN.
Another feature is RLS (row-level security), especially interesting for the banking sector, for example.
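A short row-level security sketch, assuming a hypothetical accounts table with an owner column:

```sql
ALTER TABLE accounts ENABLE ROW LEVEL SECURITY;
-- Each user sees only their own rows:
CREATE POLICY accounts_owner ON accounts
  USING (owner = current_user);
```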

IO in Postgres - Architecture, Tuning, Problems

Andres from Citus Data talked about tuning, giving an overview of the memory architecture and explaining why there is shared memory plus private per-process memory. One cache is handled by PostgreSQL; another cache is in the OS when reading files. Clock sweep invalidates part of the buffer pool so we know which buffers can be replaced; that might be expensive in some cases.
The reason WAL is efficient is that it is a sequential write. A checkpoint then writes a bunch of data at some point, which can be expensive. We can extend the checkpoint timeout; for large amounts of written data, checkpoint_segments should be tuned up. To reduce start-up time we can do a checkpoint before shutdown.
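A hedged sketch of such tuning (values are made up; note that 9.5 replaced checkpoint_segments with max_wal_size):

```sql
ALTER SYSTEM SET checkpoint_timeout = '15min';  -- checkpoint less often
ALTER SYSTEM SET max_wal_size = '4GB';          -- allow more WAL between checkpoints
SELECT pg_reload_conf();
CHECKPOINT;  -- e.g. run manually before a planned shutdown to speed up restart
```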
Checkpoints explain why there is sometimes more and sometimes less load in benchmark graphs. Andres also showed how tuning buffer sizes and checkpoint values influences the graphs.
Interestingly, turning off caching in the OS may do a better job, although we need to reckon with the consequences.
Tuning buffers is basically about understanding the workload; no advice is good enough without that. Dropping an index and re-indexing can be faster with lower shared_buffers settings.
Batch runs may use different settings. Slides available at anarazel.de/talks/pgdevday-prague-2016-02-18/io.pdf

SQL Tabs

Sasha from Shards.io was speaking about his cool project SQL Tabs.
Sometimes the psql console is not enough, although people love it. So he started a new SQL client with the requirements to be non-blocking, black-and-white themed, with connection switching...
React.js, Node.js, and other technologies are used to create a web-based client that offers browsing, data-type exploration, a context advisor, function listings, query history, ... It uses libpq in the backend, so the output is as usual. It can also generate simple graphs by adding some semantics in SQL comments, selecting a proper chart type, and writing comments in Markdown. It also allows combining more queries into a simple report with text, charts, and tables.
It understands some datatypes, so time series are nicely formatted as one would expect.
More info at www.sqltabs.com.

Jak jsme začali provozovat PostgreSQL (a co jsme se u toho naučili) (How We Started Running PostgreSQL, and What We Learned Along the Way)

Ales Zeleny from Ceska Sporitelna shared experiences with PostgreSQL. He advised:
  • to try the backup before going to production
  • to start with a simple app (e.g. for ordering lunches)
  • not to make an Oracle out of PostgreSQL
  • to think about what to log, because audit is missing in PostgreSQL
  • to create a schema for the app instead of using the public schema
  • to separate app modules into their own schemas
  • sometimes even into separate databases
  • Tablespaces on their own clusters, to avoid influencing other databases when the character of one app changes rapidly
  • Monitoring is important but sometimes troublesome
  • Use check_postgres.pl and autovacuum, configured well for the workload
  • Default privileges help create fine-grained permissions
  • Logging can supplement audit
  • Backup and recovery with barman or pgbackrest, Pacemaker from RH for recovery, streaming replication with delay
  • Testing data recovery scenarios
  • For monitoring they use telegraf + InfluxDB + Grafana
  • Visualization helps understand where the problem is
  • Configure autovacuum so that it runs often; then it does little work each time, so it is quick
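As a rough illustration of the last point, autovacuum can be made to run more often in smaller steps (hypothetical values, not universal advice):

```sql
ALTER SYSTEM SET autovacuum_vacuum_scale_factor = 0.05;  -- default is 0.2
ALTER SYSTEM SET autovacuum_naptime = '15s';             -- default is 1min
SELECT pg_reload_conf();
```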

Open-source "clusterovací" řešení pro PostgreSQL (Open-Source Clustering Solutions for PostgreSQL)

Petr Jelínek from 2ndQuadrant talked about general scaling concepts (vertical, horizontal), focusing on OLTP, OLAP, ...
For vertical scaling and OLTP we have a couple of features already in 9.5 (sampling, BRIN indexes, sort optimizations), and in 9.6 there will be a first implementation of parallel query execution.
Hot standby was shortly introduced: an integrated solution for scaling, but it solves only read scaling, since slaves are read-only.
PL/Proxy: now almost legacy, developed at Skype, no maintenance anymore; a map-reduce implementation in PostgreSQL.
Greenplum by Pivotal (a fork of PostgreSQL), stable, open source since autumn 2015 (Apache license), is a kind of MPP (massively parallel processing) database and has diverged quite a lot from vanilla PostgreSQL. It has its own tooling and tables optimized for inserts.
Citus Data is going to release CitusDB very soon as an extension only; now it is a fork of PostgreSQL plus extensions. It will be open source soon as well; for now only pg_shard is open-sourced. It has logical servers, so it duplicates data in distributed tables.
Postgres-XL, coming from Postgres-XC and StormDB, is now open source (MPL), taken over by the community after the company crashed; soon it will be on 9.5. It is an MPP solution that supports both OLTP and OLAP, but it is more complex to administer and install.
BDR from 2ndQuadrant is also a fork of PostgreSQL (9.4), but its goal is to be integrated back one day. It has already pushed a couple of features to vanilla PostgreSQL (background workers, event triggers, ...). It is an asynchronous multi-master solution (all data on all servers) that uses optimistic conflict detection and resolution (in contrast to MVCC in vanilla PostgreSQL). So it makes the consistency eventual, after conflict resolution (after commit). It primarily focuses on latency optimization (avoiding global writes across the universe).
A question was about eventual consistency and sequences; the answer was that nobody can expect a normal application to work the same as before when moved to multi-master.

Monitoring PostgreSQL prakticky (Monitoring PostgreSQL in Practice)

Pavel Stehule talked about administration and monitoring, starting with the interesting thought that moving from "It may work" to "I know it works" is quite a small step. Architecture is always key: client-server is for one server, a cluster scales... A wrong choice in the beginning will affect everything.
Think about the database already during user interface design, so the database can return only a limited set of rows. Important parameters are RAM and IO; while we fit in RAM all is fine, then there is performance degradation, and it does not go linearly, it degrades by jumps.
Configuration of the database and optimization of queries do not help when the architecture and app design are wrong.
Sometimes users add indexes without thinking. Sometimes users deploy replication and a cluster when it is not necessary; more RAM may help much better, and aggregated data in the db is also not a bad idea.
work_mem, shared_buffers, effective_cache_size and max_connections are the important ones: work_mem * max_connections * 2 + shared_buffers + filesystem cache + OS < RAM. For example, with max_connections = 200 and work_mem = 8 MB, queries alone may need over 3 GB.
A database server must never swap.
shared_buffers set too high (over 20 GB) could make finding a free buffer too slow, but it always depends on the cache working characteristics.
effective_cache_size says how big a portion of the index size is actively used.
Strong negative feedback for the 80/20 rule: 80% of the workload is generated by 20% of the queries. Sometimes modules that are not even used eat 50% of the workload.
People cannot work with memory today; it is better to read data once, work with it, and write to the db from time to time, not connect to the db for every piece.
Monitoring is important to seek for regressions.
Good queries should be verified, so that their good performance is not just an accident.
A disk under load has a much different characteristic than one on a machine without load. pg_bench is the other extreme; it does not simulate any real workload.
Some overview is in the pg_stat views: many rollbacks are wrong, many temp files are wrong. A change in the transaction count is suspicious, like a firewall killing connections by mistake.
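Such an overview can be pulled from pg_stat_database, for example (a sketch):

```sql
SELECT datname,
       xact_commit,
       xact_rollback,                          -- many rollbacks are suspicious
       temp_files,                             -- many temp files point to low work_mem
       pg_size_pretty(temp_bytes) AS temp_size
FROM pg_stat_database
ORDER BY xact_rollback DESC;
```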
Kill very long running queries before they kill the server.
Especially at the beginning of an app's life it is important to monitor queries, because a database is a living organism.
Queries around 50 ms cannot get much faster; if we need more, we need a cache and should not use the db at all.
Monitoring regressions in CPU utilization using profiling tools works well on Linux; the fix is usually to distribute the workload across more processors.
Autovacuum monitoring is also important.
And finally, benchmarking is needed; it should be done soon enough and ideally in the production environment.
Finding out whether something is effective is hard and takes a long time. Finding out whether something behaves the same as yesterday is simple and quick. Graphs are a quick way from hoping to knowing.
Too wide tables are usually wrong; in OLTP we should respect normal forms.
How large is a typical OLTP database today? Usually tens of GB. Today people often store bitmaps in the db, which makes it larger.
RAM should be larger than 1/10 of the db size.

pg_paxos: Table replication through distributed consensus

pg_paxos was introduced by Marco Slot, and we learned it is a framework for distributed problems. Servers sometimes need to agree on something; that is what the Paxos algorithm is about.
Two phases: nodes agree on participation; then a proposal asks for acceptance, where a majority must agree.
If anything fails, we start over.
Paxos state machine (multi-paxos) helps to replicate log on multiple servers.
Paxos as an extension has low throughput and high latency, so it is no alternative to streaming or logical replication, and not a magic distributed Postgres, but it is a useful building block for distributed systems. It is somewhat experimental, written in PL/pgSQL. It is available on GitHub in the citusdata organization.
The demo showed how to use three Paxos servers in the cloud to acquire a lock; then communication can begin normally and we know only one server works with the data.

That's all folks.

Saturday, February 13, 2016

Notes and thoughts from Developer Conference 2016

This post is mostly intended to remind myself in the future what DevConf 2016 looked like for me, but maybe somebody else finds it useful as well.
Many of the talks and workshops are now uploaded to YouTube, so feel free to find the full recordings: https://www.youtube.com/playlist?list=PLjT7F8YwQhr--08z5smEEJ-m6-0aFjgOU

The keynote

Tim Burke started DevConf 2016 with a keynote full of tips on how to become a rock star in the open-source world: from "not being a troll", through "not doing what you don't like", to being a real team player, because a rock star is not an individual. Passion was identified as the way to enjoy your work. See the full keynote at https://www.youtube.com/watch?v=Jjuoj2Hz03A.

Docker versus systemd

Dan Walsh talked about systemd and Docker integration. He mentioned pitfalls they have already solved and those they still need to solve. Dan himself stands in the middle between Docker upstream and the systemd guys, who both don't like accepting compromises. He mentioned these issues:
  • sd_notify that should eventually help notifying about app readiness
  • logs being grabbed by journald so the logs are available after container is removed
  • running systemd in docker container - all PRs closed by docker
  • with docker 1.10 it will work with some options specified during container start
  • machinectl, working with all VMs running on machine
  • eventually libcontainer will hopefully be replaced by its clone runc, from the OCI spec

All Flavors of Bundling

Vit talked about practical examples of bundling he has met during his Ruby maintenance work, mentioning Bundler not only because it helps bundle stuff, but also because it bundles a lot of stuff itself. He mentioned that there might be reasons to bundle, and sometimes not. All that was triggered by a recent change in the Fedora guidelines, which started to allow bundling. Vit also went through various specifics in different languages, like JavaScript, C++, Go, ... An interesting point about generators: we basically bundle the generated code, so if they make a mistake, you get the problem.
Q: some example of bundling that succeeded? A: the Bundler project learned that bundling is the correct way and the only way.

Open source distributed systems at Uber by Marek Brysa:

HA is really crucial, because payment transactions and tracks are done by Uber.
More services than transporting people - food, stuff and even kittens
Technologies used: Ubuntu, Debian, docker, python, nodejs, go, kafka, redis, cassandra, hadoop
Every project is meant to be open-sourced by default, with some exceptions
They also contribute to other projects
The number of micro-services grew from 0 to 700 in the last two years
  • consistent hashing for sharding; membership via the SWIM membership protocol, using direct and indirect pings to get the state of an instance, to prevent random network issues
  • An apparatus called gossiping adds information about other instances when sending a message
  • Infection-style dissemination; currently 1k instances, 2.5k tested, in the future 10k?
  • App-level middleware
  • An SOA-oriented replacement of HTTP, which turned out to be slow in some cases
  • Multiplexing, performance, diagnostics, HA forwarding
  • JSON and Thrift for serialization
  • Service discovery and request forwarding
  • Give clients one Hyperbahn instance and bootstrapping starts automatically
  • Services are connected to a ring of routers; every service is connected to a few routers

CI/CD with Openshift and Jenkins:

It is not only about tools
Containers as cattle; we replace one if it dies.
Containers make people think about what happens when a container dies
OpenShift CI/CD wants to be generalized into a pipeline that may be used by other projects
Example of running Jenkins in OpenShift
S2I used as configuration tool
https://github.com/arilivigni/openshift-ci-pipeline  - 3.3 openshift release roadmap

Is it hard to build a docker image?

Tomas asked, and also had the answer: it is.
Squashing, caching, secrets, adding only (metadata), usage messages; the evolution is rapid
The conclusion is that Docker is young

Remi's PHP 7:

The reason for skipping version 6 was the existence of books about the development version 6
New API brings binary incompatibility for binary extensions
The change in size_t and int64 is only on Windows
Abstract syntax tree, native TLS, expectations (assert finally usable), throwable interface
The extension porting process is still in progress; some won't be ported at all, e.g. MongoDB instead of mongo
Performance increased twofold for common applications, comparing the number of pages served
Scalar types can be declared in function signatures; the strict_types option makes PHP a strongly typed language
We can now catch parse and type errors, while keeping backward compatibility of existing exceptions
Removed extensions; a change in expressions containing variable names inside other variables
Fedora will eventually drop incompatible extensions
We need SCLs in Fedora; that would be the best thing for PHP 7

Security: Everything is on fire!

Josh Bressers talked about security and whether the situation is really that desperate. It is not yet, but there is work to be done. What is happening is that people earn real money on security issues, and the press makes money selling news, so they make things up to sound interesting.
Where do we start? Communication is key. Security guys should listen.
Security is not here for solving problems, it is part of the problem.

Centos pipeline meet-up:

A couple of people in one room had an initial chat about existing components that should be usable for the CentOS Container Pipeline and decided to use OpenShift, which sounds like a good way to go because it already includes the missing pieces.

Fedora and RHEL:

Denise was speaking about how Fedora is important for Red Hat.
Matt then presented a lot of graphs about Fedora download stats from various views. The impression was that it is not that bad.

Changing the releng landscape

Dennis Gilmore about releng in Fedora:
Koji 2.0 is still in the beginning; it should be built with a Copr backend somehow, to allow more flexible builds
ET/Bodhi alignment
rpmdiff and license scanning, done internally today, should be done in Fedora as well.

Re-thinking Linux Distributions by Langdon:

The rings concept did not work
It was too complicated when working out the details
Modularity should work better
Think about applications, not package sets
We need to minimize dependencies on other applications and on the OS
Give a separate channel with metadata; that's what RPMs were invented for
Atomic App, Nulecule, rolekit, xdg-app mentioned as the way
Env & Stacks is where the definition should take place, not necessarily the place to code it
Q: will 10 versions of a library make a mess in systems? A: let's make computers track that for users

Q&A with council:

included the question that cannot be missed in any similar session: Fedora and proprietary drivers. Josh mentioned that the problem is not only getting the drivers installed, but also not breaking the whole video once the kernel is updated. Everybody understands the politics cannot be easily changed, but at least the problem with breaking the video might be better soon. Another question challenged Matt's graphs, and there was a question about possible Kerberos inclusion instead of generating certificates on the server, where there is, by the way, a private key, which doesn't belong there. Generally the session was very positive.

Closing quiz:

The last session, the quiz, in which the full room participated, was a funny and interesting end of the conference.

Fosdem 2016 notes and thoughts

This post is mostly intended to remind myself in the future what FOSDEM 2016 looked like for me, but maybe somebody else finds it useful as well.

systemd and Where We Want to Take the Basic Linux Userspace in 2016

In the first keynote, about systemd, Lennart spoke mostly about things that will happen in systemd in 2016, particularly DNSSEC. He began by introducing the recent move of the project to GitHub and the introduction of CI. systemd is the default in almost all distros, with not many rants around. "Are we boring now?", he asked.
Then he went through networkd, mentioning it's much more automatic and smaller than NetworkManager, and nspawn, which differs from Docker because nspawn is not only for microservices. sd-dhcp is now used also by NetworkManager, although it's still not a public nor supported component yet.
Unified control group hierarchy fixes issues that the old nonunified implementation had, but the new one is not used much yet, because API has changed and it will break stuff.
DNS resolving, previously done by glibc, is going to be centralized now in systemd-resolved. Caching will make it better but big thing is to have dnssec ready.
DNSSEC in systemd-resolved checks just integrity, not confidentiality. The chain of trust coming from the TLD zones down should eventually verify services like SSH, TLS, PGP certificates. However, only 2% of the most popular websites use DNSSEC, even though the main TLDs were already signed in 2010. Interestingly, Google's popular DNS server does validate.
Validation on the client is not done, though, because it's slower and clients do not support it. It is important for IPv6, though.
Private networks are a problem, because they cannot be validated from the top level, since those sites do not officially exist. Another problem is that ISPs' DNS servers and end-users' routers are crap.
systemd's approach is to give up when DNSSEC cannot be verified, which opens a vulnerability, yes. But when it is successful, clients are informed, and PGP, TLS, SSH and others may be happy, so a fully DNSSEC-enabled internet is not a blocker.
A question at the end asked about Lennart's previous plans to change the way applications are distributed, and whether this is still the plan. The answer was that most of the needed technology is done, but distributions have not adopted it yet.

Distributions from the view of a package

Colin Charles from MariaDB talked about packaging. He gave us an overview of the whole MySQL ecosystem, mentioning that MySQL has always been more an open-source product than a project, which I agree with.
He mentioned the FOSS exception that MySQL may be distributed under GPL and proprietary license at the same time.
Fedora was often mentioned as an example of a distribution that cares about package quality, mentioning some particular mailing list threads and bugs, and we saw Tom Lane's name there several times, since he used to maintain MySQL for a long time.
Software Collections also mentioned as solution for differences between support period of distributions and MySQL.
Some statistics showed that default version in long-term support distributions is important (RHEL and MariaDB).
Docker and Juju show stats; Fedora and openSUSE used to do that as well but do not anymore, although it would help upstream prioritize their resources.
Some platform issues were mentioned, like bundling vs. not including some extensions.
He mentioned that they care about being informed about distribution bugs in MariaDB upstream, even being on CC sometimes. He mentioned how many downstream fixes live in Fedora and elsewhere, and that this needs to be fixed.

MySQL Group Replication or how good theory gets into better practice

Tiago Jorge talked about a project still in MySQL Labs (labs.mysql.com). Group Replication is supposed to remove the fuzz users face when dealing with replication fail-over. The group communication primitives concept from the early '90s was the inspiration for Group Replication. The process includes a certification procedure, which is about asking the other nodes whether they do not, by any chance, work with the data we want to work with.
Group Replication needs some special options, like binlog, GTID, a specific algorithm, and a replication user. Set-up requires a UUID to specify the group name (because it will be used by GTID instead of the server ID) and the addresses of some nodes, not necessarily all of them. The performance schema can say how many and which members there are, and it also includes various stats, like transaction stats, ... Rejoining a member can use a sync method that only applies what is not yet done. Also, the catch-up process does not block the donor.
Main features: cloud friendly, integrated, automated, no big overhead, ...
More info in mysqlhighavailability.com, like this one: http://mysqlhighavailability.com/mysql-group-replication-hello-world/
There is no support for online DDL, though.
Question: Does it support thousands of nodes? Answer: there is linear degradation because of communication limits.
Btw., it is a type of eventual consistency.

ANALYZE for statements: MariaDB's new tool for diagnosing the optimizer

Sergei Petrunia talked about EXPLAIN and ANALYZE statements. The query plan sometimes does not correspond to reality, and then we need to show stats, like row stats, especially per table. MariaDB's ANALYZE is inspired by EXPLAIN ANALYZE from PostgreSQL and Oracle; in comparison to EXPLAIN alone, it not only optimizes the statement but also executes it and shows stats. It shows the comparison between the guessed and the really read row counts. A general use case: big discrepancies are worth investigating.
The filtered fields can tell how many rows were read and then discarded, which suggests adding indexes, which is always a trade-off between faster reads and faster writes.
The ANALYZE statement can fix a table's histogram, which positively influences the execution plan.
EXPLAIN may return JSON, which might be easier to read; values prefixed with r_ are from the ANALYZE part. We can also see information like buffer sizes used or cache hit ratio.
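A minimal sketch of the syntax (the orders table is hypothetical):

```sql
ANALYZE SELECT * FROM orders WHERE customer_id = 42;              -- tabular, adds r_rows etc.
ANALYZE FORMAT=JSON SELECT * FROM orders WHERE customer_id = 42;  -- r_-prefixed runtime values
```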
An example showed by Sergei explained how to use that all to see which sub-queries caused the biggest performance issues.
ANALYZE works fine to show range selects as well.
A current con is that the EXPLAIN statement sometimes lies; ANALYZE may be much more accurate.
During Q&A, it was mentioned that histograms are better for selectivity than cost model, which is what MySQL uses them for.

Rolling out Global Transaction IDs at Dropbox

Rene Cannao spoke about experiences with GTID at Dropbox. Typically we have at least 2 slaves; Dropbox has two masters (one basically a slave of the other), and every master also has its own slave. Binary logs are important; the classic approach uses file and position.
In a non-GTID deployment, the file names and positions on slaves are not the same as on the master, which may be tricky especially with nested replication, because the second slave does not know where to start replicating after the middle slave crashes. A GTID consists of the UUID of the source and a transaction ID, and since it is the same across the whole cluster, it makes recovery easier: the slaves simply know where to begin. A slave just uses master_auto_position = 1.
The use case Rene showed also utilized the enforce_gtid_consistency, log_slave..., gtid, binlog... options.
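A hedged sketch of such a GTID slave setup (hostname, user, and password are made up; the my.cnf side needs gtid_mode=ON, enforce_gtid_consistency=ON, and log_slave_updates):

```sql
CHANGE MASTER TO
  MASTER_HOST = 'master.example.com',
  MASTER_USER = 'repl',
  MASTER_PASSWORD = 'secret',
  MASTER_AUTO_POSITION = 1;  -- let GTIDs find the starting point, no file/position needed
START SLAVE;
```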
He went through the procedure of enabling GTID either offline (requires turning off the cluster for some time) or online (without restarting all servers at once).
Sometimes there might be a problem with zombie slaves timing out.

MariaDB CONNECT Storage Engine: Simplify heterogeneous data access

Serge Frezefond talked about the CONNECT plugin in MariaDB, which can access data from other sources. Features like the ability to specify options when creating a table of type CONNECT, or auto-discovering the structure of the file, make this plugin quite easy to use. With table type ODBC, we can use syntax like the PRIOR statement that does not exist in MySQL. It also allows creating a trigger that runs some command on an entry execution.
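For illustration, a CSV-backed CONNECT table might look like this (the file path and columns are hypothetical; it requires the CONNECT engine installed):

```sql
CREATE TABLE ext_prices (
  item  VARCHAR(64),
  price DECIMAL(10,2)
) ENGINE=CONNECT TABLE_TYPE=CSV
  FILE_NAME='/data/prices.csv' HEADER=1;

-- Query the external file like any other table:
SELECT item, price FROM ext_prices WHERE price > 100;
```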
We can also query different databases at the same time, like Oracle DB and PostgreSQL. The XML table type supports XPath syntax to describe which tag corresponds to which column.
The JSON table type does not replace integrated JSON support (already in MySQL, coming soon in MariaDB) but can add an external JSON source into the db. The structure of the JSON needs to be addressed, which is done by setting a starting point for the table. A JSONPath spec is used for aggregation.
Most of the sources are also writable, but the performance is not always perfect.

Clusternaut: Orchestrating Percona XtraDB Cluster with Kubernetes

Raghavendra Prabhu from Yelp was talking about orchestration. K8s has some requirements, like server-client design, a cattle-not-pets approach, horizontal rather than vertical scaling, statelessness in databases, and elastic scalability, i.e. scaling in both directions. We should also switch to declarative rather than imperative thinking.
There are a lot of container technologies available. Galera was shortly introduced: it uses an optimistic approach when solving concurrency conflicts and supports automatic node sync. In CAP theorem terms, Galera is CP.
Stateless DBs, problem for big data.
Kubernetes shortly introduced.
PaaSTA supports a lot of existing tools and technologies.
A Galera node is equivalent to a pod.
Examples of k8s configuration files were shown.

Programming a Board Game

A physical board game created by Chris Ward should be freely available in the next months. A CMS from the start, docs in Markdown, web made with Jekyll. PDF generation is done together with the web, using pandoc. LaTeX is also used to design the cards; someone said it is like CSS for printing.
So far there are no big graphics; ImageMagick is used for some tasks.
A question asked whether it should be a template for others. The answer was yes, even though there might be work needed to make it ready.

MySQL operations in Docker: A quick guide for the uninitiated

Giuseppe Maxia talked about Docker and MySQL. Running services on shared resources is not good; we pay for VMs. A container is a virtualization mechanism but is no VM. There are several things to keep in mind when containers carry databases: never modify a running container, and deploy ready-made containers only. We can change options by passing them to the cmd; another way is to configure the server by bind-mounting a .cnf file. bit.ly/my-rep-samples
A question about secret passwords: Giuseppe suggests reading the password from a file.

Introducing new SQL syntax and improving performance with preparse Query Rewrite Plugins

Sveta Smirnova talked about features MySQL users asked for and how Oracle responds to user requests. For example, after last year's talk by Markus Winand, who mentioned that MySQL does not include the FILTER clause, she was able to add it using the query rewrite plugin. Then she took a closer look at how this was implemented, showing actual C code and how to work with MySQL's memory management. The FILTER query is rewritten by a regular expression to CASE WHEN syntax; the rewritten query is stored and a flag must be set.
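For illustration, the FILTER form and the CASE WHEN form it gets rewritten to (the orders table is hypothetical):

```sql
-- Standard SQL, not understood by stock MySQL:
SELECT COUNT(*) FILTER (WHERE status = 'paid') FROM orders;
-- What the rewrite plugin turns it into:
SELECT COUNT(CASE WHEN status = 'paid' THEN 1 END) FROM orders;
```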
Another example showed how comments in a query might be rewritten into three queries: setting a variable from the comment and resetting it after the query is executed. This included usage of the THD structure, which allows changing variables quite easily.
Then she showed how to add some new variables, and she mentioned it is possible to do much more, like locks, or even implement an httpd server inside mysqld, etc.

Do Software Collections still matter?: With containers, unikernels, and all the new hotness?

Langdon talked about Software Collections in the world of unikernels and containers. The answer was simply yes. The problem is breaking apps into microservices. You might want multiple versions of the same components inside a container as well. SCLs bring you simpler packaging than getting components by hand. They are good for migration or for logical separation of similar components.
Q: was it developed for OpenShift? Not only.
Q: SCLs in Fedora? Yes, it makes sense; there are the same reasons as in RHEL, but from the opposite point of view: we need older versions, or the versions that are in RHEL.
Q: Why not just use Dockerfiles? With RPM as a middle step it is easier to verify that packages work as expected.

Reproducible and Customizable Deployments with GNU Guix: Why "app bundles" get it wrong

Ludovic Courten talking about GNU Guix, because keeping SW environment polished is not easy. Distro is stateful system. Is it hopeless? Someone puts whole distro into application container as well. Using layers in docker is like using frozen pizza for cooking own flavored pizza.
Functional package management might be the answer. We have a complete picture of how packages are composed. The resulting path a package is installed into is a hash of all its dependencies.
We can resolve all deps exactly.
We can install things without root.
Installation and removal of packages may be done in one transaction with Guix.
Every transaction creates a generation, like a named state where we can go back to.
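The "resulting path is a hash of all deps" idea can be sketched in a few lines. This is my own toy illustration, not Guix code -- the real store hash covers the complete build recipe, not just dependency names:

```python
import hashlib

def store_path(name, version, deps):
    # The hash covers the package identity and all of its inputs,
    # so changing any dependency yields a brand new store path.
    h = hashlib.sha256()
    h.update(f"{name}-{version}".encode())
    for dep in sorted(deps):
        h.update(dep.encode())
    return f"/gnu/store/{h.hexdigest()[:32]}-{name}-{version}"

old = store_path("gimp", "2.8.16", ["glib-2.46", "gtk+-2.24"])
new = store_path("gimp", "2.8.16", ["glib-2.48", "gtk+-2.24"])
print(old != new)  # True: a rebuilt dep never overwrites the old build
```

Because the old path is never overwritten, rolling back to a previous generation amounts to flipping symlinks back.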
Additional package manager, but this one is cool.
GIMP's dependency graph is a big mess, not easy. The search path is huge, like really huge.
There is also GuixSD, similar to NixOS, which boots quite nicely from a generated file.
A recipe is similar to an RPM spec but is functional; a basic skeleton might be borrowed from Nix, but there are differences.


There are no other notes from Sunday, because my plane was leaving too early, but this year's FOSDEM was still great, better year after year. Can't wait for the next one already.

Sunday, January 31, 2016

CentOS Dojo 2016 Notes after lunch

This is the second part of my personal notes from this year's CentOS Dojo before FOSDEM; the first part includes notes from the talks until lunch. So after lunch we returned to our chairs and heard these interesting talks.

4. Quickstart. Contributing packages to a CentOS Special Interest Group

Brian talked about the basics of the CentOS build system, how to join SIGs and how to build a package in CBS for a particular SIG. Really a must-see for every newbie in a CentOS SIG.

5. Path from Software Collections to Containers for OpenShift

My talk about experiences with creating containers for OpenShift included two dozen tips from various fields. We looked closely at how to create a nice, OpenShift-friendly container image (yes, it was about Docker) for PostgreSQL and Python. These two examples covered the most important information one needs to create any similar database or application builder image. Later I went quickly through the list of images already out there, made by Red Hat or CentOS and based on Software Collections packages. In the end I shortly introduced the concept of Nulecule and what that project is intended for.

6. Getting started with kubernetes

Kubernetes was described alongside other orchestration systems, even Condor, whose development started back in 1987. What makes these technologies different from the PoV of potential developers is the language they are written in.
Mesos was quietly influenced by Borg, Google's internal cluster manager. It means the guys creating k8s know what they are doing, because we may see k8s as a new version of Mesos.
The basics of Kubernetes were explained clearly and with simple examples -- what pods, services and replication controllers do.
Atomic was presented as the solution to use k8s on CentOS.
For learning k8s use gh.c/skippbox/...8s
A Terraform plan for deploying k8s on AWS with Atomic Host and flannel.
A demo showed automatically created k8s nodes and let them scale in AWS.

7. Atomic Developer Bundle - Containerized development made easy

The guys behind the Atomic Developer Bundle showed why there is something like the ADB, stating the problems developers face today during application development, all backed by real user stories. They showed a Vagrant development environment running Docker secured by TLS, with the user able to connect from the host machine by evaluating the 'vagrant adbinfo' output that defines the development environment on the host.
Another example showed Eclipse running on the host, connecting to the remote Docker, which is a scenario that might work from any OS, even Windows. Although the demo did not work and we could see the live-demo Murphy's law in practice again, we got the point and I'm sure it worked fine just before the talk.
ADB supports the OpenShift and other orchestrations technologies as well.
Why CentOS? Because of the community, which can give the needed feedback. In the end a list of available links was mentioned and the community was called to action.
The future is so bright, I gotta wear shades.
Architecture is still a thing to be changed, they plan to make vagrantfiles easier.
Landrush does not work and some help is needed.
In the end the guys tried the demo again, but with poor Internet connectivity and Murphy's law working even better, we got only one step further.

Description of the talks and hopefully soon also slides and recording available at: https://wiki.centos.org/Events/Dojo/Brussels2016

CentOS Dojo 2016 Notes until lunch

This post is meant mostly as notes from the CentOS Dojo before FOSDEM 2016, but maybe someone else will find it useful as well.

1. State of the CentOS Project

Well, the Dojo didn't begin very well for me: my phone decided to turn off during the night, and since I had arrived after midnight, I easily overslept and thus missed the first talk, where Karanbir talked about where CentOS is today. Hopefully I'll see it from the recording.

2. Relax-and-Recover simplifies Linux Disaster Recovery

A presentation about ReaR (Relax-and-Recover), a solution recently also included in RHEL 7, presented by its author, Gratien, who supports these tools for a living. It makes recovery scenarios easy to handle, but it is not a backup solution. A live demo showed a recovery of a virtual machine in less than 3 minutes. Interesting stuff even for people without admin experience.

3. Desktop security, keeping the keys to the castle safe

Michael Scherer talked about security threats of various types, from stealing a computer and putting its RAM into a different computer (cold boot), through stealing a password in various ways, to the FireWire DMA attack.

A big portion of the talk was about protecting the operating system, and many tips were given for protecting various specific things. Phishing, password managers, firewalls and other technologies were described from an interesting point of view, mostly wrapped in the statement that they must be used properly to work properly.

What surprised me was that virus scanners were found insecure themselves: all tested scanners could be cracked by a file sent to be scanned, and the fact that they usually run with pretty big privileges makes them quite dangerous.

From the desktop world a few technologies were mentioned, but most of the focus went to browsers. Chrome was mentioned as good at some points, like separating processes, but generally taken as a proprietary thing by Michael, so not very good from a security PoV. Firefox seems the better alternative for those who trust the Mozilla Foundation, as Michael does, but only while keeping some rules, like removing Flash, not just disabling it. The same goes for Java, except where really necessary. No JavaScript, using the NoScript extension, which makes the web faster, but also often broken.
Remove CAs you do not trust.

Think about privacy in connection with surveillance. Adblock and CookieMaster, maybe even using Tor or Tails...

Local attacks mean a need to protect the laptop, and not only from colleagues: a screensaver with a password, not leaving a root shell open, using credential expiration, disabling ptrace via SELinux. Use passwords on SSH keys, and use smartcards to store keys, like a YubiKey.

Server-side security is about auditing and making it hard/slow to delete data; machine learning on events may help prevent attacks that are suspicious by their form, like a very fast root session, which is always suspicious.
Ideally, disable direct access to the data entirely and use backups; an IDS is a lot of work and has the same issues as antiviruses. A read-only OS like OSTree may work, but updates may be hard.
After this talk we moved to the lobby, where we found nice refreshments.

Description of the talks and hopefully soon also slides and recording available at: https://wiki.centos.org/Events/Dojo/Brussels2016

See also the notes after lunch.

Tuesday, November 24, 2015

Thoughts from PGConf.eu, PostgreSQL Europe Conference 2015, Day 3

This is a continuation and the last piece of notes from PGConf.eu in Vienna this year. First day notes available here and second day notes available here.

I started the 3rd day with KaiGai Kohei, who talked about pg_strom in his presentation called GPGPU Accelerates PostgreSQL - Unlock the power of multi-thousand cores. The pg_strom plugin is a combination of PostgreSQL and GPGPU, an early-stage project that aims at on-line native GPU code generation (the NVRTC run-time compiler, with caching, so compilation is done only once). It uses the 9.5 feature of a custom scan/join interface, which allows overriding part of the execution plan. Benchmarking showed approx. 5x speedup for queries over 100M rows; join profited more than aggregate. Another benchmark showed some anomalies where pg_strom was slower for some queries.

Ruben Gaspar Aparicio from CERN shared some experiences from his DBaaS work. Besides the DB-related information, some CERN statistics were simply unbelievable -- 6000+ magnets in the Large Hadron Collider, which makes it the largest machine in the world. It produces 0.5PB of data per second during proton collisions, with 100PB stored in total now, 250K CPU cores and 2M jobs per day. 8000 physicists require real-time access to all that LHC data from 160 computer centres in 35 countries.

Besides 36 PostgreSQL 9.2 and 9.4 instances, they run 226 MySQL and 8 Oracle instances in DB On Demand (DBOD), plus RAC with ~100 Oracle DBs. The system supports SSO and an overview of instances; DB providers offer different things, and their own databases on demand is how the physicists work. The DBaaS supports upgrades, monitoring, backup and HA. However, there is no DBA support, app support or vendor support: everybody gets their own database and must solve issues on their own. The system supports backups, configuration changes and PiTR (point-in-time recovery). Query statistics can be viewed intuitively by users who are no DB experts, and a log overview helps users fix problems themselves.
I was smiling at Ruben's quote that "Deleting is manual, because there are people who don't realize that when they delete something it is no more there."

New machines are installed by Puppet and run on physical servers plus OpenStack as the infrastructure. They already plan to use containers for running the DBs at CERN. They use Coverity, and apps are usually written in Perl 5.20. Travis CI is then used for running unit tests.

Correct mount options are very important for proper PostgreSQL performance (slave lagging).
Tools they use: pg_upgrade, pg_snapshot, pgBadger, pgreplay.
Future vision with containers (LXC): limiting CPU + memory, CLI access to instances.

Locked Up: Advances in Postgres Data Encryption by Vibhor Kumar summarized the possibilities of encryption in PostgreSQL -- we can either encrypt in applications, use the pgcrypto module, or use PGP encryption.

Gianni Ciolli talked about repmgr in a talk called Automate High Availability using repmgr 3. The tool supports cascading replication and clusterware, and uses Barman, pgBouncer and pg_rewind.

The talk The Elephants In The Room: Limitations of the PostgreSQL Core Technology by Robert Haas was, among other things, about dirty kernel page buffers when writing to disk, where PostgreSQL is able to do almost nothing. A need for a full-featured logical replication solution was mentioned as something we really need (not only the building blocks).
Database sizes get bigger and bigger -- horizontal scalability seems to be the answer, i.e. the focus is on better sharding.
Other areas to consider: better parallel queries, changes to storage format, direct I/O, built-in connection pooler.

Simon Riggs closed the event with a keynote where he mentioned pgconf.eu 2015 was the biggest PostgreSQL event worldwide ever.
Zero-downtime upgrade is something PostgreSQL upstream wants to achieve for upgrading from 9.4 to 9.5 -- using cross-version replication. BDR (bi-directional replication) is heading into Postgres, which may allow multi-master replication in 9.7.
Simon mentioned Stonebraker's vision, which said there would be either data-warehouse databases or transactional databases, but nothing would do both. PostgreSQL is proof that Stonebraker was wrong, because PostgreSQL is both. It was also mentioned that patched-PG projects like PostgreSQL-XL, which focus on scalability and will be based on 9.5 in 1Q2016, may be several times faster than 9.6.

Dave Page and Magnus Hagander had the final word, which belonged to 2ndQuadrant, a customer-funded company doing sustainable open-source development and providing Highly Available Support and RemoteDBA.

I already wrote that it was my first PGConf.eu, but since the three days were full of very interesting topics and the hallway was full of very interesting PostgreSQL fans, I'm sure it was not my last.

Monday, November 23, 2015

Thoughts from PGConf.eu, PostgreSQL Europe Conference 2015, Day 2

This is a continuation of notes from PGConf.eu in Vienna this year. First day notes available here and third day notes available here.

I started the second day with WAL, Standbys and Postgres 9.5 by Michael Paquier, a morning talk about archiving and warm standby and how PostgreSQL 9.5 may help there.

Then there was my talk called Database containers in the enterprise world. I shared our experiences with container preparation and how Red Hat sees containers (not as VMs, but rather as applications); all that was described at a lower level of detail, but with bigger context. Not many people were present (Robert Haas in the next room talking about parallelism in PostgreSQL was just too big a competition; bad luck in timing), but those who were there were already a bit familiar with containers, which turned into a very nice discussion and many questions in the end. The questions were mostly about the differences between Nulecule and Kubernetes templates or docker-compose, plus further questions about Nulecule itself, which I also roughly presented.

VACUUM, Freezing & Avoiding Wraparound was a talk by Simon Riggs about high concurrency, row visibility and deleted rows that cause bloat. He explained the principle of vacuum, how long-running queries influence it, how it actually works and why it is so important.

The duo Gabriele Bartolini and Marco Nenciarini talked about logging huge amounts of PostgreSQL data and analysing it, which deserves proper tooling in the first place. The guys from 2ndQuadrant shared their experiences with Elasticsearch + Logstash + Kibana (the ELK stack) in a talk called Integrating PostgreSQL with Logstash for real-time monitoring.

Linux tuning to improve PostgreSQL performance by Ilya Kosmodemiansky was a great overview of everything in the kernel we need to take into account when tuning PostgreSQL performance. Most of it was about setting proper sizes of memory pages and buffers.

Tomas Vondra talked once again at PGConf.eu, this time about PostgreSQL Performance on EXT4, XFS, F2FS, BTRFS and ZFS, after he ran several benchmarks of different filesystems. In short, ZFS was quite poor with default settings, but it can be improved by fixing the page size. Otherwise the results were quite comparable (XFS and EXT4 seemed best, though). BTRFS did quite badly in the read-write benchmark, while XFS and EXT4 were best again there.

The second day was closed by 90 minutes of lightning talks, which have already become a tradition at PostgreSQL conferences, and the strict 5-minute rule makes them really interesting.

Sunday, November 22, 2015

Thoughts from PGConf.eu, PostgreSQL Europe Conference 2015, Day 1

It was my first PGConf.eu and it was awesome. My bad that I waited so long to share some of the thoughts, but I'm fixing that now with a summary. Links to the presentations are located at https://wiki.postgresql.org/wiki/PostgreSQL_Conference_Europe_Talks_2015.

The keynote by Tamara Atanasoska called The bigger picture: More than just code mentioned that a community project is first about people; code comes second. It was also stressed how important it is to be open to newcomers and users. A great eye-opener for anybody involved in open source.

The Upcoming PostgreSQL 9.5 Features talk by Bruce Momjian covered these new features: INSERT ... ON CONFLICT (UPSERT), aggregate function enhancements, in-memory performance, JSON manipulation and operations, and improvements in foreign data wrappers, which can now take part in sharding. I really can't wait for PostgreSQL 9.5 GA.
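For anyone who hasn't seen the 9.5 UPSERT yet, the syntax is INSERT ... ON CONFLICT ... DO UPDATE. SQLite later adopted the same clause, so here is a runnable toy sketch (my own example, not from the talk):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE counters (name TEXT PRIMARY KEY, hits INTEGER)")

# One statement that inserts the row, or bumps it if it already exists --
# no SELECT-then-INSERT race, which is the whole point of UPSERT.
upsert = ("INSERT INTO counters VALUES (?, 1) "
          "ON CONFLICT(name) DO UPDATE SET hits = hits + 1")
conn.execute(upsert, ("home",))
conn.execute(upsert, ("home",))

hits = conn.execute(
    "SELECT hits FROM counters WHERE name = 'home'").fetchone()[0]
print(hits)  # 2
```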

Dockerizing a Larger PostgreSQL Installation: What could possibly go wrong? by Mladen Marinović was really something I was looking for, because containers are now topic #1 for me. Mladen demonstrated using containers mostly as VMs; the basics of Docker were also introduced, since only half of the room was familiar with it. Problems with using the cache during `docker build` and with locales inside containers were mentioned, but the solutions seemed rather hacky to me (using `date` to trick the daemon instead of the straightforward --no-cache). A problem with sending signals to processes was also mentioned, which is something we looked at as well during the creation of containers for OpenShift.

Mladen uses Ansible for building images and uses several containers for specific actions -- e.g. a separate container for tools (dump/load). They support replication -- hot_standby, restore_command (run in a separate container) -- and run on two physical servers. Backups (pg_basebackup + compression) according to a retention policy run in yet another container.

It was also mentioned that taking a backup from a slave is not that easy. Monitoring that all containers are alive is done by HA software. Problems with the OOM killer were also mentioned -- the system may kill your container processes; the solution was to cap max_connections. A problem with a freezing server was then solved by using a timeout for every command. Transparent huge pages were mentioned as something not to use -- use normal huge pages instead. Finally, upgrades to new major versions were mentioned as always being a tough point.
See more at: https://bitbucket.org/marin/pgconfeu2015/src

DBA's toolbelt by Kaarel Moppel was more a list of tools to look at when you are serious about PostgreSQL administration, so just repeating the list may be interesting for someone:
  • docs + pgconfig for configuring
  • pgbench + pgbench-tools for benchmarking
  • pg_activity (top-like) + pg_view + pgstats + pg_stat_statements + pgstattuple + plugins for monitoring systems (Nagios) + postgresql-metrics (Spotify) for monitoring
  • pgBadger + pg_loggrep for log analyzing
  • postgresql-toolkit - a Victorinox (Swiss Army knife) for the PostgreSQL DBA
  • pg-utils, pgx_scripts (https://github.com/pgexperts/pgx_scripts), acid-tools
  • pg_locks, pg_stat_activity (locks)
  • pgObserver, pgCluu, many others, based on what we need to do
  • pgadmin4 on the way (backend and web frontend)
  • developer: pgloader, pg_partman, pgpool-II, PL/Proxy
Managing PostgreSQL with Ansible by Gulcin Yildirim was more an introduction to Ansible.

Let’s turn your PostgreSQL into columnar store with cstore_fdw by Jan Holčapek introduced an interesting plugin that can convert the classic row-based PostgreSQL storage into a column-store database by utilizing the foreign data wrapper concept. In some cases this may help performance a lot, once ready.

Performance improvements in 9.5 and beyond by Tomas Vondra was not only an interesting insight into particular areas PostgreSQL hackers look at, but also nice motivation to consider upgrading to 9.5. For example, a sorting speed-up of up to 3-4x compared to 9.4 in some cases is something I wouldn't really expect. Another comparison, of BRIN vs. BTREE indexes, showed that performance is quite similar, but BRIN is much, much smaller. Another set of graphs showed how parallel scan can cut select times by up to half.

Notes from second day are here and from third day are here.

Saturday, November 21, 2015

How we've fixed upgrade path for CentOS 6 Software Collections

A short message for those who don't have time:
Software Collections for CentOS 6 are ready for upgrade from older rebuilds.

Now the full story for those who care. Btw. this is all related to work done by the SCLo SIG group that is part of CentOS (read more at http://wiki.centos.org/SpecialInterestGroup/SCLo).

A bit of history to begin with. Shortly after the first RHSCL 1.0 release, CentOS rebuilds were prepared, and since then they have been available under:

However, keeping these rebuilds in sync with the RHSCL content hasn't been an easy task. With the introduction of Java packages in the collections, it became even trickier, which means these collections were not updated for a long time. With that said, one would expect there would be no problem with the upgrade path -- in other words, that the new RPMs the SCLo SIG group is about to release would update the older RPMs smoothly.

Well, not always. The original RPMs used ".el6.centos.alt" as the %dist tag, while the new builds use just ".el6", which results in cases where python27-python-bson-2.5.2-4.el6.centos.alt.x86_64 > python27-python-bson-2.5.2-4.el6.x86_64, even though those packages have the same Release tag in the RPM spec. That obviously means the packages won't update smoothly.

The solution is quite simple in this case -- use a higher Release in the RPM spec. In some packages this was already done, because some packages have received updates since their original inclusion. In the other cases we solve it by adding an ".scX" (X is a number) suffix to the Release tag. ".scX" was chosen deliberately, since ".scX.el6" is higher (alphabetically) than ".el6".

Btw. for cases where we need to build a package several times before the final build (bootstrapping), we use the suffix ".bsX", which means we can build the final package without any Release suffix, because ".bsX.el6" < ".el6".
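The ordering claims above can be sanity-checked with a toy model of how RPM compares Release tags (my own simplification of rpmvercmp, not the real implementation): split the tag into numeric and alphabetic segments, compare them pairwise, and let the longer tag win on a tie:

```python
import re

def segments(release):
    # Split a Release tag into rpmvercmp-style segments:
    # digit runs compare numerically, letter runs alphabetically.
    return [int(s) if s.isdigit() else s
            for s in re.findall(r"\d+|[a-zA-Z]+", release)]

def newer(a, b):
    # True if Release tag a sorts higher than b (simplified sketch).
    for x, y in zip(segments(a), segments(b)):
        if x == y:
            continue
        if isinstance(x, int) != isinstance(y, int):
            return isinstance(x, int)  # rpm ranks numbers above letters
        return x > y
    return len(segments(a)) > len(segments(b))  # longer tag wins a tie

print(newer("4.el6.centos.alt", "4.el6"))  # True -- the broken case
print(newer("4.sc1.el6", "4.el6"))         # True -- the ".scX" fix
print(newer("4.bs1.el6", "4.el6"))         # False -- bootstrap stays older
```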

Anyway, this post was meant to let you know that upgrading of el6 packages from originally built RPMs is something we care about.

To verify it works, I installed all the packages from the original repository and then ran "yum update", which resulted in a proper update of all packages. I took that as proof it should work fine in your case as well. If there are still some issues, let us know.

Enjoy Software Collections on CentOS!

Friday, February 13, 2015

Thoughts and notes from Prague PostgreSQL Developers Day 2015

The day started with news in PostgreSQL 9.4, presented by Tomáš Vondra, now working for 2ndQuadrant. He talked about improvements in replication, GIN indexes, numeric aggregates, refreshing materialized views, altering system variables directly from the daemon, ...

Then Pavel Stěhule from GoodData gave us a deeper review of storing non-structured data within PostgreSQL (from historic implementations to the great new JSONB in 9.4). It was quite interesting to see that even traditional relational DB users think seriously about NoSQL solutions like MongoDB. It seems even advocates of traditional SQL solutions see some advantages in the new ways to store data (NoSQL concepts).

It was also mentioned that NoSQL is no longer understood as not-SQL, but more often as not-only-SQL, and that while relational databases implement features to get closer to NoSQL, NoSQL databases implement features to get a bit closer to traditional SQL solutions (e.g. better ACID, reliability, ...).
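To give a flavour of what querying a document stored in a column looks like: against a JSONB column in PostgreSQL you would write `data->>'sensor'`; below is a runnable analogue using SQLite's json_extract through Python (my own toy example, assuming a SQLite build with the JSON1 functions):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (data TEXT)")  # a JSONB column in Postgres
conn.execute("INSERT INTO events VALUES (?)",
             ('{"sensor": "t1", "readings": [21.5, 22.0]}',))

# PostgreSQL equivalent:  SELECT data->>'sensor' FROM events;
sensor = conn.execute(
    "SELECT json_extract(data, '$.sensor') FROM events").fetchone()[0]
print(sensor)  # t1
```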

Vratislav Beneš from OptiSolutions had a talk about unstructured and sensor data, and generally about big-data challenges. He presented a comparison between PostgreSQL 9.4 (with JSONB) and MongoDB for the same type of work with unstructured data. His testing showed no big differences in performance; PostgreSQL is, however, quite a bit better in space utilization (~3 times less data on disk). He closed his talk with the thought that everybody should choose a database carefully, according to the specific use case (e.g. whether we care about other NoSQL features like out-of-the-box sharding/replication, or are fine with a somewhat more heavy-footed but more reliable SQL solution).

Aliaksandr Aliashkevich gave a quite basic overview of sharding solutions for PostgreSQL; just read his shards.io site for more information, the presentation had basically similar content.
A similar overview, this time of other open-source tools (dumping, upgrading, partitioning, logical replication), was given by Keith Fiske, the actual author of the tools. They are all available under his username on GitHub, so just look there or check his site keithf4.com for more information.

Marc Balmer from micro systems spoke about securing PostgreSQL applications, where he emphasized the need to limit access not only at the application level, but also at the database level. He didn't omit a basic overview of the main database vulnerabilities with some specific examples (like SQL injection), but most of the presentation was about ways to protect data within a database from being abused by users who shouldn't have access to it. I think nobody who was there and pays some attention to security will ever use one superuser role to access a DB again. Hopefully the slides will soon be at the p2d2 site.

Petr Jelínek from 2ndQuadrant talked about a feature heading to PostgreSQL 9.next (not sure yet if it makes 9.5) -- BDR, Bi-Directional Replication. This basically implements multi-master capabilities and is already a working solution, which is heading patch by patch to PostgreSQL upstream. We know upstream is careful about any new feature, so even in this case it goes rather slowly, because upstream needs to be sure it is good for everybody (not only for specific use cases).

Štěpán Bechynský showed Amazon's AWS in practice, especially the steps to get a PostgreSQL machine in the cloud (basically the PaaS variant). It must have been interesting for anybody who hasn't seen how cloud provisioning works from the user's point of view, but since I had already seen OpenStack in action, I didn't actually learn much new. I also heard from other attendees that some specific experiences were missing -- like performance experiences, the specific issues someone new to the cloud world runs into, etc.

Since PostgreSQL now supports foreign data wrappers, a couple of interesting wrappers are already available. They add new functionality to the daemon itself and remind me a bit of MySQL's storage engine architecture. Jan Holčapek introduced one of those wrappers -- cstore_fdw, which adds columnar storage capabilities to PostgreSQL.

It was interesting to see that even for some uncomplicated queries the EXPLAIN command showed several times fewer IO operations compared to native PostgreSQL storage. For more complicated use cases that may be even better, since going through columns directly for an aggregation is much more effective than reading whole rows, with most of the data read unnecessarily for the specific query.
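Why a column scan touches so much less data for an aggregate can be shown with a toy model (my own illustration, not cstore_fdw internals):

```python
# Toy row store: each record is one tuple, so summing a single column
# still drags every other column through memory.
rows = [(i, "name%d" % i, i * 2) for i in range(1000)]
total_row = sum(r[2] for r in rows)

# Toy column store: each column is contiguous, so the same aggregate
# reads exactly one array and nothing else.
cols = {
    "id": list(range(1000)),
    "name": ["name%d" % i for i in range(1000)],
    "val": [i * 2 for i in range(1000)],
}
total_col = sum(cols["val"])
print(total_row == total_col)  # True -- same answer, far fewer bytes touched
```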

Even though this particular foreign data wrapper doesn't support insert/update/delete yet, it seemed very promising. It was also interesting when Jan asked the audience which column databases they use. Since MonetDB was mentioned more than once, it seems to be a good candidate to be packaged into Fedora. Who is volunteering for this?