Tuesday, February 07, 2017

FOSDEM 2017 - personal notes from Saturday

As always, do not expect the following text to be a nice reading, it's more a sack of notes for further reference. And yeah, there was also the CentOS Dojo one day before and second date of Fosdem the day after.

Toooo big FOSDEM

First weekend in February means that several thousands of free and open source fans get to Belgium capital, Brussels. First visitor's idea about Fosdem is that it is year to year more and more crowdy. Talks in smaller rooms are almost always full and one needs to sneak inside in advance to get to some interesting talk. On the other hand, number of interesting talks is not decreasing, so it is still worth traveling there, especially when you meet friends from last year or other conferences. For the new comers, at least there was 15 degrees more than in Brno that time. All about the event is available at https://fosdem.org/2017.

Optimizing MySQL, MariaDB already in compiler

Optimizing SQL without touching my.cnf talk by Maxim Bublis, a Dropbox engineer opened MySQL devroom. He spoke about billions of files stored daily, PB of metadata, and EB of raw data.
They use sysbench to test performance and Bazel.build for building own flavor of packages, because profile-guided optimization (PGO) requires rebuild.
It's not a good idea to use benchmarking for unit tests, since those test corner cases.

Maxim mentioned concrete options they use in GCC for PGO: --fprofile-generate --fprofile-correction
Clang with PGO is a bit more successful, it was interesting that GCC 4.9 was worse than 4.6, 5.4 even much more worse, but on the other hand very good with PGO builds.
Link-time optimization, done by linker instead of compiler, is not supported in MySQL so far. However, they start coding on it.
Totally, they achieved 20% improvement, and have many further ideas to future.
More info: https://fosdem.org/2017/schedule/event/opti_mysql/

Sysbench talk by Alexey Kopytov started from history, that it originally was a simple tool, then became more complicated to be usable in general use cases, so Lua was chosen as scripting language. But then a real surprise came -- Alexey was happy to announce the first 1.0 release after 10 years of development.
Option --mysql-dry-run can measure tps in MySQL without need to know server structure. Thanks to LuaJIT and xoroshiro128+ the tool is several times faster. Performance also benefits from having no mutexes and no shared counters -- ConcurencyKit is used to avoid those, so stats are counted per thread.

Sysbench can be used in shebangs as well. Supporting all cmd-line options as global vars was troublesome, version 1.0 can validate them better.
Arbitrary C functions can be called from lua scripts, so no binary execution is needed. It's possible to use more connections in one thread. In the new version we can also see what previously was hidden to user - latency histograms. Users also may find useful to ignore some particular errors, especially in distributed environment. After long considerations Windows support was dropped. Live long and prosper, sysbench!
More info: https://fosdem.org/2017/schedule/event/sysbench/

When one instance is not enough

Spark for multi-procesing explained by Sveta Smirnova from Oracle gave an overview about how Spark differs from MySQL, why parallelism makes full table scan even better than indexing, which is missing in Spark.
More info: https://fosdem.org/2017/schedule/event/mysql_spark/

Group replication is a plugin, which was added to 5.7 version of MySQL pretty recently. It allows to automate failover in single primary mode, provides fault tolerance, multi-master updates on any node, enables group reconfiguration, etc. So, it's a pretty big deal added in the minor version, that for sure deserves some closer look.

The principle is simple, as Alfranio explained -- the plugin checks conflict, on conflict the transaction rolls back, else it gets committed.
Then he explained the concurrency problem solution, how more entities can agree on some fact, using paxos, which is used in group replication.
More info: https://fosdem.org/2017/schedule/event/mysql_gr_journey/

Which proxy for MariaDB?

Colin Charles went through existing proxy solutions we have today for MySQL and clones.
MySQL Proxy is middle part between client and server, with Lua interpretter to rewrite queries, add statements, filter results, etc. It is not much active now though.

MaxScale is similar, fully plug-able architecture operating on level 7, allowing logging, or working with other back-ends. Schema-based sharding, binlog server, or query re-writing also supported in MaxScale. Very usable for Galera cluster and Kafka backend also exists. MaxScale has first forks by booking and airbnb (connection pooling), because they use Ruby and it is then necessary for them to pool connections. Colin also mentioned the license change, when MaxScale 2.0 was released under BSL.

MySQL Router - fully GPLv2, allows transparent routing, plugin interface via MySQL Harness, can do Failover, load ballancing (at this point Colin suggested to see https://www.youtube.com/watch?v=JWy7ZLXxtZ4). MySQL Router does not work with MariaDB server though.

ProxySQL - stable, brings HA to DB topology, connection pooling and multiplexing supported, R/W split and sharding, seamless failover, Query cashing, rewriting, ...
SSL encryption not that good as in MaxScale, also no binlog router. Maxwell's Daemon allows to connect ProxySQL with Kafka.
http://vitess.io/ is also intersting, it allows to scale MySQL as well.
http://proxysql.com/compare shows comparison with other tools.
More info: https://fosdem.org/2017/schedule/event/mysql_proxy_war/

RocksDB rocks even in MariaDB

Last two talks were dedicated to the new engine for MySQL/MariaDB, built on top of RocksDB, called MyRocks. It provides better write efficiency, good enough read, best space efficiency, effective with SSD. In the Facebook testing for example, it uses 50% of spaced used by compressed innodb, or 25% of space used by uncompressed InnoDB, and only 10% of disk operations.
Performance problems are still not fully fixed, but goal for performance is to be as good as InnoDB. As for stability, what can be better prove that it is already stable, than the fact, that it is used already in Facebook for storing user data in production.
Goal is to integrate to MariaDB and Percona upstreams and expand features in the new engine.
It all is more visible when using small rows, because the per-entry overhead is very small comparing to innodb.
Bloom filter helps when non-existent IDs are read (index tree does not need to be read).

Currently it is hard to debug, has issues when using long range scans or when using large transactions or group commit.
For performance check, Linkbench is used, trx time went from 6ms to 1ms when writing, it went worse for reading a bit, but saving time during writes we have more for reads.
More info: https://fosdem.org/2017/schedule/event/myrocksdb/, https://fosdem.org/2017/schedule/event/myrocks_at_fb/

Really big data in many dimensions

Peter Bauman, a database expert from Jacob's University talked about datacubes on steroids with ISO Array SQL. Their long term research helped adding multidimensional arrays to SQL standard. Big Data can be structuralized in many ways, sets, trees, graphs or arrays. The idea of arrays in SQL is similar to other languages -- a column can simply be composed of more values. That should get to the standard later this year and it not only allows to access particular values, but also do other complicated operations, like matrix multiplication, histograms, and other operations that can be composed from those. Yeah, a real algebra in relational databases.
The research can be seen in practice on planetserver.eu, which includes 20TB of data. Generally we must count that the data will be bigger than memory if we speak about BigData..
Distribution of data gets crazy, since they not only fetch data from datacenters far away, but also want to put a database to the sattelite, and querying it from Earth.
They don't use hadoop, because it doesn't know about arrays, that might be TB long (which is the case when working with bit bitmaps for example).
Comparing their Rasdaman, Sparc and Hive, when getting more than one pixel, it didn't look pretty well for Hive and especally not for Sparc.
As for number of dimensions they look at several types of data -- it begain with 1D from sensors, and they went to 4D when modelling athmosphere.
More info: https://fosdem.org/2017/schedule/event/datacubes/

Live Patching of Xen and CI in Ubuntu

Ross Lagerwall from Citrix XenServer talked about Xen and live patching, which is needed because live migration is too slow and makes host unusable during that time. Xen looks at whether a payload needs to be applied when it is possible, so no patched code is on the stack. What they do then is putting jump istruction to the beginning of the function while stopping IRQs and WP.
The payloads are created by kpatch tool from fully build Xen and running some kind of diff tool, while picking just the changed functions. When applying, some non-static data can be handled, while touching initialized data is prevented. Definitely worth watching.
More info: https://fosdem.org/2017/schedule/event/iaas_livepatxen/

Martin Pitt talked about Ubuntu testing infrastructure. Developers must be responsible for writing tests, if the testing should be done in meaningful way.
QE should be responsible for keeping infrastructure working and for consultancy.
In the end, when the testing infrastructure really helps is the upstream testing, where test results are added for every pull request (systemd shown as example).
More info: https://fosdem.org/2017/schedule/event/distribution_ci/

So that was the first day of the Fosdem 2017.