April 16, 2015

                                                    Notes on indexes and index-like structures

                                                    Indexes are central to database management.

                                                    Perhaps it’s time for a round-up post on indexing.

                                                    1. First, let’s review some basics. Classically:

                                                    2. Further:

                                                    April 10, 2015

                                                    MariaDB and MaxScale

                                                    I chatted with the MariaDB folks on Tuesday. Let me start by noting:

                                                    The numbers around MariaDB are a little vague. I was given the figure that there were ~500 customers total, but I couldn’t figure out what they were customers for. Remote DBA services? MariaDB support subscriptions? Something else? I presume there are some customers in each category, but I don’t know the mix. Other notes on MariaDB the company are:

                                                    MariaDB, the company, also has an OEM business. Part of their pitch is licensing for connectors — specifically LGPL — that hopefully gets around some of the legal headaches for MySQL engine suppliers.

                                                    MaxScale is a proxy, which starts out by intercepting and parsing MariaDB queries.

                                                    November 30, 2014

                                                    Thoughts and notes, Thanksgiving weekend 2014

                                                    I’m taking a few weeks defocused from work, as a kind of grandpaternity leave. That said, the venue for my Dances of Infant Calming is a small-but-nice apartment in San Francisco, so a certain amount of thinking about tech industries is inevitable. I even found time last Tuesday to meet or speak with my clients at WibiData, MemSQL, Cloudera, Citus Data, and MongoDB. And thus:

                                                    1. I’ve been sloppy in my terminology around “geo-distribution”, in that I don’t always make it easy to distinguish between:

                                                    The latter case can be subdivided further depending on whether multiple copies of the data can accept first writes (aka active-active, multi-master, or multi-active), or whether there’s a clear single master for each part of the database.

                                                    What made me think of this was a phone call with MongoDB in which I learned that the limit on number of replicas had been raised from 12 to 50, to support the full-replication/latency-reduction use case.

                                                    2. Three years ago I posted about agile (predictive) analytics. One of the points was:

                                                    … if you change your offers, prices, ad placement, ad text, ad appearance, call center scripts, or anything else, you immediately gain new information that isn’t well-reflected in your previous models.

                                                    Subsequently I’ve been hearing more about predictive experimentation such as bandit testing. WibiData, whose views are influenced by a couple of Very Famous Department Store clients (one of which is Macy’s), thinks experimentation is quite important. And it could be argued that experimentation is one of the simplest and most direct ways to increase the value of your data.
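For readers who haven't met bandit testing, here is a minimal sketch of one common variant, epsilon-greedy, in Python. It is an illustration under stated assumptions, not anything WibiData or its clients described to me: the function name, the simulated response rates, and the parameter values are all hypothetical.

```python
import random


def epsilon_greedy(true_rates, rounds=10_000, epsilon=0.1, seed=0):
    """Simulate epsilon-greedy bandit testing over a set of offers.

    true_rates: hypothetical per-offer response probabilities, which a
                real experiment would not know in advance.
    Returns (pulls, wins): how often each offer was shown, and how
    often it got a response.
    """
    rng = random.Random(seed)
    pulls = [0] * len(true_rates)
    wins = [0] * len(true_rates)
    for _ in range(rounds):
        if 0 in pulls or rng.random() < epsilon:
            # Explore: show a random offer (always, until each offer
            # has been shown at least once).
            arm = rng.randrange(len(true_rates))
        else:
            # Exploit: show the offer with the best observed rate so far.
            arm = max(range(len(true_rates)), key=lambda i: wins[i] / pulls[i])
        pulls[arm] += 1
        # Simulate whether the customer responded to the offer.
        if rng.random() < true_rates[arm]:
            wins[arm] += 1
    return pulls, wins


if __name__ == "__main__":
    pulls, wins = epsilon_greedy([0.02, 0.05, 0.20])
    for i, (p, w) in enumerate(zip(pulls, wins)):
        print(f"offer {i}: shown {p} times, {w} responses")
```

The point of the design is that traffic shifts toward whichever offer is performing best while a fixed fraction (epsilon) keeps testing the alternatives, so new information keeps arriving even after a favorite has emerged.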

                                                    3. I’d further say that a number of developments, trends or possibilities I’m seeing are or could be connected. These include agile and experimental predictive analytics in general, as noted in the previous point, along with:

                                                    August 31, 2013

                                                    Tokutek’s interesting indexing strategy

                                                    The general Tokutek strategy has always been:

                                                    But the details of “writes indexes efficiently” have been hard to nail down. For example, my post about Tokutek indexing last January, while not really mistaken, is drastically incomplete.

                                                    Adding further confusion is that Tokutek now has two product lines:

                                                    TokuMX further adds language support for transactions and a rewrite of MongoDB’s replication code.

                                                    So let’s try again. I had a couple of conversations with Martin Farach-Colton, who:

                                                    The core ideas of Tokutek’s architecture start:

                                                    April 22, 2013

                                                    Notes on TokuDB and GenieDB

                                                    Last week, I edited press releases back-to-back-to-back for three clients, all with announcements at this week’s Percona Live. The ones with embargoes ending today are Tokutek and GenieDB.

                                                    Tokutek’s news is that they’re open sourcing much of TokuDB, but holding back hot backup for their paid version. I approve of this strategy — “doesn’t lose data” is an important feature, and well worth paying for.

                                                    I kid, I kid. Any system has at least a bad way to do backups — e.g. one that involves slowing performance, or perhaps even requires taking applications offline altogether. So the real points of good backup technology are:

                                                    GenieDB is announcing a Version 2, which is basically a performance release. So in lieu of pretending to have much article-worthy news, GenieDB is taking the opportunity to remind folks of its core marketing messages, with catchphrases such as “multi-regional self-healing MySQL”. Good choice; indeed, I wish more vendors would adopt that marketing tactic.

                                                    Along the way, I did learn a bit more about GenieDB. In particular:

                                                    I also picked up some GenieDB company stats I didn’t know before — 9 employees and 2 paying customers.

                                                    April 14, 2013

                                                    Introduction to Deep Information Sciences and DeepDB

                                                    I talked Friday with Deep Information Sciences, makers of DeepDB. Much like TokuDB — albeit with different technical strategies — DeepDB is a single-server DBMS in the form of a MySQL engine, whose technology is concentrated around writing indexes quickly. That said:

                                                    *For reasons that do not seem closely related to product reality, DeepDB is marketed as if it supports “unstructured” data today.

                                                    Other NewSQL DBMS seem “designed for big data and the cloud” to at least the same ex