> MySQL Cluster is a scalable, real-time in-memory, ACID-compliant transactional database
To what degree can something be called durable if it's in-memory?
> It is possible to choose how to store data; either all in memory or with some on disk (non-indexed data only). […] Disk-based data can be used to store data with less strict performance requirements, where the data set is larger than the available RAM.
It's great to have all the concepts spelled out in this article, particularly the way nodes act as transaction managers when needed, synchronously replicate to a "buddy", and have management capabilities for recovering from partition events; however, it's almost misleading to use the MySQL name: this appears to be first and foremost an in-memory solution with some afterthought given to disk-based durability.
It's durable via checkpointing and redo logging; this is the default behavior.
You can read more details here: http://www.clusterdb.com/mysql-cluster/how-can-a-database-be...
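If it helps, here's a minimal single-node sketch of the general checkpoint + redo-log idea in Python (the file format and names are mine, purely illustrative; NDB's actual implementation is distributed and far more involved):

    import json, os

    class InMemoryStore:
        def __init__(self, log_path="redo.log", ckpt_path="checkpoint.json"):
            self.data = {}                       # the in-memory table
            self.log_path, self.ckpt_path = log_path, ckpt_path
            self.log = open(log_path, "a")

        def put(self, key, value):
            # Durability first: the change reaches disk before it is applied.
            self.log.write(json.dumps({"op": "put", "k": key, "v": value}) + "\n")
            self.log.flush()
            os.fsync(self.log.fileno())          # survives power loss from here on
            self.data[key] = value               # then update RAM

        def checkpoint(self):
            # Snapshot the state, then truncate the redo log; recovery only
            # needs the checkpoint plus whatever was logged after it.
            tmp = self.ckpt_path + ".tmp"
            with open(tmp, "w") as f:
                json.dump(self.data, f)
                f.flush()
                os.fsync(f.fileno())
            os.replace(tmp, self.ckpt_path)      # atomic rename
            self.log.close()
            self.log = open(self.log_path, "w")  # safe to truncate now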
And I agree that using the MySQL name is a bit misleading; it would perhaps be better to call it NDB (Network DataBase), since NDB only uses MySQL as an API for SQL access to the data.
I don't know how the MySQL one works, or whether it provides disk-based durability at all, but the way Microsoft SQL Server does it is explained in Section 7 (page 7) of [1].
They use a log stream and a checkpoint stream that are written to disk.
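In miniature, the recovery half of such a log-plus-checkpoint scheme looks something like this (a toy in the same format as the sketch upthread, not SQL Server's actual mechanism):

    import json, os

    def recover(log_path="redo.log", ckpt_path="checkpoint.json"):
        data = {}
        if os.path.exists(ckpt_path):
            with open(ckpt_path) as f:
                data = json.load(f)              # state as of the last checkpoint
        if os.path.exists(log_path):
            with open(log_path) as f:
                for line in f:
                    rec = json.loads(line)       # re-apply post-checkpoint writes
                    if rec["op"] == "put":
                        data[rec["k"]] = rec["v"]
        return data                              # full in-memory state rebuilt

With both streams fsync'd, nothing is lost on power failure; you just pay a replay cost at restart.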
VoltDB replicates to k nodes so you can choose the level of durability. Combined with UPSes in separate parts of the datacenter, it's durable in some senses. (I'd be more concerned about a kernel or app bug that crashes the system, if there weren't a "save RAM on unclean boot" mechanism.)
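To make the k idea concrete, a toy version (assuming k counts extra copies, so a write needs k+1 holders before it's acknowledged; the class is invented, not VoltDB's API):

    class KSafeCluster:
        def __init__(self, nodes=3, k=1):
            assert k < nodes
            self.k = k
            self.replicas = [{} for _ in range(nodes)]  # stand-ins for real nodes

        def put(self, key, value):
            acks = 0
            for replica in self.replicas:
                replica[key] = value   # real systems do this over the network
                acks += 1
                if acks >= self.k + 1:
                    # Enough copies exist that k node failures lose nothing;
                    # the remaining replicas would be updated asynchronously (elided).
                    break
            return acks

    cluster = KSafeCluster(nodes=3, k=1)
    print(cluster.put("a", 1))  # -> 2 acks before the write counts as durable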
This only applies to the commercial version, though. The open source one ditched that feature.
VoltDB, like MySQL Cluster and several other in-memory systems, uses snapshot checkpoints and logging to maintain power-loss protection. VoltDB is a bit different because of how it uses determinism + logical logging, but the point is that most of these systems don't have to lose data on power loss.
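The determinism + logical logging point fits in a few lines (invented names, a sketch of the concept rather than VoltDB's command-log format): log the procedure name and arguments instead of the changed bytes, and rebuild state by re-running the same deterministic procedures in the same order.

    import json

    PROCEDURES = {
        "credit": lambda db, acct, amt: db.__setitem__(acct, db.get(acct, 0) + amt),
    }

    def run(db, log, proc, *args):
        log.append(json.dumps([proc, args]))   # tiny record: just name + args
        PROCEDURES[proc](db, *args)            # must be deterministic to replay

    def replay(log):
        db = {}
        for line in log:
            proc, args = json.loads(line)
            PROCEDURES[proc](db, *args)        # same commands, same order, same state
        return db

    db, log = {}, []
    run(db, log, "credit", "alice", 100)
    run(db, log, "credit", "alice", -30)
    assert replay(log) == db == {"alice": 70}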
Sure, but those are mostly backup, right? If a system could log to disk at a fast enough pace, it wouldn't really be a main-memory DB, right? It'd be essentially the same as a normal DB with a really good RAM cache. I've got to imagine there are some VoltDB users going fast enough that logging isn't 100% loss-proof.
Actually, no. The key innovation is that data is stored in a memory-friendly way, then logged to disk in an append-only, log-friendly way. VoltDB can be fast and persistent; the catch is that your data must fit in memory.
Oh really? I'm not sure how I missed that all this time. Is it a newer feature? I don't recall it in the H-Store paper or in any of the docs I read. In fact, I recall sync vs. async logging being a decision/tradeoff. And there used to be a rather large emphasis on choosing a good k factor, and it certainly seemed to be about durability. Otherwise, it's just another normal HA thing, no big deal? As in, k = 0 is a perfectly fine setup, if logging is enabled?
Command logging has worked this way since VoltDB 2.0. Sync mode needs hardware that can support more frequent fsyncs, such as a disk controller with a BBWC (does the job and is inexpensive) or a server-class SSD, and should be configured to write the command log and snapshots to separate spindles. Async mode is the default setting so that, out of the box, VoltDB will perform as expected in the cloud, on a VM, or on a laptop.
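Roughly, the difference between the two modes comes down to when fsync happens relative to the acknowledgement (an invented class to show the shape, not VoltDB's implementation):

    import os, time, threading

    class CommandLog:
        def __init__(self, path, mode="async", interval=0.1):
            self.f = open(path, "a")
            self.mode = mode
            self.lock = threading.Lock()
            if mode == "async":
                # Background flusher: bounded-loss window, far fewer fsyncs.
                t = threading.Thread(target=self._flusher, args=(interval,), daemon=True)
                t.start()

        def append(self, record):
            with self.lock:
                self.f.write(record + "\n")
                if self.mode == "sync":
                    self.f.flush()
                    os.fsync(self.f.fileno())  # durable before the client is acked
            # async mode returns immediately; the flusher syncs shortly after

        def _flusher(self, interval):
            while True:
                time.sleep(interval)
                with self.lock:
                    self.f.flush()
                    os.fsync(self.f.fileno())

Sync mode is why the BBWC/SSD advice matters: every commit waits on an fsync.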
I'm sorry, but MySQL Cluster is pure refined salty horseshit. Nothing more. We're at the office right now trying to recover our production data from its filthy hands.
I had MySQL clustering in the lab with some serious live traffic for about a month. Unless you have a team of DBAs whose whole job is to monitor it and keep it going, MacGyver-style, seriously forget it.
There are better options out there, with fewer headaches!
A modern, properly implemented database engine should be bottlenecked on network saturation for these types of workloads, even if backed by disk storage, never mind in-memory. These workloads are trivially distributable, so scale-out is expected unless you are doing something really wrong.
In that sense, similar "look how fast my simple lookups are" benchmarks published by other database vendors are really a function of their hardware budget for the benchmark, assuming the implementation is competent. Queries with complex constraints or joins would be a more interesting indication of implementation scalability and performance.
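Back-of-envelope, with assumed numbers, for why simple-lookup benchmarks are mostly a hardware (network) measurement:

    # All figures below are assumptions for illustration, not vendor numbers.
    link_bytes_per_sec = 10e9 / 8        # one 10 GbE NIC ~= 1.25 GB/s
    bytes_per_lookup = 500               # assumed request + reply on the wire
    print(f"{link_bytes_per_sec / bytes_per_lookup:,.0f} lookups/s per node")
    # -> 2,500,000: a competent engine hits the wire limit long before CPU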
I'd be interested to see a benchmark test suite for ACID-compliant databases with, as you mentioned, complex constraints and joins. Get as close to real-world workloads as possible.
I don't want to see the quarter mile; I want to see you road rally.
Note that the 2PC in Ndb is not quite the same as that used in the usual "2PC is blocking" arguments:
1. The transaction coordinator (TC) role is automatically taken over if the coordinator fails during commit. The new coordinator uses the surviving participants' states to quickly decide on commit or abort so that locked resources are released (a toy sketch of this decision logic follows the list).
2. As the participants are part of an integrated system, failure of both the coordinator and, e.g., the only participant aware of the commit decision can be handled correctly. In this case the transaction is quickly aborted by the new coordinator, and on recovery the old participant will 'undo' its local committed state before rejoining the cluster.
3. Participants' local commit decisions are not coupled directly to lock release or to the visibility of those decisions. Locks are released only when the participant-failure durability of the commit decision is assured (e.g., all replicas have committed).
4. Ndb has a third phase (the Complete phase) which releases locks on replicas. This is not the same as the algorithm referred to as three-phase commit, so we don't call it three-phase commit. Also, the Complete phase is not part of the critical commit path: the commit acknowledgement is sent to the application before the third phase starts, so it does not affect the user's experienced commit latency.
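A toy version of the takeover decision in point 1 (standard 2PC-recovery reasoning with invented state names, not Ndb's exact wire protocol):

    def takeover_decision(surviving_states):
        # Each surviving participant reports "prepared", "committed" or "aborted".
        if any(s == "committed" for s in surviving_states):
            return "commit"   # the commit decision is durable somewhere: finish it
        if any(s == "aborted" for s in surviving_states):
            return "abort"
        # All survivors merely prepared: no one can prove a commit happened, so
        # abort quickly to release locks (the recovery case in point 2 above).
        return "abort"

    assert takeover_decision(["prepared", "committed"]) == "commit"
    assert takeover_decision(["prepared", "prepared"]) == "abort"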