> MySQL Cluster is a scalable, real-time in-memory, ACID-compliant transactional database
To what degree can something be called durable if it's in-memory?
> It is possible to choose how to store data; either all in memory or with some on disk (non-indexed data only). […] Disk-based data can be used to store data with less strict performance requirements, where the data set is larger than the available RAM.
It's great to have all the concepts spelled out in this article, particularly the way nodes act as transaction managers when needed, synchronously replicate to a "buddy", and have management capabilities for recovering from partition events; however, it's almost misleading to use the MySQL name: this appears to be first and foremost an in-memory solution with some afterthought given to disk-based durability.
It's durable via checkpointing and redo logging; this is the default behavior.
You can read more details here: http://www.clusterdb.com/mysql-cluster/how-can-a-database-be...
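If it helps, here's a minimal single-node sketch of the general checkpoint + redo-log idea in Python (the file format and names are mine, purely illustrative; NDB's actual implementation is distributed and far more involved):

    import json, os

    class InMemoryStore:
        def __init__(self, log_path="redo.log", ckpt_path="checkpoint.json"):
            self.data = {}                       # the in-memory table
            self.log_path, self.ckpt_path = log_path, ckpt_path
            self.log = open(log_path, "a")

        def put(self, key, value):
            # Durability first: the change reaches disk before it is applied.
            self.log.write(json.dumps({"op": "put", "k": key, "v": value}) + "\n")
            self.log.flush()
            os.fsync(self.log.fileno())          # survives power loss from here on
            self.data[key] = value               # then update RAM

        def checkpoint(self):
            # Snapshot the state, then truncate the redo log; recovery only
            # needs the checkpoint plus whatever was logged after it.
            tmp = self.ckpt_path + ".tmp"
            with open(tmp, "w") as f:
                json.dump(self.data, f)
                f.flush()
                os.fsync(f.fileno())
            os.replace(tmp, self.ckpt_path)      # atomic rename
            self.log.close()
            self.log = open(self.log_path, "w")  # safe to truncate now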
And I agree that using the MySQL name is a bit misleading; it would perhaps be better to call it NDB (Network DataBase), since NDB only uses MySQL as an API for SQL access to the data.
I don't know how the MySQL one works, or whether it provides disk-based durability at all, but the way Microsoft SQL Server does it is explained in Section 7 (page 7) of [1].
They use a log stream and a checkpoint stream that are written to disk.
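In miniature, the recovery half of such a log-plus-checkpoint scheme looks something like this (a toy in the same format as the sketch upthread, not SQL Server's actual mechanism):

    import json, os

    def recover(log_path="redo.log", ckpt_path="checkpoint.json"):
        data = {}
        if os.path.exists(ckpt_path):
            with open(ckpt_path) as f:
                data = json.load(f)              # state as of the last checkpoint
        if os.path.exists(log_path):
            with open(log_path) as f:
                for line in f:
                    rec = json.loads(line)       # re-apply post-checkpoint writes
                    if rec["op"] == "put":
                        data[rec["k"]] = rec["v"]
        return data                              # full in-memory state rebuilt

With both streams fsync'd, nothing is lost on power failure; you just pay a replay cost at restart.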
VoltDB replicates to k nodes so you can choose the level of durability. Combined with UPSes in separate parts of the datacenter, it's durable in some senses. (I'd be more concerned about a kernel or app bug that crashes the system, if there weren't a "save RAM on unclean boot" mechanism.)
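To make the k idea concrete, a toy version (assuming k counts extra copies, so a write needs k+1 holders before it's acknowledged; the class is invented, not VoltDB's API):

    class KSafeCluster:
        def __init__(self, nodes=3, k=1):
            assert k < nodes
            self.k = k
            self.replicas = [{} for _ in range(nodes)]  # stand-ins for real nodes

        def put(self, key, value):
            acks = 0
            for replica in self.replicas:
                replica[key] = value   # real systems do this over the network
                acks += 1
                if acks >= self.k + 1:
                    # Enough copies exist that k node failures lose nothing;
                    # the remaining replicas would be updated asynchronously (elided).
                    break
            return acks

    cluster = KSafeCluster(nodes=3, k=1)
    print(cluster.put("a", 1))  # -> 2 acks before the write counts as durable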
This only applies to the commercial version, though. The open source one ditched that feature.
VoltDB, like MySQL Cluster and several other in-memory systems, uses snapshot checkpoints and logging to maintain power-loss protection. VoltDB is a bit different because of how it uses determinism + logical logging, but the point is that most of these systems don't have to lose data on power loss.
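The determinism + logical logging point fits in a few lines (invented names, a sketch of the concept rather than VoltDB's command-log format): log the procedure name and arguments instead of the changed bytes, and rebuild state by re-running the same deterministic procedures in the same order.

    import json

    PROCEDURES = {
        "credit": lambda db, acct, amt: db.__setitem__(acct, db.get(acct, 0) + amt),
    }

    def run(db, log, proc, *args):
        log.append(json.dumps([proc, args]))   # tiny record: just name + args
        PROCEDURES[proc](db, *args)            # must be deterministic to replay

    def replay(log):
        db = {}
        for line in log:
            proc, args = json.loads(line)
            PROCEDURES[proc](db, *args)        # same commands, same order, same state
        return db

    db, log = {}, []
    run(db, log, "credit", "alice", 100)
    run(db, log, "credit", "alice", -30)
    assert replay(log) == db == {"alice": 70}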
Sure, but those are mostly backup, right? If a system could log to disk at a fast enough pace, it wouldn't really be a main-memory DB, right? It'd be essentially the same as a normal DB with a really good RAM cache. I've got to imagine there are some VoltDB users going fast enough that logging isn't 100% loss-proof.
Actually, no. The key innovation is that data is stored in a memory-friendly way, then logged to disk in an append-only, log-friendly way. VoltDB can be fast and persistent; the catch is that your data must fit in memory.
Oh really? I'm not sure how I missed that all this time. Is it a newer feature? I don't recall it in the H-Store paper or in any of the docs I read. In fact, I recall sync vs. async logging being a decision/tradeoff. And there used to be a rather large emphasis on choosing a good k factor, and it certainly seemed to be about durability. Otherwise, it's just another normal HA thing, no big deal? As in, k = 0 is a perfectly fine setup, if logging is enabled?
Command logging has worked this way since VoltDB 2.0. Sync mode needs hardware that can support more frequent fsyncs, such as a disk controller with a BBWC (does the job and is inexpensive) or a server-class SSD, and should be configured to write the command log and snapshots to separate spindles. Async mode is the default setting so that, out of the box, VoltDB will perform as expected in the cloud, on a VM, or on a laptop.
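Roughly, the difference between the two modes comes down to when fsync happens relative to the acknowledgement (an invented class to show the shape, not VoltDB's implementation):

    import os, time, threading

    class CommandLog:
        def __init__(self, path, mode="async", interval=0.1):
            self.f = open(path, "a")
            self.mode = mode
            self.lock = threading.Lock()
            if mode == "async":
                # Background flusher: bounded-loss window, far fewer fsyncs.
                t = threading.Thread(target=self._flusher, args=(interval,), daemon=True)
                t.start()

        def append(self, record):
            with self.lock:
                self.f.write(record + "\n")
                if self.mode == "sync":
                    self.f.flush()
                    os.fsync(self.f.fileno())  # durable before the client is acked
            # async mode returns immediately; the flusher syncs shortly after

        def _flusher(self, interval):
            while True:
                time.sleep(interval)
                with self.lock:
                    self.f.flush()
                    os.fsync(self.f.fileno())

Sync mode is why the BBWC/SSD advice matters: every commit waits on an fsync.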
I'm sorry, but MySQL Cluster is pure refined salty horseshit. Nothing more. We're at the office right now trying to recover our production data from its filthy hands.
I had MySQL clustering in the lab with some serious live traffic for about a month. Unless you have a team of DBAs whose whole job is to monitor it and keep it going, MacGyver-style, seriously forget it.
There are better options out there, with fewer headaches!
A modern, properly implemented database engine should be bottlenecked on network saturation for these types of workloads, even if backed by disk storage, never mind in-memory. These workloads are trivially distributable, so scale-out is expected unless you are doing something really wrong.
In that sense, similar "look how fast my simple lookups are" benchmarks published by other database vendors are really a function of their hardware budget for the benchmark, assuming the implementation is competent. Queries with complex constraints or joins would be a more interesting indication of implementation scalability and performance.
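Back-of-envelope, with assumed numbers, for why simple-lookup benchmarks are mostly a hardware (network) measurement:

    # All figures below are assumptions for illustration, not vendor numbers.
    link_bytes_per_sec = 10e9 / 8        # one 10 GbE NIC ~= 1.25 GB/s
    bytes_per_lookup = 500               # assumed request + reply on the wire
    print(f"{link_bytes_per_sec / bytes_per_lookup:,.0f} lookups/s per node")
    # -> 2,500,000: a competent engine hits the wire limit long before CPU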
I'd be interested to see a benchmark test suite for ACID-compliant databases with, as you mentioned, complex constraints and joins. Get as close to real-world workloads as possible.
I don't want to see the quarter mile; I want to see you road rally.
Note that the 2PC in Ndb is not quite the same as that used in the usual "2PC is blocking" arguments:
1. The transaction coordinator (TC) role is automatically taken over if the coordinator fails during commit. The new coordinator uses the surviving participants' states to quickly decide on commit or abort so that locked resources are released (a toy sketch of this decision logic follows the list).
2. As the participants are part of an integrated system, failure of both the coordinator and, e.g., the only participant aware of the commit decision can be handled correctly. In this case the transaction is quickly aborted by the new coordinator, and on recovery the old participant will 'undo' its local committed state before rejoining the cluster.
3. Participants' local commit decisions are not coupled directly to lock release or to the visibility of those decisions. Locks are released only when the participant-failure durability of the commit decision is assured (e.g., all replicas have committed).
4. Ndb has a third phase (the Complete phase) which releases locks on replicas. This is not the same as the algorithm referred to as three-phase commit, so we don't call it three-phase commit. Also, the Complete phase is not part of the critical commit path: the commit acknowledgement is sent to the application before the third phase starts, so it does not affect the user's experienced commit latency.
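A toy version of the takeover decision in point 1 (standard 2PC-recovery reasoning with invented state names, not Ndb's exact wire protocol):

    def takeover_decision(surviving_states):
        # Each surviving participant reports "prepared", "committed" or "aborted".
        if any(s == "committed" for s in surviving_states):
            return "commit"   # the commit decision is durable somewhere: finish it
        if any(s == "aborted" for s in surviving_states):
            return "abort"
        # All survivors merely prepared: no one can prove a commit happened, so
        # abort quickly to release locks (the recovery case in point 2 above).
        return "abort"

    assert takeover_decision(["prepared", "committed"]) == "commit"
    assert takeover_decision(["prepared", "prepared"]) == "abort"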