Why NoSQL Will Not Die

Reading the recent flamory piece "I Can't Wait for NoSQL to Die" from Ted Dziuba, I thought the author is wrong on so many levels. Or as Jeremy writes:

Well done, Ted. I laughed to myself a few times reading your post.

Not that I'm a NoSQL zealot, see my The Dark Side of NoSQL, but Ted is hilarous. On to our first laugh:

Never mind of course that MySQL was the perfect solution to everything a few years ago when Ruby on Rails was flashing in the pan.

No it wasn't, without heavy memcaching MySQL never worked for websites. Or:

Well, no. Did you know that Cassandra requires a restart when you change the column family definition? Yeah, the MySQL developers actually had to think out how ALTER TABLE works, but according to Cassandra, that's a hard problem that has very little business value. Right.

It seems to me, Ted has never worked with real live MySQL applications. ALTER TABLE is a pain, for tables with several millions of rows it will take hours to alter a table, mostly because MySQL creates temp tables. Which is no problem if your domain and market is static - as I assume Teds is - or if your MySQL schema is meta. But for others this is hell. All the while it does locks and your website is heavily impaired during the change. Even dumping the table, recreating it and importing all data (which is faster than ALTER TABLE) takes usually hours. You can work around this with hardware, SSDs, a clever slave setup, but you need a MySQL wizard to get this working.

The real solution to schema changes with high volumes of data is not to have a schema at all in your store - something most NoSQL databases support. This is mostly done by storing XML or (B)JSON into the store, and the store does not care about your schema. Your app then needs to deal with different versions of a schema (with at least two) and migrate data from the old schema to the new one between reads and writes (NoSQL to the rescue: Store JSON data with a version string, read old version, change, write new version). Or deal with optional values from the beginning, something a lot of code already does with sparse filled social media data. A background job can also migrate data piece by piece to a new version. With this setup, schema changes are easy, without a complicated slave setup or downtimes.

The problem with RDBMS doesn't end there. In a post to High Scalability Joe Stump writes:

Precompute on writes, make reads fast. This is an oldie as a scaling strategy, but it's valuable to see how SimpleGeo is applying it to their problem of finding entities within a certain geographical region.

I wrote in more detail about this in "Essential storage tradeoff: Simple Reads vs. Simple Writes" and how RDBMS wrongly optimize wirtes (I know about materialized views).

m3mnoch speculates about the reasons for Teds laughable post:

it doesn’t look like he’s ever done anything for a large, mainstream audience. i bet he still thinks getting slashdotted or techcrunched is the definition of “a lot of users.” [...] my point is, this isn’t 1998 anymore.

Agreed, even my humble WordPress blog on one server survives this. The reason MySQL works for Ted is:

because i totally believe that google adwords runs on mysql. IT’S READ-ONLY! that’s what mysql is good for — lots of read-heavy, cacheable data you can map against other read-heavy cacheable data.

Back to Ted:

You Are Not Google. The sooner your company admits this, the sooner you can get down to some real work.

He's right. You are not Google, so you will not have those MySQL wizards around who write patches for InnoDB. And he's wrong. You will get into scaling troubles with MySQL far before you're as big as Google.

DBAs are a reason for NoSQL

DBAs should not be worried, because any company that has the resources to hire a DBA likely has decision makers who understand business reality.

Another real gem. One of the reasons people go for Cassandra is that they do not need as many DBAs as with MySQL. Clustering and Scaling works out of the box for a wide range of scenarios - cases which you would have needed a MySQL wizard to reach. In Joe Stump words, CTO and co-founder of SimpleGeo:

How much time are your DBAs spending administering your RDBMSs? How much time are they in the data centers? How much do those data centers cost? How much do DBAs cost a year? Let’s say you have 10 monster DB servers and 1 DBA; you’re looking at about $500,000 in database costs.

The cost of RDBMS operations

And more about the cost of operating RDBMS with large websites:

I’m running a 50 node cluster, which spans three data centers, on Amazon’s EC2 service for about $10,000 a month. Furthermore, this is an operational expense as opposed to a capital expense, which is a bit nicer on the books. In order to scale a RDBMS to 6,000 reads/second I’d need to spend on the order of five months of operation of my 50 node cluster. [...] I’m happy to put my $/write, $/read, and $/GB numbers for my NoSQL setup against anyone’s RDBMS numbers.

SQL databases will survive, but for a much smaller niche (transcational data) than today. For sure NoSQL will not die in the near future: They support schema changes better, they scale better for write heavy applications and they are cheaper to scale all in all.

Other NoSQL posts on CodeMonkeyism: