Why NoSQL Will Not Die

Reading the recent inflammatory piece "I Can't Wait for NoSQL to Die" by Ted Dziuba, I thought the author was wrong on so many levels. Or as Jeremy writes:

Well done, Ted. I laughed to myself a few times reading your post.

Not that I'm a NoSQL zealot (see my "The Dark Side of NoSQL"), but Ted is hilarious. On to our first laugh:

Never mind of course that MySQL was the perfect solution to everything a few years ago when Ruby on Rails was flashing in the pan.

No, it wasn't. Without heavy memcaching, MySQL never worked for websites. Or:

Well, no. Did you know that Cassandra requires a restart when you change the column family definition? Yeah, the MySQL developers actually had to think out how ALTER TABLE works, but according to Cassandra, that's a hard problem that has very little business value. Right.

It seems to me that Ted has never worked with real-life MySQL applications. ALTER TABLE is a pain: for tables with several million rows it takes hours, mostly because MySQL creates temporary tables. That is no problem if your domain and market are static (as I assume Ted's are) or if your MySQL schema is meta. But for everyone else this is hell. All the while MySQL holds locks, and your website is heavily impaired during the change. Even dumping the table, recreating it and importing all the data (which is faster than ALTER TABLE) usually takes hours. You can work around this with hardware, SSDs or a clever slave setup, but you need a MySQL wizard to get this working.

The real solution to schema changes with high volumes of data is not to have a schema in your store at all, something most NoSQL databases support. This is mostly done by storing XML or JSON (or BSON) in the store, which does not care about your schema. Your app then needs to deal with different versions of a schema (at least two) and migrate data from the old schema to the new one between reads and writes (NoSQL to the rescue: store JSON data with a version string, read the old version, change it, write the new version). Or deal with optional values from the beginning, something a lot of code already does with sparsely filled social media data. A background job can also migrate data piece by piece to a new version. With this setup, schema changes are easy, without a complicated slave setup or downtime.
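The read-change-write cycle described above can be sketched in a few lines. This is only an illustration under assumptions: a Map stands in for a JSON document, a HashMap stands in for the store, and the field names ("version", "name", "firstName", "lastName") are made up for the example.

```java
import java.util.HashMap;
import java.util.Map;

public class LazyMigration {
    static final int CURRENT_VERSION = 2;

    // In-memory stand-in for a schemaless key-value store (assumption for this sketch).
    static final Map<String, Map<String, Object>> store = new HashMap<>();

    // Upgrades a version 1 document to version 2.
    // Hypothetical schema change: version 1 has one "name" field,
    // version 2 splits it into "firstName" and "lastName".
    static Map<String, Object> upgrade(Map<String, Object> doc) {
        Map<String, Object> result = new HashMap<>(doc);
        String name = (String) result.getOrDefault("name", "");
        result.remove("name");
        String[] parts = name.trim().split("\\s+", 2);
        result.put("firstName", parts[0]);
        result.put("lastName", parts.length > 1 ? parts[1] : "");
        result.put("version", CURRENT_VERSION);
        return result;
    }

    // Reads a document; old versions are migrated and written back on the way.
    static Map<String, Object> read(String key) {
        Map<String, Object> doc = store.get(key);
        if (doc == null) {
            return null;
        }
        // Documents without a version string are treated as version 1.
        int version = (Integer) doc.getOrDefault("version", 1);
        if (version < CURRENT_VERSION) {
            doc = upgrade(doc);
            store.put(key, doc);
        }
        return doc;
    }

    public static void main(String[] args) {
        Map<String, Object> old = new HashMap<>();
        old.put("version", 1);
        old.put("name", "Ada Lovelace");
        store.put("user:1", old);

        Map<String, Object> doc = read("user:1");
        // prints "2 Ada Lovelace"
        System.out.println(doc.get("version") + " " + doc.get("firstName") + " " + doc.get("lastName"));
    }
}
```

A background job would simply call read for every key, which migrates the remaining old documents piece by piece.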

The problem with RDBMS doesn't end there. In a post to High Scalability Joe Stump writes:

Precompute on writes, make reads fast. This is an oldie as a scaling strategy, but it's valuable to see how SimpleGeo is applying it to their problem of finding entities within a certain geographical region.

I wrote in more detail about this in "Essential storage tradeoff: Simple Reads vs. Simple Writes" and about how RDBMSs wrongly optimize for writes (I know about materialized views).
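To make "precompute on writes" concrete, here is a minimal sketch. It is not SimpleGeo's implementation: the grid with a fixed cell size, the class name and the method names are all assumptions picked for illustration. The write path does the work of assigning an entity to its region, so the read path is a single lookup instead of a scan.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PrecomputedRegionIndex {
    // Cell size in degrees, an arbitrary value chosen for this sketch.
    static final double CELL_SIZE = 1.0;

    // Cell key -> names of the entities in that cell, maintained on every write.
    private final Map<String, List<String>> entitiesByCell = new HashMap<>();

    // Maps a coordinate to the key of the grid cell that contains it.
    static String cellKey(double lat, double lon) {
        long row = (long) Math.floor(lat / CELL_SIZE);
        long col = (long) Math.floor(lon / CELL_SIZE);
        return row + ":" + col;
    }

    // Write path: compute the region once and store the entity under it.
    void add(String name, double lat, double lon) {
        entitiesByCell.computeIfAbsent(cellKey(lat, lon), key -> new ArrayList<>()).add(name);
    }

    // Read path: one map lookup, no scan over all entities.
    List<String> inSameCellAs(double lat, double lon) {
        return entitiesByCell.getOrDefault(cellKey(lat, lon), Collections.emptyList());
    }

    public static void main(String[] args) {
        PrecomputedRegionIndex index = new PrecomputedRegionIndex();
        index.add("cafe", 52.52, 13.40);
        index.add("museum", 52.51, 13.38);
        index.add("harbour", 53.55, 9.99);

        // prints "[cafe, museum]"
        System.out.println(index.inSameCellAs(52.5, 13.4));
    }
}
```

The price is paid on the write side: each write does extra work and the data is stored in the shape the reads need, which is the tradeoff the quote describes.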

m3mnoch speculates about the reasons for Ted's laughable post:

it doesn’t look like he’s ever done anything for a large, mainstream audience. i bet he still thinks getting slashdotted or techcrunched is the definition of “a lot of users.” [...] my point is, this isn’t 1998 anymore.

Agreed, even my humble WordPress blog on one server survives this. The reason MySQL works for Ted is:

because i totally believe that google adwords runs on mysql. IT’S READ-ONLY! that’s what mysql is good for — lots of read-heavy, cacheable data you can map against other read-heavy cacheable data.

Back to Ted:

You Are Not Google. The sooner your company admits this, the sooner you can get down to some real work.

He's right. You are not Google, so you will not have those MySQL wizards around who write patches for InnoDB. And he's wrong. You will get into scaling trouble with MySQL long before you're as big as Google.

DBAs are a reason for NoSQL

DBAs should not be worried, because any company that has the resources to hire a DBA likely has decision makers who understand business reality.

Another real gem. One of the reasons people go for Cassandra is that they do not need as many DBAs as with MySQL. Clustering and scaling work out of the box for a wide range of scenarios, cases for which you would have needed a MySQL wizard. In the words of Joe Stump, CTO and co-founder of SimpleGeo:

How much time are your DBAs spending administering your RDBMSs? How much time are they in the data centers? How much do those data centers cost? How much do DBAs cost a year? Let’s say you have 10 monster DB servers and 1 DBA; you’re looking at about $500,000 in database costs.

The cost of RDBMS operations

And more about the cost of operating an RDBMS for large websites:

I’m running a 50 node cluster, which spans three data centers, on Amazon’s EC2 service for about $10,000 a month. Furthermore, this is an operational expense as opposed to a capital expense, which is a bit nicer on the books. In order to scale a RDBMS to 6,000 reads/second I’d need to spend on the order of five months of operation of my 50 node cluster. [...] I’m happy to put my $/write, $/read, and $/GB numbers for my NoSQL setup against anyone’s RDBMS numbers.

SQL databases will survive, but in a much smaller niche (transactional data) than today. NoSQL will certainly not die in the near future: NoSQL stores support schema changes better, they scale better for write-heavy applications and they are cheaper to scale overall.

How Java needs to become cleaner

Reading lots and lots of Java code made me realize there is too much noise in there. Not that I agree with the Railists' cry about ceremony (they just don't get it), but there is too much noise because of plainly bad choices for defaults and missing syntactic sugar for frequently used cases.

Why would we need less noise, and how is less noise helpful? Sometimes it's hard to see what happens in a method, and it doesn't help that many Java developers don't write the best Java they could, see my post on "Next generation Java programming style".

The call for a less noisy Java is limited by the actual developers: every change needs to be small enough that it does not turn Java into a different language. We do not want another Lisp (because then we would use Clojure) but a language for the same target audience as Java. A language should support best practices, those we've learned over the last 15 years of using Java, and not make them hard and noisy to use.

So here I go:

  1. Don't drop explicit type declarations (see my "Explicit Static Types are not for the Compiler, but for the Developer – Duh"), but infer types more often (see Scala). Java could infer more types right now, without the need for advanced algorithms (e.g. start with the return type). As James writes:

    In Java I "only" have to annotate types when I want to assign to a variable or return from a method. Which, come to think of it, is an odd restriction. The compiler obviously knows the type, so why do I have to tell it yet again?

    Keep the possibility to declare types for narrowing down scope.

  2. Drop the public modifier for methods: everything is public unless it is declared private or protected.
  3. Every field is private by default.
  4. Support for properties, with no need for setters and getters. All properties are public unless declared private (how to annotate properties?). Getters and setters override the internal property accessors.
  5. Sugar for creating lists and maps, [] and [:], hopefully nicer than Scala does it. Groovy does this in a consistent way.
  6. And just add tuples, which are sometimes useful, or at least map them to a list and add "extractors" such as val a,b = returnsTuple(); As Java Pitstop writes:

    When I was coding in Java I used to build Classes just to return multpile values and also sometimes used pass by reference (by means of using Objects). I really missed a permanent solution [...]

    Obviously, this doesn't mean you shouldn't consider whether the return value justifies a real type.

  7. Drop ";", it really only adds noise.
  8. Final by default: final is your new love. All variables, attributes and parameters are final, see my "All variables in Java must be final". If a variable, attribute or parameter isn't final, declare it with "mutable".
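To make the list concrete, here is a small piece of Java as it has to be written today, with comments pointing at the tokens the proposals above would make unnecessary. The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class NoisyJava {

    // Points 3, 4 and 8: "private" and "final" are spelled out on every field,
    // and each property needs a hand-written getter.
    public static class Person {
        private final String name;
        private final int age;

        public Person(final String name, final int age) {
            this.name = name;
            this.age = age;
        }

        public String getName() { return name; }
        public int getAge() { return age; }
    }

    // Point 6: a class that exists only to return two values from one method.
    public static class MinMax {
        private final int min;
        private final int max;

        public MinMax(final int min, final int max) {
            this.min = min;
            this.max = max;
        }

        public int getMin() { return min; }
        public int getMax() { return max; }
    }

    // Point 2: "public" repeated on every method.
    public static MinMax minMax(final List<Integer> values) {
        // Point 8: these two are the rare variables that would be marked "mutable".
        int min = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE;
        for (final int value : values) {
            min = Math.min(min, value);
            max = Math.max(max, value);
        }
        return new MinMax(min, max);
    }

    public static Map<String, Integer> agesByName(final List<Person> people) {
        // Points 1 and 5: the type is stated twice and there is no map literal.
        final Map<String, Integer> ages = new HashMap<String, Integer>();
        for (final Person person : people) {
            ages.put(person.getName(), person.getAge());
        }
        return ages;
    }

    public static void main(final String[] args) {
        final List<Person> people = new ArrayList<Person>();
        people.add(new Person("Ann", 31));
        people.add(new Person("Bob", 27));
        final MinMax range = minMax(new ArrayList<Integer>(agesByName(people).values()));
        // prints "27 to 31"
        System.out.println(range.getMin() + " to " + range.getMax());
    }
}
```

Most of the lines above are modifiers, accessors and repeated types. The actual logic is the loop in minMax and the put in agesByName.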

Changes that would alter Java significantly, including all APIs and all thinking, are more controversial:

  1. Add Closures
  2. Or at least: support for anonymous inner classes where I do not have to write the boilerplate code when there is only one method in the interface
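The boilerplate in question looks like this for a single-method interface such as Comparator. The second method shows the same call written as a lambda expression, the form Java offers from version 8 on. The class and method names are only illustrative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class SingleMethodBoilerplate {

    // Anonymous inner class: several lines of ceremony around one expression.
    static List<String> sortByLengthVerbose(List<String> words) {
        List<String> sorted = new ArrayList<String>(words);
        Collections.sort(sorted, new Comparator<String>() {
            @Override
            public int compare(String a, String b) {
                return Integer.compare(a.length(), b.length());
            }
        });
        return sorted;
    }

    // The same comparator as a lambda expression (Java 8 and later):
    // only the parameters and the expression remain.
    static List<String> sortByLengthShort(List<String> words) {
        List<String> sorted = new ArrayList<String>(words);
        Collections.sort(sorted, (a, b) -> Integer.compare(a.length(), b.length()));
        return sorted;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("pear", "fig", "banana");
        // both lines print "[fig, pear, banana]"
        System.out.println(sortByLengthVerbose(words));
        System.out.println(sortByLengthShort(words));
    }
}
```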

What would you do to reduce noise in Java without making it a completely different language?

Playing with Play Framework for Java

Recently I've been playing more seriously with the Play framework. Play is a newer web stack for rapid development of web applications. My impressions in short: this is a very nice framework for writing applications, with much potential, but it is still at the very beginning. This means there aren't that many features available yet (compare with Lift, also a recent framework, for Scala on the JVM).

Good parts

The good parts I've found are:

  1. Java (JVM) based
  2. Aims for fun for developers
  3. Fast turnaround: Play automatically reloads Java classes, templates etc. Slow turnaround is one of the main gripes I have with Java, as stated in my "Java dead?" post:

    Rapid development and rapid turnaround. Java still falls flat, even with JRebel which allows seamless reloading of classes, RAD web frameworks like Wicket and splendid IDEs. Rails, Django and PHP are better and have a faster turn around. Period. Java is lacking here, and reloading changes look to be the biggest problem with Java development today.

  4. Real-world developer oriented, not power oriented or hacker oriented. For example, errors are shown in a nice way in the browser, together with the corresponding code.

    [Screenshot: error handling in Play]

  5. Interesting modules available (Guice, Sass, GAE, OAuth), more Rails oriented than classic Java (which is a good thing!)
  6. No need to write getters and setters, they are added automatically. Another one of the gripes I have with Java.
  7. Based on JPA, but other persistence layers like Siena are supported
  8. Scala is possible (shorter, even nicer). Downside: not as nicely integrated as in Lift, e.g. Option usage in Lift, composable attributes for domain classes
  9. Integrates unit, functional and Selenium tests (the latter often acceptance tests) in one web page. Different testing styles can be used, e.g. JUnit with asserts, JUnit with should, spec style and more.
    class SpecStyle extends UnitTest 
                          with FlatSpec 
                          with ShouldMatchers {
     
      val name = "Hello World"
    
      "'Hello World'" should "not contain the X letter" in {
        name should not include ("X")
      }
    
      it should "have 11 chars" in {
        name should have length (11)      
      }
    }
    

The things I don't like so much

While playing around, I found some things I didn't like about Play:

  1. I would prefer the Lift (or StringTemplate) way of templating, with no code in templates (or the RIFE way)
  2. Not on the NoSQL bandwagon - NoSQL really is the future for some web apps I believe
  3. I wish it would use a build system, not for building but for dependency management, packaging etc. (Gradle for example, or at least Maven).
  4. Not yet a big community, modules limited, high risk bet
  5. Minor one: Bazaar? I wouldn't have thought this is needed with Git and Mercurial, and I don't want to learn a third competing DVCS

Overall an interesting newcomer, which brings new ideas to the Java table. Definitely something to watch.