Funny, some Rubyists are stupider than a piece of wood

This one for example. He proposes a simple way to make Java developers angry, by showing them some code. His example is

File.open('server.cert').readlines[1..-2].join.gsub(/\n/, '')

and throws a challenge:

Anyhoo, all you have to do now is find someone who uses a big stupid language and throw an example like above to their face and tell them to beat it. See if they can write it more elegantly using their language. The beauty of the trick is that there’s no way in this world that’s gonna happen.

My 20min Java version looks like this - more or less identical:

File.open("server.cert").readlines(1, -2).join().gsub("\n", "")

Poor blogger, another fanboy without a clue who confuses languages and API design - in this case method chaining.

Because there's a very high probability that the other person, yes, the one who's using a big stupid language, will get so angry and beat you up.

Nope, I just write the code in Java. That's all.

(Besides that, the one-liner is hard to read, hard to maintain and hard to reuse. Fine if you're the sole developer on a project, bad if there are 50 others who need to maintain your code. I'd prefer a CertStripper 😉)

(As a second side note: where is the fluent interface to Google Collections? I needed one today.)

Update: Someone wrote a comment on the linked blog post by copying parts of this post. And because someone asked: no, this cannot be done with the JDK alone, therefore it's an API problem. You need to write some code of your own to fluently wrap JDK classes. See the fluent interface link mentioned above to see how this can be done for an existing API.
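To make the "wrap it yourself" point concrete, here is a minimal sketch of such a wrapper. It assumes a recent JDK (11 or later), and the class name Lines and its methods open, range and join are made up for this illustration; nothing like them ships with the JDK. Files.readAllLines already strips the line terminators, so the gsub step is not needed.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical fluent wrapper around a list of lines; not part of the JDK.
final class Lines {
    private final List<String> lines;

    private Lines(List<String> lines) {
        this.lines = lines;
    }

    // Wraps an existing list, mainly useful for tests.
    static Lines of(List<String> lines) {
        return new Lines(List.copyOf(lines));
    }

    // Reads all lines of a file; line terminators are removed by the JDK.
    static Lines open(String fileName) throws IOException {
        return new Lines(Files.readAllLines(Path.of(fileName)));
    }

    // Ruby-style inclusive range: negative indexes count from the end,
    // an end index past the last line is clamped, an empty range yields no lines.
    Lines range(int from, int to) {
        int start = from < 0 ? lines.size() + from : from;
        int end = Math.min(to < 0 ? lines.size() + to : to, lines.size() - 1);
        if (start < 0 || start > end) {
            return new Lines(List.of());
        }
        return new Lines(lines.subList(start, end + 1));
    }

    // Concatenates the remaining lines without a separator.
    String join() {
        return String.join("", lines);
    }
}
```

With that in place the call site reads Lines.open("server.cert").range(1, -2).join(), which is about as chained as the Ruby version. The work went into the API, not the language.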

Sharding destroys the goals of your relational database

Sharding does destroy your relational database - which is a good thing. The idea behind sharding is to distribute data across several databases based on certain criteria. This could, for example, be the primary key: all entities whose keys begin with 1 go to one database, those beginning with 2 to another, and so on (often modulo functions on the key are used, or groups based on business data like customer location, or function). Several reasons exist for sharding, the main two being better performance and a lower impact of crashed databases - if you shard by name, for example, only persons with a name that starts with S will be affected by a database crash.

Relational databases have been the tool of choice for data storage for several decades. But they do more than store data. Even read operations can be split into several functions. There are at least three kinds of database read queries:
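The modulo variant of that routing can be sketched in a few lines. This is only an illustration under my own assumptions: the ShardRouter class and the idea of identifying each shard by a URL string are hypothetical, and a real setup also has to deal with adding shards later.

```java
import java.util.List;

// Hypothetical router that picks a shard by primary key modulo shard count.
final class ShardRouter {
    private final List<String> shardUrls;

    ShardRouter(List<String> shardUrls) {
        if (shardUrls.isEmpty()) {
            throw new IllegalArgumentException("at least one shard is required");
        }
        this.shardUrls = List.copyOf(shardUrls);
    }

    // floorMod keeps the result non-negative even for negative keys.
    int shardIndexFor(long primaryKey) {
        return (int) Math.floorMod(primaryKey, (long) shardUrls.size());
    }

    // Returns the identifier of the database that holds the given key.
    String shardFor(long primaryKey) {
        return shardUrls.get(shardIndexFor(primaryKey));
    }
}
```

Every read and write for one key goes to exactly one database, which is where both the performance gain and the limited blast radius of a crash come from.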

  • Data graph building queries: With these you get your data out of the database, customers together with addresses etc.
  • Aggregation queries: How many orders have been stored in August, aggregated by product category
  • Search queries: Give me all customers who live in New York

Sharding now does away with the second and third kinds of query and reduces databases to data storage. Because the shards are different databases on different systems, you can't aggregate across them (unlike in a cluster) without custom code, and you cannot search with one query (only with several, one to each database). Databases have led to the notion that search and retrieval are linked and should be dealt with together. Most people think of retrieval and search as the same thing. This has blocked the development of technologies. Sharding, S3, Dynamo and Memcached have changed this perception recently. I've written about splitting search and retrieval in "The unholy legacy of databases". There I quote Rickard of Qi4j fame:
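The "custom code across systems" for an aggregation usually ends up as scatter and gather: ask every shard for its partial result and merge the answers yourself. A rough sketch, where the OrderShard interface and its countOrdersByCategory method are my own placeholders for whatever query each shard database actually runs:

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical view of one shard: it can only count its own orders.
interface OrderShard {
    Map<String, Long> countOrdersByCategory();
}

// Hypothetical aggregator that merges the partial counts of all shards.
final class ShardAggregator {
    static Map<String, Long> countOrdersByCategory(List<OrderShard> shards) {
        Map<String, Long> totals = new TreeMap<>();
        for (OrderShard shard : shards) {
            // One query per shard; the per-category counts are summed in application code.
            shard.countOrdersByCategory()
                 .forEach((category, count) -> totals.merge(category, count, Long::sum));
        }
        return totals;
    }
}
```

What a single GROUP BY did on one database is now a loop in the application, which is exactly the part of the database that sharding takes away.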

Entities are really cool. We have decided to split the storage from the indexing/querying, sort of like how the internet works with websites vs Google, which makes it possible to implement really simple storages. Not having to deal with queries makes things a whole lot easier.

and have concluded

Free your mind! Storage and search are two different things, if you split them, you gain flexibility.

People have talked about splitting storage and search for some time now. Search engines like Lucene have driven searching out of databases. But the notion of store-and-search is still prevalent. Sharding as a mechanism for more performance and lower risk will move into many web companies, reduce databases to a storage mechanism and drop the aggregation (data warehouse and reporting) and search parts. Those can be better filled with real data warehouse servers like Mondrian and search services based on Lucene or semantic engines like Sesame. And storage might move from databases to simple stores like Amazon Elastic Block Storage or JDBM.
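In code, the split looks roughly like this: one component that only stores and retrieves by key, and a separate one that only answers search questions by returning keys. Both classes are hypothetical in-memory stand-ins that I made up for illustration; in practice the store could be a shard or a simple key-value store, and the index something like Lucene.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical storage: put and get by primary key, nothing else.
final class CustomerStore {
    private final Map<Long, String> namesById = new HashMap<>();

    void put(long id, String name) {
        namesById.put(id, name);
    }

    Optional<String> get(long id) {
        return Optional.ofNullable(namesById.get(id));
    }
}

// Hypothetical search index: knows which ids belong to a city, but holds no entities.
final class CityIndex {
    private final Map<String, Set<Long>> idsByCity = new HashMap<>();

    void index(long id, String city) {
        idsByCity.computeIfAbsent(city, c -> new TreeSet<>()).add(id);
    }

    Set<Long> idsIn(String city) {
        return idsByCity.getOrDefault(city, Set.of());
    }
}
```

A search for "all customers who live in New York" becomes idsIn("New York") followed by one get per id, and either half can be swapped out without touching the other.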

Thanks for listening, and think about your databases.

400 reader milestone

After my 200 reader milestone in January and 300 in May, this blog has reached the 400 FeedBurner reader milestone. Thanks to all the regular readers of this blog for listening 🙂