Anatomy of a Flawed Clojure vs. Scala LOC Comparison

Update: The author of the post did the same biased LOC counting with Python vs. Clojure - Dhananjay Nene has a nice post about the Clojure/Python comparison Update 2: He did it with Java and PHP too. What a diservice to Clojure.

I've came across an article comparing Clojure and Scala. As I've did one too (Part 2) I'm naturally interested in the topic. But the article turned out to be FUD. Wikipedia says about FUD:

Fear, Uncertainty, and Doubt (FUD) is a tactic of rhetoric and fallacy used in sales, marketing, public relations, politics and propaganda. FUD is generally a strategic attempt to influence public perception by disseminating negative information designed to undermine the credibility of their beliefs.

Does the appearence of FUD articles mean the language war did start? I think both Scala and Clojure appeal to different audiences, but both seem in the race for replacing Java as the next mainstream thing (@headius no I don't think JRuby is the next Java thing, although you do excellent work with JRuby - I believe it's the next Ruby thing).

In the sense of Hacknots Anecdotal Evidence and Other Fairy Tales essay, I try to give some thorough criticisms and new facts concerrning the original blog post. Hacknot writes:

"As software developers we place a lot of emphasis upon our own experiences. This is natural enough, given that we have no agreed upon body of knowledge to which we might turn to resolve disputes or inform our opinions. Nor do we have the benefit of empirical investigation and experiment to serve as the ultimate arbiter of truth, as is the case for the sciences and other branches of engineering - in part because of the infancy of Empirical Software Engineering as a field of study; in part because of the difficulty of conducting controlled experiments in our domain. Therefore much of the time we are forced to base our conclusions about the competing technologies and practices of software development upon our own (often limited) experiences and whatever extrapolations from those experiences we feel are justified. An unfortunate consequence is that personal opinion and ill-founded conjecture are allowed to masquerade as unbiased observation and reasoned inference."

1. Are the lines of code really that different?

I'm very interested in Lines of Code (LOC) comparisons, as I've learned a lot from my LOC comparison of Java and Python (2x to 4x more LOC in Java). The numbers the author gives are:

  1. Scala 54
  2. Clojure 19

The difference in LOC looks to be on a big part due to missing library calls in Scala. Scala is missing a using and writing/reading to a file. While the using is part of my standard toolkit in Scala project, I must admit I do not know of a widely used library which provides such a method (perhalps scalaz or scalax?).

The reading of files is something different. Reading a little about Guava, would have made it clear that it doesn't make sense - at least in Javaland - to compare LOC based on missing library functionality.

Using Guava

List[String] lines = Files.readLines(file, Charsets.UTF_8);

instead of

implicit def file2String(file: File): String = {
  val builder = new StringBuilder
  using (new BufferedReader(new FileReader(file))) { reader =>
    var line = reader.readLine
    while (line != null) {
      line = reader.readLine

results in 1 line of code compared to 11 - 10 lines too much in the Scala example. Or as some have pointed out, using fromFile in Scala io makes the example as short.

Beside that, there is plainly code missing in the Clojure example compared to the Scala one. I've thought I might be wrong, comparing LOC and then just leaving out lines looked too apparent a flaw, but the code isn't the same it seams. Time measurements, comments and println are just left out: at least 7 lines of code.

2. Micro benchmarks are most often flawed

Everything that has been said before about micro benchmarking, and it seems has to be said to each new generation of JVM developers, applies:

Toward the end of writing high-performance code, developers often write small benchmark programs to measure the relative performance of one approach against another. Unfortunately [...] assessing the performance of a given idiom or construct in the Java language is far harder than it is in other, statically compiled languages.

The Java VM is a strange beast, with startup times, hot spot inlining, escape analysis and compiler optimizations. Writing the code a little different, makeing loops a little larger might make the code much faster - or slower. So all micro benchmarks on the JVM must sceptically analyzed and questioned.

3. Too much type annotations in Scala?

The next claim about Scala is the excessive usage of type annotations, or said differently:

As a Scala developer you spent much of your time writing out variables types which must be tedious, [...]

When counting all type annotations (not the object constructors) I end with 150 characters of source code.

Map[K, V]
[K, V]
Tuple2[K, V]
Tuple2[K, V]
[String, Int]

Compared to 2039 characters of the whole script, this results in 7.4% type annotations in Scala. Not something I would call "much of my time" - indeed only 7.4% of my time writing code, which in itself is only a small part of development.

4. Other criticisms

There are more lines of code missing in the comparison. As one comment noted:

More … the Scala code does som checking to see if the file exists or not. Your Clojure code does not do that. That will bump up the LOC by 4 – 5 lines i guess

to which the author replies

In regards to checking if the directory exists, its a matter of adding (when-let (dir (file-seq …)) 🙂

But don’t read this as the authoritative comparison between Ruby and Clojure, it’s just a fun little exercise.

Two remarks: Exactly, the line is missing and should be have been added from the beginning. The title, tone and content of the post - especially on the insistence that Clojure is both faster and has less LOC - doesn't reflect the "fun little exercise" but looks quite serious.


The conclusion is simple. Comparing languages based on LOC is a dangerous thing and shouldn't be taken lightly. The comparison is flawed, the Scala example could be reduced by 17 LOC (an astonishing 31%) and the type annotations are not "much of the time" but only 7.4%. Keep comparing languages, but based on facts and rigorous maths, not on FUD. Said that, if you find flaws in my post I would be glad to correct them.