Comparing Java and Python – is Java 10x more verbose than Python (LOC)? A modest empiric approach

In my last post, “50k lines of code considered large?”, I wondered about large code bases and the different perceptions of what a large code base is. I came to the topic because of a blog post: “The Maintenance myth” by Ola Bini. One minor point he makes about maintenance concerns lines of code in dynamic languages. I know maintenance is mainly about technical debt, but I’m interested in how lines of code factor into maintenance problems. Ola says:

“(very large code bases is 50K-100K lines of code in these languages)”

pointing to Ruby and Python. In a reply to a comment from me Ola writes “I would consider 50k-100k in Ruby to be very large, yes, definitely. I know of Python code bases between 100k and 200k, but that’s about the largest I’ve heard of.” With my Java background – and some Ruby and Python background mainly from the 90s – I consider very large applications to be much bigger, perhaps 500k to 1M for very large – not 50k lines as for example SnipSnap has ;-) The Linux kernel contains between 6.4M and 10M lines of code depending on the way you count. There seems to be a huge difference in what people consider very large. There could be several reasons:

  • Python and Ruby are very difficult so smaller code bases are considered very large
  • Python and Ruby are denser, so the same amount of logic can be expressed in fewer lines of code

Considering the second hypothesis, the factor should be between 10x (50k compared to 500k) and 20x (50k to 1M) for things people consider very large – taking Ola and his coworkers and me (I didn’t ask my team ;-) as a very small sample set.

Therefore I’ve expressed an example in code. The example is an application fragment for managing songs – the idea comes from the common Ruby introduction. I’ve chosen to compare Python and Java because Python is considered by some people a more mature language and is used by larger projects than Ruby, and because I did more and bigger projects in Python (my Ruby experience is only some years of coding web applications in Rails and writing an OR mapper and web component framework in Ruby by myself: “Convention over Configuration Framework in Ruby from 2002”). Someone could do a Ruby comparison :-)

People may be surprised, but Java development and style – at least the avant-garde – has changed over the last 13 years. So the example might not look like you think Java should look. It reflects the style in which I would write green field Java code right now. It is inspired by Domain Driven Design and functional principles (for more about DDD and composite oriented programming in Java see Qi4J and real world Qi4j). In true DDD style I would prefer more objects like Name and Duration – see “Never, never, never use String in Java (or at least less often :-)” – but I’ve cut the example for brevity. Some people would not use a SongList domain object but a List directly. From my point of view, if SongList is a Domain Object and not an implementation detail, you should create a class and not use a List. So I’ve used a SongList class.

A note on formatting: Lately some of the Java developers surfing the wave front have switched to one-line formatting of small methods, something which helps code readability and understanding a lot (you’ll see). IntelliJ IDEA supports this as a formatting option. It’s a very good feature in IDEA; to my shame I only discovered it very late, but I’m glad I did as it’s so much better this way.

For manipulating, filtering and transforming lists I currently use Google collections. For an introduction see here. Google collections make working with lists much easier.

The REST part of the application is missing in Python as I don’t have enough knowledge to write the code with a state of the art REST framework. Perhaps someone could fill me in. In the Java example I’ve chosen to create the JSON and XML on my own without an automatic mapper. Automatic mappers do exist though; one could use JAXB, for example. The code for the builder is explained in a previous post.

One cautionary note: The only Python books I have are “Internet Programming With Python” and “Programming Python”, the first edition, both from 1996. Sorry that my Python is rusty, all correcting comments or comments on how to do it better are welcome. Please focus on better, not on shorter.

On to the examples:

Java

Some example usage:

and a simple REST service which returns a JSON or XML (depending on the request) representation of the song list.

The example in Python

some example usage
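To give an idea of the shape of such a fragment, here is a minimal sketch in Python. It is not the original listing: the `Song` fields (`name`, `artist`, `duration`), the `SongList` methods and the sample data are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass(frozen=True)
class Song:
    """A single song; the fields are assumed for illustration."""
    name: str
    artist: str
    duration: int  # length in seconds


@dataclass
class SongList:
    """Domain object wrapping a plain list of songs."""
    songs: List[Song] = field(default_factory=list)

    def add(self, song: Song) -> None:
        self.songs.append(song)

    def filter(self, predicate: Callable[[Song], bool]) -> "SongList":
        """Return a new SongList holding the songs that match the predicate."""
        return SongList([s for s in self.songs if predicate(s)])

    def names(self) -> List[str]:
        return [s.name for s in self.songs]

    def total_duration(self) -> int:
        return sum(s.duration for s in self.songs)


# Example usage with made-up data
playlist = SongList()
playlist.add(Song("Song A", "Artist X", 180))
playlist.add(Song("Song B", "Artist Y", 240))
playlist.add(Song("Song C", "Artist X", 200))

by_x = playlist.filter(lambda s: s.artist == "Artist X")
print(by_x.names())           # ['Song A', 'Song C']
print(by_x.total_duration())  # 380
```

Wrapping the list in its own class keeps the filtering and aggregation next to the domain concept, which is the point made above about SongList being a domain object rather than an implementation detail.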

A very preliminary conclusion

The example is very short and perhaps not very meaningful. One would need to do more empiric research (e.g. comparing function points (FP) to LOC in different languages). And perhaps some readers will provide additional information. So the conclusion is preliminary and will be updated. Counting the lines of code, there are 33 NCSS in Java and 19 NCSS in Python, so in my example Java has around 1.7 times the LOC of Python. Taking the hypothesis above, this could mean several things:

  • I’ve written sub-par code and most applications differ significantly in style and are much shorter in Python
  • Code complexity and lines of code arise from frameworks not the language
  • Java is really only 1.7x more verbose than Python, not 10x to 20x
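The NCSS figures count non-commenting source statements rather than raw lines. As a rough illustration of the difference, the following sketch compares a raw line count with a crude line-based stand-in for NCSS. The function names and the sample snippet are assumptions, and a real NCSS tool counts statements, so its numbers can differ.

```python
def count_loc(source: str) -> int:
    """Raw line count, including blank lines and comments."""
    return len(source.splitlines())


def count_code_lines(source: str) -> int:
    """Crude NCSS stand-in: lines that are neither blank nor comment-only.

    Assumes '#' line comments; docstrings and statements spanning several
    lines are not treated specially.
    """
    return sum(
        1
        for line in source.splitlines()
        if line.strip() and not line.strip().startswith("#")
    )


sample = """\
# a tiny module

def add(a, b):
    # sum two numbers
    return a + b

print(add(1, 2))
"""

print(count_loc(sample))         # 7
print(count_code_lines(sample))  # 3
```

Even on this tiny snippet the two counts differ by more than a factor of two, which is why comparisons that mix LOC and NCSS should be read with care.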

I can’t comment on the first conclusion. The second conclusion means someone would need to compare two framework examples, say the song list in Seam and Django. The third conclusion is very interesting. It would mean that people consider applications written in Python very large although they contain (relatively) far fewer lines of code. Ola considers 50k to 100k very large; with a factor of 2x this would make 100k to 200k lines of Java. I can’t speak for most Java enterprise/startup developers, but as I consider 500k to 1M very large, Ola and I differ by a factor of 5x on what very large is. I can only speculate about the reason for this.
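The arithmetic behind that 5x can be spelled out in a few lines. This is only a sketch: the variable names are made up, and the factor of 2 is the measured 1.7x rounded up, as in the paragraph above.

```python
python_very_large = (50_000, 100_000)   # Ola's range for Ruby/Python
java_very_large = (500_000, 1_000_000)  # my own range for Java
verbosity_factor = 2                    # rounded up from the measured 1.7x

# Ola's range translated into equivalent Java lines
java_equivalent = tuple(n * verbosity_factor for n in python_very_large)

# Remaining gap between the two notions of "very large"
gap = tuple(mine / ola for mine, ola in zip(java_very_large, java_equivalent))

print(java_equivalent)  # (100000, 200000)
print(gap)              # (5.0, 5.0)
```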

  • This is a personal thing, and different developers have hugely different views on “very large” (perhaps depending on what they have seen)
  • Developers only write small applications in Python and consider everything else “very large”
  • Python is not maintainable above 50k to 100k lines of code and because of that people consider these code bases very large
  • Developers have trouble understanding and refactoring code bases bigger than 50k to 100k lines of code (perhaps because it’s a dynamically typed language)

The first conclusion somehow fits with another quote from Ola’s post: “And it’s interesting, the number one question everyone from the static “camp” has, the one thing that worries them the most is maintenance.” They may have seen “very large” applications, unlike the “dynamic camp”.

“Of course, this is totally anecdotal, and maybe these guys are above your average developer.”

I’m glad to provide a (small) step from the anecdotal to the empiric, and from the evidence in this post I don’t think people who consider 100k lines “very large” are “above your average developer”.

Another side note: “But in that case, shouldn’t we hear these rumblings from all those Java developers who switched to Ruby? I haven’t heard anyone say they wish they had static typing in Ruby.” Perhaps because they do green field (not brown field) development? And you need to develop for several years in one application to make it a brown field? And it takes several years to accumulate enough technical debt? Because most of them just started and don’t do “very large” applications?

Other interesting stuff:

  • A paper (PDF) from 2000 about Scripting, C and Java comes to the conclusion: “Designing and writing the program in Perl, Python, Rexx, or Tcl takes no more than half as much time as writing it in C, C++, or Java and the resulting program is only half as long.” This roughly matches the 1.7x factor of my short example
  • Dhananjay Nene wrote a performance post about Python and Java (and some other languages) and the LOC for Java is 86 and for Python 41, a factor of 2.1x
  • Dave rewrote a Java program in Python, going from 4700 lines of code to 700 (a factor of 6.7x). This would fit better with Ola’s impression. I’m not sure how this fits in: the developer can’t show the source and it was a rewrite by a different developer. The count also includes comments and empty lines, and the styles of the two developers could differ significantly.
  • Daveh did a comparison, with Python having 214 LOC (not NCSS) and Java 282 LOC (not NCSS). A factor of 1.3x
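Putting the line counts quoted so far side by side makes the spread easier to see. The sketch below only recomputes the factors from the numbers given in the text; the labels are shorthand, and the counts are not directly comparable since some are NCSS and some are raw LOC.

```python
# (source, Java lines, Python lines) as quoted in the text
data_points = [
    ("song list example (NCSS)", 33, 19),
    ("Dhananjay Nene", 86, 41),
    ("Dave's rewrite", 4700, 700),
    ("Daveh (LOC)", 282, 214),
]

# Java-to-Python ratio per source, rounded to one decimal place
factors = {name: round(java / python, 1) for name, java, python in data_points}

for name, factor in factors.items():
    print(f"{name}: {factor}x")
```

Three of the four data points land between 1.3x and 2.1x; only the rewrite by a different developer comes anywhere near the 10x to 20x implied by the differing notions of “very large”.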

Lots of open questions and I would be very interested in other opinions and other examples – and to explore the topic further.

Thanks for listening to this very long post.

Update: Ryan (see comments) supplied a version of a function in C and Python. After removing the manual memory allocation code and the Python interface code from the C version, the factor is 2.2x (38 to 17 NCSS). Thanks.

Update 2: Looking at Ohloh (see comments), the factor between Java and Python is 4x. This is a very large base of examples, but one would need to check the types of programs.

Update 3: An old article I’ve found again: “7 reasons I switched back to PHP after 2 years on Rails”. An interesting detail: after going to Rails and coming back, with the Rails knowledge the PHP app was reduced in size: “- … and much more. In only 12,000 lines of code, including HTML templates. (Down from 90,000, before.)”. It looks like rewrites or prior experience in the domain reduce code size. This could explain Ola’s experience with Java developers who switched to Ruby. I came to this article again through a comment by Harry Pynn on Frank Carver’s blog: “Point number 7 is that programming languages are like girlfriends: The new one is better because you are better. Could it be that people moving to dynamic languages from static languages find it easier to write maintainable code having honed their skills with a static language?”
