the blog for developers

50k lines considered very large?

Ola Bini considers 50K lines of code very large:

“I know several people who are responsible for quite large code bases written in Ruby and Python (very large code bases is 50K-100K lines of code in these languages).”

This explains a lot.

And the blog post made me think. We wrote code bases of more than 50K lines in Python in the 90s in a small development shop (<10 developers). I don't consider this “very large”. For me, large or very large starts at a size where one developer cannot possibly know all the code (independent of the language) or cannot keep a good overview.

Since I now see that Ola Bini's blog scrambled my comment, I repeat it here for reference:

“The Maintenance myth”

[snip snip snip]

“Has there been any research done in this area?”

Nice blog post, if you cut out the middle. Interesting to call something a myth and then ask about research at the end.

“(very large code bases is 50K-100K lines of code in these languages).”

100K is very large? I wrote some projects in a two-person team and reached 50K lines. This is rather small. We wrote 50K-line Python programs in the 90s in a small development shop (<10 developers). Very large starts (for me) at 1M LOC.

(I don't like LOC as a metric, though. Function points (FP) or "thought points" are much better because they are more comparable between languages and make more sense: a developer has to think about every "thought point", so more thought points mean more complexity and more effort.)

@Seo: “Codes written in dynamic languages tend to be shorter than codes written in static languages doing the same thing, and I think code size is the most important factor in maintenance.”

I don’t think Scala is much larger in LOC than Ruby.

And though Lisp and Haskell may need fewer LOC, they still have a lot of thought points: their density of thought points per line is high, whereas Java has a very low density with lots of noise in between.

Thanks for listening.

Update: Concerning my comment to Seo

A Ruby example:

class Song
  def initialize(name, artist, duration)
    @name     = name
    @artist   = artist
    @duration = duration
  end

  def how_long
    "#{@duration} minutes"
  end
end

or with idiomatic Scala

class Song(val name:String, val artist:String, val duration:Int) {
    def howLong = duration + " minutes"
}

or more similar to Ruby:

class Song(aName:String, aArtist:String, aDuration:Int) {
    val name = aName
    val artist = aArtist
    val duration = aDuration

    def howLong = duration + " minutes"
}

Another Update: Marcos suggested

class Song
  def initialize(name, artist, duration)
    @name, @artist, @duration = name, artist, duration
  end
  ...

as more idiomatic Ruby. Thanks.
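One caveat for the line-count comparison: Scala's `val` constructor parameters also generate public read accessors, which the Ruby class above does not have. A minimal sketch of a closer Ruby equivalent follows, assuming read-only accessors are wanted. The `attr_reader` line and the sample song values are additions for illustration, not part of the original post. Note that Ruby string interpolation is written `#{...}`.

```ruby
# Sketch only: attr_reader is added so the Ruby class exposes the same
# read accessors that Scala's `val` constructor parameters generate.
class Song
  attr_reader :name, :artist, :duration

  def initialize(name, artist, duration)
    @name, @artist, @duration = name, artist, duration
  end

  def how_long
    "#{@duration} minutes"
  end
end

# Sample values, made up for illustration.
song = Song.new("Yesterday", "The Beatles", 2)
puts song.how_long  # prints "2 minutes"
```

Even with the accessor line, the Ruby class stays within a few lines of the Scala versions, which is the point of the comparison.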

About the author: Stephan Schmidt is head of development at brands4friends. He has more than 15 years of internet technology experience and 10 years of experience in agile. He has been head of development, consultant, and CTO, and is a speaker, author, and blog writer. He specializes in organizing and optimizing software development, helping companies increase productivity with lean software development and agile methodologies. All views are only his own.

Comments

seiju

<1k very small project
1k-10k small project
10k-100k medium project
>100k large project
xk very large project
But this measure will change according to what language we are using, of course.

stephan

@seiju: Yes, as I’ve said, thought points are much better than LOC. My feeling is that Python code is at most 5x smaller than the equivalent Java code, though [*].

So 50k lines of Python correspond to at most 250k lines of Java, which is not a very large project.

[*] comparing e.g. list comprehensions in Python or closures in Ruby with Google Collections examples. But I would be very interested in a comparison of a LOC factor of Python, Ruby, Scala and Java.
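The arithmetic in this exchange can be written down as a small sketch. It combines the "at most 5x" factor, which is a gut feeling and not a measured value, with seiju's size bands. The method names and the handling of the exact band boundaries are assumptions made for illustration. The "very large" band is left out because its threshold is not given in the comment.

```ruby
# Assumption taken from the comment above: Python is at most 5x more
# compact than Java. This is an estimate, not a measured factor.
MAX_PYTHON_TO_JAVA_FACTOR = 5

# Upper bound on the Java line count corresponding to a Python code base.
def java_equivalent_loc(python_loc, factor = MAX_PYTHON_TO_JAVA_FACTOR)
  python_loc * factor
end

# seiju's bands. How the exact boundaries (1k, 10k, 100k) are assigned
# is an assumption; the "very large" threshold is unspecified and omitted.
def size_band(loc)
  if loc < 1_000
    "very small"
  elsif loc < 10_000
    "small"
  elsif loc <= 100_000
    "medium"
  else
    "large"
  end
end

java_loc = java_equivalent_loc(50_000)
puts java_loc             # prints 250000
puts size_band(50_000)    # prints "medium"
puts size_band(java_loc)  # prints "large"
```

Under these assumptions the 50k-line Python code base lands in the medium band, and even its Java upper bound is only "large" on this scale.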

anjan

Hi Stephan,

50K lines of Python in the 90s: I'd assume that you didn't use a lot of libraries? These days, if one writes Java code, the frameworks and libraries leave little Java code that you have to write yourself. At least, that is the case in 80% of projects.

So I'd say that your 50K lines of Python code would include a good amount of infrastructure code. That would mean NOT MORE than 150K lines of Java code today. Again, it depends on the shop, on the frameworks and libraries used, and on the maturity of the teams and developers.

I worked on a banking application with about 250K lines of Java code and about 70-100 libraries. Although the application code could have been reduced by 20% through refactoring and reworking, I would still call that a LARGE project. Whether it is VERY LARGE, I am not sure, but I'm almost sure none of us could grasp the entire codebase at any point in time.

I'm interested, as you are, in finding out others' opinions.

BR,
~A

stephan

@anjan: “50K lines in python in the 90s — I’d assume that you didn’t use a lot of libraries ?”

There were far fewer libraries than today.

Adrian

“Thought points” … I like that.

Do you have a reference how to compute these? ^^

stephan

@Adrian: No, not really. It is just something in my head, since I’ve worked a lot with code metrics, but I haven't had time to write a paper.

if (A && B) { ... }

would have 3 TP (use if (1), get A (1) and B (1) right)

persons.filter(_.age > 10)

would perhaps have 2 TP (use filter and get expression right)

while

for (Person person: persons) {
    if (person.getAge() > 10) {
       filtered.add( person );
    }
}

I don’t know, perhaps 1 for the loop, 2 for the if and its expression, and 1 for the add, which makes 4 TP.

A little bit like McCabe's cyclomatic complexity.
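The same contrast can be sketched in Ruby, with the rough thought-point counts from above as comments. The counts are the informal estimates from this comment, not a defined metric. The `Person` struct and the sample data are made up for illustration.

```ruby
# Hypothetical data type and sample values, for illustration only.
Person = Struct.new(:name, :age)

persons = [
  Person.new("Ann", 8),
  Person.new("Bob", 12),
  Person.new("Cid", 30)
]

# Declarative version: roughly 2 TP (use select, get the expression right).
filtered_select = persons.select { |person| person.age > 10 }

# Explicit loop: roughly 4 TP (loop, if, expression, append).
filtered_loop = []
persons.each do |person|
  if person.age > 10
    filtered_loop << person
  end
end

puts filtered_select.map(&:name).inspect  # prints ["Bob", "Cid"]
puts filtered_select == filtered_loop     # prints true
```

Both versions produce the same result, so the difference lies only in how many separate things the developer has to get right.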

@Stephan: Python 10 years ago: yes, given that there were far fewer libraries in both Python and Java, I assumed you wrote a good amount of infrastructure code. Today, if you were to rewrite the same project, I’d think that it would take you at least 20-30% fewer lines of code?

Thank you,

I think Ola would be more likely to split out his infrastructure code into a separate library than most would be, so each individual codebase might be small but there might be more of them.

If you split projects up according to cliques you’ll probably find it difficult to pass 50kloc.

And more idiomatic Ruby:

class Song
  def initialize(name, artist, duration)
    @name, @artist, @duration = name, artist, duration
  end

  def how_long
    "#{@duration} minutes"
  end
end

Kind Regards

stephan

@Marcos: Thanks for the input.
