There is a dark side to most of the current NoSQL databases. People rarely talk about it. They talk about performance, about how easy schemaless databases are to use, about nice APIs. The people talking are mostly developers, not operations staff or system administrators. No one asks those. But that is where the rubber hits the road.
The three problems no one talks about – almost no one, I had a good talk with the Infinispan lead [1] – are:
- ad hoc data fixing – either no query language available or no in-house skills
- ad hoc reporting – either no query language available or no in-house skills
- data export – sometimes no API way to access all data
In an insightful comment to my blog post “Essential storage tradeoff: Simple Reads vs. Simple Writes”, Eric Z. Beard, VP Engineering at Loop, wrote:
My application relies on hundreds of queries that need to run in real-time against all of that transactional data – no offline cubes or Hadoop clusters. I’m considering a jump to NoSql, but the lack of ad-hoc queries against live data is just a killer. I write probably a dozen ad-hoc queries a week to resolve support issues, and they normally need to run “right now!” I might be analyzing tens of millions of records in several different tables or fixing some field that got corrupted by a bug in the software. How do you do that with a NoSql system?
- Data export: NoSQL databases are affected by these problems to different degrees. Each of them is unique. With some it’s easy to export all your data, mostly the non-distributed ones (CouchDB, MongoDB, Tokyo Tyrant), compared to the more difficult ones (Voldemort, Cassandra). Voldemort looks especially weak here.
- Ad hoc data fixing: With the non-distributed NoSQL stores, which do possess a query and manipulation language, ad hoc fixing is easier, while it is harder with distributed ones (Voldemort, Cassandra).
- Ad hoc reporting: The same goes for ad hoc reporting. The better the query capabilities (CouchDB, MongoDB), the easier ad hoc reporting becomes. For some of those reporting woes Hadoop is a solution. But as the Scala Swarm author Ian Clarke notes, map/reduce is not applicable to every problem. Either way you need to train customers and manage their expectations, as they have become addicted to ad hoc reporting. This is not only a technical question, but a cultural one.
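To make the ad hoc fixing problem concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the `orders` data and the lower-case currency bug are made up, the standard-library `sqlite3` module stands in for a SQL database, and a plain dict stands in for a key-value store with opaque values and no query language. It does not model any of the NoSQL products named above.

```python
import json
import sqlite3

# Stand-in for a SQL database: the fix is one ad hoc statement.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, currency TEXT)")
db.executemany("INSERT INTO orders VALUES (?, ?)",
               [(1, "EUR"), (2, "eur"), (3, "USD")])
db.execute("UPDATE orders SET currency = UPPER(currency)")
sql_rows = db.execute("SELECT id, currency FROM orders ORDER BY id").fetchall()

# Stand-in for a key-value store with opaque values and no query language:
# the same fix needs a client-side scan over every key.
kv_store = {
    "order:1": json.dumps({"currency": "EUR"}),
    "order:2": json.dumps({"currency": "eur"}),
    "order:3": json.dumps({"currency": "USD"}),
}

def fix_currency(store):
    """Read, repair and write back every record; returns the number changed."""
    changed = 0
    for key in list(store):
        record = json.loads(store[key])
        fixed = record["currency"].upper()
        if fixed != record["currency"]:
            record["currency"] = fixed
            store[key] = json.dumps(record)
            changed += 1
    return changed

changed = fix_currency(kv_store)
```

The SQL fix is a single statement that someone from support can write on the spot. The key-value fix is a small program that has to touch every record, and in a distributed store it would also need a way to enumerate all keys in the first place.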
One solution is to split data that needs to be queried or reported (User, Login, Order, Money) from data that needs the best performance (app data, social network data). Use a traditional SQL database for the first kind of data, and a fast, distributed NoSQL store for the second kind. Joining across the two will be difficult, you need to support more systems, and skills are an issue. But the three problems can be solved this way.
What is your NoSQL strategy? Please leave a comment, I would like to know.
[1] they plan a distributed query language for ad hoc reporting in distributed environments
Recently I came to the conclusion, while playing with data formats, that XML and JSON cannot be converted into each other nicely. Both data formats miss something in relation to the other. Good JSON misses root types and types for arrays – both of which XML has – while XML misses list types, which JSON has. This leads to XMLish JSON when people transform XML to JSON, and to awkward XML in the other direction. With the advent of document stores and NoSQL, this means that you as a developer have to decide how to store your data. Let’s explore this.
Suppose we have a short XML format for storing shopping lists. We have a list with a name, an id and a sub list of items.
<shoppinglist>
<id>123</id>
<name>Stephans List</name>
<items>
<item>
<id>234</id><description>Apple</description>
</item>
<item>
<id>233</id><description>Banana</description>
</item>
</items>
</shoppinglist>
From reading this XML it’s easy to see that items is a list because it contains several entries of the same type. The XML can be transformed to proper JSON:
{
"id": "123",
"name": "Stephans List",
"items": [
{ "id": 234, "description": "Apple" },
{ "id": 233, "description": "Banana" }
]
}
XMLish JSON
I call this proper JSON, compared to XMLish JSON:
{
"shoppinglist": {
"id": "123",
"name": "Stephans List",
"items": [
{ "item": { "id": 234, "description": "Apple" } },
{ "item": { "id": 233, "description": "Banana" } }
]
}
}
We want proper JSON, because working with XMLish JSON looks ugly in code. To access an item id one would need to write
id = list.shoppinglist.items[0].item.id
compared to
id = list.items[0].id
When seeing the following fragment, it’s not easy to decide if items is a list of items or not.
<items>
<item>
<id>234</id><description>Apple</description>
</item>
</items>
Why do we need to know? Because if you transform this XML to JSON, you need to decide between representing this as
{ "items": { "item": { "id": 234, "description": "Apple" } } }
or
{ "items": [{ "id": 234, "description": "Apple" }] }
XML can solve this decision problem with meta descriptions, XSD and DTD. But when transforming XML to JSON in your code, evaluating a DTD or XSD description is a performance problem and needs a lot of ugly code.
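The decision problem shows up as soon as you write a converter that has no schema to consult. The following is a minimal sketch in Python using the standard-library `xml.etree.ElementTree`; the function name `naive_to_json` and its rule (repeated tags become a list, a single tag becomes an object) are assumptions for illustration, not the behaviour of any particular conversion library.

```python
import json
import xml.etree.ElementTree as ET

def naive_to_json(element):
    """Convert an element to JSON-like data without any schema knowledge.

    Repeated child tags become a list; a tag that occurs once becomes a
    nested object, because nothing in the XML says it should be a list.
    """
    children = list(element)
    if not children:
        return element.text
    result = {}
    for child in children:
        value = naive_to_json(child)
        if child.tag in result:
            # Second occurrence of the same tag: switch to a list.
            if not isinstance(result[child.tag], list):
                result[child.tag] = [result[child.tag]]
            result[child.tag].append(value)
        else:
            result[child.tag] = value
    return result

one = ET.fromstring(
    "<items><item><id>234</id><description>Apple</description></item></items>")
two = ET.fromstring(
    "<items>"
    "<item><id>234</id><description>Apple</description></item>"
    "<item><id>233</id><description>Banana</description></item>"
    "</items>")

one_json = naive_to_json(one)   # "item" is an object
two_json = naive_to_json(two)   # "item" is a list
print(json.dumps(one_json))
print(json.dumps(two_json))
```

The shape of the JSON now depends on how many items the list happens to contain, so client code has to handle both an object and an array for the same field. Note also that all values come out as strings, since the XML carries no number type either.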
JSON to XML conversions
The same problem occurs when transforming JSON to XML. And with the upcoming NoSQL stores that focus on JSON, this will perhaps become a major problem in the future. Lots of semantic information is lost in the data format and is only present in application code. Take our example:
{
"id": "123",
"name": "Stephans List",
"items": [
{ "id": 234, "description": "Apple" },
{ "id": 233, "description": "Banana" }
]
}
When we want to transform this to XML, we do not know the name of the root node. Without a root node the XML is not valid. An unsatisfying solution would be to create a generic root like <document>. We also have a problem with the entries of the items array. What names do those nodes have? <item-entry>? How ugly.
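A short sketch shows where the missing information has to come from. It is written in Python with the standard-library `xml.etree.ElementTree`; the function `to_xml` and its `item_names` parameter are hypothetical names introduced for this example. The point is only that the root name and the entry names must be passed in by the caller, because the JSON does not contain them.

```python
import xml.etree.ElementTree as ET

def to_xml(value, tag, item_names):
    """Build an element called `tag` from JSON-like data.

    `item_names` maps an array key to the tag used for its entries; that
    information is not part of the JSON and has to come from the caller.
    """
    element = ET.Element(tag)
    if isinstance(value, dict):
        for key, child in value.items():
            element.append(to_xml(child, key, item_names))
    elif isinstance(value, list):
        for entry in value:
            element.append(to_xml(entry, item_names[tag], item_names))
    else:
        element.text = str(value)
    return element

shopping_list = {
    "id": "123",
    "name": "Stephans List",
    "items": [
        {"id": 234, "description": "Apple"},
        {"id": 233, "description": "Banana"},
    ],
}

# Both the root name and the entry name are supplied by hand.
root = to_xml(shopping_list, "shoppinglist", {"items": "item"})
xml_text = ET.tostring(root, encoding="unicode")
```

Without the two hand-written names, "shoppinglist" and "item", the function has nothing sensible to emit. That knowledge lives in application code, not in the data.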
Solutions
I’ve written about a solution – a third format from which to generate both JSON and XML. Most solutions without a higher level format (this includes XSD and XMLish JSON) are not very satisfying. BadgerFish creates very XMLish JSON documents with @xmlns entries for namespaces and $ entries for text content, and loses a lot of appeal compared to lean JSON. The one solution I currently use is to store XML when storing data in a key value store, not JSON. A “list:” namespace or a type attribute then lets us easily transform this XML to JSON.
<items type="list">
<list:items>
A solution for JSON? When you need to store JSON, supplement your data with meta information on types. How are you gonna solve this?
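The type attribute approach can be sketched in a few lines of Python with the standard-library `xml.etree.ElementTree`. The function name `typed_to_json` is made up for this example, and only the `type="list"` attribute variant is shown; the `list:` namespace variant would additionally need a namespace declaration in the document.

```python
import xml.etree.ElementTree as ET

def typed_to_json(element):
    """Convert XML to JSON-like data, honouring a type="list" attribute."""
    children = list(element)
    if element.get("type") == "list":
        # Marked as a list: always an array, even with one or zero entries.
        return [typed_to_json(child) for child in children]
    if not children:
        return element.text
    # Unmarked element: an object keyed by child tag. Repeated tags would
    # overwrite each other here, which this sketch does not handle.
    return {child.tag: typed_to_json(child) for child in children}

doc = ET.fromstring(
    '<shoppinglist><id>123</id>'
    '<items type="list">'
    '<item><id>234</id><description>Apple</description></item>'
    '</items></shoppinglist>')

result = typed_to_json(doc)
```

With the marker in place, a list with a single entry still becomes an array, and the wrapper names "shoppinglist" and "item" are dropped, which gives the lean JSON shape. Values remain strings, because the XML does not say which ones are numbers.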
Mike Brunt quotes an email on his blog. The mail expresses a very negative view of agile. Although the email is not written by Mike, he is “posting this here because it states my reservations very well”. So I address him directly, because my experience with Agile over the last years is contrary to his.
1. Agile programming emphasizes programming over engineering. This results in software that does not have clean interfaces and is intertwined with other code. Of course, such code is difficult to maintain, debug, and replace. Expensive code bloat is the consequence.
The base fallacy here is that agile equals chaos. Looking at the clean code movement, which has grown out of agile, there is a lot of emphasis on good code. With heavy unit testing and refactoring, especially when you do TDD, code has clean interfaces and is easy to debug, maintain and replace. The code is light and agile.
Many of these [bugs] are edge cases and not detectable by testing.
I think edge cases are especially found by testing. Agile emphasizes tight integration with testers and quality assurance. From my experience with agile this leads to more test-aware thinking among developers, leading to fewer bugs. Thanks to continuous integration and automation, static analysis tools like FindBugs or PMD are more accepted in agile circles, which also results in higher quality.
An agile team, like a Scrum development team, has more control and more responsibility. They consider the code “their” code, which is often not the case with non-agile teams who are pushed towards deadlines and low quality.
Everywhere I introduced Scrum or Agile, code quality went up, architecture quality went up and bugs went down.
4. Agile programming deemphasizes designing performance into products.
Thanks to continuous integration and tight integration with QA and operations, many agile teams have a clear grasp of performance. They include rigorous performance tests in their builds and deployment strategies. Especially with continuous deployment, designing for and measuring performance is a must.
5. Agile programming never views a program or project as complete. There’s always room to tinker and add new levels of abstraction and modify the mechanics of a program. Expenses around programming become a sucking black hole.
To the contrary: Scrum emphasizes business value delivered, as does XP. No more tinkering, no exploding programming expenses. Some agile shops can clearly state the cost of each story point and calculate detailed ROI.
6. Agile programming is a model that rewards software churn. It’s a great model for building fiefdoms in a corporation and employ busy programmers; it’s terrible for corporations who want to produce maintainable stable quality products that will not incure high overheads.
Not sure if this is ironic and the post a satire. Nothing could be further from the truth. Agile is about communication and cooperation. A Scrum of Scrums coordinates teams to solve problems together for the future.
7. Agile programming deemphasizes quality. Deploying software that works after a fashion “rather than waiting for perfection” introduces a dangerous slippery slope. I doubt that many managers can define “acceptable imperfection.” Quality should be job one. Apple demonstrates that customers will pay a premium for well designed and implemented software.
As stated above, agile emphasizes quality on every level. Only agile teams consistently have a Level of Done agreement, under which jobs are only done when they are “done done”. This often includes acceptance tests, documentation, clean code, unit tests, release notes, refactoring and code reviews (reading from the Level of Done Agreement on the wall here).
8. Agile programming over emphasizes schedules. Production schedules and engineering requirements should be balanced by management.
While it’s true that iteration cycles often coincide with releases to shorten time to market, the two are independent. In agile, product managers can release every minute, every day, every two weeks or every six months, just as they see fit.
9. When there are many projects to add assorted features to a product, code become difficult to manage. Code merges and inconsistencies become difficult to manage so all the pieces play together. Merging code down can take several days given high rates of code churn. Costs associated with code management are not linear as the number of projects increase. I suspect that the cost function is exponential.
Merging sometimes gets difficult, with agile or without. Many agile shops migrate to distributed version control systems like Git or Mercurial that aid tremendously with merging. Having personally worked in environments of 5 to 70 developers, I haven’t seen merges that “take several days”. Distributed version control helps with keeping costs down.
10. Agile programming uses customers as the test bed. Customers don’t appreciate being treated as guinea pigs.
Agile programming delivers low bug-count, production-ready code to customers. Due to late binding of requirements, active communication with customers and reviews, customers get what they need and want, not something that they assumed they needed some months ago. Due to “done done”, customers are not used as “guinea pigs”, contrary to traditional development with “alpha”, “beta” and “RC” releases.
11. The agile programming model creates an unstable expensive house of cards. The house of cards will eventually collapse despite efforts to keep it standing.
Huh?
I’ve been dabbling with agile since the beginning of XP and Kent’s first book. Later I introduced Agile in some companies and processes and became a Scrum Master. Nowhere I’ve been have I seen the issues described. To the contrary, those issues were often rampant before the introduction of agile. This leads me to the conclusion that Mike never experienced agile, either by not doing agile or by falling into the traps of “let’s do agile” management, ScrumBut or snake-oil-selling consultants.
My experience is very positive, as is the experience of all developers I’ve asked and worked with. But perhaps it’s best to draw your own conclusion by listening to both sides.
Your application is going to be an enterprise application soon. Prepare for it. There is a certain disdain for enterprise applications in the new world of dynamic languages and frameworks like Ruby/Rails or Python/Django. Mostly this is associated with the world of Java and C#. Developers think they are immune from enterprise woes. Think again.
Enterprise applications are not defined – contrary to public opinion – by being applications for the enterprise. What developers associate with enterprise applications, like untested code, tangled code, old frameworks and slow development, is not something that only happens in the enterprise. If you do not act, it happens to every application.
I’ve seen many applications becoming enterprise applications over time. Top driving forces for applications becoming “enterprise” are:
- A startup with few people becomes a company with many people – Craigslist being the exception
- A growing company with investors gains more goals, which results in more feature wishes. Often this means more developers. If recruitment isn’t tough enough, this pulls the developer force towards the average. This in turn reduces code quality and leads to less testing, higher coupling and worse documentation.
- More employees lead to more wishes and more features. These features need more real estate on the website. Marketing wants banners, sales wants contact forms and customer support wants info boxes. Pages get cluttered, simple forms get complex.
- Marketing usually wants to store everything about your customers (and others want that too). This means more fields, more complex forms and more dependencies on third-party services.
- Integration with other services is the most common enterprise “problem”: Integration with mail services, backends, web tracking companies, financial systems, data warehouses or payment providers tangles your application. Deployment gets harder, as does testing. Everything takes longer.
- The first few years a startup has very low turnover. But from my experience as a team manager, retention in our industry is not measured in decades. So as a startup ages, turnover increases: After some years the initial developers are no longer there, and the others do not have quite the same grasp of the system. Development, code quality and architecture deteriorate.
- Founders leave: After some years, founders often leave or are ousted by their VCs. Tech-savvy founding types are replaced by executives. Technology looks less important, quality goes down.
Law of software development: Greenfield becomes brownfield.
If you do not want this to happen, you need to fight it every step:
- Have migration paths, upgrade paths and life cycle management for frameworks
- Clean code
- Architecture guidelines and architecture strategy
- Someone who fights for clean web pages and forms
- Strategy for integrating with many, many systems
What do you think? How do you prevent applications from becoming “enterprise”?
Is Java finally dead? There has been much discussion about the end of Java. As a developer, do you need to care? How do you need to change your decisions in case Java is dead? I have been pondering this question for the last several years, beginning with my adventures into Ruby at the end of the 90s. I hope to give a thorough representation of my thoughts here.
In a very interesting thread on LtU Sean McDirmid wrote:
The Java death watch continues. Its future is tied up with Sun, which continues not to make money, and in this economy… JavaFX was late and didn’t make the splash it needed to make. Can Scala (or Clojure I guess) save the JVM? And who would take over the Java mantle if Sun imploded, IBM?
while Ross Smith adds:
I think Sun will be widely recognised as doomed, if it isn’t already dead, by the end of 2009, and they’ll take Java (and the JVM) down with them.
A lot has happened since then: Oracle is buying Sun, and the JRuby team has jumped ship to Engine Yard. 2009 has not yet ended; we will see if that prediction holds true.
Searching for java is dead with Google, one gets
Results 1 – 10 of about 8,620,000 for java is dead.
Dead indeed. Or at least lots of people think it is dead or will die in 2009.
Update: Because some readers mistook googling for scientific research: as pointed out, googling for “java is dead” as a quoted phrase results in “Results 1 – 10 of about 170,000”.
For a start we need to explore what “dead” means, and in particular what dead means for Java and what it means to you as a developer. After that I will look into the potential successors and why they are better than Java – or not. Looking into the question “why should Java die”, I want to make some predictions about Java’s future and especially about some future Java programming style.
What does dead mean?
Let’s begin with some thoughts on what Java means. Most people mean different things when they talk about Java. There are mainly three parts:
- Java, the language
- Java, the libraries (JDK)
- Java, the virtual machine
So does “Java is dead” mean the language, the libraries or the virtual machine? Contrary to what the commenters on Lambda the Ultimate say, the Java VM is safe. There are considerable efforts to open source the virtual machine besides the language. Indeed, with the beginning of the Java language summit it looks stronger than before. Should Sun as a company die, or Oracle drop Java or stop development on the VM, most probably some other players with their own VM implementations or the OpenJDK community would jump in.
The VM as a platform has grown enormously in 2008 and 2009. Lots of people talk about JRuby as Rails for the enterprise – Engine Yard has pledged support. Scala is an object-functional language on the VM with a strong following and a lot of momentum in 2009. Both Scala and JRuby are established on the JVM, but there are newcomers. Clojure stirred up the Lisp community and the Java community in 2008 and made a lot of buzz in 2009. Just in time for Christmas last year Ola Bini released Ioke, which he described as a mixture of Smalltalk, Ruby and Lisp. It looks like a more dynamic Ruby to me after some hours of playing around with it. Great feat. And recently Noop has been released.
Or does “Java is dead” mean the language? What does it mean to be dead for a programming language? Perhaps that it is no longer the default choice for projects? Default choice for what projects? It is obviously no longer the default choice for websites and web startups. For some years this has been Rails, and with good reasons. Though I think a Wicket/WebBeans/Seam/JPA stack is as fast for development as Rails, Rails is a good choice for rapid development for a VC demo. The problems are some years down the road – or so I hear from some CTOs – and Grails might be a safer choice, with the easy possibility to go to Java for your stable layer later.
The only large – and let’s say profitable and growing – startup that uses Java is LinkedIn. Some internal systems at Facebook (Cassandra) and other sites do run Java in their core, though, and Twitter runs parts of its services in Scala. They show that it can be done. In contrast, the German LinkedIn competitor XING is written in Rails.
With Rails and Python moving into the enterprise, is Java no longer the default choice for new projects there? Not that I know of. Perhaps some grassroots projects go with Rails, but the default choice is still Java, C or C#, depending on your environment. Then there are some funny views on enterprise applications:
It’s been 10 years and there are still no compelling client side or desktop apps in Java and all the compelling server apps (sorry enterprise apps don’t count as compelling!) are done in PHP, Python, Ruby, Perl, Smalltalk et al.
I can’t see companies moving their programmers and default choice to Ruby, Erlang, Python, Lisp or OCaml, as this would mean polyglot programming, and as Alex Ruiz writes:
I haven’t seen any practical evidence yet to convince me this is a good idea.
For you as a developer, does dead mean you can’t get any more jobs in Java development? Looking at Dice shows that the number of Java jobs is still high, with a significant increase in 2008. The German news website Heise News showed the same for project work, an increase of more than 89% since the beginning of 2007:
203 in Q1 2007
384 in Q3 2008
Is Java dead because other languages are better?
From a different angle, we can discuss the death of Java in view of its potential successors. As a matter of fact a language cannot die without successors, otherwise no one could develop any software. People suggest a lot of successors, some of them are:
- Ruby
- Python
- Groovy/Grails
- Scala
- Fan
- Erlang
- OCaml
- Ioke
- Factor
Not all of them – although excellent and interesting languages – share the same goals as Java and fit into the same place. Ruby and Python currently do not seem enterprise ready, mostly because of tools, skills and deployments. This might change in the future, but we’re not there yet. OCaml and Factor are interesting and capable, but too far away from the procedural mainstream that is the C legacy. The JVM languages have the best prospects: Fan, Groovy, Scala, Ioke. Fan doesn’t seem to have succession ambitions, and Ioke is specially designed as a testing ground for ideas. Scala and Groovy seem to battle it out as successors. Scala has momentum and hype, companies use it in enterprise environments – and it’s also my current favorite. Groovy looks stronger: it silently made some inroads in the enterprise, and with Spring having bought the Grails committers – and now VMware having bought Spring – it’s better positioned than before.
Those successors need to be better than Java, otherwise it would be folly to replace Java at high cost and gain nothing. What does better mean?
- Faster to write?
- More cost efficient?
- Higher maintainability, cleaner code?
- Shorter code?
Most of them are faster to write because they have shorter code. As I’ve shown, Java is 1.7x – 4x bigger than Python in lines of code, but does that mean Java is dead?
Most comparisons take 5 to 10-year-old brownfield, legacy Java projects with hundreds of developers – many of them average – and compare them with 2-year-old Rails projects, where the initial developers – most of them excellent – are still on board. For a real comparison one would need to compare state of the art frameworks, Webbeans/Wicket, Stripes/JPA, with rapid development frameworks like Rails and Django. I’ll perhaps save this comparison for another post in the future, but would be happy if someone did a decent comparison. I consider this question open.
A Java successor needs to go through the enterprise. That is where the main beef is, the most money and the most developers. To die, a language needs to die there. Enterprise software is used for longer periods of time, with many developers working on it. The longer time periods mean higher turnover during the lifetime. Where are the problems in the enterprise and how could successors solve them in better ways?
Enterprise pain points
- Maintenance
- Readability
- Reuse
Do the potential successors solve those pain points better than Java? Partially. Some of them have richer reuse models, some of them have better readability and are less noisy. But I also consider this question open – from my experience with many languages those problems aren’t solved. Perhaps because many language designers today disdain the enterprise. Scala is a sweet spot for me, it gains on those issues but doesn’t introduce new problems.
Goals of Java and does Java no longer meet its goals?
What have been the goals of Java? Those are linked intrinsically to its success, so we need to take a look at them and at whether they still represent what people need. Those goals are many, but the main ones seem to be:
- Solve problems of C++
- Internet capable
- Standard library – JDK
- Automatic Memory Management – GC
- No error prone pointer operations
- Enterprise-Ready – a goal that evolved after some time (easy to use, low entry, big departments)
- Easy concurrency in the language
I can’t see that those goals are no longer valid, or that Java no longer fulfills them. The goals are valid, and fulfilled. Only concurrency is highly discussed and can be an issue. Concurrency is the future. It can be an issue because the early lock and synchronize system of Java proved to be too difficult. The new trend is multi-core. Is Java unfit for massive multi-core machines? In later editions Java added easier concurrency with concurrent lists, queues and fork and join, and is fit for multi-core machines. No worries, at least for me, that Java will miss this trend.
One requirement for a language wasn’t seen as important in 1995 as it is today: rapid development and rapid turnaround. Java still falls flat, even with JRebel, which allows seamless reloading of classes, RAD web frameworks like Wicket and splendid IDEs. Rails, Django and PHP are better and have a faster turnaround. Period. Java is lacking here, and reloading changes looks to be the biggest problem with Java development today. Maven deployments are a pain after you’ve worked with Rails or PHP.
Faster turnaround means higher productivity, which means more money. If Java doesn’t solve this, Java might be on the way to extinction.
Why should it die, what should we learn?
Java might die, because the drawbacks besides turnaround have gotten too big. Lots of concepts have been proven to be bad ideas.
- Inheritance: outside of frameworks, inheritance is inflexible, leads to tight coupling and is plain bad. Composition is most often a better choice, as are mixins and traits.
- Noisy syntax: Lately there has been the enlightenment that too much noise in a language is a bad thing. Java is especially noisy in closures (anonymous inner classes) and generics.
- Null / NPE: Null as the default for object references was a billion dollar mistake. An object should by default need a value. Otherwise NPEs will proliferate through your code. Newer languages prevent nulls or make the null behavior the non-default one.
- Design patterns: Many design patterns are a good thing, but some of them are just covering inefficiencies in an only-OO language like Java.
- List processing: As shown by functional languages, list processing should not be done in loops. Many operations in applications are just that: get a list, transform the list, filter the list and return a list. Java’s new for loop is better than the old – but solves the wrong problem. Java should have native support for easy list processing, not via the – best we have – constructs in Google Collections.
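The list processing point is easy to illustrate. The sketch below is in Python rather than Java, and the data and the threshold are made up for the example; it only contrasts the loop style with a filter-and-transform expression.

```python
orders = [120, 15, 300, 42, 8]  # hypothetical order amounts

# Loop style: the intent (filter, then transform) is buried in bookkeeping.
doubled_large_loop = []
for amount in orders:
    if amount >= 40:
        doubled_large_loop.append(amount * 2)

# List-processing style: filter and transform in one expression.
doubled_large = [amount * 2 for amount in orders if amount >= 40]
```

Both variants produce the same list. The second one states what is wanted, get a list, filter it, transform it, without managing an accumulator by hand.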
Those are valid concerns and one wishes Java would die for those. But the Java community is working on fixes – although, as can be seen with the discussion on closures in Java 7, sometimes too slowly. I consider those problems painful, but not big enough to lead to Java’s imminent death. They could lead to a death by a thousand cuts.
Java Future and what does this mean for you
From what I’ve written I come to the conclusion that Java is not dead. It’s not fundamentally flawed, it still meets its goals, there is interest in Java, no really clear successor has emerged, the platform evolves, the JVM shines, new languages flourish and new projects are started in Java.
But just because Java is not dead doesn’t mean it has a future. Developers need to open their eyes and learn new languages. I’m really disappointed in interviews when candidates show no interest in programming beside Java. So for you as a developer: no worries. As a student: You still need to learn Java to have a high probability to get a job – with the conditions you like. For you as a manager or CTO: have a plan ready for when the Java era ends.
It’s too early for a requiem. But if Java dies, what can we learn? First and foremost one needs to learn from Java’s success and eventual decline. The points I’ve written about (wrong concepts, enterprise pain points and what Java did right) need to be remembered.
Is Java dead because no-one talks about it anymore?
Thanks for bearing with me through this long post, we now come to an end. Chatter about Java has gone down significantly. My blogging on Java has gone down. My twittering on Java has gone down. Some years ago everyone was talking about Java, now it’s mainly enthusiasts. Java is a non-hype. It’s not as bad as COBOL, but a lot like C and C++. Is a language dead when no one talks about it anymore? You decide. In the end, the only question that really matters is: Is Java dead for me? Would I start a project in Java? I would have in 2008. Would I in 2009? Probably not, I’d use Scala.