by Stephan Schmidt

Real JSON vs. XMLish JSON

While playing with data formats, I recently came to the conclusion that XML and JSON cannot be converted into each other cleanly. Each format lacks something the other has. Good JSON has no named root type and no named types for array entries, both of which XML has, while XML has no list type, which JSON has. This leads to XMLish JSON when people transform XML to JSON and vice versa. With the advent of document stores and NoSQL, this means that you as a developer have to decide how to store your data. Let's explore this.

Suppose we have a short XML format for storing shopping lists. A list has a name, an id and a sublist of items.

<shoppinglist>
  <id>123</id>
  <name>Stephans List</name>
  <items>
    <item>
      <id>234</id><description>Apple</description>
    </item>
    <item>
      <id>233</id><description>Banana</description>
    </item>
  </items>
</shoppinglist>

From reading this XML it’s easy to see that items is a list because it contains several entries of the same type. The XML can be transformed to proper JSON:

{
  "id": "123",
  "name": "Stephans List",
  "items": [
    { "id": 234, "description": "Apple" },
    { "id": 233, "description": "Banana" }
  ]
}

XMLish JSON

I call this proper JSON, compared to XMLish JSON:

{
  "shoppinglist": {
    "id": "123",
    "name": "Stephans List",
    "items": [
      { "item": { "id": 234, "description": "Apple" } },
      { "item": { "id": 233, "description": "Banana" } }
    ]
  }
}

We want proper JSON, because working with XMLish JSON looks ugly in code. To access an item id one would need to write

 id = list.shoppinglist.items[0].item.id

compared to

id = list.items[0].id
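The difference shows up as soon as both shapes are parsed. Here is a minimal Python sketch using only the standard json module; the variable names are made up for illustration.

```python
import json

# The "proper" shape: the root element and the <item> wrappers are gone.
proper = json.loads("""
{
  "id": "123",
  "name": "Stephans List",
  "items": [
    {"id": 234, "description": "Apple"},
    {"id": 233, "description": "Banana"}
  ]
}
""")

# The XMLish shape: every XML element name survives as an extra level.
xmlish = json.loads("""
{
  "shoppinglist": {
    "id": "123",
    "name": "Stephans List",
    "items": [
      {"item": {"id": 234, "description": "Apple"}},
      {"item": {"id": 233, "description": "Banana"}}
    ]
  }
}
""")

# Two lookups versus four to reach the same value.
proper_id = proper["items"][0]["id"]
xmlish_id = xmlish["shoppinglist"]["items"][0]["item"]["id"]
```

Both expressions reach the same value, but the XMLish path repeats structure that only existed to satisfy XML.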

When seeing the following fragment, it’s not easy to decide if items is a list of items or not.

<items>
  <item>
    <id>234</id><description>Apple</description>
  </item>
</items>

Why do we need to know? Because when transforming this XML to JSON, you need to decide between representing it as

{ "items": { "item": { "id": 234, "description": "Apple" } } }

or

{ "items": [ { "id": 234, "description": "Apple" } ] }

XML can solve this decision problem with meta descriptions such as XSD or DTD. But when transforming XML to JSON in your code, evaluating a DTD or XSD description costs performance and takes a lot of ugly code.
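One lightweight alternative to evaluating a schema is to tell the converter which elements are lists. The sketch below assumes Python's xml.etree.ElementTree. The function name xml_to_json and the list_tags parameter are hypothetical and not taken from any library. All values stay strings because plain XML carries no type information.

```python
import xml.etree.ElementTree as ET


def xml_to_json(element, list_tags=frozenset()):
    """Convert an element to plain Python data (hypothetical helper).

    list_tags names the elements whose children form a list; it plays
    the role that an XSD or DTD would otherwise play.
    """
    children = list(element)
    if not children:
        # Leaf element: XML has no types, so the text stays a string.
        return element.text or ""
    if element.tag in list_tags:
        # Known list: drop the child element names, keep the order.
        return [xml_to_json(child, list_tags) for child in children]
    # Otherwise treat the children as named fields of an object.
    return {child.tag: xml_to_json(child, list_tags) for child in children}


fragment = ET.fromstring(
    "<items><item><id>234</id><description>Apple</description></item></items>"
)

# Without a hint the single <item> looks like a field of an object.
as_object = xml_to_json(fragment)
# With the hint the same fragment becomes a one-element list.
as_list = xml_to_json(fragment, list_tags={"items"})
```

Note that without the hint, repeated child tags would overwrite each other in the dict. That is one more reason the converter has to know which elements are lists.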

JSON to XML conversions

The same problem occurs when transforming JSON to XML. And with the upcoming NoSQL stores that focus on JSON, this will perhaps become a major problem in the future. A lot of semantic information is missing from the data format and is only present in application code. Take our example:

{
  "id": "123",
  "name": "Stephans List",
  "items": [
    { "id": 234, "description": "Apple" },
    { "id": 233, "description": "Banana" }
  ]
}

When we want to transform this to XML, we do not know the name of the root node. Without a single root element the XML is not well-formed. An unsatisfying solution would be to create a generic root like <document>. We also have a problem with the entries of the items array. What names do those nodes have? <item-entry>? How ugly.
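In code, the missing names become extra parameters that the caller has to supply. Here is a rough sketch with Python's xml.etree.ElementTree; json_to_xml, root_name and item_names are hypothetical names chosen for this example.

```python
import xml.etree.ElementTree as ET


def json_to_xml(data, root_name, item_names):
    """Build an XML element from parsed JSON (hypothetical helper).

    root_name and item_names carry exactly the information that the
    JSON document itself does not contain.
    """
    element = ET.Element(root_name)
    if isinstance(data, dict):
        for key, value in data.items():
            element.append(json_to_xml(value, key, item_names))
    elif isinstance(data, list):
        # The entry name must come from outside the data.
        entry_name = item_names[root_name]
        for value in data:
            element.append(json_to_xml(value, entry_name, item_names))
    else:
        element.text = str(data)
    return element


shopping_list = {
    "id": "123",
    "name": "Stephans List",
    "items": [
        {"id": 234, "description": "Apple"},
        {"id": 233, "description": "Banana"},
    ],
}

# Both "shoppinglist" and the "items" -> "item" mapping live in code.
root = json_to_xml(shopping_list, "shoppinglist", {"items": "item"})
xml_text = ET.tostring(root, encoding="unicode")
```

The call site makes the problem visible: the root name and the entry name are application knowledge, not part of the stored JSON.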

Solutions

I’ve written about a solution: a third format from which to generate both JSON and XML. Most solutions without a higher-level format (this includes XSD and XMLish JSON) are not very satisfying. Badgerfish creates very XMLish JSON documents with @xmlns entries for namespaces and $ entries for text content, and loses a lot of appeal compared to lean JSON. The one solution I currently use is to store XML, not JSON, when storing data in a key-value store. A “list:” namespace or a type attribute then lets us easily transform this XML to JSON.

  <items type="list">
  <list:items>
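With such a marker in place, the converter no longer has to guess. Below is a sketch of the attribute variant, assuming Python's xml.etree.ElementTree. The name typed_xml_to_json is hypothetical, and values stay strings because the marker only describes lists.

```python
import xml.etree.ElementTree as ET


def typed_xml_to_json(element):
    """Convert XML to plain data, honoring type="list" (hypothetical)."""
    children = list(element)
    if element.get("type") == "list":
        # Marked as a list: unambiguous even with one or zero entries.
        return [typed_xml_to_json(child) for child in children]
    if not children:
        # Leaf element: keep the text as a string.
        return element.text or ""
    return {child.tag: typed_xml_to_json(child) for child in children}


stored = ET.fromstring("""
<shoppinglist>
  <id>123</id>
  <name>Stephans List</name>
  <items type="list">
    <item><id>234</id><description>Apple</description></item>
  </items>
</shoppinglist>
""".strip())

data = typed_xml_to_json(stored)
```

Even with a single item, items comes out as a list, and the root name stays available in the stored XML for the way back.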

A solution for JSON? When you need to store JSON, supplement your data with meta information on types. How are you gonna solve this?

About the author: Stephan Schmidt is head of development at brands4friends. He has more than 15 years of internet technology experience and 10 years of experience in agile. He was head of development, consultant and CTO and is a speaker, author and blog writer. He specializes in organizing and optimizing software development, helping companies increase productivity with lean software development and agile methodologies. All views are only his own.

Comments

See what you think of the JAXB support that is baked into Jackson (http://docs.codehaus.org/display/JACKSON/Jackson+JAXB+Support). Among other things, this gets you data format polymorphism over a single Java object model. That said, you’ll get prettier JSON (without things like @nil silliness) from a bean mapping instead, and this also overlays cleanly with a JAXB-annotated model, just not with the same annotations.

Al

I’m using namespace to qualify datatype and keep “small” XML structure.
See it here http://code.google.com/p/ndjin/wiki/WebServiceProtocol

For example:

John
23
false

@Al: Nice approach, I’ll take a look

Stephan, I agree in that trying to directly convert JSON to XML or vice versa is a losing battle. In fact, so much so that this is a clear data format anti-pattern.

Instead I think it makes sense to bind XML and JSON separately to/from plain old objects, so conversion between formats is a two-step process. But both can use their “natural” representation. For Java this can be done by using JAXB for XML and Jackson for JSON, and I assume similar pairs exist for other platforms.

In most common cases, however, conversions are not even needed: input comes as xml or json, gets bound to objects; request is processed, response object(s) constructed, and converted to xml or json (not necessarily same format as input was in). Or similarly with other formats (protobuf, thrift, yaml, what have you). That seems like the obvious sensible approach — not trying to shoehorn all input/output through a single transfer data format.

People are hung up about interchangeability. JSON and XML are not equivalently expressive (or we would all use JSON!)
