Real JSON vs. XMLish JSON

While playing with data formats recently, I came to the conclusion that XML and JSON cannot be converted into each other nicely. Each format lacks something the other has. Good JSON lacks root types and types for arrays, both of which XML has, while XML lacks list types, which JSON has. This leads to XMLish JSON when people transform JSON to XML and vice versa. With the advent of document stores and NoSQL, this means that you as a developer have to decide how to store your data. Let's explore this.

Suppose we have a short XML format for storing shopping lists. We have a list with a name, an id and a sublist of items.

From reading this XML it's easy to see that items is a list because it contains several entries of the same type. The XML can be transformed to proper JSON:
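As an illustration, here is a minimal Python sketch of such a transformation. The shopping list itself is hypothetical: the element names, the use of attributes for id and name, and the helper `to_proper_json` are all assumptions made for this example. The mapping is written by hand, which is exactly why it can turn items into a plain array.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical shopping list; element and attribute names are assumptions.
SHOPPING_LIST_XML = """
<list id="1" name="Groceries">
  <items>
    <item id="10">Milk</item>
    <item id="11">Bread</item>
  </items>
</list>
"""


def to_proper_json(xml_text):
    """Hand-written mapping: <items> becomes a plain JSON array."""
    root = ET.fromstring(xml_text.strip())
    return {
        "id": int(root.get("id")),
        "name": root.get("name"),
        "items": [
            {"id": int(item.get("id")), "name": item.text}
            for item in root.find("items")
        ],
    }


proper = to_proper_json(SHOPPING_LIST_XML)
print(json.dumps(proper, indent=2))
```

The converter only works because it knows in advance that items is a list. That knowledge lives in the code, not in the data.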


I call this proper JSON, compared to XMLish JSON:
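To make the contrast concrete, here is a sketch of a generic converter that mirrors the XML structure one to one. The document and the function `to_xmlish` are hypothetical, and the `@` prefix for attributes and `$` key for text are just one possible convention, chosen for this example.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical shopping list; element and attribute names are assumptions.
SHOPPING_LIST_XML = """
<list id="1" name="Groceries">
  <items>
    <item id="10">Milk</item>
    <item id="11">Bread</item>
  </items>
</list>
"""


def to_xmlish(element):
    """Generic mapping: '@' keys for attributes, '$' for text,
    child elements grouped under their tag name."""
    node = {"@" + key: value for key, value in element.attrib.items()}
    text = (element.text or "").strip()
    if text:
        node["$"] = text
    for child in element:
        converted = to_xmlish(child)
        if child.tag in node:
            # Repeated tag: promote the existing value to a list.
            if not isinstance(node[child.tag], list):
                node[child.tag] = [node[child.tag]]
            node[child.tag].append(converted)
        else:
            node[child.tag] = converted
    return node


root = ET.fromstring(SHOPPING_LIST_XML.strip())
xmlish = {root.tag: to_xmlish(root)}
print(json.dumps(xmlish, indent=2))
```

Every XML wrapper element survives as an extra level of nesting, and all values stay strings, because the converter knows nothing about the meaning of the data.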

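We want proper JSON, because working with XMLish JSON looks ugly in code. To access an item id in XMLish JSON, one has to step through every wrapper object that mirrors the XML nesting, whereas proper JSON only needs an index into the items array.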
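A sketch of the two access paths in Python. Both structures are hypothetical and use assumed key names. The XMLish one follows an `@`/`$` convention for attributes and text.

```python
# Hypothetical XMLish JSON: mirrors the XML nesting one to one.
xmlish = {
    "list": {
        "@id": "1",
        "@name": "Groceries",
        "items": {
            "item": [
                {"@id": "10", "$": "Milk"},
                {"@id": "11", "$": "Bread"},
            ]
        },
    }
}

# Hypothetical proper JSON: items is a plain array.
proper = {
    "id": 1,
    "name": "Groceries",
    "items": [{"id": 10, "name": "Milk"}, {"id": 11, "name": "Bread"}],
}

# XMLish JSON: walk through the root, the wrapper and the '@' attribute key.
first_id_xmlish = xmlish["list"]["items"]["item"][0]["@id"]

# Proper JSON: index straight into the array.
first_id_proper = proper["items"][0]["id"]

print(first_id_xmlish, first_id_proper)
```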

When seeing the following fragment, it's not easy to decide if items is a list of items or not.

Why do we need to know? Because when you transform this XML to JSON, you need to decide whether to represent items as a single object or as a list.
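One fragment that raises this question is an items element with only a single child. The sketch below uses hypothetical fragments and a naive helper, `children_to_json`, that makes a list only when a tag repeats. The names are assumptions for this example.

```python
import xml.etree.ElementTree as ET


def children_to_json(parent):
    """Naive rule: a tag becomes a list only if it occurs more than once."""
    result = {}
    for child in parent:
        value = {"id": child.get("id"), "name": child.text}
        if child.tag in result:
            if not isinstance(result[child.tag], list):
                result[child.tag] = [result[child.tag]]
            result[child.tag].append(value)
        else:
            result[child.tag] = value
    return result


# Hypothetical fragments: the same format with one and with two entries.
one = ET.fromstring('<items><item id="10">Milk</item></items>')
two = ET.fromstring(
    '<items><item id="10">Milk</item><item id="11">Bread</item></items>'
)

single = children_to_json(one)
several = children_to_json(two)

print(type(single["item"]).__name__)
print(type(several["item"]).__name__)
```

The same format yields an object in one case and a list in the other, so consuming code has to handle both shapes unless the converter is told which one is meant.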


XML can solve this decision problem with schema descriptions such as XSD and DTD. But when transforming XML to JSON in your code, evaluating a DTD or XSD description costs performance and needs a lot of ugly code.

JSON to XML conversions

The same problems occur when transforming JSON to XML. And with the upcoming NoSQL stores that focus on JSON, this will perhaps become a major problem in the future. Lots of semantic information is lost in the data format and is only present in application code. Take our example:

When we want to transform this to XML, we do not know the name of the root node. Without a root node the XML is not valid. An unsatisfying solution would be to create a generic root like <document>. We also have a problem with the entries of the items array. What names do those nodes have? <item-entry>? How ugly.
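A sketch of that unsatisfying solution in Python. The JSON document is hypothetical and `to_xml` is a helper written for this example. The root name and the entry name have to be invented by the converter because the data does not carry them.

```python
import xml.etree.ElementTree as ET

# Hypothetical proper JSON document; key names are assumptions.
proper = {
    "id": 1,
    "name": "Groceries",
    "items": [{"id": 10, "name": "Milk"}, {"id": 11, "name": "Bread"}],
}


def to_xml(value, tag, entry_tag="item-entry"):
    """Generic JSON to XML mapping with made-up names for array entries."""
    element = ET.Element(tag)
    if isinstance(value, dict):
        for key, child in value.items():
            element.append(to_xml(child, key, entry_tag))
    elif isinstance(value, list):
        for child in value:
            # The JSON array does not say what its entries are called.
            element.append(to_xml(child, entry_tag, entry_tag))
    else:
        element.text = str(value)
    return element


# The JSON document has no root name either, so one is invented here.
root = to_xml(proper, "document")
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```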


I've written about a solution - a third format from which to generate both JSON and XML. Most solutions without a higher-level format (this includes XSD and XMLish JSON) are not very satisfying. BadgerFish creates very XMLish JSON documents with @xmlns entries for namespaces and $ entries for text content, and loses a lot of appeal compared to lean JSON. The solution I currently use is to store XML, not JSON, when storing data in a key-value store. A "list:" namespace or type attribute then lets us easily transform this XML to JSON.
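A minimal sketch of the type attribute idea, assuming an attribute spelled `type="list"` and a hypothetical document layout. The exact markup is an assumption for this example. With the marker in place, the converter no longer has to guess, even for a single entry or an empty list.

```python
import xml.etree.ElementTree as ET


def to_json(element):
    """Convert an element, honoring an assumed type="list" marker."""
    if element.get("type") == "list":
        # Marked as a list: always an array, whatever the number of children.
        return [to_json(child) for child in element]
    children = list(element)
    if not children:
        return element.text
    # Unmarked element: an object keyed by child tag
    # (repeated tags would overwrite each other in this sketch).
    return {child.tag: to_json(child) for child in children}


# Hypothetical stored XML; names and layout are assumptions.
doc = ET.fromstring(
    "<list><id>1</id><name>Groceries</name>"
    '<items type="list"><item><id>10</id><name>Milk</name></item></items>'
    "</list>"
)
result = to_json(doc)
empty = to_json(ET.fromstring('<items type="list"/>'))

print(result)
print(empty)
```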

A solution for JSON? When you need to store JSON, supplement your data with meta information on types. How are you gonna solve this?
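One possible answer, as a sketch only: store a small block of meta information next to the data that names the root node and the entries of each array. The `meta`/`data` layout, the key names and the helper `build` are all assumptions invented for this example, not an established convention.

```python
import xml.etree.ElementTree as ET

# Hypothetical stored document: data plus meta information on types.
stored = {
    "meta": {"root": "list", "entries": {"items": "item"}},
    "data": {
        "id": 1,
        "name": "Groceries",
        "items": [{"id": 10, "name": "Milk"}, {"id": 11, "name": "Bread"}],
    },
}


def build(value, tag, entries):
    """JSON to XML mapping that takes entry names from the meta block."""
    element = ET.Element(tag)
    if isinstance(value, dict):
        for key, child in value.items():
            element.append(build(child, key, entries))
    elif isinstance(value, list):
        for child in value:
            # Look up the entry name recorded for this array.
            element.append(build(child, entries[tag], entries))
    else:
        element.text = str(value)
    return element


meta = stored["meta"]
root = build(stored["data"], meta["root"], meta["entries"])
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```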
