Problems with Jersey, REST, JSON and UTF-8 [Update]

UTF-8 is always a problem. Unbelievable: it's 2008 and we still haven't fixed this. One of my current projects is a JavaScript frontend with a REST backend. The backend stores its data in MySQL (a famous UTF-8 troublemaker) and generates JSON in response to REST calls. The problems start with UTF-8 characters: somewhere in the call chain, as always, characters don't get written correctly. MySQL and the JDBC driver should work, the JSP page is UTF-8 (via @page and a meta http-equiv tag), jQuery (which does the AJAX) and JavaScript do know UTF-8, and Jersey should handle UTF-8 too. But after some experiments I'm now quite sure that Jersey (the JSR 311 REST framework) is to blame. I'm not sure how to specify UTF-8; this

  @ProduceMime("text/plain;charset=UTF-8")

doesn't help. Funny how every major project with several frameworks along the call chain and several languages (JS, C, Java) runs into UTF-8 problems somewhere. I'm so fed up with this; it's 2008.

Update: Jersey uses InputStreams for all encodings; the StringProvider in particular is relevant to me (see above). Does this work with Unicode?
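Whatever Jersey's StringProvider does internally (its source isn't quoted here), every component that turns a String into bytes has to pick a charset explicitly. The following is a minimal, stdlib-only sketch of that step, not Jersey code; the class Utf8WriteSketch and its writeString method are hypothetical names. It encodes a string as UTF-8 and shows the typical garbling when the same bytes are decoded as ISO-8859-1, which is what happens when one link in the chain assumes the wrong encoding.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Utf8WriteSketch {

    /** Writes the text to the stream using an explicit charset instead of the platform default. */
    public static void writeString(String text, OutputStream out, Charset charset) throws IOException {
        Writer writer = new OutputStreamWriter(out, charset);
        writer.write(text);
        writer.flush(); // flush but do not close: the caller (e.g. the container) owns the stream
    }

    public static void main(String[] args) throws IOException {
        String text = "M\u00fcller"; // "Mueller" with a u-umlaut, written as an escape to keep the source ASCII

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeString(text, out, StandardCharsets.UTF_8);
        byte[] bytes = out.toByteArray();

        String roundTrip = new String(bytes, StandardCharsets.UTF_8);
        String misread = new String(bytes, StandardCharsets.ISO_8859_1);

        System.out.println(bytes.length);                         // 7: the u-umlaut takes two bytes in UTF-8
        System.out.println(roundTrip.equals(text));               // true: same charset on both ends
        System.out.println(misread.equals("M\u00c3\u00bcller")); // true: the classic mojibake
    }
}
```

A writer created with `new OutputStreamWriter(out)` and no charset falls back to the platform default encoding (which was not UTF-8 on many systems before Java 18), one common way such bugs creep into a call chain.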

Experiments for nicely generating JSON

I've been experimenting with ways to generate JSON nicely. There are many ways to generate JSON in Java, e.g. XStream with Jettison, JAXB, or directly with Jersey, the REST API implementation. Often, though, you don't want to serialize objects or work with maps. Taking code from "The best Markup Builder I could build in Java", I've tried a builder approach.

@GET
@ProduceMime("application/json")
public String getList() {
  ShoppingList list = service.getList("123");

  return toJson(
    $("items", 
      new List<ShoppingItem>(list) {
        protected Node item(ShoppingItem shoppingItem) {
          return $("description", shoppingItem.getDescription());
        }
      })
  );
}

The generated JSON would be

{ "items": [ { "description": "Apple" }, { "description": "Orange" } ] }

To create nodes for a JSON tree I first tried a node() function, but lots of node() calls make the code quite hard to read. Luckily Java allows $ as a method name, and using $ makes the code much more readable. The List object creates a list of nodes: it takes its input from a Collection, Iterable or Iterator and calls item() for every element.
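The post doesn't show how Node, $, List or toJson() are implemented. Below is a minimal sketch of one way to wire them up, assuming a node is a name plus either a string, child nodes, or a list of nodes. All names are hypothetical; the post's List is called NodeList here to avoid clashing with java.util.List, and only Iterable input is handled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class JsonBuilderSketch {

    /** A named node whose value is a String, child nodes (Node[]) or a NodeList. */
    public static final class Node {
        final String name;
        final Object value;

        Node(String name, Object value) {
            this.name = name;
            this.value = value;
        }
    }

    /** Hypothetical stand-in for the post's List class: calls item() once per element. */
    public abstract static class NodeList<T> {
        private final Iterable<T> elements;

        protected NodeList(Iterable<T> elements) {
            this.elements = elements;
        }

        protected abstract Node item(T element);

        List<Node> nodes() {
            List<Node> result = new ArrayList<>();
            for (T element : elements) {
                result.add(item(element));
            }
            return result;
        }
    }

    // The $ overloads build the tree; '$' is a legal Java method name.
    public static Node $(String name, String value) { return new Node(name, value); }
    public static Node $(String name, Node... children) { return new Node(name, children); }
    public static Node $(String name, NodeList<?> list) { return new Node(name, list); }

    /** Renders the root node as a JSON object: {"name": value}. */
    public static String toJson(Node root) {
        StringBuilder sb = new StringBuilder("{");
        writeMember(root, sb);
        return sb.append('}').toString();
    }

    private static void writeMember(Node node, StringBuilder sb) {
        sb.append(quote(node.name)).append(':');
        Object value = node.value;
        if (value instanceof String) {
            sb.append(quote((String) value));
        } else if (value instanceof Node[]) {
            // Child nodes become the members of a nested object.
            Node[] children = (Node[]) value;
            sb.append('{');
            for (int i = 0; i < children.length; i++) {
                if (i > 0) sb.append(',');
                writeMember(children[i], sb);
            }
            sb.append('}');
        } else {
            // A NodeList becomes an array; each item node is wrapped in its own object.
            List<Node> items = ((NodeList<?>) value).nodes();
            sb.append('[');
            for (int i = 0; i < items.size(); i++) {
                if (i > 0) sb.append(',');
                sb.append('{');
                writeMember(items.get(i), sb);
                sb.append('}');
            }
            sb.append(']');
        }
    }

    /** Quotes a string, escaping quotes, backslashes and control characters. */
    private static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            if (c == '"' || c == '\\') {
                sb.append('\\').append(c);
            } else if (c < 0x20) {
                sb.append(String.format("\\u%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    public static void main(String[] args) {
        List<String> fruits = Arrays.asList("Apple", "Orange");
        String json = toJson(
            $("items",
                new NodeList<String>(fruits) {
                    @Override
                    protected Node item(String fruit) {
                        return $("description", fruit);
                    }
                }));
        System.out.println(json);
        // prints {"items":[{"description":"Apple"},{"description":"Orange"}]}
    }
}
```

Wrapping each list item in its own object is what produces the array-of-objects shape of the generated JSON. Non-ASCII characters are passed through unescaped, so the resulting string still has to be written out as UTF-8.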

To reuse generation code, one can create semantic methods such as an items() method:

  public static Node items(Node... nodes) {
    return $("items", nodes);
  }

The nice thing is that, with a different rendering mechanism, the same tree of nodes can also be rendered to XML with a toXml() method, if XML works better for some REST calls. Next I want to add support for XStream and Jettison to mix in serialization, e.g. $("employee", employeePOJO), and to experiment with making the code even nicer and shorter.
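Because the builder only produces a tree, a second renderer can walk the same structure. Here is a minimal sketch of a toXml()-style renderer, assuming a simplified node that holds either text or child nodes; XmlRenderSketch and its members are hypothetical, and attributes, namespaces and the XStream/Jettison integration are left out.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class XmlRenderSketch {

    /** Simplified, hypothetical node: a name with either text or child nodes. */
    public static final class Node {
        final String name;
        final String text;          // null when the node has children
        final List<Node> children;

        Node(String name, String text, List<Node> children) {
            this.name = name;
            this.text = text;
            this.children = children;
        }
    }

    public static Node $(String name, String text) {
        return new Node(name, text, Collections.<Node>emptyList());
    }

    public static Node $(String name, Node... children) {
        return new Node(name, null, Arrays.asList(children));
    }

    /** Renders the tree as nested XML elements, escaping text content. */
    public static String toXml(Node node) {
        StringBuilder sb = new StringBuilder();
        render(node, sb);
        return sb.toString();
    }

    private static void render(Node node, StringBuilder sb) {
        sb.append('<').append(node.name).append('>');
        if (node.text != null) {
            sb.append(escape(node.text));
        } else {
            for (Node child : node.children) {
                render(child, sb);
            }
        }
        sb.append("</").append(node.name).append('>');
    }

    private static String escape(String s) {
        // '&' is replaced first so the entities added afterwards are not escaped again.
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        Node tree = $("items", $("description", "Apple"), $("description", "Orange"));
        System.out.println(toXml(tree));
        // prints <items><description>Apple</description><description>Orange</description></items>
    }
}
```

Repeated child elements with the same name are natural in XML, whereas in JSON they would produce duplicate keys, so a JSON renderer has to turn such repetitions into an array.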

I also wonder how to get rid of the toJson() call and use a Jersey writer instead. Any ideas?

Thanks for listening.

Soviet Union, Berlin and Verdi

Day 10 of the strike. I'm on day 5 of my vacation, but I can't get anywhere because nothing is running. But I have the will to hold out! "You peoples of the world, look upon this city!" What the Soviet Union, which wanted to bring the Berliners to their knees and take them hostage, failed to achieve with its blockade, Verdi won't achieve with its strike either!