Deprecating XML

Volume 13, Issue 51; 17 Nov 2010

The X in AJAX notwithstanding, XML is not the darling of web API designers.

Someone asked me recently what I thought about XML being removed from the Twitter streaming API. Around the same time, I heard that Foursquare are also moving to a JSON-only API.

As an unrepentant XML fan, here's the full extent of my reaction:


If all you want to pass around are atomic values, or lists or hashes of atomic values, JSON has many of the advantages of XML: it's straightforwardly usable over the Internet, it supports a wide variety of applications, it's easy to write programs to process it, it has few optional features, it's human-legible and reasonably clear, its design is formal and concise, its documents are easy to create, and it uses Unicode.

If you're writing JavaScript in a web browser, JSON is a natural fit. The XML APIs in the browser are comparatively clumsy, and the natural mapping from JavaScript objects to JSON eliminates the serialization issues that arise if you're careless with XML.

One line of argument for JSON over XML is simplicity. If you mean it's simpler to have a single data interchange format instead of two, that's incontrovertibly the case. If you mean JSON is intrinsically simpler than XML, well, I'm not sure that's so obvious. For bundles of atomic values, it's a little simpler. And the JavaScript APIs are definitely simpler. But I've seen attempts to represent mixed content in JSON and simple they aren't.
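To make that concrete, here's a sketch comparing a sentence with inline emphasis in XML and in a JsonML-style JSON encoding, one of several that have been attempted:

```python
import json
import xml.etree.ElementTree as ET

# A sentence with inline markup is trivial in XML:
xml_doc = '<p>JSON is <em>almost</em> always simpler.</p>'
para = ET.fromstring(xml_doc)

# A common attempt at the same thing in JSON (a JsonML-style
# [tag, ...children] encoding) is already noticeably busier:
json_doc = ["p", "JSON is ", ["em", "almost"], " always simpler."]

print("".join(para.itertext()))  # the flattened text of the paragraph
print(json.dumps(json_doc))
```

The JSON version round-trips faithfully, but every consumer has to agree on the encoding convention; in XML the mixed content is simply the document.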

In short, if all you need are bundles of atomic values and especially if you expect to exchange data with JavaScript, JSON is the obvious choice. I don't lose any sleep over that.

XML wasn't designed to solve the problem of transmitting structured bundles of atomic values. XML was designed to solve the problem of unstructured data. In a word or two: mixed content.

XML deals remarkably well with the full richness of unstructured data. I'm not worried about the future of XML at all even if its death is gleefully celebrated by a cadre of web API designers.

And I can't resist tucking an “I told you so” token away in my desk. I look forward to seeing what the JSON folks do when they are asked to develop richer APIs. When they want to exchange less well structured data, will they shoehorn it into JSON? I see occasional mentions of a schema language for JSON; will other languages follow?

I predict there will come a day when someone wants to federate JSON data across several application domains. I wonder, when they discover that the key “width” means different things to different constituencies, will they invent namespaces too?
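A sketch of the collision, and of how XML namespaces resolve it (the vocabulary URIs here are invented for illustration):

```python
import xml.etree.ElementTree as ET

# Two hypothetical vocabularies both define "width"; namespaces keep
# them distinct when documents from the two domains are federated.
doc = """<item xmlns:css="http://example.com/css"
              xmlns:geo="http://example.com/geo">
  <css:width>12em</css:width>
  <geo:width>140km</geo:width>
</item>"""

root = ET.fromstring(doc)
# Clark notation ({uri}local) addresses each "width" unambiguously:
css_width = root.find("{http://example.com/css}width").text
geo_width = root.find("{http://example.com/geo}width").text
print(css_width, geo_width)  # 12em 140km
```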

In the meantime, I'll continue to model the full and rich complexity of data that crosses my path with XML, and bring a broad arsenal of powerful tools to bear when I need to process it, easily and efficiently extracting value from all of its richness. I'll send JSON to the browser when it's convenient and I'll map the output of JSON web APIs into XML when it's convenient.

JSON vs. XML? Meh.


I also don't understand why more and more REST APIs are providing support just for JSON payloads. There are many more tools that can directly do something with XML than with JSON.

It would be nice to define some uniform mapping from JSON->XML and update XML parsers to automatically convert a JSON structure into an infoset on the fly during parsing.
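A rough sketch of what one such mapping might look like. This is deliberately naive: a real mapping would need rules for keys that aren't legal XML names, for distinguishing numbers from strings, and so on, and the element names used here are invented.

```python
import json
import xml.etree.ElementTree as ET

def json_to_element(value, name="value"):
    """Naively map a parsed JSON value onto an Element tree."""
    elem = ET.Element(name)
    if isinstance(value, dict):
        for key, child in value.items():
            elem.append(json_to_element(child, key))
    elif isinstance(value, list):
        for child in value:
            elem.append(json_to_element(child, "item"))
    else:
        elem.text = "null" if value is None else str(value)
    return elem

data = json.loads('{"user": {"name": "jirka", "ids": [1, 2]}}')
xml_out = ET.tostring(json_to_element(data, "root"), encoding="unicode")
print(xml_out)
```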

For XML->JSON such mapping would be of course much more difficult.

—Posted by Jirka Kosek on 17 Nov 2010 @ 08:13 UTC #

I absolutely agree. XML is a markup technology. JSON is a structured data exchange format. Those are two different domains, and for a good reason: if you try to use one on the other, it will cause pain and suffering. Interestingly, XML is much easier to adapt to structured data exchange than JSON to markup/mixed content. Still, structured data is more painful in XML than it should be.

—Posted by Martin Probst on 17 Nov 2010 @ 08:41 UTC #

One of the compelling reasons to use JSON instead of XML in current web applications is the security restrictions imposed by modern browsers; JSON can actually be retrieved from remote websites without too much trouble (using jsonp) while XML requires one to jump through a number of hoops (such as a local proxy). Go figure!
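For anyone who hasn't seen jsonp: the response is just the JSON payload wrapped in a caller-supplied callback name, which is what lets a `<script>` tag execute it cross-origin. A rough server-side sketch, with invented names:

```python
import json

def jsonp_response(callback, data):
    """Wrap a JSON payload in the caller-named function so that a
    <script src=...> tag can execute it cross-origin."""
    return f"{callback}({json.dumps(data)});"

# The page adds <script src="https://api.example.com/checkins?callback=handle">
# and the server replies with executable JavaScript:
print(jsonp_response("handle", {"venue": "cafe", "count": 3}))
```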

—Posted by Jakob Fix on 17 Nov 2010 @ 10:40 UTC #

I also believe that the general all around awesomeness of XPath is under-appreciated by the JSON partisans.
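For example (Python's ElementTree supports only a small subset of XPath, but even that subset makes the point):

```python
import xml.etree.ElementTree as ET

doc = """<feed>
  <entry lang="en"><title>Deprecating XML</title></entry>
  <entry lang="en"><title>Meh</title></entry>
  <entry lang="fr"><title>Bof</title></entry>
</feed>"""

feed = ET.fromstring(doc)
# One path expression pulls every English title, however deeply nested;
# the equivalent JSON traversal is a hand-written loop every time.
titles = [t.text for t in feed.findall(".//entry[@lang='en']/title")]
print(titles)
```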

—Posted by stand on 17 Nov 2010 @ 11:19 UTC #

See also: very good additional commentary by Noah Mendelsohn.

—Posted by Norman Walsh on 17 Nov 2010 @ 11:32 UTC #

JSON vs. XML. Meh: agreed. I love JSON for the ease of reading and handling simple data.

—Posted by I Love JSON on 18 Nov 2010 @ 12:57 UTC #

You may want to check out JSON for Linked Data. It solves the problem of 'namespaces in JSON'; see the JSON-LD website.

—Posted by Manu Sporny on 18 Nov 2010 @ 01:04 UTC #

Totally agree. There isn't one ring to rule them all. People just need to know when it's appropriate to use what. I love XML for a lot of stuff, but for client-side rendering, I'm JSON all the way. Just gotta know when.

—Posted by Troy on 18 Nov 2010 @ 02:06 UTC #

Hmm. You can have semi-structured data in JSON. Show us a code sample where you can do something in XML and not JSON. IRT Noah's commentary, I don't see how you "can't" use it to carry resume data.

—Posted by Jader on 18 Nov 2010 @ 03:25 UTC #

Jakob, so you want to sell JSON (jsonp aka json-in-script) based on its insecurity?

Thanks, but I want to load external data into my app, not execute foreign untrusted code with credentials of my app.

—Posted by Jirka Kosek on 18 Nov 2010 @ 08:39 UTC #

Regarding mixed content, one can always embed (X)HTML in JSON. I don't think I've ever needed mixed content that wasn't meant for display.

—Posted by Joe Hildebrand on 18 Nov 2010 @ 12:55 UTC #

Totally agree that XML and JSON each have their place, and neither is the killer of the other.

I think you also hit on something that I've constantly run into over the years: despite JSON being easier to work with in JavaScript, JSON doesn't do one thing that XML does great -> mixed content.

I created JsonML specifically to be able to represent mixed content in a native JSON structure, despite what others have characterized my motivations to be. Unfortunately as you allude to, this doesn't make it simple! I've always considered it a reversible encoding rather than a format intended for humans. So for my purposes it works great, but XML-replacement it is not and never was intended to be.
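For the curious, a hand-written sketch of the shape JsonML takes (encoded by hand here, not produced by any JsonML tooling):

```python
import json

# The XML fragment <p class="intro">See <a href="/x">this</a> too.</p>
# becomes, in JsonML, an array of [tag, attributes?, ...children]:
fragment = ["p", {"class": "intro"},
            "See ", ["a", {"href": "/x"}, "this"], " too."]

# Reversible, yes -- but not something you'd want to read or author
# by hand compared with the markup it encodes.
print(json.dumps(fragment))
```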

I think services like Twitter are choosing JSON-only not because they think XML is going away but because it takes time and money to maintain the quality of both. They are simply putting their efforts where the most clients are going.

—Posted by Stephen McKamey on 18 Nov 2010 @ 03:06 UTC #

Here is a like-minded article by Daniel Lemire: You probably misunderstand XML.

—Posted by John on 18 Nov 2010 @ 04:30 UTC #

I should also point out that this is an old debate (no surprises there, I guess). David Megginson had a pretty good article back in 2007 comparing the verbosity of the formats that I think addresses Jader's challenge. In short, the complexity of various formats converges as you add things like mixed content and namespacing.

I had never seen JSON-LD before but it strikes me at first glance as being a lot like xml namespaces (with all good and bad connotations implied).

—Posted by stand on 18 Nov 2010 @ 06:47 UTC #

Once they make an XPath equivalent for JSON, then we can talk about deprecating XML :) Until then, XML is king.

(I imagine that the problem with XPath, however, is that not that many people know about it. I would also wager that a lot of those who don't know about it are the kind of people who find XML excessively bloated and complex, thus seeing JSON as not just an acceptable substitute, but in fact an improvement.)

—Posted by Gareth Potter on 18 Nov 2010 @ 08:45 UTC #

Without any bragging going on here, I have been actively developing apps since the mainframe days (yes, apps on mainframes! sort of..!), and the format war of JSON vs XML is in some ways like VHS vs Betamax; not exactly, for sure, but close. The points in this blog are very good ones. I love JSON for its JavaScript simplicity, but XML is definitely a standard across mature infrastructures as well as a lot of the web services that I see. As always, it is a case of using the correct tool for the job at hand.

—Posted by David Sheardown on 18 Nov 2010 @ 09:49 UTC #

JSON --> Data
XML --> Metadata

—Posted by Ace on 19 Nov 2010 @ 08:57 UTC #

But Google's new APIs still give space to XML, in the form of Atom. The other format offered is JSON. So JSON is reaching new heights.

—Posted by Satya Prakash on 19 Nov 2010 @ 03:42 UTC #

Something worth reading for those of you coming from a more data-centric angle: "Mixed Content Myopia"

Quote: "...I have been involved in debates about XML processing techniques that seemed to be going around in circles. More often than not, the disagreement stemmed from a different conceptual model of XML processing and, more often than not, that difference revolved around the important concept of mixed content in XML. If one party to a debate sees it in their mind-map of XML and the other does not, communication problems are likely to ensue..."

—Posted by Derek Read on 20 Nov 2010 @ 12:24 UTC #

Jader wrote: IRT Noah's commentary, I don't see how you "can't" use it [I.e. JSON] to carry resume data.

Yes, of course you can. What JSON doesn't give you is a standard way to have markup inside of text, something that could be valuable in a resume for several reasons: first of all, such markup is often used for highlighting, emphasis, and format control. Sometimes it's used to mark up semantics inline with the text.

Can you do this in ad-hoc ways in JSON? Sure, but none of the supporting libraries, databases, or interfaces you use will understand that it's happening. With XML, such marked up text is a first class construct. The libraries, languages and binding tools that handle XML properly provide standardized facilities for querying, creating, and manipulating such markup. With JSON, you just can't have a property inside a run of text; it's not made for that.

—Posted by Noah Mendelsohn on 22 Nov 2010 @ 03:19 UTC #

I encourage you to read James Clark's perspective as well.

With respect to my statement that "XML wasn't designed to solve the problem of transmitting structured bundles of atomic values", James is right. The situation is more nuanced than that.

—Posted by Norman Walsh on 24 Nov 2010 @ 12:08 UTC #

Hey. I don't understand what property of namespaces is not provided by ordinary JSON objects. If you want separate namespaces, then put things into separate objects. Have I misunderstood? Thanks.

—Posted by Jonathan Hartley on 24 Nov 2010 @ 06:17 UTC #

JSON is better than XML in all respects but one: there is no schema. Unfortunately, XML schemas have not fulfilled their promise of letting programs understand the meaning of the data, leaving aside the faults of their design and implementation.

XML is a perfect example of a very fine strategy: promise that which is impossible and milk for money.

Sorry XML, we need a fresh promise.

—Posted by Andrey on 25 Nov 2010 @ 08:03 UTC #

I never know whether to approve comments like Andrey's or not. It's utter nonsense, but I suppose it's on-topic in a broad sense.

—Posted by Norman Walsh on 25 Nov 2010 @ 01:15 UTC #

Andrey: let me google that for you: json schema. There exists at least one schema language for JSON. The fact that it isn't very good yet means you haven't contributed enough.

—Posted by Joe Hildebrand on 29 Nov 2010 @ 09:42 UTC #

FYI: also in the works is JSYNC, which extends JSON to include additional data serialization features from YAML.

As much as I see SGML/XML as vital to mixed content, I personally found the XML-is-the-database ideals of 5-10 years ago a bit much. Those ideals certainly led me to do a few projects the really hard way with XML for simple data, that now I see as being more efficiently achieved with JSON.

I think this perception gap between XML and JSON has to do with how the objects are accessible and processable after the underlying text is parsed. The standard processors of XML, like XPath, are great when you imagine your system as chains of standard processors. But if you think you just need a really specific processor for some really specific data, there's some appeal in working "directly" with data objects in the manner of JSON.

Maybe XML does offer that directness via data bindings -- I probably should play with E4X again . . .

—Posted by Jay Fienberg on 30 Nov 2010 @ 10:36 UTC #

@Joe Hildebrand I think it works fine. There's already support for it in Dojo. The nice thing is they didn't make XML's pre-schema mistake with DTDs and instead went straight for JSON as the description language. You can easily take a JSON object and validate it with a JSON schema object. The Dojo validation implementation also gives verbose output about any validation errors.

—Posted by Luis Montes on 10 Feb 2011 @ 06:39 UTC #

Can JSON be used for multi-language content, that is, non-English content? With Objective-C as the programming language, I tried the SBJSON parser, but I haven't had much luck parsing the non-English content yet. Please let me know if there is any other way to parse JSON content in Objective-C.

—Posted by Pramod Jain on 14 May 2011 @ 11:31 UTC #

Being a "fan" of XML or JSON is foolishness. A wise developer uses the right tool for the right job and does not get emotionally attached to it.

—Posted by Matt on 20 May 2011 @ 06:56 UTC #