XML Is Like Violence…

…if it hasn’t solved your problem yet, you’re not using enough. Curiously, I always thought of that little quip as a subtle indictment of the problems with XML, but it seems to almost be a rallying cry for proponents of the standard (I refuse to recognize XML as a “language”).  And when I say proponents, I actually mean “fanatics.”

The problem I have with XML, my “beef” if you will, is certainly not XML’s fault; it’s the fault of the overzealous developer who wants to use XML as a way to transmit pixel data over the network.  If you look at that post and your first thought is something like:

Well, duh, of course that’s the wrong way to do it, I’d use something like:

<signature>
	<pixel x="127" y="90" />
	<pixel x="128" y="91" />
	<!-- ... -->
</signature>

Well, you’re probably already too far gone; this post isn’t for you.

Of course, you still have to send your data somehow, and since raw bytestreams are out this fall, why not look into an alternative format? May I recommend some JSON?  Let’s take a look at a simple and realistic (ish) case of JSON -vs- XML: representing an order in a format that a person and a computer can read.

For this experiment, I will be placing order number 233 on behalf of customer number 971.  Said customer would like 2 each of item number 2751 and item number 1765, “beef brats” and “brat buns” respectively.  Said customer would also like the buns warmed but not toasted, and the brats to be of the “Beer Basted” variety, but not filled with cheese.  Here we go.

JSON (shown here as a JavaScript object literal; strict JSON would also put the keys in double quotes):

var order = {
	orderId: 233,
	customerId: 971,
	items: [
		{
			itemId: 2751,
			name: "Beef Brats",
			quantity: 2,
			options: [
				{
					name: "beerbasted",
					value: true
				},
				{
					name: "cheesefilled",
					value: false
				}
			]
		},
		{
			itemId: 1765,
			name: "Brat Buns",
			quantity: 2,
			options: [
				{
					name: "warmed",
					value: true
				},
				{
					name: "toasted",
					value: false
				}
			]
		}
	]
};

EDIT: I noticed the objects weren’t exactly semantically similar; I fixed this because fair’s fair (and it only increases the size of the JSON by about 20 bytes, whereas the equivalent XML correction might have taken triple that).

Versus the contender, XML:

<?xml version="1.0" encoding="UTF-8"?>
<order>
	<id>233</id>
	<customerId>971</customerId>
	<item>
		<id>2751</id>
		<name>beef brats</name>
		<quantity>2</quantity>
		<option>
			<name>beer basted</name>
			<value>true</value>
		</option>
		<option>
			<name>cheese filled</name>
			<value>false</value>
		</option>
	</item>
	<item>
		<id>1765</id>
		<name>brat buns</name>
		<quantity>2</quantity>
		<option>
			<name>warmed</name>
			<value>true</value>
		</option>
		<option>
			<name>toasted</name>
			<value>false</value>
		</option>
	</item>
</order>

Now, XML proponents likely will not take this one lying down. They’ll look at that document and accuse me of being the worst, most evil, vile, baby-eating slanderer ever to take a crack at their beloved XML. After all, one could just as easily use documents with more concise structure, something that doesn’t just look so gigantic, like one of these:

<?xml version="1.0" encoding="UTF-8"?>
<order id="233">
	<customer id="971" />
	<item id="2751">
		<name>beef brats</name>
		<quantity>2</quantity>
		<option name="beerbasted">true</option>
		<option name="cheesefilled">false</option>
	</item>
	<item id="1765">
		<name>brat buns</name>
		<quantity>2</quantity>
		<option name="warmed">true</option>
		<option name="toasted">false</option>
	</item>
</order>

<?xml version="1.0" encoding="UTF-8"?>
<order id="233" customerid="971">
	<item id="2751" name="beef brats" quantity="2">
		<option name="beerbasted" value="true" />
		<option name="cheesefilled" value="false" />
	</item>
	<item id="1765" name="brat buns" quantity="2">
		<option name="warmed" value="true" />
		<option name="toasted" value="false" />
	</item>
</order>

<?xml version="1.0" encoding="UTF-8"?>
<order id="233" customerid="971">
	<item id="2751" name="beef brats" quantity="2" beerbasted="true" cheesefilled="false" />
	<item id="1765" name="brat buns" quantity="2" warmed="true" toasted="false" />
</order>

Why don’t I list these as the primary contenders against JSON, even though that’s what the advocates would probably have written? Simple. Treated as representations of an object, they don’t match the order I started out with. They are close, yes, and they convey the same information, but the structure is entirely different. That makes the parsing more complex, especially when it comes to converting the XML back into an object, and you start to lose the “human readable”-ness of the document by trying to cram it into less and less space.

Furthermore, with JSON, exporting objects, arrays, or both is extremely easy to do. In PHP, for instance, it has been native since version 5.2 (json_encode and json_decode), and json.org lists plenty of easy-to-use implementations for most languages in common use.
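To make the “extremely easy” part concrete, here is a minimal sketch of the round trip in JavaScript, using only the built-in JSON.stringify and JSON.parse. The variable names wire and restored are mine, purely for illustration, and the order is the same one as above.

```javascript
// The order from earlier in the post, as a plain JavaScript object.
const order = {
  orderId: 233,
  customerId: 971,
  items: [
    {
      itemId: 2751, name: "Beef Brats", quantity: 2,
      options: [
        { name: "beerbasted", value: true },
        { name: "cheesefilled", value: false }
      ]
    },
    {
      itemId: 1765, name: "Brat Buns", quantity: 2,
      options: [
        { name: "warmed", value: true },
        { name: "toasted", value: false }
      ]
    }
  ]
};

// Serialize to a string that can go over the wire...
const wire = JSON.stringify(order);

// ...and turn it back into an object on the other side.
const restored = JSON.parse(wire);

// Numbers and booleans come back as numbers and booleans, not strings.
console.log(typeof restored.items[1].quantity);
console.log(restored.items[0].options[0].value === true);
```

One call each way, and the nesting of the result matches the nesting of the original object, so there is no mapping to design.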

To arrive at the XML I chose as a contender, I did a Google search for “array to XML PHP” and checked the top result, which was a function (for some unknown reason static-inside-a-class-as-the-only-member, maybe written by a namespace-nazi?) that seemed to do a more-or-less standardized conversion. Now, again, we could argue all day about whether that is even a worthy implementation, but before that argument starts let me squash it by saying this: which of those syntaxes is the correct one matters less than the simple point that any and all of them are technically correct.

Semantics aside, all of those XML versions of the same “object” are well-formed XML documents, and, depending on your method of “serialization” into and out of XML, they all describe the same order. And that, ladies and gentlemen, is the problem I consider worst of all. Which one of those is right? Is it the smallest, the one that’s most clear, or the one that is most loyal to the original order object?

Because the XML specification doesn’t say how an object should map onto elements and attributes, which is a good thing in cases where XML is appropriate, there isn’t a consistent way to represent these objects as XML that everyone agrees on. Some people, as in the last example, would put everything into attribute="value" pairs, while many automated methods for de/serialization would use almost no attributes, favoring nested elements instead.
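Here is a rough sketch of that disagreement in JavaScript. The helpers escapeXml, itemAsElements, and itemAsAttributes are hypothetical, written for this post and not taken from any library, and the attribute version assumes every option name happens to be a legal XML attribute name.

```javascript
// Hypothetical helper: escape the characters that are unsafe in XML text
// and in double-quoted attribute values.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Convention A: every field becomes a child element.
function itemAsElements(item) {
  const options = item.options
    .map(o => `<option><name>${escapeXml(o.name)}</name><value>${o.value}</value></option>`)
    .join("");
  return `<item><id>${item.itemId}</id><name>${escapeXml(item.name)}</name>` +
    `<quantity>${item.quantity}</quantity>${options}</item>`;
}

// Convention B: every field becomes an attribute.
// Assumption: option names are valid XML attribute names.
function itemAsAttributes(item) {
  const options = item.options
    .map(o => ` ${o.name}="${o.value}"`)
    .join("");
  return `<item id="${item.itemId}" name="${escapeXml(item.name)}" ` +
    `quantity="${item.quantity}"${options} />`;
}

const item = {
  itemId: 1765, name: "Brat Buns", quantity: 2,
  options: [
    { name: "warmed", value: true },
    { name: "toasted", value: false }
  ]
};

console.log(itemAsElements(item));
console.log(itemAsAttributes(item));
```

Same object, two well-formed results, and whoever reads one of them back has to know in advance which convention the writer picked.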

Before this gets too far away from me, let me draw it to a close. Because of the standard’s, if you’ll pardon the sub-pun, lack of standardization, I propose we stop, for the love of all that is good and pure, using XML to represent every single object and data-fragment we come across. It’s time to let go. There are just too many ways to do it, and we’re running out of patience…

One Response to “XML Is Like Violence…”

  1. Ed Willard Says:

    Bravo!