Posted Jan 20, 2004 in XML.
Prompted in part by a post from Mark Pilgrim, there has been much discussion about how to handle XML that is not well-formed. To follow the discussion, it is necessary to understand what well-formedness actually is. To be well-formed, an XML document must follow these rules:
- It must have only one root element
- All open elements must be closed. Empty elements may use an empty element tag
- All attribute values must be quoted
- Elements may be nested, but must not overlap
- Element and attribute names are case-sensitive, so a start tag and its end tag must match in case
- The document must be correctly encoded and must not contain illegal characters
That is all there is to it. We aren't talking about rocket science here. This is a very simple set of rules that anyone (and any machine) can easily follow. I think authoring XML is easy. I like the fact that it fails catastrophically if it doesn't meet the rules of well-formedness, because that mimics the syntax errors that programmers are used to. That helps with debugging.
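To make the rules concrete, here is a minimal sketch that feeds one broken sample per rule to a strict parser and reports what it says. It assumes Python's standard `xml.etree.ElementTree`; the helper name `well_formedness_error` and the sample documents are made up for illustration.

```python
import xml.etree.ElementTree as ET


def well_formedness_error(text):
    """Return None if text parses as XML, otherwise the parser's error message."""
    try:
        ET.fromstring(text)
    except ET.ParseError as exc:
        return str(exc)
    return None


# One hypothetical sample per rule in the list above, plus a good document.
SAMPLES = {
    "well-formed": '<feed><entry id="1"/></feed>',
    "two root elements": "<a/><b/>",
    "unclosed element": "<a><b></a>",
    "unquoted attribute": "<a x=1/>",
    "overlapping elements": "<a><b></a></b>",
    "mismatched case": "<a></A>",
    "illegal character": "<a>\x01</a>",
}

for label, text in SAMPLES.items():
    # The parser stops at the first error, much like a compiler's syntax error.
    print(f"{label}: {well_formedness_error(text) or 'ok'}")
```

Every broken sample is rejected outright and the parser reports a line and column, which is the "syntax error" behaviour I like as a programmer.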
But what of the end user, who doesn't give a shit about well-formedness? They must be considered above all others, and this means creating clients that can handle broken XML. I have to sit on the fence with this one, because the programmer side of me disagrees with the user side of me.
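As a rough illustration of what a more forgiving client might do, the sketch below tries a strict parse first and, only if that fails, applies a single repair before trying again. The repair chosen here (escaping bare ampersands) is just one heuristic picked for the example, and the names `parse_leniently` and `BARE_AMPERSAND` are hypothetical. A real client would need to decide for itself which repairs are safe.

```python
import re
import xml.etree.ElementTree as ET

# Matches an ampersand that does not begin an entity or character reference.
BARE_AMPERSAND = re.compile(
    r"&(?!(?:[A-Za-z_][\w.-]*|#[0-9]+|#x[0-9A-Fa-f]+);)"
)


def parse_leniently(text):
    """Return (root, repaired); repaired is True if a fix-up was needed.

    Raises ET.ParseError if the document is still broken after the repair.
    """
    try:
        return ET.fromstring(text), False
    except ET.ParseError:
        pass  # fall through to the repair attempt
    repaired_text = BARE_AMPERSAND.sub("&amp;", text)
    return ET.fromstring(repaired_text), True


root, repaired = parse_leniently("<title>Tom & Jerry</title>")
print(root.text, repaired)
```

The user sees the title instead of an error, and the `repaired` flag lets the client log or flag the problem for whoever produced the document. Anything the heuristic cannot fix, such as overlapping elements, still fails as before.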