Fussy XML

Posted Jan 20, 2004 in XML.

Prompted in part by a post from Mark Pilgrim, there has been much discussion about how to handle XML that is not well-formed. To follow the discussion, you need to understand what well-formedness actually is. A well-formed XML document must follow these rules:

  1. It must have exactly one root element.
  2. All open elements must be closed. Empty elements may use an empty-element tag.
  3. All attribute values must be quoted.
  4. Elements may be nested, but must not overlap.
  5. Element and attribute names are case-sensitive, so opening and closing tags must match exactly.
  6. It must use a proper character encoding and contain no illegal characters.

That is all there is to it. We aren't talking about rocket science here: it is a very simple set of rules that anyone (and any machine) can easily follow. I think authoring XML is easy. I like the fact that it fails catastrophically if it doesn't meet the rules of well-formedness, because that mimics the syntax errors that programmers are used to. It helps with debugging.
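
To see that "syntax error" behaviour in action, here is a minimal sketch using Python's standard xml.etree.ElementTree parser. The sample documents and the check_well_formed helper are my own illustrations (one well-formed document, then one violation for each rule above), not taken from any real feed.

```python
import xml.etree.ElementTree as ET

# Hypothetical samples: one well-formed document, then one violation per rule.
SAMPLES = {
    "well-formed": "<feed><entry id='1'>ok</entry><br/></feed>",
    "two root elements": "<a/><b/>",
    "unclosed element": "<a><b></b>",
    "unquoted attribute": "<a id=1/>",
    "overlapping elements": "<a><b></a></b>",
    "case mismatch": "<a></A>",
    "illegal character": "<a>\x01</a>",
}


def check_well_formed(text):
    """Return None if the text parses, otherwise the parser's error message."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        return str(err)
    return None


results = {name: check_well_formed(text) for name, text in SAMPLES.items()}
for name, error in results.items():
    print(f"{name}: {error or 'OK'}")
```

Only the first sample should parse. Every other sample raises ParseError with a line and column number, which is exactly the kind of feedback a programmer expects from a compiler.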

But what of the end user, who doesn't give a shit about well-formedness? They must be considered above all others, and this means creating clients that can handle broken XML. I have to sit on the fence with this one, because the programmer side of me disagrees with the user side of me.
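
As a rough illustration of what a forgiving client might look like, the sketch below tries a strict parse first and only falls back to a lenient salvage when that fails. It is an assumption on my part, not how any particular aggregator works: read_titles and TitleSalvager are hypothetical names, and the fallback borrows Python's tolerant html.parser module, which applies HTML rules (it lowercases tag names, for instance) rather than XML ones.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class TitleSalvager(HTMLParser):
    """Collect the text inside <title> tags without requiring well-formedness."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.titles = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True
            self.titles.append("")

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.titles[-1] += data


def read_titles(feed_text):
    """Return (titles, was_well_formed) for a feed given as a string."""
    try:
        # The programmer's preference: a strict parse that rejects any error.
        root = ET.fromstring(feed_text)
        return [t.text or "" for t in root.iter("title")], True
    except ET.ParseError:
        # The user's preference: salvage what we can from the broken feed.
        salvager = TitleSalvager()
        salvager.feed(feed_text)
        salvager.close()
        return salvager.titles, False


# A hypothetical feed with an unclosed <item> element.
broken = "<rss><item><title>Fussy XML</title><item></rss>"
print(read_titles(broken))
```

Returning the was_well_formed flag lets the client show the content to the user while still telling the feed's author that something is wrong, which is about as close to sitting on the fence as code can get.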


  1.

    If you write solid software and you check your online pages with a browser that can handle XML, you have nothing to worry about. And that is just something you need to do :-)

    The only problem I have is me: I haven't been able to make comment validation work, but (fortunately) I have enough time to fix the small errors people make.

    FYI: your permalinks in your RSS feed are broken again.

    Posted by Anne on Jan 21, 2004.

  2.

    You're right. I have "entry" instead of "entries". I'll fix it when I get home later. Thanks, Anne.

    Posted by Simon Jessey on Jan 21, 2004.