Revised MIME type article

Posted Aug 24, 2004 in PHP, Technology, XHTML.

It was brought to my attention that the regular expressions used in the article on serving up XHTML with the correct MIME type were flawed. Apparently they were reading and comparing the q weighting incorrectly, because they only looked for numbers between 0.1 and 0.9. My hopeless knowledge of regular expressions didn't catch it the first time around.

With the help of Kornel Lesinski, I was able to refine the expressions. They can now cope with any value from exactly zero to exactly one, with up to three decimal places, as described in the quality values section of the HTTP 1.1 specification.
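The rule the refined expressions have to enforce is small enough to write out. The sketch below uses Python's `re` module rather than the PHP of the article's script, and every name in it is illustrative. `OLD_Q` is only a guess at a pattern with the 0.1 to 0.9 flaw, not the original expression; `QVALUE` transcribes the qvalue grammar from RFC 2616, section 3.9. The same patterns should carry over to PHP's `preg_match` once `^` and `$` anchors are added.

```python
import re

# Hypothetical pattern with the flaw described above: only 0.1 to 0.9 match.
OLD_Q = re.compile(r"0\.[1-9]")

# qvalue grammar from the HTTP/1.1 spec (RFC 2616, section 3.9):
#   qvalue = ( "0" [ "." 0*3DIGIT ] ) | ( "1" [ "." 0*3("0") ] )
QVALUE = re.compile(r"0(?:\.\d{0,3})?|1(?:\.0{0,3})?")


def old_accepts(text):
    """True if the hypothetical old-style pattern matches the whole string."""
    return OLD_Q.fullmatch(text) is not None


def is_valid_qvalue(text):
    """True for legal qvalues such as "0", "0.5", "0.955", "1" and "1.000"."""
    return QVALUE.fullmatch(text) is not None
```

The old-style pattern never recognizes q=0 or q=1, while the strict one accepts exactly what the grammar allows. Note that the grammar requires the leading digit, so .5 is not a legal qvalue, and 1 may only be followed by zeros, so 1.5 and 100 are rejected.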

I found the exercise particularly useful, because it helped me to better understand how regular expressions work. Cool.


  1.

    I suggest you include some kind of revision history in the document (an appendix, perhaps), so people can see what was updated and when.

    Posted by Anne on Aug 24, 2004.

  2.

    Not a bad idea, but hard to do retrospectively. I will try to do better with that sort of thing in the future.

    Posted by Simon Jessey on Aug 24, 2004.

  3.

    Even after fixing the q=q= error in the HTML q test, this still fails for me if I set my Accept header to:


    I get served XHTML.

    So far I've rewritten my q test as follows, and it seems to work, but I haven't finished testing it. I know it would let illegal values like q=100 through, so I'll have to finish hacking at it when I'm not at work. (wink)


    For now, I check that matches[1] <= 1 before assigning it to html_q or xhtml_q, while I think about the problem of illegal values getting past the regex.

    Posted by Bill Mason on Aug 24, 2004.

  4.

    Amazing, isn't it? You'd think that something like this would be simple and straightforward, but so far a complete solution has eluded half a dozen great web minds!

    Posted by Simon Jessey on Aug 24, 2004.

  5.

    Yeah, it's novel. (smile)

    I went around with it for a while and could have kept at it, but I finally decided to go back to my original test, which looked exclusively for 0.x, and just modify it for finding

    It may not be true to the letter of the spec, but I figure:

    * Section 10.4.7 of the spec (406 Not Acceptable) notes that servers can still serve content to you even if your Accept header specifies q=0 for that type of content. And indeed, if you set text/html;q=0 in Mozilla, it doesn't suddenly start rejecting the entire internet. (wink) So I'm not going to spend time looking for the mythical browser with q=0 in its Accept header.

    * If there's a browser out there asking for application/xhtml+xml;q=1, God bless it. I'm not worrying about finding it. And under my old syntax, that browser would get XHTML anyway, since the script treats an application/xhtml+xml entry whose q value it doesn't recognize the same as one with no q value at all.

    * I'm defaulting to text/html unless proven otherwise anyway, which covers the mythical text/html;q=1 browser that my current script doesn't look for. And I don't care about the mythical browser sending application/xhtml+xml;q=0.9,text/html;q=1, which my script would serve XHTML to.

    The other quirk I'm ignoring is that, on my systems anyway, the third decimal place is lost in the calculation process. So application/xhtml+xml;q=.955,text/html;q=.959 shows up as a tie at .95 for both. I can live with that.

    Posted by Bill Mason on Aug 24, 2004.

  6.

    Are you suggesting, therefore, that I return the regular expressions to their previous state? What about the addition of (float)?

    Posted by Simon Jessey on Aug 25, 2004.

  7.

    I'm not suggesting anything. This is your sandbox!

    I basically went the way I did because I don't expect to see a browser using q=0 or q=1 anytime soon, and because I don't really have the free time right now to hack at the script in depth.

    On my systems, at least, (float) didn't change anything regarding my calculation issues.

    Posted by Bill Mason on Aug 25, 2004.

  8.

    I am not sure what to do now. I tried my hand at writing the regex myself, but I appear to have been unsuccessful. Can anyone figure this out for me?

    Posted by Simon Jessey on Aug 25, 2004.
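The pieces discussed in the comments can be fitted together roughly as follows. This is one possible approach written in Python, not the article's PHP script. It checks each q value against the strict RFC 2616 grammar, treats a listed type with no q parameter as q=1, drops an entry whose q is illegal (such as q=100), and compares weights as whole thousandths so that q=0.955 and q=0.959 stay distinct. The function names, the decision to ignore wildcard ranges such as */*, and the serving policy (XHTML only when it is listed with q > 0 and rated at least as highly as text/html) are all assumptions. Because the grammar requires a leading digit, a value written as .955 is rejected here; making the leading 0 optional would accept it.

```python
import re

# qvalue grammar from RFC 2616, section 3.9:
#   ( "0" [ "." 0*3DIGIT ] ) | ( "1" [ "." 0*3("0") ] )
# Group 1 captures the fractional digits of a value starting with 0.
QVALUE = re.compile(r"0(?:\.(\d{0,3}))?|1(?:\.0{0,3})?")


def qvalue_thousandths(text):
    """Return a legal qvalue as whole thousandths ("0.955" -> 955), else None."""
    text = text.strip()
    match = QVALUE.fullmatch(text)
    if match is None:
        return None  # illegal, e.g. "100", "1.5", "0.1234", ".5"
    if text.startswith("1"):
        return 1000
    # Pad the fraction to three digits: "5" -> "500", "" -> "000".
    return int((match.group(1) or "").ljust(3, "0"))


def media_range_q(accept, media_type):
    """Return media_type's q in thousandths, 1000 if it has no q, or None.

    None means the type is not listed, or is listed with an illegal q,
    which this sketch treats as not listed at all. Wildcard ranges such
    as */* and text/* are not matched.
    """
    for media_range in accept.split(","):
        name, *params = media_range.split(";")
        if name.strip().lower() != media_type:
            continue
        for param in params:
            key, _, value = param.partition("=")
            if key.strip().lower() == "q":
                return qvalue_thousandths(value)
        return 1000  # no q parameter: the spec's default weight is 1
    return None


def choose_mime_type(accept):
    """Return the MIME type to serve, defaulting to text/html.

    Assumed policy: serve application/xhtml+xml only if it is listed with
    q > 0 and rated at least as highly as text/html.
    """
    xhtml_q = media_range_q(accept, "application/xhtml+xml")
    html_q = media_range_q(accept, "text/html")
    if xhtml_q and xhtml_q >= (html_q or 0):
        return "application/xhtml+xml"
    return "text/html"
```

For example, `choose_mime_type("application/xhtml+xml,text/html;q=0.9")` returns `"application/xhtml+xml"`, while `choose_mime_type("text/html,application/xhtml+xml;q=0.9")` returns `"text/html"`.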