REALLY simple xml reader

Stefan Behnel stefan_ml at behnel.de
Thu Jan 31 13:05:25 EST 2008


Stefan Behnel wrote:
> Steven D'Aprano wrote:
>> On Fri, 01 Feb 2008 00:40:01 +1100, Ben Finney wrote:
>>
>>> Quite apart from a human thinking it's pretty or not pretty, it's *not
>>> valid XML* if the XML declaration isn't immediately at the start of the
>>> document <URL:http://www.w3.org/TR/xml/#sec-prolog-dtd>. Many XML
>>> parsers will (correctly) reject such a document.
>> You know, I'd really like to know what the designers were thinking when 
>> they made this decision.
> [had a good laugh here]
>> This is legal XML:
>>
>> """<?xml version="1.0"?>
>> <greeting>Hello, world!</greeting>"""
>>
>> and so is this:
>>
>> """
>>      <greeting       >Hello, world!</greeting    >"""
>>
>>
>> but not this:
>>
>> """ <?xml version="1.0"?>
>> <greeting>Hello, world!</greeting>"""
> 
> It's actually not that stupid. When you leave out the declaration, then the
> XML is UTF-8 encoded (by spec), so normal ASCII whitespace doesn't matter.

Sorry, drop the "ASCII" there. From the XML spec's point of view, your example

"""
     <greeting       >Hello, world!</greeting    >"""

is exactly equivalent to

"""<?xml version='1.0' encoding='utf-8'?>
     <greeting       >Hello, world!</greeting    >"""

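and whitespace between the declaration and the root element is allowed. It's
just not allowed *before* the declaration. In your example the declaration was
left out, which implies the default declaration, so the leading whitespace
sits in front of the root element, not in front of a declaration.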
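
If you want to check this yourself, here is a minimal sketch using the
standard library's xml.etree.ElementTree (which sits on top of expat). The
helper name is_well_formed and the variable names are made up for
illustration. The inputs are bytes so that the parser does its own encoding
detection instead of being handed already-decoded text.

```python
import xml.etree.ElementTree as ET


def is_well_formed(data):
    """Return True if the parser accepts data as well-formed XML."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False


# No declaration at all: leading whitespace only precedes the root element.
no_declaration = b"\n     <greeting       >Hello, world!</greeting    >"

# Explicit declaration at the very start, whitespace after it.
declaration_first = (
    b"<?xml version='1.0' encoding='utf-8'?>\n"
    b"     <greeting       >Hello, world!</greeting    >"
)

# A single space *before* the declaration.
space_before_declaration = (
    b' <?xml version="1.0"?>\n'
    b"<greeting>Hello, world!</greeting>"
)

for name, data in [
    ("no declaration", no_declaration),
    ("declaration first", declaration_first),
    ("space before declaration", space_before_declaration),
]:
    print(name, "->", is_well_formed(data))
```

The first two inputs should be accepted and the third rejected, which matches
the rule above. Other parsers may word the error differently, but a
conforming one should reject the third input as well.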

Stefan


