[XML-SIG] Corrected list of packages handling XML 1.1

Ken Beesley ken.beesley at xrce.xerox.com
Fri Sep 2 17:54:56 CEST 2005


Uche Ogbuji wrote

>------------------------------
>
>Message: 2
>Date: Thu, 01 Sep 2005 11:59:09 -0600
>From: Uche Ogbuji <Uche.Ogbuji at fourthought.com>
>Subject: Re: [XML-SIG] Corrected list of packages handling XML 1.1
>To: Walter Dörwald <walter at livinglogic.de>
>Cc: xml-sig at python.org, Ken Beesley <ken.beesley at xrce.xerox.com>
>Message-ID: <1125597549.14255.347.camel at borgia>
>Content-Type: text/plain; charset=ISO-8859-15
>
>On Thu, 2005-09-01 at 12:50 +0200, Walter Dörwald wrote:
>  
>
>>Ken Beesley wrote:
>>
>>    
>>
>>>My apologies to Fredrik Lundh of Pythonware for the omission of
>>>ElementTree+sgmlop in my recent listing of Python-XML packages that
>>>handle XML 1.1. The list (that I'm aware of) currently includes:
>>>
>>>1. pxdom by Andrew Clover
>>>   (http://www.doxdesk.com/software/py/pxdom.html,
>>>   http://www.doxdesk.com/file/software/py/pxdom.py)
>>>2. pyLTXML from the Univ. of Edinburgh
>>>   (http://www.ltg.ed.ac.uk/software/xml,
>>>   http://www.ltg.ed.ac.uk/software/gpl_xml.html,
>>>   http://www.ltg.ed.ac.uk/software/xml/xmldoc/xmldoc.html)
>>>3. elementtree library from Pythonware
>>>   (http://effbot.org/zone/element.htm,
>>>   http://effbot.org/zone/element-index.htm)
>>>
>>>If I've forgotten anyone, please help me complete the list.
>>>      
>>>
>> > [...]
>>
>>XIST (http://www.livinglogic.de/Python/xist) handles XML 1.1 charrefs 
>>when a parser is used that does it. (XIST uses sgmlop by default, so it 
>>works by default). When serializing XML those charrefs are always 
>>supported. See the following snippet:
>>
>> >>> from ll.xist import parsers, presenters
>> >>> from ll.xist.ns import html
>> >>> e = parsers.parseString("<body>this is a backspace: &#x0008;</body>")
>> >>> print e.asrepr(presenters.CodePresenter())
>>ll.xist.xsc.Frag(
>>    ll.xist.ns.html.body(
>>       'this is a backspace: \x08'
>>    )
>>)
>> >>> print e.asBytes()
>><body>this is a backspace: &#8;</body>
>>    
>>
>
>This conversation is really becoming surreal.  People, please, it's very
>simple: supporting the range of character references defined in XML 1.1
>is not, repeat *NOT*, the same thing as being an XML 1.1 parser.
>
>If I have software that parses "<a>b</a>" that does not mean I have an
>XML 1.0 parser.  If that software also accepts "<a>b</c>", then it is
>obviously not such.
>
>Any software that accepts "<body>this is a backspace: &#x0008;</body>"
>is neither a compliant XML 1.0 parser nor a compliant XML 1.1 parser.
>All XML 1.1 documents *must have an XML declaration* according to the
>strict stipulation of the spec.  If an XML 1.1 parser encounters a
>document without an XML declaration, it *must* assume that it is an XML
>1.0 document, at which point it would *have to* stop with a fatal error
>when it encounters &#x0008;.  Period.  There is no negotiation here.
>
>Therefore, as far as I can tell, neither the ET/sgmlop trick nor XIST
>are XML 1.1 parsers.  I cannot speak for LTXML or pxdom, but knowing
>the authors, I would guess that they are indeed compliant XML 1.1
>parsers.
>
>
>  
>
What Mr. Ogbuji states about "being an XML 1.1 parser" and
"being a compliant XML 1.0 parser [or] a compliant XML 1.1
parser" is of course correct.  However, with respect, I believe
that he misses the point of the list and what it claims.

I posted a list of packages "handling XML 1.1", and Walter Dörwald
helpfully added XIST as a package that "handles XML 1.1 charrefs
when a parser [like sgmlop] is used that does it".  Neither of
us claimed that all the listed packages (and especially not the ones
using an underlying sgmlop parser) were "XML 1.1 parsers".  Perhaps
my terminology is confusing, but what I meant by "handling XML 1.1"
is this:

         "Handle XML 1.1" = able to process a valid XML 1.1
               document without throwing up and quitting.

Sgmlop (http://effbot.org/zone/sgmlop-index.htm) is admittedly
non-validating and tolerant:  "The *sgmlop* parser is tolerant, and
happily accepts XML-like data that are not well-formed. If you need
strictness, use another parser."
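
For contrast, here is a minimal sketch of what a strict parser does with
the same snippet.  It is not from the original exchange: it assumes
Python's standard-library xml.etree.ElementTree (backed by expat, an
XML 1.0 parser) as a stand-in for "another parser", and the helper name
strict_parse_ok is purely illustrative.

```python
import xml.etree.ElementTree as ET

# The document from Walter's XIST example: no XML declaration, so a
# conforming parser has to treat it as XML 1.0.
DOC = "<body>this is a backspace: &#x0008;</body>"


def strict_parse_ok(text):
    """Return True if the expat-backed stdlib parser accepts the text."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        # expat reports a reference to an invalid character number
        return False
    return True


print(strict_parse_ok(DOC))                        # False
print(strict_parse_ok("<body>tab: &#x9;</body>"))  # True, #x9 is legal in 1.0
```

A tolerant parser such as sgmlop would let the first document through,
which is exactly the behaviour under discussion.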

In my own work, I do in fact use a second parser, separating the
validation from the processing:

1.  I prepare XML documents containing some control characters that are
valid only in XML 1.1.  I always mark the file with <?xml version="1.1"?>.

2.  I then validate the documents using a Relax NG schema and the Jing
validating parser, which knows the difference between XML-1.0-valid and
XML-1.1-valid. 

3.  I then need to "handle" or "process" my
already-known-to-be-XML-1.1-valid documents, to map them non-trivially
into a different XML 1.1 language.  Despite the fact that
ElementTree+sgmlop or XIST+sgmlop cannot be "compliant XML 1.1 parsers",
their ability to "handle" an already-known-to-be-XML-1.1-valid document
is valuable to me, and perhaps to others who want to work with XML 1.1
documents.
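
The relationship between step 1 and the tolerant processing in step 3
can be sketched as a small pre-check.  This is only an illustration
under stated assumptions, not a validator and no substitute for Jing:
it scans the raw text with regular expressions (so it would also flag
"&#8;" inside a comment or CDATA section), and the function names
declared_version, xml11_only_charrefs and consistent are hypothetical.
It relies on the fact that character references to the C0 controls
#x1-#x1F, other than tab, line feed and carriage return, are legal in
XML 1.1 but not in XML 1.0.

```python
import re

# XML declaration at the very start of the document, if any.
_XML_DECL = re.compile(r'<\?xml\s+version\s*=\s*(["\'])([^"\']+)\1')
# Hexadecimal or decimal character references.
_CHARREF = re.compile(r'&#(?:x([0-9A-Fa-f]+)|([0-9]+));')


def declared_version(text):
    """Return the declared XML version; no declaration means 1.0."""
    m = _XML_DECL.match(text)
    return m.group(2) if m else "1.0"


def is_xml11_only(cp):
    """True for C0 controls that only XML 1.1 allows as charrefs."""
    return 0x1 <= cp <= 0x1F and cp not in (0x9, 0xA, 0xD)


def xml11_only_charrefs(text):
    """List the code points of charrefs that XML 1.0 would reject."""
    found = []
    for hex_digits, dec_digits in _CHARREF.findall(text):
        cp = int(hex_digits, 16) if hex_digits else int(dec_digits)
        if is_xml11_only(cp):
            found.append(cp)
    return found


def consistent(text):
    """False if 1.1-only charrefs appear without a 1.1 declaration."""
    return declared_version(text) == "1.1" or not xml11_only_charrefs(text)


print(consistent('<?xml version="1.1"?><body>&#x0008;</body>'))  # True
print(consistent('<body>&#x0008;</body>'))                       # False
```

A document that passes this check and has already been validated
elsewhere is the kind of input for which "handling" by a tolerant
parser is meant above.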

******
That was the point of posting the list of "packages handling XML 1.1".
If there's a better term than "handle XML 1.1", then please inform me,
and I'll try to use it.

Ken

