[XML-SIG] Status of XML 1.1 processing in Python?

Wed Aug 31 19:55:52 CEST 2005

Daniel Veillard wrote:

>On Tue, Aug 30, 2005 at 10:07:10AM +0200, Ken Beesley wrote:
>>The existing (hidden) Mac parser that parses XML specifications
>>of input methods (into a low-level binary format) already
>>handles &#x0008; and other control characters now legal in XML 1.1
>>So this hidden Mac parser is XML 1.1-capable, at least as far as
>>control characters are concerned. 
>
>  The real problem is that "parser" is from your initial description
>not an XML-1.0 parser nor an XML-1.1 parser. Send some flames to Apple
>for breaking a standard that everybody else tried to conform to. Then
>work around that broken piece in their stack if you want but as always
>for conformance problems workarounds it's just lost time in the long term.
>
First, I'd like to thank experts like Daniel Veillard,
Uche Ogbuji and others who have responded to my XML 1.1
messages.  I very much appreciate your volunteer work in
creating and maintaining tools for XML processing.

Yes, as I pointed out in an earlier message, this Apple behavior
is formally a no-no.  A conforming XML parser must refuse a document
marked version="1.0" if it contains character references like
&#x0008;, which are legal only in XML 1.1 (and even there only as
character references, never as literal characters).  Apple is at
fault here, but it should be understood that this is their own private,
HIDDEN parser, used for exactly one application:  it translates the
XML files that define OS X input methods, whose DTD is documented
in http://developer.apple.com/technotes/tn2002/tn2056.html,
into an even less human-friendly binary format that OS X
actually uses internally.  This hidden parser has only one purpose
in life; it's a dog that knows only one trick.
The OS X input-method application naturally "needs"
to refer to control characters that only XML 1.1 allows, and Apple has
apparently wired XML 1.1 assumptions into this hidden, one-trick parser.
Their sin would be wiped away if they simply required that
the input files be marked properly as version="1.1".
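To make the rule concrete: the difference comes down to the Char production in each spec plus the Legal Character well-formedness constraint on character references. Below is a minimal Python sketch using only the standard library. The function names are made up for illustration, the version is read from the XML declaration with a simple regex, and the scan works on raw text, so it doesn't know about comments or CDATA sections. It illustrates the check; it is not a substitute for a real parser.

```python
from __future__ import annotations

import re

# Char production from XML 1.0 (Fifth Edition), section 2.2.
def is_xml10_char(cp: int) -> bool:
    return (cp in (0x9, 0xA, 0xD)
            or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF)

# Char production from XML 1.1, section 2.2: everything except NUL,
# the surrogate block and U+FFFE/U+FFFF. The C0/C1 "restricted"
# characters pass here, but XML 1.1 only allows them as references.
def is_xml11_char(cp: int) -> bool:
    return (0x1 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF)

# Decimal (&#8;) or hexadecimal (&#x0008;) character references.
CHAR_REF = re.compile(r"&#(x[0-9A-Fa-f]+|[0-9]+);")
# Simplified XML declaration match; only the version is extracted.
XML_DECL = re.compile(r"<\?xml\s+version\s*=\s*[\"'](1\.[0-9])[\"']")

def declared_version(text: str) -> str:
    """Version from the XML declaration; no declaration means XML 1.0."""
    m = XML_DECL.match(text)
    return m.group(1) if m else "1.0"

def illegal_char_refs(text: str) -> list[str]:
    """Character references that break the Legal Character WFC for the
    document's declared version.

    Rough sketch: scans raw text, so '&#...;' inside comments or CDATA
    sections (where it is literal text) is wrongly treated as a reference.
    """
    allowed = is_xml11_char if declared_version(text) == "1.1" else is_xml10_char
    bad = []
    for m in CHAR_REF.finditer(text):
        body = m.group(1)
        cp = int(body[1:], 16) if body.startswith("x") else int(body)
        if not allowed(cp):
            bad.append(m.group(0))
    return bad
```

On a document marked version="1.0", `illegal_char_refs` reports `&#x0008;`; change the declaration to version="1.1" and the same reference passes, while `&#0;` or a surrogate such as `&#xD800;` is still refused.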

But, again, that's not my "real problem".  I need and want
to validate and parse XML 1.1 documents containing character
references that are legal only in XML 1.1.  I'm willing, indeed
eager, to mark the files properly as version="1.1".  I don't want to
force XML 1.1 on anyone, but it's _exactly_ what I need for my
application.  There must be other people out there with the
same needs, in particular the people who went out of their
way to write the XML 1.1 recommendation.

The "real problem" or real nuisance for me is that so few of the
open, general-purpose XML tools can handle XML 1.1 at all.
Even if I mark my XML files properly as version="1.1", the
tools can't handle them because they're limited to XML 1.0.
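For instance, Python's own xml.etree.ElementTree sits on Expat, which implements XML 1.0 only. A small sketch, with a made-up two-element document standing in for a real input-method file (the element names are hypothetical): the plain 1.0 document parses, while the one declaring version="1.1" and containing &#x0008; is rejected with ParseError, typically complaining about a reference to an invalid character number.

```python
import xml.etree.ElementTree as ET

# Hypothetical input-method fragments: element names and content are made
# up for illustration; U+0008 (backspace) is the character at issue.
DOC_10 = '<?xml version="1.0"?>\n<keys><key>A</key></keys>'
DOC_11 = '<?xml version="1.1"?>\n<keys><key>&#x0008;</key></keys>'

def try_parse(doc: str) -> str:
    """Parse with the standard library's Expat-based parser and report the outcome."""
    try:
        ET.fromstring(doc)
    except ET.ParseError as err:
        return "rejected: %s" % err
    return "parsed"

for name, doc in (("DOC_10", DOC_10), ("DOC_11", DOC_11)):
    print(name, "->", try_parse(doc))
```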

Here's what I've found so far:

The Jing validating parser, for Relax NG schemas, seems
to distinguish correctly between XML 1.0 and XML 1.1 documents.  Nice.
http://www.thaiopensource.com/relaxng/jing.html

pxdom (http://www.doxdesk.com/software/py/pxdom.html)
is a pure Python implementation of DOM, not dependent
on Expat, and claims to handle both XML 1.0 and XML 1.1.

PyLTXML, from the Univ. of Edinburgh, also claims to handle
XML 1.0 and XML 1.1.  (http://www.ltg.ed.ac.uk/software/xml/)

If pxdom or PyLTXML (both still to be tested) live up to their
claims, I can do what I need to do using real XML 1.1, without
resorting to any workarounds.

I'd _prefer_ to use pulldom or perhaps Ogbuji's very attractive
binderytools.pushbind().  If I were half as dedicated to
XML 1.1 as Veillard and Ogbuji are to XML in general,
I'd roll up my sleeves and contribute to the development
rather than just begging.   :)
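For what it's worth, xml.dom.pulldom hands its input to xml.sax.make_parser(), whose default driver in the standard library is the Expat reader, so it inherits the same XML 1.0 limit. A minimal sketch of the streaming style, with hypothetical element names:

```python
from xml.dom import pulldom
from xml.sax import SAXParseException

def start_tags(doc: str) -> list:
    """Stream a document with pulldom and collect start-tag names in order."""
    names = []
    for event, node in pulldom.parseString(doc):
        if event == pulldom.START_ELEMENT:
            names.append(node.tagName)
    return names

# An XML 1.0 document streams fine.
print(start_tags("<keys><key>A</key><key>B</key></keys>"))  # ['keys', 'key', 'key']

# The default Expat-backed SAX driver refuses the XML 1.1 control-character reference.
try:
    start_tags('<?xml version="1.1"?><keys><key>&#x0008;</key></keys>')
except SAXParseException as err:
    print("pulldom rejected it:", err)
```

Since parseString() also takes an optional parser argument, getting pulldom to accept XML 1.1 would presumably mean plugging in a different SAX driver, assuming one with XML 1.1 support is available.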

Thanks again to all those working on XML tools,

Ken