An XML parser is an XML parser. Period.

Wed Feb 11 01:30:01 EST 2004

Peter Hansen <peter at engcorp.com> wrote in message news:<40290854.15BB5CF0 at engcorp.com>...
> Uche Ogbuji wrote:
> > 
> > Pierre N <pierren at mac.com> wrote in message news:<mailman.904.1075287732.12720.python-list at python.org>...
> > > I'm using pyRXP, and it's great.
> > > It's using one tuple, not dictionnaries.
> > > Very very fast.
> > > By the way I'm just starting using this package, anybody met any
> > > problems with pyRXP?
> > 
> > I did.  It's not an XML parser :-(.  It does not accept character
> > entities such as … (the example that bit me), giving meaningless
> > "error" messages along the lines: "not a valid 8-bit XML character".
> > If you need an XML parser, use PyRXPU, which comes in ReportLab CVS
> > only.  It is not as fast as PyRXP, but conformant in my testing, and
> > the point of XML is conformance, not speed at all costs.  If you want
> > speed at all costs, use CSV or some other plain text format.
> 
> Hmm... so it's your opinion that *all* XML parsers must handle *all*
> aspects of XML?

The XML specification is clear on what a parser *must* support.  The
full character production is one of those things.  From XML 1.0,
section 2.2:

Character Range
[2]  Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD]
              | [#x10000-#x10FFFF]

There is no "option" to not support characters greater than #xFF.  XML
parsers *can* leave off handling some aspects of XML, such as external
DTD subsets, but you cannot be as fundamentally non-conformant as
PyRXP and still call yourself an XML parser.
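
To make the production above concrete, here is a minimal Python sketch
of a membership test for Char.  The names XML_CHAR_RANGES and
is_xml_char are my own illustration, not part of PyRXP, PyRXPU, or any
other parser's API.

```python
# Sketch of the XML 1.0 Char production (section 2.2, production [2]).
# XML_CHAR_RANGES and is_xml_char are illustrative names, not a real API.
XML_CHAR_RANGES = (
    (0x9, 0x9),
    (0xA, 0xA),
    (0xD, 0xD),
    (0x20, 0xD7FF),
    (0xE000, 0xFFFD),
    (0x10000, 0x10FFFF),
)


def is_xml_char(code_point):
    """Return True if the integer code point matches the Char production."""
    return any(low <= code_point <= high for low, high in XML_CHAR_RANGES)
```

For example, is_xml_char(0x2026) is true for the horizontal ellipsis,
which lies well above #xFF, while is_xml_char(0xFFFE) is false.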

This is not just an academic matter.  There are a *vast* number of
useful and heavily-used characters with code points higher than U+FF,
and if parsers decided on a whim to pick and choose what to support,
the result would be complete and utter chaos.
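
As a quick check of what a conformant parser does with such a
character, the sketch below feeds a numeric character reference above
U+FF to the expat-based parser in Python's standard library.  It
assumes Python 3 and xml.etree.ElementTree; the helper name
element_text and the sample document are made up for illustration.

```python
import xml.etree.ElementTree as ET


def element_text(document):
    """Parse an XML string and return the text of its root element."""
    return ET.fromstring(document).text


# &#8230; is a character reference to U+2026 (horizontal ellipsis),
# a code point above U+FF that a conformant parser must accept.
sample = "<p>wait&#8230;</p>"
parsed = element_text(sample)
```

Here parsed ends up as the string "wait" followed by the single
character U+2026, with no error raised.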

> If not, I think you should back off on the criticism
> of PyRXP as being "not an XML parser" and simply point out that it
> doesn't handle all aspects of XML because it is intended to provide
> a very fast/heavily optimized approach to parsing only certain kinds
> of XML.  It's a valid choice to do so, though of course if PyRXP is
> promoted as a "full" XML solution that might be inaccurate.

PyRXP is not an XML parser.  It's that simple.  I stand by that very
strong statement, and I'd be surprised if any XML expert refuses to
corroborate it.

I do want to point out that PyRXPU does seem to be a proper XML
parser, and is what people should use instead if they like the
ReportLab products.

Of course, if you don't really need an XML parser, feel free to use
PyRXP.  Just don't call it what it isn't.

--Uche
http://uche.ogbuji.net