XML SAX parser bug?

uche.ogbuji at gmail.com
Tue Feb 7 12:34:49 EST 2006


mitsura at skynet.be wrote:
> Fredrik Lundh wrote:
> > mitsura at skynet.be wrote:
> > > I think I ran into a bug in the XML SAX parser.
> > >
> > > Part of my program consists of reading a rather large XML file (about
> > > 10 MB) containing a few thousand elements.
> > > I have the following problem: sometimes the SAX parser misreads a
> > > line.
> >
> > it's not a bug; the parser is free to split up character runs (due to buffering,
> > entities or character references, etc).  it's up to you to merge character runs
> > into strings.
>
> but how do I detect that the parser has split up the characters? I guess
> I need to detect it in order to reconstruct the complete string.

Here's a recipe:

http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/265881

Using this filter you can then write SAX code that assumes normalized
text events.  Also, 4Suite's SAX implementation, Saxlette,
automatically does this text event merging for you at C speed:

http://4suite.org/docs/CoreManual.xml#saxlette
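
If you would rather stay with the standard library, the usual approach is
not to detect the splits at all: append every characters() call to a buffer
and join the pieces when the element ends.  Below is a minimal sketch of that
idea using xml.sax.  It is not the code from the recipe above, and the names
TextMergingHandler and parse_in_chunks are made up for illustration.  The
helper feeds the parser in small pieces only to provoke the kind of split
text events described in this thread.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class TextMergingHandler(ContentHandler):
    """Hypothetical handler: merges split character runs per element."""

    def __init__(self):
        super().__init__()
        self._buffers = []  # stack: one list of text pieces per open element
        self.texts = []     # (element name, merged text), in end-tag order

    def startElement(self, name, attrs):
        # Start a fresh buffer for the element that just opened.
        self._buffers.append([])

    def characters(self, content):
        # May be called several times for one run of text; just collect.
        if self._buffers:
            self._buffers[-1].append(content)

    def endElement(self, name):
        # Only now is the element's direct text known to be complete.
        merged = "".join(self._buffers.pop())
        self.texts.append((name, merged))


def parse_in_chunks(data, chunk_size):
    """Hypothetical helper: feed bytes to the parser chunk_size at a time."""
    handler = TextMergingHandler()
    parser = xml.sax.make_parser()
    parser.setContentHandler(handler)
    for start in range(0, len(data), chunk_size):
        parser.feed(data[start:start + chunk_size])
    parser.close()
    return handler.texts


if __name__ == "__main__":
    doc = b"<root><item>fish &amp; chips</item><item>plain</item></root>"
    for name, text in parse_in_chunks(doc, 7):
        print(name, repr(text))
```

Because the join happens in endElement, the result does not depend on how
many characters() calls the parser happens to make.  Text that follows a
child element lands in the parent's buffer, so mixed content is kept too.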

--
Uche Ogbuji                               Fourthought, Inc.
http://uche.ogbuji.net                    http://fourthought.com
http://copia.ogbuji.net                   http://4Suite.org
Articles: http://uche.ogbuji.net/tech/publications/



