[XML-SIG] problems using XML-sig code to read large XML files.

Anthony Baxter anthony@interlink.com.au
Mon, 15 May 2000 13:04:08 +1000


I'm using the XML-sig code to read in a largish (2.5M) XML document.
The document has a very simple structure, like this:

<locations>
<country name='Aruba' ccode='ABW'>
<Name name='Aruba' canon='1' />
<place geokey='1721834' name='Barcadera'>
<Name name='Barcadera' canon='1' />
</place>
<place geokey='16838' name='Druif'>
<Name name='Druif' canon='1' />
</place>
<place geokey='7761' name='Oranjestad'>
<Name name='Oranjestad' canon='1' />
</place>
<place geokey='77050' name='Sint Nicolaas'>
<Name name='Sint Nicolaas' canon='1' />
</place>
</country>
[..more countries..]
</locations>

The file was generated using the XML-sig code. However, when I try to
read it back in using something like:
from xml.dom import utils
reader = utils.FileReader('out.xml')
doc = reader.document

I get an error:
  File "read.py", line 2, in ?
    reader = utils.FileReader('out.xml')
  File "/opt/python/lib/python1.5/site-packages/xml/dom/utils.py", line 131, in __init__
    self.document = self.readFile(filename)
  File "/opt/python/lib/python1.5/site-packages/xml/dom/utils.py", line 140, in readFile
    document = self.readStream(file,type)
  File "/opt/python/lib/python1.5/site-packages/xml/dom/utils.py", line 148, in readStream
    document = self.readXml(stream)
  File "/opt/python/lib/python1.5/site-packages/xml/dom/utils.py", line 165, in readXml
    p.feed(stream.read())
  File "/opt/python/lib/python1.5/site-packages/xml/sax/drivers/drv_pyexpat.py", line 123, in feed
    if not self.parser.Parse(data):
pyexpat.error: not well-formed: line 37162, column 19

Using the other example at http://www.python.org/doc/howto/xml/node12.html,
I get something like:
Traceback (innermost last):
  File "read.py", line 16, in ?
    p.close()
  File "/opt/python/lib/python1.5/site-packages/xml/sax/drivers/drv_pyexpat.py", line 127, in close
    if not self.parser.Parse("",1):
pyexpat.error: no element found: line 16148, column 16

Running either script repeatedly reports a different position in the
file each time, and none of the lines reported has a problem. Zope, with
either the Ft ZDOM or the normal Zope DOM code, has no problems with the
file, and neither does nsgmls.
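
One way to cross-check the file independently of the DOM and SAX layers
is to hand it to expat directly, in fixed-size chunks, and see whether an
error position is reported at all. The following is only a minimal sketch:
it assumes a present-day Python with the standard xml.parsers.expat module,
not the drv_pyexpat driver shown in the tracebacks, and the helper name
check_well_formed and the 64 KB chunk size are made up for illustration.

```python
import xml.parsers.expat


def check_well_formed(path, chunk_size=64 * 1024):
    """Feed the file to expat in chunks.

    Returns None if the document parses cleanly, otherwise a tuple of
    (error message, line, column) taken from the expat error.
    """
    parser = xml.parsers.expat.ParserCreate()
    try:
        with open(path, "rb") as stream:
            while True:
                chunk = stream.read(chunk_size)
                if not chunk:
                    break
                # isfinal=False: more data may follow this chunk.
                parser.Parse(chunk, False)
        # An empty final call tells expat the document is complete.
        parser.Parse(b"", True)
    except xml.parsers.expat.ExpatError as err:
        message = xml.parsers.expat.ErrorString(err.code)
        return (message, err.lineno, err.offset)
    return None
```

If a check like this returns None every time while the readers above keep
failing at shifting positions, that would point at the layer feeding the
parser rather than at the file itself.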

I've tried both the 0.5.4 and current CVS versions, to no avail.

The dom_from_xml_file.py demo in Ft.Dom.demo also breaks.

I can make the file available if anyone wants it, although just
taking the example above and writing 10,000 copies of the country
element into a file will do the trick.
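
For anyone who wants to build such a file, the recipe above can be
sketched roughly as follows. This is a hedged illustration rather than
the script that produced the original out.xml: the function name
write_repro is hypothetical, it assumes a present-day Python, and it
simply repeats the Aruba sample inside a single <locations> element.

```python
# The sample <country> element from the message, repeated verbatim.
COUNTRY = """<country name='Aruba' ccode='ABW'>
<Name name='Aruba' canon='1' />
<place geokey='1721834' name='Barcadera'>
<Name name='Barcadera' canon='1' />
</place>
<place geokey='16838' name='Druif'>
<Name name='Druif' canon='1' />
</place>
<place geokey='7761' name='Oranjestad'>
<Name name='Oranjestad' canon='1' />
</place>
<place geokey='77050' name='Sint Nicolaas'>
<Name name='Sint Nicolaas' canon='1' />
</place>
</country>
"""


def write_repro(path, copies=10000):
    """Write a <locations> document containing `copies` copies of COUNTRY."""
    with open(path, "w") as out:
        out.write("<locations>\n")
        for _ in range(copies):
            out.write(COUNTRY)
        out.write("</locations>\n")
```

Calling write_repro("out.xml") should give a file of roughly the same
shape as the one described, with the same country repeated.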

Anyone?

thanks,
Anthony