[XML-SIG] Using PyExpat.py

Uche Ogbuji uche.ogbuji@fourthought.com
Sat, 10 Feb 2001 21:33:09 -0700


> 
> >>>>> "Uche" == Uche Ogbuji <uche.ogbuji@fourthought.com> writes:
> 
> Uche> I do recommend the upgrade, and 0.6.4 is on its way.
> 
> I installed 0.6.3, and immediately encountered several problems. Part of
> this may be my freshness to Python. My environment may not be complete
> in some way. First things first:

[Tale of woes snipped]

Ouch!  I don't use PyXML standalone, but even so I would have expected screams 
from every quarter if 0.6.3 were really so broken.  I suspect something might 
have gone wrong with your installation.  I'd suggest either running

python setup.py install -f

to force file overwrites, or just blowing away the _xmlplus directory in your 
Python library and reinstalling.
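
If you're not sure where the _xmlplus directory ended up, a rough sketch like the one below should turn it up.  The helper name find_xmlplus is made up for illustration, and I'm assuming the directory sits directly under one of the entries on sys.path.  It only looks; it deletes nothing.

```python
import os
import sys


def find_xmlplus(search_paths=None):
    """Return every directory named _xmlplus found directly under the given paths.

    find_xmlplus is a hypothetical helper, not part of PyXML or 4Suite.
    """
    if search_paths is None:
        search_paths = sys.path
    hits = []
    for base in search_paths:
        # An empty entry on sys.path means the current directory.
        candidate = os.path.join(base or os.curdir, "_xmlplus")
        if os.path.isdir(candidate) and candidate not in hits:
            hits.append(candidate)
    return hits


# Print each candidate so it can be inspected before removing anything by hand.
for path in find_xmlplus():
    print(path)
```

If nothing is printed, the directory is not directly under any sys.path entry, and a fresh install should be all that's needed.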

Here are the results I get with Python 2.1a2 and 4Suite 0.10.2beta1 (which 
includes an updated PyXML).  Should be the same results with Python 1.5 or 2.0.

ftp://ftp.fourthought.com/pub/4Suite/4Suite-0.10.2b1.tar.gz

[uogbuji@borgia uogbuji]$ cat test.xml 
<spam>
  <eggs>toast</eggs>
</spam>
[uogbuji@borgia uogbuji]$ python
Python 2.1a2 (#1, Feb  3 2001, 14:38:13) 
[GCC egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)] on linux2
Type "copyright", "credits" or "license" for more information.
>>> from xml.dom.ext.reader import PyExpat
>>> from xml.dom.ext import Print
>>> reader = PyExpat.Reader()
>>> xml_dom_object = reader.fromUri('test.xml')
>>> Print(xml_dom_object)
<?xml version='1.0' encoding='UTF-8'?><!DOCTYPE spam><spam>
  <eggs>toast</eggs>
</spam>>>> 
>>> 

Huh?  Where'd that broken doctype come from?  Looks as if I found my own first 
beta bug.

Anyway, in general, you can see that the PyExpat reader works in 4Suite 
0.10.2beta1.

Note that if your need is for speed and your pattern is just parse and read, 
you might want to consider cDomlette (in 4Suite only) which is *very* fast, 
but read-only:

[uogbuji@borgia uogbuji]$ python
Python 2.1a2 (#1, Feb  3 2001, 14:38:13) 
[GCC egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)] on linux2
Type "copyright", "credits" or "license" for more information.
>>> from Ft.Lib import cDomlette
>>> reader = cDomlette.RawExpatReader()
>>> xml_dom_object = reader.fromUri('test.xml')
>>> from xml.dom.ext import Print
>>> Print(xml_dom_object)
<?xml version='1.0' encoding='UTF-8'?><spam>
  <eggs>toast</eggs>
</spam>>>> 
>>> 

Hmm.  Interesting BTW: no broken doctype.  My guess is that the PyExpat reader 
is inserting an incomplete DocumentType node, but again, this seems to be 
unrelated to your problems with PyXML 0.6.3.
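
One way to test a guess like that is to look for a DocumentType node among the document's children.  The sketch below uses xml.dom.minidom from the standard library, not the PyExpat reader, so it shows the technique only and says nothing about what the 4Suite reader actually builds.  The helper name doctype_nodes is made up for illustration.

```python
from xml.dom import Node, minidom

# Same content as test.xml above, which has no DOCTYPE declaration.
TEST_XML = "<spam>\n  <eggs>toast</eggs>\n</spam>"


def doctype_nodes(document):
    """Return the DocumentType nodes that are direct children of the document."""
    return [
        node
        for node in document.childNodes
        if node.nodeType == Node.DOCUMENT_TYPE_NODE
    ]


doc = minidom.parseString(TEST_XML)
found = doctype_nodes(doc)

if found:
    print("document carries a doctype named", found[0].name)
else:
    print("no DocumentType node in the document")
```

A source with no DOCTYPE declaration should come back with no such node, so finding one after parsing test.xml would point at the reader rather than the input.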

> Uche> As a forewarning, the 0.6.3 and up way is
> 
> Uche> from xml.dom.ext.reader import PyExpat     #or Sax2
> Uche> reader = PyExpat.Reader()
> Uche> xml_dom_object = reader.fromUri(filename)  #should work for either URL or file
> 
> By the way, thanks for all the friendly advice so far. I've noticed that
> this list has more traffic by far relating to development work than
> questions like mine, so I hope this isn't an intrusion.

Not even close.  Your messages are *right* on-topic, and highly appreciated.  
We love to hear all the field-testing reports we can.


-- 
Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python