Bug in expatreader...

Achim Gaedke achim at zpr.uni-koeln.de
Fri Jun 15 18:57:36 EDT 2001


Hello!

I want to write a recursive parser for nested data structures.
To collect the data, I need to switch the content handler at each
nesting level.

This does NOT work for the character handler.
Here is my (lean) test program:

import xml.sax
import xml.sax.handler

parser=xml.sax.make_parser()

class second_ch(xml.sax.handler.ContentHandler):
    def startElement(self,name,attrs):
        print "start second"

    def endElement(self,name):
        print "end second"
    
    def characters(self,content):
        print "second: ",content.strip()

class first_ch(xml.sax.handler.ContentHandler):
    def startElement(self,name,attrs):
        print "start first"
        self.second=second_ch()
        parser.setContentHandler(self.second)

    def endElement(self,name):
        print "end first"
    
    def characters(self,content):
        print "first: ",content.strip()

first=first_ch()
parser.setContentHandler(first)
parser.parse('members.xml')

And this is the XML file members.xml:
<?xml version="1.0"?>
<a>a1<b>b1</b>a2</a>

Nothing more is needed. This is the output with Python 2.0 and expat-1.95.2:


python2.0 xml_test.py
start first
first:  a1
start second
first:  b1
end second
first:  a2
end second

After the first line, the second content handler should receive the
character data, but it still goes to the first handler!

The second test is with Python 2.1 and expat1_1:
python2.1 xml_test.py
start first
first:  a1
start second
first:  b1
end second
first:  a2
end second

The result is the same. What a pity.
The expat reference states that changing handlers is possible and
expected.
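
A possible workaround, sketched here under assumptions rather than tested
against the interpreters above: leave a single dispatcher object registered
with the parser and swap only the handler it forwards to, so the parser's own
callback bindings never have to change. The names Dispatcher, First, Second
and run are hypothetical, and the sketch uses Python 3 syntax (a log list
instead of the print statements in the test program).

```python
import xml.sax
import xml.sax.handler


class Dispatcher(xml.sax.handler.ContentHandler):
    """Stays registered with the parser; forwards events to self.current."""

    def __init__(self):
        super().__init__()
        self.current = None  # assumption: set by the caller before parsing

    def startElement(self, name, attrs):
        self.current.startElement(name, attrs)

    def endElement(self, name):
        self.current.endElement(name)

    def characters(self, content):
        self.current.characters(content)


class Second(xml.sax.handler.ContentHandler):
    def __init__(self, log):
        super().__init__()
        self.log = log

    def startElement(self, name, attrs):
        self.log.append("start second")

    def endElement(self, name):
        self.log.append("end second")

    def characters(self, content):
        self.log.append("second: " + content.strip())


class First(xml.sax.handler.ContentHandler):
    def __init__(self, log, dispatcher):
        super().__init__()
        self.log = log
        self.dispatcher = dispatcher

    def startElement(self, name, attrs):
        self.log.append("start first")
        # Switch the forwarding target instead of calling
        # parser.setContentHandler() in the middle of the parse.
        self.dispatcher.current = Second(self.log)

    def endElement(self, name):
        self.log.append("end first")

    def characters(self, content):
        self.log.append("first: " + content.strip())


def run(xml_text):
    """Parse xml_text and return the list of logged events."""
    log = []
    dispatcher = Dispatcher()
    dispatcher.current = First(log, dispatcher)
    xml.sax.parseString(xml_text.encode("utf-8"), dispatcher)
    return log


if __name__ == "__main__":
    for line in run('<?xml version="1.0"?>\n<a>a1<b>b1</b>a2</a>'):
        print(line)
```

With this arrangement, every event after the first start tag reaches the
second handler, including the character data, because the switch happens
inside the dispatcher and not inside the parser.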

I am running Red Hat Linux 7.1 with self-built Python interpreters.


Achim Gaedke, ZPR
Weyertal 80, 50931 Köln
Tel: +49 221 470 6021




