[XML-SIG] [ pyxml-Bugs-433761 ] setContentHandler broken

noreply@sourceforge.net noreply@sourceforge.net
Sat, 16 Jun 2001 09:28:12 -0700


Bugs item #433761, was updated on 2001-06-16 09:28
You can respond by visiting: 
http://sourceforge.net/tracker/?func=detail&atid=106473&aid=433761&group_id=6473

Category: SAX
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Achim Gaedke (achimgaedke)
Assigned to: Nobody/Anonymous (nobody)
Summary: setContentHandler broken

Initial Comment:
I am trying to write a recursive parser for nested
data structures.
To collect the data, the content handler has to be
switched at each nesting level.

This does NOT work for the character handler. Here is
my (minimal) test program:

import xml.sax.handler

parser=xml.sax.make_parser()

class second_ch(xml.sax.handler.ContentHandler):
    def startElement(self,name,attrs):
        print "start second"

    def endElement(self,name):
        print "end second"

    def characters(self,content):
        print "second: ",content.strip()

class first_ch(xml.sax.handler.ContentHandler):
    def startElement(self,name,attrs):
        print "start first"
        self.second=second_ch()
        parser.setContentHandler(self.second)

    def endElement(self,name):
        print "end first"

    def characters(self,content):
        print "first: ",content.strip()

first=first_ch()
parser.setContentHandler(first)
parser.parse('members.xml')

And this is the XML file members.xml:
<?xml version="1.0"?>
<a>a1<b>b1</b>a2</a>

Nothing more is needed. This is the output with
python2.0 and expat-1.95.2:


python2.0 xml_test.py
start first
first:  a1
start second
first:  b1
end second
first:  a2
end second

After the first line of output ("start first") the
handler has been switched, so the second content
handler should receive all of the characters!

The second test is with python2.1 and expat1_1:
python2.1 xml_test.py
start first
first:  a1
start second
first:  b1
end second
first:  a2
end second

The result is the same. What a pity.
The expat reference states that changing handlers
during parsing is possible and expected.

I am running Red Hat Linux 7.1 with self-built Python
interpreters.

OK, here is a workaround I found so that I can go on
coding:

After each call to

   parser.setContentHandler(new_handler)

add the following line to make up for the missing
functionality of parser.setContentHandler():

parser._parser.CharacterDataHandler=new_handler.characters

This line is taken from
xml.sax.expatreader.ExpatReader.reset().
It only works after parser.reset() has been called, or
once parser.parse(...) has been called (because
parse() calls reset()).
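
The workaround above can be wrapped in a small helper so that the
extra line is not forgotten at any switch. This is only a sketch: the
names switch_handler, Recorder, First and parse_with_switch are made
up for illustration, and `_parser` is a private attribute of
ExpatReader that may change or be missing in other releases. On
interpreters where setContentHandler() already rebinds the callback,
the extra assignment should simply be redundant.

```python
import io
import xml.sax
import xml.sax.handler


def switch_handler(parser, new_handler):
    """Switch content handlers and re-bind the character callback."""
    parser.setContentHandler(new_handler)
    # Assumption: `_parser` is the private expat object of ExpatReader.
    # It is None (or absent) until reset() or parse() has been called.
    expat = getattr(parser, "_parser", None)
    if expat is not None:
        expat.CharacterDataHandler = new_handler.characters


class Recorder(xml.sax.handler.ContentHandler):
    """Appends every event to a shared log instead of printing it."""

    def __init__(self, label, log):
        xml.sax.handler.ContentHandler.__init__(self)
        self.label = label
        self.log = log

    def startElement(self, name, attrs):
        self.log.append("start " + self.label)

    def endElement(self, name):
        self.log.append("end " + self.label)

    def characters(self, content):
        self.log.append(self.label + ": " + content.strip())


class First(Recorder):
    """Hands over to a second handler on the first start tag."""

    def __init__(self, parser, log):
        Recorder.__init__(self, "first", log)
        self.parser = parser

    def startElement(self, name, attrs):
        Recorder.startElement(self, name, attrs)
        switch_handler(self.parser, Recorder("second", self.log))


def parse_with_switch(xml_bytes):
    """Parse a byte string and return the list of recorded events."""
    log = []
    parser = xml.sax.make_parser()
    parser.setContentHandler(First(parser, log))
    parser.parse(io.BytesIO(xml_bytes))
    return log


log = parse_with_switch(b"<a>a1<b>b1</b>a2</a>")
```

With the helper in place, no "first:" entries should show up in the
log after "start first"; the character data should all be recorded by
the second handler.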

I think this error should be corrected somewhere else
and not in my code, but I don't know where.

One of the lines in reset() is:

self._parser.CharacterDataHandler = self._cont_handler.characters

Changing it to

self._parser.CharacterDataHandler = self.character_data

would help, but there is a note saying that this should
not happen?! Why?
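
To illustrate why the indirection through character_data would help,
here is a toy model. It is not the real ExpatReader code; ToyReader
and Collector are invented names. A callback bound directly to the
handler's method keeps pointing at the old handler, while a callback
that goes through the reader looks up the current handler on every
call.

```python
class ToyReader:
    """Toy stand-in for a SAX reader; not the real ExpatReader."""

    def __init__(self, handler):
        self._cont_handler = handler
        # Direct binding: captures the handler that is current right now.
        self.direct_callback = self._cont_handler.characters
        # Indirect binding: resolves the current handler at call time.
        self.indirect_callback = self.character_data

    def setContentHandler(self, handler):
        self._cont_handler = handler

    def character_data(self, data):
        self._cont_handler.characters(data)


class Collector:
    """Minimal handler that remembers the character data it receives."""

    def __init__(self):
        self.seen = []

    def characters(self, data):
        self.seen.append(data)


first, second = Collector(), Collector()
reader = ToyReader(first)
reader.setContentHandler(second)

reader.direct_callback("x")    # still delivered to `first`
reader.indirect_callback("y")  # delivered to `second`
```

The price of the indirect form is one extra Python call per chunk of
character data.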

Yours!

achim
