[XML-SIG] SAX characters() output on multiple lines for non-ascii

Brian Smith brian at briansmith.org
Sat Feb 2 18:39:29 EST 2008


>   def characters(self, chars):
> 
>       newchars=[]
>       newchars.append(chars.encode('ISO-8859-1'))

The SAX parser may call characters() multiple times for the same text block; it is free to split the text wherever it likes. For example, for the input <foo>123</foo>, characters() could be called once:
	handler.characters("123")
 or twice:
	handler.characters("12")
	handler.characters("3")
 or:
	handler.characters("1")
	handler.characters("23")
 or three times:
	handler.characters("1")
	handler.characters("2")
	handler.characters("3")

If you want the whole text block, then you need to do something like this:

in __init__:
	self.newchars = []

in startElement:
	self.newchars = []

in characters:
	self.newchars.append(chars)

in endElement:
	if len(self.newchars) > 0:
		combined = "".join(self.newchars).encode('ISO-8859-1') 
		print "Stream read is '%s'" % combined
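
Put together, the pieces above might look roughly like the following. This is only a sketch: it is written for Python 3, and the class name TextCollector, the results list, and the sample input are made up for illustration. Like the outline above, it resets the buffer in startElement, so it is only really suitable for elements that contain text and no child elements.

```python
import xml.sax


class TextCollector(xml.sax.ContentHandler):
    """Hypothetical handler that joins the chunks passed to characters()."""

    def __init__(self):
        super().__init__()
        self.newchars = []
        self.results = []  # (element name, combined text) pairs

    def startElement(self, name, attrs):
        # Start a fresh buffer for each element.
        self.newchars = []

    def characters(self, chars):
        # May be called any number of times per text block; just buffer.
        self.newchars.append(chars)

    def endElement(self, name):
        # Only now is the whole text block known to be complete.
        if self.newchars:
            self.results.append((name, "".join(self.newchars)))
        self.newchars = []


handler = TextCollector()
# Illustrative input containing a non-ASCII character (e with acute accent).
xml.sax.parseString("<foo>caf\u00e9 123</foo>".encode("utf-8"), handler)

name, combined = handler.results[0]
# Encode once, after joining, rather than chunk by chunk.
encoded = combined.encode("ISO-8859-1")
```

However the parser happens to split the text, the joined string comes out the same, and the encoding step happens exactly once per element.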

I recommend using ElementTree instead.
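
For comparison, here is a minimal sketch of the same job with ElementTree, again assuming Python 3 and a made-up sample input. ElementTree hands back the element's text as a single string, so there is no chunk handling to write:

```python
import xml.etree.ElementTree as ET

# Illustrative input; the non-ASCII character is e with acute accent.
root = ET.fromstring("<foo>caf\u00e9 123</foo>")

# root.text is the complete text of <foo>, already joined.
text = root.text
encoded = text.encode("ISO-8859-1")
```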

- Brian



