[XML-SIG] SAX characters() output on multiple lines for non-ascii

"Martin v. Löwis" martin at v.loewis.de
Thu Feb 7 07:01:52 CET 2008


> However if I try and put some of the surrounding text back in either by
> concatenating strings or using multiple  sys.stdout.write() calls I get
> repetitions of the strings. 
> 
>         if len(newchars)> 0:
>           output = ''.join(newchars)
>           sys.stdout.write("String read is '")
>           sys.stdout.write(output)
>           sys.stdout.write("'")  
> 
> 
> Start ELEMENT ='title'
> String read is 'Der Einfluss kleiner naturnaher Retentionsma'String read is
> '▀'S
> tring read is 'nahmen in der Fl'String read is 'Σ'String read is 'che auf
> den Ho
> chwasserabfluss - Kleinr'String read is 'ⁿ'String read is 'ckhaltebecken -.'
> End ELEMENT ='title'

Please read Fred Drake's answer again. SAX will split the data in the
XML document into multiple pieces. You put your decoration ("String read 
is") around each piece. Multiple pieces -> multiple decorations.

To solve this issue, collect all pieces in a global variable:

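output = u""   # module-level buffer for the text of the current element

   def characters(self, chars):
     # SAX may call this several times per element; only accumulate here.
     global output
     output += chars

   def endElement(self, name):
     # The element is complete: print the collected text once, then reset.
     global output
     print "String read is", output.encode("latin-1")
     output = u""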

You could also choose to make output an attribute of self, which avoids
the global variable.
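
A minimal sketch of that attribute-based variant, written for Python 3
rather than the Python 2 used above. The names TextCollector and
collect_text are made up for illustration, and resetting the buffer in
startElement is an assumption that only suits elements containing plain
text with no child elements:

```python
import xml.sax


class TextCollector(xml.sax.ContentHandler):
    """Collects the character data of each element into one string."""

    def __init__(self):
        super().__init__()
        self.pieces = []    # chunks delivered by characters() so far
        self.results = []   # (element name, complete text) pairs

    def startElement(self, name, attrs):
        # Start with an empty buffer for every new element.
        self.pieces = []

    def characters(self, chars):
        # The parser may deliver the text in several chunks; just store them.
        self.pieces.append(chars)

    def endElement(self, name):
        # Join the chunks once the element is complete, then reset.
        text = "".join(self.pieces)
        if text.strip():
            self.results.append((name, text))
        self.pieces = []


def collect_text(xml_bytes):
    """Parse xml_bytes and return a list of (element name, text) pairs."""
    handler = TextCollector()
    xml.sax.parseString(xml_bytes, handler)
    return handler.results
```

Because the decoration is applied only after the join in endElement, it
appears once per element no matter how many pieces the parser delivered.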

Regards,
Martin


