[XML-SIG] SAX characters() output on multiple lines for non-ascii

woodcock woodcocs at hotmail.com
Sun Feb 3 00:04:21 CET 2008


I am just starting with SAX and am trying to parse a file that contains
non-ASCII characters.  The XML file is encoded as ISO-8859-1.  When the parser
reaches text containing non-ASCII characters, characters() is called several
times for one element, so my output is split across multiple lines.

Example: I am trying to output
'Der Einfluss kleiner naturnaher Retentionsmaßnahmen in der
Fläche auf den Hochwasserabfluss - Kleinrückhaltebecken'

The output I get is:
Start ELEMENT ='title'
String read is 'Der Einfluss kleiner naturnaher Retentionsma'
String read is '▀'
String read is 'nahmen in der Fl'
String read is 'Σ'
String read is 'che auf den Hochwasserabfluss - Kleinr'
String read is 'ⁿ'
String read is 'ckhaltebecken -.'
End ELEMENT ='title'

whereas I want a single string, something like:
Start ELEMENT ='title'
String read is 'Der Einfluss kleiner naturnaher Retentionsma▀nahmen in der
FlΣche auf den Hochwasserabfluss - Kleinrⁿckhaltebecken -.'
End ELEMENT ='title'
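
A side note on the odd glyphs in the output above: one plausible explanation, offered here as an assumption rather than something stated in the post, is that the ISO-8859-1 bytes are being displayed by a console that uses code page 437. The small sketch below (Python 3; the function name is hypothetical) reproduces that mapping.

```python
def render_in_cp437(text):
    """Show how ISO-8859-1 bytes would look on a code page 437 console.

    Assumption: the console decodes output bytes as cp437. The post does
    not say which console or code page was actually in use.
    """
    # Bytes as produced by chars.encode('ISO-8859-1') in the handler
    latin1_bytes = text.encode('iso-8859-1')
    # The same bytes interpreted the way a cp437 console would show them
    return latin1_bytes.decode('cp437')
```

Under that assumption, render_in_cp437(u'ßäü') gives the three glyphs seen in the output, so the character data itself may well be intact and only the display encoding differs.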

My code is:

  def characters(self, chars):
      # Called by the SAX parser with a chunk of character data
      newchars = []
      newchars.append(chars.encode('ISO-8859-1'))
      if newchars[-1] == '\n':
          newchars = newchars[:-1]
      if len(newchars) > 0:
          output = 'String read is ' + "'" + ''.join(newchars) + "'\n"
          sys.stdout.write(output)
      return

Does anyone have any ideas?
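
One common way to get a single string per element is to treat characters() as a collector only: SAX parsers are allowed to deliver one run of text in several chunks, so the chunks can be buffered and joined in endElement(). The following is a minimal sketch of that idea, not a drop-in fix for the code above. It is written for Python 3, and the names TitleHandler, parse_titles and SAMPLE are made up for illustration.

```python
import xml.sax


class TitleHandler(xml.sax.ContentHandler):
    """Buffers character data and emits one string per element."""

    def __init__(self):
        super().__init__()
        self._buffer = []   # chunks received since the last start tag
        self.results = []   # (element name, complete text) pairs

    def startElement(self, name, attrs):
        # Start collecting afresh for each element
        self._buffer = []

    def characters(self, content):
        # May be called several times for one text node: only accumulate
        self._buffer.append(content)

    def endElement(self, name):
        # Join the chunks once the element is complete
        text = ''.join(self._buffer).strip()
        if text:
            self.results.append((name, text))
        self._buffer = []


def parse_titles(xml_bytes):
    """Parse XML bytes and return the collected (name, text) pairs."""
    handler = TitleHandler()
    xml.sax.parseString(xml_bytes, handler)
    return handler.results


# Hypothetical sample document, encoded the same way as the file in the post
SAMPLE = (
    u'<?xml version="1.0" encoding="ISO-8859-1"?>'
    u'<doc><title>Retentionsmaßnahmen in der Fläche - '
    u'Kleinrückhaltebecken</title></doc>'
).encode('ISO-8859-1')
```

Calling parse_titles(SAMPLE) yields one entry for the title element with the whole text in it, however many times the parser called characters(). This simple buffer assumes text-only elements; mixed content (text interleaved with child elements) would need a stack of buffers instead.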


