How to parse xml with ISO-8859-1 encoding using ElementTree and SimpleXMLTreeBuilder?

Zvi google at zvis.com
Tue May 13 16:26:00 EDT 2008


Hi All,

I'm having trouble parsing XML returned by a web service. The XML
data contains characters above 128, so ET.fromstring throws an error.
The error is thrown from Python's xmllib.py, where it detects a
character above 128.
I am replacing the utf-8 encoding string in the returned XML with
'ISO-8859-1', and then calling .encode with ISO-8859-1 as the
parameter. I still get the parsing error, illegal character.
What's interesting is that if I define a string constant and assign
it the value returned from the service request, it parses. I.e. the
following parses OK:


TEST_EVNVELOPE2 = """<?xml version="1.0" encoding="ISO-8859-1"?>
                     <soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"
                     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                     xmlns:xsd="http://www.w3.org/2001/XMLSchema">
                     <soap:Body><GetResponse xmlns="http://tempuri.org/">
                     <GetResult><?xml version="1.0" encoding="ISO-8859-1"?>
                     <Response><Entity Name="Accounts"
                     Current="00300571BDF91DDCA7D1320EE5C78877"><Field Name="Name"
                     Value="Bad und WA¤rmetechnik FA_hrwirt GmbH"/></Instance></Entity>
                     </Response></GetResult></GetResponse>
                     </soap:Body></soap:Envelope>"""

CHARSET = 'ISO-8859-1'

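# Swap in SimpleXMLTreeBuilder as the tree builder that ET uses
ET.XMLTreeBuilder = SimpleXMLTreeBuilder.TreeBuilder
spEnv = TEST_EVNVELOPE2
# Relabel the declared encoding, then encode the text to match it
spEnv = spEnv.replace('utf-16', CHARSET)
spEnv = spEnv.replace('utf-8', CHARSET)
dom = ET.fromstring(spEnv.encode(CHARSET))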

However, when spEnv is assigned response.content directly, I get a
parsing error in ET.fromstring, even though the TEST_EVNVELOPE2 value
was just pasted from the browser's source view. That is, I dumped
response.content into the rendered HTML and then copied it from the
HTML source.
Why does it work as a string constant but not as a variable value?
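
One possible explanation, offered only as an assumption: the round trip
through the browser changes the bytes, so the pasted constant and
response.content are not the same data. Rewriting the encoding label in
the declaration does not transcode the bytes that follow it. The sketch
below illustrates that mismatch with the standard-library
xml.etree.ElementTree in Python 3 (not SimpleXMLTreeBuilder); the sample
document is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; \u00e4 is the character "a with umlaut".
text = '<?xml version="1.0" encoding="ISO-8859-1"?><Field Value="W\u00e4rme"/>'

# Bytes that really are ISO-8859-1 match the declaration and parse cleanly.
latin1_bytes = text.encode("iso-8859-1")
good = ET.fromstring(latin1_bytes).get("Value")

# UTF-8 bytes carrying the same ISO-8859-1 label still parse, but the
# two-byte UTF-8 sequence is misread as two separate characters.
utf8_bytes = text.encode("utf-8")
bad = ET.fromstring(utf8_bytes).get("Value")

print(repr(good))
print(repr(bad))
```

If the second form resembles what shows up in the parsed data, the
service is probably sending UTF-8 and the label should be left alone.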

So, what's the correct way to make parsing work?
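
A minimal sketch of one approach, assuming the real charset of the
response is known (for example from the HTTP Content-Type header):
decode the raw bytes with that charset, drop the XML declaration so a
stale label cannot conflict, and hand the parser UTF-8 bytes. The helper
name parse_xml_bytes and the sample payload are hypothetical, and the
code uses Python 3's standard xml.etree.ElementTree rather than
SimpleXMLTreeBuilder.

```python
import re
import xml.etree.ElementTree as ET


def parse_xml_bytes(raw, charset):
    """Hypothetical helper: decode raw bytes with their real charset, then parse."""
    text = raw.decode(charset)
    # Remove a leading XML declaration, whose encoding label may be wrong.
    text = re.sub(r'^\s*<\?xml[^>]*\?>', '', text, count=1)
    # Without a declaration the parser treats the bytes as UTF-8.
    return ET.fromstring(text.encode("utf-8"))


# Made-up payload: labeled utf-8 but actually encoded as ISO-8859-1.
raw = ('<?xml version="1.0" encoding="utf-8"?>'
       '<Field Value="W\u00e4rmetechnik"/>').encode("iso-8859-1")
root = parse_xml_bytes(raw, "iso-8859-1")
print(repr(root.get("Value")))
```

Passing the wrong charset to the helper fails or garbles the text in
the same way, so the charset has to come from the service, not from a
guess.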
