Parsing XML with ElementTree (unicode problem?)

André andre.roberge at gmail.com
Tue Jul 24 08:26:15 EDT 2007


On Jul 23, 11:29 am, oren.t... at gmail.com wrote:
> (this question was also posted in the devshed python forum:http://forums.devshed.com/python-programming-11/parsing-xml-with-elem...
> ).
> -----------------------------
>
> (it's a bit longish but I hope I give all the information)
>
> 1. here is my problem: I'm trying to parse an XML file (saved locally)
> using elementtree.parse but I get the following error:
> xml.parsers.expat.ExpatError: not well-formed (invalid token): line
> 13, column 327
> apparently, the problem is caused by the token 'Saunière' due to the
> apostrophe.
>
> the thing is that I'm sure that python (ElementTree module and parse()
> function) can handle this type of encoding since I obtain my xml file
> from the web by opening it with:
>
> from elementtree import ElementTree
> from urllib import urlopen
> query = r'http://ecs.amazonaws.com/onca/xml?Service=AWSECommerceService&AWSAccessKeyId=189P5TE3VP7N9MN0G302&Operation=ItemLookup&ItemId=1400079179&ResponseGroup=Reviews&ReviewPage=166'
> root = ElementTree.parse(urlopen(query))
>
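parse() itself does not take an encoding argument, but the parser
object does (at least in the xml.etree.ElementTree that ships with
newer Pythons; I have not checked the standalone elementtree package).
How about trying
root = ElementTree.parse(urlopen(query), parser=ElementTree.XMLParser(encoding='utf-8'))

If the locally saved copy ended up in a different encoding than the
one the file declares, pass that encoding instead.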
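To try the idea without hitting the network, here is a minimal sketch written for Python 3 and the standard library's xml.etree.ElementTree. The sample document, the RAW name and the two helper functions are made up for illustration, and I am only assuming that the local file was saved in Latin-1 without a matching XML declaration. As far as I know, parse() has no encoding keyword, so the override is passed to XMLParser instead.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample document: Latin-1 bytes with no XML declaration,
# so the parser assumes UTF-8 and rejects the byte that encodes 'è'.
RAW = "<review><name>Saunière</name></review>".encode("iso-8859-1")


def default_parse_fails(data):
    """Return True if parsing the bytes with default settings raises ParseError."""
    try:
        ET.fromstring(data)
    except ET.ParseError:
        return True
    return False


def parse_with_encoding(data, encoding):
    """Parse bytes while overriding the document encoding on the parser."""
    parser = ET.XMLParser(encoding=encoding)
    return ET.parse(io.BytesIO(data), parser=parser).getroot()


if __name__ == "__main__":
    print(default_parse_fails(RAW))
    print(parse_with_encoding(RAW, "iso-8859-1").findtext("name"))
```

The override only helps if it names the encoding the bytes are really in. If the file is not actually UTF-8, forcing 'utf-8' will give the same "invalid token" error.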

André



