[Tutor] encoding question
Steven D'Aprano
steve at pearwood.info
Sun Jan 5 11:55:38 CET 2014
On Sat, Jan 04, 2014 at 11:57:20PM -0800, Alex Kleider wrote:
> Well, I've tried the xml approach which seems promising but still I get
> an encoding related error.
> Is there a bug in the xml.etree module (not very likely, me thinks) or
> am I doing something wrong?
I'm no expert on XML, but it looks to me like it is a bug in
ElementTree. It doesn't appear to handle unicode strings correctly
(although perhaps it doesn't promise to).
A simple demonstration using Python 2.7:
py> import xml.etree.ElementTree as ET
py> ET.fromstring(u'<xml>a</xml>')
<Element 'xml' at 0xb7ca982c>
But:
py> ET.fromstring(u'<xml>á</xml>')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/xml/etree/ElementTree.py", line 1282, in XML
    parser.feed(text)
  File "/usr/local/lib/python2.7/xml/etree/ElementTree.py", line 1622, in feed
    self._parser.Parse(data, 0)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe1' in position 5: ordinal not in range(128)
An easy work-around:
py> ET.fromstring(u'<xml>á</xml>'.encode('utf-8'))
<Element 'xml' at 0xb7ca9a8c>
although, as I said, I'm no expert on XML and this may lead to errors
later on.
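The work-around can be wrapped in a small helper. This is only a minimal sketch: parse_unicode_xml is a name I made up for illustration, and it assumes the text carries no XML declaration naming some other encoding.

```python
import xml.etree.ElementTree as ET

def parse_unicode_xml(text, encoding='utf-8'):
    """Hypothetical helper: encode a unicode string to bytes, then parse it.

    Assumes the document has no XML declaration that contradicts
    the chosen encoding.
    """
    return ET.fromstring(text.encode(encoding))

# u'\xe1' is the character from the failing example above.
root = parse_unicode_xml(u'<xml>\xe1</xml>')
text = root.text  # the element text comes back as a unicode string
```

Passing bytes means the parser does the decoding itself, which avoids the implicit ASCII encode that raised the error.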
> There's no denying that the whole encoding issue is still not completely
> clear to me in spite of having devoted a lot of time to trying to grasp
> all that's involved.
Have you read Joel On Software's explanation?
http://www.joelonsoftware.com/articles/Unicode.html
It's well worth reading. Start with that, and then ask if you have any
further questions.
> Here's what I've got:
>
> alex at x301:~/Python/Parse$ cat ip_xml.py
> #!/usr/bin/env python
> # -*- coding : utf -8 -*-
> # file: 'ip_xml.py'
[...]
> tree = ET.fromstring(xml)
> root = tree.getroot() # Here's where it blows up!!!
I reckon that what you need is to change the first line to:
tree = ET.fromstring(xml.encode('latin-1'))
or whatever the encoding is meant to be.
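One caveat, offered as a sketch rather than a definitive fix: the bytes you pass in should agree with the encoding the document declares. With no XML declaration the parser assumes UTF-8, so Latin-1 bytes should only parse if the document says so. The sample document below is invented for illustration and is not your file.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the declaration names the encoding of the bytes.
doc = u'<?xml version="1.0" encoding="iso-8859-1"?><xml>\xe1</xml>'

# fromstring() returns the root Element itself, so no getroot() call is needed.
root = ET.fromstring(doc.encode('latin-1'))

# Without a declaration the parser assumes UTF-8 and rejects Latin-1 bytes.
try:
    ET.fromstring(u'<xml>\xe1</xml>'.encode('latin-1'))
    undeclared_ok = True
except ET.ParseError:
    undeclared_ok = False
```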
--
Steven