Python parsing iTunes XML/COM

Jerry Hill malaclypse2 at gmail.com
Thu Jul 31 17:45:48 EDT 2008


On Thu, Jul 31, 2008 at 9:44 AM, william tanksley <wtanksleyjr at gmail.com> wrote:
> I'm using a file, a file that's correctly encoded as UTF-8, and it
> returns some text elements that are raw bytes (undecoded). I have to
> manually decode them.

I can't reproduce this behavior.  Here's a simple test case:

C:\Program Files\Python25>python -V
Python 2.5.2

C:\Program Files\Python25>more t.py
import xml.etree.cElementTree as ET

xml_string = """<?xml version="1.0" encoding="UTF-8"?>
<character title="GREEK SMALL LETTER PI">\xcf\x80</character>"""

outfile = open('sample.xml', 'wb')
outfile.write(xml_string)
outfile.close()

tree = ET.parse('sample.xml')
root = tree.getroot()
print type(root.text)
print repr(root.text)
print root.text


C:\Program Files\Python25>python t.py
<type 'unicode'>
u'\u03c0'
π

That works as expected.  I wrote out a UTF-8 encoded bytestring with
a proper XML encoding declaration.  When I parsed the file with
cElementTree, the element text came back as unicode, not as raw bytes.
Does this same program work for you?  If so, please show us more of
your code so we can see where things are going wrong.
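
For anyone repeating this check on Python 3, where cElementTree is gone
and str is already a unicode type, a rough equivalent might look like
the sketch below.  The parse_text helper and the use of a temporary file
are assumptions made for illustration; they are not part of the test
case above.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# UTF-8 bytes for the same document; \xcf\x80 encodes GREEK SMALL LETTER PI.
xml_bytes = (b'<?xml version="1.0" encoding="UTF-8"?>\n'
             b'<character title="GREEK SMALL LETTER PI">\xcf\x80</character>')


def parse_text(data):
    """Hypothetical helper: write data to a temp file, parse it,
    and return the text of the root element."""
    fd, path = tempfile.mkstemp(suffix='.xml')
    try:
        with os.fdopen(fd, 'wb') as outfile:
            outfile.write(data)
        return ET.parse(path).getroot().text
    finally:
        os.remove(path)


text = parse_text(xml_bytes)
print(type(text))          # the parser hands back decoded text, not bytes
print(ascii(text))         # escaped form, safe on any console encoding
print(text == '\u03c0')
```

If the last line prints True, the parser decoded the UTF-8 bytes itself
and no manual decoding step was needed.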

-- 
Jerry

