Unicode and coercion
Wilfredo Sánchez
wsanchez at apple.com
Fri Dec 12 13:44:58 EST 2003
So I'm running into the very lovely exception:
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 5: ordinal not in range(128)
And I've got some workarounds, but I'd like to better understand
what's going on. First the code that throws:
    outfile.write(upc + "\t" +
                  title + "\t" +
                  playlist.display_artist() + "\t" +
                  playlist.release_date() + "\t")
The offending input is playlist.display_artist(), which returns a
unicode string obtained by parsing UTF-8 encoded XML. The artist name
is Chanté Moore, and I verified that the XML is encoded correctly.
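
For reference, the same message can be reproduced in isolation. This is
only a sketch of one way it can arise, assuming the artist name exists
somewhere as UTF-8 encoded bytes and that those bytes then get decoded
with the ASCII codec, which is what Python 2 does implicitly when a byte
string is concatenated with a unicode string. The variable names are
made up for illustration.

```python
# Hypothetical stand-in for the artist name as UTF-8 encoded bytes.
# The e-acute becomes the two bytes 0xc3 0xa9, starting at index 5.
artist_bytes = u"Chant\u00e9 Moore".encode("utf-8")

try:
    # Explicit form of the decode that Python 2 performs implicitly
    # when a byte string is added to a unicode string.
    artist_bytes.decode("ascii")
    message = None
except UnicodeDecodeError as exc:
    message = str(exc)

print(message)
```

The byte and position reported match the exception above, which suggests
the failure is in decoding UTF-8 bytes as ASCII rather than in the XML
itself.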
So I changed the playlist class so that all strings fetched from XML
get encode('utf-8') called on them, but this still craps out, so that
wasn't the (only?) problem. What's surprising is that this works:
    outfile.write(upc + "\t")
    outfile.write(title + "\t")
    outfile.write(playlist.display_artist() + "\t")
    outfile.write(playlist.release_date() + "\t")
This is surprising because I would have expected to have to separate
the "\t"s as well. Can someone explain what's going on? Why does it
try to coerce the string to ascii in the first case but not the second?
And shouldn't utf-8 work in any case?
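
One workaround that sidesteps the mixing entirely, sketched under the
assumption that every field can be kept as a unicode string until the
moment it is written: join the fields as text and encode exactly once.
The helper name write_row and the sample values are hypothetical, not
taken from the real code, and the in-memory buffer stands in for
outfile.

```python
import io


def write_row(outfile, fields):
    """Join unicode fields with tabs and write them as UTF-8 bytes."""
    # Every field is followed by a tab, as in the original write call.
    line = u"".join(field + u"\t" for field in fields)
    # Encode once, at the boundary, so no implicit ASCII coercion occurs.
    outfile.write(line.encode("utf-8"))


# Made-up sample values; a bytes buffer stands in for the output file.
buf = io.BytesIO()
write_row(buf, [u"000000000000", u"Some Title", u"Chant\u00e9 Moore", u"2003"])
print(repr(buf.getvalue()))
```

The point of the design is that the program only ever concatenates
strings of one type, so the interpreter never has to guess an encoding.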
-wsv