BUG? Python 2.0 chokes on international characters in Unicode strings
Fredrik Lundh
fredrik at effbot.org
Wed Jan 31 09:01:43 EST 2001
Jurie Horneman wrote:
> Is this a known bug? (If so, AAARRGGHHH - could it really be that Python
> basically doesn't work outside of the US? Hard to believe: these characters
> are used in the Dutch language...)
it's not a bug -- it's just that there are only 128 ASCII code
points (0 through 127), and 40,000+ unicode characters...
> Is there some workaround? Could I convert Unicode strings to ASCII? If so,
> how?
use the encode method:
u = u"caf\xe9" # example value; stands in for any unicode string
s = str(u) # will fail for non-ascii characters
s = u.encode("utf-8") # will always work
s = u.encode("iso-8859-1") # may fail for non-latin-1 characters
s = u.encode("ascii", "ignore") # won't fail, but may lose chars
s = u.encode("ascii", "replace") # won't fail, will replace non-ascii chars
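To make the difference between these calls concrete, here is a minimal sketch run against one sample string. The sample value and the variable names are only illustrative assumptions, and the non-ASCII character is written as an escape so the source file itself stays plain ASCII:

```python
# Illustrative sample: "cafe" with an e-acute (code point 233) at the end.
u = u"caf\xe9"

utf8 = u.encode("utf-8")                 # e-acute becomes two bytes: 0xC3 0xA9
latin1 = u.encode("iso-8859-1")          # e-acute becomes one byte: 0xE9
ignored = u.encode("ascii", "ignore")    # e-acute is silently dropped
replaced = u.encode("ascii", "replace")  # e-acute becomes "?"

# Without an error handler, the ascii codec refuses the character.
try:
    u.encode("ascii")
    strict_failed = False
except UnicodeError:
    strict_failed = True
```

Here "ignore" yields the three characters "caf" and "replace" yields "caf?", so both lose information; only utf-8 (or an encoding that actually contains the character) round-trips.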
import locale
# get the most likely output encoding (works on most unix,
# windows, and macintosh installations)
loc, enc = locale.getdefaultlocale()
print u.encode(enc, "replace")
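One caveat with the locale approach: getdefaultlocale() can return None for the encoding when it cannot work one out, and it may name an encoding that no installed codec knows. A rough sketch of a guard follows; safe_output_encoding is a hypothetical helper name, not something in the standard library, and the "ascii" fallback is an assumption you may want to change:

```python
import codecs
import locale

def safe_output_encoding(fallback="ascii"):
    # Hypothetical helper: return the locale's encoding if a codec
    # exists for it, otherwise the fallback.
    try:
        enc = locale.getdefaultlocale()[1]
        codecs.lookup(enc)  # raises if enc is None or unknown
    except Exception:
        # Deliberately broad: covers a failed locale lookup, a None
        # encoding, and an encoding with no matching codec.
        enc = fallback
    return enc

u = u"caf\xe9"  # illustrative sample value
s = u.encode(safe_output_encoding(), "replace")
```

With the "replace" handler the final encode step should not fail on unencodable characters, whichever encoding the helper ends up choosing.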
see
http://www.python.org/doc/current/lib/string-methods.html
for more info on the encode method,
http://www.python.org/doc/current/lib/module-codecs.html
for more info on codecs (including stream codecs)
Cheers /F