[issue3995] iso-xxx/cp1252 inconsistencies in Python 2.* not in 3.*
STINNER Victor
report at bugs.python.org
Mon Sep 29 12:14:30 CEST 2008
STINNER Victor <victor.stinner at haypocalc.com> added the comment:
If you write "€" in the Python interpreter (Python2), you will get a
*bytes* string encoded in your terminal charset. Example on Linux
(utf-8):
Python 2.5.1 (r251:54863, Jul 31 2008, 23:17:40)
>>> '€'
'\xe2\x82\xac'
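To see why the bytes depend on the terminal charset, here is a minimal sketch that encodes the same character in three charsets. The charset list is picked for illustration only, and the character is written as an escape so the snippet does not depend on the terminal or file encoding:

```python
# Illustration only: one character, different bytes per charset.
euro = u'\u20ac'  # EURO SIGN

# Map each (illustrative) charset name to the encoded bytes.
encoded = {}
for charset in ('utf-8', 'cp1252', 'iso-8859-15'):
    encoded[charset] = euro.encode(charset)
    print('%-12s %r' % (charset, encoded[charset]))
```

A UTF-8 terminal produces three bytes for the euro sign, while cp1252 and ISO-8859-15 each produce a single, different byte.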
Use the "u" prefix to get a unicode string:
Python 2.5.1 (r251:54863, Jul 31 2008, 23:17:40)
>>> u'€'
u'\u20ac'
If you use unicode, encoding to ISO-8859-1/-15 behaves correctly:
ISO-8859-1 has no euro sign, so encoding fails, while ISO-8859-15
maps it to the byte 0xA4.
(Truncated) example with Python trunk:
Python 2.6rc2+ (trunk:66680M, Sep 29 2008, 12:03:32)
>>> u'€'.encode('ISO-8859-1')
...
UnicodeEncodeError: 'latin-1' codec can't encode character u'\u20ac'
>>> u'€'.encode('ISO-8859-15')
'\xa4'
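If a script has to cope with a charset that may lack the character, one possible approach (a sketch, not something prescribed in this issue) is to catch the error or pass an error handler. The helper name encode_or_none is hypothetical:

```python
def encode_or_none(text, charset):
    """Hypothetical helper: return the encoded bytes, or None when the
    charset cannot represent every character of text."""
    try:
        return text.encode(charset)
    except UnicodeEncodeError:
        return None

euro = u'\u20ac'  # EURO SIGN

print(repr(encode_or_none(euro, 'iso-8859-1')))   # not representable
print(repr(encode_or_none(euro, 'iso-8859-15')))  # representable

# Alternative: substitute '?' for unencodable characters instead of failing.
print(repr(euro.encode('iso-8859-1', 'replace')))
```

The 'replace' handler loses information, so it only suits output where an approximate rendering is acceptable.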
In a script (Python code written in a file), use a "# coding:" header to
specify your file charset, or use the "\xXX", "\uXXXX" and "\UXXXXXXXX"
notations for non-ASCII characters (the last two only work inside
u'...' literals in Python 2).
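A short sketch of both options in one file follows. The variable names are illustrative, and the utf-8 declaration is only an assumption about how the file is saved:

```python
# -*- coding: utf-8 -*-
# The line above declares the charset of this source file; it has to be
# on the first or second line of the file.

euro_short = u'\u20ac'         # 4-digit escape
euro_long = u'\U000020ac'      # 8-digit escape, same character
euro_named = u'\N{EURO SIGN}'  # lookup by Unicode character name

# \xXX escapes spell out raw bytes; decoding them gives the character back.
utf8_bytes = b'\xe2\x82\xac'
decoded = utf8_bytes.decode('utf-8')

print(euro_short == euro_long == euro_named == decoded)
```

Escapes keep the source pure ASCII, so the file reads the same whatever charset the editor or terminal uses.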
Is there a Python Unicode FAQ somewhere? :-)
----------
nosy: +haypo
_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue3995>
_______________________________________