[issue3995] iso-xxx/cp1252 inconsistencies in Python 2.* not in 3.*

STINNER Victor report at bugs.python.org
Mon Sep 29 12:14:30 CEST 2008


STINNER Victor <victor.stinner at haypocalc.com> added the comment:

If you type "€" in the Python 2 interactive interpreter, you get a 
*bytes* string encoded in your terminal's charset. Example on Linux 
(UTF-8 terminal):

Python 2.5.1 (r251:54863, Jul 31 2008, 23:17:40)
>>> '€'
'\xe2\x82\xac'

Use the "u" prefix to get a unicode string:

Python 2.5.1 (r251:54863, Jul 31 2008, 23:17:40)
>>> u'€'
u'\u20ac'
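For comparison, Python 3 (which the issue title says does not show these inconsistencies) keeps the two cases apart by type: a literal is already text (`str`), and bytes appear only when you encode explicitly. A minimal sketch in Python 3 syntax, assuming UTF-8 as in the Linux session above:

```python
# Python 3: the euro sign is a text (str) object holding one code point, U+20AC.
euro = '\u20ac'
print(type(euro).__name__, len(euro), hex(ord(euro)))  # str 1 0x20ac

# Encoding to UTF-8 yields the same three bytes Python 2 showed for '€'.
utf8_bytes = euro.encode('utf-8')
print(utf8_bytes)  # b'\xe2\x82\xac'

# Decoding those bytes gives back the original text.
assert utf8_bytes.decode('utf-8') == euro
```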

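If you use unicode, encoding to ISO-8859-1/-15 works correctly: 
ISO-8859-1 has no euro sign, so it raises UnicodeEncodeError, while 
ISO-8859-15 maps it to byte 0xA4. Example with the Python trunk 
(traceback truncated):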

Python 2.6rc2+ (trunk:66680M, Sep 29 2008, 12:03:32)
>>> u'€'.encode('ISO-8859-1')
...
UnicodeEncodeError: 'latin-1' codec can't encode character u'\u20ac'
>>> u'€'.encode('ISO-8859-15')
'\xa4'
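The same check can be run over all three charsets named in the issue title. A small Python 3 sketch; cp1252 is included only because the title mentions it (in that Windows codepage the euro sign sits at 0x80), and a failed encoding is recorded as None:

```python
# Encode U+20AC (EURO SIGN) with each charset from the issue title.
euro = '\u20ac'

results = {}
for codec in ('iso-8859-1', 'iso-8859-15', 'cp1252'):
    try:
        results[codec] = euro.encode(codec)
    except UnicodeEncodeError:
        # ISO-8859-1 (latin-1) has no euro sign, so encoding fails.
        results[codec] = None

for codec, value in results.items():
    print(codec, value)
```

Only ISO-8859-1 fails; the other two succeed with different byte values, which is why text must always be decoded with the charset it was actually written in.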

In a script (Python code written in a file), use a "# coding" header 
(PEP 263) to declare the file's charset, or use the "\xXX", "\uXXXX" 
and "\UXXXXXXXX" escape notations for non-ASCII characters.
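Both approaches can be checked from Python 3. A rough sketch: the escapes are compared directly (the \N{...} named escape is an extra notation not mentioned above), and a coding header is exercised by compiling an in-memory source given as bytes; the name `text` and the `<example>` filename are made up for illustration:

```python
# \uXXXX and \UXXXXXXXX spell the same character; \U always takes 8 hex digits.
assert '\u20ac' == '\U000020ac' == '\N{EURO SIGN}'
# \xXX only reaches U+00FF: '\xa4' is CURRENCY SIGN, not the euro sign.
assert '\xa4' == '\u00a4'

# Hypothetical source file whose PEP 263 header declares ISO-8859-15.
# The raw byte 0xA4 inside the quotes is the euro sign in that charset.
source = b"# -*- coding: iso-8859-15 -*-\ntext = '\xa4'\n"
namespace = {}
exec(compile(source, '<example>', 'exec'), namespace)
print(ascii(namespace['text']))  # '\u20ac'
```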

Is there a Python Unicode FAQ somewhere? :-)

----------
nosy: +haypo

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue3995>
_______________________________________

