How to turn a string into a list of integers?
Peter Otten
__peter__ at web.de
Sat Sep 6 04:22:30 EDT 2014
Steven D'Aprano wrote:
>> >>> import sys
>> >>> sys.getdefaultencoding()
>> 'ascii'
>
> That's technically known as a "lie", since if it were *really* ASCII it
> would refuse to deal with characters with the high-bit set. But it
> doesn't, it treats them in an unpredictable and implementation-dependent
> manner.
It's not a lie. The default encoding governs implicit conversions such as
str(u"..."); it just doesn't control the unicode-to-bytes conversion when
printing:
$ python
Python 2.7.6 (default, Mar 22 2014, 22:59:56)
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.getdefaultencoding()
'ascii'
>>> print u"äöü"
äöü
>>> str(u"äöü")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)
>>> reload(sys)
<module 'sys' (built-in)>
>>> sys.setdefaultencoding("latin1")
>>> print u"äöü"
äöü
>>> str(u"äöü")
'\xe4\xf6\xfc'
>>> sys.setdefaultencoding("utf-8")
>>> print u"äöü"
äöü
>>> str(u"äöü")
'\xc3\xa4\xc3\xb6\xc3\xbc'
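The same three conversions can be requested explicitly, without touching the default encoding at all. This is only a minimal sketch: try_encode is a hypothetical helper name, not something from the standard library, and the string is spelled with escapes so the snippet does not depend on the source file encoding.

```python
# The string u"äöü" from the session above, written with escapes.
text = u"\xe4\xf6\xfc"


def try_encode(value, encoding):
    """Return value encoded with the given codec, or None if it cannot be encoded.

    Hypothetical helper for illustration only.
    """
    try:
        return value.encode(encoding)
    except UnicodeEncodeError:
        return None


# ascii cannot represent ordinals 228, 246 and 252, so this yields None.
as_ascii = try_encode(text, "ascii")
# latin1 maps each character to a single byte.
as_latin1 = try_encode(text, "latin1")
# utf-8 needs two bytes for each of these characters.
as_utf8 = try_encode(text, "utf-8")
```

Spelling out the codec this way is usually preferable to reload(sys) plus sys.setdefaultencoding(), which changes the behaviour of every implicit conversion in the process.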
You can enforce ASCII-only printing by starting Python in the C locale:
$ LANG=C python
Python 2.7.6 (default, Mar 22 2014, 22:59:56)
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> print unichr(228)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe4' in position 0: ordinal not in range(128)
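The effect can also be reproduced without changing the environment, by writing to a text stream that encodes with a chosen codec. A rough sketch, assuming an in-memory stream is an acceptable stand-in for a terminal; can_write is a hypothetical name introduced here for illustration.

```python
import io


def can_write(text, encoding):
    """Return True if text can be written to a stream using the given encoding.

    Hypothetical helper: the in-memory stream stands in for sys.stdout.
    """
    stream = io.TextIOWrapper(io.BytesIO(), encoding=encoding)
    try:
        stream.write(text)
        stream.flush()
    except UnicodeEncodeError:
        # Same failure that print raises under LANG=C.
        return False
    return True


ascii_ok = can_write(u"\xe4", "ascii")   # the character from unichr(228)
utf8_ok = can_write(u"\xe4", "utf-8")
```

With an ascii stream the write fails just like the print above; with utf-8 it goes through.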
To find out the encoding that the current locale selects:
$ python -c 'import locale; print locale.getpreferredencoding()'
UTF-8
$ LANG=C python -c 'import locale; print locale.getpreferredencoding()'
ANSI_X3.4-1968
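The odd-looking ANSI_X3.4-1968 is just another name for ASCII. A small sketch of how the names can be normalized with codecs.lookup(); canonical_name is a hypothetical helper, and looking at sys.stdout.encoding as well is an addition made here, on the assumption that you want to see what the output stream itself reports.

```python
import codecs
import locale
import sys


def canonical_name(encoding):
    """Return Python's canonical codec name for an encoding alias.

    Hypothetical helper built on codecs.lookup().
    """
    return codecs.lookup(encoding).name


# What the locale suggests, e.g. UTF-8 or ANSI_X3.4-1968.
preferred = locale.getpreferredencoding()

# What the output stream reports; may be None or missing when stdout
# is redirected or replaced, hence the getattr().
stdout_encoding = getattr(sys.stdout, "encoding", None)

# Both aliases resolve to the same codec.
c_locale_codec = canonical_name("ANSI_X3.4-1968")
utf8_codec = canonical_name("UTF-8")
```

Comparing canonical names avoids treating "UTF-8" and "utf8", or "ascii" and "ANSI_X3.4-1968", as different encodings.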
"""
Help on function getpreferredencoding in module locale:

getpreferredencoding(do_setlocale=True)
    Return the charset that the user is likely using,
    according to the system configuration.
"""