How to turn a string into a list of integers?

Peter Otten __peter__ at web.de
Sat Sep 6 04:22:30 EDT 2014


Steven D'Aprano wrote:

>> >>> import sys
>> >>> sys.getdefaultencoding()
>> 'ascii'
> 
> That's technically known as a "lie", since if it were *really* ASCII it
> would refuse to deal with characters with the high-bit set. But it
> doesn't, it treats them in an unpredictable and implementation-dependent
> manner.

It's not a lie; the default encoding just doesn't control the unicode-to-bytes
conversion when printing. print uses the terminal's encoding, while str()
uses the default encoding:

$ python
Python 2.7.6 (default, Mar 22 2014, 22:59:56)
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.getdefaultencoding()
'ascii'
>>> print u"äöü"
äöü
>>> str(u"äöü")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)
>>> reload(sys)
<module 'sys' (built-in)>
>>> sys.setdefaultencoding("latin1")
>>> print u"äöü"
äöü
>>> str(u"äöü")
'\xe4\xf6\xfc'
>>> sys.setdefaultencoding("utf-8")
>>> print u"äöü"
äöü
>>> str(u"äöü")
'\xc3\xa4\xc3\xb6\xc3\xbc'
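
Note that reload(sys) followed by sys.setdefaultencoding() changes every
implicit conversion in the process. A minimal sketch of the explicit
alternative, which produces the same byte strings as the session above without
touching the default encoding; the variable names are mine, and the code is
written so that it should run on both Python 2.7 and Python 3:

```python
# -*- coding: utf-8 -*-
# Sketch: encode explicitly instead of changing sys.getdefaultencoding().
text = u"\xe4\xf6\xfc"  # the string u"äöü" from the session above

latin1_bytes = text.encode("latin-1")  # one byte per character
utf8_bytes = text.encode("utf-8")      # two bytes per character for these three

try:
    # This is the conversion str(u"äöü") attempts on Python 2
    # while the default encoding is still 'ascii'.
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
```

Here latin1_bytes and utf8_bytes correspond to the two str() results shown
above, and ascii_ok ends up False.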

You can enforce ASCII-only printing by selecting the C locale:

$ LANG=C python
Python 2.7.6 (default, Mar 22 2014, 22:59:56) 
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> print unichr(228)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe4' in position 0: ordinal not in range(128)
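
What print does with a unicode string can be approximated by encoding it with
the output stream's encoding. A rough sketch, assuming a hypothetical helper
printed_bytes() that writes to an in-memory stream instead of the real
terminal; it illustrates the behaviour and is not how the interpreter is
implemented:

```python
import io


def printed_bytes(text, encoding):
    """Return the bytes a text stream with the given encoding would emit.

    Hypothetical helper for illustration only.
    """
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding=encoding)
    stream.write(text)   # raises UnicodeEncodeError if text does not fit
    stream.flush()       # push the encoded bytes into the BytesIO buffer
    return raw.getvalue()


def fits(text, encoding):
    """Report whether text can be written to a stream with this encoding."""
    try:
        printed_bytes(text, encoding)
    except UnicodeEncodeError:
        return False
    return True
```

With an ASCII stream, which is what LANG=C gives you, fits(u"\xe4", "ascii")
is False, while the same character goes through with "latin-1" or "utf-8".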

To find out which encoding the locale selects:

$ python -c 'import locale; print locale.getpreferredencoding()'
UTF-8
$ LANG=C python -c 'import locale; print locale.getpreferredencoding()'
ANSI_X3.4-1968

"""
Help on function getpreferredencoding in module locale:

getpreferredencoding(do_setlocale=True)
    Return the charset that the user is likely using,
    according to the system configuration.
"""

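To compare the encodings discussed in this thread side by side, something like
the following may help. It is only a sketch: encoding_report() is a name I made
up, and the values depend on your platform, locale and Python version:

```python
import locale
import sys


def encoding_report():
    """Collect the encodings discussed in this thread (hypothetical helper)."""
    return {
        # Used for implicit str <-> unicode conversions on Python 2.
        "default": sys.getdefaultencoding(),
        # Derived from the locale settings (LANG, LC_ALL, ...).
        "preferred": locale.getpreferredencoding(),
        # The encoding attached to stdout; on Python 2 this can be None
        # when the output is not a terminal.
        "stdout": getattr(sys.stdout, "encoding", None),
    }


for name, value in sorted(encoding_report().items()):
    print("%-9s %s" % (name, value))
```

Running it once normally and once with LANG=C should show "preferred" change
while "default" stays the same on Python 2.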



