[issue30338] LC_ALL=en_US + io.open() => LookupError: (osx)

Ronald Oussoren report at bugs.python.org
Thu May 11 12:43:23 EDT 2017


Ronald Oussoren added the comment:

On macOS 10.12:

ronald$ LC_ALL=en_US python2.7 -c 'import locale; print(repr(locale.getpreferredencoding()))'
''
ronald$ LC_ALL=en_US python3.6 -c 'import locale; print(repr(locale.getpreferredencoding()))'
'UTF-8'

getpreferredencoding uses the CODESET path on macOS, which means the result above is explained by this session (Python 2.7):

>>> import locale
>>> locale.setlocale(locale.LC_ALL, '')
'en_US'
>>> locale.nl_langinfo(locale.CODESET)
''
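
As an illustration (a sketch, not code from the tracker): the codec registry cannot resolve an empty encoding name, which is presumably where the LookupError in the subject line comes from once that empty string reaches io.open(). The helper name is_known_encoding is hypothetical.

```python
import codecs


def is_known_encoding(name):
    """Hypothetical helper: True if the codec registry can resolve `name`."""
    try:
        codecs.lookup(name)
    except LookupError:
        # Raised for names the registry cannot resolve, including ''.
        return False
    return True


# The empty string returned by nl_langinfo(CODESET) is not a usable codec name,
# while the values seen in the other sessions are.
for candidate in ('', 'UTF-8', 'US-ASCII'):
    print(repr(candidate), is_known_encoding(candidate))
```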

Note that _pyio uses locale.getpreferredencoding(), not locale.getpreferredencoding(False). The latter skips the setlocale() call and would use US-ASCII as the encoding:

>>> import locale
>>> locale.nl_langinfo(locale.CODESET)
'US-ASCII'


I guess the empty string for the encoding is explained by the following shell session that looks at the locale information:

$ LC_ALL=en_US.UTF-8 locale -ck LC_ALL | grep charmap
charmap="UTF-8"

$ LC_ALL=en_US locale -ck LC_ALL | grep charmap
charmap=

In Python 3, locale.getpreferredencoding (or rather, the same function in _bootlocale) was tweaked to deal with this problem:

if not result and sys.platform == 'darwin':
     # nl_langinfo can return an empty string
     # when the setting has an invalid value.
     # Default to UTF-8 in that case because
     # UTF-8 is the default charset on OSX and
     # returning nothing will crash the
     # interpreter.
     result = 'UTF-8'

Backporting this to 2.7 would IMHO be the best way to fix this issue.
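
A rough sketch of what such a fallback could look like when applied around the existing call, assuming the same condition as the Python 3 snippet above. The names apply_darwin_fallback and safe_preferred_encoding are hypothetical and this is not the actual patch.

```python
import locale
import sys


def apply_darwin_fallback(result, platform=sys.platform):
    """Hypothetical helper mirroring the Python 3 check quoted above."""
    if not result and platform == 'darwin':
        # nl_langinfo(CODESET) can return '' on macOS for locales such as
        # en_US that carry no charmap; fall back to UTF-8 in that case.
        return 'UTF-8'
    return result


def safe_preferred_encoding(do_setlocale=True):
    """Hypothetical wrapper: getpreferredencoding() plus the fallback."""
    return apply_darwin_fallback(locale.getpreferredencoding(do_setlocale))
```

With this in place, an empty result on macOS would become 'UTF-8', while non-empty results and other platforms are left untouched.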

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue30338>
_______________________________________

