[issue30755] locale.normalize() and getdefaultlocale() convert C.UTF-8 to en_US.UTF-8

Matthew Woodcraft report at bugs.python.org
Mon Sep 25 18:37:16 EDT 2017


Matthew Woodcraft added the comment:

I've investigated a bit more.

First, I've tried with Python 3.7.0a1. As you'd expect, PEP 538 (which
coerces the legacy C locale to C.UTF-8) means this behaviour now also
occurs when no locale environment variables at all are set.


Second, I've looked through locale.py a bit. I believe what it calls the
"aliasing engine" is applied for:

 - getlocale()
 - getdefaultlocale()
 - setlocale() when passed a tuple, but not when passed a string
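
A minimal sketch for checking what the aliasing engine does to a given
name on whatever Python you are running. The helper name
aliasing_changes is made up for illustration; locale.normalize() is the
public entry point to the alias table. No expected output is shown
because the result depends on the Python version.

```python
import locale


def aliasing_changes(name):
    """Return (normalized_name, changed) for a locale name string.

    aliasing_changes is a hypothetical helper, not part of locale.py.
    """
    normalized = locale.normalize(name)
    return normalized, normalized != name


for name in ("C", "C.UTF-8", "en_US.UTF-8"):
    normalized, changed = aliasing_changes(name)
    print(f"{name!r} -> {normalized!r} (changed: {changed})")
```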


This leads to some rather odd results.

With 3.7.0a1 and no locale environment variables:

  >>> import locale
  >>> locale.getlocale()
  ('en_US', 'UTF-8')

  # getlocale() is lying: the effective locale is really C.UTF-8
  >>> sorted("abcABC", key=locale.strxfrm)
  ['A', 'B', 'C', 'a', 'b', 'c']
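
One way to see the mismatch directly, as a rough sketch: calling
setlocale() with only a category queries the current setting and returns
the string from the C library as-is, while getlocale() parses that
string through the alias table. The helper name describe_category is an
assumption for illustration.

```python
import locale


def describe_category(category):
    """Return (raw, parsed) views of the current setting for a category.

    describe_category is a hypothetical helper, not part of locale.py.
    """
    raw = locale.setlocale(category)  # query only, nothing is changed
    try:
        parsed = locale.getlocale(category)  # goes through the alias table
    except ValueError as exc:
        # getlocale() can fail to parse names it does not recognise
        parsed = f"unparseable: {exc}"
    return raw, parsed


raw, parsed = describe_category(locale.LC_CTYPE)
print("raw setlocale() query:", raw)
print("getlocale() result:   ", parsed)
```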


Third, I've checked on a system which does have en_US.UTF-8 installed,
and (as you'd expect) instead of crashing it gives wrong results:

  >>> import locale
  >>> locale.setlocale(locale.LC_ALL, ('C', 'UTF-8'))
  'en_US.UTF-8'
  >>> locale.getlocale()
  ('en_US', 'UTF-8')

  # now getlocale() is telling the truth, and the user isn't getting the
  # collation they requested
  >>> sorted("abcABC", key=locale.strxfrm)
  ['a', 'A', 'b', 'B', 'c', 'C']
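
As a possible workaround sketch, passing the locale name as a plain
string rather than a tuple hands it to the C library without going
through the alias table. The helper name sorted_under is an assumption
for illustration, and which locale names are available depends on the
system.

```python
import locale


def sorted_under(name, text):
    """Sort the characters of text with LC_COLLATE set from a string.

    sorted_under is a hypothetical helper, not part of locale.py.
    """
    saved = locale.setlocale(locale.LC_COLLATE)  # remember current setting
    try:
        # A string argument is passed through without aliasing.
        locale.setlocale(locale.LC_COLLATE, name)
        return sorted(text, key=locale.strxfrm)
    finally:
        locale.setlocale(locale.LC_COLLATE, saved)  # always restore


for name in ("C", "C.UTF-8", "en_US.UTF-8"):
    try:
        print(name, sorted_under(name, "abcABC"))
    except locale.Error as exc:
        print(name, "is not available here:", exc)
```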

----------
versions: +Python 3.7

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue30755>
_______________________________________

