to better observe locales

Guido van Rossum guido at beopen.com
Sat Jul 29 11:26:55 EDT 2000


"Alexander V. Voinov" <avv at quasar.ipa.nw.ru> writes:

> It seems that Python would better observe locales at almost no cost.
> 
> 1. list.sort(). If the sorted list is just a list of strings, it's
> trivial to pass 'strcoll' as an argument. But if the list element is a
> tuple, I have to write my own comparison function which would use
> locale.

There are lots of good reasons why observing locales on string
comparisons with the < operator is a bad idea.  (The same reasons
that require you to use strcoll or equivalent in C, C++ or Java.)

But I'll give you a suggestion for how to compare tuples of strings
efficiently.

Say you have a list L of tuples containing (lastname, firstname,
address) records.  To sort L, you write:

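  from locale import strxfrm
  # Decorate: prefix each record with locale-transformed sort keys.
  L1 = []
  for t in L:
    L1.append((strxfrm(t[0]), strxfrm(t[1]), t))
  # Sort: plain tuple comparison now follows the locale's collation.
  L1.sort()
  # Undecorate: drop the keys, keeping the records in sorted order.
  L2 = []
  for (dum1, dum2, t) in L1:
    L2.append(t)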

Now the sorted version of L is in L2.

Believe me, for large lists this is much faster than writing a
comparison function and passing it to L.sort().
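
For readers on later Python versions (2.4 and up, where sorting accepts
a key argument), the same idea can be sketched more compactly.  This is
only an illustration: the name sort_records and the sample data are
made up here, and unlike the loop above it leaves records with equal
names in their original order instead of comparing the full tuples.

```python
import locale

def sort_records(records):
    """Return records sorted by (lastname, firstname) under the current locale.

    sort_records is a hypothetical helper name used for illustration only.
    """
    def collation_key(record):
        lastname, firstname = record[0], record[1]
        # strxfrm() is applied once per field; the transformed strings
        # compare the way strcoll() would compare the originals.
        return (locale.strxfrm(lastname), locale.strxfrm(firstname))
    return sorted(records, key=collation_key)

# Made-up sample data in (lastname, firstname, address) form.
records = [
    ("Smith", "John", "1 Main St"),
    ("Adams", "Zoe", "2 Oak Ave"),
    ("Adams", "Amy", "3 Elm Rd"),
]
result = sort_records(records)
```

The key function runs once per record, so the cost of the locale
transformation is paid N times rather than once per comparison.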

> 2. Those '\xxx\yyy\zzz' in the string literals, emitted by 'repr'. It
> would be a good idea not to encode that way those letters, which are
> claimed as letters by the locale.

Good idea.  I've proposed this myself a few times.

> The default locale is easily read from the environment both on Unices
> and on win32. No idea about the rest of the world. Therefore, if the
> locale, read from the environment, is not 'en', one may just apply its
> rules immediately upon interpreter startup (for the two mentioned
> issues, and maybe some others: use strcoll instead of strcmp _everywhere
> internally_, use letter ranges, defined by the locale, etc). Otherwise
> the current Python support of locales would really have no practical use,
> even mislead those who believe that support to exist.

I'm not sure why you say this is misleading.

There's a good reason for *not* setting the locale from the
environment by default -- it would screw up programs that aren't
locale-aware.  The program must make the call to locale.setlocale() to
indicate that it is aware of the locale.
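
As a minimal sketch of that opt-in (the variable names are only for
illustration, and the fallback on failure is one possible choice, not a
recommendation), a locale-aware program would do something like this
near startup:

```python
import locale

# Query the collation locale without changing it.
before = locale.setlocale(locale.LC_COLLATE)

try:
    # An empty string asks for the user's default locale, as set in
    # the environment.  This call is the program's explicit opt-in.
    locale.setlocale(locale.LC_ALL, "")
except locale.Error:
    # The environment names a locale that is not available on this
    # system; in this sketch we simply keep the current settings.
    pass

# Query again to see what collation locale is now in effect.
after = locale.setlocale(locale.LC_COLLATE)
```

A program that never makes the call keeps the collation it started
with, which is what protects code that is not locale-aware.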

--Guido van Rossum (home page: http://www.pythonlabs.com/~guido/)


