to better observe locales
Guido van Rossum
guido at beopen.com
Sat Jul 29 11:26:55 EDT 2000
"Alexander V. Voinov" <avv at quasar.ipa.nw.ru> writes:
> It seems that Python would better observe locales at almost no cost.
>
> 1. list.sort(). If the sorted list is just a list of strings, it's
> trivial to pass 'strcoll' as an argument. But if the list element is a
> tuple, I have to write my own comparison function which would use
> locale.
There are lots of good reasons why observing locales on string
comparisons with the < operator is a bad idea. (The same reasons
that require you to use strcoll or equivalent in C, C++ or Java.)
But I'll give you a suggestion for how to compare tuples of strings
efficiently.
Say you have a list L of tuples containing (lastname, firstname,
address) records. To sort L, you write:
    from locale import strxfrm
    L1 = []
    for t in L:
        L1.append((strxfrm(t[0]), strxfrm(t[1]), t))
    L1.sort()
    L2 = []
    for (dum, dum, t) in L1:
        L2.append(t)
Now the sorted version of L is in L2.
Believe me, for large lists this is much faster than writing a
comparison function and passing it to list.sort().
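
For readers on later Python versions (2.4 and up), sort() and sorted()
accept a key argument that performs the same decorate, sort, undecorate
steps internally. What follows is only a minimal sketch under that
assumption, not part of the original suggestion; the name sort_records
and the sample records are made up for illustration.

```python
import locale

def sort_records(records):
    """Return a new list of (lastname, firstname, address) tuples,
    ordered by lastname then firstname under the current locale."""
    # strxfrm() is called once per field; the transformed strings
    # compare with plain < in the same order strcoll() would give.
    return sorted(
        records,
        key=lambda t: (locale.strxfrm(t[0]), locale.strxfrm(t[1])),
    )

# Hypothetical sample data.
records = [
    ("Smith", "John", "1 Main St"),
    ("Adams", "Zoe", "2 Oak Ave"),
    ("Adams", "Amy", "3 Elm Rd"),
]
sorted_records = sort_records(records)
```

Unlike the explicit loop, the key version never compares the original
tuples themselves, so records with equal names keep their input order.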
> 2. Those '\xxx\yyy\zzz' in the string literals, emitted by 'repr'. It
> would be a good idea not to encode that way those letters, which are
> claimed as letters by the locale.
Good idea. I've proposed this myself a few times.
> The default locale is easily read from the environment both on Unices
> and on win32. No idea about the rest of the world. Therefore, if the
> locale, read from the environment, is not 'en', one may just apply its
> rules immediately upon interpreter startup (for the two mentioned
> issues, and maybe some others: use strcoll instead of strcmp _everywhere
> internally_, use letter ranges, defined by the locale, etc). Otherwise
> the current Python support of locales would really have no practical use,
> even mislead those who believe that support to exist.
I'm not sure why you say this is misleading.
There's a good reason for *not* setting the locale from the
environment by default -- it would screw up programs that aren't
locale-aware. The program must make the call to locale.setlocale() to
indicate that it is aware of the locale.
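
As a rough illustration of that opt-in, a locale-aware program might
start as sketched below. The variable names are made up, and the
fallback on locale.Error is just one possible choice for a program
that can live with the default locale.

```python
import locale

# Query only: passing no second argument returns the current setting
# without changing it. Before any setlocale() call this is normally
# the portable "C" locale for collation.
before = locale.setlocale(locale.LC_COLLATE)

try:
    # An empty string means "use whatever the environment specifies".
    locale.setlocale(locale.LC_ALL, "")
except locale.Error:
    # The environment names a locale this system does not support;
    # keep the current setting.
    pass

after = locale.setlocale(locale.LC_COLLATE)
```

Only after this call do strcoll() and strxfrm() follow the user's
collation rules; a program that never makes it keeps the default
behaviour.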
--Guido van Rossum (home page: http://www.pythonlabs.com/~guido/)