Flexible Collating (feedback please)

Fri Oct 20 18:59:26 EDT 2006

Leo Kislov wrote:
> Ron Adam wrote:
>> Leo Kislov wrote:
>>> Ron Adam wrote:
>>>
>>>> locale.setlocale(locale.LC_ALL, '')  # use current locale settings
>>> Those aren't the current locale settings; they're the user's locale settings.
>>> The application may actually be using something else, and you would overwrite
>>> that. You could also (unexpectedly for the application) affect
>>> time.strftime() and C extensions. So you should move this call into the
>>> _test() function and explain in the documentation that the
>>> application should call locale.setlocale.
>> I'll experiment with this a bit. I was under the impression that locale.strxfrm
>> needed the locale set for it to work correctly.
> 
> Actually, locale.strxfrm and all the other functions in the locale module
> work as designed: they operate in the C locale until the first call to
> locale.setlocale. The call to locale.setlocale should be made by the
> application, not by a third-party module like your collation module.

Yes, I've come to that conclusion also.  (researching as I go) ;-)

I put an example of that in the class doc string so it could easily be found.
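
For illustration, that split of responsibilities might look like the sketch
below. It is not the actual docstring: the function names are hypothetical, it
uses Python 3 syntax, and it sets only LC_COLLATE rather than LC_ALL to keep
the side effects small.

```python
import locale


def collated(items):
    """Library side: sort with whatever locale is already in effect.

    Deliberately never calls locale.setlocale(); until the application
    does so, this sorts according to the C locale.
    """
    return sorted(items, key=locale.strxfrm)


def sort_for_user(words, locale_name=""):
    """Application side: pick the locale once, then call the library.

    An empty locale_name selects the user's settings from the environment.
    """
    locale.setlocale(locale.LC_COLLATE, locale_name)
    return collated(words)
```

An application would normally make the setlocale call once at startup; passing
"C" instead of "" gives plain code-point ordering.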

>> Maybe it would be better to have two (or more) versions?  A string, unicode, and
>> locale version or maybe add an option to __init__ to choose the behavior?
> 
> I don't think it should be two separate versions. Unicode support is
> only a matter of code like this:
> 
> # in the constructor
> self.encoding = locale.getpreferredencoding()
> 
> # class method
> def strxfrm(self, s):
>     if type(s) is unicode:
>         return locale.strxfrm(s.encode(self.encoding, 'replace'))
>     return locale.strxfrm(s)
> 
> and then call self.strxfrm instead of locale.strxfrm. Similar code
> would be needed for locale.atof.

Thanks for the example.
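
A self-contained variant of the same idea, covering both strxfrm and atof, is
sketched below. It assumes Python 3, where the situation is mirrored:
locale.strxfrm and locale.atof take str, so it is byte strings that need
converting. The class name LocaleText and the helper _to_str are made up for
the example.

```python
import locale


class LocaleText:
    """Wrap locale functions so they accept both str and bytes."""

    def __init__(self):
        # False: look up the encoding without calling setlocale() ourselves.
        self.encoding = locale.getpreferredencoding(False)

    def _to_str(self, s):
        # Decode byte strings; undecodable bytes become U+FFFD.
        if isinstance(s, bytes):
            return s.decode(self.encoding, "replace")
        return s

    def strxfrm(self, s):
        return locale.strxfrm(self._to_str(s))

    def atof(self, s):
        return locale.atof(self._to_str(s))
```

The rest of the module would then call self.strxfrm and self.atof wherever it
previously called the locale functions directly.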

>> This was the reason for using locale.strxfrm. It should let it work with unicode
>> strings from what I could figure out from the documents.
>>
>> Am I missing something?
> 
> strxfrm works only with byte strings encoded in the system encoding.
> 
>   -- Leo

Windows has an alternative function, wcsxfrm.  (wide character transform)

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/vclib/html/_crt_strxfrm.2c_.wcsxfrm.asp

But it's not exposed in Python. I could use ctypes to call it, but it would then
be Windows-specific, and I doubt it would even work as expected.
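
A rough, untuned sketch of what such a ctypes call could look like follows.
Several things here are assumptions: that the C runtime is msvcrt on Windows
and the default C library elsewhere, that the two-call pattern (size query,
then fill) is supported, and that comparing the result as unsigned integers
matches the library's own ordering. The name wcsxfrm_key is made up, and the
code uses Python 3 syntax.

```python
import ctypes
import ctypes.util
import sys

# Assumption: msvcrt on Windows, the default C library everywhere else.
if sys.platform == "win32":
    _libc = ctypes.cdll.msvcrt
else:
    _libc = ctypes.CDLL(ctypes.util.find_library("c"))

# size_t wcsxfrm(wchar_t *dest, const wchar_t *src, size_t n);
_libc.wcsxfrm.argtypes = [ctypes.c_void_p, ctypes.c_wchar_p, ctypes.c_size_t]
_libc.wcsxfrm.restype = ctypes.c_size_t

# Read the result as unsigned integers, so transformed values that are
# not valid code points cannot break the conversion back to Python.
_UNIT = ctypes.c_uint32 if ctypes.sizeof(ctypes.c_wchar) == 4 else ctypes.c_uint16


def wcsxfrm_key(s):
    """Return a tuple usable as a sort key under the current LC_COLLATE."""
    length = _libc.wcsxfrm(None, s, 0)   # first call: ask for the size
    buf = (_UNIT * (length + 1))()       # room for the terminating null
    _libc.wcsxfrm(buf, s, length + 1)    # second call: fill the buffer
    return tuple(buf[:length])
```

It would be used as sorted(words, key=wcsxfrm_key) after the application has
set its locale. In Python 3, locale.strxfrm accepts str directly, so a wrapper
like this is mainly of historical interest.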

Maybe a wcsxfrm patch would be good for Python 2.6?  Python 3000 will probably 
need it anyway.

I've made a few additional changes and will start a new thread after some more 
testing to get some additional feedback.

Cheers,
   Ron