strxfrm works with unicode string ?

Gerald Klix Gerald.Klix at klix.ch
Fri Jun 17 08:10:27 EDT 2005


Hi Nicolas :)),
please see below for my answers.

nicolas.riesch at genevoise.ch wrote:
> Hello, Gerald ;-)
> 
> Well, ok, but I don't understand why I should first convert a pure
> unicode string into a byte string.
> The encoding (here, latin-1) seems an arbitrary choice.
Well, "latin-1" is the only encoding that I know works on
my xterm and that I can type without spelling errors :)
> 
> Your solution works, but is it a workaround or the real way to use
> strxfrm ?
> It seems a little artificial to me, but perhaps I haven't understood
> something ...
In Python 2.3.4 I had some strange encounters with the locale module.
In the end I considered it broken, at least when it came to currency
formatting.
> 
> Does this mean that you cannot pass a unicode string to strxfrm ?
This works here with my home-grown Python 2.4 on Jurassic Debian Woody:

import locale
s=u'\u00e9'
print s

print locale.setlocale(locale.LC_ALL, '')
print repr( locale.strxfrm( s.encode( "latin-1" ) ) )
print repr( locale.strxfrm( s.encode( "utf-8" ) ) )

The output is rather strange:

é
de_DE
"\x10\x01\x05\x01\x02\x01'@/locale"
"\x0c\x01\x0c\x01\x04\x01'@/locale"

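The two keys differ because strxfrm in Python 2.4 only ever sees bytes, so the result depends on which encoding you picked. A minimal Python 3 sketch (modern Python, not the 2.4 used above) of what each encode() call hands to strxfrm, assuming the de_DE locale uses the ISO-8859-1 (Latin-1) codeset:

```python
# Python 3 sketch: the same character becomes different byte strings
# depending on the encoding chosen before calling strxfrm.
s = "\u00e9"                  # é
latin1 = s.encode("latin-1")  # b'\xe9'      (one byte)
utf8 = s.encode("utf-8")      # b'\xc3\xa9'  (two bytes)

# A byte-oriented strxfrm running in a Latin-1 locale reads the UTF-8
# bytes as two separate characters ("Ã©"), not as a single "é".
misread = utf8.decode("latin-1")

# ascii() escapes non-ASCII characters, so this prints on any terminal.
print(len(misread), ascii(misread))  # 2 '\xc3\xa9'
```

So the UTF-8 key is really the collation key of a different, two-character string, which is why the encoding choice is not as arbitrary as it looks.
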
Another (not so) weird thing happens when I unset LANG.

bear at special:~ > unset LANG
bear at special:~ > python2.4 ttt.py
Traceback (most recent call last):
   File "ttt.py", line 3, in ?
     print s
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 0: ordinal not in range(128)

Actually, it's weirder that printing works with LANG=de_DE.
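A likely explanation: when printing a unicode object to a terminal, Python 2 encodes it with the terminal encoding it derives from the locale. With LANG=de_DE that encoding can represent é; with LANG unset you get the C locale, which implies ASCII. A Python 3 sketch of that encoding step, using a hypothetical helper encode_for_terminal that substitutes "?" instead of raising UnicodeEncodeError:

```python
import locale


def encode_for_terminal(text, encoding=None):
    """Hypothetical helper: encode text for output, never raising.

    Defaults to the locale's preferred encoding (roughly what a terminal
    would use) and replaces unencodable characters with '?'.
    """
    if encoding is None:
        encoding = locale.getpreferredencoding(False)
    return text.encode(encoding, errors="replace")


print(encode_for_terminal("\u00e9", "ascii"))    # b'?'
print(encode_for_terminal("\u00e9", "latin-1"))  # b'\xe9'
```
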

Back to your question. A quick glance at the C source of
_localemodule.c reveals:

     if (!PyArg_ParseTuple(args, "s:strxfrm", &s))

So yes, strxfrm does not accept unicode!
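For comparison, Python 3 lifted this restriction: str is Unicode there, and locale.strxfrm accepts and returns str, so the encode() step disappears. A minimal sketch of using it as a sort key; the word list is purely illustrative, and the fallback to "C" is an assumption for environments whose locale is not installed:

```python
import locale

# Python 3 only: str is Unicode, so no encode() step is needed.
# Use the collation from the environment; fall back to "C" if it is unavailable.
try:
    locale.setlocale(locale.LC_COLLATE, "")
except locale.Error:
    locale.setlocale(locale.LC_COLLATE, "C")

words = ["\u00e9t\u00e9", "ete", "zebra", "apple"]  # "été" and friends
ordered = sorted(words, key=locale.strxfrm)

# ascii() escapes non-ASCII characters, so this prints even on an ASCII terminal.
print(ascii(ordered))
```

The exact order depends on the active locale; only the "C" locale guarantees plain code-point order.
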

I am inclined to consider this a bug.
It is certainly inconsistent with strcoll, which
accepts either two byte strings or two unicode strings,
provided HAVE_WCSCOLL was defined when Python
was compiled on your platform.
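The inconsistency matters because the two functions are meant to agree: the C standard requires that comparing two strxfrm results with strcmp gives the same sign as strcoll on the original strings, which is what makes strxfrm usable as a precomputed sort key. A Python 3 sketch that checks this agreement; orderings_agree is a hypothetical helper, and the "C" collation is chosen only to make the result reproducible:

```python
import functools
import locale

# Python 3 sketch. "C" collation makes the result reproducible;
# pass "" instead to use the locale from the environment.
locale.setlocale(locale.LC_COLLATE, "C")


def orderings_agree(words):
    """Return True if sorting by strxfrm keys matches sorting with strcoll."""
    by_key = sorted(words, key=locale.strxfrm)
    by_cmp = sorted(words, key=functools.cmp_to_key(locale.strcoll))
    return by_key == by_cmp


print(orderings_agree(["b", "a", "\u00e9", "e", "Z"]))  # True
```

Sorting by key calls strxfrm once per string, while the cmp_to_key version calls strcoll for every comparison; that is the usual reason to prefer strxfrm for large sorts.
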

BTW: Which platform do you use?

HTH,
Gerald

PS: If you have access to IRC, you can also ask at
irc://irc.freenode.net#python.de.



-- 
GPG-Key: http://keyserver.veridis.com:11371/search?q=0xA140D634




More information about the Python-list mailing list