Sorting strings containing special characters (german 'Umlaute')

DierkErdmann at mail.com
Fri Mar 2 11:49:43 EST 2007


On 2 Mar., 15:25, Peter Otten <__pete... at web.de> wrote:
> DierkErdm... at mail.com wrote:
> > For sorting the letter "Ä" is supposed to be treated like "Ae",
There are several ways of defining the sorting order. The variant "ä
equals ae" follows DIN 5007-2 (according to Wikipedia); defining "ä
equals a" complies with DIN 5007-1. Therefore both options are
possible.
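The two DIN 5007 variants can be sketched as plain sort keys that do not depend on the locale machinery at all. This is a minimal Python 3 sketch under stated assumptions: the helper names `din5007_1_key` and `din5007_2_key` are hypothetical, only the German umlauts and ß are mapped, and the tie-breaking rules a full DIN 5007 collation defines (e.g. "Arger" vs. "Ärger") are left out.

```python
# Hypothetical helpers, not part of any library: approximate the two
# DIN 5007 variants by rewriting umlauts before comparing.
_VARIANT_1 = str.maketrans({"ä": "a", "ö": "o", "ü": "u",
                            "Ä": "A", "Ö": "O", "Ü": "U", "ß": "ss"})
_VARIANT_2 = str.maketrans({"ä": "ae", "ö": "oe", "ü": "ue",
                            "Ä": "Ae", "Ö": "Oe", "Ü": "Ue", "ß": "ss"})


def din5007_1_key(s):
    """DIN 5007-1 style key: 'ä' sorts like 'a' (case-insensitive)."""
    return s.translate(_VARIANT_1).casefold()


def din5007_2_key(s):
    """DIN 5007-2 style key: 'ä' sorts like 'ae' (case-insensitive)."""
    return s.translate(_VARIANT_2).casefold()


if __name__ == "__main__":
    words = ["Ast", "Ärger", "Ara", "Beere", "Adler"]
    print(sorted(words, key=din5007_1_key))  # Ärger between Ara and Ast
    print(sorted(words, key=din5007_2_key))  # Ärger between Adler and Ara
```

Because the mapping is explicit, the result is the same on every machine, which is the main advantage over relying on whatever collation tables the installed locale provides.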

> The default locale is not used by default; you have to set it explicitly
>
> >>> import locale
> >>> locale.strcoll("Ärger", "Beere")
> 1
> >>> locale.setlocale(locale.LC_ALL, "")
> 'de_DE.UTF-8'
> >>> locale.strcoll("Ärger", "Beere")
> -1

On my machine
>>> locale.setlocale(locale.LC_ALL, "")
returns
        'German_Germany.1252'

But unlike on your computer, this does not affect the sorting order:
>>> locale.strcoll("Ärger", "Beere")
yields 1 both before and after setting the locale.
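For sorting a whole list in Python 3, the usual pattern is to set only the collation category and pass `locale.strxfrm` as the sort key. A minimal sketch, assuming the locale names listed below (they vary by platform; "German_Germany.1252" is the Windows name reported in this thread), and with the helper names `set_german_collation` and `locale_sorted` being hypothetical. Note that `locale.setlocale` changes process-wide state.

```python
import locale

# Assumed locale names: "de_DE.UTF-8"/"de_DE.utf8" are typical on Linux and
# macOS; "German_Germany.1252" is the Windows name reported in this thread.
GERMAN_LOCALES = ("de_DE.UTF-8", "de_DE.utf8", "German_Germany.1252")


def set_german_collation():
    """Switch LC_COLLATE to the first German locale available.

    Returns the locale name on success, or None if none is installed
    (the collation locale is then left unchanged).
    """
    for name in GERMAN_LOCALES:
        try:
            return locale.setlocale(locale.LC_COLLATE, name)
        except locale.Error:
            continue  # not installed on this system, try the next name
    return None


def locale_sorted(words):
    """Sort str objects using the current LC_COLLATE rules."""
    # strxfrm maps each string to a key whose ordering matches strcoll.
    return sorted(words, key=locale.strxfrm)


if __name__ == "__main__":
    words = ["Beere", "Ärger", "Ast", "Ara"]
    print("collation locale:", set_german_collation())
    print(locale_sorted(words))
```

With a German locale installed this should place "Ärger" between "Ara" and "Ast"; in the plain "C" locale it ends up after "Beere". The exact order depends on the platform's collation tables.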

Thank you for the hint to use Unicode strings from the start; see the
difference:
>>> s1 = unicode("Ärger", "latin-1")
>>> s2 = unicode("Beere", "latin-1")
>>> locale.strcoll(s1, s2)
1
>>> locale.setlocale(locale.LC_ALL, "")
'German_Germany.1252'
>>> locale.strcoll(s1, s2)
-1

compared to

>>> s1 = "Ärger"
>>> s2 = "Beere"
>>> locale.strcoll(s1, s2)
1
>>> locale.setlocale(locale.LC_ALL, "")
'German_Germany.1252'
>>> locale.strcoll(s1, s2)
1
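In Python 3 the Unicode-versus-byte-string distinction is enforced by the types themselves: `str` is always Unicode, and `locale.strcoll` rejects `bytes` with a `TypeError`. A minimal sketch, assuming the raw data arrives as Latin-1 bytes (the encoding is an assumption, and `collate_bytes` is a hypothetical helper), decodes before comparing:

```python
import locale


def collate_bytes(a: bytes, b: bytes, encoding: str = "latin-1") -> int:
    """Compare two byte strings by decoding them to str first.

    locale.strcoll only accepts str in Python 3, so the bytes must be
    decoded with the encoding they were actually written in.
    """
    return locale.strcoll(a.decode(encoding), b.decode(encoding))


if __name__ == "__main__":
    raw1 = "Ärger".encode("latin-1")
    raw2 = "Beere".encode("latin-1")
    # The sign of the result depends on the active LC_COLLATE setting.
    print(collate_bytes(raw1, raw2))
    try:
        locale.strcoll(raw1, raw2)
    except TypeError as exc:
        print("bytes rejected:", exc)
```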

Thanks for your help.

  Dierk




>
> ['Ara', '\xc3\x84rger', 'Ast']
>
> Peter
>
> (*) German for "trouble"
