Proposal: require 7-bit source str's
Hallvard B Furuseth
h.b.furuseth at usit.uio.no
Sun Aug 22 17:26:42 EDT 2004
Martin v. Löwis wrote:
>Hallvard B Furuseth wrote:
>>>> For example, if one uses character set ns_4551-1 - ASCII with {|}[\]
>>>> replaced with æøåÆØÅ, sorting by simple byte ordering will sort text
>>>> correctly. Unicode text _can't_ be sorted correctly, because of
>>>> characters like 'ö': Swedish 'ö' should match Norwegian 'ø' and sort
>>>> with that, while German 'ö' should not match 'ø' and sorts with 'o'.
>>>
>>> Why not sort depending on the locale instead of ordinal values of the
>>> bytes/characters?
>>
>> I'm in Norway. Both Swedes and Germans are foreigners.
>
> I agree with many things you said, but this example is bogus. If I
> (as a German) use ns_4551-1, sorting is simple - and incorrect, because,
> as you say, ö sorts with o in my language - yet the simple sorting of
> ns_4551-1 doesn't. So sorting is *not* simple with ns_4551-1.
Sorry, I seem to have left out a vital point here: I thought the correct -
or rather, least incorrect - ns_4551-1 character for German ö was o, not
ø. Then it works out. Oh well, one learns something every day. Time
to check if there are other examples, or if I can forget it... Gotta
try an easy one - would you also translate German ä to æ rather than a?
> Likewise, sorting *is* possible with Unicode if you take the locale
> into account. The order of characters doesn't have to be the numerical
> one, and, as you explain, it might even depend on the locale. So if
> you want a Swedish collation, use a Swedish locale; if you want a
> German collation, use a German locale.
And if I want to get both right, I need a sort_name field which is
distinct from the display_name field. There you would be lowis, while
the Swede Törnquist would be tørnquist. Or maybe lowis\tlöwis or
something; a kind of private implementation of strxfrm().
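
A rough sketch of that idea in Python, under stated assumptions: each
record carries a hand-assigned sort_name next to its display_name, and the
key ranks letters by the Norwegian alphabet (æ, ø, å after z). The names
NORWEGIAN_ORDER, RANK and sort_key are made up for illustration, as are the
entries Olsen, Zachariassen and Østby; only lowis and tørnquist come from
the text above.

```python
# Hypothetical sketch: a hand-rolled collation key, not a real strxfrm().
NORWEGIAN_ORDER = "abcdefghijklmnopqrstuvwxyzæøå"
RANK = {ch: i for i, ch in enumerate(NORWEGIAN_ORDER)}


def sort_key(sort_name, display_name):
    """Build a key: alphabet ranks first, display name as tie-breaker."""
    # Characters outside the alphabet (spaces, digits, ...) rank last.
    primary = tuple(RANK.get(ch, len(RANK)) for ch in sort_name.lower())
    return (primary, display_name)


# (display_name, sort_name) pairs; the sort_name is assigned by hand.
people = [
    ("Törnquist", "tørnquist"),
    ("Löwis", "lowis"),
    ("Østby", "østby"),
    ("Olsen", "olsen"),
    ("Zachariassen", "zachariassen"),
]

ordered = [display for display, sort_name in
           sorted(people, key=lambda p: sort_key(p[1], p[0]))]
print(ordered)
```

Using the display name as the second key component plays the same role as
the lowis\tlöwis form: entries with equal sort names still come out in a
stable, predictable order.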
--
Hallvard