Proposal: require 7-bit source str's

Hallvard B Furuseth h.b.furuseth at usit.uio.no
Sun Aug 22 17:26:42 EDT 2004


Martin v. Löwis wrote:
>Hallvard B Furuseth wrote:
>>>> For example, if one uses character set ns_4551-1 - ASCII with {|}[\]
>>>> replaced with æøåÆØÅ, sorting by simple byte ordering will sort text
>>>> correctly.  Unicode text _can't_ be sorted correctly, because of
>>>> characters like 'ö': Swedish 'ö' should match Norwegian 'ø' and sort
>>>> with that, while German 'ö' should not match 'ø' and sorts with 'o'.
>>>
>>> Why not sort depending on the locale instead of ordinal values of the
>>> bytes/characters?
>> 
>> I'm in Norway.  Both Swedes and Germans are foreigners.
> 
> I agree with many things you said, but this example is bogus. If I
> (as a German) use ns_4551-1, sorting is simple - and incorrect, because,
> as you say, ö sorts with o in my language - yet the simple sorting of
> ns_4551-1 doesn't. So sorting is *not* simple with ns_4551-1.

Sorry, I seem to have left out a vital point here: I thought the correct -
or rather, least incorrect - ns_4551-1 character for German ö was o, not
ø.  Then it works out.  Oh well, one learns something every day.  Time
to check if there are other examples, or if I can forget it...  Gotta
try an easy one - would you also translate German ä to æ rather than a?

> Likewise, sorting *is* possible with Unicode if you take the locale
> into account. The order of characters doesn't have to be the numerical
> one, and, as you explain, it might even depend on the locale. So if
> you want a Swedish collation, use a Swedish locale; if you want a
> German collation, use a German locale.

And if I want to get both right, I need a sort_name field which is
distinct from the display_name field.  There you would be lowis, while
the Swede Törnquist would be tørnquist.  Or maybe lowis\tlöwis or
something; a kind of private implementation of strxfrm().
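
A minimal sketch of that sort_name idea in Python, under some assumptions:
the names other than Löwis and Törnquist, the hand-assigned sort_name
values, and the helper names (people, RANK, sort_key) are all made up for
illustration, and the ordering is simply the Norwegian alphabet (a-z, then
æ, ø, å) rather than anything a real locale would supply.

```python
# Hypothetical records: (display_name, sort_name). Each sort_name is assigned
# by hand per person, since the right folding depends on where the name is
# from (German ö -> o, Swedish ö -> ø); nothing here computes it.
people = [
    ("Törnquist", "tørnquist"),
    ("Åberg", "åberg"),
    ("Löwis", "lowis"),
    ("Zachariassen", "zachariassen"),
    ("Tvedt", "tvedt"),
    ("Ødegård", "ødegård"),
    ("Olsen", "olsen"),
    ("Torvik", "torvik"),
]

# Norwegian alphabet order: a-z followed by æ, ø, å.
NORWEGIAN_ALPHABET = "abcdefghijklmnopqrstuvwxyzæøå"
RANK = {ch: i for i, ch in enumerate(NORWEGIAN_ALPHABET)}


def sort_key(record):
    """Build a strxfrm()-like key: ranks of sort_name, then display_name."""
    display_name, sort_name = record
    # Characters outside the alphabet (none in this sample) sort last.
    primary = [RANK.get(ch, len(RANK)) for ch in sort_name]
    # The display name only breaks ties, like the "lowis\tlöwis" variant.
    return (primary, display_name)


ordered = [display for display, _ in sorted(people, key=sort_key)]
print(ordered)
```

With these sample values Löwis lands among the l's, while Törnquist sorts
after Tvedt because its ø ranks after z.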

-- 
Hallvard


