Unicode normalisation [was Re: [beginner] What's wrong?]

Peter Pearson pkpearson at nowhere.invalid
Fri Apr 8 14:03:24 EDT 2016


On Sat, 9 Apr 2016 03:50:16 +1000, Chris Angelico <rosuav at gmail.com> wrote:
> On Sat, Apr 9, 2016 at 3:44 AM, Marko Rauhamaa <marko at pacujo.net> wrote:
[snip]
>> (As for ligatures, I understand that there might be quite a bit of
>> legacy software that had dedicated code points and code pages for ligatures.
>> Translating that legacy software to Unicode was made more
>> straightforward by introducing analogous codepoints to Unicode. Unicode
>> has quite a few such codepoints: µ, K, Ω etc.)
>
> More specifically, Unicode solved the problems that *codepages* had
> posed. And one of the principles of its design was that every
> character in every legacy encoding had a direct representation as a
> Unicode codepoint, allowing bidirectional transcoding for
> compatibility. Perhaps if Unicode had existed from the dawn of
> computing, we'd have fewer characters; but backward compatibility is
> way too important to let a narrow purity argument sway it.
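To see what those compatibility codepoints mean for normalisation, here is a minimal sketch using Python 3's standard `unicodedata` module. I'm assuming the "K" and "Ω" in the quote stand for U+212A KELVIN SIGN and U+2126 OHM SIGN; the fi ligature (U+FB01) and the helper name `codepoints` are my own choices for illustration. It compares NFC with NFKC, then shows the round trip through Latin-1 that keeps MICRO SIGN in Unicode.

```python
import unicodedata


def codepoints(s: str) -> str:
    """Render a string as space-separated U+XXXX code point labels."""
    return " ".join(f"U+{ord(c):04X}" for c in s)


# Assumed stand-ins for the characters in the quoted post, plus one ligature.
SAMPLES = ["\u00b5", "\u212a", "\u2126", "\ufb01"]

for ch in SAMPLES:
    # NFC applies only canonical mappings; NFKC also applies compatibility ones.
    nfc = unicodedata.normalize("NFC", ch)
    nfkc = unicodedata.normalize("NFKC", ch)
    print(f"{codepoints(ch)} {unicodedata.name(ch)}: "
          f"NFC -> {codepoints(nfc)}, NFKC -> {codepoints(nfkc)}")

# Why keep MICRO SIGN at all? Latin-1 has a byte for it (0xB5), so text
# decoded from Latin-1 re-encodes to exactly the same bytes.
legacy = b"\xb5"
decoded = legacy.decode("latin-1")
assert decoded == "\u00b5"
assert decoded.encode("latin-1") == legacy

# The "purer" GREEK SMALL LETTER MU has no Latin-1 byte, so it cannot round-trip.
try:
    "\u03bc".encode("latin-1")
except UnicodeEncodeError:
    print("U+03BC GREEK SMALL LETTER MU cannot be encoded as Latin-1")
```

The Kelvin and Ohm signs carry canonical (singleton) decompositions, so even NFC replaces them with plain K (U+004B) and Greek capital omega (U+03A9). The micro sign and the fi ligature are folded only by NFKC, which is the form to reach for if you want such legacy-compatibility characters to compare equal to their ordinary counterparts.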

I guess with that historical perspective the current situation
seems almost inevitable.  Thanks.  And thanks to Steven D'Aprano
for other relevant insights.

-- 
To email me, substitute nowhere->runbox, invalid->com.
