How to waste computer memory?

BartC bc at freeuk.com
Sat Mar 19 08:24:33 EDT 2016


On 19/03/2016 11:07, Marko Rauhamaa wrote:
> Chris Angelico <rosuav at gmail.com>:
>
>> On Sat, Mar 19, 2016 at 8:31 PM, Marko Rauhamaa <marko at pacujo.net> wrote:
>>> Unicode made several (understandable but grave) mistakes along the way:
>>>
>>>     * normalization
>>
>> Elaborate please? What's such a big mistake here?
>
> Unicode shouldn't have allowed multiple equivalent variants for a
> string.
>
> Now Python falls victim to:
>
>     >>> '\u006e\u0303' == '\u00f1'
>     False
>
> <URL: https://en.wikipedia.org/wiki/Unicode_equivalence>:
>
>     For example, the code point U+006E (the Latin lowercase "n") followed
>     by U+0303 (the combining tilde "◌̃") is defined by Unicode to be
>     canonically equivalent to the single code point U+00F1 (the lowercase
>     letter "ñ" of the Spanish alphabet). Therefore, those sequences
>     should be displayed in the same manner, should be treated in the same
>     way by applications such as alphabetizing names or searching, and may
>     be substituted for each other.
>
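For reference, a minimal sketch of how the comparison quoted above can be made to succeed, assuming both sides are first normalized to the same form with the standard-library unicodedata module. The helper name canonically_equal is made up for illustration, and NFC is an arbitrary choice; NFD would work as well:

```python
import unicodedata

def canonically_equal(a, b, form="NFC"):
    """Compare two strings after normalizing both to the same form.

    Hypothetical helper, not part of any library.
    """
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)

decomposed = "\u006e\u0303"   # n followed by U+0303 COMBINING TILDE
composed = "\u00f1"           # precomposed n-with-tilde

plain_equal = decomposed == composed                        # raw code point comparison
normalized_equal = canonically_equal(decomposed, composed)  # compare after NFC
print(plain_equal, normalized_equal)
```

The plain == compares code points one by one, so it only sees two different sequences; the equivalence has to be applied explicitly.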


So a string that looks like:

"ññññññññññññññññññññññññññññññññññññññññññññññññññ"

can have 2**50 different representations? And occupy somewhere between 
50 and 200 bytes? Or is that 400?

OK...
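A rough sketch of where those numbers could come from, assuming each of the 50 characters is independently either the precomposed U+00F1 or the pair U+006E U+0303, and counting only the two extremes (all composed, all decomposed). The variable names are just for illustration:

```python
# Each of the 50 positions has two canonically equivalent spellings.
variants = 2 ** 50

composed = "\u00f1" * 50           # 50 code points
decomposed = "\u006e\u0303" * 50   # 100 code points

# Encoded size in bytes for both extremes, without a byte order mark.
sizes = {}
for codec in ("utf-8", "utf-16-le", "utf-32-le"):
    sizes[codec] = (len(composed.encode(codec)), len(decomposed.encode(codec)))
    print(codec, sizes[codec])

# Output:
# utf-8 (100, 150)
# utf-16-le (100, 200)
# utf-32-le (200, 400)
```

The 50-to-200 range matches what I would expect from CPython 3.3+ internally, assuming its flexible string storage uses 1 byte per character for the all-composed string (every code point is below 256) and 2 bytes per character for the decomposed one (U+0303 does not fit in one byte), before object overhead. The 400 figure is the decomposed string at 4 bytes per code point.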

-- 
Bartc


