How to waste computer memory?

Marko Rauhamaa marko at pacujo.net
Sat Mar 19 10:56:53 EDT 2016


Steven D'Aprano <steve at pearwood.info>:

> On Sat, 19 Mar 2016 11:42 pm, Marko Rauhamaa wrote:
>> When glorifying Python's advanced Unicode capabilities, are we
>> careful to emphasize the necessity of unicodedata.normalize()
>> everywhere? Should Python normalize strings unconditionally and
>> transparently? What does the O(1) character lookup mean under
>> normalization?
>> 
>> Some weeks ago I had to spend 30 minutes to debug my Python program
>> when a user complained it didn't work. Turns out they had
>> accidentally invoked the program using a space and a composing tilde
>> instead of the ASCII ~. There was no visual indication of a problem
>> on the screen, but the Python program acted up.
>
> We recently had somebody here who wrote capital I by pressing the
> lower case l on the keyboard. Should a pure-ASCII program be able to
> operate without malfunction if the user confuses 0 and O, or I l and
> 1? What about ' and ` or possibly even '' and "?

What I'm talking about is that maybe Python should treat canonically
equivalent strings equivalently, that is, indistinguishably under any
external inspection.
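
To illustrate the point, here is a minimal sketch of what "canonically
equivalent but distinguishable" looks like in today's Python. The helper
name canonically_equal is made up for this example, and comparing under
NFC is just one possible choice of normalization form:

```python
import unicodedata


def canonically_equal(a: str, b: str) -> bool:
    """Hypothetical helper: compare two strings after NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


composed = "\u00f1"      # LATIN SMALL LETTER N WITH TILDE, one code point
decomposed = "n\u0303"   # "n" followed by COMBINING TILDE, two code points

# The two strings are canonically equivalent, yet plain comparison and
# len() tell them apart.
print(composed == decomposed)                   # False
print(len(composed), len(decomposed))           # 1 2
print(canonically_equal(composed, decomposed))  # True
```

As it stands, the caller has to remember to normalize at every comparison
or lookup; nothing in str does it implicitly.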

Anyway, Python's Unicode support is a great thing, but Unicode is a big
can of worms. Far from being a paradise, it's more of a case of picking
your poison.


Marko


