Is Unicode support so hard...

rusi rustompmody at gmail.com
Sat Apr 20 21:37:00 EDT 2013


On Apr 21, 4:03 am, Neil Hodgson <nhodg... at iinet.net.au> wrote:
>     Hi jmf,
>
> > This gives me plenty of ideas to test the "flexible string
> > representation" (FSR). I should recognize this FSR is failing
> > particulary very well...
>
>     This is too vague for me.
>
>     Which string representation should Python use?
> 1) UTF-32
> 2) UTF-8
> 3) Python 3.3 -- 1, 2, or 4 bytes per character decided at runtime
> 4) Python 3.2 -- 2 or 4 bytes per character decided at Python build time
> 5) Something else

jmf recommends UTF-8.

Apart from the fact that UTF-8 would be less time-efficient in all
cases, and drastically so in cases like indexing, the fact that jmf
says so makes it more ridiculous.
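
To make the indexing point concrete, here is a minimal sketch of what fetching the i-th character out of a UTF-8 buffer involves. The function name utf8_index is my own invention for illustration, not anything from CPython; the only fact it relies on is that UTF-8 continuation bytes have the bit pattern 10xxxxxx.

```python
def utf8_index(buf: bytes, i: int) -> str:
    """Return the i-th code point of UTF-8 encoded `buf` (hypothetical helper).

    Code points have variable width, so the only way to find number i
    is to walk the bytes from the start and count lead bytes: O(n) work.
    """
    if i < 0:
        raise IndexError("negative index not supported in this sketch")
    count = -1      # index of the code point currently being scanned
    start = None    # byte offset where code point i begins
    for pos, byte in enumerate(buf):
        if byte & 0xC0 != 0x80:      # not a continuation byte: a new code point starts
            count += 1
            if count == i:
                start = pos
            elif count == i + 1:     # reached the next code point: i ends here
                return buf[start:pos].decode("utf-8")
    if start is None:
        raise IndexError("index out of range")
    return buf[start:].decode("utf-8")  # code point i was the last one


text = "na\u00efve \u4e2d\u6587"
buf = text.encode("utf-8")
print(utf8_index(buf, 2), utf8_index(buf, 6))
```

With a fixed-width representation the same lookup is a single offset calculation, which is the whole of the time argument against UTF-8 as an internal format.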
According to jmf, Python sucks up to ASCII (those big bad Americans… of
whom Steven is the first…) whereas Unicode is the true
international/universal standard.

I guess the irony is clear to all (except jmf) given that:
- it's Unicode that sucks up to ASCII by carefully conforming in the
first 128 positions, including the completely useless control chars;
Python just implements the standard
- UTF-8 is an ASCII-biased Unicode compression method, viz. UTF-8 is
most space-efficient on ASCII at the cost of being generally
time-inefficient
- All jmf's beefs (as far as I remember) are variations on the theme:
"time-inefficiency is equivalent to non-unicode-compliant"
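
For anyone who wants to check the ASCII bias for themselves, a rough sketch follows. The helper encoded_sizes and the sample strings are my own choices for illustration; the codec names are the standard Python ones (the "-le" variants are used so that no byte-order mark is counted).

```python
def encoded_sizes(text: str) -> dict:
    """Return the code point count and the encoded byte length per UTF form."""
    return {
        "chars": len(text),
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),
        "utf-32": len(text.encode("utf-32-le")),
    }


# Illustrative samples only: English, French, Chinese.
samples = {
    "English": "hello",
    "French": "\u00e9l\u00e8ve",
    "Chinese": "\u4f60\u597d\u4e16\u754c",
}

for label, text in samples.items():
    print(label, encoded_sizes(text))
```

UTF-8 spends one byte per character on ASCII, two on the accented Latin letters here and three on these Chinese characters, so the further you get from ASCII the less of a compression it is.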

In short he manifests a dog-in-the-manger mindset:
"Since the whole world will never speak French (grief, mope, grumble,
thrash…) everyone should pay for the Chinese character set's size even
if they are monolingually English"
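
As a rough check on who pays what under the 3.3 scheme (option 3 in Neil's list), here is a small sketch. It assumes CPython 3.3 or later, where sys.getsizeof reports the real storage of a str; bytes_per_char is a hypothetical helper of mine, and other implementations may not support getsizeof at all.

```python
import sys


def bytes_per_char(ch: str, n: int = 1000) -> int:
    """Estimate storage per character (CPython-specific assumption).

    Comparing two strings made of the same character cancels the fixed
    object header, leaving only the per-character cost.
    """
    short = ch * n
    long = ch * (2 * n)
    return (sys.getsizeof(long) - sys.getsizeof(short)) // n


for label, ch in [
    ("ASCII", "a"),
    ("Latin-1", "\u00e9"),
    ("BMP", "\u4e2d"),
    ("astral", "\U0001F600"),
]:
    print(label, bytes_per_char(ch))
```

On a CPython 3.3+ build I would expect 1, 1, 2 and 4 respectively, i.e. the English-only string is not charged for anyone else's character set.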

All that said…

I believe that the recent correction in Unicode performance followed
jmf's grumbles (Mark, please correct me if I am wrong).
So the Python community can be thankful to jmf even if he insists on
laboring under bizarre political hallucinations.

[Written from India where a monolingual person is as rare as a
palmtree on a polecap]


