How to waste computer memory?

Marko Rauhamaa marko at pacujo.net
Sun Mar 20 11:36:09 EDT 2016


Ben Bacarisse <ben.usenet at bsb.me.uk>:

> It's 21. The reason being (or at least part of the reason being) that
> 21 bits can be UTF-8 encoded in 4 bytes: 11110xxx 10xxxxxx 10xxxxxx
> 10xxxxxx (3 + 3*6).

I bet the reason is UTF-16. Microsoft and Sun/Oracle would have insisted
on a maximum of 4 bytes per character. A UTF-16 surrogate pair carries
only 20 bits, so the code space tops out at U+10FFFF, which just barely
needs 21 bits, and only at the expense of creating an ugly hole (the
surrogate range U+D800-U+DFFF) inside Unicode. Politics, politics.
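
For anyone who wants to check the arithmetic, here is a rough sketch in
Python 3. The helper name utf16_surrogates is made up for illustration;
it is not a standard-library function. It splits a code point above
U+FFFF into the two 16-bit units UTF-16 uses, which shows where the
U+10FFFF ceiling comes from.

```python
MAX_CODE_POINT = 0x10FFFF  # highest Unicode code point


def utf16_surrogates(cp):
    """Split a supplementary code point into a (high, low) surrogate pair.

    Hypothetical helper, written only to illustrate the bit layout.
    """
    if not 0x10000 <= cp <= MAX_CODE_POINT:
        raise ValueError("not a supplementary code point")
    offset = cp - 0x10000             # 20-bit value, 0x00000..0xFFFFF
    high = 0xD800 + (offset >> 10)    # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits
    return high, low


if __name__ == "__main__":
    high, low = utf16_surrogates(MAX_CODE_POINT)
    print(hex(high), hex(low))
    print(MAX_CODE_POINT.bit_length())
    # Both encodings need 4 bytes for the highest code point.
    print(len(chr(MAX_CODE_POINT).encode("utf-8")))
    print(len(chr(MAX_CODE_POINT).encode("utf-16-le")))
```

The two halves hold 10 bits each, so a pair addresses 2**20 code points
on top of the 16-bit range. Those surrogate values themselves cannot be
used as characters, which is the hole.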


Marko


