Unicode utf-8 doesn't do back-and-forth?

Martin v. Loewis martin at v.loewis.de
Tue Jul 9 03:08:58 EDT 2002


sjmachin at lexicon.net (John Machin) writes:

> 4 more bits? It needs 21 bits to encode the 2**20 possible
> surrogate-described characters plus the basic 64K characters.
> assert 21 - 16 == 5

Not really. That makes a total of 2**20 + 2**16 = 1114112
characters. Now, math.log(1114112)/math.log(2) is 20.087462841250343,
so it is more like 4.09 additional bits than 5.
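
The arithmetic above can be checked with a short sketch. It assumes a modern Python 3 (math.log2 and int.bit_length did not exist when this was posted), and the variable names are purely illustrative:

```python
import math

# Counts taken from the discussion above.
bmp_chars = 2**16        # the basic 64K characters
surrogate_chars = 2**20  # characters reachable through surrogate pairs

total = bmp_chars + surrogate_chars

# Information-theoretic size of the code space, in bits.
exact_bits = math.log2(total)

# Additional bits beyond the 16 already needed for the basic characters.
extra_bits = exact_bits - 16

# Width of a fixed-size integer field that can hold the largest value.
whole_bits = (total - 1).bit_length()

print(total, exact_bits, extra_bits, whole_bits)
```

Both figures can be read off the same numbers: a fixed-width integer field would still have to be 21 bits wide, while the information content of the code space is only about 20.09 bits.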

Regards,
Martin



