A few questions about encoding

Νικόλαος Κούρας support at superhost.gr
Wed Jun 12 07:23:49 EDT 2013


On 12/6/2013 12:24 PM, Steven D'Aprano wrote:
> On Wed, 12 Jun 2013 09:09:05 +0000, Νικόλαος Κούρας wrote:
>
>> Isn't 14 bits way too many to store a character?
>
> No.
>
> There are 1114112 possible code points in Unicode. (And in Japan, they
> sometimes use TRON instead of Unicode, which has even more.)
>
> If you list out all the combinations of 14 bits:
>
> 0000 0000 0000 00
> 0000 0000 0000 01
> 0000 0000 0000 10
> 0000 0000 0000 11
> [...]
> 1111 1111 1111 10
> 1111 1111 1111 11
>
> you will see that there are only 16384 (2**14) such values. You can't
> fit 1114112 code points into just 16384 values.
>
>
>
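The counting argument above can be checked with a few lines of Python: n bits give 2**n distinct patterns, so 14 bits give 16384, far fewer than the 1114112 Unicode code points (0 through 0x10FFFF). Covering every code point needs 21 bits. This is a minimal sketch; the helper names `values_for_bits` and `bits_needed` are hypothetical and not from the thread.

```python
# Hypothetical helpers illustrating the bit-counting argument.

MAX_CODE_POINT = 0x10FFFF  # highest code point Unicode defines (1114111)

def values_for_bits(bits):
    """Number of distinct patterns of `bits` binary digits: 2**bits."""
    return 2 ** bits

def bits_needed(max_value):
    """Smallest bit width able to represent every integer 0..max_value."""
    return max_value.bit_length()

print(values_for_bits(14))          # distinct 14-bit patterns
print(MAX_CODE_POINT + 1)           # total code points, 0 through 0x10FFFF
print(bits_needed(MAX_CODE_POINT))  # minimum bits to cover them all
```

Running it prints 16384, 1114112 and 21, so 14 bits are too few rather than too many.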
Thanks Steven,
So, how many bytes does UTF-8 use to store code points > 127?

For example, code points 256, 1345 and 16474?
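One way to answer this is to compare UTF-8's range limits with what Python's own encoder produces. In UTF-8, code points below 0x80 take 1 byte, below 0x800 take 2 bytes, below 0x10000 take 3 bytes, and the rest (up to 0x10FFFF) take 4 bytes. A minimal sketch follows; `utf8_length` is a hypothetical helper that assumes a valid code point and does not reject surrogates.

```python
def utf8_length(code_point):
    """Bytes UTF-8 uses for one code point (assumes 0 <= code_point <= 0x10FFFF)."""
    if code_point < 0x80:        # U+0000..U+007F: 1 byte (ASCII)
        return 1
    if code_point < 0x800:       # U+0080..U+07FF: 2 bytes
        return 2
    if code_point < 0x10000:     # U+0800..U+FFFF: 3 bytes
        return 3
    return 4                     # U+10000..U+10FFFF: 4 bytes

for cp in (256, 1345, 16474):
    encoded = chr(cp).encode("utf-8")
    # The range table and Python's real encoder should agree.
    assert len(encoded) == utf8_length(cp)
    print(cp, hex(cp), len(encoded), encoded)
```

256 and 1345 both fall in the two-byte range (below 0x800, i.e. 2048), while 16474 (0x405A) needs three bytes.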



More information about the Python-list mailing list