A few questions about encoding
wxjmfauth at gmail.com
Sun Jun 23 11:51:41 EDT 2013
On Thursday, 20 June 2013 19:17:12 UTC+2, MRAB wrote:
> On 20/06/2013 17:37, Chris Angelico wrote:
> > On Fri, Jun 21, 2013 at 2:27 AM, <wxjmfauth at gmail.com> wrote:
> >> And all these coding schemes have something in common:
> >> they all work with a unique set of code points, more
> >> precisely a unique set of encoded code points (not
> >> the set of implemented code points (bytes)).
> >>
> >> This is just what the flexible string representation is not
> >> doing: it artificially divides Unicode into subsets and tries
> >> to handle each subset differently.
> >>
> >
> > UTF-16 divides Unicode into two subsets: BMP characters (encoded using
> > one 16-bit unit) and astral characters (encoded using two 16-bit units
> > in the D800::/5 netblock, or equivalent thereof). Your beloved narrow
> > builds are guilty of exactly the same crime as the hated 3.3.
> >
> UTF-8 divides Unicode into subsets which are encoded in 1, 2, 3, or 4
> bytes, and those who previously used ASCII still need only 1 byte per
> codepoint!
Sorry, but no, it does not work that way. This confuses
the set of encoded code points with the implementation
of those code points, the so-called code units.
UTF-8: how many bytes to hold an "a" in memory?
One byte.
Flexible string representation: how many bytes to
hold an "a" in memory? One byte? No, two.
(Funny, it consumes more memory to hold an ASCII char
than ASCII itself.)
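For anyone who wants to put numbers on this, here is a minimal measurement sketch. It assumes CPython 3.3 or later; the helper names utf8_len and per_char_cost are made up for illustration. It reports the UTF-8 encoded length of a character, and the extra bytes a str object grows by per added character, which leaves out the fixed per-object overhead. Results will differ on other interpreters and builds.

```python
import sys


def utf8_len(ch: str) -> int:
    """Number of bytes (code units) UTF-8 uses for one character."""
    return len(ch.encode("utf-8"))


def per_char_cost(ch: str, n: int = 1000) -> float:
    """Average growth in bytes of a str object per extra copy of ch.

    Hypothetical helper: it compares sys.getsizeof for a string of
    n + 1 copies against a string of one copy, so the constant
    object header cancels out. CPython-specific.
    """
    return (sys.getsizeof(ch * (n + 1)) - sys.getsizeof(ch)) / n


# An ASCII letter, a BMP character (euro sign), an astral character (G clef).
SAMPLES = ["a", "\u20ac", "\U0001d11e"]

for ch in SAMPLES:
    print(
        "U+%04X" % ord(ch),
        "utf-8 bytes:", utf8_len(ch),
        "str growth per char:", per_char_cost(ch),
    )
```

Note that sys.getsizeof of a single short string is dominated by the object header, so only the differences between sizes say anything about the storage per character.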
UTF-8: in a series of bytes implementing the encoded code
points supposed to hold a string, picking a byte and
finding which encoded code point it belongs to is no problem.
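The UTF-8 case can be sketched as follows. This is only an illustration, assuming well-formed UTF-8 input; the function name code_point_at is made up. It relies on the fact that continuation bytes always have the bit pattern 10xxxxxx, so one can step back from any byte to the lead byte of its code point.

```python
def code_point_at(data: bytes, index: int):
    """Return (start_offset, character) for the code point that the
    byte at data[index] belongs to. Assumes data is valid UTF-8."""
    start = index
    # Continuation bytes look like 10xxxxxx: step back to the lead byte.
    while start > 0 and (data[start] & 0xC0) == 0x80:
        start -= 1

    lead = data[start]
    if lead < 0x80:              # 0xxxxxxx: one byte
        length = 1
    elif lead >> 5 == 0b110:     # 110xxxxx: two bytes
        length = 2
    elif lead >> 4 == 0b1110:    # 1110xxxx: three bytes
        length = 3
    elif lead >> 3 == 0b11110:   # 11110xxx: four bytes
        length = 4
    else:
        raise ValueError("not a valid UTF-8 lead byte at offset %d" % start)

    return start, data[start:start + length].decode("utf-8")


# "a", euro sign, G clef: 1 + 3 + 4 = 8 bytes in UTF-8.
data = "a\u20ac\U0001d11e".encode("utf-8")
print(code_point_at(data, 2))   # a byte in the middle of the euro sign
print(code_point_at(data, 6))   # a byte in the middle of the G clef
```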
Flexible string representation: in a series of bytes
implementing the encoded code points supposed to hold a
string, picking a byte and finding which encoded code
point it belongs to is ... impossible!
This is one of the causes of the poor behaviour of this
flexible string representation.
These are the basics of any coding scheme, Unicode included.
jmf