flaming vs accuracy [was Re: Performance of int/long in Python 3]

Thu Mar 28 11:14:43 EDT 2013

On 28 mar, 15:38, Chris Angelico <ros... at gmail.com> wrote:
> On Fri, Mar 29, 2013 at 1:12 AM, jmfauth <wxjmfa... at gmail.com> wrote:
> > This flexible string representation is so absurd that not only
> > "it" does not know you can not write Western European Languages
> > with latin-1, "it" penalizes you by just attempting to optimize
> > latin-1. Shown in my multiple examples.
>
> PEP393 strings have two optimizations, or kinda three:
>
> 1a) ASCII-only strings
> 1b) Latin1-only strings
> 2) BMP-only strings
> 3) Everything else
>
> Options 1a and 1b are almost identical - I'm not sure what the detail
> is, but there's something flagging those strings that fit inside seven
> bits. (Something to do with optimizing encodings later?) Both are
> optimized down to a single byte per character.
>
> Option 2 is optimized to two bytes per character.
>
> Option 3 is stored in UTF-32.
>
> Once again, jmf, you are forgetting that option 2 is a safe and
> bug-free optimization.
>
> ChrisA
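
For reference, the per-character widths listed in the quote above can be checked from the interpreter. This is only a rough sketch, assuming CPython 3.3 or later (where PEP 393 applies); bytes_per_char is a hypothetical helper name, and the sample characters are arbitrary picks from each range. It compares the size of two strings of different lengths so that the fixed object header cancels out.

```python
import sys

def bytes_per_char(ch, n=1000):
    """Estimate storage per character for a string made only of `ch`.

    Assumes CPython >= 3.3. Subtracting the sizes of two strings of
    different lengths cancels the fixed per-object header.
    """
    longer = sys.getsizeof(ch * (2 * n))
    shorter = sys.getsizeof(ch * n)
    return (longer - shorter) // n

# One sample character from each range named in the quoted list.
samples = [
    ("ASCII", "a"),
    ("Latin-1", "\u00e9"),       # e with acute accent
    ("BMP", "\u20ac"),           # euro sign
    ("non-BMP", "\U0001F600"),   # a character outside the BMP
]

for label, ch in samples:
    print(label, bytes_per_char(ch))
```

Note that the width is chosen per string, from its widest character, so a single euro sign in an otherwise Latin-1 string moves the whole string to two bytes per character.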

As long as you are attempting to divide a set of characters into
chunks and trying to handle them separately, it will never work.

Read my previous post about the Unicode Transformation Format.
I know what PEP 393 does.

jmf