String performance regression from python 3.2 to 3.3

Wed Mar 13 12:49:46 EDT 2013

On Mar 13, 3:59 pm, Chris Angelico <ros... at gmail.com> wrote:
> On Wed, Mar 13, 2013 at 9:11 PM, rusi <rustompm... at gmail.com> wrote:
> > Uhhh..
> > Making the subject line useful for all readers
>
> I should have read this one before replying in the other thread.
>
> jmf, I'd like to see evidence that there has been a performance
> regression compared against a wide build of Python 3.2. You still have
> never answered this fundamental, that the narrow builds of Python are
> *BUGGY* in the same way that JavaScript/ECMAScript is. And believe you
> me, the utterly unnecessary hassles I have had to deal with when
> permitting user-provided .js code to script my engine have wasted
> rather more dev hours than you would believe - there are rather a lot
> of stupid edge cases to deal with.

This assumes that there are only three choices:
- a narrow build that is buggy (surrogate pairs for astral characters)
- a wide build that is 4-fold space inefficient for a wide variety of
common (ASCII) use cases
- the flexible string engine, which accepts a small loss of time
efficiency in exchange for space efficiency.
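To make the space side of this comparison concrete, here is a rough sketch of how many bytes of character data each option needs for a given string. It is an approximation under stated assumptions: it ignores per-object header overhead, models the narrow build as UTF-16 (2 bytes per code unit, with each astral character costing two units), the wide build as 4 bytes per character, and the flexible engine as PEP 393's 1, 2 or 4 bytes per character chosen by the string's largest code point. The function names are made up for illustration.

```python
def utf16_code_units(s: str) -> int:
    """Length a narrow (UTF-16) build would report: each astral char counts as 2."""
    return len(s.encode("utf-16-le")) // 2


def payload_bytes(s: str) -> dict:
    """Approximate bytes of character data per scheme (object headers ignored)."""
    max_cp = max(map(ord, s), default=0)
    if max_cp <= 0xFF:
        width = 1  # Latin-1 range, which includes ASCII
    elif max_cp <= 0xFFFF:
        width = 2  # the rest of the BMP
    else:
        width = 4  # at least one astral character
    return {
        "narrow": 2 * utf16_code_units(s),  # 2 bytes per UTF-16 code unit
        "wide": 4 * len(s),                 # 4 bytes per character, always
        "flexible": width * len(s),         # one width for the whole string
    }


if __name__ == "__main__":
    for text in ["hello", "caf\u00e9", "\u4e2d\u6587", "a\U0001F600"]:
        print(ascii(text), payload_bytes(text))
```

By this estimate, "hello" costs 20 bytes on a wide build against 5 under the flexible scheme, while a single astral character ("a\U0001F600") pushes the whole string to 4 bytes per character under the flexible scheme.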

There is a fourth choice: a narrow build that chooses to be partial
rather than buggy, i.e. when an astral character is encountered, an
exception is raised rather than trying to fudge it into a 16-bit
representation.
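The difference between the buggy and the partial narrow build can be sketched in pure Python by modelling the 16-bit code units directly. This is an illustration, not how CPython's narrow builds were actually implemented: `to_utf16_units` mimics the surrogate-pair fudge, while `to_ucs2_strict` and `AstralCharacterError` are hypothetical names for the "refuse rather than fudge" behaviour proposed above.

```python
class AstralCharacterError(ValueError):
    """Hypothetical error a 'partial' narrow build would raise."""


def to_utf16_units(s: str) -> list:
    """Buggy-narrow behaviour: astral characters become surrogate pairs."""
    units = []
    for ch in s:
        cp = ord(ch)
        if cp > 0xFFFF:
            cp -= 0x10000
            units.append(0xD800 + (cp >> 10))    # high (lead) surrogate
            units.append(0xDC00 + (cp & 0x3FF))  # low (trail) surrogate
        else:
            units.append(cp)
    return units


def to_ucs2_strict(s: str) -> list:
    """Partial-narrow behaviour: refuse astral characters outright."""
    units = []
    for i, ch in enumerate(s):
        cp = ord(ch)
        if cp > 0xFFFF:
            raise AstralCharacterError(
                f"U+{cp:04X} at index {i} is outside the BMP")
        units.append(cp)
    return units


if __name__ == "__main__":
    text = "a\U0001F600"
    print([hex(u) for u in to_utf16_units(text)])
    try:
        to_ucs2_strict(text)
    except AstralCharacterError as exc:
        print("refused:", exc)
```

With the surrogate-pair version, U+1F600 becomes the two units 0xD83D and 0xDE00, so length and indexing count code units rather than characters; the strict version fails loudly at the point of entry instead of silently miscounting later.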

I am hardly a Unicode expert, but my impression is this: while going
back to ASCII is not an option in today's internationalized world, most
actual uses of Unicode stay within the BMP (Basic Multilingual Plane).
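Whether that impression holds for a particular workload is easy to check. A minimal sketch, assuming the text is already available as an iterable of Python strings (an open text file works too, line by line); `within_bmp` and `bmp_fraction` are illustrative names, not library functions.

```python
from typing import Iterable


def within_bmp(s: str) -> bool:
    """True if every character lies in the BMP (U+0000..U+FFFF)."""
    return all(ord(ch) <= 0xFFFF for ch in s)


def bmp_fraction(texts: Iterable[str]) -> float:
    """Fraction of strings containing no astral characters (1.0 if empty)."""
    total = 0
    bmp_only = 0
    for s in texts:
        total += 1
        if within_bmp(s):
            bmp_only += 1
    return bmp_only / total if total else 1.0


if __name__ == "__main__":
    sample = ["hello", "na\u00efve caf\u00e9", "\u4e2d\u6587", "emoji \U0001F600"]
    print(bmp_fraction(sample))  # 3 of the 4 strings stay in the BMP: 0.75
```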

Further, if the choice is not between two Python executables but
between string engines chosen at startup by a command-line switch or
equivalent, the price may be quite small.