String performance regression from python 3.2 to 3.3

Terry Reedy tjreedy at udel.edu
Wed Mar 13 22:35:44 EDT 2013


On 3/13/2013 7:43 PM, Chris Angelico wrote:
> On Thu, Mar 14, 2013 at 3:49 AM, rusi <rustompmody at gmail.com> wrote:
>
>> This assumes that there are only three choices:
>> - narrow build that is buggy (surrogate pairs for astral characters)
>> - wide build that is 4-fold space inefficient for wide variety of
>> common (ASCII) use-cases
>> - flexible string engine that chooses a small tradeoff of space
>> efficiency over time efficiency.

Wrong. Python almost certainly runs faster with the new string 
representation. This has been explained previously more than once.
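A rough way to watch the flexible representation (PEP 393) choose its width, assuming CPython 3.3 or later: the per-character storage depends on the widest character in the string. The helper name `bytes_per_char` is hypothetical, and the measurement relies on `sys.getsizeof`, whose numbers are CPython-specific. Treat this as a sketch, not a benchmark.

```python
import sys

def bytes_per_char(ch: str) -> int:
    """Estimate how many bytes CPython stores per character for a string
    made only of `ch` (hypothetical helper; CPython 3.3+ assumed)."""
    # Both strings have the same internal kind, so the fixed object header
    # cancels out and the size difference is just 10 extra characters.
    short = ch * 10
    longer = ch * 20
    return (sys.getsizeof(longer) - sys.getsizeof(short)) // 10

# ASCII, Latin-1, other BMP, and astral samples.
for sample in ("a", "\u00e9", "\u20ac", "\U0001F600"):
    print(f"U+{ord(sample):04X}: {bytes_per_char(sample)} byte(s) per character")
```

On CPython 3.3+ this should report 1 byte for ASCII and Latin-1 text, 2 for other BMP characters, and 4 only for astral characters, whereas a wide build spent 4 bytes on every character.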

>> There is a fourth choice: narrow build that chooses to be partial over
>> being buggy. ie when an astral character is encountered, an exception
>> is thrown rather than trying to fudge it into a 16-bit
>> representation.

This is what tcl/tk does, and it is a damned nuisance. Completely
unacceptable for Python's string type.
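For contrast, a sketch of the two narrow-build behaviours being debated, using only the standard library. Both function names are hypothetical: `utf16_length` counts UTF-16 code units, which is what `len()` reported on a narrow build, and `require_bmp` is one possible reading of the proposed "partial" narrow build that rejects astral characters instead of splitting them into surrogate pairs.

```python
def utf16_length(s: str) -> int:
    """Length as a narrow (UTF-16) build would report it: each astral
    character counts as two code units (a surrogate pair)."""
    # utf-16-le emits no BOM, so every code unit is exactly 2 bytes.
    return len(s.encode("utf-16-le")) // 2

def require_bmp(s: str) -> str:
    """Hypothetical 'partial' narrow build: refuse any character outside
    the Basic Multilingual Plane instead of using surrogate pairs."""
    for index, ch in enumerate(s):
        if ord(ch) > 0xFFFF:
            raise ValueError(f"astral character U+{ord(ch):X} at index {index}")
    return s

text = "smile \U0001F600"
print(len(text), utf16_length(text))  # code points vs. UTF-16 code units
try:
    require_bmp(text)
except ValueError as exc:
    print("rejected:", exc)
```

Under such a design, any code that might receive astral text would need an `except` branch like the one above, which is roughly the nuisance described for tcl/tk.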
...
> It's complexity cost, though, and people would need to know when it
> would be worth giving Python that switch to change its string format.
> Plus, every C extension would need to cope with both formats. I
> personally doubt it'd be worth it, but if you want to knock together a
> patched CPython and get some timing stats, I'm sure this list or
> python-dev will be happy to discuss the matter. :)

I presume the smiley indicates that you know that python developers are 
too busy with real problems to have any interest in bogus solutions to 
bogus problems.

-- 
Terry Jan Reedy
