String performance regression from python 3.2 to 3.3

Steven D'Aprano steve+comp.lang.python at pearwood.info
Thu Mar 14 00:05:11 EDT 2013


On Thu, 14 Mar 2013 02:01:35 +0000, MRAB wrote:

> On 14/03/2013 00:55, Chris Angelico wrote:
>> On Thu, Mar 14, 2013 at 11:52 AM, MRAB <python at mrabarnett.plus.com>
>> wrote:
>>> On 13/03/2013 23:43, Chris Angelico wrote:
>>>>
>>>> On Thu, Mar 14, 2013 at 3:49 AM, rusi <rustompmody at gmail.com> wrote:
>>>>>
>>>>> On Mar 13, 3:59 pm, Chris Angelico <ros... at gmail.com> wrote:
>>>>>>
>>>>>> On Wed, Mar 13, 2013 at 9:11 PM, rusi <rustompm... at gmail.com>
>>>>>> wrote:
>>>>>> > Uhhh..
>>>>>> > Making the subject line useful for all readers
>>>>>>
>>>>>> I should have read this one before replying in the other thread.
>>>>>>
>>>>>> jmf, I'd like to see evidence that there has been a performance
>>>>>> regression compared against a wide build of Python 3.2. You still
>>>>>> have never answered this fundamental point: the narrow builds of
>>>>>> Python are *BUGGY* in the same way that JavaScript/ECMAScript is.
>>>>>> And believe you me, the utterly unnecessary hassles I have had to
>>>>>> deal with when permitting user-provided .js code to script my
>>>>>> engine have wasted rather more dev hours than you would believe -
>>>>>> there are rather a lot of stupid edge cases to deal with.
>>>>>
>>>>>
>>>>> This assumes that there are only three choices:
>>>>> - narrow build that is buggy (surrogate pairs for astral
>>>>>   characters)
>>>>> - wide build that is 4-fold space-inefficient for a wide variety
>>>>>   of common (ASCII) use-cases
>>>>> - flexible string engine that chooses a small tradeoff of space
>>>>>   efficiency over time efficiency.
>>>>>
>>>>> There is a fourth choice: a narrow build that chooses to be
>>>>> partial rather than buggy, i.e. when an astral character is
>>>>> encountered, an exception is thrown rather than trying to fudge it
>>>>> into a 16-bit representation.
>>>>
>>>>
>>>> As a simple factual matter, narrow builds of Python 3.2 don't do
>>>> that. So it doesn't factor into my original statement. But if you're
>>>> talking about a proposal for 3.4, then sure, that's a theoretical
>>>> possibility. It wouldn't be "buggy" in the sense of "string
>>>> indexing/slicing unexpectedly does the wrong thing", but it would
>>>> still be incomplete Unicode support, and I don't think people would
>>>> appreciate it. Much better to have graceful degradation: if there are
>>>> non-BMP characters in the string, then instead of throwing an
>>>> exception, it just makes the string wider.
>>>>
>>> [snip]
>>> Do you mean that instead of switching between 1/2/4 bytes per
>>> codepoint it would switch between 2/4 bytes per codepoint?
>>
>> That's my point. We already have the better version. :)
>>
> If a later version of Python switched between 2/4 bytes per codepoint,
> how much difference would it make in terms of memory and speed compared
> to Python 3.2 (fixed width) and Python 3.3 (3 widths)?
> 
> The vast majority of the time, 2 bytes per codepoint is sufficient, but
> would that result in less switching between widths and therefore higher
> performance, or would the use of more memory (2 bytes when 1 byte would
> do) offset that?
> 
> (And I'm talking about significant differences here.)


That depends on how you use the strings. Because strings are immutable, 
there isn't really anything like "switching between widths" -- the width 
is set when the string is created, and then remains fixed.
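
For example, concatenation doesn't widen an existing string, it 
creates a new one. Here is a minimal sketch using sys.getsizeof (the 
exact byte counts are implementation details and will vary between 
builds, but the pattern should hold on a 3.3 flexible-width build):

import sys

a = 'x' * 100            # pure ASCII: stored 1 byte per codepoint
b = a + '\U0001F600'     # a brand-new string; its width (4 bytes per
                         # codepoint, because of the astral emoji) is
                         # chosen once, at creation time
print(sys.getsizeof(a))  # ~100 bytes of character data, plus overhead
print(sys.getsizeof(b))  # ~4*101 bytes of character data, plus overhead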

It is true that when you create a string, Python sometimes has to do 
some work to determine what width it needs, but that's effectively a 
fixed cost per string. It's small compared to the cost of other 
string operations, but it is a real cost. If all you do is create 
strings and then throw them away, as JMF tends to do in his 
benchmarks, you repeatedly pay that cost without ever seeing the 
benefit.
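
Something like this sketch shows the shape of those benchmarks (the 
string and the repeat count here are illustrative, not JMF's actual 
figures):

from timeit import timeit

# replace() builds a brand-new string on every call, so this pays
# the per-creation width scan each time and never amortises it over
# later use of the string.
print(timeit("s.replace('a', 'b')", setup="s = 'a' * 1000",
             number=100000))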

On the other hand, Python itself is *full* of ASCII strings, and many 
users work with lots of Latin-1 strings. Both of these save 
significant amounts of memory in 3.3: almost 50% of what they would 
use in a narrow build, and almost 75% in a wide build.
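
You can see the saving directly with sys.getsizeof (a rough sketch; 
the fixed per-string overhead differs between builds, but for large 
strings the per-codepoint cost dominates):

import sys

for s in ('a' * 1000,            # ASCII: 1 byte per codepoint in 3.3
          '\xe9' * 1000,         # Latin-1: also 1 byte per codepoint
          '\u20ac' * 1000,       # other BMP: 2 bytes per codepoint
          '\U00010000' * 1000):  # astral: 4 bytes per codepoint
    print(sys.getsizeof(s))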

This memory saving has real consequences, performance-wise. Python's 
memory management can be more efficient, since objects in the heap 
are smaller. I'm not sure if objects ever move in the heap (I think 
Java's memory manager does move objects around, hence Jython will do 
so, but I'm not sure about CPython), but even if they don't, it's 
obviously easier to allocate a block of a given size when more of the 
heap is free, and more of the heap will be free if the pre-existing 
objects in it are smaller.

I expect that traversing a block of memory byte-by-byte may be faster 
than traversing the same text stored at 2 or 4 bytes per codepoint, 
simply because there is less memory to touch. My testing suggests 
that iterating over a 1-byte-wide string is about three times faster 
than iterating over a 2-byte or 4-byte-wide string, but that may 
depend on your OS and hardware.
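
A quick sketch of the kind of test I mean (the numbers you get will 
vary with your OS and hardware, as I said):

from timeit import timeit

for setup in ("s = 'a' * 10000",            # 1 byte per codepoint
              "s = '\\U00010000' * 10000"): # 4 bytes per codepoint
    print(timeit("for c in s: pass", setup=setup, number=1000))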

Finally, there may be CPU effects, to do with how quickly strings can be 
passed through the CPU pipelines, whether data is found in the CPU cache 
or not, etc. Obviously this too will depend on the size of the strings. 
You can squeeze 1K of data through the CPU faster than 4K of data.

In practice, how much of an effect will this have? It's hard to say 
without testing, but experience with real-world applications suggests 
that Python 3.3 not only saves significant memory over a 3.2 narrow 
build, but for real-world code it is often a little faster as well.



-- 
Steven


