Pyhon 2.x or 3.x, which is faster?

Steven D'Aprano steve at pearwood.info
Sat Mar 12 00:59:53 EST 2016


On Sat, 12 Mar 2016 12:10 pm, Dennis Lee Bieber wrote:

> On Fri, 11 Mar 2016 22:24:45 +0000, BartC <bc at freeuk.com> declaimed the
> following:
> 
>>On 11/03/2016 21:59, Mark Lawrence wrote:
>>> On 11/03/2016 18:57, BartC wrote:
>>
>>> def test():
>>>      s=""
>>>      for i in range(10000000):
>>>          s+="*"
>>>      print (len(s))
>>>
>>> test()
>>
>>> The minor snag that you might like to correct with your microbenchmark,
>>> which any experienced Python programmer knows, is that you *NEVER, EVER*
>>> create strings like this.
>>
>>Why not? Chris said his version runs much faster (even allowing for
>>different machines), and might have a special optimisation for it.
>>
> 
> Everytime that += is executed, Python has to allocate a new chunk of
> memory equal (minimum -- it may chunk larger) to the combined length of
> "s" and "*", COPY the bytes/characters from "s" into the new memory chunk,
> copy the "*" into the end of the new memory chunk, bind the name "s" to
> the new chunk, and likely deallocate the old memory chunk that used to be
> bound to "s".

Sometimes.

What you say is true in Jython and IronPython. And it was always true in
CPython until version 2.3. And it is sometimes true in CPython even today.

But since Python 2.3, CPython is smart enough to recognise when a string has
only one reference to it, and, *if possible*, resize it in place. This
optimization seems to work well under Linux, but unfortunately due to the
vagaries of how memory is managed by the OS, it seems to often fail under
Windows, so Python will fall back to the slow copy-and-append
implementation.



[...]
> The common recommendation is to accumulate the substrings into a LIST
> structure (while the list will undergo resizing, that is done at a fairly
> well-understood rate, and only requires copying of references to the
> substrings, not the strings themselves). At the end, one invokes .join()
> on the list. .join() can scan the list elements summing the lengths of the
> substrings, allocate ONCE the memory to hold the result, and only then
> copy the substrings into the result string.

This remains the best advice for assembling strings from a large number of
small pieces.




-- 
Steven




More information about the Python-list mailing list