Speeding up: s += "string"

logistix at cathoderaymission.net logistix at cathoderaymission.net
Tue Apr 15 13:53:28 EDT 2003


Lulu of the Lotus-Eaters <mertz at gnosis.cx> wrote in message news:<mailman.1050393203.4704.python-list at python.org>...

> 
> It seems to me that the behavior CAN be improved.  If strings were to
> use a preallocation strategy--similar to what lists do--MANY string
> appends could avoid a memory copy on the original string.  The intial
> value could stay where it was, and the extra characters could live in
> the already-available contiguous space.  Obviously, the utilized length
> would need to be stored somewhere.  Once in a while, of course, the
> preallocation would fill up; in that case you'd need to do a costly
> strcopy()... but this could be avoided MOST of the time, ISTM.
> 

This is the way the current list optimization works.  It's based on
(generally valid) assumption that re'malloc has three cases:
    You're re'mallocing to a smaller size, shrink allocation
    You're re'mallocing to the same size, do nothing
    YOu're re'mallocing to a bigger size, grab new memory and do a
memcpy()

The current optimization for list adjusts the malloc'ed memory based
on the size current size.  Even for something like a 512Meg list, it
only uses a maximum of 16 Meg for padding.

The real problem is the fact that strings are immutable.  From the
bytecode perspective, I don't think it's possible to reliably assume
that if string has a refcount = x in a '+' operation that it's not a
shared reference (where you couldn't optimize)

I submitted a patch to use the same allocation technique for
arraymodule, and it decreased my test case of 8 million appends from
12 hours to 4-5 seconds.  Needless to say, the patch was accepted and
will be 2.3 ;)

Someone else had a feature request to allow strings to be .appended()
to array.array("c") objects, but there's no patch yet.  If this were
implemented, array.array("c") would pretty much become a
"mutableString" class that'd be easy to use and run at reasonable
speeds.  [I was thinking about writing a patch, and then started
obsessing about what sort of unicode coercions should take place and
my brain froze up]




More information about the Python-list mailing list