O(n^2) is bad - can it be fixed?

Helen Dawson helend at accessone.com
Tue May 22 01:38:47 EDT 2001


The following script tests appending to a list and to a string.


def AppendTest(listorstring):
    for i in range(0, 100):
        for x in range(0, 2000):
            listorstring += " "
        print i

AppendTest([])
AppendTest("")


The test with the list runs very swiftly and smoothly - the numbers
are printed out at a steady rate. The test with the string runs
slowly, and the speed that the numbers appear slows down - a lot. In
fact, I have never seen the string test end - I always abort it out
of boredom before it gets to twenty five.

Without even looking in the Python internals it is obvious what is
happening. In the string case, each time a character is appended the
entire string is copied. Therefore adding n characters requires
n^2/2 bytes to be copied, making processing large files (where large
is > about 10K) very inefficient - and processing files larger
than about 100K is virtually impossible.

List does much better. Why? It probably does the same thing that the
STL vector typically does - the allocabe space doubles whenever it
runs out of room, and then the newly allocated space is filled in
as needed.

So, why can't string do that?

When we could only go string = string + " " then fixing this problem
was probably impossible, but with += it seems like it should be quite
possible - maybe even easy!

Right now if I am producing a text file a character at a time I have
to do it to a list and then use string.join() to convert at the end.
That works, but seems clunky.

It's also a landmine for the naive. I recognized the problem
immediately, but many people would simply assume that Python was slow,
when it took an hour to process a one Mbyte file.

I'm using Python 2.1, on Windows.

Bruce Dawson.

P.S. Is this a known issue?





More information about the Python-list mailing list