O(n^2) is bad - can it be fixed?

Delaney, Timothy tdelaney at avaya.com
Tue May 22 02:12:51 EDT 2001


> def AppendTest(listorstring):
>     for i in range(0, 100):
>         for x in range(0, 2000):
>             listorstring += " "
>         print i
> 
> AppendTest([])
> AppendTest("")
> 
> The test with the list runs very swiftly and smoothly - the numbers
> are printed out at a steady rate. The test with the string runs
> slowly, and the speed that the numbers appear slows down - a lot. In
> fact, I have never seen the string test end - I always abort it out
> of boredom before it gets to twenty five.
> 
> Without even looking in the Python internals it is obvious what is
> happening. In the string case, each time a character is appended the
> entire string is copied. Therefore adding n characters requires
> n^2/2 bytes to be copied, making processing large files (where large
> is > about 10K) very inefficient - and processing files larger
> than about 100K is virtually impossible.

[snip]

> When we could only go string = string + " " then fixing this problem
> was probably impossible, but with += it seems like it should be quite
> possible - maybe even easy!
> 
> Right now if I am producing a text file a character at a time I have
> to do it to a list and then use string.join() to convert at the end.
> That works, but seems clunky.

Strings are immutable. When you append to a string, it creates a new string
(as you correctly surmised).

Having immuatble strings means that instances can be shared. The string
doesn't change behind your back. This is a Good Thing (TM). If you have used
Java it has the same thing: strings are immutable. If you want an immutable
string, you have to use a StringBuffer.

There are a number of pythonic ways of doing what you want. One is the one
you discovered: concatenating into a list, then joining.

Another method would be to use the array built-in module. This will probably
save memory over the list version, and is almost the equivalent of the Java
StringBuffer, except it's so much more ;) Testing both of these though,
using a list was *much* faster than using an array.

Perhaps you need to look at what you really want to do. As I understand it,
you wish to:

1. Build a string
2. Write the string to a file.

Is there perhaps a better way to do what you want?

Or possibly you wish to read a large file into memory. Is it text? In that
case, there are several standard ways to do this in python. The main methods
you want to look for are:

	readlines()
	xreadlines()
	readline()

In addition, your basic function has a major problem, aside from the test.
When you pass in a list, you are modifying that list, and you will be able
to use the changed contents outside of the function. But when you pass in a
string, you're not modifying the string - you're creating a new one each
time. Unless you pass the generated string as a return value, you won't be
able to use it outside of the function. All you are doing is rebinding the
local reference to the string.

The difference between mutable and immutable objects is important in any
language. Python is one where it is vital.

Tim Delaney




More information about the Python-list mailing list