slow joinings of strings

Tue Jan 30 08:03:27 EST 2001

On Tue, Jan 30, 2001 at 01:34:58PM +0100, Karol Bryd wrote:

> I want to read a file (0.6MB, 10000 lines) into memory, and want to do it as
> fast as possible, this code does it, but is terribly slow

> fp = open(file, 'r')
> s = ''
> while 1:
>         line = fp.readline()
>         if line == '': break
>         s = s + line

> (executing time 25 sec)

> At first I thought that this is caused by readline() and lack of buffering
> but after removing "s = s + line" executing time decreased to 0.7 seconds!
> The question is how to join two strings in a more efficient way?

The usual answer, strangely enough, is to use string.join. Who would have
guessed? :-)

import string

file = open(filename, "r")
l = []
while 1:
    line = file.readline()
    if line == "":    # readline() returns "" only at end of file
        break
    l.append(line)
s = string.join(l, "") # or "".join(l), in Python 2.0+

It's slow because each 's = s + line' creates a new string object and copies
everything accumulated so far into it (and usually deletes the old ones: it
decreases their refcount, and if it hits 0, deallocates them.) The total
amount of copying therefore grows roughly with the square of the number of
lines. By using a list and creating the string at the end, in a single
operation, the result is built only once (though a lot of line strings are
deleted afterwards.)
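
To put rough numbers on that, here is a small sketch that counts how many
characters get written under each approach. It is only a model: it assumes
that every 's = s + line' fills a completely new string, which an interpreter
may sometimes manage to avoid. The function names and the made-up 60-character
lines are mine, chosen to resemble the 0.6MB, 10000-line file above.

```python
def chars_copied_by_concat(lines):
    """Characters written if every 's = s + line' fills a brand-new string."""
    copied = 0
    length = 0
    for line in lines:
        length += len(line)   # size of the new string built in this step
        copied += length      # old contents plus the new line are copied in
    return copied


def chars_copied_by_join(lines):
    """Characters written when the result is assembled once at the end."""
    return sum(len(line) for line in lines)


# Made-up data: 10000 lines of 60 characters, about 0.6MB in total.
lines = ["x" * 59 + "\n"] * 10000

concat_cost = chars_copied_by_concat(lines)
join_cost = chars_copied_by_join(lines)
print(concat_cost, join_cost)  # 3000300000 600000
```

Under that assumption the loop with '+' writes about 5000 times as many
characters as the single join does.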

Alternatively, the 'readlines' method can also be helpful:

import string

file = open(filename, "r")
l = file.readlines()
# manipulate 'l' here, for instance in a map() or list comprehension
s = string.join(l, "")
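
As an illustration of the "manipulate" step, here is a minimal sketch that
strips trailing whitespace from every line before joining. The sample text is
invented, io.StringIO merely stands in for a real open(filename, "r"), and I
use the "".join(...) spelling rather than string.join.

```python
import io

# Stand-in for open(filename, "r"); the contents are made up for illustration.
fp = io.StringIO("first line   \nsecond line\t\nthird line\n")
lines = fp.readlines()

# Example manipulation: drop trailing whitespace, then put the newline back.
cleaned = [line.rstrip() + "\n" for line in lines]

# Build the final string once, at the end.
s = "".join(cleaned)
```

The lines are still only joined once, so the per-line work stays cheap.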

Or, of course, the 'read' method:

file = open(filename, "r")
s = file.read()

Provided you don't actually want to do anything with the separate lines :)

-- 
Thomas Wouters <thomas at xs4all.net>
