slow joinings of strings
Thomas Wouters
thomas at xs4all.net
Tue Jan 30 08:03:27 EST 2001
On Tue, Jan 30, 2001 at 01:34:58PM +0100, Karol Bryd wrote:
> I want to read a file (0.6MB, 10000 lines) into memory, and want to do it as
> fast as possible, this code does it, but is terribly slow
> fp = open(file, 'r')
> s = ''
> while 1:
>     line = fp.readline()
>     if line == '': break
>     s = s + line
> (executing time 25 sec)
> At first I thought that this is caused by readline() and lack of buffering
> but after removing "s = s + line" executing time decreased to 0.7 seconds!
> The question is how to join two strings in a more efficient way?
The usual answer, strangely enough, is to use string.join. Who would have
guessed? :-)
import string

file = open(filename, "r")
l = []
while 1:
    line = file.readline()
    if line == "":
        break
    l.append(line)
s = string.join(l, "") # or "".join(l), in Python 2.0+
It's slow because 's = s + line' creates a new string object on every pass
through the loop, copying everything accumulated so far into it (and usually
deletes the previous two -- decreases their refcount, and if it hits 0,
deallocates them.) The total amount of copying therefore grows roughly with
the square of the file size. By using a list and creating the string at the
end, in a single operation, only one string object is created (though a lot
are deleted.)
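For comparison, here is a rough sketch of the two approaches side by side,
written for a current Python 3 rather than the Python of this thread. The
function names (read_by_concat, read_by_join) and the sample data are made up
for illustration, and io.StringIO stands in for the real file. Note that
recent CPython versions can sometimes extend the string in place for
's = s + line', so the timing gap may be much smaller than the one reported
above; the join form is still the one that doesn't depend on that.

```python
import io

def read_by_concat(fp):
    """Build the result with repeated 's = s + line' (the slow pattern)."""
    s = ""
    for line in fp:
        s = s + line  # may copy everything accumulated so far
    return s

def read_by_join(fp):
    """Collect the lines in a list and join them once at the end."""
    lines = []
    for line in fp:
        lines.append(line)
    return "".join(lines)

# Hypothetical sample data standing in for the 10000-line file.
sample = "".join("line %d\n" % i for i in range(10000))

# Both approaches produce the same string; only the cost differs.
assert read_by_concat(io.StringIO(sample)) == sample
assert read_by_join(io.StringIO(sample)) == sample
```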
Alternatively, the 'readlines' method can also be helpful:
file = open(filename, "r")
l = file.readlines()
# manipulate 'l' here, for instance in a map() or list comprehension
s = string.join(l, "")
Or, of course, the 'read' method:
file = open(filename, "r")
s = file.read()
Provided you don't actually want to do anything with the separate lines :)
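A small sketch of the 'readlines' and 'read' variants in current Python 3,
for anyone who wants to try them. The helper names (read_whole,
read_via_lines), the optional transform argument and the temporary file are
assumptions for the sake of a runnable example, not part of the original
code.

```python
import os
import tempfile

def read_whole(path):
    """Read the entire file in one call; no per-line processing."""
    with open(path, "r") as fp:
        return fp.read()

def read_via_lines(path, transform=None):
    """Read all lines, optionally transform each one, then join once."""
    with open(path, "r") as fp:
        lines = fp.readlines()
    if transform is not None:
        lines = [transform(line) for line in lines]
    return "".join(lines)

# Write a small throwaway file to exercise both helpers.
fd, path = tempfile.mkstemp(text=True)
with os.fdopen(fd, "w") as fp:
    fp.write("one\ntwo\nthree\n")
try:
    assert read_whole(path) == "one\ntwo\nthree\n"
    assert read_via_lines(path) == read_whole(path)
    assert read_via_lines(path, str.upper) == "ONE\nTWO\nTHREE\n"
finally:
    os.remove(path)
```

If the lines never need touching, read_whole is the simplest choice; the
readlines form only earns its keep when something happens between reading
and joining.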
--
Thomas Wouters <thomas at xs4all.net>
Hi! I'm a .signature virus! copy me into your .signature file to help me spread!