list to string

David Bolen db3l at fitlinxx.com
Tue Feb 27 21:00:16 EST 2001


"Delaney, Timothy" <tdelaney at avaya.com> writes:

> *Never* have a loop like (for example):
> 
> 	text = ''
> 
> 	for line in f.xreadlines():
> 		text += line
> 
> instead do
> 
> 	text = []
> 
> 	for line in f.xreadlines():
> 		text.append(line)
> 
> 	text = string.join(text, '\n')
> 
> It will be much faster and use much less memory. (...)

True, it will likely be faster, but I'm not sure I agree on the memory
point.  It's interesting that you use xreadlines to avoid reading the
whole file into memory, but then you stick every line into a list that
has to be kept in memory until it can be joined, at which point you
use twice as much memory to build up the final string.  Why not just
say:

    text = f.readlines()
    text = string.join(text, '')

(or as you point out later, "text = f.read()" :-))

It would do the same thing, use the same amount of memory, and
probably run even faster, since it can suck the whole file in as
quickly as possible.
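
To make the comparison concrete, here is a minimal sketch of the four variants discussed so far.  It assumes a current Python 3 interpreter (str.join and plain file iteration instead of string.join and xreadlines), and it uses io.StringIO with made-up SAMPLE contents as a stand-in for a real file.  The function names are only illustrative.

```python
import io

# Hypothetical file contents; io.StringIO stands in for an open file.
SAMPLE = "first line\nsecond line\nthird line\n"

def concat_loop(f):
    """Build the text by repeated string concatenation."""
    text = ''
    for line in f:
        text += line
    return text

def append_and_join(f):
    """Collect the lines in a list, then join them once at the end."""
    parts = []
    for line in f:
        parts.append(line)
    # Each line still carries its own newline, so join on the empty string.
    return ''.join(parts)

def readlines_and_join(f):
    """Let the file object build the list of lines, then join it."""
    return ''.join(f.readlines())

def read_all(f):
    """Read the whole file as one string."""
    return f.read()

variants = (concat_loop, append_and_join, readlines_and_join, read_all)
results = [fn(io.StringIO(SAMPLE)) for fn in variants]
all_equal = all(r == SAMPLE for r in results)
print(all_equal)
```

Note that the lines keep their trailing newlines, so joining on anything other than the empty string would change the text.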

The first loop (repeated concatenation) thrashes memory more (although
Python probably reuses the same space a lot), but probably uses
somewhat less memory, since at its peak the last concatenation holds
the full image alongside the previous string, which is the file
contents minus the final line.  Also, the file ends up in memory as a
single string, and not as a list with the per-line string object
overhead and a list reference for each line.  If you've got lots of
lines in the file, that per-object overhead could add up to quite a
bit.
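
The per-object overhead can be estimated roughly with sys.getsizeof.  This is only a sketch: it assumes CPython on Python 3 (getsizeof is implementation-specific), the line data is made up, and the numbers will differ between versions and platforms.

```python
import sys

# Made-up data: 10000 short lines, each a separate string object.
lines = ["line %d\n" % i for i in range(10000)]
joined = ''.join(lines)

# The list itself (its array of references) plus every per-line string object.
list_bytes = sys.getsizeof(lines) + sum(sys.getsizeof(s) for s in lines)

# The same characters held as one string object.
string_bytes = sys.getsizeof(joined)

print("list of lines: %d bytes" % list_bytes)
print("single string: %d bytes" % string_bytes)
```

With short lines like these, the fixed cost of each string object dominates, which is the overhead described above.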

Which all sort of goes to say that I wouldn't use the term "*Never*"
as you do for the first case.  It's possible that, when first
developing the algorithm, the first case makes sense and is the
clearest implementation.  Reading that code makes it very clear what
is happening without having to look ahead some lines (e.g., the
reader doesn't have to ask why the lines are being stored in a list).

Of course, once you start unit testing and fleshing out the algorithm
you may find this code to be a bottleneck and choose to optimize it in
various ways, but that needn't be your first inclination, and perhaps
needn't be done at all if the files involved or the overall process
make this a non-performance-critical piece of code.
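
One way to check whether the loop really is a bottleneck is to time both forms on representative input before rewriting anything.  The sketch below assumes Python 3 and the standard timeit module.  SAMPLE is made-up data, and the relative timings depend heavily on the interpreter and version, so treat the output as a measurement of your own setup rather than a general result.

```python
import io
import timeit

# Made-up input: 2000 identical lines standing in for a real file.
SAMPLE = "some text on a line\n" * 2000

def concat_loop():
    """Repeated string concatenation, one line at a time."""
    text = ''
    for line in io.StringIO(SAMPLE):
        text += line
    return text

def append_and_join():
    """Append each line to a list and join once at the end."""
    parts = []
    for line in io.StringIO(SAMPLE):
        parts.append(line)
    return ''.join(parts)

# Run each function 20 times and report the total elapsed seconds.
t_concat = timeit.timeit(concat_loop, number=20)
t_join = timeit.timeit(append_and_join, number=20)
print("concat: %.4fs  join: %.4fs" % (t_concat, t_join))
```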

Performance concerns are reasonable in any piece of code - I just
wouldn't go so far as to declare one algorithm (that aside from
implementation performance issues is perfectly clear) never worth
doing.

--
-- David
-- 


