list to string

Delaney, Timothy tdelaney at avaya.com
Tue Feb 27 21:35:57 EST 2001


> > 	text = ''
> > 
> > 	for line in f.xreadlines():
> > 		text += line

> True it will likely be faster, but I'm not sure I agree on the memory
> point.  It's interesting that you use xreadlines to avoid reading the
> whole file into memory, but then you go and stick it into a list that
> has to be kept in memory until it can be joined, at which point you
> use twice as much memory to build up the final string.  Why not just
> say:
> 
>     text = f.readlines()
>     text = string.join(text)
> 
> (or as you point out later, "text = f.read()" :-))

Because the whole point was to concentrate on the time and memory used by
the string concatenations. We were discussing string concatenation vs
string.join(). I used "reading a file line by line" as an example (as I
stated) simply because it is a very common idiom which demonstrates the
exact semantics we were discussing.
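The contrast under discussion can be sketched as follows (a minimal modern-Python sketch; today's spelling of string.join(seq) is ''.join(seq), and xreadlines() is gone since files iterate line by line directly):

```python
lines = ["alpha\n", "beta\n", "gamma\n"]  # stands in for lines read from a file

# Concatenation idiom: each += may copy the whole growing string,
# so total work can be quadratic in the number of lines.
text_concat = ""
for line in lines:
    text_concat += line

# Join idiom: one pass over the sequence, one final allocation.
text_join = "".join(lines)

assert text_concat == text_join
```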

> it would do the same thing, use the same memory space, and probably
> run even faster since it can suck the whole file in as quickly as
> possible.

I agree 100%. Which is why I said that these were *better* for the specific
case of my example, but they don't apply to the discussion in question.

> The former case thrashes memory more (although Python probably reuses
> the same space a lot), but probably uses somewhat less memory, since
> in the end, the final copy builds the full image from the file
> contents minus the final line.  Also, the file is in memory as a
> single string, and not a list with the per-line string object overhead
> and list reference.  If you've got lots of lines in the file, that
> per-object overhead could add up to quite a bit.

This is perfectly valid. Perhaps the best way in this *particular* case
(without using read()) would be

	text = string.join(f.xreadlines(), '')

However, once again this is not the discussion in question. Glen was asking
about concatenating the (string) elements of a list which he had in memory.
I was extending this to larger sequences, and the simplest way of getting a
large sequence is by grabbing the lines from a file.
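In today's Python the suggestion above, string.join(f.xreadlines(), ''), is spelled ''.join(f), since an open file is itself an iterable of lines (io.StringIO below is just a stand-in for a real file object):

```python
import io

# Stand-in for an open text file.
f = io.StringIO("one\ntwo\nthree\n")

# Joining the file object directly concatenates its lines,
# newlines included, without building an explicit loop.
text = "".join(f)

assert text == "one\ntwo\nthree\n"
```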

> Which all sort of goes to say that I wouldn't use the term "*Never*"
> as you do in the first case.  It's possible that when first developing
> the algorithm, that the first case makes sense and is the clearest
> implementation.  I think that reading that code makes it very clear
> what is happening without having to look ahead some lines (e.g., the
> reader doesn't have to ask why the lines are being stored in a list).

Okay - perhaps I should have qualified "never" ;) *Never* use that algorithm
in final code in Python. Once you know the core Python libraries, never use
it at all, as your first thought when confronted with this situation should
be to use the Python idiom of string.join(). Better?

> Of course, once you start unit testing and fleshing out the algorithm
> you may find this code to be a bottleneck and choose to optimize it in
> various ways, but that needn't be the first inclination, nor perhaps
> even done at all if the files involved or the overall process makes
> this a non-performance critical piece of code.

And once it becomes second nature to use string.join() you can eliminate
the need to consider optimising this part of the code (unless it is such a
bottleneck that you think you can make it faster by rewriting it as a C
extension, but that's another story, entirely unrelated to the original
discussion).

> Performance concerns are reasonable in any piece of code - I just
> wouldn't go so far as to declare one algorithm (that aside from
> implementation performance issues is perfectly clear) never worth
> doing.

Personally, I find the string.join() idiom much clearer and more readable
than a loop concatenating the elements of a list.

Of course, all this becomes moot as soon as you have a sequence which
contains elements other than strings ;)
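To make the last point concrete (a small sketch, not from the original post): join only accepts strings, so a mixed sequence must be converted element by element first.

```python
items = [1, 2.5, "three"]

# join refuses non-string elements outright.
try:
    "-".join(items)
except TypeError:
    pass  # sequence item 0: expected str, got int

# Converting each element restores the idiom.
result = "-".join(str(x) for x in items)
assert result == "1-2.5-three"
```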

Tim Delaney
Avaya Australia



