Why doesn't join() call str() on its arguments?

Andy Dustman farcepest at gmail.com
Thu Feb 17 19:20:01 EST 2005


I did some timings of ''.join( <list comprehension> ) vs. ''.join(
<generator expression> ) and found that generator expressions were
slightly slower, so I looked at the source code to find out why. It
turns out that the very first thing string_join(self, orig) does is:

        seq = PySequence_Fast(orig, "");

thus iterating over your generator expression and creating a list,
making it less efficient than passing a list in the first place via a
list comprehension.
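
A comparison like this can be reproduced with the standard timeit
module. The following is only a minimal sketch: the test data, its
size, and the repeat counts are arbitrary assumptions rather than the
setup used for the timings above, and it targets a current Python 3,
so the numbers will not match those from 2.4.

```python
import timeit

# Hypothetical test data: 10,000 short strings (the size is arbitrary).
words = [str(i) for i in range(10_000)]

def join_listcomp():
    # The list is built first, then handed to join() as a real sequence.
    return ''.join([w for w in words])

def join_genexp():
    # join() receives an iterator and has to materialise it internally.
    return ''.join(w for w in words)

if __name__ == "__main__":
    for func in (join_listcomp, join_genexp):
        # Take the best of several runs to reduce timing noise.
        best = min(timeit.repeat(func, number=20, repeat=5))
        print(f"{func.__name__}: {best:.4f} s for 20 calls")
```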

The reason it does this is exactly what you said: it iterates over the
sequence to sum the lengths of the items, adds the length of the n-1
separators, and allocates a string of that size. Then it iterates over
the sequence a second time to copy the pieces into the string.
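
The two-pass strategy can be modelled in pure Python. This is an
illustrative sketch only, not a translation of the C code in
string_join(): the name two_pass_join is hypothetical, and it works on
bytes, with a bytearray standing in for the preallocated string.

```python
def two_pass_join(sep, iterable):
    """Illustrative model of a two-pass join over bytes items."""
    # Materialise the input into a concrete sequence, roughly what
    # PySequence_Fast() does for anything that is not a list or tuple.
    seq = iterable if isinstance(iterable, (list, tuple)) else list(iterable)
    if not seq:
        return b''

    # Pass 1: total size = sum of item lengths + (n - 1) separators.
    total = sum(len(item) for item in seq) + len(sep) * (len(seq) - 1)

    # Allocate the result buffer once, at its final size.
    buf = bytearray(total)

    # Pass 2: copy each separator and item into its place.
    pos = 0
    for i, item in enumerate(seq):
        if i:
            buf[pos:pos + len(sep)] = sep
            pos += len(sep)
        buf[pos:pos + len(item)] = item
        pos += len(item)
    return bytes(buf)
```

Because the total length is needed before anything is copied, an
iterator that can only be consumed once has to be turned into a list
first, which is where the extra cost for generator expressions comes
from.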

For generators, you'd have to make a trial allocation and start
appending stuff as you go, periodically resizing. This *might* end up
being more efficient in the case of generators, but the only way to
know for sure is to write the code and benchmark it.
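
One possible shape for such a single-pass join, again as a pure-Python
model on bytes. Everything here is an assumption made for
illustration: the name one_pass_join, the initial capacity, and the
doubling growth policy are hypothetical choices, not an existing or
proposed implementation.

```python
def one_pass_join(sep, iterable, initial_capacity=64):
    """Illustrative single-pass join that grows its buffer as needed."""
    buf = bytearray(initial_capacity)  # trial allocation
    used = 0                           # number of bytes actually filled
    first = True
    for item in iterable:
        # Every item after the first is preceded by the separator.
        chunk = item if first else sep + item
        first = False
        needed = used + len(chunk)
        if needed > len(buf):
            # Grow geometrically so the number of resizes stays small.
            new_capacity = max(needed, 2 * len(buf))
            buf.extend(bytes(new_capacity - len(buf)))
        buf[used:needed] = chunk
        used = needed
    # Trim the unused tail of the buffer.
    return bytes(buf[:used])
```

The input is consumed exactly once and no intermediate list is built,
at the price of occasional buffer copies when the trial allocation
turns out to be too small.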

I will be at PyCon 2005 during the sprint days, so maybe I'll write it
then if someone doesn't beat me to it.  I don't think it'll be all that
hard. It might be best done as an iterjoin() method, analogous to
iteritems(), or maybe xjoin() (like xrange(), xreadlines()).

Incidentally, I was inspired to do the testing in the first place by
this:

http://www.skymind.com/~ocrow/python_string/

Those tests were done with Python 2.3. With 2.4, naive appending (i.e.
doing s1 += s2 in a loop) is about 13-15% slower than ''.join() on a
list comprehension, but uses much less memory (for large loops); a
generator expression passed to ''.join() is about 7% slower than the
list comprehension and uses slightly *more* memory.
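
The memory side of such a comparison can be approximated with the
standard tracemalloc module (CPython 3.4 and later). This is a rough
sketch under assumptions: the function names and the loop size are
made up for illustration, tracemalloc only counts allocations made
through Python's allocator, and results on a modern interpreter need
not resemble the 2.4 figures above.

```python
import tracemalloc

N = 20_000  # arbitrary loop size, chosen only for illustration

def append_loop(n):
    # Naive appending: s1 += s2 in a loop.
    s = ''
    for i in range(n):
        s += str(i)
    return s

def listcomp_join(n):
    return ''.join([str(i) for i in range(n)])

def genexp_join(n):
    return ''.join(str(i) for i in range(n))

def peak_bytes(func, n):
    """Return the peak traced memory, in bytes, while func(n) runs."""
    tracemalloc.start()
    try:
        func(n)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak

if __name__ == "__main__":
    for func in (append_loop, listcomp_join, genexp_join):
        print(f"{func.__name__}: peak {peak_bytes(func, N)} bytes")
```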



