idiom: concatenate strings with separator

Alex Martelli aleaxit at yahoo.com
Thu May 3 08:02:29 EDT 2001


"Harald Kirsch" <kirschh at lionbioscience.com> wrote in message
news:yv2r8y6bttr.fsf at lionsp093.lion-ag.de...
>
> Recently I started using code like this
>
> l = []
> for x in somethings:
>   y = massage(x)
>   l.append(y)
>
> return string.join(y, my_favorite_separator)

You presumably mean l, not y, here?

> to produce strings made of substrings separated by something. Note
> that with immediate string concatenation it's hard to avoid the
> separator either in front or at the end of the produced string.

Yep, it's a little recurring inconvenience in languages that
lack such a 'join' -- the semantics of join are often desired,
and not _trivial_ (though not horribly hard either) to specify
by hand in a loop.
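
For instance, a by-hand join has to treat either the first or the
last item specially so the separator only lands *between* items --
roughly this kind of loop (join_by_hand is just an illustrative
name, not anything in the library):

    def join_by_hand(items, sep):
        # emit the separator only between items, never before the
        # first one or after the last
        result = ""
        first = 1                # plain int flag; old Pythons have no bool type
        for item in items:
            if first:
                first = 0
            else:
                result = result + sep
            result = result + item
        return result

so that join_by_hand(['a', 'b', 'c'], '!') gives 'a!b!c', with no
stray '!' at either end.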

Note, by the way, that in Python 2 you might also write

    return my_favorite_separator.join(map(massage, somethings))

or

    return my_favorite_separator.join([massage(x) for x in somethings])

There is nothing wrong with your explicit loop, but the map
and list-comprehension forms may be handy, too.


> Nevertheless I wonder if something like
>
> s = ""
> sep = ""
> for x in somethings:
>   y = massage(x)
>   s = s + sep + y
>   sep = my_favorite_separator
>
> return s
>
> is faster or uses less memory than the previous solution.

It's likely to be slower and use more memory overall, because
larger and larger string objects are being allocated (and
bound to s).  Sometimes measurement is easier than wondering
about things...:

import sys, time, string

# read the file named on the command line (e.g. NEWS.txt below)
somethings = open(sys.argv[1]).readlines()
my_favorite_separator = '!'

def massage(x):
    # strip the trailing newline from each line
    return x[:-1]

def f1():
    # build a list of massaged pieces, then join them all at once
    l = []
    for x in somethings:
        y = massage(x)
        l.append(y)

    return string.join(l, my_favorite_separator)

def f2():
    # map + join: same result, with the loop done by map in C
    return my_favorite_separator.join(map(massage, somethings))

def f3():
    # repeated concatenation: each s = s + sep + y builds a brand-new,
    # ever-larger string object
    s = ""
    sep = ""
    for x in somethings:
        y = massage(x)
        s = s + sep + y
        sep = my_favorite_separator

    return s

print "Treating",len(somethings),"lines"

start = time.clock()
xx = f1()
stend = time.clock()
print "f1: %4.2f" % (stend-start)

start = time.clock()
yy = f2()
stend = time.clock()
print "f2: %4.2f" % (stend-start)

start = time.clock()
zz = f3()
stend = time.clock()
print "f3: %4.2f" % (stend-start)

assert xx==yy==zz

print "total length is", len(zz)


Running this on a smallish file gives us:

D:\Python21>dir *.txt
 Volume in drive D has no label.
 Volume Serial Number is 0498-B4C2

 Directory of D:\Python21

04/13/01  04:02p                12,724 LICENSE.txt
04/16/01  06:21p                73,863 NEWS.txt
04/15/01  10:07p                39,442 README.txt
               3 File(s)        126,029 bytes
                            265,756,672 bytes free

D:\Python21>python lolo.py NEWS.txt
Treating 1823 lines
f1: 0.03
f2: 0.01
f3: 0.96
total length is 72039

D:\Python21>

on my old box (NT4, PentiumII-300).

Note that the map version is a few times faster than
the explicit loop (2 or 3 times -- we'd need a lot
more data to time this accurately!-), while the one
that eschews join is about 30 times slower in this
run (it should get progressively worse as data gets
larger and larger).
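
To see why it keeps getting worse, here is a rough back-of-the-envelope
model (just arithmetic, not measured from the run above, and
chars_copied_by_concat is a made-up name for illustration): each
s = s + sep + y writes out a brand-new string as long as everything
accumulated so far, so the characters copied grow roughly with the
square of the final length, while join writes each character out once:

    def chars_copied_by_concat(piece_lengths):
        # each concatenation writes out the whole string built so far,
        # so the copying cost grows quadratically with the final length
        total = 0
        so_far = 0
        for n in piece_lengths:
            so_far = so_far + n
            total = total + so_far
        return total

    # ~1823 pieces of ~40 characters, as in the NEWS.txt run above:
    print chars_copied_by_concat([40] * 1823)    # about 66 million characters
    print 40 * 1823                              # join copies about 73 thousand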


So, if you do have any special interest in getting
better performance: switching to map, where feasible,
rather than an explicit loop, MAY be worth it, as it
may cut your overhead by a factor of 2 or 3;
using join IS definitely worthwhile, as it speeds
things up by dozens of times even on rather small
cases (and by more on larger ones, I believe).


Alex