merge list of tuples with list

Chris Torek nospam at torek.net
Wed Oct 20 03:51:59 EDT 2010


>On Wed, Oct 20, 2010 at 1:33 PM, Daniel Wagner
><brocki2301 at googlemail.com> wrote:
>> Any more efficient ways or suggestions are still welcome!

In article <mailman.58.1287547882.2218.python-list at python.org>
James Mills  <prologic at shortcircuit.net.au> wrote:
>Did you not see Paul Rubin's solution:
>
>>>> [x+(y,) for x,y in zip(a,b)]
> [(1, 2, 3, 7), (4, 5, 6, 8)]
>
>I think this is much nicer and probably more efficient.

For a slight boost in Python 2.x, use itertools.izip() to avoid
making an actual list out of zip(a,b).  (In 3.x, "plain" zip() is
already an iterator rather than a list-result function.)
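For reference, here is a small sketch of Paul Rubin's comprehension that runs on either major version. It picks itertools.izip() on 2.x and falls back to the built-in zip() on 3.x, where itertools no longer has izip. The name `lazy_zip` is hypothetical, chosen only for this illustration:

```python
# Use a lazy zip on either major version. In 2.x that is itertools.izip;
# in 3.x itertools has no izip, and the built-in zip() is already lazy.
try:
    from itertools import izip as lazy_zip  # Python 2.x
except ImportError:
    lazy_zip = zip                           # Python 3.x

a = [(1, 2, 3), (4, 5, 6)]
b = (7, 8)

# Paul Rubin's approach: append each item of b to the matching tuple of a.
# x + (y,) builds a new tuple; only the final list is materialized.
merged = [x + (y,) for x, y in lazy_zip(a, b)]
print(merged)  # [(1, 2, 3, 7), (4, 5, 6, 8)]

# The pairs come from an iterator, one at a time, not from a prebuilt list.
pairs = lazy_zip(a, b)
print(next(pairs))  # ((1, 2, 3), 7)
```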

This method (Paul Rubin's) uses only a little extra storage, and
almost none when using itertools.izip() (or 3.x).  I think it is
also more straightforward than the multiple-zip approach (e.g.,
zip(*zip(*a) + [b])).  In 3.x the two-zip method also needs
explicit list() calls, which makes it clearer where the copies occur:

   list(zip(*a))     makes the list [(1, 4), (2, 5), (3, 6)]
                     [input value is still referenced via "a" so
                      sticks around]
   [b]               makes the tuple (7, 8) into the list [(7, 8)]
                     [input value is still referenced via "b" so
                      sticks around]
   +                 concatenates those two lists, producing the list
                     [(1, 4), (2, 5), (3, 6), (7, 8)]
                     [the two input values are no longer referenced
                      and are thus discarded]
   list(zip(*that))  makes the list [(1, 2, 3, 7), (4, 5, 6, 8)]
                     [the input value, i.e. the result of the
                      concatenation in the next-to-last step, is no
                      longer referenced and thus discarded]
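The steps above can be traced in a short Python 3 sketch. It also checks why the list() calls are required: a zip object cannot be concatenated with a list. The variable names are illustrative only:

```python
a = [(1, 2, 3), (4, 5, 6)]
b = (7, 8)

# In 3.x, zip(*a) is a lazy iterator, so "zip(*a) + [b]" raises TypeError;
# this is why the two-zip method needs explicit list() calls.
try:
    zip(*a) + [b]
    needs_list = False
except TypeError:
    needs_list = True

columns = list(zip(*a))            # temporary 1: [(1, 4), (2, 5), (3, 6)]
extra = [b]                        # temporary 2: [(7, 8)]
all_columns = columns + extra      # temporary 3: [(1, 4), (2, 5), (3, 6), (7, 8)]
two_zip = list(zip(*all_columns))  # result: [(1, 2, 3, 7), (4, 5, 6, 8)]

# The list comprehension gives the same result while building only one list.
comprehension = [x + (y,) for x, y in zip(a, b)]
assert two_zip == comprehension
print(two_zip)
```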

All these temporary results take up space and time.  The list
comprehension simply builds the final result, once.

Of course, I have not used timeit to try this out. :-)  Let's do
that, just for fun (and to let me play with timeit from the command
line):

    (I am not sure why I have to give the full path to the
    timeit.py source here)

    sh-3.2$ python /System/Library/Frameworks/Python.framework/\
    Versions/2.5/lib/python2.5/timeit.py \
    'a=[(1,2,3),(4,5,6)];b=(7,8);[x+(y,) for x,y in zip(a,b)]'
    100000 loops, best of 3: 2.55 usec per loop

    sh-3.2$ python [long path snipped] \
    'a=[(1,2,3),(4,5,6)];b=(7,8);[x+(y,) for x,y in zip(a,b)]'
    100000 loops, best of 3: 2.56 usec per loop

    sh-3.2$ python [long path snipped] \
    'a=[(1,2,3),(4,5,6)];b=(7,8);zip(*zip(*a) + [b])'
    100000 loops, best of 3: 3.84 usec per loop

    sh-3.2$ python [long path snipped] \
    'a=[(1,2,3),(4,5,6)];b=(7,8);zip(*zip(*a) + [b])'
    100000 loops, best of 3: 3.85 usec per loop

Hence, even in 2.5, where zip() builds a temporary list, the list
comprehension version is faster.  Using itertools.izip() explicitly
does help, but not much, with these short lists:

    sh-3.2$ python ... -s 'import itertools' \
    'a=[(1,2,3),(4,5,6)];b=(7,8);[x+(y,) for x,y in itertools.izip(a,b)]'
    100000 loops, best of 3: 2.27 usec per loop

    sh-3.2$ python ... -s 'import itertools' \
    'a=[(1,2,3),(4,5,6)];b=(7,8);[x+(y,) for x,y in itertools.izip(a,b)]'
    100000 loops, best of 3: 2.29 usec per loop
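The command-line runs above can also be reproduced from inside Python with the timeit module's repeat() function. (On current Pythons, `python -m timeit` also avoids spelling out the path to timeit.py.) The sketch below assumes Python 3, so the two-zip statement gets its list() calls. The loop count is fixed at 100000 to match the runs above, whereas the command-line tool chooses it automatically. Absolute numbers will differ from the 2.5 figures quoted here:

```python
import timeit

LOOPS = 100000  # fixed to match the runs above; the CLI picks this itself

statements = {
    # assignments kept inside the timed statement, as in the shell runs
    "list comprehension":
        "a=[(1,2,3),(4,5,6)];b=(7,8);[x+(y,) for x,y in zip(a,b)]",
    # list() calls added because zip() returns an iterator in 3.x
    "two-zip":
        "a=[(1,2,3),(4,5,6)];b=(7,8);list(zip(*list(zip(*a)) + [b]))",
}

results = {}
for label, stmt in statements.items():
    # repeat=3 and min() mirror the "best of 3" reported by the CLI
    runs = timeit.repeat(stmt, repeat=3, number=LOOPS)
    results[label] = min(runs) / LOOPS * 1e6  # microseconds per loop
    print(f"{label}: {results[label]:.2f} usec per loop")
```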

(It is easy enough to move the assignments to a and b into the -s
argument, but that makes relatively little difference, since the list
comprehension and two-zip methods both have the same setup overhead.
The "import", however, is pretty slow, so it should not be repeated
on every trip through the 100000 loops: on my machine, putting it in
the timed statement raises the time to 3.7 usec/loop, almost as slow
as the two-zip method.)
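The -s placement can be sketched with timeit's `setup` argument, which runs once per timing run rather than once per loop. This assumes Python 3, and the helper `best_usec` is a hypothetical name used only here. The second measurement times a bare `import itertools` statement on its own, to estimate roughly what that statement adds per loop when it sits inside the timed code:

```python
import timeit

NUMBER = 100000  # loops per run, matching the runs above
REPEAT = 3       # best of 3

def best_usec(stmt, setup="pass"):
    """Best per-loop time in microseconds over REPEAT runs."""
    runs = timeit.repeat(stmt, setup=setup, repeat=REPEAT, number=NUMBER)
    return min(runs) / NUMBER * 1e6

# Assignments to a and b moved into setup (the equivalent of -s):
# they execute once per run, so only the comprehension is timed.
setup_ab = "a=[(1,2,3),(4,5,6)];b=(7,8)"
comp_only = best_usec("[x+(y,) for x,y in zip(a,b)]", setup=setup_ab)

# Per-loop cost of an "import itertools" statement. After the first
# execution the module is cached in sys.modules, but the statement still
# has to look it up and bind the name on every loop.
import_cost = best_usec("import itertools")

print(f"comprehension, a/b in setup: {comp_only:.2f} usec per loop")
print(f"repeated 'import itertools': {import_cost:.2f} usec per loop")
```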
-- 
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: gmail (figure it out)      http://web.torek.net/torek/index.html


