Efficiently Split A List of Tuples

Ron Adam rrr at ronadam.com
Mon Jul 18 02:37:36 EDT 2005


Raymond Hettinger wrote:
>>Variant of Paul's example:
>>
>>a = ((1,2), (3, 4), (5, 6), (7, 8), (9, 10))
>>zip(*a)
>>
>>or
>>
>>[list(t) for t in zip(*a)] if you need lists instead of tuples.
> 
> 
> 
> [Peter Hansen]
> 
>>(I believe this is something Guido considers an "abuse of *args", but I
>>just consider it an elegant use of zip() considering how the language
>>defines *args.  YMMV)
> 
> 
> It is somewhat elegant in terms of expressiveness; however, it is also
> a bit disconcerting in light of the underlying implementation.
> 
> All of the tuples are loaded one-by-one onto the argument stack.  For a
> few elements, this is no big deal.  For large datasets, it is a less
> than ideal way of transposing data.
> 
> Guido's reaction makes sense when you consider that most programmers
> would cringe at a function definition with thousands of parameters.
> There is a sense that this doesn't scale-up very well (with each Python
> implementation having its own limits on how far you can push this
> idiom).
> 
>  
> Raymond
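
For comparison, here is a minimal sketch (not from the thread, and written for modern Python 3 rather than the Python 2 of the original post) of the star-args transpose next to a version that never spreads the rows across an argument list. The names via_star, firsts and seconds are illustrative only, and the second form assumes every row has exactly two items.

```python
# Sample data from the quoted example.
a = ((1, 2), (3, 4), (5, 6), (7, 8), (9, 10))

# Star-args transpose: every inner tuple becomes one positional argument.
# list() is needed on Python 3, where zip() returns a lazy iterator.
via_star = list(zip(*a))

# One comprehension per column: no argument unpacking, but it walks the
# data once per column and assumes every row has exactly two items.
firsts = [row[0] for row in a]
seconds = [row[1] for row in a]

print(via_star)
print(firsts, seconds)
```

Which form is faster for a given dataset is something to measure rather than assume.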


Currently we can implicitly unpack a tuple or list by using an 
assignment.  How is that any different than passing arguments to a 
function?  Does it use a different mechanism?
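
One way to look into that question is to disassemble both forms. The sketch below is not from the thread; it assumes a current CPython 3, and the helper names by_assignment and by_call are made up for illustration. Opcode names differ between CPython versions, so treat the printed lists as something to inspect rather than a fixed result.

```python
import dis

def by_assignment(x):
    # Tuple unpacking through an assignment statement.
    a, b, c = x
    return a, b, c

def by_call(f, x):
    # Unpacking the same sequence into a call's argument list.
    return f(*x)

# Collect just the opcode names so the two forms are easy to compare.
assign_ops = [ins.opname for ins in dis.get_instructions(by_assignment)]
call_ops = [ins.opname for ins in dis.get_instructions(by_call)]

print(assign_ops)
print(call_ops)
```

On the CPython 3 versions I am aware of, the assignment compiles to an UNPACK_SEQUENCE instruction while the call goes through a separate call opcode, which suggests the two are handled by different mechanisms.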



(Warning, going into what-if land.)

There's also a question relating to the above, so it's not completely in 
outer space.  :-)


We can't use the * syntax anywhere but in function definitions and 
calls.  I was thinking the other day that using * in function calls is 
kind of inconsistent as it's not used anywhere else to unpack tuples. 
And it does the opposite of what it means in the function definitions.
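
The two opposite roles of * can be seen side by side in a small sketch (collect is a hypothetical helper used only for illustration):

```python
def collect(*args):
    # In a definition, * packs the positional arguments into a tuple.
    return args

t = (1, 2, 3)
packed_whole = collect(t)      # t arrives as a single argument
packed_items = collect(*t)     # in a call, * spreads t into three arguments

print(packed_whole)
print(packed_items)
```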

So I was thinking, in order to have explicit packing and unpacking 
outside of function calls and function definitions, we would need 
different symbols because using * in other places would conflict with 
the multiply and exponent operators.  Also pack and unpack should not be 
the same symbols for obvious reasons.  Using different symbols also 
avoids conflicting with * and ** in function calls.

So for the following examples, I'll use '~' as pack and '^' as unpack.

    ~ looks like a small 'N', for put stuff 'in'.
    ^ looks like an up arrow, as in take stuff out.

(Yes, I know they are already used elsewhere.  Currently those are 
bitwise operators.  The '^' is used with sets also.  I did say this is a 
"what-if" scenario.  Personally I think the bitwise operators could be 
made methods of a bit type; then they, including the '>>' '<<' pair, 
could be freed up and put to better use.  The '<<' would make a nice 
symbol for getting values from an iterator.  The '>>' is already used in 
print as redirect.)


Simple explicit unpacking would be:

(This is a silly example, I know it's not needed here but it's just to 
show the basic pattern.)

    x = (1,2,3)
    a,b,c = ^x     # explicit unpack,  take stuff out of x


So, then you could do the following.

    zip(^a)        # unpack 'a' and give its items to zip.

Would that use the same underlying mechanism as using "*a" does?  Is it 
also the same implicit unpacking method used in an assignment using 
'='?  Would it be any less "a bit disconcerting in light of the 
underlying implementation"?



Other possible ways to use them outside of function calls:

Sequential unpacking..

    x = [(1,2,3)]
    a,b,c = ^^x    ->  a=1, b=2, c=3

Or..

    x = [(1,2,3),4]
    a,b,c,d = ^x[0],x[1]   -> a=1, b=2, c=3, d=4
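
For reference, later Python versions can express both cases without new symbols. This is a sketch in modern Python 3 (not the Python of this post): a nested assignment target handles the first case, and a starred expression on the right-hand side (Python 3.5 and later) handles the second. The variable names are illustrative.

```python
# Nested target: the target's shape mirrors the shape of the data.
x = [(1, 2, 3)]
[(a, b, c)] = x

# Starred expression on the right-hand side (Python 3.5+).
y = [(1, 2, 3), 4]
p, q, r, s = *y[0], y[1]

print(a, b, c)
print(p, q, r, s)
```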

I'm not sure what it should do if you try to unpack an item not in a 
container.  I expect it should give an error because a tuple or list was 
expected.

    a = 1
    x = ^a    # error!
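
Existing unpacking already behaves this way. A minimal sketch, assuming Python 3, where unpacking a plain integer raises TypeError (the exact message text varies by version, so it is not shown here):

```python
a = 1
error = None
try:
    (x,) = a        # ask for exactly one item out of a non-iterable
except TypeError as exc:
    error = exc

print(type(error).__name__)
```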


Explicit packing would not be as useful as we can put ()'s or []'s 
around things.  One example that comes to mind at the moment is using it 
to create single item tuples.

     x = ~1    ->   (1,)

Possibly converting strings to tuples?

     a = 'abcd'
     b = ~^a   ->   ('a','b','c','d') # explicit unpack and repack

and:

     b = ~a    ->   ('abcd',)   # explicit pack whole string

for:

     b = a,    ->   ('abcd',)   # trailing comma is needed here.
                                # This is an error opportunity IMO
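
For comparison, the same results spelled with existing syntax. This is a sketch in modern Python 3; the (*a,) form needs Python 3.5 or later, and the variable names are illustrative.

```python
a = 'abcd'

single = (1,)            # one-item tuple; the comma makes the tuple
chars = tuple(a)         # ('a', 'b', 'c', 'd')
chars_star = (*a,)       # same result via a starred expression
whole = (a,)             # ('abcd',)

print(single, chars, chars_star, whole)
```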


Choice of symbols aside, packing and unpacking are a very big part of 
Python, it just seems (to me) like having an explicit way to express it 
might be a good thing.

It doesn't do anything that can't already be done, of course.  I think 
it might make some code easier to read, and possibly avoid some errors.

Would there be any (other) advantages to it besides the syntax sugar?

Is it a horrible idea for some unknown reason I'm not seeing?  (Other 
than the symbol choices breaking current code.  Maybe other symbols 
would work just as well?)

Regards,
Ron



