Reading List Comprehension?

Thu Sep 7 05:17:14 EDT 2000

"Stephen Hansen" <stephen at cerebralmaelstrom.com> wrote in message
news:20000906.215342.20887 at Jeremy.cerebralmaelstrom.com...
> Whee. Python 2.0. I'm estatic, so go to check out what's changed...
    [snip]
> But..list comprehensions.. I can't quite fathom.
>
> --> x for subseq in seq for x in subseq
>
> Eh?!?! That looks so..wrong. No whitespace, parens, no nothing, with

If there was no whitespace, that would be
   xforsubseqinseqforxinsubseq
 which would indeed be slightly harder to read.  And the whole thing
must be enclosed in brackets, pretty close to parentheses I'd say.

> all those keywords mixed in there.

There are just two keywords in the above, 'for' and 'in' (each
appears twice); you can also have an 'if' in other comprehensions,
but that just ups the total to three.

> I can't pronounce it, or make any

You can pronounce >> [[yuech:-(]] as "to", and can't pronounce a
sequence of 8 words...?

> I don't know -- is there a more in depth, vastly basic, tutorial to
> what the heck these things are somewhere? It looks..so..well,
> not-at-all-like-Python.

Hmmm, let's try; Fiona, if you choose to use this, feel free to
edit out the Haskell references if you think it appropriate -- I
believe they help, but I'm aware this is a controversial stance;
I get much more concrete in the explanation later on:-).

The semantics is somewhat Haskell-ish (except that Haskell is lazy
while Python is strict, of course), but the syntax is Pythonish --
in Haskell, it would be
    [x | seq <- subseq, x <- seq]

I suspect the following summary might help even if one knows
nothing at all about Haskell, since here it is so close to
what one often writes to define a set in math notation:
    { x | x element-of someset, x>0 }
for example (commonly using epsilon for element-of); the only
differences reflect those between sets and sequences.

More abstractly, in Haskell the general syntax of list comprehensions is:
    [ expression | qualifiers ]
where qualifiers is a comma-separated list of items each of which is
one of:
    "generator":            pattern <- expression
    "local declaration":    'let' declarations
    "guard":                expression

In python 2.0, the general syntax of list comprehensions differs in
using keywords to denote the role of each qualifier, not providing
for local declarations, and using no punctuation:
    [ expression qualifiers ]
where qualifiers is the justaposed (no punctuation) list of items
each of which is one of (adopting Haskell's meta-terminology...):
    "generator"    'for' pattern 'in' sequence
    "guard"        'if' expression
[a 'for', as usual in Python, may have a little bit of pattern
matching, specifically for tuple-unpacking when the sequence
is a sequence of tuples that are all the same length].  There
must be one or more generator, and zero or more guards; the
first qualifier must be a generator, so the comprehension
always starts with "[ expression for " and goes on from here.

So the syntax differences are easy to summarize: python uses...:
    -- no | to separate the "comprehended" expression from the
        qualifiers list
    -- no commas to separate the qualifiers from each other
    -- 'for pattern in sequence' rather than 'pattern<-sequence'
        for each generator
    -- 'if expression' rather than just 'expression' for each guard
    -- no equivalent to 'let'-introduced local declarations

The semantics of Python list comprehensions are easy to explain
in terms of pre-existing python constructions: what comes out
of a list comprehension is what you would get if you wrote, for
some name 'somename' that's not used anywhere:
    somename=[]
    *expansion-of-the-qualifiers*
        somename.append(expression)
where the 'expansion' of the qualifiers is what you could
obtain by placing a ':' after each, followed by a line break
and appropriate indent.

For example:

def expand_a(seq):
    return [x for subseq in seq for x in subseq]

yields the same result as:

def expand_b(seq):
    somename=[]
    for subseq in seq:
        for x in subseq:
            somename.append(x)
    return somename

It's similar if we also have an 'if'; here are
four equivalent approaches:

>>> somename=[x+y for x in 'ciao' for y in 'xy' if x>'a']
>>> somename
['cx', 'cy', 'ix', 'iy', 'ox', 'oy']

>>> somename=[x+y for x in 'ciao' if x>'a' for y in 'xy']
>>> somename
['cx', 'cy', 'ix', 'iy', 'ox', 'oy']

>>> somename=[]
>>> for x in 'ciao':
        for y in 'xy':
            if x>'a':
                somename.append(x+y)

>>> somename
['cx', 'cy', 'ix', 'iy', 'ox', 'oy']

>>> somename=[]
>>> for x in 'ciao':
        if x>'a':
            for y in 'xy':
                somename.append(x+y)

>>> somename
['cx', 'cy', 'ix', 'iy', 'ox', 'oy']

Python's list-comprehension does not define a scope: any
variables that are bound in a comprehension's generators
remain bound (to whatever they were last bound to) when
the generator is over:

>>> [x+y for x in 'ciao' for y in '!?']
['c!', 'c?', 'i!', 'i?', 'a!', 'a?', 'o!', 'o?']
>>> print x,y
o ?
>>>

Again, this is _exactly_ the same as if the comprehension
had been expanded into an explicitly-nested group of one
or more 'for' statements (and 0 or more 'if' statements).

Verdict: very handy syntax sugar.  Adds nothing really
"fundamental", i.e., nothing that Python couldn't do,
but, since in quite a few places you can only use an
expression (which a list comprehension is) and not
statements (which is what you would previously use to
achieve the same effect as a comprehension), it lets
you avoid defining ad-hoc functions (with all the
attendant scoping problems) in several frequent cases
(defining a function being the only general way to
make a series of statements into an expression...).

In practice, to avoid defining ad-hoc functions, one
would often use map or filter for some things you can
now do with comprehensions -- and with map or filter,
one would normally have to use a lambda, which the
comprehension avoids; this is a simplication _and_
also makes things appreciably faster.  In fact...:

def filter(function, list):
    if function:
        return [x for x in list if function(x)]
    else:
        return [x for x in list if x]

is almost identical to the builtin 'filter' (except
the latter returns a tuple if 'list' is a tuple);

def map1(function, list):
    return [function(x) for x in list]

is similar to the special-case of the builtin 'map'
where function!=None and there is only one list...

Alex