list() strange behaviour

Cameron Simpson cs at cskk.id.au
Sun Dec 20 16:09:47 EST 2020


On 20Dec2020 21:00, danilob <tanto at non.va.invalid> wrote:
>I'm an absolute beginner in Python (and in English too ;-)

Well your English is far better than my very poor second language.

>Running this code:
>--------------
># Python 3.9.0
>
>a = [[1, 2, 0, 3, 0],
>     [0, 4, 5, 0, 6],
>     [7, 0, 8, 0, 9],
>     [2, 3, 0, 0, 1],
>     [0, 0, 1, 8, 0]]

This is a list of lists.

>b = ((x[0] for x in a))

This is a generator expression, and _not_ a list. Explanations 
below.

>print(list(b))
>print(list(b))
>---------------
>I get this output:
>
>[1, 0, 7, 2, 0]

As you expect.

>[]

As a surprise.

>I don't know why the second print() output shows an empty list.
>Is it possible that the first print() call might have changed the value 
>of "b"?

It hasn't, but it has changed its state. Let me explain.

In Python there are 2 visually similar list-like comprehensions:

This is a _list_ comprehension (note the surrounding square brackets):

    [ x for x in range(5) ]

It genuinely constructs a list containing:

    [ 0, 1, 2, 3, 4 ]

and would behave as you expect in print().

By contrast, this is a generator expression (note the round 
brackets):

    ( x for x in range(5) )

This is a "lazy" construct, and is like writing a generator function:

    def g():
        for x in range(5):
            yield x

It is a little iterable which _counts_ from 0 through 4 inclusive and 
yields each value as requested.

Try putting a:

    print(b)

before your other print calls. It will not show a list.
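
If you'd rather not squint at the printed address, here is a small sketch which asks Python directly what each object is. The names a_list and a_gen are mine, purely for illustration:

```python
import types

a_list = [x for x in range(5)]   # list comprehension: the values are stored now
a_gen = (x for x in range(5))    # generator expression: the values come later

print(type(a_list).__name__)                   # list
print(type(a_gen).__name__)                    # generator
print(isinstance(a_gen, types.GeneratorType))  # True
print(len(a_list))                             # 5
# len(a_gen) would raise TypeError: a generator does not know its length.
```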

So, what is happening?

    b = ((x[0] for x in a))

This makes a generator expression. The outermost brackets are 
redundant, by the way, and can be discarded:

    b = (x[0] for x in a)

It does this (using my simpler example, range(5)):

    >>> b=(x for x in range(5))
    >>> b
    <generator object <genexpr> at 0x10539e120>

When you make a list from that:

    >>> L = list(b)
    >>> L
    [0, 1, 2, 3, 4]

the generator _runs_ and emits the values to be used in the list. If you 
make another list:

    >>> L = list(b)
    >>> L
    []

The generator has finished. Using it again produces no values, and so 
list() constructs an empty list.
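
You can watch the generator's state change one step at a time with next(). This is only a sketch using a shorter range(3) so the output stays small:

```python
b = (x for x in range(3))

print(next(b))          # 0
print(next(b))          # 1
print(list(b))          # [2]  - only what has not been consumed yet
print(list(b))          # []   - nothing left
print(next(b, "done"))  # done - the default is returned instead of
                        #        raising StopIteration
```

The name "b" still refers to the very same generator object throughout; it is the generator's position which has moved.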

That is what is happening in your print() calls.

If, instead, you had gone:

    b = [x[0] for x in a]

Then "b" would be an actual list (a sequence of values stored in memory) 
and your prints would have done what you expect.
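
Here is that applied to your own data, as a sketch. I have also included one way to keep the lazy behaviour and still iterate twice: make a fresh generator each time. The function first_column() is a hypothetical helper of mine, not anything from your code or the standard library:

```python
a = [[1, 2, 0, 3, 0],
     [0, 4, 5, 0, 6],
     [7, 0, 8, 0, 9],
     [2, 3, 0, 0, 1],
     [0, 0, 1, 8, 0]]

# A real list: it can be printed as often as you like.
b = [x[0] for x in a]
print(b)    # [1, 0, 7, 2, 0]
print(b)    # [1, 0, 7, 2, 0]

# Hypothetical helper: each call returns a brand new generator.
def first_column(rows):
    return (row[0] for row in rows)

print(list(first_column(a)))    # [1, 0, 7, 2, 0]
print(list(first_column(a)))    # [1, 0, 7, 2, 0]
```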

Python has several "lazy" operations available, which do not do the 
entire computation when they are defined; instead they give you a 
"generator" which performs the computation incrementally, running until 
the next value is found - when the user asks for the next value, _then_ 
the generator runs until that value is obtained and "yield"ed.

Supposing your array "a" were extremely large, or perhaps in some way 
stored in a database instead of in memory. It might be expensive or very 
slow to get _all_ the values. A generator lets you get values as they 
are required.
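
To make the "as they are required" part visible, here is a sketch which records which values actually get computed. The names computed and slow_values are made up for the example; itertools.islice is the standard library's way to take just the first few items from an iterable:

```python
from itertools import islice

computed = []   # records which inputs the generator has actually processed

def slow_values(n):
    for x in range(n):
        computed.append(x)   # stands in for some expensive work
        yield x * x

squares = slow_values(1_000_000)
print(computed)                         # [] - nothing has run yet

first_three = list(islice(squares, 3))  # ask for only three values
print(first_three)                      # [0, 1, 4]
print(computed)                         # [0, 1, 2] - the rest never ran
```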

A generator expression like this:

    b = ( x for x in range(5) )

counts from 0 through 4 inclusive (or 0 through 5, excluding 5, which 
is how ranges generally work in Python) when asked. As a function it 
might look like this:

    def b():
        for x in range(5):
            yield x

When you call "list(b)" the list constructor collects values from 
"iter(b)", which iterates over "b". Like this, written longhand:

    L = []
    for value in b:
        L.append(value)

Written even longer:

    L = []
    b_values = iter(b)
    while True:
        try:
            value = next(b_values)
        except StopIteration:
            break
        L.append(value)

which more clearly shows a call to "next()" to run the iterator b_values 
once, until it yields a value.
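
Wrapping that longhand loop in a function makes it easy to try out. This is a sketch only; collect() is a name I made up, and the real list() is implemented in C rather than like this:

```python
def collect(iterable):
    """Roughly what list(iterable) does, written out by hand."""
    result = []
    it = iter(iterable)
    while True:
        try:
            value = next(it)    # run the iterator until its next value
        except StopIteration:   # raised when there are no more values
            break
        result.append(value)
    return result

b = (x for x in range(5))
print(collect(b))    # [0, 1, 2, 3, 4]
print(collect(b))    # [] - the generator is already exhausted
print(iter(b) is b)  # True
```

The last line is the crux: calling iter() on a generator hands back the same generator, not a fresh copy, so the second pass starts where the first one finished.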

The Python for-statement is a convenient built-in way to write the 
while-loop above - it iterates over any iterable like "b".

Cheers,
Cameron Simpson <cs at cskk.id.au>

