Please comment on Draft PEP for Enhanced Generators

Rocco Moretti roccomoretti at netscape.net
Wed Jan 30 23:58:25 EST 2002


"Raymond Hettinger" <othello at javanet.com> wrote in message news:<a38c1u$945$1 at bob.news.rcn.net>...
> I have written a draft PEP summarizing proposed enhancements to generators:
> http://users.javanet.com/~othello/download/genpep.htm
> 
> Please post your comments (and maybe a little encouragement) here on
> comp.lang.py or email them directly to me.

Comments from the Peanut Gallery - Two quick ones and a Doozy.

(1) You justify the behavior of your xmap over the Cookbook one on the
grounds that it makes sense when sequences are finite (but may fail
when sequences are infinite).

However, in the "New Python," generators are nothing but lazy
sequences in fancy clothing, so it may (in some as of yet unrealized
cases) make sense to pass a generator of infinite output to xmap.
Differing from the semantics of map *and* not being as general as the
alternative makes the alteration a non-starter, IMHO.
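
To make the infinite-sequence case concrete, here is a minimal sketch,
assuming a present-day Python 3 interpreter rather than the 2.2 of this
thread. lazy_map and naturals are hypothetical names of mine, not the
xmap from the draft PEP or the Cookbook, and the stop-at-the-shortest-input
rule is my assumption; it is what makes an infinite input usable at all.

```python
from itertools import islice

def naturals():
    """Infinite generator: 0, 1, 2, ..."""
    n = 0
    while True:
        yield n
        n += 1

def lazy_map(func, *iterables):
    """Yield func(*args) one result at a time, stopping at the shortest input."""
    iterators = [iter(it) for it in iterables]
    if not iterators:
        return  # nothing to map over
    while True:
        args = []
        for it in iterators:
            try:
                args.append(next(it))
            except StopIteration:
                return  # one input ran dry: end the whole mapping
        yield func(*args)

# An infinite input is fine as long as the consumer only takes what it needs.
squares = lazy_map(lambda n: n * n, naturals())
first_five = list(islice(squares, 5))

# Mixing an infinite input with a finite one ends with the finite one.
paired = list(lazy_map(lambda a, b: a + b, naturals(), [10, 20, 30]))
```

A version that pads the shorter inputs with None instead would never
terminate on the second call, which is the generality argument in a
nutshell.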

(2) On generator comprehensions - You list as a con that the brackets
make it look like it should return a list. Well, it can be argued that
a generator is nothing but a special case of a list, and the syntax
issue probably parallels the def vs. defgen arguments - i.e. it's
going to come down to a BDFL pronouncement.

(3) The current semantics for simultaneous generators/consumers are
frightful. (I refer to the bi-directional generator case where
.next() and .submit() are both valid.) In fact, although it is not
explicitly stated, your proposal reads as if they are not to be
allowed. But assuming they are useful enough to be implemented ...

First off, the way you've described the generator initialization is
ambiguous in the case where both occur. I quote for reference:

: Note D:  There is a subtlety in the implementation of .submit() as compared
:    to .next().  Here's the normal flow using .next():
:
:        g = mygen(p1) will bind p1 to a local variable 
:                and then return a generator to be bound to g
:        y = g.next() runs the generator from the first 
:                line until it encounters a yield when 
:                it suspends execution and returns a value
:                to be bound to y
:
:    In contrast, the flow using .submit() is different:
:
:        g = mygen(p1) will bind p1 to a local variable and
:               then IMMEDIATELY executes code within 
:               mygen() until a yield is encountered 
:               then it suspends execution and returns 
:               a generator to be bound to g
:        g.submit(val) resumes execution at the yield and 
:               binds val to the left hand side of the
:               yield assignment and continues running
:               until another yield is encountered.

So what happens in the case of:

def oops(switch):
    '''What Should Python Do?'''
    print "There!"
    if switch == 5:
        yield 5
    else:
        m = yield None
n = oops(q)
print "Here!"
#calls to n.next()/n.submit(val) later on in the program

So if q == 5, Python should print 'Here! There!',
but if q != 5, Python should print 'There! Here!'.
The trick is that the compiler has little way of knowing which
case applies without some deep introspection on which code paths
are reachable from a given argument set.

You may be able to fix this problem by delaying execution of the first
bit of code until the first invocation of gen.next() or gen.submit().
That is, the semantics of gen.submit() could change to:

        g = mygen(p1) will bind p1 to a local variable and 
                then return a generator to be bound to g
        g.submit(val) The first time it is invoked, it runs
                the generator from the first line until it 
                encounters a yield. It then binds 
                val to the left hand side of the yield 
                assignment and continues running until 
                another yield is encountered. Subsequent 
                calls bind val to the left hand side of the
                yield assignment and continue       
                running until another yield is encountered.
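
A minimal sketch of this delayed-start rule, assuming a present-day
Python 3 interpreter in which "m = yield None" is legal (it was not in
2.2). The log list is only a stand-in for the print statements so the
order of events is easy to inspect.

```python
log = []

def oops(switch):
    """Same shape as the example above; nothing in here runs at call time."""
    log.append("There!")
    if switch == 5:
        yield 5
    else:
        m = yield None

for q in (5, 6):
    n = oops(q)            # only binds the argument; the body has not started
    log.append("Here!")
    first = next(n)        # first resumption runs up to the first yield
    log.append("first=%r" % (first,))
```

With the start delayed, "Here!" is logged before "There!" on both code
paths, so the ordering no longer depends on which kind of yield the
argument happens to reach.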

On a separate note, you also run into the problem of mixed .next() and
.submit() calls:

def gen():
    #initialization code
    while x != EOF:
        y = yield None
        # .... processing code ....
        yield z
        # .... more code .......

gen must be called via a strict sequence of alternating submit and
next calls. What should happen if this sequence is broken? (An error
is raised, I assume.) I can't think of enough applications of
generator/consumer combinations to say for certain if such guarantees
are possible in all circumstances (i.e. the guarantee that if you
leave via "yield val" you always return via "gen.next()" and if you
leave via "x = yield None" you return via "gen.submit(val)").

My guess is that removing the need for such guarantees would require
allowing combined submit/next statements, equivalent to "x =
gen.submit(a)" and "x = yield b", with unused parameters dropped and
missing ones filled in with None.
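
A sketch of that combined form, assuming present-day Python 3, where
"x = yield b" is an expression and gen.send(a) stands in for the
proposed gen.submit(a): it delivers a and returns the next yielded
value. running_total is a made-up example; a plain next(gen) shows up
as None on the generator side.

```python
def running_total():
    """Yield the total so far; accept an optional increment at each yield."""
    total = 0
    while True:
        x = yield total      # hand total out, receive x (None for a plain next())
        if x is not None:
            total += x

gen = running_total()
results = [next(gen)]          # start the generator; runs to the first yield
results.append(gen.send(10))   # "submit" 10 and get the new total back
results.append(next(gen))      # plain next: nothing submitted, x is None
results.append(gen.send(5))    # submit again; total is now 15
```

Because every yield both gives and takes, the caller is free to mix the
two kinds of call in any order without breaking an alternation rule.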

In that regard, I'm not sure what exactly the confusion is that you
discuss in response to overloading .next():

 <first paragraph skipped as it is addressed above>

: The second problem arose when creating sample code 
: using the .next(value) format and it turned out 
: that it was rare to both send and receive data
: to the generator.  

Rare != Unheard of -> Well, sometimes, at least. Sorry to interrupt,
please continue.

: Writing x=g.next(y) turned out to be confusing 
: and more than a little mind-blowing.  Let's see,
: execution returns to the generator, binds y to 
: a local, hmm, when does the argument to yield get
: passed back to be bound to x, is it before y is 
: assigned or after a full loop when the next yield
: is encountered?  In other words, overloading the 
: .next() method is a bug factory.

I don't get the confusion. Operate under the assumption that the right
hand side of an assignment is fully evaluated before being assigned to
the left (generally true, I believe). Using the initialization
procedures as above, start in the function calling the generator for
the first time:

w = gen.next(z)

This passes z to the generator, which holds it until the
initialization is finished. Eventually, the generator happens upon the
first yield statement:
 
x = yield y

Set aside the fact that you have a yield *statement* where an
expression should go (a moot point, since you have the same with x =
yield None). "yield y" is evaluated by passing y to another function
(as if it were the function 'yield(y)' in 1.5.2 - except that the
function is the one that called the generator in the first place!).

Back in the calling function, gen.next(z) evaluated to y (y was
"returned" by yield), so we set w to y. We continue on with the code
in the calling function, and get (for the sake of argument) the same
generator call again. So we pass our new z into the generator.

We see the first problem, which is that the original z never got used,
and is now overwritten with the new z. Putting that aside for now, the
generator sees this call as a return from the 'yield' function, and
now sets the value of x to the new z. Execution continues in the
generator until we hit a yield statement again which starts the
process once more.
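
The walk-through above can be sketched with a small wrapper, assuming
present-day Python 3. OverloadedNext and echo are hypothetical names;
the wrapper only imitates the gen.next(value) flow as I describe it
here (the first value is never delivered) and is not the draft PEP's
implementation.

```python
class OverloadedNext:
    """Imitate the gen.next(value) flow described above on top of a generator."""

    def __init__(self, generator):
        self._generator = generator
        self._started = False

    def next(self, value=None):
        if not self._started:
            # First call: run the initialization up to the first yield.
            # No yield is waiting for a value yet, so `value` is never used.
            self._started = True
            return next(self._generator)
        # Later calls: `value` becomes the result of the pending yield.
        return self._generator.send(value)

def echo():
    received = []
    y = "ready"
    while True:
        x = yield y          # pass y out, collect whatever is passed back
        received.append(x)
        y = received[:]      # report a copy of everything seen so far

gen = OverloadedNext(echo())
w1 = gen.next("first z")     # returns "ready"; "first z" is lost
w2 = gen.next("second z")    # returns ["second z"]
w3 = gen.next("third z")     # returns ["second z", "third z"]
```

Note that "first z" never appears in what the generator received, which
is the initial-value loss discussed here.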

The flow control is not that hard to understand if you think of the
yield (statement/expression) as you would a "function": save your
state, pass the value, and collect whatever was passed back to you.
The only problem is that the submitted value on the *first* (and only
the first) call is never used, as even "x = yield None" passes None
back before returning into the generator (where the newly submitted
value obliterates the old.)

The initial value loss is a little problematic, but can easily be
fixed with appropriate arguments to the constructor, or with a
sacrificial .next() after the iterator construction.
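
Both fixes can be sketched in present-day Python 3 (an assumption on my
part; send() stands in for the value-passing .next() discussed here).
averager is a made-up example. Python 3 itself refuses a non-None value
on the very first resumption with a TypeError rather than dropping it
silently.

```python
def averager(initial):
    """Running average; the first sample arrives through the constructor."""
    total, count = initial, 1
    while True:
        value = yield total / count
        total += value
        count += 1

# Fix 1: pass the initial value as an argument to the constructor.
avg = averager(10)

# Fix 2: a sacrificial next() runs the generator to its first yield.
primed = next(avg)           # 10 / 1
after_20 = avg.send(20)      # (10 + 20) / 2

# Without priming, Python 3 raises instead of losing the value.
try:
    averager(0).send(99)
    lost_silently = True
except TypeError:
    lost_silently = False
```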
  
I believe the semantics and syntax of the above are equivalent to 2.2
when using "gen.next()" with a default value of None and "yield x"
exclusively.

That said, using the .submit() syntax instead of overloading
.next() runs into the same problems if you don't require the
next/yield and submit/x=yield symmetries.

Just to make sure I've beaten this Norwegian Blue horse enough, think
of it this way: __call__()/return sets up a parent/child relationship
between two functions where one gives commands and the other obeys.
gen.next(val)/yield sets up an equivalency relationship between the
two functions, where they are simultaneously the "parent" and the
"child" of the other function.

For BASIC programmers? No. Simple? Not really. Something really cool
and really powerful if handled correctly? Probably, depending on your
definition of cool and powerful.

ex-QBASIC-programmer-going-back-to-lurk-mode-ly'rs

- Rocco

P.S. Feel free to rip my analysis to shreds. I have no language design
experience whatsoever, and the only thing I know about
iterators/generators is what I've read on Python's current
implementation thereof (the PEPs) - everything else was pulled from
some nether region as I typed this.


