[Python-ideas] PEP on yield-from: throw example

Bruce Frederiksen dangyogi at gmail.com
Thu Feb 19 19:17:52 CET 2009


Greg Ewing wrote:
> Bruce Frederiksen wrote:
>
>>   1. The double use of send/throw and the yield expression for
>>      simultaneous input and output to/from the generator; rather than
>>      separating input and output as two different constructs.  Sending
>>      one value in does not always correspond to getting one value out.
>
> You might not be interested in sending or receiving
> a value every time, but you do have to suspend the
> generator each time you want to send and/or receive
> a value.
>
> Currently, there is only one way to suspend a
> generator, which for historical reasons is called
> 'yield'. Each time you use it, you have the opportunity
> to send a value, and an opportunity to receive a
> value, but you don't have to use both of these (or
> either of them) if you don't want to.
>
> What you seem to be proposing is having two aliases
> for 'yield', one of which only sends and the other
> only receives. Is that right? If so, I don't see
> much point in it other than making code read
> slightly better.
I'm thinking that yield goes away (both the statement and the expression 
form), replaced by builtin functions.  I would propose that the builtins 
take optional pipe arguments that default to the current thread's 
pipein/pipeout.  I would also propose that each thread be allowed 
multiple input and/or output pipes, and that the selection of which to 
use could be done by passing an integer value for the pipe argument.  
For example:

send(obj, pipeout = None)
send_from(iterable, pipeout = None)  # does what "yield from" is supposed to do
next(iterator = None)
num_input_pipes()
num_output_pipes()

You may need a few more functions to round this out:

pipein(index = 0)   # returns the current thread's pipein[index] object; could also use iter() for this
pipeout(index = 0)  # returns the current thread's pipeout[index] object
throwforward(exc_type, exc_value = None, traceback = None, pipeout = None)
throwback(exc_type, exc_value = None, traceback = None, pipein = None)
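
As a rough illustration of the defaulting and of the integer pipe 
selection (my sketch only: it assumes pipe objects with blocking 
read/write methods, like the rendezvous Pipe sketched near the end of 
this message, and a generate() that fills in the per-thread state; none 
of these names exist anywhere yet):

import threading

_state = threading.local()  # generate() would set _state.pipein/_state.pipeout

def send(obj, pipeout = None):
    if pipeout is None:
        pipe = _state.pipeout[0]        # default output pipe
    elif isinstance(pipeout, int):
        pipe = _state.pipeout[pipeout]  # an integer selects among the pipes
    else:
        pipe = pipeout                  # an explicit pipe object
    pipe.write(obj)

def next(iterator = None):  # deliberately shadows the builtin, per the proposal
    if iterator is None:
        pipe = _state.pipein[0]
    elif isinstance(iterator, int):
        pipe = _state.pipein[iterator]
    else:
        pipe = iterator
    return pipe.read()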

Thus:

yield expr

becomes

send(expr)

which doesn't mean "this is a generator" or that control will 
*necessarily* be transferred to another thread here.  That depends on 
whether the other thread has already done a next on the corresponding 
pipein.

I'm thinking that the C code (the bytecode interpreter) that manages 
Python stack frame objects becomes detached from the C stack, so that a 
Python-to-Python call does not grow the C stack.  This would allow the C 
code to fork the Python stack and switch between branches quite easily.
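
You can get a feel for this today at the Python level with a trampoline 
that keeps its own list of generator frames, so that delegation never 
recurses into C (a minimal sketch of the idea, not the interpreter 
change itself; flatten below assumes label/children attributes as in 
Guido's example):

from types import GeneratorType

def run(gen):
    # Run nested generators iteratively: yielding a generator means
    # "delegate to it", and delegation pushes onto this Python-level
    # list instead of making a recursive call that grows the C stack.
    stack = [gen]
    while stack:
        try:
            value = stack[-1].next()
        except StopIteration:
            stack.pop()
            continue
        if isinstance(value, GeneratorType):
            stack.append(value)  # like "yield from"
        else:
            yield value          # an ordinary output value

def flatten(node):
    yield node.label
    for child in node.children:
        yield flatten(child)     # delegate; run() unwinds the recursion

With this, arbitrarily deep trees cost a single C stack frame no matter 
how deeply flatten delegates.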

This separation of input and output would clean up most generator examples.

Guido's tree flattener has special code to yield SKIPPED in response to 
a SKIP, because he doesn't really want a value returned from sending a 
SKIP in.  This would no longer be necessary.

def __iter__(self):
  skip = yield self.label
  if skip == SKIP:
    yield SKIPPED
  else:
    skip = yield ENTER
    if skip == SKIP:
      yield SKIPPED
    else:
      for child in self.children:
        yield from child
    yield LEAVE
    # I guess a SKIP can't be returned here?

becomes:

def __iter__(self):
  return generate(self.flatten)

def flatten(self):
  send(self.label)
  if next() != SKIP:
    send(ENTER)
    if next() != SKIP:
      for child in self.children:
        child.flatten()
    send(LEAVE)

Also, the caller could then simply look like:

for token in tree():
    if too_deep:
        send(SKIP)
    else:
        send(None)
        <process token>

rather than:

response = None
gen = tree()
try:
    while True:
        token = gen.send(response)
        if too_deep:
            response = SKIP
        else:
            response = None
            <process token>
except StopIteration:
    pass

The reason for this extra complexity is that send returns a value.  
Separating send from yielding values lets you call send from within for 
statements without having another value land in your lap that you would 
really rather have delivered to the for statement.

The same thing applies to throw.  If throw didn't return a value, then 
it could be easily called within for statements.
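
To see the problem, note that today's throw resumes the generator and 
hands back whatever it yields next, whether you wanted that value or not:

def gen():
    try:
        yield 1
    except ValueError:
        yield "recovered"

g = gen()
print g.next()             # prints 1
print g.throw(ValueError)  # prints "recovered": throw returned a value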

The parsing example goes from:

  def scanner(text):
    for m in pat.finditer(text):
      token = m.group(0)
      print "Feeding:", repr(token)
      yield token
    yield None # to signal EOF

  def parse_items(closing_tag = None):
    elems = []
    while 1:
      token = token_stream.next()
      if not token:
        break # EOF
      if is_opening_tag(token):
        elems.append(parse_elem(token))
      elif token == closing_tag:
        break
      else:
        elems.append(token)
    return elems

  def parse_elem(opening_tag):
    name = opening_tag[1:-1]
    closing_tag = "</%s>" % name
    items = parse_items(closing_tag)
    return (name, items)

to

  def scanner(text):
    for m in pat.finditer(text):
      token = m.group(0)
      print "Feeding:", repr(token)
      send(token)

  def parse_items(closing_tag = None):
    for token in pipein():
      if is_opening_tag(token):
        send(parse_elem(token))
      elif token == closing_tag:
        break
      else:
        send(token)
 
  def parse_elem(opening_tag):
    name = opening_tag[1:-1]
    closing_tag = "</%s>" % name
    items = list(generate(parse_items(closing_tag), pipein=pipein()))
    return (name, items)

and perhaps called as:

    tree = list(scanner(text) | parse_items())
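
Nothing above defines that "|"; one rough way to get this surface syntax 
with today's pull-style generators is a small wrapper class 
(hypothetical, just to suggest the intended reading):

class stage(object):
    # "a | b" feeds a's output in as b's first argument, so pipelines
    # read left to right.
    def __init__(self, genfunc, *args):
        self.genfunc, self.args, self.source = genfunc, args, None
    def __or__(self, downstream):
        downstream.source = iter(self)
        return downstream
    def __iter__(self):
        if self.source is None:
            return iter(self.genfunc(*self.args))
        return iter(self.genfunc(self.source, *self.args))

With pull-style versions of scanner and parse_items (the latter taking a 
token iterator as its first argument), the call would then read 
tree = list(stage(scanner, text) | stage(parse_items)).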

This also obviates the need to do an initial next call when pushing 
(sending) to generators that are acting as consumers, a requirement 
which is difficult both to explain and to understand.
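
For reference, that priming step is what today's push-style consumers 
require:

def consumer():
    while True:
        item = yield        # receives values; sends nothing back
        print "Got:", item

c = consumer()
c.next()                    # the hard-to-explain priming call
c.send(42)                  # without the next() above, this raises TypeError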

>
>>          * I'm thinking here of a pair of cooperating pipe objects,
>>            read and write,
>
> Pipes are different in an important way -- they
> have queueing. Writes to one end don't have to
> interleave perfectly with reads at the other.
> But generators aren't like that -- there is no
> buffer to hold sent/yielded values until the
> other end is ready for them.
>
> Or are you suggesting that there should be such
> buffering? I would say that's a higher-level facility
> that should be provided by library code using
> yield, or something like it, as a primitive.
I didn't mean to imply that buffering was required, or even desired.  
With no buffering, the sender and receiver stay in sync, just like 
generators.  A write would suspend until a matching read, and vice 
versa.  Only when the pipe sees both a write and a read would the object 
be transferred from the writer to the reader.  Thus, write/read replaces 
yield as the way to suspend the current "thread".
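
Here is a minimal sketch of such an unbuffered pipe on top of OS threads 
(my illustration of the rendezvous semantics only, assuming one writer 
and one reader, not a proposed implementation):

import threading

class Pipe(object):
    def __init__(self):
        self._item = None
        self._ready = threading.Semaphore(0)  # a value has been staged
        self._taken = threading.Semaphore(0)  # the reader has taken it

    def write(self, obj):
        self._item = obj
        self._ready.release()
        self._taken.acquire()  # suspend until the matching read

    def read(self):
        self._ready.acquire()  # suspend until the matching write
        obj = self._item
        self._taken.release()  # let the writer resume
        return obj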

This avoids the confusion about whether we're "pushing" or "pulling" 
to/from a generator.

For example, itertools.tee is currently designed as a generator that 
"pulls" values from its iterable parameter.  But then it can't switch 
roles to "push" values to its consumers, and so must be prepared to 
store values in case the consumers aren't synchronized with each other.  
With this new approach, the consumer waiting for the sent value would be 
activated by the pipe connecting it to tee.  And if that consumer wasn't 
ready for a value yet, tee would be suspended until it was.  So tee 
would not have to store any values.

def tee():
    num_outputs = num_output_pipes()
    for input in pipein():
        for i in range(num_outputs):
            send(input, i)
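
For contrast, a pull-style tee along the lines of today's itertools.tee 
has to buffer (a rough sketch of the usual pure-Python approach, not 
CPython's actual implementation):

import collections

def pull_tee(iterable, n = 2):
    it = iter(iterable)
    queues = [collections.deque() for i in range(n)]
    def branch(q):
        while True:
            if not q:                  # this branch has drained its buffer,
                try:
                    value = it.next()  # so pull one more from the source
                except StopIteration:
                    return
                for d in queues:       # and buffer it for every branch
                    d.append(value)
            yield q.popleft()
    return tuple(branch(q) for q in queues)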

Does this help?

-bruce frederiksen


