Please comment on Draft PEP for Enhanced Generators

Wed Jan 30 21:31:55 EST 2002

"Kragen Sitaker" <kragen at pobox.com> wrote in message
news:83r8o7zbvr.fsf at panacea.canonical.org...
> "Raymond Hettinger" <othello at javanet.com> writes:
>
> > http://users.javanet.com/~othello/download/genpep.htm
> >
> > Please post your comments (and maybe a little encouragement) here on
> > comp.lang.py or email them directly to me.
>
> Good work.  Here are my comments, which (not surprisingly) focus on
> the things I don't like rather than the things I do.
>
> xfilter, xmap, etc., should take iterables, not sequences, as
> arguments.

I see your point: the argument name, sequences, is not sufficiently
inclusive.

The intention was to take anything that could be fed to iter().  This
includes xrange (which does not define __iter__ and .next), objects with
the old __getitem__ style, lists, dictionaries, strings, and maybe even
integers if PEP 276 makes it.
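
As a rough illustration of what "anything that could be fed to iter()"
means, here is a minimal sketch of an xfilter-style generator written in
present-day Python.  It is not the PEP's reference code; the OldStyle class
and the is_even helper are hypothetical, made up only to show an object
that supports nothing but the old __getitem__ protocol.

```python
def xfilter(pred, collection):
    """Lazily yield the items of `collection` for which pred(item) is true.

    Sketch only: `collection` may be anything that iter() accepts.
    """
    for item in iter(collection):
        if pred(item):
            yield item


class OldStyle:
    """Hypothetical container that defines only __getitem__ (no __iter__)."""

    def __init__(self, data):
        self._data = data

    def __getitem__(self, index):
        # iter() calls this with 0, 1, 2, ... until IndexError is raised.
        return self._data[index]


def is_even(n):
    return n % 2 == 0


evens_from_list = list(xfilter(is_even, [1, 2, 3, 4]))
evens_from_old = list(xfilter(is_even, OldStyle([5, 6, 7, 8])))
vowels = list(xfilter(lambda c: c in "aeiou", "iterable"))
```

The same generator handles a list, a string, and the __getitem__-only
object without any special casing, because iter() does the adapting.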

In that spirit, I'll rename the argument from *sequences to *collections.
This matches the result given from help(iter).

Thanks for pointing this out.

>
> map and zip have weird different semantics when their sequence
> arguments are of different lengths; xmap and xzip should conform to
> these same weird semantics.  The comments in the specification say
> they do, but the code says they don't.  There's a note explaining
> this, but instead there should be correct code.  The justification
> given for the difference in behavior, that it will produce infinite
> sequences less often, is very weak.

xzip() does conform.  zip( range(3), range(10) ) produces a three
item list.  xzip also runs until the end of the shortest sequence.
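
A minimal sketch of that stop-at-the-shortest behaviour, written in
present-day Python and not taken from the PEP's own code, might look like
this:

```python
def xzip(*collections):
    """Lazily yield tuples, stopping at the end of the shortest input."""
    iterators = [iter(c) for c in collections]
    if not iterators:
        return
    while True:
        row = []
        for it in iterators:
            try:
                row.append(next(it))
            except StopIteration:
                # The first exhausted input ends the whole stream.
                return
        yield tuple(row)


pairs = list(xzip(range(3), range(10)))
```

Here pairs holds three tuples, the same length zip produces for those
arguments.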

xmap() doesn't conform.  I'm not particular about which version is
chosen, so I presented both.  It's a design choice between consistency
and the risk of someone crashing in an unexpected infinite loop (I think
this might be a common outcome if any of the supplied iterators don't
terminate).

Some iterators in my personal code, such as a date/time stamper or
log number generator, do not terminate.  I wanted xmap to be able to
use those generators along with a finite data stream whose length might
not be knowable in advance.

ints() or xrange(sys.maxint) are two examples of things that would
crash an xmap that conforms exactly to map.
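
To make the two choices concrete, here is a sketch of both in present-day
Python.  It assumes the map semantics under discussion are the ones where
shorter inputs are padded with None until the longest input is exhausted.
The names xmap_shortest and xmap_padded are hypothetical labels for the two
designs, and itertools.count stands in for a non-terminating generator such
as ints().

```python
from itertools import count, islice


def xmap_shortest(func, *collections):
    """Design 1: stop as soon as any input is exhausted."""
    iterators = [iter(c) for c in collections]
    if not iterators:
        return
    while True:
        args = []
        for it in iterators:
            try:
                args.append(next(it))
            except StopIteration:
                return
        yield func(*args)


_DONE = object()  # sentinel marking an exhausted input


def xmap_padded(func, *collections):
    """Design 2: pad exhausted inputs with None; stop when all are done."""
    iterators = [iter(c) for c in collections]
    while iterators:
        args = [next(it, _DONE) for it in iterators]
        if all(a is _DONE for a in args):
            return
        yield func(*[None if a is _DONE else a for a in args])


def pair(n, line):
    return (n, line)


# Finite data numbered by an endless counter: design 1 terminates.
stamped = list(xmap_shortest(pair, count(1), ["start", "stop"]))

# Design 2 never terminates with an endless input, so only peek at it.
padded_head = list(islice(xmap_padded(pair, count(1), ["start", "stop"]), 4))
```

With the endless counter, the padded version keeps producing (n, None)
pairs forever, which is the unexpected infinite loop described above.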

At this point, I can document both sides of the argument,
provide code or code references to both solutions, and let the world
decide between the two.  While I have my preference, I would rather
have the alternative than no xmap at all.

> xfilter, xmap, etc., don't need to be written up in a PEP; they can be
> provided in a library.  MetaPy.Iterate already has
> backward-compatible-to-1.5.2 versions of them, under more verbose
> names, and others have written these as well.  It would be very nice
> to have this in the standard library instead of in an extra package,
> but they don't need to be builtins.  (Eventually making them builtins
> might make programs that use them faster, though.)
>

I used to feel this way also because the code for them was so simple;
however, I learned that there are many nuances to duplicating the
behaviors of map, filter, and zip.  It would be a shame to have everyone
continually re-inventing the wheel and perhaps discovering that roundness
is not so easily achieved.

Also, we tend to use what's there and do things the hard way when the
tools aren't present.  There's going to be a lot of code duplication,
mis-duplication, avoidance, inconsistent implementations, different
spellings, etc.

Like you said, it would be darned nice to have these as built-ins.
But failing that, there's no reason these can't be tucked in a library
(where, of course, they will rot on the shelf from non-use).

Out of the four functions, I'm most attached to indexed().  Like
.iteritems(), it changes and simplifies the way you program.
Life is better with these functions around than without them.
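
As a small illustration of why, here is a sketch that assumes indexed()
yields (index, item) pairs counting from zero.  The find_first helper is
hypothetical and only there to show the loop style it enables.

```python
def indexed(collection):
    """Lazily yield (index, item) pairs, counting from zero (assumed)."""
    index = 0
    for item in collection:
        yield index, item
        index += 1


def find_first(names, target):
    """Return the position of the first match, or -1 if there is none."""
    # No range(len(names)) and no manual counter to keep in step.
    for i, name in indexed(names):
        if name == target:
            return i
    return -1


position = find_first(["ham", "spam", "eggs"], "spam")
```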

> The example of generator comprehensions says:
>     if len(line) > 5:
>         yield line
> It should instead say:
>     if len(line) > 5:
>         yield (len(line),line)
>

Thanks, I'll fix this ASAP.
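
For reference, the corrected body corresponds to a generator function along
these lines.  This is a sketch that assumes the comprehension in the draft
selects (len(line), line) pairs for lines longer than five characters; the
function name long_lines and the sample data are hypothetical.

```python
def long_lines(lines):
    """Yield (length, line) for every line longer than five characters."""
    for line in lines:
        if len(line) > 5:
            yield (len(line), line)


result = list(long_lines(["short", "longer line", "tiny", "123456"]))
```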

> There should be no subtle differences in behavior between generator
> comprehensions and list comprehensions.  There should be no subtle
> differences in behavior in a programming language, period.

The behavior of list comprehensions is not going to change, so we're
left with three choices:
  -- mimic the existing behavior (which you dub unforgivably stupid)
      and force the locals() dictionary of the generator to do an .update()
      to the dictionary of the enclosing scope.
  -- tolerate a small difference in behavior.
  -- drop the whole idea because the symmetry isn't perfect.

Choose your poison.

> The
> current scoping behavior of list comprehensions is unforgivably
> stupid, but having new language constructs behave subtly differently
> from existing ones is worse.

It turns out that the current scoping behavior of list comprehensions
was designed that way on purpose.  It is not an artifact of the world
before nested scopes.  They decided to make it use the enclosing
scope so that it would precisely mimic the behavior of
an equivalent unrolled for-loop which does leave its variable changed
in the local scope.  I got this straight from the BDFL.
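
The difference at issue can be sketched as follows.  The example uses an
explicit for-loop instead of a list comprehension, since later Python
versions changed how comprehensions scope their loop variable; the function
names are hypothetical.

```python
def unrolled_loop():
    x = "before"
    result = []
    for x in range(3):        # rebinds x in the enclosing function
        result.append(x * 2)
    return result, x


def generator_version():
    x = "before"

    def doubled():
        for x in range(3):    # this x is local to the generator
            yield x * 2

    return list(doubled()), x


loop_result, loop_x = unrolled_loop()
gen_result, gen_x = generator_version()
```

Both produce the same values, but only the unrolled loop leaves x changed
in the calling scope; the generator keeps its loop variable to itself.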

>
> I am in favor of the x = [yield y ...] generator comprehension syntax;
> mostly I'm in favor of *some* generator comprehension syntax, and this
> is the best one I've seen so far, but I hope we can come up with a
> syntax that won't pose as many problems to learning Python as this one
> does.
>
> The two-way generator parameter passing proposal is incompletely
> specified.  It apparently introduces invisible FIFOs; it doesn't
> accomplish anything that can't be done more easily by passing an
> iterator as an argument when constructing a generator; .throw() is
> poorly named and it isn't clear why it's useful.  It should be
> separated from the rest of the PEP and put into its own PEP so we can
> discuss it separately; I think I would oppose it.
>

I may separate the discussion of .throw(), but not because I think it is a
weak suggestion.  On the contrary, I think it should be adopted even if
.submit() is not accepted.  Currently, the code inside a generator is the
only running code in all of Python that is shielded from the exception
tree.  The .throw() method provides some mechanism for triggering an
exception at the last point of execution.  It is especially needed because
generators can't use a try/finally to clean things up.  You need a way to
signal a flush() operation after the last invocation of the generator.
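
Generator objects in later Python versions do have a throw() method, so the
idea can be sketched with it.  This is only an illustration of the flush
use case, not the semantics specified in the draft; the Flush exception and
the running_total generator are hypothetical.

```python
from itertools import count


class Flush(Exception):
    """Hypothetical signal asking the generator to emit its final state."""


def running_total(values):
    total = 0
    try:
        for value in values:
            total += value
            yield total
    except Flush:
        # Raised at the yield where the generator was last suspended.
        yield ("final", total)


gen = running_total(count(1))   # an input that never ends by itself
first = next(gen)
second = next(gen)
summary = gen.throw(Flush())    # resumes the generator with an exception
```

The caller decides when the stream is over, and the generator gets one
last chance to hand back whatever state it was holding.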

Sorry you don't like the name.  I would have preferred raise, but the
keyword is already taken.  I'm open to suggestions.

>
> Return without an argument is not now considered weak design, at least
> not in any discussion I ever read on the matter.  Return without an
> argument in a function with a meaningful return value is almost
> universally considered weak design.
>
You're preaching to the choir on this one. I originally proposed to make
the yield argument optional and was blasted from many directions.
This one falls under my no-biggie category.  It is not a critical part
of the proposal.

Thanks for your thoughtful comments and suggestions.
I'll incorporate as much as I can and be sure to include
both the strengths and weaknesses of each possible design choice.

Raymond Hettinger