iterators (was: python-dev summary)

Andrew Dalke dalke at acm.org
Fri Feb 16 03:54:16 EST 2001


Here's the rationale for PEP 234

>  1. It provides an extensible iterator interface.

I notice it isn't as powerful as C++ iterators.  For
example, there's no way to specify a bidirectional iterator.
I also don't see how mutations in the container are
handled, but I don't know how C++ does that either.

I also note that there are different types of iterators
which may be applied to an object.  For example, a file
can be iterated a line at a time or a record at a time.
To support this currently, I write a wrapper object which
handles __getitem__ as appropriate.
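
A rough sketch of such a wrapper, for the record-at-a-time case.
The class name RecordReader and the record format (non-blank lines
ended by a blank line or end of file) are assumptions made for
illustration only; nothing above defines them.

```python
import io

class RecordReader:
    """Wrap a file-like object so a for loop sees one record at a time.

    Assumption: a record is a run of non-blank lines ended by a blank
    line or by end of file.
    """
    def __init__(self, infile):
        self.__infile = infile
        self.__n = 0              # index the next __getitem__ call must use

    def __getitem__(self, i):
        if i != self.__n:
            raise ValueError("only forward iteration is allowed")
        lines = []
        while True:
            line = self.__infile.readline()
            if not line:              # end of file
                break
            if line.strip() == "":    # a blank line ends the record
                if lines:
                    break
                continue              # skip blank lines before a record
            lines.append(line)
        if not lines:
            raise IndexError(i)       # tells the for loop to stop
        self.__n += 1
        return "".join(lines)

sample = io.StringIO("a\nb\n\nc\n\n\nd\ne\n")
records = [rec for rec in RecordReader(sample)]
```

A line-at-a-time wrapper would look the same with a single
readline() call in __getitem__.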

What's wrong with that approach?  For example,

  import iterate
  for key, val in iterate.items(obj):
      pass

where iterate.items is a function which does essentially
the same thing as mp_iteritems.  Off the top of my head
(without much thought and likely to be error prone) it
could look similar to:

iterate.py:

  class IterateItems:
      def __init__(self, obj, next_key):
          self.__obj = obj
          self.__next_key = next_key
      def __getitem__(self, i):
          k = self.__next_key[i]
          return k, self.__obj[k]

  def items(obj):
      if hasattr(obj, "__iteritems__"):
          return obj.__iteritems__()
      elif hasattr(obj, "__iter__"):
          return IterateItems(obj, obj.__iter__())
      elif hasattr(obj, "__items__"):
          return obj.__items__()
      elif hasattr(obj, "__getitem__"):
          return IterateItems(obj, obj.keys())
      ...

Similar constructs for
  iterate.keys(obj)
  iterate.values(obj)
  ...
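
One way the keys and values factories might look, written as plain
functions rather than as part of an iterate module.  The hook names
__iterkeys__ and __itervalues__ are hypothetical, picked by analogy
with __iteritems__ above, and IterateValues is likewise an
illustrative name.

```python
class IterateValues:
    """Return obj[k] for each key k, taking keys by position."""
    def __init__(self, obj, keys):
        self.__obj = obj
        self.__keys = keys
    def __getitem__(self, i):
        # An IndexError from the key sequence ends the for loop.
        return self.__obj[self.__keys[i]]

def keys(obj):
    if hasattr(obj, "__iterkeys__"):       # hypothetical hook
        return obj.__iterkeys__()
    elif hasattr(obj, "keys"):
        return list(obj.keys())
    raise TypeError("cannot iterate over the keys of %r" % (obj,))

def values(obj):
    if hasattr(obj, "__itervalues__"):     # hypothetical hook
        return obj.__itervalues__()
    elif hasattr(obj, "keys") and hasattr(obj, "__getitem__"):
        return IterateValues(obj, list(obj.keys()))
    raise TypeError("cannot iterate over the values of %r" % (obj,))

d = {"a": 1, "b": 2}
found_keys = [k for k in keys(d)]
found_values = [v for v in values(d)]
```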

My point is that this iterator scheme does not and cannot
handle every situation, so there will have to be ways to
map the desired solution to the iterator scheme.  Why not
expand on that and use wrappers/explicit factory functions
for everything instead of using new syntax?

>  2. It resolves the endless "i indexing sequence" debate.

None of my clients have ever brought this up as a concern.
For that matter, I'm not sure what you mean by it.
If you had said "i indexing dictionary" I might understand.

There are endless "tab vs. spaces vs. {}" debates.  Does
that mean something needs to be done with Python to settle
those debates?  In other words, I don't view this as a useful
argument.

Maybe it really means that you are using this phrase as
a shorthand description of a problem and I don't know how
to expand it properly.

>  3. It allows performance enhancements to dictionary iteration.

My alternate suggestion supports this as well, but without
syntax changes.

>  4. It allows one to provide an interface for just iteration
>     without pretending to provide random access to elements.

Hmm.  I've had success in throwing an exception when the
iterator was used in a non-forward manner.   I tracked
a "self.__n" with the current position, and enforced
that "i == self.__n" in "__getitem__(self, i)".  No one
has ever complained about it.

But I do like the idea.  Consider one place where I had
a forward iterator through a database (provided by the
underlying C API, which didn't allow a random access
iterator.)  I did the check against self.__n as above,
but it wasn't obvious in the code that only forward
iteration was allowed.  Instead, if there is an iterate
module with a "forward" factory function, then it can
work like:

iterate.py:

  class ForwardIterator:
      def __init__(self, obj):
          self.__obj = obj
          self.__n = 0
      def __getitem__(self, i):
          assert self.__n == i, "only forward iteration is allowed"
          self.__n += 1
          return self.__obj[i]

  def forward(obj):
      if hasattr(obj, "__iter__"):
          return obj.__iter__()
      elif hasattr(obj, "__getitem__"):
          return ForwardIterator(obj)
      ...

and my code would be similar to:

  class ForwardDatabaseIterator:
      def __init__(self, stream):
          self.stream = stream
      def __getitem__(self, i):
          # guaranteed to be called in increasing order, so I
          # don't need to check for random access
          handle = daylight.dt_next(self.stream)
          if handle == 0:
              # end of stream: release it and stop the for loop
              daylight.dt_dealloc(self.stream)
              raise IndexError(i)
          return Record(handle)


  class Database:
      def __iter__(self):
          stream = daylight.dt_stream(self.handle, daylight.TYP_RECORD)
          return ForwardDatabaseIterator(stream)

  import iterate
  db = open_database("user@host", "password")
  for record in iterate.forward(db):
      pass
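
To try the pieces above without the Daylight toolkit, here is a
self-contained sketch.  FakeStream, ForwardStreamIterator and the
integer "handles" are stand-ins invented for illustration; they are
not the daylight API.  As in this email, __iter__ is treated as a
factory hook whose result is driven through __getitem__, which is
not the protocol Python later adopted for __iter__.

```python
class FakeStream:
    """Hypothetical stand-in for a stream: next() returns 0 when done."""
    def __init__(self, handles):
        self._handles = list(handles)
    def next(self):
        if self._handles:
            return self._handles.pop(0)
        return 0

class ForwardIterator:
    def __init__(self, obj):
        self.__obj = obj
        self.__n = 0
    def __getitem__(self, i):
        if i != self.__n:
            raise ValueError("only forward iteration is allowed")
        self.__n += 1
        return self.__obj[i]

def forward(obj):
    if hasattr(obj, "__iter__"):
        return obj.__iter__()
    elif hasattr(obj, "__getitem__"):
        return ForwardIterator(obj)
    raise TypeError("cannot iterate forward over %r" % (obj,))

class ForwardStreamIterator:
    def __init__(self, stream):
        self.stream = stream
    def __getitem__(self, i):
        # Called in increasing order by the for loop, so i is ignored.
        handle = self.stream.next()
        if handle == 0:
            raise IndexError(i)       # end of stream stops the loop
        return handle

class Database:
    def __init__(self, handles):
        self._handles = handles
    def __iter__(self):
        return ForwardStreamIterator(FakeStream(self._handles))

db = Database([11, 22, 33])
seen = [record for record in forward(db)]
```

The loop goes through forward(db) rather than over db directly,
since the object returned by __iter__ here only supports
__getitem__.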

>  5. It is backward-compatible with all existing user-defined
>     classes and extension objects that emulate sequences and
>     mappings, even mappings that only implement a subset of
>     {__getitem__, keys, values, items}.

Same for my alternate suggestion because it uses the same
techniques for backwards compatibility that you use.

Note that I am *not* proposing that this module wrapper
should be used.  I would rather keep things as they are
(see my comments on this in c.l.py).  Instead, read it as
a push that, if this functionality is added to Python,
it be done without adding new syntax, and as a suggestion
for how that might be done.

For Ka-Ping Yee.  I'm not on python-dev and it's hard
to search the pipermail archives, so I don't know if I caught
everything on this discussion.  In
http://mail.python.org/pipermail/python-dev/2001-February/012677.html
you say
> > Sorry, Ping, I didn't know you have a PEP for iterators already.
>
> I posted it on this very boutique (i mean, mailing list) a week ago
> and messages have been going back and forth on its thread since then.

I can't seem to find those mails.  I don't see mention of
"234" or "iterator" in any of the titles listed in the indices,
and pipermail doesn't have any way to do a text search.  My
apologies for not following every link to find that thread.

That said, M-A Lemburg in
http://mail.python.org/pipermail/python-dev/2001-February/012679.html
said
> I'd suggest to implement generic iterators which implements your
> suggestions and put them into the builins or a special iterator
> module...
>
> from iterators import xitems, xkeys, xvalues
>
> for key, value in xitems(dict):
> for key in xkeys(dict):
> for value in xvalues(dict):

which is essentially what I said.  Your reply to his message,
in http://mail.python.org/pipermail/python-dev/2001-February/012682.html
doesn't comment on his suggestion and the PEP doesn't make
any comment on it.

(Oh, I downloaded all of Feb's mail to date.  I can't find any
other message which includes reference to "from iter".  I also
found the thread "Sets: elt in dict, lst.include" and other
threads, but as a flat file it's hard to follow what's going
on.  My apologies for not rewriting this email based on my
new knowledge - I don't think it is affected much by what I've
read.)

Finally, as a minor note, I almost never embed def's in def's
so examples like

            def __iter__(self):
                def iter(self=self):
                    line = self.file.readline()
                    if line: return line
                    else: raise IndexError
                return iter

take some effort for me to follow.  I tend to create new
classes instead, since I find them less confusing than
using default arguments as a way to pass state around.
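
For comparison, a sketch of how I might write the same thing with a
class.  LineIterator and LineFile are illustrative names, and
__iter__ returns a callable that raises IndexError at the end, as in
the quoted example, not as in any final protocol.

```python
import io

class LineIterator:
    """Callable that keeps its state in an attribute, not a default argument."""
    def __init__(self, file):
        self.file = file
    def __call__(self):
        line = self.file.readline()
        if line:
            return line
        raise IndexError("end of file")

class LineFile:
    def __init__(self, file):
        self.file = file
    def __iter__(self):
        return LineIterator(self.file)

# Drive the callable by hand until it signals the end.
next_line = LineFile(io.StringIO("one\ntwo\n")).__iter__()
lines = []
while True:
    try:
        lines.append(next_line())
    except IndexError:
        break
```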

  Sincerely,

                    Andrew Dalke
                    dalke at acm.org





