writable iterators?

Thu Jun 23 14:26:04 EDT 2011

(I apologize for the length of this article -- if I had more time,
I could write something shorter...)

In article <mailman.296.1308770918.1164.python-list at python.org>
Neal Becker  <ndbecker2 at gmail.com> wrote:
>AFAICT, the python iterator concept only supports readable iterators,
>not write.  
>Is this true?
>
>for example:
>
>for e in sequence:
>  do something that reads e
>  e = blah # will do nothing
>
>I believe this is not a limitation on the for loop, but a limitation on the 
>python iterator concept.  Is this correct?

Yes.
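
A small illustration of the behaviour being asked about may help.
The variable names are arbitrary, and the second loop is just the
usual enumerate() workaround rather than anything new:

```python
sequence = [1, 2, 3]

for e in sequence:
    e = e * 10                 # rebinds the name e; the list is untouched
unchanged = list(sequence)     # still [1, 2, 3]

for index, e in enumerate(sequence):
    sequence[index] = e * 10   # goes through list.__setitem__ instead
```

The first loop only changes what the local name "e" refers to; the
second one writes through the container, which is the only route
the iterator protocol leaves open.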

Having read through the subsequent discussion, I think in some ways
you have run into some of the same issues that I did in my originally
somewhat-vague thoughts on exceptions, in that your example is too
close to "real Python code" and led a number of followers (including
me, originally) astray. :-)

It might be better expressed as, say:

    for i in IndirectIter(sequence):
        current_value = i.get()
        result = compute(current_value)
        i.set(result)

which is clearly rather klunky, and also does not fit super-well
into existing iter protocols, but could be implemented for lists
and dictionaries for instance; see below.

A "more direct" syntax (which I admit is pretty klunky, this is
kind of off the top of my head):

    for item in sequence with newvalue:
        newvalue = compute(item)

This leaves unresolved the issue of "what if you don't set the
variable newvalue", but perhaps the for loop could internally
bind both "item" *and* "newvalue" at the top of each iteration,
so that this is essentially:

    for item in sequence with newvalue:
        newvalue = item # automatically inserted for you
        ... user code; if it doesn't set newvalue the .set()
            (or whatever equivalent) will re-save the original value ...

Or -- and I think this is actually a better idea -- perhaps it
could "pre-bind" newvalue = None, and the automatic iter.set()
invocation would treat None as "leave the current value
undisturbed".  In which case, the internal implementation could even
use .set() alone as the primary protocol, rather than having to call
iter.next() as well, with iter.set() changing the current value and
then doing, in essence, "return iter.next()".  Of course this is just
a micro-optimization that might only apply to CPython in the first
place; I am getting way ahead of myself here. :-)

(To expand, what I am thinking at the moment is that if one had
this syntax, one would change the iter protocol.  An iterator object
would still provide "__iter__" and "next" callables always.  If it
also provides a "set" callable -- or "setitem" or something like
that; the name is clearly flexible at this point -- then this would
make it a "writeable iterator" that one could use with the new
syntax.  The protocol would become:

    for <var1> in <container> [with <var2>]:
        <code>

which if the "with" is present would mean: call <container>.__iter__
to get an iterator as usual, with the usual check that iter.__iter__
is also a callable.  Then, though, check the iterator for the *new*
callable as well.  If not present, you get an error.  If present,
call iter.next() initially, bind its return value to <var1>, and bind
<var2> to None.  At the bottom of the loop, to step the loop, call
iter.set() with <var2>; bind its return value to <var1>, and re-bind
<var2> to None again.  Both iter.next() and iter.set() can raise
StopIteration to terminate the loop.)
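
As a rough sketch of the protocol just described, the loop can be
emulated today with an ordinary function standing in for the new
syntax.  Everything named here (WritableListIter, for_with, and the
"body" callback that plays the part of the loop body) is made up for
illustration and is not an existing API.  The iterator defines
__next__ and aliases it to next, to match the naming used above.

```python
class WritableListIter(object):
    """Hypothetical writable iterator over a list (illustration only)."""

    def __init__(self, seq):
        self._seq = seq
        self._index = -1

    def __iter__(self):
        return self

    def __next__(self):
        self._index += 1
        if self._index >= len(self._seq):
            raise StopIteration
        return self._seq[self._index]

    next = __next__  # older spelling, as used in the text

    def set(self, newvalue):
        # None means "leave the current element undisturbed".
        if newvalue is not None:
            self._seq[self._index] = newvalue
        # Step the loop: return the next item or raise StopIteration.
        return self.__next__()


def for_with(container, body):
    """Emulate `for var1 in container with var2: <code>`.

    `body` takes var1 and returns whatever the loop body would have
    left in var2 (None if it never assigned to it).
    """
    it = iter(container)
    if not callable(getattr(it, "set", None)):
        raise TypeError("iterator has no set() method")
    try:
        var1 = next(it)            # initial step
    except StopIteration:
        return
    while True:
        var2 = body(var1)
        try:
            var1 = it.set(var2)    # write back, then advance
        except StopIteration:
            return


values = [1, 2, 3, 4]
# Negate the odd entries; returning None leaves an entry alone.
for_with(WritableListIter(values), lambda x: -x if x % 2 else None)
```

One consequence of using None as the "no change" marker shows up
immediately in the sketch: there is no way to store None itself
through set().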

This idea needs more thought applied, of course.

Another possible syntax:

    for item in container with key:

which translates roughly to "for lists, bind item to the value and
key to its index; for dictionary-ish items, bind item to the value
and key to the key".  Then instead of:

    for elem in sequence:
        ...
        elem = newvalue

the OP would write, e.g.:

    for elem in sequence with index:
        ...
        sequence[index] = newvalue

which of course calls the usual container.__setitem__.  In this
case the "new protocol" is to have iterators define a function
that returns not just the next value in the sequence, but also
an appropriate "key" argument to __setitem__.  For lists, this
is just the index; for dictionaries, it is the key; for other
containers, it is whatever they use for their keys.

I actually think I like this second syntax more, as it leaves the
container-modifying step explicitly spelled out in user code.  It
would also eliminate much of the need for enumerate().
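
Something close to this second form can already be approximated with
a small helper generator.  The sketch below is only illustrative: the
name keyed_items is hypothetical, and it handles just lists and
dicts, yielding (key, item) pairs in which key is whatever the
container's __setitem__ expects.

```python
def keyed_items(container):
    """Yield (key, item) pairs; `key` is usable as container[key] = ...

    Hypothetical helper covering only dicts and lists.
    """
    if isinstance(container, dict):
        # Iterate over a snapshot of the keys, so the dict itself is not
        # being iterated while the loop body assigns into it.
        for key in list(container):
            yield key, container[key]
    elif isinstance(container, list):
        for index in range(len(container)):
            yield index, container[index]
    else:
        raise TypeError("don't know how to iterate over %s" % type(container))


l = [9, 8, 7]
for index, elem in keyed_items(l):
    l[index] = -elem

d = {'one': 1, 'two': 2, 'three': 3}
for key, value in keyed_items(d):
    d[key] = value * 10
```

For lists this is enumerate() under another name; the only thing the
helper adds is a uniform spelling across container types.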

    ---- example IndirectIter below ----

class IndirectIterError(TypeError):
    pass

class _IInner(object):
    def __init__(self, outer, iterlist):
        self.outer = outer
        self.iterlist = iterlist
        self.index = -1

    def __iter__(self):
        return self

    def next(self):
        self.index += 1
        if self.index >= len(self.iterlist):
            raise StopIteration
        return self

    __next__ = next  # Python 3 spells this method __next__

    def get(self):
        return self.outer._get(self.index, self.iterlist)

    def set(self, newvalue):
        return self.outer._set(self.index, self.iterlist, newvalue)

class IndirectIter(object):
    def __init__(self, sequence):
        if isinstance(sequence, dict):
            self._iter = self._dict_iter
            self._get = self._dict_get
            self._set = self._dict_set
        elif isinstance(sequence, list):
            self._iter = self._list_iter
            self._get = self._list_get
            self._set = self._list_set
        else:
            raise IndirectIterError(
                "don't know how to IndirectIter over %s" % type(sequence))
        self._seq = sequence

    def __str__(self):
        return '%s(%s)' % (self.__class__.__name__, self._seq)

    def __iter__(self):
        return self._iter()

    def _dict_iter(self):
        # Take a list of the keys so they can be indexed by position.
        return _IInner(self, list(self._seq.keys()))

    def _dict_get(self, index, keys):
        return self._seq[keys[index]]

    def _dict_set(self, index, keys, newvalue):
        self._seq[keys[index]] = newvalue

    def _list_iter(self):
        return _IInner(self, self._seq)

    def _list_get(self, index, _):
        return self._seq[index]

    def _list_set(self, index, _, newvalue):
        self._seq[index] = newvalue

if __name__ == '__main__':
    d = {'one': 1, 'two': 2, 'three': 3}
    l = [9, 8, 7]
    print('modify dict %r' % d)
    for i in IndirectIter(d):
        i.set(-i.get())
    print('result: %r' % d)
    print('')
    print('modify list %r' % l)
    for i in IndirectIter(l):
        i.set(-i.get())
    print('result: %r' % l)
-- 
In-Real-Life: Chris Torek, Wind River Systems
Intel require I note that my opinions are not those of WRS or Intel
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: gmail (figure it out)      http://web.torek.net/torek/index.html