Using iterators to write in the structure being iterated through?

Wed Jul 26 14:08:19 EDT 2006

On Wed, 26 Jul 2006 18:54:39 +0200, Peter Otten wrote:

> Pierre Thibault wrote:
> 
>> Hello!
>> 
>> I am currently trying to port some C++ code to python, and I think I am stuck
>> because of the very different behavior of STL iterators vs python
>> iterators. What I need to do is simple arithmetic operations on objects
>> I don't know. In C++, the method doing that was a template, and all that
>> was required is that the template class has an iterator conforming to the
>> STL forward iterator definition. Then, the class would look like:
>> 
>> template <class H>
>> class MyClass
>> {
>> public:
>> 
>> MyClass(H& o1, H& o2) : object1(o1), object2(o2) {}
>> 
>> void compute();
>> 
>> private:
>> H& object1;
>> H& object2;
>> 
>> };
>> 
>> template <class H>
>> void MyClass<H>::compute()
>> {
>> typedef typename H::iterator I;
>> 
>> I o1_begin = object1.begin();
>> I o2_begin = object2.begin();
>> I o1_end = object1.end();
>> 
>> for(I io1 = o1_begin, io2 = o2_begin; io1 != o1_end; ++io1, ++io2)
>> {
>> // Do something with *io1 and *io2, for instance:
>> // *io1 += *io2;
>> }
>> }
>> 
>> This is all nice: any object having a forward iterator works in there.
>> 
>> Then I discovered python and wanted to use all its goodies. I thought it
>> would be easy to do the same thing but I can't: the iterator mechanism is
>> read-only, right? So it does not make sense to write:
>> 
>> io1 = iter(object1)
>> io2 = iter(object2)
>> 
>> try:
>>   while 1:
>>     io1.next() += io2.next()
>> except StopIteration:
>>   pass
>> 
>> That won't work:
>> SyntaxError: can't assign to function call
>> 
>> Here is my question: how could I do that and retain enough generality?
> 
> You need a temporary variable (out in the example below):
> 
>>>> accus = [[] for i in range(3)]
>>>> ins = ("abc" for i in range(3))
>>>> outs = iter(accus)
>>>> while 1:
> ...     out = outs.next()
> ...     out += ins.next()
> ...
> Traceback (most recent call last):
>   File "<stdin>", line 2, in <module>
> StopIteration
> 
> In idiomatic Python that becomes
> 
>>>> accus = [[] for i in range(3)]
>>>> ins = ("abc" for i in range(3))
>>>> outs = iter(accus)
>>>> from itertools import izip
>>>> for out, in_ in izip(outs, ins):
> ...     out += in_
> ...
>>>> accus
> [['a', 'b', 'c'], ['a', 'b', 'c'], ['a', 'b', 'c']]
> 
> 
> Peter

Hum, this example seems like a special case not really appropriate for my
needs. Let me make my problem a little more precise. The objects I'll want
to iterate through will always contain some floats. Very often, I guess
they will be numpy.ndarray instances, but they can be something else as
well. For instance, I will have to deal with arrays having internal
symmetries (crystallographic datasets). The most efficient thing to do
with those is save only the unique values and keep track of the symmetry
operations needed to extract other values.

A 1-D example: I have a class which represents a unit cell
which has the additional mirror symmetry in the middle of the cell. Say
C is an instance of this class representing data on a 50-long array. Only
25 values will be stored in C, and the class takes care of giving the value
of C[0] if C[49] is asked, C[1] for C[48], and so on. Since crystals are
periodic, the translational symmetry can also be managed (C[-1] == C[49]
== C[0], C[-2] == C[48] == C[1]...). In any case, the user of this object
need not know how the data is stored: for him, C is just a funny
array-like object defined from -infinity to +infinity on a 1-D lattice.

Now, I want to do simple math operations on the data in C. Doing a loop
from 0 to 49 would loop twice through the actual data. In this
context, an iterator is perfect since it can take care internally of going
through the data only once, so it would be great to access _AND MODIFY_
data over which I iterate.
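
To make the idea concrete, here is a minimal sketch of such an object. The
class name MirrorCell and the method unique_indices() are hypothetical,
invented for this illustration only. The container iterates over the indices
of the stored (unique) values, and all writes go through item assignment, so
each stored value is touched exactly once.

```python
class MirrorCell(object):
    """Hypothetical 1-D cell with a mirror plane in the middle.

    Only half of the values are stored; the rest are derived by symmetry.
    """

    def __init__(self, unique_values):
        self._data = list(unique_values)   # the unique half only
        self._n = 2 * len(self._data)      # length of the full cell

    def _reduce(self, i):
        i %= self._n                       # translational symmetry
        return min(i, self._n - 1 - i)     # mirror symmetry

    def __getitem__(self, i):
        return self._data[self._reduce(i)]

    def __setitem__(self, i, value):
        self._data[self._reduce(i)] = value

    def unique_indices(self):
        # Visit each stored value once, whatever the internal layout.
        return iter(range(len(self._data)))


c = MirrorCell([float(k) for k in range(25)])   # a 50-long cell
d = MirrorCell([1.0] * 25)

for i in c.unique_indices():
    c[i] += d[i]    # reads c[i], adds, then writes back via __setitem__
```

The user still sees C[49] == C[0] and C[-1] == C[0], but the loop performs
only 25 updates instead of 50.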

I now realize I don't know how to do that with numpy.ndarray either. It
looks like the only way I can modify the content of an array is by using
the [] notation (that is, in random access). For instance, the
following will not change the state of a:

import numpy
a = numpy.array([[1.,2.,3.],[4.,5.,6.],[7.,8.,9.]])

for c in a.flat:
    c += 2.

Of course, I know that a += 2. would be alright, but this is not general
enough for more complicated classes.
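
The reason the loop has no effect can be sketched without numpy. In the
snippet below a plain list stands in for the array (an assumption made to
keep the example self-contained). Floats are immutable, so the augmented
assignment only rebinds the loop variable; writing back through an index
does update the container. With an ndarray the same write-back should be
expressible as a.flat[i] = ..., since the flat iterator supports indexed
assignment.

```python
values = [1.0, 2.0, 3.0]

# `v += 2.0` builds a new float and rebinds the name `v`;
# the list slot still refers to the old value.
for v in values:
    v += 2.0
unchanged = list(values)    # snapshot: same contents as before the loop

# Writing back by index stores the new value in the container.
for i, v in enumerate(values):
    values[i] = v + 2.0
```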

So I guess my problem is worse than I thought.

Thanks anyway for your help!

Pierre