dealing with infinite generators

Avi Gross avigross at verizon.net
Sun Dec 2 19:18:51 EST 2018


[SPECULATION ALERT]

I found it interesting as people discussed how one gets the length of
something set up to follow the iterator protocol and especially anything
that is effectively infinite.

It is possible in python to set a value of "inf" using methods like this:

>>> x = float("inf")
>>> x
inf
>>> x+x
inf
>>> x-x
nan
>>> x*x
inf
>>> x**x
inf

There is also a "-inf" but lengths somehow stubbornly seem to remain
non-negative.

So if you made an object that iterates say prime numbers in sequence or
powers of 2 or anything known to be infinite mathematically, if not
practically, and wanted to know the length of the object you could have a
method like this in the class definition:

 def __len__(self): return float("inf")  # len() itself raises TypeError

There would be no need to count the contents. As stipulated, the count would
take an infinite amount of time, give or take epsilon.
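
To see what actually happens with such a method, here is a minimal sketch.
PowersOfTwo is a made-up class for illustration only. As far as I can tell,
the built-in len() insists that __len__() return a non-negative integer, so
the float infinity only comes back if the method is called directly:

```python
class PowersOfTwo:
    """Hypothetical endless iterator over 1, 2, 4, 8, ..."""

    def __init__(self):
        self.value = 1

    def __iter__(self):
        return self

    def __next__(self):
        current = self.value
        self.value *= 2
        return current

    def __len__(self):
        # The speculative idea: report an infinite length.
        return float("inf")


powers = PowersOfTwo()
print(powers.__len__())      # calling the method directly gives inf

try:
    len(powers)              # the built-in wants an integer
except TypeError as exc:
    len_error = exc
    print("len() refused:", exc)
```

So the idea would need either a change to len() or a separate method name
to carry the "infinite" answer.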

Similarly, if you had some iterator like range(1,1000000,2) that is large
but not infinite, you could set it up so it calculates how many potential
entries it might have when created using simple modular arithmetic, then
save that in a variable. The __next__() method could decrement that variable
and the __len__() method would return that variable. Again, no need to
count.
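
A rough sketch of that counting idea follows. CountedRange is a name I made
up, and I only handle a positive step to keep the arithmetic short:

```python
class CountedRange:
    """Hypothetical iterator that knows how many values it has left."""

    def __init__(self, start, stop, step=1):
        if step <= 0:
            raise ValueError("this sketch only handles a positive step")
        self.current = start
        self.step = step
        # Count of values start, start+step, ... that stay below stop.
        self.remaining = max(0, (stop - start + step - 1) // step)

    def __iter__(self):
        return self

    def __next__(self):
        if self.remaining == 0:
            raise StopIteration
        value = self.current
        self.current += self.step
        self.remaining -= 1      # one fewer value left to hand out
        return value

    def __len__(self):
        return self.remaining    # no counting needed, just report


odds = CountedRange(1, 1000000, 2)
print(len(odds))    # 500000
next(odds)
next(odds)
print(len(odds))    # 499998
```

One side effect worth knowing: since an object with __len__() is falsy when
its length is 0, an exhausted CountedRange tests as False.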

Now there may well be iterables for which this is not simple or even doable.
If the iterable starts with a list of strings containing all the words in a
book in order and returns one unique word each time by popping the first off
the list then removing any copies, then the length is not known unless you
remove all duplicates at the time you instantiate the instance. But that
sort of reduces it to a list of unique strings and if the point is to quit
iterating long before the list empties, you already did all the work.
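
To illustrate the lazy version, here is a small sketch. Note it is not the
pop-and-remove approach described above; I swapped in a set of words already
seen, which gives the same sequence of unique words. The function name and
the sample sentence are made up:

```python
def unique_words(words):
    """Yield each distinct word once, in order of first appearance."""
    seen = set()
    for word in words:
        if word not in seen:
            seen.add(word)
            yield word


book = "the cat and the hat and the bat".split()
words = unique_words(book)

# Take just two words; the rest of the book is never examined.
first_two = [next(words), next(words)]
print(first_two)    # ['the', 'cat']
```

The generator has no idea how many unique words remain, and it cannot know
without reading the whole book.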

Realistically, much python code that looks like:
  for f in iterable:

often does not need to know the length ahead of time.

The other related topic was how to deal with an unpacking like this:

a, b, *c = SOMETHING

If something may be an infinite iterator, or perhaps just a large one, would
you want c instantiated to the rest all at once?

One idea is to not get c back as a list but rather as a (modified) iterator.
This iterator can be expanded or used later in the obvious ways, or simply
ignored.
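
For what it is worth, this much can already be done by hand today without
any new syntax. A minimal sketch, using itertools.count() as a stand-in for
an endless SOMETHING:

```python
import itertools

source = itertools.count(1, 2)   # endless odd numbers: 1, 3, 5, ...

it = iter(source)
a = next(it)
b = next(it)
c = it                           # the rest, still unevaluated

print(a, b)                            # 1 3
print(list(itertools.islice(c, 3)))    # [5, 7, 9]
```

Nothing past the values actually requested is ever computed, so c can be
expanded later or simply ignored.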

In newer versions of python the yield expression in a generator can accept a
value sent by the calling routine via send(). I can imagine a version of an
iterator that is called for the first time and returns a value; then, some
time later, one of the next() calls would include a message saying that this
call should return not a value but another iterator that continues from
there. I mean if you asked for:

a, b, *c = range(1,1000, 2)

a would get 1
b would get 3
c would not get [5, 7, 9 ... 999] as it does now.
c would get something like this:

>>> repr(range(5,1000, 2))
'range(5, 1000, 2)'
>>> type(range(5,1000, 2))
<class 'range'>

Or perhaps get back the results of iter(range(...)) instead.
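
Here is one way the message-passing idea might look with send(). This is
only a sketch: the generator name odds and the "rest" message are my own
inventions, not anything python does for you:

```python
def odds(start, stop, step=2):
    """Hypothetical generator: send("rest") asks for the remainder."""
    current = start
    while current < stop:
        message = yield current
        current += step
        if message == "rest":
            # Hand back one object covering everything not yet
            # produced, then finish.
            yield range(current, stop, step)
            return


gen = odds(1, 1000)
a = next(gen)            # ordinary call, gets a value
b = next(gen)            # ordinary call, gets a value
c = gen.send("rest")     # special call, gets the remainder instead

print(a, b, c)           # 1 3 range(5, 1000, 2)
```

An unpacking operation that worked this way would need to know the
convention, which is the hard part.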

How would the unpacking operation know what you wanted? My guess is there
are multiple possible ways, with someone complaining no matter which you
choose. But since * and ** are taken to mean unpack into a list or dict, I
wonder if *** is available as in:

a, b, ***c = range(1,1000, 2)

And there are often other solutions possible like making the unpacking
happen with a function call that takes an extra argument as a flag instead
of the above method, or just setting some variable before and/or after that
line to control how greedy the match is, sort of.
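
The function-call variant is easy to sketch with what exists now. The name
unpack and its lazy flag are hypothetical, purely for illustration:

```python
from itertools import islice


def unpack(iterable, count, lazy=True):
    """Return the first `count` items plus the rest.

    With lazy=True the rest is an unevaluated iterator; with
    lazy=False it is a list, as with the usual *c unpacking.
    """
    it = iter(iterable)
    head = list(islice(it, count))
    if len(head) < count:
        raise ValueError(
            f"expected at least {count} values, got {len(head)}")
    rest = it if lazy else list(it)
    return (*head, rest)


a, b, c = unpack(range(1, 1000, 2), 2)
print(a, b, next(c))     # 1 3 5

x, y, z = unpack(range(1, 10, 2), 2, lazy=False)
print(z)                 # [5, 7, 9]
```

The flag plays the role of the greediness switch: the caller decides whether
the tail gets expanded right away.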

Clearly many existing iterators may not be easy to set up this way. The
range() object is particularly easy to adapt, though, since it keeps track
of its start, stop and step, as shown below:

>>> initial = range(1,10,2)
>>> initial.start, initial.stop, initial.step
(1, 10, 2)

So if asked to return a new iterating object, it can compute internally what
the next result would have been and, if that is say 7, return
range(7,10,2) and presumably exit.
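
That computation is short enough to sketch from the outside. The helper
rest_of is a made-up name; it assumes you know how many values have already
been consumed:

```python
def rest_of(r, consumed):
    """Return a range covering what is left of r after `consumed` values."""
    consumed = min(consumed, len(r))    # never step past the end
    return range(r.start + consumed * r.step, r.stop, r.step)


initial = range(1, 10, 2)
print(rest_of(initial, 3))          # range(7, 10, 2)
print(list(rest_of(initial, 3)))    # [7, 9]
```

Slicing a range, as in initial[3:], also produces a range holding the same
remaining values, so the building blocks are already there.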

Just some speculation. Adding new features can have odd side effects if you
go to extremes like infinities.




