Iteration and large lists (was: Re: [Edu-sig] How to Improve code performance)

Kirby Urner urnerk@qwest.net
Tue, 30 Oct 2001 13:07:13 -0800


>
>Anyway, the general point was about resource requirements for
>different iteration idioms.  Recursion requires memory, iteration over
>a sequence requires memory if you have to build the sequence just for
>that purpose, and iteration with while _doesn't_ require more than a
>constant amount of memory (well, maybe if you get up into long
>integers).  If we rewrote code to use a while instead of a "for i in
>range", our code's memory profile could be much different.

Seth makes good points about range() pre-allocating potentially
large chunks of memory.  Historically, xrange() was provided to
allow a "lazy evaluation" approach -- instead of pre-allocating,
it would do more like Seth suggests, and compute the next "list"
element with each iteration.

Since Python 2.2, we have the yield keyword, which, when used
in a function definition, makes that function a "generator".
To 'yield' a value is akin to using a print statement, except
the argument is returned in a usable form (ala 'return') --
which ties this thread to an earlier one re 'return' vs.
'print' semantics.

When a function encounters a 'yield', execution stops and the
calling function gets the yielded falue.  When the function
is next triggered using .next(), or is iterated over as an
object, as in 'for i in myfunction()', then execution picks
up right after the 'yield', as if "nothing had happened"
(i.e. there's no more disruption to local variables than if
a 'print' had been used -- the function behaves as if that's
what had happened).

So xrange() is being phased out, given 'yield' provides a
much more generic approach to lazy evaluation (means a
'just in time' approach to value generation, where you
get a next value just when you need it, without a lot of
pre-allocation of values -- which you might never get to,
if the loop has a break in it, for example).

To make this more concrete, here's a lazyrange() function
which behaves somewhat like range (lots of improvements
and variations possible):

   >>> from __future__ import generators  # make feature available

   >>> def lazyrange(initial,final,inc):
           while 1:         # infinite loop
               if inc < 0:  # break condition
                  if initial <= final:  return
               else:        # alternate break condition
                 if initial >= final:  return
          yield initial
           initial += inc # increment using inc

   >>> for i in lazyrange(10,1,-1):  print i,

   10 9 8 7 6 5 4 3 2

You see above that by containing a yield, lazyrange becomes
a generator, and therefore a kind of iterable object (something
you can iterate over using 'for').

Note this alternative syntax:

   >>> f = lazyrange(1,10,1)  # instantiate function object

   >>> type(f)  # note f's type
   <type 'generator'>

   >>> dir(f)   # note builtin generator method next()
   ['__class__', '__delattr__', '__getattribute__', '__hash__',
   '__init__',   '__iter__', '__new__', '__reduce__', '__repr__',
   '__setattr__', '__str__', 'gi_frame', 'gi_running', 'next']

   >>> for i in f: print i,

   1 2 3 4 5 6 7 8 9

Kirby