Generator expressions v/s list comprehensions

Sun Aug 29 00:30:52 EDT 2004

>>>>> "Mahesh" == Mahesh Padmanabhan <mahesh at privacy.net> writes:

    Mahesh> Is returning a list really a limitation considering that
    Mahesh> lists can be transformed quite easily?

Yes.  (1) You waste memory, and (2) you can't use any element until
the list comprehension's hidden loop has finished evaluating all of
them.  Let's look at some examples.

You waste memory because you have to generate a whole list, which
may be unnecessary if your processing never needs all of it at once.
It is the same as the difference between range() and xrange().  E.g.,
you might write

  for a in [x*x for x in range(100000)]:
      print a

Here you have to wait until the whole 100k-element list has been
generated, and only then do you start printing x*x for each of the
values.  The middle ground

  for a in [x*x for x in xrange(100000)]:
      print a

saves about half the memory, but still needs to generate the
100k-element list, and you still have to wait a long time before the
first result is printed.  Once you have generator expressions, you
can say

  for a in (x*x for x in xrange(100000)):
      print a

and it will do the same thing, except that you don't need to wait at
the beginning, and at no point is there a 100000-element list
sitting in memory.
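
The timing difference can be checked with a small experiment.  What
follows is only a sketch, written for Python 3 (where range() is
already lazy like xrange()); the names square, computed, eager_seen
and lazy_seen are made up for the illustration and are not from the
examples above.  It records how many squares have been computed by
the time the loop body runs for the first time.

```python
# Python 3 sketch; range() here is lazy, like xrange() in the post.
computed = []          # hypothetical log of which inputs have been squared

def square(x):
    computed.append(x)
    return x * x

# List comprehension: every square is computed before the loop body runs.
for a in [square(x) for x in range(5)]:
    eager_seen = len(computed)   # squares computed at the first iteration
    break

computed.clear()

# Generator expression: only the first square exists at the first iteration.
for a in (square(x) for x in range(5)):
    lazy_seen = len(computed)
    break

print(eager_seen, lazy_seen)   # prints: 5 1
```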

[N.B.: Of course, in this simple example you'll instead write

  for x in xrange(100000):
      print x*x

 but sometimes things are not as easy, e.g., what would you do if you
 had to pass the abstraction "x*x for x in xrange(100000)" into a
 function as a function argument?]
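
As a rough illustration of passing such an abstraction around, here
is a Python 3 sketch.  The function first_above is hypothetical, made
up only for this example; sum() is the usual builtin.  Because the
argument is a generator expression, the function can stop early
without the remaining squares ever being computed.

```python
def first_above(values, limit):
    """Return the first item of `values` greater than `limit`, or None."""
    for v in values:
        if v > limit:
            return v
    return None

# The squares are produced one at a time; iteration stops at 64.
result = first_above((x * x for x in range(100000)), 50)
print(result)                          # prints: 64

# Builtins that accept any iterable take the same kind of argument.
total = sum(x * x for x in range(10))
print(total)                           # prints: 285
```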

Now let's turn to the second point: with a list comprehension, each
element is evaluated after the previous elements have been evaluated,
but before any of the previous elements has been used.  At times
this makes things hard or even impossible to achieve (and then
you have to throw up your hands and write a generator function
instead).  E.g., suppose you have this:

  import time
  hash = {}
  def process1(x):
      for i in xrange(1, 11):
          hash[x * i] = 1
  def process2(x):
      for i in xrange(2, 12):
          hash[x * i] = 1
  if int(time.time()) % 2:
      process = process1
  else:
      process = process2

Now you have a loop that you want to rewrite more neatly:

  for x in xrange(1000):
      if not hash.has_key(x):
          process(x * x)

in such a way that the values to loop over can be supplied from
outside rather than being hard-coded in the loop.  Intuitively you'd
like to write a list comprehension to do that.  So you'd like to write

  for y in [x*x for x in xrange(1000) if not hash.has_key(x)]:
      process(y)

and let others pass the list into the function.  But this calls
hash.has_key(x) for every x before any process(y) has been called, so
the filter never sees the keys that process() adds and it breaks
completely.  With a generator expression, you write:

  for y in (x*x for x in xrange(1000) if not hash.has_key(x)):
      process(y)

which does the trick.  Note that now "x*x for x in xrange(1000) if not
hash.has_key(x)" is an object, and you can move it out of the
function and ask somebody else to pass it to you, which you can't do
with the original for loop.  I.e., now you can say

  def func(gen):
      for x in gen:
          process(x)

and let somebody call

  func(x*x for x in xrange(1000) if not hash.has_key(x))

Without generator expressions, you must write a generator function to
achieve this.  So generator expressions help you write simple
generators.
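
To see the second point in action, here is a Python 3 sketch that
runs the same loop three ways: with a list comprehension, with a
generator expression, and with a hand-written generator function.
It is only an illustration under some assumptions: the dict is named
seen instead of hash (to avoid shadowing the builtin), "x not in
seen" stands in for has_key(), process is fixed to the process1
variant, and the helpers run, eager, lazy and lazy_genfunc are names
invented for the example.

```python
def run(make_values):
    """Run the loop with a fresh dict; return how often process() ran."""
    seen = {}
    calls = 0

    def process(v):                 # same body as process1 in the post
        for i in range(1, 11):
            seen[v * i] = 1

    for y in make_values(seen):
        process(y)
        calls += 1
    return calls

def eager(seen):
    # The filter runs for all x before any process() call, so `seen` is empty.
    return [x * x for x in range(1000) if x not in seen]

def lazy(seen):
    # The filter for each x runs after process() was called on earlier values.
    return (x * x for x in range(1000) if x not in seen)

def lazy_genfunc(seen):
    # What you would have to write without generator expressions.
    for x in range(1000):
        if x not in seen:
            yield x * x

eager_calls = run(eager)
lazy_calls = run(lazy)
genfunc_calls = run(lazy_genfunc)

# The list comprehension never skips anything; the two lazy forms agree.
print(eager_calls, lazy_calls == genfunc_calls, lazy_calls < eager_calls)
# prints: 1000 True True
```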

Regards,
Isaac.