Big time WTF with generators - bug?

James Stroud jstroud at mbi.ucla.edu
Wed Feb 13 03:35:00 EST 2008


Hello,

I'm boggled.

I have this function which takes a keyer that keys a table (any iterable 
of rows). I filter based on these keys, then groupby based on the 
filtered keys and a keyfunc. Then, to make the resulting generator 
behave a little nicer (no requirement for the user to unpack the keys), 
I strip the keys in a generator expression that unpacks them and 
generates the k,g pairs I want ("regrouped"). I then append each 
(series name, "regrouped") pair to the growing "serieses" list 
("serieses" is the plural of series, if your vocabulary isn't that big).

Here's the function:

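from itertools import groupby, ifilter, imap, izip

# _keyer, _selector, _keyfunc and _series_keyfunc are module-level
# defaults defined elsewhere (not shown here).
def serialize(table, keyer=_keyer,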
                      selector=_selector,
                      keyfunc=_keyfunc,
                      series_keyfunc=_series_keyfunc):
   keyed = izip(imap(keyer, table), table)
   filtered = ifilter(selector, keyed)
   serialized = groupby(filtered, series_keyfunc)
   serieses = []
   for s_name, series in serialized:
     grouped = groupby(series, keyfunc)
     regrouped = ((k, (v[1] for v in g)) for (k,g) in grouped)
     serieses.append((s_name, regrouped))
   for s in serieses:
     yield s
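
To make the data flow concrete, here is a small self-contained sketch of the same pipeline. The table and the four key functions are made-up stand-ins (the real defaults are not shown in this post), and list comprehensions replace the generator expressions only so that the intermediate values can be inspected. The try/except import is there so the sketch also runs on Pythons that no longer ship izip, imap and ifilter.

```python
from itertools import groupby
try:
    from itertools import izip, imap, ifilter   # Python 2
except ImportError:
    izip, imap, ifilter = zip, map, filter      # Python 3 built-ins are lazy

# Hypothetical table: (series name, group name, value)
table = [("s1", "x", 1), ("s1", "x", 2), ("s1", "y", 3),
         ("s2", "x", 4), ("skip", "x", 5)]

# Hypothetical stand-ins for _keyer, _selector, _series_keyfunc, _keyfunc
def keyer(row):
    return (row[0], row[1])

def selector(pair):            # pair is (key, row)
    return pair[0][0] != "skip"

def series_keyfunc(pair):
    return pair[0][0]

def keyfunc(pair):
    return pair[0][1]

keyed = izip(imap(keyer, table), table)   # (key, row) pairs
filtered = ifilter(selector, keyed)       # drops the "skip" row

result = []
for s_name, series in groupby(filtered, series_keyfunc):
    # Read each inner group right away, keeping only the rows (v[1])
    groups = [(k, [v[1] for v in g]) for k, g in groupby(series, keyfunc)]
    result.append((s_name, groups))
```

With this data, `result` holds one entry per series, each with its groups and the original rows, keys stripped.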


I defined a little debugging function called iterprint:

def iterprint(thing):
   if isinstance(thing, str):
     print thing
   elif hasattr(thing, 'items'):
     print thing.items()
   else:
     try:
       for x in thing:
         iterprint(x)
     except TypeError:
       print thing

The gist is that iterprint will print any generator down to its 
non-iterable components--it works fine for my purpose here, but I 
included the code for the curious.
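
One property of iterprint that matters here: walking a generator to print it also uses it up. The sketch below shows that with a hypothetical variant, `collect_leaves`, which follows the same branching as iterprint but appends the leaves to a list instead of printing them. It is only an illustration, not the code I am actually running.

```python
def collect_leaves(thing, out=None):
    """Same traversal idea as iterprint, but collect leaves in a list."""
    if out is None:
        out = []
    if isinstance(thing, str):
        out.append(thing)
    elif hasattr(thing, 'items'):
        out.append(list(thing.items()))
    else:
        try:
            iterator = iter(thing)
        except TypeError:
            out.append(thing)          # not iterable: a leaf
        else:
            for x in iterator:
                collect_leaves(x, out)
    return out

gen = (n * n for n in range(3))
first_pass = collect_leaves(gen)    # walks the generator to the end
second_pass = collect_leaves(gen)   # same object again: nothing is left

a_list = [0, 1, 4]
list_twice = (collect_leaves(a_list), collect_leaves(a_list))  # lists can be re-read
```

So anything that iterprint has printed once will come back empty the second time if it was a generator, while plain lists print the same every time.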

When I apply iterprint in the following manner (only change is the 
iterprint line) everything looks fine and my "regrouped" generators in 
"serieses" generate what they are supposed to when iterprinting. The 
iterprint at this point shows that everything is working just the way I 
want (I can see the last item in "serieses" iterprints just fine).

def serialize(table, keyer=_keyer,
                      selector=_selector,
                      keyfunc=_keyfunc,
                      series_keyfunc=_series_keyfunc):
   keyed = izip(imap(keyer, table), table)
   filtered = ifilter(selector, keyed)
   serialized = groupby(filtered, series_keyfunc)
   serieses = []
   for s_name, series in serialized:
     grouped = groupby(series, keyfunc)
     regrouped = ((k, (v[1] for v in g)) for (k,g) in grouped)
     serieses.append((s_name, regrouped))
     iterprint(serieses)
   for s in serieses:
     yield s

Now, here's the rub. When I apply iterprint in the following manner, it 
looks like my generator ("regrouped") gets consumed (note the only 
change is a two-space dedent of the iterprint call, so the printing 
happens after the loop instead of inside it):

def serialize(table, keyer=_keyer,
                      selector=_selector,
                      keyfunc=_keyfunc,
                      series_keyfunc=_series_keyfunc):
   keyed = izip(imap(keyer, table), table)
   filtered = ifilter(selector, keyed)
   serialized = groupby(filtered, series_keyfunc)
   serieses = []
   for s_name, series in serialized:
     grouped = groupby(series, keyfunc)
     regrouped = ((k, (v[1] for v in g)) for (k,g) in grouped)
     serieses.append((s_name, regrouped))
   iterprint(serieses)
   for s in serieses:
     yield s

Now, what is consuming my "regrouped" generator when going from inside 
the loop to outside?
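
For what it's worth, the same symptom shows up in a much smaller case with no serialize and no iterprint. The itertools documentation says that each group returned by groupby is itself an iterator sharing the underlying iterable with groupby, and that the previous group is no longer visible once the groupby object is advanced. The sketch below uses made-up data and assumes that this is what is going on in my function; it is a guess at the cause, not a confirmed diagnosis.

```python
from itertools import groupby

data = [("a", 1), ("a", 2), ("b", 3), ("b", 4)]

def first_item(pair):
    return pair[0]

# Store the lazy group objects and only read them after the loop,
# the way "serieses" is read after the loop in serialize.
lazy = []
for key, group in groupby(data, first_item):
    lazy.append((key, group))

# groupby has already moved past group "a", so nothing is left in it.
stale_first = list(lazy[0][1])

# Reading each group while groupby is still positioned on it keeps the data.
eager = [(key, list(group)) for key, group in groupby(data, first_item)]
```

Here `stale_first` comes out empty while `eager` holds both groups in full. What a stale last group yields may differ between Python versions, so the sketch only looks at the first one. If this is the cause, it would also explain why printing inside the loop looks fine: at that point the newest series is still the current group.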

Thanks in advance for any clue.

py> print version
2.5.1 (r251:54869, Apr 18 2007, 22:08:04)
[GCC 4.0.1 (Apple Computer, Inc. build 5367)]

-- 
James Stroud
UCLA-DOE Institute for Genomics and Proteomics
Box 951570
Los Angeles, CA 90095

http://www.jamesstroud.com


