Turning f(callback) into a generator

Jimmy Retzlaff jimmy at retzlaff.com
Wed Dec 3 21:07:08 EST 2003


Peter Otten wrote:
> It's easy to write a function that wraps a generator and provides a
> callback. E. g.:
> 
> import os, sys
> 
> def walk(path, visit):
>     """ Emulate os.path.walk() (simplified) using os.walk()"""
>     for dir, folders, files in os.walk(path):
>         visit(dir, folders + files)
> 
> 
> if __name__ == "__main__":
>     walk(".", lambda d, f: sys.stdout.write(d + "\n"))
> 
> 
> However, I did not succeed in turning the old os.path.walk(), i. e. a
> function taking a callback, into a generator. Is there a general way
to do
> it without having to store all intermediate results first?
> 
> 
> Peter
> 
> PS. No, I don't have a use case. Threads welcome if all else fails :-)

I do have a use case. I've wrapped a C library for extracting files from
compressed archives (http://www.averdevelopment.com/python/). The C
library uses a callback, but exposing a generator in Python is much more
natural in my case. Of course, if you can change the code making the
callbacks you can typically just replace the call to the callback
function with a yield statement, but that doesn't work if you don't have
access to the source (or don't want the source changed).

The reason I created my library is that my Python program reads in,
line-by-line, 100GB of data that has been compressed into a 2GB archive.
So, obviously I can't read it all in and then start yielding and I
really don't want to expand it to a disk file and then read it in, I
want to work on it as it is decompressed.

My solution used threads. I'd love to know if someone comes up with a
solution not involving threads. Here's a simple example of my
thread-based approach:


import Queue
import threading

def counter(length, callback):
    for i in range(length):
        callback(i)

def counterWrapper(length):
    """Wrap counter(...) as a generator using threads"""

    # Increase maxsize to allow more read-ahead. A value less
    # than 1 would allow arbitray read-ahead.
    queue = Queue.Queue(maxsize=1)

    # More bookkeeping would make a sentinel unnecessary, but if
    # you know of a value that can't be validly passed into the
    # callback, then it's simpler to just use a sentinel.
    sentinel = None

    def callback(i):
        queue.put(i)

    def launcher():
        counter(length, callback)
        queue.put(sentinel)

    threading.Thread(target=launcher).start()

    while True:
        value = queue.get()
        if value != sentinel:
            yield value
        else:
            break

if __name__ == "__main__":
    # callback
    def callback(i):
        print i
    counter(10, callback)

    print

    # generator
    for i in counterWrapper(10):
        print i


Jimmy






More information about the Python-list mailing list