Turning f(callback) into a generator

Peter Otten __peter__ at web.de
Thu Dec 4 07:55:53 EST 2003


Jimmy Retzlaff wrote:

> Peter Otten wrote:

[...]

>> However, I did not succeed in turning the old os.path.walk(), i.e. a
>> function taking a callback, into a generator. Is there a general way to do
>> it without having to store all intermediate results first?
>> 
>> 
>> Peter
>> 
>> PS. No, I don't have a use case. Threads welcome if all else fails :-)
> 
> I do have a use case. I've wrapped a C library for extracting files from
> compressed archives (http://www.averdevelopment.com/python/). The C
> library uses a callback, but exposing a generator in Python is much more
> natural in my case. Of course, if you can change the code making the
> callbacks you can typically just replace the call to the callback
> function with a yield statement, but that doesn't work if you don't have
> access to the source (or don't want the source changed).
> 
> The reason I created my library is that my Python program reads in,
> line-by-line, 100GB of data that has been compressed into a 2GB archive.
> So, obviously I can't read it all in and then start yielding and I
> really don't want to expand it to a disk file and then read it in, I
> want to work on it as it is decompressed.
> 
> My solution used threads. I'd love to know if someone comes up with a
> solution not involving threads. Here's a simple example of my
> thread-based approach:

[Nice example of a thread-based generator, thanks]

So far, no one has come up with a solution that avoids threads, short of
Bengt Richter's ideas on changing the language.

Below is my attempt at generalizing your code to turn any visit(callback)
into a generator (not tested beyond the __main__ stuff):

import Queue
import threading

# mark the position of the callback function in the argument list of visit()
CALLBACK = object()

def cbwrapper(visit, queuesize, *args, **kwd):
    """Wrap visit(..., callback, ...) as a generator using threads"""

    queue = Queue.Queue(maxsize=queuesize)

    sentinel = object()

    # XXX should keyword args be allowed?
    # XXX should the len(args) == 1 case be tweaked to
    #     return args[0] instead of a tuple?
    def callback(*args):
        queue.put(args)

    # replace the CALLBACK placeholder with the actual callback function
    args = list(args)
    try:
        args[args.index(CALLBACK)] = callback
    except ValueError:
        for key, value in kwd.iteritems():
            if value is CALLBACK:
                kwd[key] = callback
                break
        else:
            # XXX raise Exception("Don't know where to put callback")
            # default to last positional for now
            args.append(callback)

    def launcher():
        # XXX what if visit throws an exception?
        try:
            visit(*args, **kwd)
        except Exception, e:
            queue.put(e)
        else:
            queue.put(sentinel)

    threading.Thread(target=launcher).start()

    while True:
        value = queue.get()
        if value is sentinel:
            break
        elif isinstance(value, Exception):
            raise value
        yield value

if __name__ == "__main__":

    def counter(length, callback):
        for i in range(length):
            callback(i)
    def multiply(first, callback, second):
        for i in first:
            for k in second:
                callback(i, k, i*k)
    def divide(first, second, callback):
        for i in first:
            for k in second:
                callback(i, k, i/k)

    for i, in cbwrapper(counter, 1, 10):
        print i,
    print
    for i, k, ik in cbwrapper(multiply, 1, range(3), CALLBACK, range(2)):
        print "%d * %d = %d" % (i, k, ik)
    for i, k, ik in cbwrapper(divide, 1, range(3), range(2)[::-1]):
        print "%d / %d = %d" % (i, k, ik)

One question came up while writing the above: how would you handle
exceptions in the thread? I put them in the queue and re-raise them in the
main thread, but I'm not feeling very confident about it...
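
To make the queue-and-re-raise idea concrete, here is a minimal sketch of
it on its own. It is written for Python 3 rather than the Python 2 used
above; that is an assumption, chosen because a Python 3 exception object
carries its traceback with it, so re-raising it in the consumer still shows
where it happened in the worker thread. The names threaded_iter and divide
are hypothetical, and the callback is assumed to be the last positional
argument of visit():

```python
import queue
import threading


def threaded_iter(visit, *args, queuesize=1):
    """Yield the argument tuples that visit(*args, callback) passes to callback.

    Hypothetical helper: the callback is always appended as the last
    positional argument.
    """
    q = queue.Queue(maxsize=queuesize)
    done = object()  # unique end-of-stream marker

    def callback(*values):
        # Always a tuple, so it can never be mistaken for an exception below.
        q.put(values)

    def launcher():
        try:
            visit(*(args + (callback,)))
        except Exception as exc:
            # Hand the exception object to the consumer instead of losing it.
            q.put(exc)
        else:
            q.put(done)

    # daemon=True so a worker blocked on a full queue cannot keep the
    # interpreter alive if the consumer stops iterating early.
    threading.Thread(target=launcher, daemon=True).start()

    while True:
        item = q.get()
        if item is done:
            return
        if isinstance(item, Exception):
            raise item  # re-raised in the consuming thread
        yield item


def divide(first, second, callback):
    for i in first:
        for k in second:
            callback(i, k, i // k)


if __name__ == "__main__":
    try:
        for i, k, ik in threaded_iter(divide, range(3), [1, 0]):
            print("%d / %d = %d" % (i, k, ik))
    except ZeroDivisionError as exc:
        print("worker failed:", exc)
```

Results produced before the failure are still delivered in order, and the
exception only surfaces once the consumer reaches it in the queue. Only
Exception subclasses are forwarded here; anything else raised in the worker
(KeyboardInterrupt, say) would leave the consumer waiting on the queue.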

Peter



