Turning f(callback) into a generator
Peter Otten
__peter__ at web.de
Thu Dec 4 07:55:53 EST 2003
Jimmy Retzlaff wrote:
> Peter Otten wrote:
[...]
>> However, I did not succeed in turning the old os.path.walk(), i.e. a
>> function taking a callback, into a generator. Is there a general way to do
>> it without having to store all intermediate results first?
>>
>>
>> Peter
>>
>> PS. No, I don't have a use case. Threads welcome if all else fails :-)
>
> I do have a use case. I've wrapped a C library for extracting files from
> compressed archives (http://www.averdevelopment.com/python/). The C
> library uses a callback, but exposing a generator in Python is much more
> natural in my case. Of course, if you can change the code making the
> callbacks you can typically just replace the call to the callback
> function with a yield statement, but that doesn't work if you don't have
> access to the source (or don't want the source changed).
>
> The reason I created my library is that my Python program reads in,
> line-by-line, 100GB of data that has been compressed into a 2GB archive.
> So, obviously I can't read it all in and then start yielding and I
> really don't want to expand it to a disk file and then read it in, I
> want to work on it as it is decompressed.
>
> My solution used threads. I'd love to know if someone comes up with a
> solution not involving threads. Here's a simple example of my
> thread-based approach:
[Nice example of a thread-based generator, thanks]
So far, no one has come up with a solution that avoids threads, short of
Bengt Richter's ideas on changing the language, that is.
Below is my attempt at generalizing your code to turn any visit(callback)
into a generator (not tested beyond the __main__ stuff):

import Queue
import threading

# mark the position of the callback function in the argument list of visit()
CALLBACK = object()

def cbwrapper(visit, queuesize, *args, **kwd):
    """Wrap visit(..., callback, ...) as a generator using threads"""
    queue = Queue.Queue(maxsize=queuesize)
    sentinel = object()

    # XXX should keyword args be allowed?
    # XXX should the len(args) == 1 case be tweaked to
    # return args[0] instead of a tuple?
    def callback(*args):
        queue.put(args)

    # replace the CALLBACK placeholder with the actual callback function
    args = list(args)
    try:
        args[args.index(CALLBACK)] = callback
    except ValueError:
        for key, value in kwd.iteritems():
            if value is CALLBACK:
                kwd[key] = callback
                break
        else:
            # XXX raise Exception("Don't know where to put callback")
            # default to last positional for now
            args.append(callback)

    def launcher():
        # XXX what if visit throws an exception?
        try:
            visit(*args, **kwd)
        except Exception, e:
            queue.put(e)
        else:
            queue.put(sentinel)

    threading.Thread(target=launcher).start()

    while True:
        value = queue.get()
        if value is sentinel:
            break
        elif isinstance(value, Exception):
            raise value
        yield value

if __name__ == "__main__":
    def counter(length, callback):
        for i in range(length):
            callback(i)

    def multiply(first, callback, second):
        for i in first:
            for k in second:
                callback(i, k, i*k)

    def divide(first, second, callback):
        for i in first:
            for k in second:
                callback(i, k, i/k)

    for i, in cbwrapper(counter, 1, 10):
        print i,
    print

    for i, k, ik in cbwrapper(multiply, 1, range(3), CALLBACK, range(2)):
        print "%d * %d = %d" % (i, k, ik)

    for i, k, ik in cbwrapper(divide, 1, range(3), range(2)[::-1]):
        print "%d / %d = %d" % (i, k, ik)

One question came up while writing the above: how would you handle
exceptions in the thread? I put them in the queue and re-raise them in the
main thread, but I'm not feeling very confident about it...
Peter
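
For readers on a current interpreter, here is a minimal Python 3 sketch of the exception question raised above. It is not the code from the post, only one possible way to do it. It still forwards the exception through the queue, but wraps it in a small marker class so that it cannot be confused with an ordinary value. The names iter_callbacks, _Failure and DONE are hypothetical, and the sketch assumes the callback is always the last positional argument of visit(). In Python 3 the exception object carries its own traceback, so re-raising it in the consumer should still show where the worker thread failed.

```python
import queue
import threading


class _Failure:
    """Hypothetical marker: wraps an exception raised in the worker thread."""

    def __init__(self, exc):
        self.exc = exc


def iter_callbacks(visit, *args, maxsize=1):
    """Yield the argument tuples that visit(*args, callback) passes to callback.

    Assumption: the callback is the last positional argument of visit().
    """
    q = queue.Queue(maxsize=maxsize)
    DONE = object()  # signals that visit() returned normally

    def callback(*cb_args):
        q.put(cb_args)

    def launcher():
        try:
            visit(*args, callback)
        except BaseException as exc:
            # Forward the failure so the consumer does not wait forever.
            q.put(_Failure(exc))
        else:
            q.put(DONE)

    # daemon=True so an abandoned generator does not keep the process alive
    threading.Thread(target=launcher, daemon=True).start()

    while True:
        item = q.get()
        if item is DONE:
            return
        if isinstance(item, _Failure):
            raise item.exc  # re-raise in the consuming thread
        yield item


def divide(first, second, callback):
    for i in first:
        for k in second:
            callback(i, k, i / k)


results = []
caught = None
try:
    for item in iter_callbacks(divide, range(3), [1, 0]):
        results.append(item)
except ZeroDivisionError as exc:
    caught = exc

print(results)                # [(0, 1, 0.0)]
print(type(caught).__name__)  # ZeroDivisionError
```

The values produced before the failure are delivered first, and the exception then surfaces at the point where the consumer asks for the next item, which is roughly what an ordinary generator would do. Whether catching BaseException rather than Exception is the right choice is a judgment call; it is used here only so that the consumer never blocks on a worker that has already died.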
More information about the Python-list
mailing list