[stdlib-sig] futures - a new package for asynchronous execution

Brian Quinlan brian at sweetapp.com
Sat Jan 16 00:06:39 CET 2010


On 16 Jan 2010, at 00:56, Anh Hai Trinh wrote:

>> I'm not sure that I'd agree with the simpler API part though :-)
>
> I was referring to your old API. Still, we are both obviously very
> biased here :-p

For sure. I'm definitely used to looking at Future-style code so I  
find the model intuitive.

>> Does ThreadPool use some
>> sort of balancing strategy if poolsize were set to < len(URLs)?
>
> Yes, of course! Otherwise it wouldn't really qualify as a pool.
>
>> "retrieve" seems to take multiple url arguments.
>
> Correct. `retrieve` is simply a generator that retrieves URLs
> sequentially; the ThreadPool distributes the input stream so that each
> worker gets an iterator over its workload.

That's a neat idea - it saves you the per-item overhead of a function call.
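
A minimal sketch of what such an iterator-processing retrieve might
look like (assuming urllib2 and ignoring error handling; the actual
function in the streams library may differ):

import urllib2

def retrieve(urls):
    # Each worker walks its own iterator over part of the input
    # stream, yielding one (url, content) pair per URL.
    for url in urls:
        yield url, urllib2.urlopen(url).read()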

>>> If delicate job control is necessary, an Executor can be used. It is
>>> implemented on top of the pool, and offers submit(*items) which
>>> returns job ids to be used for cancel() and status().  Jobs can be
>>> submitted and canceled concurrently.
>>
>> What type is each "item" supposed to be?
>
> Whatever your iterator-processing function is supposed to process.
> The URLs example can be written using an Executor as:
>
> e = Executor(ThreadPool, retrieve)
> e.submit(*URLs)
> e.close()
> print list(e.result)
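
For concreteness, the job-control calls described above might be used
like this (hypothetical: the cancel() and status() signatures are
inferred from the description, not taken from the library's docs):

e = Executor(ThreadPool, retrieve)
job_ids = e.submit(*URLs)       # one job id per submitted item
e.cancel(job_ids[0])            # hypothetical: drop a job before it runs
print e.status(job_ids[1])      # hypothetical: poll a single job
e.close()
print list(e.result)            # results arrive as workers finish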

There are two common scenarios where I have seen Future-like things  
used:
1. Do the same operation on different data, e.g. copy some local files
to several remote servers
2. Do several different operations on different data, e.g.
parallelize code like this:

db = setup_database(host, port)
data = parse_big_xml_file(request.body)
save_data_in_db(data, db)

I'm trying to get a handle on how streams accommodates the second  
case. With futures, I would write something like this:

db_future = executor.submit(setup_database, host, port)
data_future = executor.submit(parse_big_xml_file, request.body)
# Maybe do something else here.
wait(
     [db_future, data_future],
     timeout=10,
     # If either function raises then we can't complete the operation so
     # there is no reason to make the user wait.
     return_when=FIRST_EXCEPTION)

db = db_future.result(timeout=0)
data = data_future.result(timeout=0)
save_data_in_db(data, db)
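
For completeness, here is that pattern as a self-contained, runnable
sketch (written against the concurrent.futures spelling of this API;
the three helpers are stand-ins, not real implementations):

from concurrent.futures import ThreadPoolExecutor, wait, FIRST_EXCEPTION

def setup_database(host, port):
    # Stand-in: pretend to open a connection.
    return {'host': host, 'port': port}

def parse_big_xml_file(body):
    # Stand-in: pretend to parse.
    return body.strip()

def save_data_in_db(data, db):
    print('saving %r via %r' % (data, db))

executor = ThreadPoolExecutor(max_workers=2)
db_future = executor.submit(setup_database, 'localhost', 5432)
data_future = executor.submit(parse_big_xml_file, '<xml>hello</xml>')

# Block until both finish, until 10s pass, or until either one raises.
done, not_done = wait([db_future, data_future],
                      timeout=10,
                      return_when=FIRST_EXCEPTION)

# result(timeout=0) returns immediately, raising if the future isn't
# done yet or if its function raised.
save_data_in_db(data_future.result(timeout=0),
                db_future.result(timeout=0))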

Cheers,
Brian

>
>> Can I wait on several items?
>
> Do you mean wait for several particular input values to be completed?
> As of this moment, yes, but rather inefficiently. I have not
> considered it a useful feature, especially when taking a wholesale,
> list-processing view: a worker pool processes its input stream
> _out_of_order_. If you just want to wait for several particular
> items, it means you need their outputs _in_order_, so why would you
> want to use a worker pool in the first place?
>
> However, I'd be happy to implement something like
> Executor.submit(*items, wait=True).
>
> Cheers,
> aht
> _______________________________________________
> stdlib-sig mailing list
> stdlib-sig at python.org
> http://mail.python.org/mailman/listinfo/stdlib-sig


