[stdlib-sig] futures - a new package for asynchronous execution

Brian Quinlan brian at sweetapp.com
Tue Feb 23 09:31:29 CET 2010


On Feb 22, 2010, at 2:37 PM, Jeffrey Yasskin wrote:

> Where's the current version of the PEP?

http://code.google.com/p/pythonfutures/source/browse/branches/feedback/PEP.txt

> On Sun, Feb 21, 2010 at 1:47 AM, Brian Quinlan <brian at sweetapp.com>  
> wrote:
>>
>> On 21 Feb 2010, at 14:41, Jeffrey Yasskin wrote:
>>
>>> Several comments:
>>>
>>> * I see you using the Executors as context managers, but no
>>> mention in the specification about what that does.
>>
>> I can't see such documentation for built-in Python objects. To be
>> symmetrical with the built-in file object, I've documented the
>> context manager behavior as part of the Executor.shutdown method.
>
> For locks, it has its own section:
> http://docs.python.org/library/threading.html#using-locks-conditions-and-semaphores-in-the-with-statement
> But I don't care too much about the formatting as long as the PEP
> specifies it clearly.

Added.
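
To make the intended behavior concrete, the usage reads like this (a
sketch; it assumes __exit__ calls shutdown(wait=True), per the next
point, and that the package is imported as plain "futures"):

from futures import ThreadPoolExecutor

with ThreadPoolExecutor(max_workers=4) as executor:
    future = executor.submit(pow, 2, 10)
# Leaving the with-block is equivalent to executor.shutdown(wait=True),
# so the submitted call is guaranteed to have completed here.
print(future.result())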

>>> You need to specify it. (Your
>>> current implementation doesn't wait in __exit__, which I think is
>>> the opposite of what you agreed with Antoine, but you can fix that
>>> after we get general agreement on the interface.)
>>
>> Fixed.
>>
>>> * I'd like users to be able to write Executors besides the simple
>>> ThreadPoolExecutor and ProcessPoolExecutor you already have. To
>>> enable that, could you document what the subclassing interface for
>>> Executor looks like? that is, what code do user-written Executors
>>> need to include?
>>
>> I can do that.
>>
>>> I don't think it should include direct access to
>>> future._state like ThreadPoolExecutor uses, if at all possible.
>>
>> Would it be reasonable to make Future an ABC, make a _Future that
>> subclasses it for internal usage, and let other Executor subclasses
>> define their own Futures?
>
> What interface are you proposing for the Future ABC? It'll need to
> support wait() and as_completed() from non-library Futures. I wouldn't
> mind making the type just a duck-type (it probably wouldn't even need
> an ABC), although I'd like to give people trying to implement their
> own Executors as much help as possible. I'd assumed that giving Future
> some public hooks would be easier than fixing the wait() interface,
> but I could be wrong.

See below.

>>> * Could you specify in what circumstances a pure computational
>>> Future-based program may deadlock? (Ideally, that would be "never".)
>>> Your current implementation includes two such deadlocks, for which
>>> I've attached a test.
>>
>>> * Do you want to make calling Executor.shutdown(wait=True) from
>>> within the same Executor 1) detect the problem and raise an
>>> exception, 2) deadlock, 3) unspecified behavior, or 4) wait for all
>>> other threads and then let the current one continue?
>>
>> What about a note saying that using any futures functions or methods
>> from inside a scheduled call is likely to lead to deadlock unless
>> care is taken?
>
> Jesse pointed out that one of the first things people try to do when
> using concurrency libraries is to try to use them inside themselves.
> I've also tried to use a futures library that forbade nested use
> ('cause I wrote it), and it was a real pain.

You can use the API from within Executor-invoked functions - you just  
have to be careful.
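For example, this works with two workers but would deadlock with one,
because the outer call occupies a worker while it waits for the inner
one (a sketch; the plain "futures" import is an assumption about the
final package name):

from futures import ThreadPoolExecutor

def outer(executor):
    inner = executor.submit(pow, 2, 10)
    # Blocking here is only safe if another worker is free to run the
    # inner call; with max_workers=1 this would deadlock because the
    # only worker is busy running outer() itself.
    return inner.result()

executor = ThreadPoolExecutor(max_workers=2)
print(executor.submit(outer, executor).result())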

> It should be easy enough to detect that the caller of
> Executor.shutdown is one of the Executor's threads or processes, but I
> wouldn't mind making the obviously incorrect "wait for my own
> completion" deadlock or throw an exception, and it would make sense to
> give Executor implementors their choice of which to do.
>
>>> * This is a nit, but I think that the parameter names for
>>> ThreadPoolExecutor and ProcessPoolExecutor should be the same so
>>> people can parametrize their code on those constructors. Right now
>>> they're "max_threads" and "max_processes", respectively. I might
>>> suggest "max_workers".
>>
>> I'm not sure that I like that. In general consolidating the
>> constructors for executors is not going to be possible.
>
> In general, yes, but in this case they're the same, and we should try
> to avoid gratuitous differences.

max_threads and max_processes are more explicit than max_workers, but I
don't really care, so I changed it.
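The upside is that code can now be parametrized over the executor
class, e.g. (a sketch; run_all is just an illustration):

from futures import ProcessPoolExecutor, ThreadPoolExecutor

def run_all(executor_class, calls, max_workers=4):
    # Works with either ThreadPoolExecutor or ProcessPoolExecutor now
    # that both constructors accept max_workers.
    executor = executor_class(max_workers=max_workers)
    try:
        fs = [executor.submit(call) for call in calls]
        return [f.result() for f in fs]
    finally:
        executor.shutdown(wait=True)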

>>> * You should document the exception that happens when you try to
>>> pass a ProcessPoolExecutor as an argument to a task executing inside
>>> another ProcessPoolExecutor, or make it not throw an exception and
>>> document that.
>>
>> The ProcessPoolExecutor limitations are the same as the
>> multiprocessing limitations. I can provide a note about that and a
>> link to that module's documentation.
>
> And multiprocessing doesn't document that its Pool requires
> picklability and isn't picklable itself. Saying that the
> ProcessPoolExecutor is equivalent to a multiprocessing.Pool should be
> enough for your PEP.

Done.

>>> * If it's intentional, you should probably document that if one
>>> element of a map() times out, there's no way to come back and wait
>>> longer to retrieve it or later elements.
>>
>> That's not obvious?
>
> Maybe.
>
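For concreteness, the behavior is along these lines (a sketch; I'm
assuming map() takes a timeout argument and raises the module's
TimeoutError, and slow_call is just an illustration):

import time
from futures import ThreadPoolExecutor, TimeoutError

def slow_call(seconds):
    time.sleep(seconds)
    return seconds

with ThreadPoolExecutor(max_workers=2) as executor:
    try:
        for result in executor.map(slow_call, [0, 1, 10], timeout=2):
            print(result)
    except TimeoutError:
        # Once map() raises, there is no way to come back later, wait
        # longer, and retrieve the element that timed out or any of
        # the elements after it.
        pass
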
>>> * You still mention run_to_futures, run_to_results, and FutureList,
>>> even though they're no longer proposed.
>>
>> Done.
>>
>>>
>>> * wait() should probably return a named_tuple or an object so we
>>> don't have people writing the unreadable "wait(fs)[0]".
>>
>> Done.
>>
>>>
>>> * Instead of "call finishes" in the description of the return_when
>>> parameter, you might describe the behavior in terms of futures
>>> becoming done since that's the accessor function you're using.
>>
>> Done.
>>
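In other words, the idiom now reads roughly like this (the field and
constant names here are only illustrative):

from futures import ThreadPoolExecutor, wait, FIRST_COMPLETED

with ThreadPoolExecutor(max_workers=2) as executor:
    fs = [executor.submit(pow, 2, i) for i in range(10)]
    done, not_done = wait(fs, return_when=FIRST_COMPLETED)
    # wait() now returns a named tuple (done, not_done) instead of a
    # bare tuple, so nobody has to write "wait(fs)[0]", and return_when
    # is described in terms of futures becoming done.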
>>
>>> * Is RETURN_IMMEDIATELY just a way to categorize futures into done
>>> and not? Is that useful over [f for f in fs if f.done()]?
>>
>> That was an artifact of the previous implementation; removed.
>>
>>> * After shutdown, is RuntimeError the right exception, or should
>>> there be a more specific exception?
>>
>> RuntimeError is what is raised in similar situations by threading,
>> e.g. when starting an already started thread.
>
> Ok, works for me.
>
> On Sun, Feb 21, 2010 at 5:49 AM, Brian Quinlan <brian at sweetapp.com>  
> wrote:
>> A few extra points.
>>
>> On 21 Feb 2010, at 14:41, Jeffrey Yasskin wrote:
>>>
>>> * I'd like users to be able to write Executors besides the simple
>>> ThreadPoolExecutor and ProcessPoolExecutor you already have. To
>>> enable that, could you document what the subclassing interface for
>>> Executor looks like? that is, what code do user-written Executors
>>> need to include? I don't think it should include direct access to
>>> future._state like ThreadPoolExecutor uses, if at all possible.
>>
>> One of the difficulties here is:
>> 1. I don't want to commit to the internal implementation of Futures
>
> Yep, that's why I want to avoid requiring them to have direct access
> to the internal variables.
>
>> 2. it might be hard to make it clear which methods are public to
>> users and which methods are public to executor implementors
>
> One way to do it would be to create another type for implementors and
> pass it to the Future constructor.

If we change the Future interface like so:

class Future(object):
    # Existing public methods
    ...

    # For executors only
    def set_result(self, result):
        ...

    def set_exception(self, exception):
        ...

    def check_cancel_and_notify(self):
        # Returns True if the Future was cancelled and notifies anyone
        # who cares, i.e. waiters for wait() and as_completed().
        ...

Then an executor implementor need only implement:

def submit(self, fn, *args, **kwargs):

With the logic to actually execute fn(*args, **kwargs) and update the
returned Future, of course.
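
For illustration, a toy executor built only on those hooks might look
something like this (just a sketch; it assumes the Executor base class
and the Future constructor are importable by implementors, and it runs
every call synchronously inside submit()):

from futures import Executor, Future

class SerialExecutor(Executor):
    # Runs each call immediately in the calling thread; a real executor
    # would hand fn off to a worker and do the same bookkeeping there.
    def submit(self, fn, *args, **kwargs):
        future = Future()
        # A queued executor's worker would call this before running fn
        # to honor cancellation; here nothing has had a chance to
        # cancel yet.
        if future.check_cancel_and_notify():
            return future
        try:
            future.set_result(fn(*args, **kwargs))
        except BaseException as e:
            future.set_exception(e)
        return future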

Thoughts?

>>> * Could you specify in what circumstances a pure computational
>>> Future-based program may deadlock? (Ideally, that would be "never".)
>>> Your current implementation includes two such deadlocks, for which
>>> I've attached a test.
>>
>> Thanks for the tests, but I wasn't planning on changing this
>> behavior. I don't really like the idea of using the calling thread
>> to perform the wait because:
>> 1. not all executors will be able to implement that behavior
>
> Why not?

What if my executor sends the data to a remote cluster for execution
and running it locally isn't feasible?

> Thread pools can implement it,

Do you have a strategy in mind that would let you detect arbitrary  
deadlocks in threaded futures?

Cheers,
Brian

> and process pools make it
> impossible to create cycles, so they also can't deadlock.
>
>> 2. it can only be made to work if no wait time is specified
>
> With a wait time, you have to avoid stealing work, but it's also
> guaranteed not to deadlock, so it's fine.

