[stdlib-sig] futures - a new package for asynchronous execution
Brian Quinlan
brian at sweetapp.com
Tue Feb 23 09:31:29 CET 2010
On Feb 22, 2010, at 2:37 PM, Jeffrey Yasskin wrote:
> Where's the current version of the PEP?
http://code.google.com/p/pythonfutures/source/browse/branches/feedback/PEP.txt
> On Sun, Feb 21, 2010 at 1:47 AM, Brian Quinlan <brian at sweetapp.com>
> wrote:
>>
>> On 21 Feb 2010, at 14:41, Jeffrey Yasskin wrote:
>>
>>> Several comments:
>>>
>>> * I see you using the Executors as context managers, but no
>>> mention in
>>> the specification about what that does.
>>
>> I can't see such documentation for built-in Python objects. To be
>> symmetrical with the built-in file object, I've documented the
>> context-manager behavior as part of the Executor.shutdown method.
>
> For locks, it has its own section:
> http://docs.python.org/library/threading.html#using-locks-conditions-and-semaphores-in-the-with-statement
> But I don't care too much about the formatting as long as the PEP
> specifies it clearly.
Added.
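For the record, here is a quick sketch of the behavior as now specified: the Executor's __exit__ calls shutdown(wait=True), so leaving the with-block waits for all scheduled calls (example against the reference implementation's ThreadPoolExecutor):

```python
from concurrent.futures import ThreadPoolExecutor

# __exit__ calls shutdown(wait=True), so every submitted call has
# finished by the time the with-block exits.
with ThreadPoolExecutor(max_workers=2) as executor:
    future = executor.submit(pow, 2, 10)

# The executor is shut down here and the future is done.
print(future.done())    # True
print(future.result())  # 1024
```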
>>> You need to specify it. (Your
>>> current implementation doesn't wait in __exit__, which I think is
>>> the
>>> opposite of what you agreed with Antoine, but you can fix that after
>>> we get general agreement on the interface.)
>>
>> Fixed.
>>
>>> * I'd like users to be able to write Executors besides the simple
>>> ThreadPoolExecutor and ProcessPoolExecutor you already have. To
>>> enable
>>> that, could you document what the subclassing interface for Executor
>>> looks like? That is, what code do user-written Executors need to
>>> include?
>>
>> I can do that.
>>
>>> I don't think it should include direct access to
>>> future._state like ThreadPoolExecutor uses, if at all possible.
>>
>> Would it be reasonable to make Future an ABC, make a _Future that
>> subclasses it for internal usage, and let other Executor subclasses
>> define their own Futures?
>
> What interface are you proposing for the Future ABC? It'll need to
> support wait() and as_completed() from non-library Futures. I wouldn't
> mind making the type just a duck-type (it probably wouldn't even need
> an ABC), although I'd like to give people trying to implement their
> own Executors as much help as possible. I'd assumed that giving Future
> some public hooks would be easier than fixing the wait() interface,
> but I could be wrong.
See below.
>>> * Could you specify in what circumstances a pure computational
>>> Future-based program may deadlock? (Ideally, that would be "never".)
>>> Your current implementation includes two such deadlocks, for which
>>> I've attached a test.
>>
>>> * Do you want to make calling Executor.shutdown(wait=True) from
>>> within
>>> the same Executor 1) detect the problem and raise an exception, 2)
>>> deadlock, 3) unspecified behavior, or 4) wait for all other threads
>>> and then let the current one continue?
>>
>> What about a note saying that using any futures functions or
>> methods from
>> inside a scheduled call is likely to lead to deadlock unless care
>> is taken?
>
> Jesse pointed out that one of the first things people try to do when
> using concurrency libraries is to try to use them inside themselves.
> I've also tried to use a futures library that forbade nested use
> ('cause I wrote it), and it was a real pain.
You can use the API from within Executor-invoked functions; you just
have to be careful.
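To illustrate the kind of care needed, a hypothetical example: waiting on a future from inside a scheduled call is fine as long as another worker is free to run it, but with max_workers=1 the inner result() call below could never be satisfied and would deadlock.

```python
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=2)

def inner():
    return "inner result"

def outer():
    # Safe only because a second worker is free to run inner();
    # with max_workers=1 this wait would deadlock.
    return executor.submit(inner).result()

result = executor.submit(outer).result()
executor.shutdown()
print(result)  # inner result
```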
> It should be easy enough to detect that the caller of
> Executor.shutdown is one of the Executor's threads or processes, but I
> wouldn't mind making the obviously incorrect "wait for my own
> completion" deadlock or throw an exception, and it would make sense to
> give Executor implementors their choice of which to do.
>
>>> * This is a nit, but I think that the parameter names for
>>> ThreadPoolExecutor and ProcessPoolExecutor should be the same so
>>> people can parametrize their code on those constructors. Right now
>>> they're "max_threads" and "max_processes", respectively. I might
>>> suggest "max_workers".
>>
>> I'm not sure that I like that. In general consolidating the
>> constructors for
>> executors is not going to be possible.
>
> In general, yes, but in this case they're the same, and we should try
> to avoid gratuitous differences.
"max_threads" and "max_processes" are more explicit than
"max_workers", but I don't really care, so I changed it.
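With a common parameter name, the executor class becomes just another parameter. A sketch (the lambda-free, module-level callable means a ProcessPoolExecutor could be passed in the same way, subject to the usual picklability rules):

```python
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

def square(n):
    return n * n

# Because both constructors accept max_workers, code can be
# parametrized on the executor class.
def parallel_squares(executor_class, numbers):
    with executor_class(max_workers=4) as executor:
        return list(executor.map(square, numbers))

print(parallel_squares(ThreadPoolExecutor, [1, 2, 3]))  # [1, 4, 9]
```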
>>> * You should document the exception that happens when you try to
>>> pass
>>> a ProcessPoolExecutor as an argument to a task executing inside
>>> another ProcessPoolExecutor, or make it not throw an exception and
>>> document that.
>>
>> The ProcessPoolExecutor limitations are the same as the
>> multiprocessing
>> limitations. I can provide a note about that and a link to that
>> module's
>> documentation.
>
> And multiprocessing doesn't document that its Pool requires
> picklability and isn't picklable itself. Saying that the
> ProcessPoolExecutor is equivalent to a multiprocessing.Pool should be
> enough for your PEP.
Done.
>>> * If it's intentional, you should probably document that if one
>>> element of a map() times out, there's no way to come back and wait
>>> longer to retrieve it or later elements.
>>
>> That's not obvious?
>
> Maybe.
>
>>> * You still mention run_to_futures, run_to_results, and FutureList,
>>> even though they're no longer proposed.
>>
>> Done.
>>
>>>
>>> * wait() should probably return a named_tuple or an object so we
>>> don't
>>> have people writing the unreadable "wait(fs)[0]".
>>
>> Done.
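For illustration, wait() now returns a named tuple of two sets rather than a bare 2-tuple, so callers can use attribute access instead of indexing:

```python
from concurrent.futures import ThreadPoolExecutor, wait

executor = ThreadPoolExecutor(max_workers=2)
fs = [executor.submit(pow, 2, i) for i in range(4)]

# wait() returns a named tuple, so callers can write result.done
# rather than the unreadable wait(fs)[0].
result = wait(fs)
executor.shutdown()
print(sorted(f.result() for f in result.done))  # [1, 2, 4, 8]
print(len(result.not_done))                     # 0
```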
>>
>>>
>>> * Instead of "call finishes" in the description of the return_when
>>> parameter, you might describe the behavior in terms of futures
>>> becoming done since that's the accessor function you're using.
>>
>> Done.
>>
>>
>>> * Is RETURN_IMMEDIATELY just a way to categorize futures into done
>>> and
>>> not? Is that useful over [f for f in fs if f.done()]?
>>
>> That was an artifact of the previous implementation; removed.
>>
>>> * After shutdown, is RuntimeError the right exception, or should
>>> there
>>> be a more specific exception?
>>
>> RuntimeError is what is raised in similar situations by threading,
>> e.g. when starting an already-started thread.
>
> Ok, works for me.
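Concretely, the behavior mirrors threading: scheduling new work after shutdown() raises RuntimeError (sketch against the reference implementation):

```python
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=1)
executor.shutdown()

# As with threading, using the executor after shutdown() is an error.
try:
    executor.submit(pow, 2, 2)
    raised = False
except RuntimeError:
    raised = True
print(raised)  # True
```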
>
> On Sun, Feb 21, 2010 at 5:49 AM, Brian Quinlan <brian at sweetapp.com>
> wrote:
>> A few extra points.
>>
>> On 21 Feb 2010, at 14:41, Jeffrey Yasskin wrote:
>>>
>>> * I'd like users to be able to write Executors besides the simple
>>> ThreadPoolExecutor and ProcessPoolExecutor you already have. To
>>> enable
>>> that, could you document what the subclassing interface for Executor
>>> looks like? That is, what code do user-written Executors need to
>>> include? I don't think it should include direct access to
>>> future._state like ThreadPoolExecutor uses, if at all possible.
>>
>> One of the difficulties here is:
>> 1. I don't want to commit to the internal implementation of Futures
>
> Yep, that's why to avoid requiring them to have direct access to the
> internal variables.
>
>> 2. it might be hard to make it clear which methods are public to
>> users and
>> which methods are public to executor implementors
>
> One way to do it would be to create another type for implementors and
> pass it to the Future constructor.
If we change the Future interface like so:

    class Future(object):
        # Existing public methods
        ...

        # For executors only
        def set_result(self, result):
            ...

        def set_exception(self, exception):
            ...

        def check_cancel_and_notify(self):
            # Returns True if the Future was cancelled and
            # notifies anyone who cares, i.e. waiters for
            # wait() and as_completed().
            ...

then an executor implementor need only implement:

    def submit(self, fn, *args, **kwargs):
        ...

with the logic to actually execute fn(*args, **kwargs) and update the
returned Future, of course.

Thoughts?
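As a sanity check that these hooks are sufficient, here is a toy executor written only against the public Future methods plus the hooks; for illustration I've folded check_cancel_and_notify into a set_running_or_notify_cancel-style method that returns False when the future was cancelled (treat the names as provisional):

```python
import threading
from concurrent.futures import Future

# A toy executor: submit() creates a Future and a worker thread
# completes it with set_result / set_exception. The cancel check
# plays the role of the check_cancel_and_notify hook above.
class ThreadPerCallExecutor:
    def submit(self, fn, *args, **kwargs):
        future = Future()

        def worker():
            if not future.set_running_or_notify_cancel():
                return  # cancelled before it started
            try:
                future.set_result(fn(*args, **kwargs))
            except BaseException as e:
                future.set_exception(e)

        threading.Thread(target=worker).start()
        return future

future = ThreadPerCallExecutor().submit(pow, 2, 5)
print(future.result())  # 32
```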
>>> * Could you specify in what circumstances a pure computational
>>> Future-based program may deadlock? (Ideally, that would be "never".)
>>> Your current implementation includes two such deadlocks, for which
>>> I've attached a test.
>>
>> Thanks for the tests but I wasn't planning on changing this
>> behavior. I
>> don't really like the idea of using the calling thread to perform
>> the wait
>> because:
>> 1. not all executors will be able to implement that behavior
>
> Why not?
What if my executor sends the data to a remote cluster for execution
and running it locally isn't feasible?
> Thread pools can implement it,
Do you have a strategy in mind that would let you detect arbitrary
deadlocks in threaded futures?
Cheers,
Brian
> and process pools make it
> impossible to create cycles, so they also can't deadlock.
>
>> 2. it can only be made to work if no wait time is specified
>
> With a wait time, you have to avoid stealing work, but it's also
> guaranteed not to deadlock, so it's fine.