[stdlib-sig] futures - PEP and API

Sat Nov 7 17:13:24 CET 2009

> The PEP is right here:
> 
> http://code.google.com/p/pythonfutures/source/browse/trunk/PEP.txt
> 
> I'm interested in hearing specific complaints about the API in the
> context of what it's trying to *do*. The only thing which jumped out
> at me was the number of methods on FutureList;

Yes, that would be the first complaint. Beyond that, many of those methods
are as (un)trustworthy as, say, Queue.qsize().

An example:
        """`done_futures()`

        Return an iterator over all `Future` instances that completed or
        were cancelled."""

First, it claims to return an iterator but the internal container could
mutate while iterating (since it can be mutated when a task terminates
in another thread). So the API looks broken with respect to what the
semantics dictate. It should probably return a distinct container (list
or set) instead.

Second, by the time the result is processed by the caller, there's no
way to know if the information is still valid or not. It's entirely
speculative, which makes it potentially deceiving -- and should be
mentioned in the doc.

        """`has_done_futures()`

        Return `True` if any `Future` in the list has completed or was
        successfully cancelled."""

Same problem. Please note that it can be removed if `done_futures()`
returns a container, since you then just have to do a boolean check on
the container (that would remove 5 methods :-)).
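
To make the container suggestion concrete, here is a minimal sketch of
what I have in mind. The class name, the `_mark_done()` hook and the
internal set are my own assumptions for illustration, not what the PEP's
implementation does.

```python
import threading


class FutureListSketch:
    """Hypothetical stand-in for FutureList (illustration only)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._done = set()

    def _mark_done(self, future):
        # Assumed hook: called from a worker thread when a task finishes.
        with self._lock:
            self._done.add(future)

    def done_futures(self):
        # Copy under the lock, so the caller gets a snapshot that no
        # worker thread can mutate while it is being iterated.
        with self._lock:
            return list(self._done)
```

An empty snapshot is false in a boolean context, so `if
flist.done_futures():` covers what `has_done_futures()` does today. The
snapshot is still only valid as of the moment it was taken.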

Then about the Future API itself. I would argue that if we want it to be
a simple helper, it should be as simple to use as a weakref.

That is, rather than:

        """`result(timeout=None)`

        Return the value returned by the call.
        [...]

        `exception(timeout=None)`

        Return the exception raised by the call."""

Make the object callable, such that `future(timeout=None)` either returns
the computed result (if successful), raises the call's exception (if it
failed), or raises a TimeoutError.
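
A rough sketch of such a callable object, assuming a threading.Event for
completion. The class name and the `set_result()` / `set_exception()`
methods are hypothetical and only there so the example is
self-contained.

```python
import threading


class CallableFuture:
    """Hypothetical callable future (illustration only)."""

    def __init__(self):
        self._finished = threading.Event()
        self._result = None
        self._exception = None

    def set_result(self, value):
        # Assumed to be called by the executor when the call succeeds.
        self._result = value
        self._finished.set()

    def set_exception(self, exc):
        # Assumed to be called by the executor when the call raises.
        self._exception = exc
        self._finished.set()

    def __call__(self, timeout=None):
        # Event.wait() returns False if the timeout expired first.
        if not self._finished.wait(timeout):
            raise TimeoutError("result not available yet")
        if self._exception is not None:
            raise self._exception
        return self._result
```

The caller then only has one thing to remember: `value = future()`,
much like dereferencing a weakref.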

Then about the Executor API. I don't understand why we have the
possibility to wait on a FutureList *and* on the Executor's
run_to_results() method. I think all wait-type methods should be folded
into the Future or FutureList API, and the Executor should only
generate that Future(List).

Practically, there should be two ways to wait for multiple results,
depending on whether you need the results ordered or not. In the web
crawling situation given as example, it is silly to wait for the results
in order rather than process each result as soon as it becomes available.
(*)
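
The two ways of waiting could look roughly like this. It is a sketch
with plain threads and a queue; all the function names are made up for
the example, and error handling is left out entirely.

```python
import queue
import threading


def start_tasks(func, inputs):
    # One result slot per input (for ordered access), plus a queue that
    # receives each task's index in completion order.
    results = [None] * len(inputs)
    completed = queue.Queue()
    threads = []

    def worker(i, arg):
        results[i] = func(arg)
        completed.put(i)

    for i, arg in enumerate(inputs):
        t = threading.Thread(target=worker, args=(i, arg))
        t.start()
        threads.append(t)
    return results, completed, threads


def iter_ordered(results, threads):
    # Submission order: a slow first task delays every later result.
    for i, t in enumerate(threads):
        t.join()
        yield results[i]


def iter_as_completed(results, completed):
    # Completion order: each result is yielded as soon as it is ready.
    for _ in range(len(results)):
        yield results[completed.get()]
```

A crawler would want `iter_as_completed()`: a slow page then no longer
holds up the processing of pages that have already been fetched.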

I don't understand why the Executor seems to be used as a context
manager in the examples. Its resources are still alive after the "with"
since the tasks are still executing, so it can't possibly have cleaned up
anything, can it?

(*) And, of course, you start to understand why a callback-based API
such as Deferreds makes a lot of sense...

Regards

Antoine.