From benjamin at python.org  Mon Feb  1 00:14:24 2010
From: benjamin at python.org (Benjamin Peterson)
Date: Sun, 31 Jan 2010 17:14:24 -0600
Subject: [stdlib-sig] socket.makefile() questions
In-Reply-To: <1126304848.5692691264977988530.JavaMail.root@mbs107.c101.zcs.mail.ac4.yahoo.net>
References: <1199139348.5692621264977965383.JavaMail.root@mbs107.c101.zcs.mail.ac4.yahoo.net>
	<1126304848.5692691264977988530.JavaMail.root@mbs107.c101.zcs.mail.ac4.yahoo.net>
Message-ID: <1afaf6161001311514q3c4586c7i84de402b055a9240@mail.gmail.com>

2010/1/31 :
> My apologies if this message is to the wrong group.
>
> I have been experimenting with socket.makefile() from Python 3.1.1. I have
> not had much difficulty reading from the returned file object, but I don't
> understand the behavior when trying to write (send on the socket). I'm
> hoping that someone can explain how this is supposed to work.
>
> I find that this works for an established connection on socket s:
> fd = s.makefile('wb', buffering = 0)
> fd.write("This is a test message\n".encode('ascii'))
>
> A mode of 'rwb' also works. The object fd is of type SocketIO.
>
> fd = s.makefile('w', buffering = 0) -> ValueError exception
> fd = s.makefile('w') -> io.TextIOWrapper, which does not send data.
> fd = s.makefile('wb') -> io.BufferedWriter, which does not send data.
>
> The default value of the "buffering" parameter is None, which from my
> testing has a different result than 0 (zero).
>
> So, questions:
> 1) Why does buffering = None result in a buffered file object?
> 2) Are there bugs or incomplete work with socket.makefile(), io.BufferedWriter
> and io.TextIOWrapper in terms of why the latter two objects are returned,
> but fail to send data?

It sounds like that function is broken and buggy in python 3 and a few bug
reports need to be filed at bugs.python.org.
-- 
Regards,
Benjamin

From solipsis at pitrou.net  Mon Feb  1 00:18:34 2010
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Mon, 01 Feb 2010 00:18:34 +0100
Subject: [stdlib-sig] socket.makefile() questions
In-Reply-To: <1126304848.5692691264977988530.JavaMail.root@mbs107.c101.zcs.mail.ac4.yahoo.net>
References: <1126304848.5692691264977988530.JavaMail.root@mbs107.c101.zcs.mail.ac4.yahoo.net>
Message-ID: <1264979914.4682.4.camel@localhost>

> fd = s.makefile('w', buffering = 0) -> ValueError exception
> fd = s.makefile('w') -> io.TextIOWrapper, which does not send data.
> fd = s.makefile('wb') -> io.BufferedWriter, which does not send data.

Have you tried fd.flush() after writing your data?

From tim at ksu.edu  Mon Feb  1 00:37:20 2010
From: tim at ksu.edu (Timothy Bower)
Date: Sun, 31 Jan 2010 15:37:20 -0800 (PST)
Subject: [stdlib-sig] socket.makefile() questions
In-Reply-To: <1264979914.4682.4.camel@localhost>
Message-ID: <465674023.5697041264981040368.JavaMail.root@mbs107.c101.zcs.mail.ac4.yahoo.net>

You are correct that fd.flush() makes a difference. I thought that flush()
would only affect the ability to read after the write, but actually the data
was not sent until the flush() operation. I'll do some more testing. Thank you.

-- 
Tim Bower
Assistant Professor
Kansas State University at Salina
Computer Systems Technology
tim at ksu.edu
785-826-2920

----- Original Message -----
From: "Antoine Pitrou"
To: "stdlib-sig"
Sent: Sunday, January 31, 2010 5:18:34 PM GMT -06:00 US/Canada Central
Subject: Re: [stdlib-sig] socket.makefile() questions

> fd = s.makefile('w', buffering = 0) -> ValueError exception
> fd = s.makefile('w') -> io.TextIOWrapper, which does not send data.
> fd = s.makefile('wb') -> io.BufferedWriter, which does not send data.

Have you tried fd.flush() after writing your data?
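The behaviour worked out in this thread can be reproduced without a network
peer. The following is a self-contained sketch (not from the original
messages): `socket.socketpair()` stands in for the established connection
`s`, and shows both the unbuffered and the buffered-plus-flush cases.

```python
import socket

# socket.socketpair() stands in for an established connection on socket s
# (an assumption made so the example is self-contained).
a, b = socket.socketpair()

# buffering=0 returns a raw SocketIO object: write() sends immediately.
raw = a.makefile('wb', buffering=0)
raw.write("This is a test message\n".encode('ascii'))
unbuffered_msg = b.recv(1024)

# The default (buffering=None) wraps the socket in io.BufferedWriter, so
# written data sits in the buffer until flush() (or close()) is called.
buffered = a.makefile('wb')
buffered.write(b"second message\n")
buffered.flush()  # without this, nothing reaches the peer yet
buffered_msg = b.recv(1024)

print(unbuffered_msg, buffered_msg)
for f in (raw, buffered):
    f.close()
a.close()
b.close()
```

The first recv() succeeds with no flush() call at all; the second one would
block forever if the flush() line were removed.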
_______________________________________________
stdlib-sig mailing list
stdlib-sig at python.org
http://mail.python.org/mailman/listinfo/stdlib-sig

From tim at ksu.edu  Mon Feb  1 03:56:54 2010
From: tim at ksu.edu (Timothy Bower)
Date: Sun, 31 Jan 2010 18:56:54 -0800 (PST)
Subject: [stdlib-sig] socket.makefile() questions
In-Reply-To: <465674023.5697041264981040368.JavaMail.root@mbs107.c101.zcs.mail.ac4.yahoo.net>
Message-ID: <1711156070.5711621264993014587.JavaMail.root@mbs107.c101.zcs.mail.ac4.yahoo.net>

fd.flush() seems to be the secret - it worked as expected in all tests. If
possible, I'd like to suggest that the documentation be updated to explicitly
say that a buffered file object is returned unless the buffering parameter is
set to 0 (zero). Setting buffering = None (default value) still returns a
buffered object, which seems a little counterintuitive, but acceptable if the
documentation makes it clear. Thank you.

-- 
Tim Bower
Assistant Professor
Kansas State University at Salina
Computer Systems Technology
tim at ksu.edu

----- Original Message -----
From: "Timothy Bower"
To: "Antoine Pitrou"
Cc: "stdlib-sig"
Sent: Sunday, January 31, 2010 5:37:20 PM GMT -06:00 US/Canada Central
Subject: Re: [stdlib-sig] socket.makefile() questions

You are correct that fd.flush() makes a difference. I thought that flush()
would only affect the ability to read after the write, but actually the data
was not sent until the flush() operation. I'll do some more testing. Thank you.

-- 
Tim Bower
Assistant Professor
Kansas State University at Salina
Computer Systems Technology
tim at ksu.edu

----- Original Message -----
From: "Antoine Pitrou"
To: "stdlib-sig"
Sent: Sunday, January 31, 2010 5:18:34 PM GMT -06:00 US/Canada Central
Subject: Re: [stdlib-sig] socket.makefile() questions

> fd = s.makefile('w', buffering = 0) -> ValueError exception
> fd = s.makefile('w') -> io.TextIOWrapper, which does not send data.
> fd = s.makefile('wb') -> io.BufferedWriter, which does not send data.
Have you tried fd.flush() after writing your data?

_______________________________________________
stdlib-sig mailing list
stdlib-sig at python.org
http://mail.python.org/mailman/listinfo/stdlib-sig

From rdmurray at bitdance.com  Mon Feb  1 18:03:01 2010
From: rdmurray at bitdance.com (R. David Murray)
Date: Mon, 01 Feb 2010 12:03:01 -0500
Subject: [stdlib-sig] socket.makefile() questions
In-Reply-To: <1711156070.5711621264993014587.JavaMail.root@mbs107.c101.zcs.mail.ac4.yahoo.net>
References: <1711156070.5711621264993014587.JavaMail.root@mbs107.c101.zcs.mail.ac4.yahoo.net>
Message-ID: <20100201170301.1AFFC1FB3F8@kimball.webabinitio.net>

On Sun, 31 Jan 2010 18:56:54 -0800, Timothy Bower wrote:
> fd.flush() seems to be the secret - worked as expected in all tests.
> If possible, I'd like to suggest that the documentation be updated to
> explicitly say that a buffered file object is returned unless the
> buffering parameter is set to 0 (zero). Setting buffering = None (default
> value) still returns a buffered object, which seems a little
> counterintuitive, but acceptable if the documentation makes it clear.

It is already documented. socket.makefile says the args are interpreted as
for the built-in function 'open' and provides a link to those docs, which
clearly say that the default is that buffering is on. None is often used to
mean "default" in Python functions.

As for your original question as to whether or not this is the right list,
no it isn't :). python-list is the best place to post this kind of question,
and then if you find you have a real bug, file a bug report on the tracker.

-- R.
David Murray www.bitdance.com Business Process Automation - Network/Server Management - Routers/Firewalls From jyasskin at gmail.com Sun Feb 21 04:41:34 2010 From: jyasskin at gmail.com (Jeffrey Yasskin) Date: Sun, 21 Feb 2010 03:41:34 +0000 Subject: [stdlib-sig] futures - a new package for asynchronous execution In-Reply-To: <3BB0755A-2D85-46AF-89A6-CFC4A56744AF@sweetapp.com> References: <1257607273.3437.13.camel@localhost> <4222a8490911070732p5b5cdc6cj2a5d44416658e119@mail.gmail.com> <5d44f72f0911071137m3a499f99j9edc604bc8b9b127@mail.gmail.com> <37E25344-20F9-4D42-B982-CA43D24FA806@sweetapp.com> <5d44f72f0911080001off0d158n4c0c4d903a844516@mail.gmail.com> <216378C8-6B77-4DAC-9292-841A8E5849B5@sweetapp.com> <3BB0755A-2D85-46AF-89A6-CFC4A56744AF@sweetapp.com> Message-ID: <5d44f72f1002201941o76f80045ofa25238be43f455c@mail.gmail.com> Several comments: * I see you using the Executors as context managers, but no mention in the specification about what that does. You need to specify it. (Your current implementation doesn't wait in __exit__, which I think is the opposite of what you agreed with Antoine, but you can fix that after we get general agreement on the interface.) * I'd like users to be able to write Executors besides the simple ThreadPoolExecutor and ProcessPoolExecutor you already have. To enable that, could you document what the subclassing interface for Executor looks like? that is, what code do user-written Executors need to include? I don't think it should include direct access to future._state like ThreadPoolExecutor uses, if at all possible. * Could you specify in what circumstances a pure computational Future-based program may deadlock? (Ideally, that would be "never".) Your current implementation includes two such deadlocks, for which I've attached a test. * This is a nit, but I think that the parameter names for ThreadPoolExecutor and ProcessPoolExecutor should be the same so people can parametrize their code on those constructors. 
Right now they're "max_threads" and "max_processes", respectively. I might
suggest "max_workers".

* You should document the exception that happens when you try to pass a
ProcessPoolExecutor as an argument to a task executing inside another
ProcessPoolExecutor, or make it not throw an exception and document that.

* If it's intentional, you should probably document that if one element of a
map() times out, there's no way to come back and wait longer to retrieve it
or later elements.

* Do you want to make calling Executor.shutdown(wait=True) from within the
same Executor 1) detect the problem and raise an exception, 2) deadlock, 3)
unspecified behavior, or 4) wait for all other threads and then let the
current one continue?

* You still mention run_to_futures, run_to_results, and FutureList, even
though they're no longer proposed.

* wait() should probably return a named_tuple or an object so we don't have
people writing the unreadable "wait(fs)[0]".

* Instead of "call finishes" in the description of the return_when parameter,
you might describe the behavior in terms of futures becoming done since
that's the accessor function you're using.

* Is RETURN_IMMEDIATELY just a way to categorize futures into done and not?
Is that useful over [f for f in fs if f.done()]?

* After shutdown, is RuntimeError the right exception, or should there be a
more specific exception?

Otherwise, looks good. Thanks!

On Fri, Jan 29, 2010 at 2:22 AM, Brian Quinlan wrote:
> I've updated the PEP and included it inline. The interesting changes start
> in the "Specification" section.
>
> Cheers,
> Brian
>
> PEP:               XXX
> Title:             futures - execute computations asynchronously
> Version:           $Revision$
> Last-Modified:     $Date$
> Author:            Brian Quinlan
> Status:            Draft
> Type:              Standards Track
> Content-Type:      text/x-rst
> Created:           16-Oct-2009
> Python-Version:
3.2
> Post-History:
>
> ========
> Abstract
> ========
>
> This PEP proposes a design for a package that facilitates the evaluation of
> callables using threads and processes.
>
> ==========
> Motivation
> ==========
>
> Python currently has powerful primitives to construct multi-threaded and
> multi-process applications but parallelizing simple operations requires a
> lot of work, i.e. explicitly launching processes/threads, constructing a
> work/results queue, and waiting for completion or some other termination
> condition (e.g. failure, timeout). It is also difficult to design an
> application with a global process/thread limit when each component invents
> its own parallel execution strategy.
>
> =============
> Specification
> =============
>
> Check Prime Example
> -------------------
>
> ::
>
>     import futures
>     import math
>
>     PRIMES = [
>         112272535095293,
>         112582705942171,
>         112272535095293,
>         115280095190773,
>         115797848077099,
>         1099726899285419]
>
>     def is_prime(n):
>         if n % 2 == 0:
>             return False
>
>         sqrt_n = int(math.floor(math.sqrt(n)))
>         for i in range(3, sqrt_n + 1, 2):
>             if n % i == 0:
>                 return False
>         return True
>
>     with futures.ProcessPoolExecutor() as executor:
>         for number, is_prime in zip(PRIMES, executor.map(is_prime, PRIMES)):
>             print('%d is prime: %s' % (number, is_prime))
>
> Web Crawl Example
> -----------------
>
> ::
>
>     import futures
>     import urllib.request
>
>     URLS = ['http://www.foxnews.com/',
>             'http://www.cnn.com/',
>             'http://europe.wsj.com/',
>             'http://www.bbc.co.uk/',
>             'http://some-made-up-domain.com/']
>
>     def load_url(url, timeout):
>         return urllib.request.urlopen(url, timeout=timeout).read()
>
>     with futures.ThreadPoolExecutor(max_threads=5) as executor:
>         future_to_url = dict((executor.submit(load_url, url, 60), url)
>                              for url in URLS)
>
>     for future in futures.as_completed(future_to_url):
>         url = future_to_url[future]
>         if future.exception() is not None:
>             print('%r generated an exception: %s' % (url,
>                                                      future.exception()))
>         else:
>             print('%r page is %d bytes' % (url, len(future.result())))
>
> Interface
> ---------
>
> The proposed package provides two core classes: `Executor` and `Future`.
> An `Executor` receives asynchronous work requests (in terms of a callable
> and its arguments) and returns a `Future` to represent the execution of
> that work request.
>
> Executor
> ''''''''
>
> `Executor` is an abstract class that provides methods to execute calls
> asynchronously.
>
> `submit(fn, *args, **kwargs)`
>
> Schedules the callable to be executed as fn(*\*args*, *\*\*kwargs*) and
> returns a `Future` instance representing the execution of the function.
>
> `map(func, *iterables, timeout=None)`
>
> Equivalent to map(*func*, *\*iterables*) but executed asynchronously and
> possibly out-of-order. The returned iterator raises a `TimeoutError` if
> `__next__()` is called and the result isn't available after *timeout*
> seconds from the original call to `run_to_results()`. If *timeout* is not
> specified or ``None`` then there is no limit to the wait time. If a call
> raises an exception then that exception will be raised when its value is
> retrieved from the iterator.
>
> `Executor.shutdown(wait=False)`
>
> Signal the executor that it should free any resources that it is using
> when the currently pending futures are done executing. Calls to
> `Executor.run_to_futures`, `Executor.run_to_results` and
> `Executor.map` made after shutdown will raise `RuntimeError`.
>
> If wait is `True` then the executor will not return until all the pending
> futures are done executing and the resources associated with the executor
> have been freed.
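This interface survived essentially unchanged into the standard library as
`concurrent.futures` (Python 3.2). A minimal sketch of `submit()`, `map()`
and the resulting `Future` objects, using the stdlib module rather than the
pre-PEP `futures` package:

```python
import concurrent.futures as cf

def square(x):
    return x * x

def fail():
    raise ValueError("boom")

# submit() schedules a call and returns a Future; result() blocks until the
# value is ready, and exception() returns (rather than raises) any exception
# the call raised.
with cf.ThreadPoolExecutor(max_workers=2) as executor:
    results = list(executor.map(square, range(5)))  # results in input order
    good = executor.submit(square, 6)
    bad = executor.submit(fail)
    answer = good.result(timeout=10)
    err = bad.exception(timeout=10)

print(results, answer, type(err).__name__)
```

Exiting the `with` block is the context-manager behaviour Jeffrey asks to
have specified: it calls `shutdown(wait=True)`.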
>
> ProcessPoolExecutor
> '''''''''''''''''''
>
> The `ProcessPoolExecutor` class is an `Executor` subclass that uses a pool
> of processes to execute calls asynchronously.
>
> `__init__(max_processes)`
>
> Executes calls asynchronously using a pool of at most *max_processes*
> processes. If *max_processes* is ``None`` or not given then as many worker
> processes will be created as the machine has processors.
>
> ThreadPoolExecutor
> ''''''''''''''''''
>
> The `ThreadPoolExecutor` class is an `Executor` subclass that uses a pool
> of threads to execute calls asynchronously.
>
> `__init__(max_threads)`
>
> Executes calls asynchronously using a pool of at most *max_threads*
> threads.
>
> Future Objects
> ''''''''''''''
>
> The `Future` class encapsulates the asynchronous execution of a function
> or method call. `Future` instances are returned by `Executor.submit`.
>
> `cancel()`
>
> Attempt to cancel the call. If the call is currently being executed then
> it cannot be cancelled and the method will return `False`, otherwise the
> call will be cancelled and the method will return `True`.
>
> `Future.cancelled()`
>
> Return `True` if the call was successfully cancelled.
>
> `Future.done()`
>
> Return `True` if the call was successfully cancelled or finished running.
>
> `result(timeout=None)`
>
> Return the value returned by the call. If the call hasn't yet completed
> then this method will wait up to *timeout* seconds. If the call hasn't
> completed in *timeout* seconds then a `TimeoutError` will be raised. If
> *timeout* is not specified or ``None`` then there is no limit to the wait
> time.
>
> If the future is cancelled before completing then `CancelledError` will
> be raised.
>
> If the call raised then this method will raise the same exception.
>
> `exception(timeout=None)`
>
> Return the exception raised by the call. If the call hasn't yet completed
> then this method will wait up to *timeout* seconds.
If the call hasn't
> completed in *timeout* seconds then a `TimeoutError` will be raised.
> If *timeout* is not specified or ``None`` then there is no limit to the
> wait time.
>
> If the future is cancelled before completing then `CancelledError` will
> be raised.
>
> If the call completed without raising then ``None`` is returned.
>
> `index`
>
> int indicating the index of the future in its `FutureList`.
>
> Module Functions
> ''''''''''''''''
>
> `wait(fs, timeout=None, return_when=ALL_COMPLETED)`
>
> Wait for the `Future` instances in the given sequence to complete. Returns
> a 2-tuple of sets. The first set contains the futures that completed
> (finished or were cancelled) before the wait completed. The second set
> contains uncompleted futures.
>
> This method should always be called using keyword arguments, which are:
>
> *fs* is the sequence of Future instances that should be waited on.
>
> *timeout* can be used to control the maximum number of seconds to wait
> before returning. If timeout is not specified or None then there is no
> limit to the wait time.
>
> *return_when* indicates when the method should return. It must be one of
> the following constants:
>
> ====================  ==================================================
>  Constant              Description
> ====================  ==================================================
> `FIRST_COMPLETED`     The method will return when any call finishes.
> `FIRST_EXCEPTION`     The method will return when any call raises an
>                       exception or when all calls finish.
> `ALL_COMPLETED`       The method will return when all calls finish.
> `RETURN_IMMEDIATELY`  The method will return immediately.
> ====================  ==================================================
>
> `as_completed(fs, timeout=None)`
>
> Returns an iterator over the Future instances given by *fs* that yields
> futures as they complete (finished or were cancelled). Any futures that
> completed before `as_completed()` was called will be yielded first. The
> returned iterator raises a `TimeoutError` if `__next__()` is called and
> the result isn't available after *timeout* seconds from the original call
> to `as_completed()`. If *timeout* is not specified or `None` then there is
> no limit to the wait time.
>
> =========
> Rationale
> =========
>
> The proposed design of this module was heavily influenced by the Java
> java.util.concurrent package [1]_. The conceptual basis of the module, as
> in Java, is the Future class, which represents the progress and result of
> an asynchronous computation. The Future class makes little commitment to
> the evaluation mode being used, e.g. it can be used to represent lazy or
> eager evaluation, for evaluation using threads, processes or remote
> procedure calls.
>
> Futures are created by concrete implementations of the Executor class
> (called ExecutorService in Java). The reference implementation provides
> classes that use either a process or a thread pool to eagerly evaluate
> computations.
>
> Futures have already been seen in Python as part of a popular Python
> cookbook recipe [2]_ and have been discussed on the Python-3000 mailing
> list [3]_.
>
> The proposed design is explicit, i.e. it requires that clients be aware
> that they are consuming Futures. It would be possible to design a module
> that would return proxy objects (in the style of `weakref`) that could be
> used transparently. It is possible to build a proxy implementation on top
> of the proposed explicit mechanism.
>
> The proposed design does not introduce any changes to Python language
> syntax or semantics.
Special syntax could be introduced [4]_ to mark function and
> method calls as asynchronous. A proxy result would be returned while the
> operation is eagerly evaluated asynchronously, and execution would only
> block if the proxy object were used before the operation completed.
>
> ========================
> Reference Implementation
> ========================
>
> The reference implementation [5]_ contains a complete implementation of
> the proposed design. It has been tested on Linux and Mac OS X.
>
> ==========
> References
> ==========
>
> .. [1]
>
>    `java.util.concurrent` package documentation
>    `http://java.sun.com/j2se/1.5.0/docs/api/java/util/concurrent/package-summary.html`
>
> .. [2]
>
>    Python Cookbook recipe 84317, "Easy threading with Futures"
>    `http://code.activestate.com/recipes/84317/`
>
> .. [3]
>
>    `Python-3000` thread, "mechanism for handling asynchronous concurrency"
>    `http://mail.python.org/pipermail/python-3000/2006-April/000960.html`
>
> .. [4]
>
>    `Python 3000` thread, "Futures in Python 3000 (was Re: mechanism for
>    handling asynchronous concurrency)"
>    `http://mail.python.org/pipermail/python-3000/2006-April/000970.html`
>
> .. [5]
>
>    Reference `futures` implementation
>    `http://code.google.com/p/pythonfutures`
>
> =========
> Copyright
> =========
>
> This document has been placed in the public domain.
>
>
> ..
>    Local Variables:
>    mode: indented-text
>    indent-tabs-mode: nil
>    sentence-end-double-space: t
>    fill-column: 70
>    coding: utf-8
>    End:

-------------- next part --------------
A non-text attachment was scrubbed...
Name: deadlock_test.patch Type: application/octet-stream Size: 4826 bytes Desc: not available URL: From brian at sweetapp.com Sun Feb 21 11:49:05 2010 From: brian at sweetapp.com (Brian Quinlan) Date: Sun, 21 Feb 2010 21:49:05 +1100 Subject: [stdlib-sig] futures - a new package for asynchronous execution In-Reply-To: <5d44f72f1002201941o76f80045ofa25238be43f455c@mail.gmail.com> References: <1257607273.3437.13.camel@localhost> <4222a8490911070732p5b5cdc6cj2a5d44416658e119@mail.gmail.com> <5d44f72f0911071137m3a499f99j9edc604bc8b9b127@mail.gmail.com> <37E25344-20F9-4D42-B982-CA43D24FA806@sweetapp.com> <5d44f72f0911080001off0d158n4c0c4d903a844516@mail.gmail.com> <216378C8-6B77-4DAC-9292-841A8E5849B5@sweetapp.com> <3BB0755A-2D85-46AF-89A6-CFC4A56744AF@sweetapp.com> <5d44f72f1002201941o76f80045ofa25238be43f455c@mail.gmail.com> Message-ID: <756D10FB-EC6D-458D-80D6-21CF8ADA9676@sweetapp.com> A few extra points. On 21 Feb 2010, at 14:41, Jeffrey Yasskin wrote: > * I'd like users to be able to write Executors besides the simple > ThreadPoolExecutor and ProcessPoolExecutor you already have. To enable > that, could you document what the subclassing interface for Executor > looks like? that is, what code do user-written Executors need to > include? I don't think it should include direct access to > future._state like ThreadPoolExecutor uses, if at all possible. One of the difficulties here is: 1. i don't want to commit to the internal implementation of Futures 2. it might be hard to make it clear which methods are public to users and which methods are public to executor implementors > * Could you specify in what circumstances a pure computational > Future-based program may deadlock? (Ideally, that would be "never".) > Your current implementation includes two such deadlocks, for which > I've attached a test. Thanks for the tests but I wasn't planning on changing this behavior. I don't really like the idea of using the calling thread to perform the wait because: 1. 
not all executors will be able to implement that behavior 2. it can only be made to work if no wait time is specified Cheers, Brian From jyasskin at gmail.com Mon Feb 22 04:37:23 2010 From: jyasskin at gmail.com (Jeffrey Yasskin) Date: Sun, 21 Feb 2010 22:37:23 -0500 Subject: [stdlib-sig] futures - a new package for asynchronous execution In-Reply-To: References: <5d44f72f0911071137m3a499f99j9edc604bc8b9b127@mail.gmail.com> <37E25344-20F9-4D42-B982-CA43D24FA806@sweetapp.com> <5d44f72f0911080001off0d158n4c0c4d903a844516@mail.gmail.com> <216378C8-6B77-4DAC-9292-841A8E5849B5@sweetapp.com> <3BB0755A-2D85-46AF-89A6-CFC4A56744AF@sweetapp.com> <5d44f72f1002201941o76f80045ofa25238be43f455c@mail.gmail.com> Message-ID: <5d44f72f1002211937s3359b93ev8466676de818e2d4@mail.gmail.com> Where's the current version of the PEP? On Sun, Feb 21, 2010 at 1:47 AM, Brian Quinlan wrote: > > On 21 Feb 2010, at 14:41, Jeffrey Yasskin wrote: > >> Several comments: >> >> * I see you using the Executors as context managers, but no mention in >> the specification about what that does. > > I can't see such documentation for built-in Python objects. To be > symmetrical with the built-in file object, i've documented the context > manager behavior as part of the Executor.shutdown method. For locks, it has its own section: http://docs.python.org/library/threading.html#using-locks-conditions-and-semaphores-in-the-with-statement But I don't care too much about the formatting as long as the PEP specifies it clearly. >> You need to specify it. (Your >> current implementation doesn't wait in __exit__, which I think is the >> opposite of what you agreed with Antoine, but you can fix that after >> we get general agreement on the interface.) > > Fixed. > >> * I'd like users to be able to write Executors besides the simple >> ThreadPoolExecutor and ProcessPoolExecutor you already have. To enable >> that, could you document what the subclassing interface for Executor >> looks like? 
that is, what code do user-written Executors need to >> include? > > I can do that. > >> I don't think it should include direct access to >> future._state like ThreadPoolExecutor uses, if at all possible. > > Would it be reasonable to make Future an ABC, make a _Future that subclasses > it for internal usage and let other Executor subclasses define their own > Futures. What interface are you proposing for the Future ABC? It'll need to support wait() and as_completed() from non-library Futures. I wouldn't mind making the type just a duck-type (it probably wouldn't even need an ABC), although I'd like to give people trying to implement their own Executors as much help as possible. I'd assumed that giving Future some public hooks would be easier than fixing the wait() interface, but I could be wrong. >> * Could you specify in what circumstances a pure computational >> Future-based program may deadlock? (Ideally, that would be "never".) >> Your current implementation includes two such deadlocks, for which >> I've attached a test. > >> * Do you want to make calling Executor.shutdown(wait=True) from within >> the same Executor 1) detect the problem and raise an exception, 2) >> deadlock, 3) unspecified behavior, or 4) wait for all other threads >> and then let the current one continue? > > What about a note saying that using any futures functions or methods from > inside a scheduled call is likely to lead to deadlock unless care is taken? Jesse pointed out that one of the first things people try to do when using concurrency libraries is to try to use them inside themselves. I've also tried to use a futures library that forbade nested use ('cause I wrote it), and it was a real pain. 
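The deadlock under discussion arises when a task blocks on a future that can
only run on the pool the task itself occupies. A sketch using the later
stdlib `concurrent.futures` API: with two workers the nested wait completes,
while the same code with `max_workers=1` deadlocks, which is the hazard
Jeffrey's attached test exercises.

```python
import concurrent.futures as cf

executor = cf.ThreadPoolExecutor(max_workers=2)

def outer():
    # A task that submits more work to its own executor and blocks on it.
    inner = executor.submit(lambda: 21)
    return inner.result() * 2  # needs a second free worker to make progress

value = executor.submit(outer).result()
executor.shutdown()
print(value)

# With ThreadPoolExecutor(max_workers=1), the submit of `inner` could never
# be picked up: the lone worker would be blocked inside outer(), so the
# outer result() call would wait forever.
```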
It should be easy enough to detect that the caller of Executor.shutdown is one of the Executor's threads or processes, but I wouldn't mind making the obviously incorrect "wait for my own completion" deadlock or throw an exception, and it would make sense to give Executor implementors their choice of which to do. >> * This is a nit, but I think that the parameter names for >> ThreadPoolExecutor and ProcessPoolExecutor should be the same so >> people can parametrize their code on those constructors. Right now >> they're "max_threads" and "max_processes", respectively. I might >> suggest "max_workers". > > I'm not sure that I like that. In general consolidating the constructors for > executors is not going to be possible. In general, yes, but in this case they're the same, and we should try to avoid gratuitous differences. >> * You should document the exception that happens when you try to pass >> a ProcessPoolExecutor as an argument to a task executing inside >> another ProcessPoolExecutor, or make it not throw an exception and >> document that. > > The ProcessPoolExecutor limitations are the same as the multiprocessing > limitations. I can provide a note about that and a link to that module's > documentation. And multiprocessing doesn't document that its Pool requires picklability and isn't picklable itself. Saying that the ProcessPoolExecutor is equivalent to a multiprocessing.Pool should be enough for your PEP. >> * If it's intentional, you should probably document that if one >> element of a map() times out, there's no way to come back and wait >> longer to retrieve it or later elements. > > That's not obvious? Maybe. >> * You still mention run_to_futures, run_to_results, and FutureList, >> even though they're no longer proposed. > > Done. > >> >> * wait() should probably return a named_tuple or an object so we don't >> have people writing the unreadable "wait(fs)[0]". > > Done. 
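The named-tuple suggestion is what ultimately shipped: in the stdlib's
`concurrent.futures`, `wait()` returns a named tuple with `done` and
`not_done` fields, so nobody has to write `wait(fs)[0]`. A sketch:

```python
import concurrent.futures as cf

with cf.ThreadPoolExecutor(max_workers=2) as executor:
    fs = [executor.submit(pow, 2, n) for n in range(4)]
    # wait() returns a named tuple; attribute access replaces wait(fs)[0].
    res = cf.wait(fs, return_when=cf.ALL_COMPLETED)

done_values = sorted(f.result() for f in res.done)
print(done_values, len(res.not_done))
```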
>
>>
>> * Instead of "call finishes" in the description of the return_when
>> parameter, you might describe the behavior in terms of futures
>> becoming done since that's the accessor function you're using.
>
> Done.
>
>> * Is RETURN_IMMEDIATELY just a way to categorize futures into done and
>> not? Is that useful over [f for f in fs if f.done()]?
>
> That was an artifact of the previous implementation; removed.
>
>> * After shutdown, is RuntimeError the right exception, or should there
>> be a more specific exception?
>
> RuntimeError is what is raised in similar situations by threading, e.g.
> when starting an already started thread.

Ok, works for me.

On Sun, Feb 21, 2010 at 5:49 AM, Brian Quinlan wrote:
> A few extra points.
>
> On 21 Feb 2010, at 14:41, Jeffrey Yasskin wrote:
>> * I'd like users to be able to write Executors besides the simple
>> ThreadPoolExecutor and ProcessPoolExecutor you already have. To enable
>> that, could you document what the subclassing interface for Executor
>> looks like? that is, what code do user-written Executors need to
>> include? I don't think it should include direct access to
>> future._state like ThreadPoolExecutor uses, if at all possible.
>
> One of the difficulties here is:
> 1. i don't want to commit to the internal implementation of Futures

Yep, that's why I want to avoid requiring them to have direct access to the
internal variables.

> 2. it might be hard to make it clear which methods are public to users and
> which methods are public to executor implementors

One way to do it would be to create another type for implementors and pass
it to the Future constructor.

>> * Could you specify in what circumstances a pure computational
>> Future-based program may deadlock? (Ideally, that would be "never".)
>> Your current implementation includes two such deadlocks, for which
>> I've attached a test.
>
> Thanks for the tests but I wasn't planning on changing this behavior.
I > don't really like the idea of using the calling thread to perform the wait > because: > 1. not all executors will be able to implement that behavior Why not? Thread pools can implement it, and process pools make it impossible to create cycles, so they also can't deadlock. > 2. it can only be made to work if no wait time is specified With a wait time, you have to avoid stealing work, but it's also guaranteed not to deadlock, so it's fine. From brian at sweetapp.com Tue Feb 23 09:31:29 2010 From: brian at sweetapp.com (Brian Quinlan) Date: Tue, 23 Feb 2010 19:31:29 +1100 Subject: [stdlib-sig] futures - a new package for asynchronous execution In-Reply-To: <5d44f72f1002211937s3359b93ev8466676de818e2d4@mail.gmail.com> References: <5d44f72f0911071137m3a499f99j9edc604bc8b9b127@mail.gmail.com> <37E25344-20F9-4D42-B982-CA43D24FA806@sweetapp.com> <5d44f72f0911080001off0d158n4c0c4d903a844516@mail.gmail.com> <216378C8-6B77-4DAC-9292-841A8E5849B5@sweetapp.com> <3BB0755A-2D85-46AF-89A6-CFC4A56744AF@sweetapp.com> <5d44f72f1002201941o76f80045ofa25238be43f455c@mail.gmail.com> <5d44f72f1002211937s3359b93ev8466676de818e2d4@mail.gmail.com> Message-ID: <598B9770-5129-41D7-9C14-BF151CB48103@sweetapp.com> On Feb 22, 2010, at 2:37 PM, Jeffrey Yasskin wrote: > Where's the current version of the PEP? http://code.google.com/p/pythonfutures/source/browse/branches/feedback/PEP.txt > On Sun, Feb 21, 2010 at 1:47 AM, Brian Quinlan > wrote: >> >> On 21 Feb 2010, at 14:41, Jeffrey Yasskin wrote: >> >>> Several comments: >>> >>> * I see you using the Executors as context managers, but no >>> mention in >>> the specification about what that does. >> >> I can't see such documentation for built-in Python objects. To be >> symmetrical with the built-in file object, i've documented the >> context >> manager behavior as part of the Executor.shutdown method. 
> > For locks, it has its own section: > http://docs.python.org/library/threading.html#using-locks-conditions-and-semaphores-in-the-with-statement > But I don't care too much about the formatting as long as the PEP > specifies it clearly. Added. >>> You need to specify it. (Your >>> current implementation doesn't wait in __exit__, which I think is >>> the >>> opposite of what you agreed with Antoine, but you can fix that after >>> we get general agreement on the interface.) >> >> Fixed. >> >>> * I'd like users to be able to write Executors besides the simple >>> ThreadPoolExecutor and ProcessPoolExecutor you already have. To >>> enable >>> that, could you document what the subclassing interface for Executor >>> looks like? that is, what code do user-written Executors need to >>> include? >> >> I can do that. >> >>> I don't think it should include direct access to >>> future._state like ThreadPoolExecutor uses, if at all possible. >> >> Would it be reasonable to make Future an ABC, make a _Future that >> subclasses >> it for internal usage and let other Executor subclasses define >> their own >> Futures. > > What interface are you proposing for the Future ABC? It'll need to > support wait() and as_completed() from non-library Futures. I wouldn't > mind making the type just a duck-type (it probably wouldn't even need > an ABC), although I'd like to give people trying to implement their > own Executors as much help as possible. I'd assumed that giving Future > some public hooks would be easier than fixing the wait() interface, > but I could be wrong. See below. >>> * Could you specify in what circumstances a pure computational >>> Future-based program may deadlock? (Ideally, that would be "never".) >>> Your current implementation includes two such deadlocks, for which >>> I've attached a test. 
>> >>> * Do you want to make calling Executor.shutdown(wait=True) from >>> within >>> the same Executor 1) detect the problem and raise an exception, 2) >>> deadlock, 3) unspecified behavior, or 4) wait for all other threads >>> and then let the current one continue? >> >> What about a note saying that using any futures functions or >> methods from >> inside a scheduled call is likely to lead to deadlock unless care >> is taken? > > Jesse pointed out that one of the first things people try to do when > using concurrency libraries is to try to use them inside themselves. > I've also tried to use a futures library that forbade nested use > ('cause I wrote it), and it was a real pain. You can use the API from within Executor-invoked functions - you just have to be careful. > It should be easy enough to detect that the caller of > Executor.shutdown is one of the Executor's threads or processes, but I > wouldn't mind making the obviously incorrect "wait for my own > completion" deadlock or throw an exception, and it would make sense to > give Executor implementors their choice of which to do. > >>> * This is a nit, but I think that the parameter names for >>> ThreadPoolExecutor and ProcessPoolExecutor should be the same so >>> people can parametrize their code on those constructors. Right now >>> they're "max_threads" and "max_processes", respectively. I might >>> suggest "max_workers". >> >> I'm not sure that I like that. In general consolidating the >> constructors for >> executors is not going to be possible. > > In general, yes, but in this case they're the same, and we should try > to avoid gratuitous differences. num_threads and num_processes is more explicit than num_workers but I don't really care so I changed it. >>> * You should document the exception that happens when you try to >>> pass >>> a ProcessPoolExecutor as an argument to a task executing inside >>> another ProcessPoolExecutor, or make it not throw an exception and >>> document that. 
>> >> The ProcessPoolExecutor limitations are the same as the >> multiprocessing >> limitations. I can provide a note about that and a link to that >> module's >> documentation. > > And multiprocessing doesn't document that its Pool requires > picklability and isn't picklable itself. Saying that the > ProcessPoolExecutor is equivalent to a multiprocessing.Pool should be > enough for your PEP. Done. >>> * If it's intentional, you should probably document that if one >>> element of a map() times out, there's no way to come back and wait >>> longer to retrieve it or later elements. >> >> That's not obvious? > > Maybe. > >>> * You still mention run_to_futures, run_to_results, and FutureList, >>> even though they're no longer proposed. >> >> Done. >> >>> >>> * wait() should probably return a named_tuple or an object so we >>> don't >>> have people writing the unreadable "wait(fs)[0]". >> >> Done. >> >>> >>> * Instead of "call finishes" in the description of the return_when >>> parameter, you might describe the behavior in terms of futures >>> becoming done since that's the accessor function you're using. >> >> Done. >> >> >>> * Is RETURN_IMMEDIATELY just a way to categorize futures into done >>> and >>> not? Is that useful over [f for f in fs if f.done()]? >> >> That was an artifact of the previous implementation; removed. >> >>> * After shutdown, is RuntimeError the right exception, or should >>> there >>> be a more specific exception? >> >> RunTimeError is what is raised in similar situations by threading >> e.g. when >> starting an already started thread. > > Ok, works for me. > > On Sun, Feb 21, 2010 at 5:49 AM, Brian Quinlan > wrote: >> A few extra points. >> >> On 21 Feb 2010, at 14:41, Jeffrey Yasskin wrote: >>> >>> * I'd like users to be able to write Executors besides the simple >>> ThreadPoolExecutor and ProcessPoolExecutor you already have. To >>> enable >>> that, could you document what the subclassing interface for Executor >>> looks like? 
that is, what code do user-written Executors need to
>>> include? I don't think it should include direct access to
>>> future._state like ThreadPoolExecutor uses, if at all possible.
>>
>> One of the difficulties here is:
>> 1. I don't want to commit to the internal implementation of Futures
>
> Yep, that's why I'd like to avoid requiring them to have direct access
> to the internal variables.
>
>> 2. it might be hard to make it clear which methods are public to
>> users and which methods are public to executor implementors
>
> One way to do it would be to create another type for implementors and
> pass it to the Future constructor.

If we change the future interface like so:

class Future(object):
    # Existing public methods
    ...

    # For executors only
    def set_result(self, result):
        ...

    def set_exception(self, exception):
        ...

    def check_cancel_and_notify(self):
        # returns True if the Future was cancelled and
        # notifies anyone who cares i.e. waiters for
        # wait() and as_completed

Then an executor implementor need only implement:

    def submit(self, fn, *args, **kwargs):

With the logic to actually execute fn(*args, **kwargs) and update the
returned future, of course.

Thoughts?

>>> * Could you specify in what circumstances a pure computational
>>> Future-based program may deadlock? (Ideally, that would be "never".)
>>> Your current implementation includes two such deadlocks, for which
>>> I've attached a test.
>>
>> Thanks for the tests but I wasn't planning on changing this
>> behavior. I don't really like the idea of using the calling thread
>> to perform the wait because:
>> 1. not all executors will be able to implement that behavior
>
> Why not?

What if my executor sends the data to a remote cluster for execution and
running it locally isn't feasible?

> Thread pools can implement it,

Do you have a strategy in mind that would let you detect arbitrary
deadlocks in threaded futures?

Cheers,
Brian

> and process pools make it
> impossible to create cycles, so they also can't deadlock.
>
>> 2.
it can only be made to work if no wait time is specified > > With a wait time, you have to avoid stealing work, but it's also > guaranteed not to deadlock, so it's fine. -------------- next part -------------- An HTML attachment was scrubbed... URL: From jyasskin at gmail.com Tue Feb 23 21:04:17 2010 From: jyasskin at gmail.com (Jeffrey Yasskin) Date: Tue, 23 Feb 2010 15:04:17 -0500 Subject: [stdlib-sig] futures - a new package for asynchronous execution In-Reply-To: <598B9770-5129-41D7-9C14-BF151CB48103@sweetapp.com> References: <5d44f72f0911080001off0d158n4c0c4d903a844516@mail.gmail.com> <216378C8-6B77-4DAC-9292-841A8E5849B5@sweetapp.com> <3BB0755A-2D85-46AF-89A6-CFC4A56744AF@sweetapp.com> <5d44f72f1002201941o76f80045ofa25238be43f455c@mail.gmail.com> <5d44f72f1002211937s3359b93ev8466676de818e2d4@mail.gmail.com> <598B9770-5129-41D7-9C14-BF151CB48103@sweetapp.com> Message-ID: <5d44f72f1002231204w32a2ebdi5952abf645005680@mail.gmail.com> On Tue, Feb 23, 2010 at 3:31 AM, Brian Quinlan wrote: > > On Feb 22, 2010, at 2:37 PM, Jeffrey Yasskin wrote: > > Where's the current version of the PEP? > > > > http://code.google.com/p/pythonfutures/source/browse/branches/feedback/PEP.txt > > On Sun, Feb 21, 2010 at 1:47 AM, Brian Quinlan wrote: > > > On 21 Feb 2010, at 14:41, Jeffrey Yasskin wrote: > > > * I'd like users to be able to write Executors besides the simple > > ThreadPoolExecutor and ProcessPoolExecutor you already have. To enable > > that, could you document what the subclassing interface for Executor > > looks like? that is, what code do user-written Executors need to > > include? > > > I can do that. > > > I don't think it should include direct access to > > future._state like ThreadPoolExecutor uses, if at all possible. > > > Would it be reasonable to make Future an ABC, make a _Future that > subclasses > > it for internal usage and let other Executor subclasses define their own > > Futures. > > > What interface are you proposing for the Future ABC? 
It'll need to > support wait() and as_completed() from non-library Futures. I wouldn't > mind making the type just a duck-type (it probably wouldn't even need > an ABC), although I'd like to give people trying to implement their > own Executors as much help as possible. I'd assumed that giving Future > some public hooks would be easier than fixing the wait() interface, > but I could be wrong. > > > See below. > > * Could you specify in what circumstances a pure computational > > Future-based program may deadlock? (Ideally, that would be "never".) > > Your current implementation includes two such deadlocks, for which > > I've attached a test. > > > * Do you want to make calling Executor.shutdown(wait=True) from within > > the same Executor 1) detect the problem and raise an exception, 2) > > deadlock, 3) unspecified behavior, or 4) wait for all other threads > > and then let the current one continue? > > > What about a note saying that using any futures functions or methods from > > inside a scheduled call is likely to lead to deadlock unless care is taken? > > > Jesse pointed out that one of the first things people try to do when > using concurrency libraries is to try to use them inside themselves. > I've also tried to use a futures library that forbade nested use > ('cause I wrote it), and it was a real pain. > > > You can use the API from within Executor-invoked functions - you just have > to be careful. > It's the job of the PEP (and later the docs) to explain exactly what care is needed. Or were you asking if I was ok with adding that explanation to the PEP? I think that explanation is the minimum requirement (that's what I meant by "Could you specify in what circumstances a pure computational Future-based program may deadlock?"), but it would be better if it could never deadlock, which is achievable by stealing work. 
> It should be easy enough to detect that the caller of > Executor.shutdown is one of the Executor's threads or processes, but I > wouldn't mind making the obviously incorrect "wait for my own > completion" deadlock or throw an exception, and it would make sense to > give Executor implementors their choice of which to do. > > * This is a nit, but I think that the parameter names for > > ThreadPoolExecutor and ProcessPoolExecutor should be the same so > > people can parametrize their code on those constructors. Right now > > they're "max_threads" and "max_processes", respectively. I might > > suggest "max_workers". > > > I'm not sure that I like that. In general consolidating the constructors > for > > executors is not going to be possible. > > > In general, yes, but in this case they're the same, and we should try > to avoid gratuitous differences. > > > num_threads and num_processes is more explicit than num_workers but I don't > really care so I changed it. > > Thanks. * I'd like users to be able to write Executors besides the simple > > ThreadPoolExecutor and ProcessPoolExecutor you already have. To enable > > that, could you document what the subclassing interface for Executor > > looks like? that is, what code do user-written Executors need to > > include? I don't think it should include direct access to > > future._state like ThreadPoolExecutor uses, if at all possible. > > > One of the difficulties here is: > > 1. i don't want to commit to the internal implementation of Futures > > > Yep, that's why to avoid requiring them to have direct access to the > internal variables. > > 2. it might be hard to make it clear which methods are public to users and > > which methods are public to executor implementors > > > One way to do it would be to create another type for implementors and > pass it to the Future constructor. > > > If we change the future interface like so: > > class Future(object): > # Existing public methods > ... 
> # For executors only > def set_result(self): > ... > def set_exception(self): > ... > def check_cancel_and_notify(self): > # returns True if the Future was cancelled and > # notifies anyone who cares i.e. waiters for > # wait() and as_completed > > > Then an executor implementor need only implement: > > def submit(self, fn, *args, **kwargs): > > With the logic to actual execute fn(*args, **kwargs) and update the > returned future, of course. > > Thoughts? > > Could you write up the submit() implementations you're thinking of? That kind of interface extension seems right. > * Could you specify in what circumstances a pure computational > > Future-based program may deadlock? (Ideally, that would be "never".) > > Your current implementation includes two such deadlocks, for which > > I've attached a test. > > > Thanks for the tests but I wasn't planning on changing this behavior. I > > don't really like the idea of using the calling thread to perform the wait > > because: > > 1. not all executors will be able to implement that behavior > > > Why not? > > > What if my executor sends the data to a remove cluster for execution and > running it locally isn't feasible? > If the executor can't send itself across the network, you're fine since it'll be impossible to create cycles. If the executor can add threads dynamically when it notices that it's not using enough of the CPU, it's also fine since you remove the limited resource. If the executor can be copied and cannot add threads, then it sent the data one way somehow, so it should be able to send the data the other way to execute locally. It _is_ possible to run out of memory or stack space. Is that what you're worried about? Thread pools can implement it, > > > Do you have a strategy in mind that would let you detect arbitrary > deadlocks in threaded futures? > Yes, AFAIK work stealing suffices for systems made up only of futures and executors. 
Non-future blocking objects can reintroduce deadlocks, but I believe futures alone can't. > > Cheers, > Brian > > and process pools make it > impossible to create cycles, so they also can't deadlock. > > 2. it can only be made to work if no wait time is specified > > > With a wait time, you have to avoid stealing work, but it's also > guaranteed not to deadlock, so it's fine. > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Tue Feb 23 23:00:30 2010 From: guido at python.org (Guido van Rossum) Date: Tue, 23 Feb 2010 14:00:30 -0800 Subject: [stdlib-sig] futures - a new package for asynchronous execution In-Reply-To: <5d44f72f1002231204w32a2ebdi5952abf645005680@mail.gmail.com> References: <216378C8-6B77-4DAC-9292-841A8E5849B5@sweetapp.com> <3BB0755A-2D85-46AF-89A6-CFC4A56744AF@sweetapp.com> <5d44f72f1002201941o76f80045ofa25238be43f455c@mail.gmail.com> <5d44f72f1002211937s3359b93ev8466676de818e2d4@mail.gmail.com> <598B9770-5129-41D7-9C14-BF151CB48103@sweetapp.com> <5d44f72f1002231204w32a2ebdi5952abf645005680@mail.gmail.com> Message-ID: On Tue, Feb 23, 2010 at 12:04 PM, Jeffrey Yasskin wrote: > On Tue, Feb 23, 2010 at 3:31 AM, Brian Quinlan wrote: >> >> On Feb 22, 2010, at 2:37 PM, Jeffrey Yasskin wrote: >> >> Where's the current version of the PEP? 
>> >> >> http://code.google.com/p/pythonfutures/source/browse/branches/feedback/PEP.txt Now in SVN as PEP 3148 - http://python.org/dev/peps/pep-3148/ -- --Guido van Rossum (python.org/~guido) From ssteinerx at gmail.com Tue Feb 23 23:13:10 2010 From: ssteinerx at gmail.com (ssteinerX@gmail.com) Date: Tue, 23 Feb 2010 17:13:10 -0500 Subject: [stdlib-sig] futures - a new package for asynchronous execution In-Reply-To: References: <216378C8-6B77-4DAC-9292-841A8E5849B5@sweetapp.com> <3BB0755A-2D85-46AF-89A6-CFC4A56744AF@sweetapp.com> <5d44f72f1002201941o76f80045ofa25238be43f455c@mail.gmail.com> <5d44f72f1002211937s3359b93ev8466676de818e2d4@mail.gmail.com> <598B9770-5129-41D7-9C14-BF151CB48103@sweetapp.com> <5d44f72f1002231204w32a2ebdi5952abf645005680@mail.gmail.com> Message-ID: <5ADA2C3B-67FE-4E1A-B120-DAF15D23E7B2@gmail.com> On Feb 23, 2010, at 5:00 PM, Guido van Rossum wrote: > On Tue, Feb 23, 2010 at 12:04 PM, Jeffrey Yasskin wrote: >> On Tue, Feb 23, 2010 at 3:31 AM, Brian Quinlan wrote: >>> >>> On Feb 22, 2010, at 2:37 PM, Jeffrey Yasskin wrote: >>> >>> Where's the current version of the PEP? >>> >>> >>> http://code.google.com/p/pythonfutures/source/browse/branches/feedback/PEP.txt > > Now in SVN as PEP 3148 - http://python.org/dev/peps/pep-3148/ I get a 404 on that URL. 
S From brett at python.org Tue Feb 23 23:36:04 2010 From: brett at python.org (Brett Cannon) Date: Tue, 23 Feb 2010 14:36:04 -0800 Subject: [stdlib-sig] futures - a new package for asynchronous execution In-Reply-To: <5ADA2C3B-67FE-4E1A-B120-DAF15D23E7B2@gmail.com> References: <3BB0755A-2D85-46AF-89A6-CFC4A56744AF@sweetapp.com> <5d44f72f1002201941o76f80045ofa25238be43f455c@mail.gmail.com> <5d44f72f1002211937s3359b93ev8466676de818e2d4@mail.gmail.com> <598B9770-5129-41D7-9C14-BF151CB48103@sweetapp.com> <5d44f72f1002231204w32a2ebdi5952abf645005680@mail.gmail.com> <5ADA2C3B-67FE-4E1A-B120-DAF15D23E7B2@gmail.com> Message-ID: On Tue, Feb 23, 2010 at 14:13, ssteinerX at gmail.com wrote: > > On Feb 23, 2010, at 5:00 PM, Guido van Rossum wrote: > > > On Tue, Feb 23, 2010 at 12:04 PM, Jeffrey Yasskin > wrote: > >> On Tue, Feb 23, 2010 at 3:31 AM, Brian Quinlan > wrote: > >>> > >>> On Feb 22, 2010, at 2:37 PM, Jeffrey Yasskin wrote: > >>> > >>> Where's the current version of the PEP? > >>> > >>> > >>> > http://code.google.com/p/pythonfutures/source/browse/branches/feedback/PEP.txt > > > > Now in SVN as PEP 3148 - http://python.org/dev/peps/pep-3148/ > > I get a 404 on that URL. > It's because one of the PEPs has become improperly encoded; you can run 'make' in a PEPs checkout to trigger the error. -Brett > > S > > _______________________________________________ > stdlib-sig mailing list > stdlib-sig at python.org > http://mail.python.org/mailman/listinfo/stdlib-sig > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From guido at python.org Wed Feb 24 00:30:10 2010 From: guido at python.org (Guido van Rossum) Date: Tue, 23 Feb 2010 15:30:10 -0800 Subject: [stdlib-sig] futures - a new package for asynchronous execution In-Reply-To: References: <3BB0755A-2D85-46AF-89A6-CFC4A56744AF@sweetapp.com> <5d44f72f1002201941o76f80045ofa25238be43f455c@mail.gmail.com> <5d44f72f1002211937s3359b93ev8466676de818e2d4@mail.gmail.com> <598B9770-5129-41D7-9C14-BF151CB48103@sweetapp.com> <5d44f72f1002231204w32a2ebdi5952abf645005680@mail.gmail.com> <5ADA2C3B-67FE-4E1A-B120-DAF15D23E7B2@gmail.com> Message-ID: On Tue, Feb 23, 2010 at 2:36 PM, Brett Cannon wrote: > > > On Tue, Feb 23, 2010 at 14:13, ssteinerX at gmail.com > wrote: >> >> On Feb 23, 2010, at 5:00 PM, Guido van Rossum wrote: >> >> > On Tue, Feb 23, 2010 at 12:04 PM, Jeffrey Yasskin >> > wrote: >> >> On Tue, Feb 23, 2010 at 3:31 AM, Brian Quinlan >> >> wrote: >> >>> >> >>> On Feb 22, 2010, at 2:37 PM, Jeffrey Yasskin wrote: >> >>> >> >>> Where's the current version of the PEP? >> >>> >> >>> >> >>> >> >>> http://code.google.com/p/pythonfutures/source/browse/branches/feedback/PEP.txt >> > >> > Now in SVN as PEP 3148 - http://python.org/dev/peps/pep-3148/ >> >> I get a 404 on that URL. > > It's because one of the PEPs has become improperly encoded; you can run > 'make' in a PEPs checkout to trigger the error. Eh, sorry! Fixed now. 
-- --Guido van Rossum (python.org/~guido) From brian at sweetapp.com Thu Feb 25 10:33:09 2010 From: brian at sweetapp.com (Brian Quinlan) Date: Thu, 25 Feb 2010 20:33:09 +1100 Subject: [stdlib-sig] futures - a new package for asynchronous execution In-Reply-To: <5d44f72f1002231204w32a2ebdi5952abf645005680@mail.gmail.com> References: <5d44f72f0911080001off0d158n4c0c4d903a844516@mail.gmail.com> <216378C8-6B77-4DAC-9292-841A8E5849B5@sweetapp.com> <3BB0755A-2D85-46AF-89A6-CFC4A56744AF@sweetapp.com> <5d44f72f1002201941o76f80045ofa25238be43f455c@mail.gmail.com> <5d44f72f1002211937s3359b93ev8466676de818e2d4@mail.gmail.com> <598B9770-5129-41D7-9C14-BF151CB48103@sweetapp.com> <5d44f72f1002231204w32a2ebdi5952abf645005680@mail.gmail.com> Message-ID: <07A4B971-3691-4DBF-B62E-75560E9B817F@sweetapp.com> The PEP officially lives at: http://python.org/dev/peps/pep-3148 but this version is the most up-to-date: http://code.google.com/p/pythonfutures/source/browse/branches/feedback/pep-3148.txt On Feb 24, 2010, at 7:04 AM, Jeffrey Yasskin wrote: > On Tue, Feb 23, 2010 at 3:31 AM, Brian Quinlan > wrote: > > On Feb 22, 2010, at 2:37 PM, Jeffrey Yasskin wrote: > >> Where's the current version of the PEP? > > http://code.google.com/p/pythonfutures/source/browse/branches/feedback/PEP.txt > >> On Sun, Feb 21, 2010 at 1:47 AM, Brian Quinlan >> wrote: >>> >>> On 21 Feb 2010, at 14:41, Jeffrey Yasskin wrote: >>> > >>>> * I'd like users to be able to write Executors besides the simple >>>> ThreadPoolExecutor and ProcessPoolExecutor you already have. To >>>> enable >>>> that, could you document what the subclassing interface for >>>> Executor >>>> looks like? that is, what code do user-written Executors need to >>>> include? >>> >>> I can do that. >>> >>>> I don't think it should include direct access to >>>> future._state like ThreadPoolExecutor uses, if at all possible. 
>>> >>> Would it be reasonable to make Future an ABC, make a _Future that >>> subclasses >>> it for internal usage and let other Executor subclasses define >>> their own >>> Futures. >> >> What interface are you proposing for the Future ABC? It'll need to >> support wait() and as_completed() from non-library Futures. I >> wouldn't >> mind making the type just a duck-type (it probably wouldn't even need >> an ABC), although I'd like to give people trying to implement their >> own Executors as much help as possible. I'd assumed that giving >> Future >> some public hooks would be easier than fixing the wait() interface, >> but I could be wrong. > > See below. > >>>> * Could you specify in what circumstances a pure computational >>>> Future-based program may deadlock? (Ideally, that would be >>>> "never".) >>>> Your current implementation includes two such deadlocks, for which >>>> I've attached a test. >>> >>>> * Do you want to make calling Executor.shutdown(wait=True) from >>>> within >>>> the same Executor 1) detect the problem and raise an exception, 2) >>>> deadlock, 3) unspecified behavior, or 4) wait for all other threads >>>> and then let the current one continue? >>> >>> What about a note saying that using any futures functions or >>> methods from >>> inside a scheduled call is likely to lead to deadlock unless care >>> is taken? >> >> Jesse pointed out that one of the first things people try to do when >> using concurrency libraries is to try to use them inside themselves. >> I've also tried to use a futures library that forbade nested use >> ('cause I wrote it), and it was a real pain. > > You can use the API from within Executor-invoked functions - you > just have to be careful. > > It's the job of the PEP (and later the docs) to explain exactly what > care is needed. Or were you asking if I was ok with adding that > explanation to the PEP? 
I think that explanation is the minimum > requirement (that's what I meant by "Could you specify in what > circumstances a pure computational > Future-based program may deadlock?"), but it would be better if it > could never deadlock, which is achievable by stealing work. I don't think so, see below. >> It should be easy enough to detect that the caller of >> Executor.shutdown is one of the Executor's threads or processes, >> but I >> wouldn't mind making the obviously incorrect "wait for my own >> completion" deadlock or throw an exception, and it would make sense >> to >> give Executor implementors their choice of which to do. >> >>>> * This is a nit, but I think that the parameter names for >>>> ThreadPoolExecutor and ProcessPoolExecutor should be the same so >>>> people can parametrize their code on those constructors. Right now >>>> they're "max_threads" and "max_processes", respectively. I might >>>> suggest "max_workers". >>> >>> I'm not sure that I like that. In general consolidating the >>> constructors for >>> executors is not going to be possible. >> >> In general, yes, but in this case they're the same, and we should try >> to avoid gratuitous differences. > > num_threads and num_processes is more explicit than num_workers but > I don't really care so I changed it. > > Thanks. > >>>> * I'd like users to be able to write Executors besides the simple >>>> ThreadPoolExecutor and ProcessPoolExecutor you already have. To >>>> enable >>>> that, could you document what the subclassing interface for >>>> Executor >>>> looks like? that is, what code do user-written Executors need to >>>> include? I don't think it should include direct access to >>>> future._state like ThreadPoolExecutor uses, if at all possible. >>> >>> One of the difficulties here is: >>> 1. i don't want to commit to the internal implementation of Futures >> >> Yep, that's why to avoid requiring them to have direct access to the >> internal variables. >> >>> 2. 
it might be hard to make it clear which methods are public to >>> users and >>> which methods are public to executor implementors >> >> One way to do it would be to create another type for implementors and >> pass it to the Future constructor. > > If we change the future interface like so: > > class Future(object): > # Existing public methods > ... > # For executors only > def set_result(self): > ... > def set_exception(self): > ... > def check_cancel_and_notify(self): > # returns True if the Future was cancelled and > # notifies anyone who cares i.e. waiters for > # wait() and as_completed > > > Then an executor implementor need only implement: > > def submit(self, fn, *args, **kwargs): > > With the logic to actual execute fn(*args, **kwargs) and update the > returned future, of course. > > Thoughts? > > Could you write up the submit() implementations you're thinking of? > That kind of interface extension seems right. I mean that submit will implement all of the application-specific logic and call the above methods as it processes the future. I added a note (but not much in the way of details) about that. > >>>> * Could you specify in what circumstances a pure computational >>>> Future-based program may deadlock? (Ideally, that would be >>>> "never".) >>>> Your current implementation includes two such deadlocks, for which >>>> I've attached a test. >>> >>> Thanks for the tests but I wasn't planning on changing this >>> behavior. I >>> don't really like the idea of using the calling thread to perform >>> the wait >>> because: >>> 1. not all executors will be able to implement that behavior >> >> Why not? > > What if my executor sends the data to a remove cluster for execution > and running it locally isn't feasible? > > If the executor can't send itself across the network, you're fine > since it'll be impossible to create cycles. 
If the executor can add > threads dynamically when it notices that it's not using enough of > the CPU, it's also fine since you remove the limited resource. If > the executor can be copied and cannot add threads, then it sent the > data one way somehow, so it should be able to send the data the > other way to execute locally. It _is_ possible to run out of memory > or stack space. Is that what you're worried about? > >> Thread pools can implement it, > > Do you have a strategy in mind that would let you detect arbitrary > deadlocks in threaded futures? > > Yes, AFAIK work stealing suffices for systems made up only of > futures and executors. Non-future blocking objects can reintroduce > deadlocks, but I believe futures alone can't. How would work stealing help with this sort of deadlock? import time def wait_on_b(): time.sleep(5) print(b.result()) return 5 def wait_on_a(): time.sleep(5) print(a.result()) return 6 f = ThreadPoolExecutor(max_workers=2) a = f.submit(wait_on_b) b = f.submit(wait_on_a) In any case, I've updated the docs and PEP to indicate that deadlocks are possible. Cheers, Brian > > Cheers, > Brian > >> and process pools make it >> impossible to create cycles, so they also can't deadlock. >> >>> 2. it can only be made to work if no wait time is specified >> >> With a wait time, you have to avoid stealing work, but it's also >> guaranteed not to deadlock, so it's fine. > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jyasskin at gmail.com Thu Feb 25 18:27:14 2010 From: jyasskin at gmail.com (Jeffrey Yasskin) Date: Thu, 25 Feb 2010 09:27:14 -0800 Subject: [stdlib-sig] futures - a new package for asynchronous execution In-Reply-To: <07A4B971-3691-4DBF-B62E-75560E9B817F@sweetapp.com> References: <216378C8-6B77-4DAC-9292-841A8E5849B5@sweetapp.com> <3BB0755A-2D85-46AF-89A6-CFC4A56744AF@sweetapp.com> <5d44f72f1002201941o76f80045ofa25238be43f455c@mail.gmail.com> <5d44f72f1002211937s3359b93ev8466676de818e2d4@mail.gmail.com> <598B9770-5129-41D7-9C14-BF151CB48103@sweetapp.com> <5d44f72f1002231204w32a2ebdi5952abf645005680@mail.gmail.com> <07A4B971-3691-4DBF-B62E-75560E9B817F@sweetapp.com> Message-ID: <5d44f72f1002250927r66d0fd4ei34d79cef7c0af54d@mail.gmail.com> On Thu, Feb 25, 2010 at 1:33 AM, Brian Quinlan wrote: > The PEP officially lives at: > http://python.org/dev/peps/pep-3148 > > but this version is the most up-to-date: > > http://code.google.com/p/pythonfutures/source/browse/branches/feedback/pep-3148.txt > > > On Feb 24, 2010, at 7:04 AM, Jeffrey Yasskin wrote: > > On Tue, Feb 23, 2010 at 3:31 AM, Brian Quinlan wrote: > >> >> On Feb 22, 2010, at 2:37 PM, Jeffrey Yasskin wrote: >> >> * Could you specify in what circumstances a pure computational >> >> Future-based program may deadlock? (Ideally, that would be "never".) >> >> Your current implementation includes two such deadlocks, for which >> >> I've attached a test. >> >> >> * Do you want to make calling Executor.shutdown(wait=True) from within >> >> the same Executor 1) detect the problem and raise an exception, 2) >> >> deadlock, 3) unspecified behavior, or 4) wait for all other threads >> >> and then let the current one continue? >> >> >> What about a note saying that using any futures functions or methods from >> >> inside a scheduled call is likely to lead to deadlock unless care is >> taken? 
>> >> >> Jesse pointed out that one of the first things people try to do when >> using concurrency libraries is to try to use them inside themselves. >> I've also tried to use a futures library that forbade nested use >> ('cause I wrote it), and it was a real pain. >> >> >> You can use the API from within Executor-invoked functions - you just have >> to be careful. >> > > It's the job of the PEP (and later the docs) to explain exactly what care > is needed. Or were you asking if I was ok with adding that explanation to > the PEP? I think that explanation is the minimum requirement (that's what I > meant by "Could you specify in what circumstances a pure computational > Future-based program may deadlock?"), but it would be better if it could > never deadlock, which is achievable by stealing work. > > > I don't think so, see below. > > It should be easy enough to detect that the caller of >> Executor.shutdown is one of the Executor's threads or processes, but I >> wouldn't mind making the obviously incorrect "wait for my own >> completion" deadlock or throw an exception, and it would make sense to >> give Executor implementors their choice of which to do. >> >> * This is a nit, but I think that the parameter names for >> >> ThreadPoolExecutor and ProcessPoolExecutor should be the same so >> >> people can parametrize their code on those constructors. Right now >> >> they're "max_threads" and "max_processes", respectively. I might >> >> suggest "max_workers". >> >> >> I'm not sure that I like that. In general consolidating the constructors >> for >> >> executors is not going to be possible. >> >> >> In general, yes, but in this case they're the same, and we should try >> to avoid gratuitous differences. >> >> >> num_threads and num_processes is more explicit than num_workers but I >> don't really care so I changed it. >> >> Thanks. > > * I'd like users to be able to write Executors besides the simple >> >> ThreadPoolExecutor and ProcessPoolExecutor you already have. 
To enable >> >> that, could you document what the subclassing interface for Executor >> >> looks like? that is, what code do user-written Executors need to >> >> include? I don't think it should include direct access to >> >> future._state like ThreadPoolExecutor uses, if at all possible. >> >> >> One of the difficulties here is: >> >> 1. i don't want to commit to the internal implementation of Futures >> >> >> Yep, that's why to avoid requiring them to have direct access to the >> internal variables. >> >> 2. it might be hard to make it clear which methods are public to users and >> >> which methods are public to executor implementors >> >> >> One way to do it would be to create another type for implementors and >> pass it to the Future constructor. >> >> >> If we change the future interface like so: >> >> class Future(object): >> # Existing public methods >> ... >> # For executors only >> def set_result(self): >> ... >> def set_exception(self): >> ... >> def check_cancel_and_notify(self): >> # returns True if the Future was cancelled and >> # notifies anyone who cares i.e. waiters for >> # wait() and as_completed >> > > >> >> > Then an executor implementor need only implement: >> >> def submit(self, fn, *args, **kwargs): >> >> With the logic to actual execute fn(*args, **kwargs) and update the >> returned future, of course. >> >> Thoughts? >> >> Could you write up the submit() implementations you're thinking of? That > kind of interface extension seems right. > > > I mean that submit will implement all of the application-specific logic and > call the above methods as it processes the future. I added a note (but not > much in the way of details) about that. > > Your process pool still relies on future._condition, but I think you can just delete that line and everything will still work. This seems fine to me. Thanks! > > >> * Could you specify in what circumstances a pure computational >> >> Future-based program may deadlock? (Ideally, that would be "never".) 
>> >> Your current implementation includes two such deadlocks, for which >> >> I've attached a test. >> >> >> Thanks for the tests but I wasn't planning on changing this behavior. I >> >> don't really like the idea of using the calling thread to perform the wait >> >> because: >> >> 1. not all executors will be able to implement that behavior >> >> >> Why not? >> >> >> What if my executor sends the data to a remove cluster for execution and >> running it locally isn't feasible? >> > > If the executor can't send itself across the network, you're fine since > it'll be impossible to create cycles. If the executor can add threads > dynamically when it notices that it's not using enough of the CPU, it's also > fine since you remove the limited resource. If the executor can be copied > and cannot add threads, then it sent the data one way somehow, so it should > be able to send the data the other way to execute locally. It _is_ possible > to run out of memory or stack space. Is that what you're worried about? > > Thread pools can implement it, >> >> >> Do you have a strategy in mind that would let you detect arbitrary >> deadlocks in threaded futures? >> > > Yes, AFAIK work stealing suffices for systems made up only of futures and > executors. Non-future blocking objects can reintroduce deadlocks, but I > believe futures alone can't. > > > How would work stealing help with this sort of deadlock? > > import time > def wait_on_b(): > time.sleep(5) > print(b.result()) > return 5 > > def wait_on_a(): > time.sleep(5) > print(a.result()) > return 6 > > > f = ThreadPoolExecutor(max_workers=2) > a = f.submit(wait_on_b) > b = f.submit(wait_on_a) > > Heh. 
If you're going to put that in the pep, at least make it correct (sleeping is not synchronization): import threading condition = threading.Condition(threading.Lock()) a = None b = None def wait_on_b(): with condition: while b is None: condition.wait() print(b.result()) return 5 def wait_on_a(): with condition: while a is None: condition.wait() print(a.result()) return 6 f = ThreadPoolExecutor(max_workers=2) with condition: a = f.submit(wait_on_b) b = f.submit(wait_on_a) condition.notifyAll() > In any case, I've updated the docs and PEP to indicate that deadlocks are > possible. > Thanks. I still disagree, and think users are much more likely to be surprised by occasional deadlocks due to cycles of executors than they are about guaranteed deadlocks from cycles of futures, but I don't want to be the only one holding up the PEP by insisting on this. I think there are places the names could be improved, and Jesse probably has an opinion on exactly where this should go in the package hierarchy, but I think it will make a good addition to the standard library. Thanks for working on it! Jeffrey -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jnoller at gmail.com Thu Feb 25 19:49:20 2010 From: jnoller at gmail.com (Jesse Noller) Date: Thu, 25 Feb 2010 13:49:20 -0500 Subject: [stdlib-sig] futures - a new package for asynchronous execution In-Reply-To: <5d44f72f1002250927r66d0fd4ei34d79cef7c0af54d@mail.gmail.com> References: <3BB0755A-2D85-46AF-89A6-CFC4A56744AF@sweetapp.com> <5d44f72f1002201941o76f80045ofa25238be43f455c@mail.gmail.com> <5d44f72f1002211937s3359b93ev8466676de818e2d4@mail.gmail.com> <598B9770-5129-41D7-9C14-BF151CB48103@sweetapp.com> <5d44f72f1002231204w32a2ebdi5952abf645005680@mail.gmail.com> <07A4B971-3691-4DBF-B62E-75560E9B817F@sweetapp.com> <5d44f72f1002250927r66d0fd4ei34d79cef7c0af54d@mail.gmail.com> Message-ID: <4222a8491002251049p6fc0c8a7sd3d7417a6b819b90@mail.gmail.com> On Thu, Feb 25, 2010 at 12:27 PM, Jeffrey Yasskin wrote: ... snip >> >> In any case, I've updated the docs and PEP to indicate that deadlocks are >> possible. > > Thanks. I still disagree, and think users are much more likely to be > surprised by occasional deadlocks due to cycles of executors than they are > about guaranteed deadlocks from cycles of futures, but I don't want to be > the only one holding up the PEP by insisting on this. > I think there are places the names could be improved, and Jesse probably has > an opinion on exactly where this should go in the package hierarchy, but I > think it will make a good addition to the standard library. Thanks for > working on it! > Jeffrey Yes; I think this needs to be part of a new "concurrent" package in the stdlib - e.g. concurrent.futures, understanding things within multiprocessing will be put in there shortly, and possibly other things such as a threadpool and other common sugary abstractions. 
jesse From solipsis at pitrou.net Thu Feb 25 19:56:55 2010 From: solipsis at pitrou.net (Antoine Pitrou) Date: Thu, 25 Feb 2010 10:56:55 -0800 Subject: [stdlib-sig] futures - a new package for asynchronous execution In-Reply-To: <5d44f72f1002250927r66d0fd4ei34d79cef7c0af54d@mail.gmail.com> References: <216378C8-6B77-4DAC-9292-841A8E5849B5@sweetapp.com> <3BB0755A-2D85-46AF-89A6-CFC4A56744AF@sweetapp.com> <5d44f72f1002201941o76f80045ofa25238be43f455c@mail.gmail.com> <5d44f72f1002211937s3359b93ev8466676de818e2d4@mail.gmail.com> <598B9770-5129-41D7-9C14-BF151CB48103@sweetapp.com> <5d44f72f1002231204w32a2ebdi5952abf645005680@mail.gmail.com> <07A4B971-3691-4DBF-B62E-75560E9B817F@sweetapp.com> <5d44f72f1002250927r66d0fd4ei34d79cef7c0af54d@mail.gmail.com> Message-ID: <20100225105655.2ef32fe5@msiwind> Hey people, could you strip some quoting when you are replying to each other's e-mails? It would make following the discussion much easier :) Regards Antoine. Le Thu, 25 Feb 2010 09:27:14 -0800, Jeffrey Yasskin a écrit : > On Thu, Feb 25, 2010 at 1:33 AM, Brian Quinlan > wrote: > [snip] From solipsis at pitrou.net Thu Feb 25 20:15:02 2010 From: solipsis at pitrou.net (Antoine Pitrou) Date: Thu, 25 Feb 2010 11:15:02 -0800 Subject: [stdlib-sig] futures - a new package for asynchronous execution In-Reply-To: <07A4B971-3691-4DBF-B62E-75560E9B817F@sweetapp.com> References: <5d44f72f0911080001off0d158n4c0c4d903a844516@mail.gmail.com> <216378C8-6B77-4DAC-9292-841A8E5849B5@sweetapp.com> <3BB0755A-2D85-46AF-89A6-CFC4A56744AF@sweetapp.com> <5d44f72f1002201941o76f80045ofa25238be43f455c@mail.gmail.com> <5d44f72f1002211937s3359b93ev8466676de818e2d4@mail.gmail.com> <598B9770-5129-41D7-9C14-BF151CB48103@sweetapp.com> <5d44f72f1002231204w32a2ebdi5952abf645005680@mail.gmail.com> <07A4B971-3691-4DBF-B62E-75560E9B817F@sweetapp.com> Message-ID: <20100225111502.3c647a12@msiwind> Le Thu, 25 Feb 2010 20:33:09 +1100, Brian Quinlan a écrit : > > In any case, I've updated the docs and PEP to indicate
that > deadlocks are possible. For the record, I think that potential deadlocks simply by using a library function (other than locks themselves) are a bad thing. It would be better if the library either avoided deadlocks, or detected them and raised an exception instead. (admittedly, we already have such an issue with the import lock) Regards Antoine. From brian at sweetapp.com Fri Feb 26 04:10:15 2010 From: brian at sweetapp.com (Brian Quinlan) Date: Fri, 26 Feb 2010 14:10:15 +1100 Subject: [stdlib-sig] futures - a new package for asynchronous execution In-Reply-To: <5d44f72f1002250927r66d0fd4ei34d79cef7c0af54d@mail.gmail.com> References: <216378C8-6B77-4DAC-9292-841A8E5849B5@sweetapp.com> <3BB0755A-2D85-46AF-89A6-CFC4A56744AF@sweetapp.com> <5d44f72f1002201941o76f80045ofa25238be43f455c@mail.gmail.com> <5d44f72f1002211937s3359b93ev8466676de818e2d4@mail.gmail.com> <598B9770-5129-41D7-9C14-BF151CB48103@sweetapp.com> <5d44f72f1002231204w32a2ebdi5952abf645005680@mail.gmail.com> <07A4B971-3691-4DBF-B62E-75560E9B817F@sweetapp.com> <5d44f72f1002250927r66d0fd4ei34d79cef7c0af54d@mail.gmail.com> Message-ID: <50C77B57-0C81-49B9-99ED-6CA3449D7AE6@sweetapp.com> On Feb 26, 2010, at 4:27 AM, Jeffrey Yasskin wrote: > On Thu, Feb 25, 2010 at 1:33 AM, Brian Quinlan > wrote: > Your process pool still relies on future._condition, but I think you > can just delete that line and everything will still work. This seems > fine to me. Thanks! Oops. Fixed. Thanks. > Heh. If you're going to put that in the pep, at least make it > correct (sleeping is not synchronization): I can't tell if you are joking or not. Was my demonstration of a possible deadlock scenario really unclear? 
> import threading > condition = threading.Condition(threading.Lock()) > a = None > b = None > > def wait_on_b(): > with condition: > while b is None: > condition.wait() > print(b.result()) > return 5 > > def wait_on_a(): > with condition: > while a is None: > condition.wait() > print(a.result()) > return 6 > > f = ThreadPoolExecutor(max_workers=2) > with condition: > a = f.submit(wait_on_b) > b = f.submit(wait_on_a) > condition.notifyAll() > > In any case, I've updated the docs and PEP to indicate that > deadlocks are possible. > > Thanks. I still disagree, and think users are much more likely to be > surprised by occasional deadlocks due to cycles of executors than > they are about guaranteed deadlocks from cycles of futures, but I > don't want to be the only one holding up the PEP by insisting on this. Cycles of futures are not guaranteed to deadlock. Remove the sleeps from my example and it will deadlock a small percentage of the time. Cheers, Brian > I think there are places the names could be improved, and Jesse > probably has an opinion on exactly where this should go in the > package hierarchy, but I think it will make a good addition to the > standard library. Thanks for working on it! > > Jeffrey -------------- next part -------------- An HTML attachment was scrubbed... 
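Antoine's detect-and-raise alternative can be prototyped without touching executor internals: wrap submit() so that result() refuses to block when called from one of the executor's own worker threads. Everything here (GuardedExecutor, DeadlockError) is a hypothetical illustration, not the PEP's API, and it is deliberately conservative: it rejects any in-worker wait on a sibling future, whether or not a true cycle exists.

```python
# Hypothetical sketch of "detect the deadlock and raise an exception":
# futures handed out by this wrapper refuse to block inside the
# executor's own worker threads.
import threading
from concurrent.futures import ThreadPoolExecutor

class DeadlockError(RuntimeError):
    pass

class GuardedExecutor:
    def __init__(self, max_workers):
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._worker_ids = set()

    def submit(self, fn, *args, **kwargs):
        def record_and_run():
            # Remember which thread idents belong to this executor.
            self._worker_ids.add(threading.get_ident())
            return fn(*args, **kwargs)
        future = self._pool.submit(record_and_run)
        unguarded_result = future.result
        def guarded_result(timeout=None):
            # Conservative: reject ANY wait issued from our own workers.
            if threading.get_ident() in self._worker_ids:
                raise DeadlockError('result() called from a worker '
                                    'of the same executor')
            return unguarded_result(timeout)
        future.result = guarded_result  # shadow the bound method
        return future

# With one worker, waiting on a sibling future would deadlock forever;
# the guard raises instead.
ex = GuardedExecutor(max_workers=1)

def outer():
    inner = ex.submit(lambda: 42)
    try:
        return inner.result()
    except DeadlockError:
        return 'refused to block'

outcome = ex.submit(outer).result(timeout=30)
print(outcome)  # refused to block
```

Work stealing would instead have the blocked worker run the queued call in place; the two approaches trade an exception for transparent progress.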
URL: From jyasskin at gmail.com Fri Feb 26 04:26:17 2010 From: jyasskin at gmail.com (Jeffrey Yasskin) Date: Thu, 25 Feb 2010 19:26:17 -0800 Subject: [stdlib-sig] futures - a new package for asynchronous execution In-Reply-To: <50C77B57-0C81-49B9-99ED-6CA3449D7AE6@sweetapp.com> References: <3BB0755A-2D85-46AF-89A6-CFC4A56744AF@sweetapp.com> <5d44f72f1002201941o76f80045ofa25238be43f455c@mail.gmail.com> <5d44f72f1002211937s3359b93ev8466676de818e2d4@mail.gmail.com> <598B9770-5129-41D7-9C14-BF151CB48103@sweetapp.com> <5d44f72f1002231204w32a2ebdi5952abf645005680@mail.gmail.com> <07A4B971-3691-4DBF-B62E-75560E9B817F@sweetapp.com> <5d44f72f1002250927r66d0fd4ei34d79cef7c0af54d@mail.gmail.com> <50C77B57-0C81-49B9-99ED-6CA3449D7AE6@sweetapp.com> Message-ID: <5d44f72f1002251926y497fc15bge94bb6b3eacb9897@mail.gmail.com> On Thu, Feb 25, 2010 at 7:10 PM, Brian Quinlan wrote: > On Feb 26, 2010, at 4:27 AM, Jeffrey Yasskin wrote: >> Heh. If you're going to put that in the pep, at least make it correct >> (sleeping is not synchronization): > > I can't tell if you are joking or not. Was my demonstration of a possible > deadlock scenario really unclear? It's clear; it's just wrong code, even if the futures weren't a cycle. Waiting using sleep in any decently-sized system is guaranteed to cause problems. Yes, this example will work nearly every time (although if you get your load high enough, you'll still see NameErrors), but it's not the kind of thing we should be showing users. (For that matter, communicating between futures using globals is also a bad use of them, but it's not outright broken.) >> Thanks. I still disagree, and think users are much more likely to be >> surprised by occasional deadlocks due to cycles of executors than they are >> about guaranteed deadlocks from cycles of futures, but I don't want to be >> the only one holding up the PEP by insisting on this. > > Cycles of futures are not guaranteed to deadlock. Remove the sleeps from my > example and it will deadlock a small percentage of the time. It only fails to deadlock when it fails to create a cycle of futures. It sounds like Antoine also wants you to either have the threaded futures steal work or detect executor cycles and raise an exception. From brian at sweetapp.com Fri Feb 26 04:28:01 2010 From: brian at sweetapp.com (Brian Quinlan) Date: Fri, 26 Feb 2010 14:28:01 +1100 Subject: [stdlib-sig] futures - a new package for asynchronous execution In-Reply-To: <4222a8491002251049p6fc0c8a7sd3d7417a6b819b90@mail.gmail.com> References: <3BB0755A-2D85-46AF-89A6-CFC4A56744AF@sweetapp.com> <5d44f72f1002201941o76f80045ofa25238be43f455c@mail.gmail.com> <5d44f72f1002211937s3359b93ev8466676de818e2d4@mail.gmail.com> <598B9770-5129-41D7-9C14-BF151CB48103@sweetapp.com> <5d44f72f1002231204w32a2ebdi5952abf645005680@mail.gmail.com> <07A4B971-3691-4DBF-B62E-75560E9B817F@sweetapp.com> <5d44f72f1002250927r66d0fd4ei34d79cef7c0af54d@mail.gmail.com> <4222a8491002251049p6fc0c8a7sd3d7417a6b819b90@mail.gmail.com> Message-ID: On Feb 26, 2010, at 5:49 AM, Jesse Noller wrote: > On Thu, Feb 25, 2010 at 12:27 PM, Jeffrey Yasskin > wrote: > ... snip >>> >>> In any case, I've updated the docs and PEP to indicate that >>> deadlocks are >>> possible. >> >> Thanks. I still disagree, and think users are much more likely to be >> surprised by occasional deadlocks due to cycles of executors than >> they are >> about guaranteed deadlocks from cycles of futures, but I don't want >> to be >> the only one holding up the PEP by insisting on this. >> I think there are places the names could be improved, and Jesse >> probably has >> an opinion on exactly where this should go in the package >> hierarchy, but I >> think it will make a good addition to the standard library. Thanks >> for >> working on it! >> Jeffrey > > Yes; I think this needs to be part of a new "concurrent" package in > the stdlib - e.g.
concurrent.futures, understanding things within > multiprocessing will be put in there shortly, and possibly other > things such as a threadpool and other common sugary abstractions. Are you imagining that futures would be a subpackage of concurrent with a single logical namespace i.e.

concurrent/
  __init__.py
  futures/
    __init__.py
    threads.py
    processes.py
    ...

from concurrent.futures import wait
from concurrent.futures import ThreadPoolExecutor

Or should the futures package be merged into the concurrent package i.e.

concurrent/
  __init__.py
  futures.py
  threadpoolexecutor.py (was threads.py)
  processpoolexecutor.py (was processes.py)

from concurrent.futures import wait
from concurrent.threadpoolexecutor import ThreadPoolExecutor

? Cheers, Brian From jnoller at gmail.com Fri Feb 26 04:54:38 2010 From: jnoller at gmail.com (Jesse Noller) Date: Thu, 25 Feb 2010 22:54:38 -0500 Subject: [stdlib-sig] futures - a new package for asynchronous execution In-Reply-To: References: <5d44f72f1002201941o76f80045ofa25238be43f455c@mail.gmail.com> <5d44f72f1002211937s3359b93ev8466676de818e2d4@mail.gmail.com> <598B9770-5129-41D7-9C14-BF151CB48103@sweetapp.com> <5d44f72f1002231204w32a2ebdi5952abf645005680@mail.gmail.com> <07A4B971-3691-4DBF-B62E-75560E9B817F@sweetapp.com> <5d44f72f1002250927r66d0fd4ei34d79cef7c0af54d@mail.gmail.com> <4222a8491002251049p6fc0c8a7sd3d7417a6b819b90@mail.gmail.com> Message-ID: <4222a8491002251954v5d49239crad58c85ef7f8ddf8@mail.gmail.com> On Thu, Feb 25, 2010 at 10:28 PM, Brian Quinlan wrote: > > On Feb 26, 2010, at 5:49 AM, Jesse Noller wrote: ... >> Yes; I think this needs to be part of a new "concurrent" package in >> the stdlib - e.g. concurrent.futures, understanding things within >> multiprocessing will be put in there shortly, and possibly other >> things such as a threadpool and other common sugary abstractions. > >
> Are you imagining that futures would be a subpackage of concurrent with a
> single logical namespace i.e.
>
> concurrent/
>   __init__.py
>   futures/
>     __init__.py
>     threads.py
>     processes.py
>     ...
>
> from concurrent.futures import wait
> from concurrent.futures import ThreadPoolExecutor
>
> Or should the futures package be merged into the concurrent package i.e.
>
> concurrent/
>   __init__.py
>   futures.py
>   threadpoolexecutor.py (was threads.py)
>   processpoolexecutor.py (was processes.py)
>
> from concurrent.futures import wait
> from concurrent.threadpoolexecutor import ThreadPoolExecutor

I'm on the fence. I took a few minutes to think about this today, and my gut says concurrent with a single logical namespace - so:

from concurrent import futures
futures.ThreadPoolExecutor

And so on. Others might balk at a deeper namespace, but then say we add:

concurrent/
  futures/
  pool.py (allows for a process pool, or threadpool)
  managers.py

And so on. I'm trying to mentally organize things to "be like" java.util.concurrent [1] - ideally we could move/consolidate the common sugar into this package, and remove the other "stuff" from multiprocessing as well. That way multiprocessing can become "just" Process and the locking stuff, ala threading, and the rest of the other nice-things can be made to work with threads *and* processes ala what you've done with futures. This is just a thought; I've been thinking about it a lot, but I admit not having sat down and itemized the things that would live in this new home. The futures discussion just spurred me to propose the idea sooner rather than later.
Jesse [1] http://java.sun.com/javase/6/docs/api/java/util/concurrent/package-summary.html From jyasskin at gmail.com Fri Feb 26 05:03:19 2010 From: jyasskin at gmail.com (Jeffrey Yasskin) Date: Thu, 25 Feb 2010 20:03:19 -0800 Subject: [stdlib-sig] futures - a new package for asynchronous execution In-Reply-To: <4222a8491002251954v5d49239crad58c85ef7f8ddf8@mail.gmail.com> References: <5d44f72f1002211937s3359b93ev8466676de818e2d4@mail.gmail.com> <598B9770-5129-41D7-9C14-BF151CB48103@sweetapp.com> <5d44f72f1002231204w32a2ebdi5952abf645005680@mail.gmail.com> <07A4B971-3691-4DBF-B62E-75560E9B817F@sweetapp.com> <5d44f72f1002250927r66d0fd4ei34d79cef7c0af54d@mail.gmail.com> <4222a8491002251049p6fc0c8a7sd3d7417a6b819b90@mail.gmail.com> <4222a8491002251954v5d49239crad58c85ef7f8ddf8@mail.gmail.com> Message-ID: <5d44f72f1002252003g7c7b55a6y4fce4b044a6b41f4@mail.gmail.com> On Thu, Feb 25, 2010 at 7:54 PM, Jesse Noller wrote: > I'm on the fence. I took a few minutes to think about this today, and > my gut says concurrent with a single logical namespace - so: > > from concurrent import futures > futures.ThreadPoolExecutor > > And so on. Others might balk at a deeper namespace, but then say we add: > > concurrent/ > futures/ > pool.py (allows for a process pool, or threadpool) > managers.py > > And so on. I'm trying to mentally organize things to "be like" > java.util.concurrent [1] - ideally we could move/consolidate the > common sugar into this package, and remove the other "stuff" from > multiprocessing as well. That way multiprocessing can become "just" > Process and the locking stuff, ala threading, and the rest of the > other nice-things can be made to work with threads *and* processes ala > what you've done with futures. My gut agrees, FWIW.
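For concreteness, usage under the single-logical-namespace layout both are leaning toward looks like this (this is the shape that ultimately shipped in Python 3.2 as concurrent.futures):

```python
# Usage under the "from concurrent import futures" layout discussed
# above, as it eventually shipped in the standard library.
from concurrent import futures

def square(x):
    return x * x

with futures.ThreadPoolExecutor(max_workers=2) as pool:
    fs = [pool.submit(square, n) for n in range(5)]
    done, not_done = futures.wait(fs)

results = sorted(f.result() for f in done)
print(results)  # [0, 1, 4, 9, 16]
```

The module-level helpers (wait, as_completed) and the executor classes all live under the one futures namespace, so no deeper imports are needed.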
From jyasskin at gmail.com Fri Feb 26 05:12:48 2010 From: jyasskin at gmail.com (Jeffrey Yasskin) Date: Thu, 25 Feb 2010 20:12:48 -0800 Subject: [stdlib-sig] futures - a new package for asynchronous execution In-Reply-To: <5d44f72f1002251926y497fc15bge94bb6b3eacb9897@mail.gmail.com> References: <5d44f72f1002201941o76f80045ofa25238be43f455c@mail.gmail.com> <5d44f72f1002211937s3359b93ev8466676de818e2d4@mail.gmail.com> <598B9770-5129-41D7-9C14-BF151CB48103@sweetapp.com> <5d44f72f1002231204w32a2ebdi5952abf645005680@mail.gmail.com> <07A4B971-3691-4DBF-B62E-75560E9B817F@sweetapp.com> <5d44f72f1002250927r66d0fd4ei34d79cef7c0af54d@mail.gmail.com> <50C77B57-0C81-49B9-99ED-6CA3449D7AE6@sweetapp.com> <5d44f72f1002251926y497fc15bge94bb6b3eacb9897@mail.gmail.com> Message-ID: <5d44f72f1002252012s10a7174et73076172c3a9eb69@mail.gmail.com> On Thu, Feb 25, 2010 at 7:26 PM, Jeffrey Yasskin wrote: > On Thu, Feb 25, 2010 at 7:10 PM, Brian Quinlan wrote: >> On Feb 26, 2010, at 4:27 AM, Jeffrey Yasskin wrote: >>> Heh. If you're going to put that in the pep, at least make it correct >>> (sleeping is not synchronization): >> >> I can't tell if you are joking or not. Was my demonstration of a possible >> deadlock scenario really unclear? > > It's clear; it's just wrong code, even if the futures weren't a cycle. > Waiting using sleep in any decently-sized system is guaranteed to > cause problems. Yes, this example will work nearly every time > (although if you get your load high enough, you'll still see > NameErrors), but it's not the kind of thing we should be showing > users. (For that matter, communicating between futures using globals > is also a bad use of them, but it's not outright broken.) > >>> Thanks. I still disagree, and think users are much more likely to be >>> surprised by occasional deadlocks due to cycles of executors than they are >>> about guaranteed deadlocks from cycles of futures, but I don't want to be >>> the only one holding up the PEP by insisting on this. >> >> Cycles of futures are not guaranteed to deadlock. Remove the sleeps from my >> example and it will deadlock a small percentage of the time. > > It only fails to deadlock when it fails to create a cycle of futures. > > > > It sounds like Antoine also wants you to either have the threaded > futures steal work or detect executor cycles and raise an exception. FWIW, the other way to fix these deadlocks is to write a smarter thread pool. If the thread pool can notice that it's not using as many CPUs as it's been told to use, it can start a new thread, which runs the queued task and resolves the deadlock. It's actually a better solution in the long run since it also solves the problem with wait-for-one deadlocking or behaving badly. The problem is that this is surprisingly hard to get right. Measuring current CPU use is tricky and non-portable; if you start new threads too aggressively, you can run out of memory or start thrashing; and if you don't start threads aggressively enough you hurt performance.
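The growing-pool idea can be caricatured in a few lines: spawn a worker per submitted task up to a cap, so a blocked worker can never starve the task queue. All names here are hypothetical, and the sketch deliberately ignores the hard parts named above (measuring CPU use and throttling growth):

```python
# Naive "cached" pool: grows by one thread per submit() up to a cap.
# A real version would grow only when no worker is idle and would
# watch CPU usage -- the tricky, non-portable part.
import queue
import threading

class GrowingThreadPool:
    def __init__(self, max_threads=32):
        self._tasks = queue.Queue()
        self._lock = threading.Lock()
        self._threads = []
        self._max = max_threads

    def submit(self, fn, *args):
        self._tasks.put((fn, args))
        with self._lock:
            if len(self._threads) < self._max:
                t = threading.Thread(target=self._worker, daemon=True)
                self._threads.append(t)
                t.start()

    def _worker(self):
        while True:
            fn, args = self._tasks.get()
            fn(*args)

# Demo: four tasks that rendezvous at a barrier. A fixed two-thread
# pool would deadlock here; the growing pool runs all four at once.
finished = queue.Queue()
barrier = threading.Barrier(4)

def task(n):
    barrier.wait()      # blocks until all four tasks are running
    finished.put(n)

pool = GrowingThreadPool()
for n in range(4):
    pool.submit(task, n)

got = sorted(finished.get(timeout=5) for _ in range(4))
print(got)  # [0, 1, 2, 3]
```

Work stealing, the other fix discussed in the thread, attacks the same deadlock from the opposite side: instead of adding a thread, the blocked worker pops a queued task and runs it in place.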