From Steve.Dower at microsoft.com Sat Dec 1 00:32:04 2012 From: Steve.Dower at microsoft.com (Steve Dower) Date: Fri, 30 Nov 2012 23:32:04 +0000 Subject: [Python-ideas] An async facade? (was Re: [Python-Dev] Socket timeout and completion based sockets) In-Reply-To: References: <1D9BE0CD-5BF4-480D-8D40-5A409E40760D@twistedmatrix.com> <20121130161422.GB536@snakebite.org> <50B93536.30104@canterbury.ac.nz> Message-ID: Guido van Rossum wrote: > Greg Ewing wrote: >> Guido van Rossum wrote: >>> >>> Futures or callbacks, that's the question... >>> >>> Richard and I have even been considering APIs like this: >>> >>> res = obj.some_call() >>> if isinstance(res, Future): >>> res = yield res >> >> >> I thought you had decided against the idea of yielding futures? > > As a user-facing API style, yes. But this is meant for an internal API > -- the equivalent of your bare 'yield'. If you want to, I can consider another style as well > > > res = obj.some_call() > if isinstance(res, Future): > res.() > yield > > But I don't see a fundamental advantage to this. I do, it completely avoids ever using yield from to pass values around when used for coroutines. If values are always yielded or never yielded then it is easy (or easier) to detect errors such as: def func(): data = yield from get_data_async() for x in data: yield x When values are sometimes yielded and sometimes not, it's much harder to reliably throw an error when a value was yielded. Always using bare yields lets the code calling __next__() (I forget whether we're calling this "scheduler"...) raise an error if the value is not None. Cheers, Steve From guido at python.org Sat Dec 1 00:48:23 2012 From: guido at python.org (Guido van Rossum) Date: Fri, 30 Nov 2012 15:48:23 -0800 Subject: [Python-ideas] An async facade? (was Re: [Python-Dev] Socket timeout and completion based sockets) In-Reply-To: References: <1D9BE0CD-5BF4-480D-8D40-5A409E40760D@twistedmatrix.com> <20121130161422.GB536@snakebite.org> <50B93536.30104@canterbury.ac.nz> Message-ID: On Fri, Nov 30, 2012 at 3:32 PM, Steve Dower wrote: > Guido van Rossum wrote: >> Greg Ewing wrote: >>> Guido van Rossum wrote: >>>> >>>> Futures or callbacks, that's the question... >>>> >>>> Richard and I have even been considering APIs like this: >>>> >>>> res = obj.some_call() >>>> if isinstance(res, Future): >>>> res = yield res >>> >>> >>> I thought you had decided against the idea of yielding futures? >> >> As a user-facing API style, yes. But this is meant for an internal API >> -- the equivalent of your bare 'yield'. If you want to, I can consider another style as well >> >> >> res = obj.some_call() >> if isinstance(res, Future): >> res.() >> yield >> >> But I don't see a fundamental advantage to this. > > I do, it completely avoids ever using yield from to pass values around when used for coroutines. > > If values are always yielded or never yielded then it is easy (or easier) to detect errors such as: > > def func(): > data = yield from get_data_async() > for x in data: > yield x > > When values are sometimes yielded and sometimes not, it's much harder to reliably throw an error when a value was yielded. Always using bare yields lets the code calling __next__() (I forget whether we're calling this "scheduler"...) raise an error if the value is not None. Good point. I'll keep this in mind. -- --Guido van Rossum (python.org/~guido) From rene at stranden.com Sat Dec 1 00:57:08 2012 From: rene at stranden.com (Rene Nejsum) Date: Sat, 1 Dec 2012 00:57:08 +0100 Subject: [Python-ideas] An async facade? 
(was Re: [Python-Dev] Socket timeout and completion based sockets) In-Reply-To: References: <1D9BE0CD-5BF4-480D-8D40-5A409E40760D@twistedmatrix.com> <20121130161422.GB536@snakebite.org> Message-ID: <367DB117-A21A-4A9E-A401-3FCF4C6FE6FD@stranden.com> On Nov 30, 2012, at 8:04 PM, Guido van Rossum wrote: > Futures or callbacks, that's the question? I would strongly recommend Futures, most importantly because it seams to handle Threads more elegantly, since it is easier to move between Threads. > > Richard and I have even been considering APIs like this: > > res = obj.some_call() > if isinstance(res, Future): > res = yield res > > or > > res = obj.some_call() > if res is None: > res = yield > > where is some call on the scheduler/eventloop/proactor that > pulls the future out of a hat. > > The idea of the first version is simply to avoid the Future when the > result happens to be immediately ready (e.g. when calling readline() > on some buffering stream, most of the time the next line is already in > the buffer); the point of the second version is that "res is None" is > way faster than "isinstance(res, Future)" -- however the magic is a > little awkward. > > The debate is still open. Great :-) I understand that there are several layers involved (1) old style function call, 2) yield/coroutines and 3) threads) but I believe a model that handles all levels alike would be preferable. As a 3'rd API, consider: res = obj.some_call() self.other_call() print res the some_call() is *always" async and res i *always* a Future, 1) if executed in same thread it can be optimised out and be a normal function call 2) if coroutine it's a perfect time for t.switch() 3) if threads other_call() continues and res blocks if not ready Or maybe the notion of all objects running in separate coroutines/threads, all methods being async and all return values being Futures is something for Python 4? :-) (or PyLang an Erlang lookalike) br /rene > > --Guido > > On Fri, Nov 30, 2012 at 9:57 AM, Steve Dower wrote: >> Trent Nelson wrote: >>> TL;DR version: >>> >>> Provide an async interface that is implicitly asynchronous; >>> all calls return immediately, callbacks are used to handle >>> success/error/timeout. >> >> This is the central idea of what I've been advocating - the use of Future. Rather than adding an extra parameter to the initial call, asynchronous methods return an object that can have callbacks added. >> >>> The biggest benefit is that no assumption is made as to how the >>> asynchronicity is achieved. Note that I didn't mention IOCP or >>> kqueue or epoll once. Those are all implementation details that >>> the writer of an asynchronous Python app doesn't need to care about. >> >> I think this is why I've been largely ignored (except by Guido) - I don't even mention sockets, let alone the implementation details :). There are all sorts of operations that can be run asynchronously that do not involve sockets, though it seems that the driving force behind most of the effort is just to make really fast web servers. >> >> My code contribution is at http://bitbucket.org/stevedower/wattle, though I have not updated it in a while and there are certainly aspects that I would change. You may find it interesting if you haven't seen it yet. 
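    A minimal sketch of the Future-returning convention Steve describes above: the
    call returns at once and callbacks are attached to the returned object rather
    than passed as an extra parameter. This is illustrative only (it is not wattle
    or tulip code); read_line_async and handle_line are invented names, and the
    Future is completed inline purely so the fragment stands alone, where a real
    engine would complete it from its event loop.

        from concurrent.futures import Future

        def read_line_async(fake_data="hello\n"):
            # Returns immediately; a real implementation would hand the Future
            # to the event loop (IOCP, epoll, ...), which later calls
            # set_result() or set_exception() when the I/O finishes.
            f = Future()
            f.set_result(fake_data)   # completed inline only for this demo
            return f

        def handle_line(f):
            try:
                line = f.result()
            except Exception as exc:
                print("read failed:", exc)
            else:
                print("got:", line.rstrip())

        # No callback parameter on the call itself; the callback is
        # attached to the returned Future.
        read_line_async().add_done_callback(handle_line)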
>> >> Cheers, >> Steve >> >> -----Original Message----- >> From: Python-ideas [mailto:python-ideas-bounces+steve.dower=microsoft.com at python.org] On Behalf Of Trent Nelson >> Sent: Friday, November 30, 2012 0814 >> To: Guido van Rossum >> Cc: Glyph; python-ideas at python.org >> Subject: [Python-ideas] An async facade? (was Re: [Python-Dev] Socket timeout and completion based sockets) >> >> [ It's tough coming up with unique subjects for these async >> discussions. I've dropped python-dev and cc'd python-ideas >> instead as the stuff below follows on from the recent msgs. ] >> >> TL;DR version: >> >> Provide an async interface that is implicitly asynchronous; >> all calls return immediately, callbacks are used to handle >> success/error/timeout. >> >> class async: >> def accept(): >> def read(): >> def write(): >> def getaddrinfo(): >> def submit_work(): >> >> How the asynchronicity (not a word, I know) is achieved is >> an implementation detail, and will differ for each platform. >> >> (Windows will be able to leverage all its async APIs to full >> extent, Linux et al can keep mimicking asynchronicity via >> the usual non-blocking + multiplexing (poll/kqueue etc), >> thread pools, etc.) >> >> >> On Wed, Nov 28, 2012 at 11:15:07AM -0800, Glyph wrote: >>> On Nov 28, 2012, at 12:04 PM, Guido van Rossum wrote: >>> I would also like to bring up again. >> >> So, I spent yesterday working on the IOCP/async stuff. The saw this >> PEP and the sample async/abstract.py. That got me thinking: why don't >> we have a low-level async facade/API? Something where all calls are >> implicitly asynchronous. >> >> On systems with extensive support for asynchronous 'stuff', primarily >> Windows and AIX/Solaris to a lesser extent, we'd be able to leverage >> the platform-provided async facilities to full effect. >> >> On other platforms, we'd fake it, just like we do now, with select, >> poll/epoll, kqueue and non-blocking sockets. >> >> Consider the following: >> >> class Callback: >> __slots__ = [ >> 'success', >> 'failure', >> 'timeout', >> 'cancel', >> ] >> >> class AsyncEngine: >> def getaddrinfo(host, port, ..., cb): >> ... >> >> def getaddrinfo_then_connect(.., callbacks=(cb1, cb2)) >> ... >> >> def accept(sock, cb): >> ... >> >> def accept_then_write(sock, buf, (cb1, cb2)): >> ... >> >> def accept_then_expect_line(sock, line, (cb1, cb2)): >> ... >> >> def accept_then_expect_multiline_regex(sock, regex, cb): >> ... >> >> def read_until(fd_or_sock, bytes, cb): >> ... >> >> def read_all(fd_or_sock, cb): >> return self.read_until(fd_or_sock, EOF, cb) >> >> def read_until_lineglob(fd_or_sock, cb): >> ... >> >> def read_until_regex(fd_or_sock, cb): >> ... >> >> def read_chunk(fd_or_sock, chunk_size, cb): >> ... >> >> def write(fd_or_sock, buf, cb): >> ... >> >> def write_then_expect_line(fd_or_sock, buf, (cb1, cb2)): >> ... >> >> def connect_then_expect_line(..): >> ... >> >> def connect_then_write_line(..): >> ... >> >> def submit_work(callable, cb): >> ... >> >> def run_once(..): >> """Run the event loop once.""" >> >> def run(..): >> """Keep running the event loop until exit.""" >> >> All methods always take at least one callback. Chained methods can >> take multiple callbacks (i.e. accept_then_expect_line()). You fill >> in the success, failure (both callables) and timeout (an int) slots. >> The engine will populate cb.cancel with a callable that you can call >> at any time to (try and) cancel the IO operation. (How quickly that >> works depends on the underlying implementation.) 
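    A short sketch of how a caller might fill in the Callback slots just
    described. The Callback class is restated so the fragment stands alone,
    on_line and on_error are invented names, and the engine call itself is left
    commented out because no engine implementation exists here.

        class Callback:
            __slots__ = ['success', 'failure', 'timeout', 'cancel']

        def on_line(data):
            print("read:", data)

        def on_error(exc):
            print("failed:", exc)

        cb = Callback()
        cb.success = on_line     # called with the result when the I/O completes
        cb.failure = on_error    # called with the error on failure or timeout
        cb.timeout = 30          # an int (seconds), per the proposal above

        # engine.read_until_lineglob(sock, cb)   # would return immediately
        # ...and later, if the operation is no longer wanted:
        # cb.cancel()                            # populated by the engine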
>> >> I like this approach for two reasons: a) it allows platforms with >> great async support to work at their full potential, and b) it >> doesn't leak implementation details like non-blocking sockets, fds, >> multiplexing (poll/kqueue/select, IOCP, etc). Those are all details >> that are taken care of by the underlying implementation. >> >> getaddrinfo is a good example here. Guido, in tulip, you have this >> implemented as: >> >> def getaddrinfo(host, port, af=0, socktype=0, proto=0): >> infos = yield from scheduling.call_in_thread( >> socket.getaddrinfo, >> host, port, af, >> socktype, proto >> ) >> >> That's very implementation specific. It assumes the only way to >> perform an async getaddrinfo is by calling it from a separate >> thread. On Windows, there's native support for async getaddrinfo(), >> which we wouldn't be able to leverage here. >> >> The biggest benefit is that no assumption is made as to how the >> asynchronicity is achieved. Note that I didn't mention IOCP or >> kqueue or epoll once. Those are all implementation details that >> the writer of an asynchronous Python app doesn't need to care about. >> >> Thoughts? >> >> Trent. >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> http://mail.python.org/mailman/listinfo/python-ideas >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> http://mail.python.org/mailman/listinfo/python-ideas > > > > -- > --Guido van Rossum (python.org/~guido) > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas From thomas at kluyver.me.uk Sat Dec 1 13:28:50 2012 From: thomas at kluyver.me.uk (Thomas Kluyver) Date: Sat, 1 Dec 2012 12:28:50 +0000 Subject: [Python-ideas] Conventions for function annotations Message-ID: Function annotations (PEP 3107) are a very interesting new feature, but so far have gone largely unused. The only project I've seen using them is plac, a command-line option parser. One reason for this is that because function annotations can be used to mean anything, we're wary of doing anything in case we interfere with some other use case. A recent thread on ipython-dev touched on this [1], and we'd like to suggest some conventions to make annotations useful for everyone. 1. Code inspecting annotations should be prepared to ignore annotations it can't understand. 2. Code creating annotations should use wrapper classes to indicate what the annotation means. For instance, we are contemplating a way to specify options for a parameter, to be used in tab completion, so we would do something like this: from IPython.core.completer import options def my_io(filename, mode: options('read','write') ='read'): ... 3. There are a couple of important exceptions to 2: - Annotations that are simply a string can be used like a docstring, to be displayed to the user. Inspecting code should not expect to be able to parse any machine-readable information out of these strings. - Annotations that are a built-in type (int, str, etc.) indicate that the value should always be an instance of that type. Inspecting code may use these for type checking, introspection, optimisation, or other such purposes. Note that for now, I have limited this to built-in types, so other types can be used for other purposes, but this could be extended. For instance, the ABCs from collections (collections.Mapping et al.) 
could well be added to this category. 4. There should be a convention for attaching multiple annotations to one value. I propose that all code using annotations expects to handle tuples/lists of annotations. (We also considered dictionaries, but the result is long and ugly). So in this definition: def my_io(filename, mode: (options('read','write'), str, 'The mode in which to open the file') ='read'): ... the mode parameter has a set of options (ignored by frameworks that don't recognise it), should always be a string, and has a description. Any thoughts and suggestions are welcome. As an aside, we may also create a couple of decorators to fill in __annotations__ on Python 2, something like: @return_annotation('A file obect') @annotations(mode=(options('read','write'), str, 'The mode in which to open the file')) def my_io(filename, mode='read'): ... [1] http://mail.scipy.org/pipermail/ipython-dev/2012-November/010697.html Thanks, Thomas -------------- next part -------------- An HTML attachment was scrubbed... URL: From andrew.svetlov at gmail.com Sat Dec 1 15:59:59 2012 From: andrew.svetlov at gmail.com (Andrew Svetlov) Date: Sat, 1 Dec 2012 16:59:59 +0200 Subject: [Python-ideas] Conventions for function annotations In-Reply-To: References: Message-ID: I think code related to annotations is tightly coupled with annotated function usage context (decorator, metaclass, function caller). So annotation really can mean anything and it depends from context. I don't see use case when context need to ignore unexpected annotation. In my practice annotation is always expected if specified, absence of annotation for parameter is mark to do nothing with it (it can be allowed or disabled depending of context requirements). The same for multiple annotations. If your context allow it ? that's up to you. Exact kind of composition to use depends from context ? it can be tuple, dict, user-defined composition object. My point is: we dont need to restrict annotations in any way. If some libraries want to share annotations that means they are tightly enough coupled and can make rules for itself. All other code can go in the wild. On Sat, Dec 1, 2012 at 2:28 PM, Thomas Kluyver wrote: > Function annotations (PEP 3107) are a very interesting new feature, but so > far have gone largely unused. The only project I've seen using them is plac, > a command-line option parser. One reason for this is that because function > annotations can be used to mean anything, we're wary of doing anything in > case we interfere with some other use case. A recent thread on ipython-dev > touched on this [1], and we'd like to suggest some conventions to make > annotations useful for everyone. > > 1. Code inspecting annotations should be prepared to ignore annotations it > can't understand. > > 2. Code creating annotations should use wrapper classes to indicate what the > annotation means. For instance, we are contemplating a way to specify > options for a parameter, to be used in tab completion, so we would do > something like this: > > from IPython.core.completer import options > def my_io(filename, mode: options('read','write') ='read'): > ... > > 3. There are a couple of important exceptions to 2: > - Annotations that are simply a string can be used like a docstring, to be > displayed to the user. Inspecting code should not expect to be able to parse > any machine-readable information out of these strings. > - Annotations that are a built-in type (int, str, etc.) indicate that the > value should always be an instance of that type. 
Inspecting code may use > these for type checking, introspection, optimisation, or other such > purposes. Note that for now, I have limited this to built-in types, so other > types can be used for other purposes, but this could be extended. For > instance, the ABCs from collections (collections.Mapping et al.) could well > be added to this category. > > 4. There should be a convention for attaching multiple annotations to one > value. I propose that all code using annotations expects to handle > tuples/lists of annotations. (We also considered dictionaries, but the > result is long and ugly). So in this definition: > > def my_io(filename, mode: (options('read','write'), str, 'The mode in which > to open the file') ='read'): > ... > > the mode parameter has a set of options (ignored by frameworks that don't > recognise it), should always be a string, and has a description. > > Any thoughts and suggestions are welcome. > > As an aside, we may also create a couple of decorators to fill in > __annotations__ on Python 2, something like: > > @return_annotation('A file obect') > @annotations(mode=(options('read','write'), str, 'The mode in which to open > the file')) > def my_io(filename, mode='read'): > ... > > [1] http://mail.scipy.org/pipermail/ipython-dev/2012-November/010697.html > > > Thanks, > Thomas > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -- Thanks, Andrew Svetlov From tismer at stackless.com Sat Dec 1 16:51:01 2012 From: tismer at stackless.com (Christian Tismer) Date: Sat, 01 Dec 2012 16:51:01 +0100 Subject: [Python-ideas] An async facade? (was Re: [Python-Dev] Socket timeout and completion based sockets) In-Reply-To: References: <1D9BE0CD-5BF4-480D-8D40-5A409E40760D@twistedmatrix.com> <20121130161422.GB536@snakebite.org> Message-ID: <50BA2765.3010405@stackless.com> On 30.11.12 20:29, Guido van Rossum wrote: > On Fri, Nov 30, 2012 at 11:18 AM, Steve Dower wrote: >> Guido van Rossum wrote: >>> Futures or callbacks, that's the question... >> I know the C++ standards committee is looking at the same thing right now, and they're probably going to provide both: futures for those who prefer them (which is basically how the code looks) and callbacks for when every cycle is critical or if the developer prefers them. C++ has the advantage that futures can often be optimized out, so implementing a Future-based wrapper around a callback-based function is very cheap, but the two-level API will probably happen. > Well, for Python 3 we will definitely have two layers already: > callbacks and yield-from-based-coroutines. The question is whether > there's room for Futures in between (I like layers of abstraction, but > I don't like having too many layers). So far I agree very much. > ... > The debate is still open. >> How about: >> >> value, future = obj.some_call(...) >> if value is None: >> value = yield future > Also considered; I don't really like having to allocate a tuple here > (which is impossible to optimize out completely, even though its > allocation may use a fast free list). A little remark: I do respect personal taste very much, and if a tuple can be avoided I'm in fore sure. But the argument of the cost of a tuple creation is something that even I no longer consider relevant, especially in a context of other constructs like yield-from which are (currently) not even efficient ( O(n)-wise ). 
The discussion should better stay design oriented and not consider little overhead by a constant factor. But I agree that returned tuples are not a nice pattern to be used all the time. cheers - chris -- Christian Tismer :^) Software Consulting : Have a break! Take a ride on Python's Karl-Liebknecht-Str. 121 : *Starship* http://starship.python.net/ 14482 Potsdam : PGP key -> http://pgp.uni-mainz.de phone +49 173 24 18 776 fax +49 (30) 700143-0023 PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04 whom do you want to sponsor today? http://www.stackless.com/ From thomas at kluyver.me.uk Sat Dec 1 17:30:49 2012 From: thomas at kluyver.me.uk (Thomas Kluyver) Date: Sat, 1 Dec 2012 16:30:49 +0000 Subject: [Python-ideas] Conventions for function annotations In-Reply-To: References: Message-ID: I think annotations are potentially very useful for things like introspection and static analysis. For instance, your IDE could warn you if you pass a parameter that doesn't match the type specified in an annotation. In these cases, the code reading the annotations isn't coupled with the function definitions. I'm not aiming to restrict annotations, just to establish some conventions to make them useful. We have a convention, for instance, that attributes with a leading underscore are private. That's a useful basis that everyone understands, so when you do obj. in IPython, it doesn't show those attributes by default. I'd like to have some conventions of that nature around annotations. Thomas On 1 December 2012 14:59, Andrew Svetlov wrote: > I think code related to annotations is tightly coupled with annotated > function usage context (decorator, metaclass, function caller). > So annotation really can mean anything and it depends from context. > I don't see use case when context need to ignore unexpected > annotation. In my practice annotation is always expected if specified, > absence of annotation for parameter is mark to do nothing with it (it > can be allowed or disabled depending of context requirements). > The same for multiple annotations. If your context allow it ? that's > up to you. Exact kind of composition to use depends from context ? it > can be tuple, dict, user-defined composition object. > > My point is: we dont need to restrict annotations in any way. If some > libraries want to share annotations that means they are tightly enough > coupled and can make rules for itself. All other code can go in the > wild. > > On Sat, Dec 1, 2012 at 2:28 PM, Thomas Kluyver > wrote: > > Function annotations (PEP 3107) are a very interesting new feature, but > so > > far have gone largely unused. The only project I've seen using them is > plac, > > a command-line option parser. One reason for this is that because > function > > annotations can be used to mean anything, we're wary of doing anything in > > case we interfere with some other use case. A recent thread on > ipython-dev > > touched on this [1], and we'd like to suggest some conventions to make > > annotations useful for everyone. > > > > 1. Code inspecting annotations should be prepared to ignore annotations > it > > can't understand. > > > > 2. Code creating annotations should use wrapper classes to indicate what > the > > annotation means. For instance, we are contemplating a way to specify > > options for a parameter, to be used in tab completion, so we would do > > something like this: > > > > from IPython.core.completer import options > > def my_io(filename, mode: options('read','write') ='read'): > > ... > > > > 3. 
There are a couple of important exceptions to 2: > > - Annotations that are simply a string can be used like a docstring, to > be > > displayed to the user. Inspecting code should not expect to be able to > parse > > any machine-readable information out of these strings. > > - Annotations that are a built-in type (int, str, etc.) indicate that the > > value should always be an instance of that type. Inspecting code may use > > these for type checking, introspection, optimisation, or other such > > purposes. Note that for now, I have limited this to built-in types, so > other > > types can be used for other purposes, but this could be extended. For > > instance, the ABCs from collections (collections.Mapping et al.) could > well > > be added to this category. > > > > 4. There should be a convention for attaching multiple annotations to one > > value. I propose that all code using annotations expects to handle > > tuples/lists of annotations. (We also considered dictionaries, but the > > result is long and ugly). So in this definition: > > > > def my_io(filename, mode: (options('read','write'), str, 'The mode in > which > > to open the file') ='read'): > > ... > > > > the mode parameter has a set of options (ignored by frameworks that don't > > recognise it), should always be a string, and has a description. > > > > Any thoughts and suggestions are welcome. > > > > As an aside, we may also create a couple of decorators to fill in > > __annotations__ on Python 2, something like: > > > > @return_annotation('A file obect') > > @annotations(mode=(options('read','write'), str, 'The mode in which to > open > > the file')) > > def my_io(filename, mode='read'): > > ... > > > > [1] > http://mail.scipy.org/pipermail/ipython-dev/2012-November/010697.html > > > > > > Thanks, > > Thomas > > > > _______________________________________________ > > Python-ideas mailing list > > Python-ideas at python.org > > http://mail.python.org/mailman/listinfo/python-ideas > > > > > > -- > Thanks, > Andrew Svetlov > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Sun Dec 2 07:58:57 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 2 Dec 2012 16:58:57 +1000 Subject: [Python-ideas] Conventions for function annotations In-Reply-To: References: Message-ID: On Sun, Dec 2, 2012 at 4:26 PM, Robert McGibbon wrote: > By being *tolerant and well behaved when confronted with annotations that > our library doesn't understand I*, I > think we can use function annotations without a short-range decorator that > translates their information in some other > structure. If other annotation-using libraries are also willing to ignore > our tabbing annotations if/when they encounter them, > then can't we all get along smoothly? > * > * > (For reference, the feature will look/work something like this) > > * > In[1]: def foo(filename : tab_glob('*.txt')): # tab completion that > recommends files/directories that match a glob pattern > ... pass > ... > In[2]: foo( > 'a.txt' 'b.txt' > 'c.txt' 'dir/' > * > You're missing the other key reason for requiring decorators that interpret function annotations: they're there for the benefit of *readers*, not just other software. Given your definition above, I don't know what the annotations are for, except by recognising the "tab_glob" call. However, that then breaks as soon as the expression is put into a named variable earlier in the file: def foo(filename : text_files): # What does this mean? 
pass But the reader can be told *explicitly* what the annotations are related to via a decorator: @tab_expansion def foo(filename : text_files): # Oh, it's just a tab expansion specifier pass Readers no longer have to guess from context, and if the tab_expansion decorator creates IPython-specific metadata, then the interpreter doesn't need to guess either. (Note that you *can* use ordinary mechanisms like class decorators, metaclasses, post-creation modification of classes and IDE snippet inclusion to avoid the need to type out the "this is what these annotations mean" decorator explicitly. However, that's just an application of Python's standard abstraction tools, rather than a further special case convention) Mixing annotations intended for different consumers is a fundamentally bad idea, as it encourages unreadable code and complex dances to avoid stepping on each other's toes. It's better to design a *separate* API that supports composition by passing the per-parameter details directly to a decorator factory (which then adds appropriate named attributes to the function), with annotations used just as syntactic sugar for simple cases where no composition is involved. The important first question to ask is "How would we solve this if annotations didn't exist?" and only *then* look at the shorthand case for function-annotations. For cases where function annotations make code more complex or less robust, *don't use them*. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Sun Dec 2 05:58:27 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 2 Dec 2012 14:58:27 +1000 Subject: [Python-ideas] Conventions for function annotations In-Reply-To: References: Message-ID: On Sun, Dec 2, 2012 at 2:30 AM, Thomas Kluyver wrote: > I think annotations are potentially very useful for things like > introspection and static analysis. For instance, your IDE could warn you if > you pass a parameter that doesn't match the type specified in an > annotation. In these cases, the code reading the annotations isn't coupled > with the function definitions. > > I'm not aiming to restrict annotations, just to establish some conventions > to make them useful. We have a convention, for instance, that attributes > with a leading underscore are private. That's a useful basis that everyone > understands, so when you do obj. in IPython, it doesn't show those attributes by default. I'd like to have some conventions of that > nature around annotations. Indeed, composability is a problem with annotations. I suspect the only way to resolve this systematically is to adopt a convention where annotations are used *strictly* for short-range communication with an associated decorator that transfers the annotation details to a *different* purpose-specific location for long-term introspection. Furthermore, if composability is going to be possible in general, annotations can really *only* be used as a convenience API, with an underlying API where the necessary details are supplied directly to the decorator. For example, here's an example using the main decorator API for a cffi callback declaration [1]: @cffi.callback("int (char *, int)"): def my_cb(arg1, arg2): ... The problem with this is that it can get complicated to map C-level types to parameter names as the function signature gets more complicated. 
So, what you may want to do is write a decorator that builds the CFFI signature from annotations on the individual parameters: @annotated_cffi_callback def my_cb(arg1: "char *", arg2: "int") -> "int": ... The decorator would turn that into an ordinary call to cffi.callback, so future introspection wouldn't look at the annotations mapping at all, it would look directly at the CFFI metadata. Annotations should probably only ever be introspected by their associated decorator, and if you really want to apply multiple decorators with annotation support to a single function, you're going to have to fall back to the non-annotation based API for at least some of them. Once you start trying to overload the annotation field with multiple annotations, the readability gain for closer association with the individual parameters is counterbalanced by the loss of association between the subannotations and their corresponding decorators. [1] http://cffi.readthedocs.org/en/latest/#callbacks Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia -------------- next part -------------- An HTML attachment was scrubbed... URL: From rmcgibbo at gmail.com Sun Dec 2 11:12:06 2012 From: rmcgibbo at gmail.com (Robert McGibbon) Date: Sun, 2 Dec 2012 02:12:06 -0800 Subject: [Python-ideas] Conventions for function annotations In-Reply-To: References: Message-ID: <4B6491A4-315B-4C39-A0F2-42F0EFB42ADA@gmail.com> Nick, Thanks! You make a very convincing argument. Especially if this represents the collective recommendation of the python core development team on the proper conventions surrounding the use of function annotations, I would encourage you guys to perhaps make it more widely known (blogs, etc). As python 3.x adoption continues to move forward, this type of thing could become an issue if shmucks like me start using the annotation feature more widely. -Robert On Dec 1, 2012, at 10:58 PM, Nick Coghlan wrote: > Mixing annotations intended for different consumers is a fundamentally bad idea, as it encourages unreadable code and complex dances to avoid stepping on each other's toes. It's better to design a *separate* API that supports composition by passing the per-parameter details directly to a decorator factory (which then adds appropriate named attributes to the function), with annotations used just as syntactic sugar for simple cases where no composition is involved. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Sun Dec 2 12:43:34 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 2 Dec 2012 21:43:34 +1000 Subject: [Python-ideas] Conventions for function annotations In-Reply-To: <4B6491A4-315B-4C39-A0F2-42F0EFB42ADA@gmail.com> References: <4B6491A4-315B-4C39-A0F2-42F0EFB42ADA@gmail.com> Message-ID: On Sun, Dec 2, 2012 at 8:12 PM, Robert McGibbon wrote: > Nick, > > Thanks! You make a very convincing argument. > > Especially if this represents the collective recommendation of the python > core development team on the proper conventions surrounding the use of > function annotations, I would encourage you guys to perhaps make it more > widely known (blogs, etc). As python 3.x adoption continues to move > forward, this type of thing could become an issue if shmucks like me start > using the annotation feature more widely. 
> Last time it came up, the collective opinion on python-dev was still to leave PEP 8 officially neutral on the topic so that people could experiment more freely with annotations and the community could help figure out what worked well and what didn't. Admittedly this was long enough ago that I don't remember the details, just the obvious consequence that PEP 8 remains largely silent on the matter, aside from declaring that function annotations are off-limits for standard library modules: "The Python standard library will not use function annotations as that would result in a premature commitment to a particular annotation style. Instead, the annotations are left for users to discover and experiment with useful annotation styles." Obviously, I'm personally rather less open-minded on the topic of *composition* in particular, as that's a feature I'm firmly convinced should be left in the hands of ordinary decorator usage. I believe trying to contort annotations to handle that cause is almost certain to result in something less readable than the already possible decorator equivalent. However, the flip-side of the argument is that if we assume my opinion is correct and document it as an official recommendation in PEP 8, then many people won't even *try* to come up with good approaches to composition for function annotations. Maybe there *is* an elegant, natural solution out there that's superior to using explicit calls to decorator factories for the cases that involve composition. If PEP 8 declares "just use decorator factories for cases involving composition, and always design your APIs with a non-annotation based fallback for such cases", would we be inadvertently shutting down at least some of the very experimentation we intended to allow? After all, while I don't think the composition proposal in this thread reached the bar of being more readable than just composing decorator factories to handle more complex cases, I *do* think it is quite a decent attempt. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia -------------- next part -------------- An HTML attachment was scrubbed... URL: From thomas at kluyver.me.uk Sun Dec 2 16:25:24 2012 From: thomas at kluyver.me.uk (Thomas Kluyver) Date: Sun, 2 Dec 2012 15:25:24 +0000 Subject: [Python-ideas] Conventions for function annotations In-Reply-To: References: <4B6491A4-315B-4C39-A0F2-42F0EFB42ADA@gmail.com> Message-ID: On 2 December 2012 11:43, Nick Coghlan wrote: > However, the flip-side of the argument is that if we assume my opinion is > correct and document it as an official recommendation in PEP 8, then many > people won't even *try* to come up with good approaches to composition for > function annotations. Maybe there *is* an elegant, natural solution out > there that's superior to using explicit calls to decorator factories for > the cases that involve composition. If PEP 8 declares "just use decorator > factories for cases involving composition, and always design your APIs with > a non-annotation based fallback for such cases", would we be inadvertently > shutting down at least some of the very experimentation we intended to > allow? My concern with this is that it's tricky to experiment with composition. If you want to simultaneously use annotations for, say, one framework that checks argument types, and one that documents individual arguments based on annotations, they need to be using the same mechanism to compose annotation values. 
Alternatively, the first one to access the annotations could decompose the values, leaving them in a form the second can understand - but that sounds brittle and opaque. Another proposed mechanism (Robert's idea) which I didn't mention above is to override __add__, so that multiple annotations can be composed like this: def my_io(filename, mode: tab('read','write') + typed(str) ='read'): ... As a possible workaround, here's a decorator for decorators that makes the following two definitions equivalent: https://gist.github.com/4189289 @check_argtypes def checked1(a:int, b:str): pass @check_argtypes(a=int, b=str) def checked2(a, b): pass With this, it's easy to use annotations where possible, and you benefit from the extra clarity, but it's equally simple to pass the values as arguments to the decorator, for instance if the annotations are already in use for something else. It should also work under Python 2, using the non-annotated version. Thomas -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Sun Dec 2 23:23:25 2012 From: steve at pearwood.info (Steven D'Aprano) Date: Mon, 03 Dec 2012 09:23:25 +1100 Subject: [Python-ideas] Conventions for function annotations In-Reply-To: References: <4B6491A4-315B-4C39-A0F2-42F0EFB42ADA@gmail.com> Message-ID: <50BBD4DD.7010703@pearwood.info> On 02/12/12 22:43, Nick Coghlan wrote: > Last time it came up, the collective opinion on python-dev was still to > leave PEP 8 officially neutral on the topic so that people could experiment > more freely with annotations and the community could help figure out what > worked well and what didn't. Admittedly this was long enough ago that I > don't remember the details, just the obvious consequence that PEP 8 remains > largely silent on the matter, aside from declaring that function > annotations are off-limits for standard library modules: "The Python > standard library will not use function annotations as that would result in > a premature commitment to a particular annotation style. Instead, the > annotations are left for users to discover and experiment with useful > annotation styles." I fear that this was a strategic mistake. The result, it seems to me, is that annotations have been badly neglected. I can't speak for others, but I heavily use the standard library as a guide to what counts as good practice in Python. I'm not a big user of third party libraries, and most of those are for 2.x, so with the lack of annotations in the std lib I've had no guidance as to what sort of things annotations could be used for apart from "type checking". I'm sure that I'm not the only one. -- Steven From andrew.svetlov at gmail.com Mon Dec 3 00:33:59 2012 From: andrew.svetlov at gmail.com (Andrew Svetlov) Date: Mon, 3 Dec 2012 01:33:59 +0200 Subject: [Python-ideas] WSAPoll and tulip In-Reply-To: References: <20121127123325.GH90314@snakebite.org> <20121127154204.5fc81457@pitrou.net> <20121127150330.GB91191@snakebite.org> Message-ID: Created http://bugs.python.org/issue16596 for jumping over yields. Please review. On Wed, Nov 28, 2012 at 11:41 PM, Nick Coghlan wrote: > That will need to be well highlighted in What's New, as it could be very > confusing if the iterator is never called again. 
> > -- > Sent from my phone, thus the relative brevity :) -- Thanks, Andrew Svetlov From aquavitae69 at gmail.com Mon Dec 3 06:05:22 2012 From: aquavitae69 at gmail.com (David Townshend) Date: Mon, 3 Dec 2012 07:05:22 +0200 Subject: [Python-ideas] Conventions for function annotations In-Reply-To: <50BBD4DD.7010703@pearwood.info> References: <4B6491A4-315B-4C39-A0F2-42F0EFB42ADA@gmail.com> <50BBD4DD.7010703@pearwood.info> Message-ID: > I fear that this was a strategic mistake. The result, it seems to me, is that > annotations have been badly neglected. > > I can't speak for others, but I heavily use the standard library as a guide > to what counts as good practice in Python. I'm not a big user of third party > libraries, and most of those are for 2.x, so with the lack of annotations in > the std lib I've had no guidance as to what sort of things annotations could > be used for apart from "type checking". > > I'm sure that I'm not the only one. > > > > -- > Steven > +1 -------------- next part -------------- An HTML attachment was scrubbed... URL: From raymond.hettinger at gmail.com Mon Dec 3 09:09:17 2012 From: raymond.hettinger at gmail.com (Raymond Hettinger) Date: Mon, 3 Dec 2012 00:09:17 -0800 Subject: [Python-ideas] Conventions for function annotations In-Reply-To: References: <4B6491A4-315B-4C39-A0F2-42F0EFB42ADA@gmail.com> Message-ID: <9C9240AB-CE2D-4E0D-B74C-526EEB09AEB5@gmail.com> On Dec 2, 2012, at 3:43 AM, Nick Coghlan wrote: > Admittedly this was long enough ago that I don't remember the details, just the obvious consequence that PEP 8 remains largely silent on the matter, aside from declaring that function annotations are off-limits for standard library modules: PEP 8 is not "largely silent" on the subject: "'' The Python standard library will not use function annotations as that would result in a premature commitment to a particular annotation style. Instead, the annotations are left for users to discover and experiment with useful annotation styles. Early core developer attempts to use function annotations revealed inconsistent, ad-hoc annotation styles. For example: [str] was ambiguous as to whether it represented a list of strings or a value that could be either str or None. The notation open(file:(str,bytes)) was used for a value that could be either bytes or str rather than a 2-tuple containing a str value followed by a bytesvalue. The annotation seek(whence:int) exhibited an mix of over-specification and under-specification: int is too restrictive (anything with __index__ would be allowed) and it is not restrictive enough (only the values 0, 1, and 2 are allowed). Likewise, the annotation write(b: bytes) was also too restrictive (anything supporting the buffer protocol would be allowed). Annotations such as read1(n: int=None) were self-contradictory since None is not an int. Annotations such as source_path(self, fullname:str) -> objectwere confusing about what the return type should be. In addition to the above, annotations were inconsistent in the use of concrete types versus abstract types: int versus Integral and set/frozenset versus MutableSet/Set. Some annotations in the abstract base classes were incorrect specifications. For example, set-to-set operations require other to be another instance of Setrather than just an Iterable. A further issue was that annotations become part of the specification but weren't being tested. In most cases, the docstrings already included the type specifications and did so with greater clarity than the function annotations. 
In the remaining cases, the docstrings were improved once the annotations were removed. The observed function annotations were too ad-hoc and inconsistent to work with a coherent system of automatic type checking or argument validation. Leaving these annotations in the code would have made it more difficult to make changes later so that automated utilities could be supported. ''' Raymond -------------- next part -------------- An HTML attachment was scrubbed... URL: From solipsis at pitrou.net Mon Dec 3 10:21:14 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 3 Dec 2012 10:21:14 +0100 Subject: [Python-ideas] Conventions for function annotations References: <4B6491A4-315B-4C39-A0F2-42F0EFB42ADA@gmail.com> <9C9240AB-CE2D-4E0D-B74C-526EEB09AEB5@gmail.com> Message-ID: <20121203102114.58a63d2b@pitrou.net> Le Mon, 3 Dec 2012 00:09:17 -0800, Raymond Hettinger a ?crit : > > Early core developer attempts to use function annotations revealed > inconsistent, ad-hoc annotation styles. For example: > > [str] was ambiguous as to whether it represented a list of strings or > a value that could be either str or None. The notation > open(file:(str,bytes)) was used for a value that could be either > bytes or str rather than a 2-tuple containing a str value followed by > a bytesvalue. The annotation seek(whence:int) exhibited an mix of > over-specification and under-specification: int is too restrictive > (anything with __index__ would be allowed) and it is not restrictive > enough (only the values 0, 1, and 2 are allowed). Likewise, the > annotation write(b: bytes) was also too restrictive (anything > supporting the buffer protocol would be allowed). Annotations such as > read1(n: int=None) were self-contradictory since None is not an int. > Annotations such as source_path(self, fullname:str) -> objectwere > confusing about what the return type should be. In addition to the > above, annotations were inconsistent in the use of concrete types > versus abstract types: int versus Integral and set/frozenset versus > MutableSet/Set. Some annotations in the abstract base classes were > incorrect specifications. For example, set-to-set operations require > other to be another instance of Setrather than just an Iterable. In short, we have discovered that declarative typing isn't very useful :-) Regards Antoine. From p.f.moore at gmail.com Mon Dec 3 10:30:50 2012 From: p.f.moore at gmail.com (Paul Moore) Date: Mon, 3 Dec 2012 09:30:50 +0000 Subject: [Python-ideas] Conventions for function annotations In-Reply-To: References: <4B6491A4-315B-4C39-A0F2-42F0EFB42ADA@gmail.com> <9C9240AB-CE2D-4E0D-B74C-526EEB09AEB5@gmail.com> <20121203102114.58a63d2b@pitrou.net> Message-ID: Sorry, should have gone to the list On 3 December 2012 09:30, Paul Moore wrote: > On 3 December 2012 09:21, Antoine Pitrou wrote: >> In short, we have discovered that declarative typing isn't very >> useful :-) > > .. but haven't thought of any other useful applications of > annotations, and nor has the collective community on PyPI. > > Annotations seem like a solution looking for a problem, to me. 
(Which > is a shame, as they look like a pretty cool solution) > Paul From rmcgibbo at gmail.com Mon Dec 3 10:41:15 2012 From: rmcgibbo at gmail.com (Robert McGibbon) Date: Mon, 3 Dec 2012 01:41:15 -0800 Subject: [Python-ideas] Conventions for function annotations In-Reply-To: References: <4B6491A4-315B-4C39-A0F2-42F0EFB42ADA@gmail.com> <9C9240AB-CE2D-4E0D-B74C-526EEB09AEB5@gmail.com> <20121203102114.58a63d2b@pitrou.net> Message-ID: <5A715C4D-18C8-4BF0-B972-761BFDDAC3F3@gmail.com> The IPython community has thought of using annotations to do argument specific tab completion in the interactive interpreter. For example, a load function whose first argument is supposed to be files matching a certain glob pattern might use a function annotation on that argument to specify the glob pattern. A sympy maintainer, Aaron Meurer, has also expressed interest in using this feature -- as implemented in ipython -- to annotate sympy functions' return values by type to facilitate tab completion for chained calls like f(x). I'm working on this feature for IPython (PR: Function annotation based hooks into the tab completion system). I've already benefited a lot from the discussion on this thread in terms of the design of the API. Specifically Nick Coghlan's arguments have been very enlightening. Comments, suggestions, contributions, etc are welcome! -Robert On Dec 3, 2012, at 1:30 AM, Paul Moore wrote: > Sorry, should have gone to the list > > On 3 December 2012 09:30, Paul Moore wrote: >> On 3 December 2012 09:21, Antoine Pitrou wrote: >>> In short, we have discovered that declarative typing isn't very >>> useful :-) >> >> .. but haven't thought of any other useful applications of >> annotations, and nor has the collective community on PyPI. >> >> Annotations seem like a solution looking for a problem, to me. (Which >> is a shame, as they look like a pretty cool solution) >> Paul > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas -------------- next part -------------- An HTML attachment was scrubbed... URL: From thomas at kluyver.me.uk Mon Dec 3 11:52:26 2012 From: thomas at kluyver.me.uk (Thomas Kluyver) Date: Mon, 3 Dec 2012 10:52:26 +0000 Subject: [Python-ideas] Conventions for function annotations In-Reply-To: References: <4B6491A4-315B-4C39-A0F2-42F0EFB42ADA@gmail.com> <9C9240AB-CE2D-4E0D-B74C-526EEB09AEB5@gmail.com> <20121203102114.58a63d2b@pitrou.net> Message-ID: On 3 December 2012 09:30, Paul Moore wrote: > > .. but haven't thought of any other useful applications of > > annotations, and nor has the collective community on PyPI. > I suspect that the lack of applications is partly due to people not knowing about them, code having to still support Python 2, and an absence of guidelines about how to use them safely. For our part, I think we'll push forwards following Nick's suggestions - annotations to be accessed by closely coupled decorators only. Thanks all, Thomas -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ncoghlan at gmail.com Mon Dec 3 12:08:01 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 3 Dec 2012 21:08:01 +1000 Subject: [Python-ideas] Conventions for function annotations In-Reply-To: <9C9240AB-CE2D-4E0D-B74C-526EEB09AEB5@gmail.com> References: <4B6491A4-315B-4C39-A0F2-42F0EFB42ADA@gmail.com> <9C9240AB-CE2D-4E0D-B74C-526EEB09AEB5@gmail.com> Message-ID: On Mon, Dec 3, 2012 at 6:09 PM, Raymond Hettinger < raymond.hettinger at gmail.com> wrote: > > On Dec 2, 2012, at 3:43 AM, Nick Coghlan wrote: > > Admittedly this was long enough ago that I don't remember the details, > just the obvious consequence that PEP 8 remains largely silent on the > matter, aside from declaring that function annotations are off-limits for > standard library modules: > > > PEP 8 is not "largely silent" on the subject: > It's effectively silent on the matters at hand, which are: * the advisability of using annotations without an associated decorator that makes the interpretation currently in play explicit (while the examples given do illustrate why *not* doing this is a bad idea, it doesn't explicitly state that conclusion, merely "we're not going to use them in the standard library at this point") * the advisability of providing a pure annotations API, without any fallback to an explicit decorator factory * the advisability of handling composition within the annotations themselves, rather than by falling back to explicit decorator factories * the advisability of using the __annotations__ dictionary for long-term introspection, rather than using the decorator to move the information to a purpose-specific location in a separate function attribute I would be *quite delighted* if people are open to the idea of making a much stronger recommendation along the following lines explicit in PEP 8: ================== * If function annotations are used, it is recommended that: * the annotation details should be designed with a specific practical use case in mind * the annotations are used solely as a form of syntactic sugar for passing arguments to a decorator factory that would otherwise accept explicit per-parameter arguments * the decorator factory name should provide the reader of the code with a strong hint as to the intended meaning of the parameter annotations (or at least a convenient reference point to look up in the documentation) * in simple cases, using parameter and return type annotations will then allow the per-parameter details to be mapped easily by both the code author and later readers without requiring repetition of parameter names or careful alignment of factory arguments with parameter positions. * the explicit form remains available to handle more complex situations (such as applying multiple decorators to the same function) without requiring complicated conventions for composing independent annotations on a single function ================== In relation to the last point, I consider composing annotations to be analogous to composing function arguments. Writing: @g @f def annotated(arg1: (a, x), arg2: (b, y), arg3: (c, z)): ... instead of the much simpler: @g(x, y, z) @f(a, b, c) def annotated(arg1, arg2, arg3): ... is analagous to writing: args = [(a, x), (b, y), (c, z)] f(*(x[0] for x in args)) g(*(x[1] for x in args)) instead of the more obvious: f(a, b, c) g(x, y, z) Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From andrew.svetlov at gmail.com Mon Dec 3 12:45:51 2012 From: andrew.svetlov at gmail.com (Andrew Svetlov) Date: Mon, 3 Dec 2012 13:45:51 +0200 Subject: [Python-ideas] Conventions for function annotations In-Reply-To: References: <4B6491A4-315B-4C39-A0F2-42F0EFB42ADA@gmail.com> <9C9240AB-CE2D-4E0D-B74C-526EEB09AEB5@gmail.com> Message-ID: On Mon, Dec 3, 2012 at 1:08 PM, Nick Coghlan wrote: > * the advisability of using the __annotations__ dictionary for long-term > introspection, rather than using the decorator to move the information to a > purpose-specific location in a separate function attribute My 5 cents: perhaps you don't need to use __annotations__ at all, Signature object (PEP 362) gives more convenient way for gathering information about function spec. From ncoghlan at gmail.com Mon Dec 3 12:51:01 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 3 Dec 2012 21:51:01 +1000 Subject: [Python-ideas] Conventions for function annotations In-Reply-To: References: <4B6491A4-315B-4C39-A0F2-42F0EFB42ADA@gmail.com> <9C9240AB-CE2D-4E0D-B74C-526EEB09AEB5@gmail.com> Message-ID: On Mon, Dec 3, 2012 at 9:45 PM, Andrew Svetlov wrote: > On Mon, Dec 3, 2012 at 1:08 PM, Nick Coghlan wrote: > > * the advisability of using the __annotations__ dictionary for long-term > > introspection, rather than using the decorator to move the information > to a > > purpose-specific location in a separate function attribute > My 5 cents: perhaps you don't need to use __annotations__ at all, > Signature object (PEP 362) gives more convenient way for gathering > information about function spec. > I don't quite understand that comment - PEP 362 is purely an access mechanism. The underlying storage is still in __annotations__ (at least as far any annotations are concerned). However, using separate storage is a natural consequence of also providing an explicit decorator factory API, so I didn't bring it up. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia -------------- next part -------------- An HTML attachment was scrubbed... URL: From andrew.svetlov at gmail.com Mon Dec 3 12:53:59 2012 From: andrew.svetlov at gmail.com (Andrew Svetlov) Date: Mon, 3 Dec 2012 13:53:59 +0200 Subject: [Python-ideas] Conventions for function annotations In-Reply-To: References: <4B6491A4-315B-4C39-A0F2-42F0EFB42ADA@gmail.com> <9C9240AB-CE2D-4E0D-B74C-526EEB09AEB5@gmail.com> Message-ID: Ok, you right. I told about access mechanism only. On Mon, Dec 3, 2012 at 1:51 PM, Nick Coghlan wrote: > On Mon, Dec 3, 2012 at 9:45 PM, Andrew Svetlov > wrote: >> >> On Mon, Dec 3, 2012 at 1:08 PM, Nick Coghlan wrote: >> > * the advisability of using the __annotations__ dictionary for long-term >> > introspection, rather than using the decorator to move the information >> > to a >> > purpose-specific location in a separate function attribute >> My 5 cents: perhaps you don't need to use __annotations__ at all, >> Signature object (PEP 362) gives more convenient way for gathering >> information about function spec. > > > I don't quite understand that comment - PEP 362 is purely an access > mechanism. The underlying storage is still in __annotations__ (at least as > far any annotations are concerned). > > However, using separate storage is a natural consequence of also providing > an explicit decorator factory API, so I didn't bring it up. > > > Cheers, > Nick. 
> > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia -- Thanks, Andrew Svetlov From barry at python.org Mon Dec 3 16:34:16 2012 From: barry at python.org (Barry Warsaw) Date: Mon, 3 Dec 2012 10:34:16 -0500 Subject: [Python-ideas] Conventions for function annotations References: <4B6491A4-315B-4C39-A0F2-42F0EFB42ADA@gmail.com> <9C9240AB-CE2D-4E0D-B74C-526EEB09AEB5@gmail.com> Message-ID: <20121203103416.03094472@resist.wooz.org> On Dec 03, 2012, at 09:08 PM, Nick Coghlan wrote: >I would be *quite delighted* if people are open to the idea of making a >much stronger recommendation along the following lines explicit in PEP 8: I am -1 for putting any of what followed in PEP 8, and in fact, I think the existing examples at the bottom of PEP 8 are inappropriate. PEP 8 should be prescriptive of explicit Python coding styles. Think "do this, not that". It should be as minimal as possible, and in general provide rules that can be easily referenced and perhaps automated (e.g. pep8.py). Some of the existing text in PEP 8 already doesn't fall under that rubric, but it's close enough (e.g. designing for inheritance). I don't think annotations reach the level of consensus or practical experience needed to be added to PEP 8. OTOH, I wouldn't oppose a new informational PEP labeled "Annotations Best Practices", where some of these principles can be laid out and explored. Cheers, -Barry -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: not available URL: From guido at python.org Mon Dec 3 18:27:35 2012 From: guido at python.org (Guido van Rossum) Date: Mon, 3 Dec 2012 09:27:35 -0800 Subject: [Python-ideas] Conventions for function annotations In-Reply-To: <20121203103416.03094472@resist.wooz.org> References: <4B6491A4-315B-4C39-A0F2-42F0EFB42ADA@gmail.com> <9C9240AB-CE2D-4E0D-B74C-526EEB09AEB5@gmail.com> <20121203103416.03094472@resist.wooz.org> Message-ID: Hm. I agree PEP 8 seems an odd place for Nick's recommendation. Even if I were to agree with hos proposal I would think it belongs in a different PEP than PEP 8. But personally I haven't given up on using annotations to give type hints -- I think it can at some times be a useful augmentation to static analysis (whose use I see mostly as an aid to human readers and/or tools like linters, IDEs, and refactoring tools, not for guiding compiler optimizations). I know of several projects (both public and private) for improving the state of the art of Python static analysis with this goal in mind. With the advent of e.g. TypeScript and Dart in the JavaScript world, optional type annotations for dynamic languages appear to be becoming more fashionable, and maybe we can get some use out of them. FWIW, as far as e.g. 'int' being both overspecified and underspecified: I don't care about the underspecification so much, that's always going to happen; and for the overspecification, we can either use some abstract class instead, or simply state that the occurrence of certain concrete types must be taken as a shorthand for a specific abstract type. This could be part of the registration call of the concrete type, or something. Obviously this would require inventing and standardizing notations for things like "list of X", "tuple with items X, Y, Z", "either X or Y", and so on, as well as a standard way of combining annotations intended for different tools. *This* would be a useful discussion. What to do in the interim... 
I think the current language in PEP 8 is just fine until we have a better story. --Guido On Mon, Dec 3, 2012 at 7:34 AM, Barry Warsaw wrote: > On Dec 03, 2012, at 09:08 PM, Nick Coghlan wrote: > >>I would be *quite delighted* if people are open to the idea of making a >>much stronger recommendation along the following lines explicit in PEP 8: > > I am -1 for putting any of what followed in PEP 8, and in fact, I think the > existing examples at the bottom of PEP 8 are inappropriate. > > PEP 8 should be prescriptive of explicit Python coding styles. Think "do > this, not that". It should be as minimal as possible, and in general provide > rules that can be easily referenced and perhaps automated (e.g. pep8.py). > > Some of the existing text in PEP 8 already doesn't fall under that rubric, but > it's close enough (e.g. designing for inheritance). > > I don't think annotations reach the level of consensus or practical experience > needed to be added to PEP 8. > > OTOH, I wouldn't oppose a new informational PEP labeled "Annotations Best > Practices", where some of these principles can be laid out and explored. -- --Guido van Rossum (python.org/~guido) From ncoghlan at gmail.com Tue Dec 4 00:02:58 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 4 Dec 2012 09:02:58 +1000 Subject: [Python-ideas] Conventions for function annotations In-Reply-To: References: <4B6491A4-315B-4C39-A0F2-42F0EFB42ADA@gmail.com> <9C9240AB-CE2D-4E0D-B74C-526EEB09AEB5@gmail.com> <20121203103416.03094472@resist.wooz.org> Message-ID: So long as any type hinting semantics are associated with a "@type_hints" decorator, none of those ideas conflict with my suggestions for good annotation usage practices. The explicit decorators effectively end up serving as dialect specifiers for the annotations, for the benefit of other software (by moving the metadata out to purpose specific attributes) and for readers (simply by being present). Anyway, the reactions here confirmed my recollection of a lack of consensus amongst the core team. I'll just put something up on my own site, instead. Cheers, Nick. -- Sent from my phone, thus the relative brevity :) On Dec 4, 2012 3:28 AM, "Guido van Rossum" wrote: > Hm. I agree PEP 8 seems an odd place for Nick's recommendation. Even > if I were to agree with hos proposal I would think it belongs in a > different PEP than PEP 8. > > But personally I haven't given up on using annotations to give type > hints -- I think it can at some times be a useful augmentation to > static analysis (whose use I see mostly as an aid to human readers > and/or tools like linters, IDEs, and refactoring tools, not for > guiding compiler optimizations). I know of several projects (both > public and private) for improving the state of the art of Python > static analysis with this goal in mind. With the advent of e.g. > TypeScript and Dart in the JavaScript world, optional type annotations > for dynamic languages appear to be becoming more fashionable, and > maybe we can get some use out of them. > > FWIW, as far as e.g. 'int' being both overspecified and > underspecified: I don't care about the underspecification so much, > that's always going to happen; and for the overspecification, we can > either use some abstract class instead, or simply state that the > occurrence of certain concrete types must be taken as a shorthand for > a specific abstract type. This could be part of the registration call > of the concrete type, or something. 
> > Obviously this would require inventing and standardizing notations for > things like "list of X", "tuple with items X, Y, Z", "either X or Y", > and so on, as well as a standard way of combining annotations intended > for different tools. > > *This* would be a useful discussion. What to do in the interim... I > think the current language in PEP 8 is just fine until we have a > better story. > > --Guido > > On Mon, Dec 3, 2012 at 7:34 AM, Barry Warsaw wrote: > > On Dec 03, 2012, at 09:08 PM, Nick Coghlan wrote: > > > >>I would be *quite delighted* if people are open to the idea of making a > >>much stronger recommendation along the following lines explicit in PEP 8: > > > > I am -1 for putting any of what followed in PEP 8, and in fact, I think > the > > existing examples at the bottom of PEP 8 are inappropriate. > > > > PEP 8 should be prescriptive of explicit Python coding styles. Think "do > > this, not that". It should be as minimal as possible, and in general > provide > > rules that can be easily referenced and perhaps automated (e.g. pep8.py). > > > > Some of the existing text in PEP 8 already doesn't fall under that > rubric, but > > it's close enough (e.g. designing for inheritance). > > > > I don't think annotations reach the level of consensus or practical > experience > > needed to be added to PEP 8. > > > > OTOH, I wouldn't oppose a new informational PEP labeled "Annotations Best > > Practices", where some of these principles can be laid out and explored. > > -- > --Guido van Rossum (python.org/~guido) > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -------------- next part -------------- An HTML attachment was scrubbed... URL: From aquavitae69 at gmail.com Tue Dec 4 10:37:07 2012 From: aquavitae69 at gmail.com (David Townshend) Date: Tue, 4 Dec 2012 11:37:07 +0200 Subject: [Python-ideas] Conventions for function annotations In-Reply-To: References: <4B6491A4-315B-4C39-A0F2-42F0EFB42ADA@gmail.com> <9C9240AB-CE2D-4E0D-B74C-526EEB09AEB5@gmail.com> <20121203103416.03094472@resist.wooz.org> Message-ID: Just thought of a couple of usages which don't fit into the decorator model. The first is using the return annotation for early binding: def func(seq) -> dict(sorted=sorted): return func.__annotations__['return']['sorted'](seq) Stangely enough, this seems to run slightly faster than def func(seq, sorted=sorted): return sorted(seq) My test shows the first running in about 0.376s and the second in about 0.382s (python 3.3, 64bit). The second is passing information to base classes. This is a rather contrived example which could easily be solved (better) in plenty of other ways, but it does illustrate a pattern which someone else may be able to turn into a genuine use case. class NumberBase: def adjust(self, value): return self.adjust.__annotations__['return'](value) class NegativeInteger(NumberBase): def adjust(self, value) -> int: return super().adjust(-value) >>> ni = NegativeInteger() >>> ni.adjust(4.3) -4 Cheers David On Tue, Dec 4, 2012 at 1:02 AM, Nick Coghlan wrote: > So long as any type hinting semantics are associated with a "@type_hints" > decorator, none of those ideas conflict with my suggestions for good > annotation usage practices. 
> > The explicit decorators effectively end up serving as dialect specifiers > for the annotations, for the benefit of other software (by moving the > metadata out to purpose specific attributes) and for readers (simply by > being present). > > Anyway, the reactions here confirmed my recollection of a lack of > consensus amongst the core team. I'll just put something up on my own site, > instead. > > Cheers, > Nick. > > -- > Sent from my phone, thus the relative brevity :) > On Dec 4, 2012 3:28 AM, "Guido van Rossum" wrote: > >> Hm. I agree PEP 8 seems an odd place for Nick's recommendation. Even >> if I were to agree with hos proposal I would think it belongs in a >> different PEP than PEP 8. >> >> But personally I haven't given up on using annotations to give type >> hints -- I think it can at some times be a useful augmentation to >> static analysis (whose use I see mostly as an aid to human readers >> and/or tools like linters, IDEs, and refactoring tools, not for >> guiding compiler optimizations). I know of several projects (both >> public and private) for improving the state of the art of Python >> static analysis with this goal in mind. With the advent of e.g. >> TypeScript and Dart in the JavaScript world, optional type annotations >> for dynamic languages appear to be becoming more fashionable, and >> maybe we can get some use out of them. >> >> FWIW, as far as e.g. 'int' being both overspecified and >> underspecified: I don't care about the underspecification so much, >> that's always going to happen; and for the overspecification, we can >> either use some abstract class instead, or simply state that the >> occurrence of certain concrete types must be taken as a shorthand for >> a specific abstract type. This could be part of the registration call >> of the concrete type, or something. >> >> Obviously this would require inventing and standardizing notations for >> things like "list of X", "tuple with items X, Y, Z", "either X or Y", >> and so on, as well as a standard way of combining annotations intended >> for different tools. >> >> *This* would be a useful discussion. What to do in the interim... I >> think the current language in PEP 8 is just fine until we have a >> better story. >> >> --Guido >> >> On Mon, Dec 3, 2012 at 7:34 AM, Barry Warsaw wrote: >> > On Dec 03, 2012, at 09:08 PM, Nick Coghlan wrote: >> > >> >>I would be *quite delighted* if people are open to the idea of making a >> >>much stronger recommendation along the following lines explicit in PEP >> 8: >> > >> > I am -1 for putting any of what followed in PEP 8, and in fact, I think >> the >> > existing examples at the bottom of PEP 8 are inappropriate. >> > >> > PEP 8 should be prescriptive of explicit Python coding styles. Think >> "do >> > this, not that". It should be as minimal as possible, and in general >> provide >> > rules that can be easily referenced and perhaps automated (e.g. >> pep8.py). >> > >> > Some of the existing text in PEP 8 already doesn't fall under that >> rubric, but >> > it's close enough (e.g. designing for inheritance). >> > >> > I don't think annotations reach the level of consensus or practical >> experience >> > needed to be added to PEP 8. >> > >> > OTOH, I wouldn't oppose a new informational PEP labeled "Annotations >> Best >> > Practices", where some of these principles can be laid out and explored. 
>> >> -- >> --Guido van Rossum (python.org/~guido) >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> http://mail.python.org/mailman/listinfo/python-ideas >> > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From haael at interia.pl Tue Dec 4 10:58:04 2012 From: haael at interia.pl (haael at interia.pl) Date: Tue, 04 Dec 2012 10:58:04 +0100 Subject: [Python-ideas] New __reference__ hook Message-ID: Hi, guys. Python 3 is very close to become a holy grail of programming languages in the sense that almost everything could be redefined. However, there is still one thing missing: the immutable copy-on-assign numeric types. Consider this part of code: a = 1 b = a a += 1 assert a == b + 1 The object "1" gets assigned to the "a" variable, then another independent copy gets assigned to the "b" variable, then the value in the "a" variable gets modified without affecting the second. The problem is - this behaviour can not be recreated in user-defined classes: a = MyInteger(1) b = a a += 1 assert a == b + 1 The "a" and "b" variables both point to the same object. This is a difference on what one might expect with numeric types. My proposal is to define another hook that gets called when an object is referenced. def MyInteger: def __reference__(self, context): return copy.copy(self) Each time when a reference count of an object would normally get incremented, this method should be called and the returned object will be referenced. The default implementation would be of course to return self. The context argument will give the object some information of the reason how we are being referenced. This will allow easy implementation of such concepts as singletons, copy-on-write, immutables and even simplify things like reference loops. The most obvious use-case would be implementations of some mathematical types like vectors, polynomials and so on. I've encountered this problem when I was writing a simple vector library and I had to explicitly copy each object on assignment, which is particulary annoying. The programmer's intuition for numeric-like types is for them to be immutable, yet they should have augmented asignment operators. This is a thing that can not be implemented transparently in the current Python, so there is my proposal. Cheers, haael. From steve at pearwood.info Tue Dec 4 12:14:44 2012 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 04 Dec 2012 22:14:44 +1100 Subject: [Python-ideas] New __reference__ hook In-Reply-To: References: Message-ID: <50BDDB24.7050106@pearwood.info> On 04/12/12 20:58, haael at interia.pl wrote: > > Hi, guys. > > Python 3 is very close to become a holy grail of programming languages in > the sense that almost everything could be redefined. However, there is >still one thing missing: the immutable copy-on-assign numeric types. > Consider this part of code: I dispute that "everything can be redefined" is the holy grail of programming languages. If it were, why isn't everyone using Forth? > a = 1 > b = a > a += 1 > assert a == b + 1 > > The object "1" gets assigned to the "a" variable, Correct, for some definition of "assigned" and "variable". > then another independent copy gets assigned to the "b" variable, Completely, utterly wrong. >then the value in the "a" variable gets modified Incorrect. 
> without affecting the second. > The problem is - this behaviour can not be recreated in user-defined > classes: Of course it can. py> from decimal import Decimal # A pure-Python class, prior to Python 3.3 py> a = Decimal(1) py> b = a py> a += 1 py> assert a == b + 1 py> print a, b 2 1 If you prefer another example, use fractions.Fraction, also pure Python and immutable, with support for augmented assignment. I don't mean to be rude, or dismissive, but this is pretty basic Python stuff. Please start with the Python data and execution models: http://docs.python.org/2/reference/datamodel.html http://docs.python.org/2/reference/executionmodel.html although I must admit I don't find either of them especially clear. But in simple terms, you need to reset your thinking: your assumptions about what Python does are incorrect. When you say: a = 1 you are *binding* the name "a" to the object 1. When you then follow by saying: b = a you bind the name "b" to the *same* object 1. It is not a copy. It is not a "copy on assignment" or any other clever trick. You can prove to yourself that they are the same object: py> a = 1 py> b = a py> a is b True py> id(a), id(b) (140087472, 140087472) When you then call a += 1 this does not modify anything. Int objects (and floats, Decimal, strings, and many others) are *immutable* -- they cannot be modified. So `a += 1` creates a new object, 2, and binds it to the name "a". But the binding from object 1 to name "b" is not touched. If this is still not clear, I recommend you take the discussion onto one of the other Python mailing lists, especially tutor at python.org or python-list at python.org, which are more appropriate for discussing these things. -- Steven From jstpierre at mecheye.net Tue Dec 4 17:43:34 2012 From: jstpierre at mecheye.net (Jasper St. Pierre) Date: Tue, 4 Dec 2012 11:43:34 -0500 Subject: [Python-ideas] Conventions for function annotations In-Reply-To: <50BBD4DD.7010703@pearwood.info> References: <4B6491A4-315B-4C39-A0F2-42F0EFB42ADA@gmail.com> <50BBD4DD.7010703@pearwood.info> Message-ID: Indeed. I've looked at annotations before, but I never understood the purpose. It seemed like a feature that was designed and implemented without some goal in mind, and where the community was supposed to discover the goal themselves. So, if I may ask, what was the original goal of annotations? The PEP gives some suggestions, but doesn't leave anything concrete. Was it designed to be an aid to IDEs, or static analysis tools that inspect source code? Something for applications themselves to munge through to provide special behaviors, like a command line parser, or runtime static checker? The local decorator influence might work, but that has the problem of only being able to be used once before we fall back to the old method. Would you rather: @tab_expand(filename=glob('*.txt')) @types def read_from_filename(filename:str, num_bytes:int) -> bytes: pass or @tab_expand(filename=glob('*.txt')) @types(filename=str, num_bytes=int, return_=bytes) def read_from_filename(filename, num_bytes): pass For consistency's sake, I'd prefer the latter. 
Note that we could take a convention, like Thomas suggests, and adopt both: @tab_expand @types def read_from_filename(filename:(str, glob('*.txt')), num_bytes:int) -> bytes: pass But that's a "worst of both worlds" approach: we lose the locality of which argument applies to which decorator (unless we make up rules about positioning in the tuple or something), and we gunk up the function signature, all to use a fancy new Python 3 feature. With a restricted and narrow focus, I could see them gaining adoption, but for now, it seems like extra syntax was introduced simply for the point of having extra syntax. On Sun, Dec 2, 2012 at 5:23 PM, Steven D'Aprano wrote: > On 02/12/12 22:43, Nick Coghlan wrote: > > Last time it came up, the collective opinion on python-dev was still to >> leave PEP 8 officially neutral on the topic so that people could >> experiment >> more freely with annotations and the community could help figure out what >> worked well and what didn't. Admittedly this was long enough ago that I >> don't remember the details, just the obvious consequence that PEP 8 >> remains >> largely silent on the matter, aside from declaring that function >> annotations are off-limits for standard library modules: "The Python >> standard library will not use function annotations as that would result in >> a premature commitment to a particular annotation style. Instead, the >> annotations are left for users to discover and experiment with useful >> annotation styles." >> > > I fear that this was a strategic mistake. The result, it seems to me, is > that > annotations have been badly neglected. > > I can't speak for others, but I heavily use the standard library as a guide > to what counts as good practice in Python. I'm not a big user of third > party > libraries, and most of those are for 2.x, so with the lack of annotations > in > the std lib I've had no guidance as to what sort of things annotations > could > be used for apart from "type checking". > > I'm sure that I'm not the only one. > > > > -- > Steven > > ______________________________**_________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/**mailman/listinfo/python-ideas > -- Jasper -------------- next part -------------- An HTML attachment was scrubbed... URL: From thomas at kluyver.me.uk Tue Dec 4 17:51:13 2012 From: thomas at kluyver.me.uk (Thomas Kluyver) Date: Tue, 4 Dec 2012 16:51:13 +0000 Subject: [Python-ideas] Conventions for function annotations In-Reply-To: References: <4B6491A4-315B-4C39-A0F2-42F0EFB42ADA@gmail.com> <50BBD4DD.7010703@pearwood.info> Message-ID: On 4 December 2012 16:43, Jasper St. Pierre wrote: > The local decorator influence might work, but that has the problem of only > being able to be used once before we fall back to the old method. Would you > rather: > > @tab_expand(filename=glob('*.txt')) > @types > def read_from_filename(filename:str, num_bytes:int) -> bytes: > pass > > or > > @tab_expand(filename=glob('*.txt')) > @types(filename=str, num_bytes=int, return_=bytes) > def read_from_filename(filename, num_bytes): > pass > > For consistency's sake, I'd prefer the latter. > Using the decorator decorator I posted (https://gist.github.com/4189289 ), you could use these interchangeably, so the annotations are just a convenient alternative syntax for when you think they'd make the code more readable. Thomas -------------- next part -------------- An HTML attachment was scrubbed... 
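Thomas's actual implementation is in the linked gist; a rough sketch of the general shape such a "decorator decorator" might take (the names annotations_or_arguments and types below are invented for the example) is to accept either explicit keyword arguments or, when applied bare, to harvest the same arguments from the annotations via PEP 362's inspect.signature():

    import functools
    import inspect

    def annotations_or_arguments(factory):
        @functools.wraps(factory)
        def wrapper(*args, **kwargs):
            if len(args) == 1 and not kwargs and callable(args[0]):
                # Applied bare: pull per-parameter values from the annotations.
                func = args[0]
                params = inspect.signature(func).parameters
                harvested = {name: p.annotation for name, p in params.items()
                             if p.annotation is not inspect.Parameter.empty}
                return factory(**harvested)(func)
            # Applied with explicit arguments.
            return factory(*args, **kwargs)
        return wrapper

    @annotations_or_arguments
    def types(**argtypes):
        def decorator(func):
            func.argtypes = argtypes
            return func
        return decorator

    @types                          # annotation spelling
    def f(a: int, b: str):
        pass

    @types(a=int, b=str)            # explicit spelling
    def g(a, b):
        pass

    assert f.argtypes == g.argtypes == {'a': int, 'b': str}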
URL: From ned at nedbatchelder.com Tue Dec 4 18:12:12 2012 From: ned at nedbatchelder.com (Ned Batchelder) Date: Tue, 04 Dec 2012 12:12:12 -0500 Subject: [Python-ideas] Conventions for function annotations In-Reply-To: References: <4B6491A4-315B-4C39-A0F2-42F0EFB42ADA@gmail.com> <50BBD4DD.7010703@pearwood.info> Message-ID: <50BE2EEC.9000402@nedbatchelder.com> On 12/4/2012 11:43 AM, Jasper St. Pierre wrote: > Indeed. I've looked at annotations before, but I never understood the > purpose. It seemed like a feature that was designed and implemented > without some goal in mind, and where the community was supposed to > discover the goal themselves. > > So, if I may ask, what was the original goal of annotations? The PEP > gives some suggestions, but doesn't leave anything concrete. Was it > designed to be an aid to IDEs, or static analysis tools that inspect > source code? Something for applications themselves to munge through to > provide special behaviors, like a command line parser, or runtime > static checker? A telling moment for me was during an early Py3k keynote at PyCon (perhaps it was in Dallas or Chicago?), Guido couldn't remember the word "annotation," and said, "you know, those things that aren't type declarations?" :-) --Ned. From guido at python.org Tue Dec 4 18:56:01 2012 From: guido at python.org (Guido van Rossum) Date: Tue, 4 Dec 2012 09:56:01 -0800 Subject: [Python-ideas] Conventions for function annotations In-Reply-To: <50BE2EEC.9000402@nedbatchelder.com> References: <4B6491A4-315B-4C39-A0F2-42F0EFB42ADA@gmail.com> <50BBD4DD.7010703@pearwood.info> <50BE2EEC.9000402@nedbatchelder.com> Message-ID: On Tue, Dec 4, 2012 at 9:12 AM, Ned Batchelder wrote: > On 12/4/2012 11:43 AM, Jasper St. Pierre wrote: >> >> Indeed. I've looked at annotations before, but I never understood the >> purpose. It seemed like a feature that was designed and implemented without >> some goal in mind, and where the community was supposed to discover the goal >> themselves. To the contrary. There were too many use cases that immediately looked important, and we couldn't figure out which ones would be the most important or how to combine them, so we decided to take a two-step approach: in step 1, we designed the syntax, whereas in step 2, we would design the semantics. The idea was very clear that once the syntax was settled people would be free to experiment with different semantics -- just not in the stdlib. The idea was also that eventually, from all those experiments, one would emerge that would be fit for the stdlib. The process was somewhat similar to the way decorators were introduced. In Python 2.3, we introduced things like staticmethod, classmethod and property. But we *didn't* introduce the @ syntax, because we couldn't agree about it at that point. Then, for 2.4, we sorted out the proper syntax, having by then conclusively discovered that the original way of using e.g. classmethod (an assignment after the end of the method definition) was hard on the human reader. (Of course, you may note that for decorators, we decided on semantics first, syntax second. But no two situations are quite the same, and in the case of annotations, without syntax it would be nearly impossible to experiment with semantics.) >> So, if I may ask, what was the original goal of annotations? The PEP gives >> some suggestions, but doesn't leave anything concrete. Was it designed to be >> an aid to IDEs, or static analysis tools that inspect source code? 
Something >> for applications themselves to munge through to provide special behaviors, >> like a command line parser, or runtime static checker? Pretty much all of the above to some extent. But for me personally, the main goal was always to arrive at a notation to specify type constraints (and maybe other constraints) for arguments and return values. I've toyed at various times with specific ways of combining types. E.g. list[int] might mean a list of integers, and dict[str, tuple[float, float, float, bool]] might mean a dict mapping strings to tuples of three floats and a bool. But I felt it was much harder to get consensus about such a notation than about the syntax for argument annotations (think about how many objections you can bring in to these two examples :-) -- I've always had a strong desire to use "var: type = default" and to make the type a runtime expression to be evaluated at the same time as the default. > A telling moment for me was during an early Py3k keynote at PyCon (perhaps > it was in Dallas or Chicago?), Guido couldn't remember the word > "annotation," and said, "you know, those things that aren't type > declarations?" :-) Heh. :-) -- --Guido van Rossum (python.org/~guido) From masklinn at masklinn.net Tue Dec 4 19:12:18 2012 From: masklinn at masklinn.net (Masklinn) Date: Tue, 4 Dec 2012 19:12:18 +0100 Subject: [Python-ideas] Conventions for function annotations In-Reply-To: References: <4B6491A4-315B-4C39-A0F2-42F0EFB42ADA@gmail.com> <9C9240AB-CE2D-4E0D-B74C-526EEB09AEB5@gmail.com> <20121203103416.03094472@resist.wooz.org> Message-ID: <55D9BA21-0C74-4958-A9C7-0C0969366F93@masklinn.net> On 2012-12-03, at 18:27 , Guido van Rossum wrote: > > Obviously this would require inventing and standardizing notations for > things like "list of X", "tuple with items X, Y, Z", "either X or Y", > and so on, as well as a standard way of combining annotations intended > for different tools. I've always felt that __getitem__ and __or__/__ror__ on type 1. looked rather good and 2. looked similar to informal type specs and type specs of other languages. Although that's the issue with annotations being Python syntax: it requires changing stuff fairly deep into Python to be able to experiment. The most bothersome part is that I "feel" "either X or Y" (aka `X | Y`) should be a set of type (and thus the same as {X, Y}[0]) but that doesn't work with `isinstance` or `issubclass`. Likewise, `(a, b, c)` in an annotation feels like it should mean the same as `tuple[a, b, c]` ("a tuple with 3 items of types resp. a, b and c") but that's at odds with the same type-checking functions. The first could be fixable by relaxing slightly the constraints of isinstance and issubclass, but not so for the second. [0] which works rather neatly for anonymous unions as `|` is the union of two sets, so the arithmetic would be `type | type -> typeset`, `type | typeset -> typeset` and `typeset | typeset -> typeset`, libraries could offer opaque types/typesets which would be composable without their users having to know whether they're type atoms or typesets From ericsnowcurrently at gmail.com Tue Dec 4 19:22:46 2012 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Tue, 4 Dec 2012 11:22:46 -0700 Subject: [Python-ideas] Conventions for function annotations In-Reply-To: References: <4B6491A4-315B-4C39-A0F2-42F0EFB42ADA@gmail.com> <50BBD4DD.7010703@pearwood.info> Message-ID: Check out http://www.artima.com/weblogs/viewpost.jsp?thread=89161 -eric On Tue, Dec 4, 2012 at 9:43 AM, Jasper St. 
Pierre wrote: > Indeed. I've looked at annotations before, but I never understood the > purpose. It seemed like a feature that was designed and implemented without > some goal in mind, and where the community was supposed to discover the goal > themselves. > > So, if I may ask, what was the original goal of annotations? The PEP gives > some suggestions, but doesn't leave anything concrete. Was it designed to be > an aid to IDEs, or static analysis tools that inspect source code? Something > for applications themselves to munge through to provide special behaviors, > like a command line parser, or runtime static checker? > > The local decorator influence might work, but that has the problem of only > being able to be used once before we fall back to the old method. Would you > rather: > > @tab_expand(filename=glob('*.txt')) > @types > def read_from_filename(filename:str, num_bytes:int) -> bytes: > pass > > or > > @tab_expand(filename=glob('*.txt')) > @types(filename=str, num_bytes=int, return_=bytes) > def read_from_filename(filename, num_bytes): > pass > > For consistency's sake, I'd prefer the latter. > > Note that we could take a convention, like Thomas suggests, and adopt both: > > @tab_expand > @types > def read_from_filename(filename:(str, glob('*.txt')), num_bytes:int) -> > bytes: > pass > > But that's a "worst of both worlds" approach: we lose the locality of which > argument applies to which decorator (unless we make up rules about > positioning in the tuple or something), and we gunk up the function > signature, all to use a fancy new Python 3 feature. > > With a restricted and narrow focus, I could see them gaining adoption, but > for now, it seems like extra syntax was introduced simply for the point of > having extra syntax. > > > > On Sun, Dec 2, 2012 at 5:23 PM, Steven D'Aprano wrote: >> >> On 02/12/12 22:43, Nick Coghlan wrote: >> >>> Last time it came up, the collective opinion on python-dev was still to >>> leave PEP 8 officially neutral on the topic so that people could >>> experiment >>> more freely with annotations and the community could help figure out what >>> worked well and what didn't. Admittedly this was long enough ago that I >>> don't remember the details, just the obvious consequence that PEP 8 >>> remains >>> largely silent on the matter, aside from declaring that function >>> annotations are off-limits for standard library modules: "The Python >>> standard library will not use function annotations as that would result >>> in >>> a premature commitment to a particular annotation style. Instead, the >>> annotations are left for users to discover and experiment with useful >>> annotation styles." >> >> >> I fear that this was a strategic mistake. The result, it seems to me, is >> that >> annotations have been badly neglected. >> >> I can't speak for others, but I heavily use the standard library as a >> guide >> to what counts as good practice in Python. I'm not a big user of third >> party >> libraries, and most of those are for 2.x, so with the lack of annotations >> in >> the std lib I've had no guidance as to what sort of things annotations >> could >> be used for apart from "type checking". >> >> I'm sure that I'm not the only one. 
>> >> >> >> -- >> Steven >> >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> http://mail.python.org/mailman/listinfo/python-ideas > > > > > -- > Jasper > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > From barry at python.org Tue Dec 4 20:39:50 2012 From: barry at python.org (Barry Warsaw) Date: Tue, 4 Dec 2012 14:39:50 -0500 Subject: [Python-ideas] New __reference__ hook References: <50BDDB24.7050106@pearwood.info> Message-ID: <20121204143950.02880d94@resist.wooz.org> On Dec 04, 2012, at 10:14 PM, Steven D'Aprano wrote: >I dispute that "everything can be redefined" is the holy grail of >programming languages. If it were, why isn't everyone using Forth? On the readability scale, where Python is pretty close to a 10 (almost everyone can read almost all Python), Perl is a 4 (hard to read your own code after a week or so), Forth is a 1 (you can't even read your own code after your fingers stop moving). :) a-forth-enthusiast-from-way-back-in-the-day-ly y'rs, -Barry -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: not available URL: From mikegraham at gmail.com Wed Dec 5 15:10:14 2012 From: mikegraham at gmail.com (Mike Graham) Date: Wed, 5 Dec 2012 09:10:14 -0500 Subject: [Python-ideas] New __reference__ hook In-Reply-To: References: Message-ID: On Tue, Dec 4, 2012 at 4:58 AM, wrote: > > Python 3 is very close to become a holy grail of programming languages in > the sense that almost everything could be redefined. However, there is > still one thing missing: the immutable copy-on-assign numeric types. > Consider this part of code: > > a = 1 > b = a > a += 1 > assert a == b + 1 > > The object "1" gets assigned to the "a" variable, then another independent > copy gets assigned to the "b" variable, then the value in the "a" variable > gets modified without affecting the second. > The problem is - this behaviour can not be recreated in user-defined > classes: > > a = MyInteger(1) > b = a > a += 1 > assert a == b + 1 > > The "a" and "b" variables both point to the same object. This is a > difference on what one might expect with numeric types. > You misunderstand Python's semantics. Python never implicitly copies anything. Some types, like int, are immutable so you can't distinguish meaningfully between copying and not. All names like `a` can be rebound `a = ....`, and this never mutates the object. Some objects can be mutated, which is done by some means other than rebinding a name. I don't know what problem you had defining MyInteger. Here is a definition (albeit comprised of very, very sloppy code) that passes your test class MyInteger(object): def __init__(self, i): self._i = i def __add__(self, other): if isinstance(other, MyInteger): other = other._i return MyInteger(self._i + other) def __eq__(self, other): return self._i == other._i Mike -------------- next part -------------- An HTML attachment was scrubbed... 
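A short follow-on sketch (same variable names as the original post) of why no copy hook is needed here: since the class defines no __iadd__, the augmented assignment falls back to __add__, which returns a brand new object, so only the name on the left-hand side is rebound.

    class MyInteger:
        def __init__(self, i):
            self._i = i
        def __add__(self, other):
            if isinstance(other, MyInteger):
                other = other._i
            return MyInteger(self._i + other)    # always a new object
        def __eq__(self, other):
            other = other._i if isinstance(other, MyInteger) else other
            return self._i == other

    a = MyInteger(1)
    b = a
    assert a is b        # assignment made no copy; both names share one object
    a += 1               # no __iadd__, so this runs a = a.__add__(1)
    assert a is not b    # only "a" was rebound to the new result
    assert a == b + 1    # the test from the original post passes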
URL: From random832 at fastmail.us Wed Dec 5 17:06:28 2012 From: random832 at fastmail.us (random832 at fastmail.us) Date: Wed, 05 Dec 2012 11:06:28 -0500 Subject: [Python-ideas] New __reference__ hook In-Reply-To: References: Message-ID: <1354723588.24521.140661162241293.22C960DA@webmail.messagingengine.com> On Wed, Dec 5, 2012, at 9:10, Mike Graham wrote: > I don't know what problem you had defining MyInteger. Here is a definition (albeit comprised of very, very sloppy code) that passes your test Most likely he thought he had to define __iadd__ for += to work. From jstpierre at mecheye.net Wed Dec 5 18:05:32 2012 From: jstpierre at mecheye.net (Jasper St. Pierre) Date: Wed, 5 Dec 2012 12:05:32 -0500 Subject: [Python-ideas] New __reference__ hook In-Reply-To: <1354723588.24521.140661162241293.22C960DA@webmail.messagingengine.com> References: <1354723588.24521.140661162241293.22C960DA@webmail.messagingengine.com> Message-ID: And? What's wrong with an __iadd__ that's exactly the same as Mike's __add__? On Wed, Dec 5, 2012 at 11:06 AM, wrote: > On Wed, Dec 5, 2012, at 9:10, Mike Graham wrote: > > I don't know what problem you had defining MyInteger. Here is a > definition (albeit comprised of very, very sloppy code) that passes your > test > > Most likely he thought he had to define __iadd__ for += to work. > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -- Jasper -------------- next part -------------- An HTML attachment was scrubbed... URL: From sturla at molden.no Wed Dec 5 19:09:47 2012 From: sturla at molden.no (Sturla Molden) Date: Wed, 05 Dec 2012 19:09:47 +0100 Subject: [Python-ideas] New __reference__ hook In-Reply-To: References: <1354723588.24521.140661162241293.22C960DA@webmail.messagingengine.com> Message-ID: <50BF8DEB.9040101@molden.no> On 05.12.2012 18:05, Jasper St. Pierre wrote: > And? What's wrong with an __iadd__ that's exactly the same as Mike's > __add__? I think it was a Java-confusion. He thought numbers were copied on assignment. But there is no difference between value types and object types in Python. Ints and floats are immutable, but they are not value types as in Java. But apart from that, I think allowing overloading of the binding operator "=" might be a good idea. A special method __bind__ could return the object to be bound: a = b should then bind the name "a" to the return value of b.__bind__() if b implements __bind__. Sure, it could be used to implement copy on assignment. But it would also do other things like allowing lazy evaluation of an expression. NumPy code like z = a*x + b*y + c could avoid creating three temporary arrays if there was a __bind__ function called on "=". This is a big thing, cf. the difference between NumPy and numexpr: z = numexpr.evaluate("""a*x + b*y + c""") The reason numerical expressions must be written as strings to be efficient in Python is because there is no __bind__ function. Sturla From guido at python.org Wed Dec 5 19:17:53 2012 From: guido at python.org (Guido van Rossum) Date: Wed, 5 Dec 2012 10:17:53 -0800 Subject: [Python-ideas] Conventions for function annotations In-Reply-To: References: <4B6491A4-315B-4C39-A0F2-42F0EFB42ADA@gmail.com> <9C9240AB-CE2D-4E0D-B74C-526EEB09AEB5@gmail.com> <20121203103416.03094472@resist.wooz.org> Message-ID: On Tue, Dec 4, 2012 at 1:37 AM, David Townshend wrote: > Just thought of a couple of usages which don't fit into the decorator model. 
> The first is using the return annotation for early binding: > > def func(seq) -> dict(sorted=sorted): > return func.__annotations__['return']['sorted'](seq) You've got to be kidding... > Stangely enough, this seems to run slightly faster than > > def func(seq, sorted=sorted): > return sorted(seq) > > My test shows the first running in about 0.376s and the second in about > 0.382s (python 3.3, 64bit). Surely that's some kind of random variation. It's only a 2% difference. > The second is passing information to base classes. This is a rather > contrived example which could easily be solved (better) in plenty of other > ways, but it does illustrate a pattern which someone else may be able to > turn into a genuine use case. > > class NumberBase: > > def adjust(self, value): > return self.adjust.__annotations__['return'](value) > > > class NegativeInteger(NumberBase): > > def adjust(self, value) -> int: > return super().adjust(-value) > > >>>> ni = NegativeInteger() >>>> ni.adjust(4.3) > -4 This looks like a contrived way to use what is semantically equivalent to function attributes. The base class could write def adjust(self, value): return self.adjust.adjuster(value) and the subclass could write def adjust(self, value): return super().adjust(-value) adjust.adjuster = int Or invent a decorator to set the attribute: @set(adjuster=int) def adjust(self, value): return super().adjust(-value) But both of these feel quite awkward compared to just using a class attribute. class NumberBase: def adjust(self, value): return self.adjuster(value) class NegativeInteger(NumberBase): adjuster = int # No need to override adjust() IOW, this is not a line of thought to pursue. -- --Guido van Rossum (python.org/~guido) From masklinn at masklinn.net Wed Dec 5 19:51:12 2012 From: masklinn at masklinn.net (Masklinn) Date: Wed, 5 Dec 2012 19:51:12 +0100 Subject: [Python-ideas] New __reference__ hook In-Reply-To: <50BF8DEB.9040101@molden.no> References: <1354723588.24521.140661162241293.22C960DA@webmail.messagingengine.com> <50BF8DEB.9040101@molden.no> Message-ID: <63C70C75-1B39-4565-9AF0-1199DA6370C8@masklinn.net> On 2012-12-05, at 19:09 , Sturla Molden wrote: > On 05.12.2012 18:05, Jasper St. Pierre wrote: >> And? What's wrong with an __iadd__ that's exactly the same as Mike's >> __add__? > > I think it was a Java-confusion. He thought numbers were copied on assignment. But there is no difference between value types and object types in Python. Ints and floats are immutable, but they are not value types as in Java. > > But apart from that, I think allowing overloading of the binding operator "=" might be a good idea. A special method __bind__ could return the object to be bound: > > a = b > > should then bind the name "a" to the return value of > > b.__bind__() > > if b implements __bind__. Sounds odd and full of strange edge-cases. Would bind also get called when providing parameters to a function call? When putting an object in a literal of some sort? When returning an object from a function/method? If not, why not? > Sure, it could be used to implement copy on assignment. But it would also do other things like allowing lazy evaluation of an expression. > > NumPy code like > > z = a*x + b*y + c > > could avoid creating three temporary arrays if there was a __bind__ function called on "=". Why? z could just be a "lazy value" at this point, basically a manual building of thunks, only reifying them when necessary (whenever that is). It's not like numpy *has* to create three temporary arrays, just that it *does*. 
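As a toy illustration of that "lazy value" idea (it defers the work but does not actually fuse loops the way an array library would need to), the arithmetic operators can simply build thunks that are reified on demand:

    class Lazy:
        """Wraps either a plain value or a deferred computation."""
        def __init__(self, value):
            self._thunk = value if callable(value) else (lambda: value)
        def reify(self):
            return self._thunk()
        def __add__(self, other):
            return Lazy(lambda: self.reify() + _reify(other))
        def __mul__(self, other):
            return Lazy(lambda: self.reify() * _reify(other))

    def _reify(value):
        return value.reify() if isinstance(value, Lazy) else value

    a, x, b, y, c = Lazy(2), Lazy(10), Lazy(3), Lazy(100), Lazy(7)
    z = a*x + b*y + c     # only builds a tree of thunks
    print(z.reify())      # 327 -- nothing is evaluated until this point

A real implementation would walk the accumulated expression once at reify() time and emit a single fused loop; the sketch only shows where that hook would live.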
From bruce at leapyear.org Wed Dec 5 19:54:22 2012 From: bruce at leapyear.org (Bruce Leban) Date: Wed, 5 Dec 2012 10:54:22 -0800 Subject: [Python-ideas] New __reference__ hook In-Reply-To: <50BF8DEB.9040101@molden.no> References: <1354723588.24521.140661162241293.22C960DA@webmail.messagingengine.com> <50BF8DEB.9040101@molden.no> Message-ID: On Wed, Dec 5, 2012 at 10:09 AM, Sturla Molden wrote: > > But apart from that, I think allowing overloading of the binding operator > "=" might be a good idea. A special method __bind__ could return the object > to be bound: > > a = b > > should then bind the name "a" to the return value of > > b.__bind__() > > if b implements __bind__. > It' seems a bit more complicated than that. Take the example below. When is __bind__ going to be called? After a is multiplied by x, b is multipled by y, etc. or before? If after, that doesn't accomplish lazy evaluation as below. If before, then somehow this has to convert to a form that calls z.__bind__(something) and what is that something? > > Sure, it could be used to implement copy on assignment. But it would also > do other things like allowing lazy evaluation of an expression. > > NumPy code like > > z = a*x + b*y + c > > could avoid creating three temporary arrays if there was a __bind__ > function called on "=". This is a big thing, cf. the difference between > NumPy and numexpr: > > z = numexpr.evaluate("""a*x + b*y + c""") > > The reason numerical expressions must be written as strings to be > efficient in Python is because there is no __bind__ function. > There is another way to write expressions that don't get evaluated: lambda: a*x + b*y + c So you could write this as z.bind(lambda: rhs) or if this is important enough there could be a new bind operator: lhs @= rhs which is equivalent to lhs.__bind__(lambda: rhs) I think overriding = so sometimes it does regular binding and sometimes this magic binding would be confusing and dangerous. It means that every assignment operates differently if the lhs is already bound. Consider the difference between t = a + b typo = a+b t @= a+b typo @= a+b where typo was supposed to be t but was mistyped. In the first set, line 1 does __bind__ to a+b while line just adds a and b and does a normal binding. In the second set, the first does __bind__ while the second raises an exception that typo is not bound. It's even worse in the context of something like this: d = {} for i in range(2): d['x'] = a + i in the first pass through the loop this is a regular assignment. In the second pass it may call __bind__ depending on what the value of a + 0 is. Ick. --- Bruce Follow me: http://www.twitter.com/Vroo http://www.vroospeak.com -------------- next part -------------- An HTML attachment was scrubbed... 
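Spelled out as plain Python, the z.bind(lambda: rhs) form might look like the following sketch (the class name is invented). Because the right-hand side is just a closure, later rebindings of its free variables are visible when the expression is finally evaluated, which is part of what makes deferred binding easy to misread:

    class DeferredSlot:
        def __init__(self):
            self._thunk = None
        def bind(self, thunk):
            self._thunk = thunk      # store the un-evaluated expression
        def value(self):
            return self._thunk()     # evaluate only when asked

    a, x, b, y, c = 2, 10, 3, 100, 7
    z = DeferredSlot()
    z.bind(lambda: a*x + b*y + c)    # what "z @= a*x + b*y + c" would sugar into
    b = 1000                         # rebind a free variable before evaluation
    print(z.value())                 # 100027, not 327 -- the closure sees the new b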
URL: From guido at python.org Wed Dec 5 20:22:33 2012 From: guido at python.org (Guido van Rossum) Date: Wed, 5 Dec 2012 11:22:33 -0800 Subject: [Python-ideas] Conventions for function annotations In-Reply-To: <55D9BA21-0C74-4958-A9C7-0C0969366F93@masklinn.net> References: <4B6491A4-315B-4C39-A0F2-42F0EFB42ADA@gmail.com> <9C9240AB-CE2D-4E0D-B74C-526EEB09AEB5@gmail.com> <20121203103416.03094472@resist.wooz.org> <55D9BA21-0C74-4958-A9C7-0C0969366F93@masklinn.net> Message-ID: On Tue, Dec 4, 2012 at 10:12 AM, Masklinn wrote: > On 2012-12-03, at 18:27 , Guido van Rossum wrote: >> >> Obviously this would require inventing and standardizing notations for >> things like "list of X", "tuple with items X, Y, Z", "either X or Y", >> and so on, as well as a standard way of combining annotations intended >> for different tools. > > I've always felt that __getitem__ and __or__/__ror__ on type 1. looked > rather good and 2. looked similar to informal type specs and type specs > of other languages. Although that's the issue with annotations being > Python syntax: it requires changing stuff fairly deep into Python to > be able to experiment. So, instead of using def foo(a: int, b: str) -> float: you use from experimental_type_annotations import Int, Str, Float def foo(a: Int, b: Str) -> Float: And now we're ready for experimentation. [Warning: none of this is particularly new; I've had these things in my brain for years, as the referenced Artima blog post made clear.] > The most bothersome part is that I "feel" "either X or Y" (aka `X | Y`) > should be a set of type (and thus the same as {X, Y}[0]) but that doesn't > work with `isinstance` or `issubclass`. Likewise, `(a, b, c)` in an > annotation feels like it should mean the same as `tuple[a, b, c]` ("a > tuple with 3 items of types resp. a, b and c") but that's at odds with > the same type-checking functions. Note that in Python 3 you can override isinstance, by defining __instancecheck__ in the class: http://docs.python.org/3/reference/datamodel.html?highlight=__instancecheck__#class.__instancecheck__ So it shouldn't be a problem to make isinstance(42, Int) work. We can also make things like List[Int] and Dict[Str, Float] work, and even rig it so that isinstance([1, 2, 3], List[Int]) == True while isinstance([1, 2, 'booh'], List[Int]) == False Of course there are many bikeshedding topics like whether we should ever write List -- maybe we should write Iterable or Sequence instead, and maybe we have to be able to express mutability, and so on. The numeric tower (PEP 3141) is also good to keep in mind. I think that's all solvable once we start experimenting a bit. Some important issues to bikeshed over: - Tuples. Sometimes you want to say e.g. "a tuple of integers, don't mind the length"; other times you want to say e.g. "a tuple of fixed length containing an int and two strs". Perhaps the former should be expressed using ImmutableSequence[Int] and the second as Tuple[Int, Str, Str]. - Unions. We need a way to say "either X or Y". Given that we're defining our own objects we may actually be able to get away with writing e.g. "Int | Str" or "Str | List[Str]", and isinstance() would still work. It would also be useful to have a shorthand for "either T or None", written as Optional[T] or Optional(T). - Whether to design notations to express other constraints. E.g. "integer in range(10, 100)", or "one of the strings 'r', 'w' or 'a'", etc. You can go crazy on this. - Composability (Nick's pet peeve, in that he is against it). 
I propose that we reserve plain tuples for this. If an annotation has the form "x: (P, Q)" then that ought to mean that x must conform to both P and Q. Even though Nick doesn't like this, I don't think we should do everything with decorators. Surly, the decorators approach is good for certain use cases, and should take precedence if it is used. But e.g. IDEs that use annotations for suggestions and refactoring should not require everything to be decorated -- that would just make the code too busy. - Runtime enforcement. What should we use type annotations for? IDEs, static checkers (linters) and refactoring tools only need the annotations when they are parsing the code. While it is tempting to invent some kind of runtime checking that automatically checks the actual types against the annotations whenever a function is called, I think this is rarely useful, and often prohibitively slow. So I'd say don't focus on this. Instead, explicit type assertions like "assert isinstance(x, List[Int])" might be used, sparingly, for those cases where we'd otherwise write a manual assertion with the same meaning (which is also sparingly!). A decorator to do this might be useful (especially if there's a separate mechanism for turning actual checking on or off through some configuration mechanism). > The first could be fixable by relaxing slightly the constraints of > isinstance and issubclass, but not so for the second. > > [0] which works rather neatly for anonymous unions as `|` is the union > of two sets, so the arithmetic would be `type | type -> typeset`, > `type | typeset -> typeset` and `typeset | typeset -> typeset`, > libraries could offer opaque types/typesets which would be composable > without their users having to know whether they're type atoms or > typesets I like this for declaring union types. I don't like it for composing constraints that are intended for different tools. -- --Guido van Rossum (python.org/~guido) From ericsnowcurrently at gmail.com Wed Dec 5 20:42:41 2012 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Wed, 5 Dec 2012 12:42:41 -0700 Subject: [Python-ideas] A bind protocol (was Re: New __reference__ hook) Message-ID: On Wed, Dec 5, 2012 at 11:09 AM, Sturla Molden wrote: > But apart from that, I think allowing overloading of the binding operator > "=" might be a good idea. A special method __bind__ could return the object > to be bound: > > a = b > > should then bind the name "a" to the return value of > > b.__bind__() > > if b implements __bind__. Keep in mind that descriptors already give you that for classes. There are other workarounds if you *really* have to have this functionality. You're right that globals (module body namespace) and locals (function body namespace) do not have that capability[1]. The main case I've heard for a generic "bind" protocol is for DRY. For instance, you create a new object with some name as an argument and then bind that object to that name in the current running namespace. This has been brought up before[2], with the canonical example of namedtuple (along with arguments on why it's not a big deal[3]). I'd expect such an API to look something like this: object.__bind__(name, namespace) object.__unbind__(name, namespace, replacement=None) namespace is the mapping for the locals/object (a.k.a. vars()) where the name is going to be bound. When an object is already bound to a name, __unbind__() would be called first on the current object. In that case, replacement would be the object that is replacing the currently bound one. 
At a high level the whole binding operation would look something like this: def bind(ns, name, obj): if name in ns: ns[name].__unbind__(name, ns, obj) obj.__bind__(name, ns) ns[name] = obj # or whatever If you wanted to get fancy, both methods could return a boolean indicating that the name should *not* be bound/unbound (respectively): def bind(ns, name, obj): if name in ns: if not ns[name].__unbind__(name, ns, obj): return if obj.__bind__(name, ns): ns[name] = obj # or whatever The bind protocol could also be used in the fallback behavior of augmented assignment operations. Ultimately, considering how often things are bound/unbound, I'd worry that it would be too expensive for any bind API to see the light of day. -eric [1] You *can* use your own module class to get it for "globals", sort of. This wouldn't quite work for the globals associated with functions defined in the module. [2] http://mail.python.org/pipermail/python-ideas/2011-March/009233.html, and others. [3] http://mail.python.org/pipermail/python-ideas/2011-March/009277.html From benhoyt at gmail.com Wed Dec 5 20:52:08 2012 From: benhoyt at gmail.com (Ben Hoyt) Date: Thu, 6 Dec 2012 08:52:08 +1300 Subject: [Python-ideas] Conventions for function annotations In-Reply-To: References: <4B6491A4-315B-4C39-A0F2-42F0EFB42ADA@gmail.com> <9C9240AB-CE2D-4E0D-B74C-526EEB09AEB5@gmail.com> <20121203103416.03094472@resist.wooz.org> <55D9BA21-0C74-4958-A9C7-0C0969366F93@masklinn.net> Message-ID: > - Tuples. Sometimes you want to say e.g. "a tuple of integers, don't > mind the length"; other times you want to say e.g. "a tuple of fixed > length containing an int and two strs". Perhaps the former should be > expressed using ImmutableSequence[Int] and the second as Tuple[Int, > Str, Str]. Nice, that seems very explicit. ImmutableSequence is long, but clear. In this specific case, should it be just Sequence, and a mutable one would be MutableSequence (to be consistent with collections.abc names?). > - Unions. We need a way to say "either X or Y". Given that we're > defining our own objects we may actually be able to get away with > writing e.g. "Int | Str" or "Str | List[Str]", and isinstance() would > still work. It would also be useful to have a shorthand for "either T > or None", written as Optional[T] or Optional(T). Definitely useful to have a notation for "either T or None", as it's a pretty heavily-used pattern. But what about using the same approach, something like "T | None" or "T | NoneType". Though if you use the real None rather than experimental_type_annotations.None, is that confusing? In any case, it seems unnecessary to have a special Optional(T) notation when you've already got the simple "T1 | T2" notation. > - Whether to design notations to express other constraints. E.g. > "integer in range(10, 100)", or "one of the strings 'r', 'w' or 'a'", > etc. You can go crazy on this. Yes, I think this is dangerous territory -- it could get crazy very fast. Statically typed languages don't have this. Then again, I guess type annotations have the potential to be *more* powerful in this regard. Still, it'd have to be an awfully nice and general notation for it to be useful. Even then, your "def" line complete with type/constraint annotations may get far too long to be readable... 
-Ben

From bruce at leapyear.org Wed Dec 5 21:01:47 2012 From: bruce at leapyear.org (Bruce Leban) Date: Wed, 5 Dec 2012 12:01:47 -0800 Subject: [Python-ideas] Conventions for function annotations In-Reply-To: References: <4B6491A4-315B-4C39-A0F2-42F0EFB42ADA@gmail.com> <9C9240AB-CE2D-4E0D-B74C-526EEB09AEB5@gmail.com> <20121203103416.03094472@resist.wooz.org> <55D9BA21-0C74-4958-A9C7-0C0969366F93@masklinn.net> Message-ID:

On Wed, Dec 5, 2012 at 11:22 AM, Guido van Rossum wrote: > - Unions. We need a way to say "either X or Y". Given that we're > defining our own objects we may actually be able to get away with > writing e.g. "Int | Str" or "Str | List[Str]", and isinstance() would > still work. It would also be useful to have a shorthand for "either T > or None", written as Optional[T] or Optional(T).
>

Optional is not the same as "or None" to me:

Dict(a=Int, b=Int | None, c=Optional(Int))

suggests that b is required but might be None while c is not required, i.e., {'a': 3, b: None} is allowed while {'a': 3, c: None} is not. Ditto for Tuples:

Tuple[Int, Str | None, Optional(Int)]

where (3, None) matches as does (3, 'a', 4) but not (3, None, None). Optionals might be restricted to the end as matching in the middle would be complicated and possibly error-prone:

Tuple[Int, Optional(Int | None), Int | Str, Int | None]

--- Bruce Follow me: http://www.twitter.com/Vroo http://www.vroospeak.com -------------- next part -------------- An HTML attachment was scrubbed... URL:

From sturla at molden.no Wed Dec 5 21:09:49 2012 From: sturla at molden.no (Sturla Molden) Date: Wed, 5 Dec 2012 21:09:49 +0100 Subject: [Python-ideas] New __reference__ hook In-Reply-To: <63C70C75-1B39-4565-9AF0-1199DA6370C8@masklinn.net> References: <1354723588.24521.140661162241293.22C960DA@webmail.messagingengine.com> <50BF8DEB.9040101@molden.no> <63C70C75-1B39-4565-9AF0-1199DA6370C8@masklinn.net> Message-ID: <5539A564-7FD9-41E5-9BA5-14BB829A9CE7@molden.no>

Den 5. des. 2012 kl. 19:51 skrev Masklinn : > > Why? z could just be a "lazy value" at this point, basically a manual > building of thunks, only reifying them when necessary (whenever that > is). It's not like numpy *has* to create three temporary arrays, just > that it *does*.
>

It has to, because it does not know when to flush an expression. This, strangely enough, accounts for most of the speed difference between Python/NumPy and e.g. Fortran 95. A Fortran 95 compiler can compile an array expression as a single loop. NumPy cannot, because the binary operators do not tell when an expression is "finalized". That is why the numexpr JIT compiler evaluates Python expressions as strings, and needs to include a parser and whatnot. Today, most numerical code is memory bound, not compute bound, as CPUs are immensely faster than RAM. So what keeps numerical/scientific code written in Python slower than C or Fortran today is mostly creation of temporary array objects -- i.e. memory access -- not the computations per se. If we could get rid of temporary arrays, Python codes could possibly achieve 80% of Fortran 95 speed. For scientists that would mean we don't need to write any more Fortran or C.

But perhaps it is possible to do this with AST magic? I don't know. Nor do I know if __bind__ is the best way to do this. Perhaps not. But I do know that automatically detecting when to "flush a compound expression with (NumPy?) arrays" would be the holy grail for scientific computing with Python.
A binary operator x+y would just return a symbolic representation of the expression, but when the full expression needs to be flushed we can e.g. ask OpenCL or LLVM to generate the code on the fly. It would turn numerical computing into something similar to dynamic HTML. And we know how good Python is at generating structured text on the fly. Sturla From masklinn at masklinn.net Wed Dec 5 21:34:43 2012 From: masklinn at masklinn.net (Masklinn) Date: Wed, 5 Dec 2012 21:34:43 +0100 Subject: [Python-ideas] Conventions for function annotations In-Reply-To: References: <4B6491A4-315B-4C39-A0F2-42F0EFB42ADA@gmail.com> <9C9240AB-CE2D-4E0D-B74C-526EEB09AEB5@gmail.com> <20121203103416.03094472@resist.wooz.org> <55D9BA21-0C74-4958-A9C7-0C0969366F93@masklinn.net> Message-ID: <4F969DC0-2B67-4C35-B0E7-EEEAD992E840@masklinn.net> On 2012-12-05, at 20:22 , Guido van Rossum wrote: > >> The most bothersome part is that I "feel" "either X or Y" (aka `X | Y`) >> should be a set of type (and thus the same as {X, Y}[0]) but that doesn't >> work with `isinstance` or `issubclass`. Likewise, `(a, b, c)` in an >> annotation feels like it should mean the same as `tuple[a, b, c]` ("a >> tuple with 3 items of types resp. a, b and c") but that's at odds with >> the same type-checking functions. > > Note that in Python 3 you can override isinstance, by defining > __instancecheck__ in the class: > http://docs.python.org/3/reference/datamodel.html?highlight=__instancecheck__#class.__instancecheck__ > > So it shouldn't be a problem to make isinstance(42, Int) work. My problem there was more about having e.g. Int | Float return a set, but isinstance not working with a set. But indeed it could return a TypeSet which would implement __instancecheck__. > - Tuples. Sometimes you want to say e.g. "a tuple of integers, don't > mind the length"; other times you want to say e.g. "a tuple of fixed > length containing an int and two strs". Perhaps the former should be > expressed using ImmutableSequence[Int] and the second as Tuple[Int, > Str, Str]. > - Unions. We need a way to say "either X or Y". Given that we're > defining our own objects we may actually be able to get away with > writing e.g. "Int | Str" or "Str | List[Str]", and isinstance() would > still work. It would also be useful to have a shorthand for "either T > or None", written as Optional[T] or Optional(T). Well if `|` is the "union operator", as Ben notes `T | None` works well, is clear and is sufficient. Though that's if and only if "Optional[T]" is equivalent to "T or None" which Bruce seems to disagree with. There's some history with this pattern: http://journal.stuffwithstuff.com/2010/08/23/void-null-maybe-and-nothing/ (bottom section, from "Or Some Other Solution") > - Whether to design notations to express other constraints. E.g. > "integer in range(10, 100)", or "one of the strings 'r', 'w' or 'a'", > etc. You can go crazy on this. Yes this is going in Oleg territory, a sound core is probably a good starting idea. Although basic enumerations ("one of the strings 'r', 'w' or 'a'") could be rather neat. > - Composability (Nick's pet peeve, in that he is against it). I > propose that we reserve plain tuples for this. If an annotation has > the form "x: (P, Q)" then that ought to mean that x must conform to > both P and Q. Even though Nick doesn't like this, I don't think we > should do everything with decorators. Surly, the decorators approach > is good for certain use cases, and should take precedence if it is > used. But e.g. 
IDEs that use annotations for suggestions and > refactoring should not require everything to be decorated -- that > would just make the code too busy. > > - Runtime enforcement. What should we use type annotations for? IDEs, > static checkers (linters) and refactoring tools only need the > annotations when they are parsing the code. For IDEs, that's pretty much all the time though, either they're parsing the code or they're trying to perform static analysis on it, which uses the annotations. > While it is tempting to > invent some kind of runtime checking that automatically checks the > actual types against the annotations whenever a function is called, I > think this is rarely useful, and often prohibitively slow. Could be useful for debug or testing runs though, in the same way event-based profilers are prohibitively slow and can't be enabled all the time but are still useful. Plus it might be possible to enable/disable this mechanism with little to no source modification via sys.setprofile (I'm not sure what hooks it provides exactly and the documentation is rather sparse, so I'm not sure if the function object itself is available to the setprofile callback, looking at Lib/profiler.py it might only get the code object). From ericsnowcurrently at gmail.com Wed Dec 5 21:40:44 2012 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Wed, 5 Dec 2012 13:40:44 -0700 Subject: [Python-ideas] A bind protocol (was Re: New __reference__ hook) In-Reply-To: References: Message-ID: (from the "Re: New __reference__ hook" thread) On Wed, Dec 5, 2012 at 11:54 AM, Bruce Leban wrote: > There is another way to write expressions that don't get evaluated: > > lambda: a*x + b*y + c > > > So you could write this as z.bind(lambda: rhs) or if this is important > enough there could be a new bind operator: > > lhs @= rhs > > > which is equivalent to > > lhs.__bind__(lambda: rhs) The lazy/lambda part aside, such an operator would somewhat help with performance concerns and allow the "binder" to control when the "bindee" gets notified. -eric From masklinn at masklinn.net Wed Dec 5 21:45:51 2012 From: masklinn at masklinn.net (Masklinn) Date: Wed, 5 Dec 2012 21:45:51 +0100 Subject: [Python-ideas] New __reference__ hook In-Reply-To: <5539A564-7FD9-41E5-9BA5-14BB829A9CE7@molden.no> References: <1354723588.24521.140661162241293.22C960DA@webmail.messagingengine.com> <50BF8DEB.9040101@molden.no> <63C70C75-1B39-4565-9AF0-1199DA6370C8@masklinn.net> <5539A564-7FD9-41E5-9BA5-14BB829A9CE7@molden.no> Message-ID: On 2012-12-05, at 21:09 , Sturla Molden wrote: > > Den 5. des. 2012 kl. 19:51 skrev Masklinn : > >> >> Why? z could just be a "lazy value" at this point, basically a manual >> building of thunks, only reifying them when necessary (whenever that >> is). It's not like numpy *has* to create three temporary arrays, just >> that it *does*. >> > > It has to, because it does not know when to flush an expression. That tends to be the hard thing to decide, but it should be possible to find out most cases e.g. evaluate the thunks when elements are requested (similar to generators, but do the whole thunk at once), when printing, etc? Or use the numexpr approach and perform the reification explicitly. > But perhaps it is possible to do this with AST magic? I don't know. 
I'm not sure there's even a need for AST magic (although you could also play with that by writing operations within lambdas I guess, I've never done much AST analysis/rewriting), it could simply use an approach similar to SQLAlchemy's's ClauseElement: when applying an operation to e.g. an array, rather than perform it just return a representation of the operation itself (effectively rebuild some sort of AST), new operations on *that* would simply build the tree further (composing the thunk), and an explicit evaluation call or implicit evaluation due to e.g. accessing stuff would compile the "potential" operation and perform the actual computations. > A binary operator x+y would just return a symbolic representation of the > expression, but when the full expression needs to be flushed we can e.g. > ask OpenCL or LLVM to generate the code on the fly. Indeed. From jsbueno at python.org.br Wed Dec 5 21:48:06 2012 From: jsbueno at python.org.br (Joao S. O. Bueno) Date: Wed, 5 Dec 2012 18:48:06 -0200 Subject: [Python-ideas] New __reference__ hook In-Reply-To: <5539A564-7FD9-41E5-9BA5-14BB829A9CE7@molden.no> References: <1354723588.24521.140661162241293.22C960DA@webmail.messagingengine.com> <50BF8DEB.9040101@molden.no> <63C70C75-1B39-4565-9AF0-1199DA6370C8@masklinn.net> <5539A564-7FD9-41E5-9BA5-14BB829A9CE7@molden.no> Message-ID: On 5 December 2012 18:09, Sturla Molden wrote: > > Den 5. des. 2012 kl. 19:51 skrev Masklinn : > >> >> Why? z could just be a "lazy value" at this point, basically a manual >> building of thunks, only reifying them when necessary (whenever that >> is). It's not like numpy *has* to create three temporary arrays, just >> that it *does*. >> > > It has to, because it does not know when to flush an expression. This strangely enough, accounts for most of the speed difference between Python/NumPy and e.g. Fortran 95. A Fortran 95 compiler can compile an array expression as a single loop. NumPy cannot, because the binary operators does not tell when an expression is "finalized". That is why the numexpr JIT compiler evaluates Python expressions as strings, and needs to include a parser and whatnot. Today, most numerical code is memory bound, not compute bound, as CPUs are immensely faster than RAM. So what keeps numerical/scientific code written in Python slower than C or Fortran today is mostly creation of temporary array objects ? i.e. memory access ?, not the computations per se. If we could get rid of temprary arrays, Python codes could possibly achieve 80 % of Fortran 95 speed. For scientistis that would mean we don't need to write any more Fortran or C. > > But perhaps it is possible to do this with AST magic? I don't know. Nor do I know if __bind__ is the best way to do this. Perhaps not. But I do know that automatically detecting when to "flush a compund expression with (NumPy?) arrays" would be the holy grail for scientific computing with Python. A binary operator x+y would just return a symbolic representation of the expression, but when the full expression needs to be flushed we can e.g. ask OpenCL or LLVM to generate the code on the fly. It would turn numerical computing into something similar to dynamic HTML. And we know how good Python is at generating structured text on the fly. Today that can be achieved by crafting a class that overrides all ops to perform literal transforms and with a "flush" or "calculate" method. Sympy does something like that, and it would not be hard to have a numpy module to perform like that with numpy arrays. 
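A toy sketch of that approach, just to make it concrete (every name here is invented; this is nothing like numpy's or sympy's actual internals):

import numpy as np

class Lazy:
    """Record operations instead of performing them; evaluate on demand."""
    def __init__(self, payload):
        self.payload = payload          # an array, or an (op, left, right) node

    def __add__(self, other):
        return Lazy(('+', self, other))

    def __mul__(self, other):
        return Lazy(('*', self, other))

    def evaluate(self):
        # A real implementation would hand the whole tree to numexpr,
        # LLVM or OpenCL here and fuse it into a single loop.
        if not isinstance(self.payload, tuple):
            return self.payload
        op, left, right = self.payload
        lhs = left.evaluate() if isinstance(left, Lazy) else left
        rhs = right.evaluate() if isinstance(right, Lazy) else right
        return lhs + rhs if op == '+' else lhs * rhs

a, x, b, y, c = (Lazy(np.arange(3.0)) for _ in range(5))
z = a * x + b * y + c    # still symbolic, no temporaries yet
result = z.evaluate()    # the explicit "flush"/"calculate" step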
In this particular use case, we'd have the full benefit of "explicit is better than implicit". js -><- > > Sturla > > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas From ubershmekel at gmail.com Wed Dec 5 21:50:41 2012 From: ubershmekel at gmail.com (Yuval Greenfield) Date: Wed, 5 Dec 2012 15:50:41 -0500 Subject: [Python-ideas] New __reference__ hook In-Reply-To: <5539A564-7FD9-41E5-9BA5-14BB829A9CE7@molden.no> References: <1354723588.24521.140661162241293.22C960DA@webmail.messagingengine.com> <50BF8DEB.9040101@molden.no> <63C70C75-1B39-4565-9AF0-1199DA6370C8@masklinn.net> <5539A564-7FD9-41E5-9BA5-14BB829A9CE7@molden.no> Message-ID: On Wed, Dec 5, 2012 at 3:09 PM, Sturla Molden wrote: > > But perhaps it is possible to do this with AST magic? I don't know. Nor do > I know if __bind__ is the best way to do this. Perhaps not. But I do know > that automatically detecting when to "flush a compund expression with > (NumPy?) arrays" would be the holy grail for scientific computing with > Python. A binary operator x+y would just return a symbolic representation > of the expression, but when the full expression needs to be flushed we can > e.g. ask OpenCL or LLVM to generate the code on the fly. It would turn > numerical computing into something similar to dynamic HTML. And we know how > good Python is at generating structured text on the fly. > > Sturla > > Not all pixel fiddling can be solved using array calculus, so there will always be C involved at some point. Still this could be a great advancement. Though I don't think bind-time is the right time to evaluate anything as it would drive optimizing programmers to "one-line" things. Using intermediate variable names to explain an algorithm is crucial for readability in my experience. Creating intermediate objects only to be evaluated when programmer explicitly demands is the way to go. E.g. "evalf" in sympy http://scipy-lectures.github.com/advanced/sympy.html -------------- next part -------------- An HTML attachment was scrubbed... URL: From amauryfa at gmail.com Wed Dec 5 21:59:23 2012 From: amauryfa at gmail.com (Amaury Forgeot d'Arc) Date: Wed, 5 Dec 2012 21:59:23 +0100 Subject: [Python-ideas] New __reference__ hook In-Reply-To: References: <1354723588.24521.140661162241293.22C960DA@webmail.messagingengine.com> <50BF8DEB.9040101@molden.no> <63C70C75-1B39-4565-9AF0-1199DA6370C8@masklinn.net> <5539A564-7FD9-41E5-9BA5-14BB829A9CE7@molden.no> Message-ID: On 5 December 2012 18:09, Sturla Molden wrote: > > > > Den 5. des. 2012 kl. 19:51 skrev Masklinn : > > > >> > >> Why? z could just be a "lazy value" at this point, basically a manual > >> building of thunks, only reifying them when necessary (whenever that > >> is). It's not like numpy *has* to create three temporary arrays, just > >> that it *does*. > >> > > > > It has to, because it does not know when to flush an expression. This > strangely enough, accounts for most of the speed difference between > Python/NumPy and e.g. Fortran 95. A Fortran 95 compiler can compile an > array expression as a single loop. NumPy cannot, because the binary > operators does not tell when an expression is "finalized". That is why the > numexpr JIT compiler evaluates Python expressions as strings, and needs to > include a parser and whatnot. Today, most numerical code is memory bound, > not compute bound, as CPUs are immensely faster than RAM. 
So what keeps > numerical/scientific code written in Python slower than C or Fortran today > is mostly creation of temporary array objects ? i.e. memory access ?, not > the computations per se. If we could get rid of temprary arrays, Python > codes could possibly achieve 80 % of Fortran 95 speed. For scientistis that > would mean we don't need to write any more Fortran or C. > > > > But perhaps it is possible to do this with AST magic? I don't know. Nor > do I know if __bind__ is the best way to do this. Perhaps not. But I do > know that automatically detecting when to "flush a compund expression with > (NumPy?) arrays" would be the holy grail for scientific computing with > Python. A binary operator x+y would just return a symbolic representation > of the expression, but when the full expression needs to be flushed we can > e.g. ask OpenCL or LLVM to generate the code on the fly. It would turn > numerical computing into something similar to dynamic HTML. And we know how > good Python is at generating structured text on the fly. > > FYI, the numpy module shipped with PyPy does exactly this: the operations are recorded in some AST structure, which is evaluated only when the first item of the array is read. This is completely transparent to the user, or to other parts of the interpreter. PyPy uses JIT techniques to generate machine code specialized for the particular AST, and is typically 2x to 5x faster than Numpy, probably because a lot of allocations/copies are avoided. -- Amaury Forgeot d'Arc -------------- next part -------------- An HTML attachment was scrubbed... URL: From thomas at kluyver.me.uk Wed Dec 5 22:33:07 2012 From: thomas at kluyver.me.uk (Thomas Kluyver) Date: Wed, 5 Dec 2012 21:33:07 +0000 Subject: [Python-ideas] New __reference__ hook In-Reply-To: <5539A564-7FD9-41E5-9BA5-14BB829A9CE7@molden.no> References: <1354723588.24521.140661162241293.22C960DA@webmail.messagingengine.com> <50BF8DEB.9040101@molden.no> <63C70C75-1B39-4565-9AF0-1199DA6370C8@masklinn.net> <5539A564-7FD9-41E5-9BA5-14BB829A9CE7@molden.no> Message-ID: On 5 December 2012 20:09, Sturla Molden wrote: > But perhaps it is possible to do this with AST magic? As far as I understand it, numba [1] does this kind of AST magic (among other things). https://github.com/numba/numba Thomas -------------- next part -------------- An HTML attachment was scrubbed... URL: From tjreedy at udel.edu Wed Dec 5 22:59:15 2012 From: tjreedy at udel.edu (Terry Reedy) Date: Wed, 05 Dec 2012 16:59:15 -0500 Subject: [Python-ideas] New __reference__ hook In-Reply-To: <50BF8DEB.9040101@molden.no> References: <1354723588.24521.140661162241293.22C960DA@webmail.messagingengine.com> <50BF8DEB.9040101@molden.no> Message-ID: On 12/5/2012 1:09 PM, Sturla Molden wrote: > On 05.12.2012 18:05, Jasper St. Pierre wrote: >> And? What's wrong with an __iadd__ that's exactly the same as Mike's >> __add__? > > I think it was a Java-confusion. He thought numbers were copied on > assignment. But there is no difference between value types and object > types in Python. Ints and floats are immutable, but they are not value > types as in Java. > > But apart from that, I think allowing overloading of the binding > operator "=" might be a good idea. An assignment statement mutates the current local namespace. The 'current local namespace' is a hidden input to the function performed by all statements, but it need not be a python object. The key symbol '=' is not an operator and 'a = b' is not an expression. 
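dis makes this visible -- nothing is ever looked up on the target name (output roughly sketched; details vary by version):

import dis

def f(b):
    a = b      # the local slot for 'a' is simply set; no call on 'a'
    return a

dis.dis(f)
# LOAD_FAST  b
# STORE_FAST a
# LOAD_FAST  a
# RETURN_VALUE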
> A special method __bind__ could > return the object to be bound: > > a = b > > should then bind the name "a" to the return value of > > b.__bind__() If one wants to perform 'a = f(b)' or 'a = b.meth()' instead of 'a = b', then one should just explicitly say so. > if b implements __bind__. > > Sure, it could be used to implement copy on assignment. But it would > also do other things like allowing lazy evaluation of an expression. > > NumPy code like > > z = a*x + b*y + c > > could avoid creating three temporary arrays if there was a __bind__ > function called on "=". No, z = (a*x + b*y * c).__bind__, which is how you defined .__bind__ working, still requires that the expression be evaluated to an object. The definition of Python requires computation of a*x, b*y, (a*x + b*y), and finally (a*x + b*y) + c in that order. Either '*' or '+' may have side-effects. > This is a big thing, cf. the difference between > NumPy and numexpr: > > z = numexpr.evaluate("""a*x + b*y + c""") > The reason numerical expressions must be written as strings to be > efficient in Python is because there is no __bind__ function. No, it is because the semantics of Python require inefficiency that can only be removed by a special parser-compiler with additional knowledge of the relevant object class, method, and instance properties. Such knowledge allows code to be re-written without changing the effect. For Fortran arrays, the needed information includes the number and length of each dimension. These are either declared or parameterized and passed as arguments. https://code.google.com/p/numexpr/wiki/Overview says that numexpr.evaluate(ex) first calls compile(ex), but does not say whether it has compile compile to cpython bytecode or only to ast. In either case, it converts arraywise operations to blockwise operations inside loops run on a custom C-coded virtual machine. They imply that this is not as good as elementwise operations compiled to native machine code. In any case, it knows that numpy array operations are side-effect free and must use the runtime dimension and size info. -- Terry Jan Reedy From guido at python.org Thu Dec 6 00:01:16 2012 From: guido at python.org (Guido van Rossum) Date: Wed, 5 Dec 2012 15:01:16 -0800 Subject: [Python-ideas] Conventions for function annotations In-Reply-To: References: <4B6491A4-315B-4C39-A0F2-42F0EFB42ADA@gmail.com> <9C9240AB-CE2D-4E0D-B74C-526EEB09AEB5@gmail.com> <20121203103416.03094472@resist.wooz.org> <55D9BA21-0C74-4958-A9C7-0C0969366F93@masklinn.net> Message-ID: On Wed, Dec 5, 2012 at 12:01 PM, Bruce Leban wrote: > > > On Wed, Dec 5, 2012 at 11:22 AM, Guido van Rossum wrote: >> >> - Unions. We need a way to say "either X or Y". Given that we're >> defining our own objects we may actually be able to get away with >> writing e.g. "Int | Str" or "Str | List[Str]", and isinstance() would >> still work. It would also be useful to have a shorthand for "either T >> or None", written as Optional[T] or Optional(T). > > > Optional is not the same as "or None" to me: > > Dict(a=Int, b=Int | None, c=Optional(Int)) > > > suggests that b is required but might be None while c is not required, i.e., > {'a': 3, b: None} is allowed while {'a': 3, c: None} is not. > > Ditto for Tuples: > > Tuple[Int, Str | None, Optional(Int)] > > where (3, None) matches as does (3, 'a', 4) but not (3, None, None). 
> > Optionals might be restricted to the end as matching in the middle would be > complicated and possibly error-prone: > > Tuple[Int, Optional(Int | None), Int | Str, Int | None] Those are not the semantics I had in mind for Optional. -- --Guido van Rossum (python.org/~guido) From guido at python.org Thu Dec 6 00:06:01 2012 From: guido at python.org (Guido van Rossum) Date: Wed, 5 Dec 2012 15:06:01 -0800 Subject: [Python-ideas] Conventions for function annotations In-Reply-To: <4F969DC0-2B67-4C35-B0E7-EEEAD992E840@masklinn.net> References: <4B6491A4-315B-4C39-A0F2-42F0EFB42ADA@gmail.com> <9C9240AB-CE2D-4E0D-B74C-526EEB09AEB5@gmail.com> <20121203103416.03094472@resist.wooz.org> <55D9BA21-0C74-4958-A9C7-0C0969366F93@masklinn.net> <4F969DC0-2B67-4C35-B0E7-EEEAD992E840@masklinn.net> Message-ID: On Wed, Dec 5, 2012 at 12:34 PM, Masklinn wrote: > On 2012-12-05, at 20:22 , Guido van Rossum wrote: >> >>> The most bothersome part is that I "feel" "either X or Y" (aka `X | Y`) >>> should be a set of type (and thus the same as {X, Y}[0]) but that doesn't >>> work with `isinstance` or `issubclass`. Likewise, `(a, b, c)` in an >>> annotation feels like it should mean the same as `tuple[a, b, c]` ("a >>> tuple with 3 items of types resp. a, b and c") but that's at odds with >>> the same type-checking functions. >> >> Note that in Python 3 you can override isinstance, by defining >> __instancecheck__ in the class: >> http://docs.python.org/3/reference/datamodel.html?highlight=__instancecheck__#class.__instancecheck__ >> >> So it shouldn't be a problem to make isinstance(42, Int) work. > > My problem there was more about having e.g. Int | Float return a set, > but isinstance not working with a set. But indeed it could return a > TypeSet which would implement __instancecheck__. Right, that's what I meant. >> - Tuples. Sometimes you want to say e.g. "a tuple of integers, don't >> mind the length"; other times you want to say e.g. "a tuple of fixed >> length containing an int and two strs". Perhaps the former should be >> expressed using ImmutableSequence[Int] and the second as Tuple[Int, >> Str, Str]. > > > >> - Unions. We need a way to say "either X or Y". Given that we're >> defining our own objects we may actually be able to get away with >> writing e.g. "Int | Str" or "Str | List[Str]", and isinstance() would >> still work. It would also be useful to have a shorthand for "either T >> or None", written as Optional[T] or Optional(T). > > Well if `|` is the "union operator", as Ben notes `T | None` works well, > is clear and is sufficient. Though that's if and only if "Optional[T]" > is equivalent to "T or None" which Bruce seems to disagree with. There's > some history with this pattern: > http://journal.stuffwithstuff.com/2010/08/23/void-null-maybe-and-nothing/ > (bottom section, from "Or Some Other Solution") Actually, I find "T|None" somewhat impure, since None is not a type but a value. If you were allow this, what about "T|False"? And then what about "True|None"? (There's no way to make the latter work!) And I think "T|NoneType" is obscure; hence my proposal of Optional(T). (Not Optional[T], since Optional is not a type.) >> - Whether to design notations to express other constraints. E.g. >> "integer in range(10, 100)", or "one of the strings 'r', 'w' or 'a'", >> etc. You can go crazy on this. > > Yes this is going in Oleg territory, a sound core is probably a > good starting idea. Although basic enumerations ("one of the strings > 'r', 'w' or 'a'") could be rather neat. 
> >> - Composability (Nick's pet peeve, in that he is against it). I >> propose that we reserve plain tuples for this. If an annotation has >> the form "x: (P, Q)" then that ought to mean that x must conform to >> both P and Q. Even though Nick doesn't like this, I don't think we >> should do everything with decorators. Surly, the decorators approach >> is good for certain use cases, and should take precedence if it is >> used. But e.g. IDEs that use annotations for suggestions and >> refactoring should not require everything to be decorated -- that >> would just make the code too busy. >> >> - Runtime enforcement. What should we use type annotations for? IDEs, >> static checkers (linters) and refactoring tools only need the >> annotations when they are parsing the code. > > For IDEs, that's pretty much all the time though, either they're parsing > the code or they're trying to perform static analysis on it, which uses > the annotations. Yeah, they're parsing it, but they're not executing it. >> While it is tempting to >> invent some kind of runtime checking that automatically checks the >> actual types against the annotations whenever a function is called, I >> think this is rarely useful, and often prohibitively slow. > > Could be useful for debug or testing runs though, in the same way > event-based profilers are prohibitively slow and can't be enabled all > the time but are still useful. Plus it might be possible to > enable/disable this mechanism with little to no source modification via > sys.setprofile (I'm not sure what hooks it provides exactly and the > documentation is rather sparse, so I'm not sure if the function object > itself is available to the setprofile callback, looking at > Lib/profiler.py it might only get the code object). Hence my idea of using a decorator to enable this on specific functions. -- --Guido van Rossum (python.org/~guido) From bruce at leapyear.org Thu Dec 6 01:13:51 2012 From: bruce at leapyear.org (Bruce Leban) Date: Wed, 5 Dec 2012 16:13:51 -0800 Subject: [Python-ideas] Conventions for function annotations In-Reply-To: References: <4B6491A4-315B-4C39-A0F2-42F0EFB42ADA@gmail.com> <9C9240AB-CE2D-4E0D-B74C-526EEB09AEB5@gmail.com> <20121203103416.03094472@resist.wooz.org> <55D9BA21-0C74-4958-A9C7-0C0969366F93@masklinn.net> Message-ID: On Wed, Dec 5, 2012 at 3:01 PM, Guido van Rossum wrote: > > Those are not the semantics I had in mind for Optional. I know that. My point was that the standard meaning of the word optional is that something may or may not be given (or whatever the applicable verb is). That's quite different from saying it must be provided but may be None. Since you invited a bit of bikeshedding, I felt it was appropriate to point that out and then I got distracted by discussing the alternative that you weren't talking about. Sorry that was confusing. In C#, this is called Nullable and you can write Nullable to indicate the type (String or null type). The shorthand for that is String?. If you want a shorthand to specify that None is allowed, I'd suggest ~Str. --- Bruce P.S. Optional[T] is not literally a shorthand for T | None as the former is 11 characters and the latter is 10 characters even if we include and count the spaces. :-) P.P.S. I don't think Str | None rather than Str | NoneType is confusing. -------------- next part -------------- An HTML attachment was scrubbed... 
URL:

From ncoghlan at gmail.com Thu Dec 6 06:27:21 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 6 Dec 2012 15:27:21 +1000 Subject: [Python-ideas] Conventions for function annotations In-Reply-To: References: <4B6491A4-315B-4C39-A0F2-42F0EFB42ADA@gmail.com> <9C9240AB-CE2D-4E0D-B74C-526EEB09AEB5@gmail.com> <20121203103416.03094472@resist.wooz.org> <55D9BA21-0C74-4958-A9C7-0C0969366F93@masklinn.net> Message-ID:

On Thu, Dec 6, 2012 at 5:22 AM, Guido van Rossum wrote: > - Composability (Nick's pet peeve, in that he is against it). I > propose that we reserve plain tuples for this. If an annotation has > the form "x: (P, Q)" then that ought to mean that x must conform to > both P and Q. Even though Nick doesn't like this, I don't think we > should do everything with decorators. Surly, the decorators approach > is good for certain use cases, and should take precedence if it is > used. But e.g. IDEs that use annotations for suggestions and > refactoring should not require everything to be decorated -- that > would just make the code too busy.
>

I'm not against using composition within a particular set of annotation semantics, I'm against developing a convention for arbitrary composition of annotations with *different* semantics. Instead, I'm advocating for the following guidelines to avoid treading on each other's toes when experimenting with annotations and to leave scope for us to define standard annotation semantics at a future date:

1. Always use a decorator that expresses the annotation semantics in use (e.g. tab completion, type descriptions, parameter documentation)
2. Always *move* the annotations out to purpose-specific storage as part of the decorator (don't leave them in the annotations storage)
3. When analysing a function later, use only the purpose-specific attribute(s), not the raw annotations storage
4. To support composition with other sets of annotation semantics, always provide an alternate API that accepts the per-parameter details directly (e.g. by name or index) rather than relying solely on the annotations

The reason for this is so that if, at some future point in time, python-dev agrees to bless some particular set of semantics as *the* meaning of function annotations (such as the type hinting system being discussed), then that won't break anything. Otherwise, if people believe that it's OK for them to simply assume that the contents of the annotations mean whatever they mean for their particular project, then it *will* cause problems further down the road as annotations written for one set of semantics (e.g. tab completion, parameter documentation) get interpreted by a processor expecting different semantics (e.g. type hinting).

Here's how your example experiment would look under such a scheme:

from experimental_type_annotations import type_hints, Int, Str, Float

# After type_hints runs, foo.__annotations__ would be empty, and the type
# hinting data would instead be stored in (e.g.) a foo._type_hints attribute.
@type_hints
def foo(a: Int, b: Str) -> Float:

This is then completely clear and unambiguous:
- readers can see clearly that these annotations are intended as type hints
- the type hinting processor can see that there *is* type hinting information available, due to the presence of a _type_hints attribute
- other automated processors see that there are no "default" annotations (which is good, since there is currently no such thing as "default" annotation semantics)

Furthermore, (as noted elsewhere in the thread) an alternate API can then easily be provided that supports composition with other annotations:

@type_hints(Int, Str, _return=Float)
def foo(a, b):

Cheers, Nick.

-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia -------------- next part -------------- An HTML attachment was scrubbed... URL:

From tjreedy at udel.edu Thu Dec 6 06:27:33 2012 From: tjreedy at udel.edu (Terry Reedy) Date: Thu, 06 Dec 2012 00:27:33 -0500 Subject: [Python-ideas] A bind protocol (was Re: New __reference__ hook) In-Reply-To: References: Message-ID:

On 12/5/2012 3:40 PM, Eric Snow wrote: > (from the "Re: New __reference__ hook" thread) > > On Wed, Dec 5, 2012 at 11:54 AM, Bruce Leban wrote: >> There is another way to write expressions that don't get evaluated: >> >> lambda: a*x + b*y + c >> >> >> So you could write this as z.bind(lambda: rhs) or if this is important >> enough there could be a new bind operator: >> >> lhs @= rhs >> >> >> which is equivalent to >> >> lhs.__bind__(lambda: rhs)

This makes no sense to me. The targets of bind statements are not Python objects and do not have methods. They may be 'slots' in a Python object or may be turned into Python objects (strings), but within functions, they are not. In CPython, function local names are turned into C ints or uints.

> The lazy/lambda part aside, such an operator would somewhat help with > performance concerns and allow the "binder" to control when the > "bindee" gets notified.

So this does not make much sense either.

-- Terry Jan Reedy

From guido at python.org Thu Dec 6 06:54:25 2012 From: guido at python.org (Guido van Rossum) Date: Wed, 5 Dec 2012 21:54:25 -0800 Subject: [Python-ideas] Conventions for function annotations In-Reply-To: References: <4B6491A4-315B-4C39-A0F2-42F0EFB42ADA@gmail.com> <9C9240AB-CE2D-4E0D-B74C-526EEB09AEB5@gmail.com> <20121203103416.03094472@resist.wooz.org> <55D9BA21-0C74-4958-A9C7-0C0969366F93@masklinn.net> Message-ID:

Hi Nick, I understand your position completely (and I did before). I just disagree. :-) I think that requiring the experiment I am proposing to use a decorator on each function that uses it (rather than just an import at the top of the module) will cause too much friction, and the experiment won't get off the ground. That's why I am proposing a universal composition convention: When an annotation for a particular argument is a tuple, then any framework or decorator that tries to assign meanings to annotations must search the items of the tuple for one that it can understand. For the experimental type annotation system I am proposing this should be simple enough -- the type annotation system can require that the things it cares about must all be subclasses of a specific base class (let's call it TypeConstraint). If the annotation is not a tuple, it should be interpreted as a singleton tuple. Yes, it is possible that a mistake leaves an annotation unclaimed. But that's no worse than currently, where all annotations are ignored.
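In code, the consuming side of that convention could be as small as something like this (TypeConstraint and the helper name are just stand-ins for illustration):

class TypeConstraint:
    """Stand-in base class for whatever the experiment defines."""

def type_constraints(func):
    """Pick out, per parameter, the annotation parts this tool understands."""
    found = {}
    for name, ann in func.__annotations__.items():
        parts = ann if isinstance(ann, tuple) else (ann,)   # singleton-tuple rule
        mine = [part for part in parts if isinstance(part, TypeConstraint)]
        if mine:
            found[name] = mine
    return found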
And for TypeConstraint there is no runtime behavior anyway (unless you *do* add a decorator) -- its annotations are there for other tools to parse and interpret. It's like pylint directives -- if you accidentally misspell it 'pylnt' you get no error (but you may still notice that something's fishy, because when pylint runs it doesn't suppress the thing you tried to suppress :-). --Guido On Wed, Dec 5, 2012 at 9:27 PM, Nick Coghlan wrote: > On Thu, Dec 6, 2012 at 5:22 AM, Guido van Rossum wrote: >> >> - Composability (Nick's pet peeve, in that he is against it). I >> propose that we reserve plain tuples for this. If an annotation has >> the form "x: (P, Q)" then that ought to mean that x must conform to >> both P and Q. Even though Nick doesn't like this, I don't think we >> should do everything with decorators. Surly, the decorators approach >> is good for certain use cases, and should take precedence if it is >> used. But e.g. IDEs that use annotations for suggestions and >> refactoring should not require everything to be decorated -- that >> would just make the code too busy. > > > I'm not against using composition within a particular set of annotation > semantics, I'm against developing a convention for arbitrary composition of > annotations with *different* semantics. > > Instead, I'm advocating for the following guidelines to avoid treading on > each others toes when experimenting with annotations and to leave scope for > us to define standard annotation semantics at a future date: > > 1. Always use a decorator that expresses the annotation semantics in use > (e.g. tab completion, type descriptions, parameter documentation) > 2. Always *move* the annotations out to purpose-specific storage as part of > the decorator (don't leave them in the annotations storage) > 3. When analysing a function later, use only the purpose-specific > attribute(s), not the raw annotations storage > 4. To support composition with other sets of annotation semantics, always > provide an alternate API that accepts the per-parameter details directly > (e.g. by name or index) rather than relying solely on the annotations > > The reason for this is so that if, at some future point in the time, > python-dev agrees to bless some particular set of semantics as *the* meaning > of function annotations (such as the type hinting system being discussed), > then that won't break anything. Otherwise, if people believe that it's OK > for them to simply assume that the contents of the annotations mean whatever > they mean for their particular project, then it *will* cause problems > further down the road as annotations written for one set of semantics (e.g. > tab completion, parameter documentation) get interpreted by a processor > expecting different semantics (e.g. type hinting). > > Here's how your example experiment would look under such a scheme: > > from experimental_type_annotations import type_hints, Int, Str, Float > > # After type_hints runs, foo.__annotations__ would be empty, and the > type > # hinting data would instead be stored in (e.g.) a foo._type_hints > attribute. 
> @type_hints > > def foo(a: Int, b: Str) -> Float: > > > This is then completely clear and unambigious: > - readers can see clearly that these annotations are intended as type hints > - the type hinting processor can see that there *is* type hinting > information available, due to the presence of a _type_hints attribute > - other automated processors see that there are no "default" annotations > (which is good, since there is currently no such thing as "default" > annotation semantics) > > Furthermore, (as noted elsewhere in the thread) an alternate API can then > easily be provided that supports composition with other annotations: > > @type_hints(Int, Str, _return=Float) > def foo(a, b): > > > Cheers, > Nick. > > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia -- --Guido van Rossum (python.org/~guido) From aquavitae69 at gmail.com Thu Dec 6 08:23:31 2012 From: aquavitae69 at gmail.com (David Townshend) Date: Thu, 6 Dec 2012 09:23:31 +0200 Subject: [Python-ideas] Conventions for function annotations In-Reply-To: References: <4B6491A4-315B-4C39-A0F2-42F0EFB42ADA@gmail.com> <9C9240AB-CE2D-4E0D-B74C-526EEB09AEB5@gmail.com> <20121203103416.03094472@resist.wooz.org> Message-ID: On Wed, Dec 5, 2012 at 8:17 PM, Guido van Rossum wrote: > On Tue, Dec 4, 2012 at 1:37 AM, David Townshend > wrote: > > Just thought of a couple of usages which don't fit into the decorator > model. > > The first is using the return annotation for early binding: > > > > def func(seq) -> dict(sorted=sorted): > > return func.__annotations__['return']['sorted'](seq) > > You've got to be kidding... > > > Stangely enough, this seems to run slightly faster than > > > > def func(seq, sorted=sorted): > > return sorted(seq) > > > > My test shows the first running in about 0.376s and the second in about > > 0.382s (python 3.3, 64bit). > > Surely that's some kind of random variation. It's only a 2% difference. > It's consistent. I ran several tests and came out with the same 2% difference every time. > IOW, this is not a line of thought to pursue. > > I wasn't suggesting that this is a good idea, I was merely trying to point out that there are currently ways of using annotations beyond type declarations with decorators, and that there may be other use cases out there which will work well. Documenting recommendations that annotations only be used with decorators, or only be used for type declarations will limit the possibilities because nobody will bother to look further, and if they do, the ideas will no doubt be shut down as being bad style because they go against the recommended usage. I thought that limiting annotations like this was what you wanted to avoid? Having said that, I've never found a good use for annotations in my own code, so I'm not emotionally invested one way or the other. I do think that the best usage I've seen is exactly what is being discussed here and it would be great if there was some prescribed use for annotations. Perhaps people would actually use them then. David -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From masklinn at masklinn.net Thu Dec 6 09:43:34 2012 From: masklinn at masklinn.net (Masklinn) Date: Thu, 6 Dec 2012 09:43:34 +0100 Subject: [Python-ideas] Conventions for function annotations In-Reply-To: References: <4B6491A4-315B-4C39-A0F2-42F0EFB42ADA@gmail.com> <9C9240AB-CE2D-4E0D-B74C-526EEB09AEB5@gmail.com> <20121203103416.03094472@resist.wooz.org> <55D9BA21-0C74-4958-A9C7-0C0969366F93@masklinn.net> <4F969DC0-2B67-4C35-B0E7-EEEAD992E840@masklinn.net> Message-ID: <33AD9673-BFDD-4C1E-8149-BAC13ADB29BB@masklinn.net> On 2012-12-06, at 00:06 , Guido van Rossum wrote: > >>> - Unions. We need a way to say "either X or Y". Given that we're >>> defining our own objects we may actually be able to get away with >>> writing e.g. "Int | Str" or "Str | List[Str]", and isinstance() would >>> still work. It would also be useful to have a shorthand for "either T >>> or None", written as Optional[T] or Optional(T). >> >> Well if `|` is the "union operator", as Ben notes `T | None` works well, >> is clear and is sufficient. Though that's if and only if "Optional[T]" >> is equivalent to "T or None" which Bruce seems to disagree with. There's >> some history with this pattern: >> http://journal.stuffwithstuff.com/2010/08/23/void-null-maybe-and-nothing/ >> (bottom section, from "Or Some Other Solution") > > Actually, I find "T|None" somewhat impure, since None is not a type > but a value. If you were allow this, what about "T|False"? And then > what about "True|None"? (There's no way to make the latter work!) And > I think "T|NoneType" is obscure; hence my proposal of Optional(T). > (Not Optional[T], since Optional is not a type.) Why would Optional not be a type? It's coherent with Option or Maybe types in languages with such features, or C#'s Nullable. From andrew.svetlov at gmail.com Thu Dec 6 15:17:35 2012 From: andrew.svetlov at gmail.com (Andrew Svetlov) Date: Thu, 6 Dec 2012 16:17:35 +0200 Subject: [Python-ideas] Conventions for function annotations In-Reply-To: References: <4B6491A4-315B-4C39-A0F2-42F0EFB42ADA@gmail.com> <9C9240AB-CE2D-4E0D-B74C-526EEB09AEB5@gmail.com> <20121203103416.03094472@resist.wooz.org> <55D9BA21-0C74-4958-A9C7-0C0969366F93@masklinn.net> Message-ID: On Wed, Dec 5, 2012 at 9:22 PM, Guido van Rossum wrote: > - Unions. We need a way to say "either X or Y". Given that we're > defining our own objects we may actually be able to get away with > writing e.g. "Int | Str" or "Str | List[Str]", and isinstance() would > still work. It would also be useful to have a shorthand for "either T > or None", written as Optional[T] or Optional(T). Just to note: there are https://github.com/Deepwalker/trafaret library intended for checking on complex enough structures. From random832 at fastmail.us Thu Dec 6 20:56:07 2012 From: random832 at fastmail.us (random832 at fastmail.us) Date: Thu, 06 Dec 2012 14:56:07 -0500 Subject: [Python-ideas] Conventions for function annotations In-Reply-To: <33AD9673-BFDD-4C1E-8149-BAC13ADB29BB@masklinn.net> References: <4B6491A4-315B-4C39-A0F2-42F0EFB42ADA@gmail.com> <9C9240AB-CE2D-4E0D-B74C-526EEB09AEB5@gmail.com> <20121203103416.03094472@resist.wooz.org> <55D9BA21-0C74-4958-A9C7-0C0969366F93@masklinn.net> <4F969DC0-2B67-4C35-B0E7-EEEAD992E840@masklinn.net> <33AD9673-BFDD-4C1E-8149-BAC13ADB29BB@masklinn.net> Message-ID: <1354823767.1386.140661162801849.0482DD14@webmail.messagingengine.com> On Thu, Dec 6, 2012, at 3:43, Masklinn wrote: > Why would Optional not be a type? 
It's coherent with Option or Maybe > types in languages with such features, or C#'s Nullable. C#'s Nullable doesn't really work outside a static typing system - when you assign a Nullable to an 'object' or a 'dynamic', you get either the original type (e.g. Int32) or a null reference (which has no type). It's a real type only as far as the static typing system goes: it can be the type of a field or a local variable, it _cannot_ be the type of an object on the heap. And since python doesn't have static typing... From dreamingforward at gmail.com Fri Dec 7 22:45:25 2012 From: dreamingforward at gmail.com (Mark Adam) Date: Fri, 7 Dec 2012 15:45:25 -0600 Subject: [Python-ideas] Graph class Message-ID: I have a decent semi-recursive Graph class that I think could be a good addition to the Collections module. It probably needs some refactoring, but I'm posting here to see if there's any interest. For those who aren't too abreast of CS theory, a graph is one of the most abstract data structures in computer science, encompassing trees, and lists. I'm a bit surprised that no one's offered one up yet, so I'll present mine. The code is at http://github.com/theProphet/Social-Garden under the pangaia directly called graph.py. It has a default dictionary (defdict.py) dependency that I made before Python came up with it on it's own (another place for refactoring). Cheers, MarkJ From thomas at kluyver.me.uk Fri Dec 7 23:22:42 2012 From: thomas at kluyver.me.uk (Thomas Kluyver) Date: Fri, 7 Dec 2012 22:22:42 +0000 Subject: [Python-ideas] Graph class In-Reply-To: References: Message-ID: On 7 December 2012 21:45, Mark Adam wrote: > I have a decent semi-recursive Graph class that I think could be a > good addition to the Collections module. It probably needs some > refactoring, but I'm posting here to see if there's any interest. > For reference, there was a previous idea to make some kind of standard Graph API: http://wiki.python.org/moin/PythonGraphApi When I had to implement a really simple DAG myself, I based it on this Graph ABC library: http://www.linux.it/~della/GraphABC/ Best wishes, Thomas -------------- next part -------------- An HTML attachment was scrubbed... URL: From tjreedy at udel.edu Sat Dec 8 08:17:17 2012 From: tjreedy at udel.edu (Terry Reedy) Date: Sat, 08 Dec 2012 02:17:17 -0500 Subject: [Python-ideas] Graph class In-Reply-To: References: Message-ID: On 12/7/2012 4:45 PM, Mark Adam wrote: > I have a decent semi-recursive Graph class that I think could be a > good addition to the Collections module. It probably needs some > refactoring, but I'm posting here to see if there's any interest. > > For those who aren't too abreast of CS theory, a graph is one of the > most abstract data structures in computer science, encompassing trees, > and lists. I'm a bit surprised that no one's offered one up yet, so > I'll present mine. I believe there are are multiple graph modules and packages, but none is really dominant. It is partly because there are multiple representations and the best depends on the problem. > The code is at http://github.com/theProphet/Social-Garden under the > pangaia directly called graph.py. It has a default dictionary > (defdict.py) dependency that I made before Python came up with it on > it's own (another place for refactoring). 
-- Terry Jan Reedy

From storchaka at gmail.com Sat Dec 8 09:07:40 2012 From: storchaka at gmail.com (Serhiy Storchaka) Date: Sat, 08 Dec 2012 10:07:40 +0200 Subject: [Python-ideas] Graph class In-Reply-To: References: Message-ID:

On 07.12.12 23:45, Mark Adam wrote: > I have a decent semi-recursive Graph class that I think could be a > good addition to the Collections module. It probably needs some > refactoring, but I'm posting here to see if there's any interest. > > For those who aren't too abreast of CS theory, a graph is one of the > most abstract data structures in computer science, encompassing trees, > and lists. I'm a bit surprised that no one's offered one up yet, so > I'll present mine. > > The code is at http://github.com/theProphet/Social-Garden under the > pangaia directly called graph.py. It has a default dictionary > (defdict.py) dependency that I made before Python came up with it on > it's own (another place for refactoring).

A graph is too abstract a conception. There are a lot of implementations of graphs. Every non-trivial program contains some (perhaps implicit) graphs. See also, for some implementations: Magnus Lie Hetland, "Python Algorithms. Mastering Basic Algorithms in the Python Language".

From dreamingforward at gmail.com Sun Dec 9 02:29:56 2012 From: dreamingforward at gmail.com (Mark Adam) Date: Sat, 8 Dec 2012 19:29:56 -0600 Subject: [Python-ideas] Graph class In-Reply-To: References: Message-ID:

On Fri, Dec 7, 2012 at 4:22 PM, Thomas Kluyver wrote: > On 7 December 2012 21:45, Mark Adam wrote: >> >> I have a decent semi-recursive Graph class that I think could be a >> good addition to the Collections module. It probably needs some >> refactoring, but I'm posting here to see if there's any interest. > > For reference, there was a previous idea to make some kind of standard Graph > API: > http://wiki.python.org/moin/PythonGraphApi

All very interesting. I'm going to suggest a sort of "meta-discussion" about why -- despite the power of graphs as a data structure -- such a feature has not stabilized into a workable solution for inclusion in a high-level language like Python. I identify the following points of "wavery":

1) the naming of methods (add_edge, vs add(1,2)): *aesthetic grounds*,
2) what methods to include (degree + neighbors or the standard dict's __len__ + __getitem__): *API grounds*
3) how much flexibility to be offered (directed, multi-graphs, edge weights with arbitrary labeling, etc.): *functionality grounds*
4) what underlying data structure to use (sparse adjacency dicts, matrices, etc): *representation conflicts*.

And upon further thought, it looks like only a killer application could ever settle the issue(s) to make it part of the standard library.

mark

-------------- next part -------------- An HTML attachment was scrubbed... URL:

From p.f.moore at gmail.com Sun Dec 9 12:40:05 2012 From: p.f.moore at gmail.com (Paul Moore) Date: Sun, 9 Dec 2012 11:40:05 +0000 Subject: [Python-ideas] Graph class In-Reply-To: References: Message-ID:

On 9 December 2012 01:29, Mark Adam wrote: > All very interesting. I'm going to suggest a sort of "meta-discussion" > about why -- despite the power of graphs as a data structure -- such a > feature has not stabilized into a workable solution for inclusion in a > high-level language like Python.
> > I identify the following points of "wavery": > > 1) the naming of methods (add_edge, vs add(1,2)): aesthetic grounds, > 2) what methods to include (degree + neighbors or the standard dict's > __len__ + __getitem__): API grounds > 3) how much flexibility to be offered (directed, multi-graphs, edge weights > with arbitrary labeling, etc.): functionality grounds > 4) what underlying data structure to use (sparse adjacency dicts, matrices, > etc): representation conflicts.

4) Whether the library requires some sort of "Vertex" type, or works with arbitrary values, similarly whether there is a defined "Edge" class or edges can be labelled, weighted, etc with arbitrary Python values.

5) Granularity - if all I want is a depth-first search algorithm, why pull in a dependency on 100 graph algorithms I'm not interested in?

My feeling is that graphs are right on the borderline of a data structure that is simple enough that people invent their own rather than bother conforming to a "standard" model but complex enough that it's worth using library functions rather than getting the details wrong. In C, there are many examples of this type of "borderline" stuff - linked lists, maps, sorting and searching algorithms, etc. In Python, lists, dictionaries, sorting, etc are all "self evidently" basic building blocks, but graphs hit that borderline area.

Paul

From allyourcode at gmail.com Sun Dec 9 20:33:43 2012 From: allyourcode at gmail.com (Daniel Wong) Date: Sun, 9 Dec 2012 11:33:43 -0800 (PST) Subject: [Python-ideas] Conventions for function annotations In-Reply-To: References: Message-ID: <2c69dfe4-5c90-4670-b747-7734e37ccb83@googlegroups.com>

This proposal looks great. The only thing is that I don't understand the point of annotations in the first place, since Python has decorators. As the last part of your post describes, decorators can be used to do the same thing. With decorators, it is even possible to use annotation-like syntax:

import inspect

def defaults_as_parameter_metadata(f):
    names, args_name, kwargs_name, defaults = inspect.getargspec(f)
    assert len(names) == len(defaults)  # To keep this example simple...
    f.parameter_metadata = {}
    for name, meta in zip(names, defaults):
        f.parameter_metadata[name] = meta
    f.__defaults__ = ()  # Again, for simplicity.
    return f

@defaults_as_parameter_metadata
def make_ice_cream(flavor=(options('vanilla', 'chocolate', ...), str,
                           "What kind of delicious do you want?"),
                   quantity=(positive, double,
                             "How much (in pounds) do you want?")):
    ...

I know this addresses a different issue, but I was directed to this thread from an answer that I got on StackOverflow, and this thread seems related enough. Sorry if I'm going off the rails here.

On Saturday, December 1, 2012 4:28:50 AM UTC-8, Thomas Kluyver wrote: > > Function annotations (PEP 3107) are a very interesting new feature, but so > far have gone largely unused. The only project I've seen using them is > plac, a command-line option parser. One reason for this is that because > function annotations can be used to mean anything, we're wary of doing > anything in case we interfere with some other use case. A recent thread on > ipython-dev touched on this [1], and we'd like to suggest some conventions > to make annotations useful for everyone. > > 1. Code inspecting annotations should be prepared to ignore annotations it > can't understand. > > 2. Code creating annotations should use wrapper classes to indicate what > the annotation means.
For instance, we are contemplating a way to specify > options for a parameter, to be used in tab completion, so we would do > something like this: > > from IPython.core.completer import options > def my_io(filename, mode: options('read','write') ='read'): > ... > > 3. There are a couple of important exceptions to 2: > - Annotations that are simply a string can be used like a docstring, to be > displayed to the user. Inspecting code should not expect to be able to > parse any machine-readable information out of these strings. > - Annotations that are a built-in type (int, str, etc.) indicate that the > value should always be an instance of that type. Inspecting code may use > these for type checking, introspection, optimisation, or other such > purposes. Note that for now, I have limited this to built-in types, so > other types can be used for other purposes, but this could be extended. For > instance, the ABCs from collections (collections.Mapping et al.) could well > be added to this category. > > 4. There should be a convention for attaching multiple annotations to one > value. I propose that all code using annotations expects to handle > tuples/lists of annotations. (We also considered dictionaries, but the > result is long and ugly). So in this definition: > > def my_io(filename, mode: (options('read','write'), str, 'The mode in > which to open the file') ='read'): > ... > > the mode parameter has a set of options (ignored by frameworks that don't > recognise it), should always be a string, and has a description. > > Any thoughts and suggestions are welcome. > > As an aside, we may also create a couple of decorators to fill in > __annotations__ on Python 2, something like: > > @return_annotation('A file obect') > @annotations(mode=(options('read','write'), str, 'The mode in which to > open the file')) > def my_io(filename, mode='read'): > ... > > [1] http://mail.scipy.org/pipermail/ipython-dev/2012-November/010697.html > > > Thanks, > Thomas > -------------- next part -------------- An HTML attachment was scrubbed... URL: From dreamingforward at gmail.com Sun Dec 9 21:31:32 2012 From: dreamingforward at gmail.com (Mark Adam) Date: Sun, 9 Dec 2012 14:31:32 -0600 Subject: [Python-ideas] Fwd: Graph class In-Reply-To: References: Message-ID: Meant this to go to the whole list. Sorry. On Sun, Dec 9, 2012 at 5:40 AM, Paul Moore wrote: > On 9 December 2012 01:29, Mark Adam wrote: >> All very interesting. I'm going to suggest a sort of "meta-discussion" >> about why -- despite the power of graphs as a data structure -- such a >> feature has not stabilized into a workable solution for inclusion in a >> high-level language like Python. >> >> I identity the following points of "wavery": >> >> 1) the naming of methods (add_edge, vs add(1,2)): aesthetic grounds, >> 2) what methods to include (degree + neighbors or the standard dict's >> __len__ + __getitem__): API grounds >> 3) how much flexibility to be offered (directed, multi-graphs, edge weights >> with arbitrary labeling, etc.): functionality grounds >> 4) what underlying data structure to use (sparse adjacency dicts, matrices, >> etc): representation conflicts. > > 4) Whether the library requires some sort of "Vertex" type, or works > with arbitrary values, similarly whether there is a defined "Edge" > class or edges can be labelled, weighted, etc with arbitrary Python > values. This I put under #3 (functionality grounds) "edge weights with arbitrary labeling", Vertex's with abitrary values i think would be included. 
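(For concreteness -- and purely as a sketch, with made-up names -- even the crudest
dict-of-dicts work-around already copes with both points: vertices can be any hashable
object and edge labels any Python value:

graph = {}  # vertex -> {neighbour: edge_label}

def add_edge(g, u, v, label=None):
    # directed edge u -> v carrying an arbitrary label
    g.setdefault(u, {})[v] = label
    g.setdefault(v, {})  # keep isolated endpoints visible

add_edge(graph, "a", ("any", "hashable"), label={"weight": 3})
add_edge(graph, "a", "b", label="road")
print(len(graph["a"]))  # 2 outgoing edges, labels of different types

The question for a stdlib class is really how much more than this it should promise.)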
> 5) Granularity - if all I want is a depth-first search algorithm, why > pull in a dependency on 100 graph algorithms I'm not interested in? Hmm, I would call this "5) comprehensiveness: whether to include every graph algorithm known to mankind." > My feeling is that graphs are right on the borderline of a data > structure that is simple enough that people invent their own rather > than bother conforming to a "standard" model but complex enough that > it's worth using library functions rather than getting the details > wrong. But this is also why (on both counts) it would be good to include it in the standard library. The *simplicity* of a graph makes everyone re-implement it, or (worse) work with some cruder work-around (like a dict of dicts but not at all clear you're dealing with an actual graph). But imagine if, for example, Subversion used a python graph class to track all branches and nodes in it's distributed revision control system. Then how easy it would be for third parties to make tools to view repos or other developers to come in and work with the dev team: they're already familiar with the standard graph class structure. And for the *complex enough* case, obviously it helps to have a standard library help you out and just provide the sophistication of a graph class for you. There are a lot of obvious uses for a graph, but if you don't know of it, a beginning programmer won't *think* of it and make some crude work-around. Mark From solipsis at pitrou.net Sun Dec 9 21:53:45 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sun, 9 Dec 2012 21:53:45 +0100 Subject: [Python-ideas] Graph class References: Message-ID: <20121209215345.496fc6ef@pitrou.net> On Fri, 7 Dec 2012 15:45:25 -0600 Mark Adam wrote: > I have a decent semi-recursive Graph class that I think could be a > good addition to the Collections module. It probably needs some > refactoring, but I'm posting here to see if there's any interest. > > For those who aren't too abreast of CS theory, a graph is one of the > most abstract data structures in computer science, encompassing trees, > and lists. I'm a bit surprised that no one's offered one up yet, so > I'll present mine. > > The code is at http://github.com/theProphet/Social-Garden under the > pangaia directly called graph.py. It has a default dictionary > (defdict.py) dependency that I made before Python came up with it on > it's own (another place for refactoring). Do you know networkx? http://networkx.lanl.gov/ Regards Antoine. From stephen at xemacs.org Mon Dec 10 03:08:00 2012 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Mon, 10 Dec 2012 11:08:00 +0900 Subject: [Python-ideas] Fwd: Graph class In-Reply-To: References: Message-ID: <87txru1wxr.fsf@uwakimon.sk.tsukuba.ac.jp> Mark Adam writes: > graph). But imagine if, for example, Subversion used a python graph > class to track all branches and nodes in it's distributed revision > control system. Then how easy it would be for third parties to make > tools to view repos or other developers to come in and work with the > dev team: they're already familiar with the standard graph class > structure. This is a fallacy. As has been pointed out, there is a variety of graphs, a large variety of computations to be done on and in them, and a huge variety in algorithms for dealing with those varied tasks. For a "standard" graph class to be useful enough to become the OOWTDI, it would need to deal with a large fraction of those aspects of graph theory. 
Even so, people would only really internalize the parts they need for the present task, forgetting or (worse) misremembering functionality that doesn't work for them right now. Corner cases would force many tasks to be done outside of the standard class. Differences in taste would surely result in a large number of API variants to reflect users' preferred syntaxes for representing graphs, and so on. I think making a "Graph" class that has a chance of becoming the OOWTDI is a big task. Not as big as SciPy, say, but then, SciPy isn't being proposed for stdlib inclusion, either. As usual for stdlib additions, I think this discussion would best be advanced not by "going all meta", but rather by proposing specific packages (either already available, perhaps on PyPI, or new ones -- but with actual code) for inclusion. The "meta" discussion should be conducted with specific reference to the advantages or shortcomings of those specific packages. N.B. A reasonably comprehensive package that has seen significant real-world use, and preferably has a primary distribution point of PyPI, would be the shortest path to inclusion. From techtonik at gmail.com Wed Dec 12 10:14:21 2012 From: techtonik at gmail.com (anatoly techtonik) Date: Wed, 12 Dec 2012 12:14:21 +0300 Subject: [Python-ideas] Python is not perfect - let's add 'Wart' status to track Message-ID: I want to query all warts for specific Python 2.x versions to see how are they fixed in 3.x. Right now these warts are hidden beneath the "invalid" labels, which IMHO does as much damage to the language development as BC breaks. How about adding 'Wart' resolution to the closed status on tracker? -- anatoly t. -------------- next part -------------- An HTML attachment was scrubbed... URL: From solipsis at pitrou.net Wed Dec 12 10:45:44 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 12 Dec 2012 10:45:44 +0100 Subject: [Python-ideas] Python is not perfect - let's add 'Wart' status to track References: Message-ID: <20121212104544.06203118@pitrou.net> Le Wed, 12 Dec 2012 12:14:21 +0300, anatoly techtonik a ?crit : > I want to query all warts for specific Python 2.x versions to see how > are they fixed in 3.x. > > Right now these warts are hidden beneath the "invalid" labels, which > IMHO does as much damage to the language development as BC breaks. > > How about adding 'Wart' resolution to the closed status on tracker? That's what "won't fix" is for: things that we agree should ideally be fixed but that we keep it frozen for compatibility / other reasons. Regards Antoine. From storchaka at gmail.com Wed Dec 12 18:40:03 2012 From: storchaka at gmail.com (Serhiy Storchaka) Date: Wed, 12 Dec 2012 19:40:03 +0200 Subject: [Python-ideas] Docstrings for namedtuple Message-ID: What interface is better for specifying namedtuple field docstrings? Point = namedtuple('Point', 'x y', doc='Point: 2-dimensional coordinate', field_docs=['abscissa', 'ordinate']) or Point = namedtuple('Point', [('x', 'absciss'), ('y', 'ordinate')], doc='Point: 2-dimensional coordinate') ? http://bugs.python.org/issue16669 From solipsis at pitrou.net Wed Dec 12 20:35:47 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 12 Dec 2012 20:35:47 +0100 Subject: [Python-ideas] Docstrings for namedtuple References: Message-ID: <20121212203547.1d08a044@pitrou.net> On Wed, 12 Dec 2012 19:40:03 +0200 Serhiy Storchaka wrote: > What interface is better for specifying namedtuple field docstrings? 
> > Point = namedtuple('Point', 'x y', > doc='Point: 2-dimensional coordinate', > field_docs=['abscissa', 'ordinate']) field_docs={'x': 'abscissa', 'y': 'ordinate'} perhaps? From mal at egenix.com Wed Dec 12 20:56:40 2012 From: mal at egenix.com (M.-A. Lemburg) Date: Wed, 12 Dec 2012 20:56:40 +0100 Subject: [Python-ideas] Docstrings for namedtuple In-Reply-To: References: Message-ID: <50C8E178.3040106@egenix.com> On 12.12.2012 18:40, Serhiy Storchaka wrote: > What interface is better for specifying namedtuple field docstrings? > > Point = namedtuple('Point', 'x y', > doc='Point: 2-dimensional coordinate', > field_docs=['abscissa', 'ordinate']) > > or > > Point = namedtuple('Point', [('x', 'absciss'), ('y', 'ordinate')], > doc='Point: 2-dimensional coordinate') > > ? > > http://bugs.python.org/issue16669 IMO, attributes should be documented in the existing doc parameter, not separately. This makes the intention clear and the code overall more readable. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Dec 12 2012) >>> Python Projects, Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2012-12-05: Released eGenix pyOpenSSL 0.13 ... http://egenix.com/go37 2012-11-28: Released eGenix mx Base 3.2.5 ... http://egenix.com/go36 2013-01-22: Python Meeting Duesseldorf ... 41 days to go ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From storchaka at gmail.com Wed Dec 12 20:57:58 2012 From: storchaka at gmail.com (Serhiy Storchaka) Date: Wed, 12 Dec 2012 21:57:58 +0200 Subject: [Python-ideas] Docstrings for namedtuple In-Reply-To: <20121212203547.1d08a044@pitrou.net> References: <20121212203547.1d08a044@pitrou.net> Message-ID: On 12.12.12 21:35, Antoine Pitrou wrote: > field_docs={'x': 'abscissa', 'y': 'ordinate'} perhaps? This will force repeat the field names twice. If we have such docs_dict, we can use it as: field_names = ['x', 'y'] Point = namedtuple('Point', field_names, field_docs=list(map(docs_dict.get, field_names))) or as Point = namedtuple('Point', [(f, docs_dict.get(f)) for f in field_names]) In case of ordered dict it can be even simpler: Point = namedtuple('Point', ordered_dict.keys(), field_docs=list(ordered_dict.values())) or Point = namedtuple('Point', ordered_dict.items()) From storchaka at gmail.com Wed Dec 12 21:12:24 2012 From: storchaka at gmail.com (Serhiy Storchaka) Date: Wed, 12 Dec 2012 22:12:24 +0200 Subject: [Python-ideas] Docstrings for namedtuple In-Reply-To: <50C8E178.3040106@egenix.com> References: <50C8E178.3040106@egenix.com> Message-ID: On 12.12.12 21:56, M.-A. Lemburg wrote: > IMO, attributes should be documented in the existing doc parameter, > not separately. This makes the intention clear and the code overall > more readable. Sorry, I didn't understand what you mean. There is no doc parameter for namedtuple yet. For overloading class docstring we can use inheritance idiom. But there is no way to change field docstring. All field docstrings generated using template 'Alias for field number {index:d}'. From mal at egenix.com Wed Dec 12 21:19:19 2012 From: mal at egenix.com (M.-A. 
Lemburg) Date: Wed, 12 Dec 2012 21:19:19 +0100 Subject: [Python-ideas] Docstrings for namedtuple In-Reply-To: References: <50C8E178.3040106@egenix.com> Message-ID: <50C8E6C7.6010807@egenix.com> On 12.12.2012 21:12, Serhiy Storchaka wrote: > On 12.12.12 21:56, M.-A. Lemburg wrote: >> IMO, attributes should be documented in the existing doc parameter, >> not separately. This makes the intention clear and the code overall >> more readable. > > Sorry, I didn't understand what you mean. There is no doc parameter for namedtuple yet. Ah, sorry. Please scratch the "existing" in my reply :-) +1 on a doc parameter on namedtuple() - property() already has such a parameter, which is probably why I got confused. -0 on having separate doc strings for the fields. Their meaning will usually be clear from the main doc string. > For overloading class docstring we can use inheritance idiom. But there is no way to change field > docstring. All field docstrings generated using template 'Alias for field number {index:d}'. Yes, I've seen that: http://docs.python.org/2/library/collections.html?highlight=namedtuple#collections.namedtuple It may not be too helpful, but it's an accurate description of the field's purpose :-) -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Dec 12 2012) >>> Python Projects, Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2012-12-05: Released eGenix pyOpenSSL 0.13 ... http://egenix.com/go37 2012-11-28: Released eGenix mx Base 3.2.5 ... http://egenix.com/go36 2013-01-22: Python Meeting Duesseldorf ... 41 days to go ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From dreamingforward at gmail.com Sat Dec 15 07:19:41 2012 From: dreamingforward at gmail.com (Mark Adam) Date: Sat, 15 Dec 2012 00:19:41 -0600 Subject: [Python-ideas] Fwd: Graph class In-Reply-To: <87txru1wxr.fsf@uwakimon.sk.tsukuba.ac.jp> References: <87txru1wxr.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Sun, Dec 9, 2012 at 8:08 PM, Stephen J. Turnbull wrote: > Mark Adam writes: > > > graph). But imagine if, for example, Subversion used a python graph > > class to track all branches and nodes in it's distributed revision > > control system. Then how easy it would be for third parties to make > > tools to view repos or other developers to come in and work with the > > dev team: they're already familiar with the standard graph class > > structure. > > This is a fallacy. As has been pointed out, there is a variety of > graphs, a large variety of computations to be done on and in them, and > a huge variety in algorithms for dealing with those varied tasks. Yes, but the basic data structure concept is NOT in development, it is already well-developed. The remaining issues of API are wholly secondary. The usefulness of a graph is unquestioned and creates cross-functionality across a large number of possible interesting domains. Really its like having a car when everyone else is walking (...but should the car be a buick or a toyota? a four-door or two? 
-- is all besides the point) But your issue of proposing an actual implementation is well-taken rather than spend a lot of time arguing over it all. With any luck, I'll try to distill networkx with my work and put it all together. haha mark From tack at urandom.ca Sun Dec 16 02:36:18 2012 From: tack at urandom.ca (Jason Tackaberry) Date: Sat, 15 Dec 2012 20:36:18 -0500 Subject: [Python-ideas] Late to the async party (PEP 3156) Message-ID: <50CD2592.5010507@urandom.ca> Hi python-ideas, I've been somewhat living under a rock for the past few months and consequently I missed the ideal window of opportunity to weigh in on the async discussions this fall that culminated into PEP 3156. I've been reading through those discussions in the archives. I've not finished digesting it all, and I'm somewhat torn in that I feel I should shut up until I read everything to date so as not to decrease the SNR, but on the other hand, knowing myself, I strongly suspect this would result in my never speaking up. And so, at risk of lowering the SNR ... First let me say that PEP 3156 makes me very, very happy. Over the past few years I've been exploring these very ideas with a little-used library called Kaa. I'm not offering it up as a paragon of proper async library design, but I wanted to share some of my experiences in case they could be useful to the PEP. https://github.com/freevo/kaa-base/ http://api.freevo.org/kaa-base/ It does seem like many similar design choices were made. In particular, I'm happy that an explicit yield will be used rather than the greenlet style of implicit suspension/reentry. Even after I've been using them for years, coroutines often feel like a form of magic, and an explicit yield is more aligned with the principle of least surprise. With Kaa, our future-style object is called an InProgress (so forgive the differing terminology in the remainder of this post): http://api.freevo.org/kaa-base/async/inprogress.html A couple properties of InProgress objects that I've found have practical value: * they can be aborted, which raises a special InProgressAborted inside the coroutine function so it can perform cleanup actions o what makes this tricky is the question of what to do to any currently yielded tasks? If A yields B and A is aborted, should B be aborted? What if the same B task is being yielded by C? Should C also be aborted, even if it's considered a sibling of A? (For example, suppose B is a task that is refreshing some common cache that both A and C want to make sure is up-to-date before they move on.) o if the decision is B should be aborted, then within A, 'yield B' will raise an exception because A is aborted, but 'yield B' within C will raise because B was aborted. So there needs to be some mechanism to distinguish between these cases. (My approach was to have an origin attribute on the exception.) o if A yields B, it may want to prevent B from being aborted if A is aborted. (My approach was to have a noabort() method in InProgress objects to return a new, unabortable InProgress object that A can then yield.) 
o alternatively, the saner implementation may be to do nothing to B when A is aborted and require A catch InProgressAborted and explicitly abort B if that's the desired behaviour o discussion in the PEP on cancellation has some TBDs so perhaps the above will be food for thought * they have a timeout() method, which returns a new InProgress object representing the task that will abort when the timeout elapses if the task doesn't finish o it's noteworthy that timeout() returns a /new/ InProgress and the original task continues on even if the timeout occurs -- by default that is, unless you do timeout(abort=True) o I didn't see much discussion in the PEP on timeouts, but I think this is an important feature that should be standardized Coroutines in Kaa use "yield" rather than "yield from" but the general approach looks very similar to what's been proposed: http://api.freevo.org/kaa-base/async/coroutines.html The @coroutine decorator causes the decorated function to return an InProgress. Coroutines can of course yield other coroutines, but, more fundamentally, anything else that returns an InProgress object, which could be a @threaded function, or even an ordinary function that explicitly creates and returns an InProgress object. There are some features of Kaa's implementation that could be worth considering: * it is possible to yield a special object (called NotFinished) that allows a coroutine to "time slice" as a form of cooperative multitasking * coroutines can have certain policies that control invocation behaviour. The most obvious ones to describe are POLICY_SYNCHRONIZED which ensures that multiple invocations of the same coroutine are serialized, and POLICY_SINGLETON which effectively ignores subsequent invocations if it's already running * it is possible to have a special progress object passed into the coroutine function so that the coroutine's progress can be communicated to an outside observer Once you've standardized on a way to manage the lifecycle of an in-progress asynchronous task, threads are a natural extension: http://api.freevo.org/kaa-base/async/threads.html The important element here is that @threaded decorated functions can be yielded by coroutines. This means that truly blocking tasks can be wrapped in a thread but invocation from a coroutine is identical to any other coroutine. Consequently, a threaded task could later be implemented as a coroutine (or more generally via event loop hooks) without any API changes. I think I'll stop here. There's plenty more definition, discussion, and examples in the links above. Hopefully some ideas can be salvaged for PEP 3156, but even if that's not the case, I'll be happy to know they were considered and rejected rather than not considered at all. Cheers, Jason. -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Sun Dec 16 06:37:15 2012 From: guido at python.org (Guido van Rossum) Date: Sat, 15 Dec 2012 21:37:15 -0800 Subject: [Python-ideas] Late to the async party (PEP 3156) In-Reply-To: <50CD2592.5010507@urandom.ca> References: <50CD2592.5010507@urandom.ca> Message-ID: Hi Jason, I don't think you've missed anything. I had actually planned to keep PEP 3156 unpublished for a bit longer, since I'm not done writing the reference implementation -- I'm sure that many of the issues currently marked open or TBD will be resolved that way. 
There hasn't been any public discussion since the last threads on python-ideas some weeks ago -- however I've met in person with some Twisted folks and exchanged private emails with some other interested parties. You've also correctly noticed that the PEP is weakest in the area of cancellation (and timeouts aren't even mentioned in the current draft). I'm glad you have some experience in this area, and I'll try to study your solutions and suggestions in more detail soon. For integration with threads, I'm thinking that the PEP currently has the minimum needed with wrap_future() and run_in_executor() -- but I'll read your link on threads and see what I may be missing. (More later, but I don't want you to think you posted into a black hole!) --Guido On Sat, Dec 15, 2012 at 5:36 PM, Jason Tackaberry wrote: > Hi python-ideas, > > I've been somewhat living under a rock for the past few months and > consequently I missed the ideal window of opportunity to weigh in on the > async discussions this fall that culminated into PEP 3156. > > I've been reading through those discussions in the archives. I've not > finished digesting it all, and I'm somewhat torn in that I feel I should > shut up until I read everything to date so as not to decrease the SNR, but > on the other hand, knowing myself, I strongly suspect this would result in > my never speaking up. And so, at risk of lowering the SNR ... > > First let me say that PEP 3156 makes me very, very happy. > > Over the past few years I've been exploring these very ideas with a > little-used library called Kaa. I'm not offering it up as a paragon of > proper async library design, but I wanted to share some of my experiences in > case they could be useful to the PEP. > > https://github.com/freevo/kaa-base/ > http://api.freevo.org/kaa-base/ > > It does seem like many similar design choices were made. In particular, I'm > happy that an explicit yield will be used rather than the greenlet style of > implicit suspension/reentry. Even after I've been using them for years, > coroutines often feel like a form of magic, and an explicit yield is more > aligned with the principle of least surprise. > > With Kaa, our future-style object is called an InProgress (so forgive the > differing terminology in the remainder of this post): > > http://api.freevo.org/kaa-base/async/inprogress.html > > A couple properties of InProgress objects that I've found have practical > value: > > they can be aborted, which raises a special InProgressAborted inside the > coroutine function so it can perform cleanup actions > > what makes this tricky is the question of what to do to any currently > yielded tasks? If A yields B and A is aborted, should B be aborted? What > if the same B task is being yielded by C? Should C also be aborted, even if > it's considered a sibling of A? (For example, suppose B is a task that is > refreshing some common cache that both A and C want to make sure is > up-to-date before they move on.) > if the decision is B should be aborted, then within A, 'yield B' will raise > an exception because A is aborted, but 'yield B' within C will raise because > B was aborted. So there needs to be some mechanism to distinguish between > these cases. (My approach was to have an origin attribute on the > exception.) > if A yields B, it may want to prevent B from being aborted if A is aborted. > (My approach was to have a noabort() method in InProgress objects to return > a new, unabortable InProgress object that A can then yield.) 
> alternatively, the saner implementation may be to do nothing to B when A is > aborted and require A catch InProgressAborted and explicitly abort B if > that's the desired behaviour > discussion in the PEP on cancellation has some TBDs so perhaps the above > will be food for thought > > they have a timeout() method, which returns a new InProgress object > representing the task that will abort when the timeout elapses if the task > doesn't finish > > it's noteworthy that timeout() returns a new InProgress and the original > task continues on even if the timeout occurs -- by default that is, unless > you do timeout(abort=True) > I didn't see much discussion in the PEP on timeouts, but I think this is an > important feature that should be standardized > > > Coroutines in Kaa use "yield" rather than "yield from" but the general > approach looks very similar to what's been proposed: > > http://api.freevo.org/kaa-base/async/coroutines.html > > The @coroutine decorator causes the decorated function to return an > InProgress. Coroutines can of course yield other coroutines, but, more > fundamentally, anything else that returns an InProgress object, which could > be a @threaded function, or even an ordinary function that explicitly > creates and returns an InProgress object. > > There are some features of Kaa's implementation that could be worth > considering: > > it is possible to yield a special object (called NotFinished) that allows a > coroutine to "time slice" as a form of cooperative multitasking > coroutines can have certain policies that control invocation behaviour. The > most obvious ones to describe are POLICY_SYNCHRONIZED which ensures that > multiple invocations of the same coroutine are serialized, and > POLICY_SINGLETON which effectively ignores subsequent invocations if it's > already running > it is possible to have a special progress object passed into the coroutine > function so that the coroutine's progress can be communicated to an outside > observer > > > Once you've standardized on a way to manage the lifecycle of an in-progress > asynchronous task, threads are a natural extension: > > http://api.freevo.org/kaa-base/async/threads.html > > The important element here is that @threaded decorated functions can be > yielded by coroutines. This means that truly blocking tasks can be wrapped > in a thread but invocation from a coroutine is identical to any other > coroutine. Consequently, a threaded task could later be implemented as a > coroutine (or more generally via event loop hooks) without any API changes. > > I think I'll stop here. There's plenty more definition, discussion, and > examples in the links above. Hopefully some ideas can be salvaged for PEP > 3156, but even if that's not the case, I'll be happy to know they were > considered and rejected rather than not considered at all. > > Cheers, > Jason. > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -- --Guido van Rossum (python.org/~guido) From solipsis at pitrou.net Sun Dec 16 11:16:02 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sun, 16 Dec 2012 11:16:02 +0100 Subject: [Python-ideas] Late to the async party (PEP 3156) References: <50CD2592.5010507@urandom.ca> Message-ID: <20121216111602.383ebf4d@pitrou.net> On Sat, 15 Dec 2012 21:37:15 -0800 Guido van Rossum wrote: > Hi Jason, > > I don't think you've missed anything. 
I had actually planned to keep > PEP 3156 unpublished for a bit longer, since I'm not done writing the > reference implementation -- I'm sure that many of the issues currently > marked open or TBD will be resolved that way. There hasn't been any > public discussion since the last threads on python-ideas some weeks > ago -- however I've met in person with some Twisted folks and > exchanged private emails with some other interested parties. For the record, have you looked at the pyuv API? It's rather nicely orthogonal, although it lacks a way to stop the event loop. https://pyuv.readthedocs.org/en Regards Antoine. From eliben at gmail.com Sun Dec 16 14:22:44 2012 From: eliben at gmail.com (Eli Bendersky) Date: Sun, 16 Dec 2012 05:22:44 -0800 Subject: [Python-ideas] Docstrings for namedtuple In-Reply-To: References: Message-ID: On Wed, Dec 12, 2012 at 9:40 AM, Serhiy Storchaka wrote: > What interface is better for specifying namedtuple field docstrings? > > Point = namedtuple('Point', 'x y', > doc='Point: 2-dimensional coordinate', > field_docs=['abscissa', 'ordinate']) > > or > > Point = namedtuple('Point', [('x', 'absciss'), ('y', 'ordinate')], > doc='Point: 2-dimensional coordinate') > > ? > > This may be a good time to say that personally I always disliked namedtuple's creation syntax. It is unpleasant in two respects: 1. You have to repeat the name 2. You have to specify the fields in a space-separated string I wish there was an alternative of something like: @namedtuple class Point: x = 0 y = 0 Eli -------------- next part -------------- An HTML attachment was scrubbed... URL: From eliben at gmail.com Sun Dec 16 14:24:08 2012 From: eliben at gmail.com (Eli Bendersky) Date: Sun, 16 Dec 2012 05:24:08 -0800 Subject: [Python-ideas] Docstrings for namedtuple In-Reply-To: References: Message-ID: This may be a good time to say that personally I always disliked > namedtuple's creation syntax. It is unpleasant in two respects: > > 1. You have to repeat the name > 2. You have to specify the fields in a space-separated string > > I wish there was an alternative of something like: > > @namedtuple > class Point: > x = 0 > y = 0 > > And to the point of Serhiy's original topic, with this syntax there would be no need to invent yet another non-standard way to specify things like docstrings. Eli -------------- next part -------------- An HTML attachment was scrubbed... URL: From vinay_sajip at yahoo.co.uk Sun Dec 16 14:44:46 2012 From: vinay_sajip at yahoo.co.uk (Vinay Sajip) Date: Sun, 16 Dec 2012 13:44:46 +0000 (UTC) Subject: [Python-ideas] Late to the async party (PEP 3156) References: <50CD2592.5010507@urandom.ca> <20121216111602.383ebf4d@pitrou.net> Message-ID: Antoine Pitrou writes: > For the record, have you looked at the pyuv API? It's rather nicely > orthogonal, although it lacks a way to stop the event loop. > https://pyuv.readthedocs.org/en That link gives a 404, but you can use https://pyuv.readthedocs.org/en/latest/ Regards, Vinay Sajip From jsbueno at python.org.br Sun Dec 16 15:06:03 2012 From: jsbueno at python.org.br (Joao S. O. Bueno) Date: Sun, 16 Dec 2012 12:06:03 -0200 Subject: [Python-ideas] Docstrings for namedtuple In-Reply-To: References: Message-ID: On 16 December 2012 11:24, Eli Bendersky wrote: > > > This may be a good time to say that personally I always disliked >> namedtuple's creation syntax. It is unpleasant in two respects: >> >> 1. You have to repeat the name >> 2. 
You have to specify the fields in a space-separated string >> >> I wish there was an alternative of something like: >> >> @namedtuple >> class Point: >> x = 0 >> y = 0 >> >> > And to the point of Serhiy's original topic, with this syntax there would > be no need to invent yet another non-standard way to specify things like > docstrings. > While we are at it, why nto simply: class Point(namedtuple): x = 0 y = 0 ? > > Eli > > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From _ at lvh.cc Sun Dec 16 15:39:10 2012 From: _ at lvh.cc (Laurens Van Houtven) Date: Sun, 16 Dec 2012 15:39:10 +0100 Subject: [Python-ideas] Docstrings for namedtuple In-Reply-To: References: Message-ID: Err, can class bodies ever be order-sensitive? I was under the impression names bound there work just like names bound anywhere... Unless of course that magical decorator is secretly an AST hack, in which case, yes, it can do whatever it wants :) On Sun, Dec 16, 2012 at 3:06 PM, Joao S. O. Bueno wrote: > > > On 16 December 2012 11:24, Eli Bendersky wrote: > >> >> >> This may be a good time to say that personally I always disliked >>> namedtuple's creation syntax. It is unpleasant in two respects: >>> >>> 1. You have to repeat the name >>> 2. You have to specify the fields in a space-separated string >>> >>> I wish there was an alternative of something like: >>> >>> @namedtuple >>> class Point: >>> x = 0 >>> y = 0 >>> >>> >> And to the point of Serhiy's original topic, with this syntax there would >> be no need to invent yet another non-standard way to specify things like >> docstrings. >> > > While we are at it, > why nto simply: > > class Point(namedtuple): > x = 0 > y = 0 > > ? > >> >> Eli >> >> >> >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> http://mail.python.org/mailman/listinfo/python-ideas >> >> > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > > -- cheers lvh -------------- next part -------------- An HTML attachment was scrubbed... URL: From pyideas at rebertia.com Sun Dec 16 15:49:12 2012 From: pyideas at rebertia.com (Chris Rebert) Date: Sun, 16 Dec 2012 06:49:12 -0800 Subject: [Python-ideas] Docstrings for namedtuple In-Reply-To: References: Message-ID: > On Sun, Dec 16, 2012 at 3:06 PM, Joao S. O. Bueno > wrote: >> On 16 December 2012 11:24, Eli Bendersky wrote: >>>> This may be a good time to say that personally I always disliked >>>> namedtuple's creation syntax. It is unpleasant in two respects: >>>> >>>> 1. You have to repeat the name >>>> 2. You have to specify the fields in a space-separated string >>>> >>>> I wish there was an alternative of something like: >>>> >>>> @namedtuple >>>> class Point: >>>> x = 0 >>>> y = 0 >>>> >>> >>> And to the point of Serhiy's original topic, with this syntax there would >>> be no need to invent yet another non-standard way to specify things like >>> docstrings. >> >> >> While we are at it, >> why nto simply: >> >> class Point(namedtuple): >> x = 0 >> y = 0 >> >> ? >> On Sun, Dec 16, 2012 at 6:39 AM, Laurens Van Houtven <_ at lvh.cc> wrote: > Err, can class bodies ever be order-sensitive? Yep. You just have to define a metaclass with a __prepare__() that returns an OrderedDict (or similar). 
http://docs.python.org/3.4/reference/datamodel.html#preparing-the-class-namespace http://docs.python.org/2/library/collections.html#ordereddict-objects Cheers, Chris -- http://rebertia.com From solipsis at pitrou.net Sun Dec 16 15:52:07 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sun, 16 Dec 2012 15:52:07 +0100 Subject: [Python-ideas] Docstrings for namedtuple References: Message-ID: <20121216155207.072707c1@pitrou.net> On Sun, 16 Dec 2012 05:22:44 -0800 Eli Bendersky wrote: > On Wed, Dec 12, 2012 at 9:40 AM, Serhiy Storchaka wrote: > > > What interface is better for specifying namedtuple field docstrings? > > > > Point = namedtuple('Point', 'x y', > > doc='Point: 2-dimensional coordinate', > > field_docs=['abscissa', 'ordinate']) > > > > or > > > > Point = namedtuple('Point', [('x', 'absciss'), ('y', 'ordinate')], > > doc='Point: 2-dimensional coordinate') > > > > ? > > > > > This may be a good time to say that personally I always disliked > namedtuple's creation syntax. It is unpleasant in two respects: > > 1. You have to repeat the name > 2. You have to specify the fields in a space-separated string > > I wish there was an alternative of something like: > > @namedtuple > class Point: > x = 0 > y = 0 +1, this would be very nice. It would also allow default values as shown above, which is a useful feature. Regards Antoine. From vinay_sajip at yahoo.co.uk Sun Dec 16 16:36:26 2012 From: vinay_sajip at yahoo.co.uk (Vinay Sajip) Date: Sun, 16 Dec 2012 15:36:26 +0000 (UTC) Subject: [Python-ideas] Graph class References: <87txru1wxr.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: Mark Adam writes: > But your issue of proposing an actual implementation is well-taken > rather than spend a lot of time arguing over it all. With any luck, > I'll try to distill networkx with my work and put it all together. In terms of use cases, you might be interested in potential users of any stdlib graph library. I'm working on distlib [1], which evolved out of distutils2 and uses graphs in a couple of places: 1. A dependency graph for distributions. This came directly from distutils2, though I've added a couple of bits to it such as topological sorting and determination of strongly-connected components. 2. A lightweight sequencer for build steps, added to avoid the approach in distutils/distutils2 which makes it harder than necessary to handle custom build steps. I didn't use the graph system used in point 1, as it was too specific, and I haven't had time to look at refactoring it. There's another potential use case in the area of packaging, though perhaps not in distlib itself: the idea of generating build artifacts based on their dependencies. Ideally, this would consider not only build artifacts and their dependencies, but also the builders themselves as part of the graph. Regards, Vinay Sajip [1] https://distlib.readthedocs.org/en/latest/ From guido at python.org Sun Dec 16 16:39:14 2012 From: guido at python.org (Guido van Rossum) Date: Sun, 16 Dec 2012 07:39:14 -0800 Subject: [Python-ideas] Late to the async party (PEP 3156) In-Reply-To: References: <50CD2592.5010507@urandom.ca> <20121216111602.383ebf4d@pitrou.net> Message-ID: I have to ask someone who has experience with libuv to comment on my PEP -- those docs are very low level and don't explain how things work together or why features are needed. I also have to explain my goals and motivations. But not now. --Guido On Sunday, December 16, 2012, Vinay Sajip wrote: > Antoine Pitrou writes: > > > For the record, have you looked at the pyuv API? 
It's rather nicely > > orthogonal, although it lacks a way to stop the event loop. > > https://pyuv.readthedocs.org/en > > That link gives a 404, but you can use > > https://pyuv.readthedocs.org/en/latest/ > > Regards, > > Vinay Sajip > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -- --Guido van Rossum (on iPad) -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Sun Dec 16 16:41:07 2012 From: guido at python.org (Guido van Rossum) Date: Sun, 16 Dec 2012 07:41:07 -0800 Subject: [Python-ideas] Graph class In-Reply-To: References: <87txru1wxr.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: I think of graphs and trees as patterns, not data structures. -- --Guido van Rossum (on iPad) -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Sun Dec 16 17:27:53 2012 From: guido at python.org (Guido van Rossum) Date: Sun, 16 Dec 2012 08:27:53 -0800 Subject: [Python-ideas] Late to the async party (PEP 3156) In-Reply-To: <50CD2592.5010507@urandom.ca> References: <50CD2592.5010507@urandom.ca> Message-ID: On Sat, Dec 15, 2012 at 5:36 PM, Jason Tackaberry wrote: > With Kaa, our future-style object is called an InProgress (so forgive the > differing terminology in the remainder of this post): > > http://api.freevo.org/kaa-base/async/inprogress.html > > A couple properties of InProgress objects that I've found have practical > value: > > - they can be aborted, which raises a special InProgressAborted inside > the coroutine function so it can perform cleanup actions > - what makes this tricky is the question of what to do to any > currently yielded tasks? If A yields B and A is aborted, should B be > aborted? What if the same B task is being yielded by C? Should C also be > aborted, even if it's considered a sibling of A? (For example, suppose B > is a task that is refreshing some common cache that both A and C want to > make sure is up-to-date before they move on.) > - if the decision is B should be aborted, then within A, 'yield B' > will raise an exception because A is aborted, but 'yield B' within C will > raise because B was aborted. So there needs to be some mechanism to > distinguish between these cases. (My approach was to have an origin > attribute on the exception.) > - if A yields B, it may want to prevent B from being aborted if A > is aborted. (My approach was to have a noabort() method in InProgress > objects to return a new, unabortable InProgress object that A can then > yield.) > - alternatively, the saner implementation may be to do nothing to B > when A is aborted and require A catch InProgressAborted and explicitly > abort B if that's the desired behaviour > - discussion in the PEP on cancellation has some TBDs so perhaps > the above will be food for thought > > The PEP is definitely weak. Here are some thoughts/proposals though: - You can't cancel a coroutine; however you can cancel a Task, which is a Future wrapping a stack of coroutines linked via yield-from. - Cancellation only takes effect when a task is suspended. - When you cancel a Task, the most deeply nested coroutine (the one that caused it to be suspended) receives a special exception (I propose to reuse concurrent.futures.CancelledError from PEP 3148). If it doesn't catch this it bubbles all the way to the Task, and then out from there. 
- However when a coroutine in one Task uses yield-from to wait for another Task, the latter does not automatically get cancelled. So this is a difference between "yield from foo()" and "yield from Task(foo())", which otherwise behave pretty similarly. Of course the first Task could catch the exception and cancel the second task -- that is its responsibility though and not the default behavior. - PEP 3156 has a par() helper which lets you block for multiple tasks/coroutines in parallel. It takes arguments which are either coroutines, Tasks, or other Futures; it wraps the coroutines in Tasks to run them independently an just waits for the other arguments. Proposal: when the Task containing the par() call is cancelled, the par() call intercepts the cancellation and by default cancels those coroutines that were passed in "bare" but not the arguments that were passed in as Tasks or Futures. Some keyword argument to par() may be used to change this behavior to "cancel none" or "cancel all" (exact API spec TBD). > - they have a timeout() method, which returns a new InProgress object > representing the task that will abort when the timeout elapses if the task > doesn't finish > - it's noteworthy that timeout() returns a *new* InProgress and the > original task continues on even if the timeout occurs -- by default that > is, unless you do timeout(abort=True) > - I didn't see much discussion in the PEP on timeouts, but I think > this is an important feature that should be standardized > > Interesting. In Tulip v1 (the experimental version I wrote before PEP 3156) the Task() constructor has an optional timeout argument. It works by scheduling a callback at the given time in the future, and the callback simply cancel the task (which is a no-op if the task has already completed). It works okay, except it generates tracebacks that are sometimes logged and sometimes not properly caught -- though some of that may be my messy test code. The exception raised by a timeout is the same CancelledError, which is somewhat confusing. I wonder if Task.cancel() shouldn't take an exception with which to cancel the task with. (TimeoutError in PEP 3148 has a different role, it is when the timeout on a specific wait expires, so e.g. fut.result(timeout=2) waits up to 2 seconds for fut to complete, and if not, the call raises TimeoutError, but the code running in the executor is unaffected.) > > Coroutines in Kaa use "yield" rather than "yield from" but the general > approach looks very similar to what's been proposed: > > http://api.freevo.org/kaa-base/async/coroutines.html > > The @coroutine decorator causes the decorated function to return an > InProgress. Coroutines can of course yield other coroutines, but, more > fundamentally, anything else that returns an InProgress object, which could > be a @threaded function, or even an ordinary function that explicitly > creates and returns an InProgress object. > We've had long discussions about yield vs. yield-from. The latter is way more efficient and that's enough for me to push it through. When using yield, each yield causes you to bounce to the scheduler, which has to do a lot of work to decide what to do next, even if that is just resuming the suspended generator; and the scheduler is responsible for keeping track of the stack of generators. When using yield-from, calling another coroutine as a subroutine is almost free and doesn't involve the scheduler at all; thus it's much cheaper, and the scheduler can be simpler (doesn't need to keep track of the stack). 
Also stack traces and debugging are better. > > There are some features of Kaa's implementation that could be worth > considering: > > - it is possible to yield a special object (called NotFinished) that > allows a coroutine to "time slice" as a form of cooperative multitasking > > I can recommend yield from tulip.sleep(0) for that. > > - coroutines can have certain policies that control invocation > behaviour. The most obvious ones to describe are POLICY_SYNCHRONIZED which > ensures that multiple invocations of the same coroutine are serialized, and > POLICY_SINGLETON which effectively ignores subsequent invocations if it's > already running > - it is possible to have a special progress object passed into the > coroutine function so that the coroutine's progress can be communicated to > an outside observer > > These seem pretty esoteric and can probably implemented in user code if needed. > > > > Once you've standardized on a way to manage the lifecycle of an > in-progress asynchronous task, threads are a natural extension: > > http://api.freevo.org/kaa-base/async/threads.html > > The important element here is that @threaded decorated functions can be > yielded by coroutines. This means that truly blocking tasks can be wrapped > in a thread but invocation from a coroutine is identical to any other > coroutine. Consequently, a threaded task could later be implemented as a > coroutine (or more generally via event loop hooks) without any API changes. > As I said, I think wait_for_future() and run_in_executor() in the PEP give you all you need. The @threaded decorator you propose is just sugar; if a user wants to take an existing API and convert it from a coroutine to threaded without requiring changes to the caller, they can just introduce a helper that is run in a thread with run_in_executor(). > I think I'll stop here. There's plenty more definition, discussion, and > examples in the links above. Hopefully some ideas can be salvaged for PEP > 3156, but even if that's not the case, I'll be happy to know they were > considered and rejected rather than not considered at all. > Thanks for your very useful contribution! Kaa looks like an interesting system. Is it ported to Python 3 yet? Maybe you could look into integrating with the PEP 3156 event loop and/or scheduler. -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From tack at urandom.ca Sun Dec 16 20:11:48 2012 From: tack at urandom.ca (Jason Tackaberry) Date: Sun, 16 Dec 2012 14:11:48 -0500 Subject: [Python-ideas] Late to the async party (PEP 3156) In-Reply-To: References: <50CD2592.5010507@urandom.ca> Message-ID: <50CE1CF4.4080704@urandom.ca> On 12-12-16 11:27 AM, Guido van Rossum wrote: > The PEP is definitely weak. Here are some thoughts/proposals though: > > * You can't cancel a coroutine; however you can cancel a Task, which > is a Future wrapping a stack of coroutines linked via yield-from. > I'll just underline your statement that "you can't cancel a coroutine" here, since I'm referencing it later. This distinction between "bare" coroutines, Futures, and Tasks is a bit foreign to me, since in Kaa all coroutines return (a subclass of) InProgress objects. The Tasks section in the PEP says that a bare coroutine (is this the same as the previously defined "coroutine object"?) has much less overhead than a Task but it's not clear to me why that would be, as both would ultimately need to be managed by the scheduler, wouldn't they? 
I could imagine that a coroutine object is implemented as a C object for performance, and a Task is a Python class, and maybe that explains the difference. But then why differentiate between Future and Task (particularly because they have the same interface, so I can't draw an analogy with jQuery's Deferreds and Promises, where Promises are a restricted form of Deferreds for public consumption to attach callbacks). > * Cancellation only takes effect when a task is suspended. > Yes, this is intuitive. > * When you cancel a Task, the most deeply nested coroutine (the one > that caused it to be suspended) receives a special exception (I > propose to reuse concurrent.futures.CancelledError from PEP 3148). > If it doesn't catch this it bubbles all the way to the Task, and > then out from there. > So if the most deeply nested coroutine catches the CancelledError and doesn't reraise, it can prevent its cancellation? I took a similar appoach, except that coroutines can't abort their own cancellation, and whether or not the nested coroutines actually get cancelled depends on whether something else was interested in their result. Consider a coroutine chain where A yields B yields C yields D, and we do B.abort() * if only C was interested in D's result, then D will get an InProgressAborted raised inside it (at whatever point it's currently suspended). If something other than C was also waiting on D, D will not be affected * similarly, if only B was interested in C's result, then C will get an InProgressAborted raised inside it (at yield D). * B will get InProgressAborted raised inside it (at yield C) * for B, C and D, the coroutines will not be reentered and they are not allowed to yield a value that suggests they expect reentry. There's nothing a coroutine can do to prevent its own demise. * A will get an InProgressAborted raised inside it (at yield B) * In all the above cases, the InProgressAborted instance has an origin attribute that is B's InProgress object * Although B, C, and D are now aborted, A isn't aborted. It's allowed to yield again. * with Kaa, coroutines are abortable by default (so they are like Tasks always). But in this example, B can present C from being aborted by yielding C().noabort() There are quite a few scenarios to consider: A yields B and B is cancelled or raises; A yields B and A is cancelled or raises; A yields B, C yields B, and A is cancelled or raises; A yields B, C yields B, and A or C is cancelled or raises; A yields par(B,C,D) and B is cancelled or raises; etc, etc. In my experience, there's no one-size-fits-all behaviour, and the best we can do is have sensible default behaviour with some API (different functions, kwargs, etc.) to control the cancellation propagation logic. > * However when a coroutine in one Task uses yield-from to wait for > another Task, the latter does not automatically get cancelled. So > this is a difference between "yield from foo()" and "yield from > Task(foo())", which otherwise behave pretty similarly. Of course > the first Task could catch the exception and cancel the second > task -- that is its responsibility though and not the default > behavior. > Ok, so nested bare coroutines will get cancelled implicitly, but nested Tasks won't? I'm having a bit of difficulty with this one. You said that coroutines can't be cancelled, but Tasks can be. But here, if they are being yielded, the opposite behaviour applies: yielded coroutines /are/ cancelled if a Task is cancelled, but yielded tasks /aren't/. Or have I misunderstood? 
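If I read the proposal right, the mechanics with plain generators would be roughly this
(a runnable sketch, no scheduler involved; the bare yield stands in for a pending
operation, and throw() stands in for what cancelling the enclosing Task would do):

from concurrent.futures import CancelledError

def inner():
    try:
        yield  # suspension point, e.g. waiting on I/O
    except CancelledError:
        print("innermost coroutine sees the cancellation first")
        raise  # swallowing it here would stop the cancellation

def outer():
    yield from inner()  # a "bare" call: same generator stack, same Task
    print("not reached when cancelled")

stack = outer()
next(stack)  # run until inner() suspends
try:
    stack.throw(CancelledError())  # what Task.cancel() would do to its stack
except CancelledError:
    print("...and it bubbles out through outer() to the Task")

A separate Task(foo()) would be a separate generator stack, so the throw() above never
reaches it -- which, if I follow, is exactly why a yielded Task isn't cancelled
implicitly while a yielded bare coroutine is.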
> * PEP 3156 has a par() helper which lets you block for multiple > tasks/coroutines in parallel. It takes arguments which are either > coroutines, Tasks, or other Futures; it wraps the coroutines in > Tasks to run them independently an just waits for the other > arguments. Proposal: when the Task containing the par() call is > cancelled, the par() call intercepts the cancellation and by > default cancels those coroutines that were passed in "bare" but > not the arguments that were passed in as Tasks or Futures. Some > keyword argument to par() may be used to change this behavior to > "cancel none" or "cancel all" (exact API spec TBD). > Here again, par() would cancel a bare coroutine but not Tasks. It's consistent with your previous bullet but seems to contradict your first bullet that you can't cancel a coroutine. I guess the distinction is you can't explicitly cancel a coroutine, but coroutines can be implicitly cancelled? As I discussed previously, one of those tasks might be yielded by some other active coroutine, and so cancelling it may not be the right thing to do. Being able to control this behaviour is important, whether that's a par() kwarg, or special method like noabort() that constructs an unabortable Task instance. Kaa has similar constructs to allow yielding a collection of InProgress objects (whatever they might represent: coroutines, threaded functions, etc.). In particular, it allows you to yield multiple tasks and resume when ALL of them complete (InProgressAll), or when ANY of them complete (InProgressAny). For example: @kaa.coroutine() def is_any_host_up(*hosts): try: # ping() is a coroutine yield kaa.InProgressAny(ping(host) for host in hosts).timeout(5, abort=True) except kaa.TimeoutException: yield False else: yield True More details here: http://api.freevo.org/kaa-base/async/inprogress.html#inprogress-collections From what I understand of the proposed par() it would require//ALL of the supplied futures to complete, but there are many use-cases for the ANY variant as well. > Interesting. In Tulip v1 (the experimental version I wrote before PEP > 3156) the Task() constructor has an optional timeout argument. It > works by scheduling a callback at the given time in the future, and > the callback simply cancel the task (which is a no-op if the task has > already completed). It works okay, except it generates tracebacks that > are sometimes logged and sometimes not properly caught -- though some > of that may be my messy test code. The exception raised by a timeout > is the same CancelledError, which is somewhat confusing. I wonder if > Task.cancel() shouldn't take an exception with which to cancel the > task with. (TimeoutError in PEP 3148 has a different role, it is when > the timeout on a specific wait expires, so e.g. fut.result(timeout=2) > waits up to 2 seconds for fut to complete, and if not, the call raises > TimeoutError, but the code running in the executor is unaffected.) FWIW, the equivalent in Kaa which is InProgress.abort() does take an optional exception, which must subclass InProgressAborted. If None, a new InProgressAborted is created. InProgress.timeout(t) will start a timer that invokes InProgress.abort(TimeoutException()) (TimeoutException subclasses InProgressAborted). It sounds like your proposed implementation works like: @tulip.coroutine() def foo(): try: result = yield from Task(othercoroutine()).result(timeout=2) except TimeoutError: # ... 
othercoroutine() still lives on I think Kaa's syntax is cleaner but it seems functionally the same: @kaa.coroutine() def foo(): try: result = yield othercoroutine().timeout(2) except kaa.TimeoutException: # ... othercoroutine() still lives on It's also possible to conveniently ensure that othercoroutine() is aborted if the timeout elapses: try: result = yield othercoroutine().timeout(2, abort=True) except kaa.TimeoutException: # ... othercoroutine() is aborted > We've had long discussions about yield vs. yield-from. The latter is > way more efficient and that's enough for me to push it through. When > using yield, each yield causes you to bounce to the scheduler, which > has to do a lot of work to decide what to do next, even if that is > just resuming the suspended generator; and the scheduler is > responsible for keeping track of the stack of generators. When using > yield-from, calling another coroutine as a subroutine is almost free > and doesn't involve the scheduler at all; thus it's much cheaper, and > the scheduler can be simpler (doesn't need to keep track of the > stack). Also stack traces and debugging are better. But this sounds like a consequence of a particular implementation, isn't it? A @kaa.coroutine() decorated function is entered right away when invoked, and the decorator logic does as much as it can until the underlying generator yields an unfinished InProgress that needs to wait for (or kaa.NotFinished). Once it yields, /then/ the decorator sets up the necessary hooks with the scheduler / event loop. This means you can nest a stack of coroutines without involving the scheduler until something truly asynchronous needs to take place. Have I misunderstood? > * coroutines can have certain policies that control invocation > behaviour. The most obvious ones to describe are > POLICY_SYNCHRONIZED which ensures that multiple invocations of > the same coroutine are serialized, and POLICY_SINGLETON which > effectively ignores subsequent invocations if it's already running > * it is possible to have a special progress object passed into > the coroutine function so that the coroutine's progress can be > communicated to an outside observer > > > These seem pretty esoteric and can probably implemented in user code > if needed. I'm fine with that, provided the flexibility is there to allow for it. > As I said, I think wait_for_future() and run_in_executor() in the PEP > give you all you need. The @threaded decorator you propose is just > sugar; if a user wants to take an existing API and convert it from a > coroutine to threaded without requiring changes to the caller, they > can just introduce a helper that is run in a thread with > run_in_executor(). Also works for me. :) > Thanks for your very useful contribution! Kaa looks like an > interesting system. Is it ported to Python 3 yet? Maybe you could look > into integrating with the PEP 3156 event loop and/or scheduler. Kaa does work with Python 3, yes, although it still lacks very much needed unit tests so I'm not completely confident it has the same functional coverage as Python 2. I'm definitely interested in having it conform to whatever shakes out of PEP 3156, which is why I'm speaking up now. :) I've a couple other subjects I should bring up: Tasks/Futures as "signals": it's often necessary to be able to resume a coroutine based on some condition other than e.g. any IO tasks it's waiting on. For example, in one application, I have a (POLICY_SINGLETON) coroutine that works off a download queue. 
If there's nothing in the queue, it's suspended at a yield. It's the coroutine equivalent of a dedicated thread. [1] It must be possible to "wake" the queue manager when I enqueue a job for it. Kaa has this notion of "signals" which is similar to the gtk+ style of signals in that you can attach callbacks to them and emit them. Signals can be represented as InProgress objects, which means they can be yielded from coroutines and used in InProgressAny/All objects. So my download manager coroutine can yield an InProgressAny of all the active download coroutines /and/ the "new job enqueued" signal, and execution will resume as long as any of those conditions are met. Is there anything in your current proposal that would allow for this use-case? [1] https://github.com/jtackaberry/stagehand/blob/master/src/manager.py#L390 Another pain point for me has been this notion of unhandled asynchronous exceptions. Asynchronous tasks are represented as an InProgress object, and if a task fails, accessing InProgress.result will raise the exception at which point it's considered handled. This attribute access could happen at any time during the lifetime of the InProgress object, outside the task's call stack. The desirable behaviour is that when the InProgress object is destroyed, if there's an exception attached to it from a failed task that hasn't been accessed, we should output the stack as an unhandled exception. In Kaa, I do this with a weakref destroy callback, but this isn't ideal because with GC, the InProgress might not be destroyed until well after the exception is relevant. I make every effort to remove reference cycles and generally get the InProgress object destroyed as early as possible, but this changes subtly between Python versions. How will unhandled asynchronous exceptions be handled with tulip? Thanks! Jason. -------------- next part -------------- An HTML attachment was scrubbed... URL: From tjreedy at udel.edu Sun Dec 16 21:05:36 2012 From: tjreedy at udel.edu (Terry Reedy) Date: Sun, 16 Dec 2012 15:05:36 -0500 Subject: [Python-ideas] Docstrings for namedtuple In-Reply-To: References: Message-ID: On 12/16/2012 8:22 AM, Eli Bendersky wrote: > This may be a good time to say that personally I always disliked > namedtuple's creation syntax. It is unpleasant in two respects: > > 1. You have to repeat the name > 2. You have to specify the fields in a space-separated string > > I wish there was an alternative of something like: > > @namedtuple > class Point: > x = 0 > y = 0 Pretty easy, once one figures out metaclass basics. import collections as co class ntmeta(): def __prepare__(name, bases, **kwds): return co.OrderedDict() def __new__(cls, name, bases, namespace): print(namespace) # shows why filter is needed return co.namedtuple(name, filter(lambda s: s[0] != '_', namespace)) class Point(metaclass=ntmeta): x = 0 y = 0 p = Point(1,2) print(p) # OrderedDict([('__module__', '__main__'), ('__qualname__', 'Point'), ('x', 0), ('y', 0)]) Point(x=1, y=2) To use the filtered namespace values as defaults (Antoine's suggestion), first replace namedtuple() with its body. Then modify the header of generated name.__new__. For Point, change def __new__(_cls, x, y): #to def __new__(_cls, x=0, y=0): Also change the newclass docstring. 
For Point, change 'Point(x, y)' to 'Point(x=0, y=0)' -- Terry Jan Reedy From timothy.c.delaney at gmail.com Sun Dec 16 22:08:18 2012 From: timothy.c.delaney at gmail.com (Tim Delaney) Date: Mon, 17 Dec 2012 08:08:18 +1100 Subject: [Python-ideas] Docstrings for namedtuple In-Reply-To: References: Message-ID: It can be made a bit more intelligent. I haven't done anything with docstrings here, but it wouldn't be hard to add. This automatically handles defaults (you can call the namedtuple with either zero parameters or the exact number). You can specify __rename__ = True, which will then only exclude __dunder_names__ (otherwise all names starting with an underscore are excluded). You can also pass verbose=[True|False] to the subclass constructor. import collections class NamedTupleMetaClass(type): # The prepare function @classmethod def __prepare__(metacls, name, bases): # No keywords in this case return collections.OrderedDict() # The metaclass invocation def __new__(cls, name, bases, classdict): fields = collections.OrderedDict() rename = False verbose = False for f in classdict: if f == '__rename__': rename = classdict[f] elif f == '__verbose__': verbose = classdict[f] for f in classdict: if f.startswith('_'): if not rename: continue if f.startswith('__') and f.endswith('__'): continue fields[f] = classdict[f] result = type.__new__(cls, name, bases, classdict) result.fields = fields result.rename = rename result.verbose = verbose return result class NamedTuple(metaclass=NamedTupleMetaClass): def __new__(cls, *p, **kw): print(p) if not p: p = cls.fields.values() try: verbose = kw['verbose'] except KeyError: verbose = cls.verbose return collections.namedtuple(cls.__name__, list(cls.fields), rename=cls.rename, verbose=verbose)(*p) Python 3.3.0 (v3.3.0:bd8afb90ebf2, Sep 29 2012, 10:57:17) [MSC v.1600 64 bit (AMD64)] on win32 Type "help", "copyright", "credits" or "license" for more information. >>> import namedtuple_baseclass >>> class Point(namedtuple_baseclass.NamedTuple): ... x = 0 ... y = 0 ... >>> print(Point()) Point(x=0, y=0) >>> print(Point(1, 2)) Point(x=1, y=2) >>> print(Point(1)) Traceback (most recent call last): File "", line 1, in File ".\namedtuple_baseclass.py", line 38, in __new__ return collections.namedtuple(cls.__name__, list(cls.fields), rename=cls.rename, verbose=cls.verbose)(*p) TypeError: __new__() missing 1 required positional argument: 'y' >>> print(Point(1, 2, 3)) Traceback (most recent call last): File "", line 1, in File ".\namedtuple_baseclass.py", line 38, in __new__ return collections.namedtuple(cls.__name__, list(cls.fields), rename=cls.rename, verbose=cls.verbose)(*p) TypeError: __new__() takes 3 positional arguments but 4 were given >>> Tim Delaney On 17 December 2012 07:05, Terry Reedy wrote: > On 12/16/2012 8:22 AM, Eli Bendersky wrote: > > This may be a good time to say that personally I always disliked >> namedtuple's creation syntax. It is unpleasant in two respects: >> >> 1. You have to repeat the name >> 2. You have to specify the fields in a space-separated string >> >> I wish there was an alternative of something like: >> >> @namedtuple >> class Point: >> x = 0 >> y = 0 >> > > Pretty easy, once one figures out metaclass basics. 
> > import collections as co > > class ntmeta(): > def __prepare__(name, bases, **kwds): > return co.OrderedDict() > def __new__(cls, name, bases, namespace): > print(namespace) # shows why filter is needed > return co.namedtuple(name, > filter(lambda s: s[0] != '_', namespace)) > > class Point(metaclass=ntmeta): > > x = 0 > y = 0 > > p = Point(1,2) > print(p) > # > OrderedDict([('__module__', '__main__'), ('__qualname__', 'Point'), ('x', > 0), ('y', 0)]) > Point(x=1, y=2) > > To use the filtered namespace values as defaults (Antoine's suggestion), > first replace namedtuple() with its body. > Then modify the header of generated name.__new__. For Point, change > > def __new__(_cls, x, y): > #to > def __new__(_cls, x=0, y=0): > > Also change the newclass docstring. For Point, change > 'Point(x, y)' > to > 'Point(x=0, y=0)' > > -- > Terry Jan Reedy > > > ______________________________**_________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/**mailman/listinfo/python-ideas > -------------- next part -------------- An HTML attachment was scrubbed... URL: From timothy.c.delaney at gmail.com Sun Dec 16 22:09:21 2012 From: timothy.c.delaney at gmail.com (Tim Delaney) Date: Mon, 17 Dec 2012 08:09:21 +1100 Subject: [Python-ideas] Docstrings for namedtuple In-Reply-To: References: Message-ID: And ignore that extra debugging print in there ;) class NamedTuple(metaclass=NamedTupleMetaClass): def __new__(cls, *p, **kw): if not p: p = cls.fields.values() try: verbose = kw['verbose'] except KeyError: verbose = cls.verbose return collections.namedtuple(cls.__name__, list(cls.fields), rename=cls.rename, verbose=verbose)(*p) Tim Delaney On 17 December 2012 08:08, Tim Delaney wrote: > It can be made a bit more intelligent. I haven't done anything with > docstrings here, but it wouldn't be hard to add. This automatically handles > defaults (you can call the namedtuple with either zero parameters or the > exact number). You can specify __rename__ = True, which will then only > exclude __dunder_names__ (otherwise all names starting with an underscore > are excluded). You can also pass verbose=[True|False] to the subclass > constructor. > > import collections > > class NamedTupleMetaClass(type): > # The prepare function > @classmethod > def __prepare__(metacls, name, bases): # No keywords in this case > return collections.OrderedDict() > > # The metaclass invocation > def __new__(cls, name, bases, classdict): > fields = collections.OrderedDict() > rename = False > verbose = False > > for f in classdict: > if f == '__rename__': > rename = classdict[f] > elif f == '__verbose__': > verbose = classdict[f] > > for f in classdict: > if f.startswith('_'): > if not rename: > continue > > if f.startswith('__') and f.endswith('__'): > continue > > fields[f] = classdict[f] > > result = type.__new__(cls, name, bases, classdict) > result.fields = fields > result.rename = rename > result.verbose = verbose > return result > > class NamedTuple(metaclass=NamedTupleMetaClass): > def __new__(cls, *p, **kw): > print(p) > if not p: > p = cls.fields.values() > > try: > verbose = kw['verbose'] > except KeyError: > verbose = cls.verbose > > return collections.namedtuple(cls.__name__, list(cls.fields), > rename=cls.rename, verbose=verbose)(*p) > > Python 3.3.0 (v3.3.0:bd8afb90ebf2, Sep 29 2012, 10:57:17) [MSC v.1600 64 > bit (AMD64)] on win32 > Type "help", "copyright", "credits" or "license" for more information. 
> >>> import namedtuple_baseclass > >>> class Point(namedtuple_baseclass.NamedTuple): > ... x = 0 > ... y = 0 > ... > >>> print(Point()) > Point(x=0, y=0) > >>> print(Point(1, 2)) > Point(x=1, y=2) > >>> print(Point(1)) > Traceback (most recent call last): > File "", line 1, in > File ".\namedtuple_baseclass.py", line 38, in __new__ > return collections.namedtuple(cls.__name__, list(cls.fields), > rename=cls.rename, verbose=cls.verbose)(*p) > TypeError: __new__() missing 1 required positional argument: 'y' > >>> print(Point(1, 2, 3)) > Traceback (most recent call last): > File "", line 1, in > File ".\namedtuple_baseclass.py", line 38, in __new__ > return collections.namedtuple(cls.__name__, list(cls.fields), > rename=cls.rename, verbose=cls.verbose)(*p) > TypeError: __new__() takes 3 positional arguments but 4 were given > >>> > > Tim Delaney > > > On 17 December 2012 07:05, Terry Reedy wrote: > >> On 12/16/2012 8:22 AM, Eli Bendersky wrote: >> >> This may be a good time to say that personally I always disliked >>> namedtuple's creation syntax. It is unpleasant in two respects: >>> >>> 1. You have to repeat the name >>> 2. You have to specify the fields in a space-separated string >>> >>> I wish there was an alternative of something like: >>> >>> @namedtuple >>> class Point: >>> x = 0 >>> y = 0 >>> >> >> Pretty easy, once one figures out metaclass basics. >> >> import collections as co >> >> class ntmeta(): >> def __prepare__(name, bases, **kwds): >> return co.OrderedDict() >> def __new__(cls, name, bases, namespace): >> print(namespace) # shows why filter is needed >> return co.namedtuple(name, >> filter(lambda s: s[0] != '_', namespace)) >> >> class Point(metaclass=ntmeta): >> >> x = 0 >> y = 0 >> >> p = Point(1,2) >> print(p) >> # >> OrderedDict([('__module__', '__main__'), ('__qualname__', 'Point'), ('x', >> 0), ('y', 0)]) >> Point(x=1, y=2) >> >> To use the filtered namespace values as defaults (Antoine's suggestion), >> first replace namedtuple() with its body. >> Then modify the header of generated name.__new__. For Point, change >> >> def __new__(_cls, x, y): >> #to >> def __new__(_cls, x=0, y=0): >> >> Also change the newclass docstring. For Point, change >> 'Point(x, y)' >> to >> 'Point(x=0, y=0)' >> >> -- >> Terry Jan Reedy >> >> >> ______________________________**_________________ >> Python-ideas mailing list >> Python-ideas at python.org >> http://mail.python.org/**mailman/listinfo/python-ideas >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From timothy.c.delaney at gmail.com Sun Dec 16 22:21:39 2012 From: timothy.c.delaney at gmail.com (Tim Delaney) Date: Mon, 17 Dec 2012 08:21:39 +1100 Subject: [Python-ideas] Docstrings for namedtuple In-Reply-To: References: Message-ID: An improvement would be to cache the namedtuple types so that each only gets created once. Tim Delaney On 17 December 2012 08:09, Tim Delaney wrote: > And ignore that extra debugging print in there ;) > > class NamedTuple(metaclass=NamedTupleMetaClass): > def __new__(cls, *p, **kw): > if not p: > p = cls.fields.values() > > try: > verbose = kw['verbose'] > except KeyError: > verbose = cls.verbose > > return collections.namedtuple(cls.__name__, list(cls.fields), > rename=cls.rename, verbose=verbose)(*p) > > Tim Delaney > > > On 17 December 2012 08:08, Tim Delaney wrote: > >> It can be made a bit more intelligent. I haven't done anything with >> docstrings here, but it wouldn't be hard to add. 
This automatically handles >> defaults (you can call the namedtuple with either zero parameters or the >> exact number). You can specify __rename__ = True, which will then only >> exclude __dunder_names__ (otherwise all names starting with an underscore >> are excluded). You can also pass verbose=[True|False] to the subclass >> constructor. >> >> import collections >> >> class NamedTupleMetaClass(type): >> # The prepare function >> @classmethod >> def __prepare__(metacls, name, bases): # No keywords in this case >> return collections.OrderedDict() >> >> # The metaclass invocation >> def __new__(cls, name, bases, classdict): >> fields = collections.OrderedDict() >> rename = False >> verbose = False >> >> for f in classdict: >> if f == '__rename__': >> rename = classdict[f] >> elif f == '__verbose__': >> verbose = classdict[f] >> >> for f in classdict: >> if f.startswith('_'): >> if not rename: >> continue >> >> if f.startswith('__') and f.endswith('__'): >> continue >> >> fields[f] = classdict[f] >> >> result = type.__new__(cls, name, bases, classdict) >> result.fields = fields >> result.rename = rename >> result.verbose = verbose >> return result >> >> class NamedTuple(metaclass=NamedTupleMetaClass): >> def __new__(cls, *p, **kw): >> print(p) >> if not p: >> p = cls.fields.values() >> >> try: >> verbose = kw['verbose'] >> except KeyError: >> verbose = cls.verbose >> >> return collections.namedtuple(cls.__name__, list(cls.fields), >> rename=cls.rename, verbose=verbose)(*p) >> >> Python 3.3.0 (v3.3.0:bd8afb90ebf2, Sep 29 2012, 10:57:17) [MSC v.1600 64 >> bit (AMD64)] on win32 >> Type "help", "copyright", "credits" or "license" for more information. >> >>> import namedtuple_baseclass >> >>> class Point(namedtuple_baseclass.NamedTuple): >> ... x = 0 >> ... y = 0 >> ... >> >>> print(Point()) >> Point(x=0, y=0) >> >>> print(Point(1, 2)) >> Point(x=1, y=2) >> >>> print(Point(1)) >> Traceback (most recent call last): >> File "", line 1, in >> File ".\namedtuple_baseclass.py", line 38, in __new__ >> return collections.namedtuple(cls.__name__, list(cls.fields), >> rename=cls.rename, verbose=cls.verbose)(*p) >> TypeError: __new__() missing 1 required positional argument: 'y' >> >>> print(Point(1, 2, 3)) >> Traceback (most recent call last): >> File "", line 1, in >> File ".\namedtuple_baseclass.py", line 38, in __new__ >> return collections.namedtuple(cls.__name__, list(cls.fields), >> rename=cls.rename, verbose=cls.verbose)(*p) >> TypeError: __new__() takes 3 positional arguments but 4 were given >> >>> >> >> Tim Delaney >> >> >> On 17 December 2012 07:05, Terry Reedy wrote: >> >>> On 12/16/2012 8:22 AM, Eli Bendersky wrote: >>> >>> This may be a good time to say that personally I always disliked >>>> namedtuple's creation syntax. It is unpleasant in two respects: >>>> >>>> 1. You have to repeat the name >>>> 2. You have to specify the fields in a space-separated string >>>> >>>> I wish there was an alternative of something like: >>>> >>>> @namedtuple >>>> class Point: >>>> x = 0 >>>> y = 0 >>>> >>> >>> Pretty easy, once one figures out metaclass basics. 
>>> >>> import collections as co >>> >>> class ntmeta(): >>> def __prepare__(name, bases, **kwds): >>> return co.OrderedDict() >>> def __new__(cls, name, bases, namespace): >>> print(namespace) # shows why filter is needed >>> return co.namedtuple(name, >>> filter(lambda s: s[0] != '_', namespace)) >>> >>> class Point(metaclass=ntmeta): >>> >>> x = 0 >>> y = 0 >>> >>> p = Point(1,2) >>> print(p) >>> # >>> OrderedDict([('__module__', '__main__'), ('__qualname__', 'Point'), >>> ('x', 0), ('y', 0)]) >>> Point(x=1, y=2) >>> >>> To use the filtered namespace values as defaults (Antoine's suggestion), >>> first replace namedtuple() with its body. >>> Then modify the header of generated name.__new__. For Point, change >>> >>> def __new__(_cls, x, y): >>> #to >>> def __new__(_cls, x=0, y=0): >>> >>> Also change the newclass docstring. For Point, change >>> 'Point(x, y)' >>> to >>> 'Point(x=0, y=0)' >>> >>> -- >>> Terry Jan Reedy >>> >>> >>> ______________________________**_________________ >>> Python-ideas mailing list >>> Python-ideas at python.org >>> http://mail.python.org/**mailman/listinfo/python-ideas >>> >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From timothy.c.delaney at gmail.com Sun Dec 16 22:55:10 2012 From: timothy.c.delaney at gmail.com (Tim Delaney) Date: Mon, 17 Dec 2012 08:55:10 +1100 Subject: [Python-ideas] Docstrings for namedtuple In-Reply-To: References: Message-ID: Improved version, with caching (verbose and non-verbose versions are different classes) and only parsing the fields once per class. import collections class NamedTupleMetaClass(type): # The prepare function @classmethod def __prepare__(metacls, name, bases): # No keywords in this case return collections.OrderedDict() # The metaclass invocation def __new__(cls, name, bases, classdict): result = type.__new__(cls, name, bases, classdict) result._classdict = classdict return result class NamedTuple(metaclass=NamedTupleMetaClass): _cache = {} def __new__(cls, *p, **kw): verbose = False try: verbose = kw_verbose = kw['verbose'] except KeyError: kw_verbose = None try: nt, fields = cls._cache[cls.__module__, cls.__qualname__, verbose] except KeyError: classdict = cls._classdict fields = collections.OrderedDict() rename = False for f in classdict: if f == '__rename__': rename = classdict[f] elif f == '__verbose__': verbose = classdict[f] for f in classdict: if f.startswith('_'): if not rename: continue if f.startswith('__') and f.endswith('__'): continue fields[f] = classdict[f] if kw_verbose is not None: verbose = kw_verbose nt = collections.namedtuple(cls.__name__, fields.keys(), rename=rename, verbose=verbose) nt, fields = cls._cache[cls.__module__, cls.__qualname__, verbose] = nt, list(fields.values()) if not p: p = fields return nt(*p) Tim Delaney On 17 December 2012 08:21, Tim Delaney wrote: > An improvement would be to cache the namedtuple types so that each only > gets created once. > > Tim Delaney > > > On 17 December 2012 08:09, Tim Delaney wrote: > >> And ignore that extra debugging print in there ;) >> >> class NamedTuple(metaclass=NamedTupleMetaClass): >> def __new__(cls, *p, **kw): >> if not p: >> p = cls.fields.values() >> >> try: >> verbose = kw['verbose'] >> except KeyError: >> verbose = cls.verbose >> >> return collections.namedtuple(cls.__name__, list(cls.fields), >> rename=cls.rename, verbose=verbose)(*p) >> >> Tim Delaney >> >> >> On 17 December 2012 08:08, Tim Delaney wrote: >> >>> It can be made a bit more intelligent. 
I haven't done anything with >>> docstrings here, but it wouldn't be hard to add. This automatically handles >>> defaults (you can call the namedtuple with either zero parameters or the >>> exact number). You can specify __rename__ = True, which will then only >>> exclude __dunder_names__ (otherwise all names starting with an underscore >>> are excluded). You can also pass verbose=[True|False] to the subclass >>> constructor. >>> >>> import collections >>> >>> class NamedTupleMetaClass(type): >>> # The prepare function >>> @classmethod >>> def __prepare__(metacls, name, bases): # No keywords in this case >>> return collections.OrderedDict() >>> >>> # The metaclass invocation >>> def __new__(cls, name, bases, classdict): >>> fields = collections.OrderedDict() >>> rename = False >>> verbose = False >>> >>> for f in classdict: >>> if f == '__rename__': >>> rename = classdict[f] >>> elif f == '__verbose__': >>> verbose = classdict[f] >>> >>> for f in classdict: >>> if f.startswith('_'): >>> if not rename: >>> continue >>> >>> if f.startswith('__') and f.endswith('__'): >>> continue >>> >>> fields[f] = classdict[f] >>> >>> result = type.__new__(cls, name, bases, classdict) >>> result.fields = fields >>> result.rename = rename >>> result.verbose = verbose >>> return result >>> >>> class NamedTuple(metaclass=NamedTupleMetaClass): >>> def __new__(cls, *p, **kw): >>> print(p) >>> if not p: >>> p = cls.fields.values() >>> >>> try: >>> verbose = kw['verbose'] >>> except KeyError: >>> verbose = cls.verbose >>> >>> return collections.namedtuple(cls.__name__, list(cls.fields), >>> rename=cls.rename, verbose=verbose)(*p) >>> >>> Python 3.3.0 (v3.3.0:bd8afb90ebf2, Sep 29 2012, 10:57:17) [MSC v.1600 64 >>> bit (AMD64)] on win32 >>> Type "help", "copyright", "credits" or "license" for more information. >>> >>> import namedtuple_baseclass >>> >>> class Point(namedtuple_baseclass.NamedTuple): >>> ... x = 0 >>> ... y = 0 >>> ... >>> >>> print(Point()) >>> Point(x=0, y=0) >>> >>> print(Point(1, 2)) >>> Point(x=1, y=2) >>> >>> print(Point(1)) >>> Traceback (most recent call last): >>> File "", line 1, in >>> File ".\namedtuple_baseclass.py", line 38, in __new__ >>> return collections.namedtuple(cls.__name__, list(cls.fields), >>> rename=cls.rename, verbose=cls.verbose)(*p) >>> TypeError: __new__() missing 1 required positional argument: 'y' >>> >>> print(Point(1, 2, 3)) >>> Traceback (most recent call last): >>> File "", line 1, in >>> File ".\namedtuple_baseclass.py", line 38, in __new__ >>> return collections.namedtuple(cls.__name__, list(cls.fields), >>> rename=cls.rename, verbose=cls.verbose)(*p) >>> TypeError: __new__() takes 3 positional arguments but 4 were given >>> >>> >>> >>> Tim Delaney >>> >>> >>> On 17 December 2012 07:05, Terry Reedy wrote: >>> >>>> On 12/16/2012 8:22 AM, Eli Bendersky wrote: >>>> >>>> This may be a good time to say that personally I always disliked >>>>> namedtuple's creation syntax. It is unpleasant in two respects: >>>>> >>>>> 1. You have to repeat the name >>>>> 2. You have to specify the fields in a space-separated string >>>>> >>>>> I wish there was an alternative of something like: >>>>> >>>>> @namedtuple >>>>> class Point: >>>>> x = 0 >>>>> y = 0 >>>>> >>>> >>>> Pretty easy, once one figures out metaclass basics. 
>>>> >>>> import collections as co >>>> >>>> class ntmeta(): >>>> def __prepare__(name, bases, **kwds): >>>> return co.OrderedDict() >>>> def __new__(cls, name, bases, namespace): >>>> print(namespace) # shows why filter is needed >>>> return co.namedtuple(name, >>>> filter(lambda s: s[0] != '_', namespace)) >>>> >>>> class Point(metaclass=ntmeta): >>>> >>>> x = 0 >>>> y = 0 >>>> >>>> p = Point(1,2) >>>> print(p) >>>> # >>>> OrderedDict([('__module__', '__main__'), ('__qualname__', 'Point'), >>>> ('x', 0), ('y', 0)]) >>>> Point(x=1, y=2) >>>> >>>> To use the filtered namespace values as defaults (Antoine's >>>> suggestion), first replace namedtuple() with its body. >>>> Then modify the header of generated name.__new__. For Point, change >>>> >>>> def __new__(_cls, x, y): >>>> #to >>>> def __new__(_cls, x=0, y=0): >>>> >>>> Also change the newclass docstring. For Point, change >>>> 'Point(x, y)' >>>> to >>>> 'Point(x=0, y=0)' >>>> >>>> -- >>>> Terry Jan Reedy >>>> >>>> >>>> ______________________________**_________________ >>>> Python-ideas mailing list >>>> Python-ideas at python.org >>>> http://mail.python.org/**mailman/listinfo/python-ideas >>>> >>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From oscar.j.benjamin at gmail.com Sun Dec 16 23:41:28 2012 From: oscar.j.benjamin at gmail.com (Oscar Benjamin) Date: Sun, 16 Dec 2012 22:41:28 +0000 Subject: [Python-ideas] Fwd: Graph class In-Reply-To: References: Message-ID: On 9 December 2012 20:31, Mark Adam wrote: > > On Sun, Dec 9, 2012 at 5:40 AM, Paul Moore wrote: > > On 9 December 2012 01:29, Mark Adam wrote: > >> All very interesting. I'm going to suggest a sort of "meta-discussion" > >> about why -- despite the power of graphs as a data structure -- such a > >> feature has not stabilized into a workable solution for inclusion in a > >> high-level language like Python. > >> > >> I identity the following points of "wavery": > >> > >> 1) the naming of methods (add_edge, vs add(1,2)): aesthetic grounds, > >> 2) what methods to include (degree + neighbors or the standard dict's > >> __len__ + __getitem__): API grounds > >> 3) how much flexibility to be offered (directed, multi-graphs, edge weights > >> with arbitrary labeling, etc.): functionality grounds > >> 4) what underlying data structure to use (sparse adjacency dicts, matrices, > >> etc): representation conflicts. For all the reasons above I don't much see the utility of implementing some kind of standard graph class. There are too many possibilities for any one implementation to be generally applicable. I have implemented graphs in Python many times and I very often find that the detail of what I want to do leads me to create a different implementation. Another consideration that you've not mentioned is the occasional need for a OrderedGraph that keeps track of some kind of order for its vertices. > > 4) Whether the library requires some sort of "Vertex" type, or works > > with arbitrary values, similarly whether there is a defined "Edge" > > class or edges can be labelled, weighted, etc with arbitrary Python > > values. > > This I put under #3 (functionality grounds) "edge weights with > arbitrary labeling", Vertex's with abitrary values i think would be > included. Having implemented graphs a few times now, I have come to the conclusion that it is a good idea to make the restriction that the vertices should be hashable. Otherwise, how would you get O(1) behaviour for methods like has_edge()? 
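For instance (a purely illustrative sketch, not taken from any particular library), with adjacency stored as a dict of sets, has_edge() is just two hash lookups:

    # Adjacency as a dict of sets: has_edge() is O(1) on average.
    graph = {1: {2, 3}, 2: {3}, 3: set()}

    def has_edge(g, nfrom, nto):
        return nto in g.get(nfrom, set())

    print(has_edge(graph, 1, 2))   # True
    print(has_edge(graph, 2, 1))   # False

    # An unhashable vertex can't even be stored, let alone looked up in O(1):
    try:
        graph[['not', 'hashable']] = set()
    except TypeError as exc:
        print(exc)                 # unhashable type: 'list'
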
> > 5) Granularity - if all I want is a depth-first search algorithm, why > > pull in a dependency on 100 graph algorithms I'm not interested in? > > Hmm, I would call this "5) comprehensiveness: whether to include every > graph algorithm known to mankind." This is the one part of a graph library that is really useful. Creating a class or a data structure that represents a graph in some way is trivially easy. Creating trustworthy implementations of all the graph-theoretic algorithms with the right kind of big-O behaviour is not. > > My feeling is that graphs are right on the borderline of a data > > structure that is simple enough that people invent their own rather > > than bother conforming to a "standard" model but complex enough that > > it's worth using library functions rather than getting the details > > wrong. What details would you get wrong? I contend that it is very easy to implement a graph without getting any of the details wrong. It is the graph algorithms that are hard, not the data structure. Here's a couple of examples: G = { 'A':{'B', 'C'}, 'B':{'A'}, 'C':{'B'} } M = [[0, 1, 1], [1, 0, 0], [0, 1, 0]] You may want to wrap the above in some kind of class in which case you'll end up with something like the following (from a private project - modified a little before posting so it may not work now): class Graph: def __init__(self, nodes, edges): self._nodes = frozenset(nodes) self._edges = defaultdict(set) for n1, n2 in edges: self._edges[n1].add(n2) @property def nodes(self): return iter(self._nodes) @property def edges(self): for n1 in self._nodes: for n2 in self.edges_node(n1): yield (n1, n2) def edges_node(self, node): return iter(self._edges[node]) def has_edge(self, nfrom, nto): return nto in self._edges[nfrom] def __str__(self): return '\n'.join(self._iterdot()) def _iterdot(self): yield 'digraph G {' for n in self.nodes: yield ' %s;' % n for nfrom, nto in self.edges: yield ' %s -> %s;' % (nfrom, nto) yield '}' G2 = Graph('ABC', [('A', 'B'), ('A', 'C'), ('B', 'C'), ('C', 'A')]) The above class is unusual in the sense that it is a pure Graph class. Normally I would simply be adding a few graphy methods onto a class that represents a network of some kind. The problem that I found with current support for graphs in Python is not the lack of an appropriate data structure. Rather the problem is that implementations of graph-theoretic algorithms (as in e.g. pygraph) are tied to a specific Graph class that I didn't want to or couldn't use in my own project. This means that to determine if you have something that represents a strongly connected graph you first need to create a separate redundant data structure and then pass that into the algorithm. What would be more useful than a new Graph class would be implementations of graph algorithms that can easily be applied to any representation of a graph. 
As an example, I can write a function for determining if a graph is a DAG using a small subset of the possible methods that a Graph class would have: def is_dag(nodes, edges_node): '''Determine if a directed graph is acyclic nodes is an iterable yielding all vertices in the graph edges_node(node) is an iterable giving all nodes that node connects to ''' visited = set() visiting = set() for node in nodes: if node not in visited: if has_backedge(node, edges_node, visited, visiting): return False else: return True def has_backedge(node, edges_node, visited, visiting): '''Helper for is_dag()''' if node in visiting: return True visited.add(node) visiting.add(node) for childnode in edges_node(node): if has_backedge(childnode, edges_node, visited, visiting): return True visiting.remove(node) return False This can be used with the Graph class or just as easily with the dict-of-sets like so: is_dag(G2.nodes, G2.edges_node) is_dag(G, G.__getitem__) It is possible to do something like this for all of the graph algorithms and I think that a library like this would be more useful than a new Graph type. From hannu at krosing.net Mon Dec 17 00:28:04 2012 From: hannu at krosing.net (Hannu Krosing) Date: Mon, 17 Dec 2012 00:28:04 +0100 Subject: [Python-ideas] Graph class In-Reply-To: References: <87txru1wxr.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <50CE5904.9090102@krosing.net> On 12/16/2012 04:41 PM, Guido van Rossum wrote: > I think of graphs and trees as patterns, not data structures. How do you draw line between what is data structure and what is pattern ? Do you have any ideas on how to represent "patterns" in python standard library ? By a set of samples ? By (a set of) classes realising the patterns ? By a set of functions working on existing structures which implement the pattern ? Duck-typing should lend itself well to this last approach. Do we currently have any modules in standard library which are more patterns and less data structures ? ----------------------- Hannu > > > > -- > --Guido van Rossum (on iPad) > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas -------------- next part -------------- An HTML attachment was scrubbed... URL: From geertj at gmail.com Mon Dec 17 12:08:27 2012 From: geertj at gmail.com (Geert Jansen) Date: Mon, 17 Dec 2012 12:08:27 +0100 Subject: [Python-ideas] async: feedback on EventLoop API Message-ID: Hi, below is some feedback on the EventLoop API as implemented in tulip. I am interested in this for an (alternate) dbus interface that I've written for Python that supports evented IO. I'm hoping tulip's EventLoop could be an abstraction as well as a default implementation that allows me to support just one event interface. I looked at it from two angles: 1. Does EventLoop provide everything that is needed from a library writer point of view? 2. Can EventLoop efficiently expose a subset of the functionality of some of the main event loop implementations out there today (i looked at libuv, libev and Qt). First some code pointers... * https://github.com/geertj/looping - Here i've implemented the EventLoop interface for libuv, libev and Qt. It includes a slightly modified version of tulip's "polling.py" where I've implemented some of the suggestions below. It also adds support for Python 2.6/2.7 as the Python Qt interface (PySide) doesn't support Python 3 yet. 
* https://github.com/geertj/python-dbusx - A Python interface for libdbus that supports evented IO using an EventLoop interface. This module is also tests all the different loops from "looping" by doing D-BUS tests with them (looping itself doesn't have tests yet). My main points of feedback are below: * It would be nice to have repeatable timers. Repeatable timers are expected for example by libdbus when integrating it with an event loop. Without repeatable timers, I could emulate a repeatable timer by using call_later() and adding a new timer every time the timer fires. This would be an inefficient interface though for event loops that natively support repeatable timers. This could possibly be done by adding a "repeat" argument to call_later(). * It would be nice to be a way to call a callback once per loop iteration. An example here is dispatching in libdbus. The easiest way to do this is to call dbus_connection_dispatch() every iteration of the loop (a more complicated way exists to get notifications when the dispatch status changes, but it is edge triggered and difficult to get right). This could possibly be implemented by adding a "repeat" argument to call_soon(). * A useful semantic for run_once() would be to run the callbacks for readers and writers in the same iteration as when the FD got ready. This allows for the idiom below when expecting a single event to happen on a file descriptor from outside the event loop: # handle_read() sets the "ready" flag loop.add_reader(fd, handle_read) while not ready: loop.run_once() I use this idiom for example in a blocking method_call() method that calls into a D-BUS method. Currently, the handle_read() callback would be called in the iteration *after* the FD became readable. So this would not work, unless some more IO becomes available. As far as I can see libev, libuv and Qt all work like this. * If remove_reader() / remove_writer() would accept the DelayedCall instance returned by their add_xxx() cousins, then that would allow for multiple callbacks per FD. Not all event loops support this (libuv doesn't, libev and Qt do), but for the ones that do could have their functionality could be exposed like this. For event loops that don't support this, an exception could be raised when adding multiple callbacks per FD. Support for multiple callbacks per FD could be advertised as a capability. * After a DelayedCall is cancelled, it would also be very useful to have a second method to enable it again. Having that functionality is more efficient than creating a new event. For example, the D-BUS event loop integration API has specific methods for toggling events on and off that you need to provide. * (Nitpick) Multiplexing absolute and relative timeouts for the "when" argument in call_later() is a little too smart in my view and can lead to bugs. With some input, I'd be happy to produce patches. Regards, Geert Jansen From guido at python.org Sun Dec 16 23:23:51 2012 From: guido at python.org (Guido van Rossum) Date: Sun, 16 Dec 2012 14:23:51 -0800 Subject: [Python-ideas] Late to the async party (PEP 3156) In-Reply-To: <50CE1CF4.4080704@urandom.ca> References: <50CD2592.5010507@urandom.ca> <50CE1CF4.4080704@urandom.ca> Message-ID: On Sun, Dec 16, 2012 at 11:11 AM, Jason Tackaberry wrote: > On 12-12-16 11:27 AM, Guido van Rossum wrote: > > The PEP is definitely weak. Here are some thoughts/proposals though: > > - You can't cancel a coroutine; however you can cancel a Task, which > is a Future wrapping a stack of coroutines linked via yield-from. 
> > > I'll just underline your statement that "you can't cancel a coroutine" > here, since I'm referencing it later. > > This distinction between "bare" coroutines, Futures, and Tasks is a bit > foreign to me, since in Kaa all coroutines return (a subclass of) > InProgress objects. > Task is a subclass of Future; a Future may wrap some I/O or some other system call, but a Task wraps a coroutine. Bare coroutines are introduced by PEP 380, so it's no surprise you have to get used to them. But trust me they are useful. I have a graphical representation in my head; drawing with a computer is not my strong point, but here's some ASCII art: [Task: coroutine -> coroutine -> ... -> coroutine) The -> arrow represents a yield from, and each coroutine has its own stack frame (the frame's back pointer points left). The leftmost coroutine is the one you pass to Task(); the rightmost one is the one whose code is currently running. When it blocks for I/O, the entire stack is suspended; the Task object is given to the scheduler for resumption when the I/O completes. I'm drawing a '[' to the left of the Task because it is a definite end point; I'm drawing a ')' to the right of the last coroutine because whenever the coroutine uses yield from another one gets added to the right. When a coroutine blocks for a Future, it looks like this: [Task: coroutine -> coroutine -> ... -> coroutine -> Future] (I'm using ']' here to suggest that the Future is also an end point.) When it blocks for a Task, it ends up looking like this: [Task 1: coroutine -> ... -> coroutine -> [Task 2: coroutine -> ... -> coroutine)] > The Tasks section in the PEP says that a bare coroutine (is this the same > as the previously defined "coroutine object"?) > Yes. > has much less overhead than a Task but it's not clear to me why that > would be, as both would ultimately need to be managed by the scheduler, > wouldn't they? > No. This takes a lot of time to wrap your head around but it is important to get this. This is because "yield from" is built into the language, and because of the way it is defined to behave. Suppose you have this: def inner(): yield 1 yield 2 def outer(): yield 'A' yield from inner() yield 'B' def main(): # Not a generator -- no yield in sight! for x in outer(): print(x) The output of calling main() is as follows: A 1 2 B There is no scheduler in sight, this is basic Python 3. The Python 2 equivalent would have the middle line of outer() replaced by for x in inner(): yield x (It's more complicated when 'yield from' is used as an expression and when sending values or throwing exceptions into the outer generator, but that doesn't matter for this part of the explanation.) Given that a coroutine function (despite being marked with @tulip.coroutine) is just a generator, when one coroutine invokes another via 'yield from', the scheduler doesn't find out about this at all. However, if a coroutine uses 'yield' instead of 'yield from', the scheduler *does* hear about it. The mechanism for this is best understood by looking at the Python 2 equivalent: each arrow in my diagrams stands for 'yield from', which you can replace by a for loop yielding each value, and thus the value yielded by the innermost coroutine ends up being yielded by the outermost one to the scheduler. The trick is that 'yield from' is implemented more efficiently that the equivalent for loop. 
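To make the expression form of 'yield from' concrete as well -- this is plain PEP 380 behaviour on Python 3.3, nothing Tulip-specific, and the names are invented for illustration:

    def fetch():
        # Stand-in for a coroutine: the bare yield is where a real one would
        # suspend for I/O; the return value becomes the value of 'yield from'.
        yield
        return 'payload'

    def caller():
        data = yield from fetch()   # subroutine-style call, no scheduler involved
        print('caller got:', data)

    gen = caller()
    next(gen)          # runs straight into fetch() and stops at its bare yield
    try:
        gen.send(None)  # resume: fetch() returns, caller() prints and finishes
    except StopIteration:
        pass

The call and the returned value travel directly between the two generator frames; the only thing an outside driver (the scheduler) ever sees is the bare yield.
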
Another thing to keep in mind is that when you use yield from with a Future (or a Task, which is a subclass of Future), the Future has an __iter__() method that uses 'yield' (*not* 'yield from') to signal the scheduler that it is waiting for some I/O. (There's debate about whether you tell the scheduler what kind of I/O it should perform before invoking 'yield' or as a parameter to 'yield', but that's immaterial for understanding this part of the explanation.) > I could imagine that a coroutine object is implemented as a C object for > performance, > Kind of -- the transfer is built into the Python interpreter. > and a Task is a Python class, and maybe that explains the difference. But > then why differentiate between Future and Task (particularly because they > have the same interface, so I can't draw an analogy with jQuery's Deferreds > and Promises, where Promises are a restricted form of Deferreds for public > consumption to attach callbacks). > You'll have to ask a Twisted person about Deferreds; they have all kinds of extra useful functionality related to error handling and chaining. (Apparently Deferreds became popular in the JS world after they were introduced in Twisted.) I find Deferreds elusive, and my PEP won't have them. (Coroutines take their place as the preferred way to write user code.) AFAICT a Promise is more like a Future, which is a much simpler thing. Another difference between bare coroutines and Tasks: a bare coroutine *only* runs when another coroutine that is running is waiting for it using 'yield from'. But a coroutine wrapped in a Task will be run by the schedulereven when nobody is waiting for it. (In Kaa's world, which is similar to Twisted's @inlineCallbacks, Monocle, and Google App Engine's NDB, every coroutine is wrapped in something like a task.) This is the reason why the par() operation needs to wrap bare coroutine arguments in Tasks. > > - Cancellation only takes effect when a task is suspended. > > > Yes, this is intuitive. > > > > > - When you cancel a Task, the most deeply nested coroutine (the one > that caused it to be suspended) receives a special exception (I propose to > reuse concurrent.futures.CancelledError from PEP 3148). If it doesn't catch > this it bubbles all the way to the Task, and then out from there. > > > So if the most deeply nested coroutine catches the CancelledError and > doesn't reraise, it can prevent its cancellation? > Yes. That's probably something you shouldn't be doing though. Also, cancel() sets a flag on the Task that remains set, and when the coroutine suspends itself in response to the CancelledError, the scheduler will just throw the exception into it again. Or perhaps it should throw something that's harder to catch? There's some similarity with the close() method on generators introduced by PEP 342; this causes GeneratorExit to be thrown into the generator (if it's not terminated), and if the generator chooses to catch and ignore this, the generator is declared dead anyway. > I took a similar appoach, except that coroutines can't abort their own > cancellation, and whether or not the nested coroutines actually get > cancelled depends on whether something else was interested in their result. > Yeah, since you have a Task/Future at every level you are forced to do it that way. > Consider a coroutine chain where A yields B yields C yields D, and we do > B.abort() > > - if only C was interested in D's result, then D will get an > InProgressAborted raised inside it (at whatever point it's currently > suspended). 
If something other than C was also waiting on D, D will not be > affected > - similarly, if only B was interested in C's result, then C will get > an InProgressAborted raised inside it (at yield D). > - B will get InProgressAborted raised inside it (at yield C) > - for B, C and D, the coroutines will not be reentered and they are > not allowed to yield a value that suggests they expect reentry. There's > nothing a coroutine can do to prevent its own demise. > - A will get an InProgressAborted raised inside it (at yield B) > - In all the above cases, the InProgressAborted instance has an origin > attribute that is B's InProgress object > - Although B, C, and D are now aborted, A isn't aborted. It's allowed > to yield again. > - with Kaa, coroutines are abortable by default (so they are like > Tasks always). But in this example, B can present C from being aborted by > yielding C().noabort() > > > There are quite a few scenarios to consider: A yields B and B is cancelled > or raises; A yields B and A is cancelled or raises; A yields B, C yields B, > and A is cancelled or raises; A yields B, C yields B, and A or C is > cancelled or raises; A yields par(B,C,D) and B is cancelled or raises; etc, > etc. > > In my experience, there's no one-size-fits-all behaviour, and the best we > can do is have sensible default behaviour with some API (different > functions, kwargs, etc.) to control the cancellation propagation logic. > Yeah, I think that the default behavior I sketched in my previous message is fine, and the user can implement other behaviors through a combination of Task wrappers, catching exceptions, and explicitly cancelling tasks. > > - However when a coroutine in one Task uses yield-from to wait for > another Task, the latter does not automatically get cancelled. So this is a > difference between "yield from foo()" and "yield from Task(foo())", which > otherwise behave pretty similarly. Of course the first Task could catch the > exception and cancel the second task -- that is its responsibility though > and not the default behavior. > > > Ok, so nested bare coroutines will get cancelled implicitly, but nested > Tasks won't? > Correct. If you have a simple stack-like usage pattern there's no need to introduce a Task; Tasks are useful if you want to decouple the stacks, e.g. have two other places both wait for the same Task (or for some other Future, for that matter). > I'm having a bit of difficulty with this one. You said that coroutines > can't be cancelled, but Tasks can be. But here, if they are being yielded, > the opposite behaviour applies: yielded coroutines *are* cancelled if a > Task is cancelled, but yielded tasks *aren't*. > > Or have I misunderstood? > I hope my explanation above of the relationship between Tasks and bare coroutines helps. I can see how it gets confusing if you are used to thinking in terms of a system where there is always a Task involved when one coroutine waits for another. > - PEP 3156 has a par() helper which lets you block for multiple > tasks/coroutines in parallel. It takes arguments which are either > coroutines, Tasks, or other Futures; it wraps the coroutines in Tasks to > run them independently an just waits for the other arguments. Proposal: > when the Task containing the par() call is cancelled, the par() call > intercepts the cancellation and by default cancels those coroutines that > were passed in "bare" but not the arguments that were passed in as Tasks or > Futures. 
Some keyword argument to par() may be used to change this behavior > to "cancel none" or "cancel all" (exact API spec TBD). > > > Here again, par() would cancel a bare coroutine but not Tasks. It's > consistent with your previous bullet but seems to contradict your first > bullet that you can't cancel a coroutine. > > I guess the distinction is you can't explicitly cancel a coroutine, but > coroutines can be implicitly cancelled? > Right. As I discussed previously, one of those tasks might be yielded by some > other active coroutine, and so cancelling it may not be the right thing to > do. Being able to control this behaviour is important, whether that's a > par() kwarg, or special method like noabort() that constructs an > unabortable Task instance. > I think we're in violent agreement. :-) Kaa has similar constructs to allow yielding a collection of InProgress > objects (whatever they might represent: coroutines, threaded functions, > etc.). In particular, it allows you to yield multiple tasks and resume > when ALL of them complete (InProgressAll), or when ANY of them complete > (InProgressAny). For example: > > @kaa.coroutine() > def is_any_host_up(*hosts): > try: > # ping() is a coroutine > yield kaa.InProgressAny(ping(host) for host in hosts).timeout(5, abort=True) > except kaa.TimeoutException: > yield False > else: > yield True > > > More details here: > > > http://api.freevo.org/kaa-base/async/inprogress.html#inprogress-collections > > From what I understand of the proposed par() it would require* *ALL of > the supplied futures to complete, but there are many use-cases for the ANY > variant as well. > Good point. I'd forgotten about this while writing the PEP, but Tulip v1 has this. The way to spell it is a little awkward and I could use some fresh ideas though. In Tulip v1 you can write ready_tasks = yield from wait_any(set_of_tasks) The result ready_tasks is a set of tasks that are done; it has at least one element. This is a generalization of ready_tasks = yield from wait_for(N, set_of_tasks) which returns a set of size at least N done tasks; set N to the length of the input to implement waiting for all. But the semantics of always returning a set (even when N == 1) are somewhat awkward, and ideally you probably want something that you can call in a loop until all tasks are done, e.g. todo = while todo: result = yield from wait_one(todo) Here wait_one(todo) blocks until at least one task in todo is done, then removes it from todo, and returns its result (or raises its exception). > Interesting. In Tulip v1 (the experimental version I wrote before PEP > 3156) the Task() constructor has an optional timeout argument. It works by > scheduling a callback at the given time in the future, and the callback > simply cancel the task (which is a no-op if the task has already > completed). It works okay, except it generates tracebacks that are > sometimes logged and sometimes not properly caught -- though some of that > may be my messy test code. The exception raised by a timeout is the same > CancelledError, which is somewhat confusing. I wonder if Task.cancel() > shouldn't take an exception with which to cancel the task with. > (TimeoutError in PEP 3148 has a different role, it is when the timeout on a > specific wait expires, so e.g. fut.result(timeout=2) waits up to 2 seconds > for fut to complete, and if not, the call raises TimeoutError, but the code > running in the executor is unaffected.) 
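(For concreteness, that PEP 3148 behaviour can be demonstrated with the stdlib alone -- a small sketch, nothing Tulip-specific: the timeout aborts the *wait*, not the work.)

    import time
    import concurrent.futures

    def slow():
        time.sleep(5)
        return 42

    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        fut = pool.submit(slow)
        try:
            fut.result(timeout=2)          # give up *waiting* after 2 seconds
        except concurrent.futures.TimeoutError:
            print('stopped waiting, but slow() keeps running in the executor')
        print(fut.result())                # waits again; eventually prints 42
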
> > > FWIW, the equivalent in Kaa which is InProgress.abort() does take an > optional exception, which must subclass InProgressAborted. If None, a new > InProgressAborted is created. InProgress.timeout(t) will start a timer > that invokes InProgress.abort(TimeoutException()) (TimeoutException > subclasses InProgressAborted). > > It sounds like your proposed implementation works like: > > @tulip.coroutine() > def foo(): > try: > result = yield from Task(othercoroutine()).result(timeout=2) > > Actually in Tulip you never combine result() with 'yield from' and you never use timeout=N with result(); this line would be written as follows: result = yield from Task(othercoroutine(), timeout=2) > except TimeoutError: > # ... othercoroutine() still lives on > > > I think Kaa's syntax is cleaner but it seems functionally the same: > > @kaa.coroutine() > def foo(): > try: > result = yield othercoroutine().timeout(2) > except kaa.TimeoutException: > # ... othercoroutine() still lives on > > > It's also possible to conveniently ensure that othercoroutine() is aborted > if the timeout elapses: > > try: > result = yield othercoroutine().timeout(2, abort=True) > except kaa.TimeoutException: > # ... othercoroutine() is aborted > > When do you use that? We've had long discussions about yield vs. yield-from. The latter is way > more efficient and that's enough for me to push it through. When using > yield, each yield causes you to bounce to the scheduler, which has to do a > lot of work to decide what to do next, even if that is just resuming the > suspended generator; and the scheduler is responsible for keeping track of > the stack of generators. When using yield-from, calling another coroutine > as a subroutine is almost free and doesn't involve the scheduler at all; > thus it's much cheaper, and the scheduler can be simpler (doesn't need to > keep track of the stack). Also stack traces and debugging are better. > > > But this sounds like a consequence of a particular implementation, isn't > it? > These semantics are built into the language as of Python 3.3; sure, it's a quality of implementation issue to make it as fast as possible, but the stack trace semantics are not optional, and if CPython can make it efficient then other implementations will try to compete by making it even more efficient. :-) There are still some optimizations possibly beyond what Python 3.3 currently does, maybe 3.3.1 or 3.4 will speed it up even more. (In particular, in the ideal implementation, a yield at a deeply nested coroutine should reach the caller of the outermost coroutine in O(1) time rather than O(N) where N is the stack depth. I think it is currently O(N) with a rather small constant factor. > A @kaa.coroutine() decorated function is entered right away when invoked, > and the decorator logic does as much as it can until the underlying > generator yields an unfinished InProgress that needs to wait for (or > kaa.NotFinished). Once it yields, *then* the decorator sets up the > necessary hooks with the scheduler / event loop. > That's a good optimization if your semantics require a Task at every level. But IIRC (from implementing something like this myself for NDB) it is quite subtle to get it right in all edge cases. And you still have at least two Python function invocations for every level of coroutine invocation. > This means you can nest a stack of coroutines without involving the > scheduler until something truly asynchronous needs to take place. > > Have I misunderstood? > Misunderstood what? 
You are describing Kaa here. :-) > >> - coroutines can have certain policies that control invocation >> behaviour. The most obvious ones to describe are POLICY_SYNCHRONIZED which >> ensures that multiple invocations of the same coroutine are serialized, and >> POLICY_SINGLETON which effectively ignores subsequent invocations if it's >> already running >> - it is possible to have a special progress object passed into the >> coroutine function so that the coroutine's progress can be communicated to >> an outside observer >> >> > These seem pretty esoteric and can probably implemented in user code if > needed. > > > I'm fine with that, provided the flexibility is there to allow for it. > > > > As I said, I think wait_for_future() and run_in_executor() in the PEP > give you all you need. The @threaded decorator you propose is just sugar; > if a user wants to take an existing API and convert it from a coroutine to > threaded without requiring changes to the caller, they can just introduce a > helper that is run in a thread with run_in_executor(). > > > Also works for me. :) > > > > Thanks for your very useful contribution! Kaa looks like an interesting > system. Is it ported to Python 3 yet? Maybe you could look into integrating > with the PEP 3156 event loop and/or scheduler. > > > Kaa does work with Python 3, yes, although it still lacks very much needed > unit tests so I'm not completely confident it has the same functional > coverage as Python 2. > > I'm definitely interested in having it conform to whatever shakes out of > PEP 3156, which is why I'm speaking up now. :) > I'm sorry I don't have a reference implementation available yet. I hope to finish one before Christmas. > I've a couple other subjects I should bring up: > > Tasks/Futures as "signals": it's often necessary to be able to resume a > coroutine based on some condition other than e.g. any IO tasks it's waiting > on. For example, in one application, I have a (POLICY_SINGLETON) coroutine > that works off a download queue. If there's nothing in the queue, it's > suspended at a yield. It's the coroutine equivalent of a dedicated thread. > [1] > > It must be possible to "wake" the queue manager when I enqueue a job for > it. Kaa has this notion of "signals" which is similar to the gtk+ style of > signals in that you can attach callbacks to them and emit them. Signals > can be represented as InProgress objects, which means they can be yielded > from coroutines and used in InProgressAny/All objects. > (Aside: I can never get used to that terminology; I am too used to the UNIX meaning of "signal". It sounds like a publish-subscribe mechanism.) > > So my download manager coroutine can yield an InProgressAny of all the > active download coroutines *and* the "new job enqueued" signal, and > execution will resume as long as any of those conditions are met. > > Is there anything in your current proposal that would allow for this > use-case? > > [1] > https://github.com/jtackaberry/stagehand/blob/master/src/manager.py#L390 > That example is a little beyond my comprehension. I'm guessing though that you could probably cobble something like this together from the wait_one() primitive I described above. Or perhaps we need a set of synchronization primitives similar to those provided by threading.py: Lock, Condition, Semaphore, Event, Barrier, and some variations. > Another pain point for me has been this notion of unhandled asynchronous > exceptions. 
Asynchronous tasks are represented as an InProgress object, > and if a task fails, accessing InProgress.result will raise the exception > at which point it's considered handled. This attribute access could happen > at any time during the lifetime of the InProgress object, outside the > task's call stack. > > The desirable behaviour is that when the InProgress object is destroyed, > if there's an exception attached to it from a failed task that hasn't been > accessed, we should output the stack as an unhandled exception. In Kaa, I > do this with a weakref destroy callback, but this isn't ideal because with > GC, the InProgress might not be destroyed until well after the exception is > relevant. > > I make every effort to remove reference cycles and generally get the > InProgress object destroyed as early as possible, but this changes subtly > between Python versions. > > How will unhandled asynchronous exceptions be handled with tulip? > That's actually a clever idea: log the exception when the Task object is destroyed if it hasn't been raised (from result()) or inspected (using exception()) at least once. I know these have been haunting me in NDB -- it logs all, some or none of the exceptions depending on the log settings, but that's not right, and your approach is much better. So it may come down to implementation cleverness to try and GC Task objects sooner rather than later -- which will also depend on the Python implementation. In the end, debugging convenience cannot help but depend on the implementation. -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Mon Dec 17 18:47:22 2012 From: guido at python.org (Guido van Rossum) Date: Mon, 17 Dec 2012 09:47:22 -0800 Subject: [Python-ideas] async: feedback on EventLoop API In-Reply-To: References: Message-ID: On Mon, Dec 17, 2012 at 3:08 AM, Geert Jansen wrote: > below is some feedback on the EventLoop API as implemented in tulip. Great feedback! I hope you will focus on PEP 3156 (http://www.python.org/dev/peps/pep-3156/) and Tulip v2 next; Tulip v2 isn't written but is quickly taking shape in the 'tulip' subdirectory of the Tulip project. > I am interested in this for an (alternate) dbus interface that I've written > for Python that supports evented IO. I'm hoping tulip's EventLoop could be an > abstraction as well as a default implementation that allows me to support > just one event interface. Nice. The more interop this event loop offers the better. I don't know much about dbus, though, so occasionally my responses may not make any sense -- please be gentle and educate me when my ignorance gets in the way of understanding. > I looked at it from two angles: > > 1. Does EventLoop provide everything that is needed from a library writer > point of view? > 2. Can EventLoop efficiently expose a subset of the functionality of > some of the main event loop implementations out there today > (i looked at libuv, libev and Qt). > > First some code pointers... > > * https://github.com/geertj/looping - Here i've implemented the EventLoop > interface for libuv, libev and Qt. It includes a slightly modified version of > tulip's "polling.py" where I've implemented some of the suggestions below. > It also adds support for Python 2.6/2.7 as the Python Qt interface (PySide) > doesn't support Python 3 yet. Cool. For me, right now, Python 2 compatibility is a distraction, but I am not against others adding it. 
I'll be happy to consider small tweaks to the PEP to make this easier. Exception: I'm not about to give up on 'yield from'; but that doesn't seem your focus anyway. > * https://github.com/geertj/python-dbusx - A Python interface for libdbus that > supports evented IO using an EventLoop interface. This module is also > tests all the different loops from "looping" by doing D-BUS tests with them > (looping itself doesn't have tests yet). I'm actually glad to see there are so many event loop implementations around. This suggests to me that there's a real demand for this type of functionality, and I'd be real happy if PEP 3156 and Tulip came to improve the interop situation (especially for Python 3.3 and beyond). > My main points of feedback are below: > > * It would be nice to have repeatable timers. Repeatable timers are expected > for example by libdbus when integrating it with an event loop. > > Without repeatable timers, I could emulate a repeatable timer by using > call_later() and adding a new timer every time the timer fires. This would > be an inefficient interface though for event loops that natively support > repeatable timers. > > This could possibly be done by adding a "repeat" argument to call_later(). I've not used repeatable timers myself but I see them in several other interfaces. I do think they deserve a different method call to set them up, even if the implementation will just be to add a repeat field to the DelayedCall. When I start a timer with a 2 second repeat, does it run now and then 2, 4, 6, ... seconds after, or should the first run be in 2 seconds? Or are these separate parameters? Strawman proposal: it runs in 2 seconds and then every 2 seconds. The API would be event_loop.call_repeatedly(interval, callback, *args), returning a DelayedCall with an interval attribute set to the interval value. (BTW, can someone *please* come up with a better name for DelayedCall? It's tedious and doesn't abbreviate well. But I don't want to name the class 'Callback' since I already use 'callback' for function objects that are used as callbacks.) > * It would be nice to be a way to call a callback once per loop iteration. > An example here is dispatching in libdbus. The easiest way to do this is > to call dbus_connection_dispatch() every iteration of the loop (a more > complicated way exists to get notifications when the dispatch status > changes, but it is edge triggered and difficult to get right). > > This could possibly be implemented by adding a "repeat" argument to > call_soon(). Again, I'd rather introduce a new method. What should the semantics be? Is this called just before or after we potentially go to sleep, or at some other point, or at the very top or bottom of run_once()? > * A useful semantic for run_once() would be to run the callbacks for > readers and writers in the same iteration as when the FD got ready. Good catch, I've struggled with this. I ended up not needing to call run_once(), so I've left it out of the PEP. I agree if there's a strong enough use case for it (what's yours?) it should probably be redesigned. Another thing I don't like about it is that a callback that calls call_soon() with itself will starve I/O completely. OTOH that's perhaps no worse than a callback containing an infinite loop; and there's something to say for the semantics that if a callback just schedules another callback as an immediate 'continuation', it's reasonable to run that before even attempting to poll for I/O. 
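(For concreteness, the pathological case I'm worried about is simply a callback that immediately reschedules itself:

    def spin(event_loop):
        # ... do a little bit of work ...
        event_loop.call_soon(spin, event_loop)  # the ready queue never empties

while the benign case is a callback that schedules a *different* callback exactly once as its continuation.)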
> This allows for the idiom below when expecting a single event to happen > on a file descriptor from outside the event loop: > > # handle_read() sets the "ready" flag > loop.add_reader(fd, handle_read) > while not ready: > loop.run_once() > > I use this idiom for example in a blocking method_call() method that calls > into a D-BUS method. > > Currently, the handle_read() callback would be called in the iteration > *after* the FD became readable. So this would not work, unless some more > IO becomes available. > > As far as I can see libev, libuv and Qt all work like this. Hm, okay, it seems reasonable to support that. (My original intent with run_unce() was to allow mixing multiple event loops -- you'd just call each event loop's run_once() equivalent in a round-robin fashion.) How about the following semantics for run_once(): 1. compute deadline as the smallest of: - the time until the first event in the timer heap, if non empty - 0 if the ready queue is non empty - Infinity(*) 2. poll for I/O with the computed deadline, adding anything that is ready to the ready queue 3. run items from the ready queue until it is empty (*) Most event loops I've seen use e.g. 30 seconds or 1 hour as infinity, with the idea that if somehow a race condition added something to the ready queue just as we went to sleep, and there's no I/O at all, the system will recover eventually. But I've also heard people worried about power conservation on mobile devices (or laptops) complain about servers that wake up regularly even when there is no work to do. Thoughts? I think I'll leave this out of the PEP, but what should Tulip do? > * If remove_reader() / remove_writer() would accept the DelayedCall instance > returned by their add_xxx() cousins, then that would allow for multiple > callbacks per FD. Not all event loops support this (libuv doesn't, libev > and Qt do), but for the ones that do could have their functionality could > be exposed like this. For event loops that don't support this, an exception > could be raised when adding multiple callbacks per FD. Hm. The PEP currently states that you can call cancel() on the DelayedCall returned by e.g. add_reader() and it will act as if you called remove_reader(). (Though I haven't implemented this yet -- either there would have to be a cancel callback on the DelayedCall or the effect would be delayed.) But multiple callbacks per FD seems a different issue -- currently add_reader() just replaces the previous callback if one is already set. Since not every event loop can support this, I'm not sure it ought to be in the PEP, and making it optional sounds like a recipe for trouble (a library that depends on this may break subtly or only under pressure). Also, what's the use case? If you really need this you are free to implement a mechanism on top of the standard in user code that dispatches to multiple callbacks -- that sounds like a small amount of work if you really need it, but it sounds like an attractive nuisance to put this in the spec. > Support for multiple callbacks per FD could be advertised as a capability. I'm not keen on having optional functionality as I explained above. (In fact, I probably will change the PEP to make those APIs that are currently marked as optional required -- it will just depend on the platform which paradigm performs better, but using the transport/protocol abstraction will automatically select the best paradigm). > * After a DelayedCall is cancelled, it would also be very useful to have a > second method to enable it again. 
Having that functionality is more > efficient than creating a new event. For example, the D-BUS event loop > integration API has specific methods for toggling events on and off that > you need to provide. Really? Doesn't this functionality imply that something (besides user code) is holding on to the DelayedCall after it is cancelled? It seems iffy to have to bend over backwards to support this alternate way of doing something that we can already do, just because (on some platform?) it might shave a microsecond off callback registration. > * (Nitpick) Multiplexing absolute and relative timeouts for the "when" > argument in call_later() is a little too smart in my view and can lead > to bugs. Agreed; that's why I left it out of the PEP. The v2 implementation will use time.monotonic(), > With some input, I'd be happy to produce patches. I hope I've given you enough input; it's probably better to discuss the specs first before starting to code. But please do review the tulip v2 code in the tulip subdirectory; if you want to help you I'll be happy to give you commit privileges to that repo, or I'll take patches if you send them. -- --Guido van Rossum (python.org/~guido) From guido at python.org Mon Dec 17 19:19:25 2012 From: guido at python.org (Guido van Rossum) Date: Mon, 17 Dec 2012 10:19:25 -0800 Subject: [Python-ideas] async: feedback on EventLoop API In-Reply-To: References: Message-ID: On Mon, Dec 17, 2012 at 9:47 AM, Guido van Rossum wrote: > I hope I've given you enough input; it's probably better to discuss > the specs first before starting to code. But please do review the > tulip v2 code in the tulip subdirectory; if you want to help you I'll > be happy to give you commit privileges to that repo, or I'll take > patches if you send them. Patches against PEP 3156 are also welcome! (The repo is at hg.python.org/peps) -- --Guido van Rossum (python.org/~guido) From solipsis at pitrou.net Mon Dec 17 20:57:34 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 17 Dec 2012 20:57:34 +0100 Subject: [Python-ideas] async: feedback on EventLoop API References: Message-ID: <20121217205734.103dc4f2@pitrou.net> On Mon, 17 Dec 2012 09:47:22 -0800 Guido van Rossum wrote: > > (BTW, can someone *please* come up with a better name for DelayedCall? > It's tedious and doesn't abbreviate well. But I don't want to name the > class 'Callback' since I already use 'callback' for function objects > that are used as callbacks.) Does it need to be abbreviated? I don't think users have to spell "DelayedCall" at all (they just call call_later()). That said, some proposals: - Timer (might be mixed up with threading.Timer) - Deadline - RDV (French abbrev. for rendez-vous) Regards Antoine. From ronan.lamy at gmail.com Mon Dec 17 21:33:23 2012 From: ronan.lamy at gmail.com (Ronan Lamy) Date: Mon, 17 Dec 2012 20:33:23 +0000 Subject: [Python-ideas] async: feedback on EventLoop API In-Reply-To: References: Message-ID: <50CF8193.4040501@gmail.com> Le 17/12/2012 17:47, Guido van Rossum a ?crit : > (BTW, can someone *please* come up with a better name for DelayedCall? > It's tedious and doesn't abbreviate well. But I don't want to name the > class 'Callback' since I already use 'callback' for function objects > that are used as callbacks.) It seems to me that a DelayedCall is nothing but a frozen, reified function call. That it's a reified thing is already obvious from the fact that it's an object, so how about naming it just "Call"? 
"Delayed" is actually only one of the possible relations between the object and the actual call - it could also represent a cancelled call, or a cached one, or ...? This idea has some implications for the design: in particular, it means that .cancel() should be a method of the EventLoop, not of Call. So Call would only have the attributes 'callback' (I'd prefer 'func' or similar) and 'args', and one method to execute the call. HTH, Ronan Lamy From guido at python.org Mon Dec 17 21:49:46 2012 From: guido at python.org (Guido van Rossum) Date: Mon, 17 Dec 2012 12:49:46 -0800 Subject: [Python-ideas] async: feedback on EventLoop API In-Reply-To: <20121217205734.103dc4f2@pitrou.net> References: <20121217205734.103dc4f2@pitrou.net> Message-ID: On Mon, Dec 17, 2012 at 11:57 AM, Antoine Pitrou wrote: > On Mon, 17 Dec 2012 09:47:22 -0800 > Guido van Rossum wrote: >> >> (BTW, can someone *please* come up with a better name for DelayedCall? >> It's tedious and doesn't abbreviate well. But I don't want to name the >> class 'Callback' since I already use 'callback' for function objects >> that are used as callbacks.) > > Does it need to be abbreviated? I don't think users have to spell > "DelayedCall" at all (they just call call_later()). They save the result in a variable. Naming that variable delayed_call feels awkward. In my code I've called it 'dcall' but that's not great either. > That said, some proposals: > - Timer (might be mixed up with threading.Timer) But often there's no time involved... > - Deadline Same... > - RDV (French abbrev. for rendez-vous) Hmmmm. :-) Maybe Callback is okay after all? The local variable can be 'cb'. -- --Guido van Rossum (python.org/~guido) From solipsis at pitrou.net Mon Dec 17 21:56:26 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 17 Dec 2012 21:56:26 +0100 Subject: [Python-ideas] async: feedback on EventLoop API References: <20121217205734.103dc4f2@pitrou.net> Message-ID: <20121217215626.762ac2d3@pitrou.net> On Mon, 17 Dec 2012 12:49:46 -0800 Guido van Rossum wrote: > On Mon, Dec 17, 2012 at 11:57 AM, Antoine Pitrou wrote: > > On Mon, 17 Dec 2012 09:47:22 -0800 > > Guido van Rossum wrote: > >> > >> (BTW, can someone *please* come up with a better name for DelayedCall? > >> It's tedious and doesn't abbreviate well. But I don't want to name the > >> class 'Callback' since I already use 'callback' for function objects > >> that are used as callbacks.) > > > > Does it need to be abbreviated? I don't think users have to spell > > "DelayedCall" at all (they just call call_later()). > > They save the result in a variable. Naming that variable delayed_call > feels awkward. In my code I've called it 'dcall' but that's not great > either. > > > That said, some proposals: > > - Timer (might be mixed up with threading.Timer) > > But often there's no time involved... Ah, I see you use the same class for add_reader() and friends. I was assuming that, like in Twisted, DelayedCall was only returned by call_later(). Is it useful to return a DelayedCall in add_reader()? Is it so that you can remove the reader? But you already define remove_reader() for that, so I'm not sure what an alternative way to do it brings :-) Regards Antoine. 
From guido at python.org Mon Dec 17 22:01:35 2012 From: guido at python.org (Guido van Rossum) Date: Mon, 17 Dec 2012 13:01:35 -0800 Subject: [Python-ideas] async: feedback on EventLoop API In-Reply-To: <50CF8193.4040501@gmail.com> References: <50CF8193.4040501@gmail.com> Message-ID: On Mon, Dec 17, 2012 at 12:33 PM, Ronan Lamy wrote: > Le 17/12/2012 17:47, Guido van Rossum a ?crit : > >> (BTW, can someone *please* come up with a better name for DelayedCall? >> It's tedious and doesn't abbreviate well. But I don't want to name the >> class 'Callback' since I already use 'callback' for function objects >> that are used as callbacks.) > > It seems to me that a DelayedCall is nothing but a frozen, reified function > call. That it's a reified thing is already obvious from the fact that it's > an object, so how about naming it just "Call"? "Delayed" is actually only > one of the possible relations between the object and the actual call - it > could also represent a cancelled call, or a cached one, or ... Call is not a bad suggestion for the name. Let me mull that over. > This idea has some implications for the design: in particular, it means that > .cancel() should be a method of the EventLoop, not of Call. So Call would > only have the attributes 'callback' (I'd prefer 'func' or similar) and > 'args', and one method to execute the call. Not sure. Cancelling it must set a flag on the object, since the object could be buried deep inside any number of data structures owned by the event loop: e.g. the ready queue, the pollster's readers or writers (dicts mapping FD to DelayedCall), or the timer heap. When you cancel a call you don't immediately remove it from its data structure -- instead, when you get to it naturally (e.g. its time comes up) you notice that it's been cancelled and ignore it. The one place where this is awkward is when it's a FD reader or writer -- it won't come up if the FD doesn't get any new I/O, and it's even possible that the FD is closed. (I don't actually know what epoll(), kqueue() etc. do when one of the FDs is closed, but none of the behaviors I can think of are particularly convenient...) I had thought of giving the DelayedCall a 'cancel callback' that is used if/when it is cancelled, and for readers/writers it could be something that calls remove_reader/writer with the right FD. (Maybe I need multiple cancel-callbacks, in case the same object is used as a callback for multiple queues.) Hm, this gets messy. (Another think in this area: pyftpdlib's event loop keeps track of how many calls are cancelled, and if a large number are cancelled it reconstructs the heap. The use case is apparently registering lots of callbacks far in the future and then cancelling them all. Not sure how good a use case that it. But I admit that it would be easier if cancelling was a method on the event loop.) PS. Cancelling a future is a different thing. There you still want the callback to be called, you just want it to notice that the operation was cancelled. Same for tasks. 
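(To make the "cancellation is just a flag" semantics concrete, the object is roughly shaped like this -- a simplified sketch, not the actual Tulip code:

    import heapq

    class DelayedCall:
        def __init__(self, when, callback, args):
            self.when = when          # lets heapq order calls by time
            self.callback = callback
            self.args = args
            self.cancelled = False

        def cancel(self):
            self.cancelled = True

        def __lt__(self, other):
            return self.when < other.when

    def run_due_calls(timer_heap, now):
        # Cancelled calls are not removed from the heap eagerly; they are
        # simply skipped when their time comes up.
        while timer_heap and timer_heap[0].when <= now:
            dcall = heapq.heappop(timer_heap)
            if not dcall.cancelled:
                dcall.callback(*dcall.args)

The awkward cases are exactly the ones described above, where the object is registered with the pollster rather than sitting in the timer heap.)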
-- --Guido van Rossum (python.org/~guido) From guido at python.org Mon Dec 17 22:07:21 2012 From: guido at python.org (Guido van Rossum) Date: Mon, 17 Dec 2012 13:07:21 -0800 Subject: [Python-ideas] async: feedback on EventLoop API In-Reply-To: <20121217215626.762ac2d3@pitrou.net> References: <20121217205734.103dc4f2@pitrou.net> <20121217215626.762ac2d3@pitrou.net> Message-ID: On Mon, Dec 17, 2012 at 12:56 PM, Antoine Pitrou wrote: > On Mon, 17 Dec 2012 12:49:46 -0800 > Guido van Rossum wrote: >> On Mon, Dec 17, 2012 at 11:57 AM, Antoine Pitrou wrote: >> > On Mon, 17 Dec 2012 09:47:22 -0800 >> > Guido van Rossum wrote: >> >> >> >> (BTW, can someone *please* come up with a better name for DelayedCall? >> >> It's tedious and doesn't abbreviate well. But I don't want to name the >> >> class 'Callback' since I already use 'callback' for function objects >> >> that are used as callbacks.) >> > >> > Does it need to be abbreviated? I don't think users have to spell >> > "DelayedCall" at all (they just call call_later()). >> >> They save the result in a variable. Naming that variable delayed_call >> feels awkward. In my code I've called it 'dcall' but that's not great >> either. >> >> > That said, some proposals: >> > - Timer (might be mixed up with threading.Timer) >> >> But often there's no time involved... > > Ah, I see you use the same class for add_reader() and friends. I was > assuming that, like in Twisted, DelayedCall was only returned by > call_later(). > > Is it useful to return a DelayedCall in add_reader()? Is it so that you > can remove the reader? But you already define remove_reader() for that, > so I'm not sure what an alternative way to do it brings :-) I'm not sure myself. I added it to the PEP (with a question mark) because I use DelayedCalls to represent I/O callbacks internally -- it's handy to have an object that represents a function plus its arguments, and I also have a shortcut for adding such objects to the ready queue (the ready queue *also* stores DelayedCalls). It is probably a mistake offering two ways to cancel an I/O callback; but I'm not sure whether to drop remove_{reader,writer} or whether to drop cancelling the callback. (The latter would means that add_{reader,writer} should not return anything.) I *think* I'll keep remove_* and drop callback cacellation, because the entity that most likely wants to revoke the callback already has the file descriptor in hand (it comes with the socket, which they need anyway so they can call its recv/send method), but they would have to hold on to the callback object separately. OTOH callback objects might make it possible to have multiple callbacks per FD, which I currently don't support. (See discussion earlier in this thread.) -- --Guido van Rossum (python.org/~guido) From greg.ewing at canterbury.ac.nz Mon Dec 17 23:00:35 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 18 Dec 2012 11:00:35 +1300 Subject: [Python-ideas] async: feedback on EventLoop API In-Reply-To: References: Message-ID: <50CF9603.6040409@canterbury.ac.nz> Guido van Rossum wrote: > (*) Most event loops I've seen use e.g. 30 seconds or 1 hour as > infinity, with the idea that if somehow a race condition added > something to the ready queue just as we went to sleep, and there's no > I/O at all, the system will recover eventually. I don't see how such a race condition can occur in a cooperative multitasking system. There are no true interrupts that can cause something to happen when you're not expecting it. 
So I'd say let infinity really mean infinity. -- Greg From solipsis at pitrou.net Mon Dec 17 23:11:34 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 17 Dec 2012 23:11:34 +0100 Subject: [Python-ideas] async: feedback on EventLoop API References: <50CF9603.6040409@canterbury.ac.nz> Message-ID: <20121217231134.19ede507@pitrou.net> On Tue, 18 Dec 2012 11:00:35 +1300 Greg Ewing wrote: > Guido van Rossum wrote: > > (*) Most event loops I've seen use e.g. 30 seconds or 1 hour as > > infinity, with the idea that if somehow a race condition added > > something to the ready queue just as we went to sleep, and there's no > > I/O at all, the system will recover eventually. > > I don't see how such a race condition can occur in a > cooperative multitasking system. There are no true > interrupts that can cause something to happen when > you're not expecting it. So I'd say let infinity > really mean infinity. Most event loops out there allow you to schedule callbacks from other (preemptive, OS-level) threads. Regards Antoine. From geertj at gmail.com Mon Dec 17 23:57:51 2012 From: geertj at gmail.com (Geert Jansen) Date: Mon, 17 Dec 2012 23:57:51 +0100 Subject: [Python-ideas] async: feedback on EventLoop API In-Reply-To: References: Message-ID: On Mon, Dec 17, 2012 at 6:47 PM, Guido van Rossum wrote: > Cool. For me, right now, Python 2 compatibility is a distraction, but > I am not against others adding it. I'll be happy to consider small > tweaks to the PEP to make this easier. Exception: I'm not about to > give up on 'yield from'; but that doesn't seem your focus anyway. Correct - my focus right now is on the event loop only. I intend to have a deeper look at the coroutine scheduler as well later (right now i'm using greenlets for that). > I've not used repeatable timers myself but I see them in several other > interfaces. I do think they deserve a different method call to set > them up, even if the implementation will just be to add a repeat field > to the DelayedCall. When I start a timer with a 2 second repeat, does > it run now and then 2, 4, 6, ... seconds after, or should the first > run be in 2 seconds? Or are these separate parameters? Strawman > proposal: it runs in 2 seconds and then every 2 seconds. The API would > be event_loop.call_repeatedly(interval, callback, *args), returning a > DelayedCall with an interval attribute set to the interval value. That would work (in 2 secs, then 4, 6, ...). This is the Qt QTimer model. Both libev and libuv have a slightly more general timer that take a timeout and a repeat value. When the timeout reaches zero, the timer will fire, and if repeat != 0, it will re-seed the timeout to that value. I haven't seen any real need for such a timer where interval != repeat, and in any case it can pretty cheaply be emulated by adding a new timer on the first expiration only. So your call_repeatedly() call above should be fine. > (BTW, can someone *please* come up with a better name for DelayedCall? > It's tedious and doesn't abbreviate well. But I don't want to name the > class 'Callback' since I already use 'callback' for function objects > that are used as callbacks.) libev uses the generic term "Watcher", libuv uses "Handle". But their APIs are structured a bit differently from tulip so i'm not sure if those names would make sense. They support many different types of events (including more esoteric events like process watches, on-fork handlers, and wall-clock timer events). 
Each event has its own class that named after the event type, and that inherits from "Watcher" or "Handle". When an event is created, you pass it a reference to its loop. You manage the event fully through the event instance (e.g. starting it, setting its callback and other parameters, stopping it). The loop has only a few methods, notably "run" and "run_once". So for example, you'd say: loop = Loop() timer = Timer(loop) timer.start(2.0, callback) loop.run() The advantages of this approach is that naming is easier, and that you can also have a natural place to put methods that update the event after you created it. For example, you might want to temporarily suspend a timer or change its interval. I quite liked the fresh approach taken by tulip so that's why i tried to stay within its design. However, the disadvantage is that modifying events after you've created them is difficult (unless you create one DelayedCall subtype per event in which case you're probably better off creating those events through their constructor in the first place). >> * It would be nice to be a way to call a callback once per loop iteration. >> An example here is dispatching in libdbus. The easiest way to do this is >> to call dbus_connection_dispatch() every iteration of the loop (a more >> complicated way exists to get notifications when the dispatch status >> changes, but it is edge triggered and difficult to get right). >> >> This could possibly be implemented by adding a "repeat" argument to >> call_soon(). > > Again, I'd rather introduce a new method. What should the semantics > be? Is this called just before or after we potentially go to sleep, or > at some other point, or at the very top or bottom of run_once()? That is a good question. Both libuv and libev have both options. The one that is called before we go to sleep is called a "Prepare" handler, the one after we come back from sleep a "Check" handler. The libev documentation has some words on check and prepare handlers here: http://pod.tst.eu/http://cvs.schmorp.de/libev/ev.pod#code_ev_prepare_code_and_code_ev_che I am not sure both are needed, but i can't oversee all the consequences. > How about the following semantics for run_once(): > > 1. compute deadline as the smallest of: > - the time until the first event in the timer heap, if non empty > - 0 if the ready queue is non empty > - Infinity(*) > > 2. poll for I/O with the computed deadline, adding anything that is > ready to the ready queue > > 3. run items from the ready queue until it is empty I think doing this would work but i again can't fully oversee all the consequences. Let me play with this a little. > (*) Most event loops I've seen use e.g. 30 seconds or 1 hour as > infinity, with the idea that if somehow a race condition added > something to the ready queue just as we went to sleep, and there's no > I/O at all, the system will recover eventually. But I've also heard > people worried about power conservation on mobile devices (or laptops) > complain about servers that wake up regularly even when there is no > work to do. Thoughts? I think I'll leave this out of the PEP, but what > should Tulip do? I had a look at libuv and libev. They take two different approaches: * libev uses a ~60 second timeout by default. This reason is subtle. Libev supports a wall-clock time event that fires when a certain wall-clock time has passed. Having a non-infinite timeout will allow it to pick up changes to the system time (e.g. by NTP), which would change when the wall-clock timer needs to run. 
* libuv does not have a wall-clock timer and uses an infinite timeout. In my view it would be best for tulip to use an infinite timeout unless at some point a wall-clock timer will be added. That will help with power management. Regarding race-conditions, i think they should be solved in other ways (e.g by having a special method that can post callbacks to the loop in a thread-safe way and possibly write to a self-pipe). > Hm. The PEP currently states that you can call cancel() on the > DelayedCall returned by e.g. add_reader() and it will act as if you > called remove_reader(). (Though I haven't implemented this yet -- > either there would have to be a cancel callback on the DelayedCall or > the effect would be delayed.) Right now i think that cancelling a DelayedCall is not safe. It could busy-loop if the fd is ready. > But multiple callbacks per FD seems a different issue -- currently > add_reader() just replaces the previous callback if one is already > set. Since not every event loop can support this, I'm not sure it > ought to be in the PEP, and making it optional sounds like a recipe > for trouble (a library that depends on this may break subtly or only > under pressure). Also, what's the use case? If you really need this > you are free to implement a mechanism on top of the standard in user > code that dispatches to multiple callbacks -- that sounds like a small > amount of work if you really need it, but it sounds like an attractive > nuisance to put this in the spec. A not-so-good use case are libraries like libdbus that don't document their assumptions regarding this. For example, i have to provide an "add watch" function that creates a new watch (a watch is just a generic term for an FD event that can be read, write or read|write). I have observed that it only ever sets one read and one write watch per FD. If we go for one reader/writer per FD, then it's probably fine, but it would be nice if code that does install multiple readers/writers per FD would get an exception rather than silently updating the callback. The requirement could be that you need to remove the event before you can add a new event for the same FD. >> * After a DelayedCall is cancelled, it would also be very useful to have a >> second method to enable it again. Having that functionality is more >> efficient than creating a new event. For example, the D-BUS event loop >> integration API has specific methods for toggling events on and off that >> you need to provide. > > Really? Doesn't this functionality imply that something (besides user > code) is holding on to the DelayedCall after it is cancelled? Not that i can see. At least not for libuv and libev. > It seems > iffy to have to bend over backwards to support this alternate way of > doing something that we can already do, just because (on some > platform?) it might shave a microsecond off callback registration. According to the libdbus documentation there is a separate function to toggle an event on/off because that could be implemented without allocating memory. But actually there's one kind-of idiomatic use for this that i've seen quite a few times in libraries. Assume you have a library that defines a connection. Often, you create two events for that connection in the constructor: a "write_event" and a "read_event". The read_event is normally enabled, but gets temporarily disabled when you need to throttle input. The write_event is normally disabled except when you get a short write on output. 
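In rough pseudo-code (the enable()/disable() methods on the returned event objects are the hypothetical part here -- they are not in the current tulip API, and the helper names are made up):

    class Connection:
        def __init__(self, loop, sock):
            self.sock = sock
            self.outbuf = b''
            self.read_event = loop.add_reader(sock.fileno(), self._readable)
            self.write_event = loop.add_writer(sock.fileno(), self._writable)
            self.write_event.disable()        # nothing to send yet

        def throttle(self, paused):
            # called by a higher layer when its buffers fill up or drain
            if paused:
                self.read_event.disable()
            else:
                self.read_event.enable()

        def write(self, data):
            self.outbuf += data
            self.write_event.enable()

        def _writable(self):
            n = self.sock.send(self.outbuf)
            self.outbuf = self.outbuf[n:]
            if not self.outbuf:
                self.write_event.disable()    # drained again

        def _readable(self):
            data = self.sock.recv(4096)
            # ... hand the data off to the protocol ...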
Just enabling/disabling these events is a bit more friendly to the programmer IMHO than having to cancel and recreate them when needed. >> * (Nitpick) Multiplexing absolute and relative timeouts for the "when" >> argument in call_later() is a little too smart in my view and can lead >> to bugs. > > Agreed; that's why I left it out of the PEP. The v2 implementation > will use time.monotonic(), > >> With some input, I'd be happy to produce patches. > > I hope I've given you enough input; it's probably better to discuss > the specs first before starting to code. But please do review the > tulip v2 code in the tulip subdirectory; if you want to help you I'll > be happy to give you commit privileges to that repo, or I'll take > patches if you send them. OK great. Let me work on this over the next couple of days and hopefully come up with something. Regards, Geert From guido at python.org Tue Dec 18 01:00:55 2012 From: guido at python.org (Guido van Rossum) Date: Mon, 17 Dec 2012 16:00:55 -0800 Subject: [Python-ideas] async: feedback on EventLoop API In-Reply-To: <20121217231134.19ede507@pitrou.net> References: <50CF9603.6040409@canterbury.ac.nz> <20121217231134.19ede507@pitrou.net> Message-ID: On Mon, Dec 17, 2012 at 2:11 PM, Antoine Pitrou wrote: > On Tue, 18 Dec 2012 11:00:35 +1300 > Greg Ewing wrote: >> Guido van Rossum wrote: >> > (*) Most event loops I've seen use e.g. 30 seconds or 1 hour as >> > infinity, with the idea that if somehow a race condition added >> > something to the ready queue just as we went to sleep, and there's no >> > I/O at all, the system will recover eventually. >> >> I don't see how such a race condition can occur in a >> cooperative multitasking system. There are no true >> interrupts that can cause something to happen when >> you're not expecting it. So I'd say let infinity >> really mean infinity. > > Most event loops out there allow you to schedule callbacks from other > (preemptive, OS-level) threads. That's what call_soon_threadsafe() is for. But bugs happen (in either user code or library code). And yes, call_soon_threadsafe() will use a self-pipe on UNIX. (I hope someone else will write the Windows main loop.) -- --Guido van Rossum (python.org/~guido) From guido at python.org Tue Dec 18 01:40:47 2012 From: guido at python.org (Guido van Rossum) Date: Mon, 17 Dec 2012 16:40:47 -0800 Subject: [Python-ideas] async: feedback on EventLoop API In-Reply-To: References: Message-ID: On Mon, Dec 17, 2012 at 2:57 PM, Geert Jansen wrote: > On Mon, Dec 17, 2012 at 6:47 PM, Guido van Rossum wrote: >> I've not used repeatable timers myself but I see them in several other >> interfaces. I do think they deserve a different method call to set >> them up, even if the implementation will just be to add a repeat field >> to the DelayedCall. When I start a timer with a 2 second repeat, does >> it run now and then 2, 4, 6, ... seconds after, or should the first >> run be in 2 seconds? Or are these separate parameters? Strawman >> proposal: it runs in 2 seconds and then every 2 seconds. The API would >> be event_loop.call_repeatedly(interval, callback, *args), returning a >> DelayedCall with an interval attribute set to the interval value. > > That would work (in 2 secs, then 4, 6, ...). This is the Qt QTimer model. > > Both libev and libuv have a slightly more general timer that take a > timeout and a repeat value. When the timeout reaches zero, the timer > will fire, and if repeat != 0, it will re-seed the timeout to that > value. 
> > I haven't seen any real need for such a timer where interval != > repeat, and in any case it can pretty cheaply be emulated by adding a > new timer on the first expiration only. So your call_repeatedly() call > above should be fine. I'm trying to stick to a somewhat minimalistic design here; repeated timers sound fine; extra complexities seem redundant. (What's next -- built-in support for exponential back-off? :-) >> (BTW, can someone *please* come up with a better name for DelayedCall? >> It's tedious and doesn't abbreviate well. But I don't want to name the >> class 'Callback' since I already use 'callback' for function objects >> that are used as callbacks.) > > libev uses the generic term "Watcher", libuv uses "Handle". But their > APIs are structured a bit differently from tulip so i'm not sure if > those names would make sense. They support many different types of > events (including more esoteric events like process watches, on-fork > handlers, and wall-clock timer events). Each event has its own class > that named after the event type, and that inherits from "Watcher" or > "Handle". When an event is created, you pass it a reference to its > loop. You manage the event fully through the event instance (e.g. > starting it, setting its callback and other parameters, stopping it). > The loop has only a few methods, notably "run" and "run_once". I see. That's a fundamentally different API style, and one I'm less familiar with. DelayedCall isn't meant to be that at all -- it's just meant to be this object that (a) is sortable by time (needed for heapq) and (b) can be cancelled (useful functionality in general). I expect that at least one of the reasons for libuv etc. to do it their way is probably that the languages are different -- Python has keyword arguments to pass options, while C/C++ must use something else. Anyway, Handler sounds like a pretty good name. Let me think it over. > So for example, you'd say: > > loop = Loop() > timer = Timer(loop) > timer.start(2.0, callback) > loop.run() > > The advantages of this approach is that naming is easier, and that you > can also have a natural place to put methods that update the event > after you created it. For example, you might want to temporarily > suspend a timer or change its interval. Ah, that's where the desire to cancel and restart a callback comes from. > I quite liked the fresh approach taken by tulip so that's why i tried > to stay within its design. However, the disadvantage is that modifying > events after you've created them is difficult (unless you create one > DelayedCall subtype per event in which case you're probably better off > creating those events through their constructor in the first place). I wonder how often one needs to modify an event after it's been in use for a while. The mutation API seems mostly useful to separate construction from setting various parameters (to avoid insane overloading of the constructor). >>> * It would be nice to be a way to call a callback once per loop iteration. >>> An example here is dispatching in libdbus. The easiest way to do this is >>> to call dbus_connection_dispatch() every iteration of the loop (a more >>> complicated way exists to get notifications when the dispatch status >>> changes, but it is edge triggered and difficult to get right). >>> >>> This could possibly be implemented by adding a "repeat" argument to >>> call_soon(). >> >> Again, I'd rather introduce a new method. What should the semantics >> be? 
Is this called just before or after we potentially go to sleep, or >> at some other point, or at the very top or bottom of run_once()? > > That is a good question. Both libuv and libev have both options. The > one that is called before we go to sleep is called a "Prepare" > handler, the one after we come back from sleep a "Check" handler. The > libev documentation has some words on check and prepare handlers here: > > http://pod.tst.eu/http://cvs.schmorp.de/libev/ev.pod#code_ev_prepare_code_and_code_ev_che > > I am not sure both are needed, but i can't oversee all the consequences. I'm still not convinced that both are needed. However they are easy to add, so if the need really does arise in practical use I am fine with evolving the API that way. Until then, let's stick to KISS. >> How about the following semantics for run_once(): >> >> 1. compute deadline as the smallest of: >> - the time until the first event in the timer heap, if non empty >> - 0 if the ready queue is non empty >> - Infinity(*) >> >> 2. poll for I/O with the computed deadline, adding anything that is >> ready to the ready queue >> >> 3. run items from the ready queue until it is empty > > I think doing this would work but i again can't fully oversee all the > consequences. Let me play with this a little. It's hard to oversee all consequences. But it looks good to me too, so I'll implement it this way. Maybe the Twisted folks have wisdom in this area (though quite often, when pressed, they admit that their APIs are not ideal, and have warts due to backward compatibility :-). >> (*) Most event loops I've seen use e.g. 30 seconds or 1 hour as >> infinity, with the idea that if somehow a race condition added >> something to the ready queue just as we went to sleep, and there's no >> I/O at all, the system will recover eventually. But I've also heard >> people worried about power conservation on mobile devices (or laptops) >> complain about servers that wake up regularly even when there is no >> work to do. Thoughts? I think I'll leave this out of the PEP, but what >> should Tulip do? > > I had a look at libuv and libev. They take two different approaches: > > * libev uses a ~60 second timeout by default. This reason is subtle. > Libev supports a wall-clock time event that fires when a certain > wall-clock time has passed. Having a non-infinite timeout will allow > it to pick up changes to the system time (e.g. by NTP), which would > change when the wall-clock timer needs to run. > > * libuv does not have a wall-clock timer and uses an infinite timeout. I've not actually ever seen a use case for the wall-clock timer, so I've taken it out. > In my view it would be best for tulip to use an infinite timeout > unless at some point a wall-clock timer will be added. That will help > with power management. Regarding race-conditions, i think they should > be solved in other ways (e.g by having a special method that can post > callbacks to the loop in a thread-safe way and possibly write to a > self-pipe). Right, a self-pipe is already there. I'll stick with infinity in Tulip, but an implementation can of course do what it wants to. >> Hm. The PEP currently states that you can call cancel() on the >> DelayedCall returned by e.g. add_reader() and it will act as if you >> called remove_reader(). (Though I haven't implemented this yet -- >> either there would have to be a cancel callback on the DelayedCall or >> the effect would be delayed.) > > Right now i think that cancelling a DelayedCall is not safe. 
It could > busy-loop if the fd is ready. That's because I'm not done implementing it. :-) But the more I think about it the more I don't like calling cancel() on a read/write handler. >> But multiple callbacks per FD seems a different issue -- currently >> add_reader() just replaces the previous callback if one is already >> set. Since not every event loop can support this, I'm not sure it >> ought to be in the PEP, and making it optional sounds like a recipe >> for trouble (a library that depends on this may break subtly or only >> under pressure). Also, what's the use case? If you really need this >> you are free to implement a mechanism on top of the standard in user >> code that dispatches to multiple callbacks -- that sounds like a small >> amount of work if you really need it, but it sounds like an attractive >> nuisance to put this in the spec. > > A not-so-good use case are libraries like libdbus that don't document > their assumptions regarding this. For example, i have to provide an > "add watch" function that creates a new watch (a watch is just a > generic term for an FD event that can be read, write or read|write). I > have observed that it only ever sets one read and one write watch per > FD. > > If we go for one reader/writer per FD, then it's probably fine, but it > would be nice if code that does install multiple readers/writers per > FD would get an exception rather than silently updating the callback. > The requirement could be that you need to remove the event before you > can add a new event for the same FD. That makes sense. If we wanted to be fancy we could have several different APIs: add (must not be set), set (may be set), replace (must be set). But I think just offering the add and remove APIs is nicely minimalistic and lets you do everything else with ease. (I'll make the remove API return True if it did remove something, False otherwise.) >>> * After a DelayedCall is cancelled, it would also be very useful to have a >>> second method to enable it again. Having that functionality is more >>> efficient than creating a new event. For example, the D-BUS event loop >>> integration API has specific methods for toggling events on and off that >>> you need to provide. >> >> Really? Doesn't this functionality imply that something (besides user >> code) is holding on to the DelayedCall after it is cancelled? > > Not that i can see. At least not for libuv and libev. Never mind, this is just due to the difference in API style. I'm going to ignore it unless I get a lot more pushback. >> It seems >> iffy to have to bend over backwards to support this alternate way of >> doing something that we can already do, just because (on some >> platform?) it might shave a microsecond off callback registration. > > According to the libdbus documentation there is a separate function to > toggle an event on/off because that could be implemented without > allocating memory. Yeah, not gonna happen in Python. :-) > But actually there's one kind-of idiomatic use for this that i've seen > quite a few times in libraries. Assume you have a library that defines > a connection. Often, you create two events for that connection in the > constructor: a "write_event" and a "read_event". The read_event is > normally enabled, but gets temporarily disabled when you need to > throttle input. The write_event is normally disabled except when you > get a short write on output. 
> > Just enabling/disabling these events is a bit more friendly to the > programmer IMHO than having to cancel and recreate them when needed. The methods on the Transport class take care of this at a higher level: pause() and resume() to suspend reading, and the write() method takes care of buffering and so on. >>> * (Nitpick) Multiplexing absolute and relative timeouts for the "when" >>> argument in call_later() is a little too smart in my view and can lead >>> to bugs. >> >> Agreed; that's why I left it out of the PEP. The v2 implementation >> will use time.monotonic(), >> >>> With some input, I'd be happy to produce patches. >> >> I hope I've given you enough input; it's probably better to discuss >> the specs first before starting to code. But please do review the >> tulip v2 code in the tulip subdirectory; if you want to help you I'll >> be happy to give you commit privileges to that repo, or I'll take >> patches if you send them. > > OK great. Let me work on this over the next couple of days and > hopefully come up with something. Excellent. Please do check back regularly for additions to the tulip subdirectory! -- --Guido van Rossum (python.org/~guido) From ncoghlan at gmail.com Tue Dec 18 04:20:53 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 18 Dec 2012 13:20:53 +1000 Subject: [Python-ideas] async: feedback on EventLoop API In-Reply-To: References: Message-ID: On Tue, Dec 18, 2012 at 10:40 AM, Guido van Rossum wrote: > I see. That's a fundamentally different API style, and one I'm less > familiar with. DelayedCall isn't meant to be that at all -- it's just > meant to be this object that (a) is sortable by time (needed for > heapq) and (b) can be cancelled (useful functionality in general). I > expect that at least one of the reasons for libuv etc. to do it their > way is probably that the languages are different -- Python has keyword > arguments to pass options, while C/C++ must use something else. > > Anyway, Handler sounds like a pretty good name. Let me think it over. > Is DelayedCall a subclass of Future, like Task? If so, FutureCall might work. >>> * It would be nice to be a way to call a callback once per loop > iteration. > >>> An example here is dispatching in libdbus. The easiest way to do > this is > >>> to call dbus_connection_dispatch() every iteration of the loop (a > more > >>> complicated way exists to get notifications when the dispatch status > >>> changes, but it is edge triggered and difficult to get right). > >>> > >>> This could possibly be implemented by adding a "repeat" argument to > >>> call_soon(). > >> > >> Again, I'd rather introduce a new method. What should the semantics > >> be? Is this called just before or after we potentially go to sleep, or > >> at some other point, or at the very top or bottom of run_once()? > > > > That is a good question. Both libuv and libev have both options. The > > one that is called before we go to sleep is called a "Prepare" > > handler, the one after we come back from sleep a "Check" handler. The > > libev documentation has some words on check and prepare handlers here: > > > > > http://pod.tst.eu/http://cvs.schmorp.de/libev/ev.pod#code_ev_prepare_code_and_code_ev_che > > > > I am not sure both are needed, but i can't oversee all the consequences. > > I'm still not convinced that both are needed. However they are easy to > add, so if the need really does arise in practical use I am fine with > evolving the API that way. Until then, let's stick to KISS. > > > * libev uses a ~60 second timeout by default. 
This reason is subtle. > > Libev supports a wall-clock time event that fires when a certain > > wall-clock time has passed. Having a non-infinite timeout will allow > > it to pick up changes to the system time (e.g. by NTP), which would > > change when the wall-clock timer needs to run. > > > > * libuv does not have a wall-clock timer and uses an infinite timeout. > > I've not actually ever seen a use case for the wall-clock timer, so > I've taken it out. > If someone really does want a wall-clock timer with a given granularity, it can be handled by adding a repeating timer with that granularity (with the obvious consequences for low power modes). > >> But multiple callbacks per FD seems a different issue -- currently > >> add_reader() just replaces the previous callback if one is already > >> set. Since not every event loop can support this, I'm not sure it > >> ought to be in the PEP, and making it optional sounds like a recipe > >> for trouble (a library that depends on this may break subtly or only > >> under pressure). Also, what's the use case? If you really need this > >> you are free to implement a mechanism on top of the standard in user > >> code that dispatches to multiple callbacks -- that sounds like a small > >> amount of work if you really need it, but it sounds like an attractive > >> nuisance to put this in the spec. > > > > A not-so-good use case are libraries like libdbus that don't document > > their assumptions regarding this. For example, i have to provide an > > "add watch" function that creates a new watch (a watch is just a > > generic term for an FD event that can be read, write or read|write). I > > have observed that it only ever sets one read and one write watch per > > FD. > > > > If we go for one reader/writer per FD, then it's probably fine, but it > > would be nice if code that does install multiple readers/writers per > > FD would get an exception rather than silently updating the callback. > > The requirement could be that you need to remove the event before you > > can add a new event for the same FD. > > That makes sense. If we wanted to be fancy we could have several > different APIs: add (must not be set), set (may be set), replace (must > be set). But I think just offering the add and remove APIs is nicely > minimalistic and lets you do everything else with ease. (I'll make the > remove API return True if it did remove something, False otherwise.) > Perhaps the best bet would be to have the standard API allow multiple callbacks, and emulate that on systems which don't natively support multiple callbacks for a single event? Otherwise, I don't see how an event loop could efficiently expose access to the multiple callback APIs without requiring awkward fallbacks in the code interacting with the event loop. Given that the natural fallback implementation is reasonably clear (i.e. a single callback that calls all of the other callbacks), why force reimplementing that on users rather than event loop authors? Related, the protocol/transport API design may end up needing to consider the gather/scatter problem (i.e. fanning out data from a single transport to multiple consumers, as well as feeding data from multiple producers into a single underlying transport). Actual *implementations* of such tools shouldn't be needed in the standard suite, but at least understanding how you would go about writing multiplexers and demultiplexers can be a good test of a stacked I/O design. 
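To illustrate, the fallback shim is just a small demultiplexer along these lines (a sketch with made-up names, not a concrete proposal):

    class ReaderDispatcher:
        # One real callback registered with the event loop, fanning out
        # to any number of user callbacks for the same file descriptor.
        def __init__(self, event_loop, fd):
            self.callbacks = []
            event_loop.add_reader(fd, self._dispatch)

        def add(self, callback, *args):
            self.callbacks.append((callback, args))

        def remove(self, callback):
            self.callbacks = [(cb, a) for (cb, a) in self.callbacks
                              if cb is not callback]

        def _dispatch(self):
            for callback, args in list(self.callbacks):
                callback(*args)

Event loops that natively support multiple callbacks per FD could skip the shim entirely.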
> Just enabling/disabling these events is a bit more friendly to the > > programmer IMHO than having to cancel and recreate them when needed. > > The methods on the Transport class take care of this at a higher > level: pause() and resume() to suspend reading, and the write() method > takes care of buffering and so on. > And the main advantage of handling that at a higher level is that suitable buffering designs are going to be transport specific. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Tue Dec 18 04:26:38 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 18 Dec 2012 13:26:38 +1000 Subject: [Python-ideas] Graph class In-Reply-To: <50CE5904.9090102@krosing.net> References: <87txru1wxr.fsf@uwakimon.sk.tsukuba.ac.jp> <50CE5904.9090102@krosing.net> Message-ID: On Mon, Dec 17, 2012 at 9:28 AM, Hannu Krosing wrote: > On 12/16/2012 04:41 PM, Guido van Rossum wrote: > > I think of graphs and trees as patterns, not data structures. > > > How do you draw line between what is data structure and what is pattern ? > A rough rule of thumb is that if it's harder to remember the configuration options in the API than it is to just write a purpose-specific function, it's probably better as a pattern that can be tweaked for a given use case than it is as an actual data structure. More generally, ABCs and magic methods are used to express patterns (like iteration), which may be implemented by various data structures. A graph library that focused on defining a good abstraction (and adapters) that allowed graph algorithms to be written that worked with multiple existing Python graph data structures could be quite interesting. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Tue Dec 18 05:01:18 2012 From: guido at python.org (Guido van Rossum) Date: Mon, 17 Dec 2012 20:01:18 -0800 Subject: [Python-ideas] async: feedback on EventLoop API In-Reply-To: References: Message-ID: On Mon, Dec 17, 2012 at 7:20 PM, Nick Coghlan wrote: > On Tue, Dec 18, 2012 at 10:40 AM, Guido van Rossum wrote: [A better name for DelayedCall] >> Anyway, Handler sounds like a pretty good name. Let me think it over. > Is DelayedCall a subclass of Future, like Task? If so, FutureCall might > work. No, they're completely related. (I'm even thinking of renaming its cancel() to avoid the confusion? I still like Handler best. In fact, if I'd thought of Handler before, I wouldn't have asked for a better name. :-) Going once, going twice... [Wall-clock timers] > If someone really does want a wall-clock timer with a given granularity, it > can be handled by adding a repeating timer with that granularity (with the > obvious consequences for low power modes). +1. [Multiple calls per FD] >> That makes sense. If we wanted to be fancy we could have several >> different APIs: add (must not be set), set (may be set), replace (must >> be set). But I think just offering the add and remove APIs is nicely >> minimalistic and lets you do everything else with ease. (I'll make the >> remove API return True if it did remove something, False otherwise.) > Perhaps the best bet would be to have the standard API allow multiple > callbacks, and emulate that on systems which don't natively support multiple > callbacks for a single event? Hm. AFAIK Twisted doesn't support this either. 
Antoine, do you know? I didn't see it in the Tornado event loop either. > Otherwise, I don't see how an event loop could efficiently expose access to > the multiple callback APIs without requiring awkward fallbacks in the code > interacting with the event loop. Given that the natural fallback > implementation is reasonably clear (i.e. a single callback that calls all of > the other callbacks), why force reimplementing that on users rather than > event loop authors? But what's the use case? I don't think our goal should be to offer APIs for any feature that any event loop might offer. It's not quite a least-common denominator either though -- it's about offering commonly needed functionality, and interoperability. Also, event loop implementations are allowed to offer additional APIs on their implementation. If the need for multiple handlers per FD only exists on those platforms where the platform's event loop supports it, no harm is done if the functionality is only available through a platform-specific API. But still, I don't understand the use case. Possibly it is using file descriptors as a more general signaling mechanism? That sounds pretty platform specific anyway (on Windows, FDs must represent sockets). If someone shows me a real-world use case I may change my mind. > Related, the protocol/transport API design may end up needing to consider > the gather/scatter problem (i.e. fanning out data from a single transport to > multiple consumers, as well as feeding data from multiple producers into a > single underlying transport). Actual *implementations* of such tools > shouldn't be needed in the standard suite, but at least understanding how > you would go about writing multiplexers and demultiplexers can be a good > test of a stacked I/O design. Twisted supports this for writing through its writeSequence(), which appears in Tulip and PEP 3156 as writelines(). (Though IIRC Glyph told me that Twisted rarely uses the platform's scatter/gather primitives, because they are so damn hard to use, and the kernel implementation often just joins the buffers together before passing it to the regular send()...) But regardless, I don't think scatter/gather would use multiple callbacks per FD. I think it would be really hard to benefit from reading into multiple buffers in Python. >> > Just enabling/disabling these events is a bit more friendly to the >> > programmer IMHO than having to cancel and recreate them when needed. >> >> The methods on the Transport class take care of this at a higher >> level: pause() and resume() to suspend reading, and the write() method >> takes care of buffering and so on. > And the main advantage of handling that at a higher level is that suitable > buffering designs are going to be transport specific. +1 -- --Guido van Rossum (python.org/~guido) From ncoghlan at gmail.com Tue Dec 18 08:21:37 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 18 Dec 2012 17:21:37 +1000 Subject: [Python-ideas] async: feedback on EventLoop API In-Reply-To: References: Message-ID: On Tue, Dec 18, 2012 at 2:01 PM, Guido van Rossum wrote: > On Mon, Dec 17, 2012 at 7:20 PM, Nick Coghlan > wrote:Also, event loop implementations are allowed to offer additional APIs > on their implementation. If the need for multiple handlers per FD only > exists on those platforms where the platform's event loop supports it, > no harm is done if the functionality is only available through a > platform-specific API. 
> Sure, but since we know this capability is offered by multiple event loops, it would be good if there was a defined way to go about exposing it. > But still, I don't understand the use case. Possibly it is using file > descriptors as a more general signaling mechanism? That sounds pretty > platform specific anyway (on Windows, FDs must represent sockets). > > If someone shows me a real-world use case I may change my mind. > The most likely use case that comes to mind is monitoring and debugging (i.e. the event loop equivalent of a sys.settrace). Being able to tap into a datastream (e.g. to dump it to a console or pipe it to a monitoring process) can be really powerful, and being able to do it at the Python level means you have this kind of capability even without root access to the machine to run Wireshark. There are other more obscure signal analysis use cases that occur to me, but those could readily be handled with a custom transport implementation that duplicated that data stream, so I don't think there's any reason to worry about those. > Related, the protocol/transport API design may end up needing to consider > > the gather/scatter problem (i.e. fanning out data from a single > transport to > > multiple consumers, as well as feeding data from multiple producers into > a > > single underlying transport). Actual *implementations* of such tools > > shouldn't be needed in the standard suite, but at least understanding how > > you would go about writing multiplexers and demultiplexers can be a good > > test of a stacked I/O design. > > Twisted supports this for writing through its writeSequence(), which > appears in Tulip and PEP 3156 as writelines(). (Though IIRC Glyph told > me that Twisted rarely uses the platform's scatter/gather primitives, > because they are so damn hard to use, and the kernel implementation > often just joins the buffers together before passing it to the regular > send()...) > > But regardless, I don't think scatter/gather would use multiple > callbacks per FD. > > I think it would be really hard to benefit from reading into multiple > buffers in Python. > Sorry, I wasn't quite clear on what I meant by gather/scatter and it's more a protocol thing than an event loop thing. Specifically, gather/scatter interfaces are most useful for multiplexed transports. The ones I'm particularly familiar with are traditional telephony transports like E1 links, with 15 time-division-multiplexed channels on the wire (and a signalling timeslot), as well a few different HF comms protocols. When reading from one of those, you have a demultiplexing component which is reading the serial data coming in on the wire and making it look like 15 distinct data channels from the application's point of view. Similarly, the output multiplexer takes 15 streams of data from the application and interleaves them into the single stream on the wire. The rise of packet switching means that sharing connections like that is increasingly less common, though, so gather/scatter devices are correspondingly less useful in a networking context. The only modern use cases I can think of that someone might want to handle with Python are things like sharing a single USB or classic serial connection amongst multiple data streams. 
However, I suspect the standard transport and protocol API definitions already proposed should also suffice for the gather/scatter use case, as such a component would largely work like any other protocol-as-transport adapter, with the difference being that there would be a many-to-one relationship between the number of interfaces on the application side and those on the communications side. (Technically, gather/scatter components can also be used the other way around to distribute a single data stream across multi transports, but that use case is even less likely to come up when programming in Python. Multi-channel HF data comms is the only possibility that really comes to mind) -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia -------------- next part -------------- An HTML attachment was scrubbed... URL: From solipsis at pitrou.net Tue Dec 18 08:29:55 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 18 Dec 2012 08:29:55 +0100 Subject: [Python-ideas] async: feedback on EventLoop API References: Message-ID: <20121218082955.77e325e0@pitrou.net> On Mon, 17 Dec 2012 20:01:18 -0800 Guido van Rossum wrote: > [Multiple calls per FD] > >> That makes sense. If we wanted to be fancy we could have several > >> different APIs: add (must not be set), set (may be set), replace (must > >> be set). But I think just offering the add and remove APIs is nicely > >> minimalistic and lets you do everything else with ease. (I'll make the > >> remove API return True if it did remove something, False otherwise.) > > > Perhaps the best bet would be to have the standard API allow multiple > > callbacks, and emulate that on systems which don't natively support multiple > > callbacks for a single event? > > Hm. AFAIK Twisted doesn't support this either. Antoine, do you know? I > didn't see it in the Tornado event loop either. I think neither Twisted nor Tornado support it. add_reader() / add_writer() APIs are not for the end user, they are a building block for the framework to write higher-level abstractions. (although, Tornado being quite low-level, you can end up having to use add_reader() / add_writer() anyway - e.g. for UDP) It also doesn't seem to me to make a lot of sense to allow multiplexing at the event loop level. It is probably a protocol- or transport- level feature (depending on the protocol and transport, obviously :-)). Nick mentions debugging / monitoring, but I don't understand how you do that with a write callback (or a read callback, actually, since reading from a socket will consume the data and make it unavailable for other readers). You really need to do it at a protocol/transport's write()/data_received() level. Regards Antoine. From benoitc at gunicorn.org Tue Dec 18 08:25:17 2012 From: benoitc at gunicorn.org (Benoit Chesneau) Date: Tue, 18 Dec 2012 08:25:17 +0100 Subject: [Python-ideas] Late to the async party (PEP 3156) In-Reply-To: <20121216111602.383ebf4d@pitrou.net> References: <50CD2592.5010507@urandom.ca> <20121216111602.383ebf4d@pitrou.net> Message-ID: <37A96766-6709-4B85-9005-7221A753A2FF@gunicorn.org> On Dec 16, 2012, at 11:16 AM, Antoine Pitrou wrote: > On Sat, 15 Dec 2012 21:37:15 -0800 > Guido van Rossum wrote: >> Hi Jason, >> >> I don't think you've missed anything. I had actually planned to keep >> PEP 3156 unpublished for a bit longer, since I'm not done writing the >> reference implementation -- I'm sure that many of the issues currently >> marked open or TBD will be resolved that way. 
There hasn't been any >> public discussion since the last threads on python-ideas some weeks >> ago -- however I've met in person with some Twisted folks and >> exchanged private emails with some other interested parties. > > For the record, have you looked at the pyuv API? It's rather nicely > orthogonal, although it lacks a way to stop the event loop. > https://pyuv.readthedocs.org/en > For now the only way to stop the event loop is either to stop any events in trigger its execution in a loop: while True: if loop.run_once(): ? continue If you have any questions about it I can help. I plan to use it in my own lib and already use it in gaffer [1]. One of the advantage of libuv is its multi-platform support: on windows it is using IOCP, on unix, plain sockets apis , etc? - beno?t [1] http://github.com/benoitc/gaffer From ncoghlan at gmail.com Tue Dec 18 08:39:39 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 18 Dec 2012 17:39:39 +1000 Subject: [Python-ideas] async: feedback on EventLoop API In-Reply-To: <20121218082955.77e325e0@pitrou.net> References: <20121218082955.77e325e0@pitrou.net> Message-ID: On Tue, Dec 18, 2012 at 5:29 PM, Antoine Pitrou wrote: > Nick mentions debugging / monitoring, but I don't understand how you do > that with a write callback (or a read callback, actually, since > reading from a socket will consume the data and make it unavailable > for other readers). You really need to do it at a protocol/transport's > write()/data_received() level. > Yeah, monitoring probably falls into the same gather/scatter design model as demultiplexing (receive side) and multi-channel transports (transmit side). Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia -------------- next part -------------- An HTML attachment was scrubbed... URL: From geertj at gmail.com Tue Dec 18 08:26:00 2012 From: geertj at gmail.com (Geert Jansen) Date: Tue, 18 Dec 2012 08:26:00 +0100 Subject: [Python-ideas] async: feedback on EventLoop API In-Reply-To: References: <50CF9603.6040409@canterbury.ac.nz> <20121217231134.19ede507@pitrou.net> Message-ID: On Tue, Dec 18, 2012 at 1:00 AM, Guido van Rossum wrote: > On Mon, Dec 17, 2012 at 2:11 PM, Antoine Pitrou wrote: >> On Tue, 18 Dec 2012 11:00:35 +1300 >> Greg Ewing wrote: >>> Guido van Rossum wrote: >>> > (*) Most event loops I've seen use e.g. 30 seconds or 1 hour as >>> > infinity, with the idea that if somehow a race condition added >>> > something to the ready queue just as we went to sleep, and there's no >>> > I/O at all, the system will recover eventually. >>> >>> I don't see how such a race condition can occur in a >>> cooperative multitasking system. There are no true >>> interrupts that can cause something to happen when >>> you're not expecting it. So I'd say let infinity >>> really mean infinity. >> >> Most event loops out there allow you to schedule callbacks from other >> (preemptive, OS-level) threads. > > That's what call_soon_threadsafe() is for. But bugs happen (in either > user code or library code). And yes, call_soon_threadsafe() will use a > self-pipe on UNIX. (I hope someone else will write the Windows main > loop.) I needed a self-pipe on Windows before. See below. With this, the select() based loop might work unmodified on Windows. https://gist.github.com/4325783 Of course it wouldn't be as efficient as an IOCP based loop. 
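The core of it is just building a connected pair of sockets by hand, since os.pipe() descriptors cannot be passed to select() on Windows. A rough sketch of the idea only (not necessarily what the gist does, and with error handling omitted):

    import socket

    def socketpair():
        # Emulate socket.socketpair() with an ephemeral listening socket so
        # that both ends are real sockets and work with select() on Windows.
        listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        listener.bind(('127.0.0.1', 0))   # let the OS pick a free port
        listener.listen(1)
        csock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        csock.connect(listener.getsockname())
        ssock, _ = listener.accept()
        listener.close()
        return ssock, csock

The loop then selects on one end, and call_soon_threadsafe() (or a signal handler) wakes it up by sending a byte to the other end.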
Regards, Geert From tjreedy at udel.edu Tue Dec 18 10:06:39 2012 From: tjreedy at udel.edu (Terry Reedy) Date: Tue, 18 Dec 2012 04:06:39 -0500 Subject: [Python-ideas] Graph class In-Reply-To: References: <87txru1wxr.fsf@uwakimon.sk.tsukuba.ac.jp> <50CE5904.9090102@krosing.net> Message-ID: On 12/17/2012 10:26 PM, Nick Coghlan wrote: > On Mon, Dec 17, 2012 at 9:28 AM, Hannu Krosing > > wrote: > > On 12/16/2012 04:41 PM, Guido van Rossum wrote: >> I think of graphs and trees as patterns, not data structures. > > How do you draw line between what is data structure and what is > pattern ? > > > A rough rule of thumb is that if it's harder to remember the > configuration options in the API than it is to just write a > purpose-specific function, it's probably better as a pattern that can be > tweaked for a given use case than it is as an actual data structure. > > More generally, ABCs and magic methods are used to express patterns > (like iteration), which may be implemented by various data structures. > > A graph library that focused on defining a good abstraction (and > adapters) that allowed graph algorithms to be written that worked with > multiple existing Python graph data structures could be quite interesting. I was just thinking that what is needed, at least as a first step, is a graph api, like the db api, that would allow the writing of algorithms to one api and adapters to various implementations. I expect to be writing some graph algorithms (in Python) in the next year and will try to keep that idea in mind and see if it makes any sense, versus just whipping up a implementation that fits the particular problem. -- Terry Jan Reedy From solipsis at pitrou.net Tue Dec 18 11:01:36 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 18 Dec 2012 11:01:36 +0100 Subject: [Python-ideas] PEP 3156 feedback Message-ID: <20121218110136.1f85cfae@pitrou.net> Hello, Here is my own feedback on the in-progress PEP 3156. Please discard it if it's too early to give feedback :-)) Event loop API -------------- I would like to say that I prefer Tornado's model: for each primitive provided by Tornado, you can pass an explicit Loop instance which you instantiated manually. There is no module function or policy object hiding this mechanism: it's simple, explicit and flexible (in other words: if you want a per-thread event loop, just do it yourself using TLS :-)). There are some requirements I've found useful: - being able to instantiate multiple loops, either at the same time or serially (this is especially nice for unit tests; Twisted has to use a dedicated test runner just because their reactor doesn't support multiple instances or restarts) - being able to stop a loop explicitly: having to unregister all handlers or delayed calls is a PITA in non-trivial situations (for example you might have multiple protocol instances, each with a bunch of timers, some perhaps even in third-party libraries; keeping track of all this is the event loop's job) * The optional sock_*() methods: how about having different ABCs, e.g. the EventLoop ABC for basic behaviour, and the NetworkedEventLoop ABC adding the socket helpers? Protocols and transports ------------------------ We probably want to provide a Protocol base class and encourage people to inherit it. It can provide useful functionality (perhaps write() and writelines() shims? it can make mocking easier). My own opinion about Twisted's API is that the Factory class is often useless, and adds a cognitive burden. 
If you need a place to track all protocols of a given kind (e.g. all connections), you can do it yourself. Also, the Factory implies that you don't control how exactly your protocol gets instantiated (unless you override some method on the Factory I'm missing the name of: it is cumbersome). So, when creating a client, I would pass it a protocol instance. When creating a server, I would pass it a protocol class. Here the base Protocol class comes into play, its __init__() could take the transport as argument and set the "transport" attribute with it. Further args could be optionally passed to the constructor: class MyProtocol(Protocol): def __init__(self, transport, my_personal_attribute): Protocol.__init__(self, transport) self.my_personal_attribute = my_personal_attribute ... def listen(ioloop): # Each new connection will instantiate a MyProtocol with "foobar" # for my_personal_attribute. ioloop.listen_tcp(("0.0.0.0", 8080), MyProtocol, "foobar") (The hypothetical listen_tcp() is just a name: perhaps it's actually start_serving(). It should accept any callable, not just a class: therefore, you can define complex behaviour if you like) I think the transport / protocol registration must be done early, not in connection_made(). Sometimes you will want to do things on a protocol before you know a connection is established, for example queue things to write on the transport. An use case is a reconnecting TCP client: the protocol will continue existing at times when the connection is down. Unconnected protocols need their own base class and API: data_received()'s signature should be (data, remote_addr) or (remote_addr, data). Same for write(). * writelines() sounds ambiguous for datagram protocols: does it send those "lines" as a single datagram, or one separate datagram per "line"? The equivalent code suggests the latter, but which one makes more sense? * connection_lost(): you definitely want to know whether it's you or the other end who closed the connection. Typically, if the other end closed the connection, you will have to run some cleanup steps, and perhaps even log an error somewhere (if the connection was closed unexpectedly). Actually, I'm not sure it's useful to call connection_lost() when you closed the connection yourself: are there any use cases? Regards Antoine. From shane at umbrellacode.com Tue Dec 18 11:47:41 2012 From: shane at umbrellacode.com (Shane Green) Date: Tue, 18 Dec 2012 02:47:41 -0800 Subject: [Python-ideas] Python-ideas Digest, Vol 73, Issue 38 In-Reply-To: References: Message-ID: <84FBCF87-EB2E-4887-9184-C5CE3B074ABA@umbrellacode.com> Sending the demultiplexed data through 15 pipes so the application actually is dealing with 15 streams of data using single callback notifications from the event loop seems like the more KISS approach, in this case? Shane Green www.umbrellacode.com 805-452-9666 | shane at umbrellacode.com On Dec 17, 2012, at 11:21 PM, python-ideas-request at python.org wrote: > Send Python-ideas mailing list submissions to > python-ideas at python.org > > To subscribe or unsubscribe via the World Wide Web, visit > http://mail.python.org/mailman/listinfo/python-ideas > or, via email, send a message with subject or body 'help' to > python-ideas-request at python.org > > You can reach the person managing the list at > python-ideas-owner at python.org > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of Python-ideas digest..." > Today's Topics: > > 1. Re: Graph class (Nick Coghlan) > 2. 
Re: async: feedback on EventLoop API (Guido van Rossum) > 3. Re: async: feedback on EventLoop API (Nick Coghlan) > > [...] >
Multi-channel HF data comms is the only possibility that really comes to mind) > > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas -------------- next part -------------- An HTML attachment was scrubbed... URL: From amauryfa at gmail.com Tue Dec 18 11:54:40 2012 From: amauryfa at gmail.com (Amaury Forgeot d'Arc) Date: Tue, 18 Dec 2012 11:54:40 +0100 Subject: [Python-ideas] PEP 3156 feedback In-Reply-To: <20121218110136.1f85cfae@pitrou.net> References: <20121218110136.1f85cfae@pitrou.net> Message-ID: 2012/12/18 Antoine Pitrou > My own opinion about Twisted's API is that the Factory class is often > useless, and adds a cognitive burden. If you need a place to track all > protocols of a given kind (e.g. all connections), you can do it > yourself. Also, the Factory implies that you don't control how exactly > your protocol gets instantiated (unless you override some method on the > Factory I'm missing the name of: it is cumbersome). > > So, when creating a client, I would pass it a protocol instance. > Factories are useful to implement clients that reconnect automatically: the framework needs to spawn a new protocol object. The connect method could take a protocol class, but how would you implement the reconnect strategy? When creating a server, I would pass it a protocol class. Here the base > Protocol class comes into play, its __init__() could take the transport > as argument and set the "transport" attribute with it. Further args > could be optionally passed to the constructor: > > class MyProtocol(Protocol): > def __init__(self, transport, my_personal_attribute): > Protocol.__init__(self, transport) > self.my_personal_attribute = my_personal_attribute > ... > > def listen(ioloop): > # Each new connection will instantiate a MyProtocol with "foobar" > # for my_personal_attribute. > ioloop.listen_tcp(("0.0.0.0", 8080), MyProtocol, "foobar") > This is indeed very similar to a factory function (a callback that creates the protocol) Anything with a __call__ would be acceptable IMO. (The hypothetical listen_tcp() is just a name: perhaps it's actually > start_serving(). It should accept any callable, not just a class: > therefore, you can define complex behaviour if you like) > > > I think the transport / protocol registration must be done early, not in > connection_made(). Sometimes you will want to do things on a protocol > before you know a connection is established, for example queue things > to write on the transport. An use case is a reconnecting TCP client: > the protocol will continue existing at times when the connection is > down. > We should be clear on what a protocol is. In my mind, a protocol manages the events on a given transport; it will also probably buffer data. For example, data for the HTTP protocol always starts with "GET ... HTTP/1.0\r\n". If a protocol can change transports in the middle, it can be difficult to track which socket you write to or receive from, and manage your buffers correctly. An alternative could be a "reset()" method, but then we are not far from a factory class. > * connection_lost(): you definitely want to know whether it's you or the > other end who closed the connection. Typically, if the other end > closed the connection, you will have to run some cleanup steps, and > perhaps even log an error somewhere (if the connection was closed > unexpectedly). 
> Actually, I'm not sure it's useful to call connection_lost() when you > closed the connection yourself: are there any use cases? > The "yourself" can in another part of the code; some protocols will certainly close the connection when they receive unexpected data. Also, this example from Twisted documentation: attempt = myEndpoint.connect(myFactory) reactor.callback(30, attempt.cancel) Even if these lines appear in my code, it's easier to have all errors caught in one place. The alternative would be: attempt = myEndpoint.connect(myFactory) def cancel_attempt_and_notify_error(): attempt.cancel() notify_error("cancelled after timeout") reactor.callback(30, cancel_attempt_and_notify_error) -- Amaury Forgeot d'Arc -------------- next part -------------- An HTML attachment was scrubbed... URL: From solipsis at pitrou.net Tue Dec 18 12:27:30 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 18 Dec 2012 12:27:30 +0100 Subject: [Python-ideas] PEP 3156 feedback References: <20121218110136.1f85cfae@pitrou.net> Message-ID: <20121218122730.4b230781@pitrou.net> Le Tue, 18 Dec 2012 11:54:40 +0100, "Amaury Forgeot d'Arc" a ?crit : > 2012/12/18 Antoine Pitrou > > > > My own opinion about Twisted's API is that the Factory class is > > often useless, and adds a cognitive burden. If you need a place to > > track all protocols of a given kind (e.g. all connections), you can > > do it yourself. Also, the Factory implies that you don't control > > how exactly your protocol gets instantiated (unless you override > > some method on the Factory I'm missing the name of: it is > > cumbersome). > > > > So, when creating a client, I would pass it a protocol instance. > > > > Factories are useful to implement clients that reconnect > automatically: the framework needs to spawn a new protocol object. > The connect method could take a protocol class, > but how would you implement the reconnect strategy? I view it differently: the *same* protocol *instance* should be re-used for the new connection. That's because the protocol can keep data that lasts longer than a single connection (many protocols have session ids or other state that can persist accross connections: this is typical of RPC APIs affecting the state of an always-running equipment). > We should be clear on what a protocol is. In my mind, a protocol > manages the events on a given transport; it will also probably buffer > data. For example, data for the HTTP protocol always starts with > "GET ... HTTP/1.0\r\n". > If a protocol can change transports in the middle, it can be > difficult to track > which socket you write to or receive from, and manage your buffers > correctly. > > An alternative could be a "reset()" method, but then we are not far > from a factory class. Well, the problem when switching transports is that you want to: - wait for all outgoing data to be flushed - migrate all pending incoming data to the new transport IMO, this begs for a solution on the transport side, not on the client side (some kind of migrate() API on the transport?). In other words, you switch transports, but you keep the same protocol instance: when your FTP protocol switches from plain TCP to TLS, it remembers the current directory, etc. > Also, this example from Twisted documentation: > attempt = myEndpoint.connect(myFactory) > reactor.callback(30, attempt.cancel) > Even if these lines appear in my code, it's easier to have all errors > caught in one place. Ah, I think there's a misunderstanding. 
Protocol.connection_lost() should be called when an *established* connection is lost. Indeed, there should be a separate Protocol.connection_failed() method for when the connect() calls never succeeds (either times out or returns with an error). And this is a reason why it is better for the transport to be registered early on the protocol (or vice-versa) :-) Regards Antoine. From oscar.j.benjamin at gmail.com Tue Dec 18 13:08:50 2012 From: oscar.j.benjamin at gmail.com (Oscar Benjamin) Date: Tue, 18 Dec 2012 12:08:50 +0000 Subject: [Python-ideas] Graph class In-Reply-To: References: <87txru1wxr.fsf@uwakimon.sk.tsukuba.ac.jp> <50CE5904.9090102@krosing.net> Message-ID: On 18 December 2012 09:06, Terry Reedy wrote: > On 12/17/2012 10:26 PM, Nick Coghlan wrote: >> A graph library that focused on defining a good abstraction (and >> adapters) that allowed graph algorithms to be written that worked with >> multiple existing Python graph data structures could be quite interesting. > > > I was just thinking that what is needed, at least as a first step, is a > graph api, like the db api, that would allow the writing of algorithms to > one api and adapters to various implementations. I expect to be writing some > graph algorithms (in Python) in the next year and will try to keep that idea > in mind and see if it makes any sense, versus just whipping up a > implementation that fits the particular problem. I'd be interested to use (and possibly to contribute to) a graph library of this type on PyPI. I have some suggestions about the appropriate level of abstraction below. The graph algorithms that are the most useful can be written in terms of two things: 1) An iterator over the nodes 2) A way to map each node into an iterator over its children (or partners) It is also required to place some restriction on how the nodes can be used. Descriptions of graph algorithms refer to marking/colouring the nodes of a graph. If the nodes are instances of user defined classes, then you can do this in a relatively literal sense by adding attributes to the nodes, but this is fragile in the event of errors, not thread-safe, etc. Really though, the idea of marking the nodes just means that you need an O(1) method for determining if a node has been marked or checking the value that it was marked with. In Python this is easily done with sets and dicts, which is not fragile and is thread-safe etc. (provided the graph is not being mutated). This requires that the nodes be hashable. In the thread about deleting keys from a dict yesterday it occurred to me (after MRAB's suggestion) that you can still apply the same methods to non-hashable objects. That is, provided you have a situation where node equality is determined by node identity you can just use id(node) in each hash table. While this method works equally well for user-defined class instances it does not work for immutable types where, for example, two strings may be equal but have differing id()s. One way to cover all cases is simply to provide a hashkey argument to each algorithm that defaults to the identity function (lambda x: x), but may be replaced by the id function in appropriate cases. This means that all of the graph algorithms that I would want can be implemented with a basic signature that goes like: def strongly_connected(nodes, edgesfunc, hashkey=None): ''' `nodes` is an iterable over the nodes of the graph. `edgesfunc(node)` is an iterable over the children of node. `hashkey` is an optional key function to apply when adding nodes to a hash-table. 
For mutable objects where identity is equality use `hashkey=id`. ''' if hashkey is None: # Would be great to have operator.identity here hashkey = lambda x: x There are some cases where optimisation is possible given additional information. One example: I think it is possible to conclude that an undirected graph contains at least one cycle if |E|>=|V|, so in this case an optional hint parameter could give a shortcut for some graphs. Generally, though, there are few algorithms where other quantities are either required or are sufficient for all input graphs (exceptions to the sufficient part of this rule are typically the relatively easy algorithms like determining the mean degree). Once you have algorithms that are implemented in this way it becomes possible to piece them together as a concrete graph class, a mixin, an ABC, a decorator that works like @functools.total_ordering or some other class-based idiom. Crucially, though, unlike all of these class based approaches, defining the algorithms firstly in a functional way makes it easy to apply them to any data structure composed of elementary types or of classes that you yourself cannot write or subclass. Oscar From shane at umbrellacode.com Tue Dec 18 13:37:58 2012 From: shane at umbrellacode.com (Shane Green) Date: Tue, 18 Dec 2012 04:37:58 -0800 Subject: [Python-ideas] async: feedback on EventLoop API In-Reply-To: <84FBCF87-EB2E-4887-9184-C5CE3B074ABA@umbrellacode.com> References: <84FBCF87-EB2E-4887-9184-C5CE3B074ABA@umbrellacode.com> Message-ID: <5E168314-653B-4A4E-AFD1-D438EAADEE39@umbrellacode.com> Sorry for the utter lack of formatting etiquette in my previous responses everyone? My message (the next sentence) was in response to the message below. Sorry for the confusion.. Sending the demultiplexed data through 15 pipes so the application actually is dealing with 15 streams of data using single callback notifications from the event loop seems like the more KIS approach, in this case? > From: Nick Coghlan > Subject: Re: [Python-ideas] async: feedback on EventLoop API > Date: December 17, 2012 11:21:37 PM PST > To: Guido van Rossum > Cc: Antoine Pitrou , python-ideas at python.org > > > > > > >> Sorry, I wasn't quite clear on what I meant by gather/scatter and it's more a protocol thing than an event loop thing. >> >> Specifically, gather/scatter interfaces are most useful for multiplexed transports. The ones I'm particularly familiar with are traditional telephony transports like E1 links, with 15 time-division-multiplexed channels on the wire (and a signalling timeslot), as well a few different HF comms protocols. When reading from one of those, you have a demultiplexing component which is reading the serial data coming in on the wire and making it look like 15 distinct data channels from the application's point of view. Similarly, the output multiplexer takes 15 streams of data from the application and interleaves them into the single stream on the wire. >> >> The rise of packet switching means that sharing connections like that is increasingly less common, though, so gather/scatter devices are correspondingly less useful in a networking context. The only modern use cases I can think of that someone might want to handle with Python are things like sharing a single USB or classic serial connection amongst multiple data streams. 
However, I suspect the standard transport and protocol API definitions already proposed should also suffice for the gather/scatter use case, as such a component would largely work like any other protocol-as-transport adapter, with the difference being that there would be a many-to-one relationship between the number of interfaces on the application side and those on the communications side. >> >> (Technically, gather/scatter components can also be used the other way around to distribute a single data stream across multi transports, but that use case is even less likely to come up when programming in Python. Multi-channel HF data comms is the only possibility that really comes to mind) >> >> -- >> Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia >> >> >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> http://mail.python.org/mailman/listinfo/python-ideas > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ubershmekel at gmail.com Tue Dec 18 15:24:01 2012 From: ubershmekel at gmail.com (Yuval Greenfield) Date: Tue, 18 Dec 2012 16:24:01 +0200 Subject: [Python-ideas] Graph class In-Reply-To: References: <87txru1wxr.fsf@uwakimon.sk.tsukuba.ac.jp> <50CE5904.9090102@krosing.net> Message-ID: On Tue, Dec 18, 2012 at 2:08 PM, Oscar Benjamin wrote: > On 18 December 2012 09:06, Terry Reedy wrote: > > On 12/17/2012 10:26 PM, Nick Coghlan wrote: > >> A graph library that focused on defining a good abstraction (and > >> adapters) that allowed graph algorithms to be written that worked with > >> multiple existing Python graph data structures could be quite > interesting. > > > > > > I was just thinking that what is needed, at least as a first step, is a > > graph api, like the db api, that would allow the writing of algorithms to > > one api and adapters to various implementations. I expect to be writing > some > > graph algorithms (in Python) in the next year and will try to keep that > idea > > in mind and see if it makes any sense, versus just whipping up a > > implementation that fits the particular problem. > > I'd be interested to use (and possibly to contribute to) a graph > library of this type on PyPI. I have some suggestions about the > appropriate level of abstraction below. > > The graph algorithms that are the most useful can be written in terms > of two things: > 1) An iterator over the nodes > 2) A way to map each node into an iterator over its children (or partners) > > Some graphs don't care for the nodes, all their information is in the edges. That's why most graph frameworks have iter_edges and iter_nodes functions. I'm not sure what's the clean way to represent the optional directionality of edges though. Some example API's from networkx: http://networkx.lanl.gov/reference/classes.html http://networkx.lanl.gov/reference/classes.digraph.html -------------- next part -------------- An HTML attachment was scrubbed... URL: From tjreedy at udel.edu Tue Dec 18 17:06:29 2012 From: tjreedy at udel.edu (Terry Reedy) Date: Tue, 18 Dec 2012 11:06:29 -0500 Subject: [Python-ideas] Graph class In-Reply-To: References: <87txru1wxr.fsf@uwakimon.sk.tsukuba.ac.jp> <50CE5904.9090102@krosing.net> Message-ID: On 12/18/2012 9:24 AM, Yuval Greenfield wrote: > > On Tue, Dec 18, 2012 at 2:08 PM, Oscar Benjamin > > wrote: > I'd be interested to use (and possibly to contribute to) a graph > library of this type on PyPI. I have some suggestions about the > appropriate level of abstraction below. 
> > The graph algorithms that are the most useful can be written in terms > of two things: > 1) An iterator over the nodes Or iterable if re-iteration is needed. > 2) A way to map each node into an iterator over its children (or > partners) A callable could be either an iterator class or a generator function. > Some graphs don't care for the nodes, all their information is in the > edges. That's why most graph frameworks have iter_edges and iter_nodes > functions. I'm not sure what's the clean way to represent the > optional directionality of edges though. > > Some example API's from networkx: > > http://networkx.lanl.gov/reference/classes.html > http://networkx.lanl.gov/reference/classes.digraph.html Thank you both the the 'thought food'. Defining things in terms of iterables and iterators instead of (for instance) sets is certainly the Python3 way. Oscar, I don't consider hashability an issue. General class instances are hashable by default. One can even consider such instances as hashable facades for unhashable dicts. Giving each instance a list attribute does the same for lists. The more important question, it seems to me, is whether to represent nodes by counts and let the algorithm do its bookkeeping in private structures, or to represent them by externally defined instances that the algorithm mutates. -- Terry Jan Reedy From guido at python.org Tue Dec 18 18:03:07 2012 From: guido at python.org (Guido van Rossum) Date: Tue, 18 Dec 2012 09:03:07 -0800 Subject: [Python-ideas] async: feedback on EventLoop API In-Reply-To: References: Message-ID: On Mon, Dec 17, 2012 at 11:21 PM, Nick Coghlan wrote: > On Tue, Dec 18, 2012 at 2:01 PM, Guido van Rossum wrote: >> >> On Mon, Dec 17, 2012 at 7:20 PM, Nick Coghlan >> wrote:Also, event loop implementations are allowed to offer additional APIs >> on their implementation. If the need for multiple handlers per FD only >> exists on those platforms where the platform's event loop supports it, >> no harm is done if the functionality is only available through a >> platform-specific API. > > > Sure, but since we know this capability is offered by multiple event loops, > it would be good if there was a defined way to go about exposing it. Only if there's a use case. >> But still, I don't understand the use case. Possibly it is using file >> descriptors as a more general signaling mechanism? That sounds pretty >> platform specific anyway (on Windows, FDs must represent sockets). >> >> If someone shows me a real-world use case I may change my mind. > The most likely use case that comes to mind is monitoring and debugging > (i.e. the event loop equivalent of a sys.settrace). Being able to tap into a > datastream (e.g. to dump it to a console or pipe it to a monitoring process) > can be really powerful, and being able to do it at the Python level means > you have this kind of capability even without root access to the machine to > run Wireshark. I can't see how that would work. Once one callback reads the data the other callback won't see it. There's also the issue of ordering. Solving this seems easier by implementing a facade for the event loop that wraps certain callbacks, and installing it using a custom event loop policy. So, I still don't see the use case. > There are other more obscure signal analysis use cases that occur to me, but > those could readily be handled with a custom transport implementation that > duplicated that data stream, so I don't think there's any reason to worry > about those. Right, that seems a better way to go about it. 
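(For concreteness, the facade approach mentioned above needs nothing from the loop beyond what the PEP already specifies. A rough sketch, with the logging hook purely hypothetical:)

    class TracingLoop:
        # Facade wrapping a real event loop: reader callbacks are wrapped
        # so each invocation is logged; everything else is delegated as-is.

        def __init__(self, loop, log=print):
            self._loop = loop
            self._log = log

        def add_reader(self, fd, callback, *args):
            def traced():
                self._log('fd %r readable, calling %r' % (fd, callback))
                return callback(*args)
            return self._loop.add_reader(fd, traced)

        def __getattr__(self, name):
            return getattr(self._loop, name)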
>> Twisted supports this for writing through its writeSequence(), which >> appears in Tulip and PEP 3156 as writelines(). (Though IIRC Glyph told >> me that Twisted rarely uses the platform's scatter/gather primitives, >> because they are so damn hard to use, and the kernel implementation >> often just joins the buffers together before passing it to the regular >> send()...) >> >> But regardless, I don't think scatter/gather would use multiple >> callbacks per FD. >> >> I think it would be really hard to benefit from reading into multiple >> buffers in Python. > Sorry, I wasn't quite clear on what I meant by gather/scatter and it's more > a protocol thing than an event loop thing. > > Specifically, gather/scatter interfaces are most useful for multiplexed > transports. The ones I'm particularly familiar with are traditional > telephony transports like E1 links, with 15 time-division-multiplexed > channels on the wire (and a signalling timeslot), as well a few different HF > comms protocols. When reading from one of those, you have a demultiplexing > component which is reading the serial data coming in on the wire and making > it look like 15 distinct data channels from the application's point of view. > Similarly, the output multiplexer takes 15 streams of data from the > application and interleaves them into the single stream on the wire. > > The rise of packet switching means that sharing connections like that is > increasingly less common, though, so gather/scatter devices are > correspondingly less useful in a networking context. The only modern use > cases I can think of that someone might want to handle with Python are > things like sharing a single USB or classic serial connection amongst > multiple data streams. However, I suspect the standard transport and > protocol API definitions already proposed should also suffice for the > gather/scatter use case, as such a component would largely work like any > other protocol-as-transport adapter, with the difference being that there > would be a many-to-one relationship between the number of interfaces on the > application side and those on the communications side. > > (Technically, gather/scatter components can also be used the other way > around to distribute a single data stream across multi transports, but that > use case is even less likely to come up when programming in Python. > Multi-channel HF data comms is the only possibility that really comes to > mind) I'm glad you talked yourself out of that objection. :-) -- --Guido van Rossum (python.org/~guido) From guido at python.org Tue Dec 18 17:59:55 2012 From: guido at python.org (Guido van Rossum) Date: Tue, 18 Dec 2012 08:59:55 -0800 Subject: [Python-ideas] async: feedback on EventLoop API In-Reply-To: References: <50CF9603.6040409@canterbury.ac.nz> <20121217231134.19ede507@pitrou.net> Message-ID: On Mon, Dec 17, 2012 at 11:26 PM, Geert Jansen wrote: > I needed a self-pipe on Windows before. See below. With this, the > select() based loop might work unmodified on Windows. > > https://gist.github.com/4325783 Thanks! Before I paste this into Tulip, is there any kind of copyright on this? > Of course it wouldn't be as efficient as an IOCP based loop. The socket loop is definitely handy on Windows in a pinch. I have plans for an IOCP-based loop based on Richard Oudkerk's 'proactor' branch of Tulip v1, but I don't have a Windows machine to test it on ATM (hopefully that'll change once I am actually at Dropbox). 
-- --Guido van Rossum (python.org/~guido) From geertj at gmail.com Tue Dec 18 18:10:13 2012 From: geertj at gmail.com (Geert Jansen) Date: Tue, 18 Dec 2012 18:10:13 +0100 Subject: [Python-ideas] async: feedback on EventLoop API In-Reply-To: References: <50CF9603.6040409@canterbury.ac.nz> <20121217231134.19ede507@pitrou.net> Message-ID: On Tue, Dec 18, 2012 at 5:59 PM, Guido van Rossum wrote: > On Mon, Dec 17, 2012 at 11:26 PM, Geert Jansen wrote: >> I needed a self-pipe on Windows before. See below. With this, the >> select() based loop might work unmodified on Windows. >> >> https://gist.github.com/4325783 > > Thanks! Before I paste this into Tulip, is there any kind of copyright on this? [include list] I wrote the code. I hereby put it in the public domain. Regards, Geert From guido at python.org Tue Dec 18 19:02:05 2012 From: guido at python.org (Guido van Rossum) Date: Tue, 18 Dec 2012 10:02:05 -0800 Subject: [Python-ideas] PEP 3156 feedback In-Reply-To: <20121218110136.1f85cfae@pitrou.net> References: <20121218110136.1f85cfae@pitrou.net> Message-ID: On Tue, Dec 18, 2012 at 2:01 AM, Antoine Pitrou wrote: > > Here is my own feedback on the in-progress PEP 3156. Please discard it > if it's too early to give feedback :-)) Thank you, it's very to the point. > Event loop API > -------------- > > I would like to say that I prefer Tornado's model: for each primitive > provided by Tornado, you can pass an explicit Loop instance which you > instantiated manually. > There is no module function or policy object hiding this mechanism: > it's simple, explicit and flexible (in other words: if you want a > per-thread event loop, just do it yourself using TLS :-)). It sounds though as if the explicit loop is optional, and still defaults to some global default loop? Having one global loop shared by multiple threads is iffy though. Only one thread should be *running* the loop, otherwise the loop can' be used as a mutual exclusion device. Worse, all primitives for adding and removing callbacks/handlers must be made threadsafe, and then basically the entire event loop becomes full of locks, which seems wrong to me. PEP 3156 lets the loop implementation choose the policy, which seems safer than letting the user choose a policy that may or may not be compatible with the loop's implementation. Steve Dower keeps telling me that on Windows 8 the loop is built into the OS. The Windows 8 loop also seems to be eager to use threads, so I don't know if it can be relied on to serialize callbacks, but there is probably a way to do that, or else the Python wrapper could add a lock around callbacks. > There are some requirements I've found useful: > > - being able to instantiate multiple loops, either at the same time or > serially (this is especially nice for unit tests; Twisted has to use > a dedicated test runner just because their reactor doesn't support > multiple instances or restarts) Serially, for unit tests: definitely. The loop policy has init_event_loop() for this, which forcibly creates a new loop. At the same time: that seems to be an esoteric use case and not favorable to interop with Twisted. I want the loop to be mostly out of the way of the user, at least for users using the high-level APIs (tasks, futures, transports, protocols). 
In fact, just for this reason it may be better if the protocol-creating methods had wrapper functions that just called get_event_loop() and then called the corresponding method on the loop, so the user code doesn't have to call get_event_loop() at all, ever (or at least, if you call it, you should feel a slight tinge of guilt about using a low-level API :-). > - being able to stop a loop explicitly: having to unregister all > handlers or delayed calls is a PITA in non-trivial situations (for > example you might have multiple protocol instances, each with a bunch > of timers, some perhaps even in third-party libraries; keeping track > of all this is the event loop's job) I've been convinced of that too. I'm just procrastinating on the implementation at this point. TBH the details of what you should put in your main program will probably change a few times before we're done... > * The optional sock_*() methods: how about having different ABCs, e.g. > the EventLoop ABC for basic behaviour, and the NetworkedEventLoop ABC > adding the socket helpers? Hm. That smells of Twisted's tree of interfaces, which I'm honestly trying to get away from (and Glyph didn't push back on that :-). I'm actually leaning towards requiring these for all loop implementations -- surely they can all be emulated using each other. But I'm not totally wedded to that either. I need more experience using the stuff first. And Steve Dower says he's not interested in any of the async I/O stuff (I suppose he means sockets), just in futures and coroutines. So maybe the socket operations do have to be optional. In that case, I propose to add inquiry functions that can tell you whether certain groups of APIs are supported. Though you can probably get away with hasattr(loop, 'sock_recv') and so on. > Protocols and transports > ------------------------ > > We probably want to provide a Protocol base class and encourage people > to inherit it. Glyph suggested that too, and hinted that it does some useful stuff that users otherwise forget. I'm a bit worried though that the functionality of the base implementation becomes the de-facto standard rather than the PEP. (Glyph mentions that the base class has a method that sets self.transport and without it lots of other stuff breaks.) > It can provide useful functionality (perhaps write() > and writelines() shims? it can make mocking easier). Those are transport methods though. > My own opinion about Twisted's API is that the Factory class is often > useless, and adds a cognitive burden. If you need a place to track all > protocols of a given kind (e.g. all connections), you can do it > yourself. Also, the Factory implies that you don't control how exactly > your protocol gets instantiated (unless you override some method on the > Factory I'm missing the name of: it is cumbersome). Yeah, Glyph complains that people laugh at Twisted for using factories. :-) > So, when creating a client, I would pass it a protocol instance. Heh. That's how I started, and Glyph told me to pass a protocol factory. It can just be a Protocol subclass though, as long as the constructor has the right signature. So maybe we can avoid calling it protocol_factory and name it protocol_class instead. I struggled with what to do if the socket cannot be connected and hence the transport not created. If you've already created the protocol you're in a bit of trouble at that point. 
I proposed to call connection_lost() in that case (without ever having called connection_made()) but Glyph suggested that would be asking for rare bugs (the connection_lost() code might not expect a half-initialized protocol instance). Glyph proposed instead that create_transport() should return a Future and the error should be that Future's exception, and I like that much better. > When creating a server, I would pass it a protocol class. Here the base > Protocol class comes into play, its __init__() could take the transport > as argument and set the "transport" attribute with it. Further args > could be optionally passed to the constructor: > > class MyProtocol(Protocol): > def __init__(self, transport, my_personal_attribute): > Protocol.__init__(self, transport) > self.my_personal_attribute = my_personal_attribute > ... > > def listen(ioloop): > # Each new connection will instantiate a MyProtocol with "foobar" > # for my_personal_attribute. > ioloop.listen_tcp(("0.0.0.0", 8080), MyProtocol, "foobar") > > (The hypothetical listen_tcp() is just a name: perhaps it's actually > start_serving(). It should accept any callable, not just a class: > therefore, you can define complex behaviour if you like) I agree that it should be a callable, not necessarily a class. I don't think it should take the transport -- that's what connection_made() is for. I don't think we should make the API have additional arguments either; you can use a lambda or functools.partial to pass those in. (There are too many other arguments to start_serving() to make it convenient or clear to have a *args, I think, though maybe we could rearrange the argument order.) > I think the transport / protocol registration must be done early, not in > connection_made(). Sometimes you will want to do things on a protocol > before you know a connection is established, for example queue things > to write on the transport. An use case is a reconnecting TCP client: > the protocol will continue existing at times when the connection is > down. Hm. That seems a pretty advanced use case. I think it is better handled by passing a "factory function" that returns a pre-created protocol: pr = MyProtocol(...) ev.create_transport(lambda: pr, host, port) However you do this, such a protocol object must expect multiple connection_made - connection_lost cycles, which sounds to me like asking for trouble. So maybe it's better to have a thin protocol class that is newly instantiated for each reconnection but given a pointer to a more permanent data structure that carries state between reconnections. > Unconnected protocols need their own base class and API: > data_received()'s signature should be (data, remote_addr) or > (remote_addr, data). Same for write(). You mean UDP? Let's put that off until later. But yes, it probably needs more thought. > * writelines() sounds ambiguous for datagram protocols: does it send > those "lines" as a single datagram, or one separate datagram per > "line"? The equivalent code suggests the latter, but which one makes > more sense? It is the transport's choice. Twisted has writeSequence(), which is just as ambiguous. > * connection_lost(): you definitely want to know whether it's you or the > other end who closed the connection. Typically, if the other end > closed the connection, you will have to run some cleanup steps, and > perhaps even log an error somewhere (if the connection was closed > unexpectedly). 
Glyph's idea was to always pass an exception and use special exception subclasses to distinguish the three cases (clean eof from other end, self.close(), self.abort(). I resisted this but maybe it's the only way? > Actually, I'm not sure it's useful to call connection_lost() when you > closed the connection yourself: are there any use cases? Well, close() first has to finish writing buffered data, so any cleanup needs to be done asynchronously after that is taken care off. AFAIK Twisted always calls it, and I think that's the best approach to ensure cleanup is always taken care of. -- --Guido van Rossum (python.org/~guido) From oscar.j.benjamin at gmail.com Tue Dec 18 19:21:42 2012 From: oscar.j.benjamin at gmail.com (Oscar Benjamin) Date: Tue, 18 Dec 2012 18:21:42 +0000 Subject: [Python-ideas] Graph class In-Reply-To: References: <87txru1wxr.fsf@uwakimon.sk.tsukuba.ac.jp> <50CE5904.9090102@krosing.net> Message-ID: On 18 December 2012 16:06, Terry Reedy wrote: > On 12/18/2012 9:24 AM, Yuval Greenfield wrote: >> On Tue, Dec 18, 2012 at 2:08 PM, Oscar Benjamin >> > > wrote: >> >> The graph algorithms that are the most useful can be written in terms >> of two things: >> 1) An iterator over the nodes > > > Or iterable if re-iteration is needed. True. Although there aren't many cases where re-iteration is needed. The main exception would be if you wanted to instantiate a new Graph as a result of the algorithm. For example a transitive pruning function could be written to accept a factory like def transitive_prune(nodes, childfunc, factory): return factory(nodes, pruned_edges(nodes, childfunc)) in which case you need to be able to iterate once over the nodes for the pruning algorithm and once to construct the new graph. In these cases, the fact that you want to instantiate a new graph suggests that your original graph was a concrete data structure so that it is probably okay to require an iterable. To mutate the graph in place the user would need to supply a function to remove edges: def transitive_prune(nodes, childfunc, remove_edge): >> 2) A way to map each node into an iterator over its children (or >> partners) > > A callable could be either an iterator class or a generator function. > >> Some graphs don't care for the nodes, all their information is in the >> edges. That's why most graph frameworks have iter_edges and iter_nodes >> functions. This is true. Some algorithms would rather have this information. There are also a few that can proceed just from a particular node rather than needing an iterable over all nodes. >> I'm not sure what's the clean way to represent the >> optional directionality of edges though. I would have said that each API entry point should state how it will interpret the edges. Are there algorithms that simultaneously make sense for directed and undirected graphs while needing to behave differently in the two cases (in which case is it really the same algorithm)? > Oscar, I don't consider hashability an issue. General class instances are > hashable by default. One can even consider such instances as hashable > facades for unhashable dicts. Giving each instance a list attribute does the > same for lists. True, I've not found hashability to be a problem in practice. > The more important question, it seems to me, is whether to represent nodes > by counts and let the algorithm do its bookkeeping in private structures, or > to represent them by externally defined instances that the algorithm > mutates. 
I don't think I understand: How would the "externally defined instances" work? Do you mean that the caller of a function would supply functions like mark(), is_marked(), set_colour(), get_colour() and so on? If that's the case what would the advantages be? I can think of one: if desired the algorithm could be made to store all of its computations in say a database so that it would be very scalable. Though to me that seems like quite a specialised case that would probably merit from reimplementing the desired algorithm anyway. Otherwise I guess it's a lot simpler/safer to implement everything in private data structures. Oscar From sam-pydeas at rushing.nightmare.com Tue Dec 18 19:55:27 2012 From: sam-pydeas at rushing.nightmare.com (Sam Rushing) Date: Tue, 18 Dec 2012 10:55:27 -0800 Subject: [Python-ideas] async: feedback on EventLoop API In-Reply-To: <50CF8193.4040501@gmail.com> References: <50CF8193.4040501@gmail.com> Message-ID: <50D0BC1F.30509@rushing.nightmare.com> On 12/17/12 12:33 PM, Ronan Lamy wrote: > It seems to me that a DelayedCall is nothing but a frozen, reified > function call. That it's a reified thing is already obvious from the > fact that it's an object, so how about naming it just "Call"? > "Delayed" is actually only one of the possible relations between the > object and the actual call - it could also represent a cancelled call, > or a cached one, or ...? In the functional world, these are called 'thunks'. I don't know if that's a more obvious name, but a fun one. http://en.wikipedia.org/wiki/Thunk_(functional_programming) -Sam From solipsis at pitrou.net Tue Dec 18 20:21:06 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 18 Dec 2012 20:21:06 +0100 Subject: [Python-ideas] PEP 3156 feedback References: <20121218110136.1f85cfae@pitrou.net> Message-ID: <20121218202106.0ad96d3b@pitrou.net> On Tue, 18 Dec 2012 10:02:05 -0800 Guido van Rossum wrote: > > Event loop API > > -------------- > > > > I would like to say that I prefer Tornado's model: for each primitive > > provided by Tornado, you can pass an explicit Loop instance which you > > instantiated manually. > > There is no module function or policy object hiding this mechanism: > > it's simple, explicit and flexible (in other words: if you want a > > per-thread event loop, just do it yourself using TLS :-)). > > It sounds though as if the explicit loop is optional, and still > defaults to some global default loop? Yes. > Having one global loop shared by multiple threads is iffy though. Only > one thread should be *running* the loop, otherwise the loop can' be > used as a mutual exclusion device. Worse, all primitives for adding > and removing callbacks/handlers must be made threadsafe, and then > basically the entire event loop becomes full of locks, which seems > wrong to me. Hmm, I don't think that's implied. Only call_soon_threadsafe() needs to be thread-safe. Calling other methods from another thread is simply a programming error. Since Tornado's and Twisted's global event loops already work like that, I don't think the surprise will be huge for users. > > There are some requirements I've found useful: > > > > - being able to instantiate multiple loops, either at the same time or > > serially (this is especially nice for unit tests; Twisted has to use > > a dedicated test runner just because their reactor doesn't support > > multiple instances or restarts) > > Serially, for unit tests: definitely. The loop policy has > init_event_loop() for this, which forcibly creates a new loop. Ah, nice. 
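To make that threading contract concrete, a worker thread would hand its result back to the loop roughly like this; call_soon_threadsafe() is the PEP's own name, while run_in_thread() and the callback signature are invented for the example:

    import threading

    def run_in_thread(loop, func, callback):
        # Run func() in a worker thread; deliver (result, exception) to
        # callback on the event loop's own thread. The worker never touches
        # any loop method other than call_soon_threadsafe().
        def worker():
            try:
                result = func()
            except Exception as exc:
                loop.call_soon_threadsafe(callback, None, exc)
            else:
                loop.call_soon_threadsafe(callback, result, None)
        threading.Thread(target=worker).start()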
> > Protocols and transports > > ------------------------ > > > > We probably want to provide a Protocol base class and encourage people > > to inherit it. > > Glyph suggested that too, and hinted that it does some useful stuff > that users otherwise forget. I'm a bit worried though that the > functionality of the base implementation becomes the de-facto standard > rather than the PEP. (Glyph mentions that the base class has a method > that sets self.transport and without it lots of other stuff breaks.) Well, in the I/O stack we do have base classes with useful method implementations too (IOBase and friends). > > So, when creating a client, I would pass it a protocol instance. > > Heh. That's how I started, and Glyph told me to pass a protocol > factory. It can just be a Protocol subclass though, as long as the > constructor has the right signature. So maybe we can avoid calling it > protocol_factory and name it protocol_class instead. > > I struggled with what to do if the socket cannot be connected and > hence the transport not created. If you've already created the > protocol you're in a bit of trouble at that point. I proposed to call > connection_lost() in that case (without ever having called > connection_made()) but Glyph suggested that would be asking for rare > bugs (the connection_lost() code might not expect a half-initialized > protocol instance). I'm proposing something different: the transport should be created before the socket is connected, and it should handle the connection itself (by calling sock_connect() on the loop, perhaps). Then: - if connect() succeeds, protocol.connection_made() is called - if connect() fails, protocol.connection_failed(exc) is called (not connection_lost()) I think it makes more sense for the transport to do the connecting: why should the I/O loop know about specific transports? Ideally, it should only know about socket objects or fds. I don't know if Twisted had a specific reason for having connectTCP() and friends on the reactor (other than they want the reactor to be the API entry point, perhaps). I'd be curious to hear about it. > Glyph proposed instead that create_transport() > should return a Future and the error should be that Future's > exception, and I like that much better. But then you have several API layers with different conventions: connection_made() / connection_lost() use well-defined protocol methods, while create_transport() returns you a Future on which you must register success / failure callbacks. > > I think the transport / protocol registration must be done early, not in > > connection_made(). Sometimes you will want to do things on a protocol > > before you know a connection is established, for example queue things > > to write on the transport. An use case is a reconnecting TCP client: > > the protocol will continue existing at times when the connection is > > down. > > Hm. That seems a pretty advanced use case. I think it is better > handled by passing a "factory function" that returns a pre-created > protocol: > > pr = MyProtocol(...) > ev.create_transport(lambda: pr, host, port) > > However you do this, such a protocol object must expect multiple > connection_made - connection_lost cycles, which sounds to me like > asking for trouble. It's quite straightforward actually (*). Of course, only a protocol explicitly designed for use with a reconnecting client has to be well-behaved in that regard. (*) I'm using such a pattern at work, where I've stacked a protocol abstraction on top of Tornado. 
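For comparison, a rough sketch of the "thin protocol plus permanent session" alternative mentioned earlier in the thread; only connection_made()/data_received()/connection_lost() come from the draft PEP, and the Session/SessionProtocol names are invented:

    class Session:
        # State that must survive reconnections: queued writes, parser state, etc.
        def __init__(self):
            self.transport = None
            self.pending = []

        def send(self, data):
            if self.transport is not None:
                self.transport.write(data)
            else:
                self.pending.append(data)   # queued until the next connection

    class SessionProtocol:
        # Instantiated afresh for every (re)connection; all it does is bind
        # the new transport to the long-lived session.
        def __init__(self, session):
            self.session = session

        def connection_made(self, transport):
            self.session.transport = transport
            for chunk in self.session.pending:
                transport.write(chunk)
            del self.session.pending[:]

        def data_received(self, data):
            pass  # hand the bytes to the session's parser

        def connection_lost(self, exc):
            self.session.transport = None
            # the owner of the session decides whether and when to reconnect

Each reconnect attempt then passes a factory that binds the same session to a fresh protocol instance, e.g. create_transport(lambda: SessionProtocol(session), host, port).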
> > * connection_lost(): you definitely want to know whether it's you or the > > other end who closed the connection. Typically, if the other end > > closed the connection, you will have to run some cleanup steps, and > > perhaps even log an error somewhere (if the connection was closed > > unexpectedly). > > Glyph's idea was to always pass an exception and use special exception > subclasses to distinguish the three cases (clean eof from other end, > self.close(), self.abort(). I resisted this but maybe it's the only > way? Perhaps both self.close() and self.abort() should pass None. So "if error is None: return" is all you have to do to filter out the boring case. Regards Antoine. From shibturn at gmail.com Tue Dec 18 20:41:55 2012 From: shibturn at gmail.com (Richard Oudkerk) Date: Tue, 18 Dec 2012 19:41:55 +0000 Subject: [Python-ideas] async: feedback on EventLoop API In-Reply-To: References: <50CF9603.6040409@canterbury.ac.nz> <20121217231134.19ede507@pitrou.net> Message-ID: On 18/12/2012 4:59pm, Guido van Rossum wrote: > On Mon, Dec 17, 2012 at 11:26 PM, Geert Jansen wrote: >> I needed a self-pipe on Windows before. See below. With this, the >> select() based loop might work unmodified on Windows. >> >> https://gist.github.com/4325783 > > Thanks! Before I paste this into Tulip, is there any kind of copyright on this? > >> Of course it wouldn't be as efficient as an IOCP based loop. > > The socket loop is definitely handy on Windows in a pinch. I have > plans for an IOCP-based loop based on Richard Oudkerk's 'proactor' > branch of Tulip v1, but I don't have a Windows machine to test it on > ATM (hopefully that'll change once I am actually at Dropbox). > polling.py in the proactor branch already had an implementation of socketpair() for Windows;-) Also note that on Windows a connecting socket needs to be added to wfds *and* xfds when you do ... = select(rfds, wfds, xfds, timeout) If the connection fails then the handle is reported as being exceptional but *not* writable. It might make sense to have add_connector()/remove_connector() which on Unix is just an alias for add_writer()/remove_writer(). This would be useful if tulip ever has a loop based on WSAPoll() for Windows (Vista and later), since WSAPoll() has an awkward bug concerning asynchronous connects. -- Richard From guido at python.org Tue Dec 18 21:41:04 2012 From: guido at python.org (Guido van Rossum) Date: Tue, 18 Dec 2012 12:41:04 -0800 Subject: [Python-ideas] PEP 3156 feedback In-Reply-To: <20121218202106.0ad96d3b@pitrou.net> References: <20121218110136.1f85cfae@pitrou.net> <20121218202106.0ad96d3b@pitrou.net> Message-ID: On Tue, Dec 18, 2012 at 11:21 AM, Antoine Pitrou wrote: > On Tue, 18 Dec 2012 10:02:05 -0800 Guido van Rossum wrote: >> > Protocols and transports >> > ------------------------ >> > >> > We probably want to provide a Protocol base class and encourage people >> > to inherit it. >> >> Glyph suggested that too, and hinted that it does some useful stuff >> that users otherwise forget. I'm a bit worried though that the >> functionality of the base implementation becomes the de-facto standard >> rather than the PEP. (Glyph mentions that the base class has a method >> that sets self.transport and without it lots of other stuff breaks.) > > Well, in the I/O stack we do have base classes with useful method > implementations too (IOBase and friends). True. If we go that way they should be in the PEP as well. >> > So, when creating a client, I would pass it a protocol instance. >> >> Heh. 
That's how I started, and Glyph told me to pass a protocol >> factory. It can just be a Protocol subclass though, as long as the >> constructor has the right signature. So maybe we can avoid calling it >> protocol_factory and name it protocol_class instead. >> >> I struggled with what to do if the socket cannot be connected and >> hence the transport not created. If you've already created the >> protocol you're in a bit of trouble at that point. I proposed to call >> connection_lost() in that case (without ever having called >> connection_made()) but Glyph suggested that would be asking for rare >> bugs (the connection_lost() code might not expect a half-initialized >> protocol instance). > > I'm proposing something different: the transport should be created > before the socket is connected, and it should handle the connection > itself (by calling sock_connect() on the loop, perhaps). That's a possible implementation technique. But it will still be created implicitly by create_transport() or start_serving(). > Then: > - if connect() succeeds, protocol.connection_made() is called > - if connect() fails, protocol.connection_failed(exc) is called > (not connection_lost()) That's what I had, but it just adds extra APIs to the abstract class. Returning a Future that can succeed (probably returning the protocol) or fail (with some exception) doesn't require adding new methods. > I think it makes more sense for the transport to do the connecting: why > should the I/O loop know about specific transports? Ideally, it should > only know about socket objects or fds. Actually, there's one reason why the loop should know (something) about transports: different loop implementations will want to use different transport implementations to meet the same requirements. E.g. an IOCP-based loop will use different transports than a UNIXy *poll-based loop. > I don't know if Twisted had a specific reason for having connectTCP() > and friends on the reactor (other than they want the reactor to be the > API entry point, perhaps). I'd be curious to hear about it. That's the reason. >> Glyph proposed instead that create_transport() >> should return a Future and the error should be that Future's >> exception, and I like that much better. > > But then you have several API layers with different conventions: > connection_made() / connection_lost() use well-defined protocol > methods, while create_transport() returns you a Future on which you > must register success / failure callbacks. Different layers have different needs. Note that if you're using coroutines the Futures are very easy to use. And Twisted will just wrap the Future in a Deferred. >> > I think the transport / protocol registration must be done early, not in >> > connection_made(). Sometimes you will want to do things on a protocol >> > before you know a connection is established, for example queue things >> > to write on the transport. An use case is a reconnecting TCP client: >> > the protocol will continue existing at times when the connection is >> > down. >> >> Hm. That seems a pretty advanced use case. I think it is better >> handled by passing a "factory function" that returns a pre-created >> protocol: >> >> pr = MyProtocol(...) >> ev.create_transport(lambda: pr, host, port) >> >> However you do this, such a protocol object must expect multiple >> connection_made - connection_lost cycles, which sounds to me like >> asking for trouble. > > It's quite straightforward actually (*). 
Of course, only a protocol > explicitly designed for use with a reconnecting client has to be > well-behaved in that regard. Yeah, but it still is an odd corner case. Anyway, I think I've shown you how to do it in several different ways while still having a protocol_factory argument. > (*) I'm using such a pattern at work, where I've stacked a protocol > abstraction on top of Tornado. > >> > * connection_lost(): you definitely want to know whether it's you or the >> > other end who closed the connection. Typically, if the other end >> > closed the connection, you will have to run some cleanup steps, and >> > perhaps even log an error somewhere (if the connection was closed >> > unexpectedly). >> >> Glyph's idea was to always pass an exception and use special exception >> subclasses to distinguish the three cases (clean eof from other end, >> self.close(), self.abort(). I resisted this but maybe it's the only >> way? > > Perhaps both self.close() and self.abort() should pass None. They do. > So "if error is None: return" is all you have to do to filter out the > boring case. But a clean close from the other end (as opposed to an unexpected disconnect e.g. due to a sudden network partition) also passes None. I guess this is okay because in that case eof_received() is first called. So I guess the PEP is already okay here. :-) -- --Guido van Rossum (python.org/~guido) From solipsis at pitrou.net Tue Dec 18 21:44:22 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 18 Dec 2012 21:44:22 +0100 Subject: [Python-ideas] PEP 3156 feedback References: <20121218110136.1f85cfae@pitrou.net> <20121218202106.0ad96d3b@pitrou.net> Message-ID: <20121218214422.323a4d2f@pitrou.net> On Tue, 18 Dec 2012 12:41:04 -0800 Guido van Rossum wrote: > > So "if error is None: return" is all you have to do to filter out the > > boring case. > > But a clean close from the other end (as opposed to an unexpected > disconnect e.g. due to a sudden network partition) also passes None. I > guess this is okay because in that case eof_received() is first > called. So I guess the PEP is already okay here. :-) Only if the protocol supports EOF, though? Or do you "emulate" by calling eof_received() in any case? Regards Antoine. From tjreedy at udel.edu Tue Dec 18 22:15:17 2012 From: tjreedy at udel.edu (Terry Reedy) Date: Tue, 18 Dec 2012 16:15:17 -0500 Subject: [Python-ideas] async: feedback on EventLoop API In-Reply-To: References: <50CF9603.6040409@canterbury.ac.nz> <20121217231134.19ede507@pitrou.net> Message-ID: On 12/18/2012 12:10 PM, Geert Jansen wrote: > On Tue, Dec 18, 2012 at 5:59 PM, Guido van Rossum wrote: >> On Mon, Dec 17, 2012 at 11:26 PM, Geert Jansen wrote: >>> I needed a self-pipe on Windows before. See below. With this, the >>> select() based loop might work unmodified on Windows. >>> >>> https://gist.github.com/4325783 >> >> Thanks! Before I paste this into Tulip, is there any kind of copyright on this? > > [include list] > > I wrote the code. I hereby put it in the public domain. Sign a PSF contributor agreement if you have not done so yet and that should cover it for distribution with CPython. 
-- Terry Jan Reedy From guido at python.org Tue Dec 18 23:39:47 2012 From: guido at python.org (Guido van Rossum) Date: Tue, 18 Dec 2012 14:39:47 -0800 Subject: [Python-ideas] PEP 3156 feedback In-Reply-To: <20121218214422.323a4d2f@pitrou.net> References: <20121218110136.1f85cfae@pitrou.net> <20121218202106.0ad96d3b@pitrou.net> <20121218214422.323a4d2f@pitrou.net> Message-ID: On Tue, Dec 18, 2012 at 12:44 PM, Antoine Pitrou wrote: > On Tue, 18 Dec 2012 12:41:04 -0800 > Guido van Rossum wrote: >> > So "if error is None: return" is all you have to do to filter out the >> > boring case. >> >> But a clean close from the other end (as opposed to an unexpected >> disconnect e.g. due to a sudden network partition) also passes None. I >> guess this is okay because in that case eof_received() is first >> called. So I guess the PEP is already okay here. :-) > > Only if the protocol supports EOF, though? Or do you "emulate" by > calling eof_received() in any case? EOF is part of TCP (although I'm sure it has a different name at the protocol level). The sender can force it by using shutdown(SHUT_WR) (== write_eof() in Tulip/PEP 3156) or just by closing the socket (if they don't expect a response). The low-level reader detects this by recv() returning an empty string. Of course, if the other end closed both halves and you try to write before reading, send() may raise an exception and then you'll not get the EOF. And then again, send() may not raise an exception, it all depends on where stuff gets buffered. But arguably you get what you ask for in that case. I plan to call eof_received(), once, if and only if recv() returns an empty byte string. (The PEP says that eof_received() should call close() by default, but I don't actually think that's correct -- it also is hard to put in the abstract Protocol class unless a specific instance variable holding the transport is made part of the spec, which I am hesitant to do. I don't think that ignoring it by default is actually a problem.) -- --Guido van Rossum (python.org/~guido) From andrew.svetlov at gmail.com Tue Dec 18 23:49:15 2012 From: andrew.svetlov at gmail.com (Andrew Svetlov) Date: Wed, 19 Dec 2012 00:49:15 +0200 Subject: [Python-ideas] PEP 3156 feedback In-Reply-To: References: <20121218110136.1f85cfae@pitrou.net> <20121218202106.0ad96d3b@pitrou.net> <20121218214422.323a4d2f@pitrou.net> Message-ID: About protocols: I think eventloop should support UDP datagrams as well as operations with file descriptors which are not sockets at all. I mean timerfd_create and inotify as examples. On Wed, Dec 19, 2012 at 12:39 AM, Guido van Rossum wrote: > On Tue, Dec 18, 2012 at 12:44 PM, Antoine Pitrou wrote: >> On Tue, 18 Dec 2012 12:41:04 -0800 >> Guido van Rossum wrote: >>> > So "if error is None: return" is all you have to do to filter out the >>> > boring case. >>> >>> But a clean close from the other end (as opposed to an unexpected >>> disconnect e.g. due to a sudden network partition) also passes None. I >>> guess this is okay because in that case eof_received() is first >>> called. So I guess the PEP is already okay here. :-) >> >> Only if the protocol supports EOF, though? Or do you "emulate" by >> calling eof_received() in any case? > > EOF is part of TCP (although I'm sure it has a different name at the > protocol level). The sender can force it by using shutdown(SHUT_WR) > (== write_eof() in Tulip/PEP 3156) or just by closing the socket (if > they don't expect a response). The low-level reader detects this by > recv() returning an empty string. 
Of course, if the other end closed > both halves and you try to write before reading, send() may raise an > exception and then you'll not get the EOF. And then again, send() may > not raise an exception, it all depends on where stuff gets buffered. > But arguably you get what you ask for in that case. > > I plan to call eof_received(), once, if and only if recv() returns an > empty byte string. > > (The PEP says that eof_received() should call close() by default, but > I don't actually think that's correct -- it also is hard to put in the > abstract Protocol class unless a specific instance variable holding > the transport is made part of the spec, which I am hesitant to do. I > don't think that ignoring it by default is actually a problem.) > > -- > --Guido van Rossum (python.org/~guido) > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas -- Thanks, Andrew Svetlov From guido at python.org Tue Dec 18 23:51:05 2012 From: guido at python.org (Guido van Rossum) Date: Tue, 18 Dec 2012 14:51:05 -0800 Subject: [Python-ideas] async: feedback on EventLoop API In-Reply-To: References: <50CF9603.6040409@canterbury.ac.nz> <20121217231134.19ede507@pitrou.net> Message-ID: On Tue, Dec 18, 2012 at 11:41 AM, Richard Oudkerk wrote: > polling.py in the proactor branch already had an implementation of > socketpair() for Windows;-) D'oh! And it always uses sockets for the "self-pipe". That makes sense. > Also note that on Windows a connecting socket needs to be added to wfds > *and* xfds when you do > > ... = select(rfds, wfds, xfds, timeout) > > If the connection fails then the handle is reported as being exceptional but > *not* writable. But SelectProactor in proactor.py doesn't seem to do this. > It might make sense to have add_connector()/remove_connector() which on Unix > is just an alias for add_writer()/remove_writer(). This would be useful if > tulip ever has a loop based on WSAPoll() for Windows (Vista and later), > since WSAPoll() has an awkward bug concerning asynchronous connects. Can't we do this for all writers? (If we have to make a distinction, so be it, but it seems easy to have latent bugs if some platforms require you to make a different call but others don't care either way.) -- --Guido van Rossum (python.org/~guido) From guido at python.org Wed Dec 19 00:00:38 2012 From: guido at python.org (Guido van Rossum) Date: Tue, 18 Dec 2012 15:00:38 -0800 Subject: [Python-ideas] PEP 3156 feedback In-Reply-To: References: <20121218110136.1f85cfae@pitrou.net> <20121218202106.0ad96d3b@pitrou.net> <20121218214422.323a4d2f@pitrou.net> Message-ID: On Tue, Dec 18, 2012 at 2:49 PM, Andrew Svetlov wrote: > About protocols: I think eventloop should support UDP datagrams Supporting UDP should be relatively straightforward, I just haven't used it in ages so I could use some help in describing the needed APIs. There are a lot of recv() variants: recv(), recvfrom(), recvmsg(), and then an _into() variant for each. And for sending there's send()/sendall(), sendmsg(), and sendto(). I'd be ecstatic if someone contributed code to tulip. > as well as operations with file descriptors which are not sockets at all. That won't work on Windows though. On UNIX you can always use the add/remove reader/writer APIs and make the calls yourself -- the patterns in sock_recv() and sock_sendall() are simple enough. 
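The pattern being referred to looks roughly like the following, shown here for an arbitrary non-blocking UNIX fd rather than a socket; add_reader()/remove_reader() and the Future come from the draft PEP, the helper itself is only illustrative:

    import os

    def read_some(loop, fd, nbytes, future, registered=False):
        # Try a non-blocking read; if the fd is not ready yet, ask the loop
        # to call us again when it becomes readable, and complete `future`
        # once the read succeeds or fails.
        if registered:
            loop.remove_reader(fd)
        try:
            data = os.read(fd, nbytes)
        except (BlockingIOError, InterruptedError):
            loop.add_reader(fd, read_some, loop, fd, nbytes, future, True)
        except Exception as exc:
            future.set_exception(exc)
        else:
            future.set_result(data)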
(These are standardized in the PEP mainly because on Windows, with IOCP, the expectation is that they won't use "ready callbacks" (polling using select/*poll/kqueue) but instead Windows-specific APIs for starting I/O operations with a "completion callback". > I mean timerfd_create and inotify as examples. I think those will work -- they look very platform specific but in the end there's nothing in the add/remove reader/writer API that prevents you from using non-socket FDs on UNIX. (It's different on Windows, where select() is the only pollster supported, and Windows select only works with socket FDs.) -- --Guido van Rossum (python.org/~guido) From shane at umbrellacode.com Wed Dec 19 04:44:03 2012 From: shane at umbrellacode.com (Shane Green) Date: Tue, 18 Dec 2012 19:44:03 -0800 Subject: [Python-ideas] async: feedback on EventLoop API Message-ID: <3B90AC3A-C73B-4BD1-9BE8-9ECF21F0D243@umbrellacode.com> Ignoring the API overlap with Futures/Promises for a moment, let me throw out this straw man approach to the event loop that seems to my naive eye like it pull together a lot of these ideas? Rather than passing in your callbacks, factories, etc., asynchronous APIs return a lightweight object you register your callback with. Unlike promises, deferrers, etc., this is a one-time thing: only one callback can register with it. However, it can be chained. The registered callback is invoked with the output of the operation when it completes. Timer.wait(20).then(callme, *args, **kw) # I could do Timer.wait(20).then(callme, *args, **kw).then(piped_from_callme) #I could not do handler = Timer.wait(20) handler.then(callme) handler.then(callme2) # this would throw an exception. # I/O example? sock.accept().then(handle_connection) # invokes handle_connection(conn, addr) # Read some data conn.read(1024).then(handle_incoming) # handle_incoming invoked with up to 1024 bytes, read asynchronously. # Write some data conn.write("data").then(handle_written) # handle_written invoked with up number 5, giving number of bytes written async. # Connect HTTP channel and add it to HTTP dispatcher. channel.connect((hostname,80)).then(dispatcher.add_channel) # Listen to FD's for I/O events descriptors.select(r, w, e).then(handle) # handle(readable, writables, oobs) It seems like only supporting a single callback per returned handle lets us circumvent a lot of the weight associated with normal promise/future/deferred pattern type implementations, but the chaining could come in handy as it may cover some of the use-cases being considered when multiple events per fd came up, plus chaining is pretty powerful, especially when it comes at little cost. The API would be much more extensive than "then()", of course, with things like "every", etc. we'd have to pull examples from everything already discussed. Just wanted to throw out there to get beat up about ;-) Shane Green www.umbrellacode.com 805-452-9666 | shane at umbrellacode.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From jstpierre at mecheye.net Wed Dec 19 04:51:25 2012 From: jstpierre at mecheye.net (Jasper St. Pierre) Date: Tue, 18 Dec 2012 22:51:25 -0500 Subject: [Python-ideas] async: feedback on EventLoop API In-Reply-To: <3B90AC3A-C73B-4BD1-9BE8-9ECF21F0D243@umbrellacode.com> References: <3B90AC3A-C73B-4BD1-9BE8-9ECF21F0D243@umbrellacode.com> Message-ID: A lot of things become trivially easy if they assumption is that they can never fail. Deferreds/Promises/Tasks/Futures are about sane error handling, not sane success handling. 
(There's a few parts in the current proposal where this falls short, like par, but that's another post) On Tue, Dec 18, 2012 at 10:44 PM, Shane Green wrote: > Ignoring the API overlap with Futures/Promises for a moment, let me throw > out this straw man approach to the event loop that seems to my naive eye > like it pull together a lot of these ideas? > > Rather than passing in your callbacks, factories, etc., asynchronous APIs > return a lightweight object you register your callback with. Unlike > promises, deferrers, etc., this is a one-time thing: only one callback can > register with it. However, it can be chained. The registered callback is > invoked with the output of the operation when it completes. > > Timer.wait(20).then(callme, *args, **kw) > # I could do > Timer.wait(20).then(callme, *args, **kw).then(piped_from_callme) > > #I could not do > handler = Timer.wait(20) > handler.then(callme) > handler.then(callme2) # this would throw an exception. > > # I/O example? > sock.accept().then(handle_connection) # invokes handle_connection(conn, > addr) > # Read some data > conn.read(1024).then(handle_incoming) # handle_incoming invoked with up to > 1024 bytes, read asynchronously. > # Write some data > conn.write("data").then(handle_written) # handle_written invoked with up > number 5, giving number of bytes written async. > # Connect HTTP channel and add it to HTTP dispatcher. > channel.connect((hostname,80)).then(dispatcher.add_channel) > > > # Listen to FD's for I/O events > descriptors.select(r, w, e).then(handle) # handle(readable, writables, > oobs) > > It seems like only supporting a single callback per returned handle lets > us circumvent a lot of the weight associated with normal > promise/future/deferred pattern type implementations, but the chaining > could come in handy as it may cover some of the use-cases being considered > when multiple events per fd came up, plus chaining is pretty powerful, > especially when it comes at little cost. The API would be much more > extensive than "then()", of course, with things like "every", etc. we'd > have to pull examples from everything already discussed. Just wanted to > throw out there to get beat up about ;-) > > > > Shane Green > www.umbrellacode.com > 805-452-9666 | shane at umbrellacode.com > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > > -- Jasper -------------- next part -------------- An HTML attachment was scrubbed... URL: From shane at umbrellacode.com Wed Dec 19 04:55:52 2012 From: shane at umbrellacode.com (Shane Green) Date: Tue, 18 Dec 2012 19:55:52 -0800 Subject: [Python-ideas] async: feedback on EventLoop API In-Reply-To: References: <3B90AC3A-C73B-4BD1-9BE8-9ECF21F0D243@umbrellacode.com> Message-ID: <69DB9EF6-CDBB-4C7A-B14C-D6F2F36BD217@umbrellacode.com> Oh, I forgot error-backs. True, though, error-handling is a bit more difficult. I'm not sure I see it as being much more challenging that asynchronous callback error handling/reporting in general, though: it can still be executed in the exception context, etc. And the chaining can even be used to attach extended error logging to all your callback chains, without losing or swallowing anything that wouldn't have been lost or swallowed by other approaches, unless I'm overlooking something. Shane Green www.umbrellacode.com 805-452-9666 | shane at umbrellacode.com On Dec 18, 2012, at 7:51 PM, "Jasper St. 
Pierre" wrote: > A lot of things become trivially easy if they assumption is that they can never fail. > > Deferreds/Promises/Tasks/Futures are about sane error handling, not sane success handling. > > (There's a few parts in the current proposal where this falls short, like par, but that's another post) > > > On Tue, Dec 18, 2012 at 10:44 PM, Shane Green wrote: > Ignoring the API overlap with Futures/Promises for a moment, let me throw out this straw man approach to the event loop that seems to my naive eye like it pull together a lot of these ideas? > > Rather than passing in your callbacks, factories, etc., asynchronous APIs return a lightweight object you register your callback with. Unlike promises, deferrers, etc., this is a one-time thing: only one callback can register with it. However, it can be chained. The registered callback is invoked with the output of the operation when it completes. > > Timer.wait(20).then(callme, *args, **kw) > # I could do > Timer.wait(20).then(callme, *args, **kw).then(piped_from_callme) > > #I could not do > handler = Timer.wait(20) > handler.then(callme) > handler.then(callme2) # this would throw an exception. > > # I/O example? > sock.accept().then(handle_connection) # invokes handle_connection(conn, addr) > # Read some data > conn.read(1024).then(handle_incoming) # handle_incoming invoked with up to 1024 bytes, read asynchronously. > # Write some data > conn.write("data").then(handle_written) # handle_written invoked with up number 5, giving number of bytes written async. > # Connect HTTP channel and add it to HTTP dispatcher. > channel.connect((hostname,80)).then(dispatcher.add_channel) > > > # Listen to FD's for I/O events > descriptors.select(r, w, e).then(handle) # handle(readable, writables, oobs) > > It seems like only supporting a single callback per returned handle lets us circumvent a lot of the weight associated with normal promise/future/deferred pattern type implementations, but the chaining could come in handy as it may cover some of the use-cases being considered when multiple events per fd came up, plus chaining is pretty powerful, especially when it comes at little cost. The API would be much more extensive than "then()", of course, with things like "every", etc. we'd have to pull examples from everything already discussed. Just wanted to throw out there to get beat up about ;-) > > > > Shane Green > www.umbrellacode.com > 805-452-9666 | shane at umbrellacode.com > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > > > > > -- > Jasper > -------------- next part -------------- An HTML attachment was scrubbed... URL: From shane at umbrellacode.com Wed Dec 19 04:57:36 2012 From: shane at umbrellacode.com (Shane Green) Date: Tue, 18 Dec 2012 19:57:36 -0800 Subject: [Python-ideas] async: feedback on EventLoop API In-Reply-To: <69DB9EF6-CDBB-4C7A-B14C-D6F2F36BD217@umbrellacode.com> References: <3B90AC3A-C73B-4BD1-9BE8-9ECF21F0D243@umbrellacode.com> <69DB9EF6-CDBB-4C7A-B14C-D6F2F36BD217@umbrellacode.com> Message-ID: <7C468382-2A39-47B3-A868-558207CAC02D@umbrellacode.com> Or maybe I just misread your response. Can you elaborate on what you mean by "not about sane success handling"? Shane Green www.umbrellacode.com 805-452-9666 | shane at umbrellacode.com On Dec 18, 2012, at 7:55 PM, Shane Green wrote: > Oh, I forgot error-backs. True, though, error-handling is a bit more difficult. 
I'm not sure I see it as being much more challenging that asynchronous callback error handling/reporting in general, though: it can still be executed in the exception context, etc. And the chaining can even be used to attach extended error logging to all your callback chains, without losing or swallowing anything that wouldn't have been lost or swallowed by other approaches, unless I'm overlooking something. > > > > > > Shane Green > www.umbrellacode.com > 805-452-9666 | shane at umbrellacode.com > > On Dec 18, 2012, at 7:51 PM, "Jasper St. Pierre" wrote: > >> A lot of things become trivially easy if they assumption is that they can never fail. >> >> Deferreds/Promises/Tasks/Futures are about sane error handling, not sane success handling. >> >> (There's a few parts in the current proposal where this falls short, like par, but that's another post) >> >> >> On Tue, Dec 18, 2012 at 10:44 PM, Shane Green wrote: >> Ignoring the API overlap with Futures/Promises for a moment, let me throw out this straw man approach to the event loop that seems to my naive eye like it pull together a lot of these ideas? >> >> Rather than passing in your callbacks, factories, etc., asynchronous APIs return a lightweight object you register your callback with. Unlike promises, deferrers, etc., this is a one-time thing: only one callback can register with it. However, it can be chained. The registered callback is invoked with the output of the operation when it completes. >> >> Timer.wait(20).then(callme, *args, **kw) >> # I could do >> Timer.wait(20).then(callme, *args, **kw).then(piped_from_callme) >> >> #I could not do >> handler = Timer.wait(20) >> handler.then(callme) >> handler.then(callme2) # this would throw an exception. >> >> # I/O example? >> sock.accept().then(handle_connection) # invokes handle_connection(conn, addr) >> # Read some data >> conn.read(1024).then(handle_incoming) # handle_incoming invoked with up to 1024 bytes, read asynchronously. >> # Write some data >> conn.write("data").then(handle_written) # handle_written invoked with up number 5, giving number of bytes written async. >> # Connect HTTP channel and add it to HTTP dispatcher. >> channel.connect((hostname,80)).then(dispatcher.add_channel) >> >> >> # Listen to FD's for I/O events >> descriptors.select(r, w, e).then(handle) # handle(readable, writables, oobs) >> >> It seems like only supporting a single callback per returned handle lets us circumvent a lot of the weight associated with normal promise/future/deferred pattern type implementations, but the chaining could come in handy as it may cover some of the use-cases being considered when multiple events per fd came up, plus chaining is pretty powerful, especially when it comes at little cost. The API would be much more extensive than "then()", of course, with things like "every", etc. we'd have to pull examples from everything already discussed. Just wanted to throw out there to get beat up about ;-) >> >> >> >> Shane Green >> www.umbrellacode.com >> 805-452-9666 | shane at umbrellacode.com >> >> >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> http://mail.python.org/mailman/listinfo/python-ideas >> >> >> >> >> -- >> Jasper >> > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From guido at python.org Wed Dec 19 05:28:35 2012 From: guido at python.org (Guido van Rossum) Date: Tue, 18 Dec 2012 20:28:35 -0800 Subject: [Python-ideas] async: feedback on EventLoop API In-Reply-To: <3B90AC3A-C73B-4BD1-9BE8-9ECF21F0D243@umbrellacode.com> References: <3B90AC3A-C73B-4BD1-9BE8-9ECF21F0D243@umbrellacode.com> Message-ID: The point of PEP 3156 is not to make using callbacks easy. It is to make callbacks mostly disappear in favor of coroutines, but keeping them around in order to provide interoperability with callback-based frameworks such as Twisted or Tornado. Your handlers appear to be an attempt at reinventing Twisted's Deferred. But Deferred already exists, and it works perfectly fine with the current callback-based event loop spec in the PEP. It's not clear how your handlers will enable a coroutine to wait for the result (or exception) however. --Guido On Tue, Dec 18, 2012 at 7:44 PM, Shane Green wrote: > Ignoring the API overlap with Futures/Promises for a moment, let me throw > out this straw man approach to the event loop that seems to my naive eye > like it pull together a lot of these ideas? > > Rather than passing in your callbacks, factories, etc., asynchronous APIs > return a lightweight object you register your callback with. Unlike > promises, deferrers, etc., this is a one-time thing: only one callback can > register with it. However, it can be chained. The registered callback is > invoked with the output of the operation when it completes. > > Timer.wait(20).then(callme, *args, **kw) > # I could do > Timer.wait(20).then(callme, *args, **kw).then(piped_from_callme) > > #I could not do > handler = Timer.wait(20) > handler.then(callme) > handler.then(callme2) # this would throw an exception. > > # I/O example? > sock.accept().then(handle_connection) # invokes handle_connection(conn, > addr) > # Read some data > conn.read(1024).then(handle_incoming) # handle_incoming invoked with up to > 1024 bytes, read asynchronously. > # Write some data > conn.write("data").then(handle_written) # handle_written invoked with up > number 5, giving number of bytes written async. > # Connect HTTP channel and add it to HTTP dispatcher. > channel.connect((hostname,80)).then(dispatcher.add_channel) > > > # Listen to FD's for I/O events > descriptors.select(r, w, e).then(handle) # handle(readable, writables, oobs) > > It seems like only supporting a single callback per returned handle lets us > circumvent a lot of the weight associated with normal > promise/future/deferred pattern type implementations, but the chaining could > come in handy as it may cover some of the use-cases being considered when > multiple events per fd came up, plus chaining is pretty powerful, especially > when it comes at little cost. The API would be much more extensive than > "then()", of course, with things like "every", etc. we'd have to pull > examples from everything already discussed. 
Just wanted to throw out there > to get beat up about ;-) > > > > Shane Green > www.umbrellacode.com > 805-452-9666 | shane at umbrellacode.com > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -- --Guido van Rossum (python.org/~guido) From shane at umbrellacode.com Wed Dec 19 05:47:55 2012 From: shane at umbrellacode.com (Shane Green) Date: Tue, 18 Dec 2012 20:47:55 -0800 Subject: [Python-ideas] async: feedback on EventLoop API In-Reply-To: References: <3B90AC3A-C73B-4BD1-9BE8-9ECF21F0D243@umbrellacode.com> Message-ID: <07806C41-9384-442C-BDA1-3D0E04C6F441@umbrellacode.com> Ah, I see. I did not read though the PEP like I should have. Given that I didn't do my homework, it would be an awesome coincidence if they enabled a coroutine to wait for the result (or exception) ;-) Shane Green www.umbrellacode.com 805-452-9666 | shane at umbrellacode.com On Dec 18, 2012, at 8:28 PM, Guido van Rossum wrote: > The point of PEP 3156 is not to make using callbacks easy. It is to > make callbacks mostly disappear in favor of coroutines, but keeping > them around in order to provide interoperability with callback-based > frameworks such as Twisted or Tornado. > > Your handlers appear to be an attempt at reinventing Twisted's > Deferred. But Deferred already exists, and it works perfectly fine > with the current callback-based event loop spec in the PEP. It's not > clear how your handlers will enable a coroutine to wait for the result > (or exception) however. > > --Guido > > On Tue, Dec 18, 2012 at 7:44 PM, Shane Green wrote: >> Ignoring the API overlap with Futures/Promises for a moment, let me throw >> out this straw man approach to the event loop that seems to my naive eye >> like it pull together a lot of these ideas? >> >> Rather than passing in your callbacks, factories, etc., asynchronous APIs >> return a lightweight object you register your callback with. Unlike >> promises, deferrers, etc., this is a one-time thing: only one callback can >> register with it. However, it can be chained. The registered callback is >> invoked with the output of the operation when it completes. >> >> Timer.wait(20).then(callme, *args, **kw) >> # I could do >> Timer.wait(20).then(callme, *args, **kw).then(piped_from_callme) >> >> #I could not do >> handler = Timer.wait(20) >> handler.then(callme) >> handler.then(callme2) # this would throw an exception. >> >> # I/O example? >> sock.accept().then(handle_connection) # invokes handle_connection(conn, >> addr) >> # Read some data >> conn.read(1024).then(handle_incoming) # handle_incoming invoked with up to >> 1024 bytes, read asynchronously. >> # Write some data >> conn.write("data").then(handle_written) # handle_written invoked with up >> number 5, giving number of bytes written async. >> # Connect HTTP channel and add it to HTTP dispatcher. >> channel.connect((hostname,80)).then(dispatcher.add_channel) >> >> >> # Listen to FD's for I/O events >> descriptors.select(r, w, e).then(handle) # handle(readable, writables, oobs) >> >> It seems like only supporting a single callback per returned handle lets us >> circumvent a lot of the weight associated with normal >> promise/future/deferred pattern type implementations, but the chaining could >> come in handy as it may cover some of the use-cases being considered when >> multiple events per fd came up, plus chaining is pretty powerful, especially >> when it comes at little cost. 
The API would be much more extensive than >> "then()", of course, with things like "every", etc. we'd have to pull >> examples from everything already discussed. Just wanted to throw out there >> to get beat up about ;-) >> >> >> >> Shane Green >> www.umbrellacode.com >> 805-452-9666 | shane at umbrellacode.com >> >> >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> http://mail.python.org/mailman/listinfo/python-ideas >> > > > > -- > --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From jstpierre at mecheye.net Wed Dec 19 05:45:34 2012 From: jstpierre at mecheye.net (Jasper St. Pierre) Date: Tue, 18 Dec 2012 23:45:34 -0500 Subject: [Python-ideas] async: feedback on EventLoop API In-Reply-To: References: <3B90AC3A-C73B-4BD1-9BE8-9ECF21F0D243@umbrellacode.com> Message-ID: I guess this is a good place as any to bring this up, but we really need to address issues with error handling and things like par(). par() has one way to handle errors: if one task (using it as a general term to encompass futures and coroutines) fails, all tasks fail. This is nowhere near acceptable. As a simple example, par(grab_page(" http://google.com"), grab_page("http://yahoo.com")) should not fail if one of the two sites returns a 500; the results of another may still be useful to us. I can think of an approach that doesn't require passing more arguments to par(), but may be absurdly silly: the results generated by par() are not directly results returned by the task, but instead an intermediate wrapper value that allows us to hoist the error handling into the caller. for intermediate in par(*tasks): try: result = intermediate.result() except ValueError as e: print("bad") else: print("good") But this makes the trade-off that you can't immediately cancel all the other tasks when one task fails. The only truly way to be notified when a task has finished, either with success, with error, is a callback, which I think we should flesh out entirely in our Futures model. And, of course, we should make sure that we can handle the four situations mentioned in [0] , even if we don't solve them with callbacks. [0] https://gist.github.com/3889970 On Tue, Dec 18, 2012 at 11:28 PM, Guido van Rossum wrote: > The point of PEP 3156 is not to make using callbacks easy. It is to > make callbacks mostly disappear in favor of coroutines, but keeping > them around in order to provide interoperability with callback-based > frameworks such as Twisted or Tornado. > > Your handlers appear to be an attempt at reinventing Twisted's > Deferred. But Deferred already exists, and it works perfectly fine > with the current callback-based event loop spec in the PEP. It's not > clear how your handlers will enable a coroutine to wait for the result > (or exception) however. > > --Guido > > On Tue, Dec 18, 2012 at 7:44 PM, Shane Green > wrote: > > Ignoring the API overlap with Futures/Promises for a moment, let me throw > > out this straw man approach to the event loop that seems to my naive eye > > like it pull together a lot of these ideas? > > > > Rather than passing in your callbacks, factories, etc., asynchronous APIs > > return a lightweight object you register your callback with. Unlike > > promises, deferrers, etc., this is a one-time thing: only one callback > can > > register with it. However, it can be chained. The registered callback > is > > invoked with the output of the operation when it completes. 
> > > > Timer.wait(20).then(callme, *args, **kw) > > # I could do > > Timer.wait(20).then(callme, *args, **kw).then(piped_from_callme) > > > > #I could not do > > handler = Timer.wait(20) > > handler.then(callme) > > handler.then(callme2) # this would throw an exception. > > > > # I/O example? > > sock.accept().then(handle_connection) # invokes handle_connection(conn, > > addr) > > # Read some data > > conn.read(1024).then(handle_incoming) # handle_incoming invoked with up > to > > 1024 bytes, read asynchronously. > > # Write some data > > conn.write("data").then(handle_written) # handle_written invoked with up > > number 5, giving number of bytes written async. > > # Connect HTTP channel and add it to HTTP dispatcher. > > channel.connect((hostname,80)).then(dispatcher.add_channel) > > > > > > # Listen to FD's for I/O events > > descriptors.select(r, w, e).then(handle) # handle(readable, writables, > oobs) > > > > It seems like only supporting a single callback per returned handle lets > us > > circumvent a lot of the weight associated with normal > > promise/future/deferred pattern type implementations, but the chaining > could > > come in handy as it may cover some of the use-cases being considered when > > multiple events per fd came up, plus chaining is pretty powerful, > especially > > when it comes at little cost. The API would be much more extensive than > > "then()", of course, with things like "every", etc. we'd have to pull > > examples from everything already discussed. Just wanted to throw out > there > > to get beat up about ;-) > > > > > > > > Shane Green > > www.umbrellacode.com > > 805-452-9666 | shane at umbrellacode.com > > > > > > _______________________________________________ > > Python-ideas mailing list > > Python-ideas at python.org > > http://mail.python.org/mailman/listinfo/python-ideas > > > > > > -- > --Guido van Rossum (python.org/~guido) > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -- Jasper -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Wed Dec 19 06:36:34 2012 From: guido at python.org (Guido van Rossum) Date: Tue, 18 Dec 2012 21:36:34 -0800 Subject: [Python-ideas] async: feedback on EventLoop API In-Reply-To: References: <3B90AC3A-C73B-4BD1-9BE8-9ECF21F0D243@umbrellacode.com> Message-ID: On Tue, Dec 18, 2012 at 8:45 PM, Jasper St. Pierre wrote: > I guess this is a good place as any to bring this up, but we really need to > address issues with error handling and things like par(). > > par() has one way to handle errors: if one task (using it as a general term > to encompass futures and coroutines) fails, all tasks fail. > > This is nowhere near acceptable. As a simple example, > par(grab_page("http://google.com"), grab_page("http://yahoo.com")) should > not fail if one of the two sites returns a 500; the results of another may > still be useful to us. Yes, there need to be a few variants. If you want all the results, regardless of errors, we can provide a variant of par() whose result is a list of futures instead of a list of results (or a single exception). This could also add a timeout. There also needs to be a way to take a set of tasks and wait for the first one to complete. (In fact, put a timeout on this and you can build any other variant easily.) 
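For illustration only, that "futures, not results" variant could be built on top of the wait_one() primitive sketched just below; par_futures is a made-up name, the timeout is left out, and it assumes wait_one() hands back each completed future without re-raising its exception:

    def par_futures(fs):
        fs = set(fs)                        # the pending futures/tasks
        done = []
        while fs:
            f = yield from wait_one(fs)     # next one to finish, success or error
            fs.remove(f)
            done.append(f)                  # any exception stays stored on f
        return done    # caller inspects f.result() / f.exception() per item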
PEP 3148 probably shows the way here, it has as_completed() and wait(), although we cannot emulate these APIs exactly (since they block -- we need something you can use in a yield from, e.g. fs = {set of Futures} while fs: f = yield from wait_one(fs) # Optionally with a timeout fs.remove(f) (We could possibly do the remove() call ih wait_one(), although that may limit the argument type to a set.) > I can think of an approach that doesn't require passing more arguments to > par(), but may be absurdly silly: the results generated by par() are not > directly results returned by the task, but instead an intermediate wrapper > value that allows us to hoist the error handling into the caller. > > for intermediate in par(*tasks): > try: > result = intermediate.result() > except ValueError as e: > print("bad") > else: > print("good") > > But this makes the trade-off that you can't immediately cancel all the other > tasks when one task fails. Yeah, that's the par() variant that returns futures instead of results. > The only truly way to be notified when a task has finished, either with > success, with error, is a callback, which I think we should flesh out > entirely in our Futures model. Proposal? > And, of course, we should make sure that we can handle the four situations > mentioned in [0] , even if we don't solve them with callbacks. > > [0] https://gist.github.com/3889970 That's longwinded and written in a confrontational style. Can you summarize? -- --Guido van Rossum (python.org/~guido) From jstpierre at mecheye.net Wed Dec 19 07:10:36 2012 From: jstpierre at mecheye.net (Jasper St. Pierre) Date: Wed, 19 Dec 2012 01:10:36 -0500 Subject: [Python-ideas] async: feedback on EventLoop API In-Reply-To: References: <3B90AC3A-C73B-4BD1-9BE8-9ECF21F0D243@umbrellacode.com> Message-ID: On Wed, Dec 19, 2012 at 12:36 AM, Guido van Rossum wrote: > On Tue, Dec 18, 2012 at 8:45 PM, Jasper St. Pierre > wrote: > > I guess this is a good place as any to bring this up, but we really need > to > > address issues with error handling and things like par(). > > > > par() has one way to handle errors: if one task (using it as a general > term > > to encompass futures and coroutines) fails, all tasks fail. > > > > This is nowhere near acceptable. As a simple example, > > par(grab_page("http://google.com"), grab_page("http://yahoo.com")) > should > > not fail if one of the two sites returns a 500; the results of another > may > > still be useful to us. > > Yes, there need to be a few variants. If you want all the results, > regardless of errors, we can provide a variant of par() whose result > is a list of futures instead of a list of results (or a single > exception). This could also add a timeout. There also needs to be a > way to take a set of tasks and wait for the first one to complete. (In > fact, put a timeout on this and you can build any other variant > easily.) > > PEP 3148 probably shows the way here, it has as_completed() and > wait(), although we cannot emulate these APIs exactly (since they > block -- we need something you can use in a yield from, e.g. > > fs = {set of Futures} > while fs: > f = yield from wait_one(fs) # Optionally with a timeout > fs.remove(f) > > > (We could possibly do the remove() call ih wait_one(), although that > may limit the argument type to a set.) 
> > > I can think of an approach that doesn't require passing more arguments to > > par(), but may be absurdly silly: the results generated by par() are not > > directly results returned by the task, but instead an intermediate > wrapper > > value that allows us to hoist the error handling into the caller. > > > > for intermediate in par(*tasks): > > try: > > result = intermediate.result() > > except ValueError as e: > > print("bad") > > else: > > print("good") > > > > But this makes the trade-off that you can't immediately cancel all the > other > > tasks when one task fails. > > Yeah, that's the par() variant that returns futures instead of results. > > > The only truly way to be notified when a task has finished, either with > > success, with error, is a callback, which I think we should flesh out > > entirely in our Futures model. > > Proposal? > I'm not sure if this will work out, but I think the par() could have some sort of "immediate result" callback which fires when one of the sub-tasks fire. If we then take out the part where we fail and abort automatically, we might have a close enough approximation: def fail_silently(par_task, subtask): try: return subtask.result() except Exception as e: print("grabbing failed", e) return None pages = list(yield par(grab_page("http://google.com"), grab_page(" http://yahoo.com"), subtask_completed=fail_silently)) Where par returns a list of values instead of a list of tasks. But maybe the ability to manipulate the return value from the subtask completion callback hands it a bit too much power. I like the initial approach, but the details need fleshing out. I think it would be neat if we could have several standard behaviors in the stdlib: subtask_completed=fail_silently, subtask_completed=abort_task, etc. > And, of course, we should make sure that we can handle the four situations > > mentioned in [0] , even if we don't solve them with callbacks. > > > > [0] https://gist.github.com/3889970 > > That's longwinded and written in a confrontational style. Can you > summarize? > Yeah, this was more at a lament at libraries like jQuery that implement the CommonJS Promise/A specification wrong. It's really only relevant if we choose to add errbacks, as it's about the composition and sematics between callbacks/errbacks, and chaining the two. -- > --Guido van Rossum (python.org/~guido) > -- Jasper -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Wed Dec 19 07:24:50 2012 From: guido at python.org (Guido van Rossum) Date: Tue, 18 Dec 2012 22:24:50 -0800 Subject: [Python-ideas] async: feedback on EventLoop API In-Reply-To: References: <3B90AC3A-C73B-4BD1-9BE8-9ECF21F0D243@umbrellacode.com> Message-ID: On Tuesday, December 18, 2012, Jasper St. Pierre wrote: > On Wed, Dec 19, 2012 at 12:36 AM, Guido van Rossum > > wrote: > >> On Tue, Dec 18, 2012 at 8:45 PM, Jasper St. Pierre >> > 'jstpierre at mecheye.net');>> wrote: >> > I guess this is a good place as any to bring this up, but we really >> need to >> > address issues with error handling and things like par(). >> > >> > par() has one way to handle errors: if one task (using it as a general >> term >> > to encompass futures and coroutines) fails, all tasks fail. >> > >> > This is nowhere near acceptable. As a simple example, >> > par(grab_page("http://google.com"), grab_page("http://yahoo.com")) >> should >> > not fail if one of the two sites returns a 500; the results of another >> may >> > still be useful to us. 
>> >> Yes, there need to be a few variants. If you want all the results, >> regardless of errors, we can provide a variant of par() whose result >> is a list of futures instead of a list of results (or a single >> exception). This could also add a timeout. There also needs to be a >> way to take a set of tasks and wait for the first one to complete. (In >> fact, put a timeout on this and you can build any other variant >> easily.) >> >> PEP 3148 probably shows the way here, it has as_completed() and >> wait(), although we cannot emulate these APIs exactly (since they >> block -- we need something you can use in a yield from, e.g. >> >> fs = {set of Futures} >> while fs: >> f = yield from wait_one(fs) # Optionally with a timeout >> fs.remove(f) >> >> >> (We could possibly do the remove() call ih wait_one(), although that >> may limit the argument type to a set.) >> >> > I can think of an approach that doesn't require passing more arguments >> to >> > par(), but may be absurdly silly: the results generated by par() are not >> > directly results returned by the task, but instead an intermediate >> wrapper >> > value that allows us to hoist the error handling into the caller. >> > >> > for intermediate in par(*tasks): >> > try: >> > result = intermediate.result() >> > except ValueError as e: >> > print("bad") >> > else: >> > print("good") >> > >> > But this makes the trade-off that you can't immediately cancel all the >> other >> > tasks when one task fails. >> >> Yeah, that's the par() variant that returns futures instead of results. >> >> > The only truly way to be notified when a task has finished, either with >> > success, with error, is a callback, which I think we should flesh out >> > entirely in our Futures model. >> >> Proposal? >> > > I'm not sure if this will work out, but I think the par() could have some > sort of "immediate result" callback which fires when one of the sub-tasks > fire. If we then take out the part where we fail and abort automatically, > we might have a close enough approximation: > > def fail_silently(par_task, subtask): > try: > return subtask.result() > except Exception as e: > print("grabbing failed", e) > return None > > pages = list(yield par(grab_page("http://google.com"), grab_page(" > http://yahoo.com"), subtask_completed=fail_silently)) > > Where par returns a list of values instead of a list of tasks. But maybe > the ability to manipulate the return value from the subtask completion > callback hands it a bit too much power. > That looks reasonable too, although the signature may need to be adjusted. (How does it cancel the remaining tasks if it wants to? Or does par() do that if this callback raises?) maybe call it filter? But what did you think of my wait_one() proposal? It may work beter in a coroutine, where callbacks are considered a nuisance. > I like the initial approach, but the details need fleshing out. I think it > would be neat if we could have several standard behaviors in the stdlib: > subtask_completed=fail_silently, subtask_completed=abort_task, etc. > > > And, of course, we should make sure that we can handle the four >> situations >> > mentioned in [0] , even if we don't solve them with callbacks. >> > >> > [0] https://gist.github.com/3889970 >> >> That's longwinded and written in a confrontational style. Can you >> summarize? >> > > Yeah, this was more at a lament at libraries like jQuery that implement > the CommonJS Promise/A specification wrong. 
It's really only relevant if we > choose to add errbacks, as it's about the composition and sematics between > callbacks/errbacks, and chaining the two. > No, no, no! Please. No errbacks. No chaining. Coroutines have a different way to spell those already: errbacks -> except clauses, chaining -> multiple yield-froms in one coroutine, or call another coroutine. Please. --Guido -- --Guido van Rossum (on iPad) -------------- next part -------------- An HTML attachment was scrubbed... URL: From jstpierre at mecheye.net Wed Dec 19 07:41:07 2012 From: jstpierre at mecheye.net (Jasper St. Pierre) Date: Wed, 19 Dec 2012 01:41:07 -0500 Subject: [Python-ideas] async: feedback on EventLoop API In-Reply-To: References: <3B90AC3A-C73B-4BD1-9BE8-9ECF21F0D243@umbrellacode.com> Message-ID: On Wed, Dec 19, 2012 at 1:24 AM, Guido van Rossum wrote: ... snip ... That looks reasonable too, although the signature may need to be adjusted. > (How does it cancel the remaining tasks if it wants to? Or does par() do > that if this callback raises?) maybe call it filter? > The subtask completion callback can call abort() on the overall par_task, which could cancel the rest of the unfinished tasks. def abort_task(par_task, subtask): try: return subtask.result() except ValueError: par_task.abort() The issue with this approach is that since the par() would return values again, not tasks, we'd can't handle errors locally. Futures are also immutable, so we can't modify the values after they resolve. Maybe we'd have something like: def fail_silently(par_task, subtask): try: subtask.result() except ValueError as e: return Future.completed(None) # an already completed future that has a value of None, sorry, don't remember the exact spelling else: return subtask which allows us: for task in par(*tasks, subtask_completion=fail_silently): # ... Which allows us both local error handling, as well as batch error handling. But it's very verbose from the side of the callback. Hm. > But what did you think of my wait_one() proposal? It may work beter in a > coroutine, where callbacks are considered a nuisance. > To be honest, I didn't quite understand it. I'd have to go back and re-read PEP 3148. > I like the initial approach, but the details need fleshing out. I think it >> would be neat if we could have several standard behaviors in the stdlib: >> subtask_completed=fail_silently, subtask_completed=abort_task, etc. >> >> > And, of course, we should make sure that we can handle the four >>> situations >>> > mentioned in [0] , even if we don't solve them with callbacks. >>> > >>> > [0] https://gist.github.com/3889970 >>> >>> That's longwinded and written in a confrontational style. Can you >>> summarize? >>> >> >> Yeah, this was more at a lament at libraries like jQuery that implement >> the CommonJS Promise/A specification wrong. It's really only relevant if we >> choose to add errbacks, as it's about the composition and sematics between >> callbacks/errbacks, and chaining the two. >> > > No, no, no! Please. No errbacks. No chaining. Coroutines have a different > way to spell those already: errbacks -> except clauses, chaining -> > multiple yield-froms in one coroutine, or call another coroutine. Please. > > --Guido > > > -- > --Guido van Rossum (on iPad) > -- Jasper -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From g.rodola at gmail.com Wed Dec 19 15:51:43 2012 From: g.rodola at gmail.com (=?ISO-8859-1?Q?Giampaolo_Rodol=E0?=) Date: Wed, 19 Dec 2012 15:51:43 +0100 Subject: [Python-ideas] PEP 3156 feedback In-Reply-To: References: <20121218110136.1f85cfae@pitrou.net> Message-ID: 2012/12/18 Guido van Rossum > > On Tue, Dec 18, 2012 at 2:01 AM, Antoine Pitrou wrote:> Event loop API > > -------------- > > > > I would like to say that I prefer Tornado's model: for each primitive > > provided by Tornado, you can pass an explicit Loop instance which you > > instantiated manually. > > There is no module function or policy object hiding this mechanism: > > it's simple, explicit and flexible (in other words: if you want a > > per-thread event loop, just do it yourself using TLS :-)). > > It sounds though as if the explicit loop is optional, and still > defaults to some global default loop? > > Having one global loop shared by multiple threads is iffy though. Only > one thread should be *running* the loop, otherwise the loop can' be > used as a mutual exclusion device. Worse, all primitives for adding > and removing callbacks/handlers must be made threadsafe, and then > basically the entire event loop becomes full of locks, which seems > wrong to me. The basic idea is to have multiple threads/processes, each running its own IO loop. No locks are required because each IO poller instance will deal with its own socket-map / callbacks-queue and no resources are shared. In asyncore this was achieved by introducing the "map" parameter. Similarly to Tornado, pyftpdlib uses an "ioloop" parameter which can be passed to all the classes which will handle the connection (the handlers). If "ioloop" is provided all the handlers will use that (...and register() against it, add_reader() etc..) otherwise the "global" ioloop instance will be used (default). A dynamic IO poller like this is important because in case the connection handlers are forced to block for some reason, you can switch from a concurrency model (async / non-blocking) to another (multi threads/process) very easily. See: http://code.google.com/p/pyftpdlib/issues/detail?id=212#c9 http://code.google.com/p/pyftpdlib/source/browse/trunk/pyftpdlib/servers.py?spec=svn1137&r=1137 Hope this helps, --- Giampaolo http://code.google.com/p/pyftpdlib/ http://code.google.com/p/psutil/ http://code.google.com/p/pysendfile/ From techtonik at gmail.com Wed Dec 19 16:11:10 2012 From: techtonik at gmail.com (anatoly techtonik) Date: Wed, 19 Dec 2012 18:11:10 +0300 Subject: [Python-ideas] Tree as a data structure (Was: Graph class) Message-ID: On Sun, Dec 16, 2012 at 6:41 PM, Guido van Rossum wrote: > I think of graphs and trees as patterns, not data structures. In my world strings, ints and lists are 1D data types, and tree can be a very important 2D data structure. Even if it is a pattern, this pattern is vital for the transformation of structured data, because it allows to represent any data structure in canonical format. Speaking of tree as a data structure, I assume that it has a very basic definition: 1. tree consists of nodes 2. some nodes are containers for other nodes 3. every node has properties 4. every node has 0 or 1 parent 5. every container has 1+ children 6. tree has a single starting root node 7. no child of a parent can be its ancestor (no cyclic dependencies between elements) List of trees is a forest. Every subtree is a complete tree. To see which tree data type would be really useful in Python distribution (e.g. 
provides a simple, extendable and intuitive interface), I see only one way - is to scratching some itches relevant to Python and then try to scale it to other problems. The outcome should be the answer - what for native tree type is not suitable? More ideas: [ ] Much experience for working with trees can be brought from XML and DOM manipulation practices (jQuery and friends) [ ] every element in a tree can be accessed by its address specificator as 'root/node[3]/last' [ ] but it is also convenient to access tree data using node names as 'mylib.books[:1]' [ ] and of course, you can run queries over trees [ ] Tree is the base for any "Data Transformation Framework" as it allows to jump from "data type conversion" to "data structure conversion and mapping" [ ] Trees can be converted to other trees and to more complicated structures [ ] Conversion can be symmetrical and non-symmetrical [ ] Conversion can be lossy and lossless [ ] Conversion can be lossless and non-symmetrical at the same time Trees can be used, for example, for realtime migration of issues from one tracker to another. For managing changesets with additional meta information. For presenting package dependencies and working with them. For atomic (transactional) file management. For managing operating system capability information. For logging setup. For debugging structures in Python. For working and converting binary file formats. For the common AST transformation and query interface. For the understanding how 2to3 fixers work. For the common ground of visual representation, comparison and transformation of data structures. That's probably enough of my itches. -------------- next part -------------- An HTML attachment was scrubbed... URL: From jeff at jeffreyjenkins.ca Wed Dec 19 17:07:31 2012 From: jeff at jeffreyjenkins.ca (Jeff Jenkins) Date: Wed, 19 Dec 2012 11:07:31 -0500 Subject: [Python-ideas] Tree as a data structure (Was: Graph class) In-Reply-To: References: Message-ID: trying again, this email address was apparently not on the list: My experience dealing with trees is always that the "tree" part is always so simple that it isn't a big deal to re-implement it. The problem is dealing with all of the extra stuff that you need and the details of what happens when you do different operations. I think it makes more sense to have an interface for a kind of thing you want to do with a tree (e.g. sorted sets, or ordered maps) rather than the tree itself. On Wed, Dec 19, 2012 at 10:55 AM, Jeff Jenkins wrote: > My experience dealing with trees is always that the "tree" part is always > so simple that it isn't a big deal to re-implement it. The problem is > dealing with all of the extra stuff that you need and the details of what > happens when you do different operations. I think it makes more sense to > have an interface for a kind of thing you want to do with a tree (e.g. > sorted sets, or ordered maps) rather than the tree itself. > > > On Wed, Dec 19, 2012 at 10:11 AM, anatoly techtonik wrote: > >> On Sun, Dec 16, 2012 at 6:41 PM, Guido van Rossum wrote: >> >>> I think of graphs and trees as patterns, not data structures. >> >> >> In my world strings, ints and lists are 1D data types, and tree can be a >> very important 2D data structure. Even if it is a pattern, this pattern is >> vital for the transformation of structured data, because it allows to >> represent any data structure in canonical format. >> >> Speaking of tree as a data structure, I assume that it has a very basic >> definition: >> >> 1. 
tree consists of nodes >> 2. some nodes are containers for other nodes >> 3. every node has properties >> 4. every node has 0 or 1 parent >> 5. every container has 1+ children >> 6. tree has a single starting root node >> 7. no child of a parent can be its ancestor >> (no cyclic dependencies between elements) >> >> List of trees is a forest. Every subtree is a complete tree. >> >> >> To see which tree data type would be really useful in Python distribution >> (e.g. provides a simple, extendable and intuitive interface), I see only >> one way - is to scratching some itches relevant to Python and then try to >> scale it to other problems. The outcome should be the answer - >> what for native tree type is not suitable? >> >> More ideas: >> [ ] Much experience for working with trees can be brought from XML and >> DOM manipulation practices (jQuery and friends) >> [ ] every element in a tree can be accessed by its address specificator >> as 'root/node[3]/last' >> [ ] but it is also convenient to access tree data using node names as >> 'mylib.books[:1]' >> [ ] and of course, you can run queries over trees >> >> [ ] Tree is the base for any "Data Transformation Framework" as it allows >> to jump from "data type conversion" to "data structure conversion and >> mapping" >> [ ] Trees can be converted to other trees and to more complicated >> structures >> [ ] Conversion can be symmetrical and non-symmetrical >> [ ] Conversion can be lossy and lossless >> [ ] Conversion can be lossless and non-symmetrical at the same time >> >> Trees can be used, for example, for realtime migration of issues from one >> tracker to another. For managing changesets with additional meta >> information. For presenting package dependencies and working with them. For >> atomic (transactional) file management. For managing operating system >> capability information. For logging setup. For debugging structures in >> Python. For working and converting binary file formats. For the common AST >> transformation and query interface. For the understanding how 2to3 fixers >> work. For the common ground of visual representation, comparison and >> transformation of data structures. That's probably enough of my itches. >> >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> http://mail.python.org/mailman/listinfo/python-ideas >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jimjjewett at gmail.com Wed Dec 19 17:38:03 2012 From: jimjjewett at gmail.com (Jim Jewett) Date: Wed, 19 Dec 2012 11:38:03 -0500 Subject: [Python-ideas] Tree as a data structure (Was: Graph class) In-Reply-To: References: Message-ID: On 12/19/12, anatoly techtonik wrote: > On Sun, Dec 16, 2012 at 6:41 PM, Guido van Rossum wrote: >> I think of graphs and trees as patterns, not data structures. > In my world strings, ints and lists are 1D data types, and tree can be a > very important 2D data structure. Yes; the catch is that the details of that data structure will differ depending on the problem. Most problems do not need the fancy algorithms -- or the extra overhead that supports them. Since a simple tree (or graph) is easy to write, and the fiddly details are often -- but not always -- wasted overhead, it doesn't make sense to designate a single physical structure as "the" tree (or graph) representation. So it stays a pattern, rather than a concrete data structure. > Speaking of tree as a data structure, I assume that it has a very basic > definition: > 1. 
tree consists of nodes > 2. some nodes are containers for other nodes Are the leaves a different type, or just nodes that happen to have zero children at the moment? > 3. every node has properties What sort of properties? A single value of a given class, plus some binary flags that are internal to the graph implementation? A fixed set of values that occur on every node? (Possibly differing between leaves and regular nodes?) A fixed value (used for ordering) plus an arbitrary collection that can vary by node? > More ideas: > [ ] every element in a tree can be accessed by its address specificator > as 'root/node[3]/last' That assumes an arbitrary number of children, and that the children are ordered. A sensible choice, but it adds way too much overhead for some cases. (And of course, the same goes for the overhead of balancing, etc.) -jJ From guido at python.org Wed Dec 19 17:55:02 2012 From: guido at python.org (Guido van Rossum) Date: Wed, 19 Dec 2012 08:55:02 -0800 Subject: [Python-ideas] PEP 3156 feedback In-Reply-To: References: <20121218110136.1f85cfae@pitrou.net> Message-ID: On Wed, Dec 19, 2012 at 6:51 AM, Giampaolo Rodol? wrote: > 2012/12/18 Guido van Rossum >> >> On Tue, Dec 18, 2012 at 2:01 AM, Antoine Pitrou wrote:> Event loop API >> > -------------- >> > >> > I would like to say that I prefer Tornado's model: for each primitive >> > provided by Tornado, you can pass an explicit Loop instance which you >> > instantiated manually. >> > There is no module function or policy object hiding this mechanism: >> > it's simple, explicit and flexible (in other words: if you want a >> > per-thread event loop, just do it yourself using TLS :-)). >> >> It sounds though as if the explicit loop is optional, and still >> defaults to some global default loop? >> >> Having one global loop shared by multiple threads is iffy though. Only >> one thread should be *running* the loop, otherwise the loop can' be >> used as a mutual exclusion device. Worse, all primitives for adding >> and removing callbacks/handlers must be made threadsafe, and then >> basically the entire event loop becomes full of locks, which seems >> wrong to me. > > The basic idea is to have multiple threads/processes, each running its > own IO loop. I understand that, and the Tulip implementation supports this. However different frameworks may have different policies (e.g. AFAIK Twisted only supports one reactor, period, and it is not threadsafe). I don't want to put requirements in the PEP that *require* compliant implementations to support the loop-per-thread model. OTOH I do want compliant implementations to decide on their own policy. I guess the minimal requirement for a compliant implementation is that callbacks associated with the same loop are serialized and never executed concurrently on different threads. > No locks are required because each IO poller instance will deal with > its own socket-map / callbacks-queue and no resources are shared. > In asyncore this was achieved by introducing the "map" parameter. > Similarly to Tornado, pyftpdlib uses an "ioloop" parameter which can > be passed to all the classes which will handle the connection (the > handlers). Read the description in the PEP of the event loop policy, or the default implementation in Tulip. It discourages user code from creating new event loops (since the framework may not support this) but does not prevent e.g. unit tests from creating a new loop for each test (even Twisted supports that). 
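As a rough sketch of that per-test pattern (illustrative only: new_event_loop(), set_event_loop(), close() and run_until_complete() stand in for whatever names the PEP and Tulip actually expose, and echo_client is a made-up test coroutine):

    import unittest

    class LoopPerTest(unittest.TestCase):
        def setUp(self):
            # Give each test its own private loop and install it as the
            # current one, so state never leaks between tests.
            self.loop = new_event_loop()
            set_event_loop(self.loop)

        def tearDown(self):
            self.loop.close()

        def test_echo(self):
            # Drive a test coroutine to completion on the private loop.
            result = self.loop.run_until_complete(echo_client('ping'))
            self.assertEqual(result, 'ping')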
> If "ioloop" is provided all the handlers will use that (...and > register() against it, add_reader() etc..) otherwise the "global" > ioloop instance will be used (default). > A dynamic IO poller like this is important because in case the > connection handlers are forced to block for some reason, you can > switch from a concurrency model (async / non-blocking) to another > (multi threads/process) very easily. Did you see run_in_executor() and wrap_future() in the PEP or in the Tulip implementation? They make it perfectly simple to run something in another thread (and the default implementation will use this to call getaddrinfo(), since the stdlib wrappers for it have no async version. The two APIs are even capable of using a ProcessPoolExecutor. > See: > http://code.google.com/p/pyftpdlib/issues/detail?id=212#c9 > http://code.google.com/p/pyftpdlib/source/browse/trunk/pyftpdlib/servers.py?spec=svn1137&r=1137 Of course, if all you want is a server that creates a new thread or process for each connection, PEP 3156 and Tulip are overkill -- in that case there's no reason not to use the stdlib's SocketServer class, which has supported this for over a decade. :-) -- --Guido van Rossum (python.org/~guido) From feedbackflow at gmail.com Wed Dec 19 17:51:05 2012 From: feedbackflow at gmail.com (Bart Thate) Date: Wed, 19 Dec 2012 17:51:05 +0100 Subject: [Python-ideas] context aware execution In-Reply-To: References: Message-ID: Thanks for your response Chris ! Ha ! the job of the mad man is todo the things the are not "advisable" and see what gives. Like why it is not advisable and, if possible, their are ways to achieve things that are previously overseen. i already do a lot of travelling of the callstack to see from where a function is called. mostely for logging purposes, like what plugin registered this callback etc. lately i also had the need to log the variable name of a object, and the thought of be able to "break out of the namespace" concept got me thinking what i am thinking of is code that can examine the context it is run in. The object, when called can figure out in what kind of space it is living in and discover what kind of API other objects in the space offer. This is a pre function that gets called before the actual function/method and can thus determine if a certain request can be fullfilled. I bet a decorator could be made, in which i can assign certain caller context references into variables in the function/method ? I use 1 generic parameter object, in which i can stuff lots of things, but i rather have the function be able to see what is around ;] Think of sending JSON over the wire, reconstruct an object with it and then let the object figure out what it can and cannot do in this external environment. Code i use now is this: # life/plugs/context.py # # """ show context. """ ## basic import import sys ## context command def context(event): result = [] frame = sys._getframe() code = frame.f_back.f_code for i in dir(code): print("%s => %s" % (i, getattr(code, i))) del frame context.cmnd = "context" So much to explore ;] -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Wed Dec 19 18:26:33 2012 From: guido at python.org (Guido van Rossum) Date: Wed, 19 Dec 2012 09:26:33 -0800 Subject: [Python-ideas] async: feedback on EventLoop API In-Reply-To: References: <3B90AC3A-C73B-4BD1-9BE8-9ECF21F0D243@umbrellacode.com> Message-ID: On Tue, Dec 18, 2012 at 10:41 PM, Jasper St. 
Pierre wrote: > On Wed, Dec 19, 2012 at 1:24 AM, Guido van Rossum wrote: > > ... snip ... > >> That looks reasonable too, although the signature may need to be adjusted. >> (How does it cancel the remaining tasks if it wants to? Or does par() do >> that if this callback raises?) maybe call it filter? > > > The subtask completion callback can call abort() on the overall par_task, Tasks don't have abort(), I suppose you meant cancel(). > which could cancel the rest of the unfinished tasks. > > def abort_task(par_task, subtask): > try: > return subtask.result() > except ValueError: > par_task.abort() > > The issue with this approach is that since the par() would return values > again, not tasks, we'd can't handle errors locally. Futures are also > immutable, so we can't modify the values after they resolve. Maybe we'd have > something like: > > def fail_silently(par_task, subtask): > try: > subtask.result() > except ValueError as e: > return Future.completed(None) # an already completed future that > has a value of None, sorry, don't remember the exact spelling > else: > return subtask > > which allows us: > > for task in par(*tasks, subtask_completion=fail_silently): > # ... > > Which allows us both local error handling, as well as batch error handling. > But it's very verbose from the side of the callback. Hm. Hm indeed. Unless you can get your thoughts straight I think I'd rather go with the wait_one() API, which can be used to build anything else you like, but doesn't require one to be quite so clever with callbacks. (Did I say I hate callbacks?) -- --Guido van Rossum (python.org/~guido) From solipsis at pitrou.net Wed Dec 19 19:55:55 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 19 Dec 2012 19:55:55 +0100 Subject: [Python-ideas] PEP 3156 feedback References: <20121218110136.1f85cfae@pitrou.net> Message-ID: <20121219195555.577593f2@pitrou.net> On Wed, 19 Dec 2012 08:55:02 -0800 Guido van Rossum wrote: > On Wed, Dec 19, 2012 at 6:51 AM, Giampaolo Rodol? wrote: > > 2012/12/18 Guido van Rossum > >> > >> On Tue, Dec 18, 2012 at 2:01 AM, Antoine Pitrou wrote:> Event loop API > >> > -------------- > >> > > >> > I would like to say that I prefer Tornado's model: for each primitive > >> > provided by Tornado, you can pass an explicit Loop instance which you > >> > instantiated manually. > >> > There is no module function or policy object hiding this mechanism: > >> > it's simple, explicit and flexible (in other words: if you want a > >> > per-thread event loop, just do it yourself using TLS :-)). > >> > >> It sounds though as if the explicit loop is optional, and still > >> defaults to some global default loop? > >> > >> Having one global loop shared by multiple threads is iffy though. Only > >> one thread should be *running* the loop, otherwise the loop can' be > >> used as a mutual exclusion device. Worse, all primitives for adding > >> and removing callbacks/handlers must be made threadsafe, and then > >> basically the entire event loop becomes full of locks, which seems > >> wrong to me. > > > > The basic idea is to have multiple threads/processes, each running its > > own IO loop. > > I understand that, and the Tulip implementation supports this. However > different frameworks may have different policies (e.g. AFAIK Twisted > only supports one reactor, period, and it is not threadsafe). I don't > want to put requirements in the PEP that *require* compliant > implementations to support the loop-per-thread model. 
Why not let implementations raise NotImplementedError when they don't want to support certain use cases? > Read the description in the PEP of the event loop policy, or the > default implementation in Tulip. It discourages user code from > creating new event loops (since the framework may not support this) > but does not prevent e.g. unit tests from creating a new loop for each > test (even Twisted supports that). Is it the plan that code written for an event loop will always work with another one? Will tulip offer more than the GCD of the other event loops? Regards Antoine. From guido at python.org Thu Dec 20 00:40:36 2012 From: guido at python.org (Guido van Rossum) Date: Wed, 19 Dec 2012 15:40:36 -0800 Subject: [Python-ideas] PEP 3156 feedback In-Reply-To: <20121219195555.577593f2@pitrou.net> References: <20121218110136.1f85cfae@pitrou.net> <20121219195555.577593f2@pitrou.net> Message-ID: On Wed, Dec 19, 2012 at 10:55 AM, Antoine Pitrou wrote: > Why not let implementations raise NotImplementedError when they don't > want to support certain use cases? That's always a last resort, but the problem is that an app or library can't be sure that everything will work, and the failure might be subtle and late. That said, my remark about the loop needing to be wholly threadsafe was misguided. I think there are two reasonable policies with regards to thread that any reasonable implementation could follow: 1. There's only one loop, it runs in a dedicated thread, and other threads can only use call_soon_threadsafe(). 2. There's (potentially) a loop per thread, and these are effectively independent. (TBD: How would these pass work or results between one another? Probably by calling call_soon_threadsafe() back and forth.) The default implementation actually takes a halfway position: it supports (2), but you must manually call init_event_loop() in each thread except for the main thread, and you must call run() in each thread, including the main thread. The requirement to call init_event_loop() is to prevent code running in some random thread trying to schedule callback, which would never run because the thread isn't calling run(). When we get further along we may have a compliance test suite, separate from the unittests (I am working on unittests but I'm aware they aren't at all thorough yet). > Is it the plan that code written for an event loop will always work with > another one? The plan is to make it easy to write code that will work with all (or most) event loops, without making it impossible to write code that depends on a specific event loop implementation. This is Python's general attitude about platform-specific APIs. > Will tulip offer more than the GCD of the other event loops? People writing PEP 3156 compliant implementations on top of some other event loop, whether it's Twisted or libuv, may have to emulate some functionality, and there will also be some functionality that their underlying loop supports that PEP 3156 doesn't. The goal is to offer a wide enough range of features that it's possible to write many useful types of apps without resorting to platform-specific APIs, and to make these fast enough. But if an app knows it will only be used with a certain loop implementation it is free to use extra APIs that only that loop offers. 
There's still a benefit in that situation: the app may be tied to a platform, but it may still want to use some 3rd party libraries that also require event loop integration, and by conforming to PEP 3156 the platform's loop implementation can ensure that such libraries actually work and interact with the rest of the app in a reasonable manner. (In particular, they should all use the same Future and Task classes.) -- --Guido van Rossum (python.org/~guido) From jonathan at slenders.be Thu Dec 20 23:52:27 2012 From: jonathan at slenders.be (Jonathan Slenders) Date: Thu, 20 Dec 2012 23:52:27 +0100 Subject: [Python-ideas] An async facade? Message-ID: Hi All, This week I finished some Python syntax on a Pypy fork. It was an experiment I was working on this week. We really needed a cleaner way of writing asynchronous code. So, instead of using the yield keyword and an @async decorator, we implemented the 'await' keyword, similar to c#. So, because I just now subscribed to python-ideas, I cannot reply immediately to the following thread: An async facade? (was Re: [Python-Dev] Socket timeout and completion based sockets) Anyway, like c# does, I implemented the await keyword for Python, and should say that I'm really confident of the usability of the result. Personally, I think this is a very clean solution for Twisted's @defer.inlineCalbacks, Tornado's @gen.engine, and similar functions in other async frameworks. We use it right now in a commercial web environment, where third party users should have to be able to write asynchronous code as easy as possible in a web based IDE. https://bitbucket.org/jonathanslenders/pypy Two interpreter hooks were added: (both accept a callable as parameter.) >>>> sys.setawaithandler(wrapper) >>>> sys.setawaitresultwrapper(result_wrapper) The first will set the I/O scheduler a functions for wrapping others functions which contain 'await' instead of 'yield'. This wrapper function will receive a generator as input. So, 'await' still acts like 'yield' for the interpreter, but the result is automatically wrapped by this function, if the await keyword was found. The second function will wrap the return result of asynchronous functions. So, unlike normal generators with 'yield' keywords, where 'await' has been used, we still can return a result. But this result will be wrapped by this function, so that the generator in the scheduler will be able te recognize the returned result. This: @defer.inlineCallbacks def async_function(deferred_param): a = yield deferred_param b = yield some_call(a) yield defer.returnValue(b) will now become: def async_function(deferred_param): a = await deferred_param b = await some_call(a) return b So, while still being explicit, it requires minimal syntax, and allows distinguishing between when to 'wait' for an asynchronous task, and when to pass the Future object around. I really have no idea whether this has been proposed before, I can only say that we are using it and it works pretty well. Cheers, Jonathan -------------- next part -------------- An HTML attachment was scrubbed... URL: From jstpierre at mecheye.net Fri Dec 21 00:17:12 2012 From: jstpierre at mecheye.net (Jasper St. Pierre) Date: Thu, 20 Dec 2012 18:17:12 -0500 Subject: [Python-ideas] An async facade? In-Reply-To: References: Message-ID: Note that the "return b" is already being handled through the "StopIteration" proposal. I'm not a fan of the new syntax because it means that removing all the "await" keywords from a method changes the return value. 
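A tiny, self-contained illustration of that hazard (all names invented): whether a call returns the value or an asynchronous wrapper is decided solely by the presence of a yield -- or, under the proposal, an await -- somewhere in the body, so deleting the last one silently changes the caller's contract unless a decorator normalizes it.

    cache = {'k': 42}

    def get_cached(key):            # no yield/await left in the body:
        return cache[key]           # calling it returns the value itself

    def get_fresh(key):             # a yield/await in the body:
        value = yield cache[key]    # calling it now returns a generator
        return value                # (or, with implicit wrapping, a Future)

    print(type(get_cached('k')))    # <class 'int'>
    print(type(get_fresh('k')))     # <class 'generator'>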
Requiring the decorator means that this can cleanly be handled in all cases, even if the decorator implementation is a bit ugly. This means that all we have left is the "await" vs. "yield" vs. "yield from" discussion. I don't think the new valuable enough to warrant a new keyword. On Thu, Dec 20, 2012 at 5:52 PM, Jonathan Slenders wrote: > Hi All, > > This week I finished some Python syntax on a Pypy fork. It was an > experiment I was working on this week. We really needed a cleaner way of > writing asynchronous code. So, instead of using the yield keyword and an > @async decorator, we implemented the 'await' keyword, similar to c#. > > So, because I just now subscribed to python-ideas, I cannot reply > immediately to the following thread: > > An async facade? (was Re: [Python-Dev] Socket timeout and completion based > sockets) > > > Anyway, like c# does, I implemented the await keyword for Python, and > should say that I'm really confident of the usability of the result. Personally, > I think this is a very clean solution for Twisted's @defer.inlineCalbacks, > Tornado's @gen.engine, and similar functions in other async frameworks. We > use it right now in a commercial web environment, where third party users > should have to be able to write asynchronous code as easy as possible in a > web based IDE. > > https://bitbucket.org/jonathanslenders/pypy > > Two interpreter hooks were added: (both accept a callable as parameter.) > > >>>> sys.setawaithandler(wrapper) > >>>> sys.setawaitresultwrapper(result_wrapper) > > > The first will set the I/O scheduler a functions for wrapping others > functions which contain 'await' instead of 'yield'. This wrapper function > will receive a generator as input. So, 'await' still acts like 'yield' for > the interpreter, but the result is automatically wrapped by this function, > if the await keyword was found. > > The second function will wrap the return result of asynchronous functions. > So, unlike normal generators with 'yield' keywords, where 'await' has been > used, we still can return a result. But this result will be wrapped by this > function, so that the generator in the scheduler will be able > te recognize the returned result. > > This: > > @defer.inlineCallbacks > def async_function(deferred_param): > a = yield deferred_param > b = yield some_call(a) > yield defer.returnValue(b) > > > will now become: > > def async_function(deferred_param): > a = await deferred_param > b = await some_call(a) > return b > > > So, while still being explicit, it requires minimal syntax, and > allows distinguishing between when to 'wait' for an asynchronous task, and > when to pass the Future object around. > > I really have no idea whether this has been proposed before, I can only > say that we are using it and it works pretty well. > > Cheers, > Jonathan > > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > > -- Jasper -------------- next part -------------- An HTML attachment was scrubbed... URL: From tjreedy at udel.edu Fri Dec 21 00:21:20 2012 From: tjreedy at udel.edu (Terry Reedy) Date: Thu, 20 Dec 2012 18:21:20 -0500 Subject: [Python-ideas] An async facade? In-Reply-To: References: Message-ID: <50D39D70.5010802@udel.edu> On 12/20/2012 5:52 PM, Jonathan Slenders wrote: Please post plain text rather than html (same for all python.org lists). Html posts ofter come out a bit weird. 
> Anyway, like c# does, I implemented the await keyword for Python, On my reader, this is normal size text. > Personally, I think this is a very clean solution for Twisted's While this was half sized micro text. (It is normal here because by default Thunderbird converts to plain text for newsgroups and I am posting via news.gmane.org.) The alternation between full and half-height characters makes your post hard to read. -- Terry Jan Reedy From jonathan at slenders.be Fri Dec 21 00:34:55 2012 From: jonathan at slenders.be (Jonathan Slenders) Date: Fri, 21 Dec 2012 00:34:55 +0100 Subject: [Python-ideas] An async facade? In-Reply-To: References: Message-ID: As removing "yield" changes the return value of a function. Nothing different. For me +1 for the "StopIteration" proposal. That's certainly better, and more generic than what I said. So, the difference is still that the "await" proposal makes the @async decorator implicit. I'm still in favor of this because in asynchronous code, you can have really many functions with this decorator. And if someone forgets about that, getting a generator object instead of a Future is quite different in semantics. P.S. excuse me, Terry. 2012/12/21 Jasper St. Pierre > Note that the "return b" is already being handled through the > "StopIteration" proposal. > > I'm not a fan of the new syntax because it means that removing all the > "await" keywords from a method changes the return value. Requiring the > decorator means that this can cleanly be handled in all cases, even if the > decorator implementation is a bit ugly. > > This means that all we have left is the "await" vs. "yield" vs. "yield > from" discussion. I don't think the new valuable enough to warrant a new > keyword. > > > > On Thu, Dec 20, 2012 at 5:52 PM, Jonathan Slenders wrote: > >> Hi All, >> >> This week I finished some Python syntax on a Pypy fork. It was an >> experiment I was working on this week. We really needed a cleaner way of >> writing asynchronous code. So, instead of using the yield keyword and an >> @async decorator, we implemented the 'await' keyword, similar to c#. >> >> So, because I just now subscribed to python-ideas, I cannot reply >> immediately to the following thread: >> >> An async facade? (was Re: [Python-Dev] Socket timeout and completion >> based sockets) >> >> >> Anyway, like c# does, I implemented the await keyword for Python, and >> should say that I'm really confident of the usability of the result. Personally, >> I think this is a very clean solution for Twisted's @defer.inlineCalbacks, >> Tornado's @gen.engine, and similar functions in other async frameworks. We >> use it right now in a commercial web environment, where third party users >> should have to be able to write asynchronous code as easy as possible in a >> web based IDE. >> >> https://bitbucket.org/jonathanslenders/pypy >> >> Two interpreter hooks were added: (both accept a callable as parameter.) >> >> >>>> sys.setawaithandler(wrapper) >> >>>> sys.setawaitresultwrapper(result_wrapper) >> >> >> The first will set the I/O scheduler a functions for wrapping others >> functions which contain 'await' instead of 'yield'. This wrapper function >> will receive a generator as input. So, 'await' still acts like 'yield' for >> the interpreter, but the result is automatically wrapped by this function, >> if the await keyword was found. >> >> The second function will wrap the return result >> of asynchronous functions. 
So, unlike normal generators with 'yield' >> keywords, where 'await' has been used, we still can return a result. But >> this result will be wrapped by this function, so that the generator in >> the scheduler will be able te recognize the returned result. >> >> This: >> >> @defer.inlineCallbacks >> def async_function(deferred_param): >> a = yield deferred_param >> b = yield some_call(a) >> yield defer.returnValue(b) >> >> >> will now become: >> >> def async_function(deferred_param): >> a = await deferred_param >> b = await some_call(a) >> return b >> >> >> So, while still being explicit, it requires minimal syntax, and >> allows distinguishing between when to 'wait' for an asynchronous task, and >> when to pass the Future object around. >> >> I really have no idea whether this has been proposed before, I can only >> say that we are using it and it works pretty well. >> >> Cheers, >> Jonathan >> >> >> >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> http://mail.python.org/mailman/listinfo/python-ideas >> >> > > > -- > Jasper > -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Fri Dec 21 00:46:03 2012 From: guido at python.org (Guido van Rossum) Date: Thu, 20 Dec 2012 15:46:03 -0800 Subject: [Python-ideas] An async facade? In-Reply-To: References: Message-ID: Have you read PEP 3156 and PEP 380? Instead of await, Python 3.3 has yield from, with the same semantics. This is somewhat more verbose, but has the advantage that it doesn't introduce a new keyword, and it's already in Python 3.3, so you can start using it now -- no fork of the language required. -- --Guido van Rossum (python.org/~guido) From guido at python.org Fri Dec 21 00:49:44 2012 From: guido at python.org (Guido van Rossum) Date: Thu, 20 Dec 2012 15:49:44 -0800 Subject: [Python-ideas] An async facade? In-Reply-To: References: Message-ID: On Thu, Dec 20, 2012 at 3:34 PM, Jonathan Slenders wrote: > So, the difference is still that the "await" proposal makes the @async > decorator implicit. I'm still in favor of this because in asynchronous code, > you can have really many functions with this decorator. And if someone > forgets about that, getting a generator object instead of a Future is quite > different in semantics. Carefully read PEP 3156, and the tulip implementation: http://code.google.com/p/tulip/source/browse/tulip/tasks.py . The @coroutine decorator is technically redundant when you use yield from. -- --Guido van Rossum (python.org/~guido) From solipsis at pitrou.net Fri Dec 21 11:51:18 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 21 Dec 2012 10:51:18 +0000 (UTC) Subject: [Python-ideas] Tree as a data structure (Was: Graph class) References: Message-ID: Jim Jewett writes: > > On 12/19/12, anatoly techtonik wrote: > > On Sun, Dec 16, 2012 at 6:41 PM, Guido van Rossum wrote: > > >> I think of graphs and trees as patterns, not data structures. > > > In my world strings, ints and lists are 1D data types, and tree can be a > > very important 2D data structure. > > Yes; the catch is that the details of that data structure will differ > depending on the problem. Most problems do not need the fancy > algorithms -- or the extra overhead that supports them. Since a > simple tree (or graph) is easy to write, and the fiddly details are > often -- but not always -- wasted overhead, it doesn't make sense to > designate a single physical structure as "the" tree (or graph) > representation. 
Do you care about the overhead of an OrderedDict? As long as you are not manipulating a huge amount of data, a generic tree structure such as provided by e.g. the networkx library is perfectly fine. And if you want to reimplement a more optimized structure, sure, that's fine. But that's not an argument against a generic data structure that would be sufficient for 99.9% of all use cases. Regards Antoine. From jstpierre at mecheye.net Fri Dec 21 12:38:53 2012 From: jstpierre at mecheye.net (Jasper St. Pierre) Date: Fri, 21 Dec 2012 06:38:53 -0500 Subject: [Python-ideas] async: feedback on EventLoop API In-Reply-To: References: <3B90AC3A-C73B-4BD1-9BE8-9ECF21F0D243@umbrellacode.com> Message-ID: I read over the wait_one() proposal again, and I still don't understand it, so it would need more explanation to me. But I don't see the point of avoiding callbacks. In this case, we have two or more in-flight requests that can be finished at any time. This does not have a synchronous code equivalent -- callbacks are pretty much the only mechanism we can use to be notified when something is done. On Wed, Dec 19, 2012 at 12:26 PM, Guido van Rossum wrote: > On Tue, Dec 18, 2012 at 10:41 PM, Jasper St. Pierre > wrote: > > On Wed, Dec 19, 2012 at 1:24 AM, Guido van Rossum > wrote: > > > > ... snip ... > > > >> That looks reasonable too, although the signature may need to be > adjusted. > >> (How does it cancel the remaining tasks if it wants to? Or does par() do > >> that if this callback raises?) maybe call it filter? > > > > > > The subtask completion callback can call abort() on the overall par_task, > > Tasks don't have abort(), I suppose you meant cancel(). > > > which could cancel the rest of the unfinished tasks. > > > > def abort_task(par_task, subtask): > > try: > > return subtask.result() > > except ValueError: > > par_task.abort() > > > > The issue with this approach is that since the par() would return values > > again, not tasks, we'd can't handle errors locally. Futures are also > > immutable, so we can't modify the values after they resolve. Maybe we'd > have > > something like: > > > > def fail_silently(par_task, subtask): > > try: > > subtask.result() > > except ValueError as e: > > return Future.completed(None) # an already completed future > that > > has a value of None, sorry, don't remember the exact spelling > > else: > > return subtask > > > > which allows us: > > > > for task in par(*tasks, subtask_completion=fail_silently): > > # ... > > > > Which allows us both local error handling, as well as batch error > handling. > > But it's very verbose from the side of the callback. Hm. > > Hm indeed. Unless you can get your thoughts straight I think I'd > rather go with the wait_one() API, which can be used to build anything > else you like, but doesn't require one to be quite so clever with > callbacks. (Did I say I hate callbacks?) > > -- > --Guido van Rossum (python.org/~guido) > -- Jasper -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan at slenders.be Fri Dec 21 12:47:53 2012 From: jonathan at slenders.be (Jonathan Slenders) Date: Fri, 21 Dec 2012 12:47:53 +0100 Subject: [Python-ideas] An async facade? In-Reply-To: References: Message-ID: Thank you, Guido! I didn't know about this PEP, but it looks interesting. I'll try to find some spare time this weekend to read through the PEP, maybe giving some feedback. Cheers! 
2012/12/21 Guido van Rossum > On Thu, Dec 20, 2012 at 3:34 PM, Jonathan Slenders > wrote: > > So, the difference is still that the "await" proposal makes the @async > > decorator implicit. I'm still in favor of this because in asynchronous > code, > > you can have really many functions with this decorator. And if someone > > forgets about that, getting a generator object instead of a Future is > quite > > different in semantics. > > Carefully read PEP 3156, and the tulip implementation: > http://code.google.com/p/tulip/source/browse/tulip/tasks.py . The > @coroutine decorator is technically redundant when you use yield from. > > -- > --Guido van Rossum (python.org/~guido) > -------------- next part -------------- An HTML attachment was scrubbed... URL: From geertj at gmail.com Fri Dec 21 15:31:33 2012 From: geertj at gmail.com (Geert Jansen) Date: Fri, 21 Dec 2012 15:31:33 +0100 Subject: [Python-ideas] Tulip patches Message-ID: Hi, [if this is not the right forum to post patches for tulip, please redirect me to the correct one. There doesn't appear to be a mailing list for tulip at the moment. And this list is where most/all of the discussion is taking place.] Please find attached 4 patches: 0001-run-fd-callbacks.patch This patch will run callbacks for readers and writers in the same loop iteration as when the fd got ready. Copying from my previous email, this is to support the following idiom: # handle_read() sets the "ready" flag loop.add_reader(fd, handle_read) while not ready: loop.run_once() The patch currently dispatches callbacks twice in each iteration, once before blocking and once after. I tried to dispatch only once after blocking, but this made the SSL transport test hang. The reason is that the create_transport task is scheduled with call_soon(), and only when the task first runs, a file descriptor is added. So unless you dispatch before blocking, this task will never get started. 0002-call-every-iteration.patch This adds a call_every_iteration() method to the event loop. This callback runs *before* blocking. 0003-fix-typo.patch 0004-remove-wrong-comments.patch Two trivial patches. Regards, Geert -------------- next part -------------- A non-text attachment was scrubbed... Name: 0001-run-fd-callbacks.patch Type: application/octet-stream Size: 2088 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 0002-call-every-iteration.patch Type: application/octet-stream Size: 2699 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 0003-fix-typo.patch Type: application/octet-stream Size: 640 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 0004-remove-wrong-comments.patch Type: application/octet-stream Size: 856 bytes Desc: not available URL: From guido at python.org Fri Dec 21 16:45:46 2012 From: guido at python.org (Guido van Rossum) Date: Fri, 21 Dec 2012 07:45:46 -0800 Subject: [Python-ideas] async: feedback on EventLoop API In-Reply-To: References: <3B90AC3A-C73B-4BD1-9BE8-9ECF21F0D243@umbrellacode.com> Message-ID: On Fri, Dec 21, 2012 at 3:38 AM, Jasper St. Pierre wrote: > I read over the wait_one() proposal again, and I still don't understand it, > so it would need more explanation to me. > > But I don't see the point of avoiding callbacks. In this case, we have two > or more in-flight requests that can be finished at any time. 
This does not > have a synchronous code equivalent -- callbacks are pretty much the only > mechanism we can use to be notified when something is done. Perhaps you haven't quite gotten used to coroutines? There are callbacks underneath making it all work, but the user code rarely sees those. Let's start with the following *synchronous* code as an example. def indexer(urls): # urls is a set of strings done = {} # dict mapping url to (data, links) while urls: data = urlfetch(url.pop()) links = parse(data) done[url] = (data, links) for link in link: if link not in urls and link not in done: urls.add(link) return done (Let's hope this is indexing a small static site and not the entire internet. :-) Now suppose we make urlfetch() a coroutine and we want to run all the urlfetches in parallel. The toplevel index() function becomes a coroutine too. We use the convention that coroutines' names end in _async, to remind us that they return Futures. The phrase "x = yield from foo_async()" is equivalent to the synchronous call "x = foo()". @coroutine def indexer_async(urls): done = {} # A dict mapping tasks to urls: running = {Task(urlfetch_async(url)), url for url in urls} while running: # The yield from will return a Future tsk = *yield from* wait_one_async(running) url = running.pop(tsk) data = tsk.result() # May raise links = parse(data) done[url] = (data, links) for link in links: if link not in urls and link not in done: urls.add(link) tsk = Task(urlfetch_async(link) running[tsk] = link return done This creates len(urls) initial tasks to parse the urls, and creates new urls as new links are parsed. The assumption here is that the only blocking I/O is done in the urlfetch_async() task. The indexer blocks at the *yield from* in the marked line, at which point any or all of the urlfetch tasks get to run some, and once one of them completes, wait_one_async() returns that task. (A task is a Future that wraps a coroutine, by the way. wait_one_async() works with Futures too.) We then inspect the completed task with .result(), which gives us the data, which we parse as usual. The data structures are a little more elaborate because we have to keep track of the mapping from task to url. We add new tasks to the running dict as soon as we have parsed their links, so they can all get started. Note that in PEP 3156, I don't use the _async convention, but everything in this example will work there once wait_one() is added. Also note that the trick is that wait_one_async() must return a Future whose result is another Future. The first Future is used (and thrown away) by *yield from*; that Future's result is one of the original Futures representing a completed task. I hope this is clearer. I'm not saying this is the best or only way of writing an async indexer using yield from (and I left out error handling) but hopefully it is an illustrative example. -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From jstpierre at mecheye.net Fri Dec 21 16:57:06 2012 From: jstpierre at mecheye.net (Jasper St. Pierre) Date: Fri, 21 Dec 2012 10:57:06 -0500 Subject: [Python-ideas] async: feedback on EventLoop API In-Reply-To: References: <3B90AC3A-C73B-4BD1-9BE8-9ECF21F0D243@umbrellacode.com> Message-ID: On Fri, Dec 21, 2012 at 10:45 AM, Guido van Rossum wrote: ... snip ... (gmail messed up parsing this, apparently) Aha, that cleared it up, thanks. 
wait_one_async() takes an iterable of tasks, and returns a Future that will fire when a Future completes, containing that Future. I can't think of anything *wrong* with that, except that if anything, 1) it feels like a bit of an abuse to use Futures this way, 2) it feels a bit low-level. -- Jasper -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Fri Dec 21 16:57:04 2012 From: guido at python.org (Guido van Rossum) Date: Fri, 21 Dec 2012 07:57:04 -0800 Subject: [Python-ideas] Tulip patches In-Reply-To: References: Message-ID: On Fri, Dec 21, 2012 at 6:31 AM, Geert Jansen wrote: > > Hi, > > [if this is not the right forum to post patches for tulip, please > redirect me to the correct one. There doesn't appear to be a mailing > list for tulip at the moment. And this list is where most/all of the > discussion is taking place.] This is a fine place, but you would make my life even easier by uploading the patches to codereview.appspot.com, so I can review them and send comments in-line. I've given you checkin permissions. Please send a contributor form to the PSF (http://www.python.org/psf/contrib/contrib-form/). > Please find attached 4 patches: > > 0001-run-fd-callbacks.patch > > This patch will run callbacks for readers and writers in the same loop > iteration as when the fd got ready. Copying from my previous email, > this is to support the following idiom: > > # handle_read() sets the "ready" flag > loop.add_reader(fd, handle_read) > while not ready: > loop.run_once() > > The patch currently dispatches callbacks twice in each iteration, once > before blocking and once after. I tried to dispatch only once after > blocking, but this made the SSL transport test hang. The reason is > that the create_transport task is scheduled with call_soon(), and only > when the task first runs, a file descriptor is added. So unless you > dispatch before blocking, this task will never get started. Interesting. Go ahead and submit. > 0002-call-every-iteration.patch > > This adds a call_every_iteration() method to the event loop. This > callback runs *before* blocking. There's one odd thing here: you remove cancelled everytime handlers *after* already scheduling them. It would seem to make more sense to schedule them first. Also, a faster way to do this would be self._everytime = [handler for handler in self._everytime if not handler.cancelled] (Even if you iterate from the back, remove() is still O(N), so if half the handlers are to be removed, your original code would be O(N**2).) > 0003-fix-typo.patch > 0004-remove-wrong-comments.patch > > Two trivial patches. Go ahead! PS. If you want to set up a mailing list or other cleverness I can set you up as a project admin. (I currently have all patches mailed to me but we may want to set up a separate list for that.) -- --Guido van Rossum (python.org/~guido) From guido at python.org Fri Dec 21 17:03:21 2012 From: guido at python.org (Guido van Rossum) Date: Fri, 21 Dec 2012 08:03:21 -0800 Subject: [Python-ideas] async: feedback on EventLoop API In-Reply-To: References: <3B90AC3A-C73B-4BD1-9BE8-9ECF21F0D243@umbrellacode.com> Message-ID: On Fri, Dec 21, 2012 at 7:57 AM, Jasper St. Pierre wrote: > Aha, that cleared it up, thanks. wait_one_async() takes an iterable of > tasks, and returns a Future that will fire when a Future completes, > containing that Future. > > I can't think of anything *wrong* with that, except that if anything, 1) it > feels like a bit of an abuse to use Futures this way, 2) it feels a bit > low-level.
But not more low-level than callbacks. Once you're used to coroutines and Futures, you don't want things that use callbacks. Fortunately there's an easy way to turn a callback into a Future: f = Future() old_style_async(callback=f.set_result) result = yield from f Assuming old_style_async() calls its callback with one arg, a useful result, that result will now end up in the variable 'result'. If this happens a lot it's easy to wrap it in a helper function, so you can write: result = yield from wrap_in_future(old_style_async) -- --Guido van Rossum (python.org/~guido) From guido at python.org Fri Dec 21 18:30:46 2012 From: guido at python.org (Guido van Rossum) Date: Fri, 21 Dec 2012 09:30:46 -0800 Subject: [Python-ideas] Tulip patches In-Reply-To: References: Message-ID: On Fri, Dec 21, 2012 at 7:57 AM, Guido van Rossum wrote: > On Fri, Dec 21, 2012 at 6:31 AM, Geert Jansen wrote: >> Please find attached 4 patches: >> >> 0001-run-fd-callbacks.patch >> >> This patch will run callbacks for readers and writers in the same loop >> iteration as when the fd got ready. Copying from my previous email, >> this is to support the following idiom: >> >> # handle_read() sets the "ready" flag >> loop.add_reader(fd, handle_read) >> while not ready: >> loop.run_once() >> >> The patch currently dispatches callbacks twice in each iteration, once >> before blocking and once after. I tried to dispatch only once after >> blocking, but this made the SSL transport test hang. The reason is >> that the create_transport task is scheduled with call_soon(), and only >> when the task first runs, a file descriptor is added. So unless you >> dispatch before blocking, this task will never get started. > > Interesting. Go ahead and submit. Whoa! I just figured out the problem. You don't have to run the ready queue twice. You just have to set the poll timeout to 0 if there's anything in the ready queue. Please send me an updated patch before submitting. -- --Guido van Rossum (python.org/~guido) From felipecruz at loogica.net Fri Dec 21 19:09:05 2012 From: felipecruz at loogica.net (Felipe Cruz) Date: Fri, 21 Dec 2012 16:09:05 -0200 Subject: [Python-ideas] Tulip patches In-Reply-To: References: Message-ID: Hi! I've been working in some tests to the pollers (Kqueue, Epoll ..) that may interest you guys.. My goal is to create test cases for each poller situation (ie: how to detect client disconnection with epoll and unix pipes? or tcp sockets..) and understand how all those pollers are different from each other and how we can map a generic events with all those possible underlying implementations. I already did some Epoll and Kqueue tests here: https://bitbucket.org/felipecruz/tulip/commits best regards, Felipe Cruz 2012/12/21 Guido van Rossum > On Fri, Dec 21, 2012 at 7:57 AM, Guido van Rossum > wrote: > > On Fri, Dec 21, 2012 at 6:31 AM, Geert Jansen wrote: > >> Please find attached 4 patches: > >> > >> 0001-run-fd-callbacks.patch > >> > >> This patch will run callbacks for readers and writers in the same loop > >> iteration as when the fd got ready. Copying from my previous email, > >> this is to support the following idiom: > >> > >> # handle_read() sets the "ready" flag > >> loop.add_reader(fd, handle_read) > >> while not ready: > >> loop.run_once() > >> > >> The patch currently dispatches callbacks twice in each iteration, once > >> before blocking and once after. I tried to dispatch only once after > >> blocking, but this made the SSL transport test hang. 
The reason is > >> that the create_transport task is scheduled with call_soon(), and only > >> when the task first runs, a file descriptor is added. So unless you > >> dispatch before blocking, this task will never get started. > > > > Interesting. Go ahead and submit. > > Whoa! I just figured out the problem. You don't have to run the ready > queue twice. You just have to set the poll timeout to 0 if there's > anything in the ready queue. Please send me an updated patch before > submitting. > > -- > --Guido van Rossum (python.org/~guido) > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Fri Dec 21 19:38:35 2012 From: guido at python.org (Guido van Rossum) Date: Fri, 21 Dec 2012 10:38:35 -0800 Subject: [Python-ideas] Tulip patches In-Reply-To: References: Message-ID: On Fri, Dec 21, 2012 at 10:09 AM, Felipe Cruz wrote: > I've been working in some tests to the pollers (Kqueue, Epoll ..) that may > interest you guys.. My goal is to create test cases for each poller > situation (ie: how to detect client disconnection with epoll and unix pipes? > or tcp sockets..) and understand how all those pollers are different from > each other and how we can map a generic events with all those possible > underlying implementations. That goal sounds great. > I already did some Epoll and Kqueue tests here: > https://bitbucket.org/felipecruz/tulip/commits Hm... Your clone is behind, a lot has changed since you made those commits. You may have to merge from the main repo. Specific comments: - I prefer to use my existing test infrastructure rather than 3rd party tools; dependencies in this early stage make it too hard for people to experiment. (It's okay to add a rule to the Makefile to invoke your favorite test discovery tool; but it's not okay to add imports to the Python code that depends on a 3rd party test framework.) - Your code to add flags or eventmask to the events list seems incomplete -- the UnixEventLoop doesn't expect poll() to return a tuple so all my own tests break... -- --Guido van Rossum (python.org/~guido) From felipecruz at loogica.net Fri Dec 21 19:46:58 2012 From: felipecruz at loogica.net (Felipe Cruz) Date: Fri, 21 Dec 2012 16:46:58 -0200 Subject: [Python-ideas] Tulip patches In-Reply-To: References: Message-ID: Hi Guido! I was just hacking without thinking about actually make patches. I can make patches without 3rd parties dependencies and no Makefile modification. :) I'll fix the second issue. 2012/12/21 Guido van Rossum > On Fri, Dec 21, 2012 at 10:09 AM, Felipe Cruz > wrote: > > I've been working in some tests to the pollers (Kqueue, Epoll ..) that > may > > interest you guys.. My goal is to create test cases for each poller > > situation (ie: how to detect client disconnection with epoll and unix > pipes? > > or tcp sockets..) and understand how all those pollers are different from > > each other and how we can map a generic events with all those possible > > underlying implementations. > > That goal sounds great. > > > I already did some Epoll and Kqueue tests here: > > https://bitbucket.org/felipecruz/tulip/commits > > Hm... Your clone is behind, a lot has changed since you made those > commits. You may have to merge from the main repo. 
> > Specific comments: > > - I prefer to use my existing test infrastructure rather than 3rd > party tools; dependencies in this early stage make it too hard for > people to experiment. (It's okay to add a rule to the Makefile to > invoke your favorite test discovery tool; but it's not okay to add > imports to the Python code that depends on a 3rd party test > framework.) > > - Your code to add flags or eventmask to the events list seems > incomplete -- the UnixEventLoop doesn't expect poll() to return a > tuple so all my own tests break... > > -- > --Guido van Rossum (python.org/~guido) > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jnoller at gmail.com Fri Dec 21 20:06:47 2012 From: jnoller at gmail.com (Jesse Noller) Date: Fri, 21 Dec 2012 14:06:47 -0500 Subject: [Python-ideas] [Python-Dev] PEP 3156 - Asynchronous IO Support Rebooted In-Reply-To: References: Message-ID: On Friday, December 21, 2012 at 1:57 PM, Guido van Rossum wrote: > Dear python-dev *and* python-ideas, > > I am posting PEP 3156 here for early review and discussion. As you can > see from the liberally sprinkled TBD entries it is not done, but I am > about to disappear on vacation for a few weeks and I am reasonably > happy with the state of things so far. (Of course feedback may change > this. :-) Also, there has already been some discussion on python-ideas > (and even on Twitter) so I don't want python-dev to feel out of the > loop -- this *is* a proposal for a new standard library module. (But > no, I haven't picked the module name yet. :-) > > There's an -- also incomplete -- reference implementation at > http://code.google.com/p/tulip/ -- unlike the first version of tulip, > this version actually has (some) unittests. > > Let the bikeshedding begin! > > (Oh, happy holidays too. :-) > > -- > --Guido van Rossum (python.org/~guido (http://python.org/~guido)) > I really do like tulip as the name. It's quite pretty. From guido at python.org Fri Dec 21 20:09:39 2012 From: guido at python.org (Guido van Rossum) Date: Fri, 21 Dec 2012 11:09:39 -0800 Subject: [Python-ideas] [Python-Dev] PEP 3156 - Asynchronous IO Support Rebooted In-Reply-To: References: Message-ID: On Fri, Dec 21, 2012 at 11:06 AM, Jesse Noller wrote: > I really do like tulip as the name. It's quite pretty. I chose it because Twisted and Tornado both start with T. But those have kind of dark associations; I wanted to offset that with something lighter. (OTOH we could use a black tulip as a logo. :-) Regardless, it's not the kind of name we tend to use for the stdlib. It'll probably end up being asynclib or something... -- --Guido van Rossum (python.org/~guido) From guido at python.org Fri Dec 21 19:57:12 2012 From: guido at python.org (Guido van Rossum) Date: Fri, 21 Dec 2012 10:57:12 -0800 Subject: [Python-ideas] PEP 3156 - Asynchronous IO Support Rebooted Message-ID: Dear python-dev *and* python-ideas, I am posting PEP 3156 here for early review and discussion. As you can see from the liberally sprinkled TBD entries it is not done, but I am about to disappear on vacation for a few weeks and I am reasonably happy with the state of things so far. (Of course feedback may change this. :-) Also, there has already been some discussion on python-ideas (and even on Twitter) so I don't want python-dev to feel out of the loop -- this *is* a proposal for a new standard library module. (But no, I haven't picked the module name yet. 
:-) There's an -- also incomplete -- reference implementation at http://code.google.com/p/tulip/ -- unlike the first version of tulip, this version actually has (some) unittests. Let the bikeshedding begin! (Oh, happy holidays too. :-) -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- PEP: 3156 Title: Asynchronous IO Support Rebooted Version: $Revision$ Last-Modified: $Date$ Author: Guido van Rossum Status: Draft Type: Standards Track Content-Type: text/x-rst Created: 12-Dec-2012 Post-History: TBD Abstract ======== This is a proposal for asynchronous I/O in Python 3, starting with Python 3.3. Consider this the concrete proposal that is missing from PEP 3153. The proposal includes a pluggable event loop API, transport and protocol abstractions similar to those in Twisted, and a higher-level scheduler based on ``yield from`` (PEP 380). A reference implementation is in the works under the code name tulip. Introduction ============ The event loop is the place where most interoperability occurs. It should be easy for (Python 3.3 ports of) frameworks like Twisted, Tornado, or ZeroMQ to either adapt the default event loop implementation to their needs using a lightweight wrapper or proxy, or to replace the default event loop implementation with an adaptation of their own event loop implementation. (Some frameworks, like Twisted, have multiple event loop implementations. This should not be a problem since these all have the same interface.) It should even be possible for two different third-party frameworks to interoperate, either by sharing the default event loop implementation (each using its own adapter), or by sharing the event loop implementation of either framework. In the latter case two levels of adaptation would occur (from framework A's event loop to the standard event loop interface, and from there to framework B's event loop). Which event loop implementation is used should be under control of the main program (though a default policy for event loop selection is provided). Thus, two separate APIs are defined: - getting and setting the current event loop object - the interface of a conforming event loop and its minimum guarantees An event loop implementation may provide additional methods and guarantees. The event loop interface does not depend on ``yield from``. Rather, it uses a combination of callbacks, additional interfaces (transports and protocols), and Futures. The latter are similar to those defined in PEP 3148, but have a different implementation and are not tied to threads. In particular, they have no wait() method; the user is expected to use callbacks. For users (like myself) who don't like using callbacks, a scheduler is provided for writing asynchronous I/O code as coroutines using the PEP 380 ``yield from`` expressions. The scheduler is not pluggable; pluggability occurs at the event loop level, and the scheduler should work with any conforming event loop implementation. For interoperability between code written using coroutines and other async frameworks, the scheduler has a Task class that behaves like a Future. A framework that interoperates at the event loop level can wait for a Future to complete by adding a callback to the Future. Likewise, the scheduler offers an operation to suspend a coroutine until a callback is called. Limited interoperability with threads is provided by the event loop interface; there is an API to submit a function to an executor (see PEP 3148) which returns a Future that is compatible with the event loop. 
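For a rough flavor of the intended coroutine style (a sketch only, using APIs that are specified in the sections below; the helper name is made up for illustration), a coroutine can hand a blocking call to the default executor and suspend until the resulting Future is done::

    import os
    import tulip

    @tulip.coroutine
    def file_size(path):
        ev = tulip.get_event_loop()
        # os.stat() would block, so run it in the default executor;
        # run_in_executor() returns a Future that is compatible with the
        # event loop and can be waited on with ``yield from``.
        st = yield from ev.run_in_executor(None, os.stat, path)
        return st.st_size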
Non-goals ========= Interoperability with systems like Stackless Python or greenlets/gevent is not a goal of this PEP. Specification ============= Dependencies ------------ Python 3.3 is required. No new language or standard library features beyond Python 3.3 are required. No third-party modules or packages are required. Module Namespace ---------------- The specification here will live in a new toplevel package. Different components will live in separate submodules of that package. The package will import common APIs from their respective submodules and make them available as package attributes (similar to the way the email package works). The name of the toplevel package is currently unspecified. The reference implementation uses the name 'tulip', but the name will change to something more boring if and when the implementation is moved into the standard library (hopefully for Python 3.4). Until the boring name is chosen, this PEP will use 'tulip' as the toplevel package name. Classes and functions given without a module name are assumed to be accessed via the toplevel package. Event Loop Policy: Getting and Setting the Event Loop ----------------------------------------------------- To get the current event loop, use ``get_event_loop()``. This returns an instance of the ``EventLoop`` class defined below or an equivalent object. It is possible that ``get_event_loop()`` returns a different object depending on the current thread, or depending on some other notion of context. To set the current event loop, use ``set_event_loop(event_loop)``, where ``event_loop`` is an instance of the ``EventLoop`` class or equivalent. This uses the same notion of context as ``get_event_loop()``. For the benefit of unit tests and other special cases there's a third policy function: ``init_event_loop()``, which creates a new EventLoop instance and calls ``set_event_loop()`` with it. TBD: Maybe we should have a ``create_default_event_loop_instance()`` function instead? To change the way the above three functions work (including their notion of context), call ``set_event_loop_policy(policy)``, where ``policy`` is an event loop policy object. The policy object can be any object that has methods ``get_event_loop()``, ``set_event_loop(event_loop)`` and ``init_event_loop()`` behaving like the functions described above. The default event loop policy is an instance of the class ``DefaultEventLoopPolicy``. The current event loop policy object can be retrieved by calling ``get_event_loop_policy()``. An event loop policy may but does not have to enforce that there is only one event loop in existence. The default event loop policy does not enforce this, but it does enforce that there is only one event loop per thread. Event Loop Interface -------------------- (A note about times: as usual in Python, all timeouts, intervals and delays are measured in seconds, and may be ints or floats. The accuracy and precision of the clock are up to the implementation; the default implementation uses ``time.monotonic()``.) A conforming event loop object has the following methods: - ``run()``. Runs the event loop until there is nothing left to do. This means, in particular: - No more calls scheduled with ``call_later()``, ``call_repeatedly()``, ``call_soon()``, or ``call_soon_threadsafe()``, except for cancelled calls. - No more registered file descriptors. It is up to the registering party to unregister a file descriptor when it is closed. Note: ``run()`` blocks until the termination condition is met, or until ``stop()`` is called. 
Note: if you schedule a call with ``call_repeatedly()``, ``run()`` will not exit until you cancel it. TBD: How many variants of this do we really need? - ``stop()``. Stops the event loop as soon as it is convenient. It is fine to restart the loop with ``run()`` (or one of its variants) subsequently. Note: How soon exactly is up to the implementation. All immediate callbacks that were already scheduled to run before ``stop()`` is called must still be run, but callbacks scheduled after it is called (or scheduled to be run later) will not be run. - ``run_forever()``. Runs the event loop until ``stop()`` is called. - ``run_until_complete(future, timeout=None)``. Runs the event loop until the Future is done. If a timeout is given, it waits at most that long. If the Future is done, its result is returned, or its exception is raised; if the timeout expires before the Future is done, or if ``stop()`` is called, ``TimeoutError`` is raised (but the Future is not cancelled). This cannot be called when the event loop is already running. Note: This API is most useful for tests and the like. It should not be used as a substitute for ``yield from future`` or other ways to wait for a Future (e.g. registering a done callback). - ``run_once(timeout=None)``. Run the event loop for a little while. If a timeout is given, an I/O poll will block at most that long; otherwise, an I/O poll is not constrained in time. Note: Exactly how much work this does is up to the implementation. One constraint: if a callback immediately schedules itself using ``call_soon()``, causing an infinite loop, ``run_once()`` should still return. - ``call_later(delay, callback, *args)``. Arrange for ``callback(*args)`` to be called approximately ``delay`` seconds in the future, once, unless cancelled. Returns a ``Handler`` object representing the callback, whose ``cancel()`` method can be used to cancel the callback. - ``call_repeatedly(interval, callback, *args)``. Like ``call_later()`` but calls the callback repeatedly, every ``interval`` seconds, until the ``Handler`` returned is cancelled. The first call is in ``interval`` seconds. - ``call_soon(callback, *args)``. Equivalent to ``call_later(0, callback, *args)``. - ``call_soon_threadsafe(callback, *args)``. Like ``call_soon(callback, *args)``, but when called from another thread while the event loop is blocked waiting for I/O, unblocks the event loop. This is the *only* method that is safe to call from another thread or from a signal handler. (To schedule a callback for a later time in a threadsafe manner, you can use ``ev.call_soon_threadsafe(ev.call_later, when, callback, *args)``.) - TBD: A way to register a callback that is already wrapped in a ``Handler``. Maybe ``call_soon()`` could just check ``isinstance(callback, Handler)``? It should silently skip a cancelled callback. Some methods in the standard conforming interface return Futures: - ``wrap_future(future)``. This takes a PEP 3148 Future (i.e., an instance of ``concurrent.futures.Future``) and returns a Future compatible with the event loop (i.e., a ``tulip.Future`` instance). - ``run_in_executor(executor, function, *args)``. Arrange to call ``function(*args)`` in an executor (see PEP 3148). Returns a Future whose result on success is the return value of that call. This is equivalent to ``wrap_future(executor.submit(function, *args))``. If ``executor`` is ``None``, a default ``ThreadPoolExecutor`` with 5 threads is used. (TBD: Should the default executor be shared between different event loops?
Should we even have a default executor? Should we be able to set its thread count? Should we even have this method?) - ``set_default_executor(executor)``. Set the default executor used by ``run_in_executor()``. - ``getaddrinfo(host, port, family=0, type=0, proto=0, flags=0)``. Similar to the ``socket.getaddrinfo()`` function but returns a Future. The Future's result on success will be a list of the same format as returned by ``socket.getaddrinfo()``. The default implementation calls ``socket.getaddrinfo()`` using ``run_in_executor()``, but other implementations may choose to implement their own DNS lookup. - ``getnameinfo(sockaddr, flags=0)``. Similar to ``socket.getnameinfo()`` but returns a Future. The Future's result on success will be a tuple ``(host, port)``. Same implementation remarks as for ``getaddrinfo()``. - ``create_transport(protocol_factory, host, port, **kwargs)``. Creates a transport and a protocol and ties them together. Returns a Future whose result on success is a (transport, protocol) pair. Note that when the Future completes, the protocol's ``connection_made()`` method has not yet been called; that will happen when the connection handshake is complete. When it is impossible to connect to the given host and port, the Future will raise an exception instead. Optional keyword arguments: - ``family``, ``type``, ``proto``, ``flags``: Address family, socket type, protocol, and miscellaneous flags to be passed through to ``getaddrinfo()``. These all default to ``0`` except ``type`` which defaults to ``socket.SOCK_STREAM``. - ``ssl``: Pass ``True`` to create an SSL transport (by default a plain TCP transport is created). Or pass an ``ssl.SSLContext`` object to override the default SSL context object to be used. TBD: Should this be called create_connection()? - ``start_serving(...)``. Enters a loop that accepts connections. TBD: Signature. There are two possibilities: 1. You pass it a non-blocking socket that you have already prepared with ``bind()`` and ``listen()`` (these system calls do not block AFAIK), a protocol factory (I hesitate to use this word :-), and optional flags that control the transport creation (e.g. ssl). 2. Instead of a socket, you pass it a host and port, and some more optional flags (e.g. to control IPv4 vs IPv6, or to set the backlog value to be passed to ``listen()``). In either case, once it has a socket, it will wrap it in a transport, and then enter a loop accepting connections (the best way to implement such a loop depends on the platform). Each time a connection is accepted, a transport and protocol are created for it. This should return an object that can be used to control the serving loop, e.g. to stop serving, abort all active connections, and (if supported) adjust the backlog or other parameters. It may also have an API to inquire about active connections. If version (2) is selected, it should probably return a Future whose result on success will be that control object, and which becomes done once the accept loop is started. TBD: It may be best to use version (2), since on some platforms the best way to start a server may not involve sockets (but will still involve transports and protocols). TBD: Be more specific. TBD: Some platforms may not be interested in implementing all of these, e.g. start_serving() may be of no interest to mobile apps. (Although, there's a Minecraft server on my iPad...) The following methods for registering callbacks for file descriptors are optional.
If they are not implemented, accessing the method (without calling it) returns AttributeError. The default implementation provides them but the user normally doesn't use these directly -- they are used by the transport implementations exclusively. Also, on Windows these may be present or not depending on whether a select-based or IOCP-based event loop is used. These take integer file descriptors only, not objects with a fileno() method. The file descriptor should represent something pollable -- i.e. no disk files. - ``add_reader(fd, callback, *args)``. Arrange for ``callback(*args)`` to be called whenever file descriptor ``fd`` is ready for reading. Returns a ``Handler`` object which can be used to cancel the callback. Note that, unlike ``call_later()``, the callback may be called many times. Calling ``add_reader()`` again for the same file descriptor implicitly cancels the previous callback for that file descriptor. (TBD: Returning a ``Handler`` that can be cancelled seems awkward. Let's forget about that.) (TBD: Change this to raise an exception if a handler is already set.) - ``add_writer(fd, callback, *args)``. Like ``add_reader()``, but registers the callback for writing instead of for reading. - ``remove_reader(fd)``. Cancels the current read callback for file descriptor ``fd``, if one is set. A no-op if no callback is currently set for the file descriptor. (The reason for providing this alternate interface is that it is often more convenient to remember the file descriptor than to remember the ``Handler`` object.) (TBD: Return ``True`` if a handler was removed, ``False`` if not.) - ``remove_writer(fd)``. This is to ``add_writer()`` as ``remove_reader()`` is to ``add_reader()``. - ``add_connector(fd, callback, *args)``. Like ``add_writer()`` but meant to wait for ``connect()`` operations, which on some platforms require different handling (e.g. ``WSAPoll()`` on Windows). - ``remove_connector(fd)``. This is to ``remove_writer()`` as ``add_connector()`` is to ``add_writer()``. TBD: What about multiple callbacks per fd? The current semantics is that ``add_reader()/add_writer()`` replace a previously registered callback. Change this to raise an exception if a callback is already registered. The following methods for doing async I/O on sockets are optional. They are alternative to the previous set of optional methods, intended for transport implementations on Windows using IOCP (if the event loop supports it). The socket argument has to be a non-blocking socket. - ``sock_recv(sock, n)``. Receive up to ``n`` bytes from socket ``sock``. Returns a Future whose result on success will be a bytes object on success. - ``sock_sendall(sock, data)``. Send bytes ``data`` to the socket ``sock``. Returns a Future whose result on success will be ``None``. (TBD: Is it better to emulate ``sendall()`` or ``send()`` semantics? I think ``sendall()`` -- but perhaps it should still be *named* ``send()``?) - ``sock_connect(sock, address)``. Connect to the given address. Returns a Future whose result on success will be ``None``. - ``sock_accept(sock)``. Accept a connection from a socket. The socket must be in listening mode and bound to an address. Returns a Future whose result on success will be a tuple ``(conn, peer)`` where ``conn`` is a connected non-blocking socket and ``peer`` is the peer address. (TBD: People tell me that this style of API is too slow for high-volume servers. So there's also ``start_serving()`` above. Then do we still need this?) TBD: Optional methods are not so good. 
Perhaps these should be required? It may still depend on the platform which set is more efficient. Callback Sequencing ------------------- When two callbacks are scheduled for the same time, they are run in the order in which they are registered. For example:: ev.call_soon(foo) ev.call_soon(bar) guarantees that ``foo()`` is called before ``bar()``. If ``call_soon()`` is used, this guarantee is true even if the system clock were to run backwards. This is also the case for ``call_later(0, callback, *args)``. However, if ``call_later()`` is used with a nonzero delay, all bets are off if the system clock were to runs backwards. (A good event loop implementation should use ``time.monotonic()`` to avoid problems when the clock runs backward. See PEP 418.) Context ------- All event loops have a notion of context. For the default event loop implementation, the context is a thread. An event loop implementation should run all callbacks in the same context. An event loop implementation should run only one callback at a time, so callbacks can assume automatic mutual exclusion with other callbacks scheduled in the same event loop. Exceptions ---------- There are two categories of exceptions in Python: those that derive from the ``Exception`` class and those that derive from ``BaseException``. Exceptions deriving from ``Exception`` will generally be caught and handled appropriately; for example, they will be passed through by Futures, and they will be logged and ignored when they occur in a callback. However, exceptions deriving only from ``BaseException`` are never caught, and will usually cause the program to terminate with a traceback. (Examples of this category include ``KeyboardInterrupt`` and ``SystemExit``; it is usually unwise to treat these the same as most other exceptions.) The Handler Class ----------------- The various methods for registering callbacks (e.g. ``call_later()``) all return an object representing the registration that can be used to cancel the callback. For want of a better name this object is called a ``Handler``, although the user never needs to instantiate instances of this class. There is one public method: - ``cancel()``. Attempt to cancel the callback. TBD: Exact specification. Read-only public attributes: - ``callback``. The callback function to be called. - ``args``. The argument tuple with which to call the callback function. - ``cancelled``. True if ``cancel()`` has been called. Note that some callbacks (e.g. those registered with ``call_later()``) are meant to be called only once. Others (e.g. those registered with ``add_reader()``) are meant to be called multiple times. TBD: An API to call the callback (encapsulating the exception handling necessary)? Should it record how many times it has been called? Maybe this API should just be ``__call__()``? (But it should suppress exceptions.) TBD: Public attribute recording the realtime value when the callback is scheduled? (Since this is needed anyway for storing it in a heap.) Futures ------- The ``tulip.Future`` class here is intentionally similar to the ``concurrent.futures.Future`` class specified by PEP 3148, but there are slight differences. The supported public API is as follows, indicating the differences with PEP 3148: - ``cancel()``. TBD: Exact specification. - ``cancelled()``. - ``running()``. Note that the meaning of this method is essentially "cannot be cancelled and isn't done yet". (TBD: Would be nice if this could be set *and* cleared in some cases, e.g. sock_recv().) - ``done()``. - ``result()``. 
Difference with PEP 3148: This has no timeout argument and does *not* wait; if the future is not yet done, it raises an exception. - ``exception()``. Difference with PEP 3148: This has no timeout argument and does *not* wait; if the future is not yet done, it raises an exception. - ``add_done_callback(fn)``. Difference with PEP 3148: The callback is never called immediately, and always in the context of the caller. (Typically, a context is a thread.) You can think of this as calling the callback through ``call_soon_threadsafe()``. Note that the callback (unlike all other callbacks defined in this PEP, and ignoring the convention from the section "Callback Style" below) is always called with a single argument, the Future object. The internal methods defined in PEP 3148 are not supported. (TBD: Maybe we do need to support these, in order to make it easy to write user code that returns a Future?) A ``tulip.Future`` object is not acceptable to the ``wait()`` and ``as_completed()`` functions in the ``concurrent.futures`` package. A ``tulip.Future`` object is acceptable to a ``yield from`` expression when used in a coroutine. This is implemented through the ``__iter__()`` interface on the Future. See the section "Coroutines and the Scheduler" below. When a Future is garbage-collected, if it has an associated exception but neither ``result()`` nor ``exception()`` nor ``__iter__()`` has ever been called (or the latter hasn't raised the exception yet -- details TBD), the exception should be logged. TBD: At what level? In the future (pun intended) we may unify ``tulip.Future`` and ``concurrent.futures.Future``, e.g. by adding an ``__iter__()`` method to the latter that works with ``yield from``. To prevent accidentally blocking the event loop by calling e.g. ``result()`` on a Future that's not don yet, the blocking operation may detect that an event loop is active in the current thread and raise an exception instead. However the current PEP strives to have no dependencies beyond Python 3.3, so changes to ``concurrent.futures.Future`` are off the table for now. Transports ---------- A transport is an abstraction on top of a socket or something similar (for example, a UNIX pipe or an SSL connection). Transports are strongly influenced by Twisted and PEP 3153. Users rarely implement or instantiate transports -- rather, event loops offer utility methods to set up transports. Transports work in conjunction with protocols. Protocols are typically written without knowing or caring about the exact type of transport used, and transports can be used with a wide variety of protocols. For example, an HTTP client protocol implementation may be used with either a plain socket transport or an SSL transport. The plain socket transport can be used with many different protocols besides HTTP (e.g. SMTP, IMAP, POP, FTP, IRC, SPDY). Most connections have an asymmetric nature: the client and server usually have very different roles and behaviors. Hence, the interface between transport and protocol is also asymmetric. From the protocol's point of view, *writing* data is done by calling the ``write()`` method on the transport object; this buffers the data and returns immediately. However, the transport takes a more active role in *reading* data: whenever some data is read from the socket (or other data source), the transport calls the protocol's ``data_received()`` method. Transports have the following public methods: - ``write(data)``. Write some bytes. The argument must be a bytes object. Returns ``None``. 
The transport is free to buffer the bytes, but it must eventually cause the bytes to be transferred to the entity at the other end, and it must maintain stream behavior. That is, ``t.write(b'abc'); t.write(b'def')`` is equivalent to ``t.write(b'abcdef')``, as well as to:: t.write(b'a') t.write(b'b') t.write(b'c') t.write(b'd') t.write(b'e') t.write(b'f') - ``writelines(iterable)``. Equivalent to:: for data in iterable: self.write(data) - ``write_eof()``. Close the writing end of the connection. Subsequent calls to ``write()`` are not allowed. Once all buffered data is transferred, the transport signals to the other end that no more data will be received. Some protocols don't support this operation; in that case, calling ``write_eof()`` will raise an exception. (Note: This used to be called ``half_close()``, but unless you already know what it is for, that name doesn't indicate *which* end is closed.) - ``can_write_eof()``. Return ``True`` if the protocol supports ``write_eof()``, ``False`` if it does not. (This method is needed because some protocols need to change their behavior when ``write_eof()`` is unavailable. For example, in HTTP, to send data whose size is not known ahead of time, the end of the data is typically indicated using ``write_eof()``; however, SSL does not support this, and an HTTP protocol implementation would have to use the "chunked" transfer encoding in this case. But if the data size is known ahead of time, the best approach in both cases is to use the Content-Length header.) - ``pause()``. Suspend delivery of data to the protocol until a subsequent ``resume()`` call. Between ``pause()`` and ``resume()``, the protocol's ``data_received()`` method will not be called. This has no effect on ``write()``. - ``resume()``. Restart delivery of data to the protocol via ``data_received()``. - ``close()``. Sever the connection with the entity at the other end. Any data buffered by ``write()`` will (eventually) be transferred before the connection is actually closed. The protocol's ``data_received()`` method will not be called again. Once all buffered data has been flushed, the protocol's ``connection_lost()`` method will be called with ``None`` as the argument. Note that this method does not wait for all that to happen. - ``abort()``. Immediately sever the connection. Any data still buffered by the transport is thrown away. Soon, the protocol's ``connection_lost()`` method will be called with ``None`` as argument. (TBD: Distinguish in the ``connection_lost()`` argument between ``close()``, ``abort()`` or a close initated by the other end? Or add a transport method to inquire about this? Glyph's proposal was to pass different exceptions for this purpose.) TBD: Provide flow control the other way -- the transport may need to suspend the protocol if the amount of data buffered becomes a burden. Proposal: let the transport call ``protocol.pause()`` and ``protocol.resume()`` if they exist; if they don't exist, the protocol doesn't support flow control. (Perhaps different names to avoid confusion between protocols and transports?) Protocols --------- Protocols are always used in conjunction with transports. While a few common protocols are provided (e.g. decent though not necessarily excellent HTTP client and server implementations), most protocols will be implemented by user code or third-party libraries. A protocol must implement the following methods, which will be called by the transport. Consider these callbacks that are always called by the event loop in the right context. 
(See the "Context" section above.) - ``connection_made(transport)``. Indicates that the transport is ready and connected to the entity at the other end. The protocol should probably save the transport reference as an instance variable (so it can call its ``write()`` and other methods later), and may write an initial greeting or request at this point. - ``data_received(data)``. The transport has read some bytes from the connection. The argument is always a non-empty bytes object. There are no guarantees about the minimum or maximum size of the data passed along this way. ``p.data_received(b'abcdef')`` should be treated exactly equivalent to:: p.data_received(b'abc') p.data_received(b'def') - ``eof_received()``. This is called when the other end called ``write_eof()`` (or something equivalent). The default implementation calls ``close()`` on the transport, which causes ``connection_lost()`` to be called (eventually) on the protocol. - ``connection_lost(exc)``. The transport has been closed or aborted, has detected that the other end has closed the connection cleanly, or has encountered an unexpected error. In the first three cases the argument is ``None``; for an unexpected error, the argument is the exception that caused the transport to give up. (TBD: Do we need to distinguish between the first three cases?) Here is a chart indicating the order and multiplicity of calls: 1. ``connection_made()`` -- exactly once 2. ``data_received()`` -- zero or more times 3. ``eof_received()`` -- at most once 4. ``connection_lost()`` -- exactly once TBD: Discuss whether user code needs to do anything to make sure that protocol and transport aren't garbage-collected prematurely. Callback Style -------------- Most interfaces taking a callback also take positional arguments. For instance, to arrange for ``foo("abc", 42)`` to be called soon, you call ``ev.call_soon(foo, "abc", 42)``. To schedule the call ``foo()``, use ``ev.call_soon(foo)``. This convention greatly reduces the number of small lambdas required in typical callback programming. This convention specifically does *not* support keyword arguments. Keyword arguments are used to pass optional extra information about the callback. This allows graceful evolution of the API without having to worry about whether a keyword might be significant to a callee somewhere. If you have a callback that *must* be called with a keyword argument, you can use a lambda or ``functools.partial``. For example:: ev.call_soon(functools.partial(foo, "abc", repeat=42)) Choosing an Event Loop Implementation ------------------------------------- TBD. (This is about the choice to use e.g. select vs. poll vs. epoll, and how to override the choice. Probably belongs in the event loop policy.) Coroutines and the Scheduler ============================ This is a separate toplevel section because its status is different from the event loop interface. Usage of coroutines is optional, and it is perfectly fine to write code using callbacks only. On the other hand, there is only one implementation of the scheduler/coroutine API, and if you're using coroutines, that's the one you're using. Coroutines ---------- A coroutine is a generator that follows certain conventions. For documentation purposes, all coroutines should be decorated with ``@tulip.coroutine``, but this cannot be strictly enforced. Coroutines use the ``yield from`` syntax introduced in PEP 380, instead of the original ``yield`` syntax. 
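As a small sketch (using the event loop's ``getaddrinfo()`` method defined earlier; the function name is invented for illustration), a coroutine waits for Futures and other coroutines with ``yield from`` and produces its result with a plain ``return``::

    import tulip

    @tulip.coroutine
    def resolve(host, port):
        ev = tulip.get_event_loop()
        # getaddrinfo() returns a Future; ``yield from`` suspends this
        # coroutine until the lookup is done and then produces the result.
        infos = yield from ev.getaddrinfo(host, port)
        # Returning a value hands it to whatever is waiting on this
        # coroutine with ``yield from`` (or on a Task wrapping it).
        return infos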
The word "coroutine", like the word "generator", is used for two different (though related) concepts: - The function that defines a coroutine (a function definition decorated with ``tulip.coroutine``). If disambiguation is needed, we call this a *coroutine function*. - The object obtained by calling a coroutine function. This object represents a computation or an I/O operation (usually a combination) that will complete eventually. For disambiguation we call it a *coroutine object*. Things a coroutine can do: - ``result = yield from future`` -- suspends the coroutine until the future is done, then returns the future's result, or raises its exception, which will be propagated. - ``result = yield from coroutine`` -- wait for another coroutine to produce a result (or raise an exception, which will be propagated). The ``coroutine`` expression must be a *call* to another coroutine. - ``results = yield from tulip.par(futures_and_coroutines)`` -- Wait for a list of futures and/or coroutines to complete and return a list of their results. If one of the futures or coroutines raises an exception, that exception is propagated, after attempting to cancel all other futures and coroutines in the list. - ``return result`` -- produce a result to the coroutine that is waiting for this one using ``yield from``. - ``raise exception`` -- raise an exception in the coroutine that is waiting for this one using ``yield from``. Calling a coroutine does not start its code running -- it is just a generator, and the coroutine object returned by the call is really a generator object, which doesn't do anything until you iterate over it. In the case of a coroutine object, there are two basic ways to start it running: call ``yield from coroutine`` from another coroutine (assuming the other coroutine is already running!), or convert it to a Task. Coroutines can only run when the event loop is running. Tasks ----- A Task is an object that manages an independently running coroutine. The Task interface is the same as the Future interface. The task becomes done when its coroutine returns or raises an exception; if it returns a result, that becomes the task's result, if it raises an exception, that becomes the task's exception. Cancelling a task that's not done yet prevents its coroutine from completing; in this case an exception is thrown into the coroutine that it may catch to further handle cancellation, but it doesn't have to (this is done using the standard ``close()`` method on generators, described in PEP 342). The ``par()`` function described above runs coroutines in parallel by converting them to Tasks. (Arguments that are already Tasks or Futures are not converted.) Tasks are also useful for interoperating between coroutines and callback-based frameworks like Twisted. After converting a coroutine into a Task, callbacks can be added to the Task. You may ask, why not convert all coroutines to Tasks? The ``@tulip.coroutine`` decorator could do this. This would slow things down considerably in the case where one coroutine calls another (and so on), as switching to a "bare" coroutine has much less overhead than switching to a Task. The Scheduler ------------- The scheduler has no public interface. You interact with it by using ``yield from future`` and ``yield from task``. In fact, there is no single object representing the scheduler -- its behavior is implemented by the ``Task`` and ``Future`` classes using only the public interface of the event loop, so it will work with third-party event loop implementations, too. 
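Continuing the sketch above, a callback-based framework can start the ``resolve()`` coroutine by wrapping it in a Task and attaching a done callback (again only a sketch; whether ``Task`` is re-exported at the package top level is an implementation detail)::

    def report(task):
        # Per the Futures section, add_done_callback() calls back with
        # the Future (here, the Task) as its single argument.
        print(task.result())

    task = tulip.Task(resolve('python.org', 80))
    task.add_done_callback(report)
    tulip.get_event_loop().run()   # runs until there is nothing left to do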
Sleeping
--------

TBD: ``yield sleep(seconds)``.  Can use ``sleep(0)`` to suspend briefly so the event loop can poll for I/O.

Wait for First
--------------

TBD: Need an interface to wait for the first of a collection of Futures.

Coroutines and Protocols
------------------------

The best way to use coroutines to implement protocols is probably to use a streaming buffer that gets filled by ``data_received()`` and can be read asynchronously using methods like ``read(n)`` and ``readline()`` that return a Future.  When the connection is closed, ``read()`` should return a Future whose result is ``b''``, or raise an exception if ``connection_lost()`` is called with an exception.

To write, the ``write()`` method (and friends) on the transport can be used -- these do not return Futures.  A standard protocol implementation should be provided that sets this up and kicks off the coroutine when ``connection_made()`` is called.  TBD: Be more specific.

Cancellation
------------

TBD.  When a Task is cancelled its coroutine may see an exception at any point where it is yielding to the scheduler (i.e., potentially at any ``yield from`` operation).  We need to spell out which exception is raised.

Also TBD: timeouts.

Open Issues
===========

- A debugging API?  E.g. something that logs a lot of stuff, or logs unusual conditions (like queues filling up faster than they drain) or even callbacks taking too much time...

- Do we need introspection APIs?  E.g. asking for the read callback given a file descriptor.  Or when the next scheduled call is.  Or the list of file descriptors registered with callbacks.

- Should we have ``future.add_callback(callback, *args)``, using the convention from the section "Callback Style" above, or should we stick with the PEP 3148 specification of ``future.add_done_callback(callback)`` which calls ``callback(future)``?  (Glyph suggested using a different method name since add_done_callback() does not guarantee that the callback will be called in the right context.)

- Returning a Future is relatively expensive, and it is quite possible that some types of calls *usually* complete immediately (e.g. writing small amounts of data to a socket).  A trick used by Richard Oudkerk in the tulip project's proactor branch makes calls like recv() either return a regular result or *raise* a Future.  The caller (likely a transport) must then write code like this::

      try:
          res = ev.sock_recv(sock, 8192)
      except Future as f:
          yield from sch.block_future(f)
          res = f.result()

- Do we need a larger vocabulary of operations for combining coroutines and/or futures?  E.g. in addition to par() we could have a way to run several coroutines sequentially (returning all results or passing the result of one to the next and returning the final result?).  We might also introduce explicit locks (though these will be a bit of a pain to use, as we can't use the ``with lock: block`` syntax).  Anyway, I think all of these are easy enough to write using ``Task``.  Proposal: ``f = yield from wait_one(fs)`` takes a set of Futures and sets f to the first of those that is done.  (Yes, this requires an intermediate Future to wait for.)  You can then write::

      while fs:
          f = yield from tulip.wait_one(fs)
          fs.remove(f)

- Support for datagram protocols, "connected" or otherwise?  Probably need more socket I/O methods, e.g. ``sock_sendto()`` and ``sock_recvfrom()``.  Or users can write their own (it's not rocket science).  Is it reasonable to map ``write()``, ``writelines()``, ``data_received()`` to single datagrams?

- Task or callback priorities?  (I hope not.)
- An EventEmitter in the style of NodeJS? Or make this a separate PEP? It's easy enough to do in user space, though it may benefit from standardization. (See https://github.com/mnot/thor/blob/master/thor/events.py and https://github.com/mnot/thor/blob/master/doc/events.md for examples.) Acknowledgments =============== Apart from PEP 3153, influences include PEP 380 and Greg Ewing's tutorial for ``yield from``, Twisted, Tornado, ZeroMQ, pyftpdlib, tulip (the author's attempts at synthesis of all these), wattle (Steve Dower's counter-proposal), numerous discussions on python-ideas from September through December 2012, a Skype session with Steve Dower and Dino Viehland, email exchanges with Ben Darnell, an audience with Niels Provos (original author of libevent), and two in-person meetings with several Twisted developers, including Glyph, Brian Warner, David Reid, and Duncan McGreggor. Also, the author's previous work on async support in the NDB library for Google App Engine was an important influence. Copyright ========= This document has been placed in the public domain. .. Local Variables: mode: indented-text indent-tabs-mode: nil sentence-end-double-space: t fill-column: 70 coding: utf-8 End: From _ at lvh.cc Fri Dec 21 22:04:04 2012 From: _ at lvh.cc (Laurens Van Houtven) Date: Fri, 21 Dec 2012 22:04:04 +0100 Subject: [Python-ideas] [Python-Dev] PEP 3156 - Asynchronous IO Support Rebooted In-Reply-To: References: Message-ID: Looks reasonable to me :) Comments: create_transport "combines" a transport and a protocol. Is that process reversible? that might seem like an exotic thing (and I guess it kind of is), but I've wanted this e.g for websockets, and I guess there's a few other cases where it could be useful :) eof_received on protocols seems unusual. What's the rationale? I know we disagree that callbacks (of the line_received variety) are a good idea for blocking IO (I think we should have universal protocol implementations), but can we agree that they're what we want for tulip? If so, I can try to figure out a way to get them to fit together :) I'm assuming that this means you'd like protocols and transports in this PEP? A generic comment on yield from APIs that I'm sure has been discussed in some e-mail I missed: is there an obvious way to know up front whether something needs to be yielded or yield frommed? In twisted, which is what I'm used to it's all deferreds; but here a future's yield from but sleep's yield? Will comment more as I keep reading I'm sure :) On Fri, Dec 21, 2012 at 8:09 PM, Guido van Rossum wrote: > On Fri, Dec 21, 2012 at 11:06 AM, Jesse Noller wrote: > > I really do like tulip as the name. It's quite pretty. > > I chose it because Twisted and Tornado both start with T. But those > have kind of dark associations; I wanted to offset that with something > lighter. (OTOH we could use a black tulip as a logo. :-) > > Regardless, it's not the kind of name we tend to use for the stdlib. > It'll probably end up being asynclib or something... > > -- > --Guido van Rossum (python.org/~guido) > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -- cheers lvh -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From geertj at gmail.com Fri Dec 21 22:59:23 2012 From: geertj at gmail.com (Geert Jansen) Date: Fri, 21 Dec 2012 22:59:23 +0100 Subject: [Python-ideas] Tulip patches In-Reply-To: References: Message-ID: On Fri, Dec 21, 2012 at 4:57 PM, Guido van Rossum wrote: > This is a fine place, but you would make my life even easier by > uploading the patches to codereview.appspot.com, so I can review them > and send comments in-line. I tried to get Tulip added as a new repository there, but i'm probably doing something wrong.. In the mean time i'm sending my updated patches below.. > I've given you checkin permissions. Please send a contributor form to > the PSF (http://www.python.org/psf/contrib/contrib-form/). Done! >> 0001-run-fd-callbacks.patch [...] > Interesting. Go ahead and submit. [from your other email] > Whoa! I just figured out the problem. You don't have to run the ready > queue twice. You just have to set the poll timeout to 0 if there's > anything in the ready queue. Please send me an updated patch before > submitting. New patch attached. >> 0002-call-every-iteration.patch [...] > There's one odd thing here: you remove cancelled everytime handlers > *after* already scheduling them. It would seem to make more sense to > schedule them first. Also, a faster way to do this would be > > self._everytime = [handler in self._everytime if not handler.cancelled] > > (Even if you iterate from the back, remove() is still O(N), so if half > the handlers are to be removed, your original code would be O(N**2).) ACK regarding the comment on O(N^2). The reason i implemented it like this is that i didn't want to regenerate the list at every iteration of the loop (maybe i'm unduly worried though...). The attached patch does as you suggest but only in case there are cancelled handlers. > PS. If you want to set up a mailing list or other cleverness I can set > you up as a project admin. (I currently have all patches mailed to me > but we may want to set up a separate list for that.) I'm happy to be an admin and set up a Google Groups for this. On the other hand, tulip is supposed to become part of the standard library, right? Maybe python-dev is as a good place to discuss tulip? Your call.. I'll go ahead and commit the two trivial patches, and wait for your ACK on the updated versions of the other two. Regards, Geert -------------- next part -------------- A non-text attachment was scrubbed... Name: 0001-run-fd-callbacks-v2.patch Type: application/octet-stream Size: 3151 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: 0002-call-every-iteration-v2.patch Type: application/octet-stream Size: 2791 bytes Desc: not available URL: From jonathan at slenders.be Fri Dec 21 23:26:09 2012 From: jonathan at slenders.be (Jonathan Slenders) Date: Fri, 21 Dec 2012 23:26:09 +0100 Subject: [Python-ideas] [Python-Dev] PEP 3156 - Asynchronous IO Support Rebooted In-Reply-To: References: Message-ID: As far as I understand, "yield from" will always work, because a Future object can act like an iterator, and you can delegate your own generator to this iterator at the place of "yield from". "yield" only works if the parameter behind yield is already a Future object. Right Guido? In case of sleep, sleep could be implemented to return a Future object. 
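(For illustration, a rough sketch of that mechanism -- not tulip's actual code, all names assumed -- where the Future's __iter__ is what makes "yield from future" work:)

    class FutureSketch:
        # Just enough of a Future to show why "yield from" works on it.
        def __init__(self):
            self._done = False
            self._result = None

        def set_result(self, result):
            self._result = result
            self._done = True

        def result(self):
            return self._result

        def __iter__(self):
            if not self._done:
                yield self            # hands the future itself to the scheduler
            return self.result()      # becomes the value of the 'yield from'

A sleep() built this way would just create such a future and ask the event loop to call set_result() after the delay.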
2012/12/21 Laurens Van Houtven <_ at lvh.cc> > A generic comment on yield from APIs that I'm sure has been discussed in > some e-mail I missed: is there an obvious way to know up front whether > something needs to be yielded or yield frommed? In twisted, which is what > I'm used to it's all deferreds; but here a future's yield from but sleep's > yield? > > > > On Fri, Dec 21, 2012 at 8:09 PM, Guido van Rossum wrote: > >> On Fri, Dec 21, 2012 at 11:06 AM, Jesse Noller wrote: >> > I really do like tulip as the name. It's quite pretty. >> >> I chose it because Twisted and Tornado both start with T. But those >> have kind of dark associations; I wanted to offset that with something >> lighter. (OTOH we could use a black tulip as a logo. :-) >> >> Regardless, it's not the kind of name we tend to use for the stdlib. >> It'll probably end up being asynclib or something... >> >> -- >> --Guido van Rossum (python.org/~guido) >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> http://mail.python.org/mailman/listinfo/python-ideas >> > > > > -- > cheers > lvh > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jonathan at slenders.be Fri Dec 21 23:21:05 2012 From: jonathan at slenders.be (Jonathan Slenders) Date: Fri, 21 Dec 2012 23:21:05 +0100 Subject: [Python-ideas] An async facade? In-Reply-To: References: Message-ID: Just read through the PEP3156. It's interesting to see. (I had no idea that yield from would return the result of the generator. It's clever, given that at this point it behaves different than a normal 'yield'.) One question. Why does @coroutine not convert the generator into a Future object right away? Just like @defer.inlineCallbacks in Twisted. This has the advantage that calling the function would simply start the coroutine. The point of my 'await' experiment was that I could do the following: >>> def do_something(): >>> result = await "query" # Query could be a Task object. >>> return result >>> do_something() Task('do_something') # (And there it starts executing) It's very personal, but I find it nicer to see the name of the called function as a Future instead of seeing a generator. Technically, coroutines and generators may be the same, but normally you wouldn't write a for-loop over a coroutine, and you can't make a Future of -say- an xrange-generator. And when not calling from another coroutine (like from the global scope during start-up), it's also a little more work to turn the generator into a Future every time. Here, "await" does what "yield" does. If you automatically turn coroutines into a Future object when calling, you'll never need a "yield from" in this case. I agree that "await" would be redundant, but somehow, if we had a hint to the interpreter that it would turn generator functions into Future objects during calling, that would be nice. I'm happy to get convinced otherwise. :) Jonathan 2012/12/21 Jonathan Slenders > Thank you, Guido! I didn't know about this PEP, but it looks interesting. > I'll try to find some spare time this weekend to read through the PEP, > maybe giving some feedback. > > Cheers! > > > > 2012/12/21 Guido van Rossum > >> On Thu, Dec 20, 2012 at 3:34 PM, Jonathan Slenders >> wrote: >> > So, the difference is still that the "await" proposal makes the @async >> > decorator implicit. 
I'm still in favor of this because in asynchronous >> code, >> > you can have really many functions with this decorator. And if someone >> > forgets about that, getting a generator object instead of a Future is >> quite >> > different in semantics. >> >> Carefully read PEP 3156, and the tulip implementation: >> http://code.google.com/p/tulip/source/browse/tulip/tasks.py . The >> @coroutine decorator is technically redundant when you use yield from. >> >> -- >> --Guido van Rossum (python.org/~guido) >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Sat Dec 22 00:07:51 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 22 Dec 2012 09:07:51 +1000 Subject: [Python-ideas] [Python-Dev] PEP 3156 - Asynchronous IO Support Rebooted In-Reply-To: References: Message-ID: We were tentatively calling it "concurrent.eventloop" at the 2011 language summit. -- Sent from my phone, thus the relative brevity :) -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Sat Dec 22 01:45:53 2012 From: guido at python.org (Guido van Rossum) Date: Fri, 21 Dec 2012 16:45:53 -0800 Subject: [Python-ideas] Tulip patches In-Reply-To: References: Message-ID: On Fri, Dec 21, 2012 at 1:59 PM, Geert Jansen wrote: > On Fri, Dec 21, 2012 at 4:57 PM, Guido van Rossum wrote: > >> This is a fine place, but you would make my life even easier by >> uploading the patches to codereview.appspot.com, so I can review them >> and send comments in-line. > > I tried to get Tulip added as a new repository there, but i'm probably > doing something wrong.. In the mean time i'm sending my updated > patches below.. Yeah, sorry, the upload form is not to be used. You should use the upload.py utility instead: https://codereview.appspot.com/static/upload.py >> I've given you checkin permissions. Please send a contributor form to >> the PSF (http://www.python.org/psf/contrib/contrib-form/). > > Done! > >>> 0001-run-fd-callbacks.patch > [...] >> Interesting. Go ahead and submit. > [from your other email] >> Whoa! I just figured out the problem. You don't have to run the ready >> queue twice. You just have to set the poll timeout to 0 if there's >> anything in the ready queue. Please send me an updated patch before >> submitting. > > New patch attached. Looks good to me. Check it in! >>> 0002-call-every-iteration.patch > [...] >> There's one odd thing here: you remove cancelled everytime handlers >> *after* already scheduling them. It would seem to make more sense to >> schedule them first. Also, a faster way to do this would be >> >> self._everytime = [handler in self._everytime if not handler.cancelled] >> >> (Even if you iterate from the back, remove() is still O(N), so if half >> the handlers are to be removed, your original code would be O(N**2).) > > ACK regarding the comment on O(N^2). The reason i implemented it like > this is that i didn't want to regenerate the list at every iteration > of the loop (maybe i'm unduly worried though...). The attached patch > does as you suggest but only in case there are cancelled handlers. LG, except: - Maybe rename 'cancelled' to 'any_cancelled'. - PEP 8 conformance: [foo bar], not [ foo bar ]. You can check it in after fixing those issues. >> PS. If you want to set up a mailing list or other cleverness I can set >> you up as a project admin. (I currently have all patches mailed to me >> but we may want to set up a separate list for that.) 
> > I'm happy to be an admin and set up a Google Groups for this. Made you an admin. Go ahead. > On the > other hand, tulip is supposed to become part of the standard library, > right? Maybe python-dev is as a good place to discuss tulip? Your > call.. I think it's too soon to flood python-dev with every little detail (though I just did post there about the PEP). > I'll go ahead and commit the two trivial patches, and wait for your > ACK on the updated versions of the other two. Thanks! -- --Guido van Rossum (python.org/~guido) From guido at python.org Sat Dec 22 01:50:06 2012 From: guido at python.org (Guido van Rossum) Date: Fri, 21 Dec 2012 16:50:06 -0800 Subject: [Python-ideas] An async facade? In-Reply-To: References: Message-ID: On Fri, Dec 21, 2012 at 2:21 PM, Jonathan Slenders wrote: > Just read through the PEP3156. > It's interesting to see. (I had no idea that yield from would return the > result of the generator. It's clever, given that at this point it behaves > different than a normal 'yield'.) > > One question. Why does @coroutine not convert the generator into a Future > object right away? Because once it is a Future, the scheduler has to get involved every time it yields, even if the yield doesn't do any I/O but just transfers control to a "subroutine". This is hard to get your head around, but it is worth it. > Just like @defer.inlineCallbacks in Twisted. This has the advantage that > calling the function would simply start the coroutine. But it would be much slower. > The point of my 'await' experiment was that I could do the following: > > >>>> def do_something(): >>>> result = await "query" # Query could be a Task object. >>>> return result > >>>> do_something() > Task('do_something') > > # (And there it starts executing) Yeah, and the same works with yield from in TUlip. The @coroutine decorator is not needed. > It's very personal, but I find it nicer to see the name of the called > function as a Future instead of seeing a generator. Technically, coroutines > and generators may be the same, but normally you wouldn't write a for-loop > over a coroutine, and you can't make a Future of -say- an xrange-generator. > And when not calling from another coroutine (like from the global scope > during start-up), it's also a little more work to turn the generator into a > Future every time. > > Here, "await" does what "yield" does. If you automatically turn coroutines > into a Future object when calling, you'll never need a "yield from" in this > case. I agree that "await" would be redundant, but somehow, if we had a hint > to the interpreter that it would turn generator functions into Future > objects during calling, that would be nice. > > I'm happy to get convinced otherwise. :) It's water under the bridge. We have PEP 380 in Python 3.3. I don't want to change the language again in 3.4. Maybe after that we can reconsider. -- --Guido van Rossum (python.org/~guido) From guido at python.org Sat Dec 22 02:02:09 2012 From: guido at python.org (Guido van Rossum) Date: Fri, 21 Dec 2012 17:02:09 -0800 Subject: [Python-ideas] [Python-Dev] PEP 3156 - Asynchronous IO Support Rebooted In-Reply-To: References: Message-ID: On Fri, Dec 21, 2012 at 1:04 PM, Laurens Van Houtven <_ at lvh.cc> wrote: > Looks reasonable to me :) Comments: > > create_transport "combines" a transport and a protocol. Is that process > reversible? 
that might seem like an exotic thing (and I guess it kind of > is), but I've wanted this e.g for websockets, and I guess there's a few > other cases where it could be useful :) If you really need this, it's probably best to start out doing this as a nonstandard extension of an implementation. The current *implementation* makes it simple enough, but I don't think it's worth complicating the PEP. Working code might convince me otherwise. > eof_received on protocols seems unusual. What's the rationale? Well how else would you indicate that the other end did a half-close (in Twisted terminology)? You can't call connection_lost() because you might still want to write more. E.g. this is how HTTP servers work if there's no Content-length or chunked encoding on a request body: they read until EOF, then do their thing and write the response. > I know we disagree that callbacks (of the line_received variety) are a good > idea for blocking IO (I think we should have universal protocol > implementations), but can we agree that they're what we want for tulip? If > so, I can try to figure out a way to get them to fit together :) I'm > assuming that this means you'd like protocols and transports in this PEP? Sorry, I have no idea what you're talking about. Can you clarify? I do know that the PEP is weakest in specifying how a coroutine can implement a transport. However my plans are clear: ild the old tulip code there's a BufferedReader; somehow the coroutine will receive a "stdin" and a "stdout" where the "stdin" is a BufferedReader, which has methods like read(), readline() etc. which return Futures and must be invoked using yield from; and "stdout" is a transport, which has write() and friends that don't return anything but just buffer stuff and start the I/O asynchronous (and may try to slow down the protocol by calling its pause() method). > A generic comment on yield from APIs that I'm sure has been discussed in > some e-mail I missed: is there an obvious way to know up front whether > something needs to be yielded or yield frommed? In twisted, which is what > I'm used to it's all deferreds; but here a future's yield from but sleep's > yield? In PEP 3156 conformant code you're supposed always to use 'yield from'. The only time you see a bare yield is when it's part of the implementation's internals. (However I think tulip actually will handle a yield the same way as a yield from, except that it's slower because it makes a roundtrip to the scheduler, a.k.a. trampoline.) > Will comment more as I keep reading I'm sure :) Please do! -- --Guido van Rossum (python.org/~guido) From guido at python.org Sat Dec 22 02:03:26 2012 From: guido at python.org (Guido van Rossum) Date: Fri, 21 Dec 2012 17:03:26 -0800 Subject: [Python-ideas] [Python-Dev] PEP 3156 - Asynchronous IO Support Rebooted In-Reply-To: References: Message-ID: On Fri, Dec 21, 2012 at 2:26 PM, Jonathan Slenders wrote: > As far as I understand, "yield from" will always work, because a Future > object can act like an iterator, and you can delegate your own generator to > this iterator at the place of "yield from". > "yield" only works if the parameter behind yield is already a Future object. > Right Guido? Correct! Sounds like you got it now. That's the magic of yield from.. > In case of sleep, sleep could be implemented to return a Future object. 
It does; in tulip/futures.py: def sleep(when, result=None): future = Future() future._event_loop.call_later(when, future.set_result, result) return future -- --Guido van Rossum (python.org/~guido) From jstpierre at mecheye.net Sat Dec 22 02:13:48 2012 From: jstpierre at mecheye.net (Jasper St. Pierre) Date: Fri, 21 Dec 2012 20:13:48 -0500 Subject: [Python-ideas] An async facade? In-Reply-To: References: Message-ID: On Fri, Dec 21, 2012 at 7:50 PM, Guido van Rossum wrote: > > It's water under the bridge. We have PEP 380 in Python 3.3. I don't > want to change the language again in 3.4. Maybe after that we can > reconsider. One thing I'll say is that I think the coroutine decorator should convert something like: @coroutine def blah(): return "result" into the generator equivalent. You can do a syntax hack with: @coroutine def blah(): if 0: yield return "result" but that feels bad. This sort of bug may seem unlikely, but a user may hit it if they're commenting out code, Maybe a generic @force_generator decorator might be useful... -- Jasper -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Sat Dec 22 02:16:12 2012 From: guido at python.org (Guido van Rossum) Date: Fri, 21 Dec 2012 17:16:12 -0800 Subject: [Python-ideas] An async facade? In-Reply-To: References: Message-ID: On Fri, Dec 21, 2012 at 5:13 PM, Jasper St. Pierre wrote: > > > On Fri, Dec 21, 2012 at 7:50 PM, Guido van Rossum wrote: >> >> >> It's water under the bridge. We have PEP 380 in Python 3.3. I don't >> want to change the language again in 3.4. Maybe after that we can >> reconsider. > > > One thing I'll say is that I think the coroutine decorator should convert > something like: > > @coroutine > def blah(): > return "result" > > into the generator equivalent. There's a tiny part of me that says that this might hide some bugs. But mostly I agree and not doing it might make certain changes harder. I did this in NDB too. > You can do a syntax hack with: > > @coroutine > def blah(): > if 0: yield > return "result" > > but that feels bad. This sort of bug may seem unlikely, but a user may hit > it if they're commenting out code, Right. > Maybe a generic @force_generator decorator might be useful... That would be somebody else's PEP. :-) -- --Guido van Rossum (python.org/~guido) From jstpierre at mecheye.net Sat Dec 22 02:17:16 2012 From: jstpierre at mecheye.net (Jasper St. Pierre) Date: Fri, 21 Dec 2012 20:17:16 -0500 Subject: [Python-ideas] [Python-Dev] PEP 3156 - Asynchronous IO Support Rebooted In-Reply-To: References: Message-ID: On Fri, Dec 21, 2012 at 8:02 PM, Guido van Rossum wrote: ... snip ... In PEP 3156 conformant code you're supposed always to use 'yield > from'. The only time you see a bare yield is when it's part of the > implementation's internals. (However I think tulip actually will > handle a yield the same way as a yield from, except that it's slower > because it makes a roundtrip to the scheduler, a.k.a. trampoline.) > Would it be possible to fail on "yield"? Silently being slower when you forget to type a keyword is something I can imagine will creep up a lot by mistake, and I don't think it's a good idea to silently be slower when the only different is five more characters. > Will comment more as I keep reading I'm sure :) > > Please do! 
> > -- > --Guido van Rossum (python.org/~guido) > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -- Jasper -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Sat Dec 22 02:24:15 2012 From: guido at python.org (Guido van Rossum) Date: Fri, 21 Dec 2012 17:24:15 -0800 Subject: [Python-ideas] [Python-Dev] PEP 3156 - Asynchronous IO Support Rebooted In-Reply-To: References: Message-ID: On Fri, Dec 21, 2012 at 5:17 PM, Jasper St. Pierre wrote: > On Fri, Dec 21, 2012 at 8:02 PM, Guido van Rossum wrote: > > ... snip ... > >> In PEP 3156 conformant code you're supposed always to use 'yield >> from'. The only time you see a bare yield is when it's part of the >> implementation's internals. (However I think tulip actually will >> handle a yield the same way as a yield from, except that it's slower >> because it makes a roundtrip to the scheduler, a.k.a. trampoline.) > > > Would it be possible to fail on "yield"? Silently being slower when you > forget to type a keyword is something I can imagine will creep up a lot by > mistake, and I don't think it's a good idea to silently be slower when the > only different is five more characters. That's also a possibility. If someone can figure out a patch that would be great. -- --Guido van Rossum (python.org/~guido) From ncoghlan at gmail.com Sat Dec 22 05:46:39 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 22 Dec 2012 14:46:39 +1000 Subject: [Python-ideas] PEP 3156 feedback: wait_one vs par vs concurrent.futures.wait Message-ID: I figure python-ideas is still the best place for PEP 3156 feedback - I think it's being revised too heavily for in-depth discussion on python-dev to be a good idea, and I think spinning out a separate list would lose too many people that are interested-but-not-enough-to-subscribe-to-yet-another-mailing-list (including me). The current draft of the PEP suggests the use of par() for the barrier operation (waiting for all futures and coroutines in a collection to be ready), while tentatively suggesting wait_one() as the API for waiting for the first completed operation in a collection. That inconsistency is questionable all by itself, but there's a greater stdlib level inconsistency that I find more concerning The corresponding blocking API in concurrent.futures is the module level "wait" function, which accepts a "return_when" parameter, with the permitted values FIRST_COMPLETED, FIRST_EXCEPTION and ALL_COMPLETED (the default). In the case where everything succeeds, FIRST_EXCEPTION is the same as ALL_COMPLETED. This function also accepts a timeout which allows the operation to finish early if the operations take too long. This flexibility also leads to a difference in the structure of the return type: concurrent.futures.wait always returns a pair of sets, with the first set being those futures which completed, while the second contains those which remaining incomplete at the time the call returned. It seems to me that this "wait" API can be applied directly to the equivalent problems in the async space, and, accordingly, *should* be applied so that the synchronous and asynchronous APIs remain as consistent as possible. 
The low level equivalent to par() would be: incomplete = complete, incomplete = yield from tulip.wait(incomplete) assert not incomplete # Without a timeout, everything should complete for f in complete: # Handle the completed operations Limiting the maximum execution time of any task to 10 seconds is straightforward: incomplete = complete, incomplete = yield from tulip.wait(incomplete, timeout=10) for f in incomplete: f.cancel() # Took too long, kill it for f in complete: # Handle the completed operations The low level equivalent to the wait_one() example would become: incomplete = while incomplete: complete, incomplete = yield from tulip.wait(incomplete, return_when=FIRST_COMPLETED) for f in complete: # Handle the completed operations par() becomes easy to define as a coroutine: @coroutine def par(fs): complete, incomplete = yield from tulip.wait(fs, return_when=FIRST_EXCEPTION) for f in incomplete: f.cancel() # Something must have failed, so cancel the rest # If something failed, calling f.result() will raise that exception return [f.result() for f in complete] Defining wait_one() is also straightforward (although it isn't clearly superior to just using the underlying API directly): @coroutine def wait_one(fs): complete, incomplete = yield from tulip.wait(fs, return_when=FIRST_COMPLETED) return complete.pop() The async equivalent to "as_completed" under this scheme is far more interesting, as it would be an iterator that produces coroutines: def as_completed(fs): incomplete = fs while incomplete: # Phase 1 of the loop, we yield a coroutine that actually starts operations running @coroutine def _wait_for_some(): nonlocal complete, incomplete complete, incomplete = yield from tulip.wait(fs, return_when=FIRST_COMPLETED) return complete.pop().result() yield _wait_for_some() # Phase 2 of the loop, we pass back the already complete operations while complete: # Note this use case for @coroutine *forcing* objects to behave like a generator, # as well as exploiting the ability to avoid trips around the event loop @coroutine def _next_result(): return complete.pop().result() yield _next_result() # This is almost as easy to use as the synchronous equivalent, the only difference # is the use of "yield from f" instead of the synchronous "f.result()" for f in as_completed(fs): next = yield from f Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From guido at python.org Sat Dec 22 06:17:07 2012 From: guido at python.org (Guido van Rossum) Date: Fri, 21 Dec 2012 21:17:07 -0800 Subject: [Python-ideas] PEP 3156 feedback: wait_one vs par vs concurrent.futures.wait In-Reply-To: References: Message-ID: On Fri, Dec 21, 2012 at 8:46 PM, Nick Coghlan wrote: > I figure python-ideas is still the best place for PEP 3156 feedback - > I think it's being revised too heavily for in-depth discussion on > python-dev to be a good idea, and I think spinning out a separate list > would lose too many people that are > interested-but-not-enough-to-subscribe-to-yet-another-mailing-list > (including me). > > The current draft of the PEP suggests the use of par() for the barrier > operation (waiting for all futures and coroutines in a collection to > be ready), while tentatively suggesting wait_one() as the API for > waiting for the first completed operation in a collection. 
That > inconsistency is questionable all by itself, but there's a greater > stdlib level inconsistency that I find more concerning > > The corresponding blocking API in concurrent.futures is the module > level "wait" function, which accepts a "return_when" parameter, with > the permitted values FIRST_COMPLETED, FIRST_EXCEPTION and > ALL_COMPLETED (the default). In the case where everything succeeds, > FIRST_EXCEPTION is the same as ALL_COMPLETED. This function also > accepts a timeout which allows the operation to finish early if the > operations take too long. > > This flexibility also leads to a difference in the structure of the > return type: concurrent.futures.wait always returns a pair of sets, > with the first set being those futures which completed, while the > second contains those which remaining incomplete at the time the call > returned. > > It seems to me that this "wait" API can be applied directly to the > equivalent problems in the async space, and, accordingly, *should* be > applied so that the synchronous and asynchronous APIs remain as > consistent as possible. You've convinced me. I've never used the wait() and as_completed() APIs in c.f, but you're right that with the exception of requiring 'yield from' they can be carried over exactly, and given that we're doing the same thing with Future, this is eminently reasonable. I may not get to implementing these for two weeks (I'll be traveling without a computer) but they will not be forgotten. --Guido > The low level equivalent to par() would be: > > incomplete = > complete, incomplete = yield from tulip.wait(incomplete) > assert not incomplete # Without a timeout, everything should complete > for f in complete: > # Handle the completed operations > > Limiting the maximum execution time of any task to 10 seconds is > straightforward: > > incomplete = > complete, incomplete = yield from tulip.wait(incomplete, timeout=10) > for f in incomplete: > f.cancel() # Took too long, kill it > for f in complete: > # Handle the completed operations > > The low level equivalent to the wait_one() example would become: > > incomplete = > while incomplete: > complete, incomplete = yield from tulip.wait(incomplete, > return_when=FIRST_COMPLETED) > for f in complete: > # Handle the completed operations > > par() becomes easy to define as a coroutine: > > @coroutine > def par(fs): > complete, incomplete = yield from tulip.wait(fs, > return_when=FIRST_EXCEPTION) > for f in incomplete: > f.cancel() # Something must have failed, so cancel the rest > # If something failed, calling f.result() will raise that exception > return [f.result() for f in complete] > > Defining wait_one() is also straightforward (although it isn't clearly > superior to just > using the underlying API directly): > > @coroutine > def wait_one(fs): > complete, incomplete = yield from tulip.wait(fs, > return_when=FIRST_COMPLETED) > return complete.pop() > > The async equivalent to "as_completed" under this scheme is far more > interesting, as it would be an iterator that produces coroutines: > > def as_completed(fs): > incomplete = fs > while incomplete: > # Phase 1 of the loop, we yield a coroutine that actually > starts operations running > @coroutine > def _wait_for_some(): > nonlocal complete, incomplete > complete, incomplete = yield from tulip.wait(fs, > return_when=FIRST_COMPLETED) > return complete.pop().result() > yield _wait_for_some() > # Phase 2 of the loop, we pass back the already complete operations > while complete: > # Note this use case for @coroutine 
*forcing* objects > to behave like a generator, > # as well as exploiting the ability to avoid trips > around the event loop > @coroutine > def _next_result(): > return complete.pop().result() > yield _next_result() > > # This is almost as easy to use as the synchronous equivalent, the > only difference > # is the use of "yield from f" instead of the synchronous "f.result()" > for f in as_completed(fs): > next = yield from f > > Cheers, > Nick. > > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas -- --Guido van Rossum (python.org/~guido) From guido at python.org Sat Dec 22 07:20:12 2012 From: guido at python.org (Guido van Rossum) Date: Fri, 21 Dec 2012 22:20:12 -0800 Subject: [Python-ideas] PEP 3156 feedback: wait_one vs par vs concurrent.futures.wait In-Reply-To: References: Message-ID: On Fri, Dec 21, 2012 at 9:17 PM, Guido van Rossum wrote: > On Fri, Dec 21, 2012 at 8:46 PM, Nick Coghlan wrote: >> I figure python-ideas is still the best place for PEP 3156 feedback - >> I think it's being revised too heavily for in-depth discussion on >> python-dev to be a good idea, and I think spinning out a separate list >> would lose too many people that are >> interested-but-not-enough-to-subscribe-to-yet-another-mailing-list >> (including me). >> >> The current draft of the PEP suggests the use of par() for the barrier >> operation (waiting for all futures and coroutines in a collection to >> be ready), while tentatively suggesting wait_one() as the API for >> waiting for the first completed operation in a collection. That >> inconsistency is questionable all by itself, but there's a greater >> stdlib level inconsistency that I find more concerning >> >> The corresponding blocking API in concurrent.futures is the module >> level "wait" function, which accepts a "return_when" parameter, with >> the permitted values FIRST_COMPLETED, FIRST_EXCEPTION and >> ALL_COMPLETED (the default). In the case where everything succeeds, >> FIRST_EXCEPTION is the same as ALL_COMPLETED. This function also >> accepts a timeout which allows the operation to finish early if the >> operations take too long. >> >> This flexibility also leads to a difference in the structure of the >> return type: concurrent.futures.wait always returns a pair of sets, >> with the first set being those futures which completed, while the >> second contains those which remaining incomplete at the time the call >> returned. >> >> It seems to me that this "wait" API can be applied directly to the >> equivalent problems in the async space, and, accordingly, *should* be >> applied so that the synchronous and asynchronous APIs remain as >> consistent as possible. > > You've convinced me. I've never used the wait() and as_completed() > APIs in c.f, but you're right that with the exception of requiring > 'yield from' they can be carried over exactly, and given that we're > doing the same thing with Future, this is eminently reasonable. > > I may not get to implementing these for two weeks (I'll be traveling > without a computer) but they will not be forgotten. I did update the PEP. There are some questions about details; e.g. I think the 'fs' argument should allow a mixture of Futures and coroutines (the latter will be wrapped Tasks) and the sets returned by wait() should contain Futures and Tasks. 
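(Illustrative only -- a call site under that interpretation might look like this, with the coroutines and the future being stand-in names:)

    fs = {some_coroutine(), another_coroutine(), some_future}
    complete, incomplete = yield from tulip.wait(fs, timeout=10)
    # complete/incomplete would then hold Tasks (wrapping the two coroutine
    # objects) plus the original Future, split by completion state.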
You propose that as_completed() returns an iterator whose items are coroutines; why not Futures? (They're more versatile even if slightly slower that coroutines.) I can sort of see the reasoning but want to tease out whether you meant it that way. Also, we can't have __next__() raise TimeoutError, since it never blocks; it will have to be the coroutine (or Future) returned by __next__(). > --Guido > >> The low level equivalent to par() would be: >> >> incomplete = >> complete, incomplete = yield from tulip.wait(incomplete) >> assert not incomplete # Without a timeout, everything should complete >> for f in complete: >> # Handle the completed operations >> >> Limiting the maximum execution time of any task to 10 seconds is >> straightforward: >> >> incomplete = >> complete, incomplete = yield from tulip.wait(incomplete, timeout=10) >> for f in incomplete: >> f.cancel() # Took too long, kill it >> for f in complete: >> # Handle the completed operations >> >> The low level equivalent to the wait_one() example would become: >> >> incomplete = >> while incomplete: >> complete, incomplete = yield from tulip.wait(incomplete, >> return_when=FIRST_COMPLETED) >> for f in complete: >> # Handle the completed operations >> >> par() becomes easy to define as a coroutine: >> >> @coroutine >> def par(fs): >> complete, incomplete = yield from tulip.wait(fs, >> return_when=FIRST_EXCEPTION) >> for f in incomplete: >> f.cancel() # Something must have failed, so cancel the rest >> # If something failed, calling f.result() will raise that exception >> return [f.result() for f in complete] >> >> Defining wait_one() is also straightforward (although it isn't clearly >> superior to just >> using the underlying API directly): >> >> @coroutine >> def wait_one(fs): >> complete, incomplete = yield from tulip.wait(fs, >> return_when=FIRST_COMPLETED) >> return complete.pop() >> >> The async equivalent to "as_completed" under this scheme is far more >> interesting, as it would be an iterator that produces coroutines: >> >> def as_completed(fs): >> incomplete = fs >> while incomplete: >> # Phase 1 of the loop, we yield a coroutine that actually >> starts operations running >> @coroutine >> def _wait_for_some(): >> nonlocal complete, incomplete >> complete, incomplete = yield from tulip.wait(fs, >> return_when=FIRST_COMPLETED) >> return complete.pop().result() >> yield _wait_for_some() >> # Phase 2 of the loop, we pass back the already complete operations >> while complete: >> # Note this use case for @coroutine *forcing* objects >> to behave like a generator, >> # as well as exploiting the ability to avoid trips >> around the event loop >> @coroutine >> def _next_result(): >> return complete.pop().result() >> yield _next_result() >> >> # This is almost as easy to use as the synchronous equivalent, the >> only difference >> # is the use of "yield from f" instead of the synchronous "f.result()" >> for f in as_completed(fs): >> next = yield from f >> >> Cheers, >> Nick. 
>> >> -- >> Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> http://mail.python.org/mailman/listinfo/python-ideas > > > > -- > --Guido van Rossum (python.org/~guido) -- --Guido van Rossum (python.org/~guido) From ncoghlan at gmail.com Sat Dec 22 09:04:58 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 22 Dec 2012 18:04:58 +1000 Subject: [Python-ideas] PEP 3156 feedback: wait_one vs par vs concurrent.futures.wait In-Reply-To: References: Message-ID: On Sat, Dec 22, 2012 at 4:20 PM, Guido van Rossum wrote: > I did update the PEP. There are some questions about details; e.g. I > think the 'fs' argument should allow a mixture of Futures and > coroutines (the latter will be wrapped Tasks) and the sets returned by > wait() should contain Futures and Tasks. Yes, I think I wrote my examples that way, even though I didn't say that in the text. > You propose that > as_completed() returns an iterator whose items are coroutines; why not > Futures? (They're more versatile even if slightly slower that > coroutines.) I can sort of see the reasoning but want to tease out > whether you meant it that way. I deliberately chose to return coroutines. My rationale is to be able to handle the case where multiple operations become ready without having to make multiple trips around the event loop by having the iterator switch between two modes: when the complete set is empty, it yields a coroutine that calls wait and then returns the first complete future, while when there are already complete futures available, it yields a coroutine that just returns one of them immediately. It's really the same rationale as that for having @coroutine not automatically wrap things in Task - if we can avoid the event loop in cases that don't actually need to wait for an event, that's a good thing. > Also, we can't have __next__() raise > TimeoutError, since it never blocks; it will have to be the coroutine > (or Future) returned by __next__(). Yeah, any exceptions should happen at the yield from call inside the loop. I *think* my implementation achieves that (since the coroutine instances it creates are passed out to the for loop for further processing), but it's quite possible I missed something. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Sat Dec 22 10:14:59 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 22 Dec 2012 19:14:59 +1000 Subject: [Python-ideas] Async context managers and iterators with tulip Message-ID: On Sat, Dec 22, 2012 at 4:17 PM, guido.van.rossum wrote: > +- We might introduce explicit locks, though these will be a bit of a > + pain to use, as we can't use the ``with lock: block`` syntax > + (because to wait for a lock we'd have to use ``yield from``, which > + the ``with`` statement can't do). Actually, I just realised that the following can work if the async lock is defined appropriately: with yield from async_lock: ... The secret is that async_lock would need to be a coroutine rather than a context manager. *Calling* the coroutine would acquire the lock (potentially registering a callback that is scheduled when the lock is released) and return a context manager that released the lock. The async_lock itself wouldn't be a context manager, so you'd get an immediate error if you left out the "yield from". 
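(To make that concrete, here is a rough, purely illustrative sketch -- not tulip code; ``coroutine`` and ``Future`` are assumed to behave as in the PEP draft:)

    import collections
    from tulip import coroutine, Future    # assumed import locations

    _locked = False
    _waiters = collections.deque()

    class _Releaser:
        # Context manager handed back once the lock is held.
        def __enter__(self):
            return self
        def __exit__(self, *exc_info):
            global _locked
            if _waiters:
                _waiters.popleft().set_result(None)   # hand ownership straight to the next waiter
            else:
                _locked = False
            return False

    @coroutine
    def async_lock():
        global _locked
        if _locked:
            fut = Future()
            _waiters.append(fut)
            yield from fut      # suspended until a releaser wakes us; ownership is transferred
        else:
            _locked = True
        return _Releaser()

    # usage, with the parentheses the current grammar requires:
    #     with (yield from async_lock()):
    #         ... critical section ...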
We'd be heading even further down the path of two-languages-for-the-price-of-one if we did that, though (by which I mean the fact that async code and synchronous code exist in parallel universes - one, more familiar one, where the ability to block is assumed, as is the fact that any operation may give concurrent code the chance to execute, and the universe of Twisted, tulip, et al, where possible suspension points are required to be explicitly marked in the function where they occur). Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From _ at lvh.cc Sat Dec 22 13:26:44 2012 From: _ at lvh.cc (Laurens Van Houtven) Date: Sat, 22 Dec 2012 13:26:44 +0100 Subject: [Python-ideas] Async context managers and iterators with tulip In-Reply-To: References: Message-ID: I can't quite tell by the wording if you consider two-languages-for-the-price-of-one a good thing or a bad thing; but I can tell you that at least in Twisted, explicit suspension points have been a definite boon :) While it may lead to issues in some things (e.g. new users using blocking urllib calls in a callback), I find the net result much easier to read and reason about. cheers, lvh -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Sat Dec 22 13:57:40 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 22 Dec 2012 22:57:40 +1000 Subject: [Python-ideas] Async context managers and iterators with tulip In-Reply-To: References: Message-ID: On Sat, Dec 22, 2012 at 10:26 PM, Laurens Van Houtven <_ at lvh.cc> wrote: > I can't quite tell by the wording if you consider > two-languages-for-the-price-of-one a good thing or a bad thing; but I can > tell you that at least in Twisted, explicit suspension points have been a > definite boon :) While it may lead to issues in some things (e.g. new users > using blocking urllib calls in a callback), I find the net result much > easier to read and reason about. On balance, I consider it better than offering only greenlet-style implicit switching (which is effectively equivalent to preemptive threading, since any function call or operator may suspend the task). I'm also a lot happier about it since realising that the model of emitting futures and using "yield from f" where synchronous code would use "f.result()" helps unify the two worlds. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From guido at python.org Sat Dec 22 16:54:55 2012 From: guido at python.org (Guido van Rossum) Date: Sat, 22 Dec 2012 07:54:55 -0800 Subject: [Python-ideas] PEP 3156 feedback: wait_one vs par vs concurrent.futures.wait In-Reply-To: References: Message-ID: On Sat, Dec 22, 2012 at 12:04 AM, Nick Coghlan wrote: > On Sat, Dec 22, 2012 at 4:20 PM, Guido van Rossum wrote: >> I did update the PEP. There are some questions about details; e.g. I >> think the 'fs' argument should allow a mixture of Futures and >> coroutines (the latter will be wrapped Tasks) and the sets returned by >> wait() should contain Futures and Tasks. > > Yes, I think I wrote my examples that way, even though I didn't say > that in the text. Good. >> You propose that >> as_completed() returns an iterator whose items are coroutines; why not >> Futures? (They're more versatile even if slightly slower that >> coroutines.) I can sort of see the reasoning but want to tease out >> whether you meant it that way. > > I deliberately chose to return coroutines. 
My rationale is to be able > to handle the case where multiple operations become ready without > having to make multiple trips around the event loop by having the > iterator switch between two modes: when the complete set is empty, it > yields a coroutine that calls wait and then returns the first complete > future, while when there are already complete futures available, it > yields a coroutine that just returns one of them immediately. It's > really the same rationale as that for having @coroutine not > automatically wrap things in Task - if we can avoid the event loop in > cases that don't actually need to wait for an event, that's a good > thing. I think I see it now. The first item yielded is the simplest thing that can be used with yield-from, i.e. a coroutine. Then if multiple futures are ready at once, you return an item of the same type, i.e. a coroutine. This is essentially wrapping a Future in a coroutine! If we could live with the items being alternatingly coroutines and Futures, we could just return the Future in this case. BTW, yield from need not go to the scheduler if the Future is already done -- the Future,__iter__ method should be: def __iter__(self): if not self.done(): yield self # This tells Task to wait for completion. return self.result() # May raise too. (I forgot this previously.) >> Also, we can't have __next__() raise >> TimeoutError, since it never blocks; it will have to be the coroutine >> (or Future) returned by __next__(). > > Yeah, any exceptions should happen at the yield from call inside the > loop. I *think* my implementation achieves that (since the coroutine > instances it creates are passed out to the for loop for further > processing), but it's quite possible I missed something. It'll come out in implementation (in two weeks, maybe). -- --Guido van Rossum (python.org/~guido) From guido at python.org Sat Dec 22 17:01:19 2012 From: guido at python.org (Guido van Rossum) Date: Sat, 22 Dec 2012 08:01:19 -0800 Subject: [Python-ideas] Async context managers and iterators with tulip In-Reply-To: References: Message-ID: On Sat, Dec 22, 2012 at 1:14 AM, Nick Coghlan wrote: > On Sat, Dec 22, 2012 at 4:17 PM, guido.van.rossum > wrote: >> +- We might introduce explicit locks, though these will be a bit of a >> + pain to use, as we can't use the ``with lock: block`` syntax >> + (because to wait for a lock we'd have to use ``yield from``, which >> + the ``with`` statement can't do). > > Actually, I just realised that the following can work if the async > lock is defined appropriately: > > with yield from async_lock: > ... Syntactically you'd have to say with (yield from async_lock): .... > The secret is that async_lock would need to be a coroutine rather than > a context manager. *Calling* the coroutine would acquire the lock > (potentially registering a callback that is scheduled when the lock is > released) and return a context manager that released the lock. The > async_lock itself wouldn't be a context manager, so you'd get an > immediate error if you left out the "yield from". Very nice. 
> We'd be heading even further down the path of > two-languages-for-the-price-of-one if we did that, though (by which I > mean the fact that async code and synchronous code exist in parallel > universes - one, more familiar one, where the ability to block is > assumed, as is the fact that any operation may give concurrent code > the chance to execute, and the universe of Twisted, tulip, et al, > where possible suspension points are required to be explicitly marked > in the function where they occur). It's inevitable that some patterns work well together while others don't. I see no big philosophical problem with this. Pragmatically, we'll have plenty of places where existing stdlib modules can't be used with tulip, and the tulip-compatible upgrade will have a different API. (The trickiest part will be that the classic code, e.g. urllib, must work in any thread and cannot rely on the existence of an event loop. *Maybe* you can get by with get_event_loop().run_until_complete() but that might still depend on the default event loop policy. Food for thought.) -- --Guido van Rossum (python.org/~guido) From guido at python.org Sat Dec 22 17:03:29 2012 From: guido at python.org (Guido van Rossum) Date: Sat, 22 Dec 2012 08:03:29 -0800 Subject: [Python-ideas] Async context managers and iterators with tulip In-Reply-To: References: Message-ID: On Sat, Dec 22, 2012 at 4:57 AM, Nick Coghlan wrote: > On Sat, Dec 22, 2012 at 10:26 PM, Laurens Van Houtven <_ at lvh.cc> wrote: >> I can't quite tell by the wording if you consider >> two-languages-for-the-price-of-one a good thing or a bad thing; but I can >> tell you that at least in Twisted, explicit suspension points have been a >> definite boon :) While it may lead to issues in some things (e.g. new users >> using blocking urllib calls in a callback), I find the net result much >> easier to read and reason about. > > On balance, I consider it better than offering only greenlet-style > implicit switching (which is effectively equivalent to preemptive > threading, since any function call or operator may suspend the task). > I'm also a lot happier about it since realising that the model of > emitting futures and using "yield from f" where synchronous code would > use "f.result()" helps unify the two worlds. I wouldn't go so far as to call that unifying, but it definitely helps people transition. Still, from experience with introducing NDB's async in some internal App Engine software, it takes some getting used to even for the best of developers. But it is worth it. -- --Guido van Rossum (python.org/~guido) From andrew.svetlov at gmail.com Sat Dec 22 18:11:09 2012 From: andrew.svetlov at gmail.com (Andrew Svetlov) Date: Sat, 22 Dec 2012 19:11:09 +0200 Subject: [Python-ideas] ``with from`` statement Message-ID: Crazy idea. Guido van Rossum mentioned after working on PEP 3156 that context managers cannot use yield from statement inside __enter__ and __exit__ magic methods. Explicit call for entering and leaving context (for locking for example) is not convenient. What do you think about with from f(): do_our_work() ``with from ...` construction calls __enter_from__ generator and iterates via ``yield from`` for that. Returned value is our context manager. The same for __exit_from__ ? do``yield from`` for that and stop on StopIteration or exception. 
-- Thanks, Andrew Svetlov From guido at python.org Sat Dec 22 18:15:00 2012 From: guido at python.org (Guido van Rossum) Date: Sat, 22 Dec 2012 09:15:00 -0800 Subject: [Python-ideas] ``with from`` statement In-Reply-To: References: Message-ID: Nick already proposed "with (yield from ...): ..." Maybe in 3.4 we can tweak the syntax so the paresns are not needed. I am quite glad that we had the foresight (when we designed 'with') to make this possible. On Saturday, December 22, 2012, Andrew Svetlov wrote: > Crazy idea. > Guido van Rossum mentioned after working on PEP 3156 that context > managers cannot use > yield from statement inside __enter__ and __exit__ magic methods. > Explicit call for entering and leaving context (for locking for > example) is not convenient. > > What do you think about > > with from f(): > do_our_work() > > > ``with from ...` construction calls __enter_from__ generator and > iterates via ``yield from`` for that. > Returned value is our context manager. > > The same for __exit_from__ ? do``yield from`` for that and stop on > StopIteration or exception. > > -- > Thanks, > Andrew Svetlov > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From andrew.svetlov at gmail.com Sat Dec 22 18:22:55 2012 From: andrew.svetlov at gmail.com (Andrew Svetlov) Date: Sat, 22 Dec 2012 19:22:55 +0200 Subject: [Python-ideas] ``with from`` statement In-Reply-To: References: Message-ID: Yes, Nick's proposal is just awesome. I cannot figure out is __exit__ can be generator which use ``yield from`` also inside? On Sat, Dec 22, 2012 at 7:15 PM, Guido van Rossum wrote: > Nick already proposed "with (yield from ...): ..." > > Maybe in 3.4 we can tweak the syntax so the paresns are not needed. > > I am quite glad that we had the foresight (when we designed 'with') to make > this possible. > > > On Saturday, December 22, 2012, Andrew Svetlov wrote: >> >> Crazy idea. >> Guido van Rossum mentioned after working on PEP 3156 that context >> managers cannot use >> yield from statement inside __enter__ and __exit__ magic methods. >> Explicit call for entering and leaving context (for locking for >> example) is not convenient. >> >> What do you think about >> >> with from f(): >> do_our_work() >> >> >> ``with from ...` construction calls __enter_from__ generator and >> iterates via ``yield from`` for that. >> Returned value is our context manager. >> >> The same for __exit_from__ ? do``yield from`` for that and stop on >> StopIteration or exception. >> >> -- >> Thanks, >> Andrew Svetlov >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> http://mail.python.org/mailman/listinfo/python-ideas > > > > -- > --Guido van Rossum (python.org/~guido) -- Thanks, Andrew Svetlov From andrew.svetlov at gmail.com Sat Dec 22 18:25:32 2012 From: andrew.svetlov at gmail.com (Andrew Svetlov) Date: Sat, 22 Dec 2012 19:25:32 +0200 Subject: [Python-ideas] ``with from`` statement In-Reply-To: References: Message-ID: Python syntax looks like use of time machine day by day. I like it! On Sat, Dec 22, 2012 at 7:22 PM, Andrew Svetlov wrote: > Yes, Nick's proposal is just awesome. > > I cannot figure out is __exit__ can be generator which use ``yield > from`` also inside? 
> > On Sat, Dec 22, 2012 at 7:15 PM, Guido van Rossum wrote: >> Nick already proposed "with (yield from ...): ..." >> >> Maybe in 3.4 we can tweak the syntax so the paresns are not needed. >> >> I am quite glad that we had the foresight (when we designed 'with') to make >> this possible. >> >> >> On Saturday, December 22, 2012, Andrew Svetlov wrote: >>> >>> Crazy idea. >>> Guido van Rossum mentioned after working on PEP 3156 that context >>> managers cannot use >>> yield from statement inside __enter__ and __exit__ magic methods. >>> Explicit call for entering and leaving context (for locking for >>> example) is not convenient. >>> >>> What do you think about >>> >>> with from f(): >>> do_our_work() >>> >>> >>> ``with from ...` construction calls __enter_from__ generator and >>> iterates via ``yield from`` for that. >>> Returned value is our context manager. >>> >>> The same for __exit_from__ ? do``yield from`` for that and stop on >>> StopIteration or exception. >>> >>> -- >>> Thanks, >>> Andrew Svetlov >>> _______________________________________________ >>> Python-ideas mailing list >>> Python-ideas at python.org >>> http://mail.python.org/mailman/listinfo/python-ideas >> >> >> >> -- >> --Guido van Rossum (python.org/~guido) > > > > -- > Thanks, > Andrew Svetlov -- Thanks, Andrew Svetlov From solipsis at pitrou.net Sat Dec 22 19:26:54 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sat, 22 Dec 2012 19:26:54 +0100 Subject: [Python-ideas] ``with from`` statement References: Message-ID: <20121222192654.2775cb00@pitrou.net> On Sat, 22 Dec 2012 19:25:32 +0200 Andrew Svetlov wrote: > Python syntax looks like use of time machine day by day. I like it! Not sure I like "with yield from". How do you intend to explain that to an average programmer? Regards Antoine. From guido at python.org Sat Dec 22 20:09:22 2012 From: guido at python.org (Guido van Rossum) Date: Sat, 22 Dec 2012 11:09:22 -0800 Subject: [Python-ideas] ``with from`` statement In-Reply-To: <20121222192654.2775cb00@pitrou.net> References: <20121222192654.2775cb00@pitrou.net> Message-ID: On Sat, Dec 22, 2012 at 10:26 AM, Antoine Pitrou wrote: > On Sat, 22 Dec 2012 19:25:32 +0200 > Andrew Svetlov > wrote: >> Python syntax looks like use of time machine day by day. I like it! > > Not sure I like "with yield from". How do you intend to explain that to > an average programmer? Break it down into pieces. The general form is with : where can take many forms, including yield from we just have to handwave a bit about the priorities, but that's usually okay. People do get x = yield from It's just that currently somehow you have to surround "yield from " in an extra pair of parentheses everywhere except on the RHS of an assignment; my other pet peeve in this area is that you must write return (yield from ) (which I end up writing fairly regularly). I assume that if we can make the parens optional for assignment, we can make them optional in other places. -- --Guido van Rossum (python.org/~guido) From tjreedy at udel.edu Sat Dec 22 23:20:08 2012 From: tjreedy at udel.edu (Terry Reedy) Date: Sat, 22 Dec 2012 17:20:08 -0500 Subject: [Python-ideas] ``with from`` statement In-Reply-To: References: <20121222192654.2775cb00@pitrou.net> Message-ID: On 12/22/2012 2:09 PM, Guido van Rossum wrote: > On Sat, Dec 22, 2012 at 10:26 AM, Antoine Pitrou wrote: >> On Sat, 22 Dec 2012 19:25:32 +0200 >> Andrew Svetlov >> wrote: >>> Python syntax looks like use of time machine day by day. I like it! >> >> Not sure I like "with yield from". 
At the moment, that looks a bit dubious to me too. Maybe just because it is new (to me). >> How do you intend to explain that to >> an average programmer? > > Break it down into pieces. The general form is > > with : with as : with yield from x() as y: ... > where can take many forms, including > > yield from > > we just have to handwave a bit about the priorities, Too much dependence on implicit priorities makes the language more baroque and less clear. For instance, I am fine with having to parenthesize generator expressions (except in calls where it would result in doubled parens ((ge))). An explanation need more than a handwave ;-). > but that's usually okay. People do get > > x = yield from No problem because = cleanly breaks the statement. More a problem is the difference of x coming from a value yielded (or returned?) by the callee instead of a value sent by the caller, as in x = yield y. > It's just that currently somehow you have to surround "yield from > " in an extra pair of parentheses everywhere except on the RHS > of an assignment; my other pet peeve in this area is that you must > write > > return (yield from ) > > (which I end up writing fairly regularly). I can see how that seems like a nuisance. Why that omitting parens bother me less here? Perhaps because return binds the expression to location of the call in the calling expression. > I assume that if we can make the parens optional for assignment, we > can make them optional in other places. If the grammar can be written to do that sufficiently clearly, then it should be explainable to people. -- Terry Jan Reedy From tjreedy at udel.edu Sun Dec 23 00:03:55 2012 From: tjreedy at udel.edu (Terry Reedy) Date: Sat, 22 Dec 2012 18:03:55 -0500 Subject: [Python-ideas] Tkinter and tulip Message-ID: Though not mentioned much in the tulip discussion, tkinter is a third 'T' package with its own event loop. (And by the way, I associate 'tulip' with 'Floriade', with 10s of thousands of tulips in bloom. It was a +++ experience. But I suppose it is too cute for Python ;-) Yesterday, tk/tkinter expert Kevin Walzer asked on python-list how to (easily) read a pipe asynchonously and post the result to a tk text widget. I don't know the answer now, but is my understanding correct that in the future a) there should be a tk loop adapter that could replace the default tulip loop and b) it would then be easy to add i/o events to the tk loop? My personal interest is whether it will some day be possible to re-write IDLE to use tulip so one could edit in an edit pane while the shell pane asynchronously waits for and displays output from a 'long' computation.* It would also be nice if ^C could be made to work better -- which is to say, take effect sooner -- by decoupling key processing from socket reading. I am thinking that IDLE could be both a simple test and showcase for the usefulness of tulip. *I currently put shell and edit windows side-by-side on my wide-screen monitor. I can imagine putting two panes in one window instead. -- Terry Jan Reedy From ncoghlan at gmail.com Sun Dec 23 06:46:41 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 23 Dec 2012 15:46:41 +1000 Subject: [Python-ideas] PEP 3156 feedback: wait_one vs par vs concurrent.futures.wait In-Reply-To: References: Message-ID: On Sun, Dec 23, 2012 at 1:54 AM, Guido van Rossum wrote: > On Sat, Dec 22, 2012 at 12:04 AM, Nick Coghlan wrote: >> I deliberately chose to return coroutines. 
My rationale is to be able >> to handle the case where multiple operations become ready without >> having to make multiple trips around the event loop by having the >> iterator switch between two modes: when the complete set is empty, it >> yields a coroutine that calls wait and then returns the first complete >> future, while when there are already complete futures available, it >> yields a coroutine that just returns one of them immediately. It's >> really the same rationale as that for having @coroutine not >> automatically wrap things in Task - if we can avoid the event loop in >> cases that don't actually need to wait for an event, that's a good >> thing. > > I think I see it now. The first item yielded is the simplest thing > that can be used with yield-from, i.e. a coroutine. Then if multiple > futures are ready at once, you return an item of the same type, i.e. a > coroutine. This is essentially wrapping a Future in a coroutine! If we > could live with the items being alternatingly coroutines and Futures, > we could just return the Future in this case. BTW, yield from > need not go to the scheduler if the Future is already done -- the > Future,__iter__ method should be: > > def __iter__(self): > if not self.done(): > yield self # This tells Task to wait for completion. > return self.result() # May raise too. > > (I forgot this previously.) And I'd missed it completely :) In that case, yeah, yielding any already completed Futures directly from as_completed() should work. The "no completed operations" case will still need a coroutine, though, as it needs to update the "complete" and "incomplete" sets inside the iterator. Since we know we're certain to hit the scheduler in that case, we may as well wrap it directly in a task so we're always returning some kind of future. The impl might end up looking something like: def as_completed(fs): incomplete = fs while incomplete: # Phase 1 of the loop, we yield a Task that waits for operations @coroutine def _wait_for_some(): nonlocal complete, incomplete complete, incomplete = yield from tulip.wait(fs, return_when=FIRST_COMPLETED) return complete.pop().result() yield Task(_wait_for_some()) # Phase 2 of the loop, we pass back the already complete operations while complete: yield complete.pop() Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Sun Dec 23 06:48:00 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 23 Dec 2012 15:48:00 +1000 Subject: [Python-ideas] Async context managers and iterators with tulip In-Reply-To: References: Message-ID: On Sun, Dec 23, 2012 at 2:03 AM, Guido van Rossum wrote: > On Sat, Dec 22, 2012 at 4:57 AM, Nick Coghlan wrote: >> On balance, I consider it better than offering only greenlet-style >> implicit switching (which is effectively equivalent to preemptive >> threading, since any function call or operator may suspend the task). >> I'm also a lot happier about it since realising that the model of >> emitting futures and using "yield from f" where synchronous code would >> use "f.result()" helps unify the two worlds. > > I wouldn't go so far as to call that unifying, but it definitely helps > people transition. Still, from experience with introducing NDB's async > in some internal App Engine software, it takes some getting used to > even for the best of developers. But it is worth it. Yes, "unify" was the wrong word - "align" would be better. Cheers, Nick. 
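As a concrete, self-contained illustration of that alignment (MiniFuture, consumer and the one-line driver are invented for this sketch; they are not tulip's real classes): with the __iter__ method quoted above, "yield from fut" reads like "fut.result()" in blocking code, and a future that is already done never touches the scheduler.

    class MiniFuture:
        _MISSING = object()

        def __init__(self):
            self._result = self._MISSING

        def done(self):
            return self._result is not self._MISSING

        def set_result(self, value):
            self._result = value

        def result(self):
            return self._result

        def __iter__(self):
            if not self.done():
                yield self          # ask the scheduler to wait for completion
            return self.result()    # a fuller version would also re-raise errors

    def consumer(fut):
        value = yield from fut      # reads like fut.result() in blocking code
        return value * 2

    ready = MiniFuture()
    ready.set_result(21)
    try:
        next(consumer(ready))       # completes without ever reaching a scheduler
    except StopIteration as stop:
        print(stop.value)           # -> 42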
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From guido at python.org Sun Dec 23 07:20:37 2012 From: guido at python.org (Guido van Rossum) Date: Sat, 22 Dec 2012 22:20:37 -0800 Subject: [Python-ideas] PEP 3156 feedback: wait_one vs par vs concurrent.futures.wait In-Reply-To: References: Message-ID: Yes, I like always returning a future. On Saturday, December 22, 2012, Nick Coghlan wrote: > On Sun, Dec 23, 2012 at 1:54 AM, Guido van Rossum > > wrote: > > On Sat, Dec 22, 2012 at 12:04 AM, Nick Coghlan > > wrote: > >> I deliberately chose to return coroutines. My rationale is to be able > >> to handle the case where multiple operations become ready without > >> having to make multiple trips around the event loop by having the > >> iterator switch between two modes: when the complete set is empty, it > >> yields a coroutine that calls wait and then returns the first complete > >> future, while when there are already complete futures available, it > >> yields a coroutine that just returns one of them immediately. It's > >> really the same rationale as that for having @coroutine not > >> automatically wrap things in Task - if we can avoid the event loop in > >> cases that don't actually need to wait for an event, that's a good > >> thing. > > > > I think I see it now. The first item yielded is the simplest thing > > that can be used with yield-from, i.e. a coroutine. Then if multiple > > futures are ready at once, you return an item of the same type, i.e. a > > coroutine. This is essentially wrapping a Future in a coroutine! If we > > could live with the items being alternatingly coroutines and Futures, > > we could just return the Future in this case. BTW, yield from > > need not go to the scheduler if the Future is already done -- the > > Future,__iter__ method should be: > > > > def __iter__(self): > > if not self.done(): > > yield self # This tells Task to wait for completion. > > return self.result() # May raise too. > > > > (I forgot this previously.) > > And I'd missed it completely :) > > In that case, yeah, yielding any already completed Futures directly > from as_completed() should work. The "no completed operations" case > will still need a coroutine, though, as it needs to update the > "complete" and "incomplete" sets inside the iterator. Since we know > we're certain to hit the scheduler in that case, we may as well wrap > it directly in a task so we're always returning some kind of future. > The impl might end up looking something like: > > def as_completed(fs): > incomplete = fs > while incomplete: > # Phase 1 of the loop, we yield a Task that waits for > operations > @coroutine > def _wait_for_some(): > nonlocal complete, incomplete > complete, incomplete = yield from tulip.wait(fs, > return_when=FIRST_COMPLETED) > return complete.pop().result() > yield Task(_wait_for_some()) > # Phase 2 of the loop, we pass back the already complete > operations > while complete: > yield complete.pop() > > Cheers, > Nick. > > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, > Australia > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Sun Dec 23 07:24:12 2012 From: guido at python.org (Guido van Rossum) Date: Sat, 22 Dec 2012 22:24:12 -0800 Subject: [Python-ideas] Tkinter and tulip In-Reply-To: References: Message-ID: I hadn't thought of Tkinter, but it is an excellent idea to see how it and tulip could integrate. 
Maybe it is possible to add Tkinter as a file descriptor to tulip? I won't have time to look into this myself for a while but would love it if someone tried this and gave feedback. --Guido On Saturday, December 22, 2012, Terry Reedy wrote: > Though not mentioned much in the tulip discussion, tkinter is a third 'T' > package with its own event loop. (And by the way, I associate 'tulip' with > 'Floriade', with 10s of thousands of tulips in bloom. It was a +++ > experience. But I suppose it is too cute for Python ;-) > > Yesterday, tk/tkinter expert Kevin Walzer asked on python-list how to > (easily) read a pipe asynchonously and post the result to a tk text widget. > I don't know the answer now, but is my understanding correct that in the > future a) there should be a tk loop adapter that could replace the default > tulip loop and b) it would then be easy to add i/o events to the tk loop? > > My personal interest is whether it will some day be possible to re-write > IDLE to use tulip so one could edit in an edit pane while the shell pane > asynchronously waits for and displays output from a 'long' computation.* It > would also be nice if ^C could be made to work better -- which is to say, > take effect sooner -- by decoupling key processing from socket reading. I > am thinking that IDLE could be both a simple test and showcase for the > usefulness of tulip. > > *I currently put shell and edit windows side-by-side on my wide-screen > monitor. I can imagine putting two panes in one window instead. > > -- > Terry Jan Reedy > > ______________________________**_________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/**mailman/listinfo/python-ideas > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From geertj at gmail.com Sun Dec 23 12:06:31 2012 From: geertj at gmail.com (Geert Jansen) Date: Sun, 23 Dec 2012 12:06:31 +0100 Subject: [Python-ideas] Async context managers and iterators with tulip In-Reply-To: References: Message-ID: On Sat, Dec 22, 2012 at 10:14 AM, Nick Coghlan wrote: [...] > We'd be heading even further down the path of > two-languages-for-the-price-of-one if we did that, though (by which I > mean the fact that async code and synchronous code exist in parallel > universes - one, more familiar one, where the ability to block is > assumed, as is the fact that any operation may give concurrent code > the chance to execute, and the universe of Twisted, tulip, et al, > where possible suspension points are required to be explicitly marked > in the function where they occur). The two languages/parallel universes (sync and asyc) is a big concern IMHO. I looked at a greenlet based program that I'm writing and i'm using call stacks that are 10 deep or so. I would need to change all these layers from the scheduler down to use yield-from to make my program async. The higher levels are typically application specific and could decide to either be sync or async. For the lower levels (e.g. transports and protocols): those are typically library code and you'd need two versions. The latter can amount to quite a bit of duplication: there's a lot of protocol code currently in the standard library. I wonder if the greenlet idea was thrown out too early. If I understand the discussion correctly, the #1 disadvantage that was identified is that calling code does not know if called code will switch or not. Therefore it doesn't know whether to lock, and where. 
What about the following (straw man) approach to fix that issue using greenlets: functions can state if they are safe with regards to switching using a decorator. The default is off (non-safe). When at some point in the call graph you need to switch, you only to this if all frames starting from the current one up to the scheduler are async-safe. This should be achievable without any language changes. Usually the upper layers in a concurrent program are connection handlers. These can be marked safe quite easily as they usually only use local stated tied to the connection and are not called from other connections. Any code that they call would need to be explicitly marked async-safe otherwise it could block. I think the straw man above is identical to the current yield-from approach in safety because there is no automatic asynchronicity. However, this approach it has the benefit that there can be one implementation of lower layers (protocols and transports) that supports both sync and async, and higher layers can use the natural calling syntax that they are currently used to. Also making a program async can be an incremental process, and you could use e.g. a sys.settrace() handler to identify spots where safe code calls into unsafe code. Regards, Geert From solipsis at pitrou.net Sun Dec 23 12:25:58 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sun, 23 Dec 2012 12:25:58 +0100 Subject: [Python-ideas] Async context managers and iterators with tulip References: Message-ID: <20121223122558.7d6c7e36@pitrou.net> On Sun, 23 Dec 2012 12:06:31 +0100 Geert Jansen wrote: > On Sat, Dec 22, 2012 at 10:14 AM, Nick Coghlan wrote: > > [...] > > We'd be heading even further down the path of > > two-languages-for-the-price-of-one if we did that, though (by which I > > mean the fact that async code and synchronous code exist in parallel > > universes - one, more familiar one, where the ability to block is > > assumed, as is the fact that any operation may give concurrent code > > the chance to execute, and the universe of Twisted, tulip, et al, > > where possible suspension points are required to be explicitly marked > > in the function where they occur). > > The two languages/parallel universes (sync and asyc) is a big concern > IMHO. I looked at a greenlet based program that I'm writing and i'm > using call stacks that are 10 deep or so. I would need to change all > these layers from the scheduler down to use yield-from to make my > program async. > > The higher levels are typically application specific and could decide > to either be sync or async. For the lower levels (e.g. transports and > protocols): those are typically library code and you'd need two > versions. The latter can amount to quite a bit of duplication: there's > a lot of protocol code currently in the standard library. Protocols written using a callback style (data_received(), etc.), as pointed by Laurens, can be used with both blocking and non-blocking coding styles. Only the transports would need to be duplicated, but that's expected. Regards Antoine. From ncoghlan at gmail.com Sun Dec 23 13:25:09 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 23 Dec 2012 22:25:09 +1000 Subject: [Python-ideas] Async context managers and iterators with tulip In-Reply-To: References: Message-ID: On Sun, Dec 23, 2012 at 9:06 PM, Geert Jansen wrote: > I wonder if the greenlet idea was thrown out too early. 
If I > understand the discussion correctly, the #1 disadvantage that was > identified is that calling code does not know if called code will > switch or not. Therefore it doesn't know whether to lock, and where. Greenlets aren't going anywhere. The thing is that "asynchronous programming" is used to describe both an execution model that's limited by the number of concurrent I/O operations rather than the number of OS level threads as well as a programming model based on cooperative (rather than preemptive) multi-threading. Greenlets are designed to provide the scaling benefits of I/O limited concurrency while continuing to use a preemptive multi-threading programming model where any operation is permitted to block the thread of execution (implicitly switching to another thread at the lowest layer). That's *wonderful* for getting the scaling benefits of the execution model without needing to rewrite a program to use a drastically different programming model. PEP 3156, on the other hand, is about providing the cooperative multi-threading *programming* model. Greenlets can't do that, because they're not intended to. However, gevent/greenlets will still benefit from the explicit asynchronous APIs in the future, as those protocols and transports will be usable by the *networking* side of gevent. And that's a ley part of the aim here - reducing the duplication of effort between gevent/Twisted/Tornado/et al by eventually allowing them to share more of the event driven protocol stacks. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From techtonik at gmail.com Sun Dec 23 17:21:43 2012 From: techtonik at gmail.com (anatoly techtonik) Date: Sun, 23 Dec 2012 19:21:43 +0300 Subject: [Python-ideas] Tree as a data structure (Was: Graph class) In-Reply-To: References: Message-ID: On Wed, Dec 19, 2012 at 7:38 PM, Jim Jewett wrote: > On 12/19/12, anatoly techtonik wrote: > > On Sun, Dec 16, 2012 at 6:41 PM, Guido van Rossum > wrote: > > >> I think of graphs and trees as patterns, not data structures. > > > In my world strings, ints and lists are 1D data types, and tree can be a > > very important 2D data structure. > > Yes; the catch is that the details of that data structure will differ > depending on the problem. Most problems do not need the fancy > algorithms -- or the extra overhead that supports them. Since a > simple tree (or graph) is easy to write, and the fiddly details are > often -- but not always -- wasted overhead, it doesn't make sense to > designate a single physical structure as "the" tree (or graph) > representation. So it stays a pattern, rather than a concrete data > structure. Right. Creating a tree structure is not the problem. The problem arise when you have to study the code or work collaboratively with other developers. It takes time to see an ordinary namedtuple in the magic of some custom made tuple subclass. But you can easily add a comment that it is a reimplementation of namedtuple and the code immediately becomes clear. With trees it is impossible to add such a comment, because there is no known reference tree type you can refer to. Making a sum out this to go from patters vs structure. Patterns and data structures are interconnected. The absence of tree definition makes it really hard to communicate about the usage, potential and outcomes or particular approach between developers. What data structure or pattern do we need for - a tree, but which tree exactly and why? 
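To make that question concrete, here is a purely illustrative sketch of the filesystem-like "reference tree" described below (File, Directory and their accessors are invented for the example, not a proposed stdlib type):

    class File:
        def __init__(self, name, hidden=False):
            self.name = name
            self.hidden = hidden

    class Directory:
        def __init__(self, name, children=()):
            self.name = name
            self.children = list(children)

        def nodes(self):
            # every child, leaves and containers alike
            return iter(self.children)

        def files(self):
            # only the leaves, matching the "directory.files()" accessor below
            return (child for child in self.children if isinstance(child, File))

    root = Directory("root", [File("a.txt"),
                              Directory("sub", [File(".cfg", hidden=True)])])
    for node in root.nodes():
        print(node.name + ("/" if isinstance(node, Directory) else ""))

The sketch only pins down the shape under discussion: leaves and containers are distinct types, both carry user-level properties, and containers expose a generic nodes() view as well as a type-specific files() view.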
> > Speaking of tree as a data structure, I assume that it has a very basic > > definition: > > > 1. tree consists of nodes > > 2. some nodes are containers for other nodes > > Are the leaves a different type, or just nodes that happen to have > zero children at the moment? For the 'reference tree' I'd choose the most common trees human beings work daily, can see and as a result - easily imagine. 1. leaves can not mutate into containers 2. container property structure is different from leaves structure, but may share elements Spoiler: This is a pattern or data structure of filesystem tree. I'd call a tree, which leaves can mutate into containers, a 'mutatable tree', and the one, where leaves are containers with 0 elements, a 'uniform tree' data structure name. A 'flexible tree` could be the better name, but it is too generic to draw a clear association to the behavior. > > 3. every node has properties > > What sort of properties? > I've meant the user level properties, not internal required for maintaining tree structure. > A single value of a given class, plus some binary flags that are > internal to the graph implementation? > I am afraid to become lost in the depths of implementation details, because it is where 2D concept jumps in. The 'reference tree' I mentioned above is a 1:1 mapping between the set of user level properties and a node. This means each container node is "assigned" one user level set of properties (the given class) and each leaf node contains another. It is the opposite to the tree, where each node can have different user class (set of properties) assigned. The 2nd dimension is the mapping between node types (leaf and container) and user level types. > A fixed set of values that occur on every node? (Possibly differing > between leaves and regular nodes?) > A fixed value (used for ordering) plus an arbitrary collection that > can vary by node? > For the 'reference tree' every leaf contains the same set of properties, each property has its own value. Every container has the different set of properties, each property has its own value. I can't say if should be implemented as a class, but I can only propose how this should behave: For example, I want to access filesystem with , the syntax is the following: for node in container.nodes(): if node is File: print node.name print node.hidden if node is Directory: print node.name + '/' >From the other side I want to access: for file in directory.files(): print file.name print file.hidden The latter is more intuitive, but only possible if we can map 'files' accessor name to 'node.type == leaf' query (which is hardcoded for 'generic tree' implementation). > More ideas: > > > [ ] every element in a tree can be accessed by its address specificator > > as 'root/node[3]/last' > > That assumes an arbitrary number of children, and that the children > are ordered. A sensible choice, but it adds way too much overhead for > some cases. > > (And of course, the same goes for the overhead of balancing, etc.) Maintaining data structure (order and nesting of elements) is the key concept for a generic tree, and it also helps in development when you need an easy way to "run a diff over it". Even for unordered children there should be some way to sort them out for the comparisons. One important operation over tree can be "data structure hash", which can be used to detect if the structure of some tree is equal to the given structure. For this operation the actual values of the properties are irrelevant. 
Only types, positions of the nodes and names of their properties. For the 'reference tree' we have 1:1 mapping between node type, and the user level type, so the type of the node is not relevant. If set of fields is fixed, it is not relevant too, so only the data structure - nesting and order of elements plays role. Actually, after rereading this sounds too abstract. When we compare the filesystem trees for identity, the name of the directory (container) is its address that participates in the hash, and the order of elements is irrelevant. When we compare two data structures that web framework passes to template engine, we also not interested in the order of first level key:value pairs, but the names of these keys are important. This is only the first level of the data structure, though, data structure for the values part can also be a tree, where the order is important. So, for the most generic comparison keys there should be a way to present unordered tree in ordered manner for hash comparison. == More ideas (feel free to skip the brain dump or split it into different thread) For a generic, filesystem-like tree I need to iterate over the lees in specified container, over containers there and over both leaves and containers. I want to choose the failure behavior when I iterate over the non-existing node property. And if given the default choice, I prefer to avoid exception if possible. If field doesn't exist, return None. If field doesn't have a value, supply an Empty class. In the data structure the 'None' is not a value, but a fact, that there is no field in a data structure. Why avoid exceptions? Exception is like an emergency procedure where you lose the jet and can non resume the flight from the point you've stopped. You need to supply the parachute beforehand and make sure it fits in the structure of your cabin. I mean that it is very hard to resume processing after the exception if you're interrupted in the middle of a cycle. The exceptions will occur anyway, but for the first iteration I'd like to see exception-less data structure handling, using None semantics for absent properties. It will also make check for field existence more consistent. Instead of "if property.__name__ in node.__dict__" or even instead of "if property in node" use "if node.property != None", because the latter is not easy to confuse with "if node in container". Another concept if the set of properties should be fixed or expandable for a given node instance in a 'reference tree'. For flexibility I like the latter, but for the static analysis in IDE, it is better to get a warning early when you assign a value to non-existing tree node property. -------------- next part -------------- An HTML attachment was scrubbed... URL: From techtonik at gmail.com Tue Dec 25 07:28:18 2012 From: techtonik at gmail.com (anatoly techtonik) Date: Tue, 25 Dec 2012 09:28:18 +0300 Subject: [Python-ideas] Dynamic code NOPing Message-ID: For the logging module it will be extremely useful if Python included a way to disactivate processing certain blocks to avoid making sacrifices between extensive logging harness and performance. For example, instead of writing: if log.DEBUG==True: log(factorial(2**15)) It should be possible to just write: log(factorial(2**15)) if if log() is an instance of some Nopable class, the statement in log's braces is not executed. -- anatoly t. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From andrew at ei-grad.ru Tue Dec 25 08:23:53 2012 From: andrew at ei-grad.ru (Andrew Grigorev) Date: Tue, 25 Dec 2012 13:23:53 +0600 Subject: [Python-ideas] Dynamic code NOPing In-Reply-To: References: Message-ID: <50D95489.8020307@ei-grad.ru> It is possible now, use just have to move a resource consuming operations to the __str__ or __repr__ class methods and use logging.log feature, that it doesn't format string with specified format and arguments if the logging level is greater than the specified message level. For example: class Factorial: def __init__(self, n): self.n = n def calculate(self): return factorial(n) def __str__(self): return str(self.calculate) logging.debug("Factorial of %d is %s", 2**15, Factorial(2**15)) 25.12.2012 12:28, anatoly techtonik ?????: > For the logging module it will be extremely useful if Python included > a way to disactivate processing certain blocks to avoid making > sacrifices between extensive logging harness and performance. For > example, instead of writing: > > if log.DEBUG==True: > log(factorial(2**15)) > > It should be possible to just write: > log(factorial(2**15)) > > if if log() is an instance of some Nopable class, the statement in > log's braces is not executed. > -- > anatoly t. > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas -- Andrew -------------- next part -------------- An HTML attachment was scrubbed... URL: From jstpierre at mecheye.net Tue Dec 25 08:40:10 2012 From: jstpierre at mecheye.net (Jasper St. Pierre) Date: Tue, 25 Dec 2012 02:40:10 -0500 Subject: [Python-ideas] Dynamic code NOPing In-Reply-To: References: Message-ID: if __debug__: log(factorial(2**15)) Running python with -O will squash this statement. To have something inline, you could also abuse assert statements to do the job. def debug_log(x): log(x) return True assert debug_log(factorial(2**15)) In optimized builds, the statement will be removed entirely. On Tue, Dec 25, 2012 at 1:28 AM, anatoly techtonik wrote: > For the logging module it will be extremely useful if Python included a > way to disactivate processing certain blocks to avoid making sacrifices > between extensive logging harness and performance. For example, instead of > writing: > > if log.DEBUG==True: > log(factorial(2**15)) > > It should be possible to just write: > log(factorial(2**15)) > > if if log() is an instance of some Nopable class, the statement in log's > braces is not executed. > -- > anatoly t. > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > > -- Jasper -------------- next part -------------- An HTML attachment was scrubbed... URL: From jkbbwr at gmail.com Tue Dec 25 10:35:23 2012 From: jkbbwr at gmail.com (Jakob Bowyer) Date: Tue, 25 Dec 2012 09:35:23 +0000 Subject: [Python-ideas] Dynamic code NOPing In-Reply-To: References: Message-ID: Why not pass the function/method, args, kwargs to log.debug and let log.debug decide if it should execute or not, e.g. log.debug(factorial, 2**15) On Tue, Dec 25, 2012 at 7:40 AM, Jasper St. Pierre wrote: > if __debug__: > log(factorial(2**15)) > > Running python with -O will squash this statement. To have something > inline, you could also abuse assert statements to do the job. 
> > def debug_log(x): > log(x) > return True > > assert debug_log(factorial(2**15)) > > In optimized builds, the statement will be removed entirely. > > > > On Tue, Dec 25, 2012 at 1:28 AM, anatoly techtonik wrote: > >> For the logging module it will be extremely useful if Python included a >> way to disactivate processing certain blocks to avoid making sacrifices >> between extensive logging harness and performance. For example, instead of >> writing: >> >> if log.DEBUG==True: >> log(factorial(2**15)) >> >> It should be possible to just write: >> log(factorial(2**15)) >> >> if if log() is an instance of some Nopable class, the statement in log's >> braces is not executed. >> -- >> anatoly t. >> >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> http://mail.python.org/mailman/listinfo/python-ideas >> >> > > > -- > Jasper > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rene at stranden.com Tue Dec 25 12:11:09 2012 From: rene at stranden.com (Rene Nejsum) Date: Tue, 25 Dec 2012 12:11:09 +0100 Subject: [Python-ideas] Dynamic code NOPing In-Reply-To: References: Message-ID: <5DFDB30C-9A3D-4939-81D7-F34727284148@stranden.com> Interessting alternatives, but they do not quite come on as flexible/usefull enough? Often debug statements have a lot of text and variables, like: log.debug( "The value of X, Y, Z is now: %d %s %d" % ( x, lookup(y), factorial(2**15)) It would be nice if args in log.debug() was only evaluated if debug was on. But I don't think this is possible with the current Python evaluation rules. But if debug() was indeed NOP'able, maybe it could be done ? /Rene On Dec 25, 2012, at 10:35 AM, Jakob Bowyer wrote: > Why not pass the function/method, args, kwargs to log.debug and let log.debug decide if it should execute or not, > e.g. > > log.debug(factorial, 2**15) > > > On Tue, Dec 25, 2012 at 7:40 AM, Jasper St. Pierre wrote: > if __debug__: > log(factorial(2**15)) > > Running python with -O will squash this statement. To have something inline, you could also abuse assert statements to do the job. > > def debug_log(x): > log(x) > return True > > assert debug_log(factorial(2**15)) > > In optimized builds, the statement will be removed entirely. > > > > On Tue, Dec 25, 2012 at 1:28 AM, anatoly techtonik wrote: > For the logging module it will be extremely useful if Python included a way to disactivate processing certain blocks to avoid making sacrifices between extensive logging harness and performance. For example, instead of writing: > > if log.DEBUG==True: > log(factorial(2**15)) > > It should be possible to just write: > log(factorial(2**15)) > > if if log() is an instance of some Nopable class, the statement in log's braces is not executed. > -- > anatoly t. 
> > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > > > > > -- > Jasper > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Tue Dec 25 13:28:55 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 25 Dec 2012 22:28:55 +1000 Subject: [Python-ideas] Dynamic code NOPing In-Reply-To: <5DFDB30C-9A3D-4939-81D7-F34727284148@stranden.com> References: <5DFDB30C-9A3D-4939-81D7-F34727284148@stranden.com> Message-ID: On Tue, Dec 25, 2012 at 9:11 PM, Rene Nejsum wrote: > But if debug() was indeed NOP'able, maybe it could be done ? If someone *really* wants to do this, they can abuse assert statements (which will be optimised out under "-O", just like code guarded by "if __debug__"). That doesn't make it a good idea - you most need log messages to investigate faults in production systems that you can't (or are still trying to) reproduce in development and integration environments. Compiling them out instead of deactivating them with runtime configuration settings means you can't switch them on without restarting the system with different options. This does mean that you have to factor in the cost of logging into your performance targets and hardware requirements, but the payoff is an increased ability to correctly diagnose system faults (as well as improving your ability to extract interesting metrics from log messages). Excessive logging calls certainly *can* cause performance problems due to the function call overhead, as can careless calculation of expensive values that aren't needed. One alternatives occasional noted is that you could design a logging API that can accept lazily evaluated callables instead of ordinary parameters. However, one danger of such expensive logging it that enabling that logging level becomes infeasible in practice, because the performance hit is too significant. The typical aim for logging is that your overhead should be such that enabling it in production means your servers run a little hotter, or your task takes a little longer, not that your application grinds to a halt. One good way to achieve this is to decouple the expensive calculations from the main application - you instead log the necessary pieces of information, which can be picked up by an external service and the calculation performed in a separate process (or even on a separate machine) where it won't affect the main application, and where you only calculate it if you actually need it for some reason. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From rene at stranden.com Tue Dec 25 13:42:34 2012 From: rene at stranden.com (Rene Nejsum) Date: Tue, 25 Dec 2012 13:42:34 +0100 Subject: [Python-ideas] Dynamic code NOPing In-Reply-To: References: <5DFDB30C-9A3D-4939-81D7-F34727284148@stranden.com> Message-ID: <563458C3-9580-46AE-B343-6987116A3F08@stranden.com> I understand and agree with all your arguments on debugging. At my company we typically make some kind of backend/server control software, with a LOT of debugging lines across many modules. 
We have 20+ debugging flags and in different situations we enable a few of those, if we were to enable all at once it would defently have an impact on production, but hopefully just a hotter CPU and a lot of disk space being used. debug statements in our code is probably one per 10-20 lines of code. I think my main issue (and what I therefore read into the original suggestion) was the extra "if" statement at every log statement So doing: if log.debug.enabled(): log.debug( bla. bla. ) Add's 5-10% extra code lines, whereas if we could do: log.debug( bla. bla ) at the same cost would save a lot of lines. And when you have 43 lines in your editor, it will give you 3-5 lines more of real code to look at :-) /Rene On Dec 25, 2012, at 1:28 PM, Nick Coghlan wrote: > On Tue, Dec 25, 2012 at 9:11 PM, Rene Nejsum wrote: >> But if debug() was indeed NOP'able, maybe it could be done ? > > If someone *really* wants to do this, they can abuse assert statements > (which will be optimised out under "-O", just like code guarded by "if > __debug__"). That doesn't make it a good idea - you most need log > messages to investigate faults in production systems that you can't > (or are still trying to) reproduce in development and integration > environments. Compiling them out instead of deactivating them with > runtime configuration settings means you can't switch them on without > restarting the system with different options. > > This does mean that you have to factor in the cost of logging into > your performance targets and hardware requirements, but the payoff is > an increased ability to correctly diagnose system faults (as well as > improving your ability to extract interesting metrics from log > messages). > > Excessive logging calls certainly *can* cause performance problems due > to the function call overhead, as can careless calculation of > expensive values that aren't needed. One alternatives occasional > noted is that you could design a logging API that can accept lazily > evaluated callables instead of ordinary parameters. > > However, one danger of such expensive logging it that enabling that > logging level becomes infeasible in practice, because the performance > hit is too significant. The typical aim for logging is that your > overhead should be such that enabling it in production means your > servers run a little hotter, or your task takes a little longer, not > that your application grinds to a halt. One good way to achieve this > is to decouple the expensive calculations from the main application - > you instead log the necessary pieces of information, which can be > picked up by an external service and the calculation performed in a > separate process (or even on a separate machine) where it won't affect > the main application, and where you only calculate it if you actually > need it for some reason. > > Cheers, > Nick. > > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Tue Dec 25 14:00:40 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 25 Dec 2012 23:00:40 +1000 Subject: [Python-ideas] Dynamic code NOPing In-Reply-To: <563458C3-9580-46AE-B343-6987116A3F08@stranden.com> References: <5DFDB30C-9A3D-4939-81D7-F34727284148@stranden.com> <563458C3-9580-46AE-B343-6987116A3F08@stranden.com> Message-ID: On Tue, Dec 25, 2012 at 10:42 PM, Rene Nejsum wrote: > Add's 5-10% extra code lines, whereas if we could do: > > log.debug( bla. bla ) > > at the same cost would save a lot of lines. 
Right, that's where the lazy evaluation API idea comes in where there's no choice except to do the expensive calculation in process and you want to factor out the logging level check, it's possible to replace it with 7 characters embedded in the call: debug_lazy(lambda: bla. bla.) You can also do much more sophisticated things with the logging event handling system that only trigger if an event passes the initial priority level check and gets submitted to the rest of the logging machinery. There's no magic wand we can wave to say "evaluate this immediately sometimes, but lazily other times based on some unknown global state". An API has to choose one or the other. The standard logging APIs chooses do lazy evaluation of formatting calls, but eager evaluation of the interpolated values in order to speed up the typical case of readily accessible data - that's why the active level query API is exposed. Another logging API could certainly make the other choice, adapting to the standard APIs via the level query API. I don't know if such an alternative API exists - my rule of thumb for logging calls is if something is too expensive to calculate all the time, find a way to instead pass the necessary pieces for external reconstruction to a lazy formatting call rather than making a given level of logging prohibitively expensive. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From rene at stranden.com Tue Dec 25 14:24:52 2012 From: rene at stranden.com (Rene Nejsum) Date: Tue, 25 Dec 2012 14:24:52 +0100 Subject: [Python-ideas] Dynamic code NOPing In-Reply-To: References: <5DFDB30C-9A3D-4939-81D7-F34727284148@stranden.com> <563458C3-9580-46AE-B343-6987116A3F08@stranden.com> Message-ID: Thanks, appreciate your answers and comments? OT: Being brought up with a C/Java background the depth of the Python language itself still amazes me, I wonder if there is a correlation to the Python community when guys like Guido, Nick and all others takes time to answer questions in a friendly, informative and educating way?. Merry christmas to all on the list?. /Rene On Dec 25, 2012, at 2:00 PM, Nick Coghlan wrote: > On Tue, Dec 25, 2012 at 10:42 PM, Rene Nejsum wrote: >> Add's 5-10% extra code lines, whereas if we could do: >> >> log.debug( bla. bla ) >> >> at the same cost would save a lot of lines. > > Right, that's where the lazy evaluation API idea comes in where > there's no choice except to do the expensive calculation in process > and you want to factor out the logging level check, it's possible to > replace it with 7 characters embedded in the call: > > debug_lazy(lambda: bla. bla.) > > You can also do much more sophisticated things with the logging event > handling system that only trigger if an event passes the initial > priority level check and gets submitted to the rest of the logging > machinery. > > There's no magic wand we can wave to say "evaluate this immediately > sometimes, but lazily other times based on some unknown global state". > An API has to choose one or the other. The standard logging APIs > chooses do lazy evaluation of formatting calls, but eager evaluation > of the interpolated values in order to speed up the typical case of > readily accessible data - that's why the active level query API is > exposed. Another logging API could certainly make the other choice, > adapting to the standard APIs via the level query API. 
I don't know if > such an alternative API exists - my rule of thumb for logging calls is > if something is too expensive to calculate all the time, find a way to > instead pass the necessary pieces for external reconstruction to a > lazy formatting call rather than making a given level of logging > prohibitively expensive. > > Cheers, > Nick. > > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From shibturn at gmail.com Tue Dec 25 15:43:10 2012 From: shibturn at gmail.com (Richard Oudkerk) Date: Tue, 25 Dec 2012 14:43:10 +0000 Subject: [Python-ideas] Dynamic code NOPing In-Reply-To: <50D95489.8020307@ei-grad.ru> References: <50D95489.8020307@ei-grad.ru> Message-ID: On 25/12/2012 7:23am, Andrew Grigorev wrote: > > class Factorial: > def __init__(self, n): > self.n = n > def calculate(self): > return factorial(n) > def __str__(self): > return str(self.calculate) > > logging.debug("Factorial of %d is %s", 2**15, Factorial(2**15)) A more generic alternative would be class str_partial(functools.partial): def __str__(self): return str(self()) logging.debug("Factorial of %d is %s", 2**15, str_partial(factorial, 2**15))) -- Richard From vinay_sajip at yahoo.co.uk Tue Dec 25 15:45:20 2012 From: vinay_sajip at yahoo.co.uk (Vinay Sajip) Date: Tue, 25 Dec 2012 14:45:20 +0000 (UTC) Subject: [Python-ideas] Dynamic code NOPing References: <5DFDB30C-9A3D-4939-81D7-F34727284148@stranden.com> <563458C3-9580-46AE-B343-6987116A3F08@stranden.com> Message-ID: Rene Nejsum writes: > So doing: > > if log.debug.enabled(): > log.debug( bla. bla. ) > > Add's 5-10% extra code lines, whereas if we could do: > > log.debug( bla. bla ) > > at the same cost would save a lot of lines. Bearing in mind that the first statement in the debug (and analogous methods) is a check for the level, the only thing you gain by having the same check outside the call is the cost of evaluating arguments. But you can also do this by passing an arbitrary class as the message object, which lazily evaluates only when needed. Contrived example: class Message(object): def __init__(self, func, x, y): # params should be cheap to evaluate self.func = func self.x = x self.y = y def __str__(self): return str(self.func(self.x**self.y)) # expense is incurred here logger.debug(Message(factorial, 2, 15)) With this setup, no if statements are needed in your code, and the expensive computations only occur when required. Regards, Vinay Sajip From ram.rachum at gmail.com Tue Dec 25 22:46:22 2012 From: ram.rachum at gmail.com (Ram Rachum) Date: Tue, 25 Dec 2012 13:46:22 -0800 (PST) Subject: [Python-ideas] Allow accessing return value inside finally clause Message-ID: <213d11b1-e7a5-4336-82a8-fca65a612ad6@googlegroups.com> Say I have this function: def f(): try: return whatever() finally: pass # I want to get what `whatever()` returned in here I want to get the return value from inside the `finally` clause. I understand that this is currently not possible. I'd like that to be possible because that would allow post-processing of a function's return value. What do you think? Thanks, Ram. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From g.brandl at gmx.net Tue Dec 25 22:55:41 2012 From: g.brandl at gmx.net (Georg Brandl) Date: Tue, 25 Dec 2012 22:55:41 +0100 Subject: [Python-ideas] Allow accessing return value inside finally clause In-Reply-To: <213d11b1-e7a5-4336-82a8-fca65a612ad6@googlegroups.com> References: <213d11b1-e7a5-4336-82a8-fca65a612ad6@googlegroups.com> Message-ID: On 12/25/2012 10:46 PM, Ram Rachum wrote: > Say I have this function: > > def f(): > try: > return whatever() > finally: > pass # I want to get what `whatever()` returned in here > > I want to get the return value from inside the `finally` clause. > > I understand that this is currently not possible. I'd like that to be possible > because that would allow post-processing of a function's return value. > > What do you think? Please supply a more complete example of what you are trying to achieve. As it is, I wonder what your motivation is for using a "finally", because in the case of an exception, there won't even *be* a return value to postprocess. If you're trying to use try-finally as a sort of nonlocal exit mechanism (like the famous "goto done" in CPython sources), you probably would be fine with ret = None try: if x: ret = blah return # more cases with returns here finally: # post-process ret here return ret But I would consider this an abuse of try-finally, especially since it suppresses proper propagation of exceptions. cheers, Georg From paul at colomiets.name Tue Dec 25 23:24:22 2012 From: paul at colomiets.name (Paul Colomiets) Date: Wed, 26 Dec 2012 00:24:22 +0200 Subject: [Python-ideas] collections.sortedset proposal Message-ID: Hi, I want to propose to include SortedSet data structure into collections module. SortedSet (name borrowed from Redis) is a basically a mapping of (unique) keys to scores, that allows fast slicing by ordinal number and by score. There are plenty of use cases for the sorted sets: * Leaderboard for a game * Priority queue (that supports task deletion) * Timer list (e.g. can be used for tulip, supports deletion too) * Caches with TTL-based, LFU or LRU eviction (including `functools.lru_cache`) * Search databases with relevance scores * Statistics (many use cases) * Replacement for `collections.Counter` with faster `most_common()` I have first draft of pure python implementation: https://github.com/tailhook/sortedsets http://pypi.python.org/pypi/sortedsets/1.0 The implementation is closely modeled on Redis. Internally it consists of a dict for mapping between keys and scores, and a skiplist for scores. So most operations are done with O(log n) time. The actual performance is probably very slow for pure-python implementation, but can be fixed by C code later. The asymptotic performance seems to be OK. So my questions are: 1. Do you think SortedSets are eligible for inclusion to stdlib? 2. Do I need a PEP? 3. Any comments on the implementation? P.S.: Sorted sets in redis are not the same thing as sorted sets in blist. So maybe a better name? -- Paul From techtonik at gmail.com Wed Dec 26 00:04:15 2012 From: techtonik at gmail.com (anatoly techtonik) Date: Wed, 26 Dec 2012 02:04:15 +0300 Subject: [Python-ideas] Dynamic code NOPing In-Reply-To: References: <5DFDB30C-9A3D-4939-81D7-F34727284148@stranden.com> <563458C3-9580-46AE-B343-6987116A3F08@stranden.com> Message-ID: On Tue, Dec 25, 2012 at 5:45 PM, Vinay Sajip wrote: > Rene Nejsum writes: > > > So doing: > > > > if log.debug.enabled(): > > log.debug( bla. bla. ) > > > > Add's 5-10% extra code lines, whereas if we could do: > > > > log.debug( bla. 
bla ) > > > > at the same cost would save a lot of lines. > > Bearing in mind that the first statement in the debug (and analogous > methods) is > a check for the level, the only thing you gain by having the same check > outside > the call is the cost of evaluating arguments. But you can also do this by > passing an arbitrary class as the message object, which lazily evaluates > only > when needed. Contrived example: > > class Message(object): > def __init__(self, func, x, y): # params should be cheap to evaluate > self.func = func > self.x = x > self.y = y > > def __str__(self): > return str(self.func(self.x**self.y)) # expense is incurred here > > logger.debug(Message(factorial, 2, 15)) > > With this setup, no if statements are needed in your code, and the > expensive > computations only occur when required. > That's still two function calls and three assignments per logging call. Too expensive and syntax unwieldy. I think everybody agrees now that for existing CPython implementation there is really no solution for the problem of expensive logging calls vs code clarity. You have to implement optimization workaround at the cost of readability. The idea is to fix the interpreter, introducing a "feature block" - execution block that works only if it is enabled. Execution block for logging example below is defined by function name "debug" and braces (). debug( ) debug is an object of 'feature' type, which is only executed/evaluated, if the feature is enabled in a table of features. It might be possible to implement this as a custom version of PyPy. Then by hardcoding logic for treating logging call as 'featured' should give an immediate performance boost to any project. Still it would be nice if logging was build with supported layout for easy optimization or for 'from __future__ import features.logging' . -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Wed Dec 26 00:02:24 2012 From: steve at pearwood.info (Steven D'Aprano) Date: Wed, 26 Dec 2012 10:02:24 +1100 Subject: [Python-ideas] Allow accessing return value inside finally clause In-Reply-To: <213d11b1-e7a5-4336-82a8-fca65a612ad6@googlegroups.com> References: <213d11b1-e7a5-4336-82a8-fca65a612ad6@googlegroups.com> Message-ID: <50DA3080.8080808@pearwood.info> On 26/12/12 08:46, Ram Rachum wrote: > Say I have this function: > > def f(): > try: > return whatever() > finally: > pass # I want to get what `whatever()` returned in here > > I want to get the return value from inside the `finally` clause. > > I understand that this is currently not possible. I'd like that to be > possible because that would allow post-processing of a function's return > value. The usual ways to do that are: def f(): return postprocess(whatever()) or: def f(): return whatever() x = postprocess(f()) Are these usual solution not suitable for your use-case? -- Steven From rene at stranden.com Wed Dec 26 00:36:22 2012 From: rene at stranden.com (Rene Nejsum) Date: Wed, 26 Dec 2012 00:36:22 +0100 Subject: [Python-ideas] Dynamic code NOPing In-Reply-To: References: <5DFDB30C-9A3D-4939-81D7-F34727284148@stranden.com> <563458C3-9580-46AE-B343-6987116A3F08@stranden.com> Message-ID: <08359A6F-73DB-495A-A580-9B81DB975966@stranden.com> I think we all agree that it cannot be done in Python right now? But i doubt there will be support for a solution just for debugging, and I am having a hard time coming up with other examples? 
Quick thought (very quick) and I am no expert, but maybe an acceptable/compatible solution could be: def do_debug(*args): print 'DEBUG: ', args def nop_debug(*args): pass # Empty function debug = do_debug debug( "Some evaluated text %d %d %d" % (1, 2, fact(22)) ) debug = nop_debug debug( "Will not be evaluated, since Python is clever enough to optimise out") At least some kind of -O option could optimise this out ? Then again, there are probably lot's of reasons for this not to work :-) /Rene On Dec 26, 2012, at 12:04 AM, anatoly techtonik wrote: > On Tue, Dec 25, 2012 at 5:45 PM, Vinay Sajip wrote: > Rene Nejsum writes: > > > So doing: > > > > if log.debug.enabled(): > > log.debug( bla. bla. ) > > > > Add's 5-10% extra code lines, whereas if we could do: > > > > log.debug( bla. bla ) > > > > at the same cost would save a lot of lines. > > Bearing in mind that the first statement in the debug (and analogous methods) is > a check for the level, the only thing you gain by having the same check outside > the call is the cost of evaluating arguments. But you can also do this by > passing an arbitrary class as the message object, which lazily evaluates only > when needed. Contrived example: > > class Message(object): > def __init__(self, func, x, y): # params should be cheap to evaluate > self.func = func > self.x = x > self.y = y > > def __str__(self): > return str(self.func(self.x**self.y)) # expense is incurred here > > logger.debug(Message(factorial, 2, 15)) > > With this setup, no if statements are needed in your code, and the expensive > computations only occur when required. > > That's still two function calls and three assignments per logging call. Too expensive and syntax unwieldy. I think everybody agrees now that for existing CPython implementation there is really no solution for the problem of expensive logging calls vs code clarity. You have to implement optimization workaround at the cost of readability. > > The idea is to fix the interpreter, introducing a "feature block" - execution block that works only if it is enabled. Execution block for logging example below is defined by function name "debug" and braces (). > > debug( ) > > debug is an object of 'feature' type, which is only executed/evaluated, if the feature is enabled in a table of features. > > It might be possible to implement this as a custom version of PyPy. Then by hardcoding logic for treating logging call as 'featured' should give an immediate performance boost to any project. Still it would be nice if logging was build with supported layout for easy optimization or for 'from __future__ import features.logging' . > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas -------------- next part -------------- An HTML attachment was scrubbed... URL: From wuwei23 at gmail.com Wed Dec 26 00:54:42 2012 From: wuwei23 at gmail.com (alex23) Date: Tue, 25 Dec 2012 15:54:42 -0800 (PST) Subject: [Python-ideas] Allow accessing return value inside finally clause In-Reply-To: References: <213d11b1-e7a5-4336-82a8-fca65a612ad6@googlegroups.com> Message-ID: On 26 Dec, 08:06, Ram Rachum wrote: > Now of course, you can find other solutions to this problem. You can write > a decorator to do the post-processing phase, or you could divide the whole > thing into 2 functions. But I think that sometimes, the > `finally`-postprocess idiom I propose will be the most succinct one. 
I initially responded to say "use a decorator", but you're already aware of the common pattern for dealing with this, and yet you'd rather the language change instead? > (Regarding lack of return value and propagating exceptions: This all sounds > solvable to me. Why not let the `finally` clause detect what's going on and > react appropriately? No `return` value? Don't postprocess. Exception > raised? Don't interfere.) People already struggle with understanding the semantics of try/ finally - you yourself demonstrated this in your first post by not being away that the 'return value' may not be set - and you want to make it _more_ magic? You're a programmer. At the end of the day, you're going to have to do _some_ "heavy" lifting by yourself. Assign your return values to an object that performs the post-processing on demand. Create a context manager that does it when it exits. Write a loop to decorate your functions if typing @decorator is so strenuous. Making Python less clear to save yourself some typing isn't a decent trade off. From wuwei23 at gmail.com Wed Dec 26 00:55:22 2012 From: wuwei23 at gmail.com (alex23) Date: Tue, 25 Dec 2012 15:55:22 -0800 (PST) Subject: [Python-ideas] Allow accessing return value inside finally clause In-Reply-To: References: <213d11b1-e7a5-4336-82a8-fca65a612ad6@googlegroups.com> <50DA3080.8080808@pearwood.info> Message-ID: <0b2afe6b-7ea3-4e22-9696-c9a5021fefe5@t6g2000pba.googlegroups.com> On 26 Dec, 09:11, Ram Rachum wrote: > That works, sure. I've mentioned this in my email above. But I think that > in some cases making the post-process in the `finally` clause will be more > elegant. What you deem "elegant", I see as "laziness". From techtonik at gmail.com Wed Dec 26 01:10:42 2012 From: techtonik at gmail.com (anatoly techtonik) Date: Wed, 26 Dec 2012 03:10:42 +0300 Subject: [Python-ideas] Documenting Python warts on Stack Overflow Message-ID: I am thinking about [python-wart] on SO. There is no currently a list of Python warts, and building a better language is impossible without a clear visibility of warts in current implementations. Why Roundup doesn't work ATM. - warts are lost among other "won't fix" and "works for me" issues - no way to edit description to make it more clear - no voting/stars to percieve how important is this issue - no comment/noise filtering and the most valuable - there is no query to list warts sorted by popularity to explore other time-consuming areas of Python you are not aware of, but which can popup one day SO at least allows: + voting + community wiki edits + useful comment upvoting + sorted lists + user editable tags (adding new warts is easy) This post is a result of facing with numerous locals/settrace/exec issues that are closed on tracker. I also have my own list of other issues (logging/subprocess) at GC project, which I might be unable to maintain in future. There is also some undocumented stuff (subprocess deadlocks) that I'm investigating, but don't have time for a write-up. So I'd rather move this somewhere where it could be updated. -- anatoly t. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From greg.ewing at canterbury.ac.nz Tue Dec 25 22:49:54 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 26 Dec 2012 10:49:54 +1300 Subject: [Python-ideas] Dynamic code NOPing In-Reply-To: <5DFDB30C-9A3D-4939-81D7-F34727284148@stranden.com> References: <5DFDB30C-9A3D-4939-81D7-F34727284148@stranden.com> Message-ID: <50DA1F82.7010600@canterbury.ac.nz> Rene Nejsum wrote: > Interessting alternatives, but they do not quite come on as > flexible/usefull enough? > > Often debug statements have a lot of text and variables, like: > > log.debug( "The value of X, Y, Z is now: %d %s %d" % ( x, lookup(y), > factorial(2**15)) That needn't be a problem: log.lazydebug(lambda: "The value of X, Y, Z is now: %d %s %d" % (x, lookup(y), factorial(2**15))) -- Greg From haoyi.sg at gmail.com Wed Dec 26 03:50:21 2012 From: haoyi.sg at gmail.com (Haoyi Li) Date: Wed, 26 Dec 2012 10:50:21 +0800 Subject: [Python-ideas] Dynamic code NOPing In-Reply-To: <50DA1F82.7010600@canterbury.ac.nz> References: <5DFDB30C-9A3D-4939-81D7-F34727284148@stranden.com> <50DA1F82.7010600@canterbury.ac.nz> Message-ID: I think the lambda: solution really is the best solution. The additional cost is the construction of one function object and one invocation per logging call, which i suspect is about the lower limit. It's also the most generally applicable: it has nothing specific to logging in it at all! So it seems to me that if we were to change anything, improving the lambdas (shorter syntax and/or optimizing away the overhead) would be the way to go over some string-interpolation-logging-specific special case in the interpreter. On Wed, Dec 26, 2012 at 5:49 AM, Greg Ewing wrote: > Rene Nejsum wrote: > >> Interessting alternatives, but they do not quite come on as >> flexible/usefull enough? >> >> Often debug statements have a lot of text and variables, like: >> >> log.debug( "The value of X, Y, Z is now: %d %s %d" % ( x, lookup(y), >> factorial(2**15)) >> > > That needn't be a problem: > > log.lazydebug(lambda: "The value of X, Y, Z is now: %d %s %d" % > (x, lookup(y), factorial(2**15))) > > -- > Greg > > ______________________________**_________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/**mailman/listinfo/python-ideas > -------------- next part -------------- An HTML attachment was scrubbed... URL: From wuwei23 at gmail.com Wed Dec 26 04:09:30 2012 From: wuwei23 at gmail.com (alex23) Date: Tue, 25 Dec 2012 19:09:30 -0800 (PST) Subject: [Python-ideas] Allow accessing return value inside finally clause In-Reply-To: References: <213d11b1-e7a5-4336-82a8-fca65a612ad6@googlegroups.com> Message-ID: <5b1fc750-96d1-44a7-bb7c-1bf3ddd89aa2@jl13g2000pbb.googlegroups.com> On 26 Dec, 10:44, Ram Rachum wrote: > I don't think that this makes Python less clear; How can you possibly say this? You've changed the `finally` clause from _guaranteed_ execution to something utterly inconsistent. In fact, finally blocks would need to have _more_ code to guard against all of the different execution models you're proposing here. I'm not sure why you think forcing me to write more & less obvious code in a finally block is a better trade off than you making clear, explicit use of decorators. Ambiguity does not equate to clarity. Less typing doesn't either. Creating small re-usable pieces of code that do the "hard work" for you, however, _is a lot more clear_. > I think it's just another > minor feature that might be useful for some people, and for people who > don't, it won't matter. 
How many people use the `for..else` feature, for > example? Very, very few people do. I've used it only several times. But > it's still part of Python because it helps in a few rare cases, so that > makes it worth it *despite* the fact that it might confuse a newbie. The behaviour of `for..else` doesn't change based on arbitrary conditions, whereas what you propose is that the finally blocks behaviour is _fundamentally_ different depending on whether the try block is fully executed or not, whether an exception is raised or not. This is absolutely not the same thing, and trying to pass this concern off as "confusing to newbies" is rather disingenuous. The behaviour would be _confusing to everybody_. This is not a valid cost to save you from having to type a few more keystrokes to decorate the return value. From jstpierre at mecheye.net Wed Dec 26 04:25:22 2012 From: jstpierre at mecheye.net (Jasper St. Pierre) Date: Tue, 25 Dec 2012 22:25:22 -0500 Subject: [Python-ideas] Allow accessing return value inside finally clause In-Reply-To: <5b1fc750-96d1-44a7-bb7c-1bf3ddd89aa2@jl13g2000pbb.googlegroups.com> References: <213d11b1-e7a5-4336-82a8-fca65a612ad6@googlegroups.com> <5b1fc750-96d1-44a7-bb7c-1bf3ddd89aa2@jl13g2000pbb.googlegroups.com> Message-ID: Raum, please make sure you reply on-list. I cannot see your replies here. On Tue, Dec 25, 2012 at 10:09 PM, alex23 wrote: > On 26 Dec, 10:44, Ram Rachum wrote: > > I don't think that this makes Python less clear; > > How can you possibly say this? > > You've changed the `finally` clause from _guaranteed_ execution to > something utterly inconsistent. In fact, finally blocks would need to > have _more_ code to guard against all of the different execution > models you're proposing here. I'm not sure why you think forcing me to > write more & less obvious code in a finally block is a better trade > off than you making clear, explicit use of decorators. > > Ambiguity does not equate to clarity. Less typing doesn't either. > Creating small re-usable pieces of code that do the "hard work" for > you, however, _is a lot more clear_. > > > I think it's just another > > minor feature that might be useful for some people, and for people who > > don't, it won't matter. How many people use the `for..else` feature, for > > example? Very, very few people do. I've used it only several times. But > > it's still part of Python because it helps in a few rare cases, so that > > makes it worth it *despite* the fact that it might confuse a newbie. > > The behaviour of `for..else` doesn't change based on arbitrary > conditions, whereas what you propose is that the finally blocks > behaviour is _fundamentally_ different depending on whether the try > block is fully executed or not, whether an exception is raised or not. > This is absolutely not the same thing, and trying to pass this concern > off as "confusing to newbies" is rather disingenuous. The behaviour > would be _confusing to everybody_. > > This is not a valid cost to save you from having to type a few more > keystrokes to decorate the return value. > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -- Jasper -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ned at nedbatchelder.com Wed Dec 26 05:12:26 2012 From: ned at nedbatchelder.com (Ned Batchelder) Date: Tue, 25 Dec 2012 23:12:26 -0500 Subject: [Python-ideas] Dynamic code NOPing In-Reply-To: References: <5DFDB30C-9A3D-4939-81D7-F34727284148@stranden.com> <563458C3-9580-46AE-B343-6987116A3F08@stranden.com> Message-ID: <50DA792A.5020700@nedbatchelder.com> On 12/25/2012 6:04 PM, anatoly techtonik wrote: > > logger.debug(Message(factorial, 2, 15)) > > > With this setup, no if statements are needed in your code, and the > expensive > > computations only occur when required. > > That's still two function calls and three assignments per logging > call. Too expensive and syntax unwieldy. I think everybody agrees now > that for existing CPython implementation there is really no solution > for the problem of expensive logging calls vs code clarity. You have > to implement optimization workaround at the cost of readability. Anatoly, do you have some measurements to justify the "too expensive" claim? Also, do you have an actual example of expensive logging? I doubt your real code is logging the factorial of 2**15. What is actually in your debug log that is expensive? It will be much easier to discuss solutions if we are talking about actual problems. > > The idea is to fix the interpreter, introducing a "feature block" - > execution block that works only if it is enabled. Execution block > for logging example below is defined by function name "debug" and > braces (). > > debug( ) > > debug is an object of 'feature' type, which is only > executed/evaluated, if the feature is enabled in a table of features. > This feels both sketchy and strange, and not at all integrated with existing Python semantics. --Ned. From dstanek at dstanek.com Wed Dec 26 05:46:39 2012 From: dstanek at dstanek.com (David Stanek) Date: Tue, 25 Dec 2012 23:46:39 -0500 Subject: [Python-ideas] Dynamic code NOPing In-Reply-To: <50DA792A.5020700@nedbatchelder.com> References: <5DFDB30C-9A3D-4939-81D7-F34727284148@stranden.com> <563458C3-9580-46AE-B343-6987116A3F08@stranden.com> <50DA792A.5020700@nedbatchelder.com> Message-ID: On Tue, Dec 25, 2012 at 11:12 PM, Ned Batchelder wrote: > On 12/25/2012 6:04 PM, anatoly techtonik wrote: > >> > logger.debug(Message(**factorial, 2, 15)) >> >> > With this setup, no if statements are needed in your code, and the >> expensive >> > computations only occur when required. >> >> That's still two function calls and three assignments per logging call. >> Too expensive and syntax unwieldy. I think everybody agrees now that for >> existing CPython implementation there is really no solution for the problem >> of expensive logging calls vs code clarity. You have to implement >> optimization workaround at the cost of readability. >> > > Anatoly, do you have some measurements to justify the "too expensive" > claim? Also, do you have an actual example of expensive logging? I doubt > your real code is logging the factorial of 2**15. What is actually in > your debug log that is expensive? It will be much easier to discuss > solutions if we are talking about actual problems. > > I was thinking the same thing as I read though this thread. I'm typically logging the result of a calculation and not doing a calculation only because I'm logging. 
On the other hand I have used a homegrown logging system (existed well before Python's logging module) that allowed the following: >>> logger.warn('factorial = %s', lambda: factorial(2**15)) Instead of just outputting the string representation of the lambda the logger would evaluate the function and str() the return value. Something like this would be trivial to implement on top of Python's logging module. -- David blog: http://www.traceback.org twitter: http://twitter.com/dstanek www: http://dstanek.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From rosuav at gmail.com Wed Dec 26 06:48:50 2012 From: rosuav at gmail.com (Chris Angelico) Date: Wed, 26 Dec 2012 16:48:50 +1100 Subject: [Python-ideas] Dynamic code NOPing In-Reply-To: <50DA792A.5020700@nedbatchelder.com> References: <5DFDB30C-9A3D-4939-81D7-F34727284148@stranden.com> <563458C3-9580-46AE-B343-6987116A3F08@stranden.com> <50DA792A.5020700@nedbatchelder.com> Message-ID: On Wed, Dec 26, 2012 at 3:12 PM, Ned Batchelder wrote: > Also, do you have an actual example of expensive logging? I doubt your real > code is logging the factorial of 2**15. What is actually in your debug log > that is expensive? It will be much easier to discuss solutions if we are > talking about actual problems. Not specifically a Python logging issue, but what I periodically find in my code is that there's an "internal representation" and an "external representation" that have some sort of direct relationship. I could easily log the internal form at many points, but that's not particularly useful; logging the external involves either some hefty calculations, or perhaps a linear search of some list of possibilities (eg a reverse lookup of a constant - do you pay the cost of building up a reverse dictionary, or just do the search?). Obviously that's nothing like as expensive as 2**15!, but it makes more sense to be logging "WM_MOUSEMOVE" than "Msg 512". ChrisA From ram.rachum at gmail.com Tue Dec 25 14:56:45 2012 From: ram.rachum at gmail.com (Ram Rachum) Date: Tue, 25 Dec 2012 05:56:45 -0800 (PST) Subject: [Python-ideas] Allow deleting slice in an OrderedDict Message-ID: When I have an OrderedDict, I want to be able to delete a slice of it. I want to be able to do: del ordered_dict[:3] To delete the first 3 items, like I would do in a list. Is there any reason why this shouldn't be implemented? Thanks, Ram. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Wed Dec 26 08:27:52 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 26 Dec 2012 17:27:52 +1000 Subject: [Python-ideas] Allow deleting slice in an OrderedDict In-Reply-To: References: Message-ID: (replying again, as the original somehow had a broken googlegroups.com address instead of the proper python.org one) On Tue, Dec 25, 2012 at 11:56 PM, Ram Rachum wrote: > When I have an OrderedDict, I want to be able to delete a slice of it. I > want to be able to do: > > del ordered_dict[:3] > > To delete the first 3 items, like I would do in a list. > > Is there any reason why this shouldn't be implemented? Yes, because if you want to do that, you need a list, not an ordered dictionary. Don't try to lump every possible operation into one incoherent uber-type. If you need mutable list-like behaviour *and* mapping behaviour, you're better off with an ordinary mapping and a separate list of keys. Cheers, Nick. 
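For illustration only, a rough sketch of that split (the KeyedSequence name and its delete_slice method are made up here, not an existing type -- the point is just that positional deletion lives on the key list, while key lookup lives on the dict):

class KeyedSequence:
    """Minimal sketch: a plain dict plus a list of keys in insertion order."""
    def __init__(self):
        self._data = {}
        self._keys = []

    def __setitem__(self, key, value):
        if key not in self._data:
            self._keys.append(key)
        self._data[key] = value

    def __getitem__(self, key):
        return self._data[key]

    def delete_slice(self, start, stop):
        # Positional deletion stays unambiguous because it only ever
        # touches the key list and never interprets integers as keys.
        for key in self._keys[start:stop]:
            del self._data[key]
        del self._keys[start:stop]

ks = KeyedSequence()
for i, name in enumerate("abcde"):
    ks[name] = i
ks.delete_slice(0, 2)   # drops 'a' and 'b', leaves 'c', 'd', 'e'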
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From tjreedy at udel.edu Wed Dec 26 08:45:09 2012 From: tjreedy at udel.edu (Terry Reedy) Date: Wed, 26 Dec 2012 02:45:09 -0500 Subject: [Python-ideas] Allow deleting slice in an OrderedDict In-Reply-To: References: Message-ID: On 12/25/2012 8:56 AM, Ram Rachum wrote: > When I have an OrderedDict, I want to be able to delete a slice of it. I > want to be able to do: > > del ordered_dict[:3] > > To delete the first 3 items, like I would do in a list. > > Is there any reason why this shouldn't be implemented? An OrderedDict is a mapping (has the mapping api) with a defined iteration order (the order of entry). It is not a sequence and does not have the sequence api. Indeed, a DictList is not possible because dl[2] would look for the item associated with 2 as a key rather than 2 as a position. So od[2:3] would *not* be the same as od[2], violating the usually properly of within-sequence length-1 slices. -- Terry Jan Reedy From tjreedy at udel.edu Wed Dec 26 08:58:00 2012 From: tjreedy at udel.edu (Terry Reedy) Date: Wed, 26 Dec 2012 02:58:00 -0500 Subject: [Python-ideas] collections.sortedset proposal In-Reply-To: References: Message-ID: On 12/25/2012 5:24 PM, Paul Colomiets wrote: > Hi, > > I want to propose to include SortedSet data structure into collections module. > > SortedSet (name borrowed from Redis) is a basically a mapping of > (unique) keys to scores, that allows fast slicing by ordinal number > and by score. Since a set, in general, is not a mapping, I do not understand what you mean. If you mean a mapping from sorted position to item, then I would call it a sortedlist. > There are plenty of use cases for the sorted sets: > > * Leaderboard for a game This looks like an auto-sorted list. > * Priority queue (that supports task deletion) This looks like something else. > * Timer list (e.g. can be used for tulip, supports deletion too) > * Caches with TTL-based, LFU or LRU eviction (including `functools.lru_cache`) These look like sorted lists. > * Search databases with relevance scores > * Statistics (many use cases) These are rather vague. > * Replacement for `collections.Counter` with faster `most_common()` This looks like something else. > I have first draft of pure python implementation: > > https://github.com/tailhook/sortedsets > http://pypi.python.org/pypi/sortedsets/1.0 > > The implementation is closely modeled on Redis. Internally it consists > of a dict for mapping between keys and scores, and a skiplist for > scores. So most operations are done with O(log n) time. The actual > performance is probably very slow for pure-python implementation, but > can be fixed by C code later. The asymptotic performance seems to be > OK. > > So my questions are: > > 1. Do you think SortedSets are eligible for inclusion to stdlib? > 2. Do I need a PEP? > 3. Any comments on the implementation? The standard answer is to list on or submit to pypi and get community approval and adoption. Then a pep with a commitment to maintenance even while others interfere with your 'baby'. Long-time core committers sometimes get to cut the process short, but even Guido is starting his propose async module/package with a pep and publicly available code for 3.3. 
-- Terry Jan Reedy From storchaka at gmail.com Wed Dec 26 09:58:12 2012 From: storchaka at gmail.com (Serhiy Storchaka) Date: Wed, 26 Dec 2012 10:58:12 +0200 Subject: [Python-ideas] collections.sortedset proposal In-Reply-To: References: Message-ID: On 26.12.12 00:24, Paul Colomiets wrote: > P.S.: Sorted sets in redis are not the same thing as sorted sets in > blist. So maybe a better name? SortedSet in Java (and some other languages) is something entirely different. From paul at colomiets.name Wed Dec 26 10:59:51 2012 From: paul at colomiets.name (Paul Colomiets) Date: Wed, 26 Dec 2012 11:59:51 +0200 Subject: [Python-ideas] collections.sortedset proposal In-Reply-To: References: Message-ID: Hi, On Wed, Dec 26, 2012 at 9:58 AM, Terry Reedy wrote: >> SortedSet (name borrowed from Redis) is a basically a mapping of >> (unique) keys to scores, that allows fast slicing by ordinal number >> and by score. > > > Since a set, in general, is not a mapping, I do not understand what you > mean. If you mean a mapping from sorted position to item, then I would call > it a sortedlist. > Ok. My description is vague. Here is one from Redis documentation: Redis Sorted Sets are, similarly to Redis Sets, non repeating collections of Strings. The difference is that every member of a Sorted Set is associated with score, that is used in order to take the sorted set ordered, from the smallest to the greatest score. While members are unique, scores may be repeated. http://redis.io/topics/data-types I was just so silly to suppose that everybody knows Redis data types. > >> There are plenty of use cases for the sorted sets: >> >> * Leaderboard for a game > > > This looks like an auto-sorted list. > Yep. The crucial property is fast insertion and updates. > >> * Priority queue (that supports task deletion) > > > This looks like something else. > Priority queue is basically an auto-sorted list too. No? > >> * Timer list (e.g. can be used for tulip, supports deletion too) >> * Caches with TTL-based, LFU or LRU eviction (including >> `functools.lru_cache`) > > > These look like sorted lists. > Yup. But we can't call the data structure SortedList, because elements must be unique. > >> * Search databases with relevance scores >> * Statistics (many use cases) > > > These are rather vague. > Yes. Included just to give some overview. > >> * Replacement for `collections.Counter` with faster `most_common()` > > > This looks like something else. > Why? If you have a list sorted by counter values, you can have `most_common()` by slicing. > The standard answer is to list on or submit to pypi and get community > approval and adoption. Then a pep with a commitment to maintenance even > while others interfere with your 'baby'. Long-time core committers sometimes > get to cut the process short, but even Guido is starting his propose async > module/package with a pep and publicly available code for 3.3. > It's on the PyPI now. I know the standard answer :) So you don't understand what SortedSets are and what would be good name for data structure, or do you think it's useless? The crucial point of adoption, is that most of the time people don't want to add additional dependency for simple tasks like priority queue, even if it's faster or more featureful. And I think that SortedSets in Redis have proved their usefulness as a data structure. 
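In case the prose above is still too abstract, here is a toy sketch of just the *semantics* (not the skiplist implementation -- this one keeps a dict plus a bisect-maintained list, so updates are O(n), and it assumes keys compare cleanly when scores tie):

import bisect

class ToySortedSet:
    """Toy model only: unique keys mapped to scores, ordered by score.
    The real proposal uses a skiplist to get O(log n) updates."""
    def __init__(self):
        self._scores = {}   # key -> score
        self._order = []    # sorted list of (score, key) pairs

    def add(self, key, score):
        if key in self._scores:
            old = (self._scores[key], key)
            self._order.pop(bisect.bisect_left(self._order, old))
        self._scores[key] = score
        bisect.insort(self._order, (score, key))

    def by_rank(self, start, stop):
        # slice by ordinal position, smallest score first
        return [key for score, key in self._order[start:stop]]

    def by_score(self, low, high):
        # inclusive score range (linear scan is fine for a toy)
        return [key for score, key in self._order if low <= score <= high]

board = ToySortedSet()
board.add("alice", 300)
board.add("bob", 150)
board.add("carol", 225)
board.add("bob", 400)              # updating a score keeps the key unique
print(board.by_rank(0, 2))         # ['carol', 'alice']
print(board.by_score(200, 350))    # ['carol', 'alice']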
-- Paul From ncoghlan at gmail.com Wed Dec 26 11:12:11 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 26 Dec 2012 20:12:11 +1000 Subject: [Python-ideas] collections.sortedset proposal In-Reply-To: References: Message-ID: On Wed, Dec 26, 2012 at 7:59 PM, Paul Colomiets wrote: > Hi, > > On Wed, Dec 26, 2012 at 9:58 AM, Terry Reedy wrote: >>> SortedSet (name borrowed from Redis) is a basically a mapping of >>> (unique) keys to scores, that allows fast slicing by ordinal number >>> and by score. >> >> >> Since a set, in general, is not a mapping, I do not understand what you >> mean. If you mean a mapping from sorted position to item, then I would call >> it a sortedlist. >> > > Ok. My description is vague. Here is one from Redis documentation: > > Redis Sorted Sets are, similarly to Redis Sets, non repeating > collections of Strings. The difference is that every member of a > Sorted Set is associated with score, that is used in order to take the > sorted set ordered, from the smallest to the greatest score. While > members are unique, scores may be repeated. Perhaps you mean a heap queue? The standard library doesn't have a separate type for that, it just has some functions for treating a list as a heap: http://docs.python.org/2/library/heapq.html Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From paul at colomiets.name Wed Dec 26 11:59:40 2012 From: paul at colomiets.name (Paul Colomiets) Date: Wed, 26 Dec 2012 12:59:40 +0200 Subject: [Python-ideas] collections.sortedset proposal In-Reply-To: References: Message-ID: Hi Nick, On Wed, Dec 26, 2012 at 12:12 PM, Nick Coghlan wrote: > Perhaps you mean a heap queue? The standard library doesn't have a > separate type for that, it just has some functions for treating a list > as a heap: http://docs.python.org/2/library/heapq.html > The problem with heap queue (as implemented in python) as priority queue or list of timers is that it does not support deletion of the tasks (at least not in efficient manner). For other use cases, e.g. for a leader board heapq doesn't allow efficient slicing. Or do you mean "heap queue" is a nice name for the data structure that redis calls "sorted set"? -- Paul From wuwei23 at gmail.com Wed Dec 26 11:58:27 2012 From: wuwei23 at gmail.com (alex23) Date: Wed, 26 Dec 2012 02:58:27 -0800 (PST) Subject: [Python-ideas] Allow accessing return value inside finally clause In-Reply-To: References: <213d11b1-e7a5-4336-82a8-fca65a612ad6@googlegroups.com> <5b1fc750-96d1-44a7-bb7c-1bf3ddd89aa2@jl13g2000pbb.googlegroups.com> Message-ID: <854f85a6-9a74-439e-8075-6b3f2e7e721d@d2g2000pbd.googlegroups.com> On Dec 26, 7:55?pm, Ram Rachum wrote: > Alex: I'm getting the feeling that you misunderstand what I'm proposing > here. I'm proposing that the return value will be accessible in the > `finally` clause. In a similar (if shorter) way that the exception info is > available by using `sys.exc_info()`. I get what you're saying. What you haven't shown is how introducing 'return value' semantics to try/finally blocks does anything other than make them more confusing to people. Functions have return values. Decorators wrap functions and can thus be used to pre- or post-process the in/outputs for the function. This is clearly defined and a well known approach to your problem. The onus is on you to show how turning try/finally into a kitchen-sink of behaviour will improve the language, preferably without recourse to what you "think" or "feel". 
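To spell the decorator route out (just a sketch; the `postprocessed` helper is a made-up name for illustration, with str.upper standing in for whatever you want done to the return value):

import functools

def postprocessed(postprocess):
    """Run `postprocess` over whatever the wrapped function returns."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            result = func(*args, **kwargs)   # exceptions propagate untouched
            return postprocess(result)
        return wrapper
    return decorator

@postprocessed(str.upper)
def whatever():
    return "some value"

print(whatever())   # -> SOME VALUE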
A concrete use-case would help here, but I'm 100% convinced that whatever you come up with, there'll be a better solution using decorators that works right now. From storchaka at gmail.com Wed Dec 26 13:31:12 2012 From: storchaka at gmail.com (Serhiy Storchaka) Date: Wed, 26 Dec 2012 14:31:12 +0200 Subject: [Python-ideas] Add support keyword arguments with suitable defaults for OSError and subclasses Message-ID: Now OSError constructor does not support keyword arguments. It will be good add support for followed keyword arguments: "errno", "strerror", "filename". If "strerror" is not specified, a standard error message corresponding to errno is used. If "errno" is not specified for an OSError subclass, an errno associated with this subclass is used (if only one errno associated). For backward compatibility perhaps keyword arguments should be incompatible with any positional arguments (or at least suitable defaults should used only if any keyword argument specified). Examples: >>> OSError(errno=errno.ENOENT) FileNotFoundError(2, 'No such file or directory') >>> FileNotFoundError(filename='qwerty') FileNotFoundError(2, 'No such file or directory') >>> FileNotFoundError(strerr='Bad file') FileNotFoundError(2, 'Bad file') From ned at nedbatchelder.com Wed Dec 26 14:21:16 2012 From: ned at nedbatchelder.com (Ned Batchelder) Date: Wed, 26 Dec 2012 08:21:16 -0500 Subject: [Python-ideas] Documenting Python warts on Stack Overflow In-Reply-To: References: Message-ID: <50DAF9CC.3060208@nedbatchelder.com> On 12/25/2012 7:10 PM, anatoly techtonik wrote: > I am thinking about [python-wart] on SO. There is no currently a list > of Python warts, and building a better language is impossible without > a clear visibility of warts in current implementations. > > Why Roundup doesn't work ATM. > - warts are lost among other "won't fix" and "works for me" issues > - no way to edit description to make it more clear > - no voting/stars to percieve how important is this issue > - no comment/noise filtering > and the most valuable > - there is no query to list warts sorted by popularity to explore > other time-consuming areas of Python you are not aware of, but which > can popup one day > > SO at least allows: > + voting > + community wiki edits > + useful comment upvoting > + sorted lists > + user editable tags (adding new warts is easy) > 1) Stack Overflow probably won't accept this as a question. 2) a bunch of people answering "what is a wart" is not a way to get the Python community to agree on what needs to be changed in the language. People with ideas need to write them up thoughtfully with proposals for improvements, and then engage meaningfully in the discussion that follows. You seem to think that people just need to identify "warts" and then we can start changing the language to remove them. What you consider a "wart" is probably the result of a complex balance of competing forces. Changing Python is hard. We take backward compatibility very seriously, and that sometimes makes it hard to "remove warts." --Ned. > This post is a result of facing with numerous locals/settrace/exec > issues that are closed on tracker. I also have my own list of other > issues (logging/subprocess) at GC project, which I might be unable to > maintain in future. There is also some undocumented stuff (subprocess > deadlocks) that I'm investigating, but don't have time for a write-up. > So I'd rather move this somewhere where it could be updated. > -- > anatoly t. 
> > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Wed Dec 26 15:32:02 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 27 Dec 2012 00:32:02 +1000 Subject: [Python-ideas] collections.sortedset proposal In-Reply-To: References: Message-ID: On Wed, Dec 26, 2012 at 8:59 PM, Paul Colomiets wrote: > Hi Nick, > > On Wed, Dec 26, 2012 at 12:12 PM, Nick Coghlan wrote: >> Perhaps you mean a heap queue? The standard library doesn't have a >> separate type for that, it just has some functions for treating a list >> as a heap: http://docs.python.org/2/library/heapq.html >> > > The problem with heap queue (as implemented in python) as priority > queue or list of timers is that it does not support deletion of the > tasks (at least not in efficient manner). For other use cases, e.g. > for a leader board heapq doesn't allow efficient slicing. > > Or do you mean "heap queue" is a nice name for the data structure that > redis calls "sorted set"? I mean if what you want is a heap queue with a more efficient heappop() implementation (due to a different underlying data structure), then it's probably clearer to call it that. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From paul at colomiets.name Wed Dec 26 16:09:27 2012 From: paul at colomiets.name (Paul Colomiets) Date: Wed, 26 Dec 2012 17:09:27 +0200 Subject: [Python-ideas] collections.sortedset proposal In-Reply-To: References: Message-ID: Hi Nick, On Wed, Dec 26, 2012 at 4:32 PM, Nick Coghlan wrote: >> The problem with heap queue (as implemented in python) as priority >> queue or list of timers is that it does not support deletion of the >> tasks (at least not in efficient manner). For other use cases, e.g. >> for a leader board heapq doesn't allow efficient slicing. >> >> Or do you mean "heap queue" is a nice name for the data structure that >> redis calls "sorted set"? > > I mean if what you want is a heap queue with a more efficient > heappop() implementation (due to a different underlying data > structure), then it's probably clearer to call it that. > The underlying data structure is skiplists not heap. It would be strange to call it heap-something. But, yes, for the discussion, similarity with heapqueue may be a better starting point. -- Paul From guido at python.org Wed Dec 26 17:58:31 2012 From: guido at python.org (Guido van Rossum) Date: Wed, 26 Dec 2012 08:58:31 -0800 Subject: [Python-ideas] Allow deleting slice in an OrderedDict In-Reply-To: References: Message-ID: Perhaps the desired functionality can be spelled as a method? Would it be easy to implement? --Guido On Wednesday, December 26, 2012, Terry Reedy wrote: > On 12/25/2012 8:56 AM, Ram Rachum wrote: > >> When I have an OrderedDict, I want to be able to delete a slice of it. I >> want to be able to do: >> >> del ordered_dict[:3] >> >> To delete the first 3 items, like I would do in a list. >> >> Is there any reason why this shouldn't be implemented? >> > > An OrderedDict is a mapping (has the mapping api) with a defined iteration > order (the order of entry). It is not a sequence and does not have the > sequence api. Indeed, a DictList is not possible because dl[2] would look > for the item associated with 2 as a key rather than 2 as a position. 
So > od[2:3] would *not* be the same as od[2], violating the usually properly of > within-sequence length-1 slices. > > -- > Terry Jan Reedy > > ______________________________**_________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/**mailman/listinfo/python-ideas > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From michelelacchia at gmail.com Wed Dec 26 18:43:34 2012 From: michelelacchia at gmail.com (Michele Lacchia) Date: Wed, 26 Dec 2012 18:43:34 +0100 Subject: [Python-ideas] collections.sortedset proposal In-Reply-To: References: Message-ID: For the record, there is another implementation of skiplists on PyPI: http://pypi.python.org/pypi/skiplist/0.1.0 2012/12/26 Paul Colomiets > Hi Nick, > > On Wed, Dec 26, 2012 at 4:32 PM, Nick Coghlan wrote: > >> The problem with heap queue (as implemented in python) as priority > >> queue or list of timers is that it does not support deletion of the > >> tasks (at least not in efficient manner). For other use cases, e.g. > >> for a leader board heapq doesn't allow efficient slicing. > >> > >> Or do you mean "heap queue" is a nice name for the data structure that > >> redis calls "sorted set"? > > > > I mean if what you want is a heap queue with a more efficient > > heappop() implementation (due to a different underlying data > > structure), then it's probably clearer to call it that. > > > > The underlying data structure is skiplists not heap. It would be > strange to call it heap-something. But, yes, for the discussion, > similarity with heapqueue may be a better starting point. > > -- > Paul > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -- Michele Lacchia -------------- next part -------------- An HTML attachment was scrubbed... URL: From storchaka at gmail.com Wed Dec 26 18:54:03 2012 From: storchaka at gmail.com (Serhiy Storchaka) Date: Wed, 26 Dec 2012 19:54:03 +0200 Subject: [Python-ideas] Allow deleting slice in an OrderedDict In-Reply-To: References: Message-ID: On 26.12.12 18:58, Guido van Rossum wrote: > Perhaps the desired functionality can be spelled as a method? Would it > be easy to implement? This is a pretty trivial method. def drop_items(self, n, last=True) for i in range(n): self.popitem(last) You can wrap it with "try/except KeyError" or add pre-execution checks if you will. I doubt such trivial and not common used method needed to be in stdlib. From eliben at gmail.com Wed Dec 26 19:28:10 2012 From: eliben at gmail.com (Eli Bendersky) Date: Wed, 26 Dec 2012 10:28:10 -0800 Subject: [Python-ideas] Documenting Python warts on Stack Overflow In-Reply-To: References: Message-ID: On Tue, Dec 25, 2012 at 4:10 PM, anatoly techtonik wrote: > I am thinking about [python-wart] on SO. There is no currently a list of > Python warts, and building a better language is impossible without a clear > visibility of warts in current implementations. > > Why Roundup doesn't work ATM. 
> - warts are lost among other "won't fix" and "works for me" issues > - no way to edit description to make it more clear > - no voting/stars to percieve how important is this issue > - no comment/noise filtering > and the most valuable > - there is no query to list warts sorted by popularity to explore other > time-consuming areas of Python you are not aware of, but which can popup > one day > > SO at least allows: > + voting > + community wiki edits > + useful comment upvoting > + sorted lists > + user editable tags (adding new warts is easy) > > This post is a result of facing with numerous locals/settrace/exec issues > that are closed on tracker. I also have my own list of other issues > (logging/subprocess) at GC project, which I might be unable to maintain in > future. There is also some undocumented stuff (subprocess deadlocks) that > I'm investigating, but don't have time for a write-up. So I'd rather move > this somewhere where it could be updated. > -- > Is this a question or just a rant? If it's a question, I must have missed what it is exactly that you're asking? The web is a pretty free place. Feel free to create such a tag on Stack Overflow and maintain it, if the SO community agrees it has merit. Don't expect the Python developers to officially endorse it, because "warts" is a very subjective issue. A "wart" for one person is a reasonable behavior for another. Eli -------------- next part -------------- An HTML attachment was scrubbed... URL: From techtonik at gmail.com Wed Dec 26 19:28:38 2012 From: techtonik at gmail.com (anatoly techtonik) Date: Wed, 26 Dec 2012 21:28:38 +0300 Subject: [Python-ideas] Documenting Python warts on Stack Overflow In-Reply-To: <50DAF9CC.3060208@nedbatchelder.com> References: <50DAF9CC.3060208@nedbatchelder.com> Message-ID: On Wed, Dec 26, 2012 at 4:21 PM, Ned Batchelder wrote: > On 12/25/2012 7:10 PM, anatoly techtonik wrote: > > I am thinking about [python-wart] on SO. There is no currently a list of > Python warts, and building a better language is impossible without a clear > visibility of warts in current implementations. > > Why Roundup doesn't work ATM. > - warts are lost among other "won't fix" and "works for me" issues > - no way to edit description to make it more clear > - no voting/stars to percieve how important is this issue > - no comment/noise filtering > and the most valuable > - there is no query to list warts sorted by popularity to explore other > time-consuming areas of Python you are not aware of, but which can popup > one day > > SO at least allows: > + voting > + community wiki edits > + useful comment upvoting > + sorted lists > + user editable tags (adding new warts is easy) > > > 1) Stack Overflow probably won't accept this as a question. > That's why it is proposed as a community wiki. 2) a bunch of people answering "what is a wart" is not a way to get the > Python community to agree on what needs to be changed in the language. > People with ideas need to write them up thoughtfully with proposals for > improvements, and then engage meaningfully in the discussion that follows. > > You seem to think that people just need to identify "warts" and then we > can start changing the language to remove them. What you consider a "wart" > is probably the result of a complex balance of competing forces. Changing > Python is hard. We take backward compatibility very seriously, and that > sometimes makes it hard to "remove warts." > You've nailed it. 
The goal of listing warts on SO is not to prove that some language suxx [1], but to provide answers to question about *why* some particular wart exists. "wart" may not be the best word, because from the other side of rebalancing things there is most likely some "feature", but when people experience problems, they usually face only one side of the story [2] As I already said it is impossible to fully master the language without a complete coverage of such things. These things are equally interesting for users and for future contributors. There are the starting points in making the next better generation dynamic language (if the one is possible). SO is a FAQ site, not a web-page or a wiki, so I expect there to be answers with research on the history of design decisions behind the balancing of the language, the sources of "warts" and things that are balancing them on the other side. I expect there to find analysis what features will have to be removed in order for some specific "wart" to be gone, and I see it as a perfect entrypoint for learning high-level things about programming languages. Some people may get a feeling that a SO list like that will make a huge negative impact on Python development. I don't know how to respond to these concerns. =) In my life I haven't seen a person who abandoned Python completely after picking it up. That should mean something. From my side I'd like to thank to all core developers and say that you are doing the right thing. Unicode and Python 3 was hard, but even grumpy trolls like me start to like it. The next year will be the next exciting step in Python development. My IMHO is that it became mature enough to openly discuss its "bad child habits" in details and make fun of them accepting they as they are. Take it easy, and have a good year ahead! ;) 1. http://wiki.theory.org/YourLanguageSucks 2. http://adsoftheworld.com/media/ambient/bbc_world_soldier -------------- next part -------------- An HTML attachment was scrubbed... URL: From ram at rachum.com Wed Dec 26 19:33:07 2012 From: ram at rachum.com (Ram Rachum) Date: Wed, 26 Dec 2012 20:33:07 +0200 Subject: [Python-ideas] Allow deleting slice in an OrderedDict In-Reply-To: References: Message-ID: I agree with Terry that doing `del ordered_dict[:2]` is problematic because there might be confusion between index numbers and dictionary keys. My new proposed API: Build on `ItemsView` so we could do this: `del ordered_dict.items()[:2]` and have it delete the first 2 items from the ordered dict. On Wed, Dec 26, 2012 at 6:58 PM, Guido van Rossum wrote: > Perhaps the desired functionality can be spelled as a method? Would it be > easy to implement? > > --Guido > > > On Wednesday, December 26, 2012, Terry Reedy wrote: > >> On 12/25/2012 8:56 AM, Ram Rachum wrote: >> >>> When I have an OrderedDict, I want to be able to delete a slice of it. I >>> want to be able to do: >>> >>> del ordered_dict[:3] >>> >>> To delete the first 3 items, like I would do in a list. >>> >>> Is there any reason why this shouldn't be implemented? >>> >> >> An OrderedDict is a mapping (has the mapping api) with a defined >> iteration order (the order of entry). It is not a sequence and does not >> have the sequence api. Indeed, a DictList is not possible because dl[2] >> would look for the item associated with 2 as a key rather than 2 as a >> position. So od[2:3] would *not* be the same as od[2], violating the >> usually properly of within-sequence length-1 slices. 
>> >> -- >> Terry Jan Reedy >> >> ______________________________**_________________ >> Python-ideas mailing list >> Python-ideas at python.org >> http://mail.python.org/**mailman/listinfo/python-ideas >> > > > -- > --Guido van Rossum (python.org/~guido) > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From eliben at gmail.com Wed Dec 26 19:37:35 2012 From: eliben at gmail.com (Eli Bendersky) Date: Wed, 26 Dec 2012 10:37:35 -0800 Subject: [Python-ideas] Documenting Python warts on Stack Overflow In-Reply-To: References: <50DAF9CC.3060208@nedbatchelder.com> Message-ID: > 2) a bunch of people answering "what is a wart" is not a way to get the >> Python community to agree on what needs to be changed in the language. >> People with ideas need to write them up thoughtfully with proposals for >> improvements, and then engage meaningfully in the discussion that follows. >> >> You seem to think that people just need to identify "warts" and then we >> can start changing the language to remove them. What you consider a "wart" >> is probably the result of a complex balance of competing forces. Changing >> Python is hard. We take backward compatibility very seriously, and that >> sometimes makes it hard to "remove warts." >> > > You've nailed it. The goal of listing warts on SO is not to prove that > some language suxx [1], but to provide answers to question about *why* some > particular wart exists. "wart" may not be the best word, because from the > other side of rebalancing things there is most likely some "feature", but > when people experience problems, they usually face only one side of the > story [2] > > As I already said it is impossible to fully master the language without a > complete coverage of such things. These things are equally interesting for > users and for future contributors. There are the starting points in > making the next better generation dynamic language (if the one is possible). > > SO is a FAQ site, not a web-page or a wiki, so I expect there to be > answers with research on the history of design decisions behind the > balancing of the language, the sources of "warts" and things that are > balancing them on the other side. I expect there to find analysis what > features will have to be removed in order for some specific "wart" to be > gone, and I see it as a perfect entrypoint for learning high-level things > about programming languages. > > Yet again, while I don't speak for the whole Python dev community, I predict this will not be officially endorsed. As for explaining why some things are the way they are, there are plenty of blog articles on the web trying to explain Python internals. Nick Coghlan has some very good ones (with the benefit of his being actually in the position to say *why* things are this way historically), Guido has articles on the history of Python, and even my humble blog has some internals pieces (which are more focused on the "how" instead of "why"). Consider directing your energies and obvious love for Python to constructive channels like contributing similar articles of your own. I'm sure that you'll be able to find core devs willing to review such articles and discuss them prior to your posting them. 
Also feel free to collect all such articles in some central location and maintaining the list - this actually could be very helpful for a lot of Python fans and devs alike. Eli -------------- next part -------------- An HTML attachment was scrubbed... URL: From wuwei23 at gmail.com Wed Dec 26 23:08:18 2012 From: wuwei23 at gmail.com (alex23) Date: Wed, 26 Dec 2012 14:08:18 -0800 (PST) Subject: [Python-ideas] Allow deleting slice in an OrderedDict In-Reply-To: References: Message-ID: On Dec 27, 4:33?am, Ram Rachum wrote: > My new proposed API: Build on `ItemsView` so we could do this: `del > ordered_dict.items()[:2]` and have it delete the first 2 items from the > ordered dict. Modifying the return value of an object's method and having the object itself mutate feels like a side-effect to me. From python at mrabarnett.plus.com Thu Dec 27 00:47:49 2012 From: python at mrabarnett.plus.com (MRAB) Date: Wed, 26 Dec 2012 23:47:49 +0000 Subject: [Python-ideas] Allow deleting slice in an OrderedDict In-Reply-To: References: Message-ID: <50DB8CA5.3070200@mrabarnett.plus.com> On 2012-12-26 22:08, alex23 wrote: > On Dec 27, 4:33 am, Ram Rachum wrote: >> My new proposed API: Build on `ItemsView` so we could do this: `del >> ordered_dict.items()[:2]` and have it delete the first 2 items from the >> ordered dict. > > Modifying the return value of an object's method and having the object > itself mutate feels like a side-effect to me. > +1 From ncoghlan at gmail.com Thu Dec 27 16:10:47 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 28 Dec 2012 01:10:47 +1000 Subject: [Python-ideas] PEP 432: Simplifying the CPython startup sequence Message-ID: After helping Brett with the migration to importlib in 3.3, and looking at some of the ideas kicking around for additional CPython features that would affect the startup sequence, I've come to the conclusion that what we have now simply isn't sustainable long term. It's already the case that if you use certain options (specifically -W or -X), the interpreter will start accessing the C API before it has called Py_Initialize(). It's not cool when other people do that (we'd never accept code that behaved that way as a valid reproducer for a bug report), and it's *definitely* not cool that we're doing it (even though we seem to be getting away with it for the moment, and have been for a long time). The attached PEP is a first attempt at a plan for doing something about it. (My notes at http://wiki.python.org/moin/CPythonInterpreterInitialization provide additional context - let me know if you think there's more material on that page that should be in the PEP itself) The PEP is also available online at http://www.python.org/dev/peps/pep-0432/ Cheers, Nick. PEP: 432 Title: Simplifying the CPython startup sequence Version: $Revision$ Last-Modified: $Date$ Author: Nick Coghlan Status: Draft Type: Standards Track Content-Type: text/x-rst Created: 28-Dec-2012 Python-Version: 3.4 Post-History: 28-Dec-2012 Abstract ======== This PEP proposes a mechanism for simplifying the startup sequence for CPython, making it easier to modify the initialisation behaviour of the reference interpreter executable, as well as making it easier to control CPython's startup behaviour when creating an alternate executable or embedding it as a Python execution engine inside a larger application. 
Proposal Summary ================ This PEP proposes that CPython move to an explicit 2-phase initialisation process, where a preliminary interpreter is put in place with limited OS interaction capabilities early in the startup sequence. This essential core remains in place while all of the configuration settings are determined, until a final configuration call takes those settings and finishes bootstrapping the interpreter immediately before executing the main module. As a concrete use case to help guide any design changes, and to solve a known problem where the appropriate defaults for system utilities differ from those for running user scripts, this PEP also proposes the creation and distribution of a separate system Python (``spython``) executable which, by default, ignores user site directories and environment variables, and does not implicitly set ``sys.path[0]`` based on the current directory or the script being executed. Background ========== Over time, CPython's initialisation sequence has become progressively more complicated, offering more options, as well as performing more complex tasks (such as configuring the Unicode settings for OS interfaces in Python 3 as well as bootstrapping a pure Python implementation of the import system). Much of this complexity is accessible only through the ``Py_Main`` and ``Py_Initialize`` APIs, offering embedding applications little opportunity for customisation. This creeping complexity also makes life difficult for maintainers, as much of the configuration needs to take place prior to the ``Py_Initialize`` call, meaning much of the Python C API cannot be used safely. A number of proposals are on the table for even *more* sophisticated startup behaviour, such as better control over ``sys.path`` initialisation (easily adding additional directories on the command line in a cross-platform fashion, as well as controlling the configuration of ``sys.path[0]``), easier configuration of utilities like coverage tracing when launching Python subprocesses, and easier control of the encoding used for the standard IO streams when embedding CPython in a larger application. Rather than attempting to bolt such behaviour onto an already complicated system, this PEP proposes to instead simplify the status quo *first*, with the aim of making these further feature requests easier to implement. Key Concerns ============ There are a couple of key concerns that any change to the startup sequence needs to take into account. Maintainability --------------- The current CPython startup sequence is difficult to understand, and even more difficult to modify. It is not clear what state the interpreter is in while much of the initialisation code executes, leading to behaviour such as lists, dictionaries and Unicode values being created prior to the call to ``Py_Initialize`` when the ``-X`` or ``-W`` options are used [1_]. By moving to a 2-phase startup sequence, developers should only need to understand which features are not available in the core bootstrapping state, as the vast majority of the configuration process will now take place in that state. By basing the new design on a combination of C structures and Python dictionaries, it should also be easier to modify the system in the future to add new configuration options. Performance ----------- CPython is used heavily to run short scripts where the runtime is dominated by the interpreter initialisation time. Any changes to the startup sequence should minimise their impact on the startup overhead. 
(Given that the overhead is dominated by IO operations, this is not currently expected to cause any significant problems). The Status Quo ============== Much of the configuration of CPython is currently handled through C level global variables:: Py_IgnoreEnvironmentFlag Py_HashRandomizationFlag _Py_HashSecretInitialized _Py_HashSecret Py_BytesWarningFlag Py_DebugFlag Py_InspectFlag Py_InteractiveFlag Py_OptimizeFlag Py_DontWriteBytecodeFlag Py_NoUserSiteDirectory Py_NoSiteFlag Py_UnbufferedStdioFlag Py_VerboseFlag For the above variables, the conversion of command line options and environment variables to C global variables is handled by ``Py_Main``, so each embedding application must set those appropriately in order to change them from their defaults. Some configuration can only be provided as OS level environment variables:: PYTHONHASHSEED PYTHONSTARTUP PYTHONPATH PYTHONHOME PYTHONCASEOK PYTHONIOENCODING Additional configuration is handled via separate API calls:: Py_SetProgramName() (call before Py_Initialize()) Py_SetPath() (optional, call before Py_Initialize()) Py_SetPythonHome() (optional, call before Py_Initialize()???) Py_SetArgv[Ex]() (call after Py_Initialize()) The ``Py_InitializeEx()`` API also accepts a boolean flag to indicate whether or not CPython's signal handlers should be installed. Finally, some interactive behaviour (such as printing the introductory banner) is triggered only when standard input is reported as a terminal connection by the operating system. Also see more detailed notes at [1_] Proposal ======== (Note: details here are still very much in flux, but preliminary feedback is appreciated anyway) Core Interpreter Initialisation ------------------------------- The only configuration that currently absolutely needs to be in place before even the interpreter core can be initialised is the seed for the randomised hash algorithm. However, there are a couple of settings needed there: whether or not hash randomisation is enabled at all, and if it's enabled, whether or not to use a specific seed value. The proposed API for this step in the startup sequence is:: void Py_BeginInitialization(Py_CoreConfig *config); Like Py_Initialize, this part of the new API treats initialisation failures as fatal errors. While that's still not particularly embedding friendly, the operations in this step *really* shouldn't be failing, and changing them to return error codes instead of aborting would be an even larger task than the one already being proposed. The new Py_CoreConfig struct holds the settings required for preliminary configuration:: typedef struct { int use_hash_seed; size_t hash_seed; } Py_CoreConfig; To "disable" hash randomisation, set "use_hash_seed" and pass a hash seed of zero. (This seems reasonable to me, but there may be security implications I'm overlooking. If so, adding a separate flag or switching to a 3-valued "no randomisation", "fixed hash seed" and "randomised hash" option is easy) The core configuration settings pointer may be NULL, in which case the default behaviour of randomised hashes with a random seed will be used. 
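For illustration only, the user-visible behaviour that ``use_hash_seed`` and ``hash_seed`` are intended to control is the same one that is already exposed through the ``PYTHONHASHSEED`` environment variable. A minimal Python-level sketch of the three cases (the ``spam_hash`` helper below is hypothetical and not part of the proposal)::

    import os
    import subprocess
    import sys

    def spam_hash(env_overrides):
        # Report hash('spam') as computed by a freshly started interpreter
        env = dict(os.environ, **env_overrides)
        out = subprocess.check_output(
            [sys.executable, "-c", "print(hash('spam'))"], env=env)
        return int(out.decode().strip())

    # Randomised hashing (the default on 3.3+): separate runs normally disagree
    print(spam_hash({}), spam_hash({}))
    # Fixed seed (use_hash_seed with a specific hash_seed): runs agree
    print(spam_hash({"PYTHONHASHSEED": "12345"}),
          spam_hash({"PYTHONHASHSEED": "12345"}))
    # Randomisation disabled (the proposed use_hash_seed + seed of zero case)
    print(spam_hash({"PYTHONHASHSEED": "0"}), spam_hash({"PYTHONHASHSEED": "0"}))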
A new query API will allow code to determine if the interpreter is in the bootstrapping state between core initialisation and the completion of the initialisation process:: int Py_IsInitializing(); While in the initialising state, the interpreter should be fully functional except that: * compilation is not allowed (as the parser is not yet configured properly) * The following attributes in the ``sys`` module are all either missing or ``None``: * ``sys.path`` * ``sys.argv`` * ``sys.executable`` * ``sys.base_exec_prefix`` * ``sys.base_prefix`` * ``sys.exec_prefix`` * ``sys.prefix`` * ``sys.warnoptions`` * ``sys.flags`` * ``sys.dont_write_bytecode`` * ``sys.stdin`` * ``sys.stdout`` * The filesystem encoding is not yet defined * The IO encoding is not yet defined * CPython signal handlers are not yet installed * only builtin and frozen modules may be imported (due to above limitations) * ``sys.stderr`` is set to a temporary IO object using unbuffered binary mode * The ``warnings`` module is not yet initialised * The ``__main__`` module does not yet exist The main things made available by this step will be the core Python datatypes, in particular dictionaries, lists and strings. This allows them to be used safely for all of the remaining configuration steps (unlike the status quo). In addition, the current thread will possess a valid Python thread state, allow any further configuration data to be stored on the interpreter object rather than in C process globals. Any call to Py_BeginInitialization() must have a matching call to Py_Finalize(). It is acceptable to skip calling Py_EndInitialization() in between (e.g. if attempting to read the configuration settings fails) Determining the remaining configuration settings ------------------------------------------------ The next step in the initialisation sequence is to determine the full settings needed to complete the process. No changes are made to the interpreter state at this point. The core API for this step is:: int Py_ReadConfiguration(PyObject *config); The config argument should be a pointer to a Python dictionary. For any supported configuration setting already in the dictionary, CPython will sanity check the supplied value, but otherwise accept it as correct. Unlike Py_Initialize and Py_BeginInitialization, this call will raise an exception and report an error return rather than exhibiting fatal errors if a problem is found with the config data. Any supported configuration setting which is not already set will be populated appropriately. The default configuration can be overridden entirely by setting the value *before* calling Py_ReadConfiguration. The provided value will then also be used in calculating any settings derived from that value. Alternatively, settings may be overridden *after* the Py_ReadConfiguration call (this can be useful if an embedding application wants to adjust a setting rather than replace it completely, such as removing ``sys.path[0]``). Supported configuration settings -------------------------------- At least the following configuration settings will be supported:: raw_argv (list of str, default = retrieved from OS APIs) argv (list of str, default = derived from raw_argv) warnoptions (list of str, default = derived from raw_argv and environment) xoptions (list of str, default = derived from raw_argv and environment) program_name (str, default = retrieved from OS APIs) executable (str, default = derived from program_name) home (str, default = complicated!) prefix (str, default = complicated!) 
exec_prefix (str, default = complicated!)
    base_prefix (str, default = complicated!)
    base_exec_prefix (str, default = complicated!)
    path (list of str, default = complicated!)
    io_encoding (str, default = derived from environment or OS APIs)
    fs_encoding (str, default = derived from OS APIs)
    skip_signal_handlers (boolean, default = derived from environment or False)
    ignore_environment (boolean, default = derived from environment or False)
    dont_write_bytecode (boolean, default = derived from environment or False)
    no_site (boolean, default = derived from environment or False)
    no_user_site (boolean, default = derived from environment or False)

Completing the interpreter initialisation
-----------------------------------------

The final step in the process is to actually put the configuration settings into effect and finish bootstrapping the interpreter up to full operation::

    int Py_EndInitialization(PyObject *config);

Like Py_ReadConfiguration, this call will raise an exception and report an error return rather than exhibiting fatal errors if a problem is found with the config data.

After a successful call, Py_IsInitializing() will be false, while Py_IsInitialized() will become true. The caveats described above for the interpreter during the initialisation phase will no longer hold.

Stable ABI
----------

All of the APIs proposed in this PEP are excluded from the stable ABI, as embedding a Python interpreter involves a much higher degree of coupling than merely writing an extension.

Backwards Compatibility
-----------------------

Backwards compatibility will be preserved primarily by ensuring that Py_ReadConfiguration() interrogates all the previously defined configuration settings stored in global variables and environment variables.

One acknowledged incompatibility is that some environment variables which are currently read lazily may instead be read once during interpreter initialisation. As the PEP matures, these will be discussed in more detail on a case by case basis.

The Py_Initialize() style of initialisation will continue to be supported. It will use the new API internally, but will continue to exhibit the same behaviour as it does today, ensuring that sys.argv is not set until a subsequent PySys_SetArgv call.

A System Python Executable
==========================

When executing system utilities with administrative access to a system, many of the default behaviours of CPython are undesirable, as they may allow untrusted code to execute with elevated privileges.

The most problematic aspects are the fact that user site directories are enabled, environment variables are trusted and that the directory containing the executed file is placed at the beginning of the import path.

Currently, providing a separate executable with different default behaviour would be prohibitively hard to maintain. One of the goals of this PEP is to make it possible to replace much of the hard to maintain bootstrapping code with more normal CPython code, as well as making it easier for a separate application to make use of key components of ``Py_Main``. Including this change in the PEP is designed to help avoid acceptance of a design that sounds good in theory but proves to be problematic in practice.
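To make the shape of the configuration data more concrete, the proposed ``spython`` executable (or any similarly security conscious embedding application) might supply only a handful of entries and leave everything else to be filled in with the normal defaults. This is just a sketch -- the key names are taken from the table of supported settings above and remain provisional::

    # Settings supplied up front; anything omitted is populated by
    # Py_ReadConfiguration() with its usual default value
    config = {
        "ignore_environment": True,  # don't trust PYTHON* environment variables
        "no_user_site": True,        # skip per-user site-packages directories
    }

    # After Py_ReadConfiguration(config) has filled in the derived settings,
    # individual entries can still be adjusted, e.g. dropping the implicit
    # script directory (assuming the default calculation put it first):
    #
    #     config["path"] = config["path"][1:]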
One final aspect not addressed by the general embedding changes above is the current inaccessibility of the core logic for deciding between the different execution modes supported by CPython: * script execution * directory/zipfile execution * command execution ("-c" switch) * module or package execution ("-m" switch) * execution from stdin (non-interactive) * interactive stdin Implementation ============== None as yet. Once I have a reasonably solid plan of attack, I intend to work on a reference implementation as a feature branch in my BitBucket sandbox [2_] References ========== .. [1] CPython interpreter initialization notes (http://wiki.python.org/moin/CPythonInterpreterInitialization) .. [2] BitBucket Sandbox (https://bitbucket.org/ncoghlan/cpython_sandbox) Copyright =========== This document has been placed in the public domain. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From benjamin at python.org Thu Dec 27 17:29:54 2012 From: benjamin at python.org (Benjamin Peterson) Date: Thu, 27 Dec 2012 16:29:54 +0000 (UTC) Subject: [Python-ideas] PEP 432: Simplifying the CPython startup sequence References: Message-ID: Nick Coghlan writes: > > PEP: 432 > Title: Simplifying the CPython startup sequence b In general, it looks quite nice. While you're creating new initialization APIs, it would be nice if they could support (or at least be future compatible with) a "interpreter context". If we ever get around to killing at the c-level global state in the interpreter, such a struct would hold the state. For example, it would be nice if instead of those Py_* option variables, members of a structure on PyInterpreter were used. From ubershmekel at gmail.com Thu Dec 27 17:39:08 2012 From: ubershmekel at gmail.com (Yuval Greenfield) Date: Thu, 27 Dec 2012 18:39:08 +0200 Subject: [Python-ideas] PEP 432: Simplifying the CPython startup sequence In-Reply-To: References: Message-ID: On Thu, Dec 27, 2012 at 5:10 PM, Nick Coghlan wrote: > Performance > ----------- > > CPython is used heavily to run short scripts where the runtime is dominated > by the interpreter initialisation time. Any changes to the startup sequence > should minimise their impact on the startup overhead. (Given that the > overhead is dominated by IO operations, this is not currently expected to > cause any significant problems). > > I'd like to just stress the performance issue. It seems python3.3 takes 30% more time to start vs 2.7 on my ubuntu. Yuval -------------- next part -------------- An HTML attachment was scrubbed... URL: From ubershmekel at gmail.com Thu Dec 27 17:40:49 2012 From: ubershmekel at gmail.com (Yuval Greenfield) Date: Thu, 27 Dec 2012 18:40:49 +0200 Subject: [Python-ideas] PEP 432: Simplifying the CPython startup sequence In-Reply-To: References: Message-ID: On Thu, Dec 27, 2012 at 6:39 PM, Yuval Greenfield wrote: > On Thu, Dec 27, 2012 at 5:10 PM, Nick Coghlan wrote: > >> Performance >> ----------- >> >> CPython is used heavily to run short scripts where the runtime is >> dominated >> by the interpreter initialisation time. Any changes to the startup >> sequence >> should minimise their impact on the startup overhead. (Given that the >> overhead is dominated by IO operations, this is not currently expected to >> cause any significant problems). >> >> > I'd like to just stress the performance issue. It seems python3.3 takes > 30% more time to start vs 2.7 on my ubuntu. 
> > Yuval > Here's the test I used https://gist.github.com/4389657 -------------- next part -------------- An HTML attachment was scrubbed... URL: From solipsis at pitrou.net Thu Dec 27 17:42:52 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Thu, 27 Dec 2012 17:42:52 +0100 Subject: [Python-ideas] PEP 432: Simplifying the CPython startup sequence References: Message-ID: <20121227174252.562d604c@pitrou.net> On Fri, 28 Dec 2012 01:10:47 +1000 Nick Coghlan wrote: > > Performance > ----------- > > CPython is used heavily to run short scripts where the runtime is dominated > by the interpreter initialisation time. Any changes to the startup sequence > should minimise their impact on the startup overhead. (Given that the > overhead is dominated by IO operations, this is not currently expected to > cause any significant problems). Do you have any actual measurements to back this up? Regards Antoine. From solipsis at pitrou.net Thu Dec 27 17:43:59 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Thu, 27 Dec 2012 17:43:59 +0100 Subject: [Python-ideas] PEP 432: Simplifying the CPython startup sequence References: Message-ID: <20121227174359.2aa1b71a@pitrou.net> On Thu, 27 Dec 2012 18:40:49 +0200 Yuval Greenfield wrote: > On Thu, Dec 27, 2012 at 6:39 PM, Yuval Greenfield wrote: > > > On Thu, Dec 27, 2012 at 5:10 PM, Nick Coghlan wrote: > > > >> Performance > >> ----------- > >> > >> CPython is used heavily to run short scripts where the runtime is > >> dominated > >> by the interpreter initialisation time. Any changes to the startup > >> sequence > >> should minimise their impact on the startup overhead. (Given that the > >> overhead is dominated by IO operations, this is not currently expected to > >> cause any significant problems). > >> > >> > > I'd like to just stress the performance issue. It seems python3.3 takes > > 30% more time to start vs 2.7 on my ubuntu. > > > > Yuval > > > > Here's the test I used https://gist.github.com/4389657 Python 3 simply has more modules to load at startup (for example because of the IO stack). Regards Antoine. From christian at python.org Thu Dec 27 21:14:11 2012 From: christian at python.org (Christian Heimes) Date: Thu, 27 Dec 2012 21:14:11 +0100 Subject: [Python-ideas] PEP 432: Simplifying the CPython startup sequence In-Reply-To: References: Message-ID: <50DCAC13.5040303@python.org> Am 27.12.2012 16:10, schrieb Nick Coghlan: > Additional configuration is handled via separate API calls:: > > Py_SetProgramName() (call before Py_Initialize()) > Py_SetPath() (optional, call before Py_Initialize()) > Py_SetPythonHome() (optional, call before Py_Initialize()???) > Py_SetArgv[Ex]() (call after Py_Initialize()) [...] > The only configuration that currently absolutely needs to be in place > before even the interpreter core can be initialised is the seed for the > randomised hash algorithm. However, there are a couple of settings needed > there: whether or not hash randomisation is enabled at all, and if it's > enabled, whether or not to use a specific seed value. > > The proposed API for this step in the startup sequence is:: > > void Py_BeginInitialization(Py_CoreConfig *config); > > Like Py_Initialize, this part of the new API treats initialisation failures > as fatal errors. While that's still not particularly embedding friendly, > the operations in this step *really* shouldn't be failing, and changing them > to return error codes instead of aborting would be an even larger task than > the one already being proposed. 
> > The new Py_CoreConfig struct holds the settings required for preliminary > configuration:: > > typedef struct { > int use_hash_seed; > size_t hash_seed; > } Py_CoreConfig; Hello Nick, we could use the opportunity and move more settings to Py_CoreConfig. At the moment several settings are stored in static variables: Python/pythonrun.c static wchar_t *progname static wchar_t *default_home static wchar_t env_home[PATH_MAX+1] Modules/getpath.c static wchar_t prefix[MAXPATHLEN+1] static wchar_t exec_prefix[MAXPATHLEN+1] static wchar_t progpath[MAXPATHLEN+1] static wchar_t *module_search_path static int module_search_path_malloced static wchar_t *lib_python = L"lib/python" VERSION; PC/getpath.c static wchar_t dllpath[MAXPATHLEN+1] These settings could be added to the Py_CoreConfig struct and unify the configuration schema for embedders. Functions like Py_SetProgramName() would set the members of a global Py_CoreConfig struct. Christian From ncoghlan at gmail.com Fri Dec 28 01:55:52 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 28 Dec 2012 10:55:52 +1000 Subject: [Python-ideas] PEP 432: Simplifying the CPython startup sequence In-Reply-To: <50DCAC13.5040303@python.org> References: <50DCAC13.5040303@python.org> Message-ID: I was planning to move most of those settings into the config dict. Both the core config struct and the config dict would then be stored in new slots in the interpreter struct. My preference is to push more settings into the config dictionary, since those can use the C API and frozen bytecode to do their calculations. -- Sent from my phone, thus the relative brevity :) On Dec 28, 2012 6:14 AM, "Christian Heimes" wrote: > Am 27.12.2012 16:10, schrieb Nick Coghlan: > > > Additional configuration is handled via separate API calls:: > > > > Py_SetProgramName() (call before Py_Initialize()) > > Py_SetPath() (optional, call before Py_Initialize()) > > Py_SetPythonHome() (optional, call before Py_Initialize()???) > > Py_SetArgv[Ex]() (call after Py_Initialize()) > > [...] > > > The only configuration that currently absolutely needs to be in place > > before even the interpreter core can be initialised is the seed for the > > randomised hash algorithm. However, there are a couple of settings needed > > there: whether or not hash randomisation is enabled at all, and if it's > > enabled, whether or not to use a specific seed value. > > > > The proposed API for this step in the startup sequence is:: > > > > void Py_BeginInitialization(Py_CoreConfig *config); > > > > Like Py_Initialize, this part of the new API treats initialisation > failures > > as fatal errors. While that's still not particularly embedding friendly, > > the operations in this step *really* shouldn't be failing, and changing > them > > to return error codes instead of aborting would be an even larger task > than > > the one already being proposed. > > > > The new Py_CoreConfig struct holds the settings required for preliminary > > configuration:: > > > > typedef struct { > > int use_hash_seed; > > size_t hash_seed; > > } Py_CoreConfig; > > Hello Nick, > > we could use the opportunity and move more settings to Py_CoreConfig. 
At > the moment several settings are stored in static variables: > > Python/pythonrun.c > > static wchar_t *progname > static wchar_t *default_home > static wchar_t env_home[PATH_MAX+1] > > Modules/getpath.c > > static wchar_t prefix[MAXPATHLEN+1] > static wchar_t exec_prefix[MAXPATHLEN+1] > static wchar_t progpath[MAXPATHLEN+1] > static wchar_t *module_search_path > static int module_search_path_malloced > static wchar_t *lib_python = L"lib/python" VERSION; > > PC/getpath.c > > static wchar_t dllpath[MAXPATHLEN+1] > > > These settings could be added to the Py_CoreConfig struct and unify the > configuration schema for embedders. Functions like Py_SetProgramName() > would set the members of a global Py_CoreConfig struct. > > Christian > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ericsnowcurrently at gmail.com Fri Dec 28 06:50:28 2012 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Thu, 27 Dec 2012 22:50:28 -0700 Subject: [Python-ideas] PEP 432: Simplifying the CPython startup sequence In-Reply-To: References: Message-ID: On Thu, Dec 27, 2012 at 9:29 AM, Benjamin Peterson wrote: > Nick Coghlan writes: >> >> PEP: 432 >> Title: Simplifying the CPython startup sequence > b > In general, it looks quite nice. While you're creating new initialization APIs, > it would be nice if they could support (or at least be future compatible with) a > "interpreter context". If we ever get around to killing at the c-level global > state in the interpreter, such a struct would hold the state. For example, it > would be nice if instead of those Py_* option variables, members of a structure > on PyInterpreter were used. This is exactly what I was wondering, a la subinterpreter support. -eric From solipsis at pitrou.net Fri Dec 28 13:15:22 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 28 Dec 2012 13:15:22 +0100 Subject: [Python-ideas] PEP 432: Simplifying the CPython startup sequence References: <50DCAC13.5040303@python.org> Message-ID: <20121228131522.3c925d3e@pitrou.net> On Fri, 28 Dec 2012 10:55:52 +1000 Nick Coghlan wrote: > I was planning to move most of those settings into the config dict. Both > the core config struct and the config dict would then be stored in new > slots in the interpreter struct. > > My preference is to push more settings into the config dictionary, since > those can use the C API and frozen bytecode to do their calculations. But dicts are also more annoying to use in C than plain structs. Regards Antoine. From ncoghlan at gmail.com Fri Dec 28 13:50:07 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 28 Dec 2012 22:50:07 +1000 Subject: [Python-ideas] PEP 432: Simplifying the CPython startup sequence In-Reply-To: <20121228131522.3c925d3e@pitrou.net> References: <50DCAC13.5040303@python.org> <20121228131522.3c925d3e@pitrou.net> Message-ID: On Fri, Dec 28, 2012 at 10:15 PM, Antoine Pitrou wrote: > On Fri, 28 Dec 2012 10:55:52 +1000 > Nick Coghlan wrote: >> I was planning to move most of those settings into the config dict. Both >> the core config struct and the config dict would then be stored in new >> slots in the interpreter struct. >> >> My preference is to push more settings into the config dictionary, since >> those can use the C API and frozen bytecode to do their calculations. > > But dicts are also more annoying to use in C than plain structs. Yeah, you may be right. I'll add more on the internal storage of the configuration data and include that as an open question. 
I want the dict in the config API so we can distinguish between "please fill in the default value" and "don't fill this in at all", but there's nothing stopping us mapping that to a C struct internally. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From mark at hotpy.org Fri Dec 28 16:45:47 2012 From: mark at hotpy.org (Mark Shannon) Date: Fri, 28 Dec 2012 15:45:47 +0000 Subject: [Python-ideas] PEP 432: Simplifying the CPython startup sequence In-Reply-To: References: Message-ID: <50DDBEAB.6000906@hotpy.org> On 27/12/12 15:10, Nick Coghlan wrote: Hi, > This PEP proposes that CPython move to an explicit 2-phase initialisation Why only two phases? I was thinking about the initialisation sequence a while ago and thought that a three or four phase sequence might be appropriate. What matters is that the state in between phases is well defined and simple to understand. You might want to take a look at rubinius which implements most of its core components in Ruby, so needs a clearly defined startup sequence. http://rubini.us/doc/en/bootstrapping/ (Rubinius using 7 phases, but that would be overkill for CPython) Cheers, Mark. From ncoghlan at gmail.com Fri Dec 28 19:07:45 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 29 Dec 2012 04:07:45 +1000 Subject: [Python-ideas] PEP 432: Simplifying the CPython startup sequence In-Reply-To: <50DDBEAB.6000906@hotpy.org> References: <50DDBEAB.6000906@hotpy.org> Message-ID: On Sat, Dec 29, 2012 at 1:45 AM, Mark Shannon wrote: > On 27/12/12 15:10, Nick Coghlan wrote: > > Hi, > > >> This PEP proposes that CPython move to an explicit 2-phase initialisation > > > Why only two phases? I was thinking about the initialisation sequence a > while ago and thought that a three or four phase sequence might be > appropriate. What matters is that the state in between phases is well > defined and simple to understand. The "2-phase" term came from the fact that I'm trying to break Py_Initialize() into two separate phase changes that roughly correspond with the locations of the current calls to _Py_Random_Init() and Py_Initialize() in Py_Main(). There's also at least a 3rd phase (even in the current design), because there's a "get ready to start executing __main__" phase after Py_Initialise finishes that changes various attributes on __main__ and may also modify sys.path[0] and sys.argv[0]. This is the first phase where user code may execute (Package __init__ modules may run in this phase when the "-m" switch is used to execute a package or submodule) So yeah, I need to lose the "2-phase" term, because it's simply wrong. A more realistic description of the phases proposed in the PEP would be: PreInit Phase - No CPython infrastructure configured, only pure C code allowed Initializing Phase - After Py_BeginInitialization() is called. Limitations as described in the PEP. PreMain Phase - After Py_EndInitialization() is called. __main__ attributes, sys.path[0], sys.argv[0] may still be inaccurate Main Execution - Execution of the main module bytecode has started. Interpreter has been fully configured. > You might want to take a look at rubinius which implements most of its core > components in Ruby, so needs a clearly defined startup sequence. > http://rubini.us/doc/en/bootstrapping/ > (Rubinius using 7 phases, but that would be overkill for CPython) Thanks for the reference. However, it looks like most of those seven stages will still be handled in our preinit phase. 
It sounds like we do a *lot* more in C than Rubinius does, so most of that code really doesn't need much in the way of infrastructure. It's definitely not *easy* to understand, but we also don't mess with it very often, and it's the kind of code where having access to more of the Python C API wouldn't really help all that much. The key piece I think we're currently missing is the clearly phase change between "PreInit" (can't safely use the Python C API) and "Initializing" (can use most of the C API, with some restrictions). Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From michelelacchia at gmail.com Sat Dec 29 08:39:05 2012 From: michelelacchia at gmail.com (Michele Lacchia) Date: Sat, 29 Dec 2012 08:39:05 +0100 Subject: [Python-ideas] [Python-Dev] question about packaging In-Reply-To: <50DE2762.90509@cavallinux.eu> References: <480CF8A8-0461-4C20-8A3C-2944C883E78B@gmail.com> <50DE2762.90509@cavallinux.eu> Message-ID: Sorry if I interfere, but now what should be supported, distlib or packaging? It seems to me that the former was born to solve some problems packaging and distribute still had. In addition to that, packaging has not been included in Python 3.3, as it was first planned. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Sat Dec 29 09:25:40 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 29 Dec 2012 18:25:40 +1000 Subject: [Python-ideas] [Python-Dev] question about packaging In-Reply-To: References: <480CF8A8-0461-4C20-8A3C-2944C883E78B@gmail.com> <50DE2762.90509@cavallinux.eu> Message-ID: On Sat, Dec 29, 2012 at 5:39 PM, Michele Lacchia wrote: > Sorry if I interfere, but now what should be supported, distlib or > packaging? It seems to me that the former was born to solve some problems > packaging and distribute still had. In addition to that, packaging has not > been included in Python 3.3, as it was first planned. Originally, the distutils2 project was going to be the basis the new packaging support in the stdlib. The critical problem identified in the run up to 3.3 was that the level of maturity in distutils2 (and hence packaging) was hugely variable - some parts were (almost) ready for inclusion, but many were not. By building up distlib more incrementally (rather than starting as a fork of distutils), it should be easier to identify which parts are sufficiently mature for stdlib inclusion. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From dkreuter at gmail.com Sun Dec 30 04:25:38 2012 From: dkreuter at gmail.com (David Kreuter) Date: Sun, 30 Dec 2012 04:25:38 +0100 Subject: [Python-ideas] proposed methods: list.replace / list.indices Message-ID: Hi python-ideas. I think it would be nice to have a method in 'list' to replace certain elements by others in-place. Like this: l = [x, a, y, a] l.replace(a, b) assert l == [x, b, y, b] The alternatives are longer than they should be, imo. For example: for i, n in enumerate(l): if n == a: l[i] = b Or: l = [b if n==a else n for n in l] And this is what happens when someone tries to "optimize" this process. It totally obscures the intention: try: i = 0 while i < len(l): i = l.index(a, i) l[i] = b i += 1 except ValueError: pass If there is a reason not to add '.replace' as built-in method, it could be implemented in pure python efficiently if python provided a version of '.index' that returns the index of more than just the first occurrence of a given item. 
Like this: l = [x, a, b, a] for i in l.indices(a): l[i] = b So adding .replace and/or .indices? Good idea? Bad idea? -------------- next part -------------- An HTML attachment was scrubbed... URL: From python at mrabarnett.plus.com Sun Dec 30 05:03:18 2012 From: python at mrabarnett.plus.com (MRAB) Date: Sun, 30 Dec 2012 04:03:18 +0000 Subject: [Python-ideas] proposed methods: list.replace / list.indices In-Reply-To: References: Message-ID: <50DFBD06.10304@mrabarnett.plus.com> On 2012-12-30 03:25, David Kreuter wrote: > Hi python-ideas. > > I think it would be nice to have a method in 'list' to replace certain > elements by others in-place. Like this: > > l = [x, a, y, a] > l.replace(a, b) > assert l == [x, b, y, b] > > The alternatives are longer than they should be, imo. For example: > > for i, n in enumerate(l): > if n == a: > l[i] = b > > Or: > > l = [b if n==a else n for n in l] > > And this is what happens when someone tries to "optimize" this process. > It totally obscures the intention: > > try: > i = 0 > while i < len(l): > i = l.index(a, i) > l[i] = b > i += 1 > except ValueError: > pass > > If there is a reason not to add '.replace' as built-in method, it could > be implemented in pure python efficiently if python provided a version > of '.index' that returns the index of more than just the first > occurrence of a given item. Like this: > > l = [x, a, b, a] > for i in l.indices(a): > l[i] = b > > So adding .replace and/or .indices? Good idea? Bad idea? > What's your use-case? I personally can't remember ever needing to do this (or, if I have, it was so long ago that I can't remember it!). Features get added to Python only when someone can show a compelling reason for it and sufficient other people agree. From dkreuter at gmail.com Sun Dec 30 05:59:28 2012 From: dkreuter at gmail.com (David Kreuter) Date: Sun, 30 Dec 2012 05:59:28 +0100 Subject: [Python-ideas] proposed methods: list.replace / list.indices In-Reply-To: <50DFBD06.10304@mrabarnett.plus.com> References: <50DFBD06.10304@mrabarnett.plus.com> Message-ID: On Sun, Dec 30, 2012 at 5:03 AM, MRAB wrote: > On 2012-12-30 03:25, David Kreuter wrote: > >> Hi python-ideas. >> >> I think it would be nice to have a method in 'list' to replace certain >> elements by others in-place. Like this: >> >> l = [x, a, y, a] >> l.replace(a, b) >> assert l == [x, b, y, b] >> >> The alternatives are longer than they should be, imo. For example: >> >> for i, n in enumerate(l): >> if n == a: >> l[i] = b >> >> Or: >> >> l = [b if n==a else n for n in l] >> >> And this is what happens when someone tries to "optimize" this process. >> It totally obscures the intention: >> >> try: >> i = 0 >> while i < len(l): >> i = l.index(a, i) >> l[i] = b >> i += 1 >> except ValueError: >> pass >> >> If there is a reason not to add '.replace' as built-in method, it could >> be implemented in pure python efficiently if python provided a version >> of '.index' that returns the index of more than just the first >> occurrence of a given item. Like this: >> >> l = [x, a, b, a] >> for i in l.indices(a): >> l[i] = b >> >> So adding .replace and/or .indices? Good idea? Bad idea? >> >> What's your use-case? > > I personally can't remember ever needing to do this (or, if I have, it > was so long ago that I can't remember it!). > > Features get added to Python only when someone can show a compelling > reason for it and sufficient other people agree. 
> ______________________________**_________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/**mailman/listinfo/python-ideas > When I write code for processing graphs it becomes very useful. For example: def collapse_edge_undirected_graph(a, b): n = Node() n.connected = a.connected + b.connected for x in a.connected: x.connected.replace(a, n) for x in b.connected: x.connected.replace(b, n) In other cases one would probably just add another layer of indirection. x = Wrapper("a") y = Wrapper("y") a = Wrapper("a") l = [x, a, y, a] a.contents = "b" # instead of l.replace(a, b) But having to add .contents everywhere makes it messy. Graph code is complicated enough as it is. And '.index' is a basically a resumable search. But instead of using a iterator-interface it requires the user to call it repeatedly. A method '.indices' returning a generator seems more like the python way to approaching this. -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.f.moore at gmail.com Sun Dec 30 10:04:32 2012 From: p.f.moore at gmail.com (Paul Moore) Date: Sun, 30 Dec 2012 09:04:32 +0000 Subject: [Python-ideas] proposed methods: list.replace / list.indices In-Reply-To: References: <50DFBD06.10304@mrabarnett.plus.com> Message-ID: On 30 December 2012 04:59, David Kreuter wrote: > When I write code for processing graphs it becomes very useful. For example: > > def collapse_edge_undirected_graph(a, b): > n = Node() > n.connected = a.connected + b.connected > for x in a.connected: > x.connected.replace(a, n) > for x in b.connected: > x.connected.replace(b, n) Assuming n.connected is the set of nodes connected to n, why use a list rather than a set? And if you need multi-edges, a dict mapping node to count of edges (i.e. a multiset). Paul. From dkreuter at gmail.com Sun Dec 30 10:24:31 2012 From: dkreuter at gmail.com (David Kreuter) Date: Sun, 30 Dec 2012 10:24:31 +0100 Subject: [Python-ideas] proposed methods: list.replace / list.indices In-Reply-To: References: <50DFBD06.10304@mrabarnett.plus.com> Message-ID: On Sun, Dec 30, 2012 at 10:04 AM, Paul Moore wrote: > On 30 December 2012 04:59, David Kreuter wrote: > > When I write code for processing graphs it becomes very useful. For > example: > > > > def collapse_edge_undirected_graph(a, b): > > n = Node() > > n.connected = a.connected + b.connected > > for x in a.connected: > > x.connected.replace(a, n) > > for x in b.connected: > > x.connected.replace(b, n) > > Assuming n.connected is the set of nodes connected to n, why use a > list rather than a set? And if you need multi-edges, a dict mapping > node to count of edges (i.e. a multiset). > > Paul. > Ah, that's because in that specific case I'm processing flow graphs. A node with two outgoing edges represents an 'if'. The order does matter. [1] is where the flow continues when the condition evaluates to true. [0] for false. Forgot to mention that. -------------- next part -------------- An HTML attachment was scrubbed... URL: From victor.stinner at gmail.com Sun Dec 30 11:42:53 2012 From: victor.stinner at gmail.com (Victor Stinner) Date: Sun, 30 Dec 2012 11:42:53 +0100 Subject: [Python-ideas] Dynamic code NOPing In-Reply-To: <563458C3-9580-46AE-B343-6987116A3F08@stranden.com> References: <5DFDB30C-9A3D-4939-81D7-F34727284148@stranden.com> <563458C3-9580-46AE-B343-6987116A3F08@stranden.com> Message-ID: My astoptimizer provides tools to really *remove* debug at compilation, so the overhead of the debug code is just null. 
You can for example declare your variable project.config.DEBUG as constant with the value 0, where project.config is a module. So the if statement in "from project.config import DEBUG ... if DEBUG: ..." will be removed. See: https://bitbucket.org/haypo/astoptimizer Victor Le 25 d?c. 2012 13:43, "Rene Nejsum" a ?crit : > I understand and agree with all your arguments on debugging. > > At my company we typically make some kind of backend/server control > software, with a LOT of debugging lines across many modules. We have 20+ > debugging flags and in different situations we enable a few of those, if we > were to enable all at once it would defently have an impact on production, > but hopefully just a hotter CPU and a lot of disk space being used. > > debug statements in our code is probably one per 10-20 lines of code. > > I think my main issue (and what I therefore read into the original > suggestion) was the extra "if" statement at every log statement > > So doing: > > if log.debug.enabled(): > log.debug( bla. bla. ) > > Add's 5-10% extra code lines, whereas if we could do: > > log.debug( bla. bla ) > > at the same cost would save a lot of lines. > > And when you have 43 lines in your editor, it will give you 3-5 lines more > of real code to look at :-) > > /Rene > > > > On Dec 25, 2012, at 1:28 PM, Nick Coghlan wrote: > > > On Tue, Dec 25, 2012 at 9:11 PM, Rene Nejsum wrote: > >> But if debug() was indeed NOP'able, maybe it could be done ? > > > > If someone *really* wants to do this, they can abuse assert statements > > (which will be optimised out under "-O", just like code guarded by "if > > __debug__"). That doesn't make it a good idea - you most need log > > messages to investigate faults in production systems that you can't > > (or are still trying to) reproduce in development and integration > > environments. Compiling them out instead of deactivating them with > > runtime configuration settings means you can't switch them on without > > restarting the system with different options. > > > > This does mean that you have to factor in the cost of logging into > > your performance targets and hardware requirements, but the payoff is > > an increased ability to correctly diagnose system faults (as well as > > improving your ability to extract interesting metrics from log > > messages). > > > > Excessive logging calls certainly *can* cause performance problems due > > to the function call overhead, as can careless calculation of > > expensive values that aren't needed. One alternatives occasional > > noted is that you could design a logging API that can accept lazily > > evaluated callables instead of ordinary parameters. > > > > However, one danger of such expensive logging it that enabling that > > logging level becomes infeasible in practice, because the performance > > hit is too significant. The typical aim for logging is that your > > overhead should be such that enabling it in production means your > > servers run a little hotter, or your task takes a little longer, not > > that your application grinds to a halt. One good way to achieve this > > is to decouple the expensive calculations from the main application - > > you instead log the necessary pieces of information, which can be > > picked up by an external service and the calculation performed in a > > separate process (or even on a separate machine) where it won't affect > > the main application, and where you only calculate it if you actually > > need it for some reason. > > > > Cheers, > > Nick. 
> > > > -- > > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -------------- next part -------------- An HTML attachment was scrubbed... URL: From tjreedy at udel.edu Sun Dec 30 11:46:45 2012 From: tjreedy at udel.edu (Terry Reedy) Date: Sun, 30 Dec 2012 05:46:45 -0500 Subject: [Python-ideas] proposed methods: list.replace / list.indices In-Reply-To: References: Message-ID: On 12/29/2012 10:25 PM, David Kreuter wrote: > I think it would be nice to have a method in 'list' to replace certain > elements by others in-place. Like this: > > l = [x, a, y, a] > l.replace(a, b) > assert l == [x, b, y, b] > > The alternatives are longer than they should be, imo. For example: > > for i, n in enumerate(l): > if n == a: > l[i] = b I dont see anything wrong with this. It is how I would do it in python. Wrap it in a function if you want. Or write it on two line ;-). > If there is a reason not to add '.replace' as built-in method, There is a perfectly good python version above that does the necessary search and replace as efficiently as possible. Thank you for posting it. -- Terry Jan Reedy From stefan_ml at behnel.de Sun Dec 30 11:58:21 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sun, 30 Dec 2012 11:58:21 +0100 Subject: [Python-ideas] Dynamic code NOPing In-Reply-To: References: <5DFDB30C-9A3D-4939-81D7-F34727284148@stranden.com> <563458C3-9580-46AE-B343-6987116A3F08@stranden.com> Message-ID: Victor Stinner, 30.12.2012 11:42: > My astoptimizer provides tools to really *remove* debug at compilation, so > the overhead of the debug code is just null. > > You can for example declare your variable project.config.DEBUG as constant > with the value 0, where project.config is a module. So the if statement in > "from project.config import DEBUG ... if DEBUG: ..." will be removed. How would you know at compile time that it can be removed? How do you handle the example below? Stefan ## constants.py DEBUG = False ## enable_debug.py import constants constants.DEBUG = True ## test.py import enable_debug from constants import DEBUG if DEBUG: print("DEBUGGING !") From ned at nedbatchelder.com Sun Dec 30 15:10:01 2012 From: ned at nedbatchelder.com (Ned Batchelder) Date: Sun, 30 Dec 2012 09:10:01 -0500 Subject: [Python-ideas] proposed methods: list.replace / list.indices In-Reply-To: References: Message-ID: <50E04B39.2040508@nedbatchelder.com> On 12/30/2012 5:46 AM, Terry Reedy wrote: > On 12/29/2012 10:25 PM, David Kreuter wrote: > >> I think it would be nice to have a method in 'list' to replace certain >> elements by others in-place. Like this: >> >> l = [x, a, y, a] >> l.replace(a, b) >> assert l == [x, b, y, b] >> >> The alternatives are longer than they should be, imo. For example: >> >> for i, n in enumerate(l): >> if n == a: >> l[i] = b > > I dont see anything wrong with this. It is how I would do it in > python. Wrap it in a function if you want. Or write it on two line ;-). I wonder at the underlying philosophy of things being accepted or rejected in this way. For example, here's a thought experiment: if list.count() and list.index() didn't exist yet, would we accept them as additions to the list methods? By Terry's reasoning, there's no need to, since I can implement those operations in a few lines of Python. Does that mean they persist only for backwards compatibility? 
Was their initial inclusion a violation of some "list method philosophy"? Or is there a good reason for them to exist, and if so, why shouldn't .replace() and .indexes() also exist? The two sides (count/index and replace/indexes) seem about the same to me: - They are unambiguous operations. That is, no one has objected that reasonable people might disagree about how .replace() should behave, which is a common reason not to add things to the stdlib. - They implement simple operations that are easy to explain and will find use. In my experience, .indexes() is at least as useful as .count(). - All are based on element equality semantics. - Any of them could be implemented in a few lines of Python. What is the organizing principle for the methods list (or any other built-in data structure) should have? I would hate for the main criterion to be, "these are the methods that existed in Python 2.3," for example. Why is .count() in and .replace() out? > >> If there is a reason not to add '.replace' as built-in method, > > There is a perfectly good python version above that does the necessary > search and replace as efficiently as possible. Thank you for posting it. > You say "as efficiently as possible," but you mean, "as algorithmically efficient as possible," which is true, they are linear, which is as good as it's going to get. But surely if coded in C, these operations would be faster. --Ned. From ubershmekel at gmail.com Sun Dec 30 15:51:23 2012 From: ubershmekel at gmail.com (Yuval Greenfield) Date: Sun, 30 Dec 2012 16:51:23 +0200 Subject: [Python-ideas] proposed methods: list.replace / list.indices In-Reply-To: <50E04B39.2040508@nedbatchelder.com> References: <50E04B39.2040508@nedbatchelder.com> Message-ID: On Sun, Dec 30, 2012 at 4:10 PM, Ned Batchelder wrote: > I wonder at the underlying philosophy of things being accepted or rejected > in this way. > I'm no expert on the subject but here are a few criteria for builtin method inclusion: * Useful - show many popular use cases, e.g. attach many links to various lines on github/stackoverflow/bitbucket. * Hard to get right, i.e. user implementations tend to have bugs. * Would benefit greatly from C optimization * Have a great, obvious, specific, readable name * Don't overlap with anything else in the stdlib - TSBOAPOOOWTDI * Consistent with the rest of python, e.g. * Community approval * BDFL approval Brett wrote a bit on stdlib inclusion which may be relevant http://mail.python.org/pipermail/python-3000/2006-June/002442.html "that way may not be obvious at first unless you're Dutch." Yuval -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Sun Dec 30 16:05:45 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 31 Dec 2012 01:05:45 +1000 Subject: [Python-ideas] proposed methods: list.replace / list.indices In-Reply-To: <50E04B39.2040508@nedbatchelder.com> References: <50E04B39.2040508@nedbatchelder.com> Message-ID: On Mon, Dec 31, 2012 at 12:10 AM, Ned Batchelder wrote: > The two sides (count/index and replace/indexes) seem about the same to me: > > - They are unambiguous operations. That is, no one has objected that > reasonable people might disagree about how .replace() should behave, which > is a common reason not to add things to the stdlib. > - They implement simple operations that are easy to explain and will find > use. In my experience, .indexes() is at least as useful as .count(). > - All are based on element equality semantics. 
> - Any of them could be implemented in a few lines of Python. > > What is the organizing principle for the methods list (or any other built-in > data structure) should have? I would hate for the main criterion to be, > "these are the methods that existed in Python 2.3," for example. Why is > .count() in and .replace() out? The general problem with adding new methods to types rather than adding new functions+protocols is that it breaks ducktyping. We can mitigate that now by adding the new methods to collections.abc.Sequence, but it remains the case that relying on these methods being present rather than using the functional equivalent will needlessly couple your code to the underlying sequence implementation (since not all sequences inherit from the ABC, some are just registered). We also have a problem with replace() specifically that it *does* already exist in the standard library, as a non-mutating operation on str, bytes and bytearray. Adding it as a mutating method on sequences in general would create an immediate name conflict in the bytearray method namespace. That alone is a dealbreaker for that part of the idea. The question of an "indices" builtin or itertools function is potentially more interesting, but really, I don't think the algorithm David noted in his original post rises to the level of needing standardisation or acceleration: def indices(seq, val): for i, x in enumerate(seq): if x == val: yield i def map_assign(store, keys, val): for k in keys: store[k] = val def replace(seq, old, new): map_assign(seq, indices(seq, old), new) seq = [x, a, y, a] replace(seq, a, b) assert seq == [x, b, y, b] Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ned at nedbatchelder.com Sun Dec 30 16:13:10 2012 From: ned at nedbatchelder.com (Ned Batchelder) Date: Sun, 30 Dec 2012 10:13:10 -0500 Subject: [Python-ideas] proposed methods: list.replace / list.indices In-Reply-To: References: <50E04B39.2040508@nedbatchelder.com> Message-ID: <50E05A06.8010308@nedbatchelder.com> On 12/30/2012 9:51 AM, Yuval Greenfield wrote: > On Sun, Dec 30, 2012 at 4:10 PM, Ned Batchelder > wrote: > > I wonder at the underlying philosophy of things being accepted or > rejected in this way. > > > I'm no expert on the subject but here are a few criteria for builtin > method inclusion: > > * Useful - show many popular use cases, e.g. attach many links to > various lines on github/stackoverflow/bitbucket. > * Hard to get right, i.e. user implementations tend to have bugs. > * Would benefit greatly from C optimization > * Have a great, obvious, specific, readable name > * Don't overlap with anything else in the stdlib - TSBOAPOOOWTDI > * Consistent with the rest of python, e.g. > * Community approval > * BDFL approval > This is a good list. To make this concrete: in your opinion, would list.replace() and list.indexes() pass these criteria, or not? --Ned. > Brett wrote a bit on stdlib inclusion which may be relevant > http://mail.python.org/pipermail/python-3000/2006-June/002442.html > > "that way may not be obvious at first unless you're Dutch." > > Yuval -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ned at nedbatchelder.com Sun Dec 30 17:00:31 2012 From: ned at nedbatchelder.com (Ned Batchelder) Date: Sun, 30 Dec 2012 11:00:31 -0500 Subject: [Python-ideas] proposed methods: list.replace / list.indices In-Reply-To: References: <50E04B39.2040508@nedbatchelder.com> Message-ID: <50E0651F.6000305@nedbatchelder.com> On 12/30/2012 10:05 AM, Nick Coghlan wrote: > On Mon, Dec 31, 2012 at 12:10 AM, Ned Batchelder wrote: >> The two sides (count/index and replace/indexes) seem about the same to me: >> >> - They are unambiguous operations. That is, no one has objected that >> reasonable people might disagree about how .replace() should behave, which >> is a common reason not to add things to the stdlib. >> - They implement simple operations that are easy to explain and will find >> use. In my experience, .indexes() is at least as useful as .count(). >> - All are based on element equality semantics. >> - Any of them could be implemented in a few lines of Python. >> >> What is the organizing principle for the methods list (or any other built-in >> data structure) should have? I would hate for the main criterion to be, >> "these are the methods that existed in Python 2.3," for example. Why is >> .count() in and .replace() out? > The general problem with adding new methods to types rather than > adding new functions+protocols is that it breaks ducktyping. We can > mitigate that now by adding the new methods to > collections.abc.Sequence, but it remains the case that relying on > these methods being present rather than using the functional > equivalent will needlessly couple your code to the underlying sequence > implementation (since not all sequences inherit from the ABC, some are > just registered). > > We also have a problem with replace() specifically that it *does* > already exist in the standard library, as a non-mutating operation on > str, bytes and bytearray. Adding it as a mutating method on sequences > in general would create an immediate name conflict in the bytearray > method namespace. That alone is a dealbreaker for that part of the > idea. I don't understand the conflict? .replace() from sequence does precisely the same thing as .replace() from bytes if you limit the arguments to single-byte values. It seems perfectly natural to me. I must be missing something. > > The question of an "indices" builtin or itertools function is > potentially more interesting, but really, I don't think the algorithm > David noted in his original post rises to the level of needing > standardisation or acceleration: > > def indices(seq, val): > for i, x in enumerate(seq): > if x == val: yield i > > def map_assign(store, keys, val): > for k in keys: > store[k] = val > > def replace(seq, old, new): > map_assign(seq, indices(seq, old), new) > > seq = [x, a, y, a] > replace(seq, a, b) > assert seq == [x, b, y, b] Does this mean that if .index() or .count() didn't already exist, you wouldn't add them to list? > Cheers, > Nick. 
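For reference, the behaviour MRAB describes in the following reply can be seen directly in the interpreter: the existing replace() methods are all non-mutating, so bytearray.replace(), like str.replace() and bytes.replace(), returns a new object and leaves the original untouched, which is exactly what an in-place list.replace() would not do:

    >>> ba = bytearray(b"spam spam")
    >>> ba.replace(b"spam", b"eggs")
    bytearray(b'eggs eggs')
    >>> ba    # the original is unchanged
    bytearray(b'spam spam')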
> From python at mrabarnett.plus.com Sun Dec 30 17:58:06 2012 From: python at mrabarnett.plus.com (MRAB) Date: Sun, 30 Dec 2012 16:58:06 +0000 Subject: [Python-ideas] proposed methods: list.replace / list.indices In-Reply-To: <50E0651F.6000305@nedbatchelder.com> References: <50E04B39.2040508@nedbatchelder.com> <50E0651F.6000305@nedbatchelder.com> Message-ID: <50E0729E.1040208@mrabarnett.plus.com> On 2012-12-30 16:00, Ned Batchelder wrote: > On 12/30/2012 10:05 AM, Nick Coghlan wrote: >> On Mon, Dec 31, 2012 at 12:10 AM, Ned Batchelder wrote: >>> The two sides (count/index and replace/indexes) seem about the same to me: >>> >>> - They are unambiguous operations. That is, no one has objected that >>> reasonable people might disagree about how .replace() should behave, which >>> is a common reason not to add things to the stdlib. >>> - They implement simple operations that are easy to explain and will find >>> use. In my experience, .indexes() is at least as useful as .count(). >>> - All are based on element equality semantics. >>> - Any of them could be implemented in a few lines of Python. >>> >>> What is the organizing principle for the methods list (or any other built-in >>> data structure) should have? I would hate for the main criterion to be, >>> "these are the methods that existed in Python 2.3," for example. Why is >>> .count() in and .replace() out? >> The general problem with adding new methods to types rather than >> adding new functions+protocols is that it breaks ducktyping. We can >> mitigate that now by adding the new methods to >> collections.abc.Sequence, but it remains the case that relying on >> these methods being present rather than using the functional >> equivalent will needlessly couple your code to the underlying sequence >> implementation (since not all sequences inherit from the ABC, some are >> just registered). >> >> We also have a problem with replace() specifically that it *does* >> already exist in the standard library, as a non-mutating operation on >> str, bytes and bytearray. Adding it as a mutating method on sequences >> in general would create an immediate name conflict in the bytearray >> method namespace. That alone is a dealbreaker for that part of the >> idea. > > I don't understand the conflict? .replace() from sequence does > precisely the same thing as .replace() from bytes if you limit the > arguments to single-byte values. It seems perfectly natural to me. I > must be missing something. > [snip] The difference is that for bytes and str it returns the result (they are immutable after all), but the suggested addition would mutate the list in-place. In order to be consistent it would have to return the result instead. From hernan.grecco at gmail.com Sun Dec 30 18:54:43 2012 From: hernan.grecco at gmail.com (Hernan Grecco) Date: Sun, 30 Dec 2012 18:54:43 +0100 Subject: [Python-ideas] Order in the documentation search results Message-ID: Hi, I have seen many people new to Python stumbling while using the Python docs due to the order of the search results. For example, if somebody new to python searches for `tuple`, the actual section about `tuple` comes in place 39. What is more confusing for people starting with the language is that all the C functions come first. I have seen people clicking in PyTupleObject just to be totally disoriented. Maybe `tuple` is a silly example. But if somebody wants to know how does `open` behaves and which arguments it takes, the result comes in position 16. 
`property` does not appear in the list at all (but built-in appears in position 31). This is true for most builtins. Experienced people will have no trouble navigating through these results, but new users do. It is not terrible and at the end they get it, but I think it would be nice to change it to more (new) user friendly order. So my suggestion is to put the builtins first, the rest of the standard lib later including HowTos, FAQ, etc and finally the c-modules. Additionally, a section with a title matching exactly the search query should come first. (I am not sure if the last suggestion belongs in python-ideas or in the sphinx mailing list, please advice) Thanks, Hernan From ned at nedbatchelder.com Sun Dec 30 19:11:06 2012 From: ned at nedbatchelder.com (Ned Batchelder) Date: Sun, 30 Dec 2012 13:11:06 -0500 Subject: [Python-ideas] Order in the documentation search results In-Reply-To: References: Message-ID: <50E083BA.7000603@nedbatchelder.com> On 12/30/2012 12:54 PM, Hernan Grecco wrote: > Hi, > > I have seen many people new to Python stumbling while using the Python > docs due to the order of the search results. > > For example, if somebody new to python searches for `tuple`, the > actual section about `tuple` comes in place 39. What is more confusing > for people starting with the language is that all the C functions come > first. I have seen people clicking in PyTupleObject just to be totally > disoriented. > > Maybe `tuple` is a silly example. But if somebody wants to know how > does `open` behaves and which arguments it takes, the result comes in > position 16. `property` does not appear in the list at all (but > built-in appears in position 31). This is true for most builtins. > > Experienced people will have no trouble navigating through these > results, but new users do. It is not terrible and at the end they get > it, but I think it would be nice to change it to more (new) user > friendly order. > > So my suggestion is to put the builtins first, the rest of the > standard lib later including HowTos, FAQ, etc and finally the > c-modules. Additionally, a section with a title matching exactly the > search query should come first. (I am not sure if the last suggestion > belongs in python-ideas or in > the sphinx mailing list, please advice) While we're on the topic, why in this day and age do we have a custom search? Using google site search would be faster for the user, and more accurate. --Ned. > Thanks, > > Hernan > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > From ezio.melotti at gmail.com Sun Dec 30 19:11:27 2012 From: ezio.melotti at gmail.com (Ezio Melotti) Date: Sun, 30 Dec 2012 20:11:27 +0200 Subject: [Python-ideas] Order in the documentation search results In-Reply-To: References: Message-ID: Hi, On Sun, Dec 30, 2012 at 7:54 PM, Hernan Grecco wrote: > Hi, > > I have seen many people new to Python stumbling while using the Python > docs due to the order of the search results. > > For example, if somebody new to python searches for `tuple`, the > actual section about `tuple` comes in place 39. What is more confusing > for people starting with the language is that all the C functions come > first. I have seen people clicking in PyTupleObject just to be totally > disoriented. > > Maybe `tuple` is a silly example. But if somebody wants to know how > does `open` behaves and which arguments it takes, the result comes in > position 16. 
`property` does not appear in the list at all (but > built-in appears in position 31). This is true for most builtins. > > Experienced people will have no trouble navigating through these > results, but new users do. It is not terrible and at the end they get > it, but I think it would be nice to change it to more (new) user > friendly order. > > So my suggestion is to put the builtins first, the rest of the > standard lib later including HowTos, FAQ, etc and finally the > c-modules. Additionally, a section with a title matching exactly the > search query should come first. (I am not sure if the last suggestion > belongs in python-ideas or in > the sphinx mailing list, please advice) > > Thanks, > > Hernan > I experimented with this a bit a while ago. See http://bugs.python.org/issue15871#msg170048. Best Regards, Ezio Melotti -------------- next part -------------- An HTML attachment was scrubbed... URL: From hernan.grecco at gmail.com Sun Dec 30 19:18:28 2012 From: hernan.grecco at gmail.com (Hernan Grecco) Date: Sun, 30 Dec 2012 19:18:28 +0100 Subject: [Python-ideas] Order in the documentation search results In-Reply-To: References: <50E083BA.7000603@nedbatchelder.com> Message-ID: Hi Ned, On Sun, Dec 30, 2012 at 7:11 PM, Ned Batchelder wrote: > > While we're on the topic, why in this day and age do we have a custom > search? Using google site search would be faster for the user, and more > accurate. > > --Ned. In general I agree with you, but I find downloadable documentation very useful (one of the many reasons that I like sphinx). Keeping the search engine in that case is very convenient. Hernan From g.brandl at gmx.net Sun Dec 30 20:45:53 2012 From: g.brandl at gmx.net (Georg Brandl) Date: Sun, 30 Dec 2012 20:45:53 +0100 Subject: [Python-ideas] Order in the documentation search results In-Reply-To: <50E083BA.7000603@nedbatchelder.com> References: <50E083BA.7000603@nedbatchelder.com> Message-ID: On 12/30/2012 07:11 PM, Ned Batchelder wrote: > On 12/30/2012 12:54 PM, Hernan Grecco wrote: >> Hi, >> >> I have seen many people new to Python stumbling while using the Python >> docs due to the order of the search results. >> >> For example, if somebody new to python searches for `tuple`, the >> actual section about `tuple` comes in place 39. What is more confusing >> for people starting with the language is that all the C functions come >> first. I have seen people clicking in PyTupleObject just to be totally >> disoriented. >> >> Maybe `tuple` is a silly example. But if somebody wants to know how >> does `open` behaves and which arguments it takes, the result comes in >> position 16. `property` does not appear in the list at all (but >> built-in appears in position 31). This is true for most builtins. >> >> Experienced people will have no trouble navigating through these >> results, but new users do. It is not terrible and at the end they get >> it, but I think it would be nice to change it to more (new) user >> friendly order. >> >> So my suggestion is to put the builtins first, the rest of the >> standard lib later including HowTos, FAQ, etc and finally the >> c-modules. Additionally, a section with a title matching exactly the >> search query should come first. (I am not sure if the last suggestion >> belongs in python-ideas or in >> the sphinx mailing list, please advice) > > While we're on the topic, why in this day and age do we have a custom > search? Using google site search would be faster for the user, and more > accurate. I agree. Someone needs to propose a patch though. 
cheers, Georg From random832 at fastmail.us Sun Dec 30 22:28:23 2012 From: random832 at fastmail.us (Random832) Date: Sun, 30 Dec 2012 16:28:23 -0500 Subject: [Python-ideas] proposed methods: list.replace / list.indices In-Reply-To: References: <50E04B39.2040508@nedbatchelder.com> Message-ID: <50E0B1F7.1020000@fastmail.us> On 12/30/2012 10:05 AM, Nick Coghlan wrote: > The general problem with adding new methods to types rather than > adding new functions+protocols is that it breaks ducktyping. We can > mitigate that now by adding the new methods to > collections.abc.Sequence, but it remains the case that relying on > these methods being present rather than using the functional > equivalent will needlessly couple your code to the underlying sequence > implementation (since not all sequences inherit from the ABC, some are > just registered). You know what wouldn't break duck typing? Adding an extension-method-like (a la C#) mechanism to ABCs. Of course, the problem with that is, what if a sequence implements a method called replace that does something else? From victor.stinner at gmail.com Sun Dec 30 23:20:34 2012 From: victor.stinner at gmail.com (Victor Stinner) Date: Sun, 30 Dec 2012 23:20:34 +0100 Subject: [Python-ideas] Documenting Python warts on Stack Overflow In-Reply-To: References: Message-ID: 2012/12/26 anatoly techtonik : > I am thinking about [python-wart] on SO. I'm not sure that StackOverflow is the best place for such project. (Note: please avoid abreviation, not all people know this website.) > There is no currently a list of > Python warts, and building a better language is impossible without a clear > visibility of warts in current implementations. Sorry, but what is a wart in Python? > Why Roundup doesn't work ATM. > - warts are lost among other "won't fix" and "works for me" issues When an issue is closed with "won't fix", "works for me", "invalid" or something like this, a comment always explain why. If you don't understand or such comment is missing, you can ask for more information. If you don't agree, the bug tracker is maybe not the right place for such discussion. The python-ideas mailing list is maybe a better place :-) Sometimes, the best thing to do is to propose a patch to enhance the documentation. > - no way to edit description to make it more clear You can add comments, it's almost the same. > - no voting/stars to percieve how important is this issue Votes are a trap. It's not how Python is developed. Python core developers are not paid to work on Python, and so work only on issues which interest them. I don't think that votes would help to fix an issue. If you want an issue to be closed: - ensure that someone else reproduced it: if not, provide more information - help to analyze the issue and track the bug in the code - propose a patch with tests and documentation > - no comment/noise filtering I don't have such problem. Can you give an example of issue which contains many useless comments? > and the most valuable > - there is no query to list warts sorted by popularity to explore other > time-consuming areas of Python you are not aware of, but which can popup one > day Sorry, I don't understand, maybe because I don't know what a wart is. -- If I understood correctly, you would like to list some specific issues like print() not flushing immediatly stdout if you ask to not write a newline (print "a", in Python 2 or print("a", end=" ") in Python 3). If I understood correctly, and if you want to improve Python, you should help the documentation project. 
Or if you can build a website listing such issues *and listing solutions* like calling sys.stdout.flush() or using print(flush=True) (Python 3.3+) for the print issue. A list of such issue without solution doesn't help anyone. Victor From tjreedy at udel.edu Sun Dec 30 23:59:53 2012 From: tjreedy at udel.edu (Terry Reedy) Date: Sun, 30 Dec 2012 17:59:53 -0500 Subject: [Python-ideas] proposed methods: list.replace / list.indices In-Reply-To: <50E04B39.2040508@nedbatchelder.com> References: <50E04B39.2040508@nedbatchelder.com> Message-ID: On 12/30/2012 9:10 AM, Ned Batchelder wrote: > On 12/30/2012 5:46 AM, Terry Reedy wrote: >> On 12/29/2012 10:25 PM, David Kreuter wrote: >> >>> I think it would be nice to have a method in 'list' to replace certain >>> elements by others in-place. Like this: >>> >>> l = [x, a, y, a] >>> l.replace(a, b) >>> assert l == [x, b, y, b] >>> >>> The alternatives are longer than they should be, imo. For example: >>> >>> for i, n in enumerate(l): Note that enumerate is a generic function of iterables, not a specific list method. >>> if n == a: >>> l[i] = b >> >> I dont see anything wrong with this. It is how I would do it in >> python. Wrap it in a function if you want. Or write it on two line ;-). My deeper objection is that 'replace_all_in_place' is a generic mutable collection function, not a specific list or even mutable sequence function. Python 1 was stronger list oriented. Python 3 is mostly iterable oriented, with remnants of the Python 1 heritage. > I wonder at the underlying philosophy of things being accepted or > rejected in this way. For example, here's a thought experiment: if > list.count() and list.index() didn't exist yet, would we accept them as > additions to the list methods? I personally would have deleted list.find in 3.0. Count and index are not list methods but rather sequence methods, part of the sequence ABC. Tuples got them, as their only two public methods, in 3.0 to conform. This ties in to Nick's comment. (Actually counting a particular item in a collection is not specific to sequencess, but having multiple items to count tends to be.) It would be possible for count and index to be functions instead. But their definition as methods goes back to Python 1. Also note that .index has a start parameter, making it useful to get all indexes. See the code below. > By Terry's reasoning, there's no need > to, since I can implement those operations in a few lines of Python. We constantly get proposals to add new functions and methods that are easily written in a few lines. Everyone thinks their proposal is useful because it is useful in their work. If we accepted all such proposals, Python would have hundreds more. > Does that mean they persist only for backwards compatibility? Backwards compatibility is important. Changing them to functions would be disruptive without sufficient gain. > Was their initial inclusion a violation of some "list method philosophy"? No, it was part of the Python 1 philosophy of lists as the common data interchange type. As I said, this has changed in Python 3. > Or is > there a good reason for them to exist, and if so, why shouldn't > .replace() and .indexes() also exist? Neither are list methods. Nicks gave a generic indexes generator. A specific list indexes generator can use repeated applications of .index with start argument. I do that 'inline' below. > I would hate for the main > criterion to be, "these are the methods that existed in Python 2.3," Then you are hating reality ;-). 
The .method()s of basic builtin classes is close to frozen. >> There is a perfectly good python version above that does the necessary >> search and replace as efficiently as possible. Thank you for posting it. > You say "as efficiently as possible," but you mean, "as algorithmically > efficient as possible," which is true, they are linear, which is as good > as it's going to get. But surely if coded in C, these operations would > be faster. You are right. Lets do the next-item search in C with .index. If the density of items to be replaces is low, as it would be for most applications, this should dominate. def enum(lis): for i, n in enumerate(lis): if n == 1: lis[i] = 2 a, b = 100, 10000 # started with 2,1 for initial tests start = a*([1]+b*[0]) after = a*([2]+b*[0]) # test that correct before test speed! # since the list is mutated, it must be reset for each test lis = start.copy() enum(lis) print('enum: ', lis == after) def repin(lis): i = -1 try: while True: i = lis.index(1, i+1) lis[i] = 2 except: pass lis = start.copy() repin(lis) print('repin: ', lis == after) from timeit import timeit # now for speed, remembering to reset for each test # first measure the copy time to subtract from test times print(timeit('lis = start.copy()', 'from __main__ import start', number=10)) print(timeit('lis = start.copy(); enum(lis)', 'from __main__ import start, enum', number=10)) print(timeit('lis = start.copy(); repin(lis)', 'from __main__ import start, repin', number=10)) # measure scan without replace to give an upper limit to python-coded replace # since lis is not mutated, it only needs to be defined once print(timeit('repin(lis)', 'from __main__ import a, b, repin; lis = a*(b+1)*[0]', number=10)) # prints enum: True repin: True 0.06801244890066886 0.849063227602523 0.2759397696510706 0.20790119084727898 After subtracting and dividing, enum take .078 seconds for 100 replacements in 1000000 items, repin just .021, which is essentially the time it takes just to scan 1000000 items. So doing the replacements also in C would not be much faster. Rerunning with 10000 replacements (a,b = 10000, 100), the times are .080 and .024. -- Terry Jan Reedy From tjreedy at udel.edu Mon Dec 31 00:05:17 2012 From: tjreedy at udel.edu (Terry Reedy) Date: Sun, 30 Dec 2012 18:05:17 -0500 Subject: [Python-ideas] Order in the documentation search results In-Reply-To: References: Message-ID: On 12/30/2012 1:11 PM, Ezio Melotti wrote: > On Sun, Dec 30, 2012 at 7:54 PM, Hernan Grecco > I have seen many people new to Python stumbling while using the Python > docs due to the order of the search results. People should use the index, both on and off line. See the issue below > I experimented with this a bit a while ago. See > http://bugs.python.org/issue15871#msg170048. -- Terry Jan Reedy From g.rodola at gmail.com Mon Dec 31 00:38:54 2012 From: g.rodola at gmail.com (=?ISO-8859-1?Q?Giampaolo_Rodol=E0?=) Date: Mon, 31 Dec 2012 00:38:54 +0100 Subject: [Python-ideas] Order in the documentation search results In-Reply-To: References: Message-ID: 2012/12/30 Hernan Grecco > Hi, > > I have seen many people new to Python stumbling while using the Python > docs due to the order of the search results. > > For example, if somebody new to python searches for `tuple`, the > actual section about `tuple` comes in place 39. What is more confusing > for people starting with the language is that all the C functions come > first. I have seen people clicking in PyTupleObject just to be totally > disoriented. > > Maybe `tuple` is a silly example. 
But if somebody wants to know how > does `open` behaves and which arguments it takes, the result comes in > position 16. `property` does not appear in the list at all (but > built-in appears in position 31). This is true for most builtins. > > Experienced people will have no trouble navigating through these > results, but new users do. It is not terrible and at the end they get > it, but I think it would be nice to change it to more (new) user > friendly order. > > So my suggestion is to put the builtins first, the rest of the > standard lib later including HowTos, FAQ, etc and finally the > c-modules. Additionally, a section with a title matching exactly the > search query should come first. (I am not sure if the last suggestion > belongs in python-ideas or in > the sphinx mailing list, please advice) > > Thanks, > > Hernan > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > +1 I agree it's sub-optimal. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ned at nedbatchelder.com Mon Dec 31 00:59:27 2012 From: ned at nedbatchelder.com (Ned Batchelder) Date: Sun, 30 Dec 2012 18:59:27 -0500 Subject: [Python-ideas] proposed methods: list.replace / list.indices In-Reply-To: References: <50E04B39.2040508@nedbatchelder.com> Message-ID: <50E0D55F.7080108@nedbatchelder.com> Thanks, these are very informative answers. --Ned. From ryan at hackery.io Mon Dec 31 01:06:08 2012 From: ryan at hackery.io (Ryan Macy) Date: Sun, 30 Dec 2012 18:06:08 -0600 Subject: [Python-ideas] Documenting Python warts on Stack Overflow In-Reply-To: References: Message-ID: <50E0D6F0.5090200@hackery.io> I'm a young developer (22) that is aspiring to contribute to the python language and I think another perspective could help the conversation. What I believe Anataoly is asking for, albeit phrased in a different manner, is the ability to clearly see what the core issues/needs are in the language. I've been able to discern through time, and the python mailing lists, that packaging, multitasking, and timezone support are areas that could use help. Sure, 'wart' is subjective, but I believe the point made in between the lines is valid. Is there a place that holds the key improvements that the python language needs, so that we can work to it better? If that's the bug tracker, is there a method already in place that signals areas that need improvements or fixes? [I know that there are severity levels, etc :)] FWIW I've joined the python-mentor list, have read most of the devguide, and lurked on the bug tracker; I still feel like there is tons of context that I'm missing, which has me chasing PEPs constantly - So I'm definitely able to resonate the with this thread. I apologize if I'm waay off target. _Ryan > Victor Stinner > December 30, 2012 4:20 PM > 2012/12/26 anatoly techtonik: >> I am thinking about [python-wart] on SO. > > I'm not sure that StackOverflow is the best place for such project. > (Note: please avoid abreviation, not all people know this website.) > >> There is no currently a list of >> Python warts, and building a better language is impossible without a clear >> visibility of warts in current implementations. > > Sorry, but what is a wart in Python? > >> Why Roundup doesn't work ATM. 
>> - warts are lost among other "won't fix" and "works for me" issues > > When an issue is closed with "won't fix", "works for me", "invalid" or > something like this, a comment always explain why. If you don't > understand or such comment is missing, you can ask for more > information. > > If you don't agree, the bug tracker is maybe not the right place for > such discussion. The python-ideas mailing list is maybe a better place > :-) > > Sometimes, the best thing to do is to propose a patch to enhance the > documentation. > >> - no way to edit description to make it more clear > > You can add comments, it's almost the same. > >> - no voting/stars to percieve how important is this issue > > Votes are a trap. It's not how Python is developed. Python core > developers are not paid to work on Python, and so work only on issues > which interest them. > > I don't think that votes would help to fix an issue. > > If you want an issue to be closed: > - ensure that someone else reproduced it: if not, provide more information > - help to analyze the issue and track the bug in the code > - propose a patch with tests and documentation > >> - no comment/noise filtering > > I don't have such problem. Can you give an example of issue which > contains many useless comments? > >> and the most valuable >> - there is no query to list warts sorted by popularity to explore other >> time-consuming areas of Python you are not aware of, but which can popup one >> day > > Sorry, I don't understand, maybe because I don't know what a wart is. > > -- > > If I understood correctly, you would like to list some specific issues > like print() not flushing immediatly stdout if you ask to not write a > newline (print "a", in Python 2 or print("a", end=" ") in Python 3). > If I understood correctly, and if you want to improve Python, you > should help the documentation project. Or if you can build a website > listing such issues *and listing solutions* like calling > sys.stdout.flush() or using print(flush=True) (Python 3.3+) for the > print issue. > > A list of such issue without solution doesn't help anyone. > > Victor > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > anatoly techtonik > December 25, 2012 6:10 PM > I am thinking about [python-wart] on SO. There is no currently a list > of Python warts, and building a better language is impossible without > a clear visibility of warts in current implementations. > > Why Roundup doesn't work ATM. > - warts are lost among other "won't fix" and "works for me" issues > - no way to edit description to make it more clear > - no voting/stars to percieve how important is this issue > - no comment/noise filtering > and the most valuable > - there is no query to list warts sorted by popularity to explore > other time-consuming areas of Python you are not aware of, but which > can popup one day > > SO at least allows: > + voting > + community wiki edits > + useful comment upvoting > + sorted lists > + user editable tags (adding new warts is easy) > > This post is a result of facing with numerous locals/settrace/exec > issues that are closed on tracker. I also have my own list of other > issues (logging/subprocess) at GC project, which I might be unable to > maintain in future. There is also some undocumented stuff (subprocess > deadlocks) that I'm investigating, but don't have time for a write-up. > So I'd rather move this somewhere where it could be updated. 
> -- > anatoly t. > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: postbox-contact.jpg Type: image/jpeg Size: 1240 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: postbox-contact.jpg Type: image/jpeg Size: 1103 bytes Desc: not available URL: From steve at pearwood.info Mon Dec 31 01:12:15 2012 From: steve at pearwood.info (Steven D'Aprano) Date: Mon, 31 Dec 2012 11:12:15 +1100 Subject: [Python-ideas] Order in the documentation search results In-Reply-To: References: Message-ID: <50E0D85F.8070607@pearwood.info> On 31/12/12 04:54, Hernan Grecco wrote: > Hi, > > I have seen many people new to Python stumbling while using the Python > docs due to the order of the search results. [...] > Experienced people will have no trouble navigating through these > results, but new users do. It is not terrible and at the end they get > it, but I think it would be nice to change it to more (new) user > friendly order. I'm an experienced person, and I have trouble navigating through the search results. I usually use Google or DuckDuckGo to search, and avoid the website's search functionality altogether. -- Steven From cs at zip.com.au Mon Dec 31 01:22:16 2012 From: cs at zip.com.au (Cameron Simpson) Date: Mon, 31 Dec 2012 11:22:16 +1100 Subject: [Python-ideas] Order in the documentation search results In-Reply-To: References: Message-ID: <20121231002215.GA28101@cskk.homeip.net> On 30Dec2012 18:05, Terry Reedy wrote: | On 12/30/2012 1:11 PM, Ezio Melotti wrote: | > On Sun, Dec 30, 2012 at 7:54 PM, Hernan Grecco | > I have seen many people new to Python stumbling while using the Python | > docs due to the order of the search results. | | People should use the index, both on and off line. See the issue below Personally, I do. But even that is misleading, or at any rate often not so useful. And since there is a search, its quality should be addressed. IMO the index has similar issues to the search, though on a much smaller scale. You'll see here I'm only offering criticism, no fixes. Cheers, -- Cameron Simpson 'Soup: This is the one that Kawasaki sent out pictures, that looks so beautiful. Yanagawa: Yes, everybody says it's beautiful - but many problems! 'Soup: But you are not part of the design team, you're just a test rider. Yanagawa: Yes. I just complain. - _Akira Yanagawa Sounds Off_ @ www.amasuperbike.com From ryan at hackery.io Mon Dec 31 01:15:58 2012 From: ryan at hackery.io (Ryan Macy) Date: Sun, 30 Dec 2012 18:15:58 -0600 Subject: [Python-ideas] Order in the documentation search results In-Reply-To: References: Message-ID: <50E0D93E.7010904@hackery.io> > Giampaolo Rodol? > December 30, 2012 5:38 PM > > > +1 > I agree it's sub-optimal. > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > Hernan Grecco > December 30, 2012 11:54 AM > Hi, > > I have seen many people new to Python stumbling while using the Python > docs due to the order of the search results. > > For example, if somebody new to python searches for `tuple`, the > actual section about `tuple` comes in place 39. 
What is more confusing > for people starting with the language is that all the C functions come > first. I have seen people clicking in PyTupleObject just to be totally > disoriented. > > Maybe `tuple` is a silly example. But if somebody wants to know how > does `open` behaves and which arguments it takes, the result comes in > position 16. `property` does not appear in the list at all (but > built-in appears in position 31). This is true for most builtins. > > Experienced people will have no trouble navigating through these > results, but new users do. It is not terrible and at the end they get > it, but I think it would be nice to change it to more (new) user > friendly order. > > So my suggestion is to put the builtins first, the rest of the > standard lib later including HowTos, FAQ, etc and finally the > c-modules. Additionally, a section with a title matching exactly the > search query should come first. (I am not sure if the last suggestion > belongs in python-ideas or in > the sphinx mailing list, please advice) > > Thanks, > > Hernan > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas +1 as well, dash has come in handy! -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: postbox-contact.jpg Type: image/jpeg Size: 1283 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: postbox-contact.jpg Type: image/jpeg Size: 1128 bytes Desc: not available URL: From solipsis at pitrou.net Mon Dec 31 01:48:21 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 31 Dec 2012 01:48:21 +0100 Subject: [Python-ideas] Documenting Python warts on Stack Overflow References: <50E0D6F0.5090200@hackery.io> Message-ID: <20121231014821.2fb21be8@pitrou.net> On Sun, 30 Dec 2012 18:06:08 -0600 Ryan Macy wrote: > I'm a young developer (22) that is aspiring to contribute to the python > language and I think another perspective could help the conversation. > What I believe Anataoly is asking for, albeit phrased in a different > manner, is the ability to clearly see what the core issues/needs are in > the language. I've been able to discern through time, and the python > mailing lists, that packaging, multitasking, and timezone support are > areas that could use help. Sure, 'wart' is subjective, but I believe the > point made in between the lines is valid. I'm not sure Anatoly is talking about things that have to be improved, rather than things which are lacking (in his opinion, or in the general opinion) and which nevertheless won't be fixed for various reasons. These things would have a place in the FAQ, if Anatoly wants to contribute documentation patches: http://docs.python.org/dev/faq/index.html Regards Antoine. From tjreedy at udel.edu Mon Dec 31 02:17:14 2012 From: tjreedy at udel.edu (Terry Reedy) Date: Sun, 30 Dec 2012 20:17:14 -0500 Subject: [Python-ideas] Documenting Python warts on Stack Overflow In-Reply-To: References: Message-ID: I consider Anatoly's post to be off-topic, obnoxious, and best ignored. On 12/30/2012 5:20 PM, Victor Stinner wrote: I am only responding because Eli and then Victor responded, largely repeating things that have been said before (and ignored) on many of the same issues. > 2012/12/26 anatoly techtonik : >> I am thinking about [python-wart] on SO. 
The purpose of python-ideas is to discuss possible ideas for improving future versions of Python and the reference CPython implementation, including its included documentation. Announcements of independent personal activities are off topic. Announcements of thoughts about such activities are, to me, even more so. I have lots of thoughts about things I *might* do, and I am sure many others do too. Should we all post them here? I think not. I am actually working on, not just thinking about, a book that showcases many of the positive features of Python. But I do not think that an announcement post here is particularly on-topic. As for 'obnoxious', this is not just a post about thoughts, but of thoughts to abuse another forum to trash python, and a trashy justification for doing so. >> There is no currently a list of >> Python warts, and building a better language is impossible without a clear >> visibility of warts in current implementations. There is, of course, a tracker with, at the moment,3771 open issues. That is already too many. Repeatly regurgitating closed issue is an obnoxious distraction. > Sorry, but what is a wart in Python? A Python behavior that Anatoly does not like and that the CPython developers cannot, will not*, or have not yet# changed. By extension, our disliked-by-him actions are also warts. This ego-centric view is more of 'obnoxious'. * Perhaps because we consider the whole community, not just one person. # Perhaps because of ignorance or lack of interest. Berating us for not doing something that he will also not do (write a patch) is more of 'obnoxious'. >> Why Roundup doesn't work ATM. >> - warts are lost among other "won't fix" and "works for me" issues One can easily search the tracker for closed issues with any particular resolution. One can even limit the search for such issue with 'techtonik' on the nosy list. Results: 'rejected' 17 'invalid' 17 'won't fix' 10 'works for me' 17 The numbers are smaller if 'techtonik' is entered instead in the creator box. This is the core list of issues Anatoly would consider 'lost warts'. They are not lost, just not prominently displayed to the world in the way he would like. Spreading disinformation is more of 'obnoxious'. >> - no way to edit description to make it more clear There is no description field. The title of an issue and other descriptive headers can be edited and often are. There is an audit trail of changes. The description of a issue can be and sometimes is re-stated by the original author or others in successive messages. As a matter of audit trail policy, messages cannot be edited. They can be deleted from an issue (and that fact noted, and by who) but not (normally, anyway) from the database. So, more disinformation. Calling a disagreement over policy a 'wart' is disengenous. >> - no voting/stars to percieve how important is this issue Proposed and rejected before. Again: the devs don't do what Anatoly wants, its a wart. -- Terry Jan Reedy From steve at pearwood.info Mon Dec 31 02:39:18 2012 From: steve at pearwood.info (Steven D'Aprano) Date: Mon, 31 Dec 2012 12:39:18 +1100 Subject: [Python-ideas] proposed methods: list.replace / list.indices In-Reply-To: <50E04B39.2040508@nedbatchelder.com> References: <50E04B39.2040508@nedbatchelder.com> Message-ID: <50E0ECC6.6060004@pearwood.info> On 31/12/12 01:10, Ned Batchelder wrote: > What is the organizing principle for the methods list (or any other >built-in data structure) should have? 
I would hate for the main >criterion to be, "these are the methods that existed in Python 2.3," > for example. Why is .count() in and .replace() out? I fear that it is more likely to be "they existed in Python 1.5". As far as I can tell, there have been very few new methods added to standard types since Python 1.5, and possibly before that. Putting aside dunder methods, the only public list methods in 3.3 that weren't in 1.5 are clear and copy. Tuples also have two new methods, count and index. Dicts have seen a few more changes: - has_key is gone; - fromkeys, pop, popitem, and setdefault are added. So changes to builtin types have been very conservative. -- Steven From steve at pearwood.info Mon Dec 31 02:40:02 2012 From: steve at pearwood.info (Steven D'Aprano) Date: Mon, 31 Dec 2012 12:40:02 +1100 Subject: [Python-ideas] proposed methods: list.replace / list.indices In-Reply-To: <50E0729E.1040208@mrabarnett.plus.com> References: <50E04B39.2040508@nedbatchelder.com> <50E0651F.6000305@nedbatchelder.com> <50E0729E.1040208@mrabarnett.plus.com> Message-ID: <50E0ECF2.7050107@pearwood.info> On 31/12/12 03:58, MRAB wrote: > On 2012-12-30 16:00, Ned Batchelder wrote: >> I don't understand the conflict? .replace() from sequence does >> precisely the same thing as .replace() from bytes if you limit the >> arguments to single-byte values. It seems perfectly natural to me. I >> must be missing something. >> > [snip] > The difference is that for bytes and str it returns the result (they > are immutable after all), but the suggested addition would mutate the > list in-place. In order to be consistent it would have to return the > result instead. Are you seriously suggesting that because str has a replace method with a specific API, no other type can have a replace method unless it has the same API? Why must list.replace and str.replace do exactly the same thing? Lists and strings are not the same, and you cannot in general expect to substitute lists with strings, or vice versa. collections.abc.MutableSequence would seem to me to be the right place for a mutator replace method. -- Steven From victor.stinner at gmail.com Mon Dec 31 03:00:08 2012 From: victor.stinner at gmail.com (Victor Stinner) Date: Mon, 31 Dec 2012 03:00:08 +0100 Subject: [Python-ideas] Dynamic code NOPing In-Reply-To: References: <5DFDB30C-9A3D-4939-81D7-F34727284148@stranden.com> <563458C3-9580-46AE-B343-6987116A3F08@stranden.com> Message-ID: If you mark constant.DEBUG as constant and compile your project with astoptimizer, enable_debug has no effect (if it was compiled with DEBUG=False). So only use it if DEBUG will not be changed at runtime. It cannot be used if your users might run your applucation in debug mode. To compare it to the C language, DEBUG would be a #define and astoptimizer can be see as a preprocessor. Victor Le 30 d?c. 2012 11:59, "Stefan Behnel" a ?crit : > Victor Stinner, 30.12.2012 11:42: > > My astoptimizer provides tools to really *remove* debug at compilation, > so > > the overhead of the debug code is just null. > > > > You can for example declare your variable project.config.DEBUG as > constant > > with the value 0, where project.config is a module. So the if statement > in > > "from project.config import DEBUG ... if DEBUG: ..." will be removed. > > How would you know at compile time that it can be removed? How do you > handle the example below? 
> > Stefan > > > ## constants.py > > DEBUG = False > > > ## enable_debug.py > > import constants > constants.DEBUG = True > > > ## test.py > > import enable_debug > from constants import DEBUG > > if DEBUG: > print("DEBUGGING !") > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Mon Dec 31 03:48:10 2012 From: guido at python.org (Guido van Rossum) Date: Sun, 30 Dec 2012 19:48:10 -0700 Subject: [Python-ideas] proposed methods: list.replace / list.indices In-Reply-To: <50E0ECF2.7050107@pearwood.info> References: <50E04B39.2040508@nedbatchelder.com> <50E0651F.6000305@nedbatchelder.com> <50E0729E.1040208@mrabarnett.plus.com> <50E0ECF2.7050107@pearwood.info> Message-ID: I would be very conservative here, since they are both builtin types, both sequences, and the reader may use the methods used as a hint about the type (a form of type inference if you will). The use case for list.replace() seems weak and we should beware of making standard interfaces too "thick" lest implementing alternative versions become too burdensome. --Guido On Sunday, December 30, 2012, Steven D'Aprano wrote: > On 31/12/12 03:58, MRAB wrote: > >> On 2012-12-30 16:00, Ned Batchelder wrote: >> > > I don't understand the conflict? .replace() from sequence does >>> precisely the same thing as .replace() from bytes if you limit the >>> arguments to single-byte values. It seems perfectly natural to me. I >>> must be missing something. >>> >>> [snip] >> The difference is that for bytes and str it returns the result (they >> are immutable after all), but the suggested addition would mutate the >> list in-place. In order to be consistent it would have to return the >> result instead. >> > > Are you seriously suggesting that because str has a replace method with > a specific API, no other type can have a replace method unless it has > the same API? > > Why must list.replace and str.replace do exactly the same thing? Lists > and strings are not the same, and you cannot in general expect to > substitute lists with strings, or vice versa. > > collections.abc.**MutableSequence would seem to me to be the right place > for a mutator replace method. > > > -- > Steven > ______________________________**_________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/**mailman/listinfo/python-ideas > -- --Guido van Rossum (on iPad) -------------- next part -------------- An HTML attachment was scrubbed... URL: From stephen at xemacs.org Mon Dec 31 04:10:57 2012 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Mon, 31 Dec 2012 12:10:57 +0900 Subject: [Python-ideas] Documenting Python warts on Stack Overflow In-Reply-To: <50E0D6F0.5090200@hackery.io> References: <50E0D6F0.5090200@hackery.io> Message-ID: <87fw2mor0e.fsf@uwakimon.sk.tsukuba.ac.jp> Ryan Macy writes: > I'm a young developer (22) that is aspiring to contribute to the python > language and I think another perspective could help the conversation. > What I believe Anataoly is asking for, albeit phrased in a different > manner, is the ability to clearly see what the core issues/needs are in > the language. Good luck on that. Just as you write, it is in fact a human ability, not a collection of facts that can be published. AFAICS, the core issues are what block core developers from getting applied work done. 
(Or cause them to stumble in process, for that matter.) The reason for this, based on introspection and watching a few Nobel prizewinners work, is that people working at that level have an uncanny ability to *ask* the right questions, and do so recursively. Of course they're usually really fast and accurate at answering them, too, but answering smallish research questions is an upperclass undergrad student[1] skill. The knack for filtering out inessential questions and zeroing in on the bottleneck is what makes them great. The flip side, of course, is that because a core developer is blocked, he or she is working on it. So maybe you won't get a chance to make a big contribution there -- it will be solved by the time you figure out what to do. ;-) > Is there a place that holds the key improvements that the python > language needs, so that we can work to it better? If that's the bug > tracker, Bingo! > is there a method already in place that signals areas that need > improvements or fixes? [I know that there are severity levels, etc :)] The problem is that "need" is mostly subjective. In Python there are several objectifiable criteria, encoded in the venerable Zen of Python, and more recently in the thread answering Ned Batchelder's question on what makes a good change to the stdlib. But if you look at them, I suspect that you'll come to the same conclusion that I do: need is defined by what at least some programmers often want to do and are likely to do imperfectly, even if they do it repeatedly. That's "need", and it's dynamic, only imperfectly correlated with the state of the language. The only reliable measure of need is what somebody is willing to provide a high-quality patch for. Just Do It! :-) > [There is] context that I'm missing, which has me chasing PEPs > constantly Well, when you catch one, take it out to lunch. Spend some time in conversation with it. Figure out what the person who wrote it was thinking, and why. :-) > - So I'm definitely able to resonate the with this thread. > > I apologize if I'm waay off target. Not at all. I just don't think there's a royal road to core contribution. The flip side of that is that as far as defining "need" goes, what you perceive as important is no less important than what Guido does. It's just that he has a proven knack for picking questions that others value too, and for giving answers that untangle the language, as well as solving a practical problem. But that doesn't mean you should work on what Guido thinks is important just because he thinks it's important. If you resonate with the need he feels, then you will find ways to contribute to resolving it. I haven't seen the word "channel" around here recently, but trying to channel the core developers on problems you encounter is a good way to get started. Try to anticipate what they'll say if you post (or in response to somebody else's post that interests you). When conversing with a PEP, try to figure out what it's going to propose as the solution before you read it. Try to figure out what problems it will need to solve to achieve its goal, etc. Steve From stephen at xemacs.org Mon Dec 31 04:42:09 2012 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Mon, 31 Dec 2012 12:42:09 +0900 Subject: [Python-ideas] Documenting Python warts on Stack Overflow In-Reply-To: References: Message-ID: <87ehi6opke.fsf@uwakimon.sk.tsukuba.ac.jp> Terry Reedy writes: > I consider Anatoly's post to be off-topic, obnoxious, and best ignored. 
> > On 12/30/2012 5:20 PM, Victor Stinner wrote: > > I am only responding because Eli and then Victor responded, largely > repeating things that have been said before (and ignored) on many of the > same issues. +1 > > 2012/12/26 anatoly techtonik : > >> I am thinking about [python-wart] on SO. > > The purpose of python-ideas is to discuss possible ideas for improving > future versions of Python and the reference CPython implementation, > including its included documentation. I think it would be fair to s/included//. See the doc site search engine thread, which nobody (including you) seems to think off-topic. > Announcements of independent personal activities are off topic. Not at all. Announcing a PyPI project and requesting testing for potential stdlib inclusion, for example. Doesn't fit exactly, but what's the preferred venue? > Announcements of thoughts about such activities are, to me, even more > so. It's the lack of any pre-posting filter whatsoever, combined with a lack of patches, that leads me to ignore Anatoly. This is more of the same. Nevertheless, a desire for a list of "important unsolved problems" is common (cf Ryan's post). > > Sorry, but what is a wart in Python? > > A Python behavior that Anatoly does not like and that the CPython > developers cannot, will not*, or have not yet# changed. By extension, > our disliked-by-him actions are also warts. That is apparently Anatoly's operational definition, yes. However, it's easy to define conceptually. A wart in Python is an un-Pythonic functionality, or an un-Pythonic implementation of functionality. The print statement was a wart. It was an interesting idea, like syntactic indentation. The former didn't work for Python, the latter did and still does.[1] That makes it clear to me why Anatoly's proposal is perverse. The word "Pythonic" itself cannot be defined by stars on a Roundup issue or user posts to StackOverflow. Ultimately it's defined by Guido, I suppose, but by now many developers have been shown to have an excellent sense, sufficient to get Guido to change his mind on occasion. It is not, however, a matter for democratic decision. The word "wart" itself is useful, when used by those know what "Pythonic" means. It's a warning: you will break your teeth if you just try to bite it off. So it's not very useful in guiding the work of new developers, because the bar is high and the benefits small. Most warts in Python 3 (and there are far fewer than Anatoly seems to think) will have to wait for Python 4, absent solutions of true genius. Footnotes: [1] As a way of tweaking the nose of paren-lovers, if nothing else. From ncoghlan at gmail.com Mon Dec 31 04:47:02 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 31 Dec 2012 13:47:02 +1000 Subject: [Python-ideas] proposed methods: list.replace / list.indices In-Reply-To: <50E0ECF2.7050107@pearwood.info> References: <50E04B39.2040508@nedbatchelder.com> <50E0651F.6000305@nedbatchelder.com> <50E0729E.1040208@mrabarnett.plus.com> <50E0ECF2.7050107@pearwood.info> Message-ID: The problem is bytearray, not bytes and str. bytearray is a builtin mutable sequence with a non-destructive replace() method. It doesn't matter that this is almost certainly just a mistake due to its immutable bytes heritage, the presence of that method is enough to categorically rule out the idea of adding a destructive replace() method to mutable sequences in general. 
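For illustration (a small interactive sketch, nothing more): bytearray is a mutable sequence, yet its existing replace() follows the bytes/str convention and returns a new object rather than mutating in place, which is exactly the clash described above.

    >>> ba = bytearray(b"spam spam")
    >>> ba.replace(b"spam", b"eggs")
    bytearray(b'eggs eggs')
    >>> ba                             # the original is left untouched
    bytearray(b'spam spam')
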
-- Sent from my phone, thus the relative brevity :) On Dec 31, 2012 11:41 AM, "Steven D'Aprano" wrote: > On 31/12/12 03:58, MRAB wrote: > >> On 2012-12-30 16:00, Ned Batchelder wrote: >> > > I don't understand the conflict? .replace() from sequence does >>> precisely the same thing as .replace() from bytes if you limit the >>> arguments to single-byte values. It seems perfectly natural to me. I >>> must be missing something. >>> >>> [snip] >> The difference is that for bytes and str it returns the result (they >> are immutable after all), but the suggested addition would mutate the >> list in-place. In order to be consistent it would have to return the >> result instead. >> > > Are you seriously suggesting that because str has a replace method with > a specific API, no other type can have a replace method unless it has > the same API? > > Why must list.replace and str.replace do exactly the same thing? Lists > and strings are not the same, and you cannot in general expect to > substitute lists with strings, or vice versa. > > collections.abc.**MutableSequence would seem to me to be the right place > for a mutator replace method. > > > -- > Steven > ______________________________**_________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/**mailman/listinfo/python-ideas > -------------- next part -------------- An HTML attachment was scrubbed... URL: From random832 at fastmail.us Mon Dec 31 06:38:57 2012 From: random832 at fastmail.us (Random832) Date: Mon, 31 Dec 2012 00:38:57 -0500 Subject: [Python-ideas] proposed methods: list.replace / list.indices In-Reply-To: References: <50E04B39.2040508@nedbatchelder.com> <50E0651F.6000305@nedbatchelder.com> <50E0729E.1040208@mrabarnett.plus.com> <50E0ECF2.7050107@pearwood.info> Message-ID: <50E124F1.8000406@fastmail.us> On 12/30/2012 10:47 PM, Nick Coghlan wrote: > > The problem is bytearray, not bytes and str. > > bytearray is a builtin mutable sequence with a non-destructive > replace() method. It doesn't matter that this is almost certainly just > a mistake due to its immutable bytes heritage, the presence of that > method is enough to categorically rule out the idea of adding a > destructive replace() method to mutable sequences in general. > All this discussion is, of course, before getting into the fact that string, bytes, and bytearray .replace() methods all work on subsequences rather than elements. From dkreuter at gmail.com Mon Dec 31 07:17:32 2012 From: dkreuter at gmail.com (David Kreuter) Date: Mon, 31 Dec 2012 07:17:32 +0100 Subject: [Python-ideas] proposed methods: list.replace / list.indices In-Reply-To: <50E0729E.1040208@mrabarnett.plus.com> References: <50E04B39.2040508@nedbatchelder.com> <50E0651F.6000305@nedbatchelder.com> <50E0729E.1040208@mrabarnett.plus.com> Message-ID: On Sun, Dec 30, 2012 at 5:58 PM, MRAB wrote: > On 2012-12-30 16:00, Ned Batchelder wrote: >> >> I don't understand the conflict? .replace() from sequence does >> precisely the same thing as .replace() from bytes if you limit the >> arguments to single-byte values. It seems perfectly natural to me. I >> must be missing something. >> >> [snip] > The difference is that for bytes and str it returns the result (they > are immutable after all), but the suggested addition would mutate the > list in-place. In order to be consistent it would have to return the > result instead. I don't think that consistency between str and list is desirable. 
If .index for example were consistent in str and list it would look like this: [9, 8, 7, 6, 5].index([8,7]) # = 1 Also, reversed, sorted (copy) list.reverse, list.sort (in-place) >From that perspective list.replace working in-place *is* consistent. However, I can see that this '.replace' might cause more confusion than future code clarity. What about .indices though? -------------- next part -------------- An HTML attachment was scrubbed... URL: From ethan at stoneleaf.us Mon Dec 31 16:00:41 2012 From: ethan at stoneleaf.us (Ethan Furman) Date: Mon, 31 Dec 2012 07:00:41 -0800 Subject: [Python-ideas] Documenting Python warts on Stack Overflow In-Reply-To: References: Message-ID: <50E1A899.3000602@stoneleaf.us> Victor Stinner wrote: > A list of such issue without solution doesn't help anyone. I disagree: knowledge of a problem is beneficial even when a workaround is not known. ~Ethan~ From jstpierre at mecheye.net Mon Dec 31 08:31:48 2012 From: jstpierre at mecheye.net (Jasper St. Pierre) Date: Mon, 31 Dec 2012 02:31:48 -0500 Subject: [Python-ideas] Documenting Python warts on Stack Overflow In-Reply-To: References: Message-ID: We already have a collection of "warts" or "gotchas": http://docs.python.org/3/faq/design.html#why-must-dictionary-keys-be-immutable http://docs.python.org/3/faq/design.html#why-doesn-t-list-sort-return-the-sorted-list http://docs.python.org/3/faq/design.html#why-are-default-values-shared-between-objects http://docs.python.org/3/faq/design.html#why-can-t-raw-strings-r-strings-end-with-a-backslash and so on. Note that that document is probably extremely out of date, but there is an existing place for them. On Tue, Dec 25, 2012 at 7:10 PM, anatoly techtonik wrote: > I am thinking about [python-wart] on SO. There is no currently a list of > Python warts, and building a better language is impossible without a clear > visibility of warts in current implementations. > > Why Roundup doesn't work ATM. > - warts are lost among other "won't fix" and "works for me" issues > - no way to edit description to make it more clear > - no voting/stars to percieve how important is this issue > - no comment/noise filtering > and the most valuable > - there is no query to list warts sorted by popularity to explore other > time-consuming areas of Python you are not aware of, but which can popup > one day > > SO at least allows: > + voting > + community wiki edits > + useful comment upvoting > + sorted lists > + user editable tags (adding new warts is easy) > > This post is a result of facing with numerous locals/settrace/exec issues > that are closed on tracker. I also have my own list of other issues > (logging/subprocess) at GC project, which I might be unable to maintain in > future. There is also some undocumented stuff (subprocess deadlocks) that > I'm investigating, but don't have time for a write-up. So I'd rather move > this somewhere where it could be updated. > -- > anatoly t. > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > > -- Jasper -------------- next part -------------- An HTML attachment was scrubbed... URL: From pyideas at rebertia.com Mon Dec 31 08:56:04 2012 From: pyideas at rebertia.com (Chris Rebert) Date: Sun, 30 Dec 2012 23:56:04 -0800 Subject: [Python-ideas] Documenting Python warts on Stack Overflow In-Reply-To: References: Message-ID: > On Tue, Dec 25, 2012 at 7:10 PM, anatoly techtonik > wrote: >> >> I am thinking about [python-wart] on SO. 
From ethan at stoneleaf.us  Mon Dec 31 16:00:41 2012
From: ethan at stoneleaf.us (Ethan Furman)
Date: Mon, 31 Dec 2012 07:00:41 -0800
Subject: [Python-ideas] Documenting Python warts on Stack Overflow
In-Reply-To: 
References: 
Message-ID: <50E1A899.3000602@stoneleaf.us>

Victor Stinner wrote:
> A list of such issues without solutions doesn't help anyone.

I disagree: knowledge of a problem is beneficial even when a workaround
is not known.

~Ethan~

From jstpierre at mecheye.net  Mon Dec 31 08:31:48 2012
From: jstpierre at mecheye.net (Jasper St. Pierre)
Date: Mon, 31 Dec 2012 02:31:48 -0500
Subject: [Python-ideas] Documenting Python warts on Stack Overflow
In-Reply-To: 
References: 
Message-ID: 

We already have a collection of "warts" or "gotchas":

http://docs.python.org/3/faq/design.html#why-must-dictionary-keys-be-immutable
http://docs.python.org/3/faq/design.html#why-doesn-t-list-sort-return-the-sorted-list
http://docs.python.org/3/faq/design.html#why-are-default-values-shared-between-objects
http://docs.python.org/3/faq/design.html#why-can-t-raw-strings-r-strings-end-with-a-backslash

and so on. Note that that document is probably extremely out of date,
but there is an existing place for them.

On Tue, Dec 25, 2012 at 7:10 PM, anatoly techtonik wrote:
> I am thinking about [python-wart] on SO. There is currently no list of
> Python warts, and building a better language is impossible without a clear
> visibility of warts in current implementations.
>
> Why Roundup doesn't work ATM:
> - warts are lost among other "won't fix" and "works for me" issues
> - no way to edit the description to make it clearer
> - no voting/stars to perceive how important an issue is
> - no comment/noise filtering
> and the most valuable
> - there is no query to list warts sorted by popularity to explore other
> time-consuming areas of Python you are not aware of, but which can pop up
> one day
>
> SO at least allows:
> + voting
> + community wiki edits
> + useful comment upvoting
> + sorted lists
> + user-editable tags (adding new warts is easy)
>
> This post is a result of facing numerous locals/settrace/exec issues
> that are closed on the tracker. I also have my own list of other issues
> (logging/subprocess) at the GC project, which I might be unable to maintain
> in future. There is also some undocumented stuff (subprocess deadlocks)
> that I'm investigating, but don't have time for a write-up. So I'd rather
> move this somewhere where it could be updated.
> --
> anatoly t.
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas
>

--
Jasper

From pyideas at rebertia.com  Mon Dec 31 08:56:04 2012
From: pyideas at rebertia.com (Chris Rebert)
Date: Sun, 30 Dec 2012 23:56:04 -0800
Subject: [Python-ideas] Documenting Python warts on Stack Overflow
In-Reply-To: 
References: 
Message-ID: 

> On Tue, Dec 25, 2012 at 7:10 PM, anatoly techtonik
> wrote:
>> I am thinking about [python-wart] on SO. There is currently no list of
>> Python warts, and building a better language is impossible without a clear
>> visibility of warts in current implementations.
>>
>> Why Roundup doesn't work ATM:
>> - warts are lost among other "won't fix" and "works for me" issues
>> - no way to edit the description to make it clearer
>> - no voting/stars to perceive how important an issue is
>> - no comment/noise filtering
>> and the most valuable
>> - there is no query to list warts sorted by popularity to explore other
>> time-consuming areas of Python you are not aware of, but which can pop up
>> one day
>>
>> SO at least allows:
>> + voting
>> + community wiki edits
>> + useful comment upvoting
>> + sorted lists
>> + user-editable tags (adding new warts is easy)
>>
>> This post is a result of facing numerous locals/settrace/exec issues
>> that are closed on the tracker. I also have my own list of other issues
>> (logging/subprocess) at the GC project, which I might be unable to maintain
>> in future. There is also some undocumented stuff (subprocess deadlocks)
>> that I'm investigating, but don't have time for a write-up. So I'd rather
>> move this somewhere where it could be updated.

On Sun, Dec 30, 2012 at 11:31 PM, Jasper St. Pierre wrote:
> We already have a collection of "warts" or "gotchas":
>
> http://docs.python.org/3/faq/design.html#why-must-dictionary-keys-be-immutable
> http://docs.python.org/3/faq/design.html#why-doesn-t-list-sort-return-the-sorted-list
> http://docs.python.org/3/faq/design.html#why-are-default-values-shared-between-objects
> http://docs.python.org/3/faq/design.html#why-can-t-raw-strings-r-strings-end-with-a-backslash
>
> and so on. Note that that document is probably extremely out of date,
> but there is an existing place for them.

When much older Python 2.x-s were still in their heyday, there were
some popular 3rd-party lists:
http://lwn.net/Articles/43059/
http://zephyrfalcon.org/labs/python_pitfalls.html
http://www.ferg.org/projects/python_gotchas.html
(FWICT, Andrew Kuchling's article led to the "warts" terminology.)

Cheers,
Chris
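Of the FAQ entries Jasper links above, the one about default values
being shared between calls is probably the most frequently rediscovered
in practice; a minimal illustration of that particular gotcha and the
usual idiom for avoiding it:

    def append_item(item, bucket=[]):   # the default list is created once, at def time
        bucket.append(item)
        return bucket

    >>> append_item(1)
    [1]
    >>> append_item(2)                  # same list object as the first call
    [1, 2]

    def append_item_fixed(item, bucket=None):
        if bucket is None:              # create a fresh list on every call instead
            bucket = []
        bucket.append(item)
        return bucket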
From stefan at drees.name  Mon Dec 31 08:47:11 2012
From: stefan at drees.name (Stefan Drees)
Date: Mon, 31 Dec 2012 08:47:11 +0100
Subject: [Python-ideas] Order in the documentation search results
In-Reply-To: 
References: <50E083BA.7000603@nedbatchelder.com>
Message-ID: <50E142FF.3070101@drees.name>

On 30.12.12 20:45, Georg Brandl wrote:
> On 12/30/2012 07:11 PM, Ned Batchelder wrote:
>> On 12/30/2012 12:54 PM, Hernan Grecco wrote:
>>> ...
>>> I have seen many people new to Python stumbling while using the Python
>>> docs due to the order of the search results.
>>> ...
>>> So my suggestion is to put the builtins first, the rest of the
>>> standard lib later including HowTos, FAQ, etc and finally the
>>> c-modules. Additionally, a section with a title matching exactly the
>>> search query should come first. (I am not sure if the last suggestion
>>> belongs in python-ideas or in the sphinx mailing list, please advise)
>>
>> While we're on the topic, why in this day and age do we have a custom
>> search? Using google site search would be faster for the user, and more
>> accurate.
>
> I agree. Someone needs to propose a patch though.
> ...

A custom search in itself is a wonderful thing. To me it also shows
more appreciation of visitor concerns than those sites that are just
_offering_ google site search (which is accessible anyway to every
visitor capable of memorizing the google or bing or whatnot URL).

I second Hernan's suggestion about ordering, and also his question
about where the request (and patches) should be directed.

All the best,
Stefan.

From solipsis at pitrou.net  Mon Dec 31 12:52:06 2012
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Mon, 31 Dec 2012 12:52:06 +0100
Subject: [Python-ideas] proposed methods: list.replace / list.indices
References: <50E04B39.2040508@nedbatchelder.com>
	<50E0651F.6000305@nedbatchelder.com>
	<50E0729E.1040208@mrabarnett.plus.com>
Message-ID: <20121231125206.64b17fce@pitrou.net>

On Mon, 31 Dec 2012 07:17:32 +0100
David Kreuter wrote:
> I don't think that consistency between str and list is desirable. If
> .index for example were consistent in str and list it would look like
> this:
>
>     [9, 8, 7, 6, 5].index([8, 7])  # = 1
>
> Also,
>
>     reversed, sorted          (copy)
>     list.reverse, list.sort   (in-place)
>
> From that perspective list.replace working in-place *is* consistent.
>
> However, I can see that this '.replace' might cause more confusion than
> future code clarity.

Another name could be found if necessary.

> What about .indices though?

I've never needed it myself. The fact that it's O(n) seems to hint that
a list is not the right data structure for the use cases you may be
thinking about :)

Regards

Antoine.

From maxmoroz at gmail.com  Mon Dec 31 23:16:55 2012
From: maxmoroz at gmail.com (Max Moroz)
Date: Mon, 31 Dec 2012 14:16:55 -0800
Subject: [Python-ideas] Preventing out of memory conditions
Message-ID: 

Sometimes, I have the flexibility to reduce the memory used by my
program (e.g., by destroying large cached objects, etc.). It would be
great if I could ask the Python interpreter to notify me when memory is
running out, so I can take such actions.

Of course, it's nearly impossible for Python to know in advance if the
OS would run out of memory with the next malloc call. Furthermore,
Python shouldn't guess which memory (physical, virtual, etc.) is
relevant in the particular situation (for instance, in my case, I only
care about physical memory, since swapping to disk makes my application
as good as frozen). So the problem as stated above is unsolvable.

But let's say I am willing to do some work to estimate the maximum
amount of memory my application can be allowed to use. If I provide
that number to the Python interpreter, it may be possible for it to
notify me when the next memory allocation would exceed this limit by
calling a function I provide (hopefully passing as arguments the amount
of memory being requested, as well as the amount currently in use). My
callback function could then destroy some objects, and return True to
indicate that some objects were destroyed. At that point, the
interpreter could run its standard garbage collection routines to
release the memory that corresponded to those objects - before
proceeding with whatever it was trying to do originally. (If I returned
False, or if I didn't provide a callback function at all, the
interpreter would simply behave as it does today.)

Any memory allocations that happen while the callback function itself
is executing would not trigger further calls to it. The whole mechanism
would be disabled for the rest of the session if the memory freed by
the callback function was insufficient to prevent going over the memory
limit.

Would this be worth considering for a future language extension? How
hard would it be to implement?

Max
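One way the hook Max describes might look from user code. Everything
below is hypothetical -- the callback signature and the registration
call are sketched from the proposal above, and no such API exists in
CPython today:

    # Hypothetical sketch of the proposed low-memory callback; not a real API.

    PHYSICAL_LIMIT = 2 * 1024**3   # the application's own estimate: about 2 GiB

    caches = []                    # large, droppable objects the application registers

    def on_low_memory(requested, in_use):
        # Per the proposal, the interpreter would call this just before an
        # allocation pushes usage past PHYSICAL_LIMIT.  Returning True means
        # "references were dropped, run a GC pass and retry"; False means
        # "nothing to free, behave as today".
        if not caches:
            return False
        caches.clear()
        return True

    # Registration as proposed (hypothetical name, for illustration only):
    # set_memory_limit_callback(PHYSICAL_LIMIT, on_low_memory)

The closest approximation available today is to poll the process's
resident set size at convenient points (for example with the
third-party psutil module, psutil.Process().memory_info().rss) and shed
caches when it crosses a threshold -- but polling cannot intercept an
individual allocation the way the proposal asks for.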
From phd at phdru.name  Mon Dec 31 01:00:12 2012
From: phd at phdru.name (Oleg Broytman)
Date: Mon, 31 Dec 2012 04:00:12 +0400
Subject: [Python-ideas] Documenting Python warts on Stack Overflow
In-Reply-To: 
References: 
Message-ID: <20121231000012.GA10426@iskra.aviel.ru>

Hello and happy New Year!

On Sun, Dec 30, 2012 at 11:20:34PM +0100, Victor Stinner wrote:
> If I understood correctly, you would like to list some specific issues
> like print() not flushing stdout immediately if you ask it not to write
> a newline (print "a", in Python 2 or print("a", end=" ") in Python 3).
> If I understood correctly, and if you want to improve Python, you
> should help the documentation project. Or you can build a website
> listing such issues *and listing solutions*, like calling
> sys.stdout.flush() or using print(flush=True) (Python 3.3+) for the
> print issue.
>
> A list of such issues without solutions doesn't help anyone.

I cannot speak for Anatoly, but for me warts are:
-- things that don't exist where they should (but the core team objects,
   or they are hard to implement, or something);
-- things that exist where they shouldn't; they are hard to fix because
   removing them would break backward compatibility;
-- things that are implemented in strange, inconsistent ways.

A few examples:
-- things that don't exist in the language where they should:
   anonymous code blocks (multiline lambdas); case (switch) statements;
   do/until loops;
-- things that exist in the language where they shouldn't:
   else clause in 'for' loops (documentation doesn't help);
-- things that don't exist in the stdlib where they should:
   asynchronous network libs (ftp/http/etc); GUI toolkit wrappers (GTK
   and/or Qt); SQL DB API drivers; SSL (key/certificate generation and
   parsing); restricted execution (remember rexec and Bastion?);
-- things that exist in the stdlib where they shouldn't:
   tkinter (Tk is the rarest GUI toolkit in use), turtle; smtpd.py (it's
   a program, not a library);
-- things that are implemented in strange, inconsistent ways:
   limited expression syntax in decorators (only attr access and calls);
   heapq (not object-oriented).

Oleg.
--
Oleg Broytman            http://phdru.name/            phd at phdru.name
Programmers don't die, they just GOSUB without RETURN.
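For reference, the print()/flush behaviour Victor describes in the
message quoted above is easy to demonstrate, and both of the
workarounds he mentions are standard:

    import sys
    import time

    print("progress:", end=" ")   # no newline: on a line-buffered terminal this may not appear yet
    sys.stdout.flush()            # workaround 1: flush explicitly
    time.sleep(2)
    print("done", flush=True)     # workaround 2: flush keyword argument (Python 3.3+)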