From casevh at gmail.com Mon Oct 1 00:08:07 2012 From: casevh at gmail.com (Case Van Horsen) Date: Sun, 30 Sep 2012 15:08:07 -0700 Subject: [Python-ideas] Deprecate the round builtin In-Reply-To: References: <50635956.4050409@egenix.com> <20120926212127.GA9680@iskra.aviel.ru> <50642918.1060804@canterbury.ac.nz> Message-ID: On Sun, Sep 30, 2012 at 2:51 PM, Joshua Landau wrote: > On 30 September 2012 22:48, Joshua Landau > wrote: >> >> This seems like a problem for the proposal, though: we can't have it in >> the math library if it's a method! > > > Now I think about it: yeah, it can be. We just coerce to float/decimal > first. *sigh* math.ceil(x), math.floor(x), and math.trunc(x) and round(x) already call the special methods x.__ceil__, x.__floor__, x.__round__, and x.__trunc__. So those four functions already work with decimal instances (and other numeric types that support those methods.) casevh > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > From joshua.landau.ws at gmail.com Mon Oct 1 00:19:33 2012 From: joshua.landau.ws at gmail.com (Joshua Landau) Date: Sun, 30 Sep 2012 23:19:33 +0100 Subject: [Python-ideas] Deprecate the round builtin In-Reply-To: References: <50635956.4050409@egenix.com> <20120926212127.GA9680@iskra.aviel.ru> <50642918.1060804@canterbury.ac.nz> Message-ID: On 30 September 2012 23:08, Case Van Horsen wrote: > On Sun, Sep 30, 2012 at 2:51 PM, Joshua Landau > wrote: > > On 30 September 2012 22:48, Joshua Landau > > wrote: > >> > >> This seems like a problem for the proposal, though: we can't have it in > >> the math library if it's a method! > > > > > > Now I think about it: yeah, it can be. We just coerce to float/decimal > > first. *sigh* > math.ceil(x), math.floor(x), and math.trunc(x) and round(x) already > call the special methods x.__ceil__, x.__floor__, x.__round__, and > x.__trunc__. So those four functions already work with decimal > instances (and other numeric types that support those methods.) > >>> math.ceil("") > Traceback (most recent call last): > File "", line 1, in > TypeError: *a float is required* How deceptive... I hope you forgive me for not realizing that (even though I must have seen the __ceil__ and __floor__ methods a thousand times). OK, carry on. -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Mon Oct 1 03:44:36 2012 From: steve at pearwood.info (Steven D'Aprano) Date: Mon, 1 Oct 2012 11:44:36 +1000 Subject: [Python-ideas] Deprecate the round builtin In-Reply-To: References: <50635956.4050409@egenix.com> <20120926212127.GA9680@iskra.aviel.ru> <50642918.1060804@canterbury.ac.nz> Message-ID: <20121001014435.GC8499@ando> On Sun, Sep 30, 2012 at 02:38:33PM -0700, Gregory P. Smith wrote: > Why suggest adding new round-like functions to the math module rather than > defining a new round method on all numerical objects? round already calls the special __round__ method, and in 3.2 works with ints, floats, Decimals and Fractions. Only complex misses out. 
py> round(12345, -2) 12300 py> from decimal import Decimal as D py> round(D("1.2345"), 2) Decimal('1.23') py> from fractions import Fraction as F py> round(F(12345, 10000), 2) Fraction(123, 100) -- Steven From tjreedy at udel.edu Mon Oct 1 05:17:55 2012 From: tjreedy at udel.edu (Terry Reedy) Date: Sun, 30 Sep 2012 23:17:55 -0400 Subject: [Python-ideas] Deprecate the round builtin In-Reply-To: References: <50635956.4050409@egenix.com> <20120926212127.GA9680@iskra.aviel.ru> <50642918.1060804@canterbury.ac.nz> Message-ID: On 9/30/2012 6:19 PM, Joshua Landau wrote: > >>> math.ceil("") > Traceback (most recent call last): > File "", line 1, in > TypeError: *a float is required* > > > How deceptive... I hope you forgive me for not realizing that (even > though I must have seen the __ceil__ and __floor__ methods a thousand > times). > OK, carry on. The obsolete error message should be fixed. A number is required. Or perhaps 'float or number with __ceil__ method'. -- Terry Jan Reedy From dreamingforward at gmail.com Mon Oct 1 06:46:05 2012 From: dreamingforward at gmail.com (Mark Adam) Date: Sun, 30 Sep 2012 23:46:05 -0500 Subject: [Python-ideas] Deprecate the round builtin In-Reply-To: <50642918.1060804@canterbury.ac.nz> References: <50635956.4050409@egenix.com> <20120926212127.GA9680@iskra.aviel.ru> <50642918.1060804@canterbury.ac.nz> Message-ID: On Thu, Sep 27, 2012 at 5:23 AM, Greg Ewing wrote: > Presumably they would be implemented as module objects, > created automatically at interpreter startup instead of > being loaded from a file. > > In which case "built-in module" might be a better term > for them. And their names should start with lower case. That's cool. YES, lowercase. > Also you wouldn't need new syntax to get names out of > them, just the existing import machinery: > > from numbers import * Well, to me there must be a clear partitioning. The stuff in the builtin [module] sets the tone for the whole interpreter environment (and I think python culture itself). If one were to use the standard import language (like in your example), it confuses one "semantically" -- because you're suggesting to treat a it (i.e. a whole class of "things") as something optional. Does that make sense? Thanks, markj From steve at pearwood.info Mon Oct 1 08:05:51 2012 From: steve at pearwood.info (Steven D'Aprano) Date: Mon, 1 Oct 2012 16:05:51 +1000 Subject: [Python-ideas] Namespaces and modules [was Deprecate the round builtin] In-Reply-To: References: <50635956.4050409@egenix.com> <20120926212127.GA9680@iskra.aviel.ru> <50642918.1060804@canterbury.ac.nz> Message-ID: <20121001060551.GA9193@ando> On Sun, Sep 30, 2012 at 11:46:05PM -0500, Mark Adam wrote: > On Thu, Sep 27, 2012 at 5:23 AM, Greg Ewing wrote: > > Presumably they would be implemented as module objects, > > created automatically at interpreter startup instead of > > being loaded from a file. > > > > In which case "built-in module" might be a better term > > for them. And their names should start with lower case. > > That's cool. YES, lowercase. I'm not sure why "built-in module" is a better term for something which I gather is a separate namespace within a module, so you can have: module.spam # global namespace module.sub.spam # sub is a "submodule" or "namespace" but sub has no independent existence as a file on disk. If that's what we're discussing, I don't think that "built-in module" is a good name, since it isn't *built-in*. 
We already have something called "built-in modules" -- modules like sys which actually are built-in to the Python virtual machine. > > Also you wouldn't need new syntax to get names out of > > them, just the existing import machinery: > > > > from numbers import * > > Well, to me there must be a clear partitioning. > > The stuff in the builtin [module] sets the tone for the whole > interpreter environment (and I think python culture itself). If one > were to use the standard import language (like in your example), it > confuses one "semantically" -- because you're suggesting to treat a it > (i.e. a whole class of "things") as something optional. > > Does that make sense? Not to me, I'm afraid. -- Steven From stefan_ml at behnel.de Mon Oct 1 09:22:50 2012 From: stefan_ml at behnel.de (Stefan Behnel) Date: Mon, 01 Oct 2012 09:22:50 +0200 Subject: [Python-ideas] make decimal the default non-integer instead of float? In-Reply-To: References: Message-ID: Serhiy Storchaka, 30.09.2012 18:35: > Instructive story about fractions: > http://python-history.blogspot.com/2009/03/problem-with-integer-division.html Sorry - I don't get it. Instructive in what way? Stefan From solipsis at pitrou.net Mon Oct 1 13:57:46 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 1 Oct 2012 13:57:46 +0200 Subject: [Python-ideas] Deprecate the round builtin References: <50635956.4050409@egenix.com> Message-ID: <20121001135746.5f0a9acf@pitrou.net> On Wed, 26 Sep 2012 17:21:40 -0400 Daniel Holth wrote: > Normally deprecation means you keep it forever but don't mention it much in > the docs... Not really. Most deprecated things disappear one or two versions after they are deprecated. We only keep something forever when removing it would break a lot of code and keeping it is cheap. Regards Antoine. -- Software development and contracting: http://pro.pitrou.net From jimjjewett at gmail.com Mon Oct 1 17:43:04 2012 From: jimjjewett at gmail.com (Jim Jewett) Date: Mon, 1 Oct 2012 11:43:04 -0400 Subject: [Python-ideas] Visually confusable unicode characters in identifiers In-Reply-To: <506860DA.7060905@pearwood.info> References: <506860DA.7060905@pearwood.info> Message-ID: On 9/30/12, Steven D'Aprano wrote: > On 01/10/12 00:00, Oscar Benjamin wrote: > py> A = 42 > py> Α = 23 > py> A == Α > False It will never be possible to catch all confusables, which is one reason that the unicode property stalled. It seems like it would be reasonable to at least warn when identifiers are not all in the same script -- but real-world examples from Emacs Lisp made it clear that this is often intentional. There were still clear word-boundaries, but it wasn't clear how that word-boundary detection could be properly automated in the general case. > Besides, just because you and I can't distinguish A from Α in my editor, > using one particular choice of font, doesn't mean that the author or his > intended audience (Greek programmers perhaps?) can't distinguish them, In many cases, it does -- for the letters to look different requires an unnatural font choice, though perhaps not so extreme as the print-the-hex-code font. > I would welcome "confusable detection" in the standard library, possibly a > string method "skeleton" or some other interface to the Confusables file, > perhaps in unicodedata. I would too, and agree that it shouldn't be limited to identifiers.
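To make the idea concrete, here is a rough sketch of the kind of "skeleton" check Steven describes. The mapping below is a tiny hand-written sample, *not* the real Confusables data, and the helper names are made up for illustration only:

    CONFUSABLES = {
        "\u0391": "A",  # GREEK CAPITAL LETTER ALPHA looks like LATIN A
        "\u0410": "A",  # CYRILLIC CAPITAL LETTER A looks like LATIN A
        "\u03bf": "o",  # GREEK SMALL LETTER OMICRON looks like LATIN o
    }

    def skeleton(s):
        # Map each character to its visual prototype; characters with no
        # known confusable mapping pass through unchanged.
        return "".join(CONFUSABLES.get(c, c) for c in s)

    def confusable(s1, s2):
        # Two *different* strings that share a skeleton could be
        # mistaken for one another on screen.
        return s1 != s2 and skeleton(s1) == skeleton(s2)

With something along those lines available (ideally driven by the real Confusables data), a linter could warn whenever two identifiers in the same module have colliding skeletons, e.g. confusable("A", "\u0391") is True.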
-jJ From grosser.meister.morti at gmx.net Mon Oct 1 18:07:19 2012 From: grosser.meister.morti at gmx.net (=?UTF-8?B?TWF0aGlhcyBQYW56ZW5iw7Zjaw==?=) Date: Mon, 01 Oct 2012 18:07:19 +0200 Subject: [Python-ideas] Visually confusable unicode characters in identifiers In-Reply-To: References: <506860DA.7060905@pearwood.info> Message-ID: <5069BFB7.7070207@gmx.net> I still don't understand why unicode characters are allowed at all in identifier names. Is the reason for this written down somewhere? On 10/01/2012 05:43 PM, Jim Jewett wrote: > On 9/30/12, Steven D'Aprano wrote: >> On 01/10/12 00:00, Oscar Benjamin wrote: > >> py> A = 42 >> py> ? = 23 >> py> A == ? >> False > > It will never be possible to catch all confusables, which is one > reason that the unicode property stalled. > > It seems like it would be reasonable to at least warn when identifiers > are not all in the same script -- but real-world examples from Emacs > Lisp made it clear that this is often intentional. There were still > clear word-boundaries, but it wasn't clear how that word-boundary > detection could be properly automated in the general case. > >> Besides, just because you and I can't distinguish A from ? in my editor, >> using one particular choice of font, doesn't mean that the author or his >> intended audience (Greek programmers perhaps?) can't distinguish them, > > In many cases, it does -- for the letters to look different requires > an unnatural font choice, though perhaps not so extreme as the > print-the-hex-code font. > >> I would welcome "confusable detection" in the standard library, possibly a >> string method "skeleton" or some other interface to the Confusables file, >> perhaps in unicodedata. > > I would too, and agree that it shouldn't be limited to identifiers. > > -jJ From dreamingforward at gmail.com Mon Oct 1 18:12:46 2012 From: dreamingforward at gmail.com (Mark Adam) Date: Mon, 1 Oct 2012 11:12:46 -0500 Subject: [Python-ideas] Namespaces and modules [was Deprecate the round builtin] In-Reply-To: <20121001060551.GA9193@ando> References: <50635956.4050409@egenix.com> <20120926212127.GA9680@iskra.aviel.ru> <50642918.1060804@canterbury.ac.nz> <20121001060551.GA9193@ando> Message-ID: On Mon, Oct 1, 2012 at 1:05 AM, Steven D'Aprano wrote: > I'm not sure why "built-in module" is a better term for something which > I gather is a separate namespace within a module, so you can have: Yeah, I'm not really sure it makes sense to call it a module at all. I was sort of capitulating about the use of the word "module". It's not like you can do "import __builtins__" in the interpreter, so if one is going to call it a module (like the interpreter currently does), one should see that it is a very special exception of the word. I prefer "namespace", it's the built-in namespace which is a synonym for "the global module". >> Well, to me there must be a clear partitioning. >> >> The stuff in the builtin [module] sets the tone for the whole >> interpreter environment (and I think python culture itself). If one >> were to use the standard import language (like in your example), it >> confuses one "semantically" -- because you're suggesting to treat a it >> (i.e. a whole class of "things") as something optional. >> >> Does that make sense? > > Not to me, I'm afraid. Hopefully the above makes it a little clearer. 
But, it's as if you're going on a road trip, you want to travel efficient and light -- what you include in your backpack ("interpreter environment") is your "builtin" and everything else you'll "buy"/import on the road. Modules are those things on the road. mark From rosuav at gmail.com Mon Oct 1 18:19:40 2012 From: rosuav at gmail.com (Chris Angelico) Date: Tue, 2 Oct 2012 02:19:40 +1000 Subject: [Python-ideas] Visually confusable unicode characters in identifiers In-Reply-To: <5069BFB7.7070207@gmx.net> References: <506860DA.7060905@pearwood.info> <5069BFB7.7070207@gmx.net> Message-ID: On Tue, Oct 2, 2012 at 2:07 AM, Mathias Panzenb?ck wrote: > I still don't understand why unicode characters are allowed at all in > identifier names. Is the reason for this written down somewhere? Same reason you're allowed more than two letters in your identifiers: to allow programmers to make variable names meaningful. The problem isn't with Unicode, anyway; there are plenty of fonts in which l and 1 are practically identical, and unless your font is monospaced, you probably will have trouble distinguishing __________rn___ from __________m___ (just how many underscores IS that?). It's up to the programmer to be smart about his names. ChrisA From robert.kern at gmail.com Mon Oct 1 18:43:40 2012 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 01 Oct 2012 17:43:40 +0100 Subject: [Python-ideas] Visually confusable unicode characters in identifiers In-Reply-To: <5069BFB7.7070207@gmx.net> References: <506860DA.7060905@pearwood.info> <5069BFB7.7070207@gmx.net> Message-ID: On 10/1/12 5:07 PM, Mathias Panzenb?ck wrote: > I still don't understand why unicode characters are allowed at all in identifier > names. Is the reason for this written down somewhere? http://www.python.org/dev/peps/pep-3131/#rationale -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From grosser.meister.morti at gmx.net Mon Oct 1 19:02:07 2012 From: grosser.meister.morti at gmx.net (=?UTF-8?B?TWF0aGlhcyBQYW56ZW5iw7Zjaw==?=) Date: Mon, 01 Oct 2012 19:02:07 +0200 Subject: [Python-ideas] Visually confusable unicode characters in identifiers In-Reply-To: References: <506860DA.7060905@pearwood.info> <5069BFB7.7070207@gmx.net> Message-ID: <5069CC8F.2070605@gmx.net> On 10/01/2012 06:43 PM, Robert Kern wrote: > On 10/1/12 5:07 PM, Mathias Panzenb?ck wrote: >> I still don't understand why unicode characters are allowed at all in identifier >> names. Is the reason for this written down somewhere? > > http://www.python.org/dev/peps/pep-3131/#rationale > But the Python keywords and more importantly the documentation is English. Don't you need to be able to speak/write English in order to code Python anyway? And if you keep you code+comments English you can access a much larger developer pool (all developers who speak English should by my hypothesis be a superset of all developers who speak a certain language). From massimo.dipierro at gmail.com Mon Oct 1 19:18:31 2012 From: massimo.dipierro at gmail.com (Massimo DiPierro) Date: Mon, 1 Oct 2012 12:18:31 -0500 Subject: [Python-ideas] Visually confusable unicode characters in identifiers In-Reply-To: References: <506860DA.7060905@pearwood.info> <5069BFB7.7070207@gmx.net> Message-ID: <99143AA3-CCC2-49DD-903B-6F9FD161263D@gmail.com> The great thing about open source is that is has brought the world together. 
I am not an english speaker and I learned the meaning of IF, THEN, FOR, WHILE, not in the context of the English language, but as keywords of the Basic programming language. The fact that they are english words has is accidental. The great thing about code is (used to be) that anybody can read and understand what others write. When I used program in Italy, I had to deal with latin-1 characters. This was never a problem. Not even in Cobol, Basic, Clipper, or Paradox because data should be separated from code. Data may contain latin-1 or unicode or whatever. Code always contains ASCII and if one does not mix the two there is never a problem. Allowing unicode in variable names blurs this separation. It makes code written people speaking one language unreadable by people speaking a different language. I should point out that most of my students are Chinese. They do not have any problem with reading and writing code using the english alphabet. Any one of us could design better power plugs for our homes. That does not mean it would be a good idea to do so. Massimo On Oct 1, 2012, at 11:43 AM, Robert Kern wrote: > On 10/1/12 5:07 PM, Mathias Panzenb?ck wrote: >> I still don't understand why unicode characters are allowed at all in identifier >> names. Is the reason for this written down somewhere? > > http://www.python.org/dev/peps/pep-3131/#rationale > > -- > Robert Kern > > "I have come to believe that the whole world is an enigma, a harmless enigma > that is made terrible by our own mad attempt to interpret it as though it had > an underlying truth." > -- Umberto Eco > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas From guido at python.org Mon Oct 1 19:44:42 2012 From: guido at python.org (Guido van Rossum) Date: Mon, 1 Oct 2012 10:44:42 -0700 Subject: [Python-ideas] Visually confusable unicode characters in identifiers In-Reply-To: <5069CC8F.2070605@gmx.net> References: <506860DA.7060905@pearwood.info> <5069BFB7.7070207@gmx.net> <5069CC8F.2070605@gmx.net> Message-ID: On Mon, Oct 1, 2012 at 10:02 AM, Mathias Panzenb?ck wrote: > On 10/01/2012 06:43 PM, Robert Kern wrote: >> >> On 10/1/12 5:07 PM, Mathias Panzenb?ck wrote: >>> >>> I still don't understand why unicode characters are allowed at all in >>> identifier >>> names. Is the reason for this written down somewhere? >> >> >> http://www.python.org/dev/peps/pep-3131/#rationale >> > > But the Python keywords and more importantly the documentation is English. > Don't you need to be able to speak/write English in order to code Python > anyway? And if you keep you code+comments English you can access a much > larger developer pool (all developers who speak English should by my > hypothesis be a superset of all developers who speak a certain language). Hi Matthias, Your objections go pretty much exactly along the lines of my original resistance to this proposal (which was proposed many times before it got to be a PEP). What finally made me change my mind was talking to educators who were teaching Python in countries where not only English is not the primary language, the primary language is not even related to English. (E.g. Chinese or Japanese.) Teaching the students the necessary language keywords and standard library names is not that difficult; even if English *is* your primary language you have to learn what they mean in the context of programming. 
(Example: "print" comes from a very ancient mode of using computers where the only form of output was through a physical printer.) But these students often have a very limited English vocabulary, and their science and math classes (which are often useful starting points for programming exercises) are usually taught in the native language. So when teachers show students example programs it helps if they can name e.g. their variables and functions in the native language. Comments are also often written in the native language. Here, it really helps if the students can type their native language directly rather than having to use the Latin transcription (even if they often also have to learn the latter, for unrelated pragmatic reasons). From your name and email it sounds like your native language might be German. Like me, you probably take pride in your English skills and like me, you write all your code using English for identifiers and comments. However, for students just learning to program and not yet well-versed in English, that would be like trying to teach them multiple things at once. It may work for the smartest students, but it probably would be unnecessarily off-putting for many others. As an example in German, I found a Python book aimed at middle- and high-schoolers written in German, Python für Kids. You can look inside it on the Amazon website: http://www.amazon.com/Python-f%C3%BCr-Kids/dp/3826609514#reader_3826609514 -- the examples use German words for most module and variable names. Luckily German limited to ASCII is still fairly readable ("fuer" instead of "für" etc.), so Unicode is not strictly needed for this case -- but you can understand that in languages whose native alphabet is not English, Unicode is essential for the same style of introduction. I'm sure there are also examples beyond education -- e.g. in a program for calculating dutch taxes I would use the dutch names for the various technical terms naming concepts in dutch tax law, and again, in the case of the Dutch language that doesn't require Unicode, but for many other languages it would. I hope this helps. (Also note, as the PEP states explicitly, that the Python standard library should use only ASCII and English for identifiers and comments, except in those unittests that are specifically testing the Unicode identifiers feature.) -- --Guido van Rossum (python.org/~guido) From guido at python.org Mon Oct 1 19:51:54 2012 From: guido at python.org (Guido van Rossum) Date: Mon, 1 Oct 2012 10:51:54 -0700 Subject: [Python-ideas] Visually confusable unicode characters in identifiers In-Reply-To: <99143AA3-CCC2-49DD-903B-6F9FD161263D@gmail.com> References: <506860DA.7060905@pearwood.info> <5069BFB7.7070207@gmx.net> <99143AA3-CCC2-49DD-903B-6F9FD161263D@gmail.com> Message-ID: On Mon, Oct 1, 2012 at 10:18 AM, Massimo DiPierro wrote: > The great thing about open source is that is has brought the world together. I am not an english speaker and I learned the meaning of IF, THEN, FOR, WHILE, not in the context of the English language, but as keywords of the Basic programming language. The fact that they are english words has is accidental. The great thing about code is (used to be) that anybody can read and understand what others write. > > When I used program in Italy, I had to deal with latin-1 characters. This was never a problem. Not even in Cobol, Basic, Clipper, or Paradox because data should be separated from code. Data may contain latin-1 or unicode or whatever.
Code always contains ASCII and if one does not mix the two there is never a problem. > > Allowing unicode in variable names blurs this separation. It makes code written people speaking one language unreadable by people speaking a different language. > > I should point out that most of my students are Chinese. They do not have any problem with reading and writing code using the english alphabet. > > Any one of us could design better power plugs for our homes. That does not mean it would be a good idea to do so. Our posts crossed. I hope my explanation makes sense to you. The age / grade level of students probably matters; all classes in middle or high school are typically taught in the native language, but in University more and more courses are taught in English (some European countries are even making English the mandatory teaching language at the University level). Not everything you design is meant to be a better power plug for the world. Sometimes you just need to find a way to fit *your* oven in *your* cabinet, and cutting up some planks in a way that wouldn't work for anyone else is fine. -- --Guido van Rossum (python.org/~guido) From g.brandl at gmx.net Mon Oct 1 19:48:44 2012 From: g.brandl at gmx.net (Georg Brandl) Date: Mon, 01 Oct 2012 19:48:44 +0200 Subject: [Python-ideas] Visually confusable unicode characters in identifiers In-Reply-To: <5069CC8F.2070605@gmx.net> References: <506860DA.7060905@pearwood.info> <5069BFB7.7070207@gmx.net> <5069CC8F.2070605@gmx.net> Message-ID: On 10/01/2012 07:02 PM, Mathias Panzenböck wrote: > On 10/01/2012 06:43 PM, Robert Kern wrote: >> On 10/1/12 5:07 PM, Mathias Panzenböck wrote: >>> I still don't understand why unicode characters are allowed at all in identifier >>> names. Is the reason for this written down somewhere? >> >> http://www.python.org/dev/peps/pep-3131/#rationale >> > > But the Python keywords and more importantly the documentation is English. Don't you need to be able > to speak/write English in order to code Python anyway? And if you keep you code+comments English you > can access a much larger developer pool (all developers who speak English should by my hypothesis be > a superset of all developers who speak a certain language). Please; the PEP has been discussed quite a lot when it was proposed, and believe me, yours is not an unfamiliar argument :) You're about 5 years late. Georg From solipsis at pitrou.net Mon Oct 1 20:04:06 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 1 Oct 2012 20:04:06 +0200 Subject: [Python-ideas] Visually confusable unicode characters in identifiers References: <506860DA.7060905@pearwood.info> <5069BFB7.7070207@gmx.net> <5069CC8F.2070605@gmx.net> Message-ID: <20121001200406.5228060a@pitrou.net> On Mon, 1 Oct 2012 10:44:42 -0700 Guido van Rossum wrote: > > As an example in German, I found a Python book aimed at middle- and > high-schoolers written in German, Python für Kids. You can look inside > it on the Amazon website: > http://www.amazon.com/Python-f%C3%BCr-Kids/dp/3826609514#reader_3826609514 Oh but why isn't it named Python für Kinder? :-) Regards Antoine.
-- Software development and contracting: http://pro.pitrou.net From guido at python.org Mon Oct 1 20:10:32 2012 From: guido at python.org (Guido van Rossum) Date: Mon, 1 Oct 2012 11:10:32 -0700 Subject: [Python-ideas] Visually confusable unicode characters in identifiers In-Reply-To: <20121001200406.5228060a@pitrou.net> References: <506860DA.7060905@pearwood.info> <5069BFB7.7070207@gmx.net> <5069CC8F.2070605@gmx.net> <20121001200406.5228060a@pitrou.net> Message-ID: On Mon, Oct 1, 2012 at 11:04 AM, Antoine Pitrou wrote: > On Mon, 1 Oct 2012 10:44:42 -0700 > Guido van Rossum wrote: >> >> As an example in German, I found a Python book aimed at middle- and >> high-schoolers written in German, Python für Kids. You can look inside >> it on the Amazon website: >> http://www.amazon.com/Python-f%C3%BCr-Kids/dp/3826609514#reader_3826609514 > > Oh but why isn't it named Python für Kinder? :-) Probably to be "cool" for the "kids". Why is a mobile phone in Germany called a "Handy" ? -- --Guido van Rossum (python.org/~guido) From jkbbwr at gmail.com Mon Oct 1 20:12:41 2012 From: jkbbwr at gmail.com (Jakob Bowyer) Date: Mon, 1 Oct 2012 19:12:41 +0100 Subject: [Python-ideas] Visually confusable unicode characters in identifiers In-Reply-To: References: <506860DA.7060905@pearwood.info> <5069BFB7.7070207@gmx.net> <5069CC8F.2070605@gmx.net> <20121001200406.5228060a@pitrou.net> Message-ID: Because it fits in your hand? And it's handy? :) On Mon, Oct 1, 2012 at 7:10 PM, Guido van Rossum wrote: > On Mon, Oct 1, 2012 at 11:04 AM, Antoine Pitrou wrote: >> On Mon, 1 Oct 2012 10:44:42 -0700 >> Guido van Rossum wrote: >>> >>> As an example in German, I found a Python book aimed at middle- and >>> high-schoolers written in German, Python für Kids. You can look inside >>> it on the Amazon website: >>> http://www.amazon.com/Python-f%C3%BCr-Kids/dp/3826609514#reader_3826609514 >> >> Oh but why isn't it named Python für Kinder? :-) > > Probably to be "cool" for the "kids". Why is a mobile phone in Germany > called a "Handy" ? > > -- > --Guido van Rossum (python.org/~guido) > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas From tjreedy at udel.edu Mon Oct 1 20:21:05 2012 From: tjreedy at udel.edu (Terry Reedy) Date: Mon, 01 Oct 2012 14:21:05 -0400 Subject: [Python-ideas] Visually confusable unicode characters in identifiers In-Reply-To: <5069CC8F.2070605@gmx.net> References: <506860DA.7060905@pearwood.info> <5069BFB7.7070207@gmx.net> <5069CC8F.2070605@gmx.net> Message-ID: On 10/1/2012 1:02 PM, Mathias Panzenböck wrote: > On 10/01/2012 06:43 PM, Robert Kern wrote: >> On 10/1/12 5:07 PM, Mathias Panzenböck wrote: >>> I still don't understand why unicode characters are allowed at all in >>> identifier >>> names. Is the reason for this written down somewhere? >> >> http://www.python.org/dev/peps/pep-3131/#rationale I have the impression that latin-1 chars were/are (unofficially) accepted in Python2. > But the Python keywords and more importantly the documentation is > English. I know of at least one translation http://docs.python.org.ar/tutorial/contenido.html though keeping up with changes is obviously a problem. There are multiple books in multiple languages. When I went to a bookstore in Japan, the programming languages sections had about 8 for Python. I suspect that is more than most equivalent US bookstores.
-- Terry Jan Reedy From ncoghlan at gmail.com Mon Oct 1 20:35:47 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 2 Oct 2012 00:05:47 +0530 Subject: [Python-ideas] Namespaces and modules [was Deprecate the round builtin] In-Reply-To: References: <50635956.4050409@egenix.com> <20120926212127.GA9680@iskra.aviel.ru> <50642918.1060804@canterbury.ac.nz> <20121001060551.GA9193@ando> Message-ID: On Mon, Oct 1, 2012 at 9:42 PM, Mark Adam wrote: > On Mon, Oct 1, 2012 at 1:05 AM, Steven D'Aprano wrote: >> I'm not sure why "built-in module" is a better term for something which >> I gather is a separate namespace within a module, so you can have: > > Yeah, I'm not really sure it makes sense to call it a module at all. > I was sort of capitulating about the use of the word "module". It's > not like you can do "import __builtins__" in the interpreter, so if > one is going to call it a module (like the interpreter currently > does), one should see that it is a very special exception of the word. "import __builtin__" in Python 2, "import builtins" in Python 3. The contents of those modules are implicitly made available to all Python code running in that process. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From massimo.dipierro at gmail.com Mon Oct 1 21:29:46 2012 From: massimo.dipierro at gmail.com (Massimo DiPierro) Date: Mon, 1 Oct 2012 14:29:46 -0500 Subject: [Python-ideas] Visually confusable unicode characters in identifiers In-Reply-To: References: <506860DA.7060905@pearwood.info> <5069BFB7.7070207@gmx.net> <99143AA3-CCC2-49DD-903B-6F9FD161263D@gmail.com> Message-ID: <75904331-419D-4C8B-9019-EC81908B9098@gmail.com> Hello Guido, it does make sense. The only point I tried to make is that, because something is allowed, it does mean it should be encouraged. I am sure there are instructors who want to teach to code using Japanese of Chinese variable names. Python gives them a way to do so. Yet, if they do so, they would be isolating their students and their code from the rest of the world. Massimo On Oct 1, 2012, at 12:51 PM, Guido van Rossum wrote: > On Mon, Oct 1, 2012 at 10:18 AM, Massimo DiPierro > wrote: >> The great thing about open source is that is has brought the world together. I am not an english speaker and I learned the meaning of IF, THEN, FOR, WHILE, not in the context of the English language, but as keywords of the Basic programming language. The fact that they are english words has is accidental. The great thing about code is (used to be) that anybody can read and understand what others write. >> >> When I used program in Italy, I had to deal with latin-1 characters. This was never a problem. Not even in Cobol, Basic, Clipper, or Paradox because data should be separated from code. Data may contain latin-1 or unicode or whatever. Code always contains ASCII and if one does not mix the two there is never a problem. >> >> Allowing unicode in variable names blurs this separation. It makes code written people speaking one language unreadable by people speaking a different language. >> >> I should point out that most of my students are Chinese. They do not have any problem with reading and writing code using the english alphabet. >> >> Any one of us could design better power plugs for our homes. That does not mean it would be a good idea to do so. > > Our posts crossed. I hope my explanation makes sense to you. 
The age / > grade level of students probably matters; all classes in middle or > high school are typically taught in the native language, but in > University more and more courses are taught in English (some European > countries are even making English the mandatory teachkng language at > the University level). > > Not everything you design is meant to be a better power plug for the > world. Sometimes you just need to find a way to fit *your* oven in > *your* cabinet, and cutting up some planks in a way that wouldn't work > for anyone else is fine. > > -- > --Guido van Rossum (python.org/~guido) From grosser.meister.morti at gmx.net Mon Oct 1 21:33:13 2012 From: grosser.meister.morti at gmx.net (=?UTF-8?B?TWF0aGlhcyBQYW56ZW5iw7Zjaw==?=) Date: Mon, 01 Oct 2012 21:33:13 +0200 Subject: [Python-ideas] Visually confusable unicode characters in identifiers In-Reply-To: References: <506860DA.7060905@pearwood.info> <5069BFB7.7070207@gmx.net> <5069CC8F.2070605@gmx.net> Message-ID: <5069EFF9.50706@gmx.net> On 10/01/2012 07:48 PM, Georg Brandl wrote: > On 10/01/2012 07:02 PM, Mathias Panzenb?ck wrote: >> On 10/01/2012 06:43 PM, Robert Kern wrote: >>> On 10/1/12 5:07 PM, Mathias Panzenb?ck wrote: >>>> I still don't understand why unicode characters are allowed at all in identifier >>>> names. Is the reason for this written down somewhere? >>> >>> http://www.python.org/dev/peps/pep-3131/#rationale >>> >> >> But the Python keywords and more importantly the documentation is English. Don't you need to be able >> to speak/write English in order to code Python anyway? And if you keep you code+comments English you >> can access a much larger developer pool (all developers who speak English should by my hypothesis be >> a superset of all developers who speak a certain language). > > Please; the PEP has been discussed quite a lot when it was proposed, > and believe me, yours is not an unfamiliar argument :) You're about > 5 years late. > > Georg > I didn't want to start a discussion. I just wanted to know why one would implement such a language feature. Guido's answer cleared it up for me, thanks. I can see the purpose in an educational setting (not in production code of anything a little bit bigger). -panzi From ncoghlan at gmail.com Mon Oct 1 21:37:24 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 2 Oct 2012 01:07:24 +0530 Subject: [Python-ideas] Visually confusable unicode characters in identifiers In-Reply-To: <75904331-419D-4C8B-9019-EC81908B9098@gmail.com> References: <506860DA.7060905@pearwood.info> <5069BFB7.7070207@gmx.net> <99143AA3-CCC2-49DD-903B-6F9FD161263D@gmail.com> <75904331-419D-4C8B-9019-EC81908B9098@gmail.com> Message-ID: On Tue, Oct 2, 2012 at 12:59 AM, Massimo DiPierro wrote: > Hello Guido, > > it does make sense. The only point I tried to make is that, because something is allowed, it does mean it should be encouraged. > I am sure there are instructors who want to teach to code using Japanese of Chinese variable names. Python gives them a way to do so. > Yet, if they do so, they would be isolating their students and their code from the rest of the world. Only if they *stop* there. The idea is just to allow the learning curve to be made gentler - as people learn the standard library and the tools on PyPI, then yes, it will still be necessary to continue learning English in order to make use of those tools (especially as many of them won't have translated documentation). Cheers, Nick. 
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From g.brandl at gmx.net Mon Oct 1 22:03:21 2012 From: g.brandl at gmx.net (Georg Brandl) Date: Mon, 01 Oct 2012 22:03:21 +0200 Subject: [Python-ideas] Visually confusable unicode characters in identifiers In-Reply-To: <5069EFF9.50706@gmx.net> References: <506860DA.7060905@pearwood.info> <5069BFB7.7070207@gmx.net> <5069CC8F.2070605@gmx.net> <5069EFF9.50706@gmx.net> Message-ID: On 10/01/2012 09:33 PM, Mathias Panzenb?ck wrote: > On 10/01/2012 07:48 PM, Georg Brandl wrote: >> On 10/01/2012 07:02 PM, Mathias Panzenb?ck wrote: >>> On 10/01/2012 06:43 PM, Robert Kern wrote: >>>> On 10/1/12 5:07 PM, Mathias Panzenb?ck wrote: >>>>> I still don't understand why unicode characters are allowed at all in identifier >>>>> names. Is the reason for this written down somewhere? >>>> >>>> http://www.python.org/dev/peps/pep-3131/#rationale >>>> >>> >>> But the Python keywords and more importantly the documentation is English. Don't you need to be able >>> to speak/write English in order to code Python anyway? And if you keep you code+comments English you >>> can access a much larger developer pool (all developers who speak English should by my hypothesis be >>> a superset of all developers who speak a certain language). >> >> Please; the PEP has been discussed quite a lot when it was proposed, >> and believe me, yours is not an unfamiliar argument :) You're about >> 5 years late. >> >> Georg >> > > I didn't want to start a discussion. I just wanted to know why one would implement such a language > feature. Well, in that case I would have said "read the PEP": I think it's well explained there. Georg From g.brandl at gmx.net Mon Oct 1 22:04:51 2012 From: g.brandl at gmx.net (Georg Brandl) Date: Mon, 01 Oct 2012 22:04:51 +0200 Subject: [Python-ideas] Visually confusable unicode characters in identifiers In-Reply-To: References: <506860DA.7060905@pearwood.info> <5069BFB7.7070207@gmx.net> <5069CC8F.2070605@gmx.net> <20121001200406.5228060a@pitrou.net> Message-ID: On 10/01/2012 08:10 PM, Guido van Rossum wrote: > On Mon, Oct 1, 2012 at 11:04 AM, Antoine Pitrou wrote: >> On Mon, 1 Oct 2012 10:44:42 -0700 >> Guido van Rossum wrote: >>> >>> As an example in German, I found a Python book aimed at middle- and >>> high-schoolers written in German, Python f?r Kids. You can look inside >>> it on the Amazon website: >>> http://www.amazon.com/Python-f%C3%BCr-Kids/dp/3826609514#reader_3826609514 >> >> Oh but why isn't it named Python f?r Kinder? :-) > > Probably to be "cool" for the "kids". Why is a mobile phone in Germany > called a "Handy" ? And why, oh why, do we have to buy our bread rolls at a "Backshop" nowadays... Georg From oscar.j.benjamin at gmail.com Mon Oct 1 22:26:07 2012 From: oscar.j.benjamin at gmail.com (Oscar Benjamin) Date: Mon, 1 Oct 2012 21:26:07 +0100 Subject: [Python-ideas] Visually confusable unicode characters in identifiers In-Reply-To: <5069EFF9.50706@gmx.net> References: <506860DA.7060905@pearwood.info> <5069BFB7.7070207@gmx.net> <5069CC8F.2070605@gmx.net> <5069EFF9.50706@gmx.net> Message-ID: On 1 October 2012 20:33, Mathias Panzenb?ck wrote: > > On 10/01/2012 07:48 PM, Georg Brandl wrote: >> >> On 10/01/2012 07:02 PM, Mathias Panzenb?ck wrote: >>> >>> On 10/01/2012 06:43 PM, Robert Kern wrote: >>>> >>>> On 10/1/12 5:07 PM, Mathias Panzenb?ck wrote: >>>>> >>>>> I still don't understand why unicode characters are allowed at all in identifier >>>>> names. Is the reason for this written down somewhere? 
>>>> >>>> >>>> http://www.python.org/dev/peps/pep-3131/#rationale >>>> >>> >>> But the Python keywords and more importantly the documentation is English. Don't you need to be able >>> to speak/write English in order to code Python anyway? And if you keep you code+comments English you >>> can access a much larger developer pool (all developers who speak English should by my hypothesis be >>> a superset of all developers who speak a certain language). >> >> >> Please; the PEP has been discussed quite a lot when it was proposed, >> and believe me, yours is not an unfamiliar argument :) You're about >> 5 years late. >> >> Georg >> > > I didn't want to start a discussion. I just wanted to know why one would implement such a language feature. Guido's answer cleared it up for me, thanks. I can see the purpose in an educational setting (not in production code of anything a little bit bigger). Non-ascii identifiers have other possible uses. I'll repost the case that started this discussion on python-tutor (attached in case it doesn't display): ''' #!/usr/bin/env python3 # -*- encoding: utf-8 -*- # Parameters ? = 1 ? = 0.1 ? = 1.5 ? = 0.075 # Initial conditions x? = 10 y? = 5 Z? = x?, y? # Solution parameters t? = 0 ?t = 0.001 T = 10 # Lotka-Volterra derivative def f(Z, t): x, y = Z x? = x * (? - ?*y) y? = -y * (? - ?*x) return x?, y? # Accumulate results from Euler stepper t? = t? Z? = Z? Z?, t = [], [] while t? <= t? + T: Z?.append(Z?) t.append(t?) Z? = [Z??+ ?t*Z??? for Z??, Z??? in zip(Z?, f(Z?, t?))] t? += ?t # Output since I don't have plotting libraries in Python 3 print('t', 'x', 'y') for t?, (x?, y?) in zip(t, Z?): print(t?, x?, y?) ''' Oscar -------------- next part -------------- A non-text attachment was scrubbed... Name: lv.py Type: application/octet-stream Size: 735 bytes Desc: not available URL: From dholth at gmail.com Mon Oct 1 22:44:21 2012 From: dholth at gmail.com (Daniel Holth) Date: Mon, 1 Oct 2012 16:44:21 -0400 Subject: [Python-ideas] use multiprocessing in included compileall script Message-ID: As an option, compileall should use a multiprocessing Pool() to speed up its work. From guido at python.org Mon Oct 1 22:51:34 2012 From: guido at python.org (Guido van Rossum) Date: Mon, 1 Oct 2012 13:51:34 -0700 Subject: [Python-ideas] Visually confusable unicode characters in identifiers In-Reply-To: References: <506860DA.7060905@pearwood.info> <5069BFB7.7070207@gmx.net> <5069CC8F.2070605@gmx.net> <5069EFF9.50706@gmx.net> Message-ID: On Mon, Oct 1, 2012 at 1:26 PM, Oscar Benjamin wrote: > On 1 October 2012 20:33, Mathias Panzenb?ck > wrote: >> >> On 10/01/2012 07:48 PM, Georg Brandl wrote: >>> >>> On 10/01/2012 07:02 PM, Mathias Panzenb?ck wrote: >>>> >>>> On 10/01/2012 06:43 PM, Robert Kern wrote: >>>>> >>>>> On 10/1/12 5:07 PM, Mathias Panzenb?ck wrote: >>>>>> >>>>>> I still don't understand why unicode characters are allowed at all in identifier >>>>>> names. Is the reason for this written down somewhere? >>>>> >>>>> >>>>> http://www.python.org/dev/peps/pep-3131/#rationale >>>>> >>>> >>>> But the Python keywords and more importantly the documentation is English. Don't you need to be able >>>> to speak/write English in order to code Python anyway? And if you keep you code+comments English you >>>> can access a much larger developer pool (all developers who speak English should by my hypothesis be >>>> a superset of all developers who speak a certain language). 
>>> >>> >>> Please; the PEP has been discussed quite a lot when it was proposed, >>> and believe me, yours is not an unfamiliar argument :) You're about >>> 5 years late. >>> >>> Georg >>> >> >> I didn't want to start a discussion. I just wanted to know why one would implement such a language feature. Guido's answer cleared it up for me, thanks. I can see the purpose in an educational setting (not in production code of anything a little bit bigger). > > Non-ascii identifiers have other possible uses. I'll repost the case > that started this discussion on python-tutor (attached in case it > doesn't display): > > ''' > #!/usr/bin/env python3 > # -*- encoding: utf-8 -*- > > # Parameters > ? = 1 > ? = 0.1 > ? = 1.5 > ? = 0.075 > > # Initial conditions > x? = 10 > y? = 5 > Z? = x?, y? > > # Solution parameters > t? = 0 > ?t = 0.001 > T = 10 > > # Lotka-Volterra derivative > def f(Z, t): > x, y = Z > x? = x * (? - ?*y) > y? = -y * (? - ?*x) > return x?, y? > > # Accumulate results from Euler stepper > t? = t? > Z? = Z? > Z?, t = [], [] > while t? <= t? + T: > Z?.append(Z?) > t.append(t?) > Z? = [Z??+ ?t*Z??? for Z??, Z??? in zip(Z?, f(Z?, t?))] > t? += ?t > > # Output since I don't have plotting libraries in Python 3 > print('t', 'x', 'y') > for t?, (x?, y?) in zip(t, Z?): > print(t?, x?, y?) > ''' Those examples would be a lot more compelling if there was an acceptable way to input those characters. Maybe we could support some kind of input method that enabled LaTeX style math notation as used by scientists for writing equations in papers? -- --Guido van Rossum (python.org/~guido) From solipsis at pitrou.net Mon Oct 1 22:51:23 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 1 Oct 2012 22:51:23 +0200 Subject: [Python-ideas] use multiprocessing in included compileall script References: Message-ID: <20121001225123.065b5010@pitrou.net> Hello Daniel, On Mon, 1 Oct 2012 16:44:21 -0400 Daniel Holth wrote: > As an option, compileall should use a multiprocessing Pool() to speed > up its work. This kind of concrete proposal can be brought directly on the bug tracker, no need to go through python-ideas. Regards Antoine. -- Software development and contracting: http://pro.pitrou.net From dholth at gmail.com Mon Oct 1 22:54:50 2012 From: dholth at gmail.com (Daniel Holth) Date: Mon, 1 Oct 2012 16:54:50 -0400 Subject: [Python-ideas] use multiprocessing in included compileall script In-Reply-To: <20121001225123.065b5010@pitrou.net> References: <20121001225123.065b5010@pitrou.net> Message-ID: filed From andre.roberge at gmail.com Mon Oct 1 22:55:30 2012 From: andre.roberge at gmail.com (Andre Roberge) Date: Mon, 1 Oct 2012 17:55:30 -0300 Subject: [Python-ideas] Visually confusable unicode characters in identifiers In-Reply-To: References: <506860DA.7060905@pearwood.info> <5069BFB7.7070207@gmx.net> <5069CC8F.2070605@gmx.net> <5069EFF9.50706@gmx.net> Message-ID: On Mon, Oct 1, 2012 at 5:51 PM, Guido van Rossum wrote: > On Mon, Oct 1, 2012 at 1:26 PM, Oscar Benjamin > SNIP > > Non-ascii identifiers have other possible uses. I'll repost the case > > that started this discussion on python-tutor (attached in case it > > doesn't display): > > > > ''' > > #!/usr/bin/env python3 > > # -*- encoding: utf-8 -*- > > > > # Parameters > > ? = 1 > > ? = 0.1 > > ? = 1.5 > > ? = 0.075 > > > > # Initial conditions > > x? = 10 > > y? = 5 > > Z? = x?, y? > > > SNIP > > Those examples would be a lot more compelling if there was an > acceptable way to input those characters. 
Maybe we could support some > kind of input method that enabled LaTeX style math notation as used by > scientists for writing equations in papers? > > +1000 Andr? Roberge > -- > --Guido van Rossum (python.org/~guido) > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -------------- next part -------------- An HTML attachment was scrubbed... URL: From oscar.j.benjamin at gmail.com Mon Oct 1 23:46:50 2012 From: oscar.j.benjamin at gmail.com (Oscar Benjamin) Date: Mon, 1 Oct 2012 22:46:50 +0100 Subject: [Python-ideas] Visually confusable unicode characters in identifiers In-Reply-To: References: <506860DA.7060905@pearwood.info> <5069BFB7.7070207@gmx.net> <5069CC8F.2070605@gmx.net> <5069EFF9.50706@gmx.net> Message-ID: On 1 October 2012 21:51, Guido van Rossum wrote: > On Mon, Oct 1, 2012 at 1:26 PM, Oscar Benjamin > wrote: >> # Parameters >> ? = 1 >> ? = 0.1 >> ? = 1.5 >> ? = 0.075 >> >> # Initial conditions >> x? = 10 >> y? = 5 >> Z? = x?, y? > > Those examples would be a lot more compelling if there was an > acceptable way to input those characters. Maybe we could support some > kind of input method that enabled LaTeX style math notation as used by > scientists for writing equations in papers? Sympy already has a few of the basic TeX concepts. I imagine that something like Sympy notebooks (a browser-based interface) might one day gain support for this. A readline-ish method to do it would be a great extension to isympy (since it already works for output): $ isympy IPython console for SymPy 0.7.1.rc1 (Python 2.7.3-64-bit) (ground types: python) In [1]: Symbol('beta') Out[1]: ? In [2]: Symbol('c_1') Out[2]: c? Oscar -------------- next part -------------- An HTML attachment was scrubbed... URL: From g.brandl at gmx.net Mon Oct 1 23:54:04 2012 From: g.brandl at gmx.net (Georg Brandl) Date: Mon, 01 Oct 2012 23:54:04 +0200 Subject: [Python-ideas] Visually confusable unicode characters in identifiers In-Reply-To: References: <506860DA.7060905@pearwood.info> <5069BFB7.7070207@gmx.net> <5069CC8F.2070605@gmx.net> <5069EFF9.50706@gmx.net> Message-ID: On 10/01/2012 10:51 PM, Guido van Rossum wrote: > On Mon, Oct 1, 2012 at 1:26 PM, Oscar Benjamin >> Non-ascii identifiers have other possible uses. I'll repost the case >> that started this discussion on python-tutor (attached in case it >> doesn't display): Very nice! >> ''' >> #!/usr/bin/env python3 >> # -*- encoding: utf-8 -*- >> >> # Parameters >> ? = 1 >> ? = 0.1 >> ? = 1.5 >> ? = 0.075 >> >> # Initial conditions >> x? = 10 >> y? = 5 >> Z? = x?, y? >> >> # Solution parameters >> t? = 0 >> ?t = 0.001 >> T = 10 >> >> # Lotka-Volterra derivative >> def f(Z, t): >> x, y = Z >> x? = x * (? - ?*y) >> y? = -y * (? - ?*x) >> return x?, y? >> >> # Accumulate results from Euler stepper >> t? = t? >> Z? = Z? >> Z?, t = [], [] >> while t? <= t? + T: >> Z?.append(Z?) >> t.append(t?) >> Z? = [Z??+ ?t*Z??? for Z??, Z??? in zip(Z?, f(Z?, t?))] >> t? += ?t >> >> # Output since I don't have plotting libraries in Python 3 >> print('t', 'x', 'y') >> for t?, (x?, y?) in zip(t, Z?): >> print(t?, x?, y?) >> ''' > > Those examples would be a lot more compelling if there was an > acceptable way to input those characters. Maybe we could support some > kind of input method that enabled LaTeX style math notation as used by > scientists for writing equations in papers? 
With the right editor, of course, it's not a problem :) (Emacs has a TeX input method with which I could type this example without problems.) Georg From matthew at woodcraft.me.uk Tue Oct 2 00:28:09 2012 From: matthew at woodcraft.me.uk (Matthew Woodcraft) Date: Mon, 01 Oct 2012 23:28:09 +0100 Subject: [Python-ideas] Visually confusable unicode characters in identifiers In-Reply-To: References: <506860DA.7060905@pearwood.info> <5069BFB7.7070207@gmx.net> <5069CC8F.2070605@gmx.net> <5069EFF9.50706@gmx.net> Message-ID: On 2012-10-01 21:51, Guido van Rossum wrote: > Those examples would be a lot more compelling if there was an > acceptable way to input those characters. Maybe we could support some > kind of input method that enabled LaTeX style math notation as used by > scientists for writing equations in papers? I think that's up to the OS or the text editor. In Emacs, this works: M-x set-input-method tex -M- From greg.ewing at canterbury.ac.nz Tue Oct 2 01:07:05 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 02 Oct 2012 12:07:05 +1300 Subject: [Python-ideas] Namespaces and modules [was Deprecate the round builtin] In-Reply-To: References: <50635956.4050409@egenix.com> <20120926212127.GA9680@iskra.aviel.ru> <50642918.1060804@canterbury.ac.nz> <20121001060551.GA9193@ando> Message-ID: <506A2219.5080700@canterbury.ac.nz> Mark Adam wrote: > It's not like you can do "import __builtins__" in the interpreter, But you *can* do "import __builtin__". Also, "sys" is created at interpreter startup and doesn't correspond to any disk file, but we don't seem to mind calling it a module and using the same import syntax to access it. The only difference I can see with these proposed namespace things is that they would be pre-bound to names in the builtin namespace. > But, it's as if you're > going on a road trip, you want to travel efficient and light -- what > you include in your backpack ("interpreter environment") is your > "builtin" and everything else you'll "buy"/import on the road. > Modules are those things on the road. The sys module violates this taxonomy -- it's already in your backpack, just tucked away in a paper bag that you need to open first. -- Greg From greg.ewing at canterbury.ac.nz Tue Oct 2 01:24:27 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 02 Oct 2012 12:24:27 +1300 Subject: [Python-ideas] Visually confusable unicode characters in identifiers In-Reply-To: <20121001200406.5228060a@pitrou.net> References: <506860DA.7060905@pearwood.info> <5069BFB7.7070207@gmx.net> <5069CC8F.2070605@gmx.net> <20121001200406.5228060a@pitrou.net> Message-ID: <506A262B.9090306@canterbury.ac.nz> Antoine Pitrou wrote: > Oh but why isn't it named Python f?r Kinder? :-) It looks like Germans have adopted "kid" as an abbreviation for "kinder", just like we use it as an abbreviation for "child". Or maybe we got it from them -- it's closer to their original word than ours! They seem to be using our plural, though -- "kids", not "kidden"... 
-- Greg From grosser.meister.morti at gmx.net Tue Oct 2 02:06:35 2012 From: grosser.meister.morti at gmx.net (Mathias Panzenböck) Date: Tue, 02 Oct 2012 02:06:35 +0200 Subject: [Python-ideas] Visually confusable unicode characters in identifiers In-Reply-To: <506A262B.9090306@canterbury.ac.nz> References: <506860DA.7060905@pearwood.info> <5069BFB7.7070207@gmx.net> <5069CC8F.2070605@gmx.net> <20121001200406.5228060a@pitrou.net> <506A262B.9090306@canterbury.ac.nz> Message-ID: <506A300B.7000704@gmx.net> On 10/02/2012 01:24 AM, Greg Ewing wrote: > Antoine Pitrou wrote: > >> Oh but why isn't it named Python für Kinder? :-) > > It looks like Germans have adopted "kid" as an abbreviation > for "kinder", just like we use it as an abbreviation for > "child". Or maybe we got it from them -- it's closer to > their original word than ours! > > They seem to be using our plural, though -- "kids", not > "kidden"... > Sometimes we use the ...s for plural as well, especially for acronyms, words of English or French origin and last names. But it would not be ...en, maybe ...er. Is there any German word that uses ...en for plural? I don't think so. Anyway, "kids" is definitely an anglicism, because we pronounce it "English" and not like it would be pronounced if it were derived from "Kind" (it would be more like "keed"). German today is full of anglicisms. But then, there are some German words used by English people as well: gesundheit, kindergarten, über, blitz(krieg), angst (used as something different than the German word), abseiling ("abseilen" in German), doppelgänger, gestalt, poltergeist, Zeitgeist... From steve at pearwood.info Tue Oct 2 02:15:18 2012 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 02 Oct 2012 10:15:18 +1000 Subject: [Python-ideas] use multiprocessing in included compileall script In-Reply-To: References: Message-ID: <506A3216.2010905@pearwood.info> On 02/10/12 06:44, Daniel Holth wrote: > As an option, compileall should use a multiprocessing Pool() to speed > up its work. Sounds like overkill. In my experience, very few ideas are so self-evident that they don't need any explanation, and this is certainly not one of them. What is your rationale for why compileall should use multiprocessing? -- Steven From steve at pearwood.info Tue Oct 2 03:32:06 2012 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 02 Oct 2012 11:32:06 +1000 Subject: [Python-ideas] Visually confusable unicode characters in identifiers In-Reply-To: <75904331-419D-4C8B-9019-EC81908B9098@gmail.com> References: <506860DA.7060905@pearwood.info> <5069BFB7.7070207@gmx.net> <99143AA3-CCC2-49DD-903B-6F9FD161263D@gmail.com> <75904331-419D-4C8B-9019-EC81908B9098@gmail.com> Message-ID: <506A4416.5030102@pearwood.info> On 02/10/12 05:29, Massimo DiPierro wrote: > it does make sense. The only point I tried to make is that, > because something is allowed, it does mean it should be > encouraged. I am sure there are instructors who want to teach >to code using Japanese of Chinese variable names. Python gives > them a way to do so. Yet, if they do so, they would be >isolating their students and their code from the rest of the >world. People very often over-estimate the cost of that isolation, and over-value access to the rest of the world. The average open source piece of software has one, maybe two, contributors. What do they care if millions of English-speaking programmers can't contribute when they weren't going to contribute regardless of the language?
Perhaps the convenience of being able to read your own code in your own native language outweighs the loss of being able to attract contributors that you can't even talk to. And for proprietary software, again it is irrelevant. If a Chinese company writes Chinese software for Chinese users with Chinese developers, why would they want to write it in English? Perhaps they have little choice due to the overwhelming trend towards English in programming languages, but there's no positive benefit to using a non-native language. Quite frankly, and I'm saying this as somebody who only speaks English, I think that the use of English as the single lingua franca of computer programming is as unnecessary (and ultimately as harmful) as the use of Latin and then French as the sole lingua franca of science and mathematics. I expect that it too will be a passing phase. By the way, are you familiar with ChinesePython and IronPerunis? http://www.chinesepython.org/english/english.html http://ironperunis.codeplex.com/ -- Steven From stephen at xemacs.org Tue Oct 2 05:48:07 2012 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Tue, 02 Oct 2012 12:48:07 +0900 Subject: [Python-ideas] Visually confusable unicode characters in identifiers In-Reply-To: <5069BFB7.7070207@gmx.net> References: <506860DA.7060905@pearwood.info> <5069BFB7.7070207@gmx.net> Message-ID: <87obklleko.fsf@uwakimon.sk.tsukuba.ac.jp> Mathias Panzenböck writes: > I still don't understand why unicode characters are allowed at all > in identifier names. "Consenting adults." 'nuff said? An anecdote. Back when I was first learning Japanese, I maintained an Emacs interface to EDICT, a free Japanese-English dictionary. The code was smart enough to parse morphosyntax (inflection of verbs and adjectives) into dictionary forms, but I wasn't (and according to my daughter, still am not). So I asked my tutor for help. Although a total non-programmer, he was able to read the grammar easily because the state names (identifiers for callable objects) were written in Japanese, using the standard grammatical name for the inflection. The "easy" part comes in because although his English was good, it wasn't good enough to disentangle Lisp gobbledygook from the morphosyntax data had it been written in ASCII. But he was able to read and comment on the whole grammar in about half an hour because he could just skip *all* the ASCII! From stephen at xemacs.org Tue Oct 2 06:11:58 2012 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Tue, 02 Oct 2012 13:11:58 +0900 Subject: [Python-ideas] Visually confusable unicode characters in identifiers In-Reply-To: References: <506860DA.7060905@pearwood.info> <5069BFB7.7070207@gmx.net> <5069CC8F.2070605@gmx.net> <5069EFF9.50706@gmx.net> Message-ID: <87mx05ldgx.fsf@uwakimon.sk.tsukuba.ac.jp> Guido van Rossum writes: > Those examples would be a lot more compelling if there was an > acceptable way to input those characters. Hey!! What's "unacceptable" about Emacs?? > Maybe we could support some kind of input method that enabled LaTeX > style math notation as used by scientists for writing equations in > papers? If you're talking about interactive use, Emacs has a method based on searching the Unicode character database. LaTeX math notation has a number of potential pitfalls. In particular, the sub-/superscript notation can be applied to anything, not just characters that happen to have *script versions in Unicode. Also, not everything that seems to be a character in LaTeX necessarily has a corresponding Unicode character.
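For what it's worth, the standard library can already search the Unicode character database by name; this is only a minimal illustration of that lookup with the stdlib unicodedata module and \N{...} escapes, not a proposal for an input method:

import unicodedata

# Look up characters by their Unicode names -- a programmatic analogue
# of searching the character database from an input method.
alpha = unicodedata.lookup("GREEK SMALL LETTER ALPHA")   # 'α'
integral = unicodedata.lookup("INTEGRAL")                # '∫'

# The same names work in string literals via \N{...} escapes.
assert "\N{GREEK SMALL LETTER ALPHA}" == alpha
print(unicodedata.name(integral))                        # INTEGRAL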
From ben+python at benfinney.id.au Tue Oct 2 06:25:40 2012 From: ben+python at benfinney.id.au (Ben Finney) Date: Tue, 02 Oct 2012 14:25:40 +1000 Subject: [Python-ideas] Visually confusable unicode characters in identifiers References: <506860DA.7060905@pearwood.info> <5069BFB7.7070207@gmx.net> <5069CC8F.2070605@gmx.net> <5069EFF9.50706@gmx.net> Message-ID: <7w7gr9ebzv.fsf@benfinney.id.au> Matthew Woodcraft writes: > On 2012-10-01 21:51, Guido van Rossum wrote: > > Those examples would be a lot more compelling if there was an > > acceptable way to input those characters. Maybe we could support > > some kind of input method that enabled LaTeX style math notation as > > used by scientists for writing equations in papers? > > I think that's up to the OS or the text editor. Agreed. Make of these identifiers will need to be typed at an OS command line, after all (e.g. for naming a test case to run, as one which springs easily to mind). Solve the keyboard input problem in the OS layer ? as someone who anticipates working with non-ASCII characters must already do ? and you solve it for Python code as well. I don't think it's Python's business to get involved at the input method level. -- \ ?The apparent lesson of the Inquisition is that insistence on | `\ uniformity of belief is fatal to intellectual, moral, and | _o__) spiritual health.? ?_The Uses Of The Past_, Herbert J. Muller | Ben Finney From greg.ewing at canterbury.ac.nz Tue Oct 2 07:09:14 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 02 Oct 2012 18:09:14 +1300 Subject: [Python-ideas] Visually confusable unicode characters in identifiers In-Reply-To: <506A300B.7000704@gmx.net> References: <506860DA.7060905@pearwood.info> <5069BFB7.7070207@gmx.net> <5069CC8F.2070605@gmx.net> <20121001200406.5228060a@pitrou.net> <506A262B.9090306@canterbury.ac.nz> <506A300B.7000704@gmx.net> Message-ID: <506A76FA.6040200@canterbury.ac.nz> Mathias Panzenb?ck wrote: > But it would not be > ...en, maybe ...er. Is there any German word that uses ...en for plural? > I don't think so. This page seems to think that some do: http://german.about.com/od/grammar/a/PluralNounsWithnENEndings.htm -- Greg From stephen at xemacs.org Tue Oct 2 10:04:55 2012 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Tue, 02 Oct 2012 17:04:55 +0900 Subject: [Python-ideas] Visually confusable unicode characters in identifiers In-Reply-To: <7w7gr9ebzv.fsf@benfinney.id.au> References: <506860DA.7060905@pearwood.info> <5069BFB7.7070207@gmx.net> <5069CC8F.2070605@gmx.net> <5069EFF9.50706@gmx.net> <7w7gr9ebzv.fsf@benfinney.id.au> Message-ID: <87ipatl2oo.fsf@uwakimon.sk.tsukuba.ac.jp> Ben Finney writes: > Solve the keyboard input problem in the OS layer ? as someone who > anticipates working with non-ASCII characters must already do ? and you > solve it for Python code as well. That simply isn't true for symbol characters and Greek letters. I still let either TeX or XEmacs translate TeX macros for me. I don't even know how to type an integral sign in Mac OS X Terminal (conveniently, that is -- of course there's always the character palette), and if I wanted directed quotation marks (I don't), I'd just use ASCII quotes and let XEmacs translate those, too. There ought to be a standard way to get those symbols and punctuation, preferably ASCII-based, on any terminal, using the standard Python interpreter. 
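As a very rough sketch of what an ASCII-based approach could look like on top of the standard interpreter, the stdlib readline module can be pointed at a small name table so that Tab expands a typed name into the corresponding symbol. The name table below is invented purely for illustration, and this displaces the normal rlcompleter-based completion, so treat it as a sketch of the idea rather than a workable feature:

import readline
import unicodedata

# Invented, minimal name table (illustration only).
SYMBOLS = {
    "alpha": unicodedata.lookup("GREEK SMALL LETTER ALPHA"),
    "beta": unicodedata.lookup("GREEK SMALL LETTER BETA"),
    "integral": unicodedata.lookup("INTEGRAL"),
}

def symbol_completer(text, state):
    # readline hands us the word being completed; returning the symbol
    # itself makes completion replace the typed name with the character.
    matches = [sym for name, sym in SYMBOLS.items() if name.startswith(text)]
    return matches[state] if state < len(matches) else None

readline.set_completer(symbol_completer)
readline.parse_and_bind("tab: complete")

Typing "integral" and pressing Tab at the prompt would then insert the symbol directly.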
From storchaka at gmail.com Tue Oct 2 12:43:07 2012 From: storchaka at gmail.com (Serhiy Storchaka) Date: Tue, 02 Oct 2012 13:43:07 +0300 Subject: [Python-ideas] Visually confusable unicode characters in identifiers In-Reply-To: References: <506860DA.7060905@pearwood.info> <5069BFB7.7070207@gmx.net> <5069CC8F.2070605@gmx.net> <5069EFF9.50706@gmx.net> Message-ID: On 01.10.12 23:51, Guido van Rossum wrote: > Those examples would be a lot more compelling if there was an > acceptable way to input those characters. Maybe we could support some > kind of input method that enabled LaTeX style math notation as used by > scientists for writing equations in papers? \u03B1 Java already allows this outside of the string literals. And it sometimes causes unpleasant effects. From ben+python at benfinney.id.au Tue Oct 2 13:39:12 2012 From: ben+python at benfinney.id.au (Ben Finney) Date: Tue, 02 Oct 2012 21:39:12 +1000 Subject: [Python-ideas] Visually confusable unicode characters in identifiers References: <506860DA.7060905@pearwood.info> <5069BFB7.7070207@gmx.net> <5069CC8F.2070605@gmx.net> <5069EFF9.50706@gmx.net> <7w7gr9ebzv.fsf@benfinney.id.au> <87ipatl2oo.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <7w1uhhdrxb.fsf@benfinney.id.au> "Stephen J. Turnbull" writes: > I still let either TeX or XEmacs translate TeX macros for me. I don't > even know how to type an integral sign in Mac OS X Terminal > (conveniently, that is -- of course there's always the character > palette), and if I wanted directed quotation marks (I don't), I'd just > use ASCII quotes and let XEmacs translate those, too. Right. So you've solved it for one program only, not the OS which is (or should be) responsible for turning what you type into characters, uniformly across all applications you have keyboard input for. > There ought to be a standard way to get those symbols and punctuation, > preferably ASCII-based, on any terminal Definitely agreed with this. Indeed, it's my point: the problem should be solved in one place for the user of the computer, not separately per application or framework. > using the standard Python interpreter. If you mean that the Python interpreter should be aware of the solution, why? That's solving it at the wrong level, because any non-Python program (such as a shell or an editor) gets no benefit from that. If you mean that the single, one-point solution should work across all programs, including the standard Python interpreter, then yes I agree. I'm saying the OS is the right place to solve it, by installing an appropriate input method (or whatever each OS calls them). -- \ ?In economics, hope and faith coexist with great scientific | `\ pretension and also a deep desire for respectability.? ?John | _o__) Kenneth Galbraith, 1970-06-07 | Ben Finney From stephen at xemacs.org Wed Oct 3 07:31:46 2012 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Wed, 03 Oct 2012 14:31:46 +0900 Subject: [Python-ideas] Visually confusable unicode characters in identifiers In-Reply-To: <7w1uhhdrxb.fsf@benfinney.id.au> References: <506860DA.7060905@pearwood.info> <5069BFB7.7070207@gmx.net> <5069CC8F.2070605@gmx.net> <5069EFF9.50706@gmx.net> <7w7gr9ebzv.fsf@benfinney.id.au> <87ipatl2oo.fsf@uwakimon.sk.tsukuba.ac.jp> <7w1uhhdrxb.fsf@benfinney.id.au> Message-ID: <87bogkktod.fsf@uwakimon.sk.tsukuba.ac.jp> Ben Finney writes: > "Stephen J. Turnbull" > writes: > > > I still let either TeX or XEmacs translate TeX macros for me. 
I don't > > even know how to type an integral sign in Mac OS X Terminal > > (conveniently, that is -- of course there's always the character > > palette), and if I wanted directed quotation marks (I don't), I'd just > > use ASCII quotes and let XEmacs translate those, too. > > Right. So you've solved it for one program only, not the OS You seem to be under a misconception. Emacs *is* an OS, it just runs on top of the more primitive OSes normally associated with the term. ;-) > I'm saying the OS is the right place to solve it, by installing an > appropriate input method (or whatever each OS calls them). I doubt very many people used to and fond of LaTeX would agree with you, since AFAIK there aren't any OSes providing TeX macros as an input method. AFAICS it's not available on my Mac. While I don't particularly favor it, it may be the best compromise, as many people are familiar with it, and many many symbols are available with familiar, intuitive names so that non-TeXnical typists can often guess them. From bborcic at gmail.com Wed Oct 3 14:52:43 2012 From: bborcic at gmail.com (Boris Borcic) Date: Wed, 03 Oct 2012 14:52:43 +0200 Subject: [Python-ideas] Deprecate the round builtin In-Reply-To: References: <5063B6CD.4030405@pearwood.info> Message-ID: Mike Graham wrote: > round(x, n) for n>0 is quite simply not sane code. I've occasionally used round(x,n) with n>0 - as a quick way to normalize away numeric imprecisions and have values generated by a computation recognized as identical set elements or dictionary keys. I'd have used a function to round in binary instead of decimal had one been handy, but otoh I don't see it would make a real difference, would it? From chrysn at fsfe.org Wed Oct 3 16:43:20 2012 From: chrysn at fsfe.org (chrysn) Date: Wed, 3 Oct 2012 16:43:20 +0200 Subject: [Python-ideas] asyncore: included batteries don't fit In-Reply-To: References: <20120926081718.GA20843@hephaistos.amsuess.com> Message-ID: <20121003144320.GA16485@hephaistos.amsuess.com> On Wed, Sep 26, 2012 at 10:02:24AM -0700, Josiah Carlson wrote: > Go ahead and read PEP 3153, we will wait. > > A careful reading of PEP 3153 will tell you that the intent is to make > a "light" version of Twisted built into Python. There isn't any > discussion as to *why* this is a good idea, it just lays out the plan > of action. Its ideas were gathered from the experience of the Twisted > folks. > > Their experience is substantial, but in the intervening 1.5+ years > since Pycon 2011, only the barest of abstract interfaces has been > defined (https://github.com/lvh/async-pep/blob/master/async/abstract.py), > and no discussion has taken place as to forward migration of the > (fairly large) body of existing asyncore code. it doesn't look like twisted-light to me, more like a interface suggestion for a small subset of twisted. in particular, it doesn't talk about main loops / reactors / registration-in-the-first-place. you mention interaction with the twisted people. is there willingness, from the twisted side, to use a standard python middle layer, once it exists and has sufficiently high quality? > To the point, Giampaolo already has a reactor that implements the > interface (more or less "idea #3" from his earlier message), and it's > been used in production (under staggering ftp(s) load). Even better, > it offers effectively transparent replacement of the existing asyncore > loop, and supports existing asyncore-derived classes. 
It is available: > https://code.google.com/p/pyftpdlib/source/browse/trunk/pyftpdlib/lib/ioloop.py i've had a look at it, but honestly can't say more than that it's good to have a well-tested asyncore compatible main loop with scheduling support, and i'll try it out for my own projects. > >> Again, at this point in time what you're proposing looks too vague, > >> ambitious and premature to me. > > > > please don't get me wrong -- i'm not proposing anything for immediate > > action, i just want to start a thinking process towards a better > > integrated stdlib. > > I am curious as to what you mean by "a better integrated stdlib". A > new interface that doesn't allow people to easily migrate from an > existing (and long-lived, though flawed) standard library is not > better integration. Better integration requires allowing previous > users to migrate, while encouraging new users to join in with any > later development. That's what Giampaolo's suggested interface offers > on the lowest level; something to handle file-handle reactors, > combined with a scheduler. a new interface won't make integration automatically happen, but it's something the standard library components can evolve on. whether, for example urllib2 will then automatically work asynchronously in that framework or whether we'll wait for urllib3, we'll see when we have it. @migrate from an existing standard library: is there a big user base for the current asyncore framework? my impression from is that it is not very well known among python users, and most that could use it use twisted. > > we've talked about many things we'd need in a python asynchronous > > interface (not implementation), so what are the things we *don't* need? > > (so we won't start building a framework like twisted). i'll start: > > > > * high-level protocol handling (can be extra modules atop of it) > > * ssl > > * something like the twisted delayed framework (not sure about that, i > > guess the twisted people will have good reason to use it, but i don't > > see compelling reasons for such a thing in a minimal interface from my > > limited pov) > > * explicit connection handling (retries, timeouts -- would be up to the > > user as well, eg urllib might want to set up a timeout and retries for > > asynchronous url requests) > > I disagree with the last 3. If you have an IO loop, more often than > not you want an opportunity to do something later in the same context. > This is commonly the case for bandwidth limiting, connection timeouts, > etc., which are otherwise *very* difficult to do at a higher level > (which are the reasons why schedulers are built into IO loops). > Further, SSL in async can be tricky to get right. Having the 20-line > SSL layer as an available class is a good idea, and will save people > time by not having them re-invent it (poorly or incorrectly) every > time. i see; those should be provided, then. i'm afraid i don't completely get the point you're making, sorry for that, maybe i've missed important statements or lack sufficiently deep knowledge of topics affected and got lost in details. what is your opinion on the state of asynchronous operations in python, and what would you like it to be? thanks for staying with this topic chrysn -- To use raw power is to make yourself infinitely vulnerable to greater powers. -- Bene Gesserit axiom -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: Digital signature URL: From maxmoroz at gmail.com Thu Oct 4 13:48:03 2012 From: maxmoroz at gmail.com (Max Moroz) Date: Thu, 4 Oct 2012 04:48:03 -0700 Subject: [Python-ideas] checking for identity before comparing built-in objects Message-ID: It seems that built-in classes do not short-circuit `__eq__` method when the objects are identical, at least in CPython: f = frozenset(range(200000000)) f1 = f f1 == f # this operation will take about 1 sec on my machine Is there any disadvantage to checking whether the equality was called with the same object, and if it was, return `True` right away? I noticed this when trying to memoize a function that has large frozenset arguments. While hashing of a large argument is very fast after it's done once (hash value is presumably cached), the equality comparison is always slow even against itself. So when the same large argument is provided over and over, memoization is slow. Of course, there's a workaround: subclass frozenset, and redefine __eq__ to check id() first. And arguably, for this particular use case, I should redefine both __hash__ and __eq__, to make them only look exclusively at id(), since it's not worth wasting memoizer time trying to compare two non-identical large arguments that are highly unlikely to compare equal anyway. So if there's any reason for the current implementation, I don't have a strong argument against it. From steve at pearwood.info Thu Oct 4 15:53:50 2012 From: steve at pearwood.info (Steven D'Aprano) Date: Thu, 04 Oct 2012 23:53:50 +1000 Subject: [Python-ideas] checking for identity before comparing built-in objects In-Reply-To: References: Message-ID: <506D94EE.30808@pearwood.info> On 04/10/12 21:48, Max Moroz wrote: > It seems that built-in classes do not short-circuit `__eq__` method > when the objects are identical, at least in CPython: > > f = frozenset(range(200000000)) > f1 = f > f1 == f # this operation will take about 1 sec on my machine You shouldn't over-generalize. Some built-ins do short-circuit __eq__ when the objects are identical. I believe that strings and ints both do. Other types might not. > Is there any disadvantage to checking whether the equality was called > with the same object, and if it was, return `True` right away? That would break floats and Decimals, both of which support NANs. The decision whether or not to optimize __eq__ should be left up to the type. Some types, for example, might decide to optimize x == x even if x contains a NAN or other objects that break reflexivity of equality. Other types might prefer not to. (Please do not start an argument about NANs and reflexivity. That's been argued to death, and there are very good reasons for the IEEE 754 standard to define NANs the way they do.) Since frozensets containing NANs are rare (I presume), I think it is reasonable to optimize frozenset equality. But I do not think it is reasonable for Python to mandate identity checking before __eq__. > I noticed this when trying to memoize a function that has large > frozenset arguments. While hashing of a large argument is very fast > after it's done once (hash value is presumably cached), the equality > comparison is always slow even against itself. So when the same large > argument is provided over and over, memoization is slow. 
I'm not sure what you are doing here, because dicts (at least in Python 3.2) already short-circuit equality: py> NAN = float('nan') py> NAN == NAN False py> d = {NAN: 42} py> d[NAN] 42 Actually, that behaviour goes back to at least 2.4, so I'm not sure how you are doing memoization and not seeing the same optimization. -- Steven From grosser.meister.morti at gmx.net Thu Oct 4 16:02:29 2012 From: grosser.meister.morti at gmx.net (=?ISO-8859-1?Q?Mathias_Panzenb=F6ck?=) Date: Thu, 04 Oct 2012 16:02:29 +0200 Subject: [Python-ideas] checking for identity before comparing built-in objects In-Reply-To: <506D94EE.30808@pearwood.info> References: <506D94EE.30808@pearwood.info> Message-ID: <506D96F5.3060706@gmx.net> On 10/04/2012 03:53 PM, Steven D'Aprano wrote: > py> NAN == NAN > False Why isn't this True anyway? Is there a PEP that explains this (IMHO odd) behavior? From mikegraham at gmail.com Thu Oct 4 16:07:36 2012 From: mikegraham at gmail.com (Mike Graham) Date: Thu, 4 Oct 2012 10:07:36 -0400 Subject: [Python-ideas] checking for identity before comparing built-in objects In-Reply-To: <506D96F5.3060706@gmx.net> References: <506D94EE.30808@pearwood.info> <506D96F5.3060706@gmx.net> Message-ID: On Thu, Oct 4, 2012 at 10:02 AM, Mathias Panzenb?ck wrote: > On 10/04/2012 03:53 PM, Steven D'Aprano wrote: >> >> py> NAN == NAN >> False > > > Why isn't this True anyway? Is there a PEP that explains this (IMHO odd) > behavior? IEEE 754 specifies this. Mike From python at mrabarnett.plus.com Thu Oct 4 16:19:44 2012 From: python at mrabarnett.plus.com (MRAB) Date: Thu, 04 Oct 2012 15:19:44 +0100 Subject: [Python-ideas] checking for identity before comparing built-in objects In-Reply-To: References: <506D94EE.30808@pearwood.info> <506D96F5.3060706@gmx.net> Message-ID: <506D9B00.3050307@mrabarnett.plus.com> On 2012-10-04 15:07, Mike Graham wrote: > On Thu, Oct 4, 2012 at 10:02 AM, Mathias Panzenb?ck > wrote: >> On 10/04/2012 03:53 PM, Steven D'Aprano wrote: >>> >>> py> NAN == NAN >>> False >> >> >> Why isn't this True anyway? Is there a PEP that explains this (IMHO odd) >> behavior? > > IEEE 754 specifies this. > Think of it this way: Calculation A returns NaN for some reason Calculation B also returns NaN for some reason Have they really returned the same result? Just because they're both NaN doesn't mean that they're the _same_ NaN... From rosuav at gmail.com Thu Oct 4 16:30:50 2012 From: rosuav at gmail.com (Chris Angelico) Date: Fri, 5 Oct 2012 00:30:50 +1000 Subject: [Python-ideas] checking for identity before comparing built-in objects In-Reply-To: <506D9B00.3050307@mrabarnett.plus.com> References: <506D94EE.30808@pearwood.info> <506D96F5.3060706@gmx.net> <506D9B00.3050307@mrabarnett.plus.com> Message-ID: On Fri, Oct 5, 2012 at 12:19 AM, MRAB wrote: > On 2012-10-04 15:07, Mike Graham wrote: >> >> On Thu, Oct 4, 2012 at 10:02 AM, Mathias Panzenb?ck >> wrote: >>> >>> On 10/04/2012 03:53 PM, Steven D'Aprano wrote: >>>> >>>> >>>> py> NAN == NAN >>>> False >>> >>> >>> >>> Why isn't this True anyway? Is there a PEP that explains this (IMHO odd) >>> behavior? >> >> >> IEEE 754 specifies this. >> > Think of it this way: > > Calculation A returns NaN for some reason > > Calculation B also returns NaN for some reason > > Have they really returned the same result? Just because they're both > NaN doesn't mean that they're the _same_ NaN... The only other viable option would be to declare that (NaN==NaN) is NaN - kinda like SQL's NULL and its weird semantics. And that would be *highly* confusing to many situations. 
ChrisA From victor.stinner at gmail.com Thu Oct 4 17:08:40 2012 From: victor.stinner at gmail.com (Victor Stinner) Date: Thu, 4 Oct 2012 17:08:40 +0200 Subject: [Python-ideas] checking for identity before comparing built-in objects In-Reply-To: <506D94EE.30808@pearwood.info> References: <506D94EE.30808@pearwood.info> Message-ID: 2012/10/4 Steven D'Aprano : > On 04/10/12 21:48, Max Moroz wrote: >> >> It seems that built-in classes do not short-circuit `__eq__` method >> when the objects are identical, at least in CPython: >> >> f = frozenset(range(200000000)) >> f1 = f >> f1 == f # this operation will take about 1 sec on my machine > > > You shouldn't over-generalize. Some built-ins do short-circuit __eq__ > when the objects are identical. I believe that strings and ints both > do. Other types might not. This optimization is not implemented for Unicode strings. PyObject_RichCompareBool() implements this optimization which leads to incorrect results: nan = float("nan") mytuple = (nan,) assert mytuple != mytuple # fails I think that the optimization should be implemented for Unicode strings, but disabled in PyObject_RichCompareBool(). @Max Moroz: Can you please open an issue on bugs.python.org? Victor From steve at pearwood.info Thu Oct 4 17:53:36 2012 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 05 Oct 2012 01:53:36 +1000 Subject: [Python-ideas] checking for identity before comparing built-in objects In-Reply-To: References: <506D94EE.30808@pearwood.info> Message-ID: <506DB100.2070105@pearwood.info> On 05/10/12 01:08, Victor Stinner wrote: > 2012/10/4 Steven D'Aprano: >> On 04/10/12 21:48, Max Moroz wrote: >>> >>> It seems that built-in classes do not short-circuit `__eq__` method >>> when the objects are identical, at least in CPython: >>> >>> f = frozenset(range(200000000)) >>> f1 = f >>> f1 == f # this operation will take about 1 sec on my machine >> >> >> You shouldn't over-generalize. Some built-ins do short-circuit __eq__ >> when the objects are identical. I believe that strings and ints both >> do. Other types might not. > > This optimization is not implemented for Unicode strings. That does not match my experience. In Python 3.2, I generate a large unicode string, and an equal but not identical copy: s = "a?cdef"*100000 t = "a" + s[1:] assert s is not t and s == t Using timeit, s == s is about 10000 times faster than s == t. -- Steven From python at mrabarnett.plus.com Thu Oct 4 18:05:43 2012 From: python at mrabarnett.plus.com (MRAB) Date: Thu, 04 Oct 2012 17:05:43 +0100 Subject: [Python-ideas] checking for identity before comparing built-in objects In-Reply-To: <506DB100.2070105@pearwood.info> References: <506D94EE.30808@pearwood.info> <506DB100.2070105@pearwood.info> Message-ID: <506DB3D7.5060804@mrabarnett.plus.com> On 2012-10-04 16:53, Steven D'Aprano wrote: > On 05/10/12 01:08, Victor Stinner wrote: >> 2012/10/4 Steven D'Aprano: >>> On 04/10/12 21:48, Max Moroz wrote: >>>> >>>> It seems that built-in classes do not short-circuit `__eq__` method >>>> when the objects are identical, at least in CPython: >>>> >>>> f = frozenset(range(200000000)) >>>> f1 = f >>>> f1 == f # this operation will take about 1 sec on my machine >>> >>> >>> You shouldn't over-generalize. Some built-ins do short-circuit __eq__ >>> when the objects are identical. I believe that strings and ints both >>> do. Other types might not. >> >> This optimization is not implemented for Unicode strings. > > That does not match my experience. 
In Python 3.2, I generate a large > unicode string, and an equal but not identical copy: > > s = "a?cdef"*100000 > t = "a" + s[1:] > assert s is not t and s == t > > > Using timeit, s == s is about 10000 times faster than s == t. > In Python 3.3 I get a similar result. From oscar.j.benjamin at gmail.com Thu Oct 4 18:48:59 2012 From: oscar.j.benjamin at gmail.com (Oscar Benjamin) Date: Thu, 4 Oct 2012 17:48:59 +0100 Subject: [Python-ideas] checking for identity before comparing built-in objects In-Reply-To: <506DB3D7.5060804@mrabarnett.plus.com> References: <506D94EE.30808@pearwood.info> <506DB100.2070105@pearwood.info> <506DB3D7.5060804@mrabarnett.plus.com> Message-ID: On 4 October 2012 17:05, MRAB wrote: > On 2012-10-04 16:53, Steven D'Aprano wrote: >> >> On 05/10/12 01:08, Victor Stinner wrote: >>> >>> 2012/10/4 Steven D'Aprano: >>>> >>>> On 04/10/12 21:48, Max Moroz wrote: >>>>> >>>>> >>>>> It seems that built-in classes do not short-circuit `__eq__` method >>>>> when the objects are identical, at least in CPython: >>>>> >>>>> f = frozenset(range(200000000)) >>>>> f1 = f >>>>> f1 == f # this operation will take about 1 sec on my machine >>>> >>>> >>>> >>>> You shouldn't over-generalize. Some built-ins do short-circuit __eq__ >>>> when the objects are identical. I believe that strings and ints both >>>> do. Other types might not. >>> >>> >>> This optimization is not implemented for Unicode strings. >> >> >> That does not match my experience. In Python 3.2, I generate a large >> unicode string, and an equal but not identical copy: >> >> s = "a?cdef"*100000 >> t = "a" + s[1:] >> assert s is not t and s == t >> >> >> Using timeit, s == s is about 10000 times faster than s == t. >> > In Python 3.3 I get a similar result. This was discussed not long ago in a different thread. Here is the line: http://hg.python.org/cpython/file/bd8afb90ebf2/Objects/unicodeobject.c#l10508 As I understood it that line is the reason that comparisons for interned strings are faster. Oscar From grosser.meister.morti at gmx.net Thu Oct 4 18:51:23 2012 From: grosser.meister.morti at gmx.net (=?ISO-8859-1?Q?Mathias_Panzenb=F6ck?=) Date: Thu, 04 Oct 2012 18:51:23 +0200 Subject: [Python-ideas] checking for identity before comparing built-in objects In-Reply-To: <506D94EE.30808@pearwood.info> References: <506D94EE.30808@pearwood.info> Message-ID: <506DBE8B.1010907@gmx.net> On 10/04/2012 03:53 PM, Steven D'Aprano wrote: > On 04/10/12 21:48, Max Moroz wrote: >> It seems that built-in classes do not short-circuit `__eq__` method >> when the objects are identical, at least in CPython: >> >> f = frozenset(range(200000000)) >> f1 = f >> f1 == f # this operation will take about 1 sec on my machine > > You shouldn't over-generalize. Some built-ins do short-circuit __eq__ > when the objects are identical. I believe that strings and ints both > do. Other types might not. > > >> Is there any disadvantage to checking whether the equality was called >> with the same object, and if it was, return `True` right away? > > That would break floats and Decimals, both of which support NANs. > > The decision whether or not to optimize __eq__ should be left up to the > type. Some types, for example, might decide to optimize x == x even if > x contains a NAN or other objects that break reflexivity of equality. > Other types might prefer not to. > > (Please do not start an argument about NANs and reflexivity. That's > been argued to death, and there are very good reasons for the IEEE 754 > standard to define NANs the way they do.) 
> > Since frozensets containing NANs are rare (I presume), I think it is > reasonable to optimize frozenset equality. But I do not think it is > reasonable for Python to mandate identity checking before __eq__. > But it seems like set and frozenset behave like this anyway (using "is" to compare it's items): >>> frozenset([float("nan")]) == frozenset([float("nan")]) False >>> s = frozenset([float("nan")]) >>> s == s True >>> NaN = float("nan") >>> NaN == NaN False >>> frozenset([NaN]) == frozenset([NaN]) True So the "is" optimization should not change it's semantics. (I tested this in Python 2.7.3 and 3.2.3) > > >> I noticed this when trying to memoize a function that has large >> frozenset arguments. While hashing of a large argument is very fast >> after it's done once (hash value is presumably cached), the equality >> comparison is always slow even against itself. So when the same large >> argument is provided over and over, memoization is slow. > > I'm not sure what you are doing here, because dicts (at least in Python > 3.2) already short-circuit equality: > > py> NAN = float('nan') > py> NAN == NAN > False > py> d = {NAN: 42} > py> d[NAN] > 42 > > Actually, that behaviour goes back to at least 2.4, so I'm not sure how > you are doing memoization and not seeing the same optimization. > > > From maxmoroz at gmail.com Thu Oct 4 19:49:45 2012 From: maxmoroz at gmail.com (Max Moroz) Date: Thu, 4 Oct 2012 10:49:45 -0700 Subject: [Python-ideas] checking for identity before comparing built-in objects In-Reply-To: <506D94EE.30808@pearwood.info> References: <506D94EE.30808@pearwood.info> Message-ID: On Thu, Oct 4, 2012 at 6:53 AM, Steven D'Aprano wrote: > I'm not sure what you are doing here, because dicts (at least in Python > 3.2) already short-circuit equality: > > py> NAN = float('nan') > py> NAN == NAN > False > py> d = {NAN: 42} > py> d[NAN] > 42 > > Actually, that behaviour goes back to at least 2.4, so I'm not sure how > you are doing memoization and not seeing the same optimization. It was my mistake... I do see this optimization now that I know where to look for it. Thanks for clarifying this. From maxmoroz at gmail.com Thu Oct 4 19:50:50 2012 From: maxmoroz at gmail.com (Max Moroz) Date: Thu, 4 Oct 2012 10:50:50 -0700 Subject: [Python-ideas] checking for identity before comparing built-in objects In-Reply-To: <506DB3D7.5060804@mrabarnett.plus.com> References: <506D94EE.30808@pearwood.info> <506DB100.2070105@pearwood.info> <506DB3D7.5060804@mrabarnett.plus.com> Message-ID: On Thu, Oct 4, 2012 at 7:19 AM, MRAB wrote: > Think of it this way: > > Calculation A returns NaN for some reason > > Calculation B also returns NaN for some reason > > Have they really returned the same result? Just because they're both > NaN doesn't mean that they're the _same_ NaN... Someone who performs two calculations with float numbers should never compare their results for equality. It's really a bug to rely on that comparison: # this is a bug # since the result of this comparison for regular numbers is unpredictable # so doesn't it really matter how this behaves when NaNs are compared? if a/b == c/d: # ... On the other hand, comparing a number to another number, when none of the two numbers are involved in a calculation, is perfectly fine: # this is not a bug # too bad that it won't work as expected # when input1 == input2 == 'nan' a = float(input1) b = float(input2) if a == b: # ... 
So it seems to me your argument is this: "let's break the expectations of developers who are writing valid code, in order to partially meet the expectations of developers who are writing buggy code". If so, I disagree. From solipsis at pitrou.net Fri Oct 5 01:00:10 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 5 Oct 2012 01:00:10 +0200 Subject: [Python-ideas] checking for identity before comparing built-in objects References: <506D94EE.30808@pearwood.info> Message-ID: <20121005010010.45c4a1c0@pitrou.net> On Thu, 4 Oct 2012 17:08:40 +0200 Victor Stinner wrote: > PyObject_RichCompareBool() implements this optimization which leads to > incorrect results: > > nan = float("nan") > mytuple = (nan,) > assert mytuple != mytuple # fails > > I think that the optimization should be implemented for Unicode > strings, but disabled in PyObject_RichCompareBool(). I think we should wait for someone to complain before disabling it. It's a useful optimization. Regards Antoine. -- Software development and contracting: http://pro.pitrou.net From ben+python at benfinney.id.au Thu Oct 4 00:20:34 2012 From: ben+python at benfinney.id.au (Ben Finney) Date: Thu, 04 Oct 2012 08:20:34 +1000 Subject: [Python-ideas] Visually confusable unicode characters in identifiers References: <506860DA.7060905@pearwood.info> <5069BFB7.7070207@gmx.net> <5069CC8F.2070605@gmx.net> <5069EFF9.50706@gmx.net> <7w7gr9ebzv.fsf@benfinney.id.au> <87ipatl2oo.fsf@uwakimon.sk.tsukuba.ac.jp> <7w1uhhdrxb.fsf@benfinney.id.au> <87bogkktod.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <7wpq4zci4t.fsf@benfinney.id.au> "Stephen J. Turnbull" writes: > Ben Finney writes: > > Right. So you've solved it for one program only, not the OS > > You seem to be under a misconception. Emacs *is* an OS [?] ? all it needs is a good editor? :-) (I'm claiming permission for that snark because Emacs is my primary editor.) > > I'm saying the OS is the right place to solve it, by installing an > > appropriate input method (or whatever each OS calls them). > > I doubt very many people used to and fond of LaTeX would agree with > you, since AFAIK there aren't any OSes providing TeX macros as an > input method. I've shown several LaTeX-comfortable people IBus on GNOME and/or KDE (for GNU+Linux), and they were very glad that it has a LaTeX input method. So anyone who is fond of LaTeX and has IBus or an equivalent input method engine on their OS can agree. > AFAICS it's not available on my Mac. That's a shame. Maybe some OS vendors don't want to support users extending the OS functionality? Or maybe your OS does have such a thing available. I haven't been motivated to look for it. > While I don't particularly favor it, it may be the best compromise, as > many people are familiar with it, and many many symbols are available > with familiar, intuitive names so that non-TeXnical typists can often > guess them. Agreed. Which is why I advocate installing such an input method in one's OS input method engine, so that input method is available for all applications. -- \ ?I thought I'd begin by reading a poem by Shakespeare, but then | `\ I thought ?Why should I? He never reads any of mine.?? ?Spike | _o__) Milligan | Ben Finney From stephen at xemacs.org Fri Oct 5 05:11:34 2012 From: stephen at xemacs.org (Stephen J. 
Turnbull) Date: Fri, 05 Oct 2012 12:11:34 +0900 Subject: [Python-ideas] Visually confusable unicode characters in identifiers In-Reply-To: <7wpq4zci4t.fsf@benfinney.id.au> References: <506860DA.7060905@pearwood.info> <5069BFB7.7070207@gmx.net> <5069CC8F.2070605@gmx.net> <5069EFF9.50706@gmx.net> <7w7gr9ebzv.fsf@benfinney.id.au> <87ipatl2oo.fsf@uwakimon.sk.tsukuba.ac.jp> <7w1uhhdrxb.fsf@benfinney.id.au> <87bogkktod.fsf@uwakimon.sk.tsukuba.ac.jp> <7wpq4zci4t.fsf@benfinney.id.au> Message-ID: <87txu9k3yx.fsf@uwakimon.sk.tsukuba.ac.jp> Ben Finney writes: > I've shown several LaTeX-comfortable people IBus on GNOME and/or KDE > (for GNU+Linux), and they were very glad that it has a LaTeX input > method. I'm happy to be proved wrong! > > AFAICS it's not available on my Mac. > > That's a shame. Maybe some OS vendors don't want to support users > extending the OS functionality? Or maybe your OS does have such a thing > available. I haven't been motivated to look for it. I have looked for it; if it's available on Mac OS X, it's not easy to find. I suspect the same is true for Windows. > Agreed. Which is why I advocate installing such an input method in one's > OS input method engine, so that input method is available for all > applications. Whatever makes you think I don't? That's *exactly* why I live in XEmacs, because it provides me with a portable environment for mixing English and math with a language whose orthography puts Brainf*ck syntax to shame. But pragmatically speaking, Unicode support is a sore point for Python. "Screw you if you don't know how to conveniently input integral signs on your OS" is not a message we want to be sending. From steve at pearwood.info Fri Oct 5 06:52:55 2012 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 5 Oct 2012 14:52:55 +1000 Subject: [Python-ideas] checking for identity before comparing built-in objects In-Reply-To: <20121005010010.45c4a1c0@pitrou.net> References: <506D94EE.30808@pearwood.info> <20121005010010.45c4a1c0@pitrou.net> Message-ID: <20121005045254.GA14666@ando> On Fri, Oct 05, 2012 at 01:00:10AM +0200, Antoine Pitrou wrote: > On Thu, 4 Oct 2012 17:08:40 +0200 > Victor Stinner > wrote: > > PyObject_RichCompareBool() implements this optimization which leads to > > incorrect results: > > > > nan = float("nan") > > mytuple = (nan,) > > assert mytuple != mytuple # fails > > > > I think that the optimization should be implemented for Unicode > > strings, but disabled in PyObject_RichCompareBool(). > > I think we should wait for someone to complain before disabling it. > It's a useful optimization. +1 I will go to the wall to defend correct IEEE 754 semantics for NANs, but I also support containers that optimise away those semantics by default. I think it's too early to talk about disabling it without even the report of a bug caused by it. -- Steven From andy at insectnation.org Fri Oct 5 11:27:28 2012 From: andy at insectnation.org (Andy Buckley) Date: Fri, 05 Oct 2012 11:27:28 +0200 Subject: [Python-ideas] History stepping in interactive session? 
Message-ID: <506EA800.1080106@insectnation.org> A couple of weeks ago I posted a question on superuser.com about whether there is a way to get the same *very* convenient stepping-through-command-history behaviour in an interactive Python interpreter session as is possible in (at least) the bash shell with the Ctrl-o keybinding: http://superuser.com/questions/477997/key-binding-to-interactively-execute-commands-from-python-interpreter-history-in I was spurred to ask this question by a painful development experience full of Up Up Up Up Up Enter Up Up Up Up Up Enter ... keypresses to repeat a previous set of Python commands/statements that weren't worth putting in a script file, or which I wanted to make very minor changes to on each iteration. As you might have noticed, I didn't get any answers, which either means that I'm the only person in the world to think this is an issue worth getting bothered about, or that there is no such behaviour available. Perhaps both -- but my feeling is that if this behaviour were available and well-known, it would become heavily used and very popular. As many other readline behaviours *do* work, this one would be really nice to have -- any chance that it could be added to a future release? (if it's not already there via some secret binding) Thanks! Andy From stephen at xemacs.org Fri Oct 5 12:26:03 2012 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Fri, 05 Oct 2012 19:26:03 +0900 Subject: [Python-ideas] History stepping in interactive session? In-Reply-To: <506EA800.1080106@insectnation.org> References: <506EA800.1080106@insectnation.org> Message-ID: <87a9w18bb8.fsf@uwakimon.sk.tsukuba.ac.jp> Andy Buckley writes: > A couple of weeks ago I posted a question on superuser.com Maybe it's a bug. (See below.) Have you checked the tracker? Have you posted to python-list? That's a better place than here to get that kind of information. > As you might have noticed, The people on this list (and on python-dev) probably don't pay much attention to questions on superuser.com, unless they're the kind of people who hang out on python-list. Sorry for not being much help, but after trying the obvious (read the GNU bash manpage, grep the output of "bind -p" to find out what C-o does, check that python does link to "True GNU" readline on my platform, try python, and when that didn't work, restart python after doing "bind -p >> ~/.inputc", which didn't work either), I don't know. It *might* be a bug, or you could file for an RFE if it's by design. From solipsis at pitrou.net Fri Oct 5 14:09:27 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 5 Oct 2012 14:09:27 +0200 Subject: [Python-ideas] History stepping in interactive session? References: <506EA800.1080106@insectnation.org> Message-ID: <20121005140927.759293ed@pitrou.net> On Fri, 05 Oct 2012 11:27:28 +0200 Andy Buckley wrote: > A couple of weeks ago I posted a question on superuser.com about whether > there is a way to get the same *very* convenient > stepping-through-command-history behaviour in an interactive Python > interpreter session as is possible in (at least) the bash shell with the > Ctrl-o keybinding: The interactive interpreter (and I mean the default one, not third-party choices like IPython) uses libreadline for its editing and history functionality, so it's really a question about libreadline you're asking. 
I don't know if it allows such customization, but perhaps the Web site has the answer you're looking for: http://www.gnu.org/software/readline/ http://cnswww.cns.cwru.edu/php/chet/readline/rluserman.html Regards Antoine. -- Software development and contracting: http://pro.pitrou.net From oscar.j.benjamin at gmail.com Fri Oct 5 14:43:00 2012 From: oscar.j.benjamin at gmail.com (Oscar Benjamin) Date: Fri, 5 Oct 2012 13:43:00 +0100 Subject: [Python-ideas] History stepping in interactive session? In-Reply-To: <506EA800.1080106@insectnation.org> References: <506EA800.1080106@insectnation.org> Message-ID: On 5 October 2012 10:27, Andy Buckley wrote: > A couple of weeks ago I posted a question on superuser.com about whether > there is a way to get the same *very* convenient > stepping-through-command-history behaviour in an interactive Python > interpreter session as is possible in (at least) the bash shell with the > Ctrl-o keybinding: > > http://superuser.com/questions/477997/key-binding-to-interactively-execute-commands-from-python-interpreter-history-in > > I was spurred to ask this question by a painful development experience > full of Up Up Up Up Up Enter Up Up Up Up Up Enter ... keypresses to > repeat a previous set of Python commands/statements that weren't worth > putting in a script file, or which I wanted to make very minor changes > to on each iteration. As soon as I find myself doing this I quit the interpreter and start ipython. The feature that ipython has that makes what you are doing much easier is the magic %edit command. Just type In [1]: edit tmp.py and your favourite editor will open up allowing you to write/edit some code. When you close the editor, ipython will run the code from tmp.py within the interactive session (as if you had typed it in directly). If you want to rerun that code with modifications just type 'edit tmp.py' again and you can make the modifications within your editor. Oscar From steve at pearwood.info Fri Oct 5 16:24:39 2012 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 06 Oct 2012 00:24:39 +1000 Subject: [Python-ideas] History stepping in interactive session? In-Reply-To: <20121005140927.759293ed@pitrou.net> References: <506EA800.1080106@insectnation.org> <20121005140927.759293ed@pitrou.net> Message-ID: <506EEDA7.9000108@pearwood.info> On 05/10/12 22:09, Antoine Pitrou wrote: > On Fri, 05 Oct 2012 11:27:28 +0200 > Andy Buckley wrote: >> A couple of weeks ago I posted a question on superuser.com about whether >> there is a way to get the same *very* convenient >> stepping-through-command-history behaviour in an interactive Python >> interpreter session as is possible in (at least) the bash shell with the >> Ctrl-o keybinding: > > The interactive interpreter (and I mean the default one, not > third-party choices like IPython) uses libreadline for its > editing and history functionality, so it's really a question about > libreadline you're asking. I don't think so. I'm not an expert on readline, but it seems to me to be a Python bug. 
In bash, I check for the existence of the "operate-and-get-next" command, and sure enough it is bound to C-o (Ctrl-o) as expected: [steve at ando ~]$ bind -p | grep operate "\C-o": operate-and-get-next I don't believe that there is any direct mechanism for querying the current readline bindings in Python, but I can fake it with the "dump-functions" command: import readline readline.parse_and_bind(r'"\C-xd": dump-functions') If I then type Ctrl-x d at the interactive interpreter, readline dumps the function bindings to screen: py> readline.parse_and_bind(r'"\C-xd": dump-functions') py> abort can be found on "\C-g", "\C-x\C-g", "\M-\C-g". accept-line can be found on "\C-j", "\C-m". arrow-key-prefix is not bound to any keys backward-byte is not bound to any keys backward-char can be found on "\C-b", "\M-OD", "\M-[D". [...] operate-and-get-next is absent from the list. I don't mean that it is not bound. It just isn't there at all. If I nevertheless try to use it: readline.parse_and_bind(r'"\C-o": operate-and-get-next') it does *not* enable Ctrl-o as expected, operate-and-get-next remains absent from the list of bindings. I have checked this on both Python 2.7 and 3.3.0rc3 under Centos 5, and on 3.3.0rc3 under Debian Squeeze. -- Steven From solipsis at pitrou.net Fri Oct 5 16:30:00 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 5 Oct 2012 16:30:00 +0200 Subject: [Python-ideas] History stepping in interactive session? References: <506EA800.1080106@insectnation.org> <20121005140927.759293ed@pitrou.net> <506EEDA7.9000108@pearwood.info> Message-ID: <20121005163000.775a719e@pitrou.net> On Sat, 06 Oct 2012 00:24:39 +1000 Steven D'Aprano wrote: > On 05/10/12 22:09, Antoine Pitrou wrote: > > On Fri, 05 Oct 2012 11:27:28 +0200 > > Andy Buckley wrote: > >> A couple of weeks ago I posted a question on superuser.com about whether > >> there is a way to get the same *very* convenient > >> stepping-through-command-history behaviour in an interactive Python > >> interpreter session as is possible in (at least) the bash shell with the > >> Ctrl-o keybinding: > > > > The interactive interpreter (and I mean the default one, not > > third-party choices like IPython) uses libreadline for its > > editing and history functionality, so it's really a question about > > libreadline you're asking. > > > I don't think so. I'm not an expert on readline, but it seems to me to be a > Python bug. [snip useful explanations] Well, if there is a bug, then it should be reported on the tracker (and a patch uploaded, if possible :-)). Regards Antoine. -- Software development and contracting: http://pro.pitrou.net From phd at phdru.name Fri Oct 5 15:17:17 2012 From: phd at phdru.name (Oleg Broytman) Date: Fri, 5 Oct 2012 17:17:17 +0400 Subject: [Python-ideas] History stepping in interactive session? In-Reply-To: <20121005140927.759293ed@pitrou.net> References: <506EA800.1080106@insectnation.org> <20121005140927.759293ed@pitrou.net> Message-ID: <20121005131717.GA15819@iskra.aviel.ru> On Fri, Oct 05, 2012 at 02:09:27PM +0200, Antoine Pitrou wrote: > http://cnswww.cns.cwru.edu/php/chet/readline/rluserman.html The manual lacks the function "operate-and-get-next" bound in bash to Ctrl-O. Either the manual is old or the function is not a function of readline but rather one implemented by bash. That requires further investigation (which I'm not going to do). Oleg. -- Oleg Broytman http://phdru.name/ phd at phdru.name Programmers don't die, they just GOSUB without RETURN. 
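In the meantime, something like the Up-Up-Enter workflow can be approximated at the Python level with the history API that the readline module does expose. The helper below is a rough, invented sketch that simply re-executes a range of single-line history entries in the interactive namespace; it is not a substitute for a real operate-and-get-next binding:

import readline
import __main__

def replay(start, count=1):
    # Re-execute `count` consecutive history entries, starting at the
    # 1-based index `start`, in the interactive namespace.  Handles only
    # single-line statements; purely illustrative.
    last = readline.get_current_history_length()
    for index in range(start, min(start + count, last + 1)):
        line = readline.get_history_item(index)
        print(">>>", line)
        exec(line, __main__.__dict__)

After defining it at the prompt, replay(5, 3) would re-run history items 5 through 7.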
From nadeem.vawda at gmail.com Fri Oct 5 17:23:08 2012 From: nadeem.vawda at gmail.com (Nadeem Vawda) Date: Fri, 5 Oct 2012 17:23:08 +0200 Subject: [Python-ideas] History stepping in interactive session? In-Reply-To: <20121005131717.GA15819@iskra.aviel.ru> References: <506EA800.1080106@insectnation.org> <20121005140927.759293ed@pitrou.net> <20121005131717.GA15819@iskra.aviel.ru> Message-ID: On Fri, Oct 5, 2012 at 3:17 PM, Oleg Broytman wrote: > On Fri, Oct 05, 2012 at 02:09:27PM +0200, Antoine Pitrou wrote: >> http://cnswww.cns.cwru.edu/php/chet/readline/rluserman.html > > The manual lacks the function "operate-and-get-next" bound in bash to > Ctrl-O. Either the manual is old or the function is not a function of > readline but rather one implemented by bash. That requires further > investigation (which I'm not going to do). The function is implemented by bash; see operate_and_get_next() in bashline.c. -Nadeem From phd at phdru.name Fri Oct 5 17:39:27 2012 From: phd at phdru.name (Oleg Broytman) Date: Fri, 5 Oct 2012 19:39:27 +0400 Subject: [Python-ideas] History stepping in interactive session? In-Reply-To: References: <506EA800.1080106@insectnation.org> <20121005140927.759293ed@pitrou.net> <20121005131717.GA15819@iskra.aviel.ru> Message-ID: <20121005153927.GA19322@iskra.aviel.ru> On Fri, Oct 05, 2012 at 05:23:08PM +0200, Nadeem Vawda wrote: > On Fri, Oct 5, 2012 at 3:17 PM, Oleg Broytman wrote: > > On Fri, Oct 05, 2012 at 02:09:27PM +0200, Antoine Pitrou wrote: > >> http://cnswww.cns.cwru.edu/php/chet/readline/rluserman.html > > > > The manual lacks the function "operate-and-get-next" bound in bash to > > Ctrl-O. Either the manual is old or the function is not a function of > > readline but rather one implemented by bash. That requires further > > investigation (which I'm not going to do). > > The function is implemented by bash; see operate_and_get_next() in bashline.c. Thanks! That closes the issue -- the function are to be implemented by (a user of) Python if one wants to have it in Python. Oleg. -- Oleg Broytman http://phdru.name/ phd at phdru.name Programmers don't die, they just GOSUB without RETURN. From solipsis at pitrou.net Fri Oct 5 20:25:34 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 5 Oct 2012 20:25:34 +0200 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths Message-ID: <20121005202534.5f721292@pitrou.net> Hello, This PEP is a resurrection of the idea of having object-oriented filesystem paths in the stdlib. It comes with a general API proposal as well as a specific implementation (*). The implementation is young and discussion is quite open. (*) http://pypi.python.org/pypi/pathlib/ Regards Antoine. PS: You can all admire my ASCII-art skills. PEP: 428 Title: The pathlib module -- object-oriented filesystem paths Version: $Revision$ Last-Modified: $Date Author: Antoine Pitrou Status: Draft Type: Standards Track Content-Type: text/x-rst Created: 30-July-2012 Python-Version: 3.4 Post-History: Abstract ======== This PEP proposes the inclusion of a third-party module, `pathlib`_, in the standard library. The inclusion is proposed under the provisional label, as described in :pep:`411`. Therefore, API changes can be done, either as part of the PEP process, or after acceptance in the standard library (and until the provisional label is removed). The aim of this library is to provide a simple hierarchy of classes to handle filesystem paths and the common operations users do over them. .. 
_`pathlib`: http://pypi.python.org/pypi/pathlib/ Related work ============ An object-oriented API for filesystem paths has already been proposed and rejected in :pep:`355`. Several third-party implementations of the idea of object-oriented filesystem paths exist in the wild: * The historical `path.py module`_ by Jason Orendorff, Jason R. Coombs and others, which provides a ``str``-subclassing ``Path`` class; * Twisted's slightly specialized `FilePath class`_; * An `AlternativePathClass proposal`_, subclassing ``tuple`` rather than ``str``; * `Unipath`_, a variation on the str-subclassing approach with two public classes, an ``AbstractPath`` class for operations which don't do I/O and a ``Path`` class for all common operations. This proposal attempts to learn from these previous attempts and the rejection of :pep:`355`. .. _`path.py module`: https://github.com/jaraco/path.py .. _`FilePath class`: http://twistedmatrix.com/documents/current/api/twisted.python.filepath.FilePath.html .. _`AlternativePathClass proposal`: http://wiki.python.org/moin/AlternativePathClass .. _`Unipath`: https://bitbucket.org/sluggo/unipath/overview Why an object-oriented API ========================== The rationale to represent filesystem paths using dedicated classes is the same as for other kinds of stateless objects, such as dates, times or IP addresses. Python has been slowly moving away from strictly replicating the C language's APIs to providing better, more helpful abstractions around all kinds of common functionality. Even if this PEP isn't accepted, it is likely that another form of filesystem handling abstraction will be adopted one day into the standard library. Indeed, many people will prefer handling dates and times using the high-level objects provided by the ``datetime`` module, rather than using numeric timestamps and the ``time`` module API. Moreover, using a dedicated class allows to enable desirable behaviours by default, for example the case insensitivity of Windows paths. Proposal ======== Class hierarchy --------------- The `pathlib`_ module implements a simple hierarchy of classes:: +----------+ | | ---------| PurePath |-------- | | | | | +----------+ | | | | | | | v | v +---------------+ | +------------+ | | | | | | PurePosixPath | | | PureNTPath | | | | | | +---------------+ | +------------+ | v | | +------+ | | | | | | -------| Path |------ | | | | | | | | | +------+ | | | | | | | | | | v v v v +-----------+ +--------+ | | | | | PosixPath | | NTPath | | | | | +-----------+ +--------+ This hierarchy divides path classes along two dimensions: * a path class can be either pure or concrete: pure classes support only operations that don't need to do any actual I/O, which are most path manipulation operations; concrete classes support all the operations of pure classes, plus operations that do I/O. * a path class is of a given flavour according to the kind of operating system paths it represents. `pathlib`_ implements two flavours: NT paths for the filesystem semantics embodied in Windows systems, POSIX paths for other systems (``os.name``'s terminology is re-used here). Any pure class can be instantiated on any system: for example, you can manipulate ``PurePosixPath`` objects under Windows, ``PureNTPath`` objects under Unix, and so on. However, concrete classes can only be instantiated on a matching system: indeed, it would be error-prone to start doing I/O with ``NTPath`` objects under Unix, or vice-versa. 
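For illustration, manipulating the foreign flavour from a Unix machine only ever needs the pure class (a brief example that merely combines operations described later in this proposal)::

    >>> p = PureNTPath('c:/Windows', 'System32')
    >>> p
    PureNTPath('c:\\Windows\\System32')
    >>> p.drive, p.root
    ('c:', '\\')
    >>> str(p)
    'c:\\Windows\\System32'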
Furthermore, there are two base classes which also act as system-dependent
factories: ``PurePath`` will instantiate either a ``PurePosixPath`` or a
``PureNTPath`` depending on the operating system.  Similarly, ``Path`` will
instantiate either a ``PosixPath`` or an ``NTPath``.

It is expected that, in most uses, using the ``Path`` class is adequate,
which is why it has the shortest name of all.


No confusion with builtins
--------------------------

In this proposal, the path classes do not derive from a builtin type.  This
contrasts with some other Path class proposals which were derived from
``str``.  They also do not pretend to implement the sequence protocol: if
you want a path to act as a sequence, you have to look up a dedicated
attribute (the ``parts`` attribute).

By not passing themselves off as builtin types, the path classes minimize
the potential for confusion if they are combined by accident with genuine
builtin types.


Immutability
------------

Path objects are immutable, which makes them hashable and also prevents a
class of programming errors.


Sane behaviour
--------------

Little of the functionality from os.path is reused.  Many os.path functions
are tied by backwards compatibility to confusing or plain wrong behaviour
(for example, the fact that ``os.path.abspath()`` simplifies ".." path
components without resolving symlinks first).

Also, using classes instead of plain strings helps make system-dependent
behaviours natural.  For example, comparing and ordering Windows path
objects is case-insensitive, and path separators are automatically
converted to the platform default.


Useful notations
----------------

The API tries to provide useful notations all the while avoiding magic.
Some examples::

    >>> p = Path('/home/antoine/pathlib/setup.py')
    >>> p.name
    'setup.py'
    >>> p.ext
    '.py'
    >>> p.root
    '/'
    >>> p.parts
    >>> list(p.parents())
    [PosixPath('/home/antoine/pathlib'), PosixPath('/home/antoine'), PosixPath('/home'), PosixPath('/')]
    >>> p.exists()
    True
    >>> p.st_size
    928


Pure paths API
==============

The philosophy of the ``PurePath`` API is to provide a consistent array of
useful path manipulation operations, without exposing a hodge-podge of
functions like ``os.path`` does.


Definitions
-----------

First a couple of conventions:

* All paths can have a drive and a root.  For POSIX paths, the drive is
  always empty.

* A relative path has neither drive nor root.

* A POSIX path is absolute if it has a root.  A Windows path is absolute
  if it has both a drive *and* a root.  A Windows UNC path (e.g.
  ``\\some\\share\\myfile.txt``) always has a drive and a root
  (here, ``\\some\\share`` and ``\\``, respectively).

* A path which has either a drive *or* a root is said to be anchored.
  Its anchor is the concatenation of the drive and root.  Under POSIX,
  "anchored" is the same as "absolute".


Construction and joining
------------------------

We will present construction and joining together since they expose
similar semantics.
The simplest way to construct a path is to pass it its string
representation::

    >>> PurePath('setup.py')
    PurePosixPath('setup.py')

Extraneous path separators and ``"."`` components are eliminated::

    >>> PurePath('a///b/c/./d/')
    PurePosixPath('a/b/c/d')

If you pass several arguments, they will be automatically joined::

    >>> PurePath('docs', 'Makefile')
    PurePosixPath('docs/Makefile')

Joining semantics are similar to os.path.join, in that anchored paths
ignore the information from the previously joined components::

    >>> PurePath('/etc', '/usr', 'bin')
    PurePosixPath('/usr/bin')

However, with Windows paths, the drive is retained as necessary::

    >>> PureNTPath('c:/foo', '/Windows')
    PureNTPath('c:\\Windows')
    >>> PureNTPath('c:/foo', 'd:')
    PureNTPath('d:')

Calling the constructor without any argument creates a path object pointing
to the logical "current directory"::

    >>> PurePosixPath()
    PurePosixPath('.')

A path can be joined with another using the ``__getitem__`` operator::

    >>> p = PurePosixPath('foo')
    >>> p['bar']
    PurePosixPath('foo/bar')
    >>> p[PurePosixPath('bar')]
    PurePosixPath('foo/bar')

As with constructing, multiple path components can be specified at once::

    >>> p['bar/xyzzy']
    PurePosixPath('foo/bar/xyzzy')

A join() method is also provided, with the same behaviour.  It can serve
as a factory function::

    >>> path_factory = p.join
    >>> path_factory('bar')
    PurePosixPath('foo/bar')


Representing
------------

To represent a path (e.g. to pass it to third-party libraries), just call
``str()`` on it::

    >>> p = PurePath('/home/antoine/pathlib/setup.py')
    >>> str(p)
    '/home/antoine/pathlib/setup.py'
    >>> p = PureNTPath('c:/windows')
    >>> str(p)
    'c:\\windows'

To force the string representation with forward slashes, use the
``as_posix()`` method::

    >>> p.as_posix()
    'c:/windows'

To get the bytes representation (which might be useful under Unix systems),
call ``bytes()`` on it, or use the ``as_bytes()`` method::

    >>> bytes(p)
    b'/home/antoine/pathlib/setup.py'


Properties
----------

Five simple properties are provided on every path (each can be empty)::

    >>> p = PureNTPath('c:/pathlib/setup.py')
    >>> p.drive
    'c:'
    >>> p.root
    '\\'
    >>> p.anchor
    'c:\\'
    >>> p.name
    'setup.py'
    >>> p.ext
    '.py'


Sequence-like access
--------------------

The ``parts`` property provides read-only sequence access to a path
object::

    >>> p = PurePosixPath('/etc/init.d')
    >>> p.parts

Simple indexing returns the individual path component as a string, while
slicing returns a new path object constructed from the selected
components::

    >>> p.parts[-1]
    'init.d'
    >>> p.parts[:-1]
    PurePosixPath('/etc')

Windows paths handle the drive and the root as a single path component::

    >>> p = PureNTPath('c:/setup.py')
    >>> p.parts
    >>> p.root
    '\\'
    >>> p.parts[0]
    'c:\\'

(separating them would be wrong, since ``C:`` is not the parent of
``C:\\``).

The ``parent()`` method returns an ancestor of the path::

    >>> p = PureNTPath('c:/python33/bin/python.exe')
    >>> p.parent()
    PureNTPath('c:\\python33\\bin')
    >>> p.parent(2)
    PureNTPath('c:\\python33')
    >>> p.parent(3)
    PureNTPath('c:\\')

The ``parents()`` method automates repeated invocations of ``parent()``,
until the anchor is reached::

    >>> p = PureNTPath('c:/python33/bin/python.exe')
    >>> for parent in p.parents(): parent
    ...
    PureNTPath('c:\\python33\\bin')
    PureNTPath('c:\\python33')
    PureNTPath('c:\\')


Querying
--------

``is_relative()`` returns True if the path is relative (see definition
above), False otherwise.

``is_reserved()`` returns True if a Windows path is a reserved path such
as ``CON`` or ``NUL``.  It always returns False for POSIX paths.
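For example (an illustrative sketch following the definitions given
earlier)::

    >>> PurePosixPath('foo/bar').is_relative()
    True
    >>> PurePosixPath('/foo/bar').is_relative()
    False
    >>> PureNTPath('NUL').is_reserved()
    True
    >>> PurePosixPath('NUL').is_reserved()
    False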
``match()`` matches the path against a glob pattern:: >>> PureNTPath('c:/PATHLIB/setup.py').match('c:*lib/*.PY') True ``relative()`` returns a new relative path by stripping the drive and root:: >>> PurePosixPath('setup.py').relative() PurePosixPath('setup.py') >>> PurePosixPath('/setup.py').relative() PurePosixPath('setup.py') ``relative_to()`` computes the relative difference of a path to another:: >>> PurePosixPath('/usr/bin/python').relative_to('/usr') PurePosixPath('bin/python') ``normcase()`` returns a case-folded version of the path for NT paths:: >>> PurePosixPath('CAPS').normcase() PurePosixPath('CAPS') >>> PureNTPath('CAPS').normcase() PureNTPath('caps') Concrete paths API ================== In addition to the operations of the pure API, concrete paths provide additional methods which actually access the filesystem to query or mutate information. Constructing ------------ The classmethod ``cwd()`` creates a path object pointing to the current working directory in absolute form:: >>> Path.cwd() PosixPath('/home/antoine/pathlib') File metadata ------------- The ``stat()`` method caches and returns the file's stat() result; ``restat()`` forces refreshing of the cache. ``lstat()`` is also provided, but doesn't have any caching behaviour:: >>> p.stat() posix.stat_result(st_mode=33277, st_ino=7483155, st_dev=2053, st_nlink=1, st_uid=500, st_gid=500, st_size=928, st_atime=1343597970, st_mtime=1328287308, st_ctime=1343597964) For ease of use, direct attribute access to the fields of the stat structure is provided over the path object itself:: >>> p.st_size 928 >>> p.st_mtime 1328287308.889562 Higher-level methods help examine the kind of the file:: >>> p.exists() True >>> p.is_file() True >>> p.is_dir() False >>> p.is_symlink() False The file owner and group names (rather than numeric ids) are queried through matching properties:: >>> p = Path('/etc/shadow') >>> p.owner 'root' >>> p.group 'shadow' Path resolution --------------- The ``resolve()`` method makes a path absolute, resolving any symlink on the way. It is the only operation which will remove "``..``" path components. Directory walking ----------------- Simple (non-recursive) directory access is done by iteration:: >>> p = Path('docs') >>> for child in p: child ... PosixPath('docs/conf.py') PosixPath('docs/_templates') PosixPath('docs/make.bat') PosixPath('docs/index.rst') PosixPath('docs/_build') PosixPath('docs/_static') PosixPath('docs/Makefile') This allows simple filtering through list comprehensions:: >>> p = Path('.') >>> [child for child in p if child.is_dir()] [PosixPath('.hg'), PosixPath('docs'), PosixPath('dist'), PosixPath('__pycache__'), PosixPath('build')] Simple and recursive globbing is also provided:: >>> for child in p.glob('**/*.py'): child ... PosixPath('test_pathlib.py') PosixPath('setup.py') PosixPath('pathlib.py') PosixPath('docs/conf.py') PosixPath('build/lib/pathlib.py') File opening ------------ The ``open()`` method provides a file opening API similar to the builtin ``open()`` method:: >>> p = Path('setup.py') >>> with p.open() as f: f.readline() ... '#!/usr/bin/env python3\n' The ``raw_open()`` method, on the other hand, is similar to ``os.open``:: >>> fd = p.raw_open(os.O_RDONLY) >>> os.read(fd, 15) b'#!/usr/bin/env ' Filesystem alteration --------------------- Several common filesystem operations are provided as methods: ``touch()``, ``mkdir()``, ``rename()``, ``replace()``, ``unlink()``, ``rmdir()``, ``chmod()``, ``lchmod()``, ``symlink_to()``. 
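A short hypothetical session combining a few of them (using only the method
names listed above and the ``[]`` joining notation described earlier)::

    >>> d = Path('build')
    >>> d.mkdir()
    >>> marker = d['marker.txt']
    >>> marker.touch()
    >>> marker.unlink()
    >>> d.rmdir()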
More operations could be provided, for example some of the functionality of the shutil module. Experimental openat() support ----------------------------- On compatible POSIX systems, the concrete PosixPath class can take advantage of \*at() functions (`openat()`_ and friends), and manages the bookkeeping of open file descriptors as necessary. Support is enabled by passing the *use_openat* argument to the constructor:: >>> p = Path(".", use_openat=True) Then all paths constructed by navigating this path (either by iteration or indexing) will also use the openat() family of functions. The point of using these functions is to avoid race conditions whereby a given directory is silently replaced with another (often a symbolic link to a sensitive system location) between two accesses. .. _`openat()`: http://pubs.opengroup.org/onlinepubs/9699919799/functions/open.html Copyright ========= This document has been placed into the public domain. .. Local Variables: mode: indented-text indent-tabs-mode: nil sentence-end-double-space: t fill-column: 70 coding: utf-8 From josiah.carlson at gmail.com Fri Oct 5 20:51:21 2012 From: josiah.carlson at gmail.com (Josiah Carlson) Date: Fri, 5 Oct 2012 11:51:21 -0700 Subject: [Python-ideas] asyncore: included batteries don't fit In-Reply-To: <20121003144320.GA16485@hephaistos.amsuess.com> References: <20120926081718.GA20843@hephaistos.amsuess.com> <20121003144320.GA16485@hephaistos.amsuess.com> Message-ID: On Wed, Oct 3, 2012 at 7:43 AM, chrysn wrote: > On Wed, Sep 26, 2012 at 10:02:24AM -0700, Josiah Carlson wrote: >> Go ahead and read PEP 3153, we will wait. >> >> A careful reading of PEP 3153 will tell you that the intent is to make >> a "light" version of Twisted built into Python. There isn't any >> discussion as to *why* this is a good idea, it just lays out the plan >> of action. Its ideas were gathered from the experience of the Twisted >> folks. >> >> Their experience is substantial, but in the intervening 1.5+ years >> since Pycon 2011, only the barest of abstract interfaces has been >> defined (https://github.com/lvh/async-pep/blob/master/async/abstract.py), >> and no discussion has taken place as to forward migration of the >> (fairly large) body of existing asyncore code. > > it doesn't look like twisted-light to me, more like a interface > suggestion for a small subset of twisted. in particular, it doesn't talk > about main loops / reactors / registration-in-the-first-place. > > you mention interaction with the twisted people. is there willingness, > from the twisted side, to use a standard python middle layer, once it > exists and has sufficiently high quality? >> To the point, Giampaolo already has a reactor that implements the >> interface (more or less "idea #3" from his earlier message), and it's >> been used in production (under staggering ftp(s) load). Even better, >> it offers effectively transparent replacement of the existing asyncore >> loop, and supports existing asyncore-derived classes. It is available: >> https://code.google.com/p/pyftpdlib/source/browse/trunk/pyftpdlib/lib/ioloop.py > > i've had a look at it, but honestly can't say more than that it's good > to have a well-tested asyncore compatible main loop with scheduling > support, and i'll try it out for my own projects. > >> >> Again, at this point in time what you're proposing looks too vague, >> >> ambitious and premature to me. 
>> > >> > please don't get me wrong -- i'm not proposing anything for immediate >> > action, i just want to start a thinking process towards a better >> > integrated stdlib. >> >> I am curious as to what you mean by "a better integrated stdlib". A >> new interface that doesn't allow people to easily migrate from an >> existing (and long-lived, though flawed) standard library is not >> better integration. Better integration requires allowing previous >> users to migrate, while encouraging new users to join in with any >> later development. That's what Giampaolo's suggested interface offers >> on the lowest level; something to handle file-handle reactors, >> combined with a scheduler. > > a new interface won't make integration automatically happen, but it's > something the standard library components can evolve on. whether, for > example urllib2 will then automatically work asynchronously in that > framework or whether we'll wait for urllib3, we'll see when we have it. Things don't "automatically work" without work. You can't just make urllib2 work asynchronously unless you do the sorts of greenlet-style stack switching that lies to you about what is going on, or unless you redesign it from scratch to do such. That's not to say that greenlets are bad, they are great. But expecting that a standard library implementing an updated async spec will all of a sudden hook itself into a synchronous socket client? I think that expectation is unreasonable. > @migrate from an existing standard library: is there a big user base for > the current asyncore framework? my impression from is that it is not > very well known among python users, and most that could use it use > twisted. "Well known" is an interesting claim. I believe it actually known of by quite a large part of the community, but due to a (perhaps deserved) reputation (that may or may not still be the case), isn't used as often as Twisted. But along those lines, there are a few questions that should be asked: 1. Is it desirable to offer users the chance to transition from asyncore-derived stuff to some new thing? 2. If so, what is necessary for an upgrade/replacement for asyncore/asynchat in the long term? 3. Would 3rd parties use this as a basis for their libraries? 4. What are the short, mid, and long-term goals? For my answers: 1. I think it is important to offer people who are using a standard library module to continue using a standard library module if possible. 2. A transition should offer either an adapter or similar-enough API equivalency between the old and new. 3. I think that if it offers a reasonable API, good functionality, and examples are provided - both as part of the stdlib and outside the stdlib, people will see the advantages of maintaining less of their own custom code. To the point: would Twisted use *whatever* was in the stdlib? I don't know the answer, but unless the API is effectively identical to Twisted, that transition may be delayed significantly. 4. Short: get current asyncore people transitioned to something demonstrably better, that 3rd parties might also use. Mid: pull parsers/logic out of cores of methods and make them available for sync/async/3rd party parsing/protocol handling (get the best protocol parsers into the stdlib, separated from the transport). Long: everyone contributes/updates the stdlib modules because it has the best parsers for protocols/formats, that can be used from *anywhere* (sync or async). 
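To make the "separated from the transport" part concrete, here is a
deliberately minimal sketch (illustrative only, not an existing stdlib
API) of a protocol parser that owns no socket and runs no event loop:

    class LineParser:
        # Feed it bytes, get back complete CRLF-terminated lines.
        # It does no I/O itself, so the same class can sit behind a
        # blocking recv() loop, an asyncore dispatcher, or any other
        # transport.
        def __init__(self):
            self._buf = b""

        def feed(self, data):
            self._buf += data
            *lines, self._buf = self._buf.split(b"\r\n")
            return lines

A blocking client would drive it with parser.feed(sock.recv(4096)); an
asyncore handler would call exactly the same method from handle_read().
The parsing logic is written once and shared.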
My long-term dream (which has been the case for 6+ years, since I proposed doing it myself on the python-dev mailing list and was told "no") is that whether someone uses urllib2, httplib2, smtpd, requests, ftplib, etc., they all have access to high-quality protocol-level protocol parsers. So that once one person writes the bit that handles http 30X redirects, everyone can use it. So that when one person writes the gzip + chunked transfer encoding/decoding, everyone can use it. >> > we've talked about many things we'd need in a python asynchronous >> > interface (not implementation), so what are the things we *don't* need? >> > (so we won't start building a framework like twisted). i'll start: >> > >> > * high-level protocol handling (can be extra modules atop of it) >> > * ssl >> > * something like the twisted delayed framework (not sure about that, i >> > guess the twisted people will have good reason to use it, but i don't >> > see compelling reasons for such a thing in a minimal interface from my >> > limited pov) >> > * explicit connection handling (retries, timeouts -- would be up to the >> > user as well, eg urllib might want to set up a timeout and retries for >> > asynchronous url requests) >> >> I disagree with the last 3. If you have an IO loop, more often than >> not you want an opportunity to do something later in the same context. >> This is commonly the case for bandwidth limiting, connection timeouts, >> etc., which are otherwise *very* difficult to do at a higher level >> (which are the reasons why schedulers are built into IO loops). >> Further, SSL in async can be tricky to get right. Having the 20-line >> SSL layer as an available class is a good idea, and will save people >> time by not having them re-invent it (poorly or incorrectly) every >> time. > > i see; those should be provided, then. > > i'm afraid i don't completely get the point you're making, sorry for > that, maybe i've missed important statements or lack sufficiently deep > knowledge of topics affected and got lost in details. > > what is your opinion on the state of asynchronous operations in python, > and what would you like it to be? I think it is functional, but flawed. I also think that every 3rd party that does network-level protocols are different mixes of functional and flawed. I think that there is a repeated and often-times wasted effort where folks are writing different and invariably crappy (to some extent) protocol parsers and network handlers. I think that whenever possible, that should stop, and the highest-quality protocol parsing functions/methods should be available in the Python standard library, available to be called from any library, whether sync, async, stdlib, or 3rd party. Now, my discussions in the context of asyncore-related upgrades may seem like a strange leap, but some of these lesser-quality parsing routines exist in asyncore-derived classes, as well as non-asyncore-derived classes. But if we make an effort on the asyncore side of things, under the auspices of improving one stdlib module, offering additional functionality, the obviousness of needing protocol-level parsers shared among sync/async should become obvious to *everyone* (that it isn't now the case I suspect is because the communities either don't spend a lot of time cross-pollinating, people like writing parsers - I do too ;) - or the sync folks end up going the greenlet route if/when threading bites them on the ass). 
Regards, - Josiah From phd at phdru.name Fri Oct 5 21:16:25 2012 From: phd at phdru.name (Oleg Broytman) Date: Fri, 5 Oct 2012 23:16:25 +0400 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: <20121005202534.5f721292@pitrou.net> References: <20121005202534.5f721292@pitrou.net> Message-ID: <20121005191625.GA23607@iskra.aviel.ru> Hi! On Fri, Oct 05, 2012 at 08:25:34PM +0200, Antoine Pitrou wrote: > This PEP proposes the inclusion of a third-party module, `pathlib`_, in > the standard library. +1 from me for a sane path handling in the stdlib! > >>> p = Path('/home/antoine/pathlib/setup.py') > >>> p.name > 'setup.py' > >>> p.ext > '.py' > >>> p.root > '/' > >>> p.parts > > >>> list(p.parents()) > [PosixPath('/home/antoine/pathlib'), PosixPath('/home/antoine'), PosixPath('/home'), PosixPath('/')] Some attributes are properties and some are methods. Which is which? Why .root is a property but .parents() is a method? .owner/.group are properties but .exists() is a method, and so on. .stat() just returns self._stat, but said ._stat is a property! > A Windows UNC path (e.g. > ``\\some\\share\\myfile.txt``) always has a drive and a root > (here, ``\\some\\share`` and ``\\``, respectively). If I understand it correctly these should are either \\\\some\\share\\myfile.txt and \\\\some\\share or \\some\share\myfile.txt and \\some\share no? Oleg. -- Oleg Broytman http://phdru.name/ phd at phdru.name Programmers don't die, they just GOSUB without RETURN. From tjreedy at udel.edu Fri Oct 5 21:18:21 2012 From: tjreedy at udel.edu (Terry Reedy) Date: Fri, 05 Oct 2012 15:18:21 -0400 Subject: [Python-ideas] History stepping in interactive session? In-Reply-To: References: <506EA800.1080106@insectnation.org> Message-ID: On 10/5/2012 8:43 AM, Oscar Benjamin wrote: > On 5 October 2012 10:27, Andy Buckley wrote: >> I was spurred to ask this question by a painful development experience >> full of Up Up Up Up Up Enter Up Up Up Up Up Enter ... keypresses to >> repeat a previous set of Python commands/statements that weren't worth >> putting in a script file, or which I wanted to make very minor changes >> to on each iteration. Using Windows for a couple of decades, I am not spoiled by bash ;-). Idle lets me directly click on a previous statement and hit enter to make it the current statement. Edit if desired and hit enter again to execute again in the current workspace. But I agree with Oscar that even a few lines are worth a temporary script file. > As soon as I find myself doing this I quit the interpreter and start > ipython. The feature that ipython has that makes what you are doing > much easier is the magic %edit command. Just type > > In [1]: edit tmp.py > > and your favourite editor will open up allowing you to write/edit some > code. When you close the editor, ipython will run the code from tmp.py > within the interactive session (as if you had typed it in directly). > If you want to rerun that code with modifications just type 'edit > tmp.py' again and you can make the modifications within your editor. In Idle, I click File - Recent files - .../tem.py (in my misc. files directory) to open an edit window, which I leave open all day. Running from the edit window does restart the workspace, so one would have to cut and paste to not restart. I seldom want to re-run multiple lines without restarting. If I want to keep the 'temporary' code, saving under a different name is easy. 
-- Terry Jan Reedy From p.f.moore at gmail.com Fri Oct 5 21:19:12 2012 From: p.f.moore at gmail.com (Paul Moore) Date: Fri, 5 Oct 2012 20:19:12 +0100 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: <20121005202534.5f721292@pitrou.net> References: <20121005202534.5f721292@pitrou.net> Message-ID: On 5 October 2012 19:25, Antoine Pitrou wrote: > A path can be joined with another using the ``__getitem__`` operator:: > > >>> p = PurePosixPath('foo') > >>> p['bar'] > PurePosixPath('foo/bar') > >>> p[PurePosixPath('bar')] > PurePosixPath('foo/bar') There is a risk that this is too "cute". However, it's probably better than overloading the '/' operator, and you do need something short. > As with constructing, multiple path components can be specified at once:: > > >>> p['bar/xyzzy'] > PurePosixPath('foo/bar/xyzzy') That's risky. Are you proposing always using '/' regardless of OS? I'd have expected os.sep (so \ on Windows). On the other hand, that would make p['bar\\baz'] mean two different things on Windows and Unix - 2 extra path levels on Windows, only one on Unix (and a filename containing a backslash). It would probably be better to allow tuples as arguments: p['bar', 'baz'] > Properties > ---------- > > Five simple properties are provided on every path (each can be empty):: > > >>> p = PureNTPath('c:/pathlib/setup.py') > >>> p.drive > 'c:' > >>> p.root > '\\' > >>> p.anchor > 'c:\\' > >>> p.name > 'setup.py' > >>> p.ext > '.py' I don't like the way the distinction between "root" and "anchor" works here. Unix users are never going to use "anchor", as "root" is the natural term, and it does exactly the right thing on Unix. So code written on Unix will tend to do the wrong thing on Windows (where generally you'd want to use "anchor" or you'll find yourself switching accidentally to the current drive). It's a rare situation where it would matter, which on the one hand makes it much less worth worrying about, but on the other hand means that when bugs *do* occur, they will be very obscure :-( Also, there is no good terminology in current use here. The only concrete thing I can suggest is that "root" would be better used as the term for what you're calling "anchor" as Windows users would expect the root of "C:\foo\bar\baz" to be "C:\". The term "drive" would be right for "C:" (although some might expect that to mean "C:\" as well, but there's no point wasting two terms on the one concept). It might be more practical to use a new, but explicit, term like "driveroot" for "\". It's the same as root on Unix, and on Windows it's fairly obviously "the root on the current drive". And by using the coined term for the less common option, it might act as a reminder to people that something not entirely portable is going on. But there's no really simple answer - Windows and Unix are just different here. > The ``parts`` property provides read-only sequence access to a path object:: > > >>> p = PurePosixPath('/etc/init.d') > >>> p.parts > +1. There's lots of times I have wished os.path had this. > Windows paths handle the drive and the root as a single path component:: > > >>> p = PureNTPath('c:/setup.py') > >>> p.parts > > >>> p.root > '\\' > >>> p.parts[0] > 'c:\\' > > (separating them would be wrong, since ``C:`` is not the parent of ``C:\\``). This again suggests to me that "C:\" is more closely allied to the term "root" here. Also, I assume that paths will be comparable, using case sensitivity appropriate to the platform. 
Presumably a PurePath and a Path are comparable, too. What about a PosixPath and an NTPath? Would you expect them to be comparable or not? But in general, this looks like a pretty good proposal. Having a decent path abstraction in the stdlib would be great. Paul. From mikegraham at gmail.com Fri Oct 5 21:23:57 2012 From: mikegraham at gmail.com (Mike Graham) Date: Fri, 5 Oct 2012 15:23:57 -0400 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: <20121005202534.5f721292@pitrou.net> References: <20121005202534.5f721292@pitrou.net> Message-ID: On Fri, Oct 5, 2012 at 2:25 PM, Antoine Pitrou wrote: > > Hello, > > This PEP is a resurrection of the idea of having object-oriented > filesystem paths in the stdlib. It comes with a general API proposal > as well as a specific implementation (*). The implementation is young > and discussion is quite open. > > (*) http://pypi.python.org/pypi/pathlib/ > > Regards > > Antoine. The os.path approach probably isn't the best, but it does work pretty well in practice. I'm not sure I see the benefit of introducing something new. Mike From ubershmekel at gmail.com Fri Oct 5 21:36:56 2012 From: ubershmekel at gmail.com (Yuval Greenfield) Date: Fri, 5 Oct 2012 21:36:56 +0200 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: <20121005191625.GA23607@iskra.aviel.ru> References: <20121005202534.5f721292@pitrou.net> <20121005191625.GA23607@iskra.aviel.ru> Message-ID: On Fri, Oct 5, 2012 at 9:16 PM, Oleg Broytman wrote: > Some attributes are properties and some are methods. Which is which? > Why .root is a property but .parents() is a method? .owner/.group are > properties but .exists() is a method, and so on. .stat() just returns > self._stat, but said ._stat is a property! > > Unobvious indeed. Maybe operations that cause OS api calls should have parens? Also, I agree with Paul Moore that the naming at its current state may cause cross-platform bugs. Though I don't understand why not to overload the "/" or "+" operators. Sounds more elegant than square brackets. Just make sure the op fails on anything other than Path objects. I'm +1 on adding such a useful abstraction to python if and only if it were >= os.path on every front, Yuval Greenfield -------------- next part -------------- An HTML attachment was scrubbed... URL: From solipsis at pitrou.net Fri Oct 5 21:41:01 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 5 Oct 2012 21:41:01 +0200 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths References: <20121005202534.5f721292@pitrou.net> <20121005191625.GA23607@iskra.aviel.ru> Message-ID: <20121005214101.104d9b76@pitrou.net> On Fri, 5 Oct 2012 23:16:25 +0400 Oleg Broytman wrote: > Hi! > > On Fri, Oct 05, 2012 at 08:25:34PM +0200, Antoine Pitrou wrote: > > This PEP proposes the inclusion of a third-party module, `pathlib`_, in > > the standard library. > > +1 from me for a sane path handling in the stdlib! > > > >>> p = Path('/home/antoine/pathlib/setup.py') > > >>> p.name > > 'setup.py' > > >>> p.ext > > '.py' > > >>> p.root > > '/' > > >>> p.parts > > > > >>> list(p.parents()) > > [PosixPath('/home/antoine/pathlib'), PosixPath('/home/antoine'), PosixPath('/home'), PosixPath('/')] > > Some attributes are properties and some are methods. Which is which? > Why .root is a property but .parents() is a method? .owner/.group are > properties but .exists() is a method, and so on. .stat() just returns > self._stat, but said ._stat is a property! 
parents() returns a generator (hence the list() call in the example above). A generator-returning property sounds a bit too confusing IMHO. ._stat is an implementation detail. stat() and exists() both mirror similar APIs in the os / os.path modules. .name, .ext, .root, .parts just return static, immutable properties of the path, I see no reason for them to be methods. > > A Windows UNC path (e.g. > > ``\\some\\share\\myfile.txt``) always has a drive and a root > > (here, ``\\some\\share`` and ``\\``, respectively). > > If I understand it correctly these should are either > \\\\some\\share\\myfile.txt and \\\\some\\share > or > \\some\share\myfile.txt and \\some\share > no? Ah, right. I'll correct it. Thanks Antoine. -- Software development and contracting: http://pro.pitrou.net From ethan at stoneleaf.us Fri Oct 5 21:44:07 2012 From: ethan at stoneleaf.us (Ethan Furman) Date: Fri, 05 Oct 2012 12:44:07 -0700 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: References: <20121005202534.5f721292@pitrou.net> Message-ID: <506F3887.4020402@stoneleaf.us> Paul Moore wrote: > On 5 October 2012 19:25, Antoine Pitrou wrote: >> A path can be joined with another using the ``__getitem__`` operator:: >> >> >>> p = PurePosixPath('foo') >> >>> p['bar'] >> PurePosixPath('foo/bar') >> >>> p[PurePosixPath('bar')] >> PurePosixPath('foo/bar') > > There is a risk that this is too "cute". However, it's probably better > than overloading the '/' operator, and you do need something short. I actually like using the '/' operator for this. My own path module uses it, and the resulting code is along the lines of: job = Path('c:/orders/38273') table = dbf.Table(job/'ABC12345') >> As with constructing, multiple path components can be specified at once:: >> >> >>> p['bar/xyzzy'] >> PurePosixPath('foo/bar/xyzzy') > > That's risky. Are you proposing always using '/' regardless of OS? Mine does; it also accepts `\\` on Windows machines. Personally, I don't care for the index notation Antoine is suggesting. ~Ethan~ From solipsis at pitrou.net Fri Oct 5 21:55:20 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 5 Oct 2012 21:55:20 +0200 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths References: <20121005202534.5f721292@pitrou.net> Message-ID: <20121005215520.19b63efe@pitrou.net> On Fri, 5 Oct 2012 20:19:12 +0100 Paul Moore wrote: > On 5 October 2012 19:25, Antoine Pitrou wrote: > > A path can be joined with another using the ``__getitem__`` operator:: > > > > >>> p = PurePosixPath('foo') > > >>> p['bar'] > > PurePosixPath('foo/bar') > > >>> p[PurePosixPath('bar')] > > PurePosixPath('foo/bar') > > There is a risk that this is too "cute". However, it's probably better > than overloading the '/' operator, and you do need something short. I think overloading '/' is ugly (dividing paths??). Someone else proposed overloading '+', which would be confusing since we need to be able to combine paths and regular strings, for ease of use. The point of using __getitem__ is that you get an error if you replace the Path object with a regular string by mistake: >>> PurePath('foo')['bar'] PurePosixPath('foo/bar') >>> 'foo'['bar'] Traceback (most recent call last): File "", line 1, in TypeError: string indices must be integers If you were to use the '+' operator instead, 'foo' + 'bar' would work but give you the wrong result. > > As with constructing, multiple path components can be specified at once:: > > > > >>> p['bar/xyzzy'] > > PurePosixPath('foo/bar/xyzzy') > > That's risky. 
Are you proposing always using '/' regardless of OS? I'd > have expected os.sep (so \ on Windows). Both '/' and '\\' are accepted as path separators under Windows. Under Unix, '\\' is a regular character: >>> PurePosixPath('foo\\bar') == PurePosixPath('foo/bar') False >>> PureNTPath('foo\\bar') == PureNTPath('foo/bar') True > It would probably be better to allow tuples as arguments: > > p['bar', 'baz'] It already works indeed: >>> p = PurePath('foo') >>> p['bar', 'baz'] PurePosixPath('foo/bar/baz') > > Five simple properties are provided on every path (each can be empty):: > > > > >>> p = PureNTPath('c:/pathlib/setup.py') > > >>> p.drive > > 'c:' > > >>> p.root > > '\\' > > >>> p.anchor > > 'c:\\' > > >>> p.name > > 'setup.py' > > >>> p.ext > > '.py' > > I don't like the way the distinction between "root" and "anchor" works > here. Unix users are never going to use "anchor", as "root" is the > natural term, and it does exactly the right thing on Unix. So code > written on Unix will tend to do the wrong thing on Windows (where > generally you'd want to use "anchor" or you'll find yourself switching > accidentally to the current drive). Well, I expect .root or .anchor to be used mostly for presentation or debugging purposes. There's nothing really useful to be done with them otherwise, IMHO. Do you know of any use cases? > Also, there is no good terminology in current use here. The only > concrete thing I can suggest is that "root" would be better used as > the term for what you're calling "anchor" as Windows users would > expect the root of "C:\foo\bar\baz" to be "C:\". But then the root of "C:foo" would be "C:", which sounds wrong: "C:" isn't a root at all. > But there's no really simple answer - Windows and Unix are just different here. Yes, and Unix users are expecting something simpler than what's going on under Windows ;) > Also, I assume that paths will be comparable, using case sensitivity > appropriate to the platform. Presumably a PurePath and a Path are > comparable, too. What about a PosixPath and an NTPath? Would you > expect them to be comparable or not? Currently, different flavours imply unequal (and unorderable) paths: >>> PurePosixPath('foo') == PureNTPath('foo') False >>> PurePosixPath('foo') > PureNTPath('foo') Traceback (most recent call last): File "", line 1, in TypeError: unorderable types: PurePosixPath() > PureNTPath() However, pure paths and concrete paths of the same flavour can be equal, and ordered: >>> PurePath('foo') == Path('foo') True >>> PurePath('foo') >= Path('foo') True Regards Antoine. -- Software development and contracting: http://pro.pitrou.net From amcnabb at mcnabbs.org Fri Oct 5 21:53:27 2012 From: amcnabb at mcnabbs.org (Andrew McNabb) Date: Fri, 5 Oct 2012 13:53:27 -0600 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: References: <20121005202534.5f721292@pitrou.net> <20121005191625.GA23607@iskra.aviel.ru> Message-ID: <20121005195327.GG8974@mcnabbs.org> On Fri, Oct 05, 2012 at 09:36:56PM +0200, Yuval Greenfield wrote: > > Though I don't understand why not to overload the "/" or "+" operators. > Sounds more elegant than square brackets. Just make sure the op fails on > anything other than Path objects. Path concatenation is obviously not a form of division, so it makes little sense to use the division operator for this purpose. 
I always wonder why the designers of C++ felt that it made sense to perform output by left-bitshifting the output stream by a string: std::cout << "hello, world"; Fortunately, operator overloading in Python is generally limited to cases where the operator's meaning is preserved (with the unfortunate exception of the % operator for strings). -- Andrew McNabb http://www.mcnabbs.org/andrew/ PGP Fingerprint: 8A17 B57C 6879 1863 DE55 8012 AB4D 6098 8826 6868 From ethan at stoneleaf.us Fri Oct 5 22:06:57 2012 From: ethan at stoneleaf.us (Ethan Furman) Date: Fri, 05 Oct 2012 13:06:57 -0700 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: <20121005215520.19b63efe@pitrou.net> References: <20121005202534.5f721292@pitrou.net> <20121005215520.19b63efe@pitrou.net> Message-ID: <506F3DE1.7010802@stoneleaf.us> Antoine Pitrou wrote: > On Fri, 5 Oct 2012 20:19:12 +0100 > Paul Moore wrote: >> On 5 October 2012 19:25, Antoine Pitrou wrote: >>> A path can be joined with another using the ``__getitem__`` operator:: >>> >>> >>> p = PurePosixPath('foo') >>> >>> p['bar'] >>> PurePosixPath('foo/bar') >>> >>> p[PurePosixPath('bar')] >>> PurePosixPath('foo/bar') >> There is a risk that this is too "cute". However, it's probably better >> than overloading the '/' operator, and you do need something short. > > I think overloading '/' is ugly (dividing paths??). But '/' is the normal path separator, so it's not dividing; and it certainly makes more sense than `%` with string interpolations. ;) > Someone else proposed overloading '+', which would be confusing since we > need to be able to combine paths and regular strings, for ease of use. > The point of using __getitem__ is that you get an error if you replace > the Path object with a regular string by mistake: > >>>> PurePath('foo')['bar'] > PurePosixPath('foo/bar') >>>> 'foo'['bar'] > Traceback (most recent call last): > File "", line 1, in > TypeError: string indices must be integers > > If you were to use the '+' operator instead, 'foo' + 'bar' would work > but give you the wrong result. I would rather use the `/` and `+` and risk the occasional wrong result. (And yes, I have spent time tracking bugs because of that wrong result when using my own Path module -- and I'd still rather make that trade-off.) ~Ethan~ From solipsis at pitrou.net Fri Oct 5 22:09:54 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 5 Oct 2012 22:09:54 +0200 Subject: [Python-ideas] asyncore: included batteries don't fit References: <20120926081718.GA20843@hephaistos.amsuess.com> <20121003144320.GA16485@hephaistos.amsuess.com> Message-ID: <20121005220954.6be30804@pitrou.net> On Fri, 5 Oct 2012 11:51:21 -0700 Josiah Carlson wrote: > > My long-term dream (which has been the case for 6+ years, since I > proposed doing it myself on the python-dev mailing list and was told > "no") is that whether someone uses urllib2, httplib2, smtpd, requests, > ftplib, etc., they all have access to high-quality protocol-level > protocol parsers. I'm not sure what you're talking about: what were you told "no" about, specifically? Your proposal sounds reasonable and (ideally) desirable to me. Regards Antoine. 
-- Software development and contracting: http://pro.pitrou.net From andrew.svetlov at gmail.com Fri Oct 5 22:59:24 2012 From: andrew.svetlov at gmail.com (Andrew Svetlov) Date: Fri, 5 Oct 2012 23:59:24 +0300 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: <506F3DE1.7010802@stoneleaf.us> References: <20121005202534.5f721292@pitrou.net> <20121005215520.19b63efe@pitrou.net> <506F3DE1.7010802@stoneleaf.us> Message-ID: +1 in general. I like to have library like that in the battery. I would to see the note why [] used instead / or + in the pep while I'm agree with that. +0 for / -1 for + For method/property decision I guess (maybe stupid) rule: properties for simple accessors and methods for operations which require os calls. With exception for parents() as method which returns generator. On Fri, Oct 5, 2012 at 11:06 PM, Ethan Furman wrote: > Antoine Pitrou wrote: >> >> On Fri, 5 Oct 2012 20:19:12 +0100 >> Paul Moore wrote: >>> >>> On 5 October 2012 19:25, Antoine Pitrou wrote: >>>> >>>> A path can be joined with another using the ``__getitem__`` operator:: >>>> >>>> >>> p = PurePosixPath('foo') >>>> >>> p['bar'] >>>> PurePosixPath('foo/bar') >>>> >>> p[PurePosixPath('bar')] >>>> PurePosixPath('foo/bar') >>> >>> There is a risk that this is too "cute". However, it's probably better >>> than overloading the '/' operator, and you do need something short. >> >> >> I think overloading '/' is ugly (dividing paths??). > > > But '/' is the normal path separator, so it's not dividing; and it certainly > makes more sense than `%` with string interpolations. ;) > > > >> Someone else proposed overloading '+', which would be confusing since we >> need to be able to combine paths and regular strings, for ease of use. >> The point of using __getitem__ is that you get an error if you replace >> the Path object with a regular string by mistake: >> >>>>> PurePath('foo')['bar'] >> >> PurePosixPath('foo/bar') >>>>> >>>>> 'foo'['bar'] >> >> Traceback (most recent call last): >> File "", line 1, in >> TypeError: string indices must be integers >> >> If you were to use the '+' operator instead, 'foo' + 'bar' would work >> but give you the wrong result. > > > I would rather use the `/` and `+` and risk the occasional wrong result. > (And yes, I have spent time tracking bugs because of that wrong result when > using my own Path module -- and I'd still rather make that trade-off.) > > ~Ethan~ > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas -- Thanks, Andrew Svetlov From ethan at stoneleaf.us Fri Oct 5 23:38:57 2012 From: ethan at stoneleaf.us (Ethan Furman) Date: Fri, 05 Oct 2012 14:38:57 -0700 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: <20121005202534.5f721292@pitrou.net> References: <20121005202534.5f721292@pitrou.net> Message-ID: <506F5371.6080302@stoneleaf.us> Antoine Pitrou wrote: > Extraneous path separators and ``"."`` components are eliminated:: > > >>> PurePath('a///b/c/./d/') > PurePosixPath('a/b/c/d') I'm all for eliminating extra '.'s, but shouldn't extra '/'s be an error? 
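(For reference, the current stdlib machinery silently collapses them rather
than treating them as errors:

    >>> import posixpath
    >>> posixpath.normpath('a///b/c/./d/')
    'a/b/c/d'

so the proposed behaviour at least matches what os.path users get today.)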
> The ``parent()`` method returns an ancestor of the path:: > > >>> p.parent() > PureNTPath('c:\\python33\\bin') > >>> p.parent(2) > PureNTPath('c:\\python33') > >>> p.parent(3) > PureNTPath('c:\\') > > The ``parents()`` method automates repeated invocations of ``parent()``, until > the anchor is reached:: > > >>> p = PureNTPath('c:/python33/bin/python.exe') > >>> for parent in p.parents(): parent > ... > PureNTPath('c:\\python33\\bin') > PureNTPath('c:\\python33') > PureNTPath('c:\\') What's the use-case for iterating through all the parent directories? Say I have a .dbf table as PureNTPath('c:\orders\12345\abc67890.dbf'), and I export it to .csv in the same folder; how would I transform the above PureNTPath's ext from 'dbf' to 'csv'? ~Ethan~ From phd at phdru.name Sat Oct 6 00:05:14 2012 From: phd at phdru.name (Oleg Broytman) Date: Sat, 6 Oct 2012 02:05:14 +0400 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: <506F5371.6080302@stoneleaf.us> References: <20121005202534.5f721292@pitrou.net> <506F5371.6080302@stoneleaf.us> Message-ID: <20121005220514.GA27986@iskra.aviel.ru> On Fri, Oct 05, 2012 at 02:38:57PM -0700, Ethan Furman wrote: > Antoine Pitrou wrote: > >Extraneous path separators and ``"."`` components are eliminated:: > > > > >>> PurePath('a///b/c/./d/') > > PurePosixPath('a/b/c/d') > > I'm all for eliminating extra '.'s, but shouldn't extra '/'s be an error? Why? They aren't errors in the underlying OS. > > >>> p = PureNTPath('c:/python33/bin/python.exe') > > >>> for parent in p.parents(): parent > > ... > > PureNTPath('c:\\python33\\bin') > > PureNTPath('c:\\python33') > > PureNTPath('c:\\') > > What's the use-case for iterating through all the parent directories? for parent in p.parents(): if parent['.svn'].exists(): last_seen = parent continue else: print("The topmost directory of the project: %s" % last_seen) break Oleg. -- Oleg Broytman http://phdru.name/ phd at phdru.name Programmers don't die, they just GOSUB without RETURN. From ethan at stoneleaf.us Sat Oct 6 00:21:06 2012 From: ethan at stoneleaf.us (Ethan Furman) Date: Fri, 05 Oct 2012 15:21:06 -0700 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: <20121005220514.GA27986@iskra.aviel.ru> References: <20121005202534.5f721292@pitrou.net> <506F5371.6080302@stoneleaf.us> <20121005220514.GA27986@iskra.aviel.ru> Message-ID: <506F5D52.4050301@stoneleaf.us> Oleg Broytman wrote: > On Fri, Oct 05, 2012 at 02:38:57PM -0700, Ethan Furman wrote: >> Antoine Pitrou wrote: >>> Extraneous path separators and ``"."`` components are eliminated:: >>> >>> >>> PurePath('a///b/c/./d/') >>> PurePosixPath('a/b/c/d') >> I'm all for eliminating extra '.'s, but shouldn't extra '/'s be an error? > > Why? They aren't errors in the underlying OS. They are on Windows (no comment on whether or not it qualifies as an OS ;). c:\temp>dir \\\\\temp The filename, directory name, or volume label syntax is incorrect. c:\temp>dir \\temp The filename, directory name, or volume label syntax is incorrect. Although I see it works fine in between path pieces: c:\temp\34400>dir \temp\\\34400 [snip listing] >> What's the use-case for iterating through all the parent directories? > > for parent in p.parents(): > if parent['.svn'].exists(): > last_seen = parent > continue > else: > print("The topmost directory of the project: %s" % last_seen) > break Cool, thanks. 
~Ethan~ From steve at pearwood.info Sat Oct 6 00:41:05 2012 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 06 Oct 2012 08:41:05 +1000 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: <20121005195327.GG8974@mcnabbs.org> References: <20121005202534.5f721292@pitrou.net> <20121005191625.GA23607@iskra.aviel.ru> <20121005195327.GG8974@mcnabbs.org> Message-ID: <506F6201.4000503@pearwood.info> On 06/10/12 05:53, Andrew McNabb wrote: > Path concatenation is obviously not a form of division, so it makes > little sense to use the division operator for this purpose. But / is not just a division operator. It is also used for: * alternatives: "tea and/or coffee, breakfast/lunch/dinner" * italic markup: "some apps use /slashes/ for italics" * instead of line breaks when quoting poetry * abbreviations such as n/a b/w c/o and even w/ (not applicable, between, care of, with) * date separator Since / is often (but not always) used as a path separator, using it as a path component join operator makes good sense. BTW, are there any supported platforms where the path separator or alternate path are not slash? There used to be Apple Mac OS using colons. -- Steven From grosser.meister.morti at gmx.net Sat Oct 6 00:47:28 2012 From: grosser.meister.morti at gmx.net (=?ISO-8859-1?Q?Mathias_Panzenb=F6ck?=) Date: Sat, 06 Oct 2012 00:47:28 +0200 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: <506F5D52.4050301@stoneleaf.us> References: <20121005202534.5f721292@pitrou.net> <506F5371.6080302@stoneleaf.us> <20121005220514.GA27986@iskra.aviel.ru> <506F5D52.4050301@stoneleaf.us> Message-ID: <506F6380.3080508@gmx.net> On 10/06/2012 12:21 AM, Ethan Furman wrote: > Oleg Broytman wrote: >> On Fri, Oct 05, 2012 at 02:38:57PM -0700, Ethan Furman wrote: >>> Antoine Pitrou wrote: >>>> Extraneous path separators and ``"."`` components are eliminated:: >>>> >>>> >>> PurePath('a///b/c/./d/') >>>> PurePosixPath('a/b/c/d') >>> I'm all for eliminating extra '.'s, but shouldn't extra '/'s be an error? >> >> Why? They aren't errors in the underlying OS. > > They are on Windows (no comment on whether or not it qualifies as an OS ;). > > c:\temp>dir \\\\\temp > The filename, directory name, or volume label syntax is incorrect. > > c:\temp>dir \\temp > The filename, directory name, or volume label syntax is incorrect. > > Although I see it works fine in between path pieces: > > c:\temp\34400>dir \temp\\\34400 > [snip listing] > \\ at the start of a path has a special meaning under windows: http://en.wikipedia.org/wiki/UNC_path#Uniform_Naming_Convention > >>> What's the use-case for iterating through all the parent directories? >> >> for parent in p.parents(): >> if parent['.svn'].exists(): >> last_seen = parent >> continue >> else: >> print("The topmost directory of the project: %s" % last_seen) >> break > > Cool, thanks. 
> > ~Ethan~ From solipsis at pitrou.net Sat Oct 6 01:16:09 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sat, 6 Oct 2012 01:16:09 +0200 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths References: <20121005202534.5f721292@pitrou.net> <506F5371.6080302@stoneleaf.us> <20121005220514.GA27986@iskra.aviel.ru> <506F5D52.4050301@stoneleaf.us> <506F6380.3080508@gmx.net> Message-ID: <20121006011609.2ab81f67@pitrou.net> On Sat, 06 Oct 2012 00:47:28 +0200 Mathias Panzenb?ck wrote: > > \\ at the start of a path has a special meaning under windows: > http://en.wikipedia.org/wiki/UNC_path#Uniform_Naming_Convention And indeed the API preserves them: >>> PurePosixPath('//some/path') PurePosixPath('/some/path') >>> PureNTPath('//some/path') PureNTPath('\\\\some\\path\\') Regards Antoine. -- Software development and contracting: http://pro.pitrou.net From solipsis at pitrou.net Sat Oct 6 01:48:23 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sat, 6 Oct 2012 01:48:23 +0200 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths References: <20121005202534.5f721292@pitrou.net> <506F5371.6080302@stoneleaf.us> Message-ID: <20121006014823.1fc46741@pitrou.net> On Fri, 05 Oct 2012 14:38:57 -0700 Ethan Furman wrote: > > Say I have a .dbf table as PureNTPath('c:\orders\12345\abc67890.dbf'), > and I export it to .csv in the same folder; how would I transform the > above PureNTPath's ext from 'dbf' to 'csv'? Something like: >>> p = PureNTPath('c:/orders/12345/abc67890.dbf') >>> p.parent()[p.name.split('.')[0] + '.csv'] PureNTPath('c:\\orders\\12345\\abc67890.csv') Any suggestion to ease this use case a bit? Regards Antoine. -- Software development and contracting: http://pro.pitrou.net From amcnabb at mcnabbs.org Sat Oct 6 01:54:57 2012 From: amcnabb at mcnabbs.org (Andrew McNabb) Date: Fri, 5 Oct 2012 18:54:57 -0500 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: <506F6201.4000503@pearwood.info> References: <20121005202534.5f721292@pitrou.net> <20121005191625.GA23607@iskra.aviel.ru> <20121005195327.GG8974@mcnabbs.org> <506F6201.4000503@pearwood.info> Message-ID: <20121005235457.GA7755@mcnabbs.org> On Sat, Oct 06, 2012 at 08:41:05AM +1000, Steven D'Aprano wrote: > On 06/10/12 05:53, Andrew McNabb wrote: > > >Path concatenation is obviously not a form of division, so it makes > >little sense to use the division operator for this purpose. > > But / is not just a division operator. It is also used for: > > * alternatives: "tea and/or coffee, breakfast/lunch/dinner" > * italic markup: "some apps use /slashes/ for italics" > * instead of line breaks when quoting poetry > * abbreviations such as n/a b/w c/o and even w/ (not applicable, > between, care of, with) > * date separator This is the difference between C++ style operators, where the only thing that matters is what the operator symbol looks like, and Python style operators, where an operator symbol is just syntactic sugar. In Python, the "/" is synonymous with `operator.div` and is defined in terms of the `__div__` special method. This distinction is why I hate operator overloading in C++ but like it in Python. 
-- Andrew McNabb http://www.mcnabbs.org/andrew/ PGP Fingerprint: 8A17 B57C 6879 1863 DE55 8012 AB4D 6098 8826 6868 From shibturn at gmail.com Sat Oct 6 02:27:49 2012 From: shibturn at gmail.com (Richard Oudkerk) Date: Sat, 06 Oct 2012 01:27:49 +0100 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: <20121006014823.1fc46741@pitrou.net> References: <20121005202534.5f721292@pitrou.net> <506F5371.6080302@stoneleaf.us> <20121006014823.1fc46741@pitrou.net> Message-ID: On 06/10/2012 12:48am, Antoine Pitrou wrote: >>>> p = PureNTPath('c:/orders/12345/abc67890.dbf') >>>> >>>p.parent()[p.name.split('.')[0] + '.csv'] > PureNTPath('c:\\orders\\12345\\abc67890.csv') > > Any suggestion to ease this use case a bit? Maybe p.basename could be shorthand for p.name.split('.')[0]. Richard From greg.ewing at canterbury.ac.nz Sat Oct 6 02:37:26 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sat, 06 Oct 2012 13:37:26 +1300 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: <20121005215520.19b63efe@pitrou.net> References: <20121005202534.5f721292@pitrou.net> <20121005215520.19b63efe@pitrou.net> Message-ID: <506F7D46.9070309@canterbury.ac.nz> Antoine Pitrou wrote: > Well, I expect .root or .anchor to be used mostly for presentation or > debugging purposes. I'm having trouble thinking of *any* use cases, even for presentation or debugging. Maybe they should be dropped altogether until someone comes up with a use case. -- Greg From solipsis at pitrou.net Sat Oct 6 02:38:14 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sat, 6 Oct 2012 02:38:14 +0200 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths References: <20121005202534.5f721292@pitrou.net> <506F5371.6080302@stoneleaf.us> <20121006014823.1fc46741@pitrou.net> Message-ID: <20121006023814.36f7e22f@pitrou.net> On Sat, 06 Oct 2012 01:27:49 +0100 Richard Oudkerk wrote: > On 06/10/2012 12:48am, Antoine Pitrou wrote: > >>>> p = PureNTPath('c:/orders/12345/abc67890.dbf') > >>>> >>>p.parent()[p.name.split('.')[0] + '.csv'] > > PureNTPath('c:\\orders\\12345\\abc67890.csv') > > > > Any suggestion to ease this use case a bit? > > Maybe p.basename could be shorthand for p.name.split('.')[0]. Wouldn't there be some confusion with os.path.basename: > > > Richard -- Software development and contracting: http://pro.pitrou.net From solipsis at pitrou.net Sat Oct 6 02:39:23 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sat, 6 Oct 2012 02:39:23 +0200 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths References: <20121005202534.5f721292@pitrou.net> <506F5371.6080302@stoneleaf.us> <20121006014823.1fc46741@pitrou.net> Message-ID: <20121006023923.62545731@pitrou.net> On Sat, 06 Oct 2012 01:27:49 +0100 Richard Oudkerk wrote: > On 06/10/2012 12:48am, Antoine Pitrou wrote: > >>>> p = PureNTPath('c:/orders/12345/abc67890.dbf') > >>>> >>>p.parent()[p.name.split('.')[0] + '.csv'] > > PureNTPath('c:\\orders\\12345\\abc67890.csv') > > > > Any suggestion to ease this use case a bit? > > Maybe p.basename could be shorthand for p.name.split('.')[0]. Wouldn't there be some confusion with os.path.basename: >>> os.path.basename('a/b/c.ext') 'c.ext' (sorry for the earlier, unfinished reply) Regards Antoine. 
-- Software development and contracting: http://pro.pitrou.net From greg.ewing at canterbury.ac.nz Sat Oct 6 02:54:21 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sat, 06 Oct 2012 13:54:21 +1300 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: <20121005235457.GA7755@mcnabbs.org> References: <20121005202534.5f721292@pitrou.net> <20121005191625.GA23607@iskra.aviel.ru> <20121005195327.GG8974@mcnabbs.org> <506F6201.4000503@pearwood.info> <20121005235457.GA7755@mcnabbs.org> Message-ID: <506F813D.2050305@canterbury.ac.nz> Andrew McNabb wrote: > This is the difference between C++ style operators, where the only thing > that matters is what the operator symbol looks like, and Python style > operators, where an operator symbol is just syntactic sugar. In Python, > the "/" is synonymous with `operator.div` and is defined in terms of the > `__div__` special method. This distinction is why I hate operator > overloading in C++ but like it in Python. Not sure what you're saying here -- in both languages, operators are no more than syntactic sugar for dispatching to an appropriate method or function. Python just avoids introducing a special syntax for spelling the name of the operator, which is nice, but it's not a huge difference. The same issues of what you *should* use operators for arises in both communities, and it seems to be very much a matter of personal taste. (The use of << for output in C++ has never bothered me, BTW. There are plenty of problems with the way I/O is done in C++, but the use of << is the least of them, IMO...) -- Greg From greg.ewing at canterbury.ac.nz Sat Oct 6 03:05:41 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sat, 06 Oct 2012 14:05:41 +1300 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: References: <20121005202534.5f721292@pitrou.net> <506F5371.6080302@stoneleaf.us> <20121006014823.1fc46741@pitrou.net> Message-ID: <506F83E5.70705@canterbury.ac.nz> How about making a path object behave like a sequence of pathname components? Then * You can iterate over it directly instead of needing .parents() * p[:-1] gives you the dirname * p[-1] gives you the os.path.basename -- Greg From massimo.dipierro at gmail.com Sat Oct 6 04:41:17 2012 From: massimo.dipierro at gmail.com (massimo.dipierro at gmail.com) Date: Fri, 5 Oct 2012 19:41:17 -0700 (PDT) Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths Message-ID: <347403772.2436.1349491350702.JavaMail.seven@ap8.p0.sjc.7sys.net> An HTML attachment was scrubbed... URL: From ericsnowcurrently at gmail.com Sat Oct 6 06:57:48 2012 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Fri, 5 Oct 2012 22:57:48 -0600 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: <20121005215520.19b63efe@pitrou.net> References: <20121005202534.5f721292@pitrou.net> <20121005215520.19b63efe@pitrou.net> Message-ID: On Fri, Oct 5, 2012 at 1:55 PM, Antoine Pitrou wrote: > I think overloading '/' is ugly (dividing paths??). Agreed. +1 on the proposed API in this regard. It's pretty easy to grok. I also like that item access here mirrors how paths are treated as sequences/iterables in other parts of the API. It wouldn't surprise me if the join syntax is the most contentious part of the proposal. 
;) -eric From ericsnowcurrently at gmail.com Sat Oct 6 07:16:55 2012 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Fri, 5 Oct 2012 23:16:55 -0600 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: <20121006014823.1fc46741@pitrou.net> References: <20121005202534.5f721292@pitrou.net> <506F5371.6080302@stoneleaf.us> <20121006014823.1fc46741@pitrou.net> Message-ID: On Fri, Oct 5, 2012 at 5:48 PM, Antoine Pitrou wrote: > On Fri, 05 Oct 2012 14:38:57 -0700 > Ethan Furman wrote: >> >> Say I have a .dbf table as PureNTPath('c:\orders\12345\abc67890.dbf'), >> and I export it to .csv in the same folder; how would I transform the >> above PureNTPath's ext from 'dbf' to 'csv'? > > Something like: > >>>> p = PureNTPath('c:/orders/12345/abc67890.dbf') >>>> p.parent()[p.name.split('.')[0] + '.csv'] > PureNTPath('c:\\orders\\12345\\abc67890.csv') > > Any suggestion to ease this use case a bit? Each namedtuple has a _replace() method that's is used to generate a new instance with one or more attributes changed. We could do something similar here: >>> p = PureNTPath('c:/orders/12345/abc67890.dbf') >>> p.replace(ext='.csv') PureNTPath('c:\\orders\\12345\\abc67890.csv') -eric From ethan at stoneleaf.us Sat Oct 6 07:36:49 2012 From: ethan at stoneleaf.us (Ethan Furman) Date: Fri, 05 Oct 2012 22:36:49 -0700 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: <20121005235457.GA7755@mcnabbs.org> References: <20121005202534.5f721292@pitrou.net> <20121005191625.GA23607@iskra.aviel.ru> <20121005195327.GG8974@mcnabbs.org> <506F6201.4000503@pearwood.info> <20121005235457.GA7755@mcnabbs.org> Message-ID: <506FC371.8040703@stoneleaf.us> Andrew McNabb wrote: > On Sat, Oct 06, 2012 at 08:41:05AM +1000, Steven D'Aprano wrote: >> On 06/10/12 05:53, Andrew McNabb wrote: >> >>> Path concatenation is obviously not a form of division, so it makes >>> little sense to use the division operator for this purpose. >> But / is not just a division operator. It is also used for: >> >> * alternatives: "tea and/or coffee, breakfast/lunch/dinner" >> * italic markup: "some apps use /slashes/ for italics" >> * instead of line breaks when quoting poetry >> * abbreviations such as n/a b/w c/o and even w/ (not applicable, >> between, care of, with) >> * date separator > > This is the difference between C++ style operators, where the only thing > that matters is what the operator symbol looks like, and Python style > operators, where an operator symbol is just syntactic sugar. In Python, > the "/" is synonymous with `operator.div` and is defined in terms of the > `__div__` special method. This distinction is why I hate operator > overloading in C++ but like it in Python. '/' is just a symbol. One common interpretation is as division, but that is not its only purpose. It's not even one of the first two symbols I learned for division when I was younger. 
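To make the "operators are just sugar for special methods" point concrete, here is a deliberately tiny toy class (purely illustrative, not the proposed pathlib classes) in which both spellings dispatch to the same join logic:

class ToyPath:
    def __init__(self, *parts):
        self.parts = parts
    def _join(self, other):
        # single join implementation; the operator spellings below are aliases
        return ToyPath(*(self.parts + (other,)))
    __getitem__ = _join    # enables p['bar']
    __truediv__ = _join    # enables p / 'bar'
    def __repr__(self):
        return 'ToyPath(%r)' % '/'.join(self.parts)

>>> ToyPath('foo')['bar']
ToyPath('foo/bar')
>>> ToyPath('foo') / 'bar'
ToyPath('foo/bar')

Whichever surface syntax wins is therefore mostly a question of readability, and of how loudly a mistake fails, not of mechanism.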
~Ethan~ From ethan at stoneleaf.us Sat Oct 6 07:42:00 2012 From: ethan at stoneleaf.us (Ethan Furman) Date: Fri, 05 Oct 2012 22:42:00 -0700 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: References: <20121005202534.5f721292@pitrou.net> <506F5371.6080302@stoneleaf.us> <20121006014823.1fc46741@pitrou.net> Message-ID: <506FC4A8.9060009@stoneleaf.us> Eric Snow wrote: > On Fri, Oct 5, 2012 at 5:48 PM, Antoine Pitrou wrote: >> On Fri, 05 Oct 2012 14:38:57 -0700 >> Ethan Furman wrote: >>> Say I have a .dbf table as PureNTPath('c:\orders\12345\abc67890.dbf'), >>> and I export it to .csv in the same folder; how would I transform the >>> above PureNTPath's ext from 'dbf' to 'csv'? >> Something like: >> >>>>> p = PureNTPath('c:/orders/12345/abc67890.dbf') >>>>> p.parent()[p.name.split('.')[0] + '.csv'] >> PureNTPath('c:\\orders\\12345\\abc67890.csv') >> >> Any suggestion to ease this use case a bit? > > Each namedtuple has a _replace() method that's is used to generate a > new instance with one or more attributes changed. We could do > something similar here: > >>>> p = PureNTPath('c:/orders/12345/abc67890.dbf') >>>> p.replace(ext='.csv') > PureNTPath('c:\\orders\\12345\\abc67890.csv') +1 From turnbull at sk.tsukuba.ac.jp Sat Oct 6 10:00:31 2012 From: turnbull at sk.tsukuba.ac.jp (Stephen J. Turnbull) Date: Sat, 06 Oct 2012 17:00:31 +0900 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: <20121006023923.62545731@pitrou.net> References: <20121005202534.5f721292@pitrou.net> <506F5371.6080302@stoneleaf.us> <20121006014823.1fc46741@pitrou.net> <20121006023923.62545731@pitrou.net> Message-ID: <87fw5sqbc0.fsf@uwakimon.sk.tsukuba.ac.jp> Antoine Pitrou writes: > On Sat, 06 Oct 2012 01:27:49 +0100 > Richard Oudkerk > wrote: > > > On 06/10/2012 12:48am, Antoine Pitrou wrote: > > >>>> p = PureNTPath('c:/orders/12345/abc67890.dbf') > > >>>> >>>p.parent()[p.name.split('.')[0] + '.csv'] > > > PureNTPath('c:\\orders\\12345\\abc67890.csv') > > > > > > Any suggestion to ease this use case a bit? > > > > Maybe p.basename could be shorthand for p.name.split('.')[0]. > > Wouldn't there be some confusion with os.path.basename: > > >>> os.path.basename('a/b/c.ext') > 'c.ext' Not to mention standard Unix usage. GNU basename will allow you to specify a *particular* extension explicitly, which will be stripped if present and otherwise ignored. Eg, "basename a/b/c.ext ext" => "c." (note the period!) and "basename a/b/c ext" => "c". I don't know if that's an extension to POSIX. In any case, it would require basename to be a method rather than a property. > (sorry for the earlier, unfinished reply) Also there are applications where "basenames" contain periods (eg, wget often creates directories with names like "www.python.org"), and filenames may have multiple extensions, eg, "index.ja.html". I think it's reasonable to define "extension" to mean "the portion after the last period (if any, maybe including the period), but I think usage of the complementary concept is pretty application- specific. From stephen at xemacs.org Sat Oct 6 10:04:44 2012 From: stephen at xemacs.org (Stephen J. 
Turnbull) Date: Sat, 06 Oct 2012 17:04:44 +0900 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: <506FC4A8.9060009@stoneleaf.us> References: <20121005202534.5f721292@pitrou.net> <506F5371.6080302@stoneleaf.us> <20121006014823.1fc46741@pitrou.net> <506FC4A8.9060009@stoneleaf.us> Message-ID: <87ehlcqb4z.fsf@uwakimon.sk.tsukuba.ac.jp> Ethan Furman writes: > Eric Snow wrote: > > On Fri, Oct 5, 2012 at 5:48 PM, Antoine Pitrou wrote: > >> On Fri, 05 Oct 2012 14:38:57 -0700 > >> Ethan Furman wrote: > >>> Say I have a .dbf table as PureNTPath('c:\orders\12345\abc67890.dbf'), > >>> and I export it to .csv in the same folder; how would I transform the > >>> above PureNTPath's ext from 'dbf' to 'csv'? > >> Something like: > >> > >>>>> p = PureNTPath('c:/orders/12345/abc67890.dbf') > >>>>> p.parent()[p.name.split('.')[0] + '.csv'] > >> PureNTPath('c:\\orders\\12345\\abc67890.csv') > >> > >> Any suggestion to ease this use case a bit? > > > > Each namedtuple has a _replace() method that's is used to generate a > > new instance with one or more attributes changed. We could do > > something similar here: > > > >>>> p = PureNTPath('c:/orders/12345/abc67890.dbf') > >>>> p.replace(ext='.csv') > > PureNTPath('c:\\orders\\12345\\abc67890.csv') > > +1 How about a more general subst() method? Indeed, it would need keyword arguments for named components like ext, but I often do things like "mv ~/Maildir/{tmp,new}/42" in the shell. I think it would be useful to be able to replace any component of a path. From turnbull at sk.tsukuba.ac.jp Sat Oct 6 10:39:13 2012 From: turnbull at sk.tsukuba.ac.jp (Stephen J. Turnbull) Date: Sat, 06 Oct 2012 17:39:13 +0900 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: <20121005215520.19b63efe@pitrou.net> References: <20121005202534.5f721292@pitrou.net> <20121005215520.19b63efe@pitrou.net> Message-ID: <87d30wq9ji.fsf@uwakimon.sk.tsukuba.ac.jp> Antoine Pitrou writes: > On Fri, 5 Oct 2012 20:19:12 +0100 > Paul Moore wrote: > > On 5 October 2012 19:25, Antoine Pitrou wrote: > > > A path can be joined with another using the ``__getitem__`` operator:: > > > > > > >>> p = PurePosixPath('foo') > > > >>> p['bar'] > > > PurePosixPath('foo/bar') > > > >>> p[PurePosixPath('bar')] > > > PurePosixPath('foo/bar') > > > > There is a risk that this is too "cute". However, it's probably better > > than overloading the '/' operator, and you do need something > > short. I didn't like this much at first. However, if you think of this as a "collection" (cf. WebDAV), then the bracket notation is the obvious way to do it in Python (FVO "it" == "accessing a member of a collection by name"). I wonder if there is a need to distinguish between a path naming a directory as a collection, and as a file itself? Or can/should this be implicit (wash my mouth out with soap!) in the operation using the Path? > Someone else proposed overloading '+', which would be confusing > since we need to be able to combine paths and regular strings, for > ease of use. Is it really that obnoxious to write "p + Path('bar')" (where p is a Path)? What about the case "'bar' + p"? Since Python isn't C, you can't express that as "'bar'[p]"! > The point of using __getitem__ is that you get an error if you replace > the Path object with a regular string by mistake: > > > > As with constructing, multiple path components can be specified at once:: > > > > > > >>> p['bar/xyzzy'] > > > PurePosixPath('foo/bar/xyzzy') > > > > That's risky. 
Are you proposing always using '/' regardless of OS? I'd > > have expected os.sep (so \ on Windows). > > Both '/' and '\\' are accepted as path separators under Windows. Under > Unix, '\\' is a regular character: That's outright ugly, especially from the "collections" point of view (foo/bar/xyzzy is not a member of foo). If you want something that doesn't suffer from the bogosities of os.path, this kind of platform- dependence should be avoided, I think. > > Also, there is no good terminology in current use here. The only > > concrete thing I can suggest is that "root" would be better used as > > the term for what you're calling "anchor" as Windows users would > > expect the root of "C:\foo\bar\baz" to be "C:\". > > But then the root of "C:foo" would be "C:", which sounds wrong: > "C:" isn't a root at all. Why not interpret the root of "C:foo" to be None? The Windows user can still get "C:" as the drive, and I don't think that will be surprising to them. > > But there's no really simple answer - Windows and Unix are just > > different here. > > Yes, and Unix users are expecting something simpler than what's going on > under Windows ;) Well, Unix users can do things more uniformly. But there's also a lot of complexity going on under the hood. Every file system has a root, of which only one is named "/". I don't know if Python programs ever need that information (I never have :-), but it would be nice to leave room for extension. Similarly, many "file systems" are actually just hierarchically organized database access methods with no physical existence on hardware. I wonder if "mount_point" is sufficiently general to include the roots of real local file systems, remote file systems, Windows drives, and pseudo file systems? An obvious problem is that Windows users would not find that terminology natural. From stephen at xemacs.org Sat Oct 6 12:09:05 2012 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Sat, 06 Oct 2012 19:09:05 +0900 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: <20121005202534.5f721292@pitrou.net> References: <20121005202534.5f721292@pitrou.net> Message-ID: <87bogfvrni.fsf@uwakimon.sk.tsukuba.ac.jp> Antoine Pitrou writes: > ``relative()`` returns a new relative path by stripping the drive and root:: Does this have use cases so common that it deserves a convenience method? I would expect "relative" to require an argument. (Ie, I would expect it to have the semantics of "relative_to".) Or is the issue that you can't count on PureNTPath(p).relative_to('C:\\') to DTRT? Maybe the From p.f.moore at gmail.com Sat Oct 6 12:24:01 2012 From: p.f.moore at gmail.com (Paul Moore) Date: Sat, 6 Oct 2012 11:24:01 +0100 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: <87d30wq9ji.fsf@uwakimon.sk.tsukuba.ac.jp> References: <20121005202534.5f721292@pitrou.net> <20121005215520.19b63efe@pitrou.net> <87d30wq9ji.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On 6 October 2012 09:39, Stephen J. Turnbull wrote: > I wonder if "mount_point" is sufficiently general to include the roots > of real local file systems, remote file systems, Windows drives, and > pseudo file systems? An obvious problem is that Windows users would > not find that terminology natural. Technically, newer versions of Windows (Vista and later, I think) allow you to mount a drive on a directory rather than a drive letter, just like Unix. 
Although I'm not sure I've ever seen it done, and I don't know if there are suitable API calls to determine if a directory is a mount point (I guess there must be). An ugly, but viable, approach would be to have drive and mount_point properties, which are synonyms. Paul. From p.f.moore at gmail.com Sat Oct 6 12:27:58 2012 From: p.f.moore at gmail.com (Paul Moore) Date: Sat, 6 Oct 2012 11:27:58 +0100 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: <87bogfvrni.fsf@uwakimon.sk.tsukuba.ac.jp> References: <20121005202534.5f721292@pitrou.net> <87bogfvrni.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On 6 October 2012 11:09, Stephen J. Turnbull wrote: > Antoine Pitrou writes: > > > ``relative()`` returns a new relative path by stripping the drive and root:: > > Does this have use cases so common that it deserves a convenience > method? Agreed. > I would expect "relative" to require an argument. (Ie, I > would expect it to have the semantics of "relative_to".) I agree that's what I thought relative() would be when I first read the name. > Or is the > issue that you can't count on PureNTPath(p).relative_to('C:\\') to > DTRT? It seems to me that if p isn't on drive C:, then the right thing is clearly to raise an exception. No ambiguity there - although Unix users might well write code that doesn't allow for exceptions from the method, just because it's not a possible result on Unix. Having it documented might help raise awareness of the possibility, though. And that's about the best you can hope for. Paul. From mark at hotpy.org Sat Oct 6 12:49:35 2012 From: mark at hotpy.org (Mark Shannon) Date: Sat, 06 Oct 2012 11:49:35 +0100 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: <20121005202534.5f721292@pitrou.net> References: <20121005202534.5f721292@pitrou.net> Message-ID: <50700CBF.8060109@hotpy.org> Just to add my 2p's worth. On 05/10/12 19:25, Antoine Pitrou wrote: > > Hello, > > This PEP is a resurrection of the idea of having object-oriented > filesystem paths in the stdlib. It comes with a general API proposal > as well as a specific implementation (*). The implementation is young > and discussion is quite open. > > (*) http://pypi.python.org/pypi/pathlib/ > > Regards > > Antoine. > > PS: You can all admire my ASCII-art skills. > In general I like it. > > Class hierarchy > --------------- Lovely ASCII art work :) but it does have have the n*m problem of such hierarchies. N types of file: file, directory, mount-point, drive, root, etc, etc and M implementations Posix, NT, linux, OSX, network, database, etc, etc I would prefer duck-typing. Add ABCs for all the N types of file and use concrete classes for the actual filesystems That way there are N+M rather than N*M classes. Although I'm generally against operator overloading, would the // operator be better than the // operator as it is more rarely used and more visually distinctive? Cheers, Mark. From solipsis at pitrou.net Sat Oct 6 14:06:52 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sat, 6 Oct 2012 14:06:52 +0200 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths References: <20121005202534.5f721292@pitrou.net> <20121005215520.19b63efe@pitrou.net> <87d30wq9ji.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <20121006140652.630794f4@pitrou.net> On Sat, 06 Oct 2012 17:39:13 +0900 "Stephen J. Turnbull" wrote: > > I wonder if there is a need to distinguish between a path naming a > directory as a collection, and as a file itself? 
Or can/should this > be implicit (wash my mouth out with soap!) in the operation using the > Path? I don't think there's a need to distinguish. Trying to access /etc/passwd/somefile will simply raise an error on I/O. > > Someone else proposed overloading '+', which would be confusing > > since we need to be able to combine paths and regular strings, for > > ease of use. > > Is it really that obnoxious to write "p + Path('bar')" (where p is a > Path)? > > What about the case "'bar' + p"? Since Python isn't C, you can't > express that as "'bar'[p]"! The issue I envision is if you write `p + "bar"`, thinking p is a Path, and p is actually a str object. It won't raise, but give you the wrong result. > > Both '/' and '\\' are accepted as path separators under Windows. Under > > Unix, '\\' is a regular character: > > That's outright ugly, especially from the "collections" point of view > (foo/bar/xyzzy is not a member of foo). If you want something that > doesn't suffer from the bogosities of os.path, this kind of platform- > dependence should be avoided, I think. Well, you do want to be able to convert str paths to Path objects without handling path separator conversion by hand. It's a matter of practicality. > > > Also, there is no good terminology in current use here. The only > > > concrete thing I can suggest is that "root" would be better used as > > > the term for what you're calling "anchor" as Windows users would > > > expect the root of "C:\foo\bar\baz" to be "C:\". > > > > But then the root of "C:foo" would be "C:", which sounds wrong: > > "C:" isn't a root at all. > > Why not interpret the root of "C:foo" to be None? The Windows user > can still get "C:" as the drive, and I don't think that will be > surprising to them. That's a possibility indeed. I'd like to have feedback from more Windows users about your suggestion: >>> PureNTPath('c:foo').root '' >>> PureNTPath('c:\\foo').root 'c:\\' which would also give the following for UNC paths: >>> PureNTPath('//network/share/foo/bar').root '\\\\network\\share\\' > I wonder if "mount_point" is sufficiently general to include the roots > of real local file systems, remote file systems, Windows drives, and > pseudo file systems? An obvious problem is that Windows users would > not find that terminology natural. Another is that finding mount points is I/O, while finding the root is a purely lexical operation. Regards Antoine. -- Software development and contracting: http://pro.pitrou.net From solipsis at pitrou.net Sat Oct 6 14:09:24 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sat, 6 Oct 2012 14:09:24 +0200 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths References: <20121005202534.5f721292@pitrou.net> <506F5371.6080302@stoneleaf.us> <20121006014823.1fc46741@pitrou.net> Message-ID: <20121006140924.3bfdb710@pitrou.net> On Fri, 5 Oct 2012 23:16:55 -0600 Eric Snow wrote: > On Fri, Oct 5, 2012 at 5:48 PM, Antoine Pitrou wrote: > > On Fri, 05 Oct 2012 14:38:57 -0700 > > Ethan Furman wrote: > >> > >> Say I have a .dbf table as PureNTPath('c:\orders\12345\abc67890.dbf'), > >> and I export it to .csv in the same folder; how would I transform the > >> above PureNTPath's ext from 'dbf' to 'csv'? > > > > Something like: > > > >>>> p = PureNTPath('c:/orders/12345/abc67890.dbf') > >>>> p.parent()[p.name.split('.')[0] + '.csv'] > > PureNTPath('c:\\orders\\12345\\abc67890.csv') > > > > Any suggestion to ease this use case a bit? 
> > Each namedtuple has a _replace() method that's is used to generate a > new instance with one or more attributes changed. We could do > something similar here: The concrete Path objects' replace() method already maps to os.replace(). Note os.replace() is new in 3.3 and is a portable always-overwriting alternative to os.rename(): http://docs.python.org/dev/library/os.html#os.replace Regards Antoine. -- Software development and contracting: http://pro.pitrou.net From solipsis at pitrou.net Sat Oct 6 14:18:58 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sat, 6 Oct 2012 14:18:58 +0200 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths References: <20121005202534.5f721292@pitrou.net> <87bogfvrni.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <20121006141858.73b42c38@pitrou.net> On Sat, 6 Oct 2012 11:27:58 +0100 Paul Moore wrote: > > > I would expect "relative" to require an argument. (Ie, I > > would expect it to have the semantics of "relative_to".) > > I agree that's what I thought relative() would be when I first read the name. You are right, relative() could be removed and replaced with the current relative_to() method. I wasn't sure about how these names would feel to a native English speaker. > > Or is the > > issue that you can't count on PureNTPath(p).relative_to('C:\\') to > > DTRT? > > It seems to me that if p isn't on drive C:, then the right thing is > clearly to raise an exception. Indeed: >>> PureNTPath('/foo').relative_to('c:/foo') Traceback (most recent call last): File "", line 1, in File "pathlib.py", line 894, in relative_to .format(str(self), str(formatted))) ValueError: '\\foo' does not start with 'c:\\foo' > No ambiguity there - although Unix > users might well write code that doesn't allow for exceptions from the > method, just because it's not a possible result on Unix. Actually, it can raise too: >>> PurePosixPath('/usr').relative_to('/usr/lib') Traceback (most recent call last): File "", line 1, in File "pathlib.py", line 894, in relative_to .format(str(self), str(formatted))) ValueError: '/usr' does not start with '/usr/lib' You can't really add '..' components and expect the result to be correct, for example if '/usr/lib' is a symlink to '/lib', then '/usr/lib/..' is '/', not /usr'. That's why the resolve() method, which resolves symlinks along the path, is the only one allowed to muck with '..' components. Regards Antoine. -- Software development and contracting: http://pro.pitrou.net From solipsis at pitrou.net Sat Oct 6 14:25:29 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sat, 6 Oct 2012 14:25:29 +0200 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths References: <20121005202534.5f721292@pitrou.net> <50700CBF.8060109@hotpy.org> Message-ID: <20121006142529.7ea5c8b6@pitrou.net> Hello Mark, On Sat, 06 Oct 2012 11:49:35 +0100 Mark Shannon wrote: > > > > Class hierarchy > > --------------- > > Lovely ASCII art work :) > but it does have have the n*m problem of such hierarchies. > N types of file: > file, directory, mount-point, drive, root, etc, etc > and M implementations > Posix, NT, linux, OSX, network, database, etc, etc There is no distinction per "type of file": files, directories, etc. all share the same implementation. So you only have a per-flavour distinction (Posix / NT). > I would prefer duck-typing. 
> Add ABCs for all the N types of file and use concrete classes for the > actual filesystems It seems to me that "duck typing" and "ABCs" are mutually exclusive, kind of :) > Although I'm generally against operator overloading, would the // > operator be better than the // operator as it is more rarely used and > more visually distinctive? You mean "would the / operator be better than the [] operator"? I didn't choose / at first because I knew this choice would be quite contentious. However, if there happens to be a strong majority in its favour, why not. Regards Antoine. -- Software development and contracting: http://pro.pitrou.net From phd at phdru.name Sat Oct 6 14:26:42 2012 From: phd at phdru.name (Oleg Broytman) Date: Sat, 6 Oct 2012 16:26:42 +0400 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: <20121006140924.3bfdb710@pitrou.net> References: <20121005202534.5f721292@pitrou.net> <506F5371.6080302@stoneleaf.us> <20121006014823.1fc46741@pitrou.net> <20121006140924.3bfdb710@pitrou.net> Message-ID: <20121006122642.GA15492@iskra.aviel.ru> On Sat, Oct 06, 2012 at 02:09:24PM +0200, Antoine Pitrou wrote: > On Fri, 5 Oct 2012 23:16:55 -0600 > Eric Snow > wrote: > > Each namedtuple has a _replace() method that's is used to generate a > > new instance with one or more attributes changed. We could do > > something similar here: > > The concrete Path objects' replace() method already maps to > os.replace(). Call it "with": newpath = path.with_drive('C:') newpath = path.with_name('newname') newpath = path.with_ext('.zip') Oleg. -- Oleg Broytman http://phdru.name/ phd at phdru.name Programmers don't die, they just GOSUB without RETURN. From phd at phdru.name Sat Oct 6 14:40:49 2012 From: phd at phdru.name (Oleg Broytman) Date: Sat, 6 Oct 2012 16:40:49 +0400 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: <20121006122642.GA15492@iskra.aviel.ru> References: <20121005202534.5f721292@pitrou.net> <506F5371.6080302@stoneleaf.us> <20121006014823.1fc46741@pitrou.net> <20121006140924.3bfdb710@pitrou.net> <20121006122642.GA15492@iskra.aviel.ru> Message-ID: <20121006124049.GC16843@iskra.aviel.ru> On Sat, Oct 06, 2012 at 04:26:42PM +0400, Oleg Broytman wrote: > newpath = path.with_drive('C:') > newpath = path.with_name('newname') > newpath = path.with_ext('.zip') BTW, I think having these three -- replacing drive, name and extension -- is enough. Oleg. -- Oleg Broytman http://phdru.name/ phd at phdru.name Programmers don't die, they just GOSUB without RETURN. From solipsis at pitrou.net Sat Oct 6 14:46:35 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sat, 6 Oct 2012 14:46:35 +0200 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths References: <20121005202534.5f721292@pitrou.net> <506F5371.6080302@stoneleaf.us> <20121006014823.1fc46741@pitrou.net> <20121006140924.3bfdb710@pitrou.net> <20121006122642.GA15492@iskra.aviel.ru> <20121006124049.GC16843@iskra.aviel.ru> Message-ID: <20121006144635.34f21f84@pitrou.net> On Sat, 6 Oct 2012 16:40:49 +0400 Oleg Broytman wrote: > On Sat, Oct 06, 2012 at 04:26:42PM +0400, Oleg Broytman wrote: > > newpath = path.with_drive('C:') > > newpath = path.with_name('newname') > > newpath = path.with_ext('.zip') > > BTW, I think having these three -- replacing drive, name and extension -- > is enough. What is the point of replacing the drive? 
Replacing the name is already trivial: path.parent()[newname] So we only need to replace the "basename" and the extension (I think I'm ok with the "basename" terminology now :-)). Regards Antoine. -- Software development and contracting: http://pro.pitrou.net From g.brandl at gmx.net Sat Oct 6 14:55:16 2012 From: g.brandl at gmx.net (Georg Brandl) Date: Sat, 06 Oct 2012 14:55:16 +0200 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: <20121006144635.34f21f84@pitrou.net> References: <20121005202534.5f721292@pitrou.net> <506F5371.6080302@stoneleaf.us> <20121006014823.1fc46741@pitrou.net> <20121006140924.3bfdb710@pitrou.net> <20121006122642.GA15492@iskra.aviel.ru> <20121006124049.GC16843@iskra.aviel.ru> <20121006144635.34f21f84@pitrou.net> Message-ID: Am 06.10.2012 14:46, schrieb Antoine Pitrou: > On Sat, 6 Oct 2012 16:40:49 +0400 > Oleg Broytman wrote: >> On Sat, Oct 06, 2012 at 04:26:42PM +0400, Oleg Broytman wrote: >> > newpath = path.with_drive('C:') >> > newpath = path.with_name('newname') >> > newpath = path.with_ext('.zip') >> >> BTW, I think having these three -- replacing drive, name and extension -- >> is enough. > > What is the point of replacing the drive? > > Replacing the name is already trivial: path.parent()[newname] > > So we only need to replace the "basename" and the extension (I think > I'm ok with the "basename" terminology now :-)). If my crystal ball is correct, the middle example above replaces not the basename but the "part before the extension". So we have to find another name for it ... Georg From phd at phdru.name Sat Oct 6 14:52:27 2012 From: phd at phdru.name (Oleg Broytman) Date: Sat, 6 Oct 2012 16:52:27 +0400 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: <20121006144635.34f21f84@pitrou.net> References: <20121005202534.5f721292@pitrou.net> <506F5371.6080302@stoneleaf.us> <20121006014823.1fc46741@pitrou.net> <20121006140924.3bfdb710@pitrou.net> <20121006122642.GA15492@iskra.aviel.ru> <20121006124049.GC16843@iskra.aviel.ru> <20121006144635.34f21f84@pitrou.net> Message-ID: <20121006125227.GA17128@iskra.aviel.ru> On Sat, Oct 06, 2012 at 02:46:35PM +0200, Antoine Pitrou wrote: > On Sat, 6 Oct 2012 16:40:49 +0400 > Oleg Broytman wrote: > > On Sat, Oct 06, 2012 at 04:26:42PM +0400, Oleg Broytman wrote: > > > newpath = path.with_drive('C:') > > > newpath = path.with_name('newname') > > > newpath = path.with_ext('.zip') > > > > BTW, I think having these three -- replacing drive, name and extension -- > > is enough. > > What is the point of replacing the drive? > > Replacing the name is already trivial: path.parent()[newname] > > So we only need to replace the "basename" and the extension (I think > I'm ok with the "basename" terminology now :-)). I'm ok with that. Oleg. -- Oleg Broytman http://phdru.name/ phd at phdru.name Programmers don't die, they just GOSUB without RETURN. 
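A minimal sketch of the intended semantics of such an extension-replacing method, written with today's os.path purely to nail down what it should do (the helper name here is hypothetical, not part of the proposal's API):

>>> import os.path
>>> def with_ext(path, new_ext):
...     # hypothetical helper: swap only the last extension, keep everything else
...     stem, _old = os.path.splitext(path)
...     return stem + new_ext
...
>>> with_ext('c:/orders/12345/abc67890.dbf', '.csv')
'c:/orders/12345/abc67890.csv'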
From solipsis at pitrou.net Sat Oct 6 14:57:44 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sat, 6 Oct 2012 14:57:44 +0200 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths References: <20121005202534.5f721292@pitrou.net> <506F5371.6080302@stoneleaf.us> <20121006014823.1fc46741@pitrou.net> <20121006140924.3bfdb710@pitrou.net> <20121006122642.GA15492@iskra.aviel.ru> <20121006124049.GC16843@iskra.aviel.ru> <20121006144635.34f21f84@pitrou.net> Message-ID: <20121006145744.19e3789c@pitrou.net> On Sat, 06 Oct 2012 14:55:16 +0200 Georg Brandl wrote: > Am 06.10.2012 14:46, schrieb Antoine Pitrou: > > On Sat, 6 Oct 2012 16:40:49 +0400 > > Oleg Broytman wrote: > >> On Sat, Oct 06, 2012 at 04:26:42PM +0400, Oleg Broytman wrote: > >> > newpath = path.with_drive('C:') > >> > newpath = path.with_name('newname') > >> > newpath = path.with_ext('.zip') > >> > >> BTW, I think having these three -- replacing drive, name and extension -- > >> is enough. > > > > What is the point of replacing the drive? > > > > Replacing the name is already trivial: path.parent()[newname] > > > > So we only need to replace the "basename" and the extension (I think > > I'm ok with the "basename" terminology now :-)). > > If my crystal ball is correct, the middle example above replaces not the > basename but the "part before the extension". So we have to find another > name for it ... Well, "basename" is the name proposed for the "part before the extension". "name" is the full filename. (so path.name == path.basename + path.ext) Regards Antoine. -- Software development and contracting: http://pro.pitrou.net From g.brandl at gmx.net Sat Oct 6 15:08:27 2012 From: g.brandl at gmx.net (Georg Brandl) Date: Sat, 06 Oct 2012 15:08:27 +0200 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: <20121006145744.19e3789c@pitrou.net> References: <20121005202534.5f721292@pitrou.net> <506F5371.6080302@stoneleaf.us> <20121006014823.1fc46741@pitrou.net> <20121006140924.3bfdb710@pitrou.net> <20121006122642.GA15492@iskra.aviel.ru> <20121006124049.GC16843@iskra.aviel.ru> <20121006144635.34f21f84@pitrou.net> <20121006145744.19e3789c@pitrou.net> Message-ID: Am 06.10.2012 14:57, schrieb Antoine Pitrou: > On Sat, 06 Oct 2012 14:55:16 +0200 > Georg Brandl wrote: >> Am 06.10.2012 14:46, schrieb Antoine Pitrou: >> > On Sat, 6 Oct 2012 16:40:49 +0400 >> > Oleg Broytman wrote: >> >> On Sat, Oct 06, 2012 at 04:26:42PM +0400, Oleg Broytman wrote: >> >> > newpath = path.with_drive('C:') >> >> > newpath = path.with_name('newname') >> >> > newpath = path.with_ext('.zip') >> >> >> >> BTW, I think having these three -- replacing drive, name and extension -- >> >> is enough. >> > >> > What is the point of replacing the drive? >> > >> > Replacing the name is already trivial: path.parent()[newname] >> > >> > So we only need to replace the "basename" and the extension (I think >> > I'm ok with the "basename" terminology now :-)). >> >> If my crystal ball is correct, the middle example above replaces not the >> basename but the "part before the extension". So we have to find another >> name for it ... > > Well, "basename" is the name proposed for the "part before the > extension". "name" is the full filename. > > (so path.name == path.basename + path.ext) Is it? You said yourself it was easily confused with os.path.basename()'s result. 
Georg From mark at hotpy.org Sat Oct 6 15:08:31 2012 From: mark at hotpy.org (Mark Shannon) Date: Sat, 06 Oct 2012 14:08:31 +0100 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: <20121006142529.7ea5c8b6@pitrou.net> References: <20121005202534.5f721292@pitrou.net> <50700CBF.8060109@hotpy.org> <20121006142529.7ea5c8b6@pitrou.net> Message-ID: <50702D4F.7070202@hotpy.org> On 06/10/12 13:25, Antoine Pitrou wrote: > > Hello Mark, > > On Sat, 06 Oct 2012 11:49:35 +0100 > Mark Shannon wrote: >>> >>> Class hierarchy >>> --------------- >> >> Lovely ASCII art work :) >> but it does have have the n*m problem of such hierarchies. >> N types of file: >> file, directory, mount-point, drive, root, etc, etc >> and M implementations >> Posix, NT, linux, OSX, network, database, etc, etc > > There is no distinction per "type of file": files, directories, etc. > all share the same implementation. So you only have a per-flavour > distinction (Posix / NT). > >> I would prefer duck-typing. >> Add ABCs for all the N types of file and use concrete classes for the >> actual filesystems > > It seems to me that "duck typing" and "ABCs" are mutually exclusive, > kind of :) > >> Although I'm generally against operator overloading, would the // >> operator be better than the // operator as it is more rarely used and >> more visually distinctive? > > You mean "would the / operator be better than the [] operator"? Actually I did mean the '//' (floor division) operator as it would stand out more than '/'. It is just something for you to consider (in case you didn't have enough possibilities already :) ) > > I didn't choose / at first because I knew this choice would be quite > contentious. However, if there happens to be a strong majority in its > favour, why not. > > Regards > > Antoine. > > From solipsis at pitrou.net Sat Oct 6 15:42:28 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sat, 6 Oct 2012 15:42:28 +0200 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths References: <20121005202534.5f721292@pitrou.net> <506F5371.6080302@stoneleaf.us> <20121006014823.1fc46741@pitrou.net> <20121006140924.3bfdb710@pitrou.net> <20121006122642.GA15492@iskra.aviel.ru> <20121006124049.GC16843@iskra.aviel.ru> <20121006144635.34f21f84@pitrou.net> <20121006145744.19e3789c@pitrou.net> Message-ID: <20121006154228.0b2a6087@pitrou.net> On Sat, 06 Oct 2012 15:08:27 +0200 Georg Brandl wrote: > > > > Well, "basename" is the name proposed for the "part before the > > extension". "name" is the full filename. > > > > (so path.name == path.basename + path.ext) > > Is it? You said yourself it was easily confused with os.path.basename()'s result. True, but since we already have the name attribute it stands reasonable for basename to mean something else than name :-) Do you have another suggestion? Regards Antoine. 
-- Software development and contracting: http://pro.pitrou.net From ubershmekel at gmail.com Sat Oct 6 15:49:49 2012 From: ubershmekel at gmail.com (Yuval Greenfield) Date: Sat, 6 Oct 2012 15:49:49 +0200 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: <20121006154228.0b2a6087@pitrou.net> References: <20121005202534.5f721292@pitrou.net> <506F5371.6080302@stoneleaf.us> <20121006014823.1fc46741@pitrou.net> <20121006140924.3bfdb710@pitrou.net> <20121006122642.GA15492@iskra.aviel.ru> <20121006124049.GC16843@iskra.aviel.ru> <20121006144635.34f21f84@pitrou.net> <20121006145744.19e3789c@pitrou.net> <20121006154228.0b2a6087@pitrou.net> Message-ID: On Sat, Oct 6, 2012 at 3:42 PM, Antoine Pitrou wrote: > On Sat, 06 Oct 2012 15:08:27 +0200 > Georg Brandl wrote: > > > > > > Well, "basename" is the name proposed for the "part before the > > > extension". "name" is the full filename. > > > > > > (so path.name == path.basename + path.ext) > > > > Is it? You said yourself it was easily confused with > os.path.basename()'s result. > > True, but since we already have the name attribute it stands reasonable > for basename to mean something else than name :-) > Do you have another suggestion? > > It appears "base name" or "base" is the convention for the part before the extension. http://en.wikipedia.org/wiki/Filename Perhaps os.path.basename should be deprecated in favor of a better named function one day. But that's probably for a different thread. -------------- next part -------------- An HTML attachment was scrubbed... URL: From phd at phdru.name Sat Oct 6 16:01:16 2012 From: phd at phdru.name (Oleg Broytman) Date: Sat, 6 Oct 2012 18:01:16 +0400 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: References: <20121006140924.3bfdb710@pitrou.net> <20121006122642.GA15492@iskra.aviel.ru> <20121006124049.GC16843@iskra.aviel.ru> <20121006144635.34f21f84@pitrou.net> <20121006145744.19e3789c@pitrou.net> <20121006154228.0b2a6087@pitrou.net> Message-ID: <20121006140116.GA18535@iskra.aviel.ru> On Sat, Oct 06, 2012 at 03:49:49PM +0200, Yuval Greenfield wrote: > Perhaps os.path.basename should be deprecated in favor of a better named > function one day. But that's probably for a different thread. That's certainly for a different Python. os.path.basename cannot be renamed because: 1) it's used in millions of programs; 2) it's in line with GNU tools. Oleg. -- Oleg Broytman http://phdru.name/ phd at phdru.name Programmers don't die, they just GOSUB without RETURN. From g.brandl at gmx.net Sat Oct 6 16:47:06 2012 From: g.brandl at gmx.net (Georg Brandl) Date: Sat, 06 Oct 2012 16:47:06 +0200 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: <20121006154228.0b2a6087@pitrou.net> References: <20121005202534.5f721292@pitrou.net> <506F5371.6080302@stoneleaf.us> <20121006014823.1fc46741@pitrou.net> <20121006140924.3bfdb710@pitrou.net> <20121006122642.GA15492@iskra.aviel.ru> <20121006124049.GC16843@iskra.aviel.ru> <20121006144635.34f21f84@pitrou.net> <20121006145744.19e3789c@pitrou.net> <20121006154228.0b2a6087@pitrou.net> Message-ID: Am 06.10.2012 15:42, schrieb Antoine Pitrou: > On Sat, 06 Oct 2012 15:08:27 +0200 > Georg Brandl wrote: >> > >> > Well, "basename" is the name proposed for the "part before the >> > extension". "name" is the full filename. >> > >> > (so path.name == path.basename + path.ext) >> >> Is it? You said yourself it was easily confused with os.path.basename()'s result. 
> > True, but since we already have the name attribute it stands reasonable > for basename to mean something else than name :-) > Do you have another suggestion? Not really. I'd prefer "base" or "namebase" though, to at least have a tiny bit of difference. Georg From stephen at xemacs.org Sat Oct 6 16:49:25 2012 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Sat, 06 Oct 2012 23:49:25 +0900 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: <20121006140652.630794f4@pitrou.net> References: <20121005202534.5f721292@pitrou.net> <20121005215520.19b63efe@pitrou.net> <87d30wq9ji.fsf@uwakimon.sk.tsukuba.ac.jp> <20121006140652.630794f4@pitrou.net> Message-ID: <87ehlb4pvu.fsf@uwakimon.sk.tsukuba.ac.jp> Antoine Pitrou writes: > > > Someone else proposed overloading '+', which would be confusing > > > since we need to be able to combine paths and regular strings, for > > > ease of use. > > > > Is it really that obnoxious to write "p + Path('bar')" (where p is a > > Path)? > > > > What about the case "'bar' + p"? Since Python isn't C, you can't > > express that as "'bar'[p]"! > > The issue I envision is if you write `p + "bar"`, thinking p is a Path, > and p is actually a str object. It won't raise, but give you the wrong > result. No, my point is that for me prepending new segments is quite common, though not as common as appending them. The asymmetry of the bracket operator means that there's no easy way to deal with that. On the other hand, `p + Path('foo')` and `Path('foo') + p` (where p is a Path, not a string) both seem reasonable to me. It's true that one could screw up as you suggest, but that requires *two* mistakes, first thinking that p is a Path when it's a string, and then forgetting to convert 'bar' to Path. I don't think that's very likely if you don't allow mixing strings and Paths without explicit conversion. > > > Both '/' and '\\' are accepted as path separators under Windows. Under > > > Unix, '\\' is a regular character: > > > > That's outright ugly, especially from the "collections" point of view > > (foo/bar/xyzzy is not a member of foo). If you want something that > > doesn't suffer from the bogosities of os.path, this kind of platform- > > dependence should be avoided, I think. > > Well, you do want to be able to convert str paths to Path objects > without handling path separator conversion by hand. It's a matter of > practicality. Sorry, cut too much context. I was referring to the use of path['foo/bar'] where path['foo', 'bar'] will do. Of course overloading the constructor is an obvious thing to do. From ironfroggy at gmail.com Sat Oct 6 18:14:40 2012 From: ironfroggy at gmail.com (Calvin Spealman) Date: Sat, 6 Oct 2012 12:14:40 -0400 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: <20121005202534.5f721292@pitrou.net> References: <20121005202534.5f721292@pitrou.net> Message-ID: Responding late, but I didn't get a chance to get my very strong feelings on this proposal in yesterday. I do not like it. I'll give full disclosure and say that I think our earlier failure to include the path library in the stdlib has been a loss for Python and I'll always hope we can fix that one day. I still hold out hope. It feels like this proposal is "make it object oriented, because object oriented is good" without any actual justification or obvious problem this solves. The API looks clunky and redundant, and does not appear to actually improve anything over the facilities in the os.path module. 
This takes a lot of things we can already do with paths and files and remixes them into a not-so intuitive API for the sake of change, not for the sake of solving a real problem. As for specific problems I have with the proposal: Frankly, I think not keeping the / operator for joining is a huge mistake. This is the number one best feature of path and despite that many people don't like it, it makes sense. It makes our most common path operation read very close to the actual representation of the what you're creating. This is great. Not inheriting from str means that we can't directly path these path objects to existing code that just expects a string, so we have a really hard boundary around the edges of this new API. It does not lend itself well to incrementally transitioning to it from existing code. The stat operations and other file-facilities tacked on feel out of place, and limited. Why does it make sense to add these facilities to path and not other file operations? Why not give me a read method on paths? or maybe a copy? Putting lots of file facilities on a path object feels wrong because you can't extend it easily. This is one place that function(thing) works better than thing.function() Overall, I'm completely -1 on the whole thing. On Fri, Oct 5, 2012 at 2:25 PM, Antoine Pitrou wrote: > > Hello, > > This PEP is a resurrection of the idea of having object-oriented > filesystem paths in the stdlib. It comes with a general API proposal > as well as a specific implementation (*). The implementation is young > and discussion is quite open. > > (*) http://pypi.python.org/pypi/pathlib/ > > Regards > > Antoine. > > PS: You can all admire my ASCII-art skills. > > > > PEP: 428 > Title: The pathlib module -- object-oriented filesystem paths > Version: $Revision$ > Last-Modified: $Date > Author: Antoine Pitrou > Status: Draft > Type: Standards Track > Content-Type: text/x-rst > Created: 30-July-2012 > Python-Version: 3.4 > Post-History: > > > Abstract > ======== > > This PEP proposes the inclusion of a third-party module, `pathlib`_, in > the standard library. The inclusion is proposed under the provisional > label, as described in :pep:`411`. Therefore, API changes can be done, > either as part of the PEP process, or after acceptance in the standard > library (and until the provisional label is removed). > > The aim of this library is to provide a simple hierarchy of classes to > handle filesystem paths and the common operations users do over them. > > .. _`pathlib`: http://pypi.python.org/pypi/pathlib/ > > > Related work > ============ > > An object-oriented API for filesystem paths has already been proposed > and rejected in :pep:`355`. Several third-party implementations of the > idea of object-oriented filesystem paths exist in the wild: > > * The historical `path.py module`_ by Jason Orendorff, Jason R. Coombs > and others, which provides a ``str``-subclassing ``Path`` class; > > * Twisted's slightly specialized `FilePath class`_; > > * An `AlternativePathClass proposal`_, subclassing ``tuple`` rather than > ``str``; > > * `Unipath`_, a variation on the str-subclassing approach with two public > classes, an ``AbstractPath`` class for operations which don't do I/O and a > ``Path`` class for all common operations. > > This proposal attempts to learn from these previous attempts and the > rejection of :pep:`355`. > > > .. _`path.py module`: https://github.com/jaraco/path.py > .. _`FilePath class`: http://twistedmatrix.com/documents/current/api/twisted.python.filepath.FilePath.html > .. 
_`AlternativePathClass proposal`: http://wiki.python.org/moin/AlternativePathClass > .. _`Unipath`: https://bitbucket.org/sluggo/unipath/overview > > > Why an object-oriented API > ========================== > > The rationale to represent filesystem paths using dedicated classes is the > same as for other kinds of stateless objects, such as dates, times or IP > addresses. Python has been slowly moving away from strictly replicating > the C language's APIs to providing better, more helpful abstractions around > all kinds of common functionality. Even if this PEP isn't accepted, it is > likely that another form of filesystem handling abstraction will be adopted > one day into the standard library. > > Indeed, many people will prefer handling dates and times using the high-level > objects provided by the ``datetime`` module, rather than using numeric > timestamps and the ``time`` module API. Moreover, using a dedicated class > allows to enable desirable behaviours by default, for example the case > insensitivity of Windows paths. > > > Proposal > ======== > > Class hierarchy > --------------- > > The `pathlib`_ module implements a simple hierarchy of classes:: > > +----------+ > | | > ---------| PurePath |-------- > | | | | > | +----------+ | > | | | > | | | > v | v > +---------------+ | +------------+ > | | | | | > | PurePosixPath | | | PureNTPath | > | | | | | > +---------------+ | +------------+ > | v | > | +------+ | > | | | | > | -------| Path |------ | > | | | | | | > | | +------+ | | > | | | | > | | | | > v v v v > +-----------+ +--------+ > | | | | > | PosixPath | | NTPath | > | | | | > +-----------+ +--------+ > > > This hierarchy divides path classes along two dimensions: > > * a path class can be either pure or concrete: pure classes support only > operations that don't need to do any actual I/O, which are most path > manipulation operations; concrete classes support all the operations > of pure classes, plus operations that do I/O. > > * a path class is of a given flavour according to the kind of operating > system paths it represents. `pathlib`_ implements two flavours: NT paths > for the filesystem semantics embodied in Windows systems, POSIX paths for > other systems (``os.name``'s terminology is re-used here). > > Any pure class can be instantiated on any system: for example, you can > manipulate ``PurePosixPath`` objects under Windows, ``PureNTPath`` objects > under Unix, and so on. However, concrete classes can only be instantiated > on a matching system: indeed, it would be error-prone to start doing I/O > with ``NTPath`` objects under Unix, or vice-versa. > > Furthermore, there are two base classes which also act as system-dependent > factories: ``PurePath`` will instantiate either a ``PurePosixPath`` or a > ``PureNTPath`` depending on the operating system. Similarly, ``Path`` > will instantiate either a ``PosixPath`` or a ``NTPath``. > > It is expected that, in most uses, using the ``Path`` class is adequate, > which is why it has the shortest name of all. > > > No confusion with builtins > -------------------------- > > In this proposal, the path classes do not derive from a builtin type. This > contrasts with some other Path class proposals which were derived from > ``str``. They also do not pretend to implement the sequence protocol: > if you want a path to act as a sequence, you have to lookup a dedicate > attribute (the ``parts`` attribute). 
> > By avoiding to pass as builtin types, the path classes minimize the potential > for confusion if they are combined by accident with genuine builtin types. > > > Immutability > ------------ > > Path objects are immutable, which makes them hashable and also prevents a > class of programming errors. > > > Sane behaviour > -------------- > > Little of the functionality from os.path is reused. Many os.path functions > are tied by backwards compatibility to confusing or plain wrong behaviour > (for example, the fact that ``os.path.abspath()`` simplifies ".." path > components without resolving symlinks first). > > Also, using classes instead of plain strings helps make system-dependent > behaviours natural. For example, comparing and ordering Windows path > objects is case-insensitive, and path separators are automatically converted > to the platform default. > > > Useful notations > ---------------- > > The API tries to provide useful notations all the while avoiding magic. > Some examples:: > > >>> p = Path('/home/antoine/pathlib/setup.py') > >>> p.name > 'setup.py' > >>> p.ext > '.py' > >>> p.root > '/' > >>> p.parts > > >>> list(p.parents()) > [PosixPath('/home/antoine/pathlib'), PosixPath('/home/antoine'), PosixPath('/home'), PosixPath('/')] > >>> p.exists() > True > >>> p.st_size > 928 > > > Pure paths API > ============== > > The philosophy of the ``PurePath`` API is to provide a consistent array of > useful path manipulation operations, without exposing a hodge-podge of > functions like ``os.path`` does. > > > Definitions > ----------- > > First a couple of conventions: > > * All paths can have a drive and a root. For POSIX paths, the drive is > always empty. > > * A relative path has neither drive nor root. > > * A POSIX path is absolute if it has a root. A Windows path is absolute if > it has both a drive *and* a root. A Windows UNC path (e.g. > ``\\some\\share\\myfile.txt``) always has a drive and a root > (here, ``\\some\\share`` and ``\\``, respectively). > > * A drive which has either a drive *or* a root is said to be anchored. > Its anchor is the concatenation of the drive and root. Under POSIX, > "anchored" is the same as "absolute". > > > Construction and joining > ------------------------ > > We will present construction and joining together since they expose > similar semantics. 
> > The simplest way to construct a path is to pass it its string representation:: > > >>> PurePath('setup.py') > PurePosixPath('setup.py') > > Extraneous path separators and ``"."`` components are eliminated:: > > >>> PurePath('a///b/c/./d/') > PurePosixPath('a/b/c/d') > > If you pass several arguments, they will be automatically joined:: > > >>> PurePath('docs', 'Makefile') > PurePosixPath('docs/Makefile') > > Joining semantics are similar to os.path.join, in that anchored paths ignore > the information from the previously joined components:: > > >>> PurePath('/etc', '/usr', 'bin') > PurePosixPath('/usr/bin') > > However, with Windows paths, the drive is retained as necessary:: > > >>> PureNTPath('c:/foo', '/Windows') > PureNTPath('c:\\Windows') > >>> PureNTPath('c:/foo', 'd:') > PureNTPath('d:') > > Calling the constructor without any argument creates a path object pointing > to the logical "current directory":: > > >>> PurePosixPath() > PurePosixPath('.') > > A path can be joined with another using the ``__getitem__`` operator:: > > >>> p = PurePosixPath('foo') > >>> p['bar'] > PurePosixPath('foo/bar') > >>> p[PurePosixPath('bar')] > PurePosixPath('foo/bar') > > As with constructing, multiple path components can be specified at once:: > > >>> p['bar/xyzzy'] > PurePosixPath('foo/bar/xyzzy') > > A join() method is also provided, with the same behaviour. It can serve > as a factory function:: > > >>> path_factory = p.join > >>> path_factory('bar') > PurePosixPath('foo/bar') > > > Representing > ------------ > > To represent a path (e.g. to pass it to third-party libraries), just call > ``str()`` on it:: > > >>> p = PurePath('/home/antoine/pathlib/setup.py') > >>> str(p) > '/home/antoine/pathlib/setup.py' > >>> p = PureNTPath('c:/windows') > >>> str(p) > 'c:\\windows' > > To force the string representation with forward slashes, use the ``as_posix()`` > method:: > > >>> p.as_posix() > 'c:/windows' > > To get the bytes representation (which might be useful under Unix systems), > call ``bytes()`` on it, or use the ``as_bytes()`` method:: > > >>> bytes(p) > b'/home/antoine/pathlib/setup.py' > > > Properties > ---------- > > Five simple properties are provided on every path (each can be empty):: > > >>> p = PureNTPath('c:/pathlib/setup.py') > >>> p.drive > 'c:' > >>> p.root > '\\' > >>> p.anchor > 'c:\\' > >>> p.name > 'setup.py' > >>> p.ext > '.py' > > > Sequence-like access > -------------------- > > The ``parts`` property provides read-only sequence access to a path object:: > > >>> p = PurePosixPath('/etc/init.d') > >>> p.parts > > > Simple indexing returns the invidual path component as a string, while > slicing returns a new path object constructed from the selected components:: > > >>> p.parts[-1] > 'init.d' > >>> p.parts[:-1] > PurePosixPath('/etc') > > Windows paths handle the drive and the root as a single path component:: > > >>> p = PureNTPath('c:/setup.py') > >>> p.parts > > >>> p.root > '\\' > >>> p.parts[0] > 'c:\\' > > (separating them would be wrong, since ``C:`` is not the parent of ``C:\\``). > > The ``parent()`` method returns an ancestor of the path:: > > >>> p.parent() > PureNTPath('c:\\python33\\bin') > >>> p.parent(2) > PureNTPath('c:\\python33') > >>> p.parent(3) > PureNTPath('c:\\') > > The ``parents()`` method automates repeated invocations of ``parent()``, until > the anchor is reached:: > > >>> p = PureNTPath('c:/python33/bin/python.exe') > >>> for parent in p.parents(): parent > ... 
> PureNTPath('c:\\python33\\bin') > PureNTPath('c:\\python33') > PureNTPath('c:\\') > > > Querying > -------- > > ``is_relative()`` returns True if the path is relative (see definition > above), False otherwise. > > ``is_reserved()`` returns True if a Windows path is a reserved path such > as ``CON`` or ``NUL``. It always returns False for POSIX paths. > > ``match()`` matches the path against a glob pattern:: > > >>> PureNTPath('c:/PATHLIB/setup.py').match('c:*lib/*.PY') > True > > ``relative()`` returns a new relative path by stripping the drive and root:: > > >>> PurePosixPath('setup.py').relative() > PurePosixPath('setup.py') > >>> PurePosixPath('/setup.py').relative() > PurePosixPath('setup.py') > > ``relative_to()`` computes the relative difference of a path to another:: > > >>> PurePosixPath('/usr/bin/python').relative_to('/usr') > PurePosixPath('bin/python') > > ``normcase()`` returns a case-folded version of the path for NT paths:: > > >>> PurePosixPath('CAPS').normcase() > PurePosixPath('CAPS') > >>> PureNTPath('CAPS').normcase() > PureNTPath('caps') > > > Concrete paths API > ================== > > In addition to the operations of the pure API, concrete paths provide > additional methods which actually access the filesystem to query or mutate > information. > > > Constructing > ------------ > > The classmethod ``cwd()`` creates a path object pointing to the current > working directory in absolute form:: > > >>> Path.cwd() > PosixPath('/home/antoine/pathlib') > > > File metadata > ------------- > > The ``stat()`` method caches and returns the file's stat() result; > ``restat()`` forces refreshing of the cache. ``lstat()`` is also provided, > but doesn't have any caching behaviour:: > > >>> p.stat() > posix.stat_result(st_mode=33277, st_ino=7483155, st_dev=2053, st_nlink=1, st_uid=500, st_gid=500, st_size=928, st_atime=1343597970, st_mtime=1328287308, st_ctime=1343597964) > > For ease of use, direct attribute access to the fields of the stat structure > is provided over the path object itself:: > > >>> p.st_size > 928 > >>> p.st_mtime > 1328287308.889562 > > Higher-level methods help examine the kind of the file:: > > >>> p.exists() > True > >>> p.is_file() > True > >>> p.is_dir() > False > >>> p.is_symlink() > False > > The file owner and group names (rather than numeric ids) are queried > through matching properties:: > > >>> p = Path('/etc/shadow') > >>> p.owner > 'root' > >>> p.group > 'shadow' > > > Path resolution > --------------- > > The ``resolve()`` method makes a path absolute, resolving any symlink on > the way. It is the only operation which will remove "``..``" path components. > > > Directory walking > ----------------- > > Simple (non-recursive) directory access is done by iteration:: > > >>> p = Path('docs') > >>> for child in p: child > ... > PosixPath('docs/conf.py') > PosixPath('docs/_templates') > PosixPath('docs/make.bat') > PosixPath('docs/index.rst') > PosixPath('docs/_build') > PosixPath('docs/_static') > PosixPath('docs/Makefile') > > This allows simple filtering through list comprehensions:: > > >>> p = Path('.') > >>> [child for child in p if child.is_dir()] > [PosixPath('.hg'), PosixPath('docs'), PosixPath('dist'), PosixPath('__pycache__'), PosixPath('build')] > > Simple and recursive globbing is also provided:: > > >>> for child in p.glob('**/*.py'): child > ... 
> PosixPath('test_pathlib.py') > PosixPath('setup.py') > PosixPath('pathlib.py') > PosixPath('docs/conf.py') > PosixPath('build/lib/pathlib.py') > > > File opening > ------------ > > The ``open()`` method provides a file opening API similar to the builtin > ``open()`` method:: > > >>> p = Path('setup.py') > >>> with p.open() as f: f.readline() > ... > '#!/usr/bin/env python3\n' > > The ``raw_open()`` method, on the other hand, is similar to ``os.open``:: > > >>> fd = p.raw_open(os.O_RDONLY) > >>> os.read(fd, 15) > b'#!/usr/bin/env ' > > > Filesystem alteration > --------------------- > > Several common filesystem operations are provided as methods: ``touch()``, > ``mkdir()``, ``rename()``, ``replace()``, ``unlink()``, ``rmdir()``, > ``chmod()``, ``lchmod()``, ``symlink_to()``. More operations could be > provided, for example some of the functionality of the shutil module. > > > Experimental openat() support > ----------------------------- > > On compatible POSIX systems, the concrete PosixPath class can take advantage > of \*at() functions (`openat()`_ and friends), and manages the bookkeeping of > open file descriptors as necessary. Support is enabled by passing the > *use_openat* argument to the constructor:: > > >>> p = Path(".", use_openat=True) > > Then all paths constructed by navigating this path (either by iteration or > indexing) will also use the openat() family of functions. The point of using > these functions is to avoid race conditions whereby a given directory is > silently replaced with another (often a symbolic link to a sensitive system > location) between two accesses. > > .. _`openat()`: http://pubs.opengroup.org/onlinepubs/9699919799/functions/open.html > > > Copyright > ========= > > This document has been placed into the public domain. > > > .. > Local Variables: > mode: indented-text > indent-tabs-mode: nil > sentence-end-double-space: t > fill-column: 70 > coding: utf-8 > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas -- Read my blog! I depend on your acceptance of my opinion! I am interesting! http://techblog.ironfroggy.com/ Follow me if you're into that sort of thing: http://www.twitter.com/ironfroggy From ethan at stoneleaf.us Sat Oct 6 18:20:00 2012 From: ethan at stoneleaf.us (Ethan Furman) Date: Sat, 06 Oct 2012 09:20:00 -0700 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: <87fw5sqbc0.fsf@uwakimon.sk.tsukuba.ac.jp> References: <20121005202534.5f721292@pitrou.net> <506F5371.6080302@stoneleaf.us> <20121006014823.1fc46741@pitrou.net> <20121006023923.62545731@pitrou.net> <87fw5sqbc0.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <50705A30.9020006@stoneleaf.us> Stephen J. Turnbull wrote: > Antoine Pitrou writes: >> Richard Oudkerk wrote: >>> Maybe p.basename could be shorthand for p.name.split('.')[0]. >> >> Wouldn't there be some confusion with os.path.basename: >> >>--> os.path.basename('a/b/c.ext') >> 'c.ext' I wouldn't worry too much about this; after all, we are trying to replace a primitive system with a more advanced, user-friendly one. > Also there are applications where "basenames" contain periods (eg, > wget often creates directories with names like "www.python.org"), and > filenames may have multiple extensions, eg, "index.ja.html". 
> > I think it's reasonable to define "extension" to mean "the portion > after the last period (if any, maybe including the period), but I > think usage of the complementary concept is pretty application- > specific. FWIW, my own implementation uses the names .path -> c:\foo\bar or \\computer_name\share\dir1\dir2 .vol -> c: \\computer_name\share .dirs -> \foo\bar \dir1\dir2 .filename -> some_file.txt or archive.tar.gz .basename -> some_file archive .ext -> .txt .tar.gz ~Ethan~ From ethan at stoneleaf.us Sat Oct 6 18:27:26 2012 From: ethan at stoneleaf.us (Ethan Furman) Date: Sat, 06 Oct 2012 09:27:26 -0700 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: <87ehlcqb4z.fsf@uwakimon.sk.tsukuba.ac.jp> References: <20121005202534.5f721292@pitrou.net> <506F5371.6080302@stoneleaf.us> <20121006014823.1fc46741@pitrou.net> <506FC4A8.9060009@stoneleaf.us> <87ehlcqb4z.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <50705BEE.9010602@stoneleaf.us> Stephen J. Turnbull wrote: > Ethan Furman writes: >> Eric Snow wrote: >>>--> p = PureNTPath('c:/orders/12345/abc67890.dbf') >>>--> p.replace(ext='.csv') >>> PureNTPath('c:\\orders\\12345\\abc67890.csv') >> >> +1 > > How about a more general subst() method? Indeed, it would need > keyword arguments for named components like ext, but I often do things > like "mv ~/Maildir/{tmp,new}/42" in the shell. I think it would be > useful to be able to replace any component of a path. How would 'subst' differ from 'replace'? As you can see from the example, the keyword 'ext' is being used to specify with component gets replaced. ~Ethan~ From ethan at stoneleaf.us Sat Oct 6 18:38:54 2012 From: ethan at stoneleaf.us (Ethan Furman) Date: Sat, 06 Oct 2012 09:38:54 -0700 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: <20121006144635.34f21f84@pitrou.net> References: <20121005202534.5f721292@pitrou.net> <506F5371.6080302@stoneleaf.us> <20121006014823.1fc46741@pitrou.net> <20121006140924.3bfdb710@pitrou.net> <20121006122642.GA15492@iskra.aviel.ru> <20121006124049.GC16843@iskra.aviel.ru> <20121006144635.34f21f84@pitrou.net> Message-ID: <50705E9E.3030202@stoneleaf.us> Antoine Pitrou wrote: > On Sat, 6 Oct 2012 16:40:49 +0400 > Oleg Broytman wrote: >> On Sat, Oct 06, 2012 at 04:26:42PM +0400, Oleg Broytman wrote: >>> newpath = path.with_drive('C:') >>> newpath = path.with_name('newname') >>> newpath = path.with_ext('.zip') >> BTW, I think having these three -- replacing drive, name and extension -- >> is enough. I do not. > What is the point of replacing the drive? At my work we have identical path structures on several machines, and we sometimes move entire branches from one machine to another. In those instances it is good to be able to change from one drive/mount/share to another. > Replacing the name is already trivial: path.parent()[newname] Or, if '/' is allowed, path.path/newname. 
I can see the reasonableness of using indexing (to me, it sorta looks like a window onto the path ;) ), but I prefer other methods when possible (tender wrists -- arthritis sucks) ~Ethan~ From solipsis at pitrou.net Sat Oct 6 19:08:21 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sat, 6 Oct 2012 19:08:21 +0200 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths References: <20121005202534.5f721292@pitrou.net> Message-ID: <20121006190821.02ae50cd@pitrou.net> On Sat, 6 Oct 2012 12:14:40 -0400 Calvin Spealman wrote: > > It feels like this proposal is "make it object oriented, because > object oriented is good" without any actual justification or obvious > problem this solves. The API looks clunky and redundant, and does not > appear to actually improve anything over the facilities in the os.path > module. Personally, I cringe everytime I have to type `os.path.dirname(os.path.dirname(os.path.dirname(...)))` to go two directories upwards of a given path. Compare, with, say: >>> p = Path('/a/b/c/d') >>> p.parent(2) PosixPath('/a/b') Really, I don't think os.path is the prettiest or most convenient "battery" in the stdlib. > This takes a lot of things we can already do with paths and > files and remixes them into a not-so intuitive API for the sake of > change, not for the sake of solving a real problem. Ironing out difficulties such as platform-specific case-sensitivity rules or the various path separators is a real problem that is not solved by a os.path-like API, because you can't muck with str and give it the required semantics for a filesystem path. So people end up sprinkling their code with calls to os.path.normpath() and/or os.path.normcase() in the hope that it will appease the Gods of Portability (which will also lose casing information). > Not inheriting from str means that we can't directly path these path > objects to existing code that just expects a string, so we have a > really hard boundary around the edges of this new API. It does not > lend itself well to incrementally transitioning to it from existing > code. As discussed in the PEP, I consider inheriting from str to be a mistake when your intent is to provide different semantics from str. Why should indexing or iterating over a path produce individual characters? Why should Path.split() split over whitespace by default? Why should "c:\\" be considered unequal to "C:\\" under Windows? Why should startswith() work character by character, rather than path component by path component? These are all standard str behaviours that are unhelpful when applied to filesystem paths. As for the transition, you just have to call str() on the path object. Since str() also works on plain str objects (and is a no-op), it seems rather painless to me. (Of course, you are not forced to transition. The PEP doesn't call for deprecation of os.path.) > The stat operations and other file-facilities tacked on feel out of > place, and limited. Why does it make sense to add these facilities to > path and not other file operations? Why not give me a read method on > paths? or maybe a copy? There is always room to improve and complete the API without breaking compatibility. To quote the PEP: ?More operations could be provided, for example some of the functionality of the shutil module?. The focus of the PEP is not to enumerate every possible file operation, but to propose the semantic and syntactic foundations (such as how to join paths, how to divide them into their individual components, etc.). 
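For concreteness, a quick illustration of those str behaviours, using plain strings and the existing stdlib only (nothing here is from the proposed API):

    p = "C:\\Windows\\System32"
    print(p[0])                    # 'C' -- a character, not a path component
    print(p.split())               # ['C:\\Windows\\System32'] -- whitespace split
    print(p.startswith("c:"))      # False -- character comparison, no casing rule
    print("c:\\windows" == "C:\\Windows")   # False, though both name one directory

    # component- and platform-aware answers need os.path/ntpath today:
    import ntpath
    print(ntpath.normcase("C:\\Windows") == ntpath.normcase("c:\\windows"))  # True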
> Putting lots of file facilities on a path > object feels wrong because you can't extend it easily. This is one > place that function(thing) works better than thing.function() But you can still define a function() taking a Path as an argument, if you need to. Similarly, you can define a function() taking a datetime object if the datetime object's API lacks some useful functionality for you. Regards Antoine. -- Software development and contracting: http://pro.pitrou.net From grosser.meister.morti at gmx.net Sat Oct 6 19:26:26 2012 From: grosser.meister.morti at gmx.net (=?ISO-8859-1?Q?Mathias_Panzenb=F6ck?=) Date: Sat, 06 Oct 2012 19:26:26 +0200 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: <20121005202534.5f721292@pitrou.net> References: <20121005202534.5f721292@pitrou.net> Message-ID: <507069C2.7080100@gmx.net> Would there be something like this: >>> prefix.join("some","sub","path") This would be the same as: >>> prefix["some"]["sub"]["path"] But the join variant would be much less of a finger-twister on non-english keyboards. From ericsnowcurrently at gmail.com Sat Oct 6 19:29:37 2012 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Sat, 6 Oct 2012 11:29:37 -0600 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: <20121006140924.3bfdb710@pitrou.net> References: <20121005202534.5f721292@pitrou.net> <506F5371.6080302@stoneleaf.us> <20121006014823.1fc46741@pitrou.net> <20121006140924.3bfdb710@pitrou.net> Message-ID: On Sat, Oct 6, 2012 at 6:09 AM, Antoine Pitrou wrote: > On Fri, 5 Oct 2012 23:16:55 -0600 > Eric Snow > wrote: >> Each namedtuple has a _replace() method that's is used to generate a >> new instance with one or more attributes changed. We could do >> something similar here: > > The concrete Path objects' replace() method already maps to > os.replace(). > Note os.replace() is new in 3.3 and is a portable always-overwriting > alternative to os.rename(): > http://docs.python.org/dev/library/os.html#os.replace Sure. The point is that the API include some method that works this way, regardless of what the name ultimately is. :) Stephen J. Turnbull called it subst() and expanded on the idea. 
-eric From massimo.dipierro at gmail.com Sat Oct 6 19:32:08 2012 From: massimo.dipierro at gmail.com (Massimo DiPierro) Date: Sat, 6 Oct 2012 12:32:08 -0500 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: <20121006190821.02ae50cd@pitrou.net> References: <20121005202534.5f721292@pitrou.net> <20121006190821.02ae50cd@pitrou.net> Message-ID: <387F855A-B5BD-491D-9FA5-918F4042332F@gmail.com> How about something along this lines: import os class Path(str): def __add__(self,other): return Path(self+os.path.sep+other) def __getitem__(self,i): return self.split(os.path.sep)[i] def __setitem__(self,i,v): items = self.split(os.path.sep) items[i]=v return Path(os.path.sep.join(items)) def append(self,v): self += os.path.sep+v @property def filename(self): return self.split(os.path.sep)[-1] @property def folder(self): items =self.split(os.path.sep) return Path(os.path.sep.join(items[:-1])) path = Path('/this/is/an/example.png') print isinstance(path,str) # True print path[-1] # example.png print path.filename # example.png print path.folder # /this/is/an On Oct 6, 2012, at 12:08 PM, Antoine Pitrou wrote: > On Sat, 6 Oct 2012 12:14:40 -0400 > Calvin Spealman > wrote: >> >> It feels like this proposal is "make it object oriented, because >> object oriented is good" without any actual justification or obvious >> problem this solves. The API looks clunky and redundant, and does not >> appear to actually improve anything over the facilities in the os.path >> module. > > Personally, I cringe everytime I have to type > `os.path.dirname(os.path.dirname(os.path.dirname(...)))` to go two > directories upwards of a given path. Compare, with, say: > >>>> p = Path('/a/b/c/d') >>>> p.parent(2) > PosixPath('/a/b') > > Really, I don't think os.path is the prettiest or most convenient > "battery" in the stdlib. > >> This takes a lot of things we can already do with paths and >> files and remixes them into a not-so intuitive API for the sake of >> change, not for the sake of solving a real problem. > > Ironing out difficulties such as platform-specific case-sensitivity > rules or the various path separators is a real problem that is not > solved by a os.path-like API, because you can't muck with str and give > it the required semantics for a filesystem path. So people end up > sprinkling their code with calls to os.path.normpath() and/or > os.path.normcase() in the hope that it will appease the Gods of > Portability (which will also lose casing information). > >> Not inheriting from str means that we can't directly path these path >> objects to existing code that just expects a string, so we have a >> really hard boundary around the edges of this new API. It does not >> lend itself well to incrementally transitioning to it from existing >> code. > > As discussed in the PEP, I consider inheriting from str to be a mistake > when your intent is to provide different semantics from str. > > Why should indexing or iterating over a path produce individual > characters? > Why should Path.split() split over whitespace by default? > Why should "c:\\" be considered unequal to "C:\\" under Windows? > Why should startswith() work character by character, rather than path > component by path component? > > These are all standard str behaviours that are unhelpful when applied > to filesystem paths. > > As for the transition, you just have to call str() on the path object. > Since str() also works on plain str objects (and is a no-op), it seems > rather painless to me. 
> > (Of course, you are not forced to transition. The PEP doesn't call for > deprecation of os.path.) > >> The stat operations and other file-facilities tacked on feel out of >> place, and limited. Why does it make sense to add these facilities to >> path and not other file operations? Why not give me a read method on >> paths? or maybe a copy? > > There is always room to improve and complete the API without breaking > compatibility. To quote the PEP: ?More operations could be provided, > for example some of the functionality of the shutil module?. > > The focus of the PEP is not to enumerate every possible file operation, > but to propose the semantic and syntactic foundations (such as how to > join paths, how to divide them into their individual components, etc.). > >> Putting lots of file facilities on a path >> object feels wrong because you can't extend it easily. This is one >> place that function(thing) works better than thing.function() > > But you can still define a function() taking a Path as an argument, if > you need to. > Similarly, you can define a function() taking a datetime object if the > datetime object's API lacks some useful functionality for you. > > Regards > > Antoine. > > > -- > Software development and contracting: http://pro.pitrou.net > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas From ericsnowcurrently at gmail.com Sat Oct 6 19:41:00 2012 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Sat, 6 Oct 2012 11:41:00 -0600 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: <20121006122642.GA15492@iskra.aviel.ru> References: <20121005202534.5f721292@pitrou.net> <506F5371.6080302@stoneleaf.us> <20121006014823.1fc46741@pitrou.net> <20121006140924.3bfdb710@pitrou.net> <20121006122642.GA15492@iskra.aviel.ru> Message-ID: On Sat, Oct 6, 2012 at 6:26 AM, Oleg Broytman wrote: > On Sat, Oct 06, 2012 at 02:09:24PM +0200, Antoine Pitrou wrote: >> On Fri, 5 Oct 2012 23:16:55 -0600 >> Eric Snow >> wrote: >> > Each namedtuple has a _replace() method that's is used to generate a >> > new instance with one or more attributes changed. We could do >> > something similar here: >> >> The concrete Path objects' replace() method already maps to >> os.replace(). > > Call it "with": > > newpath = path.with_drive('C:') > newpath = path.with_name('newname') > newpath = path.with_ext('.zip') Yeah, having dedicated methods makes more sense here, given the small number of candidates for replacement. -eric From guido at python.org Sat Oct 6 19:44:37 2012 From: guido at python.org (Guido van Rossum) Date: Sat, 6 Oct 2012 10:44:37 -0700 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: <20121005202534.5f721292@pitrou.net> References: <20121005202534.5f721292@pitrou.net> Message-ID: On Fri, Oct 5, 2012 at 11:25 AM, Antoine Pitrou wrote: > This PEP is a resurrection of the idea of having object-oriented > filesystem paths in the stdlib. It comes with a general API proposal > as well as a specific implementation (*). The implementation is young > and discussion is quite open. Thanks for getting this started! I haven't read the whole PEP or the whole thread, but I like many of the principles, such as not deriving from existing built-in types (str or tuple), immutability, explicitly caring about OS differences, and distinguishing between pure and impure (I/O-using) operations. 
(Though admittedly I'm not super-keen on the specific term "pure".) I can't say I'm thrilled about overloading p[s], but I can't get too excited about p/s either; p+s makes more sense but that would beg the question of how to append an extension to a path (transforming e.g. 'foo/bar' to 'foo/bar.py' by appending '.py'). At the same time I'm not in the camp that says you can't use / because it's not division. But rather than diving right into the syntax, I would like to focus on some use cases. (Some of this may already be in the PEP, my apologize.) Some things I care about (based on path manipulations I remember I've written at some point or another): - Distinguishing absolute paths from relative paths; this affects joining behavior as for os.path.join(). - Various normal forms that can be used for comparing paths for equality; there should be a pure normalization as well as an impure one (like os.path.realpath()). - An API that encourage Unix lovers to write code that is most likely also to make sense on Windows. - An API that encourages Windows lovers to write code that is most likely also to make sense on Unix. - Integration with fnmatch (pure) and glob (impure). - In addition to stat(), some simple derived operations like getmtime(), getsize(), islink(). - Easy checks and manipulations (applying to the basename) like "ends with .pyc", "starts with foo", "ends with .tar.gz", "replace .pyc extension with .py", "remove trailing ~", "append .tmp", "remove leading @", and so on. - While it's nice to be able to ask for "the extension" it would be nice if the checks above would not be hardcoded to use "." as a separator; and it would be nice if the extension-parsing code could deal with multiple extensions and wasn't confused by names starting or ending with a dot. - Matching on patterns on directory names (e.g. "does not contain a segment named .hg"). - A matching notation based on glob/fnmatch syntax instead of regular expressions. PS. Another occasional use for "posix" style paths I have found is manipulating the path portion of a URL. There are some posix-like features, e.g. the interpretation of trailing / as "directory", the requirement of leading / as root, the interpretation of "." and "..", and the notion of relative paths (although path joining is different). It would be nice if the "pure" posix path class could be reused for this purpose, or if a related class with a subset or superset of the same methods existed. This may influence the basic design somewhat in showing the need for custom subclasses etc. 
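For illustration, a rough sketch of a couple of those checks using plain string functions (the helper names are invented for the example and are not part of any proposed API):

    def suffixes(name):
        """All extensions of a basename: 'index.ja.html' -> ['.ja', '.html']."""
        parts = name.lstrip('.').split('.')
        return ['.' + s for s in parts[1:]]

    def with_suffix(name, new):
        """Replace the last extension: 'foo.pyc' -> 'foo.py'; a lone dot-file has none."""
        head, dot, _ = name[1:].rpartition('.')
        return (name[:1] + head if dot else name) + new

    assert suffixes('index.ja.html') == ['.ja', '.html']
    assert suffixes('.bashrc') == []                  # leading dot is not an extension
    assert with_suffix('foo.pyc', '.py') == 'foo.py'
    assert with_suffix('archive.tar.gz', '.bz2') == 'archive.tar.bz2'
    assert with_suffix('.bashrc', '.bak') == '.bashrc.bak'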
-- --Guido van Rossum (python.org/~guido) From g.brandl at gmx.net Sat Oct 6 19:51:40 2012 From: g.brandl at gmx.net (Georg Brandl) Date: Sat, 06 Oct 2012 19:51:40 +0200 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: <387F855A-B5BD-491D-9FA5-918F4042332F@gmail.com> References: <20121005202534.5f721292@pitrou.net> <20121006190821.02ae50cd@pitrou.net> <387F855A-B5BD-491D-9FA5-918F4042332F@gmail.com> Message-ID: Am 06.10.2012 19:32, schrieb Massimo DiPierro: > How about something along this lines: > > import os > > class Path(str): > def __add__(self,other): > return Path(self+os.path.sep+other) > def __getitem__(self,i): > return self.split(os.path.sep)[i] > def __setitem__(self,i,v): > items = self.split(os.path.sep) > items[i]=v > return Path(os.path.sep.join(items)) > def append(self,v): > self += os.path.sep+v > @property > def filename(self): > return self.split(os.path.sep)[-1] > @property > def folder(self): > items =self.split(os.path.sep) > return Path(os.path.sep.join(items[:-1])) > > path = Path('/this/is/an/example.png') > print isinstance(path,str) # True > print path[-1] # example.png > print path.filename # example.png > print path.folder # /this/is/an If you inherit from str, you cannot override any of the operations that str already has (i.e. __add__, __getitem__). And obviously you also can't make it mutable, i.e. __setitem__. Georg From jeanpierreda at gmail.com Sat Oct 6 19:53:55 2012 From: jeanpierreda at gmail.com (Devin Jeanpierre) Date: Sat, 6 Oct 2012 13:53:55 -0400 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: References: <20121005202534.5f721292@pitrou.net> Message-ID: On Sat, Oct 6, 2012 at 12:14 PM, Calvin Spealman wrote: > The stat operations and other file-facilities tacked on feel out of > place, and limited. Why does it make sense to add these facilities to > path and not other file operations? Why not give me a read method on > paths? or maybe a copy? Putting lots of file facilities on a path > object feels wrong because you can't extend it easily. This is one > place that function(thing) works better than thing.function() The only reason to have objects for anything is to let people have other implementations that do something else with the same method. I remember one of the advantages to having an object-oriented path API, that I always wanted, is that the actual filesystem doesn't have to be what the paths access. They could be names for web resources, or files within a zip archive, or virtual files on a pretend hard drive in your demo application. That's fantastic to have, imo, and it's something function calls (like you suggest) can't possibly support, because functions aren't extensibly polymorphic. If we don't get this sort of polymorphism of functionality, there's very little point to an object oriented path API. It is syntax sugar for function calls with slightly better type safety (NTPath(...) / UnixPath(...) == TypeError -- I hope.) So I'd assume the reason that these methods exist is to enable polymorphism. As for why your suggested methods don't exist, they are better written as functions because they don't need to be ad-hoc polymorphic, they work just fine as regular functions that call methods on path objects. e.g. def read(path): return path.open().read() def copy(path1, path2): path2.open('w').write(path1.read()) # won't work for very large files, blah blah blah Whereas the open method cannot work this way, because the path should define how file opening works. 
(It might return an io.StringIO for example.) And the return value of .open() might not be a real file with a real fd, so you can't implement a stat function in terms of open and f.fileno() and such. And so on. -- Devin From g.brandl at gmx.net Sat Oct 6 19:57:02 2012 From: g.brandl at gmx.net (Georg Brandl) Date: Sat, 06 Oct 2012 19:57:02 +0200 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: <87ehlb4pvu.fsf@uwakimon.sk.tsukuba.ac.jp> References: <20121005202534.5f721292@pitrou.net> <20121005215520.19b63efe@pitrou.net> <87d30wq9ji.fsf@uwakimon.sk.tsukuba.ac.jp> <20121006140652.630794f4@pitrou.net> <87ehlb4pvu.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: Am 06.10.2012 16:49, schrieb Stephen J. Turnbull: > Antoine Pitrou writes: > > > > > Someone else proposed overloading '+', which would be confusing > > > > since we need to be able to combine paths and regular strings, for > > > > ease of use. > > > > > > Is it really that obnoxious to write "p + Path('bar')" (where p is a > > > Path)? > > > > > > What about the case "'bar' + p"? Since Python isn't C, you can't > > > express that as "'bar'[p]"! > > > > The issue I envision is if you write `p + "bar"`, thinking p is a Path, > > and p is actually a str object. It won't raise, but give you the wrong > > result. > > No, my point is that for me prepending new segments is quite common, > though not as common as appending them. The asymmetry of the bracket > operator means that there's no easy way to deal with that. > > On the other hand, `p + Path('foo')` and `Path('foo') + p` (where p is > a Path, not a string) both seem reasonable to me. It's true that one > could screw up as you suggest, but that requires *two* mistakes, first > thinking that p is a Path when it's a string, and then forgetting to > convert 'bar' to Path. I don't think that's very likely if you don't > allow mixing strings and Paths without explicit conversion. But having to call Path() explicitly every time is not very convenient either; in that case you can also call .join() -- and I bet people would prefer p + Path('foo/bar/baz') (which is probably not correct in all cases) to p + Path('foo') + Path('bar') + Path('baz') just because it's such a pain. On the other hand, when the explicit conversion is not needed, confusion will ensue, as Antoine says. In any case, for me using "+" to join paths is quite ugly. I guess it's because after all, I think of the underlying path as a string, and "+" is hardwired in my brain as string concatenation (at least in Python). Georg From massimo.dipierro at gmail.com Sat Oct 6 20:22:06 2012 From: massimo.dipierro at gmail.com (Massimo DiPierro) Date: Sat, 6 Oct 2012 13:22:06 -0500 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: References: <20121005202534.5f721292@pitrou.net> <20121006190821.02ae50cd@pitrou.net> <387F855A-B5BD-491D-9FA5-918F4042332F@gmail.com> Message-ID: <0063EDB4-0AC4-4B3A-BEC1-94A44B572E48@gmail.com> I was thinking of the api more than the implementation. The point to me is that it would be nice to have something the behaves as a string and as a list at the same time. Here is another possible incomplete implementation. 
import os class Path(object): def __init__(self,s='/',sep=os.path.sep): self.sep = sep self.s = s.split(sep) def __str__(self): return self.sep.join(self.s) def __add__(self,other): if other[0]=='': return Path(other) else: return Path(str(self)+os.sep+str(other)) def __getitem__(self,i): return self.s[i] def __setitem__(self,i,v): self.s[i] = v def append(self,v): self.s.append(v) @property def filename(self): return self.s[-1] @property def folder(self): return Path(self.sep.join(self.s[:-1])) >>> path = Path('/this/is/an/example.png') >>> print path[-1] example.png >>> print path.filename example.png >>> print path.folder /this/is/an >>> path[1]='that' /that/is/an/example.png >>> print path.folder + 'this' /that/is/an/this On Oct 6, 2012, at 12:51 PM, Georg Brandl wrote: > Am 06.10.2012 19:32, schrieb Massimo DiPierro: >> How about something along this lines: >> >> import os >> >> class Path(str): >> def __add__(self,other): >> return Path(self+os.path.sep+other) >> def __getitem__(self,i): >> return self.split(os.path.sep)[i] >> def __setitem__(self,i,v): >> items = self.split(os.path.sep) >> items[i]=v >> return Path(os.path.sep.join(items)) >> def append(self,v): >> self += os.path.sep+v >> @property >> def filename(self): >> return self.split(os.path.sep)[-1] >> @property >> def folder(self): >> items =self.split(os.path.sep) >> return Path(os.path.sep.join(items[:-1])) >> >> path = Path('/this/is/an/example.png') >> print isinstance(path,str) # True >> print path[-1] # example.png >> print path.filename # example.png >> print path.folder # /this/is/an > > If you inherit from str, you cannot override any of the operations that > str already has (i.e. __add__, __getitem__). And obviously you also > can't make it mutable, i.e. __setitem__. > > Georg > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas From phd at phdru.name Sat Oct 6 11:55:53 2012 From: phd at phdru.name (Oleg Broytman) Date: Sat, 6 Oct 2012 13:55:53 +0400 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: <87ehlcqb4z.fsf@uwakimon.sk.tsukuba.ac.jp> References: <20121005202534.5f721292@pitrou.net> <506F5371.6080302@stoneleaf.us> <20121006014823.1fc46741@pitrou.net> <506FC4A8.9060009@stoneleaf.us> <87ehlcqb4z.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <20121006095553.GA12181@iskra.aviel.ru> On Sat, Oct 06, 2012 at 05:04:44PM +0900, "Stephen J. Turnbull" wrote: > > Eric Snow wrote: > > > Each namedtuple has a _replace() method that's is used to generate a > > > new instance with one or more attributes changed. We could do > > > something similar here: > > > > > >>>> p = PureNTPath('c:/orders/12345/abc67890.dbf') > > >>>> p.replace(ext='.csv') > > > PureNTPath('c:\\orders\\12345\\abc67890.csv') > > How about a more general subst() method? Indeed, it would need > keyword arguments for named components like ext, but I often do things > like "mv ~/Maildir/{tmp,new}/42" in the shell. I think it would be > useful to be able to replace any component of a path. I think this would be overgeneralization. IMO there is no need to replace parts beyond drive/name/extension. To "replace" root or path components just construct a new Path. Oleg. -- Oleg Broytman http://phdru.name/ phd at phdru.name Programmers don't die, they just GOSUB without RETURN. 
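For reference, the drive/name/extension replacements being discussed can already be spelled with today's ntpath functions (used explicitly here so the example behaves the same on any platform); the with_* methods themselves are only the proposal above:

    import ntpath

    p = 'c:\\orders\\12345\\abc67890.dbf'

    drive, rest = ntpath.splitdrive(p)
    print('d:' + rest)                        # d:\orders\12345\abc67890.dbf

    head, tail = ntpath.split(p)
    print(ntpath.join(head, 'newname.dbf'))   # c:\orders\12345\newname.dbf

    stem, ext = ntpath.splitext(p)
    print(stem + '.csv')                      # c:\orders\12345\abc67890.csv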
From ironfroggy at gmail.com Sat Oct 6 20:42:22 2012 From: ironfroggy at gmail.com (Calvin Spealman) Date: Sat, 6 Oct 2012 14:42:22 -0400 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: <20121006190821.02ae50cd@pitrou.net> References: <20121005202534.5f721292@pitrou.net> <20121006190821.02ae50cd@pitrou.net> Message-ID: On Sat, Oct 6, 2012 at 1:08 PM, Antoine Pitrou wrote: > On Sat, 6 Oct 2012 12:14:40 -0400 > Calvin Spealman > wrote: >> >> It feels like this proposal is "make it object oriented, because >> object oriented is good" without any actual justification or obvious >> problem this solves. The API looks clunky and redundant, and does not >> appear to actually improve anything over the facilities in the os.path >> module. > > Personally, I cringe everytime I have to type > `os.path.dirname(os.path.dirname(os.path.dirname(...)))` to go two > directories upwards of a given path. Compare, with, say: > >>>> p = Path('/a/b/c/d') >>>> p.parent(2) > PosixPath('/a/b') I would never do the first version in the first place. I would just join(my_path, "../..") Note that we really need to get out of the habit of "import os" instead of "from os.path import join, etc..." We are making our code uglier and arbitrarily creating many of your concerns by making the use of os.path harder than it should be. > Really, I don't think os.path is the prettiest or most convenient > "battery" in the stdlib. > >> This takes a lot of things we can already do with paths and >> files and remixes them into a not-so intuitive API for the sake of >> change, not for the sake of solving a real problem. > > Ironing out difficulties such as platform-specific case-sensitivity > rules or the various path separators is a real problem that is not > solved by a os.path-like API, because you can't muck with str and give > it the required semantics for a filesystem path. So people end up > sprinkling their code with calls to os.path.normpath() and/or > os.path.normcase() in the hope that it will appease the Gods of > Portability (which will also lose casing information). I agree this stuff is difficult, but I think normalizing is a lot more predictable than lots of platform specific paths (both FS and code paths) >> Not inheriting from str means that we can't directly path these path >> objects to existing code that just expects a string, so we have a >> really hard boundary around the edges of this new API. It does not >> lend itself well to incrementally transitioning to it from existing >> code. > > As discussed in the PEP, I consider inheriting from str to be a mistake > when your intent is to provide different semantics from str. > > Why should indexing or iterating over a path produce individual > characters? > Why should Path.split() split over whitespace by default? > Why should "c:\\" be considered unequal to "C:\\" under Windows? > Why should startswith() work character by character, rather than path > component by path component? Good points, but I'm not convinced that subclasses from string means you can't change these in your subclass. > These are all standard str behaviours that are unhelpful when applied > to filesystem paths. We agree there. > As for the transition, you just have to call str() on the path object. > Since str() also works on plain str objects (and is a no-op), it seems > rather painless to me. But then I loose all the helpful path information. Something further down the call chain, path aware, might be able to make use of it. 
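(A side note on the join(my_path, "../..") spelling above: the ".." components survive until the result is normalised, and normalisation is purely lexical. A minimal sketch, using posixpath explicitly so it runs the same everywhere:)

    import posixpath

    p = '/a/b/c/d'
    print(posixpath.join(p, '../..'))                        # /a/b/c/d/../..
    print(posixpath.normpath(posixpath.join(p, '../..')))    # /a/b
    # with symlinks in the tree, the two forms can name different directories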
> (Of course, you are not forced to transition. The PEP doesn't call for > deprecation of os.path.) If we are only adding something redundant and intend to leave both forever, it only feels like bloat. We should be shrinking the stdlib, not growing it with redundant APIs. >> The stat operations and other file-facilities tacked on feel out of >> place, and limited. Why does it make sense to add these facilities to >> path and not other file operations? Why not give me a read method on >> paths? or maybe a copy? > > There is always room to improve and complete the API without breaking > compatibility. To quote the PEP: ?More operations could be provided, > for example some of the functionality of the shutil module?. What I meant is that I can't extend it in third party code without being second class. I can add another library that does file operations os.path or stat() don't provide, and they sit side by side. > The focus of the PEP is not to enumerate every possible file operation, > but to propose the semantic and syntactic foundations (such as how to > join paths, how to divide them into their individual components, etc.). > >> Putting lots of file facilities on a path >> object feels wrong because you can't extend it easily. This is one >> place that function(thing) works better than thing.function() > > But you can still define a function() taking a Path as an argument, if > you need to. > Similarly, you can define a function() taking a datetime object if the > datetime object's API lacks some useful functionality for you. > > Regards > > Antoine. > > > -- > Software development and contracting: http://pro.pitrou.net > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas -- Read my blog! I depend on your acceptance of my opinion! I am interesting! http://techblog.ironfroggy.com/ Follow me if you're into that sort of thing: http://www.twitter.com/ironfroggy From ethan at stoneleaf.us Sat Oct 6 20:39:17 2012 From: ethan at stoneleaf.us (Ethan Furman) Date: Sat, 06 Oct 2012 11:39:17 -0700 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: References: <20121005202534.5f721292@pitrou.net> <20121006190821.02ae50cd@pitrou.net> <387F855A-B5BD-491D-9FA5-918F4042332F@gmail.com> Message-ID: <50707AD5.3030407@stoneleaf.us> Georg Brandl wrote: > If you inherit from str, you cannot override any of the operations that > str already has (i.e. __add__, __getitem__). Is this a 3.x thing? My 2.x version of Path overrides many of the str methods and works just fine. > And obviously you also can't make it mutable, i.e. __setitem__. Well, since Paths (both Antoine's and mine) are immutable that's not an issue. ~Ethan~ From ethan at stoneleaf.us Sat Oct 6 20:44:02 2012 From: ethan at stoneleaf.us (Ethan Furman) Date: Sat, 06 Oct 2012 11:44:02 -0700 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: <20121006095553.GA12181@iskra.aviel.ru> References: <20121005202534.5f721292@pitrou.net> <506F5371.6080302@stoneleaf.us> <20121006014823.1fc46741@pitrou.net> <506FC4A8.9060009@stoneleaf.us> <87ehlcqb4z.fsf@uwakimon.sk.tsukuba.ac.jp> <20121006095553.GA12181@iskra.aviel.ru> Message-ID: <50707BF2.2070907@stoneleaf.us> Oleg Broytman wrote: > On Sat, Oct 06, 2012 at 05:04:44PM +0900, "Stephen J. 
Turnbull" wrote: >> > Eric Snow wrote: >> > > Each namedtuple has a _replace() method that's is used to generate a >> > > new instance with one or more attributes changed. We could do >> > > something similar here: >> > > >> > >>>> p = PureNTPath('c:/orders/12345/abc67890.dbf') >> > >>>> p.replace(ext='.csv') >> > > PureNTPath('c:\\orders\\12345\\abc67890.csv') >> >> How about a more general subst() method? Indeed, it would need >> keyword arguments for named components like ext, but I often do things >> like "mv ~/Maildir/{tmp,new}/42" in the shell. I think it would be >> useful to be able to replace any component of a path. > > I think this would be overgeneralization. IMO there is no need to > replace parts beyond drive/name/extension. To "replace" root or path > components just construct a new Path. And if your new path is exactly the same as the old, /except/ for the root? Are you suggesting something like: --> p = PureNTPath('c:/orders/12345/abc67890.dbf') --> q = '//another_machine/share' + p.parts() + p.filename ? ~Ethan~ From mikegraham at gmail.com Sat Oct 6 20:56:21 2012 From: mikegraham at gmail.com (Mike Graham) Date: Sat, 6 Oct 2012 14:56:21 -0400 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: <50707AD5.3030407@stoneleaf.us> References: <20121005202534.5f721292@pitrou.net> <20121006190821.02ae50cd@pitrou.net> <387F855A-B5BD-491D-9FA5-918F4042332F@gmail.com> <50707AD5.3030407@stoneleaf.us> Message-ID: On Sat, Oct 6, 2012 at 2:39 PM, Ethan Furman wrote: > Georg Brandl wrote: >> >> If you inherit from str, you cannot override any of the operations that >> str already has (i.e. __add__, __getitem__). > > > Is this a 3.x thing? My 2.x version of Path overrides many of the str > methods and works just fine. This is for theoretical/practical reasons, not technical ones. Mike From phd at phdru.name Sat Oct 6 21:05:34 2012 From: phd at phdru.name (Oleg Broytman) Date: Sat, 6 Oct 2012 23:05:34 +0400 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: <50707BF2.2070907@stoneleaf.us> References: <20121005202534.5f721292@pitrou.net> <506F5371.6080302@stoneleaf.us> <20121006014823.1fc46741@pitrou.net> <506FC4A8.9060009@stoneleaf.us> <87ehlcqb4z.fsf@uwakimon.sk.tsukuba.ac.jp> <20121006095553.GA12181@iskra.aviel.ru> <50707BF2.2070907@stoneleaf.us> Message-ID: <20121006190534.GA25975@iskra.aviel.ru> On Sat, Oct 06, 2012 at 11:44:02AM -0700, Ethan Furman wrote: > Oleg Broytman wrote: > > IMO there is no need to > >replace parts beyond drive/name/extension. To "replace" root or path > >components just construct a new Path. > > And if your new path is exactly the same as the old, /except/ for > the root? Are you suggesting something like: > > --> p = PureNTPath('c:/orders/12345/abc67890.dbf') > --> q = '//another_machine/share' + p.parts() + p.filename > > ? Yes. Even if the new path differs from the old by one letter somewhere in a middle component. "Practicality beats purity". We need to see real use cases to decide what is really needed. Oleg. -- Oleg Broytman http://phdru.name/ phd at phdru.name Programmers don't die, they just GOSUB without RETURN. 
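One such use case, the "same layout on another machine" move mentioned above, spelled out with today's ntpath only (the share name is invented for the example):

    import ntpath

    p = 'c:\\orders\\12345\\abc67890.dbf'

    # re-root the same relative layout onto another machine's share
    drive, rest = ntpath.splitdrive(p)
    q = '\\\\another_machine\\share' + rest
    print(q)    # \\another_machine\share\orders\12345\abc67890.dbf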
From ethan at stoneleaf.us Sat Oct 6 20:59:29 2012 From: ethan at stoneleaf.us (Ethan Furman) Date: Sat, 06 Oct 2012 11:59:29 -0700 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: References: <20121005202534.5f721292@pitrou.net> <20121006190821.02ae50cd@pitrou.net> <387F855A-B5BD-491D-9FA5-918F4042332F@gmail.com> <50707AD5.3030407@stoneleaf.us> Message-ID: <50707F91.6060301@stoneleaf.us> Mike Graham wrote: > On Sat, Oct 6, 2012 at 2:39 PM, Ethan Furman wrote: >> Georg Brandl wrote: >>> If you inherit from str, you cannot override any of the operations that >>> str already has (i.e. __add__, __getitem__). >> >> Is this a 3.x thing? My 2.x version of Path overrides many of the str >> methods and works just fine. > > This is for theoretical/practical reasons, not technical ones. Ah, you mean you can't give them different semantics. Gotcha. ~Ethan~ From storchaka at gmail.com Sat Oct 6 22:10:51 2012 From: storchaka at gmail.com (Serhiy Storchaka) Date: Sat, 06 Oct 2012 23:10:51 +0300 Subject: [Python-ideas] Propagating StopIteration value Message-ID: As StopIteration now have value, this value is lost when using functions which works with iterators/generators (map, filter, itertools). Therefore, wrapping the iterator, which preserved its semantics in versions before 3.3, no longer preserves it: map(lambda x: x, iterator) filter(lambda x: True, iterator) itertools.accumulate(iterator, lambda x, y: y) itertools.chain(iterator) itertools.compress(iterator, itertools.cycle([True])) itertools.dropwhile(lambda x: False, iterator) itertools.filterfalse(lambda x: False, iterator) next(itertools.groupby(iterator, lambda x: None))[1] itertools.takewhile(lambda x: True, iterator) itertools.tee(iterator, 1)[0] Perhaps it would be worth to propagate original exception (or at least it's value) in functions for which it makes sense. From g.brandl at gmx.net Sat Oct 6 22:20:27 2012 From: g.brandl at gmx.net (Georg Brandl) Date: Sat, 06 Oct 2012 22:20:27 +0200 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: <50707F91.6060301@stoneleaf.us> References: <20121005202534.5f721292@pitrou.net> <20121006190821.02ae50cd@pitrou.net> <387F855A-B5BD-491D-9FA5-918F4042332F@gmail.com> <50707AD5.3030407@stoneleaf.us> <50707F91.6060301@stoneleaf.us> Message-ID: Am 06.10.2012 20:59, schrieb Ethan Furman: > Mike Graham wrote: >> On Sat, Oct 6, 2012 at 2:39 PM, Ethan Furman wrote: >>> Georg Brandl wrote: >>>> If you inherit from str, you cannot override any of the operations that >>>> str already has (i.e. __add__, __getitem__). >>> >>> Is this a 3.x thing? My 2.x version of Path overrides many of the str >>> methods and works just fine. >> >> This is for theoretical/practical reasons, not technical ones. > > Ah, you mean you can't give them different semantics. Gotcha. Yep. Not much use being able to pass them directly to APIs expecting strings if they can't operate on them like any other string :) Georg From ethan at stoneleaf.us Sat Oct 6 22:19:54 2012 From: ethan at stoneleaf.us (Ethan Furman) Date: Sat, 06 Oct 2012 13:19:54 -0700 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: <20121005202534.5f721292@pitrou.net> References: <20121005202534.5f721292@pitrou.net> Message-ID: <5070926A.5020901@stoneleaf.us> I was hesitant to put mine on PyPI because there's already a slew of others, but for the sake of discussion here it is [1]. 
Mine is str based, has no actual I/O components, and can easily be used in normal os, shutil, etc., calls. Example usage: job = '12345' home = Path('c:/orders'/job) work = Path('c:/work/') for pdf in glob(work/'*.pdf'): dash = pdf.filename.index('-') dest = home/'reports'/job + pdf.filename[dash:] shutil.copy(pdf, dest) Assuming I haven't typo'ed anything, the above code takes all the pdf files, removes the standard (and useless to me) header info before the '-' in the filename, then copies it over to its final resting place. If I understand Antoine's Path, the code would look something like: job = '12345' home = Path('c:/orders/')[job] work = Path('c:/work/') for child in work: if child.ext != '.pdf': continue name = child.filename dash = name.index('-') dest = home['reports'][name] shutil.copy(str(child), str(dest)) My biggest objections are the extra str calls, and indexing just doesn't look like path concatenation. ~Ethan~ [1]http://pypi.python.org/pypi/strpath P.S. Oh, very nice ascii-art! From solipsis at pitrou.net Sat Oct 6 22:39:34 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sat, 6 Oct 2012 22:39:34 +0200 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths References: <20121005202534.5f721292@pitrou.net> <5070926A.5020901@stoneleaf.us> Message-ID: <20121006223934.50b7a871@pitrou.net> On Sat, 06 Oct 2012 13:19:54 -0700 Ethan Furman wrote: > > If I understand Antoine's Path, the code would look something like: > > job = '12345' > home = Path('c:/orders/')[job] > work = Path('c:/work/') > for child in work: > if child.ext != '.pdf': > continue You could actually write `for child in work.glob('*.pdf')` (non-recursive) or `for child in work.glob('**/*.pdf')` (recursive). Regards Antoine. -- Software development and contracting: http://pro.pitrou.net From mikegraham at gmail.com Sat Oct 6 22:47:36 2012 From: mikegraham at gmail.com (Mike Graham) Date: Sat, 6 Oct 2012 16:47:36 -0400 Subject: [Python-ideas] Propagating StopIteration value In-Reply-To: References: Message-ID: On Sat, Oct 6, 2012 at 4:10 PM, Serhiy Storchaka wrote: > As StopIteration now have value, this value is lost when using functions > which works with iterators/generators (map, filter, itertools). Therefore, > wrapping the iterator, which preserved its semantics in versions before 3.3, > no longer preserves it: > > map(lambda x: x, iterator) > filter(lambda x: True, iterator) > itertools.accumulate(iterator, lambda x, y: y) > itertools.chain(iterator) > itertools.compress(iterator, itertools.cycle([True])) > itertools.dropwhile(lambda x: False, iterator) > itertools.filterfalse(lambda x: False, iterator) > next(itertools.groupby(iterator, lambda x: None))[1] > itertools.takewhile(lambda x: True, iterator) > itertools.tee(iterator, 1)[0] > > Perhaps it would be worth to propagate original exception (or at least it's > value) in functions for which it makes sense. Can you provide an example of a time when you want to use such a value with a generator on which you want to use one of these so I can better understand why this is necessary? the times I'm familiar with wanting this value I'd usually be manually stepping through my generator. 
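As a concrete (if toy) illustration of what gets lost on 3.3 when a generator that returns a value is wrapped by map() or itertools.chain():

    import itertools

    def produce():
        yield 1
        yield 2
        return 'summary'            # becomes StopIteration('summary') in 3.3

    def drain(it):
        try:
            while True:
                next(it)
        except StopIteration as e:
            return e.value

    print(drain(produce()))                        # 'summary'
    print(drain(map(lambda x: x, produce())))      # None -- the value is gone
    print(drain(itertools.chain(produce())))       # None -- likewise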
Mike From storchaka at gmail.com Sat Oct 6 23:01:40 2012 From: storchaka at gmail.com (Serhiy Storchaka) Date: Sun, 07 Oct 2012 00:01:40 +0300 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: <20121005202534.5f721292@pitrou.net> References: <20121005202534.5f721292@pitrou.net> Message-ID: On 05.10.12 21:25, Antoine Pitrou wrote: > PS: You can all admire my ASCII-art skills. PurePosixPath and PureNTPath looks closer to Path than to PurePath. > The ``parent()`` method returns an ancestor of the path:: p[:-n] is shorter and looks neater than p.parent(n). Possible the ``parent()`` method is unnecessary? From amcnabb at mcnabbs.org Sat Oct 6 23:45:40 2012 From: amcnabb at mcnabbs.org (Andrew McNabb) Date: Sat, 6 Oct 2012 16:45:40 -0500 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: <506F813D.2050305@canterbury.ac.nz> References: <20121005202534.5f721292@pitrou.net> <20121005191625.GA23607@iskra.aviel.ru> <20121005195327.GG8974@mcnabbs.org> <506F6201.4000503@pearwood.info> <20121005235457.GA7755@mcnabbs.org> <506F813D.2050305@canterbury.ac.nz> Message-ID: <20121006214540.GB20907@mcnabbs.org> On Sat, Oct 06, 2012 at 01:54:21PM +1300, Greg Ewing wrote: > Andrew McNabb wrote: > > >This is the difference between C++ style operators, where the only thing > >that matters is what the operator symbol looks like, and Python style > >operators, where an operator symbol is just syntactic sugar. In Python, > >the "/" is synonymous with `operator.div` and is defined in terms of the > >`__div__` special method. This distinction is why I hate operator > >overloading in C++ but like it in Python. > > Not sure what you're saying here -- in both languages, operators > are no more than syntactic sugar for dispatching to an appropriate > method or function. Python just avoids introducing a special syntax > for spelling the name of the operator, which is nice, but it's > not a huge difference. To clarify my point: in Python, "/" is not just a symbol--it specifically means "div". > The same issues of what you *should* use operators for arises in > both communities, and it seems to be very much a matter of > personal taste. Overriding the div operator requires creating a "__div__" special method, which I think has helped influence personal taste within the Python community. I personally would feel dirty creating a "__div__" method that had absolutely nothing to do with division. Whether or not the sense of personal taste within the Python community is directly attributable to this or not, I believe that overloaded operators in Python tend to be more predictable and consistent than what I have seen in C++. -- Andrew McNabb http://www.mcnabbs.org/andrew/ PGP Fingerprint: 8A17 B57C 6879 1863 DE55 8012 AB4D 6098 8826 6868 From ethan at stoneleaf.us Sat Oct 6 23:27:52 2012 From: ethan at stoneleaf.us (Ethan Furman) Date: Sat, 06 Oct 2012 14:27:52 -0700 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: References: <20121005202534.5f721292@pitrou.net> Message-ID: <5070A258.20204@stoneleaf.us> Serhiy Storchaka wrote: > On 05.10.12 21:25, Antoine Pitrou wrote: >> PS: You can all admire my ASCII-art skills. > > PurePosixPath and PureNTPath looks closer to Path than to PurePath. > >> The ``parent()`` method returns an ancestor of the path:: > > p[:-n] is shorter and looks neater than p.parent(n). Possible the > ``parent()`` method is unnecessary? 
Sequencing currently operates as an os.listdir, so [:-n] would give the last entry of the folder. Perhaps Path should not have a default iteration, but instead have .children, .parents, .parts, etc. ~Ethan~ From guido at python.org Sun Oct 7 00:00:54 2012 From: guido at python.org (Guido van Rossum) Date: Sat, 6 Oct 2012 15:00:54 -0700 Subject: [Python-ideas] asyncore: included batteries don't fit In-Reply-To: References: <20120926081718.GA20843@hephaistos.amsuess.com> <20121003144320.GA16485@hephaistos.amsuess.com> Message-ID: This is an incredibly important discussion. I would like to contribute despite my limited experience with the various popular options. My own async explorations are limited to the constraints of the App Engine runtime environment, where a rather unique type of reactor is required. I am developing some ideas around separating reactors, futures, and yield-based coroutines, but they take more thinking and probably some experimental coding before I'm ready to write it up in any detail. For a hint on what I'm after, you might read up on monocle (https://github.com/saucelabs/monocle) and my approach to building coroutines on top of Futures (http://code.google.com/p/appengine-ndb-experiment/source/browse/ndb/tasklets.py#349). In the mean time I'd like to bring up a few higher-order issues: (1) How importance is it to offer a compatibility path for asyncore? I would have thought that offering an integration path forward for Twisted and Tornado would be more important. (2) We're at a fork in the road here. On the one hand, we could choose to deeply integrate greenlets/gevents into the standard library. (It's not monkey-patching if it's integrated, after all. :-) I'm not sure how this would work for other implementations than CPython, or even how to address CPython on non-x86 architectures. But users seem to like the programming model: write synchronous code, get async operation for free. It's easy to write protocol parsers that way. On the other hand, we could reject this approach: the integration would never be completely smooth, there's the issue of other implementations and architectures, it probably would never work smoothly even for CPython/x86 when 3rd party extension modules are involved. Callback-based APIs don't have these downsides, but they are harder to program; however we can make programming them easier by using yield-based coroutines. Even Twisted offers those (inline callbacks). Before I invest much more time in these ideas I'd like to at least have (2) sorted out. -- --Guido van Rossum (python.org/~guido) From massimo.dipierro at gmail.com Sun Oct 7 00:10:09 2012 From: massimo.dipierro at gmail.com (Massimo DiPierro) Date: Sat, 6 Oct 2012 17:10:09 -0500 Subject: [Python-ideas] asyncore: included batteries don't fit In-Reply-To: References: <20120926081718.GA20843@hephaistos.amsuess.com> <20121003144320.GA16485@hephaistos.amsuess.com> Message-ID: <81AF2BDB-EFF9-4BCB-9990-F8732FBB08F3@gmail.com> I would strongly support integrating gevents into the standard library. That would finally make me switch to Python 3. :-) On Oct 6, 2012, at 5:00 PM, Guido van Rossum wrote: > This is an incredibly important discussion. > > I would like to contribute despite my limited experience with the > various popular options. My own async explorations are limited to the > constraints of the App Engine runtime environment, where a rather > unique type of reactor is required. 
I am developing some ideas around > separating reactors, futures, and yield-based coroutines, but they > take more thinking and probably some experimental coding before I'm > ready to write it up in any detail. For a hint on what I'm after, you > might read up on monocle (https://github.com/saucelabs/monocle) and my > approach to building coroutines on top of Futures > (http://code.google.com/p/appengine-ndb-experiment/source/browse/ndb/tasklets.py#349). > > In the mean time I'd like to bring up a few higher-order issues: > > (1) How importance is it to offer a compatibility path for asyncore? I > would have thought that offering an integration path forward for > Twisted and Tornado would be more important. > > (2) We're at a fork in the road here. On the one hand, we could choose > to deeply integrate greenlets/gevents into the standard library. (It's > not monkey-patching if it's integrated, after all. :-) I'm not sure > how this would work for other implementations than CPython, or even > how to address CPython on non-x86 architectures. But users seem to > like the programming model: write synchronous code, get async > operation for free. It's easy to write protocol parsers that way. On > the other hand, we could reject this approach: the integration would > never be completely smooth, there's the issue of other implementations > and architectures, it probably would never work smoothly even for > CPython/x86 when 3rd party extension modules are involved. > Callback-based APIs don't have these downsides, but they are harder to > program; however we can make programming them easier by using > yield-based coroutines. Even Twisted offers those (inline callbacks). > > Before I invest much more time in these ideas I'd like to at least > have (2) sorted out. > > -- > --Guido van Rossum (python.org/~guido) > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas From solipsis at pitrou.net Sun Oct 7 00:24:02 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sun, 7 Oct 2012 00:24:02 +0200 Subject: [Python-ideas] asyncore: included batteries don't fit References: <20120926081718.GA20843@hephaistos.amsuess.com> <20121003144320.GA16485@hephaistos.amsuess.com> Message-ID: <20121007002402.43472817@pitrou.net> On Sat, 6 Oct 2012 15:00:54 -0700 Guido van Rossum wrote: > > (2) We're at a fork in the road here. On the one hand, we could choose > to deeply integrate greenlets/gevents into the standard library. (It's > not monkey-patching if it's integrated, after all. :-) I'm not sure > how this would work for other implementations than CPython, or even > how to address CPython on non-x86 architectures. But users seem to > like the programming model: write synchronous code, get async > operation for free. It's easy to write protocol parsers that way. On > the other hand, we could reject this approach: the integration would > never be completely smooth, there's the issue of other implementations > and architectures, it probably would never work smoothly even for > CPython/x86 when 3rd party extension modules are involved. > Callback-based APIs don't have these downsides, but they are harder to > program; however we can make programming them easier by using > yield-based coroutines. Even Twisted offers those (inline callbacks). 
greenlets/gevents only get you half the advantages of single-threaded "async" programming: they get you scalability in the face of a high number of concurrent connections, but they don't get you the robustness of cooperative multithreading (because it's not obvious when reading the code where the possible thread-switching points are). (I don't actually understand the attraction of gevent, except for extreme situations; threads should be cheap on a decent OS.) Regards Antoine. -- Software development and contracting: http://pro.pitrou.net From greg.ewing at canterbury.ac.nz Sun Oct 7 00:41:01 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sun, 07 Oct 2012 11:41:01 +1300 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: <20121006142529.7ea5c8b6@pitrou.net> References: <20121005202534.5f721292@pitrou.net> <50700CBF.8060109@hotpy.org> <20121006142529.7ea5c8b6@pitrou.net> Message-ID: <5070B37D.7060409@canterbury.ac.nz> Antoine Pitrou wrote: > I didn't choose / at first because I knew this choice would be quite > contentious. However, if there happens to be a strong majority in its > favour, why not. Count me as +1 on / as a path concatenation operator. It's very intuitive, IMO, and it would free up indexing for the purpose of extracting pathname components, which is a more intuitive use for that as well, I think. -- Greg From christian at python.org Sun Oct 7 00:41:24 2012 From: christian at python.org (Christian Heimes) Date: Sun, 07 Oct 2012 00:41:24 +0200 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: <20121005202534.5f721292@pitrou.net> References: <20121005202534.5f721292@pitrou.net> Message-ID: <5070B394.2030609@python.org> Am 05.10.2012 20:25, schrieb Antoine Pitrou: > Hello, > > This PEP is a resurrection of the idea of having object-oriented > filesystem paths in the stdlib. It comes with a general API proposal > as well as a specific implementation (*). The implementation is young > and discussion is quite open. I already gave you my +1 on #python-dev. I have some additional ideas that I'd like to suggest for pathlib.

* Jason Orendorff's path module has some methods that are quite useful for shell- and find-like scripts. I especially like the files(pattern=None), dirs(pattern=None) and their recursive counterparts walkfiles() and walkdirs(). They make code like "recursively remove all pyc files" easy to write: for pyc in path.walkfiles('*.py'): pyc.remove()

* I'd like to see a convenient method to format sizes in SI units (for example 1.2 MB, 5 GB) and non-SI units (MiB, GiB, aka human readable, multiples of 2). I have some code that would be useful for the task.

* Web applications often need to know the mimetype of a file. How about a mimetype property that returns the mimetype according to the extension?

* Symlink and directory traversal attacks are a constant threat. I'd like to see a pathlib object that restricts itself and all its offspring to a directory. Perhaps this can be implemented as a proxy object around a pathlib object?

* While we are working on pathlib I'd like to improve os.listdir() in two ways. The os.listdir() function currently returns a list of file names. This can consume lots of memory for a directory with hundreds of thousands of files. How about I implement an iterator version that returns some additional information, too? On Linux and most BSDs you can get the file type (d_type, e.g. file, directory, symlink) for free; see the sketch below.

* Implement "if filename in directory" with os.path.exists().
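To make the iterator-based listdir idea above concrete, here is a minimal sketch. It is purely illustrative: the names iterdir_info() and contains() are invented, and a real implementation would read d_type straight from readdir() on platforms that provide it rather than paying an lstat() call per entry, as this approximation does.

    import os
    import stat

    def iterdir_info(path):
        # Yield (name, kind) pairs for the entries of *path*, one at a time,
        # instead of materialising the whole directory as a list of names.
        for name in os.listdir(path):
            mode = os.lstat(os.path.join(path, name)).st_mode
            if stat.S_ISLNK(mode):
                kind = 'symlink'
            elif stat.S_ISDIR(mode):
                kind = 'dir'
            elif stat.S_ISREG(mode):
                kind = 'file'
            else:
                kind = 'other'
            yield name, kind

    def contains(path, filename):
        # "if filename in directory" without building a full name list first.
        return any(name == filename for name, _ in iterdir_info(path))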
Christian From josiah.carlson at gmail.com Sun Oct 7 00:44:05 2012 From: josiah.carlson at gmail.com (Josiah Carlson) Date: Sat, 6 Oct 2012 15:44:05 -0700 Subject: [Python-ideas] asyncore: included batteries don't fit In-Reply-To: <20121005220954.6be30804@pitrou.net> References: <20120926081718.GA20843@hephaistos.amsuess.com> <20121003144320.GA16485@hephaistos.amsuess.com> <20121005220954.6be30804@pitrou.net> Message-ID: On Fri, Oct 5, 2012 at 1:09 PM, Antoine Pitrou wrote: > On Fri, 5 Oct 2012 11:51:21 -0700 > Josiah Carlson > wrote: >> >> My long-term dream (which has been the case for 6+ years, since I >> proposed doing it myself on the python-dev mailing list and was told >> "no") is that whether someone uses urllib2, httplib2, smtpd, requests, >> ftplib, etc., they all have access to high-quality protocol-level >> protocol parsers. > > I'm not sure what you're talking about: what were you told "no" about, > specifically? Your proposal sounds reasonable and (ideally) desirable to > me. I've managed to find the email where I half-way proposed it (though not as pointed as what I posted above): http://mail.python.org/pipermail/python-dev/2004-November/049827.html Phillip J. Eby said in a reply that policy would kill it. My experience at the time told me that policy was a tough nut to crack, and my 24-year old self wasn't confident enough to keep pushing (even though I had the time). Now, my 32-year old self has the confidence and the knowledge to do it (or advise how to do it), but not the time (I'm finishing up my first book, doing a conference tour, running a startup, and preparing for my first child). One of the big reasons why I like and am pushing Giampaolo's ideas (and existing code) is my faith that he *can* and *will* do it, if he says he will. Regards, - Josiah From greg.ewing at canterbury.ac.nz Sun Oct 7 00:49:43 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sun, 07 Oct 2012 11:49:43 +1300 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: <50702D4F.7070202@hotpy.org> References: <20121005202534.5f721292@pitrou.net> <50700CBF.8060109@hotpy.org> <20121006142529.7ea5c8b6@pitrou.net> <50702D4F.7070202@hotpy.org> Message-ID: <5070B587.6060604@canterbury.ac.nz> Mark Shannon wrote: > Actually I did mean the '//' (floor division) operator as it would stand > out more than '/'. -0.97 This would weaken the mnemonic value. We separate paths with single slashes, not double slashes. -- Greg From greg.ewing at canterbury.ac.nz Sun Oct 7 00:57:04 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sun, 07 Oct 2012 11:57:04 +1300 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: <20121006154228.0b2a6087@pitrou.net> References: <20121005202534.5f721292@pitrou.net> <506F5371.6080302@stoneleaf.us> <20121006014823.1fc46741@pitrou.net> <20121006140924.3bfdb710@pitrou.net> <20121006122642.GA15492@iskra.aviel.ru> <20121006124049.GC16843@iskra.aviel.ru> <20121006144635.34f21f84@pitrou.net> <20121006145744.19e3789c@pitrou.net> <20121006154228.0b2a6087@pitrou.net> Message-ID: <5070B740.2000008@canterbury.ac.nz> Antoine Pitrou wrote: > True, but since we already have the name attribute it stands reasonable > for basename to mean something else than name :-) > Do you have another suggestion? 
If we have a method for replacing the extension, I don't think we have a strong need a name for "all of the last name except the extension", because usually all you want that for is so you can add a different extension (possibly empty). So I propose to avoid the term "basename" altogether, and just have path.name --> all of the last component path.ext --> the extension path.with_name(foo) -- replaces all of the last component path.with_ext(ext) -- replaces the extension Then if you really want to extract the last component without the extension (which I expect to be a rare requirement), you can do path.with_ext("").name -- Greg From greg.ewing at canterbury.ac.nz Sun Oct 7 01:01:21 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sun, 07 Oct 2012 12:01:21 +1300 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: <87ehlb4pvu.fsf@uwakimon.sk.tsukuba.ac.jp> References: <20121005202534.5f721292@pitrou.net> <20121005215520.19b63efe@pitrou.net> <87d30wq9ji.fsf@uwakimon.sk.tsukuba.ac.jp> <20121006140652.630794f4@pitrou.net> <87ehlb4pvu.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <5070B841.5010003@canterbury.ac.nz> Stephen J. Turnbull wrote: > On the other hand, `p + Path('foo')` and `Path('foo') + p` (where p is > a Path, not a string) both seem reasonable to me. I don't like the idea of using + as the path concatenation operator, because path + ".c" is an obvious way to add an extension or other suffix to a filename, and it ought to work. -- Greg From greg.ewing at canterbury.ac.nz Sun Oct 7 01:21:01 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sun, 07 Oct 2012 12:21:01 +1300 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: <20121006190821.02ae50cd@pitrou.net> References: <20121005202534.5f721292@pitrou.net> <20121006190821.02ae50cd@pitrou.net> Message-ID: <5070BCDD.6030506@canterbury.ac.nz> Antoine Pitrou wrote: > Personally, I cringe everytime I have to type > `os.path.dirname(os.path.dirname(os.path.dirname(...)))` to go two > directories upwards of a given path. Compare, with, say: > >>>>p = Path('/a/b/c/d') >>>>p.parent(2) Or if we allow slicing, p[:-2] -- Greg From greg.ewing at canterbury.ac.nz Sun Oct 7 01:22:32 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sun, 07 Oct 2012 12:22:32 +1300 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: <507069C2.7080100@gmx.net> References: <20121005202534.5f721292@pitrou.net> <507069C2.7080100@gmx.net> Message-ID: <5070BD38.8020502@canterbury.ac.nz> Mathias Panzenb?ck wrote: > Would there be something like this: > > >>> prefix.join("some","sub","path") Using a / operator, this would be prefix / "some" / "sub" / "path" -- Greg From greg.ewing at canterbury.ac.nz Sun Oct 7 01:28:51 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sun, 07 Oct 2012 12:28:51 +1300 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: <387F855A-B5BD-491D-9FA5-918F4042332F@gmail.com> References: <20121005202534.5f721292@pitrou.net> <20121006190821.02ae50cd@pitrou.net> <387F855A-B5BD-491D-9FA5-918F4042332F@gmail.com> Message-ID: <5070BEB3.2090204@canterbury.ac.nz> Massimo DiPierro wrote: > How about something along this lines: > > class Path(str): > ... 
> > path = Path('/this/is/an/example.png') > print path[-1] # example.png Unfortunately, if you subclass from str, I don't think it will be feasible to make indexing return pathname components, because code that's treating it as a string will be expecting it to index characters. Similarly you can't make + mean path concatenation -- it must remain ordinary string concatenation. -- Greg From guido at python.org Sun Oct 7 02:23:48 2012 From: guido at python.org (Guido van Rossum) Date: Sat, 6 Oct 2012 17:23:48 -0700 Subject: [Python-ideas] asyncore: included batteries don't fit In-Reply-To: <20121007002402.43472817@pitrou.net> References: <20120926081718.GA20843@hephaistos.amsuess.com> <20121003144320.GA16485@hephaistos.amsuess.com> <20121007002402.43472817@pitrou.net> Message-ID: On Sat, Oct 6, 2012 at 3:24 PM, Antoine Pitrou wrote: > On Sat, 6 Oct 2012 15:00:54 -0700 > Guido van Rossum wrote: >> >> (2) We're at a fork in the road here. On the one hand, we could choose >> to deeply integrate greenlets/gevents into the standard library. (It's >> not monkey-patching if it's integrated, after all. :-) I'm not sure >> how this would work for other implementations than CPython, or even >> how to address CPython on non-x86 architectures. But users seem to >> like the programming model: write synchronous code, get async >> operation for free. It's easy to write protocol parsers that way. On >> the other hand, we could reject this approach: the integration would >> never be completely smooth, there's the issue of other implementations >> and architectures, it probably would never work smoothly even for >> CPython/x86 when 3rd party extension modules are involved. >> Callback-based APIs don't have these downsides, but they are harder to >> program; however we can make programming them easier by using >> yield-based coroutines. Even Twisted offers those (inline callbacks). > > greenlets/gevents only get you half the advantages of single-threaded > "async" programming: they get you scalability in the face of a high > number of concurrent connections, but they don't get you the robustness > of cooperative multithreading (because it's not obvious when reading > the code where the possible thread-switching points are). I used to think that too, long ago, until I discovered that as you add abstraction layers, cooperative multithreading is untenable -- sooner or later you will lose track of where the threads are switched. > (I don't actually understand the attraction of gevent, except for > extreme situations; threads should be cheap on a decent OS) I think it's the observation that the number of sockets you can realistically have open in a single process or machine is always 1-2 orders of maginuted larger than the number of threads you can have -- and this makes sense since the total amount of memory (kernel and user) to represent a socket is just much smaller than needed for a thread. Just check the configuration limits of your typical Linux kernel if you don't believe me. :-) -- --Guido van Rossum (python.org/~guido) From ben+python at benfinney.id.au Sun Oct 7 02:41:14 2012 From: ben+python at benfinney.id.au (Ben Finney) Date: Sun, 07 Oct 2012 11:41:14 +1100 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths References: <20121005202534.5f721292@pitrou.net> Message-ID: <7wk3v3cdw5.fsf@benfinney.id.au> Antoine Pitrou writes: > >>> p = Path('/home/antoine/pathlib/setup.py') > >>> p.name > 'setup.py' > >>> p.ext > '.py' The term ?extension? 
is a barnacle from mainframe filesystems where a filename is necessarily divided into exactly two parts, the name and the extension. It doesn't really apply to POSIX filesystems.

On filesystems where the user has always been free to have any number of parts in a filename, the closest concept is better referred to by the term "suffix":: >>> p.suffix '.py'

It may be useful to add an API method to query the *sequence* of suffixes of a filename:: >>> p = Path('/home/antoine/pathlib.tar.gz') >>> p.name 'pathlib.tar.gz' >>> p.suffix '.gz' >>> p.suffixes ['.tar', '.gz']

Thanks for keeping this proposal active, Antoine. -- \ "In any great organization it is far, far safer to be wrong | `\ with the majority than to be right alone." --John Kenneth | _o__) Galbraith, 1989-07-28 | Ben Finney

From carlopires at gmail.com Sun Oct 7 02:45:59 2012 From: carlopires at gmail.com (Carlo Pires) Date: Sat, 6 Oct 2012 21:45:59 -0300 Subject: [Python-ideas] asyncore: included batteries don't fit In-Reply-To: References: <20120926081718.GA20843@hephaistos.amsuess.com> <20121003144320.GA16485@hephaistos.amsuess.com> Message-ID: +1000 Can we dream of gevent integrated into standard CPython? That would be a fantastic path for 3.4 :) And I would definitely move to 3.x, because for web programming I just can't think of another way to program in Python. I'm seeing some people move to other languages where async is easier, like Go (some are trying Erlang). Async is a MUST HAVE for web programming these days...

In my experience, the "robustness of cooperative multithreading" comes at the price of code that is difficult to maintain. And a single-threaded design never reaches the benefits of SMP easily. That's why Erlang shines: it abstracts away the hard work of keeping the switching under control. Gevent walks the same line: it makes the programmer's life easier. -- Carlo Pires

2012/10/6 Guido van Rossum > This is an incredibly important discussion. > > I would like to contribute despite my limited experience with the > various popular options. My own async explorations are limited to the > constraints of the App Engine runtime environment, where a rather > unique type of reactor is required. I am developing some ideas around > separating reactors, futures, and yield-based coroutines, but they > take more thinking and probably some experimental coding before I'm > ready to write it up in any detail. For a hint on what I'm after, you > might read up on monocle (https://github.com/saucelabs/monocle) and my > approach to building coroutines on top of Futures > ( > http://code.google.com/p/appengine-ndb-experiment/source/browse/ndb/tasklets.py#349 > ). > > In the mean time I'd like to bring up a few higher-order issues: > > (1) How importance is it to offer a compatibility path for asyncore? I > would have thought that offering an integration path forward for > Twisted and Tornado would be more important. > > (2) We're at a fork in the road here. On the one hand, we could choose > to deeply integrate greenlets/gevents into the standard library. (It's > not monkey-patching if it's integrated, after all. :-) I'm not sure > how this would work for other implementations than CPython, or even > how to address CPython on non-x86 architectures. But users seem to > like the programming model: write synchronous code, get async > operation for free. It's easy to write protocol parsers that way.
On > the other hand, we could reject this approach: the integration would > never be completely smooth, there's the issue of other implementations > and architectures, it probably would never work smoothly even for > CPython/x86 when 3rd party extension modules are involved. > Callback-based APIs don't have these downsides, but they are harder to > program; however we can make programming them easier by using > yield-based coroutines. Even Twisted offers those (inline callbacks). > > Before I invest much more time in these ideas I'd like to at least > have (2) sorted out. > > -- > --Guido van Rossum (python.org/~guido) > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ericsnowcurrently at gmail.com Sun Oct 7 02:47:56 2012 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Sat, 6 Oct 2012 18:47:56 -0600 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: <7wk3v3cdw5.fsf@benfinney.id.au> References: <20121005202534.5f721292@pitrou.net> <7wk3v3cdw5.fsf@benfinney.id.au> Message-ID: On Oct 6, 2012 6:41 PM, "Ben Finney" wrote: > > Antoine Pitrou > writes: > > > >>> p = Path('/home/antoine/pathlib/setup.py') > > >>> p.name > > 'setup.py' > > >>> p.ext > > '.py' > > The term ?extension? is a barnacle from mainframe filesystems where a > filename is necessarily divided into exactly two parts, the name and the > extension. It doesn't really apply to POSIX filesystems. > > On filesystems where the user has always been free to have any number of > parts in a filename, the closest concept is better referred to by the > term ?suffix?:: > > >>> p.suffix > '.py' > > It may be useful to add an API method to query the *sequence* of > suffixes of a filename:: > > >>> p = Path('/home/antoine/pathlib.tar.gz') > >>> p.name > 'pathlib.tar.gz' > >>> p.suffix > '.gz' > >>> p.suffixes > ['.tar', '.gz'] +1 -eric -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Sun Oct 7 03:09:44 2012 From: steve at pearwood.info (Steven D'Aprano) Date: Sun, 07 Oct 2012 12:09:44 +1100 Subject: [Python-ideas] Propagating StopIteration value In-Reply-To: References: Message-ID: <5070D658.9020300@pearwood.info> On 07/10/12 07:10, Serhiy Storchaka wrote: > As StopIteration now have value, this value is lost when using functions which >works with iterators/generators (map, filter, itertools). Therefore, wrapping >the iterator, which preserved its semantics in versions before 3.3, no longer > preserves it: [...] > Perhaps it would be worth to propagate original exception (or at least it's >value) in functions for which it makes sense. A concrete example would be useful for those who don't know about the (new?) StopIteration.value attribute. I think you are referring to this: py> def myiter(): ... yield 1 ... raise StopIteration("spam") ... py> it = map(lambda x:x, myiter()) py> next(it) 1 py> next(it) Traceback (most recent call last): File "", line 1, in StopIteration The argument given to StopIteration is eaten by map. But this is not *new* to 3.3, it goes back to at least 2.4, so I'm not sure if you are talking about this or something different. 
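For readers following the thread, here is a small runnable illustration of the Python 3.3 semantics being discussed; it is a sketch of the behaviour only, not a proposal:

    def inner():
        yield 1
        return "spam"   # PEP 380: equivalent to raise StopIteration("spam")

    def outer():
        result = yield from inner()   # the return value lands here
        yield ("inner returned", result)

    print(list(outer()))   # [1, ('inner returned', 'spam')]
    print(list(inner()))   # [1] -- the "spam" is discarded by list(), just as by map()

In other words, the value survives a yield-from but not a trip through an ordinary iterator consumer.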
-- Steven From ben+python at benfinney.id.au Sun Oct 7 03:13:23 2012 From: ben+python at benfinney.id.au (Ben Finney) Date: Sun, 07 Oct 2012 12:13:23 +1100 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths References: <20121005202534.5f721292@pitrou.net> <506F5371.6080302@stoneleaf.us> <20121006014823.1fc46741@pitrou.net> <20121006140924.3bfdb710@pitrou.net> <20121006122642.GA15492@iskra.aviel.ru> <20121006124049.GC16843@iskra.aviel.ru> <20121006144635.34f21f84@pitrou.net> <20121006145744.19e3789c@pitrou.net> <20121006154228.0b2a6087@pitrou.net> <5070B740.2000008@canterbury.ac.nz> Message-ID: <7wfw5rccek.fsf@benfinney.id.au> Greg Ewing writes: > If we have a method for replacing the extension, I don't think > we have a strong need a name for "all of the last name except the > extension", because usually all you want that for is so you can add > a different extension (possibly empty). This is based on the false concept that there is one "extension" in a filename. On POSIX filesystems, that's just not true; filenames often have several suffixes in sequence, e.g. "foo.tar.gz" or "foo.pg.sql", and each one conveys meaningful intent by whoever named the file. > So I propose to avoid the term "basename" altogether, and just > have > > path.name --> all of the last component > path.ext --> the extension > > path.with_name(foo) -- replaces all of the last component > path.with_ext(ext) -- replaces the extension +1 on avoiding the term "basename" for anything to do with the concept being discussed here, since it already has a different meaning ("the part of the filename without any leading directory parts"). -1 on entrenching this false concept of "the extension" of a filename. -- \ Eccles: "I'll get [the job] too, you'll see. I'm wearing a | `\ Cambridge tie." Greenslade: "What were you doing there?" | _o__) Eccles: "Buying a tie." --The Goon Show, _The Greenslade Story_ | Ben Finney From ethan at stoneleaf.us Sun Oct 7 03:13:17 2012 From: ethan at stoneleaf.us (Ethan Furman) Date: Sat, 06 Oct 2012 18:13:17 -0700 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: <7wk3v3cdw5.fsf@benfinney.id.au> References: <20121005202534.5f721292@pitrou.net> <7wk3v3cdw5.fsf@benfinney.id.au> Message-ID: <5070D72D.3080008@stoneleaf.us> Ben Finney wrote: > Antoine Pitrou > writes: > >> >>> p = Path('/home/antoine/pathlib/setup.py') >> >>> p.name >> 'setup.py' >> >>> p.ext >> '.py' > > The term "extension" is a barnacle from mainframe filesystems where a > filename is necessarily divided into exactly two parts, the name and the > extension. It doesn't really apply to POSIX filesystems.
> > On filesystems where the user has always been free to have any number of > parts in a filename, the closest concept is better referred to by the > term ?suffix?:: > > >>> p.suffix > '.py' > > It may be useful to add an API method to query the *sequence* of > suffixes of a filename:: > > >>> p = Path('/home/antoine/pathlib.tar.gz') > >>> p.name > 'pathlib.tar.gz' > >>> p.suffix > '.gz' > >>> p.suffixes > ['.tar', '.gz'] +1 From steve at pearwood.info Sun Oct 7 03:36:30 2012 From: steve at pearwood.info (Steven D'Aprano) Date: Sun, 07 Oct 2012 12:36:30 +1100 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: <5070B394.2030609@python.org> References: <20121005202534.5f721292@pitrou.net> <5070B394.2030609@python.org> Message-ID: <5070DC9E.6080603@pearwood.info> On 07/10/12 09:41, Christian Heimes wrote: > * Jason Orendorff's path module has some methods that are quite useful > for shell and find like script. I especially like the > files(pattern=None), dirs(pattern=None) and their recursive counterparts > walkfiles() and walkdirs(). They make code like recursively remove all > pyc files easy to write: > > for pyc in path.walkfiles('*.py'): > pyc.remove() Ouch! My source code!!! *grin* > * I like to see a convenient method to format sizes in SI units (for > example 1.2 MB, 5 GB) and non SI units (MiB, GiB, aka human readable, > multiple of 2). I've some code that would be useful for the task. So do I. http://pypi.python.org/pypi/byteformat Although it's only listed as an "alpha" package, that's just me being conservative about allowing changes to the API. The code is actually fairly mature. If there is interest in having this in the standard library, I am more than happy to target 3.4 and commit to maintaining it. -- Steven From steve at pearwood.info Sun Oct 7 03:41:44 2012 From: steve at pearwood.info (Steven D'Aprano) Date: Sun, 07 Oct 2012 12:41:44 +1100 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: <20121006190821.02ae50cd@pitrou.net> References: <20121005202534.5f721292@pitrou.net> <20121006190821.02ae50cd@pitrou.net> Message-ID: <5070DDD8.30401@pearwood.info> On 07/10/12 04:08, Antoine Pitrou wrote: > Personally, I cringe everytime I have to type > `os.path.dirname(os.path.dirname(os.path.dirname(...)))` to go two > directories upwards of a given path. Compare, with, say: I would cringe too if I did that, because it goes THREE directories up, not two: py> path = '/a/b/c/d' py> os.path.dirname(os.path.dirname(os.path.dirname(path))) '/a' :) >>>> p = Path('/a/b/c/d') >>>> p.parent(2) > PosixPath('/a/b') You know, I don't think I've ever needed to call dirname more than once at a time, but if I was using it a lot: parent = os.path.dirname parent(parent(parent(p)) which is not as short as p.parent(3), but it's still pretty clear. -- Steven From guido at python.org Sun Oct 7 03:45:54 2012 From: guido at python.org (Guido van Rossum) Date: Sat, 6 Oct 2012 18:45:54 -0700 Subject: [Python-ideas] Propagating StopIteration value In-Reply-To: <5070D658.9020300@pearwood.info> References: <5070D658.9020300@pearwood.info> Message-ID: On Sat, Oct 6, 2012 at 6:09 PM, Steven D'Aprano wrote: > A concrete example would be useful for those who don't know about the (new?) > StopIteration.value attribute. I think you are referring to this: > > py> def myiter(): > ... yield 1 > ... raise StopIteration("spam") > ... 
> py> it = map(lambda x:x, myiter()) > py> next(it) > 1 > py> next(it) > Traceback (most recent call last): > File "", line 1, in > StopIteration > > The argument given to StopIteration is eaten by map. > > But this is not *new* to 3.3, it goes back to at least 2.4, so I'm > not sure if you are talking about this or something different. What's new in 3.3 (due to PEP 380) is that instead of the rather awkward and uncommon raise StopIteration("spam") you can now write return "spam" with exactly the same effect. But yes, this was all considered and accepted when PEP 380 was debated (endlessly :-), and I see no reason to change anything about this. "Don't do that" is the best I can say about it -- there are a zillion other situations in Python where that's the only sensible motto. -- --Guido van Rossum (python.org/~guido) From greg.ewing at canterbury.ac.nz Sun Oct 7 04:11:54 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sun, 07 Oct 2012 15:11:54 +1300 Subject: [Python-ideas] Propagating StopIteration value In-Reply-To: <5070D658.9020300@pearwood.info> References: <5070D658.9020300@pearwood.info> Message-ID: <5070E4EA.5010904@canterbury.ac.nz> Steven D'Aprano wrote: > py> def myiter(): > ... yield 1 > ... raise StopIteration("spam") > ... > py> it = map(lambda x:x, myiter()) > py> next(it) > 1 > py> next(it) > Traceback (most recent call last): > File "", line 1, in > StopIteration > > The argument given to StopIteration is eaten by map. It's highly debatable whether this is even wrong. The purpose of StopIteration(value) is for a generator to return a value to its immediate caller when invoked using yield-from. The value is not intended to propagate any further than that. A non-iterator analogy would be def f(): return 42 def g(): f() Would you expect g() to return 42 here? -- Greg From greg.ewing at canterbury.ac.nz Sun Oct 7 04:19:44 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sun, 07 Oct 2012 15:19:44 +1300 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: <7wfw5rccek.fsf@benfinney.id.au> References: <20121005202534.5f721292@pitrou.net> <506F5371.6080302@stoneleaf.us> <20121006014823.1fc46741@pitrou.net> <20121006140924.3bfdb710@pitrou.net> <20121006122642.GA15492@iskra.aviel.ru> <20121006124049.GC16843@iskra.aviel.ru> <20121006144635.34f21f84@pitrou.net> <20121006145744.19e3789c@pitrou.net> <20121006154228.0b2a6087@pitrou.net> <5070B740.2000008@canterbury.ac.nz> <7wfw5rccek.fsf@benfinney.id.au> Message-ID: <5070E6C0.3080509@canterbury.ac.nz> Ben Finney wrote: > filenames often > have several suffixes in sequence, e.g. ?foo.tar.gz? or ?foo.pg.sql?, > and each one conveys meaningful intent by whoever named the file. When I talk about "the extension", I mean the last one. The vast majority of the time, that's all you're interested in -- you unwrap one layer of the onion at a time, and leave the rest for the next layer of software up. That's not always true, but it's true often enough that I think it's worth having special APIs for dealing with the last dot-suffix. -- Greg From josiah.carlson at gmail.com Sun Oct 7 04:22:26 2012 From: josiah.carlson at gmail.com (Josiah Carlson) Date: Sat, 6 Oct 2012 19:22:26 -0700 Subject: [Python-ideas] asyncore: included batteries don't fit In-Reply-To: References: <20120926081718.GA20843@hephaistos.amsuess.com> <20121003144320.GA16485@hephaistos.amsuess.com> Message-ID: On Sat, Oct 6, 2012 at 3:00 PM, Guido van Rossum wrote: > This is an incredibly important discussion. 
> > I would like to contribute despite my limited experience with the > various popular options. My own async explorations are limited to the > constraints of the App Engine runtime environment, where a rather > unique type of reactor is required. I am developing some ideas around > separating reactors, futures, and yield-based coroutines, but they > take more thinking and probably some experimental coding before I'm > ready to write it up in any detail. For a hint on what I'm after, you > might read up on monocle (https://github.com/saucelabs/monocle) and my > approach to building coroutines on top of Futures > (http://code.google.com/p/appengine-ndb-experiment/source/browse/ndb/tasklets.py#349). Yield-based coroutines like monocle are the simplest way to do multi-paradigm in the same code. Whether you have a async-style reactor, greenlet-style stack switching, cooperatively scheduled generator trampolines, or just plain blocking threaded sockets; that style works with all of them (the futures and wrapper around everything just looks a little different). That said, it forces everyone to drink the same coroutine-styled kool-aid. That doesn't bother me. But I understand it, and have built similar systems before. I don't have an intuition about whether 3rd parties will like it or will migrate to it. Someone want to ping the Twisted and Tornado folks about it? > In the mean time I'd like to bring up a few higher-order issues: > > (1) How importance is it to offer a compatibility path for asyncore? I > would have thought that offering an integration path forward for > Twisted and Tornado would be more important. > > (2) We're at a fork in the road here. On the one hand, we could choose > to deeply integrate greenlets/gevents into the standard library. (It's > not monkey-patching if it's integrated, after all. :-) I'm not sure > how this would work for other implementations than CPython, or even > how to address CPython on non-x86 architectures. But users seem to > like the programming model: write synchronous code, get async > operation for free. It's easy to write protocol parsers that way. On > the other hand, we could reject this approach: the integration would > never be completely smooth, there's the issue of other implementations > and architectures, it probably would never work smoothly even for > CPython/x86 when 3rd party extension modules are involved. > Callback-based APIs don't have these downsides, but they are harder to > program; however we can make programming them easier by using > yield-based coroutines. Even Twisted offers those (inline callbacks). > > Before I invest much more time in these ideas I'd like to at least > have (2) sorted out. Combining your responses to #1 and now this, are you proposing a path forward for Twisted/Tornado to be greenlets? That's an interesting approach to the problem, though I can see the draw. ;) I have been hesitant on the Twisted side of things for an arbitrarily selfish reason. After 2-3 hours of reading over a codebase (which I've done 5 or 6 times in the last 8 years), I ask myself whether I believe I understand 80+% of how things work; how data flows, how callbacks/layers are invoked, and whether I could add a piece of arbitrary functionality to one layer or another (or to determine the proper layer in which to add the functionality). If my answer is "no", then my gut says "this is probably a bad idea". But if I start figuring out the layers before I've finished my 2-3 hours, and I start finding bugs? 
Well, then I think it's a much better idea, even if the implementation is buggy. Maybe something like Monocle would be better (considering your favor for that style, it obviously has a leg-up on the competition). I don't know. But if something like Monocle can merge it all together, then maybe I'd be happy. Incidentally, I can think of a few different styles of wrappers that would actually let people using asyncore-derived stuff use something like Monocle. So maybe that's really the right answer? Regards, - Josiah P.S. Thank you for weighing in on this Guido. Even if it doesn't end up the way I had originally hoped, at least now there's discussion. From guido at python.org Sun Oct 7 06:05:13 2012 From: guido at python.org (Guido van Rossum) Date: Sat, 6 Oct 2012 21:05:13 -0700 Subject: [Python-ideas] asyncore: included batteries don't fit In-Reply-To: References: <20120926081718.GA20843@hephaistos.amsuess.com> <20121003144320.GA16485@hephaistos.amsuess.com> Message-ID: On Sat, Oct 6, 2012 at 7:22 PM, Josiah Carlson wrote: > On Sat, Oct 6, 2012 at 3:00 PM, Guido van Rossum wrote: >> This is an incredibly important discussion. >> >> I would like to contribute despite my limited experience with the >> various popular options. My own async explorations are limited to the >> constraints of the App Engine runtime environment, where a rather >> unique type of reactor is required. I am developing some ideas around >> separating reactors, futures, and yield-based coroutines, but they >> take more thinking and probably some experimental coding before I'm >> ready to write it up in any detail. For a hint on what I'm after, you >> might read up on monocle (https://github.com/saucelabs/monocle) and my >> approach to building coroutines on top of Futures >> (http://code.google.com/p/appengine-ndb-experiment/source/browse/ndb/tasklets.py#349). > > Yield-based coroutines like monocle are the simplest way to do > multi-paradigm in the same code. Whether you have a async-style > reactor, greenlet-style stack switching, cooperatively scheduled > generator trampolines, or just plain blocking threaded sockets; that > style works with all of them (the futures and wrapper around > everything just looks a little different). Glad I'm not completely crazy here. :-) > That said, it forces everyone to drink the same coroutine-styled > kool-aid. That doesn't bother me. But I understand it, and have built > similar systems before. I don't have an intuition about whether 3rd > parties will like it or will migrate to it. Someone want to ping the > Twisted and Tornado folks about it? They should be reading this. Or maybe we should bring it up on python-dev before too long. >> In the mean time I'd like to bring up a few higher-order issues: >> >> (1) How importance is it to offer a compatibility path for asyncore? I >> would have thought that offering an integration path forward for >> Twisted and Tornado would be more important. >> >> (2) We're at a fork in the road here. On the one hand, we could choose >> to deeply integrate greenlets/gevents into the standard library. (It's >> not monkey-patching if it's integrated, after all. :-) I'm not sure >> how this would work for other implementations than CPython, or even >> how to address CPython on non-x86 architectures. But users seem to >> like the programming model: write synchronous code, get async >> operation for free. It's easy to write protocol parsers that way. 
On >> the other hand, we could reject this approach: the integration would >> never be completely smooth, there's the issue of other implementations >> and architectures, it probably would never work smoothly even for >> CPython/x86 when 3rd party extension modules are involved. >> Callback-based APIs don't have these downsides, but they are harder to >> program; however we can make programming them easier by using >> yield-based coroutines. Even Twisted offers those (inline callbacks). >> >> Before I invest much more time in these ideas I'd like to at least >> have (2) sorted out. > > Combining your responses to #1 and now this, are you proposing a path > forward for Twisted/Tornado to be greenlets? That's an interesting > approach to the problem, though I can see the draw. ;) Can't tell whether you're serious, but that's not what I meant. Surely it will never fly for Twisted. Tornado apparently already works with greenlets (though maybe through a third party hack). But personally I'd be leaning towards rejecting greenlets, for the same reasons I've kept the doors tightly shut for Stackless -- I like it fine as a library, but not as a language feature, because I don't see how it can be supported on all platforms where Python must be supported. However I figured that if we define the interfaces well enough, it might be possible to use (a superficially modified version of) Twisted's reactors instead of the standard ones, and, orthogonally, Twisted's deferred's could be wrapped in the standard Futures (or the other way around?) when used with a non-Twisted reactor. Which would hopefully open the door for migrating some of their more useful protocol parsers into the stdlib. > I have been hesitant on the Twisted side of things for an arbitrarily > selfish reason. After 2-3 hours of reading over a codebase (which I've > done 5 or 6 times in the last 8 years), I ask myself whether I believe > I understand 80+% of how things work; how data flows, how > callbacks/layers are invoked, and whether I could add a piece of > arbitrary functionality to one layer or another (or to determine the > proper layer in which to add the functionality). If my answer is "no", > then my gut says "this is probably a bad idea". But if I start > figuring out the layers before I've finished my 2-3 hours, and I start > finding bugs? Well, then I think it's a much better idea, even if the > implementation is buggy. Can't figure what you're implying here. On which side does Twisted fall for you? > Maybe something like Monocle would be better (considering your favor > for that style, it obviously has a leg-up on the competition). I don't > know. But if something like Monocle can merge it all together, then > maybe I'd be happy. My worry is that monocle is too simple and does not cater for advanced needs. It doesn't seem to have caught on much outside the company where it originated. > Incidentally, I can think of a few different > styles of wrappers that would actually let people using > asyncore-derived stuff use something like Monocle. So maybe that's > really the right answer? I still don't really think asyncore is going to be a problem. It can easily be separated into a reactor and callbacks. > Regards, > - Josiah > > P.S. Thank you for weighing in on this Guido. Even if it doesn't end > up the way I had originally hoped, at least now there's discussion. Hm, there seemed to be plenty of discussion before... 
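As a rough illustration of the wrapping idea, here is one possible way a Twisted Deferred could be exposed as a stdlib concurrent.futures.Future. This is only a sketch under stated assumptions: it ignores cancellation entirely, the function name is invented, and it is not how either project actually spells such a bridge.

    from concurrent.futures import Future

    def future_from_deferred(d):
        # d is assumed to be a twisted.internet.defer.Deferred
        f = Future()

        def on_success(result):
            f.set_result(result)
            return result            # keep the Deferred's callback chain intact

        def on_error(failure):
            f.set_exception(failure.value)   # failure.value is the wrapped exception
            return failure

        d.addCallbacks(on_success, on_error)
        return f

The reverse direction (resolving a Deferred when a Future completes) would need a callback registered with the reactor's thread-safety rules in mind, which is exactly where well-defined interfaces would help.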
-- --Guido van Rossum (python.org/~guido) From oubiwann at twistedmatrix.com Sun Oct 7 06:17:23 2012 From: oubiwann at twistedmatrix.com (Duncan McGreggor) Date: Sat, 6 Oct 2012 21:17:23 -0700 Subject: [Python-ideas] asyncore: included batteries don't fit In-Reply-To: References: <20120926081718.GA20843@hephaistos.amsuess.com> <20121003144320.GA16485@hephaistos.amsuess.com> Message-ID: On Sat, Oct 6, 2012 at 9:05 PM, Guido van Rossum wrote: > On Sat, Oct 6, 2012 at 7:22 PM, Josiah Carlson > wrote: > > On Sat, Oct 6, 2012 at 3:00 PM, Guido van Rossum > wrote: > >> This is an incredibly important discussion. > >> > >> I would like to contribute despite my limited experience with the > >> various popular options. My own async explorations are limited to the > >> constraints of the App Engine runtime environment, where a rather > >> unique type of reactor is required. I am developing some ideas around > >> separating reactors, futures, and yield-based coroutines, but they > >> take more thinking and probably some experimental coding before I'm > >> ready to write it up in any detail. For a hint on what I'm after, you > >> might read up on monocle (https://github.com/saucelabs/monocle) and my > >> approach to building coroutines on top of Futures > >> ( > http://code.google.com/p/appengine-ndb-experiment/source/browse/ndb/tasklets.py#349 > ). > > > > Yield-based coroutines like monocle are the simplest way to do > > multi-paradigm in the same code. Whether you have a async-style > > reactor, greenlet-style stack switching, cooperatively scheduled > > generator trampolines, or just plain blocking threaded sockets; that > > style works with all of them (the futures and wrapper around > > everything just looks a little different). > > Glad I'm not completely crazy here. :-) > > > That said, it forces everyone to drink the same coroutine-styled > > kool-aid. That doesn't bother me. But I understand it, and have built > > similar systems before. I don't have an intuition about whether 3rd > > parties will like it or will migrate to it. Someone want to ping the > > Twisted and Tornado folks about it? > > They should be reading this. Yup, we are. I've pinged others in the Twisted cabal on this matter, so hopefully you'll be hearing from one or more of us soon... d > Or maybe we should bring it up on > python-dev before too long. > > >> In the mean time I'd like to bring up a few higher-order issues: > >> > >> (1) How importance is it to offer a compatibility path for asyncore? I > >> would have thought that offering an integration path forward for > >> Twisted and Tornado would be more important. > >> > >> (2) We're at a fork in the road here. On the one hand, we could choose > >> to deeply integrate greenlets/gevents into the standard library. (It's > >> not monkey-patching if it's integrated, after all. :-) I'm not sure > >> how this would work for other implementations than CPython, or even > >> how to address CPython on non-x86 architectures. But users seem to > >> like the programming model: write synchronous code, get async > >> operation for free. It's easy to write protocol parsers that way. On > >> the other hand, we could reject this approach: the integration would > >> never be completely smooth, there's the issue of other implementations > >> and architectures, it probably would never work smoothly even for > >> CPython/x86 when 3rd party extension modules are involved. 
> >> Callback-based APIs don't have these downsides, but they are harder to > >> program; however we can make programming them easier by using > >> yield-based coroutines. Even Twisted offers those (inline callbacks). > >> > >> Before I invest much more time in these ideas I'd like to at least > >> have (2) sorted out. > > > > Combining your responses to #1 and now this, are you proposing a path > > forward for Twisted/Tornado to be greenlets? That's an interesting > > approach to the problem, though I can see the draw. ;) > > Can't tell whether you're serious, but that's not what I meant. Surely > it will never fly for Twisted. Tornado apparently already works with > greenlets (though maybe through a third party hack). But personally > I'd be leaning towards rejecting greenlets, for the same reasons I've > kept the doors tightly shut for Stackless -- I like it fine as a > library, but not as a language feature, because I don't see how it can > be supported on all platforms where Python must be supported. > > However I figured that if we define the interfaces well enough, it > might be possible to use (a superficially modified version of) > Twisted's reactors instead of the standard ones, and, orthogonally, > Twisted's deferred's could be wrapped in the standard Futures (or the > other way around?) when used with a non-Twisted reactor. Which would > hopefully open the door for migrating some of their more useful > protocol parsers into the stdlib. > > > I have been hesitant on the Twisted side of things for an arbitrarily > > selfish reason. After 2-3 hours of reading over a codebase (which I've > > done 5 or 6 times in the last 8 years), I ask myself whether I believe > > I understand 80+% of how things work; how data flows, how > > callbacks/layers are invoked, and whether I could add a piece of > > arbitrary functionality to one layer or another (or to determine the > > proper layer in which to add the functionality). If my answer is "no", > > then my gut says "this is probably a bad idea". But if I start > > figuring out the layers before I've finished my 2-3 hours, and I start > > finding bugs? Well, then I think it's a much better idea, even if the > > implementation is buggy. > > Can't figure what you're implying here. On which side does Twisted fall > for you? > > > Maybe something like Monocle would be better (considering your favor > > for that style, it obviously has a leg-up on the competition). I don't > > know. But if something like Monocle can merge it all together, then > > maybe I'd be happy. > > My worry is that monocle is too simple and does not cater for advanced > needs. It doesn't seem to have caught on much outside the company > where it originated. > > > Incidentally, I can think of a few different > > styles of wrappers that would actually let people using > > asyncore-derived stuff use something like Monocle. So maybe that's > > really the right answer? > > I still don't really think asyncore is going to be a problem. It can > easily be separated into a reactor and callbacks. > > > Regards, > > - Josiah > > > > P.S. Thank you for weighing in on this Guido. Even if it doesn't end > > up the way I had originally hoped, at least now there's discussion. > > Hm, there seemed to be plenty of discussion before... 
> > -- > --Guido van Rossum (python.org/~guido) > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jeanpierreda at gmail.com Sun Oct 7 06:23:43 2012 From: jeanpierreda at gmail.com (Devin Jeanpierre) Date: Sun, 7 Oct 2012 00:23:43 -0400 Subject: [Python-ideas] asyncore: included batteries don't fit In-Reply-To: References: <20120926081718.GA20843@hephaistos.amsuess.com> <20121003144320.GA16485@hephaistos.amsuess.com> Message-ID: On Sun, Oct 7, 2012 at 12:05 AM, Guido van Rossum wrote: > However I figured that if we define the interfaces well enough, it > might be possible to use (a superficially modified version of) > Twisted's reactors instead of the standard ones, and, orthogonally, > Twisted's deferred's could be wrapped in the standard Futures (or the > other way around?) when used with a non-Twisted reactor. Which would > hopefully open the door for migrating some of their more useful > protocol parsers into the stdlib. I thought futures were meant for thread and process pools? The blocking methods make them a bad fit for an asynchronous networking toolset. The Twisted folks have discussed integrating futures and Twisted (see also the reply, which has some corrections): http://twistedmatrix.com/pipermail/twisted-python/2011-January/023296.html -- Devin From steve at pearwood.info Sun Oct 7 06:33:42 2012 From: steve at pearwood.info (Steven D'Aprano) Date: Sun, 07 Oct 2012 15:33:42 +1100 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: <20121005235457.GA7755@mcnabbs.org> References: <20121005202534.5f721292@pitrou.net> <20121005191625.GA23607@iskra.aviel.ru> <20121005195327.GG8974@mcnabbs.org> <506F6201.4000503@pearwood.info> <20121005235457.GA7755@mcnabbs.org> Message-ID: <50710626.4010300@pearwood.info> On 06/10/12 09:54, Andrew McNabb wrote: > On Sat, Oct 06, 2012 at 08:41:05AM +1000, Steven D'Aprano wrote: >> On 06/10/12 05:53, Andrew McNabb wrote: >> >>> Path concatenation is obviously not a form of division, so it makes >>> little sense to use the division operator for this purpose. >> >> But / is not just a division operator. It is also used for: >> >> * alternatives: "tea and/or coffee, breakfast/lunch/dinner" >> * italic markup: "some apps use /slashes/ for italics" >> * instead of line breaks when quoting poetry >> * abbreviations such as n/a b/w c/o and even w/ (not applicable, >> between, care of, with) >> * date separator > > This is the difference between C++ style operators, where the only thing > that matters is what the operator symbol looks like, and Python style > operators, where an operator symbol is just syntactic sugar. In Python, > the "/" is synonymous with `operator.div` and is defined in terms of the > `__div__` special method. This distinction is why I hate operator > overloading in C++ but like it in Python. I'm afraid that it's a distinction that seems meaningless to me. int + int and str + str are not the same, even though the operator symbol looks the same. Likewise int - int and set - set are not the same even though they use the same operator symbol. Similarly for & and | operators. 
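(A small aside for Python 3 readers: the special method behind "/" is now __truediv__ rather than __div__. A toy class, purely illustrative and unrelated to the PEP 428 implementation, shows how little is needed to make the operator spell path joining:)

    class ToyPath:
        # Toy illustration only -- not the class proposed in PEP 428.
        def __init__(self, *parts):
            # split any embedded separators so ToyPath('a/b', 'c') works too
            self.parts = [p for part in parts for p in str(part).split('/') if p]

        def __truediv__(self, other):
            # p / "name" builds a new, longer path
            return ToyPath(*(self.parts + [str(other)]))

        def __str__(self):
            return '/'.join(self.parts)

    p = ToyPath('home', 'antoine')
    print(p / 'pathlib' / 'setup.py')   # home/antoine/pathlib/setup.py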
For what it is worth, when I am writing pseudocode on paper, just playing around with ideas, I often use / to join path components: open(path/name) # pseudo-code sort of thing, so I would be much more comfortable writing either of these: path/"name.txt" path+"name.txt" than path["name.txt"] which looks like it ought to be a lookup, not a constructor. -- Steven From guido at python.org Sun Oct 7 06:35:42 2012 From: guido at python.org (Guido van Rossum) Date: Sat, 6 Oct 2012 21:35:42 -0700 Subject: [Python-ideas] asyncore: included batteries don't fit In-Reply-To: References: <20120926081718.GA20843@hephaistos.amsuess.com> <20121003144320.GA16485@hephaistos.amsuess.com> Message-ID: On Saturday, October 6, 2012, Devin Jeanpierre wrote: > On Sun, Oct 7, 2012 at 12:05 AM, Guido van Rossum > wrote: > > However I figured that if we define the interfaces well enough, it > > might be possible to use (a superficially modified version of) > > Twisted's reactors instead of the standard ones, and, orthogonally, > > Twisted's deferred's could be wrapped in the standard Futures (or the > > other way around?) when used with a non-Twisted reactor. Which would > > hopefully open the door for migrating some of their more useful > > protocol parsers into the stdlib. > > I thought futures were meant for thread and process pools? The > blocking methods make them a bad fit for an asynchronous networking > toolset. The specific Future implementation in the py3k stdlib uses threads and is indeed meant for thread and process pools. But the *concept* of futures works fine in event-based systems, see the link I posted into the NDB sources. I'm not keen on cancellation and threadpools FWIW. > The Twisted folks have discussed integrating futures and Twisted (see > also the reply, which has some corrections): > > http://twistedmatrix.com/pipermail/twisted-python/2011-January/023296.html > > -- Devin > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From stephen at xemacs.org Sun Oct 7 10:36:11 2012 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Sun, 07 Oct 2012 17:36:11 +0900 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: <5070B841.5010003@canterbury.ac.nz> References: <20121005202534.5f721292@pitrou.net> <20121005215520.19b63efe@pitrou.net> <87d30wq9ji.fsf@uwakimon.sk.tsukuba.ac.jp> <20121006140652.630794f4@pitrou.net> <87ehlb4pvu.fsf@uwakimon.sk.tsukuba.ac.jp> <5070B841.5010003@canterbury.ac.nz> Message-ID: <87fw5qel1g.fsf@uwakimon.sk.tsukuba.ac.jp> Greg Ewing writes: > Stephen J. Turnbull wrote: > > > On the other hand, `p + Path('foo')` and `Path('foo') + p` (where p is > > a Path, not a string) both seem reasonable to me. > > I don't like the idea of using + as the path concatenation > operator, because > > path + ".c" > > is an obvious way to add an extension or other suffix to a > filename, and it ought to work. I don't have a problem with it because I don't append extensions as often as I substitute, and because I don't think of paths as strings. I think of (some) strings as representatives of paths. From stephen at xemacs.org Sun Oct 7 10:40:18 2012 From: stephen at xemacs.org (Stephen J. 
Turnbull) Date: Sun, 07 Oct 2012 17:40:18 +0900 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: <50705A30.9020006@stoneleaf.us> References: <20121005202534.5f721292@pitrou.net> <506F5371.6080302@stoneleaf.us> <20121006014823.1fc46741@pitrou.net> <20121006023923.62545731@pitrou.net> <87fw5sqbc0.fsf@uwakimon.sk.tsukuba.ac.jp> <50705A30.9020006@stoneleaf.us> Message-ID: <87ehlaekul.fsf@uwakimon.sk.tsukuba.ac.jp> Ethan Furman writes: > Stephen J. Turnbull wrote: > > Antoine Pitrou writes: > >> Richard Oudkerk wrote: > >>> Maybe p.basename could be shorthand for p.name.split('.')[0]. > >> > >> Wouldn't there be some confusion with os.path.basename: > >> > >>--> os.path.basename('a/b/c.ext') > >> 'c.ext' > > I wouldn't worry too much about this; after all, we are trying to > replace a primitive system with a more advanced, user-friendly one. Please, don't oversell your case. We are *not* replacing POSIX, Antoine is proposing a system that coexists with it. From solipsis at pitrou.net Sun Oct 7 12:09:31 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sun, 7 Oct 2012 12:09:31 +0200 Subject: [Python-ideas] asyncore: included batteries don't fit References: <20120926081718.GA20843@hephaistos.amsuess.com> <20121003144320.GA16485@hephaistos.amsuess.com> <20121007002402.43472817@pitrou.net> Message-ID: <20121007120931.09c12ec4@pitrou.net> On Sat, 6 Oct 2012 17:23:48 -0700 Guido van Rossum wrote: > On Sat, Oct 6, 2012 at 3:24 PM, Antoine Pitrou wrote: > > On Sat, 6 Oct 2012 15:00:54 -0700 > > Guido van Rossum wrote: > >> > >> (2) We're at a fork in the road here. On the one hand, we could choose > >> to deeply integrate greenlets/gevents into the standard library. (It's > >> not monkey-patching if it's integrated, after all. :-) I'm not sure > >> how this would work for other implementations than CPython, or even > >> how to address CPython on non-x86 architectures. But users seem to > >> like the programming model: write synchronous code, get async > >> operation for free. It's easy to write protocol parsers that way. On > >> the other hand, we could reject this approach: the integration would > >> never be completely smooth, there's the issue of other implementations > >> and architectures, it probably would never work smoothly even for > >> CPython/x86 when 3rd party extension modules are involved. > >> Callback-based APIs don't have these downsides, but they are harder to > >> program; however we can make programming them easier by using > >> yield-based coroutines. Even Twisted offers those (inline callbacks). > > > > greenlets/gevents only get you half the advantages of single-threaded > > "async" programming: they get you scalability in the face of a high > > number of concurrent connections, but they don't get you the robustness > > of cooperative multithreading (because it's not obvious when reading > > the code where the possible thread-switching points are). > > I used to think that too, long ago, until I discovered that as you add > abstraction layers, cooperative multithreading is untenable -- sooner > or later you will lose track of where the threads are switched. Even with an explicit notation like "yield" / "yield from"? Regards Antoine. 
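To make the question concrete, here is a deliberately tiny sketch of what explicit switch points buy you; run() and pause() are invented stand-ins for a real scheduler and for whatever object (e.g. a Future) a real framework would yield:

    import collections

    def pause():
        # Hypothetical marker a real scheduler would interpret.
        return None

    def transfer(src, dst, amount):
        # Everything between two yields runs without interleaving from
        # other tasks -- the switch points are exactly the yields.
        src['balance'] -= amount
        yield pause()                 # explicit: other tasks may run here
        dst['balance'] += amount

    def run(tasks):
        # Minimal round-robin trampoline, just to make the sketch executable.
        queue = collections.deque(tasks)
        while queue:
            task = queue.popleft()
            try:
                next(task)
                queue.append(task)
            except StopIteration:
                pass

    a, b = {'balance': 100}, {'balance': 0}
    run([transfer(a, b, 10), transfer(a, b, 20)])
    print(a, b)   # {'balance': 70} {'balance': 30}

With greenlets the same code would read as straight-line calls, and the reader would have to know which of those calls can switch.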
-- Software development and contracting: http://pro.pitrou.net From solipsis at pitrou.net Sun Oct 7 12:15:53 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sun, 7 Oct 2012 12:15:53 +0200 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths References: <20121005202534.5f721292@pitrou.net> <20121006190821.02ae50cd@pitrou.net> <5070DDD8.30401@pearwood.info> Message-ID: <20121007121553.7176e0f4@pitrou.net> On Sun, 07 Oct 2012 12:41:44 +1100 Steven D'Aprano wrote: > On 07/10/12 04:08, Antoine Pitrou wrote: > > > Personally, I cringe everytime I have to type > > `os.path.dirname(os.path.dirname(os.path.dirname(...)))` to go two > > directories upwards of a given path. Compare, with, say: > > I would cringe too if I did that, because it goes THREE directories > up, not two: > > py> path = '/a/b/c/d' > py> os.path.dirname(os.path.dirname(os.path.dirname(path))) > '/a' Not if d is a file, actually (yes, the formulation was a bit ambiguous). Regards Antoine. -- Software development and contracting: http://pro.pitrou.net From guido at python.org Sun Oct 7 17:04:30 2012 From: guido at python.org (Guido van Rossum) Date: Sun, 7 Oct 2012 08:04:30 -0700 Subject: [Python-ideas] asyncore: included batteries don't fit In-Reply-To: <20121007120931.09c12ec4@pitrou.net> References: <20120926081718.GA20843@hephaistos.amsuess.com> <20121003144320.GA16485@hephaistos.amsuess.com> <20121007002402.43472817@pitrou.net> <20121007120931.09c12ec4@pitrou.net> Message-ID: On Sun, Oct 7, 2012 at 3:09 AM, Antoine Pitrou wrote: > On Sat, 6 Oct 2012 17:23:48 -0700 > Guido van Rossum wrote: >> On Sat, Oct 6, 2012 at 3:24 PM, Antoine Pitrou wrote: >> > greenlets/gevents only get you half the advantages of single-threaded >> > "async" programming: they get you scalability in the face of a high >> > number of concurrent connections, but they don't get you the robustness >> > of cooperative multithreading (because it's not obvious when reading >> > the code where the possible thread-switching points are). >> >> I used to think that too, long ago, until I discovered that as you add >> abstraction layers, cooperative multithreading is untenable -- sooner >> or later you will lose track of where the threads are switched. > > Even with an explicit notation like "yield" / "yield from"? If you strictly adhere to using those you should be safe (though distinguishing between the two may prove challenging) -- but in practice it's hard to get everyone and every API to use this style. So you'll have some blocking API calls hidden deep inside what looks like a perfectly innocent call to some helper function. IIUC in Go this is solved by mixing threads and lighter-weight constructs (say, greenlets) -- if a greenlet gets blocked for I/O, the rest of the system continues to make progress by spawning another thread. My own experience with NDB is that it's just too hard to make everyone use the async APIs all the time -- so I gave up and made async APIs an optional feature, offering a blocking and an async version of every API. I didn't start out that way, but once I started writing documentation aimed at unsophisticated users, I realized that it was just too much of an uphill battle to bother. So I think it's better to accept this and deal with it, possibly adding locking primitives into the mix that work well with the rest of the framework. Building a lock out of a tasklet-based (i.e. non-threading) Future class is easy enough. 
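A minimal sketch of that idea, assuming nothing more than a Future with
set_result() (here the concurrent.futures one, constructed by hand) and a
single-threaded event loop, so no real locking of data structures is needed
(the class name is made up for the example):

from concurrent.futures import Future

class TaskletLock:
    # Illustrative only: acquire() never blocks, it hands back a Future
    # that is resolved once the lock becomes available.
    def __init__(self):
        self._locked = False
        self._waiters = []

    def acquire(self):
        f = Future()
        if not self._locked:
            self._locked = True
            f.set_result(None)       # available right away
        else:
            self._waiters.append(f)  # resolved later by release()
        return f

    def release(self):
        if self._waiters:
            # pass ownership straight to the next waiter
            self._waiters.pop(0).set_result(None)
        else:
            self._locked = False

A yield-based coroutine would then use it as "yield lock.acquire()" ...
"lock.release()".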
-- --Guido van Rossum (python.org/~guido) From solipsis at pitrou.net Sun Oct 7 19:37:35 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sun, 7 Oct 2012 19:37:35 +0200 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths References: <20121005202534.5f721292@pitrou.net> Message-ID: <20121007193735.7bb924ac@pitrou.net> On Sat, 6 Oct 2012 10:44:37 -0700 Guido van Rossum wrote: > > But rather than diving right into the syntax, I would like to focus on > some use cases. (Some of this may already be in the PEP, my > apologize.) Some things I care about (based on path manipulations I > remember I've written at some point or another): > > - Distinguishing absolute paths from relative paths; this affects > joining behavior as for os.path.join(). The proposed API does function like os.path.join() in that respect: when joining a relative path to an absolute path, the relative path is simply discarded: >>> p = PurePath('a') >>> q = PurePath('/b') >>> p[q] PurePosixPath('/b') > - Various normal forms that can be used for comparing paths for > equality; there should be a pure normalization as well as an impure > one (like os.path.realpath()). Impure normalization is done with the resolve() method: >>> os.chdir('/etc') >>> Path('ssl/certs').resolve() PosixPath('/etc/pki/tls/certs') (/etc/ssl/certs being a symlink to /etc/pki/tks/certs on my system) Pure comparison already obeys case-sensitivity rules as well as the different path separators: >>> PureNTPath('a/b') == PureNTPath('A\\B') True >>> PurePosixPath('a/b') == PurePosixPath('a\\b') False Note the case information isn't lost either: >>> str(PureNTPath('a/b')) 'a\\b' >>> str(PureNTPath('A/B')) 'A\\B' > - An API that encourage Unix lovers to write code that is most likely > also to make sense on Windows. > > - An API that encourages Windows lovers to write code that is most > likely also to make sense on Unix. I agree on these goals, that's why I'm trying to avoid system-specific methods. For example is_reserved() is also defined under Unix, it just always returns False: >>> PurePosixPath('CON').is_reserved() False >>> PureNTPath('CON').is_reserved() True > - Integration with fnmatch (pure) and glob (impure). This is provided indeed, with the match() and glob() methods respectively. > - In addition to stat(), some simple derived operations like > getmtime(), getsize(), islink(). The PEP proposes properties mimicking the stat object attributes: >>> p = Path('setup.py') >>> p.st_size 977 >>> p.st_mtime 1349461817.8768747 And methods to query the file type: >>> p.is_symlink() False >>> p.is_file() True Perhaps the properties / methods mix isn't very consistent. > - Easy checks and manipulations (applying to the basename) like "ends > with .pyc", "starts with foo", "ends with .tar.gz", "replace .pyc > extension with .py", "remove trailing ~", "append .tmp", "remove > leading @", and so on. I'll try to reconcile this with Ben Finney's suffix / suffixes proposal. > - Matching on patterns on directory names (e.g. "does not contain a > segment named .hg"). Sequence-like access on the parts property provides this: >>> p = PurePath('foo/.hg/hgrc') >>> '.hg' in p.parts True Regards Antoine. 
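Combining the pieces above -- parts, is_file() and st_size -- into one
illustrative helper (the function name and the iterable of paths are
invented for the example):

def tracked_size(paths):
    # paths: any iterable of Path objects, e.g. from a directory walk
    total = 0
    for p in paths:
        if '.hg' in p.parts:      # "does not contain a segment named .hg"
            continue
        if p.is_file():
            total += p.st_size    # stat-backed property from the draft
    return total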
-- Software development and contracting: http://pro.pitrou.net From storchaka at gmail.com Sun Oct 7 20:40:41 2012 From: storchaka at gmail.com (Serhiy Storchaka) Date: Sun, 07 Oct 2012 21:40:41 +0300 Subject: [Python-ideas] Propagating StopIteration value In-Reply-To: References: Message-ID: On 06.10.12 23:47, Mike Graham wrote: > Can you provide an example of a time when you want to use such a value > with a generator on which you want to use one of these so I can better > understand why this is necessary? the times I'm familiar with wanting > this value I'd usually be manually stepping through my generator. There are no many uses yet because it's a new feature. Python 3.3 just released. For example see the proposed patch for http://bugs.python.org/issue16009. In general case `yield from` returns such a value. From storchaka at gmail.com Sun Oct 7 21:06:29 2012 From: storchaka at gmail.com (Serhiy Storchaka) Date: Sun, 07 Oct 2012 22:06:29 +0300 Subject: [Python-ideas] Propagating StopIteration value In-Reply-To: <5070E4EA.5010904@canterbury.ac.nz> References: <5070D658.9020300@pearwood.info> <5070E4EA.5010904@canterbury.ac.nz> Message-ID: On 07.10.12 05:11, Greg Ewing wrote: > It's highly debatable whether this is even wrong. The purpose > of StopIteration(value) is for a generator to return a value > to its immediate caller when invoked using yield-from. The > value is not intended to propagate any further than that. If immediate caller can propagate generated values with the help of "yield from", why it can not propagate returned from "yield from" value? > A non-iterator analogy would be > > def f(): > return 42 > > def g(): > f() No, a non-iterator analogy would be g = functools.partial(f) or g = functools.lru_cache()(f) I expect g() to return 42 here. And it will be expected and useful if yield from itertools.chain([prefix], iterator) will return the same value as yield from iterator Now chain equivalent to: def chain(*iterables): for it in iterables: yield from it I propose make it equivalent to: def chain(*iterables): value = None for it in iterables: value = yield from it return value From shibturn at gmail.com Sun Oct 7 21:18:37 2012 From: shibturn at gmail.com (Richard Oudkerk) Date: Sun, 07 Oct 2012 20:18:37 +0100 Subject: [Python-ideas] Propagating StopIteration value In-Reply-To: References: <5070D658.9020300@pearwood.info> <5070E4EA.5010904@canterbury.ac.nz> Message-ID: On 07/10/2012 8:06pm, Serhiy Storchaka wrote: > I propose make it equivalent to: > > def chain(*iterables): > value = None > for it in iterables: > value = yield from it > return value That means that all but the last return value is ignored. Why is the last return value any more important than the earlier ones? ISTM it would make just as much sense to do def chain(*iterables): values = [] for it in iterables: values.append(yield from it) return values But I don't see any point in changing the current behaviour. Richard From storchaka at gmail.com Sun Oct 7 21:30:16 2012 From: storchaka at gmail.com (Serhiy Storchaka) Date: Sun, 07 Oct 2012 22:30:16 +0300 Subject: [Python-ideas] Propagating StopIteration value In-Reply-To: References: <5070D658.9020300@pearwood.info> Message-ID: On 07.10.12 04:45, Guido van Rossum wrote: > But yes, this was all considered and accepted when PEP 380 was debated > (endlessly :-), and I see no reason to change anything about this. 
The reason is that when someone uses StopIteration.value for some purposes, he will lose this value if the iterator will be wrapped into itertools.chain (quite often used technique) or into other standard iterator wrapper. > "Don't do that" is the best I can say about it -- there are a zillion > other situations in Python where that's the only sensible motto. The problem is that two different authors can use two legal techniques (using values returned by "yield from" and wrap iterators with itertools.chain) which do not work in combination. The conflict easily solved if instead of standard itertools.chain to use handwriten code. It looks as bug in itertools.chain. From rndblnch at gmail.com Sun Oct 7 21:37:41 2012 From: rndblnch at gmail.com (rndblnch) Date: Sun, 7 Oct 2012 19:37:41 +0000 (UTC) Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths References: <20121005202534.5f721292@pitrou.net> Message-ID: Antoine Pitrou writes: > PS: You can all admire my ASCII-art skills. but you got the direction of the "is a" arrows wrong. see http://en.wikipedia.org/wiki/Class_diagram#Generalization renaud From guido at python.org Sun Oct 7 22:19:11 2012 From: guido at python.org (Guido van Rossum) Date: Sun, 7 Oct 2012 13:19:11 -0700 Subject: [Python-ideas] Propagating StopIteration value In-Reply-To: References: <5070D658.9020300@pearwood.info> Message-ID: On Sun, Oct 7, 2012 at 12:30 PM, Serhiy Storchaka wrote: > On 07.10.12 04:45, Guido van Rossum wrote: >> >> But yes, this was all considered and accepted when PEP 380 was debated >> (endlessly :-), and I see no reason to change anything about this. > > The reason is that when someone uses StopIteration.value for some purposes, > he will lose this value if the iterator will be wrapped into itertools.chain > (quite often used technique) or into other standard iterator wrapper. If this is just about iterator.chain() I may see some value in it (but TBH the discussion so far mostly confuses -- please spend some more time coming up with good examples that show actually useful use cases rather than f() and g() or foo() and bar()) OTOH yield from is not primarily for iterators -- it is for coroutines. I suspect most of the itertools functionality just doesn't work with coroutines. >> "Don't do that" is the best I can say about it -- there are a zillion >> other situations in Python where that's the only sensible motto. > > The problem is that two different authors can use two legal techniques > (using values returned by "yield from" and wrap iterators with > itertools.chain) which do not work in combination. The conflict easily > solved if instead of standard itertools.chain to use handwriten code. It > looks as bug in itertools.chain. Okay, so please do work out a complete, useful use case. We may yet see the light. -- --Guido van Rossum (python.org/~guido) From guido at python.org Sun Oct 7 22:24:59 2012 From: guido at python.org (Guido van Rossum) Date: Sun, 7 Oct 2012 13:24:59 -0700 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: <20121007193735.7bb924ac@pitrou.net> References: <20121005202534.5f721292@pitrou.net> <20121007193735.7bb924ac@pitrou.net> Message-ID: On Sun, Oct 7, 2012 at 10:37 AM, Antoine Pitrou wrote: > On Sat, 6 Oct 2012 10:44:37 -0700 > Guido van Rossum wrote: >> >> But rather than diving right into the syntax, I would like to focus on >> some use cases. (Some of this may already be in the PEP, my >> apologize.) 
Some things I care about (based on path manipulations I >> remember I've written at some point or another): >> >> - Distinguishing absolute paths from relative paths; this affects >> joining behavior as for os.path.join(). > > The proposed API does function like os.path.join() in that respect: > when joining a relative path to an absolute path, the relative path is > simply discarded: > >>>> p = PurePath('a') >>>> q = PurePath('/b') >>>> p[q] > PurePosixPath('/b') > >> - Various normal forms that can be used for comparing paths for >> equality; there should be a pure normalization as well as an impure >> one (like os.path.realpath()). > > Impure normalization is done with the resolve() method: > >>>> os.chdir('/etc') >>>> Path('ssl/certs').resolve() > PosixPath('/etc/pki/tls/certs') > > (/etc/ssl/certs being a symlink to /etc/pki/tks/certs on my system) > > Pure comparison already obeys case-sensitivity rules as well as the > different path separators: > >>>> PureNTPath('a/b') == PureNTPath('A\\B') > True >>>> PurePosixPath('a/b') == PurePosixPath('a\\b') > False > > Note the case information isn't lost either: > >>>> str(PureNTPath('a/b')) > 'a\\b' >>>> str(PureNTPath('A/B')) > 'A\\B' > >> - An API that encourage Unix lovers to write code that is most likely >> also to make sense on Windows. >> >> - An API that encourages Windows lovers to write code that is most >> likely also to make sense on Unix. > > I agree on these goals, that's why I'm trying to avoid system-specific > methods. For example is_reserved() is also defined under Unix, it just > always returns False: > >>>> PurePosixPath('CON').is_reserved() > False >>>> PureNTPath('CON').is_reserved() > True > >> - Integration with fnmatch (pure) and glob (impure). > > This is provided indeed, with the match() and glob() methods > respectively. > >> - In addition to stat(), some simple derived operations like >> getmtime(), getsize(), islink(). > > The PEP proposes properties mimicking the stat object attributes: > >>>> p = Path('setup.py') >>>> p.st_size > 977 >>>> p.st_mtime > 1349461817.8768747 > > And methods to query the file type: > >>>> p.is_symlink() > False >>>> p.is_file() > True > > Perhaps the properties / methods mix isn't very consistent. I would warn about caching these results on the path object. I can easily imagine cases where I want to repeatedly call stat() because I'm waiting for a file to change (e.g. tail -f does something like this). I would prefer to have a stat() method that always calls os.stat(), and no caching of the results; the user can cache the stat() return value. (Maybe we can add is_file() etc. as methods on stat() results now they are no longer just tuples?) >> - Easy checks and manipulations (applying to the basename) like "ends >> with .pyc", "starts with foo", "ends with .tar.gz", "replace .pyc >> extension with .py", "remove trailing ~", "append .tmp", "remove >> leading @", and so on. > > I'll try to reconcile this with Ben Finney's suffix / suffixes proposal. > >> - Matching on patterns on directory names (e.g. "does not contain a >> segment named .hg"). > > Sequence-like access on the parts property provides this: > >>>> p = PurePath('foo/.hg/hgrc') >>>> '.hg' in p.parts > True Sounds cool. I will try to refrain from bikeshedding much more on this proposal; I'd rather focus on reactors and futures... 
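For the record, the sort of loop that per-object caching would break -- a
rough sketch with plain os.stat(), nothing pathlib-specific:

import os
import time

def wait_for_change(filename, interval=1.0):
    # Poll the mtime until it changes; this only works if every call
    # really hits the filesystem, i.e. if stat() results are not cached.
    last = os.stat(filename).st_mtime
    while True:
        time.sleep(interval)
        mtime = os.stat(filename).st_mtime
        if mtime != last:
            return mtime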
-- --Guido van Rossum (python.org/~guido) From andy at insectnation.org Sun Oct 7 22:25:34 2012 From: andy at insectnation.org (Andy Buckley) Date: Sun, 07 Oct 2012 22:25:34 +0200 Subject: [Python-ideas] History stepping in interactive session? In-Reply-To: <87a9w18bb8.fsf@uwakimon.sk.tsukuba.ac.jp> References: <506EA800.1080106@insectnation.org> <87a9w18bb8.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <5071E53E.4030906@insectnation.org> On 05/10/12 12:26, Stephen J. Turnbull wrote: > Andy Buckley writes: > > > A couple of weeks ago I posted a question on superuser.com > > Maybe it's a bug. (See below.) Have you checked the tracker? Have > you posted to python-list? That's a better place than here to get > that kind of information. > > > As you might have noticed, > > The people on this list (and on python-dev) probably don't pay much > attention to questions on superuser.com, unless they're the kind of > people who hang out on python-list. Hi Stephen -- thanks for the feedback. I know StackExchange sites are not affiliated to the Python project! By "as you might have noticed" I didn't mean to imply that you spend your time scouring all Q&A sites for anything Python-related, but just that if you followed the link I posted you'd probably notice the zero response :) >From searching around before that SuperUser post, and some more afterwards, I couldn't find any reference at all to history-stepping as an available Python interpreter feature, so I was trying to suggest that as a new feature -- not a bug report. Sorry if python-ideas is only for language/stdlib features rather than the standard infrastructure. However, I hadn't remembered when I first posted that I was already making use of a PYTHONSTARTUP script with the readline module to enable some history functionality -- I'd set that up years ago and ported it between systems. So my premise that readline *should* work was not accurate: sorry for the noise. Notably the operate-and-get-next readline function (thanks for the bind -p suggestion) bound to Ctrl-o does not work with Python readline... but I will follow up on that potential bug elsewhere. So one last question, in case it is an acceptable python-ideas topic: how about adding readline-like support by default in the interpreter? But maybe there is a reason for new users to have a more bare-bones, no-history introduction to the language, unless they start with ipython? Thanks again, Andy From mikegraham at gmail.com Sun Oct 7 22:27:48 2012 From: mikegraham at gmail.com (Mike Graham) Date: Sun, 7 Oct 2012 16:27:48 -0400 Subject: [Python-ideas] Propagating StopIteration value In-Reply-To: References: <5070D658.9020300@pearwood.info> <5070E4EA.5010904@canterbury.ac.nz> Message-ID: On Sun, Oct 7, 2012 at 3:06 PM, Serhiy Storchaka wrote: > On 07.10.12 05:11, Greg Ewing wrote: >> >> It's highly debatable whether this is even wrong. The purpose >> of StopIteration(value) is for a generator to return a value >> to its immediate caller when invoked using yield-from. The >> value is not intended to propagate any further than that. > > > If immediate caller can propagate generated values with the help of "yield > from", why it can not propagate returned from "yield from" value? > > >> A non-iterator analogy would be >> >> def f(): >> return 42 >> >> def g(): >> f() > > > No, a non-iterator analogy would be > > g = functools.partial(f) > > or > > g = functools.lru_cache()(f) > > I expect g() to return 42 here. Rather than speaking in analogies, can we be concrete? 
I can't imagine doing map(f, x) where x is a generator whose return value I cared about. Can you show us a concrete example of something that looks like practical code? Mike From phd at phdru.name Sun Oct 7 22:45:41 2012 From: phd at phdru.name (Oleg Broytman) Date: Mon, 8 Oct 2012 00:45:41 +0400 Subject: [Python-ideas] History stepping in interactive session? In-Reply-To: <5071E53E.4030906@insectnation.org> References: <506EA800.1080106@insectnation.org> <87a9w18bb8.fsf@uwakimon.sk.tsukuba.ac.jp> <5071E53E.4030906@insectnation.org> Message-ID: <20121007204541.GA30399@iskra.aviel.ru> On Sun, Oct 07, 2012 at 10:25:34PM +0200, Andy Buckley wrote: > Sorry if python-ideas is only for > language/stdlib features rather than the standard infrastructure. readline is a Python module so ideas about it are certainly allowed here. > Notably the operate-and-get-next readline > function (thanks for the bind -p suggestion) bound to Ctrl-o does not > work with Python readline... but I will follow up on that potential bug > elsewhere. You probably need to reread the entire thread because the reason why it does not work with Python was already found and reported. Oleg. -- Oleg Broytman http://phdru.name/ phd at phdru.name Programmers don't die, they just GOSUB without RETURN. From ubershmekel at gmail.com Sun Oct 7 23:15:38 2012 From: ubershmekel at gmail.com (Yuval Greenfield) Date: Sun, 7 Oct 2012 23:15:38 +0200 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: <20121007193735.7bb924ac@pitrou.net> References: <20121005202534.5f721292@pitrou.net> <20121007193735.7bb924ac@pitrou.net> Message-ID: On Sun, Oct 7, 2012 at 7:37 PM, Antoine Pitrou wrote: > On Sat, 6 Oct 2012 10:44:37 -0700 > Guido van Rossum wrote: > > > > But rather than diving right into the syntax, I would like to focus on > > some use cases. (Some of this may already be in the PEP, my > > apologize.) Some things I care about (based on path manipulations I > > remember I've written at some point or another): > > > > - Distinguishing absolute paths from relative paths; this affects > > joining behavior as for os.path.join(). > > The proposed API does function like os.path.join() in that respect: > when joining a relative path to an absolute path, the relative path is > simply discarded: > > >>> p = PurePath('a') > >>> q = PurePath('/b') > >>> p[q] > PurePosixPath('/b') > > What's the use case for this behavior? I'd much rather if joining an absolute path to a relative one fail and reveal the potential bug.... >>> os.unlink(Path('myproj') / Path('/lib')) Traceback (most recent call last): File "", line 1, in TypeError: absolute path can't be appended to a relative path -------------- next part -------------- An HTML attachment was scrubbed... URL: From arnodel at gmail.com Sun Oct 7 23:43:02 2012 From: arnodel at gmail.com (Arnaud Delobelle) Date: Sun, 7 Oct 2012 22:43:02 +0100 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: <20121007193735.7bb924ac@pitrou.net> References: <20121005202534.5f721292@pitrou.net> <20121007193735.7bb924ac@pitrou.net> Message-ID: On 7 October 2012 18:37, Antoine Pitrou wrote: > Pure comparison already obeys case-sensitivity rules as well as the > different path separators: > >>>> PureNTPath('a/b') == PureNTPath('A\\B') > True >>>> PurePosixPath('a/b') == PurePosixPath('a\\b') > False Naive question: how do you deal with HFS+, which is case-preserving but on most machines case-insensitive? 
-- Arnaud From solipsis at pitrou.net Sun Oct 7 23:42:12 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sun, 7 Oct 2012 23:42:12 +0200 Subject: [Python-ideas] PEP 428 - joining References: <20121005202534.5f721292@pitrou.net> <20121007193735.7bb924ac@pitrou.net> Message-ID: <20121007234212.100109cb@pitrou.net> On Sun, 7 Oct 2012 23:15:38 +0200 Yuval Greenfield wrote: > On Sun, Oct 7, 2012 at 7:37 PM, Antoine Pitrou wrote: > > > On Sat, 6 Oct 2012 10:44:37 -0700 > > Guido van Rossum wrote: > > > > > > But rather than diving right into the syntax, I would like to focus on > > > some use cases. (Some of this may already be in the PEP, my > > > apologize.) Some things I care about (based on path manipulations I > > > remember I've written at some point or another): > > > > > > - Distinguishing absolute paths from relative paths; this affects > > > joining behavior as for os.path.join(). > > > > The proposed API does function like os.path.join() in that respect: > > when joining a relative path to an absolute path, the relative path is > > simply discarded: > > > > >>> p = PurePath('a') > > >>> q = PurePath('/b') > > >>> p[q] > > PurePosixPath('/b') > > > > > What's the use case for this behavior? > > I'd much rather if joining an absolute path to a relative one fail and > reveal the potential bug.... > > >>> os.unlink(Path('myproj') / Path('/lib')) > Traceback (most recent call last): > File "", line 1, in > TypeError: absolute path can't be appended to a relative path In all honesty I followed os.path.join's behaviour here. I agree a ValueError (not TypeError) would be sensible too. Regards Antoine. -- Software development and contracting: http://pro.pitrou.net From solipsis at pitrou.net Sun Oct 7 23:47:18 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sun, 7 Oct 2012 23:47:18 +0200 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths References: <20121005202534.5f721292@pitrou.net> <20121007193735.7bb924ac@pitrou.net> Message-ID: <20121007234718.1831839f@pitrou.net> On Sun, 7 Oct 2012 22:43:02 +0100 Arnaud Delobelle wrote: > On 7 October 2012 18:37, Antoine Pitrou wrote: > > Pure comparison already obeys case-sensitivity rules as well as the > > different path separators: > > > >>>> PureNTPath('a/b') == PureNTPath('A\\B') > > True > >>>> PurePosixPath('a/b') == PurePosixPath('a\\b') > > False > > Naive question: how do you deal with HFS+, which is case-preserving > but on most machines case-insensitive? I don't know. How does os.path deal with it? Regards Antoine. -- Software development and contracting: http://pro.pitrou.net From g.brandl at gmx.net Mon Oct 8 00:11:26 2012 From: g.brandl at gmx.net (Georg Brandl) Date: Mon, 08 Oct 2012 00:11:26 +0200 Subject: [Python-ideas] PEP 428 - joining In-Reply-To: <20121007234212.100109cb@pitrou.net> References: <20121005202534.5f721292@pitrou.net> <20121007193735.7bb924ac@pitrou.net> <20121007234212.100109cb@pitrou.net> Message-ID: Am 07.10.2012 23:42, schrieb Antoine Pitrou: > On Sun, 7 Oct 2012 23:15:38 +0200 > Yuval Greenfield > wrote: >> On Sun, Oct 7, 2012 at 7:37 PM, Antoine Pitrou wrote: >> >> > On Sat, 6 Oct 2012 10:44:37 -0700 >> > Guido van Rossum wrote: >> > > >> > > But rather than diving right into the syntax, I would like to focus on >> > > some use cases. (Some of this may already be in the PEP, my >> > > apologize.) 
Some things I care about (based on path manipulations I >> > > remember I've written at some point or another): >> > > >> > > - Distinguishing absolute paths from relative paths; this affects >> > > joining behavior as for os.path.join(). >> > >> > The proposed API does function like os.path.join() in that respect: >> > when joining a relative path to an absolute path, the relative path is >> > simply discarded: >> > >> > >>> p = PurePath('a') >> > >>> q = PurePath('/b') >> > >>> p[q] >> > PurePosixPath('/b') >> > >> > >> What's the use case for this behavior? >> >> I'd much rather if joining an absolute path to a relative one fail and >> reveal the potential bug.... >> >> >>> os.unlink(Path('myproj') / Path('/lib')) >> Traceback (most recent call last): >> File "", line 1, in >> TypeError: absolute path can't be appended to a relative path > > In all honesty I followed os.path.join's behaviour here. I agree a > ValueError (not TypeError) would be sensible too. Please no -- this is a very important use case (for os.path.join, at least): resolving a path from config/user/command line that can be given either absolute or relative to a certain directory. Right now it's as simple as join(default, path), and i'd prefer to keep this. There is no bug here, it's working as designed. Georg From python at mrabarnett.plus.com Mon Oct 8 00:29:25 2012 From: python at mrabarnett.plus.com (MRAB) Date: Sun, 07 Oct 2012 23:29:25 +0100 Subject: [Python-ideas] PEP 428 - joining In-Reply-To: References: <20121005202534.5f721292@pitrou.net> <20121007193735.7bb924ac@pitrou.net> <20121007234212.100109cb@pitrou.net> Message-ID: <50720245.3070007@mrabarnett.plus.com> On 2012-10-07 23:11, Georg Brandl wrote: > Am 07.10.2012 23:42, schrieb Antoine Pitrou: >> On Sun, 7 Oct 2012 23:15:38 +0200 >> Yuval Greenfield >> wrote: >>> On Sun, Oct 7, 2012 at 7:37 PM, Antoine Pitrou wrote: >>> >>> > On Sat, 6 Oct 2012 10:44:37 -0700 >>> > Guido van Rossum wrote: >>> > > >>> > > But rather than diving right into the syntax, I would like to focus on >>> > > some use cases. (Some of this may already be in the PEP, my >>> > > apologize.) Some things I care about (based on path manipulations I >>> > > remember I've written at some point or another): >>> > > >>> > > - Distinguishing absolute paths from relative paths; this affects >>> > > joining behavior as for os.path.join(). >>> > >>> > The proposed API does function like os.path.join() in that respect: >>> > when joining a relative path to an absolute path, the relative path is >>> > simply discarded: >>> > >>> > >>> p = PurePath('a') >>> > >>> q = PurePath('/b') >>> > >>> p[q] >>> > PurePosixPath('/b') >>> > >>> > >>> What's the use case for this behavior? >>> >>> I'd much rather if joining an absolute path to a relative one fail and >>> reveal the potential bug.... >>> >>> >>> os.unlink(Path('myproj') / Path('/lib')) >>> Traceback (most recent call last): >>> File "", line 1, in >>> TypeError: absolute path can't be appended to a relative path >> >> In all honesty I followed os.path.join's behaviour here. I agree a >> ValueError (not TypeError) would be sensible too. > > Please no -- this is a very important use case (for os.path.join, at least): > resolving a path from config/user/command line that can be given either absolute > or relative to a certain directory. > > Right now it's as simple as join(default, path), and i'd prefer to keep this. > There is no bug here, it's working as designed. 
> In that use case, wouldn't it be more likely that the default is itself absolute, so it'd be either relative to that absolute path or overriding that absolute path with another absolute path? From greg.ewing at canterbury.ac.nz Mon Oct 8 00:40:04 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Mon, 08 Oct 2012 11:40:04 +1300 Subject: [Python-ideas] PEP 428 - joining In-Reply-To: <50720245.3070007@mrabarnett.plus.com> References: <20121005202534.5f721292@pitrou.net> <20121007193735.7bb924ac@pitrou.net> <20121007234212.100109cb@pitrou.net> <50720245.3070007@mrabarnett.plus.com> Message-ID: <507204C4.5010902@canterbury.ac.nz> MRAB wrote: > In that use case, wouldn't it be more likely that the default is itself > absolute, Not necessarily -- the default could be something provided on the command line, to be interpreted relative to the current directory. -- Greg From oscar.j.benjamin at gmail.com Mon Oct 8 00:43:15 2012 From: oscar.j.benjamin at gmail.com (Oscar Benjamin) Date: Sun, 7 Oct 2012 23:43:15 +0100 Subject: [Python-ideas] Propagating StopIteration value In-Reply-To: References: <5070D658.9020300@pearwood.info> Message-ID: On 7 October 2012 21:19, Guido van Rossum wrote: > On Sun, Oct 7, 2012 at 12:30 PM, Serhiy Storchaka wrote: >> On 07.10.12 04:45, Guido van Rossum wrote: >>> >>> But yes, this was all considered and accepted when PEP 380 was debated >>> (endlessly :-), and I see no reason to change anything about this. >> >> The reason is that when someone uses StopIteration.value for some purposes, >> he will lose this value if the iterator will be wrapped into itertools.chain >> (quite often used technique) or into other standard iterator wrapper. > > If this is just about iterator.chain() I may see some value in it (but > TBH the discussion so far mostly confuses -- please spend some more > time coming up with good examples that show actually useful use cases > rather than f() and g() or foo() and bar()) > > OTOH yield from is not primarily for iterators -- it is for > coroutines. I suspect most of the itertools functionality just doesn't > work with coroutines. I think what Serhiy is saying is that although pep 380 mainly discusses generator functions it has effectively changed the definition of what it means to be an iterator for all iterators: previously an iterator was just something that yielded values but now it also returns a value. Since the meaning of an iterator has changed, functions that work with iterators need to be updated. Before pep 380 filter(lambda x: True, obj) returned an object that was the same kind of iterator as obj (it would yield the same values). Now the "kind of iterator" that obj is depends not only on the values that it yields but also on the value that it returns. Since filter does not pass on the same return value, filter(lambda x: True, obj) is no longer the same kind of iterator as obj. The same considerations apply to many other functions such as map, itertools.groupby, itertools.dropwhile. Cases like itertools.chain and zip are trickier since they each act on multiple underlying iterables. Probably chain should return a tuple of the return values from each of its iterables. This feature was new in Python 3.3 which was released a week ago so it is not widely used but it has uses that are not anything to do with coroutines. As an example of how you could use it, consider parsing a file that can contains #include statements. When the #include statement is encountered we need to insert the contents of the included file. 
This is easy to do with a recursive generator. The example uses the return value of the generator to keep track of which line is being parsed in relation to the flattened output file: def parse(filename, output_lineno=0): with open(filename) as fin: for input_lineno, line in enumerate(fin): if line.startswith('#include '): subfilename = line.split()[1] output_lineno = yield from parse(subfilename, output_lineno) else: try: yield parse_line(line) except ParseLineError: raise ParseError(filename, input_lineno, output_lineno) output_lineno += 1 return output_lineno When writing code like the above that depends on being able to get the value returned from an iterator, it is no longer possible to freely mix utilities like filter, map, zip, itertools.chain with the iterators returned by parse() as they no longer act as transparent wrappers over the underlying iterators (by not propagating the value attached to StopIteration). Hopefully, I've understood Serhiy and the docs correctly (I don't have access to Python 3.3 right now to test any of this). Oscar From steve at pearwood.info Mon Oct 8 00:47:37 2012 From: steve at pearwood.info (Steven D'Aprano) Date: Mon, 08 Oct 2012 09:47:37 +1100 Subject: [Python-ideas] Issue 8492 [was Re: [Python-dev] History stepping in interactive session?] In-Reply-To: <506EEDA7.9000108@pearwood.info> References: <506EA800.1080106@insectnation.org> <20121005140927.759293ed@pitrou.net> <506EEDA7.9000108@pearwood.info> Message-ID: <50720689.4020105@pearwood.info> Over on python-ideas, a question about readline was raised and, I think, resolved. But while investigating the question, it became obvious to me that the ability to inspect the current readline bindings from Python was both useful and important. I wrote: > I don't believe that there is any direct mechanism for querying the current > readline bindings in Python, But it was requested some time ago: http://bugs.python.org/issue8492 Is there anyone willing and able to give this issue some attention please? (Replies to python-dev only please.) -- Steven From greg.ewing at canterbury.ac.nz Mon Oct 8 00:55:26 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Mon, 08 Oct 2012 11:55:26 +1300 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: <20121007234718.1831839f@pitrou.net> References: <20121005202534.5f721292@pitrou.net> <20121007193735.7bb924ac@pitrou.net> <20121007234718.1831839f@pitrou.net> Message-ID: <5072085E.3050902@canterbury.ac.nz> Antoine Pitrou wrote: > On Sun, 7 Oct 2012 22:43:02 +0100 > Arnaud Delobelle > wrote: > >>Naive question: how do you deal with HFS+, which is case-preserving >>but on most machines case-insensitive? > > I don't know. How does os.path deal with it? Not all that well, apparently. From the docs for os.path: os.path.normcase(path) Normalize the case of a pathname. On Unix and Mac OS X, this returns the path unchanged; on case-insensitive filesystems, it converts the path to lowercase. On Windows, it also converts forward slashes to backward slashes. This is partially self-contradictory, since many MacOSX filesystems are actually case-insensitive; it depends on the particular filesystem concerned. Worse, different parts of the same path can have different case sensitivities. Also, with network file systems, not all paths are necessarily case-insensitive on Windows. So there's really no certain way to compare pure paths for equality. Basing it on which OS is running your code is no more than a guess. 
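The pure, string-only side of this is easy to see with the existing
platform-specific modules (both usable on any OS):

>>> import ntpath, posixpath
>>> ntpath.normcase('Foo/Bar.TXT')
'foo\\bar.txt'
>>> posixpath.normcase('Foo/Bar.TXT')
'Foo/Bar.TXT'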
-- Greg From greg.ewing at canterbury.ac.nz Mon Oct 8 01:30:29 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Mon, 08 Oct 2012 12:30:29 +1300 Subject: [Python-ideas] Propagating StopIteration value In-Reply-To: References: <5070D658.9020300@pearwood.info> Message-ID: <50721095.1000800@canterbury.ac.nz> Oscar Benjamin wrote: > Before pep 380 filter(lambda x: True, obj) returned an object that was > the same kind of iterator as obj (it would yield the same values). Now > the "kind of iterator" that obj is depends not only on the values that > it yields but also on the value that it returns. Since filter does not > pass on the same return value, filter(lambda x: True, obj) is no > longer the same kind of iterator as obj. Something like this has happened before, when the ability to send() values into a generator was added. If you wrap a generator with filter, you likewise don't get the same kind of object -- you don't get the ability to send() things into your filtered generator. So, "provide the same kind of iterator" is not currently part of the contract of these functions. > When writing code like the above that depends on being able to get the > value returned from an iterator, it is no longer possible to freely > mix utilities like filter, map, zip, itertools.chain with the > iterators returned by parse() as they no longer act as transparent > wrappers over the underlying iterators (by not propagating the value > attached to StopIteration). In many cases they *can't* act as transparent wrappers with respect to the return value, because there is more than one return value to deal with. There's also the added complication that sometimes not all of the sub-iterators are run to completion -- e.g. izip() stops as soon as one of them reaches the end. -- Greg From greg.ewing at canterbury.ac.nz Mon Oct 8 01:36:20 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Mon, 08 Oct 2012 12:36:20 +1300 Subject: [Python-ideas] Propagating StopIteration value In-Reply-To: References: <5070D658.9020300@pearwood.info> Message-ID: <507211F4.8020200@canterbury.ac.nz> Serhiy Storchaka wrote: > The conflict easily > solved if instead of standard itertools.chain to use handwriten code. It > looks as bug in itertools.chain. Don't underestimate the value of handwritten code. It makes the intent clear to the reader, whereas relying on some arbitrary default behaviour of a function doesn't. -- Greg From guido at python.org Mon Oct 8 01:36:20 2012 From: guido at python.org (Guido van Rossum) Date: Sun, 7 Oct 2012 16:36:20 -0700 Subject: [Python-ideas] Propagating StopIteration value In-Reply-To: References: <5070D658.9020300@pearwood.info> Message-ID: On Sun, Oct 7, 2012 at 3:43 PM, Oscar Benjamin wrote: > On 7 October 2012 21:19, Guido van Rossum wrote: >> On Sun, Oct 7, 2012 at 12:30 PM, Serhiy Storchaka wrote: >>> On 07.10.12 04:45, Guido van Rossum wrote: >>>> >>>> But yes, this was all considered and accepted when PEP 380 was debated >>>> (endlessly :-), and I see no reason to change anything about this. >>> >>> The reason is that when someone uses StopIteration.value for some purposes, >>> he will lose this value if the iterator will be wrapped into itertools.chain >>> (quite often used technique) or into other standard iterator wrapper. 
>> >> If this is just about iterator.chain() I may see some value in it (but >> TBH the discussion so far mostly confuses -- please spend some more >> time coming up with good examples that show actually useful use cases >> rather than f() and g() or foo() and bar()) >> >> OTOH yield from is not primarily for iterators -- it is for >> coroutines. I suspect most of the itertools functionality just doesn't >> work with coroutines. > > I think what Serhiy is saying is that although pep 380 mainly > discusses generator functions it has effectively changed the > definition of what it means to be an iterator for all iterators: > previously an iterator was just something that yielded values but now > it also returns a value. Since the meaning of an iterator has changed, > functions that work with iterators need to be updated. I think there are different philosophical viewpoints possible on that issue. My own perspective is that there is no change in the definition of iterator -- only in the definition of generator. Note that the *ability* to attach a value to StopIteration is not new at all. > Before pep 380 filter(lambda x: True, obj) returned an object that was > the same kind of iterator as obj (it would yield the same values). Now > the "kind of iterator" that obj is depends not only on the values that > it yields but also on the value that it returns. Since filter does not > pass on the same return value, filter(lambda x: True, obj) is no > longer the same kind of iterator as obj. The same considerations apply > to many other functions such as map, itertools.groupby, > itertools.dropwhile. There are other differences between iterators and generators that are not preserved by the various forms of "iterator algebra" that can be applied -- in particular, non-generator iterators don't support send(). I think it's perfectly valid to view generators as a kind of special iterators with properties that aren't preserved by applying generic iterator operations to them (like itertools or filter()). > Cases like itertools.chain and zip are trickier since they each act on > multiple underlying iterables. Probably chain should return a tuple of > the return values from each of its iterables. That's one possible interpretation, but I doubt it's the most useful one. > This feature was new in Python 3.3 which was released a week ago It's been in alpha/beta/candidate for a long time, and PEP 380 was first discussed in 2009. > so it is not widely used but it has uses that are not anything to do with > coroutines. Yes, as a shortcut for "for x in : yield x". Note that the for-loop ignores the value in the StopIteration -- would you want to change that too? > As an example of how you could use it, consider parsing a > file that can contains #include statements. When the #include > statement is encountered we need to insert the contents of the > included file. This is easy to do with a recursive generator. The > example uses the return value of the generator to keep track of which > line is being parsed in relation to the flattened output file: > > def parse(filename, output_lineno=0): > with open(filename) as fin: > for input_lineno, line in enumerate(fin): > if line.startswith('#include '): > subfilename = line.split()[1] > output_lineno = yield from parse(subfilename, output_lineno) > else: > try: > yield parse_line(line) > except ParseLineError: > raise ParseError(filename, input_lineno, output_lineno) > output_lineno += 1 > return output_lineno Hm. This example looks constructed to prove your point... 
It would be easier to count the output lines in the caller. Or you could use a class to hold that state. I think it's just a bad habit to start using the return value for this purpose. Please use the same approach as you would before 3.3, using "yield from" just as the shortcut I mentione above. > When writing code like the above that depends on being able to get the > value returned from an iterator, it is no longer possible to freely > mix utilities like filter, map, zip, itertools.chain with the > iterators returned by parse() as they no longer act as transparent > wrappers over the underlying iterators (by not propagating the value > attached to StopIteration). I see that as one more argument for not using the return value here... > Hopefully, I've understood Serhiy and the docs correctly (I don't have > access to Python 3.3 right now to test any of this). I don't doubt it. But I think you're fighting windmills. -- --Guido van Rossum (python.org/~guido) From sven at marnach.net Mon Oct 8 01:43:25 2012 From: sven at marnach.net (Sven Marnach) Date: Mon, 8 Oct 2012 00:43:25 +0100 Subject: [Python-ideas] checking for identity before comparing built-in objects In-Reply-To: References: <506D94EE.30808@pearwood.info> Message-ID: <20121007234325.GA20216@bagheera> On Thu, Oct 04, 2012 at 05:08:40PM +0200, Victor Stinner wrote: > I think that the optimization should be implemented for Unicode > strings, but disabled in PyObject_RichCompareBool(). Actually, this change to PyObject_RichCompareBool() has been made before, but was reverted after the discussion in http://bugs.python.org/issue4296 Cheers, Sven From alexander.belopolsky at gmail.com Mon Oct 8 02:35:14 2012 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Sun, 7 Oct 2012 20:35:14 -0400 Subject: [Python-ideas] checking for identity before comparing built-in objects In-Reply-To: <506D94EE.30808@pearwood.info> References: <506D94EE.30808@pearwood.info> Message-ID: On Thu, Oct 4, 2012 at 9:53 AM, Steven D'Aprano wrote: > (Please do not start an argument about NANs and reflexivity. That's > been argued to death, and there are very good reasons for the IEEE 754 > standard to define NANs the way they do.) Why not? This is python-ideas, isn't it? I've been hearing that IEEE 754 committee had some "very good reasons" to violate reflexivity of equality comparison with NaNs since I first learned about NaNs some 20 years ago. From time to time, I've also heard claims that there are some important numeric algorithms that depend on this behavior. However, I've never been able to dig out the actual rationale that convinced the committee that voted for IEEE 754 or any very good reasons to preserve this behavior in Python. I am not suggesting any language changes, but I think it will be useful to explain why float('nan') != float('nan') somewhere in the docs. A reference to IEEE 754 does not help much. Java implements IEEE 754 to some extent, but preserves reflexivity of object equality. From rosuav at gmail.com Mon Oct 8 02:42:28 2012 From: rosuav at gmail.com (Chris Angelico) Date: Mon, 8 Oct 2012 11:42:28 +1100 Subject: [Python-ideas] checking for identity before comparing built-in objects In-Reply-To: References: <506D94EE.30808@pearwood.info> Message-ID: On Mon, Oct 8, 2012 at 11:35 AM, Alexander Belopolsky wrote: > I am not suggesting any language changes, but I think it will be > useful to explain why float('nan') != float('nan') somewhere in the > docs. A reference to IEEE 754 does not help much. 
Java implements > IEEE 754 to some extent, but preserves reflexivity of object equality. NaN isn't a single value, but a whole category of values. Conceptually, it's an uncountably infinite (I think that's the technical term) of invalid results; in implementation, NaN has the highest possible exponent and any non-zero mantissa. So then the question becomes: Should *all* NaNs be equal, or only ones with the same bit pattern? Aside from signalling vs non-signalling NaNs, I don't think there's any difference between one and another, so they should probably all compare equal. And once you go there, a huge can o'worms is opened involving floating point equality. It's much MUCH easier and simpler to defer to somebody else's standard and just say "NaNs behave according to IEEE 754, blame them if you don't like it". There would possibly be value in guaranteeing reflexivity, but it would increase confusion somewhere else. ChrisA From mikegraham at gmail.com Mon Oct 8 02:43:35 2012 From: mikegraham at gmail.com (Mike Graham) Date: Sun, 7 Oct 2012 20:43:35 -0400 Subject: [Python-ideas] checking for identity before comparing built-in objects In-Reply-To: References: <506D94EE.30808@pearwood.info> Message-ID: On Sun, Oct 7, 2012 at 8:35 PM, Alexander Belopolsky wrote: > Java implements IEEE 754 to some extent, but preserves reflexivity of object equality. I don't actually know Java, but if I run class HelloNaN { public static void main(String[] args) { double nan1 = 0.0 / 0.0; double nan2 = 0.0 / 0.0; System.out.println(nan1 == nan2); } } I get the output "false". Mike From alexander.belopolsky at gmail.com Mon Oct 8 02:47:36 2012 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Sun, 7 Oct 2012 20:47:36 -0400 Subject: [Python-ideas] checking for identity before comparing built-in objects In-Reply-To: References: <506D94EE.30808@pearwood.info> Message-ID: Try this with Double instead of double. Note that I said "*object* equality". In Java, lowercase double is not an object type. On Sun, Oct 7, 2012 at 8:43 PM, Mike Graham wrote: > On Sun, Oct 7, 2012 at 8:35 PM, Alexander Belopolsky > wrote: >> Java implements IEEE 754 to some extent, but preserves reflexivity of object equality. > > I don't actually know Java, but if I run > > class HelloNaN { > public static void main(String[] args) { > double nan1 = 0.0 / 0.0; > double nan2 = 0.0 / 0.0; > System.out.println(nan1 == nan2); > } > } > > I get the output "false". > > Mike From alexander.belopolsky at gmail.com Mon Oct 8 02:50:01 2012 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Sun, 7 Oct 2012 20:50:01 -0400 Subject: [Python-ideas] checking for identity before comparing built-in objects In-Reply-To: References: <506D94EE.30808@pearwood.info> Message-ID: On Sun, Oct 7, 2012 at 8:42 PM, Chris Angelico wrote: > It's much MUCH easier and simpler to defer to somebody else's standard > and just say "NaNs behave according to IEEE 754, blame them if you > don't like it". There would possibly be value in guaranteeing > reflexivity, but it would increase confusion somewhere else. I agree, but a good thing about standards is that there are plenty to choose from. We can as easily refer to Java as a standard. 
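For reference, the identity-before-equality behaviour the thread keeps
circling back to, as seen from the interpreter (the container types go
through PyObject_RichCompareBool, which checks identity before calling the
comparison):

>>> nan = float('nan')
>>> nan == nan
False
>>> [nan] == [nan]          # identity shortcut in list comparison
True
>>> nan in {nan}
True
>>> float('nan') in {nan}   # a different NaN object is not found
False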
From guido at python.org Mon Oct 8 02:52:41 2012 From: guido at python.org (Guido van Rossum) Date: Sun, 7 Oct 2012 17:52:41 -0700 Subject: [Python-ideas] asyncore: included batteries don't fit In-Reply-To: <60F4AB4E-5A1F-4980-A462-53A6689145E4@gmail.com> References: <20120926081718.GA20843@hephaistos.amsuess.com> <20121003144320.GA16485@hephaistos.amsuess.com> <60F4AB4E-5A1F-4980-A462-53A6689145E4@gmail.com> Message-ID: On Sat, Oct 6, 2012 at 9:09 PM, Duncan M. McGreggor wrote: > We're here ;-) > > I'm forwarding this to the rest of the Twisted cabal... Quick question. I'd like to see how Twisted typically implements a protocol parser. Where would be a good place to start reading example code? -- --Guido van Rossum (python.org/~guido) From guido at python.org Mon Oct 8 02:54:10 2012 From: guido at python.org (Guido van Rossum) Date: Sun, 7 Oct 2012 17:54:10 -0700 Subject: [Python-ideas] checking for identity before comparing built-in objects In-Reply-To: References: <506D94EE.30808@pearwood.info> Message-ID: On Sun, Oct 7, 2012 at 5:50 PM, Alexander Belopolsky wrote: > On Sun, Oct 7, 2012 at 8:42 PM, Chris Angelico wrote: >> It's much MUCH easier and simpler to defer to somebody else's standard >> and just say "NaNs behave according to IEEE 754, blame them if you >> don't like it". There would possibly be value in guaranteeing >> reflexivity, but it would increase confusion somewhere else. > > I agree, but a good thing about standards is that there are plenty to > choose from. We can as easily refer to Java as a standard. Very funny. Seriously, we can't change our position on this topic now without making a lot of people seriously unhappy. IEEE 754 it is. -- --Guido van Rossum (python.org/~guido) From alexander.belopolsky at gmail.com Mon Oct 8 03:09:08 2012 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Sun, 7 Oct 2012 21:09:08 -0400 Subject: [Python-ideas] checking for identity before comparing built-in objects In-Reply-To: References: <506D94EE.30808@pearwood.info> Message-ID: On Sun, Oct 7, 2012 at 8:54 PM, Guido van Rossum wrote: > Seriously, we can't change our position on this topic now without > making a lot of people seriously unhappy. IEEE 754 it is. I did not suggest a change. I wrote: "I am not suggesting any language changes, but I think it will be useful to explain why float('nan') != float('nan') somewhere in the docs." If there is a concise explanation for the choice of IEEE 754 vs. Java, I think we should write it down and put an end to this debate. From ben at bendarnell.com Mon Oct 8 03:41:52 2012 From: ben at bendarnell.com (Ben Darnell) Date: Sun, 7 Oct 2012 18:41:52 -0700 Subject: [Python-ideas] asyncore: included batteries don't fit Message-ID: Hi python-ideas, I'm jumping in to this thread on behalf of Tornado. I think there are actually two separate issues here and it's important to keep them distinct: at a low level, there is a need for a standardized event loop, while at a higher level there is a question of what asynchronous code should look like. This thread so far has been more about the latter, but the need for standardization is more acute for the core event loop. I've written a bridge between Tornado and Twisted so libraries written for both event loops can coexist, but obviously that wouldn't scale if there were a proliferation of event loop implementations out there. 
I'd be in favor of a simple event loop interface in the standard library, with reference implementation(s) (select, epoll, kqueue, iocp) and some means of configuring the global (or thread-local) singleton. My preference is to keep the interface fairly low-level and close to the underlying mechanisms (i.e. like IReactorFDSet instead of IReactor{TCP,UDP,SSL,etc}), so that different interfaces like Tornado's IOStream or Twisted's protocols can be built on top of it. As for the higher-level question of what asynchronous code should look like, there's a lot more room for spirited debate, and I don't think there's enough consensus to declare a One True Way. Personally, I'm -1 on greenlets as a general solution (what if you have to call MySQLdb or getaddrinfo?), although they can be useful in particular cases to convert well-behaved synchronous code into async (as in Motor: http://emptysquare.net/blog/introducing-motor-an-asynchronous-mongodb-driver-for-python-and-tornado/). I like Futures, though, and I find that they work well in asynchronous code. The use of the result() method to encapsulate both successful responses and exceptions is especially nice with generator coroutines. FWIW, here's the interface I'm moving towards for async code. From the caller's perspective, asynchronous functions return a Future (the future has to be constructed by hand since there is no Executor involved), and also take an optional callback argument (mainly for consistency with currently-prevailing patterns for async code; if the callback is given it is simply added to the Future with add_done_callback). In Tornado the Future is created by a decorator and hidden from the asynchronous function (it just sees the callback), although this relies on some Tornado-specific magic for exception handling. In a coroutine, the decorator recognizes Futures and resumes execution when the future is done. With these decorators asynchronous code looks almost like synchronous code, except for the "yield" keyword before each asynchronous call. -Ben From guido at python.org Mon Oct 8 03:51:51 2012 From: guido at python.org (Guido van Rossum) Date: Sun, 7 Oct 2012 18:51:51 -0700 Subject: [Python-ideas] checking for identity before comparing built-in objects In-Reply-To: References: <506D94EE.30808@pearwood.info> Message-ID: On Sun, Oct 7, 2012 at 6:09 PM, Alexander Belopolsky wrote: > On Sun, Oct 7, 2012 at 8:54 PM, Guido van Rossum wrote: >> Seriously, we can't change our position on this topic now without >> making a lot of people seriously unhappy. IEEE 754 it is. > > I did not suggest a change. I wrote: "I am not suggesting any > language changes, but I think it will be > useful to explain why float('nan') != float('nan') somewhere in the > docs." If there is a concise explanation for the choice of IEEE 754 > vs. Java, I think we should write it down and put an end to this > debate. Referencing Java here is absurd and I still consider this suggestion as a troll. Python is not in any way based on Java. On the other hand referencing IEEE 754 makes all the sense in the world, since every other aspect of Python float is based on IEEE 754 double whenever the underlying platform implements this standard -- and all modern CPUs do. I don't think there's anything else we need to say. 
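A stripped-down sketch of that pattern (not Tornado's actual code; the
optional callback argument and all error handling are left out): the
decorator builds the Future that the caller sees and resumes the generator
each time a yielded Future completes.

import functools
from concurrent.futures import Future

def coroutine(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        result = Future()            # handed back to the caller immediately
        gen = func(*args, **kwargs)

        def step(value=None):
            try:
                yielded = gen.send(value)   # run up to the next "yield"
            except StopIteration:
                result.set_result(None)     # generator finished
                return
            # wake the generator up again once the yielded Future is done
            yielded.add_done_callback(lambda f: step(f.result()))

        step()
        return result
    return wrapper

User code then reads almost like synchronous code, e.g. "data = yield
fetch(url)" inside a decorated generator (fetch being any function that
returns a Future).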
-- --Guido van Rossum (python.org/~guido) From guido at python.org Mon Oct 8 04:01:42 2012 From: guido at python.org (Guido van Rossum) Date: Sun, 7 Oct 2012 19:01:42 -0700 Subject: [Python-ideas] asyncore: included batteries don't fit In-Reply-To: References: Message-ID: On Sun, Oct 7, 2012 at 6:41 PM, Ben Darnell wrote: > Hi python-ideas, > > I'm jumping in to this thread on behalf of Tornado. Welcome! > I think there are > actually two separate issues here and it's important to keep them > distinct: at a low level, there is a need for a standardized event > loop, while at a higher level there is a question of what asynchronous > code should look like. Yes, yes. I tried to bring up thing distinction. I'm glad I didn't completely fail. > This thread so far has been more about the latter, but the need for > standardization is more acute for the core event loop. I've written a > bridge between Tornado and Twisted so libraries written for both event > loops can coexist, but obviously that wouldn't scale if there were a > proliferation of event loop implementations out there. I'd be in > favor of a simple event loop interface in the standard library, with > reference implementation(s) (select, epoll, kqueue, iocp) and some > means of configuring the global (or thread-local) singleton. My > preference is to keep the interface fairly low-level and close to the > underlying mechanisms (i.e. like IReactorFDSet instead of > IReactor{TCP,UDP,SSL,etc}), so that different interfaces like > Tornado's IOStream or Twisted's protocols can be built on top of it. As long as it's not so low-level that other people shy away from it. I also have a feeling that one way or another this will require cooperation between the Twisted and Tornado developers in order to come up with a compromise that both are willing to conform to in a meaningful way. (Unfortunately I don't know how to define "meaningful way" more precisely here. I guess the idea is that almost all things *using* an event loop use the standardized abstract API without caring whether underneath it's Tornado, Twisted, or some simpler thing in the stdlib. > As for the higher-level question of what asynchronous code should look > like, there's a lot more room for spirited debate, and I don't think > there's enough consensus to declare a One True Way. Personally, I'm > -1 on greenlets as a general solution (what if you have to call > MySQLdb or getaddrinfo?), although they can be useful in particular > cases to convert well-behaved synchronous code into async (as in > Motor: http://emptysquare.net/blog/introducing-motor-an-asynchronous-mongodb-driver-for-python-and-tornado/). Agreed on both counts. > I like Futures, though, and I find that they work well in > asynchronous code. The use of the result() method to encapsulate both > successful responses and exceptions is especially nice with generator > coroutines. Yay! > FWIW, here's the interface I'm moving towards for async code. From > the caller's perspective, asynchronous functions return a Future (the > future has to be constructed by hand since there is no Executor > involved), Ditto for NDB (though there's a decorator that often takes care of the future construction). > and also take an optional callback argument (mainly for > consistency with currently-prevailing patterns for async code; if the > callback is given it is simply added to the Future with > add_done_callback). That's interesting. I haven't found the need for this yet. 
Is it really so common that you can't write this as a Future() constructor plus a call to add_done_callback()? Or is there some subtle semantic difference?

> In Tornado the Future is created by a decorator
> and hidden from the asynchronous function (it just sees the callback),

Hm, interesting. NDB goes the other way, the callbacks are mostly used to make Futures work, and most code (including large swaths of internal code) uses Futures. I think NDB is similar to monocle here. In NDB, you can do

  f = 
  r = yield f

where "yield f" is mostly equivalent to f.result(), except it gives better opportunity for concurrency.

> although this relies on some Tornado-specific magic for exception
> handling. In a coroutine, the decorator recognizes Futures and
> resumes execution when the future is done. With these decorators
> asynchronous code looks almost like synchronous code, except for the
> "yield" keyword before each asynchronous call.

Yes! Same here.

I am currently trying to understand if using "yield from" (and returning a value from a generator) will simplify things. For example maybe the need for a special decorator might go away. But I keep getting headaches -- perhaps there's a Monad involved. :-)

-- 
--Guido van Rossum (python.org/~guido)

From alexander.belopolsky at gmail.com Mon Oct 8 04:33:37 2012
From: alexander.belopolsky at gmail.com (Alexander Belopolsky)
Date: Sun, 7 Oct 2012 22:33:37 -0400
Subject: [Python-ideas] checking for identity before comparing built-in objects
In-Reply-To:
References: <506D94EE.30808@pearwood.info>
Message-ID:

On Sun, Oct 7, 2012 at 9:51 PM, Guido van Rossum wrote:
> Referencing Java here is absurd and I still consider this suggestion
> as a troll. Python is not in any way based on Java.

I did not suggest that. Sorry if it came out this way. I am well aware that Python and Java were invented independently and have different roots. (IIRC, Java was born from Oak and Python from ABC, and Oak and ABC were both developed in the 1980s.) IEEE 754 precedes both languages, and one team decided that equality reflexivity for hashable objects was more important than IEEE 754 compliance while the other decided otherwise.

Many Python features (mostly library) are motivated by C. In the 90s, "because C does it this way" was a good explanation for a language feature. Doing things differently from the "C way", on the other hand, would deserve an explanation. These days, C is rarely the first language that a student learns. Hopefully Python will take this place in the not so distant future, but many students graduated in the late 90s - early 2000s knowing nothing but Java. As a result, these days it is a valid question to ask about a language feature: "Why does Python do X differently from Java?" Hopefully in most cases the answer is "because Python does it better."

In case of nan != nan, I would really like to know a modern reason why Python's way is better. Better compliance with a 20-year-old standard does not really qualify.
From ned at nedbatchelder.com Mon Oct 8 04:35:17 2012 From: ned at nedbatchelder.com (Ned Batchelder) Date: Sun, 07 Oct 2012 22:35:17 -0400 Subject: [Python-ideas] checking for identity before comparing built-in objects In-Reply-To: References: <506D94EE.30808@pearwood.info> Message-ID: <50723BE5.3060300@nedbatchelder.com> On 10/7/2012 9:51 PM, Guido van Rossum wrote: > On Sun, Oct 7, 2012 at 6:09 PM, Alexander Belopolsky > wrote: >> On Sun, Oct 7, 2012 at 8:54 PM, Guido van Rossum wrote: >>> Seriously, we can't change our position on this topic now without >>> making a lot of people seriously unhappy. IEEE 754 it is. >> I did not suggest a change. I wrote: "I am not suggesting any >> language changes, but I think it will be >> useful to explain why float('nan') != float('nan') somewhere in the >> docs." If there is a concise explanation for the choice of IEEE 754 >> vs. Java, I think we should write it down and put an end to this >> debate. > Referencing Java here is absurd and I still consider this suggestion > as a troll. Python is not in any way based on Java. > > On the other hand referencing IEEE 754 makes all the sense in the > world, since every other aspect of Python float is based on IEEE 754 > double whenever the underlying platform implements this standard -- > and all modern CPUs do. I don't think there's anything else we need to > say. > I don't understand the reluctance to address a common conceptual speed-bump in the docs. After all, the tutorial has an entire chapter (http://docs.python.org/tutorial/floatingpoint.html) that explains how floats work, even though they work exactly as IEEE 754 says they should. A sentence in section 5.4 (Numeric Types) would help. Something like, "In accordance with the IEEE 754 standard, NaN's are not equal to any value, even another NaN. This is because NaN doesn't represent a particular number, it represents an unknown result, and there is no way to know if one unknown result is equal to another unknown result." --Ned. From tjreedy at udel.edu Mon Oct 8 04:40:31 2012 From: tjreedy at udel.edu (Terry Reedy) Date: Sun, 07 Oct 2012 22:40:31 -0400 Subject: [Python-ideas] Propagating StopIteration value In-Reply-To: <50721095.1000800@canterbury.ac.nz> References: <5070D658.9020300@pearwood.info> <50721095.1000800@canterbury.ac.nz> Message-ID: On 10/7/2012 7:30 PM, Greg Ewing wrote: > Oscar Benjamin wrote: >> Before pep 380 filter(lambda x: True, obj) returned an object that was >> the same kind of iterator as obj (it would yield the same values). Now >> the "kind of iterator" that obj is depends not only on the values that >> it yields but also on the value that it returns. Since filter does not >> pass on the same return value, filter(lambda x: True, obj) is no >> longer the same kind of iterator as obj. > > Something like this has happened before, when the ability to > send() values into a generator was added. If you wrap a > generator with filter, you likewise don't get the same kind > of object -- you don't get the ability to send() things > into your filtered generator. > > So, "provide the same kind of iterator" is not currently part > of the contract of these functions. Iterators are Python's generic sequential access device. They do that one thing and do it well. The iterator protocol is intentionally and properly minimal. An iterator class *must* have appropriate .__iter__ and .__next__ methods. It *may* also have any other method and any data attribute. Indeed, any iterator much have some specific internal data. 
But these are ignored in generic iterator (or iterable) functions. If one does not want that, one should write more specific code. For instance, file objects are iterators. Wrappers such as filter(lambda line: line[0] != '\n', open('somefile')) do not have any of the many other file methods and attributes. No one expects otherwise. If one needs access to the other attributes of the file object, one keeps a direct reference to the file object. Hence, the recommended idiom is to use a with statement. Generators are another class of objects that are both iterators (and hence iterables) and something more. When they are used as input arguments to generic functions of iterables, the other behaviors are ignored, and should be ignored, just as with file objects and any other iterator+ objects. Serhily, if you want a module of *generator* specific functions ('gentools' ?), you should write one and submit it to pypi for testing. -- Terry Jan Reedy From alexander.belopolsky at gmail.com Mon Oct 8 04:48:53 2012 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Sun, 7 Oct 2012 22:48:53 -0400 Subject: [Python-ideas] checking for identity before comparing built-in objects In-Reply-To: References: <506D94EE.30808@pearwood.info> Message-ID: On Sun, Oct 7, 2012 at 10:33 PM, Alexander Belopolsky wrote: > In case of nan != nan, I would really like to know a modern reason why > Python's way is better. To this end, a link to Kahan's "How Java?s Floating-Point Hurts Everyone Everywhere" may be appropriate. From guido at python.org Mon Oct 8 04:49:29 2012 From: guido at python.org (Guido van Rossum) Date: Sun, 7 Oct 2012 19:49:29 -0700 Subject: [Python-ideas] asyncore: included batteries don't fit In-Reply-To: References: <20120926081718.GA20843@hephaistos.amsuess.com> <20121003144320.GA16485@hephaistos.amsuess.com> <60F4AB4E-5A1F-4980-A462-53A6689145E4@gmail.com> Message-ID: On Sun, Oct 7, 2012 at 7:16 PM, Duncan McGreggor wrote: > > > On Sun, Oct 7, 2012 at 5:52 PM, Guido van Rossum wrote: >> >> On Sat, Oct 6, 2012 at 9:09 PM, Duncan M. McGreggor >> wrote: >> > We're here ;-) >> > >> > I'm forwarding this to the rest of the Twisted cabal... >> >> Quick question. I'd like to see how Twisted typically implements a >> protocol parser. Where would be a good place to start reading example >> code? > > > I'm not exactly sure what you're looking for (e.g., I'm not sure what your > exact definition of a protocol parser is), but this might be getting close > to what you want: > > * https://github.com/twisted/twisted/blob/master/twisted/mail/pop3.py > * https://github.com/twisted/twisted/blob/master/twisted/protocols/basic.py > > The POP3 protocol implementation in Twisted is a pretty good example of how > one should create a protocol. It's a subclass of the > twisted.protocol.basic.LineOnlyReceiver, and I'm guessing when you said > "parsing" you're wanting to look at what's in the dataReceived method of > that class. > > Hopefully that's what you were after... Yes, those are perfect. The term I used came from one of Josiah's previous messages in this thread, but I think he really meant protocol handler. My current goal is to see if it would be possible to come up with an abstraction that makes it possible to write protocol handlers that are independent from the rest of the infrastructure (e.g. transport, reactor). I honestly have no idea if this is a sane idea but I'm going to look into it anyway; if it works it would be cool to be able to reuse the same POP3 logic in different environments (e.g. 
synchronous thread-based, Twisted) without having to pull in all of Twisted. I.e. Twisted could contribute the code to the stdlib and the stdlib could make it work with SocketServer but Twisted could still use it (assuming Twisted ever gets ported to Py3k :-).

-- 
--Guido van Rossum (python.org/~guido)

From rob.cliffe at btinternet.com Mon Oct 8 05:09:06 2012
From: rob.cliffe at btinternet.com (Rob Cliffe)
Date: Mon, 08 Oct 2012 04:09:06 +0100
Subject: [Python-ideas] checking for identity before comparing built-in objects
In-Reply-To: <50723BE5.3060300@nedbatchelder.com>
References: <506D94EE.30808@pearwood.info> <50723BE5.3060300@nedbatchelder.com>
Message-ID: <507243D2.8000505@btinternet.com>

On 08/10/2012 03:35, Ned Batchelder wrote:
> On 10/7/2012 9:51 PM, Guido van Rossum wrote:
>> On Sun, Oct 7, 2012 at 6:09 PM, Alexander Belopolsky
>> wrote:
>>> On Sun, Oct 7, 2012 at 8:54 PM, Guido van Rossum
>>> wrote:
>>>> Seriously, we can't change our position on this topic now without
>>>> making a lot of people seriously unhappy. IEEE 754 it is.
>>> I did not suggest a change. I wrote: "I am not suggesting any
>>> language changes, but I think it will be
>>> useful to explain why float('nan') != float('nan') somewhere in the
>>> docs." If there is a concise explanation for the choice of IEEE 754
>>> vs. Java, I think we should write it down and put an end to this
>>> debate.
>> Referencing Java here is absurd and I still consider this suggestion
>> as a troll. Python is not in any way based on Java.
>>
>> On the other hand referencing IEEE 754 makes all the sense in the
>> world, since every other aspect of Python float is based on IEEE 754
>> double whenever the underlying platform implements this standard --
>> and all modern CPUs do. I don't think there's anything else we need to
>> say.
>>
> I don't understand the reluctance to address a common conceptual
> speed-bump in the docs. After all, the tutorial has an entire chapter
> (http://docs.python.org/tutorial/floatingpoint.html) that explains how
> floats work, even though they work exactly as IEEE 754 says they should.
>
> A sentence in section 5.4 (Numeric Types) would help. Something like,
> "In accordance with the IEEE 754 standard, NaN's are not equal to any
> value, even another NaN. This is because NaN doesn't represent a
> particular number, it represents an unknown result, and there is no
> way to know if one unknown result is equal to another unknown result."
>
> --Ned.
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas
>
>
I understand that the undefined result of a computation is not the same as the undefined result of another computation. (E.g. one might represent positive infinity, another might represent underflow or loss of accuracy.) But I can't help feeling (strongly) that the result of a computation should be equal to itself. In other words, after

    x = float('nan')
    y = float('nan')

I would expect x != y but x == x.

After all, how much sense does this make (I got this in a quick test with Python 2.7.3):

>>> x=float('nan')
>>> x is x
True            # Well I guess you'd sorta expect this
>>> x==x
False           # You what?
>>> D = {1:x, 2:x}
>>> D[1]==D[2]
False           # I see, both NANs - hmph!
>>> [x]==[x]
True            # Oh yeh, it doesn't always work that way then?
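For reference, the asymmetry in the session above comes from CPython's container comparisons checking identity before falling back to ==, while a direct comparison of the two floats goes straight to IEEE 754 equality:

>>> x = float('nan')
>>> x == x          # float equality follows IEEE 754
False
>>> [x] == [x]      # list comparison short-circuits on identity per element
True
>>> x in [x]        # membership tests use the same shortcut
True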
Making equality non-reflexive feels utterly wrong to me, partly no doubt because of my mathematical background, partly because of the difficulty in implementing container objects and algorithms and God knows what else when you have to remember that some of the objects they may deal with may not be equal to themselves. In particular the difference between my last two examples ( D[1]!=D[2] but [x]==[x] ) looks impossible to justify except by saying that for historical reasons the designers of lists and the designers of dictionaries made different - but entirely reasonable - assumptions about the equality relation, and (perhaps) whether identity implies equality (how do you explain to a Python learner that it doesn't (pathological code examples aside) ???).

Couldn't each NAN when generated contain something that identified it uniquely, so that different NANs would always compare as not equal, but any given NAN would compare equal to itself?

Rob Cliffe

From alexander.belopolsky at gmail.com Mon Oct 8 05:46:43 2012
From: alexander.belopolsky at gmail.com (Alexander Belopolsky)
Date: Sun, 7 Oct 2012 23:46:43 -0400
Subject: [Python-ideas] checking for identity before comparing built-in objects
In-Reply-To: <507243D2.8000505@btinternet.com>
References: <506D94EE.30808@pearwood.info> <50723BE5.3060300@nedbatchelder.com> <507243D2.8000505@btinternet.com>
Message-ID:

On Sun, Oct 7, 2012 at 11:09 PM, Rob Cliffe wrote:
> Couldn't each NAN when generated contain something that identified it
> uniquely, so that different NANs would always compare as not equal, but any
> given NAN would compare equal to itself?

If we take this route and try to distinguish NaNs with different payload, I am sure you will want to distinguish between -0.0 and 0.0 as well. The latter would violate transitivity in -0.0 == 0 == 0.0.

The only sensible thing to do with NaNs is either to treat them all equal (the Eiffel way) or to stick to the IEEE default. I don't think NaN behavior in Python is a result of a deliberate decision to implement IEEE 754. If that was the case, why does 0.0/0.0 not produce NaN? Similarly, Python's math library does not produce infinities where an IEEE 754 compliant library should:

>>> math.log(0.0)
Traceback (most recent call last):
  File "", line 1, in
ValueError: math domain error

Some other operations behave inconsistently:

>>> 2 * 10.**308
inf

but

>>> 10.**309
Traceback (most recent call last):
  File "", line 1, in
OverflowError: (34, 'Result too large')

I think non-reflexivity of nan in Python is an accidental feature. Python's float type was not designed with NaN in mind and until recently, it was relatively difficult to create a nan in pure Python.

It is also not true that IEEE 754 requires that nan == nan is false. IEEE 754 does not define operator '==' (nor does it define boolean false). Instead, IEEE defines a comparison operation that can have one of four results: >, <, =, or unordered. The standard does require that NaN compares unordered with anything including itself, but it does not follow that a language that defines an == operator with boolean results must define it so that nan == nan is false.

From ben at bendarnell.com Mon Oct 8 06:44:27 2012
From: ben at bendarnell.com (Ben Darnell)
Date: Sun, 7 Oct 2012 21:44:27 -0700
Subject: [Python-ideas] asyncore: included batteries don't fit
In-Reply-To:
References:
Message-ID:

On Sun, Oct 7, 2012 at 7:01 PM, Guido van Rossum wrote:
> As long as it's not so low-level that other people shy away from it.

That depends on the target audience.
The low-level IOLoop and Reactor are pretty similar -- you can implement one in terms of the other -- but as you move up the stack cross-compatibility becomes harder. For example, if I wanted to implement tornado's IOStreams in twisted, I wouldn't start with the analogous class in twisted (Protocol?), I'd go down to the Reactor and build from there, so putting something IOStream or Protocol in asycore2 wouldn't do much to unify the two worlds. (it would help people build async stuff with the stdlib alone, but at that point it becomes more like a peer or competitor to tornado and twisted instead of a bridge between them) > > I also have a feeling that one way or another this will require > cooperation between the Twisted and Tornado developers in order to > come up with a compromise that both are willing to conform to in a > meaningful way. (Unfortunately I don't know how to define "meaningful > way" more precisely here. I guess the idea is that almost all things > *using* an event loop use the standardized abstract API without caring > whether underneath it's Tornado, Twisted, or some simpler thing in the > stdlib. I'd phrase the goal as being able to run both Tornado and Twisted in the same thread without any piece of code needing to know about both systems. I think that's achievable as far as core functionality goes. I expect both sides have some lesser-used functionality that might not make it into the stdlib version, but as long as it's possible to plug in a "real" IOLoop or Reactor when needed it should be OK. > >> As for the higher-level question of what asynchronous code should look >> like, there's a lot more room for spirited debate, and I don't think >> there's enough consensus to declare a One True Way. Personally, I'm >> -1 on greenlets as a general solution (what if you have to call >> MySQLdb or getaddrinfo?), although they can be useful in particular >> cases to convert well-behaved synchronous code into async (as in >> Motor: http://emptysquare.net/blog/introducing-motor-an-asynchronous-mongodb-driver-for-python-and-tornado/). > > Agreed on both counts. > >> I like Futures, though, and I find that they work well in >> asynchronous code. The use of the result() method to encapsulate both >> successful responses and exceptions is especially nice with generator >> coroutines. > > Yay! > >> FWIW, here's the interface I'm moving towards for async code. From >> the caller's perspective, asynchronous functions return a Future (the >> future has to be constructed by hand since there is no Executor >> involved), > > Ditto for NDB (though there's a decorator that often takes care of the > future construction). > >> and also take an optional callback argument (mainly for >> consistency with currently-prevailing patterns for async code; if the >> callback is given it is simply added to the Future with >> add_done_callback). > > That's interesting. I haven't found the need for this yet. Is it > really so common that you can't write this as a Future() constructor > plus a call to add_done_callback()? Or is there some subtle semantic > difference? It's a Future constructor, a (conditional) add_done_callback, plus the calls to set_result or set_exception and the with statement for error handling. 
In full:

def future_wrap(f):
    @functools.wraps(f)
    def wrapper(*args, **kwargs):
        future = Future()
        if kwargs.get('callback') is not None:
            future.add_done_callback(kwargs.pop('callback'))
        kwargs['callback'] = future.set_result
        def handle_error(typ, value, tb):
            future.set_exception(value)
            return True
        with ExceptionStackContext(handle_error):
            f(*args, **kwargs)
        return future
    return wrapper

>
>> In Tornado the Future is created by a decorator
>> and hidden from the asynchronous function (it just sees the callback),
>
> Hm, interesting. NDB goes the other way, the callbacks are mostly used
> to make Futures work, and most code (including large swaths of
> internal code) uses Futures. I think NDB is similar to monocle here.
> In NDB, you can do
>
> f = 
> r = yield f
>
> where "yield f" is mostly equivalent to f.result(), except it gives
> better opportunity for concurrency.

Yes, tornado's gen.engine does the same thing here. However, the stakes are higher than "better opportunity for concurrency" - in an event loop if you call future.result() without yielding, you'll deadlock if that Future's task needs to run on the same event loop.

>
>> although this relies on some Tornado-specific magic for exception
>> handling. In a coroutine, the decorator recognizes Futures and
>> resumes execution when the future is done. With these decorators
>> asynchronous code looks almost like synchronous code, except for the
>> "yield" keyword before each asynchronous call.
>
> Yes! Same here.
>
> I am currently trying to understand if using "yield from" (and
> returning a value from a generator) will simplify things. For example
> maybe the need for a special decorator might go away. But I keep
> getting headaches -- perhaps there's a Monad involved. :-)

I think if you build generator handling directly into the event loop and use "yield from" for calls from one async function to another then you can get by without any decorators. But I'm not sure if you can do that and maintain any compatibility with existing non-generator async code.

I think the ability to return from a generator is actually a bigger deal than "yield from" (and I only learned about it from another python-ideas thread today). The only reason a generator decorated with @tornado.gen.engine needs a callback passed in to it is to act as a pseudo-return, and a real return would prevent the common mistake of running the callback then falling through to the rest of the function.

For concreteness, here's a crude sketch of what the APIs I'm talking about would look like in use (in a hypothetical future version of tornado).
@future_wrap
@gen.engine
def async_http_client(url, callback):
    parsed_url = urlparse.urlsplit(url)
    # works the same whether the future comes from a thread pool or @future_wrap
    addrinfo = yield g_thread_pool.submit(socket.getaddrinfo, parsed_url.hostname, parsed_url.port)
    stream = IOStream(socket.socket())
    yield stream.connect((addrinfo[0][-1]))
    stream.write('GET %s HTTP/1.0' % parsed_url.path)
    header_data = yield stream.read_until('\r\n\r\n')
    headers = parse_headers(header_data)
    body_data = yield stream.read_bytes(int(headers['Content-Length']))
    stream.close()
    callback(body_data)

# another function to demonstrate composability
@future_wrap
@gen.engine
def fetch_some_urls(url1, url2, url3, callback):
    body1 = yield async_http_client(url1)
    # yield a list of futures for concurrency
    future2 = yield async_http_client(url2)
    future3 = yield async_http_client(url3)
    body2, body3 = yield [future2, future3]
    callback((body1, body2, body3))

One hole in this design is how to deal with callbacks that are run multiple times. For example, the IOStream read methods take both a regular callback and an optional streaming_callback (which is called with each chunk of data as it arrives). I think this needs to be modeled as something like an iterator of Futures, but I haven't worked out the details yet.

-Ben

>
> --
> --Guido van Rossum (python.org/~guido)

From g.brandl at gmx.net Mon Oct 8 08:05:29 2012
From: g.brandl at gmx.net (Georg Brandl)
Date: Mon, 08 Oct 2012 08:05:29 +0200
Subject: [Python-ideas] PEP 428 - joining
In-Reply-To: <50720245.3070007@mrabarnett.plus.com>
References: <20121005202534.5f721292@pitrou.net> <20121007193735.7bb924ac@pitrou.net> <20121007234212.100109cb@pitrou.net> <50720245.3070007@mrabarnett.plus.com>
Message-ID:

On 08.10.2012 00:29, MRAB wrote:
>>>> I'd much rather if joining an absolute path to a relative one fail and
>>>> reveal the potential bug....
>>>>
>>>> >>> os.unlink(Path('myproj') / Path('/lib'))
>>>> Traceback (most recent call last):
>>>> File "", line 1, in
>>>> TypeError: absolute path can't be appended to a relative path
>>>
>>> In all honesty I followed os.path.join's behaviour here. I agree a
>>> ValueError (not TypeError) would be sensible too.
>>
>> Please no -- this is a very important use case (for os.path.join, at least):
>> resolving a path from config/user/command line that can be given either absolute
>> or relative to a certain directory.
>>
>> Right now it's as simple as join(default, path), and i'd prefer to keep this.
>> There is no bug here, it's working as designed.
>>
> In that use case, wouldn't it be more likely that the default is itself
> absolute, so it'd be either relative to that absolute path or
> overriding that absolute path with another absolute path?

That doesn't really matter; the default could be anything (e.g. "." could be a common value).

Georg

From solipsis at pitrou.net Mon Oct 8 08:26:28 2012
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Mon, 8 Oct 2012 08:26:28 +0200
Subject: [Python-ideas] checking for identity before comparing built-in objects
References: <506D94EE.30808@pearwood.info> <50723BE5.3060300@nedbatchelder.com>
Message-ID: <20121008082628.193be362@pitrou.net>

On Sun, 07 Oct 2012 22:35:17 -0400 Ned Batchelder wrote:
> I don't understand the reluctance to address a common conceptual
> speed-bump in the docs. After all, the tutorial has an entire chapter
> (http://docs.python.org/tutorial/floatingpoint.html) that explains how
> floats work, even though they work exactly as IEEE 754 says they should.
> > A sentence in section 5.4 (Numeric Types) would help. Something like, > "In accordance with the IEEE 754 standard, NaN's are not equal to any > value, even another NaN. This is because NaN doesn't represent a > particular number, it represents an unknown result, and there is no way > to know if one unknown result is equal to another unknown result." +1 Regards Antoine. -- Software development and contracting: http://pro.pitrou.net From solipsis at pitrou.net Mon Oct 8 08:30:08 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 8 Oct 2012 08:30:08 +0200 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths References: <20121005202534.5f721292@pitrou.net> <20121007193735.7bb924ac@pitrou.net> <20121007234718.1831839f@pitrou.net> <5072085E.3050902@canterbury.ac.nz> Message-ID: <20121008083008.3b931907@pitrou.net> On Mon, 08 Oct 2012 11:55:26 +1300 Greg Ewing wrote: > > Not all that well, apparently. From the docs for os.path: > > os.path.normcase(path) > Normalize the case of a pathname. On Unix and Mac OS X, this returns the > path unchanged; on case-insensitive filesystems, it converts the path to > lowercase. On Windows, it also converts forward slashes to backward slashes. > > This is partially self-contradictory, since many MacOSX filesystems are > actually case-insensitive; it depends on the particular filesystem concerned. > Worse, different parts of the same path can have different case sensitivities. > Also, with network file systems, not all paths are necessarily case-insensitive > on Windows. That's true, but considering paths case-insensitive under Windows and case-sensitive under (non-OS X) Unix is still a very good approximation that seems to satisfy most everyone. > So there's really no certain way to compare pure paths for equality. Basing > it on which OS is running your code is no more than a guess. I wonder how well other file-dealing tools cope under OS X, especially those that are portable and not OS X-specific. Regards Antoine. -- Software development and contracting: http://pro.pitrou.net From stephen at xemacs.org Mon Oct 8 10:12:24 2012 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Mon, 08 Oct 2012 17:12:24 +0900 Subject: [Python-ideas] History stepping in interactive session? In-Reply-To: <5071E53E.4030906@insectnation.org> References: <506EA800.1080106@insectnation.org> <87a9w18bb8.fsf@uwakimon.sk.tsukuba.ac.jp> <5071E53E.4030906@insectnation.org> Message-ID: <87626le61j.fsf@uwakimon.sk.tsukuba.ac.jp> Andy Buckley writes: > So one last question, in case it is an acceptable python-ideas topic: > how about adding readline-like support by default in the > interpreter? If readline-like support is available on the system, it's used. However, it's apparently only readline-like. For example, on Mac OS X, the BSD-licensed libedit readline emulation is used by default, it appears. I wouldn't expect full functionality there. On GNU/Linux systems, as I wrote, True GNU readline is used. Why this particular function isn't bound or doesn't work right, I don't know offhand. It is apparently a bug (my Python sources are from April, but I can't see why this would change), since the sources say (ll. 927-931 of Modules/readline.c): /* Initialize (allows .inputrc to override) * * XXX: A bug in the readline-2.2 library causes a memory leak * inside this function. Nothing we can do about it. */ but even adding a binding to .inputrc doesn't work for me (Gentoo Linux). 
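For reference, the sort of binding under discussion can be attempted either in ~/.inputrc or from a PYTHONSTARTUP file; whether it actually takes effect is exactly the open question here (history search is used as the example binding, the real one may differ):

# in ~/.inputrc
"\e[A": history-search-backward
"\e[B": history-search-forward

# or equivalently from a PYTHONSTARTUP script
import readline
readline.parse_and_bind(r'"\e[A": history-search-backward')
readline.parse_and_bind(r'"\e[B": history-search-forward')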
Bugs http://bugs.python.org/issue8492 http://bugs.python.org/issue5845 are related; I don't know whether it's worth filing an additional bug as I suspect it will get fixed in passing if 8492 is fixed. From ncoghlan at gmail.com Mon Oct 8 12:31:06 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 8 Oct 2012 16:01:06 +0530 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: <20121006141858.73b42c38@pitrou.net> References: <20121005202534.5f721292@pitrou.net> <87bogfvrni.fsf@uwakimon.sk.tsukuba.ac.jp> <20121006141858.73b42c38@pitrou.net> Message-ID: I've said before that I like the general shape of the pathlib API and that's still the case. It's the only OO API I've seen that's semantically clean enough for me to support introducing it as "the" standard path abstraction in the standard library. However, there are still a few rough edges I would like to see smoothed out :) On Sat, Oct 6, 2012 at 5:48 PM, Antoine Pitrou wrote: > On Sat, 6 Oct 2012 11:27:58 +0100 > Paul Moore wrote: >> I agree that's what I thought relative() would be when I first read the name. > > You are right, relative() could be removed and replaced with the > current relative_to() method. I wasn't sure about how these names would > feel to a native English speaker. The minor problem is that "relative" on its own is slightly unclear about whether the invariant involved is "a == b.subpath(a.relative(b))" or "b == a.subpath(a.relative(b))" By including the extra word, the intended meaning becomes crystal clear: "a == b.subpath(a.relative_to(b))" However, "a relative to b" is the more natural interpretation, so +1 for using "relative" for the semantics of the method based equivalent to the current os.path.relpath(). I agree there's no need for a shorthand for "a.relative(a.root)" As the invariants above suggest, I'm also currently -1 on *any* of the proposed shorthands for "p.subpath(subpath)", *as well as* the use of "join" as the method name (due to the major difference in semantics relative to str.join). All of the shorthands are magical and/or cryptic and save very little typing over the explicitly named method. As already noted in the PEP, you can also shorten it manually by saving the bound method to a local variable. It's important to remember that you can't readily search for syntactic characters or common method names to find out what they mean, and these days that kind of thing should be taken into account when designing an API. "p.subpath('foo', 'bar')" looks like executable pseudocode for creating a new path based on existing one to me, unlike "p / 'foo' / 'bar'", "p['foo', 'bar']", or "p.join('foo', 'bar')". The method semantics are obvious by comparison, since they would be the same as those for ordinary construction: "p.subpath(*args) == type(p)(p, *args)" I'm not 100% sold on "subpath" as an alternative (since ".." entries may mean that the result isn't really a subpath of the original directory at all), but I do like the way it reads in the absence of parent directory references, and I definitely like it better than "join" or "[]" or "/" or "+". This interpretation is also favoured by the fact that the calculation of relative path references is strict by default (i.e. it won't insert ".." to make the reference work when the target isn't a subpath) > You can't really add '..' components and expect the result to be > correct, for example if '/usr/lib' is a symlink to '/lib', then > '/usr/lib/..' is '/', not /usr'. 
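Spelled with today's os.path functions purely for illustration (POSIX-style paths, and os.path.join standing in for the proposed subpath()), the invariant reads roughly:

>>> import os.path
>>> b = '/usr/share'
>>> a = '/usr/share/doc/python3'
>>> os.path.relpath(a, b)        # "a relative to b"
'doc/python3'
>>> os.path.join(b, os.path.relpath(a, b)) == a
True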
> > That's why the resolve() method, which resolves symlinks along the path, > is the only one allowed to muck with '..' components. This seems too strict for the general case. Configuration files in bundled applications, for example, often contain paths relative to the file (e.g. open up a Visual Studio project file). There are no symlinks involved there. Perhaps a "require_subpath" flag that defaults to True would be appropriate? Passing "require_subpath=False" would then provide explicit permission to add ".." entries as appropriate, and it would be up to the developer to document the "no symlinks!" restriction on their layout. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ronaldoussoren at mac.com Mon Oct 8 12:00:22 2012 From: ronaldoussoren at mac.com (Ronald Oussoren) Date: Mon, 08 Oct 2012 12:00:22 +0200 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: References: <20121005202534.5f721292@pitrou.net> <20121007193735.7bb924ac@pitrou.net> Message-ID: <7E8AC881-ADB6-4026-B024-07DE197F8530@mac.com> On 7 Oct, 2012, at 23:43, Arnaud Delobelle wrote: > On 7 October 2012 18:37, Antoine Pitrou wrote: >> Pure comparison already obeys case-sensitivity rules as well as the >> different path separators: >> >>>>> PureNTPath('a/b') == PureNTPath('A\\B') >> True >>>>> PurePosixPath('a/b') == PurePosixPath('a\\b') >> False > > Naive question: how do you deal with HFS+, which is case-preserving > but on most machines case-insensitive? Or CIFS filesystems mounted on a Linux? Case-sensitivity is a file-system property, not a operating system one. Ronald From phd at phdru.name Mon Oct 8 13:07:48 2012 From: phd at phdru.name (Oleg Broytman) Date: Mon, 8 Oct 2012 15:07:48 +0400 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: <7E8AC881-ADB6-4026-B024-07DE197F8530@mac.com> References: <20121005202534.5f721292@pitrou.net> <20121007193735.7bb924ac@pitrou.net> <7E8AC881-ADB6-4026-B024-07DE197F8530@mac.com> Message-ID: <20121008110748.GA17653@iskra.aviel.ru> On Mon, Oct 08, 2012 at 12:00:22PM +0200, Ronald Oussoren wrote: > Or CIFS filesystems mounted on a Linux? Case-sensitivity is a file-system property, not a operating system one. But there is no API to ask what type of filesystem a path belongs to. So guessing by OS name is the only heuristic we can do. Oleg. -- Oleg Broytman http://phdru.name/ phd at phdru.name Programmers don't die, they just GOSUB without RETURN. From flub at devork.be Mon Oct 8 13:10:05 2012 From: flub at devork.be (Floris Bruynooghe) Date: Mon, 8 Oct 2012 12:10:05 +0100 Subject: [Python-ideas] asyncore: included batteries don't fit In-Reply-To: References: <20120926081718.GA20843@hephaistos.amsuess.com> <20121003144320.GA16485@hephaistos.amsuess.com> <60F4AB4E-5A1F-4980-A462-53A6689145E4@gmail.com> Message-ID: On 8 October 2012 03:49, Guido van Rossum wrote: > My current goal is to see if it would be possible to come up with an > abstraction that makes it possible to write protocol handlers that are > independent from the rest of the infrastructure (e.g. transport, > reactor). This would be my ideal situation too and I think this is what PEP 3153 was trying to achieve. 
While I am an greenlet (eventlet) user I agree with the sentiment that it is not ideal to include it into the stdlib itself and instead work to a solution where we can share protocol implementations while having the freedom to run on a twisted reactor, tornado, something greenlet based or something in the stdlib depending on the preference of the developer. FWIW I have implemented the AgentX protocol based on PEP-3153 and it isn't complete yet (I had to go outside of what it defines). It is also rather heavy handed and I'm not sure how one could migrate the stdlib to something like this. So hopefully there are better solutions possible. Regards, Floris From p.f.moore at gmail.com Mon Oct 8 13:11:52 2012 From: p.f.moore at gmail.com (Paul Moore) Date: Mon, 8 Oct 2012 12:11:52 +0100 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: References: <20121005202534.5f721292@pitrou.net> <87bogfvrni.fsf@uwakimon.sk.tsukuba.ac.jp> <20121006141858.73b42c38@pitrou.net> Message-ID: On 8 October 2012 11:31, Nick Coghlan wrote: > It's important to remember that you can't readily search for syntactic > characters or common method names to find out what they mean, and > these days that kind of thing should be taken into account when > designing an API. "p.subpath('foo', 'bar')" looks like executable > pseudocode for creating a new path based on existing one to me, unlike > "p / 'foo' / 'bar'", "p['foo', 'bar']", or "p.join('foo', 'bar')". Until precisely this point in your email, I'd been completely confused, because I thought that p.supbath(xxx) was some sort of "is xxx a subpath of p" query. It never occurred to me that it was the os.path.join equivalent operation. In fact, I'm not sure where you got it from, as I couldn't find it in either the PEP or in pathlib's documentation. I'm not unhappy with using a method for creating a new path based on an existing one (none of the operator forms seems particularly compelling to me) but I really don't like subpath as a name. I don't dislike p.join(parts) as it links back nicely to os.path.join. I can't honestly see anyone getting confused in practice. But I'm not so convinced that I would want to insist on it. +1 on a method -1 on subpath as its name +0 on join as its name I'm happy for someone to come up with a better name -0 on a convenience operator form. Mainly because "only one way to do it" and the general controversy over which is the best operator to use, suggests that leaving the operator form out altogether at least in the initial implementation is the better option. Paul. From christian at python.org Mon Oct 8 14:39:14 2012 From: christian at python.org (Christian Heimes) Date: Mon, 08 Oct 2012 14:39:14 +0200 Subject: [Python-ideas] asyncore: included batteries don't fit In-Reply-To: References: Message-ID: <5072C972.5070207@python.org> Hi Ben, Am 08.10.2012 03:41, schrieb Ben Darnell: > This thread so far has been more about the latter, but the need for > standardization is more acute for the core event loop. I've written a > bridge between Tornado and Twisted so libraries written for both event > loops can coexist, but obviously that wouldn't scale if there were a > proliferation of event loop implementations out there. I'd be in > favor of a simple event loop interface in the standard library, with > reference implementation(s) (select, epoll, kqueue, iocp) and some > means of configuring the global (or thread-local) singleton. [...] Python's standard library doesn't contain in interface to I/O Completion Ports. 
I think a common event loop system is a good reason to add IOCP if somebody is up for the challenge. Would you prefer an IOCP wrapper in the stdlib or your own version? Twisted has its own Cython based wrapper, some other libraries use a libevent-based solution.

Christian

From stephen at xemacs.org Mon Oct 8 14:46:13 2012
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Mon, 08 Oct 2012 21:46:13 +0900
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To:
References: <20121005202534.5f721292@pitrou.net> <87bogfvrni.fsf@uwakimon.sk.tsukuba.ac.jp> <20121006141858.73b42c38@pitrou.net>
Message-ID: <87haq5uo6i.fsf@uwakimon.sk.tsukuba.ac.jp>

Paul Moore writes:
> On 8 October 2012 11:31, Nick Coghlan wrote:
> > designing an API. "p.subpath('foo', 'bar')" looks like executable
> > pseudocode for creating a new path based on existing one to me, unlike
> > "p / 'foo' / 'bar'", "p['foo', 'bar']", or "p.join('foo', 'bar')".
>
> Until precisely this point in your email, I'd been completely
> confused, because I thought that p.supbath(xxx) was some sort of "is
> xxx a subpath of p" query.

I agree with Paul on this. If .join() doesn't work for you, how about .append() for adding new path components at the end, vs. .suffix() for adding an extension to the last component?

(I don't claim Paul would agree with this next, but as long as I'm here....) I really think that the main API for paths should be the API for sequences specialized to "sequence of path components", with a subsidiary set of operations for common textual manipulations applied to individual components.

From him at online.de Mon Oct 8 15:34:52 2012
From: him at online.de (=?ISO-8859-1?Q?Joachim_K=F6nig?=)
Date: Mon, 08 Oct 2012 15:34:52 +0200
Subject: [Python-ideas] asyncore: included batteries don't fit
In-Reply-To:
References:
Message-ID: <5072D67C.2020106@online.de>

On 08/10/2012 03:41 Ben Darnell wrote:
> As for the higher-level question of what asynchronous code should look
> like, there's a lot more room for spirited debate, and I don't think
> there's enough consensus to declare a One True Way. Personally, I'm
> -1 on greenlets as a general solution (what if you have to call
> MySQLdb or getaddrinfo?)

The caller of such a potentially blocking function could:

* spawn a new thread for the call
* call the function inside the thread and collect return value or exception
* register the thread (id) to inform the event loop (scheduler) it's waiting for its completion
* yield (aka "switch" in greenlet) to the event loop / scheduler
* upon continuation either continue with the result or reraise the exception that happened in the thread

Unfortunately on Unix systems select/poll/kqueue cannot specify threads as event resources, so an additional pipe descriptor would be needed for the scheduler to detect thread completions without blocking (threads would write to the pipe upon completion), not elegant but doable.
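A minimal sketch of that scheme (illustrative only; assumes a select()-based loop, and the helper name is made up):

import os, select, threading

def call_in_thread(func, args, wakeup_fd, results):
    """Run a blocking call in a worker thread, then poke the event loop via a pipe."""
    def worker():
        try:
            results.append(('ok', func(*args)))
        except Exception as exc:
            results.append(('err', exc))
        os.write(wakeup_fd, b'x')      # wake up the select() below
    threading.Thread(target=worker).start()

# scheduler side (a real loop would multiplex many fds, not just the pipe)
read_end, write_end = os.pipe()
results = []
call_in_thread(pow, (2, 10), write_end, results)
select.select([read_end], [], [])
os.read(read_end, 1)
status, value = results[0]             # ('ok', 1024); a scheduler would reraise on 'err'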
Joachim From phd at phdru.name Mon Oct 8 16:28:12 2012 From: phd at phdru.name (Oleg Broytman) Date: Mon, 8 Oct 2012 18:28:12 +0400 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: <9D6F4C1B-9145-4775-8657-F99612791067@mac.com> References: <20121005202534.5f721292@pitrou.net> <20121007193735.7bb924ac@pitrou.net> <7E8AC881-ADB6-4026-B024-07DE197F8530@mac.com> <20121008110748.GA17653@iskra.aviel.ru> <9D6F4C1B-9145-4775-8657-F99612791067@mac.com> Message-ID: <20121008142812.GA22502@iskra.aviel.ru> On Mon, Oct 08, 2012 at 03:59:18PM +0200, Ronald Oussoren wrote: > On 8 Oct, 2012, at 13:07, Oleg Broytman wrote: > > > On Mon, Oct 08, 2012 at 12:00:22PM +0200, Ronald Oussoren wrote: > >> Or CIFS filesystems mounted on a Linux? Case-sensitivity is a file-system property, not a operating system one. > > > > But there is no API to ask what type of filesystem a path belongs to. > > So guessing by OS name is the only heuristic we can do. > > I guess so, as neither statvs, statvfs, nor pathconf seem to be able to tell if a filesystem is case insensitive. > > The alternative would be to have a list of case insentive filesystems and use that that when comparing impure path objects. That would be fairly expensive though, as you'd have to check for every element of the path if that element is on a case insensitive filesystem. If a filesystem mounted to w32 is exported from a server by CIFS/SMB protocol -- is it case sensitive? What if said server is Linux? What if said filesystem was actually imported to Linux from a Novel server by NetWare Core Protocol. It's not a fictional situation -- I do it at oper.med.ru; the server is Linux that mounts two CIFS and NCP filesystem and reexport them via Samba. Oleg. -- Oleg Broytman http://phdru.name/ phd at phdru.name Programmers don't die, they just GOSUB without RETURN. From ronaldoussoren at mac.com Mon Oct 8 15:59:18 2012 From: ronaldoussoren at mac.com (Ronald Oussoren) Date: Mon, 08 Oct 2012 15:59:18 +0200 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: <20121008110748.GA17653@iskra.aviel.ru> References: <20121005202534.5f721292@pitrou.net> <20121007193735.7bb924ac@pitrou.net> <7E8AC881-ADB6-4026-B024-07DE197F8530@mac.com> <20121008110748.GA17653@iskra.aviel.ru> Message-ID: <9D6F4C1B-9145-4775-8657-F99612791067@mac.com> On 8 Oct, 2012, at 13:07, Oleg Broytman wrote: > On Mon, Oct 08, 2012 at 12:00:22PM +0200, Ronald Oussoren wrote: >> Or CIFS filesystems mounted on a Linux? Case-sensitivity is a file-system property, not a operating system one. > > But there is no API to ask what type of filesystem a path belongs to. > So guessing by OS name is the only heuristic we can do. I guess so, as neither statvs, statvfs, nor pathconf seem to be able to tell if a filesystem is case insensitive. The alternative would be to have a list of case insentive filesystems and use that that when comparing impure path objects. That would be fairly expensive though, as you'd have to check for every element of the path if that element is on a case insensitive filesystem. 
Ronald From rosuav at gmail.com Mon Oct 8 17:03:59 2012 From: rosuav at gmail.com (Chris Angelico) Date: Tue, 9 Oct 2012 02:03:59 +1100 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: <20121008142812.GA22502@iskra.aviel.ru> References: <20121005202534.5f721292@pitrou.net> <20121007193735.7bb924ac@pitrou.net> <7E8AC881-ADB6-4026-B024-07DE197F8530@mac.com> <20121008110748.GA17653@iskra.aviel.ru> <9D6F4C1B-9145-4775-8657-F99612791067@mac.com> <20121008142812.GA22502@iskra.aviel.ru> Message-ID: On Tue, Oct 9, 2012 at 1:28 AM, Oleg Broytman wrote: > If a filesystem mounted to w32 is exported from a server by CIFS/SMB > protocol -- is it case sensitive? What if said server is Linux? What if > said filesystem was actually imported to Linux from a Novel server by > NetWare Core Protocol. It's not a fictional situation -- I do it at > oper.med.ru; the server is Linux that mounts two CIFS and NCP filesystem > and reexport them via Samba. And I thought I was weird in using sshfs and Samba together to "bounce" drive access without having to set up SMB passwords for lots of systems... Would it be safer to simply assume that everything's case sensitive until you actually do a filesystem call (a stat or something)? That is, every Pure function works as though the FS is case sensitive? ChrisA From jsbueno at python.org.br Mon Oct 8 17:13:55 2012 From: jsbueno at python.org.br (Joao S. O. Bueno) Date: Mon, 8 Oct 2012 12:13:55 -0300 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: <20121008142812.GA22502@iskra.aviel.ru> References: <20121005202534.5f721292@pitrou.net> <20121007193735.7bb924ac@pitrou.net> <7E8AC881-ADB6-4026-B024-07DE197F8530@mac.com> <20121008110748.GA17653@iskra.aviel.ru> <9D6F4C1B-9145-4775-8657-F99612791067@mac.com> <20121008142812.GA22502@iskra.aviel.ru> Message-ID: On 8 October 2012 11:28, Oleg Broytman wrote: > On Mon, Oct 08, 2012 at 03:59:18PM +0200, Ronald Oussoren wrote: >> On 8 Oct, 2012, at 13:07, Oleg Broytman wrote: >> >> > On Mon, Oct 08, 2012 at 12:00:22PM +0200, Ronald Oussoren wrote: >> >> Or CIFS filesystems mounted on a Linux? Case-sensitivity is a file-system property, not a operating system one. >> > >> > But there is no API to ask what type of filesystem a path belongs to. >> > So guessing by OS name is the only heuristic we can do. >> >> I guess so, as neither statvs, statvfs, nor pathconf seem to be able to tell if a filesystem is case insensitive. >> >> The alternative would be to have a list of case insentive filesystems and use that that when comparing impure path objects. That would be fairly expensive though, as you'd have to check for every element of the path if that element is on a case insensitive filesystem. > > If a filesystem mounted to w32 is exported from a server by CIFS/SMB > protocol -- is it case sensitive? What if said server is Linux? What if > said filesystem was actually imported to Linux from a Novel server by > NetWare Core Protocol. It's not a fictional situation -- I do it at > oper.med.ru; the server is Linux that mounts two CIFS and NCP filesystem > and reexport them via Samba. > Actually, after just thinking of a few corner cases, (and in this case seen some real world scenarios) it is easy to infer that it is impossible to estabilish for certain that a filesystem, worse, that a given directory, is case-sensitive or not. So, regardless of general passive assumptions, I think Python should include a way to actively verify the filesystem case sensitivity. 
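A read-only sketch of what such an active check could look like (the helper name is made up, and it can only answer when the directory already contains a name whose case can be flipped):

import os

def directory_is_case_sensitive(path):
    """Best-effort probe; can be fooled if both spellings genuinely exist."""
    for name in os.listdir(path):
        swapped = name.swapcase()
        if swapped != name:
            # on a case-insensitive filesystem the flipped spelling resolves too
            return not os.path.exists(os.path.join(path, swapped))
    raise ValueError('no name with letters found to test with')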
Something along "assert_case_sensitiveness()" that would check for a filename in the given path, and try to retrieve it inverting some capitalization. If a suitable filename were not found in the given directory, it could raise an error - or try to make an active test by writtng there (this behavior should be controled by keyword parameters). So, whenever one needs to know about case sensitiveness, there would be one obvious way in place to know for shure, even at the cost of some extra system resources. js -><- > Oleg. > -- > Oleg Broytman http://phdru.name/ phd at phdru.name > Programmers don't die, they just GOSUB without RETURN. Hmmm...maybe that applies for programmers who not kept up with the times only? I'd rather raise StopVitalFunctions when my time comes. From steve at pearwood.info Mon Oct 8 15:03:55 2012 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 09 Oct 2012 00:03:55 +1100 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: <20121006214540.GB20907@mcnabbs.org> References: <20121005202534.5f721292@pitrou.net> <20121005191625.GA23607@iskra.aviel.ru> <20121005195327.GG8974@mcnabbs.org> <506F6201.4000503@pearwood.info> <20121005235457.GA7755@mcnabbs.org> <506F813D.2050305@canterbury.ac.nz> <20121006214540.GB20907@mcnabbs.org> Message-ID: <5072CF3B.2070203@pearwood.info> On 07/10/12 08:45, Andrew McNabb wrote: > To clarify my point: in Python, "/" is not just a symbol--it > specifically means "div". I think that's wrong. / is a symbol that means whatever the class gives it. It isn't like __init__ or __call__ that have defined language semantics, and there is no rule that says that / means division. I'll grant you that it's a strong convention, but it is just a convention. > Overriding the div operator requires creating a "__div__" special > method, Actually it is __truediv__ in Python 3. __div__ no longer has any meaning or special status. But it's just a name. __add__ doesn't necessarily perform addition, __sub__ doesn't necessarily perform subtraction, and __or__ doesn't necessarily have anything to do with either bitwise or boolean OR. Why should we insist that __*div__ (true, floor or just plain div) must only be used for numeric division when we don't privilege other numeric operators like that? -- Steven From guido at python.org Mon Oct 8 17:30:12 2012 From: guido at python.org (Guido van Rossum) Date: Mon, 8 Oct 2012 08:30:12 -0700 Subject: [Python-ideas] asyncore: included batteries don't fit In-Reply-To: References: Message-ID: On Sun, Oct 7, 2012 at 9:44 PM, Ben Darnell wrote: > On Sun, Oct 7, 2012 at 7:01 PM, Guido van Rossum wrote: >> As long as it's not so low-level that other people shy away from it. > > That depends on the target audience. The low-level IOLoop and Reactor > are pretty similar -- you can implement one in terms of the other -- > but as you move up the stack cross-compatibility becomes harder. For > example, if I wanted to implement tornado's IOStreams in twisted, I > wouldn't start with the analogous class in twisted (Protocol?), I'd go > down to the Reactor and build from there, so putting something > IOStream or Protocol in asycore2 wouldn't do much to unify the two > worlds. (it would help people build async stuff with the stdlib > alone, but at that point it becomes more like a peer or competitor to > tornado and twisted instead of a bridge between them) Sure. And of course we can't expect Twisted and Tornado to just merge projects. 
They each have different strengths and weaknesses and they each have strong opinions on how things should be done. I do get your point that none of that is incompatible with a shared reactor specification. >> I also have a feeling that one way or another this will require >> cooperation between the Twisted and Tornado developers in order to >> come up with a compromise that both are willing to conform to in a >> meaningful way. (Unfortunately I don't know how to define "meaningful >> way" more precisely here. I guess the idea is that almost all things >> *using* an event loop use the standardized abstract API without caring >> whether underneath it's Tornado, Twisted, or some simpler thing in the >> stdlib. > > I'd phrase the goal as being able to run both Tornado and Twisted in > the same thread without any piece of code needing to know about both > systems. I think that's achievable as far as core functionality goes. > I expect both sides have some lesser-used functionality that might > not make it into the stdlib version, but as long as it's possible to > plug in a "real" IOLoop or Reactor when needed it should be OK. Sounds good. I think a reactor is always going to be an extension of the shared spec. [...] >> That's interesting. I haven't found the need for this yet. Is it >> really so common that you can't write this as a Future() constructor >> plus a call to add_done_callback()? Or is there some subtle semantic >> difference? > > It's a Future constructor, a (conditional) add_done_callback, plus the > calls to set_result or set_exception and the with statement for error > handling. In full: > > def future_wrap(f): > @functools.wraps(f) > def wrapper(*args, **kwargs): > future = Future() > if kwargs.get('callback') is not None: > future.add_done_callback(kwargs.pop('callback')) > kwargs['callback'] = future.set_result > def handle_error(typ, value, tb): > future.set_exception(value) > return True > with ExceptionStackContext(handle_error): > f(*args, **kwargs) > return future > return wrapper Hmm... I *think* it automatically adds a special keyword 'callback' to the *call* site so that you can do things like fut = some_wrapped_func(blah, callback=my_callback) and then instead of using yield to wait for the callback, put the continuation of your code in the my_callback() function. But it also seems like it passes callback=future.set_result as the callback to the wrapped function, which looks to me like that function was apparently written before Futures were widely used. This seems pretty impure to me and I'd like to propose a "future" where such functions either be given the Future where the result is expected, or (more commonly) the function would create the Future itself. Unless I'm totally missing the programming model here. PS. I'd like to learn more about ExceptionStackContext() -- I've struggled somewhat with getting decent tracebacks in NDB. >>> In Tornado the Future is created by a decorator >>> and hidden from the asynchronous function (it just sees the callback), >> >> Hm, interesting. NDB goes the other way, the callbacks are mostly used >> to make Futures work, and most code (including large swaths of >> internal code) uses Futures. I think NDB is similar to monocle here. >> In NDB, you can do >> >> f = >> r = yield f >> >> where "yield f" is mostly equivalent to f.result(), except it gives >> better opportunity for concurrency. > > Yes, tornado's gen.engine does the same thing here. 
However, the > stakes are higher than "better opportunity for concurrency" - in an > event loop if you call future.result() without yielding, you'll > deadlock if that Future's task needs to run on the same event loop. That would depend on the semantics of the event loop implementation. In NDB's event loop, such a .result() call would just recursively enter the event loop, and you'd only deadlock if you actually have two pieces of code waiting for each other's completion. [...] >> I am currently trying to understand if using "yield from" (and >> returning a value from a generator) will simplify things. For example >> maybe the need for a special decorator might go away. But I keep >> getting headaches -- perhaps there's a Monad involved. :-) > > I think if you build generator handling directly into the event loop > and use "yield from" for calls from one async function to another then > you can get by without any decorators. But I'm not sure if you can do > that and maintain any compatibility with existing non-generator async > code. > > I think the ability to return from a generator is actually a bigger > deal than "yield from" (and I only learned about it from another > python-ideas thread today). The only reason a generator decorated > with @tornado.gen.engine needs a callback passed in to it is to act as > a psuedo-return, and a real return would prevent the common mistake of > running the callback then falling through to the rest of the function. Ah, so you didn't come up with the clever hack of raising an exception to signify the return value. In NDB, you raise StopIteration (though it is given the alias 'Return' for clarity) with an argument, and the wrapper code that is responsible for the Future takes the value from the StopIteration exception and passes it to the Future's set_result(). > For concreteness, here's a crude sketch of what the APIs I'm talking > about would look like in use (in a hypothetical future version of > tornado). > > @future_wrap > @gen.engine > def async_http_client(url, callback): > parsed_url = urlparse.urlsplit(url) > # works the same whether the future comes from a thread pool or @future_wrap And you need the thread pool because there's no async version of getaddrinfo(), right? > addrinfo = yield g_thread_pool.submit(socket.getaddrinfo, parsed_url.hostname, parsed_url.port) > stream = IOStream(socket.socket()) > yield stream.connect((addrinfo[0][-1])) > stream.write('GET %s HTTP/1.0' % parsed_url.path) Why no yield in front of the write() call? > header_data = yield stream.read_until('\r\n\r\n') > headers = parse_headers(header_data) > body_data = yield stream.read_bytes(int(headers['Content-Length'])) > stream.close() > callback(body_data) > > # another function to demonstrate composability > @future_wrap > @gen.engine > def fetch_some_urls(url1, url2, url3, callback): > body1 = yield async_http_client(url1) > # yield a list of futures for concurrency > future2 = yield async_http_client(url2) > future3 = yield async_http_client(url3) > body2, body3 = yield [future2, future3] > callback((body1, body2, body3)) This second one is nearly identical to the way we it's done in NDB. However I think you have a typo -- I doubt that there should be yields on the lines creating future2 and future3. > One hole in this design is how to deal with callbacks that are run > multiple times. For example, the IOStream read methods take both a > regular callback and an optional streaming_callback (which is called > with each chunk of data as it arrives). 
I think this needs to be > modeled as something like an iterator of Futures, but I haven't worked > out the details yet. Ah. Yes, that's a completely different kind of thing, and probably needs to be handled in a totally different way. I think it probably needs to be modeled more like an infinite loop where at the blocking point (e.g. a low-level read() or accept() call) you yield a Future. Although I can see that this doesn't work well with the IOLoop's concept of file descriptor (or other event source) registration. -- --Guido van Rossum (python.org/~guido) From guido at python.org Mon Oct 8 17:34:29 2012 From: guido at python.org (Guido van Rossum) Date: Mon, 8 Oct 2012 08:34:29 -0700 Subject: [Python-ideas] asyncore: included batteries don't fit In-Reply-To: References: <20120926081718.GA20843@hephaistos.amsuess.com> <20121003144320.GA16485@hephaistos.amsuess.com> <60F4AB4E-5A1F-4980-A462-53A6689145E4@gmail.com> Message-ID: On Mon, Oct 8, 2012 at 4:10 AM, Floris Bruynooghe wrote: > On 8 October 2012 03:49, Guido van Rossum wrote: >> My current goal is to see if it would be possible to come up with an >> abstraction that makes it possible to write protocol handlers that are >> independent from the rest of the infrastructure (e.g. transport, >> reactor). > > This would be my ideal situation too and I think this is what PEP 3153 > was trying to achieve. While I am an greenlet (eventlet) user I agree > with the sentiment that it is not ideal to include it into the stdlib > itself and instead work to a solution where we can share protocol > implementations while having the freedom to run on a twisted reactor, > tornado, something greenlet based or something in the stdlib depending > on the preference of the developer. > > FWIW I have implemented the AgentX protocol based on PEP-3153 and it > isn't complete yet (I had to go outside of what it defines). It is > also rather heavy handed and I'm not sure how one could migrate the > stdlib to something like this. So hopefully there are better > solutions possible. The more I think about this the more I think it will be really hard to accomplish. I think we ought to try and go for goals that are easier to obtain (and still useful) first, such as a common reactor/ioloop specification and a "best practice" implementation (which may choose a different polling mechanism depending on the platform OS) in the stdlib. 3rd party code could then hook into this mechanism and offer alternate reactors, e.g. integrated with a 3rd party GUI library such as Wx, Gtk, Qt -- maybe we can offer Tk integration in the stdlib. 3rd party reactors could also offer additional functionality, e.g. advanced scheduling, threadpool integration, or whatever (my imagination isn't very good here). -- --Guido van Rossum (python.org/~guido) From guido at python.org Mon Oct 8 17:35:08 2012 From: guido at python.org (Guido van Rossum) Date: Mon, 8 Oct 2012 08:35:08 -0700 Subject: [Python-ideas] asyncore: included batteries don't fit In-Reply-To: <5072C972.5070207@python.org> References: <5072C972.5070207@python.org> Message-ID: On Mon, Oct 8, 2012 at 5:39 AM, Christian Heimes wrote: > Python's standard library doesn't contain in interface to I/O Completion > Ports. I think a common event loop system is a good reason to add IOCP > if somebody is up for the challenge. > > Would you prefer an IOCP wrapper in the stdlib or your own version? > Twisted has its own Cython based wrapper, some other libraries use a > libevent-based solution. What's an IOCP? 
--
--Guido van Rossum (python.org/~guido)

From guido at python.org Mon Oct 8 17:37:28 2012
From: guido at python.org (Guido van Rossum)
Date: Mon, 8 Oct 2012 08:37:28 -0700
Subject: [Python-ideas] asyncore: included batteries don't fit
In-Reply-To: <5072D67C.2020106@online.de>
References: <5072D67C.2020106@online.de>
Message-ID: 

On Mon, Oct 8, 2012 at 6:34 AM, Joachim König wrote:
> On 08/10/2012 03:41 Ben Darnell wrote:
>> As for the higher-level question of what asynchronous code should look
>> like, there's a lot more room for spirited debate, and I don't think
>> there's enough consensus to declare a One True Way. Personally, I'm
>> -1 on greenlets as a general solution (what if you have to call
>> MySQLdb or getaddrinfo?)
>
> The caller of such a potentially blocking function could:
>
> * spawn a new thread for the call
> * call the function inside the thread and collect return value or exception
> * register the thread (id) to inform the event loop (scheduler) it's waiting for its completion
> * yield (aka "switch" in greenlet) to the event loop / scheduler
> * upon continuation either continue with the result or reraise the exception that happened in the thread

Ben just posted an example of how to do exactly that for getaddrinfo().

> Unfortunately on Unix systems select/poll/kqueue cannot specify threads as
> event resources, so an additional pipe descriptor would be needed for the scheduler
> to detect thread completions without blocking (threads would write to the pipe upon
> completion), not elegant but doable.

However it must be done, this seems a useful thing to solve once and for
all in a standard reactor specification and stdlib implementation. (Ditto
for signal handlers BTW.)

--
--Guido van Rossum (python.org/~guido)

From amcnabb at mcnabbs.org Mon Oct 8 18:06:17 2012
From: amcnabb at mcnabbs.org (Andrew McNabb)
Date: Mon, 8 Oct 2012 10:06:17 -0600
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <5072CF3B.2070203@pearwood.info>
References: <20121005202534.5f721292@pitrou.net> <20121005191625.GA23607@iskra.aviel.ru> <20121005195327.GG8974@mcnabbs.org> <506F6201.4000503@pearwood.info> <20121005235457.GA7755@mcnabbs.org> <506F813D.2050305@canterbury.ac.nz> <20121006214540.GB20907@mcnabbs.org> <5072CF3B.2070203@pearwood.info>
Message-ID: <20121008160617.GA1527@mcnabbs.org>

On Tue, Oct 09, 2012 at 12:03:55AM +1100, Steven D'Aprano wrote:
> / is a symbol that means whatever the class
> gives it. It isn't like __init__ or __call__ that have defined
> language semantics, and there is no rule that says that / means
> division. I'll grant you that it's a strong convention, but it is
> just a convention.

I'll grant you that the semantics of the __truediv__ method are defined
by convention.

> But it's just a name. __add__ doesn't necessarily perform addition,
> __sub__ doesn't necessarily perform subtraction, and __or__ doesn't
> necessarily have anything to do with either bitwise or boolean OR.
> Why should we insist that __*div__ (true, floor or just plain div)
> must only be used for numeric division when we don't privilege other
> numeric operators like that?

__add__ for strings doesn't mean numerical addition, but people find it
perfectly natural to speak of "adding two strings," for example. Seeing
`string1.__add__(string2)` is readable, as is `operator.add(string1,
string2)`. Every other example of operator overloading that I find
tasteful is analogous enough to the numerical operators to retain use of
the name.
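For concreteness, the mechanism under discussion is just the __truediv__
hook; the toy sketch below (the ToyPath name is invented purely for
illustration, it is not the PEP's API) is all it takes to make "/" mean
"join":

    import posixpath

    class ToyPath:
        # Toy illustration only: "/" is repurposed to mean "join".
        def __init__(self, s):
            self._s = s

        def __truediv__(self, other):
            return ToyPath(posixpath.join(self._s, str(other)))

        def __str__(self):
            return self._s

    print(ToyPath('/usr') / 'local' / 'bin')   # prints /usr/local/bin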
Since this really is a matter of personal taste, I'll end my participation in this discussion by voicing support for Nick Coghlan's suggestion of a `join` method, whether it's named `join` or `append` or something else. -- Andrew McNabb http://www.mcnabbs.org/andrew/ PGP Fingerprint: 8A17 B57C 6879 1863 DE55 8012 AB4D 6098 8826 6868 From guido at python.org Mon Oct 8 18:19:31 2012 From: guido at python.org (Guido van Rossum) Date: Mon, 8 Oct 2012 09:19:31 -0700 Subject: [Python-ideas] checking for identity before comparing built-in objects In-Reply-To: References: <506D94EE.30808@pearwood.info> Message-ID: On Sun, Oct 7, 2012 at 7:33 PM, Alexander Belopolsky wrote: > On Sun, Oct 7, 2012 at 9:51 PM, Guido van Rossum wrote: >> Referencing Java here is absurd and I still consider this suggestion >> as a troll. Python is not in any way based on Java. > > I did not suggest that. Sorry if it came out this way. I am well > aware that Python and Java were invented independently and have > different roots. (IIRC, Java was born from Oak and Python from ABC > and Oak and ABC were both developed in the 1980s.) IEEE 784 precedes > both languages and one team decided that equality reflexivity for > hashable objects was more important than IEEE 784 compliance while the > other decided otherwise. > > Many Python features (mostly library) are motivated by C. In the 90s, > "because C does it this way" was a good explanation for a language > feature. Doing things differently from the "C way", on the other hand > would deserve an explanation. These days, C is rarely first language > that a student learns. Hopefully Python will take this place in not > so distant future, but many students graduated in late 90s - early > 2000s knowing nothing but Java. As a result, these days it is a > valid question to ask about a language feature: "Why does Python do X > differently from Java?" Hopefully in most cases the answer is > "because Python does it better." Explaining the differences between Python and Java is a job for educators, not for the language reference. I agree that documenting APIs as "this behaves just like C" does not have the same appeal -- but that turn of phrase was mostly used for system calls anyway, and for those I think that a slightly modified redirection (to the OS man pages) is still completely appropriate. > In case of nan != nan, I would really like to know a modern reason why > Python's way is better. Better compliance with a 20-year old standard > does not really qualify. I am not aware of an update to the standard. Being 20 years old does not make it outdated. Again, there are plenty of reasons (you have to ask the numpy folks), but I don't think it is the job of the Python reference manual to give its motivations. It just needs to explain how things work, and if that can be done best by deferring to an existing standard that's fine. Of course a tutorial should probably mention this behavior, but a tutorial does not have the task of giving you the reason for every language feature either -- most readers of the tutorial don't have the context yet to understand those reasons, many don't care, and whether they like it or not, it's not going to change. You keep getting very close to suggesting to make changes, despite your insistence that you just want to know the reason. But assuming you really just are asking in an obnoxious way for the reason, I recommand that you ask the people who wrote the IEEE 754 standard. 
I'm sure their explanation (which I recall having read once but can't
reproduce here) makes sense for Python too.

--
--Guido van Rossum (python.org/~guido)

From guido at python.org Mon Oct 8 18:25:16 2012
From: guido at python.org (Guido van Rossum)
Date: Mon, 8 Oct 2012 09:25:16 -0700
Subject: [Python-ideas] checking for identity before comparing built-in objects
In-Reply-To: <50723BE5.3060300@nedbatchelder.com>
References: <506D94EE.30808@pearwood.info> <50723BE5.3060300@nedbatchelder.com>
Message-ID: 

On Sun, Oct 7, 2012 at 7:35 PM, Ned Batchelder wrote:
> I don't understand the reluctance to address a common conceptual speed-bump
> in the docs. After all, the tutorial has an entire chapter
> (http://docs.python.org/tutorial/floatingpoint.html) that explains how
> floats work, even though they work exactly as IEEE 754 says they should.

I'm sorry. I didn't intend to refuse to document the behavior. I was
mostly reacting to things I thought I read between the lines -- the
suggestion that there is no reason for the NaN behavior except silly
compatibility with an old standard that nobody cares about. From this it
is only a small step to reading (again between the lines) the suggestion
to change the behavior.

> A sentence in section 5.4 (Numeric Types) would help. Something like, "In
> accordance with the IEEE 754 standard, NaN's are not equal to any value,
> even another NaN. This is because NaN doesn't represent a particular
> number, it represents an unknown result, and there is no way to know if one
> unknown result is equal to another unknown result."

That sounds like a great addition to the docs, except for the nit that I
don't like writing the plural of NaN as "NaN's" -- I prefer "NaNs" myself.

Also, the words here can still cause confusion. The exact behavior is
that every one of the 6 comparison operators (==, !=, <, <=, >, >=)
returns False when either argument (or both) is a NaN. I think your
suggested words could lead someone to believe that they mean that
x != NaN or NaN != NaN would return True.

Anyway, once we can agree on words I agree that we should update that
section.

--
--Guido van Rossum (python.org/~guido)

From guido at python.org Mon Oct 8 18:29:42 2012
From: guido at python.org (Guido van Rossum)
Date: Mon, 8 Oct 2012 09:29:42 -0700
Subject: [Python-ideas] checking for identity before comparing built-in objects
In-Reply-To: <507243D2.8000505@btinternet.com>
References: <506D94EE.30808@pearwood.info> <50723BE5.3060300@nedbatchelder.com> <507243D2.8000505@btinternet.com>
Message-ID: 

On Sun, Oct 7, 2012 at 8:09 PM, Rob Cliffe wrote:
> I understand that the undefined result of a computation is not the same as
> the undefined result of another computation.
> (E.g. one might represent positive infinity, another might represent
> underflow or loss of accuracy.)
> But I can't help feeling (strongly) that the result of a computation should
> be equal to itself.
> In other words, after
> x = float('nan')
> y = float('nan')
> I would expect
> x != y
> but
> x == x

That's too bad. It sounds like this mailing list really wouldn't have
enough space in its margins to convince you otherwise. And yet you are
wrong.

> After all, how much sense does this make (I got this in a quick test with
> Python 2.7.3):
>>>> x=float('nan')
>>>> x is x
> True # Well I guess you'd sorta expect this
>>>> x==x
> False # You what?
>>>> D = {1:x, 2:x}
>>>> D[1]==D[2]
> False # I see, both NANs - hmph!
>>>> [x]==[x]
> True # Oh yeh, it doesn't always work that way then?
> Making equality non-reflexive feels utterly wrong to me, partly no doubt
> because of my mathematical background,

Do you have any background at all in *numerical* mathematics?

> partly because of the difficulty in
> implementing container objects and algorithms and God knows what else when
> you have to remember that some of the objects they may deal with may not be
> equal to themselves. In particular the difference between my last two
> examples ( D[1]!=D[2] but [x]==[x] ) looks impossible to justify except by
> saying that for historical reasons the designers of lists and the designers
> of dictionaries made different - but entirely reasonable - assumptions about
> the equality relation, and (perhaps) whether identity implies equality (how
> do you explain to a Python learner that it doesn't (pathological code
> examples aside) ???).
> Couldn't each NAN when generated contain something that identified it
> uniquely, so that different NANs would always compare as not equal, but any
> given NAN would compare equal to itself?

It's not about equality. If you ask whether two NaNs are *unequal* the
answer is *also* False.

I admit that a tutorial section describing the behavior would be good. But
I am less than ever convinced that it's possible to explain the *reason*
for the behavior in a tutorial.

--
--Guido van Rossum (python.org/~guido)

From massimo.dipierro at gmail.com Mon Oct 8 18:38:31 2012
From: massimo.dipierro at gmail.com (Massimo DiPierro)
Date: Mon, 8 Oct 2012 11:38:31 -0500
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: <20121008160617.GA1527@mcnabbs.org>
References: <20121005202534.5f721292@pitrou.net> <20121005191625.GA23607@iskra.aviel.ru> <20121005195327.GG8974@mcnabbs.org> <506F6201.4000503@pearwood.info> <20121005235457.GA7755@mcnabbs.org> <506F813D.2050305@canterbury.ac.nz> <20121006214540.GB20907@mcnabbs.org> <5072CF3B.2070203@pearwood.info> <20121008160617.GA1527@mcnabbs.org>
Message-ID: <646D805C-581A-4278-B901-BFA5F1D0495E@gmail.com>

http://en.wikipedia.org/wiki/List_of_mathematical_symbols#Symbols

The + symbol means addition and union of disjoint sets. A path (including
a fs path) is a set of links (for a fs path, a link is a folder name).
Using the + symbol has a natural interpretation as concatenation of
subpaths (sets) to form a longer path (superset).

The / symbol means the quotient of a group. It always returns a subgroup.
When I see path1 / path2 I would expect it to return all paths that start
with path2 or contain path2, not concatenation.

The fact that string paths in Unix use the / to represent concatenation is
accidental. That's just how the path is serialized into a string. In fact
Windows uses a different separator. I do not think a serialized
representation of an object makes a good model for its abstract
representation.

Massimo

On Oct 8, 2012, at 11:06 AM, Andrew McNabb wrote:

> On Tue, Oct 09, 2012 at 12:03:55AM +1100, Steven D'Aprano wrote:
>> / is a symbol that means whatever the class
>> gives it. It isn't like __init__ or __call__ that have defined
>> language semantics, and there is no rule that says that / means
>> division. I'll grant you that it's a strong convention, but it is
>> just a convention.
>
> I'll grant you that the semantics of the __truediv__ method are defined
> by convention.
>
>> But it's just a name. __add__ doesn't necessarily perform addition,
>> __sub__ doesn't necessarily perform subtraction, and __or__ doesn't
>> necessarily have anything to do with either bitwise or boolean OR.
>> Why should we insist that __*div__ (true, floor or just plain div) >> must only be used for numeric division when we don't privilege other >> numeric operators like that? > > __add__ for strings doesn't mean numerical addition, but people find it > perfectly natural to speak of "adding two strings," for example. Seeing > `string1.__add__(string2)` is readable, as is `operator.add(string1, > string2)`. Every other example of operator overloading that I find > tasteful is analogous enough to the numerical operators to retain use > the name. > > Since this really is a matter of personal taste, I'll end my > participation in this discussion by voicing support for Nick Coghlan's > suggestion of a `join` method, whether it's named `join` or `append` or > something else. > > -- > Andrew McNabb > http://www.mcnabbs.org/andrew/ > PGP Fingerprint: 8A17 B57C 6879 1863 DE55 8012 AB4D 6098 8826 6868 > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas -------------- next part -------------- An HTML attachment was scrubbed... URL: From barry at python.org Mon Oct 8 18:45:34 2012 From: barry at python.org (Barry Warsaw) Date: Mon, 8 Oct 2012 12:45:34 -0400 Subject: [Python-ideas] asyncore: included batteries don't fit References: <20120926081718.GA20843@hephaistos.amsuess.com> <20121003144320.GA16485@hephaistos.amsuess.com> Message-ID: <20121008124534.24d05df6@resist.wooz.org> On Oct 06, 2012, at 03:00 PM, Guido van Rossum wrote: >This is an incredibly important discussion. Indeed. If Python gets it right, it could be yet another killer reason for upgrading to Python 3, at least for the growing subset of event-driven applications. >(1) How importance is it to offer a compatibility path for asyncore? I've written and continue to use async-based code. I don't personally care much about compatibility. I've use async because it was the simplest and most stdlibby of the options for the Python versions I can use, but I have no love for it. If there were a better, more readable and comprehensible way to do it, I'd ditch the async-based versions as soon as possible. >I would have thought that offering an integration path forward for Twisted >and Tornado would be more important. Agreed. I share the same dream as someone else in this thread mentioned. It would be really fantastic if the experts in a particular protocol could write support for that protocol Just Once and have it as widely shared as possible. Maybe this is an unrealistic dream, but now's the time to have them anyway. Even something like the email package could benefit from this. The FeedParser is our attempt to support asynchronous reading of email data for parsing. I'm not so sure that the asynchronous part of that is very useful. -Barry -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: not available URL: From guido at python.org Mon Oct 8 18:47:48 2012 From: guido at python.org (Guido van Rossum) Date: Mon, 8 Oct 2012 09:47:48 -0700 Subject: [Python-ideas] checking for identity before comparing built-in objects In-Reply-To: References: <506D94EE.30808@pearwood.info> <50723BE5.3060300@nedbatchelder.com> <507243D2.8000505@btinternet.com> Message-ID: On Sun, Oct 7, 2012 at 8:46 PM, Alexander Belopolsky wrote: > On Sun, Oct 7, 2012 at 11:09 PM, Rob Cliffe wrote: >> Couldn't each NAN when generated contain something that identified it >> uniquely, so that different NANs would always compare as not equal, but any >> given NAN would compare equal to itself? > > If we take this route and try to distinguish NaNs with different > payload, I am sure you will want to distinguish between -0.0 and 0.0 > as well. The later would violate transitivity in -0.0 == 0 == 0.0. > > The only sensible thing to do with NaNs is either to treat them all > equal (the Eiffel way) or to stick to IEEE default. > > I don't think NaN behavior in Python is a result of a deliberate > decision to implement IEEE 754. Oh, it was. It was very deliberate. Like in many other areas of Python, I refused to invent new rules when there was existing behavior elsewhere that I could borrow and with which I had no reason to quibble. (And in the case of floating point behavior, there really is no alternate authority to choose from besides IEEE 754. Languages that disagree with it do not make an authority.) Even if I *did* have reasons to quibble with the NaN behavior (there were no NaNs on the mainframe where I learned programming, so they were as new and weird to me as they are to today's novices), Tim Peters, who has implemented numerical libraries for Fortran compilers in a past life and is an absolute authority on floating points, convinced me to follow IEEE 754 as closely as I could. > If that was the case, why 0.0/0.0 does not produce NaN? Easy. It was an earlier behavior, from the days where IEEE 754 hardware did not yet rule the world, and Python didn't have much op an opinion on float behavior at all -- it just did whatever the platform did. Infinities and NaNs were not on my radar (I hadn't met Tim yet :-). However division by zero (which is not just a float but also an int behavior) was something that I just had to address, so I made the runtime check for it and raise an exception. When we became more formal about this, we considered changing this but decided that the ZeroDivisionError was more user-friendly than silently propagating NaNs everywhere, given the typical use of Python. (I suppose we could make it optional, and IIRC that's what Decimal does -- but for floats we don't have a well-developed numerical context concept yet.) > Similarly, Python math library does not produce > infinities where IEEE 754 compliant library should: > >>>> math.log(0.0) > Traceback (most recent call last): > File "", line 1, in > ValueError: math domain error Again, this mostly comes from backward compatibility with the math module's origins (and it is as old as Python itself, again predating its use of IEEE 754). AFAIK Tim went over the math library very carefully and cleaned up what he could, so he probably thought about this as well. Also, IIUC the IEEE library prescribes exceptions as well as return values; e.g. "man 3 log" on my OSX computer says that log(0) returns -inf as well as raise a divide-by-zero exception. 
So I think this is probably compliant with the standard -- one can decide to ignore the exceptions in certain contexts and honor them in others. (Probably even the 1/0 behavior can be defended this way.) > Some other operations behave inconsistently: > >>>> 2 * 10.**308 > inf > > but >>>> 10.**309 > Traceback (most recent call last): > File "", line 1, in > OverflowError: (34, 'Result too large') Probably the same. IEEE 754 may be more complex than you think! > I think non-reflexivity of nan in Python is an accidental feature. It is not. > Python's float type was not designed with NaN in mind and until > recently, it was relatively difficult to create a nan in pure python. And when we did add NaN and Inf we thought about the issues carefully. > It is also not true that IEEE 754 requires that nan == nan is false. > IEEE 754 does not define operator '==' (nor does it define boolean > false). Instead, IEEE defines a comparison operation that can have > one of four results: >, <, =, or unordered. The standard does require > than NaN compares unordered with anything including itself, but it > does not follow that a language that defines an == operator with > boolean results must define it so that nan == nan is false. Are you proposing changes again? Because it sure sounds like you are unhappy with the status quo and will not take an explanation, however authoritative it is. Given a language with the 6 comparisons like Python (and most do), they have to be mapped to the IEEE comparison *somehow*, and I believe we chose one of the most logical translations imaginable (given that nobody likes == and != raising exceptions). -- --Guido van Rossum (python.org/~guido) From mikegraham at gmail.com Mon Oct 8 19:04:00 2012 From: mikegraham at gmail.com (Mike Graham) Date: Mon, 8 Oct 2012 13:04:00 -0400 Subject: [Python-ideas] asyncore: included batteries don't fit In-Reply-To: References: <5072C972.5070207@python.org> Message-ID: On Mon, Oct 8, 2012 at 11:35 AM, Guido van Rossum wrote: > On Mon, Oct 8, 2012 at 5:39 AM, Christian Heimes wrote: >> Python's standard library doesn't contain in interface to I/O Completion >> Ports. I think a common event loop system is a good reason to add IOCP >> if somebody is up for the challenge. >> >> Would you prefer an IOCP wrapper in the stdlib or your own version? >> Twisted has its own Cython based wrapper, some other libraries use a >> libevent-based solution. > > What's an IOCP? It's the non-crappy select equivalent on Windows. Mike From p.f.moore at gmail.com Mon Oct 8 19:07:25 2012 From: p.f.moore at gmail.com (Paul Moore) Date: Mon, 8 Oct 2012 18:07:25 +0100 Subject: [Python-ideas] asyncore: included batteries don't fit In-Reply-To: References: <5072C972.5070207@python.org> Message-ID: On 8 October 2012 18:04, Mike Graham wrote: >> What's an IOCP? > > It's the non-crappy select equivalent on Windows. I/O Completion port, just for clarity :-) Paul. 
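For readers who have not met the two models: with select() you are told
when a socket is *ready* and your code then performs the read or write
itself, whereas with IOCP you start the operation up front and are later
told that it has *completed*. A rough readiness-style sketch, using only
the stdlib select module (purely illustrative, not anyone's proposed API):

    import select

    def readiness_loop(handlers):
        # handlers: {socket: callback}; each callback does the actual
        # non-blocking read/write once its socket is reported ready.
        while handlers:
            readable, _, _ = select.select(list(handlers), [], [], 1.0)
            for sock in readable:
                handlers[sock](sock)   # the program performs the I/O here
        # Under IOCP the kernel performs the I/O into a caller-supplied
        # buffer, and the loop dequeues completion notifications instead.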
From ncoghlan at gmail.com Mon Oct 8 19:41:42 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 8 Oct 2012 23:11:42 +0530 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: References: <20121005202534.5f721292@pitrou.net> <87bogfvrni.fsf@uwakimon.sk.tsukuba.ac.jp> <20121006141858.73b42c38@pitrou.net> Message-ID: On Mon, Oct 8, 2012 at 4:41 PM, Paul Moore wrote: > On 8 October 2012 11:31, Nick Coghlan wrote: >> It's important to remember that you can't readily search for syntactic >> characters or common method names to find out what they mean, and >> these days that kind of thing should be taken into account when >> designing an API. "p.subpath('foo', 'bar')" looks like executable >> pseudocode for creating a new path based on existing one to me, unlike >> "p / 'foo' / 'bar'", "p['foo', 'bar']", or "p.join('foo', 'bar')". > > Until precisely this point in your email, I'd been completely > confused, because I thought that p.supbath(xxx) was some sort of "is > xxx a subpath of p" query. That's OK, I don't set the bar for my mnemonics *that* high: I use Guido's rule that good names are easy to remember once you know what they mean. Being able to guess precisely just from the name is a nice bonus, but not strictly necessary. > It never occurred to me that it was the > os.path.join equivalent operation. In fact, I'm not sure where you got > it from, as I couldn't find it in either the PEP or in pathlib's > documentation. I made it up by using "make subpath" as the reverse of "get relative path". The "is subpath" query could be handled by calling "b.startswith(a)". I'd be fine with "joinpath" as well (that is what path.py uses to avoid the conflict with str.join) > I'm not unhappy with using a method for creating a new path based on > an existing one (none of the operator forms seems particularly > compelling to me) but I really don't like subpath as a name. > > I don't dislike p.join(parts) as it links back nicely to os.path.join. > I can't honestly see anyone getting confused in practice. But I'm not > so convinced that I would want to insist on it. I really don't like it because of the semantic conflict with str.join. That semantic conflict is the reason I only do "from os.path import join as joinpath" or else call it as "os.path.join" - I find that using the bare "join" directly is too hard to interpret when reading code. I consider .append() and .extend() unacceptable for the same reason - they're too closely tied to mutating method semantics on sequences. > -0 on a convenience operator form. Mainly because "only one way to do > it" and the general controversy over which is the best operator to > use, suggests that leaving the operator form out altogether at least > in the initial implementation is the better option. Right, this is my main point as well. The method form *has* to exist. I am *not* convinced that the cute syntactic shorthands actually *improve* readability - they improve *brevity*, but that's not the same thing. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Mon Oct 8 19:59:58 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 8 Oct 2012 23:29:58 +0530 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: References: <20121005202534.5f721292@pitrou.net> Message-ID: On Sat, Oct 6, 2012 at 9:44 PM, Calvin Spealman wrote: > Responding late, but I didn't get a chance to get my very strong > feelings on this proposal in yesterday. > > I do not like it. 
I'll give full disclosure and say that I think our > earlier failure to include the path library in the stdlib has been a > loss for Python and I'll always hope we can fix that one day. I still > hold out hope. > > It feels like this proposal is "make it object oriented, because > object oriented is good" without any actual justification or obvious > problem this solves. The API looks clunky and redundant, and does not > appear to actually improve anything over the facilities in the os.path > module. This takes a lot of things we can already do with paths and > files and remixes them into a not-so intuitive API for the sake of > change, not for the sake of solving a real problem. The PEP needs to better articulate the rationale, but the key points are: - better abstraction and encapsulation of cross-platform logic so file manipulation algorithms written on Windows are more likely to work correctly on POSIX systems (and vice-versa) - improved ability to manipulate paths with Windows semantics on a POSIX system (and vice-versa) - better support for creation of "mock" filesystem APIs > As for specific problems I have with the proposal: > > Frankly, I think not keeping the / operator for joining is a huge > mistake. This is the number one best feature of path and despite that > many people don't like it, it makes sense. It makes our most common > path operation read very close to the actual representation of the > what you're creating. This is great. It trades readability (and discoverability) for brevity. Not good. > Not inheriting from str means that we can't directly path these path > objects to existing code that just expects a string, so we have a > really hard boundary around the edges of this new API. It does not > lend itself well to incrementally transitioning to it from existing > code. It's the exact design philosophy as was used in the creation of the new ipaddress module: the objects in ipaddress must still be converted to a string or integer before they can be passed to other operations (such as the socket module APIs). Strings and integers remain the data interchange formats here as well (although far more focused on strings in the path case). > > The stat operations and other file-facilities tacked on feel out of > place, and limited. Why does it make sense to add these facilities to > path and not other file operations? Why not give me a read method on > paths? or maybe a copy? Putting lots of file facilities on a path > object feels wrong because you can't extend it easily. This is one > place that function(thing) works better than thing.function() Indeed, I'm personally much happier with the "pure" path classes than I am with the ones that can do filesystem manipulation. Having both "p.open(mode)" and "open(str(p), mode)" seems strange. OTOH, I can see the attraction in being able to better fake filesystem access through the method API, so I'm willing to go along with it. > Overall, I'm completely -1 on the whole thing. I find this very hard to square with your enthusiastic support for path.py. Like ipaddr, which needed to clean up its semantic model before it could be included in the standard library (as ipaddress), we need a clean cross-platform semantic model for path objects before a convenience API can be added for manipulating them. Cheers, Nick. 
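To make the "pure path" idea concrete, here is roughly what manipulating
Windows-flavoured paths from a POSIX machine looks like. (The session uses
the pathlib module as it later shipped in Python 3.4, so take the exact
names as an illustration of the idea rather than as the draft PEP's API.)

>>> from pathlib import PureWindowsPath
>>> p = PureWindowsPath('C:/Users') / 'guido' / 'report.txt'
>>> p.parts
('C:\\', 'Users', 'guido', 'report.txt')
>>> p.suffix
'.txt'
>>> str(p)
'C:\\Users\\guido\\report.txt'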
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From steve at pearwood.info Mon Oct 8 20:23:45 2012 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 09 Oct 2012 05:23:45 +1100 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: References: <20121005202534.5f721292@pitrou.net> <87bogfvrni.fsf@uwakimon.sk.tsukuba.ac.jp> <20121006141858.73b42c38@pitrou.net> Message-ID: <50731A31.30606@pearwood.info> On 08/10/12 21:31, Nick Coghlan wrote: > I've said before that I like the general shape of the pathlib API and > that's still the case. It's the only OO API I've seen that's > semantically clean enough for me to support introducing it as "the" > standard path abstraction in the standard library. The use of indexing to join path components: # Example from the PEP >>> p = PurePosixPath('foo') >>> p['bar'] PurePosixPath('foo/bar') is an absolute deal breaker for me. I'd rather stick with the status quo than have to deal with something which so clearly shouts "index/key lookup" but does something radically different (join/concatenate components). I would *much* rather use the / or + operator, but I think even better (and less likely to cause arguments about the operator) is an explicit `join` method. After all, we call it "joining path components", so the name is intuitive (at least for English speakers) and simple. I don't believe that there will be confusion with str.join -- we already have an os.path.join method, and I haven't seen any sign of confusion caused by that. [...] > It's important to remember that you can't readily search for syntactic > characters or common method names to find out what they mean, and > these days that kind of thing should be taken into account when > designing an API. To some degree, that's a failure of the search engine, not of the language. Why can't we type "symbol=+" into the search field and get information about addition? If Google can let you do mathematical calculations in their search field, surely we could search for symbols? But I digress. >"p.subpath('foo', 'bar')" looks like executable > pseudocode for creating a new path based on existing one to me, That notation quite possibly goes beyond unintuitive to downright perverse. You are using a method called "subpath" to generate a *superpath* (deeper, longer path which includes p as a part). http://en.wiktionary.org/wiki/subpath Given: p = /a/b/c q = /a/b/c/d/e # p.subpath(d, e) p is a subpath of q, not the other way around: q is a path PLUS some subdirectories of that path, i.e. a longer path. It's also a pretty unusual term outside of graph theory: Googling finds fewer than 400,000 references to "subpath". It gets used in graphics applications, some games, and in an extension to mercurial for adding symbolic names to repo URLs. I can't see any sign that it is used in the sense you intend. > unlike > "p / 'foo' / 'bar'", "p['foo', 'bar']", or "p.join('foo', 'bar')". Okay, I'll grant you that we'll probably never get a consensus on operators + versus / but I really don't understand why you think that p.join is unsuitable for a method which joins path components. 
-- Steven From solipsis at pitrou.net Mon Oct 8 20:36:37 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 8 Oct 2012 20:36:37 +0200 Subject: [Python-ideas] asyncore: included batteries don't fit References: <5072C972.5070207@python.org> Message-ID: <20121008203637.5b0c147d@pitrou.net> On Mon, 8 Oct 2012 13:04:00 -0400 Mike Graham wrote: > On Mon, Oct 8, 2012 at 11:35 AM, Guido van Rossum wrote: > > On Mon, Oct 8, 2012 at 5:39 AM, Christian Heimes wrote: > >> Python's standard library doesn't contain in interface to I/O Completion > >> Ports. I think a common event loop system is a good reason to add IOCP > >> if somebody is up for the challenge. > >> > >> Would you prefer an IOCP wrapper in the stdlib or your own version? > >> Twisted has its own Cython based wrapper, some other libraries use a > >> libevent-based solution. > > > > What's an IOCP? > > It's the non-crappy select equivalent on Windows. Except that it's not exactly an equivalent, it's a whole different programming model ;) (but I understand what you mean: it allows to do non-blocking I/O on an arbitrary number of objects in parallel) Regards Antoine. -- Software development and contracting: http://pro.pitrou.net From guido at python.org Mon Oct 8 20:39:03 2012 From: guido at python.org (Guido van Rossum) Date: Mon, 8 Oct 2012 11:39:03 -0700 Subject: [Python-ideas] checking for identity before comparing built-in objects In-Reply-To: References: <506D94EE.30808@pearwood.info> <50723BE5.3060300@nedbatchelder.com> <507243D2.8000505@btinternet.com> <507308A6.60109@btinternet.com> Message-ID: On Mon, Oct 8, 2012 at 10:36 AM, Guido van Rossum wrote: > >>> It's not about equality. If you ask whether two NaNs are *unequal* the >>> answer is *also* False. >> >> Does this mean that the following behaviour of lists is a bug? >> >>> x=float('NAN') >> >>> [x]==[x], [x]<=[x], [x]>=[x] >> (True, True, True) > > No. That's a special case in the comparisons for sequences. [Now that I'm back at a real keyboard I can elaborate...] This applies to all container comparisons: without the rule that if two contained items reference the same object they are to be considered equal without calling their __eq__, containers couldn't take the shortcut that a container is always equal to itself (i.e. c1 is c2 => c1 == c2). Without this shortcut, container comparisons would be much more expensive: any time a large container was compared to itself, it would be forced to recursively compare all the contained items. You might say that it has to do this anyway when comparing to a container that is not itself, but if the anser is "unequal" the comparison can stop as soon as two unequal items are found, whereas if the answer is "equal" you end up comparing all items. For two different containers there is no possible shortcut, but comparing a container to itself is quite common and really does deserve the shortcut. We discussed this in the past and always came to the same conclusion: despite the rules for NaN, the shortcut for containers is required. A similar shortcut exists for 'x in [x]' BTW. 
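A quick interpreter session makes the shortcut visible (CPython, any
recent version):

>>> x = float('nan')
>>> x == x
False
>>> [x] == [x]              # same object in both lists: identity shortcut
True
>>> x in [x]                # membership testing takes the same shortcut
True
>>> [x] == [float('nan')]   # two distinct NaN objects: falls back to ==
False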
-- --Guido van Rossum (python.org/~guido) From ncoghlan at gmail.com Mon Oct 8 20:39:23 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 9 Oct 2012 00:09:23 +0530 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: <50731A31.30606@pearwood.info> References: <20121005202534.5f721292@pitrou.net> <87bogfvrni.fsf@uwakimon.sk.tsukuba.ac.jp> <20121006141858.73b42c38@pitrou.net> <50731A31.30606@pearwood.info> Message-ID: On Mon, Oct 8, 2012 at 11:53 PM, Steven D'Aprano wrote: >> "p.subpath('foo', 'bar')" looks like executable >> pseudocode for creating a new path based on existing one to me, > > > That notation quite possibly goes beyond unintuitive to downright > perverse. You are using a method called "subpath" to generate a > *superpath* (deeper, longer path which includes p as a part). Huh? It's a tree structure. A subpath lives inside its parent path, just as subnodes are children of their parent node. Agreed it's not a widely used term though - it's a generalisation of subdirectory to also cover file paths. They're certainly not "super" anything, any more than a subdirectory is really a superdirectory (which is what you appear to be arguing). > Okay, I'll grant you that we'll probably never get a consensus on > operators + versus / but I really don't understand why you think that > p.join is unsuitable for a method which joins path components. "p.join(r)" has exactly the same problem as "p + r": pass in a string to a function expecting a path object and you get data corruption instead of an exception. When you want *different* semantics, then ducktyping is your enemy and it's necessary to take steps to avoid it, include changing method names and avoiding some operators. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From guido at python.org Mon Oct 8 20:40:28 2012 From: guido at python.org (Guido van Rossum) Date: Mon, 8 Oct 2012 11:40:28 -0700 Subject: [Python-ideas] asyncore: included batteries don't fit In-Reply-To: <20121008203637.5b0c147d@pitrou.net> References: <5072C972.5070207@python.org> <20121008203637.5b0c147d@pitrou.net> Message-ID: On Mon, Oct 8, 2012 at 11:36 AM, Antoine Pitrou wrote: > On Mon, 8 Oct 2012 13:04:00 -0400 > Mike Graham wrote: >> On Mon, Oct 8, 2012 at 11:35 AM, Guido van Rossum wrote: >> > On Mon, Oct 8, 2012 at 5:39 AM, Christian Heimes wrote: >> >> Python's standard library doesn't contain in interface to I/O Completion >> >> Ports. I think a common event loop system is a good reason to add IOCP >> >> if somebody is up for the challenge. >> >> >> >> Would you prefer an IOCP wrapper in the stdlib or your own version? >> >> Twisted has its own Cython based wrapper, some other libraries use a >> >> libevent-based solution. >> > >> > What's an IOCP? >> >> It's the non-crappy select equivalent on Windows. > > Except that it's not exactly an equivalent, it's a whole different > programming model ;) > > (but I understand what you mean: it allows to do non-blocking I/O on an > arbitrary number of objects in parallel) Now I know what it is I think that (a) the abstract reactor design should support IOCP, and (b) the stdlib should have enabled by default IOCP when on Windows. 
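A stdlib "best practice" loop along those lines would presumably pick its
polling mechanism at startup; a minimal sketch of just the selection step,
using only what the select module exposes (illustrative only -- there is
no stdlib IOCP wrapper to pick on Windows yet):

    import select

    def choose_polling_mechanism():
        # Prefer the most capable readiness API the platform provides.
        if hasattr(select, 'epoll'):    # Linux
            return 'epoll'
        if hasattr(select, 'kqueue'):   # BSD / OS X
            return 'kqueue'
        if hasattr(select, 'poll'):     # most other POSIX systems
            return 'poll'
        return 'select'                 # lowest common denominator (and
                                        # Windows, until an IOCP wrapper exists)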
--
--Guido van Rossum (python.org/~guido)

From solipsis at pitrou.net Mon Oct 8 20:40:14 2012
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Mon, 8 Oct 2012 20:40:14 +0200
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
References: <20121005202534.5f721292@pitrou.net> <20121005191625.GA23607@iskra.aviel.ru> <20121005195327.GG8974@mcnabbs.org> <506F6201.4000503@pearwood.info> <20121005235457.GA7755@mcnabbs.org> <506F813D.2050305@canterbury.ac.nz> <20121006214540.GB20907@mcnabbs.org> <5072CF3B.2070203@pearwood.info> <20121008160617.GA1527@mcnabbs.org>
Message-ID: <20121008204014.10ba901e@pitrou.net>

On Mon, 8 Oct 2012 10:06:17 -0600
Andrew McNabb wrote:
>
> Since this really is a matter of personal taste, I'll end my
> participation in this discussion by voicing support for Nick Coghlan's
> suggestion of a `join` method, whether it's named `join` or `append` or
> something else.

The join() method already exists in the current PEP, but it's less
convenient, syntactically, than either '[]' or '/'.

Regards

Antoine.

--
Software development and contracting: http://pro.pitrou.net

From p.f.moore at gmail.com Mon Oct 8 20:47:43 2012
From: p.f.moore at gmail.com (Paul Moore)
Date: Mon, 8 Oct 2012 19:47:43 +0100
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
In-Reply-To: 
References: <20121005202534.5f721292@pitrou.net> <87bogfvrni.fsf@uwakimon.sk.tsukuba.ac.jp> <20121006141858.73b42c38@pitrou.net> <50731A31.30606@pearwood.info>
Message-ID: 

On 8 October 2012 19:39, Nick Coghlan wrote:
>> Okay, I'll grant you that we'll probably never get a consensus on
>> operators + versus / but I really don't understand why you think that
>> p.join is unsuitable for a method which joins path components.
>
> "p.join(r)" has exactly the same problem as "p + r": pass in a string
> to a function expecting a path object and you get data corruption
> instead of an exception. When you want *different* semantics, then
> ducktyping is your enemy and it's necessary to take steps to avoid it,
> include changing method names and avoiding some operators.

Ah, OK. I understand your objection now. I concede that Path.join() is a
bad idea based on this. I still don't like subpath() though. And
pathjoin() is too likely to be redundant in real code:

temp_path = Path(tempfile.mkdtemp())
generated_file = temp_path.pathjoin('data_extract.csv')

I can't think of a better term, though :-(
Paul

From massimo.dipierro at gmail.com Mon Oct 8 20:48:05 2012
From: massimo.dipierro at gmail.com (massimo.dipierro at gmail.com)
Date: Mon, 8 Oct 2012 11:48:05 -0700 (PDT)
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
Message-ID: <45903572.1732.1349722088578.JavaMail.seven@ap8.p0.sjc.7sys.net>

An HTML attachment was scrubbed...
URL: From ncoghlan at gmail.com Mon Oct 8 20:49:03 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 9 Oct 2012 00:19:03 +0530 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: <20121008204014.10ba901e@pitrou.net> References: <20121005202534.5f721292@pitrou.net> <20121005191625.GA23607@iskra.aviel.ru> <20121005195327.GG8974@mcnabbs.org> <506F6201.4000503@pearwood.info> <20121005235457.GA7755@mcnabbs.org> <506F813D.2050305@canterbury.ac.nz> <20121006214540.GB20907@mcnabbs.org> <5072CF3B.2070203@pearwood.info> <20121008160617.GA1527@mcnabbs.org> <20121008204014.10ba901e@pitrou.net> Message-ID: On Tue, Oct 9, 2012 at 12:10 AM, Antoine Pitrou wrote: > On Mon, 8 Oct 2012 10:06:17 -0600 > Andrew McNabb wrote: >> >> Since this really is a matter of personal taste, I'll end my >> participation in this discussion by voicing support for Nick Coghlan's >> suggestion of a `join` method, whether it's named `join` or `append` or >> something else. > > The join() method already exists in the current PEP, but it's less > convenient, synctatically, than either '[]' or '/'. Right. My objections boil down to: 1. The case has not been adequately made that a second way to do it is needed. Therefore, the initial version should just include the method API. 2. Using "join" as the method name is a bad idea for the same reason that using "+" as the operator syntax would be a bad idea: it can cause erroneous output instead of an exception if a string is passed where a Path object is expected. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From solipsis at pitrou.net Mon Oct 8 20:47:07 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 8 Oct 2012 20:47:07 +0200 Subject: [Python-ideas] PEP 428: poll about the joining syntax Message-ID: <20121008204707.48559bf9@pitrou.net> Hello, Since there has been some controversy about the joining syntax used in PEP 428 (filesystem path objects), I would like to run an informal poll about it. Please answer with +1/+0/-0/-1 for each proposal: - `p[q]` joins path q to path p - `p + q` joins path q to path p - `p / q` joins path q to path p - `p.join(q)` joins path q to path p (you can include a rationale if you want, but don't forget to vote :-)) Thank you Antoine. -- Software development and contracting: http://pro.pitrou.net From guido at python.org Mon Oct 8 20:53:11 2012 From: guido at python.org (Guido van Rossum) Date: Mon, 8 Oct 2012 11:53:11 -0700 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: References: <20121005202534.5f721292@pitrou.net> <20121005191625.GA23607@iskra.aviel.ru> <20121005195327.GG8974@mcnabbs.org> <506F6201.4000503@pearwood.info> <20121005235457.GA7755@mcnabbs.org> <506F813D.2050305@canterbury.ac.nz> <20121006214540.GB20907@mcnabbs.org> <5072CF3B.2070203@pearwood.info> <20121008160617.GA1527@mcnabbs.org> <20121008204014.10ba901e@pitrou.net> Message-ID: On Mon, Oct 8, 2012 at 11:49 AM, Nick Coghlan wrote: > On Tue, Oct 9, 2012 at 12:10 AM, Antoine Pitrou wrote: >> On Mon, 8 Oct 2012 10:06:17 -0600 >> Andrew McNabb wrote: >>> >>> Since this really is a matter of personal taste, I'll end my >>> participation in this discussion by voicing support for Nick Coghlan's >>> suggestion of a `join` method, whether it's named `join` or `append` or >>> something else. >> >> The join() method already exists in the current PEP, but it's less >> convenient, synctatically, than either '[]' or '/'. > > Right. My objections boil down to: > > 1. 
The case has not been adequately made that a second way to do it is
> needed. Therefore, the initial version should just include the method
> API.
>
> 2. Using "join" as the method name is a bad idea for the same reason
> that using "+" as the operator syntax would be a bad idea: it can
> cause erroneous output instead of an exception if a string is passed
> where a Path object is expected.

It took me a while before I realized that 'abc'.join('def') already has a
meaning (returning 'dabceabcf'). But yes, this makes it a poor choice for
a Path method.

--
--Guido van Rossum (python.org/~guido)

From guido at python.org Mon Oct 8 20:54:06 2012
From: guido at python.org (Guido van Rossum)
Date: Mon, 8 Oct 2012 11:54:06 -0700
Subject: [Python-ideas] PEP 428: poll about the joining syntax
In-Reply-To: <20121008204707.48559bf9@pitrou.net>
References: <20121008204707.48559bf9@pitrou.net>
Message-ID: 

I don't like any of those; I'd vote for another regular method, maybe
p.pathjoin(q).

On Mon, Oct 8, 2012 at 11:47 AM, Antoine Pitrou wrote:
>
> Hello,
>
> Since there has been some controversy about the joining syntax used in
> PEP 428 (filesystem path objects), I would like to run an informal poll
> about it. Please answer with +1/+0/-0/-1 for each proposal:
>
> - `p[q]` joins path q to path p
> - `p + q` joins path q to path p
> - `p / q` joins path q to path p
> - `p.join(q)` joins path q to path p
>
> (you can include a rationale if you want, but don't forget to vote :-))
>
> Thank you
>
> Antoine.
>
>
> --
> Software development and contracting: http://pro.pitrou.net
>
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas

--
--Guido van Rossum (python.org/~guido)

From solipsis at pitrou.net Mon Oct 8 20:51:39 2012
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Mon, 8 Oct 2012 20:51:39 +0200
Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths
References: <20121005202534.5f721292@pitrou.net> <87bogfvrni.fsf@uwakimon.sk.tsukuba.ac.jp> <20121006141858.73b42c38@pitrou.net> <50731A31.30606@pearwood.info>
Message-ID: <20121008205139.3e3c7463@pitrou.net>

On Tue, 9 Oct 2012 00:09:23 +0530
Nick Coghlan wrote:
> On Mon, Oct 8, 2012 at 11:53 PM, Steven D'Aprano wrote:
> >> "p.subpath('foo', 'bar')" looks like executable
> >> pseudocode for creating a new path based on existing one to me,
> >
> >
> > That notation quite possibly goes beyond unintuitive to downright
> > perverse. You are using a method called "subpath" to generate a
> > *superpath* (deeper, longer path which includes p as a part).
>
> Huh? It's a tree structure. A subpath lives inside its parent path,
> just as subnodes are children of their parent node. Agreed it's not a
> widely used term though - it's a generalisation of subdirectory to
> also cover file paths.

Well, it's a "subpath", except when it isn't:

>>> p = Path('a')
>>> p.join('/b')
PosixPath('/b')

I have to admit I didn't understand what you meant by "subpath" until you
explained that it was another name for "join". I really don't think it's
a good name.

child() would be a good name, except for the case above where you join
with an absolute path. Actually, child() could be a variant of join()
which wouldn't allow for absolute arguments.

Regards

Antoine.
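For reference, the behaviour shown above -- joining with an absolute
argument simply discards the base -- is also what the pure classes do in
the pathlib module that eventually grew out of this PEP (session shown
with the released API, purely as an illustration):

>>> from pathlib import PurePosixPath
>>> PurePosixPath('a') / 'b'
PurePosixPath('a/b')
>>> PurePosixPath('a') / '/b'    # absolute argument wins; 'a' is dropped
PurePosixPath('/b')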
-- Software development and contracting: http://pro.pitrou.net From solipsis at pitrou.net Mon Oct 8 20:56:34 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 8 Oct 2012 20:56:34 +0200 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths References: <20121005202534.5f721292@pitrou.net> <20121005191625.GA23607@iskra.aviel.ru> <20121005195327.GG8974@mcnabbs.org> <506F6201.4000503@pearwood.info> <20121005235457.GA7755@mcnabbs.org> <506F813D.2050305@canterbury.ac.nz> <20121006214540.GB20907@mcnabbs.org> <5072CF3B.2070203@pearwood.info> <20121008160617.GA1527@mcnabbs.org> <20121008204014.10ba901e@pitrou.net> Message-ID: <20121008205634.113419ea@pitrou.net> On Tue, 9 Oct 2012 00:19:03 +0530 Nick Coghlan wrote: > > > > The join() method already exists in the current PEP, but it's less > > convenient, synctatically, than either '[]' or '/'. > > Right. My objections boil down to: > > 1. The case has not been adequately made that a second way to do it is > needed. Therefore, the initial version should just include the method > API. But you really want a short method name, otherwise it's better to have a dedicated operator. joinpath() definitely doesn't cut it, IMO. (perhaps that's the same reason I am reluctant to use str.format() :-)) By the way, I also thought of using __call__, but for some reason I think it tastes a bit bad ("too clever"?). > 2. Using "join" as the method name is a bad idea for the same reason > that using "+" as the operator syntax would be a bad idea: it can > cause erroneous output instead of an exception if a string is passed > where a Path object is expected. Admitted, although I think the potential for confusion is smaller than with "+" (I can't really articulate why, it's just that I fear one much less than the other :-)). Regards Antione. -- Software development and contracting: http://pro.pitrou.net From guido at python.org Mon Oct 8 21:04:56 2012 From: guido at python.org (Guido van Rossum) Date: Mon, 8 Oct 2012 12:04:56 -0700 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: <20121008205634.113419ea@pitrou.net> References: <20121005202534.5f721292@pitrou.net> <20121005191625.GA23607@iskra.aviel.ru> <20121005195327.GG8974@mcnabbs.org> <506F6201.4000503@pearwood.info> <20121005235457.GA7755@mcnabbs.org> <506F813D.2050305@canterbury.ac.nz> <20121006214540.GB20907@mcnabbs.org> <5072CF3B.2070203@pearwood.info> <20121008160617.GA1527@mcnabbs.org> <20121008204014.10ba901e@pitrou.net> <20121008205634.113419ea@pitrou.net> Message-ID: On Mon, Oct 8, 2012 at 11:56 AM, Antoine Pitrou wrote: > On Tue, 9 Oct 2012 00:19:03 +0530 > Nick Coghlan wrote: >> > >> > The join() method already exists in the current PEP, but it's less >> > convenient, synctatically, than either '[]' or '/'. >> >> Right. My objections boil down to: >> >> 1. The case has not been adequately made that a second way to do it is >> needed. Therefore, the initial version should just include the method >> API. > > But you really want a short method name, otherwise it's better to have > a dedicated operator. joinpath() definitely doesn't cut it, IMO. Maybe you're overreacting? The current notation for this operation is os.path.join(p, q) which is even longer than p.pathjoin(q). To me the latter is fine. > (perhaps that's the same reason I am reluctant to use str.format() :-)) > > By the way, I also thought of using __call__, but for some reason I > think it tastes a bit bad ("too clever"?). __call__ overloading is often overused. Please don't go there. 
It is really hard to figure out what some (semi-)obscure operation means if it uses __call__ overloading. >> 2. Using "join" as the method name is a bad idea for the same reason >> that using "+" as the operator syntax would be a bad idea: it can >> cause erroneous output instead of an exception if a string is passed >> where a Path object is expected. > > Admitted, although I think the potential for confusion is smaller > than with "+" (I can't really articulate why, it's just that I fear > one much less than the other :-)). Personally I fear '+' much more -- to me, + can be used to add an extension without adding a new directory level. If we *have* to overload an operator, I'd prefer p/q over p[q] any day. -- --Guido van Rossum (python.org/~guido) From stefan at bytereef.org Mon Oct 8 21:14:44 2012 From: stefan at bytereef.org (Stefan Krah) Date: Mon, 8 Oct 2012 21:14:44 +0200 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: References: <506F6201.4000503@pearwood.info> <20121005235457.GA7755@mcnabbs.org> <506F813D.2050305@canterbury.ac.nz> <20121006214540.GB20907@mcnabbs.org> <5072CF3B.2070203@pearwood.info> <20121008160617.GA1527@mcnabbs.org> <20121008204014.10ba901e@pitrou.net> <20121008205634.113419ea@pitrou.net> Message-ID: <20121008191444.GA28668@sleipnir.bytereef.org> Guido van Rossum wrote: > Personally I fear '+' much more -- to me, + can be used to add an > extension without adding a new directory level. If we *have* to > overload an operator, I'd prefer p/q over p[q] any day. '^' or '@' are used for concatenation in some languages. At least accidental confusion with xor is pretty unlikely. Stefan Krah From ncoghlan at gmail.com Mon Oct 8 21:15:41 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 9 Oct 2012 00:45:41 +0530 Subject: [Python-ideas] PEP 428: poll about the joining syntax In-Reply-To: References: <20121008204707.48559bf9@pitrou.net> Message-ID: On Tue, Oct 9, 2012 at 12:24 AM, Guido van Rossum wrote: > I don't like any of those; I'd vote for another regular method, maybe > p.pathjoin(q). My own current preference is to take "p.joinpath(q)" straight from path.py (https://github.com/jaraco/path.py/blob/master/path.py#L236). My rationale for disliking all of the poll options (clarified during the previous discussions, so I can summarise it better now): "p[q]", "p + q", "p / q": A method API is desirable *anyway* (for better integration with all the tools that deal with callables in general), and no compelling justification has been provided for offering two ways to do it (mere brevity when writing doesn't cut it, when the result is something that is more cryptic when reading and learning). "p + q", "p.join(q)": passing strings where path objects are needed is expected to be a common error mode, especially for people just starting to use the new API. It is desirable that such errors produce an exception rather than silently producing an incorrect string. I don't *love* joinpath as a name, I just don't actively dislike it the way I do the four presented options (and it has the virtue of the path.py precedent). Cheers, Nick. 
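A small sketch of the error modes Nick describes, using a deliberately minimal stand-in for the proposed class (this is not the PEP's implementation, just an illustration):

    class FakePath:
        # minimal stand-in for the proposed Path type
        def __init__(self, s):
            self.s = s
        def __truediv__(self, other):
            return FakePath(self.s.rstrip('/') + '/' + str(other))
        joinpath = __truediv__
        def __repr__(self):
            return 'FakePath(%r)' % self.s

    good = FakePath('/usr')
    print(good / 'local')            # FakePath('/usr/local')
    print(good.joinpath('local'))    # FakePath('/usr/local')

    bad = '/usr'                     # oops: a plain string slipped in
    print(bad + 'local')             # '/usrlocal' -- silently wrong
    # bad / 'local'                  -> TypeError
    # bad.joinpath('local')          -> AttributeError

With "+" the mistake produces a plausible-looking but wrong string; with "/" or a method it fails immediately.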
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ethan at stoneleaf.us Mon Oct 8 20:58:50 2012 From: ethan at stoneleaf.us (Ethan Furman) Date: Mon, 08 Oct 2012 11:58:50 -0700 Subject: [Python-ideas] PEP 428: poll about the joining syntax In-Reply-To: <20121008204707.48559bf9@pitrou.net> References: <20121008204707.48559bf9@pitrou.net> Message-ID: <5073226A.50307@stoneleaf.us> `p[q]` -1 `p + q` -1 ('+' should just tack on to the filename field) `p / q` +1 `p.join(q)` +0 From phd at phdru.name Mon Oct 8 21:17:16 2012 From: phd at phdru.name (Oleg Broytman) Date: Mon, 8 Oct 2012 23:17:16 +0400 Subject: [Python-ideas] PEP 428: poll about the joining syntax In-Reply-To: <20121008204707.48559bf9@pitrou.net> References: <20121008204707.48559bf9@pitrou.net> Message-ID: <20121008191716.GA29859@iskra.aviel.ru> On Mon, Oct 08, 2012 at 08:47:07PM +0200, Antoine Pitrou wrote: > - `p[q]` joins path q to path p -1. Confusing with p[-2] > - `p + q` joins path q to path p -0. What is "path addition"? Concatenation? Joining? Puzzled... > - `p / q` joins path q to path p +0. Again, "path division" is a bit strange but at least I understand '/' is the separation symbol. > - `p.join(q)` joins path q to path p +1. That one I love best, even with the name "join". I used to use os.path.join() quite extensively so there is no chance I confuse that with str.join(). Oleg. -- Oleg Broytman http://phdru.name/ phd at phdru.name Programmers don't die, they just GOSUB without RETURN. From dreamingforward at gmail.com Mon Oct 8 21:20:57 2012 From: dreamingforward at gmail.com (Mark Adam) Date: Mon, 8 Oct 2012 14:20:57 -0500 Subject: [Python-ideas] asyncore: included batteries don't fit In-Reply-To: References: Message-ID: On Sun, Oct 7, 2012 at 9:01 PM, Guido van Rossum wrote: > On Sun, Oct 7, 2012 at 6:41 PM, Ben Darnell wrote: >> I think there are >> actually two separate issues here and it's important to keep them >> distinct: at a low level, there is a need for a standardized event >> loop, while at a higher level there is a question of what asynchronous >> code should look like. > > Yes, yes. I tried to bring up thing distinction. I'm glad I didn't > completely fail. Perhaps this is obvious to others, but (like hinted at above) there seem to be two primary issues with event handlers: 1) event handlers for the machine-program interface (ex. network I/O) 2) event handlers for the program-user interface (ex. mouse I/O) While similar, my gut tell me they have to be handled in completely different way in order to preserve order (i.e. sanity). This issue, for me, has come up with wanting to make a p2p network application with VPython. MarkJ From python at mrabarnett.plus.com Mon Oct 8 21:22:06 2012 From: python at mrabarnett.plus.com (MRAB) Date: Mon, 08 Oct 2012 20:22:06 +0100 Subject: [Python-ideas] PEP 428: poll about the joining syntax In-Reply-To: <20121008204707.48559bf9@pitrou.net> References: <20121008204707.48559bf9@pitrou.net> Message-ID: <507327DE.2020302@mrabarnett.plus.com> On 2012-10-08 19:47, Antoine Pitrou wrote: > > Hello, > > Since there has been some controversy about the joining syntax used in > PEP 428 (filesystem path objects), I would like to run an informal poll > about it. Please answer with +1/+0/-0/-1 for each proposal: > > - `p[q]` joins path q to path p -1. I would much prefer subscripting to be used to slice paths, e.g. p[-1] == os.path.basename(p). > - `p + q` joins path q to path p +0. I would prefer that to mean "join without directory separator", e.g. 
Path("/foo/bar") + ".txt" == Path("/foo/bar.txt"). > - `p / q` joins path q to path p +1. Join with directory separator, e.g. Path("/foo") / "bar" == Path("/foo/bar"). > - `p.join(q)` joins path q to path p +0 > > (you can include a rationale if you want, but don't forget to vote :-)) > From ncoghlan at gmail.com Mon Oct 8 21:24:03 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 9 Oct 2012 00:54:03 +0530 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: References: <20121005202534.5f721292@pitrou.net> <20121005191625.GA23607@iskra.aviel.ru> <20121005195327.GG8974@mcnabbs.org> <506F6201.4000503@pearwood.info> <20121005235457.GA7755@mcnabbs.org> <506F813D.2050305@canterbury.ac.nz> <20121006214540.GB20907@mcnabbs.org> <5072CF3B.2070203@pearwood.info> <20121008160617.GA1527@mcnabbs.org> <20121008204014.10ba901e@pitrou.net> <20121008205634.113419ea@pitrou.net> Message-ID: On Tue, Oct 9, 2012 at 12:34 AM, Guido van Rossum wrote: > On Mon, Oct 8, 2012 at 11:56 AM, Antoine Pitrou wrote: >> Admitted, although I think the potential for confusion is smaller >> than with "+" (I can't really articulate why, it's just that I fear >> one much less than the other :-)). > > Personally I fear '+' much more -- to me, + can be used to add an > extension without adding a new directory level. If we *have* to > overload an operator, I'd prefer p/q over p[q] any day. Yes, of all the syntactic shorthands, I also favour "/". However, I'm also a big fan of starting with a minimalist core and growing it. Moving from "os.path.join(a, b, c, d, e)" (or, the way I often write it, "joinpath(a, b, c, d, e)") to "a.joinpath(b, c, d, e)" at least isn't going backwards, and is more obvious in isolation than "a / b / c / d / e". Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From amcnabb at mcnabbs.org Mon Oct 8 21:25:46 2012 From: amcnabb at mcnabbs.org (Andrew McNabb) Date: Mon, 8 Oct 2012 13:25:46 -0600 Subject: [Python-ideas] PEP 428: poll about the joining syntax In-Reply-To: <20121008204707.48559bf9@pitrou.net> References: <20121008204707.48559bf9@pitrou.net> Message-ID: <20121008192546.GE1527@mcnabbs.org> On Mon, Oct 08, 2012 at 08:47:07PM +0200, Antoine Pitrou wrote: > > - `p[q]` joins path q to path p -1 > - `p + q` joins path q to path p -1 (or +0 if q is forbidden from being a string) > - `p / q` joins path q to path p -1 > - `p.join(q)` joins path q to path p +1 -- Andrew McNabb http://www.mcnabbs.org/andrew/ PGP Fingerprint: 8A17 B57C 6879 1863 DE55 8012 AB4D 6098 8826 6868 From ncoghlan at gmail.com Mon Oct 8 21:29:17 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 9 Oct 2012 00:59:17 +0530 Subject: [Python-ideas] PEP 428: poll about the joining syntax In-Reply-To: References: <20121008204707.48559bf9@pitrou.net> Message-ID: Reducing to numeric votes: p[q]: -1 (confusing w.r.t to indexing/slicing, not convinced it is needed) p + q : -1 (confusing w.r.t to strings, not convinced it is needed) p / q : -0 (not convinced it is needed) p.join(q): -0 (confusing w.r.t strings) p.joinpath(q): +1 (avoids confusion, path.py precedent, need a method API anyway) Cheers, Nick. 
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From joshua.landau.ws at gmail.com Mon Oct 8 21:41:30 2012 From: joshua.landau.ws at gmail.com (Joshua Landau) Date: Mon, 8 Oct 2012 20:41:30 +0100 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: <20121008191444.GA28668@sleipnir.bytereef.org> References: <506F6201.4000503@pearwood.info> <20121005235457.GA7755@mcnabbs.org> <506F813D.2050305@canterbury.ac.nz> <20121006214540.GB20907@mcnabbs.org> <5072CF3B.2070203@pearwood.info> <20121008160617.GA1527@mcnabbs.org> <20121008204014.10ba901e@pitrou.net> <20121008205634.113419ea@pitrou.net> <20121008191444.GA28668@sleipnir.bytereef.org> Message-ID: On 8 October 2012 20:14, Stefan Krah wrote: > Guido van Rossum wrote: > > Personally I fear '+' much more -- to me, + can be used to add an > > extension without adding a new directory level. If we *have* to > > overload an operator, I'd prefer p/q over p[q] any day. > > '^' or '@' are used for concatenation in some languages. At least > accidental > confusion with xor is pretty unlikely. > On the basis that we want standard libraries to be non-contentious issues: is it not obvious that "+", "/" and "[]" *cannot* be the right choices as they're contentious? I would argue that a lot of this argument is ?pointless? because there is no right answer. For example, I prefer indexing out of the lot, but since a lot of people really dislike it I'm not going to bother vouching for it. I think we should ague more along the lines of: # Possibility for accidental validity if configdir is a string > configdir.join("myprogram") # A bit long > # My personal objection is that one shouldn't have to state "path" in the > name: it's not str.stringjoin() > configdir.joinpath("myprogram") > configdir.pathjoin("myprogram") # There's argument here, but I don't find them intuitive or nice > configdir.subpath("mypogram") > configdir.superpath("mypogram") # My favorites ('cause my opinion: so there) > configdir.child("myprogram") # Does sorta' imply IO > configdir.get("myprogram") # 'Cause it's short, but it does sorta' imply > IO > configdir.goto("myprogam") # "GOTO IS BAD!! BOO!" # What I'm surprised (but half-glad) hasn't been mentioned configdir.cd("myprogam") # Not a link, just GMail's silly-ness We already know the semantics for the function; now it's *just a name*. -------------- next part -------------- An HTML attachment was scrubbed... URL: From mikegraham at gmail.com Mon Oct 8 21:44:54 2012 From: mikegraham at gmail.com (Mike Graham) Date: Mon, 8 Oct 2012 15:44:54 -0400 Subject: [Python-ideas] Make "is" checks on non-singleton literals errors Message-ID: I regularly see learners using "is" to check for string equality and sometimes other equality. Due to optimizations, they often come away thinking it worked for them. There are no cases where if x is "foo": or if x is 4: is actually the code someone intended to write. Although this has no benefit to anyone but new learners, it also doesn't really do any harm. 
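A quick demonstration of the optimisation-dependent behaviour that misleads learners (the exact results are CPython implementation details and may vary):

    >>> x = "foo"
    >>> x is "foo"                     # may be True thanks to interning
    True
    >>> y = "".join(["f", "o", "o"])
    >>> y == "foo"
    True
    >>> y is "foo"                     # built at runtime, typically not interned
    False

The equality test is what was meant in both cases; the identity test only happens to agree in the first.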
Mike From guido at python.org Mon Oct 8 21:46:43 2012 From: guido at python.org (Guido van Rossum) Date: Mon, 8 Oct 2012 12:46:43 -0700 Subject: [Python-ideas] checking for identity before comparing built-in objects In-Reply-To: <50732385.2090800@btinternet.com> References: <506D94EE.30808@pearwood.info> <50723BE5.3060300@nedbatchelder.com> <507243D2.8000505@btinternet.com> <507308A6.60109@btinternet.com> <50732385.2090800@btinternet.com> Message-ID: On Mon, Oct 8, 2012 at 12:03 PM, Rob Cliffe wrote: > > On 08/10/2012 19:39, Guido van Rossum wrote: >> >> Does this mean that the following behaviour of lists is a bug? >>>>>>> >>>>>>> x=float('NAN') >>>>>>> [x]==[x], [x]<=[x], [x]>=[x] >>>> >>>> (True, True, True) >>> >>> No. That's a special case in the comparisons for sequences. >> >> [Now that I'm back at a real keyboard I can elaborate...] >> >> This applies to all container comparisons: without the rule that if >> two contained items reference the same object they are to be >> considered equal without calling their __eq__, containers couldn't >> take the shortcut that a container is always equal to itself (i.e. c1 >> is c2 => c1 == c2). Without this shortcut, container comparisons would >> be much more expensive: any time a large container was compared to >> itself, it would be forced to recursively compare all the contained >> items. You might say that it has to do this anyway when comparing to a >> container that is not itself, but if the anser is "unequal" the >> comparison can stop as soon as two unequal items are found, whereas if >> the answer is "equal" you end up comparing all items. For two >> different containers there is no possible shortcut, but comparing a >> container to itself is quite common and really does deserve the >> shortcut. We discussed this in the past and always came to the same >> conclusion: despite the rules for NaN, the shortcut for containers is >> required. A similar shortcut exists for 'x in [x]' BTW. >> > Thank you for elaborating, I was going to ask what the justification for the > special case was. > You have explained why > >>>> x=float('NAN'); A=[x]; A==A > True > > but not as far as I can see why > >>>> x=float('NAN'); A=[x]; B=[x]; A==B, [x]=[x] > (True, True) > > where neither of the results is comparing a container to itself. It's so that when the container is iterating over pairs of elements it can check for item identity (a simple pointer comparison) first, which makes a pretty big difference in speed. -- --Guido van Rossum (python.org/~guido) From guido at python.org Mon Oct 8 21:48:07 2012 From: guido at python.org (Guido van Rossum) Date: Mon, 8 Oct 2012 12:48:07 -0700 Subject: [Python-ideas] Make "is" checks on non-singleton literals errors In-Reply-To: References: Message-ID: On Mon, Oct 8, 2012 at 12:44 PM, Mike Graham wrote: > I regularly see learners using "is" to check for string equality and > sometimes other equality. Due to optimizations, they often come away > thinking it worked for them. > > There are no cases where > > if x is "foo": > > or > > if x is 4: > > is actually the code someone intended to write. > > Although this has no benefit to anyone but new learners, it also > doesn't really do any harm. I think the best we can do is to make these SyntaxWarnings. I had the same thought recently and I do agree that these are common beginners mistakes that can easily hide bugs by succeeding in simple tests. 
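Short of a compiler change, the pattern is easy to flag with a lint-style pass over the AST; a rough sketch (ast.Str/ast.Num are the literal node types in current Python; this is not how a real SyntaxWarning would be implemented):

    import ast

    def warn_literal_is(source, filename="<string>"):
        # flag 'x is <literal>' / 'x is not <literal>' for str/num literals
        tree = ast.parse(source, filename)
        for node in ast.walk(tree):
            if not isinstance(node, ast.Compare):
                continue
            operands = [node.left] + node.comparators
            for i, op in enumerate(node.ops):
                left, right = operands[i], operands[i + 1]
                if isinstance(op, (ast.Is, ast.IsNot)) and (
                        isinstance(left, (ast.Str, ast.Num)) or
                        isinstance(right, (ast.Str, ast.Num))):
                    print("%s:%d: 'is' comparison with a literal"
                          % (filename, node.lineno))

    warn_literal_is("if x is 'foo': pass\nif y is 4: pass\n")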
-- --Guido van Rossum (python.org/~guido) From masklinn at masklinn.net Mon Oct 8 21:59:53 2012 From: masklinn at masklinn.net (Masklinn) Date: Mon, 8 Oct 2012 21:59:53 +0200 Subject: [Python-ideas] Make "is" checks on non-singleton literals errors In-Reply-To: References: Message-ID: <0C5C1551-451A-4C7A-A773-17822E475C9F@masklinn.net> On 2012-10-08, at 21:48 , Guido van Rossum wrote: > On Mon, Oct 8, 2012 at 12:44 PM, Mike Graham wrote: >> I regularly see learners using "is" to check for string equality and >> sometimes other equality. Due to optimizations, they often come away >> thinking it worked for them. >> >> There are no cases where >> >> if x is "foo": >> >> or >> >> if x is 4: >> >> is actually the code someone intended to write. >> >> Although this has no benefit to anyone but new learners, it also >> doesn't really do any harm. > > I think the best we can do is to make these SyntaxWarnings. I had the > same thought recently and I do agree that these are common beginners > mistakes that can easily hide bugs by succeeding in simple tests. How would the rather common pattern of using an `object` instance as a placeholder be handled? An identity test precisely expresses what is meant and desired in that case, while an equality test does not. An other one which seems to have some serious usage in the stdlib is type-testing (e.g. collections.abc, decimal or tests of exception types). Without type inference, I'm not too sure how that could be handled as syntactic warnings, and as above an identity test expresses the purpose of the code better than an equality one. From massimo.dipierro at gmail.com Mon Oct 8 22:00:06 2012 From: massimo.dipierro at gmail.com (Massimo DiPierro) Date: Mon, 8 Oct 2012 15:00:06 -0500 Subject: [Python-ideas] PEP 428: poll about the joining syntax In-Reply-To: References: <20121008204707.48559bf9@pitrou.net> Message-ID: p[q]: -1 p + q : +1 p / q : -1 p.join(q): -1 p.joinpath(q): -0 Looks like I am a minority. :-( Rationale: A directory structure with symbolic links is a graph. A path is an ordered set of links. A path can start anywhere in the graph and can end up anywhere. Links are represented by folder names. To me this means a natural representation of a Path as a list of strings which can be serialized in a OS-specific path. In fact today we all do, already: path.split(os.path.sep) and then manipulate the resulting list. Representing the Path with an object that has the same API as a list of strings (add, radd, append, insert, get item, get slice) and a few extra ones, will make it easier for new users to understand it and remember the APIs. I do not like p[q] and p/q because they fire the wrong neurons in my brain. p[q] picks an element in a set, p/q picks a subset of p. I also do not like p.join because p is not a string and it may be confusing. I am not opposed to q.joinpath(q) but it would require that users learn a new API. They cannot just guess it. they would have to look it up. That gives aways the main reason I use Python: it is intuitive. Moreover p.joinpath(q) seems to indicate that q is a path but q could be a string not a Path. 
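The split-and-manipulate idiom Massimo mentions, spelled out (POSIX separator assumed for the example):

    >>> import os
    >>> p = "/usr/local/lib/python3.3"
    >>> parts = p.split(os.sep)       # ['', 'usr', 'local', 'lib', 'python3.3']
    >>> parts[-1] = "python3.4"       # manipulate the list...
    >>> os.sep.join(parts)            # ...then serialize it back
    '/usr/local/lib/python3.4'

This round trip is essentially the list-like API he is arguing a Path object should expose directly.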
Massimo On Oct 8, 2012, at 2:29 PM, Nick Coghlan wrote: > Reducing to numeric votes: > > p[q]: -1 (confusing w.r.t to indexing/slicing, not convinced it is needed) > p + q : -1 (confusing w.r.t to strings, not convinced it is needed) > p / q : -0 (not convinced it is needed) > p.join(q): -0 (confusing w.r.t strings) > p.joinpath(q): +1 (avoids confusion, path.py precedent, need a method > API anyway) > > Cheers, > Nick. > > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas From mwm at mired.org Mon Oct 8 22:05:25 2012 From: mwm at mired.org (Mike Meyer) Date: Mon, 8 Oct 2012 15:05:25 -0500 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: <20121008191444.GA28668@sleipnir.bytereef.org> References: <506F6201.4000503@pearwood.info> <20121005235457.GA7755@mcnabbs.org> <506F813D.2050305@canterbury.ac.nz> <20121006214540.GB20907@mcnabbs.org> <5072CF3B.2070203@pearwood.info> <20121008160617.GA1527@mcnabbs.org> <20121008204014.10ba901e@pitrou.net> <20121008205634.113419ea@pitrou.net> <20121008191444.GA28668@sleipnir.bytereef.org> Message-ID: On Mon, Oct 8, 2012 at 2:14 PM, Stefan Krah wrote: > Guido van Rossum wrote: >> Personally I fear '+' much more -- to me, + can be used to add an >> extension without adding a new directory level. If we *have* to >> overload an operator, I'd prefer p/q over p[q] any day. > > '^' or '@' are used for concatenation in some languages. At least accidental > confusion with xor is pretty unlikely. @? I like it (@ is used for array indexing in some languages), but don't see a special method for it..... Maybe you meant **? From guido at python.org Mon Oct 8 22:00:45 2012 From: guido at python.org (Guido van Rossum) Date: Mon, 8 Oct 2012 13:00:45 -0700 Subject: [Python-ideas] asyncore: included batteries don't fit In-Reply-To: References: Message-ID: On Mon, Oct 8, 2012 at 12:20 PM, Mark Adam wrote: > On Sun, Oct 7, 2012 at 9:01 PM, Guido van Rossum wrote: >> On Sun, Oct 7, 2012 at 6:41 PM, Ben Darnell wrote: >>> I think there are >>> actually two separate issues here and it's important to keep them >>> distinct: at a low level, there is a need for a standardized event >>> loop, while at a higher level there is a question of what asynchronous >>> code should look like. >> >> Yes, yes. I tried to bring up this distinction. I'm glad I didn't >> completely fail. > > Perhaps this is obvious to others, but (like hinted at above) there > seem to be two primary issues with event handlers: > > 1) event handlers for the machine-program interface (ex. network I/O) > 2) event handlers for the program-user interface (ex. mouse I/O) > > While similar, my gut tell me they have to be handled in completely > different way in order to preserve order (i.e. sanity). > > This issue, for me, has come up with wanting to make a p2p network > application with VPython. Interesting. I agree that these are different in nature, but I think it would still be useful to have a single event loop ("reactor") that can multiplex them together. I think where the paths diverge is when it comes to the signature of the callback; for GUI events there is certain standard structure that must be passed to the callback and which isn't readily available when you *specify* the callback. 
OTOH for your typical socket event the callback can just call the appropriate method on the socket once it knows the socket is ready. But still, in many cases I would like to see these all serialized in the same thread and multiplexed according to some kind of assigned or implied priorities, and IIRC, GUI events often are "collapsed" (e.g. multple redraw events for the same window, or multiple mouse motion events). I also imagine the typical GUI event loop has hooks for integrating file descriptor polling, or perhaps it gives you a file descriptor to add to your select/poll/etc. map. Also, doesn't the Windows IOCP unify the two? -- --Guido van Rossum (python.org/~guido) From guido at python.org Mon Oct 8 22:07:37 2012 From: guido at python.org (Guido van Rossum) Date: Mon, 8 Oct 2012 13:07:37 -0700 Subject: [Python-ideas] Make "is" checks on non-singleton literals errors In-Reply-To: <0C5C1551-451A-4C7A-A773-17822E475C9F@masklinn.net> References: <0C5C1551-451A-4C7A-A773-17822E475C9F@masklinn.net> Message-ID: On Mon, Oct 8, 2012 at 12:59 PM, Masklinn wrote: > On 2012-10-08, at 21:48 , Guido van Rossum wrote: >> On Mon, Oct 8, 2012 at 12:44 PM, Mike Graham wrote: >>> I regularly see learners using "is" to check for string equality and >>> sometimes other equality. Due to optimizations, they often come away >>> thinking it worked for them. >>> >>> There are no cases where >>> >>> if x is "foo": >>> >>> or >>> >>> if x is 4: >>> >>> is actually the code someone intended to write. >>> >>> Although this has no benefit to anyone but new learners, it also >>> doesn't really do any harm. >> >> I think the best we can do is to make these SyntaxWarnings. I had the >> same thought recently and I do agree that these are common beginners >> mistakes that can easily hide bugs by succeeding in simple tests. > > How would the rather common pattern of using an `object` instance as a > placeholder be handled? An identity test precisely expresses what is > meant and desired in that case, while an equality test does not. It wouldn't be affected. The warning should only be emitted if either argument to 'is' is a literal number or string. Even if x could be an object instance I still don't see how it would lend meaning to "if x is 4:". > An other one which seems to have some serious usage in the stdlib is > type-testing (e.g. collections.abc, decimal or tests of exception types). > Without type inference, I'm not too sure how that could be handled > as syntactic warnings, and as above an identity test expresses the > purpose of the code better than an equality one. Looks like you're mistaking the proposal for "reject 'is' whenever either argument is a numeric or string value". The proposal is meant to look at the source code and only trigger if a *literal* of those types is used. 
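For contrast, the placeholder idiom Masklinn raised is untouched by the proposal, since no literal appears in the test; a typical (illustrative) version:

    _MISSING = object()    # unique sentinel, not a literal

    def lookup(mapping, key, default=_MISSING):
        value = mapping.get(key, _MISSING)
        if value is _MISSING:          # identity test, no literal involved
            if default is _MISSING:
                raise KeyError(key)
            return default
        return value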
-- --Guido van Rossum (python.org/~guido) From masklinn at masklinn.net Mon Oct 8 22:08:05 2012 From: masklinn at masklinn.net (Masklinn) Date: Mon, 8 Oct 2012 22:08:05 +0200 Subject: [Python-ideas] PEP 428: poll about the joining syntax In-Reply-To: <20121008204707.48559bf9@pitrou.net> References: <20121008204707.48559bf9@pitrou.net> Message-ID: On 2012-10-08, at 20:47 , Antoine Pitrou wrote: > - `p[q]` joins path q to path p -1 > - `p + q` joins path q to path p -1 > - `p / q` joins path q to path p +0, looks like a unix path although others will have issues > - `p.join(q)` joins path q to path p +1, especially if `p.join(*q)`, strongly reminiscent of os.path.join (which I often import "bare" in path-heavy code), I don't think the common naming with str.join is an issue anymore than it is for threading.Thread.join. > - `p.joinpath(q)` joins path q to path p same as `join`, although more of a +0.9 as it's longer without benefits. From masklinn at masklinn.net Mon Oct 8 22:14:52 2012 From: masklinn at masklinn.net (Masklinn) Date: Mon, 8 Oct 2012 22:14:52 +0200 Subject: [Python-ideas] Make "is" checks on non-singleton literals errors In-Reply-To: References: <0C5C1551-451A-4C7A-A773-17822E475C9F@masklinn.net> Message-ID: On 2012-10-08, at 22:07 , Guido van Rossum wrote: > On Mon, Oct 8, 2012 at 12:59 PM, Masklinn wrote: >> On 2012-10-08, at 21:48 , Guido van Rossum wrote: >>> On Mon, Oct 8, 2012 at 12:44 PM, Mike Graham wrote: >>>> I regularly see learners using "is" to check for string equality and >>>> sometimes other equality. Due to optimizations, they often come away >>>> thinking it worked for them. >>>> >>>> There are no cases where >>>> >>>> if x is "foo": >>>> >>>> or >>>> >>>> if x is 4: >>>> >>>> is actually the code someone intended to write. >>>> >>>> Although this has no benefit to anyone but new learners, it also >>>> doesn't really do any harm. >>> >>> I think the best we can do is to make these SyntaxWarnings. I had the >>> same thought recently and I do agree that these are common beginners >>> mistakes that can easily hide bugs by succeeding in simple tests. >> >> How would the rather common pattern of using an `object` instance as a >> placeholder be handled? An identity test precisely expresses what is >> meant and desired in that case, while an equality test does not. > > It wouldn't be affected. The warning should only be emitted if either > argument to 'is' is a literal number or string. Even if x could be an > object instance I still don't see how it would lend meaning to "if x > is 4:". I went from the description and missed the "literals" part of "non-singleton literals". Sorry about that. From mark at hotpy.org Mon Oct 8 22:20:12 2012 From: mark at hotpy.org (Mark Shannon) Date: Mon, 08 Oct 2012 21:20:12 +0100 Subject: [Python-ideas] PEP 428: poll about the joining syntax In-Reply-To: <20121008204707.48559bf9@pitrou.net> References: <20121008204707.48559bf9@pitrou.net> Message-ID: <5073357C.400@hotpy.org> On 08/10/12 19:47, Antoine Pitrou wrote: > > Hello, > > Since there has been some controversy about the joining syntax used in > PEP 428 (filesystem path objects), I would like to run an informal poll > about it. 
Please answer with +1/+0/-0/-1 for each proposal: > > - `p[q]` joins path q to path p -1 Counter intuitive > - `p + q` joins path q to path p -1 Confusion with strings > - `p / q` joins path q to path p +1 Matches (unix) file separator, no confusion with strings > - `p.join(q)` joins path q to path p -1 Confusion with strings again > p.pathjoin(q) +0 Cheers, Mark From p.f.moore at gmail.com Mon Oct 8 22:22:30 2012 From: p.f.moore at gmail.com (Paul Moore) Date: Mon, 8 Oct 2012 21:22:30 +0100 Subject: [Python-ideas] PEP 428: poll about the joining syntax In-Reply-To: <20121008204707.48559bf9@pitrou.net> References: <20121008204707.48559bf9@pitrou.net> Message-ID: On Monday, 8 October 2012, Antoine Pitrou wrote: > > > - `p[q]` joins path q to path p -1 it isn't really indexing > - `p + q` joins path q to path p -1 risk of ambiguity (string concatenation, e.g. it's too easy to assume you can add an extension with p + '.txt') > - `p / q` joins path q to path p -0 best of the operator options > - `p.join(q)` joins path q to path p +0 would like it except for the risk of silent errors if p is a string p.joinpath(q) +1 I wish there was a better name, but I doubt one will appear :-( Paul -------------- next part -------------- An HTML attachment was scrubbed... URL: From barry at python.org Mon Oct 8 22:26:49 2012 From: barry at python.org (Barry Warsaw) Date: Mon, 8 Oct 2012 16:26:49 -0400 Subject: [Python-ideas] Make "is" checks on non-singleton literals errors References: Message-ID: <20121008162649.73989cc4@resist.wooz.org> On Oct 08, 2012, at 03:44 PM, Mike Graham wrote: >I regularly see learners using "is" to check for string equality and >sometimes other equality. Due to optimizations, they often come away >thinking it worked for them. > >There are no cases where > > if x is "foo": > >or > > if x is 4: > >is actually the code someone intended to write. > >Although this has no benefit to anyone but new learners, it also >doesn't really do any harm. Conversely, I often see this: if x == None and even if x == True Okay, so maybe these are less harmful than the original complaint, but still, yuck! -Barry -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: not available URL: From joshua.landau.ws at gmail.com Mon Oct 8 22:38:31 2012 From: joshua.landau.ws at gmail.com (Joshua Landau) Date: Mon, 8 Oct 2012 21:38:31 +0100 Subject: [Python-ideas] Make "is" checks on non-singleton literals errors In-Reply-To: <20121008162649.73989cc4@resist.wooz.org> References: <20121008162649.73989cc4@resist.wooz.org> Message-ID: On 8 October 2012 21:26, Barry Warsaw wrote: > On Oct 08, 2012, at 03:44 PM, Mike Graham wrote: > > >I regularly see learners using "is" to check for string equality and > >sometimes other equality. Due to optimizations, they often come away > >thinking it worked for them. > > > >There are no cases where > > > > if x is "foo": > > > >or > > > > if x is 4: > > > >is actually the code someone intended to write. > > > >Although this has no benefit to anyone but new learners, it also > >doesn't really do any harm. > > Conversely, I often see this: > > if x == None > > and even > > if x == True > > Okay, so maybe these are less harmful than the original complaint, but > still, > yuck! > We can't really warn against these. >>> class EqualToTrue: > ... def __eq__(self, other): > ... return other is True > ... 
> >>> EqualToTrue() is True > False > >>> EqualToTrue() == True > True -------------- next part -------------- An HTML attachment was scrubbed... URL: From ned at nedbatchelder.com Mon Oct 8 22:39:52 2012 From: ned at nedbatchelder.com (Ned Batchelder) Date: Mon, 08 Oct 2012 16:39:52 -0400 Subject: [Python-ideas] checking for identity before comparing built-in objects In-Reply-To: References: <506D94EE.30808@pearwood.info> <50723BE5.3060300@nedbatchelder.com> Message-ID: <50733A18.10400@nedbatchelder.com> On 10/8/2012 12:25 PM, Guido van Rossum wrote: > On Sun, Oct 7, 2012 at 7:35 PM, Ned Batchelder wrote: >> I don't understand the reluctance to address a common conceptual speed-bump >> in the docs. After all, the tutorial has an entire chapter >> (http://docs.python.org/tutorial/floatingpoint.html) that explains how >> floats work, even though they work exactly as IEEE 754 says they should. > I'm sorry. I didn't intend to refuse to document the behavior. I was > mostly reacting to things I thought I read between the lines -- the > suggestion that there is no reason for the NaN behavior except silly > compatibility with an old standard that nobody cares about. From this > it is only a small step to reading (again between the lines) the > suggesting to change the behavior. > >> A sentence in section 5.4 (Numeric Types) would help. Something like, "In >> accordance with the IEEE 754 standard, NaN's are not equal to any value, >> even another NaN. This is because NaN doesn't represent a particular >> number, it represents an unknown result, and there is no way to know if one >> unknown result is equal to another unknown result." > That sounds like a great addition to the docs, except for the nit that > I don't like writing the plural of NaN as "NaN's" -- I prefer "NaNs" > myself. Also, the words here can still cause confusion. The exact > behavior is that every one of the 6 comparison operators (==, !=, <, > <=, >, >=) returns False when either argument (or both) is a NaN. I > think your suggested words could lead someone to believe that they > mean that x != NaN or NaN != Nan would return True. > > Anyway, once we can agree to words I agree that we should update that section. > How about: "In accordance with the IEEE 754 standard, when NaNs are compared to any value, even another NaN, the result is always False, regardless of the comparison. This is because NaN represents an unknown result. There is no way to know the relationship between an unknown result and any other result, especially another unknown one. Even comparing a NaN to itself always produces False." --Ned. From guido at python.org Mon Oct 8 22:47:53 2012 From: guido at python.org (Guido van Rossum) Date: Mon, 8 Oct 2012 13:47:53 -0700 Subject: [Python-ideas] checking for identity before comparing built-in objects In-Reply-To: <50733A18.10400@nedbatchelder.com> References: <506D94EE.30808@pearwood.info> <50723BE5.3060300@nedbatchelder.com> <50733A18.10400@nedbatchelder.com> Message-ID: On Mon, Oct 8, 2012 at 1:39 PM, Ned Batchelder wrote: > On 10/8/2012 12:25 PM, Guido van Rossum wrote: >> Anyway, once we can agree to words I agree that we should update that >> section. >> > How about: > > "In accordance with the IEEE 754 standard, when NaNs are compared to any > value, even another NaN, the result is always False, regardless of the > comparison. This is because NaN represents an unknown result. There is no > way to know the relationship between an unknown result and any other result, > especially another unknown one. 
Even comparing a NaN to itself always > produces False." Sounds good. (But now maybe we also need to come clean with the exceptions for NaNs compared as part of container comparisons?) -- --Guido van Rossum (python.org/~guido) From tjreedy at udel.edu Mon Oct 8 22:51:14 2012 From: tjreedy at udel.edu (Terry Reedy) Date: Mon, 08 Oct 2012 16:51:14 -0400 Subject: [Python-ideas] checking for identity before comparing built-in objects In-Reply-To: References: <506D94EE.30808@pearwood.info> Message-ID: On 10/8/2012 12:19 PM, Guido van Rossum wrote: > I am not aware of an update to the standard. Being 20 years old does > not make it outdated. Similarly, being hundreds or thousands of years old does not make the equality standard, which includes reflexivity of equality, outdated. The IEEE standard violated that older standard. http://bugs.python.org/issue4296 illustrates some of the problems than come with that violation. But given the compromise made to maintain sane behavior of Python's collection classes, I see little reason to change nan in isolation. I wonder if it would be helpful to make a NaN subclass of floats with its own arithmetic and comparison methods. This would clearly mark a nan as Not a Normal float. Since subclasses rule (at least some) binary operations*, this might also simplify normal float code. But perhaps this was considered and rejected before adding math.isnan in 2.6. (And ditto for infinities.) * in that class_ob op subclass_ob is delegated to subclass.__op__, but I am not sure if this applies only to arithmetic, comparisons, or both. -- Terry Jan Reedy From solipsis at pitrou.net Mon Oct 8 22:50:47 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 8 Oct 2012 22:50:47 +0200 Subject: [Python-ideas] PEP 428: poll about the joining syntax References: <20121008204707.48559bf9@pitrou.net> Message-ID: <20121008225047.213169c7@pitrou.net> I'm forwarding Barry's answer: -------- Message transf?r? -------- De: Barry Warsaw ?: Antoine Pitrou Sujet: Re: PEP 428: poll about the joining syntax Date: Mon, 8 Oct 2012 15:17:01 -0400 Like a good American low-information voter, I'll cast my ballot without having read PEP 428. On Oct 08, 2012, at 08:47 PM, Antoine Pitrou wrote: >- `p[q]` joins path q to path p -1 Definitely not intuitive. >- `p + q` joins path q to path p +0. IMHO, the most intuitive, but causes problems when you just want to tack on an extension, er, suffix. I guess if PathObj + str works it's not so bad. >- `p / q` joins path q to path p +0. Cute! Too *nix centric? >- `p.join(q)` joins path q to path p -0. Explicit (yay), but a bit verbose (boo). Maybe this should be the default underlying API, with one of the above as nice syntactic sugar? -Barry From storchaka at gmail.com Mon Oct 8 23:02:36 2012 From: storchaka at gmail.com (Serhiy Storchaka) Date: Tue, 09 Oct 2012 00:02:36 +0300 Subject: [Python-ideas] Propagating StopIteration value In-Reply-To: References: <5070D658.9020300@pearwood.info> Message-ID: On 07.10.12 23:19, Guido van Rossum wrote: > If this is just about iterator.chain() I may see some value in it (but > TBH the discussion so far mostly confuses -- please spend some more > time coming up with good examples that show actually useful use cases > rather than f() and g() or foo() and bar()) Not I was the first one who showed an example with f() and g(). ;) I only showed that it was wrong analogy. Yes, first of all I think about itertools.chain(). But then I found all other iterator tools which also can be extended to better generators support. 
Perhaps. I have only one imperfect example for use of StopIterator's value from generator (my patch for issue16009). It is difficult to find examples for feature, which appeared only recently. But I think I can find them before 3.4 feature freezing. > OTOH yield from is not primarily for iterators -- it is for > coroutines. I suspect most of the itertools functionality just doesn't > work with coroutines. Indeed. But they work with subset of generators, and this subset can be extended. Please look at http://bugs.python.org/issue16150 (Implement generator interface in itertools.chain). Does it make sense? From storchaka at gmail.com Mon Oct 8 23:06:49 2012 From: storchaka at gmail.com (Serhiy Storchaka) Date: Tue, 09 Oct 2012 00:06:49 +0300 Subject: [Python-ideas] Propagating StopIteration value In-Reply-To: References: <5070D658.9020300@pearwood.info> Message-ID: On 08.10.12 01:43, Oscar Benjamin wrote: > Hopefully, I've understood Serhiy and the docs correctly (I don't have > access to Python 3.3 right now to test any of this). Thank you for explanation and example, Oscar. From storchaka at gmail.com Mon Oct 8 23:12:18 2012 From: storchaka at gmail.com (Serhiy Storchaka) Date: Tue, 09 Oct 2012 00:12:18 +0300 Subject: [Python-ideas] Propagating StopIteration value In-Reply-To: References: <5070D658.9020300@pearwood.info> <50721095.1000800@canterbury.ac.nz> Message-ID: On 08.10.12 05:40, Terry Reedy wrote: > Serhily, if you want a module of *generator* specific functions > ('gentools' ?), you should write one and submit it to pypi for testing. > In http://bugs.python.org/issue16150 there is proposed extending of itertools.chain to support generators (send(), throw() and close() methods). Is it wrong? From shibturn at gmail.com Mon Oct 8 23:17:07 2012 From: shibturn at gmail.com (Richard Oudkerk) Date: Mon, 08 Oct 2012 22:17:07 +0100 Subject: [Python-ideas] PEP 428: poll about the joining syntax In-Reply-To: References: <20121008204707.48559bf9@pitrou.net> Message-ID: On 08/10/2012 9:22pm, Paul Moore wrote: > p.joinpath(q) > +1 I wish there was a better name, but I doubt one will appear :-( I would go for p.add(q) which at least has the virtue of brevity. Richard From tjreedy at udel.edu Mon Oct 8 23:17:56 2012 From: tjreedy at udel.edu (Terry Reedy) Date: Mon, 08 Oct 2012 17:17:56 -0400 Subject: [Python-ideas] checking for identity before comparing built-in objects In-Reply-To: References: <506D94EE.30808@pearwood.info> <50723BE5.3060300@nedbatchelder.com> <507243D2.8000505@btinternet.com> Message-ID: On 10/8/2012 12:47 PM, Guido van Rossum wrote: > this as well. Also, IIUC the IEEE library prescribes exceptions as > well as return values; e.g. "man 3 log" on my OSX computer says that > log(0) returns -inf as well as raise a divide-by-zero exception. So I > think this is probably compliant with the standard -- one can decide > to ignore the exceptions in certain contexts and honor them in others. > (Probably even the 1/0 behavior can be defended this way.) I agree. In C, as I remember, a function can both (passively) 'raise an exception' by setting errno *and* return a value. This requires the programmer to check for an exception, and forgetting to do so is a common bug. In Python, raising an exception actively aborts returning a value, so you had to choose one of the two behaviors. 
>> Some other operations behave inconsistently: >> >>>>> 2 * 10.**308 >> inf >> >> but >>>>> 10.**309 >> Traceback (most recent call last): >> File "", line 1, in >> OverflowError: (34, 'Result too large') > > Probably the same. IEEE 754 may be more complex than you think! Or this might be an accidental inconsistency, in that float multiplication was changed to return inf but pow was not. But I would be reluctant to fiddle with such details now. Alexander, while I might have chosen to make nan == nan True, I consider it a near tossup with no happy resolution and would not change it now. Guido's explanation is pretty clear: he went with the IEEE standard as interpreted for Python by Tim Peters. -- Terry Jan Reedy From storchaka at gmail.com Mon Oct 8 23:22:57 2012 From: storchaka at gmail.com (Serhiy Storchaka) Date: Tue, 09 Oct 2012 00:22:57 +0300 Subject: [Python-ideas] Propagating StopIteration value In-Reply-To: References: <5070D658.9020300@pearwood.info> <5070E4EA.5010904@canterbury.ac.nz> Message-ID: On 07.10.12 22:18, Richard Oudkerk wrote: > That means that all but the last return value is ignored. Why is the > last return value any more important than the earlier ones? Because I think the last return value more useful for idiom lookahead = next(iterator) process(lookahead) iterator = itertools.chain([lookahead], iterator) > ISTM it would make just as much sense to do > > def chain(*iterables): > values = [] > for it in iterables: > values.append(yield from it) > return values It changes the behavior for iterators. And now more difficult to get a generator which yields and returns the same values as the original. We need yet one wrapper. def lastvalue(generator): return (yield from generator)[-1] iterator = lastvalue(itertools.chain([lookahead], iterator)) Yes, it can work. From grosser.meister.morti at gmx.net Mon Oct 8 23:39:22 2012 From: grosser.meister.morti at gmx.net (=?ISO-8859-1?Q?Mathias_Panzenb=F6ck?=) Date: Mon, 08 Oct 2012 23:39:22 +0200 Subject: [Python-ideas] PEP 428: poll about the joining syntax In-Reply-To: <20121008204707.48559bf9@pitrou.net> References: <20121008204707.48559bf9@pitrou.net> Message-ID: <5073480A.7020009@gmx.net> `p[q]` 0 `p + q` -1 `p / q` +0 `p.join(q)` +1 `p.pathjoin(q)` +0 Where .join/.pathjoin shall take argument lists. The arguments my be path objects or strings. Example usage (where filename is a string): >>> prefix.join(some,path,components,filename+".txt") I'm against + because how would you do the example above? Because this: >>> prefix + some + path + components + filename + ".txt" would do something different than this: >>> prefix + some + path + components + (filename + ".txt") Which might surprise a user and is in any case confusing. From storchaka at gmail.com Tue Oct 9 00:00:43 2012 From: storchaka at gmail.com (Serhiy Storchaka) Date: Tue, 09 Oct 2012 01:00:43 +0300 Subject: [Python-ideas] PEP 428: poll about the joining syntax In-Reply-To: <20121008204707.48559bf9@pitrou.net> References: <20121008204707.48559bf9@pitrou.net> Message-ID: On 08.10.12 21:47, Antoine Pitrou wrote: > Since there has been some controversy about the joining syntax used in > PEP 428 (filesystem path objects), I would like to run an informal poll > about it. Please answer with +1/+0/-0/-1 for each proposal: Of course I have no right to vote, but because the poll is informal, I give my humble opinion. > - `p[q]` joins path q to path p -1. Counter intuitive and indexing can be used for path splitting. > - `p + q` joins path q to path p -1. 
Confusion with strings. path + str can be used for suffix appending. > - `p / q` joins path q to path p +1. Intuitive. No risk of conflicts. > - `p.join(q)` joins path q to path p -0. A bit confusion with strings. -0.1 verbose. +0.1 can have many arguments. +0.1 similar to os.path.join. -0.1 but have a little different semantic. > - `p.pathjoin(q)` joins path q to path p +0. Same as `p.join(q)`, but more verbose (-) and less confusion (+). From rosuav at gmail.com Tue Oct 9 00:02:32 2012 From: rosuav at gmail.com (Chris Angelico) Date: Tue, 9 Oct 2012 09:02:32 +1100 Subject: [Python-ideas] Make "is" checks on non-singleton literals errors In-Reply-To: References: Message-ID: On Tue, Oct 9, 2012 at 6:44 AM, Mike Graham wrote: > There are no cases where > > if x is "foo": > > is actually the code someone intended to write. Are literals guaranteed to be interned? If so, this code would make sense, if the programmer knows that x is itself an interned string. Although I guess a warning wouldn't be a problem there, as they're easily ignored/suppressed. ChrisA From greg.ewing at canterbury.ac.nz Tue Oct 9 00:11:26 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 09 Oct 2012 11:11:26 +1300 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: References: <20121005202534.5f721292@pitrou.net> <87bogfvrni.fsf@uwakimon.sk.tsukuba.ac.jp> <20121006141858.73b42c38@pitrou.net> Message-ID: <50734F8E.6010708@canterbury.ac.nz> Nick Coghlan wrote: > I'm not 100% sold on "subpath" as an alternative I don't much like the term "subpath" at all. To me it suggests extracting components out of the path somehow, rather than adding them on. -- Greg From greg.ewing at canterbury.ac.nz Tue Oct 9 00:15:26 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 09 Oct 2012 11:15:26 +1300 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: References: <20121005202534.5f721292@pitrou.net> <87bogfvrni.fsf@uwakimon.sk.tsukuba.ac.jp> <20121006141858.73b42c38@pitrou.net> Message-ID: <5073507E.7040802@canterbury.ac.nz> Paul Moore wrote: > "only one way to do > it" and the general controversy over which is the best operator to > use, suggests that leaving the operator form out altogether at least > in the initial implementation is the better option. Although if we start with a method, it will be impossible to add an operator later without there then being two ways to do it. -- Greg From greg.ewing at canterbury.ac.nz Tue Oct 9 00:18:54 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 09 Oct 2012 11:18:54 +1300 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: <5072CF3B.2070203@pearwood.info> References: <20121005202534.5f721292@pitrou.net> <20121005191625.GA23607@iskra.aviel.ru> <20121005195327.GG8974@mcnabbs.org> <506F6201.4000503@pearwood.info> <20121005235457.GA7755@mcnabbs.org> <506F813D.2050305@canterbury.ac.nz> <20121006214540.GB20907@mcnabbs.org> <5072CF3B.2070203@pearwood.info> Message-ID: <5073514E.5050904@canterbury.ac.nz> Steven D'Aprano wrote: > But it's just a name. __add__ doesn't necessarily perform addition, > __sub__ doesn't necessarily perform subtraction, and __or__ doesn't > necessarily have anything to do with either bitwise or boolean OR. Maybe they should have been called __plus__, __dash__, __star__, __slash__ etc., then we wouldn't keep having this argument... 
-- Greg From greg.ewing at canterbury.ac.nz Tue Oct 9 00:23:53 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 09 Oct 2012 11:23:53 +1300 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: <9D6F4C1B-9145-4775-8657-F99612791067@mac.com> References: <20121005202534.5f721292@pitrou.net> <20121007193735.7bb924ac@pitrou.net> <7E8AC881-ADB6-4026-B024-07DE197F8530@mac.com> <20121008110748.GA17653@iskra.aviel.ru> <9D6F4C1B-9145-4775-8657-F99612791067@mac.com> Message-ID: <50735279.8080506@canterbury.ac.nz> Ronald Oussoren wrote: > neither statvs, statvfs, nor pathconf seem to be able to tell if a filesystem is case insensitive. Even if they could, you wouldn't be entirely out of the woods, because different parts of the same path can be on different file systems... But how important is all this anyway? I'm trying to think of occasions when I've wanted to compare two entire paths for equality, and I can't think of *any*. -- Greg From guido at python.org Tue Oct 9 00:26:57 2012 From: guido at python.org (Guido van Rossum) Date: Mon, 8 Oct 2012 15:26:57 -0700 Subject: [Python-ideas] Make "is" checks on non-singleton literals errors In-Reply-To: References: Message-ID: On Mon, Oct 8, 2012 at 3:02 PM, Chris Angelico wrote: > On Tue, Oct 9, 2012 at 6:44 AM, Mike Graham wrote: >> There are no cases where >> >> if x is "foo": >> >> is actually the code someone intended to write. > > Are literals guaranteed to be interned? If so, this code would make > sense, if the programmer knows that x is itself an interned string. No, interning is not guaranteed. > Although I guess a warning wouldn't be a problem there, as they're > easily ignored/suppressed. -- --Guido van Rossum (python.org/~guido) From storchaka at gmail.com Tue Oct 9 00:42:24 2012 From: storchaka at gmail.com (Serhiy Storchaka) Date: Tue, 09 Oct 2012 01:42:24 +0300 Subject: [Python-ideas] Make "is" checks on non-singleton literals errors In-Reply-To: References: Message-ID: On 08.10.12 22:44, Mike Graham wrote: > There are no cases where > > if x is "foo": I see such code in docutils (Doc/tools/docutils/writers/latex2e/__init__.py) > or > > if x is 4: and in tests (Lib/test/test_long.py, Lib/test/test_int.py, Lib/test/test_grammar.py, Lib/test/test_winsound.py). From guido at python.org Tue Oct 9 00:44:12 2012 From: guido at python.org (Guido van Rossum) Date: Mon, 8 Oct 2012 15:44:12 -0700 Subject: [Python-ideas] Propagating StopIteration value In-Reply-To: References: <5070D658.9020300@pearwood.info> Message-ID: On Mon, Oct 8, 2012 at 2:02 PM, Serhiy Storchaka wrote: > On 07.10.12 23:19, Guido van Rossum wrote: >> >> If this is just about iterator.chain() I may see some value in it (but >> TBH the discussion so far mostly confuses -- please spend some more >> time coming up with good examples that show actually useful use cases >> rather than f() and g() or foo() and bar()) > > > Not I was the first one who showed an example with f() and g(). ;) I only > showed that it was wrong analogy. > > Yes, first of all I think about itertools.chain(). But then I found all > other iterator tools which also can be extended to better generators > support. Perhaps. > > I have only one imperfect example for use of StopIterator's value from > generator (my patch for issue16009). I don't understand that code at all, and it seems to be undocumented (no docstrings, no mention in the external docs). Why is it using StopIteration at all? There isn't an iterator or generator in sight. 
AFAICT it should just use a different exception. But even if you did use StopIteration -- why would you care about itertools here? AFAICT it's just being used as a private communication channel between scan_once() and its caller. Where is the possibility to wrap anything in itertools at all? > It is difficult to find examples for > feature, which appeared only recently. But I think I can find them before > 3.4 feature freezing. I think you're going at this from the wrong direction. You shouldn't be using this feature in circumstances where you're at all likely to run into this "problem". >> OTOH yield from is not primarily for iterators -- it is for >> coroutines. I suspect most of the itertools functionality just doesn't >> work with coroutines.> > > Indeed. But they work with subset of generators, and this subset can be > extended. Please look at http://bugs.python.org/issue16150 (Implement > generator interface in itertools.chain). Does it make sense? But that just seems to perpetuate the idea that you have, which IMO is wrongheaded. Itertools is for iterators, and all the extra generator features make no sense for it. -- --Guido van Rossum (python.org/~guido) From Andy.Henshaw at gtri.gatech.edu Tue Oct 9 00:32:06 2012 From: Andy.Henshaw at gtri.gatech.edu (Henshaw, Andy) Date: Mon, 8 Oct 2012 22:32:06 +0000 Subject: [Python-ideas] PEP 428: poll about the joining syntax In-Reply-To: References: <20121008204707.48559bf9@pitrou.net> Message-ID: On 08/10/2012 9:22pm, Paul Moore wrote: > p.joinpath(q) > +1 I wish there was a better name, but I doubt one will appear :-( How about p.extend(q) ? I can imagine getting "joinpath" wrong 50% of the time by typing "pathjoin". From greg.ewing at canterbury.ac.nz Tue Oct 9 00:47:55 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 09 Oct 2012 11:47:55 +1300 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: <20121008160617.GA1527@mcnabbs.org> References: <20121005202534.5f721292@pitrou.net> <20121005191625.GA23607@iskra.aviel.ru> <20121005195327.GG8974@mcnabbs.org> <506F6201.4000503@pearwood.info> <20121005235457.GA7755@mcnabbs.org> <506F813D.2050305@canterbury.ac.nz> <20121006214540.GB20907@mcnabbs.org> <5072CF3B.2070203@pearwood.info> <20121008160617.GA1527@mcnabbs.org> Message-ID: <5073581B.2030900@canterbury.ac.nz> Andrew McNabb wrote: > Since this really is a matter of personal taste, I'll end my > participation in this discussion by voicing support for Nick Coghlan's > suggestion of a `join` method, whether it's named `join` or `append` or > something else. I'd prefer 'append', because path.append("somedir", "file.txt") is pretty self-explanatory, whereas path.join("somedir", "path.txt") looks confusingly similar to s.join("somedir", "path.txt") where s is a string, but has very different semantics. 
-- Greg From massimo.dipierro at gmail.com Tue Oct 9 00:54:04 2012 From: massimo.dipierro at gmail.com (Massimo DiPierro) Date: Mon, 8 Oct 2012 17:54:04 -0500 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: <5073581B.2030900@canterbury.ac.nz> References: <20121005202534.5f721292@pitrou.net> <20121005191625.GA23607@iskra.aviel.ru> <20121005195327.GG8974@mcnabbs.org> <506F6201.4000503@pearwood.info> <20121005235457.GA7755@mcnabbs.org> <506F813D.2050305@canterbury.ac.nz> <20121006214540.GB20907@mcnabbs.org> <5072CF3B.2070203@pearwood.info> <20121008160617.GA1527@mcnabbs.org> <5073581B.2030900@canterbury.ac.nz> Message-ID: <3B7F5487-4C6E-4C95-917D-97C9DADE66A3@gmail.com> +1 On Oct 8, 2012, at 5:47 PM, Greg Ewing wrote: > Andrew McNabb wrote: >> Since this really is a matter of personal taste, I'll end my >> participation in this discussion by voicing support for Nick Coghlan's >> suggestion of a `join` method, whether it's named `join` or `append` or >> something else. > > I'd prefer 'append', because > > path.append("somedir", "file.txt") > > is pretty self-explanatory, whereas > > path.join("somedir", "path.txt") > > looks confusingly similar to > > s.join("somedir", "path.txt") > > where s is a string, but has very different semantics. > > -- > Greg > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas From bauertomer at gmail.com Tue Oct 9 01:02:05 2012 From: bauertomer at gmail.com (T.B.) Date: Tue, 09 Oct 2012 01:02:05 +0200 Subject: [Python-ideas] PEP 428: poll about the joining syntax In-Reply-To: References: <20121008204707.48559bf9@pitrou.net> Message-ID: <50735B6D.700@gmail.com> On 2012-10-08 23:17, Richard Oudkerk wrote: > On 08/10/2012 9:22pm, Paul Moore wrote: >> p.joinpath(q) >> +1 I wish there was a better name, but I doubt one will appear :-( > > I would go for > > p.add(q) > I like the short 'add'. A small problem I see with 'add' (and with 'append') is that the outcome of adding (or appending) an absolute path is too surprising, unlike with the 'join' or 'joinpath' names. Also, How would we add an extension to a path (without turning it into a str first)? Will there be a method called addext() or addsuffix() as the .ext/.suffix property is immutable? The suggestions I saw in the thread so far targeted substituting the extension, not adding. Regarding '/', I would like to mention Scapy [1], the packet manipulation program. From its documentation: "The / operator has been used as a composition operator between two layers". The '/' feels natural to use with Scapy. An example from the docs: > Let?s say I want a broadcast MAC address, and IP payload to ketchup.com and to mayo.com, TTL value from 1 to 9, and an UDP payload: > > >>> Ether(dst="ff:ff:ff:ff:ff:ff")/IP(dst=["ketchup.com","mayo.com"],ttl=(1,9))/UDP() Regards, TB [1] http://www.secdev.org/projects/scapy/ From guido at python.org Tue Oct 9 01:02:27 2012 From: guido at python.org (Guido van Rossum) Date: Mon, 8 Oct 2012 16:02:27 -0700 Subject: [Python-ideas] Make "is" checks on non-singleton literals errors In-Reply-To: References: Message-ID: On Mon, Oct 8, 2012 at 3:42 PM, Serhiy Storchaka wrote: > On 08.10.12 22:44, Mike Graham wrote: >> >> There are no cases where >> >> if x is "foo": > > I see such code in docutils (Doc/tools/docutils/writers/latex2e/__init__.py) And that's probably a bug. 
>> or >> >> if x is 4: > > and in tests (Lib/test/test_long.py, Lib/test/test_int.py, > Lib/test/test_grammar.py, Lib/test/test_winsound.py). The tests are easily rewritten using four = 4 if x is four: ... -- --Guido van Rossum (python.org/~guido) From mikegraham at gmail.com Tue Oct 9 01:05:25 2012 From: mikegraham at gmail.com (Mike Graham) Date: Mon, 8 Oct 2012 19:05:25 -0400 Subject: [Python-ideas] Make "is" checks on non-singleton literals errors In-Reply-To: References: Message-ID: On Mon, Oct 8, 2012 at 6:42 PM, Serhiy Storchaka wrote: > On 08.10.12 22:44, Mike Graham wrote: >> >> There are no cases where >> >> if x is "foo": > > > I see such code in docutils (Doc/tools/docutils/writers/latex2e/__init__.py) Thanks for finding these! I can't find this in a couple versions of Python I checked. If this code is still around, it sounds like it has a bug and should be fixed. >> or >> >> if x is 4: > > > and in tests (Lib/test/test_long.py, Lib/test/test_int.py, > Lib/test/test_grammar.py, Lib/test/test_winsound.py). test_grammar.py is correct, but trivially so. It merely ensures that `1 is 1` and `1 is not 1` are proper Python syntax. As we're talking about tweaking Python's syntax rules, obviously code that tests that the grammar is the current thing would use the current thing. test_int.py and test_long.py are valid but unique, in that they rely on the behavior that no other code should implicitly rely on to test an implementation detail test_winsound.py has an `is 0` check and an `is ""` check. Both should be fixed. Thanks again, Mike From ryan at ryanhiebert.com Tue Oct 9 01:06:25 2012 From: ryan at ryanhiebert.com (Ryan D Hiebert) Date: Mon, 8 Oct 2012 16:06:25 -0700 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: <5073581B.2030900@canterbury.ac.nz> References: <20121005202534.5f721292@pitrou.net> <20121005191625.GA23607@iskra.aviel.ru> <20121005195327.GG8974@mcnabbs.org> <506F6201.4000503@pearwood.info> <20121005235457.GA7755@mcnabbs.org> <506F813D.2050305@canterbury.ac.nz> <20121006214540.GB20907@mcnabbs.org> <5072CF3B.2070203@pearwood.info> <20121008160617.GA1527@mcnabbs.org> <5073581B.2030900@canterbury.ac.nz> Message-ID: <367410E6-E0BE-4486-8E1B-268446454167@ryanhiebert.com> On Oct 8, 2012, at 3:47 PM, Greg Ewing wrote: > I'd prefer 'append', because > > path.append("somedir", "file.txt") +1 In so many ways, I see a path as a list of its components. Because of that, path.append and path.extend, with similar semantics to list.append and list.extend, makes a lot of sense to me. When I think about a path as a list of components rather than as a string, the '+' operator starts to make sense for joins as well. I'm OK with using the '/' for path joining as well, because the parallel with list doesn't fit in this case, although I understand Massimo's objection to it. In very many ways, I like thinking of a path as a list (slicing, append, etc). The fact that list.append doesn't return the new list has always bugged me, but if we were to use append and extend, they should mirror the semantics from list. I'm much more inclined to think of path as a special list than as a special string. 
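Just to sketch the analogy I have in mind (this is only an illustration of the idea,
not the API actually proposed in the PEP):

    p = Path('/usr/lib/python3.3')
    # components as list items:   list(p)  ->  ['/', 'usr', 'lib', 'python3.3'] (say)
    # list-style joining:         p.append('site-packages')   ~ list.append
    #                             p.extend(['foo', 'bar'])    ~ list.extend
    # slicing back up the tree:   p[:-1]   ->  Path('/usr/lib')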
Ryan From phd at phdru.name Tue Oct 9 01:11:38 2012 From: phd at phdru.name (Oleg Broytman) Date: Tue, 9 Oct 2012 03:11:38 +0400 Subject: [Python-ideas] PEP 428: poll about the joining syntax In-Reply-To: <50735B6D.700@gmail.com> References: <20121008204707.48559bf9@pitrou.net> <50735B6D.700@gmail.com> Message-ID: <20121008231138.GB3712@iskra.aviel.ru> On Tue, Oct 09, 2012 at 01:02:05AM +0200, "T.B." wrote: > Regarding '/', I would like to mention Scapy [1], the packet > manipulation program. From its documentation: "The / operator has > been used as a composition operator between two layers". The '/' > feels natural to use with Scapy. An example from the docs: > >Let?s say I want a broadcast MAC address, and IP payload to ketchup.com and to mayo.com, TTL value from 1 to 9, and an UDP payload: > > > >>>> Ether(dst="ff:ff:ff:ff:ff:ff")/IP(dst=["ketchup.com","mayo.com"],ttl=(1,9))/UDP() Except that layers are divided (pun intended) in wrong order. It seems that Ether is at the top where the traditional stack order from Ether to UDP is from bottom to top. Oleg. -- Oleg Broytman http://phdru.name/ phd at phdru.name Programmers don't die, they just GOSUB without RETURN. From ryan at ryanhiebert.com Tue Oct 9 01:19:00 2012 From: ryan at ryanhiebert.com (Ryan D Hiebert) Date: Mon, 8 Oct 2012 16:19:00 -0700 Subject: [Python-ideas] PEP 428: poll about the joining syntax In-Reply-To: <20121008204707.48559bf9@pitrou.net> References: <20121008204707.48559bf9@pitrou.net> Message-ID: <0D51EC77-7952-45DA-B958-1626395A69D2@ryanhiebert.com> On Oct 8, 2012, at 11:47 AM, Antoine Pitrou wrote: > - `p[q]` joins path q to path p -1 > - `p + q` joins path q to path p +1 > - `p / q` joins path q to path p +0.5 > - `p.join(q)` joins path q to path p -1, but +1 to p.append(q) If we want a p.pathjoin method, it would make sense to me for it to work similar to urllib.parse.urljoin, i.e., if the joined path is absolute, have it replace the path, except possibly for the drive on windows. I like to follow any parallels to list that make sense. Ryan From oscar.j.benjamin at gmail.com Tue Oct 9 01:24:47 2012 From: oscar.j.benjamin at gmail.com (Oscar Benjamin) Date: Tue, 9 Oct 2012 00:24:47 +0100 Subject: [Python-ideas] Propagating StopIteration value In-Reply-To: References: <5070D658.9020300@pearwood.info> Message-ID: On 8 October 2012 00:36, Guido van Rossum wrote: > On Sun, Oct 7, 2012 at 3:43 PM, Oscar Benjamin > wrote: >> I think what Serhiy is saying is that although pep 380 mainly >> discusses generator functions it has effectively changed the >> definition of what it means to be an iterator for all iterators: >> previously an iterator was just something that yielded values but now >> it also returns a value. Since the meaning of an iterator has changed, >> functions that work with iterators need to be updated. > > I think there are different philosophical viewpoints possible on that > issue. My own perspective is that there is no change in the definition > of iterator -- only in the definition of generator. Note that the > *ability* to attach a value to StopIteration is not new at all. I guess I'm viewing it from the perspective that an ordinary iterator is simply an iterator that happens to return None just like a function that doesn't bother to return anything. If I understand correctly, though, it is possible for any iterator to return a value that yield from would propagate, so the feature (returning a value) is not specific to generators. 
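A minimal sketch of what I mean (the class and names are invented for illustration):
a plain iterator class, not a generator, whose StopIteration carries a value that a
yield from expression then picks up:

    class CountTo:
        # An ordinary iterator object; its "return value" rides on the
        # StopIteration it raises when exhausted.
        def __init__(self, n):
            self.i, self.n = 0, n
        def __iter__(self):
            return self
        def __next__(self):
            if self.i >= self.n:
                raise StopIteration(self.n)   # like "return self.n" in a generator
            self.i += 1
            return self.i

    def wrapper():
        result = yield from CountTo(3)        # result becomes 3 once CountTo is exhausted
        print('got', result)

    list(wrapper())                           # prints "got 3" and gives [1, 2, 3]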
>> This feature was new in Python 3.3 which was released a week ago > > It's been in alpha/beta/candidate for a long time, and PEP 380 was > first discussed in 2009. > >> so it is not widely used but it has uses that are not anything to do with >> coroutines. > > Yes, as a shortcut for "for x in : yield x". Note that the > for-loop ignores the value in the StopIteration -- would you want to > change that too? Not really. I thought about how it could be changed. Once APIs are available that use this feature to communicate important information, use cases will arise for using the same APIs outside of a coroutine context. I'm not really sure how you could get the value from a for loop. I guess it would have to be tied to the else clause in some way. > >> As an example of how you could use it, consider parsing a >> file that can contains #include statements. When the #include >> statement is encountered we need to insert the contents of the >> included file. This is easy to do with a recursive generator. The >> example uses the return value of the generator to keep track of which >> line is being parsed in relation to the flattened output file: >> >> def parse(filename, output_lineno=0): >> with open(filename) as fin: >> for input_lineno, line in enumerate(fin): >> if line.startswith('#include '): >> subfilename = line.split()[1] >> output_lineno = yield from parse(subfilename, output_lineno) >> else: >> try: >> yield parse_line(line) >> except ParseLineError: >> raise ParseError(filename, input_lineno, output_lineno) >> output_lineno += 1 >> return output_lineno > > Hm. This example looks constructed to prove your point... It would be > easier to count the output lines in the caller. Or you could use a > class to hold that state. I think it's just a bad habit to start using > the return value for this purpose. Please use the same approach as you > would before 3.3, using "yield from" just as the shortcut I mentione > above. I'll admit that the example is contrived but it's to think about how to use the new feature rather than to prove a point (Otherwise I would have contrived a reason for wanting to use filter()). I just wanted to demonstrate that people can (and will) use this outside of a coroutine context. Also I envisage something like this being a common use case. The 'yield from' expression can only provide information to its immediate caller by returning a value attached to StopIteration or be raising a different type of exception. There will be many cases where people want to get some information about what was yielded/done by 'yield from' at the point where it is used. Oscar From ericsnowcurrently at gmail.com Tue Oct 9 01:31:10 2012 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Mon, 8 Oct 2012 17:31:10 -0600 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: References: <20121005202534.5f721292@pitrou.net> <20121005191625.GA23607@iskra.aviel.ru> <20121005195327.GG8974@mcnabbs.org> <506F6201.4000503@pearwood.info> <20121005235457.GA7755@mcnabbs.org> <506F813D.2050305@canterbury.ac.nz> <20121006214540.GB20907@mcnabbs.org> <5072CF3B.2070203@pearwood.info> <20121008160617.GA1527@mcnabbs.org> <20121008204014.10ba901e@pitrou.net> <20121008205634.113419ea@pitrou.net> Message-ID: On Mon, Oct 8, 2012 at 1:24 PM, Nick Coghlan wrote: > However, I'm > also a big fan of starting with a minimalist core and growing it. 
> Moving from "os.path.join(a, b, c, d, e)" (or, the way I often write > it, "joinpath(a, b, c, d, e)") to "a.joinpath(b, c, d, e)" at least > isn't going backwards, and is more obvious in isolation than "a / b / > c / d / e". +1 From ericsnowcurrently at gmail.com Tue Oct 9 01:35:52 2012 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Mon, 8 Oct 2012 17:35:52 -0600 Subject: [Python-ideas] PEP 428: poll about the joining syntax In-Reply-To: <20121008204707.48559bf9@pitrou.net> References: <20121008204707.48559bf9@pitrou.net> Message-ID: On Mon, Oct 8, 2012 at 12:47 PM, Antoine Pitrou wrote: > - `p[q]` joins path q to path p -1 > - `p + q` joins path q to path p -1 > - `p / q` joins path q to path p -1 > - `p.join(q)` joins path q to path p +1 (with a different name) I've found Nick's argument against operators-from-day-1 to be convincing, as well as his argument against join() or any other name already provided by string/sequence APIs. -eric From oscar.j.benjamin at gmail.com Tue Oct 9 01:45:00 2012 From: oscar.j.benjamin at gmail.com (Oscar Benjamin) Date: Tue, 9 Oct 2012 00:45:00 +0100 Subject: [Python-ideas] Propagating StopIteration value In-Reply-To: References: <5070D658.9020300@pearwood.info> <50721095.1000800@canterbury.ac.nz> Message-ID: On 8 October 2012 03:40, Terry Reedy wrote: > On 10/7/2012 7:30 PM, Greg Ewing wrote: >> >> Oscar Benjamin wrote: >>> >>> Before pep 380 filter(lambda x: True, obj) returned an object that was >>> the same kind of iterator as obj (it would yield the same values). Now >>> the "kind of iterator" that obj is depends not only on the values that >>> it yields but also on the value that it returns. Since filter does not >>> pass on the same return value, filter(lambda x: True, obj) is no >>> longer the same kind of iterator as obj. >> >> >> Something like this has happened before, when the ability to >> send() values into a generator was added. If you wrap a >> generator with filter, you likewise don't get the same kind >> of object -- you don't get the ability to send() things >> into your filtered generator. >> >> So, "provide the same kind of iterator" is not currently part >> of the contract of these functions. They do provide the same kind of iterator in the sense that they reproduce the properties of the object *in so far as it is an iterator* by yielding the same values. I probably should have compared filter(lambda x: True, obj) with iter(obj) rather than obj. In most cases iter(obj) has a more limited interface. send() is clearly specific to generators: user defined iterator classes can provide any number of state-changing methods (usually with more relevant names) but this is difficult for generators so a generic mechanism is needed. The return value attached to StopIteration "feels" more fundamental to me since there is now specific language syntax both for extracting it and for returning it in generator functions. > Iterators are Python's generic sequential access device. They do that one > thing and do it well. > > The iterator protocol is intentionally and properly minimal. An iterator > class *must* have appropriate .__iter__ and .__next__ methods. It *may* also > have any other method and any data attribute. Indeed, any iterator much have > some specific internal data. But these are ignored in generic iterator (or > iterable) functions. If one does not want that, one should write more > specific code. 
Generalising the concept of an iterator this way is entirely backwards compatible with existing iterators and does not place any additional burden on defining iterators: most iterators can simply be iterators that return None. The feature is optional for any iterator but this thread is about whether it should be optional for a generic processor of iterators. > Serhily, if you want a module of *generator* specific functions ('gentools' > ?), you should write one and submit it to pypi for testing. This is probably the right idea. As the feature gains use cases the best way to handle it will become clearer. Oscar From ericsnowcurrently at gmail.com Tue Oct 9 01:45:39 2012 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Mon, 8 Oct 2012 17:45:39 -0600 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: <5073581B.2030900@canterbury.ac.nz> References: <20121005202534.5f721292@pitrou.net> <20121005191625.GA23607@iskra.aviel.ru> <20121005195327.GG8974@mcnabbs.org> <506F6201.4000503@pearwood.info> <20121005235457.GA7755@mcnabbs.org> <506F813D.2050305@canterbury.ac.nz> <20121006214540.GB20907@mcnabbs.org> <5072CF3B.2070203@pearwood.info> <20121008160617.GA1527@mcnabbs.org> <5073581B.2030900@canterbury.ac.nz> Message-ID: On Mon, Oct 8, 2012 at 4:47 PM, Greg Ewing wrote: > Andrew McNabb wrote: >> >> Since this really is a matter of personal taste, I'll end my >> participation in this discussion by voicing support for Nick Coghlan's >> suggestion of a `join` method, whether it's named `join` or `append` or >> something else. > > > I'd prefer 'append', because > > path.append("somedir", "file.txt") > > is pretty self-explanatory, whereas > > path.join("somedir", "path.txt") > > looks confusingly similar to > > s.join("somedir", "path.txt") > > where s is a string, but has very different semantics. As Nick noted, the problem is that append() conflicts with MutableSequence.append(). If someone subclasses Path and friends to act like a list then it complicates the situation. In my mind the name should be one that is not already in use by strings or sequences. -eric From guido at python.org Tue Oct 9 01:47:23 2012 From: guido at python.org (Guido van Rossum) Date: Mon, 8 Oct 2012 16:47:23 -0700 Subject: [Python-ideas] Propagating StopIteration value In-Reply-To: References: <5070D658.9020300@pearwood.info> Message-ID: On Mon, Oct 8, 2012 at 4:24 PM, Oscar Benjamin wrote: > On 8 October 2012 00:36, Guido van Rossum wrote: >> On Sun, Oct 7, 2012 at 3:43 PM, Oscar Benjamin >> wrote: >>> I think what Serhiy is saying is that although pep 380 mainly >>> discusses generator functions it has effectively changed the >>> definition of what it means to be an iterator for all iterators: >>> previously an iterator was just something that yielded values but now >>> it also returns a value. Since the meaning of an iterator has changed, >>> functions that work with iterators need to be updated. >> >> I think there are different philosophical viewpoints possible on that >> issue. My own perspective is that there is no change in the definition >> of iterator -- only in the definition of generator. Note that the >> *ability* to attach a value to StopIteration is not new at all. > > I guess I'm viewing it from the perspective that an ordinary iterator > is simply an iterator that happens to return None just like a function > that doesn't bother to return anything. 
If I understand correctly, > though, it is possible for any iterator to return a value that yield > from would propagate, so the feature (returning a value) is not > specific to generators. Substitute "pass a value via StopIteration" and I'll agree that it is *possible*. I still don't think it is all that useful, nor that it should be encouraged (outside the use case of coroutines). >>> This feature was new in Python 3.3 which was released a week ago >> >> It's been in alpha/beta/candidate for a long time, and PEP 380 was >> first discussed in 2009. >> >>> so it is not widely used but it has uses that are not anything to do with >>> coroutines. >> >> Yes, as a shortcut for "for x in : yield x". Note that the >> for-loop ignores the value in the StopIteration -- would you want to >> change that too? > > Not really. I thought about how it could be changed. Once APIs are > available that use this feature to communicate important information, > use cases will arise for using the same APIs outside of a coroutine > context. I'm not really sure how you could get the value from a for > loop. I guess it would have to be tied to the else clause in some way. Given the elusive nature of StopIteration (many operations catch and ignore it, and that's the main intended use) I don't think it should be used to pass along *important* information except for the specific case of coroutines, where the normal use case is to use .send() instead of .__next__() and to catch the StopIteration exception. >>> As an example of how you could use it, consider parsing a >>> file that can contains #include statements. When the #include >>> statement is encountered we need to insert the contents of the >>> included file. This is easy to do with a recursive generator. The >>> example uses the return value of the generator to keep track of which >>> line is being parsed in relation to the flattened output file: >>> >>> def parse(filename, output_lineno=0): >>> with open(filename) as fin: >>> for input_lineno, line in enumerate(fin): >>> if line.startswith('#include '): >>> subfilename = line.split()[1] >>> output_lineno = yield from parse(subfilename, output_lineno) >>> else: >>> try: >>> yield parse_line(line) >>> except ParseLineError: >>> raise ParseError(filename, input_lineno, output_lineno) >>> output_lineno += 1 >>> return output_lineno >> >> Hm. This example looks constructed to prove your point... It would be >> easier to count the output lines in the caller. Or you could use a >> class to hold that state. I think it's just a bad habit to start using >> the return value for this purpose. Please use the same approach as you >> would before 3.3, using "yield from" just as the shortcut I mentione >> above. > > I'll admit that the example is contrived but it's to think about how > to use the new feature rather than to prove a point (Otherwise I would > have contrived a reason for wanting to use filter()). I just wanted to > demonstrate that people can (and will) use this outside of a coroutine > context. Just that they will use it doesn't make it a good idea. I claim it's a bad idea and I don't think you're close to convincing me otherwise. > Also I envisage something like this being a common use case. The > 'yield from' expression can only provide information to its immediate > caller by returning a value attached to StopIteration or be raising a > different type of exception. 
There will be many cases where people > want to get some information about what was yielded/done by 'yield > from' at the point where it is used. Maybe. But I think we should wait a few years before we conclude that we made a mistake. The story of iterators and generators has evolved in many small steps, each informed by how the previous step turned out. It's way too soon to say that the existence of yield-from requires us to change all the other iterator algebra to preserve the value from StopIteration. I'll happily take this discussion up again after we've used it for a couple of years though! -- --Guido van Rossum (python.org/~guido) From oscar.j.benjamin at gmail.com Tue Oct 9 01:59:26 2012 From: oscar.j.benjamin at gmail.com (Oscar Benjamin) Date: Tue, 9 Oct 2012 00:59:26 +0100 Subject: [Python-ideas] Propagating StopIteration value In-Reply-To: References: <5070D658.9020300@pearwood.info> Message-ID: On 9 October 2012 00:47, Guido van Rossum wrote: > > Given the elusive nature of StopIteration (many operations catch and > ignore it, and that's the main intended use) I don't think it should > be used to pass along *important* information except for the specific > case of coroutines, where the normal use case is to use .send() > instead of .__next__() and to catch the StopIteration exception. > It certainly is elusive! I caught a bug a few weeks ago where StopIteration was generated from a call to next and caught by a for loop several frames above. I couldn't work out why the loop was terminating early (since there was no attempt to catch any exceptions anywhere in the code) and it took about 20 minutes of processing to reproduce. With no traceback and no way to catch the exception with pdb it had me stumped for a while. Oscar From greg.ewing at canterbury.ac.nz Tue Oct 9 02:02:11 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 09 Oct 2012 13:02:11 +1300 Subject: [Python-ideas] checking for identity before comparing built-in objects In-Reply-To: References: <506D94EE.30808@pearwood.info> <50723BE5.3060300@nedbatchelder.com> <507243D2.8000505@btinternet.com> Message-ID: <50736983.5030001@canterbury.ac.nz> Guido van Rossum wrote: > It's not about equality. If you ask whether two NaNs are *unequal* the > answer is *also* False. That's the weirdest part about this whole business, I think. Unless you're really keeping your wits about you, it's easy to forget that the assumption (x == y) == False implies (x != y) == True doesn't necessarily hold. This is actually a very important assumption when it comes to reasoning about programs -- even more important than reflexivity, etc, I believe. Consider if x == y: dosomething() else: dosomethingelse() where x and y are known to be floats. It's easy to see that the following is equivalent: if not x == y: dosomethingelse() else: dosomething() but it's not quite so easy to spot that the following is *not* equivalent: if x != y: dosomethingelse() else: dosomething() This trap is made all the easier to fall into because float comparison is *mostly* well-behaved, except for a small subset of the possible values. Most other nonstandard comparison behaviours in Python apply to whole types. E.g. we refuse to compare complex numbers for ordering, even if their values happen to be real, so if you try that you get an early exception. But the weirdness with NaNs only shows up in corner cases that may escape testing. Now, there *is* a third possibility -- we could raise an exception if a comparison involving NaNs is attempted. 
This would be a more faithful way of adhering to the IEEE 754 specification that NaNs are "unordered". More importantly, it would make the second code transformation above valid in all cases. So the question that really needs to be answered, I think, is not "Why is NaN == NaN false?", but "Why doesn't NaN == anything raise an exception, when it would make so much more sense to do so?" -- Greg From ubershmekel at gmail.com Tue Oct 9 02:01:59 2012 From: ubershmekel at gmail.com (Yuval Greenfield) Date: Tue, 9 Oct 2012 02:01:59 +0200 Subject: [Python-ideas] PEP 428: poll about the joining syntax In-Reply-To: References: <20121008204707.48559bf9@pitrou.net> Message-ID: `p[q]` 0 `p + q` +0.5 `p / q` +1 `p.join(q)` -1 `p.pathjoin(q)` -1 `p.pathjoin(q)` -1 `p.add(q)` +0.5 Joining/adding/appending paths is one of the most common ops. Please let's make it short, easy and obvious. -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Tue Oct 9 02:11:33 2012 From: guido at python.org (Guido van Rossum) Date: Mon, 8 Oct 2012 17:11:33 -0700 Subject: [Python-ideas] checking for identity before comparing built-in objects In-Reply-To: <50736983.5030001@canterbury.ac.nz> References: <506D94EE.30808@pearwood.info> <50723BE5.3060300@nedbatchelder.com> <507243D2.8000505@btinternet.com> <50736983.5030001@canterbury.ac.nz> Message-ID: On Mon, Oct 8, 2012 at 5:02 PM, Greg Ewing wrote: > Guido van Rossum wrote: > >> It's not about equality. If you ask whether two NaNs are *unequal* the >> answer is *also* False. > > > That's the weirdest part about this whole business, I think. > Unless you're really keeping your wits about you, it's easy > to forget that the assumption (x == y) == False implies > (x != y) == True doesn't necessarily hold. > > This is actually a very important assumption when it comes > to reasoning about programs -- even more important than > reflexivity, etc, I believe. Consider > > if x == y: > dosomething() > else: > dosomethingelse() > > where x and y are known to be floats. It's easy to see that > the following is equivalent: > > if not x == y: > dosomethingelse() > else: > dosomething() > > but it's not quite so easy to spot that the following is > *not* equivalent: > > if x != y: > dosomethingelse() > else: > dosomething() > > This trap is made all the easier to fall into because float > comparison is *mostly* well-behaved, except for a small subset > of the possible values. Most other nonstandard comparison behaviours > in Python apply to whole types. E.g. we refuse to compare complex > numbers for ordering, even if their values happen to be real, > so if you try that you get an early exception. But the weirdness > with NaNs only shows up in corner cases that may escape testing. > > Now, there *is* a third possibility -- we could raise an exception > if a comparison involving NaNs is attempted. This would be a > more faithful way of adhering to the IEEE 754 specification that > NaNs are "unordered". More importantly, it would make the second code > transformation above valid in all cases. > > So the question that really needs to be answered, I think, is > not "Why is NaN == NaN false?", but "Why doesn't NaN == anything > raise an exception, when it would make so much more sense to > do so?" Because == raising an exception is really unpleasant. We had this in Python 2 for unicode/str comparisons and it was very awkward. Nobody arguing against the status quo seems to care at all about numerical algorithms though. 
I propose that you go find some numerical mathematicians and ask them. -- --Guido van Rossum (python.org/~guido) From christian at python.org Tue Oct 9 02:13:03 2012 From: christian at python.org (Christian Heimes) Date: Tue, 09 Oct 2012 02:13:03 +0200 Subject: [Python-ideas] asyncore: included batteries don't fit In-Reply-To: References: <5072C972.5070207@python.org> Message-ID: <50736C0F.90401@python.org> Am 08.10.2012 17:35, schrieb Guido van Rossum: > On Mon, Oct 8, 2012 at 5:39 AM, Christian Heimes wrote: >> Python's standard library doesn't contain in interface to I/O Completion >> Ports. I think a common event loop system is a good reason to add IOCP >> if somebody is up for the challenge. >> >> Would you prefer an IOCP wrapper in the stdlib or your own version? >> Twisted has its own Cython based wrapper, some other libraries use a >> libevent-based solution. > > What's an IOCP? I/O Completion Ports, http://en.wikipedia.org/wiki/IOCP It's a Windows (and apparently also Solaris) API for async IO that can handle multiple threads. Christian From steve at pearwood.info Tue Oct 9 02:28:13 2012 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 09 Oct 2012 11:28:13 +1100 Subject: [Python-ideas] Subpaths [was Re: PEP 428 - object-oriented filesystem paths] In-Reply-To: References: <20121005202534.5f721292@pitrou.net> <87bogfvrni.fsf@uwakimon.sk.tsukuba.ac.jp> <20121006141858.73b42c38@pitrou.net> <50731A31.30606@pearwood.info> Message-ID: <50736F9D.7000509@pearwood.info> Nick, I've come to the conclusion that you are right to prefer a named method over an operator for joining paths. But I think you are wrong to name that method "subpath" -- see below. On 09/10/12 05:39, Nick Coghlan wrote: > On Mon, Oct 8, 2012 at 11:53 PM, Steven D'Aprano wrote: >>> "p.subpath('foo', 'bar')" looks like executable >>> pseudocode for creating a new path based on existing one to me, >> >> >> That notation quite possibly goes beyond unintuitive to downright >> perverse. You are using a method called "subpath" to generate a >> *superpath* (deeper, longer path which includes p as a part). > > Huh? It's a tree structure. A subpath lives inside its parent path, > just as subnodes are children of their parent node. Agreed it's not a > widely used term though - it's a generalisation of subdirectory to > also cover file paths. I believe you mentioned in an earlier email that you invented the term for this discussion. Quote: I made it up by using "make subpath" as the reverse of "get relative path". Unfortunately subpath already has an established meaning, and it is the complete opposite of the sense you intend: paths are trees are graphs, and the graph a->b->c->d is a superpath, not subpath, of a->b->c: a->b->c is strictly contained within a->b->c->d; the reverse is not true. Just as "abcd" is a superstring of "abc", not a substring. Likewise for superset and subset. And likewise for trees (best viewed in a monospaced font): a-b-c \ f-g One can say that the tree a-f-g is a subtree of the whole, but one cannot say that a-f-g-h is a subtree since h is not a part of the first tree. > They're certainly not "super" anything, any more than a subdirectory > is really a superdirectory (which is what you appear to be arguing). Common usage is that "subdirectory" gets used for relative paths: given path /a/b/c/d, we say that "d" is a subdirectory of /a/b/c. I've never come across anyone giving d in absolute terms. 
Now perhaps I've lived a sheltered life *wink* and people do talk about subdirectories in absolute paths all the time. That's fine. But they don't talk about "subpaths" in the sense you intend, and the sense you intend goes completely against the established sense. The point is, despite the common "sub" prefix, the semantics of "subdirectory" is quite different from the semantics of "substring", "subset", "subtree" and "subpath". -- Steven From tjreedy at udel.edu Tue Oct 9 02:28:51 2012 From: tjreedy at udel.edu (Terry Reedy) Date: Mon, 08 Oct 2012 20:28:51 -0400 Subject: [Python-ideas] PEP 428: poll about the joining syntax In-Reply-To: <20121008204707.48559bf9@pitrou.net> References: <20121008204707.48559bf9@pitrou.net> Message-ID: On 10/8/2012 2:47 PM, Antoine Pitrou wrote: > > Hello, > > Since there has been some controversy about the joining syntax used in > PEP 428 (filesystem path objects), I would like to run an informal poll > about it. Please answer with +1/+0/-0/-1 for each proposal: > > - `p[q]` joins path q to path p -1 to this > - `p + q` joins path q to path p > - `p / q` joins path q to path p > - `p.join(q)` joins path q to path p currently neutral between these -- Terry Jan Reedy From oscar.j.benjamin at gmail.com Tue Oct 9 02:32:39 2012 From: oscar.j.benjamin at gmail.com (Oscar Benjamin) Date: Tue, 9 Oct 2012 01:32:39 +0100 Subject: [Python-ideas] checking for identity before comparing built-in objects In-Reply-To: References: <506D94EE.30808@pearwood.info> <50723BE5.3060300@nedbatchelder.com> <507243D2.8000505@btinternet.com> <50736983.5030001@canterbury.ac.nz> Message-ID: On 9 October 2012 01:11, Guido van Rossum wrote: > On Mon, Oct 8, 2012 at 5:02 PM, Greg Ewing wrote: >> >> So the question that really needs to be answered, I think, is >> not "Why is NaN == NaN false?", but "Why doesn't NaN == anything >> raise an exception, when it would make so much more sense to >> do so?" > > Because == raising an exception is really unpleasant. We had this in > Python 2 for unicode/str comparisons and it was very awkward. > > Nobody arguing against the status quo seems to care at all about > numerical algorithms though. I propose that you go find some numerical > mathematicians and ask them. The main purpose of quiet NaNs is to propagate through computation ruining everything they touch. In a programming language like C that lacks exceptions this is important as it allows you to avoid checking all the time for invalid values, whilst still being able to know if the end result of your computation was ever affected by an invalid numerical operation. The reasons for NaNs to compare unequal are no doubt related to this purpose. It is of course arguable whether the same reasoning applies to a language like Python that has a very good system of exceptions but I agree with Guido that raising an exception on == would be unfortunate. How many people would forget that they needed to catch those exceptions? How awkward could your code be if you did remember to catch all those exceptions? In an exception handling language it's important to know that there are some operations that you can trust. 
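For example, the quiet-NaN behaviour lets a whole computation run and be checked once
at the end, with no exception handling anywhere in the middle:

    import math

    values = [1.0, float('nan'), 2.0]
    total = sum(values)      # the NaN flows quietly through the arithmetic
    math.isnan(total)        # True: the final result records that something went wrong,
                             # without a single try/except along the way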
Oscar From christian at python.org Tue Oct 9 02:35:27 2012 From: christian at python.org (Christian Heimes) Date: Tue, 09 Oct 2012 02:35:27 +0200 Subject: [Python-ideas] PEP 428: poll about the joining syntax In-Reply-To: References: <20121008204707.48559bf9@pitrou.net> Message-ID: <5073714F.3070206@python.org> Am 08.10.2012 21:15, schrieb Nick Coghlan: > My own current preference is to take "p.joinpath(q)" straight from > path.py (https://github.com/jaraco/path.py/blob/master/path.py#L236). [...] > I don't *love* joinpath as a name, I just don't actively dislike it > the way I do the four presented options (and it has the virtue of the > path.py precedent). I dislike + and [] because I find the result too surprising. If I'd be forced to choose between +, / and [] then I would go for / as it looks kinda like a path. +1 for p.joinpath(*args). It's really a must have feature. The name is debatable, though. +0 for p / sub -1 for p + sub and p[sub] Christian From stephen at xemacs.org Tue Oct 9 02:37:52 2012 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Tue, 09 Oct 2012 09:37:52 +0900 Subject: [Python-ideas] checking for identity before comparing built-in objects In-Reply-To: References: <506D94EE.30808@pearwood.info> <50723BE5.3060300@nedbatchelder.com> <50733A18.10400@nedbatchelder.com> Message-ID: <87obkcpjj3.fsf@uwakimon.sk.tsukuba.ac.jp> Guido van Rossum writes: > Sounds good. (But now maybe we also need to come clean with the > exceptions for NaNs compared as part of container comparisons?) For a second I thought you meant IEEE 754 Exceptions. Whew! How about: """ For reasons of efficiency, Python allows comparisons of containers to shortcut element comparisons. These shortcuts mean that it is possible that comparison of two containers may return True, even if they contain NaNs. For details, see the language reference[1]. """ Longer than I think it deserves, but maybe somebody has a better idea? Footnotes: [1] Sorry about that, but details don't really belong in a *Python* tutorial. Maybe this should be "see the implementation notes"? From ironfroggy at gmail.com Tue Oct 9 02:39:13 2012 From: ironfroggy at gmail.com (Calvin Spealman) Date: Mon, 8 Oct 2012 20:39:13 -0400 Subject: [Python-ideas] PEP 428: poll about the joining syntax In-Reply-To: <20121008204707.48559bf9@pitrou.net> References: <20121008204707.48559bf9@pitrou.net> Message-ID: On Mon, Oct 8, 2012 at 2:47 PM, Antoine Pitrou wrote: > > Hello, > > Since there has been some controversy about the joining syntax used in > PEP 428 (filesystem path objects), I would like to run an informal poll > about it. Please answer with +1/+0/-0/-1 for each proposal: > > - `p[q]` joins path q to path p > -1 This syntax makes no sense, it doesn't match the syntax in an obvious way > - `p + q` joins path q to path p > -1 Too easy to confuse with string concat > - `p / q` joins path q to path p > +1 Reads like a path, makes logical sense > - `p.join(q)` joins path q to path p > +1 Allows passing as a callable (tho we could just use operator module). I think we should have both / and .join() (you can include a rationale if you want, but don't forget to vote :-)) > > Thank you > > Antoine. > > > -- > Software development and contracting: http://pro.pitrou.net > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -- Read my blog! I depend on your acceptance of my opinion! I am interesting! 
http://techblog.ironfroggy.com/ Follow me if you're into that sort of thing: http://www.twitter.com/ironfroggy -------------- next part -------------- An HTML attachment was scrubbed... URL: From stephen at xemacs.org Tue Oct 9 02:50:52 2012 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Tue, 09 Oct 2012 09:50:52 +0900 Subject: [Python-ideas] checking for identity before comparing built-in objects In-Reply-To: References: <506D94EE.30808@pearwood.info> Message-ID: <87mwzwpixf.fsf@uwakimon.sk.tsukuba.ac.jp> Terry Reedy writes: > I wonder if it would be helpful to make a NaN subclass of floats with > its own arithmetic and comparison methods. It can't be helpful, unless you go a lot further. Specifically, you'd need to require containers to check every element for NaN-ness. That doesn't seem very practical. In any case, the presentation by Kahan (cited earlier by Alexander himself) demolishes the idea that any sort of attempt to implement DWIM for floats in a programming language can succeed at the present state of the art. The best we can get is DWGM ("do what Guido means", even if what Guido means is "ask the Timbot"). Kahan pretty explicitly endorses this approach, by the way. At least in the context of choosing default policy for IEEE 754 Exceptions. From carlopires at gmail.com Tue Oct 9 02:57:57 2012 From: carlopires at gmail.com (Carlo Pires) Date: Mon, 8 Oct 2012 21:57:57 -0300 Subject: [Python-ideas] PEP 428: poll about the joining syntax In-Reply-To: <20121008204707.48559bf9@pitrou.net> References: <20121008204707.48559bf9@pitrou.net> Message-ID: 2012/10/8 Antoine Pitrou > - `p[q]` joins path q to path p > -1 > - `p + q` joins path q to path p > +0 > - `p / q` joins path q to path p > -1 > - `p.join(q)` joins path q to path p > +1 -- Carlo Pires -------------- next part -------------- An HTML attachment was scrubbed... URL: From ironfroggy at gmail.com Tue Oct 9 03:02:11 2012 From: ironfroggy at gmail.com (Calvin Spealman) Date: Mon, 8 Oct 2012 21:02:11 -0400 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: References: <20121005202534.5f721292@pitrou.net> Message-ID: On Mon, Oct 8, 2012 at 1:59 PM, Nick Coghlan wrote: > On Sat, Oct 6, 2012 at 9:44 PM, Calvin Spealman > wrote: > > Responding late, but I didn't get a chance to get my very strong > > feelings on this proposal in yesterday. > > > > I do not like it. I'll give full disclosure and say that I think our > > earlier failure to include the path library in the stdlib has been a > > loss for Python and I'll always hope we can fix that one day. I still > > hold out hope. > > > > It feels like this proposal is "make it object oriented, because > > object oriented is good" without any actual justification or obvious > > problem this solves. The API looks clunky and redundant, and does not > > appear to actually improve anything over the facilities in the os.path > > module. This takes a lot of things we can already do with paths and > > files and remixes them into a not-so intuitive API for the sake of > > change, not for the sake of solving a real problem. 
> > The PEP needs to better articulate the rationale, but the key points are: > - better abstraction and encapsulation of cross-platform logic so file > manipulation algorithms written on Windows are more likely to work > correctly on POSIX systems (and vice-versa) > Frankly, for 99% of file path work, anything I do on one "just works" on the other, and complicating things with these POSIX versus NT path types just seems to be a whole lot of early complication for a few edge cases most people never see. Simplest example is requiring the backslash separator on NT when it handles forward slash, just like POSIX, just fine, and has for a long, long time. > - improved ability to manipulate paths with Windows semantics on a > POSIX system (and vice-versa) > - better support for creation of "mock" filesystem APIs > I admit the mock FS intrigues me > > As for specific problems I have with the proposal: > > > > Frankly, I think not keeping the / operator for joining is a huge > > mistake. This is the number one best feature of path and despite that > > many people don't like it, it makes sense. It makes our most common > > path operation read very close to the actual representation of the > > what you're creating. This is great. > > It trades readability (and discoverability) for brevity. Not good. > I thought it had all three. In these situations, where my and another's perception of a systems strengths and weaknesses are opposite, I don't really know how to make a good response. :-/ > Not inheriting from str means that we can't directly path these path > > objects to existing code that just expects a string, so we have a > > really hard boundary around the edges of this new API. It does not > > lend itself well to incrementally transitioning to it from existing > > code. > > It's the exact design philosophy as was used in the creation of the > new ipaddress module: the objects in ipaddress must still be converted > to a string or integer before they can be passed to other operations > (such as the socket module APIs). Strings and integers remain the data > interchange formats here as well (although far more focused on strings > in the path case). > > > > > The stat operations and other file-facilities tacked on feel out of > > place, and limited. Why does it make sense to add these facilities to > > path and not other file operations? Why not give me a read method on > > paths? or maybe a copy? Putting lots of file facilities on a path > > object feels wrong because you can't extend it easily. This is one > > place that function(thing) works better than thing.function() > > Indeed, I'm personally much happier with the "pure" path classes than > I am with the ones that can do filesystem manipulation. Having both > "p.open(mode)" and "open(str(p), mode)" seems strange. OTOH, I can see > the attraction in being able to better fake filesystem access through > the method API, so I'm willing to go along with it. > > > Overall, I'm completely -1 on the whole thing. > > I find this very hard to square with your enthusiastic support for > path.py. Like ipaddr, which needed to clean up its semantic model > before it could be included in the standard library (as ipaddress), we > need a clean cross-platform semantic model for path objects before a > convenience API can be added for manipulating them. > I somewhat dislike this because I loved path.py so much and this proposal seems to actively avoid exactly the aspects of path.py that I enjoyed the most (like the / joining). > Cheers, > Nick. 
> path.py was in teh wild, and is still in use. Why do we find ourselves debating new libraries like this as PEPs? We need to let them play out, see what sticks. If someone wants to make this library and stick it on PyPI, I'm not stopping them. I'm encouraging it. Let's see how it plays out. if it works out well, it deserves a PEP. In two or three years. -- Read my blog! I depend on your acceptance of my opinion! I am interesting! http://techblog.ironfroggy.com/ Follow me if you're into that sort of thing: http://www.twitter.com/ironfroggy -------------- next part -------------- An HTML attachment was scrubbed... URL: From zachary.ware+pyideas at gmail.com Tue Oct 9 03:05:35 2012 From: zachary.ware+pyideas at gmail.com (Zachary Ware) Date: Mon, 8 Oct 2012 20:05:35 -0500 Subject: [Python-ideas] PEP 428: poll about the joining syntax In-Reply-To: <20121008204707.48559bf9@pitrou.net> References: <20121008204707.48559bf9@pitrou.net> Message-ID: Speaking as a relatively inexperienced user (whose opinions should justly be given little weight as such), > - `p[q]` joins path q to path p -1: Doesn't make sense at first glance > - `p + q` joins path q to path p -1: For reasons stated elsewhere by several others; Path + (Path or str) != str + str > - `p / q` joins path q to path p +1: Short, makes sense if you can get your brain past "/ in Python means 'divide'" > - `p.join(q)` joins path q to path p +1: Except it needs a different name, for the same reasons as + What about p.unite(q)? The one word definition of 'join' is 'unite' and it's definitely not used by str, and I don't know of anywhere else that it is used. And it's only one extra character instead of the 4 of 'pathjoin' or 'joinpath'. From guido at python.org Tue Oct 9 03:07:48 2012 From: guido at python.org (Guido van Rossum) Date: Mon, 8 Oct 2012 18:07:48 -0700 Subject: [Python-ideas] checking for identity before comparing built-in objects In-Reply-To: References: <506D94EE.30808@pearwood.info> <50723BE5.3060300@nedbatchelder.com> <507243D2.8000505@btinternet.com> <50736983.5030001@canterbury.ac.nz> Message-ID: On Mon, Oct 8, 2012 at 5:32 PM, Oscar Benjamin wrote: > On 9 October 2012 01:11, Guido van Rossum wrote: >> On Mon, Oct 8, 2012 at 5:02 PM, Greg Ewing wrote: >>> >>> So the question that really needs to be answered, I think, is >>> not "Why is NaN == NaN false?", but "Why doesn't NaN == anything >>> raise an exception, when it would make so much more sense to >>> do so?" >> >> Because == raising an exception is really unpleasant. We had this in >> Python 2 for unicode/str comparisons and it was very awkward. >> >> Nobody arguing against the status quo seems to care at all about >> numerical algorithms though. I propose that you go find some numerical >> mathematicians and ask them. > > The main purpose of quiet NaNs is to propagate through computation > ruining everything they touch. In a programming language like C that > lacks exceptions this is important as it allows you to avoid checking > all the time for invalid values, whilst still being able to know if > the end result of your computation was ever affected by an invalid > numerical operation. The reasons for NaNs to compare unequal are no > doubt related to this purpose. > > It is of course arguable whether the same reasoning applies to a > language like Python that has a very good system of exceptions but I > agree with Guido that raising an exception on == would be unfortunate. > How many people would forget that they needed to catch those > exceptions? 
How awkward could your code be if you did remember to > catch all those exceptions? In an exception handling language it's > important to know that there are some operations that you can trust. If we want to do *anything* I think we should first introduce a floating point context similar to the Decimal context. Then we can talk. -- --Guido van Rossum (python.org/~guido) From raymond.hettinger at gmail.com Tue Oct 9 03:13:25 2012 From: raymond.hettinger at gmail.com (Raymond Hettinger) Date: Mon, 8 Oct 2012 18:13:25 -0700 Subject: [Python-ideas] Make "is" checks on non-singleton literals errors In-Reply-To: References: Message-ID: On Oct 8, 2012, at 12:44 PM, Mike Graham wrote: > I regularly see learners using "is" to check for string equality and > sometimes other equality. Due to optimizations, they often come away > thinking it worked for them. > > There are no cases where > > if x is "foo": > > or > > if x is 4: > > is actually the code someone intended to write. > > Although this has no benefit to anyone but new learners, it also > doesn't really do any harm. This seems like a job for pyflakes, pylint, or pychecker. Raymond From ironfroggy at gmail.com Tue Oct 9 03:14:57 2012 From: ironfroggy at gmail.com (Calvin Spealman) Date: Mon, 8 Oct 2012 21:14:57 -0400 Subject: [Python-ideas] Make "is" checks on non-singleton literals errors In-Reply-To: References: Message-ID: On Mon, Oct 8, 2012 at 3:44 PM, Mike Graham wrote: > > I regularly see learners using "is" to check for string equality and > sometimes other equality. Due to optimizations, they often come away > thinking it worked for them. > > There are no cases where > > if x is "foo": > > or > > if x is 4: > > is actually the code someone intended to write. > > Although this has no benefit to anyone but new learners, it also > doesn't really do any harm. +1 > Mike > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas -- Read my blog! I depend on your acceptance of my opinion! I am interesting! http://techblog.ironfroggy.com/ Follow me if you're into that sort of thing: http://www.twitter.com/ironfroggy From alexander.belopolsky at gmail.com Tue Oct 9 03:31:40 2012 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Mon, 8 Oct 2012 21:31:40 -0400 Subject: [Python-ideas] checking for identity before comparing built-in objects In-Reply-To: References: <506D94EE.30808@pearwood.info> <50723BE5.3060300@nedbatchelder.com> <507243D2.8000505@btinternet.com> Message-ID: On Mon, Oct 8, 2012 at 5:17 PM, Terry Reedy wrote: > Alexander, while I might have chosen to make nan == nan True, I consider it > a near tossup with no happy resolution and would not change it now. While I did suggest to change nan == nan result two years ago, , I am not suggesting it now. Here I am merely trying to understand to what extent Python's float is implementing IEEE 754 and why in some cases Python's behavior deviates from the standard while in the case of nan == nan, IEEE 754 is taken as a gospel. > Guido's > explanation is pretty clear: he went with the IEEE standard as interpreted > for Python by Tim Peters. It would be helpful if that interpretation was clearly written somewhere. Without a written document this interpretation seems apocryphal to me. Earlier in this thread, Guido wrote: "I am not aware of an update to the standard." To the best of my knowledge IEEE Std 754 was last updated in 2008. 
I don't think the differences between 1985 and 2008 revisions matter much for this discussion, but since I am going to refer to chapter and verse, I will start by citing the document that I will use: IEEE Std 754(TM)-2008 (Revision of IEEE Std 754-1985) IEEE Standard for Floating-Point Arithmetic Approved 12 June 2008 IEEE-SA Standards Board (AFAICT, the main difference between 754-2008 and 754-1985 is that the former includes decimal floats added in 854-1987.) Now, let me put my language lawyer hat on and compare Python floating point implementations to IEEE 754-2008 standard. Here are the relevant clauses: 3. Floating-point formats 4. Attributes and rounding 5. Operations 6. Infinity, NaNs, and sign bit 7. Default exception handling 8. Alternate exception handling attributes 9. Recommended operations 10. Expression evaluation 11. Reproducible floating-point results Clause 3 (Floating-point formats) defines five formats: 3 binary and 2 decimal. Python supports a superset of decimal formats and a single binary format. Section 3.1.2 (Conformance) contains the following provision: "A programming environment conforms to this standard, in a particular radix, by implementing one or more of the basic formats of that radix as both a supported arithmetic format and a supported interchange format." I would say Python is conforming to Clause 3. Clause 4 (Attributes and rounding) is supported only by Decimal through contexts: "For attribute specification, the implementation shall provide language-defined means, such as compiler directives, to specify a constant value for the attribute parameter for all standard operations in a block; the scope of the attribute value is the block with which it is associated." I believe Decimal is mostly conforming, but float is not conforming at all. Clause 5 requires "[a]ll conforming implementations of this standard shall provide the operations listed in this clause for all supported arithmetic formats, except as stated below." In other words, a language standard that claims conformance with IEEE 754 must provide all operations unless the standard states otherwise. Let's try to map IEEE 754 required operations to Python float operations. 5.3.1 General operations sourceFormat roundToIntegralTiesToEven(source) sourceFormat roundToIntegralTiesToAway(source) sourceFormat roundToIntegralTowardZero(source) sourceFormat roundToIntegralTowardPositive(source) sourceFormat roundToIntegralTowardNegative(source) sourceFormat roundToIntegralExact(source) Python only provides float.__trunc__ which implements roundToIntegralTowardZero. (The builtin round() belongs to a different category because it changes format from double to int.) sourceFormat nextUp(source) sourceFormat nextDown(source) I don't think these are available for Python floats. sourceFormat remainder(source, source) - float.__mod__ Not fully conforming. For example, the standard requires remainder(-2.0, 1.0) to return -0.0, but in Python 3.3: >>> -2.0 % 1.0 0.0 On the other hand, >>> math.fmod(-2.0, 1.0) -0.0 sourceFormat minNum(source, source) sourceFormat maxNum(source, source) sourceFormat minNumMag(source, source) sourceFormat maxNumMag(source, source) I don't think these are available for Python floats. 5.3.3 logBFormat operations I don't think these are available for Python floats. 
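(As an aside, nextUp/nextDown can be emulated by hand through the bit pattern -- a
rough sketch only, ignoring NaNs, infinities and -0.0:)

    import struct

    def next_up(x):
        # Move a binary64 float one step towards +infinity (sketch only).
        n = struct.unpack('<q', struct.pack('<d', x))[0]
        n += 1 if n >= 0 else -1
        return struct.unpack('<d', struct.pack('<q', n))[0]

    next_up(1.0) == 1.0 + 2.0**-52    # True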
5.4.1 Arithmetic operations formatOf-addition(source1, source2) - float.__add__ formatOf-subtraction(source1, source2) - float.__sub__ formatOf-multiplication(source1, source2) - float.__mul__ formatOf-division(source1, source2) - float.__truediv__ formatOf-squareRoot(source1) - math.sqrt formatOf-fusedMultiplyAdd(source1, source2, source3) - missing formatOf-convertFromInt(int) - float.__new__ With exception of fusedMultiplyAdd, Python float is conforming. intFormatOf-convertToIntegerTiesToEven(source) intFormatOf-convertToIntegerTowardZero(source) intFormatOf-convertToIntegerTowardPositive(source) intFormatOf-convertToIntegerTowardNegative(source) intFormatOf-convertToIntegerTiesToAway(source) intFormatOf-convertToIntegerExactTiesToEven(source) intFormatOf-convertToIntegerExactTowardZero(source) intFormatOf-convertToIntegerExactTowardPositive(source) intFormatOf-convertToIntegerExactTowardNegative(source) intFormatOf-convertToIntegerExactTiesToAway(source) Python has a single builtin round(). 5.5.1 Sign bit operations sourceFormat copy(source) - float.__pos__ sourceFormat negate(source) - float.__neg__ sourceFormat abs(source) - float.__abs__ sourceFormat copySign(source, source) - math.copysign Python float is conforming. Now we are getting close to the issue at hand: """ 5.6.1 Comparisons Implementations shall provide the following comparison operations, for all supported floating-point operands of the same radix in arithmetic formats: boolean compareQuietEqual(source1, source2) boolean compareQuietNotEqual(source1, source2) boolean compareSignalingEqual(source1, source2) boolean compareSignalingGreater(source1, source2) boolean compareSignalingGreaterEqual(source1, source2) boolean compareSignalingLess(source1, source2) boolean compareSignalingLessEqual(source1, source2) boolean compareSignalingNotEqual(source1, source2) boolean compareSignalingNotGreater(source1, source2) boolean compareSignalingLessUnordered(source1, source2) boolean compareSignalingNotLess(source1, source2) boolean compareSignalingGreaterUnordered(source1, source2) boolean compareQuietGreater(source1, source2) boolean compareQuietGreaterEqual(source1, source2) boolean compareQuietLess(source1, source2) boolean compareQuietLessEqual(source1, source2) boolean compareQuietUnordered(source1, source2) boolean compareQuietNotGreater(source1, source2) boolean compareQuietLessUnordered(source1, source2) boolean compareQuietNotLess(source1, source2) boolean compareQuietGreaterUnordered(source1, source2) boolean compareQuietOrdered(source1, source2). """ Signaling comparisons are missing. Ordered/Unordered comparisons are missing. Note that the standard does not require any particular spelling for operations. "In this standard, operations are written as named functions; in a specific programming environment they might be represented by operators, or by families of format-specific functions, or by operations or functions whose names might differ from those in this standard." (Sec. 5.1) It would be perfectly conforming for python to spell compareSignalingEqual() as '==' and compareQuietEqual() as math.eq() or even ieee745_2008.compareQuietEqual(). The choice that Python made was not dictated by the standard. (As I have shown above, Python's % operation does not implement a conforming IEEE 754 residual(), but math.fmod() seems to fill the gap.) This post is already too long, so I'll leave Clauses 6-11 for another time. "IEEE 754 may be more complex than you think!" (GvR, earlier in this thread.) 
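To make the quiet/signaling distinction concrete, here is a sketch of the two flavours
of equality on top of Python floats (the names merely echo the standard's; nothing like
this exists in the stdlib):

    import math

    def compare_quiet_equal(x, y):
        # "Quiet" equality: NaN operands just compare unequal, nothing is signaled.
        return x == y                  # float.__eq__ already behaves this way

    def compare_signaling_equal(x, y):
        # "Signaling" equality: unordered operands (NaNs) raise instead.
        if math.isnan(x) or math.isnan(y):
            raise ArithmeticError("unordered comparison")
        return x == y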
I hope I already made the case that Python's float does not conform to IEEE 754 and that IEEE 754 does not require an operation spelled "==" or "float.__eq__" to return False when comparing two NaNs. The standard requires support for 22 comparison operations, but Python's float supports around six. On top of that, Python has an operation that has no analogue in IEEE 754 - the "is" comparison. This is why IEEE 754 standard does not help in answering the main question in this thread: should (x is y) imply (x == y)? We need to formulate a rationale for breaking this implication without a reference to IEEE 754 or Tim's interpretation thereof. Language-lawyierly-yours, Alexander Belopolsky From alexander.belopolsky at gmail.com Tue Oct 9 03:37:47 2012 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Mon, 8 Oct 2012 21:37:47 -0400 Subject: [Python-ideas] checking for identity before comparing built-in objects In-Reply-To: References: <506D94EE.30808@pearwood.info> <50723BE5.3060300@nedbatchelder.com> <507243D2.8000505@btinternet.com> <50736983.5030001@canterbury.ac.nz> Message-ID: On Mon, Oct 8, 2012 at 9:07 PM, Guido van Rossum wrote: > If we want to do *anything* I think we should first introduce a > floating point context similar to the Decimal context. Then we can > talk. +float('inf') From steve at pearwood.info Tue Oct 9 04:03:27 2012 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 9 Oct 2012 13:03:27 +1100 Subject: [Python-ideas] Make "is" checks on non-singleton literals errors In-Reply-To: References: Message-ID: <20121009020327.GB27445@ando> On Mon, Oct 08, 2012 at 12:48:07PM -0700, Guido van Rossum wrote: > On Mon, Oct 8, 2012 at 12:44 PM, Mike Graham wrote: > > I regularly see learners using "is" to check for string equality and > > sometimes other equality. Due to optimizations, they often come away > > thinking it worked for them. > > > > There are no cases where > > > > if x is "foo": > > > > or > > > > if x is 4: > > > > is actually the code someone intended to write. > > > > Although this has no benefit to anyone but new learners, it also > > doesn't really do any harm. > > I think the best we can do is to make these SyntaxWarnings. I had the > same thought recently and I do agree that these are common beginners > mistakes that can easily hide bugs by succeeding in simple tests. In my experience beginners barely read error messages, let alone warnings. A SyntaxWarning might help intermediate users who have graduated beyond the stage of "my program doesn't work, please somebody fix it", but I believe that at best it will be ignored by beginners, if not actively confuse them. And I expect that most intermediate users will have already learned enough not to use "is" when then mean "==". So I'm -0 on doing anything to "fix" this. Many things in Python are potentially misleading: array = [[0]*10]*10 On the other hand, I must admit that I've been known to accidently write "if x is 0:", so perhaps the real benefit is to prevent silly brainos (like typos -- thinkos perhaps?) among more experienced coders. Perhaps I should increase my vote to +0. 
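For what it's worth, the deceptive part is easy to reproduce at a CPython prompt. The results below depend on CPython's caching of small ints and its interning of identifier-like string literals, so they illustrate the trap rather than any guaranteed behaviour:

>>> x = 256
>>> x is 256        # small ints are cached, so this "works"...
True
>>> x = 257
>>> x is 257        # ...until the value falls outside the cache
False
>>> s = "foo"
>>> s is "foo"      # the literal happens to be interned here
True
>>> s = "".join(["f", "o", "o"])
>>> s is "foo"      # a computed string is a different object
False
>>> s == "foo"      # == is what was meant all along
True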
-- Steven From guido at python.org Tue Oct 9 04:09:53 2012 From: guido at python.org (Guido van Rossum) Date: Mon, 8 Oct 2012 19:09:53 -0700 Subject: [Python-ideas] checking for identity before comparing built-in objects In-Reply-To: References: <506D94EE.30808@pearwood.info> <50723BE5.3060300@nedbatchelder.com> <507243D2.8000505@btinternet.com> Message-ID: On Mon, Oct 8, 2012 at 6:31 PM, Alexander Belopolsky wrote: > IEEE 754 standard does not help in answering > the main question in this thread: should (x is y) imply (x == y)? We > need to formulate a rationale for breaking this implication without a > reference to IEEE 754 or Tim's interpretation thereof. Such a rationale exists in my mind. Since floats are immutable, an implementation may or may not intern certain float values (just as certain string and int values are interned but others are not). Therefore, the fact that "x is y" says nothing about whether the computations that produced x and y had anything to do with each other. This is not true for mutable objects: if I have two lists, computed separately, and find they are the same object, the computations that produced them must have communicated somehow, or the same list was passed in to each computations. So, since two computations might return the same object without having followed the same computational path, in another implementation the exact same computation might not return the same object, and so the == comparison should produce the same value in either case -- in particular, if x and y are both NaN, all 6 comparisons on them should return False (given that in general comparing two NaNs returns False regardless of the operator used). The reason for invoking IEEE 754 here is that without it, Python might well have grown a language-wide rule stating that an object should *always* compare equal to itself, as there would have been no significant counterexamples. (As it is, such a rule only exists for containers, and technically even there it is optional -- it is just not required for containers to invoke == for contained items that reference the same object.) -- --Guido van Rossum (python.org/~guido) From steve at pearwood.info Tue Oct 9 04:12:15 2012 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 9 Oct 2012 13:12:15 +1100 Subject: [Python-ideas] PEP 428: poll about the joining syntax In-Reply-To: References: <20121008204707.48559bf9@pitrou.net> Message-ID: <20121009021215.GC27445@ando> On Mon, Oct 08, 2012 at 11:54:06AM -0700, Guido van Rossum wrote: > I don't like any of those; I'd vote for another regular method, maybe > p.pathjoin(q). Path.pathjoin? Like list.listappend and dict.dictupdate perhaps? :-) I'm never going to remember whether it is pathjoin or joinpath. -1 on method names that repeat the type name. -- Steven From guido at python.org Tue Oct 9 04:14:37 2012 From: guido at python.org (Guido van Rossum) Date: Mon, 8 Oct 2012 19:14:37 -0700 Subject: [Python-ideas] Make "is" checks on non-singleton literals errors In-Reply-To: <20121009020327.GB27445@ando> References: <20121009020327.GB27445@ando> Message-ID: On Mon, Oct 8, 2012 at 7:03 PM, Steven D'Aprano wrote: > On Mon, Oct 08, 2012 at 12:48:07PM -0700, Guido van Rossum wrote: >> On Mon, Oct 8, 2012 at 12:44 PM, Mike Graham wrote: >> > I regularly see learners using "is" to check for string equality and >> > sometimes other equality. Due to optimizations, they often come away >> > thinking it worked for them. 
>> > >> > There are no cases where >> > >> > if x is "foo": >> > >> > or >> > >> > if x is 4: >> > >> > is actually the code someone intended to write. >> > >> > Although this has no benefit to anyone but new learners, it also >> > doesn't really do any harm. >> >> I think the best we can do is to make these SyntaxWarnings. I had the >> same thought recently and I do agree that these are common beginners >> mistakes that can easily hide bugs by succeeding in simple tests. > > In my experience beginners barely read error messages, let alone > warnings. > > A SyntaxWarning might help intermediate users who have graduated beyond > the stage of "my program doesn't work, please somebody fix it", but I > believe that at best it will be ignored by beginners, if not actively > confuse them. And I expect that most intermediate users will have > already learned enough not to use "is" when then mean "==". > > So I'm -0 on doing anything to "fix" this. Many things in Python are > potentially misleading: > > array = [[0]*10]*10 > > On the other hand, I must admit that I've been known to accidently write > "if x is 0:", so perhaps the real benefit is to prevent silly brainos > (like typos -- thinkos perhaps?) among more experienced coders. Perhaps > I should increase my vote to +0. Exactly. Pragmatically, in large code bases this occurs frequently enough to worry about it, and (unlike language warts like the aliasing problem you alluded to above) it serves no useful purpose. I have seen this particular mistake reported many times in Google's extensive Python codebase. Maybe we should do something more drastic and always create a new, unique constant whenever a literal occurs as an argument of 'is' or 'is not'? Then such code would never work, leading people to examine their code more closely. I betcha we have people who could change the bytecode compiler easily enough to do that. (I'm not seriously proposing this, except as a threat of what we could do if the SyntaxWarning is rejected. :-) -- --Guido van Rossum (python.org/~guido) From tjreedy at udel.edu Tue Oct 9 04:19:44 2012 From: tjreedy at udel.edu (Terry Reedy) Date: Mon, 08 Oct 2012 22:19:44 -0400 Subject: [Python-ideas] Propagating StopIteration value In-Reply-To: References: <5070D658.9020300@pearwood.info> <50721095.1000800@canterbury.ac.nz> Message-ID: On 10/8/2012 5:12 PM, Serhiy Storchaka wrote: > On 08.10.12 05:40, Terry Reedy wrote: >> Serhily, if you want a module of *generator* specific functions >> ('gentools' ?), you should write one and submit it to pypi for testing. >> > > In http://bugs.python.org/issue16150 there is proposed extending of > itertools.chain to support generators (send(), throw() and close() > methods). Is it wrong? Yes -- Terry Jan Reedy From steve at pearwood.info Tue Oct 9 04:26:53 2012 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 9 Oct 2012 13:26:53 +1100 Subject: [Python-ideas] PEP 428: poll about the joining syntax In-Reply-To: <20121008204707.48559bf9@pitrou.net> References: <20121008204707.48559bf9@pitrou.net> Message-ID: <20121009022653.GD27445@ando> Summary: - favourite & second-favourite choices: `p.join(q)` or `p.add(q)`. - most disliked: `p[q]` `p[q]` -1: looks like indexing or key-lookup, isn't either. `p + q` +0: potential confusion between path component joining and filename suffix appending. `p / q` +0: looks funny if either arg is a literal string. 
`p.join(q)`: +1: self-explanatory, suggests os.path.join, I am not convinced that Nick's fears about confusing Path.join and str.join will be a problem in practice. `p.pathjoin(q)` and `p.joinpath(q)` -1: dislike repeating the type name in the method name; also, too easy to forget which one is used. `p.add(q)` +0.5: nice and short, but not quite self-explanatory. `p.append(q)` -0: suggests an in-place modification. -- Steven From steve at pearwood.info Tue Oct 9 04:42:04 2012 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 9 Oct 2012 13:42:04 +1100 Subject: [Python-ideas] PEP 428: poll about the joining syntax In-Reply-To: <20121009022653.GD27445@ando> References: <20121008204707.48559bf9@pitrou.net> <20121009022653.GD27445@ando> Message-ID: <20121009024204.GE27445@ando> And I knew there was another suggestion tickling around in my subconscious... I have a new favourite: `p & q` +1: unlikely to be confused with int or set &; strings do not currently use it; suggests concatenation; short, can work with two paths or path and string if needed. p + ".ext" to add a suffix to the file name; an error if p is a directory. "spam" + p should probably an error. I can't think of a good use case for prepending a string to a path. p & q to concatenate (join) path q to path p. p.add(q [, r, s, ...]) to concatentation multiple path components at once, more efficient than p & q & r & ..., and to make the function more discoverable and searchable. -- Steven From ben+python at benfinney.id.au Tue Oct 9 04:54:05 2012 From: ben+python at benfinney.id.au (Ben Finney) Date: Tue, 09 Oct 2012 13:54:05 +1100 Subject: [Python-ideas] PEP 428: poll about the joining syntax References: <20121008204707.48559bf9@pitrou.net> Message-ID: <7w7gr0cq42.fsf@benfinney.id.au> Antoine Pitrou writes: > Since there has been some controversy about the joining syntax used in > PEP 428 (filesystem path objects), I would like to run an informal poll > about it. Please answer with +1/+0/-0/-1 for each proposal: I hope you count U+2212 MINUS SIGN and not only U+002D HYPHEN-MINUS :-) > - `p[q]` joins path q to path p ?1. Ugly and counter-intuitive. Bracket syntax is for accessing items of a collection. > - `p + q` joins path q to path p +1. Works as I'd expect it to work, and is easily discovered. > - `p / q` joins path q to path p ?1. ?/? as a Python operator strongly connotes ?division?, and this isn't it. > - `p.join(q)` joins path q to path p +1. Explicit and clear. > (you can include a rationale if you want, but don't forget to vote :-)) Thanks for the poll. -- \ ?The whole area of [treating source code as intellectual | `\ property] is almost assuring a customer that you are not going | _o__) to do any innovation in the future.? ?Gary Barnett | Ben Finney From steve at pearwood.info Tue Oct 9 05:11:09 2012 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 9 Oct 2012 14:11:09 +1100 Subject: [Python-ideas] History stepping in interactive session? In-Reply-To: <87626le61j.fsf@uwakimon.sk.tsukuba.ac.jp> References: <506EA800.1080106@insectnation.org> <87a9w18bb8.fsf@uwakimon.sk.tsukuba.ac.jp> <5071E53E.4030906@insectnation.org> <87626le61j.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <20121009031109.GF27445@ando> On Mon, Oct 08, 2012 at 05:12:24PM +0900, Stephen J. Turnbull wrote: > Andy Buckley writes: > > > So one last question, in case it is an acceptable python-ideas topic: > > how about adding readline-like support by default in the > > interpreter? > > If readline-like support is available on the system, it's used. 
> However, it's apparently only readline-like. For example, on Mac OS > X, the BSD-licensed libedit readline emulation is used by default, it > appears. I wouldn't expect full functionality there. > > On GNU/Linux systems, as I wrote, True GNU readline is used. Why this > particular function isn't bound or doesn't work right, I don't know > offhand. It is apparently a bug (my Python sources are from April, > but I can't see why this would change), since the sources say > (ll. 927-931 of Modules/readline.c): I thought so too, but apparently the behaviour being talked about is a bash extension to readline. Adding it to Python would be a feature request, not a bug fix. While it's a useful feature, I think that it's probably something which can distinguish the vanilla Python interactive interpreter from more advanced environments like iPython, which apparently already has it. -- Steven From casevh at gmail.com Tue Oct 9 05:38:13 2012 From: casevh at gmail.com (Case Van Horsen) Date: Mon, 8 Oct 2012 20:38:13 -0700 Subject: [Python-ideas] checking for identity before comparing built-in objects In-Reply-To: References: <506D94EE.30808@pearwood.info> <50723BE5.3060300@nedbatchelder.com> <507243D2.8000505@btinternet.com> <50736983.5030001@canterbury.ac.nz> Message-ID: On Mon, Oct 8, 2012 at 6:37 PM, Alexander Belopolsky wrote: > On Mon, Oct 8, 2012 at 9:07 PM, Guido van Rossum wrote: >> If we want to do *anything* I think we should first introduce a >> floating point context similar to the Decimal context. Then we can >> talk. > > +float('inf') I implemented a floating point context manager for gmpy2 and the MPFR floating point library. By default, it enables a non-stop mode where infinities and NaN are returned but you can also raise exceptions. You can experiment with gmpy2: http://code.google.com/p/gmpy/ Some examples >>> import gmpy2 >>> gmpy2.get_context() context(precision=53, real_prec=Default, imag_prec=Default, round=RoundToNearest, real_round=Default, imag_round=Default, emax=1073741823, emin=-1073741823, subnormalize=False, trap_underflow=False, underflow=False, trap_overflow=False, overflow=False, trap_inexact=False, inexact=False, trap_invalid=False, invalid=False, trap_erange=False, erange=False, trap_divzero=False, divzero=False, trap_expbound=False, allow_complex=False) >>> gmpy2.log(0) mpfr('-inf') >>> gmpy2.get_context() context(precision=53, real_prec=Default, imag_prec=Default, round=RoundToNearest, real_round=Default, imag_round=Default, emax=1073741823, emin=-1073741823, subnormalize=False, trap_underflow=False, underflow=False, trap_overflow=False, overflow=False, trap_inexact=False, inexact=False, trap_invalid=False, invalid=False, trap_erange=False, erange=False, trap_divzero=False, divzero=True, trap_expbound=False, allow_complex=False) >>> gmpy2.get_context().clear_flags() >>> gmpy2.get_context().trap_divzero=True >>> gmpy2.log(0) Traceback (most recent call last): File "", line 1, in gmpy2.DivisionByZeroError: 'mpfr' division by zero in log() >>> gmpy2.set_context(gmpy2.context()) >>> gmpy2.nan()==gmpy2.nan() False >>> gmpy2.get_context() context(precision=53, real_prec=Default, imag_prec=Default, round=RoundToNearest, real_round=Default, imag_round=Default, emax=1073741823, emin=-1073741823, subnormalize=False, trap_underflow=False, underflow=False, trap_overflow=False, overflow=False, trap_inexact=False, inexact=False, trap_invalid=False, invalid=False, trap_erange=False, erange=True, trap_divzero=False, divzero=False, trap_expbound=False, allow_complex=False) >>> 
gmpy2.get_context().trap_erange=True >>> gmpy2.nan()==gmpy2.nan() Traceback (most recent call last): File "", line 1, in gmpy2.RangeError: comparison with NaN >>> Standard disclaimers: * I'm the maintainer of gmpy2. * Please use SVN or beta2 (when it is released) to avoid a couple of embarrassing bugs. :( > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas From stephen at xemacs.org Tue Oct 9 05:42:03 2012 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Tue, 09 Oct 2012 12:42:03 +0900 Subject: [Python-ideas] History stepping in interactive session? In-Reply-To: <20121009031109.GF27445@ando> References: <506EA800.1080106@insectnation.org> <87a9w18bb8.fsf@uwakimon.sk.tsukuba.ac.jp> <5071E53E.4030906@insectnation.org> <87626le61j.fsf@uwakimon.sk.tsukuba.ac.jp> <20121009031109.GF27445@ando> Message-ID: <87a9vw8g6s.fsf@uwakimon.sk.tsukuba.ac.jp> Steven D'Aprano writes: > I thought so too, but apparently the behaviour being talked about is a > bash extension to readline. Adding it to Python would be a feature > request, not a bug fix. In that case, I think it's unfortunately that Python doesn't provide a way to warn about unimplemented stuff in .inputrc. Both on my Mac and on my Gentoo system, C-o simply does nothing. From guido at python.org Tue Oct 9 06:13:58 2012 From: guido at python.org (Guido van Rossum) Date: Mon, 8 Oct 2012 21:13:58 -0700 Subject: [Python-ideas] History stepping in interactive session? In-Reply-To: <87a9vw8g6s.fsf@uwakimon.sk.tsukuba.ac.jp> References: <506EA800.1080106@insectnation.org> <87a9w18bb8.fsf@uwakimon.sk.tsukuba.ac.jp> <5071E53E.4030906@insectnation.org> <87626le61j.fsf@uwakimon.sk.tsukuba.ac.jp> <20121009031109.GF27445@ando> <87a9vw8g6s.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Mon, Oct 8, 2012 at 8:42 PM, Stephen J. Turnbull wrote: > Steven D'Aprano writes: > > > I thought so too, but apparently the behaviour being talked about is a > > bash extension to readline. Adding it to Python would be a feature > > request, not a bug fix. > > In that case, I think it's unfortunately that Python doesn't provide a > way to warn about unimplemented stuff in .inputrc. Both on my Mac and > on my Gentoo system, C-o simply does nothing. Please do file a bug about this. Python's interface to readline is pretty old, I wouldn't be surprised if more functionality could be added. Regarding operate-and-get-next, I searched for "gnu readline operate-and-get-next" and found some feature requests about it for Sage and IPython (not sure of the status there), plus an explanation of why it's not part of GNU readline: it needs to be implemented by the calling app because only the latter knows what constitutes a complete statement. I think either of these would probably be a fun project for an aspiring core developer interested in improving their C skills. -- --Guido van Rossum (python.org/~guido) From steve at pearwood.info Tue Oct 9 06:16:16 2012 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 9 Oct 2012 15:16:16 +1100 Subject: [Python-ideas] checking for identity before comparing built-in objects In-Reply-To: <50723BE5.3060300@nedbatchelder.com> References: <506D94EE.30808@pearwood.info> <50723BE5.3060300@nedbatchelder.com> Message-ID: <20121009041613.GG27445@ando> On Sun, Oct 07, 2012 at 10:35:17PM -0400, Ned Batchelder wrote: > A sentence in section 5.4 (Numeric Types) would help. 
Something like, > "In accordance with the IEEE 754 standard, NaN's are not equal to any > value, even another NaN. This is because NaN doesn't represent a > particular number, it represents an unknown result, and there is no way > to know if one unknown result is equal to another unknown result." NANs don't quite mean "unknown result". If they did they would probably be called "MISSING" or "UNKNOWN" or "NA" (Not Available). NANs represent a calculation result which is Not A Number. Hence the name :-) Since we're talking about the mathematical domain here, a numeric calculation that doesn't return a numeric result could be said to have no result at all: there is no real-valued x for which x**2 == -1, hence sqrt(-1) can return a NAN. It certainly doesn't mean "well, there is an answer, but I don't know what it is". It means "I know that there is no answer". Since neither sqrt(-1) nor sqrt(-2) exist in the reals, we cannot say that they are equal. If we did, we could prove anything: sqrt(-1) = sqrt(-2) Square both sides: -1 = -2 I was not on the IEEE committee, so I can't speak for them, but my guess is that they reasoned that since there are an infinite number of "no result" not-a-number calculations, but only a finite number of NAN bit patterns available to be used for them, it isn't even safe to presume that two NANs with the same bit pattern are equal since they may have come from completely different calculations. Of course this was before object identity was a relevant factor. As I've stated before, I think that having collections choose to optimize away equality tests using object identity is fine. If I need a tuple that honours NAN semantics, I can subclass tuple to get one. I shouldn't expect the default tuple behaviour to carry that cost. By the way, NANs are awesome and don't get anywhere near enough respect. Here's a great idea from the D language: http://www.drdobbs.com/cpp/nans-just-dont-get-no-respect/240005723 -- Steven From steve at pearwood.info Tue Oct 9 06:32:36 2012 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 9 Oct 2012 15:32:36 +1100 Subject: [Python-ideas] checking for identity before comparing built-in objects In-Reply-To: <50733A18.10400@nedbatchelder.com> References: <506D94EE.30808@pearwood.info> <50723BE5.3060300@nedbatchelder.com> <50733A18.10400@nedbatchelder.com> Message-ID: <20121009043236.GI27445@ando> On Mon, Oct 08, 2012 at 04:39:52PM -0400, Ned Batchelder wrote: > How about: > > "In accordance with the IEEE 754 standard, when NaNs are compared to any > value, even another NaN, the result is always False, regardless of the > comparison. This is because NaN represents an unknown result. There is no > way to know the relationship between an unknown result and any other > result, especially another unknown one. Even comparing a NaN to itself > always produces False." Two issues: 1) It is not the case that NaN NaN is always false. 2) "invalid result" is more appropriate than "unknown result". -- Steven From steve at pearwood.info Tue Oct 9 06:35:56 2012 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 9 Oct 2012 15:35:56 +1100 Subject: [Python-ideas] History stepping in interactive session? 
In-Reply-To: References: <506EA800.1080106@insectnation.org> <87a9w18bb8.fsf@uwakimon.sk.tsukuba.ac.jp> <5071E53E.4030906@insectnation.org> <87626le61j.fsf@uwakimon.sk.tsukuba.ac.jp> <20121009031109.GF27445@ando> <87a9vw8g6s.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <20121009043556.GJ27445@ando> On Mon, Oct 08, 2012 at 09:13:58PM -0700, Guido van Rossum wrote: > On Mon, Oct 8, 2012 at 8:42 PM, Stephen J. Turnbull wrote: > > Steven D'Aprano writes: > > > > > I thought so too, but apparently the behaviour being talked about is a > > > bash extension to readline. Adding it to Python would be a feature > > > request, not a bug fix. > > > > In that case, I think it's unfortunately that Python doesn't provide a > > way to warn about unimplemented stuff in .inputrc. Both on my Mac and > > on my Gentoo system, C-o simply does nothing. > > Please do file a bug about this. Python's interface to readline is > pretty old, I wouldn't be surprised if more functionality could be > added. The time machine strikes again: http://bugs.python.org/issue8492 -- Steven From steve at pearwood.info Tue Oct 9 06:26:35 2012 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 9 Oct 2012 15:26:35 +1100 Subject: [Python-ideas] checking for identity before comparing built-in objects In-Reply-To: References: <506D94EE.30808@pearwood.info> <50723BE5.3060300@nedbatchelder.com> <507243D2.8000505@btinternet.com> Message-ID: <20121009042635.GH27445@ando> On Mon, Oct 08, 2012 at 09:29:42AM -0700, Guido van Rossum wrote: > It's not about equality. If you ask whether two NaNs are *unequal* the > answer is *also* False. Not so. I think you are conflating NAN equality/inequality with ordering comparisons. Using Python 3.3: py> nan = float('nan') py> nan > 0 False py> nan < 0 False py> nan == 0 False py> nan != 0 True but: py> nan == nan False py> nan != nan True -- Steven From greg.ewing at canterbury.ac.nz Tue Oct 9 07:08:18 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 09 Oct 2012 18:08:18 +1300 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: References: <20121005202534.5f721292@pitrou.net> <87bogfvrni.fsf@uwakimon.sk.tsukuba.ac.jp> <20121006141858.73b42c38@pitrou.net> <50731A31.30606@pearwood.info> Message-ID: <5073B142.4020105@canterbury.ac.nz> Nick Coghlan wrote: > Huh? It's a tree structure. A subpath lives inside its parent path, > just as subnodes are children of their parent node. You're confusing the path, which is a name, with the object that it names. It's called a path because it's the route that you follow from the root to reach the node being named. To reach a subnode of N requires following a *longer* path than you did to reach N. There's no sense in which the *path* to the subnode is "contained" within the path to N -- rather it's the other way around. -- Greg From ben at bendarnell.com Tue Oct 9 07:12:51 2012 From: ben at bendarnell.com (Ben Darnell) Date: Mon, 8 Oct 2012 22:12:51 -0700 Subject: [Python-ideas] asyncore: included batteries don't fit In-Reply-To: References: Message-ID: On Mon, Oct 8, 2012 at 8:30 AM, Guido van Rossum wrote: >> It's a Future constructor, a (conditional) add_done_callback, plus the >> calls to set_result or set_exception and the with statement for error >> handling. 
In full: >> >> def future_wrap(f): >> @functools.wraps(f) >> def wrapper(*args, **kwargs): >> future = Future() >> if kwargs.get('callback') is not None: >> future.add_done_callback(kwargs.pop('callback')) >> kwargs['callback'] = future.set_result >> def handle_error(typ, value, tb): >> future.set_exception(value) >> return True >> with ExceptionStackContext(handle_error): >> f(*args, **kwargs) >> return future >> return wrapper > > Hmm... I *think* it automatically adds a special keyword 'callback' to > the *call* site so that you can do things like > > fut = some_wrapped_func(blah, callback=my_callback) > > and then instead of using yield to wait for the callback, put the > continuation of your code in the my_callback() function. Yes. Note that if you're passing in a callback you're probably going to just ignore the return value. The callback argument and the future return value are essentially two alternative interfaces; it probably doesn't make sense to use both at once (but as a library author it's useful to provide both). > But it also > seems like it passes callback=future.set_result as the callback to the > wrapped function, which looks to me like that function was apparently > written before Futures were widely used. This seems pretty impure to > me and I'd like to propose a "future" where such functions either be > given the Future where the result is expected, or (more commonly) the > function would create the Future itself. Yes, it's impure and based on pre-Future patterns. The caller's callback argument and the inner function's callback not really related any more (they were the same in pre-Future async code of course). They should probably have different names, although if the inner function's return value were passed via exception (StopIteration or return) the inner callback argument can just go away. > > Unless I'm totally missing the programming model here. > > PS. I'd like to learn more about ExceptionStackContext() -- I've > struggled somewhat with getting decent tracebacks in NDB. StackContext doesn't quite give you better tracebacks, although I think it could be adapted to do that. ExceptionStackContext is essentially a try/except block that follows you around across asynchronous operations - on entry it sets a thread-local state, and all the tornado asynchronous functions know to save this state when they are passed a callback, and restore it when they execute it. This has proven to be extremely helpful in ensuring that all exceptions get caught by something that knows how to do the appropriate cleanup (i.e. an asynchronous web page serves an error instead of just spinning forever), although it has turned out to be a little more intrusive and magical than I had originally anticipated. https://github.com/facebook/tornado/blob/master/tornado/stack_context.py > >>>> In Tornado the Future is created by a decorator >>>> and hidden from the asynchronous function (it just sees the callback), >>> >>> Hm, interesting. NDB goes the other way, the callbacks are mostly used >>> to make Futures work, and most code (including large swaths of >>> internal code) uses Futures. I think NDB is similar to monocle here. >>> In NDB, you can do >>> >>> f = >>> r = yield f >>> >>> where "yield f" is mostly equivalent to f.result(), except it gives >>> better opportunity for concurrency. >> >> Yes, tornado's gen.engine does the same thing here. 
However, the >> stakes are higher than "better opportunity for concurrency" - in an >> event loop if you call future.result() without yielding, you'll >> deadlock if that Future's task needs to run on the same event loop. > > That would depend on the semantics of the event loop implementation. > In NDB's event loop, such a .result() call would just recursively > enter the event loop, and you'd only deadlock if you actually have two > pieces of code waiting for each other's completion. Hmm, I think I'd rather deadlock. :) If the event loop is reentrant then the application code has be coded defensively as if it were preemptively multithreaded, which introduces the possibility of deadlock or (probably) more subtle/less frequent errors. Reentrancy has been a significant problem in my experience, so I've been moving towards a policy where methods in Tornado that take a callback never run it immediately; callbacks are always scheduled on the next iteration of the IOLoop with IOLoop.add_callback. > > [...] >>> I am currently trying to understand if using "yield from" (and >>> returning a value from a generator) will simplify things. For example >>> maybe the need for a special decorator might go away. But I keep >>> getting headaches -- perhaps there's a Monad involved. :-) >> >> I think if you build generator handling directly into the event loop >> and use "yield from" for calls from one async function to another then >> you can get by without any decorators. But I'm not sure if you can do >> that and maintain any compatibility with existing non-generator async >> code. >> >> I think the ability to return from a generator is actually a bigger >> deal than "yield from" (and I only learned about it from another >> python-ideas thread today). The only reason a generator decorated >> with @tornado.gen.engine needs a callback passed in to it is to act as >> a psuedo-return, and a real return would prevent the common mistake of >> running the callback then falling through to the rest of the function. > > Ah, so you didn't come up with the clever hack of raising an exception > to signify the return value. In NDB, you raise StopIteration (though > it is given the alias 'Return' for clarity) with an argument, and the > wrapper code that is responsible for the Future takes the value from > the StopIteration exception and passes it to the Future's > set_result(). I think I may have thought about "raise Return(x)" and dismissed it as too weird. But then, I'm abnormally comfortable with asynchronous code that passes callbacks around. > >> For concreteness, here's a crude sketch of what the APIs I'm talking >> about would look like in use (in a hypothetical future version of >> tornado). >> >> @future_wrap >> @gen.engine >> def async_http_client(url, callback): >> parsed_url = urlparse.urlsplit(url) >> # works the same whether the future comes from a thread pool or @future_wrap > > And you need the thread pool because there's no async version of > getaddrinfo(), right? Right. > >> addrinfo = yield g_thread_pool.submit(socket.getaddrinfo, parsed_url.hostname, parsed_url.port) >> stream = IOStream(socket.socket()) >> yield stream.connect((addrinfo[0][-1])) >> stream.write('GET %s HTTP/1.0' % parsed_url.path) > > Why no yield in front of the write() call? Because we don't need to wait for the write to complete before we continue to the next statement. write() doesn't return anything; it just succeeds or fails, and if it fails the next read_until will fail too. 
(although in this case it wouldn't hurt to have the yield either) > >> header_data = yield stream.read_until('\r\n\r\n') >> headers = parse_headers(header_data) >> body_data = yield stream.read_bytes(int(headers['Content-Length'])) >> stream.close() >> callback(body_data) >> >> # another function to demonstrate composability >> @future_wrap >> @gen.engine >> def fetch_some_urls(url1, url2, url3, callback): >> body1 = yield async_http_client(url1) >> # yield a list of futures for concurrency >> future2 = yield async_http_client(url2) >> future3 = yield async_http_client(url3) >> body2, body3 = yield [future2, future3] >> callback((body1, body2, body3)) > > This second one is nearly identical to the way we it's done in NDB. > However I think you have a typo -- I doubt that there should be yields > on the lines creating future2 and future3. Right. > >> One hole in this design is how to deal with callbacks that are run >> multiple times. For example, the IOStream read methods take both a >> regular callback and an optional streaming_callback (which is called >> with each chunk of data as it arrives). I think this needs to be >> modeled as something like an iterator of Futures, but I haven't worked >> out the details yet. > > Ah. Yes, that's a completely different kind of thing, and probably > needs to be handled in a totally different way. I think it probably > needs to be modeled more like an infinite loop where at the blocking > point (e.g. a low-level read() or accept() call) you yield a Future. > Although I can see that this doesn't work well with the IOLoop's > concept of file descriptor (or other event source) registration. It works just fine at the IOLoop level: you call IOLoop.add_handler(fd, func, READ), and you'll get read events whenever there's new data until you call remove_handler(fd) (or update_handler). If you're passing callbacks around explicitly it's pretty straightforward (as much as anything ever is in that style) to allow for those callbacks to be run more than once. The problem is that generators more or less require that each callback be run exactly once. That's a generally desirable property, but the mismatch between the two layers can be difficult to deal with. -Ben From greg.ewing at canterbury.ac.nz Tue Oct 9 07:18:20 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 09 Oct 2012 18:18:20 +1300 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: <646D805C-581A-4278-B901-BFA5F1D0495E@gmail.com> References: <20121005202534.5f721292@pitrou.net> <20121005191625.GA23607@iskra.aviel.ru> <20121005195327.GG8974@mcnabbs.org> <506F6201.4000503@pearwood.info> <20121005235457.GA7755@mcnabbs.org> <506F813D.2050305@canterbury.ac.nz> <20121006214540.GB20907@mcnabbs.org> <5072CF3B.2070203@pearwood.info> <20121008160617.GA1527@mcnabbs.org> <646D805C-581A-4278-B901-BFA5F1D0495E@gmail.com> Message-ID: <5073B39C.4010708@canterbury.ac.nz> Massimo DiPierro wrote: > The fact that string paths in Unix use the / to represent concatenation > is accidental. Maybe so, but it can be regarded as a fortuitous accident, since / also happens to be an operator in Python, so it would have mnemonic value to Unix users. The correspondence is not exact for Windows users, but / is similar enough to still have some mnemonic value for them. And all the OSes using other separators seem to have died out. 
-- Greg From greg.ewing at canterbury.ac.nz Tue Oct 9 07:31:09 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 09 Oct 2012 18:31:09 +1300 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: <646D805C-581A-4278-B901-BFA5F1D0495E@gmail.com> References: <20121005202534.5f721292@pitrou.net> <20121005191625.GA23607@iskra.aviel.ru> <20121005195327.GG8974@mcnabbs.org> <506F6201.4000503@pearwood.info> <20121005235457.GA7755@mcnabbs.org> <506F813D.2050305@canterbury.ac.nz> <20121006214540.GB20907@mcnabbs.org> <5072CF3B.2070203@pearwood.info> <20121008160617.GA1527@mcnabbs.org> <646D805C-581A-4278-B901-BFA5F1D0495E@gmail.com> Message-ID: <5073B69D.6070304@canterbury.ac.nz> Massimo DiPierro wrote: > The + symbol means addition and union of disjoint sets. A path > (including a fs path) is a set of links (for a fs path, a link is a > folder name). Using the + symbols has a natural interpretation as > concatenation of subpaths (sets) to for form a longer path (superset). A reason *not* to use '+' is that it would violate associativity in some cases, e.g. (path + "foo") + "bar" would not be the same as path + ("foo" + "bar") Using '/', or any other operator not currently defined on strings, would prevent this mistake from occuring. A reason to want an operator is the symmetry of path concatenation. Symmetrical operations deserve a symmetrical syntax, and to achieve that in Python you need either an operator or a stand-alone function. A reason to prefer an operator over a function is associativity. It would be nice to be able to write path1 / path2 / path3 and not have to think about the order in which the operations are being done. If '/' is considered too much of a stretch, how about '&'? It suggests a kind of addition or concatenation, and in fact is used for string concatenation in some other languages. -- Greg From alexander.belopolsky at gmail.com Tue Oct 9 07:32:12 2012 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Tue, 9 Oct 2012 01:32:12 -0400 Subject: [Python-ideas] checking for identity before comparing built-in objects In-Reply-To: <20121009041613.GG27445@ando> References: <506D94EE.30808@pearwood.info> <50723BE5.3060300@nedbatchelder.com> <20121009041613.GG27445@ando> Message-ID: On Tue, Oct 9, 2012 at 12:16 AM, Steven D'Aprano wrote: > NANs don't quite mean "unknown result". If they did they would probably > be called "MISSING" or "UNKNOWN" or "NA" (Not Available). > > NANs represent a calculation result which is Not A Number. Hence the > name :-) This is quite true, but in Python "Not A Number" is spelled None. In many aspects, None is like signaling NaN - any numerical operation on it results in a type error, but None == None is True. .. > Since neither sqrt(-1) nor sqrt(-2) exist in the reals, we cannot say > that they are equal. If we did, we could prove anything: > > sqrt(-1) = sqrt(-2) > > Square both sides: > > -1 = -2 This is a typical mathematical fallacy where a progression of seemingly equivalent equations contains an invalid operation. See http://en.wikipedia.org/wiki/Mathematical_fallacy#All_numbers_equal_all_other_numbers This is not an argument to make nan == nan false. The IEEE 754 argument goes as follows: in the domain of 2**64 bit patterns most patterns represent real numbers, some represent infinities and some do not represent either infinities or numbers. Boolean comparison operations are defined on the entire domain, but <, =, or > outcomes are not exclusive if NaNs are present. 
The forth outcome is "unordered." In other words for any two patterns x and y one and only one of the following is true: x < y or x = y or x > y or x and y are unordered. If x is NaN, it compares as unordered to any other pattern including itself. This explains why compareQuietEqual(x, x) is false when x is NaN. In this case, x is unordered with itself, unordered is different from equal, so compareQuietEqual(x, x) cannot be true. It cannot raise an exception either because it has to be quiet. Thus the only correct result is to return false. The problem that we have in Python is that float.__eq__ is used for too many different things and compareQuietEqual is not always appropriate. Here is a partial list: 1. x == y 2. x in [y] 3. {y:1}[x] 4. x in {y} 5. [y].index(x) In python 3, we already took a step away from using the same notion of equality in all these cases. Thus in #2, we use x is y or x == y instead of plain x == y. But that leads to some strange results: >>> x = float('nan') >>> x in [x] True >>> float('nan') in [float('nan')] False An alternative would be to define x in l as any(isnan(x) and isnan(y) or x == y for y in l) when x and all elements of l are floats. Again, I am not making a change proposal - just mention a possibility. From greg.ewing at canterbury.ac.nz Tue Oct 9 07:36:48 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 09 Oct 2012 18:36:48 +1300 Subject: [Python-ideas] PEP 428: poll about the joining syntax In-Reply-To: <20121008204707.48559bf9@pitrou.net> References: <20121008204707.48559bf9@pitrou.net> Message-ID: <5073B7F0.2010307@canterbury.ac.nz> Antoine Pitrou wrote: > - `p[q]` joins path q to path p -1, confuses operation on a path with operation on the object named by the path. > - `p + q` joins path q to path p -0.9, interacts with string concatenation in undesirable ways > - `p / q` joins path q to path p +1 > - `p.join(q)` joins path q to path p -0.9, 'append' would be clearer IMO -- Greg From greg.ewing at canterbury.ac.nz Tue Oct 9 07:41:46 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 09 Oct 2012 18:41:46 +1300 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: <20121008205634.113419ea@pitrou.net> References: <20121005202534.5f721292@pitrou.net> <20121005191625.GA23607@iskra.aviel.ru> <20121005195327.GG8974@mcnabbs.org> <506F6201.4000503@pearwood.info> <20121005235457.GA7755@mcnabbs.org> <506F813D.2050305@canterbury.ac.nz> <20121006214540.GB20907@mcnabbs.org> <5072CF3B.2070203@pearwood.info> <20121008160617.GA1527@mcnabbs.org> <20121008204014.10ba901e@pitrou.net> <20121008205634.113419ea@pitrou.net> Message-ID: <5073B91A.5070504@canterbury.ac.nz> Antoine Pitrou wrote: > But you really want a short method name, otherwise it's better to have > a dedicated operator. joinpath() definitely doesn't cut it, IMO. I agree, it's far too longwinded. It would clutter your code just as badly as using os.path.join() all over the place does now, but without the option of aliasing it to a shorter name. 
-- Greg From greg.ewing at canterbury.ac.nz Tue Oct 9 07:48:23 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 09 Oct 2012 18:48:23 +1300 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: <20121008191444.GA28668@sleipnir.bytereef.org> References: <506F6201.4000503@pearwood.info> <20121005235457.GA7755@mcnabbs.org> <506F813D.2050305@canterbury.ac.nz> <20121006214540.GB20907@mcnabbs.org> <5072CF3B.2070203@pearwood.info> <20121008160617.GA1527@mcnabbs.org> <20121008204014.10ba901e@pitrou.net> <20121008205634.113419ea@pitrou.net> <20121008191444.GA28668@sleipnir.bytereef.org> Message-ID: <5073BAA7.5000309@canterbury.ac.nz> Stefan Krah wrote: > '^' or '@' are used for concatenation in some languages. At least accidental > confusion with xor is pretty unlikely. We'd have to add '@' as a new operator before we could use that. But '^' might have possibilities... if you squint, it looks a bit like a compromise between Unix and Windows path separators. :-) -- Greg From greg.ewing at canterbury.ac.nz Tue Oct 9 07:56:15 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 09 Oct 2012 18:56:15 +1300 Subject: [Python-ideas] asyncore: included batteries don't fit In-Reply-To: References: Message-ID: <5073BC7F.5040203@canterbury.ac.nz> Mark Adam wrote: > 1) event handlers for the machine-program interface (ex. network I/O) > 2) event handlers for the program-user interface (ex. mouse I/O) > > While similar, my gut tell me they have to be handled in completely > different way in order to preserve order (i.e. sanity). They can't be *completely* different, because deep down there has to be a single event loop that can handle all kinds of asynchronous events. Upper layers can provide different APIs for them, but there has to be some commonality in the lowest layers. -- Greg From greg.ewing at canterbury.ac.nz Tue Oct 9 08:02:35 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 09 Oct 2012 19:02:35 +1300 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: References: <20121005202534.5f721292@pitrou.net> <20121005191625.GA23607@iskra.aviel.ru> <20121005195327.GG8974@mcnabbs.org> <506F6201.4000503@pearwood.info> <20121005235457.GA7755@mcnabbs.org> <506F813D.2050305@canterbury.ac.nz> <20121006214540.GB20907@mcnabbs.org> <5072CF3B.2070203@pearwood.info> <20121008160617.GA1527@mcnabbs.org> <20121008204014.10ba901e@pitrou.net> <20121008205634.113419ea@pitrou.net> Message-ID: <5073BDFB.6000604@canterbury.ac.nz> Nick Coghlan wrote: > Moving from "os.path.join(a, b, c, d, e)" (or, the way I often write > it, "joinpath(a, b, c, d, e)") to "a.joinpath(b, c, d, e)" at least > isn't going backwards, and is more obvious in isolation than "a / b / > c / d / e". I think we should keep in mind that we're (hopefully) not going to see things like "a / b / c / d / e" in real-life code. Rather we're going to see things like backupath = destdir / "archive" / filename + ".bak" In other words, there should be some clue from the names that paths are involved, from which it should be fairly easy to guess what the "/" means. 
-- Greg From greg.ewing at canterbury.ac.nz Tue Oct 9 08:11:18 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 09 Oct 2012 19:11:18 +1300 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: References: <506F6201.4000503@pearwood.info> <20121005235457.GA7755@mcnabbs.org> <506F813D.2050305@canterbury.ac.nz> <20121006214540.GB20907@mcnabbs.org> <5072CF3B.2070203@pearwood.info> <20121008160617.GA1527@mcnabbs.org> <20121008204014.10ba901e@pitrou.net> <20121008205634.113419ea@pitrou.net> <20121008191444.GA28668@sleipnir.bytereef.org> Message-ID: <5073C006.4000206@canterbury.ac.nz> > On 8 October 2012 20:14, Stefan Krah > wrote: > > # A bit long > # My personal objection is that one shouldn't have to state "path" > in the name: it's not str.stringjoin() > configdir.joinpath("myprogram") > configdir.pathjoin("myprogram") I was just thinking the same thing. My preference for this at the moment is 'append', notwithstanding the fact that it will be non-mutating. It's a single, short word, it avoids re-stating the datatype, and it resonates with the idea of appending to a sequence of path components. > # My favorites ('cause my opinion: so there) > configdir.child("myprogram") # Does sorta' imply IO Except that the result isn't always a child (the RHS could be an absolute path, start with "..", etc.) > configdir.cd("myprogam") Aaaghh... my brain... the lobotomy does nothing... -- Greg From alexander.belopolsky at gmail.com Tue Oct 9 08:14:10 2012 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Tue, 9 Oct 2012 02:14:10 -0400 Subject: [Python-ideas] checking for identity before comparing built-in objects In-Reply-To: References: <506D94EE.30808@pearwood.info> <50723BE5.3060300@nedbatchelder.com> <507243D2.8000505@btinternet.com> Message-ID: On Mon, Oct 8, 2012 at 10:09 PM, Guido van Rossum wrote: > Such a rationale exists in my mind. Since floats are immutable, an > implementation may or may not intern certain float values (just as > certain string and int values are interned but others are not). This is an interesting argument, but I don't quite understand it. Are you suggesting that some valid Python implementation may inter NaNs? Wouldn't that require that all NaNs are equal? > Therefore, the fact that "x is y" says nothing about whether the > computations that produced x and y had anything to do with each other. True. > This is not true for mutable objects: if I have two lists, computed > separately, and find they are the same object, the computations that > produced them must have communicated somehow, or the same list was > passed in to each computations. True. > So, since two computations might > return the same object without having followed the same computational > path, in another implementation the exact same computation might not > return the same object, and so the == comparison should produce the > same value in either case True, but this logic does not dictate what this values should be. > -- in particular, if x and y are both NaN, > all 6 comparisons on them should return False (given that in general > comparing two NaNs returns False regardless of the operator used). Except for operator compareQuietUnordered() which is missing in Python. Note that IEEE 754 also defines totalOrder() operation which is more or less lexicographical ordering of bit patterns. 
A hypothetical language could map its 6 comparisons to totalOrder() and still claim IEEE 754 conformity as long as it implements the other 22 comparison predicates somehow. > The reason for invoking IEEE 754 here is that without it, Python might > well have grown a language-wide rule stating that an object should > *always* compare equal to itself, as there would have been no > significant counterexamples. Why would it be a bad thing? Isn't this rule what Bertrand Meyer calls one of the pillars of civilization? It looks like you give a circular argument. Python cannot have a rule that x is y implies x == y because that would preclude implementing float.__eq__ as IEEE 754 equality comparison and we implement float.__eq__ as IEEE 754 equality comparison in order to provide a significant counterexample to x is y implies x == y rule. I am not sure how interning comes into play here, so I must have missed something. From dickinsm at gmail.com Tue Oct 9 08:43:57 2012 From: dickinsm at gmail.com (Mark Dickinson) Date: Tue, 9 Oct 2012 07:43:57 +0100 Subject: [Python-ideas] checking for identity before comparing built-in objects In-Reply-To: <50733A18.10400@nedbatchelder.com> References: <506D94EE.30808@pearwood.info> <50723BE5.3060300@nedbatchelder.com> <50733A18.10400@nedbatchelder.com> Message-ID: On Mon, Oct 8, 2012 at 9:39 PM, Ned Batchelder wrote: > How about: > > "In accordance with the IEEE 754 standard, when NaNs are compared to any > value, even another NaN, the result is always False, regardless of the > comparison. This is because NaN represents an unknown result. There is no > way to know the relationship between an unknown result and any other result, > especially another unknown one. Even comparing a NaN to itself always > produces False." Looks fine, but I'd suggest leaving out the philosophy ('there is no way to know ...') and sticking to the statement that Python follows the IEEE 754 standard in this respect. The justification isn't particularly convincing and (IMO) only serves to invite arguments. -- Mark From guido at python.org Tue Oct 9 08:44:12 2012 From: guido at python.org (Guido van Rossum) Date: Mon, 8 Oct 2012 23:44:12 -0700 Subject: [Python-ideas] checking for identity before comparing built-in objects In-Reply-To: <20121009042635.GH27445@ando> References: <506D94EE.30808@pearwood.info> <50723BE5.3060300@nedbatchelder.com> <507243D2.8000505@btinternet.com> <20121009042635.GH27445@ando> Message-ID: This smells like a bug in the != operator, it seems to fall back to not == which it didn't used to. More later..... On Monday, October 8, 2012, Steven D'Aprano wrote: > On Mon, Oct 08, 2012 at 09:29:42AM -0700, Guido van Rossum wrote: > > > It's not about equality. If you ask whether two NaNs are *unequal* the > > answer is *also* False. > > Not so. I think you are conflating NAN equality/inequality with ordering > comparisons. Using Python 3.3: > > py> nan = float('nan') > py> nan > 0 > False > py> nan < 0 > False > py> nan == 0 > False > py> nan != 0 > True > > but: > > py> nan == nan > False > py> nan != nan > True > > > -- > Steven > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From dickinsm at gmail.com Tue Oct 9 08:49:30 2012 From: dickinsm at gmail.com (Mark Dickinson) Date: Tue, 9 Oct 2012 07:49:30 +0100 Subject: [Python-ideas] checking for identity before comparing built-in objects In-Reply-To: References: <506D94EE.30808@pearwood.info> <50723BE5.3060300@nedbatchelder.com> <507243D2.8000505@btinternet.com> <20121009042635.GH27445@ando> Message-ID: On Tue, Oct 9, 2012 at 7:44 AM, Guido van Rossum wrote: > This smells like a bug in the != operator, it seems to fall back to not == > which it didn't used to. More later..... I'm fairly sure it's deliberate, and has been this way in Python for a long time. IEEE 754 also has x != x when x is a NaN (at least, for those IEEE 754 functions that return a boolean rather than signaling an invalid exception), and it's a well documented property of NaNs across languages. -- Mark From ben at bendarnell.com Tue Oct 9 08:53:11 2012 From: ben at bendarnell.com (Ben Darnell) Date: Mon, 8 Oct 2012 23:53:11 -0700 Subject: [Python-ideas] asyncore: included batteries don't fit In-Reply-To: <5073BC7F.5040203@canterbury.ac.nz> References: <5073BC7F.5040203@canterbury.ac.nz> Message-ID: On Mon, Oct 8, 2012 at 10:56 PM, Greg Ewing wrote: > Mark Adam wrote: >> >> 1) event handlers for the machine-program interface (ex. network I/O) >> 2) event handlers for the program-user interface (ex. mouse I/O) >> >> While similar, my gut tell me they have to be handled in completely >> different way in order to preserve order (i.e. sanity). > > > They can't be *completely* different, because deep down there > has to be a single event loop that can handle all kinds of > asynchronous events. There doesn't *have* to be - you could run a network event loop in one thread and a GUI event loop in another and pass control back and forth via methods like IOLoop.add_callback or Reactor.callFromThread. However, Twisted has Reactor implementations that are integrated with several different GUI toolkit's event loops, and while I haven't worked with such a beast my gut instinct is that in most cases a single shared event loop is the way to go. -Ben > > Upper layers can provide different APIs for them, but there > has to be some commonality in the lowest layers. > > -- > Greg > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas From guido at python.org Tue Oct 9 08:58:55 2012 From: guido at python.org (Guido van Rossum) Date: Mon, 8 Oct 2012 23:58:55 -0700 Subject: [Python-ideas] checking for identity before comparing built-in objects In-Reply-To: References: <506D94EE.30808@pearwood.info> <50723BE5.3060300@nedbatchelder.com> <507243D2.8000505@btinternet.com> <20121009042635.GH27445@ando> Message-ID: On Mon, Oct 8, 2012 at 11:49 PM, Mark Dickinson wrote: > On Tue, Oct 9, 2012 at 7:44 AM, Guido van Rossum wrote: >> This smells like a bug in the != operator, it seems to fall back to not == >> which it didn't used to. More later..... > > I'm fairly sure it's deliberate, and has been this way in Python for a > long time. IEEE 754 also has x != x when x is a NaN (at least, for > those IEEE 754 functions that return a boolean rather than signaling > an invalid exception), and it's a well documented property of NaNs > across languages. Yeah, sorry, I misremembered. :-) This does mean we need to update the text Ned is proposing. 
-- --Guido van Rossum (python.org/~guido) From steve at pearwood.info Tue Oct 9 09:05:49 2012 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 9 Oct 2012 18:05:49 +1100 Subject: [Python-ideas] checking for identity before comparing built-in objects In-Reply-To: References: <50723BE5.3060300@nedbatchelder.com> <507243D2.8000505@btinternet.com> <20121009042635.GH27445@ando> Message-ID: <20121009070549.GA30054@ando> On Mon, Oct 08, 2012 at 11:44:12PM -0700, Guido van Rossum wrote: > This smells like a bug in the != operator, it seems to fall back to not == > which it didn't used to. More later..... I'm pretty sure the behaviour is correct. When I get home this evening, I will check my copy of the Standard Apple Numerics manual (one of the first IEEE 754 compliant systems). In the meantime, I quote from "What Every Computer Scientist Should Know About Floating-Point Arithmetic" "Since comparing a NaN to a number with <, ?, >, ?, or = (but not ?) always returns false..." (Admittedly it doesn't specifically state the case of comparing a NAN with a NAN.) http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html -- Steven From senthil at uthcode.com Tue Oct 9 09:10:47 2012 From: senthil at uthcode.com (Senthil Kumaran) Date: Tue, 9 Oct 2012 00:10:47 -0700 Subject: [Python-ideas] PEP 428: poll about the joining syntax In-Reply-To: <20121008204707.48559bf9@pitrou.net> References: <20121008204707.48559bf9@pitrou.net> Message-ID: Antoine Pitrou wrote: > - `p[q]` joins path q to path p -1 I think, this is listed as example in PEP 428. I had to look it up to understand. Not intuitive (to me atleast) as join. > - `p + q` joins path q to path p +0. I would be +1. But in the PEP you have listed that we need a way separate path behaviors from confusing with builtins Though it provides a lot of convenience, it can be confused with str behaviors or other object behaviors. > - `p / q` joins path q to path p -1. > - `p.join(q)` joins path q to path p +0 > `p.pathjoin(q)` +1 It is very explicit and hard to get it wrong. From p.f.moore at gmail.com Tue Oct 9 09:12:52 2012 From: p.f.moore at gmail.com (Paul Moore) Date: Tue, 9 Oct 2012 08:12:52 +0100 Subject: [Python-ideas] PEP 428: poll about the joining syntax In-Reply-To: <0D51EC77-7952-45DA-B958-1626395A69D2@ryanhiebert.com> References: <20121008204707.48559bf9@pitrou.net> <0D51EC77-7952-45DA-B958-1626395A69D2@ryanhiebert.com> Message-ID: On 9 October 2012 00:19, Ryan D Hiebert wrote: > If we want a p.pathjoin method, it would make sense to me for it to work similar to urllib.parse.urljoin The parallel with urljoin also suggests that pathjoin is a better name than joinpath. But note that I've seen both used in this thread - there is obviously some level of confusion possible. Paul. From guido at python.org Tue Oct 9 09:13:08 2012 From: guido at python.org (Guido van Rossum) Date: Tue, 9 Oct 2012 00:13:08 -0700 Subject: [Python-ideas] checking for identity before comparing built-in objects In-Reply-To: References: <506D94EE.30808@pearwood.info> <50723BE5.3060300@nedbatchelder.com> <507243D2.8000505@btinternet.com> Message-ID: On Mon, Oct 8, 2012 at 11:14 PM, Alexander Belopolsky wrote: > On Mon, Oct 8, 2012 at 10:09 PM, Guido van Rossum wrote: >> Such a rationale exists in my mind. Since floats are immutable, an >> implementation may or may not intern certain float values (just as >> certain string and int values are interned but others are not). > > This is an interesting argument, but I don't quite understand it. 
Are > you suggesting that some valid Python implementation may inter NaNs? > Wouldn't that require that all NaNs are equal? Sorry, it seems I got this part slightly wrong. Forget interning. The argument goes the other way: If you *do* compute x and y exactly the same way, and if they don't return the same object, and if they both return NaN, the rules for comparing NaN apply, and the values must compare unequal. So if you compute them exactly the same way but somehow you do return the same object, that shouldn't suddenly make them compare equal. >> Therefore, the fact that "x is y" says nothing about whether the >> computations that produced x and y had anything to do with each other. > > True. > >> This is not true for mutable objects: if I have two lists, computed >> separately, and find they are the same object, the computations that >> produced them must have communicated somehow, or the same list was >> passed in to each computations. > > True. > >> So, since two computations might >> return the same object without having followed the same computational >> path, in another implementation the exact same computation might not >> return the same object, and so the == comparison should produce the >> same value in either case > > True, but this logic does not dictate what this values should be. > >> -- in particular, if x and y are both NaN, >> all 6 comparisons on them should return False (given that in general >> comparing two NaNs returns False regardless of the operator used). > > Except for operator compareQuietUnordered() which is missing in > Python. Note that IEEE 754 also defines totalOrder() operation > which is more or less lexicographical ordering of bit patterns. A > hypothetical language could map its 6 comparisons to totalOrder() and > still claim IEEE 754 conformity as long as it implements the other 22 > comparison predicates somehow. Yes, but that's not the choice Python made, so it's irrelevant. (Unless you now *do* want to change the language, despite stating several times that you were just asking for explanations. :-) >> The reason for invoking IEEE 754 here is that without it, Python might >> well have grown a language-wide rule stating that an object should >> *always* compare equal to itself, as there would have been no >> significant counterexamples. > > Why would it be a bad thing? Isn't this rule what Bertrand Meyer > calls one of the pillars of civilization? I spent a week with Bertrand recently. He is prone to exaggeration. :-) > It looks like you give a circular argument. Python cannot have a rule > that x is y implies x == y because that would preclude implementing > float.__eq__ as IEEE 754 equality comparison and we implement > float.__eq__ as IEEE 754 equality comparison in order to provide a > significant counterexample to x is y implies x == y rule. I am not > sure how interning comes into play here, so I must have missed > something. No, that's not what I meant -- maybe my turn of phrase "invoking IEEE" was confusing. The first part is what I meant: "Python cannot have a rule that x is y implies x == y because that would preclude implementing float.__eq__ as IEEE 754 equality comparison." The second half should be: "And we have already (independently from all this) decided that we want to implement float.__eq__ as IEEE 754 equality comparison." I'm sure a logician could rearrange the words a bit and make it look more logical. 
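For concreteness, the point is easy to check interactively; the identity of the NaN object makes no difference to the float comparison itself (CPython 3.3 shown, but this holds generally):

>>> x = float('nan')
>>> y = x                 # the very same object
>>> x is y
True
>>> x == y                # equality still follows IEEE 754: NaN is never equal to NaN
False
>>> x == float('nan')     # a separately computed NaN gives the same answer
False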
-- --Guido van Rossum (python.org/~guido) From guido at python.org Tue Oct 9 09:13:38 2012 From: guido at python.org (Guido van Rossum) Date: Tue, 9 Oct 2012 00:13:38 -0700 Subject: [Python-ideas] checking for identity before comparing built-in objects In-Reply-To: <20121009070549.GA30054@ando> References: <50723BE5.3060300@nedbatchelder.com> <507243D2.8000505@btinternet.com> <20121009042635.GH27445@ando> <20121009070549.GA30054@ando> Message-ID: Already retracted. :-( On Tue, Oct 9, 2012 at 12:05 AM, Steven D'Aprano wrote: > On Mon, Oct 08, 2012 at 11:44:12PM -0700, Guido van Rossum wrote: >> This smells like a bug in the != operator, it seems to fall back to not == >> which it didn't used to. More later..... > > I'm pretty sure the behaviour is correct. When I get home this evening, > I will check my copy of the Standard Apple Numerics manual (one of the > first IEEE 754 compliant systems). In the meantime, I quote from > > "What Every Computer Scientist Should Know About Floating-Point > Arithmetic" > > "Since comparing a NaN to a number with <, ?, >, ?, or = (but not ?) > always returns false..." > > (Admittedly it doesn't specifically state the case of comparing a NAN > with a NAN.) > > http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html > > > -- > Steven > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas -- --Guido van Rossum (python.org/~guido) From senthil at uthcode.com Tue Oct 9 09:19:29 2012 From: senthil at uthcode.com (Senthil Kumaran) Date: Tue, 9 Oct 2012 00:19:29 -0700 Subject: [Python-ideas] PEP 428: poll about the joining syntax In-Reply-To: References: <20121008204707.48559bf9@pitrou.net> <0D51EC77-7952-45DA-B958-1626395A69D2@ryanhiebert.com> Message-ID: On Tue, Oct 9, 2012 at 12:12 AM, Paul Moore wrote: > On 9 October 2012 00:19, Ryan D Hiebert wrote: >> If we want a p.pathjoin method, it would make sense to me for it to work similar to urllib.parse.urljoin > > The parallel with urljoin also suggests that pathjoin is a better name > than joinpath. But note that I've seen both used in this thread - > there is obviously some level of confusion possible. pathjoin is strikes well, if we are already accustomed with the term 'urljoin'. Ryan - the protocols of those two joins will vary and should not be confused. Also pathjoin specifics would be listed in PEP 428. Thanks Senthil From greg.ewing at canterbury.ac.nz Tue Oct 9 09:35:05 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 09 Oct 2012 20:35:05 +1300 Subject: [Python-ideas] PEP 428: poll about the joining syntax In-Reply-To: <50735B6D.700@gmail.com> References: <20121008204707.48559bf9@pitrou.net> <50735B6D.700@gmail.com> Message-ID: <5073D3A9.8090303@canterbury.ac.nz> T.B. wrote: > A small problem I see with 'add' (and with > 'append') is that the outcome of adding (or appending) an absolute path > is too surprising, unlike with the 'join' or 'joinpath' names. I don't think it's any less surprising with "join" -- when you join two things, you just as much expect both of them to be part of the result. There doesn't seem to be any concise term that encompasses all the nuances of the operation. Using an arbitrarily chosen operator would at least have the advantage of sidestepping the whole concern. Programmer 1: "Hey, what does ^ do on path objects?" 
Programmer 2: "It concatenates them with a path separator between, except when the second one is an absolute path, in which case it just returns the second one." Programmer 1: "That's so obscure. Why didn't they just define a concat_with_pathsep_or_second_if_absolute() method... oh, wait, I think I see..." -- Greg From p.f.moore at gmail.com Tue Oct 9 09:36:58 2012 From: p.f.moore at gmail.com (Paul Moore) Date: Tue, 9 Oct 2012 08:36:58 +0100 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: <5073B91A.5070504@canterbury.ac.nz> References: <20121005202534.5f721292@pitrou.net> <20121005191625.GA23607@iskra.aviel.ru> <20121005195327.GG8974@mcnabbs.org> <506F6201.4000503@pearwood.info> <20121005235457.GA7755@mcnabbs.org> <506F813D.2050305@canterbury.ac.nz> <20121006214540.GB20907@mcnabbs.org> <5072CF3B.2070203@pearwood.info> <20121008160617.GA1527@mcnabbs.org> <20121008204014.10ba901e@pitrou.net> <20121008205634.113419ea@pitrou.net> <5073B91A.5070504@canterbury.ac.nz> Message-ID: On 9 October 2012 06:41, Greg Ewing wrote: > Antoine Pitrou wrote: > >> But you really want a short method name, otherwise it's better to have >> a dedicated operator. joinpath() definitely doesn't cut it, IMO. > > > I agree, it's far too longwinded. It would clutter your code > just as badly as using os.path.join() all over the place does > now, but without the option of aliasing it to a shorter name. Good point - the fact that it's not possible to alias a method name means that it's important to get the name right if we're to use a method, because we're all stuck with it forever. Because of that, I'm much more reluctant to "just put up with" Path.pathjoin on the basis that it's better than any other option. Are there any libraries that use a method on a path object (or something similar - URL objects, maybe) and if so, what method name did they use? I'd like to see what real code using any proposed method name would look like. As a point of reference, twisted's FilePath class uses "child". Paul From greg.ewing at canterbury.ac.nz Tue Oct 9 09:51:20 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 09 Oct 2012 20:51:20 +1300 Subject: [Python-ideas] Propagating StopIteration value In-Reply-To: References: <5070D658.9020300@pearwood.info> <50721095.1000800@canterbury.ac.nz> Message-ID: <5073D778.6000609@canterbury.ac.nz> Oscar Benjamin wrote: > They do provide the same kind of iterator in the sense that they > reproduce the properties of the object *in so far as it is an > iterator* by yielding the same values. I think we agree on that. Where we seem to disagree is on whether returning a value with StopIteration is part of the iterator protocol or the generator protocol. To my mind it's part of the generator protocol, and as such, itertools functions are not under any obligation to support it. -- Greg From songofacandy at gmail.com Tue Oct 9 09:52:52 2012 From: songofacandy at gmail.com (INADA Naoki) Date: Tue, 9 Oct 2012 16:52:52 +0900 Subject: [Python-ideas] PEP 428: poll about the joining syntax In-Reply-To: References: <20121008204707.48559bf9@pitrou.net> <0D51EC77-7952-45DA-B958-1626395A69D2@ryanhiebert.com> Message-ID: - `p[q]` joins path q to path p -1. Because I can't imagine consistent iterator and __contains__. - `p + q` joins path q to path p +0. I prefer '/' because it is very common path separator. - `p / q` joins path q to path p +1 - `p.join(q)` joins path q to path p +1. But `q` should be `*q`. -1 on `pathjoin`. `Path.pathjoin` is ugly. 
The `urljoin()` is OK because it is just a function. -- INADA Naoki From greg.ewing at canterbury.ac.nz Tue Oct 9 10:13:25 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 09 Oct 2012 21:13:25 +1300 Subject: [Python-ideas] Subpaths [was Re: PEP 428 - object-oriented filesystem paths] In-Reply-To: <50736F9D.7000509@pearwood.info> References: <20121005202534.5f721292@pitrou.net> <87bogfvrni.fsf@uwakimon.sk.tsukuba.ac.jp> <20121006141858.73b42c38@pitrou.net> <50731A31.30606@pearwood.info> <50736F9D.7000509@pearwood.info> Message-ID: <5073DCA5.3050504@canterbury.ac.nz> Steven D'Aprano wrote: > The point is, despite the common "sub" prefix, the semantics of > "subdirectory" is quite different from the semantics of "substring", > "subset", "subtree" and "subpath". I think the "sub" in "subdirectory" is more in the sense of "below", rather than "is a part of". Like a submarine is something that travels below the surface of the sea, not something that's part of the sea. -- Greg From him at online.de Tue Oct 9 10:18:10 2012 From: him at online.de (=?ISO-8859-1?Q?Joachim_K=F6nig?=) Date: Tue, 09 Oct 2012 10:18:10 +0200 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: <5073581B.2030900@canterbury.ac.nz> References: <20121005202534.5f721292@pitrou.net> <20121005191625.GA23607@iskra.aviel.ru> <20121005195327.GG8974@mcnabbs.org> <506F6201.4000503@pearwood.info> <20121005235457.GA7755@mcnabbs.org> <506F813D.2050305@canterbury.ac.nz> <20121006214540.GB20907@mcnabbs.org> <5072CF3B.2070203@pearwood.info> <20121008160617.GA1527@mcnabbs.org> <5073581B.2030900@canterbury.ac.nz> Message-ID: <5073DDC2.2080602@online.de> On 09/10/2012 00:47 Greg Ewing wrote: > I'd prefer 'append', because > > path.append("somedir", "file.txt") > > is pretty self-explanatory, whereas As has already been stated by others, paths are immutable so using them like lists is leading to confusion (and list's append() only wants one arg, so extend() might be better in that case). But paths could then be interpreted as tuples of "directory entries" instead. So adding a path to a path would "join" them: pathA + pathB and in order to not always need a path object for pathB one could also write the right argument of __add__ as a tuple of strings: pathA + ("somedir", "file.txt") One could also use "+" for adding to the last segment if it isn't a path object or a tuple: pathA + ".tar.gz" Joachim From greg.ewing at canterbury.ac.nz Tue Oct 9 10:19:54 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 09 Oct 2012 21:19:54 +1300 Subject: [Python-ideas] checking for identity before comparing built-in objects In-Reply-To: References: <506D94EE.30808@pearwood.info> <50723BE5.3060300@nedbatchelder.com> <507243D2.8000505@btinternet.com> <50736983.5030001@canterbury.ac.nz> Message-ID: <5073DE2A.6060108@canterbury.ac.nz> Oscar Benjamin wrote: > The main purpose of quiet NaNs is to propagate through computation > ruining everything they touch. But they stop doing that as soon as they hit an if statement. It seems to me that the behaviour chosen for NaN comparison could just as easily make things go wrong as make them go right. E.g. while not (error < epsilon): find_a_better_approximation() If error ever ends up being NaN, this will go into an infinite loop. 
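Concretely, once error is NaN both orderings of the test are False, so only the negated form spins forever:

>>> error, epsilon = float('nan'), 1e-9
>>> error < epsilon    # False, so "not (error < epsilon)" stays True -> infinite loop
False
>>> error > epsilon    # also False, so a "while error > epsilon" loop would simply exit
False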
-- Greg From ncoghlan at gmail.com Tue Oct 9 10:22:47 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 9 Oct 2012 13:52:47 +0530 Subject: [Python-ideas] checking for identity before comparing built-in objects In-Reply-To: References: <506D94EE.30808@pearwood.info> <50723BE5.3060300@nedbatchelder.com> <507243D2.8000505@btinternet.com> Message-ID: On Tue, Oct 9, 2012 at 12:43 PM, Guido van Rossum wrote: > No, that's not what I meant -- maybe my turn of phrase "invoking IEEE" > was confusing. The first part is what I meant: "Python cannot have a > rule that x is y implies x == y because that would preclude > implementing float.__eq__ as IEEE 754 equality comparison." The second > half should be: "And we have already (independently from all this) > decided that we want to implement float.__eq__ as IEEE 754 equality > comparison." I'm sure a logician could rearrange the words a bit and > make it look more logical. I'll have a go. It's a lot longer, though :) When designing their floating point support, language designers must choose between two mutually exclusive options: 1. IEEE754 compliant floating point comparison where NaN != NaN, *even if* they're the same object 2. The invariant that "x is y" implies "x == y" The idea behind following the IEEE754 model is that mathematics is a *value based system*. There is only really one NaN, just as there is only one 4 (or 5, or any other specific value). The idea of a number having an identity distinct from its value simply doesn't exist. Thus, when modelling mathematics in an object system, it makes sense to say that *object identity is irrelevant, and only value matters*. This is the approach Python has chosen: for *numeric* operations, including comparisons, object identity is irrelevant to the maximum extent that is practical. Thus "x = float('nan'); assert x != x" holds for *exactly the same reason* that "x = 10e50; y = 10e50; assert x == y" holds. However, when it comes to containers, being able to assume that "x is y" implies "x == y" has an immense practical benefit in terms of being able to implement a large number of non-trivial optimisations. Thus the Python language definition explicitly allows containers to make that assumption, *even though it is known not to be universally true*. This hybrid model means that even though "'x is y' implies 'x == y'" is not true in the general case, it may still be *assumed to be true* regardless by container implementations. In particular, the containers defined in the standard library reference are *required* to make this assumption. This does mean that certain invariants about containers don't hold in the presence of NaN values. This is mostly a theoretical concern, but, in those cases where it *does* matter, then the appropriate solution is to implement a custom container type that handles NaN values correctly. It's perhaps worth including a section explaining this somewhere in the language reference. It's not an accident that Python behaves the way it does, but it's certainly a rationale that can help implementors correctly interpret the rest of the language spec. Cheers, Nick. 
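The practical effect of that container-level assumption is visible at the interactive prompt: the stdlib containers take the identity shortcut even though float equality says the value is unequal to itself (CPython behaviour):

>>> nan = float('nan')
>>> nan == nan
False
>>> nan in [nan]                         # membership short-circuits on identity
True
>>> [nan].count(nan)
1
>>> [nan] == [nan]                       # same object in both lists: lists compare equal
True
>>> [float('nan')] == [float('nan')]     # two distinct NaN objects: lists compare unequal
False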
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From solipsis at pitrou.net Tue Oct 9 10:30:58 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 9 Oct 2012 08:30:58 +0000 (UTC) Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths References: <20121005202534.5f721292@pitrou.net> <20121005191625.GA23607@iskra.aviel.ru> <20121005195327.GG8974@mcnabbs.org> <506F6201.4000503@pearwood.info> <20121005235457.GA7755@mcnabbs.org> <506F813D.2050305@canterbury.ac.nz> <20121006214540.GB20907@mcnabbs.org> <5072CF3B.2070203@pearwood.info> <20121008160617.GA1527@mcnabbs.org> <20121008204014.10ba901e@pitrou.net> Message-ID: Nick Coghlan writes: > > On Tue, Oct 9, 2012 at 12:10 AM, Antoine Pitrou wrote: > > On Mon, 8 Oct 2012 10:06:17 -0600 > > Andrew McNabb wrote: > >> > >> Since this really is a matter of personal taste, I'll end my > >> participation in this discussion by voicing support for Nick Coghlan's > >> suggestion of a `join` method, whether it's named `join` or `append` or > >> something else. > > > > The join() method already exists in the current PEP, but it's less > > convenient, synctatically, than either '[]' or '/'. > > Right. My objections boil down to: > > 1. The case has not been adequately made that a second way to do it is > needed. Therefore, the initial version should just include the method > API. For the record, most Path objects out there seem to include an operator-based join operation (Twisted's FilePath is an exception, but its API is generally not very pretty). Still, I'll let the poll run a bit more :-) Regards Antoine. From ncoghlan at gmail.com Tue Oct 9 10:31:42 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 9 Oct 2012 14:01:42 +0530 Subject: [Python-ideas] PEP 428: poll about the joining syntax In-Reply-To: References: <20121008204707.48559bf9@pitrou.net> <0D51EC77-7952-45DA-B958-1626395A69D2@ryanhiebert.com> Message-ID: On Tue, Oct 9, 2012 at 1:22 PM, INADA Naoki wrote: > -1 on `pathjoin`. `Path.pathjoin` is ugly. > The `urljoin()` is OK because it is just a function. Hmm, this is a *very* interesting point. *All* of the alternatives presented are mainly replacements for just doing this: Path(p, q) And if you want a partially applied version, that's just: prefix = functools.partial(Path, p) So perhaps the right answer for the initial API is: no method, no operator, just use the constructor? The counterargument is that this approach doesn't let "p" control the return type the way a method or operator does, though. It does suggest a whole new class of verbs though, like "make" or "build". Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From greg.ewing at canterbury.ac.nz Tue Oct 9 10:35:21 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 09 Oct 2012 21:35:21 +1300 Subject: [Python-ideas] checking for identity before comparing built-in objects In-Reply-To: References: <506D94EE.30808@pearwood.info> <50723BE5.3060300@nedbatchelder.com> <507243D2.8000505@btinternet.com> Message-ID: <5073E1C9.6070405@canterbury.ac.nz> Alexander Belopolsky wrote: > "For attribute specification, the implementation > shall provide language-defined means, such as compiler directives, to > specify a constant value for the attribute parameter for all standard > operations in a block; the scope of the attribute value is the block > with which it is associated." I believe Decimal is mostly conforming, That depends on whether "scope" is meant lexically or dynamically. 
Decimal contexts are scoped dynamically. -- Greg From storchaka at gmail.com Tue Oct 9 10:35:54 2012 From: storchaka at gmail.com (Serhiy Storchaka) Date: Tue, 09 Oct 2012 11:35:54 +0300 Subject: [Python-ideas] PEP 428: poll about the joining syntax In-Reply-To: <20121009024204.GE27445@ando> References: <20121008204707.48559bf9@pitrou.net> <20121009022653.GD27445@ando> <20121009024204.GE27445@ando> Message-ID: On 09.10.12 05:42, Steven D'Aprano wrote: > p + ".ext" to add a suffix to the file name; an error if p is a > directory. Why? A directory can have a suffix. E.g. /etc/init.d. From steve at pearwood.info Tue Oct 9 10:42:41 2012 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 9 Oct 2012 19:42:41 +1100 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: References: <20121005235457.GA7755@mcnabbs.org> <506F813D.2050305@canterbury.ac.nz> <20121006214540.GB20907@mcnabbs.org> <5072CF3B.2070203@pearwood.info> <20121008160617.GA1527@mcnabbs.org> <20121008204014.10ba901e@pitrou.net> <20121008205634.113419ea@pitrou.net> <5073B91A.5070504@canterbury.ac.nz> Message-ID: <20121009084239.GB30054@ando> On Tue, Oct 09, 2012 at 08:36:58AM +0100, Paul Moore wrote: > On 9 October 2012 06:41, Greg Ewing wrote: > > Antoine Pitrou wrote: > > > >> But you really want a short method name, otherwise it's better to have > >> a dedicated operator. joinpath() definitely doesn't cut it, IMO. > > > > > > I agree, it's far too longwinded. It would clutter your code > > just as badly as using os.path.join() all over the place does > > now, but without the option of aliasing it to a shorter name. > > Good point - the fact that it's not possible to alias a method name > means that it's important to get the name right if we're to use a > method, because we're all stuck with it forever. Huh? py> f = str.join # "join" is too long and I don't like it py> f("*", ["spam", "ham", "eggs"]) 'spam*ham*eggs' We should get the name right because we're stuck with it forever due to backwards compatibility, not because you can't alias it. -- Steven From steve at pearwood.info Tue Oct 9 10:43:51 2012 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 9 Oct 2012 19:43:51 +1100 Subject: [Python-ideas] PEP 428: poll about the joining syntax In-Reply-To: References: <20121008204707.48559bf9@pitrou.net> <20121009022653.GD27445@ando> <20121009024204.GE27445@ando> Message-ID: <20121009084351.GC30054@ando> On Tue, Oct 09, 2012 at 11:35:54AM +0300, Serhiy Storchaka wrote: > On 09.10.12 05:42, Steven D'Aprano wrote: > >p + ".ext" to add a suffix to the file name; an error if p is a > >directory. > > Why? A directory can have a suffix. E.g. /etc/init.d. Fair point. -- Steven From rosuav at gmail.com Tue Oct 9 10:44:45 2012 From: rosuav at gmail.com (Chris Angelico) Date: Tue, 9 Oct 2012 19:44:45 +1100 Subject: [Python-ideas] checking for identity before comparing built-in objects In-Reply-To: <5073DE2A.6060108@canterbury.ac.nz> References: <506D94EE.30808@pearwood.info> <50723BE5.3060300@nedbatchelder.com> <507243D2.8000505@btinternet.com> <50736983.5030001@canterbury.ac.nz> <5073DE2A.6060108@canterbury.ac.nz> Message-ID: On Tue, Oct 9, 2012 at 7:19 PM, Greg Ewing wrote: > But they stop doing that as soon as they hit an if statement. > It seems to me that the behaviour chosen for NaN comparison > could just as easily make things go wrong as make them go > right. E.g. 
> > while not (error < epsilon): > find_a_better_approximation() > > If error ever ends up being NaN, this will go into an > infinite loop. But if you know that that's a possibility, you simply code your condition the other way: while error > epsilon: find_a_better_approximation() Which will then immediately terminate the loop if error bonks to NaN. ChrisA From oscar.j.benjamin at gmail.com Tue Oct 9 10:52:01 2012 From: oscar.j.benjamin at gmail.com (Oscar Benjamin) Date: Tue, 9 Oct 2012 09:52:01 +0100 Subject: [Python-ideas] checking for identity before comparing built-in objects In-Reply-To: <5073DE2A.6060108@canterbury.ac.nz> References: <506D94EE.30808@pearwood.info> <50723BE5.3060300@nedbatchelder.com> <507243D2.8000505@btinternet.com> <50736983.5030001@canterbury.ac.nz> <5073DE2A.6060108@canterbury.ac.nz> Message-ID: On Oct 9, 2012 9:20 AM, "Greg Ewing" wrote: > > Oscar Benjamin wrote: >> >> The main purpose of quiet NaNs is to propagate through computation >> ruining everything they touch. > > > But they stop doing that as soon as they hit an if statement. > It seems to me that the behaviour chosen for NaN comparison > could just as easily make things go wrong as make them go > right. E.g. > > while not (error < epsilon): > find_a_better_approximation() > > If error ever ends up being NaN, this will go into an > infinite loop. I should expect that an experienced numericist would be aware of the possibility of a NaN and make a trivial modification of your loop to take advantage of the simple fact that any comparison with NaN returns false. It is only because you have artificially placed a not in the while clause that it doesn't work. I would have tested for error>eps without even thinking about NaNs. Oscar -------------- next part -------------- An HTML attachment was scrubbed... URL: From victor.stinner at gmail.com Tue Oct 9 11:33:08 2012 From: victor.stinner at gmail.com (Victor Stinner) Date: Tue, 9 Oct 2012 11:33:08 +0200 Subject: [Python-ideas] PEP 428: poll about the joining syntax In-Reply-To: <20121008204707.48559bf9@pitrou.net> References: <20121008204707.48559bf9@pitrou.net> Message-ID: > Since there has been some controversy about the joining syntax used in > PEP 428 (filesystem path objects), I would like to run an informal poll > about it. Please answer with +1/+0/-0/-1 for each proposal: > > - `p[q]` joins path q to path p > - `p + q` joins path q to path p > - `p / q` joins path q to path p > - `p.join(q)` joins path q to path p I cannot decide with such trivial examples. More realistic examples: --- def read_config(name): home = Path(os.path.expanduser("~")) # pathlib doesn't support expanduser?? with open(home / ".config" / name + ".conf") as f: return fp.read() --- The join() method has an advantage: it avoids temporary objects (config / ".config" is my example). --- def read_config(name): home = Path(os.path.expanduser("~")) # pathlib doesn't support expanduser?? with open(home.join(".config", name + ".conf")) as f: return fp.read() --- It should work even if name is a Path object, so Path + str should concatenate a suffix without adding directory separator. My vote: > - `p[q]` joins path q to path p home[".config"][name] # + ".conf" ??? -1 > - `p + q` joins path q to path p home + ".config" + name # + ".conf" ??? 
-1 -> Path + str must be reserved to add a suffix > - `p / q` joins path q to path p home / ".config" / name + ".conf" +1: it's natural, but maybe "suboptimal" in performance > - `p.join(q)` joins path q to path p home.join(".config", name + ".conf") +0: more efficient, but it may be confusing with str.join() which is very different. a.join(b, c) : a is the separator or the root directory, depending on the type of a (str or Path). We should avoid confusion between Path and str methods and operator (a+b and a.join(b)). Victor From solipsis at pitrou.net Tue Oct 9 11:43:02 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 9 Oct 2012 09:43:02 +0000 (UTC) Subject: [Python-ideas] PEP 428: poll about the joining syntax References: <20121008204707.48559bf9@pitrou.net> <0D51EC77-7952-45DA-B958-1626395A69D2@ryanhiebert.com> Message-ID: Nick Coghlan writes: > > On Tue, Oct 9, 2012 at 1:22 PM, INADA Naoki wrote: > > -1 on `pathjoin`. `Path.pathjoin` is ugly. > > The `urljoin()` is OK because it is just a function. > > Hmm, this is a *very* interesting point. *All* of the alternatives > presented are mainly replacements for just doing this: > > Path(p, q) > > And if you want a partially applied version, that's just: > > prefix = functools.partial(Path, p) > > So perhaps the right answer for the initial API is: no method, no > operator, just use the constructor? Well, you would have to use either PurePath(p, q) or Path(p, q) based on whether p is pure or concrete. Unless we make the constructor more magic and let Path() switch to PurePath() when the first argument is a pure path. Which does sounds a bit too magic to me (Path would instantiate something which is not a Path instance...). > It does suggest a whole new class of verbs though, like "make" or "build". They are rather vague, though. Regards Antoine. From storchaka at gmail.com Tue Oct 9 11:58:54 2012 From: storchaka at gmail.com (Serhiy Storchaka) Date: Tue, 09 Oct 2012 12:58:54 +0300 Subject: [Python-ideas] Make "is" checks on non-singleton literals errors In-Reply-To: References: Message-ID: On 09.10.12 02:05, Mike Graham wrote: > I can't find this in a couple versions of Python I checked. If this > code is still around, it sounds like it has a bug and should be fixed. It's "if node.tagname is 'admonition':" line. > test_winsound.py has an `is 0` check and an `is ""` check. Both should be fixed. http://bugs.python.org/issue16172 From greg.ewing at canterbury.ac.nz Tue Oct 9 12:34:24 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 09 Oct 2012 23:34:24 +1300 Subject: [Python-ideas] PEP 428: poll about the joining syntax In-Reply-To: References: <20121008204707.48559bf9@pitrou.net> <0D51EC77-7952-45DA-B958-1626395A69D2@ryanhiebert.com> Message-ID: <5073FDB0.8040903@canterbury.ac.nz> I just consulted a thesaurus about synonyms for 'append', and it came up with 'affix' and 'adjoin'. -- Greg From greg.ewing at canterbury.ac.nz Tue Oct 9 11:11:43 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 09 Oct 2012 22:11:43 +1300 Subject: [Python-ideas] asyncore: included batteries don't fit In-Reply-To: References: Message-ID: <5073EA4F.8030405@canterbury.ac.nz> Ben Darnell wrote: > StackContext doesn't quite give you better tracebacks, although I > think it could be adapted to do that. 
ExceptionStackContext is > essentially a try/except block that follows you around across > asynchronous operations - on entry it sets a thread-local state, and > all the tornado asynchronous functions know to save this state when > they are passed a callback, and restore it when they execute it. This is something that generator-based coroutines using yield-from ought to handle a lot more cleanly. You should be able to just use an ordinary try-except block in your generator code and have it do the right thing. I hope that the new async core will be designed so that generator-based coroutines can be plugged into it directly and efficiently, without the need for a lot of decorators, callbacks, Futures, etc. in between. -- Greg From ubershmekel at gmail.com Tue Oct 9 13:03:38 2012 From: ubershmekel at gmail.com (Yuval Greenfield) Date: Tue, 9 Oct 2012 13:03:38 +0200 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: References: <20121005202534.5f721292@pitrou.net> Message-ID: On Tue, Oct 9, 2012 at 3:02 AM, Calvin Spealman wrote: > path.py was in teh wild, and is still in use. Why do we find ourselves > debating new libraries like this as PEPs? We need to let them play out, see > what sticks. If someone wants to make this library and stick it on PyPI, > I'm not stopping them. I'm encouraging it. Let's see how it plays out. if > it works out well, it deserves a PEP. In two or three years. > > I agree, This discussion has been framed unfairly. The only things that should appear in this PEP are the guidelines Guido mentioned earlier in the discussion along with some use cases. So python is chartering a path object module, and we should let whichever module is the best on pypi eventually get into the std-lib. Yuval Greenfield -------------- next part -------------- An HTML attachment was scrubbed... URL: From solipsis at pitrou.net Tue Oct 9 13:26:47 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 9 Oct 2012 11:26:47 +0000 (UTC) Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths References: <20121005202534.5f721292@pitrou.net> Message-ID: Yuval Greenfield writes: > > On Tue, Oct 9, 2012 at 3:02 AM, Calvin Spealman wrote: > > > > path.py was in teh wild, and is still in use. Why do we find ourselves debating new libraries like this as PEPs? We need to let them play out, see what sticks. If someone wants to make this library and stick it on PyPI, I'm not stopping them. I'm encouraging it. Let's see how it plays out. if it works out well, it deserves a PEP. In two or three years. > > > > I agree, > > This discussion has been framed unfairly. path.py (or a similar API) has already been rejected as PEP 355. I see no need to go through this again, at least not in this discussion thread. If you want to re-discuss PEP 355, please open a separate thread. Regards Antoine. From ncoghlan at gmail.com Tue Oct 9 14:07:35 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 9 Oct 2012 17:37:35 +0530 Subject: [Python-ideas] PEP 428: poll about the joining syntax In-Reply-To: References: <20121008204707.48559bf9@pitrou.net> <0D51EC77-7952-45DA-B958-1626395A69D2@ryanhiebert.com> Message-ID: On Tue, Oct 9, 2012 at 3:13 PM, Antoine Pitrou wrote: >> It does suggest a whole new class of verbs though, like "make" or "build". > > They are rather vague, though. Agreed, but what os.path.join actually *does* is rather vague, since it is really "joins the path segments, starting with the last absolute path segment". 
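For example, with today's os.path.join (POSIX rules shown) the last absolute segment silently discards everything before it:

>>> import os.path
>>> os.path.join("home", "user", ".config")
'home/user/.config'
>>> os.path.join("home", "/etc", "passwd")   # the absolute segment wins
'/etc/passwd'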
I'm mostly playing Devil's Advocate here, but I thought it was a very good point that requesting a Path or PurePath directly is *always* going to be an option. And really, the only time you *need* a PurePath is when you want to construct a non-native path - for native paths you'll always be able to create it, some methods just won't work if it doesn't actually exist on the filesystem. Using Victor's more complicated example, compare: open(os.path.join(home, ".config", name + ".conf")) open(str(Path(home, ".config", name + ".conf"))) open(home.join(".config", name + ".conf"))) open(str(home / ".config" / name + ".conf")) Path(home, ".config", name + ".conf").open() home.join(".config", name + ".conf").open() (home / ".config" / name + ".conf").open() One note regarding the extra "str()" calls relative to something like path.py: we get to define the language, so we can get the benefits of implicit conversion without the many downsides by *defining a new conversion method*, such as __fspath__. That may provide an attractive alternative to offering methods that shadow builtin functions: open(Path(home, ".config", name + ".conf")) open(home.join(".config", name + ".conf")) open(home / ".config" / name + ".conf") "As easy to use as path.py, without the design compromises imposed by inheriting from str" is a worthwhile goal. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Tue Oct 9 14:16:38 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 9 Oct 2012 17:46:38 +0530 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: References: <20121005202534.5f721292@pitrou.net> Message-ID: On Tue, Oct 9, 2012 at 4:33 PM, Yuval Greenfield wrote: > So python is chartering a path object module, and we should let whichever > module is the best on pypi eventually get into the std-lib. No, the module has to at least have a nodding acquaintance with good software design principles, avoid introducing too many ways to do the same thing, and various other concerns many authors of modules on PyPI often don't care about. That's *why* path.py got rejected in the first place. Just as ipaddress is not the same as ipaddr due to those additional concerns, so will whatever path abstraction makes into the standard library take those concerns into account. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From solipsis at pitrou.net Tue Oct 9 14:34:36 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 9 Oct 2012 14:34:36 +0200 (CEST) Subject: [Python-ideas] PEP 428: poll about the joining syntax In-Reply-To: References: <20121008204707.48559bf9@pitrou.net> <0D51EC77-7952-45DA-B958-1626395A69D2@ryanhiebert.com> Message-ID: <6a1b01106028597886936265071e2fce.squirrel@webmail.nerim.net> Nick Coghlan writes: > One note regarding the extra "str()" calls relative to something like > path.py: we get to define the language, so we can get the benefits of > implicit conversion without the many downsides by *defining a new > conversion method*, such as __fspath__. That may provide an attractive > alternative to offering methods that shadow builtin functions: > > open(Path(home, ".config", name + ".conf")) > open(home.join(".config", name + ".conf")) > open(home / ".config" / name + ".conf") > > "As easy to use as path.py, without the design compromises imposed by > inheriting from str" is a worthwhile goal. That's a very good idea! 
Even better if there's a way to make it work as expected with openat support (for example by allowing __fspath__ to return a (dir_fd, filename) tuple). Regards Antoine. From breamoreboy at yahoo.co.uk Tue Oct 9 14:47:41 2012 From: breamoreboy at yahoo.co.uk (Mark Lawrence) Date: Tue, 09 Oct 2012 13:47:41 +0100 Subject: [Python-ideas] PEP 428: poll about the joining syntax In-Reply-To: <20121008204707.48559bf9@pitrou.net> References: <20121008204707.48559bf9@pitrou.net> Message-ID: On 08/10/2012 19:47, Antoine Pitrou wrote: > > Hello, > > Since there has been some controversy about the joining syntax used in > PEP 428 (filesystem path objects), I would like to run an informal poll > about it. Please answer with +1/+0/-0/-1 for each proposal: > > - `p[q]` joins path q to path p -1 yuck > - `p + q` joins path q to path p +1 Pythonic > - `p / q` joins path q to path p -0 veering to +0 it just seems wrong but I can't strongly put my finger on why. > - `p.join(q)` joins path q to path p -1 likely to confuse idiots like me as it's too similar to string.join. For the last one would there be a real need for a path.join method, or has this already been discussed and I've forgotten about it? > > (you can include a rationale if you want, but don't forget to vote :-)) > > Thank you > > Antoine. > > -- Cheers. Mark Lawrence. From steve at pearwood.info Tue Oct 9 14:54:52 2012 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 09 Oct 2012 23:54:52 +1100 Subject: [Python-ideas] checking for identity before comparing built-in objects In-Reply-To: References: <506D94EE.30808@pearwood.info> <50723BE5.3060300@nedbatchelder.com> <507243D2.8000505@btinternet.com> <50736983.5030001@canterbury.ac.nz> Message-ID: <50741E9C.3090005@pearwood.info> On 09/10/12 11:32, Oscar Benjamin wrote: > The main purpose of quiet NaNs is to propagate through computation > ruining everything they touch. In a programming language like C that > lacks exceptions this is important as it allows you to avoid checking > all the time for invalid values, whilst still being able to know if > the end result of your computation was ever affected by an invalid > numerical operation. Correct, but I'd like to point out that NaNs are a bit more sophisticated than just "numeric contagion". 1) NaNs carry payload, so you can actually identify what sort of calculation failed. E.g. NaN-27 might mean "logarithm of a negative number", while NaN-95 might be "inverse trig function domain error". Any calculation involving a single NaN is supposed to propagate the same payload, so at the end of the calculation you can see that you tried to take the log of a negative number and debug accordingly. 2) On rare occasions, NaNs can validly disappear from a calculation, leaving you with a non-NaN answer. The rule is, if you can replace the NaN with *any* other value, and still get the same result, then the NaN is irrelevant and can be consumed. William Kahan gives an example: For example, 0*NaN must be NaN because 0*? is an INVALID operation (NaN). On the other hand, for hypot(x, y) := ?(x*x + y*y) we find that hypot(?, y) = +? for all real y, finite or not, and deduce that hypot(?, NaN) = +? too; naive implementations of hypot may do differently. 
Page 7 of http://www.cs.berkeley.edu/~wkahan/ieee754status/IEEE754.PDF -- Steven From oscar.j.benjamin at gmail.com Tue Oct 9 15:07:45 2012 From: oscar.j.benjamin at gmail.com (Oscar Benjamin) Date: Tue, 9 Oct 2012 14:07:45 +0100 Subject: [Python-ideas] Propagating StopIteration value In-Reply-To: References: <5070D658.9020300@pearwood.info> Message-ID: On 7 October 2012 23:43, Oscar Benjamin wrote: > > Before pep 380 filter(lambda x: True, obj) returned an object that was > the same kind of iterator as obj (it would yield the same values). Now > the "kind of iterator" that obj is depends not only on the values that > it yields but also on the value that it returns. Since filter does not > pass on the same return value, filter(lambda x: True, obj) is no > longer the same kind of iterator as obj. The same considerations apply > to many other functions such as map, itertools.groupby, > itertools.dropwhile. > I really should have checked this before posting but I didn't have Python 3.3 available: Python 3.3.0 (v3.3.0:bd8afb90ebf2, Sep 29 2012, 10:55:48) [MSC v.1600 32 bit (Intel)] on win32 Type "help", "copyright", "credits" or "license" for more information. >>> import itertools >>> >>> def f(): ... return 'Returned from generator!' ... yield ... >>> next(filter(lambda x:True, f())) Traceback (most recent call last): File "", line 1, in StopIteration: Returned from generator! So filter does propagate the same StopIteration instance. However map does not: >>> next(map(None, f())) Traceback (most recent call last): File "", line 1, in StopIteration The itertools module is inconsistent in this respect as well. As already mentioned itertools.chain() hides the value: >>> next(itertools.chain(f(), f())) Traceback (most recent call last): File "", line 1, in StopIteration >>> next(itertools.chain(f())) Traceback (most recent call last): File "", line 1, in StopIteration Other functions may or may not: >>> next(itertools.dropwhile(lambda x:True, f())) Traceback (most recent call last): File "", line 1, in StopIteration: Returned from generator! >>> next(itertools.groupby(f())) Traceback (most recent call last): File "", line 1, in StopIteration These next two seem wrong since there are two iterables (but I don't think they can be done differently): >>> def g(): ... return 'From the other generator...' ... yield ... >>> next(itertools.compress(f(), g())) Traceback (most recent call last): File "", line 1, in StopIteration: Returned from generator! >>> next(zip(f(), g())) Traceback (most recent call last): File "", line 1, in StopIteration: Returned from generator! I guess this should be treated as undefined behaviour? Perhaps it should be documented as such so that anyone who chooses to rely on it was warned. Also some of the itertools documentation is ambiguous in relation to returning vs yielding values from an iterator. Those on the builtin functions page are defined carefully: http://docs.python.org/py3k/library/functions.html#filter filter(function, iterable) Construct an iterator from those elements of iterable for which function returns true. http://docs.python.org/py3k/library/functions.html#map map(function, iterable, ...) Return an iterator that applies function to every item of iterable, yielding the results. 
But some places in the itertools module use 'return' in place of 'yield': http://docs.python.org/py3k/library/itertools.html#itertools.filterfalse itertools.filterfalse(predicate, iterable) Make an iterator that filters elements from iterable returning only those for which the predicate is False. If predicate is None, return the items that are false. http://docs.python.org/py3k/library/itertools.html#itertools.groupby itertools.groupby(iterable, key=None) Make an iterator that returns consecutive keys and groups from the iterable. The key is a function computing a key value for each element. If not specified or is None, key defaults to an identity function and returns the element unchanged. Oscar From ericsnowcurrently at gmail.com Tue Oct 9 15:30:00 2012 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Tue, 9 Oct 2012 07:30:00 -0600 Subject: [Python-ideas] PEP 428: poll about the joining syntax In-Reply-To: References: <20121008204707.48559bf9@pitrou.net> Message-ID: On Oct 9, 2012 1:12 AM, "Senthil Kumaran" wrote: > > `p.pathjoin(q)` > > +1 > > It is very explicit and hard to get it wrong. +1 ...and it's not _that_ long a name. This would be a provisional module, so we could try the name on for size or hide it behind an operator later. -eric -------------- next part -------------- An HTML attachment was scrubbed... URL: From massimo.dipierro at gmail.com Tue Oct 9 15:49:06 2012 From: massimo.dipierro at gmail.com (Massimo Di Pierro) Date: Tue, 9 Oct 2012 08:49:06 -0500 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: <5073B69D.6070304@canterbury.ac.nz> References: <20121005202534.5f721292@pitrou.net> <20121005191625.GA23607@iskra.aviel.ru> <20121005195327.GG8974@mcnabbs.org> <506F6201.4000503@pearwood.info> <20121005235457.GA7755@mcnabbs.org> <506F813D.2050305@canterbury.ac.nz> <20121006214540.GB20907@mcnabbs.org> <5072CF3B.2070203@pearwood.info> <20121008160617.GA1527@mcnabbs.org> <646D805C-581A-4278-B901-BFA5F1D0495E@gmail.com> <5073B69D.6070304@canterbury.ac.nz> Message-ID: On Oct 9, 2012, at 12:31 AM, Greg Ewing wrote: > Massimo DiPierro wrote: >> The + symbol means addition and union of disjoint sets. A path (including a fs path) is a set of links (for a fs path, a link is a folder name). Using the + symbols has a natural interpretation as concatenation of subpaths (sets) to for form a longer path (superset). > > A reason *not* to use '+' is that it would violate associativity > in some cases, e.g. > > (path + "foo") + "bar" > > would not be the same as > > path + ("foo" + "bar") > I am missing something. Why not? > Using '/', or any other operator not currently defined on strings, > would prevent this mistake from occuring. > > A reason to want an operator is the symmetry of path concatenation. > Symmetrical operations deserve a symmetrical syntax, and to achieve > that in Python you need either an operator or a stand-alone function. > > A reason to prefer an operator over a function is associativity. > It would be nice to be able to write > > path1 / path2 / path3 > > and not have to think about the order in which the operations are > being done. > > If '/' is considered too much of a stretch, how about '&'? It > suggests a kind of addition or concatenation, and in fact is > used for string concatenation in some other languages. 
> > -- > Greg > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas From solipsis at pitrou.net Tue Oct 9 16:00:40 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 9 Oct 2012 14:00:40 +0000 (UTC) Subject: [Python-ideas] PEP 428: poll about the joining syntax References: <20121008204707.48559bf9@pitrou.net> Message-ID: Eric Snow writes: > > > > `p.pathjoin(q)` > > > > +1 > > > > It is very explicit and hard to get it wrong. > +1 > ...and it's not _that_ long a name.? This would be a provisional module, so > we could try the name on for size or hide it behind an operator later. Or, precisely, since it's provisional, we needn't *wait* before we provide an operator. Any stdlib module API can be augmented; what provisional modules allow in addition to that is to modify or remove existing APIs. So we can, say, enable Path.__truediv__ and wait for people to complain about it. By the way, it's not new to have dual operator / method APIs. For example set.union and set.__or__; list.extend and list.__iadd__; etc. Regards Antoine. From michelelacchia at gmail.com Tue Oct 9 16:27:24 2012 From: michelelacchia at gmail.com (Michele Lacchia) Date: Tue, 9 Oct 2012 07:27:24 -0700 (PDT) Subject: [Python-ideas] PEP 428: poll about the joining syntax In-Reply-To: <20121008204707.48559bf9@pitrou.net> References: <20121008204707.48559bf9@pitrou.net> Message-ID: <56a499f1-cf04-4930-b4d8-4e090d8e6f7a@googlegroups.com> > > > - `p[q]` joins path q to path p > For some obscure reason I really like this one. I can understand the arguments against it though. So I'll probably be the only one to be +0 on this proposal. - `p + q` joins path q to path p > -0 I agree with who says this operator should be used as suffix appending, and not for path components. - `p / q` joins path q to path p > +0 I'm not against the div operator I'd prefer to use another one. I'm a *nix person but I find this proposal too *nix-centric. About the operator: I really like *Steven D'Aprano*'s proposal: I find *&*just perfect. I'm way more than +1 on it. - `p.join(q)` joins path q to path p > +1 I feel the need for a method, in parallel with some operator. About the name: if join is rejected I am: +1 on add() +1 on adjoin() +0 on append() -1 on pathjoin() / joinpath() -- too long, too similar, way too ugly. -------------- next part -------------- An HTML attachment was scrubbed... URL: From michelelacchia at gmail.com Tue Oct 9 16:30:08 2012 From: michelelacchia at gmail.com (Michele Lacchia) Date: Tue, 9 Oct 2012 07:30:08 -0700 (PDT) Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: References: <20121005202534.5f721292@pitrou.net> <20121005191625.GA23607@iskra.aviel.ru> <20121005195327.GG8974@mcnabbs.org> <506F6201.4000503@pearwood.info> <20121005235457.GA7755@mcnabbs.org> <506F813D.2050305@canterbury.ac.nz> <20121006214540.GB20907@mcnabbs.org> <5072CF3B.2070203@pearwood.info> <20121008160617.GA1527@mcnabbs.org> <646D805C-581A-4278-B901-BFA5F1D0495E@gmail.com> <5073B69D.6070304@canterbury.ac.nz> Message-ID: <7e7ccc17-4035-44ba-af20-0e77633dac1d@googlegroups.com> > > > > A reason *not* to use '+' is that it would violate associativity > > in some cases, e.g. > > > > (path + "foo") + "bar" > > > > would not be the same as > > > > path + ("foo" + "bar") > > > > > I am missing something. Why not? > Because the result would be (respectively): *path/foo/bar* and *path/foobar* . 
In the second example the two strings would be concatenated and only then joined to the path. This is a very good argument against the + operator! -------------- next part -------------- An HTML attachment was scrubbed... URL: From massimo.dipierro at gmail.com Tue Oct 9 16:57:12 2012 From: massimo.dipierro at gmail.com (Massimo Di Pierro) Date: Tue, 9 Oct 2012 09:57:12 -0500 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: <7e7ccc17-4035-44ba-af20-0e77633dac1d@googlegroups.com> References: <20121005202534.5f721292@pitrou.net> <20121005191625.GA23607@iskra.aviel.ru> <20121005195327.GG8974@mcnabbs.org> <506F6201.4000503@pearwood.info> <20121005235457.GA7755@mcnabbs.org> <506F813D.2050305@canterbury.ac.nz> <20121006214540.GB20907@mcnabbs.org> <5072CF3B.2070203@pearwood.info> <20121008160617.GA1527@mcnabbs.org> <646D805C-581A-4278-B901-BFA5F1D0495E@gmail.com> <5073B69D.6070304@canterbury.ac.nz> <7e7ccc17-4035-44ba-af20-0e77633dac1d@googlegroups.com> Message-ID: <1FA95BC5-2745-43D7-9FE2-73EF9FCCEDFC@gmail.com> This is an excellent point. I change my vote to using the / operator (wait, do I even any right to vote not his?). On Oct 9, 2012, at 9:30 AM, Michele Lacchia wrote: > > > > > A reason *not* to use '+' is that it would violate associativity > > in some cases, e.g. > > > > (path + "foo") + "bar" > > > > would not be the same as > > > > path + ("foo" + "bar") > > > > > I am missing something. Why not? > > Because the result would be (respectively): path/foo/bar and path/foobar. > In the second example the two strings would be concatenated and only > then joined to the path. > This is a very good argument against the + operator! -------------- next part -------------- An HTML attachment was scrubbed... URL: From him at online.de Tue Oct 9 16:58:49 2012 From: him at online.de (=?ISO-8859-1?Q?Joachim_K=F6nig?=) Date: Tue, 09 Oct 2012 16:58:49 +0200 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: <7e7ccc17-4035-44ba-af20-0e77633dac1d@googlegroups.com> References: <20121005202534.5f721292@pitrou.net> <20121005191625.GA23607@iskra.aviel.ru> <20121005195327.GG8974@mcnabbs.org> <506F6201.4000503@pearwood.info> <20121005235457.GA7755@mcnabbs.org> <506F813D.2050305@canterbury.ac.nz> <20121006214540.GB20907@mcnabbs.org> <5072CF3B.2070203@pearwood.info> <20121008160617.GA1527@mcnabbs.org> <646D805C-581A-4278-B901-BFA5F1D0495E@gmail.com> <5073B69D.6070304@canterbury.ac.nz> <7e7ccc17-4035-44ba-af20-0e77633dac1d@googlegroups.com> Message-ID: <50743BA9.3020502@online.de> On 09/10/2012 16:30, Michele Lacchia wrote: > > > > > A reason *not* to use '+' is that it would violate associativity > > in some cases, e.g. > > > > (path + "foo") + "bar" > > > > would not be the same as > > > > path + ("foo" + "bar") > > > > > I am missing something. Why not? > > > Because the result would be (respectively): /path/foo/bar/ and > /path/foobar/. > In the second example the two strings would be concatenated and only > then joined to the path. > This is a very good argument against the + operator! But why not interpret a path as a tuple (not a list, it's immutable) of path segments and have: path + ("foo", "bar") and path + ".tar.gz" behave different (i.e. tuples add segments and strings add to the last segment)? And of course path1 + path2 adds the segments together. Joachim -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From barry at python.org Tue Oct 9 17:15:29 2012 From: barry at python.org (Barry Warsaw) Date: Tue, 9 Oct 2012 11:15:29 -0400 Subject: [Python-ideas] Make "is" checks on non-singleton literals errors References: Message-ID: <20121009111529.2ebee9a7@resist.wooz.org> On Oct 08, 2012, at 06:13 PM, Raymond Hettinger wrote: > >On Oct 8, 2012, at 12:44 PM, Mike Graham wrote: > >> I regularly see learners using "is" to check for string equality and >> sometimes other equality. Due to optimizations, they often come away >> thinking it worked for them. >> >> There are no cases where >> >> if x is "foo": >> >> or >> >> if x is 4: >> >> is actually the code someone intended to write. >> >> Although this has no benefit to anyone but new learners, it also >> doesn't really do any harm. > >This seems like a job for pyflakes, pylint, or pychecker. +1 -Barry -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: not available URL: From storchaka at gmail.com Tue Oct 9 17:28:14 2012 From: storchaka at gmail.com (Serhiy Storchaka) Date: Tue, 09 Oct 2012 18:28:14 +0300 Subject: [Python-ideas] Propagating StopIteration value In-Reply-To: References: <5070D658.9020300@pearwood.info> Message-ID: On 09.10.12 16:07, Oscar Benjamin wrote: > I really should have checked this before posting but I didn't have > Python 3.3 available: Generator expression also eats the StopIteration value: >>> next(x for x in f()) Traceback (most recent call last): File "", line 1, in StopIteration > These next two seem wrong since there are two iterables (but I don't > think they can be done differently): > >>>> def g(): > .... return 'From the other generator...' > .... yield > .... >>>> next(itertools.compress(f(), g())) > Traceback (most recent call last): > File "", line 1, in > StopIteration: Returned from generator! >>>> next(zip(f(), g())) > Traceback (most recent call last): > File "", line 1, in > StopIteration: Returned from generator! >>> def h(): ... yield 42 ... return 'From the another generator...' ... >>> next(zip(f(), h())) Traceback (most recent call last): File "", line 1, in StopIteration: Returned from generator! >>> next(zip(h(), f())) Traceback (most recent call last): File "", line 1, in StopIteration: Returned from generator! This is logical. Value returned from the first exhausted iterator. > I guess this should be treated as undefined behaviour? Perhaps it > should be documented as such so that anyone who chooses to rely on it > was warned. This should be treated as implementation details now. From storchaka at gmail.com Tue Oct 9 17:34:57 2012 From: storchaka at gmail.com (Serhiy Storchaka) Date: Tue, 09 Oct 2012 18:34:57 +0300 Subject: [Python-ideas] Propagating StopIteration value In-Reply-To: <5073D778.6000609@canterbury.ac.nz> References: <5070D658.9020300@pearwood.info> <50721095.1000800@canterbury.ac.nz> <5073D778.6000609@canterbury.ac.nz> Message-ID: On 09.10.12 10:51, Greg Ewing wrote: > Where we seem to disagree is on > whether returning a value with StopIteration is part of the > iterator protocol or the generator protocol. Is a generator expression work with the iterator protocol or the generator protocol? A generator expression eats a value with StopIteration: >>> def G(): ... return 42 ... yield ... >>> next(x for x in G()) Traceback (most recent call last): File "", line 1, in StopIteration Is it a bug? 
From ethan at stoneleaf.us Tue Oct 9 17:54:03 2012 From: ethan at stoneleaf.us (Ethan Furman) Date: Tue, 09 Oct 2012 08:54:03 -0700 Subject: [Python-ideas] checking for identity before comparing built-in objects In-Reply-To: <20121009043236.GI27445@ando> References: <506D94EE.30808@pearwood.info> <50723BE5.3060300@nedbatchelder.com> <50733A18.10400@nedbatchelder.com> <20121009043236.GI27445@ando> Message-ID: <5074489B.6000003@stoneleaf.us> Steven D'Aprano wrote: > 1) It is not the case that NaN NaN is always false. Huh -- well, apparently NaN != Nan --> True. However, borrowing Steven's earlier example, and modifying slightly: sqr(-1) != sqr(-1) Shouldn't this be False? Or, to look at it another way, surely somewhere out in the Real World (tm) it is the case that two NaNs are indeed equal. ~Ethan~ From christian at python.org Tue Oct 9 18:11:08 2012 From: christian at python.org (Christian Heimes) Date: Tue, 09 Oct 2012 18:11:08 +0200 Subject: [Python-ideas] asyncore: included batteries don't fit In-Reply-To: References: <5072C972.5070207@python.org> <20121008203637.5b0c147d@pitrou.net> Message-ID: <50744C9C.2040602@python.org> Am 08.10.2012 20:40, schrieb Guido van Rossum: > Now I know what it is I think that (a) the abstract reactor design > should support IOCP, and (b) the stdlib should have enabled by default > IOCP when on Windows. I've created a ticket for the topic: http://bugs.python.org/issue16175 Christian From steve at pearwood.info Tue Oct 9 18:11:42 2012 From: steve at pearwood.info (Steven D'Aprano) Date: Wed, 10 Oct 2012 03:11:42 +1100 Subject: [Python-ideas] checking for identity before comparing built-in objects In-Reply-To: <5074489B.6000003@stoneleaf.us> References: <506D94EE.30808@pearwood.info> <50723BE5.3060300@nedbatchelder.com> <50733A18.10400@nedbatchelder.com> <20121009043236.GI27445@ando> <5074489B.6000003@stoneleaf.us> Message-ID: <50744CBE.4010600@pearwood.info> On 10/10/12 02:54, Ethan Furman wrote: > Or, to look at it another way, surely somewhere out in the Real >World (tm) it is the case that two NaNs are indeed equal. By definition, no. -- Steven From storchaka at gmail.com Tue Oct 9 18:18:09 2012 From: storchaka at gmail.com (Serhiy Storchaka) Date: Tue, 09 Oct 2012 19:18:09 +0300 Subject: [Python-ideas] Propagating StopIteration value In-Reply-To: References: <5070D658.9020300@pearwood.info> Message-ID: On 09.10.12 01:44, Guido van Rossum wrote: > I don't understand that code at all, and it seems to be undocumented > (no docstrings, no mention in the external docs). Why is it using > StopIteration at all? There isn't an iterator or generator in sight. > AFAICT it should just use a different exception. I agree with you. StopIteration is not needed here (or I don't understand that code), ValueError can be raised instead it. Perhaps the author was going to use it for the iterative parsing. This is a bad example, but it is only real example which I have. 
I have also the artificial (but also imperfect) example: def file_reader(f): while not f.eof: yield f.read(0x2000) def zlib_decompressor(input): d = zlib.decompressobj() while not d.eof: yield d.decompress(d.unconsumed_tail or next(input)) return d.unused_data def bzip2_decompressor(input): decomp = bz2.BZ2Decompressor() while not decomp.eof: yield decomp.decompress(next(input)) return decomp.unused_data def detect_decompressor(input): data = b'' while len(data) < 5: data += next(input) if data.startswith('deflt'): decompressor = zlib_decompressor data = data[5:] elif data.startswith('bzip2'): decompressor = bzip2_decompressor data = data[5:] else: decompressor = None input = itertools.chain([data], input) return decompressor, input def multi_stream_decompressor(input): while True: decompressor, input = detect_decompressor(input) if decompressor is None: return input unused_data = yield from decompressor(input) if not unused_data: return input input = itertools.chain([unused_data], input) Of cause this can be implemented without generators, using a class to hold a state. > I think you're going at this from the wrong direction. You shouldn't > be using this feature in circumstances where you're at all likely to > run into this "problem". I think that the new language features (as returning value from generators/iterators) will generated new methods of solving problems. And for these new methods will be useful to expand the existing tools. But now I see that it is still too early to talk about it. > Itertools is for iterators, and all the extra generator > features make no sense for it. As said Greg, the question is whether returning a value with StopIteration is part of the iterator protocol or the generator protocol. From amcnabb at mcnabbs.org Tue Oct 9 18:35:02 2012 From: amcnabb at mcnabbs.org (Andrew McNabb) Date: Tue, 9 Oct 2012 10:35:02 -0600 Subject: [Python-ideas] PEP 428: poll about the joining syntax In-Reply-To: <5073FDB0.8040903@canterbury.ac.nz> References: <20121008204707.48559bf9@pitrou.net> <0D51EC77-7952-45DA-B958-1626395A69D2@ryanhiebert.com> <5073FDB0.8040903@canterbury.ac.nz> Message-ID: <20121009163502.GB17286@mcnabbs.org> On Tue, Oct 09, 2012 at 11:34:24PM +1300, Greg Ewing wrote: > I just consulted a thesaurus about synonyms for 'append', > and it came up with 'affix' and 'adjoin'. Yet another possibility is "combine", which unlike "join", gives less of an implicit guarantee that it's a straightforward concatenation. Other synonyms for "combine" include "couple", "fuse", and "hitch". 
-- Andrew McNabb http://www.mcnabbs.org/andrew/ PGP Fingerprint: 8A17 B57C 6879 1863 DE55 8012 AB4D 6098 8826 6868 From ethan at stoneleaf.us Tue Oct 9 18:37:09 2012 From: ethan at stoneleaf.us (Ethan Furman) Date: Tue, 09 Oct 2012 09:37:09 -0700 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: <5073BDFB.6000604@canterbury.ac.nz> References: <20121005202534.5f721292@pitrou.net> <20121005191625.GA23607@iskra.aviel.ru> <20121005195327.GG8974@mcnabbs.org> <506F6201.4000503@pearwood.info> <20121005235457.GA7755@mcnabbs.org> <506F813D.2050305@canterbury.ac.nz> <20121006214540.GB20907@mcnabbs.org> <5072CF3B.2070203@pearwood.info> <20121008160617.GA1527@mcnabbs.org> <20121008204014.10ba901e@pitrou.net> <20121008205634.113419ea@pitrou.net> <5073BDFB.6000604@canterbury.ac.nz> Message-ID: <507452B5.2030000@stoneleaf.us> Greg Ewing wrote: > Nick Coghlan wrote: > >> Moving from "os.path.join(a, b, c, d, e)" (or, the way I often write >> it, "joinpath(a, b, c, d, e)") to "a.joinpath(b, c, d, e)" at least >> isn't going backwards, and is more obvious in isolation than "a / b / >> c / d / e". > > I think we should keep in mind that we're (hopefully) not going > to see things like "a / b / c / d / e" in real-life code. Rather > we're going to see things like > > backupath = destdir / "archive" / filename + ".bak" > > In other words, there should be some clue from the names > that paths are involved, from which it should be fairly > easy to guess what the "/" means. +1 From ryan at ryanhiebert.com Tue Oct 9 18:59:06 2012 From: ryan at ryanhiebert.com (Ryan D Hiebert) Date: Tue, 9 Oct 2012 09:59:06 -0700 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: <5073DDC2.2080602@online.de> References: <20121005202534.5f721292@pitrou.net> <20121005191625.GA23607@iskra.aviel.ru> <20121005195327.GG8974@mcnabbs.org> <506F6201.4000503@pearwood.info> <20121005235457.GA7755@mcnabbs.org> <506F813D.2050305@canterbury.ac.nz> <20121006214540.GB20907@mcnabbs.org> <5072CF3B.2070203@pearwood.info> <20121008160617.GA1527@mcnabbs.org> <5073581B.2030900@canterbury.ac.nz> <5073DDC2.2080602@online.de> Message-ID: <54E19C80-E711-40A2-942B-C4A62254B1CC@ryanhiebert.com> On Oct 9, 2012, at 1:18 AM, Joachim K?nig wrote: > As has already been stated by others, paths are immutable so using them > like lists is leading to confusion (and list's append() only wants one arg, so > extend() might be better in that case). > > But paths could then be interpreted as tuples of "directory entries" instead. > > So adding a path to a path would "join" them: > > pathA + pathB > > and in order to not always need a path object for pathB one could also write > the right argument of __add__ as a tuple of strings: > > pathA + ("somedir", "file.txt") I like it. As you pointed out, my comparison with list is inappropriate because of path's immutability. So .append() and .extend() probably don't make sense. > One could also use "+" for adding to the last segment if it isn't a path object or a tuple: > > pathA + ".tar.gz" This might be a reasonable way to appease both those who are viewing path as a special tuple and those who are viewing it as a special string. It breaks the parallel with tuple a bit, but it's clear that there are important properties of both strings and tuples that would be nice to preserve. 
Ryan From guido at python.org Tue Oct 9 19:05:12 2012 From: guido at python.org (Guido van Rossum) Date: Tue, 9 Oct 2012 10:05:12 -0700 Subject: [Python-ideas] asyncore: included batteries don't fit In-Reply-To: <5073EA4F.8030405@canterbury.ac.nz> References: <5073EA4F.8030405@canterbury.ac.nz> Message-ID: On Tue, Oct 9, 2012 at 2:11 AM, Greg Ewing wrote: > Ben Darnell wrote: > >> StackContext doesn't quite give you better tracebacks, although I >> think it could be adapted to do that. ExceptionStackContext is >> essentially a try/except block that follows you around across >> asynchronous operations - on entry it sets a thread-local state, and >> all the tornado asynchronous functions know to save this state when >> they are passed a callback, and restore it when they execute it. > This is something that generator-based coroutines using > yield-from ought to handle a lot more cleanly. You should > be able to just use an ordinary try-except block in your > generator code and have it do the right thing. Indeed, in NDB this works great. However tracebacks don't work so great: If you don't catch the exception right away, it takes work to make the tracebacks look right when you catch it a few generator calls down on the (conceptual) stack. I fixed this to some extent in NDB, by passing the traceback explicitly along when setting an exception on a Future; before I did this, tracebacks looked awful. But there are still StackContextquite a few situations in NDB where an uncaught exception prints a baffling traceback, showing lots of frames from the event loop and other async machinery but not the user code that was actually waiting for anything. I have to study Tornado's to see if there are ideas there for improving this. > I hope that the new async core will be designed so that > generator-based coroutines can be plugged into it directly > and efficiently, without the need for a lot of decorators, > callbacks, Futures, etc. in between. That has been my hope too. But so far when thinking about this recently I have found the goal elusive -- somehow it seems there *has* to be a distinction between an operation you just *yield* (this would be waiting for a specific low-level I/O operation) and something you use with yield-from, which returns a value through StopIteration. I keep getting a headache when I think about this, so there must be a Monad in there somewhere... :-( Perhaps you can clear things up by showing some detailed (but still simple enough) example code to handle e.g. a simple web client? -- --Guido van Rossum (python.org/~guido) From eric at trueblade.com Tue Oct 9 19:11:48 2012 From: eric at trueblade.com (Eric V. 
Smith) Date: Tue, 09 Oct 2012 13:11:48 -0400 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: <54E19C80-E711-40A2-942B-C4A62254B1CC@ryanhiebert.com> References: <20121005202534.5f721292@pitrou.net> <20121005191625.GA23607@iskra.aviel.ru> <20121005195327.GG8974@mcnabbs.org> <506F6201.4000503@pearwood.info> <20121005235457.GA7755@mcnabbs.org> <506F813D.2050305@canterbury.ac.nz> <20121006214540.GB20907@mcnabbs.org> <5072CF3B.2070203@pearwood.info> <20121008160617.GA1527@mcnabbs.org> <5073581B.2030900@canterbury.ac.nz> <5073DDC2.2080602@online.de> <54E19C80-E711-40A2-942B-C4A62254B1CC@ryanhiebert.com> Message-ID: <50745AD4.8000607@trueblade.com> On 10/09/2012 12:59 PM, Ryan D Hiebert wrote: > On Oct 9, 2012, at 1:18 AM, Joachim K?nig wrote: >> As has already been stated by others, paths are immutable so using them >> like lists is leading to confusion (and list's append() only wants one arg, so >> extend() might be better in that case). >> >> But paths could then be interpreted as tuples of "directory entries" instead. >> >> So adding a path to a path would "join" them: >> >> pathA + pathB >> >> and in order to not always need a path object for pathB one could also write >> the right argument of __add__ as a tuple of strings: >> >> pathA + ("somedir", "file.txt") > > I like it. As you pointed out, my comparison with list is inappropriate because of path's immutability. So .append() and .extend() probably don't make sense. > >> One could also use "+" for adding to the last segment if it isn't a path object or a tuple: >> >> pathA + ".tar.gz" But then you'd have to say: pathA + ("file.txt",) right? That doesn't seem very friendly. Eric. From guido at python.org Tue Oct 9 19:44:27 2012 From: guido at python.org (Guido van Rossum) Date: Tue, 9 Oct 2012 10:44:27 -0700 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: <50745AD4.8000607@trueblade.com> References: <20121005202534.5f721292@pitrou.net> <20121005191625.GA23607@iskra.aviel.ru> <20121005195327.GG8974@mcnabbs.org> <506F6201.4000503@pearwood.info> <20121005235457.GA7755@mcnabbs.org> <506F813D.2050305@canterbury.ac.nz> <20121006214540.GB20907@mcnabbs.org> <5072CF3B.2070203@pearwood.info> <20121008160617.GA1527@mcnabbs.org> <5073581B.2030900@canterbury.ac.nz> <5073DDC2.2080602@online.de> <54E19C80-E711-40A2-942B-C4A62254B1CC@ryanhiebert.com> <50745AD4.8000607@trueblade.com> Message-ID: On Tue, Oct 9, 2012 at 10:11 AM, Eric V. Smith wrote: > On 10/09/2012 12:59 PM, Ryan D Hiebert wrote: >> On Oct 9, 2012, at 1:18 AM, Joachim K?nig wrote: >>> As has already been stated by others, paths are immutable so using them >>> like lists is leading to confusion (and list's append() only wants one arg, so >>> extend() might be better in that case). >>> >>> But paths could then be interpreted as tuples of "directory entries" instead. >>> >>> So adding a path to a path would "join" them: >>> >>> pathA + pathB >>> >>> and in order to not always need a path object for pathB one could also write >>> the right argument of __add__ as a tuple of strings: >>> >>> pathA + ("somedir", "file.txt") >> >> I like it. As you pointed out, my comparison with list is inappropriate because of path's immutability. So .append() and .extend() probably don't make sense. >> >>> One could also use "+" for adding to the last segment if it isn't a path object or a tuple: >>> >>> pathA + ".tar.gz" > > But then you'd have to say: > > pathA + ("file.txt",) > > right? > > That doesn't seem very friendly. 
Yeah, like the problem with % formatting. Another argument for picking a method name. -- --Guido van Rossum (python.org/~guido) From ryan at ryanhiebert.com Tue Oct 9 19:48:09 2012 From: ryan at ryanhiebert.com (Ryan D Hiebert) Date: Tue, 9 Oct 2012 10:48:09 -0700 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: <50745AD4.8000607@trueblade.com> References: <20121005202534.5f721292@pitrou.net> <20121005191625.GA23607@iskra.aviel.ru> <20121005195327.GG8974@mcnabbs.org> <506F6201.4000503@pearwood.info> <20121005235457.GA7755@mcnabbs.org> <506F813D.2050305@canterbury.ac.nz> <20121006214540.GB20907@mcnabbs.org> <5072CF3B.2070203@pearwood.info> <20121008160617.GA1527@mcnabbs.org> <5073581B.2030900@canterbury.ac.nz> <5073DDC2.2080602@online.de> <54E19C80-E711-40A2-942B-C4A62254B1CC@ryanhiebert.com> <50745AD4.8000607@trueblade.com> Message-ID: On Oct 9, 2012, at 10:11 AM, Eric V. Smith wrote: >>> One could also use "+" for adding to the last segment if it isn't a path object or a tuple: >>> >>> pathA + ".tar.gz" > > But then you'd have to say: > > pathA + ("file.txt",) or pathA + Path("file.txt") Just like with any tuple, if you wish to add a new part, it must be a tuple (Path) first. I'm not convinced that adding a string to a path should be allowed, but if not then we should probably throw a TypeError if its not a tuple or Path. That would leave the following method for appending a suffix: path[:-1] + Path(path[-1] + '.tar.gz') That's alot more verbose than the option to "add a string". Ryan From _ at lvh.cc Tue Oct 9 20:00:21 2012 From: _ at lvh.cc (Laurens Van Houtven) Date: Tue, 9 Oct 2012 20:00:21 +0200 Subject: [Python-ideas] asyncore: included batteries don't fit In-Reply-To: <20120922163106.GA18772@hephaistos.amsuess.com> References: <20120922163106.GA18772@hephaistos.amsuess.com> Message-ID: Oh my me. This is a very long thread that I probably should have replied to a long time ago. This thread is intensely long right now, and tonight is the first chance I've had to try and go through it comprehensively. I'll try to reply to individual points made in the thread -- if I missed yours, please don't be offended, I promise it's my fault :) FYI, I'm the sucker who originally got tricked into starting PEP 3153, aka async-pep. First of all, I'm glad to see that there's some more "let's get that pep along" movement. I tabled it because: a) I didn't have enough time to contribute, b) a lot of promised contributions ended up not happening when it came down to it, which was incredibly demotivating. The combination of this thread, plus the fact that I was strong armed at Pycon ZA by a bunch of community members that shall not be named (Alex, Armin, Maciej, Larry ;-)) into exploring this thing again. First of all, I don't feel async-pep is an attempt at twisted light in the stdlib. Other than separation of transport and protocol, there's not really much there that even smells of twisted (especially since right now I'd probably throw consumers/producers out) -- and that separation is simply good practice. Twisted does the same thing, but it didn't invent it. Furthermore, the advantages seem clear: reusability and testability are more than enough for me. If there's one take away idea from async-pep, it's reusable protocols. The PEP should probably be a number of PEPs. At first sight, it seems that this number is at least four: 1. 
Protocol and transport abstractions, making no mention of asynchronous IO (this is what I want 3153 to be, because it's small, manageable, and virtually everyone appears to agree it's a fantastic idea) 2. A base reactor interface 3. A way of structuring callbacks: probably deferreds with a built-in inlineCallbacks for people who want to write synchronous-looking code with explicit yields for asynchronous procedures 4+ adapting the stdlib tools to using these new things Re: forward path for existing asyncore code. I don't remember this being raised as an issue. If anything, it was mentioned in passing, and I think the answer to it was something to the tune of "asyncore's API is broken, fixing it is more important than backwards compat". Essentially I agree with Guido that the important part is an upgrade path to a good third-party library, which is the part about asyncore that REALLY sucks right now. Regardless, an API upgrade is probably a good idea. I'm not sure if it should go in the first PEP: given the separation I've outlined above (which may be too spread out...), there's no obvious place to put it besides it being a new PEP. Re base reactor interface: drawing maximally from the lessons learned in twisted, I think IReactorCore (start, stop, etc), IReactorTime (call later, etc), asynchronous-looking name lookup, fd handling are the important parts. call_every can be implemented in terms of call_later on a separate object, so I think it should be (eg twisted.internet.task.LoopingCall). One thing that is apparently forgotten about is event loop integration. The prime way of having two event loops cooperate is *NOT* "run both in parallel", it's "have one call the other". Even though not all loops support this, I think it's important to get this as part of the interface (raise an exception for all I care if it doesn't work). cheers lvh -------------- next part -------------- An HTML attachment was scrubbed... URL: From g.brandl at gmx.net Tue Oct 9 21:24:08 2012 From: g.brandl at gmx.net (Georg Brandl) Date: Tue, 09 Oct 2012 21:24:08 +0200 Subject: [Python-ideas] PEP 428: poll about the joining syntax In-Reply-To: <20121008204707.48559bf9@pitrou.net> References: <20121008204707.48559bf9@pitrou.net> Message-ID: Am 08.10.2012 20:47, schrieb Antoine Pitrou: > > Hello, > > Since there has been some controversy about the joining syntax used in > PEP 428 (filesystem path objects), I would like to run an informal poll > about it. Please answer with +1/+0/-0/-1 for each proposal: > > - `p[q]` joins path q to path p +0 > - `p + q` joins path q to path p -1 > - `p / q` joins path q to path p +1 > - `p.join(q)` joins path q to path p -0 +0 for .joinpath() as the only way, +1 for .joinpath() as an alternative. 
Georg From him at online.de Tue Oct 9 21:30:15 2012 From: him at online.de (=?ISO-8859-1?Q?Joachim_K=F6nig?=) Date: Tue, 09 Oct 2012 21:30:15 +0200 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: <50745AD4.8000607@trueblade.com> References: <20121005202534.5f721292@pitrou.net> <20121005191625.GA23607@iskra.aviel.ru> <20121005195327.GG8974@mcnabbs.org> <506F6201.4000503@pearwood.info> <20121005235457.GA7755@mcnabbs.org> <506F813D.2050305@canterbury.ac.nz> <20121006214540.GB20907@mcnabbs.org> <5072CF3B.2070203@pearwood.info> <20121008160617.GA1527@mcnabbs.org> <5073581B.2030900@canterbury.ac.nz> <5073DDC2.2080602@online.de> <54E19C80-E711-40A2-942B-C4A62254B1CC@ryanhiebert.com> <50745AD4.8000607@trueblade.com> Message-ID: <50747B47.8040002@online.de> On 09.10.2012 19:11, Eric V. Smith wrote: > But then you'd have to say: > > pathA + ("file.txt",) > > right? > > That doesn't seem very friendly. > You could of course write: pathA + "/file.txt" because with a separator it's still explicit. But this requires clarification because "/file.txt" could be considered an absolut path. But IMO the string additionen should be concatenation. YMMV. Joachim From andre.roberge at gmail.com Tue Oct 9 21:32:36 2012 From: andre.roberge at gmail.com (Andre Roberge) Date: Tue, 9 Oct 2012 16:32:36 -0300 Subject: [Python-ideas] PEP 428: poll about the joining syntax In-Reply-To: <20121008204707.48559bf9@pitrou.net> References: <20121008204707.48559bf9@pitrou.net> Message-ID: On Mon, Oct 8, 2012 at 3:47 PM, Antoine Pitrou wrote: > > Hello, > > Since there has been some controversy about the joining syntax used in > PEP 428 (filesystem path objects), I would like to run an informal poll > about it. Please answer with +1/+0/-0/-1 for each proposal: > > - `p[q]` joins path q to path p > -1 ... semantics too different from usual meaning of [ ] - `p + q` joins path q to path p > -1 ... too problematic with strings... - `p / q` joins path q to path p > +0 ... my brain is hard-wired to see / as division or equivalent (e.g. quotient groups, etc.) - `p.join(q)` joins path q to path p > +1 .... only remaining choice. Besides, I think an explicit method makes more sense. If paths were only directories, I would have really like p.cd(q) [with support for multiple arguments of course] as everyone (I think) would naturally recognize this... However, since we can have file as well, I was trying to think of something to mean change path p so that it now points to the joining of path p and q ... and suggest p.goto(q) ;-) ;-) Andr? > > (you can include a rationale if you want, but don't forget to vote :-)) > > Thank you > > Antoine. > > > -- > Software development and contracting: http://pro.pitrou.net > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -------------- next part -------------- An HTML attachment was scrubbed... URL: From tjreedy at udel.edu Tue Oct 9 21:32:44 2012 From: tjreedy at udel.edu (Terry Reedy) Date: Tue, 09 Oct 2012 15:32:44 -0400 Subject: [Python-ideas] PEP 428: poll about the joining syntax In-Reply-To: References: <20121008204707.48559bf9@pitrou.net> Message-ID: On 10/9/2012 9:30 AM, Eric Snow wrote: > > On Oct 9, 2012 1:12 AM, "Senthil Kumaran" > > wrote: > > > `p.pathjoin(q)` > > > > +1 > > > > It is very explicit and hard to get it wrong. 
or path.concat(otherpath) -- Terry Jan Reedy From g.brandl at gmx.net Tue Oct 9 22:15:42 2012 From: g.brandl at gmx.net (Georg Brandl) Date: Tue, 09 Oct 2012 22:15:42 +0200 Subject: [Python-ideas] Make "is" checks on non-singleton literals errors In-Reply-To: References: <20121008162649.73989cc4@resist.wooz.org> Message-ID: Am 08.10.2012 22:38, schrieb Joshua Landau: > Conversely, I often see this: > > if x == None > > and even > > if x == True > > Okay, so maybe these are less harmful than the original complaint, but still, > yuck! > > > We can't really warn against these. > > >>> class EqualToTrue: > ... def __eq__(self, other): > ... return other is True > ... > >>> EqualToTrue() is True > False > >>> EqualToTrue() == True > True The point is that in 99.9...% of cases, if x == True: is just if x: Georg From solipsis at pitrou.net Tue Oct 9 22:16:00 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 9 Oct 2012 22:16:00 +0200 Subject: [Python-ideas] Make "is" checks on non-singleton literals errors References: <20121008162649.73989cc4@resist.wooz.org> Message-ID: <20121009221600.08f3719a@pitrou.net> On Tue, 09 Oct 2012 22:15:42 +0200 Georg Brandl wrote: > > The point is that in 99.9...% of cases, > > if x == True: > > is just > > if x: But it's not dangerous to write `if x == True`, and so there isn't any point in warning. As Raymond said, this is a job for a style checker. Regards Antoine. -- Software development and contracting: http://pro.pitrou.net From g.brandl at gmx.net Tue Oct 9 22:16:49 2012 From: g.brandl at gmx.net (Georg Brandl) Date: Tue, 09 Oct 2012 22:16:49 +0200 Subject: [Python-ideas] Make "is" checks on non-singleton literals errors In-Reply-To: References: Message-ID: Am 09.10.2012 11:58, schrieb Serhiy Storchaka: > On 09.10.12 02:05, Mike Graham wrote: >> I can't find this in a couple versions of Python I checked. If this >> code is still around, it sounds like it has a bug and should be fixed. > > It's "if node.tagname is 'admonition':" line. It's not part of Python anyway, and should be reported to the docutils maintainers. Georg From storchaka at gmail.com Tue Oct 9 23:10:06 2012 From: storchaka at gmail.com (Serhiy Storchaka) Date: Wed, 10 Oct 2012 00:10:06 +0300 Subject: [Python-ideas] Make "is" checks on non-singleton literals errors In-Reply-To: References: <20121008162649.73989cc4@resist.wooz.org> Message-ID: On 09.10.12 23:15, Georg Brandl wrote: > The point is that in 99.9...% of cases, > > if x == True: > > is just > > if x: Of cause. However in Lib/unittest/main.py I found a lot of "if x != False:" which is not equivalent to just "if x:". It is equivalent to "if x is None or x:" and so I left it as is. From jeanpierreda at gmail.com Tue Oct 9 23:32:33 2012 From: jeanpierreda at gmail.com (Devin Jeanpierre) Date: Tue, 9 Oct 2012 17:32:33 -0400 Subject: [Python-ideas] Make "is" checks on non-singleton literals errors In-Reply-To: References: <20121009020327.GB27445@ando> Message-ID: On Mon, Oct 8, 2012 at 10:14 PM, Guido van Rossum wrote: > Maybe we should do something more drastic and always create a new, > unique constant whenever a literal occurs as an argument of 'is' or > 'is not'? Then such code would never work, leading people to examine > their code more closely. I betcha we have people who could change the > bytecode compiler easily enough to do that. (I'm not seriously > proposing this, except as a threat of what we could do if the > SyntaxWarning is rejected. 
:-) Is this any better than making `x is 0` raise a TypeError with a message about what's wrong (as suggested by Mike Graham)? In both cases, `x is 0` is basically worthless, but at least if it raises an exception people can understand what "went wrong", because of the error message that comes with the exception. -- Devin From tjreedy at udel.edu Tue Oct 9 23:37:46 2012 From: tjreedy at udel.edu (Terry Reedy) Date: Tue, 09 Oct 2012 17:37:46 -0400 Subject: [Python-ideas] Propagating StopIteration value In-Reply-To: References: <5070D658.9020300@pearwood.info> <50721095.1000800@canterbury.ac.nz> <5073D778.6000609@canterbury.ac.nz> Message-ID: On 10/9/2012 11:34 AM, Serhiy Storchaka wrote: > On 09.10.12 10:51, Greg Ewing wrote: >> Where we seem to disagree is on >> whether returning a value with StopIteration is part of the >> iterator protocol or the generator protocol. There is a generator class but no 'generator protocol'. Adding the extra generator methods to another iterator class will not give its instances the suspend/resume behavior of generators. That requires the special bytecodes and flags resulting from the presence of 'yield' in the generator function whose call produces the generator. > Is a generator expression work with the iterator protocol or the > generator protocol? A generator expression produces a generator, which implements the iterator protocol and has the extra generator methods and suspend/resume behavior. Part of the iterator protocol is that .__next__ methods raise StopIteration to signal that no more objects will be yielded. A value can be attached to StopIteration, but it is irrelevant to it use as a 'done' signal. Any iterator .__next__ method. can raise or pass along StopIteration(something). Whether 'something' is even seen or not is a different question. The main users of iterators, for statements, ignore anything extra. > A generator expression eats a value with StopIteration: > > >>> def G(): > ... return 42 > ... yield > ... > >>> next(x for x in G()) > Traceback (most recent call last): > File "", line 1, in > StopIteration > > Is it a bug? Of course not. A generator expression is an abbreviation for a def statement defining a generator function followed by a call to that generator function. (x for x in G()) is roughly equivalent to def __(): for x in G(): yield x # when execution reaches here, None is returned, as usual _ = __() del __ _ # IE, _ is the value of the expression A for loop stops when it catches (and swallows) a StopIteration instance. That instance has served it function as a signal. The for mechanism ignores any attributes thereof. The generator .__next__ method that wraps the generator code object (the compiled body of the generator function) raises StopIteration if the code object ends by returning None. So the StopIteration printed in the traceback above is a different StopIteration instance and come from a different callable than the one from G that stopped the for loop in the generator. There is no sensible way to connect the two. Note that a generator can iterate through multiple iterators, like map and chain do. If the generator stops by raising StopIteration instead of returning None, *that* StopIteration instance is passed along by the .__next__ wrapper. (This may be an implementation detail, but it is currently true.) 
>>> def g2(): SI = StopIteration('g2') print(SI, id(SI)) raise SI yield 1 >>> try: next(g2()) except StopIteration as SI: print(SI, id(SI)) g2 52759816 g2 52759816 If you want any iterator to raise or propagate a value-laden StopIteration, you must do it explicitly or avoid swallowing one. >>> def G(): return 42; yield >>> def g3(): # replacement for your generator expression it = iter(G()) while True: yield next(it) >>> next(g3()) Traceback (most recent call last): File "", line 1, in next(g3()) File "", line 4, in g3 yield next(it) StopIteration: 42 # now you see the value Since filter takes a single iterable, it can be written like g3 and not catch the StopIteration of the corresponding iterator. def filter(pred, iterable): it = iter(iterable) while True: item = next(it) if pred(item): yield item # never reaches here, never returns None Map takes multiple iterables. In 2.x, map extended short iterables with None to match the longest. So it had to swallow StopIteration until it had collected one for each iterator. In 3.x, map stops at the first StopIteration, so it probably could be rewritten to not catch it. Whether it makes sense to do that is another question. -- Terry Jan Reedy From guido at python.org Tue Oct 9 23:38:19 2012 From: guido at python.org (Guido van Rossum) Date: Tue, 9 Oct 2012 14:38:19 -0700 Subject: [Python-ideas] Make "is" checks on non-singleton literals errors In-Reply-To: References: <20121009020327.GB27445@ando> Message-ID: On Tue, Oct 9, 2012 at 2:32 PM, Devin Jeanpierre wrote: > On Mon, Oct 8, 2012 at 10:14 PM, Guido van Rossum wrote: >> Maybe we should do something more drastic and always create a new, >> unique constant whenever a literal occurs as an argument of 'is' or >> 'is not'? Then such code would never work, leading people to examine >> their code more closely. I betcha we have people who could change the >> bytecode compiler easily enough to do that. (I'm not seriously >> proposing this, except as a threat of what we could do if the >> SyntaxWarning is rejected. :-) > > Is this any better than making `x is 0` raise a TypeError with a > message about what's wrong (as suggested by Mike Graham)? > > In both cases, `x is 0` is basically worthless, but at least if it > raises an exception people can understand what "went wrong", because > of the error message that comes with the exception. But it's not a runtime error. It should depend on whether a literal is used in the source code, not whether the argument is an int. (There are tons of situations where it makes sense to dynamically compare two objects that may happen to be ints using 'is' -- just not when it's a literal.) So I claim that it should be a message produced during compilation -- or by a lint-like tool, as others have argued. -- --Guido van Rossum (python.org/~guido) From arnodel at gmail.com Tue Oct 9 23:50:29 2012 From: arnodel at gmail.com (Arnaud Delobelle) Date: Tue, 9 Oct 2012 22:50:29 +0100 Subject: [Python-ideas] Make "is" checks on non-singleton literals errors In-Reply-To: References: <20121008162649.73989cc4@resist.wooz.org> Message-ID: On 9 October 2012 22:10, Serhiy Storchaka wrote: > On 09.10.12 23:15, Georg Brandl wrote: >> >> The point is that in 99.9...% of cases, >> >> if x == True: >> >> is just >> >> if x: > > > Of cause. However in Lib/unittest/main.py I found a lot of "if x != False:" > which is not equivalent to just "if x:". It is equivalent to "if x is None > or x:" and so I left it as is. ??? 
>>> x = '' >>> bool(x != False) True >>> bool(x is None or x) False (same with any empty sequence) -- Arnaud From storchaka at gmail.com Wed Oct 10 00:00:07 2012 From: storchaka at gmail.com (Serhiy Storchaka) Date: Wed, 10 Oct 2012 01:00:07 +0300 Subject: [Python-ideas] Make "is" checks on non-singleton literals errors In-Reply-To: References: <20121008162649.73989cc4@resist.wooz.org> Message-ID: On 10.10.12 00:50, Arnaud Delobelle wrote: >> Of cause. However in Lib/unittest/main.py I found a lot of "if x != False:" >> which is not equivalent to just "if x:". It is equivalent to "if x is None >> or x:" and so I left it as is. > > ??? ...in context of Lib/unittest/main.py. From storchaka at gmail.com Wed Oct 10 00:00:41 2012 From: storchaka at gmail.com (Serhiy Storchaka) Date: Wed, 10 Oct 2012 01:00:41 +0300 Subject: [Python-ideas] Make "is" checks on non-singleton literals errors In-Reply-To: References: Message-ID: On 09.10.12 23:16, Georg Brandl wrote: > It's not part of Python anyway, and should be reported to the docutils > maintainers. Done. From joshua.landau.ws at gmail.com Wed Oct 10 00:13:57 2012 From: joshua.landau.ws at gmail.com (Joshua Landau) Date: Tue, 9 Oct 2012 23:13:57 +0100 Subject: [Python-ideas] checking for identity before comparing built-in objects In-Reply-To: <50744CBE.4010600@pearwood.info> References: <506D94EE.30808@pearwood.info> <50723BE5.3060300@nedbatchelder.com> <50733A18.10400@nedbatchelder.com> <20121009043236.GI27445@ando> <5074489B.6000003@stoneleaf.us> <50744CBE.4010600@pearwood.info> Message-ID: Just a curiosity here (as I can guess of plausible reasons myself, so there probably are some official stances). Is there a reason NaNs are not instances of NaN class? Then x == x would be True (as they want), but [this NaN] == [that NaN] would be False, as expected. I guess that raises the question about why x == x but sqrt(-1) != sqrt(-1), but it seems a lot less of a big deal than all of the exceptions with container equalities. Thanks, Joshua -------------- next part -------------- An HTML attachment was scrubbed... URL: From oscar.j.benjamin at gmail.com Wed Oct 10 00:49:08 2012 From: oscar.j.benjamin at gmail.com (Oscar Benjamin) Date: Tue, 9 Oct 2012 23:49:08 +0100 Subject: [Python-ideas] Propagating StopIteration value In-Reply-To: References: <5070D658.9020300@pearwood.info> <50721095.1000800@canterbury.ac.nz> <5073D778.6000609@canterbury.ac.nz> Message-ID: On 9 October 2012 22:37, Terry Reedy wrote: > On 10/9/2012 11:34 AM, Serhiy Storchaka wrote: >> >> On 09.10.12 10:51, Greg Ewing wrote: >>> >>> Where we seem to disagree is on >>> whether returning a value with StopIteration is part of the >>> iterator protocol or the generator protocol. Correct. > Part of the iterator protocol is that .__next__ methods raise StopIteration > to signal that no more objects will be yielded. A value can be attached to > StopIteration, but it is irrelevant to it use as a 'done' signal. Any > iterator .__next__ method. can raise or pass along > StopIteration(something). Whether 'something' is even seen or not is a > different question. The main users of iterators, for statements, ignore > anything extra. 
I know this isn't going anywhere right now but since it might one day I thought I'd say that I considered how it could be different and the best I came up with was: def f(): return 42 yield for x in f(): pass else return_value: # return_value = 42 if we get here > If the generator stops by raising StopIteration instead of returning None, > *that* StopIteration instance is passed along by the .__next__ wrapper. > (This may be an implementation detail, but it is currently true.) I'm wondering whether propagating or not propagating the StopIteration should be a documented feature of some iterator-based functions or should always be considered an implementation detail (i.e. undefined language behaviour). Currently in Python 3.3 I guess that it is always an implementation detail since the behaviour probably results from an implementation that was written under the assumption that StopIteration instances are interchangeable. > Since filter takes a single iterable, it can be written like g3 and not > catch the StopIteration of the corresponding iterator. > > def filter(pred, iterable): > it = iter(iterable) > while True: > item = next(it) > if pred(item): > yield item > # never reaches here, never returns None > > Map takes multiple iterables. In 2.x, map extended short iterables with None > to match the longest. So it had to swallow StopIteration until it had > collected one for each iterator. In 3.x, map stops at the first > StopIteration, so it probably could be rewritten to not catch it. Whether it > makes sense to do that is another question. Thanks. That makes more sense now as I hadn't considered this behaviour of map before. Oscar From greg.ewing at canterbury.ac.nz Wed Oct 10 01:14:33 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 10 Oct 2012 12:14:33 +1300 Subject: [Python-ideas] Propagating StopIteration value In-Reply-To: References: <5070D658.9020300@pearwood.info> <50721095.1000800@canterbury.ac.nz> <5073D778.6000609@canterbury.ac.nz> Message-ID: <5074AFD9.6000807@canterbury.ac.nz> Serhiy Storchaka wrote: > Is a generator expression work with the iterator protocol or the > generator protocol? Iterator protocol, I think. There is no way to explicitly return a value from a generator expression, and I don't think it should implicitly return one either. Keep in mind that there can be more than one iterable involved in a genexp, so it's not clear what the return value should be in general. -- Greg From greg.ewing at canterbury.ac.nz Wed Oct 10 02:44:23 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 10 Oct 2012 13:44:23 +1300 Subject: [Python-ideas] asyncore: included batteries don't fit In-Reply-To: References: <5073EA4F.8030405@canterbury.ac.nz> Message-ID: <5074C4E7.60708@canterbury.ac.nz> Guido van Rossum wrote: > Indeed, in NDB this works great. However tracebacks don't work so > great: If you don't catch the exception right away, it takes work to > make the tracebacks look right when you catch it a few generator calls > down on the (conceptual) stack. I fixed this to some extent in NDB, by > passing the traceback explicitly along when setting an exception on a > Future; Was this before or after the recent change that was supposed to improve tracebacks from yield-fram chains? If there's still a problem after that, maybe exception handling in yield-from requires some more work. 
> But so far when thinking about this > recently I have found the goal elusive -- > Perhaps you can clear things up by > showing some detailed (but still simple enough) example code to handle > e.g. a simple web client? You might like to take a look at this, where I develop a series of examples culminating in a simple multi-threaded server: http://www.cosc.canterbury.ac.nz/greg.ewing/python/generators/yf_current/Examples/Scheduler/scheduler.txt Code here: http://www.cosc.canterbury.ac.nz/greg.ewing/python/generators/yf_current/Examples/Scheduler/ > somehow it seems there *has* > to be a distinction between an operation you just *yield* (this would > be waiting for a specific low-level I/O operation) and something you > use with yield-from, which returns a value through StopIteration. It may be worth noting that nothing in my server example uses 'yield' to send or receive values -- yield is only used without argument as a suspension point. But the functions containing the yields *are* called with yield-from and may return values via StopIteration. So I think there are (at least) two distinct ways of using generators, but the distinction isn't quite the one you're making. Rather, we have "coroutines" (don't yield values, do return values) and "iterators" (do yield values, don't return values). Moreover, it's *only* the "coroutine" variety that we need to cater for when designing an async event system. Does that help to alleviate any of your monad-induced headaches? -- Greg From steve at pearwood.info Wed Oct 10 03:14:26 2012 From: steve at pearwood.info (Steven D'Aprano) Date: Wed, 10 Oct 2012 12:14:26 +1100 Subject: [Python-ideas] checking for identity before comparing built-in objects In-Reply-To: References: <506D94EE.30808@pearwood.info> <50723BE5.3060300@nedbatchelder.com> <50733A18.10400@nedbatchelder.com> <20121009043236.GI27445@ando> <5074489B.6000003@stoneleaf.us> <50744CBE.4010600@pearwood.info> Message-ID: <5074CBF2.8070507@pearwood.info> On 10/10/12 09:13, Joshua Landau wrote: > Just a curiosity here (as I can guess of plausible reasons myself, so there > probably are some official stances). > > Is there a reason NaNs are not instances of NaN class? Because that would complicate Python's using floats for absolutely no benefit. Instead of float operations always returning a float, they would have to return a float or a NAN. To check for a valid floating point instance, instead of saying: isinstance(x, float) you would have to say: isinstance(x, (float, NAN)) And what about infinities, denorm numbers, and negative zero? Do they get dedicated classes too? And what is the point of this added complexity? Nothing. You *still* have the rule that "x == x for all x, except for NANs". The only difference is that "NANs" now means "instances of NAN class" rather than "NAN floats" (and Decimals). Working with IEEE 754 floats is now far more of a nuisance because some valid floating point values aren't floats but have a different class, but nothing meaningful is different. > Then x == x would be True (as they want), but [this NaN] == [that NaN] > would be False, as expected. Making NANs their own class wouldn't give you that. If we wanted that behaviour, we could have it without introducing a NAN class: just change the list __eq__ method to scan the list for a NAN using math.isnan before checking whether the lists were identical. But that would defeat the purpose of the identity check (an optimization to avoid scanning the list)! Replacing math.isnan with isinstance doesn't change that. 
> I guess that raises the question about why x == x but sqrt(-1) != sqrt(-1), That question has already been raised, and answered, repeatedly in this thread. > but it seems a lot less of a big deal than all of the exceptions with > container equalities. Container equalities are not a big deal. I'm not sure what problem you think you are solving. -- Steven From mikegraham at gmail.com Wed Oct 10 03:25:55 2012 From: mikegraham at gmail.com (Mike Graham) Date: Tue, 9 Oct 2012 21:25:55 -0400 Subject: [Python-ideas] checking for identity before comparing built-in objects In-Reply-To: <5074CBF2.8070507@pearwood.info> References: <506D94EE.30808@pearwood.info> <50723BE5.3060300@nedbatchelder.com> <50733A18.10400@nedbatchelder.com> <20121009043236.GI27445@ando> <5074489B.6000003@stoneleaf.us> <50744CBE.4010600@pearwood.info> <5074CBF2.8070507@pearwood.info> Message-ID: On Tue, Oct 9, 2012 at 9:14 PM, Steven D'Aprano wrote: > On 10/10/12 09:13, Joshua Landau wrote: >> >> Just a curiosity here (as I can guess of plausible reasons myself, so >> there >> probably are some official stances). >> >> Is there a reason NaNs are not instances of NaN class? > > > Because that would complicate Python's using floats for absolutely no > benefit. > Instead of float operations always returning a float, they would have to > return > a float or a NAN. To check for a valid floating point instance, instead of > saying: > > isinstance(x, float) > > you would have to say: > > isinstance(x, (float, NAN)) > > And what about infinities, denorm numbers, and negative zero? Do they get > dedicated classes too? > > And what is the point of this added complexity? Nothing. > > You *still* have the rule that "x == x for all x, except for NANs". The > only difference is that "NANs" now means "instances of NAN class" rather > than > "NAN floats" (and Decimals). Working with IEEE 754 floats is now far more of > a nuisance because some valid floating point values aren't floats but have a > different class, but nothing meaningful is different. > > > >> Then x == x would be True (as they want), but [this NaN] == [that NaN] >> would be False, as expected. > > > Making NANs their own class wouldn't give you that. If we wanted that > behaviour, we could have it without introducing a NAN class: just change the > list __eq__ method to scan the list for a NAN using math.isnan before > checking > whether the lists were identical. > > But that would defeat the purpose of the identity check (an optimization to > avoid scanning the list)! Replacing math.isnan with isinstance doesn't > change > that. > > > >> I guess that raises the question about why x == x but sqrt(-1) != >> sqrt(-1), > > > That question has already been raised, and answered, repeatedly in this > thread. > > > >> but it seems a lot less of a big deal than all of the exceptions with >> container equalities. > > > Container equalities are not a big deal. I'm not sure what problem you think > you are solving. > > -- > Steven I'm sometimes surprised at the creativity and passion behind solutions to this issue. I've been a Python user for some years now, including time dealing with stuff like numpy where you're fairly likely to run into NaNs. I've been an active member of several support communities where I can confidently say I have encountered tens of thousands of Python questions. Not once can I recall ever having or seeing anyone have an actual problem that I had or someone else had due to the way Python handles NaN. As far as I can tell, it works _perfectly_. 
I appreciate the aesthetic concerns, but I really wish someone would explain to me what's actually broken and in need of fixing. Mike From wuwei23 at gmail.com Wed Oct 10 04:23:23 2012 From: wuwei23 at gmail.com (alex23) Date: Tue, 9 Oct 2012 19:23:23 -0700 (PDT) Subject: [Python-ideas] checking for identity before comparing built-in objects In-Reply-To: References: <50723BE5.3060300@nedbatchelder.com> <507243D2.8000505@btinternet.com> Message-ID: <3f7830f0-b650-4c1b-a9c4-75ff3a6aeabe@q7g2000pbj.googlegroups.com> On Oct 9, 5:14?pm, Guido van Rossum wrote: > I spent a week with Bertrand recently. Any chance you might blog about this? :) From dholth at gmail.com Wed Oct 10 04:34:59 2012 From: dholth at gmail.com (Daniel Holth) Date: Tue, 9 Oct 2012 22:34:59 -0400 Subject: [Python-ideas] PEP 428: poll about the joining syntax In-Reply-To: References: <20121008204707.48559bf9@pitrou.net> Message-ID: On Tue, Oct 9, 2012 at 3:32 PM, Terry Reedy wrote: > On 10/9/2012 9:30 AM, Eric Snow wrote: >> >> >> On Oct 9, 2012 1:12 AM, "Senthil Kumaran" >> > > wrote: >> > > `p.pathjoin(q)` >> > >> > +1 >> > >> > It is very explicit and hard to get it wrong. > > > or path.concat(otherpath) > > -- > Terry Jan Reedy I like the [] syntax. ZODB works this way when the subpath name is not a valid Python identifier. a.b['c-d'] would be like a/b/c-d if ZODB was a filesystem. I like the + syntax. No one has suggested overloading the > operator? p1 > p2 > p3 The < operator would keep its normal use for sorting. ;-) From ncoghlan at gmail.com Wed Oct 10 06:02:57 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 10 Oct 2012 09:32:57 +0530 Subject: [Python-ideas] PEP 428: poll about the joining syntax In-Reply-To: <6a1b01106028597886936265071e2fce.squirrel@webmail.nerim.net> References: <20121008204707.48559bf9@pitrou.net> <0D51EC77-7952-45DA-B958-1626395A69D2@ryanhiebert.com> <6a1b01106028597886936265071e2fce.squirrel@webmail.nerim.net> Message-ID: On Tue, Oct 9, 2012 at 6:04 PM, Antoine Pitrou wrote: > That's a very good idea! Even better if there's a way to make it work as > expected with openat support (for example by allowing __fspath__ to return > a (dir_fd, filename) tuple). The other thing I thought might be useful is to try to tie it into the new "opener" parameter for open somehow, On the other hand, that's getting further into full-blown filesystem emulation territory (http://packages.python.org/fs/). That may not be a bad thing, though - a proper filesystem abstraction might finally let us deal with encoding and case-sensitivity issues in a sane way, since they're filesystem dependent rather than platform dependent (e.g. opening FAT/FAT32 devices on *nix systems). Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From stephen at xemacs.org Wed Oct 10 08:06:10 2012 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Wed, 10 Oct 2012 15:06:10 +0900 Subject: [Python-ideas] checking for identity before comparing built-in objects In-Reply-To: <5074489B.6000003@stoneleaf.us> References: <506D94EE.30808@pearwood.info> <50723BE5.3060300@nedbatchelder.com> <50733A18.10400@nedbatchelder.com> <20121009043236.GI27445@ando> <5074489B.6000003@stoneleaf.us> Message-ID: <874nm297zh.fsf@uwakimon.sk.tsukuba.ac.jp> Ethan Furman writes: > Or, to look at it another way, surely somewhere out in the Real World > (tm) it is the case that two NaNs are indeed equal. Sure, but according to Kahan's Uncertainty principle, you'll never be able to detect it. 
Really-there's no-alternative-to-backward-compatibility-or-IEEE754-ly y'rs From greg.ewing at canterbury.ac.nz Wed Oct 10 09:53:04 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 10 Oct 2012 20:53:04 +1300 Subject: [Python-ideas] asyncore: included batteries don't fit In-Reply-To: References: <5073EA4F.8030405@canterbury.ac.nz> Message-ID: <50752960.208@canterbury.ac.nz> Guido van Rossum wrote: > But there are > still quite a few situations in NDB where an uncaught > exception prints a baffling traceback, showing lots of frames from the > event loop and other async machinery but not the user code that was > actually waiting for anything. I just tried an experiment using Python 3.3. I modified the parse_request() function of my spamserver example to raise an exception that isn't caught anywhere: def parse_request(line): tokens = line.split() print(tokens) if tokens and tokens[0] == b"EGGS": raise ValueError("Server is allergic to eggs") ... The resulting traceback looks like this. The last two lines show very clearly where abouts the exception occurred in user code. So it all seems to work quite happily. Traceback (most recent call last): File "spamserver.py", line 73, in run2() File "/Local/Projects/D/Python/YieldFrom/3.3/Examples/Scheduler/scheduler.py", line 109, in run2 run() File "/Local/Projects/D/Python/YieldFrom/3.3/Examples/Scheduler/scheduler.py", line 53, in run next(g) File "spamserver.py", line 50, in handler n = parse_request(line) File "spamserver.py", line 61, in parse_request raise ValueError("Server is allergic to eggs") ValueError: Server is allergic to eggs -- Greg From ronaldoussoren at mac.com Wed Oct 10 10:09:33 2012 From: ronaldoussoren at mac.com (Ronald Oussoren) Date: Wed, 10 Oct 2012 10:09:33 +0200 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: <20121008142812.GA22502@iskra.aviel.ru> References: <20121005202534.5f721292@pitrou.net> <20121007193735.7bb924ac@pitrou.net> <7E8AC881-ADB6-4026-B024-07DE197F8530@mac.com> <20121008110748.GA17653@iskra.aviel.ru> <9D6F4C1B-9145-4775-8657-F99612791067@mac.com> <20121008142812.GA22502@iskra.aviel.ru> Message-ID: On 8 Oct, 2012, at 16:28, Oleg Broytman wrote: > On Mon, Oct 08, 2012 at 03:59:18PM +0200, Ronald Oussoren wrote: >> On 8 Oct, 2012, at 13:07, Oleg Broytman wrote: >> >>> On Mon, Oct 08, 2012 at 12:00:22PM +0200, Ronald Oussoren wrote: >>>> Or CIFS filesystems mounted on a Linux? Case-sensitivity is a file-system property, not a operating system one. >>> >>> But there is no API to ask what type of filesystem a path belongs to. >>> So guessing by OS name is the only heuristic we can do. >> >> I guess so, as neither statvs, statvfs, nor pathconf seem to be able to tell if a filesystem is case insensitive. >> >> The alternative would be to have a list of case insentive filesystems and use that that when comparing impure path objects. That would be fairly expensive though, as you'd have to check for every element of the path if that element is on a case insensitive filesystem. > > If a filesystem mounted to w32 is exported from a server by CIFS/SMB > protocol -- is it case sensitive? What if said server is Linux? What if > said filesystem was actually imported to Linux from a Novel server by > NetWare Core Protocol. It's not a fictional situation -- I do it at > oper.med.ru; the server is Linux that mounts two CIFS and NCP filesystem > and reexport them via Samba. Even more fun :-). 
CIFS/SMB from Windows to Linux or OSX behaves like a case-preserving filesystem on the systems I tested. Likewise a NFS filesystem exported from Linux to OSX behaves like a case sensitive filesystem if the Linux filesystem is case sensitive. All in all the best we seem to be able to do is use the OS as a heuristic, most Unix filesystems are case sensitive while Windows and OSX filesystems are case preserving. Ronald > > Oleg. > -- > Oleg Broytman http://phdru.name/ phd at phdru.name > Programmers don't die, they just GOSUB without RETURN. > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas From ronaldoussoren at mac.com Wed Oct 10 10:16:27 2012 From: ronaldoussoren at mac.com (Ronald Oussoren) Date: Wed, 10 Oct 2012 10:16:27 +0200 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: <50735279.8080506@canterbury.ac.nz> References: <20121005202534.5f721292@pitrou.net> <20121007193735.7bb924ac@pitrou.net> <7E8AC881-ADB6-4026-B024-07DE197F8530@mac.com> <20121008110748.GA17653@iskra.aviel.ru> <9D6F4C1B-9145-4775-8657-F99612791067@mac.com> <50735279.8080506@canterbury.ac.nz> Message-ID: <2EAAD88B-CDEC-48C5-9D50-A27FBA8FF044@mac.com> On 9 Oct, 2012, at 0:23, Greg Ewing wrote: > Ronald Oussoren wrote: >> neither statvs, statvfs, nor pathconf seem to be able to tell if a filesystem is case insensitive. > > Even if they could, you wouldn't be entirely out of the woods, > because different parts of the same path can be on different > file systems... > > But how important is all this anyway? I'm trying to think of > occasions when I've wanted to compare two entire paths for > equality, and I can't think of *any*. AFAIK the only place I care about case sensitivity in my code is when I'm basicly using glob or fnmatch. Ronald > > -- > Greg > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas From p.f.moore at gmail.com Wed Oct 10 11:54:58 2012 From: p.f.moore at gmail.com (Paul Moore) Date: Wed, 10 Oct 2012 10:54:58 +0100 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: <2EAAD88B-CDEC-48C5-9D50-A27FBA8FF044@mac.com> References: <20121005202534.5f721292@pitrou.net> <20121007193735.7bb924ac@pitrou.net> <7E8AC881-ADB6-4026-B024-07DE197F8530@mac.com> <20121008110748.GA17653@iskra.aviel.ru> <9D6F4C1B-9145-4775-8657-F99612791067@mac.com> <50735279.8080506@canterbury.ac.nz> <2EAAD88B-CDEC-48C5-9D50-A27FBA8FF044@mac.com> Message-ID: On 10 October 2012 09:16, Ronald Oussoren wrote: >> But how important is all this anyway? I'm trying to think of >> occasions when I've wanted to compare two entire paths for >> equality, and I can't think of *any*. > > AFAIK the only place I care about case sensitivity in my code is when I'm basicly using glob or fnmatch. Mercurial had to consider this issue when dealing with repositories built on Unix and being used on Windows. Specifically, it needed to know, if the repository contains files README and ReadMe, could it safely write both of these files without one overwriting the other. Actually, something as simple as an unzip utility could hit the same issue (it's just that it's not as critical to be careful with unzip as with a DVCS system... 
:-)) I don't know how Mercurial fixed the problem in the end - I believe the in-repo format encodes filenames to preserve case even on case insensitive systems, and I *think* it detects case insensitive filesystems for writing by writing a test file and reading it back in a different case. But that may have changed. Paul From robert.kern at gmail.com Wed Oct 10 15:23:38 2012 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 10 Oct 2012 14:23:38 +0100 Subject: [Python-ideas] checking for identity before comparing built-in objects In-Reply-To: References: <506D94EE.30808@pearwood.info> <50723BE5.3060300@nedbatchelder.com> <50733A18.10400@nedbatchelder.com> <20121009043236.GI27445@ando> <5074489B.6000003@stoneleaf.us> <50744CBE.4010600@pearwood.info> <5074CBF2.8070507@pearwood.info> Message-ID: On 10/10/12 2:25 AM, Mike Graham wrote: > I'm sometimes surprised at the creativity and passion behind solutions > to this issue. > > I've been a Python user for some years now, including time dealing > with stuff like numpy where you're fairly likely to run into NaNs. > I've been an active member of several support communities where I can > confidently say I have encountered tens of thousands of Python > questions. Not once can I recall ever having or seeing anyone have an > actual problem that I had or someone else had due to the way Python > handles NaN. As far as I can tell, it works _perfectly_. > > I appreciate the aesthetic concerns, but I really wish someone would > explain to me what's actually broken and in need of fixing. While I also don't think that anything needs to be fixed, I must say that in my years of monitoring tens of thousands of Python questions, there have been a few legitimate problems with the NaN behavior. It does come up from time to time. The most frequent problem is checking if a list contains a NaN. The obvious thing to do for many users: nan in list_of_floats This is a reasonable prediction based on what one normally does for most objects in Python, but this is quite wrong. But because list.__contains__() checks for identity first, it can look like it works when people test it out: >>> nan = float('nan') >>> nan in [1.0, 2.0, nan] True Then they write their code doing the wrong thing thinking that they tested their approach. I classify this as a wart: it breaks reasonable predictions from users, requires more exceptions-based knowledge about NaNs to use correctly, and can trap users who do try to experiment to determine the behavior. But I think that the cost of acquiring and retaining such knowledge is not so onerous as to justify the cost of any of the attempts to fix the wart. The other NaN wart (unrelated to this thread) is that sorting a list of floats containing a NaN will usually leave the list unsorted because "inequality comparisons with a NaN always return False" breaks the assumptions of timsort and other sorting algorithms. You should remember this, as you once demonstrated the problem: http://mail.python.org/pipermail/python-ideas/2011-April/010063.html This is a real problem, so much so that numpy works around it by enforcing our sorts to always sort NaN at the end of the array. Unfortunately, lists do not have the luxury of cheaply knowing the type of all of the objects in the list, so this is not an option for them. Real problems, but nothing that motivates a change, in my opinion. 
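A short sketch makes the containment wart concrete: the identity shortcut in list.__contains__ only finds a NaN that is literally the same object, and math.isnan() is the dependable test.

>>> import math
>>> nan = float('nan')
>>> nan in [1.0, 2.0, nan]             # found only because it is the identical object
True
>>> float('nan') in [1.0, 2.0, nan]    # a distinct NaN object: every == comparison is False
False
>>> any(math.isnan(x) for x in [1.0, 2.0, nan])    # the reliable spelling
True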
-- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From g.brandl at gmx.net Wed Oct 10 16:21:52 2012 From: g.brandl at gmx.net (Georg Brandl) Date: Wed, 10 Oct 2012 16:21:52 +0200 Subject: [Python-ideas] Make "is" checks on non-singleton literals errors In-Reply-To: References: <20121008162649.73989cc4@resist.wooz.org> Message-ID: Am 09.10.2012 23:10, schrieb Serhiy Storchaka: > On 09.10.12 23:15, Georg Brandl wrote: >> The point is that in 99.9...% of cases, >> >> if x == True: >> >> is just >> >> if x: > > Of cause. However in Lib/unittest/main.py I found a lot of "if x != > False:" which is not equivalent to just "if x:". It is equivalent to "if > x is None or x:" and so I left it as is. Arguably, that should be "if x is not False", but it probably doesn't matter too much. Georg From ben at bendarnell.com Wed Oct 10 18:41:33 2012 From: ben at bendarnell.com (Ben Darnell) Date: Wed, 10 Oct 2012 09:41:33 -0700 Subject: [Python-ideas] asyncore: included batteries don't fit In-Reply-To: <5074C4E7.60708@canterbury.ac.nz> References: <5073EA4F.8030405@canterbury.ac.nz> <5074C4E7.60708@canterbury.ac.nz> Message-ID: On Tue, Oct 9, 2012 at 5:44 PM, Greg Ewing wrote: > You might like to take a look at this, where I develop a series of > examples culminating in a simple multi-threaded server: > > http://www.cosc.canterbury.ac.nz/greg.ewing/python/generators/yf_current/Examples/Scheduler/scheduler.txt Thanks for this link, it was very helpful to see it all come together from scratch. And I think the most compelling thing about it is something that I hadn't picked up on when I looked at "yield from" before, that it naturally preserves the call stack for exception handling. That's a big deal, and may be worth the requirement of 3.3+ since the tricks we've used to get better exception handling in earlier pythons have been pretty ugly. On the other hand, it does mean starting from scratch with a new asynchronous world that's not directly compatible with the existing Twisted or Tornado ecosystems. -Ben > > Code here: > > http://www.cosc.canterbury.ac.nz/greg.ewing/python/generators/yf_current/Examples/Scheduler/ > > >> somehow it seems there *has* >> to be a distinction between an operation you just *yield* (this would >> be waiting for a specific low-level I/O operation) and something you >> use with yield-from, which returns a value through StopIteration. > > It may be worth noting that nothing in my server example uses 'yield' > to send or receive values -- yield is only used without argument as > a suspension point. But the functions containing the yields *are* > called with yield-from and may return values via StopIteration. > > So I think there are (at least) two distinct ways of using generators, > but the distinction isn't quite the one you're making. Rather, we > have "coroutines" (don't yield values, do return values) and > "iterators" (do yield values, don't return values). > > Moreover, it's *only* the "coroutine" variety that we need to cater > for when designing an async event system. Does that help to > alleviate any of your monad-induced headaches? 
> > -- > Greg > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas From dreamingforward at gmail.com Wed Oct 10 18:56:17 2012 From: dreamingforward at gmail.com (Mark Adam) Date: Wed, 10 Oct 2012 11:56:17 -0500 Subject: [Python-ideas] asyncore: included batteries don't fit In-Reply-To: References: <5073BC7F.5040203@canterbury.ac.nz> Message-ID: On Tue, Oct 9, 2012 at 1:53 AM, Ben Darnell wrote: > On Mon, Oct 8, 2012 at 10:56 PM, Greg Ewing wrote: >> Mark Adam wrote: >>> >>> 1) event handlers for the machine-program interface (ex. network I/O) >>> 2) event handlers for the program-user interface (ex. mouse I/O) >>> >>> While similar, my gut tell me they have to be handled in completely >>> different way in order to preserve order (i.e. sanity). >> >> They can't be *completely* different, because deep down there >> has to be a single event loop that can handle all kinds of >> asynchronous events. > > There doesn't *have* to be - you could run a network event loop in one > thread and a GUI event loop in another and pass control back and forth > via methods like IOLoop.add_callback or Reactor.callFromThread. No, this won't work. The key FAIL in that sentence is "...and pass control", because the O.S. has to be in charge of things that happen in user space. And everything in Python happens in user space. (hence my suggestion of creating a Python O.S.). MarkJ From storchaka at gmail.com Wed Oct 10 19:13:22 2012 From: storchaka at gmail.com (Serhiy Storchaka) Date: Wed, 10 Oct 2012 20:13:22 +0300 Subject: [Python-ideas] Make "is" checks on non-singleton literals errors In-Reply-To: References: <20121008162649.73989cc4@resist.wooz.org> Message-ID: On 10.10.12 17:21, Georg Brandl wrote: > Arguably, that should be "if x is not False", but it probably doesn't > matter too much. Some old code can use 1/0 instead True/False. This change will break such code. From ben at bendarnell.com Wed Oct 10 19:29:35 2012 From: ben at bendarnell.com (Ben Darnell) Date: Wed, 10 Oct 2012 10:29:35 -0700 Subject: [Python-ideas] asyncore: included batteries don't fit In-Reply-To: References: <5073BC7F.5040203@canterbury.ac.nz> Message-ID: On Wed, Oct 10, 2012 at 9:56 AM, Mark Adam wrote: > On Tue, Oct 9, 2012 at 1:53 AM, Ben Darnell wrote: >> On Mon, Oct 8, 2012 at 10:56 PM, Greg Ewing wrote: >>> Mark Adam wrote: >>>> >>>> 1) event handlers for the machine-program interface (ex. network I/O) >>>> 2) event handlers for the program-user interface (ex. mouse I/O) >>>> >>>> While similar, my gut tell me they have to be handled in completely >>>> different way in order to preserve order (i.e. sanity). >>> >>> They can't be *completely* different, because deep down there >>> has to be a single event loop that can handle all kinds of >>> asynchronous events. >> >> There doesn't *have* to be - you could run a network event loop in one >> thread and a GUI event loop in another and pass control back and forth >> via methods like IOLoop.add_callback or Reactor.callFromThread. > > No, this won't work. The key FAIL in that sentence is "...and pass > control", because the O.S. has to be in charge of things that happen > in user space. And everything in Python happens in user space. > (hence my suggestion of creating a Python O.S.). Letting the OS/GUI library have control of the UI thread is exactly the point I was making. 
Perhaps "pass control" was a little vague, but what I meant is that you'd have two threads, one for UI and one for networking. When you need to start a network operation from the UI thread you'd use IOLoop.add_callback() to pass a function to the network thread, and then when the network operation completes you'd use the analogous function from the UI library to send the response back and update the interface from the UI thread. -Ben From solipsis at pitrou.net Wed Oct 10 21:07:03 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 10 Oct 2012 21:07:03 +0200 Subject: [Python-ideas] PEP 428: poll about the joining syntax References: <20121008204707.48559bf9@pitrou.net> Message-ID: <20121010210703.4dafd553@pitrou.net> On Tue, 9 Oct 2012 00:45:41 +0530 Nick Coghlan wrote: > On Tue, Oct 9, 2012 at 12:24 AM, Guido van Rossum wrote: > > I don't like any of those; I'd vote for another regular method, maybe > > p.pathjoin(q). > [...] > > I don't *love* joinpath as a name, I just don't actively dislike it > the way I do the four presented options (and it has the virtue of the > path.py precedent). How about one_path.to(other_path) ? Regards Antoine. -- Software development and contracting: http://pro.pitrou.net From michelelacchia at gmail.com Wed Oct 10 21:19:40 2012 From: michelelacchia at gmail.com (Michele Lacchia) Date: Wed, 10 Oct 2012 12:19:40 -0700 (PDT) Subject: [Python-ideas] PEP 428: poll about the joining syntax In-Reply-To: <20121010210703.4dafd553@pitrou.net> References: <20121008204707.48559bf9@pitrou.net> <20121010210703.4dafd553@pitrou.net> Message-ID: <71475825-6f68-402a-88be-af131d69cd96@googlegroups.com> > > > > I don't *love* joinpath as a name, I just don't actively dislike it > > the way I do the four presented options (and it has the virtue of the > > path.py precedent). > > How about one_path.to(other_path) ? > to() is just awesome. Short, rather easy to guess what it does, and easy to remember once you start using it. So now +1 on to() and &. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ethan at stoneleaf.us Wed Oct 10 21:13:19 2012 From: ethan at stoneleaf.us (Ethan Furman) Date: Wed, 10 Oct 2012 12:13:19 -0700 Subject: [Python-ideas] PEP 428: poll about the joining syntax In-Reply-To: <20121010210703.4dafd553@pitrou.net> References: <20121008204707.48559bf9@pitrou.net> <20121010210703.4dafd553@pitrou.net> Message-ID: <5075C8CF.1020606@stoneleaf.us> Antoine Pitrou wrote: > On Tue, 9 Oct 2012 00:45:41 +0530 > Nick Coghlan wrote: >> On Tue, Oct 9, 2012 at 12:24 AM, Guido van Rossum wrote: >>> I don't like any of those; I'd vote for another regular method, maybe >>> p.pathjoin(q). > [...] >> I don't *love* joinpath as a name, I just don't actively dislike it >> the way I do the four presented options (and it has the virtue of the >> path.py precedent). > > How about one_path.to(other_path) ? .to -> +0 .add -> +1 From mikegraham at gmail.com Wed Oct 10 21:36:08 2012 From: mikegraham at gmail.com (Mike Graham) Date: Wed, 10 Oct 2012 15:36:08 -0400 Subject: [Python-ideas] Make undefined escape sequences have SyntaxWarnings Message-ID: The literal"\c" should be an error but in practice means "\\c". It's probably too late to make this invalid syntax as it out to be, but I wonder if a warning isn't in order, especially with the theoretical potential of adding new string escapes in the future. 
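For concreteness, this is the current behaviour the proposal targets: an unrecognised escape is silently kept as a backslash plus the following character, so it is indistinguishable from the doubled form, while a recognised escape collapses to a single character.

>>> "\c" == "\\c"      # unknown escape: kept verbatim
True
>>> len("\c")
2
>>> "\n" == "\\n"      # known escape: one is a newline, the other is two characters
False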
From solipsis at pitrou.net Wed Oct 10 21:46:07 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 10 Oct 2012 21:46:07 +0200 Subject: [Python-ideas] Make undefined escape sequences have SyntaxWarnings References: Message-ID: <20121010214607.354902d6@pitrou.net> On Wed, 10 Oct 2012 15:36:08 -0400 Mike Graham wrote: > The literal"\c" should be an error but in practice means "\\c". It's > probably too late to make this invalid syntax as it out to be, but I > wonder if a warning isn't in order, especially with the theoretical > potential of adding new string escapes in the future. -1. This will make life more difficult with regular expressions (and produce lots of spurious warnings in existing code). Regards Antoine. -- Software development and contracting: http://pro.pitrou.net From storchaka at gmail.com Wed Oct 10 22:04:25 2012 From: storchaka at gmail.com (Serhiy Storchaka) Date: Wed, 10 Oct 2012 23:04:25 +0300 Subject: [Python-ideas] Make undefined escape sequences have SyntaxWarnings In-Reply-To: <20121010214607.354902d6@pitrou.net> References: <20121010214607.354902d6@pitrou.net> Message-ID: On 10.10.12 22:46, Antoine Pitrou wrote: > -1. This will make life more difficult with regular expressions (and > produce lots of spurious warnings in existing code). Strings for regular expressions always should be raw. Now regular expressions supports \u and \U escapes and no reason to use non-raw strings. From mikegraham at gmail.com Wed Oct 10 22:08:22 2012 From: mikegraham at gmail.com (Mike Graham) Date: Wed, 10 Oct 2012 16:08:22 -0400 Subject: [Python-ideas] Make undefined escape sequences have SyntaxWarnings In-Reply-To: <20121010214607.354902d6@pitrou.net> References: <20121010214607.354902d6@pitrou.net> Message-ID: On Wed, Oct 10, 2012 at 3:46 PM, Antoine Pitrou wrote: > On Wed, 10 Oct 2012 15:36:08 -0400 > Mike Graham wrote: >> The literal"\c" should be an error but in practice means "\\c". It's >> probably too late to make this invalid syntax as it out to be, but I >> wonder if a warning isn't in order, especially with the theoretical >> potential of adding new string escapes in the future. > > -1. This will make life more difficult with regular expressions (and > produce lots of spurious warnings in existing code). > > Regards > > Antoine. Regular expressions are difficult if you're remembering which escape sequences exist and are easy if you're using raw string literals. Mike From python at mrabarnett.plus.com Wed Oct 10 22:11:49 2012 From: python at mrabarnett.plus.com (MRAB) Date: Wed, 10 Oct 2012 21:11:49 +0100 Subject: [Python-ideas] Make undefined escape sequences have SyntaxWarnings In-Reply-To: <20121010214607.354902d6@pitrou.net> References: <20121010214607.354902d6@pitrou.net> Message-ID: <5075D685.5070204@mrabarnett.plus.com> On 2012-10-10 20:46, Antoine Pitrou wrote: > On Wed, 10 Oct 2012 15:36:08 -0400 > Mike Graham wrote: >> The literal"\c" should be an error but in practice means "\\c". It's >> probably too late to make this invalid syntax as it out to be, but I >> wonder if a warning isn't in order, especially with the theoretical >> potential of adding new string escapes in the future. > > -1. This will make life more difficult with regular expressions (and > produce lots of spurious warnings in existing code). > How would it make life more difficult with regular expressions? I would've preferred: 1. Unknown escapes in string literals give a compile-time error 2. Raw string literals treat backslashes as pure literals 3. 
Unknown escapes in regex patterns give a run-time error Unfortunately, changing them would break existing code. (I retain the behaviour of re in the regex module for this reason, not that I like it. :-() It would've been nice if the 'fix' had been made in Python 3... From solipsis at pitrou.net Wed Oct 10 22:16:03 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 10 Oct 2012 22:16:03 +0200 Subject: [Python-ideas] Make undefined escape sequences have SyntaxWarnings References: <20121010214607.354902d6@pitrou.net> Message-ID: <20121010221603.23f740c9@pitrou.net> On Wed, 10 Oct 2012 16:08:22 -0400 Mike Graham wrote: > On Wed, Oct 10, 2012 at 3:46 PM, Antoine Pitrou wrote: > > On Wed, 10 Oct 2012 15:36:08 -0400 > > Mike Graham wrote: > >> The literal"\c" should be an error but in practice means "\\c". It's > >> probably too late to make this invalid syntax as it out to be, but I > >> wonder if a warning isn't in order, especially with the theoretical > >> potential of adding new string escapes in the future. > > > > -1. This will make life more difficult with regular expressions (and > > produce lots of spurious warnings in existing code). > > > > Regards > > > > Antoine. > > Regular expressions are difficult if you're remembering which escape > sequences exist and are easy if you're using raw string literals. That's a misconception, since as the re docs mention: ?Most of the standard escapes supported by Python string literals are also accepted by the regular expression parser: [snip]? http://docs.python.org/dev/library/re.html In other words, whether you put "\t" or "\\t" in a regexp doesn't matter: it means the same to the regexp engine. Regards Antoine. -- Software development and contracting: http://pro.pitrou.net From solipsis at pitrou.net Wed Oct 10 22:18:39 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 10 Oct 2012 22:18:39 +0200 Subject: [Python-ideas] Make undefined escape sequences have SyntaxWarnings References: <20121010214607.354902d6@pitrou.net> Message-ID: <20121010221839.51b3470c@pitrou.net> On Wed, 10 Oct 2012 23:04:25 +0300 Serhiy Storchaka wrote: > On 10.10.12 22:46, Antoine Pitrou wrote: > > -1. This will make life more difficult with regular expressions (and > > produce lots of spurious warnings in existing code). > > Strings for regular expressions always should be raw. Now regular > expressions supports \u and \U escapes and no reason to use non-raw strings. That's a style issue, not a language rule. Regards Antoine. -- Software development and contracting: http://pro.pitrou.net From mikegraham at gmail.com Wed Oct 10 22:45:06 2012 From: mikegraham at gmail.com (Mike Graham) Date: Wed, 10 Oct 2012 16:45:06 -0400 Subject: [Python-ideas] Make undefined escape sequences have SyntaxWarnings In-Reply-To: <20121010221603.23f740c9@pitrou.net> References: <20121010214607.354902d6@pitrou.net> <20121010221603.23f740c9@pitrou.net> Message-ID: On Wed, Oct 10, 2012 at 4:16 PM, Antoine Pitrou wrote: > On Wed, 10 Oct 2012 16:08:22 -0400 > Mike Graham wrote: >> On Wed, Oct 10, 2012 at 3:46 PM, Antoine Pitrou wrote: >> > On Wed, 10 Oct 2012 15:36:08 -0400 >> > Mike Graham wrote: >> >> The literal"\c" should be an error but in practice means "\\c". It's >> >> probably too late to make this invalid syntax as it out to be, but I >> >> wonder if a warning isn't in order, especially with the theoretical >> >> potential of adding new string escapes in the future. >> > >> > -1. 
This will make life more difficult with regular expressions (and >> > produce lots of spurious warnings in existing code). >> > >> > Regards >> > >> > Antoine. >> >> Regular expressions are difficult if you're remembering which escape >> sequences exist and are easy if you're using raw string literals. > > That's a misconception, since as the re docs mention: > > ?Most of the standard escapes supported by Python string literals are > also accepted by the regular expression parser: [snip]? > > http://docs.python.org/dev/library/re.html > > In other words, whether you put "\t" or "\\t" in a regexp doesn't > matter: it means the same to the regexp engine. > > Regards > > Antoine. I'm not sure what misconception you're saying I have. An example of when you have to remember what the escapes are is >>> re.search("\by\b", "x y z") is None True >>> re.search("\\by\\b", "x y z") is None False Mike From greg.ewing at canterbury.ac.nz Wed Oct 10 23:23:00 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 11 Oct 2012 10:23:00 +1300 Subject: [Python-ideas] asyncore: included batteries don't fit In-Reply-To: References: <5073BC7F.5040203@canterbury.ac.nz> Message-ID: <5075E734.2010005@canterbury.ac.nz> >>>>Mark Adam wrote: >>>There doesn't *have* to be - you could run a network event loop in one >>>thread and a GUI event loop in another and pass control back and forth >>>via methods like IOLoop.add_callback or Reactor.callFromThread. Well, that could be done, but one of the reasons for using an event loop approach in the first place is to avoid having to deal with threads and all their attendant concurrency problems. -- Greg From joshua.landau.ws at gmail.com Wed Oct 10 23:33:45 2012 From: joshua.landau.ws at gmail.com (Joshua Landau) Date: Wed, 10 Oct 2012 22:33:45 +0100 Subject: [Python-ideas] checking for identity before comparing built-in objects In-Reply-To: <5074CBF2.8070507@pearwood.info> References: <506D94EE.30808@pearwood.info> <50723BE5.3060300@nedbatchelder.com> <50733A18.10400@nedbatchelder.com> <20121009043236.GI27445@ando> <5074489B.6000003@stoneleaf.us> <50744CBE.4010600@pearwood.info> <5074CBF2.8070507@pearwood.info> Message-ID: On 10 October 2012 02:14, Steven D'Aprano wrote: > On 10/10/12 09:13, Joshua Landau wrote: > >> Just a curiosity here (as I can guess of plausible reasons myself, so >> there >> probably are some official stances). >> >> Is there a reason NaNs are not instances of NaN class? >> > > Because that would complicate Python's using floats for absolutely no > benefit. > Instead of float operations always returning a float, they would have to > return > a float or a NAN. To check for a valid floating point instance, instead of > saying: > > isinstance(x, float) > > you would have to say: > > isinstance(x, (float, NAN)) > Not the way I'm proposing it. >>> class NAN(float): > ... def __new__(self): > ... return float.__new__(self, "nan") > ... def __eq__(self, other): > ... return other is self > ... > >>> isinstance(NAN(), float) > True > >>> NAN() is NAN() > False > >>> NAN() == NAN() > False > >>> x = NAN() > >>> x is x > True > >>> x == x > True > >>> x > nan > And what about infinities, denorm numbers, and negative zero? Do they get > dedicated classes too? > Infinities? No, although they might well if the infinities were different (set of reals vs set of ints, for example). Denorms? No, that's a completely different thing. -0.0? No, that's a completely different thing. 
I was asking, because instances of a class maps on to a behavior that matches *almost exactly* what *both* parties want, why was it not used? This is not the case with anything other than that. And what is the point of this added complexity? Nothing. > Simplicity. It's simpler. > You *still* have the rule that "x == x for all x, except for NANs". False. I was proposing that x == x but NAN() != NAN(). > The only difference is that "NANs" now means "instances of NAN class" > rather than > "NAN floats" (and Decimals). False, if you subclass float. > Working with IEEE 754 floats is now far more of > a nuisance because some valid floating point values aren't floats but have > a > different class, but nothing meaningful is different. Then x == x would be True (as they want), but [this NaN] == [that NaN] >> would be False, as expected. >> > > Making NANs their own class wouldn't give you that. If we wanted that > behaviour, we could have it without introducing a NAN class: just change > the > list __eq__ method to scan the list for a NAN using math.isnan before > checking > whether the lists were identical. > False. >>> x == x > True > >>> [NAN()] == [NAN()] > False as per my previous "implementation". > But that would defeat the purpose of the identity check (an optimization to > avoid scanning the list)! Replacing math.isnan with isinstance doesn't > change > that. > > > I guess that raises the question about why x == x but sqrt(-1) != >> sqrt(-1), >> > > That question has already been raised, and answered, repeatedly in this > thread. False. x != x, so that has *not* been "answered". This was an example problem with my own suggested implementation. but it seems a lot less of a big deal than all of the exceptions with >> container equalities. >> > > Container equalities are not a big deal. I'm not sure what problem you > think > you are solving. Why would you assume that? I mentioned it from *honest* *curiosity*, and all I got back was an attack. Please, I want to be civil but you need to act less angrily. [Has not been spell-checked, as I don't really have time ] Thank you for your time, even though I disagree, Joshua Landau -------------- next part -------------- An HTML attachment was scrubbed... URL: From joshua.landau.ws at gmail.com Wed Oct 10 23:38:32 2012 From: joshua.landau.ws at gmail.com (Joshua Landau) Date: Wed, 10 Oct 2012 22:38:32 +0100 Subject: [Python-ideas] checking for identity before comparing built-in objects In-Reply-To: References: <506D94EE.30808@pearwood.info> <50723BE5.3060300@nedbatchelder.com> <50733A18.10400@nedbatchelder.com> <20121009043236.GI27445@ando> <5074489B.6000003@stoneleaf.us> <50744CBE.4010600@pearwood.info> <5074CBF2.8070507@pearwood.info> Message-ID: On 10 October 2012 22:33, Joshua Landau wrote: > Why would you assume that? I mentioned it from *honest* *curiosity*, > and all I got back was an attack. Please, I want to be civil but you need > to act less angrily. > After reconsidering, I regret these sentences. Yes, I do still believe your response was overly angry, but I did get a thought out response and you did try and address my concerns. In the interest of benevolence, may I redact my statement? -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From joshua.landau.ws at gmail.com Thu Oct 11 00:05:43 2012 From: joshua.landau.ws at gmail.com (Joshua Landau) Date: Wed, 10 Oct 2012 23:05:43 +0100 Subject: [Python-ideas] checking for identity before comparing built-in objects In-Reply-To: References: <506D94EE.30808@pearwood.info> <50723BE5.3060300@nedbatchelder.com> <50733A18.10400@nedbatchelder.com> <20121009043236.GI27445@ando> <5074489B.6000003@stoneleaf.us> <50744CBE.4010600@pearwood.info> <5074CBF2.8070507@pearwood.info> Message-ID: I don't normally triple-post, but here it goes. After re-re-reading this thread, it turns out one *(1)* post and two *(2)* answers to that post have covered a topic very similar to the one I have raised. All of the others, to my understanding, do not dwell over the fact that *float("nan") is not float("nan")* . The mentioned post was not quite the same as mine, but it still had two replies. I will respond to them here. My response, again, is a curiosity why, *not* a suggestion to change anything. I agree that there is probably no real concern with the current state, I have never had a concern and the concern caused by change would dwarf any possible benefits. Response 1: This implies that you want to differentiate between -0.0 and +0.0. That is bad. My response: Why would I want to do that? Response 2: "There is not space on this thread to convince you otherwise." [paraphrased] My response: That comment was not directed at me and thus has little relevance to my own post. Hopefully now you should understand why I felt need to ask the question after so much has already been said on the topic. Finally, Mike Graham says (probably referring to me): "I'm sometimes surprised at the creativity and passion behind solutions to this issue." My response: It was an immediate thought, not one dwelled upon. The fact it was not answered in the thread prompted my curiosity. It is *honestly* nothing more. -------------- next part -------------- An HTML attachment was scrubbed... URL: From oscar.j.benjamin at gmail.com Thu Oct 11 00:42:06 2012 From: oscar.j.benjamin at gmail.com (Oscar Benjamin) Date: Wed, 10 Oct 2012 23:42:06 +0100 Subject: [Python-ideas] Floating point contexts in Python core Message-ID: On 9 October 2012 02:07, Guido van Rossum wrote: > On Mon, Oct 8, 2012 at 5:32 PM, Oscar Benjamin > wrote: >> On 9 October 2012 01:11, Guido van Rossum wrote: >>> On Mon, Oct 8, 2012 at 5:02 PM, Greg Ewing wrote: >>>> >>>> So the question that really needs to be answered, I think, is >>>> not "Why is NaN == NaN false?", but "Why doesn't NaN == anything >>>> raise an exception, when it would make so much more sense to >>>> do so?" >>> >>> Because == raising an exception is really unpleasant. We had this in >>> Python 2 for unicode/str comparisons and it was very awkward. >>> >>> Nobody arguing against the status quo seems to care at all about >>> numerical algorithms though. I propose that you go find some numerical >>> mathematicians and ask them. >> >> The main purpose of quiet NaNs is to propagate through computation >> ruining everything they touch. In a programming language like C that >> lacks exceptions this is important as it allows you to avoid checking >> all the time for invalid values, whilst still being able to know if >> the end result of your computation was ever affected by an invalid >> numerical operation. The reasons for NaNs to compare unequal are no >> doubt related to this purpose. 
>> >> It is of course arguable whether the same reasoning applies to a >> language like Python that has a very good system of exceptions but I >> agree with Guido that raising an exception on == would be unfortunate. >> How many people would forget that they needed to catch those >> exceptions? How awkward could your code be if you did remember to >> catch all those exceptions? In an exception handling language it's >> important to know that there are some operations that you can trust. > > If we want to do *anything* I think we should first introduce a > floating point context similar to the Decimal context. Then we can > talk. The other thread has gone on for ages now and isn't going anywhere. Guido's suggestion here is much more interesting (to me) so I want to start a new thread on this subject. Python's default handling of floating point operations is IEEE-754 compliant which in my opinion is the obvious and right thing to do. However, Python is a much more versatile language than some of the other languages for which IEEE-754 was designed. Python offers the possibility of a very rich approach to the control and verification of the accuracy of numeric operations on both a function by function and code block by code block basis. This kind of functionality is already implemented in the decimal module [1] as well as numpy [2], gmpy [3], sympy [4] and no doubt other numerical modules that I'm not aware of. It would be a real blessing to numerical Python programmers if either/both of the following were to occur: 1) Support for calculation contexts with floats 2) A generic kind of calculation context manager that was recognised widely by the builtin/stdlib types and also by third party numerical packages. Oscar References: [1] http://docs.python.org/library/decimal.html#context-objects [2] http://docs.scipy.org/doc/numpy/reference/generated/numpy.seterr.html#numpy.seterr [3] https://gmpy2.readthedocs.org/en/latest/mpfr.html [4] http://docs.sympy.org/dev/modules/mpmath/contexts.html From g.brandl at gmx.net Thu Oct 11 00:45:33 2012 From: g.brandl at gmx.net (Georg Brandl) Date: Thu, 11 Oct 2012 00:45:33 +0200 Subject: [Python-ideas] PEP 428: poll about the joining syntax In-Reply-To: <20121010210703.4dafd553@pitrou.net> References: <20121008204707.48559bf9@pitrou.net> <20121010210703.4dafd553@pitrou.net> Message-ID: Am 10.10.2012 21:07, schrieb Antoine Pitrou: > On Tue, 9 Oct 2012 00:45:41 +0530 > Nick Coghlan wrote: >> On Tue, Oct 9, 2012 at 12:24 AM, Guido van Rossum wrote: >> > I don't like any of those; I'd vote for another regular method, maybe >> > p.pathjoin(q). >> > [...] >> >> I don't *love* joinpath as a name, I just don't actively dislike it >> the way I do the four presented options (and it has the virtue of the >> path.py precedent). > > How about one_path.to(other_path) ? I'd have no idea what it means, honestly. Georg From zuo at chopin.edu.pl Thu Oct 11 00:51:14 2012 From: zuo at chopin.edu.pl (Jan Kaliszewski) Date: Thu, 11 Oct 2012 00:51:14 +0200 Subject: [Python-ideas] Propagating StopIteration value In-Reply-To: References: <5070D658.9020300@pearwood.info> Message-ID: Hello .* On 09.10.2012 17:28, Serhiy Storchaka wrote: > On 09.10.12 16:07, Oscar Benjamin wrote: >> I really should have checked this before posting but I didn't have >> Python 3.3 available: > > Generator expression also eats the StopIteration value: > >>>> next(x for x in f()) > Traceback (most recent call last): > File "", line 1, in > StopIteration [snip] 1. Why shouldn't it "eat" that value? 
The full-generator equivalent, even with `yield from`, will "eat" it also: >>> def _make_this_gen_expr_equivalent(): >>> yield from f() # or: for x in f(): yield x ... >>> g = _make_this_gen_expr_equivalent() >>> next(g) Traceback (most recent call last): File "", line 1, in StopIteration After all, any generator will "eat" the StopIteration argument unless: * explicitly propagates it (with `return arg` or `raise StopIteration(arg)`), or * iterates over the subiterator "by hand" using the next() builtin or the __next__() method and does not catch StopIteration. 2. I believe that the new `yield from...` feature changes nothing in the *iterator* protocol. What it adds are only two things that can be placed in the code of a *generator*: 1) a return statement with a value -- finishing execution of the generator + raising StopIteration instantiated with that value passed as the only argument, 2) a `yield from subiterator` expression which propagates the items generated by the subiterator (not necessarily a generator) + does all that dance with __next__/send/throw/close (see PEP 380...) + caches StopIteration and returns the value passed with this exception (if any value has been passed). Not less, not more. Especially, the `yield from subiterator` expression itself* does not propagate* its value outside the generator. 3. The goal described by the OP could be reached with a wrapper generator -- something like this: def preservefrom(iter_factory, *args, which): final_value = None subiter = iter(which) def catching_gen(): nonlocal final_value try: while True: yield next(subiter) except StopIteration as exc: if exc.args: final_value = exc.args[0] raise args = [arg if arg is not which else catching_gen() for arg in args] yield from iter_factory(*args) return final_value Example usage: >>> import itertools >>> def f(): ... yield 'g' ... return 1000000 ... >>> my_gen = f() >>> my_chain = preservefrom(itertools.chain, 'abc', 'def', my_gen, which=my_gen) >>> while True: ... print(next(my_chain)) ... a b c d e f g Traceback (most recent call last): File "", line 2, in StopIteration: 1000000 >>> my_gen = f() >>> my_filter = preservefrom(filter, lambda x: True, my_gen, which=my_gen) >>> next(my_filter) 'g' >>> next(my_filter) Traceback (most recent call last): File "", line 1, in StopIteration: 1000000 Best regards. *j From alexander.belopolsky at gmail.com Thu Oct 11 00:56:08 2012 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Wed, 10 Oct 2012 18:56:08 -0400 Subject: [Python-ideas] Floating point contexts in Python core In-Reply-To: References: Message-ID: <69C432CC-7BAC-4B55-8BC3-2BF5307FD29C@gmail.com> On Oct 10, 2012, at 6:42 PM, Oscar Benjamin wrote: >> If we want to do *anything* I think we should first introduce a >> floating point context similar to the Decimal context. Then we can >> talk. > > The other thread has gone on for ages now and isn't going anywhere. > Guido's suggestion here is much more interesting (to me) so I want to > start a new thread on this subject. Python's default handling of > floating point operations is IEEE-754 compliant which in my opinion is > the obvious and right thing to do. I gave this idea +float('inf') in the other thread and was thinking about it since. I am now toying with the idea to unify float and decimal in Python. IEEE combined their two FP standards in one recently, so we have a precedent for this. 
We can start by extending decimal to support radix 2 and once that code is mature enough and has accelerated code for platform formats (single, double, long double), we can replace Python float with the new fully platform independent IEEE 754 compliant implementation. We can even supply a legacy context to support some current warts. From zuo at chopin.edu.pl Thu Oct 11 01:29:32 2012 From: zuo at chopin.edu.pl (Jan Kaliszewski) Date: Thu, 11 Oct 2012 01:29:32 +0200 Subject: [Python-ideas] Propagating StopIteration value In-Reply-To: References: <5070D658.9020300@pearwood.info> Message-ID: <716d99c1f6a76a86a98e0d7142b9da9d@chopin.edu.pl> > The goal described by the OP could be reached with a wrapper > generator -- something like this: [snip] PS. A more convenient version (you don't need to repeat yourself): import collections def genfrom(iter_factory, *args): final_value = None def catching(iterable): subiter = iter(iterable) nonlocal final_value try: while True: yield next(subiter) except StopIteration as exc: if exc.args: final_value = exc.args[0] raise args = [catching(arg.iterable) if isinstance(arg, genfrom.this) else arg for arg in args] yield from iter_factory(*args) return final_value genfrom.this = collections.namedtuple('propagate_from_this', 'iterable') Some examples: >>> import itertools >>> def f(): ... yield 'g' ... return 10000000 ... >>> my_chain = genfrom(itertools.chain, 'abc', 'def', genfrom.this(f())) >>> while True: ... print(next(my_chain)) ... a b c d e f g Traceback (most recent call last): File "", line 2, in StopIteration: 10000000 >>> my_filter = genfrom(filter, lambda x: True, genfrom.this(f())) >>> next(my_filter) 'g' >>> next(my_filter) Traceback (most recent call last): File "", line 1, in StopIteration: 10000000 From zuo at chopin.edu.pl Thu Oct 11 01:38:21 2012 From: zuo at chopin.edu.pl (Jan Kaliszewski) Date: Thu, 11 Oct 2012 01:38:21 +0200 Subject: [Python-ideas] Propagating StopIteration value [PS. #2] In-Reply-To: <716d99c1f6a76a86a98e0d7142b9da9d@chopin.edu.pl> References: <5070D658.9020300@pearwood.info> <716d99c1f6a76a86a98e0d7142b9da9d@chopin.edu.pl> Message-ID: <476135c117a7287953bc20eac249662e@chopin.edu.pl> W dniu 11.10.2012 01:29, Jan Kaliszewski napisa?(a): >> The goal described by the OP could be reached with a wrapper >> generator -- something like this: > [snip] > > PS. A more convenient version (you don't need to repeat yourself): [snip]. PS2. Sorry for flooding, obviously it can be simpler: import collections def genfrom(iter_factory, *args): final_value = None def catching(iterable): nonlocal final_value final_value = yield from iterable args = [catching(arg.iterable) if isinstance(arg, genfrom.this) else arg for arg in args] yield from iter_factory(*args) return final_value genfrom.this = collections.namedtuple('propagate_from_this', 'iterable') Cheers. *j From steve at pearwood.info Thu Oct 11 02:07:45 2012 From: steve at pearwood.info (Steven D'Aprano) Date: Thu, 11 Oct 2012 11:07:45 +1100 Subject: [Python-ideas] Make undefined escape sequences have SyntaxWarnings In-Reply-To: References: <20121010214607.354902d6@pitrou.net> Message-ID: <50760DD1.3030906@pearwood.info> On 11/10/12 07:04, Serhiy Storchaka wrote: > On 10.10.12 22:46, Antoine Pitrou wrote: >> -1. This will make life more difficult with regular expressions (and >> produce lots of spurious warnings in existing code). > > Strings for regular expressions always should be raw. Why? The re module doesn't care how you construct the strings. 
It *can't* care how you construct the strings. Something like re.search('\D*', 'abcd1234xyz') works perfectly well and there is no need for a raw string. Any requirement to "always use raw strings" is a style issue, not a language issue. -- Steven From steve at pearwood.info Thu Oct 11 02:08:13 2012 From: steve at pearwood.info (Steven D'Aprano) Date: Thu, 11 Oct 2012 11:08:13 +1100 Subject: [Python-ideas] Make undefined escape sequences have SyntaxWarnings In-Reply-To: References: <20121010214607.354902d6@pitrou.net> Message-ID: <50760DED.2030801@pearwood.info> On 11/10/12 07:08, Mike Graham wrote: > On Wed, Oct 10, 2012 at 3:46 PM, Antoine Pitrou wrote: >> On Wed, 10 Oct 2012 15:36:08 -0400 >> Mike Graham wrote: >>> The literal"\c" should be an error Who says so? My bash shell disagrees with you: [steve at ando ~]$ touch spam [steve at ando ~]$ ls s\pa\m spam and so do I. There are three obvious behaviours for extraneous escapes: 1) backslash-c resolves to just c (what bash and VisualStudio do) 2) backslash-c resolves to backslash-c (what Python does) 3) raise an exception or compile-time error (what Java does) It is undefined behaviour in C. It is a matter of opinion that Java got it right and the others got it wrong, one which I do not share. >>> but in practice means "\\c". It's >>> probably too late to make this invalid syntax as it out to be, but I >>> wonder if a warning isn't in order, especially with the theoretical >>> potential of adding new string escapes in the future. >> >> -1. This will make life more difficult with regular expressions (and >> produce lots of spurious warnings in existing code). I agree with Antoine here. If and when there is a serious, concrete proposal to add a new string escape, and not just a "theoretical potential", then we should consider adding warnings. > Regular expressions are difficult if you're remembering which escape > sequences exist and are easy if you're using raw string literals. Just because some people find it hard to remember doesn't mean that it should be an error *not* to use raw strings. -- Steven From steve at pearwood.info Thu Oct 11 02:25:20 2012 From: steve at pearwood.info (Steven D'Aprano) Date: Thu, 11 Oct 2012 11:25:20 +1100 Subject: [Python-ideas] PEP 428: poll about the joining syntax In-Reply-To: <20121010210703.4dafd553@pitrou.net> References: <20121008204707.48559bf9@pitrou.net> <20121010210703.4dafd553@pitrou.net> Message-ID: <507611F0.4040900@pearwood.info> On 11/10/12 06:07, Antoine Pitrou wrote: > On Tue, 9 Oct 2012 00:45:41 +0530 > Nick Coghlan wrote: >> On Tue, Oct 9, 2012 at 12:24 AM, Guido van Rossum wrote: >>> I don't like any of those; I'd vote for another regular method, maybe >>> p.pathjoin(q). >> > [...] >> >> I don't *love* joinpath as a name, I just don't actively dislike it >> the way I do the four presented options (and it has the virtue of the >> path.py precedent). > > How about one_path.to(other_path) ? -1 "To" implies to me either: * one_path is mutated to become other_path; or * you supply the end points, and the method finds a path between them neither of which is remotely relevant. It certainly is not a synonym for add/join/combine/concat paths. Brevity is not more important than clarity. 
-- Steven From trent at snakebite.org Thu Oct 11 02:55:23 2012 From: trent at snakebite.org (Trent Nelson) Date: Wed, 10 Oct 2012 20:55:23 -0400 Subject: [Python-ideas] asyncore: included batteries don't fit In-Reply-To: <50736C0F.90401@python.org> References: <5072C972.5070207@python.org> <50736C0F.90401@python.org> Message-ID: <20121011005523.GA43928@snakebite.org> On Mon, Oct 08, 2012 at 05:13:03PM -0700, Christian Heimes wrote: > Am 08.10.2012 17:35, schrieb Guido van Rossum: > > On Mon, Oct 8, 2012 at 5:39 AM, Christian Heimes wrote: > >> Python's standard library doesn't contain in interface to I/O Completion > >> Ports. I think a common event loop system is a good reason to add IOCP > >> if somebody is up for the challenge. > >> > >> Would you prefer an IOCP wrapper in the stdlib or your own version? > >> Twisted has its own Cython based wrapper, some other libraries use a > >> libevent-based solution. > > > > What's an IOCP? > > I/O Completion Ports, http://en.wikipedia.org/wiki/IOCP > > It's a Windows (and apparently also Solaris) And AIX, too. For every OS IOCP implementation, there's a corresponding Snakebite box :-) > API for async IO that can handle multiple threads. I find it helps to think of it in terms of a half-sync/half-async pattern. The half-async part handles the I/O; the OS wakes up one of your "I/O" threads upon incoming I/O. The job of such threads is really just to pull/push the bytes from/to kernel/user space as quickly as it can. (Since Vista, Windows has provided a corresponding thread pool API that gels really well with IOCP. Windows will optimally manage threads based on incoming I/O; spawning/destroying threads as per necessary. You can even indicate to Windows whether your threads will be "compute" or I/O bound, which it uses to optimize its scheduling algorithm.) The half-sync part is the event-loop part of your app, which simply churns away on the data prepared for it by the async threads. What would be neat is if the half-async path could be run outside the GIL. They would need to be able to allocate memory that could then be "owned" by the GIL-holding half-sync part. You could leverage this with kqueue and epoll; have similar threads set up to simply process I/O independent of the GIL, using the same facilities that would be used by IOCP-processing threads. Then the "asyncore" event-loop simply becomes the half-sync part of the pattern, enumerating over all the I/O requests queued up for it by all the GIL-independent half-async threads. Trent. From steve at pearwood.info Thu Oct 11 03:20:39 2012 From: steve at pearwood.info (Steven D'Aprano) Date: Thu, 11 Oct 2012 12:20:39 +1100 Subject: [Python-ideas] checking for identity before comparing built-in objects In-Reply-To: References: <506D94EE.30808@pearwood.info> <50723BE5.3060300@nedbatchelder.com> <50733A18.10400@nedbatchelder.com> <20121009043236.GI27445@ando> <5074489B.6000003@stoneleaf.us> <50744CBE.4010600@pearwood.info> <5074CBF2.8070507@pearwood.info> Message-ID: <50761EE7.8060103@pearwood.info> On 11/10/12 09:05, Joshua Landau wrote: > After re-re-reading this thread, it turns out one *(1)* post and two > *(2)* answers > to that post have covered a topic very similar to the one I have raised. > All of the others, to my understanding, do not dwell over the fact > that *float("nan") is not float("nan")* . That's no different from any other float. 
py> float('nan') is float('nan') False py> float('1.5') is float('1.5') False Floats are not interned or cached, although of course interning is implementation dependent and this is subject to change without notice. For that matter, it's true of *nearly all builtins* in Python. The exceptions being bool(obj) which returns one of two fixed instances, and int() and str(), where *some* but not all instances are cached. > Response 1: > This implies that you want to differentiate between -0.0 and +0.0. That is > bad. > > My response: > Why would I want to do that? If you are doing numeric work, you *should* differentiate between -0.0 and 0.0. That's why the IEEE 754 standard mandates a -0.0. Both -0.0 and 0.0 compare equal, but they can be distinguished (although doing so is tricky in Python). The reason for distinguishing them is to distinguish between underflow to zero from positive or negative values. E.g. log(x) should return -infinity if x underflows from a positive value, and a NaN if x underflows from a negative. -- Steven From steve at pearwood.info Thu Oct 11 03:30:13 2012 From: steve at pearwood.info (Steven D'Aprano) Date: Thu, 11 Oct 2012 12:30:13 +1100 Subject: [Python-ideas] Floating point contexts in Python core In-Reply-To: <69C432CC-7BAC-4B55-8BC3-2BF5307FD29C@gmail.com> References: <69C432CC-7BAC-4B55-8BC3-2BF5307FD29C@gmail.com> Message-ID: <50762125.1030201@pearwood.info> On 11/10/12 09:56, Alexander Belopolsky wrote: > I gave this idea +float('inf') in the other thread and was thinking > about it since. I am now toying with the idea to unify float and >decimal in Python. IEEE combined their two FP standards in one > recently, so we have a precedent for this. > > We can start by extending decimal to support radix 2 and once that > code is mature enough and has accelerated code for platform formats > (single, double, long double), I don't want to be greedy, but supporting minifloats would be a real boon to beginners trying to learn how floats work. >we can replace Python float with the > new fully platform independent IEEE 754 compliant implementation. >We can even supply a legacy context to support some current warts. This all sounds very exciting, but also like a huge amount of work. -- Steven From guido at python.org Thu Oct 11 03:44:04 2012 From: guido at python.org (Guido van Rossum) Date: Wed, 10 Oct 2012 18:44:04 -0700 Subject: [Python-ideas] Floating point contexts in Python core In-Reply-To: <50762125.1030201@pearwood.info> References: <69C432CC-7BAC-4B55-8BC3-2BF5307FD29C@gmail.com> <50762125.1030201@pearwood.info> Message-ID: > This all sounds very exciting, but also like a huge amount of work. Indeed. But that's what we're here for. Anyway, as an indication of the amount of work, you might want to look at the fpectl module -- the module itself is tiny, but its introduction required a huge amount of changes to every place where CPython uses a double. I don't know if anybody uses it, though it's still in the Py3k codebase. 
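For comparison, the decimal module already offers exactly this kind of per-block control, which is presumably what a float context would mimic; a minimal sketch of its context machinery (illustrative only -- nothing like this exists for float today):

from decimal import Decimal, localcontext, Inexact

with localcontext() as ctx:
    ctx.prec = 6                      # precision applies only inside this block
    print(Decimal(1) / Decimal(7))    # prints 0.142857

with localcontext() as ctx:
    ctx.traps[Inexact] = True         # turn the Inexact signal into an exception
    Decimal(1) / Decimal(7)           # raises decimal.Inexact instead of rounding silently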
-- --Guido van Rossum (python.org/~guido) From chris.jerdonek at gmail.com Thu Oct 11 04:24:07 2012 From: chris.jerdonek at gmail.com (Chris Jerdonek) Date: Wed, 10 Oct 2012 19:24:07 -0700 Subject: [Python-ideas] PEP 428: poll about the joining syntax In-Reply-To: References: <20121008204707.48559bf9@pitrou.net> Message-ID: On Tue, Oct 9, 2012 at 12:32 PM, Terry Reedy wrote: > On 10/9/2012 9:30 AM, Eric Snow wrote: >> On Oct 9, 2012 1:12 AM, "Senthil Kumaran" >> > > wrote: >> > > `p.pathjoin(q)` >> > >> > +1 >> > >> > It is very explicit and hard to get it wrong. > > > or path.concat(otherpath) Or how about path.slash(other_path)? :) I'm not feeling the operators, though I haven't thought about them much. --Chris From mikegraham at gmail.com Thu Oct 11 04:24:12 2012 From: mikegraham at gmail.com (Mike Graham) Date: Wed, 10 Oct 2012 22:24:12 -0400 Subject: [Python-ideas] Make undefined escape sequences have SyntaxWarnings In-Reply-To: <50760DED.2030801@pearwood.info> References: <20121010214607.354902d6@pitrou.net> <50760DED.2030801@pearwood.info> Message-ID: On Wed, Oct 10, 2012 at 8:08 PM, Steven D'Aprano wrote: > On 11/10/12 07:08, Mike Graham wrote: >> >> On Wed, Oct 10, 2012 at 3:46 PM, Antoine Pitrou >> wrote: >>> >>> On Wed, 10 Oct 2012 15:36:08 -0400 >>> Mike Graham wrote: > > >>>> The literal"\c" should be an error > > > Who says so? My bash shell disagrees with you: Frankly, I don't look to bash for sensible language design advice. I think concepts like "In the face of ambiguity, refuse the temptation to guess" guides how we should see the decision here. "Backslash is for escape sequences except when it's not" seemed like an obviously-misfortunate thing to me. I'm truly perplexed people see it as a feature they're eager to use, but I guess I should learn something from that. >> Regular expressions are difficult if you're remembering which escape >> sequences exist and are easy if you're using raw string literals. > > Just because some people find it hard to remember doesn't mean that it > should be an error *not* to use raw strings. I didn't say that it should be an error not to use raw strings. I was saying that the implication that this suggestion makes constructing regex strings hard is silly and mentioning the thing that makes them easy. I'm not suggesting that you shouldn't be able to use normal string literals. Antoine went on to point out that things like "\t" worked in regex strings. This is an unrelated feature that I never suggested altering. In that case, a tab character in your string is regarded like \t. This behavior would remain. I think four string escapes have been added since versions of Python I was aware of. Writing code like "ab\c" seems seedy in light of that Mike From ericsnowcurrently at gmail.com Thu Oct 11 04:34:36 2012 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Wed, 10 Oct 2012 20:34:36 -0600 Subject: [Python-ideas] PEP 428: poll about the joining syntax In-Reply-To: References: <20121008204707.48559bf9@pitrou.net> Message-ID: On Oct 8, 2012 5:35 PM, "Eric Snow" wrote: > > On Mon, Oct 8, 2012 at 12:47 PM, Antoine Pitrou wrote: > > - `p[q]` joins path q to path p > -1 > > - `p + q` joins path q to path p > -1 > > - `p / q` joins path q to path p > -1 > > - `p.join(q)` joins path q to path p > +1 (with a different name) > > I've found Nick's argument against operators-from-day-1 to be > convincing, as well as his argument against join() or any other name > already provided by string/sequence APIs. 
Changing my vote: p[q] -1 p + q -1 p / q +0 p.pathjoin() +1 A method is essential, regardless of the color the bikeshed ends up. As far as operators go, / is the only option here that doesn't conflict with string/collection APIs. The alternative has an adverse impact on subclassing and on future design choices on the path API. This goes for the method name too. -eric -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Thu Oct 11 04:58:33 2012 From: steve at pearwood.info (Steven D'Aprano) Date: Thu, 11 Oct 2012 13:58:33 +1100 Subject: [Python-ideas] Make undefined escape sequences have SyntaxWarnings In-Reply-To: References: <20121010214607.354902d6@pitrou.net> <50760DED.2030801@pearwood.info> Message-ID: <507635D9.2040107@pearwood.info> On 11/10/12 13:24, Mike Graham wrote: > On Wed, Oct 10, 2012 at 8:08 PM, Steven D'Aprano wrote: >> On 11/10/12 07:08, Mike Graham wrote: >>> >>> On Wed, Oct 10, 2012 at 3:46 PM, Antoine Pitrou >>> wrote: >>>> >>>> On Wed, 10 Oct 2012 15:36:08 -0400 >>>> Mike Graham wrote: >> >> >>>>> The literal"\c" should be an error >> >> >> Who says so? My bash shell disagrees with you: > > Frankly, I don't look to bash for sensible language design advice. Pity, because in this case I think bash is actually more sensible than either Python or Java. If you escape a character, you should get something. If it's a special character, you get the special meaning. If it's not, escaping should be transparent: escaping something that doesn't need escaping is a null op: py> from urllib import quote_plus py> quote_plus('abc') 'abc' If we were designing Python from scratch, I'd prefer '\D' -> 'D'. But we're not, so I'm happy with the current behaviour, and don't agree that it should be an error or that it needs warning about. > I > think concepts like "In the face of ambiguity, refuse the temptation > to guess" guides how we should see the decision here. Where is the ambiguity? Is there ever a context where \D could mean two different things and it isn't clear which one? "In the face of ambiguity..." does not mean "refuse to decide on language behaviour". Everything is ambiguous until you decide what something will mean. It's only when you have two possible meanings and no clear, obvious way to determine which one applies that the ambiguity koan applies. > "Backslash is > for escape sequences except when it's not" seemed like an > obviously-misfortunate thing to me. No. In cooked strings, backslash-C is always an escape sequence, for any character (or hex/oct code) C. But some escape sequences resolve to a single char (\n -> newline) and some resolve to a pair of chars (\D -> backslash D). In Haskell, \& resolves to the empty string. It's still an escape sequence. [...] > I think four string escapes have been added since versions of Python I > was aware of. Writing code like "ab\c" seems seedy in light of that Adding a new escape sequence is almost as big a step as adding a new built-in or new syntax. I see that as a good thing, it discourages too many requests for new escape sequences. 
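The pass-through behaviour described above is easy to check interactively; a short illustrative session (the result is the same on any current 2.x or 3.x interpreter):

>>> len('\n'), len('\d')
(1, 2)
>>> '\d' == '\\d'
True
>>> '\&'            # unlike Haskell's \&, Python keeps the backslash
'\\&'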
-- Steven From alexander.belopolsky at gmail.com Thu Oct 11 06:07:33 2012 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Thu, 11 Oct 2012 00:07:33 -0400 Subject: [Python-ideas] checking for identity before comparing built-in objects In-Reply-To: <50761EE7.8060103@pearwood.info> References: <506D94EE.30808@pearwood.info> <50723BE5.3060300@nedbatchelder.com> <50733A18.10400@nedbatchelder.com> <20121009043236.GI27445@ando> <5074489B.6000003@stoneleaf.us> <50744CBE.4010600@pearwood.info> <5074CBF2.8070507@pearwood.info> <50761EE7.8060103@pearwood.info> Message-ID: On Wed, Oct 10, 2012 at 9:20 PM, Steven D'Aprano wrote: > Both -0.0 and 0.0 compare equal, but they can be distinguished (although > doing so is tricky in Python). Not really: >>> math.copysign(1.0,-0.0) -1.0 >>> math.copysign(1.0,0.0) 1.0 From alexander.belopolsky at gmail.com Thu Oct 11 06:20:17 2012 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Thu, 11 Oct 2012 00:20:17 -0400 Subject: [Python-ideas] Floating point contexts in Python core In-Reply-To: References: <69C432CC-7BAC-4B55-8BC3-2BF5307FD29C@gmail.com> <50762125.1030201@pearwood.info> Message-ID: On Wed, Oct 10, 2012 at 9:44 PM, Guido van Rossum wrote: > Anyway, as an indication of the amount of work, you might want to look > at the fpectl module -- the module itself is tiny, but its > introduction required a huge amount of changes to every place where > CPython uses a double. I would start from another end. I would look at decimal.py first. This is little over 6,400 line of code and I think most of it can be reused to implement base 2 (or probably better base 16) float. Multi-precision binary float can coexist with existing float until the code matures and accelerators are written for major platforms. At the same time we can make incremental improvements to builtin float until it can be replaced by a multi-precision float in some well-defined context. From greg.ewing at canterbury.ac.nz Thu Oct 11 07:34:32 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 11 Oct 2012 18:34:32 +1300 Subject: [Python-ideas] Make undefined escape sequences have SyntaxWarnings In-Reply-To: <507635D9.2040107@pearwood.info> References: <20121010214607.354902d6@pitrou.net> <50760DED.2030801@pearwood.info> <507635D9.2040107@pearwood.info> Message-ID: <50765A68.9080607@canterbury.ac.nz> Steven D'Aprano wrote: > If you escape a character, you should get > something. If it's a special character, you get the special meaning. > If it's not, escaping should be transparent: escaping something that > doesn't need escaping is a null op I think that calling "\n", "\t" etc. "escape sequences" is a misnomer that is causing confusion in this discussion. The term "escape" in this context means to prevent something from having a special meaning that it would otherwise have. But the backslash in these is being used to *give* a special meaning to the following character. In Python string literals, the only true escape sequences associated with the backslash are '\\', "\'" and '\"'. So the backslash is a bit schizophrenic -- sometimes it's an escape character, sometimes it's a prefix that imparts a special meaning. This means that "\c" where c is not special in any way is somewhat ambiguous. Are you redundantly escaping something that doesn't need it, are you asking for a special meaning that doesn't exist (which is probably a mistake), or do you just want a literal backslash? Python guesses that you want a literal backslash. 
This seems to be motivated by the desire to minimise the need for backslash doubling. That sounds fine in theory, but I don't think it helps much in practice. I for one don't trust myself to keep the entire set of special characters in my head, including all the rarely-used ones, so I end up doubling every backslash anyway. Given that, I wouldn't have minded at all if Python had refused to guess in this case, and raised a compile-time error. That would have left the way open for extending the set of special chars in the future. > Adding a new escape sequence is almost as big a step as adding a new > built-in or new syntax. I see that as a good thing, it discourages too > many requests for new escape sequences. I don't see it makes much difference. We get plenty of requests for new syntax of all kinds, and we seem to have enough sense to reject them unless they're backed by extremely good arguments. There's no reason requests for new special chars should be treated any differently. -- Greg From greg.ewing at canterbury.ac.nz Thu Oct 11 07:45:50 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 11 Oct 2012 18:45:50 +1300 Subject: [Python-ideas] Floating point contexts in Python core In-Reply-To: <69C432CC-7BAC-4B55-8BC3-2BF5307FD29C@gmail.com> References: <69C432CC-7BAC-4B55-8BC3-2BF5307FD29C@gmail.com> Message-ID: <50765D0E.4020001@canterbury.ac.nz> Alexander Belopolsky wrote: > I gave this idea +float('inf') in the other thread and was thinking about it > since. I am now toying with the idea to unify float and decimal in Python. Are you sure there would be any point in this? People who specifically *want* base-2 floats are probably quite happy with the current float type, and wouldn't appreciate having it slowed down, even by a small amount. It might make sense for them to share whatever parts of the fp context apply to both, and they might have a common base type, but they should probably remain distinct types with separate implementations. -- Greg From stephen at xemacs.org Thu Oct 11 07:53:12 2012 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Thu, 11 Oct 2012 14:53:12 +0900 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: References: <20121005202534.5f721292@pitrou.net> <20121007193735.7bb924ac@pitrou.net> <7E8AC881-ADB6-4026-B024-07DE197F8530@mac.com> <20121008110748.GA17653@iskra.aviel.ru> <9D6F4C1B-9145-4775-8657-F99612791067@mac.com> <20121008142812.GA22502@iskra.aviel.ru> Message-ID: <871uh58shj.fsf@uwakimon.sk.tsukuba.ac.jp> Ronald Oussoren writes: > All in all the best we seem to be able to do is use the OS as a > heuristic, most Unix filesystems are case sensitive while Windows > and OSX filesystems are case preserving. We can do better than that heuristic. All of the POSIX systems I know publish mtab by default. The mount utility by default will report the types of filesystems. While a path module should not depend on such information, I suppose[1], there ought to be a way to ask for it. Of course this is still an heuristic (at least some Mac filesystems can be configured to be case sensitive rather than case-preserving, and I don't think this information is available in mtab), but it's far more accurate than using only the OS. Footnotes: [1] Requires a system call or subprocess execution, and since mounts can be dynamically changed, doing it once at module initialization is not good enough. 
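To make the mount-based heuristic concrete, here is a rough sketch along those lines. It shells out to mount(8) and assumes the Linux output format ("device on /point type fstype (options)"); fs_type_of is a hypothetical helper, and mapping a filesystem type to case behaviour is still only a guess:

import subprocess

def fs_type_of(path):
    # Best-effort: find the longest mount point that is a prefix of *path*
    # and report its filesystem type.  Mounts can change at any moment, so
    # this can only ever be a heuristic.
    best, best_type = '', None
    for line in subprocess.check_output(['mount']).decode().splitlines():
        parts = line.split()
        try:
            mpoint = parts[parts.index('on') + 1]
            fstype = parts[parts.index('type') + 1]
        except (ValueError, IndexError):
            continue        # line not in the expected format
        if path.startswith(mpoint) and len(mpoint) >= len(best):
            best, best_type = mpoint, fstype
    return best_type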
From rohit0286 at gmail.com Thu Oct 11 08:19:09 2012 From: rohit0286 at gmail.com (rohit sharma) Date: Thu, 11 Oct 2012 11:49:09 +0530 Subject: [Python-ideas] PEP 428: poll about the joining syntax In-Reply-To: References: <20121008204707.48559bf9@pitrou.net> Message-ID: p + q +1 This is a familiar notation to any developer and its been used widely. Regards, Rohit. On Thu, Oct 11, 2012 at 8:04 AM, Eric Snow wrote: > > On Oct 8, 2012 5:35 PM, "Eric Snow" wrote: > > > > On Mon, Oct 8, 2012 at 12:47 PM, Antoine Pitrou > wrote: > > > - `p[q]` joins path q to path p > > -1 > > > - `p + q` joins path q to path p > > -1 > > > - `p / q` joins path q to path p > > -1 > > > - `p.join(q)` joins path q to path p > > +1 (with a different name) > > > > I've found Nick's argument against operators-from-day-1 to be > > convincing, as well as his argument against join() or any other name > > already provided by string/sequence APIs. > > Changing my vote: > > p[q] -1 > p + q -1 > p / q +0 > p.pathjoin() +1 > > A method is essential, regardless of the color the bikeshed ends up. As > far as operators go, / is the only option here that doesn't conflict with > string/collection APIs. The alternative has an adverse impact on > subclassing and on future design choices on the path API. This goes for > the method name too. > > -eric > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From breamoreboy at yahoo.co.uk Thu Oct 11 08:55:06 2012 From: breamoreboy at yahoo.co.uk (Mark Lawrence) Date: Thu, 11 Oct 2012 07:55:06 +0100 Subject: [Python-ideas] PEP 428: poll about the joining syntax In-Reply-To: <20121010210703.4dafd553@pitrou.net> References: <20121008204707.48559bf9@pitrou.net> <20121010210703.4dafd553@pitrou.net> Message-ID: On 10/10/2012 20:07, Antoine Pitrou wrote: > > How about one_path.to(other_path) ? > > Regards > > Antoine. -1 two much chance of confusing it with the other ways that too can be spelt :) -- Cheers. Mark Lawrence. From stephen at xemacs.org Thu Oct 11 08:59:23 2012 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Thu, 11 Oct 2012 15:59:23 +0900 Subject: [Python-ideas] PEP 428: poll about the joining syntax In-Reply-To: <20121010210703.4dafd553@pitrou.net> References: <20121008204707.48559bf9@pitrou.net> <20121010210703.4dafd553@pitrou.net> Message-ID: <87zk3t7aus.fsf@uwakimon.sk.tsukuba.ac.jp> Antoine Pitrou writes: > On Tue, 9 Oct 2012 00:45:41 +0530 > Nick Coghlan wrote: > > On Tue, Oct 9, 2012 at 12:24 AM, Guido van Rossum wrote: > > > I don't like any of those; I'd vote for another regular method, maybe > > > p.pathjoin(q). > > > [...] > > > > I don't *love* joinpath as a name, I just don't actively dislike it > > the way I do the four presented options (and it has the virtue of the > > path.py precedent). +1 > How about one_path.to(other_path) ? TOOWDTI, yes, but to me what it obviously does is Path("/usr/local/bin").to(Path("/usr/bin")) => Path("../bin") Ie, to me it's another spelling for .relative_to(), except that the operands have reversed. FWIW M?2% YMMV etc. Some random thoughts follow. If you think that is out of keeping with the progress of this thread, stop reading now. I just don't think this problem (of convenient and object-oriented paths) is going to get solved. 
Basically what most of the people who are posting about this seem to want is a subclass of string that DWIMs. The problem is that "DWIM" varies substantially across programmers, and seems to be nondeterministic for some (me, for one). If path "objects" "should" behave like strings with specialized convenience methods, how can you improve on os.path? I haven't seen any answers to that, only "WIBNI Paths looked like strings representing paths?" And only piece by piece at that, no coherent overview of what Paths-like-str might look like from a space station. If we're going to have an object-oriented path module, why can't it be object-oriented? Paths are sequences of path components. They are not sequences of characters. Sorry! Path components are strings (or subclasses thereof), but they have additional syntax (extensions, Windows devices, Windows remote paths). Maybe that, we can do something with! Antoine says that Paths need to be immutable. Makes sense, but does that preclude having MutablePath? Then `mp[-1] += ".ext"` is a natural notation, no? How about `mp[-1] %= ".tex"; mp[-1] += .pdf"`? Then just my_path = MutablePath(arg_path) mutate(my_path) return Path(my_path) does the work safely. As has been noted several times, all paths have syntax resembling URL syntax. Even the semantics are similar, except (of course you are in no way surprised by this) on Windows, where the syntactic role of "scheme" has semantics "device", and there is the issue of the different path separator. Maybe it would be reasonable to forget object-oriented Paths restricted to filesystems and use URLs when you want object-oriented behavior. Under the hood URL methods working with file URLs would be manipulating paths via os.path, perhaps. I realize that this would impose an asymmetric burden on developers on Windows. On the other hand, these days who isn't familiar with URL syntax and passing familiar with its minor differences from file system path semantics? Perhaps the benefits of working with a well-defined object model would outweight the costs, at least when developing new code. In ordinary maintenance or major refactoring, the developer would have the option of continuing to use os.path or homebrew functions to manipulate paths. Steve From g.brandl at gmx.net Thu Oct 11 09:01:56 2012 From: g.brandl at gmx.net (Georg Brandl) Date: Thu, 11 Oct 2012 09:01:56 +0200 Subject: [Python-ideas] PEP 428: poll about the joining syntax In-Reply-To: References: <20121008204707.48559bf9@pitrou.net> Message-ID: Am 11.10.2012 08:19, schrieb rohit sharma: > p + q +1 > > This is a familiar notation to any developer and its been used widely. I'd like to see that claim supported. Georg From him at online.de Thu Oct 11 09:03:23 2012 From: him at online.de (=?ISO-8859-1?Q?Joachim_K=F6nig?=) Date: Thu, 11 Oct 2012 09:03:23 +0200 Subject: [Python-ideas] PEP 428: poll about the joining syntax In-Reply-To: References: <20121008204707.48559bf9@pitrou.net> Message-ID: <50766F3B.3090004@online.de> On 11/10/2012 04:24, Chris Jerdonek wrote: > Or how about path.slash(other_path)? 
:) > and path.backslash(other_path) for windows compatibility ;-) Joachim From storchaka at gmail.com Thu Oct 11 10:45:09 2012 From: storchaka at gmail.com (Serhiy Storchaka) Date: Thu, 11 Oct 2012 11:45:09 +0300 Subject: [Python-ideas] Floating point contexts in Python core In-Reply-To: References: <69C432CC-7BAC-4B55-8BC3-2BF5307FD29C@gmail.com> <50762125.1030201@pearwood.info> Message-ID: On 11.10.12 07:20, Alexander Belopolsky wrote:
> This is little over 6,400 line of code and I think most of it can be
> reused to implement base 2 (or probably better base 16) float.

With base 16 floats you can't emulate x86 native 53-bit mantissa floats.

From oscar.j.benjamin at gmail.com Thu Oct 11 10:56:31 2012 From: oscar.j.benjamin at gmail.com (Oscar Benjamin) Date: Thu, 11 Oct 2012 09:56:31 +0100 Subject: [Python-ideas] Floating point contexts in Python core In-Reply-To: <50765D0E.4020001@canterbury.ac.nz> References: <69C432CC-7BAC-4B55-8BC3-2BF5307FD29C@gmail.com> <50765D0E.4020001@canterbury.ac.nz> Message-ID: On 11 October 2012 06:45, Greg Ewing wrote:
> Alexander Belopolsky wrote:
>
>> I gave this idea +float('inf') in the other thread and was thinking about
>> it
>> since. I am now toying with the idea to unify float and decimal in Python.
>
>
> Are you sure there would be any point in this? People who
> specifically *want* base-2 floats are probably quite happy
> with the current float type, and wouldn't appreciate having
> it slowed down, even by a small amount.
>
> It might make sense for them to share whatever parts of the
> fp context apply to both, and they might have a common base
> type, but they should probably remain distinct types with
> separate implementations.

This is what I was pitching at. It would be great if a single floating point context could be used to control the behaviour of float, decimal, ndarray etc simultaneously.

Something that would have made my life easier yesterday would have been a way to enter a debugger at the point when a first NaN is created during execution. Something like:

python -m pdb --error-nan broken_script.py

Or perhaps:

PYTHONRUNFIRST='import errornan' python broken_script.py

With numpy you can already do:

export PYTHONRUNFIRST="import numpy; numpy.seterr(all='raise')"

(Except that PYTHONRUNFIRST isn't implemented yet: http://bugs.python.org/issue14803)

Oscar

From storchaka at gmail.com Thu Oct 11 11:00:01 2012 From: storchaka at gmail.com (Serhiy Storchaka) Date: Thu, 11 Oct 2012 12:00:01 +0300 Subject: [Python-ideas] Make undefined escape sequences have SyntaxWarnings In-Reply-To: <20121010221839.51b3470c@pitrou.net> References: <20121010214607.354902d6@pitrou.net> <20121010221839.51b3470c@pitrou.net> Message-ID: On 10.10.12 23:18, Antoine Pitrou wrote:
> On Wed, 10 Oct 2012 23:04:25 +0300
> Serhiy Storchaka
> wrote:
>> On 10.10.12 22:46, Antoine Pitrou wrote:
>>> -1. This will make life more difficult with regular expressions (and
>>> produce lots of spurious warnings in existing code).
>>
>> Strings for regular expressions always should be raw. Now regular
>> expressions supports \u and \U escapes and no reason to use non-raw strings.
>
> That's a style issue, not a language rule.

Yes, of course, that's style advice. Sorry if I used the wrong words. This will not make life more difficult with regular expressions because you always can use raw string literals.
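As a quick illustration of why raw literals make this a non-issue for regular expressions (and of the fact that "\d" only works today because d happens not to be a recognised escape):

>>> import re
>>> re.findall(r'\d+', 'a1b22')      # raw literal: the backslash reaches re intact
['1', '22']
>>> re.findall('\\d+', 'a1b22')      # same thing, spelled without a raw literal
['1', '22']
>>> '\d+' == '\\d+'                  # works only because \d is currently passed through
True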
From solipsis at pitrou.net Thu Oct 11 12:03:10 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Thu, 11 Oct 2012 12:03:10 +0200 Subject: [Python-ideas] Floating point contexts in Python core References: <69C432CC-7BAC-4B55-8BC3-2BF5307FD29C@gmail.com> <50765D0E.4020001@canterbury.ac.nz> Message-ID: <20121011120310.2ff01caf@pitrou.net> On Thu, 11 Oct 2012 18:45:50 +1300 Greg Ewing wrote: > Alexander Belopolsky wrote: > > > I gave this idea +float('inf') in the other thread and was thinking about it > > since. I am now toying with the idea to unify float and decimal in Python. > > Are you sure there would be any point in this? People who > specifically *want* base-2 floats are probably quite happy > with the current float type, and wouldn't appreciate having > it slowed down, even by a small amount. Indeed, I don't see the point either. Decimal's strength over float is to be able to represent *decimal* numbers of arbitrary precision, which is useful because any common human activity uses base-10 numbers. I don't see how adding a new binary float type would help any use case. Regards Antoine. -- Software development and contracting: http://pro.pitrou.net From ubershmekel at gmail.com Thu Oct 11 12:31:14 2012 From: ubershmekel at gmail.com (Yuval Greenfield) Date: Thu, 11 Oct 2012 12:31:14 +0200 Subject: [Python-ideas] Make undefined escape sequences have SyntaxWarnings In-Reply-To: References: <20121010214607.354902d6@pitrou.net> <20121010221839.51b3470c@pitrou.net> Message-ID: http://docs.python.org/release/3.3.0/reference/lexical_analysis.html#string-and-bytes-literals I'm not sure I understand what this line from the docs means: \newline Backslash and newline ignored I understand that row as either "\n" won't appear in the resulting string or that I should get "\\newline". Yuval Greenfield -------------- next part -------------- An HTML attachment was scrubbed... URL: From storchaka at gmail.com Thu Oct 11 12:53:00 2012 From: storchaka at gmail.com (Serhiy Storchaka) Date: Thu, 11 Oct 2012 13:53:00 +0300 Subject: [Python-ideas] Make undefined escape sequences have SyntaxWarnings In-Reply-To: References: <20121010214607.354902d6@pitrou.net> <20121010221839.51b3470c@pitrou.net> Message-ID: On 11.10.12 13:31, Yuval Greenfield wrote: > http://docs.python.org/release/3.3.0/reference/lexical_analysis.html#string-and-bytes-literals > > I'm not sure I understand what this line from the docs means: > > \newline Backslash and newline ignored > > I understand that row as either "\n" won't appear in the resulting > string or that I should get "\\newline". Newline is newline in source code. >>> "a\ ... b" 'ab' Type . Result is "ab". From arnodel at gmail.com Thu Oct 11 13:27:05 2012 From: arnodel at gmail.com (Arnaud Delobelle) Date: Thu, 11 Oct 2012 12:27:05 +0100 Subject: [Python-ideas] PEP 428: poll about the joining syntax In-Reply-To: <50766F3B.3090004@online.de> References: <20121008204707.48559bf9@pitrou.net> <50766F3B.3090004@online.de> Message-ID: On 11 October 2012 08:03, Joachim K?nig wrote: > On 11/10/2012 04:24, Chris Jerdonek wrote: >> >> Or how about path.slash(other_path)? :) >> > and path.backslash(other_path) for windows compaptibility ;-) That's made my day! How about a past participle to express it's not mutating? e.g. 1. path.joined("foo/bar") 2. path.extended("foo", "bar") (or some better and shorter one I can't think of). 
-- Arnaud From steve at pearwood.info Thu Oct 11 13:35:28 2012 From: steve at pearwood.info (Steven D'Aprano) Date: Thu, 11 Oct 2012 22:35:28 +1100 Subject: [Python-ideas] Floating point contexts in Python core In-Reply-To: <50765D0E.4020001@canterbury.ac.nz> References: <69C432CC-7BAC-4B55-8BC3-2BF5307FD29C@gmail.com> <50765D0E.4020001@canterbury.ac.nz> Message-ID: <5076AF00.1010902@pearwood.info> On 11/10/12 16:45, Greg Ewing wrote: > Alexander Belopolsky wrote: > >> I gave this idea +float('inf') in the other thread and was thinking about it >> since. I am now toying with the idea to unify float and decimal in Python. > > Are you sure there would be any point in this? People who > specifically *want* base-2 floats are probably quite happy > with the current float type, and wouldn't appreciate having > it slowed down, even by a small amount. I would gladly give up a small amount of speed for better control over floats, such as whether 1/0.0 raised an exception or returned infinity. If I wanted fast code, I'd be using C. I'm happy with *fast enough*. For example, 1/0.0 in a continued fraction is generally harmless, provided it returns infinity. If it raises an exception, you have to write slow, ugly code to evaluate continued fractions robustly. I wouldn't expect 1/0.0 -> infinity to becomes the default, but I'd like a runtime switch to turn it on and off as needed. -- Steven From alexander.belopolsky at gmail.com Thu Oct 11 14:05:41 2012 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Thu, 11 Oct 2012 08:05:41 -0400 Subject: [Python-ideas] Floating point contexts in Python core In-Reply-To: References: <69C432CC-7BAC-4B55-8BC3-2BF5307FD29C@gmail.com> <50762125.1030201@pearwood.info> Message-ID: <6CF38710-3E9D-473F-B8B8-9D4E876F2390@gmail.com> On Oct 11, 2012, at 4:45 AM, Serhiy Storchaka wrote: > With base 16 floats you can't emulate x86 native 53-bit mantissa floats. I realized that as soon as I hit send. :-( I also realized that it does not matter for python implementation because decimal stores mantissa as an int rather than a list of digits. From sturla at molden.no Thu Oct 11 14:31:31 2012 From: sturla at molden.no (Sturla Molden) Date: Thu, 11 Oct 2012 14:31:31 +0200 Subject: [Python-ideas] Floating point contexts in Python core In-Reply-To: <5076AF00.1010902@pearwood.info> References: <69C432CC-7BAC-4B55-8BC3-2BF5307FD29C@gmail.com> <50765D0E.4020001@canterbury.ac.nz> <5076AF00.1010902@pearwood.info> Message-ID: <5076BC23.7050108@molden.no> On 11.10.2012 13:35, Steven D'Aprano wrote: > I would gladly give up a small amount of speed for better control > over floats, such as whether 1/0.0 raised an exception or > returned infinity. > > If I wanted fast code, I'd be using C. I'm happy with *fast enough*. > > For example, 1/0.0 in a continued fraction is generally harmless, > provided it returns infinity. If it raises an exception, you have > to write slow, ugly code to evaluate continued fractions robustly. > I wouldn't expect 1/0.0 -> infinity to becomes the default, but > I'd like a runtime switch to turn it on and off as needed. For those who use Python for numerical or scientific computing or computer graphics this is a real issue. First: The standard way of dealing with 1/0.0 in this context, since the days of FORTRAN, is to return an inf. Consequently, that is what NumPy does, as does Matlab, R and most C programs and libraries. 
Now compare: >>> 1.0/0.0 Traceback (most recent call last): File "", line 1, in 1.0/0.0 ZeroDivisionError: float division by zero With this: >>> import numpy as np >>> np.float64(1.0)/np.float64(0.0) inf Thus, the NumPy float64 scalar behaves differently from the Python float scalar! In less than trivial expressions, we can have a combination of Python floats and ints and NumPy types (arrays or scalars). What this means is that the behavior is undefined. You might get an inf, or you might get an exception. Who can tell? The issue also affects integers: >>> 1/0 Traceback (most recent call last): File "", line 1, in 1/0 ZeroDivisionError: integer division or modulo by zero whereas: >>> np.int64(1)/np.int64(0) 0 >>> np.int32(1)/np.int32(0) 0 And with arrays: >>> np.ones(10, dtype=np.int)/np.zeros(10, dtype=np.int) array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0]) >>> np.ones(10, dtype=np.float64)/np.zeros(10, dtype=np.float64) array([ inf, inf, inf, inf, inf, inf, inf, inf, inf, inf]) I think for the sake of us who actually need computation -- believe it or not, Python is rapidly becoming the language of choice for numerical computing -- it would be very nice is this was controllable. Not just the behavior of floats, but also the behavior of ints. A global switch in the sys module would make life a lot easier. Even better would be a context manager that allows us to set up a "numerical" context for local expressions using a with statement. That would not have a lasting effect, but just affect the context. Preferably it should not even propagate across function calls. Something like this: def foobar(): 1/0.0 # raise an exception 1/0 # raise an exception with sys.numerical: 1/0.0 # return inf 1/0 # return 0 foobar() (NumPy actually prints divide by zero warnings on their first occurrence, but I removed it for clarity.) Sturla Molden From rosuav at gmail.com Thu Oct 11 15:18:20 2012 From: rosuav at gmail.com (Chris Angelico) Date: Fri, 12 Oct 2012 00:18:20 +1100 Subject: [Python-ideas] Floating point contexts in Python core In-Reply-To: <5076BC23.7050108@molden.no> References: <69C432CC-7BAC-4B55-8BC3-2BF5307FD29C@gmail.com> <50765D0E.4020001@canterbury.ac.nz> <5076AF00.1010902@pearwood.info> <5076BC23.7050108@molden.no> Message-ID: On Thu, Oct 11, 2012 at 11:31 PM, Sturla Molden wrote: > A global switch in the sys module would make life a lot easier. Even better > would be a context manager that allows us to set up a "numerical" context > for local expressions using a with statement. That would not have a lasting > effect, but just affect the context. Preferably it should not even propagate > across function calls. Something like this: > > > def foobar(): > 1/0.0 # raise an exception > 1/0 # raise an exception > > with sys.numerical: > 1/0.0 # return inf > 1/0 # return 0 > foobar() Not propagating across function calls strikes me as messy, but I see why you'd want it. Would this be better as a __future__ directive? There's already the concept that they apply to a module but not to what that module calls. 
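For comparison, numpy already ships both flavours of switch for its own types: numpy.seterr() changes the behaviour globally, while numpy.errstate() is a context manager that restores the previous settings on exit. It affects numpy scalars and arrays only, not the builtin float, but it is roughly the shape being proposed here:

>>> import numpy as np
>>> with np.errstate(divide='ignore'):
...     np.float64(1.0) / np.float64(0.0)
...
inf
>>> with np.errstate(divide='raise'):
...     np.float64(1.0) / np.float64(0.0)
...
Traceback (most recent call last):
  ...
FloatingPointError: divide by zero encountered in double_scalars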
ChrisA From oscar.j.benjamin at gmail.com Thu Oct 11 15:36:11 2012 From: oscar.j.benjamin at gmail.com (Oscar Benjamin) Date: Thu, 11 Oct 2012 14:36:11 +0100 Subject: [Python-ideas] Floating point contexts in Python core In-Reply-To: References: <69C432CC-7BAC-4B55-8BC3-2BF5307FD29C@gmail.com> <50765D0E.4020001@canterbury.ac.nz> <5076AF00.1010902@pearwood.info> <5076BC23.7050108@molden.no> Message-ID: On 11 October 2012 14:18, Chris Angelico wrote: > On Thu, Oct 11, 2012 at 11:31 PM, Sturla Molden wrote: >> A global switch in the sys module would make life a lot easier. Even better >> would be a context manager that allows us to set up a "numerical" context >> for local expressions using a with statement. That would not have a lasting >> effect, but just affect the context. Preferably it should not even propagate >> across function calls. Something like this: >> >> >> def foobar(): >> 1/0.0 # raise an exception >> 1/0 # raise an exception >> >> with sys.numerical: >> 1/0.0 # return inf >> 1/0 # return 0 >> foobar() > > Not propagating across function calls strikes me as messy, but I see > why you'd want it. Would this be better as a __future__ directive? > There's already the concept that they apply to a module but not to > what that module calls. __future__ directives are for situations in which the default behaviour will be changed in the future but you want to get the new behaviour now. The proposal is to always have widely supported, convenient ways to switch between different handling modes for numerical operations. The default Python behaviour would be unchanged by this. Oscar From rosuav at gmail.com Thu Oct 11 16:11:45 2012 From: rosuav at gmail.com (Chris Angelico) Date: Fri, 12 Oct 2012 01:11:45 +1100 Subject: [Python-ideas] Floating point contexts in Python core In-Reply-To: References: <69C432CC-7BAC-4B55-8BC3-2BF5307FD29C@gmail.com> <50765D0E.4020001@canterbury.ac.nz> <5076AF00.1010902@pearwood.info> <5076BC23.7050108@molden.no> Message-ID: On Fri, Oct 12, 2012 at 12:36 AM, Oscar Benjamin wrote: > On 11 October 2012 14:18, Chris Angelico wrote: >> Not propagating across function calls strikes me as messy, but I see >> why you'd want it. Would this be better as a __future__ directive? >> There's already the concept that they apply to a module but not to >> what that module calls. > > __future__ directives are for situations in which the default > behaviour will be changed in the future but you want to get the new > behaviour now. The proposal is to always have widely supported, > convenient ways to switch between different handling modes for > numerical operations. The default Python behaviour would be unchanged > by this. Sure, it's not perfect for __future__ either, but it does seem odd for a function invocation to suddenly change semantics. This change "feels" to me more like a try/catch block - it's a change to this code that causes different behaviour around error conditions. That ought to continue into a called function. 
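The decimal module's contexts already work that way: the active context is thread-local state, so it applies to whatever the with block calls until the block exits. A small illustration:

>>> from decimal import Decimal, localcontext, DivisionByZero
>>> def helper():
...     return Decimal(1) / Decimal(0)
...
>>> with localcontext() as ctx:
...     ctx.traps[DivisionByZero] = 0   # stop trapping inside this block
...     helper()                        # the called function sees the context too
...
Decimal('Infinity')

Outside the block the default context is restored, so the same helper() call raises decimal.DivisionByZero again.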
ChrisA From solipsis at pitrou.net Thu Oct 11 16:40:43 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Thu, 11 Oct 2012 16:40:43 +0200 Subject: [Python-ideas] asyncore: included batteries don't fit References: <5072C972.5070207@python.org> <50736C0F.90401@python.org> <20121011005523.GA43928@snakebite.org> Message-ID: <20121011164043.216164d3@pitrou.net> On Wed, 10 Oct 2012 20:55:23 -0400 Trent Nelson wrote: > > You could leverage this with kqueue and epoll; have similar threads > set up to simply process I/O independent of the GIL, using the same > facilities that would be used by IOCP-processing threads. Would you really win anything by doing I/O in separate threads, while doing normal request processing in the main thread? That said, the idea of a common API architected around async I/O, rather than non-blocking I/O, sounds interesting at least theoretically. Maybe all those outdated Snakebite Operating Systems are useful for something after all. ;-P cheers Antoine. -- Software development and contracting: http://pro.pitrou.net From guido at python.org Thu Oct 11 16:54:35 2012 From: guido at python.org (Guido van Rossum) Date: Thu, 11 Oct 2012 07:54:35 -0700 Subject: [Python-ideas] Floating point contexts in Python core In-Reply-To: References: <69C432CC-7BAC-4B55-8BC3-2BF5307FD29C@gmail.com> <50765D0E.4020001@canterbury.ac.nz> Message-ID: I think you're mistaking my suggestion. I meant to recommend that there should be a way to control the behavior (e.g. whether to silently return Nan/Inf or raise an exception) of floating point operations, using the capabilities of the hardware as exposed through C, using Python's existing float type. I did not for a second consider reimplementing IEEE 754 from scratch. Therein lies insanity. That's also why I recommended you look at the fpectl module. -- --Guido van Rossum (python.org/~guido) From storchaka at gmail.com Thu Oct 11 16:55:48 2012 From: storchaka at gmail.com (Serhiy Storchaka) Date: Thu, 11 Oct 2012 17:55:48 +0300 Subject: [Python-ideas] Floating point contexts in Python core In-Reply-To: <5076BC23.7050108@molden.no> References: <69C432CC-7BAC-4B55-8BC3-2BF5307FD29C@gmail.com> <50765D0E.4020001@canterbury.ac.nz> <5076AF00.1010902@pearwood.info> <5076BC23.7050108@molden.no> Message-ID: On 11.10.12 15:31, Sturla Molden wrote: > >>> np.int64(1)/np.int64(0) > 0 > > >>> np.int32(1)/np.int32(0) > 0 For such behavior must be some rationale. From oscar.j.benjamin at gmail.com Thu Oct 11 17:52:43 2012 From: oscar.j.benjamin at gmail.com (Oscar Benjamin) Date: Thu, 11 Oct 2012 16:52:43 +0100 Subject: [Python-ideas] Floating point contexts in Python core In-Reply-To: References: <69C432CC-7BAC-4B55-8BC3-2BF5307FD29C@gmail.com> <50765D0E.4020001@canterbury.ac.nz> <5076AF00.1010902@pearwood.info> <5076BC23.7050108@molden.no> Message-ID: On 11 October 2012 15:55, Serhiy Storchaka wrote: > On 11.10.12 15:31, Sturla Molden wrote: >> >> >>> np.int64(1)/np.int64(0) >> 0 >> >> >>> np.int32(1)/np.int32(0) >> 0 > > > For such behavior must be some rationale. 
I don't know what the rationale for that is but it is at least controllable in numpy: >>> import numpy as np >>> np.seterr(all='raise') # Exceptions instead of mostly useless values {'over': 'raise', 'divide': 'raise', 'invalid': 'raise', 'under': 'raise'} >>> np.int32(1) / np.int32(0) Traceback (most recent call last): File "", line 1, in FloatingPointError: divide by zero encountered in long_scalars >>> np.float32(1e20) * np.float32(1e20) Traceback (most recent call last): File "", line 1, in FloatingPointError: overflow encountered in float_scalars >>> np.float32('inf') inf >>> np.float32('inf') / np.float32('inf') Traceback (most recent call last): File "", line 1, in FloatingPointError: invalid value encountered in float_scalars Oscar From stephen at xemacs.org Thu Oct 11 18:05:33 2012 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Fri, 12 Oct 2012 01:05:33 +0900 Subject: [Python-ideas] Floating point contexts in Python core In-Reply-To: <5076AF00.1010902@pearwood.info> References: <69C432CC-7BAC-4B55-8BC3-2BF5307FD29C@gmail.com> <50765D0E.4020001@canterbury.ac.nz> <5076AF00.1010902@pearwood.info> Message-ID: <87wqyx6lki.fsf@uwakimon.sk.tsukuba.ac.jp> Steven D'Aprano writes: > I would gladly give up a small amount of speed for better control > over floats, such as whether 1/0.0 raised an exception or > returned infinity. Isn't that what the fpectl module is supposed to buy, albeit much less pleasantly than Decimal contexts do? From oscar.j.benjamin at gmail.com Thu Oct 11 19:17:33 2012 From: oscar.j.benjamin at gmail.com (Oscar Benjamin) Date: Thu, 11 Oct 2012 18:17:33 +0100 Subject: [Python-ideas] Floating point contexts in Python core In-Reply-To: References: <69C432CC-7BAC-4B55-8BC3-2BF5307FD29C@gmail.com> <50765D0E.4020001@canterbury.ac.nz> Message-ID: On 11 October 2012 15:54, Guido van Rossum wrote: > I think you're mistaking my suggestion. I meant to recommend that > there should be a way to control the behavior (e.g. whether to > silently return Nan/Inf or raise an exception) of floating point > operations, using the capabilities of the hardware as exposed through > C, using Python's existing float type. I did not for a second consider > reimplementing IEEE 754 from scratch. Therein lies insanity. > > That's also why I recommended you look at the fpectl module. I would like to have precisely the functionality you are suggesting and I don't want to reimplement anything (I assume this message is intended for me since it was addressed to me). I don't know enough about the implementation details to agree on the hardware capabilities part. From a quick glance at the fpectl module I see that it has problems with portability: http://docs.python.org/library/fpectl.html#fpectl-limitations Setting up a given processor to trap IEEE-754 floating point errors currently requires custom code on a per-architecture basis. You may have to modify fpectl to control your particular hardware. This presumably explains why I don't have the module in my Windows build or on the Linux machines in the HPC cluster I use. Are these problems that can be overcome? If it is necessary to have this hardware-specific accelerator for floating point exceptions then is it reasonable to expect implementations other than CPython to be able to match the semantics of floating point contexts without a significant degradation in performance? I was expecting the implementation to be some checks in straight forward C code for invalid values. 
I would expect this to cause a small degradation in performance (the kind that you wouldn't notice unless you went out of your way to measure it). Python already does this by checking for a zero value on every division. As far as I can tell from the numpy codebase this is how it works there. This function seems to be responsible for the integer division by zero result in numpy: https://github.com/numpy/numpy/blob/master/numpy/core/src/scalarmathmodule.c.src#L271 >>> import numpy as np >>> np.seterr() {'over': 'warn', 'divide': 'warn', 'invalid': 'warn', 'under': 'ignore'} >>> np.int32(1) / np.int32(0) __main__:1: RuntimeWarning: divide by zero encountered in long_scalars 0 >>> np.seterr(divide='ignore') {'over': 'warn', 'divide': 'warn', 'invalid': 'warn', 'under': 'ignore'} >>> np.int32(1) / np.int32(0) 0 >>> np.seterr(divide='raise') {'over': 'warn', 'divide': 'ignore', 'invalid': 'warn', 'under': 'ignore'} >>> np.int32(1) / np.int32(0) Traceback (most recent call last): File "", line 1, in FloatingPointError: divide by zero encountered in long_scalars This works perfectly well in numpy and also in decimal I see no reason why it couldn't work for float/int. But what would would be even better is if you could control all of them with a single context manager. Typically I don't care with the error occurred as a result of operations on ints/floats/ndarrays/decimals I just know that I got a NaN from somewhere and I need to debug it. Oscar From matt at whoosh.ca Thu Oct 11 19:33:38 2012 From: matt at whoosh.ca (Matt Chaput) Date: Thu, 11 Oct 2012 13:33:38 -0400 Subject: [Python-ideas] PEP 428: poll about the joining syntax In-Reply-To: <20121008204707.48559bf9@pitrou.net> References: <20121008204707.48559bf9@pitrou.net> Message-ID: <507702F2.8040008@whoosh.ca> > - `p[q]` joins path q to path p -1 > - `p + q` joins path q to path p +1 > - `p / q` joins path q to path p +0 > - `p.join(q)` joins path q to path p +1 I think .join() should be the "obvious" way to do it and + should be a shortcut. Matt From oscar.j.benjamin at gmail.com Thu Oct 11 20:42:55 2012 From: oscar.j.benjamin at gmail.com (Oscar Benjamin) Date: Thu, 11 Oct 2012 19:42:55 +0100 Subject: [Python-ideas] Floating point contexts in Python core In-Reply-To: <87wqyx6lki.fsf@uwakimon.sk.tsukuba.ac.jp> References: <69C432CC-7BAC-4B55-8BC3-2BF5307FD29C@gmail.com> <50765D0E.4020001@canterbury.ac.nz> <5076AF00.1010902@pearwood.info> <87wqyx6lki.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On 11 October 2012 17:05, Stephen J. Turnbull wrote: > Steven D'Aprano writes: > > > I would gladly give up a small amount of speed for better control > > over floats, such as whether 1/0.0 raised an exception or > > returned infinity. > > Isn't that what the fpectl module is supposed to buy, albeit much less > pleasantly than Decimal contexts do? But the fpectl module IIUC wouldn't work for 1 / 0. Since Python has managed to unify integer/float division now it would be a shame to introduce any new reasons to bring in superfluous .0s again: with context(zero_division='infinity'): x = 1 / 0.0 # float('inf') y = 1 / 0 # I'd like to see float('inf') here as well I've spent 4 hours this week in computer labs with students using Python 2.7 as an introduction to scientific programming. A significant portion of that time was spent explaining the int/float division problem. They all get the issue now but not all of them understand that it is specifically about division: many are putting .0s everywhere. 
I expect it to be easier when we use Python 3 and I can simply explain that there are two types of division with two different operators. Oscar From guido at python.org Thu Oct 11 20:45:02 2012 From: guido at python.org (Guido van Rossum) Date: Thu, 11 Oct 2012 11:45:02 -0700 Subject: [Python-ideas] asyncore: included batteries don't fit In-Reply-To: <5074C4E7.60708@canterbury.ac.nz> References: <5073EA4F.8030405@canterbury.ac.nz> <5074C4E7.60708@canterbury.ac.nz> Message-ID: Tue, Oct 9, 2012 at 5:44 PM, Greg Ewing wrote: > Guido van Rossum wrote: > >> Indeed, in NDB this works great. However tracebacks don't work so >> great: If you don't catch the exception right away, it takes work to >> make the tracebacks look right when you catch it a few generator calls >> down on the (conceptual) stack. I fixed this to some extent in NDB, by >> passing the traceback explicitly along when setting an exception on a >> Future; > > > Was this before or after the recent change that was supposed > to improve tracebacks from yield-fram chains? If there's still > a problem after that, maybe exception handling in yield-from > requires some more work. Sadly it was with Python 2.5/2.7... >> But so far when thinking about this >> recently I have found the goal elusive -- > > >> Perhaps you can clear things up by >> >> showing some detailed (but still simple enough) example code to handle >> e.g. a simple web client? > > > You might like to take a look at this, where I develop a series of > examples culminating in a simple multi-threaded server: > > http://www.cosc.canterbury.ac.nz/greg.ewing/python/generators/yf_current/Examples/Scheduler/scheduler.txt Definitely very enlightening. Though I think you should not use 'thread' since that term is already reserved for OS threads as supported by the threading module. In NDB I chose to use 'tasklet' -- while that also has other meanings, its meaning isn't fixed in core Python. You could also use task, which also doesn't have a core Python meaning. Just don't call it "process", never mind that Erlang uses this (a number of other languages rooted in old traditions do too, I believe). Also I think you can now revisit it and rewrite the code to use Python 3.3. > Code here: > > http://www.cosc.canterbury.ac.nz/greg.ewing/python/generators/yf_current/Examples/Scheduler/ It does bother me somehow that you're not using .send() and yield arguments at all. I notice that you have a lot ofthree-line code blocks like this: block_for_reading(sock) yield data = sock.recv(1024) The general form seems to be: arrange for a callback when some operation can be done without blocking yield do the operation This seems to be begging to be collapsed into a single line, e.g. data = yield sock.recv_async(1024) (I would also prefer to see the socket wrapped in an object that makes it hard to accidentally block.) >> somehow it seems there *has* >> to be a distinction between an operation you just *yield* (this would >> be waiting for a specific low-level I/O operation) and something you >> use with yield-from, which returns a value through StopIteration. > > It may be worth noting that nothing in my server example uses 'yield' > to send or receive values -- yield is only used without argument as > a suspension point. But the functions containing the yields *are* > called with yield-from and may return values via StopIteration. Yeah, but see my remark above... > So I think there are (at least) two distinct ways of using generators, > but the distinction isn't quite the one you're making. 
Rather, we > have "coroutines" (don't yield values, do return values) and > "iterators" (do yield values, don't return values). But surely there's still a place for send() and other PEP 342 features? > Moreover, it's *only* the "coroutine" variety that we need to cater > for when designing an async event system. Does that help to > alleviate any of your monad-induced headaches? Not entirely, no. I now have a fair amount experience writing an async system and helping users make sense of its error messages, and there are some practical considerations. E.g. my users sometimes want to treat something as a coroutine but they don't have any yields in it (perhaps they are writing skeleton code and plan to fill in the I/O later). Example: def caller(): data = yield from reader() def reader(): return 'dummy' yield works, but if you drop the yield it doesn't work. With a decorator I know how to make it work either way. -- --Guido van Rossum (python.org/~guido) From guido at python.org Thu Oct 11 20:46:34 2012 From: guido at python.org (Guido van Rossum) Date: Thu, 11 Oct 2012 11:46:34 -0700 Subject: [Python-ideas] Floating point contexts in Python core In-Reply-To: References: <69C432CC-7BAC-4B55-8BC3-2BF5307FD29C@gmail.com> <50765D0E.4020001@canterbury.ac.nz> <5076AF00.1010902@pearwood.info> <87wqyx6lki.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Thu, Oct 11, 2012 at 11:42 AM, Oscar Benjamin wrote: > On 11 October 2012 17:05, Stephen J. Turnbull wrote: >> Steven D'Aprano writes: >> >> > I would gladly give up a small amount of speed for better control >> > over floats, such as whether 1/0.0 raised an exception or >> > returned infinity. >> >> Isn't that what the fpectl module is supposed to buy, albeit much less >> pleasantly than Decimal contexts do? > > But the fpectl module IIUC wouldn't work for 1 / 0. Since Python has > managed to unify integer/float division now it would be a shame to > introduce any new reasons to bring in superfluous .0s again: > > with context(zero_division='infinity'): > x = 1 / 0.0 # float('inf') > y = 1 / 0 # I'd like to see float('inf') here as well > > I've spent 4 hours this week in computer labs with students using > Python 2.7 as an introduction to scientific programming. A significant > portion of that time was spent explaining the int/float division > problem. They all get the issue now but not all of them understand > that it is specifically about division: many are putting .0s > everywhere. I expect it to be easier when we use Python 3 and I can > simply explain that there are two types of division with two different > operators. You could have just told them to "from __future__ import division" -- --Guido van Rossum (python.org/~guido) From oscar.j.benjamin at gmail.com Thu Oct 11 21:12:37 2012 From: oscar.j.benjamin at gmail.com (Oscar Benjamin) Date: Thu, 11 Oct 2012 20:12:37 +0100 Subject: [Python-ideas] Floating point contexts in Python core In-Reply-To: References: <69C432CC-7BAC-4B55-8BC3-2BF5307FD29C@gmail.com> <50765D0E.4020001@canterbury.ac.nz> <5076AF00.1010902@pearwood.info> <87wqyx6lki.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On 11 October 2012 19:46, Guido van Rossum wrote: > On Thu, Oct 11, 2012 at 11:42 AM, Oscar Benjamin > wrote: >> I've spent 4 hours this week in computer labs with students using >> Python 2.7 as an introduction to scientific programming. A significant >> portion of that time was spent explaining the int/float division >> problem. 
They all get the issue now but not all of them understand >> that it is specifically about division: many are putting .0s >> everywhere. I expect it to be easier when we use Python 3 and I can >> simply explain that there are two types of division with two different >> operators. > > You could have just told them to "from __future__ import division" I know but the reason for choosing Python is the low barrier to getting started with procedural programming. When they're having trouble understanding the difference between the Python shell and the OS shell I'd like to avoid introducing the concept that the interpreter can change its calculation modes dynamically and forget those changes when you restart it. It's also unfortunate for the students to know that some of the things they're seeing on day one will change in the next version (you can't just tell people to import things from the "future" without some kind of explanation). I used the opportunity to think a little bit about types by running type(x) and explain that different types of objects behave differently. I would rather explain that using genuinely incompatible types like strings and numbers than ints and floats though. Oscar From tjreedy at udel.edu Thu Oct 11 22:44:46 2012 From: tjreedy at udel.edu (Terry Reedy) Date: Thu, 11 Oct 2012 16:44:46 -0400 Subject: [Python-ideas] asyncore: included batteries don't fit In-Reply-To: References: <5073EA4F.8030405@canterbury.ac.nz> <5074C4E7.60708@canterbury.ac.nz> Message-ID: On 10/11/2012 2:45 PM, Guido van Rossum wrote: > Tue, Oct 9, 2012 at 5:44 PM, Greg Ewing wrote: >> You might like to take a look at this, where I develop a series of >> examples culminating in a simple multi-threaded server: >> >> http://www.cosc.canterbury.ac.nz/greg.ewing/python/generators/yf_current/Examples/Scheduler/scheduler.txt > > Definitely very enlightening. Though I think you should not use > 'thread' since that term is already reserved for OS threads as > supported by the threading module. In NDB I chose to use 'tasklet' -- I read through this also and agree that using 'thread' for 'task', 'tasklet', 'micrethread', or whatever is distracting. Part of the point, to me, is that the code does *not* use (OS) threads and the thread module. Tim Peters intended iterators, including generators, to be an alternative to what he viewed as 'inside-out' callback code. The idea was that pausing where appropriate allowed code that belongs together to be kept together. I find generator-based event loops to be somewhat easier to understand than callback-based loops. I certainly was more comfortable with Greg's example than what I have read about twisted. So I would like to see a generator-based system in the stdlib. -- Terry Jan Reedy From guido at python.org Thu Oct 11 23:18:50 2012 From: guido at python.org (Guido van Rossum) Date: Thu, 11 Oct 2012 14:18:50 -0700 Subject: [Python-ideas] asyncore: included batteries don't fit In-Reply-To: References: <20120922163106.GA18772@hephaistos.amsuess.com> Message-ID: On Tue, Oct 9, 2012 at 11:00 AM, Laurens Van Houtven <_ at lvh.cc> wrote: > Oh my me. This is a very long thread that I probably should have replied to > a long time ago. This thread is intensely long right now, and tonight is the > first chance I've had to try and go through it comprehensively. I'll try to > reply to individual points made in the thread -- if I missed yours, please > don't be offended, I promise it's my fault :) No problem, I'm running behind myself... 
> FYI, I'm the sucker who originally got tricked into starting PEP 3153, aka > async-pep. I suppose that's your pet name for it. :-) For most everyone else it's PEP 3153. > First of all, I'm glad to see that there's some more "let's get that pep > along" movement. I tabled it because: > > a) I didn't have enough time to contribute, > b) a lot of promised contributions ended up not happening when it came down > to it, which was incredibly demotivating. The combination of this thread, > plus the fact that I was strong armed at Pycon ZA by a bunch of community > members that shall not be named (Alex, Armin, Maciej, Larry ;-)) into > exploring this thing again. > > First of all, I don't feel async-pep is an attempt at twisted light in the > stdlib. Other than separation of transport and protocol, there's not really > much there that even smells of twisted (especially since right now I'd > probably throw consumers/producers out) -- and that separation is simply > good practice. Twisted does the same thing, but it didn't invent it. > Furthermore, the advantages seem clear: reusability and testability are more > than enough for me. > > If there's one take away idea from async-pep, it's reusable protocols. Is there a newer version that what's on http://www.python.org/dev/peps/pep-3153/ ? It seems to be missing any specific proposals, after spending a lot of time giving a rationale and defining some terms. The version on https://github.com/lvh/async-pep doesn't seem to be any more complete. > The PEP should probably be a number of PEPs. At first sight, it seems that > this number is at least four: > > 1. Protocol and transport abstractions, making no mention of asynchronous IO > (this is what I want 3153 to be, because it's small, manageable, and > virtually everyone appears to agree it's a fantastic idea) But the devil is in the details. *What* specifically are you proposing? How would you write a protocol handler/parser without any reference to I/O? Most protocols are two-way streets -- you read some stuff, and you write some stuff, then you read some more. (HTTP may be the exception here, if you don't keep the connection open.) > 2. A base reactor interface I agree that this should be a separate PEP. But I do think that in practice there will be dependencies between the different PEPs you are proposing. > 3. A way of structuring callbacks: probably deferreds with a built-in > inlineCallbacks for people who want to write synchronous-looking code with > explicit yields for asynchronous procedures Your previous two ideas sound like you're not tied to backward compatibility with Tornado and/or Twisted (not even via an adaptation layer). Given that we're talking Python 3.4 here that's fine with me (though I think we should be careful to offer a path forward for those packages and their users, even if it means making changes to the libraries). But Twisted Deferred is pretty arcane, and I would much rather not use it as the basis of a forward-looking design. I'd much rather see what we can mooch off PEP 3148 (Futures). > 4+ adapting the stdlib tools to using these new things We at least need to have an idea for how this could be done. We're talking serious rewrites of many of our most fundamental existing synchronous protocol libraries (e.g. httplib, email, possibly even io.TextWrapper), most of which have had only scant updates even through the Python 3 transition apart from complications to deal with the bytes/str dichotomy. > Re: forward path for existing asyncore code. 
I don't remember this being > raised as an issue. If anything, it was mentioned in passing, and I think > the answer to it was something to the tune of "asyncore's API is broken, > fixing it is more important than backwards compat". Essentially I agree with > Guido that the important part is an upgrade path to a good third-party > library, which is the part about asyncore that REALLY sucks right now. I have the feeling that the main reason asyncore sucks is that it requires you to subclass its Dispatcher class, which has a rather treacherous interface. > Regardless, an API upgrade is probably a good idea. I'm not sure if it > should go in the first PEP: given the separation I've outlined above (which > may be too spread out...), there's no obvious place to put it besides it > being a new PEP. Aren't all your proposals API upgrades? > Re base reactor interface: drawing maximally from the lessons learned in > twisted, I think IReactorCore (start, stop, etc), IReactorTime (call later, > etc), asynchronous-looking name lookup, fd handling are the important parts. That actually sounds more concrete than I'd like a reactor interface to be. In the App Engine world, there is a definite need for a reactor, but it cannot talk about file descriptors at all -- all I/O is defined in terms of RPC operations which have their own (several layers of) async management but still need to be plugged in to user code that might want to benefit from other reactor functionality such as scheduling and placing a call at a certain moment in the future. > call_every can be implemented in terms of call_later on a separate object, > so I think it should be (eg twisted.internet.task.LoopingCall). One thing > that is apparently forgotten about is event loop integration. The prime way > of having two event loops cooperate is *NOT* "run both in parallel", it's > "have one call the other". Even though not all loops support this, I think > it's important to get this as part of the interface (raise an exception for > all I care if it doesn't work). This is definitely one of the things we ought to get right. My own thoughts are slightly (perhaps only cosmetically) different again: ideally each event loop would have a primitive operation to tell it to run for a little while, and then some other code could tie several event loops together. Possibly the primitive operation would be something like "block until either you've got one event ready, or until a certain time (possibly 0) has passed without any events, and then give us the events that are ready and a lower bound for when you might have more work to do" -- or maybe instead of returning the event(s) it could just call the associated callback (it might have to if it is part of a GUI library that has callbacks written in C/C++ for certain events like screen refreshes). Anyway, it would be good to have input from representatives from Wx, Qt, Twisted and Tornado to ensure that the *functionality* required is all there (never mind the exact signatures of the APIs needed to provide all that functionality). 
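For illustration, a minimal sketch of the "run for a little while" primitive described above, plus a naive driver that ties several loops together. The EventLoop interface and the run_once() signature here are hypothetical, not an existing API:

    class EventLoop:
        def run_once(self, timeout):
            """Block until one event is ready or `timeout` seconds pass.

            Dispatch the callbacks for whatever events are ready, then
            return a lower bound (in seconds) on when this loop next
            expects to have work to do, or None if it is idle.
            """
            raise NotImplementedError

    def run_together(loops, time_slice=0.05):
        # Naive driver: round-robin over the loops, giving each a short
        # slice.  A real driver would use the returned lower bounds to
        # sleep instead of busy-looping.
        while True:
            for loop in loops:
                loop.run_once(time_slice)
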
-- --Guido van Rossum (python.org/~guido) From guido at python.org Fri Oct 12 00:28:18 2012 From: guido at python.org (Guido van Rossum) Date: Thu, 11 Oct 2012 15:28:18 -0700 Subject: [Python-ideas] asyncore: included batteries don't fit In-Reply-To: References: Message-ID: On Mon, Oct 8, 2012 at 10:12 PM, Ben Darnell wrote: > On Mon, Oct 8, 2012 at 8:30 AM, Guido van Rossum wrote: >>> It's a Future constructor, a (conditional) add_done_callback, plus the >>> calls to set_result or set_exception and the with statement for error >>> handling. In full: >>> >>> def future_wrap(f): >>> @functools.wraps(f) >>> def wrapper(*args, **kwargs): >>> future = Future() >>> if kwargs.get('callback') is not None: >>> future.add_done_callback(kwargs.pop('callback')) >>> kwargs['callback'] = future.set_result >>> def handle_error(typ, value, tb): >>> future.set_exception(value) >>> return True >>> with ExceptionStackContext(handle_error): >>> f(*args, **kwargs) >>> return future >>> return wrapper >> >> Hmm... I *think* it automatically adds a special keyword 'callback' to >> the *call* site so that you can do things like >> >> fut = some_wrapped_func(blah, callback=my_callback) >> >> and then instead of using yield to wait for the callback, put the >> continuation of your code in the my_callback() function. > > Yes. Note that if you're passing in a callback you're probably going > to just ignore the return value. The callback argument and the future > return value are essentially two alternative interfaces; it probably > doesn't make sense to use both at once (but as a library author it's > useful to provide both). Definitely sounds like something that could be simplified if you didn't have backward compatibility baggage... >> But it also >> seems like it passes callback=future.set_result as the callback to the >> wrapped function, which looks to me like that function was apparently >> written before Futures were widely used. This seems pretty impure to >> me and I'd like to propose a "future" where such functions either be >> given the Future where the result is expected, or (more commonly) the >> function would create the Future itself. > > Yes, it's impure and based on pre-Future patterns. The caller's > callback argument and the inner function's callback not really related > any more (they were the same in pre-Future async code of course). > They should probably have different names, although if the inner > function's return value were passed via exception (StopIteration or > return) the inner callback argument can just go away. > >> >> Unless I'm totally missing the programming model here. >> >> PS. I'd like to learn more about ExceptionStackContext() -- I've >> struggled somewhat with getting decent tracebacks in NDB. > > StackContext doesn't quite give you better tracebacks, although I > think it could be adapted to do that. ExceptionStackContext is > essentially a try/except block that follows you around across > asynchronous operations - on entry it sets a thread-local state, and > all the tornado asynchronous functions know to save this state when > they are passed a callback, and restore it when they execute it. This > has proven to be extremely helpful in ensuring that all exceptions get > caught by something that knows how to do the appropriate cleanup (i.e. > an asynchronous web page serves an error instead of just spinning > forever), although it has turned out to be a little more intrusive and > magical than I had originally anticipated. 
> > https://github.com/facebook/tornado/blob/master/tornado/stack_context.py Heh. I'll try to mine it for gems. >>>>> In Tornado the Future is created by a decorator >>>>> and hidden from the asynchronous function (it just sees the callback), >>>> >>>> Hm, interesting. NDB goes the other way, the callbacks are mostly used >>>> to make Futures work, and most code (including large swaths of >>>> internal code) uses Futures. I think NDB is similar to monocle here. >>>> In NDB, you can do >>>> >>>> f = >>>> r = yield f >>>> >>>> where "yield f" is mostly equivalent to f.result(), except it gives >>>> better opportunity for concurrency. >>> >>> Yes, tornado's gen.engine does the same thing here. However, the >>> stakes are higher than "better opportunity for concurrency" - in an >>> event loop if you call future.result() without yielding, you'll >>> deadlock if that Future's task needs to run on the same event loop. >> >> That would depend on the semantics of the event loop implementation. >> In NDB's event loop, such a .result() call would just recursively >> enter the event loop, and you'd only deadlock if you actually have two >> pieces of code waiting for each other's completion. > > Hmm, I think I'd rather deadlock. :) If the event loop is reentrant > then the application code has be coded defensively as if it were > preemptively multithreaded, which introduces the possibility of > deadlock or (probably) more subtle/less frequent errors. Reentrancy > has been a significant problem in my experience, so I've been moving > towards a policy where methods in Tornado that take a callback never > run it immediately; callbacks are always scheduled on the next > iteration of the IOLoop with IOLoop.add_callback. The latter is a good tactic and I'm also using it. (Except for some reason we had to add the concept of "immediate callbacks" to our Future class, and those are run inside the set_result() call. But most callbacks don't use that feature.) I don't have a choice about making the event loop reentrant -- App Engine's underlying RPC multiplexing implementation *is* reentrant, and there is a large set of "classic" APIs that I cannot stop the user from calling that reenter it. But even if my hand wasn't forced, I'm not sure if I would make your choice. In NDB, there is a full complement of synchronous APIs that exactly matches the async APIs, and users are free to use the synchronous APIs in parts of their code where they don't need concurrency. Hence, every sychronous API just calls its async sibling and immediately waits for its result, which implicitly invokes the event loop. Of course, I have it easy -- multiple incoming requests are dispatched to separate threads by the App Engine runtime, so I don't have to worry about multiplexing at that level at all -- just end user code that is essentially single-threaded unless they go out of their way. I did end up debugging one user's problem where they were making a synchronous call inside an async handler, and -- very rarely! -- the recursive event loop calls kept stacking up until they hit a StackOverflowError. So I would agree that async code shouldn't make synchronous API calls; but I haven't heard yet from anyone who was otherwise hurt by the recursive event loop invocations -- in particular, nobody has requested locks. Still, this sounds like an important issue to revisit when discussing a standard reactor API as part of Lourens's PEP offensive. >> [...] 
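For illustration, the synchronous-calls-async pattern described above comes down to roughly this; get_async() and Future.get_result() are stand-in names, not an exact API:

    def get(key):
        # Synchronous sibling of get_async(): start the async operation,
        # then wait on its Future.  Waiting spins the event loop
        # (recursively, if we are already inside it) until the result
        # is available.
        fut = get_async(key)
        return fut.get_result()

Each nested synchronous call stacks one more recursive loop invocation, which is where the rare StackOverflowError mentioned above came from.
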
>>>> I am currently trying to understand if using "yield from" (and >>>> returning a value from a generator) will simplify things. For example >>>> maybe the need for a special decorator might go away. But I keep >>>> getting headaches -- perhaps there's a Monad involved. :-) >>> >>> I think if you build generator handling directly into the event loop >>> and use "yield from" for calls from one async function to another then >>> you can get by without any decorators. But I'm not sure if you can do >>> that and maintain any compatibility with existing non-generator async >>> code. >>> >>> I think the ability to return from a generator is actually a bigger >>> deal than "yield from" (and I only learned about it from another >>> python-ideas thread today). The only reason a generator decorated >>> with @tornado.gen.engine needs a callback passed in to it is to act as >>> a psuedo-return, and a real return would prevent the common mistake of >>> running the callback then falling through to the rest of the function. >> >> Ah, so you didn't come up with the clever hack of raising an exception >> to signify the return value. In NDB, you raise StopIteration (though >> it is given the alias 'Return' for clarity) with an argument, and the >> wrapper code that is responsible for the Future takes the value from >> the StopIteration exception and passes it to the Future's >> set_result(). > > I think I may have thought about "raise Return(x)" and dismissed it as > too weird. But then, I'm abnormally comfortable with asynchronous > code that passes callbacks around. As I thought about the issue of how to spell "return a value" and looked at various approaches, I decided I definitely didn't like what monocle does: they let you say "yield X" where X is a non-Future value; and I saw some other solution (Twisted? Phillip Eby?) that simply called a function named something like returnValue(X). But I also wanted it to look like a control statement that ends a block (so auto-indenting editors would auto-dedent the next line), and that means there are only four choices: continue, break, raise or return. Three of those are useless... So the only choice really was which exception to raise. FOrtunately I had the advantage of knowing that PEP 380 was going to implement "return X" from a generator as "raise StopIteration(X)" so I decided to be compatible with that. >>> For concreteness, here's a crude sketch of what the APIs I'm talking >>> about would look like in use (in a hypothetical future version of >>> tornado). >>> >>> @future_wrap >>> @gen.engine >>> def async_http_client(url, callback): >>> parsed_url = urlparse.urlsplit(url) >>> # works the same whether the future comes from a thread pool or @future_wrap >> >> And you need the thread pool because there's no async version of >> getaddrinfo(), right? > > Right. > >> >>> addrinfo = yield g_thread_pool.submit(socket.getaddrinfo, parsed_url.hostname, parsed_url.port) >>> stream = IOStream(socket.socket()) >>> yield stream.connect((addrinfo[0][-1])) >>> stream.write('GET %s HTTP/1.0' % parsed_url.path) >> >> Why no yield in front of the write() call? > > Because we don't need to wait for the write to complete before we > continue to the next statement. write() doesn't return anything; it > just succeeds or fails, and if it fails the next read_until will fail > too. (although in this case it wouldn't hurt to have the yield either) I guess you have a certain kind of buffering built in to your stream? 
So if you make two write() calls without waiting in quick succession, does the system collapse these into one, or does it end up making two system calls, or what? In NDB, there's a similar issue with multiple RPCs that can be batched. I ended up writing an abstraction that automatically combines these; the call isn't actually made until there are no other runnable tasks. I've had to explain this a few times to users who try to get away with overlapping CPU work and I/O, but otherwise it's worked quite well. >>> header_data = yield stream.read_until('\r\n\r\n') >>> headers = parse_headers(header_data) >>> body_data = yield stream.read_bytes(int(headers['Content-Length'])) >>> stream.close() >>> callback(body_data) >>> >>> # another function to demonstrate composability >>> @future_wrap >>> @gen.engine >>> def fetch_some_urls(url1, url2, url3, callback): >>> body1 = yield async_http_client(url1) >>> # yield a list of futures for concurrency >>> future2 = yield async_http_client(url2) >>> future3 = yield async_http_client(url3) >>> body2, body3 = yield [future2, future3] >>> callback((body1, body2, body3)) >> >> This second one is nearly identical to the way we it's done in NDB. >> However I think you have a typo -- I doubt that there should be yields >> on the lines creating future2 and future3. > > Right. > >> >>> One hole in this design is how to deal with callbacks that are run >>> multiple times. For example, the IOStream read methods take both a >>> regular callback and an optional streaming_callback (which is called >>> with each chunk of data as it arrives). I think this needs to be >>> modeled as something like an iterator of Futures, but I haven't worked >>> out the details yet. >> >> Ah. Yes, that's a completely different kind of thing, and probably >> needs to be handled in a totally different way. I think it probably >> needs to be modeled more like an infinite loop where at the blocking >> point (e.g. a low-level read() or accept() call) you yield a Future. >> Although I can see that this doesn't work well with the IOLoop's >> concept of file descriptor (or other event source) registration. > > It works just fine at the IOLoop level: you call > IOLoop.add_handler(fd, func, READ), and you'll get read events > whenever there's new data until you call remove_handler(fd) (or > update_handler). If you're passing callbacks around explicitly it's > pretty straightforward (as much as anything ever is in that style) to > allow for those callbacks to be run more than once. The problem is > that generators more or less require that each callback be run exactly > once. That's a generally desirable property, but the mismatch between > the two layers can be difficult to deal with. Okay, I see that these are useful. However they feel as two very different classes of callbacks -- one that is called when a *specific* piece of I/O that was previously requested is done; another that will be called *whenever* a certain condition becomes true on a certain channel. The former would correspond to e.g. completion of the headers of an incoming HTTP request); the latter might correspond to a "listening" socket receiving another connection. 
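For concreteness, a sketch of the "infinite loop yielding a Future at the blocking point" shape for the listening-socket case; accept_async() and spawn() are hypothetical scheduler primitives:

    def listener(server_sock):
        while True:
            # One Future per event: each accept completes exactly once,
            # so the "run each callback exactly once" property that
            # generators need still holds.
            conn, addr = yield accept_async(server_sock)
            spawn(handle_connection(conn, addr))  # handle in its own task

A streaming_callback, by contrast, would map onto something like an iterator of such Futures, one per chunk.
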
-- --Guido van Rossum (python.org/~guido) From jeanpierreda at gmail.com Fri Oct 12 00:42:55 2012 From: jeanpierreda at gmail.com (Devin Jeanpierre) Date: Thu, 11 Oct 2012 18:42:55 -0400 Subject: [Python-ideas] asyncore: included batteries don't fit In-Reply-To: References: <20120922163106.GA18772@hephaistos.amsuess.com> Message-ID: On Thu, Oct 11, 2012 at 5:18 PM, Guido van Rossum wrote: > On Tue, Oct 9, 2012 at 11:00 AM, Laurens Van Houtven <_ at lvh.cc> wrote: >> Oh my me. This is a very long thread that I probably should have replied to >> a long time ago. This thread is intensely long right now, and tonight is the >> first chance I've had to try and go through it comprehensively. I'll try to >> reply to individual points made in the thread -- if I missed yours, please >> don't be offended, I promise it's my fault :) > > No problem, I'm running behind myself... > >> FYI, I'm the sucker who originally got tricked into starting PEP 3153, aka >> async-pep. > > I suppose that's your pet name for it. :-) For most everyone else it's PEP 3153. > >> First of all, I'm glad to see that there's some more "let's get that pep >> along" movement. I tabled it because: >> >> a) I didn't have enough time to contribute, >> b) a lot of promised contributions ended up not happening when it came down >> to it, which was incredibly demotivating. The combination of this thread, >> plus the fact that I was strong armed at Pycon ZA by a bunch of community >> members that shall not be named (Alex, Armin, Maciej, Larry ;-)) into >> exploring this thing again. >> >> First of all, I don't feel async-pep is an attempt at twisted light in the >> stdlib. Other than separation of transport and protocol, there's not really >> much there that even smells of twisted (especially since right now I'd >> probably throw consumers/producers out) -- and that separation is simply >> good practice. Twisted does the same thing, but it didn't invent it. >> Furthermore, the advantages seem clear: reusability and testability are more >> than enough for me. >> >> If there's one take away idea from async-pep, it's reusable protocols. > > Is there a newer version that what's on > http://www.python.org/dev/peps/pep-3153/ ? It seems to be missing any > specific proposals, after spending a lot of time giving a rationale > and defining some terms. The version on > https://github.com/lvh/async-pep doesn't seem to be any more complete. > >> The PEP should probably be a number of PEPs. At first sight, it seems that >> this number is at least four: >> >> 1. Protocol and transport abstractions, making no mention of asynchronous IO >> (this is what I want 3153 to be, because it's small, manageable, and >> virtually everyone appears to agree it's a fantastic idea) > > But the devil is in the details. *What* specifically are you > proposing? How would you write a protocol handler/parser without any > reference to I/O? Most protocols are two-way streets -- you read some > stuff, and you write some stuff, then you read some more. (HTTP may be > the exception here, if you don't keep the connection open.) > >> 2. A base reactor interface > > I agree that this should be a separate PEP. But I do think that in > practice there will be dependencies between the different PEPs you are > proposing. > >> 3. 
A way of structuring callbacks: probably deferreds with a built-in >> inlineCallbacks for people who want to write synchronous-looking code with >> explicit yields for asynchronous procedures > > Your previous two ideas sound like you're not tied to backward > compatibility with Tornado and/or Twisted (not even via an adaptation > layer). Given that we're talking Python 3.4 here that's fine with me > (though I think we should be careful to offer a path forward for those > packages and their users, even if it means making changes to the > libraries). But Twisted Deferred is pretty arcane, and I would much > rather not use it as the basis of a forward-looking design. I'd much > rather see what we can mooch off PEP 3148 (Futures). Could you be more specific? I've never heard Deferreds in particular called "arcane". They're very popular in e.g. the JS world, and possibly elsewhere. Moreover, they're extremely similar to futures, so if one is arcane so is the other. Maybe if you could elaborate on features of their designs that are better/worse? As far as I know, they mostly differ in that: - Callbacks are added in a pipeline, rather than "in parallel" - Deferreds pass in values along the pipeline, rather than self (and have a separate pipeline for error values). Neither is clearly better or more obvious than the other. If anything I generally find deferred composition more useful than deferred tee-ing, so I feel like composition is the correct base operator, but you could pick another. Either way, each is implementable in terms of the other (ish?). The pipeline approach is particularly nice for the errback pipeline, because it allows chained exception (Failure) handling on the deferred to be very simple. The larger issue is that futures don't make chaining easy at all, even if it is theoretically possible. For example, look at the following Twisted code: http://bpaste.net/show/RfEwoaflO0qY76N8NjHx/ , and imagine how that might generalize to more realistic error handling scenarios. The equivalent Futures code would involve creating one Future per callback in the pipeline and manually hooking them up with a special callback that passes values to the next future. And if we add that to the futures API, the API will almost certainly be somewhat similar to what Twisted has with deferreds and chaining and such. So then, equally arcane. To my mind, it is Futures that need to mooch off of Deferreds, not the other way around. Twisted's Deferreds have a lot of history with making asynchronous computation pleasant, and Futures are missing a lot of good tools. -- Devin From guido at python.org Fri Oct 12 01:37:42 2012 From: guido at python.org (Guido van Rossum) Date: Thu, 11 Oct 2012 16:37:42 -0700 Subject: [Python-ideas] asyncore: included batteries don't fit In-Reply-To: References: <20120922163106.GA18772@hephaistos.amsuess.com> Message-ID: On Thu, Oct 11, 2012 at 3:42 PM, Devin Jeanpierre wrote: > On Thu, Oct 11, 2012 at 5:18 PM, Guido van Rossum wrote: >> [...] Twisted Deferred is pretty arcane, and I would much >> rather not use it as the basis of a forward-looking design. I'd much >> rather see what we can mooch off PEP 3148 (Futures). > > Could you be more specific? I've never heard Deferreds in particular > called "arcane". They're very popular in e.g. the JS world, Really? Twisted is used in the JS world? Or do you just mean the pervasiveness of callback style async programming? 
That's one of the things I am desperately trying to keep out of Python, I find that style unreadable and unmanageable (whenever I click on a button in a website and nothing happens I know someone has a bug in their callbacks). I understand you feel different; but I feel the general sentiment is that callback-based async programming is even harder than multi-threaded programming (and nobody is claiming that threads are easy :-). > and possibly elsewhere. Moreover, they're extremely similar to futures, so > if one is arcane so is the other. I love Futures, they represent a nice simple programming model. But I especially love that you can write async code using Futures and yield-based coroutines (what you call inlineCallbacks) and never have to write an explicit callback function. Ever. > Maybe if you could elaborate on features of their designs that are better/worse? > > As far as I know, they mostly differ in that: > > - Callbacks are added in a pipeline, rather than "in parallel" > - Deferreds pass in values along the pipeline, rather than self (and > have a separate pipeline for error values). These two combined are indeed what mostly feels arcane to me. > Neither is clearly better or more obvious than the other. If anything > I generally find deferred composition more useful than deferred > tee-ing, so I feel like composition is the correct base operator, but > you could pick another. If you're writing long complicated chains of callbacks that benefit from these features, IMO you are already doing it wrong. I understand that this is a matter of style where I won't be able to convince you. But style is important to me, so let's agree to disagree. > Either way, each is implementable in terms of > the other (ish?). The pipeline approach is particularly nice for the > errback pipeline, because it allows chained exception (Failure) > handling on the deferred to be very simple. The larger issue is that > futures don't make chaining easy at all, even if it is theoretically > possible. But as soon as you switch from callbacks to yield-based coroutines the chaining becomes natural, error handling is just a matter of try/except statements (or not if you want the error to bubble up) and (IMO) the code becomes much more readable. > For example, look at the following Twisted code: > http://bpaste.net/show/RfEwoaflO0qY76N8NjHx/ , and imagine how that > might generalize to more realistic error handling scenarios. Looks fine to me. I have a lot of code like that in NDB and it works great. (Note that NDB's Futures are not the same as PEP 3148 Futures, although they have some things in common; in particular NDB Futures are not tied to threads.) > The equivalent Futures code would involve creating one Future per > callback in the pipeline and manually hooking them up with a special > callback that passes values to the next future. And if we add that to > the futures API, the API will almost certainly be somewhat similar to > what Twisted has with deferreds and chaining and such. So then, > equally arcane. The *implementation* of this stuff in NDB is certainly hairy; I already posted the link to the code: http://code.google.com/p/appengine-ndb-experiment/source/browse/ndb/tasklets.py#349 However, this is internal code and doesn't affect the Future API at all. > To my mind, it is Futures that need to mooch off of Deferreds, not the > other way around. Twisted's Deferreds have a lot of history with > making asynchronous computation pleasant, and Futures are missing a > lot of good tools. 
I am totally open to learning from Twisted's experience. I hope that you are willing to share even the end result might not look like Twisted at all -- after all in Python 3.3 we have "yield from" and return from a generator and many years of experience with different styles of async APIs. In addition to Twisted, there's Tornado and Monocle, and then there's the whole greenlets/gevent and Stackless/microthreads community that we can't completely ignore. I believe somewhere is an ideal async architecture, and I hope you can help us discover it. (For example, I am very interested in Twisted's experiences writing real-world performant, robust reactors.) -- --Guido van Rossum (python.org/~guido) From dreamingforward at gmail.com Fri Oct 12 02:08:21 2012 From: dreamingforward at gmail.com (Mark Adam) Date: Thu, 11 Oct 2012 19:08:21 -0500 Subject: [Python-ideas] Floating point contexts in Python core In-Reply-To: <5076AF00.1010902@pearwood.info> References: <69C432CC-7BAC-4B55-8BC3-2BF5307FD29C@gmail.com> <50765D0E.4020001@canterbury.ac.nz> <5076AF00.1010902@pearwood.info> Message-ID: On Thu, Oct 11, 2012 at 6:35 AM, Steven D'Aprano wrote: > On 11/10/12 16:45, Greg Ewing wrote: >> Are you sure there would be any point in this? People who >> specifically *want* base-2 floats are probably quite happy >> with the current float type, and wouldn't appreciate having >> it slowed down, even by a small amount. > > I would gladly give up a small amount of speed for better control > over floats, such as whether 1/0.0 raised an exception or > returned infinity. Umm, you would be giving up a *lot* of speed. Native floating point happens right in the processor, so if you want special behavior, you'd have to take the floating point out of the CPU and into "user space". mark From steve at pearwood.info Fri Oct 12 02:16:05 2012 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 12 Oct 2012 11:16:05 +1100 Subject: [Python-ideas] Floating point contexts in Python core In-Reply-To: <87wqyx6lki.fsf@uwakimon.sk.tsukuba.ac.jp> References: <69C432CC-7BAC-4B55-8BC3-2BF5307FD29C@gmail.com> <50765D0E.4020001@canterbury.ac.nz> <5076AF00.1010902@pearwood.info> <87wqyx6lki.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <50776145.70800@pearwood.info> On 12/10/12 03:05, Stephen J. Turnbull wrote: > Steven D'Aprano writes: > > > I would gladly give up a small amount of speed for better control > > over floats, such as whether 1/0.0 raised an exception or > > returned infinity. > > Isn't that what the fpectl module is supposed to buy, albeit much less > pleasantly than Decimal contexts do? I can't test it, because I don't have that module installed, but I would think not. Reading the docs: http://docs.python.org/library/fpectl.html I would say that fpectl exists to turn on floating point exceptions where Python currently returns an inf or NaN, not to turn on special values where Python currently raises an exception, e.g. 1/0.0. Because it depends on a build-time option, using it is even less convenient that most other non-standard libraries. It only has a single exception type for any of Division by Zero, Overflow and Invalid, and doesn't appear to trap Underflow or Inexact at all. It's not just less pleasant than Decimal contexts, but much less powerful as well. 
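For comparison, the decimal module's contexts already give this kind of per-context control today (standard decimal API, starting from the default context):

    from decimal import Decimal, getcontext, DivisionByZero

    ctx = getcontext()
    ctx.traps[DivisionByZero] = False   # untrapped: return a signed infinity
    print(Decimal(1) / Decimal(0))      # Decimal('Infinity')

    ctx.traps[DivisionByZero] = True    # trapped: raise instead
    Decimal(1) / Decimal(0)             # raises decimal.DivisionByZero
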
-- Steven From tjreedy at udel.edu Fri Oct 12 02:29:05 2012 From: tjreedy at udel.edu (Terry Reedy) Date: Thu, 11 Oct 2012 20:29:05 -0400 Subject: [Python-ideas] asyncore: included batteries don't fit In-Reply-To: References: <20120922163106.GA18772@hephaistos.amsuess.com> Message-ID: On 10/11/2012 5:18 PM, Guido van Rossum wrote: > Anyway, it would be good to have input from representatives from Wx, > Qt, Twisted and Tornado to ensure that the *functionality* required is > all there (never mind the exact signatures of the APIs needed to > provide all that functionality). And of course tk/tkinter (tho perhaps we can represent that). It occurs to me that while i/o (file/socket) events can be added to a user (mouse/key) event loop, and I suspect that some tk/tkinter apps do so, it might be sensible to keep the two separate. A master loop could tell the user-event loop to handle all user events and then the i/o loop to handle one i/o event. This all depends on the relative speed of the handler code. -- Terry Jan Reedy From guido at python.org Fri Oct 12 02:34:33 2012 From: guido at python.org (Guido van Rossum) Date: Thu, 11 Oct 2012 17:34:33 -0700 Subject: [Python-ideas] asyncore: included batteries don't fit In-Reply-To: References: <20120922163106.GA18772@hephaistos.amsuess.com> Message-ID: On Thu, Oct 11, 2012 at 5:29 PM, Terry Reedy wrote: > On 10/11/2012 5:18 PM, Guido van Rossum wrote: > >> Anyway, it would be good to have input from representatives from Wx, >> Qt, Twisted and Tornado to ensure that the *functionality* required is >> all there (never mind the exact signatures of the APIs needed to >> provide all that functionality). > > > And of course tk/tkinter (tho perhaps we can represent that). It occurs to > me that while i/o (file/socket) events can be added to a user (mouse/key) > event loop, and I suspect that some tk/tkinter apps do so, it might be > sensible to keep the two separate. A master loop could tell the user-event > loop to handle all user events and then the i/o loop to handle one i/o > event. This all depends on the relative speed of the handler code. You should talk to a Tcl/Tk user (if there are any left :-). They actually really like the unified event loop that's used for both widget events and network events. Tk is probably also a good example of a hybrid GUI system, where some of the callbacks (e.g. redraw events) are implemented in C. -- --Guido van Rossum (python.org/~guido) From ben at bendarnell.com Fri Oct 12 02:41:57 2012 From: ben at bendarnell.com (Ben Darnell) Date: Thu, 11 Oct 2012 17:41:57 -0700 Subject: [Python-ideas] asyncore: included batteries don't fit In-Reply-To: References: Message-ID: On Thu, Oct 11, 2012 at 3:28 PM, Guido van Rossum wrote: > On Mon, Oct 8, 2012 at 10:12 PM, Ben Darnell wrote: >> On Mon, Oct 8, 2012 at 8:30 AM, Guido van Rossum wrote: >>>> It's a Future constructor, a (conditional) add_done_callback, plus the >>>> calls to set_result or set_exception and the with statement for error >>>> handling. In full: >>>> >>>> def future_wrap(f): >>>> @functools.wraps(f) >>>> def wrapper(*args, **kwargs): >>>> future = Future() >>>> if kwargs.get('callback') is not None: >>>> future.add_done_callback(kwargs.pop('callback')) >>>> kwargs['callback'] = future.set_result >>>> def handle_error(typ, value, tb): >>>> future.set_exception(value) >>>> return True >>>> with ExceptionStackContext(handle_error): >>>> f(*args, **kwargs) >>>> return future >>>> return wrapper >>> >>> Hmm... 
I *think* it automatically adds a special keyword 'callback' to >>> the *call* site so that you can do things like >>> >>> fut = some_wrapped_func(blah, callback=my_callback) >>> >>> and then instead of using yield to wait for the callback, put the >>> continuation of your code in the my_callback() function. >> >> Yes. Note that if you're passing in a callback you're probably going >> to just ignore the return value. The callback argument and the future >> return value are essentially two alternative interfaces; it probably >> doesn't make sense to use both at once (but as a library author it's >> useful to provide both). > > Definitely sounds like something that could be simplified if you > didn't have backward compatibility baggage... Probably, although I still feel like callback-passing has its place. For example, I think the Tornado chat demo (https://github.com/facebook/tornado/blob/master/demos/chat/chatdemo.py) would be less clear with coroutines and Futures than it is now (although it would fit better into Greg's schedule/unschedule style). That doesn't mean that every method has to take a callback, but I'd be reluctant to get rid of them until we have more experience with the generator/future-focused style. >>>>>> In Tornado the Future is created by a decorator >>>>>> and hidden from the asynchronous function (it just sees the callback), >>>>> >>>>> Hm, interesting. NDB goes the other way, the callbacks are mostly used >>>>> to make Futures work, and most code (including large swaths of >>>>> internal code) uses Futures. I think NDB is similar to monocle here. >>>>> In NDB, you can do >>>>> >>>>> f = >>>>> r = yield f >>>>> >>>>> where "yield f" is mostly equivalent to f.result(), except it gives >>>>> better opportunity for concurrency. >>>> >>>> Yes, tornado's gen.engine does the same thing here. However, the >>>> stakes are higher than "better opportunity for concurrency" - in an >>>> event loop if you call future.result() without yielding, you'll >>>> deadlock if that Future's task needs to run on the same event loop. >>> >>> That would depend on the semantics of the event loop implementation. >>> In NDB's event loop, such a .result() call would just recursively >>> enter the event loop, and you'd only deadlock if you actually have two >>> pieces of code waiting for each other's completion. >> >> Hmm, I think I'd rather deadlock. :) If the event loop is reentrant >> then the application code has be coded defensively as if it were >> preemptively multithreaded, which introduces the possibility of >> deadlock or (probably) more subtle/less frequent errors. Reentrancy >> has been a significant problem in my experience, so I've been moving >> towards a policy where methods in Tornado that take a callback never >> run it immediately; callbacks are always scheduled on the next >> iteration of the IOLoop with IOLoop.add_callback. > > The latter is a good tactic and I'm also using it. (Except for some > reason we had to add the concept of "immediate callbacks" to our > Future class, and those are run inside the set_result() call. But most > callbacks don't use that feature.) > > I don't have a choice about making the event loop reentrant -- App > Engine's underlying RPC multiplexing implementation *is* reentrant, > and there is a large set of "classic" APIs that I cannot stop the user > from calling that reenter it. But even if my hand wasn't forced, I'm > not sure if I would make your choice. 
In NDB, there is a full > complement of synchronous APIs that exactly matches the async APIs, > and users are free to use the synchronous APIs in parts of their code > where they don't need concurrency. Hence, every sychronous API just > calls its async sibling and immediately waits for its result, which > implicitly invokes the event loop. Tornado has a synchronous HTTPClient that does the same thing, although each fetch creates and runs its own IOLoop rather than spinning the top-level IOLoop. (This means it doesn't really make sense to run it when there is a top-level IOLoop; it's provided as a convenience for scripts and multi-threaded apps who want an HTTPRequest interface consistent with the async version). > > Of course, I have it easy -- multiple incoming requests are dispatched > to separate threads by the App Engine runtime, so I don't have to > worry about multiplexing at that level at all -- just end user code > that is essentially single-threaded unless they go out of their way. > > I did end up debugging one user's problem where they were making a > synchronous call inside an async handler, and -- very rarely! -- the > recursive event loop calls kept stacking up until they hit a > StackOverflowError. So I would agree that async code shouldn't make > synchronous API calls; but I haven't heard yet from anyone who was > otherwise hurt by the recursive event loop invocations -- in > particular, nobody has requested locks. I think that's because you don't have file descriptor support. In a (level-triggered) event loop if you don't drain the socket before reentering the loop then your read handler will be called again, which generally makes a mess. I suppose with coroutines you'd want edge-triggered instead of level-triggered though, which might make this problem go away. >>>> For concreteness, here's a crude sketch of what the APIs I'm talking >>>> about would look like in use (in a hypothetical future version of >>>> tornado). >>>> >>>> @future_wrap >>>> @gen.engine >>>> def async_http_client(url, callback): >>>> parsed_url = urlparse.urlsplit(url) >>>> # works the same whether the future comes from a thread pool or @future_wrap >>> >>> And you need the thread pool because there's no async version of >>> getaddrinfo(), right? >> >> Right. >> >>> >>>> addrinfo = yield g_thread_pool.submit(socket.getaddrinfo, parsed_url.hostname, parsed_url.port) >>>> stream = IOStream(socket.socket()) >>>> yield stream.connect((addrinfo[0][-1])) >>>> stream.write('GET %s HTTP/1.0' % parsed_url.path) >>> >>> Why no yield in front of the write() call? >> >> Because we don't need to wait for the write to complete before we >> continue to the next statement. write() doesn't return anything; it >> just succeeds or fails, and if it fails the next read_until will fail >> too. (although in this case it wouldn't hurt to have the yield either) > > I guess you have a certain kind of buffering built in to your stream? > So if you make two write() calls without waiting in quick succession, > does the system collapse these into one, or does it end up making two > system calls, or what? In NDB, there's a similar issue with multiple > RPCs that can be batched. I ended up writing an abstraction that > automatically combines these; the call isn't actually made until there > are no other runnable tasks. I've had to explain this a few times to > users who try to get away with overlapping CPU work and I/O, but > otherwise it's worked quite well. Yes, IOStream does buffering for you. 
Each IOStream.write() call will generally result in a syscall, but once the outgoing socket buffer is full subsequent writes will be buffered in the IOStream and written when the IOLoop says the socket is writable. (the callback argument to write() can be used for flow control in this case) I used to defer the syscall until the IOLoop was idle to batch things up, but it turns out to be more efficient in practice to just write things out each time and let the higher level do its own buffering when appropriate. -Ben From ben at bendarnell.com Fri Oct 12 02:57:38 2012 From: ben at bendarnell.com (Ben Darnell) Date: Thu, 11 Oct 2012 17:57:38 -0700 Subject: [Python-ideas] asyncore: included batteries don't fit In-Reply-To: References: <20120922163106.GA18772@hephaistos.amsuess.com> Message-ID: On Thu, Oct 11, 2012 at 2:18 PM, Guido van Rossum wrote: >> Re base reactor interface: drawing maximally from the lessons learned in >> twisted, I think IReactorCore (start, stop, etc), IReactorTime (call later, >> etc), asynchronous-looking name lookup, fd handling are the important parts. > > That actually sounds more concrete than I'd like a reactor interface > to be. In the App Engine world, there is a definite need for a > reactor, but it cannot talk about file descriptors at all -- all I/O > is defined in terms of RPC operations which have their own (several > layers of) async management but still need to be plugged in to user > code that might want to benefit from other reactor functionality such > as scheduling and placing a call at a certain moment in the future. So are you thinking of something like reactor.add_event_listener(event_type, event_params, func)? One thing to keep in mind is that file descriptors are somewhat special (at least in a level-triggered event loop), because of the way the event will keep firing until the socket buffer is drained or the event is unregistered. I'd be inclined to keep file descriptors in the interface even if they just raise an error on app engine, since they're fairly fundamental to the (unixy) event loop. On the other hand, I don't have any experience with event loops outside the unix/network world so I don't know what other systems might need for their event loops. > >> call_every can be implemented in terms of call_later on a separate object, >> so I think it should be (eg twisted.internet.task.LoopingCall). One thing >> that is apparently forgotten about is event loop integration. The prime way >> of having two event loops cooperate is *NOT* "run both in parallel", it's >> "have one call the other". Even though not all loops support this, I think >> it's important to get this as part of the interface (raise an exception for >> all I care if it doesn't work). > > This is definitely one of the things we ought to get right. My own > thoughts are slightly (perhaps only cosmetically) different again: > ideally each event loop would have a primitive operation to tell it to > run for a little while, and then some other code could tie several > event loops together. 
> > Possibly the primitive operation would be something like "block until > either you've got one event ready, or until a certain time (possibly > 0) has passed without any events, and then give us the events that are > ready and a lower bound for when you might have more work to do" -- or > maybe instead of returning the event(s) it could just call the > associated callback (it might have to if it is part of a GUI library > that has callbacks written in C/C++ for certain events like screen > refreshes). That doesn't work very well - while one loop is waiting for its timeout, nothing can happen on the other event loop. You have to switch back and forth frequently to keep things responsive, which is inefficient. I'd rather give each event loop its own thread; you can minimize the thread-synchronization concerns by picking one loop as "primary" and having all the others just pass callbacks over to it when their events fire. -Ben > > Anyway, it would be good to have input from representatives from Wx, > Qt, Twisted and Tornado to ensure that the *functionality* required is > all there (never mind the exact signatures of the APIs needed to > provide all that functionality). > > -- > --Guido van Rossum (python.org/~guido) > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas From greg.ewing at canterbury.ac.nz Fri Oct 12 03:32:10 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 12 Oct 2012 14:32:10 +1300 Subject: [Python-ideas] asyncore: included batteries don't fit In-Reply-To: References: <5073EA4F.8030405@canterbury.ac.nz> <5074C4E7.60708@canterbury.ac.nz> Message-ID: <5077731A.2070400@canterbury.ac.nz> Guido van Rossum wrote: > Though I think you should not use > 'thread' since that term is already reserved for OS threads as > supported by the threading module. ... You could also use task, > which also doesn't have a core Python > meaning. > > Also I think you can now revisit it and rewrite the code to use Python 3.3. Both good ideas. I'll see about publishing an updated version. > It does bother me somehow that you're not using .send() and yield > arguments at all. I notice that you have a lot ofthree-line code > blocks like this: > > block_for_reading(sock) > yield > data = sock.recv(1024) I wouldn't say I have a "lot". In the spamserver, there are really only three -- one for accepting a connection, one for reading from a socket, and one for writing to a socket. These are primitive operations that would be provided by an async socket library. Generally, all the yields would be hidden inside primitives like this. Normally, user code would never need to use 'yield', only 'yield from'. This probably didn't come through as clearly as it might have in my tutorial. Part of the reason is that at the time I wrote it, I was having to manually expand yield-froms into for-loops, so I was reluctant to use any more of them than I needed to. Also, yield-from was a new and unfamiliar concept, and I didn't want to scare people by overusing it. These considerations led me to push some of the yields slightly further up the layer stack than they could be. > > The general form seems to be: > > arrange for a callback when some operation can be done without blocking > yield > do the operation > > This seems to be begging to be collapsed into a single line, e.g. 
> > data = yield sock.recv_async(1024) I'm not sure how you're imagining that would work, but whatever it is, it's wrong -- that just doesn't make sense. What *would* make sense is data = yield from sock.recv_async(1024) with sock.recv_async() being a primitive that encapsulates the block/yield/process triplet. > (I would also prefer to see the socket wrapped in an object that makes > it hard to accidentally block.) It would be straightforward to make the primitives be methods of a socket wrapper object. I only used functions in the tutorial in the interests of keeping the amount of machinery to a bare minimum. > But surely there's still a place for send() and other PEP 342 features? In the wider world of generator usage, yes. If you have a generator that it makes sense to send() things into, for example, and you want to factor part of it out into another function, the fact that yield-from passes through sent values is useful. But we're talking about a very specialised use of generators here, and so far I haven't thought of a use for sent or yielded values in this context that can't be done in a more straightforward way by other means. Keep in mind that a value yielded by a generator being used as part of a coroutine is *not* seen by code calling it with yield-from. Rather, it comes out in the inner loop of the scheduler, from the next() call being used to resume the coroutine. Likewise, any send() call would have to be made by the scheduler, not the yield-from caller. So, the send/yield channel is exclusively for communication with the *scheduler* and nothing else. Under the old way of doing generator-based coroutines, this channel was used to simulate a call stack by yielding 'call' and 'return' instructions that the scheduler interpreted. But all that is now taken care of by the yield-from mechanism, and there is nothing left for the send/yield channel to do. > my users sometimes want to > treat something as a coroutine but they don't have any yields in it > > def caller(): > data = yield from reader() > > def reader(): > return 'dummy' > yield > > works, but if you drop the yield it doesn't work. With a decorator I > know how to make it work either way. If you're talking about a decorator that turns a function into a generator, I can't see anything particularly headachish about that. If you mean something else, you'll have to elaborate. -- Greg From steve at pearwood.info Fri Oct 12 03:03:50 2012 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 12 Oct 2012 12:03:50 +1100 Subject: [Python-ideas] Floating point contexts in Python core In-Reply-To: References: <69C432CC-7BAC-4B55-8BC3-2BF5307FD29C@gmail.com> <50765D0E.4020001@canterbury.ac.nz> <5076AF00.1010902@pearwood.info> Message-ID: <50776C76.3040309@pearwood.info> On 12/10/12 11:04, Mark Adam wrote: > On Thu, Oct 11, 2012 at 6:35 AM, Steven D'Aprano wrote: >> On 11/10/12 16:45, Greg Ewing wrote: >>> Are you sure there would be any point in this? People who >>> specifically *want* base-2 floats are probably quite happy >>> with the current float type, and wouldn't appreciate having >>> it slowed down, even by a small amount. >> >> I would gladly give up a small amount of speed for better control >> over floats, such as whether 1/0.0 raised an exception or >> returned infinity. > > Umm, you would be giving up a *lot* of speed. Native floating point > happens right in the processor, so if you want special behavior, you'd > have to take the floating point out of hardware and into "user space". 
Any half-decent processor supports the IEEE-754 standard. If it doesn't, it's broken by design. Even in user-space, you're not giving up that much speed in practical terms, at least not for my needs. The new decimal module in Python 3.3 is less than a factor of 10 times slower than Python's floats, which makes it pretty much instantaneous to my mind :) numpy supports configurable numeric contexts, and I don't hear that many complaints that numpy is slower than standard Python. -- Steven > > mark > From dreamingforward at gmail.com Fri Oct 12 03:38:43 2012 From: dreamingforward at gmail.com (Mark Adam) Date: Thu, 11 Oct 2012 20:38:43 -0500 Subject: [Python-ideas] asyncore: included batteries don't fit In-Reply-To: References: <20120922163106.GA18772@hephaistos.amsuess.com> Message-ID: On Thu, Oct 11, 2012 at 7:34 PM, Guido van Rossum wrote: > On Thu, Oct 11, 2012 at 5:29 PM, Terry Reedy wrote: >> On 10/11/2012 5:18 PM, Guido van Rossum wrote: >> >>> Anyway, it would be good to have input from representatives from Wx, >>> Qt, Twisted and Tornado to ensure that the *functionality* required is >>> all there (never mind the exact signatures of the APIs needed to >>> provide all that functionality). >> >> >> And of course tk/tkinter (tho perhaps we can represent that). It occurs to >> me that while i/o (file/socket) events can be added to a user (mouse/key) >> event loop, and I suspect that some tk/tkinter apps do so, it might be >> sensible to keep the two separate. A master loop could tell the user-event >> loop to handle all user events and then the i/o loop to handle one i/o >> event. This all depends on the relative speed of the handler code. Here's the thing: the underlying O.S is always handling two major I/O channels at any given time and it needs all it's attention to do this: the GUI and one of the following (network, file) I/O. You can shuffle these around all you want, but somewhere the O.S. kernel is going to have to be involved, which means either portability is sacrificed or speed if one is going to pursue and abstract, unified async API. > You should talk to a Tcl/Tk user (if there are any left :-). I used to be one of those :) mark From stephen at xemacs.org Fri Oct 12 05:01:12 2012 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Fri, 12 Oct 2012 12:01:12 +0900 Subject: [Python-ideas] Floating point contexts in Python core In-Reply-To: <50776145.70800@pearwood.info> References: <69C432CC-7BAC-4B55-8BC3-2BF5307FD29C@gmail.com> <50765D0E.4020001@canterbury.ac.nz> <5076AF00.1010902@pearwood.info> <87wqyx6lki.fsf@uwakimon.sk.tsukuba.ac.jp> <50776145.70800@pearwood.info> Message-ID: <87sj9k75s7.fsf@uwakimon.sk.tsukuba.ac.jp> Steven D'Aprano writes: > On 12/10/12 03:05, Stephen J. Turnbull wrote: > > Steven D'Aprano writes: > > > > > I would gladly give up a small amount of speed for better control > > > over floats, such as whether 1/0.0 raised an exception or > > > returned infinity. > > > > Isn't that what the fpectl module is supposed to buy, albeit much less > > pleasantly than Decimal contexts do? > > > I can't test it, because I don't have that module installed, but I would > think not. > > Reading the docs: > > http://docs.python.org/library/fpectl.html > > I would say that fpectl exists to turn on floating point exceptions where > Python currently returns an inf or NaN, not to turn on special values > where Python currently raises an exception, e.g. 1/0.0. OK. But if Python does that, it must be checking the value of the operand as well as the type. 
Surely that could be delegated to the hardware easily by commenting out one line. (Of course that would need to be a build-time option, and requires care in initialization.) > Because it depends on a build-time option, using it is even less convenient > that most other non-standard libraries. That is neither here nor there. I think the people who would use such facilities are a very small minority; imposing a slight extra burden on them is not a huge cost to Python. Eg, I'm perfectly happy with Python's current behavior because I only write toy examples/classroom demos in pure Python. If I were going to try to write statistical code in Python (vaguely plausible but not likely :-), I'd surely use SciPy. > It only has a single exception type for any of Division by Zero, Overflow > and Invalid, and doesn't appear to trap Underflow or Inexact at all. It's > not just less pleasant than Decimal contexts, but much less powerful as > well. Now you're really picking nits. Nobody said fpectl is perfect for all uses, just that you could get *better* control over floats. If you're going to insist that nothing less than Decimal contexts will do, you're right for you -- but that's not what you said. From guido at python.org Fri Oct 12 05:40:37 2012 From: guido at python.org (Guido van Rossum) Date: Thu, 11 Oct 2012 20:40:37 -0700 Subject: [Python-ideas] asyncore: included batteries don't fit In-Reply-To: References: Message-ID: On Thu, Oct 11, 2012 at 5:41 PM, Ben Darnell wrote: > On Thu, Oct 11, 2012 at 3:28 PM, Guido van Rossum wrote: >> On Mon, Oct 8, 2012 at 10:12 PM, Ben Darnell wrote: >>> On Mon, Oct 8, 2012 at 8:30 AM, Guido van Rossum wrote: >>>>> It's a Future constructor, a (conditional) add_done_callback, plus the >>>>> calls to set_result or set_exception and the with statement for error >>>>> handling. In full: >>>>> >>>>> def future_wrap(f): >>>>> @functools.wraps(f) >>>>> def wrapper(*args, **kwargs): >>>>> future = Future() >>>>> if kwargs.get('callback') is not None: >>>>> future.add_done_callback(kwargs.pop('callback')) >>>>> kwargs['callback'] = future.set_result >>>>> def handle_error(typ, value, tb): >>>>> future.set_exception(value) >>>>> return True >>>>> with ExceptionStackContext(handle_error): >>>>> f(*args, **kwargs) >>>>> return future >>>>> return wrapper >>>> >>>> Hmm... I *think* it automatically adds a special keyword 'callback' to >>>> the *call* site so that you can do things like >>>> >>>> fut = some_wrapped_func(blah, callback=my_callback) >>>> >>>> and then instead of using yield to wait for the callback, put the >>>> continuation of your code in the my_callback() function. >>> >>> Yes. Note that if you're passing in a callback you're probably going >>> to just ignore the return value. The callback argument and the future >>> return value are essentially two alternative interfaces; it probably >>> doesn't make sense to use both at once (but as a library author it's >>> useful to provide both). >> >> Definitely sounds like something that could be simplified if you >> didn't have backward compatibility baggage... > > Probably, although I still feel like callback-passing has its place. > For example, I think the Tornado chat demo > (https://github.com/facebook/tornado/blob/master/demos/chat/chatdemo.py) > would be less clear with coroutines and Futures than it is now > (although it would fit better into Greg's schedule/unschedule style). Hmm... That's an interesting challenge. I can't quite say I understand that whole program yet, but I'd like to give it a try. 
I think it can be made clearer than Tornado with Futures and coroutines -- it all depends on how you define your primitives. > That doesn't mean that every method has to take a callback, but I'd be > reluctant to get rid of them until we have more experience with the > generator/future-focused style. Totally understood. Though the nice thing of Futures is that you can tie callbacks to them *or* use them in coroutines. >>>>>>> In Tornado the Future is created by a decorator >>>>>>> and hidden from the asynchronous function (it just sees the callback), >>>>>> >>>>>> Hm, interesting. NDB goes the other way, the callbacks are mostly used >>>>>> to make Futures work, and most code (including large swaths of >>>>>> internal code) uses Futures. I think NDB is similar to monocle here. >>>>>> In NDB, you can do >>>>>> >>>>>> f = >>>>>> r = yield f >>>>>> >>>>>> where "yield f" is mostly equivalent to f.result(), except it gives >>>>>> better opportunity for concurrency. >>>>> >>>>> Yes, tornado's gen.engine does the same thing here. However, the >>>>> stakes are higher than "better opportunity for concurrency" - in an >>>>> event loop if you call future.result() without yielding, you'll >>>>> deadlock if that Future's task needs to run on the same event loop. >>>> >>>> That would depend on the semantics of the event loop implementation. >>>> In NDB's event loop, such a .result() call would just recursively >>>> enter the event loop, and you'd only deadlock if you actually have two >>>> pieces of code waiting for each other's completion. >>> >>> Hmm, I think I'd rather deadlock. :) If the event loop is reentrant >>> then the application code has be coded defensively as if it were >>> preemptively multithreaded, which introduces the possibility of >>> deadlock or (probably) more subtle/less frequent errors. Reentrancy >>> has been a significant problem in my experience, so I've been moving >>> towards a policy where methods in Tornado that take a callback never >>> run it immediately; callbacks are always scheduled on the next >>> iteration of the IOLoop with IOLoop.add_callback. >> >> The latter is a good tactic and I'm also using it. (Except for some >> reason we had to add the concept of "immediate callbacks" to our >> Future class, and those are run inside the set_result() call. But most >> callbacks don't use that feature.) >> >> I don't have a choice about making the event loop reentrant -- App >> Engine's underlying RPC multiplexing implementation *is* reentrant, >> and there is a large set of "classic" APIs that I cannot stop the user >> from calling that reenter it. But even if my hand wasn't forced, I'm >> not sure if I would make your choice. In NDB, there is a full >> complement of synchronous APIs that exactly matches the async APIs, >> and users are free to use the synchronous APIs in parts of their code >> where they don't need concurrency. Hence, every sychronous API just >> calls its async sibling and immediately waits for its result, which >> implicitly invokes the event loop. > > Tornado has a synchronous HTTPClient that does the same thing, > although each fetch creates and runs its own IOLoop rather than > spinning the top-level IOLoop. (This means it doesn't really make > sense to run it when there is a top-level IOLoop; it's provided as a > convenience for scripts and multi-threaded apps who want an > HTTPRequest interface consistent with the async version). I see. Yet another possible design choice. 
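(A rough sketch of that "private loop per synchronous call" pattern, with invented names rather than Tornado's real classes; the point is just that a blocking facade can spin its own little loop until the async result arrives.)

import collections

class TinyLoop:
    """A toy event loop: a queue of callbacks, run until told to stop."""
    def __init__(self):
        self._callbacks = collections.deque()
        self._running = False

    def call_soon(self, func, *args):
        self._callbacks.append((func, args))

    def run(self):
        self._running = True
        while self._running and self._callbacks:
            func, args = self._callbacks.popleft()
            func(*args)

    def stop(self):
        self._running = False

def fetch_async(loop, url, callback):
    # Stand-in for a real non-blocking fetch; it just schedules the callback.
    loop.call_soon(callback, "response for %s" % url)

def fetch_sync(url):
    # The synchronous facade: create a private loop, run it until the
    # async operation completes, then return the result.
    loop = TinyLoop()
    result = []
    def on_done(response):
        result.append(response)
        loop.stop()
    fetch_async(loop, url, on_done)
    loop.run()
    return result[0]

print(fetch_sync("http://example.com/"))

Whether such a facade spins a private loop or re-enters the main one is exactly the design choice at issue here.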
>> Of course, I have it easy -- multiple incoming requests are dispatched >> to separate threads by the App Engine runtime, so I don't have to >> worry about multiplexing at that level at all -- just end user code >> that is essentially single-threaded unless they go out of their way. >> >> I did end up debugging one user's problem where they were making a >> synchronous call inside an async handler, and -- very rarely! -- the >> recursive event loop calls kept stacking up until they hit a >> StackOverflowError. So I would agree that async code shouldn't make >> synchronous API calls; but I haven't heard yet from anyone who was >> otherwise hurt by the recursive event loop invocations -- in >> particular, nobody has requested locks. > > I think that's because you don't have file descriptor support. In a > (level-triggered) event loop if you don't drain the socket before > reentering the loop then your read handler will be called again, which > generally makes a mess. I suppose with coroutines you'd want > edge-triggered instead of level-triggered though, which might make > this problem go away. Ah, good terminology. Coroutines definitely like being edge-triggered. >>>>> For concreteness, here's a crude sketch of what the APIs I'm talking >>>>> about would look like in use (in a hypothetical future version of >>>>> tornado). >>>>> >>>>> @future_wrap >>>>> @gen.engine >>>>> def async_http_client(url, callback): >>>>> parsed_url = urlparse.urlsplit(url) >>>>> # works the same whether the future comes from a thread pool or @future_wrap >>>> >>>> And you need the thread pool because there's no async version of >>>> getaddrinfo(), right? >>> >>> Right. >>> >>>> >>>>> addrinfo = yield g_thread_pool.submit(socket.getaddrinfo, parsed_url.hostname, parsed_url.port) >>>>> stream = IOStream(socket.socket()) >>>>> yield stream.connect((addrinfo[0][-1])) >>>>> stream.write('GET %s HTTP/1.0' % parsed_url.path) >>>> >>>> Why no yield in front of the write() call? >>> >>> Because we don't need to wait for the write to complete before we >>> continue to the next statement. write() doesn't return anything; it >>> just succeeds or fails, and if it fails the next read_until will fail >>> too. (although in this case it wouldn't hurt to have the yield either) >> >> I guess you have a certain kind of buffering built in to your stream? >> So if you make two write() calls without waiting in quick succession, >> does the system collapse these into one, or does it end up making two >> system calls, or what? In NDB, there's a similar issue with multiple >> RPCs that can be batched. I ended up writing an abstraction that >> automatically combines these; the call isn't actually made until there >> are no other runnable tasks. I've had to explain this a few times to >> users who try to get away with overlapping CPU work and I/O, but >> otherwise it's worked quite well. > > Yes, IOStream does buffering for you. Each IOStream.write() call will > generally result in a syscall, but once the outgoing socket buffer is > full subsequent writes will be buffered in the IOStream and written > when the IOLoop says the socket is writable. (the callback argument > to write() can be used for flow control in this case) I used to defer > the syscall until the IOLoop was idle to batch things up, but it turns > out to be more efficient in practice to just write things out each > time and let the higher level do its own buffering when appropriate. Makes sense. 
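(Sketching the write-buffering behaviour just described, without pretending to be IOStream: write() tries the syscall right away, keeps the remainder in a buffer, and the loop flushes it when the fd becomes writable. The class and method names here are invented for illustration.)

import errno
import socket

class BufferedWriter:
    """Minimal write-side buffering of the kind described above."""

    def __init__(self, sock):
        sock.setblocking(False)
        self.sock = sock
        self.buffer = b""

    def write(self, data):
        # Try the syscall right away; buffer whatever the kernel won't take.
        self.buffer += data
        self._flush()

    def handle_write_ready(self):
        # Called by the event loop when select/poll says the fd is writable.
        self._flush()

    def wants_write(self):
        # The loop only needs to watch for writability while data is pending.
        return bool(self.buffer)

    def _flush(self):
        while self.buffer:
            try:
                sent = self.sock.send(self.buffer)
            except socket.error as e:
                if e.errno in (errno.EAGAIN, errno.EWOULDBLOCK):
                    return  # socket buffer full; wait for the next ready event
                raise
            self.buffer = self.buffer[sent:]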
I think different people might want to implement slightly different IOStream-like abstractions; this would be a good test of the infrastructure. You should be able to craft one from scratch out of sockets and Futures, but there should be one or two standard ones as well, and they should all happily mix and match using the same reactor. -- --Guido van Rossum (python.org/~guido) From stephen at xemacs.org Fri Oct 12 05:40:56 2012 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Fri, 12 Oct 2012 12:40:56 +0900 Subject: [Python-ideas] Floating point contexts in Python core In-Reply-To: References: <69C432CC-7BAC-4B55-8BC3-2BF5307FD29C@gmail.com> <50765D0E.4020001@canterbury.ac.nz> <5076AF00.1010902@pearwood.info> <87wqyx6lki.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <87r4p473xz.fsf@uwakimon.sk.tsukuba.ac.jp> Oscar Benjamin writes: > But the fpectl module IIUC wouldn't work for 1 / 0. No, and it shouldn't. > Since Python has managed to unify integer/float division now it > would be a shame to introduce any new reasons to bring in > superfluous .0s again: With all due respect to the designers, unification of integer/float division is a compromise, even a mathematical kludge. I'm not complaining, it happens to work well for most applications, even for me (at least where I need a computer to do the calculations :-). Practicality beats purity. > with context(zero_division='infinity'): > x = 1 / 0.0 # float('inf') > y = 1 / 0 # I'd like to see float('inf') here as well I'd hate that. Zero simply isn't a unit in any ring of integers; if I want to handle divide-by-zero specially (rather than consider it a programming error in preceding code) a LBYL non-zero divisor test or a try handler for divide-by-zero is appropriate. And in the case of z = -1 / 0.0 should it be float('inf') (complex) or -float('inf') (real)? (Obviously it should be the latter, as most scientific programming is done using real algorithms. But one could argue that just as integer is corrupted to float in the interests of continuity in division results, float should be corrupted to complex in the interest of a larger domain for roots and trigonometric functions.) > I've spent 4 hours this week in computer labs with students using > Python 2.7 as an introduction to scientific programming. A significant > portion of that time was spent explaining the int/float division > problem. They all get the issue now but not all of them understand > that it is specifically about division: many are putting .0s > everywhere. A perfectly rational approach for them, which may appeal to their senses of beauty in mathematics -- I personally would always write 1.0/0.0, not 1/0.0, and more mathematically correct than what you try to teach them. I really don't understand why you have a problem with it. Your problem seems to be that Python shouldn't have integers, except as an internal optimization for a subset of floating point operations. Then "1" could always be an abbreviation for "1.0"! > I expect it to be easier when we use Python 3 and I can simply > explain that there are two types of division with two different > operators. Well, it's been more than 40 years since I studied this stuff in America, but what they taught 10-year-olds then was that there are two ways to view division: in integers with result and remainder, and as a fraction. And they used the same operator! Not to mention that the algorithm for reducing fractions depends on integer division. It's a shame students forget so quickly. 
:-) From jeanpierreda at gmail.com Fri Oct 12 06:29:05 2012 From: jeanpierreda at gmail.com (Devin Jeanpierre) Date: Fri, 12 Oct 2012 00:29:05 -0400 Subject: [Python-ideas] asyncore: included batteries don't fit In-Reply-To: References: <20120922163106.GA18772@hephaistos.amsuess.com> Message-ID: First of all, sorry for not snipping the reply I made previously. Noticed that only after I sent it :( On Thu, Oct 11, 2012 at 7:37 PM, Guido van Rossum wrote: > On Thu, Oct 11, 2012 at 3:42 PM, Devin Jeanpierre > wrote: >> Could you be more specific? I've never heard Deferreds in particular >> called "arcane". They're very popular in e.g. the JS world, > > Really? Twisted is used in the JS world? Or do you just mean the > pervasiveness of callback style async programming? Ah, I mean Deferreds. I attended a talk earlier this year all about deferreds in JS, and not a single reference to Python or Twisted was made! These are the examples I remember mentioned in the talk: - http://api.jquery.com/category/deferred-object/ (not very twistedish at all, ill-liked by the speaker) - http://mochi.github.com/mochikit/doc/html/MochiKit/Async.html (maybe not a good example, mochikit tries to be "python in JS") - http://dojotoolkit.org/reference-guide/1.8/dojo/Deferred.html - https://github.com/kriskowal/q (also includes an explanation of why the author likes deferreds) There were a few more that the speaker mentioned, but didn't cover. One of his points was that the various systems of deferreds are subtly different, some very badly so, and that it was a mess, but that deferreds were still awesome. JS is a language where async programming is mainstream, so lots of people try to make it easier, and they all do it slightly differently. > That's one of the > things I am desperately trying to keep out of Python, I find that > style unreadable and unmanageable (whenever I click on a button in a > website and nothing happens I know someone has a bug in their > callbacks). I understand you feel different; but I feel the general > sentiment is that callback-based async programming is even harder than > multi-threaded programming (and nobody is claiming that threads are > easy :-). :S There are (at least?) four different styles of asynchronous computation used in Twisted, and you seem to be confused as to which ones I'm talking about. 1. Explicit callbacks: For example, reactor.callLater(t, lambda: print("woo hoo")) 2. Method dispatch callbacks: Similar to the above, the reactor or somebody has a handle on your object, and calls methods that you've defined when events happen e.g. IProtocol's dataReceived method 3. Deferred callbacks: When you ask for something to be done, it's set up, and you get an object back, which you can add a pipeline of callbacks to that will be called whenever whatever happens e.g. twisted.internet.threads.deferToThread(print, "x").addCallback(print, "x was printed in some other thread!") 4. Generator coroutines These are a syntactic wrapper around deferreds. If you yield a deferred, you will be sent the result if the deferred succeeds, or an exception if the deferred fails. e.g. examples from previous message I don't see a reason for the first to exist at all, the second one is kind of nice in some circumstances (see below), but perhaps overused. I feel like you're railing on the first and second when I'm talking about the third and fourth. I could be wrong. >> and possibly elsewhere. Moreover, they're extremely similar to futures, so >> if one is arcane so is the other. 
> > I love Futures, they represent a nice simple programming model. But I > especially love that you can write async code using Futures and > yield-based coroutines (what you call inlineCallbacks) and never have > to write an explicit callback function. Ever. The reason explicit non-deferred callbacks are involved in Twisted is because of situations in which deferreds are not present, because of past history in Twisted. It is not at all a limitation of deferreds or something futures are better at, best as I'm aware. (In case that's what you're getting at.) Anyway, one big issue is that generator coroutines can't really effectively replace callbacks everywhere. Consider the GUI button example you gave. How do you write that as a coroutine? I can see it being written like this: def mycoroutine(gui): while True: clickevent = yield gui.mybutton1.on_click() # handle clickevent But that's probably worse than using callbacks. >> Neither is clearly better or more obvious than the other. If anything >> I generally find deferred composition more useful than deferred >> tee-ing, so I feel like composition is the correct base operator, but >> you could pick another. > > If you're writing long complicated chains of callbacks that benefit > from these features, IMO you are already doing it wrong. I understand > that this is a matter of style where I won't be able to convince you. > But style is important to me, so let's agree to disagree. This is more than a matter of style, so at least for now I'd like to hold off on calling it even. In my day to day silly, synchronous, python code, I do lots of synchronous requests. For example, it's not unreasonable for me to want to load two different files from disk, or make several database interactions, etc. If I want to make this asynchronous, I have to find a way to execute multiple things that could hypothetically block, at the same time. If I can't do that easily, then the asynchronous solution has failed, because its entire purpose is to do everything that I do synchronously, except without blocking the main thread. Here's an example with lots of synchronous requests in Django: def view_paste(request, filekey): try: fileinfo= Pastes.objects.get(key=filekey) except DoesNotExist: t = loader.get_template('pastebin/error.html') return HttpResponse(t.render(Context(dict(error='File does not exist')))) f = open(fileinfo.filename) fcontents = f.read() t = loader.get_template('pastebin/paste.html') return HttpResponse(t.render(Context(dict(file=fcontents)))) How many blocking requests are there? Lots. This is, in a word, a long, complicated chain of synchronous requests. This is also very similar to what actual django code might look like in some circumstances. Even if we might think this is unreasonable, some subset of alteration of this is reasonable. Certainly we should be able to, say, load multiple (!) objects from the database, and open the template (possibly from disk), all potentially-blocking operations. This is inherently a long, complicated chain of requests, whether we implement it asynchronously or synchronously, or use Deferreds or Futures, or write it in Java or Python. Some parts can be done at any time before the end (loader.get_template(...)), some need to be done in a certain order, and there's branching depending on what happens in different cases. In order to even write this code _at all_, we need a way to chain these IO actions together. If we can't chain them together, we can't produce that final synthesis of results at the end. 
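(To make that concrete, here is roughly what the same dependent chain looks like as a coroutine, using Python 3.3's yield from and return-with-value. The *_async helpers and the trivial run() driver are made up for the example; a real framework would yield Futures or Deferreds and resume the generator when they fire.)

def get_paste_async(key):
    # Stand-in for a database lookup; a real framework would yield a
    # Future/Deferred here instead of plain None.
    yield
    return {"filename": "paste-%s.txt" % key}

def read_file_async(name):
    # Stand-in for non-blocking file I/O.
    yield
    return "contents of %s" % name

def view_paste_async(filekey):
    # The same dependent chain as the synchronous view: each step's result
    # feeds the next, and branching/error handling stays ordinary Python.
    fileinfo = yield from get_paste_async(filekey)
    contents = yield from read_file_async(fileinfo["filename"])
    return "rendered page with " + contents

def run(coro):
    # Trivial driver: a real event loop would resume the coroutine only when
    # the awaited operation completes, rather than immediately.
    try:
        while True:
            next(coro)
    except StopIteration as stop:
        return stop.value

print(run(view_paste_async("abc123")))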
We _need_ a pipeline or something computationally equivalent or more powerful. Results from past "deferred computations" need to be passed forward into future "deferred computations", in order to implement this at all. This is not a style issue, this is an issue of needing to be able to solve problems that involve more than one computation where the results of every computation matters somewhere. It's just that in this case, some of the computations are computed asynchronously. > I am totally open to learning from Twisted's experience. I hope that > you are willing to share even the end result might not look like > Twisted at all -- after all in Python 3.3 we have "yield from" and > return from a generator and many years of experience with different > styles of async APIs. In addition to Twisted, there's Tornado and > Monocle, and then there's the whole greenlets/gevent and > Stackless/microthreads community that we can't completely ignore. I > believe somewhere is an ideal async architecture, and I hope you can > help us discover it. > > (For example, I am very interested in Twisted's experiences writing > real-world performant, robust reactors.) For that stuff, you'd have to speak to the main authors of Twisted. I'm just a twisted user. :( In the end it really doesn't matter what API you go with. The Twisted people will wrap it up so that they are compatible, as far as that is possible. I hope I haven't detracted too much from the main thrust of the surrounding discussion. Futures/deferreds are a pretty big tangent, so sorry. I justified it to myself by figuring that it'd probably come up anyway, somehow, since these are useful abstractions for asynchronous programming. -- Devin From jeanpierreda at gmail.com Fri Oct 12 06:40:04 2012 From: jeanpierreda at gmail.com (Devin Jeanpierre) Date: Fri, 12 Oct 2012 00:40:04 -0400 Subject: [Python-ideas] asyncore: included batteries don't fit In-Reply-To: References: <20120922163106.GA18772@hephaistos.amsuess.com> Message-ID: On Fri, Oct 12, 2012 at 12:29 AM, Devin Jeanpierre wrote: >> If you're writing long complicated chains of callbacks that benefit >> from these features, IMO you are already doing it wrong. I understand >> that this is a matter of style where I won't be able to convince you. >> But style is important to me, so let's agree to disagree. > > This is more than a matter of style, so at least for now I'd like to > hold off on calling it even. -- snip boredom -- > together, we can't produce that final synthesis of results at the end. Ugh, just realized way after the fact that of course you meant callbacks, not composition. I feel dumb. Nevermind that whole segment. -- Devin From trent at snakebite.org Fri Oct 12 06:45:06 2012 From: trent at snakebite.org (Trent Nelson) Date: Fri, 12 Oct 2012 00:45:06 -0400 Subject: [Python-ideas] asyncore: included batteries don't fit In-Reply-To: <20121011164043.216164d3@pitrou.net> References: <5072C972.5070207@python.org> <50736C0F.90401@python.org> <20121011005523.GA43928@snakebite.org> <20121011164043.216164d3@pitrou.net> Message-ID: <20121012044506.GC3112@snakebite.org> On Thu, Oct 11, 2012 at 07:40:43AM -0700, Antoine Pitrou wrote: > On Wed, 10 Oct 2012 20:55:23 -0400 > Trent Nelson wrote: > > > > You could leverage this with kqueue and epoll; have similar threads > > set up to simply process I/O independent of the GIL, using the same > > facilities that would be used by IOCP-processing threads. 
> > Would you really win anything by doing I/O in separate threads, while > doing normal request processing in the main thread? If the I/O threads can run independent of the GIL, yes, definitely. The whole premise of IOCP is that the kernel takes care of waking one of your I/O handlers when data is ready. IOCP allows that to happen completely independent of your application's event loop. It really is the best way to do I/O. The Windows NT design team got it right from the start. The AIX and Solaris implementations are semantically equivalent to Windows, without the benefit of automatic thread pool management (and a few other optimisations). On Linux and BSD, you could get similar functionality by spawning I/O threads that could also run independent of the GIL. They would differ from the IOCP worker threads in the sense that they all have their own little event loops around epoll/kqueue+timeout. i.e. they have to continually ask "is there anything to do with this set of fds", then process the results, then manage set synchronisation. IOCP threads, on the other hand, wait for completion of something that has already been requested. The thread body implementation is significantly simpler, and no synchronisation primitives are needed. > That said, the idea of a common API architected around async I/O, > rather than non-blocking I/O, sounds interesting at least theoretically. It's the best way to do it. There should really be a libevent-type library (libiocp?) that leverages IOCP where possible, and fakes it when not using a half-sync/half-async pattern with threads and epoll or kqueue on Linux and FreeBSD, falling back to processes and poll on everything else (NetBSD, OpenBSD and HP-UX (the former two not having robust-enough pthread implementations, the latter not having anything better than select or poll)). However, given that the best IOCP implementations are a) Windows by a huge margin, and then b) Solaris and AIX in equal, distant second place, I can't see that happening any time soon. (Trying to use IOCP in the reactor fashion described above for epoll and kqueue is far more limiting than having an IOCP-oriented API and faking it for platforms where native support isn't available.) > Maybe all those outdated Snakebite Operating Systems are useful for > something after all. ;-P All the operating systems are the latest version available! In addition, there's also a Solaris 9 and HP-UX 11iv2 box. The hardware, on the other hand... not so new in some cases. Trent. From solipsis at pitrou.net Fri Oct 12 09:14:54 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 12 Oct 2012 09:14:54 +0200 Subject: [Python-ideas] asyncore: included batteries don't fit References: <20120922163106.GA18772@hephaistos.amsuess.com> Message-ID: <20121012091454.2e7a8365@pitrou.net> On Fri, 12 Oct 2012 00:29:05 -0400 Devin Jeanpierre wrote: > > These are the examples I remember mentioned in the talk: > > - http://api.jquery.com/category/deferred-object/ (not very twistedish > at all, ill-liked by the speaker) > - http://mochi.github.com/mochikit/doc/html/MochiKit/Async.html (maybe > not a good example, mochikit tries to be "python in JS") > - http://dojotoolkit.org/reference-guide/1.8/dojo/Deferred.html > - https://github.com/kriskowal/q (also includes an explanation of why > the author likes deferreds) Mochikit has been dead for years. As for the others, just because they are called "Deferred" doesn't mean they are the same thing. None of them seems to look like Twisted's Deferred abstraction. 
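(For readers who have not used Twisted, the core of that abstraction is roughly the following: a one-shot result slot plus an ordered chain of callback/errback pairs. This is a toy sketch for illustration only; it omits chaining on returned Deferreds, cancellation, and everything else that makes the real thing useful.)

class ToyDeferred:
    """A stripped-down illustration of callback/errback chaining."""

    _PENDING = object()

    def __init__(self):
        self._result = self._PENDING
        self._is_error = False
        self._callbacks = []   # list of (on_success, on_error) pairs

    def add_callbacks(self, on_success, on_error=None):
        self._callbacks.append((on_success, on_error))
        if self._result is not self._PENDING:
            self._run()
        return self

    def callback(self, result):
        self._result, self._is_error = result, False
        self._run()

    def errback(self, error):
        self._result, self._is_error = error, True
        self._run()

    def _run(self):
        while self._callbacks:
            on_success, on_error = self._callbacks.pop(0)
            handler = on_error if self._is_error else on_success
            if handler is None:
                continue   # no handler for this state; pass the result along
            try:
                self._result = handler(self._result)
                self._is_error = False
            except Exception as e:
                self._result = e
                self._is_error = True

# Callbacks can be attached before or after the result shows up.
d = ToyDeferred()
d.add_callbacks(lambda v: v * 2).add_callbacks(print)
d.callback(21)   # prints 42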
> The reason explicit non-deferred callbacks are involved in Twisted is > because of situations in which deferreds are not present, because of > past history in Twisted. It is not at all a limitation of deferreds or > something futures are better at, best as I'm aware. A Deferred can only be called once, but a dataReceived method can be called any number of times. So you can't use a Deferred for dataReceived unless you introduce significant hackery. > Anyway, one big issue is that generator coroutines can't really > effectively replace callbacks everywhere. Consider the GUI button > example you gave. How do you write that as a coroutine? > > I can see it being written like this: > > def mycoroutine(gui): > while True: > clickevent = yield gui.mybutton1.on_click() > # handle clickevent > > But that's probably worse than using callbacks. Agreed. And that's precisely because your GUI button handler is a dataReceived-alike :-) Regards Antoine. -- Software development and contracting: http://pro.pitrou.net From solipsis at pitrou.net Fri Oct 12 09:18:25 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 12 Oct 2012 09:18:25 +0200 Subject: [Python-ideas] asyncore: included batteries don't fit References: <20120922163106.GA18772@hephaistos.amsuess.com> <20121012091454.2e7a8365@pitrou.net> Message-ID: <20121012091825.0b17d6f2@pitrou.net> On Fri, 12 Oct 2012 09:14:54 +0200 Antoine Pitrou wrote: > On Fri, 12 Oct 2012 00:29:05 -0400 > Devin Jeanpierre > wrote: > > > > These are the examples I remember mentioned in the talk: > > > > - http://api.jquery.com/category/deferred-object/ (not very twistedish > > at all, ill-liked by the speaker) > > - http://mochi.github.com/mochikit/doc/html/MochiKit/Async.html (maybe > > not a good example, mochikit tries to be "python in JS") > > - http://dojotoolkit.org/reference-guide/1.8/dojo/Deferred.html > > - https://github.com/kriskowal/q (also includes an explanation of why > > the author likes deferreds) > > Mochikit has been dead for years. > > As for the others, just because they are called "Deferred" doesn't mean > they are the same thing. None of them seems to look like Twisted's > Deferred abstraction. Correction: actually, some of them do :-) I should have looked a bit better. Regards Antoine. -- Software development and contracting: http://pro.pitrou.net From _ at lvh.cc Fri Oct 12 11:25:41 2012 From: _ at lvh.cc (Laurens Van Houtven) Date: Fri, 12 Oct 2012 11:25:41 +0200 Subject: [Python-ideas] asyncore: included batteries don't fit In-Reply-To: References: <20120922163106.GA18772@hephaistos.amsuess.com> Message-ID: On Thu, Oct 11, 2012 at 11:18 PM, Guido van Rossum wrote: > > If there's one take away idea from async-pep, it's reusable protocols. > > Is there a newer version that what's on > http://www.python.org/dev/peps/pep-3153/ ? It seems to be missing any > specific proposals, after spending a lot of time giving a rationale > and defining some terms. The version on > https://github.com/lvh/async-pep doesn't seem to be any more complete. > Correct. If I had to change it today, I'd throw out consumers and producers and just stick to a protocol API. Do you feel that there should be less talk about rationale? > > The PEP should probably be a number of PEPs. At first sight, it seems > that > > this number is at least four: > > > > 1. 
Protocol and transport abstractions, making no mention of > asynchronous IO > > (this is what I want 3153 to be, because it's small, manageable, and > > virtually everyone appears to agree it's a fantastic idea) > > But the devil is in the details. *What* specifically are you > proposing? How would you write a protocol handler/parser without any > reference to I/O? Most protocols are two-way streets -- you read some > stuff, and you write some stuff, then you read some more. (HTTP may be > the exception here, if you don't keep the connection open.) > It's not that there's *no* reference to IO: it's just that that reference is abstracted away in data_received and the protocol's transport object, just like Twisted's IProtocol. > > 2. A base reactor interface > > I agree that this should be a separate PEP. But I do think that in > practice there will be dependencies between the different PEPs you are > proposing. > Absolutely. > > 3. A way of structuring callbacks: probably deferreds with a built-in > > inlineCallbacks for people who want to write synchronous-looking code > with > > explicit yields for asynchronous procedures > > Your previous two ideas sound like you're not tied to backward > compatibility with Tornado and/or Twisted (not even via an adaptation > layer). Given that we're talking Python 3.4 here that's fine with me > (though I think we should be careful to offer a path forward for those > packages and their users, even if it means making changes to the > libraries). I'm assuming that by previous ideas you mean points 1, 2: protocol interface + reactor interface. I don't see why twisted's IProtocol couldn't grow an adapter for stdlib Protocols. Ditto for Tornado. Similarly, the reactor interface could be *provided* (through a fairly simple translation layer) by different implementations, including twisted. > But Twisted Deferred is pretty arcane, and I would much > rather not use it as the basis of a forward-looking design. I'd much > rather see what we can mooch off PEP 3148 (Futures). > I think this needs to be addressed in a separate mail, since more stuff has been said about deferreds in this thread. > > 4+ adapting the stdlib tools to using these new things > > We at least need to have an idea for how this could be done. We're > talking serious rewrites of many of our most fundamental existing > synchronous protocol libraries (e.g. httplib, email, possibly even > io.TextWrapper), most of which have had only scant updates even > through the Python 3 transition apart from complications to deal with > the bytes/str dichotomy. > I certainly agree that this is a very large amount of work. However, it has obvious huge advantages in terms of code reuse. I'm not sure if I understand the technical barrier though. It should be quite easy to create a blocking API with a protocol implementation that doesn't care; just call data_received with all your data at once, and presto! (Since transports in general don't provide guarantees as to how bytes will arrive, existing Twisted IProtocols have to do this already anyway, and that seems to work fine.) > > Re: forward path for existing asyncore code. I don't remember this being > > raised as an issue. If anything, it was mentioned in passing, and I think > > the answer to it was something to the tune of "asyncore's API is broken, > > fixing it is more important than backwards compat". 
Essentially I agree > with > > Guido that the important part is an upgrade path to a good third-party > > library, which is the part about asyncore that REALLY sucks right now. > > I have the feeling that the main reason asyncore sucks is that it > requires you to subclass its Dispatcher class, which has a rather > treacherous interface. > There's at least a few others, but sure, that's an obvious one. Many of the objections I can raise however don't matter if there's already an *existing working solution*. I mean, sure, it can't do SSL, but if you have code that does what you want right now, then obviously SSL isn't actually needed. > > Regardless, an API upgrade is probably a good idea. I'm not sure if it > > should go in the first PEP: given the separation I've outlined above > (which > > may be too spread out...), there's no obvious place to put it besides it > > being a new PEP. > > Aren't all your proposals API upgrades? > Sorry, that was incredibly poor wording. I meant something more of an adapter: an upgrade path for existing asyncore code to new and shiny 3153 code. > > Re base reactor interface: drawing maximally from the lessons learned in > > twisted, I think IReactorCore (start, stop, etc), IReactorTime (call > later, > > etc), asynchronous-looking name lookup, fd handling are the important > parts. > > That actually sounds more concrete than I'd like a reactor interface > to be. In the App Engine world, there is a definite need for a > reactor, but it cannot talk about file descriptors at all -- all I/O > is defined in terms of RPC operations which have their own (several > layers of) async management but still need to be plugged in to user > code that might want to benefit from other reactor functionality such > as scheduling and placing a call at a certain moment in the future. > I have a hard time understanding how that would work well outside of something like GAE. IIUC, that level of abstraction was chosen because it made sense for GAE (and I don't disagree), but I'm not sure it makes sense here. In this example, where would eg the select/epoll/whatever calls happen? Is it something that calls the reactor that then in turn calls whatever? > call_every can be implemented in terms of call_later on a separate > object, > > so I think it should be (eg twisted.internet.task.LoopingCall). One thing > > that is apparently forgotten about is event loop integration. The prime > way > > of having two event loops cooperate is *NOT* "run both in parallel", it's > > "have one call the other". Even though not all loops support this, I > think > > it's important to get this as part of the interface (raise an exception > for > > all I care if it doesn't work). > > This is definitely one of the things we ought to get right. My own > thoughts are slightly (perhaps only cosmetically) different again: > ideally each event loop would have a primitive operation to tell it to > run for a little while, and then some other code could tie several > event loops together. > As an API, that's pretty close to Twisted's IReactorCore.iterate, I think. It'd work well enough. The issue is only with event loops that don't cooperate so well. 
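(A minimal sketch of the "have one call the other" shape, with made-up classes: the only thing the outer loop needs from the foreign one is some bounded iterate()-style entry point that does a little work and returns.)

import time

class InnerLoop:
    """Stands in for a foreign event loop (GUI toolkit, other framework)."""
    def __init__(self):
        self.pending = []

    def iterate(self, timeout):
        # Process at most one ready event, waiting no longer than `timeout`.
        if self.pending:
            callback = self.pending.pop(0)
            callback()
        else:
            time.sleep(timeout)

class OuterLoop:
    """The primary loop, which periodically gives the inner loop a turn."""
    def __init__(self, inner):
        self.inner = inner
        self.tasks = []

    def run(self, rounds):
        for _ in range(rounds):
            if self.tasks:
                self.tasks.pop(0)()
            # Cooperate: let the foreign loop handle one of its own events.
            self.inner.iterate(timeout=0.01)

inner = InnerLoop()
outer = OuterLoop(inner)
inner.pending.append(lambda: print("inner event handled"))
outer.tasks.append(lambda: print("outer task handled"))
outer.run(rounds=2)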
Possibly the primitive operation would be something like "block until > either you've got one event ready, or until a certain time (possibly > 0) has passed without any events, and then give us the events that are > ready and a lower bound for when you might have more work to do" -- or > maybe instead of returning the event(s) it could just call the > associated callback (it might have to if it is part of a GUI library > that has callbacks written in C/C++ for certain events like screen > refreshes). > > Anyway, it would be good to have input from representatives from Wx, > Qt, Twisted and Tornado to ensure that the *functionality* required is > all there (never mind the exact signatures of the APIs needed to > provide all that functionality). > > > -- > --Guido van Rossum (python.org/~guido) > -- cheers lvh -------------- next part -------------- An HTML attachment was scrubbed... URL: From _ at lvh.cc Fri Oct 12 11:29:06 2012 From: _ at lvh.cc (Laurens Van Houtven) Date: Fri, 12 Oct 2012 11:29:06 +0200 Subject: [Python-ideas] asyncore: included batteries don't fit In-Reply-To: References: Message-ID: I'm not quite sure why Deferreds + @inlineCallbacks is more complicated than Futures + coroutines. They seem, at least from a high level perspective, quite similar. You mention that you can both attach callbacks and use them in coroutines: deferreds do pretty much exactly the same thing (that is, at least there's something to translate your coroutine into a sequence of callbacks/errbacks). If the arcane part of deferreds is from people writing ridiculous errback/callback chains, then I understand. Unfortunately people will write terrible code. cheers lvh -------------- next part -------------- An HTML attachment was scrubbed... URL: From sturla at molden.no Fri Oct 12 12:39:37 2012 From: sturla at molden.no (Sturla Molden) Date: Fri, 12 Oct 2012 12:39:37 +0200 Subject: [Python-ideas] Floating point contexts in Python core In-Reply-To: <50776C76.3040309@pearwood.info> References: <69C432CC-7BAC-4B55-8BC3-2BF5307FD29C@gmail.com> <50765D0E.4020001@canterbury.ac.nz> <5076AF00.1010902@pearwood.info> <50776C76.3040309@pearwood.info> Message-ID: <5077F369.9020909@molden.no> On 12.10.2012 03:03, Steven D'Aprano wrote: > Any half-decent processor supports the IEEE-754 standard. If it doesn't, > it's broken by design. > > Even in user-space, you're not giving up that much speed in practical > terms, at least not for my needs. The new decimal module in Python 3.3 is > less than a factor of 10 times slower than Python's floats, which makes it > pretty much instantaneous to my mind :) It will not have any effect on the flops rate. The other stuff the interpreter must do when using floats (allocating and deleting float objects on the heap, initializing new objects, etc.) will dominate the run-time performance. Even a simple check for divide-by-zero (as we have today) will be more expensive than using another numerical context inside the hardware. Sturla From jeanpierreda at gmail.com Fri Oct 12 14:44:39 2012 From: jeanpierreda at gmail.com (Devin Jeanpierre) Date: Fri, 12 Oct 2012 08:44:39 -0400 Subject: [Python-ideas] asyncore: included batteries don't fit In-Reply-To: <20121012091454.2e7a8365@pitrou.net> References: <20120922163106.GA18772@hephaistos.amsuess.com> <20121012091454.2e7a8365@pitrou.net> Message-ID: On Fri, Oct 12, 2012 at 3:14 AM, Antoine Pitrou wrote: > Mochikit has been dead for years. From the front page: "MochiKit is "feature complete" at 1.4 and not currently in active development. 
It has done what we've needed it to do for a number of years so we haven't bothered to make any major changes to it." Last update to the github repository was a few months ago. That said, looking at their APIs now, I'm pretty sure mochikit was not in that presentation. Its API isn't jQuery-like. > As for the others, just because they are called "Deferred" doesn't mean > they are the same thing. None of them seems to look like Twisted's > Deferred abstraction. They have separate callbacks for error and success, which are passed values. That is the same. The callback chains are formed from sequences of deferreds. That's different. If a callback returns a deferred, then the rest of the chain is only called once that deferred resolves -- that's the same, and super important. There's some API differences, like .addCallbacks() --> .then(); and .callback() --> .resolve(). And IIRC jQuery had other differences, but maybe it's just that you use .pipe() to chain deferreds because .then() returns a Promise instead of a Deferred? I don't remember what was weird about jQuery, it's been a while since that talk. :( >> The reason explicit non-deferred callbacks are involved in Twisted is >> because of situations in which deferreds are not present, because of >> past history in Twisted. It is not at all a limitation of deferreds or >> something futures are better at, best as I'm aware. > > A Deferred can only be called once, but a dataReceived method can be > called any number of times. So you can't use a Deferred for > dataReceived unless you introduce significant hackery. Haha, oops! I was being dumb and only thinking of minor cases when callbacks are used, rather than major cases. Some people complain that Twisted's protocols (and dataReceived) should be like that GUI button example, though. Not major hackery, just somewhat nasty and bug-prone. -- Devin From syrion at gmail.com Fri Oct 12 14:45:49 2012 From: syrion at gmail.com (Blake Hyde) Date: Fri, 12 Oct 2012 08:45:49 -0400 Subject: [Python-ideas] PEP 428: poll about the joining syntax In-Reply-To: References: <20121008204707.48559bf9@pitrou.net> Message-ID: I'm a Python developer rather than a developer of Python, but I'd like to ask a question about this option (and implicitly vote against it, I suppose); if you specialize a method name, such as .pathjoin, aren't you implying that methods must be unambiguous even across types and classes? This seems negative. Even if .join is already used for strings, it also makes sense for this use case. Of course, the proposed syntactic sugar options (operator overloading) seems more pathological than either of the method-based options, so I suppose you could consider my votes as -1 to everything else, +.5 to .pathjoin, and +1 to .join. On Mon, Oct 8, 2012 at 2:54 PM, Guido van Rossum wrote: > > I don't like any of those; I'd vote for another regular method, maybe > p.pathjoin(q). > > On Mon, Oct 8, 2012 at 11:47 AM, Antoine Pitrou wrote: > > > > Hello, > > > > Since there has been some controversy about the joining syntax used in > > PEP 428 (filesystem path objects), I would like to run an informal poll > > about it. Please answer with +1/+0/-0/-1 for each proposal: > > > > - `p[q]` joins path q to path p > > - `p + q` joins path q to path p > > - `p / q` joins path q to path p > > - `p.join(q)` joins path q to path p > > > > (you can include a rationale if you want, but don't forget to vote :-)) > > > > Thank you > > > > Antoine. 
> > > > > > -- > > Software development and contracting: http://pro.pitrou.net > > > > > > _______________________________________________ > > Python-ideas mailing list > > Python-ideas at python.org > > http://mail.python.org/mailman/listinfo/python-ideas > > > > -- > --Guido van Rossum (python.org/~guido) > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas From g.brandl at gmx.net Fri Oct 12 18:12:26 2012 From: g.brandl at gmx.net (Georg Brandl) Date: Fri, 12 Oct 2012 18:12:26 +0200 Subject: [Python-ideas] PEP 428: poll about the joining syntax In-Reply-To: References: <20121008204707.48559bf9@pitrou.net> Message-ID: Am 12.10.2012 14:45, schrieb Blake Hyde: > I'm a Python developer rather than a developer of Python, but I'd like to ask a > question about this option (and implicitly vote against it, I suppose); if you > specialize a method name, such as .pathjoin, aren't you implying that methods > must be unambiguous even across types and classes? This seems negative. Even > if .join is already used for strings, it also makes sense for this use case. Of course different classes can have methods of the same name. The issue here is that due to the similarity (and interchangeability) of path objects and strings it is likely that people get them mixed up every now and then, and if .join() works on both objects the failure mode (strange result from str.join when you expected path.join) is horribly confusing. It's the same argument against the "+" operator. (Apart from the other downside that it will act differently depending on *two* objects, i.e. both operands.) In contrast, the "/" operator is not defined on strings, but on numbers, and the both the confusion likelihood and failure mode of mixing numbers and strings are much less severe. It's really kind of the same reason why integer floor division was awkward with "/", and has been changed to "//" in Python 3. Georg From ethan at stoneleaf.us Fri Oct 12 18:12:02 2012 From: ethan at stoneleaf.us (Ethan Furman) Date: Fri, 12 Oct 2012 09:12:02 -0700 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: <50735279.8080506@canterbury.ac.nz> References: <20121005202534.5f721292@pitrou.net> <20121007193735.7bb924ac@pitrou.net> <7E8AC881-ADB6-4026-B024-07DE197F8530@mac.com> <20121008110748.GA17653@iskra.aviel.ru> <9D6F4C1B-9145-4775-8657-F99612791067@mac.com> <50735279.8080506@canterbury.ac.nz> Message-ID: <50784152.9070003@stoneleaf.us> Greg Ewing wrote: > Ronald Oussoren wrote: >> neither statvs, statvfs, nor pathconf seem to be able to tell if a >> filesystem is case insensitive. > > Even if they could, you wouldn't be entirely out of the woods, > because different parts of the same path can be on different > file systems... > > But how important is all this anyway? I'm trying to think of > occasions when I've wanted to compare two entire paths for > equality, and I can't think of *any*. Well, while I haven't had to compare the /entire/ path, I have had to compare (and sort) the filename portion. And since the SMB share uses lower-case, and our legacy FoxPro code writes upper-case, and files get copied from SMB to the local Windows drive, having the case-insensitive compare option in Path makes my life much easier. 
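(Concretely, the kind of comparison described above can be spelled today with os.path.normcase plus an explicit lower(); the sample names below are invented for illustration.)

import os.path

smb_names = ["invoice_0001.dbf", "customers.dbf"]
local_names = ["INVOICE_0001.DBF", "ORDERS.DBF"]

# Fold case before comparing or sorting: normcase lowercases on Windows
# and is a no-op on POSIX, so the explicit .lower() is the portable part.
def fold(name):
    return os.path.normcase(name).lower()

common = sorted(set(map(fold, smb_names)) & set(map(fold, local_names)))
print(common)   # ['invoice_0001.dbf']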
~Ethan~ From ethan at stoneleaf.us Fri Oct 12 18:27:48 2012 From: ethan at stoneleaf.us (Ethan Furman) Date: Fri, 12 Oct 2012 09:27:48 -0700 Subject: [Python-ideas] PEP 428: poll about the joining syntax In-Reply-To: References: <20121008204707.48559bf9@pitrou.net> Message-ID: <50784504.2080801@stoneleaf.us> Georg Brandl wrote: > Am 12.10.2012 14:45, schrieb Blake Hyde: >> I'm a Python developer rather than a developer of Python, but I'd like to ask a >> question about this option (and implicitly vote against it, I suppose); if you >> specialize a method name, such as .pathjoin, aren't you implying that methods >> must be unambiguous even across types and classes? This seems negative. Even >> if .join is already used for strings, it also makes sense for this use case. > > Of course different classes can have methods of the same name. > > The issue here is that due to the similarity (and interchangeability) of path > objects and strings it is likely that people get them mixed up every now and > then, and if .join() works on both objects the failure mode (strange result > from str.join when you expected path.join) is horribly confusing. I don't understand the "horribly confusing" part. Sure, when I got them mixed up and ended up with a plain ol' string instead of a really cool Path it took a moment to figure out where I had made the error, but the traceback of "AttributeError: 'str' object has no attribute 'path'" left absolutely no room for confusion as to what the problem was. ~Ethan~ From guido at python.org Fri Oct 12 18:41:10 2012 From: guido at python.org (Guido van Rossum) Date: Fri, 12 Oct 2012 09:41:10 -0700 Subject: [Python-ideas] asyncore: included batteries don't fit In-Reply-To: References: <20120922163106.GA18772@hephaistos.amsuess.com> <20121012091454.2e7a8365@pitrou.net> Message-ID: I am going to start some new threads on this topic, to avoid going over 100 messages. Topics will be roughly: - reactors - protocol implementations - Twisted (esp. Deferred) - Tornado - yield from vs. Futures It may be a while (hours, not days). -- --Guido van Rossum (python.org/~guido) From dickinsm at gmail.com Fri Oct 12 19:26:18 2012 From: dickinsm at gmail.com (Mark Dickinson) Date: Fri, 12 Oct 2012 18:26:18 +0100 Subject: [Python-ideas] checking for identity before comparing built-in objects In-Reply-To: <50761EE7.8060103@pearwood.info> References: <506D94EE.30808@pearwood.info> <50723BE5.3060300@nedbatchelder.com> <50733A18.10400@nedbatchelder.com> <20121009043236.GI27445@ando> <5074489B.6000003@stoneleaf.us> <50744CBE.4010600@pearwood.info> <5074CBF2.8070507@pearwood.info> <50761EE7.8060103@pearwood.info> Message-ID: On Thu, Oct 11, 2012 at 2:20 AM, Steven D'Aprano wrote: > E.g. log(x) should return -infinity if x underflows from a positive value, > and a NaN if x underflows from a negative. IEEE 754 disagrees. :-) Both log(-0.0) and log(0.0) are required to return -infinity (and/or signal the divideByZero exception). And as for sqrt(-0.0) returning -0.0... Grr. I've never understood the motivation for that one, especially as it disagrees with the usual recommendations for complex square root (where the real part of the result *always* has its sign bit cleared). 
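(A quick illustration of the distinction, as today's CPython behaves on an IEEE 754 platform: the two zeros compare equal but remain distinguishable, and the math module's functions differ in how they treat them.)

>>> import math
>>> 0.0 == -0.0        # they compare equal...
True
>>> math.copysign(1.0, -0.0)   # ...but the sign bit is still there
-1.0
>>> math.sqrt(-0.0)    # IEEE 754 squareRoot maps -0.0 to -0.0
-0.0
>>> math.log(0.0)      # Python raises rather than returning -inf
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: math domain error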
Mark From guido at python.org Fri Oct 12 20:13:23 2012 From: guido at python.org (Guido van Rossum) Date: Fri, 12 Oct 2012 11:13:23 -0700 Subject: [Python-ideas] The async API of the future: Reactors Message-ID: [This is the first spin-off thread from "asyncore: included batteries don't fit"] On Thu, Oct 11, 2012 at 5:57 PM, Ben Darnell wrote: > On Thu, Oct 11, 2012 at 2:18 PM, Guido van Rossum wrote: >>> Re base reactor interface: drawing maximally from the lessons learned in >>> twisted, I think IReactorCore (start, stop, etc), IReactorTime (call later, >>> etc), asynchronous-looking name lookup, fd handling are the important parts. >> >> That actually sounds more concrete than I'd like a reactor interface >> to be. In the App Engine world, there is a definite need for a >> reactor, but it cannot talk about file descriptors at all -- all I/O >> is defined in terms of RPC operations which have their own (several >> layers of) async management but still need to be plugged in to user >> code that might want to benefit from other reactor functionality such >> as scheduling and placing a call at a certain moment in the future. > > So are you thinking of something like > reactor.add_event_listener(event_type, event_params, func)? One thing > to keep in mind is that file descriptors are somewhat special (at > least in a level-triggered event loop), because of the way the event > will keep firing until the socket buffer is drained or the event is > unregistered. I'd be inclined to keep file descriptors in the > interface even if they just raise an error on app engine, since > they're fairly fundamental to the (unixy) event loop. On the other > hand, I don't have any experience with event loops outside the > unix/network world so I don't know what other systems might need for > their event loops. Hmm... This is definitely an interesting issue. I'm tempted to believe that it is *possible* to change every level-triggered setup into an edge-triggered setup by using an explicit loop -- but I'm not saying it is a good idea. In practice I think we need to support both equally well, so that the *app* can decide which paradigm to use. E.g. if I were to implement an HTTP server, I might use level-triggered for the "accept" call on the listening socket, but edge-triggered for everything else. OTOH someone else might prefer a buffered stream abstraction that just keeps filling its read buffer (and draining its write buffer) using level-triggered callbacks, at least up to a certain buffer size -- we have to be robust here and make it impossible for an evil client to fill up all our memory without our approval! I'm not at all familiar with the Twisted reactor interface. My own design would be along the following lines: - There's an abstract Reactor class and an abstract Async I/O object class. To get a reactor to call you back, you must give it an I/O object, a callback, and maybe some more stuff. (I have gone back and like passing optional args for the callback, rather than requiring lambdas to create closures.) Note that the callback is *not* a designated method on the I/O object! In order to distinguish between edge-triggered and level-triggered, you just use a different reactor method. There could also be a reactor method to schedule a "bare" callback, either after some delay, or immediately (maybe with a given priority), although such functionality could also be implemented through magic I/O objects. 
- In systems supporting file descriptors, there's a reactor implementation that knows how to use select/poll/etc., and there are concrete I/O object classes that wrap file descriptors. On Windows, those would only be socket file descriptors. On Unix, any file descriptor would do. To create such an I/O object you would use a platform-specific factory. There would be specialized factories to create e.g. listening sockets, connections, files, pipes, and so on. - In systems like App Engine that don't support async I/O on file descriptors at all, the constructors for creating I/O objects for disk files and connection sockets would comply with the interface but fake out almost everything (just like today, using httplib or httplib2 on App Engine works by adapting them to a "urlfetch" RPC request). >>> call_every can be implemented in terms of call_later on a separate object, >>> so I think it should be (eg twisted.internet.task.LoopingCall). One thing >>> that is apparently forgotten about is event loop integration. The prime way >>> of having two event loops cooperate is *NOT* "run both in parallel", it's >>> "have one call the other". Even though not all loops support this, I think >>> it's important to get this as part of the interface (raise an exception for >>> all I care if it doesn't work). >> >> This is definitely one of the things we ought to get right. My own >> thoughts are slightly (perhaps only cosmetically) different again: >> ideally each event loop would have a primitive operation to tell it to >> run for a little while, and then some other code could tie several >> event loops together. >> >> Possibly the primitive operation would be something like "block until >> either you've got one event ready, or until a certain time (possibly >> 0) has passed without any events, and then give us the events that are >> ready and a lower bound for when you might have more work to do" -- or >> maybe instead of returning the event(s) it could just call the >> associated callback (it might have to if it is part of a GUI library >> that has callbacks written in C/C++ for certain events like screen >> refreshes). > > That doesn't work very well - while one loop is waiting for its > timeout, nothing can happen on the other event loop. You have to > switch back and forth frequently to keep things responsive, which is > inefficient. I'd rather give each event loop its own thread; you can > minimize the thread-synchronization concerns by picking one loop as > "primary" and having all the others just pass callbacks over to it > when their events fire. That's a good point. I suppose on systems that support both networking and GUI events, in my design these would use different I/O objects (created using different platform-specific factories) and the shared reactor API would sort things out based on the type of I/O object passed in to it. Note that many GUI events would be level-triggered, but sometimes using the edge-triggered paradigm can work well too: e.g. I imagine that writing code to draw a curve following the mouse as long as a button is pressed might be conveniently written as a loop of the form def on_mouse_press(x, y, buttons): while True: x, y, buttons = yield if not buttons: break which itself is registered as a level-triggered handler for mouse presses. (Dealing with multiple buttons is left as an exercise. 
:-) -- --Guido van Rossum (python.org/~guido) From ncoghlan at gmail.com Fri Oct 12 20:26:44 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 13 Oct 2012 04:26:44 +1000 Subject: [Python-ideas] PEP 428: poll about the joining syntax In-Reply-To: <50784504.2080801@stoneleaf.us> References: <20121008204707.48559bf9@pitrou.net> <50784504.2080801@stoneleaf.us> Message-ID: On Sat, Oct 13, 2012 at 2:27 AM, Ethan Furman wrote: > Georg Brandl wrote: >> >> Am 12.10.2012 14:45, schrieb Blake Hyde: >>> >>> I'm a Python developer rather than a developer of Python, but I'd like to >>> ask a >>> question about this option (and implicitly vote against it, I suppose); >>> if you >>> specialize a method name, such as .pathjoin, aren't you implying that >>> methods >>> must be unambiguous even across types and classes? This seems negative. >>> Even >>> if .join is already used for strings, it also makes sense for this use >>> case. >> >> >> Of course different classes can have methods of the same name. >> >> The issue here is that due to the similarity (and interchangeability) of >> path >> objects and strings it is likely that people get them mixed up every now >> and >> then, and if .join() works on both objects the failure mode (strange >> result >> from str.join when you expected path.join) is horribly confusing. > > > I don't understand the "horribly confusing" part. Sure, when I got them > mixed up and ended up with a plain ol' string instead of a really cool Path > it took a moment to figure out where I had made the error, but the traceback > of "AttributeError: 'str' object has no attribute 'path'" left absolutely no > room for confusion as to what the problem was. Now, instead of retrieving an attribute, call str() and send the path name over a pipe or socket, or save it to a file. Instead of an immediate error, you'll get a bad path somewhere *else*, and have to track down where the data corruption came from (which not even be in the current process, or in a process that was even running on the current machine). Making "+" and "Path.join" mean something different from what they mean when called on strings is, in the specific case of a path representation, far too likely to lead to data corruption bugs for us to be happy with allowing it in the standard library. This is one I think Jason Orendorff's original path.py got right, which is why my current preference is "just copy path.py and use / and Path.joinpath". Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From solipsis at pitrou.net Fri Oct 12 20:33:11 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 12 Oct 2012 20:33:11 +0200 Subject: [Python-ideas] The async API of the future: Reactors References: Message-ID: <20121012203311.4b3ee8af@pitrou.net> Hello Guido, On Fri, 12 Oct 2012 11:13:23 -0700 Guido van Rossum wrote: > OTOH someone else might prefer a buffered stream > abstraction that just keeps filling its read buffer (and draining its > write buffer) using level-triggered callbacks, at least up to a > certain buffer size -- we have to be robust here and make it > impossible for an evil client to fill up all our memory without our > approval! I'd like to know what a sane buffered API for non-blocking I/O may look like, because right now it doesn't seem to make a lot of sense. At least this bug is tricky to resolve: http://bugs.python.org/issue13322 > - There's an abstract Reactor class and an abstract Async I/O object > class. 
To get a reactor to call you back, you must give it an I/O > object, a callback, and maybe some more stuff. (I have gone back and > like passing optional args for the callback, rather than requiring > lambdas to create closures.) Note that the callback is *not* a > designated method on the I/O object! Why isn't it? In practice, you need several callbacks: in Twisted parlance, you have dataReceived but also e.g. ConnectionLost (depending on the transport, you may even imagine other callbacks, for example for things happening on the TLS layer?). > - In systems supporting file descriptors, there's a reactor > implementation that knows how to use select/poll/etc., and there are > concrete I/O object classes that wrap file descriptors. On Windows, > those would only be socket file descriptors. On Unix, any file > descriptor would do. Windows *is* able to do async I/O on things other than sockets (see the discussion about IOCP). It's just that the Windows implementation of select() (the POSIX function call) is limited to sockets. Regards Antoine. -- Software development and contracting: http://pro.pitrou.net From solipsis at pitrou.net Fri Oct 12 20:34:23 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 12 Oct 2012 20:34:23 +0200 Subject: [Python-ideas] PEP 428: poll about the joining syntax References: <20121008204707.48559bf9@pitrou.net> <50784504.2080801@stoneleaf.us> Message-ID: <20121012203423.514afef7@pitrou.net> On Sat, 13 Oct 2012 04:26:44 +1000 Nick Coghlan wrote: > > This is one I > think Jason Orendorff's original path.py got right, which is why my > current preference is "just copy path.py and use / and Path.joinpath". This is my current preference too. I don't like joinpath(), but as long as there is an operator to do the same thing I don't care :-) Regards Antoine. -- Software development and contracting: http://pro.pitrou.net From ethan at stoneleaf.us Fri Oct 12 20:37:13 2012 From: ethan at stoneleaf.us (Ethan Furman) Date: Fri, 12 Oct 2012 11:37:13 -0700 Subject: [Python-ideas] PEP 428: poll about the joining syntax In-Reply-To: References: <20121008204707.48559bf9@pitrou.net> <50784504.2080801@stoneleaf.us> Message-ID: <50786359.7090502@stoneleaf.us> Nick Coghlan wrote: > Making "+" and "Path.join" mean something different from what they > mean when called on strings is, in the specific case of a path > representation, far too likely to lead to data corruption bugs for us > to be happy with allowing it in the standard library. This is one I > think Jason Orendorff's original path.py got right, which is why my > current preference is "just copy path.py and use / and Path.joinpath". Okay, that makes sense. I think we should settle on one of the possibilities that does /not/ duplicate the word 'path', however. That's one of those things that drives me nuts. ;) (It's a Path object -- of /course/ it's joining path stuff!) 
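Just so we're all picturing the same thing, the path.py-style spelling Nick mentions would look roughly like this (an illustrative session only, not the final PEP 428 API):

    >>> p = Path('/usr')
    >>> p / 'lib' / 'python3.3'            # operator form
    Path('/usr/lib/python3.3')
    >>> p.joinpath('lib', 'python3.3')     # named method, no clash with str.join
    Path('/usr/lib/python3.3')

Either spelling sidesteps the str.join trap; it's only the method name I'm quibbling over.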
~Ethan~ From joshua.landau.ws at gmail.com Fri Oct 12 20:42:38 2012 From: joshua.landau.ws at gmail.com (Joshua Landau) Date: Fri, 12 Oct 2012 19:42:38 +0100 Subject: [Python-ideas] checking for identity before comparing built-in objects In-Reply-To: <50761EE7.8060103@pearwood.info> References: <506D94EE.30808@pearwood.info> <50723BE5.3060300@nedbatchelder.com> <50733A18.10400@nedbatchelder.com> <20121009043236.GI27445@ando> <5074489B.6000003@stoneleaf.us> <50744CBE.4010600@pearwood.info> <5074CBF2.8070507@pearwood.info> <50761EE7.8060103@pearwood.info> Message-ID: On 11 October 2012 02:20, Steven D'Aprano wrote: > On 11/10/12 09:05, Joshua Landau wrote: > > After re-re-reading this thread, it turns out one *(1)* post and two >> *(2)* answers >> >> to that post have covered a topic very similar to the one I have raised. >> All of the others, to my understanding, do not dwell over the fact >> that *float("nan") is not float("nan")* . >> > > That's no different from any other float. > > py> float('nan') is float('nan') > False > py> float('1.5') is float('1.5') > False > > Floats are not interned or cached, although of course interning is > implementation dependent and this is subject to change without notice. > > For that matter, it's true of *nearly all builtins* in Python. The > exceptions being bool(obj) which returns one of two fixed instances, > and int() and str(), where *some* but not all instances are cached. >>> float(1.5) is float(1.5) True >>> float("1.5") is float("1.5") False Confusing re-use of identity strikes again. Can anyone care to explain what causes this? I understand float(1.5) is likely to return the inputted float, but that's as far as I can reason. What I was saying, though, is that all other posts assumed equality between two different NaNs should be the same as identity between a NaN and itself. This is what I'm really asking about, I guess. > Response 1: >> This implies that you want to differentiate between -0.0 and +0.0. That is >> bad. >> >> My response: >> Why would I want to do that? >> > > If you are doing numeric work, you *should* differentiate between -0.0 > and 0.0. That's why the IEEE 754 standard mandates a -0.0. > > Both -0.0 and 0.0 compare equal, but they can be distinguished (although > doing so is tricky in Python). The reason for distinguishing them is to > distinguish between underflow to zero from positive or negative values. > E.g. log(x) should return -infinity if x underflows from a positive value, > and a NaN if x underflows from a negative. Interesting. Can you give me a more explicit example? When would you not *want* f(-0.0) to always return the result of f(0.0)? [aka, for -0.0 to warp into 0.0 on creation] -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Fri Oct 12 21:18:34 2012 From: guido at python.org (Guido van Rossum) Date: Fri, 12 Oct 2012 12:18:34 -0700 Subject: [Python-ideas] The async API of the future: yield-from Message-ID: [This is the second spin-off thread from "asyncore: included batteries don't fit"] On Thu, Oct 11, 2012 at 6:32 PM, Greg Ewing wrote: > Guido van Rossum wrote: >> It does bother me somehow that you're not using .send() and yield >> arguments at all. I notice that you have a lot of three-line code >> blocks like this: >> >> block_for_reading(sock) >> yield >> data = sock.recv(1024) > I wouldn't say I have a "lot". 
In the spamserver, there are really > only three -- one for accepting a connection, one for reading from > a socket, and one for writing to a socket. These are primitive > operations that would be provided by an async socket library. Hm. In such a small sample program, three near-identical blocks is a lot! > Generally, all the yields would be hidden inside primitives like > this. Normally, user code would never need to use 'yield', only > 'yield from'. > > This probably didn't come through as clearly as it might have in my > tutorial. Part of the reason is that at the time I wrote it, I was > having to manually expand yield-froms into for-loops, so I was > reluctant to use any more of them than I needed to. Also, yield-from > was a new and unfamiliar concept, and I didn't want to scare people > by overusing it. These considerations led me to push some of the > yields slightly further up the layer stack than they could be. But the fact remains that you can't completely hide these yields -- the best you can do is replace them with a single yield-from. >> The general form seems to be: >> >> arrange for a callback when some operation can be done without blocking >> yield >> do the operation >> >> This seems to be begging to be collapsed into a single line, e.g. >> >> data = yield sock.recv_async(1024) > I'm not sure how you're imagining that would work, but whatever > it is, it's wrong -- that just doesn't make sense. That's a strong statement! It makes a lot of sense in a world using Futures and a Future-aware trampoline/scheduler, instead of yield-from and bare generators. I can see however that you don't like it in the yield-from world you're envisioning, and how it would be confusing there. I'll get back to this in a bit. > What *would* make sense is > > data = yield from sock.recv_async(1024) > > with sock.recv_async() being a primitive that encapsulates the > block/yield/process triplet. Right, that's how you would spell it. >> (I would also prefer to see the socket wrapped in an object that makes >> it hard to accidentally block.) > It would be straightforward to make the primitives be methods of a > socket wrapper object. I only used functions in the tutorial in the > interests of keeping the amount of machinery to a bare minimum. Understood. >> But surely there's still a place for send() and other PEP 342 features? > In the wider world of generator usage, yes. If you have a > generator that it makes sense to send() things into, for > example, and you want to factor part of it out into another > function, the fact that yield-from passes through sent values > is useful. But the only use for send() on a generator is when using it as a coroutine for a concurrent tasks system -- send() really makes no sense for generators used as iterators. And you're claiming, it seems, that you prefer yield-from for concurrent tasks. > But we're talking about a very specialised use of generators > here, and so far I haven't thought of a use for sent or yielded > values in this context that can't be done in a more straightforward > way by other means. > > Keep in mind that a value yielded by a generator being used as > part of a coroutine is *not* seen by code calling it with > yield-from. Rather, it comes out in the inner loop of the > scheduler, from the next() call being used to resume the > coroutine. Likewise, any send() call would have to be made > by the scheduler, not the yield-from caller. I'm very much aware of that. There is a *huge* difference between yield-from and yield. 
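(To make the contrast concrete: under the old scheme the trampoline's inner step had to interpret whatever came out of send() -- a rough sketch only, names invented, no real framework implied:

    def step(task, value=None):
        try:
            request = task.send(value)   # a 'call'/'return' instruction, an I/O wait, ...
        except StopIteration:
            return                       # task finished
        interpret(request, task)         # scheduler decides what happens next

With yield-from, send() and throw() are routed by the interpreter straight to whichever generator in the chain is actually suspended, so the scheduler no longer has to simulate the call stack itself.)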
However, now that I've implemented a substantial library (NDB, which has thousands of users in the App Engine world, if not hundreds of thousands), I feel that "value = yield " is quite a good paradigm, and the only part of PEP 380 I'm really looking forward to embracing (once App Engine supports Python 3.3) is the option to return a value from a generator -- which my users currently have to spell as "raise ndb.Return()". > So, the send/yield channel is exclusively for communication > with the *scheduler* and nothing else. Under the old way of > doing generator-based coroutines, this channel was used to > simulate a call stack by yielding 'call' and 'return' > instructions that the scheduler interpreted. But all that > is now taken care of by the yield-from mechanism, and there > is nothing left for the send/yield channel to do. I understand that's the state of the world that you're looking forward to. However I'm slightly worried that in practice there are some issues to be resolved. One is what to do with operations directly implemented in C. It would be horrible to require C to create a fake generator. It would be mildly nasty to have to wrap these all in Python code just so you can use them with yield-from. Fortunately an iterator whose final __next__() raises StopIteration() works in the latest Python 3.3 (it didn't work in some of the betas IIRC). >> my users sometimes want to >> treat something as a coroutine but they don't have any yields in it >> >> def caller(): >> data = yield from reader() >> >> def reader(): >> return 'dummy' >> yield >> >> works, but if you drop the yield it doesn't work. With a decorator I >> know how to make it work either way. > If you're talking about a decorator that turns a function > into a generator, I can't see anything particularly headachish > about that. If you mean something else, you'll have to elaborate. Well, I'm talking about a decorator that you *always* apply, and which does nothing (or very little) when wrapping a generator, but adds generator behavior when wrapping a non-generator function. Anyway, I am trying to come up with a table comparing Futures and your yield-from-using generators. I'm basing this on a subset of the PEP 3148 API, and I'm not presuming threads -- I'm just looking at the functionality around getting and setting callbacks, results, and exceptions. My reference is actually based on NDB, but the API there differs from PEP 3148 in uninteresting ways, so I'll use the PEP 3148 method names. (1) Calling an async operation and waiting for its result, using yield Futures: result = yield some_async_op(args) Yield-from: result = yield from some_async_op(args) (2) Setting the result of an async operation Futures: f.set_result(value) # From any callback Yield-from: return value # From the outermost generator (3) Handling an exception Futures: try: result = yield some_async_op(args) except MyException: Yield-from: try: result = yield from some_async_op(args) except MyException: Note: with yield-from, the tracebacks for unhandled exceptions are possibly prettier. (4) Raising an exception as the outcome of an async operation Futures: f.set_exception() Yield-from: raise # From any of the generators Note: with Futures, the traceback also needs to be stored; in Python 3 it is stored on the Exception instance's __traceback__ attribute. But when letting exceptions bubble through multiple levels of nested calls, you must do something special to ensure the traceback looks right to the end user. 
(5) Having one async operation invoke another async operation Futures: @task def outer(args): res = yield inner(args) return res Yield-from: def outer(args): res = yield from inner(args) return res Note: I'm including this because in the Futures case, each level of yield requires the creation of a separate Future. In practice this requires decorating all async functions. And also as a lead-in to the next item. (6) Spawning off multiple async subtasks Futures: f1 = subtask1(args1) # Note: no yield!!! f2 = subtask2(args2) res1, res2 = yield f1, f2 Yield-from: ?????????? *** Greg, can you come up with a good idiom to spell concurrency at this level? Your example only has concurrency in the philosophers example, but it appears to interact directly with the scheduler, and the philosophers don't return values. *** (7) Checking whether an operation is already complete Futures: if f.done(): ... Yield-from: ????????????? (8) Getting the result of an operation multiple times Futures: f = async_op(args) # squirrel away a reference to f somewhere else r = yield f # ... later, elsewhere r = f.result() Yield-from: ??????????????? (9) Canceling an operation Futures: f.cancel() Yield-from: ??????????????? Note: I haven't needed canceling yet, and I believe Devin said that Twisted just got rid of it. However some of the JS Deferred implementations seem to support it. (10) Registering additional callbacks Futures: f.add_done_callback(callback) Yield-from: ??????? Note: this is used in NDB to trigger "hooks" that should run e.g. when a database write completes. The user's code just writes yield ent.put_async(); the trigger is automatically called by the Future's machinery. This also uses (8). -- --Guido van Rossum (python.org/~guido) From python at mrabarnett.plus.com Fri Oct 12 21:19:48 2012 From: python at mrabarnett.plus.com (MRAB) Date: Fri, 12 Oct 2012 20:19:48 +0100 Subject: [Python-ideas] checking for identity before comparing built-in objects In-Reply-To: References: <506D94EE.30808@pearwood.info> <50723BE5.3060300@nedbatchelder.com> <50733A18.10400@nedbatchelder.com> <20121009043236.GI27445@ando> <5074489B.6000003@stoneleaf.us> <50744CBE.4010600@pearwood.info> <5074CBF2.8070507@pearwood.info> <50761EE7.8060103@pearwood.info> Message-ID: <50786D54.60300@mrabarnett.plus.com> On 2012-10-12 19:42, Joshua Landau wrote: > On 11 October 2012 02:20, Steven D'Aprano > wrote: > > On 11/10/12 09:05, Joshua Landau wrote: > > After re-re-reading this thread, it turns out one *(1)* post and two > *(2)* answers > > to that post have covered a topic very similar to the one I have > raised. > All of the others, to my understanding, do not dwell over the fact > that *float("nan") is not float("nan")* . > > > That's no different from any other float. > > py> float('nan') is float('nan') > False > py> float('1.5') is float('1.5') > False > > Floats are not interned or cached, although of course interning is > implementation dependent and this is subject to change without notice. > > For that matter, it's true of *nearly all builtins* in Python. The > exceptions being bool(obj) which returns one of two fixed instances, > and int() and str(), where *some* but not all instances are cached. 
> > >>> float(1.5) is float(1.5) > True It re-uses an immutable literal: >>> 1.5 is 1.5 True >>> "1.5" is "1.5" True and 'float' returns its argument if it's already a float: >>> float(1.5) is 1.5 True Therefore: >>> float(1.5) is float(1.5) True But apart from that, when a new object is created, it doesn't check whether it's identical to another, except in certain cases such as ints in a limited range: >>> float("1.5") is float("1.5") False >>> float("1.5") is 1.5 False >>> int("1") is 1 True And it's an implementation-specific behaviour. > >>> float("1.5") is float("1.5") > False > > Confusing re-use of identity strikes again. Can anyone care to explain > what causes this? I understand float(1.5) is likely to return the > inputted float, but that's as far as I can reason. > > What I was saying, though, is that all other posts assumed equality > between two different NaNs should be the same as identity between a NaN > and itself. This is what I'm really asking about, I guess. > > Response 1: > This implies that you want to differentiate between -0.0 and > +0.0. That is > bad. > > My response: > Why would I want to do that? > > > If you are doing numeric work, you *should* differentiate between -0.0 > and 0.0. That's why the IEEE 754 standard mandates a -0.0. > > Both -0.0 and 0.0 compare equal, but they can be distinguished (although > doing so is tricky in Python). The reason for distinguishing them is to > distinguish between underflow to zero from positive or negative values. > E.g. log(x) should return -infinity if x underflows from a positive > value, > and a NaN if x underflows from a negative. > > > Interesting. > > Can you give me a more explicit example? When would you not *want* > f(-0.0) to always return the result of f(0.0)? [aka, for -0.0 to warp > into 0.0 on creation] > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > From dickinsm at gmail.com Fri Oct 12 21:22:37 2012 From: dickinsm at gmail.com (Mark Dickinson) Date: Fri, 12 Oct 2012 20:22:37 +0100 Subject: [Python-ideas] checking for identity before comparing built-in objects In-Reply-To: References: <506D94EE.30808@pearwood.info> <50723BE5.3060300@nedbatchelder.com> <50733A18.10400@nedbatchelder.com> <20121009043236.GI27445@ando> <5074489B.6000003@stoneleaf.us> <50744CBE.4010600@pearwood.info> <5074CBF2.8070507@pearwood.info> <50761EE7.8060103@pearwood.info> Message-ID: On Fri, Oct 12, 2012 at 7:42 PM, Joshua Landau wrote: > Can you give me a more explicit example? When would you not *want* f(-0.0) > to always return the result of f(0.0)? [aka, for -0.0 to warp into 0.0 on > creation] A few examples: (1) In the absence of exceptions, 1 / 0.0 is +inf, while 1 / -0.0 is -inf. So e.g. the function exp(-exp(1/x)) has different values at -0.0 and 0.0: >>> from numpy import float64, exp >>> exp(-exp(1/float64(0.0))) 0.0 >>> exp(-exp(1/float64(-0.0))) 1.0 (2) For the atan2 function, we have e.g., >>> from math import atan2 >>> atan2(0.0, -1.0) 3.141592653589793 >>> atan2(-0.0, -1.0) -3.141592653589793 This gives atan2 a couple of nice invariants: the sign of the result always matches the sign of the first argument, and atan2(-y, x) == -atan2(y, x) for any (non-nan) x and y. (3) Similarly, for complex math functions (which aren't covered by IEEE 754, but are standardised in various other languages), it's sometimes convenient to be able to depend on invariants like e.g. asin(z.conj()) == asin(z).conj(). 
Those are only possible if -0.0 and 0.0 are distinguished; the effect is most visible if you pick values lying on a branch cut. >>> from cmath import sin >>> z = complex(2.0, 0.0) >>> asin(z).conjugate() (1.5707963267948966-1.3169578969248166j) >>> asin(z.conjugate()) (1.5707963267948966-1.3169578969248166j) You can't take that too far, though: e.g., it would be nice if complex multiplication had the property that (z * w).conjugate() was always the same as z.conjugate() * w.conjugate(), but it's impossible to keep both that invariant and the commutativity of multiplication. (E.g., consider the result of complex(1, 1) * complex(1, -1).) -- Mark From ethan at stoneleaf.us Fri Oct 12 21:23:46 2012 From: ethan at stoneleaf.us (Ethan Furman) Date: Fri, 12 Oct 2012 12:23:46 -0700 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: References: <20121005202534.5f721292@pitrou.net> <20121006190821.02ae50cd@pitrou.net> <387F855A-B5BD-491D-9FA5-918F4042332F@gmail.com> <50707AD5.3030407@stoneleaf.us> <50707F91.6060301@stoneleaf.us> Message-ID: <50786E42.6050308@stoneleaf.us> Georg Brandl wrote: > Am 06.10.2012 20:59, schrieb Ethan Furman: >> Mike Graham wrote: >>> On Sat, Oct 6, 2012 at 2:39 PM, Ethan Furman wrote: >>>> Georg Brandl wrote: >>>>> If you inherit from str, you cannot override any of the operations that >>>>> str already has (i.e. __add__, __getitem__). >>>> Is this a 3.x thing? My 2.x version of Path overrides many of the str >>>> methods and works just fine. >>> This is for theoretical/practical reasons, not technical ones. >> Ah, you mean you can't give them different semantics. Gotcha. > > Yep. Not much use being able to pass them directly to APIs expecting strings > if they can't operate on them like any other string :) Which is why I would like to see Path based on str, despite Guido's misgivings. (Yes, I know I'm probably tilting at windmills here...) If Path is string based we get backwards compatibility with all the os and third-party tools that expect and use strings; this would allow a gentle migration to using them, as opposed to the all-or-nothing if Path is a completely new type. This would be especially useful for accessing the functions that haven't been added on to Path yet. If Path is string based some questions evaporate: '+'? It does what str does; iterate? Just like str (we can make named methods for the iterations that we want, such as Path.dirs). If Path is string based we still get to use '/' to combine them together (I think that was the preference from the poll... but that could be wishful thinking on my part. ;) ) Even Path.joinpath would make sense to differentiate from Path.join (which is really str.join). Anyway, my two cents worth. From guido at python.org Fri Oct 12 21:32:11 2012 From: guido at python.org (Guido van Rossum) Date: Fri, 12 Oct 2012 12:32:11 -0700 Subject: [Python-ideas] The async API of the future: Reactors In-Reply-To: <20121012203311.4b3ee8af@pitrou.net> References: <20121012203311.4b3ee8af@pitrou.net> Message-ID: On Fri, Oct 12, 2012 at 11:33 AM, Antoine Pitrou wrote: > On Fri, 12 Oct 2012 11:13:23 -0700 > Guido van Rossum wrote: >> OTOH someone else might prefer a buffered stream >> abstraction that just keeps filling its read buffer (and draining its >> write buffer) using level-triggered callbacks, at least up to a >> certain buffer size -- we have to be robust here and make it >> impossible for an evil client to fill up all our memory without our >> approval! 
> > I'd like to know what a sane buffered API for non-blocking I/O may look > like, because right now it doesn't seem to make a lot of sense. At > least this bug is tricky to resolve: > http://bugs.python.org/issue13322 Good question. It actually depends quite a bit on whether you have an event loop or not -- with the help of an event loop, you can have a level-triggered callback that fills the buffer behind your back (up to a given limit, at which point it should unregister the I/O object); that bug seems to be about a situation without an event loop, where you can't do that. Also the existing io module design never anticipated cooperation with an event loop. >> - There's an abstract Reactor class and an abstract Async I/O object >> class. To get a reactor to call you back, you must give it an I/O >> object, a callback, and maybe some more stuff. (I have gone back and >> like passing optional args for the callback, rather than requiring >> lambdas to create closures.) Note that the callback is *not* a >> designated method on the I/O object! > > Why isn't it? In practice, you need several callbacks: in Twisted > parlance, you have dataReceived but also e.g. ConnectionLost > (depending on the transport, you may even imagine other callbacks, for > example for things happening on the TLS layer?). Yes, but I really want to separate the callbacks from the object, so that I don't have to inherit from an I/O object class -- asyncore requires this and IMO it's wrong. It also makes it harder to use the same callback code with different types of I/O objects. >> - In systems supporting file descriptors, there's a reactor >> implementation that knows how to use select/poll/etc., and there are >> concrete I/O object classes that wrap file descriptors. On Windows, >> those would only be socket file descriptors. On Unix, any file >> descriptor would do. > > Windows *is* able to do async I/O on things other than sockets (see the > discussion about IOCP). It's just that the Windows implementation of > select() (the POSIX function call) is limited to sockets. I know, but IOCP is currently not supported in the stdlib. I expect that on Windows, to use IOCP, you'd need to use a different reactor implementation and a different I/O object than the vanilla fd-based ones. My design is actually *inspired* by the desire to support this cleanly. -- --Guido van Rossum (python.org/~guido) From g.brandl at gmx.net Fri Oct 12 21:39:16 2012 From: g.brandl at gmx.net (Georg Brandl) Date: Fri, 12 Oct 2012 21:39:16 +0200 Subject: [Python-ideas] PEP 428: poll about the joining syntax In-Reply-To: <50784504.2080801@stoneleaf.us> References: <20121008204707.48559bf9@pitrou.net> <50784504.2080801@stoneleaf.us> Message-ID: Am 12.10.2012 18:27, schrieb Ethan Furman: > Georg Brandl wrote: >> Am 12.10.2012 14:45, schrieb Blake Hyde: >>> I'm a Python developer rather than a developer of Python, but I'd like to ask a >>> question about this option (and implicitly vote against it, I suppose); if you >>> specialize a method name, such as .pathjoin, aren't you implying that methods >>> must be unambiguous even across types and classes? This seems negative. Even >>> if .join is already used for strings, it also makes sense for this use case. >> >> Of course different classes can have methods of the same name. 
>> >> The issue here is that due to the similarity (and interchangeability) of path >> objects and strings it is likely that people get them mixed up every now and >> then, and if .join() works on both objects the failure mode (strange result >> from str.join when you expected path.join) is horribly confusing. > > I don't understand the "horribly confusing" part. Sure, when I got them > mixed up and ended up with a plain ol' string instead of a really cool > Path it took a moment to figure out where I had made the error, but the > traceback of "AttributeError: 'str' object has no attribute 'path'" left > absolutely no room for confusion as to what the problem was. "no attribute 'path'"? Not sure where that exception comes from. This is what I meant: >>> p = Path('/usr') >>> p.join('lib') Path('/usr/lib') >>> p = '/usr' >>> p.join('lib') 'l/usri/usrb' Georg From tim.peters at gmail.com Fri Oct 12 21:42:34 2012 From: tim.peters at gmail.com (Tim Peters) Date: Fri, 12 Oct 2012 14:42:34 -0500 Subject: [Python-ideas] checking for identity before comparing built-in objects In-Reply-To: References: <506D94EE.30808@pearwood.info> <50723BE5.3060300@nedbatchelder.com> <50733A18.10400@nedbatchelder.com> <20121009043236.GI27445@ando> <5074489B.6000003@stoneleaf.us> <50744CBE.4010600@pearwood.info> <5074CBF2.8070507@pearwood.info> <50761EE7.8060103@pearwood.info> Message-ID: [Mark Dickinson] > ... > And as for sqrt(-0.0) returning -0.0... Grr. I've never understood > the motivation for that one, especially as it disagrees with the usual > recommendations for complex square root (where the real part of the > result *always* has its sign bit cleared). The only rationale I've seen for this is in Kahan's obscure paper "Branch Cuts for Complex Elementary Functions or Much Ado About Nothing's Sign Bit". Hard to find. Here's a mostly readable scan: http://port70.net/~nsz/articles/float/kahan_branch_cuts_complex_elementary_functions_1987.pdf In part it's to preserve various identities, such as that sqrt(conjugate(z)) is the same as conjugate(sqrt(z)). When z is +0, that becomes sqrt(conjugate(+0)) same_as conjugate(sqrt(+0)) which is sqrt(-0) same_as conjugate(+0) which is sqrt(-0) same as -0 Conviced? LOL. There are others in the paper ;-) From solipsis at pitrou.net Fri Oct 12 21:42:24 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 12 Oct 2012 21:42:24 +0200 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths References: <20121005202534.5f721292@pitrou.net> <20121006190821.02ae50cd@pitrou.net> <387F855A-B5BD-491D-9FA5-918F4042332F@gmail.com> <50707AD5.3030407@stoneleaf.us> <50707F91.6060301@stoneleaf.us> <50786E42.6050308@stoneleaf.us> Message-ID: <20121012214224.55f3ed27@pitrou.net> On Fri, 12 Oct 2012 12:23:46 -0700 Ethan Furman wrote: > > Which is why I would like to see Path based on str, despite Guido's > misgivings. (Yes, I know I'm probably tilting at windmills here...) > > If Path is string based we get backwards compatibility with all the os > and third-party tools that expect and use strings; this would allow a > gentle migration to using them, as opposed to the all-or-nothing if Path > is a completely new type. It is not all-or-nothing since you can just call str() and it will work fine with both strings and paths. Regards Antoine. 
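P.S. To be explicit about what I mean, here is a contrived sketch (with Path standing in for whatever PEP 428 ends up calling the class):

    import shutil

    def backup(p):
        # p can be a plain str or a path object; str() is a no-op on the former
        shutil.copy(str(p), str(p) + '.bak')

Old code keeps passing strings, new code can start passing path objects, and both take the same route.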
-- Software development and contracting: http://pro.pitrou.net From ram.rachum at gmail.com Fri Oct 12 22:27:41 2012 From: ram.rachum at gmail.com (Ram Rachum) Date: Fri, 12 Oct 2012 13:27:41 -0700 (PDT) Subject: [Python-ideas] Is there a good reason to use * for multiplication? Message-ID: <11ad8f01-0383-4b64-b0f0-78bb0e2f9308@googlegroups.com> Hi everybody, Today a funny thought occurred to me. Ever since I've learned to program when I was a child, I've taken for granted that when programming, the sign used for multiplication is *. But now that I think about it, why? Now that we have Unicode, why not use ? ? Do you think that we can make Python support ? in addition to *? I can think of a couple of problems, but none of them seem like deal-breakers: - Backward compatibility: Python already uses *, but I don't see a backward compatibility problem with supporting ? additionally. Let people use whichever they want, like spaces and tabs. - Input methods: I personally use an IDE that could be easily set to automatically convert * to ? where appropriate and to allow manual input of ?. People on Linux can type Alt-. . Anyone else can set up a script that'll let them type ? using whichever keyboard combination they want. I admit this is pretty annoying, but since you can always use * if you want to, I figure that anyone who cares enough about using ? instead of * (I bet that people in scientific computing would like that) would be willing to take the time to set it up. What do you think? Ram. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ram.rachum at gmail.com Fri Oct 12 22:37:47 2012 From: ram.rachum at gmail.com (Ram Rachum) Date: Fri, 12 Oct 2012 22:37:47 +0200 Subject: [Python-ideas] Is there a good reason to use * for multiplication? In-Reply-To: References: <11ad8f01-0383-4b64-b0f0-78bb0e2f9308@googlegroups.com> Message-ID: On Fri, Oct 12, 2012 at 10:34 PM, Mike Graham wrote: > On Fri, Oct 12, 2012 at 4:27 PM, Ram Rachum wrote: > > Hi everybody, > > > > Today a funny thought occurred to me. Ever since I've learned to program > > when I was a child, I've taken for granted that when programming, the > sign > > used for multiplication is *. But now that I think about it, why? Now > that > > we have Unicode, why not use ? ? > > > > Do you think that we can make Python support ? in addition to *? > > > > I can think of a couple of problems, but none of them seem like > > deal-breakers: > > > > - Backward compatibility: Python already uses *, but I don't see a > backward > > compatibility problem with supporting ? additionally. Let people use > > whichever they want, like spaces and tabs. > > - Input methods: I personally use an IDE that could be easily set to > > automatically convert * to ? where appropriate and to allow manual input > of > > ?. People on Linux can type Alt-. . Anyone else can set up a script > that'll > > let them type ? using whichever keyboard combination they want. I admit > this > > is pretty annoying, but since you can always use * if you want to, I > figure > > that anyone who cares enough about using ? instead of * (I bet that > people > > in scientific computing would like that) would be willing to take the > time > > to set it up. > > > > > > What do you think? > > > > > > Ram > > Python should not expect characters that are hard for most people to > type. No one will be forced to type it. If you can't type it, use *. > Python should not expect characters that are still hard to > display on many common platforms. 
> We allow people to have unicode variable names, if they wish, don't we? So why not allow them to use unicode operator, if they wish, as a completely optional thing? > > I think you'll find strong opposition to adding any non-ASCII > characters or characters that don't occur on almost all keyboards as > part of the language. > > Mike > -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Fri Oct 12 22:43:24 2012 From: guido at python.org (Guido van Rossum) Date: Fri, 12 Oct 2012 13:43:24 -0700 Subject: [Python-ideas] The async API of the future: Reactors In-Reply-To: References: <20121012203311.4b3ee8af@pitrou.net> Message-ID: [Responding to a different message that also pertains to the reactors thread] On Thu, Oct 11, 2012 at 6:38 PM, Mark Adam wrote: > Here's the thing: the underlying O.S is always handling two major I/O > channels at any given time and it needs all it's attention to do this: > the GUI and one of the following (network, file) I/O. You can > shuffle these around all you want, but somewhere the O.S. kernel is > going to have to be involved, which means either portability is > sacrificed or speed if one is going to pursue and abstract, unified > async API. I'm convinced that the OS has to get involved. I'm not convinced that it will get in the way of designing an abstract unified API -- however that API will have to be more complicated than the kind of event loop that *only* handles network I/O or the kind that *only* handles GUI events. I wonder if Windows' IOCP API that was mentioned before in the parent thread wouldn't be able to handle both though. Windows' event concept seems more general than sockets or GUI events. However I don't know if this is actually how GUI events are handled in Windows. >> You should talk to a Tcl/Tk user (if there are any left :-). > > I used to be one of those :) So tell us more about the user experience of having a standard event loop always available in the language, and threads, network I/O and GUI events all integrated. What worked, what didn't? What did you wish had been different? -- --Guido van Rossum (python.org/~guido) From ram.rachum at gmail.com Fri Oct 12 22:45:40 2012 From: ram.rachum at gmail.com (Ram Rachum) Date: Fri, 12 Oct 2012 22:45:40 +0200 Subject: [Python-ideas] Is there a good reason to use * for multiplication? In-Reply-To: References: <11ad8f01-0383-4b64-b0f0-78bb0e2f9308@googlegroups.com> Message-ID: On Fri, Oct 12, 2012 at 10:40 PM, Blake Hyde wrote: > Is anything gained from this addition? To give a practical answer, I could say that for newbies it's one small confusion that could removed from the language. You and I have been programming for a long time so we take it for granted that * means multiplication, but for any other person that's just another weird idiosyncrasy that further alienates programming. Also, I think that using * for multiplication is ugly. > > On Fri, Oct 12, 2012 at 4:37 PM, Ram Rachum wrote: > > > > > > On Fri, Oct 12, 2012 at 10:34 PM, Mike Graham > wrote: > >> > >> On Fri, Oct 12, 2012 at 4:27 PM, Ram Rachum > wrote: > >> > Hi everybody, > >> > > >> > Today a funny thought occurred to me. Ever since I've learned to > program > >> > when I was a child, I've taken for granted that when programming, the > >> > sign > >> > used for multiplication is *. But now that I think about it, why? Now > >> > that > >> > we have Unicode, why not use ? ? > >> > > >> > Do you think that we can make Python support ? in addition to *? 
> >> > > >> > I can think of a couple of problems, but none of them seem like > >> > deal-breakers: > >> > > >> > - Backward compatibility: Python already uses *, but I don't see a > >> > backward > >> > compatibility problem with supporting ? additionally. Let people use > >> > whichever they want, like spaces and tabs. > >> > - Input methods: I personally use an IDE that could be easily set to > >> > automatically convert * to ? where appropriate and to allow manual > input > >> > of > >> > ?. People on Linux can type Alt-. . Anyone else can set up a script > >> > that'll > >> > let them type ? using whichever keyboard combination they want. I > admit > >> > this > >> > is pretty annoying, but since you can always use * if you want to, I > >> > figure > >> > that anyone who cares enough about using ? instead of * (I bet that > >> > people > >> > in scientific computing would like that) would be willing to take the > >> > time > >> > to set it up. > >> > > >> > > >> > What do you think? > >> > > >> > > >> > Ram > >> > >> Python should not expect characters that are hard for most people to > >> type. > > > > > > No one will be forced to type it. If you can't type it, use *. > > > > > >> > >> Python should not expect characters that are still hard to > >> display on many common platforms. > > > > > > We allow people to have unicode variable names, if they wish, don't we? > So > > why not allow them to use unicode operator, if they wish, as a completely > > optional thing? > > > >> > >> > >> I think you'll find strong opposition to adding any non-ASCII > >> characters or characters that don't occur on almost all keyboards as > >> part of the language. > >> > >> Mike > > > > > > > > _______________________________________________ > > Python-ideas mailing list > > Python-ideas at python.org > > http://mail.python.org/mailman/listinfo/python-ideas > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From dickinsm at gmail.com Fri Oct 12 22:46:00 2012 From: dickinsm at gmail.com (Mark Dickinson) Date: Fri, 12 Oct 2012 21:46:00 +0100 Subject: [Python-ideas] checking for identity before comparing built-in objects In-Reply-To: References: <506D94EE.30808@pearwood.info> <50723BE5.3060300@nedbatchelder.com> <50733A18.10400@nedbatchelder.com> <20121009043236.GI27445@ando> <5074489B.6000003@stoneleaf.us> <50744CBE.4010600@pearwood.info> <5074CBF2.8070507@pearwood.info> <50761EE7.8060103@pearwood.info> Message-ID: On Fri, Oct 12, 2012 at 8:42 PM, Tim Peters wrote: > In part it's to preserve various identities, such as that > sqrt(conjugate(z)) is the same as conjugate(sqrt(z)). When z is +0, > that becomes > > sqrt(conjugate(+0)) same_as conjugate(sqrt(+0)) > > which is > > sqrt(-0) same_as conjugate(+0) > > which is > > sqrt(-0) same as -0 > > Conviced? Not really. :-) In fact, it's exactly that paper that made me think sqrt(-0.0) -> -0.0 is suspect. The way I read it, the argument from the paper implies that cmath.sqrt(complex(0.0, -0.0)) should be complex(0.0, -0.0), which I have no problem with---it makes things nice and neat: quadrants 1 and 2 in the complex plane map to quadrant 1, and quadrants 3 and 4 to quadrant 4, with the signs of the zeros making it clear what 'quadrant' means in all (non-nan) cases. But I don't see how to get from there to math.sqrt(-0.0) being -0.0. It's exactly the mismatch between the real and complex math that makes no sense to me: math.sqrt(-0.0) should resemble cmath.sqrt(complex(-0.0, +/-0.0)). 
But the latter, quite reasonably, is complex(0.0, +/-0.0) (at least according to both Kahan and C99 Annex G), while the former is specified to be -0.0 in IEEE 754. -- Mark From joshua.landau.ws at gmail.com Fri Oct 12 22:45:58 2012 From: joshua.landau.ws at gmail.com (Joshua Landau) Date: Fri, 12 Oct 2012 21:45:58 +0100 Subject: [Python-ideas] PEP 428: poll about the joining syntax In-Reply-To: References: <20121008204707.48559bf9@pitrou.net> <50784504.2080801@stoneleaf.us> Message-ID: On 12 October 2012 20:39, Georg Brandl wrote: > Am 12.10.2012 18:27, schrieb Ethan Furman: > > Georg Brandl wrote: > >> Am 12.10.2012 14:45, schrieb Blake Hyde: > >>> I'm a Python developer rather than a developer of Python, but I'd like > to ask a > >>> question about this option (and implicitly vote against it, I > suppose); if you > >>> specialize a method name, such as .pathjoin, aren't you implying that > methods > >>> must be unambiguous even across types and classes? This seems > negative. Even > >>> if .join is already used for strings, it also makes sense for this use > case. > >> > >> Of course different classes can have methods of the same name. > >> > >> The issue here is that due to the similarity (and interchangeability) > of path > >> objects and strings it is likely that people get them mixed up every > now and > >> then, and if .join() works on both objects the failure mode (strange > result > >> from str.join when you expected path.join) is horribly confusing. > > > > I don't understand the "horribly confusing" part. Sure, when I got them > > mixed up and ended up with a plain ol' string instead of a really cool > > Path it took a moment to figure out where I had made the error, but the > > traceback of "AttributeError: 'str' object has no attribute 'path'" left > > absolutely no room for confusion as to what the problem was. > > "no attribute 'path'"? Not sure where that exception comes from. > This is what I meant: > > >>> p = Path('/usr') > >>> p.join('lib') > Path('/usr/lib') > > >>> p = '/usr' > >>> p.join('lib') > 'l/usri/usrb' > I don't know about you, but I found that so horribly confusing I had to check the output. I'm just not used to thinking of str.join(str) as sensible, and I could not for the life of me see where the output 'l/usri/usrb' came from. Where was "lib"? I might just have been an idiot for a minute, but it'll just get harder in real code. And I'm not the worst for stupid mistakes: we wan't newbies to be able (and want) to use the built in path modules. When they come back wondering why > homepath.join("joshua").join(".config") returned > > > '.j/homeo/homes/homeh/homeu/homeacj/homeo/homes/homeh/homeu/homeaoj/homeo/homes/homeh/homeu/homeanj/homeo/homes/homeh/homeu/homeafj/homeo/homes/homeh/homeu/homeaij/homeo/homes/homeh/homeu/homeag' we are going to have a problem. So I agree with you [Georg Brandl] here. I would even rather .pathjoin/.joinpath than .join despite the utterly painful name*. * As others have stated, if you like it why not .strjoin and .dictupdate and .listappend? -------------- next part -------------- An HTML attachment was scrubbed... URL: From mikegraham at gmail.com Fri Oct 12 22:46:39 2012 From: mikegraham at gmail.com (Mike Graham) Date: Fri, 12 Oct 2012 16:46:39 -0400 Subject: [Python-ideas] Is there a good reason to use * for multiplication? 
In-Reply-To: References: <11ad8f01-0383-4b64-b0f0-78bb0e2f9308@googlegroups.com> Message-ID: On Fri, Oct 12, 2012 at 4:37 PM, Ram Rachum wrote: > > > On Fri, Oct 12, 2012 at 10:34 PM, Mike Graham wrote: >> >> On Fri, Oct 12, 2012 at 4:27 PM, Ram Rachum wrote: >> > Hi everybody, >> > >> > Today a funny thought occurred to me. Ever since I've learned to program >> > when I was a child, I've taken for granted that when programming, the >> > sign >> > used for multiplication is *. But now that I think about it, why? Now >> > that >> > we have Unicode, why not use ? ? >> > >> > Do you think that we can make Python support ? in addition to *? >> > >> > I can think of a couple of problems, but none of them seem like >> > deal-breakers: >> > >> > - Backward compatibility: Python already uses *, but I don't see a >> > backward >> > compatibility problem with supporting ? additionally. Let people use >> > whichever they want, like spaces and tabs. >> > - Input methods: I personally use an IDE that could be easily set to >> > automatically convert * to ? where appropriate and to allow manual input >> > of >> > ?. People on Linux can type Alt-. . Anyone else can set up a script >> > that'll >> > let them type ? using whichever keyboard combination they want. I admit >> > this >> > is pretty annoying, but since you can always use * if you want to, I >> > figure >> > that anyone who cares enough about using ? instead of * (I bet that >> > people >> > in scientific computing would like that) would be willing to take the >> > time >> > to set it up. >> > >> > >> > What do you think? >> > >> > >> > Ram >> >> Python should not expect characters that are hard for most people to >> type. > > > No one will be forced to type it. If you can't type it, use *. > > >> >> Python should not expect characters that are still hard to >> >> display on many common platforms. > > > We allow people to have unicode variable names, if they wish, don't we? So > why not allow them to use unicode operator, if they wish, as a completely > optional thing? 1. Non-ASCII unicode identifiers are heavily discouraged in most contexts and are not present anywhere in the core language or stdlib for a reason. 2. Having duplicative features where neither is encouraged is a bad idea. "There should be one-- and preferably only one --obvious way to do it." This is doubly true when one of the ways makes it harder for people to read and edit code others' wrote. Mike From mikegraham at gmail.com Fri Oct 12 22:49:18 2012 From: mikegraham at gmail.com (Mike Graham) Date: Fri, 12 Oct 2012 16:49:18 -0400 Subject: [Python-ideas] Is there a good reason to use * for multiplication? In-Reply-To: References: <11ad8f01-0383-4b64-b0f0-78bb0e2f9308@googlegroups.com> Message-ID: On Fri, Oct 12, 2012 at 4:45 PM, Ram Rachum wrote: > On Fri, Oct 12, 2012 at 10:40 PM, Blake Hyde wrote: >> >> Is anything gained from this addition? > > > To give a practical answer, I could say that for newbies it's one small > confusion that could removed from the language. You and I have been > programming for a long time so we take it for granted that * means > multiplication, but for any other person that's just another weird > idiosyncrasy that further alienates programming. > > Also, I think that using * for multiplication is ugly. You're emphatically not getting rid of *, though, which means 1) you're only making it harder for new people to learn and deal with, and b) you're at best not eliminating any perceived ugliness, in reality probably compounding it. 
Mike From ethan at stoneleaf.us Fri Oct 12 22:33:14 2012 From: ethan at stoneleaf.us (Ethan Furman) Date: Fri, 12 Oct 2012 13:33:14 -0700 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: <20121012214224.55f3ed27@pitrou.net> References: <20121005202534.5f721292@pitrou.net> <20121006190821.02ae50cd@pitrou.net> <387F855A-B5BD-491D-9FA5-918F4042332F@gmail.com> <50707AD5.3030407@stoneleaf.us> <50707F91.6060301@stoneleaf.us> <50786E42.6050308@stoneleaf.us> <20121012214224.55f3ed27@pitrou.net> Message-ID: <50787E8A.2090804@stoneleaf.us> Antoine Pitrou wrote: > On Fri, 12 Oct 2012 12:23:46 -0700 > Ethan Furman wrote: >> Which is why I would like to see Path based on str, despite Guido's >> misgivings. (Yes, I know I'm probably tilting at windmills here...) >> >> If Path is string based we get backwards compatibility with all the os >> and third-party tools that expect and use strings; this would allow a >> gentle migration to using them, as opposed to the all-or-nothing if Path >> is a completely new type. > > It is not all-or-nothing since you can just call str() and it will work > fine with both strings and paths. D'oh. You're correct, of course. What I was thinking was along the lines of: --> some_table = Path('~/addresses.dbf') --> some_table = os.path.expanduser(some_table) vs --> some_table = Path('~/addresses.dbf') --> some_table = Path(os.path.expanduser(str(some_table))) The Path/str sandwich is ackward, as well as verbose. ~Ethan~ From solipsis at pitrou.net Fri Oct 12 22:53:06 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 12 Oct 2012 22:53:06 +0200 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths References: <20121005202534.5f721292@pitrou.net> <20121006190821.02ae50cd@pitrou.net> <387F855A-B5BD-491D-9FA5-918F4042332F@gmail.com> <50707AD5.3030407@stoneleaf.us> <50707F91.6060301@stoneleaf.us> <50786E42.6050308@stoneleaf.us> <20121012214224.55f3ed27@pitrou.net> <50787E8A.2090804@stoneleaf.us> Message-ID: <20121012225306.295d93e6@pitrou.net> On Fri, 12 Oct 2012 13:33:14 -0700 Ethan Furman wrote: > Antoine Pitrou wrote: > > On Fri, 12 Oct 2012 12:23:46 -0700 > > Ethan Furman wrote: > >> Which is why I would like to see Path based on str, despite Guido's > >> misgivings. (Yes, I know I'm probably tilting at windmills here...) > >> > >> If Path is string based we get backwards compatibility with all the os > >> and third-party tools that expect and use strings; this would allow a > >> gentle migration to using them, as opposed to the all-or-nothing if Path > >> is a completely new type. > > > > It is not all-or-nothing since you can just call str() and it will work > > fine with both strings and paths. > > D'oh. You're correct, of course. > > What I was thinking was along the lines of: > > --> some_table = Path('~/addresses.dbf') > --> some_table = os.path.expanduser(some_table) > > vs > > > --> some_table = Path('~/addresses.dbf') > --> some_table = Path(os.path.expanduser(str(some_table))) Hey, nice catch, I need to add a expanduser()-alike to the Path API. Thank you! Antoine. 
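P.S. Something along these lines is what I have in mind (the exact spelling isn't settled, and the expanded result is obviously machine-dependent):

    >>> Path('~/addresses.dbf').expanduser()
    Path('/home/antoine/addresses.dbf')

That should get rid of the Path/str sandwich.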
-- Software development and contracting: http://pro.pitrou.net From joshua.landau.ws at gmail.com Fri Oct 12 23:03:03 2012 From: joshua.landau.ws at gmail.com (Joshua Landau) Date: Fri, 12 Oct 2012 22:03:03 +0100 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: <50787E8A.2090804@stoneleaf.us> References: <20121005202534.5f721292@pitrou.net> <20121006190821.02ae50cd@pitrou.net> <387F855A-B5BD-491D-9FA5-918F4042332F@gmail.com> <50707AD5.3030407@stoneleaf.us> <50707F91.6060301@stoneleaf.us> <50786E42.6050308@stoneleaf.us> <20121012214224.55f3ed27@pitrou.net> <50787E8A.2090804@stoneleaf.us> Message-ID: On 12 October 2012 21:33, Ethan Furman wrote: > Antoine Pitrou wrote: > >> On Fri, 12 Oct 2012 12:23:46 -0700 >> Ethan Furman wrote: >> >>> Which is why I would like to see Path based on str, despite Guido's >>> misgivings. (Yes, I know I'm probably tilting at windmills here...) >>> >>> If Path is string based we get backwards compatibility with all the os >>> and third-party tools that expect and use strings; this would allow a >>> gentle migration to using them, as opposed to the all-or-nothing if Path is >>> a completely new type. >>> >> >> It is not all-or-nothing since you can just call str() and it will work >> fine with both strings and paths. >> > > D'oh. You're correct, of course. > > What I was thinking was along the lines of: > > --> some_table = Path('~/addresses.dbf') > --> some_table = os.path.expanduser(some_table) > > vs > > > --> some_table = Path('~/addresses.dbf') > --> some_table = Path(os.path.expanduser(str(**some_table))) > > The Path/str sandwich is ackward, as well as verbose. A lot of them might end up inadvertently converting back to a pure string as well, so a better comparison will in many places be: some_table = Path('~/addresses.dbf') > some_table = Path(os.path.expanduser(some_table)) > vs some_table = Path('~/addresses.dbf') > some_table = Path(os.path.expanduser(str(**some_table))) which is only five characters different. I would also prefer: some_table = Path('~/addresses.dbf') > some_table = Path(os.path.expanduser(some_table.raw())) or some other method. It just looks nicer to me in this case. Maybe .str(), .chars() or.text(). Additionally, if this is too painful and too often used, we can always make an auxiliary function. some_table = Path('~/addresses.dbf') > some_table = some_table.str_apply(os.path.expanduser) Where .str_apply takes (func, *args, **kwargs) and you need to wrap the function if it takes the path at a different position. I don't particularly like this option, but it exists. -------------- next part -------------- An HTML attachment was scrubbed... URL: From python at mrabarnett.plus.com Fri Oct 12 23:12:27 2012 From: python at mrabarnett.plus.com (MRAB) Date: Fri, 12 Oct 2012 22:12:27 +0100 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: <20121012214224.55f3ed27@pitrou.net> References: <20121005202534.5f721292@pitrou.net> <20121006190821.02ae50cd@pitrou.net> <387F855A-B5BD-491D-9FA5-918F4042332F@gmail.com> <50707AD5.3030407@stoneleaf.us> <50707F91.6060301@stoneleaf.us> <50786E42.6050308@stoneleaf.us> <20121012214224.55f3ed27@pitrou.net> Message-ID: <507887BB.7060002@mrabarnett.plus.com> On 2012-10-12 20:42, Antoine Pitrou wrote: > On Fri, 12 Oct 2012 12:23:46 -0700 > Ethan Furman wrote: >> >> Which is why I would like to see Path based on str, despite Guido's >> misgivings. (Yes, I know I'm probably tilting at windmills here...) 
>> >> If Path is string based we get backwards compatibility with all the os >> and third-party tools that expect and use strings; this would allow a >> gentle migration to using them, as opposed to the all-or-nothing if Path >> is a completely new type. > > It is not all-or-nothing since you can just call str() and it will work > fine with both strings and paths. > The disadvantage of using str is that it will also convert non-path objects to strings, possibly changing the result of the call: >>> os.path.isdir(1) Traceback (most recent call last): File "", line 1, in os.path.isdir(1) TypeError: 'int' does not support the buffer interface >>> os.path.isdir(str(1)) False From ethan at stoneleaf.us Fri Oct 12 23:00:32 2012 From: ethan at stoneleaf.us (Ethan Furman) Date: Fri, 12 Oct 2012 14:00:32 -0700 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: <20121012225306.295d93e6@pitrou.net> References: <20121005202534.5f721292@pitrou.net> <20121006190821.02ae50cd@pitrou.net> <387F855A-B5BD-491D-9FA5-918F4042332F@gmail.com> <50707AD5.3030407@stoneleaf.us> <50707F91.6060301@stoneleaf.us> <50786E42.6050308@stoneleaf.us> <20121012214224.55f3ed27@pitrou.net> <50787E8A.2090804@stoneleaf.us> <20121012225306.295d93e6@pitrou.net> Message-ID: <507884F0.2060608@stoneleaf.us> Antoine Pitrou wrote: > Hey, nice catch, I need to add a expanduser()-alike to the Path API. > > Thank you! You're welcome. :p My point about the Path(...(str(...))) sandwich still applies, though, for every function that isn't built in to Path. :) ~Ethan~ From joshua.landau.ws at gmail.com Fri Oct 12 23:16:13 2012 From: joshua.landau.ws at gmail.com (Joshua Landau) Date: Fri, 12 Oct 2012 22:16:13 +0100 Subject: [Python-ideas] checking for identity before comparing built-in objects In-Reply-To: References: <506D94EE.30808@pearwood.info> <50723BE5.3060300@nedbatchelder.com> <50733A18.10400@nedbatchelder.com> <20121009043236.GI27445@ando> <5074489B.6000003@stoneleaf.us> <50744CBE.4010600@pearwood.info> <5074CBF2.8070507@pearwood.info> <50761EE7.8060103@pearwood.info> Message-ID: Thank you all for being so thorough. I think I'm sated for tonight. ^^ With all due respect, Joshua Landau -------------- next part -------------- An HTML attachment was scrubbed... URL: From victor.stinner at gmail.com Fri Oct 12 23:31:23 2012 From: victor.stinner at gmail.com (Victor Stinner) Date: Fri, 12 Oct 2012 23:31:23 +0200 Subject: [Python-ideas] Is there a good reason to use * for multiplication? In-Reply-To: <11ad8f01-0383-4b64-b0f0-78bb0e2f9308@googlegroups.com> References: <11ad8f01-0383-4b64-b0f0-78bb0e2f9308@googlegroups.com> Message-ID: 2012/10/12 Ram Rachum : > What do you think? It's maybe time to implement http://python.org/dev/peps/pep-3117/ Victor From ethan at stoneleaf.us Fri Oct 12 23:37:53 2012 From: ethan at stoneleaf.us (Ethan Furman) Date: Fri, 12 Oct 2012 14:37:53 -0700 Subject: [Python-ideas] Is there a good reason to use * for multiplication? In-Reply-To: <11ad8f01-0383-4b64-b0f0-78bb0e2f9308@googlegroups.com> References: <11ad8f01-0383-4b64-b0f0-78bb0e2f9308@googlegroups.com> Message-ID: <50788DB1.4090809@stoneleaf.us> Ram Rachum wrote: > Hi everybody, > > Today a funny thought occurred to me. Ever since I've learned to program > when I was a child, I've taken for granted that when programming, the > sign used for multiplication is *. But now that I think about it, why? > Now that we have Unicode, why not use ? ? Because it is too easy to confuse ? with . Because it is not solving a problem. 
Because it would still take work, and then easily cause confusion. In short, I don't see it happening. ~Ethan~ From tarek at ziade.org Fri Oct 12 23:57:06 2012 From: tarek at ziade.org (=?ISO-8859-1?Q?Tarek_Ziad=E9?=) Date: Fri, 12 Oct 2012 22:57:06 +0100 Subject: [Python-ideas] Is there a good reason to use * for multiplication? In-Reply-To: References: <11ad8f01-0383-4b64-b0f0-78bb0e2f9308@googlegroups.com> Message-ID: <50789232.3050007@ziade.org> On 10/12/12 10:31 PM, Victor Stinner wrote: > 2012/10/12 Ram Rachum : >> What do you think? > It's maybe time to implement > http://python.org/dev/peps/pep-3117/ I'd use ? ?? to speed up a piece code, not for exceptions... > Victor > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Sat Oct 13 00:11:54 2012 From: guido at python.org (Guido van Rossum) Date: Fri, 12 Oct 2012 15:11:54 -0700 Subject: [Python-ideas] The async API of the future: Twisted and Deferreds Message-ID: [This is the third spin-off thread from "asyncore: included batteries don't fit"] On Thu, Oct 11, 2012 at 9:29 PM, Devin Jeanpierre wrote: > On Thu, Oct 11, 2012 at 7:37 PM, Guido van Rossum wrote: >> On Thu, Oct 11, 2012 at 3:42 PM, Devin Jeanpierre >> wrote: >>> Could you be more specific? I've never heard Deferreds in particular >>> called "arcane". They're very popular in e.g. the JS world, >> >> Really? Twisted is used in the JS world? Or do you just mean the >> pervasiveness of callback style async programming? > > Ah, I mean Deferreds. I attended a talk earlier this year all about > deferreds in JS, and not a single reference to Python or Twisted was > made! > > These are the examples I remember mentioned in the talk: > > - http://api.jquery.com/category/deferred-object/ (not very twistedish > at all, ill-liked by the speaker) > - http://mochi.github.com/mochikit/doc/html/MochiKit/Async.html (maybe > not a good example, mochikit tries to be "python in JS") > - http://dojotoolkit.org/reference-guide/1.8/dojo/Deferred.html > - https://github.com/kriskowal/q (also includes an explanation of why > the author likes deferreds) > > There were a few more that the speaker mentioned, but didn't cover. > One of his points was that the various systems of deferreds are subtly > different, some very badly so, and that it was a mess, but that > deferreds were still awesome. JS is a language where async programming > is mainstream, so lots of people try to make it easier, and they all > do it slightly differently. Thanks for those links. I followed the kriskowal/q link and was reminded of why Twisted's Deferreds are considered more awesome than Futures: it's the chaining. BUT... That's only important if callbacks are all the language lets you do! If your baseline is this: step1(function (value1) { step2(value1, function(value2) { step3(value2, function(value3) { step4(value3, function(value4) { // Do something with value4 }); }); }); }); then of course the alternative using Deferred looks better: Q.fcall(step1) .then(step2) .then(step3) .then(step4) .then(function (value4) { // Do something with value4 }, function (error) { // Handle any error from step1 through step4 }) .end(); (Both quoted literally from the kriskowal/q link.) 
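For illustration only -- step1 through step4 and the two handlers below are placeholders, not real APIs -- the same pipeline spelled with Twisted's Deferred API would look roughly like this:

from twisted.internet import defer

d = defer.maybeDeferred(step1)   # run step1, wrap its result (or exception) in a Deferred
d.addCallback(step2)             # each callback receives the previous step's result
d.addCallback(step3)
d.addCallback(step4)
d.addCallback(do_something)      # do something with value4
d.addErrback(handle_error)       # handle any error from step1 through step4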
I also don't doubt that using classic Futures you can't do this -- the chaining really matter for this style, and I presume this (modulo unimportant API differences) is what typical Twisted code looks like. However, Python has yield, and you can do much better (I'll write plain yield for now, but it works the same with yield-from): try: value1 = yield step1() value2 = yield step2(value1) value3 = yield step3(value2) # Do something with value4 except Exception: # Handle any error from step1 through step4 There's an outer function missing here, since you can't have a toplevel yield; I think that's the same for the JS case, typically. Also, strictly speaking the "Do something with value4" code should probably be in an else: clause after the except handler. But that actually leads nicely to the advantage: This form is more flexible, since it is easier to catch different exceptions at different points. It is also much easier to pass extra information around. E.g. what if your flow ends up having to pass both value1 and value2 into step3()? Sure, you can do that by making value2 a tuple (or a dict, or an object) incorporating value1 and the original value2, but that's exactly where this style becomes cumbersome, whereas in the yield-based form, such things can remain simple local variables. All in all I find it more readable. In the past, when I pointed this out to Twisted aficionados, the responses usually were a mix of "sure, if you like that style, we got it covered, Twisted has inlineCallbacks," and "but that only works for the simple cases, for the real stuff you still need Deferreds." But that really sounds to me like Twisted people just liking what they've got and not wanting to change. Which I understand -- I don't want to change either. But I also observe that a lot of people find bare Twisted-with-Deferreds too hard to grok, so they use Tornado instead, or they build a layer on top of either (like Monocle), or they go a completely different route and use greenlets/gevent instead -- and get amazing performance and productivity that way too, even though they know it's monkey-patching their asses off... So, in the end, for Python 3.4 and beyond, I want to promote a style that mixes simple callbacks (perhaps augmented with simple Futures) and generator-based coroutines (either PEP 342, yield/send-based, or PEP 380 yield-from-based). I'm looking to Twisted for the best reactors (see other thread). But for transport/protocol implementations I think that generator/coroutines offers a cleaner, better interface than incorporating Deferred. I hope that the path forward for Twisted will be simple enough: it should be possible to hook Deferred into the simpler callback APIs (perhaps a new implementation using some form of adaptation, but keeping the interface the same). In a sense, the greenlet/gevent crowd will be the biggest losers, since they currently write async code without either callbacks or yield, using microthreads instead. I wouldn't want to have to start putting yield back everywhere into that code. But the stdlib will still support yield-free blocking calls (even if under the hood some of these use yield/send-based or yield-from-based couroutines) so the monkey-patchey tradition can continue. >> That's one of the >> things I am desperately trying to keep out of Python, I find that >> style unreadable and unmanageable (whenever I click on a button in a >> website and nothing happens I know someone has a bug in their >> callbacks). 
I understand you feel different; but I feel the general >> sentiment is that callback-based async programming is even harder than >> multi-threaded programming (and nobody is claiming that threads are >> easy :-). > > :S > > There are (at least?) four different styles of asynchronous > computation used in Twisted, and you seem to be confused as to which > ones I'm talking about. > > 1. Explicit callbacks: > > For example, reactor.callLater(t, lambda: print("woo hoo")) I actually like this, as it's a lowest-common-denominator approach which everyone can easily adapt to their purposes. See the thread I started about reactors. > 2. Method dispatch callbacks: > > Similar to the above, the reactor or somebody has a handle on your > object, and calls methods that you've defined when events happen > e.g. IProtocol's dataReceived method While I'm sure it's expedient and captures certain common patterns well, I like this the least of all -- calling fixed methods on an object sounds like a step back; it smells of the old Java way (before it had some equivalent of anonymous functions), and of asyncore, which (nearly) everybody agrees is kind of bad due to its insistence that you subclass its classes. (Notice how subclassing as the prevalent approach to structuring your code has gotten into a lot of discredit since 1996.) > 3. Deferred callbacks: > > When you ask for something to be done, it's set up, and you get an > object back, which you can add a pipeline of callbacks to that will be > called whenever whatever happens > e.g. twisted.internet.threads.deferToThread(print, > "x").addCallback(print, "x was printed in some other thread!") Discussed above. > 4. Generator coroutines > > These are a syntactic wrapper around deferreds. If you yield a > deferred, you will be sent the result if the deferred succeeds, or an > exception if the deferred fails. > e.g. examples from previous message Seeing them as syntactic sugar for Deferreds is one way of looking at it; no doubt this is how they're seen in the Twisted community because Deferreds are older and more entrenched. But there's no requirement that an architecture has to have Deferreds in order to use generator coroutines -- simple Futures will do just fine, and Greg Ewing has shown that using yield-from you can even do without those. (But he does use simple, explicit callbacks at the lowest level of his system.) > I don't see a reason for the first to exist at all, the second one is > kind of nice in some circumstances (see below), but perhaps overused. > > I feel like you're railing on the first and second when I'm talking > about the third and fourth. I could be wrong. I think you're wrong -- I was (and am) most concerned about the perceived complexity of the API offered by, and the typical looks of code using, Deferreds (i.e., #3). >>> and possibly elsewhere. Moreover, they're extremely similar to futures, so >>> if one is arcane so is the other. >> >> I love Futures, they represent a nice simple programming model. But I >> especially love that you can write async code using Futures and >> yield-based coroutines (what you call inlineCallbacks) and never have >> to write an explicit callback function. Ever. > > The reason explicit non-deferred callbacks are involved in Twisted is > because of situations in which deferreds are not present, because of > past history in Twisted. It is not at all a limitation of deferreds or > something futures are better at, best as I'm aware. > > (In case that's what you're getting at.) I don't think I was. 
It's clear to me (now) that Futures are simpler than Deferreds -- and I like Futures better because of it, because for the complex cases I would much rather use generator coroutines than Deferreds. > Anyway, one big issue is that generator coroutines can't really > effectively replace callbacks everywhere. Consider the GUI button > example you gave. How do you write that as a coroutine? > > I can see it being written like this: > > def mycoroutine(gui): > while True: > clickevent = yield gui.mybutton1.on_click() > # handle clickevent > > But that's probably worse than using callbacks. I touched on this briefly in the reactor thread. Basically, GUI callbacks are often level-triggered rather than edge-triggered, and IIUC Deferreds are not great for that either; and in a few cases where edge-triggered coding makes sense I *would* like to use a generator coroutine. >>> Neither is clearly better or more obvious than the other. If anything >>> I generally find deferred composition more useful than deferred >>> tee-ing, so I feel like composition is the correct base operator, but >>> you could pick another. >> >> If you're writing long complicated chains of callbacks that benefit >> from these features, IMO you are already doing it wrong. I understand >> that this is a matter of style where I won't be able to convince you. >> But style is important to me, so let's agree to disagree. [In a follow-up to yourself, you quoted starting from this point and appended "Nevermind that whole segment." I'm keeping it in here just for context of the thread.] > This is more than a matter of style, so at least for now I'd like to > hold off on calling it even. > > In my day to day silly, synchronous, python code, I do lots of > synchronous requests. For example, it's not unreasonable for me to > want to load two different files from disk, or make several database > interactions, etc. If I want to make this asynchronous, I have to find > a way to execute multiple things that could hypothetically block, at > the same time. If I can't do that easily, then the asynchronous > solution has failed, because its entire purpose is to do everything > that I do synchronously, except without blocking the main thread. > > Here's an example with lots of synchronous requests in Django: > > def view_paste(request, filekey): > try: > fileinfo= Pastes.objects.get(key=filekey) > except DoesNotExist: > t = loader.get_template('pastebin/error.html') > return HttpResponse(t.render(Context(dict(error='File does not exist')))) > > f = open(fileinfo.filename) > fcontents = f.read() > t = loader.get_template('pastebin/paste.html') > return HttpResponse(t.render(Context(dict(file=fcontents)))) > > How many blocking requests are there? Lots. This is, in a word, a > long, complicated chain of synchronous requests. This is also very > similar to what actual django code might look like in some > circumstances. Even if we might think this is unreasonable, some > subset of alteration of this is reasonable. Certainly we should be > able to, say, load multiple (!) objects from the database, and open > the template (possibly from disk), all potentially-blocking > operations. > > This is inherently a long, complicated chain of requests, whether we > implement it asynchronously or synchronously, or use Deferreds or > Futures, or write it in Java or Python. Some parts can be done at any > time before the end (loader.get_template(...)), some need to be done > in a certain order, and there's branching depending on what happens in > different cases. 
In order to even write this code _at all_, we need a > way to chain these IO actions together. If we can't chain them > together, we can't produce that final synthesis of results at the end. [This is here you write "Ugh, just realized way after the fact that of course you meant callbacks, not composition. I feel dumb. Nevermind that whole segment."] I'd like to come back to that Django example though. You are implying that there are some opportunities for concurrency here, and I agree, assuming we believe disk I/O is slow enough to bother making it asynchronously. (In App Engine it's not, and we can't anyways, but in other contexts I agree that it would be bad if a slow disk seek were to hold up all processing -- not to mention that it might really be NFS...) The potentially async operations I see are: (1) fileinfo = Pastes.objects.get(key=filekey) # I assume this is some kind of database query (2) loader.get_template('pastebin/error.html') (3) f = open(fileinfo.filename) # depends on (1) (4) fcontents = f.read() # depends on (3) (5) loader.get_template('pastebin/paste.html') How would you code that using Twisted Deferreds? Using Futures and generator coroutines, I would do it as follows. I'm hypothesizing that for every blocking API foo() there is a corresponding non-blocking API foo_async() with the same call signature, and returning a Future whose result is what the synchronous API returns (and raises what the synchronous call would raise, if there's an error). These are the conventions I use in NDB. I'm also inventing a @task decorator. @task def view_paste_async(request, filekey): # Create Futures -- no yields! f1 = Pastes.objects.get_async(key=filekey) # This won't raise f2 = loader.get_template_async('pastebin/error.html') f3 = loader.get_template_async('pastebin/paste.html') try: fileinfo= yield f1 except DoesNotExist: t = yield f2 return HttpResponse(t.render(Context(dict(error='File does not exist')))) f = yield open_async(fileinfo.filename) fcontents = yield f.read_async() t = yield f3 return HttpResponse(t.render(Context(dict(file=fcontents)))) You could easily decide not to bother loading the error template asynchronously (assuming most requests don't fail), and you could move the creation of f3 below the try/except. But you get the idea. Even if you do everything serially, inserting the yields and _async calls would make this more parallellizable without the use of threads. (If you were using threads, all this would be moot of course -- but then your limit on requests being handled concurrently probably goes way down.) > We _need_ a pipeline or something computationally equivalent or more > powerful. Results from past "deferred computations" need to be passed > forward into future "deferred computations", in order to implement > this at all. Yeah, and I think that a single generator using multiple yields is the ideal pipeline to me (see my example near the top based on kriskowal/q). > This is not a style issue, this is an issue of needing to be able to > solve problems that involve more than one computation where the > results of every computation matters somewhere. It's just that in this > case, some of the computations are computed asynchronously. And I think generators do this very well. >> I am totally open to learning from Twisted's experience. 
I hope that >> you are willing to share even the end result might not look like >> Twisted at all -- after all in Python 3.3 we have "yield from" and >> return from a generator and many years of experience with different >> styles of async APIs. In addition to Twisted, there's Tornado and >> Monocle, and then there's the whole greenlets/gevent and >> Stackless/microthreads community that we can't completely ignore. I >> believe somewhere is an ideal async architecture, and I hope you can >> help us discover it. >> >> (For example, I am very interested in Twisted's experiences writing >> real-world performant, robust reactors.) > > For that stuff, you'd have to speak to the main authors of Twisted. > I'm just a twisted user. :( They seem to be mostly ignoring this conversation, so your standing in as a proxy for them is much appreciated! > In the end it really doesn't matter what API you go with. The Twisted > people will wrap it up so that they are compatible, as far as that is > possible. And I want to ensure that that is possible and preferably easy, if I can do it without introducing too many warts in the API that non-Twisted users see and use. > I hope I haven't detracted too much from the main thrust of the > surrounding discussion. Futures/deferreds are a pretty big tangent, so > sorry. I justified it to myself by figuring that it'd probably come up > anyway, somehow, since these are useful abstractions for asynchronous > programming. Not at all. This has been a valuable refresher for me! -- --Guido van Rossum (python.org/~guido) From nadeem.vawda at gmail.com Sat Oct 13 00:16:38 2012 From: nadeem.vawda at gmail.com (Nadeem Vawda) Date: Sat, 13 Oct 2012 00:16:38 +0200 Subject: [Python-ideas] PEP 428: poll about the joining syntax In-Reply-To: <20121008204707.48559bf9@pitrou.net> References: <20121008204707.48559bf9@pitrou.net> Message-ID: On Mon, Oct 8, 2012 at 8:47 PM, Antoine Pitrou wrote: > - `p[q]` joins path q to path p -1. Much less intuitive than the other two proposed operators. > - `p + q` joins path q to path p -1. Silently does the wrong thing if p and q are both strings. > - `p / q` joins path q to path p +1. Reads naturally, and fails loudly if p is a string. > - `p.join(q)` joins path q to path p -1. Produces a nonsensical result if p and q are both strings. I'd be +1 on `p.joinpath(q)`, since it doesn't have this problem. Cheers, Nadeem From breamoreboy at yahoo.co.uk Sat Oct 13 00:38:59 2012 From: breamoreboy at yahoo.co.uk (Mark Lawrence) Date: Fri, 12 Oct 2012 23:38:59 +0100 Subject: [Python-ideas] PEP 428: poll about the joining syntax In-Reply-To: <20121008204707.48559bf9@pitrou.net> References: <20121008204707.48559bf9@pitrou.net> Message-ID: On 08/10/2012 19:47, Antoine Pitrou wrote: > > Hello, > > Since there has been some controversy about the joining syntax used in > PEP 428 (filesystem path objects), I would like to run an informal poll > about it. Please answer with +1/+0/-0/-1 for each proposal: > > - `p[q]` joins path q to path p > - `p + q` joins path q to path p > - `p / q` joins path q to path p > - `p.join(q)` joins path q to path p > > (you can include a rationale if you want, but don't forget to vote :-)) > > Thank you > > Antoine. > > How about using the caret symbol to join so `p ^ q`? Rationale, it looks like a miniature combination of the backslash and forwardslash so should keep Windows and *nix camps happy, plus it's only used in Python (I think?) for bitwise operations so shouldn't confuse anybody. 
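Whichever spelling wins, it is only a dunder method on the path class; a toy sketch (a made-up Path class, not the PEP 428 implementation) showing how either operator would be wired up:

import posixpath

class Path:
    # Toy illustration only -- not the PEP 428 API.
    def __init__(self, raw):
        self._raw = str(raw)
    def __truediv__(self, other):   # p / q
        return Path(posixpath.join(self._raw, str(other)))
    def __xor__(self, other):       # p ^ q
        return Path(posixpath.join(self._raw, str(other)))
    def __repr__(self):
        return 'Path(%r)' % self._raw

Path('/usr') / 'local'   # Path('/usr/local')
Path('/usr') ^ 'local'   # Path('/usr/local')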
Parachute is ready for the antiaircraft fire :) -- Cheers. Mark Lawrence. From guido at python.org Sat Oct 13 00:49:36 2012 From: guido at python.org (Guido van Rossum) Date: Fri, 12 Oct 2012 15:49:36 -0700 Subject: [Python-ideas] The async API of the future: Reactors In-Reply-To: References: <20121012203311.4b3ee8af@pitrou.net> Message-ID: [Responding to yet another message in the original thread] On Thu, Oct 11, 2012 at 9:45 PM, Trent Nelson wrote: > On Thu, Oct 11, 2012 at 07:40:43AM -0700, Antoine Pitrou wrote: >> On Wed, 10 Oct 2012 20:55:23 -0400 Trent Nelson wrote: >> > You could leverage this with kqueue and epoll; have similar threads >> > set up to simply process I/O independent of the GIL, using the same >> > facilities that would be used by IOCP-processing threads. >> Would you really win anything by doing I/O in separate threads, while >> doing normal request processing in the main thread? > If the I/O threads can run independent of the GIL, yes, definitely. > The whole premise of IOCP is that the kernel takes care of waking > one of your I/O handlers when data is ready. IOCP allows that to > happen completely independent of your application's event loop. > > It really is the best way to do I/O. The Windows NT design team > got it right from the start. The AIX and Solaris implementations > are semantically equivalent to Windows, without the benefit of > automatic thread pool management (and a few other optimisations). > > On Linux and BSD, you could get similar functionality by spawning > I/O threads that could also run independent of the GIL. They would > differ from the IOCP worker threads in the sense that they all have > their own little event loops around epoll/kqueue+timeout. i.e. they > have to continually ask "is there anything to do with this set of > fds", then process the results, then manage set synchronisation. > > IOCP threads, on the other hand, wait for completion of something > that has already been requested. The thread body implementation is > significantly simpler, and no synchronisation primitives are needed. >> That said, the idea of a common API architected around async I/O, >> rather than non-blocking I/O, sounds interesting at least theoretically. (Oh, what a nice distinction.) > It's the best way to do it. There should really be a libevent-type > library (libiocp?) that leverages IOCP where possible, and fakes it > when not using a half-sync/half-async pattern with threads and epoll > or kqueue on Linux and FreeBSD, falling back to processes and poll > on everything else (NetBSD, OpenBSD and HP-UX (the former two not > having robust-enough pthread implementations, the latter not having > anything better than select or poll)). In which category does OS X fall? > However, given that the best IOCP implementations are a) Windows by > a huge margin, and then b) Solaris and AIX in equal, distant second > place, I can't see that happening any time soon. > > (Trying to use IOCP in the reactor fashion described above for epoll > and kqueue is far more limiting than having an IOCP-oriented API > and faking it for platforms where native support isn't available.) How close would our abstracted reactor interface have to be exactly like IOCP? The actual IOCP API calls have very little to recommend them -- it's the implementation and the architecture that we're after. But we want it to be able to use actual IOCP calls on all systems that have them. >> Maybe all those outdated Snakebite Operating Systems are useful for >> something after all. 
;-P > All the operating systems are the latest version available! > In addition, there's also a Solaris 9 and HP-UX 11iv2 box. > The hardware, on the other hand... not so new in some cases. -- --Guido van Rossum (python.org/~guido) From dreamingforward at gmail.com Sat Oct 13 00:56:06 2012 From: dreamingforward at gmail.com (Mark Adam) Date: Fri, 12 Oct 2012 17:56:06 -0500 Subject: [Python-ideas] Floating point contexts in Python core In-Reply-To: <50776C76.3040309@pearwood.info> References: <69C432CC-7BAC-4B55-8BC3-2BF5307FD29C@gmail.com> <50765D0E.4020001@canterbury.ac.nz> <5076AF00.1010902@pearwood.info> <50776C76.3040309@pearwood.info> Message-ID: >>> I would gladly give up a small amount of speed for better control >>> over floats, such as whether 1/0.0 raised an exception or >>> returned infinity. >> >> Umm, you would be giving up a *lot* of speed. Native floating point >> happens right in the processor, so if you want special behavior, you'd >> have to take the floating point out of hardware and into "user space". > > Even in user-space, you're not giving up that much speed in practical > terms, at least not for my needs. The new decimal module in Python 3.3 is > less than a factor of 10 times slower than Python's floats, which makes it > pretty much instantaneous to my mind :) Hmm, well, if it's only that much slower, then we should implement Rationals and get rid of the issue altogether. Mark From python at mrabarnett.plus.com Sat Oct 13 00:57:30 2012 From: python at mrabarnett.plus.com (MRAB) Date: Fri, 12 Oct 2012 23:57:30 +0100 Subject: [Python-ideas] Is there a good reason to use * for multiplication? In-Reply-To: <50788DB1.4090809@stoneleaf.us> References: <11ad8f01-0383-4b64-b0f0-78bb0e2f9308@googlegroups.com> <50788DB1.4090809@stoneleaf.us> Message-ID: <5078A05A.9030007@mrabarnett.plus.com> On 2012-10-12 22:37, Ethan Furman wrote: > Ram Rachum wrote: >> Hi everybody, >> >> Today a funny thought occurred to me. Ever since I've learned to program >> when I was a child, I've taken for granted that when programming, the >> sign used for multiplication is *. But now that I think about it, why? >> Now that we have Unicode, why not use ? ? > Why not use ? ? > Because it is too easy to confuse ? with . > Because it is too easy to confuse ? with x > Because it is not solving a problem. > Ditto. > Because it would still take work, and then easily cause confusion. > Ditto. > > > > > In short, I don't see it happening. > Ditto. From guido at python.org Sat Oct 13 01:22:39 2012 From: guido at python.org (Guido van Rossum) Date: Fri, 12 Oct 2012 16:22:39 -0700 Subject: [Python-ideas] The async API of the future: PEP 3153 (async-pep) Message-ID: [Hopefully this is the last spin-off thread from "asyncore: included batteries don't fit"] [LvH] >> > If there's one take away idea from async-pep, it's reusable protocols. [Guido] >> Is there a newer version that what's on >> http://www.python.org/dev/peps/pep-3153/ ? It seems to be missing any >> specific proposals, after spending a lot of time giving a rationale >> and defining some terms. The version on >> https://github.com/lvh/async-pep doesn't seem to be any more complete. [LvH] > Correct. So it's totally unfinished? > If I had to change it today, I'd throw out consumers and producers and just > stick to a protocol API. > > Do you feel that there should be less talk about rationale? No, but I feel that there should be some actual specification. 
I am also looking forward to an actual meaty bit of example code -- ISTR you mentioned you had something, but that it was incomplete, and I can't find the link. >> > The PEP should probably be a number of PEPs. At first sight, it seems >> > that this number is at least four: >> > >> > 1. Protocol and transport abstractions, making no mention of >> > asynchronous IO >> > (this is what I want 3153 to be, because it's small, manageable, and >> > virtually everyone appears to agree it's a fantastic idea) >> >> But the devil is in the details. *What* specifically are you >> proposing? How would you write a protocol handler/parser without any >> reference to I/O? Most protocols are two-way streets -- you read some >> stuff, and you write some stuff, then you read some more. (HTTP may be >> the exception here, if you don't keep the connection open.) > > It's not that there's *no* reference to IO: it's just that that reference is > abstracted away in data_received and the protocol's transport object, just > like Twisted's IProtocol. The words "data_received" don't even occur in the PEP. >> > 2. A base reactor interface >> >> I agree that this should be a separate PEP. But I do think that in >> practice there will be dependencies between the different PEPs you are >> proposing. > > Absolutely. > >> > 3. A way of structuring callbacks: probably deferreds with a built-in >> > inlineCallbacks for people who want to write synchronous-looking code >> > with >> > explicit yields for asynchronous procedures >> >> Your previous two ideas sound like you're not tied to backward >> compatibility with Tornado and/or Twisted (not even via an adaptation >> layer). Given that we're talking Python 3.4 here that's fine with me >> (though I think we should be careful to offer a path forward for those >> packages and their users, even if it means making changes to the >> libraries). > > I'm assuming that by previous ideas you mean points 1, 2: protocol interface > + reactor interface. Yes. > I don't see why twisted's IProtocol couldn't grow an adapter for stdlib > Protocols. Ditto for Tornado. Similarly, the reactor interface could be > *provided* (through a fairly simple translation layer) by different > implementations, including twisted. Right. >> But Twisted Deferred is pretty arcane, and I would much >> rather not use it as the basis of a forward-looking design. I'd much >> rather see what we can mooch off PEP 3148 (Futures). > > I think this needs to be addressed in a separate mail, since more stuff has > been said about deferreds in this thread. Yes, that's in the thread with subject "The async API of the future: Twisted and Deferreds". >> > 4+ adapting the stdlib tools to using these new things >> >> We at least need to have an idea for how this could be done. We're >> talking serious rewrites of many of our most fundamental existing >> synchronous protocol libraries (e.g. httplib, email, possibly even >> io.TextWrapper), most of which have had only scant updates even >> through the Python 3 transition apart from complications to deal with >> the bytes/str dichotomy. > > I certainly agree that this is a very large amount of work. However, it has > obvious huge advantages in terms of code reuse. I'm not sure if I understand > the technical barrier though. It should be quite easy to create a blocking > API with a protocol implementation that doesn't care; just call > data_received with all your data at once, and presto! 
(Since transports in > general don't provide guarantees as to how bytes will arrive, existing > Twisted IProtocols have to do this already anyway, and that seems to work > fine.) Hmm... I guess that depends on how your legacy code works. As Barry mentioned somewhere, the email package's feedparser() is an attempt at implementing this -- but he sounded he has doubts that it works as-is in an async environment. However I am more worried about pull-based APIs. Take (as an extreme example) the standard stream API for reading, especially TextIOWrapper. I could see how we could turn the *writing* APIs async easily enough, but I don't see how to do it for the reading end -- you can't seriously propose to read the entire file into the buffer and then satisfy all reads from memory. >> > Re: forward path for existing asyncore code. I don't remember this being >> > raised as an issue. If anything, it was mentioned in passing, and I think >> > the answer to it was something to the tune of "asyncore's API is broken, >> > fixing it is more important than backwards compat". Essentially I agree with >> > Guido that the important part is an upgrade path to a good third-party >> > library, which is the part about asyncore that REALLY sucks right now. >> >> I have the feeling that the main reason asyncore sucks is that it >> requires you to subclass its Dispatcher class, which has a rather >> treacherous interface. > > There's at least a few others, but sure, that's an obvious one. Many of the > objections I can raise however don't matter if there's already an *existing > working solution*. I mean, sure, it can't do SSL, but if you have code that > does what you want right now, then obviously SSL isn't actually needed. I think you mean this as an indication that providing the forward path for existing asyncore apps shouldn't be rocket science, right? Sure, I don't want to worry about that, I just want to make sure that we don't *completely* paint ourselves into the wrong corner when it comes to that. >> > Regardless, an API upgrade is probably a good idea. I'm not sure if it >> > should go in the first PEP: given the separation I've outlined above (which >> > may be too spread out...), there's no obvious place to put it besides it >> > being a new PEP. >> >> Aren't all your proposals API upgrades? > > Sorry, that was incredibly poor wording. I meant something more of an > adapter: an upgrade path for existing asyncore code to new and shiny 3153 > code. Yes, now it makes sense. >> > Re base reactor interface: drawing maximally from the lessons learned in >> > twisted, I think IReactorCore (start, stop, etc), IReactorTime (call later, >> > etc), asynchronous-looking name lookup, fd handling are the important >> > parts. >> >> That actually sounds more concrete than I'd like a reactor interface >> to be. In the App Engine world, there is a definite need for a >> reactor, but it cannot talk about file descriptors at all -- all I/O >> is defined in terms of RPC operations which have their own (several >> layers of) async management but still need to be plugged in to user >> code that might want to benefit from other reactor functionality such >> as scheduling and placing a call at a certain moment in the future. > > I have a hard time understanding how that would work well outside of > something like GAE. IIUC, that level of abstraction was chosen because it > made sense for GAE (and I don't disagree), but I'm not sure it makes sense > here. 
I think I answered this in the reactors thread -- I propose an I/O object abstraction that is not directly tied to a file descriptor, but for which a concrete implementation can be made to support file descriptors, and another to support App Engine RPC. > In this example, where would eg the select/epoll/whatever calls happen? Is > it something that calls the reactor that then in turn calls whatever? App Engine doesn't have select/epoll/whatever, so it would have a reactor implementation that doesn't use them. But the standard Unix reactor would support file descriptors using select/etc. Please respond in the reactors thread. >> > call_every can be implemented in terms of call_later on a separate object, >> > so I think it should be (eg twisted.internet.task.LoopingCall). One thing >> > that is apparently forgotten about is event loop integration. The prime way >> > of having two event loops cooperate is *NOT* "run both in parallel", it's >> > "have one call the other". Even though not all loops support this, I think >> > it's important to get this as part of the interface (raise an exception for >> > all I care if it doesn't work). >> >> This is definitely one of the things we ought to get right. My own >> thoughts are slightly (perhaps only cosmetically) different again: >> ideally each event loop would have a primitive operation to tell it to >> run for a little while, and then some other code could tie several >> event loops together. > > As an API, that's pretty close to Twisted's IReactorCore.iterate, I think. > It'd work well enough. The issue is only with event loops that don't > cooperate so well. Again, a topic for the reactor thread. But I'm really hoping you'll make good on your promise of redoing async-pep, giving some actual specifications and example code, so I can play with it. -- --Guido van Rossum (python.org/~guido) From shibturn at gmail.com Sat Oct 13 01:39:21 2012 From: shibturn at gmail.com (Richard Oudkerk) Date: Sat, 13 Oct 2012 00:39:21 +0100 Subject: [Python-ideas] The async API of the future: Twisted and Deferreds In-Reply-To: References: Message-ID: On 12/10/2012 11:11pm, Guido van Rossum wrote: > Using Futures and generator coroutines, I would do it as follows. I'm > hypothesizing that for every blocking API foo() there is a > corresponding non-blocking API foo_async() with the same call > signature, and returning a Future whose result is what the synchronous > API returns (and raises what the synchronous call would raise, if > there's an error). These are the conventions I use in NDB. I'm also > inventing a @task decorator. > > @task > def view_paste_async(request, filekey): > # Create Futures -- no yields! > f1 = Pastes.objects.get_async(key=filekey) # This won't raise > f2 = loader.get_template_async('pastebin/error.html') > f3 = loader.get_template_async('pastebin/paste.html') > > try: > fileinfo= yield f1 > except DoesNotExist: > t = yield f2 > return HttpResponse(t.render(Context(dict(error='File does not > exist')))) > > f = yield open_async(fileinfo.filename) > fcontents = yield f.read_async() > t = yield f3 > return HttpResponse(t.render(Context(dict(file=fcontents)))) So would the futures be registered with the reactor as soon as they are created, or only when they are yielded? I can't see how there can be any "concurrency" if they don't start till they are yielded. 
It would be like doing t1 = Thread(target=f1) t2 = Thread(target=f2) t3 = Thread(target=f3) t1.start() t1.join() t2.start() t2.join() t3.start() t3.join() But if the futures are registered immediately with the reactor then does that mean there is a singleton reactor? That seems rather inflexible. Richard. From greg.ewing at canterbury.ac.nz Sat Oct 13 02:01:23 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sat, 13 Oct 2012 13:01:23 +1300 Subject: [Python-ideas] The async API of the future: Reactors In-Reply-To: References: Message-ID: <5078AF53.3060505@canterbury.ac.nz> Guido van Rossum wrote: > - There's an abstract Reactor class and an abstract Async I/O object > class. Can we please use a better term than "reactor" for this? Its meaning is only obvious to someone familiar with Twisted. Not being such a person, it's taken me a while to figure out from this discussion that it refers to the central object implementing the event loop, and not one of the user-supplied objects that could equally well be described as "reacting" to events. Something like "dispatcher" would be clearer, IMO. -- Greg From greg.ewing at canterbury.ac.nz Sat Oct 13 02:17:28 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sat, 13 Oct 2012 13:17:28 +1300 Subject: [Python-ideas] The async API of the future: Reactors In-Reply-To: <20121012203311.4b3ee8af@pitrou.net> References: <20121012203311.4b3ee8af@pitrou.net> Message-ID: <5078B318.4010303@canterbury.ac.nz> Antoine Pitrou wrote: > On Fri, 12 Oct 2012 11:13:23 -0700 > Guido van Rossum wrote: > >>ote that the callback is *not* a >>designated method on the I/O object! > > Why isn't it? One reason might be that it more or less forces you to subclass the I/O object, instead of just using one of a few predefined ones for file, socket, etc. Although this could be ameliorated by giving the standard I/O objects the ability to have callbacks plugged into them. Then you could use whichever style was most convenient. -- Greg From guido at python.org Sat Oct 13 02:22:07 2012 From: guido at python.org (Guido van Rossum) Date: Fri, 12 Oct 2012 17:22:07 -0700 Subject: [Python-ideas] The async API of the future: Twisted and Deferreds In-Reply-To: References: Message-ID: On Fri, Oct 12, 2012 at 4:39 PM, Richard Oudkerk wrote: > On 12/10/2012 11:11pm, Guido van Rossum wrote: >> >> Using Futures and generator coroutines, I would do it as follows. I'm >> hypothesizing that for every blocking API foo() there is a >> corresponding non-blocking API foo_async() with the same call >> signature, and returning a Future whose result is what the synchronous >> API returns (and raises what the synchronous call would raise, if >> there's an error). These are the conventions I use in NDB. I'm also >> inventing a @task decorator. >> >> @task >> def view_paste_async(request, filekey): >> # Create Futures -- no yields! >> f1 = Pastes.objects.get_async(key=filekey) # This won't raise >> f2 = loader.get_template_async('pastebin/error.html') >> f3 = loader.get_template_async('pastebin/paste.html') >> >> try: >> fileinfo= yield f1 >> except DoesNotExist: >> t = yield f2 >> return HttpResponse(t.render(Context(dict(error='File does not exist')))) >> >> f = yield open_async(fileinfo.filename) >> fcontents = yield f.read_async() >> t = yield f3 >> return HttpResponse(t.render(Context(dict(file=fcontents)))) > > > So would the futures be registered with the reactor as soon as they are > created, or only when they are yielded? 
I can't see how there can be any > "concurrency" if they don't start till they are yielded. It would be like > doing > > t1 = Thread(target=f1) > t2 = Thread(target=f2) > t3 = Thread(target=f3) > t1.start() > t1.join() > t2.start() > t2.join() > t3.start() > t3.join() > > But if the futures are registered immediately with the reactor then does > that mean there is a singleton reactor? That seems rather inflexible. I don't think it follows that there can only be one reactor if they are registered immediately. There could be a notion of "current reactor" maintained in thread-local context; moreover it could depend on the reactor that made the callback that caused the current task to run. The reactor could also be chosen by the code that made the Future. (Though I'm not immediately sure how that would work in the yield-from scenario -- but I'm sure there's a way.) FWIW, in NDB there is one event loop per thread; separate threads are handling separate requests and are completely independent. Also, in NDB there's some code that turns Futures into actual RPCs that runs only once there are no more immediately runnable tasks. I think that in general such behaviors are up to the reactor implementation for the platform though, and should not directly be reflected in the reactor API. -- --Guido van Rossum (python.org/~guido) From mwm at mired.org Sat Oct 13 02:26:20 2012 From: mwm at mired.org (Mike Meyer) Date: Fri, 12 Oct 2012 19:26:20 -0500 Subject: [Python-ideas] Is there a good reason to use * for multiplication? In-Reply-To: <50788DB1.4090809@stoneleaf.us> References: <11ad8f01-0383-4b64-b0f0-78bb0e2f9308@googlegroups.com> <50788DB1.4090809@stoneleaf.us> Message-ID: On Fri, Oct 12, 2012 at 4:37 PM, Ethan Furman wrote: > Ram Rachum wrote: >> >> Hi everybody, >> >> Today a funny thought occurred to me. Ever since I've learned to program >> when I was a child, I've taken for granted that when programming, the sign >> used for multiplication is *. But now that I think about it, why? Now that >> we have Unicode, why not use ? ? > > > Because it is too easy to confuse ? with . > > Because it is not solving a problem. > > Because it would still take work, and then easily cause confusion. Because, unlike *, it's a valid character in identifiers. Which means allowing it either breaks backwards compatibility or makes for some very confusing usage conventions. From guido at python.org Sat Oct 13 02:26:27 2012 From: guido at python.org (Guido van Rossum) Date: Fri, 12 Oct 2012 17:26:27 -0700 Subject: [Python-ideas] The async API of the future: Reactors In-Reply-To: <5078AF53.3060505@canterbury.ac.nz> References: <5078AF53.3060505@canterbury.ac.nz> Message-ID: On Fri, Oct 12, 2012 at 5:01 PM, Greg Ewing wrote: > Guido van Rossum wrote: > >> - There's an abstract Reactor class and an abstract Async I/O object >> class. > > Can we please use a better term than "reactor" for this? > Its meaning is only obvious to someone familiar with Twisted. > > Not being such a person, it's taken me a while to figure out > from this discussion that it refers to the central object > implementing the event loop, and not one of the user-supplied > objects that could equally well be described as "reacting" > to events. > > Something like "dispatcher" would be clearer, IMO. Sorry about that. I'm afraid it's too late for this thread's subject line, but I will try to make sure that if and when this makes it into the standard library it'll have a more appropriate name. 
I would recommend event loop (which is the name I naturally would give it when asked out of context) or I/O loop, which is what Tornado apparently used. Dispatcher would not be my first choice. FWIW, it's not a completely Twisted-specific term: http://en.wikipedia.org/wiki/Reactor_pattern -- --Guido van Rossum (python.org/~guido) From dreamingforward at gmail.com Sat Oct 13 02:58:29 2012 From: dreamingforward at gmail.com (Mark Adam) Date: Fri, 12 Oct 2012 19:58:29 -0500 Subject: [Python-ideas] The async API of the future: Reactors In-Reply-To: References: <20121012203311.4b3ee8af@pitrou.net> Message-ID: On Fri, Oct 12, 2012 at 3:43 PM, Guido van Rossum wrote: > On Thu, Oct 11, 2012 at 6:38 PM, Mark Adam wrote: >> Here's the thing: the underlying O.S is always handling two major I/O >> channels at any given time and it needs all it's attention to do this: >> the GUI and one of the following (network, file) I/O. You can >> shuffle these around all you want, but somewhere the O.S. kernel is >> going to have to be involved, which means either portability is >> sacrificed or speed if one is going to pursue and abstract, unified >> async API. > > I'm convinced that the OS has to get involved. I'm not convinced that > it will get in the way of designing an abstract unified API -- however > that API will have to be more complicated than the kind of event loop > that *only* handles network I/O or the kind that *only* handles GUI > events. Yes, however, as suggested in my other message, there are three desires: {"cross-platform (OS) portability", "speed", "unified API"}, but you can only pick two. One of these has to be sacrificed because there are users for all of those. I think such a decision must be "deferred() "to some "Future(Python4000)" in order to succeed at making "Grand Unified Theory" for hardware/OS/python synchronization. (For the record, I do think it is possible, and indeed that is exactly what I'm working on. To make it work will require a compelling, unified object model, forwarding the art of Computer Science...) markj From guido at python.org Sat Oct 13 03:00:42 2012 From: guido at python.org (Guido van Rossum) Date: Fri, 12 Oct 2012 18:00:42 -0700 Subject: [Python-ideas] The async API of the future: Reactors In-Reply-To: References: <20121012203311.4b3ee8af@pitrou.net> Message-ID: On Fri, Oct 12, 2012 at 5:58 PM, Mark Adam wrote: > On Fri, Oct 12, 2012 at 3:43 PM, Guido van Rossum wrote: >> On Thu, Oct 11, 2012 at 6:38 PM, Mark Adam wrote: >>> Here's the thing: the underlying O.S is always handling two major I/O >>> channels at any given time and it needs all it's attention to do this: >>> the GUI and one of the following (network, file) I/O. You can >>> shuffle these around all you want, but somewhere the O.S. kernel is >>> going to have to be involved, which means either portability is >>> sacrificed or speed if one is going to pursue and abstract, unified >>> async API. >> >> I'm convinced that the OS has to get involved. I'm not convinced that >> it will get in the way of designing an abstract unified API -- however >> that API will have to be more complicated than the kind of event loop >> that *only* handles network I/O or the kind that *only* handles GUI >> events. > > Yes, however, as suggested in my other message, there are three > desires: {"cross-platform (OS) portability", "speed", "unified API"}, > but you can only pick two. Do you have any proof for that claim? > One of these has to be sacrificed because there are users for all of those. 
> > I think such a decision must be "deferred() "to some > "Future(Python4000)" in order to succeed at making "Grand Unified > Theory" for hardware/OS/python synchronization. > > (For the record, I do think it is possible, and indeed that is exactly > what I'm working on. To make it work will require a compelling, > unified object model, forwarding the art of Computer Science...) That would be the topic for a new thread, please. -- --Guido van Rossum (python.org/~guido) From mikegraham at gmail.com Sat Oct 13 03:06:01 2012 From: mikegraham at gmail.com (Mike Graham) Date: Fri, 12 Oct 2012 21:06:01 -0400 Subject: [Python-ideas] The async API of the future: Reactors In-Reply-To: References: <20121012203311.4b3ee8af@pitrou.net> Message-ID: On Fri, Oct 12, 2012 at 8:58 PM, Mark Adam wrote: > there are three desires: > {"cross-platform (OS) portability", "speed", "unified API"}, > but you can only pick two. There are many tradeoffs where this is the case, but this isn't one of them. There are several systems that prove otherwise. Mike From trent at snakebite.org Sat Oct 13 03:11:20 2012 From: trent at snakebite.org (Trent Nelson) Date: Fri, 12 Oct 2012 21:11:20 -0400 Subject: [Python-ideas] The async API of the future: Reactors In-Reply-To: References: <20121012203311.4b3ee8af@pitrou.net> Message-ID: <20121013011119.GA5503@snakebite.org> On Fri, Oct 12, 2012 at 03:49:36PM -0700, Guido van Rossum wrote: > [Responding to yet another message in the original thread] > > On Thu, Oct 11, 2012 at 9:45 PM, Trent Nelson wrote: > > It's the best way to do it. There should really be a libevent-type > > library (libiocp?) that leverages IOCP where possible, and fakes it > > when not using a half-sync/half-async pattern with threads and epoll > > or kqueue on Linux and FreeBSD, falling back to processes and poll > > on everything else (NetBSD, OpenBSD and HP-UX (the former two not > > having robust-enough pthread implementations, the latter not having > > anything better than select or poll)). > > In which category does OS X fall? Oh, how'd I forget about OS X! At the worst, it falls into the FreeBSD kqueue camp, having both a) kqueue and b) a performant pthread implementation. However, with the recent advent of Grand Central Dispatch, it's actually on par with Windows' IOCP+threadpool offerings, which is pretty cool. (And apparently there are GCD ports in the works for Solaris, Linux and... Windows?!) Will reply to the other questions in a separate response. Trent. From steve at pearwood.info Sat Oct 13 04:41:18 2012 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 13 Oct 2012 13:41:18 +1100 Subject: [Python-ideas] Is there a good reason to use * for multiplication? In-Reply-To: <11ad8f01-0383-4b64-b0f0-78bb0e2f9308@googlegroups.com> References: <11ad8f01-0383-4b64-b0f0-78bb0e2f9308@googlegroups.com> Message-ID: <5078D4CE.1040201@pearwood.info> On 13/10/12 07:27, Ram Rachum wrote: > Hi everybody, > > Today a funny thought occurred to me. Ever since I've learned to program > when I was a child, I've taken for granted that when programming, the sign > used for multiplication is *. But now that I think about it, why? Now that > we have Unicode, why not use ? ? 25 or so years ago, I used to do some programming in Apple's Hypertalk language, which accepted ? in place of / for division. The use of two symbols for the same operation didn't cause any problem for users. 
But then Apple had the advantage that there was a single, system-wide, highly discoverable way of typing non-ASCII characters at the keyboard, and Apple users tended to pride themselves for using them. I'm not entirely sure about MIDDLE DOT though: especially in small font sizes, it falls foul of the design principle: "syntax should not look like a speck of dust on Tim's monitor" (paraphrasing... can anyone locate the original quote?) and may be too easily confused with FULL STOP. Another problem is that MIDDLE DOT is currently valid in identifiers, so that a?b would count as a single name. Fixing this would require some fairly heavy lifting (a period of deprecation and warnings for any identifier using MIDDLE DOT) before introducing it as an operator. So that's a lot of effort for very little gain. If I were designing a language from scratch today, with full Unicode support from the beginning, I would support a rich set of operators possibly even including MIDDLE DOT and ? MULTIPLICATION SIGN, and leave it up to the user to use them wisely or not at all. But I don't think it would be appropriate for Python to add them, at least not before Python 4: too much effort for too little gain. Maybe in another ten years people will be less resistant to Unicode operators. [...] > ?. People on Linux can type Alt-. . For what it is worth, I'm using Linux and that does not work for me. I am yet to find a decent method of entering non-ASCII characters. -- Steven From mwm at mired.org Sat Oct 13 05:19:29 2012 From: mwm at mired.org (Mike Meyer) Date: Fri, 12 Oct 2012 22:19:29 -0500 Subject: [Python-ideas] The async API of the future: Reactors In-Reply-To: <20121013011119.GA5503@snakebite.org> References: <20121012203311.4b3ee8af@pitrou.net> <20121013011119.GA5503@snakebite.org> Message-ID: <20121012221929.2642bea3@bhuda.mired.org> On Fri, 12 Oct 2012 21:11:20 -0400 Trent Nelson wrote: > However, with the recent advent of Grand Central Dispatch, it's > actually on par with Windows' IOCP+threadpool offerings, which is > pretty cool. (And apparently there are GCD ports in the works for > Solaris, Linux and... Windows?!) The port already exists for FreeBSD. As of 8.1, the kernel has enhanced kqueue support for it, and devel/libdispatch installs the GCD code. I'd be surprised if the other *BSD's haven't picked it up yet. All of which makes me think that an async library based on GCD and maybe IOCP for Windows if it's not available there would be reasonably portable. A standard Python library that made this as nice to use as it is from MacRuby would be a good thing. You can find jkh (ex FreeBSD RE, now running the OS X systems group for Apple) discussing Python and GCD here: http://stackoverflow.com/questions/7955630/parallel-processing-in-python-a-la-grand-central-dispatch http://www.mired.org/ Independent Software developer/SCM consultant, email for more information. O< ascii ribbon campaign - stop html mail - www.asciiribbon.org From dreamingforward at gmail.com Sat Oct 13 05:29:46 2012 From: dreamingforward at gmail.com (Mark Adam) Date: Fri, 12 Oct 2012 22:29:46 -0500 Subject: [Python-ideas] The async API of the future: Reactors In-Reply-To: References: <20121012203311.4b3ee8af@pitrou.net> Message-ID: On Fri, Oct 12, 2012 at 8:06 PM, Mike Graham wrote: > On Fri, Oct 12, 2012 at 8:58 PM, Mark Adam wrote: >> there are three desires: >> {"cross-platform (OS) portability", "speed", "unified API"}, >> but you can only pick two. > > There are many tradeoffs where this is the case, but this isn't one of > them. 
There are several systems that prove otherwise. ...several **systems**? i mean, you can accomplish such a task on a *particular* O.S. but I don't know where this is the case across *several* systems (Unix, Mac, and Windows). I would like to know of an example, if you have one? mark From bruce at leapyear.org Sat Oct 13 06:20:30 2012 From: bruce at leapyear.org (Bruce Leban) Date: Fri, 12 Oct 2012 21:20:30 -0700 Subject: [Python-ideas] Is there a good reason to use * for multiplication? In-Reply-To: <5078D4CE.1040201@pearwood.info> References: <11ad8f01-0383-4b64-b0f0-78bb0e2f9308@googlegroups.com> <5078D4CE.1040201@pearwood.info> Message-ID: Well, I learned x as a multiplication symbol long before I learned either ? or *, and in many fonts you can barely see the middle dot. Is there a good reason, we can't just write foo x bar instead of foo * bar? If that's confusing we could use ? instead. No one would ever confuse ? and x. Or for that matter how about (~R?R?.?R)/R?1??R Seriously: learning that * means multiplication is a very small thing. You also need to learn what /, // and % do, and the difference between 'and' and &, and between =, ==, != and /=. --- Bruce On Fri, Oct 12, 2012 at 7:41 PM, Steven D'Aprano wrote: > On 13/10/12 07:27, Ram Rachum wrote: > >> Hi everybody, >> >> Today a funny thought occurred to me. Ever since I've learned to program >> when I was a child, I've taken for granted that when programming, the sign >> used for multiplication is *. But now that I think about it, why? Now that >> we have Unicode, why not use ? ? >> > t > 25 or so years ago, I used to do some programming in Apple's Hypertalk > language, which accepted ? in place of / for division. The use of two > symbols for the same operation didn't cause any problem for users. But then > Apple had the advantage that there was a single, system-wide, highly > discoverable way of typing non-ASCII characters at the keyboard, and Apple > users tended to pride themselves for using them. > > I'm not entirely sure about MIDDLE DOT though: especially in small font > sizes, > it falls foul of the design principle: > > "syntax should not look like a speck of dust on Tim's monitor" > > (paraphrasing... can anyone locate the original quote?) > > and may be too easily confused with FULL STOP. Another problem is that > MIDDLE > DOT is currently valid in identifiers, so that a?b would count as a single > name. Fixing this would require some fairly heavy lifting (a period of > deprecation and warnings for any identifier using MIDDLE DOT) before > introducing it as an operator. So that's a lot of effort for very little > gain. > > If I were designing a language from scratch today, with full Unicode > support > from the beginning, I would support a rich set of operators possibly even > including MIDDLE DOT and ? MULTIPLICATION SIGN, and leave it up to the user > to use them wisely or not at all. But I don't think it would be appropriate > for Python to add them, at least not before Python 4: too much effort for > too > little gain. Maybe in another ten years people will be less resistant to > Unicode operators. > > > > [...] > > ?. People on Linux can type Alt-. . >> > > For what it is worth, I'm using Linux and that does not work for me. I am > yet to find a decent method of entering non-ASCII characters. 
> > > > -- > Steven > > ______________________________**_________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/**mailman/listinfo/python-ideas > -------------- next part -------------- An HTML attachment was scrubbed... URL: From tjreedy at udel.edu Sat Oct 13 06:22:43 2012 From: tjreedy at udel.edu (Terry Reedy) Date: Sat, 13 Oct 2012 00:22:43 -0400 Subject: [Python-ideas] The async API of the future: Reactors In-Reply-To: References: <5078AF53.3060505@canterbury.ac.nz> Message-ID: On 10/12/2012 8:26 PM, Guido van Rossum wrote: > On Fri, Oct 12, 2012 at 5:01 PM, Greg Ewing wrote: >> Can we please use a better term than "reactor" for this? >> Its meaning is only obvious to someone familiar with Twisted. >> >> Not being such a person, it's taken me a while to figure out >> from this discussion that it refers to the central object >> implementing the event loop, and not one of the user-supplied >> objects that could equally well be described as "reacting" >> to events. >> >> Something like "dispatcher" would be clearer, IMO. > > Sorry about that. I'm afraid it's too late for this thread's subject > line, but I will try to make sure that if and when this makes it into > the standard library it'll have a more appropriate name. I would > recommend event loop (which is the name I naturally would give it when > asked out of context) or I/O loop, which is what Tornado apparently > used. Dispatcher would not be my first choice. > > FWIW, it's not a completely Twisted-specific term: > http://en.wikipedia.org/wiki/Reactor_pattern Thanks for the clarification. Reactors react to events within an event loop* by dispatching them to handlers. Correct? *Iteration rather than recursion is required because they continue the cycle indefinitely. I am still fuzzy on edge-triggered versus level triggered in this context, as opposed to electronics. -- Terry Jan Reedy From jeanpierreda at gmail.com Sat Oct 13 06:44:35 2012 From: jeanpierreda at gmail.com (Devin Jeanpierre) Date: Sat, 13 Oct 2012 00:44:35 -0400 Subject: [Python-ideas] Is there a good reason to use * for multiplication? In-Reply-To: <5078D4CE.1040201@pearwood.info> References: <11ad8f01-0383-4b64-b0f0-78bb0e2f9308@googlegroups.com> <5078D4CE.1040201@pearwood.info> Message-ID: On Fri, Oct 12, 2012 at 10:41 PM, Steven D'Aprano wrote: > If I were designing a language from scratch today, with full Unicode support > from the beginning, I would support a rich set of operators possibly even > including MIDDLE DOT and ? MULTIPLICATION SIGN, and leave it up to the user > to use them wisely or not at all. But I don't think it would be appropriate > for Python to add them, at least not before Python 4: too much effort for > too > little gain. Maybe in another ten years people will be less resistant to > Unicode operators. Python has cleverly left the $ symbol unused. We can use it as a quasiquote to embed executable TeX. for x in xrange($b \cdot \sum_{i=1}^n \frac{x^n}{n!}$): ... No need to wait for that new language, we can have a rich set of math operators today! -- Devin From glyph at twistedmatrix.com Sat Oct 13 06:46:20 2012 From: glyph at twistedmatrix.com (Glyph) Date: Fri, 12 Oct 2012 21:46:20 -0700 Subject: [Python-ideas] re-implementing Twisted for fun and profit Message-ID: There has been a lot written on this list about asynchronous, microthreaded and event-driven I/O in the last couple of days. 
There's too much for me to try to respond to all at once, but I would very much like to (possibly re-)introduce one very important point into the discussion. Would everyone interested in this please please please read several times? Especially this section: . If it is not clear, please ask questions about it and I will try to needle someone qualified into improving the explanation. I am bringing this up because I've seen a significant amount of discussion of level-triggering versus edge-triggering. Once you have properly separated out transport logic from application implementation, triggering style is an irrelevant, private implementation detail of the networking layer. Whether the operating system tells Python "you must call recv() once now" or "you must call recv() until I tell you to stop" should not matter to the application if the application is just getting passed the results of recv() which has already been called. Since not all I/O libraries actually have a recv() to call, you shouldn't have the application have to call it. This is perhaps the central design error of asyncore. If it needs a name, I suppose I'd call my preferred style "event triggering". Also, I would like to remind all participants that microthreading, request/response abstraction (i.e. Deferreds, Futures), generator coroutines and a common API for network I/O are all very different tasks and do not need to be accomplished all at once. If you try to build something that does all of this stuff, you get most of Twisted core plus half of Stackless all at once, which is a bit much for the stdlib to bite off in one chunk. -g From ben at bendarnell.com Sat Oct 13 06:52:19 2012 From: ben at bendarnell.com (Ben Darnell) Date: Fri, 12 Oct 2012 21:52:19 -0700 Subject: [Python-ideas] The async API of the future: Reactors In-Reply-To: References: Message-ID: On Fri, Oct 12, 2012 at 11:13 AM, Guido van Rossum wrote: > [This is the first spin-off thread from "asyncore: included batteries > don't fit"] > > On Thu, Oct 11, 2012 at 5:57 PM, Ben Darnell wrote: >> On Thu, Oct 11, 2012 at 2:18 PM, Guido van Rossum wrote: >>>> Re base reactor interface: drawing maximally from the lessons learned in >>>> twisted, I think IReactorCore (start, stop, etc), IReactorTime (call later, >>>> etc), asynchronous-looking name lookup, fd handling are the important parts. >>> >>> That actually sounds more concrete than I'd like a reactor interface >>> to be. In the App Engine world, there is a definite need for a >>> reactor, but it cannot talk about file descriptors at all -- all I/O >>> is defined in terms of RPC operations which have their own (several >>> layers of) async management but still need to be plugged in to user >>> code that might want to benefit from other reactor functionality such >>> as scheduling and placing a call at a certain moment in the future. >> >> So are you thinking of something like >> reactor.add_event_listener(event_type, event_params, func)? One thing >> to keep in mind is that file descriptors are somewhat special (at >> least in a level-triggered event loop), because of the way the event >> will keep firing until the socket buffer is drained or the event is >> unregistered. I'd be inclined to keep file descriptors in the >> interface even if they just raise an error on app engine, since >> they're fairly fundamental to the (unixy) event loop. On the other >> hand, I don't have any experience with event loops outside the >> unix/network world so I don't know what other systems might need for >> their event loops. 
> > Hmm... This is definitely an interesting issue. I'm tempted to believe > that it is *possible* to change every level-triggered setup into an > edge-triggered setup by using an explicit loop -- but I'm not saying > it is a good idea. In practice I think we need to support both equally > well, so that the *app* can decide which paradigm to use. E.g. if I > were to implement an HTTP server, I might use level-triggered for the > "accept" call on the listening socket, but edge-triggered for > everything else. OTOH someone else might prefer a buffered stream > abstraction that just keeps filling its read buffer (and draining its > write buffer) using level-triggered callbacks, at least up to a > certain buffer size -- we have to be robust here and make it > impossible for an evil client to fill up all our memory without our > approval! First of all, to clear up the terminology, edge-triggered actually has a specific meaning in this context that is separate from the question of whether callbacks are used more than once. The edge- vs level-triggered question is moot with one-shot callbacks, but when you're reusing callbacks in edge-triggered mode you won't get a second call until you've drained the socket buffer and then it becomes readable again. This turns out to be helpful for hybrid event/threaded systems, since the network thread may go into the next iteration of its loop while the worker thread is still consuming the data from a previous event. You can't always emulate edge-triggered behavior since it needs knowledge of internal socket buffers (epoll has an edge-triggered mode and I think kqueue does too, but you can't get edge-triggered behavior if you're falling back to select()). However, you can easily get one-shot callbacks from an event loop with persistent callbacks just by unregistering the callback once it has received an event. This has a performance cost, though - in tornado we try to avoid unnecessary unregister/register pairs. > > I'm not at all familiar with the Twisted reactor interface. My own > design would be along the following lines: > > - There's an abstract Reactor class and an abstract Async I/O object > class. To get a reactor to call you back, you must give it an I/O > object, a callback, and maybe some more stuff. (I have gone back and > like passing optional args for the callback, rather than requiring > lambdas to create closures.) Note that the callback is *not* a > designated method on the I/O object! In order to distinguish between > edge-triggered and level-triggered, you just use a different reactor > method. There could also be a reactor method to schedule a "bare" > callback, either after some delay, or immediately (maybe with a given > priority), although such functionality could also be implemented > through magic I/O objects. One reason to have a distinct method for running a bare callback is that you need to have some thread-safe entry point, but you otherwise don't really want locking on all the internal methods. Tornado's IOLoop.add_callback and Twisted's Reactor.callFromThread can be used to run code in the IOLoop's thread (which can then call the other IOLoop methods). We also have distinct methods for running a callback after a timeout, although if you had a variant of add_handler that didn't require a subsequent call to remove_handler you could probably do timeouts using a magical IO object. (an additional subtlety for the time-based methods is how time is computed. 
I recently added support in tornado to optionally use time.monotonic instead of time.time) > > - In systems supporting file descriptors, there's a reactor > implementation that knows how to use select/poll/etc., and there are > concrete I/O object classes that wrap file descriptors. On Windows, > those would only be socket file descriptors. On Unix, any file > descriptor would do. To create such an I/O object you would use a > platform-specific factory. There would be specialized factories to > create e.g. listening sockets, connections, files, pipes, and so on. > Jython is another interesting case - it has a select() function that doesn't take integer file descriptors, just the opaque objects returned by socket.fileno(). While it's convenient to have higher-level constructors for various specialized types, I'd like to emphasize that having the low-level interface is important for interoperability. Tornado doesn't know whether the file descriptors are listening sockets, connected sockets, or pipes, so we'd just have to pass in a file descriptor with no other information. > - In systems like App Engine that don't support async I/O on file > descriptors at all, the constructors for creating I/O objects for disk > files and connection sockets would comply with the interface but fake > out almost everything (just like today, using httplib or httplib2 on > App Engine works by adapting them to a "urlfetch" RPC request). Why would you be allowed to make IO objects for sockets that don't work? I would expect that to just raise an exception. On app engine RPCs would be the only supported async I/O objects (and timers, if those are implemented as magic I/O objects), and they're not implemented in terms of sockets or files. -Ben From greg.ewing at canterbury.ac.nz Sat Oct 13 07:05:53 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sat, 13 Oct 2012 18:05:53 +1300 Subject: [Python-ideas] The async API of the future: yield-from In-Reply-To: References: Message-ID: <5078F6B1.2030309@canterbury.ac.nz> Guido van Rossum wrote: > But the fact remains that you can't completely hide these yields -- > the best you can do is replace them with a single yield-from. Yes, as things stand, a call to a sub-generator is always going to look different from an ordinary call, all the way up the call chain. I regard that as a wart remaining to be fixed, although opinions seem to differ. I do think it's a bit unfortunate that 'yield from' contains the word 'yield', though, since in this context it's best thought of as a kind of function call rather than a kind of yield. >>>This seems to be begging to be collapsed into a single line, e.g. >>> >>> data = yield sock.recv_async(1024) > > >>I'm not sure how you're imagining that would work, but whatever >>it is, it's wrong -- that just doesn't make sense. > > It makes a lot of sense in a world using > Futures and a Future-aware trampoline/scheduler, instead of yield-from > and bare generators. I can see however that you don't like it in the > yield-from world you're envisioning I don't like it because, to my mind, Futures et al are kludgy workarounds for not having something like yield-from. Now that we do, we shouldn't need them any more. I can see the desirability of being able to interoperate with existing code that uses them, but I'm not convinced that building awareness of them into the guts of the scheduler is the best way to go about it. Why Futures in particular? What if someone wants to use Deferreds instead, or some other similar thing? 
At some point you need to build adapters. I'd rather see Futures treated on an equal footing with the others, and dealt with by building on the primitive facilities provided by the scheduler. > But the only use for send() on a generator is when using it as a > coroutine for a concurrent tasks system... And you're claiming, it seems, > that you prefer yield-from for concurrent tasks. The particular technique of using send() to supply a return value for a simulated sub-generator call is made obsolete by yield-from. I can't rule out the possibility that there may be other uses for send() in a concurrent task system. I just haven't found the need for it in any of the examples I've developed so far. > I feel that "value = yield Future>" is quite a good paradigm, I feel that it shouldn't be *necessary* to yield any kind of special object in order to suspend a task; just a simple 'yield' should be sufficient. It might make sense to allow this as an *option* for the purpose of interoperating with existing async code. But I would much rather the public API for this was something like value = yield from wait_for_future(a_future) leaving it up to the implementation whether this is achieved by yielding the Future or by some other means. Then we can also have wait_for_deferred(), etc., without giving any one of them special status. > One is what to do with operations directly > implemented in C. It would be horrible to require C to create a fake > generator. Fortunately an > iterator whose final __next__() raises StopIteration() works in > the latest Python 3.3 Well, such an iterator *is* a "fake generator" in all the respects that the scheduler cares about. Especially if the scheduler doesn't rely on send(), so your C object doesn't have to implement a send() method. :-) > Well, I'm talking about a decorator that you *always* apply, and which > does nothing (or very little) when wrapping a generator, but adds > generator behavior when wrapping a non-generator function. As long as it's optional, I wouldn't object to the existence of such a decorator, although I would probably choose not to use it most of the time. I would object if it was *required* to make things work properly, because I would worry that this was a symptom of unnecessary complication and inefficiency in the underlying machinery. > (6) Spawning off multiple async subtasks > > Futures: > f1 = subtask1(args1) # Note: no yield!!! > f2 = subtask2(args2) > res1, res2 = yield f1, f2 > > Yield-from: > ?????????? > > *** Greg, can you come up with a good idiom to spell concurrency at > this level? Your example only has concurrency in the philosophers > example, but it appears to interact directly with the scheduler, and > the philosophers don't return values. *** I don't regard the need to interact directly with the scheduler as a problem. That's because in the world I envisage, there would only be *one* scheduler, for much the same reason that there can really only be one async event handling loop in any given program. It would be part of the standard library and have a well-known API that everyone uses. If you don't want things to be that way, then maybe this is a good use for yielding things to the scheduler. Yielding a generator could mean "spawn this as a concurrent task". You could go further and say that yielding a tuple of generators means to spawn them all concurrently, wait for them all to complete and send back a tuple of the results. The yield-from code would then look pretty much the same as the futures code. 
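Just to make that shape concrete, here is a self-contained toy of such a
convention (purely illustrative: this little driver runs the sub-generators
eagerly one after another instead of interleaving them on I/O, none of the
names are proposed APIs, and it leans on Python 3.3's ability to return a
value from a generator):

def run(task):
    # Drive a generator-based task to completion and return its result.
    try:
        yielded = next(task)
        while True:
            if isinstance(yielded, tuple):
                # Yielding a tuple of generators: run them all and
                # resume the caller with a tuple of their results.
                results = tuple(run(sub) for sub in yielded)
                yielded = task.send(results)
            else:
                yielded = task.send(None)
    except StopIteration as exc:
        return exc.value

def subtask(n):
    yield                 # pretend to wait for I/O here
    return n * 10

def main():
    res1, res2 = yield (subtask(1), subtask(2))
    return res1 + res2

print(run(main()))        # -> 30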
However, I'm inclined to think that this is too much functionality to build directly into the scheduler, and that it would be better provided by a class or function that builds on more primitive facilities. So it would look something like Yield-from: task1 = subtask1(args1) task2 = subtask2(args2) res1, res2 = yield from par(task1, task2) where the implementation of par() is left as an exercise for the reader. > (7) Checking whether an operation is already complete > > Futures: > if f.done(): ... I'm inclined to think that this is not something the scheduler needs to be directly concerned with. If it's important for one task to know when another task is completed, it's up to those tasks to agree on a way of communicating that information between them. Although... is there a way to non-destructively test whether a generator is exhausted? If so, this could easily be provided as a scheduler primitive. > (8) Getting the result of an operation multiple times > > Futures: > > f = async_op(args) > # squirrel away a reference to f somewhere else > r = yield f > # ... later, elsewhere > r = f.result() Is this really a big deal? What's wrong with having to store the return value away somewhere if you want to use it multiple times? > (9) Canceling an operation > > Futures: > f.cancel() This would be another scheduler primitive. Yield-from: cancel(task) This would remove the task from the ready list or whatever queue it's blocked on, and probably throw an exception into it to give it a chance to clean up. > (10) Registering additional callbacks > > Futures: > f.add_done_callback(callback) Another candidate for a higher-level facility, I think. The API might look something like Yield-from: cbt = task_with_callbacks(task) cbt.add_callback(callback) yield from cbt.run() I may have a go at coming up with implementations for some of these things and send them in later posts. -- Greg From ben at bendarnell.com Sat Oct 13 07:26:46 2012 From: ben at bendarnell.com (Ben Darnell) Date: Fri, 12 Oct 2012 22:26:46 -0700 Subject: [Python-ideas] The async API of the future: Twisted and Deferreds In-Reply-To: References: Message-ID: On Fri, Oct 12, 2012 at 4:39 PM, Richard Oudkerk wrote: >> @task >> def view_paste_async(request, filekey): >> # Create Futures -- no yields! >> f1 = Pastes.objects.get_async(key=filekey) # This won't raise >> f2 = loader.get_template_async('pastebin/error.html') >> f3 = loader.get_template_async('pastebin/paste.html') >> >> try: >> fileinfo= yield f1 >> except DoesNotExist: >> t = yield f2 >> return HttpResponse(t.render(Context(dict(error='File does not >> exist')))) >> >> f = yield open_async(fileinfo.filename) >> fcontents = yield f.read_async() >> t = yield f3 >> return HttpResponse(t.render(Context(dict(file=fcontents)))) > > > So would the futures be registered with the reactor as soon as they are > created, or only when they are yielded? I can't see how there can be any > "concurrency" if they don't start till they are yielded. It would be like > doing The Futures are not what is doing the work here, they just hold the result. In this example the get_async() functions register something with the reactor when they are called. When that "something" is done (or perhaps after several "somethings" chained together), get_async will set a result on its Future. > But if the futures are registered immediately with the reactor then does > that mean there is a singleton reactor? That seems rather inflexible. 
In most event-driven systems there is a global (or thread-local) event loop, but it's also possible to pass one in explicitly to get_async(). -Ben From greg.ewing at canterbury.ac.nz Sat Oct 13 07:37:31 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sat, 13 Oct 2012 18:37:31 +1300 Subject: [Python-ideas] Is there a good reason to use * for multiplication? In-Reply-To: References: <11ad8f01-0383-4b64-b0f0-78bb0e2f9308@googlegroups.com> Message-ID: <5078FE1B.4090701@canterbury.ac.nz> Ram Rachum wrote: > I could say that for newbies it's one small > confusion that could removed from the language. You and I have been > programming for a long time so we take it for granted that * means > multiplication, but for any other person that's just another > weird idiosyncrasy that further alienates programming. Do you have any evidence that a substantial number of beginners are confused by * for multiplication, or that they have trouble remembering what it means once they've been told? If you do, is there further evidence that they would find a dot to be any clearer? The use of a raised dot to indicate multiplication of numbers is actually quite rare even in mathematics, and I would not expect anyone without a mathematical background to even be aware of it. In primary school we're taught that 'x' means multiplication. Later when we come to algebra, we're taught not to use any symbol at all, just write things next to each other. A dot is only used in rare cases where there would otherwise be ambiguity -- and even then it's often preferred to parenthesise things instead. And don't forget there's great potential for confusion with the decimal point. -- Greg From solipsis at pitrou.net Sat Oct 13 08:14:45 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sat, 13 Oct 2012 08:14:45 +0200 Subject: [Python-ideas] The async API of the future: Twisted and Deferreds References: Message-ID: <20121013081445.40d6d78f@pitrou.net> On Fri, 12 Oct 2012 15:11:54 -0700 Guido van Rossum wrote: > > > 2. Method dispatch callbacks: > > > > Similar to the above, the reactor or somebody has a handle on your > > object, and calls methods that you've defined when events happen > > e.g. IProtocol's dataReceived method > > While I'm sure it's expedient and captures certain common patterns > well, I like this the least of all -- calling fixed methods on an > object sounds like a step back; it smells of the old Java way (before > it had some equivalent of anonymous functions), and of asyncore, which > (nearly) everybody agrees is kind of bad due to its insistence that > you subclass its classes. (Notice how subclassing as the prevalent > approach to structuring your code has gotten into a lot of discredit > since 1996.) But how would you write a dataReceived equivalent then? Would you have a "task" looping on a read() call, e.g. @task def my_protocol_main_loop(conn): while : try: data = yield conn.read(1024) except ConnectionError: conn.close() break I'm not sure I understand the problem with subclassing. It works fine in Twisted. Even in Python 3 we don't shy away from subclassing, for example the IO stack is based on subclassing RawIOBase, BufferedIOBase, etc. Regards Antoine. 
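P.S. To make the "task looping on read()" shape concrete, here is a toy
version that runs against an in-memory transport. Everything in it is made
up for illustration: conn.read() handing back data directly stands in for
whatever Future/Deferred the real API would return.

class FakeConn:
    # Stand-in for a transport; a real read() would be asynchronous.
    def __init__(self, chunks):
        self._chunks = list(chunks)
    def read(self, size):
        return self._chunks.pop(0) if self._chunks else b''

def my_protocol_main_loop(conn):
    while True:
        data = yield conn.read(1024)
        if not data:          # empty read == connection closed
            break
        print("received", data)

def run(task):
    # Trivial trampoline: feed each yielded value straight back in.
    try:
        value = next(task)
        while True:
            value = task.send(value)
    except StopIteration:
        pass

run(my_protocol_main_loop(FakeConn([b'spam', b'eggs'])))

Whether that loop is clearer than a dataReceived()-style subclass is
exactly the trade-off in question.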
-- Software development and contracting: http://pro.pitrou.net From greg.ewing at canterbury.ac.nz Sat Oct 13 08:44:48 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sat, 13 Oct 2012 19:44:48 +1300 Subject: [Python-ideas] PEP 428: poll about the joining syntax In-Reply-To: References: <20121008204707.48559bf9@pitrou.net> <50784504.2080801@stoneleaf.us> Message-ID: <50790DE0.7010207@canterbury.ac.nz> Joshua Landau wrote: > '.j/homeo/homes/homeh/homeu/homeacj/homeo/homes/homeh/homeu/homeaoj/homeo/homes/homeh/homeu/homeanj/homeo/homes/homeh/homeu/homeafj/homeo/homes/homeh/homeu/homeaij/homeo/homes/homeh/homeu/homeag' Homeo, Homeo, wherefore path thou Homeo? -- Greg From ncoghlan at gmail.com Sat Oct 13 09:41:29 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 13 Oct 2012 17:41:29 +1000 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: <507884F0.2060608@stoneleaf.us> References: <20121005202534.5f721292@pitrou.net> <20121006190821.02ae50cd@pitrou.net> <387F855A-B5BD-491D-9FA5-918F4042332F@gmail.com> <50707AD5.3030407@stoneleaf.us> <50707F91.6060301@stoneleaf.us> <50786E42.6050308@stoneleaf.us> <20121012214224.55f3ed27@pitrou.net> <50787E8A.2090804@stoneleaf.us> <20121012225306.295d93e6@pitrou.net> <507884F0.2060608@stoneleaf.us> Message-ID: On Sat, Oct 13, 2012 at 7:00 AM, Ethan Furman wrote: > My point about the Path(...(str(...))) sandwich still applies, though, for > every function that isn't built in to Path. :) It's the same situation we were in with the design of the new ipaddress module, and the answer is the same: implicit coercion just creates way too many opportunities for errors to pass silently. We had to create a backwards incompatible version of the language to eliminate the semantic confusion between binary data and text data, we're not going to introduce a similar confusion between arbitrary text strings and objects that actually behave like filesystem paths. str has a *big* API, and much of it doesn't make any sense in the particular case of path objects. In particular, path objects shouldn't be iterable, because it isn't clear what iteration should mean: it could be path segments, it could be parent paths, or it could be directory contents. It definitely *shouldn't* be individual characters, but that's what we would get if it inherited from strings. I do like the idea of introducing a "filesystem path" protocol though (and Antoine's already considering that), which would give us the implicit interoperability without the inheritance of an overbroad API. Something else I've been thinking about is that it still feels wrong to me to be making the Windows vs Posix behavioural decision at the class level. It really feels more like a "decimal.Context" style API would be more appropriate, where there was a PathContext that determined how various operations on paths behaved. The default context would then be determined by the current OS, but you could write: with pathlib.PosixContext: # "\" is not a directory separator # "/" is used in representations # Comparison is case sensitive # expanduser() uses posix rules with pathlib.WindowsContext: # "\" and "/" are directory separators # "\" is used in representations # Comparison is case insensitive Contexts could be tweaked for desired behaviour (e.g. using "/" in representations on Windows) Cheers, Nick. 
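P.S. Roughly the shape I mean, as a quick runnable sketch. None of these
names are in the PEP and only case sensitivity is modelled; it is purely
illustrative of the decimal.Context-style approach:

import threading

_local = threading.local()

class PathContext:
    def __init__(self, case_sensitive):
        self.case_sensitive = case_sensitive
    def __enter__(self):
        self._previous = getattr(_local, 'context', None)
        _local.context = self
        return self
    def __exit__(self, *exc_info):
        _local.context = self._previous

PosixContext = PathContext(case_sensitive=True)
WindowsContext = PathContext(case_sensitive=False)

def current_context():
    # The real default would be picked from the running OS.
    return getattr(_local, 'context', None) or PosixContext

class Path:
    def __init__(self, text):
        self._text = text
    def __eq__(self, other):
        if current_context().case_sensitive:
            return self._text == other._text
        return self._text.lower() == other._text.lower()

print(Path('README') == Path('readme'))      # False under the posix default
with WindowsContext:
    print(Path('README') == Path('readme'))  # True inside the Windows context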
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Sat Oct 13 09:59:53 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 13 Oct 2012 17:59:53 +1000 Subject: [Python-ideas] The async API of the future: yield-from In-Reply-To: <5078F6B1.2030309@canterbury.ac.nz> References: <5078F6B1.2030309@canterbury.ac.nz> Message-ID: On Sat, Oct 13, 2012 at 3:05 PM, Greg Ewing wrote: > Although... is there a way to non-destructively test whether > a generator is exhausted? If so, this could easily be provided > as a scheduler primitive. Yes. Take a look at inspect.getgeneratorstate in 3.2+ (previously, implementations weren't *required* to provide that introspection capability, but now they do in order to support this function in the inspect module). Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ubershmekel at gmail.com Sat Oct 13 10:05:34 2012 From: ubershmekel at gmail.com (Yuval Greenfield) Date: Sat, 13 Oct 2012 10:05:34 +0200 Subject: [Python-ideas] Is there a good reason to use * for multiplication? In-Reply-To: References: <11ad8f01-0383-4b64-b0f0-78bb0e2f9308@googlegroups.com> <5078D4CE.1040201@pearwood.info> Message-ID: On Oct 13, 2012 6:45 AM, "Devin Jeanpierre" wrote: > > On Fri, Oct 12, 2012 at 10:41 PM, Steven D'Aprano wrote: > > If I were designing a language from scratch today, with full Unicode support > > from the beginning, I would support a rich set of operators possibly even > > including MIDDLE DOT and ? MULTIPLICATION SIGN, and leave it up to the user > > to use them wisely or not at all. But I don't think it would be appropriate > > for Python to add them, at least not before Python 4: too much effort for > > too > > little gain. Maybe in another ten years people will be less resistant to > > Unicode operators. > > Python has cleverly left the $ symbol unused. > > We can use it as a quasiquote to embed executable TeX. > > for x in xrange($b \cdot \sum_{i=1}^n \frac{x^n}{n!}$): > ... > I hope this was in jest because that line of TeX for general programming made my eyes bleed. A PEP for defining operators sounds interesting for 4.0 indeed. Though it might be messy to allow a module to meddle with the python syntax. Perhaps instead I would like it if all operators were objects with e.g. special __infix__ methods. Yuval -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Sat Oct 13 10:18:12 2012 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 13 Oct 2012 19:18:12 +1100 Subject: [Python-ideas] Is there a good reason to use * for multiplication? In-Reply-To: References: <11ad8f01-0383-4b64-b0f0-78bb0e2f9308@googlegroups.com> <5078D4CE.1040201@pearwood.info> Message-ID: <507923C4.8040201@pearwood.info> On 13/10/12 19:05, Yuval Greenfield wrote: > A PEP for defining operators sounds interesting for 4.0 indeed. Though it > might be messy to allow a module to meddle with the python syntax. You mean more than classes already do? :) > Perhaps instead I would like it if all operators were objects with e.g. > special __infix__ methods. I believe that Haskell treats operators as if they were function objects, so you could do something like: negative_values = map(-, values) but I think that puts the emphasis on the wrong thing. 
If (and that's a big if) we did something like this, it should be a pair of methods __op__ and the right-hand version __rop__ which get called on the *operands*, not the operator/function object: def __op__(self, other, symbol) -- Steven From solipsis at pitrou.net Sat Oct 13 10:22:04 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sat, 13 Oct 2012 10:22:04 +0200 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths References: <20121005202534.5f721292@pitrou.net> <20121006190821.02ae50cd@pitrou.net> <387F855A-B5BD-491D-9FA5-918F4042332F@gmail.com> <50707AD5.3030407@stoneleaf.us> <50707F91.6060301@stoneleaf.us> <50786E42.6050308@stoneleaf.us> <20121012214224.55f3ed27@pitrou.net> <50787E8A.2090804@stoneleaf.us> <20121012225306.295d93e6@pitrou.net> <507884F0.2060608@stoneleaf.us> Message-ID: <20121013102204.7b55dc53@pitrou.net> On Sat, 13 Oct 2012 17:41:29 +1000 Nick Coghlan wrote: > > Something else I've been thinking about is that it still feels wrong > to me to be making the Windows vs Posix behavioural decision at the > class level. It really feels more like a "decimal.Context" style API > would be more appropriate, where there was a PathContext that > determined how various operations on paths behaved. The default > context would then be determined by the current OS, but you could > write: > > with pathlib.PosixContext: > # "\" is not a directory separator > # "/" is used in representations > # Comparison is case sensitive > # expanduser() uses posix rules > > with pathlib.WindowsContext: > # "\" and "/" are directory separators > # "\" is used in representations > # Comparison is case insensitive :-/ You could make an argument that the Path classes could have their behaviour tweaked with such a context system, but I really think explicit classes for different path flavours are much better design than some thread-local context hackery. Personally, I consider thread-local contexts to be an anti-pattern. (also, the idea that a POSIX path becomes a Windows path based on which "with" statement it's used inside sounds scary) Regards Antoine. -- Software development and contracting: http://pro.pitrou.net From storchaka at gmail.com Sat Oct 13 10:33:13 2012 From: storchaka at gmail.com (Serhiy Storchaka) Date: Sat, 13 Oct 2012 11:33:13 +0300 Subject: [Python-ideas] Is there a good reason to use * for multiplication? In-Reply-To: <5078D4CE.1040201@pearwood.info> References: <11ad8f01-0383-4b64-b0f0-78bb0e2f9308@googlegroups.com> <5078D4CE.1040201@pearwood.info> Message-ID: On 13.10.12 05:41, Steven D'Aprano wrote: > If I were designing a language from scratch today, with full Unicode > support > from the beginning, I would support a rich set of operators possibly even > including MIDDLE DOT and ? MULTIPLICATION SIGN, and leave it up to the user > to use them wisely or not at all. But they are a different operators. (1, 2, 3)?(6, 5, 4) = 28 (1, 2, 3)?(6, 5, 4) = (-7, 14, -7) From ubershmekel at gmail.com Sat Oct 13 11:15:10 2012 From: ubershmekel at gmail.com (Yuval Greenfield) Date: Sat, 13 Oct 2012 11:15:10 +0200 Subject: [Python-ideas] Is there a good reason to use * for multiplication? In-Reply-To: <507923C4.8040201@pearwood.info> References: <11ad8f01-0383-4b64-b0f0-78bb0e2f9308@googlegroups.com> <5078D4CE.1040201@pearwood.info> <507923C4.8040201@pearwood.info> Message-ID: On Sat, Oct 13, 2012 at 10:18 AM, Steven D'Aprano wrote: > [..] > but I think that puts the emphasis on the wrong thing. 
If (and that's a big > if) we did something like this, it should be a pair of methods __op__ and > the right-hand version __rop__ which get called on the *operands*, not the > operator/function object: > > def __op__(self, other, symbol) > > I thought the operator should have a say in how it operates, e.g. the operater `dot` could call __dot__ in its operands. class Vector: def _dot(self, other): return sum([i * j for i, j in zip(self, other)]) class dot(operator): def __infix__(self, left, right): return left._dot(left, right) >>>Vector([1,2,3]) dot Vector([3,4,5]) 26 Making the declaration and import of operators more explicit than the `def __op__(self, other, symbol)` version. We could put [/, *, ., //, etc...] in __builtins__ Yuval -------------- next part -------------- An HTML attachment was scrubbed... URL: From rosuav at gmail.com Sat Oct 13 11:18:26 2012 From: rosuav at gmail.com (Chris Angelico) Date: Sat, 13 Oct 2012 20:18:26 +1100 Subject: [Python-ideas] Is there a good reason to use * for multiplication? In-Reply-To: <507923C4.8040201@pearwood.info> References: <11ad8f01-0383-4b64-b0f0-78bb0e2f9308@googlegroups.com> <5078D4CE.1040201@pearwood.info> <507923C4.8040201@pearwood.info> Message-ID: On Sat, Oct 13, 2012 at 7:18 PM, Steven D'Aprano wrote: > On 13/10/12 19:05, Yuval Greenfield wrote: > >> A PEP for defining operators sounds interesting for 4.0 indeed. Though it >> might be messy to allow a module to meddle with the python syntax. > > You mean more than classes already do? :) Yes, more than classes already do. You could completely redefine Python into another language. Here, I wrote a program. It uses the letter d as an infix operator that means "sum N random numbers up to M". You know the language, it's Python same as you work with all the time! Oh, but I don't use + for addition, I use $, and # is my "turn tuple into dictionary" operator, and I use parentheses as a sort of C-style ternary operator. But it's still Python, so you should be able to read and understand the code, right? I actually wrote up a language design spec to highlight what would happen if this sort of thing were possible. And the writing of that spec was what demonstrated to me how fundamentally BAD the idea was. http://rosuav.com/1/?id=683 It could certainly be done. All you need to do is make abuttal of three objects into second_object.__infix__(first_object, third_object) and then handle the mess of prefix and postfix objects. I just don't recommend ever doing it. ChrisA From cs at zip.com.au Sat Oct 13 06:27:28 2012 From: cs at zip.com.au (Cameron Simpson) Date: Sat, 13 Oct 2012 15:27:28 +1100 Subject: [Python-ideas] Is there a good reason to use * for multiplication? In-Reply-To: <11ad8f01-0383-4b64-b0f0-78bb0e2f9308@googlegroups.com> References: <11ad8f01-0383-4b64-b0f0-78bb0e2f9308@googlegroups.com> Message-ID: <20121013042727.GA1964@cskk.homeip.net> On 12Oct2012 13:27, Ram Rachum wrote: | Today a funny thought occurred to me. Ever since I've learned to program | when I was a child, I've taken for granted that when programming, the sign | used for multiplication is *. But now that I think about it, why? Now that | we have Unicode, why not use ? ? Because it looks astonishingly like ".". Reason enough to avoid it altogether, for any purpose, in a language that uses "." quite a like, as Python does. A big -100 from me. Besides, "*" works well and has a long history as multiplication in many languages. This isn't broken. 
As a child, I was taught "x" (that's intened as a small cross diagonally oriented, not the letter I've used here) for multiplication. Let's support that too! It also looks like another character (specifically, a lot like the letter "x"). Seriously, I think this is a bad idea on a readability/usability basis, and an unnecessary idea from a functional point of view - it adds noting not already there and mucks with the "one obvious way to do it" notion into the bargain. Cheers, -- Cameron Simpson Climber: "I don't know, I can't see the next bolt." Belayer: "Remember X, when in doubt, run it out." This should be read with a good Birmingham accent, something like "Remember 'oids, win in dowt, roon it owt" From shibturn at gmail.com Sat Oct 13 11:30:16 2012 From: shibturn at gmail.com (Richard Oudkerk) Date: Sat, 13 Oct 2012 10:30:16 +0100 Subject: [Python-ideas] The async API of the future: Twisted and Deferreds In-Reply-To: References: Message-ID: On 13/10/2012 1:22am, Guido van Rossum wrote: > I don't think it follows that there can only be one reactor if they > are registered immediately. There could be a notion of "current > reactor" maintained in thread-local context; moreover it could depend > on the reactor that made the callback that caused the current task to > run. The reactor could also be chosen by the code that made the > Future. (Though I'm not immediately sure how that would work in the > yield-from scenario -- but I'm sure there's a way.) Alternatively, yielding a future (or whatever ones calls the objects returned by *_async()) could register *and* wait for the result. To register without waiting one would yield a wrapper for the future. So one could write result = yield foo_async(...) or f = yield Register(foo_async()) # do some other work result = yield f Richard From masklinn at masklinn.net Sat Oct 13 11:32:24 2012 From: masklinn at masklinn.net (Masklinn) Date: Sat, 13 Oct 2012 11:32:24 +0200 Subject: [Python-ideas] Is there a good reason to use * for multiplication? In-Reply-To: <507923C4.8040201@pearwood.info> References: <11ad8f01-0383-4b64-b0f0-78bb0e2f9308@googlegroups.com> <5078D4CE.1040201@pearwood.info> <507923C4.8040201@pearwood.info> Message-ID: <8F2E03E0-EC1B-4195-B5BC-831183ACE924@masklinn.net> On 2012-10-13, at 10:18 , Steven D'Aprano wrote: >> Perhaps instead I would like it if all operators were objects with e.g. >> special __infix__ methods. > > I believe that Haskell treats operators as if they were function objects That is correct for binary operators. The unary minus is (currently) a keyword and sugar for the negate function[0]. So `map (-) values` is not going to negate all values, it's going to partially apply the binary `(-)` to all values. > but I think that puts the emphasis on the wrong thing. I'm not sure I understand that, what does it put the emphasis on? Note that these operators ? when generic ? tend to live in typeclasses, so the actual implementation of the behavior of the operator for the set of its arguments is defined where and when the corresponding typeclass instance is created. This is essentially how Python's own operators (and some builtins e.g. ``divmod`` or ``pow``) work (except Haskell doesn't have a reflected operands fallback) [0] http://hackage.haskell.org/packages/archive/base/latest/doc/html/Prelude.html#v:negate From masklinn at masklinn.net Sat Oct 13 11:34:11 2012 From: masklinn at masklinn.net (Masklinn) Date: Sat, 13 Oct 2012 11:34:11 +0200 Subject: [Python-ideas] Is there a good reason to use * for multiplication? 
In-Reply-To: <50788DB1.4090809@stoneleaf.us> References: <11ad8f01-0383-4b64-b0f0-78bb0e2f9308@googlegroups.com> <50788DB1.4090809@stoneleaf.us> Message-ID: On 2012-10-12, at 23:37 , Ethan Furman wrote: > > In college we dropped the ? and just wrote stuff like: > > (x + z)(x - y) > > but we can't do that in Python because they are function calls. Numbers could be callable with __call__ aliasing to a multiplication? From breamoreboy at yahoo.co.uk Sat Oct 13 11:51:52 2012 From: breamoreboy at yahoo.co.uk (Mark Lawrence) Date: Sat, 13 Oct 2012 10:51:52 +0100 Subject: [Python-ideas] Is there a good reason to use * for multiplication? In-Reply-To: <20121013042727.GA1964@cskk.homeip.net> References: <11ad8f01-0383-4b64-b0f0-78bb0e2f9308@googlegroups.com> <20121013042727.GA1964@cskk.homeip.net> Message-ID: On 13/10/2012 05:27, Cameron Simpson wrote: > > As a child, I was taught "x" (that's intened as a small cross diagonally > oriented, not the letter I've used here) for multiplication. Let's > support that too! It also looks like another character (specifically, a > lot like the letter "x"). > > Cheers, > Another problem with "x" is actually writing it out correctly on your coding sheets for the data preparation team. IIRC Hagar the Horrible had an issue with this as he couldn't get the lines to cross. -- Cheers. Mark Lawrence. From solipsis at pitrou.net Sat Oct 13 12:06:34 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sat, 13 Oct 2012 12:06:34 +0200 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: References: <20121005202534.5f721292@pitrou.net> <20121006190821.02ae50cd@pitrou.net> <387F855A-B5BD-491D-9FA5-918F4042332F@gmail.com> <50707AD5.3030407@stoneleaf.us> <50707F91.6060301@stoneleaf.us> <50786E42.6050308@stoneleaf.us> <20121012214224.55f3ed27@pitrou.net> <50787E8A.2090804@stoneleaf.us> <20121012225306.295d93e6@pitrou.net> <507884F0.2060608@stoneleaf.us> <20121013102204.7b55dc53@pitrou.net> Message-ID: <1350122794.3365.8.camel@localhost.localdomain> Le samedi 13 octobre 2012 ? 19:47 +1000, Nick Coghlan a ?crit : > The problem is that "Windows path" and "Posix path" aren't really > accurate. There are a bunch of degrees of freedom, which is *exactly* > the problem the context pattern is designed to deal with without a > combinatorial explosion of different types or mixins. > > The "how is the string format determined?" aspect could be handled > with separate methods, but how do you do case insensitive comparisons > of paths on posix systems? The question is: why do you want to do that? I know there are a limited bunch of special cases where Posix filesystem paths may be case-insensitive, but nobody really cares about them today, and I don't expect many people to bother tomorrow. Playing with individual parameters of path semantics sounds like a theoretical bother more than a practical one. A possibility would be to expose the Flavour classes, which until now are an internal implementation detail. That would first imply better defining their API, though. Then people could write e.g.: class PosixCaseInsensitiveFlavour(pathlib.PosixFlavour): case_sensitive = False class MyPath(pathlib.PosixPath): flavour = PosixCaseInsensitiveFlavour() But I would consider it extra icing on the cake, not a requirement for a Path API. Regards Antoine. 
-- Software development and contracting: http://pro.pitrou.net From itamar at futurefoundries.com Sat Oct 13 12:52:54 2012 From: itamar at futurefoundries.com (Itamar Turner-Trauring) Date: Sat, 13 Oct 2012 06:52:54 -0400 Subject: [Python-ideas] The async API of the future: Twisted and Deferreds Message-ID: (Sorry if this doesn't end up in the right thread in mail clients; I've been reading this through a web UI and only just formally subscribed so can't reply directly to the correct email.) Code that uses generators is indeed often easier to read... but the problem is that this isn't just a difference in syntax, it has a significant semantic impact. Specifically, requiring yield means that you're re-introducing context switching. In inlineCallbacks, or coroutines, or any system that use yield as in your example above, arbitrary code may run during the context switch, and who knows what happened to the state of the world in the interim. True, it's an explicit context switch, unlike threading where it can happen at any point, but it's still a context switch, and it still increases the chance of race conditions and all the other problems threading has. (If you're omitting yield it's even worse, since you can't even tell anymore where the context switches are happening.) Superficially such code is simpler (and in some cases I'm happy to use inlineCallbacks, in particular in unit tests), but much the same way threaded code is "simpler". If you're not very very careful, it'll work 99 times and break mysteriously the 100th. For example, consider the following code; silly, but buggy due to the context switch in yield allowing race conditions if any other code modifies counter.value while getResult() is waiting for a result. def addToCounter(): counter.value = counter.value + (yield getResult()) In a Deferred callback, on the other hand, you know the only things that are going to run are functions you call. In so far as it's possible, what happens is under control of one function only. Less pretty, but no potential race conditions: def add(result): counter.value = counter.value + result getResult().addCallback(add) That being said, perhaps some changes to Python syntax could solve this; Allen Short ( http://washort.twistedmatrix.com/2012/10/coroutines-reduce-readability.html) claims to have a proposal, hopefully he'll post it soon. -------------- next part -------------- An HTML attachment was scrubbed... URL: From _ at lvh.cc Sat Oct 13 13:05:08 2012 From: _ at lvh.cc (Laurens Van Houtven) Date: Sat, 13 Oct 2012 13:05:08 +0200 Subject: [Python-ideas] The async API of the future: Twisted and Deferreds In-Reply-To: References: Message-ID: In addition to the issue mentioned by Itamar, there needs to be a clear way to do two related things: 1) actually doing things asynchronously! A good example of where this happens for me is stats logging. I log some stats, but I don't want to wait for the request to be completed before I continue on with my work: def callback(): logSomeStats() return actuallyDoWorkCustomerCaresAbout() logSomeStats returns a deferred, and I probably would attach an errback to that deferred, but I don't want to wait until I've finished logging some stats to do the rest of the work, and I CERTAINLY don't want the work the customer cares about to bomb out because my stats server is down. In current inlineCallbacks, this is equally simple: I just run the expression and *not* yield. 
If I understand the current alternative suggestions correctly, the yielding part is important for actually hooking up the IO (whereas in @inlineCallbacks, it *only* does callback management). Perhaps I am mistaken in this belief? 2) doing multiple things concurrently. Let's say I want to download 10 web pages and do something when all ten of them have completed. In twisted, I can say: gatherResults(map(getPage, urls)).addCallback(...) with inlineCallbacks, you can do quite similar things (just yield the result of gatherResults, since that's a deferred that'll fire once all of them have fired): for body in (yield gatherResults(map(getPage, urls)): .... --- How would these two look in a world where the generator/inlineCallbacks magic isn't generator backed? cheers lvh -------------- next part -------------- An HTML attachment was scrubbed... URL: From michelelacchia at gmail.com Sat Oct 13 14:04:56 2012 From: michelelacchia at gmail.com (Michele Lacchia) Date: Sat, 13 Oct 2012 05:04:56 -0700 (PDT) Subject: [Python-ideas] PEP 428: poll about the joining syntax In-Reply-To: <50790DE0.7010207@canterbury.ac.nz> References: <20121008204707.48559bf9@pitrou.net> <50784504.2080801@stoneleaf.us> <50790DE0.7010207@canterbury.ac.nz> Message-ID: <5a0c9852-bb3f-40a0-8b86-060c1138f372@googlegroups.com> > > > '.j/homeo/homes/homeh/homeu/homeacj/homeo/homes/homeh/homeu/homeaoj/homeo/homes/homeh/homeu/homeanj/homeo/homes/homeh/homeu/homeafj/homeo/homes/homeh/homeu/homeaij/homeo/homes/homeh/homeu/homeag' > > > Homeo, Homeo, wherefore path thou Homeo? > > -- > Greg > lol I just had to +1 on this one!! Congrats! -------------- next part -------------- An HTML attachment was scrubbed... URL: From tismer at stackless.com Sat Oct 13 13:42:32 2012 From: tismer at stackless.com (Christian Tismer) Date: Sat, 13 Oct 2012 13:42:32 +0200 Subject: [Python-ideas] asyncore: included batteries don't fit In-Reply-To: References: <20120926081718.GA20843@hephaistos.amsuess.com> <20121003144320.GA16485@hephaistos.amsuess.com> <20121007002402.43472817@pitrou.net> <20121007120931.09c12ec4@pitrou.net> Message-ID: <507953A8.5050902@stackless.com> Hi Guido and folks, On 07.10.12 17:04, Guido van Rossum wrote: > On Sun, Oct 7, 2012 at 3:09 AM, Antoine Pitrou wrote: >> On Sat, 6 Oct 2012 17:23:48 -0700 >> Guido van Rossum wrote: >>> On Sat, Oct 6, 2012 at 3:24 PM, Antoine Pitrou wrote: >>>> greenlets/gevents only get you half the advantages of single-threaded >>>> "async" programming: they get you scalability in the face of a high >>>> number of concurrent connections, but they don't get you the robustness >>>> of cooperative multithreading (because it's not obvious when reading >>>> the code where the possible thread-switching points are). >>> I used to think that too, long ago, until I discovered that as you add >>> abstraction layers, cooperative multithreading is untenable -- sooner >>> or later you will lose track of where the threads are switched. >> Even with an explicit notation like "yield" / "yield from"? > If you strictly adhere to using those you should be safe (though > distinguishing between the two may prove challenging) -- but in > practice it's hard to get everyone and every API to use this style. So > you'll have some blocking API calls hidden deep inside what looks like > a perfectly innocent call to some helper function. 
> > IIUC in Go this is solved by mixing threads and lighter-weight > constructs (say, greenlets) -- if a greenlet gets blocked for I/O, the > rest of the system continues to make progress by spawning another > thread. > > My own experience with NDB is that it's just too hard to make everyone > use the async APIs all the time -- so I gave up and made async APIs an > optional feature, offering a blocking and an async version of every > API. I didn't start out that way, but once I started writing > documentation aimed at unsophisticated users, I realized that it was > just too much of an uphill battle to bother. > > So I think it's better to accept this and deal with it, possibly > adding locking primitives into the mix that work well with the rest of > the framework. Building a lock out of a tasklet-based (i.e. > non-threading) Future class is easy enough. I'm digging in, a bit late. Still trying to read the myriad of messages. For now just a word: Guido: How much I would love to use your time machine and invite you to discuss Pythons future in 1998. Then we would have tossed greenlet/stackless and all that crap. Entering a different context could have been folded deeply into Python, by making it able to pickle program state in certain positions. Just dreaming out loud :-) It is great that this discussion is taking place, and I'll try to help. cheers - Chris -- Christian Tismer :^) Software Consulting : Have a break! Take a ride on Python's Karl-Liebknecht-Str. 121 : *Starship* http://starship.python.net/ 14482 Potsdam : PGP key -> http://pgp.uni-mainz.de phone +49 173 24 18 776 fax +49 (30) 700143-0023 PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04 whom do you want to sponsor today? http://www.stackless.com/ From tismer at stackless.com Sat Oct 13 15:11:43 2012 From: tismer at stackless.com (Christian Tismer) Date: Sat, 13 Oct 2012 15:11:43 +0200 Subject: [Python-ideas] Cofunctions PEP - Revision 4 In-Reply-To: <4C65EFC5.4080100@canterbury.ac.nz> References: <4C625949.1060803@canterbury.ac.nz> <4C632F88.9070405@canterbury.ac.nz> <4C639558.5020602@canterbury.ac.nz> <4C64A96B.1030808@canterbury.ac.nz> <4C652660.5010907@egenix.com> <4C65EFC5.4080100@canterbury.ac.nz> Message-ID: <5079688F.9040709@stackless.com> Hi Greg, digged this thing up while looking into the current async discussion. On 14.08.10 03:22, Greg Ewing wrote: > M.-A. Lemburg wrote: > >> Greg Ewing wrote: > >>> In an application that requires thousands of small, cooperating >>> processes, > >> Sure, and those use Stackless to solve the problem, which IMHO >> provides a much more Pythonic approach to these things. > > At the expense of using a non-standard Python installation, > though. I'm trying to design something that can be incorporated > into standard Python and work without requiring any deep > black magic. Guido has so far rejected any idea of merging > Stackless into CPython. > > Also I gather that Stackless works by copying pieces of > C stack around, which is probably more lightweight than using > an OS thread, but not as light as it could be. > So, here I need to correct a bit. What you are describing is the behavior of stackless 2.0, also what the greenlet does (and eventlet then too for now). The main thing that makes stackless 3.x so difficult _is_ that it is as efficient as can be, because no stack slicing is done, for 90 % of all code. Stackless uses operations to unwind the C stack in most cases. 
If this were possible in _all_ cases, then all the stack copying would go away, and we had no machine code at all! But the necessary change to Python would be quite heavy, undoable for a small team. I have left these ideas long time ago and did other projects. But maybe things should be considered again, after the world has changed so much. Maybe Python 4 could be decoupled from the C stack. cheers - Chris -- Christian Tismer :^) Software Consulting : Have a break! Take a ride on Python's Karl-Liebknecht-Str. 121 : *Starship* http://starship.python.net/ 14482 Potsdam : PGP key -> http://pgp.uni-mainz.de phone +49 173 24 18 776 fax +49 (30) 700143-0023 PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04 whom do you want to sponsor today? http://www.stackless.com/ From mwm at mired.org Sat Oct 13 17:22:29 2012 From: mwm at mired.org (Mike Meyer) Date: Sat, 13 Oct 2012 10:22:29 -0500 Subject: [Python-ideas] Is there a good reason to use * for multiplication? In-Reply-To: <507923C4.8040201@pearwood.info> References: <11ad8f01-0383-4b64-b0f0-78bb0e2f9308@googlegroups.com> <5078D4CE.1040201@pearwood.info> <507923C4.8040201@pearwood.info> Message-ID: <20121013102229.259572ad@bhuda.mired.org> On Sat, 13 Oct 2012 19:18:12 +1100 Steven D'Aprano wrote: > On 13/10/12 19:05, Yuval Greenfield wrote: > I believe that Haskell treats operators as if they were function objects, > so you could do something like: For the record, Haskell allows operators to be used as functions by quoting them in ()'s (to provide the functionality of operator) and to turn functions into operators by quoting them in ``'s. > negative_values = map(-, values) > > but I think that puts the emphasis on the wrong thing. If (and that's a big > if) we did something like this, it should be a pair of methods __op__ and > the right-hand version __rop__ which get called on the *operands*, not the > operator/function object: > > def __op__(self, other, symbol) Yeah, but then your function has to dispatch for *all* operators. Depending on how we handle backwards compatibility with __add__ et. al. I'd rather slice it the other way (leveraging $ being unsused): def __$__(self, other, right): so it only has to dispatch on left/right invocation. must match a new grammer symbol "operator_symbol", with limits on it to for readability reasons: say at most three characters, all coming from an appropriate unicode class or classes (you want to catch the current operators and dollar sign). Both of these leave both operator precedence and backwards compatibility to be dealt with. http://www.mired.org/ Independent Software developer/SCM consultant, email for more information. O< ascii ribbon campaign - stop html mail - www.asciiribbon.org From bauertomer at gmail.com Sat Oct 13 17:29:54 2012 From: bauertomer at gmail.com (T.B.) 
Date: Sat, 13 Oct 2012 17:29:54 +0200 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: <1350122794.3365.8.camel@localhost.localdomain> References: <20121005202534.5f721292@pitrou.net> <20121006190821.02ae50cd@pitrou.net> <387F855A-B5BD-491D-9FA5-918F4042332F@gmail.com> <50707AD5.3030407@stoneleaf.us> <50707F91.6060301@stoneleaf.us> <50786E42.6050308@stoneleaf.us> <20121012214224.55f3ed27@pitrou.net> <50787E8A.2090804@stoneleaf.us> <20121012225306.295d93e6@pitrou.net> <507884F0.2060608@stoneleaf.us> <20121013102204.7b55dc53@pitrou.net> <1350122794.3365.8.camel@localhost.localdomain> Message-ID: <507988F2.2070108@gmail.com> On 2012-10-13 12:06, Antoine Pitrou wrote: > On Saturday, 13 October 2012 at 19:47 +1000, Nick Coghlan wrote: >> The problem is that "Windows path" and "Posix path" aren't really >> accurate. There are a bunch of degrees of freedom, which is *exactly* >> the problem the context pattern is designed to deal with without a >> combinatorial explosion of different types or mixins. >> >> The "how is the string format determined?" aspect could be handled >> with separate methods, but how do you do case insensitive comparisons >> of paths on posix systems? > > The question is: why do you want to do that? > I know there are a limited bunch of special cases where Posix filesystem > paths may be case-insensitive, but nobody really cares about them today, > and I don't expect many people to bother tomorrow. Playing with > individual parameters of path semantics sounds like a theoretical bother > more than a practical one. > If you want to do that, and that is a big if, it might be better to give keyword arguments to Path(), so that the class signature would look like: class Path: def __init__(self, *args, sep=os.path.sep, casesensitive=os.path.casesensitive, expanduser=False)... This will make PosixPath and WindowsPath partial classes with certain keyword arguments filled in. Notice that os.path.casesensitive is not (yet) present in Python. Regards, TB From ncoghlan at gmail.com Sat Oct 13 17:46:09 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 14 Oct 2012 01:46:09 +1000 Subject: [Python-ideas] The async API of the future: Twisted and Deferreds In-Reply-To: References: Message-ID: On Sat, Oct 13, 2012 at 8:52 PM, Itamar Turner-Trauring wrote: > def addToCounter(): > counter.value = counter.value + (yield getResult()) This is buggy code for the reasons you state. However, only improperly *embedded* yields have this problem, yields that are done in a dedicated assignment statement are fine: def addToCounter(): result = yield getResult() # No race condition here, as we only read the counter *after* receiving the result counter.value = counter.value + result (You can also make sure they're the first thing executed as part of a larger expression, but a separate assignment statement will almost always be clearer) > In a Deferred callback, on the other hand, you know the only things that are > going to run are functions you call. In so far as it's possible, what > happens is under control of one function only. Less pretty, but no potential > race conditions: > > def add(result): > counter.value = counter.value + result > getResult().addCallback(add) This is not the same code you wrote above in the generator version.
The callback equivalent of the code you wrote is this: bound_value = counter.value def add(result): counter.value = bound_value + result getResult().addCallback(add) The generator version isn't magic, people still need to know what they're doing to properly benefit from the cooperative multithreading. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Sat Oct 13 17:50:57 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 14 Oct 2012 01:50:57 +1000 Subject: [Python-ideas] The async API of the future: Twisted and Deferreds In-Reply-To: References: Message-ID: On Sat, Oct 13, 2012 at 9:05 PM, Laurens Van Houtven <_ at lvh.cc> wrote: > In addition to the issue mentioned by Itamar, there needs to be a clear way > to do two related things: > > 1) actually doing things asynchronously! A good example of where this > happens for me is stats logging. I log some stats, but I don't want to wait > for the request to be completed before I continue on with my work: > > def callback(): > logSomeStats() > return actuallyDoWorkCustomerCaresAbout() > > logSomeStats returns a deferred, and I probably would attach an errback to > that deferred, but I don't want to wait until I've finished logging some > stats to do the rest of the work, and I CERTAINLY don't want the work the > customer cares about to bomb out because my stats server is down. > > In current inlineCallbacks, this is equally simple: I just run the > expression and *not* yield. If I understand the current alternative > suggestions correctly, the yielding part is important for actually hooking > up the IO (whereas in @inlineCallbacks, it *only* does callback management). > Perhaps I am mistaken in this belief? Some have certainly suggested that, but not Guido. In Guido's API, the *_async() calls actually kick off the operations, the "yield" calls are the "I'm done for now, wake me when this Future I'm yielding is ready". This is the only way that makes sense, for the reasons you give here. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From itamar at futurefoundries.com Sat Oct 13 18:00:24 2012 From: itamar at futurefoundries.com (Itamar Turner-Trauring) Date: Sat, 13 Oct 2012 12:00:24 -0400 Subject: [Python-ideas] The async API of the future: Twisted and Deferreds In-Reply-To: References: Message-ID: On Sat, Oct 13, 2012 at 11:46 AM, Nick Coghlan wrote: > > > In a Deferred callback, on the other hand, you know the only things that > are > > going to run are functions you call. In so far as it's possible, what > > happens is under control of one function only. Less pretty, but no > potential > > race conditions: > > > > def add(result): > > counter.value = counter.value + result > > getResult().addCallback(add) > > This is not the same code you wrote above in the generator version. > The callback equivalent of the code you wrote is this: > > bound_value = counter.value > def add(result): > counter.value = bound_value + result > getResult().addCallback(add) > True, so, let's look at this version. First, notice that it's more convoluted than the version I wrote above; i.e. you have to go out of your way to write race conditiony code. Second, and much more important, when reading it it's obvious that you're getting and setting counter.value at different times! Whereas in the generator version you have to think about it. 
The generator version has you naturally writing code where things you thought are happening at the same time are actually happening very far apart; the Deferred code makes it clear which pieces of code happen separately, and so you're much more likely to notice these sort of bugs. The generator version isn't magic, people still need to know what > they're doing to properly benefit from the cooperative multithreading. > I agree. And that's exactly the dimension in which Deferreds are superior to cooperative multithreading; people don't have to think about race conditions as much, which is hard enough in general. At least when you're using Deferreds, you can tell by reading the code which chunks of code can happen at different times, and the natural idioms of Python don't *encourage* race conditions as they do with yield syntax. -Itamar -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Sat Oct 13 17:37:18 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 14 Oct 2012 01:37:18 +1000 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: <1350122794.3365.8.camel@localhost.localdomain> References: <20121005202534.5f721292@pitrou.net> <20121006190821.02ae50cd@pitrou.net> <387F855A-B5BD-491D-9FA5-918F4042332F@gmail.com> <50707AD5.3030407@stoneleaf.us> <50707F91.6060301@stoneleaf.us> <50786E42.6050308@stoneleaf.us> <20121012214224.55f3ed27@pitrou.net> <50787E8A.2090804@stoneleaf.us> <20121012225306.295d93e6@pitrou.net> <507884F0.2060608@stoneleaf.us> <20121013102204.7b55dc53@pitrou.net> <1350122794.3365.8.camel@localhost.localdomain> Message-ID: On Sat, Oct 13, 2012 at 8:06 PM, Antoine Pitrou wrote: > The question is: why do you want to do that? > I know there are a limited bunch of special cases where Posix filesystem > paths may be case-insensitive, but nobody really cares about them today, > and I don't expect many people to bother tomorrow. Playing with > individual parameters of path semantics sounds like a theoretical bother > more than a practical one. It's a useful trick for writing genuinely cross-platform code: when I'm writing cross-platform code on *nix, I want my paths to behave like posix paths in every respect *except* I want them to complain somehow if any of my names only differ by case. I've been burnt in the past by checking in conflicting names on a Linux machine and then wondering why the Windows checkouts were broken. The only real way to deal with that is to avoid relying on filesystem case sensitivity for correct behaviour of your application, even when the underlying OS *permits* case sensitivity. This becomes even *more* important if NFS and CIFS filesystems are being shared between *nix and Windows systems, but it applies any time a file system may be shared (e.g. creating archive files, checking in to a source control system, etc). I have the luxury right now of only needing to care about Linux systems, but I've had to deal with the mess in the past and "act case insensitive everywhere" is the only sanity preserving option. Python itself deals with this mostly via the stylistic rule of "always use lowercase module and package names", but it would be nice if a new path abstraction allowed the problem to be handled *properly*. On the Windows side, it would be nice to be able to request the use of "/" as the directory separator when converting to a string. Using "\" has the potential to cause interoperability problems (e.g. with regular expressions). 
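To make the "complain if names only differ by case" idea concrete, here is a minimal sketch of the kind of check I mean (the helper name and the use of plain strings rather than path objects are illustrative assumptions, not part of any proposed API):

    # Hypothetical helper: report names that would collide on a
    # case-insensitive filesystem even though they are distinct on posix.
    def find_case_collisions(names):
        seen = {}  # lowercased name -> first spelling encountered
        collisions = []
        for name in names:
            folded = name.lower()
            if folded in seen and seen[folded] != name:
                collisions.append((seen[folded], name))
            else:
                seen.setdefault(folded, name)
        return collisions

    >>> find_case_collisions(["README", "readme", "setup.py"])
    [('README', 'readme')]

A path abstraction that knows it is operating in a "case insensitive" mode could run this kind of check itself whenever names are compared or stored.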
If you don't like the implicit nature of contexts (a perfectly reasonable complaint), then I suggest going for an explicit strategy pattern with flavours rather than requiring classes. With this approach, the flavour would be specified on a *per-instance* basis (with the default behaviour being determined by the OS). The main class hierarchy would just be PurePath <-- Path and there would be a separate PathFlavor ABC with PosixFlavor and WindowsFlavor subclasses (public Python stdlib APIs generally follow US spelling and drop the 'u'). The main classes would then *delegate* the flavour dependent operations like parsing, conversion to a string and equality comparisons to the flavour objects. It's really the public use of the strategy pattern that prevents the combinatorial explosion - you can just have a single OS-based default (as is already the case with PurePath.__new__ and Path.__new__ playing type selection games), rather than allowing the default to be configured per thread. The decimal-style thread-based dynamic contexts are more useful when you want to change the behaviour *without* either copying or mutating objects, which I agree is overkill for path manipulation. Since pathlib already uses the Flavor objects as strategies internally, it should just be a matter of switching from the use of inheritance to specify the flavour to using a keyword-only argument in the constructor. The "case-insensitive posix path" example would then look like: class PosixCaseInsensitiveFlavor(pathlib.PosixFlavor): case_sensitive = False def my_path(*args): return Path(*args, flavor=PosixCaseInsensitiveFlavor) You can add as many new flavours as you want, and it's only one class per flavour rather than up to 3 (the flavour itself, the pure variant and the concrete variant). This class hierarchy is also more amenable to the introduction of MutablePath as a second subclass of PurePath - a path variant with mutable properties still sounds potentially attractive to me (over a wide variety of return-a-modified-copy methods for various cases). Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From joshua.landau.ws at gmail.com Sat Oct 13 18:10:08 2012 From: joshua.landau.ws at gmail.com (Joshua Landau) Date: Sat, 13 Oct 2012 17:10:08 +0100 Subject: [Python-ideas] Is there a good reason to use * for multiplication? In-Reply-To: <20121013102229.259572ad@bhuda.mired.org> References: <11ad8f01-0383-4b64-b0f0-78bb0e2f9308@googlegroups.com> <5078D4CE.1040201@pearwood.info> <507923C4.8040201@pearwood.info> <20121013102229.259572ad@bhuda.mired.org> Message-ID: On 13 October 2012 16:22, Mike Meyer wrote: > On Sat, 13 Oct 2012 19:18:12 +1100 > Steven D'Aprano wrote: > > > On 13/10/12 19:05, Yuval Greenfield wrote: > > I believe that Haskell treats operators as if they were function objects, > > so you could do something like: > > For the record, Haskell allows operators to be used as functions by > quoting them in ()'s (to provide the functionality of operator) and to > turn functions into operators by quoting them in ``'s. > > > negative_values = map(-, values) > > > > but I think that puts the emphasis on the wrong thing. If (and that's a > big > > if) we did something like this, it should be a pair of methods __op__ and > > the right-hand version __rop__ which get called on the *operands*, not > the > > operator/function object: > > > > def __op__(self, other, symbol) > > Yeah, but then your function has to dispatch for *all* > operators. 
Depending on how we handle backwards compatibility with > __add__ et. al. > > I'd rather slice it the other way (leveraging $ being unsused): > > def __$__(self, other, right): > > so it only has to dispatch on left/right invocation. > > must match a new grammer symbol "operator_symbol", with limits on > it to for readability reasons: say at most three characters, all > coming from an appropriate unicode class or classes (you want to catch > the current operators and dollar sign). > > Both of these leave both operator precedence and backwards > compatibility to be dealt with. > If anyone is taking this as more than a bit of fun, *stop it*. How'er, for all you wanting something a bit more concrete to play with, I've got something that simulates infix based off something I'd found on the netz sometime who's author I do not remember. The code is one Codepad for brevity, and it lets you do things like this: >>> (2 *dot* "__mul__" |mappedover| 10-to-25) >> tolist > [20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48] Note that this is a contrived example equivalent to: >>> list(map((2).__mul__, range(10, 25))) > [20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48] and mixing the styles you can get a quite nice: >>> map((2).__mul__, 10-to-25) >> tolist > [20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48] which would actually look readable if it was considered mainstream. Note that making an in-line function is as simple as: >>> @Inline > ... def multiply(x, y): return x*y > ... > >>> 3 ^multiply^ 3 > 9 and that you can use any surrounding operators (other than comparisons) to chose your operator priority or what reads well: >>> 1 |div| 3 |div| 3 > 0.1111111111111111 > >>> 1 |div| 3 *div* 3 > 1.0 and finally you also get "coercion" to functions ? la Haskell: >>> 2 |(div|3) > 0.6666666666666666 > >>> (div|3)(2) > 0.6666666666666666 but I wouldn't even hope of calling it stable code or low on WTFs (if the above wasn't enough): >>> (div|(div|3))(3) # Go on, guess why! > 1.0 > >>> 2 + (div|3) # 'Cause you can, yo > 0.6666666666666666 These could both be "fixed" by making an infix require the same operator on both sides, which would make these both errors, but that wouldn't catch cases like (or*(div|3))(3) anyway. So enjoy. Or not. Preferably not. -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Sat Oct 13 18:17:46 2012 From: guido at python.org (Guido van Rossum) Date: Sat, 13 Oct 2012 09:17:46 -0700 Subject: [Python-ideas] re-implementing Twisted for fun and profit In-Reply-To: References: Message-ID: On Fri, Oct 12, 2012 at 9:46 PM, Glyph wrote: > There has been a lot written on this list about asynchronous, microthreaded and event-driven I/O in the last couple of days. There's too much for me to try to respond to all at once, but I would very much like to (possibly re-)introduce one very important point into the discussion. > > Would everyone interested in this please please please read several times? Especially this section: . If it is not clear, please ask questions about it and I will try to needle someone qualified into improving the explanation. I am well aware of that section. But, like the rest of PEP 3153, it is sorely lacking in examples or specifications. > I am bringing this up because I've seen a significant amount of discussion of level-triggering versus edge-triggering. 
Once you have properly separated out transport logic from application implementation, triggering style is an irrelevant, private implementation detail of the networking layer. This could mean several things: (a) only the networking layer needs to use both trigger styles, the rest of your code should always use trigger style X (and please let X be edge-triggered :-); (b) only in the networking layer is it important to distinguish carefully between the two, in the rest of the app you can use whatever you like best. > Whether the operating system tells Python "you must call recv() once now" or "you must call recv() until I tell you to stop" should not matter to the application if the application is just getting passed the results of recv() which has already been called. Since not all I/O libraries actually have a recv() to call, you shouldn't have the application have to call it. This is perhaps the central design error of asyncore. Is this about buffering? Because I think I understand buffering. Filling up a buffer with data as it comes in (until a certain limit) is a good job for level-triggered callbacks. Ditto for draining a buffer. The rest of the app can then talk to the buffer and tell it "give me between X and Y bytes, possibly blocking if you don't have at least X available right now, or "here are N more bytes, please send them out when you can". From the app's position these calls *may* block, so they need to use whatever mechanism (callbacks, Futures, Deferreds, yield, yield-from) to ensure that *if* they block, other tasks can run. But the common case is that they don't actually need to block because there is still data / space in the buffer. (You could also have an exception for write() and make that never-blocking, trusting the app not to overfill the buffer; this seems convenient but it worries me a bit.) > If it needs a name, I suppose I'd call my preferred style "event triggering". But how does it work? What would typical user code in this style look like? > Also, I would like to remind all participants that microthreading, request/response abstraction (i.e. Deferreds, Futures), generator coroutines and a common API for network I/O are all very different tasks and do not need to be accomplished all at once. If you try to build something that does all of this stuff, you get most of Twisted core plus half of Stackless all at once, which is a bit much for the stdlib to bite off in one chunk. Well understood. (And I don't even want to get microthreading into the mix, although others may disagree -- I see Christian Tismer has jumped in...) But I also think that if we design these things in isolation it's likely that we'll find later that the pieces don't fit, and I don't want that to happen either. So I think we should consider these separate, but loosely coordinated efforts. 
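To make the buffer idea above concrete, here is a rough sketch of the read half of such a buffer (the class and method names are made up for illustration, and concurrent.futures.Future merely stands in for whatever Future/Deferred flavor the framework ends up using):

    from concurrent.futures import Future  # stand-in for the framework's Future

    # Illustrative only: filled by level-triggered callbacks from the event
    # loop, drained by the application through a Future-returning call.
    class ReadBuffer:
        def __init__(self):
            self._data = bytearray()
            self._waiter = None  # (future, min_bytes, max_bytes) or None

        def feed_data(self, chunk):
            # Called by the event loop whenever the fd is readable.
            self._data.extend(chunk)
            self._maybe_wake_waiter()

        def read_bytes(self, min_bytes, max_bytes):
            # Called by the app; "blocks" by returning an unfinished Future
            # if fewer than min_bytes are currently buffered.
            fut = Future()
            self._waiter = (fut, min_bytes, max_bytes)
            self._maybe_wake_waiter()
            return fut

        def _maybe_wake_waiter(self):
            if self._waiter is None:
                return
            fut, min_bytes, max_bytes = self._waiter
            if len(self._data) >= min_bytes:
                self._waiter = None
                result = bytes(self._data[:max_bytes])
                del self._data[:max_bytes]
                fut.set_result(result)

A coroutine-style task would then write something like "data = yield buf.read_bytes(4, 4096)", while callback-style code would attach a callback to the returned Future; either way the buffer itself neither knows nor cares whether the readiness notifications that feed it were level- or edge-triggered.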
-- --Guido van Rossum (python.org/~guido) From joshua.landau.ws at gmail.com Sat Oct 13 18:21:19 2012 From: joshua.landau.ws at gmail.com (Joshua Landau) Date: Sat, 13 Oct 2012 17:21:19 +0100 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: References: <20121005202534.5f721292@pitrou.net> <20121006190821.02ae50cd@pitrou.net> <387F855A-B5BD-491D-9FA5-918F4042332F@gmail.com> <50707AD5.3030407@stoneleaf.us> <50707F91.6060301@stoneleaf.us> <50786E42.6050308@stoneleaf.us> <20121012214224.55f3ed27@pitrou.net> <50787E8A.2090804@stoneleaf.us> <20121012225306.295d93e6@pitrou.net> <507884F0.2060608@stoneleaf.us> <20121013102204.7b55dc53@pitrou.net> <1350122794.3365.8.camel@localhost.localdomain> Message-ID: On 13 October 2012 16:37, Nick Coghlan wrote: > On Sat, Oct 13, 2012 at 8:06 PM, Antoine Pitrou > wrote: > > The question is: why do you want to do that? > > I know there are a limited bunch of special cases where Posix filesystem > > paths may be case-insensitive, but nobody really cares about them today, > > and I don't expect many people to bother tomorrow. Playing with > > individual parameters of path semantics sounds like a theoretical bother > > more than a practical one. > > It's a useful trick for writing genuinely cross-platform code: when > I'm writing cross-platform code on *nix, I want my paths to behave > like posix paths in every respect *except* I want them to complain > somehow if any of my names only differ by case. I've been burnt in the > past by checking in conflicting names on a Linux machine and then > wondering why the Windows checkouts were broken. The only real way to > deal with that is to avoid relying on filesystem case sensitivity for > correct behaviour of your application, even when the underlying OS > *permits* case sensitivity. > > This becomes even *more* important if NFS and CIFS filesystems are > being shared between *nix and Windows systems, but it applies any time > a file system may be shared (e.g. creating archive files, checking in > to a source control system, etc). I have the luxury right now of only > needing to care about Linux systems, but I've had to deal with the > mess in the past and "act case insensitive everywhere" is the only > sanity preserving option. Python itself deals with this mostly via the > stylistic rule of "always use lowercase module and package names", but > it would be nice if a new path abstraction allowed the problem to be > handled *properly*. > > On the Windows side, it would be nice to be able to request the use of > "/" as the directory separator when converting to a string. Using "\" > has the potential to cause interoperability problems (e.g. with > regular expressions). > > If you don't like the implicit nature of contexts (a perfectly > reasonable complaint), then I suggest going for an explicit strategy > pattern with flavours rather than requiring classes. > > With this approach, the flavour would be specified on a *per-instance* > basis (with the default behaviour being determined by the OS). > > The main class hierarchy would just be PurePath <-- Path and there > would be a separate PathFlavor ABC with PosixFlavor and WindowsFlavor > subclasses (public Python stdlib APIs generally follow US spelling and > drop the 'u'). > > The main classes would then *delegate* the flavour dependent > operations like parsing, conversion to a string and equality > comparisons to the flavour objects. 
> > It's really the public use of the strategy pattern that prevents the > combinatorial explosion - you can just have a single OS-based default > (as is already the case with PurePath.__new__ and Path.__new__ playing > type selection games), rather than allowing the default to be > configured per thread. The decimal-style thread-based dynamic contexts > are more useful when you want to change the behaviour *without* either > copying or mutating objects, which I agree is overkill for path > manipulation. > > Since pathlib already uses the Flavor objects as strategies > internally, it should just be a matter of switching from the use of > inheritance to specify the flavour to using a keyword-only argument in > the constructor. The "case-insensitive posix path" example would then > look like: > > class PosixCaseInsensitiveFlavor(pathlib.PosixFlavor): > case_sensitive = False > > def my_path(*args): > return Path(*args, flavor=PosixCaseInsensitiveFlavor) > > You can add as many new flavours as you want, and it's only one class > per flavour rather than up to 3 (the flavour itself, the pure variant > and the concrete variant). > > This class hierarchy is also more amenable to the introduction of > MutablePath as a second subclass of PurePath - a path variant with > mutable properties still sounds potentially attractive to me (over a > wide variety of return-a-modified-copy methods for various cases). I don't disagree with your points, but I want to point out that IO is something Python has to make *really basic* because it's one of the first things newbies use, and Python is a newbie-friendly language. If you're recommending flavours and whatnot, I recommend you do it in a way that makes it very much optional and not at all the direct focus of the docs. The nice thing about the class idea for the uninitiated was that there were only two options, and newbies only ever had one obvious choice. Contexts using "with", I think, seem newbie-friendly too. So does having default flavours and then an ?expert?'s option to override default classes in possibly a sub-module. I'm no expert, but I think it's worth bearing in mind. -------------- next part -------------- An HTML attachment was scrubbed... URL: From solipsis at pitrou.net Sat Oct 13 18:28:30 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sat, 13 Oct 2012 18:28:30 +0200 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: References: <20121005202534.5f721292@pitrou.net> <20121006190821.02ae50cd@pitrou.net> <387F855A-B5BD-491D-9FA5-918F4042332F@gmail.com> <50707AD5.3030407@stoneleaf.us> <50707F91.6060301@stoneleaf.us> <50786E42.6050308@stoneleaf.us> <20121012214224.55f3ed27@pitrou.net> <50787E8A.2090804@stoneleaf.us> <20121012225306.295d93e6@pitrou.net> <507884F0.2060608@stoneleaf.us> <20121013102204.7b55dc53@pitrou.net> <1350122794.3365.8.camel@localhost.localdomain> Message-ID: <1350145710.3365.44.camel@localhost.localdomain> Le dimanche 14 octobre 2012 ? 01:37 +1000, Nick Coghlan a ?crit : > On Sat, Oct 13, 2012 at 8:06 PM, Antoine Pitrou wrote: > > The question is: why do you want to do that? > > I know there are a limited bunch of special cases where Posix filesystem > > paths may be case-insensitive, but nobody really cares about them today, > > and I don't expect many people to bother tomorrow. Playing with > > individual parameters of path semantics sounds like a theoretical bother > > more than a practical one. 
> > It's a useful trick for writing genuinely cross-platform code: when > I'm writing cross-platform code on *nix, I want my paths to behave > like posix paths in every respect *except* I want them to complain > somehow if any of my names only differ by case. But that's not cross-platform. Under Windows you must also care about reserved files (CON, NUL, etc.). Also, you can create Posix filenames with backslashes in them, but under Windows they will be treated as directory separators. Mercurial learnt this the hard way: http://selenic.com/repo/hg-stable/file/605fe310691f/mercurial/store.py#l124 > On the Windows side, it would be nice to be able to request the use of > "/" as the directory separator when converting to a string. Using "\" > has the potential to cause interoperability problems (e.g. with > regular expressions). The PEP mentions the .as_posix() method, which does exactly that. (use of regular expressions on whole paths sounds like a weird idea, but hey :-)) > If you don't like the implicit nature of contexts (a perfectly > reasonable complaint), then I suggest going for an explicit strategy > pattern with flavours rather than requiring classes. > With this approach, the flavour would be specified on a *per-instance* > basis (with the default behaviour being determined by the OS). If you s/would/could/, I have nothing against it, but I certainly don't understand why you dislike the approach of providing dedicated classes *by default*. IMO, having separate classes is simpler to use, easier to type, more discoverable (using pydoc or help() or tab-completion at the prompt), and it has an educational value that a keyword-only "flavour" argument doesn't have. > The main classes would then *delegate* the flavour dependent > operations like parsing, conversion to a string and equality > comparisons to the flavour objects. Which they already do :) Here is the code: class PurePosixPath(PurePath): _flavour = _posix_flavour __slots__ = () class PureNTPath(PurePath): _flavour = _nt_flavour __slots__ = () (https://bitbucket.org/pitrou/pathlib/src/f6df458aaa89/pathlib.py?at=default#cl-990) > The decimal-style thread-based dynamic contexts > are more useful when you want to change the behaviour *without* either > copying or mutating objects, which I agree is overkill for path > manipulation. Not only overkill, but incorrect and dangerous! > You can add as many new flavours as you want, and it's only one class > per flavour rather than up to 3 (the flavour itself, the pure variant > and the concrete variant). Yes, you can. That doesn't preclude offering separate classes by default, though :-) > This class hierarchy is also more amenable to the introduction of > MutablePath as a second subclass of PurePath - a path variant with > mutable properties still sounds potentially attractive to me (over a > wide variety of return-a-modified-copy methods for various cases). I'm very cold on offering both mutable and non-mutable paths. That's just complicated and confusing. Since an immutable type is very desirable for use in associative containers, I think immutability is the right choice. Regards Antoine.
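To illustrate the associative-containers point above, a small usage sketch against the prototype classes quoted earlier (purely illustrative; it assumes nothing beyond pure paths being immutable, comparable and hashable):

    from pathlib import PurePosixPath  # the prototype module referenced above

    sizes = {}
    seen = set()

    p = PurePosixPath('/etc/hosts')
    sizes[p] = 412
    seen.add(p)

    # A second object denoting the same path finds the same entries,
    # which is exactly what hashability plus immutability buy us:
    q = PurePosixPath('/etc/hosts')
    assert q in seen
    assert sizes[q] == 412

A mutable path used as a key could silently change its hash after insertion, which is the usual argument against mutable keys.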
-- Software development and contracting: http://pro.pitrou.net From ncoghlan at gmail.com Sat Oct 13 18:52:18 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 14 Oct 2012 02:52:18 +1000 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: <1350145710.3365.44.camel@localhost.localdomain> References: <20121005202534.5f721292@pitrou.net> <20121006190821.02ae50cd@pitrou.net> <387F855A-B5BD-491D-9FA5-918F4042332F@gmail.com> <50707AD5.3030407@stoneleaf.us> <50707F91.6060301@stoneleaf.us> <50786E42.6050308@stoneleaf.us> <20121012214224.55f3ed27@pitrou.net> <50787E8A.2090804@stoneleaf.us> <20121012225306.295d93e6@pitrou.net> <507884F0.2060608@stoneleaf.us> <20121013102204.7b55dc53@pitrou.net> <1350122794.3365.8.camel@localhost.localdomain> <1350145710.3365.44.camel@localhost.localdomain> Message-ID: On Sun, Oct 14, 2012 at 2:28 AM, Antoine Pitrou wrote: >> You can add as many new flavours as you want, and it's only one class >> per flavour rather than up to 3 (the flavour itself, the pure variant >> and the concrete variant). > > Yes, you can. That doesn't preclude offering separate classes by > default, though :-) Factory functions would make more sense to me than separate classes - they're not really a different type, they're the same type using a different strategy for the OS dependent bits. >> This class hierarchy is also more amenable to the introduction of >> MutablePath as a second subclass of PurePath - a path variant with >> mutable properties still sounds potentially attractive to me (over a >> wide variety of return-a-modified-copy methods for various cases). > > I'm very cold on offering both mutable on non-mutable paths. That's just > complicated and confusing. Since an immutable type is very desireable > for use in associative containers, I think immutability is the right > choice. Sure, if we're only offering one of them, then immutable is definitely the right choice. However, I think this is analogous to the bytes vs bytearray distinction - while bytes objects are more useful in general, using the mutable bytearray when appropriate is vastly superior to slicing and copying bytes objects. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From solipsis at pitrou.net Sat Oct 13 19:04:21 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sat, 13 Oct 2012 19:04:21 +0200 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths References: <20121005202534.5f721292@pitrou.net> <20121006190821.02ae50cd@pitrou.net> <387F855A-B5BD-491D-9FA5-918F4042332F@gmail.com> <50707AD5.3030407@stoneleaf.us> <50707F91.6060301@stoneleaf.us> <50786E42.6050308@stoneleaf.us> <20121012214224.55f3ed27@pitrou.net> <50787E8A.2090804@stoneleaf.us> <20121012225306.295d93e6@pitrou.net> <507884F0.2060608@stoneleaf.us> <20121013102204.7b55dc53@pitrou.net> <1350122794.3365.8.camel@localhost.localdomain> <1350145710.3365.44.camel@localhost.localdomain> Message-ID: <20121013190421.7753a8e7@pitrou.net> On Sun, 14 Oct 2012 02:52:18 +1000 Nick Coghlan wrote: > On Sun, Oct 14, 2012 at 2:28 AM, Antoine Pitrou wrote: > >> You can add as many new flavours as you want, and it's only one class > >> per flavour rather than up to 3 (the flavour itself, the pure variant > >> and the concrete variant). > > > > Yes, you can. 
That doesn't preclude offering separate classes by > > default, though :-) > > Factory functions would make more sense to me than separate classes - > they're not really a different type, they're the same type using a > different strategy for the OS dependent bits. I find them less helpful. isinstance() calls won't work. Deriving won't work. It makes things a bit more opaque. However, we are definitely talking about a secondary style issue. (note how the threading module moved away from factory functions to regular classes :-)) > >> This class hierarchy is also more amenable to the introduction of > >> MutablePath as a second subclass of PurePath - a path variant with > >> mutable properties still sounds potentially attractive to me (over a > >> wide variety of return-a-modified-copy methods for various cases). > > > > I'm very cold on offering both mutable on non-mutable paths. That's just > > complicated and confusing. Since an immutable type is very desireable > > for use in associative containers, I think immutability is the right > > choice. > > Sure, if we're only offering one of them, then immutable is definitely > the right choice. However, I think this is analogous to the bytes vs > bytearray distinction - while bytes objects are more useful in > general, using the mutable bytearray when appropriate is vastly > superior to slicing and copying bytes objects. bytearray was only added after a lot of experience with the 2.x str type. I don't think we should add a mutable path API before significant experience has been gathered about the cost and performance-criticality of path manipulation operations. Offering both mutable and immutable types makes learning the API harder for beginners ("which type should I use? what happens when I combine them?"). Regards Antoine. -- Software development and contracting: http://pro.pitrou.net From ben at bendarnell.com Sat Oct 13 19:07:05 2012 From: ben at bendarnell.com (Ben Darnell) Date: Sat, 13 Oct 2012 10:07:05 -0700 Subject: [Python-ideas] The async API of the future: Twisted and Deferreds In-Reply-To: <20121013081445.40d6d78f@pitrou.net> References: <20121013081445.40d6d78f@pitrou.net> Message-ID: On Fri, Oct 12, 2012 at 11:14 PM, Antoine Pitrou wrote: > On Fri, 12 Oct 2012 15:11:54 -0700 > Guido van Rossum wrote: >> >> > 2. Method dispatch callbacks: >> > >> > Similar to the above, the reactor or somebody has a handle on your >> > object, and calls methods that you've defined when events happen >> > e.g. IProtocol's dataReceived method >> >> While I'm sure it's expedient and captures certain common patterns >> well, I like this the least of all -- calling fixed methods on an >> object sounds like a step back; it smells of the old Java way (before >> it had some equivalent of anonymous functions), and of asyncore, which >> (nearly) everybody agrees is kind of bad due to its insistence that >> you subclass its classes. (Notice how subclassing as the prevalent >> approach to structuring your code has gotten into a lot of discredit >> since 1996.) > > But how would you write a dataReceived equivalent then? Would you have > a "task" looping on a read() call, e.g. > > @task > def my_protocol_main_loop(conn): > while : > try: > data = yield conn.read(1024) > except ConnectionError: > conn.close() > break > > I'm not sure I understand the problem with subclassing. It works fine > in Twisted. Even in Python 3 we don't shy away from subclassing, for > example the IO stack is based on subclassing RawIOBase, BufferedIOBase, > etc. 
Subclassing per se isn't a problem, but requiring a single dataReceived method per class can be awkward. Many protocols are effectively state machines, and modeling each state as a function can be cleaner than a big if/switch block in dataReceived. For example, here's a simplistic HTTP client using tornado's IOStream: from tornado import ioloop from tornado import iostream import socket def send_request(): stream.write("GET / HTTP/1.0\r\nHost: friendfeed.com\r\n\r\n") stream.read_until("\r\n\r\n", on_headers) def on_headers(data): headers = {} for line in data.split("\r\n"): parts = line.split(":") if len(parts) == 2: headers[parts[0].strip()] = parts[1].strip() stream.read_bytes(int(headers["Content-Length"]), on_body) def on_body(data): print data stream.close() ioloop.IOLoop.instance().stop() s = socket.socket(socket.AF_INET, socket.SOCK_STREAM, 0) stream = iostream.IOStream(s) stream.connect(("friendfeed.com", 80), send_request) ioloop.IOLoop.instance().start() Classes allow and encourage broader interfaces, which are sometimes a good thing, but interact poorly with coroutines. Both twisted and tornado use separate callbacks for incoming data and for the connection being closed, but for coroutines it's probably better to just treat a closed connection as an error on the read. Futures (and yield from) give us a nice way to do that. -Ben From _ at lvh.cc Sat Oct 13 19:18:20 2012 From: _ at lvh.cc (Laurens Van Houtven) Date: Sat, 13 Oct 2012 19:18:20 +0200 Subject: [Python-ideas] The async API of the future: Twisted and Deferreds In-Reply-To: References: <20121013081445.40d6d78f@pitrou.net> Message-ID: What calls on_headers in this example? Coming from twisted, that seems like dataReceived's responsibility, but given your introductory paragraph that's not actually what goes on here? On Sat, Oct 13, 2012 at 7:07 PM, Ben Darnell wrote: > On Fri, Oct 12, 2012 at 11:14 PM, Antoine Pitrou > wrote: > > On Fri, 12 Oct 2012 15:11:54 -0700 > > Guido van Rossum wrote: > >> > >> > 2. Method dispatch callbacks: > >> > > >> > Similar to the above, the reactor or somebody has a handle on your > >> > object, and calls methods that you've defined when events happen > >> > e.g. IProtocol's dataReceived method > >> > >> While I'm sure it's expedient and captures certain common patterns > >> well, I like this the least of all -- calling fixed methods on an > >> object sounds like a step back; it smells of the old Java way (before > >> it had some equivalent of anonymous functions), and of asyncore, which > >> (nearly) everybody agrees is kind of bad due to its insistence that > >> you subclass its classes. (Notice how subclassing as the prevalent > >> approach to structuring your code has gotten into a lot of discredit > >> since 1996.) > > > > But how would you write a dataReceived equivalent then? Would you have > > a "task" looping on a read() call, e.g. > > > > @task > > def my_protocol_main_loop(conn): > > while : > > try: > > data = yield conn.read(1024) > > except ConnectionError: > > conn.close() > > break > > > > I'm not sure I understand the problem with subclassing. It works fine > > in Twisted. Even in Python 3 we don't shy away from subclassing, for > > example the IO stack is based on subclassing RawIOBase, BufferedIOBase, > > etc. > > Subclassing per se isn't a problem, but requiring a single > dataReceived method per class can be awkward. Many protocols are > effectively state machines, and modeling each state as a function can > be cleaner than a big if/switch block in dataReceived. 
For example, > here's a simplistic HTTP client using tornado's IOStream: > > from tornado import ioloop > from tornado import iostream > import socket > > def send_request(): > stream.write("GET / HTTP/1.0\r\nHost: friendfeed.com\r\n\r\n") > stream.read_until("\r\n\r\n", on_headers) > > def on_headers(data): > headers = {} > for line in data.split("\r\n"): > parts = line.split(":") > if len(parts) == 2: > headers[parts[0].strip()] = parts[1].strip() > stream.read_bytes(int(headers["Content-Length"]), on_body) > > def on_body(data): > print data > stream.close() > ioloop.IOLoop.instance().stop() > > s = socket.socket(socket.AF_INET, socket.SOCK_STREAM, 0) > stream = iostream.IOStream(s) > stream.connect(("friendfeed.com", 80), send_request) > ioloop.IOLoop.instance().start() > > > Classes allow and encourage broader interfaces, which are sometimes a > good thing, but interact poorly with coroutines. Both twisted and > tornado use separate callbacks for incoming data and for the > connection being closed, but for coroutines it's probably better to > just treat a closed connection as an error on the read. Futures (and > yield from) give us a nice way to do that. > > -Ben > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -- cheers lvh -------------- next part -------------- An HTML attachment was scrubbed... URL: From tismer at stackless.com Sat Oct 13 19:22:13 2012 From: tismer at stackless.com (Christian Tismer) Date: Sat, 13 Oct 2012 19:22:13 +0200 Subject: [Python-ideas] re-implementing Twisted for fun and profit In-Reply-To: References: Message-ID: <5079A345.1070004@stackless.com> On 13.10.12 18:17, Guido van Rossum wrote: > .... >> Also, I would like to remind all participants that microthreading, request/response abstraction (i.e. Deferreds, Futures), generator coroutines and a common API for network I/O are all very different tasks and do not need to be accomplished all at once. If you try to build something that does all of this stuff, you get most of Twisted core plus half of Stackless all at once, which is a bit much for the stdlib to bite off in one chunk. > Well understood. (And I don't even want to get microthreading into the > mix, although others may disagree -- I see Christian Tismer has jumped > in...) But I also think that if we design these things in isolation > it's likely that we'll find later that the pieces don't fit, and I > don't want that to happen either. So I think we should consider these > separate, but loosely coordinated efforts. > I don't disagree but understand this, too. As long as we are talking Python 3.x, the topic is good compromises, usability and coordination. Pushing for microthreads would not be constructive for these threads (email-threads, of course ;-) . ciao - chris -- Christian Tismer :^) Software Consulting : Have a break! Take a ride on Python's Karl-Liebknecht-Str. 121 : *Starship* http://starship.python.net/ 14482 Potsdam : PGP key -> http://pgp.uni-mainz.de phone +49 173 24 18 776 fax +49 (30) 700143-0023 PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04 whom do you want to sponsor today? 
http://www.stackless.com/ From ben at bendarnell.com Sat Oct 13 19:27:55 2012 From: ben at bendarnell.com (Ben Darnell) Date: Sat, 13 Oct 2012 10:27:55 -0700 Subject: [Python-ideas] The async API of the future: Twisted and Deferreds In-Reply-To: References: <20121013081445.40d6d78f@pitrou.net> Message-ID: On Sat, Oct 13, 2012 at 10:18 AM, Laurens Van Houtven <_ at lvh.cc> wrote: > What calls on_headers in this example? Coming from twisted, that seems like > dataReceived's responsibility, but given your introductory paragraph that's > not actually what goes on here? The IOStream does, after send_request calls stream.read_until("\r\n\r\n", on_headers). Inside IOStream, there is a _handle_read method that is registered with the IOLoop and fills up a buffer. When the read condition is satisfied the IOStream calls back into application code. -Ben > > > On Sat, Oct 13, 2012 at 7:07 PM, Ben Darnell wrote: >> >> On Fri, Oct 12, 2012 at 11:14 PM, Antoine Pitrou >> wrote: >> > On Fri, 12 Oct 2012 15:11:54 -0700 >> > Guido van Rossum wrote: >> >> >> >> > 2. Method dispatch callbacks: >> >> > >> >> > Similar to the above, the reactor or somebody has a handle on >> >> > your >> >> > object, and calls methods that you've defined when events happen >> >> > e.g. IProtocol's dataReceived method >> >> >> >> While I'm sure it's expedient and captures certain common patterns >> >> well, I like this the least of all -- calling fixed methods on an >> >> object sounds like a step back; it smells of the old Java way (before >> >> it had some equivalent of anonymous functions), and of asyncore, which >> >> (nearly) everybody agrees is kind of bad due to its insistence that >> >> you subclass its classes. (Notice how subclassing as the prevalent >> >> approach to structuring your code has gotten into a lot of discredit >> >> since 1996.) >> > >> > But how would you write a dataReceived equivalent then? Would you have >> > a "task" looping on a read() call, e.g. >> > >> > @task >> > def my_protocol_main_loop(conn): >> > while : >> > try: >> > data = yield conn.read(1024) >> > except ConnectionError: >> > conn.close() >> > break >> > >> > I'm not sure I understand the problem with subclassing. It works fine >> > in Twisted. Even in Python 3 we don't shy away from subclassing, for >> > example the IO stack is based on subclassing RawIOBase, BufferedIOBase, >> > etc. >> >> Subclassing per se isn't a problem, but requiring a single >> dataReceived method per class can be awkward. Many protocols are >> effectively state machines, and modeling each state as a function can >> be cleaner than a big if/switch block in dataReceived. 
For example, >> here's a simplistic HTTP client using tornado's IOStream: >> >> from tornado import ioloop >> from tornado import iostream >> import socket >> >> def send_request(): >> stream.write("GET / HTTP/1.0\r\nHost: friendfeed.com\r\n\r\n") >> stream.read_until("\r\n\r\n", on_headers) >> >> def on_headers(data): >> headers = {} >> for line in data.split("\r\n"): >> parts = line.split(":") >> if len(parts) == 2: >> headers[parts[0].strip()] = parts[1].strip() >> stream.read_bytes(int(headers["Content-Length"]), on_body) >> >> def on_body(data): >> print data >> stream.close() >> ioloop.IOLoop.instance().stop() >> >> s = socket.socket(socket.AF_INET, socket.SOCK_STREAM, 0) >> stream = iostream.IOStream(s) >> stream.connect(("friendfeed.com", 80), send_request) >> ioloop.IOLoop.instance().start() >> >> >> Classes allow and encourage broader interfaces, which are sometimes a >> good thing, but interact poorly with coroutines. Both twisted and >> tornado use separate callbacks for incoming data and for the >> connection being closed, but for coroutines it's probably better to >> just treat a closed connection as an error on the read. Futures (and >> yield from) give us a nice way to do that. >> >> -Ben >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> http://mail.python.org/mailman/listinfo/python-ideas > > > > > -- > cheers > lvh > From _ at lvh.cc Sat Oct 13 19:49:59 2012 From: _ at lvh.cc (Laurens Van Houtven) Date: Sat, 13 Oct 2012 19:49:59 +0200 Subject: [Python-ideas] The async API of the future: Twisted and Deferreds In-Reply-To: References: <20121013081445.40d6d78f@pitrou.net> Message-ID: Interesting. That's certainly a nice API, but that then again (read_until) sounds like something I'd implement using dataReceived... You know, read_until clears the buffer, logs the requested callback. data_received adds something to the buffer, and checks if it triggered the (one of the?) registered callbacks. Of course, I may just be rusted in my ways and trying to implement everything in terms of things I know (then again, that might be just what's needed when you're trying to make a useful general API). I guess it's time for me to go deep-diving into Tornado :) On Sat, Oct 13, 2012 at 7:27 PM, Ben Darnell wrote: > On Sat, Oct 13, 2012 at 10:18 AM, Laurens Van Houtven <_ at lvh.cc> wrote: > > What calls on_headers in this example? Coming from twisted, that seems > like > > dataReceived's responsibility, but given your introductory paragraph > that's > > not actually what goes on here? > > The IOStream does, after send_request calls > stream.read_until("\r\n\r\n", on_headers). Inside IOStream, there is > a _handle_read method that is registered with the IOLoop and fills up > a buffer. When the read condition is satisfied the IOStream calls > back into application code. > > -Ben > > > > > > > On Sat, Oct 13, 2012 at 7:07 PM, Ben Darnell wrote: > >> > >> On Fri, Oct 12, 2012 at 11:14 PM, Antoine Pitrou > >> wrote: > >> > On Fri, 12 Oct 2012 15:11:54 -0700 > >> > Guido van Rossum wrote: > >> >> > >> >> > 2. Method dispatch callbacks: > >> >> > > >> >> > Similar to the above, the reactor or somebody has a handle on > >> >> > your > >> >> > object, and calls methods that you've defined when events happen > >> >> > e.g. 
IProtocol's dataReceived method > >> >> > >> >> While I'm sure it's expedient and captures certain common patterns > >> >> well, I like this the least of all -- calling fixed methods on an > >> >> object sounds like a step back; it smells of the old Java way (before > >> >> it had some equivalent of anonymous functions), and of asyncore, > which > >> >> (nearly) everybody agrees is kind of bad due to its insistence that > >> >> you subclass its classes. (Notice how subclassing as the prevalent > >> >> approach to structuring your code has gotten into a lot of discredit > >> >> since 1996.) > >> > > >> > But how would you write a dataReceived equivalent then? Would you have > >> > a "task" looping on a read() call, e.g. > >> > > >> > @task > >> > def my_protocol_main_loop(conn): > >> > while : > >> > try: > >> > data = yield conn.read(1024) > >> > except ConnectionError: > >> > conn.close() > >> > break > >> > > >> > I'm not sure I understand the problem with subclassing. It works fine > >> > in Twisted. Even in Python 3 we don't shy away from subclassing, for > >> > example the IO stack is based on subclassing RawIOBase, > BufferedIOBase, > >> > etc. > >> > >> Subclassing per se isn't a problem, but requiring a single > >> dataReceived method per class can be awkward. Many protocols are > >> effectively state machines, and modeling each state as a function can > >> be cleaner than a big if/switch block in dataReceived. For example, > >> here's a simplistic HTTP client using tornado's IOStream: > >> > >> from tornado import ioloop > >> from tornado import iostream > >> import socket > >> > >> def send_request(): > >> stream.write("GET / HTTP/1.0\r\nHost: friendfeed.com > \r\n\r\n") > >> stream.read_until("\r\n\r\n", on_headers) > >> > >> def on_headers(data): > >> headers = {} > >> for line in data.split("\r\n"): > >> parts = line.split(":") > >> if len(parts) == 2: > >> headers[parts[0].strip()] = parts[1].strip() > >> stream.read_bytes(int(headers["Content-Length"]), on_body) > >> > >> def on_body(data): > >> print data > >> stream.close() > >> ioloop.IOLoop.instance().stop() > >> > >> s = socket.socket(socket.AF_INET, socket.SOCK_STREAM, 0) > >> stream = iostream.IOStream(s) > >> stream.connect(("friendfeed.com", 80), send_request) > >> ioloop.IOLoop.instance().start() > >> > >> > >> Classes allow and encourage broader interfaces, which are sometimes a > >> good thing, but interact poorly with coroutines. Both twisted and > >> tornado use separate callbacks for incoming data and for the > >> connection being closed, but for coroutines it's probably better to > >> just treat a closed connection as an error on the read. Futures (and > >> yield from) give us a nice way to do that. > >> > >> -Ben > >> _______________________________________________ > >> Python-ideas mailing list > >> Python-ideas at python.org > >> http://mail.python.org/mailman/listinfo/python-ideas > > > > > > > > > > -- > > cheers > > lvh > > > -- cheers lvh -------------- next part -------------- An HTML attachment was scrubbed... URL: From _ at lvh.cc Sat Oct 13 19:54:34 2012 From: _ at lvh.cc (Laurens Van Houtven) Date: Sat, 13 Oct 2012 19:54:34 +0200 Subject: [Python-ideas] The async API of the future: PEP 3153 (async-pep) In-Reply-To: References: Message-ID: On Sat, Oct 13, 2012 at 1:22 AM, Guido van Rossum wrote: > [Hopefully this is the last spin-off thread from "asyncore: included > batteries don't fit"] > > So it's totally unfinished? 
> At the time, the people I talked to placed significantly more weight in "explain why this is necessary" than "get me something I can play with". > > Do you feel that there should be less talk about rationale? > > No, but I feel that there should be some actual specification. I am > also looking forward to an actual meaty bit of example code -- ISTR > you mentioned you had something, but that it was incomplete, and I > can't find the link. > Just examples of how it would work, nothing hooked up to real code. My memory of it is more of a drowning-in-politics-and-bikeshedding kind of thing, unfortunately :) Either way, I'm okay with letting bygones be bygones and focus on how we can get this show on the road. > It's not that there's *no* reference to IO: it's just that that reference > is > > abstracted away in data_received and the protocol's transport object, > just > > like Twisted's IProtocol. > > The words "data_received" don't even occur in the PEP. > See above. What thread should I reply in about the pull APIs? > I just want to make sure that we don't *completely* paint ourselves into > the wrong corner when it comes to that. > I don't think we have to worry about it too much. Any reasonable API I can think of makes this completely doable. But I'm really hoping you'll make good on your promise of redoing > async-pep, giving some actual specifications and example code, so I > can play with it. > Takeaways: - The async API of the future is very important, and too important to be left to chance. - It requires a lot of very experienced manpower. - It requires a lot of effort to handle the hashing out of it (as we're doing here) as well as it deserves to be. I'll take as proactive a role as I can afford to take in this process, but I don't think I can do it by myself. Furthermore, it's a risk nobody wants to take: a repeat performance wouldn't be good for anyone, in particular not for Python nor myself. I've asked JP Calderone and Itamar Turner-Trauring if they would be interested in carrying this forward professionally, and they have tentatively said yes. JP's already familiar with a large part of the problem space with the implementation of the ssl module. JP and Itamar have worked together for years and have recently set up a consulting firm. Given that this is emphatically important to Python, I intend to apply for a PSF grant on their behalf to further this goal. Given their experience in the field, I expect this to be a fairly low risk endeavor. > -- > --Guido van Rossum (python.org/~guido) > -- cheers lvh -------------- next part -------------- An HTML attachment was scrubbed... URL: From dreamingforward at gmail.com Sat Oct 13 20:20:02 2012 From: dreamingforward at gmail.com (Mark Adam) Date: Sat, 13 Oct 2012 13:20:02 -0500 Subject: [Python-ideas] Floating point contexts in Python core In-Reply-To: References: <69C432CC-7BAC-4B55-8BC3-2BF5307FD29C@gmail.com> <50765D0E.4020001@canterbury.ac.nz> <5076AF00.1010902@pearwood.info> <50776C76.3040309@pearwood.info> Message-ID: On Fri, Oct 12, 2012 at 5:54 PM, Mark Adam wrote: > On Thu, Oct 11, 2012 at 8:03 PM, Steven D'Aprano wrote: >>>> I would gladly give up a small amount of speed for better control >>>> over floats, such as whether 1/0.0 raised an exception or >>>> returned infinity. >>> >>> Umm, you would be giving up a *lot* of speed. Native floating point >>> happens right in the processor, so if you want special behavior, you'd >>> have to take the floating point out of hardware and into "user space". 
>> >> Even in user-space, you're not giving up that much speed in practical >> terms, at least not for my needs. The new decimal module in Python 3.3 is >> less than a factor of 10 times slower than Python's floats, which makes it >> pretty much instantaneous to my mind :) > > Hmm, well, if it's only that much slower, then we should implement > Rationals and get rid of the issue altogether. Now that I think of it, this issue has a strange whiff of the argument wherefrom came the "from __future__" directive and the split that happened between the vpython folks who needed the direct support of float division (rendering 3-d graphics for an interpreted environment) and the regular python crowd. Anyone else remember that? mark From sturla at molden.no Sat Oct 13 20:29:57 2012 From: sturla at molden.no (Sturla Molden) Date: Sat, 13 Oct 2012 20:29:57 +0200 Subject: [Python-ideas] Is there a good reason to use * for multiplication? In-Reply-To: References: <11ad8f01-0383-4b64-b0f0-78bb0e2f9308@googlegroups.com> <5078D4CE.1040201@pearwood.info> Message-ID: <7950B82F-3A8A-4D48-AB02-9FD3BCBAE114@molden.no> Den 13. okt. 2012 kl. 06:44 skrev Devin Jeanpierre : > > Python has cleverly left the $ symbol unused. > > We can use it as a quasiquote to embed executable TeX. > > for x in xrange($b \cdot \sum_{i=1}^n \frac{x^n}{n!}$): > ... > > No need to wait for that new language, we can have a rich set of math > operators today! > LOL :D But hey, this is valid Python :D :D for x in texrange(r"$b \cdot \sum_{i=1}^n \frac{x^n}{n!}$"): pass Sturla From ben at bendarnell.com Sat Oct 13 20:54:27 2012 From: ben at bendarnell.com (Ben Darnell) Date: Sat, 13 Oct 2012 11:54:27 -0700 Subject: [Python-ideas] The async API of the future: Twisted and Deferreds In-Reply-To: References: <20121013081445.40d6d78f@pitrou.net> Message-ID: On Sat, Oct 13, 2012 at 10:49 AM, Laurens Van Houtven <_ at lvh.cc> wrote: > Interesting. That's certainly a nice API, but that then again (read_until) > sounds like something I'd implement using dataReceived... You know, > read_until clears the buffer, logs the requested callback. data_received > adds something to the buffer, and checks if it triggered the (one of the?) > registered callbacks. Right, that's how IOStream is implemented internally. The transport/protocol split works a little differently in Tornado: IOStream is implemented something like a Protocol subclass, but we consider it a part of the transport layer. The "protocols" are arbitrary classes that don't share any particular interface, but instead just call methods on the IOStream. -Ben > > Of course, I may just be rusted in my ways and trying to implement > everything in terms of things I know (then again, that might be just what's > needed when you're trying to make a useful general API). > > I guess it's time for me to go deep-diving into Tornado :) > > > On Sat, Oct 13, 2012 at 7:27 PM, Ben Darnell wrote: >> >> On Sat, Oct 13, 2012 at 10:18 AM, Laurens Van Houtven <_ at lvh.cc> wrote: >> > What calls on_headers in this example? Coming from twisted, that seems >> > like >> > dataReceived's responsibility, but given your introductory paragraph >> > that's >> > not actually what goes on here? >> >> The IOStream does, after send_request calls >> stream.read_until("\r\n\r\n", on_headers). Inside IOStream, there is >> a _handle_read method that is registered with the IOLoop and fills up >> a buffer. When the read condition is satisfied the IOStream calls >> back into application code. 
>> >> -Ben >> >> > >> > >> > On Sat, Oct 13, 2012 at 7:07 PM, Ben Darnell wrote: >> >> >> >> On Fri, Oct 12, 2012 at 11:14 PM, Antoine Pitrou >> >> wrote: >> >> > On Fri, 12 Oct 2012 15:11:54 -0700 >> >> > Guido van Rossum wrote: >> >> >> >> >> >> > 2. Method dispatch callbacks: >> >> >> > >> >> >> > Similar to the above, the reactor or somebody has a handle on >> >> >> > your >> >> >> > object, and calls methods that you've defined when events happen >> >> >> > e.g. IProtocol's dataReceived method >> >> >> >> >> >> While I'm sure it's expedient and captures certain common patterns >> >> >> well, I like this the least of all -- calling fixed methods on an >> >> >> object sounds like a step back; it smells of the old Java way >> >> >> (before >> >> >> it had some equivalent of anonymous functions), and of asyncore, >> >> >> which >> >> >> (nearly) everybody agrees is kind of bad due to its insistence that >> >> >> you subclass its classes. (Notice how subclassing as the prevalent >> >> >> approach to structuring your code has gotten into a lot of discredit >> >> >> since 1996.) >> >> > >> >> > But how would you write a dataReceived equivalent then? Would you >> >> > have >> >> > a "task" looping on a read() call, e.g. >> >> > >> >> > @task >> >> > def my_protocol_main_loop(conn): >> >> > while : >> >> > try: >> >> > data = yield conn.read(1024) >> >> > except ConnectionError: >> >> > conn.close() >> >> > break >> >> > >> >> > I'm not sure I understand the problem with subclassing. It works fine >> >> > in Twisted. Even in Python 3 we don't shy away from subclassing, for >> >> > example the IO stack is based on subclassing RawIOBase, >> >> > BufferedIOBase, >> >> > etc. >> >> >> >> Subclassing per se isn't a problem, but requiring a single >> >> dataReceived method per class can be awkward. Many protocols are >> >> effectively state machines, and modeling each state as a function can >> >> be cleaner than a big if/switch block in dataReceived. For example, >> >> here's a simplistic HTTP client using tornado's IOStream: >> >> >> >> from tornado import ioloop >> >> from tornado import iostream >> >> import socket >> >> >> >> def send_request(): >> >> stream.write("GET / HTTP/1.0\r\nHost: >> >> friendfeed.com\r\n\r\n") >> >> stream.read_until("\r\n\r\n", on_headers) >> >> >> >> def on_headers(data): >> >> headers = {} >> >> for line in data.split("\r\n"): >> >> parts = line.split(":") >> >> if len(parts) == 2: >> >> headers[parts[0].strip()] = parts[1].strip() >> >> stream.read_bytes(int(headers["Content-Length"]), on_body) >> >> >> >> def on_body(data): >> >> print data >> >> stream.close() >> >> ioloop.IOLoop.instance().stop() >> >> >> >> s = socket.socket(socket.AF_INET, socket.SOCK_STREAM, 0) >> >> stream = iostream.IOStream(s) >> >> stream.connect(("friendfeed.com", 80), send_request) >> >> ioloop.IOLoop.instance().start() >> >> >> >> >> >> Classes allow and encourage broader interfaces, which are sometimes a >> >> good thing, but interact poorly with coroutines. Both twisted and >> >> tornado use separate callbacks for incoming data and for the >> >> connection being closed, but for coroutines it's probably better to >> >> just treat a closed connection as an error on the read. Futures (and >> >> yield from) give us a nice way to do that. 
>> >> >> >> -Ben >> >> _______________________________________________ >> >> Python-ideas mailing list >> >> Python-ideas at python.org >> >> http://mail.python.org/mailman/listinfo/python-ideas >> > >> > >> > >> > >> > -- >> > cheers >> > lvh >> > > > > > > -- > cheers > lvh > From _ at lvh.cc Sat Oct 13 21:13:09 2012 From: _ at lvh.cc (Laurens Van Houtven) Date: Sat, 13 Oct 2012 21:13:09 +0200 Subject: [Python-ideas] The async API of the future: Twisted and Deferreds In-Reply-To: References: <20121013081445.40d6d78f@pitrou.net> Message-ID: I quite like IOStream's interface, actually. If that's part of the transport layer, how do you prevent from having duplicating its behavior (read_until etc)? If there's just another separate object that would be the ITransport in twisted, I think the difference is purely one of labeling. On Sat, Oct 13, 2012 at 8:54 PM, Ben Darnell wrote: > On Sat, Oct 13, 2012 at 10:49 AM, Laurens Van Houtven <_ at lvh.cc> wrote: > > Interesting. That's certainly a nice API, but that then again > (read_until) > > sounds like something I'd implement using dataReceived... You know, > > read_until clears the buffer, logs the requested callback. data_received > > adds something to the buffer, and checks if it triggered the (one of > the?) > > registered callbacks. > > Right, that's how IOStream is implemented internally. The > transport/protocol split works a little differently in Tornado: > IOStream is implemented something like a Protocol subclass, but we > consider it a part of the transport layer. The "protocols" are > arbitrary classes that don't share any particular interface, but > instead just call methods on the IOStream. > > -Ben > > > > > Of course, I may just be rusted in my ways and trying to implement > > everything in terms of things I know (then again, that might be just > what's > > needed when you're trying to make a useful general API). > > > > I guess it's time for me to go deep-diving into Tornado :) > > > > > > On Sat, Oct 13, 2012 at 7:27 PM, Ben Darnell wrote: > >> > >> On Sat, Oct 13, 2012 at 10:18 AM, Laurens Van Houtven <_ at lvh.cc> wrote: > >> > What calls on_headers in this example? Coming from twisted, that seems > >> > like > >> > dataReceived's responsibility, but given your introductory paragraph > >> > that's > >> > not actually what goes on here? > >> > >> The IOStream does, after send_request calls > >> stream.read_until("\r\n\r\n", on_headers). Inside IOStream, there is > >> a _handle_read method that is registered with the IOLoop and fills up > >> a buffer. When the read condition is satisfied the IOStream calls > >> back into application code. > >> > >> -Ben > >> > >> > > >> > > >> > On Sat, Oct 13, 2012 at 7:07 PM, Ben Darnell > wrote: > >> >> > >> >> On Fri, Oct 12, 2012 at 11:14 PM, Antoine Pitrou < > solipsis at pitrou.net> > >> >> wrote: > >> >> > On Fri, 12 Oct 2012 15:11:54 -0700 > >> >> > Guido van Rossum wrote: > >> >> >> > >> >> >> > 2. Method dispatch callbacks: > >> >> >> > > >> >> >> > Similar to the above, the reactor or somebody has a handle > on > >> >> >> > your > >> >> >> > object, and calls methods that you've defined when events happen > >> >> >> > e.g. 
IProtocol's dataReceived method > >> >> >> > >> >> >> While I'm sure it's expedient and captures certain common patterns > >> >> >> well, I like this the least of all -- calling fixed methods on an > >> >> >> object sounds like a step back; it smells of the old Java way > >> >> >> (before > >> >> >> it had some equivalent of anonymous functions), and of asyncore, > >> >> >> which > >> >> >> (nearly) everybody agrees is kind of bad due to its insistence > that > >> >> >> you subclass its classes. (Notice how subclassing as the prevalent > >> >> >> approach to structuring your code has gotten into a lot of > discredit > >> >> >> since 1996.) > >> >> > > >> >> > But how would you write a dataReceived equivalent then? Would you > >> >> > have > >> >> > a "task" looping on a read() call, e.g. > >> >> > > >> >> > @task > >> >> > def my_protocol_main_loop(conn): > >> >> > while : > >> >> > try: > >> >> > data = yield conn.read(1024) > >> >> > except ConnectionError: > >> >> > conn.close() > >> >> > break > >> >> > > >> >> > I'm not sure I understand the problem with subclassing. It works > fine > >> >> > in Twisted. Even in Python 3 we don't shy away from subclassing, > for > >> >> > example the IO stack is based on subclassing RawIOBase, > >> >> > BufferedIOBase, > >> >> > etc. > >> >> > >> >> Subclassing per se isn't a problem, but requiring a single > >> >> dataReceived method per class can be awkward. Many protocols are > >> >> effectively state machines, and modeling each state as a function can > >> >> be cleaner than a big if/switch block in dataReceived. For example, > >> >> here's a simplistic HTTP client using tornado's IOStream: > >> >> > >> >> from tornado import ioloop > >> >> from tornado import iostream > >> >> import socket > >> >> > >> >> def send_request(): > >> >> stream.write("GET / HTTP/1.0\r\nHost: > >> >> friendfeed.com\r\n\r\n") > >> >> stream.read_until("\r\n\r\n", on_headers) > >> >> > >> >> def on_headers(data): > >> >> headers = {} > >> >> for line in data.split("\r\n"): > >> >> parts = line.split(":") > >> >> if len(parts) == 2: > >> >> headers[parts[0].strip()] = parts[1].strip() > >> >> stream.read_bytes(int(headers["Content-Length"]), > on_body) > >> >> > >> >> def on_body(data): > >> >> print data > >> >> stream.close() > >> >> ioloop.IOLoop.instance().stop() > >> >> > >> >> s = socket.socket(socket.AF_INET, socket.SOCK_STREAM, 0) > >> >> stream = iostream.IOStream(s) > >> >> stream.connect(("friendfeed.com", 80), send_request) > >> >> ioloop.IOLoop.instance().start() > >> >> > >> >> > >> >> Classes allow and encourage broader interfaces, which are sometimes a > >> >> good thing, but interact poorly with coroutines. Both twisted and > >> >> tornado use separate callbacks for incoming data and for the > >> >> connection being closed, but for coroutines it's probably better to > >> >> just treat a closed connection as an error on the read. Futures (and > >> >> yield from) give us a nice way to do that. > >> >> > >> >> -Ben > >> >> _______________________________________________ > >> >> Python-ideas mailing list > >> >> Python-ideas at python.org > >> >> http://mail.python.org/mailman/listinfo/python-ideas > >> > > >> > > >> > > >> > > >> > -- > >> > cheers > >> > lvh > >> > > > > > > > > > > > -- > > cheers > > lvh > > > -- cheers lvh -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From joshua.landau.ws at gmail.com Sat Oct 13 21:14:09 2012 From: joshua.landau.ws at gmail.com (Joshua Landau) Date: Sat, 13 Oct 2012 20:14:09 +0100 Subject: [Python-ideas] Is there a good reason to use * for multiplication? In-Reply-To: <7950B82F-3A8A-4D48-AB02-9FD3BCBAE114@molden.no> References: <11ad8f01-0383-4b64-b0f0-78bb0e2f9308@googlegroups.com> <5078D4CE.1040201@pearwood.info> <7950B82F-3A8A-4D48-AB02-9FD3BCBAE114@molden.no> Message-ID: On 13 October 2012 19:29, Sturla Molden wrote: > Den 13. okt. 2012 kl. 06:44 skrev Devin Jeanpierre >: > > > > > Python has cleverly left the $ symbol unused. > > > > We can use it as a quasiquote to embed executable TeX. > > > > for x in xrange($b \cdot \sum_{i=1}^n \frac{x^n}{n!}$): > > ... > > > > No need to wait for that new language, we can have a rich set of math > > operators today! > > > > > LOL :D > > But hey, this is valid Python :D :D > > for x in texrange(r"$b \cdot \sum_{i=1}^n \frac{x^n}{n!}$"): pass I am glad someone else shares the same progressive attitude. I, personally, wrap my whole code like so: import texcode texcode.texecute(""" > \eq{y}{\range{1}{10}} > \for{x}{y}{ > \print{x} > } > """) # Alas, the joy has to end Which has tremendously improved the quality of my output. Recently, rendering my code, too, has sped up to a remarkable 3 pages-per-minute! -------------- next part -------------- An HTML attachment was scrubbed... URL: From ben at bendarnell.com Sat Oct 13 21:25:38 2012 From: ben at bendarnell.com (Ben Darnell) Date: Sat, 13 Oct 2012 12:25:38 -0700 Subject: [Python-ideas] The async API of the future: Twisted and Deferreds In-Reply-To: References: <20121013081445.40d6d78f@pitrou.net> Message-ID: On Sat, Oct 13, 2012 at 12:13 PM, Laurens Van Houtven <_ at lvh.cc> wrote: > I quite like IOStream's interface, actually. If that's part of the transport > layer, how do you prevent from having duplicating its behavior (read_until > etc)? If there's just another separate object that would be the ITransport > in twisted, I think the difference is purely one of labeling. So far we haven't actually needed much flexibility in the transport layer - most of the functionality is in the BaseIOStream class, and then there are subclasses IOStream (regular sockets), SSLIOStream, and PipeIOStream that actually call recv(), read(), connect(), etc. We might need a little refactoring if we introduce dramatically different types of transports, but the plan is that we'd represent transports as classes in the IOStream hierarchy. -Ben > > > On Sat, Oct 13, 2012 at 8:54 PM, Ben Darnell wrote: >> >> On Sat, Oct 13, 2012 at 10:49 AM, Laurens Van Houtven <_ at lvh.cc> wrote: >> > Interesting. That's certainly a nice API, but that then again >> > (read_until) >> > sounds like something I'd implement using dataReceived... You know, >> > read_until clears the buffer, logs the requested callback. data_received >> > adds something to the buffer, and checks if it triggered the (one of >> > the?) >> > registered callbacks. >> >> Right, that's how IOStream is implemented internally. The >> transport/protocol split works a little differently in Tornado: >> IOStream is implemented something like a Protocol subclass, but we >> consider it a part of the transport layer. The "protocols" are >> arbitrary classes that don't share any particular interface, but >> instead just call methods on the IOStream. 
>> >> -Ben >> >> > >> > Of course, I may just be rusted in my ways and trying to implement >> > everything in terms of things I know (then again, that might be just >> > what's >> > needed when you're trying to make a useful general API). >> > >> > I guess it's time for me to go deep-diving into Tornado :) >> > >> > >> > On Sat, Oct 13, 2012 at 7:27 PM, Ben Darnell wrote: >> >> >> >> On Sat, Oct 13, 2012 at 10:18 AM, Laurens Van Houtven <_ at lvh.cc> wrote: >> >> > What calls on_headers in this example? Coming from twisted, that >> >> > seems >> >> > like >> >> > dataReceived's responsibility, but given your introductory paragraph >> >> > that's >> >> > not actually what goes on here? >> >> >> >> The IOStream does, after send_request calls >> >> stream.read_until("\r\n\r\n", on_headers). Inside IOStream, there is >> >> a _handle_read method that is registered with the IOLoop and fills up >> >> a buffer. When the read condition is satisfied the IOStream calls >> >> back into application code. >> >> >> >> -Ben >> >> >> >> > >> >> > >> >> > On Sat, Oct 13, 2012 at 7:07 PM, Ben Darnell >> >> > wrote: >> >> >> >> >> >> On Fri, Oct 12, 2012 at 11:14 PM, Antoine Pitrou >> >> >> >> >> >> wrote: >> >> >> > On Fri, 12 Oct 2012 15:11:54 -0700 >> >> >> > Guido van Rossum wrote: >> >> >> >> >> >> >> >> > 2. Method dispatch callbacks: >> >> >> >> > >> >> >> >> > Similar to the above, the reactor or somebody has a handle >> >> >> >> > on >> >> >> >> > your >> >> >> >> > object, and calls methods that you've defined when events >> >> >> >> > happen >> >> >> >> > e.g. IProtocol's dataReceived method >> >> >> >> >> >> >> >> While I'm sure it's expedient and captures certain common >> >> >> >> patterns >> >> >> >> well, I like this the least of all -- calling fixed methods on an >> >> >> >> object sounds like a step back; it smells of the old Java way >> >> >> >> (before >> >> >> >> it had some equivalent of anonymous functions), and of asyncore, >> >> >> >> which >> >> >> >> (nearly) everybody agrees is kind of bad due to its insistence >> >> >> >> that >> >> >> >> you subclass its classes. (Notice how subclassing as the >> >> >> >> prevalent >> >> >> >> approach to structuring your code has gotten into a lot of >> >> >> >> discredit >> >> >> >> since 1996.) >> >> >> > >> >> >> > But how would you write a dataReceived equivalent then? Would you >> >> >> > have >> >> >> > a "task" looping on a read() call, e.g. >> >> >> > >> >> >> > @task >> >> >> > def my_protocol_main_loop(conn): >> >> >> > while : >> >> >> > try: >> >> >> > data = yield conn.read(1024) >> >> >> > except ConnectionError: >> >> >> > conn.close() >> >> >> > break >> >> >> > >> >> >> > I'm not sure I understand the problem with subclassing. It works >> >> >> > fine >> >> >> > in Twisted. Even in Python 3 we don't shy away from subclassing, >> >> >> > for >> >> >> > example the IO stack is based on subclassing RawIOBase, >> >> >> > BufferedIOBase, >> >> >> > etc. >> >> >> >> >> >> Subclassing per se isn't a problem, but requiring a single >> >> >> dataReceived method per class can be awkward. Many protocols are >> >> >> effectively state machines, and modeling each state as a function >> >> >> can >> >> >> be cleaner than a big if/switch block in dataReceived. 
For example, >> >> >> here's a simplistic HTTP client using tornado's IOStream: >> >> >> >> >> >> from tornado import ioloop >> >> >> from tornado import iostream >> >> >> import socket >> >> >> >> >> >> def send_request(): >> >> >> stream.write("GET / HTTP/1.0\r\nHost: >> >> >> friendfeed.com\r\n\r\n") >> >> >> stream.read_until("\r\n\r\n", on_headers) >> >> >> >> >> >> def on_headers(data): >> >> >> headers = {} >> >> >> for line in data.split("\r\n"): >> >> >> parts = line.split(":") >> >> >> if len(parts) == 2: >> >> >> headers[parts[0].strip()] = parts[1].strip() >> >> >> stream.read_bytes(int(headers["Content-Length"]), >> >> >> on_body) >> >> >> >> >> >> def on_body(data): >> >> >> print data >> >> >> stream.close() >> >> >> ioloop.IOLoop.instance().stop() >> >> >> >> >> >> s = socket.socket(socket.AF_INET, socket.SOCK_STREAM, 0) >> >> >> stream = iostream.IOStream(s) >> >> >> stream.connect(("friendfeed.com", 80), send_request) >> >> >> ioloop.IOLoop.instance().start() >> >> >> >> >> >> >> >> >> Classes allow and encourage broader interfaces, which are sometimes >> >> >> a >> >> >> good thing, but interact poorly with coroutines. Both twisted and >> >> >> tornado use separate callbacks for incoming data and for the >> >> >> connection being closed, but for coroutines it's probably better to >> >> >> just treat a closed connection as an error on the read. Futures >> >> >> (and >> >> >> yield from) give us a nice way to do that. >> >> >> >> >> >> -Ben >> >> >> _______________________________________________ >> >> >> Python-ideas mailing list >> >> >> Python-ideas at python.org >> >> >> http://mail.python.org/mailman/listinfo/python-ideas >> >> > >> >> > >> >> > >> >> > >> >> > -- >> >> > cheers >> >> > lvh >> >> > >> > >> > >> > >> > >> > -- >> > cheers >> > lvh >> > > > > > > -- > cheers > lvh > From sturla at molden.no Sat Oct 13 22:13:28 2012 From: sturla at molden.no (Sturla Molden) Date: Sat, 13 Oct 2012 22:13:28 +0200 Subject: [Python-ideas] Is there a good reason to use * for multiplication? In-Reply-To: References: <11ad8f01-0383-4b64-b0f0-78bb0e2f9308@googlegroups.com> <5078D4CE.1040201@pearwood.info> <7950B82F-3A8A-4D48-AB02-9FD3BCBAE114@molden.no> Message-ID: <46C531B6-1624-4631-BB49-3CDFF8D4088C@molden.no> Den 13. okt. 2012 kl. 21:14 skrev Joshua Landau : > > I am glad someone else shares the same progressive attitude. I, personally, wrap my whole code like so: > >> import texcode >> >> texcode.texecute(""" >> \eq{y}{\range{1}{10}} >> \for{x}{y}{ >> \print{x} >> } >> """) # Alas, the joy has to end > > Which has tremendously improved the quality of my output. > Recently, rendering my code, too, has sped up to a remarkable 3 pages-per-minute! Gee, I thought texecution was the Texas death penalty :-) Sturla -------------- next part -------------- An HTML attachment was scrubbed... URL: From grosser.meister.morti at gmx.net Sat Oct 13 22:20:12 2012 From: grosser.meister.morti at gmx.net (=?ISO-8859-1?Q?Mathias_Panzenb=F6ck?=) Date: Sat, 13 Oct 2012 22:20:12 +0200 Subject: [Python-ideas] Is there a good reason to use * for multiplication? In-Reply-To: <11ad8f01-0383-4b64-b0f0-78bb0e2f9308@googlegroups.com> References: <11ad8f01-0383-4b64-b0f0-78bb0e2f9308@googlegroups.com> Message-ID: <5079CCFC.4000502@gmx.net> On 10/12/2012 10:27 PM, Ram Rachum wrote: > Hi everybody, > > Today a funny thought occurred to me. Ever since I've learned to program when I was a child, I've > taken for granted that when programming, the sign used for multiplication is *. 
But now that I think > about it, why? Now that we have Unicode, why not use ? ? > > Do you think that we can make Python support ? in addition to *? > > I can think of a couple of problems, but none of them seem like deal-breakers: > > - Backward compatibility: Python already uses *, but I don't see a backward compatibility problem > with supporting ? additionally. Let people use whichever they want, like spaces and tabs. > - Input methods: I personally use an IDE that could be easily set to automatically convert * to ? > where appropriate and to allow manual input of ?. People on Linux can type Alt-. . I use Linux (KDE4). When I press Alt-. in kwrite I simply get . in gvim I get ? and here in Thunderbird I get nothing. So I don't think this is very practical. > Anyone else can > set up a script that'll let them type ? using whichever keyboard combination they want. I admit this > is pretty annoying, but since you can always use * if you want to, I figure that anyone who cares > enough about using ? instead of * (I bet that people in scientific computing would like that) would > be willing to take the time to set it up. > > > What do you think? > > > Ram. > From grosser.meister.morti at gmx.net Sat Oct 13 22:25:09 2012 From: grosser.meister.morti at gmx.net (=?ISO-8859-1?Q?Mathias_Panzenb=F6ck?=) Date: Sat, 13 Oct 2012 22:25:09 +0200 Subject: [Python-ideas] Is there a good reason to use * for multiplication? In-Reply-To: <5078FE1B.4090701@canterbury.ac.nz> References: <11ad8f01-0383-4b64-b0f0-78bb0e2f9308@googlegroups.com> <5078FE1B.4090701@canterbury.ac.nz> Message-ID: <5079CE25.6020500@gmx.net> On 10/13/2012 07:37 AM, Greg Ewing wrote: > Ram Rachum wrote: >> I could say that for newbies it's one small confusion that could removed from the language. You >> and I have been programming for a long time so we take it for granted that * means multiplication, >> but for any other person that's just another weird idiosyncrasy that further alienates programming. > > Do you have any evidence that a substantial number of > beginners are confused by * for multiplication, or that > they have trouble remembering what it means once they've > been told? > > If you do, is there further evidence that they would > find a dot to be any clearer? > > The use of a raised dot to indicate multiplication of > numbers is actually quite rare even in mathematics, and I > would not expect anyone without a mathematical background > to even be aware of it. > > In primary school we're taught that 'x' means multiplication. > Later when we come to algebra, we're taught not to use > any symbol at all, just write things next to each other. > A dot is only used in rare cases where there would > otherwise be ambiguity -- and even then it's often > preferred to parenthesise things instead. > > And don't forget there's great potential for confusion > with the decimal point. > I'm -1 on the whole idea. Also why use ? and not ?? I think unicode in source code is a bad idea. From grosser.meister.morti at gmx.net Sat Oct 13 22:33:56 2012 From: grosser.meister.morti at gmx.net (=?UTF-8?B?TWF0aGlhcyBQYW56ZW5iw7Zjaw==?=) Date: Sat, 13 Oct 2012 22:33:56 +0200 Subject: [Python-ideas] Is there a good reason to use * for multiplication? In-Reply-To: References: <11ad8f01-0383-4b64-b0f0-78bb0e2f9308@googlegroups.com> <5078D4CE.1040201@pearwood.info> Message-ID: <5079D034.7000006@gmx.net> On 10/13/2012 06:20 AM, Bruce Leban wrote: > Well, I learned x as a multiplication symbol long before I learned either ? 
or *, and in many fonts > you can barely see the middle dot. Is there a good reason, we can't just write foo x bar instead of > foo * bar? If that's confusing we could use ? instead. No one would ever confuse ? and x. > > Or for that matter how about (~R?R?.?R)/R?1??R > On related news: The source code of the APL complier (interpreter?) was released. http://www.osnews.com/story/26464/The_APL_programming_language_source_code I'm still baffled that this programming language was ever in production use. > Seriously: learning that * means multiplication is a very small thing. You also need to learn what > /, // and % do, and the difference between 'and' and &, and between =, ==, != and /=. > > --- Bruce > > > > On Fri, Oct 12, 2012 at 7:41 PM, Steven D'Aprano > > wrote: > > On 13/10/12 07:27, Ram Rachum wrote: > > Hi everybody, > > Today a funny thought occurred to me. Ever since I've learned to program > when I was a child, I've taken for granted that when programming, the sign > used for multiplication is *. But now that I think about it, why? Now that > we have Unicode, why not use ? ? > > t > 25 or so years ago, I used to do some programming in Apple's Hypertalk > language, which accepted ? in place of / for division. The use of two > symbols for the same operation didn't cause any problem for users. But then > Apple had the advantage that there was a single, system-wide, highly > discoverable way of typing non-ASCII characters at the keyboard, and Apple > users tended to pride themselves for using them. > > I'm not entirely sure about MIDDLE DOT though: especially in small font sizes, > it falls foul of the design principle: > > "syntax should not look like a speck of dust on Tim's monitor" > > (paraphrasing... can anyone locate the original quote?) > > and may be too easily confused with FULL STOP. Another problem is that MIDDLE > DOT is currently valid in identifiers, so that a?b would count as a single > name. Fixing this would require some fairly heavy lifting (a period of > deprecation and warnings for any identifier using MIDDLE DOT) before > introducing it as an operator. So that's a lot of effort for very little gain. > > If I were designing a language from scratch today, with full Unicode support > from the beginning, I would support a rich set of operators possibly even > including MIDDLE DOT and ? MULTIPLICATION SIGN, and leave it up to the user > to use them wisely or not at all. But I don't think it would be appropriate > for Python to add them, at least not before Python 4: too much effort for too > little gain. Maybe in another ten years people will be less resistant to > Unicode operators. > > > > [...] > > ?. People on Linux can type Alt-. . > > > For what it is worth, I'm using Linux and that does not work for me. I am > yet to find a decent method of entering non-ASCII characters. > > > > -- > Steven > > _________________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/__mailman/listinfo/python-ideas > > > > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > From joshua.landau.ws at gmail.com Sat Oct 13 22:45:32 2012 From: joshua.landau.ws at gmail.com (Joshua Landau) Date: Sat, 13 Oct 2012 21:45:32 +0100 Subject: [Python-ideas] Is there a good reason to use * for multiplication? 
In-Reply-To: <5079CCFC.4000502@gmx.net> References: <11ad8f01-0383-4b64-b0f0-78bb0e2f9308@googlegroups.com> <5079CCFC.4000502@gmx.net> Message-ID: On 13 October 2012 21:20, Mathias Panzenb?ck wrote: > On 10/12/2012 10:27 PM, Ram Rachum wrote: > >> Hi everybody, >> >> Today a funny thought occurred to me. Ever since I've learned to program >> when I was a child, I've >> taken for granted that when programming, the sign used for multiplication >> is *. But now that I think >> about it, why? Now that we have Unicode, why not use ? ? >> >> Do you think that we can make Python support ? in addition to *? >> >> I can think of a couple of problems, but none of them seem like >> deal-breakers: >> >> - Backward compatibility: Python already uses *, but I don't see a >> backward compatibility problem >> with supporting ? additionally. Let people use whichever they want, like >> spaces and tabs. >> - Input methods: I personally use an IDE that could be easily set to >> automatically convert * to ? >> where appropriate and to allow manual input of ?. People on Linux can >> type Alt-. . >> > > I use Linux (KDE4). When I press Alt-. in kwrite I simply get . in gvim I > get ? and here in Thunderbird I get nothing. So I don't think this is very > practical. Are y'all using your Alt Grill ? M?n? ?e?s m??????? -------------- next part -------------- An HTML attachment was scrubbed... URL: From grosser.meister.morti at gmx.net Sat Oct 13 23:50:17 2012 From: grosser.meister.morti at gmx.net (=?UTF-8?B?TWF0aGlhcyBQYW56ZW5iw7Zjaw==?=) Date: Sat, 13 Oct 2012 23:50:17 +0200 Subject: [Python-ideas] Is there a good reason to use * for multiplication? In-Reply-To: References: <11ad8f01-0383-4b64-b0f0-78bb0e2f9308@googlegroups.com> <5079CCFC.4000502@gmx.net> Message-ID: <5079E219.4070504@gmx.net> On 10/13/2012 10:45 PM, Joshua Landau wrote: > On 13 October 2012 21:20, Mathias Panzenb?ck > wrote: > > On 10/12/2012 10:27 PM, Ram Rachum wrote: > > Hi everybody, > > Today a funny thought occurred to me. Ever since I've learned to program when I was a child, > I've > taken for granted that when programming, the sign used for multiplication is *. But now that > I think > about it, why? Now that we have Unicode, why not use ? ? > > Do you think that we can make Python support ? in addition to *? > > I can think of a couple of problems, but none of them seem like deal-breakers: > > - Backward compatibility: Python already uses *, but I don't see a backward compatibility > problem > with supporting ? additionally. Let people use whichever they want, like spaces and tabs. > - Input methods: I personally use an IDE that could be easily set to automatically > convert * to ? > where appropriate and to allow manual input of ?. People on Linux can type Alt-. . > > > I use Linux (KDE4). When I press Alt-. in kwrite I simply get . in gvim I get ? and here in > Thunderbird I get nothing. So I don't think this is very practical. > > > Are y'all using your Alt Grill ? M?n? ?e?s m??????? With Alt Gr I always get ? Ah, Alt Gr-, produces ? (German keyboard here, of course.) 
From daniel.mcdougall at liftoffsoftware.com Sun Oct 14 00:27:22 2012 From: daniel.mcdougall at liftoffsoftware.com (Daniel McDougall) Date: Sat, 13 Oct 2012 18:27:22 -0400 Subject: [Python-ideas] The async API of the future: Some thoughts from an ignorant Tornado user Message-ID: (This is a response to GVR's Google+ post asking for ideas; I apologize in advance if I come off as an ignorant programming newbie) I am the author of Gate One (https://github.com/liftoff/GateOne/) which makes extensive use of Tornado's asynchronous capabilities. It also uses multiprocessing and threading to a lesser extent. The biggest issue I've had trying to write asynchronous code for Gate One is complexity. Complexity creates problems with expressiveness which results in code that, to me, feels un-Pythonic. For evidence of this I present the following example: The retrieve_log_playback() function: http://bit.ly/W532m6 (link goes to Github) All the function does is generate and return (to the client browser) an HTML playback of their terminal session recording. To do it efficiently without blocking the event loop or slowing down all other connected clients required loads of complexity (or maybe I'm just ignorant of "a better way"--feel free to enlighten me). In an ideal world I could have just done something like this: import async # The API of the future ;) async.async_call(retrieve_log_playback, settings, tws, mechanism=multiprocessing) # tws == instance of tornado.web.WebSocketHandler that holds the open connection ...but instead I had to create an entirely separate function to act as the multiprocessing.Process(), create a multiprocessing.Queue() to shuffle data back and forth, watch a special file descriptor for updates (so I can tell when the task is complete), and also create a closure because the connection instance (aka 'tws') isn't pickleable. After reading through these threads I feel much of the discussion is over my head but as someone who will ultimately become a *user* of the "async API of the future" I would like to share my thoughts... My opinion is that the goal of any async module that winds up in Python's standard library should be simplicity and portability. In terms of features, here's my 'async wishlist': * I should not have to worry about what is and isn't pickleable when I decide that a task should be performed asynchronously. * I should be able to choose the type of event loop/async mechanism that is appropriate for the task: For CPU-bound tasks I'll probably want to use multiprocessing. For IO-bound tasks I might want to use threading. For a multitude of tasks that "just need to be async" (by nature) I'll want to use an event loop. * Any async module should support 'basics' like calling functions at an interval and calling functions after a timeout occurs (with the ability to cancel). * Asynchronous tasks should be able to access the same namespace as everything else. Maybe wishful thinking. * It should support publish/subscribe-style events (i.e. an event dispatcher). For example, the ability to watch a file descriptor or socket for changes in state and call a function when that happens. Preferably with the flexibility to define custom events (i.e don't have it tied to kqueue/epoll-specific events). Thanks for your consideration; and thanks for the awesome language. -- Dan McDougall - Chief Executive Officer and Developer Liftoff Software ? Your flight to the cloud is now boarding. 
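(To make the first two wishlist items a bit more concrete, here is a minimal
sketch of the kind of helper I have in mind. It assumes Tornado plus
concurrent.futures (stdlib on 3.2+, the "futures" backport on 2.x);
async_call and its callback keyword are invented names, not an existing API,
and the target function and its positional arguments still have to be
picklable -- only the callback gets to close over unpicklable things like
the open WebSocket handler, because it runs back in the original process.)

    from concurrent.futures import ProcessPoolExecutor
    from tornado.ioloop import IOLoop

    _executor = ProcessPoolExecutor(max_workers=2)

    def async_call(func, *args, **kwargs):
        # Run func(*args, **kwargs) in a worker process and return a Future.
        callback = kwargs.pop('callback', None)
        future = _executor.submit(func, *args, **kwargs)
        if callback is not None:
            def _on_done(f):
                # add_callback is the thread-safe way back onto the IOLoop.
                # (No error handling: f.result() re-raises worker exceptions.)
                IOLoop.instance().add_callback(lambda: callback(f.result()))
            future.add_done_callback(_on_done)
        return future

A hypothetical call would then look like
async_call(render_playback_html, recording, callback=self.write_message)
instead of hand-rolling a Process, a Queue, and a file-descriptor watcher.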
904-446-8323 From greg.ewing at canterbury.ac.nz Sun Oct 14 01:33:50 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sun, 14 Oct 2012 12:33:50 +1300 Subject: [Python-ideas] PEP 428: poll about the joining syntax In-Reply-To: <5a0c9852-bb3f-40a0-8b86-060c1138f372@googlegroups.com> References: <20121008204707.48559bf9@pitrou.net> <50784504.2080801@stoneleaf.us> <50790DE0.7010207@canterbury.ac.nz> <5a0c9852-bb3f-40a0-8b86-060c1138f372@googlegroups.com> Message-ID: <5079FA5E.5010907@canterbury.ac.nz> Michele Lacchia wrote: > I wrote: > > '.j/homeo/homes/homeh/homeu/homeacj/homeo/homes/homeh/homeu/homeaoj/homeo/homes/homeh/homeu/homeanj/homeo/homes/homeh/homeu/homeafj/homeo/homes/homeh/homeu/homeaij/homeo/homes/homeh/homeu/homeag' > > Homeo, Homeo, wherefore path thou Homeo? > > -- > Greg > > I just had to +1 on this one!! Congrats! I also propose the term "julietted" to describe a string that has had this misfortune happen to it. -- Greg From jeanpierreda at gmail.com Sun Oct 14 01:42:09 2012 From: jeanpierreda at gmail.com (Devin Jeanpierre) Date: Sat, 13 Oct 2012 19:42:09 -0400 Subject: [Python-ideas] The async API of the future: Twisted and Deferreds In-Reply-To: References: Message-ID: There has to be some way to contract emails sent in discussions rather than exploding them. I swear I'm trying to be concise, yet readable. It's not working. On Fri, Oct 12, 2012 at 6:11 PM, Guido van Rossum wrote: > I also don't doubt that using classic Futures you can't do this -- the > chaining really matter for this style, and I presume this (modulo > unimportant API differences) is what typical Twisted code looks like. My experience has been unfortunately rather devoid of deferreds in Twisted. I always feel like the odd one out when people discuss this confusion. For me, it was all Protocol this and Protocol that, and deferreds only came up when I used Twisted's great AMP (Asynchronous Messaging Protocol) library. > However, Python has yield, and you can do much better (I'll write > plain yield for now, but it works the same with yield-from): > > try: > value1 = yield step1() > value2 = yield step2(value1) > value3 = yield step3(value2) > # Do something with value4 > except Exception: > # Handle any error from step1 through step4 > --snip-- > > This form is more flexible, since it is easier to catch different > exceptions at different points. It is also much easier to pass extra > information around. E.g. what if your flow ends up having to pass both > value1 and value2 into step3()? Sure, you can do that by making value2 > a tuple (or a dict, or an object) incorporating value1 and the > original value2, but that's exactly where this style becomes > cumbersome, whereas in the yield-based form, such things can remain > simple local variables. All in all I find it more readable. Well, first of all, deferreds have ways of joining values together. For example: from __future__ import print_function from twisted.internet import defer def example_joined(): d1 = defer.Deferred() d2 = defer.Deferred() # consumeErrors looks scary, but it only means that # d1 and d2's errbacks aren't called. Instead, the error is sent to d's # errback. 
d = defer.gatherResults([d1, d2], consumeErrors=True) d.addCallback(print) d.addErrback(lambda v: print("ERROR!")) d1.callback("The first deferred has succeeded") # now we're waiting on the second deferred to succeed, # which we'll let the caller handle return d2 example_joined().callback("The second deferred has succeeded too!") print("==============") example_joined().errback("The second deferred has failed...") I agree it's easier to use the generator style in many complicated cases. That doesn't preclude manual deferreds from also being useful. > So, in the end, for Python 3.4 and beyond, I want to promote a style > that mixes simple callbacks (perhaps augmented with simple Futures) > and generator-based coroutines (either PEP 342, yield/send-based, or > PEP 380 yield-from-based). I'm looking to Twisted for the best > reactors (see other thread). But for transport/protocol > implementations I think that generator/coroutines offers a cleaner, > better interface than incorporating Deferred. Egh. I mean, sure, supposed we have those things. But what if you want to send the result of a callback to a generator-coroutine? Presumably generator coroutines work by yielding deferreds and being called back when the future resolves (deferred fires). But if those futures/deferreds aren't unexposed, and instead only the generator stuff is exposed, then bridging the gap between callbacks and generator-coroutines is impossible. So every callback function has to also be defined to use something else. And worse, other APIs using callbacks are left in the dust. Suppose, OTOH, futures/deferreds are exposed. Then we can easily bridge between callbacks and generators, by returning a future whose `set_result` is the callback to our callback function (deferred whose `callback` is the callback). But if we're exposing futures/deferreds, why have callbacks in the first place? The difference between these two functions, is that the second can be used in generator-coroutines trivially and the first cannot: # callbacks: reactor.timer(10, print, "hello world") # deferreds reactor.timer(10).addCallback(print, "hello world") Now here's another thing: suppose we have a list of "deferred events", but instead of handling all 10 at once, we want to handle them "as they arrive", and then synthesize a result at the bottom. How do you do this with pure generator coroutines? For example, perhaps I am implementing a game server, where all the players choose their characters and then the game begins. Whenever a character is chosen, everyone else has to know about it so that they can plan their strategy based on who has chosen a character. Character selections are final, just so that I can use deferreds (hee hee). I am imagining something like the following: # WRONG: handles players in a certain order, rather than as they come in def player_lobby(reactor, players): for player in players: player_character = yield player.wait_for_confirm(reactor) player.set_character(player_character) # tell all the other players what character the player has chosen notify_choice((player, player_character), players) start_game(players) This is wrong, because it goes in a certain order and "blocks" the coroutine until every character is chosen. Players will not know who has chosen what characters in an appropriate order. But hypothetically, maybe we could do the following: # Hypothetical magical code? 
def player_lobby(reactor, players): confirmation_events = UnorderedEventList([player.wait_for_confirm(reactor) for player in players]) while confirmation_events: player_character = yield confirmation_events.get_next() player.set_character(player_character) # tell all the other players what character the player has chosen notify_choice((player, player_character), players) start_game(players) But then, how do we write UnorderedEventList? I don't really know. I suspect I've made the problem harder, not easier! eek. Plus, it doesn't even read very well. Especially not compared to the deferred version: This is how I would personally do it in Twisted, without using UnorderedEventList (no magic!): @inlineCallbacks def player_lobby(reactor, players): events = [] for player in players: confirm_event = player.wait_for_confirm(reactor) @confirm_event.addCallback def on_confirmation(player_character, player=player) player.set_character(player_character) # tell all the other players what character the player has chosen notify_choice((player, player_character), players) yield gatherResults(events) start_game(players) Notice how I dropped down into the level of manipulating deferreds so that I could add this "as they come in" functionality, and then went back. Actually it wouldn't've hurt much to just not bother with inlineCallbacks at all. I don't think this is particularly unreadable. More importantly, I actually know how to do it. I have no idea how I would do this without using addCallback, or without reimplementing addCallback using inlineCallbacks. And then, supposing we don't have these deferreds/futures exposed... how do we implement delayed computation stuff from extension modules? What if we want to do these kinds of compositions within said extension modules? What if we want to write our own version of @tasks or @inlineCallbacks with extra features, or generate callback chains from XML files, and so on? I don't really like the prospect of having just the "sugary syntax" available, without a flexible underlying representation also exposed. I don't know if you've ever shared that worry -- sometimes the pretty syntax gets in the way of getting stuff done. > I hope that the path forward for Twisted will be simple enough: it > should be possible to hook Deferred into the simpler callback APIs > (perhaps a new implementation using some form of adaptation, but > keeping the interface the same). In a sense, the greenlet/gevent crowd > will be the biggest losers, since they currently write async code > without either callbacks or yield, using microthreads instead. I > wouldn't want to have to start putting yield back everywhere into that > code. But the stdlib will still support yield-free blocking calls > (even if under the hood some of these use yield/send-based or > yield-from-based couroutines) so the monkey-patchey tradition can > continue. Surely it's no harder to make yourself into a generator than to make yourself into a low-level thread-like context switching function with a saved callstack implemented by hand in assembler, and so on? I'm sure they'll be fine. >> 1. Explicit callbacks: >> >> For example, reactor.callLater(t, lambda: print("woo hoo")) > > I actually like this, as it's a lowest-common-denominator approach > which everyone can easily adapt to their purposes. See the thread I > started about reactors. Will do (but also see my response above about why not "everyone" can). >> 2. 
Method dispatch callbacks: >> >> Similar to the above, the reactor or somebody has a handle on your >> object, and calls methods that you've defined when events happen >> e.g. IProtocol's dataReceived method > > While I'm sure it's expedient and captures certain common patterns > well, I like this the least of all -- calling fixed methods on an > object sounds like a step back; it smells of the old Java way (before > it had some equivalent of anonymous functions), and of asyncore, which > (nearly) everybody agrees is kind of bad due to its insistence that > you subclass its classes. (Notice how subclassing as the prevalent > approach to structuring your code has gotten into a lot of discredit > since 1996.) I only used asyncore once, indirectly, so I don't know anything about it. I'm willing to dismiss it (and, in fact, various parts of twisted (I'm looking at you twisted.words)) as not good examples of the pattern. First of all, I'd like to separate the notion of subclassing and method dispatch. They're entirely unrelated. If I pass my object to you, and you call different methods depending on what happens elsewhere, that's method dispatch. And my object doesn't have to be subclassed or anything for it to happen. Now here's the thing. Suppose we're writing, for example, an IRC bot. (Everyone loves IRC bots.) My IRC bot needs to handle several different possible events, such as: private messages channel join event CTCP event and so on. My event handlers for each of these events probably manipulate some internal state (such as a log file, or a GUI). We'd probably organize this as a class, or else as a bunch of functions accessing global state. Or, perhaps a collection of closures. This last one is pretty unlikely. For the most part, these functions are all intrinsically related and can't be sensibly treated separately. You can't take the private message callback of Bot A, and the channel join callback of bot B, and register these and expect a result that makes sense. If we look at this, we're expecting to deal with a set of functions that manage shared data. The abstraction for this is usually an object, and we'd really probably write the callbacks in a class unless we were being contrarian. And it's not too crazy for the dispatcher to know this and expect you to write it as a class that supports a certain interface (certain methods correspond to certain events). Missing methods can be assumed to have the empty implementation (no subclassing, just catching AttributeError). This isn't too much of an imposition on the user -- any collection of functions (with shared state via globals or closure variables) can be converted to an object with callable attributes very simply (thanks to types.SimpleNamespace, especially). And I only really think this is OK when writing it as an object -- as a collection of functions with shared state -- is the eminently obvious primary use case, so that that situation wouldn't come up very often. So, as an example, a protocol that passes data on further down the line needs to be notified when data is received, but also when the connection begins and ends. So the twisted protocol interface has "dataReceived", "connectionMade", and "connectionLost" callbacks. These really do belong together, they manage a single connection between computers and how it gets mapped to events usable by a twisted application. So I like the convenience and suggestiveness of them all being methods on an object. >> 4. 
Generator coroutines >> >> These are a syntactic wrapper around deferreds. If you yield a >> deferred, you will be sent the result if the deferred succeeds, or an >> exception if the deferred fails. >> e.g. examples from previous message > > Seeing them as syntactic sugar for Deferreds is one way of looking at > it; no doubt this is how they're seen in the Twisted community because > Deferreds are older and more entrenched. But there's no requirement > that an architecture has to have Deferreds in order to use generator > coroutines -- simple Futures will do just fine, and Greg Ewing has > shown that using yield-from you can even do without those. (But he > does use simple, explicit callbacks at the lowest level of his > system.) I meant it as a factual explanation of what generator coroutines are in Twisted, not what they are in general. Sorry for the confusion. We are probably agreed here. After a cursory examination, I don't really understand Greg Ewing's thing. I'd have to dig deeper into the logs for when he first introduced it. > I'd like to come back to that Django example though. You are implying > that there are some opportunities for concurrency here, and I agree, > assuming we believe disk I/O is slow enough to bother making it > asynchronously. (In App Engine it's not, and we can't anyways, but in > other contexts I agree that it would be bad if a slow disk seek were > to hold up all processing -- not to mention that it might really be > NFS...) > --snip-- > How would you code that using Twisted Deferreds? Well. I'd replace the @task in your NDB thing with @inlineCallbacks and call it a day. ;) (I think there's enough deferred examples above, and I'm getting tired and it's been a day since I started writing this damned email.) >> For that stuff, you'd have to speak to the main authors of Twisted. >> I'm just a twisted user. :( > > They seem to be mostly ignoring this conversation, so your standing in > as a proxy for them is much appreciated! Well. We are on Python-Ideas... :( >> In the end it really doesn't matter what API you go with. The Twisted >> people will wrap it up so that they are compatible, as far as that is >> possible. > > And I want to ensure that that is possible and preferably easy, if I > can do it without introducing too many warts in the API that > non-Twisted users see and use. I probably lack the expertise to help too much with this. I can point out anything that sticks out, if/when an extended futures proposal is made. -- Devin From greg.ewing at canterbury.ac.nz Sun Oct 14 01:42:49 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sun, 14 Oct 2012 12:42:49 +1300 Subject: [Python-ideas] Is there a good reason to use * for multiplication? In-Reply-To: <20121013102229.259572ad@bhuda.mired.org> References: <11ad8f01-0383-4b64-b0f0-78bb0e2f9308@googlegroups.com> <5078D4CE.1040201@pearwood.info> <507923C4.8040201@pearwood.info> <20121013102229.259572ad@bhuda.mired.org> Message-ID: <5079FC79.3040506@canterbury.ac.nz> Mike Meyer wrote: > def __$__(self, other, right): > > must match a new grammer symbol "operator_symbol", with limits on > it to for readability reasons: say at most three characters, all > coming from an appropriate unicode class or classes If it's restricted it to single Unicode character, we could use its Unicode name as the method name: def __CIRCLE_PLUS__(x, y): ... 
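A sketch of how that name lookup could work, purely for illustration
(nothing here is proposed machinery -- real dispatch would have to happen
in the compiler/interpreter, not in a helper function):

    import unicodedata

    def dispatch_operator(symbol, left, right):
        # Map a single operator character to a dunder name derived from its
        # Unicode name, e.g. u'\u2295' -> '__CIRCLED_PLUS__'.
        name = '__%s__' % unicodedata.name(symbol).replace(' ', '_').replace('-', '_')
        method = getattr(type(left), name, None)
        if method is None:
            raise TypeError("unsupported operand type for %s: %r" % (symbol, type(left)))
        return method(left, right)

(Note that the registered name of U+2295 is actually CIRCLED PLUS, so the
method would come out as __CIRCLED_PLUS__.)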
-- Greg From greg.ewing at canterbury.ac.nz Sun Oct 14 01:48:34 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sun, 14 Oct 2012 12:48:34 +1300 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: References: <20121005202534.5f721292@pitrou.net> <20121006190821.02ae50cd@pitrou.net> <387F855A-B5BD-491D-9FA5-918F4042332F@gmail.com> <50707AD5.3030407@stoneleaf.us> <50707F91.6060301@stoneleaf.us> <50786E42.6050308@stoneleaf.us> <20121012214224.55f3ed27@pitrou.net> <50787E8A.2090804@stoneleaf.us> <20121012225306.295d93e6@pitrou.net> <507884F0.2060608@stoneleaf.us> <20121013102204.7b55dc53@pitrou.net> <1350122794.3365.8.camel@localhost.localdomain> Message-ID: <5079FDD2.9060308@canterbury.ac.nz> Nick Coghlan wrote: > It's a useful trick for writing genuinely cross-platform code: when > I'm writing cross-platform code on *nix, I want my paths to behave > like posix paths in every respect *except* I want them to complain > somehow if any of my names only differ by case. I don't see how this problem can be solved purely by adjusting path object behaviour. What you want is to get a complaint whenever you try to create a file in a directory that already contains another name that is case-insensitively equal. That would have to be built into the file system access functions. -- Greg From python at mrabarnett.plus.com Sun Oct 14 02:04:59 2012 From: python at mrabarnett.plus.com (MRAB) Date: Sun, 14 Oct 2012 01:04:59 +0100 Subject: [Python-ideas] Is there a good reason to use * for multiplication? In-Reply-To: <5079FC79.3040506@canterbury.ac.nz> References: <11ad8f01-0383-4b64-b0f0-78bb0e2f9308@googlegroups.com> <5078D4CE.1040201@pearwood.info> <507923C4.8040201@pearwood.info> <20121013102229.259572ad@bhuda.mired.org> <5079FC79.3040506@canterbury.ac.nz> Message-ID: <507A01AB.2060708@mrabarnett.plus.com> On 2012-10-14 00:42, Greg Ewing wrote: > Mike Meyer wrote: > >> def __$__(self, other, right): >> >> must match a new grammer symbol "operator_symbol", with limits on >> it to for readability reasons: say at most three characters, all >> coming from an appropriate unicode class or classes > > If it's restricted it to single Unicode character, we could > use its Unicode name as the method name: > > def __CIRCLE_PLUS__(x, y): > ... > If it's more than one codepoint, we could prefix with the length of the codepoint's name: def __12CIRCLED_PLUS__(x, y): ... From greg.ewing at canterbury.ac.nz Sun Oct 14 02:17:05 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sun, 14 Oct 2012 13:17:05 +1300 Subject: [Python-ideas] The async API of the future: Twisted and Deferreds In-Reply-To: References: Message-ID: <507A0481.7050904@canterbury.ac.nz> Itamar Turner-Trauring wrote: > For example, consider the following code; silly, but buggy due to the > context switch in yield allowing race conditions if any other code > modifies counter.value while getResult() is waiting for a result. > > def addToCounter(): > counter.value = counter.value + (yield getResult()) But at least you can *see* from the presence of the 'yield' that suspension can occur. PEP 380 preserves this, because anything that can yield has to be called using 'yield from', so the potential suspension points remain visible. > That being said, perhaps some changes to Python syntax could solve this; > Allen Short > (http://washort.twistedmatrix.com/2012/10/coroutines-reduce-readability.html) > claims to have a proposal, hopefully he'll post it soon. 
He argues there that greenlet-style coroutines are bad because suspension can occur anywhere without warning. He likes generators better, because the 'yield' warns you that suspension might occur. Generators using 'yield from' have the same property. If his proposal involves marking the suspension points somehow, then syntactically it will probably be very similar to yield-from. -- Greg From itamar at futurefoundries.com Sun Oct 14 02:59:56 2012 From: itamar at futurefoundries.com (Itamar Turner-Trauring) Date: Sat, 13 Oct 2012 20:59:56 -0400 Subject: [Python-ideas] The async API of the future: Twisted and Deferreds In-Reply-To: <507A0481.7050904@canterbury.ac.nz> References: <507A0481.7050904@canterbury.ac.nz> Message-ID: On Sat, Oct 13, 2012 at 8:17 PM, Greg Ewing wrote: > But at least you can *see* from the presence of the 'yield' > that suspension can occur. > ... He argues there that greenlet-style coroutines are bad because > suspension can occur anywhere without warning. He likes > generators better, because the 'yield' warns you that suspension > might occur. Generators using 'yield from' have the same property. > > If his proposal involves marking the suspension points somehow, then > syntactically it will probably be very similar to yield-from. > Explicit suspension is certainly better than hidden suspension, yes. But by extension, no suspension at all is best. -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Sun Oct 14 03:35:00 2012 From: guido at python.org (Guido van Rossum) Date: Sat, 13 Oct 2012 18:35:00 -0700 Subject: [Python-ideas] The async API of the future: Twisted and Deferreds In-Reply-To: References: <507A0481.7050904@canterbury.ac.nz> Message-ID: On Sat, Oct 13, 2012 at 5:59 PM, Itamar Turner-Trauring wrote: > Explicit suspension is certainly better than hidden suspension, yes. But by > extension, no suspension at all is best. When using Deferreds, there are suspension points too. They just happen whenever a Deferred is blocked. Each next callback has to assume that the world may have changed. You may like that better. But for me, 9 out of 10 times, yield-based coroutines (whether using Futures or PEP 380's yield from) make the code more readable than the Deferred style. I do appreciate that often the Deferred style is an improvement over plain callbacks -- but I believe that explicit-yielding coroutines are so much better than Deferred that I'd rather base the future standard API on a combination of plain callbacks and either Futures+yield or yield-from (the latter without Futures). I trust that Twisted invented the best possible interface given the available machinery at the time (no yield-based coroutines at all, and not using Stackless). But now that we have explicit-yielding coroutines, I believe we should adopt a style based on those. Twisted can of course implement Deferred easily in this world using some form of adaptation, and we should ensure that this is indeed reasonable before accepting a standard. Whether it's better to use yield-from or yield remains to be seen; that debate is still ongoing in the thread "yield-from". 
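For concreteness, here is about the smallest possible sketch of what
"Futures + yield" means mechanically (invented names, no error handling,
and the Future is completed inline instead of by an event loop, which is
where a real scheduler would differ):

    from concurrent.futures import Future

    def run(gen):
        # Drive a coroutine that yields Futures: resume it with each result.
        def step(value):
            try:
                future = gen.send(value)
            except StopIteration:
                return
            future.add_done_callback(lambda f: step(f.result()))
        step(None)

    def fetch(url):
        f = Future()
        f.set_result('<data from %s>' % url)   # stand-in for real async I/O
        return f

    def task():
        page = yield fetch('http://example.com/')   # suspends here, visibly
        print(page)

    run(task())

With yield-from the same idea applies, except that nested calls are
delegated with "yield from" instead of each layer needing its own driver;
either way the suspension points stay visible in the code.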
-- --Guido van Rossum (python.org/~guido) From oscar.j.benjamin at gmail.com Sun Oct 14 04:16:57 2012 From: oscar.j.benjamin at gmail.com (Oscar Benjamin) Date: Sun, 14 Oct 2012 03:16:57 +0100 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: <20121012214224.55f3ed27@pitrou.net> References: <20121005202534.5f721292@pitrou.net> <20121006190821.02ae50cd@pitrou.net> <387F855A-B5BD-491D-9FA5-918F4042332F@gmail.com> <50707AD5.3030407@stoneleaf.us> <50707F91.6060301@stoneleaf.us> <50786E42.6050308@stoneleaf.us> <20121012214224.55f3ed27@pitrou.net> Message-ID: On 12 October 2012 20:42, Antoine Pitrou wrote: > On Fri, 12 Oct 2012 12:23:46 -0700 > Ethan Furman wrote: >> >> Which is why I would like to see Path based on str, despite Guido's >> misgivings. (Yes, I know I'm probably tilting at windmills here...) >> >> If Path is string based we get backwards compatibility with all the os >> and third-party tools that expect and use strings; this would allow a >> gentle migration to using them, as opposed to the all-or-nothing if Path >> is a completely new type. > > It is not all-or-nothing since you can just call str() and it will work > fine with both strings and paths. I assumed that part of the proposal for including a new Path class was that it would (perhaps eventually rather than immediately) be directly supported by all of the standard Python APIs that expect strings-representing-paths. I apologise if I have missed something but is there some reason why it would be bad for e.g. open() to accept Path instances as they are? I think it's reasonable to require that e.g. os.open() should only accept a str, but standard open()? Oscar From ncoghlan at gmail.com Sun Oct 14 04:26:04 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 14 Oct 2012 12:26:04 +1000 Subject: [Python-ideas] The async API of the future: yield-from In-Reply-To: References: Message-ID: My general thought on this is that "yield from generator" is the coroutine equivalent of a function call, while "yield future" would be the way the lowest level of the generator talked to the standard event loop. -- Sent from my phone, thus the relative brevity :) -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Sun Oct 14 04:39:06 2012 From: guido at python.org (Guido van Rossum) Date: Sat, 13 Oct 2012 19:39:06 -0700 Subject: [Python-ideas] The async API of the future: PEP 3153 (async-pep) In-Reply-To: References: Message-ID: On Sat, Oct 13, 2012 at 10:54 AM, Laurens Van Houtven <_ at lvh.cc> wrote: > On Sat, Oct 13, 2012 at 1:22 AM, Guido van Rossum wrote: >> >> [Hopefully this is the last spin-off thread from "asyncore: included >> batteries don't fit"] >> >> So it's totally unfinished? > > > At the time, the people I talked to placed significantly more weight in > "explain why this is necessary" than "get me something I can play with". Odd. Were those people experienced in writing / reviewing PEPs? >> > Do you feel that there should be less talk about rationale? >> >> No, but I feel that there should be some actual specification. I am >> also looking forward to an actual meaty bit of example code -- ISTR >> you mentioned you had something, but that it was incomplete, and I >> can't find the link. > > Just examples of how it would work, nothing hooked up to real code. 
My > memory of it is more of a drowning-in-politics-and-bikeshedding kind of > thing, unfortunately :) Either way, I'm okay with letting bygones be bygones > and focus on how we can get this show on the road. Shall I just reject PEP 3153 so it doesn't distract people? Of course we can still refer to it when people ask for a rationale for the separation between transports and protocols, but it doesn't seem the PEP itself is going to be finished (correct me if I'm wrong), and as it stands it is not useful as a software specification. >> > It's not that there's *no* reference to IO: it's just that that >> > reference is >> > abstracted away in data_received and the protocol's transport object, >> > just >> > like Twisted's IProtocol. >> >> The words "data_received" don't even occur in the PEP. > > > See above. > > What thread should I reply in about the pull APIs? Probably the yield-from thread; or the Twisted/Deferred thread. >> I just want to make sure that we don't *completely* paint ourselves into >> the wrong corner when it comes to that. > > > I don't think we have to worry about it too much. Any reasonable API I can > think of makes this completely doable. Agreed that we needn't constantly worry about it. It should be enough to have some kind of reality check closer to PEP accept time. >> But I'm really hoping you'll make good on your promise of redoing >> async-pep, giving some actual specifications and example code, so I >> can play with it. > > > Takeaways: > > - The async API of the future is very important, and too important to be > left to chance. That's why we're discussing it here. > - It requires a lot of very experienced manpower. It also requires (a certain level of) *agreement* between people with different preferences, since it's no good if the community fragments or the standard solution gets ignored by Twisted and Tornado, for example. Ideally those packages (that is, their Python 3.4 versions) would build on and extend the standard API, and for "boring" stuff (like possibly the event loop) they would just use the standard solution. > - It requires a lot of effort to handle the hashing out of it (as we're > doing here) as well as it deserves to be. Right. > I'll take as proactive a role as I can afford to take in this process, but I > don't think I can do it by myself. I hope I didn't come across as asking you that! I am just hoping that you can give some concrete, working example code showing how to do protocols and transports. > Furthermore, it's a risk nobody wants to > take: a repeat performance wouldn't be good for anyone, in particular not > for Python nor myself. A repeat of what? Of the failure of PEP 3153? Don't worry about that. This time around I'm here, and since then I have got a lot of experience implementing and using a solid async library (albeit of a quite different nature than the typical socket-based stuff that most people do). > I've asked JP Calderone and Itamar Turner-Trauring if they would be > interested in carrying this forward professionally, and they have > tentatively said yes. JP's already familiar with a large part of the problem > space with the implementation of the ssl module. JP and Itamar have worked > together for years and have recently set up a consulting firm. Insight in the right way to support SSL would be huge; it is an excellent example of a popular transport that does *not* behave like sockets, even though its abstract conceptual model is similar (a setup phase, followed by two bidirectional byte streams). 
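To make the SSL example concrete: with a non-blocking SSL socket, a single logical read can require waiting for the descriptor to become *writable* (and the reverse), for instance during a handshake or renegotiation, so the transport cannot simply map "readable" onto "call recv()". A rough sketch, where loop.wait_readable() and loop.wait_writable() are placeholder names rather than any proposed API:

    import ssl

    def ssl_read(loop, sslsock, nbytes):
        while True:
            try:
                return sslsock.recv(nbytes)
            except ssl.SSLWantReadError:
                yield from loop.wait_readable(sslsock)
            except ssl.SSLWantWriteError:
                yield from loop.wait_writable(sslsock)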
> Given that this is emphatically important to Python, I intend to apply for a > PSF grant on their behalf to further this goal. Given their experience in > the field, I expect this to be a fairly low risk endeavor. Famous last words. :-) -- --Guido van Rossum (python.org/~guido) From glyph at twistedmatrix.com Sun Oct 14 05:41:02 2012 From: glyph at twistedmatrix.com (Glyph) Date: Sat, 13 Oct 2012 20:41:02 -0700 Subject: [Python-ideas] re-implementing Twisted for fun and profit In-Reply-To: References: Message-ID: <40862DD9-DF71-4280-A47F-B20E7E742254@twistedmatrix.com> On Oct 13, 2012, at 9:17 AM, Guido van Rossum wrote: > On Fri, Oct 12, 2012 at 9:46 PM, Glyph wrote: >> There has been a lot written on this list about asynchronous, microthreaded and event-driven I/O in the last couple of days. There's too much for me to try to respond to all at once, but I would very much like to (possibly re-)introduce one very important point into the discussion. >> >> Would everyone interested in this please please please read several times? Especially this section: . If it is not clear, please ask questions about it and I will try to needle someone qualified into improving the explanation. > > I am well aware of that section. But, like the rest of PEP 3153, it is > sorely lacking in examples or specifications. If that's what the problem is, I will do what I can to get those sections fleshed out ASAP. >> I am bringing this up because I've seen a significant amount of discussion of level-triggering versus edge-triggering. Once you have properly separated out transport logic from application implementation, triggering style is an irrelevant, private implementation detail of the networking layer. > > This could mean several things: (a) only the networking layer needs to use both trigger styles, the rest of your code should always use trigger style X (and please let X be edge-triggered :-); (b) only in the networking layer is it important to distinguish carefully between the two, in the rest of the app you can use whatever you like best. Edge triggering and level triggering both have to do with changes in boolean state. Edge triggering is "call me when this bit is changed"; level triggering is "call me (and keep calling me) when this bit is set". The metaphor extends very well from the electrical-circuit definition, but the distinction is not very meaningful to applications who want to subscribe to a semantic event and not the state of a bit. Applications want to know about particular bits of information, not state changes. Obviously when data is available on the connection, it's the bytes that the application is interested in. When a new connection is available to be accept()-ed, the application wants to know that as a distinct notification. There's no way to deliver data or new connected sockets to the application as "edge-triggered"; if the bit is still set later, then there's more, different data to be delivered, which needs a separate notification. But, even in more obscure cases like "the socket is writable", the transport layer needs to disambiguate between "the connection has closed unexpectedly" and "you should produce some more data for writing now". (You might want to also know how much buffer space is available, although that is pretty fiddly.) The low-level event loop needs to have both kinds of callbacks, but avoid exposing the distinction to application code. However, this doesn't mean all styles need to be implemented. 
If Python defines a core event loop interface specification, it doesn't have to provide every type of loop. Twisted can continue using its reactors, Tornado can continue using its IOLoop, and each can have transforming adapters to work with standard-library protocols. When the "you should read some data" bit is set, an edge-triggered transport receives that notification, reads the data, which immediately clears that bit, so it responds to the next down->up edge notification in the same way. The level-triggered transport does the same thing: it receives the notification that the bit is set, then immediately clears it by reading the data; therefore, if it gets another notification that the bit is high, that means it's high again, and more data needs to be read. >> Whether the operating system tells Python "you must call recv() once now" or "you must call recv() until I tell you to stop" should not matter to the application if the application is just getting passed the results of recv() which has already been called. Since not all I/O libraries actually have a recv() to call, you shouldn't have the application have to call it. This is perhaps the central design error of asyncore. > > Is this about buffering? Because I think I understand buffering. Filling up a buffer with data as it comes in (until a certain limit) is a good job for level-triggered callbacks. Ditto for draining a buffer. In the current Twisted implementation, you just get bytes objects delivered; when it was designed, 'str' was really the only game in town. However, I think this still applies because the first thing you're going to do when parsing the contents of your buffer is to break it up into chunks by using some handy bytes method. In modern Python, you might want to get a bytearray plus an offset delivered instead, because a bytearray can use recv_into, and a bytearray might be reusable, and could possibly let you implement some interesting zero-copy optimizations. However, in order to facilitate that, bytearray would need to have zero-copy implementations of split() and re.search() and such. In my opinion, the prerequisite for using anything other than a bytes object in practical use would be a very sophisticated lazy-slicing data structure, with zero-copy implementations of everything, and a copy-on-write version of recv_into so that if the sliced-up version of the data structure is shared between loop iterations the copies passed off to other event handlers don't get stomped on. (Although maybe somebody's implemented this while I wasn't looking?) This kind of pay-only-for-what-you-need buffering is really cool, a lot of fun to implement, and it will give you very noticeable performance gains if you're trying to write a wire-speed proxy or router with almost no logic in it; however, I've never seen it really be worth the trouble in any other type of application. I'd say that if we can all agree on the version that delivers bytes, the version that re-uses a fixed-sized bytearray buffer could be an optional feature in the 2.0 version of the spec. > The rest of the app can then talk to the buffer and tell it "give me between X and Y bytes, possibly blocking if you don't have at least X available right now, or "here are N more bytes, please send them out when you can". From the app's position these calls *may* block, so they need to use whatever mechanism (callbacks, Futures, Deferreds, yield, yield-from) to ensure that *if* they block, other tasks can run. 
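A small sketch of the two delivery styles contrasted above -- the simple contract where every read hands the protocol a fresh bytes object, and the opt-in variant that reuses one bytearray per connection and hands out a view of the filled portion (which must not be kept beyond the callback, because the next recv_into() will overwrite it):

    BUF_SIZE = 65536

    def read_bytes(sock):
        # Baseline contract: a new bytes object on every call.
        return sock.recv(BUF_SIZE)

    class ReusableBuffer:
        # The optional "2.0" flavour: one bytearray per connection.
        def __init__(self, size=BUF_SIZE):
            self._buf = bytearray(size)

        def read_view(self, sock):
            n = sock.recv_into(self._buf)
            return memoryview(self._buf)[:n]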
This is not how the application should talk to the receive buffer. Bytes should not necessarily be directly be requested by the application: they simply arrive. If you have to model everything in terms of a request-for-bytes/response-to-request idiom, there are several problems: 1. You have to heap-allocate an additional thing-to-track-the-request object every time you ask for bytes, which adds non-trivial additional overhead to the processing of simple streams. (The C-level event object that i.e. IOCP uses to track the request is slightly different, because it's a single signaling event and you should only ever have one outstanding per connection, so you don't have to make a bunch of them.) 2. Multiple listeners might want to "read" from a socket at once; for example, if you have a symmetric protocol where the application is simultaneously waiting for a response message from its peer and also trying to handle new requests of its own. (This is required in symmetric protocols, like websockets and XMPP, and HTTP/2.0 seems to be moving in this direction too.) 3. Even assuming you deal with part 1 and 2 properly - they are possible to work around - error-handling becomes tricky and tedious. You can't effectively determine in your coroutine scheduler which errors are in the code that is reading or writing to a given connection (because the error may have been triggered by code that was reading or writing to a different connection), so sometimes your sockets will just go off into la-la land with nothing reading from them or writing to them. In Twisted, if a dataReceived handler causes an error, then we know it's time to shut down that connection and close that socket; there's no ambiguity. Even if you want to write your protocol parsers in a yield-coroutine style, I don't think you want the core I/O layer to be written in that style; it should be possible to write everything as "raw" it's-just-a-method event handlers because that is really the lowest common denominator and therefore the lowest overhead; both in terms of performance and in terms of simplicity of debugging. It's easy enough to write a thing with a .data_received(data) method that calls send() on the appropriate suspended generator. > But the common case is that they don't actually need to block because there is still data / space in the buffer. I don't think that this is necessarily the "common case". Certainly in bulk-transfer protocols or in any protocol that supports pipelining, you usually fill up the buffer completely on every iteration. > (You could also have an exception for write() and make that never-blocking, trusting the app not to overfill the buffer; this seems convenient but it worries me a bit.) That's how Twisted works... sort of. If you call write(), it always just does its thing. That said, you can ask to be notified if you've written too much, so that you can slow down. (Flow-control is sort of a sore spot for the current Twisted API; what we have works, and it satisfies the core requirements, but the shape of the API is definitely not very convenient. outlines the next-generation streaming and flow-control primitives that we are currently working on. I'm very excited about those but they haven't been battle-tested yet.) 
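That last idea fits in a few lines: a plain event-handler object whose data_received() resumes a suspended generator-based parser via send(). None of these names are an existing API; this is only the shape of the adapter being described:

    class GeneratorProtocolAdapter:
        def __init__(self, parser_gen):
            self._parser = parser_gen
            next(self._parser)            # prime the parser up to its first yield

        def data_received(self, data):
            try:
                self._parser.send(data)   # resume the parser with the new chunk
            except StopIteration:
                pass                      # parser finished; a real adapter would close here

    def line_parser():
        buf = b""
        while True:
            buf += yield                  # wait for the next chunk of bytes
            *lines, buf = buf.split(b"\r\n")
            for line in lines:
                print("line:", line)

    adapter = GeneratorProtocolAdapter(line_parser())
    adapter.data_received(b"GET / HTTP/1.1\r\nHost: example.com\r\n")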
If you're talking about "blocking" in a generator-coroutine style, then well-written code can do yield write(x) yield write(y) yield write(z) and "lazy" code, that doesn't care about over-filling its buffer, can just do write(x) write(y) yield write(z) there's no reason that the latter style ought to cause any sort of error. >> If it needs a name, I suppose I'd call my preferred style "event triggering". > > But how does it work? What would typical user code in this style look like? It really depends on the layer. You have to promote what methods get called at each semantic layer; but, at the one that's most interesting for interoperability, the thing that delivers bytes to protocol parsers, it looks something like this: def data_received(self, data): lines = (self.buf + data).split("\r\n") for line in lines[:-1]: self.line_received(line) self.buf = lines[-1] At a higher level, you might have header_received, http_request_received, etc. The thing that calls data_received typically would look like this: def handle_read(self): try: data = self.socket.recv(self.buffer_size) except socket.error, se: if se.args[0] == EWOULDBLOCK: return else: return main.CONNECTION_LOST else: try: self.protocol.data_received(data) except: log_the_error() self.disconnect() although it obviously looks a little different in the case of IOCP. >> Also, I would like to remind all participants that microthreading, request/response abstraction (i.e. Deferreds, Futures), generator coroutines and a common API for network I/O are all very different tasks and do not need to be accomplished all at once. If you try to build something that does all of this stuff, you get most of Twisted core plus half of Stackless all at once, which is a bit much for the stdlib to bite off in one chunk. > > Well understood. (And I don't even want to get microthreading into the > mix, although others may disagree -- I see Christian Tismer has jumped > in...) But I also think that if we design these things in isolation > it's likely that we'll find later that the pieces don't fit, and I > don't want that to happen either. So I think we should consider these > separate, but loosely coordinated efforts. Great, glad to hear it. -g -------------- next part -------------- An HTML attachment was scrubbed... URL: From greg.ewing at canterbury.ac.nz Sun Oct 14 06:30:04 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sun, 14 Oct 2012 17:30:04 +1300 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: References: <20121005202534.5f721292@pitrou.net> <20121006190821.02ae50cd@pitrou.net> <387F855A-B5BD-491D-9FA5-918F4042332F@gmail.com> <50707AD5.3030407@stoneleaf.us> <50707F91.6060301@stoneleaf.us> <50786E42.6050308@stoneleaf.us> <20121012214224.55f3ed27@pitrou.net> Message-ID: <507A3FCC.9040308@canterbury.ac.nz> Oscar Benjamin wrote: > I think it's reasonable to require that > e.g. os.open() should only accept a str, but standard open()? Why shouldn't os.open() accept a path object? Especially if we use a protocol such as __strpath__ so that the os module doesn't have to explicitly know about the Path classes. 
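A sketch of what such a protocol could look like; __strpath__ here is only the name suggested above, not an existing protocol, and _coerce_path()/my_open() are hypothetical helpers rather than real os behaviour:

    import os

    def _coerce_path(arg):
        if isinstance(arg, str):
            return arg
        strpath = getattr(type(arg), "__strpath__", None)
        if strpath is not None:
            return strpath(arg)
        raise TypeError("expected str or an object with __strpath__, got %r"
                        % type(arg))

    class DemoPath:
        # Stand-in for a PEP 428 path object.
        def __init__(self, *parts):
            self._parts = parts

        def __strpath__(self):
            return os.path.join(*self._parts)

    def my_open(path, flags):
        # How os.open() could accept path objects without knowing about them.
        return os.open(_coerce_path(path), flags)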
-- Greg From guido at python.org Sun Oct 14 06:49:07 2012 From: guido at python.org (Guido van Rossum) Date: Sat, 13 Oct 2012 21:49:07 -0700 Subject: [Python-ideas] re-implementing Twisted for fun and profit In-Reply-To: <40862DD9-DF71-4280-A47F-B20E7E742254@twistedmatrix.com> References: <40862DD9-DF71-4280-A47F-B20E7E742254@twistedmatrix.com> Message-ID: On Sat, Oct 13, 2012 at 8:41 PM, Glyph wrote: > > On Oct 13, 2012, at 9:17 AM, Guido van Rossum wrote: > > On Fri, Oct 12, 2012 at 9:46 PM, Glyph wrote: > > There has been a lot written on this list about asynchronous, > microthreaded and event-driven I/O in the last couple of days. There's too > much for me to try to respond to all at once, but I would very much like to > (possibly re-)introduce one very important point into the discussion. > > Would everyone interested in this please please please read < > https://github.com/lvh/async-pep/blob/master/pep-3153.rst> several times? > Especially this section: < > https://github.com/lvh/async-pep/blob/master/pep-3153.rst#why-separate-protocols-and-transports>. > If it is not clear, please ask questions about it and I will try to needle > someone qualified into improving the explanation. > > > I am well aware of that section. But, like the rest of PEP 3153, it is > sorely lacking in examples or specifications. > > > If that's what the problem is, I will do what I can to get those sections > fleshed out ASAP. > I'd love that! Laurens seems burned-out from his previous attempts at authoring that PEP and has not volunteered any examples. > I am bringing this up because I've seen a significant amount of discussion > of level-triggering versus edge-triggering. Once you have properly > separated out transport logic from application implementation, triggering > style is an irrelevant, private implementation detail of the networking > layer. > > > This could mean several things: (a) only the networking layer needs to use > both trigger styles, the rest of your code should always use trigger style > X (and please let X be edge-triggered :-); (b) only in the networking layer > is it important to distinguish carefully between the two, in the rest of > the app you can use whatever you like best. > > > Edge triggering and level triggering both have to do with changes in > boolean state. Edge triggering is "call me when this bit is changed"; > level triggering is "call me (and keep calling me) when this bit is set". > The metaphor extends very well from the electrical-circuit definition, but > the distinction is not very meaningful to applications who want to > subscribe to a semantic event and not the state of a bit. > I am well aware of the terms' meanings in electrical circuits. It seems that, alas, I may have misunderstood how the terms are used in the world of callbacks. In my naivete, when they were brought up, I thought that edge-triggered meant "call this callback once, when this specific event happens" (e.g. a specific async read or write call completing) whereas level-triggered referred to "call this callback whenever a certain condition is true" (e.g. a socket is ready for reading or writing). But from your messages it seems that it's more a technical term for different ways of dealing with the latter, so that in either case it is about multiple-call callbacks. If this is indeed the case I profusely apologize for the confusion I have probably caused. (Hopefully most people glazed over anyway. :-) Applications want to know about particular bits of information, not state > changes. 
Obviously when data is available on the connection, it's the > bytes that the application is interested in. When a new connection is > available to be accept()-ed, the application wants to know that as a > distinct notification. There's no way to deliver data or new connected > sockets to the application as "edge-triggered"; if the bit is still set > later, then there's more, different data to be delivered, which needs a > separate notification. But, even in more obscure cases like "the socket is > writable", the transport layer needs to disambiguate between "the > connection has closed unexpectedly" and "you should produce some more data > for writing now". (You might want to also know how much buffer space is > available, although that is pretty fiddly.) > Excuse my ignorance, but are there ioctl() calls to get at this kind of information, or do you just have to try to call send()/write() and interpret the error you get back? > The low-level event loop needs to have both kinds of callbacks, but avoid > exposing the distinction to application code. However, this doesn't mean > all styles need to be implemented. If Python defines a core event loop > interface specification, it doesn't have to provide every type of loop. > Twisted can continue using its reactors, Tornado can continue using its > IOLoop, and each can have transforming adapters to work with > standard-library protocols. > I'm not 100% sure I follow this. I think you are saying that in some systems the system level (the kernel, say) has an edge-triggered API and in other systems it is level-triggered? And that it doesn't matter much since it's easy to turn either into the other? If I've got that right, do you have a preference for what style the standard-library interface should use? And why? > When the "you should read some data" bit is set, an edge-triggered > transport receives that notification, reads the data, which immediately > clears that bit, so it responds to the next down->up edge notification in > the same way. The level-triggered transport does the same thing: it > receives the notification that the bit is set, then immediately clears it > by reading the data; therefore, if it gets another notification that the > bit is high, that means it's high again, and more data needs to be read. > Makes sense. So they both refer to multi-call callbacks (I don't know what you call these). And it looks like a common application of either is buffered streams, and another is incoming connections to listening sockets. Both seem to belong to the world of transports. Right? > Whether the operating system tells Python "you must call recv() once now" > or "you must call recv() until I tell you to stop" should not matter to the > application if the application is just getting passed the results of recv() > which has already been called. Since not all I/O libraries actually have a > recv() to call, you shouldn't have the application have to call it. This > is perhaps the central design error of asyncore. > > > Is this about buffering? Because I think I understand buffering. Filling > up a buffer with data as it comes in (until a certain limit) is a good job > for level-triggered callbacks. Ditto for draining a buffer. > > > In the current Twisted implementation, you just get bytes objects > delivered; when it was designed, 'str' was really the only game in town. 
> However, I think this still applies because the first thing you're going > to do when parsing the contents of your buffer is to break it up into > chunks by using some handy bytes method. > > In modern Python, you *might* want to get a bytearray plus an offset > delivered instead, because a bytearray can use recv_into, and a bytearray > might be reusable, and could possibly let you implement some interesting > zero-copy optimizations. However, in order to facilitate that, bytearray > would need to have zero-copy implementations of split() and re.search() and > such. > That sounds like a *very* low-level consideration to me, and as you suggest unrealistic given the other limitations. I would rather just get bytes objects and pay for the copying. I know some people care deeply about extra copies, and in certain systems they are probably right, but I doubt that those systems would be implemented in Python even if we *did* bend over backwards to avoid copies. And it really would make the interface much more painful to use. Possibly there could be a separate harder-to-use lower-level API that deals in bytearrays for a few connoisseurs, but we probably shouldn't promote it much, and since it's always possible to add APIs later, I'd rather avoid defining it for version 1. > In my opinion, the prerequisite for using anything other than a bytes > object in practical use would be a very sophisticated lazy-slicing data > structure, with zero-copy implementations of everything, and a > copy-on-write version of recv_into so that if the sliced-up version of the > data structure is shared between loop iterations the copies passed off to > other event handlers don't get stomped on. (Although maybe somebody's > implemented this while I wasn't looking?) > > This kind of pay-only-for-what-you-need buffering is really cool, a lot of > fun to implement, and it will give you very noticeable performance gains if > you're trying to write a wire-speed proxy or router with almost no logic in > it; however, I've never seen it really be worth the trouble in any other > type of application. I'd say that if we can all agree on the version that > delivers bytes, the version that re-uses a fixed-sized bytearray buffer > could be an optional feature in the 2.0 version of the spec. > Seems we are in perfect agreement (I wrote the above without reading this far :-). > The rest of the app can then talk to the buffer and tell it "give me > between X and Y bytes, possibly blocking if you don't have at least X > available right now, or "here are N more bytes, please send them out when > you can". From the app's position these calls *may* block, so they need to > use whatever mechanism (callbacks, Futures, Deferreds, yield, yield-from) > to ensure that *if* they block, other tasks can run. > > > This is not how the application should talk to the receive buffer. Bytes > should not necessarily be directly be requested by the application: they > simply arrive. If you have to model everything in terms of a > request-for-bytes/response-to-request idiom, there are several problems: > (Thanks for writing this; this is the kind of insight I am hoping to get from you and others.) > 1. You have to heap-allocate an additional thing-to-track-the-request > object every time you ask for bytes, which adds non-trivial additional > overhead to the processing of simple streams. (The C-level event object > that i.e. 
IOCP uses to track the request is slightly different, because > it's a single signaling event and you should only ever have one outstanding > per connection, so you don't have to make a bunch of them.) > > 2. Multiple listeners might want to "read" from a socket at once; for > example, if you have a symmetric protocol where the application is > simultaneously waiting for a response message from its peer and also trying > to handle new requests of its own. (This is required in symmetric > protocols, like websockets and XMPP, and HTTP/2.0 seems to be moving in > this direction too.) > > 3. Even assuming you deal with part 1 and 2 properly - they are possible > to work around - error-handling becomes tricky and tedious. You can't > effectively determine in your coroutine scheduler which errors are in the > code that is reading or writing to a given connection (because the error > may have been triggered by code that was reading or writing to a different > connection), so sometimes your sockets will just go off into la-la land > with nothing reading from them or writing to them. In Twisted, if a > dataReceived handler causes an error, then we know it's time to shut down > that connection and close that socket; there's no ambiguity. > I'll have to digest all this, but I'll be sure to think about this carefully. My kneejerk reactions are that (1) heap allocations are unavoidable anyway, (2) if there are multiple listeners there should be some other layer demultiplexing, and (3) nobody gets error handling right anyway; but I should be very suspicious of kneejerks, even my own. > Even if you want to write your protocol parsers in a yield-coroutine > style, I don't think you want the core I/O layer to be written in that > style; it should be possible to write everything as "raw" > it's-just-a-method event handlers because that is really the lowest common > denominator and therefore the lowest overhead; both in terms of performance > and in terms of simplicity of debugging. It's easy enough to write a thing > with a .data_received(data) method that calls send() on the appropriate > suspended generator. > I agree. In fact, the lowest level in NDB (my own big foray into async, albeit using App Engine's RPC instead of sockets) is written as an event loop with no references to generators or Futures -- all it knows about are RPCs and callback functions. (Given the way the RPC class is defined in App Engine, calling a designated method on the RPC object is out of the question, everything is callables plus *args plus **kwds.) > But the common case is that they don't actually need to block because > there is still data / space in the buffer. > > > I don't think that this is necessarily the "common case". Certainly in > bulk-transfer protocols or in any protocol that supports pipelining, you > usually fill up the buffer completely on every iteration. > Another pragmatic observation that I wouldn't have been able to make on my own. > (You could also have an exception for write() and make that > never-blocking, trusting the app not to overfill the buffer; this seems > convenient but it worries me a bit.) > > > That's how Twisted works... sort of. If you call write(), it always just > does its thing. That said, you can ask to be notified if you've written > too much, so that you can slow down. > > (Flow-control is sort of a sore spot for the current Twisted API; what we > have works, and it satisfies the core requirements, but the shape of the > API is definitely not very convenient. 
outlines the > next-generation streaming and flow-control primitives that we are currently > working on. I'm very excited about those but they haven't been > battle-tested yet.) > > If you're talking about "blocking" in a generator-coroutine style, then > well-written code can do > > yield write(x) > yield write(y) > yield write(z) > > > and "lazy" code, that doesn't care about over-filling its buffer, can just > do > > write(x) > write(y) > yield write(z) > > > there's no reason that the latter style ought to cause any sort of error. > Good to know. > If it needs a name, I suppose I'd call my preferred style "event > triggering". > > > But how does it work? What would typical user code in this style look like? > > > It really depends on the layer. You have to promote what methods get > called at each semantic layer; but, at the one that's most interesting for > interoperability, the thing that delivers bytes to protocol parsers, it > looks something like this: > > def data_received(self, data): > lines = (self.buf + data).split("\r\n") > for line in lines[:-1]: > self.line_received(line) > self.buf = lines[-1] > > I see, I've written code like this many times, with many variations. > At a higher level, you might have header_received, http_request_received, > etc. > > The thing that calls data_received typically would look like this: > > def handle_read(self): > try: > data = self.socket.recv(self.buffer_size) > except socket.error, se: > if se.args[0] == EWOULDBLOCK: > return > else: > return main.CONNECTION_LOST > else: > try: > self.protocol.data_received(data) > except: > log_the_error() > self.disconnect() > > > although it obviously looks a little different in the case of IOCP. > It seems that peraps the 'data_received' interface is the most important one to standardize (for the event loop); I can imagine many variations on the handle_read() implementation, and there would be different ones for IOCP, SSL, and probably others. The stdlib should have good ones for the common platforms but it should be designed to allow people who know better to hook up their own implementation. > Also, I would like to remind all participants that microthreading, > request/response abstraction (i.e. Deferreds, Futures), generator > coroutines and a common API for network I/O are all very different tasks > and do not need to be accomplished all at once. If you try to build > something that does all of this stuff, you get most of Twisted core plus > half of Stackless all at once, which is a bit much for the stdlib to bite > off in one chunk. > > > Well understood. (And I don't even want to get microthreading into the > mix, although others may disagree -- I see Christian Tismer has jumped > in...) But I also think that if we design these things in isolation > it's likely that we'll find later that the pieces don't fit, and I > don't want that to happen either. So I think we should consider these > separate, but loosely coordinated efforts. > > > Great, glad to hear it. > Thanks for taking the time to respond! -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From guido at python.org Sun Oct 14 07:03:17 2012 From: guido at python.org (Guido van Rossum) Date: Sat, 13 Oct 2012 22:03:17 -0700 Subject: [Python-ideas] The async API of the future: Twisted and Deferreds In-Reply-To: <20121013081445.40d6d78f@pitrou.net> References: <20121013081445.40d6d78f@pitrou.net> Message-ID: [Quick, I know I'm way behind, especially on this thread; more tomorrow.] On Fri, Oct 12, 2012 at 11:14 PM, Antoine Pitrou wrote: > > On Fri, 12 Oct 2012 15:11:54 -0700 > Guido van Rossum wrote: > > > > > 2. Method dispatch callbacks: > > > > > > Similar to the above, the reactor or somebody has a handle on your > > > object, and calls methods that you've defined when events happen > > > e.g. IProtocol's dataReceived method > > > > While I'm sure it's expedient and captures certain common patterns > > well, I like this the least of all -- calling fixed methods on an > > object sounds like a step back; it smells of the old Java way (before > > it had some equivalent of anonymous functions), and of asyncore, which > > (nearly) everybody agrees is kind of bad due to its insistence that > > you subclass its classes. (Notice how subclassing as the prevalent > > approach to structuring your code has gotten into a lot of discredit > > since 1996.) > > But how would you write a dataReceived equivalent then? Would you have > a "task" looping on a read() call, e.g. > > @task > def my_protocol_main_loop(conn): > while : > try: > data = yield conn.read(1024) > except ConnectionError: > conn.close() > break No, I would use plain callbacks. There would be some kind of IOObject class defined by the stdlib that wraps a socket (it would make it non-blocking, and possibly to other things), and the user would make a registration call to the event loop giving it the IOOjbect and the user's callback function plus *args and **kwds; the event loop would call callback(*args, **kwds) each time the IOObject became readable. (Oh, and there would be separate registration (and unregistration) functions for reading and writing.) Apparently my rants about callbacks have made people assume that I don't want to see them anywhere. In fact I am comfortable with callbacks for a number of situations -- I just think we have several other tools in our toolbox that are way underused, whereas callbacks are way overused, in part because the alternative tools are relatively new. This way the user could switch to a different callback when a different phase of the protocol is reached. I realize there are other shapes this API could take. But I really don't want the user to have to subclass IOObject. > I'm not sure I understand the problem with subclassing. It works fine > in Twisted. Even in Python 3 we don't shy away from subclassing, for > example the IO stack is based on subclassing RawIOBase, BufferedIOBase, > etc. I'm fine with using subclassing for the internal structure of a library. (The IOObject I am postulating would almost certainly have a bunch of subclasses used for different types of sockets, IOCP, SSL, etc.) The thing that I've soured upon (and many others too) is to tell users "and to use this fine feature, just subclass this handy base class and override or extend the following three methods". 
Because in practice (certainly in Python, where the compiler doesn't enforce privacy) users always start overriding other methods, or using internal state, or add state that clashes with the base class's state, or forget to call mandatory super calls, or make incorrect assumptions about thread-safety, or whatever else they can do to screw things up. And duck typing isn't ideal either for this situation. -- --Guido van Rossum (python.org/~guido) From greg.ewing at canterbury.ac.nz Sun Oct 14 07:16:10 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sun, 14 Oct 2012 18:16:10 +1300 Subject: [Python-ideas] The async API of the future: Twisted and Deferreds In-Reply-To: References: Message-ID: <507A4A9A.30203@canterbury.ac.nz> Devin Jeanpierre wrote: > Presumably > generator coroutines work by yielding deferreds and being called back > when the future resolves (deferred fires). That's one way to go about it, but it's not the only way. See here for my take on how it might work: http://www.cosc.canterbury.ac.nz/greg.ewing/python/generators/yf_current/Examples/Scheduler/scheduler.txt -- Greg From greg.ewing at canterbury.ac.nz Sun Oct 14 07:29:05 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sun, 14 Oct 2012 18:29:05 +1300 Subject: [Python-ideas] re-implementing Twisted for fun and profit In-Reply-To: References: <40862DD9-DF71-4280-A47F-B20E7E742254@twistedmatrix.com> Message-ID: <507A4DA1.2070701@canterbury.ac.nz> Guido van Rossum wrote: > I thought that > edge-triggered meant "call this callback once, when this specific event > happens" (e.g. a specific async read or write call completing) whereas > level-triggered referred to "call this callback whenever a certain > condition is true" (e.g. a socket is ready for reading or writing). Not sure if this is relevant, but I'd just like to point out that the behaviour of select() in this respect is actually *edge triggered* by this definition. Once it has reported that a given file descriptor is ready, it *won't* report that file descriptor again until you do something with it. This can be a subtle source of bugs in select-based code if you're not aware of it. -- Greg From ubershmekel at gmail.com Sun Oct 14 07:40:57 2012 From: ubershmekel at gmail.com (Yuval Greenfield) Date: Sun, 14 Oct 2012 07:40:57 +0200 Subject: [Python-ideas] Is there a good reason to use * for multiplication? In-Reply-To: <507A01AB.2060708@mrabarnett.plus.com> References: <11ad8f01-0383-4b64-b0f0-78bb0e2f9308@googlegroups.com> <5078D4CE.1040201@pearwood.info> <507923C4.8040201@pearwood.info> <20121013102229.259572ad@bhuda.mired.org> <5079FC79.3040506@canterbury.ac.nz> <507A01AB.2060708@mrabarnett.plus.com> Message-ID: On Sun, Oct 14, 2012 at 2:04 AM, MRAB wrote: > If it's more than one codepoint, we could prefix with the length of the > codepoint's name: > > def __12CIRCLED_PLUS__(x, y): > ... > > That's a bit impractical, and why reinvent the wheel? I'd much rather: def \u2295(x, y): .... So readable I want to read it twice. And that's not legal python today so we don't break backwards compatibility! Yuval -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From greg.ewing at canterbury.ac.nz Sun Oct 14 08:24:55 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sun, 14 Oct 2012 19:24:55 +1300 Subject: [Python-ideas] The async API of the future: Twisted and Deferreds In-Reply-To: References: Message-ID: <507A5AB7.8010007@canterbury.ac.nz> Devin Jeanpierre wrote (concerning callbacks): > If we look at this, we're expecting to deal with a set of functions > that manage shared data. The abstraction for this is usually an > object, and we'd really probably write the callbacks in a class unless > we were being contrarian. And it's not too crazy for the dispatcher to > know this and expect you to write it as a class that supports a > certain interface (certain methods correspond to certain events). IIUC, what Guido objects to is callbacks that are methods *of the I/O object*, so that you have to subclass the library-supplied object and override them. You seem to be talking about something slightly different -- an object that's entirely supplied by the user, and simply bundles a set of callbacks together. That doesn't seem so bad. -- Greg From greg.ewing at canterbury.ac.nz Sun Oct 14 09:12:04 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sun, 14 Oct 2012 20:12:04 +1300 Subject: [Python-ideas] The async API of the future: yield-from In-Reply-To: References: Message-ID: <507A65C4.9010709@canterbury.ac.nz> I've had some thoughts on why I'm uncomfortable about this kind of pattern: data = yield sock.async_read(1024) The idea here is that sock.async_read() returns a Future or similar object that performs the I/O and waits for the result. However, reading the data isn't necessarily the point at which the suspension actually occurs. If you're using a select-style event loop, the async read operation breaks down into 1. Wait for data to arrive on the socket 2. Read the data So the implementation of sock.async_read() is going to have to create another Future to handle waiting for the socket to become ready. But then the outer Future is an unnecessary complication, because you could get the same effect by defining def async_read(self, length): yield future_to_wait_for_fd(self.fd) return os.read(self.fd, length) and calling it using data = yield from sock.async_read(1024) If Futures are to appear anywhere, they should only be at the very bottom layer, at the transition between generator and non-generator code. And the place where that transition occurs depend on how the lower levels are implemented. If you're using IOCP instead of select, for example, you need to do things the other way around: 1. Start the read operation 2. Wait for it to complete So I feel that all public APIs should be functions called using yield-from, leaving it up to the implementation to decide if and where Futures become involved. 
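To illustrate that last point, here is a rough sketch of the same public coroutine sitting on top of the two kinds of machinery; wait_readable(), start_overlapped_read() and wait_for_completion() are invented placeholder names, not a proposed API:

    import os

    def async_read_select_style(fd, length):
        # select()/poll() flavour: wait for readiness first, then read.
        yield from wait_readable(fd)
        return os.read(fd, length)

    def async_read_iocp_style(handle, length):
        # IOCP flavour: start the operation first, then wait for it to complete.
        op = start_overlapped_read(handle, length)
        yield from wait_for_completion(op)
        return op.result()

Either way the caller just writes data = yield from sock.async_read(1024) and never sees which style is underneath.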
-- Greg From _ at lvh.cc Sun Oct 14 11:32:01 2012 From: _ at lvh.cc (Laurens Van Houtven) Date: Sun, 14 Oct 2012 11:32:01 +0200 Subject: [Python-ideas] The async API of the future: Some thoughts from an ignorant Tornado user In-Reply-To: References: Message-ID: On Sun, Oct 14, 2012 at 12:27 AM, Daniel McDougall < daniel.mcdougall at liftoffsoftware.com> wrote: > (This is a response to GVR's Google+ post asking for ideas; I > apologize in advance if I come off as an ignorant programming newbie) > -- snip snip snip -- import async # The API of the future ;) > async.async_call(retrieve_log_playback, settings, tws, > mechanism=multiprocessing) > # tws == instance of tornado.web.WebSocketHandler that holds the open > connection > Is this a CPU-bound problem? My opinion is that the goal of any async module that winds up in > Python's standard library should be simplicity and portability. In > terms of features, here's my 'async wishlist': > > * I should not have to worry about what is and isn't pickleable when I > decide that a task should be performed asynchronously. > Certainly. My above question is important, because this should only matter for IPC. > * I should be able to choose the type of event loop/async mechanism > that is appropriate for the task: For CPU-bound tasks I'll probably > want to use multiprocessing. For IO-bound tasks I might want to use > threading. For a multitude of tasks that "just need to be async" (by > nature) I'll want to use an event loop. > Ehhh, maybe. This sounds like it confounds the tools for different use cases. You can quite easily have threads and processes on top of an event loop; that works out particularly nicely for processes because you still have to talk to your processes. Examples: twisted.internet.reactor.spawnProcess (local processes) twisted.internet.threads.deferToThread (local threads) ampoule (remote processes) It's quite easy to do blocking IO in a thread with deferToThread; in fact, that's how twisted's adbapi, an async wrapper to dbapi, works. * Any async module should support 'basics' like calling functions at > an interval and calling functions after a timeout occurs (with the > ability to cancel). > * Asynchronous tasks should be able to access the same namespace as > everything else. Maybe wishful thinking. > With twisted, this is already the case; general caveats for shared mutable state across threads of course still apply. Fortunately in most Twisted apps, that's a tiny fraction of the total code, and they tend to be fractions that are well-isolated or at least easily isolatable. > * It should support publish/subscribe-style events (i.e. an event > dispatcher). For example, the ability to watch a file descriptor or > socket for changes in state and call a function when that happens. > Preferably with the flexibility to define custom events (i.e don't > have it tied to kqueue/epoll-specific events). > Like connectionMade, connectionLost, dataReceived etc? > > Thanks for your consideration; and thanks for the awesome language. > > -- > Dan McDougall - Chief Executive Officer and Developer > Liftoff Software ? Your flight to the cloud is now boarding. > 904-446-8323 > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -- cheers lvh -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From solipsis at pitrou.net Sun Oct 14 12:40:48 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sun, 14 Oct 2012 12:40:48 +0200 Subject: [Python-ideas] The async API of the future: yield-from References: <507A65C4.9010709@canterbury.ac.nz> Message-ID: <20121014124048.193ad446@pitrou.net> On Sun, 14 Oct 2012 20:12:04 +1300 Greg Ewing wrote: > > So the implementation of sock.async_read() is going > to have to create another Future to handle waiting > for the socket to become ready. But then the outer > Future is an unnecessary complication, because you > could get the same effect by defining > > def async_read(self, length): > yield future_to_wait_for_fd(self.fd) > return os.read(self.fd, length) read() may fail even if select() returned successfully. See http://bugs.python.org/issue9090 What this means is that your select-style event loop should probably also handle actually reading the data. Besides, this will make its API more easily ported to something like IOCP. Regards Antoine. -- Software development and contracting: http://pro.pitrou.net From steve at pearwood.info Sun Oct 14 12:48:59 2012 From: steve at pearwood.info (Steven D'Aprano) Date: Sun, 14 Oct 2012 21:48:59 +1100 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: References: <20121005202534.5f721292@pitrou.net> <20121006190821.02ae50cd@pitrou.net> <387F855A-B5BD-491D-9FA5-918F4042332F@gmail.com> <50707AD5.3030407@stoneleaf.us> <50707F91.6060301@stoneleaf.us> <50786E42.6050308@stoneleaf.us> <20121012214224.55f3ed27@pitrou.net> <50787E8A.2090804@stoneleaf.us> <20121012225306.295d93e6@pitrou.net> <507884F0.2060608@stoneleaf.us> Message-ID: <507A989B.9020206@pearwood.info> On 13/10/12 18:41, Nick Coghlan wrote: > str has a *big* API, and much of it doesn't make any sense in the > particular case of path objects. In particular, path objects shouldn't > be iterable, because it isn't clear what iteration should mean: it > could be path segments, it could be parent paths, or it could be > directory contents. It definitely *shouldn't* be individual > characters, but that's what we would get if it inherited from strings. Ah, I wondered if anyone else had picked up on that. When I read the PEP, I was concerned about the mental conflict between iteration and indexing of Path objects: given a Path p the sequence p[0] p[1] p[2] ... does something completely different from iterating over p directly. Indexing gives path components; iteration gives children of the path (like os.walk). -1 on iteration over the children. Instead, use: for child in p.walk(): ... which has the huge benefit that the walk method can take arguments as needed, such as the args os.walk takes: topdown=True, onerror=None, followlinks=False plus I'd like to see a "filter" argument to filter which children are (or aren't) seen. +1 on indexing giving path components, although the side effect of this is that you automatically get iteration via the sequence protocol. So be it -- I don't think we should be scared to *choose* an iteration model, just because there are other potential models. Using indexing to get path components is useful, slicing gives you sub paths for free, and if the cost of that is that you can iterate over the path, well, I'm okay with that: p = Path('/usr/local/lib/python3.3/glob.py') list(p) => ['/', 'usr', 'local', 'lib', 'python3.3', 'glob.py'] Works for me. 
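Purely as a sketch of the wished-for signature (neither walk() nor a filter argument exists in the PEP 428 draft), usage might look like:

    p = Path('/usr/local/lib/python3.3')
    for child in p.walk(topdown=True, followlinks=False, filter='*.py'):
        print(child)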
-- Steven From solipsis at pitrou.net Sun Oct 14 12:43:27 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sun, 14 Oct 2012 12:43:27 +0200 Subject: [Python-ideas] The async API of the future: Twisted and Deferreds References: <20121013081445.40d6d78f@pitrou.net> Message-ID: <20121014124327.71a71a09@pitrou.net> On Sat, 13 Oct 2012 22:03:17 -0700 Guido van Rossum wrote: > > > > But how would you write a dataReceived equivalent then? Would you have > > a "task" looping on a read() call, e.g. > > > > @task > > def my_protocol_main_loop(conn): > > while : > > try: > > data = yield conn.read(1024) > > except ConnectionError: > > conn.close() > > break > > No, I would use plain callbacks. There would be some kind of IOObject > class defined by the stdlib that wraps a socket (it would make it > non-blocking, and possibly to other things), and the user would make a > registration call to the event loop giving it the IOOjbect and the > user's callback function plus *args and **kwds; the event loop would > call callback(*args, **kwds) each time the IOObject became readable. > (Oh, and there would be separate registration (and unregistration) > functions for reading and writing.) > > Apparently my rants about callbacks have made people assume that I > don't want to see them anywhere. In fact I am comfortable with > callbacks for a number of situations -- I just think we have several > other tools in our toolbox that are way underused, whereas callbacks > are way overused, in part because the alternative tools are relatively > new. > > This way the user could switch to a different callback when a > different phase of the protocol is reached. I realize there are other > shapes this API could take. But I really don't want the user to have > to subclass IOObject. Subclassing IOObject would be wrong, since the user isn't writing an IO object in the first place. But subclassing a separate class, like Twisted's Protocol (which is mostly an empty shell, really), would sound reasonable to me. Regards Antoine. -- Software development and contracting: http://pro.pitrou.net From steve at pearwood.info Sun Oct 14 13:02:19 2012 From: steve at pearwood.info (Steven D'Aprano) Date: Sun, 14 Oct 2012 22:02:19 +1100 Subject: [Python-ideas] Is there a good reason to use * for multiplication? In-Reply-To: <20121013102229.259572ad@bhuda.mired.org> References: <11ad8f01-0383-4b64-b0f0-78bb0e2f9308@googlegroups.com> <5078D4CE.1040201@pearwood.info> <507923C4.8040201@pearwood.info> <20121013102229.259572ad@bhuda.mired.org> Message-ID: <507A9BBB.6040004@pearwood.info> On 14/10/12 02:22, Mike Meyer wrote: > On Sat, 13 Oct 2012 19:18:12 +1100 > Steven D'Aprano wrote: > >> On 13/10/12 19:05, Yuval Greenfield wrote: >> I believe that Haskell treats operators as if they were function objects, >> so you could do something like: > > For the record, Haskell allows operators to be used as functions by > quoting them in ()'s (to provide the functionality of operator) and to > turn functions into operators by quoting them in ``'s. > >> negative_values = map(-, values) >> >> but I think that puts the emphasis on the wrong thing. If (and that's a big >> if) we did something like this, it should be a pair of methods __op__ and >> the right-hand version __rop__ which get called on the *operands*, not the >> operator/function object: >> >> def __op__(self, other, symbol) > > Yeah, but then your function has to dispatch for *all* > operators. Depending on how we handle backwards compatibility with > __add__ et. al. 
It looks like I didn't make myself clear. I didn't think it was necessary to go into too much detail for an off-the-cuff comment about an idea that can't go anywhere for at least another five years. I should have known better :) What I meant was that standard Python operators like +, -, &, etc. would continue to dispatch at the compiler level to dunder methods __add__, __sub__, __and__ etc. But there could be a way to add new operators, in which case Python could call a dedicated dunder method __op__ with two arguments, the "other" operand and the operator itself. Your class needs to define the __op__ method, but it only needs to dispatch on operators it cares about. I have no idea how this would work out in practice, given that presumably Python would still want to raise SyntaxError on illegal/unknown operators at compile time. As I said, this is Python 4 territory. Let's sleep on it for four or six years, hey? :) -- Steven From solipsis at pitrou.net Sun Oct 14 13:03:18 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sun, 14 Oct 2012 13:03:18 +0200 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths References: <20121005202534.5f721292@pitrou.net> <20121006190821.02ae50cd@pitrou.net> <387F855A-B5BD-491D-9FA5-918F4042332F@gmail.com> <50707AD5.3030407@stoneleaf.us> <50707F91.6060301@stoneleaf.us> <50786E42.6050308@stoneleaf.us> <20121012214224.55f3ed27@pitrou.net> <50787E8A.2090804@stoneleaf.us> <20121012225306.295d93e6@pitrou.net> <507884F0.2060608@stoneleaf.us> <507A989B.9020206@pearwood.info> Message-ID: <20121014130318.7140255e@pitrou.net> On Sun, 14 Oct 2012 21:48:59 +1100 Steven D'Aprano wrote: > > Ah, I wondered if anyone else had picked up on that. When I read the PEP, > I was concerned about the mental conflict between iteration and indexing > of Path objects: given a Path p the sequence p[0] p[1] p[2] ... does > something completely different from iterating over p directly. p[0] p[1] etc. are just TypeErrors: >>> p = Path('.') >>> p[0] Traceback (most recent call last): File "", line 1, in File "pathlib.py", line 951, in __getitem__ return self._make_child((key,)) File "pathlib.py", line 1090, in _make_child return self._from_parts(parts) File "pathlib.py", line 719, in _from_parts drv, root, parts = self._parse_args(args) File "pathlib.py", line 711, in _parse_args % type(a)) TypeError: argument should be a path or str object, not So, yes, it's doing "something different", but there is little chance of silent bugs :-) > -1 on iteration over the children. Instead, use: > > for child in p.walk(): > ... > > which has the huge benefit that the walk method can take arguments as > needed, such as the args os.walk takes: > > topdown=True, onerror=None, followlinks=False Judging by its name and signature, walk() would be a recursive operation, while iterating on a path isn't (it only gets you the children). > +1 on indexing giving path components, although the side effect of > this is that you automatically get iteration via the sequence protocol. > So be it -- I don't think we should be scared to *choose* an iteration > model, just because there are other potential models. There is already a .parts property which does exactly that: http://www.python.org/dev/peps/pep-0428/#sequence-like-access The problem with enabling sequence access *on the path object* is that you get confusion with str's own sequencing behaviour, if you happen to pass a str instead of a Path, or the reverse. 
Which is explained briefly here: http://www.python.org/dev/peps/pep-0428/#no-confusion-with-builtins Regards Antoine. -- Software development and contracting: http://pro.pitrou.net From steve at pearwood.info Sun Oct 14 13:21:58 2012 From: steve at pearwood.info (Steven D'Aprano) Date: Sun, 14 Oct 2012 22:21:58 +1100 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: <20121014130318.7140255e@pitrou.net> References: <20121005202534.5f721292@pitrou.net> <20121006190821.02ae50cd@pitrou.net> <387F855A-B5BD-491D-9FA5-918F4042332F@gmail.com> <50707AD5.3030407@stoneleaf.us> <50707F91.6060301@stoneleaf.us> <50786E42.6050308@stoneleaf.us> <20121012214224.55f3ed27@pitrou.net> <50787E8A.2090804@stoneleaf.us> <20121012225306.295d93e6@pitrou.net> <507884F0.2060608@stoneleaf.us> <507A989B.9020206@pearwood.info> <20121014130318.7140255e@pitrou.net> Message-ID: <507AA056.7020907@pearwood.info> On 14/10/12 22:03, Antoine Pitrou wrote: > On Sun, 14 Oct 2012 21:48:59 +1100 > Steven D'Aprano wrote: >> >> Ah, I wondered if anyone else had picked up on that. When I read the PEP, >> I was concerned about the mental conflict between iteration and indexing >> of Path objects: given a Path p the sequence p[0] p[1] p[2] ... does >> something completely different from iterating over p directly. > > p[0] p[1] etc. are just TypeErrors: Ah, my mistake... I didn't register that you sequenced over the parts attribute, not the path itself. Sorry for the noise. -- Steven From ubershmekel at gmail.com Sun Oct 14 14:04:52 2012 From: ubershmekel at gmail.com (Yuval Greenfield) Date: Sun, 14 Oct 2012 14:04:52 +0200 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: <20121014130318.7140255e@pitrou.net> References: <20121005202534.5f721292@pitrou.net> <20121006190821.02ae50cd@pitrou.net> <387F855A-B5BD-491D-9FA5-918F4042332F@gmail.com> <50707AD5.3030407@stoneleaf.us> <50707F91.6060301@stoneleaf.us> <50786E42.6050308@stoneleaf.us> <20121012214224.55f3ed27@pitrou.net> <50787E8A.2090804@stoneleaf.us> <20121012225306.295d93e6@pitrou.net> <507884F0.2060608@stoneleaf.us> <507A989B.9020206@pearwood.info> <20121014130318.7140255e@pitrou.net> Message-ID: On Sun, Oct 14, 2012 at 1:03 PM, Antoine Pitrou wrote: > On Sun, 14 Oct 2012 21:48:59 +1100 > Steven D'Aprano wrote:> -1 on iteration over the > children. Instead, use: > > > > for child in p.walk(): > > ... > > > > which has the huge benefit that the walk method can take arguments as > > needed, such as the args os.walk takes: > > > > topdown=True, onerror=None, followlinks=False > > Judging by its name and signature, walk() would be a recursive > operation, while iterating on a path isn't (it only gets you the > children). > > Steven realized what currently happens and was suggesting doing it differently. Personally I really dislike the idea that [i for i in p][0] != p[0] It makes no sense to have this huge surprise. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From solipsis at pitrou.net Sun Oct 14 14:13:21 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sun, 14 Oct 2012 14:13:21 +0200 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: References: <20121005202534.5f721292@pitrou.net> <20121006190821.02ae50cd@pitrou.net> <387F855A-B5BD-491D-9FA5-918F4042332F@gmail.com> <50707AD5.3030407@stoneleaf.us> <50707F91.6060301@stoneleaf.us> <50786E42.6050308@stoneleaf.us> <20121012214224.55f3ed27@pitrou.net> <50787E8A.2090804@stoneleaf.us> <20121012225306.295d93e6@pitrou.net> <507884F0.2060608@stoneleaf.us> <507A989B.9020206@pearwood.info> <20121014130318.7140255e@pitrou.net> Message-ID: <1350216801.3484.0.camel@localhost.localdomain> Le dimanche 14 octobre 2012 ? 14:04 +0200, Yuval Greenfield a ?crit : > > Steven realized what currently happens and was suggesting doing it > differently. > > > Personally I really dislike the idea that > > > [i for i in p][0] != p[0] > > > It makes no sense to have this huge surprise. Again, p[0] just raises TypeError. Regards Antoine. From steve at pearwood.info Sun Oct 14 14:45:42 2012 From: steve at pearwood.info (Steven D'Aprano) Date: Sun, 14 Oct 2012 23:45:42 +1100 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: <1350216801.3484.0.camel@localhost.localdomain> References: <20121005202534.5f721292@pitrou.net> <20121006190821.02ae50cd@pitrou.net> <387F855A-B5BD-491D-9FA5-918F4042332F@gmail.com> <50707AD5.3030407@stoneleaf.us> <50707F91.6060301@stoneleaf.us> <50786E42.6050308@stoneleaf.us> <20121012214224.55f3ed27@pitrou.net> <50787E8A.2090804@stoneleaf.us> <20121012225306.295d93e6@pitrou.net> <507884F0.2060608@stoneleaf.us> <507A989B.9020206@pearwood.info> <20121014130318.7140255e@pitrou.net> <1350216801.3484.0.camel@localhost.localdomain> Message-ID: <507AB3F6.9000308@pearwood.info> On 14/10/12 23:13, Antoine Pitrou wrote: > Le dimanche 14 octobre 2012 ? 14:04 +0200, Yuval Greenfield a ?crit : >> >> Steven realized what currently happens and was suggesting doing it >> differently. >> >> >> Personally I really dislike the idea that >> >> >> [i for i in p][0] != p[0] >> >> >> It makes no sense to have this huge surprise. > > Again, p[0] just raises TypeError. Well, that's two people so far who have conflated "p.parts" as just p. Perhaps that's because "parts" is so similar to "path". Since we already refer to the bits of a path as "path components", perhaps this bike shed ought to be spelled "p.components". It's longer, but I bet nobody will miss it. -- Steven From shibturn at gmail.com Sun Oct 14 14:48:30 2012 From: shibturn at gmail.com (Richard Oudkerk) Date: Sun, 14 Oct 2012 13:48:30 +0100 Subject: [Python-ideas] re-implementing Twisted for fun and profit In-Reply-To: <507A4DA1.2070701@canterbury.ac.nz> References: <40862DD9-DF71-4280-A47F-B20E7E742254@twistedmatrix.com> <507A4DA1.2070701@canterbury.ac.nz> Message-ID: On 14/10/2012 6:29am, Greg Ewing wrote: > Not sure if this is relevant, but I'd just like to point out > that the behaviour of select() in this respect is actually > *edge triggered* by this definition. Once it has reported that > a given file descriptor is ready, it *won't* report that file > descriptor again until you do something with it. This can be > a subtle source of bugs in select-based code if you're not > aware of it. 
Unless I have misunderstood you, the following example contradicts that: >>> import os, select >>> r, w = os.pipe() >>> os.write(w, b"hello") 5 >>> select.select([r], [], []) ([3], [], []) >>> select.select([r], [], []) ([3], [], []) -- Richard From ironfroggy at gmail.com Sun Oct 14 15:09:10 2012 From: ironfroggy at gmail.com (Calvin Spealman) Date: Sun, 14 Oct 2012 09:09:10 -0400 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: <507AB3F6.9000308@pearwood.info> References: <20121005202534.5f721292@pitrou.net> <20121006190821.02ae50cd@pitrou.net> <387F855A-B5BD-491D-9FA5-918F4042332F@gmail.com> <50707AD5.3030407@stoneleaf.us> <50707F91.6060301@stoneleaf.us> <50786E42.6050308@stoneleaf.us> <20121012214224.55f3ed27@pitrou.net> <50787E8A.2090804@stoneleaf.us> <20121012225306.295d93e6@pitrou.net> <507884F0.2060608@stoneleaf.us> <507A989B.9020206@pearwood.info> <20121014130318.7140255e@pitrou.net> <1350216801.3484.0.camel@localhost.localdomain> <507AB3F6.9000308@pearwood.info> Message-ID: On Sun, Oct 14, 2012 at 8:45 AM, Steven D'Aprano wrote: > On 14/10/12 23:13, Antoine Pitrou wrote: >> >> Le dimanche 14 octobre 2012 ? 14:04 +0200, Yuval Greenfield a ?crit : >>> >>> >>> Steven realized what currently happens and was suggesting doing it >>> differently. >>> >>> >>> Personally I really dislike the idea that >>> >>> >>> [i for i in p][0] != p[0] >>> >>> >>> It makes no sense to have this huge surprise. >> >> >> Again, p[0] just raises TypeError. > > > > Well, that's two people so far who have conflated "p.parts" as just p. > Perhaps that's because "parts" is so similar to "path". > > Since we already refer to the bits of a path as "path components", > perhaps this bike shed ought to be spelled "p.components". It's longer, > but I bet nobody will miss it. I would prefer to see p.split() It matches the existing os.path.split() better and I like the idea of a new library matching the old, to be an easier transition for brains. That said, it also looks too much like str.split() > -- > Steven > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas -- Read my blog! I depend on your acceptance of my opinion! I am interesting! http://techblog.ironfroggy.com/ Follow me if you're into that sort of thing: http://www.twitter.com/ironfroggy From shane at umbrellacode.com Sun Oct 14 15:18:29 2012 From: shane at umbrellacode.com (Shane Green) Date: Sun, 14 Oct 2012 06:18:29 -0700 Subject: [Python-ideas] re-implementing Twisted for fun and profit In-Reply-To: References: <40862DD9-DF71-4280-A47F-B20E7E742254@twistedmatrix.com> <507A4DA1.2070701@canterbury.ac.nz> Message-ID: <11A498C1-9ACE-4F64-8147-D3CE173EC279@umbrellacode.com> Not sure I follow, but yeah: select reports the state of the file-descriptor. While the descriptor is readable, every call to select will indicate that it's readable, etc. Shane Green www.umbrellacode.com 805-452-9666 | shane at umbrellacode.com On Oct 14, 2012, at 5:48 AM, Richard Oudkerk wrote: > On 14/10/2012 6:29am, Greg Ewing wrote: >> Not sure if this is relevant, but I'd just like to point out >> that the behaviour of select() in this respect is actually >> *edge triggered* by this definition. Once it has reported that >> a given file descriptor is ready, it *won't* report that file >> descriptor again until you do something with it. This can be >> a subtle source of bugs in select-based code if you're not >> aware of it. 
> > Unless I have misunderstood you, the following example contradicts that: > > >>> import os, select > >>> r, w = os.pipe() > >>> os.write(w, b"hello") > 5 > >>> select.select([r], [], []) > ([3], [], []) > >>> select.select([r], [], []) > ([3], [], []) > > -- > Richard > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Sun Oct 14 16:36:38 2012 From: guido at python.org (Guido van Rossum) Date: Sun, 14 Oct 2012 07:36:38 -0700 Subject: [Python-ideas] The async API of the future: yield-from In-Reply-To: <5078F6B1.2030309@canterbury.ac.nz> References: <5078F6B1.2030309@canterbury.ac.nz> Message-ID: On Fri, Oct 12, 2012 at 10:05 PM, Greg Ewing wrote: [Long sections snipped, all very clear] > Guido van Rossum wrote: >> (6) Spawning off multiple async subtasks >> >> Futures: >> f1 = subtask1(args1) # Note: no yield!!! >> f2 = subtask2(args2) >> res1, res2 = yield f1, f2 >> >> Yield-from: >> ?????????? >> >> *** Greg, can you come up with a good idiom to spell concurrency at >> this level? Your example only has concurrency in the philosophers >> example, but it appears to interact directly with the scheduler, and >> the philosophers don't return values. *** > > > I don't regard the need to interact directly with the scheduler > as a problem. That's because in the world I envisage, there would > only be *one* scheduler, for much the same reason that there can > really only be one async event handling loop in any given program. > It would be part of the standard library and have a well-known > API that everyone uses. > > If you don't want things to be that way, then maybe this is a > good use for yielding things to the scheduler. Yielding a generator > could mean "spawn this as a concurrent task". > > You could go further and say that yielding a tuple of generators > means to spawn them all concurrently, wait for them all to > complete and send back a tuple of the results. The yield-from > code would then look pretty much the same as the futures code. Sadly it looks that r = yield from (f1(), f2()) ends up interpreting the tuple as the iterator, and you end up with r = (f1(), f2()) (i.e., a tuple of generators) rather than the desired r = ((yield from f1()), (yield from f2())) > However, I'm inclined to think that this is too much functionality > to build directly into the scheduler, and that it would be better > provided by a class or function that builds on more primitive > facilities. Possibly. In NDB it is actually a very common operation which looks quite elegant. But your solution below is fine (and helps by giving people a specific entry in the documentation they can look up!) > So it would look something like > > Yield-from: > task1 = subtask1(args1) > task2 = subtask2(args2) > res1, res2 = yield from par(task1, task2) > > where the implementation of par() is left as an exercise for > the reader. So, can par() be as simple as def par(*args): results = [] for task in args: result = yield from task results.append(result) return results ??? Or does it need to interact with the scheduler to ensure fairness? (Not having built one of these, my intuition for how the primitives fit together is still lacking, so excuse me for asking naive questions.) 
Of course there's the question of what to do when one of the tasks raises an error -- I haven't quite figured that out in NDB either, it runs all the tasks to completion but the caller only sees the first exception. I briefly considered having an "multi-exception" but it felt too weird -- though I'm not married to that decision. >> (7) Checking whether an operation is already complete >> >> Futures: >> if f.done(): ... > > > I'm inclined to think that this is not something the > scheduler needs to be directly concerned with. If it's > important for one task to know when another task is completed, > it's up to those tasks to agree on a way of communicating > that information between them. > > Although... is there a way to non-destructively test whether > a generator is exhausted? If so, this could easily be provided > as a scheduler primitive. Nick answered this affirmatively. >> (8) Getting the result of an operation multiple times >> >> Futures: >> >> f = async_op(args) >> # squirrel away a reference to f somewhere else >> r = yield f >> # ... later, elsewhere >> r = f.result() > > > Is this really a big deal? What's wrong with having to store > the return value away somewhere if you want to use it > multiple times? I suppose that's okay. >> (9) Canceling an operation >> >> Futures: >> f.cancel() > > > This would be another scheduler primitive. > > Yield-from: > cancel(task) > > This would remove the task from the ready list or whatever > queue it's blocked on, and probably throw an exception into > it to give it a chance to clean up. Ah, of course. (I said I was asking newbie questions. Consider me your first newbie!) >> (10) Registering additional callbacks >> >> Futures: >> f.add_done_callback(callback) > > > Another candidate for a higher-level facility, I think. > The API might look something like > > Yield-from: > cbt = task_with_callbacks(task) > cbt.add_callback(callback) > yield from cbt.run() > > I may have a go at coming up with implementations for some of > these things and send them in later posts. Or better, add them to the tutorial. (Or an advanced tutorial, "common async patterns". That would actually be a useful collection of use cases for whatever we end up building.) Here's another pattern that I can't quite figure out. It started when Ben Darnell posted a link to Tornado's chat demo (https://github.com/facebook/tornado/blob/master/demos/chat/chatdemo.py). I didn't understand it and asked him offline what it meant. Essentially, it's a barrier pattern where multiple tasks (each representing a different HTTP request, and thus not all starting at the same time) render a partial web page and then block until a new HTTP request comes in that provides the missing info. (For technical reasons they only do this once, and then the browsers re-fetch the URL.) When the missing info is available, it must wake up all blocked task and give then the new info. I wrote a Futures-based version of this -- not the whole thing, but the block-until-more-info-and-wakeup part. Here it is (read 'info' for 'messages'): Each waiter executes this code when it is ready to block: f = Future() # Explicitly create a future! 
waiters.add(f) messages = yield f I'd write a helper for the first two lines: def register(): f = Future() waiters.add(f) return f Then the waiter's code becomes: messages = yield register() When new messages become available, the code just sends the same results to all those Futures: def wakeup(messages): for waiter in waiters: waiter.set_result(messages) waiters.clear() (OO sauce left to the reader. :-) If you wonder where the code is that hooks up the waiter.set_result() call with the yield, that's done by the scheduler: when a task yields a Future, it adds a callback to the Future that reschedules the task when the Future's result is set. Edge cases: - Were the waiter to lose interest, it could remove its Future from the list of waiters, but no harm is done leaving it around either. (NDB doesn't have this feature, but if you have a way to remove callbacks, setting the result of a Future that nobody cares about has no ill effect. You could even use a weak set...) - It's possible to broadcast an exception to all waiters by using waiter.set_exception(). -- --Guido van Rossum (python.org/~guido) From guido at python.org Sun Oct 14 16:39:41 2012 From: guido at python.org (Guido van Rossum) Date: Sun, 14 Oct 2012 07:39:41 -0700 Subject: [Python-ideas] The async API of the future: yield-from In-Reply-To: <507A65C4.9010709@canterbury.ac.nz> References: <507A65C4.9010709@canterbury.ac.nz> Message-ID: On Sun, Oct 14, 2012 at 12:12 AM, Greg Ewing wrote: > I've had some thoughts on why I'm uncomfortable > about this kind of pattern: > > data = yield sock.async_read(1024) > > The idea here is that sock.async_read() returns a > Future or similar object that performs the I/O and > waits for the result. > > However, reading the data isn't necessarily the point > at which the suspension actually occurs. If you're > using a select-style event loop, the async read > operation breaks down into > > 1. Wait for data to arrive on the socket > 2. Read the data > > So the implementation of sock.async_read() is going > to have to create another Future to handle waiting > for the socket to become ready. But then the outer > Future is an unnecessary complication, because you > could get the same effect by defining > > def async_read(self, length): > yield future_to_wait_for_fd(self.fd) > return os.read(self.fd, length) > > and calling it using > > data = yield from sock.async_read(1024) > > If Futures are to appear anywhere, they should only > be at the very bottom layer, at the transition > between generator and non-generator code. And the > place where that transition occurs depend on how > the lower levels are implemented. If you're using > IOCP instead of select, for example, you need to > do things the other way around: > > 1. Start the read operation > 2. Wait for it to complete > > So I feel that all public APIs should be functions > called using yield-from, leaving it up to the > implementation to decide if and where Futures > become involved. A logical and consistent conclusion. I actually agree: in NDB, where all I have is "yield " I have a similar guideline: all public async APIs return a Future and must be waited on using yield, and only at the lowest level are other types primitives involved (bare App Engine RPCs, callbacks). 
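(To make that layering concrete, here is a minimal sketch. It assumes a scheduler that treats a yielded Future as "suspend me until this completes"; add_reader() is a hypothetical plain-callback reactor call, and concurrent.futures.Future is only a stand-in for whatever Future class the event loop provides.)

    import os
    from concurrent.futures import Future   # stand-in for the loop's own Future

    def wait_readable(reactor, fd):
        # Lowest layer: the only place that touches the reactor's
        # plain-callback API.
        f = Future()
        reactor.add_reader(fd, f.set_result, None)
        return f

    def async_read(reactor, fd, length):
        # Public API, per the guideline above: a generator driven by the
        # scheduler; callers write
        #     data = yield from async_read(reactor, fd, 1024)
        yield wait_readable(reactor, fd)
        return os.read(fd, length)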
-- --Guido van Rossum (python.org/~guido) From ethan at stoneleaf.us Sun Oct 14 16:50:06 2012 From: ethan at stoneleaf.us (Ethan Furman) Date: Sun, 14 Oct 2012 07:50:06 -0700 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: <507AA056.7020907@pearwood.info> References: <20121005202534.5f721292@pitrou.net> <20121006190821.02ae50cd@pitrou.net> <387F855A-B5BD-491D-9FA5-918F4042332F@gmail.com> <50707AD5.3030407@stoneleaf.us> <50707F91.6060301@stoneleaf.us> <50786E42.6050308@stoneleaf.us> <20121012214224.55f3ed27@pitrou.net> <50787E8A.2090804@stoneleaf.us> <20121012225306.295d93e6@pitrou.net> <507884F0.2060608@stoneleaf.us> <507A989B.9020206@pearwood.info> <20121014130318.7140255e@pitrou.net> <507AA056.7020907@pearwood.info> Message-ID: <507AD11E.4080503@stoneleaf.us> Steven D'Aprano wrote: > On 14/10/12 22:03, Antoine Pitrou wrote: >> On Sun, 14 Oct 2012 21:48:59 +1100 >> Steven D'Aprano wrote: >>> >>> Ah, I wondered if anyone else had picked up on that. When I read the >>> PEP, >>> I was concerned about the mental conflict between iteration and >>> indexing >>> of Path objects: given a Path p the sequence p[0] p[1] p[2] ... does >>> something completely different from iterating over p directly. >> >> p[0] p[1] etc. are just TypeErrors: > > > Ah, my mistake... I didn't register that you sequenced over the parts > attribute, not the path itself. Sorry for the noise. > > > I actually prefer Steven's interpretation. If we are going to iterate directly on a path object, we should be yeilding the pieces of the path object. After all, a path can contain a file name (most of mine do) and what sense does it make to iterate over the children of /usr/home/ethanf/some_table.dbf? ~Ethan~ From ironfroggy at gmail.com Sun Oct 14 17:01:15 2012 From: ironfroggy at gmail.com (Calvin Spealman) Date: Sun, 14 Oct 2012 11:01:15 -0400 Subject: [Python-ideas] The async API of the future: Reactors In-Reply-To: References: <20121012203311.4b3ee8af@pitrou.net> Message-ID: On Fri, Oct 12, 2012 at 3:32 PM, Guido van Rossum wrote: > On Fri, Oct 12, 2012 at 11:33 AM, Antoine Pitrou wrote: >> On Fri, 12 Oct 2012 11:13:23 -0700 >> Guido van Rossum wrote: >>> OTOH someone else might prefer a buffered stream >>> abstraction that just keeps filling its read buffer (and draining its >>> write buffer) using level-triggered callbacks, at least up to a >>> certain buffer size -- we have to be robust here and make it >>> impossible for an evil client to fill up all our memory without our >>> approval! >> >> I'd like to know what a sane buffered API for non-blocking I/O may look >> like, because right now it doesn't seem to make a lot of sense. At >> least this bug is tricky to resolve: >> http://bugs.python.org/issue13322 > > Good question. It actually depends quite a bit on whether you have an > event loop or not -- with the help of an event loop, you can have a > level-triggered callback that fills the buffer behind your back (up to > a given limit, at which point it should unregister the I/O object); > that bug seems to be about a situation without an event loop, where > you can't do that. Also the existing io module design never > anticipated cooperation with an event loop. > >>> - There's an abstract Reactor class and an abstract Async I/O object >>> class. To get a reactor to call you back, you must give it an I/O >>> object, a callback, and maybe some more stuff. (I have gone back and >>> like passing optional args for the callback, rather than requiring >>> lambdas to create closures.) 
Note that the callback is *not* a >>> designated method on the I/O object! >> >> Why isn't it? In practice, you need several callbacks: in Twisted >> parlance, you have dataReceived but also e.g. ConnectionLost >> (depending on the transport, you may even imagine other callbacks, for >> example for things happening on the TLS layer?). > > Yes, but I really want to separate the callbacks from the object, so > that I don't have to inherit from an I/O object class -- asyncore > requires this and IMO it's wrong. It also makes it harder to use the > same callback code with different types of I/O objects. Why is subclassing a problem? It can be overused, but seems the right thing to do in this case. You want a protocol that responds to new data by echoing and tells the user when the connection was terminated? It makes sense that this is a subclass: a special case of some class that handles the base behavior. What if this was just an optional way and we could also provide a helper to attach handlers to the base class instance without subclassing it? The function registering it could take keyword arguments mapping additional event->callbacks to the object. >>> - In systems supporting file descriptors, there's a reactor >>> implementation that knows how to use select/poll/etc., and there are >>> concrete I/O object classes that wrap file descriptors. On Windows, >>> those would only be socket file descriptors. On Unix, any file >>> descriptor would do. >> >> Windows *is* able to do async I/O on things other than sockets (see the >> discussion about IOCP). It's just that the Windows implementation of >> select() (the POSIX function call) is limited to sockets. > > I know, but IOCP is currently not supported in the stdlib. I expect > that on Windows, to use IOCP, you'd need to use a different reactor > implementation and a different I/O object than the vanilla fd-based > ones. My design is actually *inspired* by the desire to support this > cleanly. > > -- > --Guido van Rossum (python.org/~guido) > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas -- Read my blog! I depend on your acceptance of my opinion! I am interesting! http://techblog.ironfroggy.com/ Follow me if you're into that sort of thing: http://www.twitter.com/ironfroggy From guido at python.org Sun Oct 14 17:11:46 2012 From: guido at python.org (Guido van Rossum) Date: Sun, 14 Oct 2012 08:11:46 -0700 Subject: [Python-ideas] The async API of the future: Twisted and Deferreds In-Reply-To: <20121014124327.71a71a09@pitrou.net> References: <20121013081445.40d6d78f@pitrou.net> <20121014124327.71a71a09@pitrou.net> Message-ID: On Sun, Oct 14, 2012 at 3:43 AM, Antoine Pitrou wrote: > Subclassing IOObject would be wrong, since the user isn't writing an IO > object in the first place. But subclassing a separate class, like > Twisted's Protocol (which is mostly an empty shell, really), would sound > reasonable to me. It's a possible style. I'm inclined not to follow this example but I could go either way. One thing that somewhat worries me is that the names of these methods will be baked forever into all user code. As a user I prefer to have control over the names of my methods; first, there's the style issue (e.g. I'm always conflicted over what style to use in unittest.TestCase subclasses, since its own style is setUp, tearDown); second, in my app there may be a much better name for what the method does than e.g. data_received(). 
(Not to mention that that's another adjective used as a verb. ;-) -- --Guido van Rossum (python.org/~guido) From solipsis at pitrou.net Sun Oct 14 17:16:40 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sun, 14 Oct 2012 17:16:40 +0200 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths References: <20121005202534.5f721292@pitrou.net> <20121006190821.02ae50cd@pitrou.net> <387F855A-B5BD-491D-9FA5-918F4042332F@gmail.com> <50707AD5.3030407@stoneleaf.us> <50707F91.6060301@stoneleaf.us> <50786E42.6050308@stoneleaf.us> <20121012214224.55f3ed27@pitrou.net> <50787E8A.2090804@stoneleaf.us> <20121012225306.295d93e6@pitrou.net> <507884F0.2060608@stoneleaf.us> <507A989B.9020206@pearwood.info> <20121014130318.7140255e@pitrou.net> <507AA056.7020907@pearwood.info> <507AD11E.4080503@stoneleaf.us> Message-ID: <20121014171640.06ee1143@pitrou.net> On Sun, 14 Oct 2012 07:50:06 -0700 Ethan Furman wrote: > Steven D'Aprano wrote: > > On 14/10/12 22:03, Antoine Pitrou wrote: > >> On Sun, 14 Oct 2012 21:48:59 +1100 > >> Steven D'Aprano wrote: > >>> > >>> Ah, I wondered if anyone else had picked up on that. When I read the > >>> PEP, > >>> I was concerned about the mental conflict between iteration and > >>> indexing > >>> of Path objects: given a Path p the sequence p[0] p[1] p[2] ... does > >>> something completely different from iterating over p directly. > >> > >> p[0] p[1] etc. are just TypeErrors: > > > > > > Ah, my mistake... I didn't register that you sequenced over the parts > > attribute, not the path itself. Sorry for the noise. > > > > > > > > I actually prefer Steven's interpretation. If we are going to iterate > directly on a path object, we should be yeilding the pieces of the path > object. > After all, a path can contain a file name (most of mine do) and > what sense does it make to iterate over the children of > /usr/home/ethanf/some_table.dbf? Well, given that: 1. sequence access (including the iterator protocol) to the path's parts is already provided through the ".parts" property 2. it makes little sense to actually iterate over those parts (what operations are you going to do sequentially over '/', then 'home', then 'ethanf', etc.?) ... I think yielding the directory contents is a much more useful alternative when iterating over the path itself. Regards Antoine. -- Software development and contracting: http://pro.pitrou.net From _ at lvh.cc Sun Oct 14 17:29:27 2012 From: _ at lvh.cc (Laurens Van Houtven) Date: Sun, 14 Oct 2012 17:29:27 +0200 Subject: [Python-ideas] The async API of the future: PEP 3153 (async-pep) In-Reply-To: References: Message-ID: On Sun, Oct 14, 2012 at 4:39 AM, Guido van Rossum wrote: > Odd. Were those people experienced in writing / reviewing PEPs? > There were a few. Some of them were. Unfortunately the prevalent reason was politics: "make it clear that you're not just trying to get twisted in the stdlib". Given that that's been suggested both on and off-list, both now and then, I guess that wasn't entirely unreasonable (but not providing things to play with was -- the experience was just so bad I pretty much never got there). > >> > Do you feel that there should be less talk about rationale? > >> > >> No, but I feel that there should be some actual specification. I am > >> also looking forward to an actual meaty bit of example code -- ISTR > >> you mentioned you had something, but that it was incomplete, and I > >> can't find the link. > > > > Just examples of how it would work, nothing hooked up to real code. 
My > > memory of it is more of a drowning-in-politics-and-bikeshedding kind of > > thing, unfortunately :) Either way, I'm okay with letting bygones be > bygones > > and focus on how we can get this show on the road. > > Shall I just reject PEP 3153 so it doesn't distract people? Of course > we can still refer to it when people ask for a rationale for the > separation between transports and protocols, but it doesn't seem the > PEP itself is going to be finished (correct me if I'm wrong), and as > it stands it is not useful as a software specification. > I'm not sure that's necessary; these threads show a lot of willpower to get it done (even though that's not enough), and it's pretty easy to edit. You're certainly right that right now it's not a useful software spec; but neither would an empty new PEP be ;) --Guido van Rossum (python.org/~guido) > cheers lvh -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Sun Oct 14 17:53:15 2012 From: guido at python.org (Guido van Rossum) Date: Sun, 14 Oct 2012 08:53:15 -0700 Subject: [Python-ideas] The async API of the future: Twisted and Deferreds In-Reply-To: References: Message-ID: On Sat, Oct 13, 2012 at 4:42 PM, Devin Jeanpierre wrote: > There has to be some way to contract emails sent in discussions rather > than exploding them. I swear I'm trying to be concise, yet readable. > It's not working. Don't worry too much. I took essentially all Friday starting those four new threads. I am up at night thinking about the issues. I can't expect everyone else to have this much time to devote to Python! > On Fri, Oct 12, 2012 at 6:11 PM, Guido van Rossum wrote: >> I also don't doubt that using classic Futures you can't do this -- the >> chaining really matter for this style, and I presume this (modulo >> unimportant API differences) is what typical Twisted code looks like. > > My experience has been unfortunately rather devoid of deferreds in > Twisted. I always feel like the odd one out when people discuss this > confusion. For me, it was all Protocol this and Protocol that, and > deferreds only came up when I used Twisted's great AMP (Asynchronous > Messaging Protocol) library. Especially odd since you jumped into the discussion when I called Deferreds a bad name. :-) >> However, Python has yield, and you can do much better (I'll write >> plain yield for now, but it works the same with yield-from): >> >> try: >> value1 = yield step1() >> value2 = yield step2(value1) >> value3 = yield step3(value2) >> # Do something with value4 >> except Exception: >> # Handle any error from step1 through step4 >> > --snip-- >> >> This form is more flexible, since it is easier to catch different >> exceptions at different points. It is also much easier to pass extra >> information around. E.g. what if your flow ends up having to pass both >> value1 and value2 into step3()? Sure, you can do that by making value2 >> a tuple (or a dict, or an object) incorporating value1 and the >> original value2, but that's exactly where this style becomes >> cumbersome, whereas in the yield-based form, such things can remain >> simple local variables. All in all I find it more readable. > > Well, first of all, deferreds have ways of joining values together. For example: > > from __future__ import print_function > from twisted.internet import defer > > def example_joined(): > d1 = defer.Deferred() > d2 = defer.Deferred() > # consumeErrors looks scary, but it only means that > # d1 and d2's errbacks aren't called. 
Instead, the error is sent to d's > # errback. > d = defer.gatherResults([d1, d2], consumeErrors=True) > > d.addCallback(print) > d.addErrback(lambda v: print("ERROR!")) > > d1.callback("The first deferred has succeeded") > # now we're waiting on the second deferred to succeed, > # which we'll let the caller handle > return d2 > > example_joined().callback("The second deferred has succeeded too!") > print("==============") > example_joined().errback("The second deferred has failed...") I'm sorry, but that's not very readable at all. You needed a lambda (which if there was anything more would have to be expanded using 'def') and you're cheating by passing print as a callable (which saves you a second lambda, but only in this simple case). A readable version of this could should not have to use lambdas. > I agree it's easier to use the generator style in many complicated > cases. That doesn't preclude manual deferreds from also being useful. Yeah, but things should be as simple as they can. If you can do everything using plain callbacks, Futures and coroutines, why add Deferreds even if you can? (Except for backward compatibility of course. That's a totally different topic. But we're first defining the API of the future.) If Greg Ewing had his way we'd even do without Futures -- I'm still considering that bid. (In the yield-from thread I'm asking for common patterns that the new API should be able to solve.) >> So, in the end, for Python 3.4 and beyond, I want to promote a style >> that mixes simple callbacks (perhaps augmented with simple Futures) >> and generator-based coroutines (either PEP 342, yield/send-based, or >> PEP 380 yield-from-based). I'm looking to Twisted for the best >> reactors (see other thread). But for transport/protocol >> implementations I think that generator/coroutines offers a cleaner, >> better interface than incorporating Deferred. > > Egh. I mean, sure, supposed we have those things. But what if you want > to send the result of a callback to a generator-coroutine? Presumably > generator coroutines work by yielding deferreds and being called back > when the future resolves (deferred fires). No, they don't use deferreds. They use Futures. You've made it quite clear that they are very different. > But if those > futures/deferreds aren't unexposed, and instead only the generator > stuff is exposed, then bridging the gap between callbacks and > generator-coroutines is impossible. So every callback function has to > also be defined to use something else. And worse, other APIs using > callbacks are left in the dust. My plan is to expose the Futures *will* be exposed -- this is what worked well in NDB. > Suppose, OTOH, futures/deferreds are exposed. Then we can easily > bridge between callbacks and generators, by returning a future whose > `set_result` is the callback to our callback function (deferred whose > `callback` is the callback). And that's how NDB does it. I've got a question to Greg Ewing on how he does it. > But if we're exposing futures/deferreds, why have callbacks in the > first place? The difference between these two functions, is that the > second can be used in generator-coroutines trivially and the first > cannot: > > # callbacks: > reactor.timer(10, print, "hello world") > > # deferreds > reactor.timer(10).addCallback(print, "hello world") How about this: f = reactor.timer(10, f.set_result, None) Then whoever waits for f gets woken up in 10 seconds, and the reactor doesn't have to know what Futures are. 
But I believe your whole argument may be based on a misreading of my proposal. *I* want plain callbacks, Futures, and coroutines, and an event loop that only knows about plain callbacks and IO objects (e.g. sockets). > Now here's another thing: suppose we have a list of "deferred events", > but instead of handling all 10 at once, we want to handle them "as > they arrive", and then synthesize a result at the bottom. How do you > do this with pure generator coroutines? Let's ask Greg that. In NDB, I have a wait_any() function that you give a set of Futures and returns the first one that completes. It would be easy to build an iterator on top of this that takes a set of Futures and iterates over them in the order in which they are completed. > For example, perhaps I am implementing a game server, where all the > players choose their characters and then the game begins. Whenever a > character is chosen, everyone else has to know about it so that they > can plan their strategy based on who has chosen a character. Character > selections are final, just so that I can use deferreds (hee hee). > > I am imagining something like the following: > > # WRONG: handles players in a certain order, rather than as they come in > def player_lobby(reactor, players): > for player in players: > player_character = yield player.wait_for_confirm(reactor) > player.set_character(player_character) > # tell all the other players what character the player has chosen > notify_choice((player, player_character), players) > > start_game(players) > > This is wrong, because it goes in a certain order and "blocks" the > coroutine until every character is chosen. Players will not know who > has chosen what characters in an appropriate order. > > But hypothetically, maybe we could do the following: > > # Hypothetical magical code? > def player_lobby(reactor, players): > confirmation_events = > UnorderedEventList([player.wait_for_confirm(reactor) for player in > players]) > while confirmation_events: > player_character = yield confirmation_events.get_next() > player.set_character(player_character) > # tell all the other players what character the player has chosen > notify_choice((player, player_character), players) > > start_game(players) > > But then, how do we write UnorderedEventList? I don't really know. I > suspect I've made the problem harder, not easier! eek. Plus, it > doesn't even read very well. Especially not compared to the deferred > version: > > This is how I would personally do it in Twisted, without using > UnorderedEventList (no magic!): > > @inlineCallbacks > def player_lobby(reactor, players): > events = [] > for player in players: > confirm_event = player.wait_for_confirm(reactor) > @confirm_event.addCallback > def on_confirmation(player_character, player=player) > player.set_character(player_character) > # tell all the other players what character the player has chosen > notify_choice((player, player_character), players) > > yield gatherResults(events) > start_game(players) > > Notice how I dropped down into the level of manipulating deferreds so > that I could add this "as they come in" functionality, and then went > back. Actually it wouldn't've hurt much to just not bother with > inlineCallbacks at all. > > I don't think this is particularly unreadable. More importantly, I > actually know how to do it. I have no idea how I would do this without > using addCallback, or without reimplementing addCallback using > inlineCallbacks. Clearly we have an educational issue on our hands! 
:-) > And then, supposing we don't have these deferreds/futures exposed... > how do we implement delayed computation stuff from extension modules? > What if we want to do these kinds of compositions within said > extension modules? What if we want to write our own version of @tasks > or @inlineCallbacks with extra features, or generate callback chains > from XML files, and so on? > > I don't really like the prospect of having just the "sugary syntax" > available, without a flexible underlying representation also exposed. > I don't know if you've ever shared that worry -- sometimes the pretty > syntax gets in the way of getting stuff done. You're barking up the wrong tree -- please badger Greg Ewing with use cases in the yield-from thread. With my approach all of these can be done. (See the yield-from thread for an example I just posted of a barrier, where multiple tasks wait for a single event.) >> I hope that the path forward for Twisted will be simple enough: it >> should be possible to hook Deferred into the simpler callback APIs >> (perhaps a new implementation using some form of adaptation, but >> keeping the interface the same). In a sense, the greenlet/gevent crowd >> will be the biggest losers, since they currently write async code >> without either callbacks or yield, using microthreads instead. I >> wouldn't want to have to start putting yield back everywhere into that >> code. But the stdlib will still support yield-free blocking calls >> (even if under the hood some of these use yield/send-based or >> yield-from-based couroutines) so the monkey-patchey tradition can >> continue. > > Surely it's no harder to make yourself into a generator than to make > yourself into a low-level thread-like context switching function with > a saved callstack implemented by hand in assembler, and so on? > > I'm sure they'll be fine. The thing that worries me most is reimplementing httplib, urllib and so on to use all this new machinery *and* keep the old synchronous APIs working *even* if some code is written using the old style and some other code wants to use the new style. >>> 1. Explicit callbacks: >>> >>> For example, reactor.callLater(t, lambda: print("woo hoo")) >> >> I actually like this, as it's a lowest-common-denominator approach >> which everyone can easily adapt to their purposes. See the thread I >> started about reactors. > > Will do (but also see my response above about why not "everyone" can). > >>> 2. Method dispatch callbacks: >>> >>> Similar to the above, the reactor or somebody has a handle on your >>> object, and calls methods that you've defined when events happen >>> e.g. IProtocol's dataReceived method >> >> While I'm sure it's expedient and captures certain common patterns >> well, I like this the least of all -- calling fixed methods on an >> object sounds like a step back; it smells of the old Java way (before >> it had some equivalent of anonymous functions), and of asyncore, which >> (nearly) everybody agrees is kind of bad due to its insistence that >> you subclass its classes. (Notice how subclassing as the prevalent >> approach to structuring your code has gotten into a lot of discredit >> since 1996.) > > I only used asyncore once, indirectly, so I don't know anything about > it. I'm willing to dismiss it (and, in fact, various parts of twisted > (I'm looking at you twisted.words)) as not good examples of the > pattern. > > First of all, I'd like to separate the notion of subclassing and > method dispatch. They're entirely unrelated. 
If I pass my object to > you, and you call different methods depending on what happens > elsewhere, that's method dispatch. And my object doesn't have to be > subclassed or anything for it to happen. Agreed. Antoine made the same point elsewhere and I half conceded. > Now here's the thing. Suppose we're writing, for example, an IRC bot. > (Everyone loves IRC bots.) (For the record, I hate IRC, the software, the culture, the interaction style. But maybe I'm unusual that way. :-) > My IRC bot needs to handle several > different possible events, such as: > > private messages > channel join event > CTCP event > > and so on. My event handlers for each of these events probably > manipulate some internal state (such as a log file, or a GUI). We'd > probably organize this as a class, or else as a bunch of functions > accessing global state. Or, perhaps a collection of closures. This > last one is pretty unlikely. I certainly wouldn't recommend collections of closures for that! > For the most part, these functions are all intrinsically related and > can't be sensibly treated separately. You can't take the private > message callback of Bot A, and the channel join callback of bot B, and > register these and expect a result that makes sense. > > If we look at this, we're expecting to deal with a set of functions > that manage shared data. The abstraction for this is usually an > object, and we'd really probably write the callbacks in a class unless > we were being contrarian. And it's not too crazy for the dispatcher to > know this and expect you to write it as a class that supports a > certain interface (certain methods correspond to certain events). > Missing methods can be assumed to have the empty implementation (no > subclassing, just catching AttributeError). > > This isn't too much of an imposition on the user -- any collection of > functions (with shared state via globals or closure variables) can be > converted to an object with callable attributes very simply (thanks to > types.SimpleNamespace, especially). And I only really think this is OK > when writing it as an object -- as a collection of functions with > shared state -- is the eminently obvious primary use case, so that > that situation wouldn't come up very often. > > So, as an example, a protocol that passes data on further down the > line needs to be notified when data is received, but also when the > connection begins and ends. So the twisted protocol interface has > "dataReceived", "connectionMade", and "connectionLost" callbacks. > These really do belong together, they manage a single connection > between computers and how it gets mapped to events usable by a twisted > application. So I like the convenience and suggestiveness of them all > being methods on an object. There's also a certain order to them, right? I'd think the state transition diagram is something like connectionMade (1); dataReceived (*); connectionLost (1) I wonder if there are any guarantees that they will only be called in this order, and who is supposed to enforce this? If would be awkward if the user code would have to guard itself against this; also if the developer made an unwarranted assumption (e.g. dataReceived is called at least once). >>> 4. Generator coroutines >>> >>> These are a syntactic wrapper around deferreds. If you yield a >>> deferred, you will be sent the result if the deferred succeeds, or an >>> exception if the deferred fails. >>> e.g. 
examples from previous message >> >> Seeing them as syntactic sugar for Deferreds is one way of looking at >> it; no doubt this is how they're seen in the Twisted community because >> Deferreds are older and more entrenched. But there's no requirement >> that an architecture has to have Deferreds in order to use generator >> coroutines -- simple Futures will do just fine, and Greg Ewing has >> shown that using yield-from you can even do without those. (But he >> does use simple, explicit callbacks at the lowest level of his >> system.) > > I meant it as a factual explanation of what generator coroutines are > in Twisted, not what they are in general. Sorry for the confusion. We > are probably agreed here. > > After a cursory examination, I don't really understand Greg Ewing's > thing. I'd have to dig deeper into the logs for when he first > introduced it. Please press him for explanations. Ask questions. He knows his dream best of all. We need to learn. >> I'd like to come back to that Django example though. You are implying >> that there are some opportunities for concurrency here, and I agree, >> assuming we believe disk I/O is slow enough to bother making it >> asynchronously. (In App Engine it's not, and we can't anyways, but in >> other contexts I agree that it would be bad if a slow disk seek were >> to hold up all processing -- not to mention that it might really be >> NFS...) >> > --snip-- >> How would you code that using Twisted Deferreds? > > Well. I'd replace the @task in your NDB thing with @inlineCallbacks > and call it a day. ;) > > (I think there's enough deferred examples above, and I'm getting tired > and it's been a day since I started writing this damned email.) No problem. Same here. :-) >>> For that stuff, you'd have to speak to the main authors of Twisted. >>> I'm just a twisted user. :( >> >> They seem to be mostly ignoring this conversation, so your standing in >> as a proxy for them is much appreciated! > > Well. We are on Python-Ideas... :( Somehow we got Itamar and Glyph to join, so I think we're covered! >>> In the end it really doesn't matter what API you go with. The Twisted >>> people will wrap it up so that they are compatible, as far as that is >>> possible. >> >> And I want to ensure that that is possible and preferably easy, if I >> can do it without introducing too many warts in the API that >> non-Twisted users see and use. > > I probably lack the expertise to help too much with this. I can point > out anything that sticks out, if/when an extended futures proposal is > made. You've done great in increasing my understanding of Twisted and Deferred. Thank you very much! 
-- --Guido van Rossum (python.org/~guido) From ethan at stoneleaf.us Sun Oct 14 17:48:53 2012 From: ethan at stoneleaf.us (Ethan Furman) Date: Sun, 14 Oct 2012 08:48:53 -0700 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: <20121014171640.06ee1143@pitrou.net> References: <20121005202534.5f721292@pitrou.net> <20121006190821.02ae50cd@pitrou.net> <387F855A-B5BD-491D-9FA5-918F4042332F@gmail.com> <50707AD5.3030407@stoneleaf.us> <50707F91.6060301@stoneleaf.us> <50786E42.6050308@stoneleaf.us> <20121012214224.55f3ed27@pitrou.net> <50787E8A.2090804@stoneleaf.us> <20121012225306.295d93e6@pitrou.net> <507884F0.2060608@stoneleaf.us> <507A989B.9020206@pearwood.info> <20121014130318.7140255e@pitrou.net> <507AA056.7020907@pearwood.info> <507AD11E.4080503@stoneleaf.us> <20121014171640.06ee1143@pitrou.net> Message-ID: <507ADEE5.3050800@stoneleaf.us> Antoine Pitrou wrote: > On Sun, 14 Oct 2012 07:50:06 -0700 > Ethan Furman wrote: > >> Steven D'Aprano wrote: >> >>> On 14/10/12 22:03, Antoine Pitrou wrote: >>> >>>> On Sun, 14 Oct 2012 21:48:59 +1100 >>>> Steven D'Aprano wrote: >>>> >>>>> Ah, I wondered if anyone else had picked up on that. When I read the >>>>> PEP, >>>>> I was concerned about the mental conflict between iteration and >>>>> indexing >>>>> of Path objects: given a Path p the sequence p[0] p[1] p[2] ... does >>>>> something completely different from iterating over p directly. >>>>> >>>> p[0] p[1] etc. are just TypeErrors: >>>> >>> Ah, my mistake... I didn't register that you sequenced over the parts >>> attribute, not the path itself. Sorry for the noise. >>> >>> >>> >>> >> I actually prefer Steven's interpretation. If we are going to iterate >> directly on a path object, we should be yeilding the pieces of the path >> object. >> After all, a path can contain a file name (most of mine do) and >> what sense does it make to iterate over the children of >> /usr/home/ethanf/some_table.dbf? >> > > Well, given that: > > 1. sequence access (including the iterator protocol) to the path's > parts is already provided through the ".parts" property > > 2. it makes little sense to actually iterate over those parts (what > operations are you going to do sequentially over '/', then 'home', then > 'ethanf', etc.?) > > ... I think yielding the directory contents is a much more useful > alternative when iterating over the path itself. > > Regards > > Antoine. > Useful, sure. Still potentially confusing. I'm perfectly happy with not allowing any default iteration at all. What behavior can I expect with your Path implementation when I try to iterate over /usr/home/ethanf/some_table.dbf ? ~Ethan~ From daniel.mcdougall at liftoffsoftware.com Sun Oct 14 18:03:27 2012 From: daniel.mcdougall at liftoffsoftware.com (Daniel McDougall) Date: Sun, 14 Oct 2012 12:03:27 -0400 Subject: [Python-ideas] The async API of the future: Some thoughts from an ignorant Tornado user In-Reply-To: References: Message-ID: On Sun, Oct 14, 2012 at 5:32 AM, Laurens Van Houtven <_ at lvh.cc> wrote: >> import async # The API of the future ;) >> async.async_call(retrieve_log_playback, settings, tws, >> mechanism=multiprocessing) >> # tws == instance of tornado.web.WebSocketHandler that holds the open >> connection > > > Is this a CPU-bound problem? It depends on the host. On embedded platforms (e.g. the BeagleBone) it is more IO-bound than CPU bound (fast CPU but slow disk and slow memory). On regular x86 systems it is mostly CPU-bound. 
>> * I should be able to choose the type of event loop/async mechanism >> that is appropriate for the task: For CPU-bound tasks I'll probably >> want to use multiprocessing. For IO-bound tasks I might want to use >> threading. For a multitude of tasks that "just need to be async" (by >> nature) I'll want to use an event loop. > > > Ehhh, maybe. This sounds like it confounds the tools for different use > cases. You can quite easily have threads and processes on top of an event > loop; that works out particularly nicely for processes because you still > have to talk to your processes. > > Examples: > > twisted.internet.reactor.spawnProcess (local processes) > twisted.internet.threads.deferToThread (local threads) > ampoule (remote processes) > > It's quite easy to do blocking IO in a thread with deferToThread; in fact, > that's how twisted's adbapi, an async wrapper to dbapi, works. As I understand it, twisted.internet.reactor.spawnProcess is all about spawning subprocesses akin to subprocess.Popen(). Also, it requires writing a sophisticated ProcessProtocol. It seems to be completely unrelated and wickedly complicated. The complete opposite of what I would consider ideal for an asynchronous library since it is anything but simple. I mean, I could write a separate program to generate HTML playback files from logs, spawn a subprocess in an asynchronous fashion, then watch it for completion but I could do that with termio.Multiplex (see: https://github.com/liftoff/GateOne/blob/master/gateone/termio.py) . deferToThread() does what one would expect but in many situations I'd prefer something like deferToMultiprocessing(). >> * It should support publish/subscribe-style events (i.e. an event >> dispatcher). For example, the ability to watch a file descriptor or >> socket for changes in state and call a function when that happens. >> Preferably with the flexibility to define custom events (i.e don't >> have it tied to kqueue/epoll-specific events). > > > Like connectionMade, connectionLost, dataReceived etc? Oh there's a hundred different ways to fire and catch events. I'll let the low-level async experts decide which is best. Having said that, it would be nice if the interface didn't use such network-specific naming conventions. I would prefer something more generic. It is fine if it uses sockets and whatnot in the background. -- Dan McDougall - Chief Executive Officer and Developer Liftoff Software ? Your flight to the cloud is now boarding. From _ at lvh.cc Sun Oct 14 18:11:38 2012 From: _ at lvh.cc (Laurens Van Houtven) Date: Sun, 14 Oct 2012 18:11:38 +0200 Subject: [Python-ideas] The async API of the future: Some thoughts from an ignorant Tornado user In-Reply-To: References: Message-ID: On Sun, Oct 14, 2012 at 6:03 PM, Daniel McDougall < daniel.mcdougall at liftoffsoftware.com> wrote: > deferToThread() does what one would expect but in many situations I'd > prefer something like deferToMultiprocessing(). > Twisted sort of has that with ampoule. The main issue is that arbitrary object serialization is pretty much impossible. Within threads, you sidestep that issue completely; across processes, you have to do deal with serialization, leading to the issues with pickle you've mentioned. I would prefer something more generic. > So maybe something like is popular in JS, where you subscribe to events by some string identifier? I personally use and like AngularJS' $broadcast, $emit and $on -- quite nice, but depedant on a hierarchical structure that seems to be missing here. 
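A reactor-agnostic version of that is only a few lines; everything below (names included) is just an illustrative sketch:

from collections import defaultdict

class EventDispatcher:
    # Minimal publish/subscribe hub keyed by event name; not tied to
    # epoll/kqueue or any particular reactor.
    def __init__(self):
        self._subscribers = defaultdict(list)

    def on(self, event, callback):
        # Register a callback for a custom, string-identified event.
        self._subscribers[event].append(callback)
        return callback          # so it can double as a decorator

    def off(self, event, callback):
        self._subscribers[event].remove(callback)

    def emit(self, event, *args, **kwargs):
        # Whoever detects the state change (reactor, thread, timer)
        # fires the event by name; subscribers never see the details.
        for callback in list(self._subscribers[event]):
            callback(*args, **kwargs)

# The event name below is made up for illustration.
events = EventDispatcher()
events.on('terminal:resized', lambda rows, cols: print(rows, cols))
events.emit('terminal:resized', 24, 80)
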
> -- > Dan McDougall - Chief Executive Officer and Developer > Liftoff Software ? Your flight to the cloud is now boarding. > -- cheers lvh -------------- next part -------------- An HTML attachment was scrubbed... URL: From ericsnowcurrently at gmail.com Sun Oct 14 18:15:26 2012 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Sun, 14 Oct 2012 10:15:26 -0600 Subject: [Python-ideas] yield from multiple iterables (was Re: The async API of the future: yield-from) In-Reply-To: References: Message-ID: On Oct 14, 2012 8:42 AM, "Guido van Rossum" wrote: > Sadly it looks that > > r = yield from (f1(), f2()) > > ends up interpreting the tuple as the iterator, and you end up with > > r = (f1(), f2()) > > (i.e., a tuple of generators) rather than the desired > > r = ((yield from f1()), (yield from f2())) Didn't want this tangent to get lost to the async discussion. Would it be too late to make a change along these lines? Would it be enough of an improvement to be warranted? -eric -------------- next part -------------- An HTML attachment was scrubbed... URL: From _ at lvh.cc Sun Oct 14 18:18:52 2012 From: _ at lvh.cc (Laurens Van Houtven) Date: Sun, 14 Oct 2012 18:18:52 +0200 Subject: [Python-ideas] The async API of the future: Twisted and Deferreds In-Reply-To: References: Message-ID: On Sun, Oct 14, 2012 at 5:53 PM, Guido van Rossum wrote: > A readable version of this could should not have to use lambdas. > In a lot of Twisted code, it happens with methods as callback methods, something like: d = self._doRPC(....) d.addCallbacks(self._formatResponse, self._formatException) d.addCallback(self._finish) That doesn't talk about gatherResults, but hopefully it makes the idea clear. A lot of the legibility is dependant on making those method names sensible, though. Our in-house style guide asks for limiting functions to about ten lines, preferably half that. Works for us. Another pattern that's frowned upon since it's a bit of an abuse of decorator syntax, but I still like because it tends to make things easier to read for inline callback definitions where you do need more than a lambda: d = somethingThatHappensLater() @d.addCallback def whenItsDone(result): doSomethingWith(result) --Guido van Rossum (python.org/~guido) > cheers lvh -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Sun Oct 14 18:54:54 2012 From: guido at python.org (Guido van Rossum) Date: Sun, 14 Oct 2012 09:54:54 -0700 Subject: [Python-ideas] The async API of the future: Reactors In-Reply-To: References: <20121012203311.4b3ee8af@pitrou.net> Message-ID: On Sun, Oct 14, 2012 at 8:01 AM, Calvin Spealman wrote: > Why is subclassing a problem? It can be overused, but seems the right > thing to do in this case. You want a protocol that responds to new data by > echoing and tells the user when the connection was terminated? It makes > sense that this is a subclass: a special case of some class that handles the > base behavior. I replied to this in detail on the "Twisted and Deferreds" thread in an exchange. Summary: I'm -0 when it comes to subclassing protocol classes; -1 on subclassing objects that implement significant functionality. > What if this was just an optional way and we could also provide a helper to > attach handlers to the base class instance without subclassing it? The function > registering it could take keyword arguments mapping additional event->callbacks > to the object. Yeah, there are many APIs that we could offer. 
We just have to offer one that's general enough so that people who prefer other styles can implement their preferred style in a library. -- --Guido van Rossum (python.org/~guido) From guido at python.org Sun Oct 14 19:15:27 2012 From: guido at python.org (Guido van Rossum) Date: Sun, 14 Oct 2012 10:15:27 -0700 Subject: [Python-ideas] The async API of the future: Reactors In-Reply-To: References: Message-ID: On Fri, Oct 12, 2012 at 9:52 PM, Ben Darnell wrote: > First of all, to clear up the terminology, edge-triggered actually has > a specific meaning in this context that is separate from the question > of whether callbacks are used more than once. The edge- vs > level-triggered question is moot with one-shot callbacks, but when > you're reusing callbacks in edge-triggered mode you won't get a second > call until you've drained the socket buffer and then it becomes > readable again. This turns out to be helpful for hybrid > event/threaded systems, since the network thread may go into the next > iteration of its loop while the worker thread is still consuming the > data from a previous event. Yeah, sorry for contributing to the confusion here! Glyph cleared it up for me. > You can't always emulate edge-triggered behavior since it needs > knowledge of internal socket buffers (epoll has an edge-triggered mode > and I think kqueue does too, but you can't get edge-triggered behavior > if you're falling back to select()). However, you can easily get > one-shot callbacks from an event loop with persistent callbacks just > by unregistering the callback once it has received an event. This has > a performance cost, though - in tornado we try to avoid unnecessary > unregister/register pairs. We should do be careful to support all this in our event loop design, without necessarily offering two ways of doing everything -- the event loop should be at liberty to use the most efficient strategy for the platform. (If that depends on what sort of I/O the user is interested in, we should be sure that that information reaches the event loop too. I like the idea more and more of an IO object that encapsulates a socket or other event source, using predefined subclasses for each type that is relevant to the platform. >> I'm not at all familiar with the Twisted reactor interface. My own >> design would be along the following lines: >> >> - There's an abstract Reactor class and an abstract Async I/O object >> class. To get a reactor to call you back, you must give it an I/O >> object, a callback, and maybe some more stuff. (I have gone back and >> like passing optional args for the callback, rather than requiring >> lambdas to create closures.) Note that the callback is *not* a >> designated method on the I/O object! In order to distinguish between >> edge-triggered and level-triggered, you just use a different reactor >> method. There could also be a reactor method to schedule a "bare" >> callback, either after some delay, or immediately (maybe with a given >> priority), although such functionality could also be implemented >> through magic I/O objects. > > One reason to have a distinct method for running a bare callback is > that you need to have some thread-safe entry point, but you otherwise > don't really want locking on all the internal methods. Tornado's > IOLoop.add_callback and Twisted's Reactor.callFromThread can be used > to run code in the IOLoop's thread (which can then call the other > IOLoop methods). That's an important use case to support. 
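For concreteness, the usual mechanism behind such an entry point is a locked queue plus a "self-pipe" that wakes up select(); a rough, Unix-flavored sketch (all names invented here, not a proposed API):

import os
import threading
import collections

class LoopWithThreadSafeCalls:
    def __init__(self):
        self._lock = threading.Lock()
        self._pending = collections.deque()
        # The loop includes _wake_r in its select()/poll() read set.
        self._wake_r, self._wake_w = os.pipe()

    def call_from_thread(self, func, *args):
        # The only method other threads are allowed to call.
        with self._lock:
            self._pending.append((func, args))
        os.write(self._wake_w, b'x')      # interrupt the blocking select()

    def _process_pending(self):
        # Runs in the loop's own thread once _wake_r becomes readable,
        # so the callbacks themselves need no extra locking.
        os.read(self._wake_r, 4096)
        with self._lock:
            todo, self._pending = self._pending, collections.deque()
        for func, args in todo:
            func(*args)
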
> We also have distinct methods for running a callback after a timeout, > although if you had a variant of add_handler that didn't require a > subsequent call to remove_handler you could probably do timeouts using > a magical IO object. (an additional subtlety for the time-based > methods is how time is computed. I recently added support in tornado > to optionally use time.monotonic instead of time.time) >> - In systems supporting file descriptors, there's a reactor >> implementation that knows how to use select/poll/etc., and there are >> concrete I/O object classes that wrap file descriptors. On Windows, >> those would only be socket file descriptors. On Unix, any file >> descriptor would do. To create such an I/O object you would use a >> platform-specific factory. There would be specialized factories to >> create e.g. listening sockets, connections, files, pipes, and so on. >> > > Jython is another interesting case - it has a select() function that > doesn't take integer file descriptors, just the opaque objects > returned by socket.fileno(). Interesting. > While it's convenient to have higher-level constructors for various > specialized types, I'd like to emphasize that having the low-level > interface is important for interoperability. Tornado doesn't know > whether the file descriptors are listening sockets, connected sockets, > or pipes, so we'd just have to pass in a file descriptor with no other > information. Yeah, the IO object will still need to have a fileno() method. >> - In systems like App Engine that don't support async I/O on file >> descriptors at all, the constructors for creating I/O objects for disk >> files and connection sockets would comply with the interface but fake >> out almost everything (just like today, using httplib or httplib2 on >> App Engine works by adapting them to a "urlfetch" RPC request). > > Why would you be allowed to make IO objects for sockets that don't > work? I would expect that to just raise an exception. On app engine > RPCs would be the only supported async I/O objects (and timers, if > those are implemented as magic I/O objects), and they're not > implemented in terms of sockets or files. Here's my use case. Suppose in general one can use async I/O for disk files, and it is integrated with the standard (abstract) event loop. So someone writes a handy templating library that wants to play nice with async apps, so it uses the async I/O idiom to read e.g. the template source code. Support I want to use that library on App Engine. It would be a pain if I had to modify that template-reading code to not use the async API. But (given the right async API!) it would be pretty simple for the App Engine API to provide a mock implementation of the async file reading API that was synchronous under the hood. Yes, it would block while waiting for disk, but App Engine uses threads anyway so it wouldn't be a problem. Another, current-day, use case is the httplib interface in the stdlib (a fairly fancy HTTP/1.1 client, although it has its flaws). That's based on sockets, which App Engine doesn't have; we have a "urlfetch" RPC that you give a URL (and more optional stuff) and returns a record containing the contents and headers. But again, many useful 3rd party libraries use httplib, and they won't work unless we somehow support httplib. So we have had to go out of our way to cover most uses of httplib. 
While the app believes it is opening the connection and sending the request, we are actually just buffering everything; and when the app starts reading from the connection, we make the urlfetch RPC and buffer the response, which we then feed back to the app as it believes it is reading from the socket. As long as the app doesn't try to get the socket's file descriptor and call select() it will work fine. But some libraries *do* call select(), and here our emulation breaks down. It would be nicer if the standard way to do async stuff was higher level than select(), so that we could offer the emulation at a level that would integrate with the event loop -- that way, ideally when we have to send the urlfetch RPC we could actually return a Future (or whatever), and the task would correctly be suspended, just *thinking* it was waiting for the response on a socket, but actually waiting for the RPC. Hopefully SSL provides another use case. -- --Guido van Rossum (python.org/~guido) From jeanpierreda at gmail.com Sun Oct 14 19:26:16 2012 From: jeanpierreda at gmail.com (Devin Jeanpierre) Date: Sun, 14 Oct 2012 13:26:16 -0400 Subject: [Python-ideas] The async API of the future: Twisted and Deferreds In-Reply-To: References: Message-ID: On Sun, Oct 14, 2012 at 11:53 AM, Guido van Rossum wrote: >> My experience has been unfortunately rather devoid of deferreds in >> Twisted. I always feel like the odd one out when people discuss this >> confusion. For me, it was all Protocol this and Protocol that, and >> deferreds only came up when I used Twisted's great AMP (Asynchronous >> Messaging Protocol) library. > > Especially odd since you jumped into the discussion when I called > Deferreds a bad name. :-) Did I mention how great AMP was? ;) > I'm sorry, but that's not very readable at all. You needed a lambda > (which if there was anything more would have to be expanded using > 'def') and you're cheating by passing print as a callable (which saves > you a second lambda, but only in this simple case). > > A readable version of this could should not have to use lambdas. Sure. I probably erred in not using inlineCallbacks form, what I wanted to do was highlight the gatherResults function (which, as it happens, does something generators can't without invoking an external function.) My worry here was that generators are being praised for being more readable, which is true and reasonable, but I don't know that they're flexible enough to be the only way to do things. But you've stated now that you'd want futures to be there too, so... those are probably mostly flexible enough. >> Egh. I mean, sure, supposed we have those things. But what if you want >> to send the result of a callback to a generator-coroutine? Presumably >> generator coroutines work by yielding deferreds and being called back >> when the future resolves (deferred fires). > > No, they don't use deferreds. They use Futures. You've made it quite > clear that they are very different. Haha, different in API and what they can do, but they are meant to do the same thing (represent delayed results). I meant to talk about futures and deferreds equally, and ask the same questions of both of them. >> But if those >> futures/deferreds aren't unexposed, and instead only the generator >> stuff is exposed, then bridging the gap between callbacks and >> generator-coroutines is impossible. So every callback function has to >> also be defined to use something else. And worse, other APIs using >> callbacks are left in the dust. 
> > My plan is to expose the Futures *will* be exposed -- this is what > worked well in NDB. OK. I was confused when you said there would only be generators and simple callbacks (and so I posed questions about what happens when you have just generators, which you took to be questions aimed at Greg Ewing's thing.) > How about this: > > f = > reactor.timer(10, f.set_result, None) > > Then whoever waits for f gets woken up in 10 seconds, and the reactor > doesn't have to know what Futures are. I know that Twisted has historically agreed with the idea that the reactor shouldn't know about futures/deferreds. I'm not sure I agree it's so important. If the universal way of writing asynchronous code is generator-coroutines, then the reactor should work well with this and not require extra effort. > But I believe your whole argument may be based on a misreading of my > proposal. *I* want plain callbacks, Futures, and coroutines, and an > event loop that only knows about plain callbacks and IO objects (e.g. > sockets). You're correct. >> Now here's another thing: suppose we have a list of "deferred events", >> but instead of handling all 10 at once, we want to handle them "as >> they arrive", and then synthesize a result at the bottom. How do you >> do this with pure generator coroutines? > > Let's ask Greg that. I meant to be asking about the situation you were proposing. I thought it was just callbacks and generators, now we've added futures. Futures sans chaining can definitely implement this, just maybe not as nicely as how I'd do it. The issue is that it's a reasonable thing to want to escape the generator system in order to implement things that aren't "linear" the way generator coroutines are. And if we escape the system, it should be possible and easy to do a large variety of things. But, on the plus side, I'm convinced that it's possible, and that the necessary things will be exposed (even if it's very unpleasant, there's always helper functions...). Unless you do Greg's thing, then I'm worried again. I will read his stuff later today or tomorrow. (Unrelated: I'm not sure why I was so sure UnorderedEventList had to be that ugly. It can use a for loop... oops.) > The thing that worries me most is reimplementing httplib, urllib and > so on to use all this new machinery *and* keep the old synchronous > APIs working *even* if some code is written using the old style and > some other code wants to use the new style. (We're now deviating from futures and deferreds, but I think the part I was taking was drawing to a close anyway) Code that wants to use the old style can be integrated by calling it in a separate thread, and that's fine. If the results should be used in the asynchronous code, then have a thing that integrates with threading so that when the thread returns (or fails with an exception) it can notify a future/deferred of the outcome. Twisted's has deferToThread for this. It also has blockingCallFromThread if the synchronous code wants to talk back to the asynchronous code. And that leads me to this: Imagine if, instead of having two implementations (one synchronous, one not), we had only one (asynchronous), and then had some wrappers to make it work as a synchronous implementation as well? Here is an example of a synchronous program written in Python+Twisted, where I wrap deferlater to be a blocking function (so that it is similar to a time.sleep() followed by a function call). 
The reactor is started in a separate thread, and is left to die whenever the main thread dies (because thread daemons yay.) from __future__ import print_function import threading from twisted.internet import task, reactor from twisted.internet.threads import blockingCallFromThread def my_deferlater(reactor, time, callback, *args, **kwargs): return blockingCallFromThread(reactor, task.deferLater, reactor, time, callback, *args, **kwargs) # in reality, global reactor for all threads is terrible idea. # We'd want to instantiate a new reactor for # the reactor thread, and have a global REACTOR as well. # We'll just use this reactor. # This code will not work with any other twisted # code because of the global reactor shenanigans. # (But it'd work if we were able to have a reactor per thread.) REACTOR_THREAD = None def start_reactor(): global REACTOR_THREAD if REACTOR_THREAD is not None: # could be an error, or not, depending on how you feel this should be. return REACTOR_THREAD = threading.Thread(target=reactor.run, kwargs=dict( # signal handlers don't work if not in main thread. installSignalHandlers=0)) REACTOR_THREAD.daemon = True # Probably really evil. REACTOR_THREAD.start() start_reactor() my_deferlater(reactor, 1, print, "This will print after 1 second!") my_deferlater(reactor, 1, print, "This will print after 2 seconds!") my_deferlater(reactor, 1, print, "This will print after 3 seconds!") So maybe this is an option? It's really important that there not be just one global reactor, and that multiple reactors can run at the same time, for this to really work. But if that were done, then you could have a single global reactor responsible for being the back end of the new implementations of old synchronous APIs. Maybe it'd be started whenever the first call is made to a synchronous function. And maybe, to interoperate with some actual asynchronous code, you could have a way to change which reactor acts as the global reactor for synchronous APIs? I did this once, because I needed to rewrite a blocking API and wanted to use Twisted, except that I made the mistake of starting the thread when the module was created instead of on first call. This lead to a deadlock because of the global import lock... :( In principle I don't know why this would be a terrible awful idea, if it was done right, but maybe people with more experiences with threaded code can correct me. (The whole thread daemon thing necessary to make it act like a synchronous program, might be terribly insane and therefore an idea killer. I'm not sure.) I'm under the understanding that the global import lock won't cause this particular issue anymore as of Python 3.3, so perhaps starting a reactor on import is reasonable. > There's also a certain order to them, right? I'd think the state > transition diagram is something like > > connectionMade (1); dataReceived (*); connectionLost (1) > > I wonder if there are any guarantees that they will only be called in > this order, and who is supposed to enforce this? If would be awkward > if the user code would have to guard itself against this; also if the > developer made an unwarranted assumption (e.g. dataReceived is called > at least once). The docs in Twisted don't spell it out, but they do say that connectionMade should be considered to be the initializer for the connection, and that upon connectionLost the one should let the protocol be garbage collected. So, that seems like a guarantee that they are called in that order. 
I don't think it can really be enforced in Python (unless you want to do some jiggery pokery into model checking at runtime), but the responsibility for this failing in Twisted would be on the transport, as far as I understand it. If the transport calls back to the protocol in some invalid combination, it's the transport's fault for being broken. This is something that should be clearly documented. (It's an issue, also, regardless of whether or not a class is used to encapsulate the callbacks, or whether they are registered individually.) -- Devin From tjreedy at udel.edu Sun Oct 14 19:27:34 2012 From: tjreedy at udel.edu (Terry Reedy) Date: Sun, 14 Oct 2012 13:27:34 -0400 Subject: [Python-ideas] The async API of the future: yield-from In-Reply-To: References: <5078F6B1.2030309@canterbury.ac.nz> Message-ID: On 10/14/2012 10:36 AM, Guido van Rossum wrote: > So, can par() be as simple as > > def par(*args): > results = [] > for task in args: > result = yield from task > results.append(result) > return results > > ??? > > Or does it need to interact with the scheduler to ensure fairness? > (Not having built one of these, my intuition for how the primitives > fit together is still lacking, so excuse me for asking naive > questions.) > > Of course there's the question of what to do when one of the tasks > raises an error -- I haven't quite figured that out in NDB either, it > runs all the tasks to completion but the caller only sees the first > exception. I briefly considered having an "multi-exception" but it > felt too weird -- though I'm not married to that decision. One answer is to append the exception object to results and let the requesting code sort out what to do. def par(*args): results = [] for task in args: try: result = yield from task results.append(result) except Exception as exc: results.append(exc) return results -- Terry Jan Reedy From guido at python.org Sun Oct 14 19:33:55 2012 From: guido at python.org (Guido van Rossum) Date: Sun, 14 Oct 2012 10:33:55 -0700 Subject: [Python-ideas] yield from multiple iterables (was Re: The async API of the future: yield-from) In-Reply-To: References: Message-ID: On Sun, Oct 14, 2012 at 9:15 AM, Eric Snow wrote: > On Oct 14, 2012 8:42 AM, "Guido van Rossum" wrote: >> Sadly it looks that >> >> r = yield from (f1(), f2()) >> >> ends up interpreting the tuple as the iterator, and you end up with >> >> r = (f1(), f2()) >> >> (i.e., a tuple of generators) rather than the desired >> >> r = ((yield from f1()), (yield from f2())) > > Didn't want this tangent to get lost to the async discussion. Would it be > too late to make a change along these lines? Would it be enough of an > improvement to be warranted? 3.3 has been released. It's too late. Also I'm not sure what change *could* be made. Surely yield from should just iterate over that tuple -- that's fundamental to yield from. The only thing that could be done might be to change "yield from x, y" to mean something different than "yield from (x, y)" -- but that's questionable at best, and violates many other contexts (e.g. "return x, y", "yield x, y", "for i in x, y:"). 
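To make the current behavior concrete (f1 and f2 below are just placeholders):

def f1():
    yield 'a'
    return 1

def f2():
    yield 'b'
    return 2

def over_tuple():
    # Delegates to the *tuple*, so the two generator objects are
    # yielded as plain values and r ends up None.
    r = yield from (f1(), f2())
    return r

def delegated():
    # What one presumably wanted: delegate to each subgenerator.
    r = ((yield from f1()), (yield from f2()))
    return r

print(list(over_tuple()))   # two generator objects, not 'a' and 'b'
print(list(delegated()))    # ['a', 'b']
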
-- --Guido van Rossum (python.org/~guido) From guido at python.org Sun Oct 14 19:39:59 2012 From: guido at python.org (Guido van Rossum) Date: Sun, 14 Oct 2012 10:39:59 -0700 Subject: [Python-ideas] The async API of the future: Twisted and Deferreds In-Reply-To: References: Message-ID: On Sun, Oct 14, 2012 at 9:18 AM, Laurens Van Houtven <_ at lvh.cc> wrote: > On Sun, Oct 14, 2012 at 5:53 PM, Guido van Rossum wrote: > >> >> A readable version of this could should not have to use lambdas. > > > In a lot of Twisted code, it happens with methods as callback methods, > something like: > > d = self._doRPC(....) > d.addCallbacks(self._formatResponse, self._formatException) > d.addCallback(self._finish) > > That doesn't talk about gatherResults, but hopefully it makes the idea > clear. A lot of the legibility is dependant on making those method names > sensible, though. Our in-house style guide asks for limiting functions to > about ten lines, preferably half that. Works for us. I quite understand that in your ecosystem you've found best practices for every imaginable use case. And I understand that once you're part of the community and have internalized the idioms and style, it's quite readable. But you haven't shaken my belief that we can do better with the current version of the language (3.3). (FWIW, I think it would be a good idea to develop a "reference implementation" of many of these ideas outside the standard library. Depending on whether we end up adopting yield or yield from it might even support versions of Python 3 before 3.3. I certainly don't want to have to wait for 3.4 -- although that's the first opportunity for incorporating it into the stdlib.) -- --Guido van Rossum (python.org/~guido) From guido at python.org Sun Oct 14 19:42:25 2012 From: guido at python.org (Guido van Rossum) Date: Sun, 14 Oct 2012 10:42:25 -0700 Subject: [Python-ideas] The async API of the future: yield-from In-Reply-To: References: <5078F6B1.2030309@canterbury.ac.nz> Message-ID: On Sun, Oct 14, 2012 at 10:27 AM, Terry Reedy wrote: > On 10/14/2012 10:36 AM, Guido van Rossum wrote: > >> So, can par() be as simple as >> >> def par(*args): >> results = [] >> for task in args: >> result = yield from task >> results.append(result) >> return results >> >> ??? >> >> Or does it need to interact with the scheduler to ensure fairness? >> (Not having built one of these, my intuition for how the primitives >> fit together is still lacking, so excuse me for asking naive >> questions.) >> >> Of course there's the question of what to do when one of the tasks >> raises an error -- I haven't quite figured that out in NDB either, it >> runs all the tasks to completion but the caller only sees the first >> exception. I briefly considered having an "multi-exception" but it >> felt too weird -- though I'm not married to that decision. > > > One answer is to append the exception object to results and let the > requesting code sort out what to do. > > > def par(*args): > results = [] > for task in args: > try: > > result = yield from task > results.append(result) > except Exception as exc: > results.append(exc) > return results But then the caller would have to sort through the results and check for exceptions. I want the caller to be able to use try/except as well. So far the best I've come up with is to recommend that if you care about distinguishing multiple exceptions, use separate yields surrounded by separate try/except blocks. 
Note that the tasks can still run concurrently, just create all the futures before doing the first yield. -- --Guido van Rossum (python.org/~guido) From solipsis at pitrou.net Sun Oct 14 19:53:26 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sun, 14 Oct 2012 19:53:26 +0200 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths References: <20121005202534.5f721292@pitrou.net> <20121006190821.02ae50cd@pitrou.net> <387F855A-B5BD-491D-9FA5-918F4042332F@gmail.com> <50707AD5.3030407@stoneleaf.us> <50707F91.6060301@stoneleaf.us> <50786E42.6050308@stoneleaf.us> <20121012214224.55f3ed27@pitrou.net> <50787E8A.2090804@stoneleaf.us> <20121012225306.295d93e6@pitrou.net> <507884F0.2060608@stoneleaf.us> <507A989B.9020206@pearwood.info> <20121014130318.7140255e@pitrou.net> <507AA056.7020907@pearwood.info> <507AD11E.4080503@stoneleaf.us> <20121014171640.06ee1143@pitrou.net> <507ADEE5.3050800@stoneleaf.us> Message-ID: <20121014195326.6483d3bd@pitrou.net> On Sun, 14 Oct 2012 08:48:53 -0700 Ethan Furman wrote: > > What behavior can I expect with your Path implementation when I try to > iterate over > > /usr/home/ethanf/some_table.dbf >>> p = Path('setup.py') >>> list(p) Traceback (most recent call last): File "", line 1, in File "./pathlib.py", line 1176, in __iter__ for name in self._accessor.listdir(self): File "./pathlib.py", line 455, in wrapped return strfunc(str(pathobj), *args) NotADirectoryError: [Errno 20] Not a directory: 'setup.py' Regards Antoine. From jstpierre at mecheye.net Sun Oct 14 19:55:38 2012 From: jstpierre at mecheye.net (Jasper St. Pierre) Date: Sun, 14 Oct 2012 13:55:38 -0400 Subject: [Python-ideas] The async API of the future: Twisted and Deferreds Message-ID: (Sorry if this is in the wrong place, I'm joining the conversation and I'm not sure where mailman will put it) > Alternatively, yielding a future (or whatever ones calls the objects > returned by *_async()) could register *and* wait for the result. To > register without waiting one would yield a wrapper for the future. So > one could write What would registering a Future do? As far as I understood it, the plan here is that a Future was just a marker for an outstanding request: def callback(result): print "The result was", result def say_hello(name): f = Future() f.resolve("Hello, %s!") return f f = say_hello("Jeff") f.add_callback(callback) The outstanding request doesn't have to care about socket connections; it's just a way to pass around a result that hasn't arrived yet. This is pretty much the same as Deferreds/Promises, with a different name. There's no reactor here to register here, because there doesn't need to be one. -- Jasper From guido at python.org Sun Oct 14 20:17:51 2012 From: guido at python.org (Guido van Rossum) Date: Sun, 14 Oct 2012 11:17:51 -0700 Subject: [Python-ideas] The async API of the future: Twisted and Deferreds In-Reply-To: References: Message-ID: On Sun, Oct 14, 2012 at 10:55 AM, Jasper St. Pierre wrote: > (Sorry if this is in the wrong place, I'm joining the conversation and > I'm not sure where mailman will put it) > >> Alternatively, yielding a future (or whatever ones calls the objects >> returned by *_async()) could register *and* wait for the result. To >> register without waiting one would yield a wrapper for the future. So >> one could write > > What would registering a Future do? 
As far as I understood it, the > plan here is that a Future was just a marker for an outstanding > request: > > def callback(result): > print "The result was", result > > def say_hello(name): > f = Future() > f.resolve("Hello, %s!") > return f > > f = say_hello("Jeff") > f.add_callback(callback) > > The outstanding request doesn't have to care about socket connections; > it's just a way to pass around a result that hasn't arrived yet. This > is pretty much the same as Deferreds/Promises, with a different name. > There's no reactor here to register here, because there doesn't need > to be one. The Future class itself probably shouldn't interface with the event loop. But an operation that creates and returns a Future certainly can. -- --Guido van Rossum (python.org/~guido) From jstpierre at mecheye.net Sun Oct 14 20:19:50 2012 From: jstpierre at mecheye.net (Jasper St. Pierre) Date: Sun, 14 Oct 2012 14:19:50 -0400 Subject: [Python-ideas] The async API of the future: Twisted and Deferreds In-Reply-To: References: Message-ID: On Sun, Oct 14, 2012 at 2:17 PM, Guido van Rossum wrote: > On Sun, Oct 14, 2012 at 10:55 AM, Jasper St. Pierre > wrote: >> (Sorry if this is in the wrong place, I'm joining the conversation and >> I'm not sure where mailman will put it) >> >>> Alternatively, yielding a future (or whatever ones calls the objects >>> returned by *_async()) could register *and* wait for the result. To >>> register without waiting one would yield a wrapper for the future. So >>> one could write >> >> What would registering a Future do? As far as I understood it, the >> plan here is that a Future was just a marker for an outstanding >> request: >> >> def callback(result): >> print "The result was", result >> >> def say_hello(name): >> f = Future() >> f.resolve("Hello, %s!") >> return f >> >> f = say_hello("Jeff") >> f.add_callback(callback) >> >> The outstanding request doesn't have to care about socket connections; >> it's just a way to pass around a result that hasn't arrived yet. This >> is pretty much the same as Deferreds/Promises, with a different name. >> There's no reactor here to register here, because there doesn't need >> to be one. > > The Future class itself probably shouldn't interface with the event > loop. But an operation that creates and returns a Future certainly > can. Of course, but that wouldn't be done at the Future level, but at the fetch_async level. I just want to make sure that we're clear that the Future itself isn't being registered with any event loop or reactor. > -- > --Guido van Rossum (python.org/~guido) -- Jasper From guido at python.org Sun Oct 14 20:21:01 2012 From: guido at python.org (Guido van Rossum) Date: Sun, 14 Oct 2012 11:21:01 -0700 Subject: [Python-ideas] The async API of the future: Twisted and Deferreds In-Reply-To: References: Message-ID: On Sun, Oct 14, 2012 at 11:19 AM, Jasper St. Pierre wrote: > On Sun, Oct 14, 2012 at 2:17 PM, Guido van Rossum wrote: >> The Future class itself probably shouldn't interface with the event >> loop. But an operation that creates and returns a Future certainly >> can. > > Of course, but that wouldn't be done at the Future level, but at the > fetch_async level. I just want to make sure that we're clear that the > Future itself isn't being registered with any event loop or reactor. Of course. 
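Spelling out that division of labor, a minimal sketch -- recv_async() is an invented name and the loop's add_reader()/remove_reader() methods are hypothetical, not a settled API:

from concurrent.futures import Future

def recv_async(loop, sock, nbytes):
    # The *operation* is what talks to the event loop; the Future it
    # hands back knows nothing about reactors or file descriptors.
    f = Future()

    def on_readable():
        loop.remove_reader(sock.fileno())
        f.set_result(sock.recv(nbytes))

    loop.add_reader(sock.fileno(), on_readable)
    return f

# A coroutine would then simply do:
#     data = yield recv_async(loop, sock, 4096)
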
-- --Guido van Rossum (python.org/~guido) From ironfroggy at gmail.com Sun Oct 14 20:46:49 2012 From: ironfroggy at gmail.com (Calvin Spealman) Date: Sun, 14 Oct 2012 14:46:49 -0400 Subject: [Python-ideas] The async API of the future: PEP 3153 (async-pep) In-Reply-To: References: Message-ID: On Sat, Oct 13, 2012 at 1:54 PM, Laurens Van Houtven <_ at lvh.cc> wrote: > On Sat, Oct 13, 2012 at 1:22 AM, Guido van Rossum wrote: >> >> [Hopefully this is the last spin-off thread from "asyncore: included >> batteries don't fit"] >> >> So it's totally unfinished? > > > At the time, the people I talked to placed significantly more weight in > "explain why this is necessary" than "get me something I can play with". > >> >> > Do you feel that there should be less talk about rationale? >> >> No, but I feel that there should be some actual specification. I am >> also looking forward to an actual meaty bit of example code -- ISTR >> you mentioned you had something, but that it was incomplete, and I >> can't find the link. > > > Just examples of how it would work, nothing hooked up to real code. My > memory of it is more of a drowning-in-politics-and-bikeshedding kind of > thing, unfortunately :) Either way, I'm okay with letting bygones be bygones > and focus on how we can get this show on the road. > >> > It's not that there's *no* reference to IO: it's just that that >> > reference is >> > abstracted away in data_received and the protocol's transport object, >> > just >> > like Twisted's IProtocol. >> >> The words "data_received" don't even occur in the PEP. > > > See above. > > What thread should I reply in about the pull APIs? > >> >> I just want to make sure that we don't *completely* paint ourselves into >> the wrong corner when it comes to that. > > > I don't think we have to worry about it too much. Any reasonable API I can > think of makes this completely doable. > >> But I'm really hoping you'll make good on your promise of redoing >> async-pep, giving some actual specifications and example code, so I >> can play with it. > > > Takeaways: > > - The async API of the future is very important, and too important to be > left to chance. Could not agree more. > - It requires a lot of very experienced manpower. I'm sitting on the sidelines, wishing I had much of either, because of point number 1. > - It requires a lot of effort to handle the hashing out of it (as we're > doing here) as well as it deserves to be. > > I'll take as proactive a role as I can afford to take in this process, but I > don't think I can do it by myself. Furthermore, it's a risk nobody wants to > take: a repeat performance wouldn't be good for anyone, in particular not > for Python nor myself. > > I've asked JP Calderone and Itamar Turner-Trauring if they would be > interested in carrying this forward professionally, and they have > tentatively said yes. JP's already familiar with a large part of the problem > space with the implementation of the ssl module. JP and Itamar have worked > together for years and have recently set up a consulting firm. > > Given that this is emphatically important to Python, I intend to apply for a > PSF grant on their behalf to further this goal. Given their experience in > the field, I expect this to be a fairly low risk endeavor. I like this idea. There are some problems spare time isn't enough to solve. I can't think of many people as qualified for the task. 
>> >> -- >> --Guido van Rossum (python.org/~guido) > > > -- > cheers > lvh > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -- Read my blog! I depend on your acceptance of my opinion! I am interesting! http://techblog.ironfroggy.com/ Follow me if you're into that sort of thing: http://www.twitter.com/ironfroggy From steve at pearwood.info Sun Oct 14 21:14:19 2012 From: steve at pearwood.info (Steven D'Aprano) Date: Mon, 15 Oct 2012 06:14:19 +1100 Subject: [Python-ideas] yield from multiple iterables (was Re: The async API of the future: yield-from) In-Reply-To: References: Message-ID: <507B0F0B.8080700@pearwood.info> On 15/10/12 03:15, Eric Snow wrote: > On Oct 14, 2012 8:42 AM, "Guido van Rossum" wrote: >> Sadly it looks that >> >> r = yield from (f1(), f2()) >> >> ends up interpreting the tuple as the iterator, and you end up with >> >> r = (f1(), f2()) >> >> (i.e., a tuple of generators) rather than the desired >> >> r = ((yield from f1()), (yield from f2())) How about this? r = yield from *(f1(), f2()) which currently is a SyntaxError in 3.3. -- Steven From guido at python.org Sun Oct 14 21:19:04 2012 From: guido at python.org (Guido van Rossum) Date: Sun, 14 Oct 2012 12:19:04 -0700 Subject: [Python-ideas] yield from multiple iterables (was Re: The async API of the future: yield-from) In-Reply-To: <507B0F0B.8080700@pearwood.info> References: <507B0F0B.8080700@pearwood.info> Message-ID: On Sun, Oct 14, 2012 at 12:14 PM, Steven D'Aprano wrote: > On 15/10/12 03:15, Eric Snow wrote: >> >> On Oct 14, 2012 8:42 AM, "Guido van Rossum" wrote: >>> >>> Sadly it looks that >>> >>> r = yield from (f1(), f2()) >>> >>> ends up interpreting the tuple as the iterator, and you end up with >>> >>> r = (f1(), f2()) >>> >>> (i.e., a tuple of generators) rather than the desired >>> >>> r = ((yield from f1()), (yield from f2())) > > > How about this? > > r = yield from *(f1(), f2()) > > > which currently is a SyntaxError in 3.3. I think it's too early to start proposing new syntax for a problem we don't even know is common at all. 
Greg Ewing's proposal works for me: r = yield from par(f1(), f2()) -- --Guido van Rossum (python.org/~guido) From ethan at stoneleaf.us Sun Oct 14 21:16:28 2012 From: ethan at stoneleaf.us (Ethan Furman) Date: Sun, 14 Oct 2012 12:16:28 -0700 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: <20121014195326.6483d3bd@pitrou.net> References: <20121005202534.5f721292@pitrou.net> <20121006190821.02ae50cd@pitrou.net> <387F855A-B5BD-491D-9FA5-918F4042332F@gmail.com> <50707AD5.3030407@stoneleaf.us> <50707F91.6060301@stoneleaf.us> <50786E42.6050308@stoneleaf.us> <20121012214224.55f3ed27@pitrou.net> <50787E8A.2090804@stoneleaf.us> <20121012225306.295d93e6@pitrou.net> <507884F0.2060608@stoneleaf.us> <507A989B.9020206@pearwood.info> <20121014130318.7140255e@pitrou.net> <507AA056.7020907@pearwood.info> <507AD11E.4080503@stoneleaf.us> <20121014171640.06ee1143@pitrou.net> <507ADEE5.3050800@stoneleaf.us> <20121014195326.6483d3bd@pitrou.net> Message-ID: <507B0F8C.6080102@stoneleaf.us> Antoine Pitrou wrote: > On Sun, 14 Oct 2012 08:48:53 -0700 > Ethan Furman wrote: > >> What behavior can I expect with your Path implementation when I try to >> iterate over >> >> /usr/home/ethanf/some_table.dbf >> >>>> p = Path('setup.py') >>>> list(p) >>>> > Traceback (most recent call last): > File "", line 1, in > File "./pathlib.py", line 1176, in __iter__ > for name in self._accessor.listdir(self): > File "./pathlib.py", line 455, in wrapped > return strfunc(str(pathobj), *args) > NotADirectoryError: [Errno 20] Not a directory: 'setup.py' > Certainly reasonable, and the same behavior I would expect of, e.g., p.children(). I guess it just feels too magical to me. -1 for built-in iteration. +1 for a .children() (or other) method. ~Ethan~ From tjreedy at udel.edu Sun Oct 14 21:38:20 2012 From: tjreedy at udel.edu (Terry Reedy) Date: Sun, 14 Oct 2012 15:38:20 -0400 Subject: [Python-ideas] The async API of the future: yield-from In-Reply-To: References: <5078F6B1.2030309@canterbury.ac.nz> Message-ID: On 10/14/2012 1:42 PM, Guido van Rossum wrote: > On Sun, Oct 14, 2012 at 10:27 AM, Terry Reedy wrote: >> On 10/14/2012 10:36 AM, Guido van Rossum wrote: >>> Of course there's the question of what to do when one of the tasks >>> raises an error -- I haven't quite figured that out in NDB either, it >>> runs all the tasks to completion but the caller only sees the first >>> exception. I briefly considered having an "multi-exception" but it >>> felt too weird -- though I'm not married to that decision. >> One answer is to append the exception object to results and let the >> requesting code sort out what to do. >> >> >> def par(*args): >> results = [] >> for task in args: >> try: >> >> result = yield from task >> results.append(result) >> except Exception as exc: >> results.append(exc) >> return results > > But then the caller would have to sort through the results and check > for exceptions. I want the caller to be able to use try/except as > well. OK. Then ... def par(*args): results = [] exceptions = False for task in args: try: result = yield from task results.append(result) except Exception as exc: results.append(exc) exceptions = True if not exceptions: return results else: exc = MultiXException() exc.results = results raise exc Is this is what you meant by 'multi-exception'? 
caller: try: results = except MultiXException as exc: errprocess(exc.results) From mwm at mired.org Sun Oct 14 21:57:38 2012 From: mwm at mired.org (Mike Meyer) Date: Sun, 14 Oct 2012 14:57:38 -0500 Subject: [Python-ideas] Is there a good reason to use * for multiplication? In-Reply-To: References: <11ad8f01-0383-4b64-b0f0-78bb0e2f9308@googlegroups.com> <5078D4CE.1040201@pearwood.info> <507923C4.8040201@pearwood.info> <20121013102229.259572ad@bhuda.mired.org> <5079FC79.3040506@canterbury.ac.nz> <507A01AB.2060708@mrabarnett.plus.com> Message-ID: <20121014145738.57948600@bhuda.mired.org> On Sun, 14 Oct 2012 07:40:57 +0200 Yuval Greenfield wrote: > On Sun, Oct 14, 2012 at 2:04 AM, MRAB wrote: > > > If it's more than one codepoint, we could prefix with the length of the > > codepoint's name: > > > > def __12CIRCLED_PLUS__(x, y): > > ... > > > > > That's a bit impractical, and why reinvent the wheel? I'd much rather: > > def \u2295(x, y): > .... > > So readable I want to read it twice. And that's not legal python today so > we don't break backwards compatibility! Yes, but we're defining an operator for instances of the class, so it needs the 'special' method marking: def __\u2295__(self, other): Now *that's* pretty! http://www.mired.org/ Independent Software developer/SCM consultant, email for more information. O< ascii ribbon campaign - stop html mail - www.asciiribbon.org From tismer at stackless.com Sun Oct 14 22:55:34 2012 From: tismer at stackless.com (Christian Tismer) Date: Sun, 14 Oct 2012 22:55:34 +0200 Subject: [Python-ideas] yield from multiple iterables (was Re: The async API of the future: yield-from) In-Reply-To: References: <507B0F0B.8080700@pearwood.info> Message-ID: <507B26C6.10602@stackless.com> Hmmm... On 14.10.12 21:19, Guido van Rossum wrote: > On Sun, Oct 14, 2012 at 12:14 PM, Steven D'Aprano wrote: >> On 15/10/12 03:15, Eric Snow wrote: >>> On Oct 14, 2012 8:42 AM, "Guido van Rossum" wrote: >>>> Sadly it looks that >>>> >>>> r = yield from (f1(), f2()) >>>> >>>> ends up interpreting the tuple as the iterator, and you end up with >>>> >>>> r = (f1(), f2()) >>>> >>>> (i.e., a tuple of generators) rather than the desired >>>> >>>> r = ((yield from f1()), (yield from f2())) >> >> How about this? >> >> r = yield from *(f1(), f2()) >> >> >> which currently is a SyntaxError in 3.3. > I think it's too early to start proposing new syntax for a problem we > don't even know is common at all. > > Greg Ewing's proposal works for me: > > r = yield from par(f1(), f2()) I'm not very positive about all I've read in the last 50 hours. The concept of generators IMHO gets overly bent towards modelling a sensible syntax for a problem that not even had a convincing solution in a dialect that already has full coroutines. 'par' and 'join' and friends should be considered without thinking of generators in the first place. This is attacking too many problems in one shot. My approach would be to first find out how async operations should be modelled the best under the assumption that we have a coroutine concept that works without headaches about yielding in and out from something to whatnot. After that is settled and gets consensus, then I would think about bearable patterns to implement that using generators. And when we know what we really need, maybe considering more suitable Syntax. my 0.2 thousand yen - chris -- Christian Tismer :^) Software Consulting : Have a break! Take a ride on Python's Karl-Liebknecht-Str. 
121 : *Starship* http://starship.python.net/ 14482 Potsdam : PGP key -> http://pgp.uni-mainz.de phone +49 173 24 18 776 fax +49 (30) 700143-0023 PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04 whom do you want to sponsor today? http://www.stackless.com/ From joshua.landau.ws at gmail.com Sun Oct 14 23:06:44 2012 From: joshua.landau.ws at gmail.com (Joshua Landau) Date: Sun, 14 Oct 2012 22:06:44 +0100 Subject: [Python-ideas] Is there a good reason to use * for multiplication? In-Reply-To: <20121014145738.57948600@bhuda.mired.org> References: <11ad8f01-0383-4b64-b0f0-78bb0e2f9308@googlegroups.com> <5078D4CE.1040201@pearwood.info> <507923C4.8040201@pearwood.info> <20121013102229.259572ad@bhuda.mired.org> <5079FC79.3040506@canterbury.ac.nz> <507A01AB.2060708@mrabarnett.plus.com> <20121014145738.57948600@bhuda.mired.org> Message-ID: On 14 October 2012 20:57, Mike Meyer wrote: > On Sun, 14 Oct 2012 07:40:57 +0200 > Yuval Greenfield wrote: > > > On Sun, Oct 14, 2012 at 2:04 AM, MRAB > wrote: > > > > > If it's more than one codepoint, we could prefix with the length of the > > > codepoint's name: > > > > > > def __12CIRCLED_PLUS__(x, y): > > > ... > > > > > > > > That's a bit impractical, and why reinvent the wheel? I'd much rather: > > > > def \u2295(x, y): > > .... > > > > So readable I want to read it twice. And that's not legal python today so > > we don't break backwards compatibility! > > Yes, but we're defining an operator for instances of the class, so it > needs the 'special' method marking: > > def __\u2295__(self, other): > > Now *that's* pretty! > > I much preferred your first choice: def __$?__(self, other): But to keep the "$" unused we can write: def __op_?__(self, other): (new methods will take precedence over the older __add__ and so forth) What we can do then is use the "\u" syntax to let people without unicode editors have accessibility: def __op_\u2295__(self, other): ...later in the code... new = first \u2295 second Which adds consistency whereas before we could only use that in specific circumstances (inside strings), reducing cognitive burden. -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Sun Oct 14 23:30:21 2012 From: guido at python.org (Guido van Rossum) Date: Sun, 14 Oct 2012 14:30:21 -0700 Subject: [Python-ideas] The async API of the future: yield-from In-Reply-To: References: <5078F6B1.2030309@canterbury.ac.nz> Message-ID: On Sun, Oct 14, 2012 at 12:38 PM, Terry Reedy wrote: > On 10/14/2012 1:42 PM, Guido van Rossum wrote: >> On Sun, Oct 14, 2012 at 10:27 AM, Terry Reedy wrote: >>> On 10/14/2012 10:36 AM, Guido van Rossum wrote: >>>> Of course there's the question of what to do when one of the tasks >>>> raises an error -- I haven't quite figured that out in NDB either, it >>>> runs all the tasks to completion but the caller only sees the first >>>> exception. I briefly considered having an "multi-exception" but it >>>> felt too weird -- though I'm not married to that decision. > >>> One answer is to append the exception object to results and let the >>> requesting code sort out what to do. >>> >>> >>> def par(*args): >>> results = [] >>> for task in args: >>> try: >>> >>> result = yield from task >>> results.append(result) >>> except Exception as exc: >>> results.append(exc) >>> return results >> >> But then the caller would have to sort through the results and check >> for exceptions. I want the caller to be able to use try/except as >> well. > > OK. Then ... 
> > def par(*args): > results = [] > exceptions = False > for task in args: > try: > result = yield from task > results.append(result) > except Exception as exc: > results.append(exc) > exceptions = True > if not exceptions: > return results > else: > exc = MultiXException() > exc.results = results > raise exc > > Is this is what you meant by 'multi-exception'? Yes. > caller: > > try: > results = > successed> > except MultiXException as exc: > errprocess(exc.results) In NDB I have yet to encounter a situation where I care. -- --Guido van Rossum (python.org/~guido) From rene at stranden.com Sun Oct 14 23:55:53 2012 From: rene at stranden.com (Rene Nejsum) Date: Sun, 14 Oct 2012 23:55:53 +0200 Subject: [Python-ideas] The async API of the future: Reactors In-Reply-To: References: <7E7218BC-A367-4C2B-9B5D-C51BBA2FBEB3@stranden.com> Message-ID: On Oct 14, 2012, at 9:22 PM, Guido van Rossum wrote: > On Sun, Oct 14, 2012 at 10:51 AM, Rene Nejsum wrote: >> On the high level (Python) basically what you need is that the queue.get() >> can handle: >> 1) Python objects (as today) >> 2) timeout (as today, maybe in mills instead of seconds) >> 3) Network (socket input/state change) >> 4) File desc input/state change >> 5) Other I/O changes like serial comm, etc. >> 6) Maybe also yield based coroutine support ? >> >> This requires support from the underlaying >> OS. A support which is probably not there today ? >> >> As far as I can see, having this one extended queue.get() would nicely enable >> all high level concurrency issues in Python. > > [...] > >> I believe a "super" queue.get() would solve all use cases. >> >> I have no idea on how difficult it would be to implement in >> a cross platform manner. > > Hm. I know that a common (and often right!) recommendation for thread > communication is to use the queue module. But that module is meant to > work with threads. I think that the correct I/O primitives are more > likely to come by looking at what Tornado and Twisted have done than > by trying to "pimp up" the queue module -- it's good for what it does, > but trying to add all that new functionality to it doesn't sound like > a good fit. You are probably right about the queue class. Maybe it should be a new class, but I still believe I would be an excellent fit for doing concurrent stuff if Python had a multiplexer message queue, Python is high-level enough to be able to hide thread/select/read etc. A while ago I implemented pyworks (bitbucket.org/raindog/pyworks) which is a kind of Erlang implementation for Python, making objects concurrent and return values Futures, without adding much new code. Methods are sent asynchronous, simply by doing standard obj.method(). obj is a proxy for the real object sending method() as a message to the real object running in a separate thread. Return value is a Future. So you can do val = obj.method() ? continue async with method() ? and do some other stuff, until: print val which will hang waiting for the Future to complete, if it's not. It has been used in a couple of projects, making it much easier to do concurrent systems. 
But, it would be great if the object/task could wait for more events than queue.get() br /Rene > > -- > --Guido van Rossum (python.org/~guido) From guido at python.org Mon Oct 15 00:05:26 2012 From: guido at python.org (Guido van Rossum) Date: Sun, 14 Oct 2012 15:05:26 -0700 Subject: [Python-ideas] The async API of the future: Reactors In-Reply-To: References: <7E7218BC-A367-4C2B-9B5D-C51BBA2FBEB3@stranden.com> Message-ID: On Sun, Oct 14, 2012 at 2:55 PM, Rene Nejsum wrote: > > On Oct 14, 2012, at 9:22 PM, Guido van Rossum wrote: > >> On Sun, Oct 14, 2012 at 10:51 AM, Rene Nejsum wrote: >>> On the high level (Python) basically what you need is that the queue.get() >>> can handle: >>> 1) Python objects (as today) >>> 2) timeout (as today, maybe in mills instead of seconds) >>> 3) Network (socket input/state change) >>> 4) File desc input/state change >>> 5) Other I/O changes like serial comm, etc. >>> 6) Maybe also yield based coroutine support ? >>> >>> This requires support from the underlaying >>> OS. A support which is probably not there today ? >>> >>> As far as I can see, having this one extended queue.get() would nicely enable >>> all high level concurrency issues in Python. >> >> [...] >> >>> I believe a "super" queue.get() would solve all use cases. >>> >>> I have no idea on how difficult it would be to implement in >>> a cross platform manner. >> >> Hm. I know that a common (and often right!) recommendation for thread >> communication is to use the queue module. But that module is meant to >> work with threads. I think that the correct I/O primitives are more >> likely to come by looking at what Tornado and Twisted have done than >> by trying to "pimp up" the queue module -- it's good for what it does, >> but trying to add all that new functionality to it doesn't sound like >> a good fit. > > You are probably right about the queue class. Maybe it should be a new class, > but I still believe I would be an excellent fit for doing concurrent stuff if Python > had a multiplexer message queue, Python is high-level enough to be able to > hide thread/select/read etc. I believe that the Twisted and Tornado event loops have APIs to push work into a thread and/or process, and it will be a requirement for the new stdlib event loop. However the main focus of the current effort is not making the distinction between process, threads and tasks (or microthreads or coroutines) disappear -- it is simply to have the most useful API for tasks. > A while ago I implemented pyworks (bitbucket.org/raindog/pyworks) which > is a kind of Erlang implementation for Python, making objects concurrent and return > values Futures, without adding much new code. Methods are sent asynchronous, simply > by doing standard obj.method(). obj is a proxy for the real object sending method() as a > message to the real object running in a separate thread. Return value is a Future. So > you can do > > val = obj.method() > ? continue async with method() > ? and do some other stuff, until: > print val > > which will hang waiting for the Future to complete, if it's not. That sounds like implicit futures (to use the Wikipedia article's terminology). I'm not a big fan of that. In fact, I'm proposing an API where all task switching is explicit, using the yield keyword (or yield from), and accessing the value of a future is also explicit in such a system. > It has been used in a couple of projects, making it much easier to do concurrent systems. 
> But, it would be great if the object/task could wait for more events than queue.get() I still think you're focused more on concurrent CPU activity than async I/O. These are quire different fields, even though they often use similar terminology (like future, task/thread/process, concurrent/parallel, spawn/join, queue). I think the keyword that most distinguishes them is "event". If you hear people talk about events they are probably multiplexing I/O, not CPU activities. -- --Guido van Rossum (python.org/~guido) From python at mrabarnett.plus.com Mon Oct 15 00:08:25 2012 From: python at mrabarnett.plus.com (MRAB) Date: Sun, 14 Oct 2012 23:08:25 +0100 Subject: [Python-ideas] Is there a good reason to use * for multiplication? In-Reply-To: References: <11ad8f01-0383-4b64-b0f0-78bb0e2f9308@googlegroups.com> <5078D4CE.1040201@pearwood.info> <507923C4.8040201@pearwood.info> <20121013102229.259572ad@bhuda.mired.org> <5079FC79.3040506@canterbury.ac.nz> <507A01AB.2060708@mrabarnett.plus.com> <20121014145738.57948600@bhuda.mired.org> Message-ID: <507B37D9.6040504@mrabarnett.plus.com> On 2012-10-14 22:06, Joshua Landau wrote: > On 14 October 2012 20:57, Mike Meyer > wrote: > > On Sun, 14 Oct 2012 07:40:57 +0200 > Yuval Greenfield > wrote: > > > On Sun, Oct 14, 2012 at 2:04 AM, MRAB > wrote: > > > > > If it's more than one codepoint, we could prefix with the > length of the > > > codepoint's name: > > > > > > def __12CIRCLED_PLUS__(x, y): > > > ... > > > > > > > > That's a bit impractical, and why reinvent the wheel? I'd much > rather: > > > > def \u2295(x, y): > > .... > > > > So readable I want to read it twice. And that's not legal python > today so > > we don't break backwards compatibility! > > Yes, but we're defining an operator for instances of the class, so it > needs the 'special' method marking: > > def __\u2295__(self, other): > > Now *that's* pretty! > > > > I much preferred your first choice: > def __$?__(self, other): > > But to keep the "$" unused we can write: > def __op_?__(self, other): > (new methods will take precedence over the older __add__ and so forth) > > What we can do then is use the "\u" syntax to let people without unicode > editors have accessibility: > def __op_\u2295__(self, other): > ...later in the code... > new = first \u2295 second > > Which adds consistency whereas before we could only use that in > specific circumstances (inside strings), reducing cognitive burden. > I don't think we should change what happens inside a string literal. Consider what would happen if you wanted to write "\\u0190". It would convert that into "\?". IIRC, Java can suffer from that kind of problem because \uXXXX is treated as that codepoint wherever it occurs. From ben at bendarnell.com Mon Oct 15 00:09:10 2012 From: ben at bendarnell.com (Ben Darnell) Date: Sun, 14 Oct 2012 15:09:10 -0700 Subject: [Python-ideas] The async API of the future: yield-from In-Reply-To: References: <5078F6B1.2030309@canterbury.ac.nz> Message-ID: On Sun, Oct 14, 2012 at 7:36 AM, Guido van Rossum wrote: >> So it would look something like >> >> Yield-from: >> task1 = subtask1(args1) >> task2 = subtask2(args2) >> res1, res2 = yield from par(task1, task2) >> >> where the implementation of par() is left as an exercise for >> the reader. > > So, can par() be as simple as > > def par(*args): > results = [] > for task in args: > result = yield from task > results.append(result) > return results > > ??? > > Or does it need to interact with the scheduler to ensure fairness? 
> (Not having built one of these, my intuition for how the primitives > fit together is still lacking, so excuse me for asking naive > questions.) It's not just fairness, it needs to interact with the scheduler to get any parallelism at all if the sub-generators have more than one step. Consider: def task1(): print "1A" yield print "1B" yield print "1C" # and so on... def task2(): print "2A" yield print "2B" yield print "2C" def outer(): yield from par(task1(), task2()) Both tasks are started immediately, but can't progress further until they are yielded from to advance the iterator. So with this version of par() you get 1A, 2A, 1B, 1C..., 2B, 2C. To get parallelism I think you have to schedule each sub-generator separately instead of just yielding from them (which negates some of the benefits of yield from like easy error handling). Even if there is a clever version of par() that works more like yield from, you'd need to go back to explicit scheduling if you wanted parallel execution without forcing everything to finish at the same time (which is simple with Futures). > > Of course there's the question of what to do when one of the tasks > raises an error -- I haven't quite figured that out in NDB either, it > runs all the tasks to completion but the caller only sees the first > exception. I briefly considered having an "multi-exception" but it > felt too weird -- though I'm not married to that decision. In general for this kind of parallel operation I think it's fine to say that one (unspecified) exception is raised in the outer function and the rest are hidden. With futures, "(r1, r2) = yield (f1, f2)" is just shorthand for "r1 = yield f1; r2 = yield f2", so separating the yields to have separate try/except blocks is no problem. WIth yield from it's not as good because the second operation can't proceed while the outer function is waiting for the first. -Ben From ben at bendarnell.com Mon Oct 15 00:19:27 2012 From: ben at bendarnell.com (Ben Darnell) Date: Sun, 14 Oct 2012 15:19:27 -0700 Subject: [Python-ideas] The async API of the future: Some thoughts from an ignorant Tornado user In-Reply-To: References: Message-ID: On Sat, Oct 13, 2012 at 3:27 PM, Daniel McDougall wrote: > (This is a response to GVR's Google+ post asking for ideas; I > apologize in advance if I come off as an ignorant programming newbie) > > I am the author of Gate One (https://github.com/liftoff/GateOne/) > which makes extensive use of Tornado's asynchronous capabilities. It > also uses multiprocessing and threading to a lesser extent. The > biggest issue I've had trying to write asynchronous code for Gate One > is complexity. Complexity creates problems with expressiveness which > results in code that, to me, feels un-Pythonic. For evidence of this > I present the following example: The retrieve_log_playback() > function: http://bit.ly/W532m6 (link goes to Github) > > All the function does is generate and return (to the client browser) > an HTML playback of their terminal session recording. To do it > efficiently without blocking the event loop or slowing down all other > connected clients required loads of complexity (or maybe I'm just > ignorant of "a better way"--feel free to enlighten me). 
In an ideal > world I could have just done something like this: > > import async # The API of the future ;) > async.async_call(retrieve_log_playback, settings, tws, > mechanism=multiprocessing) > # tws == instance of tornado.web.WebSocketHandler that holds the open connection What you've described is very similar the the concurrent.futures.Executor.submit() method. ProcessPoolExecutor still has multiprocessing's pickle-related limitations, but other than that you're free to create ProcessPoolExecutors and/or ThreadPoolExecutors and submit work to them. Your retrieve_log_playback function could become: # create a global/singleton ProcessPoolExecutor executor = concurrent.futures.ProcessPoolExecutor() def retrieve_log_playback(settings, tws=None): # set up settings dict just like the original io_loop = tornado.ioloop.IOLoop.instance() future = executor.submit(_retrieve_log_playback, settings) def send_message(future): tws.write_message(future.result()) future.add_done_callback(lambda future: io_loop.add_callback(send_message) In Tornado 3.0 there will be some native support for Futures - the last line will probably become "io_loop.add_future(future, send_message)". In _retrieve_log_playback you no longer have a queue argument, and instead just return the result normally. It's also possible to do this just using multiprocessing instead of concurrent.futures - see multiprocessing.Pool.apply_async. -Ben From guido at python.org Mon Oct 15 00:27:43 2012 From: guido at python.org (Guido van Rossum) Date: Sun, 14 Oct 2012 15:27:43 -0700 Subject: [Python-ideas] The async API of the future: yield-from In-Reply-To: References: <5078F6B1.2030309@canterbury.ac.nz> Message-ID: On Sun, Oct 14, 2012 at 3:09 PM, Ben Darnell wrote: > On Sun, Oct 14, 2012 at 7:36 AM, Guido van Rossum wrote: >>> So it would look something like >>> >>> Yield-from: >>> task1 = subtask1(args1) >>> task2 = subtask2(args2) >>> res1, res2 = yield from par(task1, task2) >>> >>> where the implementation of par() is left as an exercise for >>> the reader. >> >> So, can par() be as simple as >> >> def par(*args): >> results = [] >> for task in args: >> result = yield from task >> results.append(result) >> return results >> >> ??? >> >> Or does it need to interact with the scheduler to ensure fairness? >> (Not having built one of these, my intuition for how the primitives >> fit together is still lacking, so excuse me for asking naive >> questions.) > > It's not just fairness, it needs to interact with the scheduler to get > any parallelism at all if the sub-generators have more than one step. > Consider: > > def task1(): > print "1A" > yield > print "1B" > yield > print "1C" > # and so on... > > def task2(): > print "2A" > yield > print "2B" > yield > print "2C" > > def outer(): > yield from par(task1(), task2()) Hm, that's a little unrealistic -- in practice you'll rarely see code that yields unless it is also blocking for I/O. I presume that if both tasks immediately block for I/O, the one whose I/O completes first gets the run next; and if it then blocks again, it'll again depend on whose I/O finishes first. (Admittedly this has little to do with fairness now.) > Both tasks are started immediately, but can't progress further until > they are yielded from to advance the iterator. So with this version > of par() you get 1A, 2A, 1B, 1C..., 2B, 2C. Really? When you call a generator, it doesn't run until the first yield; it gets suspended before the first bytecode of the body. So if anything, you might get 1A, 1B, 1C, 2A, 2B, 2C. 
(Which would prove your point just as much of course.) Sadly I don't have a framework lying around where I can test this easily -- I'm pretty sure that the equivalent code in NDB interacts with the scheduler in a way that ensures round-robin scheduling. > To get parallelism I > think you have to schedule each sub-generator separately instead of > just yielding from them (which negates some of the benefits of yield > from like easy error handling). Honestly I don't mind of the scheduler has to be messy, as long the mess is hidden from the caller. > Even if there is a clever version of par() that works more like yield > from, you'd need to go back to explicit scheduling if you wanted > parallel execution without forcing everything to finish at the same > time (which is simple with Futures). Why wouldn't all generators that aren't blocked for I/O just run until their next yield, in a round-robin fashion? That's fair enough for me. But as I said, my intuition for how things work in Greg's world is not very good. >> Of course there's the question of what to do when one of the tasks >> raises an error -- I haven't quite figured that out in NDB either, it >> runs all the tasks to completion but the caller only sees the first >> exception. I briefly considered having an "multi-exception" but it >> felt too weird -- though I'm not married to that decision. > > In general for this kind of parallel operation I think it's fine to > say that one (unspecified) exception is raised in the outer function > and the rest are hidden. With futures, "(r1, r2) = yield (f1, f2)" is > just shorthand for "r1 = yield f1; r2 = yield f2", so separating the > yields to have separate try/except blocks is no problem. With yield > from it's not as good because the second operation can't proceed while > the outer function is waiting for the first. Hmmm, I think I see your point. This seems to follow if (as Greg insists) you don't have any decorators on the generators. OTOH I am okay with only getting one of the exceptions. But I think all of the remaining tasks should still be run to completion -- maybe the caller just cared about their side effects. Or maybe this should be an option to par(). -- --Guido van Rossum (python.org/~guido) From ericsnowcurrently at gmail.com Mon Oct 15 00:34:08 2012 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Sun, 14 Oct 2012 16:34:08 -0600 Subject: [Python-ideas] The async API of the future: Twisted and Deferreds In-Reply-To: References: Message-ID: On Oct 14, 2012 11:27 AM, "Devin Jeanpierre" wrote: > I did this once, because I needed to rewrite a blocking API and wanted > to use Twisted, except that I made the mistake of starting the thread > when the module was created instead of on first call. This lead to a > deadlock because of the global import lock... :( In principle I don't > know why this would be a terrible awful idea, if it was done right, > but maybe people with more experiences with threaded code can correct > me. > > (The whole thread daemon thing necessary to make it act like a > synchronous program, might be terribly insane and therefore an idea > killer. I'm not sure.) > > I'm under the understanding that the global import lock won't cause > this particular issue anymore as of Python 3.3, so perhaps starting a > reactor on import is reasonable. Yeah, while a global import lock still exists, it's used just long enough to get a per-module lock. 
On top of that, the import system now uses importlib (read: pure Python) for most functionality, which has bearing on threading and ease of better accommodating async if needed. -import -------------- next part -------------- An HTML attachment was scrubbed... URL: From storchaka at gmail.com Mon Oct 15 00:41:01 2012 From: storchaka at gmail.com (Serhiy Storchaka) Date: Mon, 15 Oct 2012 01:41:01 +0300 Subject: [Python-ideas] The async API of the future: yield-from In-Reply-To: References: <5078F6B1.2030309@canterbury.ac.nz> Message-ID: On 14.10.12 22:38, Terry Reedy wrote: > On 10/14/2012 1:42 PM, Guido van Rossum wrote: > > But then the caller would have to sort through the results and check > > for exceptions. I want the caller to be able to use try/except as > > well. > OK. Then ... def par(*args): results = [] exceptions = False for task in args: try: result = yield from task if exceptions: results.append(StopIteration(result)) else: results.append(result) except Exception as exc: results = [StopIteration(result) for result in results] results.append(exc) exceptions = True if not exceptions: return results else: exc = MultiXException() exc.results = results raise exc From joshua.landau.ws at gmail.com Mon Oct 15 00:42:09 2012 From: joshua.landau.ws at gmail.com (Joshua Landau) Date: Sun, 14 Oct 2012 23:42:09 +0100 Subject: [Python-ideas] Is there a good reason to use * for multiplication? In-Reply-To: <507B37D9.6040504@mrabarnett.plus.com> References: <11ad8f01-0383-4b64-b0f0-78bb0e2f9308@googlegroups.com> <5078D4CE.1040201@pearwood.info> <507923C4.8040201@pearwood.info> <20121013102229.259572ad@bhuda.mired.org> <5079FC79.3040506@canterbury.ac.nz> <507A01AB.2060708@mrabarnett.plus.com> <20121014145738.57948600@bhuda.mired.org> <507B37D9.6040504@mrabarnett.plus.com> Message-ID: On 14 October 2012 23:08, MRAB wrote: > On 2012-10-14 22:06, Joshua Landau wrote: > >> On 14 October 2012 20:57, Mike Meyer > > wrote: >> >> On Sun, 14 Oct 2012 07:40:57 +0200 >> Yuval Greenfield > **> wrote: >> >> > On Sun, Oct 14, 2012 at 2:04 AM, MRAB > >> >> wrote: >> > >> > > If it's more than one codepoint, we could prefix with the >> length of the >> > > codepoint's name: >> > > >> > > def __12CIRCLED_PLUS__(x, y): >> > > ... >> > > >> > > >> > That's a bit impractical, and why reinvent the wheel? I'd much >> rather: >> > >> > def \u2295(x, y): >> > .... >> > >> > So readable I want to read it twice. And that's not legal python >> today so >> > we don't break backwards compatibility! >> >> Yes, but we're defining an operator for instances of the class, so it >> needs the 'special' method marking: >> >> def __\u2295__(self, other): >> >> Now *that's* pretty! >> >> > >> >> I much preferred your first choice: >> def __$?__(self, other): >> >> But to keep the "$" unused we can write: >> def __op_?__(self, other): >> (new methods will take precedence over the older __add__ and so forth) >> >> What we can do then is use the "\u" syntax to let people without unicode >> editors have accessibility: >> def __op_\u2295__(self, other): >> ...later in the code... >> new = first \u2295 second >> >> Which adds consistency whereas before we could only use that in >> specific circumstances (inside strings), reducing cognitive burden. >> >> I don't think we should change what happens inside a string literal. > > Consider what would happen if you wanted to write "\\u0190". It would > convert that into "\?". > > IIRC, Java can suffer from that kind of problem because \uXXXX is > treated as that codepoint wherever it occurs. 
No, no. "\\" would have priority, still. "\\uXXXX" is invalid outside of a string, anyway, so we're allowed to say that. -------------- next part -------------- An HTML attachment was scrubbed... URL: From greg.ewing at canterbury.ac.nz Mon Oct 15 00:45:23 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Mon, 15 Oct 2012 11:45:23 +1300 Subject: [Python-ideas] re-implementing Twisted for fun and profit In-Reply-To: <11A498C1-9ACE-4F64-8147-D3CE173EC279@umbrellacode.com> References: <40862DD9-DF71-4280-A47F-B20E7E742254@twistedmatrix.com> <507A4DA1.2070701@canterbury.ac.nz> <11A498C1-9ACE-4F64-8147-D3CE173EC279@umbrellacode.com> Message-ID: <507B4083.5050002@canterbury.ac.nz> Shane Green wrote: >> On 14/10/2012 6:29am, Greg Ewing wrote: >> >>> Once it has reported that >>> a given file descriptor is ready, it *won't* report that file >>> descriptor again until you do something with it. >> Unless I have misunderstood you, the following example contradicts that: It does indeed contradict me. It looks like this is implementation-dependent, because I distinctly remember encountering a bug once that I traced back to the fact that I wasn't servicing *all* the fds reported as ready before making another select call. Since then I've always been careful to do that, so it's possible that the behaviour has changed in the meantime and I haven't noticed. -- Greg From ben at bendarnell.com Mon Oct 15 00:55:46 2012 From: ben at bendarnell.com (Ben Darnell) Date: Sun, 14 Oct 2012 15:55:46 -0700 Subject: [Python-ideas] The async API of the future: yield-from In-Reply-To: References: <5078F6B1.2030309@canterbury.ac.nz> Message-ID: On Sun, Oct 14, 2012 at 3:27 PM, Guido van Rossum wrote: > On Sun, Oct 14, 2012 at 3:09 PM, Ben Darnell wrote: >> On Sun, Oct 14, 2012 at 7:36 AM, Guido van Rossum wrote: >>>> So it would look something like >>>> >>>> Yield-from: >>>> task1 = subtask1(args1) >>>> task2 = subtask2(args2) >>>> res1, res2 = yield from par(task1, task2) >>>> >>>> where the implementation of par() is left as an exercise for >>>> the reader. >>> >>> So, can par() be as simple as >>> >>> def par(*args): >>> results = [] >>> for task in args: >>> result = yield from task >>> results.append(result) >>> return results >>> >>> ??? >>> >>> Or does it need to interact with the scheduler to ensure fairness? >>> (Not having built one of these, my intuition for how the primitives >>> fit together is still lacking, so excuse me for asking naive >>> questions.) >> >> It's not just fairness, it needs to interact with the scheduler to get >> any parallelism at all if the sub-generators have more than one step. >> Consider: >> >> def task1(): >> print "1A" >> yield >> print "1B" >> yield >> print "1C" >> # and so on... >> >> def task2(): >> print "2A" >> yield >> print "2B" >> yield >> print "2C" >> >> def outer(): >> yield from par(task1(), task2()) > > Hm, that's a little unrealistic -- in practice you'll rarely see code > that yields unless it is also blocking for I/O. I presume that if both > tasks immediately block for I/O, the one whose I/O completes first > gets the run next; and if it then blocks again, it'll again depend on > whose I/O finishes first. > > (Admittedly this has little to do with fairness now.) > >> Both tasks are started immediately, but can't progress further until >> they are yielded from to advance the iterator. So with this version >> of par() you get 1A, 2A, 1B, 1C..., 2B, 2C. > > Really? 
When you call a generator, it doesn't run until the first > yield; it gets suspended before the first bytecode of the body. So if > anything, you might get 1A, 1B, 1C, 2A, 2B, 2C. (Which would prove > your point just as much of course.) Ah, OK. I was mistaken about the "first yield" part, but the rest stands. The problem is that as soon as task1 blocks on IO, the entire current task (which includes outer(), par(), and both children) gets unscheduled. no part of task2 gets scheduled until it gets yielded from, because the scheduler can't see it until then. > > Sadly I don't have a framework lying around where I can test this > easily -- I'm pretty sure that the equivalent code in NDB interacts > with the scheduler in a way that ensures round-robin scheduling. > >> To get parallelism I >> think you have to schedule each sub-generator separately instead of >> just yielding from them (which negates some of the benefits of yield >> from like easy error handling). > > Honestly I don't mind of the scheduler has to be messy, as long the > mess is hidden from the caller. Agreed. > >> Even if there is a clever version of par() that works more like yield >> from, you'd need to go back to explicit scheduling if you wanted >> parallel execution without forcing everything to finish at the same >> time (which is simple with Futures). > > Why wouldn't all generators that aren't blocked for I/O just run until > their next yield, in a round-robin fashion? That's fair enough for me. > > But as I said, my intuition for how things work in Greg's world is not > very good. The good and bad parts of this proposal both stem from the fact that yield from is very similar to just inlining everything together. This gives you the exception handling semantics that you expect from synchronous code, but it means that the scheduler can't distinguish between subtasks; you have to explicitly schedule them as top-level tasks. > >>> Of course there's the question of what to do when one of the tasks >>> raises an error -- I haven't quite figured that out in NDB either, it >>> runs all the tasks to completion but the caller only sees the first >>> exception. I briefly considered having an "multi-exception" but it >>> felt too weird -- though I'm not married to that decision. >> >> In general for this kind of parallel operation I think it's fine to >> say that one (unspecified) exception is raised in the outer function >> and the rest are hidden. With futures, "(r1, r2) = yield (f1, f2)" is >> just shorthand for "r1 = yield f1; r2 = yield f2", so separating the >> yields to have separate try/except blocks is no problem. With yield >> from it's not as good because the second operation can't proceed while >> the outer function is waiting for the first. > > Hmmm, I think I see your point. This seems to follow if (as Greg > insists) you don't have any decorators on the generators. > > OTOH I am okay with only getting one of the exceptions. But I think > all of the remaining tasks should still be run to completion -- maybe > the caller just cared about their side effects. Or maybe this should > be an option to par(). That's probably a good idea. 
-Ben > > -- > --Guido van Rossum (python.org/~guido) From rene at stranden.com Mon Oct 15 01:08:42 2012 From: rene at stranden.com (Rene Nejsum) Date: Mon, 15 Oct 2012 01:08:42 +0200 Subject: [Python-ideas] The async API of the future: Reactors In-Reply-To: References: <7E7218BC-A367-4C2B-9B5D-C51BBA2FBEB3@stranden.com> Message-ID: <55EF4C53-D922-42FC-8863-EED452BF4093@stranden.com> On Oct 15, 2012, at 12:05 AM, Guido van Rossum wrote: > On Sun, Oct 14, 2012 at 2:55 PM, Rene Nejsum wrote: >> >> On Oct 14, 2012, at 9:22 PM, Guido van Rossum wrote: >> >>> On Sun, Oct 14, 2012 at 10:51 AM, Rene Nejsum wrote: >>>> On the high level (Python) basically what you need is that the queue.get() >>>> can handle: >>>> 1) Python objects (as today) >>>> 2) timeout (as today, maybe in mills instead of seconds) >>>> 3) Network (socket input/state change) >>>> 4) File desc input/state change >>>> 5) Other I/O changes like serial comm, etc. >>>> 6) Maybe also yield based coroutine support ? >>>> >>>> This requires support from the underlaying >>>> OS. A support which is probably not there today ? >>>> >>>> As far as I can see, having this one extended queue.get() would nicely enable >>>> all high level concurrency issues in Python. >>> >>> [...] >>> >>>> I believe a "super" queue.get() would solve all use cases. >>>> >>>> I have no idea on how difficult it would be to implement in >>>> a cross platform manner. >>> >>> Hm. I know that a common (and often right!) recommendation for thread >>> communication is to use the queue module. But that module is meant to >>> work with threads. I think that the correct I/O primitives are more >>> likely to come by looking at what Tornado and Twisted have done than >>> by trying to "pimp up" the queue module -- it's good for what it does, >>> but trying to add all that new functionality to it doesn't sound like >>> a good fit. >> >> You are probably right about the queue class. Maybe it should be a new class, >> but I still believe I would be an excellent fit for doing concurrent stuff if Python >> had a multiplexer message queue, Python is high-level enough to be able to >> hide thread/select/read etc. > > I believe that the Twisted and Tornado event loops have APIs to push > work into a thread and/or process, and it will be a requirement for > the new stdlib event loop. However the main focus of the current > effort is not making the distinction between process, threads and > tasks (or microthreads or coroutines) disappear -- it is simply to > have the most useful API for tasks. > >> A while ago I implemented pyworks (bitbucket.org/raindog/pyworks) which >> is a kind of Erlang implementation for Python, making objects concurrent and return >> values Futures, without adding much new code. Methods are sent asynchronous, simply >> by doing standard obj.method(). obj is a proxy for the real object sending method() as a >> message to the real object running in a separate thread. Return value is a Future. So >> you can do >> >> val = obj.method() >> ? continue async with method() >> ? and do some other stuff, until: >> print val >> >> which will hang waiting for the Future to complete, if it's not. > > That sounds like implicit futures (to use the Wikipedia article's > terminology). I'm not a big fan of that. In fact, I'm proposing an API > where all task switching is explicit, using the yield keyword (or > yield from), and accessing the value of a future is also explicit in > such a system. You are right, it's implicit. 
And I think I understand your concern: how much should be hidden/implicit and how much should be left to the programmer. IMHO Python is such an excellent tool mainly because it hides a lot of details. Things like memory management, GC, threads and concurrency should be (and - I believe - can be) hidden from the developer. >> It has been used in a couple of projects, making it much easier to do concurrent systems. >> But, it would be great if the object/task could wait for more events than queue.get() > > I still think you're focused more on concurrent CPU activity than > async I/O. These are quite different fields, even though they often > use similar terminology (like future, task/thread/process, > concurrent/parallel, spawn/join, queue). I think the keyword that most > distinguishes them is "event". If you hear people talk about events > they are probably multiplexing I/O, not CPU activities. Yes and No.
My field of concurrency and IO is process control, like > controlling high speed sorting machines with a lot of IO from 24V inputs, > scanners, scales, OCR, serial ports, etc. So for me it's a combination of concurrent IO, > state and parallelism (concurrent CPU). when you have an async (I/O) event, > you need some kind of concurrency to handle it at the next level. > It is difficult to do concurrent CPU activity without events, even if > they are only signal events on a semaphore. Can you do it with threads? Because if threads serve your purpose, they are probably easier to use than the async API we're considering here, especially given your desire to hide unnecessary details. The async APIs under consideration (Twisted, Tornado, coroutines) all intentionally makes task switching explicit. You may also consider greenlets/gevent, which is a compromise that makes task-switching semi-explicit -- only certain calls cause task switches, but those calls may be hidden inside other calls (or even overloaded operations like __getattr__). > One difference from ex. web servers is that we at design time, knows > exactly who many tasks we need and what the maximum load is going > to be. Typical between 50 to 100 tasks/threads sending messages to > each other. That does sound like threads are just fine for you. Of course you may have to craft your own synchronization primitives out of the lower-level locks and queues offered by the stdlib... -- --Guido van Rossum (python.org/~guido) From guido at python.org Mon Oct 15 01:28:56 2012 From: guido at python.org (Guido van Rossum) Date: Sun, 14 Oct 2012 16:28:56 -0700 Subject: [Python-ideas] The async API of the future: yield-from In-Reply-To: References: <5078F6B1.2030309@canterbury.ac.nz> Message-ID: On Sun, Oct 14, 2012 at 3:55 PM, Ben Darnell wrote: > Ah, OK. I was mistaken about the "first yield" part, but the rest > stands. The problem is that as soon as task1 blocks on IO, the entire > current task (which includes outer(), par(), and both children) gets > unscheduled. no part of task2 gets scheduled until it gets yielded > from, because the scheduler can't see it until then. Ah, yes. I had forgotten that the whole stack (at least all frames currently blocked in yield-from) is suspended. I really hope that Greg has a working implementation of par(). [...] > The good and bad parts of this proposal both stem from the fact that > yield from is very similar to just inlining everything together. This > gives you the exception handling semantics that you expect from > synchronous code, but it means that the scheduler can't distinguish > between subtasks; you have to explicitly schedule them as top-level > tasks. I'm beginning to see that. Thanks for helping me form my intuition about how this stuff works! -- --Guido van Rossum (python.org/~guido) From python at mrabarnett.plus.com Mon Oct 15 01:46:23 2012 From: python at mrabarnett.plus.com (MRAB) Date: Mon, 15 Oct 2012 00:46:23 +0100 Subject: [Python-ideas] Is there a good reason to use * for multiplication? 
In-Reply-To: References: <11ad8f01-0383-4b64-b0f0-78bb0e2f9308@googlegroups.com> <5078D4CE.1040201@pearwood.info> <507923C4.8040201@pearwood.info> <20121013102229.259572ad@bhuda.mired.org> <5079FC79.3040506@canterbury.ac.nz> <507A01AB.2060708@mrabarnett.plus.com> <20121014145738.57948600@bhuda.mired.org> <507B37D9.6040504@mrabarnett.plus.com> Message-ID: <507B4ECF.8060406@mrabarnett.plus.com> On 2012-10-14 23:42, Joshua Landau wrote: > On 14 October 2012 23:08, MRAB > wrote: > > On 2012-10-14 22:06, Joshua Landau wrote: > > On 14 October 2012 20:57, Mike Meyer > >> wrote: > > On Sun, 14 Oct 2012 07:40:57 +0200 > Yuval Greenfield > >__> wrote: > > > On Sun, Oct 14, 2012 at 2:04 AM, MRAB > > >> wrote: > > > > > If it's more than one codepoint, we could prefix with the > length of the > > > codepoint's name: > > > > > > def __12CIRCLED_PLUS__(x, y): > > > ... > > > > > > > > That's a bit impractical, and why reinvent the wheel? > I'd much > rather: > > > > def \u2295(x, y): > > .... > > > > So readable I want to read it twice. And that's not > legal python > today so > > we don't break backwards compatibility! > > Yes, but we're defining an operator for instances of the > class, so it > needs the 'special' method marking: > > def __\u2295__(self, other): > > Now *that's* pretty! > > > > I much preferred your first choice: > def __$?__(self, other): > > But to keep the "$" unused we can write: > def __op_?__(self, other): > (new methods will take precedence over the older __add__ and so > forth) > > What we can do then is use the "\u" syntax to let people without > unicode > editors have accessibility: > def __op_\u2295__(self, other): > ...later in the code... > new = first \u2295 second > > Which adds consistency whereas before we could only use that in > specific circumstances (inside strings), reducing cognitive burden. > > I don't think we should change what happens inside a string literal. > > Consider what would happen if you wanted to write "\\u0190". It would > convert that into "\?". > > IIRC, Java can suffer from that kind of problem because \uXXXX is > treated as that codepoint wherever it occurs. > > > No, no. "\\" would have priority, still. "\\uXXXX" is invalid outside of > a string, anyway, so we're allowed to say that. > OK, but what about raw string literals? Currently, "\\u0190" == r"\u0190", but "\\u0190" != r"?". From greg.ewing at canterbury.ac.nz Mon Oct 15 01:49:49 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Mon, 15 Oct 2012 12:49:49 +1300 Subject: [Python-ideas] The async API of the future: yield-from In-Reply-To: References: <5078F6B1.2030309@canterbury.ac.nz> Message-ID: <507B4F9D.6040701@canterbury.ac.nz> Guido van Rossum wrote: > On Fri, Oct 12, 2012 at 10:05 PM, Greg Ewing > wrote: >>You could go further and say that yielding a tuple of generators >>means to spawn them all concurrently, wait for them all to >>complete and send back a tuple of the results. The yield-from >>code would then look pretty much the same as the futures code. > > Sadly it looks that > > r = yield from (f1(), f2()) > > ends up interpreting the tuple as the iterator, That's not yielding a tuple of generators. This is: r = yield (f1(), f2()) Note the absence of 'from'. > So, can par() be as simple as > > def par(*args): > results = [] > for task in args: > result = yield from task > results.append(result) > return results No, it can't be as simple as that, because that will just execute the tasks sequentially. 
It would have to be something like this: def par(*tasks): n = len(tasks) results = [None] * n for i, task in enumerate(tasks): def thunk(): nonlocal n results[i] = yield from task n -= 1 scheduler.schedule(thunk) while n > 0: yield return results Not exactly straightforward, but that's why we write it once and put it in the library. :-) > Of course there's the question of what to do when one of the tasks > raises an error -- I haven't quite figured that out in NDB either, it > runs all the tasks to completion but the caller only sees the first > exception. I briefly considered having an "multi-exception" but it > felt too weird -- though I'm not married to that decision. Hmmm. Probably what should happen is that all the other tasks get cancelled and then the exception gets propagated to the caller of par(). If we assume another couple of primitives: scheduler.cancel(task) -- cancels the task scheduler.throw(task, exc) -- raises an exception in the task then we could implement it this way: def par(*tasks): n = len(tasks) results = [None] * n this = scheduler.current_task for i, task in enumerate(tasks): def thunk(): nonlocal n try: results[i] = yield from task except BaseException as e: for t in tasks: scheduler.cancel(t) scheduler.throw(this, e) n -= 1 scheduler.schedule(thunk) while n > 0: yield return results >>>(10) Registering additional callbacks While we're at it: class task_with_callbacks(): def __init__(self, task): self.task = task self.callbacks = [] def add_callback(self, cb): self.callbacks.append(cb) def run(self): result = yield from self.task for cb in self.callbacks: cb() return result > Here's another pattern that I can't quite figure out. ... > Essentially, it's a barrier pattern where multiple tasks (each > representing a different HTTP request, and thus not all starting at > the same time) render a partial web page and then block until a new > HTTP request comes in that provides the missing info. This should be fairly straightforward. waiters = [] # Tasks waiting for the event When a task wants to wait: scheduler.block(waiters) When the event occurs: for t in waiters: scheduler.schedule(t) del waiters[:] Incidentally, this is a commonly encountered pattern known as a "condition queue" in IPC parlance. I envisage that the async library would provide encapsulations of this and other standard IPC mechanisms such as mutexes, semaphores, channels, etc. -- Greg From shibturn at gmail.com Mon Oct 15 01:55:45 2012 From: shibturn at gmail.com (Richard Oudkerk) Date: Mon, 15 Oct 2012 00:55:45 +0100 Subject: [Python-ideas] re-implementing Twisted for fun and profit In-Reply-To: <507B4083.5050002@canterbury.ac.nz> References: <40862DD9-DF71-4280-A47F-B20E7E742254@twistedmatrix.com> <507A4DA1.2070701@canterbury.ac.nz> <11A498C1-9ACE-4F64-8147-D3CE173EC279@umbrellacode.com> <507B4083.5050002@canterbury.ac.nz> Message-ID: On 14/10/2012 11:45pm, Greg Ewing wrote: > It does indeed contradict me. It looks like this is > implementation-dependent, because I distinctly remember > encountering a bug once that I traced back to the fact > that I wasn't servicing *all* the fds reported as ready > before making another select call. Could it have been that some fds were being starved because the earlier ones in the lists were getting priority? Servicing all fds reported prevents such starvation problems. 
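In sketch form (an assumed handler-map API, not asyncore's own interface), the non-starving pattern is simply:

import select

def poll_once(handlers, timeout=1.0):
    # handlers maps fd -> callback
    ready, _, _ = select.select(list(handlers), [], [], timeout)
    for fd in ready:   # service *every* ready fd before selecting again
        handlers[fd](fd)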
-- Richard From joshua.landau.ws at gmail.com Mon Oct 15 02:12:45 2012 From: joshua.landau.ws at gmail.com (Joshua Landau) Date: Mon, 15 Oct 2012 01:12:45 +0100 Subject: [Python-ideas] Is there a good reason to use * for multiplication? In-Reply-To: <507B4ECF.8060406@mrabarnett.plus.com> References: <11ad8f01-0383-4b64-b0f0-78bb0e2f9308@googlegroups.com> <5078D4CE.1040201@pearwood.info> <507923C4.8040201@pearwood.info> <20121013102229.259572ad@bhuda.mired.org> <5079FC79.3040506@canterbury.ac.nz> <507A01AB.2060708@mrabarnett.plus.com> <20121014145738.57948600@bhuda.mired.org> <507B37D9.6040504@mrabarnett.plus.com> <507B4ECF.8060406@mrabarnett.plus.com> Message-ID: On 15 October 2012 00:46, MRAB wrote: > OK, but what about raw string literals? Currently, "\\u0190" == > r"\u0190", but "\\u0190" != r"?". The ?r"? prefix escapes all escapes, so will escape this escape too. Hence, this behaviour is un...escaped ;). -------------- next part -------------- An HTML attachment was scrubbed... URL: From greg.ewing at canterbury.ac.nz Mon Oct 15 02:23:52 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Mon, 15 Oct 2012 13:23:52 +1300 Subject: [Python-ideas] The async API of the future: Twisted and Deferreds In-Reply-To: References: Message-ID: <507B5798.4080603@canterbury.ac.nz> Guido van Rossum wrote: > The thing that worries me most is reimplementing httplib, urllib and > so on to use all this new machinery *and* keep the old synchronous > APIs working *even* if some code is written using the old style and > some other code wants to use the new style. I think this could be handled the same way you alluded to before when talking about the App Engine. The base implementation is asynchronous, and you provide a synchronous API that sets up an async operation and then runs a nested event loop until it completes. -- Greg From guido at python.org Mon Oct 15 02:35:25 2012 From: guido at python.org (Guido van Rossum) Date: Sun, 14 Oct 2012 17:35:25 -0700 Subject: [Python-ideas] The async API of the future: yield-from In-Reply-To: <507B4F9D.6040701@canterbury.ac.nz> References: <5078F6B1.2030309@canterbury.ac.nz> <507B4F9D.6040701@canterbury.ac.nz> Message-ID: On Sun, Oct 14, 2012 at 4:49 PM, Greg Ewing wrote: > Guido van Rossum wrote: >> >> On Fri, Oct 12, 2012 at 10:05 PM, Greg Ewing >> wrote: > > >>> You could go further and say that yielding a tuple of generators >>> means to spawn them all concurrently, wait for them all to >>> complete and send back a tuple of the results. The yield-from >>> code would then look pretty much the same as the futures code. >> >> >> Sadly it looks that >> >> r = yield from (f1(), f2()) >> >> ends up interpreting the tuple as the iterator, > > > That's not yielding a tuple of generators. This is: > > r = yield (f1(), f2()) > > Note the absence of 'from'. That's what I meant -- excuse me for not writing "yield-fromming". :-) >> So, can par() be as simple as >> >> def par(*args): >> results = [] >> for task in args: >> result = yield from task >> results.append(result) >> return results > > > No, it can't be as simple as that, because that will just > execute the tasks sequentially. Yeah, Ben just cleared that up for me. 
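A toy illustration (an assumed driver, not anyone's actual framework) makes the sequential behaviour easy to see:

def task(name, steps=2):
    for i in range(steps):
        print(name, "step", i)
        yield

def naive_par(*tasks):
    results = []
    for t in tasks:
        results.append((yield from t))  # pinned to t until it is exhausted
    return results

def run(gen):
    for _ in gen:   # stand-in scheduler: just step the generator
        pass

run(naive_par(task("A"), task("B")))
# prints A step 0, A step 1, B step 0, B step 1 -- no interleaving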
> It would have to be something like this: > > def par(*tasks): > n = len(tasks) > results = [None] * n > for i, task in enumerate(tasks): > def thunk(): > nonlocal n > results[i] = yield from task > n -= 1 > scheduler.schedule(thunk) > while n > 0: > yield > return results > > Not exactly straightforward, but that's why we write it once > and put it in the library. :-) But, as Christian Tismer wrote, we need to have some kind of idea of what the primitives are that we want to support. Or should we just have async equivalents for everything in threading.py and queue.py? (What about thread-local? Do we need task-local? Shudder.) >> Of course there's the question of what to do when one of the tasks >> raises an error -- I haven't quite figured that out in NDB either, it >> runs all the tasks to completion but the caller only sees the first >> exception. I briefly considered having an "multi-exception" but it >> felt too weird -- though I'm not married to that decision. > > > Hmmm. Probably what should happen is that all the other tasks > get cancelled and then the exception gets propagated to the > caller of par(). I think it ought to be at least an option to run them all to completion -- I can easily imagine use cases for that. Also for wanting to receive a list of exceptions. A practical par() may have to grow a few options... > If we assume another couple of primitives: > > scheduler.cancel(task) -- cancels the task > > scheduler.throw(task, exc) -- raises an exception in the task > > then we could implement it this way: > > def par(*tasks): > n = len(tasks) > results = [None] * n > this = scheduler.current_task > for i, task in enumerate(tasks): > def thunk(): > nonlocal n > try: > results[i] = yield from task > except BaseException as e: > for t in tasks: > scheduler.cancel(t) > scheduler.throw(this, e) > n -= 1 > scheduler.schedule(thunk) > while n > 0: > yield > return results I glazed over here but I trust you. >>>> (10) Registering additional callbacks > > > While we're at it: > > class task_with_callbacks(): > > def __init__(self, task): > self.task = task > self.callbacks = [] > > def add_callback(self, cb): > self.callbacks.append(cb) > > def run(self): > result = yield from self.task > for cb in self.callbacks: > cb() > return result Nice. (In fact so simple that maybe users can craft this for themselves?) >> Here's another pattern that I can't quite figure out. ... >> >> Essentially, it's a barrier pattern where multiple tasks (each >> representing a different HTTP request, and thus not all starting at >> the same time) render a partial web page and then block until a new >> HTTP request comes in that provides the missing info. > > > This should be fairly straightforward. > > waiters = [] # Tasks waiting for the event > > When a task wants to wait: > > scheduler.block(waiters) > > When the event occurs: > > for t in waiters: > scheduler.schedule(t) > del waiters[:] > > Incidentally, this is a commonly encountered pattern known as a > "condition queue" in IPC parlance. I envisage that the async > library would provide encapsulations of this and other standard > IPC mechanisms such as mutexes, semaphores, channels, etc. Maybe you meant condition variable? It looks like threading.Condition with notify_all(). Anyway, I agree we need some primitives like these, but I'm not sure how to choose the set of essentials. 
-- --Guido van Rossum (python.org/~guido) From python at mrabarnett.plus.com Mon Oct 15 02:35:51 2012 From: python at mrabarnett.plus.com (MRAB) Date: Mon, 15 Oct 2012 01:35:51 +0100 Subject: [Python-ideas] Is there a good reason to use * for multiplication? In-Reply-To: References: <11ad8f01-0383-4b64-b0f0-78bb0e2f9308@googlegroups.com> <5078D4CE.1040201@pearwood.info> <507923C4.8040201@pearwood.info> <20121013102229.259572ad@bhuda.mired.org> <5079FC79.3040506@canterbury.ac.nz> <507A01AB.2060708@mrabarnett.plus.com> <20121014145738.57948600@bhuda.mired.org> <507B37D9.6040504@mrabarnett.plus.com> <507B4ECF.8060406@mrabarnett.plus.com> Message-ID: <507B5A67.3020306@mrabarnett.plus.com> On 2012-10-15 01:12, Joshua Landau wrote: > On 15 October 2012 00:46, MRAB > wrote: > > OK, but what about raw string literals? Currently, "\\u0190" == > r"\u0190", but "\\u0190" != r"?". > > > The ?r"? prefix escapes all escapes, so will escape this escape too. > Hence, this behaviour is un...escaped ;). > If "\u0190" becomes "?", what happens to "\u000A"? Currently it's legal. :-) From shane at umbrellacode.com Mon Oct 15 03:50:56 2012 From: shane at umbrellacode.com (Shane Green) Date: Sun, 14 Oct 2012 18:50:56 -0700 Subject: [Python-ideas] The async API of the future: Twisted and Deferreds In-Reply-To: <507B5798.4080603@canterbury.ac.nz> References: <507B5798.4080603@canterbury.ac.nz> Message-ID: Okay, I hate to do this, but is there any chance someone can provide a quick summary of the solution we're referring to here? I just started watching python-ideas today, and have a lot of things going on, plus real bad ADD, so I'm having a hard time reassembling the solutions being referred to? (maybe there's a web presentation that gives a better threaded presentation than my mail program? Or maybe I'm daff. Either way, this sounded interesting!) In summary, then, the Q/A below is referring to which approach? Shane Green www.umbrellacode.com 805-452-9666 | shane at umbrellacode.com On Oct 14, 2012, at 5:23 PM, Greg Ewing wrote: > Guido van Rossum wrote: > >> The thing that worries me most is reimplementing httplib, urllib and >> so on to use all this new machinery *and* keep the old synchronous >> APIs working *even* if some code is written using the old style and >> some other code wants to use the new style. > > I think this could be handled the same way you alluded to > before when talking about the App Engine. The base implementation > is asynchronous, and you provide a synchronous API that sets > up an async operation and then runs a nested event loop until > it completes. > > -- > Greg > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas -------------- next part -------------- An HTML attachment was scrubbed... URL: From joshua.landau.ws at gmail.com Mon Oct 15 04:05:30 2012 From: joshua.landau.ws at gmail.com (Joshua Landau) Date: Mon, 15 Oct 2012 03:05:30 +0100 Subject: [Python-ideas] Is there a good reason to use * for multiplication? 
In-Reply-To: <507B5A67.3020306@mrabarnett.plus.com> References: <11ad8f01-0383-4b64-b0f0-78bb0e2f9308@googlegroups.com> <5078D4CE.1040201@pearwood.info> <507923C4.8040201@pearwood.info> <20121013102229.259572ad@bhuda.mired.org> <5079FC79.3040506@canterbury.ac.nz> <507A01AB.2060708@mrabarnett.plus.com> <20121014145738.57948600@bhuda.mired.org> <507B37D9.6040504@mrabarnett.plus.com> <507B4ECF.8060406@mrabarnett.plus.com> <507B5A67.3020306@mrabarnett.plus.com> Message-ID: On 15 October 2012 01:35, MRAB wrote: > On 2012-10-15 01:12, Joshua Landau wrote: > >> On 15 October 2012 00:46, MRAB > >> >> wrote: >> >> OK, but what about raw string literals? Currently, "\\u0190" == >> r"\u0190", but "\\u0190" != r"?". >> >> >> The ?r"? prefix escapes all escapes, so will escape this escape too. >> Hence, this behaviour is un...escaped ;). >> >> If "\u0190" becomes "?", what happens to "\u000A"? Currently it's > legal. :-) The python interpreter could distinguish between its morphed Unicode escapes and the originals - the escapes would never match against already-syntactically-relevant constructs*. Hence "a \u0069s b" is equivalent to "a i\u0073 b" but *not* "a is b": the first two are defined by __op_is__ and the last is just the "is" keyword. Hence, \u000A would just act like a character, and be definable as an operator, and have little to do with the newline character. Nice try, but the proposal stands firm. * Except, of course, the old operators which will be phased into the new mechanism. -------------- next part -------------- An HTML attachment was scrubbed... URL: From glyph at twistedmatrix.com Mon Oct 15 04:07:11 2012 From: glyph at twistedmatrix.com (Glyph) Date: Sun, 14 Oct 2012 19:07:11 -0700 Subject: [Python-ideas] re-implementing Twisted for fun and profit In-Reply-To: References: <40862DD9-DF71-4280-A47F-B20E7E742254@twistedmatrix.com> Message-ID: <42AC178D-A7E1-47D7-8B83-F2F6B390BE1C@twistedmatrix.com> On Oct 13, 2012, at 9:49 PM, Guido van Rossum wrote: > It seems that peraps the 'data_received' interface is the most important one to standardize (for the event loop); I can imagine many variations on the handle_read() implementation, and there would be different ones for IOCP, SSL[1], and probably others. The stdlib should have good ones for the common platforms but it should be designed to allow people who know better to hook up their own implementation. Hopefully I'll have time to reply to some of the other stuff in this message, but: Yes, absolutely. This is the most important core issue, for me. There's a little more to it than "data_received" (for example: listening for incoming connections, establishing outgoing connections, and scheduling timed calls) but this was the original motivation for the async PEP: to specify this interface. Again, I'll have to kick the appropriate people to try to get that started again. (Already started, at .) It's on github so anyone can contribute, so if other participants in this thread - especially those of you with connections to the Tornado community - would like to try fleshing some of it out, please go ahead. Even if you just have a question, or an area you think the PEP should address, file an issue (or comment on one already filed). > (Thanks for writing this; this is the kind of insight I am hoping to get from you and others.) Thanks for the repeated prompts for Twisted representatives to participate. 
I was somewhat reluctant to engage with this thread at first, because it looked like a lot of meandering discussion of how to implement stuff that Twisted already deals with quite effectively and I wasn't sure what the point of it all was - why not just go look at Twisted's implementation? But, after writing that message, I'm glad I did, since I realize that many of these insights are not recorded anywhere and in many cases there's no reasonable way to back this information out of Twisted's implementation. In my (ha ha) copious spare time, I'll try to do some blogging about these topics. -glyph [1]: With one minor nitpick: IOCP and SSL should not be mutually exclusive. This was a problem for Twisted for a while, given the absolutely stupid way that OpenSSL thinks "sockets" work; but, we now have which could probably be adapted to be framework-neutral if this transport/event-loop level were standardized. -------------- next part -------------- An HTML attachment was scrubbed... URL: From shane at umbrellacode.com Mon Oct 15 04:11:23 2012 From: shane at umbrellacode.com (Shane Green) Date: Sun, 14 Oct 2012 19:11:23 -0700 Subject: [Python-ideas] re-implementing Twisted for fun and profit In-Reply-To: References: <40862DD9-DF71-4280-A47F-B20E7E742254@twistedmatrix.com> <507A4DA1.2070701@canterbury.ac.nz> <11A498C1-9ACE-4F64-8147-D3CE173EC279@umbrellacode.com> <507B4083.5050002@canterbury.ac.nz> Message-ID: <1CD39FC3-9996-4FF4-B56F-8B513226868B@umbrellacode.com> There are definitely bugs and system-dependent behaviour* in these areas. Just a couple years ago I ran into one that lead to me writing a "handle_expt()" method with this comment before it: > ##### > # Semi-crazy method that is working around a sort-of bug within > # asyncore. When using select-based I/O multiplexing, the POLLHUP > # the socket state is indicated by the socket becoming readable, > # and not by indicating an exceptional event. > # > # When using POLL instead, the flag returned indicates precisely > # what the state is because "flags & select.POLLHUP" will be true. > # > # In the former case, when using select-based I/O multiplexing, > # select's indication that the the descriptor has become readable > # leads to the channel's handle read event method being invoked. > # Invoking receive on the socket then returns an empty string, > # which is taken by the channel as an indication that the socket > # is no longer connected and the channel correctly shuts itself > # down. > # > # However, asyncore's current implementation of the poll-based > # I/O multiplex event handling invokes the channel's > # handle exceptional data event anytime "flags & POLLHUP" is true. > # While select-based multiplexing would only call this method when > # OOB or urgent data was detected, it can now be called for POLLHUP > # events too. > # > # Under most scenarios this is not problematic because poll-based > # multiplexing also indicates the descriptor is readable and > # so the handle read event is also called and therefore the > # channel is properly close, with only an extraneous invocation > # to handle exceptional event being a side-effect. Under certain > # situations, however, the socket is not indicated as being > # readable, only that it has had an exceptional data event. I > # believe this occurs when the attemtp to connect never succeeds, > # but a POLLHUP does. Previously this lead to a busy loop, which > # is what this method fixes. 
> ### Shane Green www.umbrellacode.com 805-452-9666 | shane at umbrellacode.com On Oct 14, 2012, at 4:55 PM, Richard Oudkerk wrote: > On 14/10/2012 11:45pm, Greg Ewing wrote: >> It does indeed contradict me. It looks like this is >> implementation-dependent, because I distinctly remember >> encountering a bug once that I traced back to the fact >> that I wasn't servicing *all* the fds reported as ready >> before making another select call. > > Could it have been that some fds were being starved because the earlier ones in the lists were getting priority? Servicing all fds reported prevents such starvation problems. > > > -- > Richard > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas -------------- next part -------------- An HTML attachment was scrubbed... URL: From shane at umbrellacode.com Mon Oct 15 04:47:28 2012 From: shane at umbrellacode.com (Shane Green) Date: Sun, 14 Oct 2012 19:47:28 -0700 Subject: [Python-ideas] re-implementing Twisted for fun and profit In-Reply-To: <42AC178D-A7E1-47D7-8B83-F2F6B390BE1C@twistedmatrix.com> References: <40862DD9-DF71-4280-A47F-B20E7E742254@twistedmatrix.com> <42AC178D-A7E1-47D7-8B83-F2F6B390BE1C@twistedmatrix.com> Message-ID: <58AA33EF-BF3C-4725-BD4A-743EA3E26266@umbrellacode.com> Hm, just jumping in out of turn (async ;-) here, but I prototyped pretty clean versions of asyncore.dispatcher and asynchat.async_chat type classes built on top of a promise-based asynchronous I/O socket-monitor. Code ended up looking something like the following: # Server accepting incoming connections and spawning new HTTP/S channels... this.socket.accept().then(this.handle_connection) # With a handle_connection() kind of like... def handle_connection(conn): # Create new channel and add to socket map, then... if (this.running()): this.accept().then(this.handle_connection) # And HTTP/S channels with code like this... this.read_until("\r\n\r\n").then(this.handle_request) # And handle-request code that did stuff like... if (this.chunked): get_content = this.read_until("\r\n").then(self.parse_chunk_size).then(this.read_bytes) else: get_content = this.read_bytes(this.content_length) return get_content.then(handle_content) I'll look around for the code, because it's been well over a year and wasn't complete even then, but that should convey some of how it was shaping up. Shane Green www.umbrellacode.com 805-452-9666 | shane at umbrellacode.com On Oct 14, 2012, at 7:07 PM, Glyph wrote: > On Oct 13, 2012, at 9:49 PM, Guido van Rossum wrote: > >> It seems that perhaps the 'data_received' interface is the most important one to standardize (for the event loop); I can imagine many variations on the handle_read() implementation, and there would be different ones for IOCP, SSL[1], and probably others. The stdlib should have good ones for the common platforms but it should be designed to allow people who know better to hook up their own implementation. > > > Hopefully I'll have time to reply to some of the other stuff in this message, but: > > Yes, absolutely. This is the most important core issue, for me. There's a little more to it than "data_received" (for example: listening for incoming connections, establishing outgoing connections, and scheduling timed calls) but this was the original motivation for the async PEP: to specify this interface. > > Again, I'll have to kick the appropriate people to try to get that started again. (Already started, at .)
It's on github so anyone can contribute, so if other participants in this thread - especially those of you with connections to the Tornado community - would like to try fleshing some of it out, please go ahead. Even if you just have a question, or an area you think the PEP should address, file an issue (or comment on one already filed). > >> (Thanks for writing this; this is the kind of insight I am hoping to get from you and others.) > > Thanks for the repeated prompts for Twisted representatives to participate. > > I was somewhat reluctant to engage with this thread at first, because it looked like a lot of meandering discussion of how to implement stuff that Twisted already deals with quite effectively and I wasn't sure what the point of it all was - why not just go look at Twisted's implementation? But, after writing that message, I'm glad I did, since I realize that many of these insights are not recorded anywhere and in many cases there's no reasonable way to back this information out of Twisted's implementation. > > In my (ha ha) copious spare time, I'll try to do some blogging about these topics. > > -glyph > > [1]: With one minor nitpick: IOCP and SSL should not be mutually exclusive. This was a problem for Twisted for a while, given the absolutely stupid way that OpenSSL thinks "sockets" work; but, we now have which could probably be adapted to be framework-neutral if this transport/event-loop level were standardized. > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas -------------- next part -------------- An HTML attachment was scrubbed... URL: From ethan at stoneleaf.us Mon Oct 15 03:45:49 2012 From: ethan at stoneleaf.us (Ethan Furman) Date: Sun, 14 Oct 2012 18:45:49 -0700 Subject: [Python-ideas] PEP 428 - object-oriented filesystem paths In-Reply-To: <20121005202534.5f721292@pitrou.net> References: <20121005202534.5f721292@pitrou.net> Message-ID: <507B6ACD.7080408@stoneleaf.us> I would like to see some backwards compatibility here. ;) In other words, add method names where reasonable (such as .child or .children instead of or along with built-in iteration) so that this new Path beast can be backported to the 2.x line. I'm happy to take that task on if Antoine has better uses of his time. What this would allow is a nice shiny toy for the 2.x series, plus an easier migration to 3.x when the time comes. While I am very excited about the 3.x branch, and will use it whenever I can, some projects still have to be 2.x because of other dependencies. If the new Path doesn't have conflicting method or dunder names it would be possible to have a str-based 2.x version that otherwise acted remarkably like the non-str based 3.x version -- especially if the __strpath__ concept takes hold and Path objects can be passed around the os and os.path modules the way strings are now. ~Ethan~ From ben at bendarnell.com Mon Oct 15 05:20:33 2012 From: ben at bendarnell.com (Ben Darnell) Date: Sun, 14 Oct 2012 20:20:33 -0700 Subject: [Python-ideas] The async API of the future: Reactors In-Reply-To: References: Message-ID: On Sun, Oct 14, 2012 at 10:15 AM, Guido van Rossum wrote: >> While it's convenient to have higher-level constructors for various >> specialized types, I'd like to emphasize that having the low-level >> interface is important for interoperability. 
Tornado doesn't know >> whether the file descriptors are listening sockets, connected sockets, >> or pipes, so we'd just have to pass in a file descriptor with no other >> information. > > Yeah, the IO object will still need to have a fileno() method. They also need to be constructible given nothing but a fileno (but more on this later) > >>> - In systems like App Engine that don't support async I/O on file >>> descriptors at all, the constructors for creating I/O objects for disk >>> files and connection sockets would comply with the interface but fake >>> out almost everything (just like today, using httplib or httplib2 on >>> App Engine works by adapting them to a "urlfetch" RPC request). >> >> Why would you be allowed to make IO objects for sockets that don't >> work? I would expect that to just raise an exception. On app engine >> RPCs would be the only supported async I/O objects (and timers, if >> those are implemented as magic I/O objects), and they're not >> implemented in terms of sockets or files. > > Here's my use case. Suppose in general one can use async I/O for disk > files, and it is integrated with the standard (abstract) event loop. > So someone writes a handy templating library that wants to play nice > with async apps, so it uses the async I/O idiom to read e.g. the > template source code. Support I want to use that library on App > Engine. It would be a pain if I had to modify that template-reading > code to not use the async API. But (given the right async API!) it > would be pretty simple for the App Engine API to provide a mock > implementation of the async file reading API that was synchronous > under the hood. Yes, it would block while waiting for disk, but App > Engine uses threads anyway so it wouldn't be a problem. > > Another, current-day, use case is the httplib interface in the stdlib > (a fairly fancy HTTP/1.1 client, although it has its flaws). That's > based on sockets, which App Engine doesn't have; we have a "urlfetch" > RPC that you give a URL (and more optional stuff) and returns a record > containing the contents and headers. But again, many useful 3rd party > libraries use httplib, and they won't work unless we somehow support > httplib. So we have had to go out of our way to cover most uses of > httplib. While the app believes it is opening the connection and > sending the request, we are actually just buffering everything; and > when the app starts reading from the connection, we make the urlfetch > RPC and buffer the response, which we then feed back to the app as it > believes it is reading from the socket. As long as the app doesn't try > to get the socket's file descriptor and call select() it will work > fine. > > But some libraries *do* call select(), and here our emulation breaks > down. It would be nicer if the standard way to do async stuff was > higher level than select(), so that we could offer the emulation at a > level that would integrate with the event loop -- that way, ideally > when we have to send the urlfetch RPC we could actually return a > Future (or whatever), and the task would correctly be suspended, just > *thinking* it was waiting for the response on a socket, but actually > waiting for the RPC. Understood. > > Hopefully SSL provides another use case. In posix-land, SSL isn't that different from regular sockets (using ssl.wrap_socket from the 2.6+ stdlib). 
The connection process is a little more complicated, and it gets hairy if you want to support renegotiation, but once a connection is established you can select() on its file descriptor and generally use it just like a regular socket. On IOCP it's another story, though. I've finally gotten around to reading up on IOCP and see how it's so different from everything I'm used to (a lot of Twisted's design decisions at the reactor level make a lot more sense now). Earlier you had mentioned platform-specific constructors for IOObjects, but it actually needs to be event-loop-specific: On windows you can use select() or IOCP, and the IOObjects would be completely different for each of them (and I do think you need to support both - select() is kind of a second-class citizen on windows but is useful due to its ubiquity). This means that the event loop needs to be involved in the creation of these objects, which is why twisted has connectTCP, listenTCP, listenUDP, connectSSL, etc methods on the reactor interface. I think that in order to handle both IOCP and select-style event loops you'll need a very broad interface (roughly the union of twisted's IReactor{Core, Time, Thread, TCP, UDP, SSL} as a minimum, with IReactorFDSet and maybe IReactorSocket on posix for compatible with existing posixy practices). Basically, an event loop that supports IOCP (or hopes to support it in the future) will end up looking a lot like the bottom couple of layers of twisted (and assuming IOCP is a requirement I wouldn't want to stray too far from twisted's designs here). -Ben From pjdelport at gmail.com Mon Oct 15 05:36:59 2012 From: pjdelport at gmail.com (Piet Delport) Date: Mon, 15 Oct 2012 05:36:59 +0200 Subject: [Python-ideas] Proposal: A simple protocol for generator tasks Message-ID: [This is a lengthy mail; I apologize in advance!] Hi, I've been following this discussion with great interest, and would like to put forward a suggestion that might simplify some of the questions that are up in the air. There are several key point being considered: what exactly constitutes a "coroutine" or "tasklet", what the precise semantics of "yield" and "yield from" should be, how the stdlib can support different event loops and reactors, and how exactly Futures, Deferreds, and other APIs fit into the whole picture. This mail is mostly about the first point: I think everyone agrees roughly what a coroutine-style generator is, but there's enough variation in how they are used, both historically and presently, that the concept isn't as precise as it should be. This makes them hard to think and reason about (failing the "BDFL gets headaches" test), and makes it harder to define the behavior of all the parts that they interact with, too. This is a sketch of an attempt to define what constitutes a generator-based task or coroutine more rigorously: I think that the essential behavior can be captured in a small protocol, building on the generator and iterator protocols. If anyone else thinks this is a good idea, maybe something like this could work its way into a PEP? (For the sake of this mail, I will use the term "generator task" or "task" as a straw man term, but feel free to substitute "coroutine", or whatever the preferred name ends up being.) Definition ========== Very informally: A "generator task" is what you get if you take a normal Python function and replace its blocking calls with "yield from" calls to equivalent subtasks. 
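As a very small illustration of that transformation (a sketch only: the recv_task() subtask and the byte-at-a-time framing are hypothetical, purely to show the mechanics):

# Ordinary blocking helper:
def read_line(sock):
    data = b""
    while not data.endswith(b"\n"):
        data += sock.recv(1)            # blocks the whole thread
    return data

# The same helper as a generator task: each blocking call becomes a
# "yield from" of an equivalent subtask (recv_task() is hypothetical),
# and the final value is delivered by returning, i.e. via StopIteration.
def read_line_task(sock):
    data = b""
    while not data.endswith(b"\n"):
        data += yield from recv_task(sock, 1)
    return data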
More formally, a "generator task" is a generator that implements an incremental, multi-step computation, and is intended to be externally driven to completion by a runner, or "scheduler", until it delivers a final result. This driving process happens as follows: 1. A generator task is iterated by its scheduler to yield a series of intermediate "step" values. 2. Each value yielded as a "step" represents a scheduling instruction, or primitive, to be interpreted by the task's scheduler. This scheduling instruction can be None ("just resume this task later"), or a variety of other primitives, such as Futures ("resume this task with the result of this Future"); see below for more. 3. The scheduler is responsible for interpreting each "step" instruction as appropriate, and sending the instruction's result, if any, back to the task using send() or throw(). A scheduler may run a single task to completion, or may multiplex execution between many tasks: generator tasks should assume that other tasks may have executed while the task was yielding. 4. The generator task completes by successfully returning (raising StopIteration), or by raising an exception. The task's caller receives this result. (For the sake of discussion, I use "the scheduler" to refer to whoever calls the generator task's next/send/throw methods, and "the task's caller" to refer to whoever receives the task's final result, but this is not important to the protocol: a task should not care who drives it or consumes its result, just like an iterator should not.) Scheduling instructions / primitives ==================================== (This could probably use a better name.) The protocol is intentionally agnostic about the implementation of schedulers, event loops, or reactors: as long as they implement the same set of scheduling primitives, code should work across them. There multiple ways to accomplish this, but one possibility is to have a set common, generic instructions in a standard library module such as "tasklib" (which could also contain things like default scheduler implementations, helper functions, and so on). A partial list of possible primitives (the names are all made up, not serious suggestions): 1. None: The most basic "do nothing" instruction. This just instructs the scheduler to resume the yielding task later. 2. Futures: Instruct the scheduler to resume with the future's result. Similar types in third-party libraries, such Deferreds, could potentially be implemented either natively by a scheduler that supports it, or using a wait_for_deferred(d) helper task, or using the idea of a "adapter" scheduler (see below). 3. Control primitives: spawn, sleep, etc. - Spawn a new (independent) task: yield tasklib.spawn(task()) - Wait for multiple tasks: (x, y) = yield tasklib.par(foo(), bar()) - Delay execution: yield tasklib.sleep(seconds) - etc. These could be simple marker objects, leaving it up to the underlying scheduler to actually recognize and implement them; some could also be implemented in terms of simpler operations (e.g. sleep(), in terms of lower-level suspend and resume operations). 4. I/O operations This could be anything from low-level "yield fd_readable(sock)" style requests, or any of the higher-level APIs being discussed elsewhere. Whatever the exact API ends up being, the scheduler should implement these primitives by waiting for the I/O (or condition), and resuming the task with the result, if any. 5. Cooperative concurrency primitives, for working with locks, condition variables, and so on. (If useful?) 
6. Custom, scheduler-specific instructions: Since a generator task can potentially yield anything as a scheduler instruction, it's not inconceivable for specialized schedulers to support specialized instructions. (Code that relies on such special instructions won't work on other schedulers, but that would be the point.) A question open to debate is what a scheduler should do when faced with an unrecognized scheduling instruction. Raising TypeError or NotImplementedError back into the task is probably a reasonable action, and would allow code like: def task(): try: yield fancy_magic_instruction() except NotImplementedError: yield from boring_fallback() ... Generator tasks as schedulers, and vice versa ============================================= Note that there is a symmetry to the protocol when a generator task calls another using "yield from": def task() spam = yield from subtask() Here, task() is both a generator task, and the effective scheduler for subtask(): it "implements" subtask()'s scheduling instructions by delegating them to its own scheduler. This is a plain observation on its own, however, it raises one or two interesting possibilities for more interesting schedulers implemented as generator tasks themselves, including: - Specialized sub-schedulers that run as a normal task within their parent scheduler, but implement for example weighted or priority queuing of their subtasks, or similar features. - "Adapter" schedulers that intercept special scheduler instructions (say, Deferreds or other library-specific objects), and implement them using more generic instructions to the underlying scheduler. -- Piet Delport -------------- next part -------------- An HTML attachment was scrubbed... URL: From greg.ewing at canterbury.ac.nz Mon Oct 15 07:10:56 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Mon, 15 Oct 2012 18:10:56 +1300 Subject: [Python-ideas] yield from multiple iterables (was Re: The async API of the future: yield-from) In-Reply-To: References: <507B0F0B.8080700@pearwood.info> Message-ID: <507B9AE0.9030707@canterbury.ac.nz> Guido van Rossum wrote: > I think it's too early to start proposing new syntax for a problem we > don't even know is common at all. > > Greg Ewing's proposal works for me: > > r = yield from par(f1(), f2()) Also, whoever's proposing this needs to understand that even if the suggested change to yield-from were made, it would NOT automatically result in par() behaviour. It would just be another way of sequentially calling two sub-generators. -- Greg From stephen at xemacs.org Mon Oct 15 07:02:20 2012 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Mon, 15 Oct 2012 14:02:20 +0900 Subject: [Python-ideas] Is there a good reason to use * for multiplication? In-Reply-To: References: <11ad8f01-0383-4b64-b0f0-78bb0e2f9308@googlegroups.com> Message-ID: <87mwzo72g3.fsf@uwakimon.sk.tsukuba.ac.jp> Ram Rachum writes: > On Fri, Oct 12, 2012 at 10:40 PM, Blake Hyde wrote: > > > Is anything gained from this addition? > > > To give a practical answer, I could say that for newbies it's one small > confusion that could removed from the language. Get Microsoft to agree and implement it in Excel and you might have a point. But as long as Excel uses * for multiplication, I don't think anybody who uses computers is going to have trouble learning this. Anyway, Python believes in TOOWTDI ("the one old way to do it").[1] Footnotes: [1] With apologies to Tim Peters. 
From greg.ewing at canterbury.ac.nz Mon Oct 15 07:34:45 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Mon, 15 Oct 2012 18:34:45 +1300 Subject: [Python-ideas] yield from multiple iterables (was Re: The async API of the future: yield-from) In-Reply-To: <507B26C6.10602@stackless.com> References: <507B0F0B.8080700@pearwood.info> <507B26C6.10602@stackless.com> Message-ID: <507BA075.4030508@canterbury.ac.nz> Christian Tismer wrote: > My approach would be to first find out how async operations should > be modelled the best under the assumption that we have a coroutine > concept that works without headaches about yielding in and out from > something to whatnot. I think we already know that. People like Dijkstra and Hoare figured it all out decades ago. That's what my generator-oriented approach is based on -- using standard techniques for managing concurrency. > After that is settled and gets consensus, then I would think about > bearable patterns to implement that using generators. And when we > know what we really need, maybe considering more suitable Syntax. Given that we don't want to use OS threads or greenlets, but we're happy to use generators, all that's left is to find bearable patterns for doing so. -- Greg From greg.ewing at canterbury.ac.nz Mon Oct 15 07:58:35 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Mon, 15 Oct 2012 18:58:35 +1300 Subject: [Python-ideas] The async API of the future: yield-from In-Reply-To: References: <5078F6B1.2030309@canterbury.ac.nz> Message-ID: <507BA60B.2030806@canterbury.ac.nz> Guido van Rossum wrote: > Why wouldn't all generators that aren't blocked for I/O just run until > their next yield, in a round-robin fashion? That's fair enough for me. > > But as I said, my intuition for how things work in Greg's world is not > very good. That's exactly how my scheduler behaves. > OTOH I am okay with only getting one of the exceptions. But I think > all of the remaining tasks should still be run to completion -- maybe > the caller just cared about their side effects. Or maybe this should > be an option to par(). This is hard to answer without considering real use cases, but my feeling is that if I care enough about the results of the subtasks to wait until they've all completed before continuing, then if anything goes wrong in any of them, I might as well abandon the whole computation. If that's not the case, I'd be happy to wrap each one in a try-except that doesn't propagate the exception to the main task, but just records the information that the subtask failed somewhere, for the main task to check afterwards. Another direction to approach this is to consider that par() ought to be just an optimisation -- the result should be the same as if you'd written sequential code to perform the subtasks one after another. And in that case, an exception in one would prevent any of the following ones from executing, so it's fine if par() behaves like that, too. -- Greg From greg.ewing at canterbury.ac.nz Mon Oct 15 08:04:16 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Mon, 15 Oct 2012 19:04:16 +1300 Subject: [Python-ideas] The async API of the future: yield-from In-Reply-To: References: <5078F6B1.2030309@canterbury.ac.nz> Message-ID: <507BA760.3050708@canterbury.ac.nz> Ben Darnell wrote: The problem is that as soon as task1 blocks on IO, the entire > current task (which includes outer(), par(), and both children) gets > unscheduled. 
no part of task2 gets scheduled until it gets yielded > from, because the scheduler can't see it until then. The suggested implementation of par() that I posted does explicitly schedule the subtasks. Then it repeatedly yields, giving them a chance to run, until they all complete. -- Greg From pjdelport at gmail.com Mon Oct 15 08:59:12 2012 From: pjdelport at gmail.com (Piet Delport) Date: Mon, 15 Oct 2012 08:59:12 +0200 Subject: [Python-ideas] The async API of the future: yield-from In-Reply-To: <507B4F9D.6040701@canterbury.ac.nz> References: <5078F6B1.2030309@canterbury.ac.nz> <507B4F9D.6040701@canterbury.ac.nz> Message-ID: On Mon, Oct 15, 2012 at 1:49 AM, Greg Ewing wrote: [...] > > No, it can't be as simple as that, because that will just > execute the tasks sequentially. It would have to be something > like this: > > def par(*tasks): > n = len(tasks) > results = [None] * n > for i, task in enumerate(tasks): > def thunk(): > nonlocal n > results[i] = yield from task > n -= 1 > scheduler.schedule(thunk) > while n > 0: > yield > return results > > Not exactly straightforward, but that's why we write it once > and put it in the library. :-) There are two problems with this code. :) The first is a scoping gotcha: every copy of thunk() will attempt to run the same task, and assign it to the same index, due to them sharing the "i" and "task" variables. (The closure captures a reference to the outer variable cells, rather than a copy of their values at the time of thunk's definition.) This could be fixed by defining it as "def thunk(i=i, task=task)", to capture copies. The second problem is more serious: the final while loop busy-waits, which will consume all spare CPU time waiting for the underlying tasks to complete. For this to be practical, it must suspend and resume itself more efficiently. Here's my own attempt. I'll assume the following primitive scheduler instructions (see my "generator task protocol" mail for context), but it should be readily adaptable to other primitives: 1. yield tasklib.spawn(task()) instructs the scheduler to spawn a new, independent task. 2. yield tasklib.suspend() suspends the current task. 3. yield tasklib.get_resume() obtains a callable / primitive that can be used to resume the current task later. I'll also expand it to keep track of success and failure by returning a list of (flag, result) tuples, in the style of DeferredList[1]. Code: def par(*tasks): resume = yield tasklib.get_resume() # Prepare to hold the results, and keep track of progress. results = [None] * len(tasks) finished = 0 # Gather the i'th task's result def gather(i, task): nonlocal finished try: r = yield from task except Exception as e: results[i] = (False, e) else: results[i] = (True, r) finished += 1 # If we're the last to complete, resume par() if finished == len(tasks): yield resume() # Spawn subtasks, and wait for completion. for (i, task) in enumerate(tasks): yield tasklib.spawn(gather(i, task)) yield tasklib.suspend() return results Hopefully, this is easy enough to read: it should be obvious to see how to modify gather() to add support for resuming immediately on the first result or first error.
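For what it's worth, one rough sketch of the "first result or first error" variant, reusing the same hypothetical tasklib primitives assumed above (untested; cancelling the still-running siblings is left to the scheduler):

def first(*tasks):
    resume = yield tasklib.get_resume()
    outcome = []   # will hold a single (flag, result) pair

    def gather_first(task):
        try:
            r = yield from task
        except Exception as e:
            result = (False, e)
        else:
            result = (True, r)
        # Only the first task to finish gets to resume first().
        if not outcome:
            outcome.append(result)
            yield resume()

    for task in tasks:
        yield tasklib.spawn(gather_first(task))
    yield tasklib.suspend()

    return outcome[0]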
[1] http://twistedmatrix.com/documents/12.1.0/api/twisted.internet.defer.DeferredList.html -- Piet Delport From greg.ewing at canterbury.ac.nz Mon Oct 15 09:37:55 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Mon, 15 Oct 2012 20:37:55 +1300 Subject: [Python-ideas] The async API of the future: yield-from In-Reply-To: References: <5078F6B1.2030309@canterbury.ac.nz> <507B4F9D.6040701@canterbury.ac.nz> Message-ID: <507BBD53.5000602@canterbury.ac.nz> Guido van Rossum wrote: > But, as Christian Tismer wrote, we need to have some kind of idea of > what the primitives are that we want to support. Well, I was just responding to your asking what the yield-from equivalent would be to the corresponding thing using Futures. I assumed from the fact that you asked that it was something Futures-using people like to do a lot, so it would be worth putting into a library. There may be other ways to approach it, though. Suppose we had a primitive that just waits for a single task to finish and returns its value. Then we could do this: def par(*tasks): for task in tasks: scheduler.schedule(task) return [yield from scheduler.wait_for(task) for task in tasks] That's straightforward enough that maybe it doesn't even need to be a library function, just a well-known pattern. > Maybe you meant condition variable? It looks like threading.Condition > with notify_all(). Something like that -- the terminology probably varies a bit from one library to another. The basic concept is "set of tasks waiting for some condition to become true". > Anyway, I agree we need some primitives like these, but I'm not sure > how to choose the set of essentials. I think that most, maybe all, of the standard synchronisation mechanisms, like mutexes and semaphores, can be built out of the primitives I've already introduced -- essentially just block() and yield. So anything of this kind that we provide will be more in the way of convenience features than essential primitives. -- Greg From glyph at twistedmatrix.com Mon Oct 15 09:45:14 2012 From: glyph at twistedmatrix.com (Glyph) Date: Mon, 15 Oct 2012 00:45:14 -0700 Subject: [Python-ideas] re-implementing Twisted for fun and profit In-Reply-To: <58AA33EF-BF3C-4725-BD4A-743EA3E26266@umbrellacode.com> References: <40862DD9-DF71-4280-A47F-B20E7E742254@twistedmatrix.com> <42AC178D-A7E1-47D7-8B83-F2F6B390BE1C@twistedmatrix.com> <58AA33EF-BF3C-4725-BD4A-743EA3E26266@umbrellacode.com> Message-ID: <537E074C-49B5-47A2-978F-D0592862B74E@twistedmatrix.com> On Oct 14, 2012, at 7:47 PM, Shane Green wrote: > Hm, just jumping in out of turn (async ;-) here, but I prototyped pretty clean versions of asyncore.dispatcher and asynchat.async_chat type classes built on top of a promise-based asynchronous I/O socket-monitor. Code ended up looking something like this following: > this.socket.accept().then(this.handle_connection) > > # With a handle_connection() kind of like? > def handle_connection(conn): > # Create new channel and add to socket map, then? > if (this.running()): > this.accept().then(this.handle_connection) As I explained in a previous message, I think this is the wrong way to go, because: It's error-prone. It's very easy to forget to call this.accept().then(...). What if you get an exception? How do you associate it with 'this'? (Why do you have to constantly have application code check 'this.running'?) It's inefficient. You have to allocate a promise for every single operation. (Not a big deal for 'accept()' but kind of a big deal for 'recv()'. It's hard to share resources. 
What if multiple layers try to call .accept() or .read_until() from different promise contexts? > As a bonus fourth point, this uses some wacky new promise abstraction which isn't Deferreds, and therefore (most likely) forgets to implement some part of the callback-flow abstraction that you really need for layered, composable systems :). > > We implemented something very like this in Twisted in a module called "twisted.web2.stream" and it was a big problem and had very poor performance and I just this week fixed yet another instance of the 'oops I forgot to call .read() again in my exception handler' bug in a system where it's still in use. Please don't repeat this mistake in the standard library. > > -glyph -------------- next part -------------- An HTML attachment was scrubbed... URL: From shane at umbrellacode.com Mon Oct 15 10:03:33 2012 From: shane at umbrellacode.com (Shane Green) Date: Mon, 15 Oct 2012 01:03:33 -0700 Subject: [Python-ideas] re-implementing Twisted for fun and profit In-Reply-To: <537E074C-49B5-47A2-978F-D0592862B74E@twistedmatrix.com> References: <40862DD9-DF71-4280-A47F-B20E7E742254@twistedmatrix.com> <42AC178D-A7E1-47D7-8B83-F2F6B390BE1C@twistedmatrix.com> <58AA33EF-BF3C-4725-BD4A-743EA3E26266@umbrellacode.com> <537E074C-49B5-47A2-978F-D0592862B74E@twistedmatrix.com> Message-ID: <09394CD5-9950-49CF-A25E-C906B70F3BC9@umbrellacode.com> Your points regarding performance are good ones. My tests indicated it was slightly slower than asyncore. The API I based it on is actually quite thorough, and addresses many of the shortcomings Deferreds (in Twisted) have. Namely, all callbacks registered with a given Promise instance receive the output of the original operation; chaining is fully supported but explicitly (this.then(that).then(that)...), rather than having a Deferred whose value automatically assumes that of each callback, making them necessarily dependent on handlers fired before them, with a default guaranteed behaviour being that only the first one actually receives the output of the originating application. I haven't come across many instances where one wants to chain their callback by accident, but many examples where multiple parties were interested in the same operation's output. Finally, I'm not sure your other points differ greatly from the gotchas of I/O programming in general. Uncoordinated access by multiple threads tends to be problematic. Again, though, your point about efficiency and the less than ideal "an instance for every" arrangement are good ones. Just throwing it out there as a source of ideas, and hopefully to unseat Deferreds as the de facto callback standard for discussion because the promise pattern is more flexible and robust. Shane Green www.umbrellacode.com 805-452-9666 | shane at umbrellacode.com On Oct 15, 2012, at 12:45 AM, Glyph wrote: > > On Oct 14, 2012, at 7:47 PM, Shane Green wrote: > >> Hm, just jumping in out of turn (async ;-) here, but I prototyped pretty clean versions of asyncore.dispatcher and asynchat.async_chat type classes built on top of a promise-based asynchronous I/O socket-monitor. Code ended up looking something like the following: > >> this.socket.accept().then(this.handle_connection) >> >> # With a handle_connection() kind of like... >> def handle_connection(conn): >> # Create new channel and add to socket map, then... >> if (this.running()): >> this.accept().then(this.handle_connection) > > As I explained in a previous message, I think this is the wrong way to go, because: > > It's error-prone.
It's very easy to forget to call this.accept().then(...). What if you get an exception? How do you associate it with 'this'? (Why do you have to constantly have application code check 'this.running'?) > It's inefficient. You have to allocate a promise for every single operation. (Not a big deal for 'accept()' but kind of a big deal for 'recv()'. > It's hard to share resources. What if multiple layers try to call .accept() or .read_until() from different promise contexts? > As a bonus fourth point, this uses some wacky new promise abstraction which isn't Deferreds, and therefore (most likely) forgets to implement some part of the callback-flow abstraction that you really need for layered, composable systems :). > > We implemented something very like this in Twisted in a module called called "twisted.web2.stream" and it was a big problem and had very poor performance and I just this week fixed yet another instance of the 'oops I forgot to call .read() again in my exception handler' bug in a system where it's still in use. Please don't repeat this mistake in the standard library. > > -glyph -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Mon Oct 15 10:18:13 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 15 Oct 2012 18:18:13 +1000 Subject: [Python-ideas] The async API of the future: yield-from In-Reply-To: References: <5078F6B1.2030309@canterbury.ac.nz> <507B4F9D.6040701@canterbury.ac.nz> Message-ID: On Mon, Oct 15, 2012 at 10:35 AM, Guido van Rossum wrote: > But, as Christian Tismer wrote, we need to have some kind of idea of > what the primitives are that we want to support. Or should we just > have async equivalents for everything in threading.py and queue.py? > (What about thread-local? Do we need task-local? Shudder.) Task locals aren't so scary, since they're already the main reason why generators are so handy - task locals are just the frame locals in the generator :) The main primitive I personally want out of an async API is a task-based equivalent to concurrent.futures.as_completed() [1]. This is what I meant about iteration being a bit of a mess: the way the as_completed() works, the suspend/resume channel of the iterator protocol is being used to pass completed future objects back to the calling iterator. That means that channel *can't* be used to talk between the coroutine and the scheduler, so if you decide you need to free it up for that purpose, you're either forced to wait for *all* the futures to be triggered before any of them can be passed to the caller (allowing you to use yield-from and return a container of completed futures) or else you're forced to switch to callback-style programming (this is where Ruby's blocks are a huge advantage - because their for loops essentially *are* callbacks, you have a lot more flexibility in calling back to different places from a single piece of code). However, I can see one why to make it work which is to require the *invoking* code to continue to manage the communication with the scheduler. 
Using this concept, there would be an "as_completed_async()" primitive that works something like: for get_next_result in as_completed_async(tasks): task, result = yield get_next_result # Process this result, wait for next one The async equivalent of the concurrent.futures example would then look something like: URLS = ['http://www.foxnews.com/', 'http://www.cnn.com/', 'http://europe.wsj.com/', 'http://www.bbc.co.uk/', 'http://some-made-up-domain.com/'] @task def load_url_async(url, timeout): with (yield urlopen_async(url, timeout=timeout)) as handle: return url, handle.read() tasks = (load_url_async(url, 60) for url in URLS) with concurrent.futures.as_completed_async(tasks) as async_results: for get_next_result in async_results: try: url, data = yield get_next_result except Exception as exc: print('{!r} generated an exception: {}'.format(url, exc)) else: print('{!r} page is {:d} bytes'.format(url, len(data))) Key parts of this idea: 1. as_completed_async registers the supplied tasks with the main scheduler so they can all start running in parallel 2. as_completed_async is a context manager that will *cancel* all pending jobs on exit 3. as_completed_async is an iterator that produces a special future that fires whenever *any* of the registered tasks has run to completion 4. because there's a separate yield step for each result retrieval, ordinary exception handling mechanisms can be used rather than needing to introspect a future object Cheers, Nick. [1] http://docs.python.org/dev/library/concurrent.futures.html#threadpoolexecutor-example -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From tismer at stackless.com Mon Oct 15 10:24:53 2012 From: tismer at stackless.com (Christian Tismer) Date: Mon, 15 Oct 2012 10:24:53 +0200 Subject: [Python-ideas] yield from multiple iterables (was Re: The async API of the future: yield-from) In-Reply-To: <507BA075.4030508@canterbury.ac.nz> References: <507B0F0B.8080700@pearwood.info> <507B26C6.10602@stackless.com> <507BA075.4030508@canterbury.ac.nz> Message-ID: <507BC855.4070802@stackless.com> On 15.10.12 07:34, Greg Ewing wrote: > Christian Tismer wrote: > >> My approach would be to first find out how async operations should >> be modelled the best under the assumption that we have a coroutine >> concept that works without headaches about yielding in and out from >> something to whatnot. > > I think we already know that. People like Dijkstra and Hoare > figured it all out decades ago. > > That's what my generator-oriented approach is based on -- > using standard techniques for managing concurrency. > Sure, the theory is clear and well-known. Not so clear is which of the concepts to implement to what detail, and things like the C10K problem still are a challenge to solve efficiently for Python. http://www.kegel.com/c10k.html I think it is necessary to take these considerations into account at least and to think about updating large sets of waiting channels efficiently, using appropriate data structures. >> After that is settled and gets consensus, then I would think about >> bearable patterns to implement that using generators. And when we >> know what we really need, maybe considering more suitable Syntax. > > Given that we don't want to use OS threads or greenlets, > but we're happy to use generators, all that's left is to > find bearable patterns for doing so. Question: Is it already given that something like greenlets is out of consideration? I did not find a final say on that (but I'm bad at searching...)
Is the whole discussion about what would be best, or just "you can choose any implementation provided it's generators" ? :-) cheers - Chris -- Christian Tismer :^) Software Consulting : Have a break! Take a ride on Python's Karl-Liebknecht-Str. 121 : *Starship* http://starship.python.net/ 14482 Potsdam : PGP key -> http://pgp.uni-mainz.de phone +49 173 24 18 776 fax +49 (30) 700143-0023 PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04 whom do you want to sponsor today? http://www.stackless.com/ From ncoghlan at gmail.com Mon Oct 15 10:33:44 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 15 Oct 2012 18:33:44 +1000 Subject: [Python-ideas] The async API of the future: Reactors In-Reply-To: References: <20121012203311.4b3ee8af@pitrou.net> Message-ID: On Mon, Oct 15, 2012 at 2:54 AM, Guido van Rossum wrote: > On Sun, Oct 14, 2012 at 8:01 AM, Calvin Spealman wrote: >> Why is subclassing a problem? It can be overused, but seems the right >> thing to do in this case. You want a protocol that responds to new data by >> echoing and tells the user when the connection was terminated? It makes >> sense that this is a subclass: a special case of some class that handles the >> base behavior. > > I replied to this in detail on the "Twisted and Deferreds" thread in > an exchange. Summary: I'm -0 when it comes to subclassing protocol > classes; -1 on subclassing objects that implement significant > functionality. This problem does seem tailor-made for a Protocol ABC - you can inherit from it if you want, or call register() if you don't. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From greg.ewing at canterbury.ac.nz Mon Oct 15 11:17:17 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Mon, 15 Oct 2012 22:17:17 +1300 Subject: [Python-ideas] Proposal: A simple protocol for generator tasks In-Reply-To: References: Message-ID: <507BD49D.3020704@canterbury.ac.nz> Piet Delport wrote: > 2. Each value yielded as a "step" represents a scheduling instruction, > or primitive, to be interpreted by the task's scheduler. I don't think this technique should be used to communicate with the scheduler, other than *maybe* for a *very* small set of operations that are truly primitive -- and even then I'm not convinced. To begin with, there are some operations that *can't* rely on yielded instructions as the only way of invoking them. Spawning a task, for example -- there must be some way for non-task code to invoke that, otherwise you wouldn't be able to get top-level tasks into the system. Also, consider the operation of unblocking a task that's waiting for some event to occur. Often you will want to invoke this using a callback from an event loop, which is not a generator and can't yield anything to anywhere. Given that these operations must provide a way of invoking them using a plain function call, there is little reason to provide a second way using a yielded instruction. In any case, I believe that the public interface for *any* scheduler operation should not be a yielded instruction, but either a plain function or something called using yield-from, for reasons I explained to Guido earlier. > - Specialized sub-schedulers that run as a normal task within their > parent scheduler, but implement for example weighted or priority > queuing of their subtasks, or similar features. There are problems with allowing multiple schedulers to coexist within the one system, especially if yielded instructions are the only way to communicate with them. 
It might work for instructions to a task's own scheduler concerning itself, but some operations need to operate on a *different* task, e.g. unblocking a task when the event it was waiting for occurs. How do you know which scheduler is managing it? And even if you can find out, if you have to control it using yielded instructions, you have no way of yielding something to a different task's scheduler. -- Greg From _ at lvh.cc Mon Oct 15 11:24:44 2012 From: _ at lvh.cc (Laurens Van Houtven) Date: Mon, 15 Oct 2012 11:24:44 +0200 Subject: [Python-ideas] The async API of the future: Reactors In-Reply-To: References: <7E7218BC-A367-4C2B-9B5D-C51BBA2FBEB3@stranden.com> <55EF4C53-D922-42FC-8863-EED452BF4093@stranden.com> Message-ID: On Mon, Oct 15, 2012 at 1:26 AM, Guido van Rossum wrote: > I don't think you can hide threads or concurrency. You can offer > different APIs to work with them that have different advantages and > disadvantages, but I don't think you can *hide* them any more than you > can hide language constructs like classes or sequences. > +1. Nice APIs to put padding on the sharp edges, yes. Hiding them? IMHO, usually a mistake. lvh -------------- next part -------------- An HTML attachment was scrubbed... URL: From _ at lvh.cc Mon Oct 15 11:29:37 2012 From: _ at lvh.cc (Laurens Van Houtven) Date: Mon, 15 Oct 2012 11:29:37 +0200 Subject: [Python-ideas] yield from multiple iterables (was Re: The async API of the future: yield-from) In-Reply-To: <507BC855.4070802@stackless.com> References: <507B0F0B.8080700@pearwood.info> <507B26C6.10602@stackless.com> <507BA075.4030508@canterbury.ac.nz> <507BC855.4070802@stackless.com> Message-ID: On Mon, Oct 15, 2012 at 10:24 AM, Christian Tismer wrote: > Question: Is it already given that something like greenlets is out > of consideration? I did not find a final say on that (but I'm bad at > searching...) > I think an number of people have expressed a distaste for implicit task switching. That doesn't mean "no", but I'm guessing what's going to happen is having some kind of explicit, generator based thing, with an underlying API that makes implementing greenlets pretty easy. > Is the whole discussion about what would be best, or just > "you can choose any implementation provided it's generators" ? :-) > > cheers - Chris > -- cheers lvh -------------- next part -------------- An HTML attachment was scrubbed... URL: From ironfroggy at gmail.com Mon Oct 15 12:25:16 2012 From: ironfroggy at gmail.com (Calvin Spealman) Date: Mon, 15 Oct 2012 06:25:16 -0400 Subject: [Python-ideas] The async API of the future: yield-from In-Reply-To: <507BBD53.5000602@canterbury.ac.nz> References: <5078F6B1.2030309@canterbury.ac.nz> <507B4F9D.6040701@canterbury.ac.nz> <507BBD53.5000602@canterbury.ac.nz> Message-ID: On Mon, Oct 15, 2012 at 3:37 AM, Greg Ewing wrote: > Guido van Rossum wrote: >> >> But, as Christian Tismer wrote, we need to have some kind of idea of >> what the primitives are that we want to support. > > > Well, I was just responding to your asking what the yield-from > equivalent would be to the corresponding thing using Futures. > I assumed from the fact that you asked that it was something > Futures-using people like to do a lot, so it would be worth > putting into a library. > > There may be other ways to approach it, though. Suppose we > had a primitive that just waits for a single task to finish > and returns its value. 
Then we could do this: > > def par(*tasks): > for task in tasks: > scheduler.schedule(task) > return [yield from scheduler.wait_for(task) for task in tasks] > > That's straightforward enough that maybe it doesn't even need > to be a library function, just a well-known pattern. The more I follow this thread the less I understand the point of introducing a new use for yield-from in this discussion. All of this extra work trying to figure how to make yield-from work giving its existing 3.3 semantics could just be avoided if we just allow yielding the tasks directly, and treating them like any other async operation. In the original message yield-from seemed to be suggested, there was no justification, it was just said "so you have to do this" but I don't see that you do. If you allow yielding tasks, then yielding multiple tasks to wait together because trivial: just yield a tuple of them. In fact, I think we should say that yielding any tuple of async operations, whatever those objects actually end of being, should wait for all of them. Maybe we also want to wait on both some http request operation, implemented as a task (a generator), and also a cache hit. def handle_or_cached(request): api_resp, cache_resp = yield request(API_ENDPOINT), cache.get(KEY) if cache_resp: return cache_resp return render(api_resp) Or we could provide wrappers to control the behavior of multiple-wait: def handle_or_cached(request): api_resp, cache_resp = yield first(request(API_ENDPOINT), cache.get(KEY)) if cache_resp: return cache_resp return render(api_resp) >> Maybe you meant condition variable? It looks like threading.Condition >> with notify_all(). > > > Something like that -- the terminology probably varies a bit > from one library to another. The basic concept is "set of > tasks waiting for some condition to become true". > > >> Anyway, I agree we need some primitives like these, but I'm not sure >> how to choose the set of essentials. > > > I think that most, maybe all, of the standard synchronisation > mechanisms, like mutexes and semaphores, can be built out of the > primitives I've already introduced -- essentially just block() > and yield. So anything of this kind that we provide will be more > in the way of convenience features than essential primitives. > > -- > Greg > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas -- Read my blog! I depend on your acceptance of my opinion! I am interesting! http://techblog.ironfroggy.com/ Follow me if you're into that sort of thing: http://www.twitter.com/ironfroggy From tismer at stackless.com Mon Oct 15 12:38:21 2012 From: tismer at stackless.com (Christian Tismer) Date: Mon, 15 Oct 2012 12:38:21 +0200 Subject: [Python-ideas] yield from multiple iterables (was Re: The async API of the future: yield-from) In-Reply-To: References: <507B0F0B.8080700@pearwood.info> <507B26C6.10602@stackless.com> <507BA075.4030508@canterbury.ac.nz> <507BC855.4070802@stackless.com> Message-ID: <507BE79D.1090100@stackless.com> On 15.10.12 11:29, Laurens Van Houtven wrote: > On Mon, Oct 15, 2012 at 10:24 AM, Christian Tismer > > wrote: > > Question: Is it already given that something like greenlets is out > of consideration? I did not find a final say on that (but I'm bad at > searching...) > > > I think an number of people have expressed a distaste for implicit > task switching. 
That doesn't mean "no", but I'm guessing what's going > to happen is having some kind of explicit, generator based thing, with > an underlying API that makes implementing greenlets pretty easy. > > Is the whole discussion about what would be best, or just > "you can choose any implementation provided it's generators" ? :-) > Thanks for your reply. Just one thing that I don't get. What do you mean by 'implicit taskswitching' ? There is no such thing in greenlet, if you really meant that Library from Armin Rigo. greenlets do everything explicitly, no pre-emption at all. So, is there a general understanding what a greenlet is and what not? Just to make sure that the discussed terms are clearly defined. cheers - Chris -- Christian Tismer :^) Software Consulting : Have a break! Take a ride on Python's Karl-Liebknecht-Str. 121 : *Starship* http://starship.python.net/ 14482 Potsdam : PGP key -> http://pgp.uni-mainz.de phone +49 173 24 18 776 fax +49 (30) 700143-0023 PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04 whom do you want to sponsor today? http://www.stackless.com/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From ironfroggy at gmail.com Mon Oct 15 12:48:20 2012 From: ironfroggy at gmail.com (Calvin Spealman) Date: Mon, 15 Oct 2012 06:48:20 -0400 Subject: [Python-ideas] Proposal: A simple protocol for generator tasks In-Reply-To: References: Message-ID: On Sun, Oct 14, 2012 at 11:36 PM, Piet Delport wrote: > [This is a lengthy mail; I apologize in advance!] This is what I get for deciding to check up on these threads at 6AM after a late night. > Hi, > > I've been following this discussion with great interest, and would like > to put forward a suggestion that might simplify some of the questions > that are up in the air. > > There are several key point being considered: what exactly constitutes a > "coroutine" or "tasklet", what the precise semantics of "yield" and > "yield from" should be, how the stdlib can support different event loops > and reactors, and how exactly Futures, Deferreds, and other APIs fit > into the whole picture. > > This mail is mostly about the first point: I think everyone agrees > roughly what a coroutine-style generator is, but there's enough > variation in how they are used, both historically and presently, that > the concept isn't as precise as it should be. This makes them hard to > think and reason about (failing the "BDFL gets headaches" test), and > makes it harder to define the behavior of all the parts that they > interact with, too. > > This is a sketch of an attempt to define what constitutes a > generator-based task or coroutine more rigorously: I think that the > essential behavior can be captured in a small protocol, building on the > generator and iterator protocols. If anyone else thinks this is a good > idea, maybe something like this could work its way into a PEP? > > (For the sake of this mail, I will use the term "generator task" or > "task" as a straw man term, but feel free to substitute "coroutine", or > whatever the preferred name ends up being.) I like that "task" is more general and avoids complaints from some that these are not "real" coroutines. > Definition > ========== > > Very informally: A "generator task" is what you get if you take a normal > Python function and replace its blocking calls with "yield from" calls > to equivalent subtasks. "yield" and "yield from", although I'm really disliking the second being included at all. More on this later. 
> More formally, a "generator task" is a generator that implements an > incremental, multi-step computation, and is intended to be externally > driven to completion by a runner, or "scheduler", until it delivers a > final result. > > This driving process happens as follows: > > 1. A generator task is iterated by its scheduler to yield a series of > intermediate "step" values. > > 2. Each value yielded as a "step" represents a scheduling instruction, > or primitive, to be interpreted by the task's scheduler. > > This scheduling instruction can be None ("just resume this task > later"), or a variety of other primitives, such as Futures ("resume > this task with the result of this Future"); see below for more. > > 3. The scheduler is responsible for interpreting each "step" instruction > as appropriate, and sending the instruction's result, if any, back to > the task using send() or throw(). > > A scheduler may run a single task to completion, or may multiplex > execution between many tasks: generator tasks should assume that > other tasks may have executed while the task was yielding. > > 4. The generator task completes by successfully returning (raising > StopIteration), or by raising an exception. The task's caller > receives this result. > > (For the sake of discussion, I use "the scheduler" to refer to whoever > calls the generator task's next/send/throw methods, and "the task's > caller" to refer to whoever receives the task's final result, but this > is not important to the protocol: a task should not care who drives it > or consumes its result, just like an iterator should not.) > > > Scheduling instructions / primitives > ==================================== > > (This could probably use a better name.) > > The protocol is intentionally agnostic about the implementation of > schedulers, event loops, or reactors: as long as they implement the same > set of scheduling primitives, code should work across them. > > There multiple ways to accomplish this, but one possibility is to have a > set common, generic instructions in a standard library module such as > "tasklib" (which could also contain things like default scheduler > implementations, helper functions, and so on). > > A partial list of possible primitives (the names are all made up, not > serious suggestions): > > 1. None: The most basic "do nothing" instruction. This just instructs > the scheduler to resume the yielding task later. > > 2. Futures: Instruct the scheduler to resume with the future's result. > > Similar types in third-party libraries, such Deferreds, could > potentially be implemented either natively by a scheduler that > supports it, or using a wait_for_deferred(d) helper task, or using > the idea of a "adapter" scheduler (see below). > > 3. Control primitives: spawn, sleep, etc. > > - Spawn a new (independent) task: yield tasklib.spawn(task()) > - Wait for multiple tasks: (x, y) = yield tasklib.par(foo(), bar()) > - Delay execution: yield tasklib.sleep(seconds) > - etc. > > These could be simple marker objects, leaving it up to the underlying > scheduler to actually recognize and implement them; some could also > be implemented in terms of simpler operations (e.g. sleep(), in > terms of lower-level suspend and resume operations). What is the difference between the tossed around "yield from task()" and this "yield tasklib.spawn(task())" And, why isn't it simply spelled "yield task()"? You have all these different types that can be yielded to the scheduler from tasks to the scheduler. 
Why isn't a task one of those possible types? If the scheduler gets an iterator, it should schedule it automatically. > 4. I/O operations > > This could be anything from low-level "yield fd_readable(sock)" style > requests, or any of the higher-level APIs being discussed elsewhere. > > Whatever the exact API ends up being, the scheduler should implement > these primitives by waiting for the I/O (or condition), and resuming > the task with the result, if any. > > 5. Cooperative concurrency primitives, for working with locks, condition > variables, and so on. (If useful?) I am sure these will come about, but I think that is considered a library that sits on top of whatever API comes out, not part of it. > 6. Custom, scheduler-specific instructions: Since a generator task can > potentially yield anything as a scheduler instruction, it's not > inconceivable for specialized schedulers to support specialized > instructions. (Code that relies on such special instructions won't > work on other schedulers, but that would be the point.) > > A question open to debate is what a scheduler should do when faced with > an unrecognized scheduling instruction. > > Raising TypeError or NotImplementedError back into the task is probably > a reasonable action, and would allow code like: > > def task(): > try: > yield fancy_magic_instruction() > except NotImplementedError: > yield from boring_fallback() > ... Interesting. Can anyone think of an example of this? > > Generator tasks as schedulers, and vice versa > ============================================= > > Note that there is a symmetry to the protocol when a generator task > calls another using "yield from": > > def task() > spam = yield from subtask() > > Here, task() is both a generator task, and the effective scheduler for > subtask(): it "implements" subtask()'s scheduling instructions by > delegating them to its own scheduler. As raised above, why not simply "yield subtask()"? > This is a plain observation on its own, however, it raises one or two > interesting possibilities for more interesting schedulers implemented as > generator tasks themselves, including: > > - Specialized sub-schedulers that run as a normal task within their > parent scheduler, but implement for example weighted or priority > queuing of their subtasks, or similar features. I think that is too messy, you could have so many different scheduler semantics. Maybe this sort of thing is what your schedule-specific instructions should be for. Or, attributes on tasks that schedulers can be known to look for. > - "Adapter" schedulers that intercept special scheduler instructions > (say, Deferreds or other library-specific objects), and implement them > using more generic instructions to the underlying scheduler. I think we can make yielding tasks a direct operation, and still implment sub-schedulers. They should be more opaque, I think. > -- > Piet Delport > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -- Read my blog! I depend on your acceptance of my opinion! I am interesting! 
http://techblog.ironfroggy.com/ Follow me if you're into that sort of thing: http://www.twitter.com/ironfroggy From techtonik at gmail.com Mon Oct 15 13:13:56 2012 From: techtonik at gmail.com (anatoly techtonik) Date: Mon, 15 Oct 2012 14:13:56 +0300 Subject: [Python-ideas] Python as a tool to download stuff for bootstrapping In-Reply-To: References: Message-ID: On Fri, Jul 6, 2012 at 10:30 PM, Georg Brandl wrote: > On 05.07.2012 22:24, Amaury Forgeot d'Arc wrote: >> >> 2012/7/5 anatoly techtonik : >>> >>> This makes me kind of sad. You have Python installed. Why can't you >>> just crossplatformly do: >>> >>> mkdir nacl >>> cd nacl >>> python -m urllib get >>> >>> http://commondatastorage.googleapis.com/nativeclient-mirror/nacl/nacl_sdk/update_sdk.py >>> python update_sdk.py >> >> >> I'm sure there is already a way with standard python tools. Something >> along these lines: >> >> python -c "from urllib.request import urlretrieve; urlretrieve('URL', >> 'update_sdk.zip')" >> python -m update_sdk.zip >> >> The second command will work if the zip file has a __main__.py. >> Do you think we need other tools? > > > The "python -m urllib" (don't think "get" is required) interface certainly > looks nice and is similar in style with many of the other __main__ stuff we > add to stdlib modules. Here is the implementation of a urllib.__main__ module for Python 3 with a progress bar. I've left the 'get' argument to make it extensible in future with other commands, such as `test`. While working on this code I've also found a regression that would be nice to see fixed at the same time. http://bugs.python.org/issue10836 -------------- next part -------------- A non-text attachment was scrubbed... Name: __main__.py Type: application/octet-stream Size: 4783 bytes Desc: not available URL: From Ronny.Pfannschmidt at gmx.de Mon Oct 15 13:39:16 2012 From: Ronny.Pfannschmidt at gmx.de (Ronny Pfannschmidt) Date: Mon, 15 Oct 2012 13:39:16 +0200 Subject: [Python-ideas] Proposal: A simple protocol for generator tasks In-Reply-To: References: Message-ID: <507BF5E4.6060901@gmx.de> Hi Piet, I like that finally someone is pointing out how to deal with the *concurrent* part. I have some further notes: * greenlet interaction is wanted, since interacting with greenlets is slightly different from generators * they don't get the function arguments at greenlet creation time, but on the first `switch` generator outer use: gn = f(*arg, **kwarg) gn.next() greenlet outer use: gr = greenlet.greenlet(f) gr.switch(*args, **kw) * instead of send/next, they always use switch * `yield` is a function call -> there is need for a lib to manage the local part of greenlet operations in any case (so we should just ensure that the scheduler can handle their way of `yield`, but not actually have support/compat code in the stdlib for their yielding) * considering regular classes for interaction, since for some protocol implementations different means might make sense (this could also be used for the scheduler part of greenlet interaction) result -> a protocol for cooperative concurrency * considering the upcoming pypy transaction module/stm, since using that right could mean "free" parallelism in future * alternatives for queues/channels are needed * pools/rate-limiters and other exercises are needed as well * some kind of default tools for servers are needed * the stdlib could have a very simple default scheduler that's just doing something basic, like run all work it can do, and if it can't, block on an IO reactor. We just need something that can run() after
all has been created having an api like sheduler.add(gen) would be a plus (since it would be just like pypy's transaction module) an example i have in mind is something like sheduler.add(...) sheduler.add(...) sheduler.run() If things go as I planned on my side, starting in jan/feb 2013 i'll try a prototype implementation for further comments/actual experimentation. -- Ronny On 10/15/2012 05:36 AM, Piet Delport wrote: > [This is a lengthy mail; I apologize in advance!] > > Hi, > > I've been following this discussion with great interest, and would like > to put forward a suggestion that might simplify some of the questions > that are up in the air. > > There are several key point being considered: what exactly constitutes a > "coroutine" or "tasklet", what the precise semantics of "yield" and > "yield from" should be, how the stdlib can support different event loops > and reactors, and how exactly Futures, Deferreds, and other APIs fit > into the whole picture. > > This mail is mostly about the first point: I think everyone agrees > roughly what a coroutine-style generator is, but there's enough > variation in how they are used, both historically and presently, that > the concept isn't as precise as it should be. This makes them hard to > think and reason about (failing the "BDFL gets headaches" test), and > makes it harder to define the behavior of all the parts that they > interact with, too. > > This is a sketch of an attempt to define what constitutes a > generator-based task or coroutine more rigorously: I think that the > essential behavior can be captured in a small protocol, building on the > generator and iterator protocols. If anyone else thinks this is a good > idea, maybe something like this could work its way into a PEP? > > (For the sake of this mail, I will use the term "generator task" or > "task" as a straw man term, but feel free to substitute "coroutine", or > whatever the preferred name ends up being.) > > > Definition > ========== > > Very informally: A "generator task" is what you get if you take a normal > Python function and replace its blocking calls with "yield from" calls > to equivalent subtasks. > > More formally, a "generator task" is a generator that implements an > incremental, multi-step computation, and is intended to be externally > driven to completion by a runner, or "scheduler", until it delivers a > final result. > > This driving process happens as follows: > > 1. A generator task is iterated by its scheduler to yield a series of > intermediate "step" values. > > 2. Each value yielded as a "step" represents a scheduling instruction, > or primitive, to be interpreted by the task's scheduler. > > This scheduling instruction can be None ("just resume this task > later"), or a variety of other primitives, such as Futures ("resume > this task with the result of this Future"); see below for more. > > 3. The scheduler is responsible for interpreting each "step" instruction > as appropriate, and sending the instruction's result, if any, back to > the task using send() or throw(). > > A scheduler may run a single task to completion, or may multiplex > execution between many tasks: generator tasks should assume that > other tasks may have executed while the task was yielding. > > 4. The generator task completes by successfully returning (raising > StopIteration), or by raising an exception. The task's caller > receives this result. 
> > (For the sake of discussion, I use "the scheduler" to refer to whoever > calls the generator task's next/send/throw methods, and "the task's > caller" to refer to whoever receives the task's final result, but this > is not important to the protocol: a task should not care who drives it > or consumes its result, just like an iterator should not.) > > > Scheduling instructions / primitives > ==================================== > > (This could probably use a better name.) > > The protocol is intentionally agnostic about the implementation of > schedulers, event loops, or reactors: as long as they implement the same > set of scheduling primitives, code should work across them. > > There multiple ways to accomplish this, but one possibility is to have a > set common, generic instructions in a standard library module such as > "tasklib" (which could also contain things like default scheduler > implementations, helper functions, and so on). > > A partial list of possible primitives (the names are all made up, not > serious suggestions): > > 1. None: The most basic "do nothing" instruction. This just instructs > the scheduler to resume the yielding task later. > > 2. Futures: Instruct the scheduler to resume with the future's result. > > Similar types in third-party libraries, such Deferreds, could > potentially be implemented either natively by a scheduler that > supports it, or using a wait_for_deferred(d) helper task, or using > the idea of a "adapter" scheduler (see below). > > 3. Control primitives: spawn, sleep, etc. > > - Spawn a new (independent) task: yield tasklib.spawn(task()) > - Wait for multiple tasks: (x, y) = yield tasklib.par(foo(), bar()) > - Delay execution: yield tasklib.sleep(seconds) > - etc. > > These could be simple marker objects, leaving it up to the underlying > scheduler to actually recognize and implement them; some could also > be implemented in terms of simpler operations (e.g. sleep(), in > terms of lower-level suspend and resume operations). > > 4. I/O operations > > This could be anything from low-level "yield fd_readable(sock)" style > requests, or any of the higher-level APIs being discussed elsewhere. > > Whatever the exact API ends up being, the scheduler should implement > these primitives by waiting for the I/O (or condition), and resuming > the task with the result, if any. > > 5. Cooperative concurrency primitives, for working with locks, condition > variables, and so on. (If useful?) > > 6. Custom, scheduler-specific instructions: Since a generator task can > potentially yield anything as a scheduler instruction, it's not > inconceivable for specialized schedulers to support specialized > instructions. (Code that relies on such special instructions won't > work on other schedulers, but that would be the point.) > > A question open to debate is what a scheduler should do when faced with > an unrecognized scheduling instruction. > > Raising TypeError or NotImplementedError back into the task is probably > a reasonable action, and would allow code like: > > def task(): > try: > yield fancy_magic_instruction() > except NotImplementedError: > yield from boring_fallback() > ... 
> > > Generator tasks as schedulers, and vice versa > ============================================= > > Note that there is a symmetry to the protocol when a generator task > calls another using "yield from": > > def task() > spam = yield from subtask() > > Here, task() is both a generator task, and the effective scheduler for > subtask(): it "implements" subtask()'s scheduling instructions by > delegating them to its own scheduler. > > This is a plain observation on its own, however, it raises one or two > interesting possibilities for more interesting schedulers implemented as > generator tasks themselves, including: > > - Specialized sub-schedulers that run as a normal task within their > parent scheduler, but implement for example weighted or priority > queuing of their subtasks, or similar features. > > - "Adapter" schedulers that intercept special scheduler instructions > (say, Deferreds or other library-specific objects), and implement them > using more generic instructions to the underlying scheduler. > > > -- > Piet Delport > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas From ncoghlan at gmail.com Mon Oct 15 14:08:21 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 15 Oct 2012 22:08:21 +1000 Subject: [Python-ideas] The async API of the future: yield-from In-Reply-To: References: <5078F6B1.2030309@canterbury.ac.nz> <507B4F9D.6040701@canterbury.ac.nz> <507BBD53.5000602@canterbury.ac.nz> Message-ID: On Mon, Oct 15, 2012 at 8:25 PM, Calvin Spealman wrote: > The more I follow this thread the less I understand the point of > introducing a new use for yield-from in this discussion. +1. To me, "yield from" is just a tool that brings generators back to parity with functions when it comes to breaking up a larger algorithm into smaller pieces. Where you would break a function out into subfunctions and call them normally, with a generator you can break out subgenerators and invoke them with yield from. Any meaningful use of "yield from" in the coroutine context *has* to ultimate devolve to an operation that: 1. Asks the scheduler to schedule another operation 2. Waits for that operation to complete Guido's approach to that problem is that step 1 is handled by calling functions that in turn call methods on a thread-local scheduler. These methods return Future objects, which can subsequently be yielded to the scheduler to say "I'm waiting for this future to be set". I *thought* Greg's way combined step 1 and step 2 into a single operation: the objects you yield *not only* say what you want to wait for, but also what you want to do. However, his example par() implementation killed that idea, since it turned out to need to schedule tasks explicitly rather than their being a "execute this in parallel" option. So now I'm back to think that Greg and Guido are talking about different levels. *Any* scheduling option will be able to be collapsed into an async task invoked by "yield from" by writing: def simple_async_task(): return yield start_task() The part that still needs to be figured out is how you turn that suspend/resume communications channel between the lowest level of the task stack and the scheduling loop into something usable, as well as how you handle iteration in a sensible way (I described my preferred approach when writing about the API I'd like to see for an async version of as_completed). 
I haven't seen anything to suggest that "yield from"'s role should change from what it is in 3.3: a way to factor out generators into multiple pieces with out breaking send() and throw(). Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Mon Oct 15 14:18:54 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 15 Oct 2012 22:18:54 +1000 Subject: [Python-ideas] yield from multiple iterables (was Re: The async API of the future: yield-from) In-Reply-To: <507BE79D.1090100@stackless.com> References: <507B0F0B.8080700@pearwood.info> <507B26C6.10602@stackless.com> <507BA075.4030508@canterbury.ac.nz> <507BC855.4070802@stackless.com> <507BE79D.1090100@stackless.com> Message-ID: On Mon, Oct 15, 2012 at 8:38 PM, Christian Tismer wrote: > Just one thing that I don't get. > What do you mean by 'implicit taskswitching' ? > There is no such thing in greenlet, if you really meant that > Library from Armin Rigo. > > greenlets do everything explicitly, no pre-emption at all. > > So, is there a general understanding what a greenlet is and what not? > Just to make sure that the discussed terms are clearly defined. With greenlets, your potential switching points are every function call (because you can call switch() from anywhere, and you can't reliably know the name of *every* IO operation, or operation that implicitly invokes an IO operation). With generators, there is always an explicit *local* marker within the generator body of the potential switching points: yield expressions (including yield from). Ordinary function calls cannot cause the function to be suspended. So greenlets give you the scalability benefits of microthreading (as almost any OS supports a couple of orders of magnitude more sockets than it can threads), but without the same benefits of locally visible suspension points that are provided by generators and explicit callbacks. That's the philosophical reason. As a *practical* matter, there's still the problem you described in more detail elsewhere that CPython relies too much on the C stack to support suspension of arbitrary call chains without the stack switching assembly code in Stackless/greenlets. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ironfroggy at gmail.com Mon Oct 15 14:31:22 2012 From: ironfroggy at gmail.com (Calvin Spealman) Date: Mon, 15 Oct 2012 08:31:22 -0400 Subject: [Python-ideas] The async API of the future: yield-from In-Reply-To: References: <5078F6B1.2030309@canterbury.ac.nz> <507B4F9D.6040701@canterbury.ac.nz> <507BBD53.5000602@canterbury.ac.nz> Message-ID: A thought about more ways we could control groups of tasks, and avoid yield-from, just came to me this morning. def asset_packer(asset_urls): with yield until_all as results: for url in asset_urls: yield http.get(url) return pack(results) or def handle_or_cached(url): with yield first as result: yield http.get(url) yield cache.get(url) return result Currently, "with yield expr:" is not valid syntax, surprisingly. This gives us room to use it for something new. A generator-sensitive context manager. One option is just to allow the syntax directly. The generator yields, and sent value is used as a context manager. This would let the generator tell the scheduler "I'm going to give you a few different async ops, and I want to wait for all of them before I continue." etc. However, it leaves open the question how the scheduler knows the context manager has ended. 
Could it somehow indicate this to the correct scheduler in __exit__? Another option, if we're adding a new syntax anyway, is to make "with yield expr:" special and yield first the result of __enter__() and then, after the block is done, yield the result of __exit__(), which lets context blocks in the generator talk to the scheduler both before and after. Maybe we don't need the second, nuttier idea. But, I like the general idea. It feels right. On Mon, Oct 15, 2012 at 8:08 AM, Nick Coghlan wrote: > On Mon, Oct 15, 2012 at 8:25 PM, Calvin Spealman wrote: >> The more I follow this thread the less I understand the point of >> introducing a new use for yield-from in this discussion. > > +1. To me, "yield from" is just a tool that brings generators back to > parity with functions when it comes to breaking up a larger algorithm > into smaller pieces. Where you would break a function out into > subfunctions and call them normally, with a generator you can break > out subgenerators and invoke them with yield from. > > Any meaningful use of "yield from" in the coroutine context *has* to > ultimate devolve to an operation that: > 1. Asks the scheduler to schedule another operation > 2. Waits for that operation to complete > > Guido's approach to that problem is that step 1 is handled by calling > functions that in turn call methods on a thread-local scheduler. These > methods return Future objects, which can subsequently be yielded to > the scheduler to say "I'm waiting for this future to be set". > > I *thought* Greg's way combined step 1 and step 2 into a single > operation: the objects you yield *not only* say what you want to wait > for, but also what you want to do. However, his example par() > implementation killed that idea, since it turned out to need to > schedule tasks explicitly rather than their being a "execute this in > parallel" option. > > So now I'm back to think that Greg and Guido are talking about > different levels. *Any* scheduling option will be able to be collapsed > into an async task invoked by "yield from" by writing: > > def simple_async_task(): > return yield start_task() > > The part that still needs to be figured out is how you turn that > suspend/resume communications channel between the lowest level of the > task stack and the scheduling loop into something usable, as well as > how you handle iteration in a sensible way (I described my preferred > approach when writing about the API I'd like to see for an async > version of as_completed). I haven't seen anything to suggest that > "yield from"'s role should change from what it is in 3.3: a way to > factor out generators into multiple pieces with out breaking send() > and throw(). > > Cheers, > Nick. > > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia hink there is something wrong with the autolists that are set up to include Premium and Free content. -- Read my blog! I depend on your acceptance of my opinion! I am interesting! http://techblog.ironfroggy.com/ Follow me if you're into that sort of thing: http://www.twitter.com/ironfroggy From shibturn at gmail.com Mon Oct 15 15:11:07 2012 From: shibturn at gmail.com (Richard Oudkerk) Date: Mon, 15 Oct 2012 14:11:07 +0100 Subject: [Python-ideas] The async API of the future: Reactors In-Reply-To: References: <20121012203311.4b3ee8af@pitrou.net> Message-ID: On 12/10/2012 11:49pm, Guido van Rossum wrote: >>> >>That said, the idea of a common API architected around async I/O, >>> >>rather than non-blocking I/O, sounds interesting at least theoretically. 
> (Oh, what a nice distinction.) > > ... > > How close would our abstracted reactor interface have to be exactly > like IOCP? The actual IOCP API calls have very little to recommend > them -- it's the implementation and the architecture that we're after. > But we want it to be able to use actual IOCP calls on all systems that > have them. One could use IOCP or select/poll/... to implement an API which looks like class AsyncHub: def read(self, fd, nbytes): """Return future which is ready when read is complete""" def write(self, fd, buf): """Return future which is ready when write is complete""" def accept(self, fd): """Return future which is ready when connection is accepted""" def connect(self, fd, address): """Return future which is ready when connection has succeeded""" def wait(self, timeout=None): """Wait till a future is ready; return list of ready futures""" A reactor could then be built on top of such a hub. -- Richard From ncoghlan at gmail.com Mon Oct 15 15:48:46 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 15 Oct 2012 23:48:46 +1000 Subject: [Python-ideas] The async API of the future: yield-from In-Reply-To: References: <5078F6B1.2030309@canterbury.ac.nz> <507B4F9D.6040701@canterbury.ac.nz> <507BBD53.5000602@canterbury.ac.nz> Message-ID: On Mon, Oct 15, 2012 at 10:31 PM, Calvin Spealman wrote: > Currently, "with yield expr:" is not valid syntax, surprisingly. It's not that surprising, it's the general requirement that yield expressions must be enclosed in parentheses except when used standalone or in a simple assignment statement. "with (yield expr):" is valid syntax though, so I'm reluctant to endorse doing anything substantially different if the parentheses are omitted. I think the combination of "yield from" to delegate control (including exception handling) completely to a subgenerator and "context manager + for loop + explicit yield" when an operation needs to yield multiple times and the exception handling behaviour should be left to the caller (as in the "as_completed" case) should cover the necessary behaviours. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From tismer at stackless.com Mon Oct 15 15:57:53 2012 From: tismer at stackless.com (Christian Tismer) Date: Mon, 15 Oct 2012 15:57:53 +0200 Subject: [Python-ideas] yield from multiple iterables (was Re: The async API of the future: yield-from) In-Reply-To: References: <507B0F0B.8080700@pearwood.info> <507B26C6.10602@stackless.com> <507BA075.4030508@canterbury.ac.nz> <507BC855.4070802@stackless.com> <507BE79D.1090100@stackless.com> Message-ID: <507C1661.5070206@stackless.com> Hey Nick, On 15.10.12 14:18, Nick Coghlan wrote: > On Mon, Oct 15, 2012 at 8:38 PM, Christian Tismer wrote: >> Just one thing that I don't get. >> What do you mean by 'implicit taskswitching' ? >> There is no such thing in greenlet, if you really meant that >> Library from Armin Rigo. >> >> greenlets do everything explicitly, no pre-emption at all. >> >> So, is there a general understanding what a greenlet is and what not? >> Just to make sure that the discussed terms are clearly defined. > With greenlets, your potential switching points are every function > call (because you can call switch() from anywhere, and you can't > reliably know the name of *every* IO operation, or operation that > implicitly invokes an IO operation). That's true, and you will wonder: I never liked that! 
See below (you'll wonder even more) > With generators, there is always an explicit *local* marker within the > generator body of the potential switching points: yield expressions > (including yield from). Ordinary function calls cannot cause the > function to be suspended. > > So greenlets give you the scalability benefits of microthreading (as > almost any OS supports a couple of orders of magnitude more sockets > than it can threads), but without the same benefits of locally visible > suspension points that are provided by generators and explicit > callbacks. Yes, I understood that a lot better now. The nice trick of the (actually a bit ugly) explicit down-chaining of the locally visible switching points is the one thing that makes a huge difference, both for Stackless and Greenlets. Because we could never know the exact switching points, things became so difficult to handle. > That's the philosophical reason. As a *practical* matter, there's > still the problem you described in more detail elsewhere that CPython > relies too much on the C stack to support suspension of arbitrary call > chains without the stack switching assembly code in > Stackless/greenlets. > Right, CPython still keeps unneccessary crap on the C stack. But that's not the point right now, because on the other hand, in the context of a possible yield (from or not), the C stack is clean, and this enables switching. And actually in such clean positions, Stackless Python (as opposed to Greenlets) does soft-switching, which is very similar to what the generators are doing - there is no assembly stuff involved at all. So in the context of switching, CPython is presumably more efficient than greenlet (because of stack slicing), and a bit less efficient than stackless because of the generator chaining. I have begun studying the code for YIELD_FROM. As it is written, every next iteration elevates the chain of generators once up and down. Maybe that can be avoided by changing the frame chain, so this can become a cheaper O(1) operation. Alternatively I could also imagine to write real generators or coroutines as an extension module. It would use the same concept as generators, internally. No big deal, not changing the interpreter, maybe adding a bit. I think this would make Greenlet and even Stackless obsolete in most cases which are of real use. I would like to discuss this and maybe do a prototype. cheers - chris -- Christian Tismer :^) Software Consulting : Have a break! Take a ride on Python's Karl-Liebknecht-Str. 121 : *Starship* http://starship.python.net/ 14482 Potsdam : PGP key -> http://pgp.uni-mainz.de phone +49 173 24 18 776 fax +49 (30) 700143-0023 PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04 whom do you want to sponsor today? http://www.stackless.com/ From rene at stranden.com Mon Oct 15 16:11:00 2012 From: rene at stranden.com (Rene Nejsum) Date: Mon, 15 Oct 2012 16:11:00 +0200 Subject: [Python-ideas] The async API of the future: Reactors In-Reply-To: References: <20121012203311.4b3ee8af@pitrou.net> Message-ID: <599379E5-9C51-463E-9434-76BB843DD663@stranden.com> On Oct 15, 2012, at 3:11 PM, Richard Oudkerk wrote: > On 12/10/2012 11:49pm, Guido van Rossum wrote: >>>> >>That said, the idea of a common API architected around async I/O, >>>> >>rather than non-blocking I/O, sounds interesting at least theoretically. >> (Oh, what a nice distinction.) >> >> ... >> >> How close would our abstracted reactor interface have to be exactly >> like IOCP? 
The actual IOCP API calls have very little to recommend >> them -- it's the implementation and the architecture that we're after. >> But we want it to be able to use actual IOCP calls on all systems that >> have them. > > One could use IOCP or select/poll/... to implement an API which looks like > > class AsyncHub: > def read(self, fd, nbytes): > """Return future which is ready when read is complete""" > > def write(self, fd, buf): > """Return future which is ready when write is complete""" > > def accept(self, fd): > """Return future which is ready when connection is accepted""" > > def connect(self, fd, address): > """Return future which is ready when connection has succeeded""" > > def wait(self, timeout=None): > """Wait till a future is ready; return list of ready futures""" > > A reactor could then be built on top of such a hub. So in general alle methods are async, even the wait() could be async if it returned a Furure, this way all methods would be of the same concept. I like this as a general API for all types of connections and all underlying OS' /Rene > > > -- > Richard > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas From ironfroggy at gmail.com Mon Oct 15 16:16:14 2012 From: ironfroggy at gmail.com (Calvin Spealman) Date: Mon, 15 Oct 2012 10:16:14 -0400 Subject: [Python-ideas] The async API of the future: yield-from In-Reply-To: References: <5078F6B1.2030309@canterbury.ac.nz> <507B4F9D.6040701@canterbury.ac.nz> <507BBD53.5000602@canterbury.ac.nz> Message-ID: On Mon, Oct 15, 2012 at 9:48 AM, Nick Coghlan wrote: > On Mon, Oct 15, 2012 at 10:31 PM, Calvin Spealman wrote: >> Currently, "with yield expr:" is not valid syntax, surprisingly. > > It's not that surprising, it's the general requirement that yield > expressions must be enclosed in parentheses except when used > standalone or in a simple assignment statement. > > "with (yield expr):" is valid syntax though, so I'm reluctant to > endorse doing anything substantially different if the parentheses are > omitted. Silly oversight on my part, and I agree that the parens shouldn't make the difference in meaning. > I think the combination of "yield from" to delegate control (including > exception handling) completely to a subgenerator and "context manager > + for loop + explicit yield" when an operation needs to yield multiple > times and the exception handling behaviour should be left to the > caller (as in the "as_completed" case) should cover the necessary > behaviours. I'm still -1 on delegating control to subgenerators with yield-from, versus having the scheduler just deal with them directly. I think it is far less flexible. I would still like to see a less confusing "with yield expr:" by simply allowing it without parens, but no special meaning. I think it would be really useful in coroutines. with yield collect() as tasks: yield task1() yield task2() results = yield tasks > Cheers, > Nick. > > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia -- Read my blog! I depend on your acceptance of my opinion! I am interesting! 
http://techblog.ironfroggy.com/ Follow me if you're into that sort of thing: http://www.twitter.com/ironfroggy From ncoghlan at gmail.com Mon Oct 15 16:46:00 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 16 Oct 2012 00:46:00 +1000 Subject: [Python-ideas] yield from multiple iterables (was Re: The async API of the future: yield-from) In-Reply-To: <507C1661.5070206@stackless.com> References: <507B0F0B.8080700@pearwood.info> <507B26C6.10602@stackless.com> <507BA075.4030508@canterbury.ac.nz> <507BC855.4070802@stackless.com> <507BE79D.1090100@stackless.com> <507C1661.5070206@stackless.com> Message-ID: On Mon, Oct 15, 2012 at 11:57 PM, Christian Tismer wrote: > So in the context of switching, CPython is presumably more efficient > than greenlet (because of stack slicing), and a bit less efficient than > stackless because of the generator chaining. > > I have begun studying the code for YIELD_FROM. As it is written, every > next iteration elevates the chain of generators once up and down. > Maybe that can be avoided by changing the frame chain, so this can become > a cheaper O(1) operation. Yes, we certainly talked about that, but I don't believe anyone came up with the code needed to make it behave itself properly when unwinding the stack. (Either that or someone *did* try it, and then undid it because it broke the test suite, which amounts to the same thing. Mercurial could say for sure) > Alternatively I could also imagine to write real generators or coroutines > as an extension module. It would use the same concept as generators, > internally. No big deal, not changing the interpreter, maybe adding a bit. Tangentially related, there are some patches [1,2] on the tracker looking to shuffle a few things related to generator state around to get them out of the frame objects and into the generator objects where they belong. There are definitely a few things that could do with cleaning up in this space. [1] http://bugs.python.org/issue13897 [2] http://bugs.python.org/issue13607 > I think this would make Greenlet and even Stackless obsolete in most > cases which are of real use. The "take this synchronous code and magically make it scale better" aspect is still a nice feature of greenlets & gevent. > > I would like to discuss this and maybe do a prototype. Sure, I think there's several things we can do better here, and I think the test suite is comprehensive enough to keep us honest. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Mon Oct 15 16:50:39 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 16 Oct 2012 00:50:39 +1000 Subject: [Python-ideas] The async API of the future: yield-from In-Reply-To: References: <5078F6B1.2030309@canterbury.ac.nz> <507B4F9D.6040701@canterbury.ac.nz> <507BBD53.5000602@canterbury.ac.nz> Message-ID: On Tue, Oct 16, 2012 at 12:16 AM, Calvin Spealman wrote: > I'm still -1 on delegating control to subgenerators with yield-from, > versus having the scheduler just deal with them directly. I think it > is far less flexible. Um, yield from is to generators as calls are to functions... delegating to subgenerators, regardless of context, is what it's *for*. Without it, the scheduler will have to do quite a bit of extra work to reconstruct sane stack traces. Cheers, Nick. 
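To make the "calls are to functions" analogy concrete, here is a toy example that assumes nothing beyond the plain 3.3 semantics of yield from -- no particular scheduler:

    def inner():
        data = yield "need-data"          # some scheduling instruction
        return data.upper()

    def outer():
        result = yield from inner()       # send()/throw() are routed into inner()
        return "outer saw " + result

    task = outer()
    print(next(task))                     # -> 'need-data', yielded from inside inner()
    try:
        task.send("spam")                 # goes straight to inner(); completion raises StopIteration
    except StopIteration as exc:
        print(exc.value)                  # -> 'outer saw SPAM'

The driver only ever holds outer(); if inner() raised instead, the traceback would already show both frames, with no bookkeeping by the scheduler.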
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From daniel.mcdougall at liftoffsoftware.com Mon Oct 15 17:00:57 2012 From: daniel.mcdougall at liftoffsoftware.com (Daniel McDougall) Date: Mon, 15 Oct 2012 11:00:57 -0400 Subject: [Python-ideas] The async API of the future: Some thoughts from an ignorant Tornado user In-Reply-To: References: Message-ID: On Sun, Oct 14, 2012 at 5:32 AM, Laurens Van Houtven <_ at lvh.cc> wrote: >> import async # The API of the future ;) >> async.async_call(retrieve_log_playback, settings, tws, >> mechanism=multiprocessing) >> # tws == instance of tornado.web.WebSocketHandler that holds the open >> connection > > > Is this a CPU-bound problem? It depends on the host. On embedded platforms (e.g. the BeagleBone) it is more IO-bound than CPU bound (fast CPU but slow disk and slow memory). On regular x86 systems it is mostly CPU-bound. >> * I should be able to choose the type of event loop/async mechanism >> that is appropriate for the task: For CPU-bound tasks I'll probably >> want to use multiprocessing. For IO-bound tasks I might want to use >> threading. For a multitude of tasks that "just need to be async" (by >> nature) I'll want to use an event loop. > > > Ehhh, maybe. This sounds like it confounds the tools for different use > cases. You can quite easily have threads and processes on top of an event > loop; that works out particularly nicely for processes because you still > have to talk to your processes. > > Examples: > > twisted.internet.reactor.spawnProcess (local processes) > twisted.internet.threads.deferToThread (local threads) > ampoule (remote processes) > > It's quite easy to do blocking IO in a thread with deferToThread; in fact, > that's how twisted's adbapi, an async wrapper to dbapi, works. As I understand it, twisted.internet.reactor.spawnProcess is all about spawning subprocesses akin to subprocess.Popen(). Also, it requires writing a sophisticated ProcessProtocol. It seems to be completely unrelated and wickedly complicated. The complete opposite of what I would consider ideal for an asynchronous library since it is anything but simple. I mean, I could write a separate program to generate HTML playback files from logs, spawn a subprocess in an asynchronous fashion, then watch it for completion but I could do that with termio.Multiplex (see: https://github.com/liftoff/GateOne/blob/master/gateone/termio.py) . > >> * Any async module should support 'basics' like calling functions at >> an interval and calling functions after a timeout occurs (with the >> ability to cancel). >> * Asynchronous tasks should be able to access the same namespace as >> everything else. Maybe wishful thinking. > > > With twisted, this is already the case; general caveats for shared mutable > state across threads of course still apply. Fortunately in most Twisted > apps, that's a tiny fraction of the total code, and they tend to be > fractions that are well-isolated or at least easily isolatable. > >> >> * It should support publish/subscribe-style events (i.e. an event >> dispatcher). For example, the ability to watch a file descriptor or >> socket for changes in state and call a function when that happens. >> Preferably with the flexibility to define custom events (i.e don't >> have it tied to kqueue/epoll-specific events). > > > Like connectionMade, connectionLost, dataReceived etc? > >> >> >> Thanks for your consideration; and thanks for the awesome language. >> >> -- >> Dan McDougall - Chief Executive Officer and Developer >> Liftoff Software ? 
Your flight to the cloud is now boarding. >> 904-446-8323 >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> http://mail.python.org/mailman/listinfo/python-ideas > > > > > -- > cheers > lvh > -- Dan McDougall - Chief Executive Officer and Developer Liftoff Software ? Your flight to the cloud is now boarding. 904-446-8323 From ironfroggy at gmail.com Mon Oct 15 17:16:18 2012 From: ironfroggy at gmail.com (Calvin Spealman) Date: Mon, 15 Oct 2012 11:16:18 -0400 Subject: [Python-ideas] The async API of the future: yield-from In-Reply-To: References: <5078F6B1.2030309@canterbury.ac.nz> <507B4F9D.6040701@canterbury.ac.nz> <507BBD53.5000602@canterbury.ac.nz> Message-ID: On Mon, Oct 15, 2012 at 10:50 AM, Nick Coghlan wrote: > On Tue, Oct 16, 2012 at 12:16 AM, Calvin Spealman wrote: >> I'm still -1 on delegating control to subgenerators with yield-from, >> versus having the scheduler just deal with them directly. I think it >> is far less flexible. > > Um, yield from is to generators as calls are to functions... > delegating to subgenerators, regardless of context, is what it's > *for*. Without it, the scheduler will have to do quite a bit of extra > work to reconstruct sane stack traces. I didn't consider the ease of sane stack traces, that is a good point. I just see all the problems that seem to be harder to do right with yield-from and I wish it could be made simpler by just bypassing them for coroutines. I don't feel they are the same as the original intent of yield-from, but I see the obvious way they match the need now. But, I still want to make my case and will put another hypothetical on the board. A "sane stack trace" only makes sense if we assume that tasks "call" each other in the same kind of call tree that synchronous code flows in, and I don't think that is necessarily the case. There are cases when one task might want to end before tasks it as "called" are complete, and if we use yield-from this is *impossible* but it is very useful. An example of this is a task which makes multiple requests, but only needs to wait for the results from less-than-all of them before returning. It might still want the other tasks to complete, even if it won't do anything with the results. yield-from semantics won't allow a called task to continue, if needed, after the calling task itself has completed. Is there another way these semantics could be expressed? > Cheers, > Nick. > > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia -- Read my blog! I depend on your acceptance of my opinion! I am interesting! http://techblog.ironfroggy.com/ Follow me if you're into that sort of thing: http://www.twitter.com/ironfroggy From g.brandl at gmx.net Mon Oct 15 17:32:42 2012 From: g.brandl at gmx.net (Georg Brandl) Date: Mon, 15 Oct 2012 17:32:42 +0200 Subject: [Python-ideas] Python as a tool to download stuff for bootstrapping In-Reply-To: References: Message-ID: On 10/15/2012 01:13 PM, anatoly techtonik wrote: > On Fri, Jul 6, 2012 at 10:30 PM, Georg Brandl wrote: >> On 05.07.2012 22:24, Amaury Forgeot d'Arc wrote: >>> >>> 2012/7/5 anatoly techtonik : >>>> >>>> This makes me kind of sad. You have Python installed. Why can't you >>>> just crossplatformly do: >>>> >>>> mkdir nacl >>>> cd nacl >>>> python -m urllib get >>>> >>>> http://commondatastorage.googleapis.com/nativeclient-mirror/nacl/nacl_sdk/update_sdk.py >>>> python update_sdk.py >>> >>> >>> I'm sure there is already a way with standard python tools. 
Something >>> along these lines: >>> >>> python -c "from urllib.request import urlretrieve; urlretrieve('URL', >>> 'update_sdk.zip')" >>> python -m update_sdk.zip >>> >>> The second command will work if the zip file has a __main__.py. >>> Do you think we need other tools? >> >> >> The "python -m urllib" (don't think "get" is required) interface certainly >> looks nice and is similar in style with many of the other __main__ stuff we >> add to stdlib modules. > > Here is the implementation of urllib.__main__ module for Python 3 with > progress bar. I've left 'get' argument to make it extensible in future with > other commands, such as `test`. Please don't send patches to the mailing list, open a new tracker issue instead. Georg From ncoghlan at gmail.com Mon Oct 15 17:32:46 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 16 Oct 2012 01:32:46 +1000 Subject: [Python-ideas] The async API of the future: yield-from In-Reply-To: References: <5078F6B1.2030309@canterbury.ac.nz> <507B4F9D.6040701@canterbury.ac.nz> <507BBD53.5000602@canterbury.ac.nz> Message-ID: On Tue, Oct 16, 2012 at 1:16 AM, Calvin Spealman wrote: > An example of this is a task which makes multiple requests, but only needs to > wait for the results from less-than-all of them before returning. It > might still want > the other tasks to complete, even if it won't do anything with the results. > > yield-from semantics won't allow a called task to continue, if needed, after the > calling task itself has completed. > > Is there another way these semantics could be expressed? Sure, did you see my as_completed example? You couldn't use "yield from" for that, you'd need to use an ordinary iterator and an explicit yield in the body of the loop (this is why I disagree with Greg that "yield from" can serve as the one true API - it doesn't handle partial iteration, and it doesn't handle pre- or post- processing around the suspension points while iterating). My preferred way of thinking of "yield from" is as a simple refactoring tool: "Gee, this generator is getting kind of long and unwieldy. I'll move this piece out into a separate generator, and use yield from to invoke it" or "Hmm, I keep using this same sequence of 3 or 4 operations. I guess I'll move them out to a separate generator and use yield from to invoke it in the appropriate places". Compare that with the almost identical equivalents when refactoring a function to call a helper function instead of doing everything inline: "Gee, this function is getting kind of long and unwieldy. I'll move this piece out into a separate function, and call it" or "Hmm, I keep using this same sequence of 3 or 4 operations. I guess I'll move them out to a separate function and call it it in the appropriate places". Just as some operations can't be factored out with simple function calls, hence we have iterators and context managers, so not all operations will be able to be factored out of a coroutine with "yield from" (hence why I consider "yield" to be the more appropriate core primitive, with "yield from" just correctly factoring out the task of complete delegation, which is otherwise hard to do correctly) Cheers, Nick. 
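For contrast, here is a sketch of the partial-iteration case that yield from deliberately doesn't cover -- as_completed(), fetch() and looks_ok() are placeholder names for the sake of the sketch, not a settled API:

    def first_good_page(urls):
        for fut in as_completed([fetch(u) for u in urls]):
            page = yield fut              # explicit yield: we want control back between results
            if looks_ok(page):
                return page               # stop early; the remaining futures are simply abandoned
        raise ValueError("no usable page")

The pre- and post-processing around each suspension point is exactly the part that delegating wholesale with yield from would take away.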
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From guido at python.org Mon Oct 15 17:33:32 2012 From: guido at python.org (Guido van Rossum) Date: Mon, 15 Oct 2012 08:33:32 -0700 Subject: [Python-ideas] The async API of the future: Reactors In-Reply-To: References: <20121012203311.4b3ee8af@pitrou.net> Message-ID: On Mon, Oct 15, 2012 at 1:33 AM, Nick Coghlan wrote: > On Mon, Oct 15, 2012 at 2:54 AM, Guido van Rossum wrote: >> On Sun, Oct 14, 2012 at 8:01 AM, Calvin Spealman wrote: >>> Why is subclassing a problem? It can be overused, but seems the right >>> thing to do in this case. You want a protocol that responds to new data by >>> echoing and tells the user when the connection was terminated? It makes >>> sense that this is a subclass: a special case of some class that handles the >>> base behavior. >> >> I replied to this in detail on the "Twisted and Deferreds" thread in >> an exchange. Summary: I'm -0 when it comes to subclassing protocol >> classes; -1 on subclassing objects that implement significant >> functionality. > > This problem does seem tailor-made for a Protocol ABC - you can > inherit from it if you want, or call register() if you don't. But you're still stuck with implementing the names that someone else decided upon a decade ago... :-) -- --Guido van Rossum (python.org/~guido) From jstpierre at mecheye.net Mon Oct 15 17:39:57 2012 From: jstpierre at mecheye.net (Jasper St. Pierre) Date: Mon, 15 Oct 2012 11:39:57 -0400 Subject: [Python-ideas] The async API of the future: Reactors In-Reply-To: References: <20121012203311.4b3ee8af@pitrou.net> Message-ID: On Mon, Oct 15, 2012 at 11:33 AM, Guido van Rossum wrote: > But you're still stuck with implementing the names that someone else > decided upon a decade ago... :-) And why is that a bad thing? I don't see the value in having something like: thing.set_data_received_callback(self.bake_some_eggs) We're going to have to give *something* a name, eventually. Why not pick it at the most direct level? > -- > --Guido van Rossum (python.org/~guido) > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas -- Jasper From _ at lvh.cc Mon Oct 15 17:51:03 2012 From: _ at lvh.cc (Laurens Van Houtven) Date: Mon, 15 Oct 2012 17:51:03 +0200 Subject: [Python-ideas] The async API of the future: yield-from In-Reply-To: References: <5078F6B1.2030309@canterbury.ac.nz> <507B4F9D.6040701@canterbury.ac.nz> <507BBD53.5000602@canterbury.ac.nz> Message-ID: On Mon, Oct 15, 2012 at 5:32 PM, Nick Coghlan wrote: > My preferred way of thinking of "yield from" is as a simple > refactoring tool: "Gee, this generator is getting kind of long and > unwieldy. I'll move this piece out into a separate generator, and use > yield from to invoke it" or "Hmm, I keep using this same sequence of 3 > or 4 operations. I guess I'll move them out to a separate generator > and use yield from to invoke it in the appropriate places". > I agree. That's how I've used it. Maybe that's just short-sightedness. -- cheers lvh -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From glyph at twistedmatrix.com Mon Oct 15 17:51:17 2012 From: glyph at twistedmatrix.com (Glyph) Date: Mon, 15 Oct 2012 08:51:17 -0700 Subject: [Python-ideas] re-implementing Twisted for fun and profit In-Reply-To: <09394CD5-9950-49CF-A25E-C906B70F3BC9@umbrellacode.com> References: <40862DD9-DF71-4280-A47F-B20E7E742254@twistedmatrix.com> <42AC178D-A7E1-47D7-8B83-F2F6B390BE1C@twistedmatrix.com> <58AA33EF-BF3C-4725-BD4A-743EA3E26266@umbrellacode.com> <537E074C-49B5-47A2-978F-D0592862B74E@twistedmatrix.com> <09394CD5-9950-49CF-A25E-C906B70F3BC9@umbrellacode.com> Message-ID: <79BE3A91-9A01-4632-97D2-1761999FAA97@twistedmatrix.com> On Oct 15, 2012, at 1:03 AM, Shane Green wrote: > Namely, all callbacks registered with a given Promise instance, receive the output of the original operation This is somewhat tangential to the I/O loop discussion, and my hope for that discussion is that it won't involve Deferreds, or Futures, or Promises, or any other request/response callback management abstraction, because requests and responses are significantly higher level than accept() and recv() and do not belong within the same layer. The event loop ought to provide tools to experiment with event-driven abstractions so that users can use Deferreds and Promises - which are, fundamentally, perfectly interoperable, and still use standard library network protocol implementations. What I think you were trying to say was that callback addition on Deferreds is a destructive operation; whereas your promises are (from the caller's perspective, at least) immutable. Sometimes I do think that the visibly mutable nature of Deferreds was a mistake. If I read you properly though, what you're saying is that you can do this: promise = ... promise.then(alpha).then(beta) promise.then(gamma).then(delta) and in yield-coroutine style this is effectively: value = yield promise beta(yield alpha(value)) delta(yield gamma(value)) This deficiency is reasonably easy to work around with Deferreds. You can just do: def fork(d): dprime = Deferred() def propagate(result): dprime.callback(result) return result d.addBoth(propagate) return dprime and then: fork(x).addCallback(alpha).addCallback(beta) fork(x).addCallback(gamma).addCallback(delta) Perhaps this function should be in Twisted; it's certainly come up a few times. But, the fact that the original result is immediately forgotten can also be handy, because it helps the unused result get garbage collected faster, even if multiple things are hanging on to the Deferred after the initial result has been processed. And it is actually pretty unusual to want to share the same result among multiple callers (which is why this function hasn't been added to the core yet). -glyph -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Mon Oct 15 17:53:49 2012 From: guido at python.org (Guido van Rossum) Date: Mon, 15 Oct 2012 08:53:49 -0700 Subject: [Python-ideas] The async API of the future: yield-from In-Reply-To: <507BA60B.2030806@canterbury.ac.nz> References: <5078F6B1.2030309@canterbury.ac.nz> <507BA60B.2030806@canterbury.ac.nz> Message-ID: On Sun, Oct 14, 2012 at 10:58 PM, Greg Ewing wrote: > Guido van Rossum wrote: > >> Why wouldn't all generators that aren't blocked for I/O just run until >> their next yield, in a round-robin fashion? That's fair enough for me. >> >> But as I said, my intuition for how things work in Greg's world is not >> very good. > > > That's exactly how my scheduler behaves. 
> > >> OTOH I am okay with only getting one of the exceptions. But I think >> all of the remaining tasks should still be run to completion -- maybe >> the caller just cared about their side effects. Or maybe this should >> be an option to par(). > > > This is hard to answer without considering real use cases, > but my feeling is that if I care enough about the results of > the subtasks to wait until they've all completed before continuing, > then if anything goes wrong in any of them, I might as well abandon > the whole computation. > > If that's not the case, I'd be happy to wrap each one in a > try-except that doesn't propagate the exception to the main > task, but just records the information that the subtask > failed somewhere, for the main task to check afterwards. > > Another direction to approach this is to consider that par() > ought to be just an optimisation -- the result should be the same > as if you'd written sequential code to perform the subtasks > one after another. And in that case, an exception in one would > prevent any of the following ones from executing, so it's fine > if par() behaves like that, too. I'd think of such a par() more as something that saves me typing than as an optimization. Anyway, the key functionality I cannot live without here is to start multiple tasks concurrently. It seems that without par() or some other scheduling primitive, you cannot do that: if I write a = foo_task() # Search google b = bar_task() # Search bing ra = yield from a rb = yield from b # now compare search results the tasks run sequentially. A good par() should run then concurrently. But there needs to be another way to get a task running immediately and concurrently; I believe that would be a = spawn(foo_task()) right? One could then at any later point use ra = yield from a One could also combine these and do e.g. a = spawn(foo_task()) b = spawn(bar_task()) ra, rb = yield from par(a, b) Have I got the spelling for spawn() right? In many other systems (e.g. threads, greenlets) this kind of operation takes a callable, not the result of calling a function (albeit a generator). If it takes a generator, would it return the same generator or a different one to wait for? -- --Guido van Rossum (python.org/~guido) From solipsis at pitrou.net Mon Oct 15 17:54:11 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 15 Oct 2012 17:54:11 +0200 Subject: [Python-ideas] The async API of the future: Reactors References: <20121012203311.4b3ee8af@pitrou.net> Message-ID: <20121015175411.4e084fef@pitrou.net> On Mon, 15 Oct 2012 14:11:07 +0100 Richard Oudkerk wrote: > > One could use IOCP or select/poll/... to implement an API which looks like > > class AsyncHub: > def read(self, fd, nbytes): > """Return future which is ready when read is complete""" > > def write(self, fd, buf): > """Return future which is ready when write is complete""" > > def accept(self, fd): > """Return future which is ready when connection is accepted""" > > def connect(self, fd, address): > """Return future which is ready when connection has succeeded""" > > def wait(self, timeout=None): > """Wait till a future is ready; return list of ready futures""" > > A reactor could then be built on top of such a hub. I suppose the reactor would handle higher-level stuff such as TLS? Regards Antoine. 
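To tie such a hub back to the generator-task discussion: assuming the AsyncHub sketched above, plus a scheduler that resumes a task with the result of whatever future it yields, an echo handler could be as small as this (a sketch, not a worked-out proposal):

    def echo_task(hub, conn_fd):
        while True:
            data = yield hub.read(conn_fd, 4096)    # future; task resumes with the bytes read
            if not data:
                break                               # peer closed the connection
            yield hub.write(conn_fd, data)          # future; task resumes once the write completes

Nothing in the task is IOCP- or select-specific; that is the point of keeping the hub underneath.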
From ncoghlan at gmail.com Mon Oct 15 17:56:45 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 16 Oct 2012 01:56:45 +1000 Subject: [Python-ideas] The async API of the future: Reactors In-Reply-To: References: <20121012203311.4b3ee8af@pitrou.net> Message-ID: On Tue, Oct 16, 2012 at 1:33 AM, Guido van Rossum wrote: > But you're still stuck with implementing the names that someone else > decided upon a decade ago... :-) There's a certain benefit to everyone using the same names and being able to read each others code, even when there's a (small?) risk of the names not aging well. Do we really want the first step in deciphering someone else's async code to be "OK, what did they call their connection and data processing callbacks?"? Twisted's IProtocol API is pretty simple: - makeConnection - connectionMade - dataReceived - connectionLost Everything else is up to the individual protocols (including whether or not they offer a "write" method) The transport and producer/consumer APIs aren't much more complicated (https://twistedmatrix.com/documents/current/core/howto/producers.html) and make rather a lot of sense. The precise *shape* of those APIs are likely to be different in a generator based system, and I assume we'd want to lose the camel-case names, but standardising the terminology seems like a good idea. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From _ at lvh.cc Mon Oct 15 18:04:09 2012 From: _ at lvh.cc (Laurens Van Houtven) Date: Mon, 15 Oct 2012 18:04:09 +0200 Subject: [Python-ideas] The async API of the future: Reactors In-Reply-To: References: <20121012203311.4b3ee8af@pitrou.net> Message-ID: On Mon, Oct 15, 2012 at 5:56 PM, Nick Coghlan wrote: > Twisted's IProtocol API is pretty simple: > - makeConnection > - connectionMade > - dataReceived > - connectionLost > > Everything else is up to the individual protocols (including whether > or not they offer a "write" method) > While I agree with everything else you're saying, write may be a bad example: it's generally something on the *transport*, and it's an interface method (ie always available) there. > > Cheers, > Nick. > -- cheers lvh -------------- next part -------------- An HTML attachment was scrubbed... URL: From ironfroggy at gmail.com Mon Oct 15 18:06:44 2012 From: ironfroggy at gmail.com (Calvin Spealman) Date: Mon, 15 Oct 2012 12:06:44 -0400 Subject: [Python-ideas] The async API of the future: yield-from In-Reply-To: References: <5078F6B1.2030309@canterbury.ac.nz> <507BA60B.2030806@canterbury.ac.nz> Message-ID: On Mon, Oct 15, 2012 at 11:53 AM, Guido van Rossum wrote: > On Sun, Oct 14, 2012 at 10:58 PM, Greg Ewing > wrote: >> Guido van Rossum wrote: >> >>> Why wouldn't all generators that aren't blocked for I/O just run until >>> their next yield, in a round-robin fashion? That's fair enough for me. >>> >>> But as I said, my intuition for how things work in Greg's world is not >>> very good. >> >> >> That's exactly how my scheduler behaves. >> >> >>> OTOH I am okay with only getting one of the exceptions. But I think >>> all of the remaining tasks should still be run to completion -- maybe >>> the caller just cared about their side effects. Or maybe this should >>> be an option to par(). >> >> >> This is hard to answer without considering real use cases, >> but my feeling is that if I care enough about the results of >> the subtasks to wait until they've all completed before continuing, >> then if anything goes wrong in any of them, I might as well abandon >> the whole computation. 
>> >> If that's not the case, I'd be happy to wrap each one in a >> try-except that doesn't propagate the exception to the main >> task, but just records the information that the subtask >> failed somewhere, for the main task to check afterwards. >> >> Another direction to approach this is to consider that par() >> ought to be just an optimisation -- the result should be the same >> as if you'd written sequential code to perform the subtasks >> one after another. And in that case, an exception in one would >> prevent any of the following ones from executing, so it's fine >> if par() behaves like that, too. > > I'd think of such a par() more as something that saves me typing than > as an optimization. Anyway, the key functionality I cannot live > without here is to start multiple tasks concurrently. It seems that > without par() or some other scheduling primitive, you cannot do that: > if I write > > a = foo_task() # Search google > b = bar_task() # Search bing > ra = yield from a > rb = yield from b > # now compare search results > > the tasks run sequentially. A good par() should run then concurrently. > But there needs to be another way to get a task running immediately > and concurrently; I believe that would be > > a = spawn(foo_task()) > > right? One could then at any later point use > > ra = yield from a > > One could also combine these and do e.g. > > a = spawn(foo_task()) > b = spawn(bar_task()) > > ra, rb = yield from par(a, b) > > Have I got the spelling for spawn() right? In many other systems (e.g. > threads, greenlets) this kind of operation takes a callable, not the > result of calling a function (albeit a generator). If it takes a > generator, would it return the same generator or a different one to > wait for? I think "start this other async task, but let me continue now" (spawn) is so common and basic an operation it needs to be first class. What if we allow both yield and yield from of a task? If we allow spawn(task()) then we're not getting nice tracebacks anyway, so I think we should allow result1 = yield from task1() # wait for this other task result2 = yield from task2() # wait for this next and future1 = yield task1() # spawn task future2 = yield task2() # spawn other task results = yield future1, future2 I was wrong to say we shouldn't do yield-from task scheduling, I see the benefits now. but I don't think it has to be either or. I think it makes sense to allow both, and that the behavior differences between the two ways to invoke another task would be sensible. Both are primitives we need to support as first-class operation. That is, without some wrapper like spawn(). > -- > --Guido van Rossum (python.org/~guido) > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas -- Read my blog! I depend on your acceptance of my opinion! I am interesting! 
http://techblog.ironfroggy.com/ Follow me if you're into that sort of thing: http://www.twitter.com/ironfroggy From guido at python.org Mon Oct 15 18:09:51 2012 From: guido at python.org (Guido van Rossum) Date: Mon, 15 Oct 2012 09:09:51 -0700 Subject: [Python-ideas] The async API of the future: yield-from In-Reply-To: References: <5078F6B1.2030309@canterbury.ac.nz> <507B4F9D.6040701@canterbury.ac.nz> <507BBD53.5000602@canterbury.ac.nz> Message-ID: On Mon, Oct 15, 2012 at 8:32 AM, Nick Coghlan wrote: > My preferred way of thinking of "yield from" is as a simple > refactoring tool: "Gee, this generator is getting kind of long and > unwieldy. I'll move this piece out into a separate generator, and use > yield from to invoke it" or "Hmm, I keep using this same sequence of 3 > or 4 operations. I guess I'll move them out to a separate generator > and use yield from to invoke it in the appropriate places". In the NDB world you would say: "Gee this _tasklet_ is getting kind of long and unwieldy. I'll move this piece out into a separate _tasklet_, and use _yield_ to invoke it." Creating a tasklet is just writing a generator decorated with @ndb.tasklet -- after using this a bit it becomes total second nature (I've seen several coworkers pick it up effortlessly). I'll have to digest your other points about yield vs. yield-from more carefully -- on the one hand I think it would be cool if yield-from could give us an even simpler paradigm to write async code than NDB's version, and that expectation was one of my main reasons to push for PEP 380's acceptance. On the other hand you bring up some good points with the as_completed() example (though I have a feeling Greg will easily sail around it :-). PS. Unrelated, and please don't respond to this or at least change the subject if you feel compelled: there seem to be a lot of bad names in this field. Twisted uses adjectives as nouns (Twisted, Deferred, I read about another one), "add_done_callback" is too longwinded, "as_completed" brings absolutely no useful association with it.. -- --Guido van Rossum (python.org/~guido) From guido at python.org Mon Oct 15 18:17:55 2012 From: guido at python.org (Guido van Rossum) Date: Mon, 15 Oct 2012 09:17:55 -0700 Subject: [Python-ideas] The async API of the future: Reactors In-Reply-To: References: <20121012203311.4b3ee8af@pitrou.net> Message-ID: On Mon, Oct 15, 2012 at 8:39 AM, Jasper St. Pierre wrote: > On Mon, Oct 15, 2012 at 11:33 AM, Guido van Rossum wrote: >> But you're still stuck with implementing the names that someone else >> decided upon a decade ago... :-) > > And why is that a bad thing? I don't see the value in having something > like: thing.set_data_received_callback(self.bake_some_eggs) But I do, and you've pinpointed exactly my argument. My code is all about baking an egg, and (from my POV) it's secondary that it's invoked by the reactor when data is received. > We're going to have to give *something* a name, eventually. Why not > pick it at the most direct level? Let the reactor pick *its* names (e.g. set_data_received_callback). Then I can pick mine. 
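To make the two styles concrete, here is a rough sketch of what each looks like for the egg-baking example; every name below (EggBaker, CallbackTransport, set_data_received_callback, feed) is a hypothetical illustration rather than a proposed or existing API:

# Style 1: template pattern -- the framework fixes the method name.
class EggBaker:
    def data_received(self, data):
        self.bake_some_eggs(data)

    def bake_some_eggs(self, data):
        print("baking with", data)

# Style 2: the application registers whatever callable it likes.
class CallbackTransport:
    def __init__(self):
        self._on_data = None

    def set_data_received_callback(self, callback):
        self._on_data = callback

    def feed(self, data):
        # stand-in for "bytes arrived from the event loop"
        if self._on_data is not None:
            self._on_data(data)

baker = EggBaker()
transport = CallbackTransport()
transport.set_data_received_callback(baker.bake_some_eggs)
transport.feed(b"spam")    # invokes bake_some_eggs directly, no adapter method

The code that actually bakes the egg is identical either way; the disagreement above is only about whether the glue method with the standardized name is worth having.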
-- --Guido van Rossum (python.org/~guido) From guido at python.org Mon Oct 15 18:24:12 2012 From: guido at python.org (Guido van Rossum) Date: Mon, 15 Oct 2012 09:24:12 -0700 Subject: [Python-ideas] The async API of the future: Reactors In-Reply-To: References: <20121012203311.4b3ee8af@pitrou.net> Message-ID: On Mon, Oct 15, 2012 at 8:56 AM, Nick Coghlan wrote: > On Tue, Oct 16, 2012 at 1:33 AM, Guido van Rossum wrote: >> But you're still stuck with implementing the names that someone else >> decided upon a decade ago... :-) > > There's a certain benefit to everyone using the same names and being > able to read each others code, even when there's a (small?) risk of > the names not aging well. Do we really want the first step in > deciphering someone else's async code to be "OK, what did they call > their connection and data processing callbacks?"? > > Twisted's IProtocol API is pretty simple: > - makeConnection > - connectionMade > - dataReceived > - connectionLost > > Everything else is up to the individual protocols (including whether > or not they offer a "write" method) > > The transport and producer/consumer APIs aren't much more complicated > (https://twistedmatrix.com/documents/current/core/howto/producers.html) > and make rather a lot of sense. The precise *shape* of those APIs are > likely to be different in a generator based system, and I assume we'd > want to lose the camel-case names, but standardising the terminology > seems like a good idea. I guess you see it as a template pattern, where everybody has to implement the same state machine *somehow*. Like having to implement a file-like object, or a mapping. I'm still convinced that the alternate POV is just as valid in this case, but I'm going to let it rest because it doesn't matter enough to me to keep arguing. -- --Guido van Rossum (python.org/~guido) From guido at python.org Mon Oct 15 18:25:53 2012 From: guido at python.org (Guido van Rossum) Date: Mon, 15 Oct 2012 09:25:53 -0700 Subject: [Python-ideas] Off-line most of the day Message-ID: I'm about to enter an intense all-day-long meeting at work, and won't have time to keep up with email at all until late tonight. So have fun discussing async APIs without me, and please stay on topic! -- --Guido van Rossum (python.org/~guido) From tismer at stackless.com Mon Oct 15 18:41:27 2012 From: tismer at stackless.com (Christian Tismer) Date: Mon, 15 Oct 2012 18:41:27 +0200 Subject: [Python-ideas] yield from multiple iterables (was Re: The async API of the future: yield-from) In-Reply-To: References: <507B0F0B.8080700@pearwood.info> <507B26C6.10602@stackless.com> <507BA075.4030508@canterbury.ac.nz> <507BC855.4070802@stackless.com> <507BE79D.1090100@stackless.com> <507C1661.5070206@stackless.com> Message-ID: <507C3CB7.5030804@stackless.com> On 15.10.12 16:46, Nick Coghlan wrote: > On Mon, Oct 15, 2012 at 11:57 PM, Christian Tismer wrote: ... >> Alternatively I could also imagine to write real generators or coroutines >> as an extension module. It would use the same concept as generators, >> internally. No big deal, not changing the interpreter, maybe adding a bit. > Tangentially related, there are some patches [1,2] on the tracker > looking to shuffle a few things related to generator state around to > get them out of the frame objects and into the generator objects where > they belong. There are definitely a few things that could do with > cleaning up in this space. > > [1] http://bugs.python.org/issue13897 > [2] http://bugs.python.org/issue13607 Thanks for pointing me at that. 
I think Mark Shannon has quite similar thoughts. I need to talk to him. >> I think this would make Greenlet and even Stackless obsolete in most >> cases which are of real use. > The "take this synchronous code and magically make it scale better" > aspect is still a nice feature of greenlets & gevent. I had a deeper look into gevent and how it uses greenlet and does its monkey-patching. Indeed, cute! My assumption was that I could write a surrogate greenlet using the advanced generators. But I overlooked that for this to work, everything must behave like generators. Not only the surrogate greenlet, but also the code that it wants to switch. Argh... A work-around for gevent would be a rewrite of all supported modules to patch. Not a cake walk. Thanks, you gave me a lot of insight! >> I would like to discuss this and maybe do a prototype. > Sure, I think there's several things we can do better here, and I > think the test suite is comprehensive enough to keep us honest. > > Cheers, > Nick. > Cheers - Chris -- Christian Tismer :^) Software Consulting : Have a break! Take a ride on Python's Karl-Liebknecht-Str. 121 : *Starship* http://starship.python.net/ 14482 Potsdam : PGP key -> http://pgp.uni-mainz.de phone +49 173 24 18 776 fax +49 (30) 700143-0023 PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04 whom do you want to sponsor today? http://www.stackless.com/ From tismer at stackless.com Mon Oct 15 19:25:31 2012 From: tismer at stackless.com (Christian Tismer) Date: Mon, 15 Oct 2012 19:25:31 +0200 Subject: [Python-ideas] yield from multiple iterables (was Re: The async API of the future: yield-from) In-Reply-To: <507C1661.5070206@stackless.com> References: <507B0F0B.8080700@pearwood.info> <507B26C6.10602@stackless.com> <507BA075.4030508@canterbury.ac.nz> <507BC855.4070802@stackless.com> <507BE79D.1090100@stackless.com> <507C1661.5070206@stackless.com> Message-ID: <507C470B.4090409@stackless.com> On 15.10.12 15:57, Christian Tismer wrote: > > Right, CPython still keeps unneccessary crap on the C stack. > But that's not the point right now, because on the other hand, > in the context of a possible yield (from or not), the C stack > is clean, and this enables switching. > And actually in such clean positions, Stackless Python (as opposed to > Greenlets) does soft-switching, which is very similar to what the > generators > are doing - there is no assembly stuff involved at all. I'm sorry about the expression "crap". Please read this as "stuff". I was not aware of the unfriendliness of this word and will be more careful next time. cheers - Chris -- Christian Tismer :^) Software Consulting : Have a break! Take a ride on Python's Karl-Liebknecht-Str. 121 : *Starship* http://starship.python.net/ 14482 Potsdam : PGP key -> http://pgp.uni-mainz.de phone +49 173 24 18 776 fax +49 (30) 700143-0023 PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04 whom do you want to sponsor today? http://www.stackless.com/ From shibturn at gmail.com Mon Oct 15 19:25:16 2012 From: shibturn at gmail.com (Richard Oudkerk) Date: Mon, 15 Oct 2012 18:25:16 +0100 Subject: [Python-ideas] The async API of the future: Reactors In-Reply-To: <20121015175411.4e084fef@pitrou.net> References: <20121012203311.4b3ee8af@pitrou.net> <20121015175411.4e084fef@pitrou.net> Message-ID: On 15/10/2012 4:54pm, Antoine Pitrou wrote: > I suppose the reactor would handle higher-level stuff such as TLS? Yes. The hub would just cover the platform dependent IO stuff. 
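A rough, self-contained sketch of that layering, with the platform-level hub faked by a thread pool and all names invented for the illustration: raw reads come back as futures from the hub, and a reactor-level helper composes them into a higher-level operation, which is the level at which framing, TLS, or protocol callbacks would live:

import os
from concurrent.futures import Future, ThreadPoolExecutor

pool = ThreadPoolExecutor(max_workers=2)

def hub_read(fd, nbytes):
    # platform-dependent layer: just a raw read, returned as a future
    return pool.submit(os.read, fd, nbytes)

def reactor_read_line(fd):
    # higher-level layer: build "read one line" out of raw hub reads
    result = Future()
    buf = bytearray()

    def on_read(fut):
        data = fut.result()
        buf.extend(data)
        if b"\n" in buf or not data:          # full line, or EOF
            result.set_result(bytes(buf))
        else:
            hub_read(fd, 1).add_done_callback(on_read)

    hub_read(fd, 1).add_done_callback(on_read)
    return result

r, w = os.pipe()
os.write(w, b"hello\n")
print(reactor_read_line(r).result())          # b'hello\n'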
-- Richard From dinov at microsoft.com Mon Oct 15 19:24:16 2012 From: dinov at microsoft.com (Dino Viehland) Date: Mon, 15 Oct 2012 17:24:16 +0000 Subject: [Python-ideas] The async API of the future: yield-from In-Reply-To: References: <5078F6B1.2030309@canterbury.ac.nz> <507B4F9D.6040701@canterbury.ac.nz> <507BBD53.5000602@canterbury.ac.nz> Message-ID: <4ee5fb995c854c1699bc4edfde9e1ae6@BY2PR03MB596.namprd03.prod.outlook.com> I'm still catching up to this thread, but we've been investigating Win 8 support for Python and Win 8 has a very asynchronous API design and so we've been interested in much the same space. We've actually come up with an example of the @task decorator (we called it @async) which is built around using yield + the ability to return from generators added in Python 3.3. Our version of this is also based around futures so that an @async API will return a future. The big difference here might be that we always return a future from a call rather than yielding it up the stack. So our API works with just simple yields rather than yield froms. This is what a simple usage of the API looks like: from concurrent.futures import ThreadPoolExecutor from urllib.request import urlopen executor = ThreadPoolExecutor(max_workers=5) def load_url(url): return urlopen(_url).read() @async def get_image_async(url): buffer = yield executor.submit(load_url, url) return Image(buffer) def main(image_uri): img_future = get_image_async(image_uri) # perform other tasks while the image is downloading img = img_future.result() main("http://www.python.org/images/python-logo.gif") This example us just using the existing thread pool to run the actual I/O but this will work with anything that will return a future. So inside of an async method anything which is yielded should be a future. The decorator will then attach a callback which will send the result of the future back into the generator, so the "buffer = " line gets the result of the future. Finally the function completes and the future returned from calling get_image_async will have its value set to Image when the StopIteration exception is raised with the return value. Because we're interested in the GUI side of things here we've also wired this up into Tk so that we can experiment with an existing GUI framework, and I've included the source for the context there. Our thinking here is that different contexts can be created depending upon the framework which you're running in and that the context makes sure the code is running in the right spot, in this case getting back to the GUI thread after an async operation has been completed. The big outstanding item we're still working through is I/O, but we think the contexts help here too. We're still not quite sure how polling I/O will work, but with the contexts if there's a single thread polling for I/O then the context will get us off the I/O thread and let the polling continue. We are currently thinking that there will need to be a polling thread which handles all of the I/Os, and there could potentially be more than one of these if different libraries aren't cooperating on sharing a single thread. 
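A rough, self-contained sketch of the decorator mechanics described above, reconstructed from the prose rather than taken from the pastebin links below; the name async_ and the complete lack of error handling are simplifications made for this illustration:

from concurrent.futures import Future, ThreadPoolExecutor

def async_(fn):
    def wrapper(*args, **kwargs):
        outer = Future()            # what the caller gets back immediately
        gen = fn(*args, **kwargs)   # the decorated generator

        def step(value):
            try:
                inner = gen.send(value)          # run until the next yielded future
            except StopIteration as stop:
                outer.set_result(stop.value)     # 'return x' in a 3.3 generator
                return
            # when the yielded future completes, feed its result back in
            inner.add_done_callback(lambda f: step(f.result()))

        step(None)
        return outer
    return wrapper

executor = ThreadPoolExecutor(max_workers=2)

@async_
def add_async(x, y):
    part = yield executor.submit(lambda: x + 1)  # wait on any future
    return part + y                              # becomes the outer future's result

print(add_async(1, 2).result())                  # 4

A framework-specific context, as described above, would essentially change where step() gets run (for example, posting it back to the GUI thread) instead of calling it directly from the completed future's callback.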
Here's the code plus the demo Tk app (you'll need your own Holmes.txt file for the sample app to run): Contexts.py: http://pastebin.com/ndS53Cd8 Tk context: http://pastebin.com/FuZwc1Ur Tk app: http://pastebin.com/Fm5wMXpN Hardwork.py: http://pastebin.com/nMMytdTG -----Original Message----- From: Python-ideas [mailto:python-ideas-bounces+dinov=microsoft.com at python.org] On Behalf Of Calvin Spealman Sent: Monday, October 15, 2012 7:16 AM To: Nick Coghlan Cc: python-ideas at python.org Subject: Re: [Python-ideas] The async API of the future: yield-from On Mon, Oct 15, 2012 at 9:48 AM, Nick Coghlan wrote: > On Mon, Oct 15, 2012 at 10:31 PM, Calvin Spealman wrote: >> Currently, "with yield expr:" is not valid syntax, surprisingly. > > It's not that surprising, it's the general requirement that yield > expressions must be enclosed in parentheses except when used > standalone or in a simple assignment statement. > > "with (yield expr):" is valid syntax though, so I'm reluctant to > endorse doing anything substantially different if the parentheses are > omitted. Silly oversight on my part, and I agree that the parens shouldn't make the difference in meaning. > I think the combination of "yield from" to delegate control (including > exception handling) completely to a subgenerator and "context manager > + for loop + explicit yield" when an operation needs to yield multiple > times and the exception handling behaviour should be left to the > caller (as in the "as_completed" case) should cover the necessary > behaviours. I'm still -1 on delegating control to subgenerators with yield-from, versus having the scheduler just deal with them directly. I think it is far less flexible. I would still like to see a less confusing "with yield expr:" by simply allowing it without parens, but no special meaning. I think it would be really useful in coroutines. with yield collect() as tasks: yield task1() yield task2() results = yield tasks > Cheers, > Nick. > > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia -- Read my blog! I depend on your acceptance of my opinion! I am interesting! http://techblog.ironfroggy.com/ Follow me if you're into that sort of thing: http://www.twitter.com/ironfroggy _______________________________________________ Python-ideas mailing list Python-ideas at python.org http://mail.python.org/mailman/listinfo/python-ideas From glyph at twistedmatrix.com Mon Oct 15 20:08:41 2012 From: glyph at twistedmatrix.com (Glyph) Date: Mon, 15 Oct 2012 11:08:41 -0700 Subject: [Python-ideas] Expressiveness of coroutines versus Deferred callbacks (or possibly promises, futures) Message-ID: <91C1C15D-6E43-4F60-B65D-F45C6BAAB6F6@twistedmatrix.com> Still working my way through zillions of messages on this thread, trying to find things worth responding to, I found this, from Guido: > [Generators are] more flexible [than Deferreds], since it is easier to catch different exceptions at different points (...) In the past, when I pointed this out to Twisted aficionados, the responses usually were a mix of "sure, if you like that style, we got it covered, Twisted has inlineCallbacks," and "but that only works for the simple cases, for the real stuff you still need Deferreds." But that really sounds to me like Twisted people just liking what they've got and not wanting to change. If you were actually paying attention, we did explain what "the real stuff" is, and why you can't do it with inlineCallbacks. ;-) (Or perhaps I should say, why we prefer to do it with Deferreds explicitly.) 
Managing parallelism is easy with the when-this-then-that idiom of Deferreds, but challenging with the sequential this-then-this-then-this idiom of generators. The examples in the quoted message were all sequential workflows, which are roughly equivalent in both styles. As soon as a for loop gets involved though, yield-based coroutines have a harder time expressing the kind of parallelism that a lot of applications should use, so it's easy to become accidentally sequential (and therefore less responsive) even if you don't need to be. For example, using some hypothetical generator coroutine library, the idiomatic expression of a loop across several request/responses would be something like this: @yield_coroutine def something_async(): values = yield step1() results = set() for value in values: results.add(step3((yield step2(value)))) return_(results) Since it's in a set, the order of 'results' doesn't actually matter; but this code needs to sit and wait for each result to come back in order; it can't perform any processing on the ones that are already ready while it's waiting. You express this with Deferreds: def something_deferred(): return step1().addCallback( lambda values: gatherResults([step2(value).addCallback(step3) for value in values])).addCallback(set) In addition to being a roughly equivalent amount of code (fewer lines, but denser), that will run step2() and step3() on demand, as results are ready from the set of Deferreds from step1. That means that your program will automatically spread out its computation, which makes better use of time as results may be arriving in any order. The problem is that it is difficult to express laziness with generator coroutines: you've already spent the generator-ness on the function on responding to events, so there's no longer any syntactic support for laziness. (There's another problem where sometimes you can determine that work needs to be done as it arrives; that's an even trickier abstraction than Deferreds though and I'm still working on it. I think I've mentioned already in one of my previous posts.) Also, this is not at all a hypothetical or academic example. This pattern comes up all the time in e.g. web-spidering and chat applications. To be fair, you could express this in a generator-coroutine library like this: @yield_coroutine def something_async(): values = yield step1() thunks = [] @yield_coroutine def do_steps(value): return_(step3((yield step2(value)))) for value in values: thunks.append(do_steps(value)) return_(set((yield multi_wait(thunks)))) but that seems bizarre and not very idiomatic; to me, it looks like the confusing aspects of both styles. David Reid also wrote up some examples of how Deferreds can express sequential workflows more nicely as well (also indirectly as a response to Guido!) on his blog, here: . > Which I understand -- I don't want to change either. But I also observe that a lot of people find bare Twisted-with-Deferreds too hard to grok, so they use Tornado instead, or they build a layer on top of either (like Monocle), inlineCallbacks (and the even-earlier deferredGenerator) predates Monocle. That's not to say Monocle has no value; it is a portability layer between Twisted and Tornado that does the same thing inlineCallbacks does but allows you to do it even if you're not using Deferreds, which will surely be useful to some people. I don't want to belabor this point, but it bugs me a little bit that we get so much feedback from the broader Python community along the lines of "Why doesn't Twisted do X? 
I'd use it if it did X, but it's all weird and I don't understand Y that it forces me to do instead, that's why I use Z" when, in fact: Twisted does do X It's done X for years It actually invented X in the first place There are legitimate reasons why we (Twisted core developers) suggest and prefer Y for many cases, but you don't need to do it if you don't want to follow our advice Thing Z that is being cited as doing X actually explicitly mentions Twisted as an inspiration for its implementation of X It's fair, of course, to complain that we haven't explained this very well, and I'll cop to that unless I can immediately respond with a pre-existing URL that explains things :). One other comment that's probably worth responding to: > I suppose on systems that support both networking and GUI events, in my design these would use different I/O objects (created using different platform-specific factories) and the shared reactor API would sort things out based on the type of I/O object passed in to it. In my opinion, it is a mistake to try to harmonize or unify all GUI event systems, unless you are also harmonizing the GUI itself (i.e. writing a totally portable GUI toolkit that does everything). And I think we can all agree that writing a totally portable GUI toolkit is an impossibly huge task that is out of scope for this (or, really, any other) discussion. GUI systems can already dispatch its event to user code just fine - interposing a Python reactor API between the GUI and the event registration adds additional unnecessary work, and may not even be possible in some cases. See, for example, the way that Xcode (formerly Interface Builder) and the Glade interface designer use: the name of the event handler is registered inside a somewhat opaque blob, which is data and not code, and then hooked up automatically at runtime based on reflection. The code itself never calls any event-registration APIs. Also, modeling all GUI interaction as a request/response conversation is limiting and leads to bad UI conventions. Consider: the UI element that most readily corresponds to a request/response is a modal dialog box. Does anyone out there really like applications that consist mainly of popping up dialog after dialog to prompt you for the answers to questions? -g -------------- next part -------------- An HTML attachment was scrubbed... URL: From mikegraham at gmail.com Mon Oct 15 21:12:15 2012 From: mikegraham at gmail.com (Mike Graham) Date: Mon, 15 Oct 2012 15:12:15 -0400 Subject: [Python-ideas] Is there a good reason to use * for multiplication? In-Reply-To: <20121014145738.57948600@bhuda.mired.org> References: <11ad8f01-0383-4b64-b0f0-78bb0e2f9308@googlegroups.com> <5078D4CE.1040201@pearwood.info> <507923C4.8040201@pearwood.info> <20121013102229.259572ad@bhuda.mired.org> <5079FC79.3040506@canterbury.ac.nz> <507A01AB.2060708@mrabarnett.plus.com> <20121014145738.57948600@bhuda.mired.org> Message-ID: On Sun, Oct 14, 2012 at 3:57 PM, Mike Meyer wrote: > On Sun, 14 Oct 2012 07:40:57 +0200 > Yuval Greenfield wrote: > >> On Sun, Oct 14, 2012 at 2:04 AM, MRAB wrote: >> >> > If it's more than one codepoint, we could prefix with the length of the >> > codepoint's name: >> > >> > def __12CIRCLED_PLUS__(x, y): >> > ... >> > >> > >> That's a bit impractical, and why reinvent the wheel? I'd much rather: >> >> def \u2295(x, y): >> .... >> >> So readable I want to read it twice. And that's not legal python today so >> we don't break backwards compatibility! 
> > Yes, but we're defining an operator for instances of the class, so it > needs the 'special' method marking: > > def __\u2295__(self, other): > > Now *that's* pretty! > > References: <5078F6B1.2030309@canterbury.ac.nz> <507B4F9D.6040701@canterbury.ac.nz> <507BBD53.5000602@canterbury.ac.nz> <4ee5fb995c854c1699bc4edfde9e1ae6@BY2PR03MB596.namprd03.prod.outlook.com> Message-ID: Wow, sounds very similar to NDB's approach! Please do check out NDB's tasklets and event loop: http://code.google.com/p/appengine-ndb-experiment/source/browse/ndb/tasklets.py On Mon, Oct 15, 2012 at 10:24 AM, Dino Viehland wrote: > I'm still catching up to this thread, but we've been investigating Win 8 support for Python and Win 8 has a very asynchronous API design and so we've been interested in much the same space. We've actually come up with an example of the @task decorator (we called it @async) which is built around using yield + the ability to return from generators added in Python 3.3. Our version of this is also based around futures so that an @async API will return a future. The big difference here might be that we always return a future from a call rather than yielding it up the stack. So our API works with just simple yields rather than yield froms. This is what a simple usage of the API looks like: > > from concurrent.futures import ThreadPoolExecutor > from urllib.request import urlopen > > executor = ThreadPoolExecutor(max_workers=5) > > def load_url(url): > return urlopen(_url).read() > > @async > def get_image_async(url): > buffer = yield executor.submit(load_url, url) > return Image(buffer) > > def main(image_uri): > img_future = get_image_async(image_uri) > # perform other tasks while the image is downloading > img = img_future.result() > > main("http://www.python.org/images/python-logo.gif") > > This example us just using the existing thread pool to run the actual I/O but this will work with anything that will return a future. So inside of an async method anything which is yielded should be a future. The decorator will then attach a callback which will send the result of the future back into the generator, so the "buffer = " line gets the result of the future. Finally the function completes and the future returned from calling get_image_async will have its value set to Image when the StopIteration exception is raised with the return value. > > Because we're interested in the GUI side of things here we've also wired this up into Tk so that we can experiment with an existing GUI framework, and I've included the source for the context there. Our thinking here is that different contexts can be created depending upon the framework which you're running in and that the context makes sure the code is running in the right spot, in this case getting back to the GUI thread after an async operation has been completed. > > The big outstanding item we're still working through is I/O, but we think the contexts help here too. We're still not quite sure how polling I/O will work, but with the contexts if there's a single thread polling for I/O then the context will get us off the I/O thread and let the polling continue. We are currently thinking that there will need to be a polling thread which handles all of the I/Os, and there could potentially be more than one of these if different libraries aren't cooperating on sharing a single thread. 
> > Here's the code plus the demo Tk app (you'll need your own Holmes.txt file for the sample app to run): > > Contexts.py: http://pastebin.com/ndS53Cd8 > Tk context: http://pastebin.com/FuZwc1Ur > Tk app: http://pastebin.com/Fm5wMXpN > Hardwork.py: http://pastebin.com/nMMytdTG > > > > > -----Original Message----- > From: Python-ideas [mailto:python-ideas-bounces+dinov=microsoft.com at python.org] On Behalf Of Calvin Spealman > Sent: Monday, October 15, 2012 7:16 AM > To: Nick Coghlan > Cc: python-ideas at python.org > Subject: Re: [Python-ideas] The async API of the future: yield-from > > On Mon, Oct 15, 2012 at 9:48 AM, Nick Coghlan wrote: >> On Mon, Oct 15, 2012 at 10:31 PM, Calvin Spealman wrote: >>> Currently, "with yield expr:" is not valid syntax, surprisingly. >> >> It's not that surprising, it's the general requirement that yield >> expressions must be enclosed in parentheses except when used >> standalone or in a simple assignment statement. >> >> "with (yield expr):" is valid syntax though, so I'm reluctant to >> endorse doing anything substantially different if the parentheses are >> omitted. > > Silly oversight on my part, and I agree that the parens shouldn't make the difference in meaning. > >> I think the combination of "yield from" to delegate control (including >> exception handling) completely to a subgenerator and "context manager >> + for loop + explicit yield" when an operation needs to yield multiple >> times and the exception handling behaviour should be left to the >> caller (as in the "as_completed" case) should cover the necessary >> behaviours. > > I'm still -1 on delegating control to subgenerators with yield-from, versus having the scheduler just deal with them directly. I think it is far less flexible. > > I would still like to see a less confusing "with yield expr:" by simply allowing it without parens, but no special meaning. I think it would be really useful in coroutines. > > with yield collect() as tasks: > yield task1() > yield task2() > results = yield tasks > >> Cheers, >> Nick. >> >> -- >> Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > > > > -- > Read my blog! I depend on your acceptance of my opinion! I am interesting! > http://techblog.ironfroggy.com/ > Follow me if you're into that sort of thing: http://www.twitter.com/ironfroggy _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > > > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas -- --Guido van Rossum (python.org/~guido) From jstpierre at mecheye.net Mon Oct 15 21:37:32 2012 From: jstpierre at mecheye.net (Jasper St. Pierre) Date: Mon, 15 Oct 2012 15:37:32 -0400 Subject: [Python-ideas] Is there a good reason to use * for multiplication? In-Reply-To: References: <11ad8f01-0383-4b64-b0f0-78bb0e2f9308@googlegroups.com> <5078D4CE.1040201@pearwood.info> <507923C4.8040201@pearwood.info> <20121013102229.259572ad@bhuda.mired.org> <5079FC79.3040506@canterbury.ac.nz> <507A01AB.2060708@mrabarnett.plus.com> <20121014145738.57948600@bhuda.mired.org> Message-ID: On Mon, Oct 15, 2012 at 3:12 PM, Mike Graham wrote: >> def __\u2295__(self, other): >> >> Now *that's* pretty! >> >> > > IMO it's essential that we add source code escapes. Imagine the > one-liners this will allow! 
> > def f(xs):\n\ttry:\n\t\treturn x.pop()\n\texcept ValueError\n\t\treturn None > > Can we get this fix applied in Python 2.2 and up? Yeah, this is how Java works, and it's one of the best features of the language, because any valid program can be expressed using ASCII only. Of course, it means that there are going to be some edge cases. Like, now: print "\n" will be an invalid program, since the newline escape will be translated before the source is tokenized. But who does that? It's just a small price to pay for the big wins of having any program expressed in simple ASCII. > Mike > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas -- Jasper From jsbueno at python.org.br Mon Oct 15 22:00:27 2012 From: jsbueno at python.org.br (Joao S. O. Bueno) Date: Mon, 15 Oct 2012 17:00:27 -0300 Subject: [Python-ideas] Is there a good reason to use * for multiplication? In-Reply-To: References: <11ad8f01-0383-4b64-b0f0-78bb0e2f9308@googlegroups.com> <5078D4CE.1040201@pearwood.info> <507923C4.8040201@pearwood.info> <20121013102229.259572ad@bhuda.mired.org> <5079FC79.3040506@canterbury.ac.nz> <507A01AB.2060708@mrabarnett.plus.com> <20121014145738.57948600@bhuda.mired.org> Message-ID: On 15 October 2012 16:12, Mike Graham wrote: > On Sun, Oct 14, 2012 at 3:57 PM, Mike Meyer wrote: >> On Sun, 14 Oct 2012 07:40:57 +0200 >> Yuval Greenfield wrote: >> >>> On Sun, Oct 14, 2012 at 2:04 AM, MRAB wrote: >>> >>> > If it's more than one codepoint, we could prefix with the length of the >>> > codepoint's name: >>> > >>> > def __12CIRCLED_PLUS__(x, y): >>> > ... >>> > >>> > >>> That's a bit impractical, and why reinvent the wheel? I'd much rather: >>> >>> def \u2295(x, y): >>> .... >>> >>> So readable I want to read it twice. And that's not legal python today so >>> we don't break backwards compatibility! >> >> Yes, but we're defining an operator for instances of the class, so it >> needs the 'special' method marking: >> >> def __\u2295__(self, other): >> >> Now *that's* pretty! >> >> > > IMO it's essential that we add source code escapes. Imagine the > one-liners this will allow! > > def f(xs):\n\ttry:\n\t\treturn x.pop()\n\texcept ValueError\n\t\treturn None > > Can we get this fix applied in Python 2.2 and up? > " The time machine strikes again!" What you want is _valid_ in Python, likely since 2.2 - You will need at least two lines in the file: # coding:unicode_escape\n def a():\n\tprint "Helo World"\n\na() > Mike > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas From python at mrabarnett.plus.com Mon Oct 15 22:14:17 2012 From: python at mrabarnett.plus.com (MRAB) Date: Mon, 15 Oct 2012 21:14:17 +0100 Subject: [Python-ideas] Is there a good reason to use * for multiplication? In-Reply-To: References: <11ad8f01-0383-4b64-b0f0-78bb0e2f9308@googlegroups.com> <5078D4CE.1040201@pearwood.info> <507923C4.8040201@pearwood.info> <20121013102229.259572ad@bhuda.mired.org> <5079FC79.3040506@canterbury.ac.nz> <507A01AB.2060708@mrabarnett.plus.com> <20121014145738.57948600@bhuda.mired.org> Message-ID: <507C6E99.8050208@mrabarnett.plus.com> On 2012-10-15 20:37, Jasper St. Pierre wrote: > On Mon, Oct 15, 2012 at 3:12 PM, Mike Graham wrote: >>> def __\u2295__(self, other): >>> >>> Now *that's* pretty! >>> >>> > >> >> IMO it's essential that we add source code escapes. Imagine the >> one-liners this will allow! 
>> >> def f(xs):\n\ttry:\n\t\treturn x.pop()\n\texcept ValueError\n\t\treturn None >> >> Can we get this fix applied in Python 2.2 and up? > > Yeah, this is how Java works, and it's one of the best features of the > language, because any valid program can be expressed using ASCII only. > > Of course, it means that there are going to be some edge cases. Like, now: > > print "\n" > > will be an invalid program, since the newline escape will be > translated before the source is tokenized. But who does that? It's > just a small price to pay for the big wins of having any program > expressed in simple ASCII. > Simple: print "\\n" From greg.ewing at canterbury.ac.nz Mon Oct 15 22:14:47 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 16 Oct 2012 09:14:47 +1300 Subject: [Python-ideas] The async API of the future: yield-from In-Reply-To: References: <5078F6B1.2030309@canterbury.ac.nz> <507B4F9D.6040701@canterbury.ac.nz> Message-ID: <507C6EB7.3090601@canterbury.ac.nz> Nick Coghlan wrote: > The main primitive I personally want out of an async API is a > task-based equivalent to concurrent.futures.as_completed() [1]. This > is what I meant about iteration being a bit of a mess: the way the > as_completed() works, the suspend/resume channel of the iterator > protocol is being used to pass completed future objects back to the > calling iterator. That means that channel *can't* be used to talk > between the coroutine and the scheduler, I had to read this a couple of times before I figured out what you're talking about, but I get it now. This is an instance of a general problem that was noticed back when I was discussing my cofunctions idea: using generator-based coroutines, it's not possible to have a "suspendable iterator", because that would require "yield" to have two conflicting meanings: "suspend this coroutine" on one hand, and "provide a value to my caller" on the other. Unfortunately, I suspect that a truly elegant solution to this problem will require yet another language addition -- something like yield for item in subtask(): ... which would run a slightly different version of the iterator protocol in which values to be yield are wrapped somehow (I haven't figured out all the details yet). -- Greg From greg.ewing at canterbury.ac.nz Mon Oct 15 22:19:58 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 16 Oct 2012 09:19:58 +1300 Subject: [Python-ideas] yield from multiple iterables (was Re: The async API of the future: yield-from) In-Reply-To: <507BC855.4070802@stackless.com> References: <507B0F0B.8080700@pearwood.info> <507B26C6.10602@stackless.com> <507BA075.4030508@canterbury.ac.nz> <507BC855.4070802@stackless.com> Message-ID: <507C6FEE.7030506@canterbury.ac.nz> Christian Tismer wrote: > Question: Is it already given that something like greenlets is out > of consideration? Greenlets will always be available to those who want and are able to use them. But there's a desire to have something in the standard library that is completely portable and doesn't rely on any platform dependent techniques or tricks. That's what we're talking about here. -- Greg From jimjjewett at gmail.com Mon Oct 15 22:21:50 2012 From: jimjjewett at gmail.com (Jim Jewett) Date: Mon, 15 Oct 2012 16:21:50 -0400 Subject: [Python-ideas] filename comparison [was] Re: PEP 428 - object-oriented filesystem paths Message-ID: On 10/8/12, Greg Ewing wrote: > Ronald Oussoren wrote: >> neither statvs, statvfs, nor pathconf seem to be able to tell if a >> filesystem is case insensitive. 
> Even if they could, you wouldn't be entirely out of the woods, > because different parts of the same path can be on different > file systems... > But how important is all this anyway? I'm trying to think of > occasions when I've wanted to compare two entire paths for > equality, and I can't think of *any*. I can think of several, but when I thought a bit harder, they were mostly bug attractors. If I want my program (or a dict) to know that "CONFIG" and "config" are the same, then I also want it to know that "My Documents" is the same as "MYDOCU~1".* Ideally, I would also have a way to find out that a pathname is likely to be problematic for cross-platform uses, or at least whether two specific pathnames are known to be collision-prone on existing platforms other than mine. (But I'm not sure that sort of test can be reliable enough for the stdlib. Would just check for caseless equality, reserved Windows names, and non-alphanumeric characters in the filename?) *(Well, assuming it is. The short name depends on the history of the directory.) -jJ From phd at phdru.name Mon Oct 15 21:23:29 2012 From: phd at phdru.name (Oleg Broytman) Date: Mon, 15 Oct 2012 23:23:29 +0400 Subject: [Python-ideas] Is there a good reason to use * for multiplication? In-Reply-To: References: <5078D4CE.1040201@pearwood.info> <507923C4.8040201@pearwood.info> <20121013102229.259572ad@bhuda.mired.org> <5079FC79.3040506@canterbury.ac.nz> <507A01AB.2060708@mrabarnett.plus.com> <20121014145738.57948600@bhuda.mired.org> Message-ID: <20121015192329.GA11054@iskra.aviel.ru> On Mon, Oct 15, 2012 at 03:12:15PM -0400, Mike Graham wrote: > IMO it's essential that we add source code escapes. Imagine the > one-liners this will allow! > > def f(xs):\n\ttry:\n\t\treturn x.pop()\n\texcept ValueError\n\t\treturn None SyntaxError: a semicolon required after 'except ValueError'. Oleg. -- Oleg Broytman http://phdru.name/ phd at phdru.name Programmers don't die, they just GOSUB without RETURN. From dinov at microsoft.com Mon Oct 15 23:45:13 2012 From: dinov at microsoft.com (Dino Viehland) Date: Mon, 15 Oct 2012 21:45:13 +0000 Subject: [Python-ideas] The async API of the future: yield-from In-Reply-To: References: <5078F6B1.2030309@canterbury.ac.nz> <507B4F9D.6040701@canterbury.ac.nz> <507BBD53.5000602@canterbury.ac.nz> <4ee5fb995c854c1699bc4edfde9e1ae6@BY2PR03MB596.namprd03.prod.outlook.com> Message-ID: They look remarkably similar. The biggest difference I see is that NDB appears to be using an event loop to keep the futures running while we're using add_done_callback (on the yielded futures) to continue stepping the generator function along. So there's not necessary an event loop in our case, and in fact the default context always just executes things synchronously. But frameworks can replace the default context so that work is posted into an event loop of some form. -----Original Message----- From: gvanrossum at gmail.com [mailto:gvanrossum at gmail.com] On Behalf Of Guido van Rossum Sent: Monday, October 15, 2012 12:34 PM To: Dino Viehland Cc: ironfroggy at gmail.com; Nick Coghlan; python-ideas at python.org Subject: Re: [Python-ideas] The async API of the future: yield-from Wow, sounds very similar to NDB's approach! 
Please do check out NDB's tasklets and event loop: http://code.google.com/p/appengine-ndb-experiment/source/browse/ndb/tasklets.py On Mon, Oct 15, 2012 at 10:24 AM, Dino Viehland wrote: > I'm still catching up to this thread, but we've been investigating Win 8 support for Python and Win 8 has a very asynchronous API design and so we've been interested in much the same space. We've actually come up with an example of the @task decorator (we called it @async) which is built around using yield + the ability to return from generators added in Python 3.3. Our version of this is also based around futures so that an @async API will return a future. The big difference here might be that we always return a future from a call rather than yielding it up the stack. So our API works with just simple yields rather than yield froms. This is what a simple usage of the API looks like: > > from concurrent.futures import ThreadPoolExecutor > from urllib.request import urlopen > > executor = ThreadPoolExecutor(max_workers=5) > > def load_url(url): > return urlopen(_url).read() > > @async > def get_image_async(url): > buffer = yield executor.submit(load_url, url) > return Image(buffer) > > def main(image_uri): > img_future = get_image_async(image_uri) > # perform other tasks while the image is downloading > img = img_future.result() > > main("http://www.python.org/images/python-logo.gif") > > This example us just using the existing thread pool to run the actual I/O but this will work with anything that will return a future. So inside of an async method anything which is yielded should be a future. The decorator will then attach a callback which will send the result of the future back into the generator, so the "buffer = " line gets the result of the future. Finally the function completes and the future returned from calling get_image_async will have its value set to Image when the StopIteration exception is raised with the return value. > > Because we're interested in the GUI side of things here we've also wired this up into Tk so that we can experiment with an existing GUI framework, and I've included the source for the context there. Our thinking here is that different contexts can be created depending upon the framework which you're running in and that the context makes sure the code is running in the right spot, in this case getting back to the GUI thread after an async operation has been completed. > > The big outstanding item we're still working through is I/O, but we think the contexts help here too. We're still not quite sure how polling I/O will work, but with the contexts if there's a single thread polling for I/O then the context will get us off the I/O thread and let the polling continue. We are currently thinking that there will need to be a polling thread which handles all of the I/Os, and there could potentially be more than one of these if different libraries aren't cooperating on sharing a single thread. 
> > Here's the code plus the demo Tk app (you'll need your own Holmes.txt file for the sample app to run): > > Contexts.py: http://pastebin.com/ndS53Cd8 Tk context: > http://pastebin.com/FuZwc1Ur Tk app: http://pastebin.com/Fm5wMXpN > Hardwork.py: http://pastebin.com/nMMytdTG > > > > > -----Original Message----- > From: Python-ideas > [mailto:python-ideas-bounces+dinov=microsoft.com at python.org] On Behalf > Of Calvin Spealman > Sent: Monday, October 15, 2012 7:16 AM > To: Nick Coghlan > Cc: python-ideas at python.org > Subject: Re: [Python-ideas] The async API of the future: yield-from > > On Mon, Oct 15, 2012 at 9:48 AM, Nick Coghlan wrote: >> On Mon, Oct 15, 2012 at 10:31 PM, Calvin Spealman wrote: >>> Currently, "with yield expr:" is not valid syntax, surprisingly. >> >> It's not that surprising, it's the general requirement that yield >> expressions must be enclosed in parentheses except when used >> standalone or in a simple assignment statement. >> >> "with (yield expr):" is valid syntax though, so I'm reluctant to >> endorse doing anything substantially different if the parentheses are >> omitted. > > Silly oversight on my part, and I agree that the parens shouldn't make the difference in meaning. > >> I think the combination of "yield from" to delegate control >> (including exception handling) completely to a subgenerator and >> "context manager >> + for loop + explicit yield" when an operation needs to yield >> + multiple >> times and the exception handling behaviour should be left to the >> caller (as in the "as_completed" case) should cover the necessary >> behaviours. > > I'm still -1 on delegating control to subgenerators with yield-from, versus having the scheduler just deal with them directly. I think it is far less flexible. > > I would still like to see a less confusing "with yield expr:" by simply allowing it without parens, but no special meaning. I think it would be really useful in coroutines. > > with yield collect() as tasks: > yield task1() > yield task2() > results = yield tasks > >> Cheers, >> Nick. >> >> -- >> Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > > > > -- > Read my blog! I depend on your acceptance of my opinion! I am interesting! > http://techblog.ironfroggy.com/ > Follow me if you're into that sort of thing: > http://www.twitter.com/ironfroggy > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > > > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas -- --Guido van Rossum (python.org/~guido) From mikegraham at gmail.com Tue Oct 16 00:06:47 2012 From: mikegraham at gmail.com (Mike Graham) Date: Mon, 15 Oct 2012 18:06:47 -0400 Subject: [Python-ideas] Is there a good reason to use * for multiplication? In-Reply-To: <20121015192329.GA11054@iskra.aviel.ru> References: <5078D4CE.1040201@pearwood.info> <507923C4.8040201@pearwood.info> <20121013102229.259572ad@bhuda.mired.org> <5079FC79.3040506@canterbury.ac.nz> <507A01AB.2060708@mrabarnett.plus.com> <20121014145738.57948600@bhuda.mired.org> <20121015192329.GA11054@iskra.aviel.ru> Message-ID: On Mon, Oct 15, 2012 at 3:23 PM, Oleg Broytman wrote: > On Mon, Oct 15, 2012 at 03:12:15PM -0400, Mike Graham wrote: >> IMO it's essential that we add source code escapes. Imagine the >> one-liners this will allow! 
>> >> def f(xs):\n\ttry:\n\t\treturn x.pop()\n\texcept ValueError\n\t\treturn None > > SyntaxError: a semicolon required after 'except ValueError'. > > Oleg. Obviously we'd make those pesky semicolons optional in the process. Mike From anacrolix at gmail.com Tue Oct 16 01:37:58 2012 From: anacrolix at gmail.com (Matt Joiner) Date: Tue, 16 Oct 2012 10:37:58 +1100 Subject: [Python-ideas] The async API of the future: yield-from In-Reply-To: References: <5078F6B1.2030309@canterbury.ac.nz> <507B4F9D.6040701@canterbury.ac.nz> <507BBD53.5000602@canterbury.ac.nz> <4ee5fb995c854c1699bc4edfde9e1ae6@BY2PR03MB596.namprd03.prod.outlook.com> Message-ID: I gave something like this a go a while ago: https://bitbucket.org/anacrolix/green380 "Coroutines" yield events or futures as Nick put them from the top, and the scheduler at the bottom manages events and scheduling. There are a few things I took away from this attempt: 1) Explicit yield a la PEP380 requires syntactical changes *everywhere*. 2) Python's dynamic typing means that neglecting to "yield from" gives you broken code, and Python won't help you here. Add to this that you now have a 380, and "normal synchronous" form of most interfaces and the caller must know what kind is used at all times. 3) Concurrency is nice, but it requires language-level support, and proper parallelism to really shine. The "C way" of doing things is already so heavily ingrained in Python, an entirely new standard library and interpreter that breaks C compatibility is really the only way to proceed, and this certainly isn't worth it just to write code with "yield from" littered on every line. -------------- next part -------------- An HTML attachment was scrubbed... URL: From greg.ewing at canterbury.ac.nz Tue Oct 16 02:13:58 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 16 Oct 2012 13:13:58 +1300 Subject: [Python-ideas] The async API of the future: yield-from In-Reply-To: References: <5078F6B1.2030309@canterbury.ac.nz> <507B4F9D.6040701@canterbury.ac.nz> <507BBD53.5000602@canterbury.ac.nz> Message-ID: <507CA6C6.2010003@canterbury.ac.nz> Nick Coghlan wrote: > To me, "yield from" is just a tool that brings generators back to > parity with functions when it comes to breaking up a larger algorithm > into smaller pieces. Where you would break a function out into > subfunctions and call them normally, with a generator you can break > out subgenerators and invoke them with yield from. That's exactly correct. It's the way I intended "yield from" to be thought of right from the beginning. What I'm arguing is that the *public api* for any suspendable operation should be in the form of something called using yield-from, because you never know when the implementation might want to break it down into sub-operations and use yield-from to call *them*. > Any meaningful use of "yield from" in the coroutine context *has* to > ultimate devolve to an operation that: > 1. Asks the scheduler to schedule another operation > 2. Waits for that operation to complete I don't think I would put it quite that way. In my view of things at least, the scheduler doesn't schedule "operations" (in the sense of "read some bytes from this socket" etc.) Rather, it schedules the running of tasks. So the breakdown is really: 1. Start an operation (this doesn't involve the scheduler) 2. 
Ask the scheduler to suspend this task until the operation is finished Also, this breakdown is only necessary at the very lowest level, where you want to do something that isn't provided in the form of a generator. Obviously it's *possible* to treat each level of the call chain as its own subtask, that you spawn independently and then wait for it to finish. That's what people have done in the past with their trampoline schedulers that interpret yielded "call" and "return" instructions. But one of the purposes of yield-from is to relieve the scheduler of the need to handle things at that level of granularity. It can treat a generator together with all the subgenerators it might call as a *single* task, the same way that a greenlet is thought of as a single task, however many levels of function calls it might make. > I *thought* Greg's way combined step 1 and step 2 into a single > operation: the objects you yield *not only* say what you want to wait > for, but also what you want to do. I don't actually yield objects at all, but... > However, his example par() > implementation killed that idea, since it turned out to need to > schedule tasks explicitly rather than their being a "execute this in > parallel" option. I don't see how that's a problem. Seems to me it's just as easy for the user to call a par() function as it is to yield a tuple of tasks. And providing this functionality using a function means that different versions or options can be made available for variations such as different ways of handling exceptions. Using yield, you need to pick one of the variations and bless it as being the one that you invoke using special syntax. If you're complaining that the implementation of par() seems too complex, well, that complexity has to occur *somewhere* -- if it's not in the par() function, then it will turn up inside whatever part of the scheduler handles the case that it's given a tuple of tasks. > So now I'm back to think that Greg and Guido are talking about > different levels. *Any* scheduling option will be able to be collapsed > into an async task invoked by "yield from" by writing: > > def simple_async_task(): > return yield start_task() Yes... or another implementation that works some way other than yielding instructions to the scheduler. > I haven't seen anything to suggest that > "yield from"'s role should change from what it is in 3.3: a way to > factor out generators into multiple pieces with out breaking send() > and throw(). I don't think anyone is suggesting that. I'm certainly not. -- Greg From steve at pearwood.info Tue Oct 16 02:17:52 2012 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 16 Oct 2012 11:17:52 +1100 Subject: [Python-ideas] Is there a good reason to use * for multiplication? In-Reply-To: References: <11ad8f01-0383-4b64-b0f0-78bb0e2f9308@googlegroups.com> <5078D4CE.1040201@pearwood.info> <507923C4.8040201@pearwood.info> <20121013102229.259572ad@bhuda.mired.org> <5079FC79.3040506@canterbury.ac.nz> <507A01AB.2060708@mrabarnett.plus.com> <20121014145738.57948600@bhuda.mired.org> Message-ID: <507CA7B0.6080200@pearwood.info> Deliberately not naming names, 'cos this isn't intended as a personal attack on anyone... Some people suggested as syntax: >>>> def __12CIRCLED_PLUS__(x, y): >>>> ... >>> def \u2295(x, y): >>> .... >> def __\u2295__(self, other): > IMO it's essential that we add source code escapes. Imagine the > one-liners this will allow! 
> > def f(xs):\n\ttry:\n\t\treturn x.pop()\n\texcept ValueError\n\t\treturn None > > Can we get this fix applied in Python 2.2 and up? As much as I've been wetting yourselves from all the hilarity, I'm afraid that I have to ask you all to stop. Competing to see who can come up with the worst possible joke syntax gets *real old* fast. Sorry to be a wet blanket spoiling the fun, but this list does have a serious purpose, and it seems to me that sarcastically[1] inventing deliberately awful syntax is off-topic. Or at least off-putting. Now I enjoy reading the occasional piece of obfuscated code or syntax as much as the next guy, but there are limits, and I think this thread passed them about a dozen posts back. Believe it or not, there are good, reasonable reasons for wanting more operators, and at least one serious PEP driven by real-world needs: http://www.python.org/dev/peps/pep-0225/ So can we please drop this thread unless you have a serious suggestion that doesn't need to wait until Python 4? [1] By all the gods, PLEASE don't tell me these proposals are meant seriously! -- Steven From greg.ewing at canterbury.ac.nz Tue Oct 16 02:24:15 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 16 Oct 2012 13:24:15 +1300 Subject: [Python-ideas] The async API of the future: yield-from In-Reply-To: References: <5078F6B1.2030309@canterbury.ac.nz> <507B4F9D.6040701@canterbury.ac.nz> <507BBD53.5000602@canterbury.ac.nz> Message-ID: <507CA92F.4000301@canterbury.ac.nz> I've just had another thought I'd like to mention concerning the way we think about subtasks. There's actually a subtle difference between invoking a subgenerator using yield-from on the one hand, and spawning it as a separate task and then waiting for it on the other. When you call a subgenerator using yield-from, a switch to another task can't occur until that subgenerator or something it calls reaches a yield. But (at least the way my scheduler currently works), if you spawn it as a separate task and then block waiting for it to complete, other tasks can run immediately, before the subtask has even started. If you're relying on knowing where the yields can occur, this difference could be important. So I think the distinction between calling and spawning subtasks needs to be maintained. This means that spawning must be something you request explicitly in some way. -- Greg From greg.ewing at canterbury.ac.nz Tue Oct 16 02:44:22 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 16 Oct 2012 13:44:22 +1300 Subject: [Python-ideas] yield from multiple iterables (was Re: The async API of the future: yield-from) In-Reply-To: <507C1661.5070206@stackless.com> References: <507B0F0B.8080700@pearwood.info> <507B26C6.10602@stackless.com> <507BA075.4030508@canterbury.ac.nz> <507BC855.4070802@stackless.com> <507BE79D.1090100@stackless.com> <507C1661.5070206@stackless.com> Message-ID: <507CADE6.7050604@canterbury.ac.nz> Christian Tismer wrote: > Right, CPython still keeps unneccessary crap on the C stack. It's not just Python leaving stuff on the stack that's a problem, it's external C code that calls back into Python. > But that's not the point right now, because on the other hand, > in the context of a possible yield (from or not), the C stack > is clean, and this enables switching. > And actually in such clean positions, Stackless Python (as opposed to > Greenlets) does soft-switching, which is very similar to what the > generators > are doing - there is no assembly stuff involved at all. 
But the assembly code still needs to be there to handle the cases where you *can't* do soft switching. It's the presence of the code that's the issue here, not how frequently it gets called. > I have begun studying the code for YIELD_FROM. As it is written, every > next iteration elevates the chain of generators once up and down. > Maybe that can be avoided by changing the frame chain, so this can become > a cheaper O(1) operation. My original implementation of yield-from actually *did* avoid this, by keeping a C-level pointer chain of yielding-from frames. But that part was ripped out at the last minute when someone discovered that it had a detrimental effect on tracebacks. There are probably other ways the traceback problem could be fixed, so maybe we will get this optimisation back one day. -- Greg From greg.ewing at canterbury.ac.nz Tue Oct 16 02:55:20 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 16 Oct 2012 13:55:20 +1300 Subject: [Python-ideas] The async API of the future: yield-from In-Reply-To: References: <5078F6B1.2030309@canterbury.ac.nz> <507B4F9D.6040701@canterbury.ac.nz> <507BBD53.5000602@canterbury.ac.nz> Message-ID: <507CB078.8040506@canterbury.ac.nz> Calvin Spealman wrote: > I'm still -1 on delegating control to subgenerators with yield-from, > versus having the scheduler just deal with them directly. Do you mean to *disallow* using yield-from for this, or just not to encourage it? I don't see how you *could* disallow it; there's no way for the scheduler to know whether one of the generators it's handling is delegating using yield-from. I also can't see any reason you would want to discourage it. Given that yield-from exists, it's an obvious thing to do. -- Greg From guido at python.org Tue Oct 16 03:17:48 2012 From: guido at python.org (Guido van Rossum) Date: Mon, 15 Oct 2012 18:17:48 -0700 Subject: [Python-ideas] The async API of the future: yield-from In-Reply-To: References: <5078F6B1.2030309@canterbury.ac.nz> <507B4F9D.6040701@canterbury.ac.nz> <507BBD53.5000602@canterbury.ac.nz> <4ee5fb995c854c1699bc4edfde9e1ae6@BY2PR03MB596.namprd03.prod.outlook.com> Message-ID: On Mon, Oct 15, 2012 at 2:45 PM, Dino Viehland wrote: > They look remarkably similar. The biggest difference I see is that NDB appears to be using an event loop to keep the futures running while we're using add_done_callback (on the yielded futures) to continue stepping the generator function along. So there's not necessary an event loop in our case, and in fact the default context always just executes things synchronously. But frameworks can replace the default context so that work is posted into an event loop of some form. But do your Futures use threads? NDB doesn't. NDB's event loop doesn't know about Futures; however the @ndb.tasklet decorator does, and the Futures know about the event loop. When you wait for a Future, a callback is added to the Future that will resume the generator when it is done, and in order to run them, the Future passes its callbacks to the event loop to be run. --Guido > -----Original Message----- > From: gvanrossum at gmail.com [mailto:gvanrossum at gmail.com] On Behalf Of Guido van Rossum > Sent: Monday, October 15, 2012 12:34 PM > To: Dino Viehland > Cc: ironfroggy at gmail.com; Nick Coghlan; python-ideas at python.org > Subject: Re: [Python-ideas] The async API of the future: yield-from > > Wow, sounds very similar to NDB's approach! 
Please do check out NDB's tasklets and event loop: > http://code.google.com/p/appengine-ndb-experiment/source/browse/ndb/tasklets.py > > On Mon, Oct 15, 2012 at 10:24 AM, Dino Viehland wrote: >> I'm still catching up to this thread, but we've been investigating Win 8 support for Python and Win 8 has a very asynchronous API design and so we've been interested in much the same space. We've actually come up with an example of the @task decorator (we called it @async) which is built around using yield + the ability to return from generators added in Python 3.3. Our version of this is also based around futures so that an @async API will return a future. The big difference here might be that we always return a future from a call rather than yielding it up the stack. So our API works with just simple yields rather than yield froms. This is what a simple usage of the API looks like: >> >> from concurrent.futures import ThreadPoolExecutor >> from urllib.request import urlopen >> >> executor = ThreadPoolExecutor(max_workers=5) >> >> def load_url(url): >> return urlopen(_url).read() >> >> @async >> def get_image_async(url): >> buffer = yield executor.submit(load_url, url) >> return Image(buffer) >> >> def main(image_uri): >> img_future = get_image_async(image_uri) >> # perform other tasks while the image is downloading >> img = img_future.result() >> >> main("http://www.python.org/images/python-logo.gif") >> >> This example us just using the existing thread pool to run the actual I/O but this will work with anything that will return a future. So inside of an async method anything which is yielded should be a future. The decorator will then attach a callback which will send the result of the future back into the generator, so the "buffer = " line gets the result of the future. Finally the function completes and the future returned from calling get_image_async will have its value set to Image when the StopIteration exception is raised with the return value. >> >> Because we're interested in the GUI side of things here we've also wired this up into Tk so that we can experiment with an existing GUI framework, and I've included the source for the context there. Our thinking here is that different contexts can be created depending upon the framework which you're running in and that the context makes sure the code is running in the right spot, in this case getting back to the GUI thread after an async operation has been completed. >> >> The big outstanding item we're still working through is I/O, but we think the contexts help here too. We're still not quite sure how polling I/O will work, but with the contexts if there's a single thread polling for I/O then the context will get us off the I/O thread and let the polling continue. We are currently thinking that there will need to be a polling thread which handles all of the I/Os, and there could potentially be more than one of these if different libraries aren't cooperating on sharing a single thread. 
>> >> Here's the code plus the demo Tk app (you'll need your own Holmes.txt file for the sample app to run): >> >> Contexts.py: http://pastebin.com/ndS53Cd8 Tk context: >> http://pastebin.com/FuZwc1Ur Tk app: http://pastebin.com/Fm5wMXpN >> Hardwork.py: http://pastebin.com/nMMytdTG >> >> >> >> >> -----Original Message----- >> From: Python-ideas >> [mailto:python-ideas-bounces+dinov=microsoft.com at python.org] On Behalf >> Of Calvin Spealman >> Sent: Monday, October 15, 2012 7:16 AM >> To: Nick Coghlan >> Cc: python-ideas at python.org >> Subject: Re: [Python-ideas] The async API of the future: yield-from >> >> On Mon, Oct 15, 2012 at 9:48 AM, Nick Coghlan wrote: >>> On Mon, Oct 15, 2012 at 10:31 PM, Calvin Spealman wrote: >>>> Currently, "with yield expr:" is not valid syntax, surprisingly. >>> >>> It's not that surprising, it's the general requirement that yield >>> expressions must be enclosed in parentheses except when used >>> standalone or in a simple assignment statement. >>> >>> "with (yield expr):" is valid syntax though, so I'm reluctant to >>> endorse doing anything substantially different if the parentheses are >>> omitted. >> >> Silly oversight on my part, and I agree that the parens shouldn't make the difference in meaning. >> >>> I think the combination of "yield from" to delegate control >>> (including exception handling) completely to a subgenerator and >>> "context manager >>> + for loop + explicit yield" when an operation needs to yield >>> + multiple >>> times and the exception handling behaviour should be left to the >>> caller (as in the "as_completed" case) should cover the necessary >>> behaviours. >> >> I'm still -1 on delegating control to subgenerators with yield-from, versus having the scheduler just deal with them directly. I think it is far less flexible. >> >> I would still like to see a less confusing "with yield expr:" by simply allowing it without parens, but no special meaning. I think it would be really useful in coroutines. >> >> with yield collect() as tasks: >> yield task1() >> yield task2() >> results = yield tasks >> >>> Cheers, >>> Nick. >>> >>> -- >>> Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia >> >> >> >> -- >> Read my blog! I depend on your acceptance of my opinion! I am interesting! 
>> http://techblog.ironfroggy.com/ >> Follow me if you're into that sort of thing: >> http://www.twitter.com/ironfroggy >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> http://mail.python.org/mailman/listinfo/python-ideas >> >> >> >> >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> http://mail.python.org/mailman/listinfo/python-ideas > > > > -- > --Guido van Rossum (python.org/~guido) > > > > > -- --Guido van Rossum (python.org/~guido) From ncoghlan at gmail.com Tue Oct 16 03:49:01 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 16 Oct 2012 11:49:01 +1000 Subject: [Python-ideas] yield from multiple iterables (was Re: The async API of the future: yield-from) In-Reply-To: <507CADE6.7050604@canterbury.ac.nz> References: <507B0F0B.8080700@pearwood.info> <507B26C6.10602@stackless.com> <507BA075.4030508@canterbury.ac.nz> <507BC855.4070802@stackless.com> <507BE79D.1090100@stackless.com> <507C1661.5070206@stackless.com> <507CADE6.7050604@canterbury.ac.nz> Message-ID: On Tue, Oct 16, 2012 at 10:44 AM, Greg Ewing wrote: > My original implementation of yield-from actually *did* avoid > this, by keeping a C-level pointer chain of yielding-from frames. > But that part was ripped out at the last minute when someone > discovered that it had a detrimental effect on tracebacks. > > There are probably other ways the traceback problem could be > fixed, so maybe we will get this optimisation back one day. Ah, I thought I remembered something along those lines. IIRC, it was a bug report on one of the alphas that prompted us to change it. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From guido at python.org Tue Oct 16 03:51:26 2012 From: guido at python.org (Guido van Rossum) Date: Mon, 15 Oct 2012 18:51:26 -0700 Subject: [Python-ideas] Expressiveness of coroutines versus Deferred callbacks (or possibly promises, futures) In-Reply-To: <91C1C15D-6E43-4F60-B65D-F45C6BAAB6F6@twistedmatrix.com> References: <91C1C15D-6E43-4F60-B65D-F45C6BAAB6F6@twistedmatrix.com> Message-ID: On Mon, Oct 15, 2012 at 11:08 AM, Glyph wrote: > Still working my way through zillions of messages on this thread, trying > to find things worth responding to, I found this, from Guido: > > [Generators are] more flexible [than Deferreds], since it is easier to > catch different exceptions at different points (...) In the past, when I > pointed this out to Twisted aficionados, the responses usually were a mix > of "sure, if you like that style, we got it covered, Twisted has > inlineCallbacks," and "but that only works for the simple cases, for the > real stuff you still need Deferreds." But that really sounds to me like > Twisted people just liking what they've got and not wanting to change. > > > If you were actually paying attention, we did explain what "the real > stuff" is, and why you can't do it with inlineCallbacks. ;-) > An yet the rest of your email could be paraphrased by those two quoted phrases. :-) But seriously, thanks for repeating the explanation for my benefit. > (Or perhaps I should say, why we prefer to do it with Deferreds > explicitly.) > > Managing parallelism is easy with the when-this-then-that idiom of > Deferreds, but challenging with the sequential this-then-this-then-this > idiom of generators. The examples in the quoted message were all > sequential workflows, which are roughly equivalent in both styles. 
> As soon as a for loop gets involved though, yield-based coroutines have a
> harder time expressing the kind of parallelism that a lot of applications
> *should* use, so it's easy to become accidentally sequential (and therefore
> less responsive) even if you don't need to be. For example, using some
> hypothetical generator coroutine library, the idiomatic expression of a
> loop across several request/responses would be something like this:
>
>     @yield_coroutine
>     def something_async():
>         values = yield step1()
>         results = set()
>         for value in values:
>             results.add(step3((yield step2(value))))
>         return_(results)
>
> Since it's in a set, the order of 'results' doesn't actually matter; but
> this code needs to sit and wait for each result to come back in order; it
> can't perform any processing on the ones that are already ready while it's
> waiting. You express this with Deferreds:
>
>     def something_deferred():
>         return step1().addCallback(
>             lambda values: gatherResults([step2(value).addCallback(step3)
>                                           for value in values])).addCallback(set)
>
> In addition to being a roughly equivalent amount of code (fewer lines, but
> denser), that will run step2() and step3() on demand, as results are ready
> from the set of Deferreds from step1. That means that your program will
> automatically spread out its computation, which makes better use of time as
> results may be arriving in any order.
>
> The problem is that it is difficult to express laziness with generator
> coroutines: you've already spent the generator-ness on the function on
> responding to events, so there's no longer any syntactic support for
> laziness.

I see your example as a perfect motivation for adding some kind of map()
primitive. In NDB there is one for the specific case of mapping over query
results (common in NDB because it's primarily a database client). That map()
primitive takes a callback that is either a plain function or a tasklet
(i.e. something returning a Future). map() itself is also async (returning a
Future) and all the tasklets' results are waited for and collected only when
you wait for the map(). It also handles the input arriving in batches (as
they do for App Engine Datastore queries). IOW it exploits all available
parallelism. While the public API is tailored for queries, the underlying
mechanism can support a few different ways of collecting the results,
supporting filter() and even reduce() (!) in addition to map(); and most of
the code is reusable for other (non-query) contexts. I feel it would be
possible to extend it to support "stop after the first N results" and "stop
when this predicate says so" too.

In general, whenever you want parallelism in Python, you have to introduce a
new function, unless you happen to have a suitable function lying around
already; so I don't feel I am contradicting myself by proposing a mechanism
using callbacks here. It's the callbacks for sequencing that I dislike.

> (There's another problem where sometimes you can determine that work needs
> to be done as it arrives; that's an even trickier abstraction than
> Deferreds though and I'm still working on it. I think I've mentioned
> <http://tm.tl/1956> already in one of my previous posts.)

NDB's map() does this.

> Also, this is not at all a hypothetical or academic example. This pattern
> comes up all the time in e.g. web-spidering and chat applications.

Of course. In App Engine, fetching multiple URLs in parallel is the
hello-world of async operations.
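Purely for illustration -- this is not NDB's API, and it cheats by using
threads rather than tasklets just so the snippet stays self-contained -- the
submit-everything-then-collect pattern I have in mind looks roughly like this:

    from concurrent.futures import ThreadPoolExecutor, as_completed
    from urllib.request import urlopen

    executor = ThreadPoolExecutor(max_workers=5)

    def fetch(url):
        # Runs in a worker thread; a tasklet-based version would yield instead.
        return urlopen(url).read()

    def map_parallel(func, items):
        # Submit everything up front, then collect results as they complete,
        # so a slow item doesn't hold up processing of the fast ones.
        futures = [executor.submit(func, item) for item in items]
        return {f.result() for f in as_completed(futures)}

    pages = map_parallel(fetch, ["http://www.python.org/", "http://pypi.python.org/"])

A tasklet-based map() does the same thing without threads: fire off all the
sub-operations, then gather their Futures' results in whatever order they
complete.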
> To be fair, you *could* express this in a generator-coroutine library
> like this:
>
>     @yield_coroutine
>     def something_async():
>         values = yield step1()
>         thunks = []
>
>         @yield_coroutine
>         def do_steps(value):
>             return_(step3((yield step2(value))))
>
>         for value in values:
>             thunks.append(do_steps(value))
>         return_(set((yield multi_wait(thunks))))
>
> but that seems bizarre and not very idiomatic; to me, it looks like the
> confusing aspects of both styles.

Yeah, you need a map() operation:

    @yield_coroutine
    def something_async():
        values = yield step1()

        @yield_coroutine
        def do_steps(value):
            return step3((yield step2(value)))

        return set(yield map_async(do_steps, values))

Or maybe map_async()'s Future's result should be a set?

> David Reid also wrote up some examples of how Deferreds can express
> sequential workflows more nicely as well (also indirectly as a response to
> Guido!) on his blog, here:
> <http://dreid.org/2012/03/30/deferreds-are-a-dataflow-abstraction>.
>
> Which I understand -- I don't want to change either. But I also observe
> that a lot of people find bare Twisted-with-Deferreds too hard to grok, so
> they use Tornado instead, or they build a layer on top of either (like
> Monocle),
>
> inlineCallbacks (and the even-earlier deferredGenerator) predates Monocle.
> That's not to say Monocle has no value; it is a portability layer between
> Twisted and Tornado that does the same thing inlineCallbacks does but
> allows you to do it even if you're not using Deferreds, which will surely
> be useful to some people.
>
> I don't want to belabor this point, but it bugs me a little bit that we
> get so much feedback from the broader Python community along the lines of
> "Why doesn't Twisted do X?

I don't think I quite said that. But I suspect it happens because Twisted is
hard to get into. I suspect anything using higher-order functions this much
has that problem; I feel this way about Haskell's Monads. I wouldn't be
surprised if many Twisted lovers are also closet (or not) Haskell lovers.

> I'd use it if it did X, but it's all weird and I don't understand Y that
> it forces me to do instead, that's why I use Z" when, in fact:
>
> 1. Twisted does do X
> 2. It's done X for years
> 3. It actually invented X in the first place
> 4. There are legitimate reasons why we (Twisted core developers)
>    suggest and prefer Y for many cases, but you don't need to do it if you
>    don't want to follow our advice
> 5. Thing Z that is being cited as doing X actually explicitly mentions
>    Twisted as an inspiration for its implementation of X
>
> It's fair, of course, to complain that we haven't explained this very
> well, and I'll cop to that unless I can immediately respond with a
> pre-existing URL that explains things :).
>
> One other comment that's probably worth responding to:
>
> I suppose on systems that support both networking and GUI events, in my
> design these would use different I/O objects (created using different
> platform-specific factories) and the shared reactor API would sort things
> out based on the type of I/O object passed in to it.
>
> In my opinion, it is a mistake to try to harmonize or unify all GUI event
> systems, unless you are also harmonizing the GUI itself (i.e. writing a
> totally portable GUI toolkit that does everything). And I think we can all
> agree that writing a totally portable GUI toolkit is an impossibly huge
> task that is out of scope for this (or, really, any other) discussion.
GUI > systems can already dispatch its event to user code just fine - interposing > a Python reactor API between the GUI and the event registration adds > additional unnecessary work, and may not even be possible in some cases. > See, for example, the way that Xcode (formerly Interface Builder) and the > Glade interface designer use: the name of the event handler is registered > inside a somewhat opaque blob, which is data and not code, and then hooked > up automatically at runtime based on reflection. The code itself never > calls any event-registration APIs. > > Also, modeling all GUI interaction as a request/response conversation is > limiting and leads to bad UI conventions. Consider: the UI element that > most readily corresponds to a request/response is a modal dialog box. Does > anyone out there really like applications that consist mainly of popping up > dialog after dialog to prompt you for the answers to questions? > I don't feel very strongly about integrating GUI systems. IIRC Twisted has some way to integrate with certain GUI event loops. I don't think we should desire any more (but neither, less). -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Tue Oct 16 04:10:46 2012 From: guido at python.org (Guido van Rossum) Date: Mon, 15 Oct 2012 19:10:46 -0700 Subject: [Python-ideas] The async API of the future: yield-from In-Reply-To: <507C6EB7.3090601@canterbury.ac.nz> References: <5078F6B1.2030309@canterbury.ac.nz> <507B4F9D.6040701@canterbury.ac.nz> <507C6EB7.3090601@canterbury.ac.nz> Message-ID: On Mon, Oct 15, 2012 at 1:14 PM, Greg Ewing wrote: > Nick Coghlan wrote: > >> The main primitive I personally want out of an async API is a >> task-based equivalent to concurrent.futures.as_completed() [1]. This >> is what I meant about iteration being a bit of a mess: the way the >> as_completed() works, the suspend/resume channel of the iterator >> protocol is being used to pass completed future objects back to the >> calling iterator. That means that channel *can't* be used to talk >> between the coroutine and the scheduler, > > > I had to read this a couple of times before I figured out > what you're talking about, but I get it now. > > This is an instance of a general problem that was noticed > back when I was discussing my cofunctions idea: using > generator-based coroutines, it's not possible to have a > "suspendable iterator", because that would require "yield" > to have two conflicting meanings: "suspend this coroutine" > on one hand, and "provide a value to my caller" on the > other. > > Unfortunately, I suspect that a truly elegant solution to this > problem will require yet another language addition -- something > like > > yield for item in subtask(): > ... > > which would run a slightly different version of the iterator > protocol in which values to be yield are wrapped somehow > (I haven't figured out all the details yet). I think I ran into a similar issue with NDB when defining iteration over an asynchronous query. 
My solution:

    q =
    it = q.iter()                       # Fire off the query to the datastore
    while (yield it.has_next_async()):  # Block until one result
        emp = it.next()                 # Get the result that was buffered on the iterator
        print emp.name, emp.age         # Use it

--
--Guido van Rossum (python.org/~guido)

From guido at python.org  Tue Oct 16 04:19:29 2012
From: guido at python.org (Guido van Rossum)
Date: Mon, 15 Oct 2012 19:19:29 -0700
Subject: [Python-ideas] The async API of the future: yield-from
In-Reply-To: 
References: <5078F6B1.2030309@canterbury.ac.nz> <507B4F9D.6040701@canterbury.ac.nz> <507BBD53.5000602@canterbury.ac.nz> <4ee5fb995c854c1699bc4edfde9e1ae6@BY2PR03MB596.namprd03.prod.outlook.com>
Message-ID: 

On Mon, Oct 15, 2012 at 4:37 PM, Matt Joiner wrote:
> I gave something like this a go a while ago:
> https://bitbucket.org/anacrolix/green380
>
> "Coroutines" yield events or futures as Nick put them from the top, and the
> scheduler at the bottom manages events and scheduling.
>
> There are a few things I took away from this attempt:
>
> 1) Explicit yield a la PEP380 requires syntactical changes *everywhere*.

So does using PEP 342 style coroutines (yield Future instead of yield from).

> 2) Python's dynamic typing means that neglecting to "yield from" gives you
> broken code, and Python won't help you here. Add to this that you now have a
> 380, and "normal synchronous" form of most interfaces and the caller must
> know what kind is used at all times.

In NDB this is alleviated by insisting that the only thing you are allowed
to yield is a Future. Anything else raises TypeError. But yes, the first few
days when getting used to this style, you end up debugging this a few times.

> 3) Concurrency is nice, but it requires language-level support, and proper
> parallelism to really shine. The "C way" of doing things is already so
> heavily ingrained in Python, an entirely new standard library and
> interpreter that breaks C compatibility is really the only way to proceed,
> and this certainly isn't worth it just to write code with "yield from"
> littered on every line.

Here you're basically arguing for greenlets/gevent -- you're saying you just
don't want to put up with the yields everywhere. But the popularity of
Twisted and Tornado show that at least some people are willing to make even
bigger sacrifices in order to be able to do async I/O efficiently -- i.e., to
solve the C10K problem that Christian Tismer referred to
(http://www.kegel.com/c10k.html, http://en.wikipedia.org/wiki/C10k_problem).

There happen to be several problems with greenlets (even Christian Tismer
said so, and included Stackless in the problem). The current effort is
striving to help people solve it with less effort than the async style
Twisted and Tornado promote, while avoiding the problems with greenlets.

--
--Guido van Rossum (python.org/~guido)

From dinov at microsoft.com  Tue Oct 16 03:50:29 2012
From: dinov at microsoft.com (Dino Viehland)
Date: Tue, 16 Oct 2012 01:50:29 +0000
Subject: [Python-ideas] The async API of the future: yield-from
In-Reply-To: 
References: <5078F6B1.2030309@canterbury.ac.nz> <507B4F9D.6040701@canterbury.ac.nz> <507BBD53.5000602@canterbury.ac.nz> <4ee5fb995c854c1699bc4edfde9e1ae6@BY2PR03MB596.namprd03.prod.outlook.com>
Message-ID: <2fc37a32135b45efa34bd0ad73ebd799@BY2PR03MB596.namprd03.prod.outlook.com>

Guido wrote:
> But do your Futures use threads? NDB doesn't. NDB's event loop doesn't
> know about Futures; however the @ndb.tasklet decorator does, and the
> Futures know about the event loop.
> When you wait for a Future, a callback is added to the Future that will
> resume the generator when it is done, and in order to run them, the Future
> passes its callbacks to the event loop to be run.

The decorator and the default context don't do anything w/ threads by
default, but once you start combining it w/ other futures threads are likely
to be used. For example if you take:

    @async
    def get_image_async(url):
        buffer = yield executor.submit(load_url, url)
        return Image(buffer)

Then the "yield executor.submit(load_url, url)" line is going to yield a
future which is running on a thread pool thread. When it completes its done
callback is also going to be delivered on the same thread pool thread. At
that point we let the context which was captured when the function was
initially called handle resuming the generator. The default context is just
going to synchronously continue the function, so the generator would then
resume running on the thread pool thread. But if you're running in a GUI app
which sets up its own context then the context will post an event into the
UI event loop and execution will continue on the UI thread. Likewise if
there were a bunch of async I/O routines then this would combine with them
in a similar way - async I/O would result in a future, the futures would
signal that they're done on some worker thread, and then the async methods
will get to continue running on that worker thread unless the current
context wants to do something different.

From Steve.Dower at microsoft.com  Tue Oct 16 03:45:25 2012
From: Steve.Dower at microsoft.com (Steve Dower)
Date: Tue, 16 Oct 2012 01:45:25 +0000
Subject: [Python-ideas] The async API of the future: yield-from
In-Reply-To: 
References: <5078F6B1.2030309@canterbury.ac.nz> <507B4F9D.6040701@canterbury.ac.nz> <507BBD53.5000602@canterbury.ac.nz> <4ee5fb995c854c1699bc4edfde9e1ae6@BY2PR03MB596.namprd03.prod.outlook.com>
Message-ID: 

That's basically exactly the same as ours (I worked on it with Dino). We
assume that yielded objects are futures and wire up the callback. I think
the difference is that all the yielded futures are hidden within the
decorator, which returns one main future to the caller. This may be slightly
inefficient, but it also makes it far easier for end-users.

An event loop (in our terminology, a 'context') is only necessary if you
need to ensure that callbacks (in this case, the next step in the generator)
are run in a certain context (such as a UI thread). Without one, calling an
@async method simply gives you back a future that you can wait on.

The most important part of PEP 380 for this approach is not yield from, but
allowing return inside a generator. It makes the methods that much more
natural. Probably the most important part is that we assume whatever context
is available (through contexts.get_current()) has a post() method for
scheduling a callback.

Basically, we approached this as less of a "how do I run this asynchronously"
problem and more of a "how do I run something after this finishes" problem.

We also have some ideas about associating properties with futures in a way
that lets the caller decide how to run continuations, so you can opt-out of
coming back to the calling thread or provide a cancellation token of some
sort. These aren't written up yet (obviously), but we've certainly
considered it.
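To show the shape of the wiring (a rough, simplified sketch only, with
invented names -- no contexts, no cancellation, nothing like our actual
implementation), the future-to-generator plumbing could look something like
this:

    from concurrent.futures import Future

    def async_task(func):
        # Stand-in for the @async decorator discussed above: the caller gets
        # a single Future back, and every future yielded inside the generator
        # is wired up with a done-callback that advances the generator.
        def wrapper(*args, **kwargs):
            outer = Future()
            gen = func(*args, **kwargs)

            def step(value=None, exc=None):
                try:
                    if exc is not None:
                        inner = gen.throw(exc)
                    else:
                        inner = gen.send(value)
                except StopIteration as stop:
                    # "return x" inside the generator (PEP 380) ends up here.
                    outer.set_result(getattr(stop, 'value', None))
                except BaseException as e:
                    outer.set_exception(e)
                else:
                    # (A real implementation would validate that 'inner' is
                    # actually a future before doing this.)
                    def on_done(f):
                        if f.exception() is not None:
                            step(exc=f.exception())
                        else:
                            step(f.result())
                    inner.add_done_callback(on_done)

            step()
            return outer
        return wrapper

Calling a decorated function then looks just like the get_image_async()
example above: you get one future back, and each yielded future drives the
generator one step further when it completes.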
Cheers, Steve ________________________________________ From: Python-ideas [python-ideas-bounces+steve.dower=microsoft.com at python.org] on behalf of Guido van Rossum [guido at python.org] Sent: Monday, October 15, 2012 6:17 PM To: Dino Viehland Cc: python-ideas at python.org Subject: Re: [Python-ideas] The async API of the future: yield-from On Mon, Oct 15, 2012 at 2:45 PM, Dino Viehland wrote: > They look remarkably similar. The biggest difference I see is that NDB appears to be using an event loop to keep the futures running while we're using add_done_callback (on the yielded futures) to continue stepping the generator function along. So there's not necessary an event loop in our case, and in fact the default context always just executes things synchronously. But frameworks can replace the default context so that work is posted into an event loop of some form. But do your Futures use threads? NDB doesn't. NDB's event loop doesn't know about Futures; however the @ndb.tasklet decorator does, and the Futures know about the event loop. When you wait for a Future, a callback is added to the Future that will resume the generator when it is done, and in order to run them, the Future passes its callbacks to the event loop to be run. --Guido > -----Original Message----- > From: gvanrossum at gmail.com [mailto:gvanrossum at gmail.com] On Behalf Of Guido van Rossum > Sent: Monday, October 15, 2012 12:34 PM > To: Dino Viehland > Cc: ironfroggy at gmail.com; Nick Coghlan; python-ideas at python.org > Subject: Re: [Python-ideas] The async API of the future: yield-from > > Wow, sounds very similar to NDB's approach! Please do check out NDB's tasklets and event loop: > http://code.google.com/p/appengine-ndb-experiment/source/browse/ndb/tasklets.py > > On Mon, Oct 15, 2012 at 10:24 AM, Dino Viehland wrote: >> I'm still catching up to this thread, but we've been investigating Win 8 support for Python and Win 8 has a very asynchronous API design and so we've been interested in much the same space. We've actually come up with an example of the @task decorator (we called it @async) which is built around using yield + the ability to return from generators added in Python 3.3. Our version of this is also based around futures so that an @async API will return a future. The big difference here might be that we always return a future from a call rather than yielding it up the stack. So our API works with just simple yields rather than yield froms. This is what a simple usage of the API looks like: >> >> from concurrent.futures import ThreadPoolExecutor >> from urllib.request import urlopen >> >> executor = ThreadPoolExecutor(max_workers=5) >> >> def load_url(url): >> return urlopen(_url).read() >> >> @async >> def get_image_async(url): >> buffer = yield executor.submit(load_url, url) >> return Image(buffer) >> >> def main(image_uri): >> img_future = get_image_async(image_uri) >> # perform other tasks while the image is downloading >> img = img_future.result() >> >> main("http://www.python.org/images/python-logo.gif") >> >> This example us just using the existing thread pool to run the actual I/O but this will work with anything that will return a future. So inside of an async method anything which is yielded should be a future. The decorator will then attach a callback which will send the result of the future back into the generator, so the "buffer = " line gets the result of the future. 
Finally the function completes and the future returned from calling get_image_async will have its value set to Image when the StopIteration exception is raised with the return value. >> >> Because we're interested in the GUI side of things here we've also wired this up into Tk so that we can experiment with an existing GUI framework, and I've included the source for the context there. Our thinking here is that different contexts can be created depending upon the framework which you're running in and that the context makes sure the code is running in the right spot, in this case getting back to the GUI thread after an async operation has been completed. >> >> The big outstanding item we're still working through is I/O, but we think the contexts help here too. We're still not quite sure how polling I/O will work, but with the contexts if there's a single thread polling for I/O then the context will get us off the I/O thread and let the polling continue. We are currently thinking that there will need to be a polling thread which handles all of the I/Os, and there could potentially be more than one of these if different libraries aren't cooperating on sharing a single thread. >> >> Here's the code plus the demo Tk app (you'll need your own Holmes.txt file for the sample app to run): >> >> Contexts.py: http://pastebin.com/ndS53Cd8 Tk context: >> http://pastebin.com/FuZwc1Ur Tk app: http://pastebin.com/Fm5wMXpN >> Hardwork.py: http://pastebin.com/nMMytdTG >> >> >> >> >> -----Original Message----- >> From: Python-ideas >> [mailto:python-ideas-bounces+dinov=microsoft.com at python.org] On Behalf >> Of Calvin Spealman >> Sent: Monday, October 15, 2012 7:16 AM >> To: Nick Coghlan >> Cc: python-ideas at python.org >> Subject: Re: [Python-ideas] The async API of the future: yield-from >> >> On Mon, Oct 15, 2012 at 9:48 AM, Nick Coghlan wrote: >>> On Mon, Oct 15, 2012 at 10:31 PM, Calvin Spealman wrote: >>>> Currently, "with yield expr:" is not valid syntax, surprisingly. >>> >>> It's not that surprising, it's the general requirement that yield >>> expressions must be enclosed in parentheses except when used >>> standalone or in a simple assignment statement. >>> >>> "with (yield expr):" is valid syntax though, so I'm reluctant to >>> endorse doing anything substantially different if the parentheses are >>> omitted. >> >> Silly oversight on my part, and I agree that the parens shouldn't make the difference in meaning. >> >>> I think the combination of "yield from" to delegate control >>> (including exception handling) completely to a subgenerator and >>> "context manager >>> + for loop + explicit yield" when an operation needs to yield >>> + multiple >>> times and the exception handling behaviour should be left to the >>> caller (as in the "as_completed" case) should cover the necessary >>> behaviours. >> >> I'm still -1 on delegating control to subgenerators with yield-from, versus having the scheduler just deal with them directly. I think it is far less flexible. >> >> I would still like to see a less confusing "with yield expr:" by simply allowing it without parens, but no special meaning. I think it would be really useful in coroutines. >> >> with yield collect() as tasks: >> yield task1() >> yield task2() >> results = yield tasks >> >>> Cheers, >>> Nick. >>> >>> -- >>> Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia >> >> >> >> -- >> Read my blog! I depend on your acceptance of my opinion! I am interesting! 
>> http://techblog.ironfroggy.com/ >> Follow me if you're into that sort of thing: >> http://www.twitter.com/ironfroggy >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> http://mail.python.org/mailman/listinfo/python-ideas >> >> >> >> >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> http://mail.python.org/mailman/listinfo/python-ideas > > > > -- > --Guido van Rossum (python.org/~guido) > > > > > -- --Guido van Rossum (python.org/~guido) _______________________________________________ Python-ideas mailing list Python-ideas at python.org http://mail.python.org/mailman/listinfo/python-ideas From turnbull at sk.tsukuba.ac.jp Tue Oct 16 06:07:42 2012 From: turnbull at sk.tsukuba.ac.jp (Stephen J. Turnbull) Date: Tue, 16 Oct 2012 13:07:42 +0900 Subject: [Python-ideas] Is there a good reason to use * for multiplication? In-Reply-To: References: <11ad8f01-0383-4b64-b0f0-78bb0e2f9308@googlegroups.com> <5078D4CE.1040201@pearwood.info> <507923C4.8040201@pearwood.info> <20121013102229.259572ad@bhuda.mired.org> <5079FC79.3040506@canterbury.ac.nz> <507A01AB.2060708@mrabarnett.plus.com> <20121014145738.57948600@bhuda.mired.org> Message-ID: <87626b6ovl.fsf@uwakimon.sk.tsukuba.ac.jp> Mike Graham writes: > IMO it's essential that we add source code escapes. Imagine the > one-liners this will allow! > > def f(xs):\n\ttry:\n\t\treturn x.pop()\n\texcept ValueError\n\t\treturn None > > Can we get this fix applied in Python 2.2 and up? Why not go all the way back to v1.5.2? All it takes is a version bump to v1j.5.2. From greg.ewing at canterbury.ac.nz Tue Oct 16 07:25:24 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 16 Oct 2012 18:25:24 +1300 Subject: [Python-ideas] The async API of the future: yield-from In-Reply-To: References: <5078F6B1.2030309@canterbury.ac.nz> <507B4F9D.6040701@canterbury.ac.nz> <507BBD53.5000602@canterbury.ac.nz> Message-ID: <507CEFC4.6040700@canterbury.ac.nz> Calvin Spealman wrote: > A "sane stack trace" only makes sense if we assume that tasks > "call" each other in the same kind of call tree that synchronous code flows > in, and I don't think that is necessarily the case. No, but often it *is* the case, and in those cases we would like to get a traceback that correctly reflects the chain of calls. > There are cases when one > task might want to end before tasks it as "called" are complete, and if we use > yield-from this is *impossible* but it is very useful. That depends on what you mean by "use yield-from". It's true that yield-from *on its own* can't achieve the effect of spawning concurrent subtasks; other mechanisms will need to be brought to bear at some point. But there's no reason a solution involving those other mechanisms can't be encapsulated in a library function that you invoke using yield-from. I've posted a couple of examples of how a par() function which does that might be written. > yield-from semantics won't allow a called task to continue, if needed, after the > calling task itself has completed. You seem to be imagining that more is being claimed about the abilities of yield-from than is actually being claimed. Yield-from is just a procedure call; the important thing is what the called procedure does. One of the things it can do is invoke a scheduler primitive that spawns an independent task. In my example scheduler, this is spelled scheduler.schedule(task). This is not a yield-from call, it's just an ordinary call. 
It adds the given generator to the list of ready tasks, so that it will get run when its chance comes around. Meanwhile, the calling task carries on. -- Greg From greg.ewing at canterbury.ac.nz Tue Oct 16 07:39:20 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 16 Oct 2012 18:39:20 +1300 Subject: [Python-ideas] The async API of the future: yield-from In-Reply-To: References: <5078F6B1.2030309@canterbury.ac.nz> <507B4F9D.6040701@canterbury.ac.nz> <507BBD53.5000602@canterbury.ac.nz> Message-ID: <507CF308.2050803@canterbury.ac.nz> Nick Coghlan wrote: > (this is why I disagree with Greg that > "yield from" can serve as the one true API - it doesn't handle partial > iteration, and it doesn't handle pre- or post- processing around the > suspension points while iterating). I'm aware of the iteration problem, but I'm not convinced that the convolutions necessary to make it possible to use a for-loop for this are worth the bother, as opposed to simply accepting that you can't use the for statement in this situation, and using some other kind of loop. In any case, even if we decide to provide a scheduler instruction to enable using for-loops on suspendable iterators somehow, it doesn't follow that we should use scheduler instructions for anything *else*. I would consider such a scheduler instruction to be a stopgap measure until we can find a better solution -- just as yield-from is a better solution than using "call" and "return" scheduler instructions. -- Greg From greg.ewing at canterbury.ac.nz Tue Oct 16 07:46:12 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 16 Oct 2012 18:46:12 +1300 Subject: [Python-ideas] The async API of the future: yield-from In-Reply-To: References: <5078F6B1.2030309@canterbury.ac.nz> <507B4F9D.6040701@canterbury.ac.nz> <507BBD53.5000602@canterbury.ac.nz> Message-ID: <507CF4A4.4090806@canterbury.ac.nz> Laurens Van Houtven wrote: > On Mon, Oct 15, 2012 at 5:32 PM, Nick Coghlan > wrote: > > My preferred way of thinking of "yield from" is as a simple > refactoring tool > > I agree. That's how I've used it. Maybe that's just short-sightedness. And that's exactly how *I* see it as well! Which means some people must be misinterpreting something I'm saying, if they think I see it some other way. -- Greg From ncoghlan at gmail.com Tue Oct 16 08:01:51 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 16 Oct 2012 16:01:51 +1000 Subject: [Python-ideas] filename comparison [was] Re: PEP 428 - object-oriented filesystem paths In-Reply-To: References: Message-ID: On Tue, Oct 16, 2012 at 6:21 AM, Jim Jewett wrote: > Ideally, I would also have a way to find out that a pathname is likely > to be problematic for cross-platform uses, or at least whether two > specific pathnames are known to be collision-prone on existing > platforms other than mine. (But I'm not sure that sort of test can be > reliable enough for the stdlib. Would just check for caseless > equality, reserved Windows names, and non-alphanumeric characters in > the filename?) I'd forgotten about it until reading this, but I think you can get into trouble with Unicode normalisation as well - so, I think we can safely dismiss this as an irrelevant tangent and just stick with Antoine's basic Windows vs Posix distinction. If need be, the strategies can be exposed at a later date (via keyword-only arguments) if we come up with a more convincing use case. Cheers, Nick. 
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From steve at pearwood.info Tue Oct 16 08:43:52 2012 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 16 Oct 2012 17:43:52 +1100 Subject: [Python-ideas] filename comparison [was] Re: PEP 428 - object-oriented filesystem paths In-Reply-To: References: Message-ID: <20121016064351.GA20296@ando> On Mon, Oct 15, 2012 at 04:21:50PM -0400, Jim Jewett wrote: > If I want my program (or a dict) to know that "CONFIG" and "config" > are the same, then I also want it to know that "My Documents" is the > same as "MYDOCU~1".* Well, perhaps you do, but those not using Windows are unlikely to care about DOS short names. However, they may care about some other form of short name. E.g. on iso9660 file systems (CDs) long names are just truncated; if two truncated names clash, the second and subsequent file is given a three digit suffix: this-is-a-long-file THIS-IS-A-LONG-NAME My Documents get renamed to: THIS_IS_ THIS_000 MY_DOCUM although my Linux computer displays those names in lower case. The Rock Ridge and Joliet extensions can record the unmangled file names, but not all CDs use them. It is not the case that all case-insensitive file systems necessarily support DOS short names. There are file systems that don't support long names at all, there are case-insensitive file systems that preserve case, and those that don't. It's not even necessarily so that Windows is always case-insensitive: http://support.microsoft.com/kb/929110 -- Steven From greg.ewing at canterbury.ac.nz Tue Oct 16 09:20:08 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 16 Oct 2012 20:20:08 +1300 Subject: [Python-ideas] The async API of the future: yield-from In-Reply-To: References: <5078F6B1.2030309@canterbury.ac.nz> <507BA60B.2030806@canterbury.ac.nz> Message-ID: <507D0AA8.6090509@canterbury.ac.nz> Guido van Rossum wrote: > But there needs to be another way to get a task running immediately > and concurrently; I believe that would be > > a = spawn(foo_task()) > > right? One could then at any later point use > > ra = yield from a Hmmm. I suppose it *could* be made to work that way, but I'm not sure it's a good idea, because it blurs the distinction between invoking a subtask synchronously and waiting for the result of a previously spawned independent task. Recently I've been thinking about an implementation where it would look like this. First you do t = spawn(foo_task()) but what you get back is *not* a generator; rather it's a Task object which wraps a generator and provides various operations. One of them would be r = yield from t.wait() which waits for the task to complete and then returns its value (or if it raised an exception, propagates the exception). Other operations that a Task object might support include t.unblock() # wake up a blocked task t.cancel() # unschedule and clean up the task t.throw(exception) # raise an exception in the task (I haven't included t.block(), because I think that should be a stand-alone function that operates on the current task. Telling some other task to block feels like a dodgy thing to do.) > One could also combine these and do e.g. > > a = spawn(foo_task()) > b = spawn(bar_task()) > > ra, rb = yield from par(a, b) If you're happy to bail out at the first exception, you wouldn't strictly need a par() function for this, you could just do a = spawn(foo_task()) b = spawn(bar_task()) ra = yield from a.wait() rb = yield from b.wait() > Have I got the spelling for spawn() right? In many other systems (e.g. 
> threads, greenlets) this kind of operation takes a callable, not the > result of calling a function (albeit a generator). That's a result of the fact that a generator doesn't start running as soon as you call it. If you don't like that, the spawn() operation could be defined to take an uncalled generator and make the call for you. But I think it's useful to make the call yourself, because it gives you an opportunity to pass parameters to the task. > If it takes a > generator, would it return the same generator or a different one to > wait for? In your version above where you wait for the task simply by calling it with yield-from, spawn() would have to return a generator (or something with the same interface). But it couldn't be the same generator -- it would have to be a wrapper that takes care of blocking until the subtask is finished. -- Greg From pjdelport at gmail.com Tue Oct 16 09:27:01 2012 From: pjdelport at gmail.com (Piet Delport) Date: Tue, 16 Oct 2012 09:27:01 +0200 Subject: [Python-ideas] Proposal: A simple protocol for generator tasks In-Reply-To: <507BD49D.3020704@canterbury.ac.nz> References: <507BD49D.3020704@canterbury.ac.nz> Message-ID: On Mon, Oct 15, 2012 at 11:17 AM, Greg Ewing wrote: > Piet Delport wrote: > >> 2. Each value yielded as a "step" represents a scheduling instruction, >> or primitive, to be interpreted by the task's scheduler. > > > I don't think this technique should be used to communicate > with the scheduler, other than *maybe* for a *very* small > set of operations that are truly primitive -- and even then > I'm not convinced. But this is by necessity how the scheduler is *already* being communicated with, at least for the de facto scheduler instructions like None, Future, and the other primitives being discussed. This concept of an "intermediate object yielded by a task to its scheduler on each step, instructing it how to schedule" is already unavoidably fundamental to how these tasks / coroutines work: this proposal is just an attempt to name that concept, and define it more clearly. > To begin with, there are some operations that *can't* rely > on yielded instructions as the only way of invoking them. > Spawning a task, for example -- there must be some way for > non-task code to invoke that, otherwise you wouldn't be able > to get top-level tasks into the system. I'm definitely not suggesting that this be the *only* way of invoking operations, or that all operations should be invoked this way. Certainly, everything that is possible inside this protocol will also be possible outside of it by directly calling methods on some global scheduler, but that requires knowing who and what that global scheduler is. It's important to note that a globally identifiable scheduler object might not even exist: it's entirely reasonable, for example, to implement this entire protocol in Twisted by writing a deferTask(task) helper that handles generic scheduler instructions (None, Future-alike, and things like spawn() and sleep()) by just arranging for the appropriate Twisted callbacks and resumptions to happen under the hood. (This is basically how Twisted's deferredGenerator works currently: the main difference is that a deferTask() implementation would be able to run any generic coroutine / generator task code that uses this protocol, without that code having to know about Twisted.) Regarding getting top-level tasks into the system, this can be done in a variety of ways, depending on how particular applications are structured. 
For example, if the stdlib grows a standardized default event loop: tasklib.DefaultScheduler(tasks).start() or: result = tasklib.run(task()) or with existing frameworks like Twisted: deferTask(task()).addCallback(consume) deferTasks(othertasks) reactor.start() In other words, only the top level of an application should need to worry about how the initial scheduler, tasks, and everything else are started. > Also, consider the operation of unblocking a task that's > waiting for some event to occur. Often you will want to > invoke this using a callback from an event loop, which is > not a generator and can't yield anything to anywhere. This can be done with a scheduler primitive that obtains a callable to resume the current task, like the strawman: resume = yield tasklib.get_resume() from the other thread. However the exact API ends up looking, suspending and resuming tasks are very fundamental operations, and probably the most worth having as standardized instructions that any scheduler can implement: a variety of more powerful abstractions can be generically built on top of them. > Given that these operations must provide a way of invoking > them using a plain function call, there is little reason > to provide a second way using a yielded instruction. I don't see the former as an argument to avoid supporting the same operations as standard yielded instructions. A task can arrange to wait for a Future using plain function calls, or by yielding it as an instruction (i.e., "result = yield some_future()"): the ability to do the former should not make the latter any less desirable. The advantage of treating certain primitives as yielded scheduler instructions is that: - It's generic and scheduler-agnostic: for example, any task can simply yield a Future to its scheduler without caring exactly how the scheduler arranges for add_done_callback() to resume the task. - It requires no global coordination: every generator task already has a direct line of communication to its immediate scheduler, without having to identify itself using handles, task ids, or other mechanisms. In other words, it's the difference between saying: h = get_current_task_handle() current_scheduler.sleep(h, 10) yield current_scheduler.suspend(h) yield and, saying: yield tasklib.sleep(10) yield tasklib.suspend() where sleep(n) and suspend() are simple generic objects that any scheduler can recognize and implement, just like how yielded None and Future values are recognized and implemented. > In any case, I believe that the public interface for *any* > scheduler operation should not be a yielded instruction, > but either a plain function or something called using > yield-from, for reasons I explained to Guido earlier. In other words, limiting the allowable set of yielded scheduler instructions to None, and doing everything else separate API? This is possible, but it seems like an awful waste of the perfectly good and dedicated communication channel that already exists between tasks and their schedulers, in favor of something more complex and indirect. There's certainly a motivation for global APIs too, as with the discussion about getting standardized event loops and schedulers in the stdlib, but I think that is solving a somewhat different problem, and see this no reason to tie coroutines / generator tasks to those APIs when simpler, more generic and universal protocol could be defined. 
To me, defining locally how a scheduler should behave and respond to certain yielded types and values is a much more tractable problem than the question of designing a good global scheduler API that exposes all the same operations in a way that's portable and usable across many different application architectures and lifecycles. > There are problems with allowing multiple schedulers to > coexist within the one system, especially if yielded > instructions are the only way to communicate with them. > > It might work for instructions to a task's own scheduler > concerning itself, but some operations need to operate on > a *different* task, e.g. unblocking a task when the event > it was waiting for occurs. How do you know which scheduler > is managing it? The point of a protocol like this is that there would be no need for tasks to know which schedulers are managing what: they can limit themselves to using a generic protocol. For example, the par() implementation I gave assumes the primitive: resume = yield tasklib.get_resume() to get a callable to resume itself, and can simply pass that callable to the tasks it spawns: the last child to complete just calls resume() to resume the parent task in its own scheduler. In this example, the resume callable contains all the necessary state to resume that particular task. A particular scheduler could implement this primitive by sending back a closure like: lambda: current_scheduler.schedule(the_task) In the case of something like deferTask(), there need not even be any particular long-lived scheduler aside from the transient calls arranged by deferTask, and all the state would live in the Twisted reactor and its queues: lambda: reactor.callLater(_defertask_iterate, the_task) As far as the generic protocol is concerned, it does not matter whether there's a single global scheduler, or multiple schedulers, or no single scheduler at all: the scheduler side of the protocol is free to be implemented in many ways, and manage its state however it's convenient. > And even if you can find out, if you have to control it using yielded > instructions, you have no way of yielding something to a different > task's scheduler. Generally speaking, this should not be necessary: inter-task communication is a different question to how tasks should communicate with their immediate scheduler. Generically controlling the scheduling of different tasks can be done in many ways: - The way par() passes its resume callable to its spawned children. - Using synchronization primitives: for example, an alternative way to implement something like par() without direct use of suspend/resume is cooperative condition variable or semaphore. - Using queues, channels, or similar mechanisms to communicate information between tasks. (The communicated values can implicitly even be scheduler instructions themselves, like a queue of Futures.) If something cannot be done inside this generator task protocol, you can of course still step outside of it and use other mechanisms directly, but that necessarily ties your code to those mechanisms, which may not be as simple and universal as code that only relies on this protocol. 
From greg.ewing at canterbury.ac.nz Tue Oct 16 09:45:35 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 16 Oct 2012 20:45:35 +1300 Subject: [Python-ideas] The async API of the future: yield-from In-Reply-To: References: <5078F6B1.2030309@canterbury.ac.nz> <507BA60B.2030806@canterbury.ac.nz> Message-ID: <507D109F.3020302@canterbury.ac.nz> Calvin Spealman wrote: > If we allow spawn(task()) > then we're not getting nice tracebacks anyway, so I think we should > allow > > future1 = yield task1() # spawn task > future2 = yield task2() # spawn other task I don't think it's necessary to allow 'yield task' as a method of spawning in order to get nice tracebacks for spawned tasks. In the Task-object-based system I'm thinking about, if an exception reaches the top level of a Task, it gets stored in the Task object until another task wait()s for it, and then it continues to propagate. This makes sense, because the wait() establishes a task-subtask relationship, so the traceback should proceed from the subtask to the waiting task. > Both are primitives we > need to support as first-class operation. That is, without some wrapper > like spawn(). In my system, spawn() isn't a wrapper -- it *is* the primitive way to create an independent task. And I think it's the only one we need. -- Greg From ncoghlan at gmail.com Tue Oct 16 09:48:24 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 16 Oct 2012 17:48:24 +1000 Subject: [Python-ideas] The async API of the future: yield-from In-Reply-To: <507CF308.2050803@canterbury.ac.nz> References: <5078F6B1.2030309@canterbury.ac.nz> <507B4F9D.6040701@canterbury.ac.nz> <507BBD53.5000602@canterbury.ac.nz> <507CF308.2050803@canterbury.ac.nz> Message-ID: On Tue, Oct 16, 2012 at 3:39 PM, Greg Ewing wrote: > In any case, even if we decide to provide a scheduler > instruction to enable using for-loops on suspendable > iterators somehow, it doesn't follow that we should use > scheduler instructions for anything *else*. The only additional operation needed is an async equivalent to the concurrent.futures.wait() API, which would allow you to provide a set of Futures and say "let me know when one of these operations are done" (http://docs.python.org/py3k/library/concurrent.futures#concurrent.futures.wait) As it turns out, this shouldn't *need* a new scheduler primitive in Guido's design, since it can be handled by hooking up an appropriate callback to the supplied future objects. 
Following code isn't tested, but given my understanding of how Guido wants things to work, it should do what I want: def _wait_first(futures): # futures must be a set, items will be removed as they complete signal = Future() def chain_result(completed): futures.remove(completed) if completed.cancelled(): signal.cancel() signal.set_running_or_notify_cancel() elif completed.done(): signal.set_result(completed.result()) else: signal.set_exception(completed.exception()) for f in futures: f.add_done_callback(chain_result) return signal def wait_first(futures): return _wait_first(set(futures)) def as_completed(futures): remaining = set(futures) while 1: if not remaining: break yield _wait_first(remaining) @task def load_url_async(url) return url, (yield urllib.urlopen_async(url)).read() @task def example(urls): for get_next_page in as_completed(load_url_async(url) for url in urls): try: url, data = yield get_next_page except Exception as exc: print("Something broke: {}".format(exc)) else: print("Loaded {} bytes from {!r}".format(len(data), url)) There's no scheduler instruction, there's just Guido's core API concept: the only thing a tasklet is allowed to yield is a Future object, and the step of registering tasks to be run is *always* done via an explicit call to the event loop rather than via the "yield" channel. The yield channel is only used to say "wait for this operation now". What this approach means is that, to get sensible iteration, all you need is an ordinary iterator that produces future objects instead of reporting the results directly. You can then either call this operator with "yield from", in which case the individual results will be ignored and the first failure will abort the iteration, *or* you can invoke it with an explicit for loop, which will be enough to give you control over how exceptions are handled by means of an ordinary try/except block rather than a complicated exception chain. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From solipsis at pitrou.net Tue Oct 16 11:30:01 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 16 Oct 2012 11:30:01 +0200 Subject: [Python-ideas] The async API of the future: yield-from References: <507B4F9D.6040701@canterbury.ac.nz> <507BBD53.5000602@canterbury.ac.nz> <4ee5fb995c854c1699bc4edfde9e1ae6@BY2PR03MB596.namprd03.prod.outlook.com> Message-ID: <20121016113001.3f5e9415@pitrou.net> On Mon, 15 Oct 2012 19:19:29 -0700 Guido van Rossum wrote: > > Here you're basically arguing for greenlets/gevent -- you're saying > you just don't want to put up with the yields everywhere. But the > popularity of Twisted and Tornado show that at least some people are > willing to make even bigger sacrifices in order to be able to do async > I/O efficiently -- i.e., to solve the C10K problem that Christian > Tismer referred to (http://www.kegel.com/c10k.html, > http://en.wikipedia.org/wiki/C10k_problem). To be honest, one of the selling points of Twisted is not that it solves the C10k problem, it's that it's a comprehensive network programming toolkit. I'd bet many users of Twisted don't care that much about the single-thread event-loop approach, and don't have C10k-like problems. Regards Antoine. 
-- Software development and contracting: http://pro.pitrou.net From solipsis at pitrou.net Tue Oct 16 11:43:15 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 16 Oct 2012 11:43:15 +0200 Subject: [Python-ideas] The async API of the future: yield-from References: <507B4F9D.6040701@canterbury.ac.nz> <507BBD53.5000602@canterbury.ac.nz> <507CF308.2050803@canterbury.ac.nz> Message-ID: <20121016114315.7b967a64@pitrou.net> On Tue, 16 Oct 2012 17:48:24 +1000 Nick Coghlan wrote: > > def _wait_first(futures): > # futures must be a set, items will be removed as they complete > signal = Future() > def chain_result(completed): > futures.remove(completed) > if completed.cancelled(): > signal.cancel() > signal.set_running_or_notify_cancel() > elif completed.done(): > signal.set_result(completed.result()) > else: > signal.set_exception(completed.exception()) > for f in futures: > f.add_done_callback(chain_result) > return signal > > def wait_first(futures): > return _wait_first(set(futures)) > > def as_completed(futures): > remaining = set(futures) > while 1: > if not remaining: > break > yield _wait_first(remaining) > > @task > def load_url_async(url) > return url, (yield urllib.urlopen_async(url)).read() > > @task > def example(urls): > for get_next_page in as_completed(load_url_async(url) for url in urls): > try: > url, data = yield get_next_page > except Exception as exc: > print("Something broke: {}".format(exc)) > else: > print("Loaded {} bytes from {!r}".format(len(data), url)) Your example looks rather confusing to me. There are a couple of things I don't understand: - why does load_url_async return something instead of yielding it? - how does overlapping of reads happen? you seem to consider that a read() will be non-blocking once the server starts responding to your request, which is only true if the response is small (or you have a very fast connection to the server). - if read() is really non-blocking, why do you yield get_next_page? What does that achieve? Actually, what is yielding a tuple supposed to achieve at all? - where is control transferred over to the scheduler? it seems it's only in get_next_page, while I would expect it to be transferred in as_completed as well. Regards Antoine. From shane at umbrellacode.com Tue Oct 16 14:04:33 2012 From: shane at umbrellacode.com (Shane Green) Date: Tue, 16 Oct 2012 05:04:33 -0700 Subject: [Python-ideas] re-implementing Twisted for fun and profit In-Reply-To: <79BE3A91-9A01-4632-97D2-1761999FAA97@twistedmatrix.com> References: <40862DD9-DF71-4280-A47F-B20E7E742254@twistedmatrix.com> <42AC178D-A7E1-47D7-8B83-F2F6B390BE1C@twistedmatrix.com> <58AA33EF-BF3C-4725-BD4A-743EA3E26266@umbrellacode.com> <537E074C-49B5-47A2-978F-D0592862B74E@twistedmatrix.com> <09394CD5-9950-49CF-A25E-C906B70F3BC9@umbrellacode.com> <79BE3A91-9A01-4632-97D2-1761999FAA97@twistedmatrix.com> Message-ID: <5BEBE586-52E1-40C7-8398-17744FB6A512@umbrellacode.com> You make an excellent point about the different levels being discussed. Yes, you understand my point well. For some reason I've always hated thinking of the promise as immutable, but that's the normal terminology. The reality is that a Promise represents the output of an operation, and will emit the output of that operation to all callers that register with it. The promise doesn't pass itself as the value to the callbacks, so its immutability is somewhat immaterial. I'm not arguing with you on that point, just the general description of the pattern. 
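A minimal sketch of that behaviour (illustrative only: register() is a
stand-in name, and then()-style chaining of transformed results is
deliberately left out so that only the fan-out is shown):

class Promise:
    _PENDING = object()

    def __init__(self):
        self._value = Promise._PENDING
        self._callbacks = []

    def register(self, callback):
        # every registered caller sees the operation's original output
        if self._value is Promise._PENDING:
            self._callbacks.append(callback)
        else:
            callback(self._value)          # already resolved: fire right away

    def resolve(self, value):
        self._value = value
        callbacks, self._callbacks = self._callbacks, []
        for callback in callbacks:
            callback(value)                # the promise itself is never the value

p = Promise()
p.register(lambda v: print("first caller got", v))
p.register(lambda v: print("second caller got", v))
p.resolve(42)                              # both callbacks receive 42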
The more I think about it, the more I'm realizing how inappropriate something like a deferred or promise is to this discussion. Unfortunately my knowledge of coroutines is somewhat limited, and my time the lasts couple of days, and the next couple, is preventing me from giving it a good think through. I understand them well enough to know they're cool, and I'm pretty sure I like the idea of making them the event loop mechanism. I think it would be good for us all to continuously revisit concrete examples during the discussion, because the set of core I/O are small enough to revisit multiple times. If a much more general mechanism naturally falls out then great. Shane Green www.umbrellacode.com 805-452-9666 | shane at umbrellacode.com On Oct 15, 2012, at 8:51 AM, Glyph wrote: > > On Oct 15, 2012, at 1:03 AM, Shane Green wrote: > >> Namely, all callbacks registered with a given Promise instance, receive the output of the original operation > > This is somewhat tangential to the I/O loop discussion, and my hope for that discussion is that it won't involve Deferreds, or Futures, or Promises, or any other request/response callback management abstraction, because requests and responses are significantly higher level than accept() and recv() and do not belong within the same layer. The event loop ought to provide tools to experiment with event-driven abstractions so that users can use Deferreds and Promises - which are, fundamentally, perfectly interoperable, and still use standard library network protocol implementations. > > What I think you were trying to say was that callback addition on Deferreds is a destructive operation; whereas your promises are (from the caller's perspective, at least) immutable. Sometimes I do think that the visibly mutable nature of Deferreds was a mistake. If I read you properly though, what you're saying is that you can do this: > > promise = ... > promise.then(alpha).then(beta) > promise.then(gamma).then(delta) > > and in yield-coroutine style this is effectively: > > value = yield promise > beta(yield alpha(value)) > delta(yield gamma(value)) > > This deficiency is reasonably easy to work around with Deferreds. You can just do: > > def fork(d): > dprime = Deferred() > def propagate(result): > dprime.callback(result) > return result > d.addBoth(propagate) > return dprime > > and then: > > fork(x).addCallback(alpha).addCallback(beta) > fork(x).addCallback(gamma).addCallback(delta) > > Perhaps this function should be in Twisted; it's certainly come up a few times. > > But, the fact that the original result is immediately forgotten can also be handy, because it helps the unused result get garbage collected faster, even if multiple things are hanging on to the Deferred after the initial result has been processed. And it is actually pretty unusual to want to share the same result among multiple callers (which is why this function hasn't been added to the core yet). > > -glyph -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Tue Oct 16 14:08:15 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 16 Oct 2012 22:08:15 +1000 Subject: [Python-ideas] The async API of the future: yield-from In-Reply-To: <20121016114315.7b967a64@pitrou.net> References: <507B4F9D.6040701@canterbury.ac.nz> <507BBD53.5000602@canterbury.ac.nz> <507CF308.2050803@canterbury.ac.nz> <20121016114315.7b967a64@pitrou.net> Message-ID: On Tue, Oct 16, 2012 at 7:43 PM, Antoine Pitrou wrote: > Your example looks rather confusing to me. 
There are a couple of things > I don't understand: > > - why does load_url_async return something instead of yielding it? It yields *and* returns, that's the way Guido's API works (as I understand it). However, some of the other stuff was just plain mistakes in my example. Background (again, as I understand it, and I'm sure Guido will correct me if I'm wrong. So, if you think this sounds crazy, *please wait until Guido clarifies* before worrying too much about it): - the "@task" decorator is the part that knows how to interface generators with the event loop (just as @contextmanager adapts between generators and with statements). I believe it handles these things: - when you call it, it creates the generator object and calls next() to advance it to the first yield point - this initial call returns a Future that will fire only when the entire *task* is complete - if a Future is yielded by the underlying generator, the task wrapper adds the appropriate callback to ensure results are pushed back into the underlying generator on completion of the associated operation - when one of these callbacks fires, the generator is advanced and a yielded Future is processed in the same fashion - if at any point the generator finishes instead of yielding another Future, then the callback will call the appropriate notification method on the originally *returned* Future - yielding anything other than a Future from a tasklet is not permitted - it's the IO operations themselves that know how to kick off operations and register the appropriate callbacks with the event loop to get the Future to be triggered - The Future object API is documented in concurrent.futures: http://docs.python.org/py3k/library/concurrent.futures#future-objects I've now posted this example as a gist (https://gist.github.com/3898874), so it should be a easier to read over there. However, I've included it inline below as well. - This first part in my example is a helper function to wait for any one of a set of Futures to be signalled and help keep track of which ones we're still waiting for def _wait_first(futures): # futures must be a set as items will be removed as they complete # we create a signalling future to return to our caller. 
We will copy # the result of the first future to complete to this signalling future signal = Future() def copy_result(completed): # We ignore every callback after the first one if signal.done(): return # Keep track of which ones have been processed across multiple calls futures.remove(completed) # It would be nice if we could also remove our callback from all the other futures at # this point, but the Future API doesn't currently allow that # Now we pass the result of this future through to our signalling future if completed.cancelled(): signal.cancel() signal.set_running_or_notify_cancel() else: try: result = completed.result() except Exception as exc: signal.set_exception(exc) else: signal.set_result(result) # Here we hook our signalling future up to all our actual operations # If any of them are already complete, then the callback will fire immediately # and we're OK with that for f in futures: f.add_done_callback(copy_result) # And, for our signalling future to be useful, the caller must be able to access it return signal - This is just a public version of the above helper that works with arbitrary iterables: def wait_first(futures): # Helper needs a real set, so we give it one # Also makes sure all operations start immediately when passed a generator return _wait_first(set(futures)) - This is the API I'm most interested in, as it's the async equivalent of http://docs.python.org/py3k/library/concurrent.futures#concurrent.futures.as_completed, which powers this URL retrieval example: http://docs.python.org/py3k/library/concurrent.futures#threadpoolexecutor-example # Note that this is an *ordinary iterator*, not a tasklet def as_completed(futures): # We ensure all the operations have started, and get ourselves a set to work with remaining = set(futures) while remaining: # The trick here is that we *don't yield the original futures directly* # Instead, we yield yield _wait_first(remaining) And now a more complete, heavily commented, version of the example: # First, a tasklet for loading a single page @task def load_url_async(url) # The async URL open operation does three things: # 1. kicks off the connection process # 2. registers a callback with the event handler that will signal a Future object when IO is complete # 3. returns the future object # We then *yield* the Future object, at which point the task decorator takes over and registers a callback # with the *Future* object to resume this generator with the *result* that was passed to the Future object conn = yield urllib.urlopen_async(url) # We assume "conn.read()" is defined in such a way that it allows both "read everything at once" usage *and* a # usage where you read the individual bits of data as they arrive like this: # for wait_for_chunk in conn.read(): # chunk = yield wait_for_chunk # The secret is that conn.read() would be an *ordinary generator* in that case rather than a tasklet. 
# You could also do a version that *only* supported complete reads, in which case the "from" wouldn't be needed data = yield from conn.read() # We return both the URL *and* the data, so our caller doesn't have to keep track of which url the data is for return url, data # And now the payoff: defining a tasklet to read a bunch of URLs in parallel, processing them in the order of loading rather than the order of requesting them or having to wait until the slowest load completes before doing anything @task def example(urls): # We define the tasks we want to run based on the given URLs # This results in an iterable of Future objects that will fire when # the associated page has been read completely tasks = (load_url_async(url) for url in urls) # And now we use our helper iterable to run things in parallel # and get access to the results as they complete for wait_for_page in as_completed(tasks): try: url, data = yield wait_for_page except Exception as exc: print("Something broke for {!r} ({}: {})".format(url, type(exc), exc)) else: print("Loaded {} bytes from {!r}".format(len(data), url)) # The real kicker here? Replace "yield wait_for_page" with "wait_for_page.result()" and you have the equivalent concurrent.futures code. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From Steve.Dower at microsoft.com Tue Oct 16 16:21:10 2012 From: Steve.Dower at microsoft.com (Steve Dower) Date: Tue, 16 Oct 2012 14:21:10 +0000 Subject: [Python-ideas] The async API of the future: yield-from In-Reply-To: References: <507B4F9D.6040701@canterbury.ac.nz> <507BBD53.5000602@canterbury.ac.nz> <507CF308.2050803@canterbury.ac.nz> <20121016114315.7b967a64@pitrou.net>, Message-ID: > It yields *and* returns, that's the way Guido's API works (as I understand it). I can't speak for Guido obviously, but you've certainly described what we came up with perfectly (http://pastebin.com/ndS53Cd8, the _Awaiter class starts on line 93). > # The real kicker here? Replace "yield wait_for_page" with > "wait_for_page.result()" and you have the equivalent > concurrent.futures code. Basically, the task/tasklet/async decorator aggregates the futures from inside the wrapped method and exposes a single future to the caller. Your example doesn't even need a scheduler or event loop, and we found that all the event loop was really doing was running the callbacks in a certain thread/context/equivalent. And because there is a future coming out of every call, the user can choose when to stop using tasklets and go back to using plain old futures (or whatever subclasses have been used). From solipsis at pitrou.net Tue Oct 16 17:10:37 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 16 Oct 2012 17:10:37 +0200 Subject: [Python-ideas] The async API of the future: yield-from References: <507BBD53.5000602@canterbury.ac.nz> <507CF308.2050803@canterbury.ac.nz> <20121016114315.7b967a64@pitrou.net> Message-ID: <20121016171037.4dc9bf24@pitrou.net> On Tue, 16 Oct 2012 14:21:10 +0000 Steve Dower wrote: > > It yields *and* returns, that's the way Guido's API works (as I understand it). > > I can't speak for Guido obviously, but you've certainly described what we came up with perfectly (http://pastebin.com/ndS53Cd8, the _Awaiter class starts on line 93). > > > # The real kicker here? Replace "yield wait_for_page" with > > "wait_for_page.result()" and you have the equivalent > > concurrent.futures code. 
> > Basically, the task/tasklet/async decorator aggregates the futures from inside the wrapped method and exposes a single future to the caller. Your example doesn't even need a scheduler or event loop, and we found that all the event loop was really doing was running the callbacks in a certain thread/context/equivalent. I'm sure doing concurrent I/O will require an event loop, unless you use threads under the hood... Regards Antoine. -- Software development and contracting: http://pro.pitrou.net From ironfroggy at gmail.com Tue Oct 16 17:18:22 2012 From: ironfroggy at gmail.com (Calvin Spealman) Date: Tue, 16 Oct 2012 11:18:22 -0400 Subject: [Python-ideas] The async API of the future: yield-from In-Reply-To: <507CB078.8040506@canterbury.ac.nz> References: <5078F6B1.2030309@canterbury.ac.nz> <507B4F9D.6040701@canterbury.ac.nz> <507BBD53.5000602@canterbury.ac.nz> <507CB078.8040506@canterbury.ac.nz> Message-ID: On Mon, Oct 15, 2012 at 8:55 PM, Greg Ewing wrote: > Calvin Spealman wrote: > >> I'm still -1 on delegating control to subgenerators with yield-from, >> versus having the scheduler just deal with them directly. > > Do you mean to *disallow* using yield-from for this, or just > not to encourage it? > > I don't see how you *could* disallow it; there's no way for > the scheduler to know whether one of the generators it's > handling is delegating using yield-from. > > I also can't see any reason you would want to discourage it. > Given that yield-from exists, it's an obvious thing to do. I have since changed my position slightly. I don't want to disallow it, no. I don't want to discourage, no. But, I do think *both* are useful. I think "yield from" is the obvious way to "call" between tasks, but that there are other cases when we want to spawn a task to begin without blocking our task, and that "yield" should be used here. We should be table to simply yield a task to tell the scheduler to start it, possibly getting a Future in return which we can use to get the eventual result. This may make it easier to do multiple sub-tasks together. We might yield N tasks, and then "yield from wait(futures)" to wait for them all to complete. > -- > Greg > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas -- Read my blog! I depend on your acceptance of my opinion! I am interesting! http://techblog.ironfroggy.com/ Follow me if you're into that sort of thing: http://www.twitter.com/ironfroggy From ironfroggy at gmail.com Tue Oct 16 17:20:53 2012 From: ironfroggy at gmail.com (Calvin Spealman) Date: Tue, 16 Oct 2012 11:20:53 -0400 Subject: [Python-ideas] The async API of the future: yield-from In-Reply-To: References: <5078F6B1.2030309@canterbury.ac.nz> <507B4F9D.6040701@canterbury.ac.nz> <507C6EB7.3090601@canterbury.ac.nz> Message-ID: On Mon, Oct 15, 2012 at 10:10 PM, Guido van Rossum wrote: > On Mon, Oct 15, 2012 at 1:14 PM, Greg Ewing wrote: >> Nick Coghlan wrote: >> >>> The main primitive I personally want out of an async API is a >>> task-based equivalent to concurrent.futures.as_completed() [1]. This >>> is what I meant about iteration being a bit of a mess: the way the >>> as_completed() works, the suspend/resume channel of the iterator >>> protocol is being used to pass completed future objects back to the >>> calling iterator. 
That means that channel *can't* be used to talk >>> between the coroutine and the scheduler, >> >> >> I had to read this a couple of times before I figured out >> what you're talking about, but I get it now. >> >> This is an instance of a general problem that was noticed >> back when I was discussing my cofunctions idea: using >> generator-based coroutines, it's not possible to have a >> "suspendable iterator", because that would require "yield" >> to have two conflicting meanings: "suspend this coroutine" >> on one hand, and "provide a value to my caller" on the >> other. >> >> Unfortunately, I suspect that a truly elegant solution to this >> problem will require yet another language addition -- something >> like >> >> yield for item in subtask(): >> ... >> >> which would run a slightly different version of the iterator >> protocol in which values to be yield are wrapped somehow >> (I haven't figured out all the details yet). > > I think I ran into a similar issue with NDB when defining iteration > over an asynchronous query. My solution: > > q = > it = q.iter() # Fire off the query to the datastore > while (yield it.has_next_async()): # Block until one result > emp = it.next() # Get the result that was buffered on the iterator > print emp.name, emp.age # Use it Crazy Idea I Probably Don't Actually Want: for yield emp in q: print emp.name, emp.age Turns into something like: _it = iter(q) for _emp in _it: emp = yield _emp print emp.name, emp.age > -- > --Guido van Rossum (python.org/~guido) > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas -- Read my blog! I depend on your acceptance of my opinion! I am interesting! http://techblog.ironfroggy.com/ Follow me if you're into that sort of thing: http://www.twitter.com/ironfroggy From ironfroggy at gmail.com Tue Oct 16 17:27:54 2012 From: ironfroggy at gmail.com (Calvin Spealman) Date: Tue, 16 Oct 2012 11:27:54 -0400 Subject: [Python-ideas] The async API of the future: yield-from In-Reply-To: <507D109F.3020302@canterbury.ac.nz> References: <5078F6B1.2030309@canterbury.ac.nz> <507BA60B.2030806@canterbury.ac.nz> <507D109F.3020302@canterbury.ac.nz> Message-ID: On Tue, Oct 16, 2012 at 3:45 AM, Greg Ewing wrote: > Calvin Spealman wrote: >> >> If we allow spawn(task()) >> then we're not getting nice tracebacks anyway, so I think we should >> allow >> >> future1 = yield task1() # spawn task >> future2 = yield task2() # spawn other task > > > I don't think it's necessary to allow 'yield task' as a > method of spawning in order to get nice tracebacks for > spawned tasks. Necessary, no. But I think it feels obvious that you yield things you are waiting on, and so you want to start a task if you yield it. Also, its going to be a common primitive, so I think it should be very easy and clear to write. > In the Task-object-based system I'm thinking about, if > an exception reaches the top level of a Task, it gets > stored in the Task object until another task wait()s > for it, and then it continues to propagate. > > This makes sense, because the wait() establishes a > task-subtask relationship, so the traceback should > proceed from the subtask to the waiting task. What if two tasks call wait() on the same subtask which raises an error? I think we should let errors propagate through yield-from, primarily. That's what it exists for. >> Both are primitives we >> >> need to support as first-class operation. 
That is, without some wrapper >> like spawn(). > > > In my system, spawn() isn't a wrapper -- it *is* the > primitive way to create an independent task. And I > think it's the only one we need. It has to know what scheduler to talk to, right? We might want to allow multiple schedulers, and tasks shouldn't know who their scheduler is (right?) so that is another advantage of "yield task()" > -- > Greg > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas -- Read my blog! I depend on your acceptance of my opinion! I am interesting! http://techblog.ironfroggy.com/ Follow me if you're into that sort of thing: http://www.twitter.com/ironfroggy From Steve.Dower at microsoft.com Tue Oct 16 18:31:55 2012 From: Steve.Dower at microsoft.com (Steve Dower) Date: Tue, 16 Oct 2012 16:31:55 +0000 Subject: [Python-ideas] The async API of the future: yield-from In-Reply-To: <20121016171037.4dc9bf24@pitrou.net> References: <507BBD53.5000602@canterbury.ac.nz> <507CF308.2050803@canterbury.ac.nz> <20121016114315.7b967a64@pitrou.net> <20121016171037.4dc9bf24@pitrou.net> Message-ID: > I'm sure doing concurrent I/O will require an event loop, unless you use threads under the hood... Polling I/O will require some sort of loop, yes, but I/O that triggers a callback at the OS level (such as ReadFileEx and WriteFileEx on Windows) doesn't need it. Of course, without an event loop you still need to wait on the future - for polling I/O you could return a subclassed future where waiting starts the polling loop if there isn't a better event loop available. My view is that the most helpful thing to have in the standard is a way for any code to find and interact with an event loop - if we can discover a scheduler/context/loop/whatever and use its commands for "run this callable as soon as you can" and "run this callable when this condition is true" then we can have portable support for polling or event-based I/O (as well as being able to handle other thread-sensitive code such as in UIs). For optimal support, you'll need to have very close coupling between the scheduler and the asynchronous operations. This can be built on top of the portable support, but aiming for optimal support initially is a good way to make this API painful to use and more likely to be ignored. From solipsis at pitrou.net Tue Oct 16 18:39:59 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 16 Oct 2012 18:39:59 +0200 Subject: [Python-ideas] The async API of the future: yield-from References: <507CF308.2050803@canterbury.ac.nz> <20121016114315.7b967a64@pitrou.net> <20121016171037.4dc9bf24@pitrou.net> Message-ID: <20121016183959.481b4823@pitrou.net> On Tue, 16 Oct 2012 16:31:55 +0000 Steve Dower wrote: > > I'm sure doing concurrent I/O will require an event loop, unless you use threads under the hood... > > Polling I/O will require some sort of loop, yes, but I/O that triggers a callback at the OS level (such as ReadFileEx and WriteFileEx on Windows) doesn't need it. Well, how do you plan for that callback to execute Python code? Regards Antoine. 
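One common answer, sketched minimally (purely illustrative, and not the
context implementation referred to elsewhere in the thread): the OS-level
callback does nothing except post a message, and the Python code runs
later, on the thread that owns the loop.

import threading, queue

class MessageLoop:
    # minimal "leave a message for the main thread" loop
    def __init__(self):
        self._calls = queue.Queue()

    def post(self, func, *args):
        # safe to call from any thread, e.g. from an I/O completion callback
        self._calls.put((func, args))

    def run(self):
        while True:
            func, args = self._calls.get()    # block until a message arrives
            if func is None:
                break
            func(*args)                       # executed on the loop's thread

    def stop(self):
        self.post(None)

loop = MessageLoop()

def handle_data(data):
    print("handled on the main thread:", data)
    loop.stop()

def completion_callback(data):
    # pretend the OS invoked this on some arbitrary worker thread
    loop.post(handle_data, data)

threading.Timer(0.1, completion_callback, args=("some bytes",)).start()
loop.run()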
From ironfroggy at gmail.com Tue Oct 16 18:54:37 2012 From: ironfroggy at gmail.com (Calvin Spealman) Date: Tue, 16 Oct 2012 12:54:37 -0400 Subject: [Python-ideas] The async API of the future: yield-from In-Reply-To: References: <507BBD53.5000602@canterbury.ac.nz> <507CF308.2050803@canterbury.ac.nz> <20121016114315.7b967a64@pitrou.net> <20121016171037.4dc9bf24@pitrou.net> Message-ID: On Tue, Oct 16, 2012 at 12:31 PM, Steve Dower wrote: >> I'm sure doing concurrent I/O will require an event loop, unless you use threads under the hood... > > Polling I/O will require some sort of loop, yes, but I/O that triggers a callback at the OS level (such as ReadFileEx and WriteFileEx on Windows) doesn't need it. > > Of course, without an event loop you still need to wait on the future - for polling I/O you could return a subclassed future where waiting starts the polling loop if there isn't a better event loop available. What if the event poll was just inside a task, not requiring any loop in the scheduler, or even knowledge by the scheduler, in any way? An extremely rudimentary version: class Selector(object): def __init__(self): self.r = [] self.w = [] self.x = [] self.futures = {} def add(self, t, fd, future): self.futures[fd] = future getattr(self, t).append(fd) def __iter__(self): return self def __next__(self): r = [fd for fd,future in self.r] w = [fd for fd,future in self.w] x = [fd for fd,future in self.x] r, w, x = select(r, w, x) for fd in chain(r, w, x): self.futures.pop(fd).set_result(fd) for fd in r: self.r.remove(fd) for fd in w: self.w.remove(fd) for fd in x: self.x.remove(fd) This, if even to the scheduler, would handle polling completely outside the scheduler, which makes it easier to mix and match event loops you need to use in a single project. I know I probably got details wrong. > My view is that the most helpful thing to have in the standard is a way for any code to find and interact with an event loop - if we can discover a scheduler/context/loop/whatever and use its commands for "run this callable as soon as you can" and "run this callable when this condition is true" then we can have portable support for polling or event-based I/O (as well as being able to handle other thread-sensitive code such as in UIs). > > For optimal support, you'll need to have very close coupling between the scheduler and the asynchronous operations. This can be built on top of the portable support, but aiming for optimal support initially is a good way to make this API painful to use and more likely to be ignored. > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas -- Read my blog! I depend on your acceptance of my opinion! I am interesting! http://techblog.ironfroggy.com/ Follow me if you're into that sort of thing: http://www.twitter.com/ironfroggy From Steve.Dower at microsoft.com Tue Oct 16 19:04:34 2012 From: Steve.Dower at microsoft.com (Steve Dower) Date: Tue, 16 Oct 2012 17:04:34 +0000 Subject: [Python-ideas] The async API of the future: yield-from In-Reply-To: <20121016183959.481b4823@pitrou.net> References: <507CF308.2050803@canterbury.ac.nz> <20121016114315.7b967a64@pitrou.net> <20121016171037.4dc9bf24@pitrou.net> <20121016183959.481b4823@pitrou.net> Message-ID: > Well, how do you plan for that callback to execute Python code? IMO, that is the most important question in all of this discussion. 
With any I/O some waiting is required - there must be a point where the application is not doing anything other than waiting for the I/O to complete, regardless of whether a loop is used or not. (Ideally the I/O is already complete by the time we start waiting.) The callbacks in the particular examples require a thread to be in an alertable wait state, which is basically equivalent to select(), though a little less discriminatory (as in, ANY I/O callback can interrupt an alertable wait). In my view, these callbacks should be 'leaving a message' for the main program to run a particular function when it next has a chance. Like an interrupt handler, the aim is to do the minimum amount of work and then get out of the way. Having a context (or event loop, message loop or whatever you want to call it) as I described in my last email lets us do the minimum amount of work. I posted our implementation of such a context earlier and Dino posted an example/recipe for using the concept with an existing event loop (Tcl). So while I said we don't _need_ an event loop, that relies on the asynchronous operations being on a separate thread or otherwise not requiring the current thread to pay any attention to them, AND assumes that the continuations are agile and can be run on any thread (or in any process, or whatever granularity you are working at). I believe some way of getting code running back where it started from is essential, and this is most easily done with a loop. From ironfroggy at gmail.com Tue Oct 16 19:25:31 2012 From: ironfroggy at gmail.com (Calvin Spealman) Date: Tue, 16 Oct 2012 13:25:31 -0400 Subject: [Python-ideas] The async API of the future: yield-from In-Reply-To: References: <507CF308.2050803@canterbury.ac.nz> <20121016114315.7b967a64@pitrou.net> <20121016171037.4dc9bf24@pitrou.net> <20121016183959.481b4823@pitrou.net> Message-ID: On Tue, Oct 16, 2012 at 1:04 PM, Steve Dower wrote: >> Well, how do you plan for that callback to execute Python code? > > IMO, that is the most important question in all of this discussion. > > With any I/O some waiting is required - there must be a point where the application is not doing anything other than waiting for the I/O to complete, regardless of whether a loop is used or not. (Ideally the I/O is already complete by the time we start waiting.) The callbacks in the particular examples require a thread to be in an alertable wait state, which is basically equivalent to select(), though a little less discriminatory (as in, ANY I/O callback can interrupt an alertable wait). > > In my view, these callbacks should be 'leaving a message' for the main program to run a particular function when it next has a chance. Like an interrupt handler, the aim is to do the minimum amount of work and then get out of the way. I like this model as well. However, I recognize some problems with it. If we don't kick whatever handles the callback and result immediately, we are essentially re-introducing pre-emptive scheduling. If TaskA is waiting on the result of TaskB, and when TaskB finishes we say "OK, but we need to go let TaskC do something before TaskA is given that result" then we leave room for C to break things, modify state, and generally act in a less-than-determinable way. I really *like* this model better, I just don't know the best way to reconcile this problem. > Having a context (or event loop, message loop or whatever you want to call it) as I described in my last email lets us do the minimum amount of work. 
I posted our implementation of such a context earlier and Dino posted an example/recipe for using the concept with an existing event loop (Tcl). > > So while I said we don't _need_ an event loop, that relies on the asynchronous operations being on a separate thread or otherwise not requiring the current thread to pay any attention to them, AND assumes that the continuations are agile and can be run on any thread (or in any process, or whatever granularity you are working at). I believe some way of getting code running back where it started from is essential, and this is most easily done with a loop. > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas -- Read my blog! I depend on your acceptance of my opinion! I am interesting! http://techblog.ironfroggy.com/ Follow me if you're into that sort of thing: http://www.twitter.com/ironfroggy From Steve.Dower at microsoft.com Tue Oct 16 19:17:34 2012 From: Steve.Dower at microsoft.com (Steve Dower) Date: Tue, 16 Oct 2012 17:17:34 +0000 Subject: [Python-ideas] The async API of the future: yield-from In-Reply-To: References: <507BBD53.5000602@canterbury.ac.nz> <507CF308.2050803@canterbury.ac.nz> <20121016114315.7b967a64@pitrou.net> <20121016171037.4dc9bf24@pitrou.net> Message-ID: > What if the event poll was just inside a task, not requiring any loop in the scheduler, or even knowledge by the scheduler, in any way? I agree, every task can handle all the asynchrony within it and just expose a single 'completed' notification (a Future or similar) to its caller. This is the portable solution - it is going to be less than optimal in some cases, but is much more composable and extensible. As a Python developer, I like the model of "I call this function normally and it gives me a Future to let me know when it's done but I don't really know how it's doing it." (Incidentally, I like it as a C# and C++ developer too.) From glyph at twistedmatrix.com Tue Oct 16 20:15:06 2012 From: glyph at twistedmatrix.com (Glyph) Date: Tue, 16 Oct 2012 11:15:06 -0700 Subject: [Python-ideas] Expressiveness of coroutines versus Deferred callbacks (or possibly promises, futures) In-Reply-To: References: <91C1C15D-6E43-4F60-B65D-F45C6BAAB6F6@twistedmatrix.com> Message-ID: <4636067A-25BC-4D42-B872-BF132401F8BA@twistedmatrix.com> On Oct 15, 2012, at 6:51 PM, Guido van Rossum wrote: > (...) But seriously, thanks for repeating the explanation for my benefit. Glad it was useful. To be fair, I think this is the first time I've actually written the whole thing down. And I didn't even get the whole thing down, I missed the following important bit: > I see your example as a perfect motivation for adding some kind of map() primitive. (...) You're correct, of course; technically, a map() primitive resolves all the same issues. It's possible to do everything with generator coroutines that it's possible to do with callbacks explicitly; I shouldn't have made the case for sequencing callbacks on the basis that the behavior can't be replicated. And, modulo any of my other suggestions, a "map" primitive is a good idea - Twisted implements such a primitive with 'gatherResults' (although, of course, it works on any Deferred, not just those returned by inlineCallbacks). The real problem with generator coroutines is that if you make them the primitive, you have an abstraction inversion if you want to have callbacks (which, IMHO, are simply more straightforward in many cases). 
By using a generator scheduler, you're still using callbacks to implement the sequencing. At some point, you have to have some code calling x.next(), x.send(...), x.close(), and raising StopIteration(), but they are obscured by syntactic sugar. You still need a low-level callback-scheduling API to integrate with the heart of the event loop. One area where this abstraction inversion bites you is performance. Now, my experience might be dated here; I haven't measured in a few years, but as nice as generators can be for structuring complex event flows, that abstraction comes with a non-trivial performance cost. Exceptions in Python are much better than they used to be, but in CPython they're still not free. Every return value being replaced with a callback trampoline is bad, but replacing it instead with a generator being advanced, an exception being raised and a callback trampoline is worse. Of course, maybe inlineCallbacks is just badly implemented, but reviewing the implementation now it looks reasonably minimal. I don't want to raise the specter of premature optimization here; I'm not claiming that the implementation of the scheduler needs to be squeezed for every ounce of performance before anyone implements anything. But, by building in the requirement for these unnecessary gyrations to support syntax sugar for every request/response event-driven operation, one precludes the possibility of low-level optimizations for performance-sensitive event coordination later. Now, if a PyPy developer wants to chime in and tell me I'm full of crap, and either now or in the future StopIteration Exceptions will be free, and will actually send your CPU back in time, as well as giving a pet kitten as a present to a unicorn every time you 'raise', I'll probably believe it and happily retire this argument forever. But I doubt it. I'll also grant that it's possible that I'm just the equivalent of a crotchety old assembler programmer here, claiming that we can't afford these fancy automatic register allocators and indirect function calls and run-time linking because they'll never be fast enough for real programs. But I will note that, rarely as you need it, assembler does still exist at some layer of the C compiler stack, and you can write it yourself if you really want to; nothing will get in your way. So that's mainly the point I'm trying to make about a Deferred-like abstraction. Aside from matters of taste and performance, you need to implement your generator coroutines in terms of something, and it might as well be something clean and documented that can be used by people who feel they need it. This will also help if some future version of Python modifies something about the way that generators work, similar to the way .send() opened the door for non-ugly coroutines in the first place. Perhaps some optimized version of 'return' with a value? If the coroutine scheduler is firmly in terms of some other eventual-result API (Deferreds, Futures, Promises), then adding support to that scheduler for @yield_coroutine_v2 should be easy; as would adding support for other things I don't like, like tasklets and greenlets ;). > It also handles the input arriving in batches (as they do for App Engine Datastore queries). (...) >> ... I think I've mentioned already in one of my previous posts. ... > NDB's map() does this. I'm curious as to how this works. If you are getting a Future callback, don't you only get that once? 
How do you re-sequence all of your generators to run the same step again when more data is available? > In general, whenever you want parallelism in Python, you have to introduce a new function, unless you happen to have a suitable function lying around already; I'm glad we agree there, at least :). > so I don't feel I am contradicting myself by proposing a mechanism using callbacks here. It's the callbacks for sequencing that I dislike. Earlier I was talking about implementing event sequencing as callbacks, which you kind of have to do either way. Separately, there's the issue of presenting event sequencing as control flow. While this is definitely useful for high-level applications - at my day job, about half the code I write is decorated with @inlineCallbacks - these high-level applications depend on a huge amount of low-level code (protocol parsers, database bindings, thread pools) being written and exhaustively tested, whose edge cases are much easier to exhaustively flesh out with explicit callbacks. When you need to test a portion of the control flow, there's no need to fool a generator into executing down to a specific branch point; you just pull out the callback to a top-level name rather than a closure and call it directly. Also, correct usage of generator coroutines depends on a previous understanding of event-driven programming. This is why Twisted core maintainers are not particularly sanguine about inlineCallbacks and generally consider it a power-tool for advanced users rather than an introductory facility to make things easier. In our collective experience helping people understand both Deferreds and inlineCallbacks, there are different paths to enlightenment. When learning Deferreds, someone with no previous event-driven experience will initially be disgusted; why does their code have to look like such a mess? Then they'll come to terms with the problem being solved and accept it, but move on to being perplexed: what the heck are these Deferreds doing, anyway? Finally they start to understand what's happening and move on to depending on the reactor to much, and are somewhat baffled by callbacks never being called. Finally they realize they should start testing their code by firing Deferreds synchronously and inspecting results, and everything starts to come together. Keep in mind, as you read the following, that I probably couldn't do my job as effectively without inlineCallbacks and I am probably its biggest fan on the Twisted team, also :). When learning with inlineCallbacks, someone with no previous event-driven experience will usually be excited. The 'yield's are weird, but almost exciting - it makes the code feel more advanced somehow, and they sort of understand the concurrency implications, but not really. It's also convenient! They just sprinkle in a 'yield' any time they need to make a call that looks like maybe it'll block sometimes. Everything works okay for a while, and then (inevitably, it seems) they happen across some ordering bug and just absolutely cannot figure out, which causes state corruption (because they blithely stuck a 'yield' between two things that really needed to be in an effective critical section) or hangs (generators hanging around waiting on un-fired Deferreds so you don't even get the traceback out of GC closing them because something's keeping a reference to them; harder to debug even than "normal" unfired Deferreds because they're not familiar with how to inspect or trace the flow of event execution, since the code looked "normal"). 
Now, this is easier to back out of than a massive multithreaded (read) mess, because the code does at least have a finite number of visible task-switch points, and it's usually possible to track it down with some help. But the experience is not pleasant, because by this point there are usually 10-deep call-stacks of generator-calling-a-generator-calling-a-generator and, especially in the problematic cases, it's not clear what got started from where. inlineCallbacks is a great boon to promoting Twisted usage, because some people never make it out of the "everything works okay for a while" phase, and it's much easier to get started. We certainly support it as best we can - optimize it, add debugging information to it - because we want people to have the best experience possible. So it's not like it's unmaintained or anything. But, without Deferreds to fall back down to in order to break down sequencing into super explicit, individual steps, without any potentially misleading syntactic sugar, I don't know how we'd help these folks. I have a few thoughts on how our experiences have differed here, since I'm assuming you don't hear these sorts of complaints about NDB. One is that Twisted users are typically dealing with a truly bewildering diversity of events, whereas NDB is, as you said, mostly a database client. It's not entirely unusual for a Twisted application to be processing events from a serial port, a USB device, some PTYs, a couple of server connections, some timed events, some threads (usually database connections) and some HTTP client connections. Another is that we only hear from users with problems. Maybe there are millions of successful users of inlineCallbacks who have architected everything from tiny scripts to massive distributed systems without ever needing to say so much as a how-do-you-do to the Twisted mailing list or IRC channel. (Somehow I doubt this is completely accurate but maybe it accounts for some of our perspective.) Nevertheless I feel like the strategy of backing out a generator into lower-level discrete callback-sequenced operations is a very important tool in the debugging toolbox. > Or maybe map_async()'s Future's result should be a set? Well really it ought to be a dataflow of some kind so you can enumerate it as it's going :). But I think if the results arrive in some order you ought to be able to see that order in application code, even if you usually don't care. > I don't want to belabor this point, but it bugs me a little bit that we get so much feedback from the broader Python community along the lines of "Why doesn't Twisted do X? > > I don't think I quite said that. Sorry, I didn't mean to say that you did. I raised the point because people who do say things like that tend to cite your opinions that e.g. Monocle is something new and different as reasons why they thought that Twisted didn't do what it did. (I certainly sympathize with the pressure that comes along with everyone scrutinizing every word one says and trying to discover hidden meaning; I'm sure that in a message as long as this one, someone will draw at least five wrong conclusions from me, too.) > But I suspect it happens because Twisted is hard to get into. Part of it's a marketing issue. Like, if we just converted all of our examples to inlineCallbacks and let people trip over the problems we've seen later on, I'm sure we would get more adoption, and possibly not even a backlash later; people with bugs in their programs tend to think that there's a bug in their programs. 
They only blame the tools when the programs are hard to write in the first place. Part of it is a background issue. GUI programmers and people who have worked with multiplayer games instantly recognize what Deferreds are for and are usually up and running within minutes. People primarily with experience with databases and web servers - a pretty big audience, in this day and age - are usually mystified. But, there are intractable parts of it, too. The Twisted culture is all about extreme reliability and getting a good reputation for systems built using it, and I guess we've made some compromises about expanding our audience in service of that goal. > I suspect anything using higher-order functions this much has that problem; I feel this way about Haskell's Monads. I've heard several people who do know Haskell say things like "Deferreds are just a trivial linearization of the I/O eigenfunctor over the monadic category of callbacks" and it does worry me. I still think they're relatively straightforward - I invented them in one afternoon when I was about 20 and they have changed relatively little since then - but if they're actually a homomorphism of the lambda calculus over the event manifold as it approaches the monad limit (or whatever: does anyone else feel like Haskell people have great ideas, but they have sworn a solemn vow to only describe them in a language that can only be translated by using undiscovered stone tablets buried on the dark side of the moon?) then I can understand why some users have a hard time. > I wouldn't be surprised if many Twisted lovers are also closet (or not) Haskell lovers. There are definitely some appealing concepts there. Their 'async' package, for example, does everything in the completely wrong, naive but apparently straightforward way that Java originally did (asynchronous exceptions? communication via shared mutable state? everything's a thread? no event-driven I/O?) but I/O is so limited and the VM is so high tech that it might actually be able to work. I suppose I can best summarize my feelings as . Anyway, back on topic... > I don't feel very strongly about integrating GUI systems. IIRC Twisted has some way to integrate with certain GUI event loops. I don't think we should desire any more (but neither, less). Yeah, all we do is dispatch Twisted events from the GUI's loop, usually using the GUI's built-in support for sockets. So your GUI app runs as a normal app. You can, of course, return a Deferred from a function that prompts the user for input, and fire it from a GUI callback, and that'll all work fine: Deferreds don't actually depend on the reactor at all, so you can use them from any callback (they are only in the 'internet' package where the event loop goes for unfortunate historical reasons). -glyph -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthias at urlichs.de Tue Oct 16 20:40:07 2012 From: matthias at urlichs.de (Matthias Urlichs) Date: Tue, 16 Oct 2012 18:40:07 +0000 (UTC) Subject: [Python-ideas] asyncore: included batteries don't fit References: <20120926081718.GA20843@hephaistos.amsuess.com> <20121003144320.GA16485@hephaistos.amsuess.com> Message-ID: I'll have to put in my ..02? here ? Guido van Rossum writes: > (2) We're at a fork in the road here. On the one hand, we could choose > to deeply integrate greenlets/gevents into the standard library. Yes. I have two and a half reasons for this. (?) 
Ultimately I think that switching stacks around is always going to be faster than unwinding and re-winding things with yield(). (1) It's a whole lot easier to debug a problem with gevent than with anything which uses yield / Deferreds / asyncore / whatever. With gevent, you get a standard stack trace. With anything else, the "where did this call come from" information is not part of the call chain and thus is either unavailable, or will have to be carried around preemptively (with associated overhead). (2) Nothing against Twisted or any other async frameworks, but writing any nontrivial program in it requires warping my brain into something that's *not* second nature in Python, and never going to be. Python is not Javascript; if you want to use the "loads of callbacks" programming style, use node.js. Personal experience: I have written an interpreter for an asynchronous and vaguely Pythonic language which I use for home automation, my lawn sprinkers, and related stuff (which I should probably release in some form). The code was previously based on Twisted and was impossible to debug. It now uses gevent and Just Works. -- -- Matthias Urlichs From guido at python.org Tue Oct 16 21:58:18 2012 From: guido at python.org (Guido van Rossum) Date: Tue, 16 Oct 2012 12:58:18 -0700 Subject: [Python-ideas] Expressiveness of coroutines versus Deferred callbacks (or possibly promises, futures) In-Reply-To: <4636067A-25BC-4D42-B872-BF132401F8BA@twistedmatrix.com> References: <91C1C15D-6E43-4F60-B65D-F45C6BAAB6F6@twistedmatrix.com> <4636067A-25BC-4D42-B872-BF132401F8BA@twistedmatrix.com> Message-ID: On Tue, Oct 16, 2012 at 11:15 AM, Glyph wrote: [lots] It'll be days before I digest all of that. But thank you very much for writing it all up. You bring up all sorts of interesting issues. I think I would like to start discovering some of the issues by writing an extensive prototype using Greg Ewing's model -- it is the most radical but therefore most worthy of some serious prototyping before either adopting or rejecting it. -- --Guido van Rossum (python.org/~guido) From guido at python.org Tue Oct 16 22:07:12 2012 From: guido at python.org (Guido van Rossum) Date: Tue, 16 Oct 2012 13:07:12 -0700 Subject: [Python-ideas] The async API of the future: yield-from In-Reply-To: <507CF308.2050803@canterbury.ac.nz> References: <5078F6B1.2030309@canterbury.ac.nz> <507B4F9D.6040701@canterbury.ac.nz> <507BBD53.5000602@canterbury.ac.nz> <507CF308.2050803@canterbury.ac.nz> Message-ID: On Mon, Oct 15, 2012 at 10:39 PM, Greg Ewing wrote: > Nick Coghlan wrote: >> >> (this is why I disagree with Greg that >> "yield from" can serve as the one true API - it doesn't handle partial >> iteration, and it doesn't handle pre- or post- processing around the >> suspension points while iterating). > > > I'm aware of the iteration problem, but I'm not convinced > that the convolutions necessary to make it possible to use > a for-loop for this are worth the bother, as opposed to > simply accepting that you can't use the for statement in > this situation, and using some other kind of loop. > > In any case, even if we decide to provide a scheduler > instruction to enable using for-loops on suspendable > iterators somehow, it doesn't follow that we should use > scheduler instructions for anything *else*. I don't see how we could ever have a for-loop that yields on every iteration step. The for-loop never uses yield. Thus there can be no direct equivalent to as_completed() in the PEP 380 or PEP 342 coroutine worlds. 
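A toy sketch of that constraint, with made-up names (wait_all() and run() are not from any proposed API, and the futures are resolved synchronously just to keep the example self-contained): the for statement's own call to next() cannot suspend the task, so any yield to the scheduler has to be written in the loop body.

    from concurrent.futures import Future

    def wait_all(futures):
        results = []
        for fut in futures:               # plain iteration: next() never suspends
            results.append((yield fut))   # the suspension point lives in the body
        return results

    def run(task):
        # Trivial trampoline: drive the generator, resolving each yielded Future.
        value = None
        while True:
            try:
                fut = task.send(value)
            except StopIteration as stop:
                return stop.value
            value = fut.result()          # a real scheduler would suspend here instead

    futs = []
    for n in (1, 2, 3):
        f = Future()
        f.set_result(n * 10)
        futs.append(f)
    print(run(wait_all(futs)))            # -> [10, 20, 30]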
> I would consider such a scheduler instruction to be a stopgap > measure until we can find a better solution -- just as > yield-from is a better solution than using "call" and "return" > scheduler instructions. I can already see the armchair language designers race to propose syntax the puts a yield keyword in the for-loop syntax at a point where it is currently not allowed. Let's nip that in the bud and focus on something that can work with Python 3.3. -- --Guido van Rossum (python.org/~guido) From guido at python.org Tue Oct 16 22:18:02 2012 From: guido at python.org (Guido van Rossum) Date: Tue, 16 Oct 2012 13:18:02 -0700 Subject: [Python-ideas] The async API of the future: yield-from In-Reply-To: <507D0AA8.6090509@canterbury.ac.nz> References: <5078F6B1.2030309@canterbury.ac.nz> <507BA60B.2030806@canterbury.ac.nz> <507D0AA8.6090509@canterbury.ac.nz> Message-ID: On Tue, Oct 16, 2012 at 12:20 AM, Greg Ewing wrote: > Guido van Rossum wrote: > >> But there needs to be another way to get a task running immediately >> and concurrently; I believe that would be >> >> a = spawn(foo_task()) >> >> right? One could then at any later point use >> >> ra = yield from a > > > Hmmm. I suppose it *could* be made to work that way, but I'm > not sure it's a good idea, because it blurs the distinction > between invoking a subtask synchronously and waiting for the > result of a previously spawned independent task. Are you sure you really want to distinguish between those though? In NDB they are intentionally the same -- invoking some API whose name ends in _async() starts an async subtask and returns a Future; you wait for the subtask by yielding the Future. Starting multiple tasks is just a matter of calling several _async() APIs; then you can wait for any or all of them using yield [future1, future2, ...] *or* by yielding the futures one at a time. This gives users a gentle introduction to concurrency (first they use the synchronous APIs; then they learn to use yield foo_async(); then they learn they can write: f = foo_async() r = yield f and finally they learn about spawning multiple tasks: f1 = foo_async() f2 = bar_async() rfoo, rbar = yield f1, f2 > Recently I've been thinking about an implementation where > it would look like this. First you do > > t = spawn(foo_task()) > > but what you get back is *not* a generator; rather it's > a Task object which wraps a generator and provides various > operations. One of them would be > > r = yield from t.wait() > > which waits for the task to complete and then returns its > value (or if it raised an exception, propagates the exception). > > Other operations that a Task object might support include > > t.unblock() # wake up a blocked task > t.cancel() # unschedule and clean up the task > t.throw(exception) # raise an exception in the task > > (I haven't included t.block(), because I think that should > be a stand-alone function that operates on the current task. > Telling some other task to block feels like a dodgy thing > to do.) Right. I'm looking forward to a larger example. >> One could also combine these and do e.g. >> >> a = spawn(foo_task()) >> b = spawn(bar_task()) >> >> ra, rb = yield from par(a, b) > > > If you're happy to bail out at the first exception, you > wouldn't strictly need a par() function for this, you could > just do > > > a = spawn(foo_task()) > b = spawn(bar_task()) > ra = yield from a.wait() > rb = yield from b.wait() > > >> Have I got the spelling for spawn() right? In many other systems (e.g. 
>> threads, greenlets) this kind of operation takes a callable, not the >> result of calling a function (albeit a generator). > > > That's a result of the fact that a generator doesn't start > running as soon as you call it. If you don't like that, the > spawn() operation could be defined to take an uncalled generator > and make the call for you. But I think it's useful to make the > call yourself, because it gives you an opportunity to pass > parameters to the task. Agreed, actually. I was just checking. >> If it takes a >> generator, would it return the same generator or a different one to >> wait for? > > > In your version above where you wait for the task simply > by calling it with yield-from, spawn() would have to return a > generator (or something with the same interface). But it > couldn't be the same generator -- it would have to be a wrapper > that takes care of blocking until the subtask is finished. That's fine with me (though Glyph would worry about creating too many objects). -- --Guido van Rossum (python.org/~guido) From greg.ewing at canterbury.ac.nz Tue Oct 16 22:27:53 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 17 Oct 2012 09:27:53 +1300 Subject: [Python-ideas] The async API of the future: yield-from In-Reply-To: References: <507B4F9D.6040701@canterbury.ac.nz> <507BBD53.5000602@canterbury.ac.nz> <507CF308.2050803@canterbury.ac.nz> <20121016114315.7b967a64@pitrou.net> Message-ID: <507DC349.7010208@canterbury.ac.nz> Nick Coghlan wrote: > # Note that this is an *ordinary iterator*, not a tasklet > def as_completed(futures): > # We ensure all the operations have started, and get ourselves > a set to work with > remaining = set(futures) > while remaining: > # The trick here is that we *don't yield the original > futures directly* > # Instead, we yield > yield _wait_first(remaining) I've just figured out how your as_completed() thing works, and realised that it's *not* a general solution to the suspendable-iterator problem. You're making use of the fact that you know *how many* values there will be ahead of time, even if you don't know what they are yet. In general this won't be the case. I don't think there is any trick that will allow a for-loop to be used in the general case, because in order for an iterator to be suspendable, the call to next() would need to be made using yield-from, and it's hidden inside the for-loop implementation. I know you probably weren't intending as_completed() to be a solution to the general suspendable-iterator problem. I just wanted to record my thoughts on this. -- Greg From greg.ewing at canterbury.ac.nz Tue Oct 16 22:48:51 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 17 Oct 2012 09:48:51 +1300 Subject: [Python-ideas] The async API of the future: yield-from In-Reply-To: References: <5078F6B1.2030309@canterbury.ac.nz> <507B4F9D.6040701@canterbury.ac.nz> <507BBD53.5000602@canterbury.ac.nz> <507CB078.8040506@canterbury.ac.nz> Message-ID: <507DC833.6050507@canterbury.ac.nz> Calvin Spealman wrote: > I think "yield from" is the obvious way to "call" between tasks, but that > there are other cases when we want to spawn a task to begin without > blocking our task, and that "yield" should be used here. I've thought of another problem with this. In my scheduler at least, simply spawning a task doesn't immediately allow that task, or any other, to run. Using "yield" to spell this operation gives the impression that it could be a suspension point, when it's actually not. 
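A tiny illustration of that point, using a made-up three-function scheduler (none of these names come from an actual proposal): spawn() only appends the new task to the ready queue, so the spawning task keeps running past that call without being suspended.

    from collections import deque

    ready = deque()

    def spawn(task):
        # Just enqueue the new task; the caller is not suspended.
        ready.append(task)

    def run():
        while ready:
            task = ready.popleft()
            try:
                next(task)            # run the task up to its next yield
                ready.append(task)
            except StopIteration:
                pass

    def child():
        print("child runs")
        yield

    def parent():
        spawn(child())
        print("parent keeps running after spawn()")  # no switch happened here
        yield                                        # the real suspension point
        print("parent resumes")

    spawn(parent())
    run()
    # Prints: parent keeps running after spawn(), then child runs, then parent resumes.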
It also forces anything that uses it to be called with "yield from", all the way up, so if you're relying on the presence of yield-froms to warn you of potential suspension points, you'll get false positives. -- Greg From greg.ewing at canterbury.ac.nz Tue Oct 16 23:14:11 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 17 Oct 2012 10:14:11 +1300 Subject: [Python-ideas] The async API of the future: yield-from In-Reply-To: References: <5078F6B1.2030309@canterbury.ac.nz> <507BA60B.2030806@canterbury.ac.nz> <507D109F.3020302@canterbury.ac.nz> Message-ID: <507DCE23.5080801@canterbury.ac.nz> Calvin Spealman wrote: > What if two tasks call wait() on the same subtask which raises an > error? That would be disallowed. A given Task would only be allowed to have its wait() method called once. The reason for this restriction is because of the way tracebacks are attached to exception objects in Python 3, which means that exceptions are effectively single-use now. If it weren't for that, the exception could simply be raised in *both* waiters. > I think we should let errors propagate through yield-from, > primarily. That's what it exists for. Yes, and that's exactly what my wait() mechanism does. You call the wait() method using yield-from. The important idea is that just because you spawn a task, it doesn't necessarily follow that you want to be regarded as the *parent* of that task and receive its exceptions. That only becomes clear when you wait() for it. >>In my system, spawn() isn't a wrapper -- it *is* the >>primitive way to create an independent task. And I >>think it's the only one we need. > > > It has to know what scheduler to talk to, right? Yes, but in my world, there is only *one* scheduler. I understand that not everyone thinks that's a good idea, and I'm thinking about ways to remove that restriction. But I'm not yet sure that it *should* be removed even if it can. It seems to me that having multiple schedulers is inviting many of the same problems as having multiple event loops, and this whole disussion is centred on the idea that there should only be one of those. Just to be clear, I'm not saying there should only be one scheduler *implementation* in existence -- only that there should only be one *instance* of some scheduler implementation in any given program (or thread, if you're using those). And there should be a standard interface for it and an agreed way of finding the instance. What you're saying is that the standard interface should consist of yielded instructions and the instance should be found implicitly using dynamic scoping. This is *very* different from the kind of interface used for everything else in Python, and I'm not yet convinced that such a large amount of weirdness is justified. -- Greg From guido at python.org Tue Oct 16 23:31:00 2012 From: guido at python.org (Guido van Rossum) Date: Tue, 16 Oct 2012 14:31:00 -0700 Subject: [Python-ideas] The async API of the future: yield-from In-Reply-To: <507DCE23.5080801@canterbury.ac.nz> References: <5078F6B1.2030309@canterbury.ac.nz> <507BA60B.2030806@canterbury.ac.nz> <507D109F.3020302@canterbury.ac.nz> <507DCE23.5080801@canterbury.ac.nz> Message-ID: On Tue, Oct 16, 2012 at 2:14 PM, Greg Ewing wrote: > The important idea is that just because you spawn a task, it > doesn't necessarily follow that you want to be regarded as the > *parent* of that task and receive its exceptions. That only > becomes clear when you wait() for it. Maybe. But the opposite doesn't follow either. 
It's a toss-up between the spawner and the waiter. -- --Guido van Rossum (python.org/~guido) From Steve.Dower at microsoft.com Tue Oct 16 23:31:53 2012 From: Steve.Dower at microsoft.com (Steve Dower) Date: Tue, 16 Oct 2012 21:31:53 +0000 Subject: [Python-ideas] The async API of the future: yield-from In-Reply-To: <507DCE23.5080801@canterbury.ac.nz> References: <5078F6B1.2030309@canterbury.ac.nz> <507BA60B.2030806@canterbury.ac.nz> <507D109F.3020302@canterbury.ac.nz> <507DCE23.5080801@canterbury.ac.nz> Message-ID: > Yes, but in my world, there is only *one* scheduler. > > Just to be clear, I'm not saying there should only be one scheduler *implementation* in existence -- only that > there should only be one *instance* of some scheduler implementation in any given program (or thread, if > you're using those). And there should be a standard interface for it and an agreed way of finding the instance. I agree with this entirely. There are a lot of optimisations to be had with different scheduler implementations, but the only way this can be portable is with a minimum supported interface and a standard way to find it. From ironfroggy at gmail.com Wed Oct 17 00:33:55 2012 From: ironfroggy at gmail.com (Calvin Spealman) Date: Tue, 16 Oct 2012 18:33:55 -0400 Subject: [Python-ideas] The async API of the future: yield-from In-Reply-To: <507DC833.6050507@canterbury.ac.nz> References: <5078F6B1.2030309@canterbury.ac.nz> <507B4F9D.6040701@canterbury.ac.nz> <507BBD53.5000602@canterbury.ac.nz> <507CB078.8040506@canterbury.ac.nz> <507DC833.6050507@canterbury.ac.nz> Message-ID: On Tue, Oct 16, 2012 at 4:48 PM, Greg Ewing wrote: > Calvin Spealman wrote: > >> I think "yield from" is the obvious way to "call" between tasks, but that >> there are other cases when we want to spawn a task to begin without >> blocking our task, and that "yield" should be used here. > > > I've thought of another problem with this. In my scheduler at > least, simply spawning a task doesn't immediately allow that > task, or any other, to run. Using "yield" to spell this operation > gives the impression that it could be a suspension point, when > it's actually not. While i still like the feeling, I must concede this point. I could see them being yielded and forgotten... assuming they would suspend. Dang. > It also forces anything that uses it to be called with "yield > from", all the way up, so if you're relying on the presence of > yield-froms to warn you of potential suspension points, you'll > get false positives. > > > -- > Greg > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas -- Read my blog! I depend on your acceptance of my opinion! I am interesting! http://techblog.ironfroggy.com/ Follow me if you're into that sort of thing: http://www.twitter.com/ironfroggy From ironfroggy at gmail.com Wed Oct 17 00:37:33 2012 From: ironfroggy at gmail.com (Calvin Spealman) Date: Tue, 16 Oct 2012 18:37:33 -0400 Subject: [Python-ideas] The async API of the future: yield-from In-Reply-To: <507DCE23.5080801@canterbury.ac.nz> References: <5078F6B1.2030309@canterbury.ac.nz> <507BA60B.2030806@canterbury.ac.nz> <507D109F.3020302@canterbury.ac.nz> <507DCE23.5080801@canterbury.ac.nz> Message-ID: On Tue, Oct 16, 2012 at 5:14 PM, Greg Ewing wrote: > Calvin Spealman wrote: > >> What if two tasks call wait() on the same subtask which raises an >> error? > > > That would be disallowed. 
A given Task would only be allowed > to have its wait() method called once. > > The reason for this restriction is because of the way tracebacks > are attached to exception objects in Python 3, which means that > exceptions are effectively single-use now. If it weren't for > that, the exception could simply be raised in *both* waiters. > > >> I think we should let errors propagate through yield-from, >> primarily. That's what it exists for. > > > Yes, and that's exactly what my wait() mechanism does. You call > the wait() method using yield-from. > > The important idea is that just because you spawn a task, it > doesn't necessarily follow that you want to be regarded as the > *parent* of that task and receive its exceptions. That only > becomes clear when you wait() for it. > > >>> In my system, spawn() isn't a wrapper -- it *is* the >>> primitive way to create an independent task. And I >>> think it's the only one we need. >> >> >> >> It has to know what scheduler to talk to, right? > > > Yes, but in my world, there is only *one* scheduler. Practically speaking, that is nice. But, are there use cases for multiple schedulers we should support? I also like the idea of the scheduler being an iterable, and thus itself being something you can schedule. Turtles all the way down. > I understand that not everyone thinks that's a good idea, > and I'm thinking about ways to remove that restriction. But > I'm not yet sure that it *should* be removed even if it can. > It seems to me that having multiple schedulers is inviting > many of the same problems as having multiple event loops, > and this whole disussion is centred on the idea that there > should only be one of those. > > Just to be clear, I'm not saying there should only be one > scheduler *implementation* in existence -- only that there > should only be one *instance* of some scheduler implementation > in any given program (or thread, if you're using those). And > there should be a standard interface for it and an agreed > way of finding the instance. > > What you're saying is that the standard interface should > consist of yielded instructions and the instance should be > found implicitly using dynamic scoping. This is *very* > different from the kind of interface used for everything > else in Python, and I'm not yet convinced that such a > large amount of weirdness is justified. I don't follow the part about "found implicitly using dynamic scoping". What do you mean? In my model, the tasks never find the scheduler at all. They don't directly access it at all. > -- > Greg > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas -- Read my blog! I depend on your acceptance of my opinion! I am interesting! http://techblog.ironfroggy.com/ Follow me if you're into that sort of thing: http://www.twitter.com/ironfroggy From pjdelport at gmail.com Wed Oct 17 00:56:44 2012 From: pjdelport at gmail.com (Piet Delport) Date: Wed, 17 Oct 2012 00:56:44 +0200 Subject: [Python-ideas] Proposal: A simple protocol for generator tasks In-Reply-To: References: Message-ID: On Mon, Oct 15, 2012 at 12:48 PM, Calvin Spealman wrote: > > What is the difference between the tossed around "yield from task()" > and this "yield tasklib.spawn(task())" "yield from task()" is simply the coroutine / task version of a function call: it runs the task to completion, and returns its final result. 
"yield tasklib.spawn(task())" (or however it ends up being spelled) would be a scheduler primitive to start a task *without* waiting for its result: in other words, it's a request that the scheduler start a new, independent thread of control. > And, why isn't it simply spelled "yield task()"? You have all these different > types that can be yielded to the scheduler from tasks to the scheduler. Why > isn't a task one of those possible types? If the scheduler gets an iterator, it > should schedule it automatically. This is a good question: I stopped short of discussing it in the original message only to keep it short, and in the hope that the answer is implied. The short answer is that "yield task()" is the old, hacky, cumbersome, "legacy"[1] way of calling subtasks, and that "yield from" should entirely replace the need to have to support it. Before "yield from", "yield task()" was the only to call subtasks, but this approach has some major disadvantages: 1. In order for it to work, schedulers must manually implement task trampolining, which is ugly at best, and prone to bugs if not all edge cases are handled correctly. (IOW, it effectively places the burden of implementing PEP 380 onto each scheduler.) 2. It obfuscates exception tracebacks by default, requiring schedulers that want readable stack traces to take additional pains to clean up their own non-task frames, while propagating exceptions. 3. It requires schedulers to reliably distinguish between tasks and other primitives in the first place. Simply treating all iterators as tasks is not sufficient: to run a task, you need send() and throw(), at least. (Type-checking for GeneratorType would be marginally better, but would unnecessarily preclude for example implementing tasks as classes or C extension types, which is otherwise entirely possible with this protocol.) "yield from" simplifies and solves all these problems in elegant swoop: 1. No more manual trampolining: a scheduler can treat any task as a single unit, and only needs to worry about the single, combined stream of instructions coming from it. 2. Tracebacks (and return values) take care of themselves, as they should. 3. By separating the concerns of direct scheduler communication ("yield") and subtask delegation ("yield from"), schedulers can limit themselves to just knowing about scheduler primitives when dealing yielded values, which should be more easily and tightly defined than the full spectrum of tasks in general. (The set of officially-defined scheduler instructions could end up being as small as None and Future, say.) In summary, it's entirely possible for schedulers to continue supporting the old "yield task()" way of calling subtasks (and this has no problem fitting into the proposed protocol[2]), but there should be no reason to do so, and several good reasons not to: hopefully, it will become a pre-3.3 historical footnote. [1] For the purposes of this email, interpret "legacy" to mean "older than 17 days". :) [2] Interpreted as a scheduler instruction, a task value would simply mean "resume the current task with the result of completing the yielded subtask" (modulo the practical question of reliably type-checking tasks, as mentioned). >> Raising TypeError or NotImplementedError back into the task is probably >> a reasonable action, and would allow code like: >> >> def task(): >> try: >> yield fancy_magic_instruction() >> except NotImplementedError: >> yield from boring_fallback() >> ... > > Interesting. Can anyone think of an example of this? 
I just want to note for the record that I'm not *encouraging* this kind of thing: I'm just observing that it would be allowed by the protocol. (However, one imaginable use case would be for tasks to send scheduler-specific hints, that can safely be ignored when those tasks are running on other scheduler implementations.) >> This is a plain observation on its own, however, it raises one or two >> interesting possibilities for more interesting schedulers implemented as >> generator tasks themselves, including: >> >> - Specialized sub-schedulers that run as a normal task within their >> parent scheduler, but implement for example weighted or priority >> queuing of their subtasks, or similar features. > I think that is too messy, you could have so many different scheduler > semantics. Maybe this sort of thing is what your schedule-specific > instructions should be for. It shouldn't get messy: the core semantics of any scheduler should always stay within the proposed protocol. The above is not the best example of a custom scheduler, though. Perhaps a better example would be a generic helper function like the following, that implements throttling of I/O requests made through it: def task(): result = yield from io_throttled(subtask(), rate=foo) io_throttled() would end up sitting between task() and subtask() in the hierarchy, like so: ... -> task() -> io_throttled() -> subtask() -> ... To recap, each task is implicitly driven by the scheduler above it, and implicitly drives the task(s) below it: The outer scheduler drives task(), which drives io_throttled(), which drives subtask(), and so on. In this picture: "yield from" is the "most default" scheduler: it simply delegates all yielded instructions to the outer scheduler. However, instead of relying on "yield from", io_throttled() can dip down into the task protocol itself, and drive subtask() directly. This would allow it to inspect and manipulate the underlying instructions and responses flowing back and forth, and, assuming that there's a recognizable standard representation for I/O primitives, it could keep track of the rate of I/O, and insert delay instructions as necessary (or something similar). The key observations I want to make: * io_throttled() is not special: it is just a normal task, as far as the tasks above and below it are concerned, and assumes only a recognizable representation of the fundamental I/O and delay instructions used. * To the extent that said underlying primitives are scheduler-agnostic, io_throttled() can be used or inserted anywhere, without caring how the underlying scheduler or event loop handles I/O, or how its global API looks. It just acts locally, in terms of the task protocol. An example where this kind of thing might actually be useful is an application or library that wishes to throttle, say, certain HTTP requests: it could simply internally wrap the tasks that make those requests in io_throttled(), without any special support from the underlying scheduler. This is of course not the only way to solve this particular problem, but it's an example of how thinking about generator tasks and their schedulers as two sides of the same underlying protocol could be a powerful abstraction, enabling a compositional approach to combining implementations of the protocol that might not be obvious or possible otherwise.
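A rough sketch of what such a helper might look like under these assumptions (IORequest and Delay are invented stand-ins for whatever "recognizable standard representation" of I/O and delay instructions the protocol would define, and toy_scheduler() exists only to make the example self-contained and runnable):

    import time
    from collections import namedtuple

    IORequest = namedtuple('IORequest', 'resource')   # hypothetical I/O instruction
    Delay = namedtuple('Delay', 'seconds')            # hypothetical delay instruction

    def io_throttled(subtask, min_interval):
        # Drive `subtask` directly instead of delegating with `yield from`,
        # so every instruction it sends can be inspected before forwarding.
        last_io = None
        to_send = None
        while True:
            try:
                instruction = subtask.send(to_send)
            except StopIteration as stop:
                return stop.value
            if isinstance(instruction, IORequest):
                now = time.monotonic()
                if last_io is not None and now - last_io < min_interval:
                    # Ask the outer scheduler to pause before letting the I/O through.
                    yield Delay(min_interval - (now - last_io))
                last_io = time.monotonic()
            # Forward the instruction unchanged and relay the scheduler's reply.
            to_send = yield instruction

    def fetch_pages(n):
        # A toy subtask that issues a few I/O requests.
        pages = []
        for i in range(n):
            pages.append((yield IORequest('page-%d' % i)))
        return pages

    def toy_scheduler(task):
        # Stand-in for the real outer scheduler: services instructions synchronously.
        reply = None
        while True:
            try:
                instr = task.send(reply)
            except StopIteration as stop:
                return stop.value
            if isinstance(instr, Delay):
                time.sleep(instr.seconds)
                reply = None
            else:
                reply = 'data for %r' % (instr.resource,)

    print(toy_scheduler(io_throttled(fetch_pages(3), min_interval=0.1)))

Error handling (relaying throw() into the subtask) is left out to keep the sketch short.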
-- Piet Delport From ncoghlan at gmail.com Wed Oct 17 06:21:27 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 17 Oct 2012 14:21:27 +1000 Subject: [Python-ideas] The async API of the future: yield-from In-Reply-To: <507DC349.7010208@canterbury.ac.nz> References: <507B4F9D.6040701@canterbury.ac.nz> <507BBD53.5000602@canterbury.ac.nz> <507CF308.2050803@canterbury.ac.nz> <20121016114315.7b967a64@pitrou.net> <507DC349.7010208@canterbury.ac.nz> Message-ID: On Wed, Oct 17, 2012 at 6:27 AM, Greg Ewing wrote: > Nick Coghlan wrote: > >> # Note that this is an *ordinary iterator*, not a tasklet >> def as_completed(futures): >> # We ensure all the operations have started, and get ourselves >> a set to work with >> remaining = set(futures) >> while remaining: >> # The trick here is that we *don't yield the original >> futures directly* >> # Instead, we yield >> yield _wait_first(remaining) > > > I've just figured out how your as_completed() thing works, > and realised that it's *not* a general solution to the > suspendable-iterator problem. You're making use of the fact > that you know *how many* values there will be ahead of time, > even if you don't know what they are yet. > > In general this won't be the case. I don't think there is > any trick that will allow a for-loop to be used in the general > case, because in order for an iterator to be suspendable, the > call to next() would need to be made using yield-from, and > it's hidden inside the for-loop implementation. Yeah, that's what lets me get away with not passing the sent results back down into the iterator (it can figure out from the original arguments when it needs to stop). It gets trickier if you want to terminate the iteration based on the result of an asynchronous operation. For example, here's a very simplistic way you could apply the concept of "yield a future to be handled in the loop body" to the operation of continuously reading binary data from a connection until EOF is received: def read(self): """This knows how to start an IO operation such the future will fire on completion""" future = ... return future # Again, notice this is *not* a tasklet, it's an ordinary iterator that produces Future objects def readall(self): """This can be used in two modes - as an iterator or as a coroutine. As a coroutine: data = yield from conn.readall() As an iterator: for wait_for_chunk in conn.readall(): try: chunk = yield wait_for_chunk except EOFError: break Obviously, the coroutine mode is far more convenient, but you *can* override the default accumulator behaviour if you want/need to by waiting on the individual futures explicitly. 
However, in this case, you lose the automatic loop termination behaviour, so, you may as well implement the underlying loop explicitly: while 1: try: chunk = yield self.read() except EOFError: break """ output = io.BytesIO() while 1: try: data = yield self.read() except EOFError: break if data: # This check makes iterator mode possible output.write(data) return output.getvalue() Impedance matching in a way that allows the exception handling to be factored out as well as the iteration step is a *lot* trickier, since you need to bring context managers into play if termination is signalled by an exception: # This version produces context managers rather than producing futures directly, and thus can't be # used directly as a coroutine def read_chunks(self): finished = False @contextmanager def handle_chunk(): nonlocal finished data = b'' try: data = yield self.read() except EOFError: finished = True return data while not finished: yield handle_chunk() # Usage for handle_chunk in conn.read_chunks(): with handle_chunk as wait_for_chunk: chunk = yield from wait_for_chunk # We end up doing a final "extra" iteration with chunk = b'' # So we'd likely need to guard with an "if chunk:" or "if not chunk: continue" # which again means we're not getting much value out of using the iterator Using an explicit "add_done_callback" doesn't help much, as you still have to deal with the exception being thrown back in to your generator. I know Guido doesn't want people racing off and designing new syntax for asynchronous iteration, but I'm not sure it's going to be possible to avoid it if we want a clean approach to "forking" the results of asynchronous calls between passing them down into a coroutine (to decide whether or not to terminate iteration) and binding them to a local variable (to allow local processing in the loop body). Compare the arcane incantations above to something like (similar to suggestions previously made by Christian Heimes): def read_chunks(self): """Designed for use as an asynchronous iterator""" while 1: try: yield self.read() except EOFError: break # Usage for chunk in yield from conn.read_chunks(): ... The idea here would be that whereas "for chunk in (yield from conn.read_chunks()):" runs the underlying coroutine to completion and then iterates over the return value, the version without the parentheses would effectively "tee" the values being sent back, *first* sending them to the underlying coroutine (to decide whether or not iteration should continue and to get the value to be yielded at the start of the next iteration) and then, if that doesn't raise StopIteration, binding them to the local variable and proceeding to execution of the loop body. All that said, I still like Guido's concept that the core asynchronous API is *really* future objects, just as it already is in the concurrent.futures module. The @task decorator and yielding future objects to that decorator is then just nice syntactic sugar for hooking generators up to the "add_done_callback" API of future objects. It's completely independent of the underlying event loop and/or asynchronous IO interfaces - those interfaces are about setting things up to invoke the set_* methods of the returned future objects correctly, just as they are with the Executor API in concurrent.futures. > I know you probably weren't intending as_completed() to be > a solution to the general suspendable-iterator problem. 
Right, I just wanted to be sure that *that particular use case* of waiting for a collection of futures and processing them in completion order could be handled in terms of Guido's API *without* needing any extra magic. The "iterate over data chunks until EOFError is raised" is a better example for highlighting the "how do you write an asynchronous iterator?" problem when it comes to generators-as-coroutines. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From pjdelport at gmail.com Wed Oct 17 07:31:10 2012 From: pjdelport at gmail.com (Piet Delport) Date: Wed, 17 Oct 2012 07:31:10 +0200 Subject: [Python-ideas] The async API of the future: yield-from In-Reply-To: <507D109F.3020302@canterbury.ac.nz> References: <5078F6B1.2030309@canterbury.ac.nz> <507BA60B.2030806@canterbury.ac.nz> <507D109F.3020302@canterbury.ac.nz> Message-ID: On Tue, Oct 16, 2012 at 9:45 AM, Greg Ewing wrote: > > In my system, spawn() isn't a wrapper -- it *is* the > primitive way to create an independent task. And I > think it's the only one we need. I think you will at minimum need a way to suspend and resume tasks, in addition to spawn(), as illustrated by the example of par() waiting for not CPU-bound tasks. This could be done either as actual suspend and resume primitives, or by building on a related set of synchronization primitives, such as queues, channels, or condition variables: there are a number of sets of that are mutually co-expressible. Suspending and resuming, in particular, is highly related to the question of how you reify a task as a conventional callback, when the need for that arises. Here's one possible way of approaching this with a combined suspend/resume primitive that might look familiar to people with a FP background: result = yield suspend(lambda resume: ...) (Here, "suspend" could be a scheduler-agnostic instruction object, a la tasklib.suspend(), or a method on a global scheduler.) suspend() would instruct the scheduler to stop running the current task, and call its argument (the lambda in the above example) with a "resume(value)" callable that will arrange to resume the task again with the given value. The body of the lambda (or whatever is passed to suspend()) would be responsible for doing something useful with the resume() callable: e.g. in par() example, it would arrange that the last child task triggers it. In particular, this suspend() could be used to integrate fairly directly with callback-based APIs: for example, if you have a Twisted Deferred, you could do: result = yield suspend(d.addCallback) to suspend the current task and add a callback to d that will resume it again, and receive the Deferred's result. To add support for exceptions, a variation of suspend() could pass two callables, mirroring pairs like send/throw, or callback/errback: result = yield suspend2(lambda resume, throw: ...) result = yield suspend2(d.addCallbacks) -- Piet Delport From greg.ewing at canterbury.ac.nz Wed Oct 17 08:04:31 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 17 Oct 2012 19:04:31 +1300 Subject: [Python-ideas] The async API of the future: yield-from In-Reply-To: References: <507CF308.2050803@canterbury.ac.nz> <20121016114315.7b967a64@pitrou.net> <20121016171037.4dc9bf24@pitrou.net> <20121016183959.481b4823@pitrou.net> Message-ID: <507E4A6F.7050807@canterbury.ac.nz> Calvin Spealman wrote: > If we don't kick whatever > handles the callback and result immediately, we are essentially > re-introducing pre-emptive scheduling. 
If TaskA is waiting on the > result of TaskB, and when TaskB finishes we say "OK, but we need to go > let TaskC do something before TaskA is given that result" then we > leave room for C to break things, modify state, and generally act in a > less-than-determinable way. I don't see how the risk of this is any higher than the risk that some other task D gets run while task A is waiting and messes something up. Ultimately you have to trust your tasks to behave themselves. -- Greg From greg.ewing at canterbury.ac.nz Wed Oct 17 09:26:44 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 17 Oct 2012 20:26:44 +1300 Subject: [Python-ideas] Expressiveness of coroutines versus Deferred callbacks (or possibly promises, futures) In-Reply-To: <4636067A-25BC-4D42-B872-BF132401F8BA@twistedmatrix.com> References: <91C1C15D-6E43-4F60-B65D-F45C6BAAB6F6@twistedmatrix.com> <4636067A-25BC-4D42-B872-BF132401F8BA@twistedmatrix.com> Message-ID: <507E5DB4.1090009@canterbury.ac.nz> Glyph wrote: > The real problem with generator coroutines is that if you make them the > primitive, you have an abstraction inversion if you want to have > callbacks Has anyone suggested making generator coroutines "the primitive", whatever that means? Guido seems to have made it clear that he wants the interface to the event loop layer to be based on plain callbacks. To plug in a generator coroutine, you install a callback that wakes up the coroutine. So using generators with the event loop will be entirely optional. > I haven't measured in a few > years, but as nice as generators can be for structuring complex event > flows, that abstraction comes with a non-trivial performance cost. > ... Every return value being replaced with > a callback trampoline is bad, but replacing it instead with a generator > being advanced, an exception being raised /and /a callback trampoline is > worse. This is where we expect yield-from to help a *lot*, by removing almost all of that overhead. A return to the trampoline is only needed when a task wants to yield the CPU, instead of every time it makes a function call to a subgenerator. Returns are still a bit more expensive due to the StopIterations, but raising and catching an exception in C code is still fairly efficient compared to doing it in Python. (Although not quite as super-efficient as it was in Python 2.x, unfortunately, due to tracebacks being attached to exceptions, so that we can't instantiate exceptions lazily any more.) -- Greg From greg.ewing at canterbury.ac.nz Wed Oct 17 09:30:16 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 17 Oct 2012 20:30:16 +1300 Subject: [Python-ideas] asyncore: included batteries don't fit In-Reply-To: References: <20120926081718.GA20843@hephaistos.amsuess.com> <20121003144320.GA16485@hephaistos.amsuess.com> Message-ID: <507E5E88.1020808@canterbury.ac.nz> Matthias Urlichs wrote: > (1) It's a whole lot easier to debug a problem with gevent than with anything > which uses yield / Deferreds / asyncore / whatever. With gevent, you get a > standard stack trace. With anything else, the "where did this call come from" > information is not part of the call chain With yield-from this is no longer true -- you get exactly the same traceback from a yield-from call chain that you would get from the corresponding ordinary call chain, without having to do anything special. This is one of the beauties of it. 
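A minimal, self-contained illustration of that (inner/middle/outer are made-up names, not tied to any particular scheduler):

    import traceback

    def inner():
        yield 1
        raise ValueError("boom")

    def middle():
        yield from inner()

    def outer():
        yield from middle()

    try:
        for step in outer():     # the second next() reaches the raise inside inner()
            pass
    except ValueError:
        traceback.print_exc()    # the traceback lists outer -> middle -> inner,
                                 # exactly like an ordinary call chain would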
-- Greg From tismer at stackless.com Wed Oct 17 10:25:03 2012 From: tismer at stackless.com (Christian Tismer) Date: Wed, 17 Oct 2012 10:25:03 +0200 Subject: [Python-ideas] asyncore: included batteries don't fit In-Reply-To: References: <20120926081718.GA20843@hephaistos.amsuess.com> <20121003144320.GA16485@hephaistos.amsuess.com> Message-ID: <507E6B5F.7040604@stackless.com> Ok I'll add a buck... On 16.10.12 20:40, Matthias Urlichs wrote: > I'll have to put in my ..02? here ? > > Guido van Rossum writes: > >> (2) We're at a fork in the road here. On the one hand, we could choose >> to deeply integrate greenlets/gevents into the standard library. > Yes. > > I have two and a half reasons for this. > > (?) Ultimately I think that switching stacks around is always going to be faster > than unwinding and re-winding things with yield(). If you are emulating things in Python, that may be true. Also if you are really only switching stacks, that may be true. But both assumptions do not fit, see below. > > (1) It's a whole lot easier to debug a problem with gevent than with anything > which uses yield / Deferreds / asyncore / whatever. With gevent, you get a > standard stack trace. With anything else, the "where did this call come from" > information is not part of the call chain and thus is either unavailable, or > will have to be carried around preemptively (with associated overhead). I'm absolutely your's on ease of coding straight forward. But this new, efficient "yield from" is a big step into that direction, see Greg's reply. > (2) Nothing against Twisted or any other async frameworks, but writing any > nontrivial program in it requires warping my brain into something that's *not* > second nature in Python, and never going to be. Same here. > Python is not Javascript; if you want to use the "loads of callbacks" > programming style, use node.js. > > > Personal experience: I have written an interpreter for an asynchronous and > vaguely Pythonic language which I use for home automation, my lawn sprinkers, > and related stuff (which I should probably release in some form). The code was > previously based on Twisted and was impossible to debug. It now uses gevent and > Just Works. You are using gevent, which uses greenlet! That means no pure stack switching, but the stack is sliced and moved onto the heap. But that technique (originally from Stackless 2.0) is known to be 5-10 times slower, compared to a cooperative context switching that is built into the interpreter. This story is by far not over. Even PyPy with all its advanced technology still depends on stack slicing when it emulates concurrency. Python 3.3 has done a huge move, because this efficient nesting of generators can deeply influence how people are coding, maybe with the effect that stack tricks loose more of their importance. I expect more like this to come. Greenlets are great. Stack inversion is faster. -- Christian Tismer :^) Software Consulting : Have a break! Take a ride on Python's Karl-Liebknecht-Str. 121 : *Starship* http://starship.python.net/ 14482 Potsdam : PGP key -> http://pgp.uni-mainz.de phone +49 173 24 18 776 fax +49 (30) 700143-0023 PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04 whom do you want to sponsor today? 
http://www.stackless.com/ From greg.ewing at canterbury.ac.nz Wed Oct 17 12:16:17 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 17 Oct 2012 23:16:17 +1300 Subject: [Python-ideas] The async API of the future: yield-from In-Reply-To: References: <5078F6B1.2030309@canterbury.ac.nz> <507BA60B.2030806@canterbury.ac.nz> <507D0AA8.6090509@canterbury.ac.nz> Message-ID: <507E8571.1080701@canterbury.ac.nz> Guido van Rossum wrote: > On Tue, Oct 16, 2012 at 12:20 AM, Greg Ewing > wrote: > >>it blurs the distinction >>between invoking a subtask synchronously and waiting for the >>result of a previously spawned independent task. > > Are you sure you really want to distinguish between those though? I think I do. Partly because I feel that not doing so would make code harder to reason about. Async stuff is difficult enough as it is without hiding the boundaries between one thread of control and another. There are technical reasons as well. If you use 'yield from' to wait for completion of an independent task, then it would seem like you should be able to do this: t1 = task1() t2 = task2() spawn(t1) spawn(t2) r1 = yield from t1 r2 = yield from t2 But that can't work -- the object that you wait on has to be different from the generator instance passed to spawn(). The reason is that if the task finishes before anyone waits on it, the return value needs to be stored somewhere. Having spawn() return an object that deliberately does *not* have the interface of a generator, and having to explicitly wait for it, makes it much less likely that anyone will make that kind of mistake. If you wrote t1 = task1() t2 = task2() spawn(t1) spawn(t2) r1 = yield from t1.wait() r2 = yield from t2.wait() you would quickly get an exception, because generators don't have a wait() method. -- Greg From greg.ewing at canterbury.ac.nz Wed Oct 17 12:27:49 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 17 Oct 2012 23:27:49 +1300 Subject: [Python-ideas] The async API of the future: yield-from In-Reply-To: <507DCE23.5080801@canterbury.ac.nz> References: <5078F6B1.2030309@canterbury.ac.nz> <507BA60B.2030806@canterbury.ac.nz> <507D109F.3020302@canterbury.ac.nz> <507DCE23.5080801@canterbury.ac.nz> Message-ID: <507E8825.9080001@canterbury.ac.nz> I wrote: > Just to be clear, I'm not saying there should only be one > scheduler *implementation* in existence But having said that, I can't see much reason why you would need to have more than one scheduler implementation. Multiple event loop implementations are necessary because async I/O needs to be done different ways on different platforms. But the scheduler we're talking about is all pure Python. If the interface is well known and universally used, and there's a good implementation of it in the standard library, why would anyone want another one? -- Greg From greg.ewing at canterbury.ac.nz Wed Oct 17 12:38:41 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 17 Oct 2012 23:38:41 +1300 Subject: [Python-ideas] The async API of the future: yield-from In-Reply-To: References: <5078F6B1.2030309@canterbury.ac.nz> <507BA60B.2030806@canterbury.ac.nz> <507D109F.3020302@canterbury.ac.nz> <507DCE23.5080801@canterbury.ac.nz> Message-ID: <507E8AB1.90505@canterbury.ac.nz> Guido van Rossum wrote: > On Tue, Oct 16, 2012 at 2:14 PM, Greg Ewing wrote: > >>The important idea is that just because you spawn a task, it >>doesn't necessarily follow that you want to be regarded as the >>*parent* of that task and receive its exceptions. > > Maybe. 
But the opposite doesn't follow either. It's a toss-up between > the spawner and the waiter. So maybe spawn() should have an option indicating that the spawning task is to receive exceptions occuring in the spawned task. -- Greg From tismer at stackless.com Wed Oct 17 14:27:43 2012 From: tismer at stackless.com (Christian Tismer) Date: Wed, 17 Oct 2012 14:27:43 +0200 Subject: [Python-ideas] The async API of the future: yield-from In-Reply-To: <507E8AB1.90505@canterbury.ac.nz> References: <5078F6B1.2030309@canterbury.ac.nz> <507BA60B.2030806@canterbury.ac.nz> <507D109F.3020302@canterbury.ac.nz> <507DCE23.5080801@canterbury.ac.nz> <507E8AB1.90505@canterbury.ac.nz> Message-ID: <507EA43F.7040005@stackless.com> On 17.10.12 12:38, Greg Ewing wrote: > Guido van Rossum wrote: >> On Tue, Oct 16, 2012 at 2:14 PM, Greg Ewing >> wrote: >> >>> The important idea is that just because you spawn a task, it >>> doesn't necessarily follow that you want to be regarded as the >>> *parent* of that task and receive its exceptions. >> >> Maybe. But the opposite doesn't follow either. It's a toss-up between >> the spawner and the waiter. > > So maybe spawn() should have an option indicating that the > spawning task is to receive exceptions occuring in the > spawned task. > No idea if that helps here, but the same problem occurred for us as well. It is not always clear if an exception should be handled in a certain context, or if it should be passed on and get raised later in the context that is concerned. For that, Stackless has introduced a _bomb_ object that encapsulates an exception, in order to let it pass through the normal call/yield/return interface. It is used to send an exception over a channel, which will explode (raise that exception) when the receiver picks it later up. I could think of something similar as a way to collect very many results in a join construct that collects everything without the need to handle each exception in the very moment it was raised. That would make it possible to collect results efficiently using 'yield from' and inspect the results later. Probably nothing new, just mentioned an idea... -- Christian Tismer :^) Software Consulting : Have a break! Take a ride on Python's Karl-Liebknecht-Str. 121 : *Starship* http://starship.python.net/ 14482 Potsdam : PGP key -> http://pgp.uni-mainz.de phone +49 173 24 18 776 fax +49 (30) 700143-0023 PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04 whom do you want to sponsor today? http://www.stackless.com/ From Steve.Dower at microsoft.com Wed Oct 17 15:54:22 2012 From: Steve.Dower at microsoft.com (Steve Dower) Date: Wed, 17 Oct 2012 13:54:22 +0000 Subject: [Python-ideas] The async API of the future: yield-from In-Reply-To: <507E8825.9080001@canterbury.ac.nz> References: <5078F6B1.2030309@canterbury.ac.nz> <507BA60B.2030806@canterbury.ac.nz> <507D109F.3020302@canterbury.ac.nz> <507DCE23.5080801@canterbury.ac.nz>, <507E8825.9080001@canterbury.ac.nz> Message-ID: > But the scheduler we're talking about is all pure Python. > If the interface is well known and universally used, and > there's a good implementation of it in the standard > library, why would anyone want another one? Probably because they already have another one and can't get rid of it. Whether or not we are trying to include GUI development in this, I can guarantee that people will try and use it with a GUI message loop (to avoid blocking on IO, largely). In this case we'd almost certainly need a different implementation for Wx/Tcl/whatever. 
"Universally used" is a nice idea, but it will take a long time to get there. A well known interface, especially one that doesn't require the loop itself (i.e. it doesn't have a blocking run() function), lets users write thin wrappers, like the one we did for Tcl: http://pastebin.com/FuZwc1Ur (CallableContext (the 'scheduler') base class is in http://pastebin.com/ndS53Cd8). There needs to be an way to change which one is used at runtime, but there only needs to be one per thread. Cheers, Steve From guido at python.org Wed Oct 17 16:55:57 2012 From: guido at python.org (Guido van Rossum) Date: Wed, 17 Oct 2012 07:55:57 -0700 Subject: [Python-ideas] The async API of the future: yield-from In-Reply-To: <507E8571.1080701@canterbury.ac.nz> References: <5078F6B1.2030309@canterbury.ac.nz> <507BA60B.2030806@canterbury.ac.nz> <507D0AA8.6090509@canterbury.ac.nz> <507E8571.1080701@canterbury.ac.nz> Message-ID: On Wed, Oct 17, 2012 at 3:16 AM, Greg Ewing wrote: > Guido van Rossum wrote: >> >> On Tue, Oct 16, 2012 at 12:20 AM, Greg Ewing >> wrote: >> >>> it blurs the distinction >>> between invoking a subtask synchronously and waiting for the >>> result of a previously spawned independent task. >> >> >> Are you sure you really want to distinguish between those though? > > > I think I do. Partly because I feel that not doing so would > make code harder to reason about. Async stuff is difficult > enough as it is without hiding the boundaries between one > thread of control and another. > > There are technical reasons as well. If you use 'yield from' > to wait for completion of an independent task, then it would > seem like you should be able to do this: > > t1 = task1() > t2 = task2() > spawn(t1) > spawn(t2) > r1 = yield from t1 > r2 = yield from t2 > > But that can't work -- the object that you wait on has to be > different from the generator instance passed to spawn(). The > reason is that if the task finishes before anyone waits on it, > the return value needs to be stored somewhere. > > Having spawn() return an object that deliberately does *not* > have the interface of a generator, and having to explicitly wait > for it, makes it much less likely that anyone will make that kind > of mistake. If you wrote > > t1 = task1() > t2 = task2() > spawn(t1) > spawn(t2) > r1 = yield from t1.wait() > r2 = yield from t2.wait() > > you would quickly get an exception, because generators don't > have a wait() method. Ack. I get it. It's like the difference between calling a function vs. running it in an OS thread. -- --Guido van Rossum (python.org/~guido) From guido at python.org Wed Oct 17 16:58:52 2012 From: guido at python.org (Guido van Rossum) Date: Wed, 17 Oct 2012 07:58:52 -0700 Subject: [Python-ideas] The async API of the future: yield-from In-Reply-To: <507EA43F.7040005@stackless.com> References: <5078F6B1.2030309@canterbury.ac.nz> <507BA60B.2030806@canterbury.ac.nz> <507D109F.3020302@canterbury.ac.nz> <507DCE23.5080801@canterbury.ac.nz> <507E8AB1.90505@canterbury.ac.nz> <507EA43F.7040005@stackless.com> Message-ID: On Wed, Oct 17, 2012 at 5:27 AM, Christian Tismer wrote: > On 17.10.12 12:38, Greg Ewing wrote: >> >> Guido van Rossum wrote: >>> >>> On Tue, Oct 16, 2012 at 2:14 PM, Greg Ewing >>> wrote: >>> >>>> The important idea is that just because you spawn a task, it >>>> doesn't necessarily follow that you want to be regarded as the >>>> *parent* of that task and receive its exceptions. >>> >>> >>> Maybe. But the opposite doesn't follow either. It's a toss-up between >>> the spawner and the waiter. 
>> >> >> So maybe spawn() should have an option indicating that the >> spawning task is to receive exceptions occuring in the >> spawned task. >> > > No idea if that helps here, but the same problem occurred for us > as well. It is not always clear if an exception should be handled > in a certain context, or if it should be passed on and get raised > later in the context that is concerned. > > For that, Stackless has introduced a _bomb_ object that encapsulates > an exception, in order to let it pass through the normal call/yield/return > interface. It is used to send an exception over a channel, which > will explode (raise that exception) when the receiver picks it later up. Hmm... That sounds a little like your iriginal design for the channel only supported transferring values. At least for NDB, all channels support exceptions and tracebacks as an explicit alternative to the value. > I could think of something similar as a way to collect very many > results in a join construct that collects everything without the need > to handle each exception in the very moment it was raised. > > That would make it possible to collect results efficiently using 'yield > from' and inspect the results later. > > Probably nothing new, just mentioned an idea... I do think we're hashing out important ideas... -- --Guido van Rossum (python.org/~guido) From greg.ewing at canterbury.ac.nz Wed Oct 17 23:49:48 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 18 Oct 2012 10:49:48 +1300 Subject: [Python-ideas] The async API of the future: yield-from In-Reply-To: References: <5078F6B1.2030309@canterbury.ac.nz> <507BA60B.2030806@canterbury.ac.nz> <507D109F.3020302@canterbury.ac.nz> Message-ID: <507F27FC.4040706@canterbury.ac.nz> Piet Delport wrote: > In particular, this suspend() could be used to integrate fairly directly > with callback-based APIs: for example, if you have a Twisted Deferred, > you could do: > > result = yield suspend(d.addCallback) I've been thinking about how to express this using the primitives provided by the scheduler in my tutorial. I don't actually have a primitive that simply suspends a task; instead, I have one that moves the current task from the ready list to a specified list: scheduler.block(queue) Similarly, I don't have a primitive that explicitly resumes a particular task[1] -- only one that takes the first task off a specified list and resumes it: scheduler.unblock(queue) I think this is a good idea if we want to be able to cancel tasks, because a cancelled task ought to cleanly disappear from the system, without any risk that something will try to schedule it again. This is achievable if we maintain the invariant that a task always belongs to some queue, and the scheduler knows about that queue. Given these primitives, we can define def wakeup_callback(queue): lambda: scheduler.unblock(queue) and def wait_for_callback(add_callback): q = [] add_callback(wakeup_callback(q)) scheduler.block(q) yield This is starting to look rather like a semaphore. If we assume semaphores as a facility provided by the library, then it becomes very straightforward: def wait_for_callback(add_callback): s = Semaphore() add_callback(s.notify) yield from s.wait() That assumes the callback is single-use. But a semaphore can also handle multi-use callbacks: you just keep the semaphore around and repeatedly wait on it. You will get woken up once for each time the callback is called. s = Semaphore() something.add_callback(s.notify) while we_are_still_interested(): yield from s.wait() ... 
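For concreteness, here is a minimal sketch of the kind of Semaphore
assumed above, built only out of the block/unblock primitives described
earlier (a library version might differ in detail):

    class Semaphore:

        def __init__(self, count = 0):
            self.count = count
            self.waiters = []    # a queue the scheduler knows about

        def notify(self):
            # Plain function, so it can be used directly as a callback.
            if self.waiters:
                scheduler.unblock(self.waiters)
            else:
                self.count += 1

        def wait(self):
            if self.count:
                self.count -= 1
            else:
                scheduler.block(self.waiters)
                yield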
--- [1] Actually I do, but I'm thinking it shouldn't be exposed as part of the public API for reasons given here. -- Greg From greg.ewing at canterbury.ac.nz Thu Oct 18 09:49:20 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 18 Oct 2012 20:49:20 +1300 Subject: [Python-ideas] The async API of the future: yield-from In-Reply-To: <507F27FC.4040706@canterbury.ac.nz> References: <5078F6B1.2030309@canterbury.ac.nz> <507BA60B.2030806@canterbury.ac.nz> <507D109F.3020302@canterbury.ac.nz> <507F27FC.4040706@canterbury.ac.nz> Message-ID: <507FB480.1090009@canterbury.ac.nz> I've converted my tutorial on generator-based tasks for Python 3.3, tidied it up a bit and posted it here: http://www.cosc.canterbury.ac.nz/greg.ewing/python/tasks/ -- Greg From _ at lvh.cc Thu Oct 18 13:01:30 2012 From: _ at lvh.cc (Laurens Van Houtven) Date: Thu, 18 Oct 2012 13:01:30 +0200 Subject: [Python-ideas] asyncore: included batteries don't fit In-Reply-To: References: <20120926081718.GA20843@hephaistos.amsuess.com> <20121003144320.GA16485@hephaistos.amsuess.com> Message-ID: Do you use gevent's monkeypatch-the-stdlib feature? On Tue, Oct 16, 2012 at 8:40 PM, Matthias Urlichs wrote: > I'll have to put in my ..02? here ? > > Guido van Rossum writes: > > > (2) We're at a fork in the road here. On the one hand, we could choose > > to deeply integrate greenlets/gevents into the standard library. > > Yes. > > I have two and a half reasons for this. > > (?) Ultimately I think that switching stacks around is always going to be > faster > than unwinding and re-winding things with yield(). > That seems like something that can be factually proven or counterproven. > (1) It's a whole lot easier to debug a problem with gevent than with > anything > which uses yield / Deferreds / asyncore / whatever. With gevent, you get a > standard stack trace. With anything else, the "where did this call come > from" > information is not part of the call chain and thus is either unavailable, > or > will have to be carried around preemptively (with associated overhead). > gevent uses stack slicing, which IIUC is pretty expensive. Why is it not subject to the performance overhead you mention? Can you give an example of such a crappy stack trace in twisted? I develop in it all day, and get pretty decent stack traces. The closest thing I have to a crappy stack trace is when doing functional tests with an RPC API -- obviously on the client side all I'm going to see is a fairly crappy just-an-exception. That's okay, I also get the server side exception that looks like a plain old Python traceback to me and tells me exactly where the problem is from. > (2) Nothing against Twisted or any other async frameworks, but writing any > nontrivial program in it requires warping my brain into something that's > *not* > second nature in Python, and never going to be. > Which ones are you thinking about other than twisted? It seems that the issue you are describing is one of semantics, not so much of whether or not it actually does things asynchronously under the hood, as e.g gevent does too. > Python is not Javascript; if you want to use the "loads of callbacks" > programming style, use node.js. > None of the solutions on the table have node.js-style "loads of callbacks". Everything has some way of structuring them. It's either implicit switches (as in "can happen in the caller"), explicit switches (as in yield/yield from) or something like deferreds, some options having both of the latter. 
> Personal experience: I have written an interpreter for an asynchronous and > vaguely Pythonic language which I use for home automation, my lawn > sprinkers, > and related stuff (which I should probably release in some form). The code > was > previously based on Twisted and was impossible to debug. It now uses > gevent and > Just Works. > If you have undebuggable code samples from that I'd love to take a look. > > -- > -- Matthias Urlichs > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -- cheers lvh -------------- next part -------------- An HTML attachment was scrubbed... URL: From carlopires at gmail.com Thu Oct 18 14:04:05 2012 From: carlopires at gmail.com (Carlo Pires) Date: Thu, 18 Oct 2012 09:04:05 -0300 Subject: [Python-ideas] The async API of the future: yield-from In-Reply-To: <507FB480.1090009@canterbury.ac.nz> References: <5078F6B1.2030309@canterbury.ac.nz> <507BA60B.2030806@canterbury.ac.nz> <507D109F.3020302@canterbury.ac.nz> <507F27FC.4040706@canterbury.ac.nz> <507FB480.1090009@canterbury.ac.nz> Message-ID: 2012/10/18 Greg Ewing > I've converted my tutorial on generator-based tasks > for Python 3.3, tidied it up a bit and posted it here: > > http://www.cosc.canterbury.ac.nz/greg.ewing/python/tasks/ > I liked it. I was kind of confused about use of yield/yield from in this style of async. Now things seems to be pretty clear. -- Carlo Pires -------------- next part -------------- An HTML attachment was scrubbed... URL: From jimjjewett at gmail.com Fri Oct 19 00:10:32 2012 From: jimjjewett at gmail.com (Jim Jewett) Date: Thu, 18 Oct 2012 18:10:32 -0400 Subject: [Python-ideas] Is there a good reason to use * for multiplication? In-Reply-To: <50788DB1.4090809@stoneleaf.us> References: <11ad8f01-0383-4b64-b0f0-78bb0e2f9308@googlegroups.com> <50788DB1.4090809@stoneleaf.us> Message-ID: On 10/12/12, Ethan Furman wrote: > I think your mailer must have stripped out unicode character 0x2062, INVISIBLE TIMES. Some spelling mistakes are harder to see than others... -jJ From tismer at stackless.com Fri Oct 19 03:12:58 2012 From: tismer at stackless.com (Christian Tismer) Date: Fri, 19 Oct 2012 03:12:58 +0200 Subject: [Python-ideas] yield from multiple iterables (was Re: The async API of the future: yield-from) In-Reply-To: <507CADE6.7050604@canterbury.ac.nz> References: <507B0F0B.8080700@pearwood.info> <507B26C6.10602@stackless.com> <507BA075.4030508@canterbury.ac.nz> <507BC855.4070802@stackless.com> <507BE79D.1090100@stackless.com> <507C1661.5070206@stackless.com> <507CADE6.7050604@canterbury.ac.nz> Message-ID: <5080A91A.3020804@stackless.com> Hi Greg, coming back to this after quite a storm in my brain... On 16.10.12 02:44, Greg Ewing wrote: > Christian Tismer wrote: > >> Right, CPython still keeps unneccessary crap on the C stack. > > It's not just Python leaving stuff on the stack that's a > problem, it's external C code that calls back into Python. > That one is something that I think to ignore. Of course there are quite some situations where callbacks into Python are a problem, but I don't want to put this into Python. There are ways to cope with this, for instance using greenlet as an optional extension module that handles these cases. Alternatively, Python itself can do it with strictly controlled threads. But I think leaving that out will simplify matters a lot and keeps Python clean. In the end, I want to model something that is likely to be accepted. 
>> But that's not the point right now, because on the other hand, >> in the context of a possible yield (from or not), the C stack >> is clean, and this enables switching. > >> And actually in such clean positions, Stackless Python (as opposed to >> Greenlets) does soft-switching, which is very similar to what the >> generators >> are doing - there is no assembly stuff involved at all. > > But the assembly code still needs to be there to handle the > cases where you *can't* do soft switching. It's the presence > of the code that's the issue here, not how frequently it > gets called. No, I'm intending to really rip that out. Or better, I want to do a rewrite of a subset of Stackless, actually the functionality that allows to implement greenlets or multi-level generators, task scheduling and so on. In effect, I want to find something that enables some extended switching. Emulated, without hacking the kernel in the first place. Generators are restricted to call/yield/return positions, and I thing that's fine. What can be switched is totally clear by definitions, and I like that. I'm talking of exactly that. What I dislike is a different topic ;-) >> I have begun studying the code for YIELD_FROM. As it is written, every >> next iteration elevates the chain of generators once up and down. >> Maybe that can be avoided by changing the frame chain, so this can >> become >> a cheaper O(1) operation. > > My original implementation of yield-from actually *did* avoid > this, by keeping a C-level pointer chain of yielding-from frames. > But that part was ripped out at the last minute when someone > discovered that it had a detrimental effect on tracebacks. > > There are probably other ways the traceback problem could be > fixed, so maybe we will get this optimisation back one day. Ok, let's ignore this O(n) problem for now. _yield from_ is anyway probably faster by more than an order of magnitude, so it will serve your purpose (nesting generators) pretty well. My problem is different because I want a scaling building block for building higher level structures, and I would love to build them using _yield from_ . There are a few things which contradict completely my thinking: - _yield from_ works from the top. That is, if I have five nested iterators and I want to drive them, then I have to call the root generator?! I see that that works, but it is against all what I'm used to. How can I inject the info that I want to switch context? - generators always yield to the caller, and also return values to the caller. What I'm looking for is to express a switch so something else? - generators are able to free the stack, when they yield. But when they are active, they use the full stack. At least when I follow the pattern "generator is calling sub-generator". A deeply nested recursion is therefore something to avoid. :-( Now I'm playing around with different approaches to model something flexible that gives me more freedom. Right now I'm trying a slightly pervert approach to give me an _unwindable_, merely a frame-like object that can vanish on demand. I'm also experimenting with emulating different kinds of _yield_". Since there is only one kind of yield to one target, I get the problem to distinguish that for different purposes. Example: I can take a set of nested functions in their native form. Then replacing ordinary calls by _yield from_ and inserting proper yields before actually returning, I now have the equivalent of a nested function call, that I can drive with another _yield from_ . 
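A tiny sketch of what I mean, with toy functions and a trivial driver
standing in for the real machinery:

    # plain nested calls
    def inner(x):
        return x + 1

    def outer(x):
        return inner(x) * 2

    # the same thing after the rewrite: calls become 'yield from',
    # and a bare yield marks a point where a switch could happen
    def inner_g(x):
        yield
        return x + 1

    def outer_g(x):
        result = yield from inner_g(x)
        return result * 2

    def drive(gen):
        try:
            while True:
                next(gen)
        except StopIteration as e:
            return e.value

    assert drive(outer_g(3)) == 8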
This is now a function that permanently releases the stack. Now I would like to give one of the nested function the ability to transfer execution somewhere else. The support is insofar there, as the stack is freed all the time. But this function that wants to switch needs to pass the fact that it wants to switch, plus the target somewhere. As I understood it, I would need to yield that to the driver function. In order to do that, I would need to yield a tuple or a bound object. This is a different purpose than the simple driver functionality. Do you see it? In my understanding, a switch would not be driven from the top and then dispatched upon, but a called function below the function to be switched would modify something that leads to a switch as a result. In this example, the generator-emulated function would be driven by thousands of yields, kind of polled to catch the one event that needs to be supported by a switching action. This looks wrong for me, like doing things upside down. To shorten this: I have the problem that I have your very efficient yield collector, but I need to dispatch on what is intended by the yield, instead of initiating a reaction from where I am. All in all, I can't get rid of the thought "un-pythonic". So I'm still thinking of a frame-like object that allows me to control its execution, let it vanish, and so on, and use it as a building block. As always, I'm feeling bad when going this road, because I want to use the eficient _yield from_ as much as possible. But it may be my missing experience. Do you understand, and maybe see where I have the wrong brain shortcuts? How do you write something composable that scales? Cheers -- Chris -- Christian Tismer :^) Software Consulting : Have a break! Take a ride on Python's Karl-Liebknecht-Str. 121 : *Starship* http://starship.python.net/ 14482 Potsdam : PGP key -> http://pgp.uni-mainz.de phone +49 173 24 18 776 fax +49 (30) 700143-0023 PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04 whom do you want to sponsor today? http://www.stackless.com/ From jimjjewett at gmail.com Fri Oct 19 05:46:40 2012 From: jimjjewett at gmail.com (Jim Jewett) Date: Thu, 18 Oct 2012 23:46:40 -0400 Subject: [Python-ideas] The async API of the future: yield-from In-Reply-To: References: Message-ID: Is the goal really to provide "The async API of the future", or just to provide "a stdlib module which provides one adequate way to do async"? I think the yield and yield from solutions all need too much magical scaffolding to be The One True Way, but I don't mind such conventions as much when they're part of a particular example class, such as concurrent.schedulers.YieldScheduler. To stretch an analogy, generators and context managers are different concepts. Allowing certain generators to be used as context managers (by using the "with" keyword) is fine. But I wouldn't want to give up all the other uses of generators. If yield starts implying other magical properties that are only useful when communicating with a scheduler, rather than a regular caller ... I'm afraid that muddies the concept up too much for my taste. More specific concerns below: On 10/12/12, Guido van Rossum wrote: > But the only use for send() on a generator is when using it as a > coroutine for a concurrent tasks system -- send() really makes no > sense for generators used as iterators. And you're claiming, it seems, > that you prefer yield-from for concurrent tasks. 
But the data doesn't have to be scheduling information; it can be new data, a seed for an algorithm, a command to switch or reset the state ... locking it to the scheduler is part of what worries me. > On Thu, Oct 11, 2012 at 6:32 PM, Greg Ewing >> Keep in mind that a value yielded by a generator being used as >> part of a coroutine is *not* seen by code calling it with >> yield-from. That is part of what bugs me about the yield-from examples. Until this discussion, I had thought of yield-from as factoring out some code that was still conceptually embedded within the parent generator. This (perhaps correctly) makes it seem more like a temporary replacement, as if the parent were no longer there at all. But with the yield channel reserved for scheduling overhead, the "generator" can't really generate anything, except through side effects... > ... I feel that "value = yield " > is quite a good paradigm, To me, it seems fine for a particular concrete scheduler, but too strong an assumption for an abstract API. I can mostly* understand: YieldScheduler assumes any yielded data is another Task; it will schedule that task, and cause the original (yielding) Task to wait until the new task is completed. But I wonder what I'm missing with: Generators should only yield (expressions that create) Futures; the scheduler will automatically unwrap the future and send (or throw) the result back into the parent (or other ancestor) Generator, which will then be resumed. * "mostly", because if my task is willing to wait for the subtask to complete, then why not just use a blocking call in the first place? Is it just because switching to another task is lighter weight than letting a thread block? What happens if a generator does yield something other than a Future? Will the generator be rescheduled in an already-runnable (as opposed to waiting) state? Will it never be resumed? Will that object be auto-wrapped in a Future for the benefit of whichever other co-routine originally made the request? Are generators assumed to run to exhaustion, or is some sort of driver needed to keep pumping them? > ... It would be horrible to require C to create a fake generator. Would it have to wrap results in a fake Future, so that the scheduler could properly unwrap? > ...Well, I'm talking about a decorator that you *always* apply, and which > does nothing (or very little) when wrapping a generator, but adds > generator behavior when wrapping a non-generator function. Why is an always-applied decorator any less intrusive than a mandatory (mixin) base class? > (1) Calling an async operation and waiting for its result, using yield > Futures: > result = yield some_async_op(args) I was repeatedly confused over whether "result" would be a Future that still needed resolution, and the example code wasn't always consistent. As I understand it now, the scheduler (not just the particular implementation, but the API) has to automatically treat any yielded data as a future, resolve that future to its result, and then send (or throw) that result (as opposed to the future) back into either the parent task or the least distant ancestor task not to be using "yield from". > Yield-from: > result = yield from some_async_op(args) So the generator containing this code suspends itself entirely until some_async_op is exhausted, at which point result will be the StopIteration? (Or None?) Non-Exception results get passed straight to the least-distant ancestor task not using "yield from", but Exceptions propagate through one generation at a time. 
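(For what it's worth, with plain generators and no scheduler involved,
the value of a yield-from expression is simply the subgenerator's return
value, not the StopIteration object itself:

    def sub():
        yield 'step'
        return 42          # carried as StopIteration(42) internally

    def outer():
        result = yield from sub()
        print(result)      # prints 42, not a StopIteration instance

    g = outer()
    next(g)                # runs sub() up to its yield
    try:
        next(g)            # sub() finishes; outer() prints 42
    except StopIteration:
        pass

How a particular scheduler routes that value onward is the part I'm
still unsure about.)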
> (2) Setting the result of an async operation > Futures: > f.set_result(value) # From any callback PEP 3148 considers set_result private to the executor. Can that always be done from arbitrary callbacks? Can it be done more than once? I think for the normal case, a task should just return its value, and the Future or the Scheduler should be responsible for calling set_result. > Yield-from: > return value # From the outermost generator Why only the outermost? I'm guessing it is because everything else is suspended, and even if a mid-level generator is explicitly re-added to the task queue, it can't actually continue because of re-entrancy. > (3) Handling an exception > > Futures: > try: > result = yield some_async_op(args) > except MyException: > So the scheduler does have to unpack the future, and throw rather than send. > (4) Raising an exception as the outcome of an async operation > Futures: > f.set_exception() Again, shouldn't the task itself just raise, and let the future (or the scheduler) call that? > Yield-from: > raise # From any of the generators So it doesn't need to be wrapped in a Future, until it needs to cross back over a "schedule this asynchronously" gulf? > (5) Having one async operation invoke another async operation > Futures: > @task > def outer(args): > res = yield inner(args) > return res > Yield-from: > def outer(args): > res = yield from inner(args) > return res Will it ever get to continue processing (under either model) before inner exhausts itself and stops yielding? > Note: I'm including this because in the Futures case, each level of > yield requires the creation of a separate Future. Only because of the auto-unboxing. And if the generator suspends itself to wait for the future, then the future will be resolved before control returns to the generator's own parents, so those per-layer Futures won't really add anything. > (6) Spawning off multiple async subtasks > > Futures: > f1 = subtask1(args1) # Note: no yield!!! > f2 = subtask2(args2) > res1, res2 = yield f1, f2 ah. That makes a bit more sense, though the tuple of futures does complicate the automagic unboxing. (Which containers, to which levels, have to be resolved?) > Yield-from: > ?????????? > > *** Greg, can you come up with a good idiom to spell concurrency at > this level? Your example only has concurrency in the philosophers > example, but it appears to interact directly with the scheduler, and > the philosophers don't return values. *** Why wouldn't this be the same as you already wrote without yield-from? Two subtasks were submitted but not waited for. I suppose you could yield from a generator that submits new subtasks every time it generates something, but that would be solving a more complicated problem. (So it wouldn't be a consequence of the "yield from".) > (7) Checking whether an operation is already complete > Futures: > if f.done(): ... If f was yielded, it is done, or this code wouldn't be running again to check. > Yield-from: > ????????????? And again, if the futures were yielded (even through a yield from) then they're already unboxed; otherwise, you can still check f.done > (8) Getting the result of an operation multiple times > > Futures: > > f = async_op(args) > # squirrel away a reference to f somewhere else > r = yield f > # ... later, elsewhere > r = f.result() Why do you have to squirrel away the reference? Are you assuming that the async scheduler will mess with the locals so that f is no longer valid? > Yield-from: > ??????????????? 
This, you cannot reasonably do; the nature of yield-from means that the unresolved futures were never visible within this generator; they were resolved by the scheduler and the results handed straight to the generator's ancestor. > (9) Canceling an operation > > Futures: > f.cancel() > > Yield-from: > ??????????????? > > Note: I haven't needed canceling yet, and I believe Devin said that > Twisted just got rid of it. However some of the JS Deferred > implementations seem to support it. I think that once you've called "yield from", the generator making that call is suspended until the child generator completes. But a different thread of control could cancel the active (most-descended) generator. > (10) Registering additional callbacks > > Futures: > f.add_done_callback(callback) > > Yield-from: > ??????? > > Note: this is used in NDB to trigger "hooks" that should run e.g. when > a database write completes. The user's code just writes yield > ent.put_async(); the trigger is automatically called by the Future's > machinery. This also uses (8). I think you would have to do add the callbacks within the subgenerator that is spawning f. That, or un-inline the yield from, and lose the automated send-throw forwarding. -jJ From greg.ewing at canterbury.ac.nz Fri Oct 19 07:15:33 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 19 Oct 2012 18:15:33 +1300 Subject: [Python-ideas] yield from multiple iterables (was Re: The async API of the future: yield-from) In-Reply-To: <5080A91A.3020804@stackless.com> References: <507B0F0B.8080700@pearwood.info> <507B26C6.10602@stackless.com> <507BA075.4030508@canterbury.ac.nz> <507BC855.4070802@stackless.com> <507BE79D.1090100@stackless.com> <507C1661.5070206@stackless.com> <507CADE6.7050604@canterbury.ac.nz> <5080A91A.3020804@stackless.com> Message-ID: <5080E1F5.7090709@canterbury.ac.nz> Christian Tismer wrote: > - generators are able to free the stack, when they yield. But when they > are active, they use the full stack. At least when I follow the pattern > "generator is calling sub-generator". > A deeply nested recursion is therefore something to avoid. :-( Only if yield-from chains aren't optimised the way they used to be. In any case, for the application we're talking about here, the difference will probably not be noticeable. > But this function that wants to > switch needs to pass the fact that it wants to switch, plus the target > somewhere. As I understood it, I would need to yield that to the > driver function. You understand incorrectly. In my scheduler, the yields don't send or receive values at all. Communicating with the scheduler, for example to tell it to allow another task to run, is done by calling functions. A yield must be done to actually allow a switch, but the yield itself doesn't send any information. > Do you see it? In my understanding, a switch would not be driven from > the top and then dispatched upon, but a called function below the > function to be switched would modify something that leads to a > switch as a result. That's pretty much what happens in my scheduler. > Do you understand, and maybe see where I have the wrong > brain shortcuts? > How do you write something composable that scales? I think you should study my scheduler tutorial. If you can understand how that works, I think it will answer many of your questions. 
http://www.cosc.canterbury.ac.nz/greg.ewing/python/tasks/ -- Greg From tismer at stackless.com Fri Oct 19 14:05:20 2012 From: tismer at stackless.com (Christian Tismer) Date: Fri, 19 Oct 2012 14:05:20 +0200 Subject: [Python-ideas] yield from multiple iterables (was Re: The async API of the future: yield-from) In-Reply-To: <5080E1F5.7090709@canterbury.ac.nz> References: <507B0F0B.8080700@pearwood.info> <507B26C6.10602@stackless.com> <507BA075.4030508@canterbury.ac.nz> <507BC855.4070802@stackless.com> <507BE79D.1090100@stackless.com> <507C1661.5070206@stackless.com> <507CADE6.7050604@canterbury.ac.nz> <5080A91A.3020804@stackless.com> <5080E1F5.7090709@canterbury.ac.nz> Message-ID: <50814200.2050903@stackless.com> On 19.10.12 07:15, Greg Ewing wrote: > Christian Tismer wrote: > >> - generators are able to free the stack, when they yield. But when they >> are active, they use the full stack. At least when I follow the >> pattern >> "generator is calling sub-generator". >> A deeply nested recursion is therefore something to avoid. :-( > > Only if yield-from chains aren't optimised the way they > used to be. Does that mean a very deep recursion would be efficient? I'm trying to find that change in the hg history right now. Can you give me a hint how your initial implementation works, the initial patch source? > > ... >> But this function that wants to >> switch needs to pass the fact that it wants to switch, plus the target >> somewhere. As I understood it, I would need to yield that to the >> driver function. > > You understand incorrectly. In my scheduler, the yields > don't send or receive values at all. Communicating with the > scheduler, for example to tell it to allow another task to > run, is done by calling functions. A yield must be done to > actually allow a switch, but the yield itself doesn't send > any information. I have studied that yesterday already in depth and like that quite much. It is probably just the problem that I had with generators from their beginning. -- Christian Tismer :^) Software Consulting : Have a break! Take a ride on Python's Karl-Liebknecht-Str. 121 : *Starship* http://starship.python.net/ 14482 Potsdam : PGP key -> http://pgp.uni-mainz.de phone +49 173 24 18 776 fax +49 (30) 700143-0023 PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04 whom do you want to sponsor today? http://www.stackless.com/ From ironfroggy at gmail.com Fri Oct 19 14:46:31 2012 From: ironfroggy at gmail.com (Calvin Spealman) Date: Fri, 19 Oct 2012 08:46:31 -0400 Subject: [Python-ideas] The async API of the future: yield-from In-Reply-To: References: Message-ID: On Thu, Oct 18, 2012 at 11:46 PM, Jim Jewett wrote: > Is the goal really to provide "The async API of the future", or just > to provide "a stdlib module which provides one adequate way to do > async"? > > I think the yield and yield from solutions all need too much magical > scaffolding to be The One True Way, but I don't mind such conventions > as much when they're part of a particular example class, such as > concurrent.schedulers.YieldScheduler. > > To stretch an analogy, generators and context managers are different > concepts. Allowing certain generators to be used as context managers > (by using the "with" keyword) is fine. But I wouldn't want to give > up all the other uses of generators. > > If yield starts implying other magical properties that are only useful > when communicating with a scheduler, rather than a regular caller ... > I'm afraid that muddies the concept up too much for my taste. 
I think it is important that this is more than convention. I think that we need our old friend TOOOWTDI (There's Only One Obvious Way To Do It) here more than ever. This stuff is complicated, and following that interoperability of what eventually is written on top of it is going to be complicated. Our focus should be not on providing simple things like "async file read" but crafting an environment where people can continue to write wonderfully expressive and useful libraries that others can combine to their own needs. If we don't provide the layer upon which this disparate pieces cooperate, I fear much of the effort is all for too little gain to be worth the effort. > More specific concerns below: > > On 10/12/12, Guido van Rossum wrote: > >> But the only use for send() on a generator is when using it as a >> coroutine for a concurrent tasks system -- send() really makes no >> sense for generators used as iterators. And you're claiming, it seems, >> that you prefer yield-from for concurrent tasks. > > But the data doesn't have to be scheduling information; it can be new > data, a seed for an algorithm, a command to switch or reset the state > ... locking it to the scheduler is part of what worries me. When a coroutine yields, it yields *to the scheduler* so for whom else should these values be? >> On Thu, Oct 11, 2012 at 6:32 PM, Greg Ewing > >>> Keep in mind that a value yielded by a generator being used as >>> part of a coroutine is *not* seen by code calling it with >>> yield-from. > > That is part of what bugs me about the yield-from examples. > > Until this discussion, I had thought of yield-from as factoring out > some code that was still conceptually embedded within the parent > generator. This (perhaps correctly) makes it seem more like a > temporary replacement, as if the parent were no longer there at all. > > But with the yield channel reserved for scheduling overhead, the > "generator" can't really generate anything, except through side > effects... Don't forget that yield-from is an expression, not a statement. The value eventually returned from the generator is the result of the yield-from, so the generator still produces a final value. The fact that these are generators is for their ability to suspend, not to iterate. >> ... I feel that "value = yield " >> is quite a good paradigm, > > To me, it seems fine for a particular concrete scheduler, but too > strong an assumption for an abstract API. > > I can mostly* understand: > > YieldScheduler assumes any yielded data is another Task; it will > schedule that task, and cause the original (yielding) Task to wait > until the new task is completed. > > But I wonder what I'm missing with: > > Generators should only yield (expressions that create) Futures; > the scheduler will automatically unwrap the future and send (or > throw) the result back into the parent (or other ancestor) > Generator, which will then be resumed. > > * "mostly", because if my task is willing to wait for the subtask to > complete, then why not just use a blocking call in the first place? > Is it just because switching to another task is lighter weight than > letting a thread block? By blocking call do you mean "x = foo()" or "x = yield from foo()"? Blocking call usually means the former, so if you mean that, then you neglect to think of all the other tasks running which are not willing to wait. > What happens if a generator does yield something other than a Future? > Will the generator be rescheduled in an already-runnable (as opposed > to waiting) state? 
Will it never be resumed? Will that object be > auto-wrapped in a Future for the benefit of whichever other co-routine > originally made the request? I think if the scheduler doesn't know what to do with something, it should be an error. That makes it easier to change things in the future. > Are generators assumed to run to exhaustion, or is some sort of driver > needed to keep pumping them? > > >> ... It would be horrible to require C to create a fake generator. > > Would it have to wrap results in a fake Future, so that the scheduler > could properly unwrap? > >> ...Well, I'm talking about a decorator that you *always* apply, and which >> does nothing (or very little) when wrapping a generator, but adds >> generator behavior when wrapping a non-generator function. > > Why is an always-applied decorator any less intrusive than a mandatory > (mixin) base class? > >> (1) Calling an async operation and waiting for its result, using yield > >> Futures: >> result = yield some_async_op(args) > > I was repeatedly confused over whether "result" would be a Future that > still needed resolution, and the example code wasn't always > consistent. As I understand it now, the scheduler (not just the > particular implementation, but the API) has to automatically treat any > yielded data as a future, resolve that future to its result, and then > send (or throw) that result (as opposed to the future) back into > either the parent task or the least distant ancestor task not to be > using "yield from". > > >> Yield-from: >> result = yield from some_async_op(args) > > So the generator containing this code suspends itself entirely until > some_async_op is exhausted, at which point result will be the > StopIteration? (Or None?) Non-Exception results get passed straight > to the least-distant ancestor task not using "yield from", but > Exceptions propagate through one generation at a time. The result is not an exception, but the return of some_async_op(args) >> (2) Setting the result of an async operation > >> Futures: >> f.set_result(value) # From any callback > > PEP 3148 considers set_result private to the executor. Can that > always be done from arbitrary callbacks? Can it be done more than > once? > > I think for the normal case, a task should just return its value, and > the Future or the Scheduler should be responsible for calling > set_result. I agree >> Yield-from: >> return value # From the outermost generator > > Why only the outermost? I'm guessing it is because everything else is > suspended, and even if a mid-level generator is explicitly re-added to > the task queue, it can't actually continue because of re-entrancy. > > >> (3) Handling an exception >> >> Futures: >> try: >> result = yield some_async_op(args) >> except MyException: >> > > So the scheduler does have to unpack the future, and throw rather than send. > >> (4) Raising an exception as the outcome of an async operation > >> Futures: >> f.set_exception() > > Again, shouldn't the task itself just raise, and let the future (or > the scheduler) call that? > >> Yield-from: >> raise # From any of the generators > > So it doesn't need to be wrapped in a Future, until it needs to cross > back over a "schedule this asynchronously" gulf? 
> >> (5) Having one async operation invoke another async operation > >> Futures: >> @task >> def outer(args): >> res = yield inner(args) >> return res > >> Yield-from: >> def outer(args): >> res = yield from inner(args) >> return res > > Will it ever get to continue processing (under either model) before > inner exhausts itself and stops yielding? > >> Note: I'm including this because in the Futures case, each level of >> yield requires the creation of a separate Future. > > Only because of the auto-unboxing. And if the generator suspends > itself to wait for the future, then the future will be resolved before > control returns to the generator's own parents, so those per-layer > Futures won't really add anything. > >> (6) Spawning off multiple async subtasks >> >> Futures: >> f1 = subtask1(args1) # Note: no yield!!! >> f2 = subtask2(args2) >> res1, res2 = yield f1, f2 > > ah. That makes a bit more sense, though the tuple of futures does > complicate the automagic unboxing. (Which containers, to which > levels, have to be resolved?) > >> Yield-from: >> ?????????? >> >> *** Greg, can you come up with a good idiom to spell concurrency at >> this level? Your example only has concurrency in the philosophers >> example, but it appears to interact directly with the scheduler, and >> the philosophers don't return values. *** > > Why wouldn't this be the same as you already wrote without yield-from? > Two subtasks were submitted but not waited for. I suppose you could > yield from a generator that submits new subtasks every time it > generates something, but that would be solving a more complicated > problem. (So it wouldn't be a consequence of the "yield from".) > > > >> (7) Checking whether an operation is already complete > >> Futures: >> if f.done(): ... > > If f was yielded, it is done, or this code wouldn't be running again to check. > >> Yield-from: >> ????????????? > > And again, if the futures were yielded (even through a yield from) > then they're already unboxed; otherwise, you can still check f.done > >> (8) Getting the result of an operation multiple times >> >> Futures: >> >> f = async_op(args) >> # squirrel away a reference to f somewhere else >> r = yield f >> # ... later, elsewhere >> r = f.result() > > Why do you have to squirrel away the reference? Are you assuming that > the async scheduler will mess with the locals so that f is no longer > valid? > >> Yield-from: >> ??????????????? > > This, you cannot reasonably do; the nature of yield-from means that > the unresolved futures were never visible within this generator; they > were resolved by the scheduler and the results handed straight to the > generator's ancestor. > >> (9) Canceling an operation >> >> Futures: >> f.cancel() >> >> Yield-from: >> ??????????????? >> >> Note: I haven't needed canceling yet, and I believe Devin said that >> Twisted just got rid of it. However some of the JS Deferred >> implementations seem to support it. > > I think that once you've called "yield from", the generator making > that call is suspended until the child generator completes. But a > different thread of control could cancel the active (most-descended) > generator. > >> (10) Registering additional callbacks >> >> Futures: >> f.add_done_callback(callback) >> >> Yield-from: >> ??????? >> >> Note: this is used in NDB to trigger "hooks" that should run e.g. when >> a database write completes. The user's code just writes yield >> ent.put_async(); the trigger is automatically called by the Future's >> machinery. This also uses (8). 
> > I think you would have to do add the callbacks within the subgenerator > that is spawning f. > > That, or un-inline the yield from, and lose the automated send-throw forwarding. > > -jJ > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas -- Read my blog! I depend on your acceptance of my opinion! I am interesting! http://techblog.ironfroggy.com/ Follow me if you're into that sort of thing: http://www.twitter.com/ironfroggy From tismer at stackless.com Fri Oct 19 14:55:57 2012 From: tismer at stackless.com (Christian Tismer) Date: Fri, 19 Oct 2012 14:55:57 +0200 Subject: [Python-ideas] yield from multiple iterables (was Re: The async API of the future: yield-from) In-Reply-To: References: <507B0F0B.8080700@pearwood.info> <507B26C6.10602@stackless.com> <507BA075.4030508@canterbury.ac.nz> <507BC855.4070802@stackless.com> <507BE79D.1090100@stackless.com> <507C1661.5070206@stackless.com> <507CADE6.7050604@canterbury.ac.nz> Message-ID: <50814DDD.9070206@stackless.com> Hi Nick, On 16.10.12 03:49, Nick Coghlan wrote: > On Tue, Oct 16, 2012 at 10:44 AM, Greg Ewing > wrote: >> My original implementation of yield-from actually *did* avoid >> this, by keeping a C-level pointer chain of yielding-from frames. >> But that part was ripped out at the last minute when someone >> discovered that it had a detrimental effect on tracebacks. >> >> There are probably other ways the traceback problem could be >> fixed, so maybe we will get this optimisation back one day. > Ah, I thought I remembered something along those lines. IIRC, it was a > bug report on one of the alphas that prompted us to change it. > I was curious and searched quite a lot. It was v3.3.0a1 from 2012-03-15 as a reaction to #14230 and #14220 from Marc Shannon, patched by Benjamin. Now I found the original implementation. That looks very much as I'm thinking it should be. Quite a dramatic change which works well, but really seems to remove what I would call "now I can emulate most of Stackless" efficiently. Maybe I should just try to think it would be implemented as before, build an abstraction and just use it for now. I will spend my time at PyCon de for sprinting on "yield from". cheers - chris -- Christian Tismer :^) Software Consulting : Have a break! Take a ride on Python's Karl-Liebknecht-Str. 121 : *Starship* http://starship.python.net/ 14482 Potsdam : PGP key -> http://pgp.uni-mainz.de phone +49 173 24 18 776 fax +49 (30) 700143-0023 PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04 whom do you want to sponsor today? http://www.stackless.com/ From ncoghlan at gmail.com Fri Oct 19 15:56:04 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 19 Oct 2012 23:56:04 +1000 Subject: [Python-ideas] yield from multiple iterables (was Re: The async API of the future: yield-from) In-Reply-To: <50814DDD.9070206@stackless.com> References: <507B0F0B.8080700@pearwood.info> <507B26C6.10602@stackless.com> <507BA075.4030508@canterbury.ac.nz> <507BC855.4070802@stackless.com> <507BE79D.1090100@stackless.com> <507C1661.5070206@stackless.com> <507CADE6.7050604@canterbury.ac.nz> <50814DDD.9070206@stackless.com> Message-ID: On Fri, Oct 19, 2012 at 10:55 PM, Christian Tismer wrote: > I was curious and searched quite a lot. > It was v3.3.0a1 from 2012-03-15 as a reaction to #14230 and #14220 > from Marc Shannon, patched by Benjamin. > > Now I found the original implementation. That looks very much > as I'm thinking it should be. 
> > Quite a dramatic change which works well, but really seems to remove > what I would call "now I can emulate most of Stackless" efficiently. > > Maybe I should just try to think it would be implemented as before, > build an abstraction and just use it for now. > > I will spend my time at PyCon de for sprinting on "yield from". Yeah, if we can get Greg's original optimised behaviour while still supporting introspection properly, that's really where we want to be. That's the main reason I'm a fan of Mark's other patches moving more of the generator state from the frame objects out into the generator objects - my suspicion is that generator objects themselves need to be maintaining a full "generator stack" independent of the frame stack in the main eval loop in order to get the best of both worlds (i.e. optimised suspend/resume with confusing debuggers). Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From breamoreboy at yahoo.co.uk Fri Oct 19 16:05:54 2012 From: breamoreboy at yahoo.co.uk (Mark Lawrence) Date: Fri, 19 Oct 2012 15:05:54 +0100 Subject: [Python-ideas] yield from multiple iterables (was Re: The async API of the future: yield-from) In-Reply-To: References: <507B0F0B.8080700@pearwood.info> <507B26C6.10602@stackless.com> <507BA075.4030508@canterbury.ac.nz> <507BC855.4070802@stackless.com> <507BE79D.1090100@stackless.com> <507C1661.5070206@stackless.com> <507CADE6.7050604@canterbury.ac.nz> <50814DDD.9070206@stackless.com> Message-ID: On 19/10/2012 14:56, Nick Coghlan wrote: > On Fri, Oct 19, 2012 at 10:55 PM, Christian Tismer wrote: >> I was curious and searched quite a lot. >> It was v3.3.0a1 from 2012-03-15 as a reaction to #14230 and #14220 >> from Marc Shannon, patched by Benjamin. >> >> Now I found the original implementation. That looks very much >> as I'm thinking it should be. >> >> Quite a dramatic change which works well, but really seems to remove >> what I would call "now I can emulate most of Stackless" efficiently. >> >> Maybe I should just try to think it would be implemented as before, >> build an abstraction and just use it for now. >> >> I will spend my time at PyCon de for sprinting on "yield from". > > Yeah, if we can get Greg's original optimised behaviour while still > supporting introspection properly, that's really where we want to be. > That's the main reason I'm a fan of Mark's other patches moving more > of the generator state from the frame objects out into the generator > objects - my suspicion is that generator objects themselves need to be > maintaining a full "generator stack" independent of the frame stack in > the main eval loop in order to get the best of both worlds (i.e. > optimised suspend/resume with confusing debuggers). > > Cheers, > Nick. > There's nothing like confusing debuggers or have I read that wrong? :) -- Cheers. Mark Lawrence. From ncoghlan at gmail.com Fri Oct 19 16:16:01 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 20 Oct 2012 00:16:01 +1000 Subject: [Python-ideas] yield from multiple iterables (was Re: The async API of the future: yield-from) In-Reply-To: References: <507B0F0B.8080700@pearwood.info> <507B26C6.10602@stackless.com> <507BA075.4030508@canterbury.ac.nz> <507BC855.4070802@stackless.com> <507BE79D.1090100@stackless.com> <507C1661.5070206@stackless.com> <507CADE6.7050604@canterbury.ac.nz> <50814DDD.9070206@stackless.com> Message-ID: On Sat, Oct 20, 2012 at 12:05 AM, Mark Lawrence wrote: > There's nothing like confusing debuggers or have I read that wrong? 
:) Yeah, that was the main issue that resulted in the design change - the optimised approach confused a lot of the introspection machinery. So the challenge is to restore the optimisation while *also* adding in mechanisms to preserve the introspection support. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From guido at python.org Fri Oct 19 18:05:05 2012 From: guido at python.org (Guido van Rossum) Date: Fri, 19 Oct 2012 09:05:05 -0700 Subject: [Python-ideas] The async API of the future Message-ID: Work priorities don't allow me to spend another day replying in detail to the various emails on this topic, but I am still keeping up reading! I have read Greg's response to my comparison between Future+yield-based coroutines and his yield-from-based, Future-free coroutines, and after having written a small prototype, I am now pretty much convinced that Greg's way is superior. This doesn't mean you can't use generators or yield-from for other purposes! It's just that *if* you are writing a coroutine for use with a certain schedule, you must use yield and yield-from in accordance to the scheduler's rules. However, code you call can still use yield and yield-from for iteration, and you can still use for-loops. In particular, if f is a coroutine, it can still write "for x in g(): ..." where g is a generator meant to be an iterator. However if g were instead a coroutine, f should call it using "yield from g()", and f and g should agree on the interface of their scheduler. As to other topics, my current feeling is that we should try to separately develop requirements and prototype implementations of the I/O loop of the future, and to figure the loosest possible coupling between that and a coroutine scheduler (or any other type of scheduler). In particular, I think the I/O loop should not assume the event handlers are implemented using coroutines -- but if someone wants to write an awesome coroutine scheduler, they should be able to delegate all their I/O waiting needs to the I/O loop with very little trouble. To me, this means that the I/O loop probably should use "plain" callback functions (i.e., not Futures, Deferreds or coroutines). We should also standardize the interface to the I/O loop so that 3rd parties can plug in their own I/O loop -- I don't see an end to the debate whether the best C library for event handling is libevent, libev or libuv. While the focus of the I/O loop should be on single-threaded event handling, some standard interface should exist so that you can run certain code in a separate thread and wait for its completion -- I've found this handy when calling socket.getaddrinfo(), which may block. (Apparently async DNS lookups are really hard -- I read some complaints about libevent's DNS lookups, and IIUC many Firefox lockups are due to this.) But there may be other uses for this too. An issue in the design of the I/O loop is the strain between a ready-based and completion-based design. The typical Unix design (whether based on select or any of the poll variants) is usually ready-based; but on Windows, the only way to get high performance is to base it on IOCP, which is completion-based (i.e. you start a specific async operation, like writing N bytes, and the I/O loop tells you when it is done). I would like people to be able to write fast event handling programs on Windows too, and ideally the only change would be the implementation of the I/O loop. 
But I don't know how tenable that is given the dramatically different style used by IOCP and the need to use native Windows API for all async I/O -- it sounds like we could only do this if the library providing the I/O loop implementation also wrapped all I/O operations, andthat may be a bit much. Finally, there should also be some minimal interface so that multiple I/O loops can interact -- at least in the case where one I/O loop belongs to a GUI library. It seems this is a solved problem (as well solved as you can hope for) to Twisted, so we should just adopt their approach. -- --Guido van Rossum (python.org/~guido) From guido at python.org Fri Oct 19 18:07:24 2012 From: guido at python.org (Guido van Rossum) Date: Fri, 19 Oct 2012 09:07:24 -0700 Subject: [Python-ideas] yield from multiple iterables (was Re: The async API of the future: yield-from) In-Reply-To: <50814200.2050903@stackless.com> References: <507B0F0B.8080700@pearwood.info> <507B26C6.10602@stackless.com> <507BA075.4030508@canterbury.ac.nz> <507BC855.4070802@stackless.com> <507BE79D.1090100@stackless.com> <507C1661.5070206@stackless.com> <507CADE6.7050604@canterbury.ac.nz> <5080A91A.3020804@stackless.com> <5080E1F5.7090709@canterbury.ac.nz> <50814200.2050903@stackless.com> Message-ID: On Fri, Oct 19, 2012 at 5:05 AM, Christian Tismer wrote: > On 19.10.12 07:15, Greg Ewing wrote: >> >> Christian Tismer wrote: >> >>> - generators are able to free the stack, when they yield. But when they >>> are active, they use the full stack. At least when I follow the >>> pattern >>> "generator is calling sub-generator". >>> A deeply nested recursion is therefore something to avoid. :-( >> >> >> Only if yield-from chains aren't optimised the way they >> used to be. > > > Does that mean a very deep recursion would be efficient? TBH, I am not interested in making very deep recursion work at all. If you need that, you're doing it wrong in my opinion. > I'm trying to find that change in the hg history right now. > > Can you give me a hint how your initial implementation > works, the initial patch source? >> >> >> ... >> >>> But this function that wants to >>> switch needs to pass the fact that it wants to switch, plus the target >>> somewhere. As I understood it, I would need to yield that to the >>> driver function. >> >> >> You understand incorrectly. In my scheduler, the yields >> don't send or receive values at all. Communicating with the >> scheduler, for example to tell it to allow another task to >> run, is done by calling functions. A yield must be done to >> actually allow a switch, but the yield itself doesn't send >> any information. > > > I have studied that yesterday already in depth and like that quite much. > It is probably just the problem that I had with generators from their > beginning. > > > -- > Christian Tismer :^) > Software Consulting : Have a break! Take a ride on Python's > Karl-Liebknecht-Str. 121 : *Starship* http://starship.python.net/ > 14482 Potsdam : PGP key -> http://pgp.uni-mainz.de > phone +49 173 24 18 776 fax +49 (30) 700143-0023 > PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04 > whom do you want to sponsor today? 
http://www.stackless.com/ > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas -- --Guido van Rossum (python.org/~guido) From tismer at stackless.com Fri Oct 19 18:18:42 2012 From: tismer at stackless.com (Christian Tismer) Date: Fri, 19 Oct 2012 18:18:42 +0200 Subject: [Python-ideas] yield from multiple iterables (was Re: The async API of the future: yield-from) In-Reply-To: References: <507B0F0B.8080700@pearwood.info> <507B26C6.10602@stackless.com> <507BA075.4030508@canterbury.ac.nz> <507BC855.4070802@stackless.com> <507BE79D.1090100@stackless.com> <507C1661.5070206@stackless.com> <507CADE6.7050604@canterbury.ac.nz> <50814DDD.9070206@stackless.com> Message-ID: <50817D62.4040607@stackless.com> On 19.10.12 15:56, Nick Coghlan wrote: > On Fri, Oct 19, 2012 at 10:55 PM, Christian Tismer wrote: >> I was curious and searched quite a lot. >> It was v3.3.0a1 from 2012-03-15 as a reaction to #14230 and #14220 >> from Marc Shannon, patched by Benjamin. >> >> Now I found the original implementation. That looks very much >> as I'm thinking it should be. >> >> Quite a dramatic change which works well, but really seems to remove >> what I would call "now I can emulate most of Stackless" efficiently. >> >> Maybe I should just try to think it would be implemented as before, >> build an abstraction and just use it for now. >> >> I will spend my time at PyCon de for sprinting on "yield from". > Yeah, if we can get Greg's original optimised behaviour while still > supporting introspection properly, that's really where we want to be. > That's the main reason I'm a fan of Mark's other patches moving more > of the generator state from the frame objects out into the generator > objects - my suspicion is that generator objects themselves need to be > maintaining a full "generator stack" independent of the frame stack in > the main eval loop in order to get the best of both worlds (i.e. > optimised suspend/resume with confusing debuggers). That may be very true in order to get real generators. The storm in my brain is quite intense the last days... Actually I would like to have a python context where it gets into "async mode" and interprets all functions defined in that mode as generators. In that mode, generators are not meant as generators, but async-enabled functions. I see "yield from" as a low-level construct that should not even be exposed, but be applied automatically in async mode. That way, we could write normal functions and could implement a real "Yield" without the "yield from" helper visible everywhere. Not sure how to do that right. I'm playing with AST a bit to get a feeling for this. To give you an idea where my thoughts are meandering around, I would like to point you at http://doc.pypy.org/en/latest/stackless.html That is an implementation that comes close to what I'm thinking. The drawback of the current PyPy implementation is that it used greenlet style for its underlying switching. That is what I want to replace with some "yield from" construct. cheers - chris -- Christian Tismer :^) Software Consulting : Have a break! Take a ride on Python's Karl-Liebknecht-Str. 121 : *Starship* http://starship.python.net/ 14482 Potsdam : PGP key -> http://pgp.uni-mainz.de phone +49 173 24 18 776 fax +49 (30) 700143-0023 PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04 whom do you want to sponsor today? 
http://www.stackless.com/ From tismer at stackless.com Fri Oct 19 18:50:39 2012 From: tismer at stackless.com (Christian Tismer) Date: Fri, 19 Oct 2012 18:50:39 +0200 Subject: [Python-ideas] yield from multiple iterables (was Re: The async API of the future: yield-from) In-Reply-To: References: <507B0F0B.8080700@pearwood.info> <507B26C6.10602@stackless.com> <507BA075.4030508@canterbury.ac.nz> <507BC855.4070802@stackless.com> <507BE79D.1090100@stackless.com> <507C1661.5070206@stackless.com> <507CADE6.7050604@canterbury.ac.nz> <5080A91A.3020804@stackless.com> <5080E1F5.7090709@canterbury.ac.nz> <50814200.2050903@stackless.com> Message-ID: <508184DF.1050907@stackless.com> On 19.10.12 18:07, Guido van Rossum wrote: > On Fri, Oct 19, 2012 at 5:05 AM, Christian Tismer wrote: >> On 19.10.12 07:15, Greg Ewing wrote: >>> Christian Tismer wrote: >>> >>>> - generators are able to free the stack, when they yield. But when they >>>> are active, they use the full stack. At least when I follow the >>>> pattern >>>> "generator is calling sub-generator". >>>> A deeply nested recursion is therefore something to avoid. :-( >>> >>> Only if yield-from chains aren't optimised the way they >>> used to be. >> >> Does that mean a very deep recursion would be efficient? > TBH, I am not interested in making very deep recursion work at all. If > you need that, you're doing it wrong in my opinion. Misunderstanding I think. Of course I don't want to use deep recursion. But people might write things that happen several levels deep and then iterating over lots of stuff. A true generator would have no problem with that. Assume just five layers of generators that have to be re-invoked for a tight yielding loop is quite some overhead that can be avoided. The reason why I care is that existing implementations that use greenlet style could be turned into pure python, given that I manage to write the right support functions, and replace all functions by generators that emulate functions with async behavior. It would just be great if that worked at the same speed, independent from at which stack level an iteration happens. Agreed that new code like that would be bad style. ciao - chris -- Christian Tismer :^) Software Consulting : Have a break! Take a ride on Python's Karl-Liebknecht-Str. 121 : *Starship* http://starship.python.net/ 14482 Potsdam : PGP key -> http://pgp.uni-mainz.de phone +49 173 24 18 776 fax +49 (30) 700143-0023 PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04 whom do you want to sponsor today? http://www.stackless.com/ From guido at python.org Fri Oct 19 19:18:38 2012 From: guido at python.org (Guido van Rossum) Date: Fri, 19 Oct 2012 10:18:38 -0700 Subject: [Python-ideas] yield from multiple iterables (was Re: The async API of the future: yield-from) In-Reply-To: <508184DF.1050907@stackless.com> References: <507B0F0B.8080700@pearwood.info> <507B26C6.10602@stackless.com> <507BA075.4030508@canterbury.ac.nz> <507BC855.4070802@stackless.com> <507BE79D.1090100@stackless.com> <507C1661.5070206@stackless.com> <507CADE6.7050604@canterbury.ac.nz> <5080A91A.3020804@stackless.com> <5080E1F5.7090709@canterbury.ac.nz> <50814200.2050903@stackless.com> <508184DF.1050907@stackless.com> Message-ID: On Fri, Oct 19, 2012 at 9:50 AM, Christian Tismer wrote: > On 19.10.12 18:07, Guido van Rossum wrote: >> >> On Fri, Oct 19, 2012 at 5:05 AM, Christian Tismer >> wrote: >>> >>> On 19.10.12 07:15, Greg Ewing wrote: >>>> >>>> Christian Tismer wrote: >>>> >>>>> - generators are able to free the stack, when they yield. 
But when they >>>>> are active, they use the full stack. At least when I follow the >>>>> pattern >>>>> "generator is calling sub-generator". >>>>> A deeply nested recursion is therefore something to avoid. :-( >>>> >>>> >>>> Only if yield-from chains aren't optimised the way they >>>> used to be. >>> >>> >>> Does that mean a very deep recursion would be efficient? >> >> TBH, I am not interested in making very deep recursion work at all. If >> you need that, you're doing it wrong in my opinion. > > > Misunderstanding I think. Of course I don't want to use deep recursion. > But people might write things that happen several levels deep and > then iterating over lots of stuff. A true generator would have no > problem with that. Okay, good. I agree that this use case should be as fast as possible -- as long as we still see every frame involved when a traceback is printed. > Assume just five layers of generators that have to be re-invoked > for a tight yielding loop is quite some overhead that can be avoided. > > The reason why I care is that existing implementations that use > greenlet style could be turned into pure python, given that I manage > to write the right support functions, and replace all functions by > generators that emulate functions with async behavior. > > It would just be great if that worked at the same speed, independent > from at which stack level an iteration happens. Yup. > Agreed that new code like that would be bad style. Like "what"? -- --Guido van Rossum (python.org/~guido) From tismer at stackless.com Fri Oct 19 19:36:39 2012 From: tismer at stackless.com (Christian Tismer) Date: Fri, 19 Oct 2012 19:36:39 +0200 Subject: [Python-ideas] yield from multiple iterables (was Re: The async API of the future: yield-from) In-Reply-To: References: <507B0F0B.8080700@pearwood.info> <507B26C6.10602@stackless.com> <507BA075.4030508@canterbury.ac.nz> <507BC855.4070802@stackless.com> <507BE79D.1090100@stackless.com> <507C1661.5070206@stackless.com> <507CADE6.7050604@canterbury.ac.nz> <5080A91A.3020804@stackless.com> <5080E1F5.7090709@canterbury.ac.nz> <50814200.2050903@stackless.com> <508184DF.1050907@stackless.com> Message-ID: <50818FA7.7000000@stackless.com> On 19.10.12 19:18, Guido van Rossum wrote: > On Fri, Oct 19, 2012 at 9:50 AM, Christian Tismer wrote: >> On 19.10.12 18:07, Guido van Rossum wrote: >>> ... >>> TBH, I am not interested in making very deep recursion work at all. If >>> you need that, you're doing it wrong in my opinion. >> ... >> Agreed that new code like that would be bad style. > Like "what"? > Like code that excercises deep recursion thoughtlessly ;-) in contrast to code that happens to be quite nested because of a systematic transformation. So correctness first, big Oh later. -- Christian Tismer :^) Software Consulting : Have a break! Take a ride on Python's Karl-Liebknecht-Str. 121 : *Starship* http://starship.python.net/ 14482 Potsdam : PGP key -> http://pgp.uni-mainz.de phone +49 173 24 18 776 fax +49 (30) 700143-0023 PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04 whom do you want to sponsor today? 
http://www.stackless.com/ From jimjjewett at gmail.com Fri Oct 19 22:10:00 2012 From: jimjjewett at gmail.com (Jim Jewett) Date: Fri, 19 Oct 2012 16:10:00 -0400 Subject: [Python-ideas] The async API of the future: yield-from In-Reply-To: References: Message-ID: On 10/19/12, Calvin Spealman wrote: > On Thu, Oct 18, 2012 at 11:46 PM, Jim Jewett wrote: >> [I think the yield solutions are (too magic)/(prematurely lock too >> much policy) to be "The" API, but work fine as "an example API"] > I think it is important that this is more than convention. ... Our > focus should be not on providing simple things like "async file read" but > crafting an environment where people can continue to write wonderfully > expressive and useful libraries that others can combine to their own needs. And I think that adding (requirements for generator usage) / (implied meaning of yield) prevents that. >> On 10/12/12, Guido van Rossum wrote: >>> But the only use for send() on a generator is when using it as a >>> coroutine for a concurrent tasks system -- send() really makes no >>> sense for generators used as iterators. >> But the data doesn't have to be scheduling information; it can be new >> data, a seed for an algorithm, a command to switch or reset the state >> ... locking it to the scheduler is part of what worries me. > When a coroutine yields, it yields *to the scheduler* so for whom else > should these values be? Who says that there has to be a scheduler? Or at least a single scheduler? To me, the "obvious" solution is that each co-routine is "scheduled" only by its own caller, and runs on its own micro-thread. The caller thread may or may not wait for a result to be yielded, but would not normally wait for the entire generator to be exhausted forever (the "return"). The next call to the co-routine may well be from an entirely different caller, particularly if the co-routine is a generic source or sink. There may well be several other co-routines (as opposed to a single scheduler) that enforce policy, and may send messages about things like "switch to that source of randomness", "start using this other database instance as a backup", "stop listening on that port". They would certainly want to use throw, and perhaps send as well. In practice, creating a full thread for each such co-routine probably won't work well under current threading systems, because an OS thread (let alone an OS process) is too heavy-weight. And without OS support, python has to do some internal scheduling. But I'm not convinced that the current situation will last forever, so I don't want to muddy up the *abstraction* just to coddle temporary limitations. >> But with the yield channel reserved for scheduling overhead, the >> "generator" can't really generate anything, except through side >> effects... > Don't forget that yield-from is an expression, not a statement. The > value eventually returned from the generator is the result of the > yield-from, so the generator still produces a final value. Assuming it terminates, then yes. But that isn't (conceptually) a generator; it is an ordinary function call. > The fact that these are generators is for their ability to suspend, not to > iterate. So "generator" is not really the right term. Abusing that for one module is acceptable, but I'm reluctant to bake that change into an officially sanctioned API, let alone one important enough that it might eventually be seen as the primary definition. 
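For illustration, a minimal sketch (invented here, not part of the thread) of a generator used as a data-consuming coroutine, where send() carries real values rather than scheduling information:

def averager():
    # A coroutine that consumes numbers via send() and yields the running mean.
    total = 0.0
    count = 0
    mean = None
    while True:
        value = yield mean      # the value passed to send() appears here
        total += value
        count += 1
        mean = total / count

avg = averager()
next(avg)               # prime the coroutine up to the first yield
print(avg.send(10))     # 10.0
print(avg.send(20))     # 15.0
print(avg.send(30))     # 20.0
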
>> * "mostly", because if my task is willing to wait for the subtask to >> complete, then why not just use a blocking call in the first place? >> Is it just because switching to another task is lighter weight than >> letting a thread block? > By blocking call do you mean "x = foo()" or "x = yield from foo()"? > Blocking call usually means the former, so if you mean that, then you > neglect to think of all the other tasks running which are not willing to wait. Exactly. From my own code's perspective, is there any difference between those two? (Well, besides the fact that the second is wordier, and puts more constraints on what I can use for foo.) So why not just use the first spelling, let the (possibly OS-level) scheduler notice that I'm blocked (if I happen to be), and let it suspend my thread waiting on foo? Is it just that *current* ways to suspend a thread of execution are expensive, and we hope to do it more cheaply? If so, that is a perfectly sensible justification for conventions within a single stdlib module. But since the trade-offs may change with time, the current costs shouldn't drive decisions about the async API, let alone changes to the meaning of "yield" or "generator". >> [Questions about generators that do not follow the new constraints] > I think if the scheduler doesn't know what to do with something, it should > be an error. That makes it easier to change things in the future. Those were all things that could reasonably happen simply by reusing correct existing code. For a specific implementation, even a stdlib module, it is OK to treat them as errors; a specific module can always be viewed as incomplete. But for "the asynchronous API of the future", undefined behavior just guarantees warts. We may eventually decide that the warts are in the existing legacy code, but there would still be warts. -jJ From guido at python.org Fri Oct 19 22:22:55 2012 From: guido at python.org (Guido van Rossum) Date: Fri, 19 Oct 2012 13:22:55 -0700 Subject: [Python-ideas] The async API of the future: yield-from In-Reply-To: References: Message-ID: Jim, relax. We're not changing the meaning of yield or generator. We're just making it *possible* to use yield(-from) and generators as coroutines; that's actually a long path that started with PEP 342. No freedom is taken away by PEP 380; it just adds the possibility to do it without managing an explicit stack of coroutine calls in the scheduler. If we believed that there was no advantage to spelling a blocking call as "yield from foo()", we would just spell it as "foo()" and somehow make it work. But (and even Christian Tismer agrees) there is a problem with the shorter spelling -- you lose track of which calls may cause a task-switch. Using yield-from (or yield, for that matter) for this purpose ensures that all callers in the call chain have to explicitly mark the suspension points, and this serves as a useful reminder that after resumption, the world may look differently, because other tasks may have run in the mean time. 
-- --Guido van Rossum (python.org/~guido) From mark at hotpy.org Fri Oct 19 23:15:38 2012 From: mark at hotpy.org (Mark Shannon) Date: Fri, 19 Oct 2012 22:15:38 +0100 Subject: [Python-ideas] yield from multiple iterables (was Re: The async API of the future: yield-from) In-Reply-To: <50814DDD.9070206@stackless.com> References: <507B0F0B.8080700@pearwood.info> <507B26C6.10602@stackless.com> <507BA075.4030508@canterbury.ac.nz> <507BC855.4070802@stackless.com> <507BE79D.1090100@stackless.com> <507C1661.5070206@stackless.com> <507CADE6.7050604@canterbury.ac.nz> <50814DDD.9070206@stackless.com> Message-ID: <5081C2FA.3050107@hotpy.org> On 19/10/12 13:55, Christian Tismer wrote: > Hi Nick, > > On 16.10.12 03:49, Nick Coghlan wrote: >> On Tue, Oct 16, 2012 at 10:44 AM, Greg Ewing >> wrote: >>> My original implementation of yield-from actually *did* avoid >>> this, by keeping a C-level pointer chain of yielding-from frames. >>> But that part was ripped out at the last minute when someone >>> discovered that it had a detrimental effect on tracebacks. >>> >>> There are probably other ways the traceback problem could be >>> fixed, so maybe we will get this optimisation back one day. >> Ah, I thought I remembered something along those lines. IIRC, it was a >> bug report on one of the alphas that prompted us to change it. >> > > I was curious and searched quite a lot. > It was v3.3.0a1 from 2012-03-15 as a reaction to #14230 and #14220 > from Marc Shannon, patched by Benjamin. > > Now I found the original implementation. That looks very much > as I'm thinking it should be. > > Quite a dramatic change which works well, but really seems to remove > what I would call "now I can emulate most of Stackless" efficiently. > > Maybe I should just try to think it would be implemented as before, > build an abstraction and just use it for now. > > I will spend my time at PyCon de for sprinting on "yield from". > The current implementation may not be much slower than Greg's original version. One of the main costs of making a call is the creation of a new frame. But calling into a generator does not need a new frame, so the cost will be reduced. Unless anyone has evidence to the contrary :) Rather than increasing the performance of this special case, I would suggest that improving the performance of calls & returns in general would be a more worthwhile goal. Calls and returns ought to be cheap. Cheers, Mark From guido at python.org Fri Oct 19 23:31:17 2012 From: guido at python.org (Guido van Rossum) Date: Fri, 19 Oct 2012 14:31:17 -0700 Subject: [Python-ideas] yield from multiple iterables (was Re: The async API of the future: yield-from) In-Reply-To: <5081C2FA.3050107@hotpy.org> References: <507B0F0B.8080700@pearwood.info> <507B26C6.10602@stackless.com> <507BA075.4030508@canterbury.ac.nz> <507BC855.4070802@stackless.com> <507BE79D.1090100@stackless.com> <507C1661.5070206@stackless.com> <507CADE6.7050604@canterbury.ac.nz> <50814DDD.9070206@stackless.com> <5081C2FA.3050107@hotpy.org> Message-ID: On Fri, Oct 19, 2012 at 2:15 PM, Mark Shannon wrote: > On 19/10/12 13:55, Christian Tismer wrote: >> >> Hi Nick, >> >> On 16.10.12 03:49, Nick Coghlan wrote: >>> >>> On Tue, Oct 16, 2012 at 10:44 AM, Greg Ewing >>> wrote: >>>> >>>> My original implementation of yield-from actually *did* avoid >>>> this, by keeping a C-level pointer chain of yielding-from frames. >>>> But that part was ripped out at the last minute when someone >>>> discovered that it had a detrimental effect on tracebacks. 
>>>> >>>> There are probably other ways the traceback problem could be >>>> fixed, so maybe we will get this optimisation back one day. >>> >>> Ah, I thought I remembered something along those lines. IIRC, it was a >>> bug report on one of the alphas that prompted us to change it. >>> >> >> I was curious and searched quite a lot. >> It was v3.3.0a1 from 2012-03-15 as a reaction to #14230 and #14220 >> from Marc Shannon, patched by Benjamin. >> >> Now I found the original implementation. That looks very much >> as I'm thinking it should be. >> >> Quite a dramatic change which works well, but really seems to remove >> what I would call "now I can emulate most of Stackless" efficiently. >> >> Maybe I should just try to think it would be implemented as before, >> build an abstraction and just use it for now. >> >> I will spend my time at PyCon de for sprinting on "yield from". >> > > The current implementation may not be much slower than Greg's original > version. One of the main costs of making a call is the creation of a new > frame. But calling into a generator does not need a new frame, so the cost > will be reduced. > Unless anyone has evidence to the contrary :) > > Rather than increasing the performance of this special case, I would suggest > that improving the performance of calls & returns in general would be a more > worthwhile goal. > Calls and returns ought to be cheap. I did a basic timing test using a simple recursive function and a recursive PEP-380 coroutine computing the same value (see attachment). The coroutine version is a little over twice as slow as the function version. I find that acceptable. This went 20 deep, making 2 recursive calls at each level (except at the deepest level). Output on my MacBook Pro: plain 2097151 0.5880069732666016 coro. 2097151 1.2958409786224365 This was a Python 3.3 built a few days ago from the 3.3 branch. -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- A non-text attachment was scrubbed... Name: p3time.py Type: application/octet-stream Size: 675 bytes Desc: not available URL: From tismer at stackless.com Sat Oct 20 00:31:15 2012 From: tismer at stackless.com (Christian Tismer) Date: Sat, 20 Oct 2012 00:31:15 +0200 Subject: [Python-ideas] yield from multiple iterables (was Re: The async API of the future: yield-from) In-Reply-To: References: <507B0F0B.8080700@pearwood.info> <507B26C6.10602@stackless.com> <507BA075.4030508@canterbury.ac.nz> <507BC855.4070802@stackless.com> <507BE79D.1090100@stackless.com> <507C1661.5070206@stackless.com> <507CADE6.7050604@canterbury.ac.nz> <50814DDD.9070206@stackless.com> <5081C2FA.3050107@hotpy.org> Message-ID: Hi Guido, Marc, all, this is a veery promising result, telling me that the big Oh can in fact be neglected in real applications. 20 for twi is good! I will of course do an analysis and find the parameters of the quadratic, but my concern is pretty much tamed. For me that means there will soon be a library that contains real generators and more building blocks. I think using those would simplify the design of the async API quite a lot. I suggest to regard current generator constructs as low-level helpers for Implementing the real concurrency building blocks. Instead of using the existing re.compile("yield (from)?") pattern, I think we can abstract from this now and think in terms of higher level constructs. Let's assume generators and coroutines, and model concurrency from that. I believe this unwinds the brains and clarifies things a lot. 
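The p3time.py attachment itself is not reproduced in the archive, but a guess at the shape of the comparison (a plain recursive function against a PEP-380 coroutine, 20 levels deep with two recursive calls per level, both computing 2**21 - 1 = 2097151; the yield once per call is an assumption that matches the next()-call counts discussed later in the thread) might look like:

import time

def plain(depth):
    if depth == 0:
        return 1
    return 1 + plain(depth - 1) + plain(depth - 1)

def coro(depth):
    yield                        # one suspension per call (assumed)
    if depth == 0:
        return 1
    a = yield from coro(depth - 1)
    b = yield from coro(depth - 1)
    return 1 + a + b

def run(gen):
    try:
        while True:
            next(gen)
    except StopIteration as exc:
        return exc.value

t0 = time.time(); r = plain(20); print('plain', r, time.time() - t0)
t0 = time.time(); r = run(coro(20)); print('coro.', r, time.time() - t0)
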
I will provide sone classes for that at the pycon.de sprint, unless somebody implements it earlier (please don't). This email was written in non-linear order, so please ignore logic inversions. Cheers - chris Sent from my Ei4Steve On Oct 19, 2012, at 23:31, Guido van Rossum wrote: > On Fri, Oct 19, 2012 at 2:15 PM, Mark Shannon wrote: >> On 19/10/12 13:55, Christian Tismer wrote: >>> >>> Hi Nick, >>> >>> On 16.10.12 03:49, Nick Coghlan wrote: >>>> >>>> On Tue, Oct 16, 2012 at 10:44 AM, Greg Ewing >>>> wrote: >>>>> >>>>> My original implementation of yield-from actually *did* avoid >>>>> this, by keeping a C-level pointer chain of yielding-from frames. >>>>> But that part was ripped out at the last minute when someone >>>>> discovered that it had a detrimental effect on tracebacks. >>>>> >>>>> There are probably other ways the traceback problem could be >>>>> fixed, so maybe we will get this optimisation back one day. >>>> >>>> Ah, I thought I remembered something along those lines. IIRC, it was a >>>> bug report on one of the alphas that prompted us to change it. >>> >>> I was curious and searched quite a lot. >>> It was v3.3.0a1 from 2012-03-15 as a reaction to #14230 and #14220 >>> from Marc Shannon, patched by Benjamin. >>> >>> Now I found the original implementation. That looks very much >>> as I'm thinking it should be. >>> >>> Quite a dramatic change which works well, but really seems to remove >>> what I would call "now I can emulate most of Stackless" efficiently. >>> >>> Maybe I should just try to think it would be implemented as before, >>> build an abstraction and just use it for now. >>> >>> I will spend my time at PyCon de for sprinting on "yield from". >> >> The current implementation may not be much slower than Greg's original >> version. One of the main costs of making a call is the creation of a new >> frame. But calling into a generator does not need a new frame, so the cost >> will be reduced. >> Unless anyone has evidence to the contrary :) >> >> Rather than increasing the performance of this special case, I would suggest >> that improving the performance of calls & returns in general would be a more >> worthwhile goal. >> Calls and returns ought to be cheap. > > I did a basic timing test using a simple recursive function and a > recursive PEP-380 coroutine computing the same value (see attachment). > The coroutine version is a little over twice as slow as the function > version. I find that acceptable. This went 20 deep, making 2 recursive > calls at each level (except at the deepest level). > > Output on my MacBook Pro: > > plain 2097151 0.5880069732666016 > coro. 2097151 1.2958409786224365 > > This was a Python 3.3 built a few days ago from the 3.3 branch. 
> > -- > --Guido van Rossum (python.org/~guido) > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas From tismer at stackless.com Sat Oct 20 00:45:15 2012 From: tismer at stackless.com (Christian Tismer) Date: Sat, 20 Oct 2012 00:45:15 +0200 Subject: [Python-ideas] yield from multiple iterables (was Re: The async API of the future: yield-from) In-Reply-To: References: <507B0F0B.8080700@pearwood.info> <507B26C6.10602@stackless.com> <507BA075.4030508@canterbury.ac.nz> <507BC855.4070802@stackless.com> <507BE79D.1090100@stackless.com> <507C1661.5070206@stackless.com> <507CADE6.7050604@canterbury.ac.nz> <50814DDD.9070206@stackless.com> <5081C2FA.3050107@hotpy.org> Message-ID: <7BAFF9D3-CFE6-48BC-8721-0F520FF1C924@stackless.com> s/twi/two/ Sent from my Ei4Steve On Oct 20, 2012, at 0:31, Christian Tismer wrote: > Hi Guido, Marc, all, > > this is a veery promising result, telling > me that the big Oh can in fact be > neglected in real applications. 20 for > twi is good! > > I will of course do an analysis and find > the parameters of the quadratic, but my > concern is pretty much tamed. > > For me that means there will soon be > a library that contains real generators > and more building blocks. > > I think using those would simplify the > design of the async API quite a lot. > > I suggest to regard current generator > constructs as low-level helpers for > Implementing the real concurrency > building blocks. > > Instead of using the existing re.compile("yield (from)?") pattern, I think we can abstract > from this now and think in terms of > higher level constructs. > > Let's assume generators and coroutines, > and model concurrency from that. I > believe this unwinds the brains and > clarifies things a lot. > > I will provide sone classes for that at > the pycon.de sprint, unless somebody > implements it earlier (please don't). > > This email was written in non-linear > order, so please ignore logic inversions. > > Cheers - chris > > Sent from my Ei4Steve > > On Oct 19, 2012, at 23:31, Guido van Rossum wrote: > >> On Fri, Oct 19, 2012 at 2:15 PM, Mark Shannon wrote: >>> On 19/10/12 13:55, Christian Tismer wrote: >>>> >>>> Hi Nick, >>>> >>>> On 16.10.12 03:49, Nick Coghlan wrote: >>>>> >>>>> On Tue, Oct 16, 2012 at 10:44 AM, Greg Ewing >>>>> wrote: >>>>>> >>>>>> My original implementation of yield-from actually *did* avoid >>>>>> this, by keeping a C-level pointer chain of yielding-from frames. >>>>>> But that part was ripped out at the last minute when someone >>>>>> discovered that it had a detrimental effect on tracebacks. >>>>>> >>>>>> There are probably other ways the traceback problem could be >>>>>> fixed, so maybe we will get this optimisation back one day. >>>>> >>>>> Ah, I thought I remembered something along those lines. IIRC, it was a >>>>> bug report on one of the alphas that prompted us to change it. >>>> >>>> I was curious and searched quite a lot. >>>> It was v3.3.0a1 from 2012-03-15 as a reaction to #14230 and #14220 >>>> from Marc Shannon, patched by Benjamin. >>>> >>>> Now I found the original implementation. That looks very much >>>> as I'm thinking it should be. >>>> >>>> Quite a dramatic change which works well, but really seems to remove >>>> what I would call "now I can emulate most of Stackless" efficiently. >>>> >>>> Maybe I should just try to think it would be implemented as before, >>>> build an abstraction and just use it for now. 
>>>> >>>> I will spend my time at PyCon de for sprinting on "yield from". >>> >>> The current implementation may not be much slower than Greg's original >>> version. One of the main costs of making a call is the creation of a new >>> frame. But calling into a generator does not need a new frame, so the cost >>> will be reduced. >>> Unless anyone has evidence to the contrary :) >>> >>> Rather than increasing the performance of this special case, I would suggest >>> that improving the performance of calls & returns in general would be a more >>> worthwhile goal. >>> Calls and returns ought to be cheap. >> >> I did a basic timing test using a simple recursive function and a >> recursive PEP-380 coroutine computing the same value (see attachment). >> The coroutine version is a little over twice as slow as the function >> version. I find that acceptable. This went 20 deep, making 2 recursive >> calls at each level (except at the deepest level). >> >> Output on my MacBook Pro: >> >> plain 2097151 0.5880069732666016 >> coro. 2097151 1.2958409786224365 >> >> This was a Python 3.3 built a few days ago from the 3.3 branch. >> >> -- >> --Guido van Rossum (python.org/~guido) >> >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> http://mail.python.org/mailman/listinfo/python-ideas > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas From greg.ewing at canterbury.ac.nz Sat Oct 20 01:02:17 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sat, 20 Oct 2012 12:02:17 +1300 Subject: [Python-ideas] yield from multiple iterables (was Re: The async API of the future: yield-from) In-Reply-To: <50814200.2050903@stackless.com> References: <507B0F0B.8080700@pearwood.info> <507B26C6.10602@stackless.com> <507BA075.4030508@canterbury.ac.nz> <507BC855.4070802@stackless.com> <507BE79D.1090100@stackless.com> <507C1661.5070206@stackless.com> <507CADE6.7050604@canterbury.ac.nz> <5080A91A.3020804@stackless.com> <5080E1F5.7090709@canterbury.ac.nz> <50814200.2050903@stackless.com> Message-ID: <5081DBF9.7020200@canterbury.ac.nz> Christian Tismer wrote: > Can you give me a hint how your initial implementation > works, the initial patch source? You can find my initial patches here: http://www.cosc.canterbury.ac.nz/greg.ewing/python/generators/yield_from.html Essentially, an extra field f_yieldfrom is added to frame objects. When a 'yield from' is started, the f_yieldfrom field of the calling frame is set to point to the called frame. The __next__ method of a generator first traverses the f_yieldfrom chain to find the frame at the end, and then resumes that frame. So most of the time, only the innermost frame of a nested yield-from chain is actually entered in response to a next() call. (There are some complications due to the fact that you can 'yield from' something that's not a generator, but the above is effectively what happens when all the objects in the chain are generators.) -- Greg From greg.ewing at canterbury.ac.nz Sat Oct 20 01:29:25 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sat, 20 Oct 2012 12:29:25 +1300 Subject: [Python-ideas] The async API of the future: yield-from In-Reply-To: References: Message-ID: <5081E255.7000107@canterbury.ac.nz> Calvin Spealman wrote: > I think it is important that this is more than convention. 
I think that we > need our old friend TOOOWTDI (There's Only One Obvious Way To Do It) > here more than ever. This is part of the reason that I don't like the idea of controlling the scheduler by yielding instructions to it. There are a great many ways that such a "scheduler instruction set" could be designed, none of them any more obvious than the others. So rather than single out an arbitrarily chosen set of operations to be regarded as primitives that the scheduler knows about directly, I would rather have *no* such primitives in the public API. -- Greg From guido at python.org Sat Oct 20 01:39:05 2012 From: guido at python.org (Guido van Rossum) Date: Fri, 19 Oct 2012 16:39:05 -0700 Subject: [Python-ideas] The async API of the future: yield-from In-Reply-To: <5081E255.7000107@canterbury.ac.nz> References: <5081E255.7000107@canterbury.ac.nz> Message-ID: On Fri, Oct 19, 2012 at 4:29 PM, Greg Ewing wrote: > Calvin Spealman wrote: > >> I think it is important that this is more than convention. I think that we >> need our old friend TOOOWTDI (There's Only One Obvious Way To Do It) >> here more than ever. > > > This is part of the reason that I don't like the idea of > controlling the scheduler by yielding instructions to it. > There are a great many ways that such a "scheduler instruction > set" could be designed, none of them any more obvious than > the others. > > So rather than single out an arbitrarily chosen set of > operations to be regarded as primitives that the scheduler > knows about directly, I would rather have *no* such > primitives in the public API. But you have that problem anyway. In your current style you write things like this: block(self.queue) yield I don't see how this decouples the call site of the primitive from the scheduler any more than if you were to write e.g. this: yield block(self.queue) In fact, you can write it in your current framework and it would have the exact same effect! That's because block() returns None, so it comes down to calling block(self.queue) and then yielding None, which is exactly what happens in the first form as well. And even if block() were to return a value, since the scheduler ignores the return value from next(), it still works the same way. Not that I recommend doing this just because it works -- but if we liked the second form better, we could easily implement block() in such a way that you'd *have* to write it like that. So, I don't see what we gain by writing it the first way. -- --Guido van Rossum (python.org/~guido) From greg.ewing at canterbury.ac.nz Sat Oct 20 01:50:20 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sat, 20 Oct 2012 12:50:20 +1300 Subject: [Python-ideas] yield from multiple iterables (was Re: The async API of the future: yield-from) In-Reply-To: References: <507B0F0B.8080700@pearwood.info> <507B26C6.10602@stackless.com> <507BA075.4030508@canterbury.ac.nz> <507BC855.4070802@stackless.com> <507BE79D.1090100@stackless.com> <507C1661.5070206@stackless.com> <507CADE6.7050604@canterbury.ac.nz> <50814DDD.9070206@stackless.com> Message-ID: <5081E73C.5060206@canterbury.ac.nz> Nick Coghlan wrote: > my suspicion is that generator objects themselves need to be > maintaining a full "generator stack" independent of the frame stack in > the main eval loop in order to get the best of both worlds (i.e. > optimised suspend/resume with confusing debuggers). The f_yieldfrom chain effectively *is* a generator stack, it's just linked in the opposite direction to the way stacks normally are. 
While you probably could move f_yieldfrom out of the frame object and into the generator-iterator object, I don't see how it would make any difference to the traceback issue. I'm not even sure why my original implementation was getting tracebacks wrong. What *should* happen is that if an exception comes out of a generator being yielded from, the tail is chopped off the f_yieldfrom chain and the exception is thrown into the next frame up, thereby adding its frame to the traceback. It may simply be that there was a minor bug in my implementation that could be fixed without ditching the whole f_yieldfrom idea. I may look into this if I find time. -- Greg From greg.ewing at canterbury.ac.nz Sat Oct 20 02:33:31 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sat, 20 Oct 2012 13:33:31 +1300 Subject: [Python-ideas] The async API of the future In-Reply-To: References: Message-ID: <5081F15B.1040403@canterbury.ac.nz> Guido van Rossum wrote: > I would like people to be able to write fast > event handling programs on Windows too, ... But I don't know how > tenable that is given the dramatically different style used by IOCP > and the need to use native Windows API for all async I/O -- it sounds > like we could only do this if the library providing the I/O loop > implementation also wrapped all I/O operations, and that may be a bit > much. That's been bothering me, too. It seems like an interface accommodating the completion-based style will have to be *extremely* fat. That's not just a burden for anyone implementing the interface, it's a problem for any library wanting to *wrap* it as well. For example, to maintain separation between the async layer and the generator layer, we will probably want to have an AsyncSocket object in the async layer, and a separate GeneratorSocket in the generator layer that wraps an AsyncSocket. If the AsyncSocket needs to provide methods for all the possible I/O operations that one might want to perform on a socket, then GeneratorSocket needs to provide its own versions of all those methods as well. Multiply that by the number of different kinds of I/O objects (files, sockets, message queues, etc. -- there seem to be quite a lot of them on Windows) and that's a *lot* of stuff to be wrapped. > Finally, there should also be some minimal interface so that multiple > I/O loops can interact -- at least in the case where one I/O loop > belongs to a GUI library. That's another thing that worries me. With a ready-based event loop, this is fairly straightforward. If you can get hold of the file descriptor or handle that the GUI is ultimately reading its input from, all you need to do is add it as an event source to your main loop, and when it's ready, tell the GUI event loop to run itself once. But you can't do that with a completion-based main loop, because the actual reading of the input needs to be done in a different way, and that's usually buried somewhere deep in the GUI library where you can't easily change it. > It seems this is a solved problem (as well > solved as you can hope for) to Twisted, so we should just adopt their > approach. Do they actually do it for an IOCP-based main loop on Windows? If so, I'd be interested to know how. -- Greg From jstpierre at mecheye.net Sat Oct 20 02:50:00 2012 From: jstpierre at mecheye.net (Jasper St. 
Pierre) Date: Fri, 19 Oct 2012 20:50:00 -0400 Subject: [Python-ideas] The async API of the future In-Reply-To: <5081F15B.1040403@canterbury.ac.nz> References: <5081F15B.1040403@canterbury.ac.nz> Message-ID: On Fri, Oct 19, 2012 at 8:33 PM, Greg Ewing wrote: ... snip ... > That's another thing that worries me. With a ready-based > event loop, this is fairly straightforward. If you can get > hold of the file descriptor or handle that the GUI is > ultimately reading its input from, all you need to do is > add it as an event source to your main loop, and when it's > ready, tell the GUI event loop to run itself once. For most windowing systems, this isn't true. You need to call some function to check if you have events pending. For X11, this is "XPending". For Win32, this is "GetQueueStatus". But overall, the thing is that most GUI libraries have their own event loops. In GTK+, this is done with a "GSource", which can have support for custom sources (which is how the calls to the above APIs are made). What Twisted does is this case is swap out their own select loop with another implementation built around GLib's GMainLoop, which uses whatever internally. I'd highly recommend taking Twisted's approach of having swappable event loops. The question then becomes how you swap out the main loop: Twisted does this with a global reactor which you "install", which the community has found rather ugly, but there isn't really a better solution they've come up with. They've had a few proposals over the years to add better functionality, so I'd like to hear their experience on this. > -- > Greg > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas -- Jasper From greg.ewing at canterbury.ac.nz Sat Oct 20 02:50:28 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sat, 20 Oct 2012 13:50:28 +1300 Subject: [Python-ideas] yield from multiple iterables (was Re: The async API of the future: yield-from) In-Reply-To: <50817D62.4040607@stackless.com> References: <507B0F0B.8080700@pearwood.info> <507B26C6.10602@stackless.com> <507BA075.4030508@canterbury.ac.nz> <507BC855.4070802@stackless.com> <507BE79D.1090100@stackless.com> <507C1661.5070206@stackless.com> <507CADE6.7050604@canterbury.ac.nz> <50814DDD.9070206@stackless.com> <50817D62.4040607@stackless.com> Message-ID: <5081F554.5090404@canterbury.ac.nz> Christian Tismer wrote: > Actually I would like to have a python context where it gets into > "async mode" and interprets all functions defined in that mode as > generators. That sounds somewhat similar to another idea I proposed a while ago: There would be a special kind of function called a "cofunction", that you define using "codef" instead of "def". A cofunction is essentially a generator, but with a special property: when one cofunction calls another, the call is implicitly made as a "yield from" call. This scheme wouldn't be completely transparent, since the cofunctions have to be defined in a special way. But the calls would look like ordinary calls. There's a PEP describing a variation on the idea here: http://www.python.org/dev/peps/pep-3152/ In that version, calls to cofunctions are specially marked using a "cocall" keyword. But since writing that, I've come to believe that my original idea (where the cocalls are implicit) was better. 
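For illustration, a small self-contained sketch (invented example; 'codef' is not real syntax, so the cofunction form appears only in a comment) contrasting today's explicit yield-from spelling with what the implicit-call version would look like:

def open_connection(url):
    yield                               # pretend to wait for the connection
    return {"url": url, "data": "hello"}

def read_all(conn):
    yield                               # pretend to wait for the data
    return conn["data"]

def fetch_page(url):
    # Explicit form, as written today with PEP 380: every suspension is marked.
    conn = yield from open_connection(url)
    return (yield from read_all(conn))

# Under the cofunction idea the same function would have no visible 'yield from'
# (hypothetical syntax, shown only as a comment):
#
#   codef fetch_page(url):
#       conn = open_connection(url)     # implicitly a 'yield from' call
#       return read_all(conn)

task = fetch_page("http://example.com")
try:
    while True:
        next(task)
except StopIteration as exc:
    print(exc.value)                    # hello
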
-- Greg From tismer at stackless.com Sat Oct 20 03:17:02 2012 From: tismer at stackless.com (Christian Tismer) Date: Sat, 20 Oct 2012 03:17:02 +0200 Subject: [Python-ideas] yield from multiple iterables (was Re: The async API of the future: yield-from) In-Reply-To: References: <507B0F0B.8080700@pearwood.info> <507B26C6.10602@stackless.com> <507BA075.4030508@canterbury.ac.nz> <507BC855.4070802@stackless.com> <507BE79D.1090100@stackless.com> <507C1661.5070206@stackless.com> <507CADE6.7050604@canterbury.ac.nz> <50814DDD.9070206@stackless.com> <5081C2FA.3050107@hotpy.org> Message-ID: <5081FB8E.2000908@stackless.com> Errhm...., On 20.10.12 00:31, Christian Tismer wrote: > Hi Guido, Marc, all, > > this is a veery promising result, telling > me that the big Oh can in fact be > neglected in real applications. 20 for > twi is good! > > I will of course do an analysis and find > the parameters of the quadratic, but my > concern is pretty much tamed. > > For me that means there will soon be > a library that contains real generators > and more building blocks. > > I think using those would simplify the > design of the async API quite a lot. > > I suggest to regard current generator > constructs as low-level helpers for > Implementing the real concurrency > building blocks. > > Instead of using the existing re.compile("yield (from)?") pattern, I think we can abstract > from this now and think in terms of > higher level constructs. > > Let's assume generators and coroutines, > and model concurrency from that. I > believe this unwinds the brains and > clarifies things a lot. > > I will provide sone classes for that at > the pycon.de sprint, unless somebody > implements it earlier (please don't). > > This email was written in non-linear > order, so please ignore logic inversions. > > Cheers - chris > > Sent from my Ei4Steve > > On Oct 19, 2012, at 23:31, Guido van Rossum wrote: > >> On Fri, Oct 19, 2012 at 2:15 PM, Mark Shannon wrote: >>> On 19/10/12 13:55, Christian Tismer wrote: >>>> Hi Nick, >>>> >>>> On 16.10.12 03:49, Nick Coghlan wrote: >>>>> On Tue, Oct 16, 2012 at 10:44 AM, Greg Ewing >>>>> wrote: >>>>>> My original implementation of yield-from actually *did* avoid >>>>>> this, by keeping a C-level pointer chain of yielding-from frames. >>>>>> But that part was ripped out at the last minute when someone >>>>>> discovered that it had a detrimental effect on tracebacks. >>>>>> >>>>>> There are probably other ways the traceback problem could be >>>>>> fixed, so maybe we will get this optimisation back one day. >>>>> Ah, I thought I remembered something along those lines. IIRC, it was a >>>>> bug report on one of the alphas that prompted us to change it. >>>> I was curious and searched quite a lot. >>>> It was v3.3.0a1 from 2012-03-15 as a reaction to #14230 and #14220 >>>> from Marc Shannon, patched by Benjamin. >>>> >>>> Now I found the original implementation. That looks very much >>>> as I'm thinking it should be. >>>> >>>> Quite a dramatic change which works well, but really seems to remove >>>> what I would call "now I can emulate most of Stackless" efficiently. >>>> >>>> Maybe I should just try to think it would be implemented as before, >>>> build an abstraction and just use it for now. >>>> >>>> I will spend my time at PyCon de for sprinting on "yield from". >>> The current implementation may not be much slower than Greg's original >>> version. One of the main costs of making a call is the creation of a new >>> frame. 
But calling into a generator does not need a new frame, so the cost >>> will be reduced. >>> Unless anyone has evidence to the contrary :) >>> >>> Rather than increasing the performance of this special case, I would suggest >>> that improving the performance of calls & returns in general would be a more >>> worthwhile goal. >>> Calls and returns ought to be cheap. >> I did a basic timing test using a simple recursive function and a >> recursive PEP-380 coroutine computing the same value (see attachment). >> The coroutine version is a little over twice as slow as the function >> version. I find that acceptable. This went 20 deep, making 2 recursive >> calls at each level (except at the deepest level). >> >> Output on my MacBook Pro: >> >> plain 2097151 0.5880069732666016 >> coro. 2097151 1.2958409786224365 >> >> This was a Python 3.3 built a few days ago from the 3.3 branch. >> What you are comparing seems to have a constant factor of about 2.5. minimax:py3 tismer$ python3 p3time.py plain 0 1 0.00000 coro. 0 1 0.00001 relat 0 1 8.50000 plain 1 3 0.00000 coro. 1 3 0.00001 relat 1 3 2.77778 plain 2 7 0.00000 coro. 2 7 0.00001 relat 2 7 3.62500 plain 3 15 0.00000 coro. 3 15 0.00001 relat 3 15 2.87500 plain 4 31 0.00001 coro. 4 31 0.00002 relat 4 31 2.42424 plain 5 63 0.00002 coro. 5 63 0.00004 relat 5 63 2.46032 plain 6 127 0.00003 coro. 6 127 0.00007 relat 6 127 2.52542 plain 7 255 0.00006 coro. 7 255 0.00014 relat 7 255 2.38272 plain 8 511 0.00011 coro. 8 511 0.00028 relat 8 511 2.49356 plain 9 1023 0.00022 coro. 9 1023 0.00055 relat 9 1023 2.50327 plain 10 2047 0.00042 coro. 10 2047 0.00106 relat 10 2047 2.50956 plain 11 4095 0.00083 coro. 11 4095 0.00204 relat 11 4095 2.44699 plain 12 8191 0.00167 coro. 12 8191 0.00441 relat 12 8191 2.64792 plain 13 16383 0.00340 coro. 13 16383 0.00855 relat 13 16383 2.51881 plain 14 32767 0.00876 coro. 14 32767 0.01823 relat 14 32767 2.08106 plain 15 65535 0.01419 coro. 15 65535 0.03507 relat 15 65535 2.47131 plain 16 131071 0.02669 coro. 16 131071 0.06874 relat 16 131071 2.57515 plain 17 262143 0.05448 coro. 17 262143 0.13699 relat 17 262143 2.51467 plain 18 524287 0.10843 coro. 18 524287 0.27395 relat 18 524287 2.52660 plain 19 1048575 0.21310 coro. 19 1048575 0.54573 relat 19 1048575 2.56095 plain 20 2097151 0.42802 coro. 20 2097151 1.06199 relat 20 2097151 2.48114 plain 21 4194303 0.86531 coro. 21 4194303 2.19048 relat 21 4194303 2.53143 ciao - chris -- Christian Tismer :^) Software Consulting : Have a break! Take a ride on Python's Karl-Liebknecht-Str. 121 : *Starship* http://starship.python.net/ 14482 Potsdam : PGP key -> http://pgp.uni-mainz.de phone +49 173 24 18 776 fax +49 (30) 700143-0023 PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04 whom do you want to sponsor today? http://www.stackless.com/ -------------- next part -------------- A non-text attachment was scrubbed... 
Name: p3time.py Type: text/x-python Size: 946 bytes Desc: not available URL: From tjreedy at udel.edu Sat Oct 20 03:55:22 2012 From: tjreedy at udel.edu (Terry Reedy) Date: Fri, 19 Oct 2012 21:55:22 -0400 Subject: [Python-ideas] yield from multiple iterables (was Re: The async API of the future: yield-from) In-Reply-To: References: <507B0F0B.8080700@pearwood.info> <507B26C6.10602@stackless.com> <507BA075.4030508@canterbury.ac.nz> <507BC855.4070802@stackless.com> <507BE79D.1090100@stackless.com> <507C1661.5070206@stackless.com> <507CADE6.7050604@canterbury.ac.nz> <50814DDD.9070206@stackless.com> <5081C2FA.3050107@hotpy.org> Message-ID: On 10/19/2012 5:31 PM, Guido van Rossum wrote: > I did a basic timing test using a simple recursive function and a > recursive PEP-380 coroutine computing the same value (see attachment). > The coroutine version is a little over twice as slow as the function > version. I find that acceptable. This went 20 deep, making 2 recursive > calls at each level (except at the deepest level). > > Output on my MacBook Pro: > > plain 2097151 0.5880069732666016 > coro. 2097151 1.2958409786224365 > > This was a Python 3.3 built a few days ago from the 3.3 branch. At the top level, the coroutine version adds 2097151 next() calls. Suspecting that that, not the addition of 'yield from' was responsible for most of the extra time, I added

def trivial():
    for i in range(2097151):
        yield
    raise StopIteration(2097151)
...
t0 = time.time()
try:
    g = trivial()
    while True:
        next(g)
except StopIteration as err:
    k = err.value
t1 = time.time()
print('triv.', k, t1-t0)

The result supports the hypothesis.

plain 2097151 0.4590260982513428
coro. 2097151 0.9180529117584229
triv. 2097151 0.39902305603027344

I don't know what to make of this in the context of asynch operations, but in 'traditional' use, the generator would not replace a function returning a single number but one returning a list (of, in this case, 2097151 numbers), so each next replaces a .append method call. -- Terry Jan Reedy From ncoghlan at gmail.com Sat Oct 20 03:56:45 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 20 Oct 2012 11:56:45 +1000 Subject: [Python-ideas] yield from multiple iterables (was Re: The async API of the future: yield-from) In-Reply-To: <5081F554.5090404@canterbury.ac.nz> References: <507B0F0B.8080700@pearwood.info> <507B26C6.10602@stackless.com> <507BA075.4030508@canterbury.ac.nz> <507BC855.4070802@stackless.com> <507BE79D.1090100@stackless.com> <507C1661.5070206@stackless.com> <507CADE6.7050604@canterbury.ac.nz> <50814DDD.9070206@stackless.com> <50817D62.4040607@stackless.com> <5081F554.5090404@canterbury.ac.nz> Message-ID: On Sat, Oct 20, 2012 at 10:50 AM, Greg Ewing wrote: > Christian Tismer wrote: >> >> Actually I would like to have a python context where it gets into >> "async mode" and interprets all functions defined in that mode as >> generators. > > > That sounds somewhat similar to another idea I proposed a while > ago: > > There would be a special kind of function called a "cofunction", > that you define using "codef" instead of "def". A cofunction > is essentially a generator, but with a special property: when > one cofunction calls another, the call is implicitly made as > a "yield from" call. > > This scheme wouldn't be completely transparent, since the > cofunctions have to be defined in a special way. But the calls > would look like ordinary calls.
Please don't lose sight of the fact that yield-based suspension points looking like something other than an ordinary function call is a *feature*, not a bug. The idea is that the flow control, especially the fact that "other code may run here, so the world may have changed before we get to the next expression", is visible *locally* in each function, rather than relying on global knowledge of which calls may lead to a task switch. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From Steve.Dower at microsoft.com Sat Oct 20 04:41:52 2012 From: Steve.Dower at microsoft.com (Steve Dower) Date: Sat, 20 Oct 2012 02:41:52 +0000 Subject: [Python-ideas] yield from multiple iterables (was Re: The async API of the future: yield-from) In-Reply-To: References: <507B0F0B.8080700@pearwood.info> <507B26C6.10602@stackless.com> <507BA075.4030508@canterbury.ac.nz> <507BC855.4070802@stackless.com> <507BE79D.1090100@stackless.com> <507C1661.5070206@stackless.com> <507CADE6.7050604@canterbury.ac.nz> <50814DDD.9070206@stackless.com> <50817D62.4040607@stackless.com> <5081F554.5090404@canterbury.ac.nz>, Message-ID: I'm not entirely sure whether I'm hijacking the thread here... I have to admit I've somewhat lost track with all the renames. The discussion has been very interesting (I really like the 'codef' idea, and decorators can provide this without requiring syntax changes) regardless of which thread is active. I have spent a bit of time writing up the approach that we (Dino, who posted it here originally, myself and with some advice from colleagues who are working on a similar API for C++) came up with and implemented. I must apologise for the length - I got a little carried away with background information, but I do believe that it is important for us to understand exactly what problem we're trying to solve so that we aren't distracted by "new toys". The write-up is here: http://stevedower.id.au/blog/async-api-for-python/ I included code, since there have been a few people asking for prototype implementations, so if you want to skip ahead to the code (which is quite heavily annotated) it is at http://stevedower.id.au/blog/async-api-for-python/#thecode or http://stevedower.id.au/downloads/PythonAsync.zip (I based my example on Greg's socket spam, so thanks for that!) And no, I'm not collecting any ad revenue from the page, so feel free to visit as often as you like and use up my bandwidth. Let the tearing apart of my approach commence! :) Cheers, Steve From greg.ewing at canterbury.ac.nz Sat Oct 20 04:44:51 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sat, 20 Oct 2012 15:44:51 +1300 Subject: [Python-ideas] The async API of the future: yield-from In-Reply-To: References: <5081E255.7000107@canterbury.ac.nz> Message-ID: <50821023.2090303@canterbury.ac.nz> Guido van Rossum wrote: > In your current style you write > things like this: > > block(self.queue) > yield > > I don't see how this decouples the call site of the primitive from the > scheduler any more than if you were to write e.g. this: > > yield block(self.queue) If I wrote a library intended for serious use, the end user probably wouldn't write either of those. Instead he would write something like yield from block(self.queue) and it would be an implementation detail of the library where abouts the 'yield' happened and whether it needed to send a value or not. When I say I don't like scheduler instructions, all I really mean is that they shouldn't be part of the public API. 
A scheduler can use them internally if it wants, I don't care. -- Greg From greg.ewing at canterbury.ac.nz Sat Oct 20 05:11:08 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sat, 20 Oct 2012 16:11:08 +1300 Subject: [Python-ideas] The async API of the future In-Reply-To: References: <5081F15B.1040403@canterbury.ac.nz> Message-ID: <5082164C.20600@canterbury.ac.nz> Jasper St. Pierre wrote: > For most windowing systems, this isn't true. You need to call some > function to check if you have events pending. For X11, this is > "XPending". For Win32, this is "GetQueueStatus". X11 is ultimately reading its events from the socket to the display server. If you select() that socket, it will tell you whenever the X11 event loop could possibly have something to do. On Windows, I imagine the equivalent would be to pass your message queue handle to a WaitForMultipleObjects call. I've never tried to do anything like that, though, so I don't know if it would really work. > What Twisted does is this case is swap out their own select > loop with another implementation built around GLib's GMainLoop, If it's truly impossible to incorporate GMainLoop as a sub-loop of something else, then this is a bad situation. What happens if you also want to use some other library that insists on *its* main loop being in charge? This cannot be a general solution. -- Greg From jstpierre at mecheye.net Sat Oct 20 05:20:22 2012 From: jstpierre at mecheye.net (Jasper St. Pierre) Date: Fri, 19 Oct 2012 23:20:22 -0400 Subject: [Python-ideas] The async API of the future In-Reply-To: <5082164C.20600@canterbury.ac.nz> References: <5081F15B.1040403@canterbury.ac.nz> <5082164C.20600@canterbury.ac.nz> Message-ID: On Fri, Oct 19, 2012 at 11:11 PM, Greg Ewing wrote: > Jasper St. Pierre wrote: >> >> For most windowing systems, this isn't true. You need to call some >> function to check if you have events pending. For X11, this is >> "XPending". For Win32, this is "GetQueueStatus". > > > X11 is ultimately reading its events from the socket to > the display server. If you select() that socket, it will > tell you whenever the X11 event loop could possibly have > something to do. Nope. libX11/XCB keep their own queue of events and do their own socket management, so it's not just "poll on this FD, thanks" http://cgit.freedesktop.org/xorg/lib/libX11/tree/src/Pending.c http://cgit.freedesktop.org/xorg/lib/libX11/tree/src/xcb_io.c#n344 > On Windows, I imagine the equivalent would be to pass your > message queue handle to a WaitForMultipleObjects call. > I've never tried to do anything like that, though, so > I don't know if it would really work. > > >> What Twisted does is this case is swap out their own select >> loop with another implementation built around GLib's GMainLoop, > > > If it's truly impossible to incorporate GMainLoop as a > sub-loop of something else, then this is a bad situation. > What happens if you also want to use some other library > that insists on *its* main loop being in charge? This > cannot be a general solution. GLib has a way of embedding its main loop in another, but it's not easy or viable to use in a situation like this. It basically splits up its event loop into multiple pieces (prepare, check, dispatch), which you call at various times. Qt uses this for their GLib mainloop integration. It's clear there's never going to be one event loop solution (as Guido already mentioned, there's wars about libuv/libevent/libev that we can't possibly resolve), so why pretend like there is? 
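As a rough sketch (a generic illustration only, not Twisted's or GLib's actual API) of the "swappable event loop" shape being discussed, where the loop implementation is chosen once and everything else asks for the installed loop:

class EventLoop:
    # Minimal interface a pluggable loop might expose; a select-based,
    # IOCP-based or GMainLoop-based implementation would each fill these in.
    def run_once(self):
        raise NotImplementedError
    def add_reader(self, fd, callback):
        raise NotImplementedError

_installed_loop = None

def install_event_loop(loop):
    # Comparable in spirit to installing a reactor: pick the implementation once.
    global _installed_loop
    _installed_loop = loop

def get_event_loop():
    return _installed_loop
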
> -- > Greg > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas -- Jasper From greg.ewing at canterbury.ac.nz Sat Oct 20 07:00:00 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sat, 20 Oct 2012 18:00:00 +1300 Subject: [Python-ideas] The async API of the future In-Reply-To: References: <5081F15B.1040403@canterbury.ac.nz> <5082164C.20600@canterbury.ac.nz> Message-ID: <50822FD0.2010005@canterbury.ac.nz> Jasper St. Pierre wrote: > Nope. libX11/XCB keep their own queue of events and do their own > socket management, so it's not just "poll on this FD, thanks" So you keep going until the internal buffer is empty. "Run once" is probably a bit inaccurate; it's really more like "run until you don't think there's anything more to do". > It's clear there's never going to be one event loop solution (as Guido > already mentioned, there's wars about libuv/libevent/libev that we > can't possibly resolve), so why pretend like there is? This discussion seems to have got off track. I'm not opposed to being able to choose whichever top-level event loop works the best for your application. All I set out to say is that a wait-for-ready style event loop seems more amenable to having other event loops plugged into it than a wait-for-completion one. But maybe that's not a problem if we provide an IOCP-based event loop that can be plugged into the wait-for-ready loop of your choice. Is that likely to be feasible? -- Greg From greg.ewing at canterbury.ac.nz Sat Oct 20 07:19:11 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sat, 20 Oct 2012 18:19:11 +1300 Subject: [Python-ideas] The async API of the future: yield-from In-Reply-To: References: Message-ID: <5082344F.7060702@canterbury.ac.nz> Jim Jewett wrote: > Who says that there has to be a scheduler? Or at least a single scheduler? > > To me, the "obvious" solution is that each co-routine is "scheduled" > only by its own caller, and runs on its own micro-thread. I think you may be confused about what we mean by a "scheduler". The scheduler is not something that you tell which task should run next. Rather, the scheduler decides which task to run next when the current task says "I'm waiting for something, let someone else have a turn." The task that gets run will very often be one that the suspending task knows nothing about. It's for that reason -- not all the tasks know about each other -- that I think it's best to have only one scheduler in any given system, so that it can make the best decision about what to run next. -- Greg From jeanpierreda at gmail.com Sat Oct 20 07:27:52 2012 From: jeanpierreda at gmail.com (Devin Jeanpierre) Date: Sat, 20 Oct 2012 01:27:52 -0400 Subject: [Python-ideas] The async API of the future: yield-from Message-ID: On Fri, Oct 19, 2012 at 10:44 PM, Greg Ewing wrote: > If I wrote a library intended for serious use, the end user > probably wouldn't write either of those. Instead he would > write something like > > yield from block(self.queue) > > and it would be an implementation detail of the library > where abouts the 'yield' happened and whether it needed > to send a value or not. What's the benefit of having both "yield" and "yield from" as opposed to just "yield"? It seems like an attractive nuisance if "yield" works but doesn't let the function have implementation details and wait for more than one thing or somesuch. 
With the existing generator-coroutine decorators (monocle, inlineCallbacks), there is no such trap. "yield foo()" will work no matter how many things foo() will wait for. My understanding is that the only benefit we get here is nicer tracebacks. I hope there's more. -- Devin From greg.ewing at canterbury.ac.nz Sat Oct 20 07:37:54 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sat, 20 Oct 2012 18:37:54 +1300 Subject: [Python-ideas] yield from multiple iterables (was Re: The async API of the future: yield-from) In-Reply-To: References: <507B0F0B.8080700@pearwood.info> <507B26C6.10602@stackless.com> <507BA075.4030508@canterbury.ac.nz> <507BC855.4070802@stackless.com> <507BE79D.1090100@stackless.com> <507C1661.5070206@stackless.com> <507CADE6.7050604@canterbury.ac.nz> <50814DDD.9070206@stackless.com> <50817D62.4040607@stackless.com> <5081F554.5090404@canterbury.ac.nz> Message-ID: <508238B2.4040808@canterbury.ac.nz> Nick Coghlan wrote: > Please don't lose sight of the fact that yield-based suspension points > looking like something other than an ordinary function call is a > *feature*, not a bug. People keep asserting that, but I don't think we have enough experience with the explicit-switching-point-markers-all-the- way-up style of coding to tell whether it's a good idea or not. My gut feeling is that the explicit markers will help at the lowest levels, where you're trying to protect a critical section, but at the upper levels they will just be noise that causes unnecessary worry. In one of Guido's earlier posts (which I can't find now, unfortunately), he said something that made it sound like he was coming around to that point of view too, but he seems to have gone back on that recently. -- Greg From greg.ewing at canterbury.ac.nz Sat Oct 20 07:52:53 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sat, 20 Oct 2012 18:52:53 +1300 Subject: [Python-ideas] The async API of the future: yield-from In-Reply-To: References: Message-ID: <50823C35.8020800@canterbury.ac.nz> Devin Jeanpierre wrote: >>If I wrote a library intended for serious use, the end user >>probably wouldn't write either of those. Instead he would >>write something like >> >> yield from block(self.queue) > What's the benefit of having both "yield" and "yield from" as opposed > to just "yield"? It seems like an attractive nuisance if "yield" works > but doesn't let the function have implementation details and wait for > more than one thing or somesuch. The documentation would say to use "yield from", and if someone misreads that and just says "yield" instead, it's their own fault. I don't think it's worth the rather large increase in the complexity of the scheduler implementation that would be required to make "yield foo()" do the same thing as "yield from foo()" in all circumstances, just to rescue people who make this kind of mistake. It's unfortunate that "yield" and "yield from" look so similar. This is one way that cofunctions would help, by making calls to subtasks look very different from yields. > My understanding is that the only benefit we get here is nicer > tracebacks. I hope there's more. You also get a much simpler and much more efficient scheduler implementation. 
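To make that concrete, a bare-bones scheduler for the pure yield-from style can be little more than the following (a minimal sketch with names invented for illustration; blocking and unblocking on queues or I/O would be layered on top):

    import collections

    class Scheduler:
        def __init__(self):
            self.ready = collections.deque()    # tasks that can run right now

        def schedule(self, task):
            # A "task" is just the outermost generator of a yield-from chain.
            self.ready.append(task)

        def run(self):
            while self.ready:
                task = self.ready.popleft()
                try:
                    # Resume the task; control only comes back here when the
                    # innermost generator in the chain actually yields.
                    task.send(None)
                except StopIteration:
                    continue                    # the task has finished
                self.ready.append(task)         # simple round robin

All the plumbing between nested generators is handled by yield from itself; the scheduler never sees anything but the outermost generator.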
-- Greg From ubershmekel at gmail.com Sat Oct 20 10:00:28 2012 From: ubershmekel at gmail.com (Yuval Greenfield) Date: Sat, 20 Oct 2012 10:00:28 +0200 Subject: [Python-ideas] The async API of the future: yield-from In-Reply-To: <507FB480.1090009@canterbury.ac.nz> References: <5078F6B1.2030309@canterbury.ac.nz> <507BA60B.2030806@canterbury.ac.nz> <507D109F.3020302@canterbury.ac.nz> <507F27FC.4040706@canterbury.ac.nz> <507FB480.1090009@canterbury.ac.nz> Message-ID: On Thu, Oct 18, 2012 at 9:49 AM, Greg Ewing wrote: > I've converted my tutorial on generator-based tasks > for Python 3.3, tidied it up a bit and posted it here: > > http://www.cosc.canterbury.ac.**nz/greg.ewing/python/tasks/ > > > -- > Greg > > > Thanks for writing this. I've used threads all my life so this coroutine/yield-from paradigm is hard for me to grok even after reading this quite a few times. I can't wrap my head around the block and unblock functions. block() removes the current task from the ready_list, but is the current task guaranteed to be my task? If so, then I'd never run again after the yield in acquire(), that is unless a gracious other player unblocks me. block() in acquire() is the philosopher or fork avoiding the scheduler? yield in acquire() is the philosopher relinquishing control or the fork? I think I finally figured it out after staring at it for long enough. I'm not sure it makes sense for scheduler functions to store waiting tasks in a queue owned by the app and invisible from the scheduler. This can cause *invisible deadlocks* such as: schedule(philosopher("Socrates", 8, 3, 1, forks[0], forks[2]), "Socrates") schedule(philosopher("Euclid", 5, 1, 4, forks[2], forks[0]), "Euclid") Which may be really hard to debug. Is there a coroutine strategy for tackling these challenges? Or will I just get better at overcoming them with practice? --Yuval -------------- next part -------------- An HTML attachment was scrubbed... URL: From glyph at twistedmatrix.com Sat Oct 20 10:33:07 2012 From: glyph at twistedmatrix.com (Glyph) Date: Sat, 20 Oct 2012 01:33:07 -0700 Subject: [Python-ideas] The async API of the future In-Reply-To: References: <5081F15B.1040403@canterbury.ac.nz> Message-ID: <65B7E04F-965D-4D07-A60B-121997703BC0@twistedmatrix.com> Greg Ewing wrote: > Guido van Rossum wrote: >> >> I would like people to be able to write fast >> event handling programs on Windows too, ... But I don't know how >> tenable that is given the dramatically different style used by IOCP >> and the need to use native Windows API for all async I/O -- it sounds >> like we could only do this if the library providing the I/O loop >> implementation also wrapped all I/O operations, and that may be a bit >> much. > > That's been bothering me, too. It seems like an interface accommodating the completion-based style will have to be *extremely* fat. No, not really. Quite the opposite, in fact. The way to make the interface thin is to abstract out all the details related to the particulars of the multiplexing I/O underneath everything and the transport functions necessary to read data out of it. The main interfaces you need are here: which have maybe a dozen methods between them, and could be cleaned up for a standardized version. The interface required for unifying over completion-based and socket-based is actually much thinner than the interface you get if you start exposing sockets all over the place. 
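For readers who have not looked at those interfaces, the essential shape is roughly this (a loose Python paraphrase, not the actual Twisted definitions; the real methods are spelled differently and carry more detail):

    class Protocol:
        # The event-driven half: the framework calls these.
        def connection_made(self, transport):
            "The connection is up; keep 'transport' around to write replies."
        def data_received(self, data):
            "Some bytes arrived (any amount, with no framing implied)."
        def connection_lost(self, reason):
            "The stream has gone away; called exactly once."

    class Transport:
        # The imperative half: application code calls these.
        def write(self, data):
            "Queue bytes to be sent; never blocks."
        def lose_connection(self):
            "Close down once any buffered data has been sent."

Whether the bytes handed to data_received came from a readiness notification followed by recv(), or from a completed overlapped read, is invisible at this level, which is the point.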
But, focusing on I/O completion versus readiness-notification is, like the triggering modes discussion, missing the forest for the trees. Some of IOCP's triggering modes are itself an interesting example of a pattern, but, by itself, is a bit of a red herring. Another thing you want to abstract over is pipes versus sockets versus files versus UNIX sockets versus UNIX sockets with CMSG extensions versus TLS over TCP versus SCTP versus bluetooth. 99% of applications do not care: a stream of bytes is a stream of bytes and you have to turn it into a stream of some other, higher-layer event protocol. I would really, really encourage everyone interested in this area of design to go read all of twisted.internet.interfaces and familiarize yourselves with the contents there and make specific comments about those existing interfaces rather than some hypothetical ideal. Also, the Twisted chapter in "the architecture of open source applications" explains some of Twisted's architectural decisions. If you're going to re-invent the wheel, it behooves you to at least check whether the existing ones are round. I'm happy to answer questions about specifics of how things are implemented, whether the Twisted APIs have certain limitations, and filling in gaps in the documentation. There are certainly an embarrassing profusion of those, especially in these decade-old, core APIs that haven't changed since we started requiring docstrings; if you find any, please file bugs and I will try to do what I can to get them fixed. But I'd rather not have to keep re-describing the basics. > That's not just a burden for anyone implementing the interface, it's a problem for any library wanting to *wrap* it as well. I really have no idea what you mean by this. Writing and wrapping ITransport and IProtocol is pretty straightforward. With the enhanced interfaces I'm working on in , it's almost automatic. , for example, is a complete bi-directional proxying of all interfaces related to transports (even TCP transport specific APIs, not just the core interfaces above), in addition to implementing all the glue necessary for TLS, with thorough docstrings and comments, all in just over 600 lines. This also easily deals with the fact that, for example, sometimes in order to issue a read-ready notification, TLS needs to write some bytes; and in order to issue a write-ready notification, TLS sometimes needs to read some bytes. > For example, to maintain separation between the async layer and the generator layer, we will probably want to have an AsyncSocket object in the async layer, and a separate GeneratorSocket in the generator layer that wraps an AsyncSocket. Yes, generator scheduling and async I/O are really different layers, as I explained in a previous email. This is a good thing as it provides a common basis for developing things in different styles as appropriate to different problem domains. If you smash them together you're just conflating responsibilities and requiring abstraction inversions, not making it easier to implement anything. > If the AsyncSocket needs to provide methods for all the possible I/O operations that one might want to perform on a socket, then GeneratorSocket needs to provide its own versions of all those methods as well. GeneratorSocket does not even need to exist in the first implementation of this kind of a spec, let alone provide all possible operations. 
Python managed to survive without "all the possible I/O operations that one might want to perform on a socket" for well over a decade; sendmsg and recvmsg didn't arrive until 3.3: . Plus, GeneratorSocket isn't that hard to write. You just need a method for each I/O operation that returns a Future (by which I mean Deferred, of course :)) and then fires that Future from the relevant I/O operation's callback. > Multiply that by the number of different kinds of I/O objects (files, sockets, message queues, etc. -- there seem to be quite a lot of them on Windows) and that's a *lot* of stuff to be wrapped. The common operations here are by far the most important. But, yes, if you want to have support for all the wacky things that Windows provides, you have to write wrappers for all the wacky things you need to call. >> Finally, there should also be some minimal interface so that multiple I/O loops can interact -- at least in the case where one I/O loop belongs to a GUI library. > > That's another thing that worries me. With a ready-based event loop, this is fairly straightforward. If you can get hold of the file descriptor or handle that the GUI is ultimately reading its input from, all you need to do is add it as an event source to your main loop, and when it's ready, tell the GUI event loop to run itself once. No. That is how X windows and ncurses work, not how GUIs in general work. On Windows, the GUI is a message pump on a thread (and possibly a collection thereof); there's no discrete event which represents it and no completion port or event object that gets its I/O, but at the low level, you're still expected to write your own loop and call something that blocks waiting for GUI input. (This actually causes some problems, see below.) On Mac OS X, the GUI is an event loop of its own; you have to integrate with CFRunLoop via CFRunLoopRun (or something that eventually calls it, like NSApplicationMain), not write your own loop that calls a blocking function. You don't get to invent your own thing with kqueue or select() and then explicitly observe "the GUI" as some individual discrete event; there's nothing to read, the GUI just calls directly into your application. Underneath there's some mach messages and stuff, but I honestly couldn't tell you how that all works; it's not necessary to understand. (And in fact "the GUI" is not actually just the GUI, but a whole host of notifications from other processes, the display, the sound device, and so on, that you can register for. The documentation for NSNotificationCenter is illuminating.) (I don't know anything about Android. Can anyone comment authoritatively about that?) This really doesn't have anything to do with the readiness-based-ness of the API, but rather that there is more on heaven and earth (and kernel interrupt handlers) than is dreamt of in your philosophy (and file descriptor dispatch functions). Once again: the important thing is to separate out these fiddly low layers for each platform and get something that exposes the high layer that most python programmers care about - "incoming connection", "here are some bytes", "your connection was dropped" - in such a way that you can plug in an implementation that uses it to any one of these low-level things. > But you can't do that with a completion-based main loop, because the actual reading of the input needs to be done in a different way, and that's usually buried somewhere deep in the GUI library where you can't easily change it. 
Indeed not, but this phrasing makes it sound like "completion-based" main loops are some weird obscure thing. This is not an edge-case problem you can sweep under the rug with the assumption that somebody will be able to wrestle a file descriptor out of the GUI somehow or emulate it eventually. The GUI platforms that basically everyone in the world uses don't observe file descriptors for their input events. >> It seems this is a solved problem (as well solved as you can hope for) to Twisted, so we should just adopt their >> approach. > > Do they actually do it for an IOCP-based main loop on Windows? No, but it's hypothetically possible. For GUIs, we have win32eventreactor, which can't support as many sockets, but runs the message pump, which causes the GUI to run (for most GUI toolkits). Several low-level Windows applications have used this to good effect. (Although I don't know of any that are open source, unfortunately.) There's also the fact that most people writing Python GUIs want to use a cross-platform library, so most of the demand for GUI sockets on Windows have been for integrating with Wx, Qt, or GTK, and we have support for all of those separately from the IOCP stuff. It's usually possible to call the wrapped socket functions in those libraries, but more difficult to reach below the GUI library and dispatch to it one windows message pump message at a time. > If so, I'd be interested to know how. It's definitely possible to get a GUI to cooperate nicely with IOCP, but it's a bit challenging to figure out how. I had a very long, unpleasant conversation with the IOCP reactor's maintainer while we refreshed our memories about the frankly sadistic IOCP API, and put together all of our various experiences working with it, trying to refresh our collective memory to the point where we remembered enough about the way IOCP actually works to be able to explain it, so I hope you enjoy this :-). Right now Twisted's IOCP reactor uses the mode of IOCP where it passes NULL to both the lpCompletionRoutine and lpOverlapped->hEvent member of everything (i.e. WSARecv, WSASend, WSAAccept, etc). Later, the reactor thread blocks on GetQueuedCompletionStatus, which only blocks on the associated completion port's scheduled I/O, which precludes noticing when messages arrive from the message pump. As I mentioned above, the message pump is a discrete source of events and can't be waited upon as a C runtime "file descriptor", WSA socket, IOCP completion or thread event. Also, you can't translate it into one of those sources, because the message pump is associated with a particular thread; you can't call a function in a different thread to call PostQueuedCompletionStatus. There are two ways to fix this; there already is a lengthy and confusing digression in comments in the implementation explaining parts of this. The first, and probably easiest option, is simply to create an event with CreateEvent(bManualReset=False) and fill out the hEvent structure of all queued Event objects with that same event, pass that event handle to MsgWaitForMultipleObjectsEx. Then, if the message queue wakes up the thread, you dispatch messages the standard way (doing what win32eventreactor already does: see win32gui.PumpWaitingMessages). If instead, the event signals, you call GetQueuedCompletionStatus as IOCP already does, and it will always return immediately. 
The second (and probably higher performance) option is to fill out the lpCompletionRoutine parameter to all I/O functions, and effectively have the reactor's "loop" integrated into the implicit asynchronous procedure dispatch of any alertable function. This would have to be MsgWaitForMultipleObjectsEx in order to wait on events added with addEvent(), in the reactor's core. The reactor's core itself could actually just call WaitForSingleObjectEx() and it would be roughly the same except for those external events, as long as the thread is put into an alertable state. This option is likely higher performance because it removes all the function call and iteration overhead because you effectively go straight from the kernel to the I/O handling function. In addition to being slightly trickier though, there's also the fact that someone else might put the thread into an alertable state and the I/O completion might be done with a surprising stack frame. If you want to integrate this with a modern .NET application (i.e. windows platform-specific stuff), I think this is the relevant document: ; I am not sure how you'd integrate it with Wx/Tk/Qt/GTK+. -glyph -------------- next part -------------- An HTML attachment was scrubbed... URL: From shibturn at gmail.com Sat Oct 20 12:56:41 2012 From: shibturn at gmail.com (Richard Oudkerk) Date: Sat, 20 Oct 2012 11:56:41 +0100 Subject: [Python-ideas] The async API of the future In-Reply-To: <5081F15B.1040403@canterbury.ac.nz> References: <5081F15B.1040403@canterbury.ac.nz> Message-ID: On 20/10/2012 1:33am, Greg Ewing wrote: > That's been bothering me, too. It seems like an interface > accommodating the completion-based style will have to be > *extremely* fat. > > That's not just a burden for anyone implementing the > interface, it's a problem for any library wanting to *wrap* > it as well. > > For example, to maintain separation between the async > layer and the generator layer, we will probably want to > have an AsyncSocket object in the async layer, and a > separate GeneratorSocket in the generator layer that wraps > an AsyncSocket. > > If the AsyncSocket needs to provide methods for all the > possible I/O operations that one might want to perform on > a socket, then GeneratorSocket needs to provide its own > versions of all those methods as well. > > Multiply that by the number of different kinds of I/O > objects (files, sockets, message queues, etc. -- there > seem to be quite a lot of them on Windows) and that's > a *lot* of stuff to be wrapped. I don't see why a completion api needs to create wrappers for sockets. See http://pastebin.com/7tDmeYXz for an implementation of a completion api implemented for Unix (plus a stupid reactor class and some example server/client code). The AsyncIO class is independent of reactors, futures etc. The methods for starting an operation are recv(key, sock, nbytes, flags=0) send(key, sock, buf, flags=0) accept(key, sock) connect(key, sock, address) The "key" argument is used as an identifier for the operation. You wait for something to complete using wait(timeout=None) which returns a list of tuples of the form "(key, success, value)" representing completed operations. "key" is the identifier used when starting the operation, "success" is a boolean indicating whether an error occurred, and "value" is the return/exception value. 
To check whether there are any outstanding operations, use empty() (To make the AsyncIO class usable without a reactor one should probably implement a "filtered" wait so that you can restrict the keys you want to wait for.) -- Richard From tismer at stackless.com Sat Oct 20 13:52:19 2012 From: tismer at stackless.com (Christian Tismer) Date: Sat, 20 Oct 2012 13:52:19 +0200 Subject: [Python-ideas] yield from multiple iterables (was Re: The async API of the future: yield-from) In-Reply-To: <5081F554.5090404@canterbury.ac.nz> References: <507B0F0B.8080700@pearwood.info> <507B26C6.10602@stackless.com> <507BA075.4030508@canterbury.ac.nz> <507BC855.4070802@stackless.com> <507BE79D.1090100@stackless.com> <507C1661.5070206@stackless.com> <507CADE6.7050604@canterbury.ac.nz> <50814DDD.9070206@stackless.com> <50817D62.4040607@stackless.com> <5081F554.5090404@canterbury.ac.nz> Message-ID: <50829073.90909@stackless.com> Hi Greg, On 20.10.12 02:50, Greg Ewing wrote: > Christian Tismer wrote: >> Actually I would like to have a python context where it gets into >> "async mode" and interprets all functions defined in that mode as >> generators. > > That sounds somewhat similar to another idea I proposed a while > ago: > > There would be a special kind of function called a "cofunction", > that you define using "codef" instead of "def". A cofunction > is essentially a generator, but with a special property: when > one cofunction calls another, the call is implicitly made as > a "yield from" call. > Whow, I had a look at the patch. Without talking about the syntax, this is very close to what I'm trying without a patch. No, it is almost identical. > This scheme wouldn't be completely transparent, since the > cofunctions have to be defined in a special way. But the calls > would look like ordinary calls. > > There's a PEP describing a variation on the idea here: > > http://www.python.org/dev/peps/pep-3152/ > > In that version, calls to cofunctions are specially marked > using a "cocall" keyword. But since writing that, I've come to > believe that my original idea (where the cocalls are implicit) > was better. Yes, without the keyword it looks better. Would you raise an exception if something is called that is not a cofunction? Or would that be an ordinary call? The only difference is that I'm not aiming at coroutines in the first place, but just having the concept of a *suspendable* function. What has happened to the PEP, was it rejected? ciao - chris -- Christian Tismer :^) Software Consulting : Have a break! Take a ride on Python's Karl-Liebknecht-Str. 121 : *Starship* http://starship.python.net/ 14482 Potsdam : PGP key -> http://pgp.uni-mainz.de phone +49 173 24 18 776 fax +49 (30) 700143-0023 PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04 whom do you want to sponsor today? 
http://www.stackless.com/ From ncoghlan at gmail.com Sat Oct 20 14:16:46 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 20 Oct 2012 22:16:46 +1000 Subject: [Python-ideas] yield from multiple iterables (was Re: The async API of the future: yield-from) In-Reply-To: <50829073.90909@stackless.com> References: <507B0F0B.8080700@pearwood.info> <507B26C6.10602@stackless.com> <507BA075.4030508@canterbury.ac.nz> <507BC855.4070802@stackless.com> <507BE79D.1090100@stackless.com> <507C1661.5070206@stackless.com> <507CADE6.7050604@canterbury.ac.nz> <50814DDD.9070206@stackless.com> <50817D62.4040607@stackless.com> <5081F554.5090404@canterbury.ac.nz> <50829073.90909@stackless.com> Message-ID: On Sat, Oct 20, 2012 at 9:52 PM, Christian Tismer wrote: > What has happened to the PEP, was it rejected? No, it's still open. We just wanted to give the yield from PEP a chance to see some use on its own before we started trying to take it any further, and Greg was amenable to that approach. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From tismer at stackless.com Sat Oct 20 14:54:00 2012 From: tismer at stackless.com (Christian Tismer) Date: Sat, 20 Oct 2012 14:54:00 +0200 Subject: [Python-ideas] yield from multiple iterables (was Re: The async API of the future: yield-from) In-Reply-To: <50829073.90909@stackless.com> References: <507B0F0B.8080700@pearwood.info> <507B26C6.10602@stackless.com> <507BA075.4030508@canterbury.ac.nz> <507BC855.4070802@stackless.com> <507BE79D.1090100@stackless.com> <507C1661.5070206@stackless.com> <507CADE6.7050604@canterbury.ac.nz> <50814DDD.9070206@stackless.com> <50817D62.4040607@stackless.com> <5081F554.5090404@canterbury.ac.nz> <50829073.90909@stackless.com> Message-ID: <50829EE8.7000209@stackless.com> On 20.10.12 13:52, Christian Tismer wrote: > Hi Greg, > > On 20.10.12 02:50, Greg Ewing wrote: >> Christian Tismer wrote: >>> Actually I would like to have a python context where it gets into >>> "async mode" and interprets all functions defined in that mode as >>> generators. >> >> That sounds somewhat similar to another idea I proposed a while >> ago: >> >> There would be a special kind of function called a "cofunction", >> that you define using "codef" instead of "def". A cofunction >> is essentially a generator, but with a special property: when >> one cofunction calls another, the call is implicitly made as >> a "yield from" call. >> > > Whow, I had a look at the patch. Without talking about the syntax, > this is very close to what I'm trying without a patch. > No, it is almost identical. >> This scheme wouldn't be completely transparent, since the >> cofunctions have to be defined in a special way. But the calls >> would look like ordinary calls. >> >> There's a PEP describing a variation on the idea here: >> >> http://www.python.org/dev/peps/pep-3152/ >> >> In that version, calls to cofunctions are specially marked >> using a "cocall" keyword. But since writing that, I've come to >> believe that my original idea (where the cocalls are implicit) >> was better. > > Yes, without the keyword it looks better. Would you raise an > exception if something is called that is not a cofunction? Or > would that be an ordinary call? > > The only difference is that I'm not aiming at coroutines in > the first place, but just having the concept of a *suspendable* > function. > > What has happened to the PEP, was it rejected? > I just saw that it is in flux and did not please you as well. 
A rough idea would be to start the whole interpreter in suspendable mode. Maybe that's too much. I'm seeking a way to tell a whole bunch of functions that they should be suspendable. What if we had a flag (unclear how) that function calls should behave like cofunctions now. That flag would be initiated by a root call and then propagated to the callees, without any syntax change? In any case it should be possible to inquire/assert that a function is running as cofunction. -- Christian Tismer :^) Software Consulting : Have a break! Take a ride on Python's Karl-Liebknecht-Str. 121 : *Starship* http://starship.python.net/ 14482 Potsdam : PGP key -> http://pgp.uni-mainz.de phone +49 173 24 18 776 fax +49 (30) 700143-0023 PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04 whom do you want to sponsor today? http://www.stackless.com/ From tismer at stackless.com Sat Oct 20 17:39:45 2012 From: tismer at stackless.com (Christian Tismer) Date: Sat, 20 Oct 2012 17:39:45 +0200 Subject: [Python-ideas] Cofunctions - Back to Basics In-Reply-To: References: <4EA8BD66.6010807@canterbury.ac.nz> <4EA93D02.2030201@hotpy.org> <4EA94055.3080207@canterbury.ac.nz> <4EA94304.1010909@hotpy.org> <4EAB2994.6010103@canterbury.ac.nz> <4EAB2F42.2020704@stoneleaf.us> Message-ID: <5082C5C1.9090308@stackless.com> Hi, I found again a misunderstanding. On 29.10.11 04:10, Nick Coghlan wrote: > On Sat, Oct 29, 2011 at 8:40 AM, Ethan Furman wrote: >> Greg Ewing wrote: >>> Mark Shannon wrote: >>> >>>> Stackless provides coroutines. Greenlets are also coroutines (I think). >>>> >>>> Lua has them, and is implemented in ANSI C, so it can be done portably. >>> These all have drawbacks. Greenlets are based on non-portable >>> (and, I believe, slightly dangerous) C hackery, and I'm given >>> to understand that Lua coroutines can't be suspended from >>> within a C function. >>> >>> My proposal has limitations, but it has the advantage of >>> being based on fully portable and well-understood techniques. >> If Stackless has them, could we use that code? > That's what the greenlets module *is* - the coroutine code from > Stackless, lifted out and provided as an extension module instead of a > forked version of the runtime. > No, the greenlet code is a subset of stackless. Stackless could remove its greenlet part and become an assembler-free implementation. It would just not get over the C extension problem. But that could then be handled by using the greenlet as an optional external module. (I think I said that before. Just wanted it to appear here) -- Christian Tismer :^) Software Consulting : Have a break! Take a ride on Python's Karl-Liebknecht-Str. 121 : *Starship* http://starship.python.net/ 14482 Potsdam : PGP key -> http://pgp.uni-mainz.de phone +49 173 24 18 776 fax +49 (30) 700143-0023 PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04 whom do you want to sponsor today? http://www.stackless.com/ From tismer at stackless.com Sat Oct 20 17:59:41 2012 From: tismer at stackless.com (Christian Tismer) Date: Sat, 20 Oct 2012 17:59:41 +0200 Subject: [Python-ideas] Cofunctions - Back to Basics In-Reply-To: <4EABAD40.1010002@canterbury.ac.nz> References: <4EA8BD66.6010807@canterbury.ac.nz> <4EA93D02.2030201@hotpy.org> <4EA94055.3080207@canterbury.ac.nz> <4EA94304.1010909@hotpy.org> <4EAB2994.6010103@canterbury.ac.nz> <4EABAD40.1010002@canterbury.ac.nz> Message-ID: <5082CA6D.3080800@stackless.com> Picking that up, too... 
On 29.10.11 09:37, Greg Ewing wrote: > Nick Coghlan wrote: > >> The limitation of Lua style coroutines is that they can't be suspended >> from inside a function implemented in C. Without greenlets/Stackless >> style assembly code, coroutines in Python would likely have the same >> limitation. >> >> PEP 3152 (and all generator based coroutines) have the limitation that >> they can't suspend if there's a *Python* function on the stack. Can >> you see why I know consider this approach categorically worse than one >> that pursued the Lua approach? > > Ouch, yes, point taken. Fortunately, I think I may have an > answer to this... > > Now that the cocall syntax is gone, the bytecode generated for > a cofunction is actually identical to that of an ordinary > function. The only difference is a flag in the code object. > > If the flag were moved into the stack frame instead, it would > be possible to run any function in either "normal" or "coroutine" > mode, depending on whether it was invoked via __call__ or > __cocall__. > > So there would no longer be two kinds of function, no need for > 'codef', and any pure-Python code could be used either way. > > This wouldn't automatically handle the problem of C code -- > existing C functions would run in "normal" mode and therefore > wouldn't be able to yield. However, there is at least a clear > way for C-implemented objects to participate, by providing > a __cocall__ method that returns an iterator. > What about this idea? I think I just wrote exactly the same thing in another thread ;-) Is it still under consideration? (I missed quite a lot when recovering from my strokes ...) -- Christian Tismer :^) Software Consulting : Have a break! Take a ride on Python's Karl-Liebknecht-Str. 121 : *Starship* http://starship.python.net/ 14482 Potsdam : PGP key -> http://pgp.uni-mainz.de phone +49 173 24 18 776 fax +49 (30) 700143-0023 PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04 whom do you want to sponsor today? http://www.stackless.com/ From guido at python.org Sat Oct 20 18:10:40 2012 From: guido at python.org (Guido van Rossum) Date: Sat, 20 Oct 2012 09:10:40 -0700 Subject: [Python-ideas] The async API of the future: yield-from In-Reply-To: References: Message-ID: On Fri, Oct 19, 2012 at 10:27 PM, Devin Jeanpierre wrote: > On Fri, Oct 19, 2012 at 10:44 PM, Greg Ewing > wrote: >> If I wrote a library intended for serious use, the end user >> probably wouldn't write either of those. Instead he would >> write something like >> >> yield from block(self.queue) >> >> and it would be an implementation detail of the library >> where abouts the 'yield' happened and whether it needed >> to send a value or not. > > What's the benefit of having both "yield" and "yield from" as opposed > to just "yield"? It seems like an attractive nuisance if "yield" works > but doesn't let the function have implementation details and wait for > more than one thing or somesuch. > > With the existing generator-coroutine decorators (monocle, > inlineCallbacks), there is no such trap. "yield foo()" will work no > matter how many things foo() will wait for. > > My understanding is that the only benefit we get here is nicer > tracebacks. I hope there's more. It is also *much* faster. In the "yield " style (what I use in NDB) every level that blocks involves the creation of a Future and a bunch of code that sets its result. The scheduler has to do a lot of work to make it work. 
In Greg's "yield from " style most of those futures disappear, so adding extra layers of logic is much cheaper. (And believe me, in a real system, like NDB is, you have to add a lot of extra logic layers to make your API easy to use.) As a result Greg's scheduler is much simpler. (In the last week I wrote one to test this hypothesis, so I know.) I do have one concern, but it can easily be addressed. Users have the tendency to make mistakes. In NDB, a common mistake is leaving out the yield keyword. Fortunately when you do that, nothing works, so you typically find out quickly. The other mistake is found even easier: writing yield where you shouldn't. The NDB scheduler rejects values that aren't Futures so this is diagnosed precisely and with a decent stack trace. In the PEP 380 world, there will be a new type of mistake possible: writing yield instead of yield from. Fortunately the scheduler can easily test for this -- if the result of its calling next() is not None, the user yielded something. In Greg's strict design, you should never yield a value from a coroutine, so that's always an error. Even in a design where values yielded are used as scheduler instructions (albeit only by the lowest levels of the I/O wrappers), we can assume that a value yielded should never be a generator -- so the scheduler can throw back an exception if it receives a generator, and it can even hint to the user "did you mean yield from instead of yield?". The exception thrown in will show exactly the point where the from keyword is missing. (Making diagnosing cases like this more robust actually pleads for adopting Greg's strict stance.) -- --Guido van Rossum (python.org/~guido) From guido at python.org Sat Oct 20 18:30:16 2012 From: guido at python.org (Guido van Rossum) Date: Sat, 20 Oct 2012 09:30:16 -0700 Subject: [Python-ideas] yield from multiple iterables (was Re: The async API of the future: yield-from) In-Reply-To: <508238B2.4040808@canterbury.ac.nz> References: <507B0F0B.8080700@pearwood.info> <507B26C6.10602@stackless.com> <507BA075.4030508@canterbury.ac.nz> <507BC855.4070802@stackless.com> <507BE79D.1090100@stackless.com> <507C1661.5070206@stackless.com> <507CADE6.7050604@canterbury.ac.nz> <50814DDD.9070206@stackless.com> <50817D62.4040607@stackless.com> <5081F554.5090404@canterbury.ac.nz> <508238B2.4040808@canterbury.ac.nz> Message-ID: On Fri, Oct 19, 2012 at 10:37 PM, Greg Ewing wrote: > Nick Coghlan wrote: > >> Please don't lose sight of the fact that yield-based suspension points >> looking like something other than an ordinary function call is a >> *feature*, not a bug. (Ironically, Christian just revived an old thread where Nick was of a different opinion.) > People keep asserting that, but I don't think we have enough > experience with the explicit-switching-point-markers-all-the- > way-up style of coding to tell whether it's a good idea or not. Hm. I would say I have a lot of real world experience with this style: App Engine's NDB. It's in use by 1000s of people. I've written a lot of complicated database code in it (integrating several layers of caching). This style really does hold up well. Now, I think that if it could yield from, NDB would be even better, but for most code it would be a very minimal change, and the issue here (the requirement to mark suspension points) is the same. 
In C# they also have a lot of experience with this style -- functions declared as async must be waited for using await, and the type checker enforces that it's await all the way up (I think a function using await must be declared as async, too). > My gut feeling is that the explicit markers will help at the > lowest levels, where you're trying to protect a critical section, > but at the upper levels they will just be noise that causes > unnecessary worry. Actually, an earlier experience (like 25 years earier :-) suggests that it's at the higher levels where you get in trouble without the markers -- because you still have critical sections in end user code, but it's impossible to remember which functions you call may cause a task switch. > In one of Guido's earlier posts (which I can't find now, > unfortunately), he said something that made it sound like > he was coming around to that point of view too, but he > seems to have gone back on that recently. I was probably more waxing philosophically on the reasons why people like greenlets/gevent (if they like it). I feel I am pretty consistently in favor of marking switch points, at least in the context we are currently discussing (where high-speed async event handling is the thing to do). For less-performant situations I'm fine with writing classic synchronous-looking code, and running it in multiple OS threads for concurrency reasons. But the purpose of designing a new async API is to break the barrier of one thread per connection. -- --Guido van Rossum (python.org/~guido) From tismer at stackless.com Sat Oct 20 19:48:33 2012 From: tismer at stackless.com (Christian Tismer) Date: Sat, 20 Oct 2012 19:48:33 +0200 Subject: [Python-ideas] yield from multiple iterables (was Re: The async API of the future: yield-from) In-Reply-To: References: <507B0F0B.8080700@pearwood.info> <507B26C6.10602@stackless.com> <507BA075.4030508@canterbury.ac.nz> <507BC855.4070802@stackless.com> <507BE79D.1090100@stackless.com> <507C1661.5070206@stackless.com> <507CADE6.7050604@canterbury.ac.nz> <50814DDD.9070206@stackless.com> <50817D62.4040607@stackless.com> <5081F554.5090404@canterbury.ac.nz> <508238B2.4040808@canterbury.ac.nz> Message-ID: <5082E3F1.6080004@stackless.com> On 20.10.12 18:30, Guido van Rossum wrote: > On Fri, Oct 19, 2012 at 10:37 PM, Greg Ewing > wrote: >> Nick Coghlan wrote: >> >>> Please don't lose sight of the fact that yield-based suspension points >>> looking like something other than an ordinary function call is a >>> *feature*, not a bug. > (Ironically, Christian just revived an old thread where Nick was of a > different opinion.) > >> People keep asserting that, but I don't think we have enough >> experience with the explicit-switching-point-markers-all-the- >> way-up style of coding to tell whether it's a good idea or not. > Hm. I would say I have a lot of real world experience with this style: > App Engine's NDB. It's in use by 1000s of people. I've written a lot > of complicated database code in it (integrating several layers of > caching). This style really does hold up well. > > Now, I think that if it could yield from, NDB would be even better, > but for most code it would be a very minimal change, and the issue > here (the requirement to mark suspension points) is the same. > > In C# they also have a lot of experience with this style -- functions > declared as async must be waited for using await, and the type checker > enforces that it's await all the way up (I think a function using > await must be declared as async, too). 
> >> My gut feeling is that the explicit markers will help at the >> lowest levels, where you're trying to protect a critical section, >> but at the upper levels they will just be noise that causes >> unnecessary worry. > Actually, an earlier experience (like 25 years earier :-) suggests > that it's at the higher levels where you get in trouble without the > markers -- because you still have critical sections in end user code, > but it's impossible to remember which functions you call may cause a > task switch. > >> In one of Guido's earlier posts (which I can't find now, >> unfortunately), he said something that made it sound like >> he was coming around to that point of view too, but he >> seems to have gone back on that recently. > I was probably more waxing philosophically on the reasons why people > like greenlets/gevent (if they like it). I feel I am pretty > consistently in favor of marking switch points, at least in the > context we are currently discussing (where high-speed async event > handling is the thing to do). > > For less-performant situations I'm fine with writing classic > synchronous-looking code, and running it in multiple OS threads for > concurrency reasons. But the purpose of designing a new async API is > to break the barrier of one thread per connection. It is of course a bit confusing to find out who thought what and when ;-) And yes, I see your point, but I have difficulties to see how it is done best. If I take Stackless as an example, well, there would everything potentially be marked as some codef, because it is simply everywhere enabled. But just for the fact that something _supports_ suspension or switching is IMHO not enough reason to clutter the code everywhere. What I think is need is a way to distinguish critical code paths. Not sure how this should be. Maybe it's my limited understanding. The generator-based functions do not get switched from alone. If they want to do that, they call some special function, and I would mark them for doing that. But all the tree up? I can't see the benefit so much. Maybe it would be less verbose to have decorators that assert something does _not_ switch, like guards? Or maybe add properties? I agree that one needs certain information about the program that can easily be extracted. Perhaps this could be done with an analysing tool. This tool would only need additional hints if things are very dynamic, like variables holding certain constructs which are known at runtime only. -- Christian Tismer :^) Software Consulting : Have a break! Take a ride on Python's Karl-Liebknecht-Str. 121 : *Starship* http://starship.python.net/ 14482 Potsdam : PGP key -> http://pgp.uni-mainz.de phone +49 173 24 18 776 fax +49 (30) 700143-0023 PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04 whom do you want to sponsor today? 
http://www.stackless.com/ From guido at python.org Sat Oct 20 20:12:37 2012 From: guido at python.org (Guido van Rossum) Date: Sat, 20 Oct 2012 11:12:37 -0700 Subject: [Python-ideas] yield from multiple iterables (was Re: The async API of the future: yield-from) In-Reply-To: References: <507B0F0B.8080700@pearwood.info> <507B26C6.10602@stackless.com> <507BA075.4030508@canterbury.ac.nz> <507BC855.4070802@stackless.com> <507BE79D.1090100@stackless.com> <507C1661.5070206@stackless.com> <507CADE6.7050604@canterbury.ac.nz> <50814DDD.9070206@stackless.com> <50817D62.4040607@stackless.com> <5081F554.5090404@canterbury.ac.nz> Message-ID: On Fri, Oct 19, 2012 at 7:41 PM, Steve Dower wrote: > I'm not entirely sure whether I'm hijacking the thread here... I have to admit I've somewhat lost track with all the renames. The discussion has been very interesting (I really like the 'codef' idea, and decorators can provide this without requiring syntax changes) regardless of which thread is active. > > I have spent a bit of time writing up the approach that we (Dino, who posted it here originally, myself and with some advice from colleagues who are working on a similar API for C++) came up with and implemented. > > I must apologise for the length - I got a little carried away with background information, but I do believe that it is important for us to understand exactly what problem we're trying to solve so that we aren't distracted by "new toys". > > The write-up is here: http://stevedower.id.au/blog/async-api-for-python/ > > I included code, since there have been a few people asking for prototype implementations, so if you want to skip ahead to the code (which is quite heavily annotated) it is at http://stevedower.id.au/blog/async-api-for-python/#thecode or http://stevedower.id.au/downloads/PythonAsync.zip (I based my example on Greg's socket spam, so thanks for that!) > > And no, I'm not collecting any ad revenue from the page, so feel free to visit as often as you like and use up my bandwidth. > > Let the tearing apart of my approach commence! :) Couple of questions and comments. - You mention a query interface a few times but there are no details in your example code; can you elaborate? (Or was that a typo for queue?) - This is almost completely isomorphic with NDB's tasklets, except that you borrow the Future class implementation from concurrent.futures -- I think that's the wrong building block to start with, because it is linked too closely to threads. - There is a big speed difference between yield from and yield . With yield , the scheduler has to do significant work for each yield at an intermediate level, whereas with yield from, the schedule is only involved when actual blocking needs to be performed. In my experience, real code has lots of intermediate levels. Therefore I would like to use yield from. You can already do most things with yield from that you can do with Futures; there are a few operations that need a helper (in particular spawning truly concurrent tasks), but the helper code can be much simpler than the Future object, and isn't needed as often, so it's still a bare win. - Nit: I don't like calling the event loop context; there are too many things called context (e.g. context managers in Python), so I prefer to call it what it is -- event loop or I/O loop. - Nittier nit: you have a few stray colons, e.g. "it = iter(fn(*args, **kwargs)):" . 
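To illustrate the third point, here is the same intermediate layer written in both styles (the async decorator and read_exactly are made-up placeholders, not part of any existing proposal):

    # yield <future> style: this level has to be wrapped so that it itself
    # returns a Future, and each call through it creates one more Future.
    @async
    def read_record(stream):
        header = yield read_exactly(stream, 4)   # read_exactly returns a Future
        size = int.from_bytes(header, 'big')
        body = yield read_exactly(stream, size)
        return body

    # yield from style: the same level is plain delegation; the scheduler
    # only gets involved when the bottom of the chain really has to block.
    def read_record(stream):
        header = yield from read_exactly(stream, 4)
        size = int.from_bytes(header, 'big')
        body = yield from read_exactly(stream, size)
        return body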
-- --Guido van Rossum (python.org/~guido) From guido at python.org Sat Oct 20 20:29:09 2012 From: guido at python.org (Guido van Rossum) Date: Sat, 20 Oct 2012 11:29:09 -0700 Subject: [Python-ideas] yield from multiple iterables (was Re: The async API of the future: yield-from) In-Reply-To: <5082E3F1.6080004@stackless.com> References: <507B0F0B.8080700@pearwood.info> <507B26C6.10602@stackless.com> <507BA075.4030508@canterbury.ac.nz> <507BC855.4070802@stackless.com> <507BE79D.1090100@stackless.com> <507C1661.5070206@stackless.com> <507CADE6.7050604@canterbury.ac.nz> <50814DDD.9070206@stackless.com> <50817D62.4040607@stackless.com> <5081F554.5090404@canterbury.ac.nz> <508238B2.4040808@canterbury.ac.nz> <5082E3F1.6080004@stackless.com> Message-ID: Maybe it would help if I was more straightforward. I do not want to solve this problem by introducing yet another language mechanism. This rules out codef, greenlets, and Stackless. I want to solve it using what we have in Python 3.3. And I want to solve it for all platforms where Python 3.3 runs and for all Python 3.3 implementations (assuming Jython, IronPython and PyPy will eventually get there). Basically this leaves as options OS threads, callbacks (+ Deferred), or yield [from]. Using OS threads the problem is solved without writing any code, but does not scale, so it does not really solve the problem. To scale we need either?callbacks or yield [from], or both. I accept that some people prefer to use callbacks and Deferred. I want to respect this choice and I want to see integration with this style at the event loop level. But for myself, I know that I want to write *most* code without callbacks (or Deferreds), and I am quite comfortable to use yield or yield from instead. (I have written a lot of code using yield , and I am now practicing yield from -- the transition is quite easy and I like what I see.) If you are not happy with what we can do in (portable) Python 3.3, we are not talking about solving the same problem. If you are happy using OS threads, we are also not talking about solving the same problem. (To be sure, there is a place for them in my solution -- but it is only needed for things we cannot run asynchronously, such as socket.getaddrinfo().) If you are not happy using callbacks/Deferred nor using yield[from], you're welcome to use greenlets or Stackless. But they will not be in the standard library. -- --Guido van Rossum (python.org/~guido) From jstpierre at mecheye.net Sat Oct 20 21:25:44 2012 From: jstpierre at mecheye.net (Jasper St. Pierre) Date: Sat, 20 Oct 2012 15:25:44 -0400 Subject: [Python-ideas] yield from multiple iterables (was Re: The async API of the future: yield-from) In-Reply-To: References: <507B0F0B.8080700@pearwood.info> <507B26C6.10602@stackless.com> <507BA075.4030508@canterbury.ac.nz> <507BC855.4070802@stackless.com> <507BE79D.1090100@stackless.com> <507C1661.5070206@stackless.com> <507CADE6.7050604@canterbury.ac.nz> <50814DDD.9070206@stackless.com> <50817D62.4040607@stackless.com> <5081F554.5090404@canterbury.ac.nz> <508238B2.4040808@canterbury.ac.nz> <5082E3F1.6080004@stackless.com> Message-ID: I'm curious now... you keep mentioning Futures and Deferreds like they're two separate entities. What distinction between the two do you see? On Sat, Oct 20, 2012 at 2:29 PM, Guido van Rossum wrote: > Maybe it would help if I was more straightforward. > > I do not want to solve this problem by introducing yet another > language mechanism. This rules out codef, greenlets, and Stackless. 
> > I want to solve it using what we have in Python 3.3. And I want to > solve it for all platforms where Python 3.3 runs and for all Python > 3.3 implementations (assuming Jython, IronPython and PyPy will > eventually get there). > > Basically this leaves as options OS threads, callbacks (+ Deferred), > or yield [from]. > > Using OS threads the problem is solved without writing any code, but > does not scale, so it does not really solve the problem. > > To scale we need either callbacks or yield [from], or both. > > I accept that some people prefer to use callbacks and Deferred. I want > to respect this choice and I want to see integration with this style > at the event loop level. > > But for myself, I know that I want to write *most* code without > callbacks (or Deferreds), and I am quite comfortable to use yield or > yield from instead. (I have written a lot of code using yield > , and I am now practicing yield from -- the > transition is quite easy and I like what I see.) > > If you are not happy with what we can do in (portable) Python 3.3, we > are not talking about solving the same problem. > > If you are happy using OS threads, we are also not talking about > solving the same problem. (To be sure, there is a place for them in my > solution -- but it is only needed for things we cannot run > asynchronously, such as socket.getaddrinfo().) > > If you are not happy using callbacks/Deferred nor using yield[from], > you're welcome to use greenlets or Stackless. But they will not be in > the standard library. > > -- > --Guido van Rossum (python.org/~guido) > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas -- Jasper From Steve.Dower at microsoft.com Sat Oct 20 21:31:12 2012 From: Steve.Dower at microsoft.com (Steve Dower) Date: Sat, 20 Oct 2012 19:31:12 +0000 Subject: [Python-ideas] yield from multiple iterables (was Re: The async API of the future: yield-from) In-Reply-To: References: <507B0F0B.8080700@pearwood.info> <507B26C6.10602@stackless.com> <507BA075.4030508@canterbury.ac.nz> <507BC855.4070802@stackless.com> <507BE79D.1090100@stackless.com> <507C1661.5070206@stackless.com> <507CADE6.7050604@canterbury.ac.nz> <50814DDD.9070206@stackless.com> <50817D62.4040607@stackless.com> <5081F554.5090404@canterbury.ac.nz> , Message-ID: > - Nit: I don't like calling the event loop context; there are too many > things called context (e.g. context managers in Python), so I prefer > to call it what it is -- event loop or I/O loop. The naming collision with context managers has been brought up before, so I'm okay with changing that. We used context mainly because it's close to the terminology used in .NET, where you schedule tasks/continuations in a particular SynchronizationContext. I believe "I/O loop" would be inaccurate, but "event loop" is probably appropriate. > - You mention a query interface a few times but there are no details > in your example code; can you elaborate? (Or was that a typo for > queue?) I think I just changed terminology while writing - this is the 'get_future_for' call, which is not guaranteed to provide a waitable/pollable object for any type. The intent is to allow an event loop to optionally provide support for (say) select(), but not to force that upon all implementations. If (when) someone implements a Windows GetMessage() based loop then requiring 'native' select() support is unfair. 
(Also, an implementation for Windows 8 would not directly involve an event loop, but would pass everything through to the underlying OS.) > - This is almost completely isomorphic with NDB's tasklets, except > that you borrow the Future class implementation from > concurrent.futures -- I think that's the wrong building block to start > with, because it is linked too closely to threads. As far as I can see, the only link that futures have with threads is that the ThreadPoolExecutor class is in the same module. `Future` itself is merely an object that can be polled, waited on, or assigned a callback, which means it represents all asynchronous operations. Some uses are direct (e.g., polling a future that represents pollable I/O) while others require emulation (adding a callback for pollable I/O), which is partly why the 'get_future_for' function exists - to allow the event loop to use the object directly if it can. > - There is a big speed difference between yield from and > yield . With yield , the scheduler has to do > significant work for each yield at an intermediate level, whereas with > yield from, the schedule is only involved when actual blocking needs > to be performed. In my experience, real code has lots of intermediate > levels. Therefore I would like to use yield from. You can already do > most things with yield from that you can do with Futures; there are a > few operations that need a helper (in particular spawning truly > concurrent tasks), but the helper code can be much simpler than the > Future object, and isn't needed as often, so it's still a bare win. I don't believe the scheduler is involved that frequently, but it is true that more Futures than are strictly necessary are created. The first step (up to a yield) of any @async method is always run immediately - if there is no yield, then the returned future is already completed and has the result. The event loop as implemented could be optimised slightly for this case, but since Future calls new callbacks immediately if it has already completed then we never 'unschedule' the task. yield from can of course be used for the intermediate levels in exactly the same way as it is used for refactoring generators. The difference is that the top level is an @async decorator, at which point a Future is created. So 'read_async' might have @async applied, but it can 'yield from' any other generators that yield futures. Then the person calling 'read_async' is free to use any Future compatible interface rather than being forced into continuing the 'yield from' chain all the way to the top. (In particular, I think this works much better in the interactive scenario - I can write "x = read_async().result()", but how do you implement a 'yield from' approach in a REPL?) From tismer at stackless.com Sat Oct 20 22:06:26 2012 From: tismer at stackless.com (Christian Tismer) Date: Sat, 20 Oct 2012 22:06:26 +0200 Subject: [Python-ideas] yield from multiple iterables (was Re: The async API of the future: yield-from) In-Reply-To: References: <507B0F0B.8080700@pearwood.info> <507B26C6.10602@stackless.com> <507BA075.4030508@canterbury.ac.nz> <507BC855.4070802@stackless.com> <507BE79D.1090100@stackless.com> <507C1661.5070206@stackless.com> <507CADE6.7050604@canterbury.ac.nz> <50814DDD.9070206@stackless.com> <50817D62.4040607@stackless.com> <5081F554.5090404@canterbury.ac.nz> <50829073.90909@stackless.com> Message-ID: <5381FF96-0537-4AA4-B3D2-73D9AA0C0487@stackless.com> Thanks! Then I have another short one... 
Sent from my Ei4Steve

On Oct 20, 2012, at 14:16, Nick Coghlan wrote: > On Sat, Oct 20, 2012 at 9:52 PM, Christian Tismer wrote: >> What has happened to the PEP, was it rejected? > > No, it's still open. We just wanted to give the yield from PEP a > chance to see some use on its own before we started trying to take it > any further, and Greg was amenable to that approach.

I often see the phrase "coroutine", but without explicit mention of whether it is symmetric (greenlet) or asymmetric (Lua). Then, when I hear myself quibbling about "full generators", I mean generators that are made up of a few functions that can all yield to the caller of this generator.

Question: is "full generators" equivalent to "asymmetric coroutine"?

Question 2: when people talk about coroutines, is "asymmetric" the default here?

The reason I ask is my fear of creating problems when answering messages written with different default meanings in mind.

Thanks - chris

From tismer at stackless.com Sat Oct 20 22:55:15 2012 From: tismer at stackless.com (Christian Tismer) Date: Sat, 20 Oct 2012 22:55:15 +0200 Subject: [Python-ideas] Language "macros" in discussions Message-ID: <41E28745-3928-4323-9334-B9E9AE4BB444@stackless.com>

Clarification: I have a tendency to mention constructs from other threads in a discussion. This might suggest that I'm proposing to use such a not-yet-included or even not-yet-accepted feature as a solution. For instance, Guido's reaction to my last message might be an indicator of misinterpreting this, although I'm not sure if I was primarily addressed at all (despite that the to/cc suggested it).
> > Anyway, I just want to make sure: > > If I'm mentioning stackless or codef > or greenlet, this does not imply that I > propose to code the solution to async > by implementing such a thing, first. > The opposite is true. > > I mean such meantioning more like > a macro-like feature: > I'm implementing structures using the existing > things, but adhere to a coding style > that stays compatible to one of the mentioned > principles. > > This is like a macro feature of my brain > - I talk about codef, but code it using > yield-from. > > So please don't take me wrong that I > want to push for features to be > included. This is only virtual. I use yield > constructs, but obey the codef protocol, > for instance. > > And as an addition: when I'm talking > of generators implemented by yield from, > then this is just a generator that can > yield from any of its sub-functions. > > I am not talking about tasks or schedulars. > These constructs do not belong there. > I'm strongly against using "yield from" > for this. > It is a building block for generatos > resp. coroutines, and there it stops ! > > Higher level stuff should by no means > use those primitives at all. Ok, understood, and sorry if I mistook your intention before. Here's how I tend to use some terminology: - generator function: any function containing yield - generator object: the iterator returned by calling a generator function - generator: either of the above, when the context makes it clear which one I mean, or when it doesn't matter - iterator generator: a generator used to produce values that one would consume with an implicit or explicit for-loop - coroutine: a generator function used to implement an async computation instead of an iterator - Future: something with roughly the interface but not necessarily the implementation of PEP 3148 - Deferred: the Twisted Deferred class or something with similar functionality (there are some in the JavaScript world) Note that I use coroutine for both PEP-342-style and PEP-380-style generators (i.e. "yield " vs. "yield from "). The big difference between Futures and Deferreds is that Deferreds can easily be chains together to create multiple stages, and each callback is called with the value returned from the previous stage; also, Deferreds have separate callback chains for regular values and errors. > Sent from my Ei4Steve Does it happen to have a 40-char wide screen? :-) -- --Guido van Rossum (python.org/~guido) From guido at python.org Sun Oct 21 00:38:52 2012 From: guido at python.org (Guido van Rossum) Date: Sat, 20 Oct 2012 15:38:52 -0700 Subject: [Python-ideas] yield from multiple iterables (was Re: The async API of the future: yield-from) In-Reply-To: References: <507B0F0B.8080700@pearwood.info> <507B26C6.10602@stackless.com> <507BA075.4030508@canterbury.ac.nz> <507BC855.4070802@stackless.com> <507BE79D.1090100@stackless.com> <507C1661.5070206@stackless.com> <507CADE6.7050604@canterbury.ac.nz> <50814DDD.9070206@stackless.com> <50817D62.4040607@stackless.com> <5081F554.5090404@canterbury.ac.nz> <508238B2.4040808@canterbury.ac.nz> <5082E3F1.6080004@stackless.com> Message-ID: On Sat, Oct 20, 2012 at 12:25 PM, Jasper St. Pierre wrote: > I'm curious now... you keep mentioning Futures and Deferreds like > they're two separate entities. What distinction between the two do you > see? They have different interfaces and you end up using them differently. 
In particular, quoting myself from another thread, here is how I use the terms: - Future: something with roughly the interface but not necessarily the implementation of PEP 3148. - Deferred: the Twisted Deferred class or something with very similar functionality (there are some in the JavaScript world). The big difference between Futures and Deferreds is that Deferreds can easily be chains together to create multiple stages, and each callback is called with the value returned from the previous stage; also, Deferreds have separate callback chains for regular values and errors. -- --Guido van Rossum (python.org/~guido) From guido at python.org Sun Oct 21 00:39:59 2012 From: guido at python.org (Guido van Rossum) Date: Sat, 20 Oct 2012 15:39:59 -0700 Subject: [Python-ideas] yield from multiple iterables (was Re: The async API of the future: yield-from) In-Reply-To: <5381FF96-0537-4AA4-B3D2-73D9AA0C0487@stackless.com> References: <507B0F0B.8080700@pearwood.info> <507B26C6.10602@stackless.com> <507BA075.4030508@canterbury.ac.nz> <507BC855.4070802@stackless.com> <507BE79D.1090100@stackless.com> <507C1661.5070206@stackless.com> <507CADE6.7050604@canterbury.ac.nz> <50814DDD.9070206@stackless.com> <50817D62.4040607@stackless.com> <5081F554.5090404@canterbury.ac.nz> <50829073.90909@stackless.com> <5381FF96-0537-4AA4-B3D2-73D9AA0C0487@stackless.com> Message-ID: On Sat, Oct 20, 2012 at 1:06 PM, Christian Tismer wrote: > I often see the phrase "coroutine" but > without explicit mention if symmetric > (greenlet) or asymmetric (Lua). > > Then when I hear myself quibbling > about "full generators" then I mean > generators that are made up of a few > functions that all can yield to the > caller of this generator. > > Question: is "full generators" equivalent > to "asymmetric coroutine"? > > Question 2: when people talk about > coroutines, is "asymmetric" the default > here? > > The reason that I ask is my fear to > create problems when answering to > messages with different default meanings > in mind. I believe I just answered this is another thread (that you also started). -- --Guido van Rossum (python.org/~guido) From guido at python.org Sun Oct 21 01:11:15 2012 From: guido at python.org (Guido van Rossum) Date: Sat, 20 Oct 2012 16:11:15 -0700 Subject: [Python-ideas] yield from multiple iterables (was Re: The async API of the future: yield-from) In-Reply-To: References: <507B0F0B.8080700@pearwood.info> <507B26C6.10602@stackless.com> <507BA075.4030508@canterbury.ac.nz> <507BC855.4070802@stackless.com> <507BE79D.1090100@stackless.com> <507C1661.5070206@stackless.com> <507CADE6.7050604@canterbury.ac.nz> <50814DDD.9070206@stackless.com> <50817D62.4040607@stackless.com> <5081F554.5090404@canterbury.ac.nz> Message-ID: On Sat, Oct 20, 2012 at 12:31 PM, Steve Dower wrote: >> - Nit: I don't like calling the event loop context; there are too many >> things called context (e.g. context managers in Python), so I prefer >> to call it what it is -- event loop or I/O loop. > > The naming collision with context managers has been brought up before, so I'm okay with changing that. We used context mainly because it's close to the terminology used in .NET, where you schedule tasks/continuations in a particular SynchronizationContext. I believe "I/O loop" would be inaccurate, but "event loop" is probably appropriate. I'm happy to settle on event loop. 
(Terminology in this area seems fraught with conflicting conventions; Twisted calls it a reactor, after the reactor pattern, but I've been chided by others for using this term without explanation; Tornado calls it I/O loop.) >> - You mention a query interface a few times but there are no details >> in your example code; can you elaborate? (Or was that a typo for >> queue?) > > I think I just changed terminology while writing - this is the 'get_future_for' call, which is not guaranteed to provide a waitable/pollable object for any type. Then what is the use? What *is* its contract? > The intent is to allow an event loop to optionally provide support for (say) select(), but not to force that upon all implementations. If (when) someone implements a Windows GetMessage() based loop then requiring 'native' select() support is unfair. (Also, an implementation for Windows 8 would not directly involve an event loop, but would pass everything through to the underlying OS.) I'm all for declaring select() an implementation detail. It doesn't scale on any platform; on Windows it only works for sockets; the properly scaling alternative varies per platform. (It is IOCP on Windows, right?) This *probably* also means that the concept of file descriptor is out the window (even though Tornado apparently cannot do anything without it -- it's probably not used on Windows at all). And I suspect that it means that the implementation of the socket abstraction will vary per platform. The collection of other implementations of the same abstraction available, and even available other abstractions, will also vary per platform -- on Unix, there are pseudo ttys, pipes, named pipes, and unix domain sockets; I don't recall the set available on Windows, but I'm sure it is different. Then there is SSL/TLS, which feels like it requires special handling but in the end implements an abstraction similar to sockets. I assume that in many cases it is easy to bridge from the various platform-specific abstractions and implementation to more cross-platform abstractions; this is where the notions of transports and protocols seem most important. I haven't explored those enough, sadly. One note inspired by my mention of SSL, but also by discussions about GUI event loops in other threads: it is easy to think that everything is reducible to a file descriptor, but often it is not that easy. E.g. with something like SSL, you can't just select on the underlying socket, and then when it's ready call the read() method of the SSL layer -- it's possible that the read() will still block because the socket didn't have enough bytes to be able to decrypt the next block of data. Similar for sockets associated with e.g. GUI event management (e.g. X). >> - This is almost completely isomorphic with NDB's tasklets, except >> that you borrow the Future class implementation from >> concurrent.futures -- I think that's the wrong building block to start >> with, because it is linked too closely to threads. > > As far as I can see, the only link that futures have with threads is that the ThreadPoolExecutor class is in the same module. `Future` itself is merely an object that can be polled, waited on, or assigned a callback, which means it represents all asynchronous operations. Some uses are direct (e.g., polling a future that represents pollable I/O) while others require emulation (adding a callback for pollable I/O), which is partly why the 'get_future_for' function exists - to allow the event loop to use the object directly if it can. I wish it was true. 
But the Future class contains a condition variable, and the Waiter class used by the implementation uses an event. Both are directly imported from the threading module, and if you block on either of these, it is a hard block (not even interruptable by a signal). Don't worry too much about this -- it's just the particular implementation (concurrent.futures.Future). We can define a better Future class for our purposes elsewhere, with the same interface (or a subset -- I don't care much for the whole cancellation feature) but without references to threading. For those Futures, we'll have to decide what should happen if you call result() when the Future isn't done yet -- raise an error (similar to EWOULDBLOCK), or somehow block, possibly running a recursive event loop? (That's what NDB does, but not everybody likes it.) I think the normal approach would be to ask the scheduler to suspend the current task until the Future is ready -- it can easily arrange for that by adding a callback. In NDB this is spelled "yield ". In the yield-from world we could spell it that way too (i.e. yield, not yield from), or we could make it so that we can write yield from , or perhaps we need a helper call: yield from wait() or maybe a method on the Future class (since it is our own), yield from .wait(). These are API design details. (I also have a need to block for the Futures returned by ThreadPoolExecutor and ProcessPoolExecutor -- those are handy when you really can't run something inline in the event loop -- the simplest example being getaddrinfo(), which may block for DNS.) >> - There is a big speed difference between yield from and >> yield . With yield , the scheduler has to do >> significant work for each yield at an intermediate level, whereas with >> yield from, the schedule is only involved when actual blocking needs >> to be performed. In my experience, real code has lots of intermediate >> levels. Therefore I would like to use yield from. You can already do >> most things with yield from that you can do with Futures; there are a >> few operations that need a helper (in particular spawning truly >> concurrent tasks), but the helper code can be much simpler than the >> Future object, and isn't needed as often, so it's still a bare win. > > I don't believe the scheduler is involved that frequently, but it is true that more Futures than are strictly necessary are created. IIUC every yield must pass a Future, and every time that happens the scheduler gets it and must arrange for a callback on that Future which resumes the generator. I have code like that in NDB and you have very similar code like that in your version (wrapper in @async, and later _Awaiter._step()). > The first step (up to a yield) of any @async method is always run immediately - if there is no yield, then the returned future is already completed and has the result. The event loop as implemented could be optimised slightly for this case, but since Future calls new callbacks immediately if it has already completed then we never 'unschedule' the task. Interesting that you always run the first step immediately. I don't do this in NDB. Can you explain why you think you need it? (It may simply be an optimization I've overlooked. :-) > yield from can of course be used for the intermediate levels in exactly the same way as it is used for refactoring generators. The difference is that the top level is an @async decorator, at which point a Future is created. 
So 'read_async' might have @async applied, but it can 'yield from' any other generators that yield futures. Then the person calling 'read_async' is free to use any Future compatible interface rather than being forced into continuing the 'yield from' chain all the way to the top. (In particular, I think this works much better in the interactive scenario - I can write "x = read_async().result()", but how do you implement a 'yield from' approach in a REPL?) Yeah, this is what I do in NDB, as I mentioned above (the recursive event loop call). But I suspect it would be very easy to write a helper function that you give a generator and which runs it to completion. It would also have to invoke the event loop, but that seems unavoidable, and otherwise the event loop isn't running in interactive mode, right? (Unless it runs in a separate thread, in which case the helper function should just communicate with that thread.) Final remark: I keep wondering if it's better to try and stay "pure" in the public API and use only yield from, plus some helpers like spawn(), join() and par(), or if a decent, pragmatic public API can offer a combination. I worry that most users will have a hard time remembering when to use yield and when yield from. -- --Guido van Rossum (python.org/~guido) From greg.ewing at canterbury.ac.nz Sun Oct 21 01:21:05 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sun, 21 Oct 2012 12:21:05 +1300 Subject: [Python-ideas] The async API of the future: yield-from In-Reply-To: References: <5078F6B1.2030309@canterbury.ac.nz> <507BA60B.2030806@canterbury.ac.nz> <507D109F.3020302@canterbury.ac.nz> <507F27FC.4040706@canterbury.ac.nz> <507FB480.1090009@canterbury.ac.nz> Message-ID: <508331E1.1010302@canterbury.ac.nz> Yuval Greenfield wrote: > block() removes the current task from the ready_list, but is the current > task guaranteed to be my task? Yes, block() always operates on the currently running task. > If so, then I'd never run again after the > yield in acquire(), that is unless a gracious other player unblocks me. Yes, the unblocking needs to be done by another task, or by something outside the task system such as an I/O callback. > I'm not sure it makes sense for scheduler functions to store waiting > tasks in a queue owned by the app and invisible from the scheduler. This > can cause *invisible deadlocks* such as: > > schedule(philosopher("Socrates", 8, 3, 1, forks[0], forks[2]), "Socrates") > schedule(philosopher("Euclid", 5, 1, 4, forks[2], forks[0]), "Euclid") Deadlocks are a potential problem in any system involving concurrency, and have to be dealt with on a case-by-case basis. Simply having the scheduler know where all the tasks are will not prevent deadlocks. It might make it possible for the scheduler to *detect* deadlocks, but you still have to do something about them. Having said that, I'm thinking about writing a more elaborate version of my scheduler that does keep track of which queue a task is waiting on, mainly so that tasks can be cancelled cleanly. > Is there a coroutine strategy for tackling these challenges? Or will I > just get better at overcoming them with practice? If you've been using threads all your life as you say, then you're probably already pretty good at dealing with them. All of the same techniques apply. 
-- Greg From greg.ewing at canterbury.ac.nz Sun Oct 21 01:26:04 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sun, 21 Oct 2012 12:26:04 +1300 Subject: [Python-ideas] The async API of the future In-Reply-To: <65B7E04F-965D-4D07-A60B-121997703BC0@twistedmatrix.com> References: <5081F15B.1040403@canterbury.ac.nz> <65B7E04F-965D-4D07-A60B-121997703BC0@twistedmatrix.com> Message-ID: <5083330C.6010609@canterbury.ac.nz> Glyph wrote: > The main interfaces you need are here: > > > > > These don't look anywhere near adequate to me. How do I make a sendmsg() call on a unix-domain socket and pass access rights over it? How do I do a readdir() on a file descriptor representing a directory? Etc. -- Greg From greg.ewing at canterbury.ac.nz Sun Oct 21 01:41:41 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sun, 21 Oct 2012 12:41:41 +1300 Subject: [Python-ideas] The async API of the future In-Reply-To: References: <5081F15B.1040403@canterbury.ac.nz> Message-ID: <508336B5.6080806@canterbury.ac.nz> Richard Oudkerk wrote: > I don't see why a completion api needs to create wrappers for sockets. See > > http://pastebin.com/7tDmeYXz > > ... > > The AsyncIO class is independent of reactors, futures etc. The methods > for starting an operation are > > recv(key, sock, nbytes, flags=0) > send(key, sock, buf, flags=0) > accept(key, sock) > connect(key, sock, address) That looks awfully like a wrapper for a socket to me. All of those system calls are peculiar to sockets. There doesn't necessarily have to be a wrapper class for each kind of file descriptor. There could be one I/O class that handles everything, or there could just be a collection of functions. The point is that, with a completion-based model, you need a function or method for every possible system call that you might want to perform asynchronously. -- Greg From greg.ewing at canterbury.ac.nz Sun Oct 21 01:52:40 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sun, 21 Oct 2012 12:52:40 +1300 Subject: [Python-ideas] yield from multiple iterables (was Re: The async API of the future: yield-from) In-Reply-To: <50829073.90909@stackless.com> References: <507B0F0B.8080700@pearwood.info> <507B26C6.10602@stackless.com> <507BA075.4030508@canterbury.ac.nz> <507BC855.4070802@stackless.com> <507BE79D.1090100@stackless.com> <507C1661.5070206@stackless.com> <507CADE6.7050604@canterbury.ac.nz> <50814DDD.9070206@stackless.com> <50817D62.4040607@stackless.com> <5081F554.5090404@canterbury.ac.nz> <50829073.90909@stackless.com> Message-ID: <50833948.40406@canterbury.ac.nz> Christian Tismer wrote: > Would you raise an > exception if something is called that is not a cofunction? Or > would that be an ordinary call? A cofunction calling a non-cofunction is fine, it just makes an ordinary call. But if a non-cofunction tries to call a cofunction using an ordinary call, an exception raised. Effectively, cofunctions do *not* implement __call__ (instead they implement a new protocol __cocall__). > The only difference is that I'm not aiming at coroutines in > the first place, but just having the concept of a *suspendable* > function. I'm not sure what the distinction is. > What has happened to the PEP, was it rejected? No, its status is still listed as "draft". It's probably too soon to consider whether it should be accepted or rejected; we need more experience with yield-from based task systems first. 
-- Greg From guido at python.org Sun Oct 21 01:53:06 2012 From: guido at python.org (Guido van Rossum) Date: Sat, 20 Oct 2012 16:53:06 -0700 Subject: [Python-ideas] The async API of the future In-Reply-To: <508336B5.6080806@canterbury.ac.nz> References: <5081F15B.1040403@canterbury.ac.nz> <508336B5.6080806@canterbury.ac.nz> Message-ID: On Sat, Oct 20, 2012 at 4:41 PM, Greg Ewing wrote: > The point is that, with a completion-based model, you need a function > or method for every possible system call that you might want to perform > asynchronously. TBH, I like APIs that wrap all system calls. System calls have too many low-level details that you have to be aware of, and they too often vary per platform. (I just wrote a simple event loop plus scheduler along the lines of your essay, extending it to the point where I could do basic, fully-async, HTTP exchanges. The number of details I had to take care of was excruciating; and then there were the subtle differences between OSX and Ubuntu.) -- --Guido van Rossum (python.org/~guido) From guido at python.org Sun Oct 21 02:02:53 2012 From: guido at python.org (Guido van Rossum) Date: Sat, 20 Oct 2012 17:02:53 -0700 Subject: [Python-ideas] The async API of the future: yield-from In-Reply-To: <508331E1.1010302@canterbury.ac.nz> References: <5078F6B1.2030309@canterbury.ac.nz> <507BA60B.2030806@canterbury.ac.nz> <507D109F.3020302@canterbury.ac.nz> <507F27FC.4040706@canterbury.ac.nz> <507FB480.1090009@canterbury.ac.nz> <508331E1.1010302@canterbury.ac.nz> Message-ID: On Sat, Oct 20, 2012 at 4:21 PM, Greg Ewing wrote: > Simply having the scheduler know where all the tasks are > will not prevent deadlocks. It might make it possible for the > scheduler to *detect* deadlocks, but you still have to do > something about them. > > Having said that, I'm thinking about writing a more elaborate > version of my scheduler that does keep track of which queue a > task is waiting on, mainly so that tasks can be cancelled > cleanly. In NDB, I have a feature that detects most deadlocks -- the Future class keeps track of all incomplete instances, and it can dump this list at request. Futures also keep some information about where and for what purpose they were created. Finally, to tie it all together, there's code that detects that you're waiting for something to happen but the event loop is out of things to do (i.e. no pending RPCs, no "call later" callbacks left -- hence, no progress can possibly be made). This feature has caught mostly bugs in NDB itself -- because NDB is primarily a database API, regular NDB users don't normally write code that is likely to deadlock. But in the wider Python 3 world, where regular users would be writing (presumably buggy) protocol implementations and their own servers and clients, I suspect debugging features can make and break a system like this. 
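In rough pseudo-Python (the names are invented here; this is nothing like NDB's real internals), that last check amounts to: nothing is runnable, no I/O is in flight, yet some futures are still incomplete, so nothing can ever finish them:

    import collections

    class TinyLoop:
        def __init__(self):
            self.ready = collections.deque()    # callbacks ready to run now
            self.pending_io = 0                 # outstanding RPCs / "call later" callbacks
            self.incomplete_futures = set()     # futures nobody has resolved yet

        def run(self):
            while self.ready:
                callback = self.ready.popleft()
                callback()
            if not self.pending_io and self.incomplete_futures:
                # No runnable work and no I/O in flight, yet tasks are still waiting.
                raise RuntimeError('probable deadlock; still waiting on %r'
                                   % self.incomplete_futures)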
-- --Guido van Rossum (python.org/~guido) From greg.ewing at canterbury.ac.nz Sun Oct 21 02:09:04 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sun, 21 Oct 2012 13:09:04 +1300 Subject: [Python-ideas] yield from multiple iterables (was Re: The async API of the future: yield-from) In-Reply-To: <50829EE8.7000209@stackless.com> References: <507B0F0B.8080700@pearwood.info> <507B26C6.10602@stackless.com> <507BA075.4030508@canterbury.ac.nz> <507BC855.4070802@stackless.com> <507BE79D.1090100@stackless.com> <507C1661.5070206@stackless.com> <507CADE6.7050604@canterbury.ac.nz> <50814DDD.9070206@stackless.com> <50817D62.4040607@stackless.com> <5081F554.5090404@canterbury.ac.nz> <50829073.90909@stackless.com> <50829EE8.7000209@stackless.com> Message-ID: <50833D20.4030607@canterbury.ac.nz> Christian Tismer wrote: > A rough idea would be to start the whole interpreter in suspendable > mode. Maybe that's too much. I'm seeking a way to tell a whole bunch > of functions that they should be suspendable. I'm not sure it's really feasible to do that. It seems easy enough at first sight, but keep in mind that it would only work for pure Python code called directly from other pure Python code. There are many corners where it wouldn't work -- all the operator methods, for example, and anything else called through a type slot -- unless you went to a *lot* of work to provide alternative suspendable versions of all the type slots. -- Greg From yaroslav at fedevych.name Sun Oct 21 02:09:43 2012 From: yaroslav at fedevych.name (Yaroslav Fedevych) Date: Sun, 21 Oct 2012 03:09:43 +0300 Subject: [Python-ideas] asyncore and stuff Message-ID: So Guido told it's better to discuss things here. Mostly reiterating what I said in the G+ thread. I'm by no means a greybeard in library/language design, and have no successful async project behind me, so please take what I'm saying with a grain of salt. I just want to highlight a point I feel is very important. There should be standard library, but no standard framework. Please. You see, there is a number of event-driven frameworks, and they all suck. Sorry for being blunt, but each one of them is more or less voluntarily described as being almost the ultimate silver bullet, or even a silver grenade, the One Framework to rule them all. The truth is that every framework that prospers today was started as a scratch to a specific itch, and it might be a perfect scratch for that class of itches. I know of no application framework designed as being the ultimate scratch for every itch that is not dead and forgotten, or described on a resource other than thedailywtf. There is a reason for this state of things, mainly that the real world is a rather complex pile of crap, and there is no nice uniform model into which you can fit all of that crap and expect the model still to be good for any practical use. Thus in the world of software, which is notoriously complex subset of the crap the real world is, we are going to live with dozens of event models, asynchronous I/O models, gobs of event loops. Every one of them (even WaitForMultipleObjects() kind of loop) is a priceless tool for a specific class of problem it's designed to solve, so it won't go away, ever. The standard library, on the other hand, IS the ultimate tool. It is the way things should work. The people look at it as the reference, the code written the way it should be, encompassing the best of the best practices out there. Everyone keeps telling, just look at how this thing is implemented. 
Look, it's in the stdlib, don't reinvent the wheel. It illustrates the Right Way to use the language and the runtime, the ultimate argument to end doubts. In my opinion, the reason a standard library can be regarded this high is exactly because it provides high-quality examples (or at least it should do that), materials, bits and tools, but does not limit you in the way those tools can be used, and does not impose its rules on you if you want to actually roll something of your own. No framework in the world should have this power, as it would defeat the very reason frameworks do exist. And that's why I think, while asyncore and other expired batteries need to be cleaned up and upgraded (so they are of any use), I expect that no existing frameworks would enter the stdlib as de jure standard. I would expect instead that there would be useful primitives each of these frameworks implements anyway, and make the standard networking modules aware of those. But please, no bringing $MYFAVORITEFRAMEWORK into stdlib. You will either end up with something horrendous to support every existing mainloop implemetation out there, unwieldy and buggy, or you will make a clean, elegant framework that won't solve anyone's problem, will be incompatible with the rest of the world and fall into disuse. Or you can bring some gevent, or Tornado, you name it, into stdlib, and make the users of the remaining dozens of frameworks feel like damned outcasts. I feel the same about web things. Picking the tools to parse HTTP requests and forming the responses is okay, as HTTP is not a simple thing; bringing into the standard library the templating engine, routing engine, or, God forbid, an ORM, would be totally insane. From solipsis at pitrou.net Sun Oct 21 02:07:40 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sun, 21 Oct 2012 02:07:40 +0200 Subject: [Python-ideas] The async API of the future References: <5081F15B.1040403@canterbury.ac.nz> <508336B5.6080806@canterbury.ac.nz> Message-ID: <20121021020740.75ba98f9@pitrou.net> On Sun, 21 Oct 2012 12:41:41 +1300 Greg Ewing wrote: > Richard Oudkerk wrote: > > I don't see why a completion api needs to create wrappers for sockets. See > > > > http://pastebin.com/7tDmeYXz > > > > ... > > > > The AsyncIO class is independent of reactors, futures etc. The methods > > for starting an operation are > > > > recv(key, sock, nbytes, flags=0) > > send(key, sock, buf, flags=0) > > accept(key, sock) > > connect(key, sock, address) > > That looks awfully like a wrapper for a socket to me. All of those > system calls are peculiar to sockets. > > There doesn't necessarily have to be a wrapper class for each kind > of file descriptor. There could be one I/O class that handles everything, > or there could just be a collection of functions. > > The point is that, with a completion-based model, you need a function > or method for every possible system call that you might want to perform > asynchronously. There aren't that many of them, though: the four Richard listed should already be enough for most network applications, AFAIK. I really think Richard's proposal is a sane building block. Regards Antoine. 
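For illustration only -- this is not Richard's code (the pastebin isn't reproduced here), just a toy that fakes completion with a thread pool -- a surface built around exactly those four start-operation calls, plus one call to collect finished (key, result) pairs, could be as small as:

    from concurrent.futures import ThreadPoolExecutor

    class ToyAsyncIO:
        def __init__(self):
            self._pool = ThreadPoolExecutor(max_workers=8)
            self._pending = {}                  # key -> Future for a started operation

        def recv(self, key, sock, nbytes, flags=0):
            self._pending[key] = self._pool.submit(sock.recv, nbytes, flags)

        def send(self, key, sock, buf, flags=0):
            self._pending[key] = self._pool.submit(sock.send, buf, flags)

        def accept(self, key, sock):
            self._pending[key] = self._pool.submit(sock.accept)

        def connect(self, key, sock, address):
            self._pending[key] = self._pool.submit(sock.connect, address)

        def completed(self):
            # Collect (key, result) pairs for operations that have finished.
            done = [(key, fut) for key, fut in self._pending.items() if fut.done()]
            for key, fut in done:
                del self._pending[key]
                yield key, fut.result()

A real implementation would of course use the platform's native completion mechanism (for example IOCP on Windows) rather than worker threads; the point is only how narrow the public surface needs to be.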
From solipsis at pitrou.net Sun Oct 21 02:09:14 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sun, 21 Oct 2012 02:09:14 +0200 Subject: [Python-ideas] The async API of the future References: <5081F15B.1040403@canterbury.ac.nz> <65B7E04F-965D-4D07-A60B-121997703BC0@twistedmatrix.com> <5083330C.6010609@canterbury.ac.nz> Message-ID: <20121021020914.76fd4b71@pitrou.net> On Sun, 21 Oct 2012 12:26:04 +1300 Greg Ewing wrote: > Glyph wrote: > > > The main interfaces you need are here: > > > > > > > > > > > > These don't look anywhere near adequate to me. How do I make > a sendmsg() call on a unix-domain socket and pass access rights > over it? Looks like your question is answered here: http://twistedmatrix.com/documents/current/api/twisted.internet.interfaces.IUNIXTransport.html Regards Antoine. From greg.ewing at canterbury.ac.nz Sun Oct 21 02:18:55 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sun, 21 Oct 2012 13:18:55 +1300 Subject: [Python-ideas] Cofunctions - Back to Basics In-Reply-To: <5082CA6D.3080800@stackless.com> References: <4EA8BD66.6010807@canterbury.ac.nz> <4EA93D02.2030201@hotpy.org> <4EA94055.3080207@canterbury.ac.nz> <4EA94304.1010909@hotpy.org> <4EAB2994.6010103@canterbury.ac.nz> <4EABAD40.1010002@canterbury.ac.nz> <5082CA6D.3080800@stackless.com> Message-ID: <50833F6F.5040100@canterbury.ac.nz> Christian Tismer wrote: > Picking that up, too... > > On 29.10.11 09:37, Greg Ewing wrote: > >> If the flag were moved into the stack frame instead, it would >> be possible to run any function in either "normal" or "coroutine" >> mode, depending on whether it was invoked via __call__ or >> __cocall__. > > What about this idea? > I think I just wrote exactly the same thing in another thread ;-) Yes, it's the same idea. As you can see, I did consider it at one point, but I had second thoughts when I realised that it wouldn't work through type slots, meaning that there would be some areas of what appear to be pure Python code, but are not suspendable for non-obvious reasons. Maybe this is not a fatal problem -- we just tell people that __xxx__ methods are not suspendable. It's something to consider. -- Greg From andrew.robert.moffat at gmail.com Sun Oct 21 02:33:48 2012 From: andrew.robert.moffat at gmail.com (Andrew Moffat) Date: Sat, 20 Oct 2012 19:33:48 -0500 Subject: [Python-ideas] Interest in seeing sh.py in the stdlib Message-ID: Hi, I'm the author of sh.py, an intuitive interface for launching subprocesses in Linux and OSX http://amoffat.github.com/sh/. It has been maintained on github https://github.com/amoffat/sh for about 10 months and currently has about 25k installs, according to pythonpackages.com ( http://pythonpackages.com/package/sh, http://pythonpackages.com/package/pbs) Andy Grover maintains the Fedora rpm for sh.py http://arm.koji.fedoraproject.org/koji/buildinfo?buildID=94247 and Nick Moffit has submitted an older version of sh.py (which was called pbs) to be included in Debian distros http://pkgs.org/debian-wheezy/debian-main-i386/python-pbs_0.95-1_all.deb.html I'm interested in making sh.py more accessible to help bring Python forward in the area of shell scripting, so I'm interested in seeing if sh would be suitable for the standard library. Is there any other interest in something like this? Thanks -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From mikegraham at gmail.com Sun Oct 21 03:02:58 2012 From: mikegraham at gmail.com (Mike Graham) Date: Sat, 20 Oct 2012 21:02:58 -0400 Subject: [Python-ideas] Interest in seeing sh.py in the stdlib In-Reply-To: References: Message-ID: On Sat, Oct 20, 2012 at 8:33 PM, Andrew Moffat wrote: > Hi, > > I'm the author of sh.py, an intuitive interface for launching subprocesses > in Linux and OSX http://amoffat.github.com/sh/. It has been maintained on > github https://github.com/amoffat/sh for about 10 months and currently has > about 25k installs, according to pythonpackages.com > (http://pythonpackages.com/package/sh, > http://pythonpackages.com/package/pbs) > > Andy Grover maintains the Fedora rpm for sh.py > http://arm.koji.fedoraproject.org/koji/buildinfo?buildID=94247 and Nick > Moffit has submitted an older version of sh.py (which was called pbs) to be > included in Debian distros > http://pkgs.org/debian-wheezy/debian-main-i386/python-pbs_0.95-1_all.deb.html > > I'm interested in making sh.py more accessible to help bring Python forward > in the area of shell scripting, so I'm interested in seeing if sh would be > suitable for the standard library. Is there any other interest in something > like this? > > Thanks sh.py strikes me as on the clever side for the stdlib and the lack of Windows support would be very unfortunate for a stdlib module (I don't know if this is relatively easily fixed, though it seems possible) Mike From glyph at twistedmatrix.com Sun Oct 21 03:52:48 2012 From: glyph at twistedmatrix.com (Glyph) Date: Sat, 20 Oct 2012 18:52:48 -0700 Subject: [Python-ideas] The async API of the future In-Reply-To: References: <5081F15B.1040403@canterbury.ac.nz> <508336B5.6080806@canterbury.ac.nz> Message-ID: On Oct 20, 2012, at 4:53 PM, Guido van Rossum wrote: > On Sat, Oct 20, 2012 at 4:41 PM, Greg Ewing wrote: >> The point is that, with a completion-based model, you need a function >> or method for every possible system call that you might want to perform >> asynchronously. > > TBH, I like APIs that wrap all system calls. System calls have too > many low-level details that you have to be aware of, and they too > often vary per platform. (I just wrote a simple event loop plus > scheduler along the lines of your essay, extending it to the point > where I could do basic, fully-async, HTTP exchanges. The number of > details I had to take care of was excruciating; and then there were > the subtle differences between OSX and Ubuntu.) The layer that wraps the system calls does not necessarily be visible to applications. You absolutely need the syscalls to be exposed directly at some lower, non-standardized level, because it takes on average 15 years to shake out all the differences between platform behavior that you observed here :-). If applications try to do this, they will always get it wrong, and besides, they want to be making different syscalls for different transports. Much of Twisted's development has been about discovering exciting new behaviors on new platforms or new versions of supported platforms in the face of new levels of load, concurrency, or some other attribute. A minor nitpick: system calls aren't usually be performed asynchronously; you execute the syscall non-blockingly, and then you complete the action asynchronously. The whole idea of asynchronous I/O via non-blocking APIs implies some level of syscall wrapping.) -glyph -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ncoghlan at gmail.com Sun Oct 21 04:18:31 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 21 Oct 2012 12:18:31 +1000 Subject: [Python-ideas] yield from multiple iterables (was Re: The async API of the future: yield-from) In-Reply-To: References: <507B0F0B.8080700@pearwood.info> <507B26C6.10602@stackless.com> <507BA075.4030508@canterbury.ac.nz> <507BC855.4070802@stackless.com> <507BE79D.1090100@stackless.com> <507C1661.5070206@stackless.com> <507CADE6.7050604@canterbury.ac.nz> <50814DDD.9070206@stackless.com> <50817D62.4040607@stackless.com> <5081F554.5090404@canterbury.ac.nz> <508238B2.4040808@canterbury.ac.nz> Message-ID: On Sun, Oct 21, 2012 at 2:30 AM, Guido van Rossum wrote: > On Fri, Oct 19, 2012 at 10:37 PM, Greg Ewing > wrote: >> Nick Coghlan wrote: >> >>> Please don't lose sight of the fact that yield-based suspension points >>> looking like something other than an ordinary function call is a >>> *feature*, not a bug. > > (Ironically, Christian just revived an old thread where Nick was of a > different opinion.) I like greenlets too, just for the ease of converting the scaling constraints of existing concurrent code from number-of-threads-per-process to number-of-open-sockets-per-process. I've come to the conclusion that they're no substitute for explicitly asynchronous code, though, and the assembler magic needed to make them work with arbitrary C code (either in the language core or in C extensions) makes them a poor fit for the standard library. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From greg.ewing at canterbury.ac.nz Sun Oct 21 04:46:05 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sun, 21 Oct 2012 15:46:05 +1300 Subject: [Python-ideas] The async API of the future: yield-from In-Reply-To: References: Message-ID: <508361ED.9040806@canterbury.ac.nz> Guido van Rossum wrote: > In the PEP 380 world, there will be a new type of mistake possible: > writing yield instead of yield from. Fortunately the scheduler can > easily test for this -- if the result of its calling next() is not > None, the user yielded something. That will catch some mistakes of that kind, but not all -- it won't catch 'yield foo()' where foo() returns None. One way to fix that would be to require yielding some unique sentinel value. If the yields are all hidden inside primitives called with 'yield from', that could be kept an implementation detail. -- Greg From tismer at stackless.com Sun Oct 21 06:40:51 2012 From: tismer at stackless.com (Christian Tismer) Date: Sun, 21 Oct 2012 06:40:51 +0200 Subject: [Python-ideas] yield from multiple iterables (was Re: The async API of the future: yield-from) In-Reply-To: <50833948.40406@canterbury.ac.nz> References: <507B0F0B.8080700@pearwood.info> <507B26C6.10602@stackless.com> <507BA075.4030508@canterbury.ac.nz> <507BC855.4070802@stackless.com> <507BE79D.1090100@stackless.com> <507C1661.5070206@stackless.com> <507CADE6.7050604@canterbury.ac.nz> <50814DDD.9070206@stackless.com> <50817D62.4040607@stackless.com> <5081F554.5090404@canterbury.ac.nz> <50829073.90909@stackless.com> <50833948.40406@canterbury.ac.nz> Message-ID: <50837CD3.6000301@stackless.com> On 21.10.12 01:52, Greg Ewing wrote: > Christian Tismer wrote: > ... >> The only difference is that I'm not aiming at coroutines in >> the first place, but just having the concept of a *suspendable* >> function. > > I'm not sure what the distinction is. This comes maybe from my use of 'coroutine', 'tasklet', 'generator' etc. 
which differs from the meaning where others are thinking of. I'm mostly talking in the PyPy and Stackless community, which creates confusion. In that world, 'generator' for instance means a whole bunch of functions that can play together and yield to the caller of _the_ generator. The same holds for coroutines in that world. In python-world, things seem to be more often made of single functions. Switching context to that: 'coroutine' implies to think about coroutines, the intended action. 'suspendable' instead is neutral without any a-priori intent to switch or something. It just tells the ability that it can be suspended. That sounds more like a property. The 'suspendable' is meant as a building block for higher-level things, which include for instance coroutines (in any flavor). Technically the same, when we're talking about one single function that implements it. -- Christian Tismer :^) Software Consulting : Have a break! Take a ride on Python's Karl-Liebknecht-Str. 121 : *Starship* http://starship.python.net/ 14482 Potsdam : PGP key -> http://pgp.uni-mainz.de phone +49 173 24 18 776 fax +49 (30) 700143-0023 PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04 whom do you want to sponsor today? http://www.stackless.com/ From tismer at stackless.com Sun Oct 21 06:58:27 2012 From: tismer at stackless.com (Christian Tismer) Date: Sun, 21 Oct 2012 06:58:27 +0200 Subject: [Python-ideas] Language "macros" in discussions In-Reply-To: References: <41E28745-3928-4323-9334-B9E9AE4BB444@stackless.com> Message-ID: <508380F3.8050503@stackless.com> On 21.10.12 00:37, Guido van Rossum wrote: > On Sat, Oct 20, 2012 at 1:55 PM, Christian Tismer wrote: > ... >> Sent from my Ei4Steve > Does it happen to have a 40-char wide screen? :-) > Ah, sometimes I write from my iPhone, and I don't know how to do proper line breaks. Sometimes I break them line-by-line, sometimes I don't and rely on the email reader's wrapping. None is perfect. The iPhone 4S means 'iphone for Steve' for me, in memoriam ;-) cheers - chris -- Christian Tismer :^) Software Consulting : Have a break! Take a ride on Python's Karl-Liebknecht-Str. 121 : *Starship* http://starship.python.net/ 14482 Potsdam : PGP key -> http://pgp.uni-mainz.de phone +49 173 24 18 776 fax +49 (30) 700143-0023 PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04 whom do you want to sponsor today? http://www.stackless.com/ From greg.ewing at canterbury.ac.nz Sun Oct 21 07:19:46 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sun, 21 Oct 2012 18:19:46 +1300 Subject: [Python-ideas] yield from multiple iterables (was Re: The async API of the future: yield-from) In-Reply-To: References: <507BC855.4070802@stackless.com> <507BE79D.1090100@stackless.com> <507C1661.5070206@stackless.com> <507CADE6.7050604@canterbury.ac.nz> <50814DDD.9070206@stackless.com> <50817D62.4040607@stackless.com> <5081F554.5090404@canterbury.ac.nz> Message-ID: <508385F2.5000707@canterbury.ac.nz> Guido van Rossum wrote: > In the yield-from world we could spell it that > way too (i.e. yield, not yield from), or we could make it so that we > can write yield from , or perhaps we need a helper call: yield > from wait() or maybe a method on the Future class (since it is > our own), yield from .wait(). This will depend to some extent on whether Futures are considered part of the tasks layer or part of the callbacks layer. If they're considered part of the callbacks layer, they shouldn't have any methods that must be called with yield-from. 
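For example, the waiting could live in a plain helper generator rather than in a method on the Future itself -- a sketch, assuming a scheduler that parks a task when it yields a future and resumes it (sending the result back in) from the future's done-callback:

    def wait(future):
        # Hand the future to the scheduler; it parks this task and arranges a
        # done-callback that resumes us with the future's result sent back in.
        result = yield future
        return result

    # Task code then reads:   data = yield from wait(some_future)
    # and Future stays a pure callbacks-layer object with no yield-from methods.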
> Final remark: I keep wondering if it's better to try and stay "pure" > in the public API and use only yield from, plus some helpers like > spawn(), join() and par(), or if a decent, pragmatic public API can > offer a combination. I worry that most users will have a hard time > remembering when to use yield and when yield from. As I've said, I think it would be better to have only 'yield from' calls in the public API, because it gives the implementation the greatest freedom. -- Greg From solipsis at pitrou.net Sun Oct 21 12:18:20 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sun, 21 Oct 2012 12:18:20 +0200 Subject: [Python-ideas] Interest in seeing sh.py in the stdlib References: Message-ID: <20121021121820.13b7b7d4@pitrou.net> On Sat, 20 Oct 2012 21:02:58 -0400 Mike Graham wrote: > On Sat, Oct 20, 2012 at 8:33 PM, Andrew Moffat > wrote: > > Hi, > > > > I'm the author of sh.py, an intuitive interface for launching subprocesses > > in Linux and OSX http://amoffat.github.com/sh/. It has been maintained on > > github https://github.com/amoffat/sh for about 10 months and currently has > > about 25k installs, according to pythonpackages.com > > (http://pythonpackages.com/package/sh, > > http://pythonpackages.com/package/pbs) > > > > Andy Grover maintains the Fedora rpm for sh.py > > http://arm.koji.fedoraproject.org/koji/buildinfo?buildID=94247 and Nick > > Moffit has submitted an older version of sh.py (which was called pbs) to be > > included in Debian distros > > http://pkgs.org/debian-wheezy/debian-main-i386/python-pbs_0.95-1_all.deb.html > > > > I'm interested in making sh.py more accessible to help bring Python forward > > in the area of shell scripting, so I'm interested in seeing if sh would be > > suitable for the standard library. Is there any other interest in something > > like this? > > > > Thanks > > sh.py strikes me as on the clever side for the stdlib and the lack of > Windows support would be very unfortunate for a stdlib module (I don't > know if this is relatively easily fixed, though it seems possible) Ditto for me. The basic concept of the sh module looks like some fancy wrapper around subprocess.check_output: http://docs.python.org/dev/library/subprocess.html#subprocess.check_output The "easy chaining of subprocesses" part does not look that useful to me, or at least the examples aren't very convincing. If I want to sort the results of a shell command, it makes much more sense to me to do so using Python's text processing and sorting capabilities, than trying to find the right invocation of Unix "sort" and other utilities. That said, I do find the "fancy wrapper" part somewhat pretty. Regards Antoine. From rosuav at gmail.com Sun Oct 21 14:11:44 2012 From: rosuav at gmail.com (Chris Angelico) Date: Sun, 21 Oct 2012 23:11:44 +1100 Subject: [Python-ideas] Interest in seeing sh.py in the stdlib In-Reply-To: References: Message-ID: On Sun, Oct 21, 2012 at 11:33 AM, Andrew Moffat wrote: > I'm the author of sh.py, an intuitive interface for launching subprocesses > in Linux and OSX http://amoffat.github.com/sh/. It has been maintained on > github https://github.com/amoffat/sh for about 10 months and currently has > about 25k installs, according to pythonpackages.com > (http://pythonpackages.com/package/sh, > http://pythonpackages.com/package/pbs) Is this on PyPI? I tried a search, but 'sh' comes up with rather a lot of hits. 
ChrisA From christian at python.org Sun Oct 21 15:35:46 2012 From: christian at python.org (Christian Heimes) Date: Sun, 21 Oct 2012 15:35:46 +0200 Subject: [Python-ideas] Interest in seeing sh.py in the stdlib In-Reply-To: References: Message-ID: <5083FA32.7050900@python.org> Am 21.10.2012 02:33, schrieb Andrew Moffat: > I'm interested in making sh.py more accessible to help bring Python > forward in the area of shell scripting, so I'm interested in seeing if > sh would be suitable for the standard library. Is there any other > interest in something like this? I like to ignore the technical issues for now and concentrate on the legal and organizational problems. In order to get sh.py into Python's stdlib you have to relicense and donate the code under the PSF license. You and every contributor must agree on the relicensing. At least you must submit a signed contributor agreement, maybe every contributor. Are you able to get hold of everybody? Are you willing to maintain your code for several years, at least five years or more? Regards, Christian From benjamin at python.org Sun Oct 21 16:03:11 2012 From: benjamin at python.org (Benjamin Peterson) Date: Sun, 21 Oct 2012 14:03:11 +0000 (UTC) Subject: [Python-ideas] Interest in seeing sh.py in the stdlib References: <20121021121820.13b7b7d4@pitrou.net> Message-ID: Antoine Pitrou writes: > That said, I do find the "fancy wrapper" part somewhat pretty. One thing that's not very pretty about it is the need to use "-" prefixed parameters to get special behavior. It might be nicer if the command was an object and you called special methods on it to get special behavior. Ex: wget.background("example.com") sudo.context("extra", "args") Benjamin From shibturn at gmail.com Sun Oct 21 16:56:35 2012 From: shibturn at gmail.com (Richard Oudkerk) Date: Sun, 21 Oct 2012 15:56:35 +0100 Subject: [Python-ideas] The async API of the future In-Reply-To: <65B7E04F-965D-4D07-A60B-121997703BC0@twistedmatrix.com> References: <5081F15B.1040403@canterbury.ac.nz> <65B7E04F-965D-4D07-A60B-121997703BC0@twistedmatrix.com> Message-ID: On 20/10/2012 9:33am, Glyph wrote: > ... Also, you can't translate it into one > of those sources, because the message pump is associated with a > particular thread; you can't call a function in a different thread to > call PostQueuedCompletionStatus. I thought that the whole point of completion ports was inter-thread communication, and that PostQueuedCompletionStatus() is the equivalent of Queue.put(). Why does the message pump's thread matter? -- Richard From Steve.Dower at microsoft.com Sun Oct 21 18:47:04 2012 From: Steve.Dower at microsoft.com (Steve Dower) Date: Sun, 21 Oct 2012 16:47:04 +0000 Subject: [Python-ideas] yield from multiple iterables (was Re: The async API of the future: yield-from) In-Reply-To: <508385F2.5000707@canterbury.ac.nz> References: <507BC855.4070802@stackless.com> <507BE79D.1090100@stackless.com> <507C1661.5070206@stackless.com> <507CADE6.7050604@canterbury.ac.nz> <50814DDD.9070206@stackless.com> <50817D62.4040607@stackless.com> <5081F554.5090404@canterbury.ac.nz> , <508385F2.5000707@canterbury.ac.nz> Message-ID: Greg Ewing wrote: > This will depend to some extent on whether Futures are considered > part of the tasks layer or part of the callbacks layer. If they're > considered part of the callbacks layer, they shouldn't have any > methods that must be called with yield-from. 
I put Futures very firmly in the callbacks layer (I guess the easiest reasoning for this is the complete lack of threading/async code in their implementation). Every time someone suggests "yielding a sentinel value" it seems that a Future is ideal for this - it even provides the other thread/tasklet/coroutine with a way to reactivate the original one, whether the two functions were written with knowledge of each other or not. > As I've said, I think it would be better to have only 'yield from' > calls in the public API, because it gives the implementation the > greatest freedom. I agree with this, though I still feel that we should be aiming for only 'yield' in the public API and leaving 'yield from' as a generalisation of this. For example, the two following pieces of code are basically equivalent: @async def task1(): yield do_something_async_returning_a_future() @async def task2(): yield task1() yield task1() @async def task3(): yield task2() task3().result() And doing the same thing with yield from: def task1(): yield do_something_async_returning_a_future() def task2(): yield from task1() yield from task1() @async def task3(): yield from task2() task3().result() This is also equivalent to this code: @async def task3(): yield do_something_async_returning_a_future() yield do_something_async_returning_a_future() task3().result() And this: def task(): f = Future() do_something_async_returning_a_future().add_done_callback( lambda _: do_something_async_returning_a_future().add_done_callback( lambda _: f.set_result(None) ) ) return f My point is that once we are using yield, yield from automatically becomes an option for composing operations. Teaching and validating this style is also easier, because the rule can be 'always use @async/yield in public APIs and just yield from in private APIs', and the biggest problem with not using yield from is that more Future objects are created. (The upsides were in my essay, but include compatibility with other Future-based APIs and composability between code from different sources.) Cheers, Steve From guido at python.org Sun Oct 21 19:08:42 2012 From: guido at python.org (Guido van Rossum) Date: Sun, 21 Oct 2012 10:08:42 -0700 Subject: [Python-ideas] yield from multiple iterables (was Re: The async API of the future: yield-from) In-Reply-To: References: <507BC855.4070802@stackless.com> <507BE79D.1090100@stackless.com> <507C1661.5070206@stackless.com> <507CADE6.7050604@canterbury.ac.nz> <50814DDD.9070206@stackless.com> <50817D62.4040607@stackless.com> <5081F554.5090404@canterbury.ac.nz> <508385F2.5000707@canterbury.ac.nz> Message-ID: On Oct 21, 2012 9:48 AM, "Steve Dower" wrote: > > Greg Ewing wrote: > > This will depend to some extent on whether Futures are considered > > part of the tasks layer or part of the callbacks layer. If they're > > considered part of the callbacks layer, they shouldn't have any > > methods that must be called with yield-from. > > I put Futures very firmly in the callbacks layer (I guess the easiest reasoning for this is the complete lack of threading/async code in their implementation). Did you check the source? That's simply incorrect. It uses locks, of the threading variety. ( However one could write an implementation with the same interface that doesn't.) > Every time someone suggests "yielding a sentinel value" it seems that a Future is ideal for this - it even provides the other thread/tasklet/coroutine with a way to reactivate the original one, whether the two functions were written with knowledge of each other or not. 
This I like. > > As I've said, I think it would be better to have only 'yield from' > > calls in the public API, because it gives the implementation the > > greatest freedom. > > I agree with this, though I still feel that we should be aiming for only 'yield' in the public API and leaving 'yield from' as a generalisation of this. For example, the two following pieces of code are basically equivalent: > > @async > def task1(): > yield do_something_async_returning_a_future() > > @async > def task2(): > yield task1() > yield task1() > > @async > def task3(): > yield task2() > > task3().result() > > And doing the same thing with yield from: > > def task1(): > yield do_something_async_returning_a_future() > > def task2(): > yield from task1() > yield from task1() > > @async > def task3(): > yield from task2() > > task3().result() > > This is also equivalent to this code: > > @async > def task3(): > yield do_something_async_returning_a_future() > yield do_something_async_returning_a_future() > > task3().result() > > And this: > > def task(): > f = Future() > do_something_async_returning_a_future().add_done_callback( > lambda _: do_something_async_returning_a_future().add_done_callback( > lambda _: f.set_result(None) > ) > ) > return f > > My point is that once we are using yield, yield from automatically becomes an option for composing operations. Teaching and validating this style is also easier, because the rule can be 'always use @async/yield in public APIs and just yield from in private APIs', and the biggest problem with not using yield from is that more Future objects are created. (The upsides were in my essay, but include compatibility with other Future-based APIs and composability between code from different sources.) Hm. I think it'll be confusing. And the Futures-only-in-public-APIs rule seems to encourage less efficient solutions. --Guido van Rossum (sent from Android phone) -------------- next part -------------- An HTML attachment was scrubbed... URL: From vinay_sajip at yahoo.co.uk Sun Oct 21 20:41:50 2012 From: vinay_sajip at yahoo.co.uk (Vinay Sajip) Date: Sun, 21 Oct 2012 18:41:50 +0000 (UTC) Subject: [Python-ideas] Interest in seeing sh.py in the stdlib References: Message-ID: Andrew Moffat writes: > I'm interested in making sh.py more accessible to help bring Python forward in > the area of shell scripting, so I'm interested in seeing if sh would be > suitable for the standard library. ?Is there any other interest in something > like this? I would agree with others who have replied saying that the approach is cute, but a little too magical. Disclosure: this is an area of interest for me, and I maintain a project called sarge [1] which sort of fits in the same space as pbs/sh. It doesn't have the cute shell-command-as-Python-function idiom (which, in my view, buys very little readability), but it does aim to offer some features which (AFAICT) sh doesn't have. I'll just list sarge's features briefly below, if for no other reason than to show that there are other contenders worth considering (should there be a consensus that the stdlib needs batteries in this area). Sarge improves upon subprocess when: * You want to use command pipelines, but using subprocess out of the box often leads to deadlocks because pipe buffers get filled up. * You want to use bash-style pipe syntax on Windows, but some Windows shells don?t support some of the syntax you want to use, like &&, ||, |& and so on. 
* You want to process output from commands in a flexible way, and communicate() is not flexible enough for your needs ? for example, you need to process output a line at a time. * You want to avoid shell injection problems by having the ability to quote your command arguments safely. * subprocess allows you to let stderr be the same as stdout, but not the other way around ? and you need to do that. It offers: * A simple run command which allows a rich subset of Bash-style shell command syntax, but parsed and run by sarge so that you can run identically on Windows without cygwin. This includes asynchronous calls (using "&" just as in bash). * The ability to format shell commands with placeholders, such that variables are quoted to prevent shell injection attacks. * The ability to capture output streams without requiring you to program your own threads. You just use a Capture object and then you can read from it as and when you want. A Capture object can capture the output from multiple chained commands. * Delays in commands (e.g. "sleep") are honoured in asynchronous calls. I would also concur with others who've pointed out that stdlib maintenance is a long haul affair. I've been maintaining the logging package for around 10 years now :-) Regards, Vinay Sajip [1] http://sarge.readthedocs.org/ From Steve.Dower at microsoft.com Sun Oct 21 22:07:49 2012 From: Steve.Dower at microsoft.com (Steve Dower) Date: Sun, 21 Oct 2012 20:07:49 +0000 Subject: [Python-ideas] yield from multiple iterables (was Re: The async API of the future: yield-from) In-Reply-To: References: <507BC855.4070802@stackless.com> <507BE79D.1090100@stackless.com> <507C1661.5070206@stackless.com> <507CADE6.7050604@canterbury.ac.nz> <50814DDD.9070206@stackless.com> <50817D62.4040607@stackless.com> <5081F554.5090404@canterbury.ac.nz> <508385F2.5000707@canterbury.ac.nz> , Message-ID: > Did you check the source? That's simply incorrect. It uses locks, of the threading variety. Yes, I've spent a lot of time in the source for Future while working on this. It has synchronisation which is _aware_ of threads, but it never creates, requires or uses them. It simply ensures thread-safe reentrancy, which will be required for any general solution unless it is completely banned from interacting across CPU threads. > ( However one could write an implementation with the same interface that doesn't.) And this is as simple as replacing threading.Condition() with no-op acquire() and release() functions. Regardless, the big advantage of requiring 'Future' as an interface* is that other implementations can be substituted. (Maybe making the implementation of future a property of the active event loop? I don't mind particular event loops from banning CPU threads, but the entire API should allow their existence.) (*I'm inclined to define this as 'result()', 'done()', 'add_done_callback()', 'exception()', 'set_result()' and 'set_exception()' functions. Maybe more, but I think that's sufficient. The current '_waiters' list is an optimisation for add_done_callback(), and doesn't need to be part of the interface.) > Hm. I think it'll be confusing. I think the basic case ("just make it work") will be simpler, and the advanced case ("minimise memory/CPU usage") will be more complicated. > And the Futures-only-in-public-APIs rule seems to encourage less efficient solutions. Personally, I'd prefer developers to get a correct solution without having to understand how the whole thing works (the "pit of success"). 
I'm also sceptical of any other rule being as portable and composable - I don't think a standard library should have APIs where "you must only call this function with yield-from". ('await' in C# is not compulsory - you can take the Task returned from an async method and do whatever you like with it.) Cheers, Steve From guido at python.org Mon Oct 22 02:23:52 2012 From: guido at python.org (Guido van Rossum) Date: Sun, 21 Oct 2012 17:23:52 -0700 Subject: [Python-ideas] yield from multiple iterables (was Re: The async API of the future: yield-from) In-Reply-To: References: <507BC855.4070802@stackless.com> <507BE79D.1090100@stackless.com> <507C1661.5070206@stackless.com> <507CADE6.7050604@canterbury.ac.nz> <50814DDD.9070206@stackless.com> <50817D62.4040607@stackless.com> <5081F554.5090404@canterbury.ac.nz> <508385F2.5000707@canterbury.ac.nz> Message-ID: On Sun, Oct 21, 2012 at 1:07 PM, Steve Dower wrote: >> Did you check the source? That's simply incorrect. It uses locks, of the threading variety. > > Yes, I've spent a lot of time in the source for Future while working on this. Sorry, I should have realized this, since your code example contained monkey-patching that Future class... > It has synchronisation which is _aware_ of threads, but it never creates, requires or uses them. It simply ensures thread-safe reentrancy, which will be required for any general solution unless it is completely banned from interacting across CPU threads. I don't see it that way. Any time you acquire a lock, you may be blocked for a long time. In a typical event loop that's an absolute no-no. Typically, to wait for another thread, you give the other thread a callback that adds a new event for *this* thread. Now, it's possible that in Windows, when using IOCP, the philosophy is different -- I think I've read in http://msdn.microsoft.com/en-us/library/aa365198%28VS.85%29.aspx that there can be multiple threads reading events from a single queue. But AFAIK, in Twisted and Tornado and similar systems, and probably even in gevent and Stackless, there is a strong culture around having only a single thread handling events (at least only one thread at a time), since the assumption is that as long as you don't suspend, you can trust that the world doesn't change, and that assumption becomes invalid when other threads may also be handling events from the same queue. It's possible to design a world where different threads have their own event queues, and this assumption would only be valid for events belonging to the same queue; however that seems complicated. And you still don't want to ever attempt to acquire a *threading* lock, because you end up blocking the entire event loop. >> ( However one could write an implementation with the same interface that doesn't.) > > And this is as simple as replacing threading.Condition() with no-op acquire() and release() functions. Regardless, the big advantage of requiring 'Future' as an interface* is that other implementations can be substituted. Yes, here I think we are in (possibly violent :-) agreement. > (Maybe making the implementation of future a property of the active event loop? I don't mind particular event loops from banning CPU threads, but the entire API should allow their existence.) Perhaps. Lots of possibilities in this design space. > (*I'm inclined to define this as 'result()', 'done()', 'add_done_callback()', 'exception()', 'set_result()' and 'set_exception()' functions. Maybe more, but I think that's sufficient. 
The current '_waiters' list is an optimisation for add_done_callback(), and doesn't need to be part of the interface.) Agreed. I don't see much use for the cancellation stuff and all the extra complexity that adds to the interface. BTW, I think concurrent.futures.Future doesn't stop you from calling set_result() or set_exception() more than once, which I think is a mistake -- I do enforce that in NDB's Futures. [Here you snipped some context. You proposed having public APIs that use "yield " and leaving "yield from " as something the user can use in her own program. To which I replied:] >> Hm. I think it'll be confusing. > > I think the basic case ("just make it work") will be simpler, and the advanced case ("minimise memory/CPU usage") will be more complicated. Let's agree to disagree on this. I think they are both valid design choices with different trade-offs. We should explore both directions further so as to form a better opinion. >> And the Futures-only-in-public-APIs rule seems to encourage less efficient solutions. > > Personally, I'd prefer developers to get a correct solution without having to understand how the whole thing works (the "pit of success"). I'm also sceptical of any other rule being as portable and composable - I don't think a standard library should have APIs where "you must only call this function with yield-from". ('await' in C# is not compulsory - you can take the Task returned from an async method and do whatever you like with it.) Surely "whatever you like" is constrained by whatever the Task type defines. Maybe it looks like a Future and has a blocking method to wait for the result, like .result() on concurrent.futures.Future? If you want that functionality for generators you just have to call some function, passing it the generator as an argument. Remember, Python doesn't consider that an inferior choice of API design compared to making something a method of the object itself -- witness len(), repr() and many others. FWIW, if I may sound antagonistic, I actually think that we're mostly in violent agreement, and I think we're getting closer to coming up with a sensible set of requirements and possibly even an API proposal. Keep it coming! -- --Guido van Rossum (python.org/~guido) From eric at trueblade.com Mon Oct 22 03:18:35 2012 From: eric at trueblade.com (Eric V. Smith) Date: Sun, 21 Oct 2012 21:18:35 -0400 Subject: [Python-ideas] yield from multiple iterables (was Re: The async API of the future: yield-from) In-Reply-To: References: <50814DDD.9070206@stackless.com> <50817D62.4040607@stackless.com> <5081F554.5090404@canterbury.ac.nz> <508385F2.5000707@canterbury.ac.nz> Message-ID: <50849EEB.1070201@trueblade.com> On 10/21/2012 8:23 PM, Guido van Rossum wrote: > I don't see it that way. Any time you acquire a lock, you may be > blocked for a long time. In a typical event loop that's an absolute > no-no. Typically, to wait for another thread, you give the other > thread a callback that adds a new event for *this* thread. > > Now, it's possible that in Windows, when using IOCP, the philosophy is > different -- I think I've read in > http://msdn.microsoft.com/en-us/library/aa365198%28VS.85%29.aspx that > there can be multiple threads reading events from a single queue. Correct. The typical usage of an IOCP is that you create as many threads as you have CPUs (or cores, or execution units, or whatever the kids call them these days), then they can all wait on the same IOCP. 
So if you have, say 4 CPUs so 4 threads, they can all be woken up to do useful
work if the IOCP has work items for them.

-- Eric.

From andrew.robert.moffat at gmail.com  Mon Oct 22 04:40:07 2012
From: andrew.robert.moffat at gmail.com (Andrew Moffat)
Date: Sun, 21 Oct 2012 21:40:07 -0500
Subject: [Python-ideas] Interest in seeing sh.py in the stdlib
In-Reply-To: <5083FA32.7050900@python.org>
References: <5083FA32.7050900@python.org>
Message-ID:

I would be interested in relicensing and donating.  I am able to reach out
to the contributors, and I am pretty positive I could get sign-off from them.
I would be more than willing to maintain the package as well; I'm in it for
the long haul, and it seems to have resonated well with the community
throughout its development.

On Sun, Oct 21, 2012 at 8:35 AM, Christian Heimes wrote:

> Am 21.10.2012 02:33, schrieb Andrew Moffat:
> > I'm interested in making sh.py more accessible to help bring Python
> > forward in the area of shell scripting, so I'm interested in seeing if
> > sh would be suitable for the standard library. Is there any other
> > interest in something like this?
>
> I like to ignore the technical issues for now and concentrate on the
> legal and organizational problems.
>
> In order to get sh.py into Python's stdlib you have to relicense and
> donate the code under the PSF license. You and every contributor must
> agree on the relicensing. At least you must submit a signed contributor
> agreement, maybe every contributor. Are you able to get hold of everybody?
>
> Are you willing to maintain your code for several years, at least five
> years or more?
>
> Regards,
> Christian
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From andrew.robert.moffat at gmail.com  Mon Oct 22 04:40:53 2012
From: andrew.robert.moffat at gmail.com (Andrew Moffat)
Date: Sun, 21 Oct 2012 21:40:53 -0500
Subject: [Python-ideas] Interest in seeing sh.py in the stdlib
In-Reply-To:
References:
Message-ID:

The main criticism has been the cleverness of the dynamic lookups.  There is
also the ability to use a Command object for more explicit calls:

    cmd = sh.Command("/some/command")
    cmd(arg)

So you have the best of both worlds.  If you like the idea of the programs
being attributes on the module, you can use the advertised way; if you don't,
you can use the more explicit way.

Windows support would be a little more difficult.  It existed in an old
version of sh, when it was merely a wrapper around the subprocess module.
Now that sh.py no longer relies on the subprocess module and does fork-exec
itself (in order to get more flexible access to the processes), Windows is
currently unsupported.
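In rough terms, "does fork-exec itself" means driving os.fork() and
os.execvp() directly instead of going through subprocess.Popen. The following
is only a minimal sketch of that shape on POSIX (it is not sh.py's actual
code, and error handling is omitted):

    import os

    def run_and_capture(argv):
        # Create a pipe, fork, and exec the child with its stdout wired
        # to the write end; the parent reads the pipe and reaps the child.
        r, w = os.pipe()
        pid = os.fork()
        if pid == 0:                 # child
            os.close(r)
            os.dup2(w, 1)            # stdout -> pipe
            os.execvp(argv[0], argv)
        os.close(w)                  # parent
        with os.fdopen(r, "rb") as f:
            output = f.read()
        os.waitpid(pid, 0)
        return output

Since os.fork() does not exist on Windows, anything built along these lines
is POSIX-only, which is why Windows support has to be rebuilt separately.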
It has been maintained > on > > github https://github.com/amoffat/sh for about 10 months and currently > has > > about 25k installs, according to pythonpackages.com > > (http://pythonpackages.com/package/sh, > > http://pythonpackages.com/package/pbs) > > > > Andy Grover maintains the Fedora rpm for sh.py > > http://arm.koji.fedoraproject.org/koji/buildinfo?buildID=94247 and Nick > > Moffit has submitted an older version of sh.py (which was called pbs) to > be > > included in Debian distros > > > http://pkgs.org/debian-wheezy/debian-main-i386/python-pbs_0.95-1_all.deb.html > > > > I'm interested in making sh.py more accessible to help bring Python > forward > > in the area of shell scripting, so I'm interested in seeing if sh would > be > > suitable for the standard library. Is there any other interest in > something > > like this? > > > > Thanks > > sh.py strikes me as on the clever side for the stdlib and the lack of > Windows support would be very unfortunate for a stdlib module (I don't > know if this is relatively easily fixed, though it seems possible) > > Mike > -------------- next part -------------- An HTML attachment was scrubbed... URL: From glyph at twistedmatrix.com Mon Oct 22 04:41:51 2012 From: glyph at twistedmatrix.com (Glyph) Date: Sun, 21 Oct 2012 19:41:51 -0700 Subject: [Python-ideas] The async API of the future In-Reply-To: <5083330C.6010609@canterbury.ac.nz> References: <5081F15B.1040403@canterbury.ac.nz> <65B7E04F-965D-4D07-A60B-121997703BC0@twistedmatrix.com> <5083330C.6010609@canterbury.ac.nz> Message-ID: <121478D3-DEA9-42DF-9E39-887B885A743F@twistedmatrix.com> On Oct 20, 2012, at 4:26 PM, Greg Ewing wrote: > Glyph wrote: > >> The main interfaces you need are here: >> >> >> >> >> > > These don't look anywhere near adequate to me. How do I make > a sendmsg() call on a unix-domain socket and pass access rights > over it? How do I do a readdir() on a file descriptor representing > a directory? Etc. You don't. Notice I didn't even include basic datagram transports in those interfaces, and those are relatively straightforward compared to your questions. Antoine already cited the answer to your first question - on POSIX-y operating systems, you add another interface, something like . Except, the first question doesn't even make sense as posed on Windows. Windows doesn't have any support for AF_UNIX/AF_LOCAL; if you want to send a "file descriptor" (read: object handle) to another process, you have to have either use DuplicateHandle or WSADuplicateSocket. Note that these work differently, so it depends which type of "file descriptor" you're trying to pass; if you are passing a socket you need a PID, if you're passing an anonymous pipe you need a process handle. The second question doesn't make sense anywhere. readdir() is blocking, always and forever; opendir() doesn't take flags and you can't effectively or portably set O_NONBLOCK on a directory descriptor with ioctls. All filesystem operations also block, for all practical purposes. So, really your question reverts to "how does one integrate a thread pool with an event loop", to which the answer is . Of course, all of these operations _can_ be made 'really' non-blocking with sufficient terrifyingly deep platform-specific knowledge. A DIR* is just a file descriptor, eventually, and readdir() is eventually some kind of syscall on it. POSIX AIO operations might be used to read without blocking[1]. 
However, trying to get consensus about, standardize, and implement every possible I/O operation before implementing the core I/O loop interface for sockets and timed calls is pretty extreme cart-before-horse-putting. People implemented gazillions of amazing networking applications in Python over the past two decades, despite the fact that it only even got sendmsg support recently. Heck, even Twisted's support of sendmsg and file-descriptor sending is relatively recent. I realize that this list is dedicated to the discussion of all proposals regardless of how radical and insane they might be, but that doesn't mean that *every* proposal must have its scope expanded until it is as radical and insane as possible. The fact that Twisted has so many separate interfaces is not a coincidence. We took an explicitly layered approach to building the main loop, so that anyone who needed an esoteric I/O operation could always write their own platform-specific handler. Anyone building an application that requires that layer will probably need to write something specific to their operating system and their preferred framework (Twisted, Tornado, the stdlib loop whether it's based on asyncore or not, etc). I believe that trying to cover every case in advance so they won't have to use an escape hatch and then not providing such an escape hatch is just going to make the API huge and confusing for newcomers and frustratingly limiting for people with really advanced use-cases. -glyph [1]: Actually, no, they can't: . But maybe one day this would be consistently implemented. -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Mon Oct 22 06:10:57 2012 From: guido at python.org (Guido van Rossum) Date: Sun, 21 Oct 2012 21:10:57 -0700 Subject: [Python-ideas] yield from multiple iterables (was Re: The async API of the future: yield-from) In-Reply-To: <50849EEB.1070201@trueblade.com> References: <50814DDD.9070206@stackless.com> <50817D62.4040607@stackless.com> <5081F554.5090404@canterbury.ac.nz> <508385F2.5000707@canterbury.ac.nz> <50849EEB.1070201@trueblade.com> Message-ID: On Sun, Oct 21, 2012 at 6:18 PM, Eric V. Smith wrote: > On 10/21/2012 8:23 PM, Guido van Rossum wrote: >> I don't see it that way. Any time you acquire a lock, you may be >> blocked for a long time. In a typical event loop that's an absolute >> no-no. Typically, to wait for another thread, you give the other >> thread a callback that adds a new event for *this* thread. >> >> Now, it's possible that in Windows, when using IOCP, the philosophy is >> different -- I think I've read in >> http://msdn.microsoft.com/en-us/library/aa365198%28VS.85%29.aspx that >> there can be multiple threads reading events from a single queue. > > Correct. The typical usage of an IOCP is that you create as many threads > as you have CPUs (or cores, or execution units, or whatever the kids > call them these days), then they can all wait on the same IOCP. So if > you have, say 4 CPUs so 4 threads, they can all be woken up to do useful > work if the IOCP has work items for them. So what's the typical way to do locking in such a system? Waiting for a lock seems bad; and you can't assume that no other callbacks may run while you are running. What synchronization primitives are typically used? 
-- --Guido van Rossum (python.org/~guido) From Steve.Dower at microsoft.com Mon Oct 22 07:30:24 2012 From: Steve.Dower at microsoft.com (Steve Dower) Date: Mon, 22 Oct 2012 05:30:24 +0000 Subject: [Python-ideas] yield from multiple iterables (was Re: The async API of the future: yield-from) In-Reply-To: References: <507BC855.4070802@stackless.com> <507BE79D.1090100@stackless.com> <507C1661.5070206@stackless.com> <507CADE6.7050604@canterbury.ac.nz> <50814DDD.9070206@stackless.com> <50817D62.4040607@stackless.com> <5081F554.5090404@canterbury.ac.nz> <508385F2.5000707@canterbury.ac.nz> , Message-ID: (Sorry about cutting context, I'll try not to do that again, but I also try to avoid reposting an entire email.) > > It has synchronisation which is _aware_ of threads, but it never > > creates, requires or uses them. It simply ensures thread-safe > > reentrancy, which will be required for any general solution unless > > it is completely banned from interacting across CPU threads. > > I don't see it that way. Any time you acquire a lock, you may be > blocked for a long time. In a typical event loop that's an absolute > no-no. Typically, to wait for another thread, you give the other > thread a callback that adds a new event for *this* thread. Agreed, but when you're waiting for another thread to stop reading its queue so you can add to it, how are you supposed to queue an event while you wait? The lock in Future is only an issue in result() where we wait for another thread to complete the event, but that is the entire point of that function. FWIW I don't see any scheduler ever calling result(), but there are valid situations for a user to call it (REPL, already on a worker thread, unit tests). Everywhere else the lock is required for thread safety. It could be a different lock to the one in result, but I don't think anything is gained from that. Rewriting Future in C and using CPU CAS primitives might be possible, but probably only of limited value. > Now, it's possible that in Windows, when using IOCP, the philosophy is > different -- I think I've read in > http://msdn.microsoft.com/en-us/library/aa365198%28VS.85%29.aspx that > there can be multiple threads reading events from a single queue. > But AFAIK, in Twisted and Tornado and similar systems, and probably > even in gevent and Stackless, there is a strong culture around having > only a single thread handling events (at least only one thread at a > time), since the assumption is that as long as you don't suspend, you > can trust that the world doesn't change, and that assumption becomes > invalid when other threads may also be handling events from the same > queue. This is true, and my understanding is that IOCP is basically just a thread pool, and the 'single queue' means that all the threads are waiting on all the events and you can't guarantee which thread will get which. This is better than creating a new thread for each file, but I think that's all it is meant to be. We can easily write a single thread that can wait on all I/O, scheduling callbacks on the main thread, if necessary. I'm pretty sure that all platforms have better ways to do this though, but because they're all different it will need different implementations. > It's possible to design a world where different threads have > their own event queues, and this assumption would only be valid for > events belonging to the same queue; however that seems complicated. 
> And you still don't want to ever attempt to acquire a *threading* > lock, because you end up blocking the entire event loop. Multiple threads with independent queues should be okay, though definitely an advanced scenario. I'm sure this would be preferable to having multiple processes with one thread/queue each in some cases. In any case, this is easy enough to implement with TLS. > > (*I'm inclined to define [the required Future interface] as 'result()', 'done()', > > 'add_done_callback()', 'exception()', 'set_result()' and 'set_exception()' > > functions. Maybe more, but I think that's sufficient. The current '_waiters' > > list is an optimisation for add_done_callback(), and doesn't need to be part > > of the interface.) > > Agreed. I don't see much use for the cancellation stuff and all the > extra complexity that adds to the interface. BTW, I think > concurrent.futures.Future doesn't stop you from calling set_result() > or set_exception() more than once, which I think is a mistake -- I do > enforce that in NDB's Futures. I agree, there should be no way to set the result or exception more than once. On cancellation, while there is some complexity involved I do think we can make use of 'cancel' and 'cancelled' functions to pass a signal back into the worker: op = do_something_async() # not yielded button.on_click += lambda: op.cancel() try: result = yield op except CancelledError: return False def do_something_async(): f = Future() def threadproc(): total = 0 for i in range(10000): if f.cancelled(): raise CancelledError total += i f.set_result(total) Thread(target=threadproc).run() return f I certainly would not want to see the CancelledError be raised automatically - this is no thread.abort() call - but it may be convenient to have an interface for "self._cancelled = True" and "return self._cancelled" that at least saves people from coming up with their own way of passing it in. The worker may completely ignore it, or complete anyway, but for long running operations it may be very handy. (I'll stop before I start thinking about partial results... :) ) > [Here you snipped some context. You proposed having public APIs that > use "yield " and leaving "yield from " as something > the user can use in her own program. To which I replied:] > > >> Hm. I think it'll be confusing. > > > > I think the basic case ("just make it work") will be simpler, and the advanced > > case ("minimise memory/CPU usage") will be more complicated. > > Let's agree to disagree on this. I think they are both valid design > choices with different trade-offs. We should explore both directions > further so as to form a better opinion. Probably we need some working implementations to code against. > > > And the Futures-only-in-public-APIs rule seems to encourage less efficient solutions. > > > > Personally, I'd prefer developers to get a correct solution without having to > > understand how the whole thing works (the "pit of success"). I'm also sceptical > > of any other rule being as portable and composable - I don't think a standard > > library should have APIs where "you must only call this function with yield-from". > > ('await' in C# is not compulsory - you can take the Task returned from an async > > method and do whatever you like with it.) > > Surely "whatever you like" is constrained by whatever the Task type > defines. Maybe it looks like a Future and has a blocking method to > wait for the result, like .result() on concurrent.futures.Future? 
If > you want that functionality for generators you just have to call some > function, passing it the generator as an argument. Remember, Python > doesn't consider that an inferior choice of API design compared to > making something a method of the object itself -- witness len(), > repr() and many others. I'm interested that you skipped my "portable and composable" claim and went straight for my aside about another language. I'd prefer to avoid introducing top-level names, especially since this is an API with plenty of predecessors... what sort of trouble would we be having if sched or asyncore had claimed 'wait()'? Even more so because it's Python, since it is so easy to overwrite the value. (And as it happens, Task handles both the asynchrony and the callbacks, so it looks a bit like Thread and Future mixed together. Personally, I prefer to keep the concepts separate.) > FWIW, if I may sound antagonistic, I actually think that we're mostly > in violent agreement, and I think we're getting closer to coming up > with a sensible set of requirements and possibly even an API proposal. > Keep it coming! I do my best work when someone is arguing with me :) Cheers, Steve From mal at egenix.com Mon Oct 22 08:58:17 2012 From: mal at egenix.com (M.-A. Lemburg) Date: Mon, 22 Oct 2012 08:58:17 +0200 Subject: [Python-ideas] Interest in seeing sh.py in the stdlib In-Reply-To: References: <5083FA32.7050900@python.org> Message-ID: <5084EE89.7050603@egenix.com> On 22.10.2012 04:40, Andrew Moffat wrote: > I would be interested in relicensing and donating. I am able to reach out > to the contributors, and I am pretty positive I could reach out and get the > signing off from them. I would be more than willing to maintain the > package as well...I'm in it for the long haul, it seems to resonated well > with the community throughout its development. > > On Sun, Oct 21, 2012 at 8:35 AM, Christian Heimes wrote: > >> Am 21.10.2012 02:33, schrieb Andrew Moffat: >>> I'm interested in making sh.py more accessible to help bring Python >>> forward in the area of shell scripting, so I'm interested in seeing if >>> sh would be suitable for the standard library. Is there any other >>> interest in something like this? >> >> I like to ignore the technical issues for now and concentrate on the >> legal and organizational problems. >> >> In order to get sh.py into Python's stdlib you have to relicense and >> donate the code under the PSF license. You and every contributor must >> agree on the relicensing. At least you must submit a signed contributor >> agreement, maybe every contributor. Are you able to get hold of everybody? Small correction: The contributors would have to sign a contributor agreement with the PSF to enable the PSF to distribute the code under the PSF license: http://www.python.org/psf/contrib/ This usually is much easier to have than a copyright sign-over, since it's only a special license and the copyright remains with the authors. >> Are you willing to maintain your code for several years, at least five >> years or more? -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Oct 22 2012) >>> Python Projects, Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2012-09-27: Released eGenix PyRun 1.1.0 ... 
http://egenix.com/go35 2012-09-26: Released mxODBC.Connect 2.0.1 ... http://egenix.com/go34 2012-09-25: Released mxODBC 3.2.1 ... http://egenix.com/go33 2012-10-23: Python Meeting Duesseldorf ... tomorrow eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From techtonik at gmail.com Mon Oct 22 12:51:52 2012 From: techtonik at gmail.com (anatoly techtonik) Date: Mon, 22 Oct 2012 13:51:52 +0300 Subject: [Python-ideas] Windows temporary file association for Python files Message-ID: I wonder if it will make the life easier if Python was installed with .py association to "%PYTHON_HOME%\python.exe" "%1" %* It will remove the need to run .py scripts in virtualenv with explicit 'python' prefix. Example how it doesn't work right now E:\virtenv32\Scripts>echo import sys; print(sys.version) > test.py E:\virtenv32\Scripts>test.py 3.3.0 (v3.3.0:bd8afb90ebf2, Sep 29 2012, 10:55:48) [MSC v.1600 32 bit (Intel)] E:\virtenv32\Scripts>python test.py 3.2.3 (default, Apr 11 2012, 07:15:24) [MSC v.1500 32 bit (Intel)] If Python file association was specified with "%PYTHON_HOME%\python.exe" "%1" %* then virtualenv could override this variable when setting the environment to set correct executable for .py files. -- anatoly t. From p.f.moore at gmail.com Mon Oct 22 13:44:04 2012 From: p.f.moore at gmail.com (Paul Moore) Date: Mon, 22 Oct 2012 12:44:04 +0100 Subject: [Python-ideas] Windows temporary file association for Python files In-Reply-To: References: Message-ID: On 22 October 2012 11:51, anatoly techtonik wrote: > I wonder if it will make the life easier if Python was installed with > .py association to "%PYTHON_HOME%\python.exe" "%1" %* > It will remove the need to run .py scripts in virtualenv with explicit > 'python' prefix. In Python 3.3 and later, the "py.exe" launcher is installed, and this is the association for ".py" files by default. It looks at the #! line of .py files, so you can run a specific Python interpreter by giving its full path. You can also specify (for example) "python3" or "python3.2" to run a specific Python version. A less known fact is that you can define custom commands for py.exe in a py.ini file. So you can have [commands] vpy=python in your py.ini, and then start your script with #!vpy to make it use the currently active Python (whichever is on %PATH%). Hope that helps, Paul From tismer at stackless.com Mon Oct 22 13:52:55 2012 From: tismer at stackless.com (Christian Tismer) Date: Mon, 22 Oct 2012 13:52:55 +0200 Subject: [Python-ideas] Interest in seeing sh.py in the stdlib In-Reply-To: References: Message-ID: <50853397.5070304@stackless.com> On 22.10.12 04:40, Andrew Moffat wrote: > The main criticism has been the cleverness of the dynamic lookups. > There is also the ability to use a Command object for more explicit > calls: > > cmd = sh.Command("/some/command") > cmd(arg) > > So you have the best of both worlds. If you like the idea of the > programs being attributes on the module, you can use the advertised > way, if you don't, you can use the more explicit way. > > Windows support would be a little more difficult. It existed in an > old version of sh, when it was merely a wrapper around the subprocess > module. Now that sh.py no longer relies on the subprocess module and > does fork-exec itself (in order to get more flexible access to the > processes), Windows is currently unsupported. 
My current > understanding is that most of the value comes from the linux/OSX > folks, but Windows support is scheduled for the future. > This is what I don't like: subprocess is not used, but you implement stuff yourself. Instead of bypassing subprocess I would improve subprocess and not duplicate the windows problem, which is most of the time _not_ easy to get right. Can you explain why you went this path? cheers - chris -- Christian Tismer :^) Software Consulting : Have a break! Take a ride on Python's Karl-Liebknecht-Str. 121 : *Starship* http://starship.python.net/ 14482 Potsdam : PGP key -> http://pgp.uni-mainz.de phone +49 173 24 18 776 fax +49 (30) 700143-0023 PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04 whom do you want to sponsor today? http://www.stackless.com/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From tismer at stackless.com Mon Oct 22 14:01:38 2012 From: tismer at stackless.com (Christian Tismer) Date: Mon, 22 Oct 2012 14:01:38 +0200 Subject: [Python-ideas] Interest in seeing sh.py in the stdlib In-Reply-To: <50853397.5070304@stackless.com> References: <50853397.5070304@stackless.com> Message-ID: <508535A2.2090400@stackless.com> On 22.10.12 13:52, Christian Tismer wrote: > On 22.10.12 04:40, Andrew Moffat wrote: >> The main criticism has been the cleverness of the dynamic lookups. >> There is also the ability to use a Command object for more explicit >> calls: >> >> cmd = sh.Command("/some/command") >> cmd(arg) >> >> So you have the best of both worlds. If you like the idea of the >> programs being attributes on the module, you can use the advertised >> way, if you don't, you can use the more explicit way. >> >> Windows support would be a little more difficult. It existed in an >> old version of sh, when it was merely a wrapper around the subprocess >> module. Now that sh.py no longer relies on the subprocess module and >> does fork-exec itself (in order to get more flexible access to the >> processes), Windows is currently unsupported. My current >> understanding is that most of the value comes from the linux/OSX >> folks, but Windows support is scheduled for the future. >> > > This is what I don't like: > > subprocess is not used, but you implement stuff yourself. > Instead of bypassing subprocess I would improve subprocess > and not duplicate the windows problem, which is most of the > time _not_ easy to get right. > > Can you explain why you went this path? > Sorry, while we are at it: The package name is a problem for me. A two-character name for a package?? That is something that I would never do in the global package namespace. It also is IMHO not nice to have such short names in PyPI. -- Christian Tismer :^) Software Consulting : Have a break! Take a ride on Python's Karl-Liebknecht-Str. 121 : *Starship* http://starship.python.net/ 14482 Potsdam : PGP key -> http://pgp.uni-mainz.de phone +49 173 24 18 776 fax +49 (30) 700143-0023 PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04 whom do you want to sponsor today? http://www.stackless.com/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From vinay_sajip at yahoo.co.uk Mon Oct 22 14:38:36 2012 From: vinay_sajip at yahoo.co.uk (Vinay Sajip) Date: Mon, 22 Oct 2012 12:38:36 +0000 (UTC) Subject: [Python-ideas] Interest in seeing sh.py in the stdlib References: Message-ID: Andrew Moffat writes: > The main criticism has been the cleverness of the dynamic lookups. 
I would add: * The plethora of special keyword arguments like _bg, _iter, _in, _piped etc. doesn't look good. * Using callbacks for processing stream output makes it harder to do certain kinds of processing on that output. > Windows support would be a little more difficult. ?It existed in an old > version of sh, when it was merely a wrapper around the subprocess module. > Now that sh.py no longer relies on the subprocess module and does fork-exec > itself This isn't good. You may have resorted to bypassing subprocess because it didn't do what you needed, but it certainly wouldn't look good if a proposed stdlib module wasn't eating its own dog food (by which I mean, using subprocess). Though there have been precedents (optparse / argparse), a determined effort was made there to work with the existing stdlib module before giving up on it. From my own experience, subprocess has not been that intractable, so I'm curious - what flexibility of access did you need that subprocess couldn't offer? I would guess things that are essentially non-portable, like tty access to provide pexpect-like behaviour. (I had to eschew this for sarge, in the interests of cross-platform compatibility.) > (in order to get more flexible access to the processes), Windows is currently > unsupported. ?My current understanding is that most of the value comes from > the linux/OSX folks, but Windows support is scheduled for the future. It seems to me premature to propose sh.py for inclusion in the stdlib before offering Windows support. After all, those who need it can readily get hold of it from PyPI, as the impressive download numbers show. Just as its design has changed a fair bit going from pbs to sh.py, it may change yet more when Windows support is added, and it can be looked at again then. Regards, Vinay Sajip From techtonik at gmail.com Mon Oct 22 14:42:26 2012 From: techtonik at gmail.com (anatoly techtonik) Date: Mon, 22 Oct 2012 15:42:26 +0300 Subject: [Python-ideas] Windows temporary file association for Python files In-Reply-To: References: Message-ID: On Mon, Oct 22, 2012 at 2:44 PM, Paul Moore wrote: > On 22 October 2012 11:51, anatoly techtonik wrote: >> I wonder if it will make the life easier if Python was installed with >> .py association to "%PYTHON_HOME%\python.exe" "%1" %* >> It will remove the need to run .py scripts in virtualenv with explicit >> 'python' prefix. > > In Python 3.3 and later, the "py.exe" launcher is installed, and this > is the association for ".py" files by default. It looks at the #! line > of .py files, so you can run a specific Python interpreter by giving > its full path. You can also specify (for example) "python3" or > "python3.2" to run a specific Python version. Yes, I've noticed that this nasty launcher gets in the way. So, do you propose to edit source files every time I need to test them with a new version of Python? My original user story: I want to execute scripts in virtual environment (i.e. with Python installed for this virtual environment) without 'python' prefix. Here is another one. Currently Sphinx doesn't install with Python 3.2 and with Python 3.3 [1]. Normally I'd create 3 environments to troubleshoot it and I can not modify all Sphinx files to point to the correct interpreter to just execute 'setup.py install'. A solution would be to teach launcher to honor PYTHON_PATH variable if it is set (please don't confuse it with PYTHONPATH which purpose is still unclear on Windows). 1. 
https://bitbucket.org/birkenfeld/sphinx/issue/1022/doesnt-install-with-python-32-and-33-on From breamoreboy at yahoo.co.uk Mon Oct 22 15:16:49 2012 From: breamoreboy at yahoo.co.uk (Mark Lawrence) Date: Mon, 22 Oct 2012 14:16:49 +0100 Subject: [Python-ideas] Windows temporary file association for Python files In-Reply-To: References: Message-ID: On 22/10/2012 13:42, anatoly techtonik wrote: > On Mon, Oct 22, 2012 at 2:44 PM, Paul Moore wrote: >> On 22 October 2012 11:51, anatoly techtonik wrote: >>> I wonder if it will make the life easier if Python was installed with >>> .py association to "%PYTHON_HOME%\python.exe" "%1" %* >>> It will remove the need to run .py scripts in virtualenv with explicit >>> 'python' prefix. >> >> In Python 3.3 and later, the "py.exe" launcher is installed, and this >> is the association for ".py" files by default. It looks at the #! line >> of .py files, so you can run a specific Python interpreter by giving >> its full path. You can also specify (for example) "python3" or >> "python3.2" to run a specific Python version. > > Yes, I've noticed that this nasty launcher gets in the way. So, do you > propose to edit source files every time I need to test them with a new > version of Python? My original user story: I see nothing nasty in the launcher, rather it's extremely useful. You don't have to edit your scripts. Just use py -3.2, py -2 or whatever to run the script, the launcher will work out which version to run for you if you're not specific. > > I want to execute scripts in virtual environment (i.e. with Python > installed for this virtual environment) without 'python' prefix. > > Here is another one. Currently Sphinx doesn't install with Python 3.2 > and with Python 3.3 [1]. Normally I'd create 3 environments to > troubleshoot it and I can not modify all Sphinx files to point to the > correct interpreter to just execute 'setup.py install'. Please try running your scripts with the mechanism I've given above and report back what happens, hopefully success :) > > A solution would be to teach launcher to honor PYTHON_PATH variable if > it is set (please don't confuse it with PYTHONPATH which purpose is > still unclear on Windows). What is PYTHON_PATH? IIRC I was told years ago *NOT* to use PYTHONPATH on Windows so its purpose to me isn't unclear, it's completely baffling. > > 1. https://bitbucket.org/birkenfeld/sphinx/issue/1022/doesnt-install-with-python-32-and-33-on > -- Cheers. Mark Lawrence. From jstpierre at mecheye.net Mon Oct 22 16:52:31 2012 From: jstpierre at mecheye.net (Jasper St. Pierre) Date: Mon, 22 Oct 2012 10:52:31 -0400 Subject: [Python-ideas] Interest in seeing sh.py in the stdlib In-Reply-To: References: Message-ID: On Sat, Oct 20, 2012 at 8:33 PM, Andrew Moffat wrote: > Hi, > > I'm the author of sh.py, an intuitive interface for launching subprocesses > in Linux and OSX http://amoffat.github.com/sh/. 
It has been maintained on > github https://github.com/amoffat/sh for about 10 months and currently has > about 25k installs, according to pythonpackages.com > (http://pythonpackages.com/package/sh, > http://pythonpackages.com/package/pbs) > > Andy Grover maintains the Fedora rpm for sh.py > http://arm.koji.fedoraproject.org/koji/buildinfo?buildID=94247 and Nick > Moffit has submitted an older version of sh.py (which was called pbs) to be > included in Debian distros > http://pkgs.org/debian-wheezy/debian-main-i386/python-pbs_0.95-1_all.deb.html > > I'm interested in making sh.py more accessible to help bring Python forward > in the area of shell scripting, so I'm interested in seeing if sh would be > suitable for the standard library. Is there any other interest in something > like this?

I'm not one for the sugar. It seems like you're stuffing Python syntax where
it doesn't quite belong, as evidenced by the many escape hatches.

Basic queries about things not covered in the documentation:

If I import a non-existent program, will it give me back a function that will
fail, or raise an ImportError?

How do I run a program with a - in the name? You say you replace - with _,
but that doesn't specify what happens in the edge case of "if I have
google-chrome and google_chrome, which one wins? What about
/usr/bin/google-chrome and /usr/local/bin/google_chrome"? That is, will it
exhaust the PATH before trying fallback replacements, or will it check all
replacements at once?

If I have a program that's not on PATH, what do I do? I can manipulate the
PATH environment variable, but am I guaranteed that will work? Are you going
to double fork forever to guarantee that environment?

Can I build a custom prefix, like
p = sh.MagicPrefix(path="/opt/android_devtools/bin"), and have that work like
the regular sh module? p.gcc("whatever") ? Even with the existence of a
regular gcc in the path?

I wonder what happens if you do from sh import *. Does it block execution
before continuing?

How can I do parallel execution of four subprocesses, and get notified when
all four are done? (Seems like this might be a thing for a Future as well,
even in the absence of any scheduler or event loop.) Are newcomers going to
be confused by this?

What happens if I try to do something like sh.ls("-l -a")? Will you use the
POSIX shell parsing algorithm, pass it to bash, or pass it as one parameter?
Will some form of injection attack be mitigated by this design?

If you see this magic syntax as your one unique feature, I'd propose that you
add it to the subprocess module, and improve the standard subprocess module's
interface to cope with the new feature. But I don't see this as a worthwhile
thing to have. -1 on the thing.
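For what it's worth, the "run four subprocesses in parallel and get notified
when they finish" case is already expressible with nothing but the stdlib;
a rough sketch, where the four commands are just placeholders:

    import subprocess
    from concurrent.futures import ThreadPoolExecutor, as_completed

    commands = [["ls", "-l"], ["uname", "-a"], ["date"], ["whoami"]]

    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [pool.submit(subprocess.check_output, cmd)
                   for cmd in commands]
        for future in as_completed(futures):   # fires as each one completes
            print(future.result())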
> Thanks > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -- Jasper From guido at python.org Mon Oct 22 16:59:56 2012 From: guido at python.org (Guido van Rossum) Date: Mon, 22 Oct 2012 07:59:56 -0700 Subject: [Python-ideas] yield from multiple iterables (was Re: The async API of the future: yield-from) In-Reply-To: References: <507BC855.4070802@stackless.com> <507BE79D.1090100@stackless.com> <507C1661.5070206@stackless.com> <507CADE6.7050604@canterbury.ac.nz> <50814DDD.9070206@stackless.com> <50817D62.4040607@stackless.com> <5081F554.5090404@canterbury.ac.nz> <508385F2.5000707@canterbury.ac.nz> Message-ID: On Sun, Oct 21, 2012 at 10:30 PM, Steve Dower wrote: [Stuff about Futures and threads] Personally, I'm interested in designing a system, including an event loop, where you can rely on the properties of cooperative scheduling to avoid ever touching (OS) threading locks. I think such a system should be "pure" and all interaction with threads should be mediated by the event loop. (It's okay if this means that the implementation of the event loop must at some point acquire a threading lock.) The Futures used by the tasks to coordinate amongst themselves should not require locking -- they should themselves be able to rely on the guarantees of the event loop not to invoke multiple callbacks in parallel. IIUC you can do this on Windows with IOCP too, simply by only having a single thread reading events. >> > > And the Futures-only-in-public-APIs rule seems to encourage less efficient solutions. >> > >> > Personally, I'd prefer developers to get a correct solution without having to >> > understand how the whole thing works (the "pit of success"). I'm also sceptical >> > of any other rule being as portable and composable - I don't think a standard >> > library should have APIs where "you must only call this function with yield-from". >> > ('await' in C# is not compulsory - you can take the Task returned from an async >> > method and do whatever you like with it.) >> >> Surely "whatever you like" is constrained by whatever the Task type >> defines. Maybe it looks like a Future and has a blocking method to >> wait for the result, like .result() on concurrent.futures.Future? If >> you want that functionality for generators you just have to call some >> function, passing it the generator as an argument. Remember, Python >> doesn't consider that an inferior choice of API design compared to >> making something a method of the object itself -- witness len(), >> repr() and many others. > > I'm interested that you skipped my "portable and composable" claim and went straight for my aside about another language. I'd prefer to avoid introducing top-level names, especially since this is an API with plenty of predecessors... what sort of trouble would we be having if sched or asyncore had claimed 'wait()'? Even more so because it's Python, since it is so easy to overwrite the value. Sorry, probably just got distracted (I was reading on three different devices while on a family outing :-). But my answer is short: to me, the PEP 380 style is perfectly portable and composable. If you think it isn't, please elaborate. > (And as it happens, Task handles both the asynchrony and the callbacks, so it looks a bit like Thread and Future mixed together. Personally, I prefer to keep the concepts separate.) Same here. 
-- --Guido van Rossum (python.org/~guido) From Steve.Dower at microsoft.com Mon Oct 22 17:55:27 2012 From: Steve.Dower at microsoft.com (Steve Dower) Date: Mon, 22 Oct 2012 15:55:27 +0000 Subject: [Python-ideas] yield from multiple iterables (was Re: The async API of the future: yield-from) In-Reply-To: References: <507BC855.4070802@stackless.com> <507BE79D.1090100@stackless.com> <507C1661.5070206@stackless.com> <507CADE6.7050604@canterbury.ac.nz> <50814DDD.9070206@stackless.com> <50817D62.4040607@stackless.com> <5081F554.5090404@canterbury.ac.nz> <508385F2.5000707@canterbury.ac.nz> Message-ID: > Personally, I'm interested in designing a system, including an event loop, > where you can rely on the properties of cooperative scheduling to avoid > ever touching (OS) threading locks. I think such a system should be "pure" > and all interaction with threads should be mediated by the event loop. > (It's okay if this means that the implementation of the event loop must at > some point acquire a threading lock.) The Futures used by the tasks to > coordinate amongst themselves should not require locking -- they should > themselves be able to rely on the guarantees of the event loop not to > invoke multiple callbacks in parallel. Unfortunately, a "pure" system means that no async operation can ever have an OS provided callback (or one that comes from outside the world of the scheduler). The purity in this case becomes infectious and limits what operations can be continued from(/waited on/blocked on/yielded/etc.). Only code invoked by the loop could schedule other code for that loop, whether by modifying a queue or setting a Future. This kind of system does not help with callback-based I/O. That's not to say that I want big heavy locks everywhere, but as soon as you potentially have two interrupt-scheduled pieces of code queuing to the same loop you need to synchronise access to the data structure. As soon as you get the state and result of a future non-atomically, you need synchronization. I don't doubt there are ways around this (CAS goes a long way, also the GIL will probably help, assuming it's all Python code), and the current implementation of Future is a bit on the heavy side (but also suitable for much more arbitrary uses), but I really believe that avoiding all locks is a bad idea. (Also, I don't consider cooperative multitasking to be "async" - async requires at least two simultaneous (or at least non-deterministically switching) tasks, whether these are CPU threads or hardware-controlled I/O.) > IIUC you can do this on Windows with IOCP too, simply by only having a > single thread reading events. Yes, but unless you run all subsequent code on the IOCP thread (thereby blocking any more completions) you need to schedule it back to another thread. This requires synchronization. [ My claim that using "yield from" exclusively is less portable and composable than "yield" predominantly. ] > To me, the PEP 380 style is perfectly portable and composable. If you think > it isn't, please elaborate. I think the abstract for PEP 380 sums is up pretty well: "A syntax is proposed for a generator to delegate part of its operations to another generator." Using 'yield from' (YF, for convenience) requires (a) that the caller is a generator and (b) that the callee is a generator. For the scheduling behavior to work correctly, it requires the event loop to be the one enumerating the generator, which means that if "open_async" must be called with YF then the entire user's call stack must be generators. 
Suddenly, wanting to use one async function has affected every single function. By contrast, with @async/yield, the "scheduler" is actually in @async, so as soon as the function is called the subsequent step can be scheduled. There is no need to yield all the way up to the event loop, since the Future that was yielded inside open_async will queue the continuation when it completes (possibly triggered from another thread). Here, the user still gets the benefits like: def not_an_async_func(): ops = list(map(get_url_async, list_of_urls)) # all URLs are now downloading in parallel, let's do some other synchronous stuff results = list(map(Future.result, ops)) Where multiple tasks are running simultaneously, even though they eventually use a blocking wait (or a wait_all or as_completed). Doing this with YF based tasks will require the user to create the scheduler explicitly (unlike the implicit one with @async) and prevent any other asynchronous tasks from running. (And as I mentioned in earlier emails, YF can be used for its stated purpose by delegating to subgenerators - an @async function is a generator yielding futures, so there is no problem with it YFing subgenerators that also yield futures. But the @async decorator is where they are collected, and not the very base of the stack.) However, as you pointed out earlier, if all you are trying to achieve is "pure" coroutines, then YF is perfectly appropriate. But this is because of the high level of cooperation required between the involved tasklets. As I understand it, coroutines gain me nothing once I call into a long OpenCV operation, because OpenCV does not know that it is supposed to yield occasionally (or substitute any library for OpenCV). Coroutines are great for within a program, but they don't extend so well into libraries, and certainly provide no compatibility with existing ones (whereas, at worst, I can always write "yield thread_pool_executor.queue(cv.do_something, params)" with @async with any existing library [except maybe a threading library... don't take that "any" too literally]). Cheers, Steve From eric at trueblade.com Mon Oct 22 13:21:07 2012 From: eric at trueblade.com (Eric V. Smith) Date: Mon, 22 Oct 2012 07:21:07 -0400 Subject: [Python-ideas] yield from multiple iterables (was Re: The async API of the future: yield-from) In-Reply-To: References: <50817D62.4040607@stackless.com> <5081F554.5090404@canterbury.ac.nz> <508385F2.5000707@canterbury.ac.nz> <50849EEB.1070201@trueblade.com> Message-ID: <50852C23.8020608@trueblade.com> On 10/22/2012 12:10 AM, Guido van Rossum wrote: > On Sun, Oct 21, 2012 at 6:18 PM, Eric V. Smith wrote: >> On 10/21/2012 8:23 PM, Guido van Rossum wrote: >>> I don't see it that way. Any time you acquire a lock, you may be >>> blocked for a long time. In a typical event loop that's an absolute >>> no-no. Typically, to wait for another thread, you give the other >>> thread a callback that adds a new event for *this* thread. >>> >>> Now, it's possible that in Windows, when using IOCP, the philosophy is >>> different -- I think I've read in >>> http://msdn.microsoft.com/en-us/library/aa365198%28VS.85%29.aspx that >>> there can be multiple threads reading events from a single queue. >> >> Correct. The typical usage of an IOCP is that you create as many threads >> as you have CPUs (or cores, or execution units, or whatever the kids >> call them these days), then they can all wait on the same IOCP. 
So if >> you have, say 4 CPUs so 4 threads, they can all be woken up to do useful >> work if the IOCP has work items for them. > > So what's the typical way to do locking in such a system? Waiting for > a lock seems bad; and you can't assume that no other callbacks may run > while you are running. What synchronization primitives are typically > used? When I've done it (admittedly 10 years ago) we just used critical sections, since we weren't blocking for long (mostly memory management). I'm not sure if that's a best practice or not. The IOCP will actually let you block, then it will release another thread. So if you know you're going to block, you should create more threads than you have CPUs. Here's the relevant paragraph from the IOCP link you posted above: "The system also allows a thread waiting in GetQueuedCompletionStatus to process a completion packet if another running thread associated with the same I/O completion port enters a wait state for other reasons, for example the SuspendThread function. When the thread in the wait state begins running again, there may be a brief period when the number of active threads exceeds the concurrency value. However, the system quickly reduces this number by not allowing any new active threads until the number of active threads falls below the concurrency value. This is one reason to have your application create more threads in its thread pool than the concurrency value. Thread pool management is beyond the scope of this topic, but a good rule of thumb is to have a minimum of twice as many threads in the thread pool as there are processors on the system. For additional information about thread pooling, see Thread Pools." From jstpierre at mecheye.net Mon Oct 22 18:46:47 2012 From: jstpierre at mecheye.net (Jasper St. Pierre) Date: Mon, 22 Oct 2012 12:46:47 -0400 Subject: [Python-ideas] yield from multiple iterables (was Re: The async API of the future: yield-from) In-Reply-To: References: <507B0F0B.8080700@pearwood.info> <507B26C6.10602@stackless.com> <507BA075.4030508@canterbury.ac.nz> <507BC855.4070802@stackless.com> <507BE79D.1090100@stackless.com> <507C1661.5070206@stackless.com> <507CADE6.7050604@canterbury.ac.nz> <50814DDD.9070206@stackless.com> <50817D62.4040607@stackless.com> <5081F554.5090404@canterbury.ac.nz> <508238B2.4040808@canterbury.ac.nz> <5082E3F1.6080004@stackless.com> Message-ID: On Sat, Oct 20, 2012 at 6:38 PM, Guido van Rossum wrote: > On Sat, Oct 20, 2012 at 12:25 PM, Jasper St. Pierre > wrote: >> I'm curious now... you keep mentioning Futures and Deferreds like >> they're two separate entities. What distinction between the two do you >> see? > > They have different interfaces and you end up using them differently. Who is "you" supposed to refer to? > In particular, quoting myself from another thread, here is how I use > the terms: > > - Future: something with roughly the interface but not necessarily the > implementation of PEP 3148. > > - Deferred: the Twisted Deferred class or something with very similar > functionality (there are some in the JavaScript world). > > The big difference between Futures and Deferreds is that Deferreds can > easily be chains together to create multiple stages, and each callback > is called with the value returned from the previous stage; also, > Deferreds have separate callback chains for regular values and errors. Chaining is an add-on to the system and not necessarily required. 
Dojo's Deferreds, modelled directly after Twisted's, don't have direct chaining with multiple callbacks per Deferred, but instead addCallback returns a new Deferred, which it may pass on to. This means that each Deferred has one result, and chaining is done slightly differently. The whole point of chaining is just convenience of mutating a value before it's passed to the caller. It's possible to live without it. Compare:

    from async_http_client import fetch_page
    from some_xml_library import parse_xml

    def fetch_xml(url):
        d = fetch_page(url)
        d.add_callback(parse_xml)
        return d

with:

    def fetch_xml(url):
        def parse_page(result):
            d.callback(parse_xml(result))

        d = Deferred()
        page = fetch_page(url)
        page.add_callback(parse_page)
        return d

The two functions, treated as a black box, are equivalent. The distinction is convenience. > -- > --Guido van Rossum (python.org/~guido) -- Jasper From guido at python.org Mon Oct 22 19:03:53 2012 From: guido at python.org (Guido van Rossum) Date: Mon, 22 Oct 2012 10:03:53 -0700 Subject: [Python-ideas] yield from multiple iterables (was Re: The async API of the future: yield-from) In-Reply-To: References: <507B0F0B.8080700@pearwood.info> <507B26C6.10602@stackless.com> <507BA075.4030508@canterbury.ac.nz> <507BC855.4070802@stackless.com> <507BE79D.1090100@stackless.com> <507C1661.5070206@stackless.com> <507CADE6.7050604@canterbury.ac.nz> <50814DDD.9070206@stackless.com> <50817D62.4040607@stackless.com> <5081F554.5090404@canterbury.ac.nz> <508238B2.4040808@canterbury.ac.nz> <5082E3F1.6080004@stackless.com> Message-ID: On Mon, Oct 22, 2012 at 9:46 AM, Jasper St. Pierre wrote: > On Sat, Oct 20, 2012 at 6:38 PM, Guido van Rossum wrote: >> On Sat, Oct 20, 2012 at 12:25 PM, Jasper St. Pierre >> wrote: >>> I'm curious now... you keep mentioning Futures and Deferreds like >>> they're two separate entities. What distinction between the two do you >>> see? >> >> They have different interfaces and you end up using them differently. > > Who is "you" supposed to refer to? > >> In particular, quoting myself from another thread, here is how I use >> the terms: >> >> - Future: something with roughly the interface but not necessarily the >> implementation of PEP 3148. >> >> - Deferred: the Twisted Deferred class or something with very similar >> functionality (there are some in the JavaScript world). >> >> The big difference between Futures and Deferreds is that Deferreds can >> easily be chains together to create multiple stages, and each callback >> is called with the value returned from the previous stage; also, >> Deferreds have separate callback chains for regular values and errors. > > Chaining is an add-on to the system and not necessarily required.
> Compare:
>
>     from async_http_client import fetch_page
>     from some_xml_library import parse_xml
>
>     def fetch_xml(url):
>         d = fetch_page(url)
>         d.add_callback(parse_xml)
>         return d
>
> with:
>
>     def fetch_xml(url):
>         def parse_page(result):
>             d.callback(parse_xml(result))
>
>         d = Deferred()
>         page = fetch_page(url)
>         page.add_callback(parse_page)
>         return d
>
> The two functions, treated as a black box, are equivalent. The
> distinction is convenience.

Jasper, I don't know you. You may be a wizard-level Twisted user, or maybe you once saw a Twisted tutorial. All I know is that when I started this discussion I used the term Future thinking Deferreds were just Futures, and then Twisted core developers started explaining to me that Deferreds are so much more than Futures (I think it may have been Glyph himself, in one of his longer posts). So please go argue the distinction or similarity with the Twisted core developers, not with me. -- --Guido van Rossum (python.org/~guido) From guido at python.org Mon Oct 22 19:34:38 2012 From: guido at python.org (Guido van Rossum) Date: Mon, 22 Oct 2012 10:34:38 -0700 Subject: [Python-ideas] yield from multiple iterables (was Re: The async API of the future: yield-from) In-Reply-To: References: <507BC855.4070802@stackless.com> <507BE79D.1090100@stackless.com> <507C1661.5070206@stackless.com> <507CADE6.7050604@canterbury.ac.nz> <50814DDD.9070206@stackless.com> <50817D62.4040607@stackless.com> <5081F554.5090404@canterbury.ac.nz> <508385F2.5000707@canterbury.ac.nz> Message-ID: On Mon, Oct 22, 2012 at 8:55 AM, Steve Dower wrote: >> Personally, I'm interested in designing a system, including an event loop, >> where you can rely on the properties of cooperative scheduling to avoid >> ever touching (OS) threading locks. I think such a system should be "pure" >> and all interaction with threads should be mediated by the event loop. >> (It's okay if this means that the implementation of the event loop must at >> some point acquire a threading lock.) The Futures used by the tasks to >> coordinate amongst themselves should not require locking -- they should >> themselves be able to rely on the guarantees of the event loop not to >> invoke multiple callbacks in parallel. > > Unfortunately, a "pure" system means that no async operation can ever have an OS provided callback (or one that comes from outside the world of the scheduler). The purity in this case becomes infectious and limits what operations can be continued from(/waited on/blocked on/yielded/etc.). Only code invoked by the loop could schedule other code for that loop, whether by modifying a queue or setting a Future. This kind of system does not help with callback-based I/O. I'm curious what the Twisted folks have to say about this. Or the folks using gevent. I think your world view is colored by Windows; that's fine, we need input from experienced Windows users. But I can certainly imagine other ways of dealing with this. For example, in CPython, at least, a callback that is called directly by the OS cannot call straight into Python anyway -- you have to acquire the GIL first. This pretty much means that an unconstrained callback directly from the OS cannot call straight into Python -- it has to put something into a queue, and the bytecode interpreter will eventually call it (possibly in another thread). This is how signal handlers are invoked too.
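As a toy illustration of that hand-off (this is not the actual signal/AddPendingCall machinery, just its shape in Python): the OS- or thread-level callback does nothing except record the event, and the loop picks it up once the interpreter is back at a safe point.

    import collections

    ready = collections.deque()   # append()/popleft() are thread-safe in CPython

    def low_level_callback(event):
        # May run in any context; never calls user code directly.
        ready.append(event)

    def run_pending(handle):
        # Called by the event loop when it is safe to run Python code.
        while ready:
            handle(ready.popleft())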
> That's not to say that I want big heavy locks everywhere, but as soon as you potentially have two interrupt-scheduled pieces of code If interrupt-scheduled means what I think it means, this can only be C code. For the Python callback, see above. > queuing to the same loop you need to synchronise access to the data structure. As soon as you get the state and result of a future non-atomically, you need synchronization. I don't doubt there are ways around this (CAS goes a long way, also the GIL will probably help, assuming it's all Python code), and the current implementation of Future is a bit on the heavy side (but also suitable for much more arbitrary uses), but I really believe that avoiding all locks is a bad idea. I don't actually believe we should avoid all locks. I do believe that there should be a separate mechanism, likely OS-specific, whereby the "pure" async world and the "messy" threading world can hand off data to each other. It is probably unavoidable that the implementation of this mechanism touches a threading lock. But this does not mean that the rest of the "pure" world should need to use a Future class that touches threading locks. > (Also, I don't consider cooperative multitasking to be "async" - async requires at least two simultaneous (or at least non-deterministically switching) tasks, whether these are CPU threads or hardware-controlled I/O.) This sounds like a potentially fatal clash in terminology. In the way I use 'async', Twisted, Tornado and gevent certainly qualify, and all those have huge parts of their API where there is no non-deterministic switching in sight -- in fact, they all carefully fence off the part that does interact with threads. For example, the Twisted folks have argued that one of the big advantages of using Twisted's Deferred class is that while a callback is running, the state of the world remains constant (except for actions made by the callback itself, obviously). What other term should we use to encompass this world view (which IMO is a perfectly valid abstraction for a lot of I/O-related concurrency)? >> IIUC you can do this on Windows with IOCP too, simply by only having a >> single thread reading events. > > Yes, but unless you run all subsequent code on the IOCP thread (thereby blocking any more completions) you need to schedule it back to another thread. This requires synchronization. It does sound like this may be unique to Windows, or at least not shared with most of the UNIX world (UNIX ports of IOCP notwithstanding). > [ My claim that using "yield from" exclusively is less portable and composable than "yield" predominantly. ] >> To me, the PEP 380 style is perfectly portable and composable. If you think >> it isn't, please elaborate. > > I think the abstract for PEP 380 sums is up pretty well: "A syntax is proposed for a generator to delegate part of its operations to another generator." Using 'yield from' (YF, for convenience) requires (a) that the caller is a generator and (b) that the callee is a generator. For the scheduling behavior to work correctly, it requires the event loop to be the one enumerating the generator, which means that if "open_async" must be called with YF then the entire user's call stack must be generators. Suddenly, wanting to use one async function has affected every single function. And that is by design -- Greg *wants* it to be that way, and so far I haven't found a reason to disagree with him. 
It seems you just fundamentally disagree with the design, but your arguments come from a fundamentally different world view. > By contrast, with @async/yield, the "scheduler" is actually in @async, so as soon as the function is called the subsequent step can be scheduled. There is no need to yield all the way up to the event loop, since the Future that was yielded inside open_async will queue the continuation when it completes (possibly triggered from another thread). Note that in the YF world, there are also ways to stop the yield from bubbling all the way to the top. You simply call the generator function, which gives you a generator object, and the scheduler module or class can offer a variety of APIs to do things with it -- e.g. run it without waiting for it (yet), run several of these in parallel until one of them (or all of them) completes, etc.

> Here, the user still gets the benefits like:
>
>     def not_an_async_func():
>         ops = list(map(get_url_async, list_of_urls))
>         # all URLs are now downloading in parallel, let's do some other synchronous stuff
>         results = list(map(Future.result, ops))

And in the YF world you can do that too. > Where multiple tasks are running simultaneously, even though they eventually use a blocking wait (or a wait_all or as_completed). Doing this with YF based tasks will require the user to create the scheduler explicitly (unlike the implicit one with @async) and prevent any other asynchronous tasks from running. I don't see that. The user just has to be able to get a reference to the scheduler, which should be part of the scheduler's API (e.g. a function in its module that returns the current scheduler instance). > (And as I mentioned in earlier emails, YF can be used for its stated purpose by delegating to subgenerators - an @async function is a generator yielding futures, so there is no problem with it YFing subgenerators that also yield futures. But the @async decorator is where they are collected, and not the very base of the stack.) With YF it doesn't have to be the base of the stack. It just usually is. I feel we are going around in circles. > However, as you pointed out earlier, if all you are trying to achieve is "pure" coroutines, then YF is perfectly appropriate. But this is because of the high level of cooperation required between the involved tasklets. As I understand it, coroutines gain me nothing once I call into a long OpenCV operation, because OpenCV does not know that it is supposed to yield occasionally (or substitute any library for OpenCV). Coroutines are great for within a program, but they don't extend so well into libraries, and certainly provide no compatibility with existing ones (whereas, at worst, I can always write "yield thread_pool_executor.queue(cv.do_something, params)" with @async with any existing library [except maybe a threading library... don't take that "any" too literally]). I don't know what OpenCV is, but assuming it is something that doesn't know about YF, then it needs to run in a thread of its own (or a threadpool). It is perfectly possible to add a primitive operation to the YF scheduler that says "run this in a threadpool and wake me up when it produces a result". The public API for that primitive can certainly use YF itself -- the messy interface with threads can be completely hidden from view. IMO a YF scheduler worth using for real work must provide such a primitive (it was one of the first things I had to do in my own prototype, to be able to call socket.getaddrinfo()).
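A rough sketch of what such a primitive could look like, assuming a yield-from style scheduler that simply keeps resuming a generator when it yields (the name is invented and the polling is deliberately naive; a real scheduler would park the task and use the pool's completion callback to wake it):

    from concurrent.futures import ThreadPoolExecutor

    _pool = ThreadPoolExecutor(max_workers=4)

    def in_threadpool(func, *args):
        # Intended use: result = yield from in_threadpool(blocking_call, arg)
        future = _pool.submit(func, *args)
        while not future.done():
            yield                  # let the event loop run other tasks
        return future.result()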
-- --Guido van Rossum (python.org/~guido) From ned at nedbatchelder.com Mon Oct 22 20:59:55 2012 From: ned at nedbatchelder.com (Ned Batchelder) Date: Mon, 22 Oct 2012 14:59:55 -0400 Subject: [Python-ideas] Windows temporary file association for Python files In-Reply-To: References: Message-ID: <508597AB.4030501@nedbatchelder.com> On 10/22/2012 8:42 AM, anatoly techtonik wrote: > A solution would be to teach launcher to honor PYTHON_PATH variable if > it is set (please don't confuse it with PYTHONPATH which purpose is > still unclear on Windows). What are you talking about? PYTHON_PATH doesn't appear in the CPython sources at all. PYTHONPATH has the same purpose on Windows that it has anywhere: a list of directories to prefix to sys.path to find modules when importing. --Ned. From Steve.Dower at microsoft.com Mon Oct 22 21:18:02 2012 From: Steve.Dower at microsoft.com (Steve Dower) Date: Mon, 22 Oct 2012 19:18:02 +0000 Subject: [Python-ideas] yield from multiple iterables (was Re: The async API of the future: yield-from) In-Reply-To: References: <507BC855.4070802@stackless.com> <507BE79D.1090100@stackless.com> <507C1661.5070206@stackless.com> <507CADE6.7050604@canterbury.ac.nz> <50814DDD.9070206@stackless.com> <50817D62.4040607@stackless.com> <5081F554.5090404@canterbury.ac.nz> <508385F2.5000707@canterbury.ac.nz> Message-ID: >>> Personally, I'm interested in designing a system, including an event >>> loop, where you can rely on the properties of cooperative scheduling >>> to avoid ever touching (OS) threading locks. I think such a system should be "pure" >>> and all interaction with threads should be mediated by the event loop. >>> (It's okay if this means that the implementation of the event loop >>> must at some point acquire a threading lock.) The Futures used by the >>> tasks to coordinate amongst themselves should not require locking -- >>> they should themselves be able to rely on the guarantees of the event >>> loop not to invoke multiple callbacks in parallel. >> >> Unfortunately, a "pure" system means that no async operation can ever have an OS >> provided callback (or one that comes from outside the world of the scheduler). The purity >> in this case becomes infectious and limits what operations can be continued from(/waited >> on/blocked on/yielded/etc.). Only code invoked by the loop could schedule other code for >> that loop, whether by modifying a queue or setting a Future. This kind of system does not >> help with callback-based I/O. > > I'm curious what the Twisted folks have to say about this. Or the folks using gevent. So am I, but my guess would be that as long as you stay within their 'world' everything is fine (I haven't seen any Twisted code to make me believe otherwise, but happy to accept examples - I have no experience with it directly, though I believe I've used similar concepts before). This is fine for a library or framework, but I don't think it's appropriate for a standard library - maybe this is where our views differ? > I think your world view is colored by Windows; that's fine, we need input from experienced > Windows users. But I can certainly imagine other ways of dealing with this. Coloured by threads is probably more accurate, but then again, throwing threads around wildly is definitely a Windows thing :). I also have a background in microcontrollers, including writing my own pre-emptive and cooperative schedulers that worked with external devices, so I'm trying to draw on that as much as my Windows experience. 
> For example, in CPython, at least, a callback that is called directly by the OS cannot > call straight into Python anyway -- you have to acquire the GIL first. This pretty much > means that an unconstrained callback directly from the OS cannot call straight into Python > -- it has to put something into a queue, and the bytecode interpreter will eventuall call > it (possibly in another thread). This is how signal handlers are invoked too. I'm nervous about relying on the GIL like this, especially since many (most? all?) other interpreters often promote the fact that they don't have a GIL. In any case, it's an implementation detail - if the lock already exists, then we don't need to add another one, but it will need to be noted (in code comments) that we rely on keeping the GIL during the entire callback (which, as I'll go into more detail on later, I don't expect to be very long at all, ever). >> That's not to say that I want big heavy locks everywhere, but as soon >> as you potentially have two interrupt-scheduled pieces of code > > If interrupt-scheduled means what I think it means, this can only be C code. For the > Python callback, see above. I basically meant it to mean any code running that interrupts the current code, whether because of a callback or preemption. Because of the GIL, you are right, but since arbitrary Python code could release the GIL at any time I don't think we could rely on it. >> queuing to the same loop you need to synchronise access to the data structure. As soon >> as you get the state and result of a future non-atomically, you need synchronization. I >> don't doubt there are ways around this (CAS goes a long way, also the GIL will probably >> help, assuming it's all Python code), and the current implementation of Future is a bit on >> the heavy side (but also suitable for much more arbitrary uses), but I really believe that >> avoiding all locks is a bad idea. > > I don't actually believe we should avoid all locks. I do believe that there should be a > separate mechanism, likely OS-specific, whereby the "pure" async world and the "messy" > threading world can hand off data to each other. It is probably unavoidable that the > implementation of this mechanism touches a threading lock. But this does not mean that the > rest of the "pure" world should need to use a Future class that touches threading locks. We can achieve this by making the implementation of Future a property of the scheduler. So rather than using 'concurrent.futures.Future' to construct a new future, it could be 'concurrent.eventloop.get_current().Future()'. This way a user can choose a non-thread safe event loop if they know they don't need one (though I guess users/libraries could use a thread-safe Future deliberately when they know that a thread will be involved). This adds another level of optimization on top of the 'get_future_for' function I've already suggested, and does it without exposing any complexity to the user. >> (Also, I don't consider cooperative multitasking to be "async" - async >> requires at least two simultaneous (or at least non-deterministically >> switching) tasks, whether these are CPU threads or hardware-controlled >> I/O.) > > This sounds like a potentially fatal clash in terminology. In the way I use 'async', > Twisted, Tornado and gevent certainly qualify, and all those have huge parts of their API > where there is no non-deterministic switching in sight -- in fact, they all carefully > fence off the part that does interact with threads. 
For example, the Twisted folks have > argued that one of the big advantages of using Twisted's Deferred class is that while a > callback is running, the state of the world remains constant (except for actions made by > the callback itself, obviously). > > What other term should we use to encompass this world view (which IMO is a perfectly valid > abstraction for a lot of I/O-related concurrency)? It depends on the significance of the callback. In my world view, the callback only ever schedules a task (or I sometime use the word 'continuation') in the main loop. Because the callback could run anywhere, it needs to synchronise the queue, but the continuation is going to run synchronously anyway, so it does not require any locks. (I included the with_options(f, callback_context=None) function to allow the continuation to run wherever the callback does, which _would_ require synchronization, but it also requires an explicit declaration by the developer that they know what they are doing.) >>> IIUC you can do this on Windows with IOCP too, simply by only having >>> a single thread reading events. >> >> Yes, but unless you run all subsequent code on the IOCP thread (thereby blocking any > more completions) you need to schedule it back to another thread. This requires > synchronization. > > It does sound like this may be unique to Windows, or at least not shared with most of the > UNIX world (UNIX ports of IOCP notwithstanding). IOCP looks like a solution to a problem that was so common they shared it with everyone (I don't say it _IS_ a solution, because I know nothing about its history and I have to be careful of anything I say being taken as fact). You can create threads in any OS to wait for blocking I/O, so it's probably most accurate to say it's unique to IOCP or threadpools in general. Again, it's an implementation detail that doesn't change the public API, which is required to execute continuations within the event loop. >> However, as you pointed out earlier, if all you are trying to achieve is "pure" >> coroutines, then YF is perfectly appropriate. But this is because of the high level of >> cooperation required between the involved tasklets. As I understand it, coroutines gain me >> nothing once I call into a long OpenCV operation, because OpenCV does not know that it is >> supposed to yield occasionally (or substitute any library for OpenCV). Coroutines are >> great for within a program, but they don't extend so well into libraries, and certainly >> provide no compatibility with existing ones (whereas, at worst, I can always write "yield >> thread_pool_executor.queue(cv.do_something, params)" with @async with any existing library >> [except maybe a threading library... don't take that "any" too literally]). > > I don't know what OpenCV is, but assuming it is something that doesn't know about YF, then > it needs to run in a thread of its own (or a threadpool). It is perfectly possible to add > a primitive operation to the YF scheduler that says "run this in a threadpool and wake me > up when it produces a result". The public API for that primitive can certainly use YF > itself -- the messing interface with threads can be completely hidden from view. IMO YF > scheduler worth using for real work must provide such a primitive (it was one of the first > things I had to do in my own prototype, to be able to call socket.getaddrinfo()). 
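For readers trying to picture this @async style, a heavily stripped-down sketch (the decorator name, the toy queue and the missing error handling are all simplifications, not the actual proposal): the callback that completes a future only enqueues the next step, and the loop is what actually resumes the generator.

    import collections
    import functools
    from concurrent.futures import Future

    _ready = collections.deque()               # toy event loop queue

    def _schedule(fn, *args):
        _ready.append((fn, args))               # safe to call from any thread

    def run_loop():
        while _ready:
            fn, args = _ready.popleft()
            fn(*args)

    def async_(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            gen = func(*args, **kwargs)         # a generator yielding Futures
            outer = Future()

            def step(value=None):
                try:
                    fut = gen.send(value)
                except StopIteration as stop:
                    outer.set_result(getattr(stop, "value", None))
                    return
                # Resume on the loop, not on whichever thread set the result.
                fut.add_done_callback(lambda f: _schedule(step, f.result()))

            _schedule(step)
            return outer
        return wrapper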
Here's that violent agreement again :) I think this may be a difference of opinion on API design: with @async the user never needs to touch the scheduler directly. All they need are tools that are already in the standard library - threads and futures - and presumably the new set of *_async() functions we will add. The only new thing to learn is @async (and for advanced users, with_options() and YF, but having taught Python to classes of undergraduates I can guarantee that not everyone needs these). Cheers, Steve From guido at python.org Mon Oct 22 22:26:06 2012 From: guido at python.org (Guido van Rossum) Date: Mon, 22 Oct 2012 13:26:06 -0700 Subject: [Python-ideas] yield from multiple iterables (was Re: The async API of the future: yield-from) In-Reply-To: References: <507BC855.4070802@stackless.com> <507BE79D.1090100@stackless.com> <507C1661.5070206@stackless.com> <507CADE6.7050604@canterbury.ac.nz> <50814DDD.9070206@stackless.com> <50817D62.4040607@stackless.com> <5081F554.5090404@canterbury.ac.nz> <508385F2.5000707@canterbury.ac.nz> Message-ID: On Mon, Oct 22, 2012 at 12:18 PM, Steve Dower wrote: [Quoting me] >> For example, in CPython, at least, a callback that is called directly by the OS cannot >> call straight into Python anyway -- you have to acquire the GIL first. This pretty much >> means that an unconstrained callback directly from the OS cannot call straight into Python >> -- it has to put something into a queue, and the bytecode interpreter will eventuall call >> it (possibly in another thread). This is how signal handlers are invoked too. > > I'm nervous about relying on the GIL like this, especially since many (most? all?) other interpreters often promote the fact that they don't have a GIL. In any case, it's an implementation detail - if the lock already exists, then we don't need to add another one, but it will need to be noted (in code comments) that we rely on keeping the GIL during the entire callback (which, as I'll go into more detail on later, I don't expect to be very long at all, ever). Ok, forget the GIL (though PyPy has one). Anyway, the existing mechanism I was referring to does *not* guarantee that the callback keeps the GIL as long as it runs. The GIL is used to emulate preemptive scheduling while still protecting CPython's internal data structures from concurrent access. It makes no guarantees for user data. Even "x = d[key]" may release the GIL if the dict contains keys whose __eq__ is implemented in Python. But the crucial point of the mechanism is that you don't call straight into Python from the OS-level callback (which is written in C or some other low-level language). You arrange for the interpreter to call the Python-level callback at some later time. So you might as well use this to enforce single-threading, if that's the way of your world. >>> That's not to say that I want big heavy locks everywhere, but as soon >>> as you potentially have two interrupt-scheduled pieces of code >> >> If interrupt-scheduled means what I think it means, this can only be C code. For the >> Python callback, see above. > > I basically meant it to mean any code running that interrupts the current code, whether because of a callback or preemption. Because of the GIL, you are right, but since arbitrary Python code could release the GIL at any time I don't think we could rely on it. At least in CPython, it's not just the GIL. The queue I'm talking about above must exist even in a CPython version that has no threading support (and hence no GIL). 
You still cannot call into Python from a signal handler or other callback called directly by the OS kernel. You must delay it until the bytecode interpreter is at a good stopping point. Check out this code: http://hg.python.org/cpython/file/daad150b4670/Python/ceval.c#l496 (AddPendingCall and friends). >>> queuing to the same loop you need to synchronise access to the data structure. As soon >>> as you get the state and result of a future non-atomically, you need synchronization. I >>> don't doubt there are ways around this (CAS goes a long way, also the GIL will probably >>> help, assuming it's all Python code), and the current implementation of Future is a bit on >>> the heavy side (but also suitable for much more arbitrary uses), but I really believe that >>> avoiding all locks is a bad idea. >> >> I don't actually believe we should avoid all locks. I do believe that there should be a >> separate mechanism, likely OS-specific, whereby the "pure" async world and the "messy" >> threading world can hand off data to each other. It is probably unavoidable that the >> implementation of this mechanism touches a threading lock. But this does not mean that the >> rest of the "pure" world should need to use a Future class that touches threading locks. > > We can achieve this by making the implementation of Future a property of the scheduler. So rather than using 'concurrent.futures.Future' to construct a new future, it could be 'concurrent.eventloop.get_current().Future()'. This way a user can choose a non-thread safe event loop if they know they don't need one (though I guess users/libraries could use a thread-safe Future deliberately when they know that a thread will be involved). This adds another level of optimization on top of the 'get_future_for' function I've already suggested, and does it without exposing any complexity to the user. Yes, this sounds find. I note that the existing APIs already encourage leaving the creation of the Future to library code -- you don't construct a Future, typically, but call an executor's submit() method. >>> (Also, I don't consider cooperative multitasking to be "async" - async >>> requires at least two simultaneous (or at least non-deterministically >>> switching) tasks, whether these are CPU threads or hardware-controlled >>> I/O.) >> >> This sounds like a potentially fatal clash in terminology. In the way I use 'async', >> Twisted, Tornado and gevent certainly qualify, and all those have huge parts of their API >> where there is no non-deterministic switching in sight -- in fact, they all carefully >> fence off the part that does interact with threads. For example, the Twisted folks have >> argued that one of the big advantages of using Twisted's Deferred class is that while a >> callback is running, the state of the world remains constant (except for actions made by >> the callback itself, obviously). >> >> What other term should we use to encompass this world view (which IMO is a perfectly valid >> abstraction for a lot of I/O-related concurrency)? > > It depends on the significance of the callback. In my world view, the callback only ever schedules a task (or I sometime use the word 'continuation') in the main loop. Because the callback could run anywhere, it needs to synchronise the queue, but the continuation is going to run synchronously anyway, so it does not require any locks. 
(I included the with_options(f, callback_context=None) function to allow the continuation to run wherever the callback does, which _would_ require synchronization, but it also requires an explicit declaration by the developer that they know what they are doing.) Hm. I guess you are talking about the low-level (or should I say OS-kernel-called) callback; most event frameworks for Python (except perhaps gevent?) use user-level callback extensively -- in fact that's where Twisted wants you to do all the work. So, again a clash of terminology... (Aside: please don't use 'continuation' for 'task'. The use of this term in Scheme has forever tainted the word for me.) >>>> IIUC you can do this on Windows with IOCP too, simply by only having >>>> a single thread reading events. >>> >>> Yes, but unless you run all subsequent code on the IOCP thread (thereby blocking any >> more completions) you need to schedule it back to another thread. This requires >> synchronization. >> >> It does sound like this may be unique to Windows, or at least not shared with most of the >> UNIX world (UNIX ports of IOCP notwithstanding). > > IOCP looks like a solution to a problem that was so common they shared it with everyone (I don't say it _IS_ a solution, because I know nothing about its history and I have to be careful of anything I say being taken as fact). You can create threads in any OS to wait for blocking I/O, so it's probably most accurate to say it's unique to IOCP or threadpools in general. Again, it's an implementation detail that doesn't change the public API, which is required to execute continuations within the event loop. So maybe IOCP is not all that relevant. Very early on in this discussion, IOCP was brought up as an important example of a system for async I/O that had a significantly *different* API than the typical select/poll/etc.-based systems found on UNIX platforms. But its relevance may well decompose into a few separable concerns: - Don't assume everything is a file descriptor. - On some systems, the natural way to do async I/O is *not* to wait until the socket (or other event source) is ready, but to ask it to perform a specific operation in "overlapping" (or async) mode, and you will get an event back when it is done. - Event queues are powerful. - You cannot ignore threads everywhere. >>> However, as you pointed out earlier, if all you are trying to achieve is "pure" >>> coroutines, then YF is perfectly appropriate. But this is because of the high level of >>> cooperation required between the involved tasklets. As I understand it, coroutines gain me >>> nothing once I call into a long OpenCV operation, because OpenCV does not know that it is >>> supposed to yield occasionally (or substitute any library for OpenCV). Coroutines are >>> great for within a program, but they don't extend so well into libraries, and certainly >>> provide no compatibility with existing ones (whereas, at worst, I can always write "yield >>> thread_pool_executor.queue(cv.do_something, params)" with @async with any existing library >>> [except maybe a threading library... don't take that "any" too literally]). >> >> I don't know what OpenCV is, but assuming it is something that doesn't know about YF, then >> it needs to run in a thread of its own (or a threadpool). It is perfectly possible to add >> a primitive operation to the YF scheduler that says "run this in a threadpool and wake me >> up when it produces a result". 
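To illustrate the second point in that list, the two models have a different shape even for a single read; the completion-style version is emulated with a thread here, which is roughly what a readiness-only platform has to do, whereas overlapped I/O or IOCP provides it natively on Windows.

    import select
    import threading

    def readiness_read(sock):
        # Readiness model: wait until readable, then do the read yourself.
        select.select([sock], [], [])
        return sock.recv(4096)

    def completion_read(sock, on_done):
        # Completion model: request the read up front and get called back
        # with the data once the operation has finished.
        def worker():
            on_done(sock.recv(4096))
        t = threading.Thread(target=worker)
        t.daemon = True
        t.start()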
The public API for that primitive can certainly use YF >> itself -- the messing interface with threads can be completely hidden from view. IMO YF >> scheduler worth using for real work must provide such a primitive (it was one of the first >> things I had to do in my own prototype, to be able to call socket.getaddrinfo()). > > Here's that violent agreement again :) I think this may be a difference of opinion on API design: with @async the user never needs to touch the scheduler directly. All they need are tools that are already in the standard library - threads and futures - and presumably the new set of *_async() functions we will add. The only new thing to learn is @async (and for advanced users, with_options() and YF, but having taught Python to classes of undergraduates I can guarantee that not everyone needs these). But @async must imported from *somewhere*, and that's where the decisions are made on how the scheduler works. If you want to use a different scheduler you still have to import a different @async. (TBH I don't understand your with_options() thing. If that's how you propose switching scheduler implementations, there's still a default behavior that you'd have to change on a per-call basis.) And about threads and futures: I am making a principled stance that you shouldn't have to use threads, and you shouldn't have to use a future implementation that's tied to threads. But maybe we should hear from some Twisted folks... -- --Guido van Rossum (python.org/~guido) From drobinow at gmail.com Mon Oct 22 22:37:30 2012 From: drobinow at gmail.com (David Robinow) Date: Mon, 22 Oct 2012 16:37:30 -0400 Subject: [Python-ideas] Windows temporary file association for Python files In-Reply-To: References: Message-ID: On Mon, Oct 22, 2012 at 6:51 AM, anatoly techtonik wrote: > I wonder if it will make the life easier if Python was installed with > .py association to "%PYTHON_HOME%\python.exe" "%1" %* > It will remove the need to run .py scripts in virtualenv with explicit > 'python' prefix. > > > Example how it doesn't work right now > > E:\virtenv32\Scripts>echo import sys; print(sys.version) > test.py > > E:\virtenv32\Scripts>test.py > 3.3.0 (v3.3.0:bd8afb90ebf2, Sep 29 2012, 10:55:48) [MSC v.1600 32 bit (Intel)] > > E:\virtenv32\Scripts>python test.py > 3.2.3 (default, Apr 11 2012, 07:15:24) [MSC v.1500 32 bit (Intel)] > If Python file association was specified with > "%PYTHON_HOME%\python.exe" "%1" %* then virtualenv could override this > variable when setting the environment to set correct executable for > .py files. I believe you can solve your problem with the PY_PYTHON environment variable or the user's py.ini file. See section 3.4.4 of the documentation. From guido at python.org Mon Oct 22 22:58:14 2012 From: guido at python.org (Guido van Rossum) Date: Mon, 22 Oct 2012 13:58:14 -0700 Subject: [Python-ideas] yield from multiple iterables (was Re: The async API of the future: yield-from) In-Reply-To: References: <507BC855.4070802@stackless.com> <507BE79D.1090100@stackless.com> <507C1661.5070206@stackless.com> <507CADE6.7050604@canterbury.ac.nz> <50814DDD.9070206@stackless.com> <50817D62.4040607@stackless.com> <5081F554.5090404@canterbury.ac.nz> <508385F2.5000707@canterbury.ac.nz> Message-ID: Steve, I realize that continued point-by-point rebuttals probably are getting pointless. Maybe your enthusiasm and energy would be better spent trying to propose and implement (a prototype) of an API in the style that you prefer? Maybe we can battle it out in code more easily... 
-- --Guido van Rossum (python.org/~guido) From Steve.Dower at microsoft.com Mon Oct 22 23:32:42 2012 From: Steve.Dower at microsoft.com (Steve Dower) Date: Mon, 22 Oct 2012 21:32:42 +0000 Subject: [Python-ideas] yield from multiple iterables (was Re: The async API of the future: yield-from) In-Reply-To: References: <507BC855.4070802@stackless.com> <507BE79D.1090100@stackless.com> <507C1661.5070206@stackless.com> <507CADE6.7050604@canterbury.ac.nz> <50814DDD.9070206@stackless.com> <50817D62.4040607@stackless.com> <5081F554.5090404@canterbury.ac.nz> <508385F2.5000707@canterbury.ac.nz> Message-ID: Sounds good. I'll make some revisions to the code I posted earlier and come up with some comparable/benchmarkable examples. Apart from the network server and client examples that have already been discussed, any particular problems I should be looking at solving with this? (Anyone?) I don't want to only come up with 'good' examples. -----Original Message----- From: gvanrossum at gmail.com [mailto:gvanrossum at gmail.com] On Behalf Of Guido van Rossum Sent: Monday, October 22, 2012 1358 To: Steve Dower Cc: python-ideas at python.org Subject: Re: [Python-ideas] yield from multiple iterables (was Re: The async API of the future: yield-from) Steve, I realize that continued point-by-point rebuttals probably are getting pointless. Maybe your enthusiasm and energy would be better spent trying to propose and implement (a prototype) of an API in the style that you prefer? Maybe we can battle it out in code more easily... -- --Guido van Rossum (python.org/~guido) From guido at python.org Mon Oct 22 23:41:55 2012 From: guido at python.org (Guido van Rossum) Date: Mon, 22 Oct 2012 14:41:55 -0700 Subject: [Python-ideas] yield from multiple iterables (was Re: The async API of the future: yield-from) In-Reply-To: References: <507BC855.4070802@stackless.com> <507BE79D.1090100@stackless.com> <507C1661.5070206@stackless.com> <507CADE6.7050604@canterbury.ac.nz> <50814DDD.9070206@stackless.com> <50817D62.4040607@stackless.com> <5081F554.5090404@canterbury.ac.nz> <508385F2.5000707@canterbury.ac.nz> Message-ID: On Mon, Oct 22, 2012 at 2:32 PM, Steve Dower wrote: > Sounds good. I'll make some revisions to the code I posted earlier and come up with some comparable/benchmarkable examples. > > Apart from the network server and client examples that have already been discussed, any particular problems I should be looking at solving with this? (Anyone?) I don't want to only come up with 'good' examples. I have a prototype implementing an async web client that fetches a page given a URL. Primitives I have in mind include running several of these concurrently and waiting for the first to come up with a result, or waiting for all results, or getting the results as they are ready. I have an event loop that can use select, poll, epoll, and kqueue (though I've only lightly tested it, on Linux and OSX, so I'm sure I've missed some corner cases and optimization possibilities). The fetcher calls socket.getaddrinfo() in a threadpool. 
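For comparison, those three primitives already have names in concurrent.futures; the sketch below does its waiting with threads rather than an event loop, so it only shows the shape of the API being discussed, not the select/poll-based prototype itself (the URLs are placeholders).

    from concurrent.futures import (ThreadPoolExecutor, wait, as_completed,
                                    FIRST_COMPLETED)
    from urllib.request import urlopen

    def fetch(url):
        return urlopen(url).read()

    urls = ["http://python.org/", "http://pypi.python.org/"]

    with ThreadPoolExecutor(max_workers=len(urls)) as pool:
        futures = [pool.submit(fetch, u) for u in urls]

        done, pending = wait(futures, return_when=FIRST_COMPLETED)  # first result
        for fut in as_completed(futures):                           # as they finish
            print(len(fut.result()))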
-- --Guido van Rossum (python.org/~guido) From greg.ewing at canterbury.ac.nz Tue Oct 23 00:09:41 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 23 Oct 2012 11:09:41 +1300 Subject: [Python-ideas] yield from multiple iterables (was Re: The async API of the future: yield-from) In-Reply-To: References: <508385F2.5000707@canterbury.ac.nz> Message-ID: <5085C425.60709@canterbury.ac.nz> Guido van Rossum wrote: > On Mon, Oct 22, 2012 at 8:55 AM, Steve Dower wrote: >> Yes, but unless you run all subsequent code on the IOCP thread (thereby >> blocking any more completions) you need to schedule it back to another thread. >> This requires synchronization. I think there's an assumption behind this whole async tasks discussion that the tasks being scheduled are I/O bound. We're trying to overlap CPU activity with I/O, and different I/O activities with each other. We're *not* trying to achieve concurrency of CPU-bound tasks -- the GIL prevents that anyway for pure Python code. The whole Windows IOCP thing, on the other hand, seems to be geared towards having a pool of threads, any of which can handle any I/O operation. That's not helpful for us; when one of our tasks blocks waiting for I/O, the completion of that I/O must wake up *that particular task*, and it must be run using the same OS thread that was running it before. I gather that Windows provides a way of making an async I/O request and specifying a callback for that request. If that's the case, do we need to bother with an IOCP at all? Just have the callback wake up the associated task directly. -- Greg From guido at python.org Tue Oct 23 00:30:46 2012 From: guido at python.org (Guido van Rossum) Date: Mon, 22 Oct 2012 15:30:46 -0700 Subject: [Python-ideas] yield from multiple iterables (was Re: The async API of the future: yield-from) In-Reply-To: <5085C425.60709@canterbury.ac.nz> References: <508385F2.5000707@canterbury.ac.nz> <5085C425.60709@canterbury.ac.nz> Message-ID: On Mon, Oct 22, 2012 at 3:09 PM, Greg Ewing wrote: > I think there's an assumption behind this whole async tasks discussion > that the tasks being scheduled are I/O bound. We're trying to overlap > CPU activity with I/O, and different I/O activities with each other. > We're *not* trying to achieve concurrency of CPU-bound tasks -- the > GIL prevents that anyway for pure Python code. Right. Of course. > The whole Windows IOCP thing, on the other hand, seems to be geared > towards having a pool of threads, any of which can handle any I/O > operation. That's not helpful for us; when one of our tasks blocks > waiting for I/O, the completion of that I/O must wake up *that particular > task*, and it must be run using the same OS thread that was running > it before. The reason we can't ignore IOCP is that it is apparently the *only* way to do async I/O in a scalable way. The only other polling primitive available is select() which does not scale. (Or so it is asserted by many folks; I haven't tested this, but I believe the argument against select() scaling in general.) > I gather that Windows provides a way of making an async I/O request > and specifying a callback for that request. If that's the case, do > we need to bother with an IOCP at all? Just have the callback wake > up the associated task directly. AFAICT the way to do that goes through IOCP... 
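A tiny sketch of that "wake up exactly the task that was waiting" behaviour with a readiness-based loop (tasks here are generators that yield the socket they want to read from; the class and method names are invented):

    import select

    class Loop:
        def __init__(self):
            self.waiting = {}                    # fd -> (sock, task)

        def add_task(self, task):
            self._step(task, None)

        def _step(self, task, value):
            try:
                sock = task.send(value)          # task asks to read from sock
            except StopIteration:
                return                           # task finished
            self.waiting[sock.fileno()] = (sock, task)

        def run(self):
            while self.waiting:
                ready, _, _ = select.select(list(self.waiting), [], [])
                for fd in ready:
                    sock, task = self.waiting.pop(fd)
                    self._step(task, sock.recv(4096))   # resume only that task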
-- --Guido van Rossum (python.org/~guido) From greg.ewing at canterbury.ac.nz Tue Oct 23 00:33:00 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 23 Oct 2012 11:33:00 +1300 Subject: [Python-ideas] yield from multiple iterables (was Re: The async API of the future: yield-from) In-Reply-To: References: <508385F2.5000707@canterbury.ac.nz> Message-ID: <5085C99C.3060905@canterbury.ac.nz> Guido van Rossum wrote: > (Aside: please don't use 'continuation' for 'task'. The use of this > term in Scheme has forever tainted the word for me.) It has a broader meaning than the one in Scheme; essentially it's a synonym for "callback". I agree it shouldn't be used as a synonym for "task", though. In any of its forms, a continuation isn't an entire task, it's something that you call to cause the resumption of a task from a particular suspension point. -- Greg From Steve.Dower at microsoft.com Tue Oct 23 00:31:10 2012 From: Steve.Dower at microsoft.com (Steve Dower) Date: Mon, 22 Oct 2012 22:31:10 +0000 Subject: [Python-ideas] yield from multiple iterables (was Re: The async API of the future: yield-from) In-Reply-To: <5085C425.60709@canterbury.ac.nz> References: <508385F2.5000707@canterbury.ac.nz> <5085C425.60709@canterbury.ac.nz> Message-ID: >>> Yes, but unless you run all subsequent code on the IOCP thread >>> (thereby blocking any more completions) you need to schedule it back to another thread. >>> This requires synchronization. > > I think there's an assumption behind this whole async tasks discussion that the tasks > being scheduled are I/O bound. We're trying to overlap CPU activity with I/O, and > different I/O activities with each other. > We're *not* trying to achieve concurrency of CPU-bound tasks -- the GIL prevents that > anyway for pure Python code. Sure, but it's easy enough to slip it in for (nearly) free. The only other option is complete exclusion of CPU-bound concurrency, which also rules out running C functions (outside the GIL) on a separate thread. > The whole Windows IOCP thing, on the other hand, seems to be geared towards having a pool > of threads, any of which can handle any I/O operation. That's not helpful for us; when one > of our tasks blocks waiting for I/O, the completion of that I/O must wake up *that > particular task*, and it must be run using the same OS thread that was running it before. > > I gather that Windows provides a way of making an async I/O request and specifying a > callback for that request. If that's the case, do we need to bother with an IOCP at all? > Just have the callback wake up the associated task directly. IOCP is probably not useful at all, and as Guido said, it was brought up as an example of a non-select style of waiting. APIs like ReadFileEx/WriteFileEx let you provide the callback directly without using IOCP. In any case, even if we did use IOCP it would be an implementation detail and would not affect how the API is exposed. (Also, love your work on PEP 380. Despite my hesitation about using yield from for this API, I do really like using it with generators.) 
Cheers, Steve From guido at python.org Tue Oct 23 00:35:12 2012 From: guido at python.org (Guido van Rossum) Date: Mon, 22 Oct 2012 15:35:12 -0700 Subject: [Python-ideas] yield from multiple iterables (was Re: The async API of the future: yield-from) In-Reply-To: <5085C99C.3060905@canterbury.ac.nz> References: <508385F2.5000707@canterbury.ac.nz> <5085C99C.3060905@canterbury.ac.nz> Message-ID: On Mon, Oct 22, 2012 at 3:33 PM, Greg Ewing wrote: > Guido van Rossum wrote: > >> (Aside: please don't use 'continuation' for 'task'. The use of this >> term in Scheme has forever tainted the word for me.) > > It has a broader meaning than the one in Scheme; essentially > it's a synonym for "callback". (Off-topic:) But does that meaning apply to Scheme? If so, I wish someone would have told me 15 years ago... > I agree it shouldn't be used as a synonym for "task", though. > In any of its forms, a continuation isn't an entire task, it's > something that you call to cause the resumption of a task > from a particular suspension point. I guess that was just Steve showing off. :-) -- --Guido van Rossum (python.org/~guido) From Steve.Dower at microsoft.com Tue Oct 23 00:49:40 2012 From: Steve.Dower at microsoft.com (Steve Dower) Date: Mon, 22 Oct 2012 22:49:40 +0000 Subject: [Python-ideas] yield from multiple iterables (was Re: The async API of the future: yield-from) In-Reply-To: References: <508385F2.5000707@canterbury.ac.nz> <5085C99C.3060905@canterbury.ac.nz> Message-ID: Alertable I/O () and overlapped I/O are two alternatives to IOCP on Windows. >> I agree [continuation] shouldn't be used as a synonym for "task", though. >> In any of its forms, a continuation isn't an entire task, it's >> something that you call to cause the resumption of a task >> from a particular suspension point. > > I guess that was just Steve showing off. :-) Not intentionally - the team here that did async/await in C# talks a lot about "continuation-passing style", which is where I picked the term up from. I don't use it as a synonym for "task" - it's always meant the "bit that runs after we come back from the yield" (hmm... I think that definition needs some work...). From guido at python.org Tue Oct 23 00:56:48 2012 From: guido at python.org (Guido van Rossum) Date: Mon, 22 Oct 2012 15:56:48 -0700 Subject: [Python-ideas] yield from multiple iterables (was Re: The async API of the future: yield-from) In-Reply-To: References: <508385F2.5000707@canterbury.ac.nz> <5085C99C.3060905@canterbury.ac.nz> Message-ID: On Mon, Oct 22, 2012 at 3:49 PM, Steve Dower wrote: > Alertable I/O () and overlapped I/O are two alternatives to IOCP on Windows. > >>> I agree [continuation] shouldn't be used as a synonym for "task", though. >>> In any of its forms, a continuation isn't an entire task, it's >>> something that you call to cause the resumption of a task >>> from a particular suspension point. >> >> I guess that was just Steve showing off. :-) > > Not intentionally - the team here that did async/await in C# talks a lot about "continuation-passing style", which is where I picked the term up from. I don't use it as a synonym for "task" - it's always meant the "bit that runs after we come back from the yield" (hmm... I think that definition needs some work...). Yeah, I have the same terminology hang-up with the term "continuation-passing-style" for web callbacks. 
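(A tiny, throwaway illustration of that reading of the word -- the "continuation" as the bit that runs after we come back from the yield -- next to the same thing spelled as a callback; nothing here is proposed API:)

# Callback style: the continuation is a separate function you pass along.
def fetch(url, on_done):
    on_done("<data for %s>" % url)        # stand-in for real async I/O

def rest_of_the_task(data):               # this function is the continuation
    print("processing", data)

fetch("http://example.com", rest_of_the_task)

# Generator style: the continuation is simply the code after the yield.
def task(url):
    data = yield url                      # suspension point
    print("processing", data)             # the "rest of the task"

t = task("http://example.com")
pending_url = t.send(None)                # run up to the yield
try:
    t.send("<data for %s>" % pending_url) # resume with the result
except StopIteration:
    pass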
Reading back what you wrote, you were indeed trying to distinguish between the "callback" (which you consider the thing that's directly invoked by the OS) and "the rest of the task" (e.g. the code that runs when the yield is resumed), which you were referring to as "continuation". I'd just use "the rest of the task" here. -- --Guido van Rossum (python.org/~guido) From greg.ewing at canterbury.ac.nz Tue Oct 23 01:04:57 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 23 Oct 2012 12:04:57 +1300 Subject: [Python-ideas] yield from multiple iterables (was Re: The async API of the future: yield-from) In-Reply-To: References: <508385F2.5000707@canterbury.ac.nz> <5085C425.60709@canterbury.ac.nz> Message-ID: <5085D119.6020002@canterbury.ac.nz> Guido van Rossum wrote: > The reason we can't ignore IOCP is that it is apparently the *only* > way to do async I/O in a scalable way. The only other polling > primitive available is select() which does not scale. There seems to be an alternative to polling, though. There are functions called ReadFileEx and WriteFileEx that allow you to pass in a routine to be called when the operation completes: http://msdn.microsoft.com/en-us/library/windows/desktop/aa365468%28v=vs.85%29.aspx http://msdn.microsoft.com/en-us/library/windows/desktop/aa365748%28v=vs.85%29.aspx Is there some reason that this doesn't scale either? -- Greg From guido at python.org Tue Oct 23 01:09:28 2012 From: guido at python.org (Guido van Rossum) Date: Mon, 22 Oct 2012 16:09:28 -0700 Subject: [Python-ideas] yield from multiple iterables (was Re: The async API of the future: yield-from) In-Reply-To: <5085D119.6020002@canterbury.ac.nz> References: <508385F2.5000707@canterbury.ac.nz> <5085C425.60709@canterbury.ac.nz> <5085D119.6020002@canterbury.ac.nz> Message-ID: On Mon, Oct 22, 2012 at 4:04 PM, Greg Ewing wrote: > Guido van Rossum wrote: > >> The reason we can't ignore IOCP is that it is apparently the *only* >> way to do async I/O in a scalable way. The only other polling >> primitive available is select() which does not scale. > > > There seems to be an alternative to polling, though. There are > functions called ReadFileEx and WriteFileEx that allow you to > pass in a routine to be called when the operation completes: > > http://msdn.microsoft.com/en-us/library/windows/desktop/aa365468%28v=vs.85%29.aspx > http://msdn.microsoft.com/en-us/library/windows/desktop/aa365748%28v=vs.85%29.aspx > > Is there some reason that this doesn't scale either? I don't know, we've reached territory I don't know at all. Are there also similar calls for Accept() and Connect() on sockets? Those seem the other major blocking primitives that are frequently used. FWIW, here is where I read about IOCP being the only scalable way on Windows: http://google-opensource.blogspot.com/2010/01/libevent-20x-like-libevent-14x-only.html -- --Guido van Rossum (python.org/~guido) From thoover at alum.mit.edu Tue Oct 23 01:36:35 2012 From: thoover at alum.mit.edu (Tom Hoover) Date: Mon, 22 Oct 2012 16:36:35 -0700 Subject: [Python-ideas] yield from multiple iterables (was Re: The async API of the future: yield-from) In-Reply-To: References: <508385F2.5000707@canterbury.ac.nz> <5085C425.60709@canterbury.ac.nz> <5085D119.6020002@canterbury.ac.nz> Message-ID: On Mon, Oct 22, 2012 at 4:09 PM, Guido van Rossum wrote: > > On Mon, Oct 22, 2012 at 4:04 PM, Greg Ewing wrote: > > Guido van Rossum wrote: > > > >> The reason we can't ignore IOCP is that it is apparently the *only* > >> way to do async I/O in a scalable way. 
The only other polling > >> primitive available is select() which does not scale. > > > > > > There seems to be an alternative to polling, though. There are > > functions called ReadFileEx and WriteFileEx that allow you to > > pass in a routine to be called when the operation completes: > > > > http://msdn.microsoft.com/en-us/library/windows/desktop/aa365468%28v=vs.85%29.aspx > > http://msdn.microsoft.com/en-us/library/windows/desktop/aa365748%28v=vs.85%29.aspx > > > > Is there some reason that this doesn't scale either? > > I don't know, we've reached territory I don't know at all. Are there > also similar calls for Accept() and Connect() on sockets? Those seem > the other major blocking primitives that are frequently used. > > FWIW, here is where I read about IOCP being the only scalable way on > Windows: http://google-opensource.blogspot.com/2010/01/libevent-20x-like-libevent-14x-only.html It's been years since I've looked at this stuff, but I believe that you want to use AcceptEx and ConnectEx in conjunction with IOCP. event_iocp.c and listener.c in libevent 2.0.x could help shed some light on the details. From greg.ewing at canterbury.ac.nz Tue Oct 23 01:48:39 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 23 Oct 2012 12:48:39 +1300 Subject: [Python-ideas] yield from multiple iterables (was Re: The async API of the future: yield-from) In-Reply-To: References: <508385F2.5000707@canterbury.ac.nz> <5085C99C.3060905@canterbury.ac.nz> Message-ID: <5085DB57.4010504@canterbury.ac.nz> Guido van Rossum wrote: > On Mon, Oct 22, 2012 at 3:33 PM, Greg Ewing wrote: > >>It has a broader meaning than the one in Scheme; essentially >>it's a synonym for "callback". > > (Off-topic:) But does that meaning apply to Scheme? If so, I wish > someone would have told me 15 years ago... It does, in the sense that a continuation appears to the Scheme programmer as a callable object. The connection goes deeper as well. There's a style of programming called "continuation-passing style", in which nothing ever returns -- every function is passed another function to be called with its result. In a language such as Scheme that supports tail calls, you can use this style extensively without fear of overflowing the call stack. You're using this style whenever you chain callbacks together using Futures or Deferreds. The callbacks don't return values; instead, each callback arranges for another callback to be called, passing it the result. This is also the way monadic I/O works in Haskell. None of the I/O functions ever return, they just call another function and pass it the result. A combination of currying and syntactic sugar is used to hide the fact that you're passing callbacks -- aka continuations -- around all over the place. Now, it turns out that you can define all the semantics of Scheme, including its continuations, by writing a Scheme interpreter in Scheme that doesn't itself use Scheme continuations. You do it by writing the whole interpereter in continuation-passing style, and it becomes clear that at that level, the "continuations" are just ordinary functions, relying on lexical scoping to capture all of the necessary state. > I guess that was just Steve showing off. :-) Not really -- to someone with a Scheme or FP background, it's near-impossible to look at something like a chain of Deferred callbacks without the word "continuation" springing to mind. I agree that it's not helpful to anyone without such a background, however. 
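For the curious, the same idea fits in ten lines of plain Python -- no Scheme, no Twisted, just functions that never return a useful value and instead pass their result to the next callback:

def add(a, b, k):
    k(a + b)          # hand the result to the continuation instead of returning

def square(x, k):
    k(x * x)

def show(x):
    print(x)

# Equivalent to show(square(add(2, 3))), written as a callback chain --
# the same shape as a chain of Deferred/Future callbacks.
add(2, 3, lambda s: square(s, show))

Without tail-call elimination each step costs a stack frame, which is one reason Python code in this style ends up leaning on an event loop rather than on deep chains of direct calls.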
-- Greg From guido at python.org Tue Oct 23 01:54:41 2012 From: guido at python.org (Guido van Rossum) Date: Mon, 22 Oct 2012 16:54:41 -0700 Subject: [Python-ideas] yield from multiple iterables (was Re: The async API of the future: yield-from) In-Reply-To: <5085DB57.4010504@canterbury.ac.nz> References: <508385F2.5000707@canterbury.ac.nz> <5085C99C.3060905@canterbury.ac.nz> <5085DB57.4010504@canterbury.ac.nz> Message-ID: And, predictably, that gave me a headache... :-) --Guido van Rossum (sent from Android phone) On Oct 22, 2012 4:49 PM, "Greg Ewing" wrote: > Guido van Rossum wrote: > >> On Mon, Oct 22, 2012 at 3:33 PM, Greg Ewing >> wrote: >> >> It has a broader meaning than the one in Scheme; essentially >>> it's a synonym for "callback". >>> >> >> (Off-topic:) But does that meaning apply to Scheme? If so, I wish >> someone would have told me 15 years ago... >> > > It does, in the sense that a continuation appears to the > Scheme programmer as a callable object. > > The connection goes deeper as well. There's a style of > programming called "continuation-passing style", in which > nothing ever returns -- every function is passed another > function to be called with its result. In a language such > as Scheme that supports tail calls, you can use this style > extensively without fear of overflowing the call stack. > > You're using this style whenever you chain callbacks > together using Futures or Deferreds. The callbacks don't > return values; instead, each callback arranges for another > callback to be called, passing it the result. > > This is also the way monadic I/O works in Haskell. None > of the I/O functions ever return, they just call another > function and pass it the result. A combination of currying > and syntactic sugar is used to hide the fact that you're > passing callbacks -- aka continuations -- around all > over the place. > > Now, it turns out that you can define all the semantics > of Scheme, including its continuations, by writing a Scheme > interpreter in Scheme that doesn't itself use Scheme > continuations. You do it by writing the whole interpereter > in continuation-passing style, and it becomes clear that > at that level, the "continuations" are just ordinary > functions, relying on lexical scoping to capture all of the > necessary state. > > I guess that was just Steve showing off. :-) >> > > Not really -- to someone with a Scheme or FP background, > it's near-impossible to look at something like a chain > of Deferred callbacks without the word "continuation" > springing to mind. I agree that it's not helpful to > anyone without such a background, however. > > -- > Greg > > ______________________________**_________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/**mailman/listinfo/python-ideas > -------------- next part -------------- An HTML attachment was scrubbed... URL: From greg.ewing at canterbury.ac.nz Tue Oct 23 02:07:01 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 23 Oct 2012 13:07:01 +1300 Subject: [Python-ideas] yield from multiple iterables (was Re: The async API of the future: yield-from) In-Reply-To: References: <5085C99C.3060905@canterbury.ac.nz> <5085DB57.4010504@canterbury.ac.nz> Message-ID: <5085DFA5.4060204@canterbury.ac.nz> Guido van Rossum wrote: > And, predictably, that gave me a headache... :-) Oops, sorry, Guido -- I shouldn't have mentioned the M-word. 
:-) -- Greg From andrew.robert.moffat at gmail.com Tue Oct 23 04:52:59 2012 From: andrew.robert.moffat at gmail.com (Andrew Moffat) Date: Mon, 22 Oct 2012 21:52:59 -0500 Subject: [Python-ideas] Interest in seeing sh.py in the stdlib In-Reply-To: References: Message-ID: On Mon, Oct 22, 2012 at 9:52 AM, Jasper St. Pierre wrote: > On Sat, Oct 20, 2012 at 8:33 PM, Andrew Moffat > wrote: > > Hi, > > > > I'm the author of sh.py, an intuitive interface for launching > subprocesses > > in Linux and OSX http://amoffat.github.com/sh/. It has been maintained > on > > github https://github.com/amoffat/sh for about 10 months and currently > has > > about 25k installs, according to pythonpackages.com > > (http://pythonpackages.com/package/sh, > > http://pythonpackages.com/package/pbs) > > > > Andy Grover maintains the Fedora rpm for sh.py > > http://arm.koji.fedoraproject.org/koji/buildinfo?buildID=94247 and Nick > > Moffit has submitted an older version of sh.py (which was called pbs) to > be > > included in Debian distros > > > http://pkgs.org/debian-wheezy/debian-main-i386/python-pbs_0.95-1_all.deb.html > > > > I'm interested in making sh.py more accessible to help bring Python > forward > > in the area of shell scripting, so I'm interested in seeing if sh would > be > > suitable for the standard library. Is there any other interest in > something > > like this? > > I'm not one for the sugar. Seems like you're stuffing the Python > syntax where it doesn't quite belong, as evidenced by the many escape > hatches. Basic query of things not covered in the documentation: > > If I import a non-existant program, will it give me back a function > that will fail or raise an ImportError? > > How do I run a program with a - in the name? You say you replace - > with _, but thatdoesn't specify what happens in the edge case of "if I > have google-chrome and google_chrome, which one wins? What about > /usr/bin/google-chrome and /usr/local/bin/google_chrome"? That is, > will it exhaust the PATH before trying fallbacks replacements or will > it check all replacements at once? > > If I have a program that's not on PATH, what do I do? I can manipulate > the PATH environment variable, but am I guaranteed that will work? Are > you going to double fork forever to guarantee that environment? Can I > build a custom prefix, like p = > sh.MagicPrefix(path="/opt/android_devtools/bin"), and have that work > like the regular sh module? p.gcc("whatever") ? Even with the > existence of a regular gcc in the path? > > I wonder what happens if you do from sh import *. > > Does it block execution before continuing? How can I do parallel > execution of four subprocesses, and get notified when all four are > done? (Seems like this might be a thing for a Future as well, even in > the absence of any scheduler or event loop). > > Are newcomers going to be confused by this? What happens if I try and > do something like sh.ls("-l -a")? Will you use the POSIX shell parsing > algorithm, pass it to bash, or pass it as one parameter? Will some > form of injection attack be mitigated by this design? > > > If you see this magic syntax as your one unique feature, I'd propose > that you add it to the subprocess module, and improve the standard > subprocess module's interface to cope with the new feature. > > But I don't see this as a worthwhile thing to have. -1 on the thing. 
> > > Thanks > > > > _______________________________________________ > > Python-ideas mailing list > > Python-ideas at python.org > > http://mail.python.org/mailman/listinfo/python-ideas > > > > -- > Jasper > Hi Jasper, thanks for your questions If I import a non-existant program, will it give me back a function > that will fail or raise an ImportError? Yes, an exception will be raised How do I run a program with a - in the name? You say you replace - > with _, but thatdoesn't specify what happens in the edge case of "if I > have google-chrome and google_chrome, which one wins? What about > /usr/bin/google-chrome and /usr/local/bin/google_chrome"? That is, > will it exhaust the PATH before trying fallbacks replacements or will > it check all replacements at once? The full PATH will be exhausted for the exact command, as typed, before any kind of "-" replacement is exercised. There hasn't been much concern about this because most people who want to call commands with special characters prefer to use the Command class (e.g. chrome = Command("/usr/bin/google-chrome")), so the documentation makes a note of this on this issue. If I have a program that's not on PATH, what do I do? I can manipulate > the PATH environment variable, but am I guaranteed that will work? Are > you going to double fork forever to guarantee that environment? Can I > build a custom prefix, like p = > sh.MagicPrefix(path="/opt/android_devtools/bin"), and have that work > like the regular sh module? p.gcc("whatever") ? Even with the > existence of a regular gcc in the path? You could manipulate the PATH, but a better way would be to use the Command class, which can take a full path of a command. The returned object can be used just like other commands. I wonder what happens if you do from sh import *. "ImportError: Cannot import * from sh. Please import sh or import programs individually." Commands are lazy resolved on sh anyways, so loading from all would be undefined. Does it block execution before continuing? How can I do parallel > execution of four subprocesses, and get notified when all four are > done? (Seems like this might be a thing for a Future as well, even in > the absence of any scheduler or event loop). Commands may be run in the background, like this: job1 = sh.tar("-zc", "-f", "archive-name.tar.gz", "/some/directory", _bg=True) job2 = sh.tar(..., _bg=True) job3 = sh.tar(..., _bg=True) job1.wait() job2.wait() job3.wait() Are newcomers going to be confused by this? What happens if I try and > do something like sh.ls("-l -a")? Will you use the POSIX shell parsing > algorithm, pass it to bash, or pass it as one parameter? Will some > form of injection attack be mitigated by this design? Bash--nor any shell-- is called into play. Sh.py doesn't do any argument parsing either. Arguments are passed into commands exactly as they're sent through sh.py. Newcomers have loved it so far, and after seeing some examples, there's been minimal confusion about how to use it. My thoughts about the magical-ness of sh.py.. I typically don't support very magical or dynamically resolving modules. I can be a little apprehensive of ORMs for this reason... I like to know how my code behaves explicitly. I think clever/magic can be confusing for people and inexplicit, and it's important to know more or less what's going on under the hood. But I also think that sh.py scratches an itch that a less dynamic approach fails to reach. My goal for sh.py has been to make writing system scripts as easy for Python as it is for Bash. 
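Just to make "as easy as Bash" concrete, here is the tar example from above next to the stdlib equivalent (the sh line is the style shown earlier, the subprocess line is plain stdlib):

import subprocess
import sh                     # the third-party module under discussion

# stdlib subprocess:
subprocess.check_call(["tar", "-zc", "-f", "archive-name.tar.gz", "/some/directory"])

# the same command through sh.py:
sh.tar("-zc", "-f", "archive-name.tar.gz", "/some/directory")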
People who write Bash scripts do so for a few reasons, one being that a shell script is pretty portable for *nix systems, and also because it's very easy to call programs with arguments, feed them input, and parse their output. But the shortcomings of shell scripts are how obfuscated and unnecessarily difficult they are to accomplish generic programming tasks. This is somewhere where Python excels. But unfortunately, until sh.py, I have not found any tool that lets you call commands nearly as easily as Bash, that didn't rely on Bash. Subprocess is painful. Other modules are extremely verbose. Sh.py, yes, uses a dynamic lookup mechanism (but it should be noted that you don't have to rely on it *at all* if you don't like it), but it does so with a very specific intention, and that is to make Python more suited and commonplace for writing system shell scripts. My push here to see if there is support is because I believe if sh.py could enter the stdlib, and therefore become more ubiquitous on Linux and OSX, that more shell-style scripts could be written in Python, more new users would be comfortable in using Python, and Bash scripts could go the way of the dodo :) Andrew -------------- next part -------------- An HTML attachment was scrubbed... URL: From dinov at microsoft.com Tue Oct 23 06:46:43 2012 From: dinov at microsoft.com (Dino Viehland) Date: Tue, 23 Oct 2012 04:46:43 +0000 Subject: [Python-ideas] yield from multiple iterables (was Re: The async API of the future: yield-from) In-Reply-To: <5085D119.6020002@canterbury.ac.nz> References: <508385F2.5000707@canterbury.ac.nz> <5085C425.60709@canterbury.ac.nz> <5085D119.6020002@canterbury.ac.nz> Message-ID: Greg wrote: > Guido van Rossum wrote: > > > The reason we can't ignore IOCP is that it is apparently the *only* > > way to do async I/O in a scalable way. The only other polling > > primitive available is select() which does not scale. > > There seems to be an alternative to polling, though. There are functions called > ReadFileEx and WriteFileEx that allow you to pass in a routine to be called when > the operation completes: > > http://msdn.microsoft.com/en- > us/library/windows/desktop/aa365468%28v=vs.85%29.aspx > http://msdn.microsoft.com/en- > us/library/windows/desktop/aa365748%28v=vs.85%29.aspx > > Is there some reason that this doesn't scale either? I suspect it's because it has the completion routine is being invoked on the same thread that issued the I/O. The thread has to first block in an alertable wait (e.g. WaitForMultipleObjectsEx or WSAWaitForMultipleEvents). So you'll only get 1 thread doing I/Os and CPU work vs IOCP's where many threads can share both workloads. From ericsnowcurrently at gmail.com Tue Oct 23 07:07:17 2012 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Mon, 22 Oct 2012 23:07:17 -0600 Subject: [Python-ideas] yield from multiple iterables (was Re: The async API of the future: yield-from) In-Reply-To: References: <507BC855.4070802@stackless.com> <507BE79D.1090100@stackless.com> <507C1661.5070206@stackless.com> <507CADE6.7050604@canterbury.ac.nz> <50814DDD.9070206@stackless.com> <50817D62.4040607@stackless.com> <5081F554.5090404@canterbury.ac.nz> <508385F2.5000707@canterbury.ac.nz> Message-ID: On Mon, Oct 22, 2012 at 9:55 AM, Steve Dower wrote: > I think the abstract for PEP 380 sums is up pretty well: "A syntax is proposed for a generator to > delegate part of its operations to another generator." 
Using 'yield from' (YF, for convenience) > requires (a) that the caller is a generator and (b) that the callee is a generator. Rather, the callee must be some iterable: def f(): yield from [1, 2, 3] for x in f(): print(x) -eric From jimjjewett at gmail.com Tue Oct 23 09:17:01 2012 From: jimjjewett at gmail.com (Jim Jewett) Date: Tue, 23 Oct 2012 03:17:01 -0400 Subject: [Python-ideas] yield from multiple iterables (was Re: The async API of the future: yield-from) In-Reply-To: References: <507B0F0B.8080700@pearwood.info> <507B26C6.10602@stackless.com> <507BA075.4030508@canterbury.ac.nz> <507BC855.4070802@stackless.com> <507BE79D.1090100@stackless.com> <507C1661.5070206@stackless.com> <507CADE6.7050604@canterbury.ac.nz> <50814DDD.9070206@stackless.com> <5081C2FA.3050107@hotpy.org> Message-ID: On 10/19/12, Guido van Rossum wrote: > I did a basic timing test using a simple recursive function and a > recursive PEP-380 coroutine computing the same value (see attachment). > The coroutine version is a little over twice as slow as the function > version. I find that acceptable. This went 20 deep, making 2 recursive > calls at each level (except at the deepest level). Note that the co-routine code (copied below) does not involve a scheduler that unwraps futures; there is no scheduler, and nothing runs concurrently. def coroutine(n): if n <= 0: return 1 l = yield from coroutine(n-1) r = yield from coroutine(n-1) return l + 1 + r I like the above code; my concern was that yield might get co-opted for use with scheduler loops, which would have to track the parent task explicitly, and prevent it from being rescheduled too early. -jJ From benoitc at gunicorn.org Tue Oct 23 09:19:59 2012 From: benoitc at gunicorn.org (Benoit Chesneau) Date: Tue, 23 Oct 2012 09:19:59 +0200 Subject: [Python-ideas] yield from multiple iterables (was Re: The async API of the future: yield-from) In-Reply-To: References: <507BC855.4070802@stackless.com> <507BE79D.1090100@stackless.com> <507C1661.5070206@stackless.com> <507CADE6.7050604@canterbury.ac.nz> <50814DDD.9070206@stackless.com> <50817D62.4040607@stackless.com> <5081F554.5090404@canterbury.ac.nz> <508385F2.5000707@c anterbury.ac.nz> Message-ID: On Oct 22, 2012, at 4:59 PM, Guido van Rossum wrote: > On Sun, Oct 21, 2012 at 10:30 PM, Steve Dower wrote: > [Stuff about Futures and threads] > > Personally, I'm interested in designing a system, including an event > loop, where you can rely on the properties of cooperative scheduling > to avoid ever touching (OS) threading locks. I think such a system > should be "pure" and all interaction with threads should be mediated > by the event loop. (It's okay if this means that the implementation of > the event loop must at some point acquire a threading lock.) The > Futures used by the tasks to coordinate amongst themselves should not > require locking -- they should themselves be able to rely on the > guarantees of the event loop not to invoke multiple callbacks in > parallel. > > IIUC you can do this on Windows with IOCP too, simply by only having a > single thread reading events. > Maybe it is worth to have a look on libuv and the way it mixes threads and and event loop [1]. Libuv is one of the good event loop around able to use IOCP and other events systems on other arch (kqueue, ?) and I was thinking when reading all the exchange around that it would perfectly fit in our cases. 
Or at least something like it: - It provides a common api for IO watchers: read, write, writelines, readable, writeable that can probably be extend over remote systems - Have a job queue system for threds that is working mostly like the Futures but using the event loop In any case there is a pyuv binding [2] if some want to test. Even a twisted reactor [3] I myself toying with the idea of porting the Go concurrency model to Python [4] using greenlets and pyuv. Both the scheduler and the way IOs are handled: - In Go all coroutines are independent from each others and can only communicate via channel. Which has the advantage to allows them to run on different threads when one is blocking. In normal case they are mostly working like grrenlets on a single thread and are simply scheduled in a round-robin way. (mostly like in stackless). On the difference that goroutines can be executed in parallel. When one is blocking another thread will be created to handle other goroutines in the runnable queue. - For I/Os it exists a common api to all Connections and Listeners (Conn & Listen classes) that generally ask on a poll server. This poll server has for only task to register FDs and wake up the groutines that wait on read or fd events. This this poll server is running in a blocking loop it is automatically let by the scheduler in a thread. This pol server could be likely be replaced by an event loop if someone want. In my opinion the Go concurrency & memory model [5] could perfectly fit in the Python world and I'm surprised none already spoke about it. In flower greenlets could probably be replaced by generators but i like the API proposed by any coroutine pattern. I wonder if continulets [6] couldn't be ported in cpython to handle that? - beno?t [1] http://nikhilm.github.com/uvbook/threads.html & http://github.com/joyent/libuv [2] https://github.com/saghul/pyuv [3] https://github.com/saghul/twisted-pyuv [4] https://github.com/benoitc/flower [5] http://golang.org/ref/mem [6] http://doc.pypy.org/en/latest/stackless.html#continulets From jimjjewett at gmail.com Tue Oct 23 09:34:58 2012 From: jimjjewett at gmail.com (Jim Jewett) Date: Tue, 23 Oct 2012 03:34:58 -0400 Subject: [Python-ideas] yield from multiple iterables (was Re: The async API of the future: yield-from) In-Reply-To: References: <507BC855.4070802@stackless.com> <507BE79D.1090100@stackless.com> <507C1661.5070206@stackless.com> <507CADE6.7050604@canterbury.ac.nz> <50814DDD.9070206@stackless.com> <50817D62.4040607@stackless.com> <5081F554.5090404@canterbury.ac.nz> <508385F2.5000707@canterbury.ac.nz> Message-ID: On 10/21/12, Guido van Rossum wrote: > On Sun, Oct 21, 2012 at 1:07 PM, Steve Dower > wrote: >> It has synchronisation which is _aware_ of threads, but it never creates, >> requires or uses them. It simply ensures thread-safe reentrancy, which >> will be required for any general solution unless it is completely banned >> from interacting across CPU threads. > I don't see it that way. Any time you acquire a lock, you may be > blocked for a long time. In a typical event loop that's an absolute > no-no. Typically, to wait for another thread, you give the other > thread a callback that adds a new event for *this* thread. That (with or without rescheduling this thread to actually process the event) is a perfectly reasonable solution, but I'm not sure how obvious it is. People willing to deal with the conventions and contortions of twisted are likely to just use twisted. 
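(In case it helps make the hand-off pattern obvious, a stripped-down sketch with invented names -- the queue stands in for the event loop's mailbox: the worker thread never touches the loop's state directly, it only posts an event for the loop's own thread to process.)

import queue
import threading

events = queue.Queue()             # the event loop's inbox

def worker():
    result = sum(range(10**6))     # stand-in for blocking work done off-loop
    events.put(("worker-done", result))   # "add a new event for *this* thread"

threading.Thread(target=worker).start()

# The loop's thread just consumes events; it never shares locks with the
# worker beyond the one hidden inside the queue.
kind, value = events.get()
print(kind, value)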
A general API should have a straightforward way to weight for a result; even explicitly calling wait() may be too much to ask if you want to keep assuming that other events will cooperate. > Perhaps. Lots of possibilities in this design space. > >> (*I'm inclined to define this [the Future interface] as 'result()', 'done()', >> 'add_done_callback()', 'exception()', 'set_result()' and 'set_exception()' >> functions. Maybe more, but I think that's sufficient. The current >> '_waiters' list is an optimisation for add_done_callback(), and doesn't >> need to be part of the interface.) > Agreed. I don't see much use for the cancellation stuff and all the > extra complexity that adds to the interface. wait_for_any may well be launching different strategies to solve the same problem, and intending to ignore all but the fastest. It makes sense to go ahead and cancel the slower strategies. (That said, I agree that the API shouldn't guarantee that other tasks are actually cancelled, let alone that they are cancelled before side effects occur.) -jJ From guido at python.org Tue Oct 23 16:44:31 2012 From: guido at python.org (Guido van Rossum) Date: Tue, 23 Oct 2012 07:44:31 -0700 Subject: [Python-ideas] yield from multiple iterables (was Re: The async API of the future: yield-from) In-Reply-To: References: <507B0F0B.8080700@pearwood.info> <507B26C6.10602@stackless.com> <507BA075.4030508@canterbury.ac.nz> <507BC855.4070802@stackless.com> <507BE79D.1090100@stackless.com> <507C1661.5070206@stackless.com> <507CADE6.7050604@canterbury.ac.nz> <50814DDD.9070206@stackless.com> <5081C2FA.3050107@hotpy.org> Message-ID: On Tue, Oct 23, 2012 at 12:17 AM, Jim Jewett wrote: > On 10/19/12, Guido van Rossum wrote: > >> I did a basic timing test using a simple recursive function and a >> recursive PEP-380 coroutine computing the same value (see attachment). >> The coroutine version is a little over twice as slow as the function >> version. I find that acceptable. This went 20 deep, making 2 recursive >> calls at each level (except at the deepest level). > > Note that the co-routine code (copied below) does not involve a > scheduler that unwraps futures; there is no scheduler, and nothing > runs concurrently. > > def coroutine(n): > if n <= 0: > return 1 > l = yield from coroutine(n-1) > r = yield from coroutine(n-1) > return l + 1 + r > > I like the above code; my concern was that yield might get co-opted > for use with scheduler loops, which would have to track the parent > task explicitly, and prevent it from being rescheduled too early. Don't worry. There is no way that a scheduler can change the meaning of yield from. All its power stems from its ability to decide when to call next(), and that is the same power that the app has itself. -- --Guido van Rossum (python.org/~guido) From guido at python.org Tue Oct 23 16:48:36 2012 From: guido at python.org (Guido van Rossum) Date: Tue, 23 Oct 2012 07:48:36 -0700 Subject: [Python-ideas] yield from multiple iterables (was Re: The async API of the future: yield-from) In-Reply-To: References: <507BC855.4070802@stackless.com> <507BE79D.1090100@stackless.com> <507C1661.5070206@stackless.com> <507CADE6.7050604@canterbury.ac.nz> <50814DDD.9070206@stackless.com> <50817D62.4040607@stackless.com> <5081F554.5090404@canterbury.ac.nz> Message-ID: Thanks for the pointer to and description of libuv; it had come up in my research yet but so far I have not looked it up actively. Now I will. 
Also thanks for your reminder of the Goroutine model -- this is definitely something to look at for inspiration as well. (Though does Go run on Windows? Or is it part of a secret anti-Microsoft plan? :-) --Guido On Tue, Oct 23, 2012 at 12:19 AM, Benoit Chesneau wrote: > > On Oct 22, 2012, at 4:59 PM, Guido van Rossum wrote: > >> On Sun, Oct 21, 2012 at 10:30 PM, Steve Dower wrote: >> [Stuff about Futures and threads] >> >> Personally, I'm interested in designing a system, including an event >> loop, where you can rely on the properties of cooperative scheduling >> to avoid ever touching (OS) threading locks. I think such a system >> should be "pure" and all interaction with threads should be mediated >> by the event loop. (It's okay if this means that the implementation of >> the event loop must at some point acquire a threading lock.) The >> Futures used by the tasks to coordinate amongst themselves should not >> require locking -- they should themselves be able to rely on the >> guarantees of the event loop not to invoke multiple callbacks in >> parallel. >> >> IIUC you can do this on Windows with IOCP too, simply by only having a >> single thread reading events. >> > > Maybe it is worth to have a look on libuv and the way it mixes threads and and event loop [1]. Libuv is one of the good event loop around able to use IOCP and other events systems on other arch (kqueue, ?) and I was thinking when reading all the exchange around that it would perfectly fit in our cases. Or at least something like it: > > - It provides a common api for IO watchers: read, write, writelines, readable, writeable that can probably be extend over remote systems > - Have a job queue system for threds that is working mostly like the Futures but using the event loop > > In any case there is a pyuv binding [2] if some want to test. Even a twisted reactor [3] > > I myself toying with the idea of porting the Go concurrency model to Python [4] using greenlets and pyuv. Both the scheduler and the way IOs are handled: > > - In Go all coroutines are independent from each others and can only communicate via channel. Which has the advantage to allows them to run on different threads when one is blocking. In normal case they are mostly working like grrenlets on a single thread and are simply scheduled in a round-robin way. (mostly like in stackless). On the difference that goroutines can be executed in parallel. When one is blocking another thread will be created to handle other goroutines in the runnable queue. > > - For I/Os it exists a common api to all Connections and Listeners (Conn & Listen classes) that generally ask on a poll server. This poll server has for only task to register FDs and wake up the groutines that wait on read or fd events. This this poll server is running in a blocking loop it is automatically let by the scheduler in a thread. This pol server could be likely be replaced by an event loop if someone want. > > In my opinion the Go concurrency & memory model [5] could perfectly fit in the Python world and I'm surprised none already spoke about it. > > In flower greenlets could probably be replaced by generators but i like the API proposed by any coroutine pattern. I wonder if continulets [6] couldn't be ported in cpython to handle that? 
> > - beno?t > > > [1] http://nikhilm.github.com/uvbook/threads.html & http://github.com/joyent/libuv > [2] https://github.com/saghul/pyuv > [3] https://github.com/saghul/twisted-pyuv > [4] https://github.com/benoitc/flower > [5] http://golang.org/ref/mem > [6] http://doc.pypy.org/en/latest/stackless.html#continulets -- --Guido van Rossum (python.org/~guido) From guido at python.org Tue Oct 23 16:54:46 2012 From: guido at python.org (Guido van Rossum) Date: Tue, 23 Oct 2012 07:54:46 -0700 Subject: [Python-ideas] yield from multiple iterables (was Re: The async API of the future: yield-from) In-Reply-To: References: <507BC855.4070802@stackless.com> <507BE79D.1090100@stackless.com> <507C1661.5070206@stackless.com> <507CADE6.7050604@canterbury.ac.nz> <50814DDD.9070206@stackless.com> <50817D62.4040607@stackless.com> <5081F554.5090404@canterbury.ac.nz> <508385F2.5000707@canterbury.ac.nz> Message-ID: On Tue, Oct 23, 2012 at 12:34 AM, Jim Jewett wrote: > On 10/21/12, Guido van Rossum wrote: >> On Sun, Oct 21, 2012 at 1:07 PM, Steve Dower >> wrote: > >>> It has synchronisation which is _aware_ of threads, but it never creates, >>> requires or uses them. It simply ensures thread-safe reentrancy, which >>> will be required for any general solution unless it is completely banned >>> from interacting across CPU threads. > >> I don't see it that way. Any time you acquire a lock, you may be >> blocked for a long time. In a typical event loop that's an absolute >> no-no. Typically, to wait for another thread, you give the other >> thread a callback that adds a new event for *this* thread. > > That (with or without rescheduling this thread to actually process the > event) is a perfectly reasonable solution, but I'm not sure how > obvious it is. People willing to deal with the conventions and > contortions of twisted are likely to just use twisted. I think part of my point is that we can package all this up in a way that is a lot less scary than Twisted's reputation. And remember, there are many other frameworks that use similar machinery. There's Tornado, Monocle (which runs on top of Tornado *or* Twisted), and of course the stdlib's asyncore, which is antiquated but still much used -- AFAIL Zope is still built around it. > A general API > should have a straightforward way to wait for a result; even > explicitly calling wait() may be too much to ask if you want to keep > assuming that other events will cooperate. Here I have some real world relevant experience: NDB, App Engine's new Datastore API (which I wrote). It is async under the hood (yield + its own flavor of Futures), and users who want the most performance from their app are encouraged to use the async APIs directly -- but users who don't care can ignore their existence completely. There are thousands of users, and I've seen people explain the async stuff to each other on StackOverflow, so I think it is quite accessible. >> Agreed. I don't see much use for the cancellation stuff and all the >> extra complexity that adds to the interface. > > wait_for_any may well be launching different strategies to solve the > same problem, and intending to ignore all but the fastest. It makes > sense to go ahead and cancel the slower strategies. (That said, I > agree that the API shouldn't guarantee that other tasks are actually > cancelled, let alone that they are cancelled before side effects > occur.) Agreed. And it's not hard to implement a custom cancellation mechanism either. 
-- --Guido van Rossum (python.org/~guido) From brett at python.org Tue Oct 23 18:09:56 2012 From: brett at python.org (Brett Cannon) Date: Tue, 23 Oct 2012 12:09:56 -0400 Subject: [Python-ideas] yield from multiple iterables (was Re: The async API of the future: yield-from) In-Reply-To: References: <507BC855.4070802@stackless.com> <507BE79D.1090100@stackless.com> <507C1661.5070206@stackless.com> <507CADE6.7050604@canterbury.ac.nz> <50814DDD.9070206@stackless.com> <50817D62.4040607@stackless.com> <5081F554.5090404@canterbury.ac.nz> Message-ID: Go is available for Windows: http://golang.org/doc/install#windows On Tue, Oct 23, 2012 at 10:48 AM, Guido van Rossum wrote: > Thanks for the pointer to and description of libuv; it had come up in > my research yet but so far I have not looked it up actively. Now I > will. Also thanks for your reminder of the Goroutine model -- this is > definitely something to look at for inspiration as well. (Though does > Go run on Windows? Or is it part of a secret anti-Microsoft plan? :-) > > --Guido > > On Tue, Oct 23, 2012 at 12:19 AM, Benoit Chesneau > wrote: > > > > On Oct 22, 2012, at 4:59 PM, Guido van Rossum wrote: > > > >> On Sun, Oct 21, 2012 at 10:30 PM, Steve Dower < > Steve.Dower at microsoft.com> wrote: > >> [Stuff about Futures and threads] > >> > >> Personally, I'm interested in designing a system, including an event > >> loop, where you can rely on the properties of cooperative scheduling > >> to avoid ever touching (OS) threading locks. I think such a system > >> should be "pure" and all interaction with threads should be mediated > >> by the event loop. (It's okay if this means that the implementation of > >> the event loop must at some point acquire a threading lock.) The > >> Futures used by the tasks to coordinate amongst themselves should not > >> require locking -- they should themselves be able to rely on the > >> guarantees of the event loop not to invoke multiple callbacks in > >> parallel. > >> > >> IIUC you can do this on Windows with IOCP too, simply by only having a > >> single thread reading events. > >> > > > > Maybe it is worth to have a look on libuv and the way it mixes threads > and and event loop [1]. Libuv is one of the good event loop around able to > use IOCP and other events systems on other arch (kqueue, ?) and I was > thinking when reading all the exchange around that it would perfectly fit > in our cases. Or at least something like it: > > > > - It provides a common api for IO watchers: read, write, writelines, > readable, writeable that can probably be extend over remote systems > > - Have a job queue system for threds that is working mostly like the > Futures but using the event loop > > > > In any case there is a pyuv binding [2] if some want to test. Even a > twisted reactor [3] > > > > I myself toying with the idea of porting the Go concurrency model to > Python [4] using greenlets and pyuv. Both the scheduler and the way IOs are > handled: > > > > - In Go all coroutines are independent from each others and can only > communicate via channel. Which has the advantage to allows them to run on > different threads when one is blocking. In normal case they are mostly > working like grrenlets on a single thread and are simply scheduled in a > round-robin way. (mostly like in stackless). On the difference that > goroutines can be executed in parallel. When one is blocking another thread > will be created to handle other goroutines in the runnable queue. 
> > > > - For I/Os it exists a common api to all Connections and Listeners (Conn > & Listen classes) that generally ask on a poll server. This poll server has > for only task to register FDs and wake up the groutines that wait on read > or fd events. This this poll server is running in a blocking loop it is > automatically let by the scheduler in a thread. This pol server could be > likely be replaced by an event loop if someone want. > > > > In my opinion the Go concurrency & memory model [5] could perfectly fit > in the Python world and I'm surprised none already spoke about it. > > > > In flower greenlets could probably be replaced by generators but i like > the API proposed by any coroutine pattern. I wonder if continulets [6] > couldn't be ported in cpython to handle that? > > > > - beno?t > > > > > > [1] http://nikhilm.github.com/uvbook/threads.html & > http://github.com/joyent/libuv > > [2] https://github.com/saghul/pyuv > > [3] https://github.com/saghul/twisted-pyuv > > [4] https://github.com/benoitc/flower > > [5] http://golang.org/ref/mem > > [6] http://doc.pypy.org/en/latest/stackless.html#continulets > > > > -- > --Guido van Rossum (python.org/~guido) > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -------------- next part -------------- An HTML attachment was scrubbed... URL: From andrewfr_ice at yahoo.com Tue Oct 23 18:51:21 2012 From: andrewfr_ice at yahoo.com (Andrew Francis) Date: Tue, 23 Oct 2012 09:51:21 -0700 (PDT) Subject: [Python-ideas] yield from multiple iterables (was Re: The async API of the future: yield-from) In-Reply-To: References: Message-ID: <1351011081.93695.YahooMailNeo@web140703.mail.bf1.yahoo.com> Hi Benoit and folks: >Message: 3 >Date: Tue, 23 Oct 2012 09:19:59 +0200 >From: Benoit Chesneau >To: Guido van Rossum >Cc: Python-Ideas >Subject: Re: [Python-ideas] yield from multiple iterables (was Re: The ?> ? async??? API of the future: yield-from) >Message-ID: >Content-Type: text/plain; charset=windows-1252 (I learnt about this mailing list from Christian Tismer's post in the Stackless mailing list and I am catching up) >I myself toying with the idea of porting the Go concurrency model to Python [4] using greenlets and pyuv. Both the scheduler >and the way IOs are handled: >- In Go all coroutines are independent from each others and can only communicate via channel. Which has the advantage to >allows them to run on different threads when one is blocking. In normal case they are mostly working like grrenlets on a single >thread and are simply scheduled in a round-robin way. (mostly like in stackless). On the difference that goroutines can be >executed in parallel. When one is blocking another thread will be created to handle other goroutines in the runnable queue. What aspect of the Go concurrency model? Maybe you already know this but ?Go and Stackless Python share a common ancestor: Limbo. More specifically the way channels work.? This may be tangential to the discussion but in the past, I have used the stackless.py module in conjunction with CPython and greenlets to rapidly?prototype parts of Go's model that are not present in Stackless, i.e. the select (ALT) language feature.? Rob Pike and Russ?Cox were really helpful in answering my questions. Newer stackless.py implementations use? continuelets so look for an older PyPy implementation.? I have also prototyped a subset of Polyphonic C# join patterns. 
?After I got the prototype running, I had an interesting discussion with the authors of "Scalable Join Patterns." For networking support, I run Twisted as a tasklet. There are a few tricks to make Stackless and Twisted co-operate. Cheers, Andrew -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Tue Oct 23 18:54:15 2012 From: guido at python.org (Guido van Rossum) Date: Tue, 23 Oct 2012 09:54:15 -0700 Subject: [Python-ideas] yield from multiple iterables (was Re: The async API of the future: yield-from) In-Reply-To: References: <507BC855.4070802@stackless.com> <507BE79D.1090100@stackless.com> <507C1661.5070206@stackless.com> <507CADE6.7050604@canterbury.ac.nz> <50814DDD.9070206@stackless.com> <50817D62.4040607@stackless.com> <5081F554.5090404@canterbury.ac.nz> Message-ID: But does it let you use any Windows APIs? On Tue, Oct 23, 2012 at 9:09 AM, Brett Cannon wrote: > Go is available for Windows: http://golang.org/doc/install#windows > > On Tue, Oct 23, 2012 at 10:48 AM, Guido van Rossum wrote: >> >> Thanks for the pointer to and description of libuv; it had come up in >> my research yet but so far I have not looked it up actively. Now I >> will. Also thanks for your reminder of the Goroutine model -- this is >> definitely something to look at for inspiration as well. (Though does >> Go run on Windows? Or is it part of a secret anti-Microsoft plan? :-) >> >> --Guido >> >> On Tue, Oct 23, 2012 at 12:19 AM, Benoit Chesneau >> wrote: >> > >> > On Oct 22, 2012, at 4:59 PM, Guido van Rossum wrote: >> > >> >> On Sun, Oct 21, 2012 at 10:30 PM, Steve Dower >> >> wrote: >> >> [Stuff about Futures and threads] >> >> >> >> Personally, I'm interested in designing a system, including an event >> >> loop, where you can rely on the properties of cooperative scheduling >> >> to avoid ever touching (OS) threading locks. I think such a system >> >> should be "pure" and all interaction with threads should be mediated >> >> by the event loop. (It's okay if this means that the implementation of >> >> the event loop must at some point acquire a threading lock.) The >> >> Futures used by the tasks to coordinate amongst themselves should not >> >> require locking -- they should themselves be able to rely on the >> >> guarantees of the event loop not to invoke multiple callbacks in >> >> parallel. >> >> >> >> IIUC you can do this on Windows with IOCP too, simply by only having a >> >> single thread reading events. >> >> >> > >> > Maybe it is worth to have a look on libuv and the way it mixes threads >> > and and event loop [1]. Libuv is one of the good event loop around able to >> > use IOCP and other events systems on other arch (kqueue, ?) and I was >> > thinking when reading all the exchange around that it would perfectly fit in >> > our cases. Or at least something like it: >> > >> > - It provides a common api for IO watchers: read, write, writelines, >> > readable, writeable that can probably be extend over remote systems >> > - Have a job queue system for threds that is working mostly like the >> > Futures but using the event loop >> > >> > In any case there is a pyuv binding [2] if some want to test. Even a >> > twisted reactor [3] >> > >> > I myself toying with the idea of porting the Go concurrency model to >> > Python [4] using greenlets and pyuv. Both the scheduler and the way IOs are >> > handled: >> > >> > - In Go all coroutines are independent from each others and can only >> > communicate via channel. 
Which has the advantage to allows them to run on >> > different threads when one is blocking. In normal case they are mostly >> > working like grrenlets on a single thread and are simply scheduled in a >> > round-robin way. (mostly like in stackless). On the difference that >> > goroutines can be executed in parallel. When one is blocking another thread >> > will be created to handle other goroutines in the runnable queue. >> > >> > - For I/Os it exists a common api to all Connections and Listeners (Conn >> > & Listen classes) that generally ask on a poll server. This poll server has >> > for only task to register FDs and wake up the groutines that wait on read or >> > fd events. This this poll server is running in a blocking loop it is >> > automatically let by the scheduler in a thread. This pol server could be >> > likely be replaced by an event loop if someone want. >> > >> > In my opinion the Go concurrency & memory model [5] could perfectly fit >> > in the Python world and I'm surprised none already spoke about it. >> > >> > In flower greenlets could probably be replaced by generators but i like >> > the API proposed by any coroutine pattern. I wonder if continulets [6] >> > couldn't be ported in cpython to handle that? >> > >> > - beno?t >> > >> > >> > [1] http://nikhilm.github.com/uvbook/threads.html & >> > http://github.com/joyent/libuv >> > [2] https://github.com/saghul/pyuv >> > [3] https://github.com/saghul/twisted-pyuv >> > [4] https://github.com/benoitc/flower >> > [5] http://golang.org/ref/mem >> > [6] http://doc.pypy.org/en/latest/stackless.html#continulets >> >> >> >> -- >> --Guido van Rossum (python.org/~guido) >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> http://mail.python.org/mailman/listinfo/python-ideas > > -- --Guido van Rossum (python.org/~guido) From brett at python.org Tue Oct 23 19:08:12 2012 From: brett at python.org (Brett Cannon) Date: Tue, 23 Oct 2012 13:08:12 -0400 Subject: [Python-ideas] yield from multiple iterables (was Re: The async API of the future: yield-from) In-Reply-To: References: <507BC855.4070802@stackless.com> <507BE79D.1090100@stackless.com> <507C1661.5070206@stackless.com> <507CADE6.7050604@canterbury.ac.nz> <50814DDD.9070206@stackless.com> <50817D62.4040607@stackless.com> <5081F554.5090404@canterbury.ac.nz> Message-ID: On Tue, Oct 23, 2012 at 12:54 PM, Guido van Rossum wrote: > But does it let you use any Windows APIs? > > That I don't know. > On Tue, Oct 23, 2012 at 9:09 AM, Brett Cannon wrote: > > Go is available for Windows: http://golang.org/doc/install#windows > > > > On Tue, Oct 23, 2012 at 10:48 AM, Guido van Rossum > wrote: > >> > >> Thanks for the pointer to and description of libuv; it had come up in > >> my research yet but so far I have not looked it up actively. Now I > >> will. Also thanks for your reminder of the Goroutine model -- this is > >> definitely something to look at for inspiration as well. (Though does > >> Go run on Windows? Or is it part of a secret anti-Microsoft plan? 
:-) > >> > >> --Guido > >> > >> On Tue, Oct 23, 2012 at 12:19 AM, Benoit Chesneau > > >> wrote: > >> > > >> > On Oct 22, 2012, at 4:59 PM, Guido van Rossum > wrote: > >> > > >> >> On Sun, Oct 21, 2012 at 10:30 PM, Steve Dower > >> >> wrote: > >> >> [Stuff about Futures and threads] > >> >> > >> >> Personally, I'm interested in designing a system, including an event > >> >> loop, where you can rely on the properties of cooperative scheduling > >> >> to avoid ever touching (OS) threading locks. I think such a system > >> >> should be "pure" and all interaction with threads should be mediated > >> >> by the event loop. (It's okay if this means that the implementation > of > >> >> the event loop must at some point acquire a threading lock.) The > >> >> Futures used by the tasks to coordinate amongst themselves should not > >> >> require locking -- they should themselves be able to rely on the > >> >> guarantees of the event loop not to invoke multiple callbacks in > >> >> parallel. > >> >> > >> >> IIUC you can do this on Windows with IOCP too, simply by only having > a > >> >> single thread reading events. > >> >> > >> > > >> > Maybe it is worth to have a look on libuv and the way it mixes threads > >> > and and event loop [1]. Libuv is one of the good event loop around > able to > >> > use IOCP and other events systems on other arch (kqueue, ?) and I was > >> > thinking when reading all the exchange around that it would perfectly > fit in > >> > our cases. Or at least something like it: > >> > > >> > - It provides a common api for IO watchers: read, write, writelines, > >> > readable, writeable that can probably be extend over remote systems > >> > - Have a job queue system for threds that is working mostly like the > >> > Futures but using the event loop > >> > > >> > In any case there is a pyuv binding [2] if some want to test. Even a > >> > twisted reactor [3] > >> > > >> > I myself toying with the idea of porting the Go concurrency model to > >> > Python [4] using greenlets and pyuv. Both the scheduler and the way > IOs are > >> > handled: > >> > > >> > - In Go all coroutines are independent from each others and can only > >> > communicate via channel. Which has the advantage to allows them to > run on > >> > different threads when one is blocking. In normal case they are mostly > >> > working like grrenlets on a single thread and are simply scheduled in > a > >> > round-robin way. (mostly like in stackless). On the difference that > >> > goroutines can be executed in parallel. When one is blocking another > thread > >> > will be created to handle other goroutines in the runnable queue. > >> > > >> > - For I/Os it exists a common api to all Connections and Listeners > (Conn > >> > & Listen classes) that generally ask on a poll server. This poll > server has > >> > for only task to register FDs and wake up the groutines that wait on > read or > >> > fd events. This this poll server is running in a blocking loop it is > >> > automatically let by the scheduler in a thread. This pol server could > be > >> > likely be replaced by an event loop if someone want. > >> > > >> > In my opinion the Go concurrency & memory model [5] could perfectly > fit > >> > in the Python world and I'm surprised none already spoke about it. > >> > > >> > In flower greenlets could probably be replaced by generators but i > like > >> > the API proposed by any coroutine pattern. I wonder if continulets [6] > >> > couldn't be ported in cpython to handle that? 
> >> > > >> > - beno?t > >> > > >> > > >> > [1] http://nikhilm.github.com/uvbook/threads.html & > >> > http://github.com/joyent/libuv > >> > [2] https://github.com/saghul/pyuv > >> > [3] https://github.com/saghul/twisted-pyuv > >> > [4] https://github.com/benoitc/flower > >> > [5] http://golang.org/ref/mem > >> > [6] http://doc.pypy.org/en/latest/stackless.html#continulets > >> > >> > >> > >> -- > >> --Guido van Rossum (python.org/~guido) > >> _______________________________________________ > >> Python-ideas mailing list > >> Python-ideas at python.org > >> http://mail.python.org/mailman/listinfo/python-ideas > > > > > > > > -- > --Guido van Rossum (python.org/~guido) > -------------- next part -------------- An HTML attachment was scrubbed... URL: From andrewfr_ice at yahoo.com Tue Oct 23 19:18:56 2012 From: andrewfr_ice at yahoo.com (Andrew Francis) Date: Tue, 23 Oct 2012 10:18:56 -0700 (PDT) Subject: [Python-ideas] yield from multiple iterables (was Re: The async API of the future: yield-from) In-Reply-To: References: Message-ID: <1351012736.29549.YahooMailNeo@web140702.mail.bf1.yahoo.com> Hi Greg: Message: 2 Date: Tue, 23 Oct 2012 12:48:39 +1300 From: Greg Ewing To: "python-ideas at python.org" Subject: Re: [Python-ideas] yield from multiple iterables (was Re: The ??? async API of the future: yield-from) Message-ID: <5085DB57.4010504 at canterbury.ac.nz> Content-Type: text/plain; charset=UTF-8; format=flowed >It does, in the sense that a continuation appears to the >Scheme programmer as a callable object. >The connection goes deeper as well. There's a style of >programming called "continuation-passing style", in which >nothing ever returns -- every function is passed another >function to be called with its result. In a language such >as Scheme that supports tail calls, you can use this style >extensively without fear of overflowing the call stack. >You're using this style whenever you chain callbacks >together using Futures or Deferreds. The callbacks don't >return values; instead, each callback arranges for another >callback to be called, passing it the result. There is a really nice Microsoft Research called "Cooperative Task Management without Manual Stackless Management."[1] In this paper, the authors introduce the term "stack ripping" to describe how asynchronous events with callbacks handle memory. I think this is a nice way to describe the fundamental differences between continuations and Twisted callbacks/deferred. Cheers, Andrew [1] http://research.microsoft.com/apps/pubs/default.aspx?id=74219 -------------- next part -------------- An HTML attachment was scrubbed... URL: From vinay_sajip at yahoo.co.uk Tue Oct 23 20:49:37 2012 From: vinay_sajip at yahoo.co.uk (Vinay Sajip) Date: Tue, 23 Oct 2012 18:49:37 +0000 (UTC) Subject: [Python-ideas] yield from multiple iterables (was Re: The async API of the future: yield-from) References: <507BC855.4070802@stackless.com> <507BE79D.1090100@stackless.com> <507C1661.5070206@stackless.com> <507CADE6.7050604@canterbury.ac.nz> <50814DDD.9070206@stackless.com> <50817D62.4040607@stackless.com> <5081F554.5090404@canterbury.ac.nz> Message-ID: Guido van Rossum writes: > > But does it let you use any Windows APIs? > It seems you can: https://github.com/AllenDang/w32 Quote from that page: "w32 is a wrapper of windows apis for the Go Programming Language. It wraps win32 apis to "Go style" to make them easier to use." 
Regards, Vinay Sajip From yselivanov.ml at gmail.com Tue Oct 23 21:33:58 2012 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Tue, 23 Oct 2012 15:33:58 -0400 Subject: [Python-ideas] Async API Message-ID: <71C95BE8-C0C7-45F5-A5E9-45A86EB3265B@gmail.com> Hello, First of all, sorry for the lengthy email. I've really tried to make it concise and I hope that I didn't fail entirely. At the beginning I want to describe the framework my company has been working on for several years, and on which we successfully deployed several web applications that are exposed to 1000s of users today. It survived multiple architecture reviews and redesigns, so I believe its design is worth to be introduced here. Secondly, I'm really glad that many advanced python developers find that use of "yield" is viable for async API, and that it even may be "The Right Way". Because when we started working on our framework that sounded nuts (and still sounds...) The framework ============= I'll describe here only the core functionality, not touching message bus & dispatch, protocols design, IO layers, etc. If someone gets interested - I can start another thread. The very core of the system is Scheduler. I prefer it to be called "Scheduler", and not "Reactor" or something else, because it's not just an event loop. It loops over micro-threads, where a micro-thread is a primitive that holds a pointer to the current running/suspended task. Task can be anything, from coroutine, to a Sleep command. A Task may be suspended because of IO waiting, a lock primitive, a timeout or something else. You can even write programs that are not IO-bound at all. To the code. So when you have:: @coroutine def foo(): bar_value = yield bar() defined, and then executed, 'foo' will send a Task object (wrapped around 'bar'), so that it will be executed in the foo's micro-thread. And because we return a Task, we can also do:: yield bar().with_timeout(1) or even (alike coroutines with Futures):: bar_value_promise = yield bar().with_timeout(1).async() [some code] bar_value = yield bar_value_promise So far there is nothing new. The need for something "new" emerged when we started to use it in "real world" applications. Consider you have some ORM, and the following piece of code:: topics = FE.select([ FE.publication_date, FE.body, FE.category, (FE.creator, [ (FE.creator.subject, [ (gpi, [ gpi.avatar ]) ]) ]) ]).filter(FE.publication_date < FE.publication_date.now(), FE.category == self.category) and later:: for topic in topics: print(topic.avatar.bucket.path, topic.category.name) Everything is lazily-loaded, so a DB query here can be run at virtually any point. When you iterate it pre-fetches objects, or addressing an attribute which wasn't told to be loaded, etc. The thing is that there is no way to express with 'yield' all that semantics. There is no 'for yield' statement, there is no pretty way of resolving an attribute with 'yield'. So even if you decide to write everything around you from scratch supporting 'yields', you still can't make a nice python API for some problems. Another problem is that "writing everything from scratch" thing. Nobody wants it. We always want to reuse, nobody wants to write an SMTP client from scratch, when there is a decent one available right in the stdlib. So the solution was simple. Incorporate greenlets. With greenlets we got a 'yield_()' function, that can be called from any coroutine, and from framework user's point of view it is the same as 'yield' statement. 
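A rough sketch of how such a greenlet-based yield_() can be built (not the framework's actual code; it only assumes the scheduler runs each micro-thread inside a greenlet that it spawned itself, so the scheduler is the current greenlet's parent):

    import greenlet

    def yield_(task):
        # Hand `task` back to the scheduler greenlet.  When the task's result
        # is ready, the scheduler switches back into this greenlet and
        # .switch() returns that result here -- just as `result = yield task`
        # would inside a generator-based coroutine.
        return greenlet.getcurrent().parent.switch(task)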
Now we were able to create a green-socket object, that looks as a plain stdlib socket, and fix our ORM. With it help we also were able to wrap lots and lots of existing libraries in a nice 'yield'-style design, without rewriting their internals. At the end - we have a hybrid approach. For 95% we use explicit 'yields', and for the rest 5% - well, we know that when we use ORM it may do some implicit 'yields', but that's OK. Now, with adopting greenlets a whole new optimization set of strategies became available. For instance, we can substitute 'yield' statements with 'yield_' command transparently by messing with opcodes, and by tweaking 'yield_' and reusing 'Task' objects we can achieve near regular-python-call performance, but with a tight control over our coroutines & micro-threads. And when PyPy finally adds support for Python 3, STM & JIT-able continulets, it would be very interesting to see how we can improve performance even further. Conclusion ========== The whole point of this text was to show, that pure 'yield' approach will not work. Moreover, I don't think it's time to pronounce "The Right Way" of 'yielding' and 'yield-fromming'. There are so many ways of doing that: with @coroutine decorator, plain generators, futures and Tasks, and perhaps more. And I honestly don't know with one is the right one. What we really need now (and I think Guido has already mentioned that) is a callback-based (Deferreds, Futures, plain callbacks) design that is easy to plug-and-play in any coroutine-framework. It has to be low-level and simple. Sort of WSGI for async frameworks ;) We also need to work on the stdlib, so that it is easy to inject a custom socket in any object. Ideally, by passing it in the constructor (as everybody hates monkey-patching.) With all that said, I'd be happy to dedicate a fair amount of my time to help with the design and implementation. Thank you! Yury From sam-pydeas at rushing.nightmare.com Tue Oct 23 23:25:21 2012 From: sam-pydeas at rushing.nightmare.com (Sam Rushing) Date: Tue, 23 Oct 2012 14:25:21 -0700 Subject: [Python-ideas] Async API In-Reply-To: <71C95BE8-C0C7-45F5-A5E9-45A86EB3265B@gmail.com> References: <71C95BE8-C0C7-45F5-A5E9-45A86EB3265B@gmail.com> Message-ID: <50870B41.6090504@rushing.nightmare.com> On 10/23/12 12:33 PM, Yury Selivanov wrote: > The whole point of this text was to show, that pure 'yield' approach will not > work. Moreover, I don't think it's time to pronounce "The Right Way" of 'yielding' > and 'yield-fromming'. There are so many ways of doing that: with @coroutine > decorator, plain generators, futures and Tasks, and perhaps more. And I honestly > don't know with one is the right one. [Thanks Yury for giving me a convenient place to jump in] I abandoned the callback-driven approach in 1999, after pushing it as far as I could handle. IMHO you can build single pieces in a relatively clean fashion, but you cannot easily combine those pieces together to build real systems. Over the past year I've played a little with some generator-based code (tlslite & bluelets for example), and I don't think they're much of an improvement. Whether it's decorated callbacks, generators, whatever, it all reminds me of impenetrable monad code in Haskell. Continuation-passing-style isn't something that humans should be expected to do, it's a trick for compilers. 8^) > What we really need now (and I think Guido has already mentioned that) is a > callback-based (Deferreds, Futures, plain callbacks) design that is easy to > plug-and-play in any coroutine-framework. 
It has to be low-level and simple. > Sort of WSGI for async frameworks ;) I've been trying to play catch-up since being told about this thread a couple of days ago. If I understand it correctly, 'yield-from' looks like it can help make generator-based-concurrency a little more sane by cutting back on endless chains of 'for x in ...: yield ...', right? That certainly sounds like an improvement, but does the generator nature of the API bubble all the way up to the top? Can you send an email with a function call? > We also need to work on the stdlib, so that it is easy to inject a custom socket > in any object. Ideally, by passing it in the constructor (as everybody hates > monkey-patching.) > I second this one. Having a way to [optionally] pass in a factory for sockets would help with portability, and would cut down on the temptation to monkey-patch. It'd be really great to use standard 'async' protocol implementations in a performant way... although I'm not sure how/if I can wedge such code into systems like shrapnel*, but it all starts with being able to pass in a socket-like object (or factory). -Sam (*) Since no one else has mentioned it yet, a tiny plug here for shrapnel: https://github.com/ironport/shrapnel -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 194 bytes Desc: OpenPGP digital signature URL: From benoitc at gunicorn.org Tue Oct 23 23:48:44 2012 From: benoitc at gunicorn.org (Benoit Chesneau) Date: Tue, 23 Oct 2012 23:48:44 +0200 Subject: [Python-ideas] yield from multiple iterables (was Re: The async API of the future: yield-from) In-Reply-To: <1351011081.93695.YahooMailNeo@web140703.mail.bf1.yahoo.com> References: <1351011081.93695.YahooMailNeo@web140703.mail.bf1.yahoo.com> Message-ID: <07271596-2300-473F-B381-7EA3A7DE6CA3@gunicorn.org> On Oct 23, 2012, at 6:51 PM, Andrew Francis wrote: > Hi Benoit and folks: > > >Message: 3 > >Date: Tue, 23 Oct 2012 09:19:59 +0200 > >From: Benoit Chesneau > >To: Guido van Rossum > >Cc: Python-Ideas > >Subject: Re: [Python-ideas] yield from multiple iterables (was Re: The > > async API of the future: yield-from) > >Message-ID: > >Content-Type: text/plain; charset=windows-1252 > > (I learnt about this mailing list from Christian Tismer's post in the Stackless mailing list and I am catching up) > > >I myself toying with the idea of porting the Go concurrency model to Python [4] using greenlets and pyuv. Both the scheduler >and the way IOs are handled: > > >- In Go all coroutines are independent from each others and can only communicate via channel. Which has the advantage to >allows them to run on different threads when one is blocking. In normal case they are mostly working like grrenlets on a single >thread and are simply scheduled in a round-robin way. (mostly like in stackless). On the difference that goroutines can be >executed in parallel. When one is blocking another thread will be created to handle other goroutines in the runnable queue. > > > What aspect of the Go concurrency model? Maybe you already know this but Go and Stackless Python share a common ancestor: Limbo. More specifically the way channels work. Indeed :) I would have say Plan 9 and tasks inside but right channnels are in limbo too. > > This may be tangential to the discussion but in the past, I have used the stackless.py module in conjunction with CPython and greenlets to rapidly prototype parts of Go's model that are not present in Stackless, i.e. the select (ALT) language feature. 
> Rob Pike and Russ Cox were really helpful in answering my questions. Newer stackless.py implementations use > continuelets so look for an older PyPy implementation. > > I have also prototyped a subset of Polyphonic C# join patterns. After I got the prototype running, I had an interesting discussion with the authors of "Scalable Join Patterns." Yes saw that. And actually some part of the Task code is based on stackless.py but using greenlets, Channels have been slightly modified to be thread-safe and support buffering. Did you release your code somewhere ? It could be interesting to put the experience further. > > For networking support, I run Twisted as a tasklet. There are a few tricks to make Stackless and Twisted co-operate. I plan to release a new version of flower this week. For now i am also running a libuv eventloop in a tasklet, but since the tasklet need to be blocking for performance, i am writing some new code to run the tasklet in its proper thread when needed. Not sure how it will go. Current implementation handle events when the scheduler come on the eventloop which isn't the more efficient way imo. Another thing to considers is also rust. Rust is using libuv and put the eventloop in its own task thread : http://dl.rust-lang.org/doc/0.4/std/uv_global_loop.html I find this idea quite elegant. Best, - beno?t > -------------- next part -------------- An HTML attachment was scrubbed... URL: From yselivanov.ml at gmail.com Wed Oct 24 00:05:52 2012 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Tue, 23 Oct 2012 18:05:52 -0400 Subject: [Python-ideas] Async API In-Reply-To: <50870B41.6090504@rushing.nightmare.com> References: <71C95BE8-C0C7-45F5-A5E9-45A86EB3265B@gmail.com> <50870B41.6090504@rushing.nightmare.com> Message-ID: <2AA1835A-3A23-47BD-BB3F-C0D054BA69B4@gmail.com> Sam, BTW, kudos for shrapnel! On 2012-10-23, at 5:25 PM, Sam Rushing wrote: [snip] > On 10/23/12 12:33 PM, Yury Selivanov wrote: >> >> What we really need now (and I think Guido has already mentioned that) is a >> callback-based (Deferreds, Futures, plain callbacks) design that is easy to >> plug-and-play in any coroutine-framework. It has to be low-level and simple. >> Sort of WSGI for async frameworks ;) > > I've been trying to play catch-up since being told about this thread a > couple of days ago. If I understand it correctly, 'yield-from' looks > like it can help make generator-based-concurrency a little more sane by > cutting back on endless chains of 'for x in ...: yield ...', right? > That certainly sounds like an improvement, but does the generator nature > of the API bubble all the way up to the top? Can you send an email with > a function call? Well, I guess so. Let's say, urllib is rewritten internally in async-style, exposing publicly its old API, like:: def urlopen(*args, **kwargs): return run_coro(urlopen_async, args, kwargs) where 'run_coro' takes care of setting up a Scheduler/event-loop and running yield-style or callback-style private code. So that 'urllib' is blocking, but there is an option of using 'urlopen_async' for those who need it. For basic library functions that will work. And that's already a huge win. But developing a complicated library will become twice as hard, as you'll need to maintain two versions of API - sync & async all the way through the code. 
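For concreteness, a minimal sketch of what that 'run_coro' helper could look like; Scheduler, spawn(), run_until_complete() and result() are made-up names standing in for whatever the framework actually provides:

    def run_coro(coro_func, args, kwargs):
        # Spin up a private scheduler, drive the generator-based coroutine to
        # completion while blocking the calling thread, then hand back its
        # result to the synchronous caller.
        scheduler = Scheduler()
        task = scheduler.spawn(coro_func(*args, **kwargs))
        scheduler.run_until_complete(task)
        return task.result()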
There is only one way to 'magically' make existing code both sync- & async- friendly--greenlets, but I think there is no chance for them (or stackless) to land in cpython in the foreseeable future (although it would be awesome.) BTW, why didn't you use greenlets in shrapnel and ended up with your own implementation? >> We also need to work on the stdlib, so that it is easy to inject a custom socket >> in any object. Ideally, by passing it in the constructor (as everybody hates >> monkey-patching.) >> > > I second this one. Having a way to [optionally] pass in a factory for > sockets would help with portability, and would cut down on the > temptation to monkey-patch. Great. Let's see - if nobody is opposed to this we can start with submitting patches :) Or is there a need for a separate small PEP? Thanks, Yury From sam-pydeas at rushing.nightmare.com Wed Oct 24 01:00:30 2012 From: sam-pydeas at rushing.nightmare.com (Sam Rushing) Date: Tue, 23 Oct 2012 16:00:30 -0700 Subject: [Python-ideas] Async API In-Reply-To: <2AA1835A-3A23-47BD-BB3F-C0D054BA69B4@gmail.com> References: <71C95BE8-C0C7-45F5-A5E9-45A86EB3265B@gmail.com> <50870B41.6090504@rushing.nightmare.com> <2AA1835A-3A23-47BD-BB3F-C0D054BA69B4@gmail.com> Message-ID: <5087218E.8090805@rushing.nightmare.com> On 10/23/12 3:05 PM, Yury Selivanov wrote: > Sam, > > BTW, kudos for shrapnel! Thanks! > > For basic library functions that will work. And that's already a huge win. > But developing a complicated library will become twice as hard, as you'll need > to maintain two versions of API - sync & async all the way through the code. This is really difficult, if you want to see a great example of trying to make all parties happy, look at Pika (an AMQP implementation). Actually this reminds me, it would be really great if there was a standardized with_timeout()API. It's better than adding timeout args to all the functions. I'm sure that systems like Twisted & gevent could also implement it (if they don't already have it): In shrapnel it is simply: coro.with_timeout (, , *args, **kwargs) Timeouts are caught thus: try: coro.with_timeout (...) except coro.TimeoutError: ... > There is only one way to 'magically' make existing code both sync- & async- > friendly--greenlets, but I think there is no chance for them (or stackless) to > land in cpython in the foreseeable future (although it would be awesome.) > > BTW, why didn't you use greenlets in shrapnel and ended up with your own > implementation? I think shrapnel predates greenlets... some of the core asm code for greenlets may have come from one of shrapnel's precursors at ironport... Unfortunately it took many years to get shrapnel open-sourced - I remember talking with Guido about it over lunch in ~2006. -Sam -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 194 bytes Desc: OpenPGP digital signature URL: From yselivanov.ml at gmail.com Wed Oct 24 01:30:35 2012 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Tue, 23 Oct 2012 19:30:35 -0400 Subject: [Python-ideas] Async API In-Reply-To: <5087218E.8090805@rushing.nightmare.com> References: <71C95BE8-C0C7-45F5-A5E9-45A86EB3265B@gmail.com> <50870B41.6090504@rushing.nightmare.com> <2AA1835A-3A23-47BD-BB3F-C0D054BA69B4@gmail.com> <5087218E.8090805@rushing.nightmare.com> Message-ID: On 2012-10-23, at 7:00 PM, Sam Rushing wrote: > On 10/23/12 3:05 PM, Yury Selivanov wrote: >> Sam, >> >> BTW, kudos for shrapnel! > Thanks! 
>> >> For basic library functions that will work. And that's already a huge win. >> But developing a complicated library will become twice as hard, as you'll need >> to maintain two versions of API - sync & async all the way through the code. > > This is really difficult, if you want to see a great example of trying > to make all parties happy, look at Pika (an AMQP implementation). Thanks, will take a look! > Actually this reminds me, it would be really great if there was a > standardized with_timeout()API. It's better than adding timeout args to > all the functions. I'm sure that systems like Twisted & gevent could > also implement it (if they don't already have it): > > In shrapnel it is simply: > > coro.with_timeout (, , *args, **kwargs) > > Timeouts are caught thus: > > try: > coro.with_timeout (...) > except coro.TimeoutError: > ... You're right--if we want to ship some "standard" async API in python, API for timeouts is a must. We will at least need to handle timeouts in async code in the stdlib, won't we... A question: How do you protect finally statements in shrapnel? If we have a following coroutine (greenlet style): def foo(): connection = open_connection() try: spam() finally: [some code] connection.close() What happens if you run 'foo.with_timeout(1)' and timeout occurs at "[some code]" point? Will you just abort 'foo', possibly preventing 'connection' from being closed? - Yury From sam-pydeas at rushing.nightmare.com Wed Oct 24 02:24:54 2012 From: sam-pydeas at rushing.nightmare.com (Sam Rushing) Date: Tue, 23 Oct 2012 17:24:54 -0700 Subject: [Python-ideas] Async API In-Reply-To: References: <71C95BE8-C0C7-45F5-A5E9-45A86EB3265B@gmail.com> <50870B41.6090504@rushing.nightmare.com> <2AA1835A-3A23-47BD-BB3F-C0D054BA69B4@gmail.com> <5087218E.8090805@rushing.nightmare.com> Message-ID: <50873556.4000001@rushing.nightmare.com> On 10/23/12 4:30 PM, Yury Selivanov wrote: > > How do you protect finally statements in shrapnel? If we have a following > coroutine (greenlet style): > > def foo(): > connection = open_connection() > try: > spam() > finally: > [some code] > connection.close() > > What happens if you run 'foo.with_timeout(1)' and timeout occurs at > "[some code]" point? Will you just abort 'foo', possibly preventing > 'connection' from being closed? > Timeouts are raised as normal exceptions - for exactly this reason. The interesting part of the implementation is keeping each with_timeout() call separate. If you have nested with_timeout() calls and the outer timeout goes off, it will skip the inner exception handler and fire only the outer one. In other words, the code for with_timeout() verifies that any timeouts propagating through it belong to it. https://github.com/ironport/shrapnel/blob/master/coro/_coro.pyx#L1126-1142 -Sam From greg.ewing at canterbury.ac.nz Wed Oct 24 02:24:49 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 24 Oct 2012 13:24:49 +1300 Subject: [Python-ideas] Async API In-Reply-To: References: <71C95BE8-C0C7-45F5-A5E9-45A86EB3265B@gmail.com> <50870B41.6090504@rushing.nightmare.com> <2AA1835A-3A23-47BD-BB3F-C0D054BA69B4@gmail.com> <5087218E.8090805@rushing.nightmare.com> Message-ID: <50873551.5040207@canterbury.ac.nz> Yury Selivanov wrote: > def foo(): > connection = open_connection() > try: > spam() > finally: > [some code] > connection.close() > > What happens if you run 'foo.with_timeout(1)' and timeout occurs at > "[some code]" point? I would say that vital cleanup code probably shouldn't do anything that could block. 
If you really need to do that, it should be protected by a finally clause of its own: def foo(): connection = open_connection() try: spam() finally: try: [some code] finally: connection.close() -- Greg From tismer at stackless.com Wed Oct 24 02:24:49 2012 From: tismer at stackless.com (Christian Tismer) Date: Wed, 24 Oct 2012 02:24:49 +0200 Subject: [Python-ideas] yield from multiple iterables (was Re: The async API of the future: yield-from) In-Reply-To: References: <508385F2.5000707@canterbury.ac.nz> <5085C99C.3060905@canterbury.ac.nz> Message-ID: <50873551.60000@stackless.com> On 23.10.12 00:35, Guido van Rossum wrote: > On Mon, Oct 22, 2012 at 3:33 PM, Greg Ewing wrote: >> Guido van Rossum wrote: >> >>> (Aside: please don't use 'continuation' for 'task'. The use of this >>> term in Scheme has forever tainted the word for me.) >> It has a broader meaning than the one in Scheme; essentially >> it's a synonym for "callback". > (Off-topic:) But does that meaning apply to Scheme? If so, I wish > someone would have told me 15 years ago... > As used quite often, the definition is more like "half a coroutine", that means the part that can resume it at some point. Sticking two together, you get a coroutine (tasklet, greenlet etc). The are one-shot continuations, they are gone after resuming. The meaning in Scheme is much weider, and you were right to be scared. In Scheme, these beasts survive their reactivation as a constant. My big design error in 1998 was to implement exactly those full continuations for Python. I'm scared myself when I recall that ... ;-) ciao - chris -- Christian Tismer :^) Software Consulting : Have a break! Take a ride on Python's Karl-Liebknecht-Str. 121 : *Starship* http://starship.python.net/ 14482 Potsdam : PGP key -> http://pgp.uni-mainz.de phone +49 173 24 18 776 fax +49 (30) 700143-0023 PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04 whom do you want to sponsor today? http://www.stackless.com/ From tismer at stackless.com Wed Oct 24 02:43:38 2012 From: tismer at stackless.com (Christian Tismer) Date: Wed, 24 Oct 2012 02:43:38 +0200 Subject: [Python-ideas] Async API In-Reply-To: <5087218E.8090805@rushing.nightmare.com> References: <71C95BE8-C0C7-45F5-A5E9-45A86EB3265B@gmail.com> <50870B41.6090504@rushing.nightmare.com> <2AA1835A-3A23-47BD-BB3F-C0D054BA69B4@gmail.com> <5087218E.8090805@rushing.nightmare.com> Message-ID: <508739BA.9080101@stackless.com> On 24.10.12 01:00, Sam Rushing wrote: > On 10/23/12 3:05 PM, Yury Selivanov wrote: > ... >> There is only one way to 'magically' make existing code both sync- & async- >> friendly--greenlets, but I think there is no chance for them (or stackless) to >> land in cpython in the foreseeable future (although it would be awesome.) >> >> BTW, why didn't you use greenlets in shrapnel and ended up with your own >> implementation? > I think shrapnel predates greenlets... some of the core asm code for > greenlets may have come from one of shrapnel's precursors at ironport... > Unfortunately it took many years to get shrapnel open-sourced - I > remember talking with Guido about it over lunch in ~2006. > Hi Sam, greenlets were developed in 2004 by Armin Rigo, on the first (and maybe only) Stackless sprint here in Berlin. The greenlet asm code was ripped out of Stackless and slightly improved, but has the same old stack-slicing idea. cheers - chris -- Christian Tismer :^) Software Consulting : Have a break! Take a ride on Python's Karl-Liebknecht-Str. 
121 : *Starship* http://starship.python.net/ 14482 Potsdam : PGP key -> http://pgp.uni-mainz.de phone +49 173 24 18 776 fax +49 (30) 700143-0023 PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04 whom do you want to sponsor today? http://www.stackless.com/ From yselivanov.ml at gmail.com Wed Oct 24 02:52:52 2012 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Tue, 23 Oct 2012 20:52:52 -0400 Subject: [Python-ideas] Async API In-Reply-To: <50873551.5040207@canterbury.ac.nz> References: <71C95BE8-C0C7-45F5-A5E9-45A86EB3265B@gmail.com> <50870B41.6090504@rushing.nightmare.com> <2AA1835A-3A23-47BD-BB3F-C0D054BA69B4@gmail.com> <5087218E.8090805@rushing.nightmare.com> <50873551.5040207@canterbury.ac.nz> Message-ID: Hi Greg, On 2012-10-23, at 8:24 PM, Greg Ewing wrote: > Yury Selivanov wrote: > >> def foo(): >> connection = open_connection() >> try: >> spam() >> finally: >> [some code] >> connection.close() >> What happens if you run 'foo.with_timeout(1)' and timeout occurs at "[some code]" point? > > I would say that vital cleanup code probably shouldn't do > anything that could block. If you really need to do that, > it should be protected by a finally clause of its own: > > def foo(): > connection = open_connection() > try: > spam() > finally: > try: > [some code] > finally: > connection.close() Please take a look at the problem definition in PEP 419. It's not about try..finally nesting, it's about Scheduler being aware that a coroutine is in its 'finally' block and thus shouldn't be interrupted at the moment (a problem that doesn't exist in a non-coroutine world). Speaking about your solution, imagine if you have three connections to close, what will you write? finally: try: c1.close() # coroutine call finally: try: c2.close() # coroutine call finally: c3.close() # coroutine call But if you somehow make scheduler aware of 'finally' block, through PEP 419 (which I don't like), or like in my framework where we inline special code in finally statement by modifying coroutine opcodes (which I don't like too), you can simply write:: finally: c1.close() c2.close() c3.close() And scheduler will gladly wait until finally is over. And the code snippet above is something, that is familiar to every user of python--nobody expects code in the finally section to be interrupted from the *outside* world. If we fail to guarantee 'finally' block safety, then coroutine-style programming is going to be much tougher. Or we have to abandon timeouts and coroutines interruptions. So eventually, we'll need to figure out the best mechanism/approach for this. Now, I don't think it's the right moment to shift discussion into this particular problem, but I would rather like to bring up the point, that implementing 'yield'-style coroutines is a very hard thing, and I'm not sure that we should implement them in 3.4. Setting guidelines and standard protocols, adding socket-factories support where necessary in the stdlib is a better approach (in my humble opinion.) 
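To make the socket-factory idea concrete, the pattern is simply an optional constructor argument that is called in place of socket.socket(); a hypothetical client class, not an existing stdlib API:

    import socket

    class LineClient:
        def __init__(self, host, port, socket_factory=socket.socket):
            # The default behaves exactly like today's blocking code; an async
            # framework passes its own factory that returns a "green" socket
            # object with the same interface.
            self._sock = socket_factory(socket.AF_INET, socket.SOCK_STREAM)
            self._sock.connect((host, port))

        def send_line(self, line):
            self._sock.sendall(line.encode('utf-8') + b'\r\n')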
- Yury From sam-pydeas at rushing.nightmare.com Wed Oct 24 02:58:16 2012 From: sam-pydeas at rushing.nightmare.com (Sam Rushing) Date: Tue, 23 Oct 2012 17:58:16 -0700 Subject: [Python-ideas] yield from multiple iterables (was Re: The async API of the future: yield-from) In-Reply-To: <50873551.60000@stackless.com> References: <5085C99C.3060905@canterbury.ac.nz> <50873551.60000@stackless.com> Message-ID: <50873D28.10401@rushing.nightmare.com> On 10/23/12 5:24 PM, Christian Tismer wrote: > > As used quite often, the definition is more like "half a coroutine", > that means the part that can resume it at some point. > Sticking two together, you get a coroutine (tasklet, greenlet etc). > The are one-shot continuations, they are gone after resuming. > > The meaning in Scheme is much weider, and you were right to be scared. > In Scheme, these beasts survive their reactivation as a constant. > My big design error in 1998 was to implement exactly those full > continuations for Python. > > I'm scared myself when I recall that ... ;-) > Come on Christian, take the red pill and see how far down the rabbit hole goes... 8^) https://github.com/samrushing/irken-compiler I never noticed before, but there really are two different meanings to 'continuation': 1) in the phrase 'continuation-passing style', it means a 'callback' of sorts. 2) as a separate term, it means an object that represents the future of a computation. Like Greg said, you can apply the CPS transformation to any bit of code (or write it that way from the start), and when you do you might be tempted to refer to your callbacks as 'continuations'. -Sam From Steve.Dower at microsoft.com Wed Oct 24 03:53:52 2012 From: Steve.Dower at microsoft.com (Steve Dower) Date: Wed, 24 Oct 2012 01:53:52 +0000 Subject: [Python-ideas] yield from multiple iterables (was Re: The async API of the future: yield-from) In-Reply-To: <50873D28.10401@rushing.nightmare.com> References: <5085C99C.3060905@canterbury.ac.nz> <50873551.60000@stackless.com>,<50873D28.10401@rushing.nightmare.com> Message-ID: Since I was the one to first mention the term 'continuation' in this discussion, I'll clarify that I meant it as the "'callback' of sorts", and specifically in the situation where the person writing it does not realise that it is a callback. For example: @async def my_func(): # part a x = yield y # part b Part B is the continuation here - the piece of code that continues after 'y' completes. There are various other pieces involved (a callback and a generator, and possibly others, depending on the implementation) so rather than muddying the waters with adjectives I muddied the waters with a noun. "The rest of the task" is close enough (when used in context) that I'm happy to stick to that. "Callback" is an implementation detail IMO, and not one that is necessary to leak through our abstraction. (I also didn't realise people were so traumatised by the C-word, or I would have picked another one. Add this to the list of reasons to not learn functional programming... 
:) ) From sam-pydeas at rushing.nightmare.com Wed Oct 24 06:21:47 2012 From: sam-pydeas at rushing.nightmare.com (Sam Rushing) Date: Tue, 23 Oct 2012 21:21:47 -0700 Subject: [Python-ideas] Async API In-Reply-To: <508739BA.9080101@stackless.com> References: <71C95BE8-C0C7-45F5-A5E9-45A86EB3265B@gmail.com> <50870B41.6090504@rushing.nightmare.com> <2AA1835A-3A23-47BD-BB3F-C0D054BA69B4@gmail.com> <5087218E.8090805@rushing.nightmare.com> <508739BA.9080101@stackless.com> Message-ID: <50876CDB.50500@rushing.nightmare.com> On 10/23/12 5:43 PM, Christian Tismer wrote: > > greenlets were developed in 2004 by Armin Rigo, on the first > (and maybe only) Stackless sprint here in Berlin. > The greenlet asm code was ripped out of Stackless and slightly > improved, but has the same old stack-slicing idea. > Ah, ok. I remember talking with you at the 2005 PyCon about my two-stack solution*, but don't remember if anything came of it. Do greenlets use a single stack? -Sam (*) nothing to do with Israel From greg.ewing at canterbury.ac.nz Wed Oct 24 07:06:36 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 24 Oct 2012 18:06:36 +1300 Subject: [Python-ideas] yield from multiple iterables (was Re: The async API of the future: yield-from) In-Reply-To: <50873D28.10401@rushing.nightmare.com> References: <5085C99C.3060905@canterbury.ac.nz> <50873551.60000@stackless.com> <50873D28.10401@rushing.nightmare.com> Message-ID: <5087775C.2020000@canterbury.ac.nz> Sam Rushing wrote: > 1) in the phrase 'continuation-passing style', it means a 'callback' of > sorts. > 2) as a separate term, it means an object that represents the future of > a computation. They're not really different things. When you call a continuation function in a continuation-passing style program, you're effectively invoking *all* of the rest of the computation, not just the part represented by that function. This becomes particularly clear if you're able to make the continuation calls using tail calls. Then it's not so much a "callback" as a "callforward". Nothing ever returns, so forward is the only way to go! -- Greg From andrewfr_ice at yahoo.com Wed Oct 24 19:03:09 2012 From: andrewfr_ice at yahoo.com (Andrew Francis) Date: Wed, 24 Oct 2012 10:03:09 -0700 (PDT) Subject: [Python-ideas] yield from multiple iterables (was Re: The async API of the future: yield-from) In-Reply-To: <07271596-2300-473F-B381-7EA3A7DE6CA3@gunicorn.org> References: <1351011081.93695.YahooMailNeo@web140703.mail.bf1.yahoo.com> <07271596-2300-473F-B381-7EA3A7DE6CA3@gunicorn.org> Message-ID: <1351098189.40095.YahooMailNeo@web140703.mail.bf1.yahoo.com> Hi Benoit: ________________________________ From: Benoit Chesneau To: Andrew Francis Cc: "python-ideas at python.org" Sent: Tuesday, October 23, 2012 5:48 PM Subject: Re: [Python-ideas] yield from multiple iterables (was Re: The async API of the future: yield-from) On Oct 23, 2012, at 6:51 PM, Andrew Francis wrote: AF>This may be tangential to the discussion but in the past, I have used the stackless.py module in conjunction with AF>CPython and greenlets to rapidly?prototype parts of Go's model that are not present in Stackless, i.e. the select (ALT) AF>language feature.?Rob Pike and Russ?Cox were really helpful in answering my questions. Newer stackless.py AF>implementations use?continuelets so look for an older PyPy implementation.? AF>I have also prototyped a subset of Polyphonic C# join patterns. ?After I got the prototype running, I had an interesting AF>discussion with the authors of "Scalable Join Patterns." 
>Yes saw that. And actually some part of the Task code is based on stackless.py ?but using greenlets, >Channels have been slightly modified to be thread-safe and support buffering. Did you release your code >somewhere ? It could be interesting to put the experience further. You may be mistaking my work with someone else. I didn't add buffering but that t is relatively easy to do without altering Stackless Python's internals. However I believe that synchronous channels with buffering is a simple and powerful concurrency model. Go's implementers got it right. John Reppy (currently a NSF director) talks about synchronous channel's power in a Concurrent ML book.? If ?you go to to the Stackless repository example page http://code.google.com/p/stacklessexamples/wiki/StacklessExamples you will find the code for a modified stackless.py that implements Go's select statement.? Since I am giving a talk in Toronto soon, I will soon release a new version of the join pattern version with documentation and examples. The code is about a year old and I have learnt new things. ?I can mail you an archive and you are free to play with it and ask questions.? Since this is somewhat off-topic, the reason I mention all this is that if you want to experiment with a Go style system, I think it easiest to work from something like stackless.py and greenlets than start from scratch.? Cheers, Andrew -------------- next part -------------- An HTML attachment was scrubbed... URL: From benoitc at gunicorn.org Wed Oct 24 23:54:17 2012 From: benoitc at gunicorn.org (Benoit Chesneau) Date: Wed, 24 Oct 2012 23:54:17 +0200 Subject: [Python-ideas] yield from multiple iterables (was Re: The async API of the future: yield-from) In-Reply-To: <1351098189.40095.YahooMailNeo@web140703.mail.bf1.yahoo.com> References: <1351011081.93695.YahooMailNeo@web140703.mail.bf1.yahoo.com> <07271596-2300-473F-B381-7EA3A7DE6CA3@gunicorn.org> <1351098189.40095.YahooMailNeo@web140703.mail.bf1.yahoo.com> Message-ID: <8B2DC343-3C7E-4B0D-9074-854148AAB0AD@gunicorn.org> On Oct 24, 2012, at 7:03 PM, Andrew Francis wrote: > Hi Benoit: > > From: Benoit Chesneau > To: Andrew Francis > Cc: "python-ideas at python.org" > Sent: Tuesday, October 23, 2012 5:48 PM > Subject: Re: [Python-ideas] yield from multiple iterables (was Re: The async API of the future: yield-from) > > > On Oct 23, 2012, at 6:51 PM, Andrew Francis wrote: >> AF>This may be tangential to the discussion but in the past, I have used the stackless.py module in conjunction with AF>CPython and greenlets to rapidly prototype parts of Go's model that are not present in Stackless, i.e. the select (ALT) AF>language feature. Rob Pike and Russ Cox were really helpful in answering my questions. Newer stackless.py AF>implementations use continuelets so look for an older PyPy implementation. >> AF>I have also prototyped a subset of Polyphonic C# join patterns. After I got the prototype running, I had an interesting AF>discussion with the authors of "Scalable Join Patterns." > > >Yes saw that. And actually some part of the Task code is based on stackless.py but using greenlets, >Channels have been slightly modified to be thread-safe and support buffering. Did you release your code >somewhere ? It could be interesting to put the experience further. > > You may be mistaking my work with someone else. Oh was just saying i made this change. > > I didn't add buffering but that t is relatively easy to do without altering Stackless Python's internals. 
However I believe that synchronous channels with buffering is a simple and powerful concurrency model. Go's implementers got it right. John Reppy (currently a NSF director) talks about synchronous channel's power in a Concurrent ML book. > > If you go to to the Stackless repository example page > > http://code.google.com/p/stacklessexamples/wiki/StacklessExamples > > you will find the code for a modified stackless.py that implements Go's select statement. Thanks for the link. > > Since I am giving a talk in Toronto soon, I will soon release a new version of the join pattern version with documentation and examples. The code is about a year old and I have learnt new things. I can mail you an archive and you are free to play with it and ask questions. > > Since this is somewhat off-topic, the reason I mention all this is that if you want to experiment with a Go style system, I think it easiest to work from something like stackless.py and greenlets than start from scratch. > > Cheers, > Andrew > Right. Actually flower is working well for simle purpose. My goal is more about testing new ideas about concurrency and async handling on python. Tomorrow I will push a new branch using Futures and libuv in its own thread. - beno?t > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From greg.ewing at canterbury.ac.nz Wed Oct 24 23:30:04 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 25 Oct 2012 10:30:04 +1300 Subject: [Python-ideas] Async API In-Reply-To: References: <71C95BE8-C0C7-45F5-A5E9-45A86EB3265B@gmail.com> <50870B41.6090504@rushing.nightmare.com> <2AA1835A-3A23-47BD-BB3F-C0D054BA69B4@gmail.com> <5087218E.8090805@rushing.nightmare.com> <50873551.5040207@canterbury.ac.nz> Message-ID: <50885DDC.4050108@canterbury.ac.nz> Yury Selivanov wrote: > It's not about try..finally nesting, it's about Scheduler being aware > that a coroutine is in its 'finally' block and thus shouldn't be interrupted > at the moment It would be a bad idea to make a close() method, or anything else that might be needed for cleanup purposes, be a 'yield from' call. If it's an ordinary function, it can't be interrupted in the world we're talking about, so the PEP 419 problem doesn't apply. If I were feeling in a radical mood, I might go as far as suggesting that 'yield' and 'yield from' be syntactically forbidden inside a finally clause. That would force you to design your cleanup code to be safe from interruptions. -- Greg From guido at python.org Thu Oct 25 00:43:27 2012 From: guido at python.org (Guido van Rossum) Date: Wed, 24 Oct 2012 15:43:27 -0700 Subject: [Python-ideas] Async API In-Reply-To: <50885DDC.4050108@canterbury.ac.nz> References: <71C95BE8-C0C7-45F5-A5E9-45A86EB3265B@gmail.com> <50870B41.6090504@rushing.nightmare.com> <2AA1835A-3A23-47BD-BB3F-C0D054BA69B4@gmail.com> <5087218E.8090805@rushing.nightmare.com> <50873551.5040207@canterbury.ac.nz> <50885DDC.4050108@canterbury.ac.nz> Message-ID: On Wed, Oct 24, 2012 at 2:30 PM, Greg Ewing wrote: > Yury Selivanov wrote: > >> It's not about try..finally nesting, it's about Scheduler being aware >> that a coroutine is in its 'finally' block and thus shouldn't be >> interrupted >> at the moment > > > It would be a bad idea to make a close() method, or anything else > that might be needed for cleanup purposes, be a 'yield from' call. > If it's an ordinary function, it can't be interrupted in the world > we're talking about, so the PEP 419 problem doesn't apply. 
> > If I were feeling in a radical mood, I might go as far as suggesting > that 'yield' and 'yield from' be syntactically forbidden inside > a finally clause. That would force you to design your cleanup > code to be safe from interruptions. What's the problem with just letting the cleanup take as long as it wants to and do whatever it wants? That's how try/finally works in regular Python code. -- --Guido van Rossum (python.org/~guido) From yselivanov.ml at gmail.com Thu Oct 25 00:47:02 2012 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Wed, 24 Oct 2012 18:47:02 -0400 Subject: [Python-ideas] Async API In-Reply-To: <50885DDC.4050108@canterbury.ac.nz> References: <71C95BE8-C0C7-45F5-A5E9-45A86EB3265B@gmail.com> <50870B41.6090504@rushing.nightmare.com> <2AA1835A-3A23-47BD-BB3F-C0D054BA69B4@gmail.com> <5087218E.8090805@rushing.nightmare.com> <50873551.5040207@canterbury.ac.nz> <50885DDC.4050108@canterbury.ac.nz> Message-ID: Greg, On 2012-10-24, at 5:30 PM, Greg Ewing wrote: > Yury Selivanov wrote: > >> It's not about try..finally nesting, it's about Scheduler being aware >> that a coroutine is in its 'finally' block and thus shouldn't be interrupted >> at the moment > > It would be a bad idea to make a close() method, or anything else > that might be needed for cleanup purposes, be a 'yield from' call. > If it's an ordinary function, it can't be interrupted in the world > we're talking about, so the PEP 419 problem doesn't apply. > > If I were feeling in a radical mood, I might go as far as suggesting > that 'yield' and 'yield from' be syntactically forbidden inside > a finally clause. That would force you to design your cleanup > code to be safe from interruptions. I'm not sure it would be a good idea... Cleanup code for a DB connection *will* need to run queries to the database (at least in some circumstances). And we can't make them blocking. - Yury From yselivanov.ml at gmail.com Thu Oct 25 01:03:17 2012 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Wed, 24 Oct 2012 19:03:17 -0400 Subject: [Python-ideas] Async API In-Reply-To: References: <71C95BE8-C0C7-45F5-A5E9-45A86EB3265B@gmail.com> <50870B41.6090504@rushing.nightmare.com> <2AA1835A-3A23-47BD-BB3F-C0D054BA69B4@gmail.com> <5087218E.8090805@rushing.nightmare.com> <50873551.5040207@canterbury.ac.nz> <50885DDC.4050108@canterbury.ac.nz> Message-ID: <80A26DE4-9B7C-4C27-B8F2-68E25CC2B14A@gmail.com> Hi Guido, On 2012-10-24, at 6:43 PM, Guido van Rossum wrote: > On Wed, Oct 24, 2012 at 2:30 PM, Greg Ewing wrote: >> Yury Selivanov wrote: >> >>> It's not about try..finally nesting, it's about Scheduler being aware >>> that a coroutine is in its 'finally' block and thus shouldn't be >>> interrupted >>> at the moment >> >> >> It would be a bad idea to make a close() method, or anything else >> that might be needed for cleanup purposes, be a 'yield from' call. >> If it's an ordinary function, it can't be interrupted in the world >> we're talking about, so the PEP 419 problem doesn't apply. >> >> If I were feeling in a radical mood, I might go as far as suggesting >> that 'yield' and 'yield from' be syntactically forbidden inside >> a finally clause. That would force you to design your cleanup >> code to be safe from interruptions. > > What's the problem with just letting the cleanup take as long as it > wants to and do whatever it wants? That's how try/finally works in > regular Python code. The problem appears when you add timeouts support. 
Let me show you an abstract example (I won't use yield_froms, but I'm sure that the problem is the same with them): @coroutine def fetch_comments(app): session = yield app.new_session() try: return (yield session.query(...)) finally: yield session.close() and now we execute that with: #: Get a list of comments; throw a TimeoutError if it #: takes more than 1 second comments = yield fetch_comments(app).with_timeout(1.0) Now, scheduler starts with 'fetch_comments', then executes 'new_session', then executes 'session.query' in a round-robin fashion. Imagine, that database query took a bit less than a second to execute, scheduler pushes the result in coroutine, and then a timeout event occurs. So scheduler throws a 'TimeoutError' in the coroutine, thus preventing the 'session.close' to be executed. There is no way for a scheduler to understand, that there is no need in pushing the exception right now, as the coroutine is in its finally block. And this situation is a pretty common when you have such timeouts mechanism in place and widely used. - Yury From guido at python.org Thu Oct 25 01:12:00 2012 From: guido at python.org (Guido van Rossum) Date: Wed, 24 Oct 2012 16:12:00 -0700 Subject: [Python-ideas] Async API In-Reply-To: <80A26DE4-9B7C-4C27-B8F2-68E25CC2B14A@gmail.com> References: <71C95BE8-C0C7-45F5-A5E9-45A86EB3265B@gmail.com> <50870B41.6090504@rushing.nightmare.com> <2AA1835A-3A23-47BD-BB3F-C0D054BA69B4@gmail.com> <5087218E.8090805@rushing.nightmare.com> <50873551.5040207@canterbury.ac.nz> <50885DDC.4050108@canterbury.ac.nz> <80A26DE4-9B7C-4C27-B8F2-68E25CC2B14A@gmail.com> Message-ID: On Wed, Oct 24, 2012 at 4:03 PM, Yury Selivanov wrote: > Hi Guido, > > On 2012-10-24, at 6:43 PM, Guido van Rossum wrote: >> What's the problem with just letting the cleanup take as long as it >> wants to and do whatever it wants? That's how try/finally works in >> regular Python code. > The problem appears when you add timeouts support. > > Let me show you an abstract example (I won't use yield_froms, but I'm > sure that the problem is the same with them): > > @coroutine > def fetch_comments(app): > session = yield app.new_session() > try: > return (yield session.query(...)) > finally: > yield session.close() > > and now we execute that with: > > #: Get a list of comments; throw a TimeoutError if it > #: takes more than 1 second > comments = yield fetch_comments(app).with_timeout(1.0) > > Now, scheduler starts with 'fetch_comments', then executes > 'new_session', then executes 'session.query' in a round-robin fashion. > > Imagine, that database query took a bit less than a second to execute, > scheduler pushes the result in coroutine, and then a timeout event occurs. > So scheduler throws a 'TimeoutError' in the coroutine, thus preventing > the 'session.close' to be executed. There is no way for a scheduler to > understand, that there is no need in pushing the exception right now, > as the coroutine is in its finally block. > > And this situation is a pretty common when you have such timeouts > mechanism in place and widely used. Ok, I can understand. But still, this is a problem with timeouts in general, not just with timeouts in a yield-based environment. How does e.g. Twisted deal with this? 
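For reference, the mechanism that creates this race is simply that the scheduler resumes the suspended generator by throwing into it, roughly as in the sketch below; call_later(), done() and gen are assumed names, not any particular framework's API:

    class TimeoutError(Exception):
        pass

    def add_timeout(scheduler, task, seconds):
        # Arm a timer; if it fires before the task has finished, raise
        # TimeoutError at whatever yield the task is currently suspended on --
        # possibly one inside a finally block.
        def on_timeout():
            if not task.done():
                task.gen.throw(TimeoutError())
        scheduler.call_later(seconds, on_timeout)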
As a work-around, I could imagine some kind of with-statement that tells the scheduler we're already in the finally clause (it could still send you a timeout if your cleanup takes way too long): try: yield finally: with protect_finally(): yield Of course this could be abused, but at your own risk -- the scheduler only gives you a fixed amount of extra time and then it's quits. -- --Guido van Rossum (python.org/~guido) From Steve.Dower at microsoft.com Thu Oct 25 01:25:15 2012 From: Steve.Dower at microsoft.com (Steve Dower) Date: Wed, 24 Oct 2012 23:25:15 +0000 Subject: [Python-ideas] Async API In-Reply-To: References: <71C95BE8-C0C7-45F5-A5E9-45A86EB3265B@gmail.com> <50870B41.6090504@rushing.nightmare.com> <2AA1835A-3A23-47BD-BB3F-C0D054BA69B4@gmail.com> <5087218E.8090805@rushing.nightmare.com> <50873551.5040207@canterbury.ac.nz> <50885DDC.4050108@canterbury.ac.nz> <80A26DE4-9B7C-4C27-B8F2-68E25CC2B14A@gmail.com> Message-ID: >On Wed, Oct 24, 2012 at 4:03 PM, Yury Selivanov wrote: >> Hi Guido, >> >> On 2012-10-24, at 6:43 PM, Guido van Rossum wrote: >>> What's the problem with just letting the cleanup take as long as it >>> wants to and do whatever it wants? That's how try/finally works in >>> regular Python code. > >> The problem appears when you add timeouts support. >> >> Let me show you an abstract example (I won't use yield_froms, but I'm >> sure that the problem is the same with them): >> >> @coroutine >> def fetch_comments(app): >> session = yield app.new_session() >> try: >> return (yield session.query(...)) >> finally: >> yield session.close() >> >> and now we execute that with: >> >> #: Get a list of comments; throw a TimeoutError if it >> #: takes more than 1 second >> comments = yield fetch_comments(app).with_timeout(1.0) >> >> Now, scheduler starts with 'fetch_comments', then executes >> 'new_session', then executes 'session.query' in a round-robin fashion. >> >> Imagine, that database query took a bit less than a second to execute, >> scheduler pushes the result in coroutine, and then a timeout event occurs. >> So scheduler throws a 'TimeoutError' in the coroutine, thus preventing >> the 'session.close' to be executed. There is no way for a scheduler >> to understand, that there is no need in pushing the exception right >> now, as the coroutine is in its finally block. >> >> And this situation is a pretty common when you have such timeouts >> mechanism in place and widely used. > >Ok, I can understand. But still, this is a problem with timeouts in general, not just with timeouts in a yield-based environment. How does e.g. Twisted deal with this? > >As a work-around, I could imagine some kind of with-statement that tells the scheduler we're already in the finally clause (it could still send you a timeout if your cleanup takes way too long): > >try: > yield >finally: > with protect_finally(): > yield > >Of course this could be abused, but at your own risk -- the scheduler only gives you a fixed amount of extra time and then it's quits. > > Could another workaround be to spawn the cleanup code without yielding - in effect saying "go and do this, but don't come back"? Then there is nowhere for the scheduler to throw the exception. I ask because this falls out naturally with my implementation (code is coming, but work is taking priority right now): "do_cleanup()" instead of "yield do_cleanup()". I haven't tried it in this context yet, so no idea whether it works, but I don't see why it wouldn't. 
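Concretely, in the @async style that idea would look something like the following sketch, where do_work() and do_cleanup() are placeholder coroutines:

    @async
    def my_func():
        try:
            result = yield do_work()
        finally:
            # Spawned, but *not* yielded: this frame never suspends again, so
            # there is no suspension point left at which the scheduler could
            # throw a TimeoutError while the cleanup runs on its own.
            do_cleanup()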
In a system without the @async decorator you'd need a "scheduler.current.spawn(do_cleanup)" instead of yield [from]s, but it can still be done. Cheers, Steve From yselivanov.ml at gmail.com Thu Oct 25 01:26:32 2012 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Wed, 24 Oct 2012 19:26:32 -0400 Subject: [Python-ideas] Async API In-Reply-To: References: <71C95BE8-C0C7-45F5-A5E9-45A86EB3265B@gmail.com> <50870B41.6090504@rushing.nightmare.com> <2AA1835A-3A23-47BD-BB3F-C0D054BA69B4@gmail.com> <5087218E.8090805@rushing.nightmare.com> <50873551.5040207@canterbury.ac.nz> <50885DDC.4050108@canterbury.ac.nz> <80A26DE4-9B7C-4C27-B8F2-68E25CC2B14A@gmail.com> Message-ID: <72ACCA97-8FF8-4F6B-85B7-7DB9CDCB6DBC@gmail.com> On 2012-10-24, at 7:12 PM, Guido van Rossum wrote: > On Wed, Oct 24, 2012 at 4:03 PM, Yury Selivanov wrote: >> Hi Guido, >> >> On 2012-10-24, at 6:43 PM, Guido van Rossum wrote: >>> What's the problem with just letting the cleanup take as long as it >>> wants to and do whatever it wants? That's how try/finally works in >>> regular Python code. > >> The problem appears when you add timeouts support. >> >> Let me show you an abstract example (I won't use yield_froms, but I'm >> sure that the problem is the same with them): >> >> @coroutine >> def fetch_comments(app): >> session = yield app.new_session() >> try: >> return (yield session.query(...)) >> finally: >> yield session.close() >> >> and now we execute that with: >> >> #: Get a list of comments; throw a TimeoutError if it >> #: takes more than 1 second >> comments = yield fetch_comments(app).with_timeout(1.0) >> >> Now, scheduler starts with 'fetch_comments', then executes >> 'new_session', then executes 'session.query' in a round-robin fashion. >> >> Imagine, that database query took a bit less than a second to execute, >> scheduler pushes the result in coroutine, and then a timeout event occurs. >> So scheduler throws a 'TimeoutError' in the coroutine, thus preventing >> the 'session.close' to be executed. There is no way for a scheduler to >> understand, that there is no need in pushing the exception right now, >> as the coroutine is in its finally block. >> >> And this situation is a pretty common when you have such timeouts >> mechanism in place and widely used. > > Ok, I can understand. But still, this is a problem with timeouts in > general, not just with timeouts in a yield-based environment. How does > e.g. Twisted deal with this? I don't know, I hope someone with an expertise in Twisted can tell us. But I would imagine that they don't have this particular problem, as it should be related only to coroutines and schedulers that run them. I.e. it's a problem when you run some code and may interrupt it. And you can't interrupt a plain python code that uses callbacks without yields and greenlets. > As a work-around, I could imagine some kind of with-statement that > tells the scheduler we're already in the finally clause (it could > still send you a timeout if your cleanup takes way too long): > > try: > yield > finally: > with protect_finally(): > yield > > Of course this could be abused, but at your own risk -- the scheduler > only gives you a fixed amount of extra time and then it's quits. Right, that's the basic approach. But it also gives you a feeling of a "broken" language feature. I.e. we have coroutines, but we can not implement timeouts on top of them without making 'finally' blocks look ugly. 
And if we assume that you can run any coroutine with a timeout - you'll need to use 'protect_finally' in virtually every 'finally' statement. I solved the problem by dynamically inlining 'with protect_finally()' code in @coroutine decorator (something that I would never suggest to put in the stdlib, btw). There is also PEP 419, but I don't like it as well, as it is tied to frames--two low level (and I'm not sure how it will work with future CPython optimizations and PyPy's JIT.) BUT, the concept is nice. I've implemented a number of protocols with yield-coroutines, and managing timeouts with a simple ".with_timeout()" call is a very handy and readable feature. So, I hope, that we can all brainstorm this problem to make coroutines "complete", if we decide to start using them widely. - Yury From yselivanov.ml at gmail.com Thu Oct 25 01:37:57 2012 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Wed, 24 Oct 2012 19:37:57 -0400 Subject: [Python-ideas] Async API In-Reply-To: References: <71C95BE8-C0C7-45F5-A5E9-45A86EB3265B@gmail.com> <50870B41.6090504@rushing.nightmare.com> <2AA1835A-3A23-47BD-BB3F-C0D054BA69B4@gmail.com> <5087218E.8090805@rushing.nightmare.com> <50873551.5040207@canterbury.ac.nz> <50885DDC.4050108@canterbury.ac.nz> <80A26DE4-9B7C-4C27-B8F2-68E25CC2B14A@gmail.com> Message-ID: On 2012-10-24, at 7:25 PM, Steve Dower wrote: [snip] > Could another workaround be to spawn the cleanup code without yielding - in effect saying "go and do this, but don't come back"? Then there is nowhere for the scheduler to throw the exception. > > I ask because this falls out naturally with my implementation (code is coming, but work is taking priority right now): "do_cleanup()" instead of "yield do_cleanup()". I haven't tried it in this context yet, so no idea whether it works, but I don't see why it wouldn't. In a system without the @async decorator you'd need a "scheduler.current.spawn(do_cleanup)" instead of yield [from]s, but it can still be done. Well, yes, this will work. If we have the following: # "async()" is a way to launch coroutines in my framework without # "coming back"; with it they just return a promise/future that needs # to be yielded again finally: yield c.close().async() The solution is very limited though. Imagine if you have lots of cleanup code finally: yield c1.close().async() # go and do this, but don't come back yield c2.close().async() The above won't work, as scheduler would have an opportunity to break everything on the second 'yield'. You may solve it by grouping cleanup code in a separate inner coroutine, like: @coroutine def do_stuff(): try: ... finally: @coroutine def cleanup(): yield c1.close() yield c2.close() yield cleanup().async() # go and do this, but don't come back But that looks even worse than using 'with protect_finally()'. 
- Yury From guido at python.org Thu Oct 25 01:43:13 2012 From: guido at python.org (Guido van Rossum) Date: Wed, 24 Oct 2012 16:43:13 -0700 Subject: [Python-ideas] Async API In-Reply-To: <72ACCA97-8FF8-4F6B-85B7-7DB9CDCB6DBC@gmail.com> References: <71C95BE8-C0C7-45F5-A5E9-45A86EB3265B@gmail.com> <50870B41.6090504@rushing.nightmare.com> <2AA1835A-3A23-47BD-BB3F-C0D054BA69B4@gmail.com> <5087218E.8090805@rushing.nightmare.com> <50873551.5040207@canterbury.ac.nz> <50885DDC.4050108@canterbury.ac.nz> <80A26DE4-9B7C-4C27-B8F2-68E25CC2B14A@gmail.com> <72ACCA97-8FF8-4F6B-85B7-7DB9CDCB6DBC@gmail.com> Message-ID: On Wed, Oct 24, 2012 at 4:26 PM, Yury Selivanov wrote: > On 2012-10-24, at 7:12 PM, Guido van Rossum wrote: >> Ok, I can understand. But still, this is a problem with timeouts in >> general, not just with timeouts in a yield-based environment. How does >> e.g. Twisted deal with this? > I don't know, I hope someone with an expertise in Twisted can tell us. > > But I would imagine that they don't have this particular problem, as it > should be related only to coroutines and schedulers that run them. I.e. > it's a problem when you run some code and may interrupt it. And you can't > interrupt a plain python code that uses callbacks without yields and > greenlets. Well, but in the Twisted world, if a cleanup callback requires more blocking calls, it has to spawn more deferred callbacks. So I think they *do* have the problem, unless they don't have a way at all to constrain the total running time of an action involving cascading callbacks. Also, they have inlineCallbacks which does use yield. >> As a work-around, I could imagine some kind of with-statement that >> tells the scheduler we're already in the finally clause (it could >> still send you a timeout if your cleanup takes way too long): >> >> try: >> yield >> finally: >> with protect_finally(): >> yield >> >> Of course this could be abused, but at your own risk -- the scheduler >> only gives you a fixed amount of extra time and then it's quits. > > Right, that's the basic approach. But it also gives you a feeling of > a "broken" language feature. I.e. we have coroutines, but we can not > implement timeouts on top of them without making 'finally' blocks > look ugly. And if we assume that you can run any coroutine with a > timeout - you'll need to use 'protect_finally' in virtually every > 'finally' statement. I think the problem may be with timeouts, or with doing blocking I/O in cleanup clauses. I suspect that any system implementing timeouts has subtle bugs. > I solved the problem by dynamically inlining 'with protect_finally()' > code in @coroutine decorator (something that I would never suggest to > put in the stdlib, btw). There is also PEP 419, but I don't like it as > well, as it is tied to frames--two low level (and I'm not sure how it > will work with future CPython optimizations and PyPy's JIT.) > > BUT, the concept is nice. I've implemented a number of protocols with > yield-coroutines, and managing timeouts with a simple ".with_timeout()" > call is a very handy and readable feature. So, I hope, that we can > all brainstorm this problem to make coroutines "complete", if we decide > to start using them widely. I think the with-clause is the solution. 
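For concreteness, a minimal sketch of what such a protect_finally() helper could look like, assuming the scheduler tracks the currently running task and checks a flag before throwing its TimeoutError (Task, current_task and the flag name are made-up here, not an existing API):

from contextlib import contextmanager

class Task:
    # stand-in for whatever object the scheduler uses to track a coroutine
    def __init__(self):
        self.cleanup_protected = False

current_task = Task()   # hypothetical: the task the scheduler is running right now

@contextmanager
def protect_finally():
    # while the flag is set the scheduler defers its TimeoutError; it can still
    # enforce a hard cap on how long the protected cleanup is allowed to run
    previous = current_task.cleanup_protected
    current_task.cleanup_protected = True
    try:
        yield
    finally:
        current_task.cleanup_protected = previous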
Note that in a world with only blocking calls this *can* be a problem (despite your repeated claims that it's not a problem there) -- a common approach to giving operations a timeout is sending it a SIGTERM (which you can easily call with a signal handler in Python) when the deadline is over, then sending it more SIGTERM signals every few seconds until it dies, and sending SIGKILL (which can't be caught) if it takes too long to die. -- --Guido van Rossum (python.org/~guido) From yselivanov.ml at gmail.com Thu Oct 25 02:00:50 2012 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Wed, 24 Oct 2012 20:00:50 -0400 Subject: [Python-ideas] Async API In-Reply-To: References: <71C95BE8-C0C7-45F5-A5E9-45A86EB3265B@gmail.com> <50870B41.6090504@rushing.nightmare.com> <2AA1835A-3A23-47BD-BB3F-C0D054BA69B4@gmail.com> <5087218E.8090805@rushing.nightmare.com> <50873551.5040207@canterbury.ac.nz> <50885DDC.4050108@canterbury.ac.nz> <80A26DE4-9B7C-4C27-B8F2-68E25CC2B14A@gmail.com> <72ACCA97-8FF8-4F6B-85B7-7DB9CDCB6DBC@gmail.com> Message-ID: <19E238DF-349E-4B43-93F4-F63202302F14@gmail.com> On 2012-10-24, at 7:43 PM, Guido van Rossum wrote: > On Wed, Oct 24, 2012 at 4:26 PM, Yury Selivanov wrote: >> On 2012-10-24, at 7:12 PM, Guido van Rossum wrote: >>> Ok, I can understand. But still, this is a problem with timeouts in >>> general, not just with timeouts in a yield-based environment. How does >>> e.g. Twisted deal with this? > >> I don't know, I hope someone with an expertise in Twisted can tell us. >> >> But I would imagine that they don't have this particular problem, as it >> should be related only to coroutines and schedulers that run them. I.e. >> it's a problem when you run some code and may interrupt it. And you can't >> interrupt a plain python code that uses callbacks without yields and >> greenlets. > > Well, but in the Twisted world, if a cleanup callback requires more > blocking calls, it has to spawn more deferred callbacks. So I think > they *do* have the problem, unless they don't have a way at all to > constrain the total running time of an action involving cascading > callbacks. Also, they have inlineCallbacks which does use yield. Right. I was under impression that you don't just use 'finally' stmt but rather setup a Deferred with a cleanup callback. Anyways, I'm now curious enough so I'll take a look... > Note that in a world with only blocking calls this *can* be a problem > (despite your repeated claims that it's not a problem there) -- a > common approach to giving operations a timeout is sending it a SIGTERM > (which you can easily call with a signal handler in Python) when the > deadline is over, then sending it more SIGTERM signals every few > seconds until it dies, and sending SIGKILL (which can't be caught) if > it takes too long to die. Yes, you're right. I guess I've just never seen anybody trying to protect their 'finally' statements from being interrupted with a signal. Whereas with coroutines we needed to protect lots of them, as otherwise we had many and many bugs with unclosed database connections etc. So 'protect_finally' is going to be a very common thing to use. 
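For comparison, the blocking-world case is easy to reproduce once the cleanup itself blocks, which is probably why it is so rarely hit in practice (a Unix-only sketch; the alarm plays the role of the external deadline):

import signal
import time

def on_alarm(signum, frame):
    raise RuntimeError('deadline expired')

signal.signal(signal.SIGALRM, on_alarm)
signal.alarm(3)                  # fires while we are inside the finally block
try:
    time.sleep(2)                # the "work"
finally:
    time.sleep(2)                # cleanup blocks; the alarm lands here
    print('cleanup finished')    # never reached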
- Yury From yselivanov.ml at gmail.com Thu Oct 25 02:16:31 2012 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Wed, 24 Oct 2012 20:16:31 -0400 Subject: [Python-ideas] Async API In-Reply-To: <19E238DF-349E-4B43-93F4-F63202302F14@gmail.com> References: <71C95BE8-C0C7-45F5-A5E9-45A86EB3265B@gmail.com> <50870B41.6090504@rushing.nightmare.com> <2AA1835A-3A23-47BD-BB3F-C0D054BA69B4@gmail.com> <5087218E.8090805@rushing.nightmare.com> <50873551.5040207@canterbury.ac.nz> <50885DDC.4050108@canterbury.ac.nz> <80A26DE4-9B7C-4C27-B8F2-68E25CC2B14A@gmail.com> <72ACCA97-8FF8-4F6B-85B7-7DB9CDCB6DBC@gmail.com> <19E238DF-349E-4B43-93F4-F63202302F14@gmail.com> Message-ID: <5A742516-DB10-48A1-A95E-2817BA62321C@gmail.com> On 2012-10-24, at 8:00 PM, Yury Selivanov wrote: > On 2012-10-24, at 7:43 PM, Guido van Rossum wrote: >> On Wed, Oct 24, 2012 at 4:26 PM, Yury Selivanov wrote: >>> On 2012-10-24, at 7:12 PM, Guido van Rossum wrote: >>>> Ok, I can understand. But still, this is a problem with timeouts in >>>> general, not just with timeouts in a yield-based environment. How does >>>> e.g. Twisted deal with this? >> >>> I don't know, I hope someone with an expertise in Twisted can tell us. >>> >>> But I would imagine that they don't have this particular problem, as it >>> should be related only to coroutines and schedulers that run them. I.e. >>> it's a problem when you run some code and may interrupt it. And you can't >>> interrupt a plain python code that uses callbacks without yields and >>> greenlets. >> >> Well, but in the Twisted world, if a cleanup callback requires more >> blocking calls, it has to spawn more deferred callbacks. So I think >> they *do* have the problem, unless they don't have a way at all to >> constrain the total running time of an action involving cascading >> callbacks. Also, they have inlineCallbacks which does use yield. > > Right. > > I was under impression that you don't just use 'finally' stmt but rather > setup a Deferred with a cleanup callback. Anyways, I'm now curious enough > so I'll take a look... Well, that wasn't too hard to find: Timeouts: http://stackoverflow.com/questions/221745/is-it-possible-to-set-a-timeout-on-a-socket-in-twisted - Yury From Steve.Dower at microsoft.com Thu Oct 25 02:24:11 2012 From: Steve.Dower at microsoft.com (Steve Dower) Date: Thu, 25 Oct 2012 00:24:11 +0000 Subject: [Python-ideas] Async API In-Reply-To: <5A742516-DB10-48A1-A95E-2817BA62321C@gmail.com> References: <71C95BE8-C0C7-45F5-A5E9-45A86EB3265B@gmail.com> <50870B41.6090504@rushing.nightmare.com> <2AA1835A-3A23-47BD-BB3F-C0D054BA69B4@gmail.com> <5087218E.8090805@rushing.nightmare.com> <50873551.5040207@canterbury.ac.nz> <50885DDC.4050108@canterbury.ac.nz> <80A26DE4-9B7C-4C27-B8F2-68E25CC2B14A@gmail.com> <72ACCA97-8FF8-4F6B-85B7-7DB9CDCB6DBC@gmail.com> <19E238DF-349E-4B43-93F4-F63202302F14@gmail.com> <5A742516-DB10-48A1-A95E-2817BA62321C@gmail.com> Message-ID: >On 2012-10-24, at 8:00 PM, Yury Selivanov wrote: >> On 2012-10-24, at 7:43 PM, Guido van Rossum wrote: >>> On Wed, Oct 24, 2012 at 4:26 PM, Yury Selivanov wrote: >>>> On 2012-10-24, at 7:12 PM, Guido van Rossum wrote: >>>>> Ok, I can understand. But still, this is a problem with timeouts in >>>>> general, not just with timeouts in a yield-based environment. How >>>>> does e.g. Twisted deal with this? >>> >>>> I don't know, I hope someone with an expertise in Twisted can tell us. 
>>>> >>>> But I would imagine that they don't have this particular problem, as >>>> it should be related only to coroutines and schedulers that run them. I.e. >>>> it's a problem when you run some code and may interrupt it. And you >>>> can't interrupt a plain python code that uses callbacks without >>>> yields and greenlets. >>> >>> Well, but in the Twisted world, if a cleanup callback requires more >>> blocking calls, it has to spawn more deferred callbacks. So I think >>> they *do* have the problem, unless they don't have a way at all to >>> constrain the total running time of an action involving cascading >>> callbacks. Also, they have inlineCallbacks which does use yield. >> >> Right. >> >> I was under impression that you don't just use 'finally' stmt but >> rather setup a Deferred with a cleanup callback. Anyways, I'm now >> curious enough so I'll take a look... > >Well, that wasn't too hard to find: > >Timeouts: >http://stackoverflow.com/questions/221745/is-it-possible-to-set-a-timeout-on-a-socket-in-twisted Maybe our approach to timeouts should be based on running two tasks in parallel, where the second delays for the timeout period and then cancels the first (I believe this is what they're doing in Twisted). My vision for cancellation involves the worker task polling (or whatever is appropriate for low-level tasks), rather than an exception being forced in by the scheduler, so this avoids the finally issue - it's too late to cancel the task at that point. It also strengthens the case for including a cancellation protocol, which I was keen on anyway. Cheers, Steve From greg.ewing at canterbury.ac.nz Thu Oct 25 02:49:30 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 25 Oct 2012 13:49:30 +1300 Subject: [Python-ideas] Async API In-Reply-To: References: <71C95BE8-C0C7-45F5-A5E9-45A86EB3265B@gmail.com> <50870B41.6090504@rushing.nightmare.com> <2AA1835A-3A23-47BD-BB3F-C0D054BA69B4@gmail.com> <5087218E.8090805@rushing.nightmare.com> <50873551.5040207@canterbury.ac.nz> <50885DDC.4050108@canterbury.ac.nz> Message-ID: <50888C9A.5040203@canterbury.ac.nz> On 25/10/12 11:43, Guido van Rossum wrote: > What's the problem with just letting the cleanup take as long as it > wants to and do whatever it wants? IIUC, the worry is not about time, it's that either 1) another task could run during the cleanup and mess something up, or 2) an exception could be thrown into the task during the cleanup and prevent it being completed. From a correctness standpoint, it doesn't matter if the cleanup takes a long time, as long as it doesn't yield. -- Greg From greg.ewing at canterbury.ac.nz Thu Oct 25 02:52:30 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 25 Oct 2012 13:52:30 +1300 Subject: [Python-ideas] Async API In-Reply-To: References: <71C95BE8-C0C7-45F5-A5E9-45A86EB3265B@gmail.com> <50870B41.6090504@rushing.nightmare.com> <2AA1835A-3A23-47BD-BB3F-C0D054BA69B4@gmail.com> <5087218E.8090805@rushing.nightmare.com> <50873551.5040207@canterbury.ac.nz> <50885DDC.4050108@canterbury.ac.nz> Message-ID: <50888D4E.30002@canterbury.ac.nz> On 25/10/12 11:47, Yury Selivanov wrote: > Cleanup code for a DB connection > *will* need to run queries to the database (at least in some circumstances). That smells like a design problem to me. If something goes wrong, the most you should have to do is roll back any transactions you were in the middle of. Trying to perform further queries is just inviting more trouble. 
-- Greg From greg.ewing at canterbury.ac.nz Thu Oct 25 02:56:52 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 25 Oct 2012 13:56:52 +1300 Subject: [Python-ideas] Async API In-Reply-To: References: <71C95BE8-C0C7-45F5-A5E9-45A86EB3265B@gmail.com> <50870B41.6090504@rushing.nightmare.com> <2AA1835A-3A23-47BD-BB3F-C0D054BA69B4@gmail.com> <5087218E.8090805@rushing.nightmare.com> <50873551.5040207@canterbury.ac.nz> <50885DDC.4050108@canterbury.ac.nz> <80A26DE4-9B7C-4C27-B8F2-68E25CC2B14A@gmail.com> Message-ID: <50888E54.90307@canterbury.ac.nz> On 25/10/12 12:12, Guido van Rossum wrote: > Of course this could be abused, but at your own risk -- the scheduler > only gives you a fixed amount of extra time and then it's quits. Which is another good reason to design your cleanup code so that it can't take an arbitrarily long time. -- Greg From yselivanov.ml at gmail.com Thu Oct 25 03:07:45 2012 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Wed, 24 Oct 2012 21:07:45 -0400 Subject: [Python-ideas] Async API In-Reply-To: <50888D4E.30002@canterbury.ac.nz> References: <71C95BE8-C0C7-45F5-A5E9-45A86EB3265B@gmail.com> <50870B41.6090504@rushing.nightmare.com> <2AA1835A-3A23-47BD-BB3F-C0D054BA69B4@gmail.com> <5087218E.8090805@rushing.nightmare.com> <50873551.5040207@canterbury.ac.nz> <50885DDC.4050108@canterbury.ac.nz> <50888D4E.30002@canterbury.ac.nz> Message-ID: <1138B4C9-B880-4A28-B906-A62B47EF9CAA@gmail.com> On 2012-10-24, at 8:52 PM, Greg Ewing wrote: > On 25/10/12 11:47, Yury Selivanov wrote: >> Cleanup code for a DB connection >> *will* need to run queries to the database (at least in some circumstances). > > That smells like a design problem to me. If something goes wrong, > the most you should have to do is roll back any transactions > you were in the middle of. Trying to perform further queries > is just inviting more trouble. Right. And that rolling back - a tiny db query "rollback" - is an async code, and where there is an async code, no matter how tiny and fast, - scheduler has an opportunity to screw it up. Guido's 'with protected_finally' should work, although it probably will look weird for for people unfamiliar with coroutines and this particular problem. - Yury From greg.ewing at canterbury.ac.nz Thu Oct 25 03:29:32 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 25 Oct 2012 14:29:32 +1300 Subject: [Python-ideas] Async API In-Reply-To: <1138B4C9-B880-4A28-B906-A62B47EF9CAA@gmail.com> References: <71C95BE8-C0C7-45F5-A5E9-45A86EB3265B@gmail.com> <50870B41.6090504@rushing.nightmare.com> <2AA1835A-3A23-47BD-BB3F-C0D054BA69B4@gmail.com> <5087218E.8090805@rushing.nightmare.com> <50873551.5040207@canterbury.ac.nz> <50885DDC.4050108@canterbury.ac.nz> <50888D4E.30002@canterbury.ac.nz> <1138B4C9-B880-4A28-B906-A62B47EF9CAA@gmail.com> Message-ID: <508895FC.9000706@canterbury.ac.nz> On 25/10/12 14:07, Yury Selivanov wrote: > Right. And that rolling back - a tiny db query "rollback" - is an > async code, Only if we implement it as a blocking operation as far as our task scheduler is concerned. I wouldn't do it that way -- I'd perform it synchronously and assume it'll be fast enough for that not to be a problem. BTW, we seem to be using different definitions for the term "query". To my way of thinking, a rollback is *not* a query, even if it happens to be triggered by sending a "rollback" command to the SQL interpreter. At the Python API level, it should appear as a distinct operation with its own method. 
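A small sketch of the shape being described here -- rollback exposed as its own small, synchronous method rather than as another scheduler-visible query (the class and the wire protocol are purely illustrative, not any real driver's API):

class Connection:
    def __init__(self, sock):
        self._sock = sock

    def query(self, sql):
        # the normal path is a coroutine, so the scheduler may suspend or
        # interrupt it between these yields
        yield ('write', self._sock, sql.encode('utf-8'))
        rows = yield ('read', self._sock)
        return rows

    def rollback(self):
        # the distinct operation: one small, bounded write done synchronously,
        # so there is no yield for the scheduler to throw a timeout into
        self._sock.sendall(b'ROLLBACK\n')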
-- Greg From greg.ewing at canterbury.ac.nz Thu Oct 25 03:34:59 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 25 Oct 2012 14:34:59 +1300 Subject: [Python-ideas] Async API In-Reply-To: References: <71C95BE8-C0C7-45F5-A5E9-45A86EB3265B@gmail.com> <50870B41.6090504@rushing.nightmare.com> <2AA1835A-3A23-47BD-BB3F-C0D054BA69B4@gmail.com> <5087218E.8090805@rushing.nightmare.com> <50873551.5040207@canterbury.ac.nz> <50885DDC.4050108@canterbury.ac.nz> <80A26DE4-9B7C-4C27-B8F2-68E25CC2B14A@gmail.com> <72ACCA97-8FF8-4F6B-85B7-7DB9CDCB6DBC@gmail.com> Message-ID: <50889743.8060402@canterbury.ac.nz> On 25/10/12 12:43, Guido van Rossum wrote: > Note that in a world with only blocking calls this *can* be a problem... > a common approach to giving operations a timeout is sending it a SIGTERM Well, yes, if you have preemptive interruptions of some kind, then things are a lot trickier. But I'm assuming we're using cooperative scheduling *instead* of things like that. (Note that in the face of preemption, I don't think it's possible to solve this problem completely without language support, because there will always be a small window of opportunity between entering the finally clause and getting into the with-statement or whatever that you're using to block asynchronous signals.) -- Greg From yselivanov.ml at gmail.com Thu Oct 25 04:25:16 2012 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Wed, 24 Oct 2012 22:25:16 -0400 Subject: [Python-ideas] Async API In-Reply-To: <508895FC.9000706@canterbury.ac.nz> References: <71C95BE8-C0C7-45F5-A5E9-45A86EB3265B@gmail.com> <50870B41.6090504@rushing.nightmare.com> <2AA1835A-3A23-47BD-BB3F-C0D054BA69B4@gmail.com> <5087218E.8090805@rushing.nightmare.com> <50873551.5040207@canterbury.ac.nz> <50885DDC.4050108@canterbury.ac.nz> <50888D4E.30002@canterbury.ac.nz> <1138B4C9-B880-4A28-B906-A62B47EF9CAA@gmail.com> <508895FC.9000706@canterbury.ac.nz> Message-ID: Greg, On 2012-10-24, at 9:29 PM, Greg Ewing wrote: > On 25/10/12 14:07, Yury Selivanov wrote: >> Right. And that rolling back - a tiny db query "rollback" - is an >> async code, > > Only if we implement it as a blocking operation as far as our > task scheduler is concerned. I wouldn't do it that way -- I'd > perform it synchronously and assume it'll be fast enough for > that not to be a problem. In a non-blocking application there is no way of running a blocking code, even if it's anticipated to block for a mere millisecond. Because if something gets out of control and it blocks for a longer period of time - everything just stops, right? Or did you mean something else with "synchronously" (perhaps Steve Dower's approach)? > BTW, we seem to be using different definitions for the term > "query". To my way of thinking, a rollback is *not* a query, > even if it happens to be triggered by sending a "rollback" > command to the SQL interpreter. At the Python API level, > it should appear as a distinct operation with its own > method. Right. I meant that "sending a rollback command to the SQL interpreter" part--this should be done through a non-blocking socket. To invoke an operation on a non-blocking socket we have to do it through 'yield' or 'yield from', hence - give scheduler a chance to interrupt the coroutine. Given the fact that we know, that the clean-up code should be simple and fast, it still contains coroutine context switches in real world code, be it due to the need of sending some information via a socket, or just by calling some other coroutine. 
If you write a single 'yield' in your finally block, and that (or caller) coroutine is called with a timeout, there is a chance that its 'finally' block execution will be aborted by a scheduler. Writing this yield/non-blocking type of code in finally blocks is a necessity, unfortunately. And even if that cleanup code is incredibly fast, if you have a webserver that runs for days/weeks/months, bad things will happen. So if we decide to adopt Guido's approach with explicitly marking critical finally blocks (well, they are all critical) with 'with protected_finally()' - allright. If we somehow invent a mechanism that would allow us to hide this all from user and protect finally blocks implicitly in scheduler - that's even better. Or we should design a totally different approach of handling timeouts, and try to not to interrupt coroutines at all. - Yury From yselivanov.ml at gmail.com Thu Oct 25 04:28:15 2012 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Wed, 24 Oct 2012 22:28:15 -0400 Subject: [Python-ideas] Async API In-Reply-To: <50889743.8060402@canterbury.ac.nz> References: <71C95BE8-C0C7-45F5-A5E9-45A86EB3265B@gmail.com> <50870B41.6090504@rushing.nightmare.com> <2AA1835A-3A23-47BD-BB3F-C0D054BA69B4@gmail.com> <5087218E.8090805@rushing.nightmare.com> <50873551.5040207@canterbury.ac.nz> <50885DDC.4050108@canterbury.ac.nz> <80A26DE4-9B7C-4C27-B8F2-68E25CC2B14A@gmail.com> <72ACCA97-8FF8-4F6B-85B7-7DB9CDCB6DBC@gmail.com> <50889743.8060402@canterbury.ac.nz> Message-ID: On 2012-10-24, at 9:34 PM, Greg Ewing wrote: [...] > (Note that in the face of preemption, I don't think it's possible > to solve this problem completely without language support, because > there will always be a small window of opportunity between > entering the finally clause and getting into the with-statement > or whatever that you're using to block asynchronous signals.) Agree. In my experience, though, broken finally blocks due to interruption by a signal is a very rare thing (again, that maybe different for someone else.) - Yury From guido at python.org Thu Oct 25 04:51:04 2012 From: guido at python.org (Guido van Rossum) Date: Wed, 24 Oct 2012 19:51:04 -0700 Subject: [Python-ideas] Async API In-Reply-To: References: <71C95BE8-C0C7-45F5-A5E9-45A86EB3265B@gmail.com> <50870B41.6090504@rushing.nightmare.com> <2AA1835A-3A23-47BD-BB3F-C0D054BA69B4@gmail.com> <5087218E.8090805@rushing.nightmare.com> <50873551.5040207@canterbury.ac.nz> <50885DDC.4050108@canterbury.ac.nz> <80A26DE4-9B7C-4C27-B8F2-68E25CC2B14A@gmail.com> <72ACCA97-8FF8-4F6B-85B7-7DB9CDCB6DBC@gmail.com> <50889743.8060402@canterbury.ac.nz> Message-ID: On Wed, Oct 24, 2012 at 7:28 PM, Yury Selivanov wrote: > On 2012-10-24, at 9:34 PM, Greg Ewing wrote: > [...] >> (Note that in the face of preemption, I don't think it's possible >> to solve this problem completely without language support, because >> there will always be a small window of opportunity between >> entering the finally clause and getting into the with-statement >> or whatever that you're using to block asynchronous signals.) > > Agree. > > In my experience, though, broken finally blocks due to interruption > by a signal is a very rare thing (again, that maybe different for > someone else.) We're far from our starting point: in a the yield-from (or yield) world, there are no truly async interrupts, but anything that yields may be interrupted, if we decide to implement timeouts by throwing an exception into the generator (which seems the logical thing to do). 
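A tiny, self-contained illustration of exactly that failure mode, with a plain generator standing in for the coroutine and an ordinary throw() standing in for the scheduler's timeout:

def task():
    try:
        yield 'work'
    finally:
        yield 'cleanup step 1'      # suspended here when the "timeout" arrives
        print('cleanup step 2')     # never reached

g = task()
print(next(g))                      # -> work
print(next(g))                      # -> cleanup step 1 (now inside the finally)
try:
    g.throw(RuntimeError('timeout'))
except RuntimeError as exc:
    print('interrupted:', exc)      # the rest of the cleanup was skipped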
The with-statement can deal with this fine (there's no yield between entering the finally and entering the with-block) but making the cleanup into its own task (like Steve proposed) sounds fine too. In any case this sounds like something that each framework should decide for itself. -- --Guido van Rossum (python.org/~guido) From Steve.Dower at microsoft.com Thu Oct 25 05:28:47 2012 From: Steve.Dower at microsoft.com (Steve Dower) Date: Thu, 25 Oct 2012 03:28:47 +0000 Subject: [Python-ideas] Async API In-Reply-To: References: <71C95BE8-C0C7-45F5-A5E9-45A86EB3265B@gmail.com> <50870B41.6090504@rushing.nightmare.com> <2AA1835A-3A23-47BD-BB3F-C0D054BA69B4@gmail.com> <5087218E.8090805@rushing.nightmare.com> <50873551.5040207@canterbury.ac.nz> <50885DDC.4050108@canterbury.ac.nz> <80A26DE4-9B7C-4C27-B8F2-68E25CC2B14A@gmail.com> <72ACCA97-8FF8-4F6B-85B7-7DB9CDCB6DBC@gmail.com> <50889743.8060402@canterbury.ac.nz> , Message-ID: This could alse be another application for extension options on futures: try: ... finally: yield do_cleanup_1().set_options(never_raise=True) yield do_cleanup_2().set_options(never_raise=True) The scheduler can then ignore exceptions (including CancelledError) instead of raising them. ('set_scheduler_hint' may be a better name than 'set_options', now I come to think of it. I like the extensibility of this, since I don't think anyone can predict what advanced options every scheduler may want - the function takes **params and updates a (lazily created) dict on the future.) Of course, this will also work (and is pretty much equivalent): try: ... finally: try: yield do_cleanup_1() except: pass try: yield do_cleanup_2() except: pass We'll probably need/want some form of 'atomic' primitive anyway, which might work like this: yield atomically(do_cleanup_1, do_cleanup_2, ...) Though the behaviour of this when exceptions are involved gets complicated - do we abort all of them? Pass the exception on? Continue anyway? Which exception gets reported? Cheers, Steve ________________________________________ From: Python-ideas [python-ideas-bounces+steve.dower=microsoft.com at python.org] on behalf of Guido van Rossum [guido at python.org] Sent: Wednesday, October 24, 2012 7:51 PM To: Yury Selivanov Cc: python-ideas at python.org Subject: Re: [Python-ideas] Async API On Wed, Oct 24, 2012 at 7:28 PM, Yury Selivanov wrote: > On 2012-10-24, at 9:34 PM, Greg Ewing wrote: > [...] >> (Note that in the face of preemption, I don't think it's possible >> to solve this problem completely without language support, because >> there will always be a small window of opportunity between >> entering the finally clause and getting into the with-statement >> or whatever that you're using to block asynchronous signals.) > > Agree. > > In my experience, though, broken finally blocks due to interruption > by a signal is a very rare thing (again, that maybe different for > someone else.) We're far from our starting point: in a the yield-from (or yield) world, there are no truly async interrupts, but anything that yields may be interrupted, if we decide to implement timeouts by throwing an exception into the generator (which seems the logical thing to do). The with-statement can deal with this fine (there's no yield between entering the finally and entering the with-block) but making the cleanup into its own task (like Steve proposed) sounds fine too. In any case this sounds like something that each framework should decide for itself. 
-- --Guido van Rossum (python.org/~guido) _______________________________________________ Python-ideas mailing list Python-ideas at python.org http://mail.python.org/mailman/listinfo/python-ideas From yselivanov.ml at gmail.com Thu Oct 25 06:37:57 2012 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Thu, 25 Oct 2012 00:37:57 -0400 Subject: [Python-ideas] Async API In-Reply-To: References: <71C95BE8-C0C7-45F5-A5E9-45A86EB3265B@gmail.com> <50870B41.6090504@rushing.nightmare.com> <2AA1835A-3A23-47BD-BB3F-C0D054BA69B4@gmail.com> <5087218E.8090805@rushing.nightmare.com> <50873551.5040207@canterbury.ac.nz> <50885DDC.4050108@canterbury.ac.nz> <80A26DE4-9B7C-4C27-B8F2-68E25CC2B14A@gmail.com> <72ACCA97-8FF8-4F6B-85B7-7DB9CDCB6DBC@gmail.com> <50889743.8060402@canterbury.ac.nz> Message-ID: <616958EC-D087-4864-BC3E-31B4095B3DF1@gmail.com> On 2012-10-24, at 10:51 PM, Guido van Rossum wrote: > On Wed, Oct 24, 2012 at 7:28 PM, Yury Selivanov wrote: >> On 2012-10-24, at 9:34 PM, Greg Ewing wrote: >> [...] >>> (Note that in the face of preemption, I don't think it's possible >>> to solve this problem completely without language support, because >>> there will always be a small window of opportunity between >>> entering the finally clause and getting into the with-statement >>> or whatever that you're using to block asynchronous signals.) >> >> Agree. >> >> In my experience, though, broken finally blocks due to interruption >> by a signal is a very rare thing (again, that maybe different for >> someone else.) > > We're far from our starting point: in a the yield-from (or yield) > world, there are no truly async interrupts, but anything that yields > may be interrupted, if we decide to implement timeouts by throwing an > exception into the generator (which seems the logical thing to do). > The with-statement can deal with this fine (there's no yield between > entering the finally and entering the with-block) but making the > cleanup into its own task (like Steve proposed) sounds fine too. > > In any case this sounds like something that each framework should > decide for itself. BTW, is there a way of adding a read-only property to generator objects - 'in_finally'? Will it actually slow down things? - Yury From yselivanov.ml at gmail.com Thu Oct 25 08:18:51 2012 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Thu, 25 Oct 2012 02:18:51 -0400 Subject: [Python-ideas] Async API In-Reply-To: <616958EC-D087-4864-BC3E-31B4095B3DF1@gmail.com> References: <71C95BE8-C0C7-45F5-A5E9-45A86EB3265B@gmail.com> <50870B41.6090504@rushing.nightmare.com> <2AA1835A-3A23-47BD-BB3F-C0D054BA69B4@gmail.com> <5087218E.8090805@rushing.nightmare.com> <50873551.5040207@canterbury.ac.nz> <50885DDC.4050108@canterbury.ac.nz> <80A26DE4-9B7C-4C27-B8F2-68E25CC2B14A@gmail.com> <72ACCA97-8FF8-4F6B-85B7-7DB9CDCB6DBC@gmail.com> <50889743.8060402@canterbury.ac.nz> <616958EC-D087-4864-BC3E-31B4095B3DF1@gmail.com> Message-ID: On 2012-10-25, at 12:37 AM, Yury Selivanov wrote: > On 2012-10-24, at 10:51 PM, Guido van Rossum wrote: > >> On Wed, Oct 24, 2012 at 7:28 PM, Yury Selivanov wrote: >>> On 2012-10-24, at 9:34 PM, Greg Ewing wrote: >>> [...] >>>> (Note that in the face of preemption, I don't think it's possible >>>> to solve this problem completely without language support, because >>>> there will always be a small window of opportunity between >>>> entering the finally clause and getting into the with-statement >>>> or whatever that you're using to block asynchronous signals.) >>> >>> Agree. 
>>> >>> In my experience, though, broken finally blocks due to interruption >>> by a signal is a very rare thing (again, that maybe different for >>> someone else.) >> >> We're far from our starting point: in a the yield-from (or yield) >> world, there are no truly async interrupts, but anything that yields >> may be interrupted, if we decide to implement timeouts by throwing an >> exception into the generator (which seems the logical thing to do). >> The with-statement can deal with this fine (there's no yield between >> entering the finally and entering the with-block) but making the >> cleanup into its own task (like Steve proposed) sounds fine too. >> >> In any case this sounds like something that each framework should >> decide for itself. > > BTW, is there a way of adding a read-only property to generator objects - > 'in_finally'? Will it actually slow down things? Well, I couldn't resist and just implemented a *proof of concept* myself. The patch is here: https://dl.dropbox.com/u/21052/gen_in_finally.patch The patch adds 'gi_in_finally' read-only property to generator objects. There is no observable difference between patched & unpatched python (latest master) in pybench. Some small demo: >>> def spam(): ... try: ... yield 1 ... finally: ... yield 2 ... yield 3 >>> gen = spam() >>> gen.gi_in_finally, gen.send(None), gen.gi_in_finally (0, 1, 0) >>> gen.gi_in_finally, gen.send(None), gen.gi_in_finally (0, 2, 1) >>> gen.gi_in_finally, gen.send(None), gen.gi_in_finally (1, 3, 0) >>> gen.gi_in_finally, gen.send(None), gen.gi_in_finally Traceback (most recent call last): File "", line 1, in StopIteration If we decide to merge this in cpython, then this whole problem with 'finally' statements can be solved (at least for generator-based coroutines.) What do you think? - Yury From paul at colomiets.name Thu Oct 25 09:49:06 2012 From: paul at colomiets.name (Paul Colomiets) Date: Thu, 25 Oct 2012 10:49:06 +0300 Subject: [Python-ideas] Async API In-Reply-To: References: <71C95BE8-C0C7-45F5-A5E9-45A86EB3265B@gmail.com> <50870B41.6090504@rushing.nightmare.com> <2AA1835A-3A23-47BD-BB3F-C0D054BA69B4@gmail.com> <5087218E.8090805@rushing.nightmare.com> <50873551.5040207@canterbury.ac.nz> <50885DDC.4050108@canterbury.ac.nz> <80A26DE4-9B7C-4C27-B8F2-68E25CC2B14A@gmail.com> <72ACCA97-8FF8-4F6B-85B7-7DB9CDCB6DBC@gmail.com> <50889743.8060402@canterbury.ac.nz> <616958EC-D087-4864-BC3E-31B4095B3DF1@gmail.com> Message-ID: Hi Yury, On Thu, Oct 25, 2012 at 9:18 AM, Yury Selivanov wrote: > Well, I couldn't resist and just implemented a *proof of concept* myself. > The patch is here: https://dl.dropbox.com/u/21052/gen_in_finally.patch > > The patch adds 'gi_in_finally' read-only property to generator objects. > Why haven't you used my implementation? 
http://bugs.python.org/issue14730 -- Paul From paul at colomiets.name Thu Oct 25 09:55:44 2012 From: paul at colomiets.name (Paul Colomiets) Date: Thu, 25 Oct 2012 10:55:44 +0300 Subject: [Python-ideas] Async API In-Reply-To: References: <71C95BE8-C0C7-45F5-A5E9-45A86EB3265B@gmail.com> <50870B41.6090504@rushing.nightmare.com> <2AA1835A-3A23-47BD-BB3F-C0D054BA69B4@gmail.com> <5087218E.8090805@rushing.nightmare.com> <50873551.5040207@canterbury.ac.nz> <50885DDC.4050108@canterbury.ac.nz> <80A26DE4-9B7C-4C27-B8F2-68E25CC2B14A@gmail.com> <72ACCA97-8FF8-4F6B-85B7-7DB9CDCB6DBC@gmail.com> Message-ID: Hi Guido, On Thu, Oct 25, 2012 at 2:43 AM, Guido van Rossum wrote: > On Wed, Oct 24, 2012 at 4:26 PM, Yury Selivanov wrote: >> On 2012-10-24, at 7:12 PM, Guido van Rossum wrote: >>> Ok, I can understand. But still, this is a problem with timeouts in >>> general, not just with timeouts in a yield-based environment. How does >>> e.g. Twisted deal with this? > >> I don't know, I hope someone with an expertise in Twisted can tell us. >> >> But I would imagine that they don't have this particular problem, as it >> should be related only to coroutines and schedulers that run them. I.e. >> it's a problem when you run some code and may interrupt it. And you can't >> interrupt a plain python code that uses callbacks without yields and >> greenlets. > > Well, but in the Twisted world, if a cleanup callback requires more > blocking calls, it has to spawn more deferred callbacks. So I think > they *do* have the problem, unless they don't have a way at all to > constrain the total running time of an action involving cascading > callbacks. Also, they have inlineCallbacks which does use yield. > AFAIR, in twisted there is no timeout on coroutine, there is a timeout on request, which is usually just a socket timeout. So there is no problem of interrupting the code in arbitrary places. Another twisted thing, is doing all writes asynchronously with respect to user code, so if you want to write something and close a connection for finalization you just call: transport.write('something') transport.loseConnection() And they do not return deferreds, so it returns immediately even if the socket is not writable at the moment. (IIRC, it never writes right now, but rather from reactor callback) -- Paul From _ at lvh.cc Thu Oct 25 13:46:29 2012 From: _ at lvh.cc (Laurens Van Houtven) Date: Thu, 25 Oct 2012 13:46:29 +0200 Subject: [Python-ideas] Async API In-Reply-To: References: <71C95BE8-C0C7-45F5-A5E9-45A86EB3265B@gmail.com> <50870B41.6090504@rushing.nightmare.com> <2AA1835A-3A23-47BD-BB3F-C0D054BA69B4@gmail.com> <5087218E.8090805@rushing.nightmare.com> <50873551.5040207@canterbury.ac.nz> <50885DDC.4050108@canterbury.ac.nz> <80A26DE4-9B7C-4C27-B8F2-68E25CC2B14A@gmail.com> <72ACCA97-8FF8-4F6B-85B7-7DB9CDCB6DBC@gmail.com> Message-ID: Sorry, working really long hours these days; just wanted to chime in that yes, you can call transport.write with large strings, and the reactor will do the right thing under the hood: loseConnection is the polite way of dropping a connection, which should wait for all pending writes to finish etc. cheers lvh -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From yselivanov.ml at gmail.com Thu Oct 25 15:37:17 2012 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Thu, 25 Oct 2012 09:37:17 -0400 Subject: [Python-ideas] Async API In-Reply-To: References: <71C95BE8-C0C7-45F5-A5E9-45A86EB3265B@gmail.com> <50870B41.6090504@rushing.nightmare.com> <2AA1835A-3A23-47BD-BB3F-C0D054BA69B4@gmail.com> <5087218E.8090805@rushing.nightmare.com> <50873551.5040207@canterbury.ac.nz> <50885DDC.4050108@canterbury.ac.nz> <80A26DE4-9B7C-4C27-B8F2-68E25CC2B14A@gmail.com> <72ACCA97-8FF8-4F6B-85B7-7DB9CDCB6DBC@gmail.com> <50889743.8060402@canterbury.ac.nz> <616958EC-D087-4864-BC3E-31B4095B3DF1@gmail.com> Message-ID: <08DF94A5-CFE4-4EAC-8CA8-BF6AB7E7DEC5@gmail.com> ]On 2012-10-25, at 3:49 AM, Paul Colomiets wrote: > Hi Yury, > > On Thu, Oct 25, 2012 at 9:18 AM, Yury Selivanov wrote: >> Well, I couldn't resist and just implemented a *proof of concept* myself. >> The patch is here: https://dl.dropbox.com/u/21052/gen_in_finally.patch >> >> The patch adds 'gi_in_finally' read-only property to generator objects. >> > > Why haven't you used my implementation? > > http://bugs.python.org/issue14730 Because it's a different thing. Yours is a PEP 419 implementation -- 'sys.setcleanuphook'. Mine is a quick hack to add 'gi_in_finally' property to generators and see how good/bad it is. - Yury From guido at python.org Thu Oct 25 16:43:20 2012 From: guido at python.org (Guido van Rossum) Date: Thu, 25 Oct 2012 07:43:20 -0700 Subject: [Python-ideas] Async API In-Reply-To: References: <71C95BE8-C0C7-45F5-A5E9-45A86EB3265B@gmail.com> <50870B41.6090504@rushing.nightmare.com> <2AA1835A-3A23-47BD-BB3F-C0D054BA69B4@gmail.com> <5087218E.8090805@rushing.nightmare.com> <50873551.5040207@canterbury.ac.nz> <50885DDC.4050108@canterbury.ac.nz> <80A26DE4-9B7C-4C27-B8F2-68E25CC2B14A@gmail.com> <72ACCA97-8FF8-4F6B-85B7-7DB9CDCB6DBC@gmail.com> Message-ID: On Thu, Oct 25, 2012 at 4:46 AM, Laurens Van Houtven <_ at lvh.cc> wrote: > Sorry, working really long hours these days; just wanted to chime in that > yes, you can call transport.write with large strings, and the reactor will > do the right thing under the hood: loseConnection is the polite way of > dropping a connection, which should wait for all pending writes to finish > etc. This seems a decent enough pattern. It also makes it possible to use one of these things as a substitute for a writable file object, so you can e.g. use it as sys.stdout or the stream for a logging.StreamHandler. Still, I wonder what happens if the socket/pipe/whatever that is written to is very slow and the program produces too much data. Does memory just balloon up, or is there some kind of throttling of the writer? Or a buffer overflow exception? For a totally general solution I would at least like to have the *option* of doing synchronous writes. (I'm asking these questions because I'd like to copy this useful pattern -- but I want to get the end cases right.) 
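For reference, one common shape for that kind of throttling is a pair of high/low water marks on the transport's outgoing buffer, with the producer paused in between; a generic sketch (the names are illustrative, not Twisted's actual API):

class BufferedTransport:
    def __init__(self, sock, high_water=64 * 1024):
        self.sock = sock
        self.outbuf = bytearray()
        self.high_water = high_water
        self.paused = False

    def write(self, data):
        # never blocks: data is queued here and flushed from the event loop
        self.outbuf.extend(data)
        if not self.paused and len(self.outbuf) > self.high_water:
            self.paused = True
            self.pause_producing()      # ask the writer to stop for a while

    def on_writable(self):
        # called by the event loop when the socket can accept more bytes
        sent = self.sock.send(self.outbuf)
        del self.outbuf[:sent]
        if self.paused and len(self.outbuf) < self.high_water // 2:
            self.paused = False
            self.resume_producing()

    def pause_producing(self):
        pass    # hooks for the producer; deliberately no-ops in this sketch

    def resume_producing(self):
        pass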
-- --Guido van Rossum (python.org/~guido) From guido at python.org Thu Oct 25 16:44:58 2012 From: guido at python.org (Guido van Rossum) Date: Thu, 25 Oct 2012 07:44:58 -0700 Subject: [Python-ideas] Async API In-Reply-To: <08DF94A5-CFE4-4EAC-8CA8-BF6AB7E7DEC5@gmail.com> References: <71C95BE8-C0C7-45F5-A5E9-45A86EB3265B@gmail.com> <50870B41.6090504@rushing.nightmare.com> <2AA1835A-3A23-47BD-BB3F-C0D054BA69B4@gmail.com> <5087218E.8090805@rushing.nightmare.com> <50873551.5040207@canterbury.ac.nz> <50885DDC.4050108@canterbury.ac.nz> <80A26DE4-9B7C-4C27-B8F2-68E25CC2B14A@gmail.com> <72ACCA97-8FF8-4F6B-85B7-7DB9CDCB6DBC@gmail.com> <50889743.8060402@canterbury.ac.nz> <616958EC-D087-4864-BC3E-31B4095B3DF1@gmail.com> <08DF94A5-CFE4-4EAC-8CA8-BF6AB7E7DEC5@gmail.com> Message-ID: On Thu, Oct 25, 2012 at 6:37 AM, Yury Selivanov wrote: > ]On 2012-10-25, at 3:49 AM, Paul Colomiets wrote: > >> Hi Yury, >> >> On Thu, Oct 25, 2012 at 9:18 AM, Yury Selivanov wrote: >>> Well, I couldn't resist and just implemented a *proof of concept* myself. >>> The patch is here: https://dl.dropbox.com/u/21052/gen_in_finally.patch >>> >>> The patch adds 'gi_in_finally' read-only property to generator objects. >>> >> >> Why haven't you used my implementation? >> >> http://bugs.python.org/issue14730 > > Because it's a different thing. Yours is a PEP 419 implementation -- > 'sys.setcleanuphook'. Mine is a quick hack to add 'gi_in_finally' property > to generators and see how good/bad it is. I feel it's a code smell if you need to use this feature a lot. If you need it rarely, well, use one of the existing work-arounds. -- --Guido van Rossum (python.org/~guido) From mark.hackett at metoffice.gov.uk Thu Oct 25 16:48:32 2012 From: mark.hackett at metoffice.gov.uk (Mark Hackett) Date: Thu, 25 Oct 2012 15:48:32 +0100 Subject: [Python-ideas] Enabling man page structure for python Message-ID: <201210251548.32082.mark.hackett@metoffice.gov.uk> In trying to reduce the repetition of option arguments to python scripts I basically needed to allow some structure to the program to be able to be automatically mangled so it could be used in a) the getopt() call b) the -h (give call usage) option in the program c) Synopsis subheading in the man page d) Options subheading in the man page rather than having to keep all in synch just because someone wanted a "-j" option added. Because it requires a programmed man page creation, Sphinx, pydoc et al haven't been really of any use, since they are YAML (Yet Another Markup Language) as far as I've been able to tell, not really able to allow runtime changes to reflect in document generation. I may have missed how, however... So I used a dictionary and wrote a program to generate man pages based on that dictionary and included function calls to automate the four repetitions above into one structure, rather similar to what you need for ArgParse. A dictionary allowed me to check the ordering, existence and allow optional and updatable sections to be used in man page writing. It also gave me a reason to use docstrings in public functions. I know man pages are passe and GUIs generally don't bother at all, but it still seems to me that adding some core python utility to express a man page and allow programmatic use of the construction both to define the program and its description is still a large gap. Making man pages easier to write would be enough, but I also think that if newcomers could see some utility in writing documentation inside the programs, they would do so more readily. 
And this learnt behaviour is useful elsewhere. The attached program (if it appears!) is my solution, basically baby python. It still has one redundant repetition because getopt() does it that way. And it has some possibly silly but useful markup based on the basic python data types (e.g. it displays a list differently from a scalar string). It is meant to illustrate what I felt was not possible with python as-is to see if there is a way to make this work done redundant. There are a few other people out there who have had to roll-their-own answer to the same problems. They solved it slightly differently and didn't include an ability to enforce "good practice" in man page creation which I think is warranted. So I do feel there is room for python to stop us flailing around trying to find our own solution. Is there agreement from others? -------------- next part -------------- A non-text attachment was scrubbed... Name: make_manpage.py Type: text/x-python Size: 13874 bytes Desc: not available URL: From mikegraham at gmail.com Thu Oct 25 17:09:35 2012 From: mikegraham at gmail.com (Mike Graham) Date: Thu, 25 Oct 2012 11:09:35 -0400 Subject: [Python-ideas] Enabling man page structure for python In-Reply-To: <201210251548.32082.mark.hackett@metoffice.gov.uk> References: <201210251548.32082.mark.hackett@metoffice.gov.uk> Message-ID: On Thu, Oct 25, 2012 at 10:48 AM, Mark Hackett wrote: > Because it requires a programmed man page creation, Sphinx, pydoc et al > haven't been really of any use, since they are YAML (Yet Another Markup > Language) as far as I've been able to tell, not really able to allow runtime > changes to reflect in document generation. I may have missed how, however... Use sphinx.builders.manpage.ManualPageBuilder to make a manpage with sphinx. I wouldn't be shocked if other documentation systems had something similar. I wouldn't be opposed to having argparse have some builtin or third-party capability for generating manpages. I wouldn't use getopt myself for anything but mimicing old, established, getopt-based interfaces. argparse has a lot more functionality already and it's more reasonable to expand it since it's a Python thing, not a pre-established thing. Mike From yselivanov.ml at gmail.com Thu Oct 25 17:12:11 2012 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Thu, 25 Oct 2012 11:12:11 -0400 Subject: [Python-ideas] Async API In-Reply-To: References: <71C95BE8-C0C7-45F5-A5E9-45A86EB3265B@gmail.com> <50870B41.6090504@rushing.nightmare.com> <2AA1835A-3A23-47BD-BB3F-C0D054BA69B4@gmail.com> <5087218E.8090805@rushing.nightmare.com> <50873551.5040207@canterbury.ac.nz> <50885DDC.4050108@canterbury.ac.nz> <80A26DE4-9B7C-4C27-B8F2-68E25CC2B14A@gmail.com> <72ACCA97-8FF8-4F6B-85B7-7DB9CDCB6DBC@gmail.com> <50889743.8060402@canterbury.ac.nz> <616958EC-D087-4864-BC3E-31B4095B3DF1@gmail.com> <08DF94A5-CFE4-4EAC-8CA8-BF6AB7E7DEC5@gmail.com> Message-ID: On 2012-10-25, at 10:44 AM, Guido van Rossum wrote: > On Thu, Oct 25, 2012 at 6:37 AM, Yury Selivanov wrote: >> ]On 2012-10-25, at 3:49 AM, Paul Colomiets wrote: >> >>> Hi Yury, >>> >>> On Thu, Oct 25, 2012 at 9:18 AM, Yury Selivanov wrote: >>>> Well, I couldn't resist and just implemented a *proof of concept* myself. >>>> The patch is here: https://dl.dropbox.com/u/21052/gen_in_finally.patch >>>> >>>> The patch adds 'gi_in_finally' read-only property to generator objects. >>>> >>> >>> Why haven't you used my implementation? >>> >>> http://bugs.python.org/issue14730 >> >> Because it's a different thing. 
Yours is a PEP 419 implementation -- >> 'sys.setcleanuphook'. Mine is a quick hack to add 'gi_in_finally' property >> to generators and see how good/bad it is. > > I feel it's a code smell if you need to use this feature a lot. If you > need it rarely, well, use one of the existing work-arounds. But the feature isn't going to be used by users directly. It will be used only in scheduler implementations. Users will just write 'finally' blocks and they will work as expected. This just makes coroutines look and behave more like ordinary functions. Isn't it one of our goals--to make it convenient and reliable? - Yury From Steve.Dower at microsoft.com Thu Oct 25 17:28:29 2012 From: Steve.Dower at microsoft.com (Steve Dower) Date: Thu, 25 Oct 2012 15:28:29 +0000 Subject: [Python-ideas] Async API In-Reply-To: References: <71C95BE8-C0C7-45F5-A5E9-45A86EB3265B@gmail.com> <50870B41.6090504@rushing.nightmare.com> <2AA1835A-3A23-47BD-BB3F-C0D054BA69B4@gmail.com> <5087218E.8090805@rushing.nightmare.com> <50873551.5040207@canterbury.ac.nz> <50885DDC.4050108@canterbury.ac.nz> <80A26DE4-9B7C-4C27-B8F2-68E25CC2B14A@gmail.com> <72ACCA97-8FF8-4F6B-85B7-7DB9CDCB6DBC@gmail.com> <50889743.8060402@canterbury.ac.nz> <616958EC-D087-4864-BC3E-31B4095B3DF1@gmail.com> <08DF94A5-CFE4-4EAC-8CA8-BF6AB7E7DEC5@gmail.com> , Message-ID: >>> Mine is a quick hack to add 'gi_in_finally' property >>> to generators and see how good/bad it is. >> >> I feel it's a code smell if you need to use this feature a lot. If you >> need it rarely, well, use one of the existing work-arounds. >But the feature isn't going to be used by users directly. It will be used >only in scheduler implementations. Users will just write 'finally' blocks >and they will work as expected. This just makes coroutines look and behave >more like ordinary functions. Isn't it one of our goals--to make it >convenient and reliable? I'm agree with the intent, but I'm more worried about the broadness of this approach. What happens in this case? try: try: yield some_op() finally: yield cleanup_that_raises_network_error() except NetworkError: # will we ever see this? Basically, I don't think we can handle the "don't raise" cases entirely automatically, though I'd like to be able to. Cheers, Steve From yselivanov.ml at gmail.com Thu Oct 25 17:39:00 2012 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Thu, 25 Oct 2012 11:39:00 -0400 Subject: [Python-ideas] Async API In-Reply-To: References: <71C95BE8-C0C7-45F5-A5E9-45A86EB3265B@gmail.com> <50870B41.6090504@rushing.nightmare.com> <2AA1835A-3A23-47BD-BB3F-C0D054BA69B4@gmail.com> <5087218E.8090805@rushing.nightmare.com> <50873551.5040207@canterbury.ac.nz> <50885DDC.4050108@canterbury.ac.nz> <80A26DE4-9B7C-4C27-B8F2-68E25CC2B14A@gmail.com> <72ACCA97-8FF8-4F6B-85B7-7DB9CDCB6DBC@gmail.com> <50889743.8060402@canterbury.ac.nz> <616958EC-D087-4864-BC3E-31B4095B3DF1@gmail.com> <08DF94A5-CFE4-4EAC-8CA8-BF6AB7E7DEC5@gmail.com> , Message-ID: On 2012-10-25, at 11:28 AM, Steve Dower wrote: >>>> Mine is a quick hack to add 'gi_in_finally' property >>>> to generators and see how good/bad it is. >>> >>> I feel it's a code smell if you need to use this feature a lot. If you >>> need it rarely, well, use one of the existing work-arounds. >> But the feature isn't going to be used by users directly. It will be used >> only in scheduler implementations. Users will just write 'finally' blocks >> and they will work as expected. This just makes coroutines look and behave >> more like ordinary functions. 
Isn't it one of our goals--to make it >> convenient and reliable? > > I'm agree with the intent, but I'm more worried about the broadness of this approach. What happens in this case? > > try: > try: > yield some_op() > finally: > yield cleanup_that_raises_network_error() > except NetworkError: > # will we ever see this? > > Basically, I don't think we can handle the "don't raise" cases entirely automatically, though I'd like to be able to. We can. You can experiment with the approach--I've implemented it a bit differently and it proved to work. Now we're just talking about making this feature supported on the interpreter level. As for your example - I'm not sure what's the NetworkError is and how it relates to TimeoutError... But if you have something like this: try: try: yield some_op().with_timeout(0.1) finally: yield something_else() except TimeoutError: # Then everything would be just fine here. Look, it all the same as if you just drop yields. Generators already support 'finally' clause perfectly. - Yury From Steve.Dower at microsoft.com Thu Oct 25 17:43:01 2012 From: Steve.Dower at microsoft.com (Steve Dower) Date: Thu, 25 Oct 2012 15:43:01 +0000 Subject: [Python-ideas] Async API In-Reply-To: References: <71C95BE8-C0C7-45F5-A5E9-45A86EB3265B@gmail.com> <50870B41.6090504@rushing.nightmare.com> <2AA1835A-3A23-47BD-BB3F-C0D054BA69B4@gmail.com> <5087218E.8090805@rushing.nightmare.com> <50873551.5040207@canterbury.ac.nz> <50885DDC.4050108@canterbury.ac.nz> <80A26DE4-9B7C-4C27-B8F2-68E25CC2B14A@gmail.com> <72ACCA97-8FF8-4F6B-85B7-7DB9CDCB6DBC@gmail.com> <50889743.8060402@canterbury.ac.nz> <616958EC-D087-4864-BC3E-31B4095B3DF1@gmail.com> <08DF94A5-CFE4-4EAC-8CA8-BF6AB7E7DEC5@gmail.com> , Message-ID: >>>>> Mine is a quick hack to add 'gi_in_finally' property to generators >>>>> and see how good/bad it is. >>>> >>>> I feel it's a code smell if you need to use this feature a lot. If >>>> you need it rarely, well, use one of the existing work-arounds. >>> But the feature isn't going to be used by users directly. It will be >>> used only in scheduler implementations. Users will just write >>> 'finally' blocks and they will work as expected. This just makes >>> coroutines look and behave more like ordinary functions. Isn't it >>> one of our goals--to make it convenient and reliable? >> >> I'm agree with the intent, but I'm more worried about the broadness of this approach. What happens in this case? >> >> try: >> try: >> yield some_op() >> finally: >> yield cleanup_that_raises_network_error() >> except NetworkError: >> # will we ever see this? >> >> Basically, I don't think we can handle the "don't raise" cases entirely automatically, though I'd like to be able to. > >We can. You can experiment with the approach--I've implemented it a bit differently and it proved to work. Now we're just talking about making this feature supported on the interpreter level. > >As for your example - I'm not sure what's the NetworkError is and how it relates to TimeoutError... > >But if you have something like this: > > try: > try: > yield some_op().with_timeout(0.1) > finally: > yield something_else() > except TimeoutError: > # Then everything would be just fine here. > >Look, it all the same as if you just drop yields. Generators already support 'finally' clause perfectly. The type of the error is irrelevant - if something_else() might raise an exception that is expected, it won't be passed in because the scheduler is suppressing exceptions inside finally blocks. 
Or perhaps I've misunderstood the point of gi_in_finally? Cheers, Steve From yselivanov.ml at gmail.com Thu Oct 25 17:47:57 2012 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Thu, 25 Oct 2012 11:47:57 -0400 Subject: [Python-ideas] Async API In-Reply-To: References: <71C95BE8-C0C7-45F5-A5E9-45A86EB3265B@gmail.com> <50870B41.6090504@rushing.nightmare.com> <2AA1835A-3A23-47BD-BB3F-C0D054BA69B4@gmail.com> <5087218E.8090805@rushing.nightmare.com> <50873551.5040207@canterbury.ac.nz> <50885DDC.4050108@canterbury.ac.nz> <80A26DE4-9B7C-4C27-B8F2-68E25CC2B14A@gmail.com> <72ACCA97-8FF8-4F6B-85B7-7DB9CDCB6DBC@gmail.com> <50889743.8060402@canterbury.ac.nz> <616958EC-D087-4864-BC3E-31B4095B3DF1@gmail.com> <08DF94A5-CFE4-4EAC-8CA8-BF6AB7E7DEC5@gmail.com> , Message-ID: On 2012-10-25, at 11:43 AM, Steve Dower wrote: >>>>>> Mine is a quick hack to add 'gi_in_finally' property to generators >>>>>> and see how good/bad it is. >>>>> >>>>> I feel it's a code smell if you need to use this feature a lot. If >>>>> you need it rarely, well, use one of the existing work-arounds. >>>> But the feature isn't going to be used by users directly. It will be >>>> used only in scheduler implementations. Users will just write >>>> 'finally' blocks and they will work as expected. This just makes >>>> coroutines look and behave more like ordinary functions. Isn't it >>>> one of our goals--to make it convenient and reliable? >>> >>> I'm agree with the intent, but I'm more worried about the broadness of this approach. What happens in this case? >>> >>> try: >>> try: >>> yield some_op() >>> finally: >>> yield cleanup_that_raises_network_error() >>> except NetworkError: >>> # will we ever see this? >>> >>> Basically, I don't think we can handle the "don't raise" cases entirely automatically, though I'd like to be able to. >> >> We can. You can experiment with the approach--I've implemented it a bit differently and it proved to work. Now we're just talking about making this feature supported on the interpreter level. >> >> As for your example - I'm not sure what's the NetworkError is and how it relates to TimeoutError... >> >> But if you have something like this: >> >> try: >> try: >> yield some_op().with_timeout(0.1) >> finally: >> yield something_else() >> except TimeoutError: >> # Then everything would be just fine here. >> >> Look, it all the same as if you just drop yields. Generators already support 'finally' clause perfectly. > > The type of the error is irrelevant - if something_else() might raise an exception that is expected, it won't be passed in because the scheduler is suppressing exceptions inside finally blocks. Or perhaps I've misunderstood the point of gi_in_finally? The only thing scheduler will ever suppress--is its *own* intent to *interrupt* something (until `gi_in_finally` gets back to 0.) Every other exception must be propagated as usual, without even checking `gi_in_finally` flag. 
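In scheduler terms the rule could look roughly like this (Task, pending_timeout and deliver_timeout are made-up names; gi_in_finally is the property from the proof-of-concept patch above):

class Task:
    def __init__(self, coro):
        self.coro = coro
        self.pending_timeout = False

def deliver_timeout(task):
    # the scheduler's *own* interruption is the only thing ever deferred
    if getattr(task.coro, 'gi_in_finally', 0):
        task.pending_timeout = True     # retry once the finally block completes
    else:
        task.coro.throw(TimeoutError('operation timed out'))

def deliver_exception(task, exc):
    task.coro.throw(exc)                # ordinary errors are never suppressed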
- Yury From guido at python.org Thu Oct 25 17:58:54 2012 From: guido at python.org (Guido van Rossum) Date: Thu, 25 Oct 2012 08:58:54 -0700 Subject: [Python-ideas] Async API In-Reply-To: References: <71C95BE8-C0C7-45F5-A5E9-45A86EB3265B@gmail.com> <50870B41.6090504@rushing.nightmare.com> <2AA1835A-3A23-47BD-BB3F-C0D054BA69B4@gmail.com> <5087218E.8090805@rushing.nightmare.com> <50873551.5040207@canterbury.ac.nz> <50885DDC.4050108@canterbury.ac.nz> <80A26DE4-9B7C-4C27-B8F2-68E25CC2B14A@gmail.com> <72ACCA97-8FF8-4F6B-85B7-7DB9CDCB6DBC@gmail.com> <50889743.8060402@canterbury.ac.nz> <616958EC-D087-4864-BC3E-31B4095B3DF1@gmail.com> Message-ID: Yuri, please give up this particular issue (trying to patch CPython to record whether a generator is in a finally clause). I have failed to explain my reasons why I think it is a bad idea, but you haven't convinced me it's a good idea, and we have at least two decent work-arounds. So let me just use the release cycle as an argument: your patch is a new feature, 3.3 just came out, so it cannot be introduced until 3.4. I don't want to wait for that. -- --Guido van Rossum (python.org/~guido) From mark.hackett at metoffice.gov.uk Thu Oct 25 18:08:14 2012 From: mark.hackett at metoffice.gov.uk (Mark Hackett) Date: Thu, 25 Oct 2012 17:08:14 +0100 Subject: [Python-ideas] Enabling man page structure for python In-Reply-To: References: <201210251548.32082.mark.hackett@metoffice.gov.uk> Message-ID: <201210251708.14384.mark.hackett@metoffice.gov.uk> On Thursday 25 Oct 2012, Mike Graham wrote: > On Thu, Oct 25, 2012 at 10:48 AM, Mark Hackett > > wrote: > > Because it requires a programmed man page creation, Sphinx, pydoc et al > > haven't been really of any use, since they are YAML (Yet Another Markup > > Language) as far as I've been able to tell, not really able to allow > > runtime changes to reflect in document generation. I may have missed how, > > however... > > Use sphinx.builders.manpage.ManualPageBuilder to make a manpage with > sphinx. I wouldn't be shocked if other documentation systems had > something similar. > > I wouldn't be opposed to having argparse have some builtin or > third-party capability for generating manpages. I wouldn't use getopt > myself for anything but mimicing old, established, getopt-based > interfaces. argparse has a lot more functionality already and it's > more reasonable to expand it since it's a Python thing, not a > pre-established thing. > > Mike > Sphinx allows better formatting control and then translation to troff macros. But doesn't help encourage and self-write those man page sections. Certainly much of the code would be rendered obsolete by using Sphinx calls, but the production of the man page and reduction of duplication won't happen. For future inclusion, if it were to be included, argparse's method would be usable for defining the options. I don't know argparse benefits from having information about man pages in it, however, so a utility/class/method/include that can operate on what argparse requires to do the writing of the section(s) is entirely sensible. This may push argparse to include items that aren't used in itself, solely for documentation purposes. If some methodology for solving this duplication with man page content were put in future python releases, that same methodology could be written into home-built code by those who have not yet access to the latest python at their work, with at least the sop to their efforts that nobody using their suite will have to relearn another way of doing it. e.g. 
turning the argparse arguments into a getopt() call is pretty trivial if you don't have access to the argparse method. From yselivanov.ml at gmail.com Thu Oct 25 18:10:32 2012 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Thu, 25 Oct 2012 12:10:32 -0400 Subject: [Python-ideas] Async API In-Reply-To: References: <71C95BE8-C0C7-45F5-A5E9-45A86EB3265B@gmail.com> <50870B41.6090504@rushing.nightmare.com> <2AA1835A-3A23-47BD-BB3F-C0D054BA69B4@gmail.com> <5087218E.8090805@rushing.nightmare.com> <50873551.5040207@canterbury.ac.nz> <50885DDC.4050108@canterbury.ac.nz> <80A26DE4-9B7C-4C27-B8F2-68E25CC2B14A@gmail.com> <72ACCA97-8FF8-4F6B-85B7-7DB9CDCB6DBC@gmail.com> <50889743.8060402@canterbury.ac.nz> <616958EC-D087-4864-BC3E-31B4095B3DF1@gmail.com> Message-ID: On 2012-10-25, at 11:58 AM, Guido van Rossum wrote: > Yuri, please give up this particular issue (trying to patch CPython to > record whether a generator is in a finally clause). I have failed to > explain my reasons why I think it is a bad idea, but you haven't > convinced me it's a good idea, and we have at least two decent > work-arounds. So let me just use the release cycle as an argument: > your patch is a new feature, 3.3 just came out, so it cannot be > introduced until 3.4. I don't want to wait for that. OK, NP. One question: what do we actually want to get? What're the goals? - A specification (PEP?) of how to make stdlib more async-friendly? - To develop a separate library that may be included in the stdlib one day? - And what's your opinion on writing a PEP about making it possible to pass a custom socket-factory to stdlib objects? I'm (and I think it's not just me) a bit lost here, after reading 100s of emails on python-ideas. And I just want to know where to channel my energy and expertise ;) - Yury From storchaka at gmail.com Thu Oct 25 18:42:19 2012 From: storchaka at gmail.com (Serhiy Storchaka) Date: Thu, 25 Oct 2012 19:42:19 +0300 Subject: [Python-ideas] Enabling man page structure for python In-Reply-To: <201210251708.14384.mark.hackett@metoffice.gov.uk> References: <201210251548.32082.mark.hackett@metoffice.gov.uk> <201210251708.14384.mark.hackett@metoffice.gov.uk> Message-ID: On 25.10.12 19:08, Mark Hackett wrote: > But doesn't help encourage and self-write those man page sections. Try help2man. From merwok at netwok.org Thu Oct 25 19:25:29 2012 From: merwok at netwok.org (=?UTF-8?B?w4lyaWMgQXJhdWpv?=) Date: Thu, 25 Oct 2012 13:25:29 -0400 Subject: [Python-ideas] Enabling man page structure for python In-Reply-To: <201210251548.32082.mark.hackett@metoffice.gov.uk> References: <201210251548.32082.mark.hackett@metoffice.gov.uk> Message-ID: <50897609.4080808@netwok.org> Hi, See http://bugs.python.org/issue14102 ?argparse: add ability to create a man page? Cheers From Steve.Dower at microsoft.com Thu Oct 25 19:39:23 2012 From: Steve.Dower at microsoft.com (Steve Dower) Date: Thu, 25 Oct 2012 17:39:23 +0000 Subject: [Python-ideas] Async API In-Reply-To: References: <71C95BE8-C0C7-45F5-A5E9-45A86EB3265B@gmail.com> <50870B41.6090504@rushing.nightmare.com> <2AA1835A-3A23-47BD-BB3F-C0D054BA69B4@gmail.com> <5087218E.8090805@rushing.nightmare.com> <50873551.5040207@canterbury.ac.nz> <50885DDC.4050108@canterbury.ac.nz> <80A26DE4-9B7C-4C27-B8F2-68E25CC2B14A@gmail.com> <72ACCA97-8FF8-4F6B-85B7-7DB9CDCB6DBC@gmail.com> <50889743.8060402@canterbury.ac.nz> <616958EC-D087-4864-BC3E-31B4095B3DF1@gmail.com> Message-ID: > One question: what do we actually want to get? What're the goals? > > - A specification (PEP?) 
of how to make stdlib more async-friendly? > > - To develop a separate library that may be included in the stdlib one day? > > - And what's your opinion on writing a PEP about making it possible to pass a custom socket-factory to stdlib objects? > > I'm (and I think it's not just me) a bit lost here, after reading 100s of emails on python-ideas. And I just want to know where to channel my energy and expertise ;) It's not just you, I'm not entirely clear on what we expect to end up with either. My current view is that we'll get a PEP that defines a convention for user code and an interface for schedulers. Adding *_async() methods to the entire standard library could take a long time and should probably be divided up so we can have really experienced devs on particular areas (e.g. someone on Windows sockets, someone else on Linux sockets, etc.) and may need individual PEPs. My hope is that the first PEP provides a protocol for users to defer the rest of a task until after some/any operation has completed - I don't really want sockets/networking/files/threads/etc. to leak through at all, though these are all important use cases that need to be tried. This is the way I'm approaching it, so please let me know if I'm off the mark :) Cheers, Steve From guido at python.org Thu Oct 25 19:58:08 2012 From: guido at python.org (Guido van Rossum) Date: Thu, 25 Oct 2012 10:58:08 -0700 Subject: [Python-ideas] Async API In-Reply-To: References: <71C95BE8-C0C7-45F5-A5E9-45A86EB3265B@gmail.com> <50870B41.6090504@rushing.nightmare.com> <2AA1835A-3A23-47BD-BB3F-C0D054BA69B4@gmail.com> <5087218E.8090805@rushing.nightmare.com> <50873551.5040207@canterbury.ac.nz> <50885DDC.4050108@canterbury.ac.nz> <80A26DE4-9B7C-4C27-B8F2-68E25CC2B14A@gmail.com> <72ACCA97-8FF8-4F6B-85B7-7DB9CDCB6DBC@gmail.com> <50889743.8060402@canterbury.ac.nz> <616958EC-D087-4864-BC3E-31B4095B3DF1@gmail.com> Message-ID: On Thu, Oct 25, 2012 at 9:10 AM, Yury Selivanov wrote: > One question: what do we actually want to get? What're the goals? Good question. I'm still in the requirements gathering phase myself. > - A specification (PEP?) of how to make stdlib more async-friendly? That's one of the hopeful goals, but a lot of things need to be decided before we can start adapting the stdlib. It is also likely that this will be a process that will take several release (and may never finish completely). > - To develop a separate library that may be included in the stdlib > one day? That's one way I am pursuing and I hope others will too. > - And what's your opinion on writing a PEP about making it possible > to pass a custom socket-factory to stdlib objects? That sounds like it might be jumping to a specific solution. I agree that the stdlib often, unfortunately, couples classes too tightly, where a class that needs an instance of another class just instantiates that other class rather than having an instance passed in (at least as an option). We're doing better with files these days -- most APIs (that I can think of) that work with streams let you pass one in. So maybe you're on to something. Perhaps, as a step towards the exploration of this PEP, you could come up with a concrete list of modules and classes (or other API elements) that you think would benefit from being able to pass in a socket? Please start another thread -- python-ideas is fine. I will read it. > I'm (and I think it's not just me) a bit lost here, after reading 100s > of emails on python-ideas. 
And I just want to know where to channel my > energy and expertise ;) Totally understood. I'm overwhelmed myself by the vast array of options. Still, I have been writing some experimental code myself, and I am beginning to understand in which direction I'd like to move. I am thinking of having a strict separation between an event loop, a task scheduler, specific transports, and protocol implementations. - The event loop in turn separates into a component that knows how to poll for I/O (or other) events using the best mechanism available on the platform, and a part that manages callback functions -- these are closely tied together, but the idea is that the callback management part does not have to vary by platform, so only the I/O polling needs to be a platform-specific. Details subject to bikeshedding (I've only got something working on Linux and OSX so far). One of the requirements for this event loop is that it should be possible to run frameworks like Twisted or Tornado using an adapter to it, and it should also be possible for Twisted/Tornado/etc. to provide their own event loop (again via some kind of adaptation) to replace the default one. - For the task scheduler I am piling all my hopes on PEP-380, i.e. yield from. I have not found a single thing that is harder to do using this style than using the PEP-342 yield style, and I really don't like mixing the two up (despite what Steve Dower says :-). But I don't want the event loop interface to know about this at all -- howver the scheduler has to know about the event loop (at least its interface). I am currently refactoring my ideas in this area; I think I'll end up with a Task object that smells a bit like a Future, but represents a whole stack of generator invocations linked via yield-from, and which allows suspension of the entire stack at once; user code only needs to use Tasks when it wants to schedule multiple activities concurrently, not when it just wants to be able to yield. (This may be the core insight in favor of PEP 380.) - Transports (e.g. TCP): I feel like a newbie here. I know sockets pretty well, but the key is to introduce abstractions that let you easily replace a transport with a different one -- e.g. TCP vs. pipes vs. SSL. Twisted clearly has paved the way here -- even if we end up slicing the abstractions somewhat differently, the road to the optimal interface has to take the same road that Twisted took -- implement a simple transport using sockets, then add another transport, refactor the abstractions to share the commonalities and separate the differences, then try adding yet another transport, rinse and repeat. We should provide a bunch of common transports but also let people build new ones; however, there will probably be way fewer transport implementations than protocol implementations. - Protocols (e.g. HTTP): A protocol should ideally be able to work with any transport (though obviously some protocols require certain transport extensions -- hopefully we'll have a small hierarchy of abstract classes defining different transport styles and capabilities). We should provide a bunch of common protocols (e.g. a good HTTP client and server) but this is where users will most often be writing their own -- so the APIs used by protocol implementations must be documented especially well, the standard protocol implementations must be examples of excellent coding style, and the transport implementations should not let protocol implementations get away with undefined behavior. 
It would be useful to have explicit testing support too -- just like there's a WSGI validator, we could have a protocol validator that acts like a particularly picky transport. (I found this idea in a library written by Jim Fulton for Zope, I think it's zope.ngi. It's a valuable idea.) I think it's inevitable that the choice of using PEP-380 will be reflected in the abstract classes defining transports and protocols. Hopefully we will be able to bridge between the PEP-380 world and Twisted's world of Deferred somehow -- the event loop is one interface layer, but I think we can build adapters for the other levels as well (at least for transports). One final thought: async WSGI anyone? -- --Guido van Rossum (python.org/~guido) From yselivanov.ml at gmail.com Thu Oct 25 20:39:18 2012 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Thu, 25 Oct 2012 14:39:18 -0400 Subject: [Python-ideas] Async API In-Reply-To: References: <71C95BE8-C0C7-45F5-A5E9-45A86EB3265B@gmail.com> <50870B41.6090504@rushing.nightmare.com> <2AA1835A-3A23-47BD-BB3F-C0D054BA69B4@gmail.com> <5087218E.8090805@rushing.nightmare.com> <50873551.5040207@canterbury.ac.nz> <50885DDC.4050108@canterbury.ac.nz> <80A26DE4-9B7C-4C27-B8F2-68E25CC2B14A@gmail.com> <72ACCA97-8FF8-4F6B-85B7-7DB9CDCB6DBC@gmail.com> <50889743.8060402@canterbury.ac.nz> <616958EC-D087-4864-BC3E-31B4095B3DF1@gmail.com> Message-ID: Guido, Thank you for such a detailed and deep response. Lots of good thoughts to digest. One idea: the scope of the problem is enormously big. It may take months/years to synchronize all ideas and thoughts by just communicating ideas over mail list without a concrete thing and subject to discuss. How about you/we create a repository with a draft implementation of scheduler/io loop/coroutines engine and we simply start tweaking an discussing that particular design? That way people will see where to start the discussion, what's done, and some will even participate? The goal is not to write a production-quality software, but rather to have a common place to discuss/try things/benchmark etc. I'm not sure, but maybe places like bitbucket, where you can have a wiki, issues, and the actual code is a better place, than a mail-list. I also think that there's need to move concurrency-related discussions to a separate mail-list, as everything else on python-ideas is lost now. On 2012-10-25, at 1:58 PM, Guido van Rossum wrote: [...] >> - And what's your opinion on writing a PEP about making it possible >> to pass a custom socket-factory to stdlib objects? > > That sounds like it might be jumping to a specific solution. I agree > that the stdlib often, unfortunately, couples classes too tightly, > where a class that needs an instance of another class just > instantiates that other class rather than having an instance passed in > (at least as an option). We're doing better with files these days -- > most APIs (that I can think of) that work with streams let you pass > one in. So maybe you're on to something. Perhaps, as a step towards > the exploration of this PEP, you could come up with a concrete list of > modules and classes (or other API elements) that you think would > benefit from being able to pass in a socket? Please start another > thread -- python-ideas is fine. I will read it. OK, I will, in a week or two. Need some time for a research. [...] > - For the task scheduler I am piling all my hopes on PEP-380, i.e. > yield from. 
I have not found a single thing that is harder to do using > this style than using the PEP-342 yield style, and I really > don't like mixing the two up (despite what Steve Dower says :-). But I > don't want the event loop interface to know about this at all -- > howver the scheduler has to know about the event loop (at least its > interface). I am currently refactoring my ideas in this area; I think > I'll end up with a Task object that smells a bit like a Future, but > represents a whole stack of generator invocations linked via > yield-from, and which allows suspension of the entire stack at once; > user code only needs to use Tasks when it wants to schedule multiple > activities concurrently, not when it just wants to be able to yield. > (This may be the core insight in favor of PEP 380.) The only problem I have with PEP-380, is that to me it's not entirely clear when you should use 'yield' or 'yield from' (please correct me if I am wrong). I'll try to demonstrate it by example: class Socket: def sendall(self, payload): f = Future() IOLoop.sendall(payload, future=f) return f class SMTP: def send(self, s): ... # yield the returned future to the scheduler yield self.sock.sendall(s) ... # And later: s = SMTP() yield from s.send('spam') Is it (roughly) how you want it all to look like? I.e. using 'yield' to send a future/task to the scheduler, and 'yield from' to delegate? If I guessed correctly, and that's how you envision it, I have a question: What if you decide to refactor 'Socket.sendall' to be a coroutine? In that case you'd want users to call it 'yield from Socket.sendall', and not 'yield Socket.sendall'. Thank you, Yury From guido at python.org Thu Oct 25 21:25:06 2012 From: guido at python.org (Guido van Rossum) Date: Thu, 25 Oct 2012 12:25:06 -0700 Subject: [Python-ideas] Async API In-Reply-To: References: <71C95BE8-C0C7-45F5-A5E9-45A86EB3265B@gmail.com> <50870B41.6090504@rushing.nightmare.com> <2AA1835A-3A23-47BD-BB3F-C0D054BA69B4@gmail.com> <5087218E.8090805@rushing.nightmare.com> <50873551.5040207@canterbury.ac.nz> <50885DDC.4050108@canterbury.ac.nz> <80A26DE4-9B7C-4C27-B8F2-68E25CC2B14A@gmail.com> <72ACCA97-8FF8-4F6B-85B7-7DB9CDCB6DBC@gmail.com> <50889743.8060402@canterbury.ac.nz> <616958EC-D087-4864-BC3E-31B4095B3DF1@gmail.com> Message-ID: On Thu, Oct 25, 2012 at 11:39 AM, Yury Selivanov wrote: > Thank you for such a detailed and deep response. Lots of good thoughts > to digest. You're welcome. > One idea: the scope of the problem is enormously big. It may take > months/years to synchronize all ideas and thoughts by just communicating > ideas over mail list without a concrete thing and subject to discuss. > How about you/we create a repository with a draft implementation of > scheduler/io loop/coroutines engine and we simply start tweaking an > discussing that particular design? That way people will see where > to start the discussion, what's done, and some will even participate? > The goal is not to write a production-quality software, but rather to > have a common place to discuss/try things/benchmark etc. I'm not sure, > but maybe places like bitbucket, where you can have a wiki, issues, and > the actual code is a better place, than a mail-list. I am currently working on code. Steve Dower has also said he's going to write some code. I'm just not quite ready to show my code (I need to do a few more iterations on each component). As long as I can use Mercurial I'm happy; bitbucket or Google Code Hosting both work fine for me. 
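Side by side, the two calling styles in the Socket/SMTP example above look roughly like this ('IOLoop' and the scheduler primitives are placeholders carried over from that example, not a real API):

    # Style A: a plain 'yield' hands a Future to the scheduler.
    class SMTP_A:
        def send(self, s):
            yield self.sock.sendall(s)          # sendall() returns a Future

    # Style B ('yield from' all the way down): sendall() is itself a
    # coroutine, so every caller delegates to it with 'yield from'.
    class Socket_B:
        def sendall(self, payload):
            yield from IOLoop.sendall(payload)  # placeholder scheduler primitive

    class SMTP_B:
        def send(self, s):
            yield from self.sock.sendall(s)

    # With style B, later turning sendall() from a thin wrapper into a full
    # coroutine does not change its callers: it is always 'yield from'.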
> I also think that there's need to move concurrency-related discussions > to a separate mail-list, as everything else on python-ideas is lost > now. I don't have that problem. You are the one who started a new thread. :-) If you really want a new mailing list, you can set it up; I'd be happy to join, but my preference would be to stick it out here; I've seen too many specialized lists and SIGs dwindle after an initial burst of activity. [...] > The only problem I have with PEP-380, is that to me it's not entirely > clear when you should use 'yield' or 'yield from' (please correct me if > I am wrong). I'll try to demonstrate it by example: > > > class Socket: > def sendall(self, payload): > f = Future() > IOLoop.sendall(payload, future=f) > return f > > class SMTP: > def send(self, s): > ... > # yield the returned future to the scheduler > yield self.sock.sendall(s) > ... > > # And later: > s = SMTP() > yield from s.send('spam') > > Is it (roughly) how you want it all to look like? I.e. using 'yield' to > send a future/task to the scheduler, and 'yield from' to delegate? I think that's the style that Steve Dower prefers. Greg Ewing would rather see all public APIs use yield from, and reserve plain yield exclusively as an implementation detail of the scheduler. In my own experimental code I am using Greg's style and it is working out great. My main reason for taking a hard stance on this is that it would otherwise be too confusing for users -- should they use yield, yield from, or a plain call? I'd like to tell them "if it blocks, use yield from". BTW, if you haven't read Greg's introduction to this style, here it is -- worth reading! http://www.cosc.canterbury.ac.nz/greg.ewing/python/tasks/SimpleScheduler.html > If I guessed correctly, and that's how you envision it, I have a question: > What if you decide to refactor 'Socket.sendall' to be a coroutine? > In that case you'd want users to call it 'yield from Socket.sendall', and > not 'yield Socket.sendall'. That's why using yield from all the way is better! -- --Guido van Rossum (python.org/~guido) From yselivanov.ml at gmail.com Thu Oct 25 21:36:28 2012 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Thu, 25 Oct 2012 15:36:28 -0400 Subject: [Python-ideas] Async API In-Reply-To: References: <71C95BE8-C0C7-45F5-A5E9-45A86EB3265B@gmail.com> <50870B41.6090504@rushing.nightmare.com> <2AA1835A-3A23-47BD-BB3F-C0D054BA69B4@gmail.com> <5087218E.8090805@rushing.nightmare.com> <50873551.5040207@canterbury.ac.nz> <50885DDC.4050108@canterbury.ac.nz> <80A26DE4-9B7C-4C27-B8F2-68E25CC2B14A@gmail.com> <72ACCA97-8FF8-4F6B-85B7-7DB9CDCB6DBC@gmail.com> <50889743.8060402@canterbury.ac.nz> <616958EC-D087-4864-BC3E-31B4095B3DF1@gmail.com> Message-ID: <014812F1-B98E-4D50-B302-06244FFF44C2@gmail.com> On 2012-10-25, at 3:25 PM, Guido van Rossum wrote: > On Thu, Oct 25, 2012 at 11:39 AM, Yury Selivanov > wrote: [...] >> One idea: the scope of the problem is enormously big. It may take >> months/years to synchronize all ideas and thoughts by just communicating >> ideas over mail list without a concrete thing and subject to discuss. >> How about you/we create a repository with a draft implementation of >> scheduler/io loop/coroutines engine and we simply start tweaking an >> discussing that particular design? That way people will see where >> to start the discussion, what's done, and some will even participate? >> The goal is not to write a production-quality software, but rather to >> have a common place to discuss/try things/benchmark etc. 
I'm not sure, >> but maybe places like bitbucket, where you can have a wiki, issues, and >> the actual code is a better place, than a mail-list. > > I am currently working on code. Steve Dower has also said he's going > to write some code. I'm just not quite ready to show my code (I need > to do a few more iterations on each component). As long as I can use > Mercurial I'm happy; bitbucket or Google Code Hosting both work fine > for me. OK. Let's wait until we have a somewhat stable platform to work with. [...] >> Is it (roughly) how you want it all to look like? I.e. using 'yield' to >> send a future/task to the scheduler, and 'yield from' to delegate? > > I think that's the style that Steve Dower prefers. Greg Ewing would > rather see all public APIs use yield from, and reserve plain yield > exclusively as an implementation detail of the scheduler. In my own > experimental code I am using Greg's style and it is working out great. > My main reason for taking a hard stance on this is that it would > otherwise be too confusing for users -- should they use yield, yield > from, or a plain call? I'd like to tell them "if it blocks, use yield > from". > > BTW, if you haven't read Greg's introduction to this style, here it is > -- worth reading! > http://www.cosc.canterbury.ac.nz/greg.ewing/python/tasks/SimpleScheduler.html > >> If I guessed correctly, and that's how you envision it, I have a question: >> What if you decide to refactor 'Socket.sendall' to be a coroutine? >> In that case you'd want users to call it 'yield from Socket.sendall', and >> not 'yield Socket.sendall'. > > That's why using yield from all the way is better! Yes, that now makes sense! I'll definitely take a look at Greg's article. Thanks, Yury From tjreedy at udel.edu Thu Oct 25 22:39:40 2012 From: tjreedy at udel.edu (Terry Reedy) Date: Thu, 25 Oct 2012 16:39:40 -0400 Subject: [Python-ideas] Async API In-Reply-To: References: <71C95BE8-C0C7-45F5-A5E9-45A86EB3265B@gmail.com> <50889743.8060402@canterbury.ac.nz> <616958EC-D087-4864-BC3E-31B4095B3DF1@gmail.com> Message-ID: On 10/25/2012 12:10 PM, Yury Selivanov wrote: > - And what's your opinion on writing a PEP about making it possible > to pass a custom socket-factory to stdlib objects? I think this is probably a good idea quite aside from async issues. For one thing, it would make testing with a mock-socket class easier. Issues to decide: name of parameter (should be same for all socket using classes); keyword only? (ditto). I am not sure this needs a PEP. Most parameter additions are just tracker issues. But I would be worthwhile to decide on the details here first. -- Terry Jan Reedy From yselivanov.ml at gmail.com Thu Oct 25 22:51:09 2012 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Thu, 25 Oct 2012 16:51:09 -0400 Subject: [Python-ideas] Async API In-Reply-To: References: <71C95BE8-C0C7-45F5-A5E9-45A86EB3265B@gmail.com> <50889743.8060402@canterbury.ac.nz> <616958EC-D087-4864-BC3E-31B4095B3DF1@gmail.com> Message-ID: <15810675-361E-4143-A215-5B9418CAE264@gmail.com> On 2012-10-25, at 4:39 PM, Terry Reedy wrote: > On 10/25/2012 12:10 PM, Yury Selivanov wrote: > >> - And what's your opinion on writing a PEP about making it possible >> to pass a custom socket-factory to stdlib objects? > > I think this is probably a good idea quite aside from async issues. For one thing, it would make testing with a mock-socket class easier. Issues to decide: name of parameter (should be same for all socket using classes); keyword only? (ditto). 
Right, good catch on mocking sockets! As for the issues: I think that the parameter name should be the same/very consistent, and surely keyword-only. > I am not sure this needs a PEP. Most parameter additions are just tracker issues. But I would be worthwhile to decide on the details here first. We'll see. I'll start with a detailed post on python-ideas, and if the PEP looks like an overkill - I'd be glad to skip the PEP step. Thanks, Yury From tjreedy at udel.edu Thu Oct 25 23:06:17 2012 From: tjreedy at udel.edu (Terry Reedy) Date: Thu, 25 Oct 2012 17:06:17 -0400 Subject: [Python-ideas] Async API In-Reply-To: <15810675-361E-4143-A215-5B9418CAE264@gmail.com> References: <71C95BE8-C0C7-45F5-A5E9-45A86EB3265B@gmail.com> <50889743.8060402@canterbury.ac.nz> <616958EC-D087-4864-BC3E-31B4095B3DF1@gmail.com> <15810675-361E-4143-A215-5B9418CAE264@gmail.com> Message-ID: On 10/25/2012 4:51 PM, Yury Selivanov wrote: > On 2012-10-25, at 4:39 PM, Terry Reedy > wrote: > >> On 10/25/2012 12:10 PM, Yury Selivanov wrote: >> >>> - And what's your opinion on writing a PEP about making it >>> possible to pass a custom socket-factory to stdlib objects? >> >> I think this is probably a good idea quite aside from async issues. >> For one thing, it would make testing with a mock-socket class >> easier. Issues to decide: name of parameter (should be same for all >> socket using classes); keyword only? (ditto). > > Right, good catch on mocking sockets! > > As for the issues: I think that the parameter name should be the > same/very consistent, and surely keyword-only. I left out the following issue: should the argument be a socket-returning callable (a 'socket-factory' as you called it above) or an opened socket? For files, we variously pass file names to be used with the default opener, opened files, and file descriptors, but never an alternate opener (such as StringIO). One reason is the the user typically needs a handle on the file object in order to later retrieve the contents. I am not sure that the same applies to sockets. If I ask the ftp module to get or send a file, I should not ever need to see the socket used for the transport. -- Terry Jan Reedy From guido at python.org Thu Oct 25 23:12:57 2012 From: guido at python.org (Guido van Rossum) Date: Thu, 25 Oct 2012 14:12:57 -0700 Subject: [Python-ideas] Async API In-Reply-To: References: <71C95BE8-C0C7-45F5-A5E9-45A86EB3265B@gmail.com> <50889743.8060402@canterbury.ac.nz> <616958EC-D087-4864-BC3E-31B4095B3DF1@gmail.com> <15810675-361E-4143-A215-5B9418CAE264@gmail.com> Message-ID: Please start a new thread for this sub-topic. Note that for mocking, you won't need to pass in a socket object; you can just mock out socket.socket() directly using Michael Foord's all-singing all-dancing unittest.mock module (now in the Python 3 stdlib). On Thu, Oct 25, 2012 at 2:06 PM, Terry Reedy wrote: > On 10/25/2012 4:51 PM, Yury Selivanov wrote: >> >> On 2012-10-25, at 4:39 PM, Terry Reedy >> wrote: >> >>> On 10/25/2012 12:10 PM, Yury Selivanov wrote: >>> >>>> - And what's your opinion on writing a PEP about making it >>>> possible to pass a custom socket-factory to stdlib objects? >>> >>> >>> I think this is probably a good idea quite aside from async issues. >>> For one thing, it would make testing with a mock-socket class >>> easier. Issues to decide: name of parameter (should be same for all >>> socket using classes); keyword only? (ditto). >> >> >> Right, good catch on mocking sockets! 
>> >> As for the issues: I think that the parameter name should be the >> same/very consistent, and surely keyword-only. > > > I left out the following issue: should the argument be a socket-returning > callable (a 'socket-factory' as you called it above) or an opened socket? > > For files, we variously pass file names to be used with the default opener, > opened files, and file descriptors, but never an alternate opener (such as > StringIO). One reason is the the user typically needs a handle on the file > object in order to later retrieve the contents. > > I am not sure that the same applies to sockets. If I ask the ftp module to > get or send a file, I should not ever need to see the socket used for the > transport. > > -- > Terry Jan Reedy > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas -- --Guido van Rossum (python.org/~guido) From greg.ewing at canterbury.ac.nz Thu Oct 25 23:22:40 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 26 Oct 2012 10:22:40 +1300 Subject: [Python-ideas] Async API In-Reply-To: References: <71C95BE8-C0C7-45F5-A5E9-45A86EB3265B@gmail.com> <50873551.5040207@canterbury.ac.nz> <50885DDC.4050108@canterbury.ac.nz> <80A26DE4-9B7C-4C27-B8F2-68E25CC2B14A@gmail.com> <72ACCA97-8FF8-4F6B-85B7-7DB9CDCB6DBC@gmail.com> <50889743.8060402@canterbury.ac.nz> <616958EC-D087-4864-BC3E-31B4095B3DF1@gmail.com> <08DF94A5-CFE4-4EAC-8CA8-BF6AB7E7DEC5@gmail.com> Message-ID: <5089ADA0.8070706@canterbury.ac.nz> If the main concern in all of this is timeouts, it should be possible to address that without adding any more interpreter machinery. For example, when a timeout exception is thrown, whatever is responsible for that can flag the task as being in the process of handling a timeout, and refrain from initiating any more timeouts until that flag is cleared. -- Greg From greg.ewing at canterbury.ac.nz Thu Oct 25 23:30:07 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 26 Oct 2012 10:30:07 +1300 Subject: [Python-ideas] Async API In-Reply-To: References: <71C95BE8-C0C7-45F5-A5E9-45A86EB3265B@gmail.com> <72ACCA97-8FF8-4F6B-85B7-7DB9CDCB6DBC@gmail.com> <50889743.8060402@canterbury.ac.nz> <616958EC-D087-4864-BC3E-31B4095B3DF1@gmail.com> <08DF94A5-CFE4-4EAC-8CA8-BF6AB7E7DEC5@gmail.com> Message-ID: <5089AF5F.4070302@canterbury.ac.nz> Steve Dower wrote: > The type of the error is irrelevant - if something_else() might raise an > exception that is expected, it won't be passed in because the scheduler is > suppressing exceptions inside finally blocks. Or perhaps I've misunderstood the > point of gi_in_finally? IIUC, it's only *asynchronous* exceptions that would be blocked -- i.e. ones thrown in from a different task, or arising from an external event such as a timeout. An exception raised explicity by the task's own code would be unaffected. -- Greg From yselivanov.ml at gmail.com Fri Oct 26 01:50:52 2012 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Thu, 25 Oct 2012 19:50:52 -0400 Subject: [Python-ideas] docs.python.org Message-ID: Hi, I remember a discussion to make docs.python.org pointed to py3k docs by default. Are we still going to do that? - Yury -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From andrew.svetlov at gmail.com Fri Oct 26 08:29:23 2012 From: andrew.svetlov at gmail.com (Andrew Svetlov) Date: Fri, 26 Oct 2012 09:29:23 +0300 Subject: [Python-ideas] docs.python.org In-Reply-To: References: Message-ID: +1 for switching default On Fri, Oct 26, 2012 at 2:50 AM, Yury Selivanov wrote: > Hi, > > I remember a discussion to make docs.python.org pointed to py3k docs by > default. > > Are we still going to do that? > > - > Yury > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -- Thanks, Andrew Svetlov From ncoghlan at gmail.com Fri Oct 26 10:47:11 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 26 Oct 2012 18:47:11 +1000 Subject: [Python-ideas] docs.python.org In-Reply-To: References: Message-ID: Eventually, but not just yet :) Definitely by 3.4, but maybe earlier if it seems appropriate. Cheers, Nick. -- Sent from my phone, thus the relative brevity :) On Oct 26, 2012 9:51 AM, "Yury Selivanov" wrote: > Hi, > > I remember a discussion to make docs.python.org pointed to py3k docs by > default. > > Are we still going to do that? > > - > Yury > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From fetchinson at googlemail.com Fri Oct 26 11:22:30 2012 From: fetchinson at googlemail.com (Daniel Fetchinson) Date: Fri, 26 Oct 2012 11:22:30 +0200 Subject: [Python-ideas] list of reserved identifiers in program? Message-ID: Hi folks, Would it be a good idea to have a built-in list of strings containing the reserved identifiers of python such as 'assert', 'import', etc? The reason I think this would be useful is that whenever I write a class with user defined methods I always have to exclude the reserved keywords. So for instance myinstance.mymethod( ) is okay but myinstance.assert( ) is not. In these cases I use the convention myinstance._assert( ), etc. In order to test for these cases I hard code the keywords in a list and test from there. I take the list of keywords from http://docs.python.org/reference/lexical_analysis.html#keywords But what if these change in the future? So if I would have a built-in list containing all the keywords of the given interpreter version in question my life would be that much easier. What do you think? Cheers, Daniel -- Psss, psss, put it down! - http://www.cafepress.com/putitdown From christian at python.org Fri Oct 26 11:28:55 2012 From: christian at python.org (Christian Heimes) Date: Fri, 26 Oct 2012 11:28:55 +0200 Subject: [Python-ideas] list of reserved identifiers in program? In-Reply-To: References: Message-ID: Am 26.10.2012 11:22, schrieb Daniel Fetchinson: > Hi folks, > > Would it be a good idea to have a built-in list of strings containing > the reserved identifiers of python such as 'assert', 'import', etc? Something like http://hg.python.org/cpython/file/405932ddca9c/Lib/keyword.py ? :) Christian From fetchinson at googlemail.com Fri Oct 26 11:34:44 2012 From: fetchinson at googlemail.com (Daniel Fetchinson) Date: Fri, 26 Oct 2012 11:34:44 +0200 Subject: [Python-ideas] list of reserved identifiers in program? In-Reply-To: References: Message-ID: >> Would it be a good idea to have a built-in list of strings containing >> the reserved identifiers of python such as 'assert', 'import', etc? 
> > Something like > http://hg.python.org/cpython/file/405932ddca9c/Lib/keyword.py ? :) Exactly! Thanks a lot, I did not know about it before! Cheers, Daniel -- Psss, psss, put it down! - http://www.cafepress.com/putitdown From rob.cliffe at btinternet.com Fri Oct 26 11:33:47 2012 From: rob.cliffe at btinternet.com (Rob Cliffe) Date: Fri, 26 Oct 2012 10:33:47 +0100 Subject: [Python-ideas] list of reserved identifiers in program? In-Reply-To: References: Message-ID: <508A58FB.1000805@btinternet.com> On 26/10/2012 10:22, Daniel Fetchinson wrote: > Hi folks, > > Would it be a good idea to have a built-in list of strings containing > the reserved identifiers of python such as 'assert', 'import', etc? > > The reason I think this would be useful is that whenever I write a > class with user defined methods I always have to exclude the reserved > keywords. So for instance myinstance.mymethod( ) is okay but > myinstance.assert( ) is not. In these cases I use the convention > myinstance._assert( ), etc. In order to test for these cases I hard > code the keywords in a list and test from there. I take the list of > keywords from http://docs.python.org/reference/lexical_analysis.html#keywords > But what if these change in the future? > > So if I would have a built-in list containing all the keywords of the > given interpreter version in question my life would be that much > easier. > > What do you think? > > Cheers, > Daniel > > >>> import keyword >>> keyword.kwlist ['and', 'as', 'assert', 'break', 'class', 'continue', 'def', 'del', 'elif', 'else', 'except', 'exec', 'finally', 'for', 'from', 'global', 'if', 'import', 'in', 'is', 'lambda', 'not', 'or', 'pass', 'print', 'raise', 'return', 'try', 'while', 'with', 'yield'] >>> Rob Cliffe From steve at pearwood.info Fri Oct 26 11:42:25 2012 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 26 Oct 2012 20:42:25 +1100 Subject: [Python-ideas] list of reserved identifiers in program? In-Reply-To: References: Message-ID: <508A5B01.6060508@pearwood.info> On 26/10/12 20:22, Daniel Fetchinson wrote: > Hi folks, > > Would it be a good idea to have a built-in list of strings containing > the reserved identifiers of python such as 'assert', 'import', etc? > > The reason I think this would be useful is that whenever I write a > class with user defined methods I always have to exclude the reserved > keywords. So for instance myinstance.mymethod( ) is okay but > myinstance.assert( ) is not. In these cases I use the convention > myinstance._assert( ), etc. The usual convention is that leading underscores are private, and trailing underscores are used to avoid name clashes with reserved words. So myinstance.assert_ rather than myinstance._assert, which would be considered "private, do not use". -- Steven From mark.hackett at metoffice.gov.uk Fri Oct 26 11:58:51 2012 From: mark.hackett at metoffice.gov.uk (Mark Hackett) Date: Fri, 26 Oct 2012 10:58:51 +0100 Subject: [Python-ideas] list of reserved identifiers in program? In-Reply-To: <508A5B01.6060508@pearwood.info> References: <508A5B01.6060508@pearwood.info> Message-ID: <201210261058.51903.mark.hackett@metoffice.gov.uk> On Friday 26 Oct 2012, Steven D'Aprano wrote: > On 26/10/12 20:22, Daniel Fetchinson wrote: > > Hi folks, > > > > Would it be a good idea to have a built-in list of strings containing > > the reserved identifiers of python such as 'assert', 'import', etc? 
> > > > The reason I think this would be useful is that whenever I write a > > class with user defined methods I always have to exclude the reserved > > keywords. So for instance myinstance.mymethod( ) is okay but > > myinstance.assert( ) is not. In these cases I use the convention > > myinstance._assert( ), etc. > > The usual convention is that leading underscores are private, and > trailing underscores are used to avoid name clashes with reserved words. > > So myinstance.assert_ rather than myinstance._assert, which would be > considered "private, do not use". > One story I heard about development was a site that had included as an early C++ header had #define private public If users REALLY want to use a function you though was private, they will. Convention works just as well without having people go to extreme lengths to avoid it (where their use case makes it beneficial). From sturla at molden.no Fri Oct 26 12:27:03 2012 From: sturla at molden.no (Sturla Molden) Date: Fri, 26 Oct 2012 12:27:03 +0200 Subject: [Python-ideas] Async API In-Reply-To: <71C95BE8-C0C7-45F5-A5E9-45A86EB3265B@gmail.com> References: <71C95BE8-C0C7-45F5-A5E9-45A86EB3265B@gmail.com> Message-ID: <508A6577.3030804@molden.no> On 23.10.2012 21:33, Yury Selivanov wrote: > topics = FE.select([ > FE.publication_date, > FE.body, > FE.category, > (FE.creator, [ > (FE.creator.subject, [ > (gpi, [ > gpi.avatar > ]) > ]) > ]) > ]).filter(FE.publication_date< FE.publication_date.now(), > FE.category == self.category) Why use Python when you clearly want Java? Sturla From ned at nedbatchelder.com Fri Oct 26 13:55:06 2012 From: ned at nedbatchelder.com (Ned Batchelder) Date: Fri, 26 Oct 2012 07:55:06 -0400 Subject: [Python-ideas] docs.python.org In-Reply-To: References: Message-ID: <508A7A1A.5080206@nedbatchelder.com> On 10/25/2012 7:50 PM, Yury Selivanov wrote: > Hi, > > I remember a discussion to make docs.python.org > pointed to py3k docs by default. > > Are we still going to do that? > Before we do anything to make py3 the default, let's please provide a navigation bar that shows the version, and makes it easy to switch between versions? Py2 is still vastly more used. --Ned. > - > Yury > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas -------------- next part -------------- An HTML attachment was scrubbed... URL: From kristjan at ccpgames.com Fri Oct 26 14:03:32 2012 From: kristjan at ccpgames.com (=?iso-8859-1?Q?Kristj=E1n_Valur_J=F3nsson?=) Date: Fri, 26 Oct 2012 12:03:32 +0000 Subject: [Python-ideas] Async API In-Reply-To: <5087218E.8090805@rushing.nightmare.com> References: <71C95BE8-C0C7-45F5-A5E9-45A86EB3265B@gmail.com> <50870B41.6090504@rushing.nightmare.com> <2AA1835A-3A23-47BD-BB3F-C0D054BA69B4@gmail.com> <5087218E.8090805@rushing.nightmare.com> Message-ID: > -----Original Message----- > From: Python-ideas [mailto:python-ideas- > bounces+kristjan=ccpgames.com at python.org] On Behalf Of Sam Rushing > Sent: 23. okt?ber 2012 23:01 > To: Yury Selivanov > Cc: python-ideas at python.org > Subject: Re: [Python-ideas] Async API > > In shrapnel it is simply: > > coro.with_timeout (, , *args, **kwargs) > > Timeouts are caught thus: > > try: > coro.with_timeout (...) > except coro.TimeoutError: > ... 
Hi Sam ( I rember our talk about Shrapnel here at CCP some years back) , others: Jumping in here with some random stuff, in case anyone cares: A few years ago, I started trying to create a standard library for stackless python. we use it internally at ccp and it is open source, at https://bitbucket.org/krisvale/stacklesslib What it provides is 1) some utility classes for stackless (context managers mostly) but also synchronization primitives. 2) a basic "main" functionality: A main loop and an event scheduler 3) a set of replacement modules for threading/socket, etc 4) Monkeypatching tools, to monkeypatch in the replacements, and even run monkeypatched scripts. On the basis of the event scheduler, I also implemented timeout for socket.receive() functions. These used to allow e.g. timeouts for locking operations Timeouts are indeed implemented as exceptions raised. There are some minor race issues to think about but that's it. Notice the need for a stacklesslib.main module. The issue I have found with this sort of event driven model, is that composability suffers when everyone has their own idea about what a "main" loop should be. In threaded programming, the OS provides the main loop and the event scheduler. For something like Python, a whole application has to agree on what the main loop is, and how to schedule future events. Hopefully this discussion is an attempt to settle that in a standard manner. Cheers, Kristj?n p.s. stacklesslib is in a state of protracted and procrastinated development. I promised that I would fix it up at last pycon. Mostly I'm working on restructuring and making the main loop work more "out of the box." From jstpierre at mecheye.net Fri Oct 26 14:52:49 2012 From: jstpierre at mecheye.net (Jasper St. Pierre) Date: Fri, 26 Oct 2012 08:52:49 -0400 Subject: [Python-ideas] list of reserved identifiers in program? In-Reply-To: <201210261058.51903.mark.hackett@metoffice.gov.uk> References: <508A5B01.6060508@pearwood.info> <201210261058.51903.mark.hackett@metoffice.gov.uk> Message-ID: On Fri, Oct 26, 2012 at 5:58 AM, Mark Hackett wrote: > On Friday 26 Oct 2012, Steven D'Aprano wrote: >> On 26/10/12 20:22, Daniel Fetchinson wrote: >> > Hi folks, >> > >> > Would it be a good idea to have a built-in list of strings containing >> > the reserved identifiers of python such as 'assert', 'import', etc? >> > >> > The reason I think this would be useful is that whenever I write a >> > class with user defined methods I always have to exclude the reserved >> > keywords. So for instance myinstance.mymethod( ) is okay but >> > myinstance.assert( ) is not. In these cases I use the convention >> > myinstance._assert( ), etc. >> >> The usual convention is that leading underscores are private, and >> trailing underscores are used to avoid name clashes with reserved words. >> >> So myinstance.assert_ rather than myinstance._assert, which would be >> considered "private, do not use". >> > > One story I heard about development was a site that had included as an early > C++ header had > > #define private public > > If users REALLY want to use a function you though was private, they will. > > Convention works just as well without having people go to extreme lengths to > avoid it (where their use case makes it beneficial). I use it more as a guarantee. Any API that you mark as private can and will break in the future, and is not covered by any stability promise. 
If they really need to do some awfulness that my library can help out with, sure, they can hack up the private API, but they're on their own. > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas -- Jasper From yselivanov.ml at gmail.com Fri Oct 26 15:26:29 2012 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Fri, 26 Oct 2012 09:26:29 -0400 Subject: [Python-ideas] Async API In-Reply-To: <508A6577.3030804@molden.no> References: <71C95BE8-C0C7-45F5-A5E9-45A86EB3265B@gmail.com> <508A6577.3030804@molden.no> Message-ID: <313099F8-EB12-4739-BEFF-54133A40FFFC@gmail.com> On 2012-10-26, at 6:27 AM, Sturla Molden wrote: > On 23.10.2012 21:33, Yury Selivanov wrote: > > > topics = FE.select([ > > FE.publication_date, > > FE.body, > > FE.category, > > (FE.creator, [ > > (FE.creator.subject, [ > > (gpi, [ > > gpi.avatar > > ]) > > ]) > > ]) > > ]).filter(FE.publication_date< FE.publication_date.now(), > > FE.category == self.category) > > > Why use Python when you clearly want Java? And why do you think so? ;) - Yury From itamar at futurefoundries.com Fri Oct 26 16:03:56 2012 From: itamar at futurefoundries.com (Itamar Turner-Trauring) Date: Fri, 26 Oct 2012 10:03:56 -0400 Subject: [Python-ideas] Async API In-Reply-To: References: <71C95BE8-C0C7-45F5-A5E9-45A86EB3265B@gmail.com> <50870B41.6090504@rushing.nightmare.com> <2AA1835A-3A23-47BD-BB3F-C0D054BA69B4@gmail.com> <5087218E.8090805@rushing.nightmare.com> <50873551.5040207@canterbury.ac.nz> <50885DDC.4050108@canterbury.ac.nz> <80A26DE4-9B7C-4C27-B8F2-68E25CC2B14A@gmail.com> <72ACCA97-8FF8-4F6B-85B7-7DB9CDCB6DBC@gmail.com> Message-ID: On Thu, Oct 25, 2012 at 10:43 AM, Guido van Rossum wrote: > On Thu, Oct 25, 2012 at 4:46 AM, Laurens Van Houtven <_ at lvh.cc> wrote: > > Sorry, working really long hours these days; just wanted to chime in that > > yes, you can call transport.write with large strings, and the reactor > will > > do the right thing under the hood: loseConnection is the polite way of > > dropping a connection, which should wait for all pending writes to finish > > etc. > > This seems a decent enough pattern. It also makes it possible to use > one of these things as a substitute for a writable file object, so you > can e.g. use it as sys.stdout or the stream for a > logging.StreamHandler. > > Still, I wonder what happens if the socket/pipe/whatever that is > written to is very slow and the program produces too much data. Does > memory just balloon up, or is there some kind of throttling of the > writer? Or a buffer overflow exception? For a totally general solution > I would at least like to have the *option* of doing synchronous > writes. > > (I'm asking these questions because I'd like to copy this useful > pattern -- but I want to get the end cases right.) > There's a callback that gets called saying "your buffer is too full". This is the producer/consumer API people have referred to. It's not the best API in the world, and Glyph is working on an improvement, but that's the basic idea. The general move is towards a push API - push as much data as you can until you're told to stop. Tornado has a "tell me when this write is removed from the buffer and actually written to the socket" callback. This is more of a pull approach; you write some data, and get notified when you should write some more. --Itamar -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From itamar at futurefoundries.com Fri Oct 26 16:12:15 2012 From: itamar at futurefoundries.com (Itamar Turner-Trauring) Date: Fri, 26 Oct 2012 10:12:15 -0400 Subject: [Python-ideas] Async API In-Reply-To: References: <71C95BE8-C0C7-45F5-A5E9-45A86EB3265B@gmail.com> <50870B41.6090504@rushing.nightmare.com> <2AA1835A-3A23-47BD-BB3F-C0D054BA69B4@gmail.com> <5087218E.8090805@rushing.nightmare.com> <50873551.5040207@canterbury.ac.nz> <50885DDC.4050108@canterbury.ac.nz> <80A26DE4-9B7C-4C27-B8F2-68E25CC2B14A@gmail.com> <72ACCA97-8FF8-4F6B-85B7-7DB9CDCB6DBC@gmail.com> Message-ID: On Wed, Oct 24, 2012 at 7:43 PM, Guido van Rossum wrote: > > > I don't know, I hope someone with an expertise in Twisted can tell us. > > > > But I would imagine that they don't have this particular problem, as it > > should be related only to coroutines and schedulers that run them. I.e. > > it's a problem when you run some code and may interrupt it. And you > can't > > interrupt a plain python code that uses callbacks without yields and > > greenlets. > > Well, but in the Twisted world, if a cleanup callback requires more > blocking calls, it has to spawn more deferred callbacks. So I think > they *do* have the problem, unless they don't have a way at all to > constrain the total running time of an action involving cascading > callbacks. Also, they have inlineCallbacks which does use yield. > Deferreds don't do anything to prevent blocking. They're just a nice abstraction for callbacks. And yes, if you call 1000 functions that do lots of CPU in a row, that will keep other stuff from happening. However, consider how a timeout works: the event loop notices enough time has passed, and so calls some code that tells the Deferred to cancel its operation. So you're *not* adding the cancellation operations to the stack of the original operation, you're starting from the event loop. And so timeouts are just normal event loop world, where you need to be careful not to do to much CPU-intensive processing in any given call, and you can't call blocking system calls (except using a thread). Of course, you can't timeout a function that's just looping using CPU, or a blocking system call, and so code needs to be structured to deal with this, but that's a different issue. -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Fri Oct 26 17:25:06 2012 From: guido at python.org (Guido van Rossum) Date: Fri, 26 Oct 2012 08:25:06 -0700 Subject: [Python-ideas] Async API In-Reply-To: References: <71C95BE8-C0C7-45F5-A5E9-45A86EB3265B@gmail.com> <50870B41.6090504@rushing.nightmare.com> <2AA1835A-3A23-47BD-BB3F-C0D054BA69B4@gmail.com> <5087218E.8090805@rushing.nightmare.com> <50873551.5040207@canterbury.ac.nz> <50885DDC.4050108@canterbury.ac.nz> <80A26DE4-9B7C-4C27-B8F2-68E25CC2B14A@gmail.com> <72ACCA97-8FF8-4F6B-85B7-7DB9CDCB6DBC@gmail.com> Message-ID: On Fri, Oct 26, 2012 at 7:12 AM, Itamar Turner-Trauring wrote: > > > On Wed, Oct 24, 2012 at 7:43 PM, Guido van Rossum wrote: >> >> >> > I don't know, I hope someone with an expertise in Twisted can tell us. >> > >> > But I would imagine that they don't have this particular problem, as it >> > should be related only to coroutines and schedulers that run them. I.e. >> > it's a problem when you run some code and may interrupt it. And you >> > can't >> > interrupt a plain python code that uses callbacks without yields and >> > greenlets. 
>> >> Well, but in the Twisted world, if a cleanup callback requires more >> blocking calls, it has to spawn more deferred callbacks. So I think >> they *do* have the problem, unless they don't have a way at all to >> constrain the total running time of an action involving cascading >> callbacks. Also, they have inlineCallbacks which does use yield. > > > Deferreds don't do anything to prevent blocking. They're just a nice > abstraction for callbacks. And yes, if you call 1000 functions that do lots > of CPU in a row, that will keep other stuff from happening. > > However, consider how a timeout works: the event loop notices enough time > has passed, and so calls some code that tells the Deferred to cancel its > operation. So you're *not* adding the cancellation operations to the stack > of the original operation, you're starting from the event loop. And so > timeouts are just normal event loop world, where you need to be careful not > to do to much CPU-intensive processing in any given call, and you can't call > blocking system calls (except using a thread). > > Of course, you can't timeout a function that's just looping using CPU, or a > blocking system call, and so code needs to be structured to deal with this, > but that's a different issue. So, basically, it's just "after T seconds you get this second callback and it's up to you to deal with it"? I guess the timeout callback can inspect the state of the operation, and cancel any pending operations? Do you have a way to translate timeouts into exceptions in inlineCallbacks? If so, how is that working out? -- --Guido van Rossum (python.org/~guido) From _ at lvh.cc Fri Oct 26 17:40:41 2012 From: _ at lvh.cc (Laurens Van Houtven) Date: Fri, 26 Oct 2012 17:40:41 +0200 Subject: [Python-ideas] Async API In-Reply-To: References: <71C95BE8-C0C7-45F5-A5E9-45A86EB3265B@gmail.com> <50870B41.6090504@rushing.nightmare.com> <2AA1835A-3A23-47BD-BB3F-C0D054BA69B4@gmail.com> <5087218E.8090805@rushing.nightmare.com> <50873551.5040207@canterbury.ac.nz> <50885DDC.4050108@canterbury.ac.nz> <80A26DE4-9B7C-4C27-B8F2-68E25CC2B14A@gmail.com> <72ACCA97-8FF8-4F6B-85B7-7DB9CDCB6DBC@gmail.com> Message-ID: There's an exception for "a deferred has been cancelled". Cancelling a deferred fires that down its errback chain just like any exception. Since @inlineCallbacks works on top of deferreds, it magically works: >>> from twisted.internet import defer >>> d = defer.Deferred() >>> @defer.inlineCallbacks ... def f(): ... yield d ... >>> r = f() >>> r >>> d.cancel() >>> r >> On Fri, Oct 26, 2012 at 5:25 PM, Guido van Rossum wrote: > On Fri, Oct 26, 2012 at 7:12 AM, Itamar Turner-Trauring > wrote: > > > > > > On Wed, Oct 24, 2012 at 7:43 PM, Guido van Rossum > wrote: > >> > >> > >> > I don't know, I hope someone with an expertise in Twisted can tell us. > >> > > >> > But I would imagine that they don't have this particular problem, as > it > >> > should be related only to coroutines and schedulers that run them. > I.e. > >> > it's a problem when you run some code and may interrupt it. And you > >> > can't > >> > interrupt a plain python code that uses callbacks without yields and > >> > greenlets. > >> > >> Well, but in the Twisted world, if a cleanup callback requires more > >> blocking calls, it has to spawn more deferred callbacks. So I think > >> they *do* have the problem, unless they don't have a way at all to > >> constrain the total running time of an action involving cascading > >> callbacks. Also, they have inlineCallbacks which does use yield. 
> > > > > > Deferreds don't do anything to prevent blocking. They're just a nice > > abstraction for callbacks. And yes, if you call 1000 functions that do > lots > > of CPU in a row, that will keep other stuff from happening. > > > > However, consider how a timeout works: the event loop notices enough time > > has passed, and so calls some code that tells the Deferred to cancel its > > operation. So you're *not* adding the cancellation operations to the > stack > > of the original operation, you're starting from the event loop. And so > > timeouts are just normal event loop world, where you need to be careful > not > > to do to much CPU-intensive processing in any given call, and you can't > call > > blocking system calls (except using a thread). > > > > Of course, you can't timeout a function that's just looping using CPU, > or a > > blocking system call, and so code needs to be structured to deal with > this, > > but that's a different issue. > > So, basically, it's just "after T seconds you get this second callback > and it's up to you to deal with it"? I guess the timeout callback can > inspect the state of the operation, and cancel any pending operations? > > Do you have a way to translate timeouts into exceptions in > inlineCallbacks? If so, how is that working out? > > -- > --Guido van Rossum (python.org/~guido) > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -- cheers lvh -------------- next part -------------- An HTML attachment was scrubbed... URL: From _ at lvh.cc Fri Oct 26 17:52:49 2012 From: _ at lvh.cc (Laurens Van Houtven) Date: Fri, 26 Oct 2012 17:52:49 +0200 Subject: [Python-ideas] Async API In-Reply-To: References: <71C95BE8-C0C7-45F5-A5E9-45A86EB3265B@gmail.com> <50870B41.6090504@rushing.nightmare.com> <2AA1835A-3A23-47BD-BB3F-C0D054BA69B4@gmail.com> <5087218E.8090805@rushing.nightmare.com> <50873551.5040207@canterbury.ac.nz> <50885DDC.4050108@canterbury.ac.nz> <80A26DE4-9B7C-4C27-B8F2-68E25CC2B14A@gmail.com> <72ACCA97-8FF8-4F6B-85B7-7DB9CDCB6DBC@gmail.com> Message-ID: err, I suppose the missing bit there is that you'll probably want to: reactor.callLater(timeout, d.cancel) As opposed to calling d.cancel() directly. (That snippet was in bpython-urwid with the reactor running in the background, but I doubt it'd work well anywhere else outside of manholes :)) cheers lvh -------------- next part -------------- An HTML attachment was scrubbed... URL: From solipsis at pitrou.net Fri Oct 26 17:52:53 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 26 Oct 2012 17:52:53 +0200 Subject: [Python-ideas] Async API References: <71C95BE8-C0C7-45F5-A5E9-45A86EB3265B@gmail.com> <50889743.8060402@canterbury.ac.nz> <616958EC-D087-4864-BC3E-31B4095B3DF1@gmail.com> Message-ID: <20121026175253.1361628a@cosmocat> Le Thu, 25 Oct 2012 16:39:40 -0400, Terry Reedy a ?crit : > On 10/25/2012 12:10 PM, Yury Selivanov wrote: > > > - And what's your opinion on writing a PEP about making it possible > > to pass a custom socket-factory to stdlib objects? > > I think this is probably a good idea quite aside from async issues. I think it's a rather bad idea. It does not correspond to any real use case and will clutter the API with an additional parameter. Regards Antoine. 
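Putting the pieces above together -- a Deferred's cancel() firing CancelledError down its errback chain, reactor.callLater(timeout, d.cancel) driving the cancellation from the event loop, and @inlineCallbacks turning that failure into an exception at the yield -- a minimal sketch of the timeout pattern Laurens and Itamar describe might look like this. It assumes Twisted's twisted.internet.defer and reactor APIs; the function name run_with_timeout and the overall structure are illustrative only, not code from the thread:

    from twisted.internet import defer, reactor

    @defer.inlineCallbacks
    def run_with_timeout(d, timeout):
        # Drive cancellation from the event loop, not from the operation's
        # own call stack; if d has already fired by then, cancel() is a
        # no-op, so a completed operation never sees the timeout.
        delayed = reactor.callLater(timeout, d.cancel)
        try:
            result = yield d
        except defer.CancelledError:
            result = None            # the operation really did time out
        finally:
            if delayed.active():
                delayed.cancel()     # finished in time: drop the pending timer
        defer.returnValue(result)

Another @inlineCallbacks function would use it as result = yield run_with_timeout(some_operation(), 30), where some_operation() stands for any call returning a Deferred. The key point, as Itamar notes, is that the timeout machinery runs from the event loop rather than being pushed onto the stack of the operation being timed out.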
From ryan at ryanhiebert.com Fri Oct 26 18:31:52 2012 From: ryan at ryanhiebert.com (Ryan D Hiebert) Date: Fri, 26 Oct 2012 09:31:52 -0700 Subject: [Python-ideas] docs.python.org In-Reply-To: <508A7A1A.5080206@nedbatchelder.com> References: <508A7A1A.5080206@nedbatchelder.com> Message-ID: <4D342DE2-F8F9-48BA-BA9D-6141C3FA28B6@ryanhiebert.com> On Oct 26, 2012, at 4:55 AM, Ned Batchelder wrote: > Before we do anything to make py3 the default, let's please provide a navigation bar that shows the version, and makes it easy to switch between versions? Py2 is still vastly more used. +1 I can't count how many times I've been on the right page, but the wrong version, and need to switch. From guido at python.org Fri Oct 26 18:36:59 2012 From: guido at python.org (Guido van Rossum) Date: Fri, 26 Oct 2012 09:36:59 -0700 Subject: [Python-ideas] Async API In-Reply-To: References: <71C95BE8-C0C7-45F5-A5E9-45A86EB3265B@gmail.com> <50870B41.6090504@rushing.nightmare.com> <2AA1835A-3A23-47BD-BB3F-C0D054BA69B4@gmail.com> <5087218E.8090805@rushing.nightmare.com> <50873551.5040207@canterbury.ac.nz> <50885DDC.4050108@canterbury.ac.nz> <80A26DE4-9B7C-4C27-B8F2-68E25CC2B14A@gmail.com> <72ACCA97-8FF8-4F6B-85B7-7DB9CDCB6DBC@gmail.com> Message-ID: On Fri, Oct 26, 2012 at 8:52 AM, Laurens Van Houtven <_ at lvh.cc> wrote: > err, I suppose the missing bit there is that you'll probably want to: > > reactor.callLater(timeout, d.cancel) > > As opposed to calling d.cancel() directly. (That snippet was in > bpython-urwid with the reactor running in the background, but I doubt it'd > work well anywhere else outside of manholes :)) So I think that Yuri's original problem statement, transformed to Twisted+Deferred, might still apply, depending on how you implement it. Yuri essentially did this: def foobar(): # a task try: yield finally: # must clean up regardless of whether action succeeded or failed: yield He then calls this with a timeout, with the semantics that if the generator is blocked in a yield when the timeout arrives, that yield raises a Timeout exception (and at no other time is Timeout raised). The problem with this is that if the action succeeds within the timeout, but barely, there's a chance that the cleanup of a *successful* action receives the Timeout exception. Apparently this bit Yuri. I'm not sure how you'd model that using just Deferreds, but using inlineCallbacks it seems the same thing might happen. Using Deferreds, I assume there's a common pattern to implement this that doesn't have this problem. Of course, using coroutines, there is too -- spawn the cleanup as an independent task. -- --Guido van Rossum (python.org/~guido) From itamar at futurefoundries.com Fri Oct 26 18:57:16 2012 From: itamar at futurefoundries.com (Itamar Turner-Trauring) Date: Fri, 26 Oct 2012 12:57:16 -0400 Subject: [Python-ideas] Async API In-Reply-To: References: <71C95BE8-C0C7-45F5-A5E9-45A86EB3265B@gmail.com> <50870B41.6090504@rushing.nightmare.com> <2AA1835A-3A23-47BD-BB3F-C0D054BA69B4@gmail.com> <5087218E.8090805@rushing.nightmare.com> <50873551.5040207@canterbury.ac.nz> <50885DDC.4050108@canterbury.ac.nz> <80A26DE4-9B7C-4C27-B8F2-68E25CC2B14A@gmail.com> <72ACCA97-8FF8-4F6B-85B7-7DB9CDCB6DBC@gmail.com> Message-ID: On Fri, Oct 26, 2012 at 12:36 PM, Guido van Rossum wrote: > On Fri, Oct 26, 2012 at 8:52 AM, Laurens Van Houtven <_ at lvh.cc> wrote: > > err, I suppose the missing bit there is that you'll probably want to: > > > > reactor.callLater(timeout, d.cancel) > > > > As opposed to calling d.cancel() directly. 
(That snippet was in > > bpython-urwid with the reactor running in the background, but I doubt > it'd > > work well anywhere else outside of manholes :)) > > So I think that Yuri's original problem statement, transformed to > Twisted+Deferred, might still apply, depending on how you implement > it. Yuri essentially did this: > > def foobar(): # a task > try: > yield > finally: > # must clean up regardless of whether action succeeded or failed: > yield > > He then calls this with a timeout, with the semantics that if the > generator is blocked in a yield when the timeout arrives, that yield > raises a Timeout exception (and at no other time is Timeout raised). > The problem with this is that if the action succeeds within the > timeout, but barely, there's a chance that the cleanup of a > *successful* action receives the Timeout exception. Apparently this > bit Yuri. I'm not sure how you'd model that using just Deferreds, but > using inlineCallbacks it seems the same thing might happen. Using > Deferreds, I assume there's a common pattern to implement this that > doesn't have this problem. Of course, using coroutines, there is too > -- spawn the cleanup as an independent task. > If you call cancel() on a Deferred that already has a result, nothing happens. So you don't get a TimeoutError if the operation has succeeded (or failed some other way). This would also be true when using inlineCallbacks, so there's no issue. In general I'm not clear why this is a problem: in a single-threaded program only one thing happens at a time. Your code for triggering a timeout always has the option to check if the operation has succeeded, without worrying about race conditions. -------------- next part -------------- An HTML attachment was scrubbed... URL: From yselivanov.ml at gmail.com Fri Oct 26 19:06:14 2012 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Fri, 26 Oct 2012 13:06:14 -0400 Subject: [Python-ideas] Async API In-Reply-To: References: <71C95BE8-C0C7-45F5-A5E9-45A86EB3265B@gmail.com> <50870B41.6090504@rushing.nightmare.com> <2AA1835A-3A23-47BD-BB3F-C0D054BA69B4@gmail.com> <5087218E.8090805@rushing.nightmare.com> <50873551.5040207@canterbury.ac.nz> <50885DDC.4050108@canterbury.ac.nz> <80A26DE4-9B7C-4C27-B8F2-68E25CC2B14A@gmail.com> <72ACCA97-8FF8-4F6B-85B7-7DB9CDCB6DBC@gmail.com> Message-ID: <5A10D33B-39ED-453E-AA1A-DACF4BF4532C@gmail.com> On 2012-10-26, at 12:57 PM, Itamar Turner-Trauring wrote: > On Fri, Oct 26, 2012 at 12:36 PM, Guido van Rossum wrote: > On Fri, Oct 26, 2012 at 8:52 AM, Laurens Van Houtven <_ at lvh.cc> wrote: > > err, I suppose the missing bit there is that you'll probably want to: > > > > reactor.callLater(timeout, d.cancel) > > > > As opposed to calling d.cancel() directly. (That snippet was in > > bpython-urwid with the reactor running in the background, but I doubt it'd > > work well anywhere else outside of manholes :)) > > So I think that Yuri's original problem statement, transformed to > Twisted+Deferred, might still apply, depending on how you implement > it. Yuri essentially did this: > > def foobar(): # a task > try: > yield > finally: > # must clean up regardless of whether action succeeded or failed: > yield > > He then calls this with a timeout, with the semantics that if the > generator is blocked in a yield when the timeout arrives, that yield > raises a Timeout exception (and at no other time is Timeout raised). 
> The problem with this is that if the action succeeds within the > timeout, but barely, there's a chance that the cleanup of a > *successful* action receives the Timeout exception. Apparently this > bit Yuri. I'm not sure how you'd model that using just Deferreds, but > using inlineCallbacks it seems the same thing might happen. Using > Deferreds, I assume there's a common pattern to implement this that > doesn't have this problem. Of course, using coroutines, there is too > -- spawn the cleanup as an independent task. > > If you call cancel() on a Deferred that already has a result, nothing happens. So you don't get a TimeoutError if the operation has succeeded (or failed some other way). This would also be true when using inlineCallbacks, so there's no issue. > > In general I'm not clear why this is a problem: in a single-threaded program only one thing happens at a time. Your code for triggering a timeout always has the option to check if the operation has succeeded, without worrying about race conditions. Let me ask you a question that may help me and others to understand how inlineCallbacks works. If you write the following: def func(): try: yield one_thing() yield and_another() finally: yield and_finally() Then each of those yields will create a separate Deferred object, that 'inlineCallbacks' transparently dispatches via generator send/throw, right? And if you 'yield func()' the same will happen--'inlineCallbacks' will return a Deferred, that will have a result of 'func' execution? Thanks, Yury From guido at python.org Fri Oct 26 19:08:24 2012 From: guido at python.org (Guido van Rossum) Date: Fri, 26 Oct 2012 10:08:24 -0700 Subject: [Python-ideas] Async API In-Reply-To: References: <71C95BE8-C0C7-45F5-A5E9-45A86EB3265B@gmail.com> <50870B41.6090504@rushing.nightmare.com> <2AA1835A-3A23-47BD-BB3F-C0D054BA69B4@gmail.com> <5087218E.8090805@rushing.nightmare.com> <50873551.5040207@canterbury.ac.nz> <50885DDC.4050108@canterbury.ac.nz> <80A26DE4-9B7C-4C27-B8F2-68E25CC2B14A@gmail.com> <72ACCA97-8FF8-4F6B-85B7-7DB9CDCB6DBC@gmail.com> Message-ID: On Fri, Oct 26, 2012 at 9:57 AM, Itamar Turner-Trauring wrote: > > > On Fri, Oct 26, 2012 at 12:36 PM, Guido van Rossum wrote: >> >> On Fri, Oct 26, 2012 at 8:52 AM, Laurens Van Houtven <_ at lvh.cc> wrote: >> > err, I suppose the missing bit there is that you'll probably want to: >> > >> > reactor.callLater(timeout, d.cancel) >> > >> > As opposed to calling d.cancel() directly. (That snippet was in >> > bpython-urwid with the reactor running in the background, but I doubt >> > it'd >> > work well anywhere else outside of manholes :)) >> >> So I think that Yuri's original problem statement, transformed to >> Twisted+Deferred, might still apply, depending on how you implement >> it. Yuri essentially did this: >> >> def foobar(): # a task >> try: >> yield >> finally: >> # must clean up regardless of whether action succeeded or failed: >> yield >> >> He then calls this with a timeout, with the semantics that if the >> generator is blocked in a yield when the timeout arrives, that yield >> raises a Timeout exception (and at no other time is Timeout raised). >> The problem with this is that if the action succeeds within the >> timeout, but barely, there's a chance that the cleanup of a >> *successful* action receives the Timeout exception. Apparently this >> bit Yuri. I'm not sure how you'd model that using just Deferreds, but >> using inlineCallbacks it seems the same thing might happen. 
Using >> Deferreds, I assume there's a common pattern to implement this that >> doesn't have this problem. Of course, using coroutines, there is too >> -- spawn the cleanup as an independent task. > > > If you call cancel() on a Deferred that already has a result, nothing > happens. So you don't get a TimeoutError if the operation has succeeded (or > failed some other way). This would also be true when using inlineCallbacks, > so there's no issue. > > In general I'm not clear why this is a problem: in a single-threaded program > only one thing happens at a time. Your code for triggering a timeout always > has the option to check if the operation has succeeded, without worrying > about race conditions. But the example is not single-threaded (in the informal sense that you use it here). Each yield is a suspension point where other things can happen, and one of those things could be a cancellation of *this* task (because of a timeout or otherwise). The example would have to set some flag indicating it has a result after the first yield (i.e. before entering the finally, or at least before yielding in the finally clause). And the timeout callback would have to check this flag. This makes it slightly awkward to design a general-purpose timeout mechanism for tasks written in this style -- if you expect a timeout or cancellation you must protect your cleanup code from it by using some API. Anyway, no need to respond: I think I understand how Twisted deals with this, and translating that into the world of PEP 380 is not your job. -- --Guido van Rossum (python.org/~guido) From chris.jerdonek at gmail.com Fri Oct 26 19:13:01 2012 From: chris.jerdonek at gmail.com (Chris Jerdonek) Date: Fri, 26 Oct 2012 10:13:01 -0700 Subject: [Python-ideas] docs.python.org In-Reply-To: <4D342DE2-F8F9-48BA-BA9D-6141C3FA28B6@ryanhiebert.com> References: <508A7A1A.5080206@nedbatchelder.com> <4D342DE2-F8F9-48BA-BA9D-6141C3FA28B6@ryanhiebert.com> Message-ID: On Fri, Oct 26, 2012 at 9:31 AM, Ryan D Hiebert wrote: > On Oct 26, 2012, at 4:55 AM, Ned Batchelder wrote: >> Before we do anything to make py3 the default, let's please provide a navigation bar that shows the version, and makes it easy to switch between versions? Py2 is still vastly more used. > > +1 I can't count how many times I've been on the right page, but the wrong version, and need to switch. I believe the primary issue filed for this is here: http://bugs.python.org/issue8040 --Chris From jstpierre at mecheye.net Fri Oct 26 19:14:56 2012 From: jstpierre at mecheye.net (Jasper St. Pierre) Date: Fri, 26 Oct 2012 13:14:56 -0400 Subject: [Python-ideas] Async API In-Reply-To: <5A10D33B-39ED-453E-AA1A-DACF4BF4532C@gmail.com> References: <71C95BE8-C0C7-45F5-A5E9-45A86EB3265B@gmail.com> <50870B41.6090504@rushing.nightmare.com> <2AA1835A-3A23-47BD-BB3F-C0D054BA69B4@gmail.com> <5087218E.8090805@rushing.nightmare.com> <50873551.5040207@canterbury.ac.nz> <50885DDC.4050108@canterbury.ac.nz> <80A26DE4-9B7C-4C27-B8F2-68E25CC2B14A@gmail.com> <72ACCA97-8FF8-4F6B-85B7-7DB9CDCB6DBC@gmail.com> <5A10D33B-39ED-453E-AA1A-DACF4BF4532C@gmail.com> Message-ID: On Fri, Oct 26, 2012 at 1:06 PM, Yury Selivanov wrote: ... snip ... > Let me ask you a question that may help me and others to understand > how inlineCallbacks works. 
> > If you write the following: > > def func(): > try: > yield one_thing() > yield and_another() > finally: > yield and_finally() > > Then each of those yields will create a separate Deferred object, that > 'inlineCallbacks' transparently dispatches via generator send/throw, > right? one_thing() and and_another() and and_finally() should return Deferreds. inlineCallbacks gets those Deferreds, adds callbacks for completion/error, and resumes the generator at the appropriate time. You don't use the results from either Deferreds, so the values will just be thrown out. The yield/trampoline doesn't create any Deferreds for those operations itself. > And if you 'yield func()' the same will happen--'inlineCallbacks' will > return a Deferred, that will have a result of 'func' execution? You didn't decorate func with inlineCallbacks, but if you do, func() will give you Deferred. Note that func itself doesn't return any value. In Twisted land, this is done by defer.returnValue(), which uses exceptions to return a value to the trampoline. This maps well to the new sugar in 3.3. > Thanks, > Yury > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas -- Jasper From yselivanov.ml at gmail.com Fri Oct 26 19:36:54 2012 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Fri, 26 Oct 2012 13:36:54 -0400 Subject: [Python-ideas] docs.python.org In-Reply-To: <508A7A1A.5080206@nedbatchelder.com> References: <508A7A1A.5080206@nedbatchelder.com> Message-ID: On 2012-10-26, at 7:55 AM, Ned Batchelder wrote: > On 10/25/2012 7:50 PM, Yury Selivanov wrote: >> Hi, >> >> I remember a discussion to make docs.python.org pointed to py3k docs by default. >> >> Are we still going to do that? >> > > Before we do anything to make py3 the default, let's please provide a navigation bar that shows the version, and makes it easy to switch between versions? Py2 is still vastly more used. OK. I've just created an issue http://bugs.python.org/issue16331 with a working patch attached to it. Docs will look like this: https://dl.dropbox.com/u/21052/python/p3_doc_dd.png Please check it out! Thanks, Yury From christian at python.org Fri Oct 26 19:42:32 2012 From: christian at python.org (Christian Heimes) Date: Fri, 26 Oct 2012 19:42:32 +0200 Subject: [Python-ideas] docs.python.org In-Reply-To: References: Message-ID: Am 26.10.2012 01:50, schrieb Yury Selivanov: > Hi, > > I remember a discussion to make docs.python.org > pointed to py3k docs by default. > > Are we still going to do that? How about http://docs2.python.org for the latest stable version of Python 2.x and http://docs3.python.org for the latest stable of Python 3.x? The py3k docs traditionally point to the latest development version. Christian From yselivanov.ml at gmail.com Fri Oct 26 19:46:28 2012 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Fri, 26 Oct 2012 13:46:28 -0400 Subject: [Python-ideas] docs.python.org In-Reply-To: References: Message-ID: <514B805D-D53F-48CF-B52B-82546ED70A82@gmail.com> Christian, On 2012-10-26, at 1:42 PM, Christian Heimes wrote: > Am 26.10.2012 01:50, schrieb Yury Selivanov: >> Hi, >> >> I remember a discussion to make docs.python.org >> pointed to py3k docs by default. >> >> Are we still going to do that? > > How about http://docs2.python.org for the latest stable version of > Python 2.x and http://docs3.python.org for the latest stable of Python > 3.x? The py3k docs traditionally point to the latest development version. 
As for me, I like simple 'docs.python.org'. The rest of UX is easy to ensure with a little JS ;) Take a look at my patch attached to the issue 16331. - Yury From yselivanov.ml at gmail.com Fri Oct 26 19:49:58 2012 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Fri, 26 Oct 2012 13:49:58 -0400 Subject: [Python-ideas] Async API In-Reply-To: References: <71C95BE8-C0C7-45F5-A5E9-45A86EB3265B@gmail.com> <50870B41.6090504@rushing.nightmare.com> <2AA1835A-3A23-47BD-BB3F-C0D054BA69B4@gmail.com> <5087218E.8090805@rushing.nightmare.com> <50873551.5040207@canterbury.ac.nz> <50885DDC.4050108@canterbury.ac.nz> <80A26DE4-9B7C-4C27-B8F2-68E25CC2B14A@gmail.com> <72ACCA97-8FF8-4F6B-85B7-7DB9CDCB6DBC@gmail.com> <5A10D33B-39ED-453E-AA1A-DACF4BF4532C@gmail.com> Message-ID: <9049D9EC-8317-447D-AD9D-5CDAD072EF6A@gmail.com> On 2012-10-26, at 1:14 PM, Jasper St. Pierre wrote: > On Fri, Oct 26, 2012 at 1:06 PM, Yury Selivanov wrote: > > ... snip ... > >> Let me ask you a question that may help me and others to understand >> how inlineCallbacks works. >> >> If you write the following: >> >> def func(): >> try: >> yield one_thing() >> yield and_another() >> finally: >> yield and_finally() >> >> Then each of those yields will create a separate Deferred object, that >> 'inlineCallbacks' transparently dispatches via generator send/throw, >> right? > > one_thing() and and_another() and and_finally() should return > Deferreds. inlineCallbacks gets those Deferreds, adds callbacks for > completion/error, and resumes the generator at the appropriate time. > You don't use the results from either Deferreds, so the values will > just be thrown out. The yield/trampoline doesn't create any Deferreds > for those operations itself. > >> And if you 'yield func()' the same will happen--'inlineCallbacks' will >> return a Deferred, that will have a result of 'func' execution? > > You didn't decorate func with inlineCallbacks, but if you do, func() > will give you Deferred. Note that func itself doesn't return any > value. In Twisted land, this is done by defer.returnValue(), which > uses exceptions to return a value to the trampoline. This maps well to > the new sugar in 3.3. Right, I forgot to decorate the 'func' with 'inlineCallbacks'. If it is decorated, though, how can I invoke it with a timeout? - Yury From bruce at leapyear.org Fri Oct 26 19:56:25 2012 From: bruce at leapyear.org (Bruce Leban) Date: Fri, 26 Oct 2012 10:56:25 -0700 Subject: [Python-ideas] docs.python.org In-Reply-To: <514B805D-D53F-48CF-B52B-82546ED70A82@gmail.com> References: <514B805D-D53F-48CF-B52B-82546ED70A82@gmail.com> Message-ID: On Fri, Oct 26, 2012 at 10:46 AM, Yury Selivanov wrote: > > On 2012-10-26, at 1:42 PM, Christian Heimes wrote: > > >> Are we still going to do that? > > > > How about http://docs2.python.org for the latest stable version of > > Python 2.x and http://docs3.python.org for the latest stable of Python > > 3.x? The py3k docs traditionally point to the latest development version. > > As for me, I like simple 'docs.python.org'. > The rest of UX is easy to ensure with a little JS ;) > Take a look at my patch attached to the issue 16331. > > There are tons of links out there that would break if you switched to docs2 and docs3. JS is better. And it would accommodate a feature where a user can set a preference of what version of python documentation they want to see rather than defaulting to 2.7 or 3.x. 
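Back in the Async API subthread: to make Jasper's description of the inlineCallbacks mechanics concrete -- each yielded call hands back a Deferred, the decorator adds completion/error callbacks and resumes the generator when they fire, and defer.returnValue() is how the generator passes a result back -- here is a small sketch using the placeholder names from Yury's example. The defer.succeed() stand-ins are invented for the example; none of this is code from the thread:

    from twisted.internet import defer

    # Stand-ins for real asynchronous operations: already-fired Deferreds.
    def one_thing(): return defer.succeed(1)
    def and_another(): return defer.succeed(2)
    def and_finally(): return defer.succeed(None)

    @defer.inlineCallbacks
    def func():
        try:
            first = yield one_thing()       # each call returns a Deferred;
            second = yield and_another()    # inlineCallbacks resumes the
                                            # generator when it fires
        finally:
            yield and_finally()             # cleanup may itself be asynchronous
        defer.returnValue((first, second))  # a Python 2 generator cannot
                                            # "return first, second", so
                                            # returnValue() does it instead

func() then returns a Deferred that eventually carries (1, 2) -- or the failure, if any of the yielded Deferreds errbacked -- and, as Jasper says, that returned Deferred is the handle a caller would chain its own callbacks (or any timeout handling) onto.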
--- Bruce Follow me: http://www.twitter.com/Vroo http://www.vroospeak.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From christian at python.org Fri Oct 26 20:04:09 2012 From: christian at python.org (Christian Heimes) Date: Fri, 26 Oct 2012 20:04:09 +0200 Subject: [Python-ideas] docs.python.org In-Reply-To: References: <514B805D-D53F-48CF-B52B-82546ED70A82@gmail.com> Message-ID: <508AD099.9080606@python.org> Am 26.10.2012 19:56, schrieb Bruce Leban: > There are tons of links out there that would break if you switched to > docs2 and docs3. JS is better. And it would accommodate a feature where > a user can set a preference of what version of python documentation they > want to see rather than defaulting to 2.7 or 3.x. We can have the FQDNs additionally to http://docs.python.org and have them as mnemonic for the correct Python 2.x or 3.x docs. It's easy to create an Apache rewrite rule that redirects the user to the proper documents. RewriteCond %{HTTP_HOST} =docs3.python.org [NC] RewriteRule ^/(.*) http://docs.python.org/release/3.3.0/$1 [R=301,L] Yury, I'm not arguing against your JS UI -- I actually like it. I like to have both. Christian From yselivanov.ml at gmail.com Fri Oct 26 20:09:43 2012 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Fri, 26 Oct 2012 14:09:43 -0400 Subject: [Python-ideas] docs.python.org In-Reply-To: <508AD099.9080606@python.org> References: <514B805D-D53F-48CF-B52B-82546ED70A82@gmail.com> <508AD099.9080606@python.org> Message-ID: <6C373A1F-6AD5-4852-ACD2-9BA72876C006@gmail.com> On 2012-10-26, at 2:04 PM, Christian Heimes wrote: > Am 26.10.2012 19:56, schrieb Bruce Leban: >> There are tons of links out there that would break if you switched to >> docs2 and docs3. JS is better. And it would accommodate a feature where >> a user can set a preference of what version of python documentation they >> want to see rather than defaulting to 2.7 or 3.x. > > We can have the FQDNs additionally to http://docs.python.org and have > them as mnemonic for the correct Python 2.x or 3.x docs. It's easy to > create an Apache rewrite rule that redirects the user to the proper > documents. > > RewriteCond %{HTTP_HOST} =docs3.python.org [NC] > RewriteRule ^/(.*) http://docs.python.org/release/3.3.0/$1 [R=301,L] > > Yury, I'm not arguing against your JS UI -- I actually like it. I like > to have both. Thanks ;) The thing about 'doc2' & 'doc3' urls I don't like is that sooner or later users will use python 3. There is no future for python 2. That's why I think that it's better to have just one main doc destination that everybody knows, uses, and posts links to. Just my 2 cents. - Yury From breamoreboy at yahoo.co.uk Fri Oct 26 21:13:35 2012 From: breamoreboy at yahoo.co.uk (Mark Lawrence) Date: Fri, 26 Oct 2012 20:13:35 +0100 Subject: [Python-ideas] docs.python.org In-Reply-To: <6C373A1F-6AD5-4852-ACD2-9BA72876C006@gmail.com> References: <514B805D-D53F-48CF-B52B-82546ED70A82@gmail.com> <508AD099.9080606@python.org> <6C373A1F-6AD5-4852-ACD2-9BA72876C006@gmail.com> Message-ID: On 26/10/2012 19:09, Yury Selivanov wrote: > > The thing about 'doc2' & 'doc3' urls I don't like is that sooner or later > users will use python 3. There is no future for python 2. That's why > I think that it's better to have just one main doc destination that > everybody knows, uses, and posts links to. Just my 2 cents. > > - > Yury > I entirely agree with your sentiments. 
Complaints along the lines of "but library xyz isn't compatible with Python 3" should be met with a response from the Python community "what can we do to fix this situation". A very personnal preference, but I would like to see this happening rather than having people playing with new toys, like the Async API. YMMV. -- Cheers. Mark Lawrence. From chris.jerdonek at gmail.com Fri Oct 26 22:08:38 2012 From: chris.jerdonek at gmail.com (Chris Jerdonek) Date: Fri, 26 Oct 2012 13:08:38 -0700 Subject: [Python-ideas] docs.python.org In-Reply-To: References: <508A7A1A.5080206@nedbatchelder.com> Message-ID: On Fri, Oct 26, 2012 at 10:36 AM, Yury Selivanov wrote: > On 2012-10-26, at 7:55 AM, Ned Batchelder wrote: > >> On 10/25/2012 7:50 PM, Yury Selivanov wrote: >>> Hi, >>> >>> I remember a discussion to make docs.python.org pointed to py3k docs by default. >>> >>> Are we still going to do that? >>> >> >> Before we do anything to make py3 the default, let's please provide a navigation bar that shows the version, and makes it easy to switch between versions? Py2 is still vastly more used. > > OK. > > I've just created an issue http://bugs.python.org/issue16331 > with a working patch attached to it. Did you see my earlier response before this message that provides a link to an already-existing issue on this topic? --Chris From yselivanov.ml at gmail.com Fri Oct 26 22:11:35 2012 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Fri, 26 Oct 2012 16:11:35 -0400 Subject: [Python-ideas] docs.python.org In-Reply-To: References: <508A7A1A.5080206@nedbatchelder.com> Message-ID: <5D5BF27A-CF98-452F-8E69-3CBF92E3DF3C@gmail.com> On 2012-10-26, at 4:08 PM, Chris Jerdonek wrote: > On Fri, Oct 26, 2012 at 10:36 AM, Yury Selivanov > wrote: >> On 2012-10-26, at 7:55 AM, Ned Batchelder wrote: >> >>> On 10/25/2012 7:50 PM, Yury Selivanov wrote: >>>> Hi, >>>> >>>> I remember a discussion to make docs.python.org pointed to py3k docs by default. >>>> >>>> Are we still going to do that? >>>> >>> >>> Before we do anything to make py3 the default, let's please provide a navigation bar that shows the version, and makes it easy to switch between versions? Py2 is still vastly more used. >> >> OK. >> >> I've just created an issue http://bugs.python.org/issue16331 >> with a working patch attached to it. > > Did you see my earlier response before this message that provides a > link to an already-existing issue on this topic? Take a look at 16331. There is a open question there--which issue should be closed now ;) I apologize that I didn't find your issue (but I honestly tried to.) - Yury From albrecht.andi at gmail.com Fri Oct 26 22:22:22 2012 From: albrecht.andi at gmail.com (Andi Albrecht) Date: Fri, 26 Oct 2012 22:22:22 +0200 Subject: [Python-ideas] Enabling man page structure for python In-Reply-To: <50897609.4080808@netwok.org> References: <201210251548.32082.mark.hackett@metoffice.gov.uk> <50897609.4080808@netwok.org> Message-ID: Hi, On Thu, Oct 25, 2012 at 7:25 PM, ?ric Araujo wrote: > Hi, > > See http://bugs.python.org/issue14102 ?argparse: add ability to create a > man page? I've started to work on this issue some time ago. The starting point was a man page formatter based on optparse I wrote earlier. But I've encountered some problems since the output order of argparse formatters differ from what to expect on a man page. IIRC I saw the need to do some changes to the way how argparse formatters work but unfortunately got interrupted by other work. 
IMO adding a argparse formatter would the probably the right way to add man page support. There would even be no need to add this to stdlib then. Best regards, Andi > > Cheers > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas From g.brandl at gmx.net Fri Oct 26 23:09:34 2012 From: g.brandl at gmx.net (Georg Brandl) Date: Fri, 26 Oct 2012 23:09:34 +0200 Subject: [Python-ideas] docs.python.org In-Reply-To: References: Message-ID: Am 26.10.2012 19:42, schrieb Christian Heimes: > Am 26.10.2012 01:50, schrieb Yury Selivanov: >> Hi, >> >> I remember a discussion to make docs.python.org >> pointed to py3k docs by default. >> >> Are we still going to do that? > > How about http://docs2.python.org for the latest stable version of > Python 2.x and http://docs3.python.org for the latest stable of Python > 3.x? The py3k docs traditionally point to the latest development version. FWIW, docs3 already exists. Nobody is using it. Georg From andrew.svetlov at gmail.com Fri Oct 26 23:13:29 2012 From: andrew.svetlov at gmail.com (Andrew Svetlov) Date: Sat, 27 Oct 2012 00:13:29 +0300 Subject: [Python-ideas] docs.python.org In-Reply-To: References: Message-ID: Maybe just because it is a simple redirect? If python3 docs will be accessible as docs3.python.org instead docs.pycon.org/py3k people will start to use this address? On Sat, Oct 27, 2012 at 12:09 AM, Georg Brandl wrote: > Am 26.10.2012 19:42, schrieb Christian Heimes: >> Am 26.10.2012 01:50, schrieb Yury Selivanov: >>> Hi, >>> >>> I remember a discussion to make docs.python.org >>> pointed to py3k docs by default. >>> >>> Are we still going to do that? >> >> How about http://docs2.python.org for the latest stable version of >> Python 2.x and http://docs3.python.org for the latest stable of Python >> 3.x? The py3k docs traditionally point to the latest development version. > > FWIW, docs3 already exists. Nobody is using it. > > Georg > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas -- Thanks, Andrew Svetlov From g.brandl at gmx.net Fri Oct 26 23:21:36 2012 From: g.brandl at gmx.net (Georg Brandl) Date: Fri, 26 Oct 2012 23:21:36 +0200 Subject: [Python-ideas] docs.python.org In-Reply-To: References: Message-ID: I don't know, and I'm not fond of docs3, so I wouldn't make it more prominent. It was requested some years ago, and since it doesn't cause problems that way, I added it as a redirect. Georg Am 26.10.2012 23:13, schrieb Andrew Svetlov: > Maybe just because it is a simple redirect? > If python3 docs will be accessible as docs3.python.org instead > docs.pycon.org/py3k people will start to use this address? > > On Sat, Oct 27, 2012 at 12:09 AM, Georg Brandl wrote: >> Am 26.10.2012 19:42, schrieb Christian Heimes: >>> Am 26.10.2012 01:50, schrieb Yury Selivanov: >>>> Hi, >>>> >>>> I remember a discussion to make docs.python.org >>>> pointed to py3k docs by default. >>>> >>>> Are we still going to do that? >>> >>> How about http://docs2.python.org for the latest stable version of >>> Python 2.x and http://docs3.python.org for the latest stable of Python >>> 3.x? The py3k docs traditionally point to the latest development version. >> >> FWIW, docs3 already exists. Nobody is using it. 
>> >> Georg >> >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> http://mail.python.org/mailman/listinfo/python-ideas > > > From tjreedy at udel.edu Sat Oct 27 00:15:32 2012 From: tjreedy at udel.edu (Terry Reedy) Date: Fri, 26 Oct 2012 18:15:32 -0400 Subject: [Python-ideas] docs.python.org In-Reply-To: References: Message-ID: On 10/26/2012 4:47 AM, Nick Coghlan wrote: > Eventually, but not just yet :) I think it should already have been done. To not feature our latest release on the page where the latest releases have always before been featured is to say that it is somehow not a full production-ready release. -- Terry Jan Reedy From tjreedy at udel.edu Sat Oct 27 00:18:52 2012 From: tjreedy at udel.edu (Terry Reedy) Date: Fri, 26 Oct 2012 18:18:52 -0400 Subject: [Python-ideas] docs.python.org In-Reply-To: References: Message-ID: On 10/26/2012 5:09 PM, Georg Brandl wrote: > FWIW, docs3 already exists. Nobody is using it. I do, when I want to see the updated version instead of the older window's help version. -- Terry Jan Reedy From jeanpierreda at gmail.com Sat Oct 27 00:22:19 2012 From: jeanpierreda at gmail.com (Devin Jeanpierre) Date: Fri, 26 Oct 2012 18:22:19 -0400 Subject: [Python-ideas] docs.python.org In-Reply-To: References: Message-ID: On Fri, Oct 26, 2012 at 6:15 PM, Terry Reedy wrote: > I think it should already have been done. To not feature our latest release > on the page where the latest releases have always before been featured is to > say that it is somehow not a full production-ready release. There were times when 3.1 and 3.2 were the latest releases, and they have never been featured there. They were also production ready. -- Devin From tjreedy at udel.edu Sat Oct 27 00:22:31 2012 From: tjreedy at udel.edu (Terry Reedy) Date: Fri, 26 Oct 2012 18:22:31 -0400 Subject: [Python-ideas] docs.python.org In-Reply-To: <4D342DE2-F8F9-48BA-BA9D-6141C3FA28B6@ryanhiebert.com> References: <508A7A1A.5080206@nedbatchelder.com> <4D342DE2-F8F9-48BA-BA9D-6141C3FA28B6@ryanhiebert.com> Message-ID: > On Oct 26, 2012, at 4:55 AM, Ned Batchelder > wrote: >> Py2 is still vastly more used. Every time we release a new version, the previous version is vastly more used. But we have previously put the new docs on docs.p.org anyway. For beginners learning Python in classes, I suspect Python 3 is more used. (I certainly hope so ;-). -- Terry Jan Reedy From tjreedy at udel.edu Sat Oct 27 00:17:16 2012 From: tjreedy at udel.edu (Terry Reedy) Date: Fri, 26 Oct 2012 18:17:16 -0400 Subject: [Python-ideas] docs.python.org In-Reply-To: References: Message-ID: On 10/26/2012 1:42 PM, Christian Heimes wrote: > Am 26.10.2012 01:50, schrieb Yury Selivanov: >> Hi, >> >> I remember a discussion to make docs.python.org >> pointed to py3k docs by default. >> >> Are we still going to do that? > > How about http://docs2.python.org for the latest stable version of > Python 2.x and http://docs3.python.org for the latest stable of Python > 3.x? The py3k docs traditionally point to the latest development version. I thought we had half-way already decided on that, with the possibility of docs.python.org listing both. 
-- Terry Jan Reedy From cs at zip.com.au Sat Oct 27 00:46:44 2012 From: cs at zip.com.au (Cameron Simpson) Date: Sat, 27 Oct 2012 09:46:44 +1100 Subject: [Python-ideas] docs.python.org In-Reply-To: References: Message-ID: <20121026224644.GA28636@cskk.homeip.net> On 26Oct2012 18:22, Devin Jeanpierre wrote: | On Fri, Oct 26, 2012 at 6:15 PM, Terry Reedy wrote: | > I think it should already have been done. To not feature our latest release | > on the page where the latest releases have always before been featured is to | > say that it is somehow not a full production-ready release. | | There were times when 3.1 and 3.2 were the latest releases, and they | have never been featured there. They were also production ready. That's Terry's point: by not featuring them there we're insinuating that they were not production ready... -- Cameron Simpson You can blip it twice to clear the bore, But blip it thrice, and you've sinned once more. - Tom Warner From ned at nedbatchelder.com Sat Oct 27 00:58:53 2012 From: ned at nedbatchelder.com (Ned Batchelder) Date: Fri, 26 Oct 2012 18:58:53 -0400 Subject: [Python-ideas] docs.python.org In-Reply-To: References: <508A7A1A.5080206@nedbatchelder.com> <4D342DE2-F8F9-48BA-BA9D-6141C3FA28B6@ryanhiebert.com> Message-ID: <508B15AD.8050402@nedbatchelder.com> On 10/26/2012 6:22 PM, Terry Reedy wrote: >> On Oct 26, 2012, at 4:55 AM, Ned Batchelder >> wrote: >>> Py2 is still vastly more used. > > Every time we release a new version, the previous version is vastly > more used. But we have previously put the new docs on docs.p.org anyway. > I'm not suggesting having py2 as the default, just providing an easy way to get to them. I can read 2.7 docs and figure out how 2.6 works from them much more easily than I can read 3.3 docs and figure out how 2.7 works. > For beginners learning Python in classes, I suspect Python 3 is more > used. (I certainly hope so ;-). > Hmm, I don't think that's true just yet. --Ned. From jeanpierreda at gmail.com Sat Oct 27 01:36:03 2012 From: jeanpierreda at gmail.com (Devin Jeanpierre) Date: Fri, 26 Oct 2012 19:36:03 -0400 Subject: [Python-ideas] docs.python.org In-Reply-To: References: <508A7A1A.5080206@nedbatchelder.com> <4D342DE2-F8F9-48BA-BA9D-6141C3FA28B6@ryanhiebert.com> Message-ID: On Fri, Oct 26, 2012 at 6:22 PM, Terry Reedy wrote: > For beginners learning Python in classes, I suspect Python 3 is more used. > (I certainly hope so ;-). Instructors have their own kind of inertia. If they change major versions, they no longer get to reuse old slides, they have to rewrite old assignments, upgrade the automated test systems, and even just plain learn Python 3, which is a challenge of its own (albeit a small one.) Remember also that must non-research instructors are vastly overworked, and most research professors aren't exactly eager to burn lots of time in course preparation either, since their job is not to teach but to research. Considering that the differences between Python 2 and 3 are irrelevant for nearly any educational context, what's the payoff? The move is just something they have to do eventually because of bug support reasons, not something they are eager to do except out of some kind of enthusiasm (which, admittedly, instructors often have -- shiny is shiny.) My university (the University of Toronto) has switched to Python 3 for their new Coursera courses, because they involved writing material from scratch anyway, so might as well make it futureproof. 
The regular classes taught inside the university itself still use Python 2.7 (actually, they used Python 2.5 until the upgrade process a year and a half ago, which I was a part of), and other than the coursera work, as far as I am aware, no moves have been made to switch to Python 3. They might also switch to another language entirely instead. They used Racket in a couple of introductory courses last year, and I've heard good things from faculty and students involved. It's a more viable decision than it used to be, since a lot of work has to be done regardless to switch to Python 3, so the inertial reason of staying with Python is diminished. I don't think this will happen near-term, because they're still investing in Python, but it was nice to see that they were breaking out of their rut and trying new things. -- Devin From raymond.hettinger at gmail.com Sat Oct 27 04:09:29 2012 From: raymond.hettinger at gmail.com (Raymond Hettinger) Date: Fri, 26 Oct 2012 19:09:29 -0700 Subject: [Python-ideas] docs.python.org In-Reply-To: References: <508A7A1A.5080206@nedbatchelder.com> <4D342DE2-F8F9-48BA-BA9D-6141C3FA28B6@ryanhiebert.com> Message-ID: <491A1C24-F152-46D2-AA09-A24307511730@gmail.com> On Oct 26, 2012, at 4:36 PM, Devin Jeanpierre wrote: >> For beginners learning Python in classes, I suspect Python 3 is more used. >> (I certainly hope so ;-). I've been teaching quite a bit this year. Python 3 isn't being used at all (by any of my clients or by any of the other instructors I know who are teaching Python). Raymond -------------- next part -------------- An HTML attachment was scrubbed... URL: From massimo.dipierro at gmail.com Sat Oct 27 04:34:27 2012 From: massimo.dipierro at gmail.com (massimo.dipierro at gmail.com) Date: Fri, 26 Oct 2012 19:34:27 -0700 (PDT) Subject: [Python-ideas] docs.python.org Message-ID: <874981647.7869.1351305270158.JavaMail.seven@ap17.p0.sjc.7sys.net> An HTML attachment was scrubbed... URL: From tjreedy at udel.edu Sat Oct 27 04:55:33 2012 From: tjreedy at udel.edu (Terry Reedy) Date: Fri, 26 Oct 2012 22:55:33 -0400 Subject: [Python-ideas] docs.python.org In-Reply-To: <20121026224644.GA28636@cskk.homeip.net> References: <20121026224644.GA28636@cskk.homeip.net> Message-ID: On 10/26/2012 6:46 PM, Cameron Simpson wrote: > On 26Oct2012 18:22, Devin Jeanpierre wrote: > | On Fri, Oct 26, 2012 at 6:15 PM, Terry Reedy wrote: > | > I think it should already have been done. To not feature our latest release > | > on the page where the latest releases have always before been featured is to > | > say that it is somehow not a full production-ready release. > | > | There were times when 3.1 and 3.2 were the latest releases, and they > | have never been featured there. They were also production ready. 3.1 came out in between 2.6 and 2.7 and one could argue that it was still somewhat a trial version and that switching back and forth (2.6, 3.1, 2.7) would not be a good idea. 3.2 came out 8 months after 2.7. I would have made the switch then, but I acknowledge that one could argue that 2.7 had not had its 18-24 months in the sun, and that 3.2 still lacked 3rd party library support. > That's Terry's point: by not featuring them there we're insinuating that they > were not production ready... 3.3 is now out 29 months after 2.7, library support is much improved, and the new unicode implementation fixes most to almost all the remaining problems with unicode. It is a release we can be proud of and should promote as the latest and greatest Python version. 
-- Terry Jan Reedy From pydsigner at gmail.com Sat Oct 27 05:06:51 2012 From: pydsigner at gmail.com (Daniel Foerster) Date: Fri, 26 Oct 2012 22:06:51 -0500 Subject: [Python-ideas] Async API Message-ID: <7516094788412279153@unknownmsgid> So, are threads still an option? I feel that many of these problems with generators could be solved with threads. -------------- next part -------------- An HTML attachment was scrubbed... URL: From breamoreboy at yahoo.co.uk Sat Oct 27 05:13:57 2012 From: breamoreboy at yahoo.co.uk (Mark Lawrence) Date: Sat, 27 Oct 2012 04:13:57 +0100 Subject: [Python-ideas] docs.python.org In-Reply-To: References: <20121026224644.GA28636@cskk.homeip.net> Message-ID: On 27/10/2012 03:55, Terry Reedy wrote: > > 3.3 is now out 29 months after 2.7, library support is much improved, > and the new unicode implementation fixes most to almost all the > remaining problems with unicode. It is a release we can be proud of and > should promote as the latest and greatest Python version. > +1 -- Cheers. Mark Lawrence. From yselivanov.ml at gmail.com Sat Oct 27 06:46:07 2012 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Sat, 27 Oct 2012 00:46:07 -0400 Subject: [Python-ideas] docs.python.org In-Reply-To: References: <20121026224644.GA28636@cskk.homeip.net> Message-ID: <81687767-DC69-4097-877B-0C5AEB471D28@gmail.com> On 2012-10-26, at 10:55 PM, Terry Reedy wrote: > 3.3 is now out 29 months after 2.7, library support is much improved, and the new unicode implementation fixes most to almost all the remaining problems with unicode. It is a release we can be proud of and should promote as the latest and greatest Python version. I feel the same. On the one hand I understand position to keep 2.7 as default here and there, as it's currently used more; but on the other, here is what we have: - default documentation page - 2.7 - python.org home page: New to Python or choosing between Python 2 and Python 3? Read Python 2 or Python 3 - python.org downloads: -- The current production versions are Python 2.7.3 and Python 3.3.0. -- If you don't know which version to use, start with Python 2.7; more existing third party software is compatible with Python 2 than Python 3 right now. -- First links to downloads - 2.7 Isn't it too much of python 2? What is the impression after all of this? Python 2.7 is the current and recommended version. I think that the message should be clear, and after 3 years it's time to say that python 3 is always the preferred way. After all, people are not dumb, if they use python 2 they can go and download it, and they certainly can find docs for it as well. - Yury From ncoghlan at gmail.com Sat Oct 27 06:54:45 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 27 Oct 2012 14:54:45 +1000 Subject: [Python-ideas] Async API In-Reply-To: <7516094788412279153@unknownmsgid> References: <7516094788412279153@unknownmsgid> Message-ID: On Sat, Oct 27, 2012 at 1:06 PM, Daniel Foerster wrote: > So, are threads still an option? I feel that many of these problems with > generators could be solved with threads. No, because available operating systems can handle a few orders of magnitude more concurrent IO operations per process than they can handle threads per process. The idea of asynchronous programming is to only use additional threads when you really need them (i.e. for blocking synchronous operations with no asynchronous equivalent), thus providing support for a far greater number of concurrent operations per process than if you rely entirely on threads for concurrency. 
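By way of illustration, a rough sketch of that split -- the event loop serving the many concurrent I/O operations on a single thread, with a small pool reserved for calls that can only be made blocking -- assuming only concurrent.futures; the helper name read_file_async and the scheduler hinted at in the comments are invented for the example, not an existing API:

    from concurrent.futures import ThreadPoolExecutor

    pool = ThreadPoolExecutor(max_workers=4)   # threads only for blocking calls

    def read_file_async(path):
        # Plain file and directory access has no portable non-blocking
        # variant, so push it into the pool and hand back a Future the
        # event loop can watch, instead of tying up a thread per task.
        def blocking_read():
            with open(path, 'rb') as f:
                return f.read()
        return pool.submit(blocking_read)      # a concurrent.futures.Future

    # A coroutine-style task would then do something like
    #     data = yield read_file_async('/etc/hosts')
    # with the scheduler resuming it via future.add_done_callback(...),
    # while sockets and timers keep being served on the event-loop thread.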
Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From yselivanov.ml at gmail.com Sat Oct 27 07:07:07 2012 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Sat, 27 Oct 2012 01:07:07 -0400 Subject: [Python-ideas] Async API In-Reply-To: References: <7516094788412279153@unknownmsgid> Message-ID: <1768C1F7-71A9-4992-A10D-9478F3AE657A@gmail.com> Nick, On 2012-10-27, at 12:54 AM, Nick Coghlan wrote: > The idea of asynchronous programming is to > only use additional threads when you really need them (i.e. for > blocking synchronous operations with no asynchronous equivalent) BTW, you've touched a very interesting subject. There are lots of potentially blocking operations that are very hard to do asynchronously without threads. Such as working with directories or even reading from files (there is aio on linux, but I haven't seen a library that supports it.) It would be great if we can address those problems with the new async API. I.e. we can use threadpools where necessary, but make the public API look fancy and yield-from-able. Same approach that Joyent uses in their libuv. And when OSes gain more advanced and wide non-blocking support we can decrease use of threads. - Yury From ncoghlan at gmail.com Sat Oct 27 07:15:51 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 27 Oct 2012 15:15:51 +1000 Subject: [Python-ideas] docs.python.org In-Reply-To: <81687767-DC69-4097-877B-0C5AEB471D28@gmail.com> References: <20121026224644.GA28636@cskk.homeip.net> <81687767-DC69-4097-877B-0C5AEB471D28@gmail.com> Message-ID: On Sat, Oct 27, 2012 at 2:46 PM, Yury Selivanov wrote: > I think that the message should be clear, and after 3 years it's time to say > that python 3 is always the preferred way. After all, people are not dumb, > if they use python 2 they can go and download it, and they certainly can find > docs for it as well. The message is clear, but some people just don't like the current message: Python 2 is still the recommended default version for production systems and applications. - most hosting services (including Platform-as-a-Service providers with a Python option) only offer Python 2 - Fedora, RHEL and derivatives still require Python 2 for all their system utilities (Ubuntu at least has migrated their core system tools, but I don't know about Debian upstream) - Django does not yet have a released version that supports Python 3 (and even once 1.5 final is out the door, the Python 3 support is technically classed as experimental until 1.6) - graphics support in Python 3 is still a little sketchy in some regards, but clearly improving (pygame and various GUI libraries like pyside already work, pyglet has an alpha version, there's no PIL/Pillow release, but there are working forks [1]) I don't think the ecosystem is to the point where it makes sense to flip the switch just yet, but I do think it would be reasonable to define the ecosystem state where we *will* flip the switch. The two key missing pieces for me are: - a Django release with non-experimental Python 3 support (i.e. likely to happen with Django 1.6) - an official release of PIL (or Pillow) that supports Python 3 (Why do I include those, and not Twisted? Because if you're a capable enough developer to cope with Twisted, you're going to be able to cope with the move from 3.3 back to 2.7) Cheers, Nick. 
[1] http://mail.python.org/pipermail/image-sig/2012-October/007080.html -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From Steve.Dower at microsoft.com Sat Oct 27 07:41:41 2012 From: Steve.Dower at microsoft.com (Steve Dower) Date: Sat, 27 Oct 2012 05:41:41 +0000 Subject: [Python-ideas] Async API In-Reply-To: <1768C1F7-71A9-4992-A10D-9478F3AE657A@gmail.com> References: <7516094788412279153@unknownmsgid> , <1768C1F7-71A9-4992-A10D-9478F3AE657A@gmail.com> Message-ID: Yury Selivanov wrote: > It would be great if we can address those problems with the new > async API. I.e. we can use threadpools where necessary, but make > the public API look fancy and yield-from-able. Same approach that > Joyent uses in their libuv. And when OSes gain more advanced and > wide non-blocking support we can decrease use of threads. This certainly seems to be the plan, though I expect the details will be determined as libraries are updated to support the async API. As long as we ensure that the API itself can support event loops and operations using threads and other OS primitives, then we don't need to specify each and every one at this stage. My design (which I'm writing up now) puts most of the responsibility on the active scheduler, which should make it much easier to have different default schedulers for each platform (and maybe specialised ones that are optimised for more limited situations) while the operations themselves can be built out of existing Python functions. I'll post more details soon, but it basically allows schedulers to optionally support some operations (such as select() or Condition.wait()) 'natively', with the operation only having to implement a fallback (presumably on a thread pool). Cheers, Steve From yselivanov.ml at gmail.com Sat Oct 27 07:44:26 2012 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Sat, 27 Oct 2012 01:44:26 -0400 Subject: [Python-ideas] docs.python.org In-Reply-To: References: <20121026224644.GA28636@cskk.homeip.net> <81687767-DC69-4097-877B-0C5AEB471D28@gmail.com> Message-ID: On 2012-10-27, at 1:15 AM, Nick Coghlan wrote: > I don't think the ecosystem is to the point where it makes sense to > flip the switch just yet, but I do think it would be reasonable to > define the ecosystem state where we *will* flip the switch. The two > key missing pieces for me are: > - a Django release with non-experimental Python 3 support (i.e. likely > to happen with Django 1.6) > - an official release of PIL (or Pillow) that supports Python 3 One last thought (no need to reply if you disagree). What if it's all "chicken or the egg" problem? Maybe the right strategy is not to hide python 2 from everywhere and start actively promoting py3k, but to push it gradually? Start with docs switching to py3k by default. That shouldn't be harmful (and I hope that my docs theme patch will be accepted soon). A bit later, when Django finally adds python 3 support - change python.org homepage with a more prominent advice to use py3d. Etc. Thanks, Yury From ncoghlan at gmail.com Sat Oct 27 08:22:20 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 27 Oct 2012 16:22:20 +1000 Subject: [Python-ideas] docs.python.org In-Reply-To: References: <20121026224644.GA28636@cskk.homeip.net> <81687767-DC69-4097-877B-0C5AEB471D28@gmail.com> Message-ID: On Sat, Oct 27, 2012 at 3:44 PM, Yury Selivanov wrote: > Start with docs switching to py3k by default. That shouldn't be harmful > (and I hope that my docs theme patch will be accepted soon). 
Actually, there are at least a few very real harms that come from switching the docs over: 1. Many third party Python 2 tutorials include links to our docs. We can't magically reach out to those sites and update their links, so they will end up linking to Python 3 resources from Python 2 ones 2. It breaks links on sites like Stack Overflow and in mailing list archives and our own bug tracker, which currently link to the main docs to explain Python 2 behaviour 3. it completely breaks direct hyperlinks to names that no longer exist in Python 3 (even the ones that exist under new names). I'm actually wondering if docs.python.org should be updated *now* with a rewrite rule that redirects to a more explicit docs.python.org/2.x/ URL. At the moment, there is no easy way to get hold of a stable URL for the Python 2 docs, and nothing we can put in any advance announcement of a migration to say something like: "docs.python.org will switch to displaying the Python 3 documentation by default in June 2013. Please update any direct links that are intended to refer specifically to the Python 2 documentation by including a leading '/2.x/' in the path component of the URL. For example, 'http://docs.python.org/library/os' would become 'http://docs.python.org/2.x/library/os'. Between now and the migration in June 2013, affected links will be automatically redirected to the new stable Python 2.x URLs". So that's my concrete proposal: 1. We pick a date (June next year sounds about right) 2. We pick a stable URL prefix for the Python 2 docs (I vote "/2.x/") 3. We start redirecting affected pages immediately 4. We add a notice like the one above to the home page of the 2.7 docs, announce it on the PSF blog, announce it far and wide Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From donald.stufft at gmail.com Sat Oct 27 09:11:02 2012 From: donald.stufft at gmail.com (Donald Stufft) Date: Sat, 27 Oct 2012 03:11:02 -0400 Subject: [Python-ideas] docs.python.org In-Reply-To: References: <20121026224644.GA28636@cskk.homeip.net> <81687767-DC69-4097-877B-0C5AEB471D28@gmail.com> Message-ID: On Saturday, October 27, 2012 at 2:22 AM, Nick Coghlan wrote: > So that's my concrete proposal: > 1. We pick a date (June next year sounds about right) > 2. We pick a stable URL prefix for the Python 2 docs (I vote "/2.x/") > 3. We start redirecting affected pages immediately > 4. We add a notice like the one above to the home page of the 2.7 > docs, announce it on the PSF blog, announce it far and wide > +1 Can we change /py3k/ to /3.x/ and redirect the old one to match? Another idea is similar, but instead of doing /2.x/ always redirect the the root of docs.python.org to the latest production release, so right now /foo would redirect to /2.7/foo. This is even better for maintaining links to the actual resource people meant to link to. Could even include a header at the top of old versions saying that "You are currently viewing the docs for 2.5. Click here to view the docs for 2.7". -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From yselivanov.ml at gmail.com Sat Oct 27 09:17:41 2012 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Sat, 27 Oct 2012 03:17:41 -0400 Subject: [Python-ideas] docs.python.org In-Reply-To: References: <20121026224644.GA28636@cskk.homeip.net> <81687767-DC69-4097-877B-0C5AEB471D28@gmail.com> Message-ID: <533ADC11-627D-4251-A83C-1C96213AA42E@gmail.com> Nick, On 2012-10-27, at 2:22 AM, Nick Coghlan wrote: > I'm actually wondering if docs.python.org should be updated *now* with > a rewrite rule that redirects to a more explicit docs.python.org/2.x/ > URL. At the moment, there is no easy way to get hold of a stable URL > for the Python 2 docs, and nothing we can put in any advance > announcement of a migration to say something like: > > "docs.python.org will switch to displaying the Python 3 documentation > by default in June 2013. Please update any direct links that are > intended to refer specifically to the Python 2 documentation by > including a leading '/2.x/' in the path component of the URL. For > example, 'http://docs.python.org/library/os' would become > 'http://docs.python.org/2.x/library/os'. Between now and the migration > in June 2013, affected links will be automatically redirected to the > new stable Python 2.x URLs". > > So that's my concrete proposal: > 1. We pick a date (June next year sounds about right) > 2. We pick a stable URL prefix for the Python 2 docs (I vote "/2.x/") > 3. We start redirecting affected pages immediately > 4. We add a notice like the one above to the home page of the 2.7 > docs, announce it on the PSF blog, announce it far and wide Now that's a great plan! Big +1. A few comments: 1. I'd still vote for an earlier date, like February/March 2013 2. How about simple docs.pyhton.org/2 and docs.python.org/3 ? - Yury From bruce at leapyear.org Sat Oct 27 10:02:53 2012 From: bruce at leapyear.org (Bruce Leban) Date: Sat, 27 Oct 2012 01:02:53 -0700 Subject: [Python-ideas] docs.python.org In-Reply-To: References: <20121026224644.GA28636@cskk.homeip.net> <81687767-DC69-4097-877B-0C5AEB471D28@gmail.com> Message-ID: On Fri, Oct 26, 2012 at 11:22 PM, Nick Coghlan wrote: > On Sat, Oct 27, 2012 at 3:44 PM, Yury Selivanov > wrote: > > Start with docs switching to py3k by default. That shouldn't be harmful > > (and I hope that my docs theme patch will be accepted soon). > > Actually, there are at least a few very real harms that come from > switching the docs over: > 1. Many third party Python 2 tutorials include links to our docs. We > can't magically reach out to those sites and update their links, so > they will end up linking to Python 3 resources from Python 2 ones > And many tutorials are not intentionally version specific. > 2. It breaks links on sites like Stack Overflow and in mailing list > archives and our own bug tracker, which currently link to the main > docs to explain Python 2 behaviour > However, just because stack overflow and other sites link to 2.x docs doesn't mean that the user wants to read the 2.x docs. Scenario: I'm using 3.x, I go to stack overflow to find out how to do something. it links to the docs for the old version which is inaccurate for me. What I want is to be able to quickly get to the doc that's relevant to *my* version. > 3. 
it completely breaks direct hyperlinks to names that no longer > exist in Python 3 (even the ones that exist under new names) > Urls for things that have been renamed should redirect to the appropriate pages (whether docs on the new thing or an explanation of why this feature doesn't exist in that version). This should work both forwards (2.x feature renamed in 3.x) and backwards (3.x feature doesn't exist in 2.x) > So that's my concrete proposal: 1. We pick a date (June next year sounds about right) > 2. We pick a stable URL prefix for the Python 2 docs (I vote "/2.x/") > 3. We start redirecting affected pages immediately > 4. We add a notice like the one above to the home page of the 2.7 > docs, announce it on the PSF blog, announce it far and wide > I think this following proposal provides a better user experience. If you don't think this is better, why? 2. Pick a stable url for docs and a way for referrers to select the referenced version when that matters Examples: (a) http://docs.python.org/dev/library/os.html#os.walk -- displays user's preferred version (see below) (b) http://docs.python.org/dev/library/os.html?version=2.7#os.walk -- displays version 2.7 if user does not have user's preferred version (c) http://docs.python.org/dev/library/os.html?exactversion=2.7#os.walk-- always displays version 2.7 (discouraged unless talking specifically about that version) 3. All the pages have a version picker (as previously discussed). The dropdown to pick a version number could also have a way to pick the user's preferred version and save it in a cookie. 4. Make the version number more prominent in case (c) so user will be aware that they are not seeing their preferred version. --- Bruce -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Sat Oct 27 10:52:16 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 27 Oct 2012 18:52:16 +1000 Subject: [Python-ideas] docs.python.org In-Reply-To: References: <20121026224644.GA28636@cskk.homeip.net> <81687767-DC69-4097-877B-0C5AEB471D28@gmail.com> Message-ID: On Sat, Oct 27, 2012 at 6:02 PM, Bruce Leban wrote: > 2. Pick a stable url for docs and a way for referrers to select the > referenced version when that matters > > Examples: > (a) http://docs.python.org/dev/library/os.html#os.walk -- displays > user's preferred version (see below) > (b) http://docs.python.org/dev/library/os.html?version=2.7#os.walk -- > displays version 2.7 if user does not have user's preferred version > (c) http://docs.python.org/dev/library/os.html?exactversion=2.7#os.walk > -- always displays version 2.7 (discouraged unless talking specifically > about that version) We can already reference exact versions: http://docs.python.org/2.6/library/os http://docs.python.org/2.7/library/os http://docs.python.org/3.2/library/os http://docs.python.org/3.3/library/os For non-current releases, those will redirect to the appropriate release-specific URL, for the two current releases, it will redirect to the stable "latest release" URL. The problem is the current stable URLs for "latest Python 2" and "latest Python 3" are respectively: http://docs.python.org/library/os http://docs.python.org/py3k/library/os (despite comments elsewhere in the thread, "py3k" does *not* resolve to the dev docs - those use the "/dev/" prefix in the path component) It was suggested previously (i.e. 
more than a year ago) that it would be better if 2.x/3.x worked as expected so people could update their links appropriately, and I thought we had agreement on making that change, but I guess nobody with server access agreed that was the case (there's no ticket tracker currently in place for the python.org infrastructure). Note that I am deliberately limiting my suggestions to those which require nothing new in the docs theming, just updates to the URL handling in the web server. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From kristjan at ccpgames.com Sat Oct 27 12:27:40 2012 From: kristjan at ccpgames.com (=?iso-8859-1?Q?Kristj=E1n_Valur_J=F3nsson?=) Date: Sat, 27 Oct 2012 10:27:40 +0000 Subject: [Python-ideas] Async API In-Reply-To: <1768C1F7-71A9-4992-A10D-9478F3AE657A@gmail.com> References: <7516094788412279153@unknownmsgid> <1768C1F7-71A9-4992-A10D-9478F3AE657A@gmail.com> Message-ID: Yes, stacklesslib provides this functionality with the call_on_thread() api, which turns a blocking operation into a non-blocking one. This is also useful for cpu bound operations, btw. For example, in EVE, when we need to do file operations and zipping of local files, we do it using this api. K -----Original Message----- From: Python-ideas [mailto:python-ideas-bounces+kristjan=ccpgames.com at python.org] On Behalf Of Yury Selivanov Sent: 27. okt?ber 2012 05:07 To: Nick Coghlan Cc: Python-ideas at python.org Subject: Re: [Python-ideas] Async API It would be great if we can address those problems with the new async API. I.e. we can use threadpools where necessary, but make the public API look fancy and yield-from-able. Same approach that Joyent uses in their libuv. And when OSes gain more advanced and wide non-blocking support we can decrease use of threads. From mal at egenix.com Sat Oct 27 12:54:20 2012 From: mal at egenix.com (M.-A. Lemburg) Date: Sat, 27 Oct 2012 12:54:20 +0200 Subject: [Python-ideas] docs.python.org In-Reply-To: References: <20121026224644.GA28636@cskk.homeip.net> <81687767-DC69-4097-877B-0C5AEB471D28@gmail.com> Message-ID: <508BBD5C.305@egenix.com> On 27.10.2012 08:22, Nick Coghlan wrote: > So that's my concrete proposal: > 1. We pick a date (June next year sounds about right) > 2. We pick a stable URL prefix for the Python 2 docs (I vote "/2.x/") Why "/2.x/" and not just "/2/" ? > 3. We start redirecting affected pages immediately I think we should do the same for all Python 3 resources, i.e. have "/library/os.html" redirect to "/3/library/os.html" so that we don't run into the same problem again in the future. > 4. We add a notice like the one above to the home page of the 2.7 > docs, announce it on the PSF blog, announce it far and wide We also need a solution for URLs that exist for Python 2, but not for Python 3. Those should be redirected to the Python 2 resource automatically, e.g. URLs pointing to the Python 2 modules that were renamed in Python 3. BTW: Will you write up a PEP for this ? -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Oct 27 2012) >>> Python Projects, Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2012-10-29: PyCon DE 2012, Leipzig, Germany ... 2 days to go ::: Try our new mxODBC.Connect Python Database Interface for free ! 
:::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From p.f.moore at gmail.com Sat Oct 27 13:06:41 2012 From: p.f.moore at gmail.com (Paul Moore) Date: Sat, 27 Oct 2012 12:06:41 +0100 Subject: [Python-ideas] docs.python.org In-Reply-To: References: <20121026224644.GA28636@cskk.homeip.net> <81687767-DC69-4097-877B-0C5AEB471D28@gmail.com> Message-ID: On 27 October 2012 08:11, Donald Stufft wrote: > On Saturday, October 27, 2012 at 2:22 AM, Nick Coghlan wrote: > > So that's my concrete proposal: > 1. We pick a date (June next year sounds about right) > 2. We pick a stable URL prefix for the Python 2 docs (I vote "/2.x/") > 3. We start redirecting affected pages immediately > 4. We add a notice like the one above to the home page of the 2.7 > docs, announce it on the PSF blog, announce it far and wide > > +1 +1 also. > Can we change /py3k/ to /3.x/ and redirect the old one to match? +1. I'm sorry, but now that Python 3 is up to 3.3, and is a really solid version, the "py3k" name doesn't feel "official" enough. > Another idea is similar, but instead of doing /2.x/ always redirect the > the root of docs.python.org to the latest production release, so > right now /foo would redirect to /2.7/foo. This is even better for > maintaining links to the actual resource people meant to link > to. Could even include a header at the top of old versions saying that > "You are currently viewing the docs for 2.5. Click here to view the > docs for 2.7". -1. Certainly what I (and I suspect many others) usually care about is getting at the "Python 2" or "Python 3" documentation, not a specific version. Having the 2.7, 2.6 links is fine, but I don't *think* of myself as going to the 2.7 docs, but rather to the 2.x docs (as opposed to 3.x). The "New in x.y" annotations give me the history I need. And I think that's true of links as well - they would be to "python 2" or "python 3", not (normally) to a specific minor version. Paul. From dickinsm at gmail.com Sat Oct 27 13:34:44 2012 From: dickinsm at gmail.com (Mark Dickinson) Date: Sat, 27 Oct 2012 12:34:44 +0100 Subject: [Python-ideas] docs.python.org In-Reply-To: References: <20121026224644.GA28636@cskk.homeip.net> <81687767-DC69-4097-877B-0C5AEB471D28@gmail.com> Message-ID: On Sat, Oct 27, 2012 at 7:22 AM, Nick Coghlan wrote: > On Sat, Oct 27, 2012 at 3:44 PM, Yury Selivanov wrote: >> Start with docs switching to py3k by default. That shouldn't be harmful >> (and I hope that my docs theme patch will be accepted soon). > > Actually, there are at least a few very real harms that come from > switching the docs over: > 1. Many third party Python 2 tutorials include links to our docs. We > can't magically reach out to those sites and update their links, so > they will end up linking to Python 3 resources from Python 2 ones As a data point, MIT's '6.00x Introduction to Computer Science and Programming' EdX online course contains many links of the form "http://docs.python.org/library/...". I don't have exact numbers, but judging by the EPD download numbers we've been seeing there are definitely thousands of students, and probably tens of thousands, taking that course. Switching docs.python.org without a generous warning period would not be a good idea for those students. 
-- Mark From solipsis at pitrou.net Sat Oct 27 13:43:48 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sat, 27 Oct 2012 13:43:48 +0200 Subject: [Python-ideas] docs.python.org References: <20121026224644.GA28636@cskk.homeip.net> <81687767-DC69-4097-877B-0C5AEB471D28@gmail.com> Message-ID: <20121027134348.71ebaba1@pitrou.net> On Sat, 27 Oct 2012 12:06:41 +0100 Paul Moore wrote: > > > Another idea is similar, but instead of doing /2.x/ always redirect the > > the root of docs.python.org to the latest production release, so > > right now /foo would redirect to /2.7/foo. This is even better for > > maintaining links to the actual resource people meant to link > > to. Could even include a header at the top of old versions saying that > > "You are currently viewing the docs for 2.5. Click here to view the > > docs for 2.7". > > -1. Certainly what I (and I suspect many others) usually care about is > getting at the "Python 2" or "Python 3" documentation, not a specific > version. Having the 2.7, 2.6 links is fine, but I don't *think* of > myself as going to the 2.7 docs, but rather to the 2.x docs (as > opposed to 3.x). The "New in x.y" annotations give me the history I > need. And I think that's true of links as well - they would be to > "python 2" or "python 3", not (normally) to a specific minor version. I'm not sure why you're -1 about something which wouldn't affect you negatively. As you say yourself, the 2.7 docs have all the information you need about previous releases as well (because of the versionadded and versionchanged markers). *However*, the 2.6 and previous docs don't have information about useful stuff added in 2.7. And since 2.7 is the last in the 2.x line, I think it makes sense to reflect that explicitly in the redirections. Regards Antoine. From p.f.moore at gmail.com Sat Oct 27 14:21:59 2012 From: p.f.moore at gmail.com (Paul Moore) Date: Sat, 27 Oct 2012 13:21:59 +0100 Subject: [Python-ideas] docs.python.org In-Reply-To: <20121027134348.71ebaba1@pitrou.net> References: <20121026224644.GA28636@cskk.homeip.net> <81687767-DC69-4097-877B-0C5AEB471D28@gmail.com> <20121027134348.71ebaba1@pitrou.net> Message-ID: On 27 October 2012 12:43, Antoine Pitrou wrote: > On Sat, 27 Oct 2012 12:06:41 +0100 > Paul Moore wrote: >> >> > Another idea is similar, but instead of doing /2.x/ always redirect the >> > the root of docs.python.org to the latest production release, so >> > right now /foo would redirect to /2.7/foo. This is even better for >> > maintaining links to the actual resource people meant to link >> > to. Could even include a header at the top of old versions saying that >> > "You are currently viewing the docs for 2.5. Click here to view the >> > docs for 2.7". >> >> -1. Certainly what I (and I suspect many others) usually care about is >> getting at the "Python 2" or "Python 3" documentation, not a specific >> version. Having the 2.7, 2.6 links is fine, but I don't *think* of >> myself as going to the 2.7 docs, but rather to the 2.x docs (as >> opposed to 3.x). The "New in x.y" annotations give me the history I >> need. And I think that's true of links as well - they would be to >> "python 2" or "python 3", not (normally) to a specific minor version. > > I'm not sure why you're -1 about something which wouldn't affect you > negatively. As you say yourself, the 2.7 docs have all the information > you need about previous releases as well (because of the versionadded > and versionchanged markers). 
*However*, the 2.6 and previous docs don't > have information about useful stuff added in 2.7. Maybe I misunderstood. I was assuming that there would be no "2.x" link, only "2.7". That's what I'm against - I would prefer to use a generic 2.x link to get to the Python 2 docs if I needed them (just as I use docs.python.org at the moment). My -1 was too strong though, make that a -0 (and a "don't care" if there will be a 2.x link as well as the explicit ones). > And since 2.7 is the last in the 2.x line, I think it makes sense to > reflect that explicitly in the redirections. I'm not against an explicit 2.7 link - we have that already, don't we? Paul From mal at egenix.com Sat Oct 27 15:27:53 2012 From: mal at egenix.com (M.-A. Lemburg) Date: Sat, 27 Oct 2012 15:27:53 +0200 Subject: [Python-ideas] docs.python.org In-Reply-To: References: <20121026224644.GA28636@cskk.homeip.net> <81687767-DC69-4097-877B-0C5AEB471D28@gmail.com> Message-ID: <508BE159.2030407@egenix.com> On 27.10.2012 13:34, Mark Dickinson wrote: > On Sat, Oct 27, 2012 at 7:22 AM, Nick Coghlan wrote: >> On Sat, Oct 27, 2012 at 3:44 PM, Yury Selivanov wrote: >>> Start with docs switching to py3k by default. That shouldn't be harmful >>> (and I hope that my docs theme patch will be accepted soon). >> >> Actually, there are at least a few very real harms that come from >> switching the docs over: >> 1. Many third party Python 2 tutorials include links to our docs. We >> can't magically reach out to those sites and update their links, so >> they will end up linking to Python 3 resources from Python 2 ones > > As a data point, MIT's '6.00x Introduction to Computer Science and > Programming' EdX online course contains many links of the form > "http://docs.python.org/library/...". I don't have exact numbers, but > judging by the EPD download numbers we've been seeing there are > definitely thousands of students, and probably tens of thousands, > taking that course. Switching docs.python.org without a generous > warning period would not be a good idea for those students. Wouldn't it be possible to leave the non-versioned URLs redirecting to the Python 2 versions for say another 5 years and instead have the base URL http://docs.python.org/ provide links to either the Python 2 or 3 version (perhaps even listing the various available minor versions) ? That would avoid the issue of having existing course material on the web fail to work after just one year. At PyCon UK we discussed these issues with teachers and people interested in getting Python on the UK teaching plan. Their main concern was that text books and course material have a much longer life period than just 18 months. For them it's very important to have a stable release of both Python and its documentation that remains valid for at least 5 years. I hope that Python 3.x has stabilized enough now with the 3.3 release that it can become the basis for such materials. In any case, if we want Python 3 to be picked up in such environments, we cannot easily go about breaking things like URLs to documentation and will have to settle on a stable approach soon. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Oct 27 2012) >>> Python Projects, Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ 2012-10-29: PyCon DE 2012, Leipzig, Germany ... 
2 days to go ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From ncoghlan at gmail.com Sat Oct 27 16:40:46 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 28 Oct 2012 00:40:46 +1000 Subject: [Python-ideas] docs.python.org In-Reply-To: <508BBD5C.305@egenix.com> References: <20121026224644.GA28636@cskk.homeip.net> <81687767-DC69-4097-877B-0C5AEB471D28@gmail.com> <508BBD5C.305@egenix.com> Message-ID: On Sat, Oct 27, 2012 at 8:54 PM, M.-A. Lemburg wrote: > On 27.10.2012 08:22, Nick Coghlan wrote: >> So that's my concrete proposal: >> 1. We pick a date (June next year sounds about right) >> 2. We pick a stable URL prefix for the Python 2 docs (I vote "/2.x/") > > Why "/2.x/" and not just "/2/" ? I find the /2/ vs /3/ too easy to miss in the middle of a full URL, whereas I find the extra space to the right of the number in /2.x/ vs /3.x/ makes them easier to separate. However, in writing up the PEP, I discovered it was annoyingly ambiguous whether "/2.x/" specifically meant that URL, or whether it meant "/2.7/" and friends, so I switched to the shorter form. >> 3. We start redirecting affected pages immediately > > I think we should do the same for all Python 3 resources, i.e. > have "/library/os.html" redirect to "/3/library/os.html" so that > we don't run into the same problem again in the future. In writing up the PEP, I rediscovered an old proposal of mine to avoid breaking deep links by simply do a "documented deprecation" of unqualified deep links, but otherwise leaving them pointing to Python 2. Only the default landing page would be switched to Python 3. Since that approach avoids a *lot* of issues, that's what I ended writing up. >> 4. We add a notice like the one above to the home page of the 2.7 >> docs, announce it on the PSF blog, announce it far and wide > > We also need a solution for URLs that exist for Python 2, but > not for Python 3. Those should be redirected to the Python 2 > resource automatically, e.g. URLs pointing to the Python 2 modules > that were renamed in Python 3. > > BTW: Will you write up a PEP for this ? Committed as PEP 430, should show up http://www.python.org/dev/peps/pep-0430 before too long. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From _ at lvh.cc Sat Oct 27 17:11:33 2012 From: _ at lvh.cc (Laurens Van Houtven) Date: Sat, 27 Oct 2012 17:11:33 +0200 Subject: [Python-ideas] Async API In-Reply-To: <9049D9EC-8317-447D-AD9D-5CDAD072EF6A@gmail.com> References: <71C95BE8-C0C7-45F5-A5E9-45A86EB3265B@gmail.com> <50870B41.6090504@rushing.nightmare.com> <2AA1835A-3A23-47BD-BB3F-C0D054BA69B4@gmail.com> <5087218E.8090805@rushing.nightmare.com> <50873551.5040207@canterbury.ac.nz> <50885DDC.4050108@canterbury.ac.nz> <80A26DE4-9B7C-4C27-B8F2-68E25CC2B14A@gmail.com> <72ACCA97-8FF8-4F6B-85B7-7DB9CDCB6DBC@gmail.com> <5A10D33B-39ED-453E-AA1A-DACF4BF4532C@gmail.com> <9049D9EC-8317-447D-AD9D-5CDAD072EF6A@gmail.com> Message-ID: On Fri, Oct 26, 2012 at 7:49 PM, Yury Selivanov wrote: > If it is decorated, though, how can I invoke it with a timeout? > The important thing to remember is that the fundamental abstraction at play here is the deferred. Calling such a decorated function gives you a deferred. 
So, you call it with a timeout the same way you timeout (cancel) any deferred: d = deferred_returning_expression reactor.callLater(timeout, d.cancel) Where deferred_returning_expression can be anything, including calling your @inlineCallbacks-decorated function. The way it fits in with all existing stuff, making it look an awful lot like a lot of existing stuff, is probably why deferred cancellation is one of the more recent features to make it into twisted: a lot of people did similar things using the tools that were already there. > - > Yury > -- cheers lvh -------------- next part -------------- An HTML attachment was scrubbed... URL: From _ at lvh.cc Sat Oct 27 17:16:06 2012 From: _ at lvh.cc (Laurens Van Houtven) Date: Sat, 27 Oct 2012 17:16:06 +0200 Subject: [Python-ideas] Async API In-Reply-To: References: <7516094788412279153@unknownmsgid> <1768C1F7-71A9-4992-A10D-9478F3AE657A@gmail.com> Message-ID: Yes, thread pools are unfortunately necessary evils. Twisted comes with a few tools to handle the use cases we're discussing. The 1:1 equivalent for call_on_thread would be deferToThread/deferToThreadPool (deferToThread == deferToThreadPool except with the default thread pool instead of a specific one). There are a few other tools: - spawnProcess (equiv to subprocess module, except with async communication with the subprocess) - cooperative multitasking, such (twisted.internet.task.) Cooperator and coiterate: basically resumable tasks that are explicit about where they can be paused/resumed - third party tools such as corotwine, giving stackless-style coroutines, or ampoule, giving remote subprocesses The more I learn about other stuff the more I see that everything is the same because everything is different :) On Sat, Oct 27, 2012 at 12:27 PM, Kristj?n Valur J?nsson < kristjan at ccpgames.com> wrote: > Yes, stacklesslib provides this functionality with the call_on_thread() > api, which turns a blocking operation into a non-blocking one. This is > also useful for cpu bound operations, btw. For example, in EVE, when we > need to do file operations and zipping of local files, we do it using this > api. > K > -- cheers lvh -------------- next part -------------- An HTML attachment was scrubbed... URL: From solipsis at pitrou.net Sat Oct 27 17:46:30 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sat, 27 Oct 2012 17:46:30 +0200 Subject: [Python-ideas] docs.python.org References: <20121026224644.GA28636@cskk.homeip.net> <81687767-DC69-4097-877B-0C5AEB471D28@gmail.com> <20121027134348.71ebaba1@pitrou.net> Message-ID: <20121027174630.1462881b@pitrou.net> On Sat, 27 Oct 2012 13:21:59 +0100 Paul Moore wrote: > > > And since 2.7 is the last in the 2.x line, I think it makes sense to > > reflect that explicitly in the redirections. > > I'm not against an explicit 2.7 link - we have that already, don't we? Yes, but the proposal is about redirecting docs.python.org to docs.python.org/2.7. Regards Antoine. From yselivanov.ml at gmail.com Sat Oct 27 17:53:41 2012 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Sat, 27 Oct 2012 11:53:41 -0400 Subject: [Python-ideas] docs.python.org In-Reply-To: References: <20121026224644.GA28636@cskk.homeip.net> <81687767-DC69-4097-877B-0C5AEB471D28@gmail.com> <508BBD5C.305@egenix.com> Message-ID: <0C499D34-A0CD-4EDE-B2B9-F322C0C07F4C@gmail.com> On 2012-10-27, at 10:40 AM, Nick Coghlan wrote: > Committed as PEP 430, should show up > http://www.python.org/dev/peps/pep-0430 before too long. I like the PEP, Nick. 
- Yury From pydsigner at gmail.com Sat Oct 27 18:02:45 2012 From: pydsigner at gmail.com (Daniel Foerster) Date: Sat, 27 Oct 2012 11:02:45 -0500 Subject: [Python-ideas] Async API In-Reply-To: References: <7516094788412279153@unknownmsgid> Message-ID: <508C05A5.5030206@gmail.com> On 10/26/2012 11:54 PM, Nick Coghlan wrote: > On Sat, Oct 27, 2012 at 1:06 PM, Daniel Foerster wrote: >> So, are threads still an option? I feel that many of these problems with >> generators could be solved with threads. > No, because available operating systems can handle a few orders of > magnitude more concurrent IO operations per process than they can > handle threads per process. The idea of asynchronous programming is to > only use additional threads when you really need them (i.e. for > blocking synchronous operations with no asynchronous equivalent), thus > providing support for a far greater number of concurrent operations > per process than if you rely entirely on threads for concurrency. > > Cheers, > Nick. > I'm realizing that I perhaps don't grasp the entirety of Asynchronous programming. However, The only results I have found are for .NET and C#. Would you like to recommend some online sources I could read? From tjreedy at udel.edu Sat Oct 27 22:12:38 2012 From: tjreedy at udel.edu (Terry Reedy) Date: Sat, 27 Oct 2012 16:12:38 -0400 Subject: [Python-ideas] docs.python.org In-Reply-To: <0C499D34-A0CD-4EDE-B2B9-F322C0C07F4C@gmail.com> References: <20121026224644.GA28636@cskk.homeip.net> <81687767-DC69-4097-877B-0C5AEB471D28@gmail.com> <508BBD5C.305@egenix.com> <0C499D34-A0CD-4EDE-B2B9-F322C0C07F4C@gmail.com> Message-ID: On 10/27/2012 11:53 AM, Yury Selivanov wrote: > On 2012-10-27, at 10:40 AM, Nick Coghlan wrote: > >> Committed as PEP 430, should show up >> http://www.python.org/dev/peps/pep-0430 before too long. > > I like the PEP, Nick. It looks good to me also. I agree that breaking the existing non-specific deep links is a problem. As I understand the proposal, browser bars would only display version- or at least series-specific links so that future copy and paste of links would do the right thing for the indefinite future. -- Terry Jan Reedy From dickinsm at gmail.com Sat Oct 27 22:16:56 2012 From: dickinsm at gmail.com (Mark Dickinson) Date: Sat, 27 Oct 2012 21:16:56 +0100 Subject: [Python-ideas] docs.python.org In-Reply-To: References: <20121026224644.GA28636@cskk.homeip.net> <81687767-DC69-4097-877B-0C5AEB471D28@gmail.com> <508BBD5C.305@egenix.com> Message-ID: On Sat, Oct 27, 2012 at 3:40 PM, Nick Coghlan wrote: > In writing up the PEP, I rediscovered an old proposal of mine to avoid > breaking deep links by simply do a "documented deprecation" of > unqualified deep links, but otherwise leaving them pointing to Python > 2. Only the default landing page would be switched to Python 3. > > Since that approach avoids a *lot* of issues, that's what I ended writing up. This seems like a nice solution. -- Mark From greg.ewing at canterbury.ac.nz Sun Oct 28 01:21:35 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sun, 28 Oct 2012 12:21:35 +1300 Subject: [Python-ideas] Async API In-Reply-To: References: <71C95BE8-C0C7-45F5-A5E9-45A86EB3265B@gmail.com> <50873551.5040207@canterbury.ac.nz> <50885DDC.4050108@canterbury.ac.nz> <80A26DE4-9B7C-4C27-B8F2-68E25CC2B14A@gmail.com> <72ACCA97-8FF8-4F6B-85B7-7DB9CDCB6DBC@gmail.com> Message-ID: <508C6C7F.5050408@canterbury.ac.nz> Guido van Rossum wrote: > The example would have to set some flag indicating it has a result > after the first yield (i.e. 
before entering the finally, or at least > before yielding in the finally clause). And the timeout callback would > have to check this flag. This makes it slightly awkward to design a > general-purpose timeout mechanism for tasks written in this style -- > if you expect a timeout or cancellation you must protect your cleanup > code from it by using some API. This is where having a way to find out whether a generator is in a finally clause would help. It would allow the scheduler to take care of this transparently. -- Greg From yselivanov.ml at gmail.com Sun Oct 28 01:45:13 2012 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Sat, 27 Oct 2012 19:45:13 -0400 Subject: [Python-ideas] Async API In-Reply-To: <508C6C7F.5050408@canterbury.ac.nz> References: <71C95BE8-C0C7-45F5-A5E9-45A86EB3265B@gmail.com> <50873551.5040207@canterbury.ac.nz> <50885DDC.4050108@canterbury.ac.nz> <80A26DE4-9B7C-4C27-B8F2-68E25CC2B14A@gmail.com> <72ACCA97-8FF8-4F6B-85B7-7DB9CDCB6DBC@gmail.com> <508C6C7F.5050408@canterbury.ac.nz> Message-ID: On 2012-10-27, at 7:21 PM, Greg Ewing wrote: > Guido van Rossum wrote: > >> The example would have to set some flag indicating it has a result >> after the first yield (i.e. before entering the finally, or at least >> before yielding in the finally clause). And the timeout callback would >> have to check this flag. This makes it slightly awkward to design a >> general-purpose timeout mechanism for tasks written in this style -- >> if you expect a timeout or cancellation you must protect your cleanup >> code from it by using some API. > > This is where having a way to find out whether a generator > is in a finally clause would help. It would allow the scheduler > to take care of this transparently. Right. But now I'm not sure this approach will work with yield-froms. As when you yield-fromming scheduler knows nothing about the chain of generators, as it's all hidden in the yield-from implementation. - Yury From greg.ewing at canterbury.ac.nz Sun Oct 28 01:52:40 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sun, 28 Oct 2012 12:52:40 +1300 Subject: [Python-ideas] Async API In-Reply-To: References: <71C95BE8-C0C7-45F5-A5E9-45A86EB3265B@gmail.com> <50885DDC.4050108@canterbury.ac.nz> <80A26DE4-9B7C-4C27-B8F2-68E25CC2B14A@gmail.com> <72ACCA97-8FF8-4F6B-85B7-7DB9CDCB6DBC@gmail.com> Message-ID: <508C73C8.6050104@canterbury.ac.nz> Yury Selivanov wrote: > But now I'm not sure this approach will work with yield-froms. > As when you yield-fromming scheduler knows nothing about the chain of > generators, as it's all hidden in the yield-from implementation. I think this just means that the implementation would involve more than looking at a single bit. Something like an in_finally() method that looks along the yield-from chain and returns true if any of the generators are in a finally section. -- Greg From yselivanov.ml at gmail.com Sun Oct 28 02:29:31 2012 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Sat, 27 Oct 2012 20:29:31 -0400 Subject: [Python-ideas] Async API In-Reply-To: <508C73C8.6050104@canterbury.ac.nz> References: <71C95BE8-C0C7-45F5-A5E9-45A86EB3265B@gmail.com> <50885DDC.4050108@canterbury.ac.nz> <80A26DE4-9B7C-4C27-B8F2-68E25CC2B14A@gmail.com> <72ACCA97-8FF8-4F6B-85B7-7DB9CDCB6DBC@gmail.com> <508C73C8.6050104@canterbury.ac.nz> Message-ID: <31A560E1-AF1A-437A-B024-5AF637EF3F35@gmail.com> On 2012-10-27, at 7:52 PM, Greg Ewing wrote: > Yury Selivanov wrote: >> But now I'm not sure this approach will work with yield-froms. 
>> As when you yield-fromming scheduler knows nothing about the chain of generators, as it's all hidden in the yield-from implementation. > > I think this just means that the implementation would involve > more than looking at a single bit. Something like an in_finally() > method that looks along the yield-from chain and returns true if > any of the generators are in a finally section. That would not be a solution either. Imagine that we have two coroutines: @coroutine def c1(): try: yield c2().with_timeout(1.0) # p1 finally: try: yield c2().with_timeout(1.0) # p2 except TimeoutError: pass @coroutine def c2(): try: yield c3().with_timeout(2.0) # p3 finally: yield c4() # p4 In the above example scheduler *can* safely interrupt "c2" when it is invoked from "c1" at "p2". I.e. scheduler can't interrupt the coroutine when it is itself in its finally statement, but it's fine to interrupt it when it is not, even if it is invoked from other coroutine's finally block. If you translate this example in yield-from form, then checking 'in_finally()' result on "c1" when it is at "p2" will prevent you to raise TimeoutError, but you clearly should. In other words, we want coroutines behaviour to be closer to the regular python code. - Yury From greg.ewing at canterbury.ac.nz Sun Oct 28 06:55:52 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sun, 28 Oct 2012 18:55:52 +1300 Subject: [Python-ideas] Async API In-Reply-To: <31A560E1-AF1A-437A-B024-5AF637EF3F35@gmail.com> References: <71C95BE8-C0C7-45F5-A5E9-45A86EB3265B@gmail.com> <50885DDC.4050108@canterbury.ac.nz> <80A26DE4-9B7C-4C27-B8F2-68E25CC2B14A@gmail.com> <72ACCA97-8FF8-4F6B-85B7-7DB9CDCB6DBC@gmail.com> <31A560E1-AF1A-437A-B024-5AF637EF3F35@gmail.com> Message-ID: <508CC8E8.2010406@canterbury.ac.nz> Yury Selivanov wrote: > In the above example scheduler *can* safely interrupt "c2" when it > is invoked from "c1" at "p2". I.e. scheduler can't interrupt the > coroutine when it is itself in its finally statement, but it's fine > to interrupt it when it is not, even if it is invoked from other > coroutine's finally block. I'm confused about the relationship between c1 and c2 here, and what you mean by one coroutine "invoking" another. Can you post a version that uses yield-from instead of yielding objects with unknown (to me) semantics? -- Greg From yselivanov.ml at gmail.com Sun Oct 28 08:03:34 2012 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Sun, 28 Oct 2012 03:03:34 -0400 Subject: [Python-ideas] Async API In-Reply-To: <508CC8E8.2010406@canterbury.ac.nz> References: <71C95BE8-C0C7-45F5-A5E9-45A86EB3265B@gmail.com> <50885DDC.4050108@canterbury.ac.nz> <80A26DE4-9B7C-4C27-B8F2-68E25CC2B14A@gmail.com> <72ACCA97-8FF8-4F6B-85B7-7DB9CDCB6DBC@gmail.com> <31A560E1-AF1A-437A-B024-5AF637EF3F35@gmail.com> <508CC8E8.2010406@canterbury.ac.nz> Message-ID: <5F51531B-68BF-44D0-AF82-BD8A6ED7DC0C@gmail.com> On 2012-10-28, at 1:55 AM, Greg Ewing wrote: > Yury Selivanov wrote: >> In the above example scheduler *can* safely interrupt "c2" when it >> is invoked from "c1" at "p2". I.e. scheduler can't interrupt the >> coroutine when it is itself in its finally statement, but it's fine >> to interrupt it when it is not, even if it is invoked from other >> coroutine's finally block. > > I'm confused about the relationship between c1 and c2 here, and > what you mean by one coroutine "invoking" another. > > Can you post a version that uses yield-from instead of yielding > objects with unknown (to me) semantics? 
The reason I kept using my version is because I'm not sure how we will set timeouts for yield-from style coroutines. But let's assume that we can do that with a context manager. Let's also assume that generator object has 'in_finally()' method, as you defined: "Something like an in_finally() method that looks along the yield-from chain and returns true if any of the generators are in a finally section." def coro1(): try: with timeout(1.0): yield from coro2() # 1 finally: try: with timeout(1.0): yield from coro2() # 2 except TimeoutError: pass def coro2(): try: block() yield # 3 action() finally: block() yield # 4 another_action() Now, if "coro2" is suspended at #4 -- it shouldn't be interrupted with TimeoutError. If, however, "coro2" is at #3 -- it can be, and it doesn't matter was it called from #1 or #2. IIUC, yield-from supporting scheduler, won't know about "coro2". All it will have is a generator for "coro1". All dispatching will be handled by "yield from" statement automatically. In this case, you can't rely on "coro1.in_finally()", because it will return: - True, when "coro1" is at #1 & "coro2" is at #4 (it's unsafe to interrupt) - True, when "coro1" is at #2 & "coro2" is at #3 (safe to interrupt) The fundamental problem here, is that scheduler knows nothing about coroutines call chain. It doesn't even know at what generator 'with timeout' was called. - Yury From reingart at gmail.com Sun Oct 28 08:30:00 2012 From: reingart at gmail.com (Mariano Reingart) Date: Sun, 28 Oct 2012 04:30:00 -0300 Subject: [Python-ideas] i18n and Python tracebacks In-Reply-To: <87tyq5hpf0.fsf@uwakimon.sk.tsukuba.ac.jp> References: <87iq6ogtdp.fsf@uwakimon.sk.tsukuba.ac.jp> <4BF1B2FD.7020408@gmail.com> <87tyq5hpf0.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On Tue, May 18, 2010 at 12:14 AM, Stephen J. Turnbull wrote: > Nick Coghlan writes: > > > It would actually be interesting to see just how far someone could get > > [on translating tracebacks] purely with sys.excepthook. > > > > It would be subject to some fairly significant limitations (particularly > > when it comes to reparsing strings with interpolated values), but the > > traceback parsing and comparison code in doctest may offer a good > > starting point. > > Actually, it shouldn't be too hard to handle the interpolations. In > fact the language to be parsed is probably mostly pretty simple, and > can be automatically translated to BNF or whatever input your favorite > parsing library wants from the .pot file. The generated grammar > probably would be on the order of the size of the .pot file, no? It > could be stored with the .mos as a "pseudo-translation". Interpolation is not very hard (although it could be error prone). 
I tried that with some regex but I'd found some dead-ends because some messages are hard-coded at the interpreter level, so they cannot be implemented purely with sys.excepthook I'd created a parallel project just if anyone is interested (would be the pure-python version but it would require too much work): http://code.google.com/p/pydiversity/ Maybe I missed something, but the gettext approach seems more consistent and cleaner, and IMHO using gettext is easier than rewriting an interpreter :-) [sorry for the 2-year delay] Mariano Reingart http://www.sistemasagiles.com.ar http://reingart.blogspot.com From reingart at gmail.com Sun Oct 28 08:39:22 2012 From: reingart at gmail.com (Mariano Reingart) Date: Sun, 28 Oct 2012 04:39:22 -0300 Subject: [Python-ideas] i18n and Python tracebacks In-Reply-To: <4BEF726F.6020401@gmail.com> References: <4BEF726F.6020401@gmail.com> Message-ID: On Sun, May 16, 2010 at 1:19 AM, Nick Coghlan wrote: > Mariano Reingart wrote: >> Sorry if there is any mistake, I hope the interested people (here in >> Argentina at least), with more experience in C and Python, would help >> me to fix/enhance this and/or champion it. >> >> Do you think this is the right way? > > The basic concept appears sound, but you'll want to work against the > py3k branch rather than trunk. > Done (sorry for the 2-year delay), it implements Py_GETTEXT against py3.3+: http://bugs.python.org/issue16344 Updated proposal: http://python.org.ar/pyar/TracebackInternationalizationProposal BTW, I've make a patch for a related issue too (utf-8): http://bugs.python.org/issue16343 If this Traceback Internationalization Proposal makes sense, I could present it on the PyCon Argentina 2012 Core-Python Sprint to see if we can advance it: http://ar.pycon.org/2012/projects/index#134 Best regards Mariano Reingart http://www.sistemasagiles.com.ar http://reingart.blogspot.com From g.brandl at gmx.net Sun Oct 28 08:59:11 2012 From: g.brandl at gmx.net (Georg Brandl) Date: Sun, 28 Oct 2012 08:59:11 +0100 Subject: [Python-ideas] docs.python.org In-Reply-To: References: <20121026224644.GA28636@cskk.homeip.net> <81687767-DC69-4097-877B-0C5AEB471D28@gmail.com> <508BBD5C.305@egenix.com> Message-ID: Am 27.10.2012 16:40, schrieb Nick Coghlan: >>> 4. We add a notice like the one above to the home page of the 2.7 >>> docs, announce it on the PSF blog, announce it far and wide >> >> We also need a solution for URLs that exist for Python 2, but >> not for Python 3. Those should be redirected to the Python 2 >> resource automatically, e.g. URLs pointing to the Python 2 modules >> that were renamed in Python 3. >> >> BTW: Will you write up a PEP for this ? > > Committed as PEP 430, should show up > http://www.python.org/dev/peps/pep-0430 before too long. Well, with the approval I've seen here, I have absolutely no problem with appointing myself PEP Czar and accepting the PEP :) I'll work on fixing the Apache config. Georg From _ at lvh.cc Sun Oct 28 09:44:19 2012 From: _ at lvh.cc (Laurens Van Houtven) Date: Sun, 28 Oct 2012 09:44:19 +0100 Subject: [Python-ideas] Async API In-Reply-To: <508C05A5.5030206@gmail.com> References: <7516094788412279153@unknownmsgid> <508C05A5.5030206@gmail.com> Message-ID: On Sat, Oct 27, 2012 at 6:02 PM, Daniel Foerster wrote: > I'm realizing that I perhaps don't grasp the entirety of Asynchronous > programming. However, The only results I have found are for .NET and C#. > Would you like to recommend some online sources I could read? > A simple question with a multitude of answers! 
Presumably you are more interested in the gory details of async programming, whereas most of this discussing has been about what the code looks like. The wikipedia articles, while not fantastic, aren't terrible: https://en.wikipedia.org/wiki/Event_loop https://en.wikipedia.org/wiki/Asynchronous_I/O Also, there's a great introduction at http://krondo.com/?page_id=1327 ; which unfortunately comes wrapped as a twisted tutorial ;) (You can stop reading a while in when it becomes twisted-specific). -- cheers lvh -------------- next part -------------- An HTML attachment was scrubbed... URL: From barry at python.org Sun Oct 28 11:45:01 2012 From: barry at python.org (Barry Warsaw) Date: Sun, 28 Oct 2012 06:45:01 -0400 Subject: [Python-ideas] docs.python.org References: <20121026224644.GA28636@cskk.homeip.net> Message-ID: <20121028064501.6b6c0203@resist> On Oct 26, 2012, at 10:55 PM, Terry Reedy wrote: >3.3 is now out 29 months after 2.7, library support is much improved, and the >new unicode implementation fixes most to almost all the remaining problems >with unicode. It is a release we can be proud of and should promote as the >latest and greatest Python version. Very definitely +1 -Barry -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: not available URL: From barry at python.org Sun Oct 28 11:52:01 2012 From: barry at python.org (Barry Warsaw) Date: Sun, 28 Oct 2012 06:52:01 -0400 Subject: [Python-ideas] docs.python.org References: <20121026224644.GA28636@cskk.homeip.net> <81687767-DC69-4097-877B-0C5AEB471D28@gmail.com> Message-ID: <20121028065201.74bd83dc@resist> On Oct 27, 2012, at 03:15 PM, Nick Coghlan wrote: >The message is clear, but some people just don't like the current >message: Python 2 is still the recommended default version for >production systems and applications. I would hedge that and say that for new work where you have your Python 3 dependencies available, Python 3 should be the recommended default. In Ubuntu, we are actively porting our core system utilities to Python 3, but some dependencies stop us for getting all the way there. Xapian and Twisted come to mind, but the Twisted folks are making great progress, so I expect that for our Twisted apps at least, that story will be better soon. Python 3.3 has some very clear advantages, so we are pushing to make that the default leading up to Ubuntu 14.04 LTS. >- Fedora, RHEL and derivatives still require Python 2 for all their >system utilities (Ubuntu at least has migrated their core system >tools, but I don't know about Debian upstream) Debian Wheezy is in freeze so I wouldn't expect a lot of adoption there until after that's released. Then I hope that we'll be able to push those things upstream. >I don't think the ecosystem is to the point where it makes sense to >flip the switch just yet, but I do think it would be reasonable to >define the ecosystem state where we *will* flip the switch. The two >key missing pieces for me are: >- a Django release with non-experimental Python 3 support (i.e. likely >to happen with Django 1.6) >- an official release of PIL (or Pillow) that supports Python 3 One way to look at it is that there doesn't necessary have to be just one big switch. There's a big bank of switches, many of which can be flipped now. Yes, I'd love for the whole line of 'em to be Python 3 green, and eventually they will be, but if you don't need Django or PIL (or whatever still isn't ported yet), don't wait, port! 
Cheers, -Barry -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: not available URL: From ncoghlan at gmail.com Sun Oct 28 12:25:28 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 28 Oct 2012 21:25:28 +1000 Subject: [Python-ideas] docs.python.org In-Reply-To: References: <20121026224644.GA28636@cskk.homeip.net> <81687767-DC69-4097-877B-0C5AEB471D28@gmail.com> <508BBD5C.305@egenix.com> Message-ID: On Sun, Oct 28, 2012 at 5:59 PM, Georg Brandl wrote: > Am 27.10.2012 16:40, schrieb Nick Coghlan: > >>>> 4. We add a notice like the one above to the home page of the 2.7 >>>> docs, announce it on the PSF blog, announce it far and wide >>> >>> We also need a solution for URLs that exist for Python 2, but >>> not for Python 3. Those should be redirected to the Python 2 >>> resource automatically, e.g. URLs pointing to the Python 2 modules >>> that were renamed in Python 3. >>> >>> BTW: Will you write up a PEP for this ? >> >> Committed as PEP 430, should show up >> http://www.python.org/dev/peps/pep-0430 before too long. > > Well, with the approval I've seen here, I have absolutely no problem > with appointing myself PEP Czar and accepting the PEP :) Heh, asking you to do that was next on my list, so thanks. Did Guido hide a mind reading device in the time machine? :) > I'll work on fixing the Apache config. Huzzah \o/ Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From yselivanov.ml at gmail.com Sun Oct 28 15:43:06 2012 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Sun, 28 Oct 2012 10:43:06 -0400 Subject: [Python-ideas] docs.python.org In-Reply-To: References: <20121026224644.GA28636@cskk.homeip.net> <81687767-DC69-4097-877B-0C5AEB471D28@gmail.com> <508BBD5C.305@egenix.com> Message-ID: <25E481E9-CF7B-4C96-BD8D-8AE0270FB712@gmail.com> On 2012-10-28, at 3:59 AM, Georg Brandl wrote: > Well, with the approval I've seen here, I have absolutely no problem > with appointing myself PEP Czar and accepting the PEP :) That's awesome! - Yury From Steve.Dower at microsoft.com Sun Oct 28 17:58:05 2012 From: Steve.Dower at microsoft.com (Steve Dower) Date: Sun, 28 Oct 2012 16:58:05 +0000 Subject: [Python-ideas] Async API In-Reply-To: References: <7516094788412279153@unknownmsgid> <508C05A5.5030206@gmail.com>, Message-ID: Laurens Van Houtven [_ at lvh.cc] wrote: > Also, there's a great introduction at http://krondo.com/?page_id=1327 ; > which unfortunately comes wrapped as a twisted tutorial ;) (You can > stop reading a while in when it becomes twisted-specific). That's a great description (even the parts about Twisted ;) ) - bookmarked. Thanks! My one dislike about the general introduction is the sole focus on I/O. In its context (as a Twisted intro) this is entirely understandable, but I'm afraid some people may come away thinking that async never involves threads. (Not that waiting on a thread is any different to waiting on I/O. Thread parallelism is a completely different concept, of course.) Cheers, Steve From guido at python.org Mon Oct 29 00:52:02 2012 From: guido at python.org (Guido van Rossum) Date: Sun, 28 Oct 2012 16:52:02 -0700 Subject: [Python-ideas] Async API: some code to review Message-ID: I am finally ready to show the code I worked on for the past two weeks. This is definitely not ready for anything except as a quick demo, but I learned enough while writing it to feel comfortable with the PEP 380 paradigm. 
I've set up a Hg repo on code.google.com, and I picked a codename: tulip. View the code here: http://code.google.com/p/tulip/source/browse/ It runs on Linux and OSX; I have no easy access to Windows but I'd be happy to take contributions. Key files in the directory: - main.py: the main program for testing, and a rough HTTP client - sockets.py: transports for sockets and SSL, and a buffering layer - scheduling.py: a Task class and related stuff; this is where the PEP 380 scheduler is implemented - polling.py: an event loop and basic polling implementations for: select(), poll(), epoll(), kqueue() Other junk: .hgignore, Makefile, README, p3time.py (benchmark yield from vs. plain functions), longlines.py (stupid style checker) More detailed discussions per file follows; please read the code along with my description (separately they may not make much sense): polling.py: http://code.google.com/p/tulip/source/browse/polling.py I found it remarkably easy to come up with polling implementations using all those different system calls. I ended up mixing in the pollster class with the event loop class, although I'm not sure that's the best design -- perhaps it's better if the event loop just references the pollster as a separate object. The pollster has a very simple API: add_reader(fd, callback, *args), add_writer(), remove_reader(fd), remove_writer(fd), and poll(timeout) -> list of events. (fd means file descriptor.) There's also pollable() which just checks if there are any fds registered. My implementation requires fd to be an int, but that could easily be extended to support other types of event sources. I'm not super happy that I have parallel reader/writer APIs, but passing a separate read/write flag didn't come out any more elegant, and I don't foresee other operation types (though I may be wrong). The event list started out as a tuple of (fd, flag, callback, args), where flag is 'r' or 'w' (easily extensible); in practice neither the fd nor the flag are used, and one of the last things I did was to wrap callback and args into a simple object that allows cancelling the callback; the add_*() methods return this object. (This could probably use a little more abstraction.) Note that poll() doesn't call the callbacks -- that's up to the event loop. The event loop has two basic ways to register callbacks: call_soon(callback, *args) causes callback(*args) to be called the next time the event loop runs; call_later(delay, callback, *args) schedules a callback at some time (relative or absolute) in the future. It also inherits add_reader() and add_writer() from the pollster. Then there is run(), which runs the event loop until there's nothing left to do (no readers, no writers, no soon or later callbacks), and run_once(), which goes through the entire list of event sources once. (I think the order in which I do this isn't quite right but it works for now.) Finally, there's a helper class (ThreadRunner) here which lets you run something in a separate thread using the features of concurrent.futures. It uses the "self-pipe trick" (Google it :-) to ensure that the poll() call wakes up -- this is needed by call_in_thread() at the next layer (scheduling.py). (There may be a race condition here, but I think it can be fixed.) Note that there are no yields (or yield froms) here; that's for the next layer: scheduling.py: http://code.google.com/p/tulip/source/browse/scheduling.py This is the scheduler for PEP-380 style coroutines. 
I started with a Scheduler class and operations along the lines of Greg Ewing's design, with a Scheduler instance as a global variable, but ended up ripping it out in favor of a Task object that represents a single stack of generators chained via yield-from. There is a Context object holding the event loop and the current task in thread-local storage, so that multiple threads can (and must) have independent event loops. Most user (and much library) code in this system should be written as generators invoking other generators directly using yield from. However to run something as an independent task, you wrap the generator call in a Task() constructor, possibly giving it a timeout, and then calling its start() method. A Task also acts a little like a future -- you can wait() for it, add done-callbacks, and it preserves the return value of the generator call. This can be used to introduce concurrency or to give something a separate timeout. (There are also primitives to wait for the first N completed of a bunch of Tasks.) To invoke a primitive I/O operation, you call the current task's block() method and then immediately yield (similar to Greg Ewing's approach). There are helpers block_r() and block_w() that arrange for a task to block until a file descriptor is ready for reading/writing. Examples of their use are in sockets.py. There is also call_in_thread() which integrates with polling.ThreadRunner to run a function in a separate thread and wait for it. Also used in sockets.py. In the docstrings I use the prefix "COROUTINE:" to indicate public APIs that should be invoked using yield from. sockets.py: http://code.google.com/p/tulip/source/browse/sockets.py This implements some internet primitives using the APIs in scheduling.py (including block_r() and block_w()). I call them transports but they are different from transports Twisted; they are closer to idealized sockets. SocketTransport wraps a plain socket, offering recv() and send() methods that must be invoked using yield from. SslTransport wraps an ssl socket (luckily in Python 2.6 and up, stdlib ssl sockets have good async support!). Then there is a BufferedReader class that implements more traditional read() and readline() coroutines (i.e., to be invoked using yield from), the latter handy for line-oriented transports. Finally there are some functions for connecting sockets, the highest-level one create_transport(). These use call_in_thread() to run socket.getaddrinfo() in a thread (this provides IPv6 support). I don't particularly care about the exact abstractions in this module; they are convenient and I was surprised how easy it was to add SSL, but still these mostly serve as somewhat realistic examples of how to use scheduling.py. (Afterthought: I think the SocketTransport's recv() and send() methods could be made more similar to SslTransport.) More examples in the final file: main.py: http://code.google.com/p/tulip/source/browse/main.py There is a simplistic HTTP client here built on top of the sockets.*Transport abstractions. And the main code exercises this by spawning four tasks fetching a variety of URLs (more when you uncomment a block of code) and waiting for their results. The code is a bit of a mess because I used it as a place to try out various APIs. I'm most interested in feedback on the design of polling.py and scheduling.py, and to a lesser extent on the design of sockets.py; main.py is just an example of how this style works out in practice. 
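To make the pollster / event loop split described under polling.py above a bit more concrete, here is a minimal, self-contained sketch in the same spirit. It is written from the prose description rather than taken from the tulip repo, so the class names (Handler, SelectPollster, EventLoop), the exact signatures, and all of the internal details are illustrative guesses only; the add_reader()/add_writer()/remove_*()/pollable()/poll() and call_soon()/call_later()/run_once()/run() names follow the description, and everything else, including the tiny socketpair demo, is made up for the example:

import select
import socket
import time

class Handler:
    """Callback plus args, wrapped so a registered call can be cancelled
    (the description above says the add_*() methods return such an object)."""
    def __init__(self, callback, args):
        self.callback, self.args, self.cancelled = callback, args, False
    def cancel(self):
        self.cancelled = True
    def run(self):
        if not self.cancelled:
            self.callback(*self.args)

class SelectPollster:
    """select()-based pollster; poll() reports ready handlers but never
    calls them -- that is left to the event loop."""
    def __init__(self):
        self.readers = {}  # fd -> Handler
        self.writers = {}  # fd -> Handler
    def add_reader(self, fd, callback, *args):
        handler = self.readers[fd] = Handler(callback, args)
        return handler
    def add_writer(self, fd, callback, *args):
        handler = self.writers[fd] = Handler(callback, args)
        return handler
    def remove_reader(self, fd):
        self.readers.pop(fd, None)
    def remove_writer(self, fd):
        self.writers.pop(fd, None)
    def pollable(self):
        return bool(self.readers or self.writers)
    def poll(self, timeout=None):
        r, w, _ = select.select(self.readers, self.writers, [], timeout)
        return [self.readers[fd] for fd in r] + [self.writers[fd] for fd in w]

class EventLoop(SelectPollster):
    """Mixes the pollster into a loop that adds call_soon()/call_later()/run()."""
    def __init__(self):
        super().__init__()
        self.ready = []      # handlers to run on the next pass
        self.scheduled = []  # (deadline, handler) pairs, sorted by deadline
    def call_soon(self, callback, *args):
        handler = Handler(callback, args)
        self.ready.append(handler)
        return handler
    def call_later(self, delay, callback, *args):
        handler = Handler(callback, args)
        self.scheduled.append((time.monotonic() + delay, handler))
        self.scheduled.sort(key=lambda pair: pair[0])
        return handler
    def run_once(self):
        now = time.monotonic()
        while self.scheduled and self.scheduled[0][0] <= now:
            self.ready.append(self.scheduled.pop(0)[1])   # timer expired
        if self.ready:
            timeout = 0                  # work is waiting, don't block
        elif self.scheduled:
            timeout = self.scheduled[0][0] - now
        else:
            timeout = None               # block until some fd is ready
        if self.pollable():
            self.ready.extend(self.poll(timeout))
        elif timeout:
            time.sleep(timeout)          # nothing to poll; just wait for timers
        todo, self.ready = self.ready, []
        for handler in todo:             # callbacks may schedule more work
            handler.run()
    def run(self):
        while self.ready or self.scheduled or self.pollable():
            self.run_once()

if __name__ == '__main__':
    # Tiny demo (Unix-only because of socketpair, matching the Linux/OSX note).
    loop = EventLoop()
    a, b = socket.socketpair()
    def on_readable():
        print('read:', a.recv(100))
        loop.remove_reader(a.fileno())
    loop.add_reader(a.fileno(), on_readable)
    loop.call_soon(b.send, b'ping')
    loop.call_later(0.1, print, 'timer fired')
    loop.run()

The sketch mixes the pollster into the event loop class only because the paragraph above says the current code does the same; the real polling.py presumably differs in detail, adds poll()/epoll()/kqueue() variants, and layers the ThreadRunner helper and the generator-based scheduling on top.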
Sorry for the brain-dump style; I would like to write it all up better, but at the same time waiting longer doesn't necessarily make it better, so here it is, for all to see. (I also have a list of problems I had to debug during the development and what I learned from that; but that's too raw to post right now.) -- --Guido van Rossum (python.org/~guido) From stephen at xemacs.org Mon Oct 29 03:47:53 2012 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Mon, 29 Oct 2012 11:47:53 +0900 Subject: [Python-ideas] docs.python.org In-Reply-To: <6C373A1F-6AD5-4852-ACD2-9BA72876C006@gmail.com> References: <514B805D-D53F-48CF-B52B-82546ED70A82@gmail.com> <508AD099.9080606@python.org> <6C373A1F-6AD5-4852-ACD2-9BA72876C006@gmail.com> Message-ID: <87pq42kn8m.fsf@uwakimon.sk.tsukuba.ac.jp> Yury Selivanov writes: > The thing about 'doc2' & 'doc3' urls I don't like is that sooner or later > users will use python 3. There is no future for python 2. That's true for each user (assuming they don't die before switching). It's not true for all applications, though. There will undoubtedly be systems based on Python 2 still in active, profitable use 10 years from now. It's just a yucky UI, let's stick to that for a reason. From stephen at xemacs.org Mon Oct 29 04:50:23 2012 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Mon, 29 Oct 2012 12:50:23 +0900 Subject: [Python-ideas] docs.python.org In-Reply-To: <20121028064501.6b6c0203@resist> References: <20121026224644.GA28636@cskk.homeip.net> <20121028064501.6b6c0203@resist> Message-ID: <87mwz6kkcg.fsf@uwakimon.sk.tsukuba.ac.jp> Barry Warsaw writes: > On Oct 26, 2012, at 10:55 PM, Terry Reedy wrote: > > >3.3 is now out 29 months after 2.7, library support is much improved, and the > >new unicode implementation fixes most to almost all the remaining problems > >with unicode. It is a release we can be proud of and should promote as the > >latest and greatest Python version. > > Very definitely +1 As stated, yes, very much so. I think it's unfortunate that some of this discussion has generated more heat than light because there are three different goals here all stemming from "promoting Python 3": (1) "... as a great language", (2) "... as a great production-ready development environment" (for *some* applications), and (3) "... as a great production-ready development environment" (period, or to take a page from Linus's book, "World Domination! Now!") I think Nick's approach starts to phase in a change in promotion effort appropriately. But it's only a start. From greg.ewing at canterbury.ac.nz Mon Oct 29 06:05:24 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Mon, 29 Oct 2012 18:05:24 +1300 Subject: [Python-ideas] Async API In-Reply-To: <5F51531B-68BF-44D0-AF82-BD8A6ED7DC0C@gmail.com> References: <71C95BE8-C0C7-45F5-A5E9-45A86EB3265B@gmail.com> <80A26DE4-9B7C-4C27-B8F2-68E25CC2B14A@gmail.com> <72ACCA97-8FF8-4F6B-85B7-7DB9CDCB6DBC@gmail.com> <31A560E1-AF1A-437A-B024-5AF637EF3F35@gmail.com> <5F51531B-68BF-44D0-AF82-BD8A6ED7DC0C@gmail.com> Message-ID: <508E0E94.10909@canterbury.ac.nz> Yury Selivanov wrote: > def coro1(): > try: > with timeout(1.0): > yield from coro2() # 1 > finally: > try: > with timeout(1.0): > yield from coro2() # 2 > except TimeoutError: > pass > > def coro2(): > try: > block() > yield # 3 > action() > finally: > block() > yield # 4 > another_action() > > Now, if "coro2" is suspended at #4 -- it shouldn't be interrupted with > TimeoutError. 
> > If, however, "coro2" is at #3 -- it can be, and it doesn't matter was it > called from #1 or #2. What is your reasoning behind asserting this? Because it's inside a try block of its own? Because it's subject to a nested timeout? Something else? -- Greg From mark.hackett at metoffice.gov.uk Mon Oct 29 11:09:24 2012 From: mark.hackett at metoffice.gov.uk (Mark Hackett) Date: Mon, 29 Oct 2012 10:09:24 +0000 Subject: [Python-ideas] Enabling man page structure for python In-Reply-To: References: <201210251548.32082.mark.hackett@metoffice.gov.uk> <50897609.4080808@netwok.org> Message-ID: <201210291009.24172.mark.hackett@metoffice.gov.uk> On Friday 26 Oct 2012, Andi Albrecht wrote: > Hi, > > On Thu, Oct 25, 2012 at 7:25 PM, ?ric Araujo wrote: > > Hi, > > > > See http://bugs.python.org/issue14102 ?argparse: add ability to create a > > man page? > > I've started to work on this issue some time ago. The starting point > was a man page formatter based on optparse I wrote earlier. But I've > encountered some problems since the output order of argparse > formatters differ from what to expect on a man page. IIRC I saw the > need to do some changes to the way how argparse formatters work but > unfortunately got interrupted by other work. > > IMO adding a argparse formatter would the probably the right way to > add man page support. There would even be no need to add this to > stdlib then. > > Best regards, > > Andi > > > Cheers > > _______________________________________________ > > Python-ideas mailing list > > Python-ideas at python.org > > http://mail.python.org/mailman/listinfo/python-ideas > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > I'd still like to see some of the functionality in the code I'd written to solve my problem in the parser, if not too much trouble, Andi. I.e. at least a way to push more things to the man page (to be inserted in the page) so that you can add in more things (like external function calls). It's not obvious to me whether argparse also gives you a synopsis (self- written --help option). Cheers. From stefan at bytereef.org Mon Oct 29 11:36:37 2012 From: stefan at bytereef.org (Stefan Krah) Date: Mon, 29 Oct 2012 11:36:37 +0100 Subject: [Python-ideas] docs.python.org In-Reply-To: <87mwz6kkcg.fsf@uwakimon.sk.tsukuba.ac.jp> References: <20121026224644.GA28636@cskk.homeip.net> <20121028064501.6b6c0203@resist> <87mwz6kkcg.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <20121029103637.GA1053@sleipnir.bytereef.org> Stephen J. Turnbull wrote: > I think Nick's approach starts to phase in a change in promotion > effort appropriately. But it's only a start. As for promotion, I just noticed that searching for "Python 3" gives this as the first result: http://www.python.org/download/releases/3.0/ Overall, the (Google) search results on the first page don't look very inviting, so perhaps we could improve the situation by adding "nofollow" to the older release pages. 
Stefan Krah From ncoghlan at gmail.com Mon Oct 29 11:51:33 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 29 Oct 2012 20:51:33 +1000 Subject: [Python-ideas] docs.python.org In-Reply-To: <20121029103637.GA1053@sleipnir.bytereef.org> References: <20121026224644.GA28636@cskk.homeip.net> <20121028064501.6b6c0203@resist> <87mwz6kkcg.fsf@uwakimon.sk.tsukuba.ac.jp> <20121029103637.GA1053@sleipnir.bytereef.org> Message-ID: On Mon, Oct 29, 2012 at 8:36 PM, Stefan Krah wrote: > As for promotion, I just noticed that searching for "Python 3" gives this > as the first result: > > http://www.python.org/download/releases/3.0/ The second result is the current docs at http://docs.python.org/3.3/, which is pretty useful, *except* that the docs have no pointer to the corresponding release page. Perhaps the existing "Welcome" paragraph should be extended with a reference to the appropriate release page? (Also: very nice work to everyone that helped make the version switcher a reality) Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From stefan at bytereef.org Mon Oct 29 13:02:40 2012 From: stefan at bytereef.org (Stefan Krah) Date: Mon, 29 Oct 2012 13:02:40 +0100 Subject: [Python-ideas] docs.python.org In-Reply-To: References: <20121026224644.GA28636@cskk.homeip.net> <20121028064501.6b6c0203@resist> <87mwz6kkcg.fsf@uwakimon.sk.tsukuba.ac.jp> <20121029103637.GA1053@sleipnir.bytereef.org> Message-ID: <20121029120240.GA1391@sleipnir.bytereef.org> Nick Coghlan wrote: > On Mon, Oct 29, 2012 at 8:36 PM, Stefan Krah wrote: > > As for promotion, I just noticed that searching for "Python 3" gives this > > as the first result: > > > > http://www.python.org/download/releases/3.0/ > > The second result is the current docs at http://docs.python.org/3.3/, > which is pretty useful, *except* that the docs have no pointer to the > corresponding release page. Perhaps the existing "Welcome" paragraph > should be extended with a reference to the appropriate release page? I think that's probably not necessary. Someone who is really searching for the newest version will of course find it. Getting rid of 3.0 in the top search results is more of an image thing. 3.0 is associated with "this new experimental version with virtually no packages that support it". For the casual searcher who might be trying to decide between Python and other languages it would be nice to have more 3.3 links, hopefully sending the message "a better Python with many more features and Django/Twisted support just around the corner". > (Also: very nice work to everyone that helped make the version > switcher a reality) I agree, the docs.python.org changes are a great improvement. Thanks everyone. Stefan Krah From shibturn at gmail.com Mon Oct 29 14:13:15 2012 From: shibturn at gmail.com (Richard Oudkerk) Date: Mon, 29 Oct 2012 13:13:15 +0000 Subject: [Python-ideas] Async API: some code to review In-Reply-To: References: Message-ID: On 28/10/2012 11:52pm, Guido van Rossum wrote: > I'm most interested in feedback on the design of polling.py and > scheduling.py, and to a lesser extent on the design of sockets.py; > main.py is just an example of how this style works out in practice. What happens if two tasks try to do a read op (or two tasks try to do a write op) on the same file descriptor? It looks like the second one to do scheduling.block_r(fd) will cause the first task to be forgotten, causing the first task to block forever. Shouldn't there be a list of pending readers and a list of pending writers for each fd? 
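(A sketch of the bookkeeping being suggested here, with hypothetical names rather than tulip's API: keep a FIFO of waiters per fd, so a second blocker queues behind the first instead of silently replacing it.)

    from collections import defaultdict, deque

    class ReadWaiters:
        """Per-fd FIFO of waiters, so several tasks can block on one fd."""

        def __init__(self):
            self.readers = defaultdict(deque)     # fd -> callbacks waiting to read

        def block_r(self, fd, callback):
            self.readers[fd].append(callback)     # queue up, don't overwrite

        def fd_readable(self, fd):
            # Wake only the first waiter; the rest stay queued for the next
            # readiness event instead of being silently dropped.
            queue = self.readers.get(fd)
            if queue:
                queue.popleft()()
                if not queue:
                    del self.readers[fd]

    waiters = ReadWaiters()
    waiters.block_r(7, lambda: print('first task woken'))    # 7 is just a dict key here
    waiters.block_r(7, lambda: print('second task woken'))
    waiters.fd_readable(7)    # wakes only the first
    waiters.fd_readable(7)    # now the second
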
-- Richard From Steve.Dower at microsoft.com Mon Oct 29 15:00:35 2012 From: Steve.Dower at microsoft.com (Steve Dower) Date: Mon, 29 Oct 2012 14:00:35 +0000 Subject: [Python-ideas] Async API: some code to review In-Reply-To: References: , Message-ID: Richard Oudkerk wrote: > On 28/10/2012 11:52pm, Guido van Rossum wrote: >> I'm most interested in feedback on the design of polling.py and >> scheduling.py, and to a lesser extent on the design of sockets.py; >> main.py is just an example of how this style works out in practice. > > What happens if two tasks try to do a read op (or two tasks try to do a > write op) on the same file descriptor? It looks like the second one to > do scheduling.block_r(fd) will cause the first task to be forgotten, > causing the first task to block forever. I know I haven't posted my own code yet (coming very soon), but I'd like to put out there that I don't think this is an important sort of question at this time. We both have sample schedulers that work well enough to demonstrate the API, but aren't meant to be production ready. IMO, the important questions are: - how easy/difficult/flexible/restrictive is it to write a new scheduler as a core Python developer? - how easy/difficult/flexible/restrictive is it to write a new scheduler as an end user? - how easy/difficult/flexible/restrictive is it to write async operations as a core Python developer? - how easy/difficult/flexible/restrictive is it to write async operations as an end user? - how straightforward is it to consume async operations? - how easy is it to write async code that is correct? Admittedly, I am writing this preemptively knowing that there are a lot of distractions like this in my code (some people are going to be horrified at what I did with file IO :-) Don't worry, it's only for trying the API). Once we know what interface we'll be coding against we can worry about getting the implementation perfect. Also, I imagine we'll find some more volunteers for coding (hopefully people who have done non-blocking stuff in C or similar before) who are currently avoiding the higher-level ideas discussion. Cheers, Steve From guido at python.org Mon Oct 29 15:47:55 2012 From: guido at python.org (Guido van Rossum) Date: Mon, 29 Oct 2012 07:47:55 -0700 Subject: [Python-ideas] Async API: some code to review In-Reply-To: References: Message-ID: On Mon, Oct 29, 2012 at 7:00 AM, Steve Dower wrote: > Richard Oudkerk wrote: >> On 28/10/2012 11:52pm, Guido van Rossum wrote: >>> I'm most interested in feedback on the design of polling.py and >>> scheduling.py, and to a lesser extent on the design of sockets.py; >>> main.py is just an example of how this style works out in practice. >> >> What happens if two tasks try to do a read op (or two tasks try to do a >> write op) on the same file descriptor? It looks like the second one to >> do scheduling.block_r(fd) will cause the first task to be forgotten, >> causing the first task to block forever. > > I know I haven't posted my own code yet (coming very soon), but I'd like to put out there that I don't think this is an important sort of question at this time. Kind of. I think if it was an important use case it might affect the shape of the API. However I can't think of a use case where it might make sense for two tasks to read or write the same file descriptor without some higher-level mediation. 
(Even at a higher level I find it hard to imagine, except for writing to a common log file -- but even there you want to be sure that individual lines aren't spliced into each other, and the semantics of send() don't prevent that.) > We both have sample schedulers that work well enough to demonstrate the API, but aren't meant to be production ready. > > IMO, the important questions are: > > - how easy/difficult/flexible/restrictive is it to write a new scheduler as a core Python developer? > - how easy/difficult/flexible/restrictive is it to write a new scheduler as an end user? > - how easy/difficult/flexible/restrictive is it to write async operations as a core Python developer? > - how easy/difficult/flexible/restrictive is it to write async operations as an end user? > - how straightforward is it to consume async operations? > - how easy is it to write async code that is correct? Yes, these are all important questions. I'm not sure that end users would be writing new schedulers -- but 3rd party library developers will be, and I suppose that's what you are referring to. My own approach to answering these is to first try to figure out what a typical application would be trying to accomplish. That's why I made a point of implementing a 100% async HTTP client -- it's just quirky enough that it exercises various issues (e.g. switching between line-mode and blob mode, and the need to invoke getaddrinfo()). > Admittedly, I am writing this preemptively knowing that there are a lot of distractions like this in my code (some people are going to be horrified at what I did with file IO :-) Don't worry, it's only for trying the API). Once we know what interface we'll be coding against we can worry about getting the implementation perfect. Also, I imagine we'll find some more volunteers for coding (hopefully people who have done non-blocking stuff in C or similar before) who are currently avoiding the higher-level ideas discussion. I'm looking forward to it! I suspect we'll be merging our designs shortly... -- --Guido van Rossum (python.org/~guido) From shibturn at gmail.com Mon Oct 29 17:03:07 2012 From: shibturn at gmail.com (Richard Oudkerk) Date: Mon, 29 Oct 2012 16:03:07 +0000 Subject: [Python-ideas] Async API: some code to review In-Reply-To: References: Message-ID: On 29/10/2012 2:47pm, Guido van Rossum wrote: > Kind of. I think if it was an important use case it might affect the > shape of the API. However I can't think of a use case where it might > make sense for two tasks to read or write the same file descriptor > without some higher-level mediation. (Even at a higher level I find it > hard to imagine, except for writing to a common log file -- but even > there you want to be sure that individual lines aren't spliced into > each other, and the semantics of send() don't prevent that.) It is a common pattern to have multiple threads/processes trying to accept connections on an single listening socket, so it would be unfortunate to disallow that. Writing (short messages) to a pipe also has atomic guarantees that can make having multiple writers perfectly reasonable. 
-- Richard From solipsis at pitrou.net Mon Oct 29 17:07:31 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 29 Oct 2012 17:07:31 +0100 Subject: [Python-ideas] Async API: some code to review References: Message-ID: <20121029170731.74bd3d37@cosmocat> Hello Guido, Le Sun, 28 Oct 2012 16:52:02 -0700, Guido van Rossum a ?crit : > > The event list started out as a tuple of (fd, flag, callback, args), > where flag is 'r' or 'w' (easily extensible); in practice neither the > fd nor the flag are used, and one of the last things I did was to wrap > callback and args into a simple object that allows cancelling the > callback; the add_*() methods return this object. (This could probably > use a little more abstraction.) Note that poll() doesn't call the > callbacks -- that's up to the event loop. I don't understand why the pollster takes callback objects if it never calls them. Also the fact that it wraps them into DelayedCalls is more mysterious to me. DelayedCalls represent one-time cancellable callbacks with a given deadline, not callbacks which are called any number of times on I/O events and that you can't cancel. > scheduling.py: > http://code.google.com/p/tulip/source/browse/scheduling.py > > This is the scheduler for PEP-380 style coroutines. I started with a > Scheduler class and operations along the lines of Greg Ewing's design, > with a Scheduler instance as a global variable, but ended up ripping > it out in favor of a Task object that represents a single stack of > generators chained via yield-from. There is a Context object holding > the event loop and the current task in thread-local storage, so that > multiple threads can (and must) have independent event loops. YMMV, but I tend to be wary of implicit thread-local storage. What if someone runs a function or method depending on that thread-local storage from inside a thread pool? Weird bugs ensue. I think explicit context is much less error-prone. Even a single global instance (like Twisted's reactor) would be better :-) As for the rest of the scheduling module, I can't say much since I have a hard time reading and understanding it. > To invoke a primitive I/O operation, you call the current task's > block() method and then immediately yield (similar to Greg Ewing's > approach). There are helpers block_r() and block_w() that arrange for > a task to block until a file descriptor is ready for reading/writing. > Examples of their use are in sockets.py. That's weird and kindof ugly IMHO. Why would you write: scheduling.block_w(self.sock.fileno()) yield instead of say: yield scheduling.block_w(self.sock.fileno()) ? Also, the fact that each call to SocketTransport.{recv,send} explicitly registers then removes the fd on the event loop looks wasteful. By the way, even when a fd is signalled ready, you must still be prepared for recv() to return EAGAIN (see http://bugs.python.org/issue9090). > In the docstrings I use the prefix "COROUTINE:" to indicate public > APIs that should be invoked using yield from. Hmm, should they? Your approach looks a bit weird: you have functions that should use yield, and others that should use "yield from"? That sounds confusing to me. I'd much rather either have all functions use "yield", or have all functions use "yield from". 
(also, I wouldn't be shocked if coroutines had to wear a special decorator; it's a better marker than having the word COROUTINE in the docstring, anyway :-)) > sockets.py: http://code.google.com/p/tulip/source/browse/sockets.py > > This implements some internet primitives using the APIs in > scheduling.py (including block_r() and block_w()). I call them > transports but they are different from transports Twisted; they are > closer to idealized sockets. SocketTransport wraps a plain socket, > offering recv() and send() methods that must be invoked using yield > from. SslTransport wraps an ssl socket (luckily in Python 2.6 and up, > stdlib ssl sockets have good async support!). SslTransport.{recv,send} need the same kind of logic as do_handshake(): catch both SSLWantReadError and SSLWantWriteError, and call block_r / block_w accordingly. > Then there is a > BufferedReader class that implements more traditional read() and > readline() coroutines (i.e., to be invoked using yield from), the > latter handy for line-oriented transports. Well... It would be nice if BufferedReader could re-use the actual io.BufferedReader and its fast readline(), read(), readinto() implementations. Regards Antoine. From mark.hackett at metoffice.gov.uk Mon Oct 29 17:09:51 2012 From: mark.hackett at metoffice.gov.uk (Mark Hackett) Date: Mon, 29 Oct 2012 16:09:51 +0000 Subject: [Python-ideas] Async API: some code to review In-Reply-To: References: Message-ID: <201210291609.51091.mark.hackett@metoffice.gov.uk> On Monday 29 Oct 2012, Richard Oudkerk wrote: > Writing (short messages) to a pipe also > has atomic guarantees that can make having multiple writers perfectly > reasonable. > > -- > Richard > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > Is that actually true? It may be guaranteed on Intel x86 compatibles and Linux (because of the string operations available in the x86 instruction set), but I don't thing anything other than an IPC message has a "you can write a string atomically" guarantee. And I may be misremembering that. And even if it's part of the SUS, how do we know this is true for non-UNIX compatible systems? From jrwren at xmtp.net Mon Oct 29 17:12:56 2012 From: jrwren at xmtp.net (Jay Wren) Date: Mon, 29 Oct 2012 12:12:56 -0400 Subject: [Python-ideas] docs.python.org In-Reply-To: References: <20121026224644.GA28636@cskk.homeip.net> <81687767-DC69-4097-877B-0C5AEB471D28@gmail.com> <20121027134348.71ebaba1@pitrou.net> Message-ID: On Oct 27, 2012, at 8:21 AM, Paul Moore wrote: > On 27 October 2012 12:43, Antoine Pitrou wrote: >> On Sat, 27 Oct 2012 12:06:41 +0100 >> Paul Moore wrote: >>> >>>> Another idea is similar, but instead of doing /2.x/ always redirect the >>>> the root of docs.python.org to the latest production release, so >>>> right now /foo would redirect to /2.7/foo. This is even better for >>>> maintaining links to the actual resource people meant to link >>>> to. Could even include a header at the top of old versions saying that >>>> "You are currently viewing the docs for 2.5. Click here to view the >>>> docs for 2.7". >>> >>> -1. Certainly what I (and I suspect many others) usually care about is >>> getting at the "Python 2" or "Python 3" documentation, not a specific >>> version. Having the 2.7, 2.6 links is fine, but I don't *think* of >>> myself as going to the 2.7 docs, but rather to the 2.x docs (as >>> opposed to 3.x). 
The "New in x.y" annotations give me the history I >>> need. And I think that's true of links as well - they would be to >>> "python 2" or "python 3", not (normally) to a specific minor version. >> >> I'm not sure why you're -1 about something which wouldn't affect you >> negatively. As you say yourself, the 2.7 docs have all the information >> you need about previous releases as well (because of the versionadded >> and versionchanged markers). *However*, the 2.6 and previous docs don't >> have information about useful stuff added in 2.7. > > Maybe I misunderstood. I was assuming that there would be no "2.x" > link, only "2.7". That's what I'm against - I would prefer to use a > generic 2.x link to get to the Python 2 docs if I needed them (just as > I use docs.python.org at the moment). > > My -1 was too strong though, make that a -0 (and a "don't care" if > there will be a 2.x link as well as the explicit ones). > >> And since 2.7 is the last in the 2.x line, I think it makes sense to >> reflect that explicitly in the redirections. > > I'm not against an explicit 2.7 link - we have that already, don't we? Did this change recently? I just noticed that from http://www.python.org/doc/ if I click "Browse Current Documentation" under then Python 2.x section, it links to docs.python.org which then redirects to docs.python.org/3/ which is NOT the 2.x current documentation for which I clicked. -- Jay From guido at python.org Mon Oct 29 17:35:16 2012 From: guido at python.org (Guido van Rossum) Date: Mon, 29 Oct 2012 09:35:16 -0700 Subject: [Python-ideas] Async API: some code to review In-Reply-To: References: Message-ID: On Mon, Oct 29, 2012 at 9:03 AM, Richard Oudkerk wrote: > On 29/10/2012 2:47pm, Guido van Rossum wrote: >> >> Kind of. I think if it was an important use case it might affect the >> shape of the API. However I can't think of a use case where it might >> make sense for two tasks to read or write the same file descriptor >> without some higher-level mediation. (Even at a higher level I find it >> hard to imagine, except for writing to a common log file -- but even >> there you want to be sure that individual lines aren't spliced into >> each other, and the semantics of send() don't prevent that.) > > > It is a common pattern to have multiple threads/processes trying to accept > connections on an single listening socket, so it would be unfortunate to > disallow that. Ah, but that will work -- each thread has its own pollster, event loop and scheduler and collection of tasks. And listening on a socket is a pretty special case anyway -- I imagine we'd build a special API just for that purpose. > Writing (short messages) to a pipe also has atomic > guarantees that can make having multiple writers perfectly reasonable. That's a good one. I'll keep that on the list of requirements. -- --Guido van Rossum (python.org/~guido) From shibturn at gmail.com Mon Oct 29 17:41:57 2012 From: shibturn at gmail.com (Richard Oudkerk) Date: Mon, 29 Oct 2012 16:41:57 +0000 Subject: [Python-ideas] Async API: some code to review In-Reply-To: <201210291609.51091.mark.hackett@metoffice.gov.uk> References: <201210291609.51091.mark.hackett@metoffice.gov.uk> Message-ID: On 29/10/2012 4:09pm, Mark Hackett wrote: > Is that actually true? It may be guaranteed on Intel x86 compatibles and Linux > (because of the string operations available in the x86 instruction set), but I > don't thing anything other than an IPC message has a "you can write a string > atomically" guarantee. And I may be misremembering that. 
The guarantee I was talking about is for pipes on Unix: POSIX.1-2001 says that write(2)s of less than PIPE_BUF bytes must be atomic: the output data is written to the pipe as a contiguous sequence. Writes of more than PIPE_BUF bytes may be nonatomic: the kernel may interleave the data with data written by other processes. POSIX.1-2001 requires PIPE_BUF to be at least 512 bytes. (On Linux, PIPE_BUF is 4096 bytes.) ... On Windows writes to pipes in message oriented mode are also atomic. > And even if it's part of the SUS, how do we know this is true for non-UNIX > compatible systems? We don't, but that isn't necessarily a reason to ban it as evil. -- Richard From yselivanov.ml at gmail.com Mon Oct 29 17:42:44 2012 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Mon, 29 Oct 2012 12:42:44 -0400 Subject: [Python-ideas] Async API In-Reply-To: <508E0E94.10909@canterbury.ac.nz> References: <71C95BE8-C0C7-45F5-A5E9-45A86EB3265B@gmail.com> <80A26DE4-9B7C-4C27-B8F2-68E25CC2B14A@gmail.com> <72ACCA97-8FF8-4F6B-85B7-7DB9CDCB6DBC@gmail.com> <31A560E1-AF1A-437A-B024-5AF637EF3F35@gmail.com> <5F51531B-68BF-44D0-AF82-BD8A6ED7DC0C@gmail.com> <508E0E94.10909@canterbury.ac.nz> Message-ID: On 2012-10-29, at 1:05 AM, Greg Ewing wrote: > Yury Selivanov wrote: > >> def coro1(): >> try: >> with timeout(1.0): >> yield from coro2() # 1 >> finally: >> try: >> with timeout(1.0): >> yield from coro2() # 2 >> except TimeoutError: >> pass >> def coro2(): >> try: >> block() >> yield # 3 >> action() >> finally: >> block() >> yield # 4 >> another_action() >> Now, if "coro2" is suspended at #4 -- it shouldn't be interrupted with >> TimeoutError. >> If, however, "coro2" is at #3 -- it can be, and it doesn't matter was it called from #1 or #2. > > What is your reasoning behind asserting this? Because it's inside > a try block of its own? Because it's subject to a nested timeout? > Something else? Because scheduler, when it is deciding to interrupt a coroutine or not, should only question whether that particular coroutine is in its finally, and not the one which called it. - Yury From mark.hackett at metoffice.gov.uk Mon Oct 29 17:46:13 2012 From: mark.hackett at metoffice.gov.uk (Mark Hackett) Date: Mon, 29 Oct 2012 16:46:13 +0000 Subject: [Python-ideas] Async API: some code to review In-Reply-To: References: <201210291609.51091.mark.hackett@metoffice.gov.uk> Message-ID: <201210291646.13139.mark.hackett@metoffice.gov.uk> On Monday 29 Oct 2012, Richard Oudkerk wrote: > > On Windows writes to pipes in message oriented mode are also atomic. > > > And even if it's part of the SUS, how do we know this is true for > > non-UNIX compatible systems? > > We don't, but that isn't necessarily a reason to ban it as evil. Hey, good idea I didn't say ban it, then hey? But if the OS cannot guarantee atomic writes (and enforce that size to ensure atomic writes for the system run under), then you cannot just say "Atomic writes mean we can have safely multiple threads accessing the pipe". The multiple access requires atomic access. If that cannot be guaranteed, then you cannot give multiple access. 
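(To make the guarantee under discussion concrete, here is a small demonstration sketch for POSIX pipes only; as noted in the thread, nothing similar is promised for other systems or modes. Several threads write short newline-terminated messages to one pipe, and the reader checks that no message was spliced, because each write is well under the 512-byte POSIX minimum for PIPE_BUF.)

    import os
    import threading

    def demo(n_writers=4, n_msgs=1000):
        r, w = os.pipe()

        def writer(ident):
            for i in range(n_msgs):
                msg = ('%d:%d\n' % (ident, i)).encode()
                assert len(msg) <= 512        # 512 is the POSIX minimum PIPE_BUF
                os.write(w, msg)              # short write: one contiguous chunk

        threads = [threading.Thread(target=writer, args=(k,))
                   for k in range(n_writers)]
        for t in threads:
            t.start()

        buf = b''
        count = 0
        while count < n_writers * n_msgs:
            buf += os.read(r, 65536)
            lines = buf.split(b'\n')
            buf = lines.pop()                 # keep any partial trailing message
            for line in lines:
                assert line.count(b':') == 1  # would fail if writes interleaved
                count += 1

        for t in threads:
            t.join()
        os.close(r)
        os.close(w)
        print('%d messages arrived intact' % count)

    if __name__ == '__main__':
        demo()
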
From mark.hackett at metoffice.gov.uk Mon Oct 29 17:47:46 2012 From: mark.hackett at metoffice.gov.uk (Mark Hackett) Date: Mon, 29 Oct 2012 16:47:46 +0000 Subject: [Python-ideas] Async API: some code to review In-Reply-To: References: <201210291609.51091.mark.hackett@metoffice.gov.uk> Message-ID: <201210291647.46107.mark.hackett@metoffice.gov.uk> On Monday 29 Oct 2012, Richard Oudkerk wrote: > On Windows writes to pipes in message oriented mode are also atomic. > PS this means, like I said maybe, that you have to be running an IPC message to get guaranteed atomic writes. If someone has their python programming with multiple thread accessing the pipe, but that pipe is NOT running in message oriented mode, then you will get corruption. From yselivanov.ml at gmail.com Mon Oct 29 17:47:50 2012 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Mon, 29 Oct 2012 12:47:50 -0400 Subject: [Python-ideas] Async API: some code to review In-Reply-To: <20121029170731.74bd3d37@cosmocat> References: <20121029170731.74bd3d37@cosmocat> Message-ID: On 2012-10-29, at 12:07 PM, Antoine Pitrou wrote: >> To invoke a primitive I/O operation, you call the current task's >> block() method and then immediately yield (similar to Greg Ewing's >> approach). There are helpers block_r() and block_w() that arrange for >> a task to block until a file descriptor is ready for reading/writing. >> Examples of their use are in sockets.py. > > That's weird and kindof ugly IMHO. Why would you write: > > scheduling.block_w(self.sock.fileno()) > yield > > instead of say: > > yield scheduling.block_w(self.sock.fileno()) > > ? I, personally, like and use the second approach. But I believe the main incentive for Guido & Greg to use 'yields' like that is to make one thing *very* clear: always use 'yield from' to call something. 'yield' statement is just an explicit context switch point, and it should be used only for that purpose and only when you write a low-level APIs. - Yury From yselivanov.ml at gmail.com Mon Oct 29 17:59:12 2012 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Mon, 29 Oct 2012 12:59:12 -0400 Subject: [Python-ideas] Async API: some code to review In-Reply-To: <20121029170731.74bd3d37@cosmocat> References: <20121029170731.74bd3d37@cosmocat> Message-ID: <90D4E462-4C49-4BEF-BEA1-F725C5EE352F@gmail.com> On 2012-10-29, at 12:07 PM, Antoine Pitrou wrote: >> In the docstrings I use the prefix "COROUTINE:" to indicate public >> APIs that should be invoked using yield from. > > Hmm, should they? Your approach looks a bit weird: you have functions > that should use yield, and others that should use "yield from"? That > sounds confusing to me. > > I'd much rather either have all functions use "yield", or have all > functions use "yield from". > > (also, I wouldn't be shocked if coroutines had to wear a special > decorator; it's a better marker than having the word COROUTINE in the > docstring, anyway :-)) That's what bothers me is well. 'yield from' looks too long for a simple thing it does (1); users will be confused whether they should use 'yield' or 'yield from' (2); there is no visible difference between a plain generator and a coroutine (3). Personally, I like Greg's PEP 3152 (aside from 'cocall' keyword). With that approach it's easy to distinguish coroutines, generators and plain functions. And it'd be easier to add some special methods/properties to codefs, like 'in_finally()' method etc. 
- Yury From cesare.di.mauro at gmail.com Mon Oct 29 18:02:09 2012 From: cesare.di.mauro at gmail.com (Cesare Di Mauro) Date: Mon, 29 Oct 2012 18:02:09 +0100 Subject: [Python-ideas] Async API: some code to review In-Reply-To: <201210291609.51091.mark.hackett@metoffice.gov.uk> References: <201210291609.51091.mark.hackett@metoffice.gov.uk> Message-ID: 2012/10/29 Mark Hackett > On Monday 29 Oct 2012, Richard Oudkerk wrote: > > Writing (short messages) to a pipe also > > has atomic guarantees that can make having multiple writers perfectly > > reasonable. > > > > -- > > Richard > > > > _______________________________________________ > > Python-ideas mailing list > > Python-ideas at python.org > > http://mail.python.org/mailman/listinfo/python-ideas > > > > Is that actually true? It may be guaranteed on Intel x86 compatibles and > Linux > (because of the string operations available in the x86 instruction set), > but I > don't thing anything other than an IPC message has a "you can write a > string > atomically" guarantee. And I may be misremembering that. > x86 and x64 string operations aren't atomic. Only a few, selected, instructions can be LOCK prefixed (XCHG is the only one that doesn't require it, since it's always locked) to ensure an atomic RMW memory operation. Regards, Cesare -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Mon Oct 29 18:03:00 2012 From: guido at python.org (Guido van Rossum) Date: Mon, 29 Oct 2012 10:03:00 -0700 Subject: [Python-ideas] Async API: some code to review In-Reply-To: <20121029170731.74bd3d37@cosmocat> References: <20121029170731.74bd3d37@cosmocat> Message-ID: On Mon, Oct 29, 2012 at 9:07 AM, Antoine Pitrou wrote: > Le Sun, 28 Oct 2012 16:52:02 -0700, > Guido van Rossum a ?crit : >> The event list started out as a tuple of (fd, flag, callback, args), >> where flag is 'r' or 'w' (easily extensible); in practice neither the >> fd nor the flag are used, and one of the last things I did was to wrap >> callback and args into a simple object that allows cancelling the >> callback; the add_*() methods return this object. (This could probably >> use a little more abstraction.) Note that poll() doesn't call the >> callbacks -- that's up to the event loop. > > I don't understand why the pollster takes callback objects if it never > calls them. Also the fact that it wraps them into DelayedCalls is more > mysterious to me. DelayedCalls represent one-time cancellable callbacks > with a given deadline, not callbacks which are called any number of > times on I/O events and that you can't cancel. Yeah, this part definitely needs reworking. In the current design the pollster is a base class of the eventloop, and the latter *does* call them; but I want to refactor that anyway. I'll probably end up with a pollster that registers (what are to it) opaque tokens and returns just a list of tokens from poll(). (Unrelated: would it be useful if poll() was an iterator?) >> scheduling.py: >> http://code.google.com/p/tulip/source/browse/scheduling.py >> >> This is the scheduler for PEP-380 style coroutines. I started with a >> Scheduler class and operations along the lines of Greg Ewing's design, >> with a Scheduler instance as a global variable, but ended up ripping >> it out in favor of a Task object that represents a single stack of >> generators chained via yield-from. 
There is a Context object holding >> the event loop and the current task in thread-local storage, so that >> multiple threads can (and must) have independent event loops. > > YMMV, but I tend to be wary of implicit thread-local storage. What if > someone runs a function or method depending on that thread-local > storage from inside a thread pool? Weird bugs ensue. Agreed, I had to figure out one of these in the implementation of call_in_thread() and it wasn't fun. I don't know what else to do -- I think it's probably best if I base my implementation on this for now so that I know it works correctly in such an environment. In the end there will probably be an API to get the current context and another to influence how that API gets it, so people can plug in their own schemes, from TLS to a simple global to something determined by an external library. > I think explicit context is much less error-prone. Even a single global > instance (like Twisted's reactor) would be better :-) I find that passing the context around everywhere makes for awkward APIs though. > As for the rest of the scheduling module, I can't say much since I have > a hard time reading and understanding it. That's a problem, I need to write this up properly so that everyone can understand it. >> To invoke a primitive I/O operation, you call the current task's >> block() method and then immediately yield (similar to Greg Ewing's >> approach). There are helpers block_r() and block_w() that arrange for >> a task to block until a file descriptor is ready for reading/writing. >> Examples of their use are in sockets.py. > > That's weird and kindof ugly IMHO. Why would you write: > > scheduling.block_w(self.sock.fileno()) > yield > > instead of say: > > yield scheduling.block_w(self.sock.fileno()) > > ? This has been debated at nauseam already (be glad you missed it); basically, there's not a whole lot of difference but if there are some APIs that require "yield X(args)" and others that require "yield from Y(args)" that's really confusing. The "bare yield only" makes it possible (though I didn't implement it here) to put some strict checks in the scheduler -- next() should never return anything except None. But there are other ways to do that too. Anyway, I probably will change the API so that e.g. sockets.py doesn't have to use this paradigm; I'll just wrap these low-level APIs in a proper "coroutine" and then sockets.py can just use "yield from block_r(fd)". (This is one reason why I like the "bare generators with yield from" approach that Greg Ewing and PEP 380 recommend: it's really cheap to wrap an API in an extra layer of yield-from. (See the yyftime.py benchmark I added to the tulip drectory.) > Also, the fact that each call to SocketTransport.{recv,send} explicitly > registers then removes the fd on the event loop looks wasteful. I am hoping to add some optimization for this -- I am actually planning a hackathon (or re-education session :-) with some Twisted folks where I hope they'll explain to me how they do this. > By the way, even when a fd is signalled ready, you must still be > prepared for recv() to return EAGAIN (see > http://bugs.python.org/issue9090). Yeah, I should know, I ran into this for a Google project too (there was a kernel driver that was lying...). I had a cryptic remark in my post above referring to this. >> In the docstrings I use the prefix "COROUTINE:" to indicate public >> APIs that should be invoked using yield from. > > Hmm, should they? 
Your approach looks a bit weird: you have functions > that should use yield, and others that should use "yield from"? That > sounds confusing to me. Yeah, see above. > I'd much rather either have all functions use "yield", or have all > functions use "yield from". Agreed, and I'm strongly in favor of "yield from". The block_r() + yield is considered an *internal* API. > (also, I wouldn't be shocked if coroutines had to wear a special > decorator; it's a better marker than having the word COROUTINE in the > docstring, anyway :-)) Agreed it would be useful as documentation, and maybe an API can use this to enforce proper coding style. It would have to be purely decoration though -- I don't want an extra layer of wrapping to occur each time you call a coroutine. (I.e. the decorator should just return "func".) >> sockets.py: http://code.google.com/p/tulip/source/browse/sockets.py >> >> This implements some internet primitives using the APIs in >> scheduling.py (including block_r() and block_w()). I call them >> transports but they are different from transports Twisted; they are >> closer to idealized sockets. SocketTransport wraps a plain socket, >> offering recv() and send() methods that must be invoked using yield >> from. SslTransport wraps an ssl socket (luckily in Python 2.6 and up, >> stdlib ssl sockets have good async support!). > > SslTransport.{recv,send} need the same kind of logic as do_handshake(): > catch both SSLWantReadError and SSLWantWriteError, and call block_r / > block_w accordingly. Oh... Thanks for the tip. I didn't find this in the ssl module docs. >> Then there is a >> BufferedReader class that implements more traditional read() and >> readline() coroutines (i.e., to be invoked using yield from), the >> latter handy for line-oriented transports. > > Well... It would be nice if BufferedReader could re-use the actual > io.BufferedReader and its fast readline(), read(), readinto() > implementations. Agreed, I would love that too, but the problem is, *this* BufferedReader defines methods you have to invoke with yield from. Maybe we can come up with a solution for sharing code by modifying the _io module though; that would be great! (I've also been thinking of layering TextIOWrapper on top of these.) Thanks for the thorough review! -- --Guido van Rossum (python.org/~guido) From yselivanov.ml at gmail.com Mon Oct 29 18:08:14 2012 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Mon, 29 Oct 2012 13:08:14 -0400 Subject: [Python-ideas] Async API: some code to review In-Reply-To: References: <20121029170731.74bd3d37@cosmocat> Message-ID: <01150791-F34B-4A1A-BA93-CB7B3DC48BF7@gmail.com> On 2012-10-29, at 1:03 PM, Guido van Rossum wrote: > Agreed it would be useful as documentation, and maybe an API can use > this to enforce proper coding style. It would have to be purely > decoration though -- I don't want an extra layer of wrapping to occur > each time you call a coroutine. (I.e. the decorator should just return > "func".) I'd also set something like 'func.__coroutine__' to True. That will allow to analyze, introspect, validate and do other useful things. 
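For illustration, a minimal marker along those lines could look like this; it is only a sketch, not a settled API, and the attribute name is simply the one floated above:

    def coroutine(func):
        """Mark func as a coroutine meant to be called with 'yield from'."""
        func.__coroutine__ = True   # purely decorative: no wrapper, calls stay cheap
        return func

    @coroutine
    def sleep_a_bit():
        yield

    print(getattr(sleep_a_bit, '__coroutine__', False))   # True
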
- Yury From g.brandl at gmx.net Mon Oct 29 18:24:30 2012 From: g.brandl at gmx.net (Georg Brandl) Date: Mon, 29 Oct 2012 18:24:30 +0100 Subject: [Python-ideas] docs.python.org In-Reply-To: References: <20121026224644.GA28636@cskk.homeip.net> <81687767-DC69-4097-877B-0C5AEB471D28@gmail.com> <20121027134348.71ebaba1@pitrou.net> Message-ID: Am 29.10.2012 17:12, schrieb Jay Wren: >>> And since 2.7 is the last in the 2.x line, I think it makes sense to >>> reflect that explicitly in the redirections. >> >> I'm not against an explicit 2.7 link - we have that already, don't we? > > Did this change recently? I just noticed that from http://www.python.org/doc/ > if I click "Browse Current Documentation" under then Python 2.x section, it > links to docs.python.org which then redirects to docs.python.org/3/ which is > NOT the 2.x current documentation for which I clicked. -- Jay Good point. Should be fixed now. Geor From guido at python.org Mon Oct 29 18:43:09 2012 From: guido at python.org (Guido van Rossum) Date: Mon, 29 Oct 2012 10:43:09 -0700 Subject: [Python-ideas] Async API: some code to review In-Reply-To: <01150791-F34B-4A1A-BA93-CB7B3DC48BF7@gmail.com> References: <20121029170731.74bd3d37@cosmocat> <01150791-F34B-4A1A-BA93-CB7B3DC48BF7@gmail.com> Message-ID: On Mon, Oct 29, 2012 at 10:08 AM, Yury Selivanov wrote: > On 2012-10-29, at 1:03 PM, Guido van Rossum wrote: > >> Agreed it would be useful as documentation, and maybe an API can use >> this to enforce proper coding style. It would have to be purely >> decoration though -- I don't want an extra layer of wrapping to occur >> each time you call a coroutine. (I.e. the decorator should just return >> "func".) > > I'd also set something like 'func.__coroutine__' to True. That will allow > to analyze, introspect, validate and do other useful things. Yes, that sounds about right. -- --Guido van Rossum (python.org/~guido) From andrew.svetlov at gmail.com Mon Oct 29 19:02:09 2012 From: andrew.svetlov at gmail.com (Andrew Svetlov) Date: Mon, 29 Oct 2012 20:02:09 +0200 Subject: [Python-ideas] Async API: some code to review In-Reply-To: References: <20121029170731.74bd3d37@cosmocat> <01150791-F34B-4A1A-BA93-CB7B3DC48BF7@gmail.com> Message-ID: Pollster has to support any object as file descriptor. The use case is ZeroMQ sockets: they are implemented at user level and socket is just some opaque structure wrapped by Python object. ZeroMQ has own poll function to process zmq sockets as well as regular sockets/pipes/files. I would to see add_{reader,writer} and call_{soon,later} accepting **kwargs as well as *args. At least to respect functions with keyword-only arguments. +1 for explicit passing loop instance and clearing role of DelayedCall. Decorating coroutines with setting some flag looks good to me, but I expect some problems with setting extra attribute to objects like staticmethod/classmethod. Thanks, Andrew. From g.rodola at gmail.com Mon Oct 29 19:08:45 2012 From: g.rodola at gmail.com (=?ISO-8859-1?Q?Giampaolo_Rodol=E0?=) Date: Mon, 29 Oct 2012 19:08:45 +0100 Subject: [Python-ideas] Async API: some code to review In-Reply-To: References: Message-ID: 2012/10/29 Guido van Rossum > > I'm most interested in feedback on the design of polling.py and > scheduling.py, and to a lesser extent on the design of sockets.py; > main.py is just an example of how this style works out in practice. Follows my comments. === About polling.py === 1 - I think DelayedCall should have a reset() method, other than just cancel(). 
2 - EventLoopMixin should have a call_every() method other than just call_later() 3 - call_later() and call_every() should also take **kwargs other than just *args 4 - I think PollsterBase should provide a method to modify() the events registered for a certain fd (both poll() and epoll() have such a method and it's faster compared to un/registering a fd). Feel free to take a look at my scheduler implementation which looks quite similar to what you've done in polling.py: http://code.google.com/p/pyftpdlib/source/browse/trunk/pyftpdlib/lib/ioloop.py?spec=svn1115&r=1115#85 === About sockets.py === 1 - In SocketTransport it seems there's no error handling provisioned for send() and recv(). You should expect these errors http://hg.python.org/cpython/file/95931c48a76f/Lib/asyncore.py#l60 signaling disconnection plus EWOULDBLOCK and EAGAIN for "retry" 2 - SslTransport's send() and recv() methods should suffer the same problem. 3 - I don't fully understand how data transfer works exactly but keep in mind that the transport should interact with the pollster. What I mean is that generally speaking a connected socket should *always* be readable ("r"), even when it's idle, then switch to "rw" events when sending data, then get back to "r" when all the data has been sent. This is *crucial* if you want to achieve high performances/scalability and that is why PollsterBase should probably provide a modify() method. Please take a look at what I've done here: http://code.google.com/p/pyftpdlib/source/browse/trunk/pyftpdlib/lib/ioloop.py?spec=svn1115&r=1115#809 === Other considerations === This 'yield' / 'yield from' approach is new to me (I'm more of a "callback guy") so I can't say I fully understand what's going on just by reading the code. What I would like to see instead of main.py is a bunch of code samples / demos showing how this library is supposed to be used in different circumstances. In details I'd like to see at least: 1 - a client example (connect(), send() a string, recv() a response, close()) 2 - an echo server example (accept(), recv() string, send() it back(), close() 3 - how to use a different transport (e.g. UDP)? 4 - how to run long running tasks in a thread? Also: 5 - is it possible to use multiple "reactors" in different threads? How? (asyncore for example achieves this by providing a separate 'map' argument for both the 'reactor' and the dispatchers) I understand you just started with this so I'm probably asking too much at this point in time. Feel free to consider this a kind of a "long term review". --- Giampaolo http://code.google.com/p/pyftpdlib/ http://code.google.com/p/psutil/ http://code.google.com/p/pysendfile/ From guido at python.org Mon Oct 29 19:10:42 2012 From: guido at python.org (Guido van Rossum) Date: Mon, 29 Oct 2012 11:10:42 -0700 Subject: [Python-ideas] Async API: some code to review In-Reply-To: References: <20121029170731.74bd3d37@cosmocat> <01150791-F34B-4A1A-BA93-CB7B3DC48BF7@gmail.com> Message-ID: On Mon, Oct 29, 2012 at 11:02 AM, Andrew Svetlov wrote: > Pollster has to support any object as file descriptor. > The use case is ZeroMQ sockets: they are implemented at user level and > socket is just some opaque structure wrapped by Python object. > ZeroMQ has own poll function to process zmq sockets as well as regular > sockets/pipes/files. Good call! This seem to be an excellent use case to validate the pollster design. Are you saying that the approach I used for SslTransport doesn't work here? 
(I can believe it, I've never looked at 0MQ, but I can't tell from your message.) The insistence on isinstance(fd, int) is mostly there so that I don't accidentally register a socket object *and* its file descriptor at the same time -- but there are other ways to ensure that. I've added a TODO item for now. > I would to see add_{reader,writer} and call_{soon,later} accepting > **kwargs as well as *args. At least to respect functions with > keyword-only arguments. Hmm... I intentionally ruled those out because I wanted to leave the door open for keyword args that modify the registration function (add_reader etc.); it is awkward to require conventions like "your function cannot have a keyword arg named X because we use that for our own API" and it is even more awkward to have to retrofit new values of X into that rule. Maybe we can come up with a simple wrapper. > +1 for explicit passing loop instance and clearing role of DelayedCall. Will do. (I think you meant clarifying?) > Decorating coroutines with setting some flag looks good to me, but I > expect some problems with setting extra attribute to objects like > staticmethod/classmethod. Noted. -- --Guido van Rossum (python.org/~guido) From guido at python.org Mon Oct 29 19:43:57 2012 From: guido at python.org (Guido van Rossum) Date: Mon, 29 Oct 2012 11:43:57 -0700 Subject: [Python-ideas] Async API: some code to review In-Reply-To: References: Message-ID: On Mon, Oct 29, 2012 at 11:08 AM, Giampaolo Rodol? wrote: > 2012/10/29 Guido van Rossum >> >> I'm most interested in feedback on the design of polling.py and >> scheduling.py, and to a lesser extent on the design of sockets.py; >> main.py is just an example of how this style works out in practice. > > Follows my comments. > > === About polling.py === > > 1 - I think DelayedCall should have a reset() method, other than just cancel(). So, essentially an uncancel()? Why not just re-register in that case? Or what's your use case? (Right now there's no problem in calling one of these many times -- it's just that cancellation is permanent.) > 2 - EventLoopMixin should have a call_every() method other than just > call_later() Arguably you can emulate that with a simple loop: def call_every(secs, func, *args): while True: yield from scheduler.sleep(secs) func(*args) (Flavor to taste to log exceptions, handle cancellation, automatically spawn a separate task, etc.) I can build lots of other useful things out of call_soon() and call_later() -- but I do need at least those two as "axioms". > 3 - call_later() and call_every() should also take **kwargs other than > just *args I just replied to that in a previous message; there's also a comment in the code. How important is this really? Are there lots of use cases that require you to pass keyword args? If it's only on occasion you can use a lambda. (The *args is a compromise so we don't need a lambda to wrap every callback. But I want to reserve keyword args for future extensions to the registration functions.) > 4 - I think PollsterBase should provide a method to modify() the > events registered for a certain fd (both poll() and epoll() have such > a method and it's faster compared to un/registering a fd). Did you see the concrete implementations? Those where this matters implicitly uses modify() if the required flags change. I can imagine more optimizations of the implementations (e.g. delaying register()/modify() calls until poll() is actually called, to avoid unnecessary churn) without making the API more complex. 
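(For illustration, a hedged sketch of that "modify only when the flags change" bookkeeping, using the stdlib's Linux-only epoll object and leaving the callback handling out; this is not the tulip implementation.)

    import select

    class EPollPollster:
        """Tracks the registered event mask per fd; modifies only on change."""

        def __init__(self):
            self.ep = select.epoll()
            self.flags = {}                       # fd -> currently registered mask

        def _set(self, fd, mask):
            old = self.flags.get(fd)
            if old == mask:
                return                            # nothing changed, no syscall
            if old is None:
                if not mask:
                    return
                self.ep.register(fd, mask)
            elif mask:
                self.ep.modify(fd, mask)          # cheaper than unregister+register
            else:
                self.ep.unregister(fd)
                del self.flags[fd]
                return
            self.flags[fd] = mask

        def add_reader(self, fd):
            self._set(fd, self.flags.get(fd, 0) | select.EPOLLIN)

        def add_writer(self, fd):
            self._set(fd, self.flags.get(fd, 0) | select.EPOLLOUT)

        def remove_reader(self, fd):
            self._set(fd, self.flags.get(fd, 0) & ~select.EPOLLIN)

        def remove_writer(self, fd):
            self._set(fd, self.flags.get(fd, 0) & ~select.EPOLLOUT)
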
> Feel free to take a look at my scheduler implementation which looks > quite similar to what you've done in polling.py: > http://code.google.com/p/pyftpdlib/source/browse/trunk/pyftpdlib/lib/ioloop.py?spec=svn1115&r=1115#85 Thanks, I had seen it previously, I think this also proves that there's nothing particularly earth-shattering about this design. :-) I'd love to copy some more of your tricks, e.g. the occasional re-heapifying. (What usage pattern is this dealing with exactly?) I should also check that I've taken care of all the various flags and other details (I recall being quite surprised that with poll(), on some platforms I need to check for POLLHUP but not on others). > === About sockets.py === > > 1 - In SocketTransport it seems there's no error handling provisioned > for send() and recv(). > You should expect these errors > http://hg.python.org/cpython/file/95931c48a76f/Lib/asyncore.py#l60 > signaling disconnection plus EWOULDBLOCK and EAGAIN for "retry" Right, I know have been naive about these and have already got a TODO note. > 2 - SslTransport's send() and recv() methods should suffer the same problem. Ditto, Antoine told me. > 3 - I don't fully understand how data transfer works exactly but keep > in mind that the transport should interact with the pollster. > What I mean is that generally speaking a connected socket should > *always* be readable ("r"), even when it's idle, then switch to "rw" > events when sending data, then get back to "r" when all the data has > been sent. > This is *crucial* if you want to achieve high performances/scalability > and that is why PollsterBase should probably provide a modify() > method. > Please take a look at what I've done here: > http://code.google.com/p/pyftpdlib/source/browse/trunk/pyftpdlib/lib/ioloop.py?spec=svn1115&r=1115#809 Hm. I am not convinced that managing this explicitly from the transport is the right solution (note that my transports are quite different from those in Twisted). But I'll keep this in mind -- I would like to set up a benchmark suite at some point. I will probably have to implement the server side of HTTP for that purpose, so I can point e.g. ab at my app. > === Other considerations === > > This 'yield' / 'yield from' approach is new to me (I'm more of a > "callback guy") so I can't say I fully understand what's going on just > by reading the code. Fair enough. You should probably start by reading Greg Ewing's tutorial -- it's short and sweet: http://www.cosc.canterbury.ac.nz/greg.ewing/python/tasks/SimpleScheduler.html > What I would like to see instead of main.py is a bunch of code samples > / demos showing how this library is supposed to be used in different > circumstances. Agreed, more examples are needed. > In details I'd like to see at least: > > 1 - a client example (connect(), send() a string, recv() a response, close()) Hm, that's all in urlfetch(). > 2 - an echo server example (accept(), recv() string, send() it back(), close() Yes, that's missing. > 3 - how to use a different transport (e.g. UDP)? I haven't looked into this yet. I expect I'll have to write a different SocketTransport for this (the existing transports are implicitly stream-oriented) but I know that the scheduler and eventloop implementation can handle this fine. > 4 - how to run long running tasks in a thread? That's implemented. Check out call_in_thread(). Note that you can pass it an alternate threadpool (executor). > Also: > > 5 - is it possible to use multiple "reactors" in different threads? Should be possible. > How? 
(asyncore for example achieves this by providing a separate > 'map' argument for both the 'reactor' and the dispatchers) It works by making the Context class use thread-local storage (TLS). > I understand you just started with this so I'm probably asking too > much at this point in time. > Feel free to consider this a kind of a "long term review". You have asked many useful questions already. Since you have implemented a real-world I/O loop yourself, your input is extremely valuable. Thanks, and keep at it! -- --Guido van Rossum (python.org/~guido) From yselivanov.ml at gmail.com Mon Oct 29 20:10:17 2012 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Mon, 29 Oct 2012 15:10:17 -0400 Subject: [Python-ideas] Async API: some code to review In-Reply-To: References: <20121029170731.74bd3d37@cosmocat> <01150791-F34B-4A1A-BA93-CB7B3DC48BF7@gmail.com> Message-ID: <868DCB50-AB1A-4E8F-8234-C14847414495@gmail.com> On 2012-10-29, at 2:02 PM, Andrew Svetlov wrote: > Pollster has to support any object as file descriptor. > The use case is ZeroMQ sockets: they are implemented at user level and > socket is just some opaque structure wrapped by Python object. > ZeroMQ has own poll function to process zmq sockets as well as regular > sockets/pipes/files. Well, you can use epoll/select/kqueue or whatever else with ZMQ sockets. Just get the underlying file descriptor with 'getsockopt', as described here: http://api.zeromq.org/master:zmq-getsockopt#toc20 For instance, here is a stripped out zmq support classes I have in my framework: class Socket(_zmq_Socket): def __init__(self, *args, **kwargs): super().__init__(*args, **kwargs) self.fileno = self.getsockopt(FD) ... #coroutine def send(self, data, *, flags=0, copy=True, track=False): flags |= NOBLOCK try: result = _zmq_Socket.send(self, data, flags, copy, track) except ZMQError as e: if e.errno != EAGAIN: raise self._sending = (Promise(), data, flags, copy, track) self._scheduler.proactor._schedule_write(self) return self._sending[0] else: p = Promise() p.send(result) return p ... class Context(_zmq_Context): _socket_class = Socket And '_schedule_write' accepts any object with 'fileno' property, and uses an appropriate polling mechanism to poll. So to use a non-blocking ZMQ sockets, you simply do: context = Context() socket = context.socket(zmq.REP) ... yield socket.send(message) From andrew.svetlov at gmail.com Mon Oct 29 20:24:25 2012 From: andrew.svetlov at gmail.com (Andrew Svetlov) Date: Mon, 29 Oct 2012 21:24:25 +0200 Subject: [Python-ideas] Async API: some code to review In-Reply-To: References: <20121029170731.74bd3d37@cosmocat> <01150791-F34B-4A1A-BA93-CB7B3DC48BF7@gmail.com> Message-ID: On Mon, Oct 29, 2012 at 8:10 PM, Guido van Rossum wrote: > On Mon, Oct 29, 2012 at 11:02 AM, Andrew Svetlov > wrote: >> Pollster has to support any object as file descriptor. >> The use case is ZeroMQ sockets: they are implemented at user level and >> socket is just some opaque structure wrapped by Python object. >> ZeroMQ has own poll function to process zmq sockets as well as regular >> sockets/pipes/files. > > Good call! This seem to be an excellent use case to validate the > pollster design. Are you saying that the approach I used for > SslTransport doesn't work here? (I can believe it, I've never looked > at 0MQ, but I can't tell from your message.) The insistence on > isinstance(fd, int) is mostly there so that I don't accidentally > register a socket object *and* its file descriptor at the same time -- > but there are other ways to ensure that. 
I've added a TODO item for > now. > 0MQ socket has no file descriptor at all, it's just pointer to some unspecified structure. So 0MQ has own *poll* function which can process that sockets as well as file descriptors. Interface is mimic to poll object from python stdlib. You can see https://github.com/zeromq/pyzmq/blob/master/zmq/eventloop/ioloop.py as example. For 0MQ support tulip has to have yet another reactor implementation in line of select, epoll, kqueue etc. Not big deal, but it would be nice if PollsterBase will not assume the registered object is always int file descriptor. >> I would to see add_{reader,writer} and call_{soon,later} accepting >> **kwargs as well as *args. At least to respect functions with >> keyword-only arguments. > > Hmm... I intentionally ruled those out because I wanted to leave the > door open for keyword args that modify the registration function > (add_reader etc.); it is awkward to require conventions like "your > function cannot have a keyword arg named X because we use that for our > own API" and it is even more awkward to have to retrofit new values of > X into that rule. Maybe we can come up with a simple wrapper. It can be solved easy with using names like __when, __callback etc. That names will never clutter with user provided kwargs I believe. > >> +1 for explicit passing loop instance and clearing role of DelayedCall. > > Will do. (I think you meant clarifying?) Exactly. Thanks. > >> Decorating coroutines with setting some flag looks good to me, but I >> expect some problems with setting extra attribute to objects like >> staticmethod/classmethod. > > Noted. > > -- > --Guido van Rossum (python.org/~guido) Thank you, Andrew Svetlov From andrew.svetlov at gmail.com Mon Oct 29 20:32:41 2012 From: andrew.svetlov at gmail.com (Andrew Svetlov) Date: Mon, 29 Oct 2012 21:32:41 +0200 Subject: [Python-ideas] Async API: some code to review In-Reply-To: <868DCB50-AB1A-4E8F-8234-C14847414495@gmail.com> References: <20121029170731.74bd3d37@cosmocat> <01150791-F34B-4A1A-BA93-CB7B3DC48BF7@gmail.com> <868DCB50-AB1A-4E8F-8234-C14847414495@gmail.com> Message-ID: On Mon, Oct 29, 2012 at 9:10 PM, Yury Selivanov wrote: > On 2012-10-29, at 2:02 PM, Andrew Svetlov wrote: > >> Pollster has to support any object as file descriptor. >> The use case is ZeroMQ sockets: they are implemented at user level and >> socket is just some opaque structure wrapped by Python object. >> ZeroMQ has own poll function to process zmq sockets as well as regular >> sockets/pipes/files. > > Well, you can use epoll/select/kqueue or whatever else with ZMQ sockets. > Just get the underlying file descriptor with 'getsockopt', as described > here: http://api.zeromq.org/master:zmq-getsockopt#toc20 Well, will take a look. I used zmq poll only. It works for reading only, not for writing, right? As I know you use proactor pattern. Can reactor has some problems with this approach? May embedded 0MQ poll be more effective via some internal optimizations? > > For instance, here is a stripped out zmq support classes I have in my > framework: > > class Socket(_zmq_Socket): > def __init__(self, *args, **kwargs): > super().__init__(*args, **kwargs) > self.fileno = self.getsockopt(FD) > > ... 
> > #coroutine > def send(self, data, *, flags=0, copy=True, track=False): > flags |= NOBLOCK > > try: > result = _zmq_Socket.send(self, data, flags, copy, track) > except ZMQError as e: > if e.errno != EAGAIN: > raise > self._sending = (Promise(), data, flags, copy, track) > self._scheduler.proactor._schedule_write(self) > return self._sending[0] > else: > p = Promise() > p.send(result) > return p > ... > > class Context(_zmq_Context): > _socket_class = Socket > > And '_schedule_write' accepts any object with 'fileno' property, and > uses an appropriate polling mechanism to poll. > > So to use a non-blocking ZMQ sockets, you simply do: > > context = Context() > socket = context.socket(zmq.REP) > ... > yield socket.send(message) > -- Thanks, Andrew Svetlov From guido at python.org Mon Oct 29 20:54:24 2012 From: guido at python.org (Guido van Rossum) Date: Mon, 29 Oct 2012 12:54:24 -0700 Subject: [Python-ideas] Async API: some code to review In-Reply-To: References: <20121029170731.74bd3d37@cosmocat> <01150791-F34B-4A1A-BA93-CB7B3DC48BF7@gmail.com> Message-ID: On Mon, Oct 29, 2012 at 12:24 PM, Andrew Svetlov wrote: > On Mon, Oct 29, 2012 at 8:10 PM, Guido van Rossum wrote: [Andrew] >>> I would to see add_{reader,writer} and call_{soon,later} accepting >>> **kwargs as well as *args. At least to respect functions with >>> keyword-only arguments. >> >> Hmm... I intentionally ruled those out because I wanted to leave the >> door open for keyword args that modify the registration function >> (add_reader etc.); it is awkward to require conventions like "your >> function cannot have a keyword arg named X because we use that for our >> own API" and it is even more awkward to have to retrofit new values of >> X into that rule. Maybe we can come up with a simple wrapper. > > It can be solved easy with using names like __when, __callback etc. > That names will never clutter with user provided kwargs I believe. No, those names have different meaning inside a class (they would be transformed into ___when, where is the name of the *current* class textually enclosing the use). I am not closing the door on this one but I'd have to see a lot more evidence that this issue is widespread. -- --Guido van Rossum (python.org/~guido) From yselivanov.ml at gmail.com Mon Oct 29 20:57:26 2012 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Mon, 29 Oct 2012 15:57:26 -0400 Subject: [Python-ideas] Async API: some code to review In-Reply-To: References: <20121029170731.74bd3d37@cosmocat> <01150791-F34B-4A1A-BA93-CB7B3DC48BF7@gmail.com> <868DCB50-AB1A-4E8F-8234-C14847414495@gmail.com> Message-ID: <0C7F5490-A0D2-4B98-BFEC-EEF8445E16BB@gmail.com> On 2012-10-29, at 3:32 PM, Andrew Svetlov wrote: > On Mon, Oct 29, 2012 at 9:10 PM, Yury Selivanov wrote: >> On 2012-10-29, at 2:02 PM, Andrew Svetlov wrote: >> >>> Pollster has to support any object as file descriptor. >>> The use case is ZeroMQ sockets: they are implemented at user level and >>> socket is just some opaque structure wrapped by Python object. >>> ZeroMQ has own poll function to process zmq sockets as well as regular >>> sockets/pipes/files. >> >> Well, you can use epoll/select/kqueue or whatever else with ZMQ sockets. >> Just get the underlying file descriptor with 'getsockopt', as described >> here: http://api.zeromq.org/master:zmq-getsockopt#toc20 > > Well, will take a look. I used zmq poll only. > It works for reading only, not for writing, right? > As I know you use proactor pattern. > Can reactor has some problems with this approach? 
> May embedded 0MQ poll be more effective via some internal optimizations? It's officially documented and supported approach. We haven't seen any problem with it so far. It works both for reading and writing, however, 99.9% EAGAIN errors occur on reading. When you 'send', it just stores your data in an internal buffer and sends it itself. When you 'read', well, if there is no data in buffers you get EAGAIN. As for the performance -- I haven't tested 'zmq.poll' vs (let's say) epoll, but I doubt there is any significant difference. And if I would want to write a benchmark, I'd first compare pure blocking ZMQ sockets vs non-blocking ZMQ sockets with ZMQ.poll, as ZMQ uses threads heavily, and probably, blocking threads-driven IO is faster then non-blocking with polling (when FDs count is relatively small), no matter whether you use zmq.poll or epoll/etc. - Yury From andrew.svetlov at gmail.com Mon Oct 29 20:58:35 2012 From: andrew.svetlov at gmail.com (Andrew Svetlov) Date: Mon, 29 Oct 2012 21:58:35 +0200 Subject: [Python-ideas] Async API: some code to review In-Reply-To: References: Message-ID: On Mon, Oct 29, 2012 at 8:43 PM, Guido van Rossum wrote: > On Mon, Oct 29, 2012 at 11:08 AM, Giampaolo Rodol? wrote: >> 2012/10/29 Guido van Rossum >>> >>> I'm most interested in feedback on the design of polling.py and >>> scheduling.py, and to a lesser extent on the design of sockets.py; >>> main.py is just an example of how this style works out in practice. >> >> Follows my comments. >> >> === About polling.py === >> >> 1 - I think DelayedCall should have a reset() method, other than just cancel(). > > So, essentially an uncancel()? Why not just re-register in that case? > Or what's your use case? (Right now there's no problem in calling one > of these many times -- it's just that cancellation is permanent.) > >> 2 - EventLoopMixin should have a call_every() method other than just >> call_later() > > Arguably you can emulate that with a simple loop: > > def call_every(secs, func, *args): > while True: > yield from scheduler.sleep(secs) > func(*args) > > (Flavor to taste to log exceptions, handle cancellation, automatically > spawn a separate task, etc.) > > I can build lots of other useful things out of call_soon() and > call_later() -- but I do need at least those two as "axioms". > >> 3 - call_later() and call_every() should also take **kwargs other than >> just *args > > I just replied to that in a previous message; there's also a comment > in the code. How important is this really? Are there lots of use cases > that require you to pass keyword args? If it's only on occasion you > can use a lambda. (The *args is a compromise so we don't need a lambda > to wrap every callback. But I want to reserve keyword args for future > extensions to the registration functions.) Well, using keyword-only arguments for passing flags can be good point. I can live with *args only. Maybe using **kwargs for call_later family only is good compromise? Really I don't care on add_reader/add_writer, that functions intended to library writers. call_later and call_soon can be used in user code often enough and passing keyword arguments can be convenient. > >> 4 - I think PollsterBase should provide a method to modify() the >> events registered for a certain fd (both poll() and epoll() have such >> a method and it's faster compared to un/registering a fd). > > Did you see the concrete implementations? Those where this matters > implicitly uses modify() if the required flags change. 
I can imagine > more optimizations of the implementations (e.g. delaying > register()/modify() calls until poll() is actually called, to avoid > unnecessary churn) without making the API more complex. > >> Feel free to take a look at my scheduler implementation which looks >> quite similar to what you've done in polling.py: >> http://code.google.com/p/pyftpdlib/source/browse/trunk/pyftpdlib/lib/ioloop.py?spec=svn1115&r=1115#85 > > Thanks, I had seen it previously, I think this also proves that > there's nothing particularly earth-shattering about this design. :-) > I'd love to copy some more of your tricks, e.g. the occasional > re-heapifying. (What usage pattern is this dealing with exactly?) I > should also check that I've taken care of all the various flags and > other details (I recall being quite surprised that with poll(), on > some platforms I need to check for POLLHUP but not on others). > >> === About sockets.py === >> >> 1 - In SocketTransport it seems there's no error handling provisioned >> for send() and recv(). >> You should expect these errors >> http://hg.python.org/cpython/file/95931c48a76f/Lib/asyncore.py#l60 >> signaling disconnection plus EWOULDBLOCK and EAGAIN for "retry" > > Right, I know have been naive about these and have already got a TODO note. > >> 2 - SslTransport's send() and recv() methods should suffer the same problem. > > Ditto, Antoine told me. > >> 3 - I don't fully understand how data transfer works exactly but keep >> in mind that the transport should interact with the pollster. >> What I mean is that generally speaking a connected socket should >> *always* be readable ("r"), even when it's idle, then switch to "rw" >> events when sending data, then get back to "r" when all the data has >> been sent. >> This is *crucial* if you want to achieve high performances/scalability >> and that is why PollsterBase should probably provide a modify() >> method. >> Please take a look at what I've done here: >> http://code.google.com/p/pyftpdlib/source/browse/trunk/pyftpdlib/lib/ioloop.py?spec=svn1115&r=1115#809 > > Hm. I am not convinced that managing this explicitly from the > transport is the right solution (note that my transports are quite > different from those in Twisted). But I'll keep this in mind -- I > would like to set up a benchmark suite at some point. I will probably > have to implement the server side of HTTP for that purpose, so I can > point e.g. ab at my app. > >> === Other considerations === >> >> This 'yield' / 'yield from' approach is new to me (I'm more of a >> "callback guy") so I can't say I fully understand what's going on just >> by reading the code. > > Fair enough. You should probably start by reading Greg Ewing's > tutorial -- it's short and sweet: > http://www.cosc.canterbury.ac.nz/greg.ewing/python/tasks/SimpleScheduler.html > >> What I would like to see instead of main.py is a bunch of code samples >> / demos showing how this library is supposed to be used in different >> circumstances. > > Agreed, more examples are needed. > >> In details I'd like to see at least: >> >> 1 - a client example (connect(), send() a string, recv() a response, close()) > > Hm, that's all in urlfetch(). > >> 2 - an echo server example (accept(), recv() string, send() it back(), close() > > Yes, that's missing. > >> 3 - how to use a different transport (e.g. UDP)? > > I haven't looked into this yet. 
I expect I'll have to write a > different SocketTransport for this (the existing transports are > implicitly stream-oriented) but I know that the scheduler and > eventloop implementation can handle this fine. > >> 4 - how to run long running tasks in a thread? > > That's implemented. Check out call_in_thread(). Note that you can pass > it an alternate threadpool (executor). > >> Also: >> >> 5 - is it possible to use multiple "reactors" in different threads? > > Should be possible. > >> How? (asyncore for example achieves this by providing a separate >> 'map' argument for both the 'reactor' and the dispatchers) > > It works by making the Context class use thread-local storage (TLS). > >> I understand you just started with this so I'm probably asking too >> much at this point in time. >> Feel free to consider this a kind of a "long term review". > > You have asked many useful questions already. Since you have > implemented a real-world I/O loop yourself, your input is extremely > valuable. Thanks, and keep at it! > > -- > --Guido van Rossum (python.org/~guido) > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas -- Thanks, Andrew Svetlov From andrew.svetlov at gmail.com Mon Oct 29 21:03:12 2012 From: andrew.svetlov at gmail.com (Andrew Svetlov) Date: Mon, 29 Oct 2012 22:03:12 +0200 Subject: [Python-ideas] Async API: some code to review In-Reply-To: References: <20121029170731.74bd3d37@cosmocat> <01150791-F34B-4A1A-BA93-CB7B3DC48BF7@gmail.com> Message-ID: I mean just something like: def call_soon(__self, __callback, *__args, **__kwargs): dcall = DelayedCall(None, __callback, __args, __kwargs) __self.ready.append(dcall) return dcall Not big deal, through. We can delay this discussion for later. On Mon, Oct 29, 2012 at 9:54 PM, Guido van Rossum wrote: > On Mon, Oct 29, 2012 at 12:24 PM, Andrew Svetlov > wrote: >> On Mon, Oct 29, 2012 at 8:10 PM, Guido van Rossum wrote: > [Andrew] >>>> I would to see add_{reader,writer} and call_{soon,later} accepting >>>> **kwargs as well as *args. At least to respect functions with >>>> keyword-only arguments. >>> >>> Hmm... I intentionally ruled those out because I wanted to leave the >>> door open for keyword args that modify the registration function >>> (add_reader etc.); it is awkward to require conventions like "your >>> function cannot have a keyword arg named X because we use that for our >>> own API" and it is even more awkward to have to retrofit new values of >>> X into that rule. Maybe we can come up with a simple wrapper. >> >> It can be solved easy with using names like __when, __callback etc. >> That names will never clutter with user provided kwargs I believe. > > No, those names have different meaning inside a class (they would be > transformed into ___when, where is the name of the > *current* class textually enclosing the use). I am not closing the > door on this one but I'd have to see a lot more evidence that this > issue is widespread. > > -- > --Guido van Rossum (python.org/~guido) -- Thanks, Andrew Svetlov From g.rodola at gmail.com Mon Oct 29 22:20:44 2012 From: g.rodola at gmail.com (=?ISO-8859-1?Q?Giampaolo_Rodol=E0?=) Date: Mon, 29 Oct 2012 22:20:44 +0100 Subject: [Python-ideas] Async API: some code to review In-Reply-To: References: Message-ID: 2012/10/29 Guido van Rossum : > On Mon, Oct 29, 2012 at 11:08 AM, Giampaolo Rodol? 
wrote: >> 2012/10/29 Guido van Rossum >> === About polling.py === >> >> 1 - I think DelayedCall should have a reset() method, other than just cancel(). > > So, essentially an uncancel()? Why not just re-register in that case? > Or what's your use case? (Right now there's no problem in calling one > of these many times -- it's just that cancellation is permanent.) The most common use case is when you want to disconnect the other peer after a certain time of inactivity. Ideally what you would do is schedule() a idle/timeout function and reset() it every time the other peer sends you some data. >> 2 - EventLoopMixin should have a call_every() method other than just >> call_later() > > Arguably you can emulate that with a simple loop: > > def call_every(secs, func, *args): > while True: > yield from scheduler.sleep(secs) > func(*args) > > (Flavor to taste to log exceptions, handle cancellation, automatically > spawn a separate task, etc.) > > I can build lots of other useful things out of call_soon() and > call_later() -- but I do need at least those two as "axioms". Agreed. >> 3 - call_later() and call_every() should also take **kwargs other than >> just *args > > I just replied to that in a previous message; there's also a comment > in the code. How important is this really? Are there lots of use cases > that require you to pass keyword args? If it's only on occasion you > can use a lambda. (The *args is a compromise so we don't need a lambda > to wrap every callback. But I want to reserve keyword args for future > extensions to the registration functions.) It's not crucial to have kwargs, just nice, but I understand your motives to rule them out, in fact I reserved two kwarg names ('_errback' and '_scheduler') for the same reason. In my experience I learned that passing an extra error handler function (what Twisted calls 'errrback') can be desirable, so that's another thing you might want to consider. In my scheduler implementation I achieved that by passing an _errback keyword parameter, like this: >>> ioloop.call_later(30, callback, _errback=err_callback) Not very nice to use a reserved keyword, I agree. Perhaps you can keep ruling out kwargs referred to the callback function and change the current call_later signature as such: - def call_later(self, when, callback, *args): + def call_later(self, when, callback, *args, errback=None): ...or maybe provide a DelayedCall.add_errback() method a-la Twisted. > Thanks, I had seen it previously, I think this also proves that > there's nothing particularly earth-shattering about this design. :-) > I'd love to copy some more of your tricks, Sure, go on. It's MIT licensed code. > e.g. the occasional re-heapifying. (What usage pattern is this > dealing with exactly?) It's intended to avoid making the list grow with too many cancelled functions. Imagine this use case: WEEK = 60 x 60 x 24 x 7 for x in xrange(1000000): f = call_later(WEEK, fun) f.cancel() You'll end up having a heap with milions of cancelled items which will be freed after a week. Instead you can keep track of the number of cancelled functions every time cancel() is called and re-heapify the list when that number gets too high: http://code.google.com/p/pyftpdlib/source/browse/trunk/pyftpdlib/lib/ioloop.py?spec=svn1115&r=1115#122 > should also check that I've taken care of all the various flags and > other details (I recall being quite surprised that with poll(), on > some platforms I need to check for POLLHUP but not on others). Yeah, that's a painful part. 
Try to look here: http://code.google.com/p/pyftpdlib/source/browse/trunk/pyftpdlib/lib/ioloop.py?spec=svn1115&r=1115#464 Instead of handle_close()ing you should add the fd to the list of readable ones ("r"). The call to recv() which will be coming next will then cause the socket to close (you have to add the error handling to recv() first though). >> 3 - I don't fully understand how data transfer works exactly but keep >> in mind that the transport should interact with the pollster. >> What I mean is that generally speaking a connected socket should >> *always* be readable ("r"), even when it's idle, then switch to "rw" >> events when sending data, then get back to "r" when all the data has >> been sent. >> This is *crucial* if you want to achieve high performances/scalability >> and that is why PollsterBase should probably provide a modify() >> method. >> Please take a look at what I've done here: >> http://code.google.com/p/pyftpdlib/source/browse/trunk/pyftpdlib/lib/ioloop.py?spec=svn1115&r=1115#809 > > Hm. I am not convinced that managing this explicitly from the > transport is the right solution (note that my transports are quite > different from those in Twisted). But I'll keep this in mind -- I > would like to set up a benchmark suite at some point. I will probably > have to implement the server side of HTTP for that purpose, so I can > point e.g. ab at my app. I think you might want to apply that to something slighlty higher level than the mere transport. Something like the equivalent of asynchat.push / asynchat.push_with_producer, if you'll ever want to go that far in terms of abstraction, or maybe avoid that at all but make it clear in the doc that the user should take care of that. My point is that having a socket registered for both "r" AND "w" events when in fact you want only "r" OR "w" is an exponential waste of CPU cycles and it should be avoided either by the lib or by the user. "old select() implementation" vs "new select() implementation" benchmark shown here reflects exactly this problem which still affects base asyncore module: https://code.google.com/p/pyftpdlib/issues/detail?id=203#c6 I'll keep following the progress on this and hopefully come up with another set of questions and/or random thoughts. --- Giampaolo https://code.google.com/p/pyftpdlib/ https://code.google.com/p/psutil/ https://code.google.com/p/pysendfile/ From solipsis at pitrou.net Mon Oct 29 22:25:41 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 29 Oct 2012 22:25:41 +0100 Subject: [Python-ideas] non-blocking buffered I/O References: <20121029170731.74bd3d37@cosmocat> Message-ID: <20121029222541.07c461b3@pitrou.net> On Mon, 29 Oct 2012 10:03:00 -0700 Guido van Rossum wrote: > >> Then there is a > >> BufferedReader class that implements more traditional read() and > >> readline() coroutines (i.e., to be invoked using yield from), the > >> latter handy for line-oriented transports. > > > > Well... It would be nice if BufferedReader could re-use the actual > > io.BufferedReader and its fast readline(), read(), readinto() > > implementations. > > Agreed, I would love that too, but the problem is, *this* > BufferedReader defines methods you have to invoke with yield from. > Maybe we can come up with a solution for sharing code by modifying the > _io module though; that would be great! (I've also been thinking of > layering TextIOWrapper on top of these.) 
There is a rather infamous issue about _io.BufferedReader and non-blocking I/O at http://bugs.python.org/issue13322 It is a bit problematic because currently non-blocking readline() returns '' instead of None when no data is available, meaning EOF can't be easily detected :( Once this issue is solved, you could use _io.BufferedReader, and workaround the "partial read/readline result" issue by iterating (hopefully in most cases there is enough data in the buffer to return a complete read or readline, so the C optimizations are useful). Here is how it may work: def __init__(self, fd): self.fd = fd self.bufio = _io.BufferedReader(...) def readline(self): chunks = [] while True: line = self.bufio.readline() if line is not None: chunks.append(line) if line == b'' or line.endswith(b'\n'): # EOF or EOL return b''.join(chunks) yield from scheduler.block_r(self.fd) def read(self, n): chunks = [] bytes_read = 0 while True: data = self.bufio.read(n - bytes_read) if data is not None: chunks.append(data) bytes_read += len(data) if data == b'' or bytes_read == n: # EOF or read satisfied break yield from scheduler.block_r(self.fd) return b''.join(chunks) As for TextIOWrapper, AFAIR it doesn't handle non-blocking I/O at all (but my memories are vague). By the way I don't know how this whole approach (of mocking socket-like or file-like objects with coroutine-y read() / readline() methods) lends itself to plugging into Windows' IOCP. You may rely on some raw I/O object that registers a callback when a read() is requested and then yields a Future object that gets completed by the callback. I'm sure Richard has some ideas about that :-) Regards Antoine. From guido at python.org Mon Oct 29 23:03:07 2012 From: guido at python.org (Guido van Rossum) Date: Mon, 29 Oct 2012 15:03:07 -0700 Subject: [Python-ideas] Async API: some code to review In-Reply-To: References: Message-ID: On Mon, Oct 29, 2012 at 2:20 PM, Giampaolo Rodol? wrote: > 2012/10/29 Guido van Rossum : >> On Mon, Oct 29, 2012 at 11:08 AM, Giampaolo Rodol? wrote: >>> 2012/10/29 Guido van Rossum >>> === About polling.py === >>> >>> 1 - I think DelayedCall should have a reset() method, other than just cancel(). >> >> So, essentially an uncancel()? Why not just re-register in that case? >> Or what's your use case? (Right now there's no problem in calling one >> of these many times -- it's just that cancellation is permanent.) > > The most common use case is when you want to disconnect the other peer > after a certain time of inactivity. > Ideally what you would do is schedule() a idle/timeout function and > reset() it every time the other peer sends you some data. Um, ok, I think you are saying that you want to be able to set timeouts and then "reset" that timeout. This is a much higher-level thing than canceling the DelayedCall object. (I have no desire to make DelayedCall have functionality like Twisted's Deferred. It is something *much* simpler; it's just the API for cancelling a callback passed to call_later(), and its other uses are similar to this.) [...] > Not very nice to use a reserved keyword, I agree. > Perhaps you can keep ruling out kwargs referred to the callback > function and change the current call_later signature as such: > > - def call_later(self, when, callback, *args): > + def call_later(self, when, callback, *args, errback=None): > > ...or maybe provide a DelayedCall.add_errback() method a-la Twisted. I really don't want that though! But I'm glad you're not too hell-bent on supporting callbacks with keyword-only args. [...] 
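(As an aside, the inactivity-timeout pattern really does need nothing more than call_later() plus cancel(): a tiny helper can "reset" by cancelling the pending DelayedCall and scheduling a fresh one. This is only a sketch -- it assumes an event loop whose call_later() returns an object with a cancel() method, as in polling.py, and the helper name is made up:

    class IdleTimeout:
        def __init__(self, eventloop, delay, callback):
            self.eventloop = eventloop
            self.delay = delay
            self.callback = callback
            # Schedule the first timeout right away.
            self._dcall = eventloop.call_later(delay, callback)

        def reset(self):
            # Cancellation is permanent, so "resetting" means re-registering.
            self._dcall.cancel()
            self._dcall = self.eventloop.call_later(self.delay, self.callback)

        def cancel(self):
            self._dcall.cancel()

A transport would call reset() from its data-received path and let callback() close the connection after the chosen period of silence.)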
>> should also check that I've taken care of all the various flags and >> other details (I recall being quite surprised that with poll(), on >> some platforms I need to check for POLLHUP but not on others). > > Yeah, that's a painful part. > Try to look here: > http://code.google.com/p/pyftpdlib/source/browse/trunk/pyftpdlib/lib/ioloop.py?spec=svn1115&r=1115#464 > Instead of handle_close()ing you should add the fd to the list of > readable ones ("r"). > The call to recv() which will be coming next will then cause the > socket to close (you have to add the error handling to recv() first > though). Aha, are you suggesting that I close the socket when I detect that the socket is closed? But what if the other side uses shutdown() to close only one end? Depending on the protocol it might be useful to either stop reading but keep sending, or vice versa. Maybe I could detect that both ends are closed and then close the socket. Or are you suggesting something else? >>> 3 - I don't fully understand how data transfer works exactly but keep >>> in mind that the transport should interact with the pollster. >>> What I mean is that generally speaking a connected socket should >>> *always* be readable ("r"), even when it's idle, then switch to "rw" >>> events when sending data, then get back to "r" when all the data has >>> been sent. >>> This is *crucial* if you want to achieve high performances/scalability >>> and that is why PollsterBase should probably provide a modify() >>> method. >>> Please take a look at what I've done here: >>> http://code.google.com/p/pyftpdlib/source/browse/trunk/pyftpdlib/lib/ioloop.py?spec=svn1115&r=1115#809 >> >> Hm. I am not convinced that managing this explicitly from the >> transport is the right solution (note that my transports are quite >> different from those in Twisted). But I'll keep this in mind -- I >> would like to set up a benchmark suite at some point. I will probably >> have to implement the server side of HTTP for that purpose, so I can >> point e.g. ab at my app. > > I think you might want to apply that to something slighlty higher > level than the mere transport. (Apply *what*?) > Something like the equivalent of asynchat.push / > asynchat.push_with_producer, if you'll ever want to go that far in > terms of abstraction, or maybe avoid that at all but make it clear in > the doc that the user should take care of that. I'm actually not sufficiently familiar with asynchat to comment. I think it's got quite a different model than what I am trying to set up here. > My point is that having a socket registered for both "r" AND "w" > events when in fact you want only "r" OR "w" is an exponential waste > of CPU cycles and it should be avoided either by the lib or by the > user. One task can only be blocked for reading OR writing. The only way to have a socket registered for both is if there are separate tasks for reading and writing, and then presumably that is what you want. (I have a feeling you haven't fully grokked my HTTP client code yet?) > "old select() implementation" vs "new select() implementation" > benchmark shown here reflects exactly this problem which still affects > base asyncore module: > https://code.google.com/p/pyftpdlib/issues/detail?id=203#c6 Hm, I am already using epoll or kqueue if available, otherwise poll, falling back to select only if there's nothing else available (in practice that's only Windows). But I will diligently work towards a benchmarkable demo. 
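(The fallback order described above amounts to a capability check on the select module -- shown here only as an illustrative sketch, not the actual code in polling.py:

    import select

    def best_polling_mechanism():
        # Prefer epoll (Linux) or kqueue (BSD/OS X), then poll(),
        # and fall back to select() only when nothing else exists
        # (in practice that means Windows).
        if hasattr(select, 'epoll'):
            return 'epoll'
        if hasattr(select, 'kqueue'):
            return 'kqueue'
        if hasattr(select, 'poll'):
            return 'poll'
        return 'select'

Each name would then map to the corresponding pollster implementation.)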
> I'll keep following the progress on this and hopefully come up with > another set of questions and/or random thoughts. Thanks! -- --Guido van Rossum (python.org/~guido) From guido at python.org Mon Oct 29 23:08:54 2012 From: guido at python.org (Guido van Rossum) Date: Mon, 29 Oct 2012 15:08:54 -0700 Subject: [Python-ideas] non-blocking buffered I/O In-Reply-To: <20121029222541.07c461b3@pitrou.net> References: <20121029170731.74bd3d37@cosmocat> <20121029222541.07c461b3@pitrou.net> Message-ID: On Mon, Oct 29, 2012 at 2:25 PM, Antoine Pitrou wrote: > On Mon, 29 Oct 2012 10:03:00 -0700 > Guido van Rossum wrote: >> >> Then there is a >> >> BufferedReader class that implements more traditional read() and >> >> readline() coroutines (i.e., to be invoked using yield from), the >> >> latter handy for line-oriented transports. >> > >> > Well... It would be nice if BufferedReader could re-use the actual >> > io.BufferedReader and its fast readline(), read(), readinto() >> > implementations. >> >> Agreed, I would love that too, but the problem is, *this* >> BufferedReader defines methods you have to invoke with yield from. >> Maybe we can come up with a solution for sharing code by modifying the >> _io module though; that would be great! (I've also been thinking of >> layering TextIOWrapper on top of these.) > > There is a rather infamous issue about _io.BufferedReader and > non-blocking I/O at http://bugs.python.org/issue13322 > It is a bit problematic because currently non-blocking readline() > returns '' instead of None when no data is available, meaning EOF can't > be easily detected :( Eeew! > Once this issue is solved, you could use _io.BufferedReader, and > workaround the "partial read/readline result" issue by iterating > (hopefully in most cases there is enough data in the buffer to > return a complete read or readline, so the C optimizations are useful). Yes, that's what I'm hoping for. > Here is how it may work: > > def __init__(self, fd): > self.fd = fd > self.bufio = _io.BufferedReader(...) > > def readline(self): > chunks = [] > while True: > line = self.bufio.readline() > if line is not None: > chunks.append(line) > if line == b'' or line.endswith(b'\n'): > # EOF or EOL > return b''.join(chunks) > yield from scheduler.block_r(self.fd) > > def read(self, n): > chunks = [] > bytes_read = 0 > while True: > data = self.bufio.read(n - bytes_read) > if data is not None: > chunks.append(data) > bytes_read += len(data) > if data == b'' or bytes_read == n: > # EOF or read satisfied > break > yield from scheduler.block_r(self.fd) > return b''.join(chunks) Hm... I wonder if it would make more sense if these standard APIs were to return specific exceptions, like the ssl module does in non-blocking mode? Look here (I updated since posting last night): http://code.google.com/p/tulip/source/browse/sockets.py#142 > As for TextIOWrapper, AFAIR it doesn't handle non-blocking I/O at all > (but my memories are vague). Same suggestion... (I only found out about ssl's approach to async I/O a few days ago. It felt brilliant and right to me. But maybe I'm missing something?) > By the way I don't know how this whole approach (of mocking socket-like > or file-like objects with coroutine-y read() / readline() methods) > lends itself to plugging into Windows' IOCP. Me neither. I hope Steve Dower can tell us. > You may rely on some raw > I/O object that registers a callback when a read() is requested and > then yields a Future object that gets completed by the callback. 
> I'm sure Richard has some ideas about that :-) Which Richard? -- --Guido van Rossum (python.org/~guido) From andrew.svetlov at gmail.com Mon Oct 29 23:19:16 2012 From: andrew.svetlov at gmail.com (Andrew Svetlov) Date: Tue, 30 Oct 2012 00:19:16 +0200 Subject: [Python-ideas] Async API: some code to review In-Reply-To: References: Message-ID: On Tue, Oct 30, 2012 at 12:03 AM, Guido van Rossum wrote: > On Mon, Oct 29, 2012 at 2:20 PM, Giampaolo Rodol? wrote: >> 2012/10/29 Guido van Rossum : >>> On Mon, Oct 29, 2012 at 11:08 AM, Giampaolo Rodol? wrote: >>>> 2012/10/29 Guido van Rossum >>>> === About polling.py === >>>> >>>> 1 - I think DelayedCall should have a reset() method, other than just cancel(). >>> >>> So, essentially an uncancel()? Why not just re-register in that case? >>> Or what's your use case? (Right now there's no problem in calling one >>> of these many times -- it's just that cancellation is permanent.) >> >> The most common use case is when you want to disconnect the other peer >> after a certain time of inactivity. >> Ideally what you would do is schedule() a idle/timeout function and >> reset() it every time the other peer sends you some data. > > Um, ok, I think you are saying that you want to be able to set > timeouts and then "reset" that timeout. This is a much higher-level > thing than canceling the DelayedCall object. (I have no desire to make > DelayedCall have functionality like Twisted's Deferred. It is > something *much* simpler; it's just the API for cancelling a callback > passed to call_later(), and its other uses are similar to this.) > Twisted's DelayedCall is different from Deferred, it used for reactor.callLater and returned from this function (the same as call_later from tulip) Interface is: http://twistedmatrix.com/trac/browser/trunk/twisted/internet/interfaces.py#L676 Implementation is http://twistedmatrix.com/trac/browser/trunk/twisted/internet/base.py#L35 DelayedCall from twisted has nothing common with Deferred, it's just an interface for scheduled activity, which can be called once, cancelled or rescheduled to another time. I've found that concept very useful when I used twisted. > [...] >> Not very nice to use a reserved keyword, I agree. >> Perhaps you can keep ruling out kwargs referred to the callback >> function and change the current call_later signature as such: >> >> - def call_later(self, when, callback, *args): >> + def call_later(self, when, callback, *args, errback=None): >> >> ...or maybe provide a DelayedCall.add_errback() method a-la Twisted. > > I really don't want that though! But I'm glad you're not too hell-bent > on supporting callbacks with keyword-only args. > > [...] >>> should also check that I've taken care of all the various flags and >>> other details (I recall being quite surprised that with poll(), on >>> some platforms I need to check for POLLHUP but not on others). >> >> Yeah, that's a painful part. >> Try to look here: >> http://code.google.com/p/pyftpdlib/source/browse/trunk/pyftpdlib/lib/ioloop.py?spec=svn1115&r=1115#464 >> Instead of handle_close()ing you should add the fd to the list of >> readable ones ("r"). >> The call to recv() which will be coming next will then cause the >> socket to close (you have to add the error handling to recv() first >> though). > > Aha, are you suggesting that I close the socket when I detect that the > socket is closed? But what if the other side uses shutdown() to close > only one end? 
Depending on the protocol it might be useful to either > stop reading but keep sending, or vice versa. Maybe I could detect > that both ends are closed and then close the socket. Or are you > suggesting something else? > >>>> 3 - I don't fully understand how data transfer works exactly but keep >>>> in mind that the transport should interact with the pollster. >>>> What I mean is that generally speaking a connected socket should >>>> *always* be readable ("r"), even when it's idle, then switch to "rw" >>>> events when sending data, then get back to "r" when all the data has >>>> been sent. >>>> This is *crucial* if you want to achieve high performances/scalability >>>> and that is why PollsterBase should probably provide a modify() >>>> method. >>>> Please take a look at what I've done here: >>>> http://code.google.com/p/pyftpdlib/source/browse/trunk/pyftpdlib/lib/ioloop.py?spec=svn1115&r=1115#809 >>> >>> Hm. I am not convinced that managing this explicitly from the >>> transport is the right solution (note that my transports are quite >>> different from those in Twisted). But I'll keep this in mind -- I >>> would like to set up a benchmark suite at some point. I will probably >>> have to implement the server side of HTTP for that purpose, so I can >>> point e.g. ab at my app. >> >> I think you might want to apply that to something slighlty higher >> level than the mere transport. > > (Apply *what*?) > >> Something like the equivalent of asynchat.push / >> asynchat.push_with_producer, if you'll ever want to go that far in >> terms of abstraction, or maybe avoid that at all but make it clear in >> the doc that the user should take care of that. > > I'm actually not sufficiently familiar with asynchat to comment. I > think it's got quite a different model than what I am trying to set up > here. > >> My point is that having a socket registered for both "r" AND "w" >> events when in fact you want only "r" OR "w" is an exponential waste >> of CPU cycles and it should be avoided either by the lib or by the >> user. > > One task can only be blocked for reading OR writing. The only way to > have a socket registered for both is if there are separate tasks for > reading and writing, and then presumably that is what you want. (I > have a feeling you haven't fully grokked my HTTP client code yet?) > >> "old select() implementation" vs "new select() implementation" >> benchmark shown here reflects exactly this problem which still affects >> base asyncore module: >> https://code.google.com/p/pyftpdlib/issues/detail?id=203#c6 > > Hm, I am already using epoll or kqueue if available, otherwise poll, > falling back to select only if there's nothing else available (in > practice that's only Windows). > > But I will diligently work towards a benchmarkable demo. > >> I'll keep following the progress on this and hopefully come up with >> another set of questions and/or random thoughts. > > Thanks! 
> > -- > --Guido van Rossum (python.org/~guido) > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas -- Thanks, Andrew Svetlov From rene at stranden.com Mon Oct 29 23:23:34 2012 From: rene at stranden.com (Rene Nejsum) Date: Mon, 29 Oct 2012 23:23:34 +0100 Subject: [Python-ideas] Async API: some code to review In-Reply-To: <90D4E462-4C49-4BEF-BEA1-F725C5EE352F@gmail.com> References: <20121029170731.74bd3d37@cosmocat> <90D4E462-4C49-4BEF-BEA1-F725C5EE352F@gmail.com> Message-ID: <25B4FAEA-2ED8-4AED-8792-D75576320C2C@stranden.com> On Oct 29, 2012, at 5:59 PM, Yury Selivanov wrote: > On 2012-10-29, at 12:07 PM, Antoine Pitrou wrote: > >>> In the docstrings I use the prefix "COROUTINE:" to indicate public >>> APIs that should be invoked using yield from. >> >> Hmm, should they? Your approach looks a bit weird: you have functions >> that should use yield, and others that should use "yield from"? That >> sounds confusing to me. >> >> I'd much rather either have all functions use "yield", or have all >> functions use "yield from". >> >> (also, I wouldn't be shocked if coroutines had to wear a special >> decorator; it's a better marker than having the word COROUTINE in the >> docstring, anyway :-)) > > That's what bothers me is well. 'yield from' looks too long for a > simple thing it does (1); users will be confused whether they should > use 'yield' or 'yield from' (2); there is no visible difference between > a plain generator and a coroutine (3). I agree, was this ever commented ? I know it maybe late in the discussion but just because you can use yield/yield from for concurrent stuff, should you? it looks very implicit to me (breaking the second rule) Have the delegate/event model of C# been discussed ? As always i recommend moving the concurrent stuff to the object level, it would be so much easier to state that a message for an object is just that: An async message sent from one object to another? :-) A simple decorator like @task would be enough: @task # explicit run instance in own thread/coroutine class SomeTask(object): def asyc_add(self, x, y) return x + y # returns a Future() with result task = SomeTask() n = task.async_add(2,2) # Do other stuff while waiting for answer print( "result is %d" % n ) # Future will wait/hang until result is ready br /rene > > Personally, I like Greg's PEP 3152 (aside from 'cocall' keyword). > With that approach it's easy to distinguish coroutines, generators and > plain functions. And it'd be easier to add some special > methods/properties to codefs, like 'in_finally()' method etc. 
> > - > Yury > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas From guido at python.org Mon Oct 29 23:26:59 2012 From: guido at python.org (Guido van Rossum) Date: Mon, 29 Oct 2012 15:26:59 -0700 Subject: [Python-ideas] Async API: some code to review In-Reply-To: References: Message-ID: On Mon, Oct 29, 2012 at 3:19 PM, Andrew Svetlov wrote: > Twisted's DelayedCall is different from Deferred, it used for > reactor.callLater and returned from this function (the same as > call_later from tulip) > Interface is: http://twistedmatrix.com/trac/browser/trunk/twisted/internet/interfaces.py#L676 > Implementation is > http://twistedmatrix.com/trac/browser/trunk/twisted/internet/base.py#L35 > DelayedCall from twisted has nothing common with Deferred, it's just > an interface for scheduled activity, which can be called once, > cancelled or rescheduled to another time. > > I've found that concept very useful when I used twisted. Oh dear. I had no idea there was something named DelayedCall in Twisted. There is no intention of similarity. -- --Guido van Rossum (python.org/~guido) From Steve.Dower at microsoft.com Tue Oct 30 00:00:14 2012 From: Steve.Dower at microsoft.com (Steve Dower) Date: Mon, 29 Oct 2012 23:00:14 +0000 Subject: [Python-ideas] Async API: some code to review In-Reply-To: <25B4FAEA-2ED8-4AED-8792-D75576320C2C@stranden.com> References: <20121029170731.74bd3d37@cosmocat> <90D4E462-4C49-4BEF-BEA1-F725C5EE352F@gmail.com> <25B4FAEA-2ED8-4AED-8792-D75576320C2C@stranden.com> Message-ID: Rene Nejsum wrote: >> [SNIP] >> >> That's what bothers me is well. 'yield from' looks too long for a >> simple thing it does (1); users will be confused whether they should >> use 'yield' or 'yield from' (2); there is no visible difference >> between a plain generator and a coroutine (3). > > I agree, was this ever commented ? I know it maybe late in the discussion > but just because you can use yield/yield from for concurrent stuff, should you? > > it looks very implicit to me (breaking the second rule) > > Have the delegate/event model of C# been discussed ? > > As always i recommend moving the concurrent stuff to the object level, it > would be so much easier to state that a message for an object is just that: > An async message sent from one object to another... :-) A simple decorator > like @task would be enough: > > @task # explicit run instance in own thread/coroutine class SomeTask(object): > def asyc_add(self, x, y) > return x + y # returns a Future() with result > > task = SomeTask() > n = task.async_add(2,2) > # Do other stuff while waiting for answer print( "result is %d" % n ) # Future will > wait/hang until result is ready I think you'll like what I'll be sending out later tonight (US Pacific time), so hold on :) (In the meantime, feel free to read up on C#'s async/await model, which is very similar to what both Guido and I are proposing and has already been pretty well received.) Cheers, Steve From greg.ewing at canterbury.ac.nz Tue Oct 30 00:16:22 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 30 Oct 2012 12:16:22 +1300 Subject: [Python-ideas] Async API: some code to review In-Reply-To: References: Message-ID: <508F0E46.7040706@canterbury.ac.nz> Steve Dower wrote: > - how easy/difficult/flexible/restrictive is it to write a new scheduler as an end user? I don't think that writing new schedulers is something an end user will do very often. 
Or more precisely, it's not something they should *have* to do except in extremely unusual circumstances. I believe it will be possible to provide a scheduler in the stdlib that will be satisfactory for the vast majority of applications. -- Greg From Steve.Dower at microsoft.com Tue Oct 30 00:12:54 2012 From: Steve.Dower at microsoft.com (Steve Dower) Date: Mon, 29 Oct 2012 23:12:54 +0000 Subject: [Python-ideas] non-blocking buffered I/O In-Reply-To: References: <20121029170731.74bd3d37@cosmocat> <20121029222541.07c461b3@pitrou.net> Message-ID: Guido van Rossum wrote: > >> By the way I don't know how this whole approach (of mocking >> socket-like or file-like objects with coroutine-y read() / readline() >> methods) lends itself to plugging into Windows' IOCP. > > Me neither. I hope Steve Dower can tell us. I suppose since my name has been invoked I ought to comment, though Richard (Oudkerk, I think?) seems to have more experience with IOCP than I do. >From my point of view, IOCP fits in very well provided the callbacks (which will run in the IOCP thread pool) are only used to unblock tasks. Yes, it then will not be a pure single-threaded model, but on the other hand it isn't going to use an unbounded number of threads. There are alternatives to IOCP, but they will require expert hands to make them efficient under scale - IOCP has already had the expect hands applied (I assume... maybe it was written by an intern? I really don't know). The whole blocking coroutine model works really well with callback-based unblocks (whether they call Future.set_result or unblock_task), so I don't think there's anything to worry about here. Compatibility-wise, it should be easy to make programs portable, and since we can have completely separate implementations for Linux/Mac/Windows it will be possible to get good, if not excellent, performance out of each. What will make a difference is the ready vs. complete notifications - most async Windows APIs will signal when they are complete (for example, the data has been read from the file) unlike many (most? All?) Linux APIs that signal when they are ready. It is possible to wrap this difference up by making all APIs notify on completion, and if we don't do this then user code may be less portable, which I'd hate to see. It doesn't directly relate to IOCP, but it is an important consideration for good cross-platform libraries. Cheers, Steve From guido at python.org Tue Oct 30 00:21:38 2012 From: guido at python.org (Guido van Rossum) Date: Mon, 29 Oct 2012 16:21:38 -0700 Subject: [Python-ideas] Async API: some code to review In-Reply-To: <25B4FAEA-2ED8-4AED-8792-D75576320C2C@stranden.com> References: <20121029170731.74bd3d37@cosmocat> <90D4E462-4C49-4BEF-BEA1-F725C5EE352F@gmail.com> <25B4FAEA-2ED8-4AED-8792-D75576320C2C@stranden.com> Message-ID: On Mon, Oct 29, 2012 at 3:23 PM, Rene Nejsum wrote: > > On Oct 29, 2012, at 5:59 PM, Yury Selivanov wrote: > >> On 2012-10-29, at 12:07 PM, Antoine Pitrou wrote: >> >>>> In the docstrings I use the prefix "COROUTINE:" to indicate public >>>> APIs that should be invoked using yield from. >>> >>> Hmm, should they? Your approach looks a bit weird: you have functions >>> that should use yield, and others that should use "yield from"? That >>> sounds confusing to me. >>> >>> I'd much rather either have all functions use "yield", or have all >>> functions use "yield from". 
>>> >>> (also, I wouldn't be shocked if coroutines had to wear a special >>> decorator; it's a better marker than having the word COROUTINE in the >>> docstring, anyway :-)) >> >> That's what bothers me is well. 'yield from' looks too long for a >> simple thing it does (1); users will be confused whether they should >> use 'yield' or 'yield from' (2); there is no visible difference between >> a plain generator and a coroutine (3). > > I agree, was this ever commented ? I know it maybe late in the discussion > but just because you can use yield/yield from for concurrent stuff, should you? I explained my position on yield vs. yield from twice already in this thread. -- --Guido van Rossum (python.org/~guido) From Steve.Dower at microsoft.com Tue Oct 30 00:26:17 2012 From: Steve.Dower at microsoft.com (Steve Dower) Date: Mon, 29 Oct 2012 23:26:17 +0000 Subject: [Python-ideas] Async API: some code to review In-Reply-To: <508F0E46.7040706@canterbury.ac.nz> References: <508F0E46.7040706@canterbury.ac.nz> Message-ID: Greg Ewing wrote: > Steve Dower wrote: > >> - how easy/difficult/flexible/restrictive is it to write a new scheduler as an end user? > > I don't think that writing new schedulers is something an end user will do very often. Or > more precisely, it's not something they should *have* to do except in extremely > unusual circumstances. > > I believe it will be possible to provide a scheduler in the stdlib that will be satisfactory > for the vast majority of applications. I agree, and I chose my words poorly for that point: "library/framework developers" is more accurate than "end user". And since I expect every GUI framework is going to need (or at least want) their own scheduler, not to mention all the cases of Python being embedded in other programs, there is some value in helping these developers to get it right by virtue of the design rather than relying on documentation. Cheers, Steve From guido at python.org Tue Oct 30 00:29:00 2012 From: guido at python.org (Guido van Rossum) Date: Mon, 29 Oct 2012 16:29:00 -0700 Subject: [Python-ideas] non-blocking buffered I/O In-Reply-To: References: <20121029170731.74bd3d37@cosmocat> <20121029222541.07c461b3@pitrou.net> Message-ID: On Mon, Oct 29, 2012 at 4:12 PM, Steve Dower wrote: > Guido van Rossum wrote: >> >>> By the way I don't know how this whole approach (of mocking >>> socket-like or file-like objects with coroutine-y read() / readline() >>> methods) lends itself to plugging into Windows' IOCP. >> >> Me neither. I hope Steve Dower can tell us. > > I suppose since my name has been invoked I ought to comment, though Richard (Oudkerk, I think?) seems to have more experience with IOCP than I do. Aha, somehow I thought Richard was a Mac expert. :-( > From my point of view, IOCP fits in very well provided the callbacks (which will run in the IOCP thread pool) are only used to unblock tasks. Yes, it then will not be a pure single-threaded model, but on the other hand it isn't going to use an unbounded number of threads. There are alternatives to IOCP, but they will require expert hands to make them efficient under scale - IOCP has already had the expect hands applied (I assume... maybe it was written by an intern? I really don't know). Experts all point in its direction, so I believe IOCP is solid. > The whole blocking coroutine model works really well with callback-based unblocks (whether they call Future.set_result or unblock_task), so I don't think there's anything to worry about here. 
Compatibility-wise, it should be easy to make programs portable, and since we can have completely separate implementations for Linux/Mac/Windows it will be possible to get good, if not excellent, performance out of each. Right. Did you see my call_in_thread() yet? http://code.google.com/p/tulip/source/browse/scheduling.py#210 http://code.google.com/p/tulip/source/browse/polling.py#481 > What will make a difference is the ready vs. complete notifications - most async Windows APIs will signal when they are complete (for example, the data has been read from the file) unlike many (most? All?) Linux APIs that signal when they are ready. It is possible to wrap this difference up by making all APIs notify on completion, and if we don't do this then user code may be less portable, which I'd hate to see. It doesn't directly relate to IOCP, but it is an important consideration for good cross-platform libraries. I wonder if this could be done by varying the transports by platform? Not too many people are going to write new transports -- there just aren't that many options. And those that do might be doing something platform-specific anyway. It shouldn't be that hard to come up with a transport abstraction that lets protocol implementations work regardless of whether it's a UNIX style transport or a Windows style transport. UNIX systems with IOCP support could use those too. -- --Guido van Rossum (python.org/~guido) From guido at python.org Tue Oct 30 00:37:52 2012 From: guido at python.org (Guido van Rossum) Date: Mon, 29 Oct 2012 16:37:52 -0700 Subject: [Python-ideas] Async API: some code to review In-Reply-To: References: <508F0E46.7040706@canterbury.ac.nz> Message-ID: On Mon, Oct 29, 2012 at 4:26 PM, Steve Dower wrote: > Greg Ewing wrote: >> Steve Dower wrote: >> >>> - how easy/difficult/flexible/restrictive is it to write a new scheduler as an end user? >> >> I don't think that writing new schedulers is something an end user will do very often. Or >> more precisely, it's not something they should *have* to do except in extremely >> unusual circumstances. >> >> I believe it will be possible to provide a scheduler in the stdlib that will be satisfactory >> for the vast majority of applications. > > I agree, and I chose my words poorly for that point: "library/framework developers" is more accurate than "end user". And since I expect every GUI framework is going to need (or at least want) their own scheduler, not to mention all the cases of Python being embedded in other programs, there is some value in helping these developers to get it right by virtue of the design rather than relying on documentation. BTW, would it be useful to separate this into pollster, eventloop, and scheduler? At least in my world these are different; of these three, only the pollster contains platform-specific code (and then again the transports do too -- this is a nice match IMO). -- --Guido van Rossum (python.org/~guido) From Steve.Dower at microsoft.com Tue Oct 30 00:47:51 2012 From: Steve.Dower at microsoft.com (Steve Dower) Date: Mon, 29 Oct 2012 23:47:51 +0000 Subject: [Python-ideas] non-blocking buffered I/O In-Reply-To: References: <20121029170731.74bd3d37@cosmocat> <20121029222541.07c461b3@pitrou.net> Message-ID: > Guido van Rossum wrote: > [SNIP] > > On Mon, Oct 29, 2012 at 4:12 PM, Steve Dower wrote: >> The whole blocking coroutine model works really well with callback-based unblocks >> (whether they call Future.set_result or unblock_task), so I don't think there's anything >> to worry about here. 
Compatibility-wise, it should be easy to make programs portable, >> and since we can have completely separate implementations for Linux/Mac/Windows it >> will be possible to get good, if not excellent, performance out of each. > > Right. Did you see my call_in_thread() yet? > http://code.google.com/p/tulip/source/browse/scheduling.py#210 > http://code.google.com/p/tulip/source/browse/polling.py#481 Yes, and it really stood out as one of the similarities between our work. I don't have an equivalent function, since writing "yield thread_pool.submit(...)" is sufficient (because it already returns a Future), but I haven't actually made the thread pool a property of the current scheduler. I think there's value in it >> What will make a difference is the ready vs. complete notifications - most async Windows >> APIs will signal when they are complete (for example, the data has been read from the file) >> unlike many (most? All?) Linux APIs that signal when they are ready. It is possible to wrap this >> difference up by making all APIs notify on completion, and if we don't do this then user code >> may be less portable, which I'd hate to see. It doesn't directly relate to IOCP, but it is an important >> consideration for good cross-platform libraries. > > I wonder if this could be done by varying the transports by platform? > Not too many people are going to write new transports -- there just aren't that many options. > And those that do might be doing something platform-specific anyway. It shouldn't be that hard > to come up with a transport abstraction that lets protocol implementations work regardless of > whether it's a UNIX style transport or a Windows style transport. UNIX systems with IOCP support > could use those too. I feel like a bit of a tease now, since I still haven't posted my code (it's coming, but I also have day work to do [also Python related]), but I've really left this side of things out of my definition completely in favour of allowing schedulers to "unblock" known functions. For example, (library) code that needs a socket to be ready can ask the current scheduler if it can do "select([sock], [], [])", and if the scheduler can then it will give the library code a Future. How the scheduler ends up implementing the asynchronous-select is entirely up to the scheduler, and if it can't do it, the caller can do it their own way (which probably means using a thread pool as a last resort). What I would expect this to result in is a set of platform-specific default schedulers that do common operations well and other (3rd-party) schedulers that do particular things really well. So if you want high performance single-threaded sockets, you replace the default scheduler with another one - but if Windows doesn't support the optimized scheduler, you can use the default scheduler without your code breaking. Writing this now it seems to be even clearer that we've approached the problem differently, which should mean there'll be room to share parts of the designs and come up with a really solid result. I'm looking forward to it. Cheers, Steve From greg.ewing at canterbury.ac.nz Tue Oct 30 00:53:56 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 30 Oct 2012 12:53:56 +1300 Subject: [Python-ideas] Async API: some code to review In-Reply-To: <201210291609.51091.mark.hackett@metoffice.gov.uk> References: <201210291609.51091.mark.hackett@metoffice.gov.uk> Message-ID: <508F1714.9080808@canterbury.ac.nz> Mark Hackett wrote: > Is that actually true? 
It may be guaranteed on Intel x86 compatibles and Linux > (because of the string operations available in the x86 instruction set), but I > don't thing anything other than an IPC message has a "you can write a string > atomically" guarantee. And I may be misremembering that. It seems to be a POSIX requirement: PIPE_BUF POSIX.1-2001 says that write(2)s of less than PIPE_BUF bytes must be atomic: the output data is written to the pipe as a contiguous sequence. (From http://dell9.ma.utexas.edu/cgi-bin/man-cgi?pipe+7) There's no corresponding guarantee for reading, though. The process on the other end can't be sure of getting the data from one write() call in a single read() call. In other words, the write does *not* establish a record boundary. -- Greg From shibturn at gmail.com Tue Oct 30 01:01:23 2012 From: shibturn at gmail.com (Richard Oudkerk) Date: Tue, 30 Oct 2012 00:01:23 +0000 Subject: [Python-ideas] non-blocking buffered I/O In-Reply-To: References: <20121029170731.74bd3d37@cosmocat> <20121029222541.07c461b3@pitrou.net> Message-ID: On 29/10/2012 11:29pm, Guido van Rossum wrote: > I wonder if this could be done by varying the transports by platform? > Not too many people are going to write new transports -- there just > aren't that many options. And those that do might be doing something > platform-specific anyway. It shouldn't be that hard to come up with a > transport abstraction that lets protocol implementations work > regardless of whether it's a UNIX style transport or a Windows style > transport. UNIX systems with IOCP support could use those too. Yes, having separate implementations of the transport layer should work. But I think it would be cleaner to put all the platform specific stuff in the pollster, and make the pollster poll-for-completion rather than poll-for-readiness. (Is this the "proactor pattern"?) That seems to be the direction libevent has moved in. -- Richard From greg.ewing at canterbury.ac.nz Tue Oct 30 01:06:29 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 30 Oct 2012 13:06:29 +1300 Subject: [Python-ideas] Async API In-Reply-To: References: <71C95BE8-C0C7-45F5-A5E9-45A86EB3265B@gmail.com> <80A26DE4-9B7C-4C27-B8F2-68E25CC2B14A@gmail.com> <72ACCA97-8FF8-4F6B-85B7-7DB9CDCB6DBC@gmail.com> <31A560E1-AF1A-437A-B024-5AF637EF3F35@gmail.com> <5F51531B-68BF-44D0-AF82-BD8A6ED7DC0C@gmail.com> Message-ID: <508F1A05.3020401@canterbury.ac.nz> Yury Selivanov wrote: > Because scheduler, when it is deciding to interrupt a coroutine or not, > should only question whether that particular coroutine is in its finally, > and not the one which called it. So given this: def c1(): try: something() finally: yield from c2() very_important_cleanup() def c2(): yield from block() # 1 it should be okay to interrupt at point 1, even though it will prevent very_important_cleanup() from being done? That doesn't seem right to me. -- Greg From greg.ewing at canterbury.ac.nz Tue Oct 30 01:19:08 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 30 Oct 2012 13:19:08 +1300 Subject: [Python-ideas] Async API: some code to review In-Reply-To: References: <20121029170731.74bd3d37@cosmocat> <01150791-F34B-4A1A-BA93-CB7B3DC48BF7@gmail.com> Message-ID: <508F1CFC.3060409@canterbury.ac.nz> Guido van Rossum wrote: >>I would to see add_{reader,writer} and call_{soon,later} accepting >>**kwargs as well as *args. At least to respect functions with >>keyword-only arguments. > > Hmm... 
I intentionally ruled those out because I wanted to leave the > door open for keyword args that modify the registration function One way to accommodate that would be to make the registration API look like this: call_later(my_func)(arg1, ..., kwd = value, ...) -- Greg From greg.ewing at canterbury.ac.nz Tue Oct 30 01:24:18 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 30 Oct 2012 13:24:18 +1300 Subject: [Python-ideas] Async API: some code to review In-Reply-To: References: Message-ID: <508F1E32.9070205@canterbury.ac.nz> Guido van Rossum wrote: > I can build lots of other useful things out of call_soon() and > call_later() -- but I do need at least those two as "axioms". Isn't call_soon() equivalent to call_later() with a time delay of 0? If so, then call_later() is really the only axiomatic one. -- Greg From greg.ewing at canterbury.ac.nz Tue Oct 30 01:25:43 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 30 Oct 2012 13:25:43 +1300 Subject: [Python-ideas] Async API: some code to review In-Reply-To: References: <20121029170731.74bd3d37@cosmocat> <01150791-F34B-4A1A-BA93-CB7B3DC48BF7@gmail.com> Message-ID: <508F1E87.7000307@canterbury.ac.nz> Andrew Svetlov wrote: > 0MQ socket has no file descriptor at all, it's just pointer to some > unspecified structure. > So 0MQ has own *poll* function which can process that sockets as well > as file descriptors. Aaargh... yet another event loop that wants to rule the world. This is not good. -- Greg From ncoghlan at gmail.com Tue Oct 30 01:34:24 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 30 Oct 2012 10:34:24 +1000 Subject: [Python-ideas] non-blocking buffered I/O In-Reply-To: References: <20121029170731.74bd3d37@cosmocat> <20121029222541.07c461b3@pitrou.net> Message-ID: On Tue, Oct 30, 2012 at 9:29 AM, Guido van Rossum wrote: > Aha, somehow I thought Richard was a Mac expert. :-( Just in case anyone else confused the two names (I know I have in the past): Ronald Oussoren = Mac expert Richard Oudkerk = multiprocessing expert (including tools for inter-process communication) Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From yselivanov.ml at gmail.com Tue Oct 30 01:44:03 2012 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Mon, 29 Oct 2012 20:44:03 -0400 Subject: [Python-ideas] Async API In-Reply-To: <508F1A05.3020401@canterbury.ac.nz> References: <71C95BE8-C0C7-45F5-A5E9-45A86EB3265B@gmail.com> <80A26DE4-9B7C-4C27-B8F2-68E25CC2B14A@gmail.com> <72ACCA97-8FF8-4F6B-85B7-7DB9CDCB6DBC@gmail.com> <31A560E1-AF1A-437A-B024-5AF637EF3F35@gmail.com> <5F51531B-68BF-44D0-AF82-BD8A6ED7DC0C@gmail.com> <508F1A05.3020401@canterbury.ac.nz> Message-ID: On 2012-10-29, at 8:06 PM, Greg Ewing wrote: > Yury Selivanov wrote: >> Because scheduler, when it is deciding to interrupt a coroutine or not, should only question whether that particular coroutine is in its finally, and not the one which called it. > > So given this: > > def c1(): > try: > something() > finally: > yield from c2() > very_important_cleanup() > > def c2(): > yield from block() # 1 > > it should be okay to interrupt at point 1, even though > it will prevent very_important_cleanup() from being done? > > That doesn't seem right to me. 
> > -- > Greg > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas From yselivanov.ml at gmail.com Tue Oct 30 01:43:23 2012 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Mon, 29 Oct 2012 20:43:23 -0400 Subject: [Python-ideas] Async API: some code to review In-Reply-To: References: Message-ID: <8C3C7F0F-04A7-4216-8C06-2C4BF165CC4B@gmail.com> Guido, Finally got some time to do a review & read what others posted. Some comments are more general, some are more implementation-specific (hopefully you want to hear latter ones as well) And I'm still in the process of digesting your approach & code (as I've spent too much time with my implementation)... On 2012-10-28, at 7:52 PM, Guido van Rossum wrote: [...] > polling.py: http://code.google.com/p/tulip/source/browse/polling.py [...] 1. I'd make EventLoopMixin a separate entity from pollsters. So that you'd be able to add many different pollsters to one EventLoop. This way you can have specialized pollster for different types of IO, including UI etc. 2. Sometimes, there is a need to run a coroutine in a threadpool. I know it sounds weird, but it's probably worth exploring. 3. In my framework each threadpool worker has its own local context, with various information like what Task run the operation etc. And few small things: 4. epoll.poll and other syscalls need to be wrapped in try..except to catch and ignore (and log?) EINTR type of exceptions. 5. For epoll you probably want to check/(log?) EPOLLHUP and EPOLLERR errors too. > scheduling.py: http://code.google.com/p/tulip/source/browse/scheduling.py [...] > In the docstrings I use the prefix "COROUTINE:" to indicate public > APIs that should be invoked using yield from. [...] As others, I would definitely suggest adding a decorator to make coroutines more distinguishable. It would be even better if we can return a tiny wrapper, that lets you to simply write 'doit.run().with_timeout(2.1)', instead of: task = scheduling.Task(doit(), timeout=2.1) task.start() scheduling.run() And avoid manual Task instantiation at all. I also liked the simplicity of the Task class. I think it'd be easy to mix greenlets in it by switching in a new greenlet on each 'step'. That will give you 'yield_()' function, which you can use in the same way you use 'yield' statement now (I'm not proposing to incorporate greenlets in the lib itself, but rather to provide an option to do so) Hence there should be a way to plug your own Task (sub-)class in. Thank you, Yury From yselivanov.ml at gmail.com Tue Oct 30 02:00:46 2012 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Mon, 29 Oct 2012 21:00:46 -0400 Subject: [Python-ideas] Async API In-Reply-To: <508F1A05.3020401@canterbury.ac.nz> References: <71C95BE8-C0C7-45F5-A5E9-45A86EB3265B@gmail.com> <80A26DE4-9B7C-4C27-B8F2-68E25CC2B14A@gmail.com> <72ACCA97-8FF8-4F6B-85B7-7DB9CDCB6DBC@gmail.com> <31A560E1-AF1A-437A-B024-5AF637EF3F35@gmail.com> <5F51531B-68BF-44D0-AF82-BD8A6ED7DC0C@gmail.com> <508F1A05.3020401@canterbury.ac.nz> Message-ID: <7789C8EE-2002-4BF6-A617-88A8EA0DB646@gmail.com> Oh... I'm sorry for the empty reply. On 2012-10-29, at 8:06 PM, Greg Ewing wrote: > Yury Selivanov wrote: >> Because scheduler, when it is deciding to interrupt a coroutine or not, should only question whether that particular coroutine is in its finally, and not the one which called it. 
> > So given this: > > def c1(): > try: > something() > finally: > yield from c2() > very_important_cleanup() > > def c2(): > yield from block() # 1 > > it should be okay to interrupt at point 1, even though > it will prevent very_important_cleanup() from being done? > > That doesn't seem right to me. But you don't just randomly interrupt coroutines. You interrupt them when you *explicitly stated*, for instance, that this very one coroutine is executed with a timeout. And it's your responsibility to handle a TimeoutError when you call it with such restriction. That's the *main* thing here. Again, when you, explicitly, execute something with a timeout), then that very something shouldn't be interrupted uncontrollably by the scheduler. It's that particular something, whose 'finally' should be protected. So in your example scheduler would never ever has a question of interrupting c2(), because it wasn't called with any restriction/timeout. There simply no reason to interrupt it ever. But if you want to make c2() interruptible, you would write: def c1(): try: something() finally: yield from with_timeout(2.0, c2()) very_important_cleanup() And that way, c2() actually may be (and at some point will be) interrupted by scheduler. And it's your responsibility to catch TimeoutError. So you would write your code in the following way to protect c1's finally statement: def c1(): try: something() finally: try: yield from with_timeout(2.0, c2()) except TimeoutError: ... very_important_cleanup() Now, the problem is that when you call c2() with a timeout, scheduler should not interrupt c2's finally statement (if there is any). And it has nothing to do with c1 entirely. So if c2() code is like the following: def c2(): try: something() finally: yield from someotherthing() important_cleanup() Then you need scheduler to know if it is in its finally or not. Because its c2() which was run with a timeout. It's c2() code that may be subject to aborting. And it doesn't matter from where c2() was called, the only thing that matters, is that if it was called with a timeout, its finally block should be protected from interrupting. That's all. - Yury From guido at python.org Tue Oct 30 02:02:43 2012 From: guido at python.org (Guido van Rossum) Date: Mon, 29 Oct 2012 18:02:43 -0700 Subject: [Python-ideas] non-blocking buffered I/O In-Reply-To: References: <20121029170731.74bd3d37@cosmocat> <20121029222541.07c461b3@pitrou.net> Message-ID: On Mon, Oct 29, 2012 at 5:01 PM, Richard Oudkerk wrote: > On 29/10/2012 11:29pm, Guido van Rossum wrote: >> >> I wonder if this could be done by varying the transports by platform? >> Not too many people are going to write new transports -- there just >> aren't that many options. And those that do might be doing something >> platform-specific anyway. It shouldn't be that hard to come up with a >> transport abstraction that lets protocol implementations work >> regardless of whether it's a UNIX style transport or a Windows style >> transport. UNIX systems with IOCP support could use those too. > > > Yes, having separate implementations of the transport layer should work. > > But I think it would be cleaner to put all the platform specific stuff in > the pollster, and make the pollster poll-for-completion rather than > poll-for-readiness. (Is this the "proactor pattern"?) That seems to be the > direction libevent has moved in. Interesting. I'd like to hear what Twisted thinks of this. (I will find out next week. 
:-) -- --Guido van Rossum (python.org/~guido) From Steve.Dower at microsoft.com Tue Oct 30 02:40:53 2012 From: Steve.Dower at microsoft.com (Steve Dower) Date: Tue, 30 Oct 2012 01:40:53 +0000 Subject: [Python-ideas] Async API: some more code to review In-Reply-To: References: Message-ID: To save people scrolling to get to the interesting parts, I'll lead with the links: Detailed write-up: https://bitbucket.org/stevedower/tulip/wiki/Proposal Source code: https://bitbucket.org/stevedower/tulip/src (Yes, I renamed my repo after the code name was selected. That would have been far too much of a coincidence.) Practically all of the details are in the write-up linked first, so anything that's not is either something I didn't think of or something I decided is unimportant right now (for example, the optimal way to wait for ten thousand sockets simultaneously on every different platform). There's a reimplemented Future class in the code which is not essential, but it is drastically simplified from concurrent.futures.Future (CFF). It can't be directly replaced by CFF, but only because CFF requires more state management that the rest of the implementation does not perform ("set_running_or_notify_cancel"). CFF also includes cancellation, for which I've proposed a different mechanism. For the sake of a quick example, I've modified Guido's main.doit function (http://code.google.com/p/tulip/source/browse/main.py) to how it could be written with my proposal (apologies if I've butchered it, but I think it should behave the same): @async def doit(): TIMEOUT = 2 cs = CancellationSource() cs.cancel_after(TIMEOUT) tasks = set() task1 = urlfetch('localhost', 8080, path='/', cancel_source=cs) tasks.add(task1) task2 = urlfetch('127.0.0.1', 8080, path='/home', cancel_source=cs) tasks.add(task2) task3 = urlfetch('python.org', 80, path='/', cancel_source=cs) tasks.add(task3) task4 = urlfetch('xkcd.com', ssl=True, path='/', af=socket.AF_INET, cancel_source=cs) tasks.add(task4) ## for t in tasks: t.start() # tasks start as soon as they are called - this function does not exist yield delay(0.2) # I believe this is equivalent to scheduling.with_timeout(0.2, ...)? winners = [t.result() for t in tasks if t.done()] print('And the winners are:', [w for w in winners]) results = [] # This 'wait all' loop could easily be a helper function for t in tasks: # Unfortunately, [(yield t) for t in tasks] does not work :( results.append((yield t)) print('And the players were:', [r for r in results]) return results This is untested code, and has a few differences. I don't have task names, so it will print the returned value from urlfetch (a tuple of (host, port, path, status, len(data), time_taken)). The cancellation approach is quite different, but IMO far more likely to avoid the finally-related issues discussed in other threads. However, I want to emphasise that unless you are already familiar with this exact style, it is near impossible to guess exactly what is going on from this little sample. Please read the write-up before assuming what is or is not possible with this approach. Cheers, Steve -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From yselivanov.ml at gmail.com Tue Oct 30 02:59:56 2012 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Mon, 29 Oct 2012 21:59:56 -0400 Subject: [Python-ideas] Async API In-Reply-To: References: <71C95BE8-C0C7-45F5-A5E9-45A86EB3265B@gmail.com> <50873551.5040207@canterbury.ac.nz> <50885DDC.4050108@canterbury.ac.nz> <80A26DE4-9B7C-4C27-B8F2-68E25CC2B14A@gmail.com> <72ACCA97-8FF8-4F6B-85B7-7DB9CDCB6DBC@gmail.com> <508C6C7F.5050408@canterbury.ac.nz> Message-ID: Guido, Greg, On 2012-10-27, at 7:45 PM, Yury Selivanov wrote: > Right. But now I'm not sure this approach will work with yield-froms. > As when you yield-fromming scheduler knows nothing about the chain of > generators, as it's all hidden in the yield-from implementation. I think I've come up with a solution that should work for yield-froms too (if we accept my in_finally idea in 3.4). And there should be a way of writing a 'protect_finally' context manager too. I'll illustrate the approach on Guido's tulip micro-framework (consider it a pseudo code to illustrate the idea): class Interrupt(BaseException): """Should penetrate all try..excepts""" def call_with_timeout(timeout, gen): context.current_task._add_timeout(timeout, gen) try: return (yield from gen) except Interrupt: raise TimeoutError() from None class Task: def _add_timeout(timeout, gen): self.eventloop.call_later( timeout, partial(self._interrupt, gen)) def _interrupt(self, gen): if not gen.in_finally: gen.throw(Interrupt, Interrupt(), None) else: # So we set a flag to watch for gen's in_finally value # on each 'step' call. And when it's 0 - Task.step # will call '_interrupt' again. self._watch_finally(gen) I defined a new function 'call_with_timeout', because tulip's 'with_timeout' starts a new Task, whereas the former works in any generator inside the task. So, after that you'd be able to do the following: yield from call_with_timeout(1.0, something()) And something's 'finally' won't ever be aborted. - Yury From Steve.Dower at microsoft.com Tue Oct 30 03:06:37 2012 From: Steve.Dower at microsoft.com (Steve Dower) Date: Tue, 30 Oct 2012 02:06:37 +0000 Subject: [Python-ideas] Async API: some more code to review In-Reply-To: References: , Message-ID: Possibly I should have selected a different code name, now I come to think of it, but we came up with such similar code that I don't think it'll stay separate for too long. ________________________________ From: Python-ideas [python-ideas-bounces+steve.dower=microsoft.com at python.org] on behalf of Steve Dower [Steve.Dower at microsoft.com] Sent: Monday, October 29, 2012 6:40 PM To: python-ideas at python.org Subject: [Python-ideas] Async API: some more code to review To save people scrolling to get to the interesting parts, I'll lead with the links: Detailed write-up: https://bitbucket.org/stevedower/tulip/wiki/Proposal Source code: https://bitbucket.org/stevedower/tulip/src (Yes, I renamed my repo after the code name was selected. That would have been far too much of a coincidence.) Practically all of the details are in the write-up linked first, so anything that's not is either something I didn't think of or something I decided is unimportant right now (for example, the optimal way to wait for ten thousand sockets simultaneously on every different platform). There's a reimplemented Future class in the code which is not essential, but it is drastically simplified from concurrent.futures.Future (CFF). 
It can't be directly replaced by CFF, but only because CFF requires more state management that the rest of the implementation does not perform ("set_running_or_notify_cancel"). CFF also includes cancellation, for which I've proposed a different mechanism. For the sake of a quick example, I've modified Guido's main.doit function (http://code.google.com/p/tulip/source/browse/main.py) to how it could be written with my proposal (apologies if I've butchered it, but I think it should behave the same): @async def doit(): TIMEOUT = 2 cs = CancellationSource() cs.cancel_after(TIMEOUT) tasks = set() task1 = urlfetch('localhost', 8080, path='/', cancel_source=cs) tasks.add(task1) task2 = urlfetch('127.0.0.1', 8080, path='/home', cancel_source=cs) tasks.add(task2) task3 = urlfetch('python.org', 80, path='/', cancel_source=cs) tasks.add(task3) task4 = urlfetch('xkcd.com', ssl=True, path='/', af=socket.AF_INET, cancel_source=cs) tasks.add(task4) ## for t in tasks: t.start() # tasks start as soon as they are called - this function does not exist yield delay(0.2) # I believe this is equivalent to scheduling.with_timeout(0.2, ...)? winners = [t.result() for t in tasks if t.done()] print('And the winners are:', [w for w in winners]) results = [] # This 'wait all' loop could easily be a helper function for t in tasks: # Unfortunately, [(yield t) for t in tasks] does not work :( results.append((yield t)) print('And the players were:', [r for r in results]) return results This is untested code, and has a few differences. I don't have task names, so it will print the returned value from urlfetch (a tuple of (host, port, path, status, len(data), time_taken)). The cancellation approach is quite different, but IMO far more likely to avoid the finally-related issues discussed in other threads. However, I want to emphasise that unless you are already familiar with this exact style, it is near impossible to guess exactly what is going on from this little sample. Please read the write-up before assuming what is or is not possible with this approach. Cheers, Steve -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Tue Oct 30 03:07:21 2012 From: guido at python.org (Guido van Rossum) Date: Mon, 29 Oct 2012 19:07:21 -0700 Subject: [Python-ideas] Async API: some code to review In-Reply-To: <8C3C7F0F-04A7-4216-8C06-2C4BF165CC4B@gmail.com> References: <8C3C7F0F-04A7-4216-8C06-2C4BF165CC4B@gmail.com> Message-ID: On Mon, Oct 29, 2012 at 5:43 PM, Yury Selivanov wrote: > Finally got some time to do a review & read what others posted. Great! > Some comments are more general, some are more implementation-specific > (hopefully you want to hear latter ones as well) Yes! > And I'm still in the process of digesting your approach & code (as > I've spent too much time with my implementation)... Heh. :-) > On 2012-10-28, at 7:52 PM, Guido van Rossum wrote: > [...] >> polling.py: http://code.google.com/p/tulip/source/browse/polling.py > [...] > > 1. I'd make EventLoopMixin a separate entity from pollsters. So that you'd > be able to add many different pollsters to one EventLoop. This way > you can have specialized pollster for different types of IO, including > UI etc. I came to the same conclusion, so I fixed this. See the latest version. (BTW, I also renamed add_reader() etc. on the Pollster class to register_reader() etc. -- I dislike similar APIs on different classes to have the same name if there's not a strict super class override involved.) > 2. 
Sometimes, there is a need to run a coroutine in a threadpool. I know it > sounds weird, but it's probably worth exploring. I think that can be done quite simply. Since each thread has its own eventloop (via the magic of TLS), it's as simple as writing a function that creates a task, starts it, and then runs the eventloop. There's nothing else running in that particular thread, and its eventloop will terminate when there's nothing left to do there -- i.e. when the task is done. Sketch: def some_generator(arg): ...stuff using yield from... return 42 def run_it_in_the_threadpool(arg): t = Task(some_generator(arg)) t.start() scheduling.run() return t.result # And in your code: result = yield from scheduling.call_in_thread(run_it_in_the_threadpool, arg) # Now result == 42. > 3. In my framework each threadpool worker has its own local context, with > various information like what Task run the operation etc. I think I have this too -- Thread-Local Storage! > And few small things: > > 4. epoll.poll and other syscalls need to be wrapped in try..except to catch > and ignore (and log?) EINTR type of exceptions. Good point. > 5. For epoll you probably want to check/(log?) EPOLLHUP and EPOLLERR errors > too. Do you have a code sample? I haven't found a need yet. >> scheduling.py: http://code.google.com/p/tulip/source/browse/scheduling.py > [...] > >> In the docstrings I use the prefix "COROUTINE:" to indicate public >> APIs that should be invoked using yield from. > [...] > > As others, I would definitely suggest adding a decorator to make > coroutines more distinguishable. That's definitely on my TODO list. > It would be even better if we can return > a tiny wrapper, that lets you to simply write 'doit.run().with_timeout(2.1)', > instead of: > > task = scheduling.Task(doit(), timeout=2.1) > task.start() > scheduling.run() The run() call shouldn't be necessary unless you are at the toplevel. > And avoid manual Task instantiation at all. Hm. I want the generator function to return just a generator object, and I can't add methods to that. But we can come up with a decent API. > I also liked the simplicity of the Task class. I think it'd be easy > to mix greenlets in it by switching in a new greenlet on each 'step'. > That will give you 'yield_()' function, which you can use in the same > way you use 'yield' statement now (I'm not proposing to incorporate > greenlets in the lib itself, but rather to provide an option to do so) > Hence there should be a way to plug your own Task (sub-)class in. Hm. Someone else will have to give that a try. Thanks for your feedback!! -- --Guido van Rossum (python.org/~guido) From yselivanov.ml at gmail.com Tue Oct 30 03:18:25 2012 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Mon, 29 Oct 2012 22:18:25 -0400 Subject: [Python-ideas] Async API: some code to review In-Reply-To: References: <8C3C7F0F-04A7-4216-8C06-2C4BF165CC4B@gmail.com> Message-ID: On 2012-10-29, at 10:07 PM, Guido van Rossum wrote: [...] >> 5. For epoll you probably want to check/(log?) EPOLLHUP and EPOLLERR errors >> too. > > Do you have a code sample? I haven't found a need yet. Just a code dump from my epoll proactor: if ev & EPOLLHUP: sock.close(_error_cls=ConnectionResetError) self._unschedule(fd) continue if ev & EPOLLERR: sock.close(_error_cls=ConnectionError, _error_msg='socket error in epoll proactor') self._unschedule(fd) continue [...] 
>> It would be even better if we can return >> a tiny wrapper, that lets you to simply write 'doit.run().with_timeout(2.1)', >> instead of: >> >> task = scheduling.Task(doit(), timeout=2.1) >> task.start() >> scheduling.run() > > The run() call shouldn't be necessary unless you are at the toplevel. Yes, that's just a sugar to make top-level runs more appealing. You'll also get a nice way of setting timeouts, yield from coro().with_timeout(1.0) [...] >> I also liked the simplicity of the Task class. I think it'd be easy >> to mix greenlets in it by switching in a new greenlet on each 'step'. >> That will give you 'yield_()' function, which you can use in the same >> way you use 'yield' statement now (I'm not proposing to incorporate >> greenlets in the lib itself, but rather to provide an option to do so) >> Hence there should be a way to plug your own Task (sub-)class in. > > Hm. Someone else will have to give that a try. I'll be that someone once we choose the direction ;) IMO the greenlets integration is a very important topic. - Yury From yselivanov.ml at gmail.com Tue Oct 30 03:29:59 2012 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Mon, 29 Oct 2012 22:29:59 -0400 Subject: [Python-ideas] Async API: some more code to review In-Reply-To: References: Message-ID: On 2012-10-29, at 9:40 PM, Steve Dower wrote: > To save people scrolling to get to the interesting parts, I'll lead with the links: > > Detailed write-up: https://bitbucket.org/stevedower/tulip/wiki/Proposal > > Source code: https://bitbucket.org/stevedower/tulip/src Your design looks very similar to the framework I developed. I'll try to review your code in detail tomorrow. Couple of things I like already: 1) Use of 'yield from' is completely optional 2) @async decorator. That makes coroutines more visible and allows to add extra methods to them. 3) Tight control over coroutines execution, something that is completely missing when you use yield-from. I dislike the choice of name for 'async', though. Since @async-decorated functions are going to be yielded most of the time (yield makes them "sync" in that context), I'd stick to plain @coroutine. P.S. If this approach is viable (optional yield-from, required @async-or-something decorator), I can invest some time and open source the core of my framework (one benefit is that it has lots and lots of unit-tests). - Yury From yselivanov.ml at gmail.com Tue Oct 30 05:08:25 2012 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Tue, 30 Oct 2012 00:08:25 -0400 Subject: [Python-ideas] Async API In-Reply-To: References: <71C95BE8-C0C7-45F5-A5E9-45A86EB3265B@gmail.com> <50873551.5040207@canterbury.ac.nz> <50885DDC.4050108@canterbury.ac.nz> <80A26DE4-9B7C-4C27-B8F2-68E25CC2B14A@gmail.com> <72ACCA97-8FF8-4F6B-85B7-7DB9CDCB6DBC@gmail.com> <508C6C7F.5050408@canterbury.ac.nz> Message-ID: On 2012-10-29, at 9:59 PM, Yury Selivanov wrote: > Guido, Greg, > > On 2012-10-27, at 7:45 PM, Yury Selivanov wrote: > >> Right. But now I'm not sure this approach will work with yield-froms. >> As when you yield-fromming scheduler knows nothing about the chain of >> generators, as it's all hidden in the yield-from implementation. > > I think I've come up with a solution that should work for yield-froms too > (if we accept my in_finally idea in 3.4). And there should be a way > of writing a 'protect_finally' context manager too. 
> > I'll illustrate the approach on Guido's tulip micro-framework > (consider it a pseudo code to illustrate the idea): > > class Interrupt(BaseException): > """Should penetrate all try..excepts""" > > def call_with_timeout(timeout, gen): > context.current_task._add_timeout(timeout, gen) > try: > return (yield from gen) > except Interrupt: > raise TimeoutError() from None > > class Task: > def _add_timeout(timeout, gen): > self.eventloop.call_later( > timeout, > partial(self._interrupt, gen)) > > def _interrupt(self, gen): > if not gen.in_finally: > gen.throw(Interrupt, Interrupt(), None) > else: > # So we set a flag to watch for gen's in_finally value > # on each 'step' call. And when it's 0 - Task.step > # will call '_interrupt' again. > self._watch_finally(gen) > > I defined a new function 'call_with_timeout', because tulip's 'with_timeout' > starts a new Task, whereas the former works in any generator inside the task. > > So, after that you'd be able to do the following: > > yield from call_with_timeout(1.0, something()) > > And something's 'finally' won't ever be aborted. Ah, the solution is wrong, I've tricked myself. The right code would be something like that: class Interrupt(BaseException): """Should penetrate all try..excepts""" def call_with_timeout(timeout, gen): context.current_task._add_timeout(timeout, gen) try: return (yield from gen) except Interrupt: raise TimeoutError() from None class Task: def _add_timeout(timeout, gen): # XXX The following line is the key. We need a reference # to the generator object that is yield-fromming our 'gen' # ('caller' for 'gen') current_yield_from = self.gen.yield_from self.eventloop.call_later( timeout, partial(self._interrupt, gen, current_yield_from)) def _interrupt(self, gen, yf): if not yf.in_finally: # If gen's caller is not in it's finally block - it's # safe for us to interrupt gen. gen.throw(Interrupt, Interrupt(), None) else: # So we set a flag to watch for yf's in_finally value # on each 'step' call. And when it's 0 - Task.step # will call '_interrupt' again. self._watch_finally(yf, gen) IOW, besides just 'in_finally', we also need to add 'yield_from' property to generator object. The latter will hold a reference to the sub-generator that current generator is yielding from. The logic is pretty twisted, but i'm sure that the problem is solvable. P.S. I'm not proposing to add anything. It's more about finding *any* way to actually solve the problem correctly. Once we find that way, we *maybe* start thinking about language support of it. - Yury From guido at python.org Tue Oct 30 05:25:21 2012 From: guido at python.org (Guido van Rossum) Date: Mon, 29 Oct 2012 21:25:21 -0700 Subject: [Python-ideas] Async API: some more code to review In-Reply-To: References: Message-ID: On Monday, October 29, 2012, Steve Dower wrote: > Possibly I should have selected a different code name, now I come to > think of it, but we came up with such similar code that I don't think it'll > stay separate for too long. > Hm, yes, this felt weird. I figured the code names would be useful to reference the proposals when comparing them, not as the ultimate eventual project name once it's beeb PEP-ified and put in the stdlib. Maybe you can call yours "wattle"? That's a Pythonic plant name. :-) (Sorry, still reading through your docs and code, it's too early for more substantial fedback.) --Guido -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From Steve.Dower at microsoft.com Tue Oct 30 05:38:28 2012 From: Steve.Dower at microsoft.com (Steve Dower) Date: Tue, 30 Oct 2012 04:38:28 +0000 Subject: [Python-ideas] Async API: some more code to review In-Reply-To: References: , Message-ID: Guido van Rossum wrote: > On Monday, October 29, 2012, Steve Dower wrote: > >> Possibly I should have selected a different code name, >> now I come to think of it, but we came up with such >> similar code that I don't think it'll stay separate for too long. > > Hm, yes, this felt weird. I figured the code names would be > useful to reference the proposals when comparing them, not > as the ultimate eventual project name once it's beeb PEP-ified > and put in the stdlib. > > Maybe you can call yours "wattle"? That's a Pythonic plant name. :-) Nice idea. I renamed it and (hopefully) made it so the original links still work. https://bitbucket.org/stevedower/wattle/src https://bitbucket.org/stevedower/wattle/wiki/Proposal I was never expecting the name to last, I just figured you had to make something up to create a project. Eventually it will all just become a boring PEP-xxx number... Cheers, Steve From greg.ewing at canterbury.ac.nz Tue Oct 30 06:10:28 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 30 Oct 2012 18:10:28 +1300 Subject: [Python-ideas] non-blocking buffered I/O In-Reply-To: References: <20121029170731.74bd3d37@cosmocat> <20121029222541.07c461b3@pitrou.net> Message-ID: <508F6144.2040105@canterbury.ac.nz> Steve Dower wrote: > From my point of view, IOCP fits in very well provided the callbacks (which will > run in the IOCP thread pool) are only used to unblock tasks. Is it really necessary to have a separate thread just to handle unblocking tasks? That thread will have very little to do, so it could just as well run the tasks too, couldn't it? -- Greg From greg.ewing at canterbury.ac.nz Tue Oct 30 06:20:13 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 30 Oct 2012 18:20:13 +1300 Subject: [Python-ideas] Async API: some code to review In-Reply-To: References: <508F0E46.7040706@canterbury.ac.nz> Message-ID: <508F638D.8030303@canterbury.ac.nz> Steve Dower wrote: >>I believe it will be possible to provide a scheduler in the stdlib that will be satisfactory >>for the vast majority of applications. > > I agree, and I chose my words poorly for that point: "library/framework > developers" is more accurate than "end user". I don't think that even library developers should need to write their own scheduler very often. > And since I expect every GUI > framework is going to need (or at least want) their own scheduler, I don't agree with that. They might need their own event loop, but I haven't seen any reason so far to think they would need their own coroutine scheduler. Remember that Guido wants to keep the event loop stuff and the scheduler stuff very clearly separated. The scheduler will all be pure Python and should be usable with just about any event loop. 
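To make that separation concrete, here is a minimal sketch (the class and
method names below are made up purely for illustration -- this is not tulip's
or wattle's API): the scheduler is plain Python and only asks the hosting
loop for a call_soon() hook, so the same scheduler could sit on top of a
select()-based loop, a GUI loop, or anything else.

import collections

class Scheduler:
    # Pure-Python coroutine scheduler; it knows nothing about I/O.
    # The only thing it needs from the event loop is call_soon(callback).
    def __init__(self, event_loop):
        self.event_loop = event_loop

    def spawn(self, gen):
        self.event_loop.call_soon(lambda: self._step(gen))

    def _step(self, gen):
        try:
            next(gen)              # run the task until its next bare yield
        except StopIteration:
            return                 # task finished
        # A real scheduler would park the task on an fd or a future here;
        # for the sketch we just put it straight back on the loop.
        self.event_loop.call_soon(lambda: self._step(gen))

class ToyEventLoop:
    # Stand-in for whatever loop the framework already has.
    def __init__(self):
        self._ready = collections.deque()
    def call_soon(self, callback):
        self._ready.append(callback)
    def run(self):
        while self._ready:
            self._ready.popleft()()

def ticker(name, n):
    for i in range(n):
        print(name, i)
        yield                      # give the other tasks a turn

loop = ToyEventLoop()
sched = Scheduler(loop)
sched.spawn(ticker('a', 3))
sched.spawn(ticker('b', 3))
loop.run()                         # prints a 0, b 0, a 1, b 1, a 2, b 2

The point is just that nothing in Scheduler has to change when the loop does.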
-- Greg From greg.ewing at canterbury.ac.nz Tue Oct 30 06:27:34 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 30 Oct 2012 18:27:34 +1300 Subject: [Python-ideas] non-blocking buffered I/O In-Reply-To: References: <20121029170731.74bd3d37@cosmocat> <20121029222541.07c461b3@pitrou.net> Message-ID: <508F6546.2010301@canterbury.ac.nz> Steve Dower wrote: > For example, (library) code that needs > a socket to be ready can ask the current scheduler if it can do "select([sock], > [], [])", I think you're mixing up the scheduler and event loop layers here. If the scheduler is involved in this at all, it would only be to pass the request on to the event loop. -- Greg From greg.ewing at canterbury.ac.nz Tue Oct 30 06:36:10 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 30 Oct 2012 18:36:10 +1300 Subject: [Python-ideas] Async API: some code to review In-Reply-To: <8C3C7F0F-04A7-4216-8C06-2C4BF165CC4B@gmail.com> References: <8C3C7F0F-04A7-4216-8C06-2C4BF165CC4B@gmail.com> Message-ID: <508F674A.9050701@canterbury.ac.nz> Yury Selivanov wrote: > It would be even better if we can return > a tiny wrapper, that lets you to simply write 'doit.run().with_timeout(2.1)', > instead of: > > task = scheduling.Task(doit(), timeout=2.1) > task.start() > scheduling.run() I would prefer spelling this something like scheduling.spawn(doit(), timeout=2.1) A newly spawned task should be scheduled automatically; if you're not ready for it to run yet, then don't spawn it until you are. Also, it should almost *never* be necessary to call scheduling.run(). That should happen only in a very few places, mostly buried deep inside the scheduling/event loop system. -- Greg From greg.ewing at canterbury.ac.nz Tue Oct 30 06:53:09 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 30 Oct 2012 18:53:09 +1300 Subject: [Python-ideas] Async API In-Reply-To: <7789C8EE-2002-4BF6-A617-88A8EA0DB646@gmail.com> References: <71C95BE8-C0C7-45F5-A5E9-45A86EB3265B@gmail.com> <72ACCA97-8FF8-4F6B-85B7-7DB9CDCB6DBC@gmail.com> <31A560E1-AF1A-437A-B024-5AF637EF3F35@gmail.com> <5F51531B-68BF-44D0-AF82-BD8A6ED7DC0C@gmail.com> <7789C8EE-2002-4BF6-A617-88A8EA0DB646@gmail.com> Message-ID: <508F6B45.7010300@canterbury.ac.nz> Yury Selivanov wrote: > So in your example scheduler would never ever has a question of > interrupting c2(), because it wasn't called with any restriction/timeout. > There simply no reason to interrupt it ever. But there's nothing to stop someone writing def c3(): try: yield from with_timeout(10.0, c1()) except TimeoutError: print("That's cool, I can cope with that") Also, it's not just TimeoutErrors that are a potential problem, it's any asynchronous exception. For example, the task calling c1() might get cancelled by another task while c2() is blocked. If cancelling is implemented by throwing in an exception, you have the same problem. > Then you need scheduler to know if it is in its finally or not. Because its > c2() which was run with a timeout. It's c2() code that may be subject to > aborting. I'm really not following your reasoning here. You seem to be speaking as if with_timeout() calls only have an effect one level deep. But that's not the case -- the frame that a TimeoutError gets thrown into by with_timeout() can be nested any number of yield-from calls deep. 
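Here is a plain-generator demonstration of that, with no scheduler involved
at all; g.throw() below plays the role of whatever a with_timeout() helper
would do, and the builtin TimeoutError just stands in for the framework's
exception:

def inner():
    try:
        yield                      # suspended here when the "timeout" fires
    except TimeoutError:
        print('raised at the innermost yield')
        raise

def middle():
    yield from inner()

def outer():
    try:
        yield from middle()
    finally:
        print('outer finally runs while the exception unwinds')

g = outer()
next(g)                            # run down to inner()'s yield
try:
    g.throw(TimeoutError())        # thrown at the outermost frame...
except TimeoutError:
    print('...but delivered two yield-from levels down, then propagated out')

The throw is forwarded by each yield from to the generator that is actually
suspended, however deep that is -- so if that innermost yield happens to sit
inside somebody's finally block, it is that cleanup which gets aborted, no
matter which frame the timeout was attached to.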
-- Greg From _ at lvh.cc Tue Oct 30 11:12:17 2012 From: _ at lvh.cc (Laurens Van Houtven) Date: Tue, 30 Oct 2012 11:12:17 +0100 Subject: [Python-ideas] Async API: some code to review In-Reply-To: References: Message-ID: Hi, I've been following the PEP380-related threads and I've reviewed this stuff, while trying to do the protocols/transports PEP, and trying to glue the two together. The biggest difference I can see is that protocols as they've been discussed are "pull": they get called when some data arrives. They don't know how much data there is; they just get told "here's some data". The obvious difference with the API in, eg: https://code.google.com/p/tulip/source/browse/sockets.py#56 ... is that now I have to tell a socket to read n bytes, which "blocks" the coroutine, then I get some data. Now, there doesn't have to be an issue; you could simply say: data = yield from s.recv(4096) # that's the magic number usually right proto.data_received(4096) It seems a bit boilerplatey, but I suppose that eventually could be hidden away. But this style is pervasive, for example that's how reading by lines works: https://code.google.com/p/tulip/source/browse/echosvr.py#20 While I'm not a big fan (I may be convinced if I see a protocol test that looks nice); I'm just wondering if there's any point in trying to write the pull-style protocols when this works quite differently. Additionally, I'm not sure if readline belongs on the socket. I understand the simile with files, though. With the coroutine style I could see how the most obvious fit would be something like tornado's read_until, or an as_lines that essentially calls read_until repeatedly. Can the delimiter for this be modified? My main syntactic gripe is that when I write @inlineCallbacks code or monocle code or whatever, when I say "yield" I'm yielding to the reactor. That makes sense to me (I realize natural language arguments don't always make sense in a programming language context). "yield from" less so (but okay, that's what it has to look like). But this just seems weird to me: yield from trans.send(line.upper()) Not only do I not understand why I'm yielding there in the first place (I don't have to wait for anything, I just want to push some data out!), it feels like all of my yields have been replaced with yield froms for no obvious reason (well, there are reasons, I'm just trying to look at this naively). I guess Twisted gets away with this because of deferred chaining: that one deferred might have tons of callbacks in the background, many of which also doing IO operations, resulting in a sequence of asynchronous operations that only at the end cause the generator to be run some more. I guess that belongs in a different thread, though. Even, then, I'm not sure if I'm uncomfortable because I'm seeing something different from what I'm used to, or if my argument from English actually makes any sense whatsoever. Speaking of protocol tests, what would those look like? How do I yell, say, "POST /blah HTTP/1.1\r\n" from a transport? Presumably I'd have a mock transport, and call the handler with that? (I realize it's early days to be thinking that far ahead; I'm just trying to figure out how I can contribute a good protocol definition to all of this). cheers lvh -------------- next part -------------- An HTML attachment was scrubbed... 
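(Sketching one possible answer to the mock-transport question above -- none
of this is an agreed API; data_received is the method name used in the
message, while connection_made and the other names are invented for the
example:)

class MockTransport:
    # Records what the protocol writes instead of touching a socket.
    def __init__(self):
        self.written = []
        self.closed = False
    def write(self, data):
        self.written.append(data)
    def close(self):
        self.closed = True

class UpperEchoProtocol:
    # Push-style protocol: it is told about data, it never asks for it.
    def connection_made(self, transport):
        self.transport = transport
    def data_received(self, data):
        self.transport.write(data.upper())

def test_upper_echo():
    transport = MockTransport()
    proto = UpperEchoProtocol()
    proto.connection_made(transport)
    proto.data_received(b'post /blah http/1.1\r\n')
    assert transport.written == [b'POST /BLAH HTTP/1.1\r\n']

test_upper_echo()

No event loop or scheduler is needed for a test like this, which is one of
the attractions of the push-style protocol interface.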
URL: From solipsis at pitrou.net Tue Oct 30 12:36:41 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 30 Oct 2012 12:36:41 +0100 Subject: [Python-ideas] non-blocking buffered I/O References: <20121029170731.74bd3d37@cosmocat> <20121029222541.07c461b3@pitrou.net> <508F6144.2040105@canterbury.ac.nz> Message-ID: <20121030123641.23224db2@cosmocat> Le Tue, 30 Oct 2012 18:10:28 +1300, Greg Ewing a ?crit : > Steve Dower wrote: > > > From my point of view, IOCP fits in very well provided the > > callbacks (which will run in the IOCP thread pool) are only used to > > unblock tasks. > > Is it really necessary to have a separate thread just to handle > unblocking tasks? That thread will have very little to do, so > it could just as well run the tasks too, couldn't it? The IOCP thread pool is managed by Windows, not you. Regards Antoine. From jkbbwr at gmail.com Tue Oct 30 14:10:53 2012 From: jkbbwr at gmail.com (Jakob Bowyer) Date: Tue, 30 Oct 2012 13:10:53 +0000 Subject: [Python-ideas] Async API: some code to review In-Reply-To: References: Message-ID: Sorry to chime in, but would this be a case where there could be the syntax `yield to` ? On Tue, Oct 30, 2012 at 10:12 AM, Laurens Van Houtven <_ at lvh.cc> wrote: > Hi, > > I've been following the PEP380-related threads and I've reviewed this stuff, > while trying to do the protocols/transports PEP, and trying to glue the two > together. > > The biggest difference I can see is that protocols as they've been discussed > are "pull": they get called when some data arrives. They don't know how much > data there is; they just get told "here's some data". The obvious difference > with the API in, eg: > > https://code.google.com/p/tulip/source/browse/sockets.py#56 > > ... is that now I have to tell a socket to read n bytes, which "blocks" the > coroutine, then I get some data. > > Now, there doesn't have to be an issue; you could simply say: > > data = yield from s.recv(4096) # that's the magic number usually right > proto.data_received(4096) > > It seems a bit boilerplatey, but I suppose that eventually could be hidden > away. > > But this style is pervasive, for example that's how reading by lines works: > > https://code.google.com/p/tulip/source/browse/echosvr.py#20 > > While I'm not a big fan (I may be convinced if I see a protocol test that > looks nice); I'm just wondering if there's any point in trying to write the > pull-style protocols when this works quite differently. > > Additionally, I'm not sure if readline belongs on the socket. I understand > the simile with files, though. With the coroutine style I could see how the > most obvious fit would be something like tornado's read_until, or an > as_lines that essentially calls read_until repeatedly. Can the delimiter for > this be modified? > > My main syntactic gripe is that when I write @inlineCallbacks code or > monocle code or whatever, when I say "yield" I'm yielding to the reactor. > That makes sense to me (I realize natural language arguments don't always > make sense in a programming language context). "yield from" less so (but > okay, that's what it has to look like). But this just seems weird to me: > > yield from trans.send(line.upper()) > > > Not only do I not understand why I'm yielding there in the first place (I > don't have to wait for anything, I just want to push some data out!), it > feels like all of my yields have been replaced with yield froms for no > obvious reason (well, there are reasons, I'm just trying to look at this > naively). 
> > I guess Twisted gets away with this because of deferred chaining: that one > deferred might have tons of callbacks in the background, many of which also > doing IO operations, resulting in a sequence of asynchronous operations that > only at the end cause the generator to be run some more. > > I guess that belongs in a different thread, though. Even, then, I'm not sure > if I'm uncomfortable because I'm seeing something different from what I'm > used to, or if my argument from English actually makes any sense whatsoever. > > Speaking of protocol tests, what would those look like? How do I yell, say, > "POST /blah HTTP/1.1\r\n" from a transport? Presumably I'd have a mock > transport, and call the handler with that? (I realize it's early days to be > thinking that far ahead; I'm just trying to figure out how I can contribute > a good protocol definition to all of this). > > cheers > lvh > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > From guido at python.org Tue Oct 30 15:02:54 2012 From: guido at python.org (Guido van Rossum) Date: Tue, 30 Oct 2012 07:02:54 -0700 Subject: [Python-ideas] Async API: some more code to review In-Reply-To: References: Message-ID: Steve, I don't want to beat around the bush, I think your approach is too slow. In may situations I would be guilty of premature optimization saying this, but (a) the whole *point* of async I/O is to be blindingly fast (the C10K problem), and (b) the time difference is rather marked. I wrote a simple program for each version (attached) that times a simple double-recursive function, where each recursive level uses yield. With a depth of 20, wattle takes about 24 seconds on my MacBook Pro. And the same problem in tulip takes 0.7 seconds! That's close to two orders of magnitude. Now, this demo is obviously geared towards showing the pure overhead of the "one future per level" approach compared to "pure yield from". But that's what you're proposing. And I think allowing the user to mix yield and yield from is just too risky. (I got rid of block_r/w() + bare yield as a public API from tulip -- that API is now wrapped up in a generator too. And I can do that without feeling guilty knowing that an extra level of generators costs me almost nothing. Debugging experience: I made the same mistake in each program (I guess I copied it over before fixing the bug :-), which caused an AttributeError to happen at the time.time() call. In both frameworks this was baffling, because it caused the program to exit immediately without any output. So on this count we're even. :-) I have to think more about what I'd like to borrow from wattle -- I agree that it's nice to mark up async functions with a decorator (it just shouldn't affect call speed), I like being able to start a task with a single call. Probably more, but my family is calling me to get out of bed. :-) -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- A non-text attachment was scrubbed... Name: tulip_bench.py Type: application/octet-stream Size: 624 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: wattle_bench.py Type: application/octet-stream Size: 572 bytes Desc: not available URL: From guido at python.org Tue Oct 30 15:52:51 2012 From: guido at python.org (Guido van Rossum) Date: Tue, 30 Oct 2012 07:52:51 -0700 Subject: [Python-ideas] Async API: some more code to review In-Reply-To: References: Message-ID: On Mon, Oct 29, 2012 at 7:29 PM, Yury Selivanov wrote: > Couple of things I like already: > > 1) Use of 'yield from' is completely optional That's actually my biggest gripe... > 2) @async decorator. That makes coroutines more visible and > allows to add extra methods to them. Yes on marking them more visibly. No on wrapping each call into an object that slows down the invocation. > 3) Tight control over coroutines execution, something that > is completely missing when you use yield-from. This I don't understand. What do you mean by "tight control"? And why would you want it? > I dislike the choice of name for 'async', though. Since > @async-decorated functions are going to be yielded most of the > time (yield makes them "sync" in that context), I'd stick to > plain @coroutine. Hm. I think of it this way: the "async" (or whatever) function *is* asynchronous, and just calling it does *not* block. However if you then *yield* (or in my tulip proposal *yield from*) it, that suspends the current task until the asyc function completes, giving the *illusion* of synchronicity or blocking. (I have to admit I was confused by a comment in Steve's example code saying "does not block" on a line containing a yield, where I have been used to think of such lines as blocking.) > P.S. If this approach is viable (optional yield-from, required > @async-or-something decorator), I can invest some time and > open source the core of my framework (one benefit is that it > has lots and lots of unit-tests). Just open-sourcing the tests would already be useful!! -- --Guido van Rossum (python.org/~guido) From Steve.Dower at microsoft.com Tue Oct 30 16:57:39 2012 From: Steve.Dower at microsoft.com (Steve Dower) Date: Tue, 30 Oct 2012 15:57:39 +0000 Subject: [Python-ideas] Async API: some more code to review In-Reply-To: References: Message-ID: Guido van Rossum wrote: > I don't want to beat around the bush, I think your approach is too slow. In may > situations I would be guilty of premature optimization saying this, but (a) the > whole *point* of async I/O is to be blindingly fast (the C10K problem), and (b) > the time difference is rather marked. > > I wrote a simple program for each version (attached) that times a simple > double-recursive function, where each recursive level uses yield. > > With a depth of 20, wattle takes about 24 seconds on my MacBook Pro. > And the same problem in tulip takes 0.7 seconds! That's close to two orders of > magnitude. Now, this demo is obviously geared towards showing the pure overhead > of the "one future per level" approach compared to "pure yield from". But that's > what you're proposing. I get similar results on my machine with those benchmarks, though the difference was not so significant with my own (100 connections x 100 messages to SocketSpam.py - I included SocketSpamStress.py). The only time there was more than about 5% difference was when the 'yield from' case was behaving completely differently (each connection's routine was not interleaving with the others - my own bug, which I fixed). Choice of scheduler makes a difference as well. 
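As a point of reference for those numbers, here is a rough reconstruction of the kind of program being benchmarked, written in the pure yield-from style (the actual tulip_bench.py / wattle_bench.py attachments are not reproduced in the archive, so the driver below is a stand-in, not the real scheduler): plain generators all the way down, with nothing allocated per level beyond the generator frame itself.

import time

def binary(n):
    yield                        # one trip through the scheduler per level
    if n <= 0:
        return 1
    l = yield from binary(n - 1)
    r = yield from binary(n - 1)
    return l + 1 + r

def run(coro):
    # Stand-in driver; a real scheduler would service I/O between steps.
    try:
        while True:
            next(coro)
    except StopIteration as exc:
        return exc.value

depth = 20
t0 = time.time()
print(run(binary(depth)), time.time() - t0)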
Using my UnthreadedSocketScheduler() instead of SingleThreadedScheduler() halves the time taken, and just using "main(depth).result()" reduces that by about 10% again. It still is not directly comparable to tulip, but there are ways to make them equivalent (discussed below). > And I think allowing the user to mix yield and yield from is just too risky. The errors involved when you get yield and yield from confused are quite clear in this case. However, if you use 'yield' instead of 'yield from' in tulip, you simply don't ever run that function. Maybe this will give you an error further down the track, but it won't be as immediate. On the other hand, if you're really after extreme performance (*cough*use C*cough* :) ) we can easily add an "__unwrapped__" attribute to @async that provides access to the internal generator, which you can then 'yield from' from: @async def binary(n): if n <= 0: return 1 l = yield from binary.__unwrapped__(n-1) r = yield from binary.__unwrapped__(n-1) return l + 1 + r With this change the performance is within 5% of tulip (most times are up to 5% slower, but some are faster - I'd say margin of error), regardless of the scheduler. (I've no doubt this could be improved further by modifying _Awaiter and Future to reduce the amount of memory allocations, and a super optimized library could use C implementations that still fit the API and work with existing code.) I much prefer treating 'yield from __unwrapped__' as an advanced case, so I'm all for providing ways to optimize async code where necessary, but when I think about how I'd teach this to a class of undergraduates I'd much rather have the simpler @async/yield rule (which doesn't even require an understanding of generators). For me, "get it to work" and "get it to work, fast" comes well before "get it to work fast". > (I got rid of block_r/w() + bare yield as a public API from tulip -- that API is > now wrapped up in a generator too. And I can do that without feeling guilty > knowing that an extra level of generators costs me almost nothing. I don't feel particularly guilty about the extra level... if the operations you're blocking on are that much quicker than the overhead then you probably don't need to block. I'm pretty certain that even with multiple network cards you'll still suffer from bus contention before suffering from generator overhead. > Debugging experience: I made the same mistake in each program (I guess I copied > it over before fixing the bug :-), which caused an AttributeError to happen at > the time.time() call. In both frameworks this was baffling, because it caused > the program to exit immediately without any output. So on this count we're even. > :-) This is my traceback once I misspell time(): ...>c:\Python33_x64\python.exe wattle_bench.py Traceback (most recent call last): File "wattle_bench.py", line 27, in SingleThreadScheduler().run(main, depth=depth) File "SingleThreadScheduler.py", line 106, in run raise self._exit_exception File "scheduler.py", line 171, in _step next_future = self.generator.send(result) File "wattle_bench.py", line 22, in main t1 = time.tme() AttributeError: 'module' object has no attribute 'tme' Of course, if you do call an @async function and don't yield (or call result()) then you won't ever see an exception. I don't think there's any nice way to propagate these automatically (except maybe through a finalizer... not so keen on that). 
You can do 'op.add_done_callback(Future.result)' to force the error to be raised somewhere (or better yet, pass it to a logger - this is why we allow multiple callbacks, after all). > I have to think more about what I'd like to borrow from wattle -- I agree that > it's nice to mark up async functions with a decorator (it just shouldn't affect > call speed), I like being able to start a task with a single call. You'll probably find (as I did in my early work) that starting the task in the initial call doesn't work with yield from. Because it does the first next() call, you can't send results/exceptions back in. If all the yields (at the deepest level) are blank, this might be okay, but it caused me issues when I was yielding objects to wait for. I'm also interested in your thoughts on get_future_for(), since that seems to be one of the more unorthodox ideas of wattle. I can clearly see how it works, but I have no idea whether I've expressed it well in the description. Cheers, Steve From yselivanov.ml at gmail.com Tue Oct 30 17:08:39 2012 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Tue, 30 Oct 2012 12:08:39 -0400 Subject: [Python-ideas] Async API: some more code to review In-Reply-To: References: Message-ID: <12BC7B1A-0E3B-4F2E-B38E-9B701AC360AD@gmail.com> Guido, Well, with such a jaw dropping benchmarks results there is no point in discussion whether it's better to use yield-froms or yields+promises. But let me also share results of my framework: - Plain coroutines - 24.4 - Coroutines + greenlets - 34.5 - Coroutines + greenlets + many cython optimizations: 4.79 (still too slow) Now with dynamically replacing (opcodes magic) 'yield' with 'yield_' to entirely avoid generators and some other optimizations I believe it's possible to speed it up even further, probably to times below 1 second. But, again, the price of not using yield-froms is too high (and I don't even mention hard-to-fix tracebacks when you use just yields) On 2012-10-30, at 10:52 AM, Guido van Rossum wrote: > On Mon, Oct 29, 2012 at 7:29 PM, Yury Selivanov wrote: >> Couple of things I like already: >> >> 1) Use of 'yield from' is completely optional > > That's actually my biggest gripe... Yes, let's use just one thing everywhere. >> 2) @async decorator. That makes coroutines more visible and >> allows to add extra methods to them. > > Yes on marking them more visibly. No on wrapping each call into an > object that slows down the invocation. > >> 3) Tight control over coroutines execution, something that >> is completely missing when you use yield-from. > > This I don't understand. What do you mean by "tight control"? And why > would you want it? Actually, if we make decorating coroutines with @coro-like decorator strongly recommended (or even required) I can get that tight-control thing. It gives you the following: - Breakdown profiling results by individual coroutines - Blocking code detection - Hacks to protect finally statements, modify your coroutines internals, etc / probably I'm the only one in the world who need this :( - Better debugging (just logging individual coroutines sometimes helps) And decorator makes code more future-proof as well. Who knows what kind of instruments you need later. >> I dislike the choice of name for 'async', though. Since >> @async-decorated functions are going to be yielded most of the >> time (yield makes them "sync" in that context), I'd stick to >> plain @coroutine. > > Hm. 
I think of it this way: the "async" (or whatever) function *is* > asynchronous, and just calling it does *not* block. However if you > then *yield* (or in my tulip proposal *yield from*) it, that suspends > the current task until the asyc function completes, giving the > *illusion* of synchronicity or blocking. (I have to admit I was > confused by a comment in Steve's example code saying "does not block" > on a line containing a yield, where I have been used to think of such > lines as blocking.) "*illusion* of synchronicity or blocking" -- that's precisely the reason I don't like '@async' used together with yields. >> P.S. If this approach is viable (optional yield-from, required >> @async-or-something decorator), I can invest some time and >> open source the core of my framework (one benefit is that it >> has lots and lots of unit-tests). > > Just open-sourcing the tests would already be useful!! When the tulip is ready I simply start integrating them. - Yury From kristjan at ccpgames.com Tue Oct 30 17:05:45 2012 From: kristjan at ccpgames.com (=?iso-8859-1?Q?Kristj=E1n_Valur_J=F3nsson?=) Date: Tue, 30 Oct 2012 16:05:45 +0000 Subject: [Python-ideas] Async API: some code to review In-Reply-To: References: Message-ID: > -----Original Message----- > From: Python-ideas [mailto:python-ideas- > bounces+kristjan=ccpgames.com at python.org] On Behalf Of Guido van > Rossum > Sent: 29. okt?ber 2012 16:35 > To: Richard Oudkerk > Cc: python-ideas at python.org > Subject: Re: [Python-ideas] Async API: some code to review > > It is a common pattern to have multiple threads/processes trying to > > accept connections on an single listening socket, so it would be > > unfortunate to disallow that. > > Ah, but that will work -- each thread has its own pollster, event loop and > scheduler and collection of tasks. And listening on a socket is a pretty special > case anyway -- I imagine we'd build a special API just for that purpose. > I don't think he meant actual "threads" but rather thread in the context of coroutines. in StacklessIO (our custom sockets lib for stackless) multiple tasklets can have an "accept" pending on a socket, so that when multiple connections arrive, wakeup time is minimal. We have also been careful to allow multiple operations on sockets, from different tasklets, although the same caveats apply as when multiple threads perform operations, i.e. no guarantees about it making any sense. The important bit is that when such things happen, you get some defined result, rather than for example a tasklet being infinitely blocked. Such errors are suprising and hard to debug. K From Steve.Dower at microsoft.com Tue Oct 30 17:27:37 2012 From: Steve.Dower at microsoft.com (Steve Dower) Date: Tue, 30 Oct 2012 16:27:37 +0000 Subject: [Python-ideas] non-blocking buffered I/O In-Reply-To: <508F6546.2010301@canterbury.ac.nz> References: <20121029170731.74bd3d37@cosmocat> <20121029222541.07c461b3@pitrou.net> <508F6546.2010301@canterbury.ac.nz> Message-ID: Greg Ewing wrote: > Steve Dower wrote: >> For example, (library) code that needs a socket to be ready can ask >> the current scheduler if it can do "select([sock], [], [])", > > I think you're mixing up the scheduler and event loop layers here. If the scheduler > is involved in this at all, it would only be to pass the request on to the event loop. Could you clarify for me what goes into each layer? I've been treating "scheduler" and "event loop" as more-or-less synonyms (I see an event loop as one possible implementation of a scheduler). 
If you consider the scheduler to be the part that calls __next__() on the generator and sets up callbacks, that is implemented in my _Awaiter class, and should never need to be touched. Possibly the difference in terminology comes out because I'm not treating I/O specially? As far as wattle is concerned, I/O is just another operation that will eventually call Future.set_result(). I've tried to capture this in my write-up: https://bitbucket.org/stevedower/wattle/wiki/Proposal Cheers, Steve From Steve.Dower at microsoft.com Tue Oct 30 17:32:19 2012 From: Steve.Dower at microsoft.com (Steve Dower) Date: Tue, 30 Oct 2012 16:32:19 +0000 Subject: [Python-ideas] non-blocking buffered I/O In-Reply-To: <508F6144.2040105@canterbury.ac.nz> References: <20121029170731.74bd3d37@cosmocat> <20121029222541.07c461b3@pitrou.net> <508F6144.2040105@canterbury.ac.nz> Message-ID: Greg Ewing wrote: > Steve Dower wrote: > >> From my point of view, IOCP fits in very well provided the callbacks >> (which will run in the IOCP thread pool) are only used to unblock tasks. > > Is it really necessary to have a separate thread just to handle unblocking tasks? > That thread will have very little to do, so it could just as well run the tasks too, > couldn't it? In the C10k problem (which seems to keep coming up as our "goal") that thread will have a lot to do. I would expect that most actual users of this API could keep running on that thread without issue, but since it is OS managed and belongs to a pool, the chances of deadlocking are much higher than on a 'real' CPU thread. Limiting its work to unblocking at least prevents the end developer from having to worry about this. Cheers, Steve From guido at python.org Tue Oct 30 17:40:18 2012 From: guido at python.org (Guido van Rossum) Date: Tue, 30 Oct 2012 09:40:18 -0700 Subject: [Python-ideas] Async API: some code to review In-Reply-To: References: Message-ID: [Richard Oudkerk (?)] >> > It is a common pattern to have multiple threads/processes trying to >> > accept connections on an single listening socket, so it would be >> > unfortunate to disallow that. [Guido] >> Ah, but that will work -- each thread has its own pollster, event loop and >> scheduler and collection of tasks. And listening on a socket is a pretty special >> case anyway -- I imagine we'd build a special API just for that purpose. On Tue, Oct 30, 2012 at 9:05 AM, Kristj?n Valur J?nsson wrote: > I don't think he meant actual "threads" but rather thread in the context of coroutines. (Yes, we figured that out already. :-) > in StacklessIO (our custom sockets lib for stackless) multiple tasklets can have an "accept" pending on a socket, so that when multiple connections arrive, wakeup time is minimal. What kind of time savings are we talking about? I imagine that the accept() loop I put in tulip/echosvr.py is fast enough in terms of response time (latency) -- throughput would seem the more important measure (and I have no idea of this yet). http://code.google.com/p/tulip/source/browse/echosvr.py#37 > We have also been careful to allow multiple operations on sockets, from different tasklets, although the same caveats apply as when multiple threads perform operations, i.e. no guarantees about it making any sense. The important bit is that when such things happen, you get some defined result, rather than for example a tasklet being infinitely blocked. Such errors are suprising and hard to debug. That's a good point. 
It should either cause an immediate, clear exception, or interleave the data without compromising integrity of the scheduler or the app. -- --Guido van Rossum (python.org/~guido) From kristjan at ccpgames.com Tue Oct 30 17:11:40 2012 From: kristjan at ccpgames.com (=?iso-8859-1?Q?Kristj=E1n_Valur_J=F3nsson?=) Date: Tue, 30 Oct 2012 16:11:40 +0000 Subject: [Python-ideas] non-blocking buffered I/O In-Reply-To: <508F6144.2040105@canterbury.ac.nz> References: <20121029170731.74bd3d37@cosmocat> <20121029222541.07c461b3@pitrou.net> <508F6144.2040105@canterbury.ac.nz> Message-ID: > -----Original Message----- > From: Python-ideas [mailto:python-ideas- > bounces+kristjan=ccpgames.com at python.org] On Behalf Of Greg Ewing > Sent: 30. okt?ber 2012 05:10 > To: python-ideas at python.org > Subject: Re: [Python-ideas] non-blocking buffered I/O wrote: > > > From my point of view, IOCP fits in very well provided the callbacks > > (which will run in the IOCP thread pool) are only used to unblock tasks. > > Is it really necessary to have a separate thread just to handle unblocking > tasks? That thread will have very little to do, so it could just as well run the > tasks too, couldn't it? StacklessIO (which is an IOCP implementation for stackless) uses callbacks on an arbitrary thread (in practice a worker thread from window's own threadpool that it keeps for such things) to unblock tasklets. You don't want to do any significant work on such a thread because it is used for other stuff by the system. By the way: We found that acquiring the GIL by a random external thread in response to the IOCP to wake up tasklets was incredibly expensive. I spent a lot of effort figuring out why that is and found no real answer. The mechanism we now use is to let the external worker thread schedule a "pending call" which is serviced by the main thread at the earliest opportunity. Also, the main thread is interrupted if it is doing a sleep. This is much more efficient. K From guido at python.org Tue Oct 30 18:34:12 2012 From: guido at python.org (Guido van Rossum) Date: Tue, 30 Oct 2012 10:34:12 -0700 Subject: [Python-ideas] Async API: some code to review In-Reply-To: References: Message-ID: On Tue, Oct 30, 2012 at 3:12 AM, Laurens Van Houtven <_ at lvh.cc> wrote: > I've been following the PEP380-related threads and I've reviewed this stuff, > while trying to do the protocols/transports PEP, and trying to glue the two > together. Thanks! I know it can't be easy to keep up with all the threads (and now code repos). > The biggest difference I can see is that protocols as they've been discussed > are "pull": they get called when some data arrives. They don't know how much > data there is; they just get told "here's some data". The obvious difference > with the API in, eg: > > https://code.google.com/p/tulip/source/browse/sockets.py#56 > > ... is that now I have to tell a socket to read n bytes, which "blocks" the > coroutine, then I get some data. Yes. But do note that sockets.py is mostly a throw-away example written to support the only style I am familiar with -- synchronous reads and writes. My point in writing this particular set of transports is that I want to take existing synchronous code (e.g. 
a threaded server built using the stdlib's socketserver.ThreadingTCPServer class) and make minimal changes to the protocol logic to support async operation -- those minimal changes should boil down to using a different way to set up a connection or a listening socket or constructing a stream from a socket, and putting "yield from" in front of the blocking operations (recv(), send(), and the read/readline/write operations on the streams. I'm still looking for guidance from Twisted and Tornado (and you!) to come up with better abstractions for transports and protocols. The underlying event loop *does* support a style where an object registers a callback function once which is called repeatedly, as long as the socket is readable (or writable, depending on the registration call). > Now, there doesn't have to be an issue; you could simply say: > > data = yield from s.recv(4096) # that's the magic number usually right > proto.data_received(4096) (Off-topic: ages ago I determined that the optimal block size is actually 8192. But for all I know it is 256K these days. :-) > It seems a bit boilerplatey, but I suppose that eventually could be hidden > away. > > But this style is pervasive, for example that's how reading by lines works: > > https://code.google.com/p/tulip/source/browse/echosvr.py#20 Right -- again, this is all geared towards making it palatable for people used to write synchronous code (either single-threaded or multi-threaded), not for people used to Twisted. > While I'm not a big fan (I may be convinced if I see a protocol test that > looks nice); Check out urlfetch() in main.py: http://code.google.com/p/tulip/source/browse/main.py#39 For sure, this isn't "pretty" and it should be rewritten using more abstraction -- I only wrote the entire thing as a single function because I was focused on the scheduler and event loop. And it is clearly missing a buffering layer for writing (it currently uses a separate send() call for each line of the HTTP headers, blech). But it implements a fairly complex (?) protocol and it performs well enough. > I'm just wondering if there's any point in trying to write the > pull-style protocols when this works quite differently. Perhaps you could try to write some pull-style transports and protocols for tulip to see if anything's missing from the scheduler and eventloop APIs or implementations? I'd be happy to rename sockets.py to push_sockets.py so there's room for a competing pull_sockets.py, and then we can compare apples to apples. (Unlike the yield vs. yield-from issue, where I am very biased, I am not biased about push vs. pull style. I just coded up what I was most familiar with first.) > Additionally, I'm not sure if readline belongs on the socket. It isn't -- it is on the BufferedReader, which wraps around the socket (or other socket-like transport, like SSL). This is similar to the way the stdlib socket.socket class has a makefile() method that returns a stream wrapping the socket. > I understand the simile with files, though. Right, that's where I've gotten most of my inspiration. I figure they are a good model to lure unsuspecting regular Python users in. :-) > With the coroutine style I could see how the > most obvious fit would be something like tornado's read_until, or an > as_lines that essentially calls read_until repeatedly. Can the delimiter for > this be modified? You can write your own BufferedReader, and if this is a common pattern we can make it a standard API. 
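As a sketch of what such a user-written reader with a tornado-style read_until() might look like (the FakeTransport and run() driver below are invented so the example stands alone; in practice recv() would be the real transport's blocking coroutine):

class FakeTransport:
    def __init__(self, data, chunk=4):
        self._data, self._chunk = data, chunk

    def recv(self, n):
        yield                     # a real transport would suspend the task here
        piece, self._data = self._data[:self._chunk], self._data[self._chunk:]
        return piece

class LineReader:
    def __init__(self, transport):
        self.transport = transport
        self.buffer = b''

    def read_until(self, delimiter=b'\n'):
        # Keep pulling until the delimiter shows up (or EOF drains the buffer).
        while delimiter not in self.buffer:
            piece = yield from self.transport.recv(4096)
            if not piece:
                line, self.buffer = self.buffer, b''
                return line
            self.buffer += piece
        line, _, self.buffer = self.buffer.partition(delimiter)
        return line + delimiter

def run(coro):
    try:
        while True:
            next(coro)
    except StopIteration as exc:
        return exc.value

r = LineReader(FakeTransport(b'GET / HTTP/1.1\r\nHost: x\r\n'))
print(run(r.read_until(b'\r\n')))   # b'GET / HTTP/1.1\r\n'
print(run(r.read_until(b'\r\n')))   # b'Host: x\r\n'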
Unlike the SocketTransport and SslTransport classes, which contain various I/O hacks and integrate tightly with the polling capability of the eventloop, I consider BufferedReader plain user code. Antoine also hinted that with not too many changes we could reuse the existing buffering classes in the stdlib io module, which are implemented in C. > My main syntactic gripe is that when I write @inlineCallbacks code or > monocle code or whatever, when I say "yield" I'm yielding to the reactor. > That makes sense to me (I realize natural language arguments don't always > make sense in a programming language context). "yield from" less so (but > okay, that's what it has to look like). But this just seems weird to me: > > yield from trans.send(line.upper()) > > Not only do I not understand why I'm yielding there in the first place (I > don't have to wait for anything, I just want to push some data out!), it > feels like all of my yields have been replaced with yield froms for no > obvious reason (well, there are reasons, I'm just trying to look at this > naively). Are you talking about yield vs. yield-from here, or about the need to suspend every write? Regarding yield vs. yield-from, please squint and get used to seeing yield-from everywhere -- the scheduler implementation becomes *much* simpler and *much* faster using yield-from, so much so that there really is no competition. As to why you would have to suspend each time you call send(), that's mostly just an artefact of the incomplete example -- I didn't implement a BufferedWriter yet. I also have some worries about a task producing data at a rate faster than the socket can drain it from the buffer, but in practice I would probably relent and implement a write() call that returns immediately and should *not* be used with yield-from. (Unfortunately you can't have a call that works with or without yield-from.) I think there's a throttling mechanism in Twisted that can probably be copied here. > I guess Twisted gets away with this because of deferred chaining: that one > deferred might have tons of callbacks in the background, many of which also > doing IO operations, resulting in a sequence of asynchronous operations that > only at the end cause the generator to be run some more. > > I guess that belongs in a different thread, though. Even, then, I'm not sure > if I'm uncomfortable because I'm seeing something different from what I'm > used to, or if my argument from English actually makes any sense whatsoever. > > Speaking of protocol tests, what would those look like? How do I yell, say, > "POST /blah HTTP/1.1\r\n" from a transport? Presumably I'd have a mock > transport, and call the handler with that? (I realize it's early days to be > thinking that far ahead; I'm just trying to figure out how I can contribute > a good protocol definition to all of this). Actually I think the ease of writing tests should definitely be taken into account when designing the APIs here. In the Zope world, Jim Fulton wrote a simple abstraction for networking code that explicitly provides for testing: http://packages.python.org/zc.ngi/ (it also supports yield-style callbacks, similar to Twisted's inlineCallbacks). I currently don't have any tests, apart from manually running main.py and checking its output. 
I am a bit hesitant to add unit tests in this early stage, because keeping the tests passing inevitably slows down the process of ripping apart the API and rebuilding it in a different way -- something I do at least once a day, whenever I get feedback or a clever thought strikes me or something annoying reaches my trigger level. But I should probably write at least *some* tests, I'm sure it will be enlightening and I will end up changing the APIs to make testing easier. It's in the TODO. -- --Guido van Rossum (python.org/~guido) From guido at python.org Tue Oct 30 18:47:24 2012 From: guido at python.org (Guido van Rossum) Date: Tue, 30 Oct 2012 10:47:24 -0700 Subject: [Python-ideas] non-blocking buffered I/O In-Reply-To: References: <20121029170731.74bd3d37@cosmocat> <20121029222541.07c461b3@pitrou.net> <508F6144.2040105@canterbury.ac.nz> Message-ID: On Tue, Oct 30, 2012 at 9:11 AM, Kristj?n Valur J?nsson wrote: > By the way: We found that acquiring the GIL by a random external thread in response to the IOCP to wake up tasklets was incredibly expensive. I spent a lot of effort figuring out why that is and found no real answer. The mechanism we now use is to let the external worker thread schedule a "pending call" which is serviced by the main thread at the earliest opportunity. Also, the main thread is interrupted if it is doing a sleep. This is much more efficient. In which Python version? The GIL has been redesigned at least once. Also the latency (not necessarily cost) to acquire the GIL varies by the sys.setswitchinterval setting. (Actually the more responsive you make it, the more it will cost you in overall performance.) I do think that using the pending call mechanism is the right solution here. -- --Guido van Rossum (python.org/~guido) From shibturn at gmail.com Tue Oct 30 18:50:53 2012 From: shibturn at gmail.com (Richard Oudkerk) Date: Tue, 30 Oct 2012 17:50:53 +0000 Subject: [Python-ideas] Async API: some code to review In-Reply-To: References: Message-ID: On 30/10/2012 4:40pm, Guido van Rossum wrote: > What kind of time savings are we talking about? I imagine that the > accept() loop I put in tulip/echosvr.py is fast enough in terms of > response time (latency) -- throughput would seem the more important > measure (and I have no idea of this yet). > http://code.google.com/p/tulip/source/browse/echosvr.py#37 With Windows overlapped I/O I think you can get substantially better throughput by starting many AcceptEx() calls in parallel. (For bonus points you can also recycle the accepted connections using DisconnectEx().) Even so, Windows socket code always seems to be much slower than the equivalent on Linux. -- Richard From yselivanov.ml at gmail.com Tue Oct 30 18:52:54 2012 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Tue, 30 Oct 2012 13:52:54 -0400 Subject: [Python-ideas] Async API In-Reply-To: <508F6B45.7010300@canterbury.ac.nz> References: <71C95BE8-C0C7-45F5-A5E9-45A86EB3265B@gmail.com> <72ACCA97-8FF8-4F6B-85B7-7DB9CDCB6DBC@gmail.com> <31A560E1-AF1A-437A-B024-5AF637EF3F35@gmail.com> <5F51531B-68BF-44D0-AF82-BD8A6ED7DC0C@gmail.com> <7789C8EE-2002-4BF6-A617-88A8EA0DB646@gmail.com> <508F6B45.7010300@canterbury.ac.nz> Message-ID: <37B98D1E-5508-4843-815C-57CE24B03843@gmail.com> On 2012-10-30, at 1:53 AM, Greg Ewing wrote: > Yury Selivanov wrote: > >> So in your example scheduler would never ever has a question of interrupting c2(), because it wasn't called with any restriction/timeout. >> There simply no reason to interrupt it ever. 
> > But there's nothing to stop someone writing > > def c3(): > try: > yield from with_timeout(10.0, c1()) > except TimeoutError: > print("That's cool, I can cope with that") > > Also, it's not just TimeoutErrors that are a potential > problem, it's any asynchronous exception. For example, > the task calling c1() might get cancelled by another > task while c2() is blocked. If cancelling is implemented > by throwing in an exception, you have the same problem. > >> Then you need scheduler to know if it is in its finally or not. Because its >> c2() which was run with a timeout. It's c2() code that may be subject to >> aborting. > > I'm really not following your reasoning here. You seem to > be speaking as if with_timeout() calls only have an effect > one level deep. But that's not the case -- the frame that a > TimeoutError gets thrown into by with_timeout() can be > nested any number of yield-from calls deep. Greg, Looks like I'm failing to explain my point of view (which is maybe wrong). The problem is tough, and without a shared code to debug and test ideas on it's just hard to communicate. Let's get back to this issue once we have a framework/library to work on. Thanks, Yury From guido at python.org Tue Oct 30 19:10:10 2012 From: guido at python.org (Guido van Rossum) Date: Tue, 30 Oct 2012 11:10:10 -0700 Subject: [Python-ideas] Async API: some code to review In-Reply-To: References: Message-ID: On Tue, Oct 30, 2012 at 10:50 AM, Richard Oudkerk wrote: > On 30/10/2012 4:40pm, Guido van Rossum wrote: >> >> What kind of time savings are we talking about? I imagine that the >> accept() loop I put in tulip/echosvr.py is fast enough in terms of >> response time (latency) -- throughput would seem the more important >> measure (and I have no idea of this yet). >> http://code.google.com/p/tulip/source/browse/echosvr.py#37 > With Windows overlapped I/O I think you can get substantially better > throughput by starting many AcceptEx() calls in parallel. (For bonus points > you can also recycle the accepted connections using DisconnectEx().) Hm... I already have on my list that the transports should probably be platform dependent. So this would suggest that the standard accept loop should be abstracted as a method on the transport object, right? > Even so, Windows socket code always seems to be much slower than the > equivalent on Linux. Is this Python sockets code or are you also talking about other languages, like C++? -- --Guido van Rossum (python.org/~guido) From solipsis at pitrou.net Tue Oct 30 20:31:01 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 30 Oct 2012 20:31:01 +0100 Subject: [Python-ideas] Async API: some code to review References: Message-ID: <20121030203101.57904ac6@pitrou.net> On Tue, 30 Oct 2012 10:34:12 -0700 Guido van Rossum wrote: > > > > Speaking of protocol tests, what would those look like? How do I yell, say, > > "POST /blah HTTP/1.1\r\n" from a transport? Presumably I'd have a mock > > transport, and call the handler with that? (I realize it's early days to be > > thinking that far ahead; I'm just trying to figure out how I can contribute > > a good protocol definition to all of this). > > Actually I think the ease of writing tests should definitely be taken > into account when designing the APIs here. +11 ! Regards Antoine. 
From guido at python.org Tue Oct 30 21:24:23 2012 From: guido at python.org (Guido van Rossum) Date: Tue, 30 Oct 2012 13:24:23 -0700 Subject: [Python-ideas] Async API: some code to review In-Reply-To: <50902E95.4070907@gmail.com> References: <50902E95.4070907@gmail.com> Message-ID: On Tue, Oct 30, 2012 at 12:46 PM, Richard Oudkerk wrote: > The difference in speed between AF_INET sockets and pipes on Windows is much > bigger than the difference between AF_INET sockets and pipes on Unix. > > (Who knows, maybe it is just my firewall which is causing the slowdown...) Here's another unscientific benchmark: I wrote a stupid "http" server (stupider than echosvr.py actually) that accepts HTTP requests and responds with the shortest possible "200 Ok" response. This should provide an adequate benchmark of how fast the event loop, scheduler, and transport are at accepting and closing connections (and reading and writing small amounts). On my linux box at work, over localhost, it seems I can handle 10K requests (sent using 'ab' over localhost) in 1.6 seconds. Is that good or bad? The box has insane amounts of memory and 12 cores (?) and rates at around 115K pystones. (I tried to repro this on my Mac, but I am running into problems, perhaps due to system limits.) -- --Guido van Rossum (python.org/~guido) From carlopires at gmail.com Tue Oct 30 21:33:12 2012 From: carlopires at gmail.com (Carlo Pires) Date: Tue, 30 Oct 2012 18:33:12 -0200 Subject: [Python-ideas] Async API: some code to review In-Reply-To: References: <50902E95.4070907@gmail.com> Message-ID: 2012/10/30 Guido van Rossum > > Here's another unscientific benchmark: I wrote a stupid "http" server > (stupider than echosvr.py actually) that accepts HTTP requests and > responds with the shortest possible "200 Ok" response. This should > provide an adequate benchmark of how fast the event loop, scheduler, > and transport are at accepting and closing connections (and reading > and writing small amounts). On my linux box at work, over localhost, > it seems I can handle 10K requests (sent using 'ab' over localhost) in > 1.6 seconds. Is that good or bad? The box has insane amounts of memory > and 12 cores (?) and rates at around 115K pystones. > Take a look at http://nichol.as/benchmark-of-python-web-servers It is a bit outdated but can be useful to get some insight. -- Carlo Pires -------------- next part -------------- An HTML attachment was scrubbed... URL: From greg.ewing at canterbury.ac.nz Tue Oct 30 21:37:44 2012 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 31 Oct 2012 09:37:44 +1300 Subject: [Python-ideas] Async API: some code to review In-Reply-To: References: Message-ID: <50903A98.2080402@canterbury.ac.nz> Kristj?n Valur J?nsson wrote: > in StacklessIO (our custom sockets lib for stackless) multiple tasklets can > have an "accept" pending on a socket, so that when multiple connections arrive, > wakeup time is minimal. With sufficiently cheap tasks, there's another way to approach this: one task is dedicated to accepting connections from the socket, and it spawns a new task to handle each connection. 
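A toy illustration of that pattern -- one accept task, one spawned task per connection. The Scheduler and FakeListener below are stand-ins, not tulip's API; the point is just the shape of the acceptor loop.

from collections import deque

class Scheduler:
    def __init__(self):
        self.ready = deque()

    def spawn(self, coro):
        self.ready.append(coro)

    def run(self):
        # Round-robin over runnable tasks until none are left.
        while self.ready:
            task = self.ready.popleft()
            try:
                next(task)
            except StopIteration:
                continue
            self.ready.append(task)

class FakeListener:
    def __init__(self, conns):
        self.conns = deque(conns)

    def accept(self):
        yield                     # a real accept() would block the task here
        return self.conns.popleft() if self.conns else None

def handler(conn):
    yield                         # pretend to do some I/O
    print('handled', conn)

def acceptor(sched, listener):
    # The dedicated accept task: it never handles a connection itself,
    # it just spawns a new task for each one.
    while True:
        conn = yield from listener.accept()
        if conn is None:
            break
        sched.spawn(handler(conn))

sched = Scheduler()
sched.spawn(acceptor(sched, FakeListener(['conn1', 'conn2', 'conn3'])))
sched.run()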
-- Greg From solipsis at pitrou.net Tue Oct 30 22:30:20 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 30 Oct 2012 22:30:20 +0100 Subject: [Python-ideas] Async API: some code to review References: <50902E95.4070907@gmail.com> Message-ID: <20121030223020.4f89ab1f@pitrou.net> On Tue, 30 Oct 2012 13:24:23 -0700 Guido van Rossum wrote: > On Tue, Oct 30, 2012 at 12:46 PM, Richard Oudkerk wrote: > > The difference in speed between AF_INET sockets and pipes on Windows is much > > bigger than the difference between AF_INET sockets and pipes on Unix. > > > > (Who knows, maybe it is just my firewall which is causing the slowdown...) > > Here's another unscientific benchmark: I wrote a stupid "http" server > (stupider than echosvr.py actually) that accepts HTTP requests and > responds with the shortest possible "200 Ok" response. This should > provide an adequate benchmark of how fast the event loop, scheduler, > and transport are at accepting and closing connections (and reading > and writing small amounts). On my linux box at work, over localhost, > it seems I can handle 10K requests (sent using 'ab' over localhost) in > 1.6 seconds. Is that good or bad? The box has insane amounts of memory > and 12 cores (?) and rates at around 115K pystones. It sounds both good and useless to me :) Regards Antoine. From paul at colomiets.name Tue Oct 30 22:45:46 2012 From: paul at colomiets.name (Paul Colomiets) Date: Tue, 30 Oct 2012 23:45:46 +0200 Subject: [Python-ideas] Async API: some code to review In-Reply-To: References: Message-ID: Hi Richard, On Mon, Oct 29, 2012 at 3:13 PM, Richard Oudkerk wrote: > On 28/10/2012 11:52pm, Guido van Rossum wrote: >> >> I'm most interested in feedback on the design of polling.py and >> scheduling.py, and to a lesser extent on the design of sockets.py; >> main.py is just an example of how this style works out in practice. > > > What happens if two tasks try to do a read op (or two tasks try to do a > write op) on the same file descriptor? It looks like the second one to do > scheduling.block_r(fd) will cause the first task to be forgotten, causing > the first task to block forever. > > Shouldn't there be a list of pending readers and a list of pending writers > for each fd? > There is another approach to handle this. You create a dedicated coroutine which does writing (or reading). And if other coroutine needs to write, it puts data into a queue (or channel), and wait until writer coroutine picks it up. This way you don't care about atomicity of writes, and a lot of other things. This approach is similar to what Greg Ewing proposed for handling accept() recently. -- Paul From shibturn at gmail.com Tue Oct 30 23:01:19 2012 From: shibturn at gmail.com (Richard Oudkerk) Date: Tue, 30 Oct 2012 22:01:19 +0000 Subject: [Python-ideas] Async API: some code to review In-Reply-To: References: <50902E95.4070907@gmail.com> Message-ID: On 30/10/2012 8:24pm, Guido van Rossum wrote: > Here's another unscientific benchmark: I wrote a stupid "http" server > (stupider than echosvr.py actually) that accepts HTTP requests and > responds with the shortest possible "200 Ok" response. This should > provide an adequate benchmark of how fast the event loop, scheduler, > and transport are at accepting and closing connections (and reading > and writing small amounts). On my linux box at work, over localhost, > it seems I can handle 10K requests (sent using 'ab' over localhost) in > 1.6 seconds. Is that good or bad? The box has insane amounts of memory > and 12 cores (?) 
and rates at around 115K pystones. I tried the simple single threaded benchmark below on my laptop. | Connections/sec ---------------------------------------+----------------- Linux | 6000-11000 Linux in a VM (with 1 cpu assigned) | 4600 Windows | 1400 On Windows this sometimes failed with: OSError: [WinError 10055] An operation on a socket could not be performed because the system lacked sufficient buffer space or because a queue was full import socket, time, sys, argparse N = 10000 def server(): l = socket.socket() l.bind(('127.0.0.1', 0)) l.listen(100) print('listening on port', l.getsockname()[1]) while True: a, _ = l.accept() data = a.recv(20) a.sendall(data.upper()) a.close() def client(port): start = time.time() for i in range(N): with socket.socket() as c: c.connect(('127.0.0.1', port)) c.sendall(b'foo') res = c.recv(20) assert res == b'FOO' c.close() elapsed = time.time() - start print("elapsed=%s, connections/sec=%s" % (elapsed, N/elapsed)) parser = argparse.ArgumentParser() parser.add_argument('--port', type=int, default=None, help='port to connect to') args = parser.parse_args() if args.port is not None: client(args.port) else: server() -- Richard From grosser.meister.morti at gmx.net Wed Oct 31 06:02:52 2012 From: grosser.meister.morti at gmx.net (=?ISO-8859-1?Q?Mathias_Panzenb=F6ck?=) Date: Wed, 31 Oct 2012 06:02:52 +0100 Subject: [Python-ideas] Support data: URLs in urllib Message-ID: <5090B0FC.1030801@gmx.net> Sometimes it would be handy to read data:-urls just like any other url. While it is pretty easy to parse a data: url yourself I think it would be nice if urllib could do this for you. Example data url parser: >>> import base64 >>> import urllib.parse >>> >>> def read_data_url(url): >>> scheme, data = url.split(":") >>> assert scheme == "data", "unsupported scheme: "+scheme >>> mimetype, data = data.split(",") >>> if mimetype.endswith(";base64"): >>> return mimetype[:-7] or None, base64.b64decode(data.encode("UTF-8")) >>> else: >>> return mimetype or None, urllib.parse.unquote(data).encode("UTF-8") See also: http://tools.ietf.org/html/rfc2397 -panzi From rene at stranden.com Wed Oct 31 07:16:47 2012 From: rene at stranden.com (Rene Nejsum) Date: Wed, 31 Oct 2012 07:16:47 +0100 Subject: [Python-ideas] Async API: some code to review In-Reply-To: References: Message-ID: > > There is another approach to handle this. You create a dedicated > coroutine which does writing (or reading). And if other coroutine > needs to write, it puts data into a queue (or channel), and wait until > writer coroutine picks it up. This way you don't care about atomicity > of writes, and a lot of other things. I support this idea, IMHO it's by far the easiest (or least problematic) way to handle the complexity of concurrency. What's the general position on monkey patching existing libs ? This might not be possible with the above ? /rene > > This approach is similar to what Greg Ewing proposed for handling > accept() recently. 
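A minimal sketch of that dedicated-writer arrangement, with a plain deque standing in for the queue/channel, an in-memory list standing in for the socket, and a toy round-robin driver standing in for the real scheduler:

from collections import deque

outbox = deque()                 # the channel between producers and the writer
sent = []                        # stands in for the real socket

def send(data):
    yield                        # a real transport would block here
    sent.append(data)

def writer():
    # The only task that ever writes, so writes can never interleave.
    while True:
        if not outbox:
            yield                # nothing queued; let other tasks run
            continue
        data = outbox.popleft()
        if data is None:         # sentinel: shut down
            return
        yield from send(data)

def producer(label):
    for i in range(3):
        outbox.append('%s-%d' % (label, i))
        yield

tasks = deque([writer(), producer('a'), producer('b')])
while tasks:
    t = tasks.popleft()
    try:
        next(t)
    except StopIteration:
        continue
    tasks.append(t)
    if len(tasks) == 1 and not outbox:   # only the writer left, queue drained
        outbox.append(None)              # tell it to exit
print(sent)                              # all six messages, never interleaved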
> > -- > Paul > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas From p.f.moore at gmail.com Wed Oct 31 08:54:29 2012 From: p.f.moore at gmail.com (Paul Moore) Date: Wed, 31 Oct 2012 07:54:29 +0000 Subject: [Python-ideas] Support data: URLs in urllib In-Reply-To: <5090B0FC.1030801@gmx.net> References: <5090B0FC.1030801@gmx.net> Message-ID: On Wednesday, 31 October 2012, Mathias Panzenb?ck wrote: > Sometimes it would be handy to read data:-urls just like any other url. > While it is pretty easy to parse a data: url yourself I think it would be > nice if urllib could do this for you. > > Example data url parser: > [...] IIUC, this should be possible with a custom opener. While it might be nice to have this in the stdlib, it would also be a really useful recipe to have in the docs, showing how to create and install a simple custom opener into the default set of openers (so that urllib.request gains the ability to handle data rules automatically). Would you be willing to submit a doc patch to cover this? Paul -------------- next part -------------- An HTML attachment was scrubbed... URL: From kristjan at ccpgames.com Wed Oct 31 10:29:43 2012 From: kristjan at ccpgames.com (=?iso-8859-1?Q?Kristj=E1n_Valur_J=F3nsson?=) Date: Wed, 31 Oct 2012 09:29:43 +0000 Subject: [Python-ideas] non-blocking buffered I/O In-Reply-To: References: <20121029170731.74bd3d37@cosmocat> <20121029222541.07c461b3@pitrou.net> <508F6144.2040105@canterbury.ac.nz> Message-ID: > -----Original Message----- > From: gvanrossum at gmail.com [mailto:gvanrossum at gmail.com] On Behalf > Of Guido van Rossum > Sent: 30. okt?ber 2012 17:47 > To: Kristj?n Valur J?nsson > Cc: python-ideas at python.org > Subject: Re: [Python-ideas] non-blocking buffered I/O > > On Tue, Oct 30, 2012 at 9:11 AM, Kristj?n Valur J?nsson > wrote: > > By the way: We found that acquiring the GIL by a random external thread > in response to the IOCP to wake up tasklets was incredibly expensive. I > spent a lot of effort figuring out why that is and found no real answer. The > mechanism we now use is to let the external worker thread schedule a > "pending call" which is serviced by the main thread at the earliest > opportunity. Also, the main thread is interrupted if it is doing a sleep. This is > much more efficient. > > In which Python version? The GIL has been redesigned at least once. > Also the latency (not necessarily cost) to acquire the GIL varies by the > sys.setswitchinterval setting. (Actually the more responsive you make it, the > more it will cost you in overall performance.) > > I do think that using the pending call mechanism is the right solution here. I am talking about 2.7, of course, the python of hard working lumberjacks everywhere :) Anyway I don't think the issue is much affected by the particular GIL implementation. Alternative a) Callback comes on arbitrary thread arbitrary thread calls PyGILState_Ensure (This causes a _dynamic thread state_ to be generated for the arbitrary thread, and the GIL to be subsequently acquired) arbitrary thread does whatever python gymnastics required to complete the IO (wake up tasklet arbitrary thread calls PyGILState_Release For whatever reason, this approach _increased CPU usage_ on a loaded server. Latency was fine, throughput the same, and the delay in actual GIL acquisition was ok. 
I suspect that the problem lies with the dynamic acquisition of a thread state, and other initialization that may occur. I did experiment with having a cache of unused threadstates on the ready for external threads, but it didn't get me anywhere. This could also be the result of cache thrashing or something that doesn't show up immediately on a multicore cpu. Alternative b) Callback comes on arbitrary thread external thread callse PyEval_SchedulePendingCall() This grabs a static lock, puts in a record, and signals to python that something needs to be done immediately. external thread calls a custom function to interrupt the main thread in the IO bound application, currently most likely sleeping in a WaitForMultipleObjects() with a timeout. Main thread wakes up from its sleep (if it was sleeping). Main thread runs python code, causing it to immediately service the scheduled pending call, causing it to perform the wait. In reality, StacklessIO uses a slight variation of the above: StacklessIO dispatch system Callback comes on arbitrary thread external thread schedules a completion event in its own "dispatch" buffer to be serviced by the main thread. This is protected by its own lock, and doesn't need the GIL. external thread callse PyEval_SchedulePendingCall() to "tick" the dispatch buffer external thread calls a custom function to interrupt the main thread in the IO bound application, currently most likely sleeping in a WaitForMultipleObjects() with a timeout. If main thread is sleeping: Main thread wakes up from its sleep Immediately at after sleeping, the main thread will 'tick' the dispatch queue After ticking, tasklets may have been made runnable, so the main thread may continue out into the main loop of the application to do work. If not, it may continue sleeping. Main thread runs python code, causing it to immediately service the scheduled pending call, which will tick the dispatch queue. This may be a no-op if the main thread was sleeping and was already ticked. The issue we were facing was not with latency (although grabbing the GIL when the main thread is busy is slower than notifying it of a pending call), but with unexplained increased cpu showing up. A proxy node servicing 2000 clients or upwards would suddenly double or triple its cpu. The reason I'm mentioning this here is that this is important. We have spent quite some time and energy on trying to figure out the most efficient way to complete IOCP from an arbitrary thread and this is the end result. Perhaps things can be done to improve this. Also, it is really important to study these things under real load, experience has shown me that the most innocuous changes that work well in the lab suddenly start behaving strangely in the field. From glyph at twistedmatrix.com Wed Oct 31 11:10:18 2012 From: glyph at twistedmatrix.com (Glyph) Date: Wed, 31 Oct 2012 03:10:18 -0700 Subject: [Python-ideas] Async API: some code to review In-Reply-To: References: Message-ID: <64FEC457-659B-47DB-BE0E-E830F61F4200@twistedmatrix.com> Finally getting around to this one... I am sorry if I'm repeating any criticism that has already been rehashed in this thread. There is really a deluge of mail here and I can't keep up with it. I've skimmed some of it and avoided or noted things that I did see mentioned, but I figured I should write up something before next week. 
To make a long story short, my main points here are: I think tulip unfortunately has a lot of the problems I tried to describe in earlier messages, it would be really great if we could have a core I/O interface that we could use for interoperability with Twisted before bolting a requirement for coroutine trampolines on to everything, twisted-style protocol/transport separation is really important and this should not neglect it. As I've tried to illustrate in previous messages, an API where applications have to call send() or recv() is just not going to behave intuitively in edge cases or perform well, I know it's a prototype, but this isn't such an unexplored area that it should be developed without TDD: all this code should both have tests and provide testing support to show how applications that use it can be tested the scheduler module needs some example implementation of something like Twisted's gatherResults for me to critique its expressiveness; it looks like it might be missing something in the area of one task coordinating multiple others but I can't tell On Oct 28, 2012, at 4:52 PM, Guido van Rossum wrote: > The pollster has a very simple API: add_reader(fd, callback, *args), > add_writer(), remove_reader(fd), remove_writer(fd), and > poll(timeout) -> list of events. (fd means file descriptor.) There's > also pollable() which just checks if there are any fds registered. My > implementation requires fd to be an int, but that could easily be > extended to support other types of event sources. I don't see how that is. All of the mechanisms I would leverage within Twisted to support other event sources are missing (e.g.: abstract interfaces for those event sources). Are you saying that a totally different pollster could just accept a different type to add_reader, and not an integer? If so, how would application code know how to construct something else. > I'm not super happy that I have parallel reader/writer APIs, but passing a separate read/write flag didn't come out any more elegant, and I don't foresee other operation types (though I may be wrong). add_reader and add_writer is an important internal layer of the API for UNIX-like operating systems, but the design here is fundamentally flawed in that application code (e.g. echosvr.py) needs to import concrete socket-handling classes like SocketTransport and BufferedReader in order to synthesize a transport. These classes might need to vary their behavior significantly between platforms, and application code should not be manipulating them unless there is a serious low-level need to. It looks like you've already addressed the fact that some transports need to be platform-specific. That's not quite accurate, unless you take a very broad definition of "platform". In Twisted, the basic socket-based TCP transport is actually supported across all platforms; but some other *APIs* (well, let's be honest, right now, just IOCP, but there have been others, such as java's native I/O APIs under Jython, in the past). You have to ask the "pollster" (by which I mean: reactor) for transport objects, because different multiplexing mechanisms can require different I/O APIs, even for basic socket I/O. This is why I keep talking about IOCP. It's not that Windows is particularly great, but that the IOCP API, if used correctly, is fairly alien, and is a good proxy for other use-cases which are less direct to explain, like interacting with GUI libraries where you need to interact with the GUI's notion of a socket to get notifications, rather than a raw FD. 
(GUI libraries often do this because they have to support Windows and therefore IOCP.) Others in this thread have already mentioned the fact that ZeroMQ requires the same sort of affordance. This is really a design error on 0MQ's part, but, you have to deal with it anyway ;-). More importantly, concretely tying everything to sockets is just bad design. You want to be able to operate on pipes and PTYs (which need to call read(), or, a bunch of gross ioctl()s and then read(), not recv()). You want to be able to able to operate on these things in unit tests without involving any actual file descriptors or syscalls. The higher level of abstraction makes regular application code a lot shorter, too: I was able to compress echosvr.py down to 22 lines by removing all the comments and logging and such, but that is still more than twice as long as the (9 line) echo server example on the front page of . It's closer in length to the (19 line) full line-based publish/subscribe protocol over on the third tab. Also, what about testing? You want to be able to simulate the order of responses of multiple syscalls to coerce your event-driven program to receive its events in different orders. One of the big advantages of event driven programming is that everything's just a method call, so your unit tests can just call the methods to deliver data to your program and see what it does, without needing to have a large, elaborate simulation edifice to pretend to be a socket. But, once you mix in the magic of the generator trampoline, it's somewhat hard to assemble your own working environment without some kind of test event source; at least, it's not clear to me how to assemble a Task without having a pollster anywhere, or how to make my own basic pollster for testing. > The event loop has two basic ways to register callbacks: > call_soon(callback, *args) causes callback(*args) to be called the > next time the event loop runs; call_later(delay, callback, *args) > schedules a callback at some time (relative or absolute) in the > future. "relative or absolute" is hiding the whole monotonic-clocks discussion behind a simple phrase, but that probably does not need to be resolved here... I'll let you know if we ever figure it out :). > sockets.py: http://code.google.com/p/tulip/source/browse/sockets.py > > This implements some internet primitives using the APIs in > scheduling.py (including block_r() and block_w()). I call them > transports but they are different from transports Twisted; they are > closer to idealized sockets. SocketTransport wraps a plain socket, > offering recv() and send() methods that must be invoked using yield > from. I feel I should note that these methods behave inconsistently; send() behaves as sendall(), re-trying its writes until it receives a full buffer, but recv() may yield a short read. (But most importantly, block_r and block_w are insufficient as primitives; you need a separate pollster that uses write_then_block(data) and read_then_block() too, which may need to dispatch to WSASend/WSARecv or WriteFile/ReadFile.) > SslTransport wraps an ssl socket (luckily in Python 2.6 and up, > stdlib ssl sockets have good async support!). stdlib ssl sockets have async support that makes a number of UNIX-y assumptions. The wrap_socket trick doesn't work with IOCP, because the I/O operations are initiated within the SSL layer, and therefore can't be associated with a completion port, so they won't cause a queued completion status trigger and therefore won't wake up the loop. 
This plagued us for many years within Twisted and has only relatively recently been fixed: . Since probably 99% of the people on this list don't actually give a crap about Windows, let me give a more practical example: you can't do SSL over a UNIX pipe. Off the top of my head, this means you can't write a command-line tool to encrypt a connection via a shell pipeline, but there are many other cases where you'd expect to be able to get arbitrary I/O over stdout. It's reasonable, of course, for lots of Python applications to not care about high-performance, high-concurrency SSL on Windows,; select() works okay for many applications on Windows. And most SSL happens on sockets, not pipes, hence the existence of the OpenSSL API that the stdlib ssl module exposes for wrapping sockets. But, as I'll explain in a moment, this is one reason that it's important to be able to give your code a turbo boost with Twisted (or other third-party extensions) once you start encountering problems like this. > I don't particularly care about the exact abstractions in this module; > they are convenient and I was surprised how easy it was to add SSL, > but still these mostly serve as somewhat realistic examples of how to > use scheduling.py. This is where I think we really differ. I think that the whole attempt to build a coroutine scheduler at the low level is somewhat misguided and will encourage people to write misleading, sloppy, incorrect programs that will be tricky to debug (although, to be fair, not quite as tricky as even more misleading/sloppy/incorrect multi-threaded ones). However, I'm more than happy to agree to disagree on this point: clearly you think that forests of yielding coroutines are a big part of the future of Python. Maybe you're even right to do so, since I have no interest in adding language features, whereas if you hit a rough edge in 'yield' syntax you can sand it off rather than living with it. I will readily concede that 'yield from' and 'return' are nicer than the somewhat ad-hoc idioms we ended up having to contend with in the current iteration of @inlineCallbacks. (Except for the exit-at-a-distance problem, which it doesn't seem that return->StopIteration addresses - does this happen, with PEP-380 generators? ) What I'm not happy to disagree about is the importance of a good I/O abstraction and interoperation layer. Twisted is not going away; there are oodles of good reasons that it's built the way it is, as I've tried to describe in this and other messages, and none of our plans for its future involve putting coroutine trampolines at the core of the event loop; those are just fine over on the side with inlineCallbacks. However, lots of Python programmers are going to use what you come up with. They'd use it even if it didn't really work, just because it's bundled in and it's convenient. But I think it'll probably work fine for many tasks, and it will appeal to lots of people new to event-driven I/O because of the seductive deception of synchronous control flow and the superiority to scheduling I/O operations with threads. 
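For readers who have not used Twisted, the transport/protocol separation referred to throughout this message is, at its core, just a pair of small interfaces. The following is a deliberately minimal sketch, illustrative only: the method names merely echo Twisted's conventions, it is not Twisted's actual API, and the event-loop wiring that would call these methods is assumed rather than shown.

    # Illustrative sketch of callback-style protocol/transport separation.
    # Not Twisted's real classes; the event loop that drives this is assumed.
    class EchoProtocol:
        def connection_made(self, transport):
            # The event loop hands the protocol a transport; the protocol
            # never calls recv()/send() and never knows whether the bytes
            # come from TCP, a pipe, TLS, or a test double.
            self.transport = transport

        def data_received(self, data):
            # Called with whatever bytes arrived, in whatever chunks.
            self.transport.write(data)

        def connection_lost(self, reason):
            pass

    # Testing needs no sockets, no scheduler, no trampoline -- just method calls:
    class FakeTransport:
        def __init__(self):
            self.written = b""
        def write(self, data):
            self.written += data

    proto = EchoProtocol()
    proto.connection_made(FakeTransport())
    proto.data_received(b"hello")
    assert proto.transport.written == b"hello"

Note how the test at the end exercises the protocol purely by calling its methods, which is exactly the testability argument made earlier in this message.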
What I think is really very important in the design of this new system is to present an API whereby:

- if someone wants to write a basic protocol or data-format parser for the stdlib, it should be easy to write it as a feed parser without needing generator coroutines (for example, if they're pushing data into a C library, they shouldn't have to write a while loop that calls recv; they should be able to just transform some data callback in Python into some data callback in C); it should be able to leverage tulip without much more work,
- if users of tulip (read: the stdlib) need access to some functionality implemented within Twisted, like an event-driven DNS client that is more scalable than getaddrinfo, they can call into Twisted without re-writing their entire program,
- if users of Twisted need to invoke some functionality implemented on top of tulip, they can construct a task and weave in a scheduler, similarly without re-writing much,
- if users of tulip want to just use Twisted to get better performance or reliability than the built-in stdlib multiplexor, they ideally shouldn't have to change anything, just run it with a different import line or something, and
- if (when) users of tulip realize that their generators have devolved into a mess of spaghetti ;-) and they need to migrate to Twisted-style event-driven callbacks and maybe some formal state machines or generated parsers to deal with their inputs, that process can be done incrementally and not in one giant shoot-the-moon effort which will make them hate Twisted.

As an added bonus, such an API would provide a great basis for Tornado and Twisted to interoperate.

It would also be nice to have a more discrete I/O layer to insulate application code from common foibles like the fact that, for example, if you call send() in tulip multiple times but forget to 'yield from ...send()', you may end up writing interleaved garbage on the connection, then raising an assertion error, but only if there's a sufficient quantity of data and it needs to block; it will otherwise appear to work, leading to bugs that only start happening when you are pushing large volumes of data through a system at rates exceeding wire speed. In other words, "only in production, only during the holiday season, only during traffic spikes, only when it's really really important for the system to keep working".

This is why I think that step 1 here needs to be a common low-level API for event-triggered operations that does not have anything to do with generators. I don't want to stop you from doing interesting things with generators, but I do really want to decouple the tasks so that their responsibilities are not unnecessarily conflated. task.unblock() is a method; protocol.data_received is a method. Both can be invoked at the same level by an event loop. Once that low-level event loop is delivering data to that callback's satisfaction, the callbacks can happily drive a coroutine scheduler, and the coroutine scheduler can have much less of a deep integration with the I/O itself; it just needs some kind of sentinel object (a Future, a Deferred) to keep track of what exactly it's waiting for.

> I'm most interested in feedback on the design of polling.py and
> scheduling.py, and to a lesser extent on the design of sockets.py;
> main.py is just an example of how this style works out in practice.

It looks to me like there's a design error in scheduling.py with respect to coordinating concurrent operations.
If you try to block on two operations at once, you'll get an assertion error ('assert not self.blocked', in block), so you can't coordinate two interesting I/O requests without spawning a bunch of new Tasks and then having them unblock their parent Task when they're done. I may just be failing to imagine how one would implement something like Twisted's gatherResults, but this looks like it would be frustrating, tedious, and involve creating lots of extra objects and making the scheduler do a bunch more work. Also, shouldn't there be a lot more real exceptions and a lot fewer assertions in this code? Relatedly, add_reader/writer will silently stomp on a previous FD registration, so if two tasks end up calling recv() on the same socket, it doesn't look like there's any way to find out that they both did that. It looks like the first task to call it will just hang forever, and the second one will "win"? What are the intended semantics? Speaking from the perspective of I/O scheduling, it will also be thrashing any stateful multiplexor with a ton of unnecessary syscalls. A Twisted protocol in normal operation just receiving data from a single connection, using, let's say, a kqueue-based multiplexor will call kevent() once to register interest, then kqueue() to block, and then just keep getting data-available notifications and processing them unless some downstream buffer fills up and the transport is told to pause producing data, at which point another kevent() gets issued. tulip, by contrast, will call kevent() over and over again, removing and then re-adding its reader repeatedly for every packet, since it can never know if someone is about to call recv() again any time soon. Once again, request/response is not the best model for retrieving data from a transport; active connections need to be prepared to receive more data at any time and not in response to any particular request. Finally, apologies for spelling / grammar errors; I didn't have a lot of time to copy-edit. -glyph -------------- next part -------------- An HTML attachment was scrubbed... URL: From glyph at twistedmatrix.com Wed Oct 31 11:20:29 2012 From: glyph at twistedmatrix.com (Glyph) Date: Wed, 31 Oct 2012 03:20:29 -0700 Subject: [Python-ideas] Async API: some code to review In-Reply-To: <508F1E87.7000307@canterbury.ac.nz> References: <20121029170731.74bd3d37@cosmocat> <01150791-F34B-4A1A-BA93-CB7B3DC48BF7@gmail.com> <508F1E87.7000307@canterbury.ac.nz> Message-ID: On Oct 29, 2012, at 5:25 PM, Greg Ewing wrote: > Andrew Svetlov wrote: > >> 0MQ socket has no file descriptor at all, it's just pointer to some >> unspecified structure. >> So 0MQ has own *poll* function which can process that sockets as well >> as file descriptors. > > Aaargh... yet another event loop that wants to rule > the world. This is not good. As a wise man once said, "everybody wants to rule the world". All event loops have their own run() API, and expect to be on top of everything, driving the loop. This is one of the central principles of Twisted's design; by not attempting to directly take control of any loop, and providing a high-level wrapper around run, and an API that would accommodate every wacky wrapper around poll and select and kqueue and GetQueuedCompletionStatus, we could be a single loop that everything can use as an API and get the advantages of whatever event driven thing is popular this week. 
You can't accomplish this by trying to force other loops to play by your rules; rather, accommodate and pave over their peculiarities and it'll be your API that their users actually write to. (In the land of Mordor, where the shadows lie.)

-glyph

From kristjan at ccpgames.com  Wed Oct 31 11:07:10 2012
From: kristjan at ccpgames.com (=?iso-8859-1?Q?Kristj=E1n_Valur_J=F3nsson?=)
Date: Wed, 31 Oct 2012 10:07:10 +0000
Subject: [Python-ideas] Async API: some code to review
In-Reply-To: 
References: 
Message-ID: 

> -----Original Message-----
> From: gvanrossum at gmail.com [mailto:gvanrossum at gmail.com] On Behalf
> Of Guido van Rossum
> Sent: 30. október 2012 16:40
> To: Kristján Valur Jónsson
> Cc: Richard Oudkerk; python-ideas at python.org
> Subject: Re: [Python-ideas] Async API: some code to review
>
> What kind of time savings are we talking about? I imagine that the
> accept() loop I put in tulip/echosvr.py is fast enough in terms of response
> time (latency) -- throughput would seem the more important measure (and I
> have no idea of this yet).
> http://code.google.com/p/tulip/source/browse/echosvr.py#37
>

To be honest, it isn't serious for applications that serve few connections, but for things like web servers, it becomes important.

Looking at your code:

a) will always "block", causing the main thread (using the term loosely here) to go once through the event loop, possibly doing other housekeeping, even if a connection was available. I don't think there is a way to selectively do completion based io, i.e. do immediate mode if possible. You either go for one or the other on windows, at least. In select based mechanisms it could be possible to do a select here first and avoid that extra loop, but for the sake of the application it might be confusing. It might be best to stick to one system.

b) will either switch to the net task immediately (possible in stackless) or cause the start of t to wait until the next round in the event loop.

In this case, t will not start executing until after going around the loop twice. A new connection can only be accepted once per loop. Imagine two http requests coming in simultaneously, at t=0.

The sequence of operations will then be this (assuming FIFO scheduling):
main loop runs
accept 1 returns. task 1 created. accept 2 scheduled
main loop runs, making task 1 and accept 2 runnable
task 1 runs. does processing. performs send, and blocks
accept 2 returns, task 2 created
main loop runs, making task 2 runnable
task 2 runs, does processing, performs send.

Contributing to latency in this scenario are all the "main loop" runs. Note that I may misunderstand the way your architecture works, perhaps there is no main loop, perhaps everything is interleaved.

An alternative is something like this:

    def loop():
        while True:
            conn, addr = yield from listener.accept()
            handler(conn, addr)

    for i in range(n_handlers):
        t = scheduling.Task(loop)
        t.start()

Here, events will be different:
main loop runs, accept 1 and accept 2 runnable
accept 1 returns, starting handler, processing and blocking on send
accept 2 returns, starting handler, processing, and blocking on send

As you see, there is only one initial housekeeping run needed to make both tasklets runnable and ready to run without interruption, giving the lowest possible total latency to the client.

In my experience with RPC systems based on this kind of asynchronous python IO, lowering the response time from when user space is made aware of the request to when python actually starts _processing_ it is critical to responsiveness.
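handler() is left undefined in the sketch above; purely as an illustration, and assuming the coroutine-style conn.recv()/conn.send() transport and the scheduling.Task wrapper discussed elsewhere in this thread, it could be shaped something like this (echo_client and n_handlers are hypothetical names, and the exact Task constructor signature is an assumption):

    # Purely illustrative: one possible per-connection handler for the
    # loop() sketch above. Assumes conn exposes coroutine-style recv()/send()
    # (invoked via yield from) and that scheduling.Task wraps a generator.
    def echo_client(conn, addr):
        try:
            while True:
                data = yield from conn.recv(4096)
                if not data:
                    break
                yield from conn.send(data)
        finally:
            conn.close()

    def handler(conn, addr):
        # Spawn a separate task so loop() can go straight back to accept().
        t = scheduling.Task(echo_client(conn, addr))
        t.start()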
Cheers From barry at python.org Wed Oct 31 11:38:53 2012 From: barry at python.org (Barry Warsaw) Date: Wed, 31 Oct 2012 11:38:53 +0100 Subject: [Python-ideas] with-statement syntactic quirk Message-ID: <20121031113853.66fb0514@resist> with-statements have a syntactic quirk, which I think would be useful to fix. This is true in Python 2.7 through 3.3, but it's likely not fixable until 3.4, unless of course it's a bug . Legal: >>> with open('/etc/passwd') as p1, open('/etc/passwd') as p2: pass Not legal: >>> with (open('/etc/passwd') as p1, open('/etc/passwd') as p2): pass Why is this useful? If you need to wrap this onto multiple lines, say to fit it within line length limits. IWBNI you could write it like this: with (open('/etc/passwd') as p1, open('/etc/passwd') as p2): pass This seems analogous to using parens to wrap long if-statements, but maybe there's some subtle corner of the grammar that makes this problematic (like 'with' treating the whole thing as a single context manager). Of course, you can wrap with backslashes, but ick! Cheers, -Barry -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: not available URL: From ncoghlan at gmail.com Wed Oct 31 12:55:54 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 31 Oct 2012 21:55:54 +1000 Subject: [Python-ideas] with-statement syntactic quirk In-Reply-To: <20121031113853.66fb0514@resist> References: <20121031113853.66fb0514@resist> Message-ID: On Wed, Oct 31, 2012 at 8:38 PM, Barry Warsaw wrote: > with-statements have a syntactic quirk, which I think would be useful to fix. > This is true in Python 2.7 through 3.3, but it's likely not fixable until 3.4, > unless of course it's a bug . > > Legal: > >>>> with open('/etc/passwd') as p1, open('/etc/passwd') as p2: pass > > Not legal: > >>>> with (open('/etc/passwd') as p1, open('/etc/passwd') as p2): pass > > Why is this useful? If you need to wrap this onto multiple lines, say to fit > it within line length limits. IWBNI you could write it like this: > > with (open('/etc/passwd') as p1, > open('/etc/passwd') as p2): > pass > > This seems analogous to using parens to wrap long if-statements, but maybe > there's some subtle corner of the grammar that makes this problematic (like > 'with' treating the whole thing as a single context manager). It's not an especially subtle corner of the grammar, it's tuples-as-context-managers (i.e. the case with no as clauses) that causes hassles: with (cmA, cmB): pass This is: a) useless (because tuples aren't context managers); but also b) legal syntax (it blows up at runtime, complaining about a missing __enter__ or __exit__ method rather than throwing SyntaxError at compile time) Adding support for line continuation with parentheses to import statements was easier, because they don't accept arbitrary subexpressions, so there was no confusion with tuples. I do think it makes sense to change the semantics of this, but I ain't volunteering to figure out the necessary Grammar changes :P Cheers, Nick. 
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From jeanpierreda at gmail.com Wed Oct 31 13:17:10 2012 From: jeanpierreda at gmail.com (Devin Jeanpierre) Date: Wed, 31 Oct 2012 08:17:10 -0400 Subject: [Python-ideas] with-statement syntactic quirk In-Reply-To: <20121031113853.66fb0514@resist> References: <20121031113853.66fb0514@resist> Message-ID: On Wed, Oct 31, 2012 at 6:38 AM, Barry Warsaw wrote: > This seems analogous to using parens to wrap long if-statements, but maybe > there's some subtle corner of the grammar that makes this problematic (like > 'with' treating the whole thing as a single context manager). This seemed kind of icky when I read it, and I think Nick Coghlan stated the reason best. Is there a reason the tokenizer can't ignore newlines and indentation/deindentation between with/etc. and the trailing colon? This would solve the problem in general, without ambiguous syntax. -- Devin From eliben at gmail.com Wed Oct 31 13:33:04 2012 From: eliben at gmail.com (Eli Bendersky) Date: Wed, 31 Oct 2012 05:33:04 -0700 Subject: [Python-ideas] with-statement syntactic quirk In-Reply-To: References: <20121031113853.66fb0514@resist> Message-ID: On Wed, Oct 31, 2012 at 5:17 AM, Devin Jeanpierre wrote: > On Wed, Oct 31, 2012 at 6:38 AM, Barry Warsaw wrote: > > This seems analogous to using parens to wrap long if-statements, but > maybe > > there's some subtle corner of the grammar that makes this problematic > (like > > 'with' treating the whole thing as a single context manager). > > This seemed kind of icky when I read it, and I think Nick Coghlan > stated the reason best. > > Is there a reason the tokenizer can't ignore newlines and > indentation/deindentation between with/etc. and the trailing colon? > This would solve the problem in general, without ambiguous syntax. > At the expense of making the tokenizer context dependent? Eli -------------- next part -------------- An HTML attachment was scrubbed... URL: From jeanpierreda at gmail.com Wed Oct 31 13:45:00 2012 From: jeanpierreda at gmail.com (Devin Jeanpierre) Date: Wed, 31 Oct 2012 08:45:00 -0400 Subject: [Python-ideas] with-statement syntactic quirk In-Reply-To: References: <20121031113853.66fb0514@resist> Message-ID: On Wed, Oct 31, 2012 at 8:33 AM, Eli Bendersky wrote: >> Is there a reason the tokenizer can't ignore newlines and >> indentation/deindentation between with/etc. and the trailing colon? >> This would solve the problem in general, without ambiguous syntax. > > At the expense of making the tokenizer context dependent? It's already context-dependent in some sense, but this wouldn't make it any moreso. For example, the tokenizer already ignores indents/dedents when inside parens/braces/brackets, and handling this only slightly more complex than that. In particular, the trailing colon is the one not inside braces or brackets. Also, I'd avoid the term "context-dependent". It sounds too similar to "context-sensitive" ! Anyway, it looks like this isn't how the tokenizer treats braces/brackets (it ignores indent/dedent, but not newlines (I guess the grammar handles those)). What I meant to suggest was, treat "with ... :" similarly to how the OP suggests treating "with (...) :". 
-- Devin From eliben at gmail.com Wed Oct 31 13:52:04 2012 From: eliben at gmail.com (Eli Bendersky) Date: Wed, 31 Oct 2012 05:52:04 -0700 Subject: [Python-ideas] with-statement syntactic quirk In-Reply-To: References: <20121031113853.66fb0514@resist> Message-ID: On Wed, Oct 31, 2012 at 5:45 AM, Devin Jeanpierre wrote: > On Wed, Oct 31, 2012 at 8:33 AM, Eli Bendersky wrote: > >> Is there a reason the tokenizer can't ignore newlines and > >> indentation/deindentation between with/etc. and the trailing colon? > >> This would solve the problem in general, without ambiguous syntax. > > > > At the expense of making the tokenizer context dependent? > > It's already context-dependent in some sense, but this wouldn't make > it any moreso. For example, the tokenizer already ignores > indents/dedents when inside parens/braces/brackets, and handling this > only slightly more complex than that. In particular, the trailing > colon is the one not inside braces or brackets. > > Also, I'd avoid the term "context-dependent". It sounds too similar to > "context-sensitive" ! > I use the two as rough synonyms. Shouldn't I? > Anyway, it looks like this isn't how the tokenizer treats > braces/brackets (it ignores indent/dedent, but not newlines (I guess > the grammar handles those)). What I meant to suggest was, treat "with > ... :" similarly to how the OP suggests treating "with (...) :". > If this gets accepted, then, is there a reason to stop at "with"? Why not ignore newlines between "if" and its trailing ":" as well? [playing devil's advocate here] Eli -------------- next part -------------- An HTML attachment was scrubbed... URL: From jeanpierreda at gmail.com Wed Oct 31 14:14:07 2012 From: jeanpierreda at gmail.com (Devin Jeanpierre) Date: Wed, 31 Oct 2012 09:14:07 -0400 Subject: [Python-ideas] with-statement syntactic quirk In-Reply-To: References: <20121031113853.66fb0514@resist> Message-ID: On Wed, Oct 31, 2012 at 8:52 AM, Eli Bendersky wrote: >> Also, I'd avoid the term "context-dependent". It sounds too similar to >> "context-sensitive" ! > > I use the two as rough synonyms. Shouldn't I? "context sensitive" has a technical meaning, in the same way that "regular" or "recursively enumerable" does. In this particular case, the technical meaning doesn't align very well with the lay / intuitive meaning, but gets used in the same place as where one might use the phrase in the lay / intuitive sense -- if you'd said "context sensitive" I would've assumed you meant it in the technical sense. I guess I can't say that you should avoid the term unless I have a replacement. Maybe just using more words would help, like saying "then the actions of the tokenizer would depend on the context"? >> >> Anyway, it looks like this isn't how the tokenizer treats >> braces/brackets (it ignores indent/dedent, but not newlines (I guess >> the grammar handles those)). What I meant to suggest was, treat "with >> ... :" similarly to how the OP suggests treating "with (...) :". > > > If this gets accepted, then, is there a reason to stop at "with"? Why not > ignore newlines between "if" and its trailing ":" as well? [playing devil's > advocate here] I'd be very confused if newlines were acceptable inside `with` but not `if` and those. I'm not seeing a downside to changing them as well, except that it makes the workload (maybe significantly?) larger. I'm not sure if it's made that much larger. In the tokenizer it's easy, maybe in the grammar it's not so easy, and I don't know if this has to be in the grammar. 
The last time I ever tried editing python's parsing rules it ended very very poorly. -- Devin From ncoghlan at gmail.com Wed Oct 31 14:22:42 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 31 Oct 2012 23:22:42 +1000 Subject: [Python-ideas] with-statement syntactic quirk In-Reply-To: References: <20121031113853.66fb0514@resist> Message-ID: On Wed, Oct 31, 2012 at 10:52 PM, Eli Bendersky wrote: > On Wed, Oct 31, 2012 at 5:45 AM, Devin Jeanpierre > wrote: >> Anyway, it looks like this isn't how the tokenizer treats >> braces/brackets (it ignores indent/dedent, but not newlines (I guess >> the grammar handles those)). What I meant to suggest was, treat "with >> ... :" similarly to how the OP suggests treating "with (...) :". > > If this gets accepted, then, is there a reason to stop at "with"? Why not > ignore newlines between "if" and its trailing ":" as well? [playing devil's > advocate here] Note that I agreed with Barry that we probably *should* change it from a principle-of-least-surprise point of view. I just called "not it" on actually figuring out how to make it work given the current Grammar definition as a starting point :) Between expression precedence control, singleton tuples, generator expressions, function calls, function parameter declarations, base class declarations, import statement grouping and probably a couple of other cases that have slipped my mind, parentheses already have plenty of different meanings in Python, and we also have plenty of places where the syntactical rules aren't quite the same as those in an ordinary expression. The thing that makes Python's parser simple is the fact that we have *prefixes* in the Grammar that make it clear when the parsing rules should change, so you don't need much lookahead at parsing time (it's deliberately limited to only 1 token, in fact). The challenge in this particular case is to avoid a Grammar ambiguity relative to ordinary expression syntax without duplicating large sections of the grammar file. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From guido at python.org Wed Oct 31 16:01:03 2012 From: guido at python.org (Guido van Rossum) Date: Wed, 31 Oct 2012 08:01:03 -0700 Subject: [Python-ideas] non-blocking buffered I/O In-Reply-To: References: <20121029170731.74bd3d37@cosmocat> <20121029222541.07c461b3@pitrou.net> <508F6144.2040105@canterbury.ac.nz> Message-ID: Modern CPUs are black boxes full of magic. I'm not too surprised that running Python code on multiple threads incurs some kind of overhead that keeping the Python interpreter in one thread avoids. On Wed, Oct 31, 2012 at 2:29 AM, Kristj?n Valur J?nsson wrote: > > >> -----Original Message----- >> From: gvanrossum at gmail.com [mailto:gvanrossum at gmail.com] On Behalf >> Of Guido van Rossum >> Sent: 30. okt?ber 2012 17:47 >> To: Kristj?n Valur J?nsson >> Cc: python-ideas at python.org >> Subject: Re: [Python-ideas] non-blocking buffered I/O >> >> On Tue, Oct 30, 2012 at 9:11 AM, Kristj?n Valur J?nsson >> wrote: >> > By the way: We found that acquiring the GIL by a random external thread >> in response to the IOCP to wake up tasklets was incredibly expensive. I >> spent a lot of effort figuring out why that is and found no real answer. The >> mechanism we now use is to let the external worker thread schedule a >> "pending call" which is serviced by the main thread at the earliest >> opportunity. Also, the main thread is interrupted if it is doing a sleep. This is >> much more efficient. >> >> In which Python version? 
>> The GIL has been redesigned at least once.
>> Also the latency (not necessarily cost) to acquire the GIL varies by the
>> sys.setswitchinterval setting. (Actually the more responsive you make it, the
>> more it will cost you in overall performance.)
>>
>> I do think that using the pending call mechanism is the right solution here.

> I am talking about 2.7, of course, the python of hard working lumberjacks everywhere :)
>
> Anyway I don't think the issue is much affected by the particular GIL implementation.
>
> Alternative a)
> Callback comes on arbitrary thread
> arbitrary thread calls PyGILState_Ensure
> (This causes a _dynamic thread state_ to be generated for the arbitrary thread, and the GIL to be subsequently acquired)
> arbitrary thread does whatever python gymnastics are required to complete the IO (wake up tasklet)
> arbitrary thread calls PyGILState_Release
>
> For whatever reason, this approach _increased CPU usage_ on a loaded server. Latency was fine, throughput the same, and the delay in actual GIL acquisition was ok. I suspect that the problem lies with the dynamic acquisition of a thread state, and other initialization that may occur. I did experiment with having a cache of unused thread states at the ready for external threads, but it didn't get me anywhere. This could also be the result of cache thrashing or something that doesn't show up immediately on a multicore cpu.
>
> Alternative b)
> Callback comes on arbitrary thread
> external thread calls PyEval_SchedulePendingCall()
> This grabs a static lock, puts in a record, and signals to python that something needs to be done immediately.
> external thread calls a custom function to interrupt the main thread in the IO bound application, currently most likely sleeping in a WaitForMultipleObjects() with a timeout.
> Main thread wakes up from its sleep (if it was sleeping).
> Main thread runs python code, causing it to immediately service the scheduled pending call, causing it to perform the wait.
>
> In reality, StacklessIO uses a slight variation of the above:
>
> StacklessIO dispatch system
> Callback comes on arbitrary thread
> external thread schedules a completion event in its own "dispatch" buffer to be serviced by the main thread. This is protected by its own lock, and doesn't need the GIL.
> external thread calls PyEval_SchedulePendingCall() to "tick" the dispatch buffer
> external thread calls a custom function to interrupt the main thread in the IO bound application, currently most likely sleeping in a WaitForMultipleObjects() with a timeout.
> If main thread is sleeping: Main thread wakes up from its sleep
> Immediately after sleeping, the main thread will 'tick' the dispatch queue
> After ticking, tasklets may have been made runnable, so the main thread may continue out into the main loop of the application to do work. If not, it may continue sleeping.
> Main thread runs python code, causing it to immediately service the scheduled pending call, which will tick the dispatch queue. This may be a no-op if the main thread was sleeping and was already ticked.
>
> The issue we were facing was not with latency (although grabbing the GIL when the main thread is busy is slower than notifying it of a pending call), but with unexplained increased cpu showing up. A proxy node servicing 2000 clients or upwards would suddenly double or triple its cpu.
>
> The reason I'm mentioning this here is that this is important.
We have spent quite some time and energy on trying to figure out the most efficient way to complete IOCP from an arbitrary thread and this is the end result. Perhaps things can be done to improve this. Also, it is really important to study these things under real load, experience has shown me that the most innocuous changes that work well in the lab suddenly start behaving strangely in the field. > > > -- --Guido van Rossum (python.org/~guido) From kristjan at ccpgames.com Wed Oct 31 16:10:10 2012 From: kristjan at ccpgames.com (=?iso-8859-1?Q?Kristj=E1n_Valur_J=F3nsson?=) Date: Wed, 31 Oct 2012 15:10:10 +0000 Subject: [Python-ideas] non-blocking buffered I/O In-Reply-To: References: <20121029170731.74bd3d37@cosmocat> <20121029222541.07c461b3@pitrou.net> <508F6144.2040105@canterbury.ac.nz> Message-ID: > -----Original Message----- > From: gvanrossum at gmail.com [mailto:gvanrossum at gmail.com] On Behalf > Of Guido van Rossum > Sent: 31. okt?ber 2012 15:01 > To: Kristj?n Valur J?nsson > Cc: python-ideas at python.org > Subject: Re: [Python-ideas] non-blocking buffered I/O > > Modern CPUs are black boxes full of magic. I'm not too surprised that running > Python code on multiple threads incurs some kind of overhead that keeping > the Python interpreter in one thread avoids. > Ah, but I forgot to mention one weird thing: If we used a pool of threads for the callbacks, and pre-initalized those threads with python states, and then acquired the GIL using PyEval_RestoreThread(), then this overhead went away. It was only the dynamic tread state acquired using PyGilState_Ensure() that caused cpu overhead. Using the fixed pool was not acceptable in the long run, in particular we din't want to complicate things to another level by adding a thread pool manger to the whole thing when the OS is fully capable of providing an external callback thread. I regret not spending more time on this and to be able to provide an actual performance analysis and fix. Instead I have to be that weird old man in the tavern uttering inscrutable warnings that no young adventurer pays any attention to :) K From guido at python.org Wed Oct 31 16:13:46 2012 From: guido at python.org (Guido van Rossum) Date: Wed, 31 Oct 2012 08:13:46 -0700 Subject: [Python-ideas] with-statement syntactic quirk In-Reply-To: <20121031113853.66fb0514@resist> References: <20121031113853.66fb0514@resist> Message-ID: Honestly, is a backslash going to kill you? -- --Guido van Rossum (python.org/~guido) From ncoghlan at gmail.com Wed Oct 31 16:28:01 2012 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 1 Nov 2012 01:28:01 +1000 Subject: [Python-ideas] with-statement syntactic quirk In-Reply-To: References: <20121031113853.66fb0514@resist> Message-ID: On Thu, Nov 1, 2012 at 1:13 AM, Guido van Rossum wrote: > Honestly, is a backslash going to kill you? Aye, given the cost-benefit ratio on this one, I'll be rather surprised if anyone ever actually fixes it. I just wanted to be clear that I'm not *philosophically* opposed to fixing it (since I think Barry's proposed behaviour makes more sense from a user perspective), I'm just fairly sure it's likely to be hard to fix without making the Grammar harder to maintain, which *would* be a difficult sell for something that's a relatively trivial wart :) Cheers, Nick. 
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From guido at python.org Wed Oct 31 16:37:01 2012 From: guido at python.org (Guido van Rossum) Date: Wed, 31 Oct 2012 08:37:01 -0700 Subject: [Python-ideas] Async API: some code to review In-Reply-To: References: Message-ID: Ok, this is a good point: the more you can do without having to go through the main loop again the better. I already took this to heart in my recent rewrites of recv() and send() -- they try to read/write the underlying socket first, and if it works, the task isn't suspended; only if they receive EAGAIN or something similar do they block the task and go back to the top. In fact, Listener.accept() does the same thing -- meaning the loop can go around many times without blocking a single time. (The listening socket is in non-blocking mode so accept() will raise EAGAIN when there *isn't* another client connection ready immediately.) This is also one of the advantages of yield-from; you *never* go back to the end of the ready queue just to invoke another layer of abstraction. (Steve tries to approximate this by running the generator immediately until the first yield, but the caller still ends up suspending to the scheduler, because they are using yield which doesn't avoid the suspension, unlike yield-from.) --Guido On Wed, Oct 31, 2012 at 3:07 AM, Kristj?n Valur J?nsson wrote: > > >> -----Original Message----- >> From: gvanrossum at gmail.com [mailto:gvanrossum at gmail.com] On Behalf >> Of Guido van Rossum >> Sent: 30. okt?ber 2012 16:40 >> To: Kristj?n Valur J?nsson >> Cc: Richard Oudkerk; python-ideas at python.org >> Subject: Re: [Python-ideas] Async API: some code to review >> >> What kind of time savings are we talking about? I imagine that the >> accept() loop I put in tulip/echosvr.py is fast enough in terms of response >> time (latency) -- throughput would seem the more important measure (and I >> have no idea of this yet). >> http://code.google.com/p/tulip/source/browse/echosvr.py#37 >> > To be honest, it isn't serious for applications that serve few connections, but for things like web servers, It becomes important. > Looking at your code: > c > > a) will always "block", causing the main thread (using the term loosely here) to once through the event loop, possibly doing other housekeepeing, even if a connection was available. I don't think there is no way to selectively do completion based io, i.e. do immediate mode if possible. You either go for one or the other on windows, at least. in select based mecanisms it could be possible to do a select here first and avoid that extra loop, but for the sake of the application it might be confusing. It might be best to stick to one system. > b) will either switch to the net task immediately (possible in stackless) or cause the srtart of t to wait until the next round in the event loop. > > I this case, t will not start executing until after going around the loop twice. A new connection can only be accepted each loop. Imagine two http requests coming in simultaneously, at t=0 > > The sequence of operations will then be this (assuming FIFO scheduling) > main loop runs > accept 1 returns. task 1 created. accept 2 scheduled > main loop runs making task 1 and accep2 runnable > task 1 runs. does processing. performs send, and blocks > accept2 returns, task2 created > main loop runs, making task2 runnable > task2 runs, does processing, performs send. > > Contributing to latency in this scenario are all the "main loop" runs. 
Note that I may misunderstand the way your architecture works, perhaps there is no main loop, perhaps everything is interleaved. > > An alternative something like this: > def loop(): > while True: > conn, addr = yield from listener.accept() > handler(conn, addr) > for I in range(n_handlers): > t = scheduling.Task(loop) > t.start() > > > Here, events will be different: > main loop runs, accept 1 and accept 2 runnable > accept 1 returns, stariting handler, processing and blocking on send > accept 2 returns, starting handler, processing, and blocking on send > > As you see, there is only one initial housekeeping run needed to make both tasklets runnable and ready to run without interruption, giving the lowest possible total latency to the client. > > In my expericene with RPC systems based this kind of asynchronous python IO, lowering the response time from when user space is made aware of the request and when python actually starts _processing_ it is critical to responsiveness.. > > Cheers > > -- --Guido van Rossum (python.org/~guido) From guido at python.org Wed Oct 31 16:42:28 2012 From: guido at python.org (Guido van Rossum) Date: Wed, 31 Oct 2012 08:42:28 -0700 Subject: [Python-ideas] with-statement syntactic quirk In-Reply-To: References: <20121031113853.66fb0514@resist> Message-ID: On Wed, Oct 31, 2012 at 8:28 AM, Nick Coghlan wrote: > On Thu, Nov 1, 2012 at 1:13 AM, Guido van Rossum wrote: >> Honestly, is a backslash going to kill you? > > Aye, given the cost-benefit ratio on this one, I'll be rather > surprised if anyone ever actually fixes it. I just wanted to be clear > that I'm not *philosophically* opposed to fixing it (since I think > Barry's proposed behaviour makes more sense from a user perspective), > I'm just fairly sure it's likely to be hard to fix without making the > Grammar harder to maintain, which *would* be a difficult sell for > something that's a relatively trivial wart :) Yeah, the problem is that when you see a '(' immediately after 'with', you don't know whether that's just the start of a parenthesized expression or the start of a (foo as bar, blah as blabla) syntactic construct. -- --Guido van Rossum (python.org/~guido) From Steve.Dower at microsoft.com Wed Oct 31 16:51:35 2012 From: Steve.Dower at microsoft.com (Steve Dower) Date: Wed, 31 Oct 2012 15:51:35 +0000 Subject: [Python-ideas] Async API: some code to review In-Reply-To: References: Message-ID: Guido van Rossum wrote: > This is also one of the advantages of yield-from; you *never* go back to the end > of the ready queue just to invoke another layer of abstraction. (Steve tries to > approximate this by running the generator immediately until the first yield, but > the caller still ends up suspending to the scheduler, because they are using > yield which doesn't avoid the suspension, unlike yield-from.) This is easily changed by modifying lines 141 and 180 of scheduler.py to call _step() directly instead of requeuing it. The reason why it currently requeues the task is that there is no guarantee that the caller wanted the next step to occur in the same scheduler, whether because the completed operation or a previous one continued somewhere else. (I removed the option to attach this information to the Future itself, but it is certainly of value in some circumstances, though mostly involving threads and not necessarily sockets.) 
The change I would probably make here is to test self.target and only requeue if it is different to the current scheduler (alternatively, a scheduler could implement its submit() to do this). Yes, this adds a little more overhead, but I'm still convinced that in general the operations being blocked on will take long enough for it to be insignificant. (And of course using a mechanism to bypass the decorator and use 'yield from' also avoids this overhead, though it potentially changes the program's behaviour). Cheers, Steve From kristjan at ccpgames.com Wed Oct 31 16:59:18 2012 From: kristjan at ccpgames.com (=?iso-8859-1?Q?Kristj=E1n_Valur_J=F3nsson?=) Date: Wed, 31 Oct 2012 15:59:18 +0000 Subject: [Python-ideas] Async API: some code to review In-Reply-To: References: Message-ID: > -----Original Message----- > From: gvanrossum at gmail.com [mailto:gvanrossum at gmail.com] On Behalf > Of Guido van Rossum > Sent: 31. okt?ber 2012 15:37 > To: Kristj?n Valur J?nsson > Cc: Richard Oudkerk; python-ideas at python.org > Subject: Re: [Python-ideas] Async API: some code to review > > Ok, this is a good point: the more you can do without having to go through > the main loop again the better. > > I already took this to heart in my recent rewrites of recv() and > send() -- they try to read/write the underlying socket first, and if it works, > the task isn't suspended; only if they receive EAGAIN or something similar do > they block the task and go back to the top. Yes, this is possible for non-blocking style IO. However, for IO architectures that are based on completions, you can't always mix and match. On windows, for example it is complicated to do because of how AcceptEx works. I recall socket properties, overlapped property and other things interfering. I also recall testing the use of first trying non-blocking IO (for accept and send/recv) and then resorting to an IOCP call. If I recall correctly, the added overhead of trying a non-blocking call for the usual case of it failing was detrimental to the whole exercise. the non-blocking IO calls took non-trivial time to complete. The approach of having multiple "threads" doing accept also avoids the delay required to dispatch the request from the accepting thread to the worker thread. > In fact, Listener.accept() does the same thing -- meaning the loop can go > This is also one of the advantages of yield-from; you *never* go back to the > end of the ready queue just to invoke another layer of abstraction. My experience with this stuff is of course based on stackless/gevent style programming, so some of it may not apply :) Personally, I feel that things should just magically work, from the programmer's point of view, rather than have to manually leave a trace of breadcrumbs through the stack using "yield" constructs. But that's just me. K From him at online.de Wed Oct 31 17:37:58 2012 From: him at online.de (=?ISO-8859-1?Q?Joachim_K=F6nig?=) Date: Wed, 31 Oct 2012 17:37:58 +0100 Subject: [Python-ideas] with-statement syntactic quirk In-Reply-To: References: <20121031113853.66fb0514@resist> Message-ID: <509153E6.8020107@online.de> On 31/10/2012 16:42, Guido van Rossum wrote: > Yeah, the problem is that when you see a '(' immediately after 'with', > you don't know whether that's just the start of a parenthesized > expression or the start of a (foo as bar, blah as blabla) syntactic > construct. but couldn't "with" be interpreted as an additional kind of opening parantheses (and "if", "for", "while", "elif", "else" too) and the ":" as the closing one? 
I'm sure this has been asked a number of times but I couldn't find an answer. Joachim From barry at python.org Wed Oct 31 17:51:53 2012 From: barry at python.org (Barry Warsaw) Date: Wed, 31 Oct 2012 17:51:53 +0100 Subject: [Python-ideas] with-statement syntactic quirk References: <20121031113853.66fb0514@resist> Message-ID: <20121031175153.1d49db40@resist> On Oct 31, 2012, at 09:55 PM, Nick Coghlan wrote: >It's not an especially subtle corner of the grammar, it's >tuples-as-context-managers (i.e. the case with no as clauses) that >causes hassles: > > with (cmA, cmB): > pass > >This is: a) useless (because tuples aren't context managers); but also >b) legal syntax (it blows up at runtime, complaining about a missing >__enter__ or __exit__ method rather than throwing SyntaxError at >compile time) So clearly we need to make tuples proper context managers . -Barry -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: not available URL: From benjamin at python.org Wed Oct 31 18:04:40 2012 From: benjamin at python.org (Benjamin Peterson) Date: Wed, 31 Oct 2012 17:04:40 +0000 (UTC) Subject: [Python-ideas] with-statement syntactic quirk References: <20121031113853.66fb0514@resist> Message-ID: Nick Coghlan writes: > I do think it makes sense to change the semantics of this, but I ain't > volunteering to figure out the necessary Grammar changes :P It would not be difficult to special in AST construction. We do this for some other things already. From python at mrabarnett.plus.com Wed Oct 31 18:25:26 2012 From: python at mrabarnett.plus.com (MRAB) Date: Wed, 31 Oct 2012 17:25:26 +0000 Subject: [Python-ideas] with-statement syntactic quirk In-Reply-To: References: <20121031113853.66fb0514@resist> Message-ID: <50915F06.8020203@mrabarnett.plus.com> On 2012-10-31 13:22, Nick Coghlan wrote: > On Wed, Oct 31, 2012 at 10:52 PM, Eli Bendersky wrote: >> On Wed, Oct 31, 2012 at 5:45 AM, Devin Jeanpierre >> wrote: >>> Anyway, it looks like this isn't how the tokenizer treats >>> braces/brackets (it ignores indent/dedent, but not newlines (I guess >>> the grammar handles those)). What I meant to suggest was, treat "with >>> ... :" similarly to how the OP suggests treating "with (...) :". >> >> If this gets accepted, then, is there a reason to stop at "with"? Why not >> ignore newlines between "if" and its trailing ":" as well? [playing devil's >> advocate here] > > Note that I agreed with Barry that we probably *should* change it from > a principle-of-least-surprise point of view. I just called "not it" on > actually figuring out how to make it work given the current Grammar > definition as a starting point :) > > Between expression precedence control, singleton tuples, generator > expressions, function calls, function parameter declarations, base > class declarations, import statement grouping and probably a couple of > other cases that have slipped my mind, parentheses already have plenty > of different meanings in Python, and we also have plenty of places > where the syntactical rules aren't quite the same as those in an > ordinary expression. > > The thing that makes Python's parser simple is the fact that we have > *prefixes* in the Grammar that make it clear when the parsing rules > should change, so you don't need much lookahead at parsing time (it's > deliberately limited to only 1 token, in fact). 
The challenge in this > particular case is to avoid a Grammar ambiguity relative to ordinary > expression syntax without duplicating large sections of the grammar > file. > Another possibility could be to allow a tuple of context managers and a tuple of names: with (open('/etc/passwd'), open('/etc/passwd')) as (p1, p2): pass meaning: with open('/etc/passwd') as p1: with open('/etc/passwd')) as p2: pass or perhaps more correctly: with open('/etc/passwd') as temp_1: with open('/etc/passwd')) as temp_2: p1, p2 = temp_1, temp_2 pass It would also allow: with (cmA, cmB): pass meaning: with cmA: with cmB: pass From dholth at gmail.com Wed Oct 31 19:04:45 2012 From: dholth at gmail.com (Daniel Holth) Date: Wed, 31 Oct 2012 14:04:45 -0400 Subject: [Python-ideas] Allowing semver in packaging Message-ID: Or Changing the Version Comparison Module in Distutils (again) We've discussed a bit on distutils-sig about allowing http://semver.org/versions in Python packages. Ronald's suggestion to replace - with ~ as a filename parts separator made me think of it again, because semver also uses the popular - character. The gist of semver: Major.Minor.Patch (always 3 numbers) 1.0.0-prerelease.version 1.0.0+postrelease.version 1.0.0-pre+post And the big feature: no non-lexicographical sorting. Right now, setuptools replaces every run of non-alphanumeric characters in versions (and in project names) to a single dash (-). This would have to change to at least allow +, and the regexp for recognizing an installed dist would have to allow the plus as well. Current setuptools handling: def safe_version(version): version = version.replace(' ','.') return re.sub('[^A-Za-z0-9.]+', '-', version) Semver would be an upgrade from the existing conventions because it is easy to remember (no special-case sorting for forgettable words 'dev' and 'post'), because you can legally put Mercurial revision numbers in your package's pre- or post- version, and because the meaning of Major, Minor, Patch is defined. For compatibility I think it would be enough to say you could not mix semver and PEP-386 in the same major.minor.patch release. Vinay Sajip's distlib has some experimental support for semver. -------------- next part -------------- An HTML attachment was scrubbed... URL: From solipsis at pitrou.net Wed Oct 31 20:29:50 2012 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 31 Oct 2012 20:29:50 +0100 Subject: [Python-ideas] with-statement syntactic quirk References: <20121031113853.66fb0514@resist> Message-ID: <20121031202950.4cb3a7f9@pitrou.net> On Wed, 31 Oct 2012 11:38:53 +0100 Barry Warsaw wrote: > with-statements have a syntactic quirk, which I think would be useful to fix. > This is true in Python 2.7 through 3.3, but it's likely not fixable until 3.4, > unless of course it's a bug . > > Legal: > > >>> with open('/etc/passwd') as p1, open('/etc/passwd') as p2: pass > > Not legal: > > >>> with (open('/etc/passwd') as p1, open('/etc/passwd') as p2): pass > > Why is this useful? If you need to wrap this onto multiple lines, say to fit > it within line length limits. IWBNI you could write it like this: > > with (open('/etc/passwd') as p1, > open('/etc/passwd') as p2): > pass This bit me a couple of days ago. +1 for supporting it. Regards Antoine. 
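Worth noting in passing: for the immediate practical problem of spreading several context managers over multiple lines, contextlib.ExitStack (new in 3.3) already works today, with no backslashes and no grammar change. An illustrative sketch:

    # Illustrative workaround: contextlib.ExitStack (added in Python 3.3)
    # enters any number of context managers without one long with-line.
    from contextlib import ExitStack

    with ExitStack() as stack:
        p1 = stack.enter_context(open('/etc/passwd'))
        p2 = stack.enter_context(open('/etc/passwd'))
        pass  # both files are closed when the block exits, in reverse order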
From arnodel at gmail.com Wed Oct 31 22:03:26 2012 From: arnodel at gmail.com (Arnaud Delobelle) Date: Wed, 31 Oct 2012 21:03:26 +0000 Subject: [Python-ideas] with-statement syntactic quirk In-Reply-To: <20121031113853.66fb0514@resist> References: <20121031113853.66fb0514@resist> Message-ID: On 31 October 2012 10:38, Barry Warsaw wrote: > with-statements have a syntactic quirk, which I think would be useful to fix. > This is true in Python 2.7 through 3.3, but it's likely not fixable until 3.4, > unless of course it's a bug . > > Legal: > >>>> with open('/etc/passwd') as p1, open('/etc/passwd') as p2: pass > > Not legal: > >>>> with (open('/etc/passwd') as p1, open('/etc/passwd') as p2): pass > > Why is this useful? If you need to wrap this onto multiple lines, say to fit > it within line length limits. IWBNI you could write it like this: > > with (open('/etc/passwd') as p1, > open('/etc/passwd') as p2): > pass > > This seems analogous to using parens to wrap long if-statements, but maybe > there's some subtle corner of the grammar that makes this problematic (like > 'with' treating the whole thing as a single context manager). > > Of course, you can wrap with backslashes, but ick! No need for backslashes, just put the brackets in the right place: with ( open('/etc/passwd')) as p1, ( open('/etc/passwd')) as p2: pass ;) -- Arnaud From guido at python.org Wed Oct 31 22:18:28 2012 From: guido at python.org (Guido van Rossum) Date: Wed, 31 Oct 2012 14:18:28 -0700 Subject: [Python-ideas] Async API: some code to review In-Reply-To: References: Message-ID: On Wed, Oct 31, 2012 at 8:51 AM, Steve Dower wrote: > Guido van Rossum wrote: >> This is also one of the advantages of yield-from; you *never* go back to the end >> of the ready queue just to invoke another layer of abstraction. (Steve tries to >> approximate this by running the generator immediately until the first yield, but >> the caller still ends up suspending to the scheduler, because they are using >> yield which doesn't avoid the suspension, unlike yield-from.) > > This is easily changed by modifying lines 141 and 180 of scheduler.py to call _step() directly instead of requeuing it. The reason why it currently requeues the task is that there is no guarantee that the caller wanted the next step to occur in the same scheduler, whether because the completed operation or a previous one continued somewhere else. (I removed the option to attach this information to the Future itself, but it is certainly of value in some circumstances, though mostly involving threads and not necessarily sockets.) I think you are missing the point. Even if you don't make a roundtrip through the queue, *each* yield statement, if it is executed at all, must transfers control to the scheduler. What you're proposing is just making the scheduler immediately resume the generator. So, if you have a trivial task, like this: @async def trivial(x): return x yield # Unreachable, but makes it a generator and a caller: @async caller(): foo = yield trivial(42) print(foo) then the call to trivial(42) returns a Future that already has the result 42 set in it. But caller() still suspends to the scheduler, yielding that Future. The scheduler can resume caller() immediately but the damage (overhead) is done. 
In contrast, in the yield-from world, we'd write this def trivial(x): return x yield from () # Unreachable def caller(): foo = yield from trivial(42) print(foo) where the latter expands roughly to the following, without reference to the scheduler at all: def caller(): _gen = trivial(42) try: while True: _val = next(_gen) yield _val except StopIteration as _exc: foo = _exc.value print(foo) The first next(gen) call raises StopIteration so the yield is never reached -- the scheduler doesn't know that any of this is going in. And there's no need to do anything special to advance the generator to the first yield manually either. (It's different of course when a generator is wrapped in a Task() constructor. But that should be relatively rare.) > The change I would probably make here is to test self.target and only requeue if it is different to the current scheduler (alternatively, a scheduler could implement its submit() to do this). Yes, this adds a little more overhead, but I'm still convinced that in general the operations being blocked on will take long enough for it to be insignificant. (And of course using a mechanism to bypass the decorator and use 'yield from' also avoids this overhead, though it potentially changes the program's behaviour). Just get with the program and use yield-from exclusively. -- --Guido van Rossum (python.org/~guido) From yselivanov.ml at gmail.com Wed Oct 31 22:31:02 2012 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Wed, 31 Oct 2012 17:31:02 -0400 Subject: [Python-ideas] Async API: some code to review In-Reply-To: References: Message-ID: <56B33F9F-0525-46D5-8C53-55EF51C8A319@gmail.com> On 2012-10-31, at 5:18 PM, Guido van Rossum wrote: > @async > def trivial(x): > return x > yield # Unreachable, but makes it a generator FWIW, just a crazy comment: if we make @async decorator to clone the code object of a passed function and set its (co_flags | 0x0020), then any passed function becomes a generator, even if it doesn't have yields/yield-froms ;) - Yury From Steve.Dower at microsoft.com Wed Oct 31 22:31:58 2012 From: Steve.Dower at microsoft.com (Steve Dower) Date: Wed, 31 Oct 2012 21:31:58 +0000 Subject: [Python-ideas] Async API: some code to review In-Reply-To: References: Message-ID: Guido van Rossum wrote: > Just get with the program and use yield-from exclusively. I didn't realise there was a "program" here, just a discussion about an API design. I've already raised my concerns with using yield from exclusively, but since the performance argument trumps all of those then there is little more I can contribute. When a final design begins to stabilise, I will see how I can make use of it in my own code. Until then, I'll continue using Futures, which are ideal for my current needs. I won't be forcing 'yield from' onto my users until its usage is clear and I can provide them with suitable guidance. Cheers, Steve From andrew.svetlov at gmail.com Wed Oct 31 22:34:02 2012 From: andrew.svetlov at gmail.com (Andrew Svetlov) Date: Wed, 31 Oct 2012 23:34:02 +0200 Subject: [Python-ideas] Async API: some code to review In-Reply-To: <56B33F9F-0525-46D5-8C53-55EF51C8A319@gmail.com> References: <56B33F9F-0525-46D5-8C53-55EF51C8A319@gmail.com> Message-ID: Yury, you are really the crazy hacker. Not sure tricks with patching bytecode etc are good for standard library. 
On Wed, Oct 31, 2012 at 11:31 PM, Yury Selivanov wrote: > On 2012-10-31, at 5:18 PM, Guido van Rossum wrote: > > > @async > > def trivial(x): > > return x > > yield # Unreachable, but makes it a generator > > FWIW, just a crazy comment: if we make @async decorator to clone > the code object of a passed function and set its (co_flags | 0x0020), > then any passed function becomes a generator, even if it doesn't > have yields/yield-froms ;) > > - > Yury > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -- Thanks, Andrew Svetlov -------------- next part -------------- An HTML attachment was scrubbed... URL: From yselivanov.ml at gmail.com Wed Oct 31 22:41:51 2012 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Wed, 31 Oct 2012 17:41:51 -0400 Subject: [Python-ideas] Async API: some code to review In-Reply-To: References: <56B33F9F-0525-46D5-8C53-55EF51C8A319@gmail.com> Message-ID: <2CD13A7E-636D-49FC-8C90-F7C5887584F5@gmail.com> On 2012-10-31, at 5:34 PM, Andrew Svetlov wrote: > Yury, you are really the crazy hacker. Not sure tricks with patching bytecode etc are good for standard library. I know that I sort of created an image for myself of "a guy who solves any problem by patching opcodes on live code", but don't worry, I'll never ever recommend such solutions for stdlib/python :) This is, however, a nice technique to rapidly prototype and test interesting ideas. - Yury From guido at python.org Wed Oct 31 22:51:47 2012 From: guido at python.org (Guido van Rossum) Date: Wed, 31 Oct 2012 14:51:47 -0700 Subject: [Python-ideas] Async API: some code to review In-Reply-To: References: Message-ID: On Wed, Oct 31, 2012 at 2:31 PM, Steve Dower wrote: > Guido van Rossum wrote: >> Just get with the program and use yield-from exclusively. > > I didn't realise there was a "program" here, just a discussion about an API design. Sorry, I left off a smiley. :-) > I've already raised my concerns with using yield from exclusively, but since the performance argument trumps all of those then there is little more I can contribute. What about the usability argument? Don't you think users will be confused by the need to use yield from some times and just yield other times? Yes, they may be able to tell by looking up the definition and checking how it is decorated, but that doesn't really help. > When a final design begins to stabilise, I will see how I can make use of it in my own code. Until then, I'll continue using Futures, which are ideal for my current needs. I won't be forcing 'yield from' onto my users until its usage is clear and I can provide them with suitable guidance. Understood. What exactly is it that makes Futures so ideal for your current needs? Is it integration with threads? Another tack: could you make use of tulip/polling.py? That doesn't use generators of any form; it is meant as an integration point with other styles of async programming (although I am not claiming that it is any good in its current form -- this too is just a strawman to shoot down). 
-- --Guido van Rossum (python.org/~guido) From Steve.Dower at microsoft.com Wed Oct 31 23:36:13 2012 From: Steve.Dower at microsoft.com (Steve Dower) Date: Wed, 31 Oct 2012 22:36:13 +0000 Subject: [Python-ideas] Async API: some code to review In-Reply-To: References: Message-ID: Guido van Rossum wrote: > On Wed, Oct 31, 2012 at 2:31 PM, Steve Dower wrote: >> Guido van Rossum wrote: >>> Just get with the program and use yield-from exclusively. >> >> I didn't realise there was a "program" here, just a discussion about an API >> design. > > Sorry, I left off a smiley. :-) Always a risk in email communication - no offence taken. >> I've already raised my concerns with using yield from exclusively, but since >> the performance argument trumps all of those then there is little more I can >> contribute. > > What about the usability argument? Don't you think users will be confused by the > need to use yield from some times and just yield other times? Yes, they may be > able to tell by looking up the definition and checking how it is decorated, but > that doesn't really help. Users only ever _need_ to write yield. The only reason that wattle does not work with Python 3.2 is because of non-blank returns inside generators. There is only one reason to use 'yield from' and that is for the performance optimisation, which I do acknowledge and did observe in my own benchmarks. >> When a final design begins to stabilise, I will see how I can make use of it >> in my own code. Until then, I'll continue using Futures, which are ideal for my >> current needs. I won't be forcing 'yield from' onto my users until its usage is >> clear and I can provide them with suitable guidance. > > Understood. What exactly is it that makes Futures so ideal for your current > needs? Is it integration with threads? > > Another tack: could you make use of tulip/polling.py? That doesn't use > generators of any form; it is meant as an integration point with other styles of > async programming (although I am not claiming that it is any good in its current > form -- this too is just a strawman to shoot down). I know I've been vague about our intended application (deliberately so, to try and keep the discussion neutral), but I'll lay out some details. We're working on adding support for Windows 8 apps (formerly known as Metro) written in Python. These will use the new API (WinRT) which is highly asynchronous - even operations such as opening a file are only* available as an asynchronous function. The intention is to never block on the UI thread. (* Some synchronous Win32 APIs are still available from C++, but these are actively discouraged and restricted in many ways. Most of Win32 is not usable.) The model used for these async APIs is future-based: every *Async() function returns a future for a task that is already running. The caller is not allowed to wait on this future - the only option is to attach a callback. C# and VB use their async/await keywords (good 8 min intro video on those: http://www.visualstudiolaunch.com/vs2012vle/Theater?sid=1778) while JavaScript and C++ have multi-line lambda support. For Python, we are aiming for closer to the async/await model (which is also how we chose the names). Incidentally, our early designs used yield from exclusively. It was only when we started discovering edge-cases where things broke, as well as the impact on code 'cleanliness', that we switched to yield. 
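As a rough stand-in for that model, the sketch below uses concurrent.futures in place of WinRT; read_async() and its little thread pool are invented for this illustration and are not real WinRT (or wattle) APIs. The point is only the calling convention: the caller gets a future for work that is already running and may attach a callback, but never block on it.

    from concurrent.futures import ThreadPoolExecutor

    _pool = ThreadPoolExecutor(max_workers=2)   # stands in for the OS doing the work

    def read_async(path):
        # Every *Async()-style call returns a future for an operation that
        # is already in flight; there is no blocking variant to call.
        def work():
            with open(path, 'rb') as f:
                return f.read()
        return _pool.submit(work)

    def on_ui_thread():
        fut = read_async('/etc/passwd')
        # Calling fut.result() here would block the UI thread, which the
        # model forbids; attaching a callback is the only option offered.
        fut.add_done_callback(lambda f: print(len(f.result()), 'bytes read'))

    on_ui_thread()
    _pool.shutdown(wait=True)   # only so the sketch exits cleanly

An @async coroutine in the wattle style flattens that callback into something like 'data = yield read_async(path)', which is the shape being aimed for.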
There are three aspects of this that work better and result in cleaner code with wattle than with tulip: - event handlers can be "async-void", such that when the event is raised by the OS/GUI/device/whatever the handler can use asynchronous tasks without blocking the main thread. In this case, the caller receives a future but ignores it because it does not care about the final result. (We could achieve this under 'yield from' by requiring a decorator, which would then probably prevent other Python code from calling the handler directly. There is very limited opportunity for us to reliably intercept this case.) - the event loop is implemented by the OS. Our Scheduler implementation does not need to provide an event loop, since we can submit() calls to the OS-level loop. This pattern also allows wattle to 'sit on top of' any other event loop, probably including Twisted and 0MQ, though I have not tried it (except with Tcl). - Future objects can be marshalled directly from Python into Windows, completing the interop story. Even with tulip, we would probably still require a decorator for this case so that we can marshal regular generators as iterables (for which there is a specific type). Without a decorator, we would probably have to ban both cases to prevent subtly misbehaving programs. At least with wattle, the user does not have to do anything different from any of their other @async functions. Despite this intended application, I have tried to approach this design task independently to produce an API that will work for many cases, especially given the narrow focus on sockets. If people decide to get hung up on "the Microsoft way" or similar rubbish then I will feel vindicated for not mentioning it earlier :-) - it has not had any more influence on wattle than any of my other past experience has. Cheers, Steve
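To round out the first of those bullets, here is a minimal sketch of the "async-void" handler pattern. The async_void decorator is an invented stand-in, not wattle's real implementation: it drives the generator by attaching a done-callback to each yielded future, and hands back a Future that the event dispatcher is free to ignore.

    from concurrent.futures import Future, ThreadPoolExecutor

    _pool = ThreadPoolExecutor(max_workers=2)   # stands in for the platform's async work

    def async_void(gen_func):
        def handler(*args):
            result = Future()
            gen = gen_func(*args)
            def step(value):
                try:
                    fut = gen.send(value)             # run to the next yield
                except StopIteration as exc:
                    result.set_result(exc.value)      # usually nobody is watching
                else:
                    fut.add_done_callback(lambda f: step(f.result()))
            step(None)
            return result   # the GUI/event source simply drops this
        return handler

    @async_void
    def on_button_click(path):
        def read_file():
            with open(path, 'rb') as f:
                return f.read()
        data = yield _pool.submit(read_file)
        print('handled click:', len(data), 'bytes read off the event thread')

    on_button_click('/etc/passwd')   # fire and forget
    _pool.shutdown(wait=True)

The handler never blocks the thread that raised the event; the trade-off is that any failure ends up in a Future that nothing inspects, so in practice the handler body has to deal with its own errors.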