From benhoyt at gmail.com Tue Jan 1 01:11:37 2013 From: benhoyt at gmail.com (Ben Hoyt) Date: Tue, 1 Jan 2013 13:11:37 +1300 Subject: [Python-ideas] Preventing out of memory conditions In-Reply-To: References: Message-ID: Interesting idea, though I don't think it's something that should be a Python language extension. For instance, iOS (typically more resource-constrained) sends the application signals when system memory is getting low so it can free stuff -- this is done at the OS level. And I think that's the right place, because this will almost certainly be setup- and system-dependent. For instance, it would depend hugely on whether there's a virtual memory manager, and how it's configured. I'd say your best bet is to write a little library that does the appropriate thing for your needs (your system or setup). Say it starts a thread that checks the system's free memory every so often and sends your application a signal/callback saying "we're getting low, free some of your caches". It could even send a "level flag" to your callback saying "fairly low", "very low", or "critically low" -- I think iOS does this. -Ben On Tue, Jan 1, 2013 at 11:16 AM, Max Moroz wrote: > Sometimes, I have the flexibility to reduce the memory used by my > program (e.g., by destroying large cached objects, etc.). It would be > great if I could ask the Python interpreter to notify me when memory is > running out, so I can take such actions. > > Of course, it's nearly impossible for Python to know in advance if the > OS would run out of memory with the next malloc call. Furthermore, > Python shouldn't guess which memory (physical, virtual, etc.) is > relevant in the particular situation (for instance, in my case, I only > care about physical memory, since swapping to disk makes my > application as good as frozen). So the problem as stated above is > unsolvable. > > But let's say I am willing to do some work to estimate the maximum > amount of memory my application can be allowed to use. If I provide > that number to the Python interpreter, it may be possible for it to notify > me when the next memory allocation would exceed this limit by calling > a function I provide it (hopefully passing as arguments the amount of > memory being requested, as well as the amount currently in use). My > callback function could then destroy some objects, and return True to > indicate that some objects were destroyed. At that point, the > interpreter could run its standard garbage collection routines to > release the memory that corresponded to those objects - before > proceeding with whatever it was trying to do originally. (If I > returned False, or if I didn't provide a callback function at all, the > interpreter would simply behave as it does today.) Any memory > allocations that happen while the callback function itself is > executing, would not trigger further calls to it. The whole mechanism > would be disabled for the rest of the session if the memory freed by > the callback function was insufficient to prevent going over the > memory limit. > > Would this be worth considering for a future language extension? How > hard would it be to implement? > > Max
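A minimal sketch of the kind of watchdog library Ben describes -- the names, thresholds, and interval here are hypothetical, and the cross-platform free-memory reading assumes the third-party psutil package:

    import threading
    import time

    import psutil  # third-party; provides the system memory numbers

    def start_memory_watchdog(callback, interval=5.0,
                              levels=((10, "critically low"),
                                      (20, "very low"),
                                      (30, "fairly low"))):
        """Poll free physical memory; call callback(flag) when it gets low.

        `levels` pairs a free-memory percentage threshold with a level flag,
        checked from most to least severe.
        """
        def poll():
            while True:
                free_pct = 100.0 - psutil.virtual_memory().percent
                for threshold, flag in levels:
                    if free_pct < threshold:
                        callback(flag)  # e.g. "free some of your caches"
                        break
                time.sleep(interval)

        thread = threading.Thread(target=poll)
        thread.daemon = True  # don't keep the process alive just for the watchdog
        thread.start()
        return thread

The callback would typically drop its caches aggressively on "critically low" and only trim them on "fairly low".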
From greg at krypto.org Tue Jan 1 04:22:29 2013 From: greg at krypto.org (Gregory P. Smith) Date: Mon, 31 Dec 2012 19:22:29 -0800 Subject: [Python-ideas] Preventing out of memory conditions In-Reply-To: References: Message-ID: Within CPython, the way the C API is today, it is too late by the time the code to raise a MemoryError has been called, so capturing all the places that could occur is not easy. Implementing this at the C malloc layer makes more sense. Have it dip into a reserved low-memory pool to satisfy the current request and send the process a signal indicating it is running low. This approach would also work with C extension modules or an embedded Python. I'd expect something like this already exists, but I haven't looked for one. Having a thread polling memory use is not generally wise: that is polling rather than event driven, and could easily miss a low memory situation until it is too late and a failure has already happened (allocation demand can come in large spikes depending on the application). OSes running processes in constrained environments, or ones where the available resources can be reduced by the OS later, may already send their own warning signals prior to outright killing the process, but that should not preclude an application being able to monitor and constrain itself on its own, without needing the OS to do it. -gps On Mon, Dec 31, 2012 at 4:11 PM, Ben Hoyt wrote: > Interesting idea, though I don't think it's something that should be a > Python language extension. [snip quote of the full message above]
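Gregory's reserved-pool idea can even be approximated from pure Python as a "ballast" allocation that a low-memory handler releases to buy cleanup headroom. This is only a sketch: free_application_caches() is a hypothetical stand-in for app-specific cleanup, and SIGUSR1 is an arbitrary choice of "running low" notification (Unix-only):

    import signal

    _ballast = bytearray(16 * 1024 * 1024)  # 16 MiB of reserve headroom

    def _on_low_memory(signum, frame):
        global _ballast
        _ballast = None  # release the reserve so cleanup itself can allocate
        free_application_caches()  # hypothetical: application-specific teardown

    signal.signal(signal.SIGUSR1, _on_low_memory)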
From random832 at fastmail.us Tue Jan 1 04:28:43 2013 From: random832 at fastmail.us (Random832) Date: Mon, 31 Dec 2012 22:28:43 -0500 Subject: [Python-ideas] Preventing out of memory conditions In-Reply-To: References: Message-ID: <50E257EB.5070002@fastmail.us> On 12/31/2012 7:11 PM, Ben Hoyt wrote: > I'd say your best bet is to write a little library that does the > appropriate thing for your needs (your system or setup). Say it starts a > thread that checks the system's free memory every so often and sends > your application a signal/callback saying "we're getting low, free > some of your caches". [snip] I'm concerned that a program that does this will end up as the loser in this scenario: http://blogs.msdn.com/b/oldnewthing/archive/2012/01/18/10257834.aspx (tl;dr, two programs each having a different idea of how much free memory the system should have results in an "unfair" total allocation of memory) I think it's possibly important to avoid using the system's free memory as an input to any such system. From greg at krypto.org Tue Jan 1 04:33:01 2013 From: greg at krypto.org (Gregory P. Smith) Date: Mon, 31 Dec 2012 19:33:01 -0800 Subject: [Python-ideas] Preventing out of memory conditions In-Reply-To: <50E257EB.5070002@fastmail.us> References: <50E257EB.5070002@fastmail.us> Message-ID: On Mon, Dec 31, 2012 at 7:28 PM, Random832 wrote: [snip] > I think it's possibly important to avoid using the system's free memory as > an input to any such system. Indeed: only look at your own process's consumption vs. some numerical limit you've chosen for yourself. This also means you can adjust your own limit up or down at runtime if desired. (JVMs tend to force you to work this way.) -gps
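A sketch of the self-imposed limit Gregory suggests -- compare your own process's resident set size against a number you picked yourself, never the system-wide free memory (psutil assumed again; the limit value is arbitrary):

    import psutil  # third-party

    SOFT_LIMIT_BYTES = 512 * 1024 * 1024  # a budget *you* chose; adjustable at runtime
    _process = psutil.Process()           # the current process

    def over_soft_limit():
        """True when this process alone exceeds its self-imposed budget."""
        return _process.memory_info().rss > SOFT_LIMIT_BYTES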
From solipsis at pitrou.net Tue Jan 1 22:55:05 2013 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 1 Jan 2013 22:55:05 +0100 Subject: [Python-ideas] Documenting Python warts on Stack Overflow References: <20121231000012.GA10426@iskra.aviel.ru> Message-ID: <20130101225505.757540fa@pitrou.net> On Mon, 31 Dec 2012 04:00:12 +0400 Oleg Broytman wrote: > Hello and happy New Year! > > On Sun, Dec 30, 2012 at 11:20:34PM +0100, Victor Stinner wrote: > > If I understood correctly, you would like to list some specific issues > > like print() not flushing stdout immediately if you ask it not to write a > > newline (print "a", in Python 2 or print("a", end=" ") in Python 3). > > If I understood correctly, and if you want to improve Python, you > > should help the documentation project. Or if you can, build a website > > listing such issues *and listing solutions* like calling > > sys.stdout.flush() or using print(flush=True) (Python 3.3+) for the > > print issue. > > > > A list of such issues without solutions doesn't help anyone. > > I cannot say for Anatoly but for me warts are: > > -- things that don't exist where they should (but the core team objects, or they are hard to implement, or something); > -- things that exist where they shouldn't; they are hard to fix because removing them would break backward compatibility; > -- things that are implemented in strange, inconsistent ways. > > A few examples: > [snip] The problem is you are listing examples which *in your opinion* are issues with Python. Other people would have different ideas of what is an issue and what is not. This can't be the right methodology if we want to write a piece of Python docs. Only things which are *well-known* annoyances can qualify. I also disagree that missing features are "warts"; they are just missing features, not something unpleasant that's difficult to get rid of. Regards Antoine.
From rosuav at gmail.com Tue Jan 1 23:16:39 2013 From: rosuav at gmail.com (Chris Angelico) Date: Wed, 2 Jan 2013 09:16:39 +1100 Subject: [Python-ideas] Documenting Python warts on Stack Overflow In-Reply-To: <20130101225505.757540fa@pitrou.net> References: <20121231000012.GA10426@iskra.aviel.ru> <20130101225505.757540fa@pitrou.net> Message-ID: On Wed, Jan 2, 2013 at 8:55 AM, Antoine Pitrou wrote: > The problem is you are listing examples which *in your opinion* are > issues with Python. Other people would have different ideas of what is > an issue and what is not. This can't be the right methodology if we > want to write a piece of Python docs. Only things which are > *well-known* annoyances can qualify. My understanding of a "Python wart" is that it's something that cannot be changed without readdressing some fundamental design. For example, Python has decided that indentation and line-endings are significant - that a logical statement ends at end-of-line. Python has further decided that line continuation characters are unnecessary inside parenthesized expressions. Resultant wart: Breaking a massive 'for' loop between its assignment list and its iterable list doesn't work, even though breaking it anywhere else does. (This question came up on python-list a little while ago.) Why should it be an error to break it here, but not there? Why can't I split it like this:

    for x,y,z in
        start_point,
        continuation_point,
        end_point
        :
        pass

when it's perfectly legal to split it like this:

    for (
        x,y,z
    ) in (
        start_point,
        continuation_point,
        end_point
    ):
        pass

Well, because you can't. It's a little odd what you can and can't do, until you understand the underlying system fairly well. It's something that's highly unlikely to change; one of the premises would have to be sacrificed (or at least modified) to achieve it. Something that could be changed if the devs had enough time is a tracker issue (or a "show me some code" issue - you want to complain, you can do the work to fix it). Something that could be changed, but would break backward compatibility is a prime candidate for __future__ and/or Python 4 (like the change of the division operator - that change introduced its own oddities, some of which may be warts, eg that int/int->float but sqrt(float) !-> complex). A wart is different from both of the above. ChrisA From wuwei23 at gmail.com Wed Jan 2 00:17:34 2013 From: wuwei23 at gmail.com (alex23) Date: Tue, 1 Jan 2013 15:17:34 -0800 (PST) Subject: [Python-ideas] Documenting Python warts on Stack Overflow In-Reply-To: References: <20121231000012.GA10426@iskra.aviel.ru> <20130101225505.757540fa@pitrou.net> Message-ID: <189ac58e-6deb-45d8-a239-5c5d8a5594e7@jl13g2000pbb.googlegroups.com> On Jan 2, 8:16 am, Chris Angelico wrote: > It's a little odd what you can and can't do, > until you understand the underlying system fairly well. It's something > that's highly unlikely to change; one of the premises would have to be > sacrificed (or at least modified) to achieve it. By this definition, though, every feature of Python that someone doesn't understand is a wart. For a new user, mutable default parameters is a wart, but once you understand Python's execution & object models, it's just the way the language is. Generally, I find "wart" means "something the user doesn't like about the language even if it makes internal sense".
From rosuav at gmail.com Wed Jan 2 00:46:00 2013 From: rosuav at gmail.com (Chris Angelico) Date: Wed, 2 Jan 2013 10:46:00 +1100 Subject: [Python-ideas] Documenting Python warts on Stack Overflow In-Reply-To: <189ac58e-6deb-45d8-a239-5c5d8a5594e7@jl13g2000pbb.googlegroups.com> References: <20121231000012.GA10426@iskra.aviel.ru> <20130101225505.757540fa@pitrou.net> <189ac58e-6deb-45d8-a239-5c5d8a5594e7@jl13g2000pbb.googlegroups.com> Message-ID: On Wed, Jan 2, 2013 at 10:17 AM, alex23 wrote: > By this definition, though, every feature of Python that someone > doesn't understand is a wart. [snip] > Generally, I find "wart" means "something the user doesn't like about > the language even if it makes internal sense". That's pretty much it, yeah. The warts of Python are the gotchas that need to be grokked before you can call yourself fluent in the language. Might feel as though the designers "got it wrong", or were making an arbitrary choice, but whatever it is, the language behaves that way and you have to get to know it. I agree that mutable defaults are a wart. PHP's scoping rules are simpler than Python's. A variable inside a function is local unless it's explicitly declared global; function names are global. (That's not the whole set of rules, but close enough for this argument.) Python, on the other hand, adds the oddity that a name referenced inside a function is global unless, somewhere in that function, it's assigned to. This is a Python wart that bites people (see the first question in http://toykeeper.net/warts/python/ for instance), but it's a consequence of putting "variables" and "functions" into a single namespace called "name bindings", plus the decision to not require variable declarations (C, for instance, has the same notion of "everything's a name", but instead of declaring globals, declares locals). Python's scoping rules are vastly superior to PHP's, but a bit more complicated, and may need a bit of explanation. (Incidentally, of all the warts listed in the page I linked above, two give a quick and easy error message, two have better ways of doing things (don't use +=, use append/extend), and only one is really a wart - mutable default values. Well, that and the behaviour of += on something in a tuple, but .extend dodges that one too.) Documenting these sorts of oddities is a good thing, as long as the underlying goal is one of new programmer education and not "hey you idiots who develop this language, here's all the things you did wrong". ChrisA From steve at pearwood.info Wed Jan 2 00:57:00 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Wed, 02 Jan 2013 10:57:00 +1100 Subject: [Python-ideas] Documenting Python warts on Stack Overflow In-Reply-To: References: <20121231000012.GA10426@iskra.aviel.ru> <20130101225505.757540fa@pitrou.net> Message-ID: <50E377CC.70108@pearwood.info> On 02/01/13 09:16, Chris Angelico wrote: > My understanding of a "Python wart" is that it's something that cannot > be changed without readdressing some fundamental design.
For example, > Python has decided that indentation and line-endings are significant - > that a logical statement ends at end-of-line. Python has further > decided that line continuation characters are unnecessary inside > parenthesized expressions. Resultant wart: Breaking a massive 'for' > loop between its assignment list and its iterable list doesn't work, > even though breaking it anywhere else does. A truly poor example. You can't break a for loop between its assignment and iterable for the same reason you can't break any other statement at an arbitrary place. That's not how Python does things: statements must be on a single logical line. > (This question came up on python-list a little while ago.) Why should it be an error to break it here, but not there? Why can't I split it like this:
>
>     for x,y,z in
>         start_point,
>         continuation_point,
>         end_point
>         :
>         pass

As you say above, logical statements end at end-of-line. There is an end-of-line following "for x,y,z in". Why would anyone think that you should be able to split the statement there?

- is there a line-continuation that would let you continue over multiple physical lines? no
- is there a parenthesized expression that would let you continue over multiple physical lines? no

None of the conditions for splitting statements over multiple physical lines apply, and so the standard rule applies: statements end at end-of-line. This is not a wart any more than the inability to write:

    y =
    x +
    1;

is a wart. Maybe you're used to being able to do that in some (but not all) semi-colon languages, but they are not Python, any more than Python is Forth, where you might be used to writing: x 1 + y ! To some degree warts are in the eye of the beholder, but failure of Python to be "just like language Foo" is not a wart. > when it's perfectly legal to split it like this:
>
>     for (
>         x,y,z
>     ) in (
>         start_point,
>         continuation_point,
>         end_point
>     ):
>         pass
>
> Well, because you can't. It's a little odd what you can and can't do,

"Why can't I drive straight through red lights, when I'm allowed to drive through green lights? That's a little odd!" No it is not. It is a fundamental aspect of Python's syntax. > until you understand the underlying system fairly well. It's something > that's highly unlikely to change; one of the premises would have to be > sacrificed (or at least modified) to achieve it. > > Something that could be changed if the devs had enough time is a > tracker issue (or a "show me some code" issue - you want to complain, > you can do the work to fix it). Something that could be changed, but > would break backward compatibility is a prime candidate for __future__ > and/or Python 4 (like the change of the division operator - that > change introduced its own oddities, some of which may be warts, eg > that int/int->float but sqrt(float) !-> complex). Are you talking about math.sqrt or cmath.sqrt or some other sqrt? In general, Python 3 now extends float to complex under regular arithmetic:

py> (-100.0)**0.5
(6.123031769111886e-16+10j)

math.sqrt(-100.0) on the other hand continues to raise, because the math module is by design limited to producing real values. cmath.sqrt(-100.0) continues to give a complex result, again by design.
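For reference, a quick sketch of the three behaviours side by side at the interactive prompt (exact float digits vary by platform):

py> (-100.0) ** 0.5        # operators promote float to complex
(6.123031769111886e-16+10j)
py> import math, cmath
py> math.sqrt(-100.0)      # the math module is real-valued by design
Traceback (most recent call last):
  ...
ValueError: math domain error
py> cmath.sqrt(-100.0)     # cmath opts in to complex results
10j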
-- Steven From phd at phdru.name Wed Jan 2 01:01:13 2013 From: phd at phdru.name (Oleg Broytman) Date: Wed, 2 Jan 2013 04:01:13 +0400 Subject: [Python-ideas] Documenting Python warts In-Reply-To: <189ac58e-6deb-45d8-a239-5c5d8a5594e7@jl13g2000pbb.googlegroups.com> References: <20121231000012.GA10426@iskra.aviel.ru> <20130101225505.757540fa@pitrou.net> <189ac58e-6deb-45d8-a239-5c5d8a5594e7@jl13g2000pbb.googlegroups.com> Message-ID: <20130102000113.GB672@iskra.aviel.ru> On Tue, Jan 01, 2013 at 03:17:34PM -0800, alex23 wrote: > On Jan 2, 8:16 am, Chris Angelico wrote: [snip] > Generally, I find "wart" means "something the user doesn't like about > the language even if it makes internal sense". What about warts that don't have internal sense? Mutable default parameters are just artifacts of the implementation. What is their "internal sense"? Paraphrasing Alan Cooper from "The Inmates are Running the Asylum": The phrase "experienced Python programmer" really means the person has been hurt so many times that the scar tissue is thick enough so he no longer feels the pain. Oleg. -- Oleg Broytman http://phdru.name/ phd at phdru.name Programmers don't die, they just GOSUB without RETURN. From rosuav at gmail.com Wed Jan 2 01:11:29 2013 From: rosuav at gmail.com (Chris Angelico) Date: Wed, 2 Jan 2013 11:11:29 +1100 Subject: [Python-ideas] Documenting Python warts on Stack Overflow In-Reply-To: <50E377CC.70108@pearwood.info> References: <20121231000012.GA10426@iskra.aviel.ru> <20130101225505.757540fa@pitrou.net> <50E377CC.70108@pearwood.info> Message-ID: On Wed, Jan 2, 2013 at 10:57 AM, Steven D'Aprano wrote: > You can't break a for loop between its assignment and iterable for the > same reason you can't break any other statement at an arbitrary place. > That's not how Python does things: statements must be on a single logical > line. [snip] > To some degree warts are in the eye of the beholder, but failure of > Python to be "just like language Foo" is not a wart. Of course. I'm just trying to find examples that have actually come up on python-list, rather than contriving my own. As per my definition of wart as given above, these are NOT things that need to be fixed - just things that need to be understood. Rule: One Python statement must be on one line. (This is the bit where Python differs from, say, C.) Modifying rule: Python statements can be broken across multiple lines, given certain conditions. Wart: There are other conditions that, though they seem superficially similar to the legal ones, don't make for valid split points. Even though a human might say that it's obvious and unambiguous that the statement continues, the rules don't allow it. >> eg int/int->float but sqrt(float) !-> complex). > > Are you talking about math.sqrt or cmath.sqrt or some other sqrt?
> > In general, Python 3 now extends float to complex under regular arithmetic: > > py> (-100.0)**0.5 > (6.123031769111886e-16+10j) > > math.sqrt(-100.0) on the other hand continues to raise, because the math > module is by design limited to producing real values. cmath.sqrt(-100.0) > continues to give a complex result, again by design. Hmm, I was doing that one from memory. Since the ** operator happily returns complex, it was probably math.sqrt that was in question. I withdraw this one; the operators are consistent amongst themselves, all will extend to the "next type up" if necessary. (Or at least, this pair do. There might be a wart elsewhere, but this ain't it.) ChrisA From rosuav at gmail.com Wed Jan 2 01:34:47 2013 From: rosuav at gmail.com (Chris Angelico) Date: Wed, 2 Jan 2013 11:34:47 +1100 Subject: [Python-ideas] Documenting Python warts In-Reply-To: <20130102000113.GB672@iskra.aviel.ru> References: <20121231000012.GA10426@iskra.aviel.ru> <20130101225505.757540fa@pitrou.net> <189ac58e-6deb-45d8-a239-5c5d8a5594e7@jl13g2000pbb.googlegroups.com> <20130102000113.GB672@iskra.aviel.ru> Message-ID: On Wed, Jan 2, 2013 at 11:01 AM, Oleg Broytman wrote: > What about warts that don't have internal sense? Mutable default > parameters are just artifacts of the implementation. What is their > "internal sense"? They let you use a function for something where you'd otherwise need to instantiate an object and play with it. Take caching, for instance:

def name_lookup(name, cache={}):
    if name not in cache:
        cache[name] = some_lengthy_operation(name)
        # ...and prune the cache of old stuff here, to keep its size down
    return cache[name]

You can ignore the default argument and pretend it's all magic, or you can explicitly run a separate cache:

name_lookup("foo", {})  # easy way to say "bypass the cache"

# Do a bunch of lookups that won't be in the main cache, and which
# would only pollute the main cache for later
local_name_cache = {}
[name_lookup(n, local_name_cache) for n in names]

The other consideration here is of side effects. It's all very well to wave a magic wand and say that:

def foo(x, y=[]):
    pass

will create a unique list for each y, but what about:

def foo(x, y=open("main.log", "w")):
    pass

or similar? Should it reopen the log every time? Should it reevaluate the expression? There's an easy way to spell it if you want that behaviour:

def foo(x, y=None):
    if y is None:
        y = whatever_expression_you_want

(or using object() if None is a legal arg). Whichever way mutable objects in default args are handled, there are going to be strangenesses. Therefore the best thing to do is (almost certainly) the simplest. > Paraphrasing Alan Cooper from "The Inmates are Running the Asylum": > The phrase "experienced Python programmer" really means the person has > been hurt so many times that the scar tissue is thick enough so he no > longer feels the pain. That applies to PHP, and possibly to C (though if you treat C as "all the power of assembly language, coupled with all the readability of assembly language", then it doesn't hurt nearly as much as if you try to treat it as a modern high level language). I'm not so sure it applies to Python.
ChrisA From wuwei23 at gmail.com Wed Jan 2 01:54:34 2013 From: wuwei23 at gmail.com (alex23) Date: Tue, 1 Jan 2013 16:54:34 -0800 (PST) Subject: [Python-ideas] Documenting Python warts In-Reply-To: <20130102000113.GB672@iskra.aviel.ru> References: <20121231000012.GA10426@iskra.aviel.ru> <20130101225505.757540fa@pitrou.net> <189ac58e-6deb-45d8-a239-5c5d8a5594e7@jl13g2000pbb.googlegroups.com> <20130102000113.GB672@iskra.aviel.ru> Message-ID: On Jan 2, 10:01 am, Oleg Broytman wrote: > Paraphrasing Alan Cooper from "The Inmates are Running the Asylum": > The phrase "experienced Python programmer" really means the person has > been hurt so many times that the scar tissue is thick enough so he no > longer feels the pain. To me, that's nonsense. The pain people are experiencing with "warts" like mutable defaults is entirely from trying to force Python to fit mental models they've constructed of other languages. The "internal sense" of mutable defaults is that everything is an object, and that function arguments are declared at definition and not run-time. What you call an "implementation artifact" I see as expected behaviour; any other implementation that didn't provide this wouldn't be Python in a number of fundamental ways. From steve at pearwood.info Wed Jan 2 01:55:58 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Wed, 02 Jan 2013 11:55:58 +1100 Subject: [Python-ideas] Documenting Python warts In-Reply-To: <20130102000113.GB672@iskra.aviel.ru> References: <20121231000012.GA10426@iskra.aviel.ru> <20130101225505.757540fa@pitrou.net> <189ac58e-6deb-45d8-a239-5c5d8a5594e7@jl13g2000pbb.googlegroups.com> <20130102000113.GB672@iskra.aviel.ru> Message-ID: <50E3859E.8030003@pearwood.info> On 02/01/13 11:01, Oleg Broytman wrote: [snip] > What about warts that don't have internal sense? Mutable default > parameters are just artifacts of the implementation. What is their > "internal sense"? They are not artifacts of the implementation, they are a consequence of a deliberate design choice of Python. Default values in function definitions are set *once*, when the function object is created. Only the function body is run every time the function is called, not the function definition. So whether you do this:

def ham(x=0):
    x += 1
    return x

or this:

def spam(x=[]):
    x.append(1)
    return x

the default value for both functions is a single object created once and reused every time you call the function.
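A quick session with those two definitions makes the sharing visible:

py> spam()
[1]
py> spam()      # the same list object as before, so it keeps growing
[1, 1]
py> ham()
1
py> ham()       # rebinding x inside the body never touches the int default
1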
The consequences of this may be too subtle for beginners to predict, and that even experienced coders sometimes forget makes it a wart, but it makes perfect internal sense:

* in Python, bindings ALWAYS occur when the code is executed;
* in Python, "x=" is a binding;
* even inside a function definition;
* def is a statement which is executed at run time, not something performed at compile time;
* therefore, inside the statement "def spam(x=[]): ..." the binding x=[] occurs ONCE ONLY.

The same list object is always used for the default value, not a different one each time. Early binding of function defaults should, in my opinion, be preferred over late binding because:

* given early binding, it is clean to get late binding semantics with just one extra line. Everything you need remains encapsulated inside the function:

def spam(x=None):
    if x is None:
        x = []
    x.append(1)
    return x

* given late binding, it is ugly to get early binding semantics, since it requires you to create a separate global "constant" for every argument needing an early binding:

_SPAM_DEFAULT_ARG = []  # Don't touch this!

def spam(x=None):
    if x is None:
        x = _SPAM_DEFAULT_ARG
    x.append(1)
    return x

-- Steven From ncoghlan at gmail.com Wed Jan 2 02:07:58 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 2 Jan 2013 11:07:58 +1000 Subject: [Python-ideas] Documenting Python warts In-Reply-To: <20130102000113.GB672@iskra.aviel.ru> References: <20121231000012.GA10426@iskra.aviel.ru> <20130101225505.757540fa@pitrou.net> <189ac58e-6deb-45d8-a239-5c5d8a5594e7@jl13g2000pbb.googlegroups.com> <20130102000113.GB672@iskra.aviel.ru> Message-ID: On Wed, Jan 2, 2013 at 10:01 AM, Oleg Broytman wrote: [snip] FWIW, I prefer the term "traps for the unwary" over "warts", since it's less judgmental and better covers the kinds of issues which can cause problems for people learning the language. I highlight some of the examples related to the import system here: http://python-notes.boredomandlaziness.org/en/latest/python_concepts/import_traps.html > What about warts that don't have internal sense? Mutable default > parameters are just artifacts of the implementation. What is their > "internal sense"? Um, no. Mutable default arguments make perfect sense once you understand the difference between compile time, definition time and execution time for a function. Defaults are evaluated at definition time, thus they are necessarily shared across all invocations of the function. If you don't want them shared, you use a sentinel value like None to postpone the creation to execution time. They're a trap for the unwary, but not a wart.
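One way to make the definition time versus execution time distinction concrete -- a sketch with a deliberately noisy default expression:

py> def noisy_default():
...     print("evaluating the default now")
...     return []
...
py> def f(x=noisy_default()):    # runs as part of executing the def
...     return x
...
evaluating the default now
py> f()           # no further output: the stored result is simply reused
[]
py> f() is f()    # every call sees the same stored object
True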
Else clauses on loops are arguably closer to qualifying as a genuine wart (see http://python-notes.boredomandlaziness.org/en/latest/python_concepts/break_else.html), since they're not much shorter than the explicit sentinel value based alternative, and significantly less intuitive. However, because they exist, and people *will* encounter them in real world code, every beginner will eventually have to learn what they mean. The other complaint discussed in the thread, regarding "Why don't compound statement keywords and their trailing colon count as parentheses for purposes of ignoring line breaks?", has to do with a mix of implementation simplicity and error quality. Pairing up "if"/":", "with"/":", "for"/":" etc would certainly be possible, but may result in the infamous "missing semi-colon" style of C syntax error (or missing paren style of Lisp error), where the fault may be reported well away from the missing character, or with an error that is extremely hard for a beginner to translate into "you left out a character here". Given the likely detrimental effect on error quality, and the ability to use actual parens or backslashes for line continuation, the current simpler rule looks like the better trade-off. The Design FAQ and Programming FAQ are intended to be the repository for answers to this kind of question. Addition of new questions and answers is handled like any other patch: via the tracker (and some of the existing answers could likely do with updates as well). Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From phd at phdru.name Wed Jan 2 00:49:16 2013 From: phd at phdru.name (Oleg Broytman) Date: Wed, 2 Jan 2013 03:49:16 +0400 Subject: [Python-ideas] Documenting Python warts In-Reply-To: <20130101225505.757540fa@pitrou.net> References: <20121231000012.GA10426@iskra.aviel.ru> <20130101225505.757540fa@pitrou.net> Message-ID: <20130101234916.GA672@iskra.aviel.ru> Hi! On Tue, Jan 01, 2013 at 10:55:05PM +0100, Antoine Pitrou wrote: > On Mon, 31 Dec 2012 04:00:12 +0400 > Oleg Broytman wrote: > > On Sun, Dec 30, 2012 at 11:20:34PM +0100, Victor Stinner wrote: [snip] > > The problem is you are listing examples which *in your opinion* are > issues with Python. Other people would have different ideas of what is > an issue and what is not. This can't be the right methodology if we > want to write a piece of Python docs. Absolutely not. I collected the list of examples in reply to a question "what are warts and why one cannot just document solutions?" I hope I managed to show that warts are built (or unbuilt, so to say) so deep in Python and the stdlib design that it's impossible to fix them with code or documentation. Fixing them requires major design changes. > Only things which are *well-known* annoyances can qualify. Well, some warts are quite well-known.
My counter overflows when I try to count how many times anonymous code blocks have been proposed and rejected. IIRC Mr. van Rossum admitted that for/else was a design mistake. One wart is being worked on right now: the async libs redesign. > I also disagree that missing features are "warts"; they are just > missing features, not something unpleasant that's difficult to get rid > of. Some of those missing features are near to impossible to get rid of. The idea of anonymous code blocks is rejected constantly, so no one would dare to create a patch. As for their unpleasantness -- it's in the eye of the beholder, of course. I'm not going to fight tooth and nail for my vision. Oleg. -- Oleg Broytman http://phdru.name/ phd at phdru.name Programmers don't die, they just GOSUB without RETURN. From steve at pearwood.info Wed Jan 2 03:00:49 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Wed, 02 Jan 2013 13:00:49 +1100 Subject: [Python-ideas] Documenting Python warts on Stack Overflow In-Reply-To: References: <20121231000012.GA10426@iskra.aviel.ru> <20130101225505.757540fa@pitrou.net> <189ac58e-6deb-45d8-a239-5c5d8a5594e7@jl13g2000pbb.googlegroups.com> Message-ID: <50E394D1.5060908@pearwood.info> On 02/01/13 10:46, Chris Angelico wrote: > PHP's scoping rules are simpler than Python's. A variable inside a > function is local unless it's explicitly declared global; function > names are global. (That's not the whole set of rules, but close enough > for this argument.) Python, on the other hand, adds the oddity that a > name referenced inside a function is global unless, somewhere in that > function, it's assigned to. As given, comparing only treatment of locals and globals, I don't agree that this makes PHP's scoping rules simpler.

PHP:
  if the name refers to a function:
    - the name is always global;
  otherwise:
    - the name is local unless explicitly declared global.

Python:
  if the name is declared global:
    - the name is always global;
  otherwise:
    - the name is global unless implicitly declared local.

(Implicitly local means "the name is bound somewhere in the body of the function".) Of course, in reality Python includes further complexity: closures and nonlocal, neither of which are available in PHP due to the lack of local functions: http://gadgetopia.com/post/4089 PHP is simpler because it does less. -- Steven From phd at phdru.name Wed Jan 2 04:08:51 2013 From: phd at phdru.name (Oleg Broytman) Date: Wed, 2 Jan 2013 07:08:51 +0400 Subject: [Python-ideas] Documenting Python warts In-Reply-To: References: <20121231000012.GA10426@iskra.aviel.ru> <20130101225505.757540fa@pitrou.net> <189ac58e-6deb-45d8-a239-5c5d8a5594e7@jl13g2000pbb.googlegroups.com> <20130102000113.GB672@iskra.aviel.ru> Message-ID: <20130102030851.GA11279@iskra.aviel.ru> On Tue, Jan 01, 2013 at 03:17:34PM -0800, alex23 wrote: > The pain people are experiencing with "warts" > like mutable defaults is entirely from trying to force Python to fit > mental models they've constructed of other languages. Yes. And preserving this mental model is important. There is a common mental model for similar imperative languages: a common set of built-in types (chars, strings, integers, floats) and containers (arrays and matrices), a common set of operations (addition is always spelled as an infix 'plus' sign, logical AND as '&' or '&&'); there are functions with parameters -- usually written inside round parentheses; in object-oriented languages there are classes with inheritance...
So it's perfectly natural when people using one language expect features found in other languages, and expect those features to work in similar ways. Sure, every particular language deviates from that common model. Often people can tolerate the deviation; sometimes they even praise it for some reason. But when a deviation causes pain for too many developers -- there is certainly a problem. Oleg. -- Oleg Broytman http://phdru.name/ phd at phdru.name Programmers don't die, they just GOSUB without RETURN. From phd at phdru.name Wed Jan 2 04:12:01 2013 From: phd at phdru.name (Oleg Broytman) Date: Wed, 2 Jan 2013 07:12:01 +0400 Subject: [Python-ideas] Documenting Python warts In-Reply-To: References: <20121231000012.GA10426@iskra.aviel.ru> <20130101225505.757540fa@pitrou.net> <189ac58e-6deb-45d8-a239-5c5d8a5594e7@jl13g2000pbb.googlegroups.com> <20130102000113.GB672@iskra.aviel.ru> Message-ID: <20130102031201.GB11279@iskra.aviel.ru> On Wed, Jan 02, 2013 at 11:34:47AM +1100, Chris Angelico wrote: > Whichever way mutable objects in default args are handled, there are > going to be strangenesses. Therefore the best thing to do is (almost > certainly) the simplest. And the simplest thing would be... let me think... forbid mutable defaults altogether? Or maybe make them read-only? The current implementation is the simplest from the implementation point of view, but it requires additional documentation, especially for novice users. Is it really the simplest? Oleg. -- Oleg Broytman http://phdru.name/ phd at phdru.name Programmers don't die, they just GOSUB without RETURN. From phd at phdru.name Wed Jan 2 04:16:16 2013 From: phd at phdru.name (Oleg Broytman) Date: Wed, 2 Jan 2013 07:16:16 +0400 Subject: [Python-ideas] Documenting Python warts In-Reply-To: References: <20121231000012.GA10426@iskra.aviel.ru> <20130101225505.757540fa@pitrou.net> <189ac58e-6deb-45d8-a239-5c5d8a5594e7@jl13g2000pbb.googlegroups.com> <20130102000113.GB672@iskra.aviel.ru> Message-ID: <20130102031616.GC11279@iskra.aviel.ru> On Wed, Jan 02, 2013 at 11:07:58AM +1000, Nick Coghlan wrote: > Mutable default arguments make perfect sense once you > understand the difference between compile time, definition time and > execution time for a function. Defaults are evaluated at definition > time, thus they are necessarily shared across all invocations of the > function. I.e., users have to understand the current implementation. Mutable defaults are not a language design choice, they are dictated by the implementation, right? Oleg. -- Oleg Broytman http://phdru.name/ phd at phdru.name Programmers don't die, they just GOSUB without RETURN. From ncoghlan at gmail.com Wed Jan 2 04:23:42 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 2 Jan 2013 13:23:42 +1000 Subject: [Python-ideas] Documenting Python warts In-Reply-To: <20130101234916.GA672@iskra.aviel.ru> References: <20121231000012.GA10426@iskra.aviel.ru> <20130101225505.757540fa@pitrou.net> <20130101234916.GA672@iskra.aviel.ru> Message-ID: On Wed, Jan 2, 2013 at 9:49 AM, Oleg Broytman wrote: >> I also disagree that missing features are "warts"; they are just >> missing features, not something unpleasant that's difficult to get rid >> of. > > Some of those missing features are near to impossible to get rid of. > The idea of anonymous code blocks is rejected constantly so no one would > dare to create a patch. > As for their unpleasantness -- it's in the eye of the beholder, of > course. I'm not going to fight tooth and nail for my vision.
This is why the "wart" term is an inherently bad choice: it polarises disputes, and creates arguments where none needs to exist. If you instead split them into "hard problems" and "traps for the unwary", it's easier to have a more rational discussion and come up with a shared list. Interoperable asynchronous IO is an inherently hard problem - Guido's probably the only person in the world capable of gathering sufficient interest from the right people to come up with a solution that the existing async frameworks will be willing to support. Packaging and software distribution is an inherently hard problem (all current packaging systems suck, with even the best of them being either language or platform specific), made even harder in the Python case by the presence of an existing 90% solution in setuptools. Anonymous blocks in a language with a strong statement/expression dichotomy are an inherently hard problem (hence the existence of not one but two deferred PEPs on the topic: PEP 403 and 3150). Switch statements in a language without compile time named constants are an inherently hard problem, and some of the demand for this construct is reduced due to the availability of higher-order programming features (i.e. dynamic dispatch to stored callables) (hence the rejected PEPs 275 and 3103). A do/until loop has the problem of coming up with an elegant syntax that is demonstrably superior to while/if/break (hence the deferred PEP 315). The design space for things that Python *could* do is unimaginably vast. The number of changes we can make that won't have the net effect of making the language worse is vanishingly small by comparison. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Wed Jan 2 04:25:35 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 2 Jan 2013 13:25:35 +1000 Subject: [Python-ideas] Documenting Python warts In-Reply-To: <20130102031616.GC11279@iskra.aviel.ru> References: <20121231000012.GA10426@iskra.aviel.ru> <20130101225505.757540fa@pitrou.net> <189ac58e-6deb-45d8-a239-5c5d8a5594e7@jl13g2000pbb.googlegroups.com> <20130102000113.GB672@iskra.aviel.ru> <20130102031616.GC11279@iskra.aviel.ru> Message-ID: On Wed, Jan 2, 2013 at 1:16 PM, Oleg Broytman wrote: > On Wed, Jan 02, 2013 at 11:07:58AM +1000, Nick Coghlan wrote: >> Mutable default arguments make perfect sense once you >> understand the difference between compile time, definition time and >> execution time for a function. Defaults are evaluated at definition >> time, thus they are necessarily shared across all invocations of the >> function. > > I.e., users have to understand the current implementation. Mutable > defaults are not a language design choice, they are dictated by the > implementation, right? No, they're not an implementation accident, they're part of the language design. It's OK if you don't like them, but please stop claiming they're a CPython implementation artifact. Cheers, Nick.
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From rosuav at gmail.com Wed Jan 2 05:22:40 2013 From: rosuav at gmail.com (Chris Angelico) Date: Wed, 2 Jan 2013 15:22:40 +1100 Subject: [Python-ideas] Documenting Python warts In-Reply-To: <50E3859E.8030003@pearwood.info> References: <20121231000012.GA10426@iskra.aviel.ru> <20130101225505.757540fa@pitrou.net> <189ac58e-6deb-45d8-a239-5c5d8a5594e7@jl13g2000pbb.googlegroups.com> <20130102000113.GB672@iskra.aviel.ru> <50E3859E.8030003@pearwood.info> Message-ID: On Wed, Jan 2, 2013 at 11:55 AM, Steven D'Aprano wrote: > * in Python, bindings ALWAYS occur when the code is executed; > > * in Python, "x=" is a binding; > > * even inside a function definition; Hey, that's a cool way of looking at it! I never thought of it that way. So default arguments are simply assigned to right back at function definition time, even though they're locals. Neat! ChrisA From greg.ewing at canterbury.ac.nz Wed Jan 2 05:25:13 2013 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 02 Jan 2013 17:25:13 +1300 Subject: [Python-ideas] Documenting Python warts In-Reply-To: References: <20121231000012.GA10426@iskra.aviel.ru> <20130101225505.757540fa@pitrou.net> <189ac58e-6deb-45d8-a239-5c5d8a5594e7@jl13g2000pbb.googlegroups.com> <20130102000113.GB672@iskra.aviel.ru> Message-ID: <50E3B6A9.70500@canterbury.ac.nz> alex23 wrote: > The "internal sense" of mutable defaults is that everything is an > object, and that functions arguments are declared at definition and > not run-time. What you call "implementation artifact" I see as > expected behaviour; any other implementation that didn't provide this > wouldn't be Python in a number of fundamental ways. What the people who object to this behaviour are really complaining about is not that the default value is mutable, but that the default expression is not re-evaluated on every call. To me, the justification for this is clear: most of the time, evaluation on every call is not necessary, so doing it would be needlessly inefficient. For those cases where you need a fresh value each time, there is a straightforward way to get it. -- Greg From rosuav at gmail.com Wed Jan 2 05:27:07 2013 From: rosuav at gmail.com (Chris Angelico) Date: Wed, 2 Jan 2013 15:27:07 +1100 Subject: [Python-ideas] Documenting Python warts In-Reply-To: References: <20121231000012.GA10426@iskra.aviel.ru> <20130101225505.757540fa@pitrou.net> <189ac58e-6deb-45d8-a239-5c5d8a5594e7@jl13g2000pbb.googlegroups.com> <20130102000113.GB672@iskra.aviel.ru> Message-ID: On Wed, Jan 2, 2013 at 12:07 PM, Nick Coghlan wrote: > FWIW, I prefer the term "traps for the unwary" over "warts", since > it's less judgmental and better covers the goal of issues for people > which can cause problems with learning the language. Sure. I prefer a shorter keyword-like name, but I think we're talking about the same thing here. ChrisA From rosuav at gmail.com Wed Jan 2 05:35:39 2013 From: rosuav at gmail.com (Chris Angelico) Date: Wed, 2 Jan 2013 15:35:39 +1100 Subject: [Python-ideas] Documenting Python warts on Stack Overflow In-Reply-To: <50E394D1.5060908@pearwood.info> References: <20121231000012.GA10426@iskra.aviel.ru> <20130101225505.757540fa@pitrou.net> <189ac58e-6deb-45d8-a239-5c5d8a5594e7@jl13g2000pbb.googlegroups.com> <50E394D1.5060908@pearwood.info> Message-ID: On Wed, Jan 2, 2013 at 1:00 PM, Steven D'Aprano wrote: > On 02/01/13 10:46, Chris Angelico wrote: > >> PHP's scoping rules are simpler than Python's. 
A variable inside a >> function is local unless it's explicitly declared global; function >> names are global. [snip] > As given, comparing only treatment of locals and globals, I don't agree > that this makes PHP's scoping rules simpler. > > PHP: > if the name refers to a function: > - the name is always global; Not quite. Python has the concept of "names" which might be bound to function objects, or might be bound to simple integers. PHP has two completely separate namespaces. The variable $foo and the function foo() don't collide, so this isn't a rule that governs where the name "foo" is looked up. Python has no such distinction, so code like this does exactly what you would expect:

def foo(): pass
bar = foo

def quux():
    bar()  # No assignment in the function, so look for a global name 'bar'.

> PHP is simpler because it does less. Right. And the rules of a Turing tarpit like Ook are even simpler. Further proof that design warts are not, in and of themselves, necessarily bad. ChrisA From solipsis at pitrou.net Wed Jan 2 08:29:01 2013 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 2 Jan 2013 08:29:01 +0100 Subject: [Python-ideas] Documenting Python warts References: <20121231000012.GA10426@iskra.aviel.ru> <20130101225505.757540fa@pitrou.net> <189ac58e-6deb-45d8-a239-5c5d8a5594e7@jl13g2000pbb.googlegroups.com> <20130102000113.GB672@iskra.aviel.ru> <20130102031616.GC11279@iskra.aviel.ru> Message-ID: <20130102082901.2d6a4a63@pitrou.net> On Wed, 2 Jan 2013 13:25:35 +1000 Nick Coghlan wrote: [snip] > No, they're not an implementation accident, they're part of the > language design. It's OK if you don't like them, but please stop > claiming they're a CPython implementation artifact. Let's call them a compromise then, but calling them a language feature sounds delusional. I can't remember ever taking advantage of the fact that mutable default arguments are shared across function invocations. Regards Antoine. From solipsis at pitrou.net Wed Jan 2 08:35:02 2013 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 2 Jan 2013 08:35:02 +0100 Subject: [Python-ideas] Documenting Python warts References: <20121231000012.GA10426@iskra.aviel.ru> <20130101225505.757540fa@pitrou.net> <20130101234916.GA672@iskra.aviel.ru> Message-ID: <20130102083502.7a62bf53@pitrou.net> On Wed, 2 Jan 2013 03:49:16 +0400 Oleg Broytman wrote: > > The problem is you are listing examples which *in your opinion* are > > issues with Python. [snip] > Absolutely not.
I collected the list of examples in reply to a > question "what are warts and why one cannot just document solutions?" I > hope I managed to show that warts are built (or unbuilt, so to say) so > deep in Python and the stdlib design it's impossible to fix them with > code or documentation. Now please stop FUDding. It is outrageous to claim that missing features are "impossible to fix with code or documentation". If you come with a reasonable syntax for anonymous code blocks (and have a patch to back that up), I'm sure they would be accepted. If you can't or don't want to, then you can't accuse our community of being biased against anonymous code blocks. Regards Antoine. From mwm at mired.org Wed Jan 2 07:04:24 2013 From: mwm at mired.org (Mike Meyer) Date: Wed, 02 Jan 2013 00:04:24 -0600 Subject: [Python-ideas] Documenting Python warts In-Reply-To: <20130101234916.GA672@iskra.aviel.ru> References: <20121231000012.GA10426@iskra.aviel.ru> <20130101225505.757540fa@pitrou.net> <20130101234916.GA672@iskra.aviel.ru> Message-ID: Oleg Broytman wrote: > Well, some warts are quite well-known. My counter overflows when I >try to count how many times anonymous code blocks have been proposed >and rejected. > IIRC Mr. van Rossum admitted that for/else was a design mistake. As I recall it, that wasn't because they were a bad idea per se, but because the minor upside they provide isn't worth the confusion they create for newcomers. But since we're referencing the BDFL, IIRC he isn't against anonymous code blocks (and I believe that is by far the most proposed/requested feature) per se. The proposals all seem to fail in one of three ways: 1) embedding them in expressions when indentation denotes block structure just invites unreadable code; 2) putting them in a separate block requires a name, and we already have def if the programmer provides it; or 3) providing an implicit name for a separate block isn't enough of a win to violate "explicit is better than implicit". There have been some let/where type suggestions, but those are more about namespaces than anonymous code blocks. -- Sent from my Android tablet with K-9 Mail. Please excuse my swyping. From rosuav at gmail.com Wed Jan 2 09:12:25 2013 From: rosuav at gmail.com (Chris Angelico) Date: Wed, 2 Jan 2013 19:12:25 +1100 Subject: [Python-ideas] Documenting Python warts In-Reply-To: <20130102082901.2d6a4a63@pitrou.net> References: <20121231000012.GA10426@iskra.aviel.ru> <20130101225505.757540fa@pitrou.net> <189ac58e-6deb-45d8-a239-5c5d8a5594e7@jl13g2000pbb.googlegroups.com> <20130102000113.GB672@iskra.aviel.ru> <20130102031616.GC11279@iskra.aviel.ru> <20130102082901.2d6a4a63@pitrou.net> Message-ID: On Wed, Jan 2, 2013 at 6:29 PM, Antoine Pitrou wrote: > Let's call them a compromise then, but calling them a language feature > sounds delusional. I can't remember ever taking advantage of the fact > that mutable default arguments are shared accross function invocations. One common use is caching, as I mentioned earlier (with a contrived example). Another huge benefit is efficiency - construct a heavy object once and keep using it. There are others. It's a feature that can bite people, but no less a feature for that. 
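For instance, a quick sketch of the caching idiom (contrived, like my earlier example -- the function and cache names here are just for illustration):

def fib(n, _cache={}):
    # the default dict is created once, at definition time, so it
    # persists across calls and serves as a memo table
    if n not in _cache:
        _cache[n] = n if n < 2 else fib(n - 1) + fib(n - 2)
    return _cache[n]

print(fib(100))  # fast, because earlier results are remembered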
ChrisA From tjreedy at udel.edu Wed Jan 2 09:54:43 2013 From: tjreedy at udel.edu (Terry Reedy) Date: Wed, 02 Jan 2013 03:54:43 -0500 Subject: [Python-ideas] Documenting Python warts on Stack Overflow In-Reply-To: <20121231000012.GA10426@iskra.aviel.ru> References: <20121231000012.GA10426@iskra.aviel.ru> Message-ID: On 12/30/2012 7:00 PM, Oleg Broytman wrote: >> A list of such issue without solution doesn't help anyone. > > I cannot say for Anatoly but for me warts are: Another list that to me is off-topic for this list. Go to python-list, which is meant for such things. If you have a (one) specific idea for improving (c)python, that is not an energy-sucking rehash of rejected ideas, then post it. -- Terry Jan Reedy From tjreedy at udel.edu Wed Jan 2 09:58:44 2013 From: tjreedy at udel.edu (Terry Reedy) Date: Wed, 02 Jan 2013 03:58:44 -0500 Subject: [Python-ideas] Documenting Python warts In-Reply-To: <20130102000113.GB672@iskra.aviel.ru> References: <20121231000012.GA10426@iskra.aviel.ru> <20130101225505.757540fa@pitrou.net> <189ac58e-6deb-45d8-a239-5c5d8a5594e7@jl13g2000pbb.googlegroups.com> <20130102000113.GB672@iskra.aviel.ru> Message-ID: On 1/1/2013 7:01 PM, Oleg Broytman wrote: > What about warts that don't have internal sense? Mutable default > parameters are just artifacts of the implementation. What is their > "internal sense"? This has been discussed (asked and answered) several times on python-list. -- Terry Jan Reedy From steve at pearwood.info Wed Jan 2 10:23:26 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Wed, 02 Jan 2013 20:23:26 +1100 Subject: [Python-ideas] Documenting Python warts In-Reply-To: References: <20121231000012.GA10426@iskra.aviel.ru> <20130101225505.757540fa@pitrou.net> <189ac58e-6deb-45d8-a239-5c5d8a5594e7@jl13g2000pbb.googlegroups.com> <20130102000113.GB672@iskra.aviel.ru> Message-ID: <50E3FC8E.4080803@pearwood.info> On 02/01/13 15:27, Chris Angelico wrote: > On Wed, Jan 2, 2013 at 12:07 PM, Nick Coghlan wrote: >> FWIW, I prefer the term "traps for the unwary" over "warts", since >> it's less judgmental and better covers the goal of issues for people >> which can cause problems with learning the language. > > Sure. I prefer a shorter keyword-like name, but I think we're talking > about the same thing here. "Gotcha". Actually I prefer to distinguish between gotchas and warts. A gotcha is something that makes sense and even has a use, but can still surprise those who aren't expecting it. (E.g. mutable defaults.) A wart is something that has no use, but can't (easily, or at all) be removed. Example:

t = (None, [], None)
t[1] += [0]

Even though the list is successfully modified, the operation still fails with an exception. -- Steven From steve at pearwood.info Wed Jan 2 10:27:51 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Wed, 02 Jan 2013 20:27:51 +1100 Subject: [Python-ideas] Documenting Python warts In-Reply-To: References: <20121231000012.GA10426@iskra.aviel.ru> <20130101225505.757540fa@pitrou.net> <20130101234916.GA672@iskra.aviel.ru> Message-ID: <50E3FD97.5020905@pearwood.info> On 02/01/13 17:04, Mike Meyer wrote: > > > Oleg Broytman wrote: >> Well, some warts are quite well-known. My counter overflows when I >> try to count how many times anonymous code blocks have been proposed >> and rejected. >> IIRC Mr. van Rossum admitted that for/else was a design mistake. > > As I recall it, that wasn't because they were a bad idea per se, but >because the minor upside they provide isn't worth the confusion they >create for newcomers.
There would be a lot less confusion if they weren't called "else". Even now, I have to explicitly remind myself that the else block doesn't run "if the for loop is empty", but *after* the for block.

# Python 4000 proposal:
for x in seq:
    ...
then:
    # this is skipped by a break
else:
    # this runs only if seq is empty

-- Steven From steve at pearwood.info Wed Jan 2 10:31:54 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Wed, 02 Jan 2013 20:31:54 +1100 Subject: [Python-ideas] Documenting Python warts In-Reply-To: <20130102082901.2d6a4a63@pitrou.net> References: <20121231000012.GA10426@iskra.aviel.ru> <20130101225505.757540fa@pitrou.net> <189ac58e-6deb-45d8-a239-5c5d8a5594e7@jl13g2000pbb.googlegroups.com> <20130102000113.GB672@iskra.aviel.ru> <20130102031616.GC11279@iskra.aviel.ru> <20130102082901.2d6a4a63@pitrou.net> Message-ID: <50E3FE8A.4020203@pearwood.info> On 02/01/13 18:29, Antoine Pitrou wrote: > On Wed, 2 Jan 2013 13:25:35 +1000 > Nick Coghlan wrote: >> On Wed, Jan 2, 2013 at 1:16 PM, Oleg Broytman wrote: >>> On Wed, Jan 02, 2013 at 11:07:58AM +1000, Nick Coghlan wrote: >>>> Mutable default arguments make perfect sense once you >>>> understand the difference between compile time, definition time and >>>> execution time for a function. Defaults are evaluated at definition >>>> time, thus they are necessarily shared across all invocations of the >>>> function. >>> >>> I.e., users have to understand the current implementation. Mutable >>> defaults are not a language design choice, they are dictated by the >>> implementation, right? >> >> No, they're not an implementation accident, they're part of the >> language design. It's OK if you don't like them, but please stop >> claiming they're a CPython implementation artifact. > > Let's call them a compromise then, but calling them a language feature > sounds delusional. I can't remember ever taking advantage of the fact > that mutable default arguments are shared accross function invocations. I've never taken advantage of multiprocessing. Does that mean that it is "delusional" to call multiprocessing a feature? On the other hand, I have made use of early binding of function defaults, and consider it a good feature of the language. Early binding is not just for mutable defaults. -- Steven From rosuav at gmail.com Wed Jan 2 10:37:51 2013 From: rosuav at gmail.com (Chris Angelico) Date: Wed, 2 Jan 2013 20:37:51 +1100 Subject: [Python-ideas] Documenting Python warts In-Reply-To: <50E3FD97.5020905@pearwood.info> References: <20121231000012.GA10426@iskra.aviel.ru> <20130101225505.757540fa@pitrou.net> <20130101234916.GA672@iskra.aviel.ru> <50E3FD97.5020905@pearwood.info> Message-ID: On Wed, Jan 2, 2013 at 8:27 PM, Steven D'Aprano wrote: > There would be a lot less confusion if they weren't called "else". Even
> now, I have to explicitly remind myself that the else block doesn't
> run "if the for loop is empty", but *after* the for block.
>
> # Python 4000 proposal:
> for x in seq:
>     ...
> then:
>     # this is skipped by a break
> else:
>     # this runs only if seq is empty

Calling it "else" makes perfect sense if you're searching for something.
for x in lst:
    if x.is_what_we_want(): break
else:
    x=thing()
    lst.append(x)

ChrisA From steve at pearwood.info Wed Jan 2 10:49:32 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Wed, 02 Jan 2013 20:49:32 +1100 Subject: [Python-ideas] Documenting Python warts In-Reply-To: References: <20121231000012.GA10426@iskra.aviel.ru> <20130101225505.757540fa@pitrou.net> <20130101234916.GA672@iskra.aviel.ru> <50E3FD97.5020905@pearwood.info> Message-ID: <50E402AC.1090006@pearwood.info> On 02/01/13 20:37, Chris Angelico wrote: > On Wed, Jan 2, 2013 at 8:27 PM, Steven D'Aprano wrote: >> There would be a lot less confusion if they weren't called "else". Even >> now, I have to explicitly remind myself that the else block doesn't >> run "if the for loop is empty", but *after* the for block. >> >> # Python 4000 proposal: >> for x in seq: >> ... >> then: >> # this is skipped by a break >> else: >> # this runs only if seq is empty > > Calling it "else" makes perfect sense if you're searching for something. > > for x in lst: > if x.is_what_we_want(): break > else: > x=thing() > lst.append(x) Not really. The "else" doesn't match the "if", it matches the "for". That's the problem really. Besides, your example is insufficiently general. You can't assume that the "else" immediately follows the "if", let alone the correct if.

for x in lst:
    if x.is_what_we_want():
        break
    do_something()
    and_another_thing()
    if today is Tuesday:
        print("we must be in Belgium")
else:
    x = thing()
    lst.append(x)

So at best it makes *imperfect* sense, sometimes. -- Steven From ben+python at benfinney.id.au Wed Jan 2 10:58:07 2013 From: ben+python at benfinney.id.au (Ben Finney) Date: Wed, 02 Jan 2013 20:58:07 +1100 Subject: [Python-ideas] Documenting Python warts References: <20121231000012.GA10426@iskra.aviel.ru> <20130101225505.757540fa@pitrou.net> <189ac58e-6deb-45d8-a239-5c5d8a5594e7@jl13g2000pbb.googlegroups.com> <20130102000113.GB672@iskra.aviel.ru> Message-ID: <7w4nizucsw.fsf@benfinney.id.au> Nick Coghlan writes: > FWIW, I prefer the term "traps for the unwary" over "warts", since > it's less judgmental and better covers the goal of issues for people > which can cause problems with learning the language. I limit my use of "wart" to traps for the unwary which are acknowledged by most core developers to have been a sub-optimal design decision. They are things one needs to know about Python, the language, which if the designers had their druthers would not have been such a trap -- but now we're stuck with them for backward compatibility or lack of a feasible better design, etc. In other words, I don't call it a "wart" unless the core developers agree with me that it's a wart :-) -- \ "There is no reason anyone would want a computer in their | `\ home." --Ken Olson, president, chairman and founder of Digital | _o__) Equipment Corp., 1977 | Ben Finney From rosuav at gmail.com Wed Jan 2 11:01:57 2013 From: rosuav at gmail.com (Chris Angelico) Date: Wed, 2 Jan 2013 21:01:57 +1100 Subject: [Python-ideas] Documenting Python warts In-Reply-To: <50E402AC.1090006@pearwood.info> References: <20121231000012.GA10426@iskra.aviel.ru> <20130101225505.757540fa@pitrou.net> <20130101234916.GA672@iskra.aviel.ru> <50E3FD97.5020905@pearwood.info> <50E402AC.1090006@pearwood.info> Message-ID: On Wed, Jan 2, 2013 at 8:49 PM, Steven D'Aprano wrote: > On 02/01/13 20:37, Chris Angelico wrote: >> Calling it "else" makes perfect sense if you're searching for something.
>>
>> for x in lst:
>>     if x.is_what_we_want(): break
>> else:
>>     x=thing()
>>     lst.append(x)
>
> Not really. The "else" doesn't match the "if", it matches the "for". That's
> the problem really. Besides, your example is insufficiently general. You can't
> assume that the "else" immediately follows the "if", let alone the correct if.
>
> for x in lst:
>     if x.is_what_we_want():
>         break
>     do_something()
>     and_another_thing()
>     if today is Tuesday:
>         print("we must be in Belgium")
> else:
>     x = thing()
>     lst.append(x)
>
> So at best it makes *imperfect* sense, sometimes.

Thinking functionally, the for loop is searching for an element in the list. It'll either find something (and break) or not find anything (and raise StopIteration). If it finds something, do stuff and break, else do other stuff. The "else" of the logic corresponds to the "else:" clause. Not saying it's always right, but it does at least make some sense in that particular application, which is a reasonably common one. I've coded exactly that logic in C++, using a goto to do a "break and skip the else clause" (with a comment to the effect that I'd rather be writing Python...). ChrisA From wuwei23 at gmail.com Wed Jan 2 11:59:39 2013 From: wuwei23 at gmail.com (alex23) Date: Wed, 2 Jan 2013 02:59:39 -0800 (PST) Subject: [Python-ideas] Documenting Python warts In-Reply-To: <20130102030851.GA11279@iskra.aviel.ru> References: <20121231000012.GA10426@iskra.aviel.ru> <20130101225505.757540fa@pitrou.net> <189ac58e-6deb-45d8-a239-5c5d8a5594e7@jl13g2000pbb.googlegroups.com> <20130102000113.GB672@iskra.aviel.ru> <20130102030851.GA11279@iskra.aviel.ru> Message-ID: <673c2190-c28b-4b16-be9b-fe62ff95b799@r4g2000pbi.googlegroups.com> On Jan 2, 1:08 pm, Oleg Broytman wrote: > So it's perfectly natural when people using one language expect > features found in other languages, and expect those features to work in > similar ways. I think anyone coming from one language to another expecting the latter to be just like the first is either an inexperienced or a bad programmer. There is no way you can make Python fit either the call by reference or call by value models, although people regularly try, and the attempt is always painful & torturous to watch. So already Python has "deviated" drastically from the base expectations of most (generally static-type lang'd) programmers. Is this a problem, or is this one of the fundamental design decisions of Python that makes it appealing? (For me, not having to deal with either of the call by reference or value models is one of the main reasons I prefer to work with Python.) The Lisp/Scheme community might take exception over claims that addition is "always" an infix operation as well. > Often > people can tolerate the deviation, sometimes they even praise it for > some reasons. I don't really follow what you're trying to say here. I'm not "tolerating" any "deviations" in Python, I'm actively using it because I prefer its entire design. If anything, I'm choosing it _because_ it deviates from other languages' approaches. What you seem to be advocating is that all languages be nothing more than syntactic sugar for the same underlying model. In that case, what advantage is there in having any language other than some baseline accepted one, like C?
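P.S. A minimal sketch of the argument-passing point, for anyone who wants to see it rather than argue about it (the names here are my own invention, purely illustrative):

def mutate(target):
    target.append(1)    # mutation through the shared reference is seen by the caller

def rebind(target):
    target = [1, 2, 3]  # rebinding the local name is not seen by the caller

data = []
mutate(data)
print(data)   # [1]       -- looks like "call by reference"
rebind(data)
print(data)   # still [1] -- looks like "call by value"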
From wuwei23 at gmail.com Wed Jan 2 12:05:39 2013 From: wuwei23 at gmail.com (alex23) Date: Wed, 2 Jan 2013 03:05:39 -0800 (PST) Subject: [Python-ideas] Documenting Python warts In-Reply-To: <50E3B6A9.70500@canterbury.ac.nz> References: <20121231000012.GA10426@iskra.aviel.ru> <20130101225505.757540fa@pitrou.net> <189ac58e-6deb-45d8-a239-5c5d8a5594e7@jl13g2000pbb.googlegroups.com> <20130102000113.GB672@iskra.aviel.ru> <50E3B6A9.70500@canterbury.ac.nz> Message-ID: <9864e14d-84ac-42a6-aaf6-ff263daa3426@po6g2000pbb.googlegroups.com> On Jan 2, 2:25 pm, Greg Ewing wrote: > What the people who object to this behaviour are really > complaining about is not that the default value is mutable, > but that the default expression is not re-evaluated on > every call. Sorry, I should have said "mutable arguments" over "defaults", because the problem also bites people passing mutable objects to functions and expecting them to be copied. > To me, the justification for this is clear: most of the > time, evaluation on every call is not necessary, so doing > it would be needlessly inefficient. For those cases where > you need a fresh value each time, there is a straightforward > way to get it. Absolutely agreed. I have deliberately used this behaviour on a number of occasions in ways that I believe makes my code clearer, so it always frustrates me to hear it described as a "wart". From wuwei23 at gmail.com Wed Jan 2 12:08:07 2013 From: wuwei23 at gmail.com (alex23) Date: Wed, 2 Jan 2013 03:08:07 -0800 (PST) Subject: [Python-ideas] Documenting Python warts In-Reply-To: <20130102082901.2d6a4a63@pitrou.net> References: <20121231000012.GA10426@iskra.aviel.ru> <20130101225505.757540fa@pitrou.net> <189ac58e-6deb-45d8-a239-5c5d8a5594e7@jl13g2000pbb.googlegroups.com> <20130102000113.GB672@iskra.aviel.ru> <20130102031616.GC11279@iskra.aviel.ru> <20130102082901.2d6a4a63@pitrou.net> Message-ID: <48801f96-3f54-4821-837c-5156e32169f2@ui9g2000pbc.googlegroups.com> On Jan 2, 5:29 pm, Antoine Pitrou wrote: > Let's call them a compromise then, but calling them a language feature > sounds delusional. I can't remember ever taking advantage of the fact > that mutable default arguments are shared accross function invocations. I'd say it's slightly more delusional to believe that if _you_ haven't used a language feature, that it's not a "feature". From hernan.grecco at gmail.com Wed Jan 2 12:20:50 2013 From: hernan.grecco at gmail.com (Hernan Grecco) Date: Wed, 2 Jan 2013 12:20:50 +0100 Subject: [Python-ideas] Order in the documentation search results In-Reply-To: <50E142FF.3070101@drees.name> References: <50E083BA.7000603@nedbatchelder.com> <50E142FF.3070101@drees.name> Message-ID: Hi, Thanks for all the feedback. I was hacking the sphinx indexer and the javacript searchtool today. I think the search results can be improved by patching sphinx upstream and adding a small project dependent (in this case Python) javascript snippet. I have created a proposal in the Sphinx Issue tracker [0]. Let's move the discussion there. best, Hernan [0] https://bitbucket.org/birkenfeld/sphinx/issue/1067/better-search-results On Mon, Dec 31, 2012 at 8:47 AM, Stefan Drees wrote: > On 30.12.12 20:45, Georg Brandl wrote: >> On 12/30/2012 07:11 PM, Ned Batchelder wrote: >>> On 12/30/2012 12:54 PM, Hernan Grecco wrote: >>>> ... >>>> >>>> I have seen many people new to Python stumbling while using the Python >>>> docs due to the order of the search results. >>>> ... >>>> >>>> So my suggestion is to put the builtins first, the rest of the >>>> standard lib later including HowTos, FAQ, etc and finally the >>>> c-modules. Additionally, a section with a title matching exactly the >>>> search query should come first. (I am not sure if the last suggestion >>>> belongs in python-ideas or in >>>> the sphinx mailing list, please advice) >>> >>> While we're on the topic, why in this day and age do we have a custom >>> search? Using google site search would be faster for the user, and more >>> accurate. >> >> I agree. Someone needs to propose a patch though. >> ... > > a custom search in itself is a wonderful thing. To me it also shows more > appreciation of visitor concerns than thoses sites, that are just _offering_ > google site search (which is accessible anyway to every visitor capable of > memorizing the google or bing or whatnot URL). > > I second Hernans suggestion about ordering and also his question where the > request (and patches) should be directed to. > > All the best, > Stefan. > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas From phd at phdru.name Wed Jan 2 12:35:45 2013 From: phd at phdru.name (Oleg Broytman) Date: Wed, 2 Jan 2013 15:35:45 +0400 Subject: [Python-ideas] Documenting Python warts In-Reply-To: <673c2190-c28b-4b16-be9b-fe62ff95b799@r4g2000pbi.googlegroups.com> References: <20121231000012.GA10426@iskra.aviel.ru> <20130101225505.757540fa@pitrou.net> <189ac58e-6deb-45d8-a239-5c5d8a5594e7@jl13g2000pbb.googlegroups.com> <20130102000113.GB672@iskra.aviel.ru> <20130102030851.GA11279@iskra.aviel.ru> <673c2190-c28b-4b16-be9b-fe62ff95b799@r4g2000pbi.googlegroups.com> Message-ID: <20130102113545.GA23780@iskra.aviel.ru> On Wed, Jan 02, 2013 at 02:59:39AM -0800, alex23 wrote: > What you seem to be > advocating is that all languages be nothing more than syntactic sugar > for the same underlying model. Yes, von Neumann architecture. > In that case, what advantage is there > in having any language other than some baseline accepted one, like C? So now we know why C is still the most popular language. Other languages have their advantages, though. Their syntactic sugar is sweeter or has a different taste. Oleg. -- Oleg Broytman http://phdru.name/ phd at phdru.name Programmers don't die, they just GOSUB without RETURN. From ncoghlan at gmail.com Wed Jan 2 12:41:46 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 2 Jan 2013 21:41:46 +1000 Subject: [Python-ideas] Updated PEP 432: Simplifying the CPython update sequence Message-ID: Gah, the PEP number in the subject should, of course, be 432 (not 342). Cheers, Nick.
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From shane at umbrellacode.com Wed Jan 2 12:52:27 2013 From: shane at umbrellacode.com (Shane Green) Date: Wed, 2 Jan 2013 03:52:27 -0800 Subject: [Python-ideas] Documenting Python warts In-Reply-To: <48801f96-3f54-4821-837c-5156e32169f2@ui9g2000pbc.googlegroups.com> References: <20121231000012.GA10426@iskra.aviel.ru> <20130101225505.757540fa@pitrou.net> <189ac58e-6deb-45d8-a239-5c5d8a5594e7@jl13g2000pbb.googlegroups.com> <20130102000113.GB672@iskra.aviel.ru> <20130102031616.GC11279@iskra.aviel.ru> <20130102082901.2d6a4a63@pitrou.net> <48801f96-3f54-4821-837c-5156e32169f2@ui9g2000pbc.googlegroups.com> Message-ID: <03A4CAE9-4FE5-45A3-8EC9-A72BD4915985@umbrellacode.com> RE: > I can't remember ever taking advantage of the fact > that mutable default arguments are shared accross function invocations. Can you remember taking advantage of the fact Python is logical, consistent, and elegant? I tend to think its lack of syntactic sugar and exceptions sets it apart. Although there are sometimes things that could bite you, there's a lot of value in having those things be perfectly predictable, like having default argument values evaluated once, when the function declaration is evaluated. To do it any other way would introduce an unnecessary "except when" into the explanation of Python. Shane Green www.umbrellacode.com 805-452-9666 | shane at umbrellacode.com On Jan 2, 2013, at 3:08 AM, alex23 wrote: > On Jan 2, 5:29 pm, Antoine Pitrou wrote: >> Let's call them a compromise then, but calling them a language feature >> sounds delusional. I can't remember ever taking advantage of the fact >> that mutable default arguments are shared accross function invocations. > > I'd say it's slightly more delusional to believe that if _you_ haven't > used a language feature, that it's not a "feature". > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas -------------- next part -------------- An HTML attachment was scrubbed... URL: From maxmoroz at gmail.com Wed Jan 2 13:06:14 2013 From: maxmoroz at gmail.com (Max Moroz) Date: Wed, 2 Jan 2013 04:06:14 -0800 Subject: [Python-ideas] Preventing out of memory conditions In-Reply-To: References: Message-ID: On Mon, Dec 31, 2012 at 7:22 PM, Gregory P. Smith wrote: > Within CPython the way the C API is today it is too late by the time the > code to raise a MemoryError has been called so capturing all places that > could occur is not easy. > Implementing this at the C level malloc later makes > more sense. Have it dip into a reserved low memory pool to satisfy the > current request and send the process a signal indicating it is running low. > This approach would also work with C extension modules or an embedded > Python. Regarding the C malloc solution, wouldn't a callback be preferable to a signal? If I understood you correctly, signal implies that a different thread will handle it. At any reasonable size of the emergency memory pool, there will be situations when the next memory allocation is greater than that size, leading to the very same problem you described later in your message when you talked about the disadvantage of polling. In addition, if the signal processing happens a bit slowly (perhaps simply due to the thread scheduler being slow to switch), by the time enough memory is released, it may be too late - the next memory allocation may have already come in.
Unless I'm missing something, the (synchronous) callback seems to be strictly better than the (asynchronous) signal. As to your main point that this functionality should be inside C malloc rather than pymalloc, I agree, but only if the objective is to provide an all-purpose, highly general "low memory condition" handling. (I'm not sure if malloc knows enough about the OS to define "low memory condition" well; but it's certain that pymalloc doesn't). But I was going for a more modest goal. Rather than being warned of the pending MemoryError exception, a developer could simply be notified via callback when the maximum absolute memory used by his app exceeds a certain limit. pymalloc could very easily call back a designated function when the next memory allocation exceeds this threshold. In many real-life situations, it's not that hard to estimate how much RAM the application should be allowed to consume. Sure, the developer would need to learn a little about the platforms his app is running on, and use OS-specific rules to set the memory limit, but that effort is modest, and the payoff is huge. Not to mention, a developer with particularly technically savvy end users could even skip this work entirely by letting his end users set the memory limit per-session. There is a huge advantage of the pymalloc solution (with the set memory limit) vs. the C malloc solution (with the generic low memory condition). On my system, I don't want the application to use (almost) all the available memory before it starts to manage its cache. In fact, by the time the physical memory use approaches my total physical RAM, the system slows down considerably as many other applications get swapped to disk by the OS. With a set memory limit, I can provide much more granular control over the memory used by the application. Of course, the set memory limit could also be implemented inside C malloc rather than inside pymalloc. But this requires that developers rewrite C runtime's memory manager on every platform, and then recompile their Python with it. The changes to pymalloc, on the other hand, would be relatively small. > I'd expect this already exists but I haven't looked for one. All I found is this comment in XEmacs documentation about vm-limit.c: http://www.xemacs.org/Documentation/21.5/html/internals_17.html, but I'm not sure if it's an XEmacs feature or if malloc itself supports it. > Having a thread polling memory use it not generally wise as that is polling > rather than event driven and could easily miss low memory situations before > it is too late and a failure has already happened (allocation demand can > come in large spikes depending on the application). Precisely. That's the problem with the best existing solutions (e.g., http://stackoverflow.com/a/7332782/336527). > OSes running processes in constrained environments or ones where the > resources available can be reduced by the OS later may already send their > own warning signals prior to outright killing the process but that should > not preclude an application being able to monitor and constrain itself on > its own without needing the OS to do it. I was thinking about a regular desktop OS, which certainly doesn't warn the process sufficiently in advance. The MemoryError exception basically tells the process that it's going to die soon, and there's nothing it can do about it.
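To make the shape of what I'm proposing concrete, here is a rough sketch of how it might look from Python code. The names are invented purely for illustration -- no such API exists in any Python release:

import sys

big_cache = {}  # stand-in for whatever large data the app can afford to drop

def on_low_memory(requested, in_use):
    # hypothetically called by pymalloc, synchronously, just before an
    # allocation that would push usage past the configured limit
    big_cache.clear()
    return True  # True = "I freed something, go ahead and retry"

# hypothetical registration call: limit in bytes, plus the callback above
sys.set_memory_limit(2 * 1024**3, callback=on_low_memory)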
Max From stefan at drees.name Wed Jan 2 13:37:35 2013 From: stefan at drees.name (Stefan Drees) Date: Wed, 02 Jan 2013 13:37:35 +0100 Subject: [Python-ideas] Order in the documentation search results In-Reply-To: References: <50E083BA.7000603@nedbatchelder.com> <50E142FF.3070101@drees.name> Message-ID: <50E42A0F.2040908@drees.name> Hi hernan, On 02.01.13 12:20, Hernan Grecco wrote: > ... Thanks for all the feedback. I was hacking the sphinx indexer and the > javacript searchtool today. I think the search results can be improved > by patching sphinx upstream and adding a small project dependent (in > this case Python) javascript snippet. I have created a proposal in the > Sphinx Issue tracker [0]. Let's move the discussion there. > ... > [0] https://bitbucket.org/birkenfeld/sphinx/issue/1067/better-search-results thanks a lot for transforming the mail thread to improve the local search facility into real code suggestions. I commented on a first snippet from your suggested patch there. All the best, Stefan. Further historic details: > > On Mon, Dec 31, 2012 at 8:47 AM, Stefan Drees wrote: >> On 30.12.12 20:45, Georg Brandl wrote: >>> On 12/30/2012 07:11 PM, Ned Batchelder wrote: >>>> On 12/30/2012 12:54 PM, Hernan Grecco wrote: >>>>> ... >>>>> >>>>> I have seen many people new to Python stumbling while using the Python >>>>> docs due to the order of the search results. >>>>> ... >>>>> >>>>> So my suggestion is to put the builtins first, the rest of the >>>>> standard lib later including HowTos, FAQ, etc and finally the >>>>> c-modules. Additionally, a section with a title matching exactly the >>>>> search query should come first. (I am not sure if the last suggestion >>>>> belongs in python-ideas or in >>>>> the sphinx mailing list, please advice) >>>> >>>> >>>> While we're on the topic, why in this day and age do we have a custom >>>> search? Using google site search would be faster for the user, and more >>>> accurate. >>> >>> >>> I agree. Someone needs to propose a patch though. >>> ... >> >> >> a custom search in itself is a wonderful thing. To me it also shows more >> appreciation of visitor concerns than thoses sites, that are just _offering_ >> google site search (which is accessible anyway to every visitor capable of >> memorizing the google or bing or whatnot URL). >> >> I second Hernans suggestion about ordering and also his question where the >> request (and patches) should be directed to. >> ... From phd at phdru.name Wed Jan 2 14:39:21 2013 From: phd at phdru.name (Oleg Broytman) Date: Wed, 2 Jan 2013 17:39:21 +0400 Subject: [Python-ideas] Documenting Python warts In-Reply-To: <20130102072928.52867fd3@bhuda.mired.org> References: <20121231000012.GA10426@iskra.aviel.ru> <20130101225505.757540fa@pitrou.net> <189ac58e-6deb-45d8-a239-5c5d8a5594e7@jl13g2000pbb.googlegroups.com> <20130102000113.GB672@iskra.aviel.ru> <20130102030851.GA11279@iskra.aviel.ru> <673c2190-c28b-4b16-be9b-fe62ff95b799@r4g2000pbi.googlegroups.com> <20130102113545.GA23780@iskra.aviel.ru> <20130102072928.52867fd3@bhuda.mired.org> Message-ID: <20130102133921.GA25253@iskra.aviel.ru> On Wed, Jan 02, 2013 at 07:29:28AM -0600, Mike Meyer wrote: > On Wed, 2 Jan 2013 15:35:45 +0400 > Oleg Broytman wrote: > > On Wed, Jan 02, 2013 at 02:59:39AM -0800, alex23 wrote: > > > What you seem to be > > > advocating is that all languages be nothing more than syntactic sugar > > > for the same underlying model. > > Yes, von Neumann architecture. > > So all the differences between FORTRAN II, PROLOG and Python are > syntactic sugar? 
I guess that makes preference in programming > languages just a matter of taste. In the original message I used the word "imperative". I am crawling off of the discussion to my cave. Oleg. -- Oleg Broytman http://phdru.name/ phd at phdru.name Programmers don't die, they just GOSUB without RETURN. From solipsis at pitrou.net Wed Jan 2 14:47:16 2013 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 2 Jan 2013 14:47:16 +0100 Subject: [Python-ideas] Documenting Python warts References: <20121231000012.GA10426@iskra.aviel.ru> <20130101225505.757540fa@pitrou.net> <189ac58e-6deb-45d8-a239-5c5d8a5594e7@jl13g2000pbb.googlegroups.com> <20130102000113.GB672@iskra.aviel.ru> <20130102031616.GC11279@iskra.aviel.ru> <20130102082901.2d6a4a63@pitrou.net> <50E3FE8A.4020203@pearwood.info> Message-ID: <20130102144716.0fffa7eb@pitrou.net> On Wed, 02 Jan 2013 20:31:54 +1100, Steven D'Aprano wrote: > >>> > >>> I.e., users have to understand the current implementation. > >>> Mutable defaults are not a language design choice, they are > >>> dictated by the implementation, right? > >> > >> No, they're not an implementation accident, they're part of the > >> language design. It's OK if you don't like them, but please stop > >> claiming they're a CPython implementation artifact. > > > > Let's call them a compromise then, but calling them a language > > feature sounds delusional. I can't remember ever taking advantage > > of the fact that mutable default arguments are shared accross > > function invocations. > > I've never taken advantage of multiprocessing. Does that mean that it > is "delusional" to call multiprocessing a feature? multiprocessing fills a definite use case (and quite an important one). Early binding of function arguments fills no use case that cannot also be filled using a private global, a closure, or a class or function attribute; at best it only saves one or two lines of typing. Regards Antoine. From storchaka at gmail.com Wed Jan 2 15:01:29 2013 From: storchaka at gmail.com (Serhiy Storchaka) Date: Wed, 02 Jan 2013 16:01:29 +0200 Subject: [Python-ideas] Identity dicts and sets Message-ID: I propose to add new standard collection types: IdentityDict and IdentitySet. They are almost same as ordinal dict and set, but uses identity check instead of equality check (and id() or hash(id()) as a hash). They will be useful for pickling, for implementing __sizeof__() for compound types, and for other graph algorithms. Of course, they can be implemented using ordinal dicts:

IdentityDict: key -> value as a dict: id(key) -> (key, value)
IdentitySet as a dict: id(value) -> value

However implementing them directly in the core has advantages, it consumes less memory and time, and more comfortable for use from C. IdentityDict and IdentitySet implementations will share almost all code with implementations of ordinal dict and set, only lookup function and metainformation will be different. However dict and set already use a lookup function overloading.
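To be explicit about the intended semantics, a minimal pure-Python sketch (the real core implementation would share the dict internals instead of wrapping them like this):

class IdentityDict:
    """Sketch: a mapping keyed by object identity, not equality."""
    def __init__(self):
        self._data = {}  # id(key) -> (key, value); storing key keeps it alive

    def __setitem__(self, key, value):
        self._data[id(key)] = (key, value)

    def __getitem__(self, key):
        return self._data[id(key)][1]

    def __delitem__(self, key):
        del self._data[id(key)]

    def __contains__(self, key):
        return id(key) in self._data

    def __len__(self):
        return len(self._data)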
From fuzzyman at gmail.com Wed Jan 2 14:58:16 2013 From: fuzzyman at gmail.com (Michael Foord) Date: Wed, 2 Jan 2013 13:58:16 +0000 Subject: [Python-ideas] Documenting Python warts In-Reply-To: References: <20121231000012.GA10426@iskra.aviel.ru> <20130101225505.757540fa@pitrou.net> <189ac58e-6deb-45d8-a239-5c5d8a5594e7@jl13g2000pbb.googlegroups.com> <20130102000113.GB672@iskra.aviel.ru> <20130102031616.GC11279@iskra.aviel.ru> <20130102082901.2d6a4a63@pitrou.net> Message-ID: On 2 January 2013 08:12, Chris Angelico wrote: > On Wed, Jan 2, 2013 at 6:29 PM, Antoine Pitrou > wrote: > > Let's call them a compromise then, but calling them a language feature > > sounds delusional. I can't remember ever taking advantage of the fact > > that mutable default arguments are shared accross function invocations. > > One common use is caching, as I mentioned earlier (with a contrived > example). Another huge benefit is efficiency - construct a heavy > object once and keep using it. There are others. > > It's a feature that can bite people, but no less a feature for that. > A further (and important) use case is introspection. If default values were only added at call time (rather than definition time) then you couldn't introspect the default value - so documentation tools (and other tools) couldn't have access to them. Added to which, "evaluation at call time" has its own unexpected and weird behaviour. Consider:

x = 3

def fun(a=x):
    pass

del x

With evaluation at call time this code fails - and indeed any *re-binding* of x in the definition scope (at any subsequent time - possibly far removed from the function definition) affects the function. So default values being bound at definition time have advantages for efficiency and introspection, they have use cases for caching, and it removes some unexpected behaviour. It's definitely a language feature. All the best, Michael > > ChrisA > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -- http://www.voidspace.org.uk/ May you do good and not evil May you find forgiveness for yourself and forgive others May you share freely, never taking more than you give. -- the sqlite blessing http://www.sqlite.org/different.html -------------- next part -------------- An HTML attachment was scrubbed... URL: From stephen at xemacs.org Wed Jan 2 15:59:29 2013 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Wed, 02 Jan 2013 23:59:29 +0900 Subject: [Python-ideas] Preventing out of memory conditions In-Reply-To: References: Message-ID: <87a9srd41a.fsf@uwakimon.sk.tsukuba.ac.jp> Max Moroz writes: > All I found is this comment in XEmacs documentation about vm-limit.c: > http://www.xemacs.org/Documentation/21.5/html/internals_17.html, but > I'm not sure if it's XEmacs feature or if malloc itself supports > it. It's an XEmacs feature. Works for me (but then it would, wouldn't it ;-). The implementation is just generic C, except for the macros that are used to access the LISP arena's bounds. It uses standard functions like getrlimit where available, otherwise it just uses the end of the address space to determine the available amount of memory. I can't vouch for accuracy or efficiency in determining usage (which is why I didn't bring it up myself), but there hasn't been a complaint about its functionality since I've been consistently reading the lists (1997).
https://bitbucket.org/xemacs/xemacs-beta/src/c65b0329894b09c08423739508d277548a0b1a00/src/vm-limit.c?at=default https://bitbucket.org/xemacs/xemacs-beta/src/c65b0329894b09c08423739508d277548a0b1a00/src/mem-limits.h?at=default From guido at python.org Wed Jan 2 17:16:52 2013 From: guido at python.org (Guido van Rossum) Date: Wed, 2 Jan 2013 09:16:52 -0700 Subject: [Python-ideas] Please stop discussing warts here Message-ID: I have just had to mute two threads where people were trying to convince each other that a certain language feature is/isn't a wart. This form of educational debate belongs in python-list, not here, please. --Guido van Rossum (sent from Android phone) -------------- next part -------------- An HTML attachment was scrubbed... URL: From solipsis at pitrou.net Wed Jan 2 20:34:31 2013 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 2 Jan 2013 20:34:31 +0100 Subject: [Python-ideas] Identity dicts and sets References: Message-ID: <20130102203431.7b575019@pitrou.net> On Wed, 02 Jan 2013 16:01:29 +0200 Serhiy Storchaka wrote: > I propose to add new standard collection types: IdentityDict and > IdentitySet. They are almost same as ordinal dict and set, but uses > identity check instead of equality check (and id() or hash(id()) as a > hash). They will be useful for pickling, for implementing __sizeof__() > for compound types, and for other graph algorithms. > > Of course, they can be implemented using ordinal dicts: > > IdentityDict: key -> value as a dict: id(key) -> (key, value) > IdentitySet as a dict: id(value) -> value > > However implementing them directly in the core has advantages, it > consumes less memory and time, and more comfortable for use from C. > IdentityDict and IdentitySet implementations will share almost all code > with implementations of ordinal dict and set, only lookup function and > metainformation will be different. However dict and set already use a > lookup function overloading. I'm ok with this proposal. Regards Antoine. From eliben at gmail.com Wed Jan 2 20:43:47 2013 From: eliben at gmail.com (Eli Bendersky) Date: Wed, 2 Jan 2013 11:43:47 -0800 Subject: [Python-ideas] Identity dicts and sets In-Reply-To: References: Message-ID: On Wed, Jan 2, 2013 at 6:01 AM, Serhiy Storchaka wrote: > I propose to add new standard collection types: IdentityDict and > IdentitySet. They are almost same as ordinal dict and set, but uses > identity check instead of equality check (and id() or hash(id()) as a > hash). They will be useful for pickling, for implementing __sizeof__() for > compound types, and for other graph algorithms. > > Of course, they can be implemented using ordinal dicts: > > IdentityDict: key -> value as a dict: id(key) -> (key, value) > IdentitySet as a dict: id(value) -> value > > However implementing them directly in the core has advantages, it consumes > less memory and time, and more comfortable for use from C. IdentityDict and > IdentitySet implementations will share almost all code with implementations > of ordinal dict and set, only lookup function and metainformation will be > different. However dict and set already use a lookup function overloading. > > I agree that the data structures may be useful, but is there no way to some allow the customization of existing data structures instead, without losing performance? It's a shame to have another kind of dict just for this purpose. Eli -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From solipsis at pitrou.net Wed Jan 2 21:03:48 2013 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 2 Jan 2013 21:03:48 +0100 Subject: [Python-ideas] Identity dicts and sets References: Message-ID: <20130102210348.2ae0a985@pitrou.net> On Wed, 2 Jan 2013 11:43:47 -0800 Eli Bendersky wrote: > On Wed, Jan 2, 2013 at 6:01 AM, Serhiy Storchaka wrote: > > > I propose to add new standard collection types: IdentityDict and > > IdentitySet. They are almost same as ordinal dict and set, but uses > > identity check instead of equality check (and id() or hash(id()) as a > > hash). They will be useful for pickling, for implementing __sizeof__() for > > compound types, and for other graph algorithms. > > > > Of course, they can be implemented using ordinal dicts: > > > > IdentityDict: key -> value as a dict: id(key) -> (key, value) > > IdentitySet as a dict: id(value) -> value > > > > However implementing them directly in the core has advantages, it consumes > > less memory and time, and more comfortable for use from C. IdentityDict and > > IdentitySet implementations will share almost all code with implementations > > of ordinal dict and set, only lookup function and metainformation will be > > different. However dict and set already use a lookup function overloading. > > > > > I agree that the data structures may be useful, but is there no way to some > allow the customization of existing data structures instead, without losing > performance? It's a shame to have another kind of dict just for this > purpose. The implementation kind of already exists in _pickle.c, IIRC (it's used for the memo dict). Regards Antoine. From storchaka at gmail.com Wed Jan 2 21:34:18 2013 From: storchaka at gmail.com (Serhiy Storchaka) Date: Wed, 2 Jan 2013 22:34:18 +0200 Subject: [Python-ideas] Identity dicts and sets Message-ID: <201301022234.18839.storchaka@gmail.com> On Wednesday 02 January 2013 21:43:47, Eli Bendersky wrote: > I agree that the data structures may be useful, but is there no way to some > allow the customization of existing data structures instead, without losing > performance? It's a shame to have another kind of dict just for this > purpose. What interface for the customization is possible? Obviously, a dict constructor can't have a special keyword argument. From mwm at mired.org Wed Jan 2 21:37:12 2013 From: mwm at mired.org (Mike Meyer) Date: Wed, 2 Jan 2013 14:37:12 -0600 Subject: [Python-ideas] Identity dicts and sets In-Reply-To: <201301022234.18839.storchaka@gmail.com> References: <201301022234.18839.storchaka@gmail.com> Message-ID: On Wed, Jan 2, 2013 at 2:34 PM, Serhiy Storchaka wrote: > On Wednesday 02 January 2013 21:43:47, Eli Bendersky wrote: >> I agree that the data structures may be useful, but is there no way to some >> allow the customization of existing data structures instead, without losing >> performance? It's a shame to have another kind of dict just for this >> purpose. >> What interface for the customization is possible? Obviously, a dict > constructor can't have a special keyword argument. How about a set_key method? It takes a single callable as an argument. You'd get your behavior with dict.set_key(id). If called when the dict is non-empty, it should throw an exception. References: <201301022234.18839.storchaka@gmail.com> Message-ID: Something curried? custom_dict(cfg=...)(key1=..., key2=...) On Thu, Jan 3, 2013 at 4:37 AM, Mike Meyer wrote: > On Wed, Jan 2, 2013 at 2:34 PM, Serhiy Storchaka > wrote: > > On Wednesday 02 January 2013 21:43:47, Eli Bendersky wrote: > >> I agree that the data structures may be useful, but is there no way to > some > >> allow the customization of existing data structures instead, without > losing > >> performance? It's a shame to have another kind of dict just for this > >> purpose. > >> What interface for the customization is possible? Obviously, a dict > > constructor can't have a special keyword argument. > > How about a set_key method? It takes a single callable as an argument. > You'd get your behavior with dict.set_key(id). If called when the dict > is non-empty, it should throw an exception. > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -------------- next part -------------- An HTML attachment was scrubbed... URL: From zuo at chopin.edu.pl Wed Jan 2 22:04:52 2013 From: zuo at chopin.edu.pl (Jan Kaliszewski) Date: Wed, 02 Jan 2013 22:04:52 +0100 Subject: [Python-ideas] Odp: Identity dicts and sets Message-ID: <20130102210457.041ED2F587@filifionka.chopin.edu.pl> Eg.: custom_dict(set_key=..., missing=...) -> a new dict subclass -------------- next part -------------- An HTML attachment was scrubbed... URL: From masklinn at masklinn.net Wed Jan 2 22:13:57 2013 From: masklinn at masklinn.net (Masklinn) Date: Wed, 2 Jan 2013 22:13:57 +0100 Subject: [Python-ideas] Identity dicts and sets In-Reply-To: References: <201301022234.18839.storchaka@gmail.com> Message-ID: <9A91AF30-F0B3-441E-996C-F502291C1F35@masklinn.net> On 2013-01-02, at 21:37 , Mike Meyer wrote: > On Wed, Jan 2, 2013 at 2:34 PM, Serhiy Storchaka wrote: >> On Wednesday 02 January 2013 21:43:47, Eli Bendersky wrote: >>> I agree that the data structures may be useful, but is there no way to some >>> allow the customization of existing data structures instead, without losing >>> performance? It's a shame to have another kind of dict just for this >>> purpose. >>> What interface for the customization is possible? Obviously, a dict >> constructor can't have a special keyword argument. > > How about a set_key method? It takes a single callable as an argument. > You'd get your behavior with dict.set_key(id). If called when the dict > is non-empty, it should throw an exception. Wouldn't it make more sense to provide e.g. collections.KeyedDictionary(key, seq, **kwargs)? It would be clear and would allow implementations to provide dedicated implementations for special cases (such as key=id) if desired or necessary. defaultdict already follows this pattern, so there's a precedent. From bruce at leapyear.org Wed Jan 2 22:33:30 2013 From: bruce at leapyear.org (Bruce Leban) Date: Wed, 2 Jan 2013 13:33:30 -0800 Subject: [Python-ideas] Identity dicts and sets In-Reply-To: <9A91AF30-F0B3-441E-996C-F502291C1F35@masklinn.net> References: <201301022234.18839.storchaka@gmail.com> <9A91AF30-F0B3-441E-996C-F502291C1F35@masklinn.net> Message-ID: On Wed, Jan 2, 2013 at 1:13 PM, Masklinn wrote: > > Wouldn't it make more sense to provide e.g. > collections.KeyedDictionary(key, seq, **kwargs)? It would be clear > and would allow implementations to provide dedicated implementations for > special cases (such as key=id) if desired or necessary. > > defaultdict already follows this pattern, so there's a precedent. I agree collections is the place to put it but that would give us three specialized subclasses of dictionary which cannot be combined.
That is, I can have a dictionary with a default, one that is ordered or one that uses a key function but not any combination of those. It would seem better to have something like Haoyi Li suggested:

collections.Dictionary(default=None, ordered=False, key=None) --> a dict subclass

of course collections.OrderedDictionary and collections.defaultdict would continue to be available as appropriate aliases to collections.Dictionary. --- Bruce Check it out: http://kck.st/YeqGxQ -------------- next part -------------- An HTML attachment was scrubbed... URL: From shibturn at gmail.com Wed Jan 2 22:35:30 2013 From: shibturn at gmail.com (Richard Oudkerk) Date: Wed, 02 Jan 2013 21:35:30 +0000 Subject: [Python-ideas] Identity dicts and sets In-Reply-To: References: <201301022234.18839.storchaka@gmail.com> Message-ID: On 02/01/2013 8:37pm, Mike Meyer wrote: > How about a set_key method? It takes a single callable as an argument. > You'd get your behavior with dict.set_key(id). If called when the dict > is non-empty, it should throw an exception. Wouldn't you need to specify a hash function at the same time? -- Richard From greg.ewing at canterbury.ac.nz Wed Jan 2 21:58:56 2013 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 03 Jan 2013 09:58:56 +1300 Subject: [Python-ideas] Documenting Python warts In-Reply-To: <673c2190-c28b-4b16-be9b-fe62ff95b799@r4g2000pbi.googlegroups.com> References: <20121231000012.GA10426@iskra.aviel.ru> <20130101225505.757540fa@pitrou.net> <189ac58e-6deb-45d8-a239-5c5d8a5594e7@jl13g2000pbb.googlegroups.com> <20130102000113.GB672@iskra.aviel.ru> <20130102030851.GA11279@iskra.aviel.ru> <673c2190-c28b-4b16-be9b-fe62ff95b799@r4g2000pbi.googlegroups.com> Message-ID: <50E49F90.9080005@canterbury.ac.nz> alex23 wrote: > There is no way you can make Python fit either the call by reference > or call by value models, although people regularly try, No, what happens is that different people have different ideas about what those terms mean, and they talk past each other. So they've become useless nowadays, and are best avoided altogether unless you want to start a month-long argument. -- Greg From tjreedy at udel.edu Wed Jan 2 23:48:26 2013 From: tjreedy at udel.edu (Terry Reedy) Date: Wed, 02 Jan 2013 17:48:26 -0500 Subject: [Python-ideas] Identity dicts and sets In-Reply-To: References: Message-ID: On 1/2/2013 9:01 AM, Serhiy Storchaka wrote: > I propose to add new standard collection types: IdentityDict and > IdentitySet. They are almost same as ordinal dict and set, but uses What do you mean by ordinal dict, as opposed to plain dict. > identity check instead of equality check (and id() or hash(id()) as a By default, equality check is identity check. > hash). They will be useful for pickling, for implementing __sizeof__() > for compound types, and for other graph algorithms. I don't know anything about pickling or __sizeof__, but if one uses user-defined classes for nodes and edges, equality is identity, so I don't see what would be gained. The disadvantage of multiple minor variations on dict is confusion among users as to specific properties and use cases.
-- Terry Jan Reedy From mwm at mired.org Wed Jan 2 14:29:28 2013 From: mwm at mired.org (Mike Meyer) Date: Wed, 2 Jan 2013 07:29:28 -0600 Subject: [Python-ideas] Documenting Python warts In-Reply-To: <20130102113545.GA23780@iskra.aviel.ru> References: <20121231000012.GA10426@iskra.aviel.ru> <20130101225505.757540fa@pitrou.net> <189ac58e-6deb-45d8-a239-5c5d8a5594e7@jl13g2000pbb.googlegroups.com> <20130102000113.GB672@iskra.aviel.ru> <20130102030851.GA11279@iskra.aviel.ru> <673c2190-c28b-4b16-be9b-fe62ff95b799@r4g2000pbi.googlegroups.com> <20130102113545.GA23780@iskra.aviel.ru> Message-ID: <20130102072928.52867fd3@bhuda.mired.org> On Wed, 2 Jan 2013 15:35:45 +0400 Oleg Broytman wrote: > On Wed, Jan 02, 2013 at 02:59:39AM -0800, alex23 wrote: > > What you seem to be > > advocating is that all languages be nothing more than syntactic sugar > > for the same underlying model. > Yes, von Neumann architecture. So all the differences between FORTRAN II, PROLOG and Python are syntactic sugar? I guess that makes preference in programming languages just a matter of taste. http://www.mired.org/ Independent Software developer/SCM consultant, email for more information. O< ascii ribbon campaign - stop html mail - www.asciiribbon.org From jimjjewett at gmail.com Thu Jan 3 03:13:47 2013 From: jimjjewett at gmail.com (Jim Jewett) Date: Wed, 2 Jan 2013 21:13:47 -0500 Subject: [Python-ideas] Preventing out of memory conditions In-Reply-To: References: Message-ID: On 12/31/12, Max Moroz wrote: > Sometimes, I have the flexibility to reduce the memory used by my > program (e.g., by destroying large cached objects, etc.). It would be > great if I could ask Python interpreter to notify me when memory is > running out, so I can take such actions. Agreed, provided the overhead isn't too high. Depending on how accurately and precisely you need to track the memory usage, it might be enough to replace Objects/obmalloc.c new_arena with a wrapper that calls your callback before (maybe) allocating a new arena. -jJ From ncoghlan at gmail.com Thu Jan 3 04:37:40 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 3 Jan 2013 13:37:40 +1000 Subject: [Python-ideas] Identity dicts and sets In-Reply-To: References: Message-ID: On Thu, Jan 3, 2013 at 8:48 AM, Terry Reedy wrote: > On 1/2/2013 9:01 AM, Serhiy Storchaka wrote: >> >> I propose to add new standard collection types: IdentityDict and >> IdentitySet. They are almost same as ordinal dict and set, but uses > > > What do you mean by ordinal dict, as opposed to plain dict. I assumed Serhiy meant OrderedDict. >> identity check instead of equality check (and id() or hash(id()) as a > > By default, equality check is identity check. The point of an IdentityDict/Set is for it to be keyed by id rather than value for *all* objects, rather than just those with the default equality comparison. This can be important in some use cases: 1. It's more correct for caching. For example, "0 + 0" should give "0", while "0.0 + 0.0" should give "0.0". An identity based cache will get this right, a value based cache will get it wrong (functools.lru_cache actually splits the difference and goes with a type+value based cache rather than a simple value based cache) 2. It effectively allows you to add additional state to both mutable and immutable objects (by storing the extra state in an identity keyed dictionary). 
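(A rough sketch of use case 2, with invented names, purely for illustration -- note that it stores the real key alongside the value, which also keeps the key object alive:)

_side_table = {}  # id(obj) -> (obj, extra_state)

def set_note(obj, note):
    _side_table[id(obj)] = (obj, note)

def get_note(obj, default=None):
    entry = _side_table.get(id(obj))
    return default if entry is None else entry[1]

point = (3, 4)               # tuples can't grow attributes...
set_note(point, "visited")   # ...but an identity-keyed side table can
print(get_note(point))       # visited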
However, one important problem with this kind of data structure is that it is *very* easy to get into lifecycle problems if you don't store at least a weak reference to a real key (since id's may be recycled after an object is destroyed, as shown here:

>>> [] is []  # Both objects alive at the same time, forces different id
False
>>> id([]) == id([])  # First id is recycled for second object
True

> The disadvantage of multiple minor variations on dict is confusion among > users as to specific properties and use cases. Indeed. As noted elsewhere, we already have a nasty composition problem between __missing__, order preservation and weak referencing. Adding a key function override into that mix suggests that a hashmap factory API might be a better option than continuing the proliferation of slightly different mapping types. (Guido's fears of an explosion in subtly different container types in the standard library once the collections module was added have proved to be well founded). So, -1 from me on making the composition problem worse, but tentative +0 on an API that addresses the composition problem and also includes "key=func" style support for using a decorated value in the lookup step. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From hernan.grecco at gmail.com Thu Jan 3 05:05:38 2013 From: hernan.grecco at gmail.com (Hernan Grecco) Date: Thu, 3 Jan 2013 05:05:38 +0100 Subject: [Python-ideas] Order in the documentation search results In-Reply-To: <50E42A0F.2040908@drees.name> References: <50E083BA.7000603@nedbatchelder.com> <50E142FF.3070101@drees.name> <50E42A0F.2040908@drees.name> Message-ID: Hi, I have done some work to improve the search results on the Python Docs. You can compare the current [0] with the proposed [1], or both at the same time [2]. It is basically a patch for sphinx [4], plus a python specific javascript [3]. The ideas are briefly explained [4].
>>>>>> So my suggestion is to put the builtins first, the rest of the >>>>>> standard lib later including HowTos, FAQ, etc. and finally the >>>>>> c-modules. Additionally, a section with a title matching exactly the >>>>>> search query should come first. (I am not sure if the last suggestion >>>>>> belongs in python-ideas or in the sphinx mailing list, please advise) >>>>> While we're on the topic, why in this day and age do we have a custom >>>>> search? Using google site search would be faster for the user, and more >>>>> accurate. >>>> I agree. Someone needs to propose a patch though. >>>> ... >>> a custom search in itself is a wonderful thing. To me it also shows more >>> appreciation of visitor concerns than those sites that are just _offering_ >>> google site search (which is accessible anyway to every visitor capable of >>> memorizing the google or bing or whatnot URL). >>> >>> I second Hernan's suggestion about ordering and also his question where >>> the request (and patches) should be directed to. >>> ... From solipsis at pitrou.net Thu Jan 3 08:06:49 2013 From: solipsis at pitrou.net (Antoine Pitrou) Date: Thu, 3 Jan 2013 08:06:49 +0100 Subject: [Python-ideas] Identity dicts and sets References: Message-ID: <20130103080649.58dfe44b@pitrou.net> On Thu, 3 Jan 2013 13:37:40 +1000 Nick Coghlan wrote: > On Thu, Jan 3, 2013 at 8:48 AM, Terry Reedy wrote: > > On 1/2/2013 9:01 AM, Serhiy Storchaka wrote: > >> I propose to add new standard collection types: IdentityDict and > >> IdentitySet. They are almost same as ordinal dict and set, but uses > > What do you mean by ordinal dict, as opposed to plain dict. > I assumed Serhiy meant OrderedDict. I'm quite sure Serhiy meant ordinary dict. > As noted elsewhere, we already have a nasty composition > problem between __missing__, order preservation and weak referencing. Aren't you dramatizing a bit? I haven't seen anyone ask for an ordered weak dict, or a weak dict with default values. > So, -1 from me on making the composition problem worse, but tentative > +0 on an API that addresses the composition problem and also includes > "key=func" style support for using a decorated value in the lookup > step. Well, IdentityDict addresses an actual use case. I don't think a defaultidentitydict addresses any use case. Regards Antoine. From stefan at drees.name Thu Jan 3 10:05:27 2013 From: stefan at drees.name (Stefan Drees) Date: Thu, 03 Jan 2013 10:05:27 +0100 Subject: [Python-ideas] Order in the documentation search results In-Reply-To: References: <50E083BA.7000603@nedbatchelder.com> <50E142FF.3070101@drees.name> <50E42A0F.2040908@drees.name> Message-ID: <50E549D7.1000007@drees.name> Hi Hernan, On 03.01.13 05:05, Hernan Grecco wrote: > ... I have done some work to improve the search results on the Python > Docs. You can compare the current [0] with the proposed [1], or both > at the same time [2]. It is basically a patch for sphinx [4], plus a > python specific javascript [3]. The ideas are briefly explained [4]. > > I have not optimized the scores in [4], just some educated guesses. > ... > > [0] http://hgrecco.github.com/searchpydocs/current/ > [1] http://hgrecco.github.com/searchpydocs/proposed/ > [2] http://hgrecco.github.com/searchpydocs/ > [3] https://github.com/hgrecco/searchpydocs/blob/master/cpy_scorer.js > [4] https://bitbucket.org/birkenfeld/sphinx/issue/1067/better-search-results that looks good to me for e.g. file, dict and dict.clear.
Far better than a google/bing/whatever_external search, by the way (as tested with dict, using google search on "dict site:http://docs.python.org/3/") :-)) As I read in the sphinx issue mail flow you opened, Georg asks for a pull request of the patches. I consider this very promising. Thanks again for the effort and these good first results, Hernan! All the best, Stefan. From storchaka at gmail.com Thu Jan 3 12:42:57 2013 From: storchaka at gmail.com (Serhiy Storchaka) Date: Thu, 03 Jan 2013 13:42:57 +0200 Subject: [Python-ideas] Identity dicts and sets In-Reply-To: References: <201301022234.18839.storchaka@gmail.com> Message-ID: On 02.01.13 22:45, Haoyi Li wrote: > Something curried? > > custom_dict(cfg=...)(key1=..., key2=...) Yes, it looks good. In any case custom_dict() should return a new type, not dict, to allow serialization.
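To make the curried shape concrete, a rough sketch (not the proposed API, and untested - a real version would also need to keep the original keys around for iteration, and to give the generated class a stable, importable identity before pickling could work):

    def custom_dict(*, key=None):
        """Return a *new* dict subclass whose lookups go through key()."""
        class CustomDict(dict):
            def __setitem__(self, k, value):
                dict.__setitem__(self, key(k), value)
            def __getitem__(self, k):
                return dict.__getitem__(self, key(k))
            def __contains__(self, k):
                return dict.__contains__(self, key(k))
        return CustomDict

    IdentityDict = custom_dict(key=id)  # first call configures the type...
    d = IdentityDict()                  # ...second call creates an instance
    k = [1, 2]                          # unhashable in a plain dict
    d[k] = 'value'
    assert d[k] == 'value'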
From storchaka at gmail.com Thu Jan 3 12:50:29 2013 From: storchaka at gmail.com (Serhiy Storchaka) Date: Thu, 03 Jan 2013 13:50:29 +0200 Subject: [Python-ideas] Identity dicts and sets In-Reply-To: References: <201301022234.18839.storchaka@gmail.com> <9A91AF30-F0B3-441E-996C-F502291C1F35@masklinn.net> Message-ID: On 02.01.13 23:33, Bruce Leban wrote: > I agree collections is the place to put it but that would give us three > specialized subclasses of dictionary which cannot be combined. That is, > I can have a dictionary with a default, one that is ordered or one that > uses a key function but not any combination of those. It would seem > better to have something like Haoyi Li suggested: > > collections.Dictionary(default=None, ordered=False, key=None) --> a dict > subclass I doubt that all such combinations make sense. At least, not all features can be combined. From storchaka at gmail.com Thu Jan 3 12:51:04 2013 From: storchaka at gmail.com (Serhiy Storchaka) Date: Thu, 03 Jan 2013 13:51:04 +0200 Subject: [Python-ideas] Identity dicts and sets In-Reply-To: References: Message-ID: On 03.01.13 00:48, Terry Reedy wrote: > What do you mean by ordinal dict, as opposed to plain dict. Sorry to have confused you. I meant "ordinary dict", same as "plain dict". > I don't know anything about pickling or __sizeof__, but if one uses > user-defined classes for nodes and edges, equality is identity, so I > don't see what would be gained. If one uses a list, a dict, or a user-defined class with a defined __eq__, equality is not identity. Yes, you can use an identity dict with mutable types! From storchaka at gmail.com Thu Jan 3 12:53:35 2013 From: storchaka at gmail.com (Serhiy Storchaka) Date: Thu, 03 Jan 2013 13:53:35 +0200 Subject: [Python-ideas] Identity dicts and sets In-Reply-To: References: <201301022234.18839.storchaka@gmail.com> Message-ID: On 02.01.13 23:35, Richard Oudkerk wrote: > Wouldn't you need to specify a hash function at the same time? A hash function is hash(keyfunc(key)). From storchaka at gmail.com Thu Jan 3 13:09:21 2013 From: storchaka at gmail.com (Serhiy Storchaka) Date: Thu, 03 Jan 2013 14:09:21 +0200 Subject: [Python-ideas] Identity dicts and sets In-Reply-To: References: Message-ID: On 03.01.13 05:37, Nick Coghlan wrote: > I assumed Serhiy meant OrderedDict. Sorry to have confused you. I meant "ordinary dict". > 1. It's more correct for caching. For example, "0 + 0" should give > "0", while "0.0 + 0.0" should give "0.0". An identity based cache will > get this right, a value based cache will get it wrong > (functools.lru_cache actually splits the difference and goes with a > type+value based cache rather than a simple value based cache) This is not a use case. Two "0" are the same key in CPython, but two "1000" or two "0.0" are not. That is just one more "wart" (as in any other language which has identity maps). > However, one important problem with this kind of data structure is > that it is *very* easy to get into lifecycle problems if you don't > store at least a weak reference to a real key, since ids may be > recycled after an object is destroyed, as shown here: Of course, an identity dict and set should take ownership of their keys and values, as all other non-weak collections do. Except for the lookup function, they don't differ from their ordinary counterparts. > Indeed. As noted elsewhere, we already have a nasty composition > problem between __missing__, order preservation and weak referencing. I doubt that all combinations make sense.
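Picking up Richard's hash function point in concrete form, a rough sketch (illustrative only, not a concrete proposal) of a key wrapper that routes both hashing and equality through keyfunc, while still keeping the real key alive:

    class _WrappedKey:
        """Internal helper: hashes and compares via keyfunc(obj)."""
        __slots__ = ('obj', '_key')

        def __init__(self, obj, keyfunc):
            self.obj = obj             # keeps a reference to the real key
            self._key = keyfunc(obj)

        def __hash__(self):
            return hash(self._key)     # i.e. hash(keyfunc(key))

        def __eq__(self, other):
            return self._key == other._key

    d = {}
    lst = [1, 2, 3]
    d[_WrappedKey(lst, id)] = 'value'
    assert d[_WrappedKey(lst, id)] == 'value'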
From storchaka at gmail.com Thu Jan 3 13:30:56 2013 From: storchaka at gmail.com (Serhiy Storchaka) Date: Thu, 03 Jan 2013 14:30:56 +0200 Subject: [Python-ideas] Preventing out of memory conditions In-Reply-To: References: Message-ID: On 01.01.13 00:16, Max Moroz wrote: > But let's say I am willing to do some work to estimate the maximum > amount of memory my application can be allowed to use. [...] > Would this be worth considering for a future language extension? How > hard would it be to implement? You can't call a callback function directly from the memory allocation functions. A lot of code in the core, in the standard library and in third-party extensions relies on the fact that no Python code is executed inside certain C API functions. Violating this rule would break everything. Even copying a list could be broken: you allocate memory of the necessary size for the new list and then copy the elements. If the callback function were called during the memory allocation, the size of the original list could change. That would violate the integrity of the operation and lead to a crash or a wrong result. And there are thousands of such places. Changing all of them is impossible, and it would reduce performance even when callbacks are not used. You can call a callback function only at a safe point, at least when the GIL is released. From oscar.j.benjamin at gmail.com Thu Jan 3 16:03:02 2013 From: oscar.j.benjamin at gmail.com (Oscar Benjamin) Date: Thu, 3 Jan 2013 15:03:02 +0000 Subject: [Python-ideas] Identity dicts and sets In-Reply-To: References: Message-ID: On 3 January 2013 12:09, Serhiy Storchaka wrote: > On 03.01.13 05:37, Nick Coghlan wrote: >> [SNIP] >> However, one important problem with this kind of data structure is >> that it is *very* easy to get into lifecycle problems if you don't >> store at least a weak reference to a real key, since ids may be >> recycled after an object is destroyed, as shown here: > > Of course, an identity dict and set should take ownership of their keys > and values, as all other non-weak collections do. Except for the lookup > function, they don't differ from their ordinary counterparts. I think what Nick means is that if you implement this naively then you don't hold references to the keys:

    class IdentityDict(dict):
        def __setitem__(self, key, val):
            dict.__setitem__(self, id(key), val)
            # No reference to key held when this function ends
        ...
A way to fix this is to store both objects in the value (with corresponding changes to __getitem__ etc.):

    class IdentityDict(dict):
        def __setitem__(self, key, val):
            dict.__setitem__(self, id(key), (key, val))

Oscar From christian at python.org Thu Jan 3 16:10:20 2013 From: christian at python.org (Christian Heimes) Date: Thu, 03 Jan 2013 16:10:20 +0100 Subject: [Python-ideas] Identity dicts and sets In-Reply-To: References: Message-ID: <50E59F5C.3010302@python.org> Am 03.01.2013 04:37, schrieb Nick Coghlan: > This can be important in some use cases: > > 1. It's more correct for caching. For example, "0 + 0" should give > "0", while "0.0 + 0.0" should give "0.0". An identity based cache will > get this right, a value based cache will get it wrong > (functools.lru_cache actually splits the difference and goes with a > type+value based cache rather than a simple value based cache) Do you mean +0.0 or -0.0? IEEE 754 zeros are always signed, although +0.0 is equal to -0.0. And NaNs are always unequal to all NaNs, even to themselves. For floats we would need a type-specific dict that handles special values correctly ... Can of worms? From ncoghlan at gmail.com Wed Jan 2 12:40:26 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 2 Jan 2013 21:40:26 +1000 Subject: [Python-ideas] Updated PEP 432: Simplifying the CPython startup sequence Message-ID: I've updated the PEP heavily based on the previous thread and miscellaneous comments in response to checkins. Latest version is at http://www.python.org/dev/peps/pep-0432/ and inline below. The biggest change in the new version is moving from a Python dictionary to a C struct as the storage for the full low level interpreter configuration, as Antoine suggested. The individual settings are now either C integers for the various flag values (defaulting to -1 to indicate "figure this out"), or pointers to the appropriate specific Python type (defaulting to NULL to indicate "figure this out"). I'm happy enough with the design now that I think it's worth starting to implement it before I tinker with the PEP any further. Cheers, Nick. ================================ PEP: 432 Title: Simplifying the CPython startup sequence Version: $Revision$ Last-Modified: $Date$ Author: Nick Coghlan Status: Draft Type: Standards Track Content-Type: text/x-rst Created: 28-Dec-2012 Python-Version: 3.4 Post-History: 28-Dec-2012, 2-Jan-2013 Abstract ======== This PEP proposes a mechanism for simplifying the startup sequence for CPython, making it easier to modify the initialization behaviour of the reference interpreter executable, as well as making it easier to control CPython's startup behaviour when creating an alternate executable or embedding it as a Python execution engine inside a larger application. Note: TBC = To Be Confirmed, TBD = To Be Determined. The appropriate resolution for most of these should become clearer as the reference implementation is developed. Proposal ======== This PEP proposes that CPython move to an explicit multi-phase initialization process, where a preliminary interpreter is put in place with limited OS interaction capabilities early in the startup sequence. This essential core remains in place while all of the configuration settings are determined, until a final configuration call takes those settings and finishes bootstrapping the interpreter immediately before locating and executing the main module.
In the new design, the interpreter will move through the following well-defined phases during the startup sequence: * Pre-Initialization - no interpreter available * Initialization - interpreter partially available * Initialized - full interpreter available, __main__ related metadata incomplete * Main Execution - optional state, __main__ related metadata populated, bytecode executing in the __main__ module namespace As a concrete use case to help guide any design changes, and to solve a known problem where the appropriate defaults for system utilities differ from those for running user scripts, this PEP also proposes the creation and distribution of a separate system Python (``spython``) executable which, by default, ignores user site directories and environment variables, and does not implicitly set ``sys.path[0]`` based on the current directory or the script being executed. To keep the implementation complexity under control, this PEP does *not* propose wholesale changes to the way the interpreter state is accessed at runtime, nor does it propose changes to the way subinterpreters are created after the main interpreter has already been initialized. Changing the order in which the existing initialization steps occur in order to make the startup sequence easier to maintain is already a substantial change, and attempting to make those other changes at the same time will make the change significantly more invasive and much harder to review. However, such proposals may be suitable topics for follow-on PEPs or patches - one key benefit of this PEP is decreasing the coupling between the internal storage model and the configuration interface, so such changes should be easier once this PEP has been implemented. Background ========== Over time, CPython's initialization sequence has become progressively more complicated, offering more options, as well as performing more complex tasks (such as configuring the Unicode settings for OS interfaces in Python 3 as well as bootstrapping a pure Python implementation of the import system). Much of this complexity is accessible only through the ``Py_Main`` and ``Py_Initialize`` APIs, offering embedding applications little opportunity for customisation. This creeping complexity also makes life difficult for maintainers, as much of the configuration needs to take place prior to the ``Py_Initialize`` call, meaning much of the Python C API cannot be used safely. A number of proposals are on the table for even *more* sophisticated startup behaviour, such as better control over ``sys.path`` initialization (easily adding additional directories on the command line in a cross-platform fashion, as well as controlling the configuration of ``sys.path[0]``), easier configuration of utilities like coverage tracing when launching Python subprocesses, and easier control of the encoding used for the standard IO streams when embedding CPython in a larger application. Rather than attempting to bolt such behaviour onto an already complicated system, this PEP proposes to instead simplify the status quo *first*, with the aim of making these further feature requests easier to implement. Key Concerns ============ There are a couple of key concerns that any change to the startup sequence needs to take into account. Maintainability --------------- The current CPython startup sequence is difficult to understand, and even more difficult to modify. 
It is not clear what state the interpreter is in while much of the initialization code executes, leading to behaviour such as lists, dictionaries and Unicode values being created prior to the call to ``Py_Initialize`` when the ``-X`` or ``-W`` options are used [1_]. By moving to an explicitly multi-phase startup sequence, developers should only need to understand which features are not available in the core bootstrapping state, as the vast majority of the configuration process will now take place in that state. By basing the new design on a combination of C structures and Python data types, it should also be easier to modify the system in the future to add new configuration options. Performance ----------- CPython is used heavily to run short scripts where the runtime is dominated by the interpreter initialization time. Any changes to the startup sequence should minimise their impact on the startup overhead. Experience with the importlib migration suggests that the startup time is dominated by IO operations. However, to monitor the impact of any changes, a simple benchmark can be used to check how long it takes to start and then tear down the interpreter::

    python3 -m timeit -s "from subprocess import call" "call(['./python', '-c', 'pass'])"

Current numbers on my system for 2.7, 3.2 and 3.3 (using the 3.3 subprocess and timeit modules to execute the check, all with non-debug builds)::

    # Python 2.7
    $ py33/python -m timeit -s "from subprocess import call" "call(['py27/python', '-c', 'pass'])"
    100 loops, best of 3: 17.8 msec per loop

    # Python 3.2
    $ py33/python -m timeit -s "from subprocess import call" "call(['py32/python', '-c', 'pass'])"
    10 loops, best of 3: 39 msec per loop

    # Python 3.3
    $ py33/python -m timeit -s "from subprocess import call" "call(['py33/python', '-c', 'pass'])"
    10 loops, best of 3: 25.3 msec per loop

Improvements in the import system and the Unicode support already resulted in a more than 30% improvement in startup time in Python 3.3 relative to 3.2. Python 3.3 is still slightly slower to start than Python 2.7 due to the additional infrastructure that needs to be put in place to support the Unicode based text model. This PEP is not expected to have any significant effect on the startup time, as it is aimed primarily at *reordering* the existing initialization sequence, without making substantial changes to the individual steps. However, if this simple check suggests that the proposed changes to the initialization sequence may pose a performance problem, then a more sophisticated microbenchmark will be developed to assist in investigation. Required Configuration Settings =============================== A comprehensive configuration scheme requires that an embedding application be able to control the following aspects of the final interpreter state: * Whether or not to use randomised hashes (and if used, potentially specify a specific random seed) * The "Where is Python located?" elements in the ``sys`` module: * ``sys.executable`` * ``sys.base_exec_prefix`` * ``sys.base_prefix`` * ``sys.exec_prefix`` * ``sys.prefix`` * The path searched for imports from the filesystem (and other path hooks): * ``sys.path`` * The command line arguments seen by the interpreter: * ``sys.argv`` * The filesystem encoding used by: * ``sys.getfsencoding`` * ``os.fsencode`` * ``os.fsdecode`` * The IO encoding (if any) and the buffering used by: * ``sys.stdin`` * ``sys.stdout`` * ``sys.stderr`` * The initial warning system state: * ``sys.warnoptions`` * Arbitrary extended options (e.g.
to automatically enable ``faulthandler``): * ``sys._xoptions`` * Whether or not to implicitly cache bytecode files: * ``sys.dont_write_bytecode`` * Whether or not to enforce correct case in filenames on case-insensitive platforms: * ``os.environ["PYTHONCASEOK"]`` * The other settings exposed to Python code in ``sys.flags``: * ``debug`` (Enable debugging output in the pgen parser) * ``inspect`` (Enter interactive interpreter after __main__ terminates) * ``interactive`` (Treat stdin as a tty) * ``optimize`` (__debug__ status, write .pyc or .pyo, strip doc strings) * ``no_user_site`` (don't add the user site directory to sys.path) * ``no_site`` (don't implicitly import site during startup) * ``ignore_environment`` (whether environment vars are used during config) * ``verbose`` (enable all sorts of random output) * ``bytes_warning`` (warnings/errors for implicit str/bytes interaction) * ``quiet`` (disable banner output even if verbose is also enabled or stdin is a tty and the interpreter is launched in interactive mode) * Whether or not CPython's signal handlers should be installed * What code (if any) should be executed as ``__main__``: * Nothing (just create an empty module) * A filesystem path referring to a Python script (source or bytecode) * A filesystem path referring to a valid ``sys.path`` entry (typically a directory or zipfile) * A given string (equivalent to the "-c" option) * A module or package (equivalent to the "-m" option) * Standard input as a script (i.e. a non-interactive stream) * Standard input as an interactive interpreter session Note that this just covers settings that are currently configurable in some manner when using the main CPython executable. While this PEP aims to make adding additional configuration settings easier in the future, it deliberately avoids adding any new settings of its own. The Status Quo ============== The current mechanisms for configuring the interpreter have accumulated in a fairly ad hoc fashion over the past 20+ years, leading to a rather inconsistent interface with varying levels of documentation. (Note: some of the info below could probably be cleaned up and added to the C API documentation - it's all CPython specific, so it doesn't belong in the language reference) Ignoring Environment Variables ------------------------------ The ``-E`` command line option allows all environment variables to be ignored when initializing the Python interpreter. An embedding application can enable this behaviour by setting ``Py_IgnoreEnvironmentFlag`` before calling ``Py_Initialize()``. In the CPython source code, the ``Py_GETENV`` macro implicitly checks this flag, and always produces ``NULL`` if it is set. Randomised Hashing ------------------ The randomised hashing is controlled via the ``-R`` command line option (in releases prior to 3.3), as well as the ``PYTHONHASHSEED`` environment variable. In Python 3.3, only the environment variable remains relevant. It can be used to disable randomised hashing (by using a seed value of 0) or else to force a specific hash value (e.g. for repeatability of testing, or to share hash values between processes). However, embedding applications must use the ``Py_HashRandomizationFlag`` to explicitly request hash randomisation (CPython sets it in ``Py_Main()`` rather than in ``Py_Initialize()``). The new configuration API should make it straightforward for an embedding application to reuse the ``PYTHONHASHSEED`` processing with a text based configuration setting provided by other means (e.g.
a config file or separate environment variable). Locating Python and the standard library ----------------------------------------- The location of the Python binary and the standard library is influenced by several elements. The algorithm used to perform the calculation is not documented anywhere other than in the source code [3_,4_]. Even that description is incomplete, as it failed to be updated for the virtual environment support added in Python 3.3 (detailed in PEP 405). These calculations are affected by the following function calls (made prior to calling ``Py_Initialize()``) and environment variables: * ``Py_SetProgramName()`` * ``Py_SetPythonHome()`` * ``PYTHONHOME`` The filesystem is also inspected for ``pyvenv.cfg`` files (see PEP 405) or, failing that, a ``lib/os.py`` (Windows) or ``lib/python$VERSION/os.py`` file. The build time settings for PREFIX and EXEC_PREFIX are also relevant, as are some registry settings on Windows. The hardcoded fallbacks are based on the layout of the CPython source tree and build output when working in a source checkout. Configuring ``sys.path`` ------------------------ An embedding application may call ``Py_SetPath()`` prior to ``Py_Initialize()`` to completely override the calculation of ``sys.path``. It is not straightforward to only allow *some* of the calculations, as modifying ``sys.path`` after initialization is already complete means those modifications will not be in effect when standard library modules are imported during the startup sequence. If ``Py_SetPath()`` is not used prior to the first call to ``Py_GetPath()`` (implicit in ``Py_Initialize()``), then it builds on the location data calculations above to calculate suitable path entries, along with the ``PYTHONPATH`` environment variable. The ``site`` module, which is implicitly imported at startup (unless disabled via the ``-S`` option), adds additional paths to this initial set of paths, as described in its documentation [5_]. The ``-s`` command line option can be used to exclude the user site directory from the list of directories added. Embedding applications can control this by setting the ``Py_NoUserSiteDirectory`` global variable. The following commands can be used to check the default path configurations for a given Python executable on a given system: * ``./python -c "import sys, pprint; pprint.pprint(sys.path)"`` - standard configuration * ``./python -s -c "import sys, pprint; pprint.pprint(sys.path)"`` - user site directory disabled * ``./python -S -c "import sys, pprint; pprint.pprint(sys.path)"`` - all site path modifications disabled (Note: you can see similar information using ``-m site`` instead of ``-c``, but this is slightly misleading as it calls ``os.path.abspath`` on all of the path entries, making relative path entries look absolute. Using the ``site`` module also causes problems in the last case, as on Python versions prior to 3.3, explicitly importing site will carry out the path modifications ``-S`` avoids, while on 3.3+ combining ``-m site`` with ``-S`` currently fails.) The calculation of ``sys.path[0]`` is comparatively straightforward: * For an ordinary script (Python source or compiled bytecode), ``sys.path[0]`` will be the directory containing the script.
* For a valid ``sys.path`` entry (typically a zipfile or directory), ``sys.path[0]`` will be that path * For an interactive session, running from stdin or when using the ``-c`` or ``-m`` switches, ``sys.path[0]`` will be the empty string, which the import system interprets as allowing imports from the current directory Configuring ``sys.argv`` ------------------------ Unlike most other settings discussed in this PEP, ``sys.argv`` is not set implicitly by ``Py_Initialize()``. Instead, it must be set via an explicit call to ``Py_SetArgv()``. CPython calls this in ``Py_Main()`` after calling ``Py_Initialize()``. The calculation of ``sys.argv[1:]`` is straightforward: they're the command line arguments passed after the script name or the argument to the ``-c`` or ``-m`` options. The calculation of ``sys.argv[0]`` is a little more complicated: * For an ordinary script (source or bytecode), it will be the script name * For a ``sys.path`` entry (typically a zipfile or directory) it will initially be the zipfile or directory name, but will later be changed by the ``runpy`` module to the full path to the imported ``__main__`` module. * For a module specified with the ``-m`` switch, it will initially be the string ``"-m"``, but will later be changed by the ``runpy`` module to the full path to the executed module. * For a package specified with the ``-m`` switch, it will initially be the string ``"-m"``, but will later be changed by the ``runpy`` module to the full path to the executed ``__main__`` submodule of the package. * For a command executed with ``-c``, it will be the string ``"-c"`` * For explicitly requested input from stdin, it will be the string ``"-"`` * Otherwise, it will be the empty string Embedding applications must call Py_SetArgv themselves. The CPython logic for doing so is part of ``Py_Main()`` and is not exposed separately. However, the ``runpy`` module does provide roughly equivalent logic in ``runpy.run_module`` and ``runpy.run_path``. Other configuration settings ---------------------------- TBD: Cover the initialization of the following in more detail: * The initial warning system state: * ``sys.warnoptions`` * (-W option, PYTHONWARNINGS) * Arbitrary extended options (e.g.
to automatically enable ``faulthandler``): * ``sys._xoptions`` * (-X option) * The filesystem encoding used by: * ``sys.getfsencoding`` * ``os.fsencode`` * ``os.fsdecode`` * The IO encoding and buffering used by: * ``sys.stdin`` * ``sys.stdout`` * ``sys.stderr`` * (-u option, PYTHONIOENCODING, PYTHONUNBUFFEREDIO) * Whether or not to implicitly cache bytecode files: * ``sys.dont_write_bytecode`` * (-B option, PYTHONDONTWRITEBYTECODE) * Whether or not to enforce correct case in filenames on case-insensitive platforms: * ``os.environ["PYTHONCASEOK"]`` * The other settings exposed to Python code in ``sys.flags``: * ``debug`` (Enable debugging output in the pgen parser) * ``inspect`` (Enter interactive interpreter after __main__ terminates) * ``interactive`` (Treat stdin as a tty) * ``optimize`` (__debug__ status, write .pyc or .pyo, strip doc strings) * ``no_user_site`` (don't add the user site directory to sys.path) * ``no_site`` (don't implicitly import site during startup) * ``ignore_environment`` (whether environment vars are used during config) * ``verbose`` (enable all sorts of random output) * ``bytes_warning`` (warnings/errors for implicit str/bytes interaction) * ``quiet`` (disable banner output even if verbose is also enabled or stdin is a tty and the interpreter is launched in interactive mode) * Whether or not CPython's signal handlers should be installed Much of the configuration of CPython is currently handled through C level global variables::

    Py_BytesWarningFlag (-b)
    Py_DebugFlag (-d option)
    Py_InspectFlag (-i option, PYTHONINSPECT)
    Py_InteractiveFlag (property of stdin, cannot be overridden)
    Py_OptimizeFlag (-O option, PYTHONOPTIMIZE)
    Py_DontWriteBytecodeFlag (-B option, PYTHONDONTWRITEBYTECODE)
    Py_NoUserSiteDirectory (-s option, PYTHONNOUSERSITE)
    Py_NoSiteFlag (-S option)
    Py_UnbufferedStdioFlag (-u, PYTHONUNBUFFEREDIO)
    Py_VerboseFlag (-v option, PYTHONVERBOSE)

For the above variables, the conversion of command line options and environment variables to C global variables is handled by ``Py_Main``, so each embedding application must set those appropriately in order to change them from their defaults. Some configuration can only be provided as OS level environment variables::

    PYTHONSTARTUP
    PYTHONCASEOK
    PYTHONIOENCODING

The ``Py_InitializeEx()`` API also accepts a boolean flag to indicate whether or not CPython's signal handlers should be installed. Finally, some interactive behaviour (such as printing the introductory banner) is triggered only when standard input is reported as a terminal connection by the operating system. TBD: Document how the "-x" option is handled (skips processing of the first comment line in the main script) Also see detailed sequence of operations notes at [1_] Design Details ============== (Note: details here are still very much in flux, but preliminary feedback is appreciated anyway) The main theme of this proposal is to create the interpreter state for the main interpreter *much* earlier in the startup process. This will allow most of the CPython API to be used during the remainder of the initialization process, potentially simplifying a number of operations that currently need to rely on basic C functionality rather than being able to use the richer data structures provided by the CPython C API. In the following, the term "embedding application" also covers the standard CPython command line application.
Interpreter Initialization Phases --------------------------------- Four distinct phases are proposed: * Pre-Initialization: * no interpreter is available. * ``Py_IsInitializing()`` returns ``0`` * ``Py_IsInitialized()`` returns ``0`` * ``Py_IsRunningMain()`` returns ``0`` * The embedding application determines the settings required to create the main interpreter and moves to the next phase by calling ``Py_BeginInitialization``. * Initialization: * the main interpreter is available, but only partially configured. * ``Py_IsInitializing()`` returns ``1`` * ``Py_IsInitialized()`` returns ``0`` * ``Py_IsRunningMain()`` returns ``0`` * The embedding application determines and applies the settings required to complete the initialization process by calling ``Py_ReadConfiguration`` and ``Py_EndInitialization``. * Initialized: * the main interpreter is available and fully operational, but ``__main__`` related metadata is incomplete. * ``Py_IsInitializing()`` returns ``0`` * ``Py_IsInitialized()`` returns ``1`` * ``Py_IsRunningMain()`` returns ``0`` * Optionally, the embedding application may identify and begin executing code in the ``__main__`` module namespace by calling ``Py_RunPathAsMain``, ``Py_RunModuleAsMain`` or ``Py_RunStreamAsMain``. * Main Execution: * bytecode is being executed in the ``__main__`` namespace * ``Py_IsInitializing()`` returns ``0`` * ``Py_IsInitialized()`` returns ``1`` * ``Py_IsRunningMain()`` returns ``1`` As indicated by the phase reporting functions, main module execution is an optional subphase of Initialized rather than a completely distinct phase. All 4 phases will be used by the standard CPython interpreter and the proposed System Python interpreter. Other embedding applications may choose to skip the step of executing code in the ``__main__`` namespace. An embedding application may still leave initialization almost entirely under CPython's control by using the existing ``Py_Initialize`` API. Alternatively, if an embedding application wants greater control over CPython's initial state, it will be able to use the new, finer grained API, which exposes each step of the initialization process::

    /* Phase 1: Pre-Initialization */
    Py_CoreConfig core_config = Py_CoreConfig_INIT;
    Py_Config config = Py_Config_INIT;
    /* Easily control the core configuration */
    core_config.ignore_environment = 1;  /* Ignore environment variables */
    core_config.use_hash_seed = 0;       /* Full hash randomisation */
    Py_BeginInitialization(&core_config);

    /* Phase 2: Initialization */
    /* Optionally preconfigure some settings here - they will then be
     * used to derive other settings */
    Py_ReadConfiguration(&config);
    /* Can completely override derived settings here */
    Py_EndInitialization(&config);

    /* Phase 3: Initialized */
    /* If an embedding application has no real concept of a main module
     * it can leave the interpreter in this state indefinitely.
     * Otherwise, it can launch __main__ via the Py_Run*AsMain functions. */

Pre-Initialization Phase ------------------------ The pre-initialization phase is where an embedding application determines the settings which are absolutely required before the interpreter can be initialized at all. Currently, the only configuration settings in this category are those related to the randomised hash algorithm - the hash algorithms must be consistent for the lifetime of the process, and so they must be in place before the core interpreter is created.
The specific settings needed are a flag indicating whether or not to use a specific seed value for the randomised hashes, and if so, the specific value for the seed (a seed value of zero disables randomised hashing). In addition, due to the possible use of ``PYTHONHASHSEED`` in configuring the hash randomisation, the question of whether or not to consider environment variables must also be addressed early. The proposed API for this step in the startup sequence is::

    void Py_BeginInitialization(const Py_CoreConfig *config);

Like Py_Initialize, this part of the new API treats initialization failures as fatal errors. While that's still not particularly embedding friendly, the operations in this step *really* shouldn't be failing, and changing them to return error codes instead of aborting would be an even larger task than the one already being proposed. The new ``Py_CoreConfig`` struct holds the settings required for preliminary configuration::

    /* Note: if changing anything in Py_CoreConfig, also update
     * Py_CoreConfig_INIT */
    typedef struct {
        int ignore_environment;   /* -E switch */
        int use_hash_seed;        /* PYTHONHASHSEED */
        unsigned long hash_seed;  /* PYTHONHASHSEED */
    } Py_CoreConfig;

    #define Py_CoreConfig_INIT {0, -1, 0}

The core configuration settings pointer may be ``NULL``, in which case the default values are ``ignore_environment = 0`` and ``use_hash_seed = -1``. The ``Py_CoreConfig_INIT`` macro is designed to allow easy initialization of a struct instance with sensible defaults::

    Py_CoreConfig core_config = Py_CoreConfig_INIT;

``ignore_environment`` controls the processing of all Python related environment variables. If the flag is zero, then environment variables are processed normally. Otherwise, all Python-specific environment variables are considered undefined (exceptions may be made for some OS specific environment variables, such as those used on Mac OS X to communicate between the App bundle and the main Python binary). ``use_hash_seed`` controls the configuration of the randomised hash algorithm. If it is zero, then randomised hashes with a random seed will be used. If it is positive, then the value in ``hash_seed`` will be used to seed the random number generator. If the ``hash_seed`` is zero in this case, then the randomised hashing is disabled completely. If ``use_hash_seed`` is negative (and ``ignore_environment`` is zero), then CPython will inspect the ``PYTHONHASHSEED`` environment variable. If it is not set, is set to the empty string, or to the value ``"random"``, then randomised hashes with a random seed will be used. If it is set to the string ``"0"``, the randomised hashing will be disabled. Otherwise, the hash seed is expected to be a string representation of an integer in the range ``[0; 4294967295]``. To make it easier for embedding applications to use the ``PYTHONHASHSEED`` processing with a different data source, the following helper function will be added to the C API::

    int Py_ReadHashSeed(char *seed_text,
                        int *use_hash_seed,
                        unsigned long *hash_seed);

This function accepts a seed string in ``seed_text`` and converts it to the appropriate flag and seed values. If ``seed_text`` is ``NULL``, the empty string or the value ``"random"``, both ``use_hash_seed`` and ``hash_seed`` will be set to zero. Otherwise, ``use_hash_seed`` will be set to ``1`` and the seed text will be interpreted as an integer and reported as ``hash_seed``. On success the function will return zero. A non-zero return value indicates an error (most likely in the conversion to an integer).
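As an illustrative sketch only (``my_app_get_setting`` is a hypothetical application-provided lookup, not part of this proposal), an embedding application reading the seed from its own configuration source might combine the helper with the core configuration like this::

    Py_CoreConfig core_config = Py_CoreConfig_INIT;
    /* Hypothetical application-specific lookup, e.g. from a config file */
    char *seed_text = my_app_get_setting("hash_seed");
    if (Py_ReadHashSeed(seed_text, &core_config.use_hash_seed,
                        &core_config.hash_seed) != 0) {
        /* Invalid seed string: report the problem and abort startup */
    }
    Py_BeginInitialization(&core_config);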
The aim is to keep this initial level of configuration as small as possible in order to keep the bootstrapping environment consistent across different embedding applications. If we can create a valid interpreter state without the setting, then the setting should go in the configuration passed to ``Py_EndInitialization()`` rather than in the core configuration. A new query API will allow code to determine if the interpreter is in the bootstrapping state between the creation of the interpreter state and the completion of the bulk of the initialization process::

    int Py_IsInitializing();

Attempting to call ``Py_BeginInitialization()`` again when ``Py_IsInitializing()`` or ``Py_IsInitialized()`` is true is a fatal error. While in the initializing state, the interpreter should be fully functional except that: * compilation is not allowed (as the parser and compiler are not yet configured properly) * creation of subinterpreters is not allowed * creation of additional thread states is not allowed * The following attributes in the ``sys`` module are all either missing or ``None``: * ``sys.path`` * ``sys.argv`` * ``sys.executable`` * ``sys.base_exec_prefix`` * ``sys.base_prefix`` * ``sys.exec_prefix`` * ``sys.prefix`` * ``sys.warnoptions`` * ``sys.flags`` * ``sys.dont_write_bytecode`` * ``sys.stdin`` * ``sys.stdout`` * The filesystem encoding is not yet defined * The IO encoding is not yet defined * CPython signal handlers are not yet installed * only builtin and frozen modules may be imported (due to above limitations) * ``sys.stderr`` is set to a temporary IO object using unbuffered binary mode * The ``warnings`` module is not yet initialized * The ``__main__`` module does not yet exist The main things made available by this step will be the core Python datatypes, in particular dictionaries, lists and strings. This allows them to be used safely for all of the remaining configuration steps (unlike the status quo). In addition, the current thread will possess a valid Python thread state, allowing any further configuration data to be stored on the interpreter object rather than in C process globals. Any call to ``Py_BeginInitialization()`` must have a matching call to ``Py_Finalize()``. It is acceptable to skip calling Py_EndInitialization() in between (e.g. if attempting to read the configuration settings fails). Determining the remaining configuration settings ------------------------------------------------ The next step in the initialization sequence is to determine the full settings needed to complete the process. No changes are made to the interpreter state at this point. The core API for this step is::

    int Py_ReadConfiguration(Py_Config *config);

The config argument should be a pointer to a ``Py_Config`` struct (described below). For any supported configuration setting already set in the struct, CPython will sanity check the supplied value, but otherwise accept it as correct. Unlike ``Py_Initialize`` and ``Py_BeginInitialization``, this call will raise an exception and report an error return rather than exhibiting fatal errors if a problem is found with the config data. Any supported configuration setting which is not already set will be populated appropriately. The default configuration can be overridden entirely by setting the value *before* calling ``Py_ReadConfiguration``. The provided value will then also be used in calculating any settings derived from that value.
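For example, the "override before reading" pattern might look like this sketch (error handling reduced to a comment; the specific setting chosen is just for illustration)::

    Py_Config config = Py_Config_INIT;
    config.no_user_site = 1;  /* decided by the embedding application */
    if (Py_ReadConfiguration(&config) != 0) {
        /* A Python exception is set: report it and abort startup */
    }
    /* Derived settings (such as the import path) were calculated
     * with no_user_site already forced to 1 */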
Alternatively, settings may be overridden *after* the ``Py_ReadConfiguration`` call (this can be useful if an embedding application wants to adjust a setting rather than replace it completely, such as removing ``sys.path[0]``). Supported configuration settings -------------------------------- The new ``Py_Config`` struct holds the settings required to complete the interpreter configuration. All fields are either pointers to Python data types (not set == ``NULL``) or numeric flags (not set == ``-1``)::

    /* Note: if changing anything in Py_Config, also update Py_Config_INIT */
    typedef struct {
        /* Argument processing */
        PyList *raw_argv;
        PyList *argv;
        PyList *warnoptions;          /* -W switch, PYTHONWARNINGS */
        PyDict *xoptions;             /* -X switch */
        /* Filesystem locations */
        PyUnicode *program_name;
        PyUnicode *executable;
        PyUnicode *prefix;            /* PYTHONHOME */
        PyUnicode *exec_prefix;       /* PYTHONHOME */
        PyUnicode *base_prefix;       /* pyvenv.cfg */
        PyUnicode *base_exec_prefix;  /* pyvenv.cfg */
        /* Site module */
        int no_site;                  /* -S switch */
        int no_user_site;             /* -s switch, PYTHONNOUSERSITE */
        /* Import configuration */
        int dont_write_bytecode;      /* -B switch, PYTHONDONTWRITEBYTECODE */
        int ignore_module_case;       /* PYTHONCASEOK */
        PyList *import_path;          /* PYTHONPATH (etc) */
        /* Standard streams */
        int use_unbuffered_io;        /* -u switch, PYTHONUNBUFFEREDIO */
        PyUnicode *stdin_encoding;    /* PYTHONIOENCODING */
        PyUnicode *stdin_errors;      /* PYTHONIOENCODING */
        PyUnicode *stdout_encoding;   /* PYTHONIOENCODING */
        PyUnicode *stdout_errors;     /* PYTHONIOENCODING */
        PyUnicode *stderr_encoding;   /* PYTHONIOENCODING */
        PyUnicode *stderr_errors;     /* PYTHONIOENCODING */
        /* Filesystem access */
        PyUnicode *fs_encoding;
        /* Interactive interpreter */
        int stdin_is_interactive;     /* Force interactive behaviour */
        int inspect_main;             /* -i switch, PYTHONINSPECT */
        PyUnicode *startup_file;      /* PYTHONSTARTUP */
        /* Debugging output */
        int debug_parser;             /* -d switch, PYTHONDEBUG */
        int verbosity;                /* -v switch */
        int suppress_banner;          /* -q switch */
        /* Code generation */
        int bytes_warnings;           /* -b switch */
        int optimize;                 /* -O switch */
        /* Signal handling */
        int install_sig_handlers;
    } Py_Config;

    /* Struct initialization is pretty ugly in C89. Avoiding this mess would
     * be the most attractive aspect of using a PyDict* instead... */
    #define _Py_ArgConfig_INIT NULL, NULL, NULL, NULL
    #define _Py_LocationConfig_INIT NULL, NULL, NULL, NULL, NULL, NULL
    #define _Py_SiteConfig_INIT -1, -1
    #define _Py_ImportConfig_INIT -1, -1, NULL
    #define _Py_StreamConfig_INIT -1, NULL, NULL, NULL, NULL, NULL, NULL
    #define _Py_FilesystemConfig_INIT NULL
    #define _Py_InteractiveConfig_INIT -1, -1, NULL
    #define _Py_DebuggingConfig_INIT -1, -1, -1
    #define _Py_CodeGenConfig_INIT -1, -1
    #define _Py_SignalConfig_INIT -1
    #define Py_Config_INIT {_Py_ArgConfig_INIT, _Py_LocationConfig_INIT, \
            _Py_SiteConfig_INIT, _Py_ImportConfig_INIT, _Py_StreamConfig_INIT, \
            _Py_FilesystemConfig_INIT, _Py_InteractiveConfig_INIT, \
            _Py_DebuggingConfig_INIT, _Py_CodeGenConfig_INIT, \
            _Py_SignalConfig_INIT}

Completing the interpreter initialization ----------------------------------------- The final step in the initialization process is to actually put the configuration settings into effect and finish bootstrapping the interpreter up to full operation::

    int Py_EndInitialization(const Py_Config *config);

Like Py_ReadConfiguration, this call will raise an exception and report an error return rather than exhibiting fatal errors if a problem is found with the config data.
All configuration settings are required - the configuration struct should always be passed through ``Py_ReadConfiguration()`` to ensure it is fully populated. After a successful call, ``Py_IsInitializing()`` will be false, while ``Py_IsInitialized()`` will become true. The caveats described above for the interpreter during the initialization phase will no longer hold. However, some metadata related to the ``__main__`` module may still be incomplete: * ``sys.argv[0]`` may not yet have its final value * it will be ``-m`` when executing a module or package with CPython * it will be the same as ``sys.path[0]`` rather than the location of the ``__main__`` module when executing a valid ``sys.path`` entry (typically a zipfile or directory) * the metadata in the ``__main__`` module will still indicate it is a builtin module Executing the main module ------------------------- Initial thought is that hiding the various options behind a single API would make that API too complicated, so 3 separate APIs are more likely::

    Py_RunPathAsMain
    Py_RunModuleAsMain
    Py_RunStreamAsMain

Query API to indicate that ``sys.argv[0]`` is fully populated::

    Py_IsRunningMain()

Internal Storage of Configuration Data -------------------------------------- The interpreter state will be updated to include details of the configuration settings supplied during initialization by extending the interpreter state object with an embedded copy of the ``Py_CoreConfig`` and ``Py_Config`` structs. For debugging purposes, the configuration settings will be exposed as a ``sys._configuration`` simple namespace (similar to ``sys.flags`` and ``sys.implementation``). Field names will match those in the configuration structs, except for ``hash_seed``, which will be deliberately excluded. These are *snapshots* of the initial configuration settings. They are not consulted by the interpreter during runtime. Stable ABI ---------- All of the APIs proposed in this PEP are excluded from the stable ABI, as embedding a Python interpreter involves a much higher degree of coupling than merely writing an extension. Backwards Compatibility ----------------------- Backwards compatibility will be preserved primarily by ensuring that Py_ReadConfiguration() interrogates all the previously defined configuration settings stored in global variables and environment variables, and that Py_EndInitialization() writes affected settings back to the relevant locations. One acknowledged incompatibility is that some environment variables which are currently read lazily may instead be read once during interpreter initialization. As the PEP matures, these will be discussed in more detail on a case by case basis. The environment variables which are currently known to be looked up dynamically are: * ``PYTHONCASEOK``: writing to ``os.environ['PYTHONCASEOK']`` will no longer dynamically alter the interpreter's handling of filename case differences on import (TBC) * ``PYTHONINSPECT``: ``os.environ['PYTHONINSPECT']`` will still be checked after execution of the ``__main__`` module terminates The ``Py_Initialize()`` style of initialization will continue to be supported. It will use (at least some elements of) the new API internally, but will continue to exhibit the same behaviour as it does today, ensuring that ``sys.argv`` is not populated until a subsequent ``PySys_SetArgv`` call. All APIs that currently support being called prior to ``Py_Initialize()`` will continue to do so, and will also support being called prior to ``Py_BeginInitialization()``.
To minimise unnecessary code churn, and to ensure the backwards compatibility is well tested, the main CPython executable may continue to use some elements of the old style initialization API. (very much TBC) Open Questions ============== * Is ``Py_IsRunningMain()`` worth keeping? * Should the answers to ``Py_IsInitialized()`` and ``Py_IsRunningMain()`` be exposed via the ``sys`` module? * Is the ``Py_Config`` struct too unwieldy to be practical? Would a Python dictionary be a better choice? * Would it be better to manage the flag variables in ``Py_Config`` as Python integers so the struct can be initialized with a simple ``memset(&config, 0, sizeof(*config))``? A System Python Executable ========================== When executing system utilities with administrative access to a system, many of the default behaviours of CPython are undesirable, as they may allow untrusted code to execute with elevated privileges. The most problematic aspects are the fact that user site directories are enabled, environment variables are trusted and that the directory containing the executed file is placed at the beginning of the import path. Currently, providing a separate executable with different default behaviour would be prohibitively hard to maintain. One of the goals of this PEP is to make it possible to replace much of the hard to maintain bootstrapping code with more normal CPython code, as well as making it easier for a separate application to make use of key components of ``Py_Main``. Including this change in the PEP is designed to help avoid acceptance of a design that sounds good in theory but proves to be problematic in practice. Cleanly supporting this kind of "alternate CLI" is the main reason for the proposed changes to better expose the core logic for deciding between the different execution modes supported by CPython: * script execution * directory/zipfile execution * command execution ("-c" switch) * module or package execution ("-m" switch) * execution from stdin (non-interactive) * interactive stdin Implementation ============== None as yet. Once I have a reasonably solid plan of attack, I intend to work on a reference implementation as a feature branch in my BitBucket sandbox [2_]. References ========== .. [1] CPython interpreter initialization notes (http://wiki.python.org/moin/CPythonInterpreterInitialization) .. [2] BitBucket Sandbox (https://bitbucket.org/ncoghlan/cpython_sandbox) .. [3] \*nix getpath implementation (http://hg.python.org/cpython/file/default/Modules/getpath.c) .. [4] Windows getpath implementation (http://hg.python.org/cpython/file/default/PC/getpathp.c) .. [5] Site module documentation (http://docs.python.org/3/library/site.html) Copyright ========= This document has been placed in the public domain. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From tjreedy at udel.edu Thu Jan 3 19:32:17 2013 From: tjreedy at udel.edu (Terry Reedy) Date: Thu, 03 Jan 2013 13:32:17 -0500 Subject: [Python-ideas] Identity dicts and sets In-Reply-To: References: Message-ID: On 1/3/2013 6:51 AM, Serhiy Storchaka wrote: > On 03.01.13 00:48, Terry Reedy wrote: In my original, the following quote is preceded by the following snipped line: "By default, equality check is identity check." >> I don't know anything about pickling or __sizeof__, but if one uses >> user-defined classes for nodes and edges, equality is identity, In that context, I pretty obviously meant user-defined class with the default equality as identity.
The contrapositive is "If equality is not identity, one is not using a user-defined class with default identity."

> If one uses a list, a dict, or user-defined class with defined __eq__,
> equality is not identity.

Yes, these are examples of 'not a user-defined class with default identity' in which equality is not identity. I thought it was clear that I know about such things.

> Yes, you can use an identity dict with mutable types!

Yes, and my point was that we effectively already have such things.

class Node(): pass

Instances of Node wrap a dict as .__dict__, but are compared by wrapper identity rather than dict value. A set of such things is effectively an identity set. In 3.3+, if the instances all have the same attributes (if the .__dicts__ all have the same keys), there is only one (not too sparse) hashed list of keys for all instances and one corresponding (not too sparse) list of values for each instance.

Also, which I did not say before: if one instead represents nodes by a unique integer or string or by a list that starts with such a unique identifier, then equality is again effectively identity and a set (or sequence) of such things is effectively an identity set. This corresponds to a standard database table where records have keys, so that the identity of records is not lost when reordered or removed from the table.

>> so I don't see what would be gained.

You are proposing (yet-another) dict variation for use in *python* code. That requires more justification than a few percent speedup in specialized usages. It should make python programming substantially easier in multiple use cases. I do not yet see this in regard to graph algorithms.

--
Terry Jan Reedy

From bruce at leapyear.org Thu Jan 3 19:55:02 2013
From: bruce at leapyear.org (Bruce Leban)
Date: Thu, 3 Jan 2013 10:55:02 -0800
Subject: [Python-ideas] Identity dicts and sets
In-Reply-To: 
References: <201301022234.18839.storchaka@gmail.com>
	<9A91AF30-F0B3-441E-996C-F502291C1F35@masklinn.net>
Message-ID: 

On Thu, Jan 3, 2013 at 3:50 AM, Serhiy Storchaka wrote:

> On 02.01.13 23:33, Bruce Leban wrote:
>
>> I agree collections is the place to put it but that would give us three
>> specialized subclasses of dictionary which cannot be combined. That is,
>> I can have a dictionary with a default, one that is ordered or one that
>> uses a key function but not any combination of those. It would seem
>> better to have something like Haoyi Li suggested:
>>
>> collections.Dictionary(default=None, ordered=False, key=None) --> a dict
>> subclass
>>
>
> I doubt if such combinations have a sense. At least not all features can
> be combined.

I agree that all feature combinations may not make sense. I think a default ordered dict would be useful and if other dict variations are created, combinations of them may be useful too. I don't know if identity dicts are useful enough to add, but I think that if another dict variation is added, using a factory should be considered.

I have specifically wanted a sorted default dict in the past. (A sorted dict is like an ordered dict but the order is sorted by key not by insertion order. It is simulated by iterating over sorted(dict.keys()). I doubt that sorted dict is common enough to be worth adding, but if it were it would be unfortunate to not have a default variation of it.)

--- Bruce
Check this out: http://kck.st/YeqGxQ
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From storchaka at gmail.com Thu Jan 3 21:43:29 2013
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Thu, 03 Jan 2013 22:43:29 +0200
Subject: [Python-ideas] Identity dicts and sets
In-Reply-To: 
References: 
Message-ID: 

On 03.01.13 20:32, Terry Reedy wrote:
> Yes, and my point was that we effectively already have such things.
>
> class Node(): pass
>
> Instances of Node wrap a dict as .__dict__, but are compared by wrapper
> identity rather than dict value. A set of such things is effectively an
> identity set. In 3.3+, if the instances all have the same attributes (if
> the .__dicts__ all have the same keys), there is only one (not too
> sparse) hashed list of keys for all instances and one corresponding (not
> too sparse) list of values for each instance.
>
> Also, which I did not say before: if one instead represents nodes by a
> unique integer or string or by a list that starts with such a unique
> identifier, then equality is again effectively identity and a set (or
> sequence) of such things is effectively an identity set. This
> corresponds to a standard database table where records have keys, so
> that the identity of records is not lost when reordered or removed from
> the table.

You cannot always choose the node type. Sometimes the nodes already exist and you just have to work with them.

>
>> so I don't see what would be gained.
>
> You are proposing (yet-another) dict variation for use in *python* code.

In fact I am thinking first of all about C code. Right now, using the identity dict/set idiom is rather cumbersome in C code. With a standard IdentityDict it would be as simple as using an ordinary dict.

> That requires more justification than a few percent speedup in
> specialized usages. It should make python programming substantially
> easier in multiple use cases. I do not yet see this in regard to graph
> algorithms.

The identity dict/set idiom is used in at least the following stdlib modules: _threading_local, xmlrpc.client, json, lib2to3 (xrange fixer), copy, unittest.mock, idlelib (rpc, remote debugger and browser), ctypes, doctest, pickle, cProfile. Maybe it is implicitly used in some other places, or could be.

From guido at python.org Fri Jan 4 23:33:44 2013
From: guido at python.org (Guido van Rossum)
Date: Fri, 4 Jan 2013 14:33:44 -0800
Subject: [Python-ideas] PEP 3156 - Asynchronous IO Support Rebooted
In-Reply-To: 
References: 
Message-ID: 

[Markus sent this to me off-list, but agreed to me responding on-list, quoting his entire message.]

On Wed, Dec 26, 2012 at 2:38 PM, Markus wrote:
> Hi,

Hi Markus, I don't believe we've met before, have we? It would probably help if you introduced yourself and your experience, since our past experiences color our judgment.

> as I've been waiting for this to happen, I decided to speak up.
> While I really look forward to this, I disagree with the PEP.

Heh, we can't all agree on everything. :-)

> First shoot should be getting a well established event loop into python.

Perhaps. What is your definition of an event loop?

> libev is great, it takes care of operating system specialities, and
> only does a single job, providing an event loop.

It is also written for C, and I presume much of its API design was influenced by the conventions and affordances of that language.

> This event loop can take care of timers, sockets and signals,

But sockets are not native on Windows, and I am making some effort with PEP 3156 to efficiently support higher-level abstractions without tying them to sockets. (The plan is to support IOCP on Windows.
The previous version of Tulip already had a branch that did support that, as a demonstration of the power of this abstraction.) > pyev, a > great python wrapper for libev already provides this simple eventing > facility in python. But, being a libev wrapper, it is likely also strongly influenced by C. > In case you embed python in a c program, the libev default loop of the > python code and c code can even be shared, providing a great amount of > flexibility. Only if the C code also uses libev, of course. But C programs may use other event mechanisms -- e.g. AFAIK there are alternatives to libev (during the early stages of Tulip development I chatted a bit with one of the original authors of libevent, Niels Provos, and I believe there's also something called libuv), and GUI frameworks (e.g. X, Qt, Gtk, Wx) tend to have their own event loop. PEP 3156 is designed to let alternative *implementations* of the same *interface* be selected at run time. Hopefully it is possible to provide a conforming implementation using libev -- then your goal (smooth interoperability with C code using libev) is obtained. It's possible that in order to do that the PEP 3156 interface may have to be refactored into separate pieces. The Tulip implementation already has separate "pollster" implementations (which concern themselves *only* with polling for I/O using select, poll, or other alternatives). It probably makes sense to factor the part that implements transports out as well. However, the whole point of including transports and protocols (and futures) in the PEP is that some platforms may want to implement the same high-level API (e.g. create a transport that connects to a certain host/port) using a different approach altogether, e.g. on Windows the transport might not even use sockets. OTOH on UNIX it may be possible to add file descriptors representing pipes and pseudo-ttys. > libev is great as it is small - it provides exactly what's required, > and nothing beyond. Depending on your requirements. :-) > getaddrinfo/getnameinfo/create_transport are out of scope from a event > loop point of view. > This functionality already exists in python, it just does not use a > event loop and is blocking, as every other io related api. It wasn't random to add these. The "event loop" in PEP 3156 provides abstractions that leave the platform free to implement connections using the appropriate native constructs without letting those constructs "leak" into the application -- after all, whether you're on UNIX or on Windows, a TCP connection represents the same abstraction, but the network stack may have a very different interface. > I'd propose not to replicate the functionality in the event loop > namespace, but to extend the existing implementations - by allowing to > provide an event loop/callback/ctx as optional args which get used. That's an interface choice that I would regret (I really don't like writing code using callbacks). (It would also be harder to implement initially as a 3rd party framework. At the lowest level, no changes to Python itself are needed -- it already supports non-blocking sockets, for example. But adding optional callbacks to existing low-level APIs would require changes throughout the stdlib.) > If you specify something like pyev as PEP, you can still come up with > another PEP which defines the semantics for upper layer protocols like > udp/tcp on IPv4/6, which can be used to take care of dns and > 'echo-server-connections'. 
I could split up the PEP, but that wouldn't really change anything, since to me it is still a package deal. I am willing to put an effort into specifying a low-level event loop because I know that I can still write high-level code which is (mostly) free of callbacks, using futures, tasks and the yield-from construct. And in order to do that I need a minimum set of high-level abstractions such as getaddrinfo() and transport creation (the exact names of the transport creation methods are still under debate, as are the details of their signatures, but the need for them is established without a doubt in my mind). I note that the stdlib socket module has roughly the same set of abstractions bundled together: - socket objects - getaddrinfo(), getnameinfo() - create_connection() - the makefile() methods on socket objects, which create buffered streams PEP 3156 offers alternatives for all of these, using higher-level abstractions that have been developed and proven in practice by Twisted, *and* offers a path to interop to frameworks that previously couldn't very well interoperate -- Twisted, Tornado, and others have traditionally been pretty segregated, but with PEP 3156 they can interoperate both through the event loop and through Futures (which are friendly both to a callback style and to yield-from). > Anyway, I really hope you'll have a look on libev and pyev, both is > great and well tested software and may give you an idea what people > who dedicate themselves to event loops came up with already in terms > of names, subclassing, requirements, guarantees and workarounds for > platform specific failures (kqueue, epoll ...). I will certainly have a look! I am not so concerned about naming (it seems inevitable that everyone uses somewhat different terminology anyway, and it is probably better not to reuse terms when the meaning is different), but I do like to look at guarantees (or the absence thereof!) and best practices for dealing with the differences between platforms. > http://pod.tst.eu/http://cvs.schmorp.de/libev/ev.pod > http://code.google.com/p/pyev/ > > All together, I'd limit the scope of the PEP to the API of the event > loop, just focussing on io/timers/signals and propose to extend > existing API to be usable with an event loop, instead of replicating > it. You haven't convinced me about this. However, you can help me by comparing the event loop part of PEP 3156 (ignoring anything that returns or takes a Future) to libev and pointing out things (either specific APIs or certain guarantees or requirements) that would be hard to implement using libev, as well as useful features in libev that you think every event loop should have. > For naming I'd prefer 'watcher' over 'Handler'. Hm, 'watcher' to me sounds more active than the behavior I have in mind for this class. It is just a reification of a specific function and some arguments to pass to it, with the ability to cancel the call altogether. Thanks for writing! -- --Guido van Rossum (python.org/~guido) From djmitche at gmail.com Fri Jan 4 23:50:49 2013 From: djmitche at gmail.com (Dustin J. Mitchell) Date: Fri, 4 Jan 2013 17:50:49 -0500 Subject: [Python-ideas] PEP 3156 - Asynchronous IO Support Rebooted In-Reply-To: References: Message-ID: As the maintainer of a pretty large, complex app written in Twisted, I think this is great. I look forward to a future of being able to select from a broad library of async tools, and being able to write tools that can be used outside of Twisted. 
Buildbot began, lo these many years ago, doing a lot of things in memory or on local disk, neither of which requires asynchronous IO. So a lot of API methods did not originally return Deferreds. Those methods are then used by other methods, many of which also do not return Deferreds. Now, we want to use a database backend, and parallelize some of the operations, meaning that the methods need to return a Deferred. Unfortunately, that requires a complete tree traversal of all of the methods and methods that call them, rewriting them to take and return Deferreds. There's no "halfway" solution. This is a little easier with generators (@inlineCallbacks), since the syntax doesn't change much, but it's a significant change to the API (in fact, this is a large part of the reason for the big rewrite for Buildbot-0.9.x).

I bring all this up to say, this PEP will introduce a new "kind" of method signature into standard Python, one which the caller must know, and the use of which changes the signature of the caller. That can cause sweeping changes, and debugging those changes can be tricky. Two things can help:

First, `yield from somemeth()` should work fine even if `somemeth` is not a coroutine function, and authors of async tools should be encouraged to use this form to assist future-compatibility. Second, `somemeth()` without a yield should fail loudly if `somemeth` is a coroutine function. Otherwise, the effects can be pretty confusing.

In http://code.google.com/p/uthreads, I accomplished the latter by taking advantage of garbage collection: if the generator is garbage collected before it's begun, then it's probably not been yielded. This is a bit gross, but good enough as a debugging technique.

On the topic of debugging, I also took pains to make sure that tracebacks looked reasonable, filtering out scheduler code[1]. I haven't looked closely at Tulip to see if that's a problem. Most of the "noise" in the tracebacks came from the lack of 'yield from', so it may not be an issue at all.

Dustin

P.S. Apologies for the bad threading - I wasn't on the list when this was last posted.

[1] http://code.google.com/p/uthreads/source/browse/trunk/uthreads/core.py#253

From guido at python.org Fri Jan 4 23:59:40 2013
From: guido at python.org (Guido van Rossum)
Date: Fri, 4 Jan 2013 14:59:40 -0800
Subject: [Python-ideas] [Python-Dev] PEP 3156 - Asynchronous IO Support Rebooted
In-Reply-To: 
References: 
Message-ID: 

On Fri, Jan 4, 2013 at 2:38 PM, Dustin Mitchell wrote:
> As the maintainer of a pretty large, complex app written in Twisted, I think
> this is great. I look forward to a future of being able to select from a
> broad library of async tools, and being able to write tools that can be used
> outside of Twisted.

Thanks. Me too. :-)

> Buildbot began, lo these many years ago, doing a lot of things in memory or
> on local disk, neither of which requires asynchronous IO. So a lot of API
> methods did not originally return Deferreds. Those methods are then used by
> other methods, many of which also do not return Deferreds. Now, we want to
> use a database backend, and parallelize some of the operations, meaning that
> the methods need to return a Deferred. Unfortunately, that requires a
> complete tree traversal of all of the methods and methods that call them,
> rewriting them to take and return Deferreds. There's no "halfway" solution.
> This is a little easier with generators (@inlineCallbacks), since the syntax
> doesn't change much, but it's a significant change to the API (in fact, this
> is a large part of the reason for the big rewrite for Buildbot-0.9.x).
>
> I bring all this up to say, this PEP will introduce a new "kind" of method
> signature into standard Python, one which the caller must know, and the use
> of which changes the signature of the caller. That can cause sweeping
> changes, and debugging those changes can be tricky.

Yes, and this is the biggest unproven point of the PEP. (The rest is all backed by a decade or more of experience.)

> Two things can help:
>
> First, `yield from somemeth()` should work fine even if `somemeth` is not a
> coroutine function, and authors of async tools should be encouraged to use
> this form to assist future-compatibility. Second, `somemeth()` without a
> yield should fail loudly if `somemeth` is a coroutine function. Otherwise,
> the effects can be pretty confusing.

That would be nice. But the way yield from and generators work, that's hard to accomplish without further changes to the language -- and I don't want to have to change the language again (at least not immediately -- maybe in a few releases, after we've learned what the real issues are).

The best I can do for the first requirement is to define @coroutine in a way that if the decorated function isn't a generator, it is wrapped in one.

For the second requirement, if you call somemeth() and ignore the result, nothing happens at all -- this is indeed infuriating but I see no way to change this.(*) If you use the result, well, Futures have different attributes than most other objects so hopefully you'll get a loud AttributeError or TypeError soon, but of course if you pass it into something else which uses it, it may still be difficult to track. Hopefully these error messages provide a hint:

>>> f.foo
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'Future' object has no attribute 'foo'
>>> f()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'Future' object is not callable
>>>

(*) There's a heavy gun we might use, but I would make this optional, as a heavy duty debugging mode only. @coroutine could wrap generators in a lightweight object with a __del__ method and an __iter__ method. If __del__ is called before __iter__ is ever called, it could raise an exception or log a warning. But this probably adds too much overhead to have it always enabled.

> In http://code.google.com/p/uthreads, I accomplished the latter by taking
> advantage of garbage collection: if the generator is garbage collected
> before it's begun, then it's probably not been yielded. This is a bit
> gross, but good enough as a debugging technique.

Eh, yeah, what I said. :-)

> On the topic of debugging, I also took pains to make sure that tracebacks
> looked reasonable, filtering out scheduler code[1]. I haven't looked
> closely at Tulip to see if that's a problem. Most of the "noise" in the
> tracebacks came from the lack of 'yield from', so it may not be an issue at
> all.

One of the great advantages of using yield from is that the tracebacks automatically look nice.
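For concreteness, a rough sketch of the "heavy gun" described a few paragraphs up -- a debug-mode @coroutine that wraps the generator so a never-iterated coroutine can be detected at collection time. This is speculative illustration only, not Tulip code, and the names are invented:

    import functools
    import warnings

    def coroutine_debug(func):
        """Wrap a generator function so we can warn if the resulting
        generator is garbage collected without ever being iterated."""
        @functools.wraps(func)
        def wrapper(*args, **kwds):
            return _UniteratedDetector(func(*args, **kwds))
        return wrapper

    class _UniteratedDetector:
        def __init__(self, gen):
            self._gen = gen
            self._started = False

        def __iter__(self):
            # 'yield from wrapped()' calls iter() on the operand,
            # which lands here and unwraps the real generator.
            self._started = True
            return self._gen

        def __del__(self):
            if not self._started:
                warnings.warn('coroutine %r was never yielded from'
                              % (self._gen,))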
> Dustin
>
> [1]
> http://code.google.com/p/uthreads/source/browse/trunk/uthreads/core.py#253

--
--Guido van Rossum (python.org/~guido)

From josh at bartletts.id.au Sat Jan 5 09:52:11 2013
From: josh at bartletts.id.au (Joshua Bartlett)
Date: Sat, 5 Jan 2013 18:52:11 +1000
Subject: [Python-ideas] Yielding through context managers
In-Reply-To: 
References: 
Message-ID: 

I've just read through PEP 3156 and I thought I'd resurrect this thread from March. Giving context managers the ability to react to yield and send, and especially to yield from, would allow the eventual introduction of asynchronous locks using PEP 3156 futures. This is one of the open issues listed in the PEP.

Cheers,

J. D. Bartlett.

On 30 March 2012 10:00, Joshua Bartlett wrote:
> I'd like to propose adding the ability for context managers to catch and
> handle control passing into and out of them via yield and generator.send()
> / generator.next().
>
> For instance,
>
> class cd(object):
>     def __init__(self, path):
>         self.inner_path = path
>
>     def __enter__(self):
>         self.outer_path = os.getcwd()
>         os.chdir(self.inner_path)
>
>     def __exit__(self, exc_type, exc_val, exc_tb):
>         os.chdir(self.outer_path)
>
>     def __yield__(self):
>         self.inner_path = os.getcwd()
>         os.chdir(self.outer_path)
>
>     def __send__(self):
>         self.outer_path = os.getcwd()
>         os.chdir(self.inner_path)
>
> Here __yield__() would be called when control is yielded through the with
> block and __send__() would be called when control is returned via .send()
> or .next(). To maintain compatibility, it would not be an error to leave
> either __yield__ or __send__ undefined.
>
> The rationale for this is that it's sometimes useful for a context manager
> to set global or thread-global state as in the example above, but when the
> code is used in a generator, the author of the generator needs to make
> assumptions about what the calling code is doing. e.g.
>
> def my_generator(path):
>     with cd(path):
>         yield do_something()
>     do_something_else()
>
> Even if the author of this generator knows what effect do_something() and
> do_something_else() have on the current working directory, the author needs
> to assume that the caller of the generator isn't touching the working
> directory. For instance, if someone were to create two my_generator()
> generators with different paths and advance them alternately, the resulting
> behaviour could be most unexpected. With the proposed change, the context
> manager would be able to handle this so that the author of the generator
> doesn't need to make these assumptions.
>
> Naturally, nested with blocks would be handled by calling __yield__ from
> innermost to outermost and __send__ from outermost to innermost.
>
> I rather suspect that if this change were included, someone could come up
> with a variant of the contextlib.contextmanager decorator to simplify
> writing generators for this sort of situation.
>
> Cheers,
>
> J. D. Bartlett
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From guido at python.org Sat Jan 5 20:23:51 2013
From: guido at python.org (Guido van Rossum)
Date: Sat, 5 Jan 2013 11:23:51 -0800
Subject: [Python-ideas] Yielding through context managers
In-Reply-To: 
References: 
Message-ID: 

Possibly (though it will have to be a separate PEP -- PEP 3156 needs to be able to run on unchanged Python 3.3).
Does anyone on this thread have enough understanding of the implementation of context managers and generators to be able to figure out how this could be specified and implemented (or to explain why it is a bad idea, or impossible)?

--Guido

On Sat, Jan 5, 2013 at 12:52 AM, Joshua Bartlett wrote:
> I've just read through PEP 3156 and I thought I'd resurrect this thread from
> March. Giving context managers the ability to react to yield and send, and
> especially to yield from, would allow the eventual introduction of
> asynchronous locks using PEP 3156 futures. This is one of the open issues
> listed in the PEP.
>
> Cheers,
>
> J. D. Bartlett.
>
>
> On 30 March 2012 10:00, Joshua Bartlett wrote:
>>
>> I'd like to propose adding the ability for context managers to catch and
>> handle control passing into and out of them via yield and generator.send() /
>> generator.next().
>>
>> For instance,
>>
>> class cd(object):
>>     def __init__(self, path):
>>         self.inner_path = path
>>
>>     def __enter__(self):
>>         self.outer_path = os.getcwd()
>>         os.chdir(self.inner_path)
>>
>>     def __exit__(self, exc_type, exc_val, exc_tb):
>>         os.chdir(self.outer_path)
>>
>>     def __yield__(self):
>>         self.inner_path = os.getcwd()
>>         os.chdir(self.outer_path)
>>
>>     def __send__(self):
>>         self.outer_path = os.getcwd()
>>         os.chdir(self.inner_path)
>>
>> Here __yield__() would be called when control is yielded through the with
>> block and __send__() would be called when control is returned via .send() or
>> .next(). To maintain compatibility, it would not be an error to leave either
>> __yield__ or __send__ undefined.
>>
>> The rationale for this is that it's sometimes useful for a context manager
>> to set global or thread-global state as in the example above, but when the
>> code is used in a generator, the author of the generator needs to make
>> assumptions about what the calling code is doing. e.g.
>>
>> def my_generator(path):
>>     with cd(path):
>>         yield do_something()
>>     do_something_else()
>>
>> Even if the author of this generator knows what effect do_something() and
>> do_something_else() have on the current working directory, the author needs
>> to assume that the caller of the generator isn't touching the working
>> directory. For instance, if someone were to create two my_generator()
>> generators with different paths and advance them alternately, the resulting
>> behaviour could be most unexpected. With the proposed change, the context
>> manager would be able to handle this so that the author of the generator
>> doesn't need to make these assumptions.
>>
>> Naturally, nested with blocks would be handled by calling __yield__ from
>> innermost to outermost and __send__ from outermost to innermost.
>>
>> I rather suspect that if this change were included, someone could come up
>> with a variant of the contextlib.contextmanager decorator to simplify
>> writing generators for this sort of situation.
>>
>> Cheers,
>>
>> J. D. Bartlett
>
>
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas
>

--
--Guido van Rossum (python.org/~guido)

From barry at python.org Sat Jan 5 22:42:20 2013
From: barry at python.org (Barry Warsaw)
Date: Sat, 5 Jan 2013 16:42:20 -0500
Subject: [Python-ideas] Updated PEP 432: Simplifying the CPython update sequence
References: 
Message-ID: <20130105164220.09d654be@anarchist.wooz.org>

Hi Nick,

PEP 432 is looking very nice. It'll be fun to watch the implementation come together.
:) Some comments... The start up sequences: > * Pre-Initialization - no interpreter available > * Initialization - interpreter partially available What about "Initializing"? > * Initialized - full interpreter available, __main__ related metadata > incomplete > * Main Execution - optional state, __main__ related metadata populated, > bytecode executing in the __main__ module namespace What is "optional" about this state? Maybe it should be called "Operational"? > ... separate system Python (spython) executable ... I love the idea, but I'm not crazy about the name. What about `python-minimal` (yes, it's deliberately longer. Symlinks ftw. :) > What about sys.implementation? > as it failed to be updated for the virtual environment support added in > Python 3.3 (detailed in PEP 420). venv is defined in PEP 405 (there are two cases of mis-referencing). Note that there may be other important build time settings on some platforms. An example is Debian/Ubuntu, where we define the multiarch triplet in the configure script, and pass that through Makefile(.pre.in) to sysmodule.c for exposure as sys.implementation._multiarch. > For a command executed with -c, it will be the string "-c" > For explicitly requested input from stdin, it will be the string "-" Wow, I couldn't believe it but it's true! That seems crazy useless. :) > Embedding applications must call Py_SetArgv themselves. The CPython logic > for doing so is part of Py_Main() and is not exposed separately. However, > the runpy module does provide roughly equivalent logic in runpy.run_module > and runpy.run_path. As I've mentioned before on the python-porting mailing list, this is actually more difficult than it seems because main() takes char*s but Py_SetArgv() and Py_SetProgramName() takes wchar_t*s. Maybe Python's own conversion could be refactored to make this easier either as part of this PEP or after the PEP is implemented. > int Py_ReadConfiguration(PyConfig *config); > The config argument should be a pointer to a Python dictionary. For any > supported configuration setting already in the dictionary, CPython will > sanity check the supplied value, but otherwise accept it as correct. So why not define this to take a PyObject* or a PyDictObject* ? (also: the Py_Config struct members need the correct concrete type pointers, e.g. PyDictObject*) > Alternatively, settings may be overridden after the Py_ReadConfiguration > call (this can be useful if an embedding application wants to adjust a > setting rather than replace it completely, such as removing sys.path[0]). How will setting something after Py_ReadConfiguration() is called change a value such as sys.path? Or is this the reason why you pass a Py_Config to Py_EndInitialization()? (also, see the type typo in the definition of Py_EndInitialization()) Also, I suggest taking the opportunity to change the sense of flags such as no_site and dont_write_bytecode. I find it much more difficult to reason that "dont_write_bytecode = 0" means *do* write bytecode, rather than "write_bytecode = 1". I.e. positives are better than double-negatives. > sys.argv[0] may not yet have its final value > it will be -m when executing a module or package with CPython Gosh, wouldn't it be nice if this could have a more useful value? 
> Initial thought is that hiding the various options behind a single API would > make that API too complicated, so 3 separate APIs is more likely: +1 > The interpreter state will be updated to include details of the > configuration settings supplied during initialization by extending the > interpreter state object with an embedded copy of the Py_CoreConfig and > Py_Config structs. Couldn't it just have a dict with all the values from both structs collapsed into it? > For debugging purposes, the configuration settings will be exposed as a > sys._configuration simple namespace I suggest un-underscoring the name and making it public. It might be useful for other than debugging purposes. > Is Py_IsRunningMain() worth keeping? Perhaps. Does it provide any additional information above Py_IsInitialized()? > Should the answers to Py_IsInitialized() and Py_RunningMain() be exposed via > the sys module? I can't think of a use case. > Is the Py_Config struct too unwieldy to be practical? Would a Python > dictionary be a better choice? Although I see why you've spec'd it this way, I don't like having *two* config structures (Py_CoreConfig and Py_Config). Having a dictionary for the latter would probably be fine, and in fact you could copy the Py_Config values into it (when possible during the init sequence) and expose it in the sys module. > Would it be better to manage the flag variables in Py_Config as Python > integers so the struct can be initialized with a simple memset(&config, 0, > sizeof(*config))? Would we even notice the optimization? > A System Python Executable This should probably at least mention Christian's idea of the -I flag (which I think hasn't been PEP'd yet). We can bikeshed about the name of the executable later. :) Cheers, -Barry -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: not available URL: From tjreedy at udel.edu Sat Jan 5 23:54:52 2013 From: tjreedy at udel.edu (Terry Reedy) Date: Sat, 05 Jan 2013 17:54:52 -0500 Subject: [Python-ideas] Updated PEP 432: Simplifying the CPython update sequence In-Reply-To: <20130105164220.09d654be@anarchist.wooz.org> References: <20130105164220.09d654be@anarchist.wooz.org> Message-ID: On 1/5/2013 4:42 PM, Barry Warsaw wrote: > Also, I suggest taking the opportunity to change the sense of flags such as > no_site and dont_write_bytecode. I find it much more difficult to reason that > "dont_write_bytecode = 0" means *do* write bytecode, rather than > "write_bytecode = 1". I.e. positives are better than double-negatives. IE, you prefer positive flags, with some on by default, over having all flags indicate a non-default condition. I would too, but I don't hack on the C code base. 'dont_write_bytecode' is especially ugly. In any case, this seems orthogonal to Nick's PEP and should be a separate discussion (on pydev), tracker issue, and patch. Is the current tradition just happenstance or something that some of the major C developers strongly care about? -- Terry Jan Reedy From guido at python.org Sun Jan 6 00:30:41 2013 From: guido at python.org (Guido van Rossum) Date: Sat, 5 Jan 2013 15:30:41 -0800 Subject: [Python-ideas] PEP 3156 - Asynchronous IO Support Rebooted In-Reply-To: References: Message-ID: On Fri, Jan 4, 2013 at 6:53 PM, Markus wrote: > On Fri, Jan 4, 2013 at 11:33 PM, Guido van Rossum wrote: >>On Wed, Dec 26, 2012 at 2:38 PM, Markus wrote: >>> First shoot should be getting a well established event loop into python. 
>> >> Perhaps. What is your definition of an event loop? > > I ask the loop to notify me via callback if something I care about happens. Heh. That's rather too general -- it depends on "something I care about" which could be impossible to guess. :-) > Usually that's fds and read/writeability. Ok, although on some platforms it can't be a fd (UNIX-style small integer) but some other abstraction, e.g. a socket *object* in Jython or a "handle" on Windows (but I am already starting to repeat myself :-). > I create a data structure which has the fd, the event I care about, > the callback and userdata, pass it to the loop, and the loop will take > care. > > Next, timers, same story, > I create a data structure which has the time I care about, the > callback and userdata, pass it to the loop, and the loop will take > care. The "create data structure" part is a specific choice of interface style, not necessarily the best for Python. Most event loop implementations I've seen for Python (pyev excluded) just have various methods that express everything through the argument list, not with a separate data structure. > Signals - sometimes having signals in the event loop is handy too. > Same story. Agreed, I've added this to the open issues section in the PEP. Do you have a suggestion for a minimal interface for signal handling? I could imagine the following: - add_signal_handler(sig, callback, *args). Whenever signal 'sig' is received, arrange for callback(*args) to be called. Returns a Handler which can be used to cancel the signal callback. Specifying another callback for the same signal replaces the previous handler (only one handler can be active per signal). - remove_signal_handler(sig). Removes the handler for signal 'sig', if one is set. Is anything else needed? Note that Python only receives signals in the main thread, and the effect may be undefined if the event loop is not running in the main thread, or if more than one event loop sets a handler for the same signal. It also can't work for signals directed to a specific thread (I think POSIX defines a few of these, but I don't know of any support for these in Python.) >> But sockets are not native on Windows, and I am making some effort >> with PEP 3156 to efficiently support higher-level abstractions without >> tying them to sockets. (The plan is to support IOCP on Windows. The >> previous version of Tulip already had a branch that did support that, >> as a demonstration of the power of this abstraction.) > > Supporting IOCP on windows is absolutely required, as WSAPoll is > broken and won't be fixed. > http://social.msdn.microsoft.com/Forums/hu/wsk/thread/18769abd-fca0-4d3c-9884-1a38ce27ae90 Wow. Now I'm even more glad that we're planning to support IOCP. >> Only if the C code also uses libev, of course. But C programs may use >> other event mechanisms -- e.g. AFAIK there are alternatives to libev >> (during the early stages of Tulip development I chatted a bit with one >> of the original authors of libevent, Niels Provos, and I believe >> there's also something called libuv), and GUI frameworks (e.g. X, Qt, >> Gtk, Wx) tend to have their own event loop. > > libuv is a wrapper around libev -adding IOCP- which adds some other > things besides an event loop and is developed for/used in node.js. Ah, that's helpful. I did not realize this after briefly skimming the libuv page. 
(And the github logs suggest that it may no longer be the case: https://github.com/joyent/libuv/commit/1282d64868b9c560c074b9c9630391f3b18ef633)

>> PEP 3156 is designed to let alternative *implementations* of the same
>> *interface* be selected at run time. Hopefully it is possible to
>> provide a conforming implementation using libev -- then your goal
>> (smooth interoperability with C code using libev) is obtained.
>
> Smooth interoperability is not a major goal here - it's great if you
> get it for free.
> I'm just looking forward to an event loop in the stdlib I want to use.

Heh, so stop objecting. :-)

>> (It would also be harder to implement initially as a 3rd party
>> framework. At the lowest level, no changes to Python itself are needed
>> -- it already supports non-blocking sockets, for example. But adding
>> optional callbacks to existing low-level APIs would require changes
>> throughout the stdlib.)
>
> As a result - making the stdlib async io aware - the complete stdlib.
> Would be great.

No matter what API style is chosen, making the entire stdlib async aware will be tough. No matter what you do, the async support will have to be "pulled through" every abstraction layer -- e.g. making sockets async-aware doesn't automatically make socketserver or urllib2 async-aware(*). With the strong requirements for backwards compatibility, in many cases it may be easier to define a new API that is suitable for async use instead of trying to augment existing APIs.

(*) Unless you use microthreads, like gevent, but this has its own set of problems -- I don't want to get into that here, since we seem to at least agree on the need for an event loop with callbacks.

>> I am not so concerned about naming (it
>> seems inevitable that everyone uses somewhat different terminology
>> anyway, and it is probably better not to reuse terms when the meaning
>> is different), but I do like to look at guarantees (or the absence
>> thereof!) and best practices for dealing with the differences between
>> platforms.
>
> Handler - the best example for not re-using terms.

??? (Can't tell if you're sarcastic or agreeing here.)

>> You haven't convinced me about this.
>
> Fine, if you include transports, I'll pick on the transports as well ;)

??? (Similar.)

>> However, you can help me by
>> comparing the event loop part of PEP 3156 (ignoring anything that
>> returns or takes a Future) to libev and pointing out things (either
>> specific APIs or certain guarantees or requirements) that would be
>> hard to implement using libev, as well as useful features in libev
>> that you think every event loop should have.
>
> Note: In libev only the "default event loop" can have timers.

Interesting. This seems an odd constraint.

> EventLoop
> * run() - ev_run(struct ev_loop)
> * stop() - ev_break(EV_UNLOOP_ALL)
> * run_forever() - registering an idle watcher will keep the loop alive
> * run_once(timeout=None) - registering a timer, have the timer stop() the loop
> * call_later(delay, callback, *args) - ev_timer
> * call_repeatedly(interval, callback, *args) - ev_timer (periodic)
> * call_soon(callback, *args) - Equivalent to call_later(0, callback, *args).
> - call_soon_threadsafe(callback, *args) - it would be better to have
> the event loops taking care of signals too, else waking up an ev_async
> in the loop which checks an async queue which contains the required
> information to register the call_soon callback would be possible

Not sure I understand.
PEP 3156/Tulip uses a self-pipe to prevent race conditions when call_soon_threadsafe() is called from a signal handler or other thread(*) -- but I don't know if that is relevant or not.

(*) http://code.google.com/p/tulip/source/browse/tulip/unix_events.py#448 and http://code.google.com/p/tulip/source/browse/tulip/unix_events.py#576

> - getaddrinfo(host, port, family=0, type=0, proto=0, flags=0) - libev
> does not do dns
> - getnameinfo(sockaddr, flags=0) - libev does not do dns

Note that these exist at least in part so that an event loop implementation may *choose* to implement its own DNS handling (IIUC Twisted has this), whereas the default behavior is just to run socket.getaddrinfo() -- but in a separate thread because it blocks. (This is a useful test case for run_in_executor() too.)

> - create_transport(protocol_factory, host, port, **kwargs) - libev
> does not do transports
> - start_serving(protocol_factory, host, port, **kwds) - libev does
> not do transports
> * add_reader(fd, callback, *args) - create a ev_io watcher with EV_READ
> * add_writer(fd, callback, *args) - create ev_io watcher with EV_WRITE
> * remove_reader(fd) - in libev you have to name the watcher you want
> to stop, you can not remove watchers/handlers by fd, workaround is
> maintaining a dict with fd:Handler in the EventLoop

Ok, this does not sound like a show-stopper for a conforming PEP 3156 implementation on top of libev then, right? Just a minor inconvenience. I'm sure everyone has *some* impedance mismatches to deal with.

> * remove_writer(fd) - same
> * add_connector(fd, callback, *args) - poll for writeability, getsockopt, done

TBH, I'm not 100% convinced of the need for add_connector(), but Richard Oudkerk claims that it is needed for Windows. (OTOH if WSAPoll() is too broken to bother, maybe we don't need it. It's a bit of a nuisance because code that uses add_writer() instead works just fine on UNIX but would be subtly broken on Windows, leading to disappointments when porting apps to Windows. I'd rather have things break on all platforms, or on none...)

> * remove_connector(fd) - same as with all other remove-by-fd methods
>
> As Transports are part of the PEP - some more:
>
> EventLoop
> * create_transport(protocol_factory, host, port, **kwargs)
> kwargs requires "local" - local address as tuple like
> ('fe80::14ad:1680:54e1:6a91%eth0',0) - so you can bind when using ipv6
> link local scope.
> or ('192.168.2.1',5060) - bind local port for udp

Not sure I understand. What socket.connect() (or other API) call parameters does this correspond to? What can't be expressed through the host and port parameters?

> * start_serving(protocol_factory, host, port, **kwds)
> what is the behaviour for SOCK_DGRAM - does this multiplex sessions
> based on src host/port / dst host/port - I'd love it.

TBH I haven't thought much about datagram transports. It's been years since I used UDP. I guess the API may have to distinguish between connected and unconnected UDP. I think the transport/protocol API will be different than for SOCK_STREAM: for every received datagram, the transport will call protocol.datagram_received(data, address) (the address will be a dummy for connected use), and to send a datagram, the protocol must call transport.write_datagram(data, [address]), which returns immediately. Flow control (if supported) should work the same as for streams: if the transport finds its buffers exceed a certain limit it will tell the protocol to back off by calling protocol.pause().
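To make that sketch concrete, here is what a trivial protocol against the datagram interface just described might look like. The method names follow the proposal in the preceding paragraph; none of this is settled API:

    class EchoDatagramProtocol:
        """Sketch only: echoes every datagram back to its sender."""

        def connection_made(self, transport):
            self.transport = transport
            self.paused = False

        def datagram_received(self, data, address):
            # For a connected UDP transport the address would be a dummy.
            if not self.paused:
                self.transport.write_datagram(data, address)

        def pause(self):
            # The transport's buffers exceeded their limit; stop sending.
            self.paused = True

        def resume(self):
            self.paused = False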
> Handler:
> Requiring 2 handlers for every active connection r/w is highly ineffective.

How so? What is the concern? The actions of the read and write handler are typically completely different, so the first thing the handler would have to do is to decide whether to call the read or the write code. Also, depending on flow control, only one of the two may be active. If you are after minimizing the number of records passed to [e]poll or kqueue, you can always collapse the handlers at that level and distinguish between read/write based on the mask and recover the appropriate user-level handler from the readers/writers array (and this is what Tulip's epoll pollster class does).

PS. Also check out this issue, where an implementation of *just* Tulip's pollster class for the stdlib is being designed: http://bugs.python.org/issue16853; also check out the code reviews here: http://bugs.python.org/review/16853/

> I'd prefer to be able to create a Handler from a loop.
> Handler = EventLoop.create_handler(socket, callback, events)
> and have the callback called with the returned events, so I can
> multiplex read/write op in the callback.

Hm. See above.

> Additionally, I can .stop() the handler without having to know the fd,
> .stop() the handler, change the events the handler is looking for,
> restart the handler with .start().
> In your proposal, I'd create a new handler every time I want to send
> something, poll for readability - discard the handler when I'm done,
> create a new one for the next send.

The questions are, does it make any difference in efficiency (when using Python -- the performance of the C API is hardly relevant here), and how often does this pattern occur.

> Timers:
> Not in the PEP - re-arming a timer
> lets say I want to do something if nothing happens for 5 seconds.
> I create a timer call_later(5.,cb), if something happens, I need to
> cancel the timer and create a new one. If there was a Timer:
> Timer.stop()
> Timer.set(5)
> Timer.start()

Actually it's one less call using the PEP's proposed API:

timer.cancel()
timer = loop.call_later(5, callback)

Which of the two idioms is faster? Who knows? libev's pattern is probably faster in C, but that has little bearing on the cost in Python. My guess is that the amount of work is about the same -- the real cost is that you have to make some changes to the heap used to keep track of all timers in the order in which they will trigger, and those changes are the same regardless of how you style the API.
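The cancel-and-recreate idiom above generalises directly to the idle timeout Markus asks about. A small helper, sketched against just call_later() and the Handler's cancel() from the PEP (the surrounding names are invented):

    class IdleTimeout:
        """Call 'callback' if activity() is not invoked for 'delay' seconds."""

        def __init__(self, loop, delay, callback):
            self._loop = loop
            self._delay = delay
            self._callback = callback
            self._handler = loop.call_later(delay, callback)

        def activity(self):
            # Something happened: cancel the pending call and re-arm.
            self._handler.cancel()
            self._handler = self._loop.call_later(self._delay,
                                                  self._callback)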
> Having SSL as a Protocol allows closing the SSL connection without > closing the TCP connection, re-using the TCP connection, re-using a > SSL session cookie during reconnect of the SSL Protocol. That seems a pretty esoteric use case (though given your background in honeypots maybe common for you :-). It also seems hard to get both sides acting correctly when you do this (but I'm certainly no SSL expert -- I just want it supported because half the web is inaccessible these days if you don't speak SSL, regardless of whether you do any actual verification). All in all I think that stackable transports/protocols are mostly something that is enabled by the interfaces defined here (the PEP takes care not to specify any base classes from which you must inherit -- you must just implement certain methods, and the rest is duck typing) but otherwise does not concern the PEP much. The only concern I have, really, is that the PEP currently hints that both protocols and transports might have pause() and resume() methods for flow control, where the protocol calls transport.pause() if protocol.data_received() is called too frequently, and the transport calls protocol.pause() if transport.write() has buffered more data than sensible. But for an object that is both a protocol and a transport, this would make it impossible to distinguish between pause() calls by its left and right neighbors. So maybe the names must differ. Given the tendency of transport method names to be shorter (e.g. write()) vs. the longer protocol method names (data_received(), connection_lost() etc.), perhaps it should be transport.pause() and protocol.pause_writing() (and similar for resume()). > * reconnect() - I'd love to be able to reconnect a transport But what does that mean in general? It depends on the protocol (e.g. FTP, HTTP, IRC, SMTP) how much state must be restored/renegotiated upon a reconnect, and how much data may have to be re-sent. This seems a higher-level feature that transports and protocols will have to implement themselves. > * timers - Transports need timers I think you mean timeouts? > * dns-resolve-timeout - dns can be slow > * connecting-timeout - connecting can take too much time, more than > we want to wait > * idle-timeout ( no action on the connection for a while ) - call > protocol.timeout_idle() > * sustain-timeout ( max session time ) - close() transport > * ssl-handshake-timeout ( in case ssl is a Transport ) - close transport > * close-timeout (shutdown is async) - close transport hard > * reconnect-timeout - (wait some seconds before reconnecting) - > reconnect connection This is an interesting point. I think some of these really do need APIs in the PEP, others may be implemented using existing machinery (e.g. call_later() to schedule a callback that calls cancel() on a task). I've added a bullet on this to Open Issue. > Now, in case we connect to a host by name, and have multiple addresses > resolved, and the first connection can not be established, there is no > way to 'reconnect()' - as the protocol does not yet exist. Twisted suggested something here which I haven't implemented yet but which seems reasonable -- using a series of short timeouts try connecting to the various addresses and keep the first one that connects successfully. If multiple addresses connect after the first timeout, too bad, just close the redundant sockets, little harm is done (though the timeouts should be tuned that this is relatively rare, because a server may waste significant resources on such redundant connects). 
> For almost all the timeouts I mentioned - the protocol needs to take
> care - so the protocol has to exist before the connection is
> established in case of outbound connections.

I'm not sure I follow. Can you sketch out some code to help me here? ISTM that e.g. the DNS, connect and handshake timeouts can be implemented by the machinery that tries to set up the connection behind the scenes, and the user's protocol won't know anything of these shenanigans. The code that calls create_transport() (actually it'll probably be renamed create_client()) will just get a Future that either indicates success (and then the protocol and transport are successfully hooked up) or an error (and then no protocol was created -- whether or not a transport was created is an implementation detail).

> In case a connection is lost and reconnecting is required -
> .reconnect() is handy, so the protocol can request reconnecting.

I'd need more details of how you would like to specify this.

> As this does not work with the current Protocols callbacks I propose
> Protocols.connection_established() therefore.

How does this differ from connection_made()? (I'm trying to follow Twisted's guidance here, they seem to have the longest experience doing these kinds of things. When I talked to Glyph IIRC he was skeptical about reconnecting in general.)

> Protocols
> I'd outline protocol_factory can be an instance of a class, which can
> set specific parameters for 'things'
>
> class p:
>     def __init__(self, a=1, b=2, c=3):
>         self.a = a
>         self.b = b
>         self.c = c
>     def __call__(self):
>         return p(a=self.a, b=self.b, c=self.c)
>     def ... all protocol methods ...:
>         pass
>
> EventLoop.start_serving(p(a=5,b=7), ...)
> EventLoop.start_serving(p(a=9,b=4), ...)
>
> Same Protocol, different parameters for it.

No such helper method (or class) is needed. You can use a lambda or functools.partial for the same effect (see the sketch below). I'll add a note to the PEP to remind people of this.

> + connection_established()
> + timeout_dns()
> + timeout_idle()
> + timeout_connecting()

Signatures please?

> * data_received(data) - if it was possible to return the number of
> bytes consumed by the protocol, and have the Transport buffer the rest
> for the next io in call, one would avoid having to do this in every
> Protocol on its own - learned from experience.

Twisted has a whole slew of protocol implementation subclasses that implement various strategies like line-buffering (including a really complex version where you can turn the line buffering on and off) and "netstrings". I am trying to limit the PEP's size by not including these, but I fully expect that in practice a set of useful protocol implementations will be created that handles common cases. I'm not convinced that putting this in the transport/protocol interface will make user code less buggy: it seems easy for the user code to miscount the bytes or not return a count at all in a rarely taken code branch.

> * eof_received()/connection_lost(exc) - a connection can be closed
> clean recv()=0, unclean recv()=-1, errno, SIGPIPE when writing and in
> case of SSL even more, it is required to distinguish.

Well, this is why eof_received() exists -- to indicate a clean close. We should never receive SIGPIPE (Python disables this signal, so you always get the errno instead). According to Glyph, SSL doesn't support sending eof, so you have to use Content-length or a chunked encoding. What other conditions do you expect from SSL that wouldn't be distinguished by the exception instance passed to connection_lost()?
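As a concrete version of the lambda/functools.partial point above, Markus's example without any helper class (start_serving() as proposed in the PEP; the host/port values are placeholders):

    import functools

    class MyProtocol:
        def __init__(self, a=1, b=2, c=3):
            self.a, self.b, self.c = a, b, c
        # ... the usual protocol callbacks go here ...

    # Same protocol class, different parameters per listening socket:
    # event_loop.start_serving(functools.partial(MyProtocol, a=5, b=7),
    #                          'localhost', 8080)
    # event_loop.start_serving(lambda: MyProtocol(a=9, b=4),
    #                          'localhost', 8081)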
> + nextlayer_is_empty() - called if the Transport (or underlying
> Protocol in case of chaining) write buffer is empty - Imagine an http
> server sending a 1GB file, you do not want to send 1GB at once - as
> you do not have that much memory, but get a callback if the transport
> is done sending the chunk you've queued, so you can send the next chunk
> of data.

That's what the pause()/resume() flow control protocol is for. You read the file (presumably it's a file) in e.g. 16K blocks and call write() for each block; if the transport can't keep up and exceeds its buffer space, it calls protocol.pause() (or perhaps protocol.pause_writing(), see discussion above).

> Next, what happens if a dns can not be resolved, ssl handshake (in
> case ssl is transport) or connecting fails - in my opinion it's an
> error the protocol is supposed to take care of
> + error_dns
> + error_ssl
> + error_connecting

The future returned by create_transport() (aka create_client()) will raise the exception.

> I'm not that much into futures - so I may have got some things wrong.

No problem. You may want to read PEP 3148, it explains Futures and much of that explanation remains valid; it's just that in PEP 3156, to wait for a future you must use "yield from <future>".

--
--Guido van Rossum (python.org/~guido)

From shibturn at gmail.com Sun Jan 6 00:55:55 2013
From: shibturn at gmail.com (Richard Oudkerk)
Date: Sat, 05 Jan 2013 23:55:55 +0000
Subject: [Python-ideas] PEP 3156 - Asynchronous IO Support Rebooted
In-Reply-To: 
References: 
Message-ID: 

On 05/01/2013 11:30pm, Guido van Rossum wrote:
>> >Supporting IOCP on windows is absolutely required, as WSAPoll is
>> >broken and won't be fixed.
>> >http://social.msdn.microsoft.com/Forums/hu/wsk/thread/18769abd-fca0-4d3c-9884-1a38ce27ae90
> Wow. Now I'm even more glad that we're planning to support IOCP.
>

I took care to work around that bug when adding support for WSAPoll() in tulip.

--
Richard

From shibturn at gmail.com Sun Jan 6 00:57:43 2013
From: shibturn at gmail.com (Richard Oudkerk)
Date: Sat, 05 Jan 2013 23:57:43 +0000
Subject: [Python-ideas] PEP 3156 - Asynchronous IO Support Rebooted
In-Reply-To: 
References: 
Message-ID: 

On 05/01/2013 11:30pm, Guido van Rossum wrote:
> TBH, I'm not 100% convinced of the need for add_connector(), but
> Richard Oudkerk claims that it is needed for Windows. (OTOH if
> WSAPoll() is too broken to bother, maybe we don't need it. It's a bit
> of a nuisance because code that uses add_writer() instead works just
> fine on UNIX but would be subtly broken on Windows, leading to
> disappointments when porting apps to Windows. I'd rather have things
> break on all platforms, or on none...)

add_connector() is needed to work around the brokenness of WSAPoll().

--
Richard

From rosuav at gmail.com Sun Jan 6 01:00:53 2013
From: rosuav at gmail.com (Chris Angelico)
Date: Sun, 6 Jan 2013 11:00:53 +1100
Subject: [Python-ideas] Updated PEP 432: Simplifying the CPython update sequence
In-Reply-To: 
References: <20130105164220.09d654be@anarchist.wooz.org>
Message-ID: 

On Sun, Jan 6, 2013 at 9:54 AM, Terry Reedy wrote:
> On 1/5/2013 4:42 PM, Barry Warsaw wrote:
>
>> Also, I suggest taking the opportunity to change the sense of flags such as
>> no_site and dont_write_bytecode. I find it much more difficult to reason that
>> "dont_write_bytecode = 0" means *do* write bytecode, rather than
>> "write_bytecode = 1". I.e. positives are better than double-negatives.
> > IE, you prefer positive flags, with some on by default, over having all > flags indicate a non-default condition. I would too, but I don't hack on the > C code base. 'dont_write_bytecode' is especially ugly. Would it be less ugly if called 'suppress_bytecode'? It sounds less negative, but does the same thing. Suppressing something is an active and positive action (though the democratic decision to not publish is quite different, as Yes Minister proved). ChrisA From ncoghlan at gmail.com Sun Jan 6 08:26:14 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 6 Jan 2013 17:26:14 +1000 Subject: [Python-ideas] Updated PEP 432: Simplifying the CPython update sequence In-Reply-To: <20130105164220.09d654be@anarchist.wooz.org> References: <20130105164220.09d654be@anarchist.wooz.org> Message-ID: On Sun, Jan 6, 2013 at 7:42 AM, Barry Warsaw wrote: > Hi Nick, > > PEP 432 is looking very nice. It'll be fun to watch the implementation come > together. :) > > Some comments... > > The start up sequences: > >> * Pre-Initialization - no interpreter available >> * Initialization - interpreter partially available > > What about "Initializing"? Makes sense, changed. >> * Initialized - full interpreter available, __main__ related metadata >> incomplete >> * Main Execution - optional state, __main__ related metadata populated, >> bytecode executing in the __main__ module namespace > > What is "optional" about this state? Maybe it should be called "Operational"? Unlike the other phases which are sequential and distinct, "Main Execution" is a subphase of Initialized. Embedding applications without the concept of a "__main__" module (e.g. mod_wsgi) will never use it. >> ... separate system Python (spython) executable ... > > I love the idea, but I'm not crazy about the name. What about > `python-minimal` (yes, it's deliberately longer. Symlinks ftw. :) Yeah, I'll go with "python-minimal". >> > > What about sys.implementation? Unaffected, since that's all configured at build time. I've added an explicit note that sys.implementation and sysconfig.get_config_vars() are not affected by this initial proposal. >> as it failed to be updated for the virtual environment support added in >> Python 3.3 (detailed in PEP 420). > > venv is defined in PEP 405 (there are two cases of mis-referencing). Oops, fixed. > Note that there may be other important build time settings on some platforms. > An example is Debian/Ubuntu, where we define the multiarch triplet in the > configure script, and pass that through Makefile(.pre.in) to sysmodule.c for > exposure as sys.implementation._multiarch. Yeah, I don't want to mess with adding new runtime configuration options at this point, beyond the features inherent in breaking up the existing initialization phases. > >> For a command executed with -c, it will be the string "-c" >> For explicitly requested input from stdin, it will be the string "-" > > Wow, I couldn't believe it but it's true! That seems crazy useless. :) Yup. While researching this PEP I had many moments where I was looking at the screen going "WTF, we seriously do that?" (most notably when I learned that using the -W and -X options means we create Python objects in Py_Main() before the call to Py_Initialize(). This is why there has to be an explicit call to _Py_Random_Init() before the option processing code) >> Embedding applications must call Py_SetArgv themselves. The CPython logic >> for doing so is part of Py_Main() and is not exposed separately. 
However, >> the runpy module does provide roughly equivalent logic in runpy.run_module >> and runpy.run_path. > As I've mentioned before on the python-porting mailing list, this is actually > more difficult than it seems because main() takes char*s but Py_SetArgv() and > Py_SetProgramName() take wchar_t*s. > > Maybe Python's own conversion could be refactored to make this easier either > as part of this PEP or after the PEP is implemented. Yeah, one of the changes in the PEP is that you can pass program_name and raw_argv as a Unicode object or a list of Unicode objects instead of using wchar_t. > >> int Py_ReadConfiguration(PyConfig *config); >> The config argument should be a pointer to a Python dictionary. For any >> supported configuration setting already in the dictionary, CPython will >> sanity check the supplied value, but otherwise accept it as correct. > So why not define this to take a PyObject* or a PyDictObject* ? That wording is a holdover from a previous version of the PEP where this was indeed a dictionary pointer. I came around to Antoine's point of view that since we have a fixed list of supported settings at any given point in time, a struct would be easier to deal with on the C side. However, I missed a few spots (including this one) when I made the change to the PEP. > > (also: the Py_Config struct members need the correct concrete type pointers, > e.g. PyDictObject*) Fixed. >> Alternatively, settings may be overridden after the Py_ReadConfiguration >> call (this can be useful if an embedding application wants to adjust a >> setting rather than replace it completely, such as removing sys.path[0]). > How will setting something after Py_ReadConfiguration() is called change a > value such as sys.path? Or is this the reason why you pass a Py_Config to > Py_EndInitialization()? Correct - calling Py_ReadConfiguration has no effect on the interpreter state. The interpreter state only changes in Py_EndInitialization. I'll include a more explicit explanation of that behaviour. > (also, see the type typo in the definition of Py_EndInitialization()) > > Also, I suggest taking the opportunity to change the sense of flags such as > no_site and dont_write_bytecode. I find it much more difficult to reason that > "dont_write_bytecode = 0" means *do* write bytecode, rather than > "write_bytecode = 1". I.e. positives are better than double-negatives. While I agree with this principle in general, I'm deliberately not doing anything about most of these because these settings are already exposed in their double-negative form as environment variables (PYTHONDONTWRITEBYTECODE, PYTHONNOUSERSITE), as global variables that can be set by an embedding application (Py_DontWriteBytecodeFlag, Py_NoSiteFlag, Py_NoUserSiteDirectory) and as sys module attributes (sys.dont_write_bytecode, sys.flags.no_site, sys.flags.no_user_site). However, I *am* going to change the sense of the no_site setting to "enable_site_config". The reason for this is that the meaning of the setting actually changed in Python 3.3 to also mean "disable the side effects that are currently implicit in importing the site module", in addition to implicitly importing that module as part of the startup sequence. >> sys.argv[0] may not yet have its final value >> it will be -m when executing a module or package with CPython > Gosh, wouldn't it be nice if this could have a more useful value? It does once runpy is done with it (it has the __file__ attribute corresponding to whatever code is actually being run as __main__).
At this point in the initialisation sequence, though, __main__ is still the builtin __main__ module, and there's no getting around the fact that we need to be able to import and run arbitrary Python code (both from the standard library and from package __init__ files) in order to properly locate __main__. >> Initial thought is that hiding the various options behind a single API would >> make that API too complicated, so 3 separate APIs is more likely: > > +1 > >> The interpreter state will be updated to include details of the >> configuration settings supplied during initialization by extending the >> interpreter state object with an embedded copy of the Py_CoreConfig and >> Py_Config structs. > > Couldn't it just have a dict with all the values from both structs collapsed > into it? It could, but that's substantially less convenient from the C side of the API. > >> For debugging purposes, the configuration settings will be exposed as a >> sys._configuration simple namespace > > I suggest un-underscoring the name and making it public. It might be useful > for other than debugging purposes. The underscore is there because the specific fields are currently CPython specific. Another implementation may not make these settings configurable at all. If there are particular settings that would be useful to modules like importlib or site, then we may want to look at exposing them through sys.implementation as required attributes, but that's a distinct PEP from this one. >> Is Py_IsRunningMain() worth keeping? > > Perhaps. Does it provide any additional information above Py_IsInitialized()? Yes - it indicates that sys.argv[0] and the metadata in __main__ are fully updated (i.e. the placeholder info used while executing Python code in order to locate __main__ in the first place has been replaced with the real info). >> Should the answers to Py_IsInitialized() and Py_RunningMain() be exposed via >> the sys module? > > I can't think of a use case. Neither can I. I'll leave them as "for embedding apps only" until someone comes up with an actual reason to expose them. >> Is the Py_Config struct too unwieldy to be practical? Would a Python >> dictionary be a better choice? > > Although I see why you've spec'd it this way, I don't like having *two* config > structures (Py_CoreConfig and Py_Config). Having a dictionary for the latter > would probably be fine, and in fact you could copy the Py_Config values into > it (when possible during the init sequence) and expose it in the sys module. Yeah, I originally had just Py_CoreConfig and then a Py_DictObject for the rest of it. The first draft of Py_Config embedded a copy of Py_CoreConfig as the first field. However, I eventually settled on the current scheme as best aligning the model with the reality that we really do have two kinds of configuration setting which need to be handled differently: - Py_CoreConfig holds the settings that are required to create a Py_InterpreterState at all (passed to Py_BeginInitialization) - Py_Config holds the settings that are required to get to a fully functional interpreter (passed to Py_EndInitialization) Using a struct for both of them is easier to work with from C, and makes the number vs string vs list vs mapping distinction for the various settings self-documenting. >> Would it be better to manage the flag variables in Py_Config as Python >> integers so the struct can be initialized with a simple memset(&config, 0, >> sizeof(*config))? > > Would we even notice the optimization? 
I'll clarify this a bit - it's a maintainability question, rather than an optimization. (i.e. I think _Py_Config_INIT is ugly as hell, I just don't have any better ideas) > >> A System Python Executable > > This should probably at least mention Christian's idea of the -I flag (which I > think hasn't been PEP'd yet). We can bikeshed about the name of the > executable later. :) Yeah, I've gone through and added a bunch of tracker links, including that one. There's a significant number of things which this should make easier in the future (e.g. I haven't linked to it, but the proposal to support custom memory allocators could be handled by adding more fields to Py_CoreConfig rather than more C level global variables) Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Sun Jan 6 08:28:22 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 6 Jan 2013 17:28:22 +1000 Subject: [Python-ideas] Updated PEP 432: Simplifying the CPython update sequence In-Reply-To: References: <20130105164220.09d654be@anarchist.wooz.org> Message-ID: On Sun, Jan 6, 2013 at 5:26 PM, Nick Coghlan wrote: >> I love the idea, but I'm not crazy about the name. What about >> `python-minimal` (yes, it's deliberately longer. Symlinks ftw. :) > > Yeah, I'll go with "python-minimal". Oops, I was editing the PEP and the email at the same time, and changed my mind about this without fixing the email. I actually went with "pysystem" for now, but I also noted the need to paint this bikeshed under Open Questions. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Sun Jan 6 10:06:31 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 6 Jan 2013 19:06:31 +1000 Subject: [Python-ideas] Yielding through context managers In-Reply-To: References: Message-ID: On Sun, Jan 6, 2013 at 5:23 AM, Guido van Rossum wrote: > Possibly (though it will have to be a separate PEP -- PEP 3156 needs > to be able to run on unchanged Python 3.3). Does anyone on this thread > have enough understanding of the implementation of context managers > and generators to be able to figure out how this could be specified > and implemented (or to explain why it is a bad idea, or impossible)? There aren't any syntax changes needed to implement asynchronous locks, since they're unlikely to experience high latency in __exit__. For that and similar cases, it's enough to use an asynchronous operation to retrieve the CM in the first place (i.e. acquire in __iter__ rather than __enter__) or else have __enter__ produce a Future that acquires the lock in __iter__ (see http://python-notes.boredomandlaziness.org/en/latest/pep_ideas/async_programming.html#asynchronous-context-managers) The real challenge is in handling something like an asynchronous database transaction, which will need to yield on __exit__ as it commits or rolls back the database transaction. At the moment, the only solutions for that are to switch to a synchronous-to-asynchronous adapter like gevent or else write out the try/except block and avoid using the with statement. It's not an impossible problem, just a tricky one to solve in a readable fashion. Some possible constraints on the problem space:
- any syntactic solution should work for at least "for" statements and "with" statements
- also working for comprehensions is highly desirable
- syntactic ambiguity with currently legal constructs should be avoided.
Even if the compiler can figure it out, large behavioural changes due to a subtle difference in syntax should be avoided because they're hard for *humans* to read. For example:

# Synchronous
for x in y:   # Invokes _iter = iter(y) and _iter.__next__()
    print(x)

# Asynchronous:
for x in yielding y:   # Invokes _iter = yield from iter(y)
                       # and yield from _iter.__next__()
    print(x)

# Synchronous
with x as y:   # Invokes _cm = x, y = _cm.__enter__() and _cm.__exit__(*args)
    print(y)

# Asynchronous:
with yielding x as y:   # Invokes _cm = x, y = yield from _cm.__enter__()
                        # and yield from _cm.__exit__(*args)
    print(y)

A new keyword like "yielding" would make it explicit that what is going on differs from a (yield x) or (yield from x) in the corresponding expression slot. Approaches with function level granularity may also be of interest - PEP 3152 is largely an exploration of that idea (but would need adjustments in light of PEP 3156) Somewhat related, there's also a case to be made that "yield from x" should fall back to being equivalent to "x()" if x implements __call__ but not __iter__. That way, async ready code can be written using "yield from", but passing in a pre-canned result via lambda or functools.partial would no longer require a separate operation that just adapts the asynchronous call API (i.e. __iter__) to the synchronous call one (i.e. __call__):

def async_call(f):
    @functools.wraps(f)
    def _sync(*args, **kwds):
        return f(*args, **kwds)
        yield # Force this to be a generator
    return _sync

The argument against, of course, is the ease with which this can lead to a "wrong answer" problem where the exception gets thrown a long way from the erroneous code which left out the parens for the function call. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From _ at lvh.cc Sun Jan 6 11:20:35 2013 From: _ at lvh.cc (Laurens Van Houtven) Date: Sun, 6 Jan 2013 11:20:35 +0100 Subject: [Python-ideas] Yielding through context managers In-Reply-To: References: Message-ID: Hi Nick, When you say "high latency" (in __exit__), what does "high" mean? Is that order of magnitude what __exit__ usually means now, or network IO included? (Use case: distributed locking and remotely stored locks: it doesn't take a long time on network scales, but it can take a long time on CPU scales.) On Sun, Jan 6, 2013 at 10:06 AM, Nick Coghlan wrote: > On Sun, Jan 6, 2013 at 5:23 AM, Guido van Rossum wrote: > > Possibly (though it will have to be a separate PEP -- PEP 3156 needs > > to be able to run on unchanged Python 3.3). Does anyone on this thread > > have enough understanding of the implementation of context managers > > and generators to be able to figure out how this could be specified > > and implemented (or to explain why it is a bad idea, or impossible)? > > There aren't any syntax changes needed to implement asynchronous > locks, since they're unlikely to experience high latency in __exit__.
At the moment, the > only solutions for that are to switch to a synchronous-to-asynchronous > adapter like gevent or else write out the try/except block and avoid > using the with statement. > > It's not an impossible problem, just a tricky one to solve in a > readable fashion. Some possible constraints on the problem space: > > - any syntactic solution should work for at least "for" statements and > "with" statements > - also working for comprehensions is highly desirable > - syntactic ambiguity with currently legal constructs should be > avoided. Even if the compiler can figure it out, large behavioural > changes due to a subtle difference in syntax should be avoided because > they're hard for *humans* to read. > > For example: > > # Synchronous
> for x in y:   # Invokes _iter = iter(y) and _iter.__next__()
>     print(x)
> # Asynchronous:
> for x in yielding y:   # Invokes _iter = yield from iter(y)
>                        # and yield from _iter.__next__()
>     print(x)
>
> # Synchronous
> with x as y:   # Invokes _cm = x, y = _cm.__enter__() and _cm.__exit__(*args)
>     print(y)
> # Asynchronous:
> with yielding x as y:   # Invokes _cm = x, y = yield from _cm.__enter__()
>                         # and yield from _cm.__exit__(*args)
>     print(y)
> A new keyword like "yielding" would make it explicit that what is > going on differs from a (yield x) or (yield from x) in the > corresponding expression slot. > > Approaches with function level granularity may also be of interest - > PEP 3152 is largely an exploration of that idea (but would need > adjustments in light of PEP 3156) > > Somewhat related, there's also a case to be made that "yield from x" > should fall back to being equivalent to "x()" if x implements __call__ > but not __iter__. That way, async ready code can be written using > "yield from", but passing in a pre-canned result via lambda or > functools.partial would no longer require a separate operation that > just adapts the asynchronous call API (i.e. __iter__) to the > synchronous call one (i.e. __call__): >
> def async_call(f):
>     @functools.wraps(f)
>     def _sync(*args, **kwds):
>         return f(*args, **kwds)
>         yield # Force this to be a generator
>     return _sync
>
> The argument against, of course, is the ease with which this can lead > to a "wrong answer" problem where the exception gets thrown a long way > from the erroneous code which left out the parens for the function > call. > > Cheers, > Nick. > > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -- cheers lvh -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Sun Jan 6 12:37:11 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 6 Jan 2013 21:37:11 +1000 Subject: [Python-ideas] Yielding through context managers In-Reply-To: References: Message-ID: On Sun, Jan 6, 2013 at 8:20 PM, Laurens Van Houtven <_ at lvh.cc> wrote: > Hi Nick, > > When you say "high latency" (in __exit__), what does "high" mean? Is that > order of magnitude what __exit__ usually means now, or network IO included? > > (Use case: distributed locking and remotely stored locks: it doesn't take a > long time on network scales, but it can take a long time on CPU scales.) The status quo can only be made to work for in-memory locks. If the release step involves network access, then it's closer to the "database transaction" use case, because the __exit__ method may need to block.
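To spell out the "write it by hand" workaround for that blocking case (a rough sketch only - "lock" and its methods are hypothetical, and I'm assuming the PEP 3156 convention of writing blocking steps with "yield from" inside a coroutine):

def locked_update(lock, update):
    # Unsugared stand-in for a hypothetical "with yielding lock:" block.
    yield from lock.acquire()      # may block, e.g. on network access
    try:
        update()
    finally:
        yield from lock.release()  # may block as well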
Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From nepenthesdev at gmail.com Sun Jan 6 16:45:52 2013 From: nepenthesdev at gmail.com (Markus) Date: Sun, 6 Jan 2013 16:45:52 +0100 Subject: [Python-ideas] PEP 3156 - Asynchronous IO Support Rebooted In-Reply-To: References: Message-ID: Hi, > Do you have a suggestion for a minimal interface for signal handling? > I could imagine the following: > > Note that Python only receives signals in the main thread, and the > effect may be undefined if the event loop is not running in the main > thread, or if more than one event loop sets a handler for the same > signal. It also can't work for signals directed to a specific thread > (I think POSIX defines a few of these, but I don't know of any support > for these in Python.) Exactly - signals are a mess, threading and signals make things worse - I'm no expert here, but I have experienced problems with signal handling and threads, basically the same problems you describe. Creating the threads after installing signal handlers (in the main thread) works, and signals get delivered to the main thread; installing the signal handlers (in the main thread) after creating the threads - and the signals ended up in *some thread*. Additionally it depended on whether you'd install your signal handler with signal() or sigaction() and flags when creating threads. >> Supporting IOCP on windows is absolutely required, as WSAPoll is >> broken and won't be fixed. >> http://social.msdn.microsoft.com/Forums/hu/wsk/thread/18769abd-fca0-4d3c-9884-1a38ce27ae90 > > Wow. Now I'm even more glad that we're planning to support IOCP. tulip already has a workaround: http://code.google.com/p/tulip/source/browse/tulip/unix_events.py#244 >> libuv is a wrapper around libev -adding IOCP- which adds some other >> things besides an event loop and is developed for/used in node.js. > > Ah, that's helpful. I did not realize this after briefly skimming the > libuv page. (And the github logs suggest that it may no longer be the > case: https://github.com/joyent/libuv/commit/1282d64868b9c560c074b9c9630391f3b18ef633) Okay, they moved to libngx - nginx core library, obviously I missed this. >> Handler - the best example for not re-using terms. > > ??? (Can't tell if you're sarcastic or agreeing here.) sarcastic. >> Fine, if you include transports, I'll pick on the transports as well ;) > > ??? (Similar.) Not sarcastic. >> Note: In libev only the "default event loop" can have timers. > > Interesting. This seems an odd constraint. I'm wrong - discard. This limitation referred to watchers for child processes. >> EventLoop >> - call_soon_threadsafe(callback, *args) - it would be better to have > Not sure I understand. PEP 3156/Tulip uses a self-pipe to prevent race > conditions when call_soon_threadsafe() is called from a signal handler > or other thread(*) -- but I don't know if that is relevant or not. ev_async is a self-pipe too. > (*) http://code.google.com/p/tulip/source/browse/tulip/unix_events.py#448 > and http://code.google.com/p/tulip/source/browse/tulip/unix_events.py#576 > >> - getaddrinfo(host, port, family=0, type=0, proto=0, flags=0) - libev >> does not do dns >> - getnameinfo(sockaddr, flags=0) - libev does not do dns > > Note that these exist at least in part so that an event loop > implementation may *choose* to implement its own DNS handling (IIUC > Twisted has this), whereas the default behavior is just to run > socket.getaddrinfo() -- but in a separate thread because it blocks.
> (This is a useful test case for run_in_executor() too.) I'd expect the EventLoop never to create threads on its own behalf, it's just wrong. If you can't provide some functionality without threads, don't provide the functionality. Besides, getaddrinfo() is a bad choice, as it relies on distribution specific flags. For example ip6 link local scope exists on every current platform, but - when resolving a link local scope address - not a domain - with getaddrinfo, getaddrinfo will fail if no global routed ipv6 address is available on debian/ubuntu. >> As Transport are part of the PEP - some more: >> EventLoop >> * create_transport(protocol_factory, host, port, **kwargs) >> kwargs requires "local" - local address as tuple like >> ('fe80::14ad:1680:54e1:6a91%eth0',0) - so you can bind when using ipv6 >> link local scope. >> or ('192.168.2.1',5060) - bind local port for udp > Not sure I understand. What socket.connect() (or other API) call > parameters does this correspond to? What can't be expressed through the > host and port parameters? In case you have multiple interfaces, and multiple gateways, you need to assign the connection to an address - so the kernel knows which interface to use for the connection - else it'd default to "the first" interface. In IPv6 link-local scope you can have multiple addresses in the same subnet fe80:: - IIRC if you want to connect somewhere, you have to either set the scope_id of the remote, or bind the "source" address before - I don't know how to set the scope_id in python, it's in sockaddr_in6. In terms of socket calls it is a bind before a connect:

import socket
s = socket.socket(socket.AF_INET6, socket.SOCK_DGRAM, 0)
s.bind(('fe80::1', 0))
s.connect(('fe80::2', 4712))

same for ipv4 in case you are multi homed and rely on source based routing. >> Handler: >> Requiring 2 handlers for every active connection r/w is highly inefficient. > How so? What is the concern? Of course you can fold the fdsets, but in case you need a separate handler for write, you re-create it for every write - see below. >> Additionally, I can .stop() the handler without having to know the fd, >> .stop() the handler, change the events the handler is looking for, >> restart the handler with .start(). >> In your proposal, I'd create a new handler every time I want to send >> something, poll for writability - discard the handler when I'm done, >> create a new one for the next send. > The questions are, does it make any difference in efficiency (when > using Python -- the performance of the C API is hardly relevant here), > and how often does this pattern occur. Every time you send - you poll for write-ability, you get the callback, you write, you've got nothing left, you stop polling for write-ability. >> Timers: >> ... >> Timer.stop() >> Timer.set(5) >> Timer.start() > Actually it's one less call using the PEP's proposed API: > timer.cancel() > timer = loop.call_later(5, callback) My example was ill-chosen - problem for both of us: how do we know it's 5 seconds? With timer.restart() or timer.again() the timer could remember its interval; else you have to store the interval somewhere, next to the timer. > Which of the two idioms is faster? Who knows? libev's pattern is > probably faster in C, but that has little to bear on the cost in > Python. My guess is that the amount of work is about the same -- the > real cost is that you have to make some changes to the heap used to keep > track of all timers in the order in which they will trigger, and those > changes are the same regardless of how you style the API.
Speed, nothing is fast in all circumstances, for example select is faster than epoll for small numbers of sockets. Let's look at usability. >> Transports: >> I think SSL should be a Protocol not a transport - implemented using BIO pairs. >> If you can chain protocols, like Transport / ProtocolA / ProtocolB you can have >> TCP / SSL / HTTP as https or TCP / SSL / SOCKS / HTTP as https via >> ssl enabled socks proxy without having too many problems. Another >> example, shaping a connection TCP / RATELIMIT / HTTP. > Interesting idea. This may be up to the implementation -- not every > implementation may have BIO wrappers available (AFAIK the stdlib > doesn't), Right, for ssl bios pyopenssl is required - or ctypes. > So maybe we can visualise this as T1 <--> > P2:T2 <--> P3:T3 <--> P4. Yes, exactly. >> Having SSL as a Protocol allows closing the SSL connection without >> closing the TCP connection, re-using the TCP connection, re-using a >> SSL session cookie during reconnect of the SSL Protocol. > That seems a pretty esoteric use case (though given your background in > honeypots maybe common for you :-). It also seems hard to get both > sides acting correctly when you do this (but I'm certainly no SSL > expert -- I just want it supported because half the web is > inaccessible these days if you don't speak SSL, regardless of whether > you do any actual verification). Well, proper shutdown is not an SSL protocol requirement, closing the connection hard saves some cycles, so it pays off not to do it right in large-scale deployments - such as google. Nevertheless, doing SSL properly can help, as it allows distinguishing connection reset errors from proper shutdown. > The only concern I have, really, is that the PEP currently hints that > both protocols and transports might have pause() and resume() methods > for flow control, where the protocol calls transport.pause() if > protocol.data_received() is called too frequently, and the transport > calls protocol.pause() if transport.write() has buffered more data > than sensible. But for an object that is both a protocol and a > transport, this would make it impossible to distinguish between > pause() calls by its left and right neighbors. So maybe the names must > differ. Given the tendency of transport method names to be shorter > (e.g. write()) vs. the longer protocol method names (data_received(), > connection_lost() etc.), perhaps it should be transport.pause() and > protocol.pause_writing() (and similar for resume()).

Protocol.data_received - rename to Protocol.io_in
Protocol.io_out - called in case the transport's out buffer is empty (instead of Protocol.next_layer_is_empty())
Protocol.pause_io_out - in case the transport wants to stop the protocol sending more, as the out buffer is crowded already
Protocol.resume_io_out - in case the transport wants to inform the protocol that the out buffer can take some more bytes again

For the Protocol limiting the amount of data received:
Transport.pause -> Transport.pause_io_in
Transport.resume -> Transport.resume_io_in

or drop the "_io" from the names: "(pause|resume)_(in|out)". >> * reconnect() - I'd love to be able to reconnect a transport > But what does that mean in general? It depends on the protocol (e.g. > FTP, HTTP, IRC, SMTP) how much state must be restored/renegotiated > upon a reconnect, and how much data may have to be re-sent. This seems > a higher-level feature that transports and protocols will have to > implement themselves.
I don't need the EventLoop to sync my state upon reconnect - just have the Transport provide the ability. Protocols are free to use this, but do not have to. >> Now, in case we connect to a host by name, and have multiple addresses >> resolved, and the first connection can not be established, there is no >> way to 'reconnect()' - as the protocol does not yet exist. > Twisted suggested something here which I haven't implemented yet but > which seems reasonable -- using a series of short timeouts try > connecting to the various addresses and keep the first one that > connects successfully. If multiple addresses connect after the first > timeout, too bad, just close the redundant sockets, little harm is > done (though the timeouts should be tuned so that this is relatively > rare, because a server may waste significant resources on such > redundant connects). Fast, yes - reasonable? - no. How would you feel if web browsers behaved like this? The domain name has to be resolved, addresses ordered according to rfc X which says prefer ipv6 etc., try connecting linearly. >> For almost all the timeouts I mentioned - the protocol needs to take >> care - so the protocol has to exist before the connection is >> established in case of outbound connections. > I'm not sure I follow. Can you sketch out some code to help me here? > ISTM that e.g. the DNS, connect and handshake timeouts can be > implemented by the machinery that tries to set up the connection > behind the scenes, and the user's protocol won't know anything of > these shenanigans. The code that calls create_transport() (actually > it'll probably be renamed create_client()) will just get a Future that > either indicates success (and then the protocol and transport are > successfully hooked up) or an error (and then no protocol was created > -- whether or not a transport was created is an implementation > detail). From my understanding the Future does not provide any information about which connection to which host using which protocol and credentials failed? I'd create the Protocol when trying to create a connection, so the Protocol is informed when the Transport fails and can take action - retry, whatever. >> In case a connection is lost and reconnecting is required - >> .reconnect() is handy, so the protocol can request reconnecting. > I'd need more details of how you would like to specify this. A Transport which
* is closed by the remote,
* failed connecting to the remote, or
* failed resolving the domain name
has to inform the protocol about the failure - and if the Protocol changes the Transport's state to "reconnect", the Transport creates a "reconnect timer of N seconds", and retries connecting then. It is up to the protocol to login, clean state and start fresh or login and regain old state by issuing required commands to get there. For ftp - this would be changing the cwd. >> As this does not work with the current Protocols callbacks I propose >> Protocols.connection_established() therefore. > How does this differ from connection_made()? If you create the Protocol before the connection is established - you may want to distinguish between _made() and _established(). You can not distinguish by using __init__, as it may miss the Transport arg. > (I'm trying to follow Twisted's guidance here, they seem to have the > longest experience doing these kinds of things. When I talked to Glyph > IIRC he was skeptical about reconnecting in general.) Point is - connections don't last forever, even if we want them to.
If the transport supports "reconnect" - it is still up to the protocol to either support it or not. If a Protocol gets disconnected and wants to reconnect - without the Transport supporting .reconnect() - the protocol has to know its factory. >> + connection_established() >> + timeout_dns() >> + timeout_idle() >> + timeout_connecting() > Signatures please?

+ connection_established(self, transport)
  The connection is established - in your proposal it is connection_made, which I disagree with due to the lack of context in the Futures; returns None.

+ timeout_dns(self)
  Resolving the domain name failed - the Protocol can .reconnect() for another try; returns None.

+ timeout_idle(self)
  The connection was idle for some time - send a higher layer keep-alive or close the connection; returns None.

+ timeout_connecting(self)
  Connecting timed out - the Protocol can .reconnect() for another try; returns None.

>> * data_received(data) - if it was possible to return the number of >> bytes consumed by the protocol, and have the Transport buffer the rest >> for the next io_in call, one would avoid having to do this in every >> Protocol on its own - learned from experience. > Twisted has a whole slew of protocol implementation subclasses that > implement various strategies like line-buffering (including a really > complex version where you can turn the line buffering on and off) and > "netstrings". I am trying to limit the PEP's size by not including > these, but I fully expect that in practice a set of useful protocol > implementations will be created that handle common cases. I'm not > convinced that putting this in the transport/protocol interface will > make user code less buggy: it seems easy for the user code to miscount > the bytes or not return a count at all in a rarely taken code branch. Please don't drop this. You never know how much data you'll receive, you never know how much data you need for a message, so the Protocol needs a buffer. Having this io_in buffer in the Transports allows every Protocol to benefit: they try to read a message from the data passed to data_received(), and if the data received is not sufficient to create a full message, they need to buffer it and wait for more data. So having Protocol.data_received return the number of bytes the Protocol could process, the Transport can do the job, saving it for every Protocol. Still - a protocol can have its own buffering strategy, i.e. in case of an incremental XML parser which does its own buffering, and always return len(data), so the Transport does not buffer anything. In case the size returned by the Protocol is less than the size of the buffer given to the protocol, the Transport erases only the consumed bytes from the buffer; in case the len matches the size of the buffer passed, it erases the buffer. In nonblocking IO this buffering has to be done for every protocol; if Transports could take care of it, the data_received method of the Protocol does not need to bother. A benefit for every protocol. Else, every Protocol.data_received method starts with self.buffer += data and ends with self.buffer = self.buffer[len(consumed):] You can even default to a return value of None meaning len(data). If you want to be fancy, you could even pass the data to the Protocol as long as the protocol could consume data and there is data left.
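A rough sketch of the read side of such a Transport (only the data_received name follows the PEP; everything else here is just illustrative):

class BufferingTransport:
    # Sketch of the read path only - all other Transport duties omitted.

    def __init__(self, protocol):
        self.protocol = protocol
        self.buffer = b""

    def _read_ready(self, data):
        # Called with whatever recv() returned this time.
        self.buffer += data
        while self.buffer:
            consumed = self.protocol.data_received(self.buffer)
            if consumed is None:       # default: everything was consumed
                consumed = len(self.buffer)
            if consumed == 0:          # incomplete message - wait for more
                break
            self.buffer = self.buffer[consumed:]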
This way a protocol's data_received can focus on processing a single message: if more than a single message is contained in the data, it will get the data again - as it returned > 0 - and in case there is no complete message left in the data, it will return 0. This really assists when writing protocols, and as every protocol needs it, have it in Transport. >> * eof_received()/connection_lost(exc) - a connection can be closed >> clean recv()=0, unclean recv()=-1, errno, SIGPIPE when writing and in >> case of SSL even more, it is required to distinguish. > Well, this is why eof_received() exists -- to indicate a clean close. > We should never receive SIGPIPE (Python disables this signal, so you > always get the errno instead). According to Glyph, SSL doesn't support > sending eof, so you have to use Content-length or a chunked encoding. > What other conditions do you expect from SSL that wouldn't be > distinguished by the exception instance passed to connection_lost()? Depends on the implementation of SSL - BIO- or fd-based, Transport or Protocol: SSL_ERROR_SYSCALL, and - unlikely - SSL_ERROR_SSL. In case of stacking TCP / SSL / http an SSL service rejecting a client certificate for login is - to me - a connection_lost too. >> + nextlayer_is_empty() - called if the Transport (or underlying >> Protocol in case of chaining) write buffer is empty > That's what the pause()/resume() flow control protocol is for. You > read the file (presumably it's a file) in e.g. 16K blocks and call > write() for each block; if the transport can't keep up and exceeds its > buffer space, it calls protocol.pause() (or perhaps > protocol.pause_writing(), see discussion above). I'd still love a callback for "we are empty". Protocol.io_out - maybe the name changes your mind? >> Next, what happens if a dns can not be resolved, ssl handshake (in >> case ssl is transport) or connecting fails - in my opinion it's an >> error the protocol is supposed to take care of >> + error_dns >> + error_ssl >> + error_connecting > The future returned by create_transport() (aka create_client()) will > raise the exception. When do I get this exception - when EventLoop.run() raises? And does this exception have all information required to retry connecting? Let's say I want to reconnect in case of a dns error after 20s: the Future raised an exception, so depending on the exception I call_later() a callback which calls create_transport() again? - instead of Transport.reconnect() from the Protocol, that's not really easier. MfG Markus From guido at python.org Sun Jan 6 17:24:07 2013 From: guido at python.org (Guido van Rossum) Date: Sun, 6 Jan 2013 08:24:07 -0800 Subject: [Python-ideas] Yielding through context managers In-Reply-To: References: Message-ID: On Sunday, January 6, 2013, Nick Coghlan wrote: > On Sun, Jan 6, 2013 at 8:20 PM, Laurens Van Houtven <_ at lvh.cc> wrote: > > Hi Nick, > > > > > > When you say "high latency" (in __exit__), what does "high" mean? Is that > > order of magnitude what __exit__ usually means now, or network IO > included? > > > > (Use case: distributed locking and remotely stored locks: it doesn't > take a > > long time on network scales, but it can take a long time on CPU scales.) > > The status quo can only be made to work for in-memory locks. If the > release step involves network access, then it's closer to the > "database transaction" use case, because the __exit__ method may need > to block. But you don't need to wait for the release. You can do that asynchronously. Also, have you given the implementation of your 'yielding' proposal any thought yet? > > Cheers, > Nick.
> > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, > Australia > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From solipsis at pitrou.net Sun Jan 6 17:25:38 2013 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sun, 6 Jan 2013 17:25:38 +0100 Subject: [Python-ideas] PEP 3156 - Asynchronous IO Support Rebooted References: Message-ID: <20130106172538.1a0d563b@pitrou.net> On Sun, 6 Jan 2013 16:45:52 +0100 Markus wrote: > >> Transports: > >> I think SSL should be a Protocol not a transport - implemented using BIO pairs. > >> If you can chain protocols, like Transport / ProtocolA / ProtocolB you can have > >> TCP / SSL / HTTP as https or TCP / SSL / SOCKS / HTTP as https via > >> ssl enabled socks proxy without having too many problems. Another > >> example, shaping a connection TCP / RATELIMIT / HTTP. > > > > Interesting idea. This may be up to the implementation -- not every > > implementation may have BIO wrappers available (AFAIK the stdlib > > doesn't), > > Right, for ssl bios pyopenssl is required - or ctypes. Or a patch to Python 3.4. See http://docs.python.org/devguide/ By the way, how does "SSL as a protocol" deal with SNI? How does the HTTP layer tell the SSL layer which servername to indicate? Or, on the server-side, how would the SSL layer invoke the HTTP layer's servername callback? > > (I'm trying to follow Twisted's guidance here, they seem to have the > > longest experience doing these kinds of things. When I talked to Glyph > > IIRC he was skeptical about reconnecting in general.) > > Point is - connections don't last forever, even if we want them to. > If the transport supports "reconnect" - it is still up to the protocol > to either support it or not. > If a Protocol gets disconnected and wants to reconnect -without the > Transport supporting .reconnect()- the protocol has to know its > factory. +1 to this. > + connection_established(self, transport) > the connection is established - in your proposal it is connection_made > which I disagree with due to the lack of context in the Futures, > returns None > > + timeout_dns(self) > Resolving the domain name failed - Protocol can .reconnect() for > another try. returns None > > + timeout_idle(self) > connection was idle for some time - send a high layer keep alive or > close the connection - returns None > > + timeout_connecting(self) > connecting timed out - Protocol can .reconnect() for > another try, returns None I would rather have connection_failed(self, exc). (where exc can be an OSError or a socket.timeout) > >> * data_received(data) - if it was possible to return the number of > >> bytes consumed by the protocol, and have the Transport buffer the rest > >> for the next io_in call, one would avoid having to do this in every > >> Protocol on its own - learned from experience. > > > > Twisted has a whole slew of protocol implementation subclasses that > > implement various strategies like line-buffering (including a really > > complex version where you can turn the line buffering on and off) and > > "netstrings". I am trying to limit the PEP's size by not including > > these, but I fully expect that in practice a set of useful protocol > > implementations will be created that handle common cases. I'm not > > convinced that putting this in the transport/protocol interface will > > make user code less buggy: it seems easy for the user code to miscount > > the bytes or not return a count at all in a rarely taken code branch.
> > Please don't drop this. > > You never know how much data you'll receive, you never know how much > > data you need for a message, so the Protocol needs a buffer. > > Having this io_in buffer in the Transports allows every Protocol to > > benefit, they try to read a message from the data passed to > > data_received(), if the data received is not sufficient to create a > > full message, they need to buffer it and wait for more data. Another solution for every Protocol to benefit is to provide a bunch of base Protocol implementations, as Twisted does: LineReceiver, etc. Your proposed solution (returning the number of consumed bytes) implies a lot of slicing and concatenation of immutable bytes objects inside the Transport, which may be quite inefficient. Regards Antoine. From dreamingforward at gmail.com Sun Jan 6 19:01:33 2013 From: dreamingforward at gmail.com (Mark Adam) Date: Sun, 6 Jan 2013 12:01:33 -0600 Subject: [Python-ideas] Vigil Message-ID: There's an interesting python "variant" (more of an overlay actually) that is rather intriguing on github -- Vigil: a truly safe programming language. From the readme: "Infinitely more important than mere syntax and semantics are its addition of supreme moral vigilance. This is similar to contracts, but less legal and more medieval." http://github.com/munificent/vigil Mark From ubershmekel at gmail.com Sun Jan 6 21:08:40 2013 From: ubershmekel at gmail.com (Yuval Greenfield) Date: Sun, 6 Jan 2013 22:08:40 +0200 Subject: [Python-ideas] Vigil In-Reply-To: References: Message-ID: On Sun, Jan 6, 2013 at 8:01 PM, Mark Adam wrote: > There's an interesting python "variant" (more of an overlay actually) > that is rather intriguing on github -- Vigil: a truly safe programming > language. > > It's a joke language that deletes code when an assert fails. Python-ideas really isn't the place to post this. Try out http://www.reddit.com/r/python -------------- next part -------------- An HTML attachment was scrubbed... URL: From dreamingforward at gmail.com Sun Jan 6 21:44:09 2013 From: dreamingforward at gmail.com (Mark Adam) Date: Sun, 6 Jan 2013 14:44:09 -0600 Subject: [Python-ideas] Vigil In-Reply-To: References: Message-ID: On Sun, Jan 6, 2013 at 2:08 PM, Yuval Greenfield wrote: > On Sun, Jan 6, 2013 at 8:01 PM, Mark Adam wrote: >> >> There's an interesting python "variant" (more of an overlay actually) >> that is rather intriguing on github -- Vigil: a truly safe programming >> language. >> > > It's a joke language that deletes code when an assert fails. Python-ideas > really isn't the place to post this. Try out http://www.reddit.com/r/python Yeah, I sort of got that, but imagine in a multi-user p2p environment (the internet "global brain"), it could be a way to enforce policy across the network. I know list policy, but I rather like the keywords it used to expand on the language. By making the programmer encode expectations, the multiprocessing code doesn't have to work so hard with exception handling. mark From nepenthesdev at gmail.com Sun Jan 6 21:46:04 2013 From: nepenthesdev at gmail.com (Markus) Date: Sun, 6 Jan 2013 21:46:04 +0100 Subject: [Python-ideas] PEP 3156 - Asynchronous IO Support Rebooted In-Reply-To: <20130106172538.1a0d563b@pitrou.net> References: <20130106172538.1a0d563b@pitrou.net> Message-ID: Hi, On Sun, Jan 6, 2013 at 5:25 PM, Antoine Pitrou wrote: > On Sun, 6 Jan 2013 16:45:52 +0100 > Markus wrote: >> >> Right, for ssl bios pyopenssl is required - or ctypes. > > Or a patch to Python 3.4.
> See http://docs.python.org/devguide/ Or discuss merging pyopenssl. > By the way, how does "SSL as a protocol" deal with SNI? How does the > HTTP layer tell the SSL layer which servername to indicate? SSL_set_tlsext_host_name > Or, on the server-side, how would the SSL layer invoke the HTTP layer's > servername callback? callback - set via SSL_CTX_set_tlsext_servername_callback SSL_CTX_set_tlsext_servername_arg > I would rather have connection_failed(self, exc). > (where exc can be an OSError or a socket.timeout) I'd prefer a single callback per error, as it allows preserving defaults for certain cases when inheriting from Protocol. >> You never know how much data you'll receive, you never know how much >> data you need for a message, so the Protocol needs a buffer. >> Having this io_in buffer in the Transports allows every Protocol to >> benefit, they try to read a message from the data passed to >> data_received(), if the data received is not sufficient to create a >> full message, they need to buffer it and wait for more data. > > Another solution for every Protocol to benefit is to provide a bunch of > base Protocol implementations, as Twisted does: LineReceiver, etc. In case your Protocol.data_received gets called until there is nothing left or 0 is returned, the LineReceiver is simply looking for a \0 or \n in the data, processes this line and returns the length of the line, or 0 in case there is no line terminator. > Your proposed solution (returning the number of consumed bytes) implies > a lot of slicing and concatenation of immutable bytes objects inside > the Transport, which may be quite inefficient. Yes - but it has to be done anyway, so it's just a matter of having this problem in stdlib, where it is easy to improve for everybody, or everybody else has to come up with their own implementation as part of Protocol. I'd prefer to have this in Transport therefore - having everybody benefit from any improvement for free. Markus From solipsis at pitrou.net Sun Jan 6 22:05:39 2013 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sun, 6 Jan 2013 22:05:39 +0100 Subject: [Python-ideas] PEP 3156 - Asynchronous IO Support Rebooted References: <20130106172538.1a0d563b@pitrou.net> Message-ID: <20130106220539.5d98f416@pitrou.net> On Sun, 6 Jan 2013 21:46:04 +0100 Markus wrote: > > By the way, how does "SSL as a protocol" deal with SNI? How does the > > HTTP layer tell the SSL layer which servername to indicate? > SSL_set_tlsext_host_name > > > Or, on the server-side, how would the SSL layer invoke the HTTP layer's > > servername callback? > > callback - set via > SSL_CTX_set_tlsext_servername_callback > SSL_CTX_set_tlsext_servername_arg Right, these are the C OpenSSL APIs. My question was about the Python protocol / transport level. How can they be exposed? > > Your proposed solution (returning the number of consumed bytes) implies > > a lot of slicing and concatenation of immutable bytes objects inside > > the Transport, which may be quite inefficient. > > Yes - but it has to be done anyway, so it's just a matter of having > this problem in stdlib, where it is easy to improve for everybody, or > everybody else has to come up with their own implementation as part of > Protocol. Actually, the point is that it doesn't have to be done. An internal buffering mechanism in a protocol can avoid making many copies and concatenations (e.g. by using a list or a deque to buffer the incoming chunks).
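To illustrate, a rough sketch of such an internal buffer - a hypothetical line-oriented base protocol in the spirit of Twisted's LineReceiver (not its actual API), receive path only:

from collections import deque

class LineReceiver:

    def __init__(self):
        self._chunks = deque()

    def data_received(self, data):
        self._chunks.append(data)
        if b"\n" not in data:
            return                    # no complete line yet - nothing copied
        buf = b"".join(self._chunks)  # join once, only when a line is there
        *lines, rest = buf.split(b"\n")
        self._chunks.clear()
        if rest:
            self._chunks.append(rest)
        for line in lines:
            self.line_received(line)

    def line_received(self, line):
        raise NotImplementedError

The chunks are only joined once a complete line is known to be present.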
The transport cannot, since the Protocol API mandates that data_received() be called with a bytes object representing the available data. Regards Antoine. From jkbbwr at gmail.com Sun Jan 6 22:14:15 2013 From: jkbbwr at gmail.com (Jakob Bowyer) Date: Sun, 6 Jan 2013 21:14:15 +0000 Subject: [Python-ideas] Vigil In-Reply-To: References: Message-ID: But what about constraints on processor time, memory usage, recursion limit, and accept/return types? We are starting to get a bit verbose here. On Sun, Jan 6, 2013 at 8:44 PM, Mark Adam wrote: > On Sun, Jan 6, 2013 at 2:08 PM, Yuval Greenfield > wrote: > > On Sun, Jan 6, 2013 at 8:01 PM, Mark Adam > wrote: > >> > >> There's an interesting python "variant" (more of an overlay actually) > >> that is rather intriguing on github -- Vigil: a truly safe programming > >> language. > >> > > > > It's a joke language that deletes code when an assert fails. Python-ideas > > really isn't the place to post this. Try out > http://www.reddit.com/r/python > > Yeah, I sort of got that, but imagine in a multi-user p2p environment > (the internet "global brain"), it could be a way to enforce policy > across the network. I know list policy, but I rather like the > keywords it used to expand on the language. By making the programmer > encode expectations, the multiprocessing code doesn't have to work so > hard with exception handling. > > mark > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Mon Jan 7 06:47:25 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 7 Jan 2013 15:47:25 +1000 Subject: [Python-ideas] Yielding through context managers In-Reply-To: References: Message-ID: On Mon, Jan 7, 2013 at 2:24 AM, Guido van Rossum wrote: > On Sunday, January 6, 2013, Nick Coghlan wrote: >> >> On Sun, Jan 6, 2013 at 8:20 PM, Laurens Van Houtven <_ at lvh.cc> wrote: >> > Hi Nick, >> > >> > >> > When you say "high latency" (in __exit__), what does "high" mean? Is >> > that >> > order of magnitude what __exit__ usually means now, or network IO >> > included? >> > >> > (Use case: distributed locking and remotely stored locks: it doesn't >> > take a >> > long time on network scales, but it can take a long time on CPU scales.) >> >> The status quo can only be made to work for in-memory locks. If the >> release step involves network access, then it's closer to the >> "database transaction" use case, because the __exit__ method may need >> to block. > > But you don't need to wait for the release. You can do that asynchronously. Ah, true, I hadn't thought of that. So yes, any case where the __exit__ method can be "fire-and-forget" is also straightforward to implement with just PEP 3156. That takes us back to things like database transactions being the only ones where new syntax is actually needed. > Also, have you given the implementation of your 'yielding' proposal any > thought yet? Not in depth.
Off the top of my head, I'd suggest:
- make "yielding" a new kind of node in the grammar (so you can't write "yielding expr" in arbitrary locations, but only in those that are marked as allowing it)
- flag for loops and with statements as accepting these nodes as iterables and context managers respectively
- create a new Yielding AST node (with a single Expr node as the child)
- emit different bytecode in the affected compound statements based on whether the relevant subnode is an ordinary expression (thus invoking the special methods as "obj.__method__()") or a yielding one (thus invoking the special methods as "yield from obj.__method__()").
I'm not seeing any obvious holes in that strategy, but I haven't looked closely at the compiler code in a while, so there may be limitations I haven't accounted for. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From nepenthesdev at gmail.com Mon Jan 7 08:31:51 2013 From: nepenthesdev at gmail.com (Markus) Date: Mon, 7 Jan 2013 08:31:51 +0100 Subject: [Python-ideas] PEP 3156 - Asynchronous IO Support Rebooted In-Reply-To: <20130106220539.5d98f416@pitrou.net> References: <20130106172538.1a0d563b@pitrou.net> <20130106220539.5d98f416@pitrou.net> Message-ID: Hi, On Sun, Jan 6, 2013 at 10:05 PM, Antoine Pitrou wrote: > On Sun, 6 Jan 2013 21:46:04 +0100 > Markus wrote: >> > By the way, how does "SSL as a protocol" deal with SNI? How does the >> > HTTP layer tell the SSL layer which servername to indicate? Transport.ctrl(name, **kwargs) - if a Transport lacks the queried control, it has to ask its upper layer. In case of chains like TCP / SSL / HTTP, SSL can query the hostname from its Transport - or HTTP can query it the same way. >> > Or, on the server-side, how would the SSL layer invoke the HTTP layer's >> > servername callback? Transport.ctrl(name, **kwargs) again: HTTP can query for the name, and in case of TCP / SSL / HTTP, SSL may provide an answer. > Right, these are the C OpenSSL APIs. My question was about the > Python protocol / transport level. How can they be exposed? Attributes of the Transport (-side of a Protocol in case of stacking), which can be queried. For TCP e.g. it would be handy to store connection-related things in a defined data structure which keeps domain, resolved addresses, and the address used for the current connection together, like TCP.{local,remote}.{address,addresses,domain,port} For a client, SSL can query for "TCP.remote.domain" and, in case it is not an ip address, use it for SNI. For a server, HTTP can query SSL.server_name_indication. > An internal buffering mechanism in a protocol can avoid making many > copies and concatenations (e.g. by using a list or a deque to buffer the > incoming chunks). The transport cannot, since the Protocol API mandates > that data_received() be called with a bytes object representing the > available data. A bytes-like object would be much better then for the definition of data_received - same semantics, but a list of memoryviews with offset, or whatever is required internally.
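Back to .ctrl() - a minimal sketch of the chaining I have in mind (all names here are mine, nothing of this is in the PEP):

class Layer:
    # Every layer knows the transport side it sits on top of.
    def __init__(self, transport=None):
        self.transport = transport

    def ctrl(self, name, **kwargs):
        # Not something we know - pass the query along the chain.
        if self.transport is not None:
            return self.transport.ctrl(name, **kwargs)
        raise KeyError(name)

class TCP(Layer):
    def __init__(self, domain):
        super().__init__()
        self.domain = domain

    def ctrl(self, name, **kwargs):
        if name == "TCP.remote.domain":
            return self.domain
        return super().ctrl(name, **kwargs)

# TCP / SSL: the SSL layer can ask which server name to indicate,
# and an HTTP layer stacked on top could ask for
# "SSL.server_name_indication" the same way.
tcp = TCP("example.org")
ssl_layer = Layer(transport=tcp)
print(ssl_layer.ctrl("TCP.remote.domain"))   # -> "example.org", for SNI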
MfG Markus From guido at python.org Tue Jan 8 02:06:35 2013 From: guido at python.org (Guido van Rossum) Date: Mon, 7 Jan 2013 17:06:35 -0800 Subject: [Python-ideas] Yielding through context managers In-Reply-To: References: Message-ID: On Sun, Jan 6, 2013 at 9:47 PM, Nick Coghlan wrote: > On Mon, Jan 7, 2013 at 2:24 AM, Guido van Rossum wrote: >> On Sunday, January 6, 2013, Nick Coghlan wrote: >>> >>> On Sun, Jan 6, 2013 at 8:20 PM, Laurens Van Houtven <_ at lvh.cc> wrote: >>> > Hi Nick, >>> > >>> > >>> > When you say "high latency" (in __exit__), what does "high" mean? Is >>> > that >>> > order of magnitude what __exit__ usually means now, or network IO >>> > included? >>> > >>> > (Use case: distributed locking and remotely stored locks: it doesn't >>> > take a >>> > long time on network scales, but it can take a long time on CPU scales.) >>> >>> The status quo can only be made to work for in-memory locks. If the >>> release step involves network access, then it's closer to the >>> "database transaction" use case, because the __exit__ method may need >>> to block. >> >> But you don't need to wait for the release. You can do that asynchronously. > > Ah, true, I hadn't thought of that. So yes, any case where the > __exit__ method can be "fire-and-forget" is also straightforward to > implement with just PEP 3156. That takes us back to things like > database transactions being the only ones where And 'yielding' wouldn't do anything about this, would it? >> Also, have you given the implementation of your 'yielding' proposal any >> thought yet? > > Not in depth. Off the top of my head, I'd suggest: > - make "yielding" a new kind of node in the grammar (so you can't > write "yielding expr" in arbitrary locations, but only in those that > are marked as allowing it) > - flag for loops and with statements as accepting these nodes as > iterables and context managers respectively > - create a new Yielding AST node (with a single Expr node as the child) > - emit different bytecode in the affected compound statements based > on whether the relevant subnode is an ordinary expression (thus > invoking the special methods as "obj.__method__()") or a yielding one > (thus invoking the special methods as "yield from obj.__method__()"). > > I'm not seeing any obvious holes in that strategy, but I haven't > looked closely at the compiler code in a while, so there may be > limitations I haven't accounted for. So would 'yielding' insert the equivalent of 'yield from' or the equivalent of 'yield' in the code? -- --Guido van Rossum (python.org/~guido) From brian at python.org Tue Jan 8 04:38:47 2013 From: brian at python.org (Brian Curtin) Date: Mon, 7 Jan 2013 21:38:47 -0600 Subject: [Python-ideas] FYI - wiki.python.org compromised Message-ID: On December 28th, an unknown attacker used a previously unknown remote code exploit on http://wiki.python.org/. The attacker was able to get shell access as the "moin" user, but no other services were affected. Some time later, the attacker deleted all files owned by the "moin" user, including all instance data for both the Python and Jython wikis. The attack also had full access to all MoinMoin user data on all wikis. In light of this, the Python Software Foundation encourages all wiki users to change their password on other sites if the same one is in use elsewhere. We apologize for the inconvenience and will post further news as we bring the new and improved wiki.python.org online. If you have any questions about this incident please contact jnoller at python.org. 
Thank you for your patience. From ncoghlan at gmail.com Tue Jan 8 11:13:50 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 8 Jan 2013 20:13:50 +1000 Subject: [Python-ideas] Yielding through context managers In-Reply-To: References: Message-ID: On Tue, Jan 8, 2013 at 11:06 AM, Guido van Rossum wrote: > On Sun, Jan 6, 2013 at 9:47 PM, Nick Coghlan wrote: >> Ah, true, I hadn't thought of that. So yes, any case where the >> __exit__ method can be "fire-and-forget" is also straightforward to >> implement with just PEP 3156. That takes us back to things like >> database transactions being the only ones where > > And 'yielding' wouldn't do anything about this, would it? Any new syntax should properly handle the database transaction context manager problem, otherwise what's the point? The workarounds for asynchronous __next__ and __enter__ methods aren't too bad - it's allowing asynchronous __exit__ methods that can only be solved with new syntax. >> I'm not seeing any obvious holes in that strategy, but I haven't >> looked closely at the compiler code in a while, so there may be >> limitations I haven't accounted for. > > So would 'yielding' insert the equivalent of 'yield from' or the > equivalent of 'yield' in the code? Given PEP 3156, the most logical would be for it to use "yield from", since that is becoming the asynchronous equivalent of a normal function call. Something like: with yielding db.session() as conn: # Do stuff here Could be made roughly equivalent to: _async_cm = db.session() conn = yield from _async_cm.__enter__() try: # Use session here except Exception as exc: # Rollback yield from _async_cm.__exit__(type(exc), exc, exc.__traceback__) else: # Commit yield from _async_cm.__exit__(None, None, None) Creating a contextlib.contextmanager style decorator for writing such asynchronous context managers would be difficult, though, as the two different meanings of "yield" would get in each other's way - you would need something like "yield EnterResult(expr)" to indicate to __enter__ in the wrapper object when to stop. It would probably be easier to just write separate __enter__ and __exit__ methods as coroutines. However, note that I just wanted to be clear that I consider the idea of a syntax for "asynchronous context managers" plausible, and sketched out a possible design to explain *why* I thought it should be possible. My focus will stay with PEP 432 until that's done. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From guido at python.org Tue Jan 8 19:32:00 2013 From: guido at python.org (Guido van Rossum) Date: Tue, 8 Jan 2013 10:32:00 -0800 Subject: [Python-ideas] Yielding through context managers In-Reply-To: References: Message-ID: On Tue, Jan 8, 2013 at 2:13 AM, Nick Coghlan wrote: > On Tue, Jan 8, 2013 at 11:06 AM, Guido van Rossum wrote: >> On Sun, Jan 6, 2013 at 9:47 PM, Nick Coghlan wrote: >>> Ah, true, I hadn't thought of that. So yes, any case where the >>> __exit__ method can be "fire-and-forget" is also straightforward to >>> implement with just PEP 3156. That takes us back to things like >>> database transactions being the only ones where >> >> And 'yielding' wouldn't do anything about this, would it? > > Any new syntax should properly handle the database transaction context > manager problem, otherwise what's the point? The workarounds for > asynchronous __next__ and __enter__ methods aren't too bad - it's > allowing asynchronous __exit__ methods that can only be solved with > new syntax.
Is your idea that if you write "with yielding x as y: blah" this effectively replaces the calls to __enter__ and __exit__ with "yield from x.__enter__()" and "yield from x.__exit__()"? (And assigning the result of yield from x.__enter__() to y.) >>> I'm not seeing any obvious holes in that strategy, but I haven't >>> looked closely at the compiler code in a while, so there may be >>> limitations I haven't accounted for. >> >> So would 'yielding' insert the equivalent of 'yield from' or the >> equivalent of 'yield' in the code? > > Given PEP 3156, the most logical would be for it to use "yield from", > since that is becoming the asynchronous equivalent of a normal > function call. > > Something like: > > with yielding db.session() as conn: > # Do stuff here > > Could be made roughly equivalent to: > > _async_cm = db.session() > conn = yield from _async_cm.__enter__() > try: > # Use session here > except Exception as exc: > # Rollback > yield from _async_cm.__exit__(type(exc), exc, exc.__traceback__) > else: > # Commit > yield from _async_cm.__exit__(None, None, None) > > Creating a contextlib.contextmanager style decorator for writing such > asynchronous context managers would be difficult, though, as the two > different meanings of "yield" would get in each other's way - you > would need something like "yield EnterResult(expr)" to indicate to > __enter__ in the wrapper object when to stop. It would probably be > easier to just write separate __enter__ and __exit__ methods as > coroutines. > > However, note that I just wanted to be clear that I consider the idea > of a syntax for "asynchronous context managers" plausible, and > sketched out a possible design to explain *why* I thought it should be > possible. My focus will stay with PEP 432 until that's done. Sure, I didn't intend any time pressure. Others may take this up as well -- or if nobody cares, we can put it off until the need has been demonstrated more, possibly after Python 3.4 is released. -- --Guido van Rossum (python.org/~guido) From guido at python.org Tue Jan 8 21:11:25 2013 From: guido at python.org (Guido van Rossum) Date: Tue, 8 Jan 2013 12:11:25 -0800 Subject: [Python-ideas] PEP 3156 - Asynchronous IO Support Rebooted In-Reply-To: References: Message-ID: (Trimming stuff that doesn't need a reply -- this doesn't mean I agree, just that I don't see a need for more discussion.) On Sun, Jan 6, 2013 at 7:45 AM, Markus wrote: > Exactly - signals are a mess, threading and signals make things worse > - I'm no expert here, but I have experienced problems with > signal handling and threads, basically the same problems you describe. > Creating the threads after installing signal handlers (in the main > thread) works, and signals get delivered to the main thread, > installing the signal handlers (in the main thread) after creating the > threads - and the signals ended up in *some thread*. > Additionally it depended on if you'd install your signal handler with > signal() or sigaction() and flags when creating threads. So I suppose you're okay with the signal handling API I proposed? I'll add it to the PEP then, with a note that it may raise an exception if not supported. > I'd expect the EventLoop never to create threads on its own behalf, > it's just wrong. Here's the way it works. You can call run_in_executor(executor, function, *args) where executor is an executor (a fancy thread pool) that you create. You have full control.
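For illustration, a minimal sketch of that pattern (run_in_executor() and get_event_loop() follow the current PEP draft; the blocking function and the executor setup are just examples):

import socket
from concurrent.futures import ThreadPoolExecutor
import tulip

def resolve(host):
    # a blocking call we want to keep off the event loop thread
    return socket.gethostbyname(host)

def task():
    # a coroutine, driven by the event loop
    loop = tulip.get_event_loop()
    executor = ThreadPoolExecutor(max_workers=4)  # created and controlled by you
    addr = yield from loop.run_in_executor(executor, resolve, 'python.org')
    return addr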
However you can pass executor=None and then the event loop will create its own, default executor -- or it will use a default executor that you have created and given to it previously. It needs the default executor so that it can implement getaddrinfo() by calling the stdlib socket.getaddrinfo() in a thread -- and getaddrinfo() is essential for creating transports. The user can take full control over the executor though -- you could set the default to something that always raises an exception. > If you can't provide some functionality without threads, don't provide > the functionality. I don't see this as an absolute requirement. The threads are an implementation detail (other event loop implementations could implement getaddrinfo() differently, talking directly to DNS using tasklets or callbacks), and you can control its use of threads. > Besides, getaddrinfo() is a bad choice, as it relies on distribution > specific flags. > For example ip6 link-local scope exists on every current platform, but > - when resolving a link-local scope address -not a domain- with > getaddrinfo, getaddrinfo will fail if no globally routed IPv6 address is > available on debian/ubuntu. Nevertheless it is the only thing available in the stdlib. If you want to improve it, that's fine, but just use the issue tracker. >>> As Transports are part of the PEP - some more: >>> >>> EventLoop >>> * create_transport(protocol_factory, host, port, **kwargs) >>> kwargs requires "local" - local address as tuple like >>> ('fe80::14ad:1680:54e1:6a91%eth0',0) - so you can bind when using ipv6 >>> link-local scope. >>> or ('192.168.2.1',5060) - bind local port for udp >> >> Not sure I understand. What socket.connect() (or other API) call >> parameters does this correspond to? What can't be expressed through the >> host and port parameters? > > In case you have multiple interfaces, and multiple gateways, you need > to assign the connection to an address - so the kernel knows which > interface to use for the connection - else it'd default to "the first" > interface. > In IPv6 link-local scope you can have multiple addresses in the same > subnet fe80:: - IIRC if you want to connect somewhere, you have to > either set the scope_id of the remote, or bind the "source" address > before - I don't know how to set the scope_id in Python, it's in > sockaddr_in6. > > In terms of socket calls, it is a bind before a connect. > > s = socket.socket(socket.AF_INET6, socket.SOCK_DGRAM, 0) > s.bind(('fe80::1', 0)) > s.connect(('fe80::2', 4712)) > > Same for IPv4, in case you are multi-homed and rely on source-based routing. Ok, this seems a useful option to add to create_transport(). Your example shows SOCK_DGRAM -- is it also relevant for SOCK_STREAM? >>> Handler: >>> Requiring 2 handlers for every active connection r/w is highly inefficient. >> >> How so? What is the concern? > > Of course you can fold the fdsets, but in case you need a separate > handler for write, you re-create it for every write - see below. >>> Additionally, I can .stop() the handler without having to know the fd, >>> .stop() the handler, change the events the handler is looking for, >>> restart the handler with .start(). >>> In your proposal, I'd create a new handler every time I want to send >>> something, poll for readability - discard the handler when I'm done, >>> create a new one for the next send.
>> >> The questions are, does it make any difference in efficiency (when >> using Python -- the performance of the C API is hardly relevant here), >> and how often does this pattern occur. > > Every time you send - you poll for write-ability, you get the > callback, you write, you got nothing left, you stop polling for > write-ability. That's not quite how it's implemented. The code first tries to send without polling. Since the socket is non-blocking, if this succeeds, great -- only if it returns a partial send or EAGAIN we register a callback. If the protocol keeps the buffer filled the callback doesn't have to be recreated each time. If the protocol doesn't keep the buffer full, we must unregister the callback to prevent select/poll/etc. from calling it over and over again, there's nothing you can do about that. >>> * reconnect() - I'd love to be able to reconnect a transport >> >> But what does that mean in general? It depends on the protocol (e.g. >> FTP, HTTP, IRC, SMTP) how much state must be restored/renegotiated >> upon a reconnect, and how much data may have to be re-sent. This seems >> a higher-level feature that transports and protocols will have to >> implement themselves. > > I don't need the EventLoop to sync my state upon reconnect - just have > the Transport providing the ability. > Protocols are free to use this, but do not have to. Aha, I get it. You want to be able to call transport.reconnect() from connection_lost() and it should respond by eventually calling protocol.connection_made(transport) again. Of course, this only applies to clients -- for a server to reconnect to a client makes no sense (it would be up to the client). That seems simple enough to implement, but Glyph recommended strongly against this, because reusing the protocol object often means that some private state of the protocol may not be properly reinitialized. It would also be difficult to decide where errors from the reconnect attempt should go -- reconnect() itself must return immediately (since connection_lost() cannot wait for I/O, it can only schedule async I/O events). But at a higher level in your app it would be easy to set this up: you just call eventloop.create_transport(lambda: protocol, ...) where protocol is a protocol instance you've created earlier. >> Twisted suggested something here which I haven't implemented yet but >> which seems reasonable -- using a series of short timeouts try >> connecting to the various addresses and keep the first one that >> connects successfully. If multiple addresses connect after the first >> timeout, too bad, just close the redundant sockets, little harm is >> done (though the timeouts should be tuned that this is relatively >> rare, because a server may waste significant resources on such >> redundant connects). > > Fast, yes - reasonable? - no. > How would you feel if web browsers behaved like this? I have no idea -- who says they aren't doing this? Browsers do tons of stuff that I am not aware of. > domain name has to be resolved, addresses ordered according to rfc X > which says prefer ipv6 etc., try connecting linear. Sure. It was just an idea. I'll see what Twisted actually does. >>> For almost all the timeouts I mentioned - the protocol needs to take >>> care - so the protocol has to exist before the connection is >>> established in case of outbound connections. >> >> I'm not sure I follow. Can you sketch out some code to help me here? >> ISTM that e.g. 
the DNS, connect and handshake timeouts can be >> implemented by the machinery that tries to set up the connection >> behind the scenes, and the user's protocol won't know anything of >> these shenanigans. The code that calls create_transport() (actually >> it'll probably be renamed create_client()) will just get a Future that >> either indicates success (and then the protocol and transport are >> successfully hooked up) or an error (and then no protocol was created >> -- whether or not a transport was created is an implementation >> detail). > > From my understanding the Future does not provide any information > about which connection to which host using which protocol and credentials > failed? That's not up to the Future -- it just passes an exception object along. We could make this info available as attributes on the exception object, if there is a need. > I'd create the Protocol when trying to create a connection, so the > Protocol is informed when the Transport fails and can take action - > retry, whatever. I had this in an earlier version, but Glyph convinced me that this is the wrong design -- and it doesn't work for servers anyway, you must have a protocol factory there. >>> * data_received(data) - if it was possible to return the number of >>> bytes consumed by the protocol, and have the Transport buffer the rest >>> for the next io-in call, one would avoid having to do this in every >>> Protocol on its own - learned from experience. >> >> Twisted has a whole slew of protocol implementation subclasses that >> implement various strategies like line-buffering (including a really >> complex version where you can turn the line buffering on and off) and >> "netstrings". I am trying to limit the PEP's size by not including >> these, but I fully expect that in practice a set of useful protocol >> implementations will be created that handles common cases. I'm not >> convinced that putting this in the transport/protocol interface will >> make user code less buggy: it seems easy for the user code to miscount >> the bytes or not return a count at all in a rarely taken code branch. > > Please don't drop this. > > You never know how much data you'll receive, you never know how much > data you need for a message, so the Protocol needs a buffer. That all depends on what the protocol is trying to do. (The ECHO protocol certainly doesn't need a buffer. :-) > Having this io-in buffer in the Transports allows every Protocol to > benefit, they try to read a message from the data passed to > data_received(), if the data received is not sufficient to create a > full message, they need to buffer it and wait for more data. Having it in a Protocol base class also allows every protocol that wants it to benefit, without complicating the transport. I can also see problems where the transport needs to keep calling data_received() until either all data is consumed or it returns 0 (no data consumed). It just doesn't seem right to make the transport responsible for this logic, since it doesn't know enough about the needs of the protocol. > So having the Protocol.data_received return the number of bytes the > Protocol could process, the Transport can do the job, saving it for > every Protocol. > Still - a protocol can have its own buffering strategy, i.e. in case > of an incremental XML parser which does its own buffering, and always > return len(data), so the Transport does not buffer anything.
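> To sketch the idea (illustrative only - data_received() is from the
> PEP, every other name here is made up):
>
> class BufferingTransport:
>     def __init__(self, protocol):
>         self.protocol = protocol
>         self._inbuf = b''
>
>     def _feed(self, data):
>         # called with freshly read bytes
>         self._inbuf += data
>         consumed = self.protocol.data_received(self._inbuf)
>         # keep whatever the protocol could not process yet
>         self._inbuf = self._inbuf[consumed:]
>
> A line-based protocol would return the offset just past the last
> complete line it parsed, and the transport would keep the remainder
> for the next call.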
Right, data_received() is closely related to the concept of a "feed parser" which is used in a few places in the stdlib (http://docs.python.org/3/search.html?q=feed&check_keywords=yes&area=default) and even has a 3rd party implementation (http://pypi.python.org/pypi/feedparser/), and there the parser (i.e. the protocol equivalent) is always responsible for buffering data it cannot immediately process. >>> Next, what happens if a DNS name cannot be resolved, an SSL handshake (in >>> case SSL is the transport) or connecting fails - in my opinion it's an >>> error the protocol is supposed to take care of >>> + error_dns >>> + error_ssl >>> + error_connecting >> >> The future returned by create_transport() (aka create_client()) will >> raise the exception. > > When do I get this exception - the EventLoop.run() raises? No, the event loop doesn't normally raise, just whichever task is waiting for that future using 'yield from' will get the exception. Or you can use eventloop.run_until_complete() and then that call will raise. -- --Guido van Rossum (python.org/~guido) From ncoghlan at gmail.com Wed Jan 9 02:04:30 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 9 Jan 2013 11:04:30 +1000 Subject: [Python-ideas] Yielding through context managers In-Reply-To: References: Message-ID: On Wed, Jan 9, 2013 at 4:32 AM, Guido van Rossum wrote: > On Tue, Jan 8, 2013 at 2:13 AM, Nick Coghlan wrote: >> On Tue, Jan 8, 2013 at 11:06 AM, Guido van Rossum wrote: >>> On Sun, Jan 6, 2013 at 9:47 PM, Nick Coghlan wrote: >>>> Ah, true, I hadn't thought of that. So yes, any case where the >>>> __exit__ method can be "fire-and-forget" is also straightforward to >>>> implement with just PEP 3156. That takes us back to things like >>>> database transactions being the only ones where >>> >>> And 'yielding' wouldn't do anything about this, would it? >> >> Any new syntax should properly handle the database transaction context >> manager problem, otherwise what's the point? The workarounds for >> asynchronous __next__ and __enter__ methods aren't too bad - it's >> allowing asynchronous __exit__ methods that can only be solved with >> new syntax. > > Is your idea that if you write "with yielding x as y: blah" this > effectively replaces the calls to __enter__ and __exit__ with "yield > from x.__enter__()" and "yield from x.__exit__()"? (And assigning the > result of yield from x.__enter__() to y.) Yep - that's why it would need a new keyword, as the subexpression itself would be evaluated normally, while the later special method invocations would be wrapped in yield from expressions. >> However, note that I just wanted to be clear that I consider the idea >> of a syntax for "asynchronous context managers" plausible, and >> sketched out a possible design to explain *why* I thought it should be >> possible. My focus will stay with PEP 432 until that's done. > > Sure, I didn't intend any time pressure. Others may take this up as > well -- or if nobody cares, we can put it off until the need has been > demonstrated more, possibly after Python 3.4 is released. Yep - the fact you can fall back to an explicit try-finally if needed, or else use something like gevent to suspend implicitly if you want to use such idioms a lot makes it easy to justify postponing doing anything about it. I'll at least mention the idea in my python-notes essay, though. Cheers, Nick.
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From yorik.sar at gmail.com Wed Jan 9 02:14:02 2013 From: yorik.sar at gmail.com (Yuriy Taraday) Date: Wed, 9 Jan 2013 05:14:02 +0400 Subject: [Python-ideas] [Python-Dev] PEP 3156 - Asynchronous IO Support Rebooted Message-ID: Hello. I've read the PEP and some things raise questions in my consciousness. Here they are. 1. Series of sock_ methods can be organized into a wrapper around sock object. This wrappers can then be saved and used later in async-aware code. This way code like: sock = socket(...) # later, e.g. in connect() yield from tulip.get_event_loop().sock_connect(sock, ...) # later, e.g. in read() data = yield from tulip.get_event_loop().sock_recv(sock, ...) will look like: sock = socket(...) async_sock = tulip.get_event_loop().wrap_socket(sock) # later, e.g. in connect() yield from async_sock.connect(...) # later, e.g. in read() data = yield from async_sock.recv(...) Interface looks cleaner while plain calls (if they are ever needed) will be only 5 chars longer. 2. Not as great, but still possible to wrap fd in similar way to make interface simpler. Instead of: add_reader(fd, callback, *args) remove_reader(fd) We can do: wrap_fd(fd).reader = functools.partial(callback, *args) wrap_fd(fd).reader = None # or del wrap_fd(fd).reader 3. Why not use properties (or fields) instead of methods for cancelled, running and done in Future class? I think, it'll be easier to use since I expect such attributes to be accessed as properties. I see it as some javaism since in Java Future have getters for this fields but they are prefixed with 'is'. 4. Why separate exception() from result() for Future class? It does the same as result() but with different interface (return instead of raise). Doesn't this violate the rule "There should be one obvious way to do it"? 5. I think, protocol and transport methods' names are not easy or understanding enough: - write_eof() does not write anything but closes smth, should be close_writing or smth alike; - the same way eof_received() should become smth like receive_closed; - pause() and resume() work with reading only, so they should be suffixed (prefixed) with read(ing), like pause_reading(), resume_reading(). Kind regards, Yuriy. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Wed Jan 9 03:31:04 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 9 Jan 2013 12:31:04 +1000 Subject: [Python-ideas] [Python-Dev] PEP 3156 - Asynchronous IO Support Rebooted In-Reply-To: References: Message-ID: On Wed, Jan 9, 2013 at 11:14 AM, Yuriy Taraday wrote: > 4. Why separate exception() from result() for Future class? It does the same > as result() but with different interface (return instead of raise). Doesn't > this violate the rule "There should be one obvious way to do it"? The exception() method exists for the same reason that we support both "key in mapping" and raising KeyError from "mapping[key]": sometimes you want "Look Before You Leap", other times you want to let the exception fly. If you want the latter, just call .result() directly, if you want the former, check .exception() first. Regardless, the Future API isn't really being defined in PEP 3156, as it is mostly inherited from the previously implemented PEP 3148 (http://www.python.org/dev/peps/pep-3148/#future-objects) Cheers, Nick.
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From guido at python.org Wed Jan 9 03:49:50 2013 From: guido at python.org (Guido van Rossum) Date: Tue, 8 Jan 2013 18:49:50 -0800 Subject: [Python-ideas] [Python-Dev] PEP 3156 - Asynchronous IO Support Rebooted In-Reply-To: References: Message-ID: On Tue, Jan 8, 2013 at 6:07 PM, Benjamin Peterson wrote: > 2013/1/8 Yuriy Taraday : >> 4. Why separate exception() from result() for Future class? It does the same >> as result() but with different interface (return instead of raise). Doesn't >> this violate the rule "There should be one obvious way to do it"? > > I expect that's a copy-and-paste error. exception() will return the > exception if one occurred. I don't see the typo. It is as Nick explained. -- --Guido van Rossum (python.org/~guido) From guido at python.org Wed Jan 9 04:06:19 2013 From: guido at python.org (Guido van Rossum) Date: Tue, 8 Jan 2013 19:06:19 -0800 Subject: [Python-ideas] [Python-Dev] PEP 3156 - Asynchronous IO Support Rebooted In-Reply-To: References: Message-ID: On Tue, Jan 8, 2013 at 6:53 PM, Benjamin Peterson wrote: > 2013/1/8 Guido van Rossum : >> On Tue, Jan 8, 2013 at 6:07 PM, Benjamin Peterson wrote: >>> 2013/1/8 Yuriy Taraday : >>>> 4. Why separate exception() from result() for Future class? It does the same >>>> as result() but with different interface (return instead of raise). Doesn't >>>> this violate the rule "There should be one obvious way to do it"? >>> >>> I expect that's a copy-and-paste error. exception() will return the >>> exception if one occurred. >> >> I don't see the typo. It is as Nick explained. > > PEP 3156 says "exception(). Difference with PEP 3148: This has no > timeout argument and does not wait; if the future is not yet done, it > raises an exception." I assume it's not supposed to raise. No, actually, in that case it *does* raise an exception, because it means that the caller didn't understand the interface. It *returns* an exception object when the Future is done but the "result" is exceptional. But it *raises* when the Future is not done yet. -- --Guido van Rossum (python.org/~guido) From yorik.sar at gmail.com Wed Jan 9 04:56:17 2013 From: yorik.sar at gmail.com (Yuriy Taraday) Date: Wed, 9 Jan 2013 07:56:17 +0400 Subject: [Python-ideas] [Python-Dev] PEP 3156 - Asynchronous IO Support Rebooted In-Reply-To: References: Message-ID: On Wed, Jan 9, 2013 at 6:31 AM, Nick Coghlan wrote: > On Wed, Jan 9, 2013 at 11:14 AM, Yuriy Taraday > wrote: > > 4. Why separate exception() from result() for Future class? It does the > same > > as result() but with different interface (return instead of raise). > Doesn't > > this violate the rule "There should be one obvious way to do it"? > > The exception() method exists for the same reason that we support both > "key in mapping" and raising KeyError from "mapping[key]": sometimes > you want "Look Before You Leap", other times you want to let the > exception fly. If you want the latter, just call .result() directly, > if you want the former, check .exception() first. > Ok, I get it now. Thank you for clarifying. > Regardless, the Future API isn't really being defined in PEP 3156, as > it is mostly inherited from the previously implemented PEP 3148 > (http://www.python.org/dev/peps/pep-3148/#future-objects) > Then #3 and #4 are about PEP 3148. Why was it done this way? Kind regards, Yuriy. -------------- next part -------------- An HTML attachment was scrubbed...
URL: From guido at python.org Wed Jan 9 05:31:58 2013 From: guido at python.org (Guido van Rossum) Date: Tue, 8 Jan 2013 20:31:58 -0800 Subject: [Python-ideas] [Python-Dev] PEP 3156 - Asynchronous IO Support Rebooted In-Reply-To: References: Message-ID: On Tue, Jan 8, 2013 at 5:14 PM, Yuriy Taraday wrote: > I've read the PEP and some things raise questions in my consciousness. Here > they are. Thanks! > 1. Series of sock_ methods can be organized into a wrapper around sock > object. This wrappers can then be saved and used later in async-aware code. > This way code like: > > sock = socket(...) > # later, e.g. in connect() > yield from tulip.get_event_loop().sock_connect(sock, ...) > # later, e.g. in read() > data = yield from tulip.get_event_loop().sock_recv(sock, ...) > > will look like: > > sock = socket(...) > async_sock = tulip.get_event_loop().wrap_socket(sock) > # later, e.g. in connect() > yield from async_sock.connect(...) > # later, e.g. in read() > data = yield from async_sock.recv(...) > > Interface looks cleaner while plain calls (if they ever needed) will be only > 5 chars longer. This is a semi-internal API that is mostly useful to Transport implementers, and there won't be many of those. So I prefer the API that has the fewest classes. > 2. Not as great, but still possible to wrap fd in similar way to make > interface simpler. Instead of: > > add_reader(fd, callback, *args) > remove_reader(fd) > > We can do: > > wrap_fd(fd).reader = functools.partial(callback, *args) > wrap_fd(fd).reader = None # or > del wrap_fd(fd).reader Ditto. > 3. Why not use properties (or fields) instead of methods for cancelled, > running and done in Future class? I think, it'll be easier to use since I > expect such attributes to be accessed as properties. I see it as some > javaism since in Java Future have getters for this fields but they are > prefixed with 'is'. Too late, this is how PEP 3148 defined it. It was indeed inspired by Java Futures. However I would defend using methods here, since these are not all that cheap -- they have to acquire and release a lock. > 4. Why separate exception() from result() for Future class? It does the same > as result() but with different interface (return instead of raise). Doesn't > this violate the rule "There should be one obvious way to do it"? Because it is quite awkward to check for an exception if you have to catch it (4 lines instead of 1). > 5. I think, protocol and transport methods' names are not easy or > understanding enough: > - write_eof() does not write anything but closes smth, should be > close_writing or smth alike; > - the same way eof_received() should become smth like receive_closed; I am indeed struggling a bit with these names, but "writing an EOF" is actually how I think of this (maybe I am dating myself to the time of mag tapes though :-). > - pause() and resume() work with reading only, so they should be suffixed > (prefixed) with read(ing), like pause_reading(), resume_reading(). Agreed. -- --Guido van Rossum (python.org/~guido) From guido at python.org Wed Jan 9 05:50:32 2013 From: guido at python.org (Guido van Rossum) Date: Tue, 8 Jan 2013 20:50:32 -0800 Subject: [Python-ideas] [Python-Dev] PEP 3156 - Asynchronous IO Support Rebooted In-Reply-To: References: Message-ID: On Tue, Jan 8, 2013 at 8:31 PM, Guido van Rossum wrote: > On Tue, Jan 8, 2013 at 5:14 PM, Yuriy Taraday wrote: >> - pause() and resume() work with reading only, so they should be suffixed >> (prefixed) with read(ing), like pause_reading(), resume_reading(). 
> > Agreed. I think I want to take that back. I think it is more common for a protocol to want to pause the transport (i.e. hold back data_received() calls) than it is for a transport to want to pause the protocol (i.e. hold back write() calls). So the more common method can have a shorter name. Also, pause_reading() is almost confusing, since the protocol's method is named data_received(), not read_data(). Also, there's no reason for the protocol to want to pause the *write* (send) actions of the transport -- if wanted to write less it should not have called write(). The reason to distinguish between the two modes of pausing is because it is sometimes useful to "stack" multiple protocols, and then a protocol in the middle of the stack acts as a transport to the protocol next to it (and vice versa). See the discussion on this list previously, e.g. http://mail.python.org/pipermail/python-ideas/2013-January/018522.html (search for the keyword "stack" in this long message to find the relevant section). -- --Guido van Rossum (python.org/~guido) From yorik.sar at gmail.com Wed Jan 9 06:02:23 2013 From: yorik.sar at gmail.com (Yuriy Taraday) Date: Wed, 9 Jan 2013 09:02:23 +0400 Subject: [Python-ideas] [Python-Dev] PEP 3156 - Asynchronous IO Support Rebooted In-Reply-To: References: Message-ID: On Wed, Jan 9, 2013 at 8:31 AM, Guido van Rossum wrote: > On Tue, Jan 8, 2013 at 5:14 PM, Yuriy Taraday wrote: > > I've read the PEP and some things raise questions in my consciousness. > Here > > they are. > > Thanks! > > > 1. Series of sock_ methods can be organized into a wrapper around sock > > object. This wrappers can then be saved and used later in async-aware > code. > > This is a semi-internal API that is mostly useful to Transport > implementers, and there won't be many of those. So I prefer the API > that has the fewest classes. > > > 2. Not as great, but still possible to wrap fd in similar way to make > > interface simpler. > > Ditto. > Ok, I see. Should transports be bound to event loop on creation? I wonder, what would happen if someone changes current event loop between these calls. > > > 3. Why not use properties (or fields) instead of methods for cancelled, > > running and done in Future class? I think, it'll be easier to use since I > > expect such attributes to be accessed as properties. I see it as some > > javaism since in Java Future have getters for this fields but they are > > prefixed with 'is'. > > Too late, this is how PEP 3148 defined it. It was indeed inspired by > Java Futures. However I would defend using methods here, since these > are not all that cheap -- they have to acquire and release a lock. > > I understand why it should be a method, but still if it's a getter, it should have either get_ or is_ prefix. Are there any way to change this with 'Final' PEP? > > 4. Why separate exception() from result() for Future class? It does the > same > > as result() but with different interface (return instead of raise). > Doesn't > > this violate the rule "There should be one obvious way to do it"? > > Because it is quite awkward to check for an exception if you have to > catch it (4 lines instead of 1). > > > 5. 
I think, protocol and transport methods' names are not easy or > > understanding enough: > > - write_eof() does not write anything but closes smth, should be > > close_writing or smth alike; > > - the same way eof_received() should become smth like receive_closed; > > I am indeed struggling a bit with these names, but "writing an EOF" is > actually how I think of this (maybe I am dating myself to the time of > mag tapes though :-). > > I never saw a computer working with a tape, but it's clear to me what does they do. I've just imagined the amount of words I'll have to say to students about EOFs instead of simple "it closes our end of one half of a socket". > - pause() and resume() work with reading only, so they should be suffixed > > (prefixed) with read(ing), like pause_reading(), resume_reading(). > > Agreed. > > -- > --Guido van Rossum (python.org/~guido) > -- Kind regards, Yuriy. -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Wed Jan 9 06:14:05 2013 From: guido at python.org (Guido van Rossum) Date: Tue, 8 Jan 2013 21:14:05 -0800 Subject: [Python-ideas] [Python-Dev] PEP 3156 - Asynchronous IO Support Rebooted In-Reply-To: References: Message-ID: On Tue, Jan 8, 2013 at 9:02 PM, Yuriy Taraday wrote: > On Wed, Jan 9, 2013 at 8:31 AM, Guido van Rossum wrote: >> On Tue, Jan 8, 2013 at 5:14 PM, Yuriy Taraday wrote: >> > 1. Series of sock_ methods can be organized into a wrapper around sock >> > object. This wrappers can then be saved and used later in async-aware >> > code. >> >> This is a semi-internal API that is mostly useful to Transport >> implementers, and there won't be many of those. So I prefer the API >> that has the fewest classes. >> >> > 2. Not as great, but still possible to wrap fd in similar way to make >> > interface simpler. >> >> Ditto. > > > Ok, I see. > Should transports be bound to event loop on creation? I wonder, what would > happen if someone changes current event loop between these calls. Yes, this is what the transport implementation does. >> > 3. Why not use properties (or fields) instead of methods for cancelled, >> > running and done in Future class? I think, it'll be easier to use since >> > I >> > expect such attributes to be accessed as properties. I see it as some >> > javaism since in Java Future have getters for this fields but they are >> > prefixed with 'is'. >> >> Too late, this is how PEP 3148 defined it. It was indeed inspired by >> Java Futures. However I would defend using methods here, since these >> are not all that cheap -- they have to acquire and release a lock. >> > > I understand why it should be a method, but still if it's a getter, it > should have either get_ or is_ prefix. Why? That's not a universal coding standard. The names seem clear enough to me. > Are there any way to change this with 'Final' PEP? No, the concurrent.futures package has been released (I forget if it was Python 3.2 or 3.3) and we're bound to backwards compatibility. Also I really don't think it's a big deal at all. >> > 4. Why separate exception() from result() for Future class? It does the >> > same >> > as result() but with different interface (return instead of raise). >> > Doesn't >> > this violate the rule "There should be one obvious way to do it"? >> >> Because it is quite awkward to check for an exception if you have to >> catch it (4 lines instead of 1). >> >> >> > 5. 
I think, protocol and transport methods' names are not easy or >> > understanding enough: >> > - write_eof() does not write anything but closes smth, should be >> > close_writing or smth alike; >> > - the same way eof_received() should become smth like receive_closed; >> >> I am indeed struggling a bit with these names, but "writing an EOF" is >> actually how I think of this (maybe I am dating myself to the time of >> mag tapes though :-). >> > I never saw a computer working with a tape, but it's clear to me what does > they do. > I've just imagined the amount of words I'll have to say to students about > EOFs instead of simple "it closes our end of one half of a socket". But which half? A socket is two independent streams, one in each direction. Twisted uses half_close() for this concept but unless you already know what this is for you are left wondering which half. Which is why I like using 'write' in the name. -- --Guido van Rossum (python.org/~guido) From yorik.sar at gmail.com Wed Jan 9 06:26:09 2013 From: yorik.sar at gmail.com (Yuriy Taraday) Date: Wed, 9 Jan 2013 09:26:09 +0400 Subject: [Python-ideas] [Python-Dev] PEP 3156 - Asynchronous IO Support Rebooted In-Reply-To: References: Message-ID: On Wed, Jan 9, 2013 at 9:14 AM, Guido van Rossum wrote: > On Tue, Jan 8, 2013 at 9:02 PM, Yuriy Taraday wrote: > > Should transports be bound to event loop on creation? I wonder, what > would > > happen if someone changes current event loop between these calls. > > Yes, this is what the transport implementation does. > But in theory every sock_ call is independent and returns Future bound to current event loop. So if one change event loop with active transport, nothing bad should happen. Or I'm missing something. > > I understand why it should be a method, but still if it's a getter, it > > should have either get_ or is_ prefix. > > Why? That's not a universal coding standard. The names seem clear enough > to me. > When I see (in autocompletion, for example) or remember name like "running", it triggers thought that it's a field. When I remember smth like is_running, it definitely associates with method. > > Are there any way to change this with 'Final' PEP? > > No, the concurrent.futures package has been released (I forget if it > was Python 3.2 or 3.3) and we're bound to backwards compatibility. > Also I really don't think it's a big deal at all. > Yes, not a big deal. > > >> > 5. I think, protocol and transport methods' names are not easy or > >> > understanding enough: > >> > - write_eof() does not write anything but closes smth, should be > >> > close_writing or smth alike; > >> > - the same way eof_received() should become smth like receive_closed; > >> > >> I am indeed struggling a bit with these names, but "writing an EOF" is > >> actually how I think of this (maybe I am dating myself to the time of > >> mag tapes though :-). > >> > > I never saw a computer working with a tape, but it's clear to me what > does > > they do. > > I've just imagined the amount of words I'll have to say to students about > > EOFs instead of simple "it closes our end of one half of a socket". > > But which half? A socket is two independent streams, one in each > direction. Twisted uses half_close() for this concept but unless you > already know what this is for you are left wondering which half. Which > is why I like using 'write' in the name. Yes, 'write' part is good, I should mention it. 
I meant to say that I won't need to explain that there were days when we had to handle a special marker at the end of file. -- Kind regards, Yuriy. -------------- next part -------------- An HTML attachment was scrubbed... URL: From stephen at xemacs.org Wed Jan 9 06:42:30 2013 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Wed, 09 Jan 2013 14:42:30 +0900 Subject: [Python-ideas] [Python-Dev] PEP 3156 - Asynchronous IO Support Rebooted In-Reply-To: References: Message-ID: <87k3rmap4p.fsf@uwakimon.sk.tsukuba.ac.jp> Is this thread really ready to migrate to python-dev when we're still bikeshedding method names? Yuriy Taraday writes: > > But which half? A socket is two independent streams, one in each > > direction. Twisted uses half_close() for this concept but unless you > > already know what this is for you are left wondering which half. Which > > is why I like using 'write' in the name. > > Yes, 'write' part is good, I should mention it. I meant to say that I won't > need to explain that there were days when we had to handle a special marker > at the end of file. Mystery is good for students. Getting serious, "close_writer" occured to me as a possibility. From jstpierre at mecheye.net Wed Jan 9 06:59:39 2013 From: jstpierre at mecheye.net (Jasper St. Pierre) Date: Wed, 9 Jan 2013 00:59:39 -0500 Subject: [Python-ideas] [Python-Dev] PEP 3156 - Asynchronous IO Support Rebooted In-Reply-To: <87k3rmap4p.fsf@uwakimon.sk.tsukuba.ac.jp> References: <87k3rmap4p.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: Well, if we're at the "bikeshedding about names" stage, that means that no serious issues with the proposal are left. So it's a sign of progress. On Wed, Jan 9, 2013 at 12:42 AM, Stephen J. Turnbull wrote: > Is this thread really ready to migrate to python-dev when we're still > bikeshedding method names? > > Yuriy Taraday writes: > > > > But which half? A socket is two independent streams, one in each > > > direction. Twisted uses half_close() for this concept but unless you > > > already know what this is for you are left wondering which half. Which > > > is why I like using 'write' in the name. > > > > Yes, 'write' part is good, I should mention it. I meant to say that I > won't > > need to explain that there were days when we had to handle a special > marker > > at the end of file. > > Mystery is good for students. > > Getting serious, "close_writer" occured to me as a possibility. > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -- Jasper -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Wed Jan 9 07:02:58 2013 From: guido at python.org (Guido van Rossum) Date: Tue, 8 Jan 2013 22:02:58 -0800 Subject: [Python-ideas] [Python-Dev] PEP 3156 - Asynchronous IO Support Rebooted In-Reply-To: References: Message-ID: On Tue, Jan 8, 2013 at 9:26 PM, Yuriy Taraday wrote: > > > > On Wed, Jan 9, 2013 at 9:14 AM, Guido van Rossum wrote: >> >> On Tue, Jan 8, 2013 at 9:02 PM, Yuriy Taraday wrote: >> > Should transports be bound to event loop on creation? I wonder, what >> > would >> > happen if someone changes current event loop between these calls. >> >> Yes, this is what the transport implementation does. > > > But in theory every sock_ call is independent and returns Future bound to > current event loop. It is bound to the event loop whose sock_() method you called. 
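To spell that out with a sketch (sock_recv() is from the PEP draft; the rest is illustrative):

def read_some(loop, sock):
    # A coroutine. The Future returned by sock_recv() is tied to `loop`,
    # the object whose method we called -- not to whatever loop happens
    # to be "current" when the result arrives.
    data = yield from loop.sock_recv(sock, 1024)
    return data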
> So if one change event loop with active transport, nothing bad should > happen. Or I'm missing something. Changing event loops in the middle of event processing is not a common (or even useful) pattern. You start the event loop and then leave it alone. >> > I understand why it should be a method, but still if it's a getter, it >> > should have either get_ or is_ prefix. >> >> Why? That's not a universal coding standard. The names seem clear enough >> to me. > > When I see (in autocompletion, for example) or remember name like "running", > it triggers thought that it's a field. When I remember smth like is_running, > it definitely associates with method. That must be pretty specific to your personal experience. >> > Are there any way to change this with 'Final' PEP? >> >> No, the concurrent.futures package has been released (I forget if it >> was Python 3.2 or 3.3) and we're bound to backwards compatibility. >> Also I really don't think it's a big deal at all. > > Yes, not a big deal. >> >> >> >> > 5. I think, protocol and transport methods' names are not easy or >> >> > understanding enough: >> >> > - write_eof() does not write anything but closes smth, should be >> >> > close_writing or smth alike; >> >> > - the same way eof_received() should become smth like receive_closed; >> >> >> >> I am indeed struggling a bit with these names, but "writing an EOF" is >> >> actually how I think of this (maybe I am dating myself to the time of >> >> mag tapes though :-). >> >> >> > I never saw a computer working with a tape, but it's clear to me what >> > does >> > they do. >> > I've just imagined the amount of words I'll have to say to students >> > about >> > EOFs instead of simple "it closes our end of one half of a socket". >> >> But which half? A socket is two independent streams, one in each >> direction. Twisted uses half_close() for this concept but unless you >> already know what this is for you are left wondering which half. Which >> is why I like using 'write' in the name. > > Yes, 'write' part is good, I should mention it. I meant to say that I won't > need to explain that there were days when we had to handle a special marker > at the end of file. But even today you have to mark the end somehow, to distinguish it from "not done yet, more could be coming". The equivalent is typing ^D into a UNIX terminal (or ^Z on Windows). -- --Guido van Rossum (python.org/~guido) From glyph at twistedmatrix.com Wed Jan 9 10:30:43 2013 From: glyph at twistedmatrix.com (Glyph) Date: Wed, 9 Jan 2013 01:30:43 -0800 Subject: [Python-ideas] [Python-Dev] PEP 3156 - Asynchronous IO Support Rebooted In-Reply-To: References: Message-ID: <69E9D1F0-50C3-4F49-998A-3EEB79611C43@twistedmatrix.com> On Jan 8, 2013, at 9:14 PM, Guido van Rossum wrote: > But which half? A socket is two independent streams, one in each > direction. Twisted uses half_close() for this concept but unless you > already know what this is for you are left wondering which half. Which > is why I like using 'write' in the name. I should add, if you don't already know what this means you really shouldn't be trying to do it ;-). -glyph -------------- next part -------------- An HTML attachment was scrubbed...
URL: From shibturn at gmail.com Wed Jan 9 11:28:42 2013 From: shibturn at gmail.com (Richard Oudkerk) Date: Wed, 09 Jan 2013 10:28:42 +0000 Subject: [Python-ideas] [Python-Dev] PEP 3156 - Asynchronous IO Support Rebooted In-Reply-To: References: Message-ID: On 09/01/2013 2:31am, Nick Coghlan wrote: > The exception() method exists for the same reason that we support both > "key in mapping" and raising KeyError from "mapping[key]": sometimes > you want "Look Before You Leap", other times you want to let the > exception fly. If you want the latter, just call .result() directly, > if you want the former, check .exception() first. But how can you do LBYL? I can't see a way to check that an exception has occurred except by seeing whether result() raises an error: done() tells you that the operation is finished, but not whether it succeeded. -- Richard From yorik.sar at gmail.com Wed Jan 9 11:45:55 2013 From: yorik.sar at gmail.com (Yuriy Taraday) Date: Wed, 9 Jan 2013 14:45:55 +0400 Subject: [Python-ideas] [Python-Dev] PEP 3156 - Asynchronous IO Support Rebooted In-Reply-To: References: Message-ID: On Wed, Jan 9, 2013 at 8:50 AM, Guido van Rossum wrote: > On Tue, Jan 8, 2013 at 8:31 PM, Guido van Rossum wrote: > > On Tue, Jan 8, 2013 at 5:14 PM, Yuriy Taraday > wrote: > >> - pause() and resume() work with reading only, so they should be > suffixed > >> (prefixed) with read(ing), like pause_reading(), resume_reading(). > > > > Agreed. > > I think I want to take that back. I think it is more common for a > protocol to want to pause the transport (i.e. hold back > data_received() calls) than it is for a transport to want to pause the > protocol (i.e. hold back write() calls). So the more common method can > have a shorter name. Also, pause_reading() is almost confusing, since > the protocol's method is named data_received(), not read_data(). Also, > there's no reason for the protocol to want to pause the *write* (send) > actions of the transport -- if wanted to write less it should not have > called write(). The reason to distinguish between the two modes of > pausing is because it is sometimes useful to "stack" multiple > protocols, and then a protocol in the middle of the stack acts as a > transport to the protocol next to it (and vice versa). See the > discussion on this list previously, e.g. > http://mail.python.org/pipermail/python-ideas/2013-January/018522.html > (search for the keyword "stack" in this long message to find the > relevant section). I totally agree with protocol/transport stacking, anyone should be able to do some ugly thing like FTP over SSL over SOCKS over SSL over HTTP (j/k). Just take a look at what you can do with netgraph in *BSD (anything over anything with any number of layers). But still we shouldn't sacrifice ease of understanding (both docs and code) for a couple extra chars (10 actually). Yes, 'reading' is misleading, pause_receiving and resume_receiving are better. -- Kind regards, Yuriy. -------------- next part -------------- An HTML attachment was scrubbed...
URL: From ncoghlan at gmail.com Wed Jan 9 11:54:42 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 9 Jan 2013 20:54:42 +1000 Subject: [Python-ideas] [Python-Dev] PEP 3156 - Asynchronous IO Support Rebooted In-Reply-To: References: Message-ID: On Wed, Jan 9, 2013 at 8:28 PM, Richard Oudkerk wrote: > On 09/01/2013 2:31am, Nick Coghlan wrote: >> >> The exception() method exists for the same reason that we support both >> "key in mapping" and raising KeyError from "mapping[key]": sometimes >> you want "Look Before You Leap", other times you want to let the >> exception fly. If you want the latter, just call .result() directly, >> if you want the former, check .exception() first. > > > But how can you do LBYL. I can't see a way to check that an exception has > occurred seeing whether result() raises an error: done() tells you that the > operation is finished, but not whether it succeeded. You need to combine it with the other LBYL checks (f.done() and f.cancelled()) to be sure it won't throw an exception. if f.done() and not f.cancelled(): # Since we now know neither TimeoutError nor CancelledError can happen, # we can check for exceptions either by calling f.exception() or # by calling f.result() inside a try/except block # The latter will usually be the better option Just calling f.result() is by far the most common, but the other can be convenient in some cases (e.g. if you're writing a scheduler that needs to check if it should be calling send() or throw() on a generator). Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From yorik.sar at gmail.com Wed Jan 9 11:55:27 2013 From: yorik.sar at gmail.com (Yuriy Taraday) Date: Wed, 9 Jan 2013 14:55:27 +0400 Subject: [Python-ideas] [Python-Dev] PEP 3156 - Asynchronous IO Support Rebooted In-Reply-To: References: Message-ID: On Wed, Jan 9, 2013 at 10:02 AM, Guido van Rossum wrote: > Changing event loops in the middle of event processing is not a common > (or even useful) pattern. You start the event loop and then leave it > alone. > Yes. It was not-so-great morning idea. > Yes, 'write' part is good, I should mention it. I meant to say that I > won't > > need to explain that there were days when we had to handle a special > marker > > at the end of file. > > But even today you have to mark the end somehow, to distinguish it > from "not done yet, more could be coming". The equivalent is typing ^D > into a UNIX terminal (or ^Z on Windows). My interns told me that they remember EOF as special object only from high school when they had to study Pascal. I guess, in 5 years students won't understand how one can write an EOF. (and schools will finally replace Pascal with Python) -- Kind regards, Yuriy. -------------- next part -------------- An HTML attachment was scrubbed... URL: From yorik.sar at gmail.com Wed Jan 9 12:00:00 2013 From: yorik.sar at gmail.com (Yuriy Taraday) Date: Wed, 9 Jan 2013 15:00:00 +0400 Subject: [Python-ideas] [Python-Dev] PEP 3156 - Asynchronous IO Support Rebooted In-Reply-To: <20130109103911.4f599709@pitrou.net> References: <20130109103911.4f599709@pitrou.net> Message-ID: On Wed, Jan 9, 2013 at 1:39 PM, Antoine Pitrou wrote: > > Hi Yuriy, > > For the record, it isn't necessary to cross-post. python-ideas is > the place for discussing this, and most interested people will be > subscribed to both python-ideas and python-dev, and therefore they get > duplicate messages. > Oh, sorry. I just found this thread in both MLs, so decided to send it to both. 
This will be my last email (for now) on this topic at python-dev.

--
Kind regards, Yuriy.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From ncoghlan at gmail.com  Wed Jan  9 12:06:45 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 9 Jan 2013 21:06:45 +1000
Subject: [Python-ideas] [Python-Dev] PEP 3156 - Asynchronous IO Support Rebooted
In-Reply-To:
References:
Message-ID:

On Wed, Jan 9, 2013 at 8:55 PM, Yuriy Taraday wrote:
> My interns told me that they remember EOF as a special object only from
> high school, when they had to study Pascal. I guess in 5 years students
> won't understand how one can write an EOF. (And schools will finally
> replace Pascal with Python.)

Python really doesn't try to avoid the concept of an End-of-file marker.

================
$ python3
Python 3.2.3 (default, Jun  8 2012, 05:36:09)
[GCC 4.7.0 20120507 (Red Hat 4.7.0-5)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> quit
Use quit() or Ctrl-D (i.e. EOF) to exit
>>> import io
>>> print(io.FileIO.read.__doc__)
read(size: int) -> bytes.  read at most size bytes, returned as bytes.

Only makes one system call, so less data may be returned than requested
In non-blocking mode, returns None if no data is available.
On end-of-file, returns ''.
================

Cheers,
Nick.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From shibturn at gmail.com  Wed Jan  9 13:13:19 2013
From: shibturn at gmail.com (Richard Oudkerk)
Date: Wed, 09 Jan 2013 12:13:19 +0000
Subject: [Python-ideas] [Python-Dev] PEP 3156 - Asynchronous IO Support Rebooted
In-Reply-To:
References:
Message-ID:

On 09/01/2013 10:54am, Nick Coghlan wrote:
> You need to combine it with the other LBYL checks (f.done() and
> f.cancelled()) to be sure it won't throw an exception.
>
>     if f.done() and not f.cancelled():
>         # Since we now know neither TimeoutError nor CancelledError can happen,
>         # we can check for exceptions either by calling f.exception() or
>         # by calling f.result() inside a try/except block
>         # The latter will usually be the better option
>
> Just calling f.result() is by far the most common, but the other can
> be convenient in some cases (e.g. if you're writing a scheduler that
> needs to check if it should be calling send() or throw() on a
> generator).

Which goes to show that it cannot be used with LBYL.

For exception() to be usable with LBYL one would need to be able to check
that exception() returns a value without having to catch any exceptions --
either from exception() or from result().

But you can only check that exception() doesn't raise an error by calling
result() to ensure that it does raise an error. But then you might as well
catch the exception from result().

And the idea of calling exception() first and then result() if it fails is
just crazy.

As things stand, exception() is pointless.

--
Richard

From yorik.sar at gmail.com  Wed Jan  9 13:51:09 2013
From: yorik.sar at gmail.com (Yuriy Taraday)
Date: Wed, 9 Jan 2013 16:51:09 +0400
Subject: [Python-ideas] [Python-Dev] PEP 3156 - Asynchronous IO Support Rebooted
In-Reply-To:
References:
Message-ID:

On Wed, Jan 9, 2013 at 4:13 PM, Richard Oudkerk wrote:
> On 09/01/2013 10:54am, Nick Coghlan wrote:
>> You need to combine it with the other LBYL checks (f.done() and
>> f.cancelled()) to be sure it won't throw an exception.
>>
>>     if f.done() and not f.cancelled():
>>         # Since we now know neither TimeoutError nor CancelledError can happen,
>>         # we can check for exceptions either by calling f.exception() or
>>         # by calling f.result() inside a try/except block
>>         # The latter will usually be the better option
>>
>> Just calling f.result() is by far the most common, but the other can
>> be convenient in some cases (e.g. if you're writing a scheduler that
>> needs to check if it should be calling send() or throw() on a
>> generator).
>
> Which goes to show that it cannot be used with LBYL.
>
> For exception() to be usable with LBYL one would need to be able to check
> that exception() returns a value without having to catch any exceptions --
> either from exception() or from result().
>
> But you can only check that exception() doesn't raise an error by calling
> result() to ensure that it does raise an error. But then you might as well
> catch the exception from result().
>
> And the idea of calling exception() first and then result() if it fails is
> just crazy.
>
> As things stand, exception() is pointless.

exception() will raise only TimeoutError or CancelledError; exceptions
from the Future computation are not raised, they are returned. So to
verify that a Future is properly computed, you should write:

f.done() and not f.cancelled() and f.exception() is None

and you won't have to catch any exceptions.

--
Kind regards, Yuriy.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From shibturn at gmail.com  Wed Jan  9 13:59:49 2013
From: shibturn at gmail.com (Richard Oudkerk)
Date: Wed, 09 Jan 2013 12:59:49 +0000
Subject: [Python-ideas] [Python-Dev] PEP 3156 - Asynchronous IO Support Rebooted
In-Reply-To:
References:
Message-ID:

On 09/01/2013 12:51pm, Yuriy Taraday wrote:
> exception() will raise only TimeoutError or CancelledError; exceptions
> from the Future computation are not raised, they are returned.
> So to verify that a Future is properly computed, you should write:
>
> f.done() and not f.cancelled() and f.exception() is None
>
> and you won't have to catch any exceptions.

Ah. I missed the point that exception() returns None (rather than
raising) if there was no exception.

--
Richard

From guido at python.org  Wed Jan  9 16:58:10 2013
From: guido at python.org (Guido van Rossum)
Date: Wed, 9 Jan 2013 07:58:10 -0800
Subject: [Python-ideas] [Python-Dev] PEP 3156 - Asynchronous IO Support Rebooted
In-Reply-To:
References:
Message-ID:

On Wed, Jan 9, 2013 at 4:13 AM, Richard Oudkerk wrote:
> On 09/01/2013 10:54am, Nick Coghlan wrote:
>> You need to combine it with the other LBYL checks (f.done() and
>> f.cancelled()) to be sure it won't throw an exception.
>>
>>     if f.done() and not f.cancelled():
>>         # Since we now know neither TimeoutError nor CancelledError can happen,
>>         # we can check for exceptions either by calling f.exception() or
>>         # by calling f.result() inside a try/except block
>>         # The latter will usually be the better option
>>
>> Just calling f.result() is by far the most common, but the other can
>> be convenient in some cases (e.g. if you're writing a scheduler that
>> needs to check if it should be calling send() or throw() on a
>> generator).
>
> Which goes to show that it cannot be used with LBYL.
>
> For exception() to be usable with LBYL one would need to be able to check
> that exception() returns a value without having to catch any exceptions --
> either from exception() or from result().
>
> But you can only check that exception() doesn't raise an error by calling
> result() to ensure that it does raise an error. But then you might as well
> catch the exception from result().
>
> And the idea of calling exception() first and then result() if it fails is
> just crazy.
>
> As things stand, exception() is pointless.

Not true -- if the future has a callback associated with it, the
callback (or callbacks) is called when it becomes "done", and if the
callback wants to check for an exception it can use exception(). The
callback is guaranteed that the future is done, so it doesn't have to
worry about the exception that is raised if the future isn't done. (Of
course a callback can also just call result() and catch the exception,
or let it bubble out -- in that case it will be logged by the event
loop and then dropped.)

--
--Guido van Rossum (python.org/~guido)

From federico.dev at reghe.net  Thu Jan 10 21:44:54 2013
From: federico.dev at reghe.net (Federico Reghenzani)
Date: Thu, 10 Jan 2013 21:44:54 +0100
Subject: [Python-ideas] TCP Fast Open protocol
Message-ID:

Hi all,

I'm new to Python development. I'm interested in the new TCP Fast Open
protocol (http://research.google.com/pubs/pub37517.html). This protocol is
implemented in the Linux kernel, 3.6 for the client and 3.7 for the
server, and the related constants are defined in Python changeset
5435a9278028.

This TCP change is an important optimization, in particular for HTTP, and
it is completely backward compatible: even if a client or a server doesn't
support TFO, the connection proceeds with the normal procedure.

I think an implementation in the socketserver module could be useful: an
attribute "allow_tcp_fast_open" that, when set, automatically sets the
correct socket option before listening (another attribute is necessary to
choose the queue size). A similar implementation can be done in the http
modules.

The default value of this attribute may be "True" (given its backward
compatibility), but new versions of glibc might expose the TCP_FASTOPEN
constant even if the kernel does not support it (so using hasattr to check
whether the constant exists doesn't guarantee that TFO is supported by the
kernel). Maybe more complex code can resolve this problem, but I don't
know how to do that (maybe catching an exception or checking the kernel
version?).

I attached a simple patch for socketserver (and docs); let me know what
you think!

Federico
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: tfo.patch
Type: application/octet-stream
Size: 2079 bytes
Desc: not available
URL:

From phd at phdru.name  Thu Jan 10 21:55:41 2013
From: phd at phdru.name (Oleg Broytman)
Date: Fri, 11 Jan 2013 00:55:41 +0400
Subject: [Python-ideas] TCP Fast Open protocol
In-Reply-To:
References:
Message-ID: <20130110205541.GA1640@iskra.aviel.ru>

Hi!

On Thu, Jan 10, 2013 at 09:44:54PM +0100, Federico Reghenzani wrote:
> I attached a simple patch for socketserver (and docs); let me know what
> you think!

The patch looks good at first glance, thank you for the work! The better
place for patches is the issue tracker at http://bugs.python.org --
patches in the mailing list tend to get lost.

Oleg.
--
Oleg Broytman http://phdru.name/ phd at phdru.name
Programmers don't die, they just GOSUB without RETURN.
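Guido's callback case above can be made concrete with today's
concurrent.futures API, which the PEP 3156 futures are modelled on. A
minimal sketch; note that cancelled() is checked first because calling
exception() on a cancelled future raises CancelledError:

    import concurrent.futures

    def on_done(fut):
        # add_done_callback() guarantees the future is done here, so
        # exception() returns immediately and cannot raise TimeoutError.
        if fut.cancelled():
            print('cancelled')
        elif fut.exception() is not None:
            print('failed:', fut.exception())
        else:
            print('result:', fut.result())

    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        f = pool.submit(lambda: 1 // 0)  # will fail with ZeroDivisionError
        f.add_done_callback(on_done)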
From federico.dev at reghe.net  Thu Jan 10 22:06:21 2013
From: federico.dev at reghe.net (Federico Reghenzani)
Date: Thu, 10 Jan 2013 22:06:21 +0100
Subject: [Python-ideas] TCP Fast Open protocol
In-Reply-To: <20130110205541.GA1640@iskra.aviel.ru>
References: <20130110205541.GA1640@iskra.aviel.ru>
Message-ID:

Hi Oleg,

I've posted here because I'm asking whether it may be an idea to make some
changes in the http module as well, maybe setting that option to 'True' as
the default (but first we need to fix the kernel-glibc problem).

Thanks,
Federico
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From phd at phdru.name  Thu Jan 10 22:19:38 2013
From: phd at phdru.name (Oleg Broytman)
Date: Fri, 11 Jan 2013 01:19:38 +0400
Subject: [Python-ideas] TCP Fast Open protocol
In-Reply-To:
References: <20130110205541.GA1640@iskra.aviel.ru>
Message-ID: <20130110211938.GB1640@iskra.aviel.ru>

On Thu, Jan 10, 2013 at 10:06:21PM +0100, Federico Reghenzani wrote:
> I've posted here because I'm asking whether it may be an idea to make some
> changes in the http module as well, maybe setting that option to 'True' as
> the default (but first we need to fix the kernel-glibc problem).

I think IWBN to patch as many network modules as possible (ftplib, urllib,
urllib2, xmlrpclib). Having tests also helps.

Oleg.
--
Oleg Broytman http://phdru.name/ phd at phdru.name
Programmers don't die, they just GOSUB without RETURN.

From guido at python.org  Thu Jan 10 22:24:56 2013
From: guido at python.org (Guido van Rossum)
Date: Thu, 10 Jan 2013 13:24:56 -0800
Subject: [Python-ideas] TCP Fast Open protocol
In-Reply-To: <20130110211938.GB1640@iskra.aviel.ru>
References: <20130110205541.GA1640@iskra.aviel.ru> <20130110211938.GB1640@iskra.aviel.ru>
Message-ID:

Is there sample code for an HTTP client? What if the server doesn't
yet support the feature?

On Thu, Jan 10, 2013 at 1:19 PM, Oleg Broytman wrote:
> On Thu, Jan 10, 2013 at 10:06:21PM +0100, Federico Reghenzani wrote:
>> I've posted here because I'm asking whether it may be an idea to make some
>> changes in the http module as well, maybe setting that option to 'True' as
>> the default (but first we need to fix the kernel-glibc problem).
>
> I think IWBN to patch as many network modules as possible (ftplib, urllib,
> urllib2, xmlrpclib). Having tests also helps.
>
> Oleg.
> --
> Oleg Broytman http://phdru.name/ phd at phdru.name
> Programmers don't die, they just GOSUB without RETURN.
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas

--
--Guido van Rossum (python.org/~guido)

From phd at phdru.name  Thu Jan 10 22:32:38 2013
From: phd at phdru.name (Oleg Broytman)
Date: Fri, 11 Jan 2013 01:32:38 +0400
Subject: [Python-ideas] TCP Fast Open protocol
In-Reply-To:
References: <20130110205541.GA1640@iskra.aviel.ru> <20130110211938.GB1640@iskra.aviel.ru>
Message-ID: <20130110213238.GC1640@iskra.aviel.ru>

On Thu, Jan 10, 2013 at 01:24:56PM -0800, Guido van Rossum wrote:
> Is there sample code for an HTTP client? What if the server doesn't
> yet support the feature?

AFAIU the feature is implemented at the kernel level and doesn't require
any change at the user level, only a socket option. If the server doesn't
implement the feature the kernel on the client side transparently (to the
client) reverts to normal 3-way TCP handshaking.

Oleg.
--
Oleg Broytman http://phdru.name/ phd at phdru.name
Programmers don't die, they just GOSUB without RETURN.
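For reference, the server half of Federico's proposal reduces to a single
setsockopt() call before listen(). A minimal sketch, assuming a Linux
3.7+ kernel; the TCP_FASTOPEN constant (value 23 on Linux) may not be
exposed by the running Python or glibc, which is exactly the detection
problem discussed above, so it is declared manually here:

    import socket

    TCP_FASTOPEN = getattr(socket, 'TCP_FASTOPEN', 23)  # 23 on Linux

    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(('', 8080))
    try:
        # Length of the queue of pending TFO connections; set before listen().
        srv.setsockopt(socket.IPPROTO_TCP, TCP_FASTOPEN, 16)
    except OSError:
        pass  # kernel without TFO support: keep the normal 3-way handshake
    srv.listen(5)

Clients that don't send a TFO cookie are unaffected: their plain SYN is
answered with an ordinary handshake.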
From benoitc at gunicorn.org  Thu Jan 10 22:29:02 2013
From: benoitc at gunicorn.org (Benoit Chesneau)
Date: Thu, 10 Jan 2013 22:29:02 +0100
Subject: [Python-ideas] TCP Fast Open protocol
In-Reply-To:
References: <20130110205541.GA1640@iskra.aviel.ru> <20130110211938.GB1640@iskra.aviel.ru>
Message-ID:

On Jan 10, 2013, at 10:24 PM, Guido van Rossum wrote:
> Is there sample code for an HTTP client? What if the server doesn't
> yet support the feature?

As I read it, this is transparent for the application if it doesn't
support it.

https://lwn.net/Articles/508865/

- benoît

>
> On Thu, Jan 10, 2013 at 1:19 PM, Oleg Broytman wrote:
>> On Thu, Jan 10, 2013 at 10:06:21PM +0100, Federico Reghenzani wrote:
>>> I've posted here because I'm asking whether it may be an idea to make some
>>> changes in the http module as well, maybe setting that option to 'True' as
>>> the default (but first we need to fix the kernel-glibc problem).
>>
>> I think IWBN to patch as many network modules as possible (ftplib, urllib,
>> urllib2, xmlrpclib). Having tests also helps.
>>
>> Oleg.
>> --
>> Oleg Broytman http://phdru.name/ phd at phdru.name
>> Programmers don't die, they just GOSUB without RETURN.
>> _______________________________________________
>> Python-ideas mailing list
>> Python-ideas at python.org
>> http://mail.python.org/mailman/listinfo/python-ideas
>
> --
> --Guido van Rossum (python.org/~guido)
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From phd at phdru.name  Thu Jan 10 22:34:54 2013
From: phd at phdru.name (Oleg Broytman)
Date: Fri, 11 Jan 2013 01:34:54 +0400
Subject: [Python-ideas] TCP Fast Open protocol
In-Reply-To: <20130110213238.GC1640@iskra.aviel.ru>
References: <20130110205541.GA1640@iskra.aviel.ru> <20130110211938.GB1640@iskra.aviel.ru> <20130110213238.GC1640@iskra.aviel.ru>
Message-ID: <20130110213454.GD1640@iskra.aviel.ru>

On Fri, Jan 11, 2013 at 01:32:38AM +0400, Oleg Broytman wrote:
> On Thu, Jan 10, 2013 at 01:24:56PM -0800, Guido van Rossum wrote:
> > Is there sample code for an HTTP client? What if the server doesn't
> > yet support the feature?
>
> AFAIU the feature is implemented at the kernel level and doesn't require
> any change at the user level, only a socket option. If the server doesn't
> implement the feature the kernel on the client side transparently (to the
> client) reverts to normal 3-way TCP handshaking.

Sorry, I was completely confused. Yes, clients need different calls:
https://lwn.net/Articles/508865/

Oleg.
--
Oleg Broytman http://phdru.name/ phd at phdru.name
Programmers don't die, they just GOSUB without RETURN.

From guido at python.org  Thu Jan 10 22:46:11 2013
From: guido at python.org (Guido van Rossum)
Date: Thu, 10 Jan 2013 13:46:11 -0800
Subject: [Python-ideas] TCP Fast Open protocol
In-Reply-To: <20130110213454.GD1640@iskra.aviel.ru>
References: <20130110205541.GA1640@iskra.aviel.ru> <20130110211938.GB1640@iskra.aviel.ru> <20130110213238.GC1640@iskra.aviel.ru> <20130110213454.GD1640@iskra.aviel.ru>
Message-ID:

On Thu, Jan 10, 2013 at 1:34 PM, Oleg Broytman wrote:
> On Fri, Jan 11, 2013 at 01:32:38AM +0400, Oleg Broytman wrote:
>> On Thu, Jan 10, 2013 at 01:24:56PM -0800, Guido van Rossum wrote:
>> > Is there sample code for an HTTP client? What if the server doesn't
>> > yet support the feature?
>>
>> AFAIU the feature is implemented at the kernel level and doesn't require
>> any change at the user level, only a socket option. If the server doesn't
>> implement the feature the kernel on the client side transparently (to the
>> client) reverts to normal 3-way TCP handshaking.
>
> Sorry, I was completely confused. Yes, clients need different calls:
> https://lwn.net/Articles/508865/

Right, that's what I gleaned from skimming the referenced paper. But that
and the lwn article you link only show C code. Let's see some Python! (I
would try it, but no machine I have access to supports this yet.)

Hopefully the OP has some sample Python code? Otherwise I think it's a
little too early to adopt this...

--
--Guido van Rossum (python.org/~guido)

From geertj at gmail.com  Fri Jan 11 00:10:10 2013
From: geertj at gmail.com (Geert Jansen)
Date: Fri, 11 Jan 2013 00:10:10 +0100
Subject: [Python-ideas] TestMill - Python system testing
Message-ID:

Hi,

my apologies if this is slightly off-topic, but I believe this could be
useful. As a side project, I've been working on a tool to use my company's
cloud service to offer system testing for Python. The tool is called
TestMill, and it allows you to test your Python project for free, remotely
and in parallel on a range of different OSs (currently: Fedora, CentOS and
Ubuntu).

Feedback very much appreciated. Code can be found on Github here
https://github.com/ravello/testmill.

Regards,
Geert

From tjreedy at udel.edu  Fri Jan 11 03:45:46 2013
From: tjreedy at udel.edu (Terry Reedy)
Date: Thu, 10 Jan 2013 21:45:46 -0500
Subject: [Python-ideas] TCP Fast Open protocol
In-Reply-To:
References: <20130110205541.GA1640@iskra.aviel.ru> <20130110211938.GB1640@iskra.aviel.ru>
Message-ID:

On 1/10/2013 4:29 PM, Benoit Chesneau wrote:
>
> On Jan 10, 2013, at 10:24 PM, Guido van Rossum wrote:
>
>> Is there sample code for an HTTP client? What if the server doesn't
>> yet support the feature?
>
> As I read it, this is transparent for the application if it doesn't
> support it.
>
> https://lwn.net/Articles/508865/

I read both the post (Aug 1, 2012, before the Linux 3.7 with the server
code) and comments. FastOpen appears to still be an experimental proposal:
"Currently, TFO is an Internet Draft with the IETF. ... (The current
implementation employs the TCP Experimental Option Number facility as a
placeholder for a real TCP Option Number.)". From the comments, I would
say that its success outside of Google is not certain.

It appears that its main use case is repeated requests to webservers from
browsers. This is because the latter often make *multiple* requests, often
short, to the same site in order to construct a displayed web page. There
is no time saving on the first request of a series. I suspect that after
Google updates Chrome to use the new feature, one of the other
'independent' browsers is likely to be the next user.

To be active, the feature must be compiled into the socket code of both
server and client machines AND must be explicitly requested by both client
and server applications.

On the server side, it must be requested because the request makes a
promise that syn+data requests will be handled idempotently. (So the
default should be 'off'.) This is trivial for static web pages but may
require app-specific overhead for anything else.
So, in general, the app should not bother being able to handle FastOpen
unless it will be run on servers with FastOpen, and for efficiency, it
should not add the overhead unless it is needed because a particular
request is from a FastOpen client.

This is not a problem for Google, with thousands of duplicate apps running
on duplicate server configurations. But it was not clear in the OP's post
how a Python app would know for sure whether a particular machine is
FastOpen capable. I did not see the question of how a server app would
know about the client connection type even addressed.

On the client side, .connect and at least the first .send must be combined
into either .sendto or .sendmsg (which?, still to be decided, apparently;-)
with a new MSG_FASTOPEN argument. So programs need a non-trivial rewrite.
If a particular server is not fastopen capable, then new fastopen client
kernel socket code can potentially handle the fallback to the old way. But
if the client is not fastopen capable, the fallback must be handled in the
Python .sendto code or else in the client code. (So one of those layers
must *know* the client system capability.)

Again, dealing with this, on multiple OSes, should be a lot easier for a
monolithic browser like Chrome or Firefox (which might, on some systems,
even use their own socket layer code), than for general purpose Python
socket and app code.

So my conclusion is that this is (mostly) premature for Python at this
time. This is a slight performance enhancement of limited use that will
make code at least slightly more complex in a core module that must be
kept at least as rock solid as it is now. Let Google get it working on
both their servers and Chrome browser. And wait for Mozilla, say, to add
it to Firefox. Things might change before the first 3.4 beta, but I think
3.5 is more likely. Of course, testing will require all 4 combinations of
client and server.

--
Terry Jan Reedy

From federico.dev at reghe.net  Fri Jan 11 08:30:08 2013
From: federico.dev at reghe.net (Federico Reghenzani)
Date: Fri, 11 Jan 2013 08:30:08 +0100
Subject: [Python-ideas] TCP Fast Open protocol
In-Reply-To:
References: <20130110205541.GA1640@iskra.aviel.ru> <20130110211938.GB1640@iskra.aviel.ru>
Message-ID:

On Fri, Jan 11, 2013 at 3:45 AM, Terry Reedy wrote:
>
> I read both the post (Aug 1, 2012, before the Linux 3.7 with the server
> code) and comments. FastOpen appears to still be an experimental proposal:
> "Currently, TFO is an Internet Draft with the IETF. ... (The current
> implementation employs the TCP Experimental Option Number facility as a
> placeholder for a real TCP Option Number.)". From the comments, I would
> say that its success outside of Google is not certain.
>
> It appears that its main use case is repeated requests to webservers from
> browsers. This is because the latter often make *multiple* requests, often
> short, to the same site in order to construct a displayed web page. There
> is no time saving on the first request of a series. I suspect that after
> Google updates Chrome to use the new feature, one of the other
> 'independent' browsers is likely to be the next user.

Yes, the protocol has been designed for situations where there are
multiple requests such as HTTP or FTP. Probably only in these cases is a
default of 'True' appropriate.

> To be active, the feature must be compiled into the socket code of both
> server and client machines AND must be explicitly requested by both client
> and server applications.
> On the server side, it must be requested because the request makes a
> promise that syn+data requests will be handled idempotently. (So the
> default should be 'off'.) This is trivial for static web pages but may
> require app-specific overhead for anything else. So, in general, the app
> should not bother being able to handle FastOpen unless it will be run on
> servers with FastOpen, and for efficiency, it should not add the overhead
> unless it is needed because a particular request is from a FastOpen client.

If the server doesn't support FastOpen and receives a FastOpen request
from a capable client, it simply ignores the TFO cookie and replies with a
normal SYN+ACK. In this case the first packet (SYN+TFO from the client) is
only 4 bytes larger than in a normal connection; no other packet is bigger
than normal. So for a server app that does not support FastOpen, it is
completely transparent and does not cause any overhead.

> This is not a problem for Google, with thousands of duplicate apps running
> on duplicate server configurations. But it was not clear in the OP's post
> how a Python app would know for sure whether a particular machine is
> FastOpen capable. I did not see the question of how a server app would
> know about the client connection type even addressed.

The server knows the client connection type by the first packet the client
sends: if the first packet coming from the client is a SYN+TFO cookie, the
server proceeds to generate a cookie and continues with a FastOpen
connection; if the first packet is a plain SYN, the server proceeds with a
normal 3-way handshake connection. In any case these operations are
transparent both to Python and to the application, because they're made by
the kernel.

> On the client side, .connect and at least the first .send must be combined
> into either .sendto or .sendmsg (which?, still to be decided, apparently;-)
> with a new MSG_FASTOPEN argument. So programs need a non-trivial rewrite.
> If a particular server is not fastopen capable, then new fastopen client
> kernel socket code can potentially handle the fallback to the old way. But
> if the client is not fastopen capable, the fallback must be handled in the
> Python .sendto code or else in the client code. (So one of those layers
> must *know* the client system capability.)

As I said, if a client uses a .sendto or a .sendmsg with MSG_FASTOPEN on a
server that is not TFO-capable, the Linux kernel falls back to the old
way, so it is as if it had done a normal .connect and .send. The
application doesn't know whether the connection has been made in TFO mode
or normal mode and does not need to know.

> Again, dealing with this, on multiple OSes, should be a lot easier for a
> monolithic browser like Chrome or Firefox (which might, on some systems,
> even use their own socket layer code), than for general purpose Python
> socket and app code.
>
> So my conclusion is that this is (mostly) premature for Python at this
> time. This is a slight performance enhancement of limited use that will
> make code at least slightly more complex in a core module that must be
> kept at least as rock solid as it is now. Let Google get it working on
> both their servers and Chrome browser. And wait for Mozilla, say, to add
> it to Firefox. Things might change before the first 3.4 beta, but I think
> 3.5 is more likely. Of course, testing will require all 4 combinations of
> client and server.

We can introduce TFO only in some modules such as HTTP or FTP.
The code is not really complex: for the server it is only a .setsockopt
before .listen, and for the client we should replace the .connect and the
first .send with a single .sendto or .sendmsg.

On Jan 10, 2013, at 10:46 PM, Guido van Rossum:
> Hopefully the OP has some sample Python code?

Yes, it is practically the same as C; I attached examples (I needed to
declare the TCP and MSG constants manually because my glibc doesn't have
them yet).

Federico Reghenzani
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: tfo.tar.gz
Type: application/x-gzip
Size: 10240 bytes
Desc: not available
URL:
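The client half Federico describes would look roughly like the sketch
below -- an illustration, not his attached code. MSG_FASTOPEN (0x20000000
on Linux) only has a socket-module constant in the changeset he mentions,
so it is declared manually here; and on a client kernel without TFO the
sendto() call fails with an error rather than falling back transparently,
hence the except clause (which echoes Terry's point about fallback):

    import socket

    MSG_FASTOPEN = getattr(socket, 'MSG_FASTOPEN', 0x20000000)  # Linux value

    addr = ('www.example.com', 80)  # placeholder host
    data = b'GET / HTTP/1.0\r\nHost: www.example.com\r\n\r\n'

    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # connect() and the first send() collapse into one call:
        # the SYN carries the request data.
        s.sendto(data, MSG_FASTOPEN, addr)
    except OSError:
        # Client kernel without TFO: fall back to the classic sequence.
        s.connect(addr)
        s.sendall(data)
    reply = s.recv(4096)
    s.close()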
From benoitc at gunicorn.org  Fri Jan 11 15:00:47 2013
From: benoitc at gunicorn.org (Benoit Chesneau)
Date: Fri, 11 Jan 2013 15:00:47 +0100
Subject: [Python-ideas] TCP Fast Open protocol
In-Reply-To:
References: <20130110205541.GA1640@iskra.aviel.ru> <20130110211938.GB1640@iskra.aviel.ru>
Message-ID: <59B626EC-5C9C-480E-AC8C-2299CAB139A9@gunicorn.org>

On Jan 11, 2013, at 8:30 AM, Federico Reghenzani wrote:
>
> On Fri, Jan 11, 2013 at 3:45 AM, Terry Reedy wrote:
>
> > I read both the post (Aug 1, 2012, before the Linux 3.7 with the server
> > code) and comments. FastOpen appears to still be an experimental proposal:
> > "Currently, TFO is an Internet Draft with the IETF. ... (The current
> > implementation employs the TCP Experimental Option Number facility as a
> > placeholder for a real TCP Option Number.)". From the comments, I would
> > say that its success outside of Google is not certain.
> >
> > It appears that its main use case is repeated requests to webservers from
> > browsers. This is because the latter often make *multiple* requests, often
> > short, to the same site in order to construct a displayed web page. There
> > is no time saving on the first request of a series. I suspect that after
> > Google updates Chrome to use the new feature, one of the other
> > 'independent' browsers is likely to be the next user.
> >
> > I did not see the question of how a server app would know about the
> > client connection type even addressed.
>
> The server knows the client connection type by the first packet the client
> sends: if the first packet coming from the client is a SYN+TFO cookie, the
> server proceeds to generate a cookie and continues with a FastOpen
> connection; if the first packet is a plain SYN, the server proceeds with a
> normal 3-way handshake connection. In any case these operations are
> transparent both to Python and to the application, because they're made by
> the kernel.
>
> > On the client side, .connect and at least the first .send must be combined
> > into either .sendto or .sendmsg (which?, still to be decided, apparently;-)
> > with a new MSG_FASTOPEN argument. So programs need a non-trivial rewrite.
> > If a particular server is not fastopen capable, then new fastopen client
> > kernel socket code can potentially handle the fallback to the old way. But
> > if the client is not fastopen capable, the fallback must be handled in the
> > Python .sendto code or else in the client code. (So one of those layers
> > must *know* the client system capability.)
>
> As I said, if a client uses a .sendto or a .sendmsg with MSG_FASTOPEN on a
> server that is not TFO-capable, the Linux kernel falls back to the old
> way, so it is as if it had done a normal .connect and .send. The
> application doesn't know whether the connection has been made in TFO mode
> or normal mode and does not need to know.
>
> > Again, dealing with this, on multiple OSes, should be a lot easier for a
> > monolithic browser like Chrome or Firefox (which might, on some systems,
> > even use their own socket layer code), than for general purpose Python
> > socket and app code.
> >
> > So my conclusion is that this is (mostly) premature for Python at this
> > time. This is a slight performance enhancement of limited use that will
> > make code at least slightly more complex in a core module that must be
> > kept at least as rock solid as it is now. Let Google get it working on
> > both their servers and Chrome browser. And wait for Mozilla, say, to add
> > it to Firefox. Things might change before the first 3.4 beta, but I think
> > 3.5 is more likely. Of course, testing will require all 4 combinations of
> > client and server.
>
> We can introduce TFO only in some modules such as HTTP or FTP. The code is
> not really complex: for the server it is only a .setsockopt before
> .listen, and for the client we should replace the .connect and the first
> .send with a single .sendto or .sendmsg.
>
> On Jan 10, 2013, at 10:46 PM, Guido van Rossum:
> > Hopefully the OP has some sample Python code?
>
> Yes, it is practically the same as C; I attached examples (I needed to
> declare the TCP and MSG constants manually because my glibc doesn't have
> them yet).
>
> Federico Reghenzani
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas

For experimentation I added a patch to gunicorn in the `featire/tcp_fast`
branch: https://github.com/benoitc/gunicorn/pull/471

I expect to do the same in my restkit (http client lib) so I can test it
all together. So far this API can be interesting for internal purposes as
well.

- benoît
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From mal at egenix.com  Fri Jan 11 15:02:07 2013
From: mal at egenix.com (M.-A. Lemburg)
Date: Fri, 11 Jan 2013 15:02:07 +0100
Subject: [Python-ideas] TCP Fast Open protocol
In-Reply-To:
References: <20130110205541.GA1640@iskra.aviel.ru> <20130110211938.GB1640@iskra.aviel.ru>
Message-ID: <50F01B5F.5060807@egenix.com>

On 11.01.2013 03:45, Terry Reedy wrote:
> So my conclusion is that this is (mostly) premature for Python at this
> time. This is a slight performance enhancement of limited use that will
> make code at least slightly more complex in a core module that must be
> kept at least as rock solid as it is now. Let Google get it working on
> both their servers and Chrome browser. And wait for Mozilla, say, to add
> it to Firefox. Things might change before the first 3.4 beta, but I think
> 3.5 is more likely. Of course, testing will require all 4 combinations of
> client and server.

Agreed. I also wonder how this relates to HTTP pipelining, a feature to
improve the same multiple-requests-to-one-server situation. Pipelining has
been implemented for years both on clients and servers, yet it is still
turned off by default in e.g. Firefox:

http://en.wikipedia.org/wiki/HTTP_pipelining

There's also HTTP 2.0 on the horizon, so it may be better to wait and see
which of those technologies actually gets enough use in practice before
adding support to the Python library.

That said, it may be useful to have a PyPI package which implements the
FastOpen protocol in a separate socket implementation (which can then
monkey itself into the stdlib, if the application developer wants this).

--
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Jan 11 2013)
>>> Python Projects, Consulting and Support ...   http://www.egenix.com/
>>> mxODBC.Zope/Plone.Database.Adapter ...       http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
________________________________________________________________________
2013-01-22: Python Meeting Duesseldorf ...                 11 days to go

::::: Try our mxODBC.Connect Python Database Interface for free ! ::::::

   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
    D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
           Registered at Amtsgericht Duesseldorf: HRB 46611
               http://www.egenix.com/company/contact/

From rurpy at yahoo.com  Fri Jan 11 18:16:05 2013
From: rurpy at yahoo.com (rurpy at yahoo.com)
Date: Fri, 11 Jan 2013 09:16:05 -0800 (PST)
Subject: [Python-ideas] csv dialect enhancement
Message-ID: <0ee19141-547b-4182-880c-0d1b2a574af7@googlegroups.com>

There is a common dialect of CSV, often used in database applications
[*1], that distinguishes between an empty (quoted) string, e.g., the
second field in

    "abc","",3

and an empty field, e.g., the second field in

    "abc",,3

This distinction is needed to specify or tell the difference between
0-length strings and NULLs, when sending csv data to or receiving it from
a database application.

AFAICT, Python's csv module does not distinguish between empty fields and
empty quoted strings. Both of the examples above, when parsed by
csv.Reader, will return ['abc', '', 3] (or possibly '3' for the last item
depending on options). Similarly, csv.Writer produces the same output csv
text (nothing or a quoted empty string depending on Dialect.quoting) for
row items '' or None.

csv.Reader could distinguish between the above cases by using an empty
string ('') to report an empty (quoted) string field, and None to report
an empty field. Thus the second example would produce ['abc', None, 3]
(or ...,'3').
Similarly, csv.Writer could produce alternate text (nothing or a quoted
empty string) depending on whether a row item was None or an empty string.

I propose that a new dialect attribute be added, "nulls" [*2], which when
false (default) will cause csv to behave as it currently does. When true
it will have the following effect:

Reader:
    When two adjacent delimiters occur, or two white-space separated
    delimiters when Dialect.skipinitialspace is true, a value of None
    will be returned for that field.

Writer:
    When a None is present in the list of items being formatted, it will
    result in an empty output field (two adjacent delimiters) regardless
    of other options (e.g. a QUOTE_ALL setting).

Sniffer:
    Will set "nulls" to True when both adjacent delimiters and quoted
    empty strings are seen in the input text. (Perhaps this behaviour
    needs to be optional for backward compatibility reasons?)

I think this will allow the csv module to generate the csv dialect(s)
commonly used by database applications.

A specific use case: I am migrating data from a MS Access database to
Postgresql. I run a tool that extracts table data from Access and
correctly produces CSV files in the dialect used by Postgresql, with some
(nullable) column values having empty fields and other non-nullable
column values having empty string fields. But I need to modify some
values before import. So I write a Python program that parses the csv
data, modifies some of it and writes it back out, using the csv module.
But the result is that all empty fields and empty strings are written out
identically as one or the other (the distinction is not preserved). The
result is that information is lost and the output cannot be used. I would
be able to do this if the csv module provided a "nulls" option as
proposed above.

----
[*1] One of the two most important open-source databases, Postgresql,
uses this dialect. See:
http://www.postgresql.org/docs/9.2/interactive/sql-copy.html#AEN66692
I don't know about the other.
[*2] I don't really care what the attribute name is; I chose "nulls" as a
trial balloon because I wanted to avoid something with "none" in it to
avoid confusion with QUOTE_NONE.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From jimjjewett at gmail.com  Fri Jan 11 18:50:25 2013
From: jimjjewett at gmail.com (Jim Jewett)
Date: Fri, 11 Jan 2013 12:50:25 -0500
Subject: [Python-ideas] TCP Fast Open protocol
In-Reply-To:
References: <20130110205541.GA1640@iskra.aviel.ru> <20130110211938.GB1640@iskra.aviel.ru>
Message-ID:

On 1/11/13, Federico Reghenzani wrote:
> On Fri, Jan 11, 2013 at 3:45 AM, Terry Reedy wrote:
> Yes, the protocol has been designed for situations where there are
> multiple requests such as HTTP or FTP. Probably only in these cases is a
> default of 'True' appropriate.

What is the harm of using it in other situations? If the answer were truly
just "4 bytes per host", then it might still be a good tradeoff.

>> To be active, the feature must be compiled into the socket code of both
>> server and client machines AND must be explicitly requested by both
>> client and server applications.

This, however, is a problem.

Based on (most of) the rest of your descriptions, it sounds like a
seamless drop-in replacement; it should be an implementation detail that
applications never ever notice, like having a security patch applied to
the operating system when python isn't even running.

But if that were true, an explicit request would be overly cautious,
unless this were truly still so experimental that production servers
(and, thus, the python distribution in a default build) should not yet
use it.

Also note that if it isn't available on Windows (and probably even on
Windows XP without additional dependencies), Python can't yet rely on it.

Below, you also say that it is not appropriate for servers unless
syn+data is idempotent -- but I don't even know what that means without
looking it up, let alone whether it is true of my app -- so it sounds
like a bug magnet.

> The server knows the client connection type by the first packet the
> client sends: if the first packet coming from the client is a SYN+TFO
> cookie, the server proceeds to generate a cookie and continues with a
> FastOpen connection; if the first packet is a plain SYN, the server
> proceeds with a normal 3-way handshake connection. In any case these
> operations are transparent both to Python and to the application,
> because they're made by the kernel.

So how is this a python issue at all? Because of the explicit request?
Because of the need to keep something idempotent?

I see no harm in letting open accept and pass through additional optional
arguments, or in a generic way to query the kernel for its extensions,
but if you need something specific to this particular extension, then
please do it as an external package first.

>> On the client side, .connect and at least the first .send must be
>> combined into either .sendto or .sendmsg (which?, still to be decided,
>> apparently;-) with a new MSG_FASTOPEN argument. So programs need a
>> non-trivial rewrite.

Application programs, or just the plumbing in the httplib?

-jJ

From rurpy at yahoo.com  Fri Jan 11 18:51:09 2013
From: rurpy at yahoo.com (rurpy at yahoo.com)
Date: Fri, 11 Jan 2013 09:51:09 -0800 (PST)
Subject: [Python-ideas] csv dialect enhancement (repost)
In-Reply-To: <0ee19141-547b-4182-880c-0d1b2a574af7@googlegroups.com>
References: <0ee19141-547b-4182-880c-0d1b2a574af7@googlegroups.com>
Message-ID:

[Sorry for the duplicated text in the previous post, please ignore that
one in favor of this one]

There is a common dialect of CSV, often used in database applications
[*1], that distinguishes between an empty (quoted) string, e.g., the
second field in

    "abc","",3

and an empty field, e.g., the second field in

    "abc",,3

This distinction is needed to specify or tell the difference between
0-length strings and NULLs, when sending csv data to or receiving it from
a database application.

AFAICT, Python's csv module does not distinguish between empty fields and
empty quoted strings. Both of the examples above, when parsed by
csv.Reader, will return ['abc', '', 3] (or possibly '3' for the last item
depending on options). Similarly, csv.Writer produces the same output csv
text (nothing or a quoted empty string depending on Dialect.quoting) for
row items '' or None.

csv.Reader could distinguish between the above cases by using an empty
string ('') to report an empty (quoted) string field, and None to report
an empty field. Thus the second example would produce ['abc', None, 3]
(or ...,'3'). Similarly, csv.Writer could produce alternate text (nothing
or a quoted empty string) depending on whether a row item was None or an
empty string.

I propose that a new dialect attribute be added, "nulls" [*2], which when
false (default) will cause csv to behave as it currently does. When true
it will have the following effect:

Reader:
    When two adjacent delimiters occur, or two white-space separated
    delimiters when Dialect.skipinitialspace is true, a value of None
    will be returned for that field.

Writer:
    When a None is present in the list of items being formatted, it will
    result in an empty output field (two adjacent delimiters) regardless
    of other options (e.g. a QUOTE_ALL setting).

Sniffer:
    Will set "nulls" to True when both adjacent delimiters and quoted
    empty strings are seen in the input text. (Perhaps this behaviour
    needs to be optional for backward compatibility reasons?)

I think this will allow the csv module to generate the csv dialect(s)
required for database applications.

A specific use case: I am migrating data from a MS Access database to
Postgresql. I run a tool that extracts table data from Access and
produces CSV files in the dialect used by Postgresql, with some
(nullable) column values having empty fields and other non-nullable
column values having empty string fields. But I need to modify some
values before import. So I write a Python program that parses the csv
data, modifies some of it and writes it back out, using the csv module.
But the result is that all empty fields and empty strings are written out
identically as one or the other (the distinction is not preserved). The
result is that information is lost and the output cannot be used. I would
be able to do this if the csv module provided a "nulls" option as
proposed above.

----
[*1] One of the two most important open-source databases, Postgresql,
uses this dialect. See:
http://www.postgresql.org/docs/9.2/interactive/sql-copy.html#AEN66692
I don't know about the other.
[*2] I don't really care what the attribute name is; I chose "nulls" as a
trial balloon because I wanted to avoid something with "none" in it to
avoid confusion with QUOTE_NONE.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From ethan at stoneleaf.us  Fri Jan 11 18:49:27 2013
From: ethan at stoneleaf.us (Ethan Furman)
Date: Fri, 11 Jan 2013 09:49:27 -0800
Subject: [Python-ideas] csv dialect enhancement
In-Reply-To: <0ee19141-547b-4182-880c-0d1b2a574af7@googlegroups.com>
References: <0ee19141-547b-4182-880c-0d1b2a574af7@googlegroups.com>
Message-ID: <50F050A7.1010409@stoneleaf.us>

On 01/11/2013 09:16 AM, rurpy at yahoo.com wrote:
> I propose that a new dialect attribute be added, "nulls", which when
> false (default) will cause csv to behave as it currently does. When
> true it will have the following effect:
>
> Reader:
>     When two adjacent delimiters occur, or two white-space separated
>     delimiters when Dialect.skipinitialspace is true, a value of None
>     will be returned for that field.
>
> Writer:
>     When a None is present in the list of items being formatted, it
>     will result in an empty output field (two adjacent delimiters)
>     regardless of other options (e.g. a QUOTE_ALL setting).
>
> Sniffer:
>     Will set "nulls" to True when both adjacent delimiters and quoted
>     empty strings are seen in the input text. (Perhaps this behaviour
>     needs to be optional for backward compatibility reasons?)

+1

From yorik.sar at gmail.com  Fri Jan 11 21:03:51 2013
From: yorik.sar at gmail.com (Yuriy Taraday)
Date: Sat, 12 Jan 2013 00:03:51 +0400
Subject: [Python-ideas] TCP Fast Open protocol
In-Reply-To: <50F01B5F.5060807@egenix.com>
References: <20130110205541.GA1640@iskra.aviel.ru> <20130110211938.GB1640@iskra.aviel.ru> <50F01B5F.5060807@egenix.com>
Message-ID:

On Fri, Jan 11, 2013 at 6:02 PM, M.-A. Lemburg wrote:
> That said, it may be useful to have a PyPI package which implements the
> FastOpen protocol in a separate socket implementation (which can then
> monkey itself into the stdlib, if the application developer wants this).

TCP Fast Open should be supported in client code directly; it's not
enough to have socket() supporting it. It's not up to the socket()
implementation.

The server side is pretty simple, so to say "Python supports
TCP_FASTOPEN" there should be support implemented for each (or most) of
the client libraries in the stdlib, such as almost every module in
http://docs.python.org/3/library/internet.html

Monkey-patching all these modules (or their connect() parts) is not a
very clean way, I think.

--
Kind regards, Yuriy.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
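The kind of patch being debated here is straightforward for the server
side; a gevent-style sketch, assuming the Linux constant value, is below.
It also illustrates Yuriy's objection: the client side cannot be patched
this centrally, because every connect()-then-send() pair would have to be
rewritten into one sendto() call inside each client module.

    import socket

    def monkey_patch_tfo(qlen=16):
        """Request TFO on every listening socket, without code changes."""
        tcp_fastopen = getattr(socket, 'TCP_FASTOPEN', 23)  # Linux value
        real_listen = socket.socket.listen

        def listen(self, *args):
            try:
                self.setsockopt(socket.IPPROTO_TCP, tcp_fastopen, qlen)
            except OSError:
                pass  # old kernel or non-TCP socket: keep normal behaviour
            return real_listen(self, *args)

        socket.socket.listen = listen

A program (or a PyPI package of the kind suggested above) would call
monkey_patch_tfo() once at startup, before any server sockets are created.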
From mal at egenix.com  Fri Jan 11 23:12:21 2013
From: mal at egenix.com (M.-A. Lemburg)
Date: Fri, 11 Jan 2013 23:12:21 +0100
Subject: [Python-ideas] TCP Fast Open protocol
In-Reply-To:
References: <20130110205541.GA1640@iskra.aviel.ru> <20130110211938.GB1640@iskra.aviel.ru> <50F01B5F.5060807@egenix.com>
Message-ID: <50F08E45.5050602@egenix.com>

On 11.01.2013 21:03, Yuriy Taraday wrote:
> On Fri, Jan 11, 2013 at 6:02 PM, M.-A. Lemburg wrote:
>
>> That said, it may be useful to have a PyPI package which implements the
>> FastOpen protocol in a separate socket implementation (which can then
>> monkey itself into the stdlib, if the application developer wants this).
>
> TCP Fast Open should be supported in client code directly; it's not
> enough to have socket() supporting it. It's not up to the socket()
> implementation.

Right, the new methods would have to be used by the application.

> The server side is pretty simple, so to say "Python supports
> TCP_FASTOPEN" there should be support implemented for each (or most) of
> the client libraries in the stdlib, such as almost every module in
> http://docs.python.org/3/library/internet.html
>
> Monkey-patching all these modules (or their connect() parts) is not a
> very clean way, I think.

Of course not, but it's a viable way to test drive such an implementation
before putting the code directly into the stdlib modules. gevent uses the
same approach, BTW.

--
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Jan 11 2013)
>>> Python Projects, Consulting and Support ...   http://www.egenix.com/
>>> mxODBC.Zope/Plone.Database.Adapter ...       http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
________________________________________________________________________
2013-01-22: Python Meeting Duesseldorf ...                 11 days to go

::::: Try our mxODBC.Connect Python Database Interface for free ! ::::::

   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
    D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
           Registered at Amtsgericht Duesseldorf: HRB 46611
               http://www.egenix.com/company/contact/

From guido at python.org  Fri Jan 11 23:23:00 2013
From: guido at python.org (Guido van Rossum)
Date: Fri, 11 Jan 2013 14:23:00 -0800
Subject: [Python-ideas] TCP Fast Open protocol
In-Reply-To: <50F08E45.5050602@egenix.com>
References: <20130110205541.GA1640@iskra.aviel.ru> <20130110211938.GB1640@iskra.aviel.ru> <50F01B5F.5060807@egenix.com> <50F08E45.5050602@egenix.com>
Message-ID:

So, again. Has *anyone* actually written *any* working Python code for
this?

--
--Guido van Rossum (python.org/~guido)

From guido at python.org  Sat Jan 12 00:41:05 2013
From: guido at python.org (Guido van Rossum)
Date: Fri, 11 Jan 2013 15:41:05 -0800
Subject: [Python-ideas] Tulip / PEP 3156 event loop implementation question: CPU vs. I/O starvation
Message-ID:

Here's an interesting puzzle. Check out the core of Tulip's event loop:
http://code.google.com/p/tulip/source/browse/tulip/unix_events.py#672

Specifically, it does something like this:

1. poll for I/O, appending any ready handlers to the _ready queue

2. append any handlers scheduled for a time <= now to the _ready queue

3. while _ready:
       handler = _ready.popleft()
       call handler

It is the latter loop that causes me some concern. In theory it is
possible for a bad callback to make this loop never finish, as follows:

def hogger():
    tulip.get_event_loop().call_soon(hogger)

Because call_soon() appends the handler to the _ready queue, the while
loop will never finish.
There is a simple enough solution (Tornado uses this AFAIK):

now_ready = list(_ready)
_ready.clear()
for handler in now_ready:
    call handler

However this implies that we go back to the I/O polling code more
frequently. While the I/O polling code sets the timeout to zero when
there's anything in the _ready queue, so it won't block, it still isn't
free; it's an expensive system call that we'd like to put off until we
have nothing better to do.

I can imagine various patterns where handlers append other handlers to
the _ready queue for immediate execution, and I'd like to make such
patterns efficient (i.e. the user shouldn't have to worry about the cost
of the I/O poll compared to the amount of work appended to the _ready
queue).

It is also convenient to say that a hogger that really wants to hog the
CPU can do so anyway, e.g.:

def hogger():
    while True:
        pass

However this would pretty much assume malice; real-life versions of the
former hogger pattern may be spread across many callbacks and could be
hard to recognize or anticipate.

So what's more important? Avoid I/O starvation at all cost or make the
callbacks-posting-callbacks pattern efficient? I can see several outcomes
of this discussion: we could end up deciding that one or the other
strategy is always best; we could also leave it up to the implementation
(but then I still would want guidance for what to do in Tulip); we could
even decide this is so important that the user needs to be able to
control the policy here (though I hate having many configuration options,
since in practice few people bother to take control, and you might as
well have hard-coded the default...).

Thoughts? Do I need to explain it better?

--
--Guido van Rossum (python.org/~guido)
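For comparison, the two strategies look like this side by side -- a
sketch only, with poll_io() standing in for the selector call (a name
invented here, not Tulip's):

    import collections

    _ready = collections.deque()

    def poll_io(timeout):
        """Stand-in for the selector/epoll call (invented for this sketch)."""
        # A real loop would block here for up to `timeout` seconds and
        # append I/O callbacks to _ready; timeout=0 means just poll.

    def run_once_drain_all():
        # Strategy 1: keep going until the queue is empty. Efficient for
        # chains of call_soon() callbacks, but a callback that re-posts
        # itself starves the I/O poll forever.
        poll_io(timeout=0 if _ready else None)
        while _ready:
            handler = _ready.popleft()
            handler()

    def run_once_snapshot():
        # Strategy 2 (Tornado-style): freeze the batch size up front, so
        # anything posted while running waits for the next iteration and
        # the I/O poll runs between batches.
        poll_io(timeout=0 if _ready else None)
        ntodo = len(_ready)
        for _ in range(ntodo):
            handler = _ready.popleft()
            handler()

The counter variant in the second function gets the batching effect of
Tornado's list(_ready) snapshot without building an intermediate list.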
It must be explicitly requested by the server because the behavior might change; in particular the lwn.net page about this feature mentions that duplicate SYN messages are not detected, and if I parse that page correctly that might mean that the server gets two or more requests when the connection is unreliable (or slow) and retransmission happens. That is fine for static webpages, but not if the client request has side effects (e.g. the server starts updating a database). BTW, this (Linux-only) feature is very new; it would IMHO be useful to use it in real life via a package on PyPI that monkeypatches the stdlib before adding the feature to the stdlib itself. It is currently not clear if the option will be useful in the long run. Ronald From zuo at chopin.edu.pl Sat Jan 12 02:28:25 2013 From: zuo at chopin.edu.pl (Jan Kaliszewski) Date: Sat, 12 Jan 2013 02:28:25 +0100 Subject: [Python-ideas] Tulip / PEP 3156 event loop implementation question: CPU vs. I/O starvation In-Reply-To: References: Message-ID: 12.01.2013 00:41, Guido van Rossum wrote: > Here's an interesting puzzle. Check out the core of Tulip's event > loop: > http://code.google.com/p/tulip/source/browse/tulip/unix_events.py#672 > > Specifically this does something like this: > > 1. poll for I/O, appending any ready handlers to the _ready queue > > 2. append any handlers scheduled for a time <= now to the _ready > queue > > 3. while _ready: > handler = _ready.popleft() > call handler > > It is the latter loop that causes me some concern. In theory it is > possible for a bad callback to make this loop never finish, as > follows: > > def hogger(): > tulip.get_event_loop().call_soon(hogger) > > Because call_soon() appends the handler to the _ready queue, the > while > loop will never finish. > > There is a simple enough solution (Tornado uses this AFAIK): > > now_ready = list(_ready) > _ready.clear() > for handler in now_ready: > call handler > > However this implies that we go back to the I/O polling code more > frequently. While the I/O polling code sets the timeout to zero when > there's anything in the _ready queue, so it won't block, it still > isn't free; it's an expensive system call that we'd like to put off > until we have nothing better to do. [...] > So what's more important? Avoid I/O starvation at all cost or make > the > callbacks-posting-callbacks pattern efficient? I can see several > outcomes of this discussion: we could end up deciding that one or the > other strategy is always best; we could also leave it up to the > implementation (but then I still would want guidance for what to do > in > Tulip); we could even decide this is so important that the user needs > to be able to control the policy here [...] Maybe it could be, at least for the standard Tulip implementation, parameterizable with a simple integer value -- the suggested max number of loop iterations? E.g. something like the following: # `suggested_iter_limit` is the parameter actual_limit = max(len(_ready), suggested_iter_limit) for i in range(actual_limit): if not _ready: break handler = _ready.popleft() call handler... Regards. *j From ncoghlan at gmail.com Sat Jan 12 04:08:14 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 12 Jan 2013 13:08:14 +1000 Subject: [Python-ideas] Tulip / PEP 3156 event loop implementation question: CPU vs. I/O starvation In-Reply-To: References: Message-ID: On Sat, Jan 12, 2013 at 9:41 AM, Guido van Rossum wrote: > So what's more important? 
Avoid I/O starvation at all cost or make the > callbacks-posting-callbacks pattern efficient? I can see several > outcomes of this discussion: we could end up deciding that one or the > other strategy is always best; we could also leave it up to the > implementation (but then I still would want guidance for what to do in > Tulip); we could even decide this is so important that the user needs > to be able to control the policy here (though I hate having many > configuration options, since in practice few people bother to take > control, and you might as well have hard-coded the default...). > > Thoughts? Do I need to explain it better? Given the availability of "yield from" as a tool for efficiently invoking other asynchronous operations without hitting the event loop at all, it seems to me that it is more appropriate to avoid IO starvation by interleaving IO event processing and ready callback processing. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Sat Jan 12 04:20:40 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 12 Jan 2013 13:20:40 +1000 Subject: [Python-ideas] Tulip / PEP 3156 event loop implementation question: CPU vs. I/O starvation In-Reply-To: References: Message-ID: On Sat, Jan 12, 2013 at 1:08 PM, Nick Coghlan wrote: > On Sat, Jan 12, 2013 at 9:41 AM, Guido van Rossum wrote: >> So what's more important? Avoid I/O starvation at all cost or make the >> callbacks-posting-callbacks pattern efficient? I can see several >> outcomes of this discussion: we could end up deciding that one or the >> other strategy is always best; we could also leave it up to the >> implementation (but then I still would want guidance for what to do in >> Tulip); we could even decide this is so important that the user needs >> to be able to control the policy here (though I hate having many >> configuration options, since in practice few people bother to take >> control, and you might as well have hard-coded the default...). >> >> Thoughts? Do I need to explain it better? > > Given the availability of "yield from" as a tool for efficiently > invoking other asynchronous operations without hitting the event loop > at all, it seems to me that it is more appropriate to avoid IO > starvation by interleaving IO event processing and ready callback > processing. Oops, I meant to include a link to http://bugs.python.org/issue7946, which is about the convoy effect created by the GIL implementation when I/O bound threads are processed in the presence of a CPU bound thread (essentially, the I/O latency increases to the GIL check interval). (The thing that changed in 3.2 is that the magnitude of the convoy effect is now independent of the work-per-bytecode in the CPU bound thread) That's what makes me think always alternating between processing ready callbacks and checking for IO events is the right thing to do. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From robertc at robertcollins.net Sat Jan 12 06:06:22 2013 From: robertc at robertcollins.net (Robert Collins) Date: Sat, 12 Jan 2013 18:06:22 +1300 Subject: [Python-ideas] Tulip / PEP 3156 event loop implementation question: CPU vs. I/O starvation In-Reply-To: References: Message-ID: On 12 January 2013 12:41, Guido van Rossum wrote: > Here's an interesting puzzle. 
Check out the core of Tulip's event > loop: http://code.google.com/p/tulip/source/browse/tulip/unix_events.py#672 > now_ready = list(_ready) > _ready.clear() > for handler in now_ready: > call handler > > However this implies that we go back to the I/O polling code more > frequently. While the I/O polling code sets the timeout to zero when > there's anything in the _ready queue, so it won't block, it still > isn't free; it's an expensive system call that we'd like to put off > until we have nothing better to do. How expensive is it really? If it's select, it's terrible, but we shouldn't be using that anywhere. If it's poll() it is moderately expensive, but still it doesn't scale - it's linear with fd's. If it's IO Completion ports in Windows, it is approximately free - the OS calls back into us every time we tell it we're ready for more events. And if it's epoll it is also basically free, reading off of an event queue rather than checking every entry in the array. kqueue has similar efficiency, for BSD systems. I'd want to see some actual numbers before assuming that the call into epoll or completion is actually a driving factor in latency here. -Rob From stefan_ml at behnel.de Sat Jan 12 07:39:42 2013 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sat, 12 Jan 2013 07:39:42 +0100 Subject: [Python-ideas] Tulip / PEP 3156 event loop implementation question: CPU vs. I/O starvation In-Reply-To: References: Message-ID: Jan Kaliszewski, 12.01.2013 02:28: > 12.01.2013 00:41, Guido van Rossum wrote: > >> Here's an interesting puzzle. Check out the core of Tulip's event >> loop: http://code.google.com/p/tulip/source/browse/tulip/unix_events.py#672 >> >> Specifically this does something like this: >> >> 1. poll for I/O, appending any ready handlers to the _ready queue >> >> 2. append any handlers scheduled for a time <= now to the _ready queue >> >> 3. while _ready: >> handler = _ready.popleft() >> call handler >> >> It is the latter loop that causes me some concern. In theory it is >> possible for a bad callback to make this loop never finish, as >> follows: >> >> def hogger(): >> tulip.get_event_loop().call_soon(hogger) >> >> Because call_soon() appends the handler to the _ready queue, the while >> loop will never finish. >> >> There is a simple enough solution (Tornado uses this AFAIK): >> >> now_ready = list(_ready) >> _ready.clear() >> for handler in now_ready: >> call handler >> >> However this implies that we go back to the I/O polling code more >> frequently. While the I/O polling code sets the timeout to zero when >> there's anything in the _ready queue, so it won't block, it still >> isn't free; it's an expensive system call that we'd like to put off >> until we have nothing better to do. > [...] >> So what's more important? Avoid I/O starvation at all cost or make the >> callbacks-posting-callbacks pattern efficient? I can see several >> outcomes of this discussion: we could end up deciding that one or the >> other strategy is always best; we could also leave it up to the >> implementation (but then I still would want guidance for what to do in >> Tulip); we could even decide this is so important that the user needs >> to be able to control the policy here > [...] > > Maybe it could be, at least for the standard Tulip implementation, > parameterizable with a simple integer value -- the suggested max number > of loop iterations? > > E.g. 
something like the following: > > # `suggested_iter_limit` is the parameter > actual_limit = max(len(_ready), suggested_iter_limit) > for i in range(actual_limit): > if not _ready: > break > handler = _ready.popleft() > call handler... Yep, it could simply use itertools.islice() when iterating over _ready with an appropriate upper bound factor relative to the actual length, and then cut down the list after the loop. So it would never go, say, 50% over the initially anticipated workload. Or rather a fixed number, I guess, to make it more predictable for users. That would be a user configurable parameter to the I/O loop. actual_limit = len(_ready) + max_additional_load_per_loop for handler in itertools.islice(_ready, None, actual_limit): call handler... del _ready[:actual_limit] Stefan From ncoghlan at gmail.com Sat Jan 12 10:53:27 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 12 Jan 2013 19:53:27 +1000 Subject: [Python-ideas] Tulip / PEP 3156 event loop implementation question: CPU vs. I/O starvation In-Reply-To: References: Message-ID: On Sat, Jan 12, 2013 at 4:39 PM, Stefan Behnel wrote: > Yep, it could simply use itertools.islice() when iterating over _ready with > an appropriate upper bound factor relative to the actual length, and then > cut down the list after the loop. So it would never go, say, 50% over the > initially anticipated workload. Or rather a fixed number, I guess, to make > it more predictable for users. That would be a user configurable parameter > to the I/O loop. > > actual_limit = len(_ready) + max_additional_load_per_loop > for handler in itertools.islice(_ready, None, actual_limit): > call handler... > del _ready[:actual_limit] But do we need that in the reference loop? It seems like additional complexity when it has yet to be demonstrated that the simple solution of alternating processing of call_soon registrations with IO callbacks is inadequate. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From solipsis at pitrou.net Sat Jan 12 11:38:30 2013 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sat, 12 Jan 2013 11:38:30 +0100 Subject: [Python-ideas] Tulip / PEP 3156 event loop implementation question: CPU vs. I/O starvation References: Message-ID: <20130112113830.5b374f1b@pitrou.net> On Sat, 12 Jan 2013 19:53:27 +1000 Nick Coghlan wrote: > On Sat, Jan 12, 2013 at 4:39 PM, Stefan Behnel wrote: > > Yep, it could simply use itertools.islice() when iterating over _ready with > > an appropriate upper bound factor relative to the actual length, and then > > cut down the list after the loop. So it would never go, say, 50% over the > > initially anticipated workload. Or rather a fixed number, I guess, to make > > it more predictable for users. That would be a user configurable parameter > > to the I/O loop. > > > > actual_limit = len(_ready) + max_additional_load_per_loop > > for handler in itertools.islice(_ready, None, actual_limit): > > call handler... > > del _ready[:actual_limit] > > But do we need that in the reference loop? It seems like additional > complexity when it has yet to be demonstrated that the simple solution > of alternating processing of call_soon registrations with IO callbacks > is inadequate. Why do you talk about "reference loop"? It should be usable in production, not some kind of demonstration system that people will have to replace with a third-party library to get decent results. Regards Antoine. 
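Pulling the last two suggestions together, a runnable sketch of the bounded-batch loop body (my own rough code, not from the Tulip repo; _ready is assumed to be a collections.deque as in Tulip, and max_additional_load_per_loop is the user-configurable knob Stefan mentions):

    import collections

    _ready = collections.deque()
    max_additional_load_per_loop = 100   # assumed configuration parameter

    def run_ready_batch():
        # Run everything that is ready now, plus at most a fixed number of
        # handlers appended (via call_soon) while this batch is running.
        limit = len(_ready) + max_additional_load_per_loop
        for _ in range(limit):
            if not _ready:
                break
            handler = _ready.popleft()
            handler()
        # Anything beyond the limit stays queued and runs on the next
        # iteration, after the I/O poll has had a chance to run.

Using popleft() in a bounded loop also sidesteps the question of slicing a deque that callbacks are mutating while the batch runs.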
From ubershmekel at gmail.com Sat Jan 12 12:03:26 2013 From: ubershmekel at gmail.com (Yuval Greenfield) Date: Sat, 12 Jan 2013 13:03:26 +0200 Subject: [Python-ideas] Tulip / PEP 3156 event loop implementation question: CPU vs. I/O starvation In-Reply-To: References: Message-ID: On Sat, Jan 12, 2013 at 1:41 AM, Guido van Rossum wrote: > [...] def hogger(): > tulip.get_event_loop().call_soon(hogger) > > Because call_soon() appends the handler to the _ready queue, the while > loop will never finish. > > [...] > However this implies that we go back to the I/O polling code more > frequently. While the I/O polling code sets the timeout to zero when > there's anything in the _ready queue, so it won't block, it still > isn't free; it's an expensive system call that we'd like to put off > until we have nothing better to do. > > I can imagine various patterns where handlers append other handlers to > the _ready queue for immediate execution, and I'd make such patterns > efficient (i.e. the user shouldn't have to worry about the cost of the > I/O poll compared to the amount of work appended to the _ready queue). > > I read your statements as: * I don't want the user to cause IO starvation * I want the user to cause IO starvation Which means you have two options: * Make an opinionated decision that won't be perfect for everyone (not as bad as it sounds) * Allow configurability IMO core event loops need this configurability but not on a daily basis, e.g. Windows XP's event loop gave priority to the foreground process (i.e. UI events) and Windows Server 2003 gave priority to background processes. e.g. (warning unoptimized pseudocode follows) while True: for i in range(io_weight): pop_io() for i in range(event_weight): pop_ready() # note one of the weights can be zero -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Sat Jan 12 12:44:36 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 12 Jan 2013 21:44:36 +1000 Subject: [Python-ideas] Tulip / PEP 3156 event loop implementation question: CPU vs. I/O starvation In-Reply-To: <20130112113830.5b374f1b@pitrou.net> References: <20130112113830.5b374f1b@pitrou.net> Message-ID: On Sat, Jan 12, 2013 at 8:38 PM, Antoine Pitrou wrote: >> But do we need that in the reference loop? It seems like additional >> complexity when it has yet to be demonstrated that the simple solution >> of alternating processing of call_soon registrations with IO callbacks >> is inadequate. > > Why do you talk about "reference loop"? It should be usable in > production, not some kind of demonstration system that people will > have to replace with a third-party library to get decent results. I mean "reference loop" in the same sense that CPython is the "reference interpreter". You can get a lot more cool stuff by upgrading to, e.g. IPython, as your interactive interpreter, or using an interactive debugger other than pdb, but that doesn't mean all of those enhancements should be folded back into the core. In this case, we have a feature where there is a reasonable default behaviour (i.e. alternating between processing ready calls and checking for triggering of IO callbacks as Guido suggested), and no compelling evidence to justify a more complex solution. Ergo, the reference loop should use the simple approach, until such evidence is provided. Cheers, Nick. 
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From nepenthesdev at gmail.com Sat Jan 12 14:01:40 2013 From: nepenthesdev at gmail.com (Markus) Date: Sat, 12 Jan 2013 14:01:40 +0100 Subject: [Python-ideas] Tulip / PEP 3156 event loop implementation question: CPU vs. I/O starvation In-Reply-To: References: Message-ID: Hi, On Sat, Jan 12, 2013 at 12:41 AM, Guido van Rossum wrote: > def hogger(): > tulip.get_event_loop().call_soon(hogger) > > Because call_soon() appends the handler to the _ready queue, the while > loop will never finish. Adding a poll-able descriptor to the loop will eval it in the next iteration of the loop, so why make a difference with timers? Define call_soon to be called in the next iteration - not in the same. Basically every modification of the event loop should be evaluated in the next iteration, not the same. MfG Markus From ncoghlan at gmail.com Sat Jan 12 15:55:53 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 13 Jan 2013 00:55:53 +1000 Subject: [Python-ideas] Windows assistance for PEP 432 (CPython startup sequence) Message-ID: I've started work on the PEP 432 implementation at https://bitbucket.org/ncoghlan/cpython_sandbox/compare/pep432_modular_bootstrap..default#commits As part of that work, I'm also cleaning up some of the crazier things in the source tree layout, like "pythonrun" being this gigantic monolith covering interpreter initialisation, code execution and interpreter shutdown all in one file, as well as the source files for the application binaries being mixed in with the source files for standard library builtin and extension modules. This means I know I'm breaking the Windows builds. Rather than leaving that until the end, I'm looking for someone that's willing to take the changes from the "pep432_modular_bootstrap" in my sandbox repo, check what is needed to get them building on Windows, and then send me pull requests on BitBucket to fix them. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From dustin at v.igoro.us Sat Jan 12 16:03:40 2013 From: dustin at v.igoro.us (Dustin J. Mitchell) Date: Sat, 12 Jan 2013 10:03:40 -0500 Subject: [Python-ideas] [Python-Dev] PEP 3156 - Asynchronous IO Support Rebooted In-Reply-To: References: Message-ID: On Wed, Jan 9, 2013 at 12:14 AM, Guido van Rossum wrote: > But which half? A socket is two independent streams, one in each > direction. Twisted uses half_close() for this concept but unless you > already know what this is for you are left wondering which half. Which > is why I like using 'write' in the name. FWIW, "half-closed" is, IMHO, a well-known term. It's not just a Twisted thing. Either name is better than "shutdown"! Dustin From dustin at v.igoro.us Sat Jan 12 18:08:13 2013 From: dustin at v.igoro.us (Dustin J. Mitchell) Date: Sat, 12 Jan 2013 12:08:13 -0500 Subject: [Python-ideas] Tulip / PEP 3156 event loop implementation question: CPU vs. I/O starvation In-Reply-To: References: Message-ID: On Sat, Jan 12, 2013 at 8:01 AM, Markus wrote: > Adding a poll-able descriptor to the loop will eval it in the next > iteration of the loop, so why make a difference with timers? > Define call_soon to be called in the next iteration - not in the same. > > Basically every modification of the event loop should be evaluated in > the next iteration, not the same. We're looking for a "fair" scheduling algorithm here, and I think this describes it. Everything else should be an optimization from here. 
For example, if the event loop "detects" that it is CPU-bound, perhaps it skips some fraction of the relatively expensive calls to the IO check. I have no idea how to do such "detection" efficiently. Maybe just count the number of consecutive IO checks with timeout=0 that returned no actionable IO, and skip that number of checks.

run_ready_queue()
check_io(timeout=0) -> no IO, counter becomes 1
run_ready_queue() (skip 1)
run_ready_queue()
check_io(timeout=0) -> no IO, counter becomes 2
run_ready_queue() (skip 1)
run_ready_queue() (skip 2)
run_ready_queue()
check_io(timeout=0)
...

There would be some limits, of course, and this should probably be based on time, not cycles. It's not clear what to do when IO *does* occur -- divide the counter by 2? At any rate, this is an O(1) change to the event loop that would get some interesting adaptive behavior, while still maintaining its fairness. IMHO the PEP should leave this unspecified, perhaps suggesting only that event loops have clear documentation regarding their fairness. Then users can select event loops based on their needs. Dustin From ben at bendarnell.com Sat Jan 12 18:18:45 2013 From: ben at bendarnell.com (Ben Darnell) Date: Sat, 12 Jan 2013 12:18:45 -0500 Subject: [Python-ideas] Tulip / PEP 3156 event loop implementation question: CPU vs. I/O starvation In-Reply-To: References: Message-ID: On Fri, Jan 11, 2013 at 6:41 PM, Guido van Rossum wrote: > Here's an interesting puzzle. Check out the core of Tulip's event > loop: > http://code.google.com/p/tulip/source/browse/tulip/unix_events.py#672 > > Specifically this does something like this: > > 1. poll for I/O, appending any ready handlers to the _ready queue > > 2. append any handlers scheduled for a time <= now to the _ready queue > > 3. while _ready: > handler = _ready.popleft() > call handler > > It is the latter loop that causes me some concern. In theory it is > possible for a bad callback to make this loop never finish, as > follows: > > def hogger(): > tulip.get_event_loop().call_soon(hogger) > This is actually a useful pattern, not just a pathological "bad callback". If the function does some work before re-adding itself, it allows for better multitasking, kind of like doing the work in another thread (with a starvation-free event loop). If the event loop starves IO in this case it's difficult to get this kind of non-blocking background execution (you have to use call_later with a non-zero timeout, slowing the work down unnecessarily). 
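A sketch of that cooperative pattern (hypothetical code against the draft Tulip API as described in this thread; process_one() is an invented stand-in for whatever per-item work the application does):

    import tulip

    def process_one(item):
        pass                        # hypothetical per-item work

    def background_worker(items, chunk_size=100):
        # Do a bounded chunk of work, then reschedule ourselves so the
        # event loop can interleave I/O callbacks between chunks.
        for _ in range(chunk_size):
            try:
                item = next(items)
            except StopIteration:
                return              # done; stop rescheduling
            process_one(item)
        tulip.get_event_loop().call_soon(background_worker, items, chunk_size)

With the starvation-free loop this degrades gracefully; with the drain-until-empty loop a small chunk_size is the only protection against hogging.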
I can see several > outcomes of this discussion: we could end up deciding that one or the > other strategy is always best; we could also leave it up to the > implementation (but then I still would want guidance for what to do in > Tulip); we could even decide this is so important that the user needs > to be able to control the policy here (though I hate having many > configuration options, since in practice few people bother to take > control, and you might as well have hard-coded the default...). > I'm not sure it's worth the complexity to offer both, so I'd be inclined to just have the starvation-free version. -Ben > > Thoughts? Do I need to explain it better? > > -- > --Guido van Rossum (python.org/~guido) > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -------------- next part -------------- An HTML attachment was scrubbed... URL: From federico.dev at reghe.net Sat Jan 12 18:20:32 2013 From: federico.dev at reghe.net (Federico Reghenzani) Date: Sat, 12 Jan 2013 18:20:32 +0100 Subject: [Python-ideas] csv dialect enhancement (repost) In-Reply-To: References: <0ee19141-547b-4182-880c-0d1b2a574af7@googlegroups.com> Message-ID: On Fri, Jan 11, 2013 at 6:51 PM, rurpy at yahoo.com wrote: > [Sorry for the duplicated text in the previous post, please ignore > that one in favor of this one] > > There is a common dialect of CSV, often used in database > applications [*1], that distinguishes between an empty > (quoted) string, How many DBMS have this dialect? e.g. MySQL want \N for null values, in other databases this is not even possible. Anyway I think that should be implemented because it may have different uses. > > [*2] I don't really care what the attribute name is; I chose > "nulls" as a trial balloon because I wanted to avoid something > with "none" in it to avoid confusion with QUOTE_NONE. > +1 -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Sat Jan 12 19:29:30 2013 From: guido at python.org (Guido van Rossum) Date: Sat, 12 Jan 2013 10:29:30 -0800 Subject: [Python-ideas] Tulip / PEP 3156 event loop implementation question: CPU vs. I/O starvation In-Reply-To: References: Message-ID: Thanks all! It is clear what to do now. Run all those handlers that are currently ready but not those added during this run. -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From d.s at daniel.shahaf.name Sun Jan 13 01:25:04 2013 From: d.s at daniel.shahaf.name (Daniel Shahaf) Date: Sun, 13 Jan 2013 02:25:04 +0200 Subject: [Python-ideas] Updated PEP 432: Simplifying the CPython update sequence In-Reply-To: References: Message-ID: <20130113002504.GT2956@lp-shahaf.local> Quick question, do you plan to expose the C argv values as part of this work? Issue #14208 asks for the full C argv array; my use-case today required only the C argv[0]. Both of the use-cases had to do with having a script reexecute itself (eg, 'os.execl(sys.executable, *args)'). I see a 'raw_argv' in the config struct, but I'm not sure if it'll be accessible to Python code. Cheers, Daniel Nick Coghlan wrote on Wed, Jan 02, 2013 at 21:40:26 +1000: > Configuring ``sys.argv`` > ------------------------ > > Unlike most other settings discussed in this PEP, ``sys.argv`` is not > set implicitly by ``Py_Initialize()``. Instead, it must be set via an > explicitly call to ``Py_SetArgv()``. 
> > CPython calls this in ``Py_Main()`` after calling ``Py_Initialize()``. The > calculation of ``sys.argv[1:]`` is straightforward: they're the command line > arguments passed after the script name or the argument to the ``-c`` or > ``-m`` options. > > The calculation of ``sys.argv[0]`` is a little more complicated: > > * For an ordinary script (source or bytecode), it will be the script name > * For a ``sys.path`` entry (typically a zipfile or directory) it will > initially be the zipfile or directory name, but will later be changed by > the ``runpy`` module to the full path to the imported ``__main__`` module. > * For a module specified with the ``-m`` switch, it will initially be the > string ``"-m"``, but will later be changed by the ``runpy`` module to the > full path to the executed module. > * For a package specified with the ``-m`` switch, it will initially be the > string ``"-m"``, but will later be changed by the ``runpy`` module to the > full path to the executed ``__main__`` submodule of the package. > * For a command executed with ``-c``, it will be the string ``"-c"`` > * For explicitly requested input from stdin, it will be the string ``"-"`` > * Otherwise, it will be the empty string > > Embedding applications must call Py_SetArgv themselves. The CPython logic > for doing so is part of ``Py_Main()`` and is not exposed separately. > However, the ``runpy`` module does provide roughly equivalent logic in > ``runpy.run_module`` and ``runpy.run_path``. > > > > Supported configuration settings > -------------------------------- > > The new ``Py_Config`` struct holds the settings required to complete the > interpreter configuration. All fields are either pointers to Python > data types (not set == ``NULL``) or numeric flags (not set == ``-1``):: > > /* Note: if changing anything in Py_Config, also update Py_Config_INIT */ > typedef struct { > /* Argument processing */ > PyList *raw_argv; > PyList *argv; > PyList *warnoptions; /* -W switch, PYTHONWARNINGS */ > PyDict *xoptions; /* -X switch */ > > /* Filesystem locations */ > PyUnicode *program_name; > PyUnicode *executable; > PyUnicode *prefix; /* PYTHONHOME */ > PyUnicode *exec_prefix; /* PYTHONHOME */ > PyUnicode *base_prefix; /* pyvenv.cfg */ > PyUnicode *base_exec_prefix; /* pyvenv.cfg */ > > /* Site module */ > int no_site; /* -S switch */ > int no_user_site; /* -s switch, PYTHONNOUSERSITE */ > > /* Import configuration */ > int dont_write_bytecode; /* -B switch, PYTHONDONTWRITEBYTECODE */ > int ignore_module_case; /* PYTHONCASEOK */ > PyList *import_path; /* PYTHONPATH (etc) */ > > /* Standard streams */ > int use_unbuffered_io; /* -u switch, PYTHONUNBUFFEREDIO */ > PyUnicode *stdin_encoding; /* PYTHONIOENCODING */ > PyUnicode *stdin_errors; /* PYTHONIOENCODING */ > PyUnicode *stdout_encoding; /* PYTHONIOENCODING */ > PyUnicode *stdout_errors; /* PYTHONIOENCODING */ > PyUnicode *stderr_encoding; /* PYTHONIOENCODING */ > PyUnicode *stderr_errors; /* PYTHONIOENCODING */ > > /* Filesystem access */ > PyUnicode *fs_encoding; > > /* Interactive interpreter */ > int stdin_is_interactive; /* Force interactive behaviour */ > int inspect_main; /* -i switch, PYTHONINSPECT */ > PyUnicode *startup_file; /* PYTHONSTARTUP */ > > /* Debugging output */ > int debug_parser; /* -d switch, PYTHONDEBUG */ > int verbosity; /* -v switch */ > int suppress_banner; /* -q switch */ > > /* Code generation */ > int bytes_warnings; /* -b switch */ > int optimize; /* -O switch */ > > /* Signal handling */ > int install_sig_handlers; > } Py_Config; From 
victor.stinner at gmail.com Sun Jan 13 01:43:55 2013 From: victor.stinner at gmail.com (Victor Stinner) Date: Sun, 13 Jan 2013 01:43:55 +0100 Subject: [Python-ideas] Updated PEP 432: Simplifying the CPython update sequence In-Reply-To: <20130113002504.GT2956@lp-shahaf.local> References: <20130113002504.GT2956@lp-shahaf.local> Message-ID: 2013/1/13 Daniel Shahaf : > Quick question, do you plan to expose the C argv values as part of this > work? > > Issue #14208 asks for the full C argv array; my use-case today required > only the C argv[0]. Both of the use-cases had to do with having > a script reexecute itself (eg, 'os.execl(sys.executable, *args)'). I don't remember where, but it has already been asked how it is possible to recreate the Python command line, in order to create a subprocess with the same command line options. For example: $ python -O -c "import sys; print(sys.argv)" ['-c'] How do you get ['python', '-O']? I guess that the question was about multiprocessing on Windows (which does not support fork). Victor From ncoghlan at gmail.com Sun Jan 13 02:56:53 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 13 Jan 2013 11:56:53 +1000 Subject: [Python-ideas] Updated PEP 432: Simplifying the CPython update sequence In-Reply-To: <20130113002504.GT2956@lp-shahaf.local> References: <20130113002504.GT2956@lp-shahaf.local> Message-ID: It will be accessible. Currently planned spelling: sys._configuration.raw_argv Cheers, Nick. -- Sent from my phone, thus the relative brevity :) -------------- next part -------------- An HTML attachment was scrubbed... URL: From brian at python.org Sun Jan 13 02:59:14 2013 From: brian at python.org (Brian Curtin) Date: Sat, 12 Jan 2013 19:59:14 -0600 Subject: [Python-ideas] Windows assistance for PEP 432 (CPython startup sequence) In-Reply-To: References: Message-ID: On Sat, Jan 12, 2013 at 8:55 AM, Nick Coghlan wrote: > I've started work on the PEP 432 implementation at > https://bitbucket.org/ncoghlan/cpython_sandbox/compare/pep432_modular_bootstrap..default#commits > > As part of that work, I'm also cleaning up some of the crazier things > in the source tree layout, like "pythonrun" being this gigantic > monolith covering interpreter initialisation, code execution and > interpreter shutdown all in one file, as well as the source files for > the application binaries being mixed in with the source files for > standard library builtin and extension modules. > > This means I know I'm breaking the Windows builds. Rather than leaving > that until the end, I'm looking for someone that's willing to take the > changes from the "pep432_modular_bootstrap" in my sandbox repo, check > what is needed to get them building on Windows, and then send me pull > requests on BitBucket to fix them. I'll try to take a look within the next few days. 
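Coming back to Victor's question above about recreating the interpreter command line: a partial sketch is possible today with sys.flags and sys.warnoptions. It is only an approximation -- several options, and the original C argv, are simply not recoverable, which is what exposing raw_argv would fix:

    import sys

    def interpreter_command():
        # Rebuild an approximation of how the interpreter was invoked.
        args = [sys.executable]
        if sys.flags.optimize:
            args.append('-' + 'O' * sys.flags.optimize)   # -O or -OO
        if sys.flags.dont_write_bytecode:
            args.append('-B')
        if sys.flags.no_site:
            args.append('-S')
        if sys.flags.verbose:
            args.append('-' + 'v' * sys.flags.verbose)
        args.extend('-W' + opt for opt in sys.warnoptions)
        return args

IIRC multiprocessing does something similar internally when it spawns workers on Windows.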
From ncoghlan at gmail.com Sun Jan 13 03:15:35 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 13 Jan 2013 12:15:35 +1000 Subject: [Python-ideas] Windows assistance for PEP 432 (CPython startup sequence) In-Reply-To: References: Message-ID: On Sun, Jan 13, 2013 at 11:59 AM, Brian Curtin wrote: > On Sat, Jan 12, 2013 at 8:55 AM, Nick Coghlan wrote: >> I've started work on the PEP 432 implementation at >> https://bitbucket.org/ncoghlan/cpython_sandbox/compare/pep432_modular_bootstrap..default#commits >> >> As part of that work, I'm also cleaning up some of the crazier things >> in the source tree layout, like "pythonrun" being this gigantic >> monolith covering interpreter initialisation, code execution and >> interpreter shutdown all in one file, as well as the source files for >> the application binaries being mixed in with the source files for >> standard library builtin and extension modules. >> >> This means I know I'm breaking the Windows builds. Rather than leaving >> that until the end, I'm looking for someone that's willing to take the >> changes from the "pep432_modular_bootstrap" in my sandbox repo, check >> what is needed to get them building on Windows, and then send me pull >> requests on BitBucket to fix them. > > I'll try to take a look within the next few days. Richard Oudkerk has given me a patch at least for the VS 2010 files. (We discovered in the process that bitbucket only allows pull requests for forked repos back to their parent - no pull requests between sibling repos). Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From g.brandl at gmx.net Sun Jan 13 11:31:54 2013 From: g.brandl at gmx.net (Georg Brandl) Date: Sun, 13 Jan 2013 11:31:54 +0100 Subject: [Python-ideas] Windows assistance for PEP 432 (CPython startup sequence) In-Reply-To: References: Message-ID: Am 13.01.2013 03:15, schrieb Nick Coghlan: > On Sun, Jan 13, 2013 at 11:59 AM, Brian Curtin wrote: >> On Sat, Jan 12, 2013 at 8:55 AM, Nick Coghlan wrote: >>> I've started work on the PEP 432 implementation at >>> https://bitbucket.org/ncoghlan/cpython_sandbox/compare/pep432_modular_bootstrap..default#commits >>> >>> As part of that work, I'm also cleaning up some of the crazier things >>> in the source tree layout, like "pythonrun" being this gigantic >>> monolith covering interpreter initialisation, code execution and >>> interpreter shutdown all in one file, as well as the source files for >>> the application binaries being mixed in with the source files for >>> standard library builtin and extension modules. >>> >>> This means I know I'm breaking the Windows builds. Rather than leaving >>> that until the end, I'm looking for someone that's willing to take the >>> changes from the "pep432_modular_bootstrap" in my sandbox repo, check >>> what is needed to get them building on Windows, and then send me pull >>> requests on BitBucket to fix them. >> >> I'll try to take a look within the next few days. > > Richard Oudkerk has given me a patch at least for the VS 2010 files. > (We discovered in the process that bitbucket only allows pull requests > for forked repos back to their parent - no pull requests between > sibling repos). That sounds unfortunate -- did you open a report/feature request in their tracker? 
Georg From ncoghlan at gmail.com Sun Jan 13 11:48:05 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 13 Jan 2013 20:48:05 +1000 Subject: [Python-ideas] Windows assistance for PEP 432 (CPython startup sequence) In-Reply-To: References: Message-ID: On Sun, Jan 13, 2013 at 8:31 PM, Georg Brandl wrote: > Am 13.01.2013 03:15, schrieb Nick Coghlan: >> Richard Oudkerk has given me a patch at least for the VS 2010 files. >> (We discovered in the process that bitbucket only allows pull requests >> for forked repos back to their parent - no pull requests between >> sibling repos). > > That sounds unfortunate -- did you open a report/feature request in their > tracker? I hadn't, but I have now: https://bitbucket.org/site/master/issue/5968/allow-creation-of-pull-requests-between Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From p.f.moore at gmail.com Sun Jan 13 19:53:51 2013 From: p.f.moore at gmail.com (Paul Moore) Date: Sun, 13 Jan 2013 18:53:51 +0000 Subject: [Python-ideas] Windows assistance for PEP 432 (CPython startup sequence) In-Reply-To: References: Message-ID: On 13 January 2013 02:15, Nick Coghlan wrote: > Richard Oudkerk has given me a patch at least for the VS 2010 files. > (We discovered in the process that bitbucket only allows pull requests > for forked repos back to their parent - no pull requests between > sibling repos). Looks like it's OK now - I just pulled your latest version and it built and ran all the tests fine. I didn't build the various extension modules that need external libraries, I assume they won't have changed. I can do if it would help, though. Couple of crashes in test_capi and test_faulthandler. I suspect those are expected, though. And one in test_urllib2 which I haven't investigated yet but I doubt is related to these changes. Paul From brian at python.org Sun Jan 13 19:57:16 2013 From: brian at python.org (Brian Curtin) Date: Sun, 13 Jan 2013 12:57:16 -0600 Subject: [Python-ideas] Windows assistance for PEP 432 (CPython startup sequence) In-Reply-To: References: Message-ID: On Sun, Jan 13, 2013 at 12:53 PM, Paul Moore wrote: > On 13 January 2013 02:15, Nick Coghlan wrote: >> Richard Oudkerk has given me a patch at least for the VS 2010 files. >> (We discovered in the process that bitbucket only allows pull requests >> for forked repos back to their parent - no pull requests between >> sibling repos). > > Looks like it's OK now - I just pulled your latest version and it > built and ran all the tests fine. I didn't build the various extension > modules that need external libraries, I assume they won't have > changed. I can do if it would help, though. > > Couple of crashes in test_capi and test_faulthandler. I suspect those > are expected, though. And one in test_urllib2 which I haven't > investigated yet but I doubt is related to these changes. The test_capi and test_faulthandler ones are expected but kind of a hassle on the desktop. There's an issue somewhere and I have a patch to make those work nicer, but there's a few ways we can go about it. On the buildbots those aren't a problem because we already remove and/or have a script that closes the crash dialogs. 
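For reference, the dialogs can also be suppressed up front rather than closed after the fact, via the Win32 SetErrorMode call. A rough ctypes sketch (I am only assuming the buildbot tooling does something along these lines; the flag values are from the Windows SDK):

    import ctypes

    SEM_FAILCRITICALERRORS = 0x0001   # no critical-error boxes
    SEM_NOGPFAULTERRORBOX = 0x0002    # no crash dialog
    SEM_NOOPENFILEERRORBOX = 0x8000   # no open-file error boxes

    kernel32 = ctypes.windll.kernel32
    # SetErrorMode returns the previous mode, so it can be restored later.
    old_mode = kernel32.SetErrorMode(SEM_FAILCRITICALERRORS |
                                     SEM_NOGPFAULTERRORBOX |
                                     SEM_NOOPENFILEERRORBOX)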
From rosuav at gmail.com Sun Jan 13 21:55:21 2013 From: rosuav at gmail.com (Chris Angelico) Date: Mon, 14 Jan 2013 07:55:21 +1100 Subject: [Python-ideas] csv dialect enhancement In-Reply-To: <0ee19141-547b-4182-880c-0d1b2a574af7@googlegroups.com> References: <0ee19141-547b-4182-880c-0d1b2a574af7@googlegroups.com> Message-ID: On Sat, Jan 12, 2013 at 4:16 AM, rurpy at yahoo.com wrote: > There is a common dialect of CSV, often used in database > applications [*1], that distinguishes between an empty > (quoted) string, > > e.g., the second field in "abc","",3 > > and an empty field, > > e.g., the second field in "abc",,3 > > This distinction is needed to specify or tell the > difference between 0-length strings and NULLs, when sending > csv data to or receiving it from a database application. Ugh, this is exactly the sort of thing that my boss didn't believe happened. He thinks that CSV is the same the world over, except for a few really old or arcane programs that can be completely ignored. Took a lot of arguing before we agreed to disagree on that one... As an explicitly-requestable dialect, looks good. > Sniffer: > Will set "nulls" to True when both adjacent delimiters and > quoted empty strings are seen in the input text. > (Perhaps this behaviour needs to be optional for backward > compatibility reasons?) Yes, and make it optional. I think the interpretation of ,,,, as empty strings is the more common, since CSV is often used in contexts that don't have a concept of NULL (spreadsheets mainly); this ought, then, to be the default, but one quick option can add recognition of this. So, +1 on the whole idea. ChrisA From p.f.moore at gmail.com Mon Jan 14 14:22:27 2013 From: p.f.moore at gmail.com (Paul Moore) Date: Mon, 14 Jan 2013 13:22:27 +0000 Subject: [Python-ideas] Adding '**' recursive search to glob.glob Message-ID: This may be simple enough to just be a feature request on the tracker, but I thought I'd post it here first to see if people thought it was a good idea. I'd like it if the glob module supported the (relatively common) facility to use ** to mean recursively search subdirectories. It's a reasonably straightforward patch, and offers a feature that is fairly difficult to implement in user code on top of the existing functionality. The syntax is supported in a number of places (for example the bash shell and things like Java Ant) so it will be relatively familiar to users. For people who don't know the syntax, "a/**/b" is equivalent to "a/*/b or a/*/*/b or a/*/*/*/b or ..." (for as many levels as needed). One obvious downside is that if used carelessly, it can make globbing pretty slow. So I'd propose that it be added as an optional extension enabled using a flag argument (glob(pat, allow_recursive=True)) which is false by default. That would also mean that backward compatibility should not be an issue. Any comments? Is this worth submitting a patch to the tracker? Paul. From ubershmekel at gmail.com Mon Jan 14 15:52:13 2013 From: ubershmekel at gmail.com (Yuval Greenfield) Date: Mon, 14 Jan 2013 16:52:13 +0200 Subject: [Python-ideas] Adding '**' recursive search to glob.glob In-Reply-To: References: Message-ID: http://bugs.python.org/issue13968 "Support recursive globs" Yuval -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From vinay_sajip at yahoo.co.uk Mon Jan 14 16:46:29 2013 From: vinay_sajip at yahoo.co.uk (Vinay Sajip) Date: Mon, 14 Jan 2013 15:46:29 +0000 (UTC) Subject: [Python-ideas] Adding '**' recursive search to glob.glob References: Message-ID: Paul Moore writes: > I'd like it if the glob module supported the (relatively common) > facility to use ** to mean recursively search subdirectories. It's a > reasonably straightforward patch, and offers a feature that is fairly > difficult to implement in user code on top of the existing > functionality. The syntax is supported in a number of places (for > example the bash shell and things like Java Ant) so it will be > relatively familiar to users. Agreed. This was in packaging/distutils2 and I have now got it in distlib [1]; it supports both recursive globs and variants using the {opt1,opt2,opt3} syntax. > One obvious downside is that if used carelessly, it can make globbing > pretty slow. So I'd propose that it be added as an optional extension > enabled using a flag argument (glob(pat, allow_recursive=True)) which > is false by default. That would also mean that backward compatibility > should not be an issue. Isn't the requirement to recurse implied by the presence of '**' in the pattern? What's to be gained by specifying it using allow_recursive as well? Will having allow_recursive=True have any effect if '**' is not in the pattern? If you specify a pattern with '**' and allow_recursive=False, does that mean that '**' effectively acts as '*' would (i.e. one directory level only)? Regards, Vinay Sajip [1] https://bitbucket.org/vinay.sajip/distlib/src/29666/distlib/glob.py?at=default From p.f.moore at gmail.com Mon Jan 14 16:58:36 2013 From: p.f.moore at gmail.com (Paul Moore) Date: Mon, 14 Jan 2013 15:58:36 +0000 Subject: [Python-ideas] Adding '**' recursive search to glob.glob In-Reply-To: References: Message-ID: On 14 January 2013 14:52, Yuval Greenfield wrote: > http://bugs.python.org/issue13968 > > "Support recursive globs" The time machine strikes again :-) I'll take a look at that tracker item. Paul From storchaka at gmail.com Mon Jan 14 17:14:55 2013 From: storchaka at gmail.com (Serhiy Storchaka) Date: Mon, 14 Jan 2013 18:14:55 +0200 Subject: [Python-ideas] Adding '**' recursive search to glob.glob In-Reply-To: References: Message-ID: On 14.01.13 15:22, Paul Moore wrote: > This may be simple enough to just be a feature request on the tracker, > but I thought I'd post it here first to see if people thought it was a > good idea. There have been several issues on the tracker for this feature. Issue 13968 has an almost-ready patch (I only still need to protect the recursive glob from infinite symlink loops). Apart from the symlink loops, the patch looks to be working, and you can try it and review it. I'm going to finish the work this week. > For people who don't know the syntax, "a/**/b" is equivalent to "a/*/b > or a/*/*/b or a/*/*/*/b or ..." (for as many levels as needed). Or a/b. > One obvious downside is that if used carelessly, it can make globbing > pretty slow. So I'd propose that it be added as an optional extension > enabled using a flag argument (glob(pat, allow_recursive=True)) which > is false by default. That would also mean that backward compatibility > should not be an issue. Indeed. That's why I added the "recursive" parameter and disabled it by default.
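For anyone who wants to experiment before the patch lands, here is a rough approximation of '**' on top of the existing stdlib (my sketch, not the issue 13968 implementation; it only handles patterns of the form root + '/**/' + tail and does no escaping):

    import fnmatch
    import os

    def recursive_glob(root, tail):
        # Roughly glob root/**/tail: match `tail` in `root` itself
        # and in every directory below it.
        matches = []
        for dirpath, dirnames, filenames in os.walk(root):
            for name in fnmatch.filter(dirnames + filenames, tail):
                matches.append(os.path.join(dirpath, name))
        return matches

    # recursive_glob('Lib', 'test') ~ bash's Lib/**/test with globstar on

Note that os.walk yields the root directory first, which gives the zero-directory match (a/**/b matching a/b) that Serhiy mentions above.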
From solipsis at pitrou.net Mon Jan 14 17:21:20 2013 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 14 Jan 2013 17:21:20 +0100 Subject: [Python-ideas] Adding '**' recursive search to glob.glob References: Message-ID: <20130114172120.43c77a0d@pitrou.net> Le Mon, 14 Jan 2013 18:14:55 +0200, Serhiy Storchaka a écrit : > > > One obvious downside is that if used carelessly, it can make > > globbing pretty slow. So I'd propose that it be added as an > > optional extension enabled using a flag argument (glob(pat, > > allow_recursive=True)) which is false by default. That would also > > mean that backward compatibility should not be an issue. > > Indeed. That's why I added the "recursive" parameter and disabled it > by default. Using APIs carelessly is the user's problem, not ours. It should be sufficient to add a small warning in the docs, as I did in pathlib: https://pathlib.readthedocs.org/en/latest/#pathlib.Path.glob Regards Antoine. From storchaka at gmail.com Mon Jan 14 17:21:40 2013 From: storchaka at gmail.com (Serhiy Storchaka) Date: Mon, 14 Jan 2013 18:21:40 +0200 Subject: [Python-ideas] Adding '**' recursive search to glob.glob In-Reply-To: References: Message-ID: On 14.01.13 17:46, Vinay Sajip wrote: > Isn't the requirement to recurse implied by the presence of '**' in the > pattern? What's to be gained by specifying it using allow_recursive as well? I'd be glad to make it enabled by default; however, I feel this is too dangerous. glob('**') on the FS root takes too long. Perhaps that's why (and for backward compatibility) this option (called "globstar") is disabled by default in Bash. > Will having allow_recursive=True have any effect if '**' is not in the > pattern? If you specify a pattern with '**' and allow_recursive=False, does > that mean that '**' effectively acts as '*' would (i.e. one directory level > only)? Yes, as it does now. From p.f.moore at gmail.com Mon Jan 14 17:25:59 2013 From: p.f.moore at gmail.com (Paul Moore) Date: Mon, 14 Jan 2013 16:25:59 +0000 Subject: [Python-ideas] Adding '**' recursive search to glob.glob In-Reply-To: References: Message-ID: On 14 January 2013 16:14, Serhiy Storchaka wrote: > >> For people who don't know the syntax, "a/**/b" is equivalent to "a/*/b >> or a/*/*/b or a/*/*/*/b or ..." (for as many levels as needed). > > > Or a/b. Hmm, from my experiments, bash doesn't show a/b as matching the pattern a/**/b ... >> One obvious downside is that if used carelessly, it can make globbing >> pretty slow. So I'd propose that it be added as an optional extension >> enabled using a flag argument (glob(pat, allow_recursive=True)) which >> is false by default. That would also mean that backward compatibility >> should not be an issue. > > > Indeed. That's why I added the "recursive" parameter and disabled it by > default. Although I can see Vinay's point, that ** is not useful syntax currently, so there's no compatibility problem. Careless use resulting in long glob times is more of a user issue. Having said that, this debate is *precisely* why I suggested making it a parameter in the first place, so people can choose for themselves. So I guess I agree with your decision :-) Paul. 
From storchaka at gmail.com Mon Jan 14 17:27:24 2013 From: storchaka at gmail.com (Serhiy Storchaka) Date: Mon, 14 Jan 2013 18:27:24 +0200 Subject: [Python-ideas] Adding '**' recursive search to glob.glob In-Reply-To: <20130114172120.43c77a0d@pitrou.net> References: <20130114172120.43c77a0d@pitrou.net> Message-ID: On 14.01.13 18:21, Antoine Pitrou wrote: > Using APIs carelessly is the user's problem, not ours. It should be > sufficient to add a small warning in the docs, as I did in pathlib: We need a time machine to publish this warning in 1994, before anyone used the glob in his program. Pathlib has an advantage in this. From steve at pearwood.info Mon Jan 14 17:24:20 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 15 Jan 2013 03:24:20 +1100 Subject: [Python-ideas] Adding '**' recursive search to glob.glob In-Reply-To: References: Message-ID: <50F43134.6030902@pearwood.info> On 15/01/13 02:46, Vinay Sajip wrote: > Paul Moore writes: > >> I'd like it if the glob module supported the (relatively common) >> facility to use ** to mean recursively search subdirectories. +1 >> One obvious downside is that if used carelessly, it can make globbing >> pretty slow. So I'd propose that it be added as an optional extension >> enabled using a flag argument (glob(pat, allow_recursive=True)) which >> is false by default. That would also mean that backward compatibility >> should not be an issue. > > Isn't the requirement to recurse implied by the presence of '**' in the > pattern? What's to be gained by specifying it using allow_recursive as well? Not necessarily. At the moment, a glob like "/**/spam" is equivalent to "/*/spam": [steve at ando /]$ touch /tmp/spam [steve at ando /]$ mkdir /tmp/ham [steve at ando /]$ touch /tmp/ham/spam [steve at ando /]$ python3.3 -c "import glob; print(glob.glob('/**/spam'))" ['/tmp/spam'] With the suggested new functionality, the meaning of the glob will change. From a backwards-compatibility point of view, one might not want to enable the new semantics by default. But, from a *future*-compatibility point of view, I don't know that it is a good idea to require a flag every time a new globbing feature is added. glob.glob(pattern, allow_recurse=True, allow_spam=True, allow_ham=True, allow_eggs=True, ...) Rather than a flag, I suggest a version number: glob.glob(pattern, version=1) # current behaviour, as of 3.3 glob.glob(pattern, version=2) # adds ** recursion in Python 3.4 Then in Python 3.5 or 3.6 support for version 1 globs could be dropped. > Will having allow_recursive=True have any effect if '**' is not in the > pattern? I would expect that it will not have any effect unless ** is present. After all, it simply allows ** to recurse, and no other glob metacharacter can recurse. >If you specify a pattern with '**' and allow_recursive=False, does > that mean that '**' effectively acts as '*' would (i.e. one directory level > only)? I expect that without allow_recursive=True, ** would behave identically to a single * -- Steven From storchaka at gmail.com Mon Jan 14 17:33:42 2013 From: storchaka at gmail.com (Serhiy Storchaka) Date: Mon, 14 Jan 2013 18:33:42 +0200 Subject: [Python-ideas] Adding '**' recursive search to glob.glob In-Reply-To: References: Message-ID: On 14.01.13 18:25, Paul Moore wrote: > Hmm, from my experiments, bash doesn't show a/b as matching the > pattern a/**/b ... 
$ shopt -s globstar $ echo Lib/**/test Lib/ctypes/test Lib/sqlite3/test Lib/test Lib/tkinter/test Lib/unittest/test From ubershmekel at gmail.com Mon Jan 14 17:39:12 2013 From: ubershmekel at gmail.com (Yuval Greenfield) Date: Mon, 14 Jan 2013 18:39:12 +0200 Subject: [Python-ideas] Adding '**' recursive search to glob.glob In-Reply-To: References: <20130114172120.43c77a0d@pitrou.net> Message-ID: On Mon, Jan 14, 2013 at 6:27 PM, Serhiy Storchaka wrote: > We need a time machine to publish this warning in 1994, before anyone used > the glob in his program. > > Pathlib has an advantage in this. > > The following have been discussed already: - deprecate the 'glob' module, moving its functionality to shutil - "starglob" or "use_recursive" option - have a separate "rglob" or "tree" function do this for you http://mail.python.org/pipermail/python-bugs-list/2012-February/thread.html#159056 And more at https://www.google.com/search?q=site%3Amail.python.org+recursive+glob The patch has been discussed to death already. Not to say that it's too late to speak your mind, but I think if it passes the proper tests and review - it should go in. Yuval Greenfield -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.f.moore at gmail.com Mon Jan 14 17:41:46 2013 From: p.f.moore at gmail.com (Paul Moore) Date: Mon, 14 Jan 2013 16:41:46 +0000 Subject: [Python-ideas] Adding '**' recursive search to glob.glob In-Reply-To: References: Message-ID: On 14 January 2013 16:33, Serhiy Storchaka wrote: > On 14.01.13 18:25, Paul Moore wrote: >> Hmm, from my experiments, bash doesn't show a/b as matching the >> pattern a/**/b ... > > $ shopt -s globstar > $ echo Lib/**/test > Lib/ctypes/test Lib/sqlite3/test Lib/test Lib/tkinter/test Lib/unittest/test Ah, thanks. I hadn't enabled globstar. See what happens when you let a Windows user near a Unix shell? :-) And the fact that globstar is an option gives some weight to having a globstar-like flag in the function signature. Sorry for the noise. Paul. From solipsis at pitrou.net Mon Jan 14 18:06:34 2013 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 14 Jan 2013 18:06:34 +0100 Subject: [Python-ideas] Adding '**' recursive search to glob.glob References: Message-ID: <20130114180634.0da954c5@pitrou.net> Le Mon, 14 Jan 2013 18:21:40 +0200, Serhiy Storchaka a écrit : > On 14.01.13 17:46, Vinay Sajip wrote: > > Isn't the requirement to recurse implied by the presence of '**' in > > the pattern? What's to be gained by specifying it using > > allow_recursive as well? > > I'd be glad to make it enabled by default; however, I feel this > is too dangerous. glob('**') on the FS root takes too long. Perhaps > that's why (and for backward compatibility) this option (called > "globstar") is disabled by default in Bash. But there's no reason to write glob('**') with the current API. Regards Antoine. From storchaka at gmail.com Mon Jan 14 21:26:49 2013 From: storchaka at gmail.com (Serhiy Storchaka) Date: Mon, 14 Jan 2013 22:26:49 +0200 Subject: [Python-ideas] Adding '**' recursive search to glob.glob In-Reply-To: <20130114180634.0da954c5@pitrou.net> References: <20130114180634.0da954c5@pitrou.net> Message-ID: On 14.01.13 19:06, Antoine Pitrou wrote: > But there's no reason to write glob('**') with the current API. There is a reason to write glob('*%s*' % escaped_substring). 
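There is no escaping helper in the glob module today, but one is easy to sketch; wrapping each metacharacter in brackets makes fnmatch-style matching treat it literally (my code, shown only to make the point concrete):

    def escape_glob(text):
        # '[c]' matches the literal character c, and this also neutralises
        # '[' itself; ']' needs no special handling in fnmatch patterns.
        return ''.join('[%s]' % c if c in '*?[' else c for c in text)

    # glob.glob('*%s*' % escape_glob(substring)) then matches the literal
    # substring even when it contains '*' or '?'.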
From greg.ewing at canterbury.ac.nz Mon Jan 14 22:38:25 2013 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 15 Jan 2013 10:38:25 +1300 Subject: [Python-ideas] Adding '**' recursive search to glob.glob In-Reply-To: <50F43134.6030902@pearwood.info> References: <50F43134.6030902@pearwood.info> Message-ID: <50F47AD1.2090904@canterbury.ac.nz> Steven D'Aprano wrote: > Rather than a flag, I suggest a version number: > > glob.glob(pattern, version=1) # current behaviour, as of 3.3 > glob.glob(pattern, version=2) # adds ** recursion in Python 3.4 Yuck, then the reader has to know what features are enabled by which version numbers -- not something that's easy to keep in one's head. -- Greg From bruce at leapyear.org Mon Jan 14 23:17:17 2013 From: bruce at leapyear.org (Bruce Leban) Date: Mon, 14 Jan 2013 14:17:17 -0800 Subject: [Python-ideas] Adding '**' recursive search to glob.glob In-Reply-To: <50F47AD1.2090904@canterbury.ac.nz> References: <50F43134.6030902@pearwood.info> <50F47AD1.2090904@canterbury.ac.nz> Message-ID: On Mon, Jan 14, 2013 at 1:38 PM, Greg Ewing wrote: > Steven D'Aprano wrote: > > Rather than a flag, I suggest a version number: >> >> glob.glob(pattern, version=1) # current behaviour, as of 3.3 >> glob.glob(pattern, version=2) # adds ** recursion in Python 3.4 >> > > Yuck, then the reader has to know what features are > enabled by which version numbers -- not something that's > easy to keep in one's head. And if you write glob.glob(..., foofeature=True) it will automatically raise an exception if you use it in a version that doesn't support the feature rather than silently ignoring the error. --- Bruce Check this out: http://bit.ly/yearofpuzzles -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Tue Jan 15 02:31:56 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 15 Jan 2013 12:31:56 +1100 Subject: [Python-ideas] Adding '**' recursive search to glob.glob In-Reply-To: <50F47AD1.2090904@canterbury.ac.nz> References: <50F43134.6030902@pearwood.info> <50F47AD1.2090904@canterbury.ac.nz> Message-ID: <50F4B18C.2020602@pearwood.info> On 15/01/13 08:38, Greg Ewing wrote: > Steven D'Aprano wrote: > >> Rather than a flag, I suggest a version number: >> >> glob.glob(pattern, version=1) # current behaviour, as of 3.3 >> glob.glob(pattern, version=2) # adds ** recursion in Python 3.4 > > Yuck, then the reader has to know what features are > enabled by which version numbers -- not something that's > easy to keep in one's head. True. But neither are a plethora of enable_feature flags. Is it allow_recursion or allow_recursive or enable_double_star? Globbing is not likely to be something that most people use often enough that the name of the arguments will stick in their head. People will likely need to look it up one way or the other. All this assumes that we need to care about backward compatibility of ** in existing globs. It does seem to be an unlikely thing for people to write. If we don't, then no need for a flag at all. Instead, we could raise a warning for globs with ** in 3.4, and then drop the warning in 3.5. Another option, is a new function. Bool parameters that do nothing but change the behaviour of a function are somewhat of a mild anti-pattern. Perhaps it is better to just keep glob.glob as is, and add glob.recglob or rglob to support **. 
-- Steven

From python at mrabarnett.plus.com Tue Jan 15 04:20:00 2013 From: python at mrabarnett.plus.com (MRAB) Date: Tue, 15 Jan 2013 03:20:00 +0000 Subject: [Python-ideas] Adding '**' recursive search to glob.glob In-Reply-To: <50F4B18C.2020602@pearwood.info> References: <50F43134.6030902@pearwood.info> <50F47AD1.2090904@canterbury.ac.nz> <50F4B18C.2020602@pearwood.info> Message-ID: <50F4CAE0.2000803@mrabarnett.plus.com>

On 2013-01-15 01:31, Steven D'Aprano wrote:
> On 15/01/13 08:38, Greg Ewing wrote:
>> Steven D'Aprano wrote:
>>
>>> Rather than a flag, I suggest a version number:
>>>
>>> glob.glob(pattern, version=1)  # current behaviour, as of 3.3
>>> glob.glob(pattern, version=2)  # adds ** recursion in Python 3.4
>>
>> Yuck, then the reader has to know what features are
>> enabled by which version numbers -- not something that's
>> easy to keep in one's head.
>
> True. But neither are a plethora of enable_feature flags. Is it
> allow_recursion or allow_recursive or enable_double_star? Globbing
> is not likely to be something that most people use often enough that
> the name of the arguments will stick in their head. People will
> likely need to look it up one way or the other.
>
> All this assumes that we need to care about backward compatibility
> of ** in existing globs. It does seem to be an unlikely thing for
> people to write. If we don't, then no need for a flag at all.
> Instead, we could raise a warning for globs with ** in 3.4, and
> then drop the warning in 3.5.
>
> Another option is a new function. Bool parameters that do nothing
> but change the behaviour of a function are somewhat of a mild
> anti-pattern. Perhaps it is better to just keep glob.glob as is,
> and add glob.recglob or rglob to support **.

If there's rglob, then shouldn't there also be riglob or irglob? If so, then which one? :-)

From bruce at leapyear.org Tue Jan 15 05:03:08 2013 From: bruce at leapyear.org (Bruce Leban) Date: Mon, 14 Jan 2013 20:03:08 -0800 Subject: [Python-ideas] Adding '**' recursive search to glob.glob In-Reply-To: <50F4B18C.2020602@pearwood.info> References: <50F43134.6030902@pearwood.info> <50F47AD1.2090904@canterbury.ac.nz> <50F4B18C.2020602@pearwood.info> Message-ID:

On Mon, Jan 14, 2013 at 5:31 PM, Steven D'Aprano wrote:
>> Yuck, then the reader has to know what features are
>> enabled by which version numbers -- not something that's
>> easy to keep in one's head.
>
> True. But neither are a plethora of enable_feature flags. Is it
> allow_recursion or allow_recursive or enable_double_star? Globbing
> is not likely to be something that most people use often enough that
> the name of the arguments will stick in their head. People will
> likely need to look it up one way or the other.

I see nothing wrong with asking people to consult the documentation for features they don't use that frequently. Better to check the docs than get it wrong. But the reader of the code is more likely to notice something special is going on when they see glob(..., allow_recursive=True) than rglob(...). And I'd rather have flags than rglob to allow recursion and iglob to ignore case and then either riglob or irglob to do both. Yuck.

--- Bruce Check this out: http://bit.ly/yearofpuzzles
-------------- next part -------------- An HTML attachment was scrubbed...
URL:

From ubershmekel at gmail.com Tue Jan 15 06:15:20 2013 From: ubershmekel at gmail.com (Yuval Greenfield) Date: Tue, 15 Jan 2013 07:15:20 +0200 Subject: [Python-ideas] Adding '**' recursive search to glob.glob In-Reply-To: References: <50F43134.6030902@pearwood.info> <50F47AD1.2090904@canterbury.ac.nz> <50F4B18C.2020602@pearwood.info> Message-ID:

On Tue, Jan 15, 2013 at 6:03 AM, Bruce Leban wrote:
> [...] and iglob to ignore case and [....]

OT - iglob is the iterator version of glob. Perhaps in Python 2 this should have been called "xglob". In Python 3 it should have been just "glob".

>>> rglob('**.py')

or

>>> glob('**.py', True)

I don't mind either, though I think the first one is a bit clearer because "r" is more telling than "True". Don't mention glob('**.py', allow_recursive=True) because that's probably not going to be the norm.

Yuval
-------------- next part -------------- An HTML attachment was scrubbed... URL:

From ncoghlan at gmail.com Tue Jan 15 07:33:17 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 15 Jan 2013 16:33:17 +1000 Subject: [Python-ideas] Adding '**' recursive search to glob.glob In-Reply-To: <50F4B18C.2020602@pearwood.info> References: <50F43134.6030902@pearwood.info> <50F47AD1.2090904@canterbury.ac.nz> <50F4B18C.2020602@pearwood.info> Message-ID:

On Tue, Jan 15, 2013 at 11:31 AM, Steven D'Aprano wrote:
> All this assumes that we need to care about backward compatibility
> of ** in existing globs. It does seem to be an unlikely thing for
> people to write. If we don't, then no need for a flag at all.
> Instead, we could raise a warning for globs with ** in 3.4, and
> then drop the warning in 3.5.
>
> Another option is a new function. Bool parameters that do nothing
> but change the behaviour of a function are somewhat of a mild
> anti-pattern. Perhaps it is better to just keep glob.glob as is,
> and add glob.recglob or rglob to support **.

Making boolean parameters less awful from a readability perspective is part of the rationale for keyword-only arguments: they force you to include the parameter name, thus making the call self-documenting. In this case, the conservative backwards compatible migration path would be:

In 3.4:
- add the recursive globbing capability
- add "allow_recursive=None" as a keyword-only argument
- emit a DeprecationWarning if the double-star pattern is seen when allow_recursive is None (but not when it is explicitly False)

In 3.5:
- switch the allow_recursive default value to True
- drop the deprecation warning

Cheers, Nick.

-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From jeremy at jeremysanders.net Tue Jan 15 09:36:53 2013 From: jeremy at jeremysanders.net (Jeremy Sanders) Date: Tue, 15 Jan 2013 08:36:53 +0000 Subject: [Python-ideas] Adding '**' recursive search to glob.glob References: Message-ID:

Vinay Sajip wrote:
> Isn't the requirement to recurse implied by the presence of '**' in the
> pattern? What's to be gained by specifying it using allow_recursive as
> well? Will having allow_recursive=True have any effect if '**' is not in
> the pattern? If you specify a pattern with '**' and allow_recursive=False,
> does that mean that '**' effectively acts as '*' would (i.e. one directory
> level only)?

The glob string may come from the user or a remote source. It is possible that a developer using glob has never considered that "**" might be added, leading to an attacker accessing files in directories they are not allowed to, or DoS attacks because glob becomes very slow.

Jeremy
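[Jeremy's point is easy to demonstrate: if '**' becomes meaningful, any pattern built from untrusted text needs its metacharacters neutralised first. A sketch of such a defence follows; escape_magic is a hypothetical helper, not something the glob module provided at the time of writing:

    import glob
    import re

    def escape_magic(s):
        # Wrap each glob metacharacter in [] so it matches only itself;
        # untrusted input can then never smuggle '*', '?' or '**' into
        # the final pattern.
        return re.sub(r'([*?[])', r'[\1]', s)

    user_text = 'oops**'  # pretend this arrived from a remote source
    matches = glob.glob('*%s*' % escape_magic(user_text))

This is the same escaping trick that Serhiy's glob('*%s*' % escaped_substring) example above relies on.]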
From ethan at stoneleaf.us Tue Jan 15 18:00:30 2013 From: ethan at stoneleaf.us (Ethan Furman) Date: Tue, 15 Jan 2013 09:00:30 -0800 Subject: [Python-ideas] Adding '**' recursive search to glob.glob In-Reply-To: References: <50F43134.6030902@pearwood.info> <50F47AD1.2090904@canterbury.ac.nz> <50F4B18C.2020602@pearwood.info> Message-ID: <50F58B2E.402@stoneleaf.us>

On 01/14/2013 09:15 PM, Yuval Greenfield wrote:
> >>> rglob('**.py')
>
> or
>
> >>> glob('**.py', True)
>
> I don't mind either, though I think the first one is a bit clearer
> because "r" is more telling than "True". Don't mention glob('**.py',
> allow_recursive=True) because that's probably not going to be the norm.

If `allow_recursive` is a keyword-only parameter it will be the norm. :)

~Ethan~

From tarek at ziade.org Wed Jan 16 11:30:22 2013 From: tarek at ziade.org (=?ISO-8859-1?Q?Tarek_Ziad=E9?=) Date: Wed, 16 Jan 2013 11:30:22 +0100 Subject: [Python-ideas] Parametrized any() and all() ? Message-ID: <50F6813E.60503@ziade.org>

Hello

any() and all() are very useful small functions, and I am wondering if it could be interesting to have them work with different operators, by using a callable. e.g. something like:

import operator

def any(iterable, filter=operator.truth):
    for element in iterable:
        if filter(element):
            return True
    return False

For instance I could then use any() to find out if there's a None in the sequence:

if any(iterable, filter=lambda x: x is None):
    raise SomeError("There's a None in that list")

Granted, it's easy to do it myself in a small util function - but since any() and all() are in Python...

Cheers
Tarek

-- Tarek Ziadé · http://ziade.org · @tarek_ziade
-------------- next part -------------- An HTML attachment was scrubbed... URL:

From _ at lvh.cc Wed Jan 16 11:33:41 2013 From: _ at lvh.cc (Laurens Van Houtven) Date: Wed, 16 Jan 2013 11:33:41 +0100 Subject: [Python-ideas] Parametrized any() and all() ? In-Reply-To: <50F6813E.60503@ziade.org> References: <50F6813E.60503@ziade.org> Message-ID:

Hey Tarek,

I would write that as any(x is None for x in it) -- the example you gave doesn't really strike me as an improvement over that, although I could see how there are many cases where it's nicer...

On Wed, Jan 16, 2013 at 11:30 AM, Tarek Ziadé wrote:
> Hello
>
> any() and all() are very useful small functions, and I am wondering if it
> could be interesting to have them work with different operators, by using
> a callable. e.g. something like:
>
> import operator
>
> def any(iterable, filter=operator.truth):
>     for element in iterable:
>         if filter(element):
>             return True
>     return False
>
> For instance I could then use any() to find out if there's a None in the
> sequence:
>
> if any(iterable, filter=lambda x: x is None):
>     raise SomeError("There's a None in that list")
>
> Granted, it's easy to do it myself in a small util function - but since
> any() and all() are in Python...
>
> Cheers
> Tarek
>
> -- Tarek Ziadé · http://ziade.org · @tarek_ziade
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas

-- cheers lvh
-------------- next part -------------- An HTML attachment was scrubbed... URL:

From songofacandy at gmail.com Wed Jan 16 11:37:21 2013 From: songofacandy at gmail.com (INADA Naoki) Date: Wed, 16 Jan 2013 19:37:21 +0900 Subject: [Python-ideas] Parametrized any() and all() ?
In-Reply-To: References: <50F6813E.60503@ziade.org> Message-ID:

I think adding this example to the docstring and documentation may help many people.

On Wed, Jan 16, 2013 at 7:33 PM, Laurens Van Houtven <_ at lvh.cc> wrote:
> Hey Tarek,
>
> I would write that as any(x is None for x in it) -- the example you gave
> doesn't really strike me as an improvement over that, although I could see
> how there are many cases where it's nicer...
>
> On Wed, Jan 16, 2013 at 11:30 AM, Tarek Ziadé wrote:
>> Hello
>>
>> any() and all() are very useful small functions, and I am wondering if it
>> could be interesting to have them work with different operators, by using
>> a callable. e.g. something like:
>>
>> import operator
>>
>> def any(iterable, filter=operator.truth):
>>     for element in iterable:
>>         if filter(element):
>>             return True
>>     return False
>>
>> For instance I could then use any() to find out if there's a None in the
>> sequence:
>>
>> if any(iterable, filter=lambda x: x is None):
>>     raise SomeError("There's a None in that list")
>>
>> Granted, it's easy to do it myself in a small util function - but since
>> any() and all() are in Python...
>>
>> Cheers
>> Tarek
>>
>> -- Tarek Ziadé · http://ziade.org · @tarek_ziade
>>
>> _______________________________________________
>> Python-ideas mailing list
>> Python-ideas at python.org
>> http://mail.python.org/mailman/listinfo/python-ideas
>
> -- cheers lvh
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas

-- INADA Naoki
-------------- next part -------------- An HTML attachment was scrubbed... URL:

From tarek at ziade.org Wed Jan 16 11:44:13 2013 From: tarek at ziade.org (=?ISO-8859-1?Q?Tarek_Ziad=E9?=) Date: Wed, 16 Jan 2013 11:44:13 +0100 Subject: [Python-ideas] Parametrized any() and all() ? In-Reply-To: References: <50F6813E.60503@ziade.org> Message-ID: <50F6847D.2020404@ziade.org>

On 1/16/13 11:33 AM, Laurens Van Houtven wrote:
> Hey Tarek,
>
> I would write that as any(x is None for x in it)

But here you're building yet another iterable to adapt it to any(), which seems to me overkill if we can just parametrize the loop in any()

Cheers
Tarek

-- Tarek Ziadé · http://ziade.org · @tarek_ziade

From masklinn at masklinn.net Wed Jan 16 12:08:55 2013 From: masklinn at masklinn.net (Masklinn) Date: Wed, 16 Jan 2013 12:08:55 +0100 Subject: [Python-ideas] Parametrized any() and all() ? In-Reply-To: <50F6847D.2020404@ziade.org> References: <50F6813E.60503@ziade.org> <50F6847D.2020404@ziade.org> Message-ID: <93F3FFBC-F145-4956-9512-04DF46A0E14C@masklinn.net>

On 2013-01-16, at 11:44 , Tarek Ziadé wrote:
> On 1/16/13 11:33 AM, Laurens Van Houtven wrote:
>> Hey Tarek,
>>
>> I would write that as any(x is None for x in it)
>
> But here you're building yet another iterable to adapt it to any(),
> which seems to me overkill if we can just parametrize the loop in any()

It's just a generator, and will be terminated early if possible.
I'm pretty sure adding a key function to any and all has already been submitted several times, and from what I remember it was struck down every time because the use case is covered by Laurens's suggestion: key functions are necessary when you'd otherwise need DSU (because the result is the original input, not the key function's output), but that's not the case for any() and all().

Here's the previous/latest instance: http://mail.python.org/pipermail/python-ideas/2012-July/015837.html

From ncoghlan at gmail.com Wed Jan 16 12:10:00 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 16 Jan 2013 21:10:00 +1000 Subject: [Python-ideas] Parametrized any() and all() ? In-Reply-To: <50F6847D.2020404@ziade.org> References: <50F6813E.60503@ziade.org> <50F6847D.2020404@ziade.org> Message-ID:

On Wed, Jan 16, 2013 at 8:44 PM, Tarek Ziadé wrote:
> On 1/16/13 11:33 AM, Laurens Van Houtven wrote:
>> Hey Tarek,
>>
>> I would write that as any(x is None for x in it)
>
> But here you're building yet another iterable to adapt it to any(), which
> seems to me overkill if we can just parametrize the loop in any()

Such a micro-optimization isn't worth the cost of adding a second way to do it that everyone will then need to learn.

Cheers, Nick.

-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From oscar.j.benjamin at gmail.com Wed Jan 16 12:20:54 2013 From: oscar.j.benjamin at gmail.com (Oscar Benjamin) Date: Wed, 16 Jan 2013 11:20:54 +0000 Subject: [Python-ideas] Parametrized any() and all() ? In-Reply-To: <50F6813E.60503@ziade.org> References: <50F6813E.60503@ziade.org> Message-ID:

On 16 January 2013 10:30, Tarek Ziadé wrote:
> Hello
>
> any() and all() are very useful small functions, and I am wondering if it
> could be interesting to have them work with different operators, by using
> a callable. e.g. something like:
>
> import operator
>
> def any(iterable, filter=operator.truth):
>     for element in iterable:
>         if filter(element):
>             return True
>     return False
>
> For instance I could then use any() to find out if there's a None in the
> sequence:
>
> if any(iterable, filter=lambda x: x is None):
>     raise SomeError("There's a None in that list")
>
> Granted, it's easy to do it myself in a small util function - but since
> any() and all() are in Python...

I wouldn't write a util function for this. The resulting code

    any(iterable, filter=func)

is not really shorter, easier or clearer than the current methods

    any(map(func, iterable))
    any(func(x) for x in iterable)

Oscar

From steve at pearwood.info Wed Jan 16 15:10:32 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Thu, 17 Jan 2013 01:10:32 +1100 Subject: [Python-ideas] Parametrized any() and all() ? In-Reply-To: References: <50F6813E.60503@ziade.org> <50F6847D.2020404@ziade.org> Message-ID: <50F6B4D8.6070002@pearwood.info>

On 16/01/13 22:10, Nick Coghlan wrote:
> On Wed, Jan 16, 2013 at 8:44 PM, Tarek Ziadé wrote:
>> On 1/16/13 11:33 AM, Laurens Van Houtven wrote:
>>> Hey Tarek,
>>>
>>> I would write that as any(x is None for x in it)
>>
>> But here you're building yet another iterable to adapt it to any(), which
>> seems to me overkill if we can just parametrize the loop in any()
>
> Such a micro-optimization isn't worth the cost of adding a second way
> to do it that everyone will then need to learn.

For all we know, adding a filter function will be a pessimization, not an optimization, using more memory and/or being slower than using a generator expression. It certainly isn't clear to me that creating a generator expression like (x is None for x in it) is more expensive than creating a filter function like (lambda x: x is None).

-1 on adding a filter function.

-- Steven
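[Whether the filter-function spelling would actually be slower is, of course, an empirical question. A sketch of how one might measure the two spellings with timeit; the numbers will vary by machine and interpreter version, so none are asserted here:

    import timeit

    setup = "seq = [0] * 999 + [None]"
    for stmt in ("any(x is None for x in seq)",
                 "any(map(lambda x: x is None, seq))"):
        print(stmt, timeit.timeit(stmt, setup, number=10000))

]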
From tarek at ziade.org Wed Jan 16 15:52:19 2013 From: tarek at ziade.org (=?UTF-8?B?VGFyZWsgWmlhZMOp?=) Date: Wed, 16 Jan 2013 15:52:19 +0100 Subject: [Python-ideas] Parametrized any() and all() ? In-Reply-To: <50F6B4D8.6070002@pearwood.info> References: <50F6813E.60503@ziade.org> <50F6847D.2020404@ziade.org> <50F6B4D8.6070002@pearwood.info> Message-ID: <50F6BEA3.7090807@ziade.org>

On 1/16/13 3:10 PM, Steven D'Aprano wrote:
> On 16/01/13 22:10, Nick Coghlan wrote:
>> On Wed, Jan 16, 2013 at 8:44 PM, Tarek Ziadé wrote:
>>> On 1/16/13 11:33 AM, Laurens Van Houtven wrote:
>>>> Hey Tarek,
>>>>
>>>> I would write that as any(x is None for x in it)
>>>
>>> But here you're building yet another iterable to adapt it to any(),
>>> which seems to me overkill if we can just parametrize the loop in any()
>>
>> Such a micro-optimization isn't worth the cost of adding a second way
>> to do it that everyone will then need to learn.
>
> For all we know, adding a filter function will be a pessimization, not
> an optimization, using more memory and/or being slower than using a
> generator expression. It certainly isn't clear to me that creating a
> generator expression like (x is None for x in it) is more expensive
> than creating a filter function like (lambda x: x is None).
>
> -1 on adding a filter function.

I abandoned the idea, but I'd be curious to understand how creating several iterables, with one that has an 'if', can be more efficient than having a single iterable with an 'if'...

-- Tarek Ziadé · http://ziade.org · @tarek_ziade

From p.f.moore at gmail.com Wed Jan 16 16:12:16 2013 From: p.f.moore at gmail.com (Paul Moore) Date: Wed, 16 Jan 2013 15:12:16 +0000 Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events Message-ID:

I've so far been lurking on the tulip/async discussions, as although I'm interested, I have no specific need for writing high-performance network code.

However, I hit a use case today which seems to me to be ideal for an async-style approach, and yet I don't think it's covered by the current PEP. Specifically, I am looking at monitoring a subprocess.Popen object. This is basically an IO loop, but monitoring the 3 pipes to the subprocess (well, only stdout and stderr in my case...). Something like add_reader/add_writer would be fine, except for the fact that (a) they are documented as low-level not for the user, and (b) they don't work in all cases (e.g. in a select-based loop on Windows).

I'd like PEP 3156 to include some support for waiting on IO from (one or more) subprocesses like this in a cross-platform way. If there's something in there to do this at the moment, that's great, but it wasn't obvious to me when I looked...

Paul.

From eliben at gmail.com Wed Jan 16 16:12:05 2013 From: eliben at gmail.com (Eli Bendersky) Date: Wed, 16 Jan 2013 07:12:05 -0800 Subject: [Python-ideas] question about the Tulip effort Message-ID:

Hi, I've been reading PEP 3156 and looking at the reference implementation (http://code.google.com/p/tulip/). I'll be happy to contribute to the effort, and following are a couple of questions on how to do that.

1. Questions and clarifications should be sent to this list (python-ideas), correct?
2. Is there a list of tasks that help is needed with? Is it the TODO file in tulip's root dir?
3. How/where to contribute patches?

Eli
-------------- next part -------------- An HTML attachment was scrubbed... URL:

From ronaldoussoren at mac.com Wed Jan 16 16:13:15 2013 From: ronaldoussoren at mac.com (Ronald Oussoren) Date: Wed, 16 Jan 2013 16:13:15 +0100 Subject: [Python-ideas] Parametrized any() and all() ? In-Reply-To: <50F6BEA3.7090807@ziade.org> References: <50F6813E.60503@ziade.org> <50F6847D.2020404@ziade.org> <50F6B4D8.6070002@pearwood.info> <50F6BEA3.7090807@ziade.org> Message-ID:

On 16 Jan, 2013, at 15:52, Tarek Ziadé wrote:
> On 1/16/13 3:10 PM, Steven D'Aprano wrote:
>> On 16/01/13 22:10, Nick Coghlan wrote:
>>> On Wed, Jan 16, 2013 at 8:44 PM, Tarek Ziadé wrote:
>>>> On 1/16/13 11:33 AM, Laurens Van Houtven wrote:
>>>>> Hey Tarek,
>>>>>
>>>>> I would write that as any(x is None for x in it)
>>>>
>>>> But here you're building yet another iterable to adapt it to any(), which
>>>> seems to me overkill if we can just parametrize the loop in any()
>>>
>>> Such a micro-optimization isn't worth the cost of adding a second way
>>> to do it that everyone will then need to learn.
>>
>> For all we know, adding a filter function will be a pessimization, not
>> an optimization, using more memory and/or being slower than using a
>> generator expression. It certainly isn't clear to me that creating a
>> generator expression like (x is None for x in it) is more expensive
>> than creating a filter function like (lambda x: x is None).
>>
>> -1 on adding a filter function.
>
> I abandoned the idea, but I'd be curious to understand how creating several
> iterables, with one that has an 'if', can be more efficient than having a
> single iterable with an 'if'...

Have you any reason to assume that "any(x is None for x in it)" is slow? I wouldn't be surprised if a key argument for any/all would have a higher overhead than the generator expression (if there is any difference). The key function would have to be called after all, with the overhead of normal function calls.

Ronald

> -- Tarek Ziadé · http://ziade.org · @tarek_ziade
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas

From guido at python.org Wed Jan 16 18:52:57 2013 From: guido at python.org (Guido van Rossum) Date: Wed, 16 Jan 2013 09:52:57 -0800 Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events In-Reply-To: References: Message-ID:

On Wed, Jan 16, 2013 at 7:12 AM, Paul Moore wrote:
> I've so far been lurking on the tulip/async discussions, as although
> I'm interested, I have no specific need for writing high-performance
> network code.
>
> However, I hit a use case today which seems to me to be ideal for an
> async-style approach, and yet I don't think it's covered by the
> current PEP. Specifically, I am looking at monitoring a
> subprocess.Popen object. This is basically an IO loop, but monitoring
> the 3 pipes to the subprocess (well, only stdout and stderr in my
> case...). Something like add_reader/add_writer would be fine, except
> for the fact that (a) they are documented as low-level not for the
> user, and (b) they don't work in all cases (e.g. in a select-based
> loop on Windows).
>
> I'd like PEP 3156 to include some support for waiting on IO from (one
> or more) subprocesses like this in a cross-platform way. If there's
> something in there to do this at the moment, that's great, but it
> wasn't obvious to me when I looked...

This is a great use case.
The right approach would probably be to define a new Transport (and an event loop method to create one) that wraps pipes going into and out of a subprocess. The new method would have a standard API (probably similar to that of subprocess), whereas there would be different implementations of the Transport based on platform and event loop implementation (similar to the way the subprocess module has quite different implementations).

Can you check out the Tulip source code (code.google.com/p/tulip) and come up with a patch to do this? I'll gladly review it. It's fine to only cover the UNIX case for now.

-- --Guido van Rossum (python.org/~guido)

From p.f.moore at gmail.com Wed Jan 16 18:59:55 2013 From: p.f.moore at gmail.com (Paul Moore) Date: Wed, 16 Jan 2013 17:59:55 +0000 Subject: [Python-ideas] The async API of the future In-Reply-To: References: <2CEFACA8-FB96-4C17-9D14-CADEE217F662@molden.no> Message-ID:

On 3 November 2012 21:20, Richard Oudkerk wrote:
> The IOCP proactor does not support ssl (or ipv6) so main.py does not succeed
> in downloading from xkcd.com using ssl. Using the other proactors it works
> correctly.
>
> The basic interface for the proactor looks like
>
>     class Proactor:
>         def recv(self, sock, n): ...
>         def send(self, sock, buf): ...
>         def connect(self, sock, address): ...
>         def accept(self, sock): ...
>
>         def poll(self, timeout=None): ...
>         def pollable(self): ...

I've just been looking at this, and from what I can see, am I right in thinking that the IOCP support is *only* for sockets? (I'm not very familiar with socket programming, so I had a bit of difficulty following the code). In particular, it can't be used to register non-socket file objects? From my understanding of the IOCP documentation on MSDN, this is fundamental - IOCP can only be used on HANDLE objects that have been opened with the FILE_FLAG_OVERLAPPED flag, which is not used by "normal" Python IO objects like file handles and pipes, so it will never be possible to poll these objects using IOCP.

Just trying to make sure I understand the scope of this work...

Paul

From guido at python.org Wed Jan 16 19:07:15 2013 From: guido at python.org (Guido van Rossum) Date: Wed, 16 Jan 2013 10:07:15 -0800 Subject: [Python-ideas] question about the Tulip effort In-Reply-To: References: Message-ID:

On Wed, Jan 16, 2013 at 7:12 AM, Eli Bendersky wrote:
> I've been reading PEP 3156 and looking at the reference implementation
> (http://code.google.com/p/tulip/). I'll be happy to contribute to the
> effort, and following are a couple of questions on how to do that.

And I'd be happy to have your help!

> 1. Questions and clarifications should be sent to this list (python-ideas),
> correct?

Yes, unless you think it's of little public value, you can always mail me directly (Tulip is my top priority until the PEP is accepted and Tulip lands in the 3.4 stdlib).

> 2. Is there a list of tasks that help is needed with? Is it the TODO
> file in tulip's root dir?

Hm, that's mostly reminders for myself, and I don't always update it. There are also lots of TODOs and XXXs in the source code (the XXXs mark things that are *definitely* in need of fixing, like missing docstrings; TODOs are often just for pondering). You can certainly read through it, and if you see a task you would like to do, ping me for details.

Some tasks that I don't think are represented well but where I would love to get help:

- Write a somewhat significant server app. I have a somewhat significant client app (crawl.py) but nothing that exercises the server API at all. I suspect that there are some awkward things in the server API that will need fixing.
- Try writing a significant app for a protocol other than HTTP.
- Move the StreamReader class out of http_client.py and design an API to make it easy to hook it up to any protocol.
- Datagram support (read the section in the PEP on this topic first).

> 3. How/where to contribute patches?

I like to get code review requests using codereview.appspot.com (send them to gvanrossum at gmail.com). Please use the upload.py utility to upload your patch, don't bother with defining a repository. If I like your patch I'll probably ask you to submit it yourself, I'll give you repo access once you've signed a PSF contributor form.
Or maybe I'll set up a hacking environment in a Linux VM or something. That'd be a fun experience in any case. I'll have to get my brain round the existing spec as well. I'm finding it hard to understand why there are so many methods on the event loop that are specific to particular use cases (for this example, your suggested method to create the new type of Transport). My instinct says that this should *also* be a good test case for a user coming up with a new type of "event source" and wanting to plug it into the event loop. Having to add a new method to the event loop seems to imply this isn't possible. OK, off to do a lot of spec reading and then some coding. With luck, you'll be patient with dumb questions from me on the way :-) Thanks, Paul From amauryfa at gmail.com Wed Jan 16 19:15:05 2013 From: amauryfa at gmail.com (Amaury Forgeot d'Arc) Date: Wed, 16 Jan 2013 19:15:05 +0100 Subject: [Python-ideas] The async API of the future In-Reply-To: References: <2CEFACA8-FB96-4C17-9D14-CADEE217F662@molden.no> Message-ID: 2013/1/16 Paul Moore > On 3 November 2012 21:20, Richard Oudkerk wrote: > > The IOCP proactor does not support ssl (or ipv6) so main.py does not > succeed > > in downloading from xkcd.com using ssl. Using the other proactors it > works > > correctly. > > > > The basic interface for the proactor looks like > > > > class Proactor: > > def recv(self, sock, n): ... > > def send(self, sock, buf): ... > > def connect(self, sock, address): ... > > def accept(self, sock): ... > > > > def poll(self, timeout=None): ... > > def pollable(self): ... > > I've just been looking at this, and from what I can see, am I right in > thinking that the IOCP support is *only* for sockets? (I'm not very > familiar with socket programming, so I had a bit of difficulty > following the code). In particular, it can't be used to register > non-socket file objects? From my understanding of the IOCP > documentation on MSDN, this is fundamental - IOCP can only be used on > HANDLE objects that have been opened with the FILE_FLAG_OVERLAPPED > flag, which is not used by "normal" Python IO objects like file > handles and pipes, so it will never be possible to poll these objects > using IOCP. > It works for disk files as well, but you indeed have to pass FILE_FLAG_OVERLAPPED when opening the file. This is similar to sockets: s.setblocking(False) is required for asynchronous writes to work. -- Amaury Forgeot d'Arc -------------- next part -------------- An HTML attachment was scrubbed... URL: From geertj at gmail.com Wed Jan 16 19:20:23 2013 From: geertj at gmail.com (Geert Jansen) Date: Wed, 16 Jan 2013 20:20:23 +0200 Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events In-Reply-To: References: Message-ID: On Wed, Jan 16, 2013 at 8:10 PM, Paul Moore wrote: > I'll have a look. There *is* one problem, though - I imagine it will > be relatively easy to put something together that works on Unix, as > waiting on pipes is covered by the existing select/poll mechanisms. > But I'm on Windows, so I won't be able to test it. And on Windows, > there's no mechanism in place to wait on arbitrary filehandles, so the > process wait mechanism is a much harder nut to crack. Chicken and egg > problem... > > Maybe I'll start by looking at waiting on arbitrary filehandles, and > use that to build the process approach. Unfortunately, I don't think > IOCP is any more able to wait on arbitrary files than select - see my > followup to an older thread on Richard's work there. 
Or maybe I'll set > up a hacking environment in a Linux VM or something. That'd be a fun > experience in any case. Dealing with subprocesses on Windows in a non-blocking way is a royal pain. As far as I know, the only option is to use named pipes and block on them using a thread pool. A few years back I wrote something that did this, see the link below. However it ain't pretty.. https://bitbucket.org/geertj/winpexpect/src/tip/lib/winpexpect.py Regards, Geert From guido at python.org Wed Jan 16 19:21:40 2013 From: guido at python.org (Guido van Rossum) Date: Wed, 16 Jan 2013 10:21:40 -0800 Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events In-Reply-To: References: Message-ID: On Wed, Jan 16, 2013 at 10:10 AM, Paul Moore wrote: > On 16 January 2013 17:52, Guido van Rossum wrote: > > On Wed, Jan 16, 2013 at 7:12 AM, Paul Moore wrote: > >> I've so far been lurking on the tulip/async discussions, as although > >> I'm interested, I have no specific need for writing high-performance > >> network code. > >> > >> However, I hit a use case today which seems to me to be ideal for an > >> async-style approach, and yet I don't think it's covered by the > >> current PEP. Specifically, I am looking at monitoring a > >> subprocess.Popen object. This is basically an IO loop, but monitoring > >> the 3 pipes to the subprocess (well, only stdout and stderr in my > >> case...). Something like add_reader/add_writer would be fine, except > >> for the fact that (a) they are documented as low-level not for the > >> user, and (b) they don't work in all cases (e.g. in a select-based > >> loop on Windows). > >> > >> I'd like PEP 3156 to include some support for waiting on IO from (one > >> or more) subprocesses like this in a cross-platform way. If there's > >> something in there to do this at the moment, that's great, but it > >> wasn't obvious to me when I looked... > > > > This is a great use case. The right approach would probably be to > > define a new Transport (and an event loop method to create one) that > > wraps pipes going into and out of a subprocess. The new method would > > have a standard API (probably similar to that of subprocess), whereas > > there would be different implementations of the Transport based on > > platform and event loop implementation (similar to the way the > > subprocess module has quite different implementations). > > > > Can you check out the Tulip source code (code.google.com/p/tulip) and > > come up with a patch to do this? I'll gladly review it. It's fine to > > only cover the UNIX case for now. > > I'll have a look. There *is* one problem, though - I imagine it will > be relatively easy to put something together that works on Unix, as > waiting on pipes is covered by the existing select/poll mechanisms. > But I'm on Windows, so I won't be able to test it. And on Windows, > there's no mechanism in place to wait on arbitrary filehandles, so the > process wait mechanism is a much harder nut to crack. Chicken and egg > problem... > What does the subprocess module do on Windows? (I'm in the reverse position, although I have asked the kind IT folks at Dropbox to provide me with a Windows machine.) > Maybe I'll start by looking at waiting on arbitrary filehandles, and > use that to build the process approach. Unfortunately, I don't think > IOCP is any more able to wait on arbitrary files than select - see my > followup to an older thread on Richard's work there. Or maybe I'll set > up a hacking environment in a Linux VM or something. 
That'd be a fun > experience in any case. > I'm eagerly awaiting Richard's response. AFAIK handles on Windows *are* more general than sockets... > I'll have to get my brain round the existing spec as well. I'm finding > it hard to understand why there are so many methods on the event loop > that are specific to particular use cases (for this example, your > suggested method to create the new type of Transport). This is mainly so that the event loop implementation can control the Transport class. Note that it isn't enough to define different Transport classes per platform -- on a single platform there may be multiple event loop implementations (e.g. on Windows you can use Select or IOCP) and these may need different Transport implementations. SO this must really be under control of the event loop object. > My instinct > says that this should *also* be a good test case for a user coming up > with a new type of "event source" and wanting to plug it into the > event loop. Having to add a new method to the event loop seems to > imply this isn't possible. > If the user is okay with solving the problem only for their particular platform and event loop implementation they don't need to add anything to the event loop. But for transports that make it into the PEP, it is essential that alternate implementations (e.g. one that proxies a Twisted Reactor) be in control of the Transport construction. > > OK, off to do a lot of spec reading and then some coding. With luck, > you'll be patient with dumb questions from me on the way :-) > I will be! -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Wed Jan 16 19:22:21 2013 From: guido at python.org (Guido van Rossum) Date: Wed, 16 Jan 2013 10:22:21 -0800 Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events In-Reply-To: References: Message-ID: On Wed, Jan 16, 2013 at 10:20 AM, Geert Jansen wrote: > On Wed, Jan 16, 2013 at 8:10 PM, Paul Moore wrote: > > > I'll have a look. There *is* one problem, though - I imagine it will > > be relatively easy to put something together that works on Unix, as > > waiting on pipes is covered by the existing select/poll mechanisms. > > But I'm on Windows, so I won't be able to test it. And on Windows, > > there's no mechanism in place to wait on arbitrary filehandles, so the > > process wait mechanism is a much harder nut to crack. Chicken and egg > > problem... > > > > Maybe I'll start by looking at waiting on arbitrary filehandles, and > > use that to build the process approach. Unfortunately, I don't think > > IOCP is any more able to wait on arbitrary files than select - see my > > followup to an older thread on Richard's work there. Or maybe I'll set > > up a hacking environment in a Linux VM or something. That'd be a fun > > experience in any case. > > Dealing with subprocesses on Windows in a non-blocking way is a royal > pain. As far as I know, the only option is to use named pipes and > block on them using a thread pool. A few years back I wrote something > that did this, see the link below. However it ain't pretty.. > > https://bitbucket.org/geertj/winpexpect/src/tip/lib/winpexpect.py > Hm, doesn't IOCP support named pipes? -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From eliben at gmail.com Wed Jan 16 19:26:44 2013 From: eliben at gmail.com (Eli Bendersky) Date: Wed, 16 Jan 2013 10:26:44 -0800 Subject: [Python-ideas] question about the Tulip effort In-Reply-To: References: Message-ID: > > > 1. Questions and clarifications should be sent to this list > (python-ideas), > > correct? > > Yes, unless you think it's of little public value, you can always mail > me directly (Tulip is my top priority until the PEP is accepted and > Tulip lands in the 3.4 stdlib). > > > 2. Is there a list of tasks help would be needed with? Is it the the TODO > > file in tulip's root dir? > > Hm, that's mostly reminders for myself, and I don't always update it. > There are also lots of TODOs and XXXs in the source code (the XXXs > mark things that are *definitely* in need of fixing, like missing > docstrings; TODOs are often just for pondering). You can certainly > read through it, and if you see a task you would like to do, ping me > for details. > > Some tasks that I don't think are represented well but where I would > love to get help: > > - Write a somewhat significant server app. I have a somewhat > significant client app (crawl.py) but nothing that exercises the > server API at all. I suspect that there are some awkward things in the > server API that will need fixing. > > - Try writing a significant app for a protocol other than HTTP. > > - Move the StreamReader class out of http_client.py and design an API > to make it easy to hook it up to any protocol. > > - Datagram support (read the section in the PEP on this topic first). > > Great, I'll start looking around. > > 3. How/where to contribute patches? > > I like to get code review requests using codereview.appspot.com (send > them to gvanrossum at gmail.com). Please use the upload.py utility to > upload your patch, don't bother with defining a repository. If I like > your patch I'll probably ask you to submit it yourself, I'll give you > repo access once you've signed a PSF contributor form. > Is that the same contributor form I had to sign for CPython a while ago (I have the asterisk near my name in the issue tracker)? Anyway, sending patches through Rietveld SGTM. Eli -------------- next part -------------- An HTML attachment was scrubbed... URL: From geertj at gmail.com Wed Jan 16 19:27:10 2013 From: geertj at gmail.com (Geert Jansen) Date: Wed, 16 Jan 2013 20:27:10 +0200 Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events In-Reply-To: References: Message-ID: On Wed, Jan 16, 2013 at 8:22 PM, Guido van Rossum wrote: >> Dealing with subprocesses on Windows in a non-blocking way is a royal >> pain. As far as I know, the only option is to use named pipes and >> block on them using a thread pool. A few years back I wrote something >> that did this, see the link below. However it ain't pretty.. >> >> https://bitbucket.org/geertj/winpexpect/src/tip/lib/winpexpect.py > > > Hm, doesn't IOCP support named pipes? Oops, yes, I stand corrected. I got confused between select and IOCP. Sorry. Regards, Geert From solipsis at pitrou.net Wed Jan 16 19:47:56 2013 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 16 Jan 2013 19:47:56 +0100 Subject: [Python-ideas] Parametrized any() and all() ? References: <50F6813E.60503@ziade.org> <50F6847D.2020404@ziade.org> <50F6B4D8.6070002@pearwood.info> <50F6BEA3.7090807@ziade.org> Message-ID: <20130116194756.2efe9afe@pitrou.net> On Wed, 16 Jan 2013 15:52:19 +0100 Tarek Ziad? 
wrote: > > > > For all > > we know, adding a filter function will be a pessimization, not an > > optimization, > > using more memory and/or being slower than using a generator > > expression. It > > certainly isn't clear to me that creating a generator expression like > > (x is None for x in it) is more expensive than creating a filter > > function like > > (lambda x: x is None). > > > > -1 on adding a filter function. > > I abandoned the idea, > > but I'd be curious to understand how creating several > iterables with one that has an 'if', can be more efficient than having a > single > iterable with an 'if'... You know, discussing performance without posting benchmark numbers is generally pointless. Regards Antoine. From shibturn at gmail.com Wed Jan 16 19:54:50 2013 From: shibturn at gmail.com (Richard Oudkerk) Date: Wed, 16 Jan 2013 18:54:50 +0000 Subject: [Python-ideas] The async API of the future In-Reply-To: References: <2CEFACA8-FB96-4C17-9D14-CADEE217F662@molden.no> Message-ID: On 16/01/2013 5:59pm, Paul Moore wrote: > I've just been looking at this, and from what I can see, am I right in > thinking that the IOCP support is*only* for sockets? (I'm not very > familiar with socket programming, so I had a bit of difficulty > following the code). In particular, it can't be used to register > non-socket file objects? From my understanding of the IOCP > documentation on MSDN, this is fundamental - IOCP can only be used on > HANDLE objects that have been opened with the FILE_FLAG_OVERLAPPED > flag, which is not used by "normal" Python IO objects like file > handles and pipes, so it will never be possible to poll these objects > using IOCP. Only sockets are supported because it uses WSARecv()/WSASend(), but it could very easily be made to use ReadFile()/WriteFile(). Then it would work with overlapped pipes (as currently used by multiprocessing) or other files openned with FILE_FLAG_OVERLAPPED. IOCP cannot be used with normal python file objects. But see http://bugs.python.org/issue12939 -- Richard From shibturn at gmail.com Wed Jan 16 20:18:22 2013 From: shibturn at gmail.com (Richard Oudkerk) Date: Wed, 16 Jan 2013 19:18:22 +0000 Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events In-Reply-To: References: Message-ID: On 16/01/2013 6:21pm, Guido van Rossum wrote: > I'm eagerly awaiting Richard's response. AFAIK handles on Windows *are* > more general than sockets... I would like to modify subprocess on Windows to use file-like objects which wrap overlapped pipe handles. Then doing async work with subprocess would become relatively straight forward, and does not really require tulip or IOCP. -- Richard From p.f.moore at gmail.com Wed Jan 16 20:35:15 2013 From: p.f.moore at gmail.com (Paul Moore) Date: Wed, 16 Jan 2013 19:35:15 +0000 Subject: [Python-ideas] The async API of the future In-Reply-To: References: <2CEFACA8-FB96-4C17-9D14-CADEE217F662@molden.no> Message-ID: On 16 January 2013 18:54, Richard Oudkerk wrote: > Only sockets are supported because it uses WSARecv()/WSASend(), but it could > very easily be made to use ReadFile()/WriteFile(). Then it would work with > overlapped pipes (as currently used by multiprocessing) or other files > openned with FILE_FLAG_OVERLAPPED. Oh, cool. I hadn't checked the source to see if multiprocessing opened its pipes with FILE_FLAG_OVERLAPPED. Good to know it does. And yes, if normal file objects were opened that way, that would allow those to be used as well. 
Paul From guido at python.org Wed Jan 16 21:16:09 2013 From: guido at python.org (Guido van Rossum) Date: Wed, 16 Jan 2013 12:16:09 -0800 Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events In-Reply-To: References: Message-ID: On Wed, Jan 16, 2013 at 11:18 AM, Richard Oudkerk wrote: > On 16/01/2013 6:21pm, Guido van Rossum wrote: > >> I'm eagerly awaiting Richard's response. AFAIK handles on Windows *are* >> more general than sockets... >> > > I would like to modify subprocess on Windows to use file-like objects > which wrap overlapped pipe handles. Then doing async work with subprocess > would become relatively straight forward, and does not really require tulip > or IOCP. But when you want to use it in the context of an event loop would it still be *possible* to hook it up to that using a transport or add_reader/add_writer? -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Wed Jan 16 21:16:58 2013 From: guido at python.org (Guido van Rossum) Date: Wed, 16 Jan 2013 12:16:58 -0800 Subject: [Python-ideas] question about the Tulip effort In-Reply-To: References: Message-ID: On Wed, Jan 16, 2013 at 10:26 AM, Eli Bendersky wrote: > Is that the same contributor form I had to sign for CPython a while ago (I > have the asterisk near my name in the issue tracker)? > The same. So you're all set. > Anyway, sending patches through Rietveld SGTM. > Looking forward to them! -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From shibturn at gmail.com Wed Jan 16 21:21:05 2013 From: shibturn at gmail.com (Richard Oudkerk) Date: Wed, 16 Jan 2013 20:21:05 +0000 Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events In-Reply-To: References: Message-ID: On 16/01/2013 8:16pm, Guido van Rossum wrote: > But when you want to use it in the context of an event loop would it > still be *possible* to hook it up to that using a transport or > add_reader/add_writer? Assuming you use an IOCP reactor, yes. -- Richard From greg.ewing at canterbury.ac.nz Wed Jan 16 21:45:40 2013 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 17 Jan 2013 09:45:40 +1300 Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events In-Reply-To: References: Message-ID: <50F71174.7070803@canterbury.ac.nz> Guido van Rossum wrote: > If the user is okay with solving the problem only for their particular > platform and event loop implementation they don't need to add anything > to the event loop. In this case, shouldn't it be sufficient for tulip to provide a way of wrapping pipes, whatever they may look like on the platform? I don't see why a Transport specific to subprocesses should be required. -- Greg From guido at python.org Wed Jan 16 22:16:42 2013 From: guido at python.org (Guido van Rossum) Date: Wed, 16 Jan 2013 13:16:42 -0800 Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events In-Reply-To: <50F71174.7070803@canterbury.ac.nz> References: <50F71174.7070803@canterbury.ac.nz> Message-ID: On Wed, Jan 16, 2013 at 12:45 PM, Greg Ewing wrote: > Guido van Rossum wrote: > >> If the user is okay with solving the problem only for their particular >> platform and event loop implementation they don't need to add anything to >> the event loop. >> > > In this case, shouldn't it be sufficient for tulip to provide > a way of wrapping pipes, whatever they may look like on the > platform? 
I don't see why a Transport specific to subprocesses > should be required. Tulip on UNIX already wraps pipes (and ptys, and certain other things), since the add_reader() API takes any filedescriptor (though it makes no sense for disk files because those are always considered readable). The issue is that on other platforms (read: Windows) you have to do something completely different, and hook it up to the native (IOCP) async eventloop differently. The Transport/Protocol abstraction however would be completely appropriate in both cases though (or a slightly modified version that handles stdout/stderr separately). So, just like the subprocess module contains two completely disjoint implementations for UNIX and Windows, implementing the same API, PEP 3156 could also have a standard API for running a subprocess connected with async streams connected to stdin, stdout, stderr, backed by different implementations. -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.f.moore at gmail.com Thu Jan 17 13:23:10 2013 From: p.f.moore at gmail.com (Paul Moore) Date: Thu, 17 Jan 2013 12:23:10 +0000 Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events In-Reply-To: References: Message-ID: On 16 January 2013 18:21, Guido van Rossum wrote: >> OK, off to do a lot of spec reading and then some coding. With luck, >> you'll be patient with dumb questions from me on the way :-) > > I will be! OK, I'm reading the PEP through now. I'm happy with the basics of the event loop, and it seems fine to me. When I reached create_transport, I had to skip ahead to the definitions of transport and protocol, as create_transport makes no sense if you don't know about those. Once I've read that, though, the whole transport/protocol mechanism seems to make reasonable sense to me. Although the host and port arguments to create_transport are clearly irrelevant to the case of a transport managing a process as a data source. So (a) I see why you say I'd need a new transport creation method, but (b) it strikes me that something more general that covered both cases (and any others that may come up later) would be better. On the other hand, given the existence of create_transport, I'm now struggling to understand why a user would ever use add_reader/add_writer rather than using a transport/protocol. And if they do have a reason to do so, why does a similar reason not apply to having an add_pipe type of method for waiting on (subprocess) pipes? In general, it still feels to me like the socket use case is being treated as "special", and other data sources and sinks (subprocesses being my use case, but I'm sure others exist) are either second-class or require a whole set of their own specialised methods, which isn't practical. As a strawman type of argument in favour of extensibility, consider a very specialist user with a hardware device that sends input via (say) a serial port. I can easily imagine that user wanting to plug his device data into the Python event loop. As this is a very specialised area, I wouldn't expect the core code to be able to help, but I would expect him to be able to write code that plugs into the standard event loop seamlessly. Ideally, I'd like to use the subprocess case as a proof that this is practical. Does that make sense? Paul. 
From ica at iki.fi Thu Jan 17 13:44:03 2013 From: ica at iki.fi (Ilkka Pelkonen) Date: Thu, 17 Jan 2013 14:44:03 +0200 Subject: [Python-ideas] Fwd: Boolean behavior of None In-Reply-To: References: Message-ID: Hi all, I ran into an issue in expression evaluation with Python for Windows 2.7.3. Consider the following code: expected_result = (expected_string != 'TRUE') # Boolean element = find_element() # Can return None or an instance of Element flag = (element and element.is_visible()) if flag == expected_result: ..# Ok ..return # Otherwise perform some failure related stuff. This code does not work. What happens on the 'flag' assignment row, is that if 'element' is None, the expression returns None, not False. This makes the if comparison to fail if expected_result is False, since boolean False is not None. To me as a primarily C++ programmer it seems there could be two different changes here, either change the behavior of the 'and' expression, forcing it to return Boolean even if the latter part is not evaluated, and/or make the comparison "False == None" return True. Although potentially complex, I'd myself go for the first approach. It seems to me more logical that False != None than an 'and' expression returning non-boolean. Also the latter change might require people change their code, while the former should not require any modifications. This behavior probably results in lots of errors when people like me, used to more traditional languages, take on Python in a serious manner. I like the concept 'pythonic', and am trying to apply it to practice like above. Hoping to hear your thoughts, Regards, Ilkka Pelkonen -------------- next part -------------- An HTML attachment was scrubbed... URL: From phd at phdru.name Thu Jan 17 13:51:05 2013 From: phd at phdru.name (Oleg Broytman) Date: Thu, 17 Jan 2013 16:51:05 +0400 Subject: [Python-ideas] Fwd: Boolean behavior of None In-Reply-To: References: Message-ID: <20130117125105.GA2609@iskra.aviel.ru> On Thu, Jan 17, 2013 at 02:44:03PM +0200, Ilkka Pelkonen wrote: > expected_result = (expected_string != 'TRUE') # Boolean > element = find_element() # Can return None or an instance of Element > flag = (element and element.is_visible()) > if flag == expected_result: > ..# Ok > ..return > # Otherwise perform some failure related stuff. > > This code does not work. What happens on the 'flag' assignment row, is that > if 'element' is None, the expression returns None, not False. This makes > the if comparison to fail if expected_result is False, since boolean False > is not None. No need to change the language. Just do flag = bool(element and element.is_visible()) Oleg. -- Oleg Broytman http://phdru.name/ phd at phdru.name Programmers don't die, they just GOSUB without RETURN. From jsbueno at python.org.br Thu Jan 17 13:58:11 2013 From: jsbueno at python.org.br (Joao S. O. Bueno) Date: Thu, 17 Jan 2013 10:58:11 -0200 Subject: [Python-ideas] Fwd: Boolean behavior of None In-Reply-To: References: Message-ID: On 17 January 2013 10:44, Ilkka Pelkonen wrote: > Hi all, > I ran into an issue in expression evaluation with Python for Windows 2.7.3. > Consider the following code: > > expected_result = (expected_string != 'TRUE') # Boolean > element = find_element() # Can return None or an instance of Element > flag = (element and element.is_visible()) > if flag == expected_result: > ..# Ok > ..return > # Otherwise perform some failure related stuff. > > This code does not work. 
> What happens on the 'flag' assignment row is that if 'element' is None, the expression returns None, not False. This makes the if comparison fail if expected_result is False, since boolean False is not None.
>
> To me as a primarily C++ programmer it seems there could be two different changes here: either change the behavior of the 'and' expression, forcing it to return Boolean even if the latter part is not evaluated, and/or make the comparison "False == None" return True.

Hi Ilkka. My personal suggestion - rewrite your code to read

flag = bool(element and element.is_visible())

instead. That way, at the cost of merely rethinking your expression, you don't have to propose changing a 20-year-old behavior in a language with billions of lines of code in the wild that must stay compatible, or wait for the next major "4.0" release of Python before you can write your code. js -><- From ilkka.pelkonen at iki.fi Thu Jan 17 14:10:45 2013 From: ilkka.pelkonen at iki.fi (Ilkka Pelkonen) Date: Thu, 17 Jan 2013 15:10:45 +0200 Subject: [Python-ideas] Fwd: Boolean behavior of None In-Reply-To: <20130117125105.GA2609@iskra.aviel.ru> References: <20130117125105.GA2609@iskra.aviel.ru> Message-ID: Hi Oleg, others, It's not that it can't be done, just that it does something you don't expect. I've been professionally working with C++ for nine years in large-scale Windows systems, and I do expect a boolean expression to return a boolean value. Or, can you show me an example of how the developer would benefit from the current behavior? Any operator traditionally considered as boolean will do. Regards, Ilkka On Thu, Jan 17, 2013 at 2:51 PM, Oleg Broytman wrote:
> On Thu, Jan 17, 2013 at 02:44:03PM +0200, Ilkka Pelkonen wrote:
> > expected_result = (expected_string != 'TRUE')  # Boolean
> > element = find_element()  # Can return None or an instance of Element
> > flag = (element and element.is_visible())
> > if flag == expected_result:
> > ..# Ok
> > ..return
> > # Otherwise perform some failure related stuff.
> >
> > This code does not work. What happens on the 'flag' assignment row is that if 'element' is None, the expression returns None, not False. This makes the if comparison fail if expected_result is False, since boolean False is not None.
>
> No need to change the language. Just do
>
> flag = bool(element and element.is_visible())
>
> Oleg.
> --
> Oleg Broytman http://phdru.name/ phd at phdru.name
> Programmers don't die, they just GOSUB without RETURN.
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas
> -------------- next part -------------- An HTML attachment was scrubbed... URL: From phd at phdru.name Thu Jan 17 14:23:11 2013 From: phd at phdru.name (Oleg Broytman) Date: Thu, 17 Jan 2013 17:23:11 +0400 Subject: [Python-ideas] Fwd: Boolean behavior of None In-Reply-To: References: <20130117125105.GA2609@iskra.aviel.ru> Message-ID: <20130117132311.GA5971@iskra.aviel.ru> On Thu, Jan 17, 2013 at 03:10:45PM +0200, Ilkka Pelkonen wrote:
> It's not that it can't be done, just that it does something you don't expect. I've been professionally working with C++ for nine years in large-scale Windows systems, and I do expect a boolean expression to return a boolean value.

It does something Python developers expect. It's a well-known behaviour and there are many programs that rely on that behaviour.
> Or, can you show me an example of how the developer would benefit from the current behavior? Any operator traditionally considered as boolean will do.

address = user and user.address
if address is None:
    raise ValueError("Unknown address")

In the example, neither user nor user.address is allowed to be None. Oleg. -- Oleg Broytman http://phdru.name/ phd at phdru.name Programmers don't die, they just GOSUB without RETURN. From jsbueno at python.org.br Thu Jan 17 14:23:47 2013 From: jsbueno at python.org.br (Joao S. O. Bueno) Date: Thu, 17 Jan 2013 11:23:47 -0200 Subject: [Python-ideas] Fwd: Boolean behavior of None In-Reply-To: References: <20130117125105.GA2609@iskra.aviel.ru> Message-ID: On 17 January 2013 11:10, Ilkka Pelkonen wrote:
> Hi Oleg, others,
> It's not that it can't be done, just that it does something you don't expect. I've been professionally working with C++ for nine years in large-scale Windows systems, and I do expect a boolean expression to return a boolean value.
>
> Or, can you show me an example of how the developer would benefit from the current behavior? Any operator traditionally considered as boolean will do.

Ilkka, Python is a dynamically typed language. As such, there is no strict type checking for most operations. The behavior of boolean operations for Python 2.x is well defined and described here: http://docs.python.org/2/reference/expressions.html#boolean-operations

If you are testing for "truthfulness" of a given object, using an "==" comparison for that, as in your "if flag == expected_result:", is definitely a non-recommended practice. Objects that have a False or True value have always been well defined in Python, and that definition follows common sense closely on what should be False. No one would expect "None" to be True.

The behavior of yielding the first part of the expression in a failed "and" operation is not unique to Python and, AFAIK, was inspired by C - and there are tons of code that rely directly on this behavior. (Even though I'd agree that a lot of this code, emulating the ternary operator in the days before it became available in Python 2.5, is very poorly written.)

> Regards,
> Ilkka
>

From solipsis at pitrou.net Thu Jan 17 15:32:43 2013 From: solipsis at pitrou.net (Antoine Pitrou) Date: Thu, 17 Jan 2013 15:32:43 +0100 Subject: [Python-ideas] Boolean behavior of None References: <20130117125105.GA2609@iskra.aviel.ru> Message-ID: <20130117153243.72fd7508@pitrou.net> Le Thu, 17 Jan 2013 15:10:45 +0200, Ilkka Pelkonen a écrit :
> Hi Oleg, others,
> It's not that it can't be done, just that it does something you don't expect. I've been professionally working with C++ for nine years in large-scale Windows systems, and I do expect a boolean expression to return a boolean value.

"and" is not a boolean operator, it is a short-circuiting control flow operator. Basically, what you are looking for is:

flag = element.is_visible() if element else False

(or, more explicitly:

flag = element.is_visible() if element is not None else False

) Regards Antoine.
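For readers skimming the thread, a minimal interactive illustration of the behaviour under discussion (the element value is assumed for the example):

element = None                                  # find_element() found nothing
flag = element and element.is_visible()         # short-circuits; is_visible() never runs
print(flag)                                     # None, not False
print(bool(element and element.is_visible()))   # False - the suggested fix
print('' or 'default')                          # 'or' returns an operand too: 'default'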
From p.f.moore at gmail.com Thu Jan 17 15:35:13 2013 From: p.f.moore at gmail.com (Paul Moore) Date: Thu, 17 Jan 2013 14:35:13 +0000 Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events In-Reply-To: References: Message-ID: On 17 January 2013 12:23, Paul Moore wrote:
> In general, it still feels to me like the socket use case is being treated as "special", and other data sources and sinks (subprocesses being my use case, but I'm sure others exist) are either second-class or require a whole set of their own specialised methods, which isn't practical.

Thinking about this some more. The key point is that for any event loop there can only be one "source of events" in terms of the thing that the event loop checks when there are no pending tasks. So the event loop is roughly:

while True:
    process_ready_queue()
    new_events = block_on_event_source(src, timeout=N)
    add_to_ready_queue(new_events)
    add_timed_events_to_ready_queue()

The source has to be a unique object, as there's an OS-level wait in there, and you can't do two of them at once.

As things stand, methods like add_reader on the event loop object should really be methods on the event source object (and indeed, that's more or less what Tulip does internally). Would it not make more sense to explicitly expose the event source? This is (I guess) what the section "Choosing an Event Loop Implementation" in the PEP is about. But if the event source is a user-visible object, methods like add_reader would no longer be optional event loop methods, but rather they would be methods of the event source (but only for those event sources for which they make sense).

The point here is that there's a lot of event loop machinery (ready queue, timed events, run methods) that is independent of the precise means by which you poll the OS to ask "has anything interesting happened?" Abstracting out that machinery would seem to me to make the design cleaner and more understandable.

Other benefits - our hypothetical person with a serial port device can build his own event source and plug it into the event loop directly. Or someone could offer a multiplexer that combines two separate sources by running them in different threads and merging the output on a queue (that may be YAGNI, though).

This is really just something to think about while I'm trying to build a Linux development environment so that I can do a Unix proof of concept. Once I get started on that, I'll think about the protocol/transport stuff.

Paul

From ned at nedbatchelder.com Thu Jan 17 15:54:28 2013 From: ned at nedbatchelder.com (Ned Batchelder) Date: Thu, 17 Jan 2013 09:54:28 -0500 Subject: [Python-ideas] Fwd: Boolean behavior of None In-Reply-To: References: <20130117125105.GA2609@iskra.aviel.ru> Message-ID: <50F810A4.8040802@nedbatchelder.com> On 1/17/2013 8:10 AM, Ilkka Pelkonen wrote:
> Hi Oleg, others,
> It's not that it can't be done, just that it does something you don't expect. I've been professionally working with C++ for nine years in large-scale Windows systems, and I do expect a boolean expression to return a boolean value.
>
> Or, can you show me an example of how the developer would benefit from the current behavior? Any operator traditionally considered as boolean will do.
>
> Regards,
> Ilkka

Ilkka, welcome to the Python community. Python is a wonderfully expressive language once you learn its subtleties. Python and C++ are different. If they weren't, we'd only have one language, not two. The short-circuiting operations "and" and "or" behave as they do for a reason.
As an example, a common way to deal with default values: def accumulate(value, to=None): to = to or [] to.append(value) # Forget whether this is a good function or not.. return to If "or" always returned a boolean, as I'm assuming you'd prefer, then we'd have a much clumsier time defaulting values like this. --Ned. -------------- next part -------------- An HTML attachment was scrubbed... URL: From grosser.meister.morti at gmx.net Thu Jan 17 17:07:13 2013 From: grosser.meister.morti at gmx.net (=?ISO-8859-1?Q?Mathias_Panzenb=F6ck?=) Date: Thu, 17 Jan 2013 17:07:13 +0100 Subject: [Python-ideas] Fwd: Boolean behavior of None In-Reply-To: References: Message-ID: <50F821B1.9070905@gmx.net> This change would break a lot of existing code and would make Python awkwardly stand out from all other modern dynamically typed languages (e.g. Ruby and JavaScript). You often write things like this: def foo(bar=None): bar = bar or [] ... Or: obj = obj and obj.property The proposed change would needlessly complicate these things and break existing code. Forcing a bool type really doesn't require that much code (bool(expr)) and is good practise anyway. On 01/17/2013 01:44 PM, Ilkka Pelkonen wrote: > Hi all, > I ran into an issue in expression evaluation with Python for Windows 2.7.3. Consider the following code: > > expected_result = (expected_string != 'TRUE') # Boolean > element = find_element() # Can return None or an instance of Element > flag = (element and element.is_visible()) > if flag == expected_result: > ..# Ok > ..return > # Otherwise perform some failure related stuff. > > This code does not work. What happens on the 'flag' assignment row, is that if 'element' is None, the expression returns None, not False. This makes the if comparison to fail if expected_result is False, since boolean False is not None. > > To me as a primarily C++ programmer it seems there could be two different changes here, either change the behavior of the 'and' expression, forcing it to return Boolean even if the latter part is not evaluated, and/or make the comparison "False == None" return True. Although potentially complex, I'd > myself go for the first approach. It seems to me more logical that False != None than an 'and' expression returning non-boolean. Also the latter change might require people change their code, while the former should not require any modifications. > > This behavior probably results in lots of errors when people like me, used to more traditional languages, take on Python in a serious manner. I like the concept 'pythonic', and am trying to apply it to practice like above. > > Hoping to hear your thoughts, > Regards, > > Ilkka Pelkonen > > > From ilkka.pelkonen at iki.fi Thu Jan 17 18:03:12 2013 From: ilkka.pelkonen at iki.fi (Ilkka Pelkonen) Date: Thu, 17 Jan 2013 19:03:12 +0200 Subject: [Python-ideas] Fwd: Boolean behavior of None In-Reply-To: <50F821B1.9070905@gmx.net> References: <50F821B1.9070905@gmx.net> Message-ID: Thank you all. It was just that when I started with Python, everything worked right like I expected, and I found the ways to do anything I've needed all the way until today, so when I came across with this, it appeared to me a clear bug in the language/interpreter. Casting to bool is indeed a good solution and practice, and I do now agree that there's no point in changing the language - like Antoine said, we're talking control flow operators here, not exactly boolean. (This might be a good addition to the documentation.) 
Thank you Ned for the warm welcome and everyone for your input. I hope to be able to contribute in the future. :) Best Regards, Ilkka On Thu, Jan 17, 2013 at 6:07 PM, Mathias Panzenb?ck < grosser.meister.morti at gmx.net> wrote: > This change would break a lot of existing code and would make Python > awkwardly stand out from all > other modern dynamically typed languages (e.g. Ruby and JavaScript). You > often write things like > this: > > def foo(bar=None): > bar = bar or [] > ... > > Or: > > obj = obj and obj.property > > The proposed change would needlessly complicate these things and break > existing code. Forcing a > bool type really doesn't require that much code (bool(expr)) and is good > practise anyway. > > > On 01/17/2013 01:44 PM, Ilkka Pelkonen wrote: > >> Hi all, >> I ran into an issue in expression evaluation with Python for Windows >> 2.7.3. Consider the following code: >> >> expected_result = (expected_string != 'TRUE') # Boolean >> element = find_element() # Can return None or an instance of Element >> flag = (element and element.is_visible()) >> if flag == expected_result: >> ..# Ok >> ..return >> # Otherwise perform some failure related stuff. >> >> This code does not work. What happens on the 'flag' assignment row, is >> that if 'element' is None, the expression returns None, not False. This >> makes the if comparison to fail if expected_result is False, since boolean >> False is not None. >> >> To me as a primarily C++ programmer it seems there could be two different >> changes here, either change the behavior of the 'and' expression, forcing >> it to return Boolean even if the latter part is not evaluated, and/or make >> the comparison "False == None" return True. Although potentially complex, >> I'd >> myself go for the first approach. It seems to me more logical that False >> != None than an 'and' expression returning non-boolean. Also the latter >> change might require people change their code, while the former should not >> require any modifications. >> >> This behavior probably results in lots of errors when people like me, >> used to more traditional languages, take on Python in a serious manner. I >> like the concept 'pythonic', and am trying to apply it to practice like >> above. >> >> Hoping to hear your thoughts, >> Regards, >> >> Ilkka Pelkonen >> >> >> >> ______________________________**_________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/**mailman/listinfo/python-ideas > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ilkka.pelkonen at iki.fi Thu Jan 17 19:43:01 2013 From: ilkka.pelkonen at iki.fi (Ilkka Pelkonen) Date: Thu, 17 Jan 2013 20:43:01 +0200 Subject: [Python-ideas] Fwd: Boolean behavior of None In-Reply-To: References: <50F821B1.9070905@gmx.net> Message-ID: To sum this up, I also thought of the None vs False issue some more, and found a use case which can bring a benefit for the separation: a function could normally return True or False, or None in a special case. Because of the separation, the developer can handle all the cases appropriately. Too little thinking, too much action. I wonder why I didn't google it this time, the issue is all over the net like any question you can imagine to ask, at least in case of Python. :) Sorry for the trouble. I'll stay around with a little lower profile. :) Regards again, Ilkka On Thu, Jan 17, 2013 at 7:03 PM, Ilkka Pelkonen wrote: > Thank you all. 
It was just that when I started with Python, everything > worked right like I expected, and I found the ways to do anything I've > needed all the way until today, so when I came across with this, it > appeared to me a clear bug in the language/interpreter. Casting to bool is > indeed a good solution and practice, and I do now agree that there's no > point in changing the language - like Antoine said, we're talking control > flow operators here, not exactly boolean. (This might be a good addition to > the documentation.) > > Thank you Ned for the warm welcome and everyone for your input. I hope to > be able to contribute in the future. :) > > Best Regards, > Ilkka > > > On Thu, Jan 17, 2013 at 6:07 PM, Mathias Panzenb?ck < > grosser.meister.morti at gmx.net> wrote: > >> This change would break a lot of existing code and would make Python >> awkwardly stand out from all >> other modern dynamically typed languages (e.g. Ruby and JavaScript). You >> often write things like >> this: >> >> def foo(bar=None): >> bar = bar or [] >> ... >> >> Or: >> >> obj = obj and obj.property >> >> The proposed change would needlessly complicate these things and break >> existing code. Forcing a >> bool type really doesn't require that much code (bool(expr)) and is good >> practise anyway. >> >> >> On 01/17/2013 01:44 PM, Ilkka Pelkonen wrote: >> >>> Hi all, >>> I ran into an issue in expression evaluation with Python for Windows >>> 2.7.3. Consider the following code: >>> >>> expected_result = (expected_string != 'TRUE') # Boolean >>> element = find_element() # Can return None or an instance of Element >>> flag = (element and element.is_visible()) >>> if flag == expected_result: >>> ..# Ok >>> ..return >>> # Otherwise perform some failure related stuff. >>> >>> This code does not work. What happens on the 'flag' assignment row, is >>> that if 'element' is None, the expression returns None, not False. This >>> makes the if comparison to fail if expected_result is False, since boolean >>> False is not None. >>> >>> To me as a primarily C++ programmer it seems there could be two >>> different changes here, either change the behavior of the 'and' expression, >>> forcing it to return Boolean even if the latter part is not evaluated, >>> and/or make the comparison "False == None" return True. Although >>> potentially complex, I'd >>> myself go for the first approach. It seems to me more logical that False >>> != None than an 'and' expression returning non-boolean. Also the latter >>> change might require people change their code, while the former should not >>> require any modifications. >>> >>> This behavior probably results in lots of errors when people like me, >>> used to more traditional languages, take on Python in a serious manner. I >>> like the concept 'pythonic', and am trying to apply it to practice like >>> above. >>> >>> Hoping to hear your thoughts, >>> Regards, >>> >>> Ilkka Pelkonen >>> >>> >>> >>> ______________________________**_________________ >> Python-ideas mailing list >> Python-ideas at python.org >> http://mail.python.org/**mailman/listinfo/python-ideas >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Thu Jan 17 20:10:57 2013 From: guido at python.org (Guido van Rossum) Date: Thu, 17 Jan 2013 11:10:57 -0800 Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events In-Reply-To: References: Message-ID: (I'm responding to two separate messages in one response.) 
On Thu, Jan 17, 2013 at 4:23 AM, Paul Moore wrote: > OK, I'm reading the PEP through now. I'm happy with the basics of the > event loop, and it seems fine to me. When I reached create_transport, > I had to skip ahead to the definitions of transport and protocol, as > create_transport makes no sense if you don't know about those. Whoops, I should fix the order in the PEP, or at least insert forward references. > Once > I've read that, though, the whole transport/protocol mechanism seems > to make reasonable sense to me. Although the host and port arguments > to create_transport are clearly irrelevant to the case of a transport > managing a process as a data source. So (a) I see why you say I'd need > a new transport creation method, but (b) it strikes me that something > more general that covered both cases (and any others that may come up > later) would be better. This is why there is a TBD item suggesting to rename create_transport() to create_connection() -- this method is for creating the most common type of transport only, i.e. one that connects a client to a server given by host and port. > On the other hand, given the existence of create_transport, I'm now > struggling to understand why a user would ever use > add_reader/add_writer rather than using a transport/protocol. And if > they do have a reason to do so, why does a similar reason not apply to > having an add_pipe type of method for waiting on (subprocess) pipes? add_reader and friends exist for the benefit of Transport implementations. The PEP even says that not all event loops need to implement these (though on UNIXy systems it is better if they do, and I am considering removing or weakening this language. Because on UNIX pipes are just file descriptors, and work fine with select()/poll()/etc., there is no need for add_pipe() (assuming that API would take an existing pipe filedescriptor and a callback), since add_reader() will do the right thing. (Or add_writer() for the other end.) > In general, it still feels to me like the socket use case is being > treated as "special", and other data sources and sinks (subprocesses > being my use case, but I'm sure others exist) are either second-class > or require a whole set of their own specialised methods, which isn't > practical. Well, sockets are treated special because on Windows they *are* special. At least the select() system call only works for sockets. IOCP supports other types of unusual handles, but the ways to create handles you can use with it are mostly custom. Basically, if you want to write code that works both on Windows and on UNIX, you have to limit yourself to sockets. (And you shouldn't use add_reader and friends either, because that limits you to the SelectSelector, whereas if you use the transport/protocol API you will be compatible with either that or IOCPSelector.) > As a strawman type of argument in favour of extensibility, consider a > very specialist user with a hardware device that sends input via (say) > a serial port. I can easily imagine that user wanting to plug his > device data into the Python event loop. As this is a very specialised > area, I wouldn't expect the core code to be able to help, but I would > expect him to be able to write code that plugs into the standard event > loop seamlessly. Ideally, I'd like to use the subprocess case as a > proof that this is practical. > > Does that make sense? Yes, it does make sense, but you have to choose whether to do it on Windows or on UNIX. 
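To make the UNIX half of that choice concrete, here is a rough sketch of hooking a child process's stdout into the loop with add_reader(). This is not tulip code: the loop object is merely assumed to provide PEP 3156's add_reader()/remove_reader(), the helper name is invented, and error handling is elided.

import fcntl
import os
import subprocess

def watch_stdout(loop, args, data_handler):
    """Run a child process; feed its stdout chunks to data_handler via the loop."""
    proc = subprocess.Popen(args, stdout=subprocess.PIPE)
    fd = proc.stdout.fileno()
    # Make reads non-blocking so the callback can never stall the event loop.
    flags = fcntl.fcntl(fd, fcntl.F_GETFL)
    fcntl.fcntl(fd, fcntl.F_SETFL, flags | os.O_NONBLOCK)

    def on_readable():
        data = os.read(fd, 4096)
        if data:
            data_handler(data)
        else:
            # An empty read means EOF: the child closed its end of the pipe.
            loop.remove_reader(fd)
            proc.wait()

    loop.add_reader(fd, on_readable)
    return proc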
If you use UNIX, presumably your serial port is accessible via a file descriptor that works with select/poll/etc. -- if it doesn't, you are going to have a really hard time integrating it with the event loop, you may have to use a separate thread that talks to the device and sends the data to the event loop over a pipe or something. On Windows, I have no idea how it would work, but I presume that serial port drivers are somehow hooked up to "handles" and "waitable events" (or whatever the Microsoft terminology is -- I am about to get educated about this) and then presumably it will integrate nicely with IOCP (but not with Select). I think that for UNIX, hooking a subprocess up to a transport should be easy enough (except perhaps for the stdout/stderr distinction), and your transport should use add_reader/writer. For Windows I am not sure but you can probably crib the details from the Windows-specific code in subprocess.py in the stdlib. On Thu, Jan 17, 2013 at 6:35 AM, Paul Moore wrote: > On 17 January 2013 12:23, Paul Moore wrote: >> In general, it still feels to me like the socket use case is being >> treated as "special", and other data sources and sinks (subprocesses >> being my use case, but I'm sure others exist) are either second-class >> or require a whole set of their own specialised methods, which isn't >> practical. > > Thinking about this some more. The key point is that for any event > loop there can only be one "source of events" in terms of the thing > that the event loop checks when there are no pending tasks. So the > event loop is roughly: > > while True: > process_ready_queue() > new_events = block_on_event_source(src, timeout=N) > add_to_ready_queue(new_events) > add_timed_events_to_ready_queue() > > The source has to be a unique object, as there's an OS-level wait in > there, and you can't do two of them at once. Right, that's the idea. > As things stand, methods like add_reader on the event loop object > should really be methods on the event source object (and indeed, > that's more or less what Tulip does internally). Would it not make > more sense to explicitly expose the event source? This is (I guess) > what the section "Choosing an Event Loop Implementation" in the PEP is > about. But if the event source is a user-visible object, methods like > add_reader would no longer be optional event loop methods, but rather > they would be methods of the event source (but only for those event > sources for which they make sense). The problem with this idea is (you may have guessed it by now :-) ... Windows. On Windows, at least when using a (at this moment purely hypothetical) IOCP-based implementation of the event loop, there will *not* be an underlying Selector object. Please track down discussion of IOCP in older posts on this list. IOCP requires you to use a different paradigm, which is supported by the separate methods sock_recv(), sock_sendall() and so on. For I/O objects that are not sockets, different methods are needed, but the idea is the same: you specify the I/O, and you get a callback when it is done. This in contrast with the UNIX selector, where you specify the file descriptor and I/O direction, and you get a callback when you can read/write without blocking. This is why the event loop has the higher-level transport/protocol-based APIs: an IOCP implementation of these creates instances of a completely different transport implementation, which however have the same interface and *meaning* as the UNIX transports (e.g. 
the transport created by create_connection() connects to a host and port over TCP/IP and calls the protocol's connection_made(), data_received(), connection_lost() methods). So if you want a transport that encapsulates a subprocess (instead of a TCP/IP connection), and you want to support both UNIX and Windows, you have to provide (at least) two separate implementations: one on UNIX that uses add_reader() and friends, and one on Windows that uses (I don't know what, but something). Each of these implementations by itself is dependent on the platform (and the specific event loop implementation); but together they cover all supported platforms. If you develop this as 3rd party code, and you want your users not to have to write platform-specific code, you have to write a "start subprocess" function that inspects the platform (and the event loop implementation) and then imports and instantiates the right transport implementation for the platform. If we want to add this to the PEP, the right thing is to add a "start subprocess" method to the event loop API (which can be identical to the start subprocess function in your 3rd party package :-). > The point here is that there's a lot of event loop machinery (ready > queue, timed events, run methods) that are independent of the precise > means by which you poll the OS to ask "has anything interesting > happened?" Abstracting out that machinery would seem to me to make the > design cleaner and more understandable. It is abstracted out in the implementation, but I hope I have explained with sufficient clarity why it should not be abstracted out in the PEP: the Selector abstraction only works on UNIX (or with sockets on Windows). Also note a subtlety in the PEP: while it describes a platform-independent API, it doesn't preclude that some parts of that API may have platform-specific behaviors -- for example, add_reader() may only take sockets on Windows (and in Jython, I suspect, where select() only works with sockets), but takes other file descriptors on UNIX, so you can implement your own subprocess transport for UNIX. Similarly, the PEP describes the interface between transports and protocols, but does not give you a way to construct a transport except for TCP/IP connections. But the abstraction is usable for other purposes too, and this is intentional! (E.g. you may be able to create a transport that uses a subprocess running ssh to talk to a remote server, which might be used to "tunnel" HTTP, so it would make sense to connect this custom transport with a standard HTTP protocol implementation.) > Other benefits - our hypothetical person with a serial port device can > build his own event source and plug it into the event loop directly. I think I've answered that above. > Or someone could offer a multiplexer that combines two separate > sources by running them in different threads and merging the output on > a queue (that may be YAGNI, though). I think there are Twisted reactor implementations that do things like this. My hope is that a proxy between the Twisted reactor and the PEP 3156 interface will enable this too -- and the event loop APIs for working with transports and protocols are essential for this purpose. (Twisted has a working IOCP reactor, FWIW.) > This is really just something to think about while I'm trying to build > a Linux development environment so that I can do a Unix proof of > concept. Once I get started on that, I'll think about the > protocol/transport stuff. 
I think it would be tremendously helpful if you tried to implement the UNIX version of the subprocess transport. (Note that AFAIK Twisted has one of these too, maybe you can get some implementation ideas from them.) -- --Guido van Rossum (python.org/~guido) From ncoghlan at gmail.com Thu Jan 17 22:08:09 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 18 Jan 2013 07:08:09 +1000 Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events In-Reply-To: References: Message-ID: Hmm, there may still be something to the idea of clearly separating out "for everyone" and "for transports" methods. Even if that's just a split in the documentation, similar to the "for everyone" vs "for the executor" split in the concurrent.futures implementation. -- Sent from my phone, thus the relative brevity :) -------------- next part -------------- An HTML attachment was scrubbed... URL: From greg.ewing at canterbury.ac.nz Thu Jan 17 23:40:08 2013 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 18 Jan 2013 11:40:08 +1300 Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events In-Reply-To: References: Message-ID: <50F87DC8.1060000@canterbury.ac.nz> Paul Moore wrote: > Although the host and port arguments > to create_transport are clearly irrelevant to the case of a transport > managing a process as a data source. Shouldn't this be called create_internet_transport or something like that? -- Greg From guido at python.org Fri Jan 18 00:39:49 2013 From: guido at python.org (Guido van Rossum) Date: Thu, 17 Jan 2013 15:39:49 -0800 Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events In-Reply-To: References: Message-ID: On Thu, Jan 17, 2013 at 1:08 PM, Nick Coghlan wrote: > Hmm, there may still be something to the idea of clearly separating out "for > everyone" and "for transports" methods. Even if that's just a split in the > documentation, similar to the "for everyone" vs "for the executor" split in > the concurrent.futures implementation. Good idea, I like this idea. -- --Guido van Rossum (python.org/~guido) From guido at python.org Fri Jan 18 00:40:30 2013 From: guido at python.org (Guido van Rossum) Date: Thu, 17 Jan 2013 15:40:30 -0800 Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events In-Reply-To: <50F87DC8.1060000@canterbury.ac.nz> References: <50F87DC8.1060000@canterbury.ac.nz> Message-ID: On Thu, Jan 17, 2013 at 2:40 PM, Greg Ewing wrote: > Paul Moore wrote: >> >> Although the host and port arguments >> to create_transport are clearly irrelevant to the case of a transport >> managing a process as a data source. > Shouldn't this be called create_internet_transport or something > like that? I just renamed it to create_connection(), like I've been promising for a long time. -- --Guido van Rossum (python.org/~guido) From p.f.moore at gmail.com Fri Jan 18 00:44:18 2013 From: p.f.moore at gmail.com (Paul Moore) Date: Thu, 17 Jan 2013 23:44:18 +0000 Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events In-Reply-To: References: Message-ID: On 17 January 2013 19:10, Guido van Rossum wrote: > I think it would be tremendously helpful if you tried to implement the > UNIX version of the subprocess transport. (Note that AFAIK Twisted has > one of these too, maybe you can get some implementation ideas from > them.) You were right. In starting to do so, I found out that my thinking has been solely based on a callback style of programming (users implement protocol classes and code the relevant "data received" methods themselves). 
From looking at some of the sample code, I see that this is not really the intended usage style. At this point my head exploded. Coroutines, what fun! I am now reading the sample code, the section of the PEP on coroutines, and the mailing list threads on the matter. I may be some time :-) (The technicalities of the implementation aren't hard - it's just a data_received type of protocol wrapper round a couple of pipes. It's the usability and design issues that matter, and they are strongly affected by "intended usage"). Paul PS From the PEP, it seems that a protocol must implement the 4 methods connection_made, data_received, eof_received and connection_lost. For a process, which has 2 output streams involved, a single data_received method isn't enough. I see two options - having 2 separate protocol classes involved, or having a process protocol with a different interface. Neither option seems obviously best, although Twisted appears to use different protocol types for different types of transport. How critical is the principle that there is a single type of protocol to the PEP? From guido at python.org Fri Jan 18 01:19:35 2013 From: guido at python.org (Guido van Rossum) Date: Thu, 17 Jan 2013 16:19:35 -0800 Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events In-Reply-To: References: Message-ID: On Thu, Jan 17, 2013 at 3:44 PM, Paul Moore wrote: > On 17 January 2013 19:10, Guido van Rossum wrote: >> I think it would be tremendously helpful if you tried to implement the >> UNIX version of the subprocess transport. (Note that AFAIK Twisted has >> one of these too, maybe you can get some implementation ideas from >> them.) > > You were right. In starting to do so, I found out that my thinking has > been solely based on a callback style of programming (users implement > protocol classes and code the relevant "data received" methods > themselves). From looking at some of the sample code, I see that this > is not really the intended usage style. At this point my head > exploded. Coroutines, what fun! I am now reading the sample code, the > section of the PEP on coroutines, and the mailing list threads on the > matter. I may be some time :-) > > (The technicalities of the implementation aren't hard - it's just a > data_received type of protocol wrapper round a couple of pipes. It's > the usability and design issues that matter, and they are strongly > affected by "intended usage"). Right, this is a very good observation. > Paul > > PS From the PEP, it seems that a protocol must implement the 4 methods > connection_made, data_received, eof_received and connection_lost. For > a process, which has 2 output streams involved, a single data_received > method isn't enough. I see two options - having 2 separate protocol > classes involved, or having a process protocol with a different > interface. Neither option seems obviously best, although Twisted > appears to use different protocol types for different types of > transport. How critical is the principle that there is a single type > of protocol to the PEP? Not critical at all. The plan for UDP (datagrams in general) is to have different protocol methods as well. TBH I would be happy with a first cut that only deals with stdout, like os.popen(). :-) Note that I am intrigued by this problem as well and may be hacking up a version for myself in my spare time. 
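As a concrete strawman of that stdout-only first cut - every name below is invented for illustration, and the start_subprocess() spelling is purely hypothetical, not part of the PEP:

class StdoutOnlyProtocol:
    """First-cut subprocess protocol: only the child's stdout is delivered."""

    def connection_made(self, transport):
        # transport is assumed to wrap the running child process.
        self.transport = transport
        self.chunks = []

    def data_received(self, data):
        # Bytes read from the child's stdout pipe.
        self.chunks.append(data)

    def connection_lost(self, exc):
        # Child exited; exc is None unless the pipe failed.
        output = b''.join(self.chunks)
        print('child produced', len(output), 'bytes')

# Hypothetical usage, by analogy with create_connection():
#   event_loop.start_subprocess(StdoutOnlyProtocol, 'ls', '-l')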
-- --Guido van Rossum (python.org/~guido) From james.d.harding at siemens.com Fri Jan 18 02:52:28 2013 From: james.d.harding at siemens.com (Harding, James) Date: Fri, 18 Jan 2013 01:52:28 +0000 Subject: [Python-ideas] 'const' and 'require' statements Message-ID: Hello, I am new here but am itching with an idea. Here are two separate ideas, but they are related, so they shall both be presented at the same time.

The first idea is for a 'const' statement for declaring constant names. Its syntax would be:

'const' identifier '=' expression

The expression would be restricted to result in an immutable object such as 17, "green", or (1,2,3). The compiler would effectively replace any use of the identifier with this expression when seen. Some examples of constants might include:

const ST_MODE = 0
const FontName = "Ariel"
const Monday = 1
const Tuesday = Monday + 1  # may use previously defined const in expression. Compiler will fold constants (hopefully)

Constant names would be limited in scope. A constant defined in a function would only have a life to the end of the function, for instance.

Now why should there be such a syntax if the existing language already has a mechanism for effectively declaring constants, which it does? First, it opens possibilities for the compiler to do things like more constant folding and generally produce more efficient code. Second, since the compiler substitutes for the name at compile time, there is no chance for the name to be stepped on at run-time. Third, ideas such as PEP 3103 could be re-visited. One of the problems in PEP 3103 was that so often constants are represented by names, and those names may be changed and/or the constant values in those names are not known until run-time.

Constant names are fine but of limited use if they may only be used within the module they are declared in. This brings up the second idea of a 'require' statement. The import statement of Python is executed at run-time. This creates a disconnect between modules at compile time (which is a good thing) but gives the compiler no hint as to how to produce better code. What I propose is a 'require' statement with almost exactly the same syntax as the import and from statements, but with the keyword 'require' substituted for 'import'. The word require was chosen because the require declaration from the BLISS language helped inspire this idea. C-minded people might prefer that a word such as include be used instead. What the require statement would do is cause the module to be read in by the compiler and compiled when the statement is parsed. The contents of a required module could be restricted to only be const statements, in order to avoid the many headaches that anything more general would produce. Examples:

require font_data
from stat_constants require ST_MODE
from weekdays require *

In the first example, the name 'font_data' would be a constant module to the compiler. An expression such as font_data.FontName would at compile-time reference the constant name FontName from the font_data module and substitute for it. In the second example, the constant name ST_MODE is added to the current scope. In the third example, all constant names defined in the module (except those with a '_' prefix) are added to the current scope. Since the names added are constant names and not variable names, it is OK to use require * at the function scope level.
In order to help compatibility with existing uses and to avoid declaring constants twice, a require statement could use a 'as *' to both include constant names and assign them to a module's dictionary. For example, the file stat.py might do something like: require stat_constants as * This would add all the constant names defined in the stat_constants module and place them in the stat module's dictionary. For instance, if there is the line in stat_constant.py: const ST_MODE = 0 Then for stat.py the compiler will act as if it saw: ST_MODE = 0 Well, those are my two bits of ideas. Thank you, James Harding -------------- next part -------------- An HTML attachment was scrubbed... URL: From ben+python at benfinney.id.au Fri Jan 18 04:37:21 2013 From: ben+python at benfinney.id.au (Ben Finney) Date: Fri, 18 Jan 2013 14:37:21 +1100 Subject: [Python-ideas] 'const' statement References: Message-ID: <7wfw1zupou.fsf@benfinney.id.au> "Harding, James" writes: > The first idea is for a 'const' statement for declaring constant > names. Do you have some concrete Python code which would clearly be improved by this proposal? > Its syntax would be: > > 'const' identifier '=' expression > > The expression would be restricted to result in an immutable object > such as 17, "green", or (1,2,3). The compiler would effectively replace > any use of the identifier with this expression when seen. Some examples > of constants might include: > > const ST_MODE = 0 > const FontName = "Ariel" > const Monday = 1 > const Tuesday = Monday + 1 # may use previously defined const > in expression. Compiler will fold constants (hopefully) So, the compiler will ?replace any use of the identifier with? the constant value. const ST_MODE = 0 const ST_FILENAME = "foo" const ST_RECURSIVE = True name_prefix = "ST_" foo = globals().get(name_prefix + "MODE") bar = globals().get(name_prefix + "FILENAME") baz = globals().get(name_prefix + "RECURSIVE") What do you expect the compiler to do in the above code? -- \ ?Airports are ugly. Some are very ugly. Some attain a degree of | `\ ugliness that can only be the result of a special effort.? | _o__) ?Douglas Adams, _The Long Dark Tea-Time of the Soul_, 1988 | Ben Finney From cs at zip.com.au Fri Jan 18 05:28:53 2013 From: cs at zip.com.au (Cameron Simpson) Date: Fri, 18 Jan 2013 15:28:53 +1100 Subject: [Python-ideas] 'const' statement In-Reply-To: <7wfw1zupou.fsf@benfinney.id.au> References: <7wfw1zupou.fsf@benfinney.id.au> Message-ID: <20130118042853.GA27650@cskk.homeip.net> On 18Jan2013 14:37, Ben Finney wrote: | "Harding, James" | writes: | > Its syntax would be: | > 'const' identifier '=' expression | > | > The expression would be restricted to result in an immutable object | > such as 17, "green", or (1,2,3). The compiler would effectively replace | > any use of the identifier with this expression when seen. Some examples | > of constants might include: | > | > const ST_MODE = 0 | > const FontName = "Ariel" | > const Monday = 1 | > const Tuesday = Monday + 1 # may use previously defined const | > in expression. Compiler will fold constants (hopefully) | | So, the compiler will ?replace any use of the identifier with? the | constant value. | | const ST_MODE = 0 | const ST_FILENAME = "foo" | const ST_RECURSIVE = True | | name_prefix = "ST_" | foo = globals().get(name_prefix + "MODE") | bar = globals().get(name_prefix + "FILENAME") | baz = globals().get(name_prefix + "RECURSIVE") | | What do you expect the compiler to do in the above code? 
Personally I'd expect the compiler to produce essentially the same code it does now with stock Python. After all, name_prefix isn't a const. But under his proposal I'd expect the compiler to be _able_ to produce inlined constant results for bare, direct uses of ST_MODE etc. If I'd written his proposal I'd have probably termed these things "bind-once", generating names that may not be rebound. They would still need to be carefully placed if the compiler were to have the option of constant folding, i.e. they'd need to be outside function and class definitions, determinable from static analysis. Just comments, not endorsement:-) -- Cameron Simpson From steve at pearwood.info Fri Jan 18 05:52:09 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 18 Jan 2013 15:52:09 +1100 Subject: [Python-ideas] 'const' and 'require' statements In-Reply-To: References: Message-ID: <50F8D4F9.9020308@pearwood.info> On 18/01/13 12:52, Harding, James wrote:
> Hello,
>
> I am new here but am itching with an idea. Here are two separate ideas but they are related so they shall both be presented at the same time.
>
> The first idea is for a 'const' statement for declaring constant names. Its syntax would be:
>
> 'const' identifier '=' expression
>
> The expression would be restricted to result in an immutable object

What is the purpose of this restriction? I would like to see the ability to prevent rebinding or unbinding of names, with no restriction on the value. If that is useful (and I think it is), then it is useful for mutable objects as well as immutable.

> such as 17, "green", or (1,2,3). The compiler would effectively replace any use of the identifier with this expression when seen.

Is that the driving use-case for your suggestion? Compile-time efficiency? If so, then I suspect that you're on the wrong track. As I understand it, the sort of optimizations that PyPy can perform at runtime are far more valuable than this sort of constant substitution.

There are also complications that need to be carefully thought about. For example, in Python today, you can be sure that this assertion will always pass:

k = ("Some value", "Another value")  # for example
x = k
y = k
assert x is y  # this always passes, no matter the value of k

But if k is a const, it will fail, because the lines "x = k" and "y = k" will be expanded at compile time:

x = ("Some value", "Another value")
y = ("Some value", "Another value")
assert x is y  # not guaranteed to pass

So Python would have to intern every const, not just do a compile-time substitution. And that will have runtime consequences.

Another question: what happens if the constant expression can't be evaluated until runtime?

x = random.random()
const k = x + 1

y = k - 1

What value should the compiler substitute for y?

> Constant names would be limited in scope. A constant defined in a function would only have a life to the end of the function, for instance.

I don't think that makes sense. Since you're talking about something known to the compiler, it is meaningless to talk about the life of the constant *at runtime*. Consider:

def f(n):
    const k = ("something", "or", "other")
    if n == 0:
        return k
    else:
        return k[n:]

This will compile to the byte-code equivalent of:

def f(n):
    if n == 0:
        return ("something", "or", "other")
    else:
        return ("something", "or", "other")[n:]

I recommend you run that function through dis.dis to see what it will be compiled to. In the compiled code, there are two calls to the LOAD_CONST byte-code.
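A quick way to check this for yourself (output elided here, since the exact byte-code varies between CPython versions):

import dis

def f(n):
    if n == 0:
        return ("something", "or", "other")
    else:
        return ("something", "or", "other")[n:]

dis.dis(f)
# Each branch contains its own LOAD_CONST of the folded tuple
# ('something', 'or', 'other'), and the tuple lives in
# f.__code__.co_consts for as long as f itself does.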
The literal ("something", "or", "other") needs to be compiled into the byte-code, and so it will exist for as long as the function exists, not just until the function exits. > Now why should there be such a syntax if the existing language already >has a mechanism for effectively declaring constants, which it does? I dispute that Python has a mechanism for effectively declaring constants. It has a *convention* for declaring constants, and hoping that neither you, the developer, nor the caller, accidentally (or deliberately) rebind that pseudo-constant. -- Steven From greg.ewing at canterbury.ac.nz Fri Jan 18 05:59:01 2013 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 18 Jan 2013 17:59:01 +1300 Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events In-Reply-To: References: <50F87DC8.1060000@canterbury.ac.nz> Message-ID: <50F8D695.3050002@canterbury.ac.nz> Guido van Rossum wrote: > I just renamed it to create_connection(), like I've been promising for > a long time. That still doesn't spell out that it's about the internet in particular. Or is the assumption that internet connections are the only kind that matter these days? -- Greg From guido at python.org Fri Jan 18 06:08:06 2013 From: guido at python.org (Guido van Rossum) Date: Thu, 17 Jan 2013 21:08:06 -0800 Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events In-Reply-To: <50F8D695.3050002@canterbury.ac.nz> References: <50F87DC8.1060000@canterbury.ac.nz> <50F8D695.3050002@canterbury.ac.nz> Message-ID: On Thu, Jan 17, 2013 at 8:59 PM, Greg Ewing wrote: > Guido van Rossum wrote: >> >> I just renamed it to create_connection(), like I've been promising for >> a long time. > That still doesn't spell out that it's about the internet > in particular. Or is the assumption that internet connections > are the only kind that matter these days? Basically yes, in this context. The same assumption underlies socket.getaddrinfo() in the stdlib. If you have a CORBA system lying around and you want to support it, you're welcome to create the transport connection function create_corba_connection(). :-) -- --Guido van Rossum (python.org/~guido) From bruce at leapyear.org Fri Jan 18 07:04:53 2013 From: bruce at leapyear.org (Bruce Leban) Date: Thu, 17 Jan 2013 22:04:53 -0800 Subject: [Python-ideas] 'const' and 'require' statements In-Reply-To: <50F8D4F9.9020308@pearwood.info> References: <50F8D4F9.9020308@pearwood.info> Message-ID: On Thu, Jan 17, 2013 at 8:52 PM, Steven D'Aprano wrote: > On 18/01/13 12:52, Harding, James wrote: > >> The first idea is for a 'const' statement for declaring constant names. >> Its syntax would be: >> >> 'const' identifier '=' expression >> >> The expression would be restricted to result in an immutable object >> > > What is the purpose of this restriction? > > I would like to see the ability to prevent rebinding or unbinding of > names, with no restriction on the value. If that is useful (and I think > it is), then it is useful for mutable objects as well as immutable. > > Java has a keyword 'final' which means a variable must be bound exactly once. It is an error if it is bound more than once or not bound at all, or read before it is initialized. For example, if a class has a final non-static field foo, then the constructor *must* set foo. A final value may be immutable. http://en.wikipedia.org/wiki/Final_(Java) This catches double initialization errors among other things. I don't know if final belongs in Python, but I'd find that more useful than const. 
--- Bruce http://bit.ly/yearofpuzzles -------------- next part -------------- An HTML attachment was scrubbed... URL: From rosuav at gmail.com Fri Jan 18 07:31:52 2013 From: rosuav at gmail.com (Chris Angelico) Date: Fri, 18 Jan 2013 17:31:52 +1100 Subject: [Python-ideas] 'const' and 'require' statements In-Reply-To: <50F8D4F9.9020308@pearwood.info> References: <50F8D4F9.9020308@pearwood.info> Message-ID: On Fri, Jan 18, 2013 at 3:52 PM, Steven D'Aprano wrote: > Another question: what happens if the constant expression can't be > evaluated until runtime? > > x = random.random() > const k = x + 1 > > y = k - 1 > > What value should the compiler substitute for y? That should be disallowed. In the declaration of a constant, you have to use only what can be handled by the constants evaluator. As a rule of thumb, it'd make sense to be able to use const with anything that could safely be evaluated by ast.literal_eval. As to the issues of rebinding, I'd just state that all uses of a particular named constant evaluate to the same object, just as would happen if you used any other form of name binding. I don't have the post to hand, but wasn't there a project being discussed recently that would do a lot of that work automatically? ChrisA From haoyi.sg at gmail.com Fri Jan 18 08:06:41 2013 From: haoyi.sg at gmail.com (Haoyi Li) Date: Thu, 17 Jan 2013 23:06:41 -0800 Subject: [Python-ideas] 'const' and 'require' statements In-Reply-To: References: <50F8D4F9.9020308@pearwood.info> Message-ID: Compiler-enforced immutability is one of those really hard problems which, if you manage to do flexibly and correctly, would be an academically publishable result, not something you hack into the interpreter over a weekend. If you go the dumb-and-easy route, you end up with a simple "sub this variable with constant" thing, which isn't very useful (what about calculated constants?) If you go the slightly-less-dumb route, you end up with some mini-language to work with these `const` values, which has some operations but not the full power of python. This basically describes C Macros, which I don't think you'd want to include in python! If you go the "full python" route, you basically branch into two possibilities. - enforcement of `const` as part of the main program. If you do it hackily, you end up with C++'s `const` or Java's `final` declaration. Neither of these really make the object (and all of its contents!) immutable. If you want to do it properly, this would involve some sort of effect-tracking-system. This is really hard. - multi-stage computations, so the program is partially-evaluated at "compile" time and the `const` sections computed. This is also really hard. Furthermore, if you want to be able to use bits of the standard library in the early stages (you probably do, e.g. for things like min, max, len, etc.) either you'd need to manually start annotating huge chunks of the standard library to be available at "compile" time (a huge undertaking) or you'll need an effect-tracking-system to do it for you. In any case, either you get a crappy implementation that nobody wants (C Macros) something that doesn't really give the guarantees you'd hope for (java final/c++ const) or you would have a publishable result w.r.t. either effect-tracking (!) or multi-stage computations (!!!). Even though it is very easy to describe the idea (it just stops it from changing, duh!) 
and how it would work in a few trivial cases, doing it properly will likely require some substantial theoretical breakthroughs before it can actually happen. On Thu, Jan 17, 2013 at 10:31 PM, Chris Angelico wrote:
> On Fri, Jan 18, 2013 at 3:52 PM, Steven D'Aprano wrote:
> > Another question: what happens if the constant expression can't be evaluated until runtime?
> >
> > x = random.random()
> > const k = x + 1
> >
> > y = k - 1
> >
> > What value should the compiler substitute for y?
>
> That should be disallowed. In the declaration of a constant, you have to use only what can be handled by the constants evaluator. As a rule of thumb, it'd make sense to be able to use const with anything that could safely be evaluated by ast.literal_eval.
>
> As to the issues of rebinding, I'd just state that all uses of a particular named constant evaluate to the same object, just as would happen if you used any other form of name binding.
>
> I don't have the post to hand, but wasn't there a project being discussed recently that would do a lot of that work automatically?
>
> ChrisA
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas
> -------------- next part -------------- An HTML attachment was scrubbed... URL: From greg.ewing at canterbury.ac.nz Fri Jan 18 08:17:57 2013 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 18 Jan 2013 20:17:57 +1300 Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events In-Reply-To: References: Message-ID: <50F8F725.20505@canterbury.ac.nz> Paul Moore wrote:
> PS From the PEP, it seems that a protocol must implement the 4 methods connection_made, data_received, eof_received and connection_lost. For a process, which has 2 output streams involved, a single data_received method isn't enough.

It looks like there would have to be at least two Transport instances involved, one for stdin/stdout and one for stderr. Connecting them both to a single Protocol object doesn't seem to be possible with the framework as defined. You would have to use a couple of adapter objects to translate the data_received calls into calls on different methods of another object.

This sort of thing would be easier if, instead of the Transport calling a predefined method of the Protocol, the Protocol installed a callback into the Transport. Then a Protocol designed for dealing with subprocesses could hook different methods of itself into a pair of Transports.

Stepping back a bit, I must say that from the coroutine viewpoint, the Protocol/Transport stuff just seems to get in the way. If I were writing coroutine-based code to deal with a subprocess, I would want to be able to write coroutines like

def handle_output(stdout):
    while 1:
        line = yield from stdout.readline()
        if not line:
            break
        mungulate_line(line)

def handle_errors(stderr):
    while 1:
        line = yield from stderr.readline()
        if not line:
            break
        complain_to_user(line)

In other words, I don't want Transports or Protocols or any of that cruft, I just want a simple pair of async stream objects that I can read and write using yield-from calls. There doesn't seem to be anything like that specified in PEP 3156. It does mention something about implementing a streaming buffer on top of a Transport, but in a way that makes it sound like a suggested recipe rather than something to be provided by the library. Also it seems like a lot of layers of overhead to go through.
On the whole, in PEP 3156 the idea of providing callback-based interfaces with yield-from-based ones built on top has been pushed way further up the stack than I imagined it would. I don't want to be *forced* to write my coroutine code at the level of Protocols; I want to be able to work at a lower level than that. -- Greg From aquavitae69 at gmail.com Fri Jan 18 08:22:01 2013 From: aquavitae69 at gmail.com (David Townshend) Date: Fri, 18 Jan 2013 09:22:01 +0200 Subject: [Python-ideas] 'const' and 'require' statements In-Reply-To: References: <50F8D4F9.9020308@pearwood.info> Message-ID: On Fri, Jan 18, 2013 at 9:06 AM, Haoyi Li wrote: > Compiler-enforced immutability is one of those really hard problems which, > if you manage to do flexibly and correctly, would be an academically > publishable result, not something you hack into the interpreter over a > weekend. > > If you go the dumb-and-easy route, you end up with a simple "sub this > variable with constant" thing, which isn't very useful (what about > calculated constants?) > > If you go the slightly-less-dumb route, you end up with some mini-language > to work with these `const` values, which has some operations but not the > full power of python. This basically describes C Macros, which I don't > think you'd want to include in python! > > If you go the "full python" route, you basically branch into two > possibilities. > > - enforcement of `const` as part of the main program. If you do it > hackily, you end up with C++'s `const` or Java's `final` declaration. > Neither of these really make the object (and all of its contents!) > immutable. If you want to do it properly, this would involve some sort of > effect-tracking-system. This is really hard. > > - multi-stage computations, so the program is partially-evaluated at > "compile" time and the `const` sections computed. This is also really hard. > Furthermore, if you want to be able to use bits of the standard library in > the early stages (you probably do, e.g. for things like min, max, len, > etc.) either you'd need to manually start annotating huge chunks of the > standard library to be available at "compile" time (a huge undertaking) or > you'll need an effect-tracking-system to do it for you. > > > In any case, either you get a crappy implementation that nobody wants (C > Macros) something that doesn't really give the guarantees you'd hope for > (java final/c++ const) or you would have a publishable result w.r.t. either > effect-tracking (!) or multi-stage computations (!!!). > > Even though it is very easy to describe the idea (it just stops it from > changing, duh!) and how it would work in a few trivial cases, doing it > properly will likely require some substantial theoretical breakthroughs > before it can actually happen. > > > > On Thu, Jan 17, 2013 at 10:31 PM, Chris Angelico wrote: > >> On Fri, Jan 18, 2013 at 3:52 PM, Steven D'Aprano >> wrote: >> > Another question: what happens if the constant expression can't be >> > evaluated until runtime? >> > >> > x = random.random() >> > const k = x + 1 >> > >> > y = k - 1 >> > >> > What value should the compiler substitute for y? >> >> That should be disallowed. In the declaration of a constant, you have >> to use only what can be handled by the constants evaluator. As a rule >> of thumb, it'd make sense to be able to use const with anything that >> could safely be evaluated by ast.literal_eval. 
>> >> As to the issues of rebinding, I'd just state that all uses of a >> particular named constant evaluate to the same object, just as would >> happen if you used any other form of name binding. >> >> I don't have the post to hand, but wasn't there a project being >> discussed recently that would do a lot of that work automatically? >> >> ChrisA >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> http://mail.python.org/mailman/listinfo/python-ideas >> > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > > As has already been pointed out, syntax to allow compile-time optimisations doesn't really make much sense in python, especially considering the optimisations Pypy already carries out. Some sort of "finalise" option may be somewhat useful (although I can't say I've ever needed it). To avoid adding a new keyword it could be implementer as a function, e.g. finalise("varname") or finalise(varname="value"). In a class, this would actually be quite easy to implement by simply replacing the class dict with a custom dict designed to restrict writing to finalised names. I haven't ever tried changing the globals dict type, but I imagine it would be possible, or at least possible to to provide a method to change it. I haven't thought through all the implications of doing it this way, but I'd rather see something like this than a new "const" keyword. David -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Fri Jan 18 09:02:14 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 18 Jan 2013 18:02:14 +1000 Subject: [Python-ideas] 'const' and 'require' statements In-Reply-To: References: <50F8D4F9.9020308@pearwood.info> Message-ID: On Fri, Jan 18, 2013 at 5:06 PM, Haoyi Li wrote: > Compiler-enforced immutability is one of those really hard problems which, > if you manage to do flexibly and correctly, would be an academically > publishable result, not something you hack into the interpreter over a > weekend. > > If you go the dumb-and-easy route, you end up with a simple "sub this > variable with constant" thing, which isn't very useful (what about > calculated constants?) > > If you go the slightly-less-dumb route, you end up with some mini-language > to work with these `const` values, which has some operations but not the > full power of python. This basically describes C Macros, which I don't think > you'd want to include in python! > > If you go the "full python" route, you basically branch into two > possibilities. > > - enforcement of `const` as part of the main program. If you do it hackily, > you end up with C++'s `const` or Java's `final` declaration. Neither of > these really make the object (and all of its contents!) immutable. If you > want to do it properly, this would involve some sort of > effect-tracking-system. This is really hard. > > - multi-stage computations, so the program is partially-evaluated at > "compile" time and the `const` sections computed. This is also really hard. > Furthermore, if you want to be able to use bits of the standard library in > the early stages (you probably do, e.g. for things like min, max, len, etc.) > either you'd need to manually start annotating huge chunks of the standard > library to be available at "compile" time (a huge undertaking) or you'll > need an effect-tracking-system to do it for you. 
> > > In any case, either you get a crappy implementation that nobody wants (C > Macros) something that doesn't really give the guarantees you'd hope for > (java final/c++ const) or you would have a publishable result w.r.t. either > effect-tracking (!) or multi-stage computations (!!!). > > Even though it is very easy to describe the idea (it just stops it from > changing, duh!) and how it would work in a few trivial cases, doing it > properly will likely require some substantial theoretical breakthroughs > before it can actually happen. As James noted, lack of a good answer to this problem is part of the reason Python doesn't have a switch/case statement [1,2] (only part, though). We already have three interesting points in time where evaluation can happen in Python code: - compile time (evaluation of literals, including tuples of literals) - function definition time (evaluation of decorator expressions, annotations and default arguments, along with decorator invocation) - execution time (normal execution time - in the case of functions, function definition time occurs during the execution time of the containing scope) We know from experience with default arguments that people find evaluation at function definition time *incredibly* confusing, because it means a data value is shared across functions. You can try to limit this by saying "immutable values only", but then you run into the problem where dynamic name lookups mean only literals can be considered truly constant, and those are *already* evaluated (and sometimes folded together) at compile time: >>> def f(): ... return 2 * 3 ... >>> dis.dis(f) 2 0 LOAD_CONST 3 (6) 3 RETURN_VALUE (The constant folding in CPython isn't especially clever, but that's an implementation issue - the language spec already *allows* such folding, we just don't always detect when it's possible). So, once you allow name lookups, the question then becomes what namespace they run in. If you say "the containing namespace" then you get a few interesting consequences: 1. We're in the same, already known to be confusing, territory as function default arguments 2. The behaviour of the new construct at module and class level will necessarily be different to that at function level 3. Quality of error messages and tracebacks will be a potential issue for debugging 4. When two of these constructs exist in the same scope, is the later one allowed to refer to the earlier one? Now we get to the meat of James's suggestion, and while I think it's a pretty decent take on the "multi-stage evaluation" proposal, it still runs afoul of many of the same problems past proposals [3] have struggled with: 1. Name binding operations other than assignment (e.g. import, function and class definitions) 2. Handling of name binding in nested functions 3. Handling of references to previous early evaluation operations 4. Breaking expectations regarding dynamic modification of module globals 5. Finding a good keyword is hard - suitable terms are either widely used as variable names, or have too much misleading baggage from other languages I can alleviate the concerns about making other components available at compile time though - if this construct was defined appropriately, Python would be able to happily import, compile and execute other modules during a suitable "pre-execution" phase. The real kicker though, is that, after all that work, you'll have to ask two questions: 1. Does this change help Python users write more readable code? 2. Does this change help JIT-compiled Python code (e.g. 
in PyPy) run faster? (PyPy's JIT can often identify near-constants and move their calculation out of any frequently executed code paths) If the answer to that turns out to be "No to both, but it will help CPython, which has no JIT, run some manually annotated code faster", then it's a bad idea (it's not an *obviously* bad idea - just one that is a lot trickier than it may first appear). Cheers, Nick. [1] http://www.python.org/dev/peps/pep-0275/ [2] http://www.python.org/dev/peps/pep-3103/ [3] https://encrypted.google.com/search?q=site%3Amail.python.org%20inurl%3Apython-ideas%20atdef -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Fri Jan 18 09:08:01 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 18 Jan 2013 18:08:01 +1000 Subject: [Python-ideas] 'const' and 'require' statements In-Reply-To: References: <50F8D4F9.9020308@pearwood.info> Message-ID: On Fri, Jan 18, 2013 at 5:22 PM, David Townshend wrote: > As has already been pointed out, syntax to allow compile-time optimisations > doesn't really make much sense in python, especially considering the > optimisations Pypy already carries out. Some sort of "finalise" option may > be somewhat useful (although I can't say I've ever needed it). To avoid > adding a new keyword it could be implementer as a function, e.g. > finalise("varname") or finalise(varname="value"). In a class, this would > actually be quite easy to implement by simply replacing the class dict with > a custom dict designed to restrict writing to finalised names. I haven't > ever tried changing the globals dict type, but I imagine it would be > possible, or at least possible to to provide a method to change it. I > haven't thought through all the implications of doing it this way, but I'd > rather see something like this than a new "const" keyword. While you won't see module level support (beyond the ability to place arbitrary classes in sys.modules), this is already completely possible through the descriptor protocol (e.g. by creating read-only properties). Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From p.f.moore at gmail.com Fri Jan 18 09:08:33 2013 From: p.f.moore at gmail.com (Paul Moore) Date: Fri, 18 Jan 2013 08:08:33 +0000 Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events In-Reply-To: References: <50F87DC8.1060000@canterbury.ac.nz> <50F8D695.3050002@canterbury.ac.nz> Message-ID: On 18 January 2013 05:08, Guido van Rossum wrote: >> That still doesn't spell out that it's about the internet >> in particular. Or is the assumption that internet connections >> are the only kind that matter these days? > > Basically yes, in this context. The same assumption underlies > socket.getaddrinfo() in the stdlib. If you have a CORBA system lying > around and you want to support it, you're welcome to create the > transport connection function create_corba_connection(). :-) To create that create_corba_connection() function, you'd be expected to subclass the standard event loop, is that right? Paul From ncoghlan at gmail.com Fri Jan 18 09:38:53 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 18 Jan 2013 18:38:53 +1000 Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events In-Reply-To: References: <50F87DC8.1060000@canterbury.ac.nz> <50F8D695.3050002@canterbury.ac.nz> Message-ID: On Fri, Jan 18, 2013 at 6:08 PM, Paul Moore wrote: > On 18 January 2013 05:08, Guido van Rossum wrote: >>> That still doesn't spell out that it's about the internet >>> in particular. 
Or is the assumption that internet connections >>> are the only kind that matter these days? >> >> Basically yes, in this context. The same assumption underlies >> socket.getaddrinfo() in the stdlib. If you have a CORBA system lying >> around and you want to support it, you're welcome to create the >> transport connection function create_corba_connection(). :-) > > To create that create_corba_connection() function, you'd be expected > to subclass the standard event loop, is that right? I'm not sure why CORBA would be a transport in its own right rather than a protocol running over a standard socket transport. Transports are about the communications channel - network sockets - OS pipes - shared memory - CANbus - protocol tunneling Transports should only be platform specific at the base layer where they actually need to interact with the OS through the event loop. Higher level transports should be connected to lower level protocols based on APIs provided by those transports and protocols themselves. The *whole point* of the protocol vs transport model is to allow you to write adaptive stacks. To use the example from PEP 3153, to implement full JSON-RPC support over both sockets and a HTTP-tunnel you need the following implemented: - TCP socket transport - HTTP protocol - HTTP-based transport - JSON-RPC protocol Because the transport API is standardised, the JSON-RPC protocol can be written once and run over HTTP using the full stack as shown, *or* directly over TCP by stripping out the two middle layers. The *only* layer that the event loop needs to concern itself with is the base transport layer - it doesn't care how many layers of protocols or protocol-as-transport adapters you stack on top. The other thing that may not have been emphasised sufficiently is that the *protocol* APIs is completely dependent on the protocol involved. The API of a pipe protocol is not that of HTTP or CORBA or JSON-RPC or XML-RPC. That's why tunneling, as in the example above, requires a protocol-specific adapter to translate from the protocol API back to the standard transport API. So, for example, Greg's request for the ability to pass callbacks rather than needing particular method names can be satisfied by writing a simple callback protocol: class CallbackProtocol: """Invoke arbitrary callbacks in response to transport events""" def __init__(self, on_data, on_conn, on_loss, on_eof): self.on_data = on_data self.on_conn = on_conn self.on_loss = on_loss self.on_eof = on_eof def connection_made(transport): self.on_conn(transport) def data_received(data): self.on_data(data) def eof_received(): self.on_eof() def connection_lost(exc): self.on_loss(exc) Similarly, his request for a IOStreamProtocol would likely look a lot like an asynchronous version of the existing IO stack API (to handle encoding, buffering, etc), with the lowest layer being built on the transport API rather than the file API (as it is in the io module). You would then be able to treat *any* transport, whether it's an SSH tunnel, an ordinary socket connection or a pipe to a subprocess as a non-seekable stream. Cheers, Nick. 
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From p.f.moore at gmail.com Fri Jan 18 10:01:23 2013 From: p.f.moore at gmail.com (Paul Moore) Date: Fri, 18 Jan 2013 09:01:23 +0000 Subject: [Python-ideas] 'const' statement In-Reply-To: <20130118042853.GA27650@cskk.homeip.net> References: <7wfw1zupou.fsf@benfinney.id.au> <20130118042853.GA27650@cskk.homeip.net> Message-ID: On 18 January 2013 04:28, Cameron Simpson wrote: > If I'd written his proposal I'd have probably termed these things > "bind-once", generating names that may not be rebound. They would > still need to be carefully placed if the compiler were to have the > option of constant folding i.e. they're need to be outside function and > class definitions, determinable from static analysis. A few thoughts along the same lines: 1. Global lookups are not likely to be the performance bottleneck in any real code, so constant folding is not going to be a particular benefit. 2. The idea of names that can't be rebound isn't particularly Pythonic (given that things like private class variables aren't part of the language) 3. Constants that can't be imported from another module aren't much use, and yet if they can be imported you have real problems enforcing the non-rebindability. Consider: import my_consts print(my_consts.A_VALUE) # Presumably a constant value, but obviously the compiler can't inline it... my_consts.A_VALUE = 12 # The language has no chance to prevent this without completely changing module semantics Named values are obviously a good thing, but I see little benefit, and a lot of practical difficulty, with the idea of "enforced const-ness" in Python. Paul. From aquavitae69 at gmail.com Fri Jan 18 10:38:25 2013 From: aquavitae69 at gmail.com (David Townshend) Date: Fri, 18 Jan 2013 11:38:25 +0200 Subject: [Python-ideas] 'const' and 'require' statements In-Reply-To: References: <50F8D4F9.9020308@pearwood.info> Message-ID: On Fri, Jan 18, 2013 at 10:08 AM, Nick Coghlan wrote: > On Fri, Jan 18, 2013 at 5:22 PM, David Townshend > wrote: > > As has already been pointed out, syntax to allow compile-time > optimisations > > doesn't really make much sense in python, especially considering the > > optimisations Pypy already carries out. Some sort of "finalise" option > may > > be somewhat useful (although I can't say I've ever needed it). To avoid > > adding a new keyword it could be implementer as a function, e.g. > > finalise("varname") or finalise(varname="value"). In a class, this would > > actually be quite easy to implement by simply replacing the class dict > with > > a custom dict designed to restrict writing to finalised names. I haven't > > ever tried changing the globals dict type, but I imagine it would be > > possible, or at least possible to to provide a method to change it. I > > haven't thought through all the implications of doing it this way, but > I'd > > rather see something like this than a new "const" keyword. > > While you won't see module level support (beyond the ability to place > arbitrary classes in sys.modules), this is already completely possible > through the descriptor protocol (e.g. by creating read-only > properties). > > Cheers, > Nick. > > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > True. I was going for something which might work in modules too, but module-level descriptors would probably be a more consistent approach anyway. This is actually something I have needed in the past, and got around it by putting a class in sys.modules. 
Maybe finding a neat way to write module-level descriptors would be more useful, and cover the same use case as consts? David -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.f.moore at gmail.com Fri Jan 18 10:33:09 2013 From: p.f.moore at gmail.com (Paul Moore) Date: Fri, 18 Jan 2013 09:33:09 +0000 Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events In-Reply-To: References: <50F87DC8.1060000@canterbury.ac.nz> <50F8D695.3050002@canterbury.ac.nz> Message-ID: On 18 January 2013 08:38, Nick Coghlan wrote: > Transports are about the communications channel > - network sockets > - OS pipes > - shared memory > - CANbus > - protocol tunneling > > Transports should only be platform specific at the base layer where > they actually need to interact with the OS through the event loop. > Higher level transports should be connected to lower level protocols > based on APIs provided by those transports and protocols themselves. > > The *whole point* of the protocol vs transport model is to allow you > to write adaptive stacks. Interesting. On that basis, the whole subprocess interaction scenario is not a low level transport at all (contrary to what I understood from Guido's suggestion of an event loop method) and so should be built in user code (OK, probably as a standard library helper, but definitely not as specialist methods on the event loop) layered on the low-level pipe transport. That was my original instinct, but it fell afoul of 1. The Windows implementation of a low level pipe transport doesn't exist (yet) and I don't know enough about IOCP to write it [1]. 2. I don't understand the programming model well enough to understand how to write a transport/protocol layer (coroutine head explosion issue). I have now (finally!) got Guido's point that implementing a process protocol will give me a good insight into how this stuff is meant to work. I'm still struggling to understand why he thinks it needs a dedicated method on the event loop, rather than being a higher-level layer like you're suggesting, but I'm at least starting to understand what questions to ask. Paul [1] There is some stuff in the IOCP documentation about handles having to be opened in OVERLAPPED mode, which worries me here as it may imply that arbitrary pipes (such as the ones subprocess.Popen uses) can't be plugged in. It's a bit like setting a filehandle to nonblocking in Unix, but it has to be done at open time, IIUC. I think I saw an email about this that I need to hunt out. From ncoghlan at gmail.com Fri Jan 18 11:37:23 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 18 Jan 2013 20:37:23 +1000 Subject: [Python-ideas] 'const' statement In-Reply-To: References: <7wfw1zupou.fsf@benfinney.id.au> <20130118042853.GA27650@cskk.homeip.net> Message-ID: On Fri, Jan 18, 2013 at 7:01 PM, Paul Moore wrote: > Named values are obviously a good thing, but I see little benefit, and > a lot of practical difficulty, with the idea of "enforced const-ness" > in Python. FWIW, people can play whatever games they like by injecting arbitrary objects into sys.modules. >>> class Locked: ... def __setattr__(self, attr, value): ... raise AttributeError("Rebinding not permitted") ... def __delattr__(self, attr): ... raise AttributeError("Deletion not permitted") ... attr1 = "Hello" ... attr2 = "World" ... 
>>> sys.modules["example"] = Locked >>> import example >>> example.attr1 'Hello' >>> example.attr2 'World' >>> example.attr2 = "Change" >>> example.attr2 = "World" >>> sys.modules["example"] = Locked() >>> import example >>> example.attr1 'Hello' >>> example.attr2 'World' >>> example.attr2 = "Change" Traceback (most recent call last): File "", line 1, in File "", line 3, in __setattr__ AttributeError: Rebinding not permitted The import system is even defined to expressly permit doing this in a *module's own code* by replacing "sys.module[__name__]" with a different object. So, any such proposal needs to be made with the awareness that anyone that *really* wants to do this kind of thing already can, but they don't. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From jsbueno at python.org.br Fri Jan 18 12:28:56 2013 From: jsbueno at python.org.br (Joao S. O. Bueno) Date: Fri, 18 Jan 2013 09:28:56 -0200 Subject: [Python-ideas] 'const' and 'require' statements In-Reply-To: References: <50F8D4F9.9020308@pearwood.info> Message-ID: On 18 January 2013 05:22, David Townshend wrote: > > As has already been pointed out, syntax to allow compile-time optimisations > doesn't really make much sense in python, especially considering the > optimisations Pypy already carries out. Some sort of "finalise" option may > be somewhat useful (although I can't say I've ever needed it). To avoid > adding a new keyword it could be implementer as a function, e.g. > finalise("varname") or finalise(varname="value"). In a class, this would > actually be quite easy to implement by simply replacing the class dict with > a custom dict designed to restrict writing to finalised names. I haven't > ever tried changing the globals dict type, but I imagine it would be > possible, or at least possible to to provide a method to change it. I > haven't thought through all the implications of doing it this way, but I'd > rather see something like this than a new "const" keyword. > Yes - changing a module's (or object that stands for a module :-) ) dict type does work [1] - which would allow for a "module decorator" to change it. So, the functionality from Java's "final" and others can be had in Python today, with a small set of "module decorator" utilities. Now, do I think such a thing should go in the standard library? -0 for that. [1] - http://stackoverflow.com/questions/13274916/python-imported-module-is-none/13278043#13278043 > David From ncoghlan at gmail.com Fri Jan 18 12:59:24 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 18 Jan 2013 21:59:24 +1000 Subject: [Python-ideas] 'const' and 'require' statements In-Reply-To: References: <50F8D4F9.9020308@pearwood.info> Message-ID: On Fri, Jan 18, 2013 at 7:38 PM, David Townshend wrote: > True. I was going for something which might work in modules too, but > module-level descriptors would probably be a more consistent approach > anyway. This is actually something I have needed in the past, and got > around it by putting a class in sys.modules. Maybe finding a neat way to > write module-level descriptors would be more useful, and cover the same use > case as consts? I think putting class objects in sys.modules *is* the way to get "module level" descriptors. The fact it feels like a hack is a positive in my book - techniques that are "always dubious, but sometimes necessary" *should* feel like hacks, so people stay away from them until they run out of other options :) Cheers, Nick. 
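P.S. For anyone who does want the hack: to get actual module level properties you put an *instance* in sys.modules, since descriptors are looked up on the type. A minimal sketch (module and attribute names invented for the example):

# constants.py - sketch only
import sys

class _ConstModule:
    @property
    def MAX_RETRIES(self):
        # evaluated on each access; there is no setter, so rebinding fails
        return 5

sys.modules[__name__] = _ConstModule()

After "import constants", reading constants.MAX_RETRIES works as usual, while "constants.MAX_RETRIES = 10" raises AttributeError because the property defines no setter.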
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From ncoghlan at gmail.com Fri Jan 18 12:55:19 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 18 Jan 2013 21:55:19 +1000 Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events In-Reply-To: References: <50F87DC8.1060000@canterbury.ac.nz> <50F8D695.3050002@canterbury.ac.nz> Message-ID:

On Fri, Jan 18, 2013 at 7:33 PM, Paul Moore wrote:
> On 18 January 2013 08:38, Nick Coghlan wrote:
> I have now (finally!) got Guido's point that implementing a process
> protocol will give me a good insight into how this stuff is meant to
> work. I'm still struggling to understand why he thinks it needs a
> dedicated method on the event loop, rather than being a higher-level
> layer like you're suggesting, but I'm at least starting to understand
> what questions to ask.

The creation of the pipe transport needs to be on the event loop, precisely because of cross-platform differences when it comes to Windows. On *nix, on the other hand, the pipe transport should look an awful lot like the socket transport and thus be able to use the existing file descriptor based interfaces on the event loop.

The protocol part is then about adapting the transport API to a coroutine friendly readlines/writelines API (the part that Guido points out needs more detail in http://www.python.org/dev/peps/pep-3156/#coroutines-and-protocols)

As a rough untested sketch (the buffering here could likely be a lot smarter):

# Remember we're not using preemptive threading, so we don't need locking for thread safety
# Note that the protocol isn't designed to support reconnection - a new connection means
# a new protocol instance. The create_* APIs on the event loop accept a protocol factory
# specifically in order to encourage this approach
class SimpleStreamingProtocol:

    def __init__(self):
        self._transport = None
        self._data = bytearray()
        self._pending = None

    def connection_made(self, transport):
        self._transport = transport

    def connection_lost(self, exc):
        self._transport = None
        # Could also store the exc directly on the protocol and raise
        # it in subsequent write calls (exc is None on a clean close)
        if self._pending is not None:
            if exc is None:
                self._pending.set_result(False)
            else:
                self._pending.set_exception(exc)

    def eof_received(self):
        self._transport = None
        if self._pending is not None:
            self._pending.set_result(False)

    def data_received(self, data):
        self._data.extend(data)
        if self._pending is not None:
            self._pending.set_result(True)

    # The writing side is fairly easy, as we just pass it through to the transport
    # These are all defined by PEP 3156 as non-blocking calls
    def write(self, data):
        if self._transport is None:
            raise RuntimeError("Connection not open")
        self._transport.write(data)

    def writelines(self, iterable):
        if self._transport is None:
            raise RuntimeError("Connection not open")
        self._transport.writelines(iterable)

    def close(self):
        if self._transport is not None:
            self._transport.close()
            self._transport = None

    def _read_from_buffer(self):
        data = bytes(self._data)
        self._data.clear()
        return data

    # The reading side has to adapt between coroutines and callbacks
    @coroutine
    def read(self):
        if self._pending is not None:
            raise RuntimeError("Concurrent reads not permitted")
        # First check if we already have data waiting
        data = self._read_from_buffer()
        if data:
            return data
        # Closed and fully drained is simply EOF
        if self._transport is None:
            return b''
        # Otherwise wait for data
        # This method can easily be updated to use a loop and multiple
        # futures in order to support a "minimum read" parameter
        f = self._pending = tulip.Future()
        got_data = yield from f
        self._pending = None
        return self._read_from_buffer() if got_data else b''

    # This uses async iteration as described at [1]
    # We yield coroutines, which must then be invoked with yield from
    def readlines(self):
        finished = False
        cached_lines = self._data.split(b'\n')
        self._data.clear()
        if cached_lines[-1]:  # Last line is incomplete
            self._data.extend(cached_lines[-1])
        del cached_lines[-1]
        while not finished:
            # When we already have the data, a simple future will do
            while cached_lines:
                f = tulip.Future()
                f.set_result(cached_lines.pop(0))
                yield f
            # Otherwise, we hand control to the event loop
            @coroutine
            def wait_for_line():
                nonlocal finished
                partial = b''
                while True:
                    data = yield from self.read()
                    if not data:
                        finished = True
                        return partial  # unterminated trailing line, if any
                    lines = (partial + data).split(b'\n')
                    if len(lines) == 1:
                        partial = lines[0]  # still no complete line
                        continue
                    if lines[-1]:  # Last line is incomplete
                        self._data.extend(lines[-1])
                    cached_lines.extend(lines[1:-1])
                    return lines[0]
            yield wait_for_line()

# Used as:
#   pipe, stream = event_loop.create_pipe(SimpleStreamingProtocol)
# Or even as:
#   conn, stream = event_loop.create_connection(SimpleStreamingProtocol,
#                                               ...)  # connection details
#
# Reading from the stream in a coroutine:
#   for f in stream.readlines():
#       line = yield from f

[1] http://python-notes.boredomandlaziness.org/en/latest/pep_ideas/async_programming.html#asynchronous-iterators

Cheers,
Nick.

-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From tarek at ziade.org Fri Jan 18 13:30:15 2013 From: tarek at ziade.org (Tarek Ziadé) Date: Fri, 18 Jan 2013 13:30:15 +0100 Subject: [Python-ideas] Parametrized any() and all() ? In-Reply-To: <20130116194756.2efe9afe@pitrou.net> References: <50F6813E.60503@ziade.org> <50F6847D.2020404@ziade.org> <50F6B4D8.6070002@pearwood.info> <50F6BEA3.7090807@ziade.org> <20130116194756.2efe9afe@pitrou.net> Message-ID: <50F94057.9080005@ziade.org>

On 1/16/13 7:47 PM, Antoine Pitrou wrote:
> You know, discussing performance without posting benchmark numbers is
> generally pointless.

Sure, yes, so I tried to implement it by adapting the current any():

http://tarek.pastebin.mozilla.org/2068630

but it is 20% slower in my benchmark. However, I have no idea if my implementation is the right way to do things.

Cheers
Tarek

-- Tarek Ziadé · http://ziade.org · @tarek_ziade

From ncoghlan at gmail.com Fri Jan 18 13:52:58 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 18 Jan 2013 22:52:58 +1000 Subject: [Python-ideas] Parametrized any() and all() ? In-Reply-To: <50F94057.9080005@ziade.org> References: <50F6813E.60503@ziade.org> <50F6847D.2020404@ziade.org> <50F6B4D8.6070002@pearwood.info> <50F6BEA3.7090807@ziade.org> <20130116194756.2efe9afe@pitrou.net> <50F94057.9080005@ziade.org> Message-ID:

On Fri, Jan 18, 2013 at 10:30 PM, Tarek Ziadé wrote:
> On 1/16/13 7:47 PM, Antoine Pitrou wrote:
>> You know, discussing performance without posting benchmark numbers is
>> generally pointless.
>
> Sure, yes, so I tried to implement it by adapting the current any():
>
> http://tarek.pastebin.mozilla.org/2068630
>
> but it is 20% slower in my benchmark. However, I have no idea if my
> implementation is the right way to do things.

Resuming an existing frame (i.e. using a generator expression) is almost always going to be faster than going through the argument passing machinery and initialising a *new* frame. Chaining C level iterators together (e.g. map, itertools) is even faster.

DSU is great for cases where you need it, but a transformation pipeline is otherwise likely to be faster (or at least not substantially slower).

Cheers,
Nick.
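P.S. This is easy enough to see for yourself with timeit. An illustrative sketch (untested; the numbers will vary by machine and version):

import timeit

setup = "seq = list(range(1000)); pred = lambda x: x > 900"
# Resuming one existing generator frame per item
print(timeit.timeit("any(x > 900 for x in seq)", setup, number=20000))
# The same test routed through a Python level call - argument passing
# machinery plus a new frame per item
print(timeit.timeit("any(pred(x) for x in seq)", setup, number=20000))
# Chaining C level iterators - the per-item work stays in C
print(timeit.timeit("any(map((900).__lt__, seq))", setup, number=20000))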
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From eliben at gmail.com Fri Jan 18 15:56:55 2013 From: eliben at gmail.com (Eli Bendersky) Date: Fri, 18 Jan 2013 06:56:55 -0800 Subject: [Python-ideas] PEP 3156 / Tulip question: write/send callback/future Message-ID:

Hi,

I'm looking through PEP 3156 and the Tulip code, and either something is missing or I'm not looking in the right places.

I can't find any sort of callback / future return for asynchronous writes, e.g. in transport. Should there be no "data_sent" parallel to "data_received" somewhere? Or, alternatively, "write" returning some sort of future that can be checked later for status? For connections that aren't infinitely fast it's useful to know when the data was actually sent/written, or alternatively if an error has occurred. This is also important for when writing would actually block because of full buffers. boost::asio has such a handler for async_write.

Eli

From ericsnowcurrently at gmail.com Fri Jan 18 16:54:42 2013 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Fri, 18 Jan 2013 08:54:42 -0700 Subject: [Python-ideas] Parametrized any() and all() ? In-Reply-To: References: <50F6813E.60503@ziade.org> <50F6847D.2020404@ziade.org> <50F6B4D8.6070002@pearwood.info> <50F6BEA3.7090807@ziade.org> <20130116194756.2efe9afe@pitrou.net> <50F94057.9080005@ziade.org> Message-ID:

On Fri, Jan 18, 2013 at 5:52 AM, Nick Coghlan wrote:
> DSU is great for cases where you need it, but a transformation
> pipeline is otherwise likely to be faster (or at least not
> substantially slower).

It took me a sec. :) DSU == "Decorate-Sort-Undecorate". [1]

-eric

[1] http://en.wikipedia.org/wiki/Decorate-sort-undecorate

From tjreedy at udel.edu Fri Jan 18 19:36:07 2013 From: tjreedy at udel.edu (Terry Reedy) Date: Fri, 18 Jan 2013 13:36:07 -0500 Subject: [Python-ideas] Parametrized any() and all() ? In-Reply-To: References: <50F6813E.60503@ziade.org> <50F6847D.2020404@ziade.org> <50F6B4D8.6070002@pearwood.info> <50F6BEA3.7090807@ziade.org> <20130116194756.2efe9afe@pitrou.net> <50F94057.9080005@ziade.org> Message-ID:

On 1/18/2013 10:54 AM, Eric Snow wrote:
> On Fri, Jan 18, 2013 at 5:52 AM, Nick Coghlan wrote:
>> DSU is great for cases where you need it, but a transformation
>> pipeline is otherwise likely to be faster (or at least not
>> substantially slower).
>
> It took me a sec. :) DSU == "Decorate-Sort-Undecorate". [1]

No, no, no. It's Delaware State University in Dover, as opposed to University of Delaware (UD) in Newark ;-). In other words, it depends on the universe you live in.

-- Terry Jan Reedy

From p.f.moore at gmail.com Fri Jan 18 22:01:32 2013 From: p.f.moore at gmail.com (Paul Moore) Date: Fri, 18 Jan 2013 21:01:32 +0000 Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events In-Reply-To: References: <50F87DC8.1060000@canterbury.ac.nz> <50F8D695.3050002@canterbury.ac.nz> Message-ID:

On 18 January 2013 09:33, Paul Moore wrote:
> [1] There is some stuff in the IOCP documentation about handles having
> to be opened in OVERLAPPED mode, which worries me here as it may imply
> that arbitrary pipes (such as the ones subprocess.Popen uses) can't be
> plugged in. It's a bit like setting a filehandle to nonblocking in
> Unix, but it has to be done at open time, IIUC. I think I saw an email
> about this that I need to hunt out.

Hmm, I'm looking at a pipe transport on Unix, and I find I don't know enough about programming Unix.
How do I set a file descriptor (specifically a pipe) in Unix to be nonblocking? For a socket, sock.setblocking(False) does the job. But for a pipe/file, the only thing I can see is the O_NONBLOCK flag to os.open/os.pipe2. Is it not possible to set an already open file descriptor to be nonblocking? If that's the case, it means that Unix has the same problem as I suspect exists for Windows - existing pipes and filehandles can't be used in async code as they won't necessarily be in nonblocking mode. Is there a way round this on Unix that I'm not aware of? Otherwise, it seems that there's going to have to be a whole load of duplication in the "async world" (an async version of subprocess.Popen, for a start, as well as any other "open" type of calls that might need to produce handles that can be used asynchronously). Either that or everything that returns a pipe/handle that you might want to use in async code will have to grow some sort of "async" flag. Paul From guido at python.org Fri Jan 18 22:02:16 2013 From: guido at python.org (Guido van Rossum) Date: Fri, 18 Jan 2013 13:02:16 -0800 Subject: [Python-ideas] PEP 3156 / Tulip question: write/send callback/future In-Reply-To: References: Message-ID: On Fri, Jan 18, 2013 at 6:56 AM, Eli Bendersky wrote: > I'm looking through PEP 3156 and the Tulip code, and either something is > missing or I'm not looking in the right places. > > I can't find any sort of callback / future return for asynchronous writes, > e.g. in transport. I guess you should read some Twisted tutorial. :-) > Should there be no "data_sent" parallel to "data_received" somewhere? Or, > alternatively, "write" returning some sort of future that can be checked > later for status? For connections that aren't infinitely fast it's useful to > know when the data was actually sent/written, or alternatively if an error > has occurred. This is also important for when writing would actually block > because of full buffers. boost::asio has such a handler for async_write. The model is a little different. Glyph has convinced me that it works well in practice. We just buffer what is written (when it can't all be sent immediately). This is enough for most apps that don't serve 100MB files. If the buffer becomes too large, the transport will call .pause() on the protocol until it is drained, then it calls .resume(). (The names of these are TBD, maybe they will end up .pause_writing() and .resume_writing().) There are some default behaviors that we can add here too, e.g. suspending the task. -- --Guido van Rossum (python.org/~guido) From guido at python.org Fri Jan 18 22:24:15 2013 From: guido at python.org (Guido van Rossum) Date: Fri, 18 Jan 2013 13:24:15 -0800 Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events In-Reply-To: References: <50F87DC8.1060000@canterbury.ac.nz> <50F8D695.3050002@canterbury.ac.nz> Message-ID: On Fri, Jan 18, 2013 at 12:08 AM, Paul Moore wrote: > On 18 January 2013 05:08, Guido van Rossum wrote: >>> That still doesn't spell out that it's about the internet >>> in particular. Or is the assumption that internet connections >>> are the only kind that matter these days? >> >> Basically yes, in this context. The same assumption underlies >> socket.getaddrinfo() in the stdlib. If you have a CORBA system lying >> around and you want to support it, you're welcome to create the >> transport connection function create_corba_connection(). :-) > > To create that create_corba_connection() function, you'd be expected > to subclass the standard event loop, is that right? 
No, it doesn't need to be a method on the event loop at all. It can just be a function in a different package; it can use events.get_current_event_loop() to reference the event loop. -- --Guido van Rossum (python.org/~guido) From phd at phdru.name Fri Jan 18 22:25:31 2013 From: phd at phdru.name (Oleg Broytman) Date: Sat, 19 Jan 2013 01:25:31 +0400 Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events In-Reply-To: References: <50F87DC8.1060000@canterbury.ac.nz> <50F8D695.3050002@canterbury.ac.nz> Message-ID: <20130118212531.GA19497@iskra.aviel.ru> On Fri, Jan 18, 2013 at 09:01:32PM +0000, Paul Moore wrote: > Hmm, I'm looking at a pipe transport on Unix, and I find I don't know > enough about programming Unix. How do I set a file descriptor > (specifically a pipe) in Unix to be nonblocking? For a socket, > sock.setblocking(False) does the job. But for a pipe/file, the only > thing I can see is the O_NONBLOCK flag to os.open/os.pipe2. Is it not > possible to set an already open file descriptor to be nonblocking? http://linuxmanpages.com/man2/fcntl.2.php The file status flags A file descriptor has certain associated flags, initialized by open(2) and possibly modified by fcntl(2). The flags are shared between copies (made with dup(2), fork(2), etc.) of the same file descriptor. The flags and their semantics are described in open(2). F_GETFL Read the file descriptor's flags. F_SETFL Set the file status flags part of the descriptor's flags to the value specified by arg. Remaining bits (access mode, file creation flags) in arg are ignored. On Linux this command can only change the O_APPEND, O_NONBLOCK, O_ASYNC, and O_DIRECT flags. Oleg. -- Oleg Broytman http://phdru.name/ phd at phdru.name Programmers don't die, they just GOSUB without RETURN. From james.d.harding at siemens.com Fri Jan 18 22:28:43 2013 From: james.d.harding at siemens.com (Harding, James) Date: Fri, 18 Jan 2013 21:28:43 +0000 Subject: [Python-ideas] Regarding 'const' and 'require' statements Message-ID: Hello, There are so many replies that I am going to try and summarize responses with a lot of cut and paste in one post. Sorry if this is the wrong way to do it. > Do you have some concrete Python code which would clearly be improved by this proposal? Let me explain myself. I am a low-level programmer fascinated by Python's elegant syntax and how it is executed. We actually do little Python programming here but we do allow interaction between Python and our product and so I am not able to show any concrete code. I guess that makes me a crank but I am fine with that. At a low level, I look at what Python has to go through to execute statements and thoughts swirl through my mind as to how it could be improved. I finally cracked and made a post here with one of those improvements. >> const ST_MODE = 0 >> So, the compiler will ?replace any use of the identifier with? the constant value. Yes, the compiler will replace any use of the name with its value. A statement like: If c == ST_MODE: Would be treated by the compiler at compile-time as if it had seen: If c == 0: The name ST_MODE in this example is not a bindable name. The name only lives during compilation and is not accessible at run-time. It would not be stored in a dictionary (unless the magic syntax 'require module as *' were used that only confuses what I am trying to say). > name_prefix = "ST_" > foo = globals().get(name_prefix + "MODE") > > What do you expect the compiler to do in the above code? 
Since the name is not accessible at run-time, the above would produce an exception. Const names are only available at compile-time. > If I'd written his proposal I'd have probably termed these things "bind-once", generating names that may not be rebound. They would still > need to be carefully placed if the compiler were to have the option of constant folding i.e. they're need to be outside function and class > definitions, determinable from static analysis. These are "bind-never" names. The compiler would have to be able to see the definition when a module that uses them is being compiled. That is the reason for the require statement. The compiler does not normally look at the contents of other modules when parsing a source file. The require statement tells it to do so. >> The expression would be restricted to result in an immutable object >What is the purpose of this restriction? My thought is that a constant name should have the same value regardless of context. If I were to say something like "const A = B" then A is no longer a constant and when substituted depends on how B is interpreted within the current context (is it a global? A local? A nonlocal?). If I were to say "const A = [1,2,3]" then you need to worry about side effects. You would have to entirely clone the value at compile-time for each use rather than simply incrementing the reference count. >Is that the driving use-case for your suggestion? Compile-time efficiency? >If so, then I suspect that you're on the wrong track. As I understand it, >the sort of optimizations that PyPy can perform at runtime are far more >valuable than this sort of constant substitution. That is my basic track. If the existing tools handle this better than my idea should be discarded as not providing any significant improvement and adding additional baggage to the language. >k = ("Some value", "Another value") # for example >x = k >y = k >assert x is y # this always passes, no matter the value of k > >But if k is a const, it will fail, because the lines "x = k" and "y = k" >will be expanded at compile time: The restriction that constant names be immutable objects would allow their values to be placed in the constant pool for the function. In the above, if 'k' were a constant name then it would (hopefully) reside in a single location in the constant pool and the assignments to 'x' and 'y' would access the same constant pool location. >Another question: what happens if the constant expression can't be >evaluated until runtime? Constant expressions would be restricted to be compile-time constants. They would not be evaluated at run-time. >Compiler-enforced immutability is one of those really hard problems which, >if you manage to do flexibly and correctly, would be an academically >publishable result, not something you hack into the interpreter over a >weekend. I have to plead guilty here. I am not an academic and do not know all the implications of things. I do not follow research either and so am basically proposing this as a crank/hacker sort of person. >- multi-stage computations, so the program is partially-evaluated at >"compile" time and the `const` sections computed. This is also really hard. >Furthermore, if you want to be able to use bits of the standard library in >the early stages (you probably do, e.g. for things like min, max, len, >etc.) either you'd need to manually start annotating huge chunks of the >standard library to be available at "compile" time (a huge undertaking) or >you'll need an effect-tracking-system to do it for you. 
This is indeed a big worry. I would have had it such that a module could (but would not be required to) be split into two parts. One part would be referenced at run-time using the existing import mechanism; this would not change. The second part of a module would be constants (and only constants) that are referenced at compile-time. There would be no requirement that modules change over to this new method. It would just mean that constants defined in the module are available to the compiler. That last statement is apparently not exactly true if I understand the comments about what PyPy optimizations do.

The idea of the compiler accessing the source files for other modules does give me pause. Currently, compiling one module is fairly disjoint from other modules in that a change to one module does not require a re-compile of modules that use it, even if 'constants' are changed. This is a good feature of Python and maybe something to boast about, but I would worry if these ideas introduced bad practices. I don't think that there have been many cases of Python programmers saying: "I made a change to my module - you need to recompile your module to get the changes".

I would like to thank you all for critiquing my ideas and pointing out their flaws with patience and respect. In many ways this was just an exercise in getting things off my chest because in the end I am just a crank.

Thank you,
James Harding

From phd at phdru.name Fri Jan 18 22:37:34 2013 From: phd at phdru.name (Oleg Broytman) Date: Sat, 19 Jan 2013 01:37:34 +0400 Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events In-Reply-To: <20130118212531.GA19497@iskra.aviel.ru> References: <50F87DC8.1060000@canterbury.ac.nz> <50F8D695.3050002@canterbury.ac.nz> <20130118212531.GA19497@iskra.aviel.ru> Message-ID: <20130118213734.GB19497@iskra.aviel.ru>

On Sat, Jan 19, 2013 at 01:25:31AM +0400, Oleg Broytman wrote:
> On Fri, Jan 18, 2013 at 09:01:32PM +0000, Paul Moore wrote:
> > Hmm, I'm looking at a pipe transport on Unix, and I find I don't know
> > enough about programming Unix. How do I set a file descriptor
> > (specifically a pipe) in Unix to be nonblocking? For a socket,
> > sock.setblocking(False) does the job. But for a pipe/file, the only
> > thing I can see is the O_NONBLOCK flag to os.open/os.pipe2. Is it not
> > possible to set an already open file descriptor to be nonblocking?
>
> F_GETFL
> Read the file descriptor's flags.
> F_SETFL
> Set the file status flags part of the descriptor's flags to the
> value specified by arg. Remaining bits (access mode, file creation
> flags) in arg are ignored. On Linux this command can only change the
> O_APPEND, O_NONBLOCK, O_ASYNC, and O_DIRECT flags.

So you have to call fcntl() on the pipe's descriptor with F_GETFL to read the flags, set O_NONBLOCK, and call fcntl() with F_SETFL to set the new flags back.

Oleg.
-- Oleg Broytman http://phdru.name/ phd at phdru.name Programmers don't die, they just GOSUB without RETURN.

From eliben at gmail.com Fri Jan 18 22:40:44 2013 From: eliben at gmail.com (Eli Bendersky) Date: Fri, 18 Jan 2013 13:40:44 -0800 Subject: [Python-ideas] PEP 3156 / Tulip question: write/send callback/future In-Reply-To: References: Message-ID:

On Fri, Jan 18, 2013 at 1:02 PM, Guido van Rossum wrote:
> On Fri, Jan 18, 2013 at 6:56 AM, Eli Bendersky wrote:
> > I'm looking through PEP 3156 and the Tulip code, and either something is
> > missing or I'm not looking in the right places.
> > > > I can't find any sort of callback / future return for asynchronous > writes, > > e.g. in transport. > > I guess you should read some Twisted tutorial. :-) > Yes, I noticed that Twisted also doesn't have it, so I suspected that influence. > > > Should there be no "data_sent" parallel to "data_received" somewhere? Or, > > alternatively, "write" returning some sort of future that can be checked > > later for status? For connections that aren't infinitely fast it's > useful to > > know when the data was actually sent/written, or alternatively if an > error > > has occurred. This is also important for when writing would actually > block > > because of full buffers. boost::asio has such a handler for async_write. > > The model is a little different. Glyph has convinced me that it works > well in practice. We just buffer what is written (when it can't all be > sent immediately). This is enough for most apps that don't serve 100MB > files. If the buffer becomes too large, the transport will call > .pause() on the protocol until it is drained, then it calls .resume(). > (The names of these are TBD, maybe they will end up .pause_writing() > and .resume_writing().) There are some default behaviors that we can > add here too, e.g. suspending the task. > > I agree it can be made to work, but how would even simple "done sending" notification work? Or "send error" for that matter? AFAIR, low-level socket async API do provide this information. Are we confident enough it will never be needed to simply hide it away? Eli -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Fri Jan 18 22:48:24 2013 From: guido at python.org (Guido van Rossum) Date: Fri, 18 Jan 2013 13:48:24 -0800 Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events In-Reply-To: References: <50F87DC8.1060000@canterbury.ac.nz> <50F8D695.3050002@canterbury.ac.nz> Message-ID: On Fri, Jan 18, 2013 at 12:38 AM, Nick Coghlan wrote: > On Fri, Jan 18, 2013 at 6:08 PM, Paul Moore wrote: >> On 18 January 2013 05:08, Guido van Rossum wrote: >>>> That still doesn't spell out that it's about the internet >>>> in particular. Or is the assumption that internet connections >>>> are the only kind that matter these days? >>> >>> Basically yes, in this context. The same assumption underlies >>> socket.getaddrinfo() in the stdlib. If you have a CORBA system lying >>> around and you want to support it, you're welcome to create the >>> transport connection function create_corba_connection(). :-) >> >> To create that create_corba_connection() function, you'd be expected >> to subclass the standard event loop, is that right? > > I'm not sure why CORBA would be a transport in its own right rather > than a protocol running over a standard socket transport. I don't know -- but I could imagine that a particular CORBA implementation might be provided as a set of API function calls rather than something that hooks into sockets. I don't care about CORBA, but that was the use case I intended to highlight -- something that (for whatever reason, no matter how misguided) doesn't use sockets and doesn't have an underlying file descriptor you can wait on. (IIRC most GUI frameworks also fall into that category.) > Transports are about the communications channel > - network sockets > - OS pipes > - shared memory > - CANbus > - protocol tunneling Hm. 
I think of transports more as an abstraction of a specific set of semantics for a communication channel -- bidrectional streams, in particular, presumably with error correction/detection so that you can assume that you either see what the other end sent you, in the order in which it sent it (but not preserving buffer/packet/record boundaries!), or you get a "broken connection" error. Now, we may be in violent agreement here -- the transports I am thinking of can certainly use any of the mechanisms you list as underlying abstraction. But I wouldn't call it a transport unless it had standardized semantics and a standardized interface with the protocol. (For datagrams, we need slightly different abstractions, with different guarantees and semantics. But, again, all datagram transports should be more or less interchangeable.) > Transports should only be platform specific at the base layer where > they actually need to interact with the OS through the event loop. > Higher level transports should be connected to lower level protocols > based on APIs provided by those transports and protocols themselves. Yeah, well, but in practice I expect that layering transports on top of each other is rare, and using platform specific transport implementations is by far the common case. (Note that in theory you could layer SSL over any unencrypted transport; but in practice (a) few people need that, and (b) the ssl module doesn't support this -- hence I am comfortable with treating SSL as another platform-specific transport.) > The *whole point* of the protocol vs transport model is to allow you > to write adaptive stacks. To use the example from PEP 3153, to > implement full JSON-RPC support over both sockets and a HTTP-tunnel > you need the following implemented: > > - TCP socket transport > - HTTP protocol > - HTTP-based transport > - JSON-RPC protocol > > Because the transport API is standardised, the JSON-RPC protocol can > be written once and run over HTTP using the full stack as shown, *or* > directly over TCP by stripping out the two middle layers. I don't know enough about JSON-RPC (shame on me!) but this sounds very reasonable. > The *only* layer that the event loop needs to concern itself with is > the base transport layer - it doesn't care how many layers of > protocols or protocol-as-transport adapters you stack on top. True. There's one important issue here: *constructing* the stack is not up to the event loop. It is totally fine if the HTTP-based transport is a 3rd party package that exports a function to set up the stack, given an event loop and a protocol to run on top (JSON-RPC in this example). This function can have a custom signature that is not compatible with any other transport-creating APIs in existence. (In fact this is why I renamed create_transport() to create_connection() -- the standardized API just has methods for creating internet connections.) > The other thing that may not have been emphasised sufficiently is that > the *protocol* APIs is completely dependent on the protocol involved. > The API of a pipe protocol is not that of HTTP or CORBA or JSON-RPC or > XML-RPC. That's why tunneling, as in the example above, requires a > protocol-specific adapter to translate from the protocol API back to > the standard transport API. I'm not even sure what you mean by the protocol API. From the PEP's POV, the "protcol API" is just the methods that the transport calls (connection_made(), data_received(), etc.) and those certainly *are* supposed to be standardized. 
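(To make the earlier point about stack construction concrete: the 3rd party package might export a set-up function along these lines. Every name here is invented for illustration -- HTTPProtocol and the tunnel adapter are hypothetical, and none of this is in the PEP:)

@tulip.coroutine
def create_jsonrpc_http_connection(event_loop, protocol_factory, host, port):
    # Build the lower half of the stack: TCP transport + HTTP protocol
    transport, http = yield from event_loop.create_connection(
        HTTPProtocol, host, port)
    # Stack the protocol-as-transport adapter (as sketched earlier in the
    # thread) and the caller's JSON-RPC protocol on top of it
    tunnel = HTTPTunnelTransport(http)   # presents the standard transport API
    protocol = protocol_factory()
    protocol.connection_made(tunnel)
    return tunnel, protocol

The event loop itself never sees anything above the TCP transport.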
> So, for example, Greg's request for the ability to pass callbacks > rather than needing particular method names Hm, I have yet to respond to Greg's message, but I'm not sure that's a reasonable request. > can be satisfied by writing a simple callback protocol:
>
>     class CallbackProtocol:
>         """Invoke arbitrary callbacks in response to transport events"""
>
>         def __init__(self, on_data, on_conn, on_loss, on_eof):
>             self.on_data = on_data
>             self.on_conn = on_conn
>             self.on_loss = on_loss
>             self.on_eof = on_eof
>
>         def connection_made(self, transport):
>             self.on_conn(transport)
>
>         def data_received(self, data):
>             self.on_data(data)
>
>         def eof_received(self):
>             self.on_eof()
>
>         def connection_lost(self, exc):
>             self.on_loss(exc)

(Note: the methods need the self parameter that was missing as originally posted, or the transport's calls would raise TypeError.) Well, except that you can't just pass CallbackProtocol where a protocol factory is required by the PEP -- you'll have to pass a lambda or partial function without arguments that calls CallbackProtocol with some arguments taken from elsewhere. No big deal though. > Similarly, his request for an IOStreamProtocol would likely look a lot > like an asynchronous version of the existing IO stack API (to handle > encoding, buffering, etc), with the lowest layer being built on the > transport API rather than the file API (as it is in the io module). That sounds like an intriguing idea which I'd like to explore in the distant future. One point of light: a transport probably already is acceptable as a binary *output* stream, because its write() method is not a coroutine. (This is intentional.) But doing the same for input is harder. > You would then be able to treat *any* transport, whether it's an SSH > tunnel, an ordinary socket connection or a pipe to a subprocess as a > non-seekable stream. Right. (TBH, I'm often not sure whether you are just explaining the PEP's philosophy or trying to propose changes... Sorry for the confusion this may cause.) -- --Guido van Rossum (python.org/~guido) From guido at python.org Fri Jan 18 22:52:59 2013 From: guido at python.org (Guido van Rossum) Date: Fri, 18 Jan 2013 13:52:59 -0800 Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events In-Reply-To: <20130118213734.GB19497@iskra.aviel.ru> References: <50F87DC8.1060000@canterbury.ac.nz> <50F8D695.3050002@canterbury.ac.nz> <20130118212531.GA19497@iskra.aviel.ru> <20130118213734.GB19497@iskra.aviel.ru> Message-ID: On Fri, Jan 18, 2013 at 1:37 PM, Oleg Broytman wrote: > On Sat, Jan 19, 2013 at 01:25:31AM +0400, Oleg Broytman wrote: >> On Fri, Jan 18, 2013 at 09:01:32PM +0000, Paul Moore wrote: >> > Hmm, I'm looking at a pipe transport on Unix, and I find I don't know >> > enough about programming Unix. How do I set a file descriptor >> > (specifically a pipe) in Unix to be nonblocking? For a socket, >> > sock.setblocking(False) does the job. But for a pipe/file, the only >> > thing I can see is the O_NONBLOCK flag to os.open/os.pipe2. Is it not >> > possible to set an already open file descriptor to be nonblocking? >> >> F_GETFL >> Read the file descriptor's flags. >> F_SETFL >> Set the file status flags part of the descriptor's flags to the >> value specified by arg. Remaining bits (access mode, file creation >> flags) in arg are ignored. On Linux this command can only change the >> O_APPEND, O_NONBLOCK, O_ASYNC, and O_DIRECT flags. > > So you have to call fcntl() on the pipe's descriptor to F_GETFL > flags, set O_NONBLOCK and call fcntl() to F_SETFL the new flags back.
Here's my code for this:

    def _setnonblocking(fd):
        flags = fcntl.fcntl(fd, fcntl.F_GETFL)
        fcntl.fcntl(fd, fcntl.F_SETFL, flags | os.O_NONBLOCK)

-- --Guido van Rossum (python.org/~guido) From p.f.moore at gmail.com Fri Jan 18 23:07:23 2013 From: p.f.moore at gmail.com (Paul Moore) Date: Fri, 18 Jan 2013 22:07:23 +0000 Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events In-Reply-To: <20130118213734.GB19497@iskra.aviel.ru> References: <50F87DC8.1060000@canterbury.ac.nz> <50F8D695.3050002@canterbury.ac.nz> <20130118212531.GA19497@iskra.aviel.ru> <20130118213734.GB19497@iskra.aviel.ru> Message-ID: On 18 January 2013 21:37, Oleg Broytman wrote: > So you have to call fcntl() on the pipe's descriptor to F_GETFL > flags, set O_NONBLOCK and call fcntl() to F_SETFL the new flags back. Ah, excellent. Thanks for the information - I'll use that in my code. Paul From guido at python.org Fri Jan 18 23:15:07 2013 From: guido at python.org (Guido van Rossum) Date: Fri, 18 Jan 2013 14:15:07 -0800 Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events In-Reply-To: <50F8F725.20505@canterbury.ac.nz> References: <50F8F725.20505@canterbury.ac.nz> Message-ID: On Thu, Jan 17, 2013 at 11:17 PM, Greg Ewing wrote: > Paul Moore wrote: >> >> PS From the PEP, it seems that a protocol must implement the 4 methods >> connection_made, data_received, eof_received and connection_lost. For >> a process, which has 2 output streams involved, a single data_received >> method isn't enough. > It looks like there would have to be at least two Transport instances > involved, one for stdin/stdout and one for stderr. > > Connecting them both to a single Protocol object doesn't seem to be > possible with the framework as defined. You would have to use a > couple of adapter objects to translate the data_received calls into > calls on different methods of another object. So far this makes sense. But for this specific case there's a simpler solution -- require the protocol to support a few extra methods, in particular, err_data_received() and err_eof_received(), which are to stderr what data_received() and eof_received() are for stdout. (After all, the point of a subprocess is that "normal" data goes to stdout.) There's only one input stream to the subprocess, so there's no ambiguity for write(), and neither is there a need for multiple connection_made()/lost() methods. (However, we could argue endlessly over whether connection_lost() should be called when the subprocess exits, or when the other side of all three pipes is closed. :-) > This sort of thing would be easier if, instead of the Transport calling > a predefined method of the Protocol, the Protocol installed a callback > into the Transport. Then a Protocol designed for dealing with subprocesses > could hook different methods of itself into a pair of Transports. Hm. Not excited. I like everyone using the same names for these callback methods, so that a reader (who is familiar with the transport/protocol API) can instantly know what kind of callback it is and what its arguments are. (But see Nick's simple solution for having your cake and eating it, too.) > Stepping back a bit, I must say that from the coroutine viewpoint, > the Protocol/Transport stuff just seems to get in the way.
If I were > writing coroutine-based code to deal with a subprocess, I would want > to be able to write coroutines like
>
>     def handle_output(stdout):
>         while 1:
>             line = yield from stdout.readline()
>             if not line:
>                 break
>             mungulate_line(line)
>
>     def handle_errors(stderr):
>         while 1:
>             line = yield from stderr.readline()
>             if not line:
>                 break
>             complain_to_user(line)
>
> In other words, I don't want Transports or Protocols or any of that > cruft, I just want a simple pair of async stream objects that I can > read and write using yield-from calls. There doesn't seem to be > anything like that specified in PEP 3156. This is a good observation -- one that I've made myself as well. I also have a plan for dealing with it -- but I haven't coded it up properly yet and consequently I haven't written it up for the PEP yet either. The idea is that there will be some even-higher-level functions for tasks to call to open connections (etc.) which just give you two unidirectional streams (one for reading, one for writing). The write-stream can just be the transport (its write() and writelines() methods are familiar from regular I/O streams) and the read-stream can be a StreamReader -- a class I've written but which needs to be moved into a better place: http://code.google.com/p/tulip/source/browse/tulip/http_client.py#37 Anyway, the reason for having the transport/protocol abstractions in the middle is so that other frameworks can ignore coroutines if they want to -- all they have to do is work with Futures, which can be fully controlled through callbacks (which are native at the lowest level of almost all frameworks, including Tulip / PEP 3156). > It does mention something about implementing a streaming buffer on > top of a Transport, but in a way that makes it sound like a suggested > recipe rather than something to be provided by the library. Also it > seems like a lot of layers of overhead to go through. It'll be in the stdlib, no worries. I don't expect the overhead to be a problem. > On the whole, in PEP 3156 the idea of providing callback-based > interfaces with yield-from-based ones built on top has been > pushed way further up the stack than I imagined it would. I don't > want to be *forced* to write my coroutine code at the level of > Protocols; I want to be able to work at a lower level than that. You can write an alternative framework using coroutines and callbacks, bypassing transports and protocols. (You'll still need Futures.) However you'd be missing the interoperability offered by the protocol/transport abstractions: in an IOCP world you'd have to interact with the event loop's callbacks differently than in a select/poll/etc. world. PEP 3156 is trying to make different groups happy: people who like callbacks, people who like coroutines; people who like UNIX, people who like Windows. Everybody may have to compromise a little bit, but the reward will (hopefully) be better portability and better interoperability. -- --Guido van Rossum (python.org/~guido) From guido at python.org Fri Jan 18 23:22:34 2013 From: guido at python.org (Guido van Rossum) Date: Fri, 18 Jan 2013 14:22:34 -0800 Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events In-Reply-To: References: <50F87DC8.1060000@canterbury.ac.nz> <50F8D695.3050002@canterbury.ac.nz> Message-ID: On Fri, Jan 18, 2013 at 3:55 AM, Nick Coghlan wrote: > On Fri, Jan 18, 2013 at 7:33 PM, Paul Moore wrote: >> On 18 January 2013 08:38, Nick Coghlan wrote: >> I have now (finally!)
got Guido's point that implementing a process >> protocol will give me a good insight into how this stuff is meant to >> work. I'm still struggling to understand why he thinks it needs a >> dedicated method on the event loop, rather than being a higher-level >> layer like you're suggesting, but I'm at least starting to understand >> what questions to ask. > > The creation of the pipe transport needs to be on the event loop, > precisely because of cross-platform differences when it comes to > Windows. On *nix, on the other hand, the pipe transport should look an > awful lot like the socket transport and thus be able to use the > existing file descriptor based interfaces on the event loop. Thanks for clarifying that -- I'm behind on this thread! > The protocol part is then about adapting the transport API to > coroutine friendly readlines/writelines API (the part that Guido > points out needs more detail in > http://www.python.org/dev/peps/pep-3156/#coroutines-and-protocols) > > As a rough untested sketch (the buffering here could likely be a lot smarter): I have a more-or-less working but probably incomplete version checked into the tulip repo: http://code.google.com/p/tulip/source/browse/tulip/subprocess_transport.py Note that this completely ignores stderr -- this makes the code simpler while still useful (there's plenty of useful stuff you can do without reading stderr), and avoids the questions Greg Ewing brought up about needing two transports (one for stdout, another for stderr). -- --Guido van Rossum (python.org/~guido) From guido at python.org Fri Jan 18 23:25:10 2013 From: guido at python.org (Guido van Rossum) Date: Fri, 18 Jan 2013 14:25:10 -0800 Subject: [Python-ideas] PEP 3156 / Tulip question: write/send callback/future In-Reply-To: References: Message-ID: On Fri, Jan 18, 2013 at 1:40 PM, Eli Bendersky wrote: > On Fri, Jan 18, 2013 at 1:02 PM, Guido van Rossum wrote: >> >> On Fri, Jan 18, 2013 at 6:56 AM, Eli Bendersky wrote: >> > I'm looking through PEP 3156 and the Tulip code, and either something is >> > missing or I'm not looking in the right places. >> > >> > I can't find any sort of callback / future return for asynchronous >> > writes, >> > e.g. in transport. >> >> I guess you should read some Twisted tutorial. :-) > > > Yes, I noticed that Twisted also doesn't have it, so I suspected that > influence. > >> >> >> > Should there be no "data_sent" parallel to "data_received" somewhere? >> > Or, >> > alternatively, "write" returning some sort of future that can be checked >> > later for status? For connections that aren't infinitely fast it's >> > useful to >> > know when the data was actually sent/written, or alternatively if an >> > error >> > has occurred. This is also important for when writing would actually >> > block >> > because of full buffers. boost::asio has such a handler for async_write. >> >> The model is a little different. Glyph has convinced me that it works >> well in practice. We just buffer what is written (when it can't all be >> sent immediately). This is enough for most apps that don't serve 100MB >> files. If the buffer becomes too large, the transport will call >> .pause() on the protocol until it is drained, then it calls .resume(). >> (The names of these are TBD, maybe they will end up .pause_writing() >> and .resume_writing().) There are some default behaviors that we can >> add here too, e.g. suspending the task. >> > > I agree it can be made to work, but how would even simple "done sending" > notification work? Or "send error" for that matter? 
AFAIR, low-level async socket > APIs do provide this information. Are we confident enough that it will never > be needed to simply hide it away? AFAIK the Twisted folks have found that most of the time (basically all of the time) you don't need a positive "done sending" notification; when the send eventually *fails*, the transport calls the protocol's connection_lost() method with an exception indicating what failed. -- --Guido van Rossum (python.org/~guido) From p.f.moore at gmail.com Fri Jan 18 23:32:17 2013 From: p.f.moore at gmail.com (Paul Moore) Date: Fri, 18 Jan 2013 22:32:17 +0000 Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events In-Reply-To: References: <50F87DC8.1060000@canterbury.ac.nz> <50F8D695.3050002@canterbury.ac.nz> Message-ID: On 18 January 2013 21:24, Guido van Rossum wrote: >> To create that create_corba_connection() function, you'd be expected >> to subclass the standard event loop, is that right? > > No, it doesn't need to be a method on the event loop at all. It can > just be a function in a different package; it can use > events.get_current_event_loop() to reference the event loop. Aargh. I'm confused again! (I did warn you about dumb questions, didn't I? :-)) The event loop implementation contains the code that does the OS-level poll for events to process. (In tulip, that is handled by the selector object, but that's not mentioned in the PEP so I assume it should be considered an implementation detail). So, the event loop has to define what types of (OS-level) objects can be registered. At the moment, event loops only handle sockets (via select/poll/etc) and even the raw add_reader methods are not for end user use. So a standalone create_corba_connection function can certainly get the event loop using get_current_event_loop(), but it has no means of asking the event loop to poll the CORBA connection it creates for new messages. Without direct access to the selector (or equivalent) it can't add the extra event source. (Unless that source is a pollable file descriptor and it's willing to play with the optional add_reader methods, but that's not a "new event source" then...) The same problem will likely occur if you try to integrate Windows GUI events (you check for a GUI message by calling a Windows API). I don't think this matters except in obscure cases (it's likely a huge case of YAGNI) but I genuinely don't understand how you can say that create_corba_connection() could be written as a standalone function, and yet that create_connection() has to be a method of the event loop. That's what I'm getting at when I keep saying that I see you treating sockets as "special". There's clearly something I'm missing in your thinking, and it keeps tripping me up. Paul.
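(To make my worry concrete: the best standalone version I can imagine looks something like the sketch below -- Unix-only, and it only works at all if the CORBA library happens to expose a pollable file descriptor. Everything involving corba_lib here is entirely hypothetical:)

    from tulip import events

    def create_corba_connection(corba_lib, handler):
        loop = events.get_current_event_loop()
        conn = corba_lib.connect()    # hypothetical CORBA binding
        fd = conn.fileno()            # assumes an fd exists to poll!
        def on_readable():
            for msg in conn.pending_messages():   # hypothetical, too
                handler(msg)
        loop.add_reader(fd, on_readable)   # the optional, Unix-only API
        return conn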
From p.f.moore at gmail.com Fri Jan 18 23:48:36 2013 From: p.f.moore at gmail.com (Paul Moore) Date: Fri, 18 Jan 2013 22:48:36 +0000 Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events In-Reply-To: References: <50F87DC8.1060000@canterbury.ac.nz> <50F8D695.3050002@canterbury.ac.nz> Message-ID: On 18 January 2013 22:22, Guido van Rossum wrote: >> The protocol part is then about adapting the transport API to >> coroutine friendly readlines/writelines API (the part that Guido >> points out needs more detail in >> http://www.python.org/dev/peps/pep-3156/#coroutines-and-protocols) >> >> As a rough untested sketch (the buffering here could likely be a lot smarter): > > I have a more-or-less working but probably incomplete version checked > into the tulip repo: > http://code.google.com/p/tulip/source/browse/tulip/subprocess_transport.py Ha! You beat me to it. OK, looking at your code, I see that you freely used the add_reader/add_writer functions and friends, and the fact that the Unix selectors handle pipes as well as sockets. With the freedom to do that, your code looks both reasonable and pretty straightforward. I was having trouble getting past the fact that this approach wouldn't work on Windows, and confusing "nonportable" with "not allowed". My apologies. You kept telling me that writing the code for Unix would be helpful, but I kept thinking in terms of writing code that worked on Unix but with portability to Windows in mind, which completely misses the point. I knew that the transport/protocol code I'd end up writing would look something like this, but TBH I'd not seen that as the interesting part of the problem... BTW, to avoid duplication of the fork/exec stuff, I would probably have written the transport to take a subprocess.Popen object as its only argument, then hooked up self._wstdin to popen.stdin and self._rstdout to popen.stdout. That requires the user to have created the Popen object with those file descriptors as pipes (I don't know if it's possible to introspect a Popen object to check that) but avoids duplicating the subprocess logic. I can probably fairly quickly modify your code to demonstrate, but it's late and I don't want to start booting my Unix environment now, so it'll have to wait till tomorrow :-) Paul. From guido at python.org Fri Jan 18 23:51:41 2013 From: guido at python.org (Guido van Rossum) Date: Fri, 18 Jan 2013 14:51:41 -0800 Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events In-Reply-To: References: <50F87DC8.1060000@canterbury.ac.nz> <50F8D695.3050002@canterbury.ac.nz> Message-ID: On Fri, Jan 18, 2013 at 2:32 PM, Paul Moore wrote: > On 18 January 2013 21:24, Guido van Rossum wrote: >>> To create that create_corba_connection() function, you'd be expected >>> to subclass the standard event loop, is that right? >> >> No, it doesn't need to be a method on the event loop at all. It can >> just be a function in a different package; it can use >> events.get_current_event_loop() to reference the event loop. > > Aargh. I'm confused again! (I did warn you about dumb questions, didn't I? :-)) > > The event loop implementation contains the code that does the OS-level > poll for events to process. (In tulip, that is handled by the selector > object, but that's not mentioned in the PEP so I assume it should be > considered an implementation detail). So, the event loop has to define > what types of (OS-level) objects can be registered. 
At the moment, > event loops only handle sockets (via select/poll/etc) and even the raw > add_reader methods are not for end user use. Well, *on UNIX* the event loop also handles other file descriptors, and there's nothing to actually *prevent* an end user using add_reader. It just may not work when their code is run on Windows, but then it probably won't run on Windows anyway. :-) > So a standalone create_corba_connection function can certainly get the > event loop using get_current_event_loop(), but it has no means of > asking the event loop to poll the CORBA connection it creates for new > messages. Right, unless it is in on the conspiracy between the event loop and the selector (IOW if it is effectively aware and/or part of the event loop implementation for the specific platform). > Without direct access to the selector (or equivalent) it > can't add the extra event source. (Unless that source is a pollable > file descriptor and it's willing to play with the optional add_reader > methods, but that's not a "new event source" then...) The same problem > will likely occur if you try to integrate Windows GUI events (you > check for a GUI message by calling a Windows API). Let's say that you are thinking through the example much farther than I had intended... :-) > I don't think this matters except in obscure cases (it's likely a huge > case of YAGNI) but I genuinely don't understand how you can say that > create_corba_connection() could be written as a standalone function, > and yet that create_connection() has to be a method of the event loop. > That's what I'm getting at when I keep saying that I see you treating > sockets as "special". There's clearly something I'm missing in your > thinking, and it keeps tripping me up. Let's assume that create_corba_connection() actually *can* be written using add_reader(), but only on UNIX. So the app is limited to UNIX, and in that context create_corba_connection() can be a function in another package. It's not so much that create_connection() *must* be a method on the event loop. It's just that I *want* it to be a method on the event loop so you will be able to write user code that is portable between UNIX and Windows. It will call create_connection(), which is a portable API with two platform-specific implementations; on Windows (when using IOCP) it will return an instance of, say, _IocpSocketTransport(), while on UNIX it returns a _UnixSocketTransport() instance. But we have no hope of making create_corba_connection() on Windows (in my example -- please just play along) and hence there is no need to make it a method of the event loop. 
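To spell the dispatch out (an illustrative sketch only, with a blocking connect standing in for the real async machinery -- this is not tulip's code):

    import socket

    class _UnixSocketTransportSketch:
        def __init__(self, sock, protocol):
            self._sock = sock
            self._protocol = protocol
            protocol.connection_made(self)

        def write(self, data):
            self._sock.sendall(data)   # real transports buffer instead

    class UnixEventLoopSketch:
        def create_connection(self, protocol_factory, host, port):
            # The method is the portable API; only the transport class
            # behind it is platform-specific.  An IOCP event loop would
            # build an _IocpSocketTransport here instead.
            sock = socket.create_connection((host, port))
            protocol = protocol_factory()   # called only once we succeeded
            return _UnixSocketTransportSketch(sock, protocol), protocol

User code only ever touches create_connection(), so it runs unchanged on either platform.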
-- --Guido van Rossum (python.org/~guido) From guido at python.org Fri Jan 18 23:53:15 2013 From: guido at python.org (Guido van Rossum) Date: Fri, 18 Jan 2013 14:53:15 -0800 Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events In-Reply-To: References: <50F87DC8.1060000@canterbury.ac.nz> <50F8D695.3050002@canterbury.ac.nz> Message-ID: On Fri, Jan 18, 2013 at 2:48 PM, Paul Moore wrote: > On 18 January 2013 22:22, Guido van Rossum wrote: >>> The protocol part is then about adapting the transport API to >>> coroutine friendly readlines/writelines API (the part that Guido >>> points out needs more detail in >>> http://www.python.org/dev/peps/pep-3156/#coroutines-and-protocols) >>> >>> As a rough untested sketch (the buffering here could likely be a lot smarter): >> >> I have a more-or-less working but probably incomplete version checked >> into the tulip repo: >> http://code.google.com/p/tulip/source/browse/tulip/subprocess_transport.py > > Ha! You beat me to it. > > OK, looking at your code, I see that you freely used the > add_reader/add_writer functions and friends, and the fact that the > Unix selectors handle pipes as well as sockets. With the freedom to do > that, your code looks both reasonable and pretty straightforward. I > was having trouble getting past the fact that this approach wouldn't > work on Windows, and confusing "nonportable" with "not allowed". My > apologies. You kept telling me that writing the code for Unix would be > helpful, but I kept thinking in terms of writing code that worked on > Unix but with portability to Windows in mind, which completely misses > the point. I knew that the transport/protocol code I'd end up writing > would look something like this, but TBH I'd not seen that as the > interesting part of the problem... Glad you've got it now! > BTW, to avoid duplication of the fork/exec stuff, I would probably > have written the transport to take a subprocess.Popen object as its > only argument, then hooked up self._wstdin to popen.stdin and > self._rstdout to popen.stdout. That requires the user to have created > the Popen object with those file descriptors as pipes (I don't know if > it's possible to introspect a Popen object to check that) but avoids > duplicating the subprocess logic. I can probably fairly quickly modify > your code to demonstrate, but it's late and I don't want to start > booting my Unix environment now, so it'll have to wait till tomorrow > :-) I would love for you to create that version. I only checked it in so I could point to it -- I am not happy with either the implementation, the API spec, or the unit test... -- --Guido van Rossum (python.org/~guido) From p.f.moore at gmail.com Fri Jan 18 23:53:54 2013 From: p.f.moore at gmail.com (Paul Moore) Date: Fri, 18 Jan 2013 22:53:54 +0000 Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events In-Reply-To: References: <50F87DC8.1060000@canterbury.ac.nz> <50F8D695.3050002@canterbury.ac.nz> Message-ID: On 18 January 2013 22:51, Guido van Rossum wrote: > It's not so much that create_connection() *must* be a method on the > event loop. It's just that I *want* it to be a method on the event > loop so you will be able to write user code that is portable between > UNIX and Windows. It will call create_connection(), which is a > portable API with two platform-specific implementations; on Windows > (when using IOCP) it will return an instance of, say, > _IocpSocketTransport(), while on UNIX it returns a > _UnixSocketTransport() instance. 
> > But we have no hope of making create_corba_connection() on Windows (in > my example -- please just play along) and hence there is no need to > make it a method of the event loop. Ah, OK. I've got it now, thanks! Paul From p.f.moore at gmail.com Fri Jan 18 23:57:45 2013 From: p.f.moore at gmail.com (Paul Moore) Date: Fri, 18 Jan 2013 22:57:45 +0000 Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events In-Reply-To: References: <50F8F725.20505@canterbury.ac.nz> Message-ID: On 18 January 2013 22:15, Guido van Rossum wrote: > But for this specific case there's a simpler solution -- require the > protocol to support a few extra methods, in particular, > err_data_received() and err_eof_received(), which are to stderr what > data_received() and eof_received() are for stdout. (After all, the > point of a subprocess is that "normal" data goes to stdout.) There's > only one input stream to the subprocess, so there's no ambiguity for > write(), and neither is there a need for multiple > connection_made()/lost() methods. (However, we could argue endlessly > over whether connection_lost() should be called when the subprocess > exits, or when the other side of all three pipes is closed. :-) While I don't really care about arguing over *when* connection_lost should be called, it *is* relevant to my thinking that getting notified when the process exits doesn't seem to me to be possible - again it's the issue that the transport can't ask the event loop to poll for anything that the event loop isn't already coded to check. So (once again, unless I've missed something) the only viable option for a standalone transport is to call connection_lost when all the pipes are closed. Am I still missing something? Paul From guido at python.org Sat Jan 19 00:01:54 2013 From: guido at python.org (Guido van Rossum) Date: Fri, 18 Jan 2013 15:01:54 -0800 Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events In-Reply-To: References: <50F8F725.20505@canterbury.ac.nz> Message-ID: On Fri, Jan 18, 2013 at 2:57 PM, Paul Moore wrote: > On 18 January 2013 22:15, Guido van Rossum wrote: >> But for this specific case there's a simpler solution -- require the >> protocol to support a few extra methods, in particular, >> err_data_received() and err_eof_received(), which are to stderr what >> data_received() and eof_received() are for stdout. (After all, the >> point of a subprocess is that "normal" data goes to stdout.) There's >> only one input stream to the subprocess, so there's no ambiguity for >> write(), and neither is there a need for multiple >> connection_made()/lost() methods. (However, we could argue endlessly >> over whether connection_lost() should be called when the subprocess >> exits, or when the other side of all three pipes is closed. :-) > > While I don't really care about arguing over *when* connection_lost > should be called, it *is* relevant to my thinking that getting > notified when the process exits doesn't seem to me to be possible - > again it's the issue that the transport can't ask the event loop to > poll for anything that the event loop isn't already coded to check. So > (once again, unless I've missed something) the only viable option for > a standalone transport is to call connection_lost when all the pipes > are closed. That is typically how these things are done (e.g. popen and subprocess work this way). It is also probably the most useful, since it is *possible* that the parent process forks a child and then exits itself, where the child does all the work of the pipeline. 
> Am I still missing something? I believe it is, at least in theory, possible to implement waiting for the process to exit, using signals. The event loop can add signal handlers, and there is a signal that gets sent upon child process exit. There are lots of problems here (what if some other piece of code forked that process) but we could come up with reasonable solutions for these. However waiting for the pipes closing makes the most sense, so no need to bother. :-) -- --Guido van Rossum (python.org/~guido) From greg.ewing at canterbury.ac.nz Sat Jan 19 00:12:34 2013 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sat, 19 Jan 2013 12:12:34 +1300 Subject: [Python-ideas] Regarding 'const' and 'require' statements In-Reply-To: References: Message-ID: <50F9D6E2.9020703@canterbury.ac.nz> Harding, James wrote: > The name ST_MODE in this example is not a bindable name. The name only > lives during compilation and is not accessible at run-time. I don't think that's a good idea. It would be better for it to be available at run time like a normal module-level name, but protected from rebinding. There may be cases where the compiler can't work out the value, such as when the module is imported dynamically. Such code would then continue to work, it just wouldn't be optimised. Not having the name present at run time could also lead to unexpected results. If something tries to rebind the name, it will succeed, but it won't affect compiler-optimised code using the name. It would be better if attempting to rebind a const name raised an exception. -- Greg From greg.ewing at canterbury.ac.nz Sat Jan 19 00:21:02 2013 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sat, 19 Jan 2013 12:21:02 +1300 Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events In-Reply-To: References: <50F87DC8.1060000@canterbury.ac.nz> <50F8D695.3050002@canterbury.ac.nz> Message-ID: <50F9D8DE.9040003@canterbury.ac.nz> Paul Moore wrote: > Is it not > possible to set an already open file descriptor to be nonblocking? If > that's the case, it means that Unix has the same problem as I suspect > exists for Windows - existing pipes and filehandles can't be used in > async code as they won't necessarily be in nonblocking mode. No, it doesn't -- a fd doesn't *have* to be non-blocking in order to use it with select/poll/whatever. Sometimes people do, but only to allow a performance optimisation by attempting another read before going back to the event loop, just in case more data came in while you were processing the first lot. But doing that is entirely optional. Having said that, fcntl() is usually the way to change the O_NONBLOCK flag of an already-opened fd, although the details may vary from one unix to another. -- Greg From greg.ewing at canterbury.ac.nz Sat Jan 19 00:59:38 2013 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sat, 19 Jan 2013 12:59:38 +1300 Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events In-Reply-To: References: <50F87DC8.1060000@canterbury.ac.nz> <50F8D695.3050002@canterbury.ac.nz> Message-ID: <50F9E1EA.4010305@canterbury.ac.nz> Guido van Rossum wrote: > Well, except that you can't just pass CallbackProtocol where a > protocol factory is required by the PEP -- you'll have to pass a > lambda or partial function without arguments that calls > CallbackProtocol with some arguments taken from elsewhere. Something smells wrong to me about APIs that require protocol factories.
I don't see what advantage there is in writing

    create_connection(HTTPProtocol, "some.where.net", 80)

as opposed to just writing something like

    HTTPProtocol(TCPTransport("some.where.net", 80))

You're going to have to use the latter style anyway to set up anything other than the very simplest configurations, e.g. your earlier 4-layer protocol stack example. So create_connection() can't be anything more than a convenience function, and unless I'm missing something, it hardly seems to add enough convenience to be worth the bother. -- Greg From guido at python.org Sat Jan 19 01:12:29 2013 From: guido at python.org (Guido van Rossum) Date: Fri, 18 Jan 2013 16:12:29 -0800 Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events In-Reply-To: <50F9E1EA.4010305@canterbury.ac.nz> References: <50F87DC8.1060000@canterbury.ac.nz> <50F8D695.3050002@canterbury.ac.nz> <50F9E1EA.4010305@canterbury.ac.nz> Message-ID: On Fri, Jan 18, 2013 at 3:59 PM, Greg Ewing wrote: > Guido van Rossum wrote: > >> Well, except that you can't just pass CallbackProtocol where a >> protocol factory is required by the PEP -- you'll have to pass a >> lambda or partial function without arguments that calls >> CallbackProtocol with some arguments taken from elsewhere. > > > Something smells wrong to me about APIs that require protocol > factories. I don't see what advantage there is in writing > > create_connection(HTTPProtocol, "some.where.net", 80) > > as opposed to just writing something like > > HTTPProtocol(TCPTransport("some.where.net", 80)) > > You're going to have to use the latter style anyway to set up > anything other than the very simplest configurations, e.g. > your earlier 4-layer protocol stack example. > > So create_connection() can't be anything more than a convenience > function, and unless I'm missing something, it hardly seems to > add enough convenience to be worth the bother. Glyph should really answer this one. Personally I don't feel strongly either way for this case. There may be an advantage to not calling the protocol factory if the connection can't be made (in which case the Future returned by create_connection() has the exception). -- --Guido van Rossum (python.org/~guido) From solipsis at pitrou.net Sat Jan 19 01:19:55 2013 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sat, 19 Jan 2013 01:19:55 +0100 Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events References: <50F87DC8.1060000@canterbury.ac.nz> <50F8D695.3050002@canterbury.ac.nz> <50F9E1EA.4010305@canterbury.ac.nz> Message-ID: <20130119011955.644003f3@pitrou.net> On Fri, 18 Jan 2013 16:12:29 -0800 Guido van Rossum wrote: > On Fri, Jan 18, 2013 at 3:59 PM, Greg Ewing wrote: > > Guido van Rossum wrote: > > > >> Well, except that you can't just pass CallbackProtocol where a > >> protocol factory is required by the PEP -- you'll have to pass a > >> lambda or partial function without arguments that calls > >> CallbackProtocol with some arguments taken from elsewhere. > > > > > > Something smells wrong to me about APIs that require protocol > > factories. I don't see what advantage there is in writing > > > > create_connection(HTTPProtocol, "some.where.net", 80) > > > > as opposed to just writing something like > > > > HTTPProtocol(TCPTransport("some.where.net", 80)) Except that you probably want the protocol to outlive the transport if you want to deal with reconnections or connection failures, and therefore:

    TCPClient(HTTPProtocol(), ("some.where.net", 80))

Regards Antoine.
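(Roughly, the shape being suggested -- hypothetical names throughout, just to illustrate one protocol instance outliving any number of transports:)

    class ReconnectingProtocolSketch:
        """One protocol instance, any number of successive transports."""

        def __init__(self, reconnect):
            self._reconnect = reconnect   # e.g. re-runs the connect call

        def connection_made(self, transport):
            self.transport = transport    # a fresh transport each time

        def data_received(self, data):
            pass   # protocol state survives across connections

        def connection_lost(self, exc):
            if exc is not None:       # dropped, not cleanly closed
                self._reconnect()     # same protocol, new transport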
From cs at zip.com.au Sat Jan 19 01:30:38 2013 From: cs at zip.com.au (Cameron Simpson) Date: Sat, 19 Jan 2013 11:30:38 +1100 Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events In-Reply-To: References: Message-ID: <20130119003038.GA15133@cskk.homeip.net> On 18Jan2013 15:01, Guido van Rossum wrote: | It is also probably the most useful, since it is | *possible* that the parent process forks a child and then exits | itself, where the child does all the work of the pipeline. For me, even common. I often make grandchildren instead of children when only the I/O matters so that I don't leave zombies around, nor spurious processes to interfere with wait calls. -- Cameron Simpson To have no errors Would be life without meaning No struggle, no joy - Haiku Error Messages http://www.salonmagazine.com/21st/chal/1998/02/10chal2.html From james.d.harding at siemens.com Sat Jan 19 01:35:46 2013 From: james.d.harding at siemens.com (Harding, James) Date: Sat, 19 Jan 2013 00:35:46 +0000 Subject: [Python-ideas] Regarding 'const' and 'require' statements Message-ID: Thank you everyone for your comments. I wish to retract my idea due to some killer issues. First, if a module had platform dependencies in defining constants, my scheme would not be able to handle that. The idea fails because many common situations would not work. Python should work for all. Second, this scheme would require some sort of time stamping of files where constants were taken from in order to see if a re-compile is necessary. The time needed for time-stamp checks would likely be more than any time saved by using constant names. Now, back to the shadows. Thank you, James Harding -------------- next part -------------- An HTML attachment was scrubbed... URL: From greg.ewing at canterbury.ac.nz Sat Jan 19 01:42:20 2013 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sat, 19 Jan 2013 13:42:20 +1300 Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events In-Reply-To: References: <50F8F725.20505@canterbury.ac.nz> Message-ID: <50F9EBEC.2090106@canterbury.ac.nz> Guido van Rossum wrote: > I like everyone using the same names for these > callback methods, so that a reader (who is familiar with the > transport/protocol API) can instantly know what kind of callback it is > and what its arguments are. You don't seem to follow this philosophy anywhere else in the PEP, though. In all the other places a callback is specified, you get to pass in an arbitrary function. The PEP offers no rationale as to why transports should be the odd one out. > You can write an alternative framework using coroutines and callbacks, > bypassing transports and protocols. (You'll still need Futures.) > However you'd be missing the interoperability offered by the > protocol/transport abstractions: in an IOCP world you'd have to > interact with the event loop's callbacks differently than in a > select/poll/etc. world. I was hoping there would be a slightly higher-level layer, that provides a coroutine interface but hides the platform differences. What would you think of the idea of making the Transport objects themselves fill both roles, by having read_async and write_async methods? They wouldn't have to do any buffering, I'd be happy to wrap another object around it if I wanted that. 
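Roughly this sort of thing, say (only a sketch, leaning on the event loop's sock_recv()/sock_sendall() as stand-ins for whatever the platform really provides):

    class StreamTransportSketch:
        def __init__(self, loop, sock):
            self._loop = loop
            self._sock = sock

        def read_async(self, nbytes=4096):
            return self._loop.sock_recv(self._sock, nbytes)

        def write_async(self, data):
            return self._loop.sock_sendall(self._sock, data)

    def echo(transport):
        while 1:
            data = yield from transport.read_async()
            if not data:
                break
            yield from transport.write_async(data)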
-- Greg From guido at python.org Sat Jan 19 02:16:54 2013 From: guido at python.org (Guido van Rossum) Date: Fri, 18 Jan 2013 17:16:54 -0800 Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events In-Reply-To: <50F9EBEC.2090106@canterbury.ac.nz> References: <50F8F725.20505@canterbury.ac.nz> <50F9EBEC.2090106@canterbury.ac.nz> Message-ID: On Fri, Jan 18, 2013 at 4:42 PM, Greg Ewing wrote: > Guido van Rossum wrote: >> I like everyone using the same names for these >> callback methods, so that a reader (who is familiar with the >> transport/protocol API) can instantly know what kind of callback it is >> and what its arguments are. > You don't seem to follow this philosophy anywhere else in > the PEP, though. In all the other places a callback is > specified, you get to pass in an arbitrary function. > The PEP offers no rationale as to why transports should > be the odd one out. Well, yes, it *is* the odd one (or two, counting start_serving()) out. That's because it is the high-level API. >> You can write an alternative framework using coroutines and callbacks, >> bypassing transports and protocols. (You'll still need Futures.) >> However you'd be missing the interoperability offered by the >> protocol/transport abstractions: in an IOCP world you'd have to >> interact with the event loop's callbacks differently than in a >> select/poll/etc. world. > I was hoping there would be a slightly higher-level layer, > that provides a coroutine interface but hides the platform > differences. Hm, Transports+Protocols *is* the higher level layer. > What would you think of the idea of making the Transport > objects themselves fill both roles, by having read_async > and write_async methods? They wouldn't have to do any > buffering, I'd be happy to wrap another object around it > if I wanted that. You could code that up very simply using sock_recv() and sock_sendall(). But everyone who's thought about performance of select/poll/etc., seems to think that that is not a good model because it will cause many extra calls to add/remove reader/writer. -- --Guido van Rossum (python.org/~guido) From glyph at twistedmatrix.com Sat Jan 19 02:23:50 2013 From: glyph at twistedmatrix.com (Glyph) Date: Fri, 18 Jan 2013 17:23:50 -0800 Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events In-Reply-To: References: <50F87DC8.1060000@canterbury.ac.nz> <50F8D695.3050002@canterbury.ac.nz> <50F9E1EA.4010305@canterbury.ac.nz> Message-ID: <1D762043-961F-4F1E-A0A6-0C8C78AFA59B@twistedmatrix.com> On Jan 18, 2013, at 4:12 PM, Guido van Rossum wrote: > Glyph should really answer this one. Thanks for pointing it out to me, keeping up with python-ideas is always a challenge :). > Personally I don't feel strongly > either way for this case. There may be an advantage to not calling the > protocol factory if the connection can't be made (in which case the > Future returned by create_connection() has the exception). > On Fri, Jan 18, 2013 at 3:59 PM, Greg Ewing wrote: >> Guido van Rossum wrote: >> >>> Well, except that you can't just pass CallbackProtocol where a >>> protocol factory is required by the PEP -- you'll have to pass a >>> lambda or partial function without arguments that calls >>> CallbackProtocol with some arguments taken from elsewhere. >> >> Something smells wrong to me about APIs that require protocol >> factories. For starters, nothing "smells wrong" to me about protocol factories. Responding to this kind of criticism is difficult, because it's not substantive - what's the actual problem? 
I think that some Python programmers have an aversion to factories because a common path to Python is flight from Java environments that over- or mis-use the factory pattern. >> I don't see what advantage there is in writing >> >> create_connection(HTTPProtocol, "some.where.net", 80) >> >> as opposed to just writing something like >> >> HTTPProtocol(TCPTransport("some.where.net", 80)) Guido mentioned one advantage already; you don't have to create the protocol object if the connection fails, so your protocol objects are real honest-to-goodness connections, not "well, maybe there's a connection or maybe there'll be a connection later". To be fair, this is rarely of practical utility, but in edge cases where you are doing something like, "simultaneously try to connect to these 1000 hosts, and give up on all outstanding connections when the first 3 connections succeed", being able to avoid all the construction overhead for your protocols if they're not going to be used is nice. There's a more pressing issue of correctness though: even if you create the protocol in advance, you really don't want to tell it about the transport until the transport truly exists. The connection to some.where.net (by which I mean, ahem, "somewhere.example.com"; "where.net" will not thank you if you ignore BCP 32 in the documentation or examples) might fail, and if the client wants to issue a client greeting, it should not have access to its half-formed transport before that failure. Of course, it's possible to present an API that works around this by buffering writes issued before the connection is established, and by the protocol waiting for the connection_made callback before actually doing its work. Finally, using a factory also makes client-creating and server-creating code more symmetrical, since you clearly need a protocol factory in the listening-socket case. If your main example protocol is HTTP, this doesn't make sense*, but once you start trying to do things like SIP or XMPP, where the participants in a connection are really peers, having the structure be similar is handy. In the implementation, it's nice to have things set up this way so that the order of the protocol<->transport symmetric setup is less important and by the time the appropriate methods are being invoked, everybody knows about everybody else. The transport can't really have a reference to the protocol in the protocol's constructor. *: Unless you're doing this, of course . However, aside from the factory-or-not issue, the fact that TCPTransport's name implies that it is both (1) a class and (2) the actual transport implementation, is more problematic. TCPTransport will need multiple backends for different multiplexing and I/O mechanisms. This is why I keep bringing up IOCP; this is a major API where the transport implementation is actually quite different. In Twisted, they're entirely different classes. They could probably share a bit more implementation than they do and reduce a little duplication, but it's nice that they don't have to. You don't want to burden application code with picking the right one, and it's ugly to smash the socket-implementation-selection into a class. (create_connection really ought to be a method on an event-loop object of some kind, which produces the appropriate implementation. I think right now it implicitly looks it up in thread-local storage for the "current" main loop, and I'd rather it were more explicit, but the idea is the same.) 
Your example is misleadingly named; surely you mean TCPClient, because a TCPTransport would implicitly support both clients and servers - and a server would start with a socket returned from accept(), not a host and port. (Certainly not a DNS host name.) create_connection will actually need to create multiple sockets internally. See covers this, in part (for a more condensed discussion, see ). >> You're going to have to use the latter style anyway to set up >> anything other than the very simplist configurations, e.g. >> your earlier 4-layer protocol stack example. I don't see how this is true. I've written layered protocols over and over again in Twisted and never wanted to manually construct the bottom transport for that reason.* In fact, the more elaborate multi-layered structures you have to construct when a protocol finishes connecting, the more you want to avoid being required to do it in advance of actually needing the protocols to exist. *: I _have_ had to manually construct transports to deal with some fiddly performance-tuning issues, but those are just deficiencies in the existing transport implementation that ought to be remedied. >> So create_connection() can't be anything more than a convenience >> function, and unless I'm missing something, it hardly seems to >> add enough convenience to be worth the bother. *Just* implementing the multiple-parallel-connection-attempts algorithm required to deal with the IPv6 transition period would be enough convenience to be worth having a function, even if none of the other stuff I just wrote applied :). -glyph -------------- next part -------------- An HTML attachment was scrubbed... URL: From greg.ewing at canterbury.ac.nz Sat Jan 19 05:16:17 2013 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sat, 19 Jan 2013 17:16:17 +1300 Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events In-Reply-To: <20130119011955.644003f3@pitrou.net> References: <50F87DC8.1060000@canterbury.ac.nz> <50F8D695.3050002@canterbury.ac.nz> <50F9E1EA.4010305@canterbury.ac.nz> <20130119011955.644003f3@pitrou.net> Message-ID: <50FA1E11.1060107@canterbury.ac.nz> Antoine Pitrou wrote: > Except that you probably want the protocol to outlive the transport if > you want to deal with reconnections or connection failures, and > therefore: > > TCPClient(HTTPProtocol(), ("some.where.net", 80)) I don't see how to generalise that to more complicated protocol stacks, though. For dealing with re-connections, it seems like both the protocol *and* the transport need to outlive the connection failure, and the transport needs a reconnect() method that is called by a protocol that can deal with that situation. Reconnection can then propagate along the whole chain. -- Greg From greg.ewing at canterbury.ac.nz Sat Jan 19 07:05:35 2013 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sat, 19 Jan 2013 19:05:35 +1300 Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events In-Reply-To: <1D762043-961F-4F1E-A0A6-0C8C78AFA59B@twistedmatrix.com> References: <50F87DC8.1060000@canterbury.ac.nz> <50F8D695.3050002@canterbury.ac.nz> <50F9E1EA.4010305@canterbury.ac.nz> <1D762043-961F-4F1E-A0A6-0C8C78AFA59B@twistedmatrix.com> Message-ID: <50FA37AF.40306@canterbury.ac.nz> Glyph wrote: > I think that some Python > programmers have an aversion to factories because a common path to > Python is flight from Java environments that over- or mis-use the > factory pattern. I'm not averse to using the factory pattern when it genuinely helps. 
I'm questioning whether it helps enough in this case to be worth using. > Guido mentioned one advantage already; you don't have to create the > protocol object if the connection fails, so your protocol objects are > real honest-to-goodness connections, not "well, maybe there's a > connection or maybe there'll be a connection later". I would suggest that merely instantiating a protocol object should be cheap enough that you don't normally care. Any substantive setup work should be done in the connection_made() method, not in __init__(). Transports are already a "maybe there's a connection" kind of deal, otherwise why does connection_made() exist at all? > if the client wants to issue > a client greeting, it should not have access to its half-formed > transport before that failure. Of course, it's possible to present an > API that works around this by buffering writes issued before the > connection is established, and by the protocol waiting for the > connection_made callback before actually doing its work. Which it seems to me is the way *all* protocols should be written. If necessary, you could "encourage" people to write them this way by having a transport refuse to accept any writes until the connection_made() call has occurred. > However, aside from the factory-or-not issue, the fact that > TCPTransport's name implies that it is both (1) a class and (2) the > actual transport implementation, is more problematic. They don't have to be classes, they could be functions: create_http_protocol(create_tcp_transport("hammerme.seeificare.com", 80)) The important thing is that each function concerns itself with just one step of the chain, and chains of any length can be constructed by composing them in the obvious way. > Your example is misleadingly named; surely you mean TCP*Client*, because > a TCP*Transport* would implicitly support both clients and servers - and > a server would start with a socket returned from accept(), not a host > and port. Maybe. Or maybe the constructor could be called in more than one way -- create_tcp_transport(host, port) on the client side and create_tcp_transport(socket) on the server side. > > create_connection will actually need to create multiple sockets > internally. See covers this, in > part (for a more condensed discussion, see > ). Couldn't all that be handled inside the transport? > I've written layered protocols over and > over again in Twisted and never wanted to manually construct the bottom > transport for that reason. So what does the code for setting up a multi-layer stack look like? How does it make use of create_connection()? Also, what does an implementation of create_connection() look like that avoids creating the protocol until the connection is made? It seems tricky, because the way you know the connection is made is that it calls connection_made() on the protocol. But there's no protocol yet. So you would have to install a temporary protocol whose connection_made() creates the real protocol. That sounds like it could be even more overhead than just creating the real protocol in the first place, as long as the protocol doesn't do any work until its connection_made() is called. 
-- Greg From shane at umbrellacode.com Sat Jan 19 07:20:29 2013 From: shane at umbrellacode.com (Shane Green) Date: Fri, 18 Jan 2013 22:20:29 -0800 Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events In-Reply-To: <50FA1E11.1060107@canterbury.ac.nz> References: <50F87DC8.1060000@canterbury.ac.nz> <50F8D695.3050002@canterbury.ac.nz> <50F9E1EA.4010305@canterbury.ac.nz> <20130119011955.644003f3@pitrou.net> <50FA1E11.1060107@canterbury.ac.nz> Message-ID: Just like there's no reason for having a protocol without a transport, it seems like there's no reason for a transport without a connection, and that separating the two might further normalize differences between client and server channels Shane Green www.umbrellacode.com 805-452-9666 | shane at umbrellacode.com On Jan 18, 2013, at 8:16 PM, Greg Ewing wrote: > Antoine Pitrou wrote: >> Except that you probably want the protocol to outlive the transport if >> you want to deal with reconnections or connection failures, and >> therefore: >> TCPClient(HTTPProtocol(), ("some.where.net", 80)) > > I don't see how to generalise that to more complicated > protocol stacks, though. > > For dealing with re-connections, it seems like both the > protocol *and* the transport need to outlive the connection > failure, and the transport needs a reconnect() method that > is called by a protocol that can deal with that situation. > Reconnection can then propagate along the whole chain. > > -- > Greg > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas -------------- next part -------------- An HTML attachment was scrubbed... URL: From d.s at daniel.shahaf.name Sat Jan 19 11:10:24 2013 From: d.s at daniel.shahaf.name (Daniel Shahaf) Date: Sat, 19 Jan 2013 12:10:24 +0200 Subject: [Python-ideas] chdir context manager Message-ID: <20130119101024.GB2969@lp-shahaf.local> The following is a common pattern (used by, for example, shutil.make_archive): save_cwd = os.getcwd() try: foo() finally: os.chdir(save_cwd) I suggest this deserves a context manager: with saved_cwd(): foo() Initial feedback on IRC suggests shutil as where this functionality should live (other suggestions were made, such as pathlib). Hence, attached patch implements this as shutil.saved_cwd, based on os.fchdir. The patch also adds os.chdir to os.supports_dir_fd and documents the context manager abilities of builtins.open() in its reference. Thoughts? Thanks, Daniel diff -r 74b0461346f0 Doc/library/functions.rst --- a/Doc/library/functions.rst Fri Jan 18 17:53:18 2013 -0800 +++ b/Doc/library/functions.rst Sat Jan 19 09:39:27 2013 +0000 @@ -828,6 +828,9 @@ are always available. They are listed h Open *file* and return a corresponding :term:`file object`. If the file cannot be opened, an :exc:`OSError` is raised. + This function can be used as a :term:`context manager` that closes the + file when it exits. + *file* is either a string or bytes object giving the pathname (absolute or relative to the current working directory) of the file to be opened or an integer file descriptor of the file to be wrapped. (If a file descriptor diff -r 74b0461346f0 Doc/library/os.rst --- a/Doc/library/os.rst Fri Jan 18 17:53:18 2013 -0800 +++ b/Doc/library/os.rst Sat Jan 19 09:39:27 2013 +0000 @@ -1315,6 +1315,9 @@ features: This function can support :ref:`specifying a file descriptor `. The descriptor must refer to an opened directory, not an open file. 
+ See also :func:`shutil.saved_cwd` for a context manager that restores the + current working directory. + Availability: Unix, Windows. .. versionadded:: 3.3 diff -r 74b0461346f0 Doc/library/shutil.rst --- a/Doc/library/shutil.rst Fri Jan 18 17:53:18 2013 -0800 +++ b/Doc/library/shutil.rst Sat Jan 19 09:39:27 2013 +0000 @@ -36,6 +36,19 @@ copying and removal. For operations on i Directory and files operations ------------------------------ +.. function:: saved_cwd() + + Return a :term:`context manager` that restores the current working directory + when it exits. See :func:`os.chdir` for changing the current working + directory. + + The context manager returns an open file descriptor for the saved directory. + + Only available when :func:`os.chdir` supports file descriptor arguments. + + .. versionadded:: 3.4 + + .. function:: copyfileobj(fsrc, fdst[, length]) Copy the contents of the file-like object *fsrc* to the file-like object *fdst*. diff -r 74b0461346f0 Lib/os.py --- a/Lib/os.py Fri Jan 18 17:53:18 2013 -0800 +++ b/Lib/os.py Sat Jan 19 09:39:27 2013 +0000 @@ -120,6 +120,7 @@ if _exists("_have_functions"): _set = set() _add("HAVE_FACCESSAT", "access") + _add("HAVE_FCHDIR", "chdir") _add("HAVE_FCHMODAT", "chmod") _add("HAVE_FCHOWNAT", "chown") _add("HAVE_FSTATAT", "stat") diff -r 74b0461346f0 Lib/shutil.py --- a/Lib/shutil.py Fri Jan 18 17:53:18 2013 -0800 +++ b/Lib/shutil.py Sat Jan 19 09:39:27 2013 +0000 @@ -38,6 +38,7 @@ __all__ = ["copyfileobj", "copyfile", "c "unregister_unpack_format", "unpack_archive", "ignore_patterns", "chown", "which"] # disk_usage is added later, if available on the platform + # saved_cwd is added later, if available on the platform class Error(OSError): pass @@ -1111,3 +1112,20 @@ def which(cmd, mode=os.F_OK | os.X_OK, p if _access_check(name, mode): return name return None + +# Define the chdir context manager. 
+if os.chdir in os.supports_dir_fd: + class saved_cwd: + def __init__(self): + pass + def __enter__(self): + self.dh = os.open(os.curdir, + os.O_RDONLY | getattr(os, 'O_DIRECTORY', 0)) + return self.dh + def __exit__(self, exc_type, exc_value, traceback): + try: + os.chdir(self.dh) + finally: + os.close(self.dh) + return False + __all__.append('saved_cwd') diff -r 74b0461346f0 Lib/test/test_shutil.py --- a/Lib/test/test_shutil.py Fri Jan 18 17:53:18 2013 -0800 +++ b/Lib/test/test_shutil.py Sat Jan 19 09:39:27 2013 +0000 @@ -1276,6 +1276,20 @@ class TestShutil(unittest.TestCase): rv = shutil.copytree(src_dir, dst_dir) self.assertEqual(['foo'], os.listdir(rv)) + def test_saved_cwd(self): + if hasattr(os, 'fchdir'): + temp_dir = self.mkdtemp() + orig_dir = os.getcwd() + with shutil.saved_cwd() as dir_fd: + os.chdir(temp_dir) + new_dir = os.getcwd() + self.assertIsInstance(dir_fd, int) + final_dir = os.getcwd() + self.assertEqual(orig_dir, final_dir) + self.assertEqual(temp_dir, new_dir) + else: + self.assertFalse(hasattr(shutil, 'saved_cwd')) + class TestWhich(unittest.TestCase): From _ at lvh.cc Sat Jan 19 11:19:41 2013 From: _ at lvh.cc (Laurens Van Houtven) Date: Sat, 19 Jan 2013 11:19:41 +0100 Subject: [Python-ideas] chdir context manager In-Reply-To: <20130119101024.GB2969@lp-shahaf.local> References: <20130119101024.GB2969@lp-shahaf.local> Message-ID: +1 On Sat, Jan 19, 2013 at 11:10 AM, Daniel Shahaf wrote: > The following is a common pattern (used by, for example, > shutil.make_archive): > > save_cwd = os.getcwd() > try: > foo() > finally: > os.chdir(save_cwd) > > I suggest this deserves a context manager: > > with saved_cwd(): > foo() > > Initial feedback on IRC suggests shutil as where this functionality > should live (other suggestions were made, such as pathlib). Hence, > attached patch implements this as shutil.saved_cwd, based on os.fchdir. > > The patch also adds os.chdir to os.supports_dir_fd and documents the > context manager abilities of builtins.open() in its reference. > > Thoughts? > > Thanks, > > Daniel > > > diff -r 74b0461346f0 Doc/library/functions.rst > --- a/Doc/library/functions.rst Fri Jan 18 17:53:18 2013 -0800 > +++ b/Doc/library/functions.rst Sat Jan 19 09:39:27 2013 +0000 > @@ -828,6 +828,9 @@ are always available. They are listed h > Open *file* and return a corresponding :term:`file object`. If the > file > cannot be opened, an :exc:`OSError` is raised. > > + This function can be used as a :term:`context manager` that closes the > + file when it exits. > + > *file* is either a string or bytes object giving the pathname > (absolute or > relative to the current working directory) of the file to be opened or > an integer file descriptor of the file to be wrapped. (If a file > descriptor > diff -r 74b0461346f0 Doc/library/os.rst > --- a/Doc/library/os.rst Fri Jan 18 17:53:18 2013 -0800 > +++ b/Doc/library/os.rst Sat Jan 19 09:39:27 2013 +0000 > @@ -1315,6 +1315,9 @@ features: > This function can support :ref:`specifying a file descriptor > `. The > descriptor must refer to an opened directory, not an open file. > > + See also :func:`shutil.saved_cwd` for a context manager that restores > the > + current working directory. > + > Availability: Unix, Windows. > > .. versionadded:: 3.3 > diff -r 74b0461346f0 Doc/library/shutil.rst > --- a/Doc/library/shutil.rst Fri Jan 18 17:53:18 2013 -0800 > +++ b/Doc/library/shutil.rst Sat Jan 19 09:39:27 2013 +0000 > @@ -36,6 +36,19 @@ copying and removal. 
For operations on i > Directory and files operations > ------------------------------ > > +.. function:: saved_cwd() > + > + Return a :term:`context manager` that restores the current working > directory > + when it exits. See :func:`os.chdir` for changing the current working > + directory. > + > + The context manager returns an open file descriptor for the saved > directory. > + > + Only available when :func:`os.chdir` supports file descriptor > arguments. > + > + .. versionadded:: 3.4 > + > + > .. function:: copyfileobj(fsrc, fdst[, length]) > > Copy the contents of the file-like object *fsrc* to the file-like > object *fdst*. > diff -r 74b0461346f0 Lib/os.py > --- a/Lib/os.py Fri Jan 18 17:53:18 2013 -0800 > +++ b/Lib/os.py Sat Jan 19 09:39:27 2013 +0000 > @@ -120,6 +120,7 @@ if _exists("_have_functions"): > > _set = set() > _add("HAVE_FACCESSAT", "access") > + _add("HAVE_FCHDIR", "chdir") > _add("HAVE_FCHMODAT", "chmod") > _add("HAVE_FCHOWNAT", "chown") > _add("HAVE_FSTATAT", "stat") > diff -r 74b0461346f0 Lib/shutil.py > --- a/Lib/shutil.py Fri Jan 18 17:53:18 2013 -0800 > +++ b/Lib/shutil.py Sat Jan 19 09:39:27 2013 +0000 > @@ -38,6 +38,7 @@ __all__ = ["copyfileobj", "copyfile", "c > "unregister_unpack_format", "unpack_archive", > "ignore_patterns", "chown", "which"] > # disk_usage is added later, if available on the platform > + # saved_cwd is added later, if available on the platform > > class Error(OSError): > pass > @@ -1111,3 +1112,20 @@ def which(cmd, mode=os.F_OK | os.X_OK, p > if _access_check(name, mode): > return name > return None > + > +# Define the chdir context manager. > +if os.chdir in os.supports_dir_fd: > + class saved_cwd: > + def __init__(self): > + pass > + def __enter__(self): > + self.dh = os.open(os.curdir, > + os.O_RDONLY | getattr(os, 'O_DIRECTORY', 0)) > + return self.dh > + def __exit__(self, exc_type, exc_value, traceback): > + try: > + os.chdir(self.dh) > + finally: > + os.close(self.dh) > + return False > + __all__.append('saved_cwd') > diff -r 74b0461346f0 Lib/test/test_shutil.py > --- a/Lib/test/test_shutil.py Fri Jan 18 17:53:18 2013 -0800 > +++ b/Lib/test/test_shutil.py Sat Jan 19 09:39:27 2013 +0000 > @@ -1276,6 +1276,20 @@ class TestShutil(unittest.TestCase): > rv = shutil.copytree(src_dir, dst_dir) > self.assertEqual(['foo'], os.listdir(rv)) > > + def test_saved_cwd(self): > + if hasattr(os, 'fchdir'): > + temp_dir = self.mkdtemp() > + orig_dir = os.getcwd() > + with shutil.saved_cwd() as dir_fd: > + os.chdir(temp_dir) > + new_dir = os.getcwd() > + self.assertIsInstance(dir_fd, int) > + final_dir = os.getcwd() > + self.assertEqual(orig_dir, final_dir) > + self.assertEqual(temp_dir, new_dir) > + else: > + self.assertFalse(hasattr(shutil, 'saved_cwd')) > + > > class TestWhich(unittest.TestCase): > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -- cheers lvh -------------- next part -------------- An HTML attachment was scrubbed... URL: From _ at lvh.cc Sat Jan 19 11:28:07 2013 From: _ at lvh.cc (Laurens Van Houtven) Date: Sat, 19 Jan 2013 11:28:07 +0100 Subject: [Python-ideas] PEP 3156 / Tulip question: write/send callback/future In-Reply-To: References: Message-ID: Also, ISTR that it's not always possible to consistently have that behavior everywhere (i.e. have it in the first place or fake it where it's not directly available), so it's of somewhat limited utility, since a protocol can't actually rely on it existing. 
Most behavior that requires it is generally implemented using IPushProducer/IPullProducer (i.e. the pause/resume API Guido mentioned earlier). There have been some attempts at working towards a better producer/consumer API (e.g. supporting things like buffer changes, and generally just simplifying things that seem duplicated amongst transports and consumers/producers) called 'tubes', but I don't think any of that is ready enough to be a template for tulip :) On Fri, Jan 18, 2013 at 11:25 PM, Guido van Rossum wrote: > On Fri, Jan 18, 2013 at 1:40 PM, Eli Bendersky wrote: > > On Fri, Jan 18, 2013 at 1:02 PM, Guido van Rossum > wrote: > >> > >> On Fri, Jan 18, 2013 at 6:56 AM, Eli Bendersky > wrote: > >> > I'm looking through PEP 3156 and the Tulip code, and either something is > >> > missing or I'm not looking in the right places. > >> > >> I guess you should read some Twisted tutorial. :-) > > > > > > Yes, I noticed that Twisted also doesn't have it, so I suspected that > > influence. > > > >> > >> > >> > Should there be no "data_sent" parallel to "data_received" somewhere? > >> > Or, > >> > alternatively, "write" returning some sort of future that can be checked > >> > later for status? For connections that aren't infinitely fast it's > >> > useful to > >> > know when the data was actually sent/written, or alternatively if an > >> > error > >> > has occurred. This is also important for when writing would actually > >> > block > >> > because of full buffers. boost::asio has such a handler for async_write. > >> > >> The model is a little different. Glyph has convinced me that it works > >> well in practice. We just buffer what is written (when it can't all be > >> sent immediately). This is enough for most apps that don't serve 100MB > >> files. If the buffer becomes too large, the transport will call > >> .pause() on the protocol until it is drained, then it calls .resume(). > >> (The names of these are TBD, maybe they will end up .pause_writing() > >> and .resume_writing().) There are some default behaviors that we can > >> add here too, e.g. suspending the task. > >> > > > > I agree it can be made to work, but how would even simple "done sending" > > notification work? Or "send error" for that matter? AFAIR, low-level socket > > async APIs do provide this information. Are we confident enough it will never > > be needed to simply hide it away? > > AFAIK the Twisted folks have found that most of the time (basically > all of the time) you don't need a positive "done sending" > notification; when the send eventually *fails*, the transport calls > the protocol's connection_lost() method with an exception indicating > what failed. > > -- > --Guido van Rossum (python.org/~guido) > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -- cheers lvh -------------- next part -------------- An HTML attachment was scrubbed...
URL: From p.f.moore at gmail.com Sat Jan 19 12:15:14 2013 From: p.f.moore at gmail.com (Paul Moore) Date: Sat, 19 Jan 2013 11:15:14 +0000 Subject: [Python-ideas] chdir context manager In-Reply-To: <20130119101024.GB2969@lp-shahaf.local> References: <20130119101024.GB2969@lp-shahaf.local> Message-ID: On 19 January 2013 10:10, Daniel Shahaf wrote: > The following is a common pattern (used by, for example, > shutil.make_archive):
>
> save_cwd = os.getcwd()
> try:
>     foo()
> finally:
>     os.chdir(save_cwd)
>
> I suggest this deserves a context manager:
>
> with saved_cwd():
>     foo()
+1. I've written this myself many times... Paul From p.f.moore at gmail.com Sat Jan 19 13:12:52 2013 From: p.f.moore at gmail.com (Paul Moore) Date: Sat, 19 Jan 2013 12:12:52 +0000 Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events In-Reply-To: References: <50F87DC8.1060000@canterbury.ac.nz> <50F8D695.3050002@canterbury.ac.nz> Message-ID: On 18 January 2013 22:53, Guido van Rossum wrote: > I can probably fairly quickly modify >> your code to demonstrate, but it's late and I don't want to start >> booting my Unix environment now, so it'll have to wait till tomorrow >> :-) > > I would love for you to create that version. I only checked it in so I > could point to it -- I am not happy with either the implementation, > the API spec, or the unit test... May be a few days before I can get to it. Apparently when Ubuntu installs an automatic upgrade, it feels that it's OK to break the wireless drivers. I now have the choice of scouring the internet on another PC to find possible solutions (so far that approach is a waste of time...), or reinstalling the OS. How do you Linux users put up with this sort of thing? :-) Seriously, I'm probably going to have to build a VM so I don't get this sort of unnecessary hardware issue holding me up. Paul From jsbueno at python.org.br Sat Jan 19 13:17:57 2013 From: jsbueno at python.org.br (Joao S. O. Bueno) Date: Sat, 19 Jan 2013 10:17:57 -0200 Subject: [Python-ideas] chdir context manager In-Reply-To: References: <20130119101024.GB2969@lp-shahaf.local> Message-ID: > On 19 January 2013 10:10, Daniel Shahaf wrote: >> I suggest this deserves a context manager:
>>
>> with saved_cwd():
>>     foo()
> But if doing that, why does "foo" have to implement the directory changing itself? Why not something along the lines of:

with temp_dir("/tmp"):
    # things that perform in /tmp

# directory is restored.

Of course that one function could do both things, depending on whether a parameter is passed. js -><- From d.s at daniel.shahaf.name Sat Jan 19 13:33:29 2013 From: d.s at daniel.shahaf.name (Daniel Shahaf) Date: Sat, 19 Jan 2013 14:33:29 +0200 Subject: [Python-ideas] chdir context manager In-Reply-To: References: <20130119101024.GB2969@lp-shahaf.local> Message-ID: <20130119123329.GD2969@lp-shahaf.local> Joao S. O. Bueno wrote on Sat, Jan 19, 2013 at 10:17:57 -0200: > Why not something along the lines of:
>
> with temp_dir("/tmp"):
>     # things that perform in /tmp
>
> # directory is restored.
>
> Of course that one function could do both things,
> depending on whether a parameter is passed.
>

+1

From tjreedy at udel.edu Sat Jan 19 14:37:17 2013 From: tjreedy at udel.edu (Terry Reedy) Date: Sat, 19 Jan 2013 08:37:17 -0500 Subject: [Python-ideas] chdir context manager In-Reply-To: <20130119101024.GB2969@lp-shahaf.local> References: <20130119101024.GB2969@lp-shahaf.local> Message-ID: On 1/19/2013 5:10 AM, Daniel Shahaf wrote: > The following is a common pattern (used by, for example, > shutil.make_archive):
>
> save_cwd = os.getcwd()
> try:
>     foo()
> finally:
>     os.chdir(save_cwd)
>
> I suggest this deserves a context manager:
>
> with saved_cwd():
>     foo()
This strikes me as not a proper context manager. A context manager should create a temporary, altered context. One way is to add something that is deleted on exit. Files as context managers are the typical example. Another way is to alter something after saving the restore info, and restoring on exit. An example would be a context manager to temporarily change stdout. (Do we have one? If not, it would be at least as generally useful as this proposal.) So to me, your proposal is only 1/2 or 2/3 of a context manager. (And 'returns an open file descriptor for the saved directory' seems backward or wrong for a context manager.) It does not actually make a new context. A proper temp_cwd context manager should have one parameter, the new working directory, with chdir(new_cwd) in the enter method. To allow for conditional switching, the two chdir system calls could be conditional on new_cwd (either None or '' would mean no chdir calls). Looking at your pattern, if foo() does not change cwd, the save and restore is pointless, even if harmless. If foo does change cwd, it should also restore it, whether explicitly or with a context manager temp_cwd. Looking at your actual example, shutil.make_archive, the change and restore are conditional and asymmetrical.

save_cwd = os.getcwd()
if root_dir is not None:
    ...
    if not dry_run:
        os.chdir(root_dir)
...
finally:
    if root_dir is not None:
        ...
        os.chdir(save_cwd)

The initial chdir is conditional on dry_run (undocumented, but passed on to the archive function), the restore is not. Since I believe not switching on dry runs is just a minor optimization, I believe that that condition could be dropped and the code re-written as

with new_cwd(root_dir):
    ...

I am aware that this would require a change in the finally logging, but that would be true of the original proposal also. -- Terry Jan Reedy From p.f.moore at gmail.com Sat Jan 19 14:37:37 2013 From: p.f.moore at gmail.com (Paul Moore) Date: Sat, 19 Jan 2013 13:37:37 +0000 Subject: [Python-ideas] chdir context manager In-Reply-To: <20130119123329.GD2969@lp-shahaf.local> References: <20130119101024.GB2969@lp-shahaf.local> <20130119123329.GD2969@lp-shahaf.local> Message-ID: On 19 January 2013 12:33, Daniel Shahaf wrote: >> Of course that one function could do both things,
>> depending on whether a parameter is passed.
>>
>
> +1

Yes, that's a better idea. Paul From vinay_sajip at yahoo.co.uk Sat Jan 19 14:46:26 2013 From: vinay_sajip at yahoo.co.uk (Vinay Sajip) Date: Sat, 19 Jan 2013 13:46:26 +0000 (UTC) Subject: [Python-ideas] chdir context manager References: <20130119101024.GB2969@lp-shahaf.local> <20130119123329.GD2969@lp-shahaf.local> Message-ID: Daniel Shahaf writes: > Joao S. O. Bueno wrote on Sat, Jan 19, 2013 at 10:17:57 -0200:
> > with temp_dir("/tmp"):
> >     # things that perform in /tmp
> > # directory is restored.
>
> +1

I implemented this in distlib as:

@contextlib.contextmanager
def chdir(d):
    cwd = os.getcwd()
    try:
        os.chdir(d)
        yield
    finally:
        os.chdir(cwd)

which could perhaps be placed in shutil, so usage would be:

with shutil.chdir('new_dir'):
    # work with new_dir as current dir
# directory restored when you get here.

Regards, Vinay Sajip From phd at phdru.name Sat Jan 19 15:02:19 2013 From: phd at phdru.name (Oleg Broytman) Date: Sat, 19 Jan 2013 18:02:19 +0400 Subject: [Python-ideas] chdir context manager In-Reply-To: References: <20130119101024.GB2969@lp-shahaf.local> <20130119123329.GD2969@lp-shahaf.local> Message-ID: <20130119140219.GA10303@iskra.aviel.ru> On Sat, Jan 19, 2013 at 01:46:26PM +0000, Vinay Sajip wrote: > Daniel Shahaf writes:
>
> > Joao S. O. Bueno wrote on Sat, Jan 19, 2013 at 10:17:57 -0200:
> > > with temp_dir("/tmp"):
> > >     # things that perform in /tmp
> > > # directory is restored.
> >
> > +1
>
> I implemented this in distlib as:
>
> @contextlib.contextmanager
> def chdir(d):
>     cwd = os.getcwd()
>     try:
>         os.chdir(d)
>         yield
>     finally:
>         os.chdir(cwd)
>
> which could perhaps be placed in shutil, so usage would be:
>
> with shutil.chdir('new_dir'):
>     # work with new_dir as current dir
> # directory restored when you get here.
Pushd or pushdir would be a better name, IMHO. https://en.wikipedia.org/wiki/Pushd_and_popd Quite a well-known pair of names. Oleg. -- Oleg Broytman http://phdru.name/ phd at phdru.name Programmers don't die, they just GOSUB without RETURN. From ncoghlan at gmail.com Sat Jan 19 15:57:47 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 20 Jan 2013 00:57:47 +1000 Subject: [Python-ideas] chdir context manager In-Reply-To: <20130119101024.GB2969@lp-shahaf.local> References: <20130119101024.GB2969@lp-shahaf.local> Message-ID: -1 from me. I consider caring about the current directory to be an anti-pattern - paths should be converted to absolute ASAP, and for invocation of other tools that care about the current directory, that's why the subprocess APIs accept a "cwd" argument. I certainly don't want to encourage people to unnecessarily rely on global state by providing a standard library context manager that makes it easier to do so. Regards, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From d.s at daniel.shahaf.name Sat Jan 19 16:06:31 2013 From: d.s at daniel.shahaf.name (Daniel Shahaf) Date: Sat, 19 Jan 2013 17:06:31 +0200 Subject: [Python-ideas] chdir context manager In-Reply-To: References: <20130119101024.GB2969@lp-shahaf.local> Message-ID: <20130119150631.GF2969@lp-shahaf.local> Terry Reedy wrote on Sat, Jan 19, 2013 at 08:37:17 -0500: > On 1/19/2013 5:10 AM, Daniel Shahaf wrote: >> The following is a common pattern (used by, for example,
>> shutil.make_archive):
>>
>> save_cwd = os.getcwd()
>> try:
>>     foo()
>> finally:
>>     os.chdir(save_cwd)
>>
>> I suggest this deserves a context manager:
>>
>> with saved_cwd():
>>     foo()
> > So to me, your proposal is only 1/2 or 2/3 of a context manager. (And > 'returns an open file descriptor for the saved directory' seems backward > or wrong for a context manager.) It does not actually make a new What should __enter__ return, then? It could return None, the to-be-restored directory's file descriptor, or the newly-changed-to directory (once a "directory to chdir to" optional argument is added). The latter could be either a pathname (string) or a file descriptor (since it's just passed through to os.chdir).
It seems to me returning the old dir's fd would be the most useful of the three options, since the other two are things callers already have --- None, which is global, and the argument to the context manager. > context. A proper temp_cwd context manager should have one parameter, > the new working directory, with chdir(new_cwd) in the enter method. To > allow for conditional switching, the two chdir system calls could be > conditional on new_cwd (either None or '' would mean no chdir calls). > I think making the new_cwd argument optional would be useful if the context manager body does multiple chdir() calls:

with saved_cwd():
    os.chdir('/foo')
    do_something()
    os.chdir('/bar')
    do_something()

I'm not sure if that's exactly what you suggest --- you seem to be suggesting that saved_cwd(None) will avoid calling fchdir() from __exit__()? > Looking at your pattern, if foo() does not change cwd, the save and > restore is pointless, even if harmless. Do you have a better suggestion? Determining whether the fchdir() call can be avoided, if possible, presumably requires a system call, so I figure you might as well call fchdir() without trying to make that determination. > If foo does change cwd, it should also restore it, whether explicitly > or with a context manager temp_cwd. > > Looking at your actual example, shutil.make_archive, the change and > restore are conditional and asymmetrical. > shutil.make_archive is just the first place in stdlib which uses the pattern, or something close to it. It's not exactly a canonical example. There are some canonical examples of the "pattern" in test_os.py. > -- > Terry Jan Reedy Cheers Daniel From ironfroggy at gmail.com Sat Jan 19 16:27:56 2013 From: ironfroggy at gmail.com (Calvin Spealman) Date: Sat, 19 Jan 2013 10:27:56 -0500 Subject: [Python-ideas] chdir context manager In-Reply-To: References: <20130119101024.GB2969@lp-shahaf.local> Message-ID: -1 from me, as well. Encouraging a bad habit. On Sat, Jan 19, 2013 at 9:57 AM, Nick Coghlan wrote: > -1 from me > > I consider caring about the current directory to be an anti-pattern - > paths should be converted to absolute ASAP, and for invocation of > other tools that care about the current directory, that's why the > subprocess APIs accept a "cwd" argument. I certainly don't want to > encourage people to unnecessarily rely on global state by providing a > standard library context manager that makes it easier to do so. > > Regards, > Nick. > > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -- Read my blog! I depend on your acceptance of my opinion! I am interesting! http://techblog.ironfroggy.com/ Follow me if you're into that sort of thing: http://www.twitter.com/ironfroggy -------------- next part -------------- An HTML attachment was scrubbed...
URL: From christian at python.org Sat Jan 19 16:33:44 2013 From: christian at python.org (Christian Heimes) Date: Sat, 19 Jan 2013 16:33:44 +0100 Subject: [Python-ideas] chdir context manager In-Reply-To: <20130119101024.GB2969@lp-shahaf.local> References: <20130119101024.GB2969@lp-shahaf.local> Message-ID: <50FABCD8.9080709@python.org> Am 19.01.2013 11:10, schrieb Daniel Shahaf: > The following is a common pattern (used by, for example, > shutil.make_archive):
>
> save_cwd = os.getcwd()
> try:
>     foo()
> finally:
>     os.chdir(save_cwd)
>
> I suggest this deserves a context manager:
>
> with saved_cwd():
>     foo()
-1 from me, too. chdir() is not a safe operation because it affects the whole process. You can NOT make it work properly and safely in a multi-threaded environment or from code like signal handlers. The Open Group has acknowledged the issue and added a new set of functions to POSIX.1-2008 in order to address the issue. The *at() variants of functions like open() take an additional file descriptor as the first argument. The fd must refer to a directory and is used as the base for relative paths. Python 3.3 supports the new *at() feature. Christian From christian at python.org Sat Jan 19 16:40:32 2013 From: christian at python.org (Christian Heimes) Date: Sat, 19 Jan 2013 16:40:32 +0100 Subject: [Python-ideas] chdir context manager In-Reply-To: References: <20130119101024.GB2969@lp-shahaf.local> Message-ID: <50FABE70.7040902@python.org> Am 19.01.2013 16:27, schrieb Calvin Spealman: > -1 from me, as well. Encouraging a bad habit. It's not just a bad habit. It's broken by design because it's a major race condition. From eliben at gmail.com Sat Jan 19 16:53:12 2013 From: eliben at gmail.com (Eli Bendersky) Date: Sat, 19 Jan 2013 07:53:12 -0800 Subject: [Python-ideas] chdir context manager In-Reply-To: <20130119101024.GB2969@lp-shahaf.local> References: <20130119101024.GB2969@lp-shahaf.local> Message-ID: On Sat, Jan 19, 2013 at 2:10 AM, Daniel Shahaf wrote: > The following is a common pattern (used by, for example, > shutil.make_archive):
>
> save_cwd = os.getcwd()
> try:
>     foo()
> finally:
>     os.chdir(save_cwd)
>
> I suggest this deserves a context manager:
>
> with saved_cwd():
>     foo()
> > Initial feedback on IRC suggests shutil as where this functionality > should live (other suggestions were made, such as pathlib). Hence, > attached patch implements this as shutil.saved_cwd, based on os.fchdir. > > The patch also adds os.chdir to os.supports_dir_fd and documents the > context manager abilities of builtins.open() in its reference. > > Thoughts? >
URL: From guido at python.org Sat Jan 19 18:18:18 2013 From: guido at python.org (Guido van Rossum) Date: Sat, 19 Jan 2013 09:18:18 -0800 Subject: [Python-ideas] chdir context manager In-Reply-To: References: <20130119101024.GB2969@lp-shahaf.local> Message-ID: On Sat, Jan 19, 2013 at 6:57 AM, Nick Coghlan wrote: > -1 from me > > I consider caring about the current directory to be an anti-pattern - > paths should be converted to absolute ASAP, and for invocation of > other tools that care about the current directory, that's why the > subprocess APIs accept a "cwd" argument. I certainly don't want to > encourage people to unnecessarily rely on global state by providing a > standard library context manager that makes it easier to do so. Also it's not thread-safe. TBH I think if people are doing this today it's probably a good idea to suggest that they make their code more reliable by turning it into a context manager; but I think having that context manager in the stdlib is encouraging dubious practices. (The recommendation to use absolute filenames is a good one but not always easy to implement given a large codebase relying on the current directory.) -- --Guido van Rossum (python.org/~guido) From vinay_sajip at yahoo.co.uk Sat Jan 19 18:29:33 2013 From: vinay_sajip at yahoo.co.uk (Vinay Sajip) Date: Sat, 19 Jan 2013 17:29:33 +0000 (UTC) Subject: [Python-ideas] chdir context manager References: <20130119101024.GB2969@lp-shahaf.local> Message-ID: Nick Coghlan writes: > > -1 from me > > I consider caring about the current directory to be an anti-pattern I would agree, but in some places we unfortunately have to care about this, because of stdlib history - for example, distutils. Wherever you have to do "python setup.py ..." there is an implicit assumption that anything setup.py looks at will be relative to wherever the setup.py is - it's seldom invoked as "python /path/to/setup.py", and from what I've seen, very few projects do the right thing in their setup.py and code called from it in terms of getting an absolute path for the directory setup.py is in, and then using it in subsequent operations. I agree that we shouldn't encourage this kind of behaviour :-) Regards, Vinay Sajip From ben at bendarnell.com Sat Jan 19 18:32:55 2013 From: ben at bendarnell.com (Ben Darnell) Date: Sat, 19 Jan 2013 12:32:55 -0500 Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events In-Reply-To: <1D762043-961F-4F1E-A0A6-0C8C78AFA59B@twistedmatrix.com> References: <50F87DC8.1060000@canterbury.ac.nz> <50F8D695.3050002@canterbury.ac.nz> <50F9E1EA.4010305@canterbury.ac.nz> <1D762043-961F-4F1E-A0A6-0C8C78AFA59B@twistedmatrix.com> Message-ID: On Fri, Jan 18, 2013 at 8:23 PM, Glyph wrote: > > On Jan 18, 2013, at 4:12 PM, Guido van Rossum wrote: > > On Fri, Jan 18, 2013 at 3:59 PM, Greg Ewing > wrote: > > > Guido van Rossum wrote: > > Well, except that you can't just pass CallbackProtocol where a > protocol factory is required by the PEP -- you'll have to pass a > lambda or partial function without arguments that calls > CallbackProtocol with some arguments taken from elsewhere. > > > Something smells wrong to me about APIs that require protocol > factories. > > > For starters, nothing "smells wrong" to me about protocol factories. > Responding to this kind of criticism is difficult, because it's not > substantive - what's the actual problem? 
I think that some Python programmers have an aversion to factories because a common path to Python is flight from Java environments that over- or mis-use the factory pattern. > > I think the smell is that the factory is A) only used once and B) invoked without adding any additional arguments that weren't available when the factory was passed in, so there's no clear reason to defer creation of the protocol. I think it would make more sense if the transport were passed as an argument to the factory (and then we could get rid of connection_made as a required method on Protocol, although libraries or applications that want to separate protocol creation from connection_made could still do so in their own factories). -Ben -------------- next part -------------- An HTML attachment was scrubbed... URL: From tjreedy at udel.edu Sat Jan 19 18:40:50 2013 From: tjreedy at udel.edu (Terry Reedy) Date: Sat, 19 Jan 2013 12:40:50 -0500 Subject: [Python-ideas] chdir context manager In-Reply-To: References: <20130119101024.GB2969@lp-shahaf.local> Message-ID: On 1/19/2013 9:57 AM, Nick Coghlan wrote: > -1 from me > > I consider caring about the current directory to be an anti-pattern - > paths should be converted to absolute ASAP, and for invocation of > other tools that care about the current directory, that's why the > subprocess APIs accept a "cwd" argument. I certainly don't want to > encourage people to unnecessarily rely on global state by providing a > standard library context manager that makes it easier to do so. Are you suggesting then that stdlib functions, such as archive makers, should 1) not require any particular setting of cwd but should have parameters that allow all paths to be passed as absolute paths, and 2) not change cwd? If so, then shutil.make_archive should be able to pass absolute source and target paths to the archive makers, rather than having to set cwd before calling them. -- Terry Jan Reedy From guido at python.org Sat Jan 19 19:06:41 2013 From: guido at python.org (Guido van Rossum) Date: Sat, 19 Jan 2013 10:06:41 -0800 Subject: [Python-ideas] chdir context manager In-Reply-To: References: <20130119101024.GB2969@lp-shahaf.local> Message-ID: AFAICT shutil.make_archive() already has all the information it needs to be able to do its job without using chdir -- it's just being lazy. On Sat, Jan 19, 2013 at 9:40 AM, Terry Reedy wrote: > On 1/19/2013 9:57 AM, Nick Coghlan wrote: >> >> -1 from me >> >> I consider caring about the current directory to be an anti-pattern - >> paths should be converted to absolute ASAP, and for invocation of >> other tools that care about the current directory, that's why the >> subprocess APIs accept a "cwd" argument. I certainly don't want to >> encourage people to unnecessarily rely on global state by providing a >> standard library context manager that makes it easier to do so. > > > Are you suggesting then that stdlib functions, such as archive makers, > should 1) not require any particular setting of cwd but should have > parameters that allow all paths to be passed as absolute paths, and 2) not > change cwd? If so, then shutil.make_archive should be able to pass absolute > source and target paths to the archive makers, rather than having to set cwd > before calling them.
> > -- > Terry Jan Reedy > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas -- --Guido van Rossum (python.org/~guido) From tjreedy at udel.edu Sat Jan 19 19:07:09 2013 From: tjreedy at udel.edu (Terry Reedy) Date: Sat, 19 Jan 2013 13:07:09 -0500 Subject: [Python-ideas] chdir context manager In-Reply-To: <20130119150631.GF2969@lp-shahaf.local> References: <20130119101024.GB2969@lp-shahaf.local> <20130119150631.GF2969@lp-shahaf.local> Message-ID: On 1/19/2013 10:06 AM, Daniel Shahaf wrote: > Terry Reedy wrote on Sat, Jan 19, 2013 at 08:37:17 -0500: >> On 1/19/2013 5:10 AM, Daniel Shahaf wrote: >>> The following is a common pattern (used by, for example,
>>> shutil.make_archive):
>>>
>>> save_cwd = os.getcwd()
>>> try:
>>>     foo()
>>> finally:
>>>     os.chdir(save_cwd)
>>>
>>> I suggest this deserves a context manager:
>>>
>>> with saved_cwd():
>>>     foo()
>> >> So to me, your proposal is only 1/2 or 2/3 of a context manager. (And >> 'returns an open file descriptor for the saved directory' seems backward >> or wrong for a context manager.) It does not actually make a new > > What should __enter__ return, then? > > It could return None, the to-be-restored directory's file descriptor, or > the newly-changed-to directory (once a "directory to chdir to" optional > argument is added). The latter could be either a pathname (string) or > a file descriptor (since it's just passed through to os.chdir). > > It seems to me returning the old dir's fd would be the most useful of > the three options, since the other two are things callers already have > --- None, which is global, and the argument to the context manager. make_archive would prefer the old dir pathname, as it wants that for the logging call. But I do not think that that should drive design. >> context. A proper temp_cwd context manager should have one parameter, >> the new working directory, with chdir(new_cwd) in the enter method. To >> allow for conditional switching, the two chdir system calls could be >> conditional on new_cwd (either None or '' would mean no chdir calls). >> > > I think making the new_cwd argument optional would be useful if the > context manager body does multiple chdir() calls:
>
> with saved_cwd():
>     os.chdir('/foo')
>     do_something()
>     os.chdir('/bar')
>     do_something()
>
> I'm not sure if that's exactly what you suggest --- you seem to be > suggesting that saved_cwd(None) will avoid calling fchdir() from > __exit__()? I was, but that is a non-essential optimization. My idea is basically similar to Bueno's except for parameter absent versus None (and the two cases could be handled differently). I think this proposal suffers a bit from being both too specific and too general. Eli explained the 'too specific' part: there are many things that might be changed and changed back. The 'too general' part is that specific applications need different specific details. There are various possibilities of what to do in and return from __enter__. However, given the strong -1 from at least three core developers and one other person, the details seem moot.
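For the record, the temp_cwd manager I have in mind would be roughly this sketch (untested, and the name is just a placeholder):

    import contextlib, os

    @contextlib.contextmanager
    def temp_cwd(new_cwd=None):
        # None or '' means "make no chdir calls at all".
        if not new_cwd:
            yield
            return
        save_cwd = os.getcwd()
        os.chdir(new_cwd)
        try:
            yield
        finally:
            os.chdir(save_cwd)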
-- Terry Jan Reedy From python at mrabarnett.plus.com Sat Jan 19 19:32:21 2013 From: python at mrabarnett.plus.com (MRAB) Date: Sat, 19 Jan 2013 18:32:21 +0000 Subject: [Python-ideas] chdir context manager In-Reply-To: References: <20130119101024.GB2969@lp-shahaf.local> <20130119150631.GF2969@lp-shahaf.local> Message-ID: <50FAE6B5.3030507@mrabarnett.plus.com> On 2013-01-19 18:07, Terry Reedy wrote: > On 1/19/2013 10:06 AM, Daniel Shahaf wrote: >> Terry Reedy wrote on Sat, Jan 19, 2013 at 08:37:17 -0500: >>> On 1/19/2013 5:10 AM, Daniel Shahaf wrote: >>>> The following is a common pattern (used by, for example,
>>>> shutil.make_archive):
>>>>
>>>> save_cwd = os.getcwd()
>>>> try:
>>>>     foo()
>>>> finally:
>>>>     os.chdir(save_cwd)
>>>>
>>>> I suggest this deserves a context manager:
>>>>
>>>> with saved_cwd():
>>>>     foo()
>>> >>> So to me, your proposal is only 1/2 or 2/3 of a context manager. (And >>> 'returns an open file descriptor for the saved directory' seems backward >>> or wrong for a context manager.) It does not actually make a new >> >> What should __enter__ return, then? >> >> It could return None, the to-be-restored directory's file descriptor, or >> the newly-changed-to directory (once a "directory to chdir to" optional >> argument is added). The latter could be either a pathname (string) or >> a file descriptor (since it's just passed through to os.chdir). >> >> It seems to me returning the old dir's fd would be the most useful of >> the three options, since the other two are things callers already have >> --- None, which is global, and the argument to the context manager. > > make_archive would prefer the old dir pathname, as it wants that for the > logging call. But I do not think that that should drive design. > >>> context. A proper temp_cwd context manager should have one parameter, >>> the new working directory, with chdir(new_cwd) in the enter method. To >>> allow for conditional switching, the two chdir system calls could be >>> conditional on new_cwd (either None or '' would mean no chdir calls). >>> >> >> I think making the new_cwd argument optional would be useful if the >> context manager body does multiple chdir() calls:
>>
>> with saved_cwd():
>>     os.chdir('/foo')
>>     do_something()
>>     os.chdir('/bar')
>>     do_something()
>>
>> I'm not sure if that's exactly what you suggest --- you seem to be >> suggesting that saved_cwd(None) will avoid calling fchdir() from >> __exit__()? > > I was, but that is a non-essential optimization. My idea is basically > similar to Bueno's except for parameter absent versus None (and the two > cases could be handled differently). > > I think this proposal suffers a bit from being both too specific and too > general. Eli explained the 'too specific' part: there are many things > that might be changed and changed back. The 'too general' part is that > specific applications need different specific details. There are various > possibilities of what to do in and return from __enter__. > > However, given the strong -1 from at least three core developers and > one other person, the details seem moot. > FWIW, -1 from me too because, as has been said already, you shouldn't really be using os.chdir; use absolute paths instead.
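For example (a sketch; 'workdir' stands for whatever directory the caller was given):

    import os

    base = os.path.abspath(workdir)
    target = os.path.join(base, 'output.zip')
    # 'target' is now usable from any cwd; no chdir required.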
From chris.jerdonek at gmail.com Sat Jan 19 19:52:54 2013 From: chris.jerdonek at gmail.com (Chris Jerdonek) Date: Sat, 19 Jan 2013 10:52:54 -0800 Subject: [Python-ideas] chdir context manager In-Reply-To: <20130119101024.GB2969@lp-shahaf.local> References: <20130119101024.GB2969@lp-shahaf.local> Message-ID: On Sat, Jan 19, 2013 at 2:10 AM, Daniel Shahaf wrote: > The following is a common pattern (used by, for example, > shutil.make_archive): > > save_cwd = os.getcwd() > try: > foo() > finally: > os.chdir(save_cwd) FWIW, test.support has such a context manager (though test.support is not for public consumption and test.support's implementation does more than one thing, though see issue 15415): http://hg.python.org/cpython/file/48cddcb9c841/Lib/test/support.py#l738 --Chris > > I suggest this deserves a context manager: > > with saved_cwd(): > foo() > > Initial feedback on IRC suggests shutil as where this functionality > should live (other suggestions were made, such as pathlib). Hence, > attached patch implements this as shutil.saved_cwd, based on os.fchdir. > > The patch also adds os.chdir to os.supports_dir_fd and documents the > context manager abilities of builtins.open() in its reference. > > Thoughts? > > Thanks, > > Daniel > > > diff -r 74b0461346f0 Doc/library/functions.rst > --- a/Doc/library/functions.rst Fri Jan 18 17:53:18 2013 -0800 > +++ b/Doc/library/functions.rst Sat Jan 19 09:39:27 2013 +0000 > @@ -828,6 +828,9 @@ are always available. They are listed h > Open *file* and return a corresponding :term:`file object`. If the file > cannot be opened, an :exc:`OSError` is raised. > > + This function can be used as a :term:`context manager` that closes the > + file when it exits. > + > *file* is either a string or bytes object giving the pathname (absolute or > relative to the current working directory) of the file to be opened or > an integer file descriptor of the file to be wrapped. (If a file descriptor > diff -r 74b0461346f0 Doc/library/os.rst > --- a/Doc/library/os.rst Fri Jan 18 17:53:18 2013 -0800 > +++ b/Doc/library/os.rst Sat Jan 19 09:39:27 2013 +0000 > @@ -1315,6 +1315,9 @@ features: > This function can support :ref:`specifying a file descriptor `. The > descriptor must refer to an opened directory, not an open file. > > + See also :func:`shutil.saved_cwd` for a context manager that restores the > + current working directory. > + > Availability: Unix, Windows. > > .. versionadded:: 3.3 > diff -r 74b0461346f0 Doc/library/shutil.rst > --- a/Doc/library/shutil.rst Fri Jan 18 17:53:18 2013 -0800 > +++ b/Doc/library/shutil.rst Sat Jan 19 09:39:27 2013 +0000 > @@ -36,6 +36,19 @@ copying and removal. For operations on i > Directory and files operations > ------------------------------ > > +.. function:: saved_cwd() > + > + Return a :term:`context manager` that restores the current working directory > + when it exits. See :func:`os.chdir` for changing the current working > + directory. > + > + The context manager returns an open file descriptor for the saved directory. > + > + Only available when :func:`os.chdir` supports file descriptor arguments. > + > + .. versionadded:: 3.4 > + > + > .. function:: copyfileobj(fsrc, fdst[, length]) > > Copy the contents of the file-like object *fsrc* to the file-like object *fdst*. 
> diff -r 74b0461346f0 Lib/os.py > --- a/Lib/os.py Fri Jan 18 17:53:18 2013 -0800 > +++ b/Lib/os.py Sat Jan 19 09:39:27 2013 +0000 > @@ -120,6 +120,7 @@ if _exists("_have_functions"): > > _set = set() > _add("HAVE_FACCESSAT", "access") > + _add("HAVE_FCHDIR", "chdir") > _add("HAVE_FCHMODAT", "chmod") > _add("HAVE_FCHOWNAT", "chown") > _add("HAVE_FSTATAT", "stat") > diff -r 74b0461346f0 Lib/shutil.py > --- a/Lib/shutil.py Fri Jan 18 17:53:18 2013 -0800 > +++ b/Lib/shutil.py Sat Jan 19 09:39:27 2013 +0000 > @@ -38,6 +38,7 @@ __all__ = ["copyfileobj", "copyfile", "c > "unregister_unpack_format", "unpack_archive", > "ignore_patterns", "chown", "which"] > # disk_usage is added later, if available on the platform > + # saved_cwd is added later, if available on the platform > > class Error(OSError): > pass > @@ -1111,3 +1112,20 @@ def which(cmd, mode=os.F_OK | os.X_OK, p > if _access_check(name, mode): > return name > return None > + > +# Define the chdir context manager. > +if os.chdir in os.supports_dir_fd: > + class saved_cwd: > + def __init__(self): > + pass > + def __enter__(self): > + self.dh = os.open(os.curdir, > + os.O_RDONLY | getattr(os, 'O_DIRECTORY', 0)) > + return self.dh > + def __exit__(self, exc_type, exc_value, traceback): > + try: > + os.chdir(self.dh) > + finally: > + os.close(self.dh) > + return False > + __all__.append('saved_cwd') > diff -r 74b0461346f0 Lib/test/test_shutil.py > --- a/Lib/test/test_shutil.py Fri Jan 18 17:53:18 2013 -0800 > +++ b/Lib/test/test_shutil.py Sat Jan 19 09:39:27 2013 +0000 > @@ -1276,6 +1276,20 @@ class TestShutil(unittest.TestCase): > rv = shutil.copytree(src_dir, dst_dir) > self.assertEqual(['foo'], os.listdir(rv)) > > + def test_saved_cwd(self): > + if hasattr(os, 'fchdir'): > + temp_dir = self.mkdtemp() > + orig_dir = os.getcwd() > + with shutil.saved_cwd() as dir_fd: > + os.chdir(temp_dir) > + new_dir = os.getcwd() > + self.assertIsInstance(dir_fd, int) > + final_dir = os.getcwd() > + self.assertEqual(orig_dir, final_dir) > + self.assertEqual(temp_dir, new_dir) > + else: > + self.assertFalse(hasattr(shutil, 'saved_cwd')) > + > > class TestWhich(unittest.TestCase): > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas From glyph at twistedmatrix.com Sat Jan 19 23:53:43 2013 From: glyph at twistedmatrix.com (Glyph) Date: Sat, 19 Jan 2013 14:53:43 -0800 Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events In-Reply-To: References: <50F87DC8.1060000@canterbury.ac.nz> <50F8D695.3050002@canterbury.ac.nz> <50F9E1EA.4010305@canterbury.ac.nz> <1D762043-961F-4F1E-A0A6-0C8C78AFA59B@twistedmatrix.com> Message-ID: On Jan 19, 2013, at 9:32 AM, Ben Darnell wrote: > On Fri, Jan 18, 2013 at 8:23 PM, Glyph wrote: > > On Jan 18, 2013, at 4:12 PM, Guido van Rossum wrote: > >> On Fri, Jan 18, 2013 at 3:59 PM, Greg Ewing wrote: > >>> Guido van Rossum wrote: >>> >>>> Well, except that you can't just pass CallbackProtocol where a >>>> protocol factory is required by the PEP -- you'll have to pass a >>>> lambda or partial function without arguments that calls >>>> CallbackProtocol with some arguments taken from elsewhere. >>> >>> Something smells wrong to me about APIs that require protocol >>> factories. > > For starters, nothing "smells wrong" to me about protocol factories. Responding to this kind of criticism is difficult, because it's not substantive - what's the actual problem? 
I think that some Python programmers have an aversion to factories because a common path to Python is flight from Java environments that over- or mis-use the factory pattern. > > > I think the smell is that the factory is A) only used once and B) invoked without adding any additional arguments that weren't available when the factory was passed in, so there's no clear reason to defer creation of the protocol. I think it would make more sense if the transport were passed as an argument to the factory (and then we could get rid of connection_made as a required method on Protocol, although libraries or applications that want to separate protocol creation from connection_made could still do so in their own factories). The problem with creating the protocol with the transport as an argument to its constructor is that in order to behave correctly, the transport needs to know about the protocol as well; so it also wants to be constructed with a reference to the protocol to *its* constructor. So adding a no-protocol-yet case adds more edge-cases to every transport's implementation. All these solutions are roughly isomorphic to each other, so I don't care deeply about it. However, my proposed architecture has been in use for a decade in Twisted without any major problems I can see. I'm not saying that Twisted programs are perfect, but it would *really* be useful to discuss this in terms of problems you can identify with the humungous existing corpus of Twisted-using code, and say "here's a problem that develops in some programs due to the sub-optimal shape of this API". Unnecessary class definitions, for example, or a particular type of bug; something like that. For example, I can identify several difficulties with Twisted's current flow-control setup code and would not recommend that it be copied exactly. Talking about how the code smells or what might hypothetically make more sense is just bikeshedding. -glyph -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Sun Jan 20 02:51:14 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 20 Jan 2013 11:51:14 +1000 Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events In-Reply-To: References: <50F87DC8.1060000@canterbury.ac.nz> <50F8D695.3050002@canterbury.ac.nz> <50F9E1EA.4010305@canterbury.ac.nz> <1D762043-961F-4F1E-A0A6-0C8C78AFA59B@twistedmatrix.com> Message-ID: On Sun, Jan 20, 2013 at 8:53 AM, Glyph wrote: > >> On Jan 19, 2013, at 9:32 AM, Ben Darnell wrote: >> >> On Fri, Jan 18, 2013 at 8:23 PM, Glyph wrote: >>> For starters, nothing "smells wrong" to me about protocol factories. >>> Responding to this kind of criticism is difficult, because it's not >>> substantive - what's the actual problem? I think that some Python >>> programmers have an aversion to factories because a common path to Python is >>> flight from Java environments that over- or mis-use the factory pattern. >>> >> >> I think the smell is that the factory is A) only used once and B) invoked >> without adding any additional arguments that weren't available when the >> factory was passed in, so there's no clear reason to defer creation of the >> protocol. I think it would make more sense if the transport were passed as >> an argument to the factory (and then we could get rid of connection_made as >> a required method on Protocol, although libraries or applications that want >> to separate protocol creation from connection_made could still do so in >> their own factories). 
> The problem with creating the protocol with the transport as an argument to > its constructor is that in order to behave correctly, the transport needs to > know about the protocol as well; so it also wants to be constructed with a > reference to the protocol to *its* constructor. So adding a no-protocol-yet > case adds more edge-cases to every transport's implementation. But the trade-off in separating protocol creation from notification of the connection is that it means every *protocol* has to be written to handle the "no connection yet" gap between __init__ and the call to connection_made. However, if we instead delay the call to the protocol factory until *after the connection is made*, then most protocols can be written assuming they always have a connection (at least until connection_lost is called). A persistent protocol that spanned multiple connect/reconnect cycles could be written such that you passed "my_protocol.connection_made" as the protocol factory, while normal protocols (that last only the length of a single connection) would pass "MyProtocol" directly. At the transport layer, the two states "has a protocol" and "has a connection" could then be collapsed into one - if there is a connection, then there will be a protocol, and vice-versa. This differs from the current status in PEP 3156, where it's possible for a transport to have a protocol without a connection if it calls the protocol factory well before calling connection_made. Now, it may be that *there's a good reason* why conflating "has a protocol" and "has a connection" at the transport layer is a bad idea, and thus we actually *need* the "protocol creation" and "protocol association with a connection" events to be distinct. However, the PEP currently doesn't explain *why* it's necessary to separate the two, hence the confusion for at least Greg, Ben and myself. Given that new protocol implementations should be significantly more common than new transport implementations, there's a strong case to be made for pushing any required complexity into the transports. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From wuwei23 at gmail.com Sun Jan 20 03:02:32 2013 From: wuwei23 at gmail.com (alex23) Date: Sat, 19 Jan 2013 18:02:32 -0800 (PST) Subject: [Python-ideas] Parametrized any() and all() ? In-Reply-To: References: <50F6813E.60503@ziade.org> <50F6847D.2020404@ziade.org> <50F6B4D8.6070002@pearwood.info> <50F6BEA3.7090807@ziade.org> <20130116194756.2efe9afe@pitrou.net> <50F94057.9080005@ziade.org> Message-ID: <2e59f105-83fb-46b0-8e6f-e854a71ab08f@th3g2000pbc.googlegroups.com> On Jan 19, 4:36 am, Terry Reedy wrote: > On 1/18/2013 10:54 AM, Eric Snow wrote: > > It took me a sec. :) DSU == "Decorate-Sort-Undecorate". [1] > > No, no, no. It's Delaware State University in Dover, as opposed to > University of Delaware (UD) in Newark ;-). > > In other words, it depends on the universe you live in. "Namespaces are one honking great idea" :) From ncoghlan at gmail.com Sun Jan 20 03:34:24 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 20 Jan 2013 12:34:24 +1000 Subject: [Python-ideas] PEP 3156: Clarifying the different components of the event loop API Message-ID: PEP 3156 currently lists *29* proposed methods for the event loop API. These methods serve quite different purposes and I think a bit more structure in the overall API could help clarify that. First proposal: clearly split the abstract EventLoop API from concrete DescriptorEventLoop and IOCPEventLoop subclasses.
The main benefit here is to help clarify that:

1. the additional methods defined on DescriptorEventLoop and IOCPEventLoop are not available on all event loop implementations, so any code using them is necessarily event loop specific
2. the goal of the transport abstraction is to mask the differences between these low level platform specific APIs
3. other event loops are free to use a completely different API between their low level transports and the event loop

Second proposal: better separate the "event loop management", "event monitoring" and "do things" methods.

I don't have a clear idea of how to do this yet (beyond restructuring the documentation of the event loop API in the PEP), but I can at least describe the split I see (along with a few name changes that may be worth considering).

Event loop management:
- run_once()
- run() # Perhaps "run_until_idle()"?
- run_forever() # Perhaps "run_until_stop()"?
- run_until_complete()
- stop()
- close()
- set_default_executor()

Event monitoring:
- add_signal_handler()
- remove_signal_handler()
- start_serving() # (The "stop serving" API is TBD in the PEP)

Do things (fire and forget):
- call_soon()
- call_soon_threadsafe()
- call_later()
- call_repeatedly()

Do things (and get the result with "yield from"):
- wrap_future() # Perhaps "wrap_executor_future"?
- run_in_executor()
- getaddrinfo()
- getnameinfo()

Low level transport creation:
- create_connection()
- create_pipe() # Once it exists in the PEP

Cheers, Nick. P.S. Off-topic for the thread, but I think the existence of run_once vs run (or run_until_idle) validates the decision to stick with only running one generation of ready callbacks per iteration. I forgot about it when we were discussing that question. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From d.s at daniel.shahaf.name Sun Jan 20 05:23:15 2013 From: d.s at daniel.shahaf.name (Daniel Shahaf) Date: Sun, 20 Jan 2013 06:23:15 +0200 Subject: [Python-ideas] chdir context manager In-Reply-To: <50FABE70.7040902@python.org> References: <20130119101024.GB2969@lp-shahaf.local> <50FABE70.7040902@python.org> Message-ID: <20130120042315.GB2950@lp-shahaf.local> Christian Heimes wrote on Sat, Jan 19, 2013 at 16:40:32 +0100: > Am 19.01.2013 16:27, schrieb Calvin Spealman: > > -1 from me, as well. Encouraging a bad habit. > > It's not just a bad habit. It's broken by design because it's a major race > condition. In other words, single-threaded processes will need to implement their own chdir context manager because using it in multi-threaded applications would be a bug. I note the same reasoning applies to a hypothetical context manager that changes the nice(2) level of the current process. This reasoning reduces to "make all of stdlib thread-safe" --- at the expense of people who write single-threaded code, know full well that chdir and nice are global state and should normally not be used, and want to use them anyway. I know enough programming to never call renice or chdir in library code. But when I write some __main__ code, I might want to use them. I should be able to.
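To illustrate, that hypothetical nice(2) manager is only a few lines (a sketch for single-threaded use; note that lowering the nice level back down normally requires privileges, so the restore step can legitimately fail):

    import contextlib, os

    @contextlib.contextmanager
    def reniced(increment):
        os.nice(increment)
        try:
            yield
        finally:
            try:
                os.nice(-increment)
            except OSError:
                pass  # unprivileged processes cannot lower their nice level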
From d.s at daniel.shahaf.name Sun Jan 20 05:25:55 2013
From: d.s at daniel.shahaf.name (Daniel Shahaf)
Date: Sun, 20 Jan 2013 06:25:55 +0200
Subject: [Python-ideas] chdir context manager
In-Reply-To: <50FABE70.7040902@python.org>
References: <20130119101024.GB2969@lp-shahaf.local> <50FABE70.7040902@python.org>
Message-ID: <20130120042555.GC2950@lp-shahaf.local>

Christian Heimes wrote on Sat, Jan 19, 2013 at 16:40:32 +0100: > On 19.01.2013 16:27, Calvin Spealman wrote: > > -1 from me, as well. Encouraging a bad habit. > > It's not just bad habit. It's broken by design because it's a major race > condition.

A couple of other clarifications: Christian clarified on IRC that this refers to the use of chdir() (which modifies process-global state) in multithreaded applications. The code uses fchdir so it's not vulnerable to race conditions whereby another process removes the original cwd before it chdir's back to it. (In that respect it's better than the common "try: getcwd() finally: chdir()" pattern.)

Someone suggested adding a mutex to the saved_cwd context manager. That would solve the race condition, but I don't have a use-case for it --- precisely because I can't imagine multithreaded code where threads depend on their cwd.

From guido at python.org Sun Jan 20 05:35:04 2013
From: guido at python.org (Guido van Rossum)
Date: Sat, 19 Jan 2013 20:35:04 -0800
Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events
In-Reply-To: References: <50F87DC8.1060000@canterbury.ac.nz> <50F8D695.3050002@canterbury.ac.nz> <50F9E1EA.4010305@canterbury.ac.nz> <1D762043-961F-4F1E-A0A6-0C8C78AFA59B@twistedmatrix.com>
Message-ID:

On Sat, Jan 19, 2013 at 5:51 PM, Nick Coghlan wrote: > But the trade-off in separating protocol creation from notification of > the connection is that it means every *protocol* has to be written to > handle the "no connection yet" gap between __init__ and the call to > connection_made. That doesn't strike me as a problematic design. I've seen it plenty of times. > However, if we instead delay the call to the protocol factory until > *after the connection is made*, then most protocols can be written > assuming they always have a connection (at least until connection_lost > is called). A persistent protocol that spanned multiple > connect/reconnect cycles could be written such that you passed > "my_protocol.connection_made" as the protocol factory, while normal > protocols (that last only the length of a single connection) would > pass "MyProtocol" directly. Well, almost. connection_made() would have to return self to make this work. But we could certainly add some other method that did that. (At first I thought it would be harder to pass other parameters to the constructor for the non-reconnecting case, but the solution is about the same as before -- use a partial function or a lambda that takes a transport and calls the constructor with that and whatever other parameters it wants to pass.) > At the transport layer, the two states "has a protocol" and "has a > connection" could then be collapsed into one - if there is a > connection, then there will be a protocol, and vice-versa. This > differs from the current status in PEP 3156, where it's possible for a > transport to have a protocol without a connection if it calls the > protocol factory well before calling connection_made. This doesn't strike me as important. The code I've written for Tulip puts most of the connection-making code outside the transport, and the transport constructor is completely private.
Every transport implementation is completely free in how it works, and every event loop implementation is free to put as much or as little of the connection set-up in the transport as it wants to. The same is true for transports written by users (and there will be some of these). The *only* thing we care about for transports is that the thing passed to the protocol's connection_made() has the methods specified by the PEP (write(), writelines(), pause(), resume(), and a few more). Also, it does not matter one iota whether it is the transport or some other entity that calls the protocol's methods (connection_made(), data_received(), etc.) -- the only thing that matters is the order in which they are called.

IOW, even though a transport may "have" a protocol without a connection, nobody should care about that state, and nobody should be calling its methods (again, write() etc.) in that state. In fact, nobody except event loop internal code should ever have a reference to a transport in that state. (The transport that is returned by create_connection() is fully connected to the socket (or whatever might take its place) as well as to the protocol.) I think we can make the same assumptions for transports implemented by user code.

> Now, it may be that *there's a good reason* why conflating "has a > protocol" and "has a connection" at the transport layer is a bad idea, > and thus we actually *need* the "protocol creation" and "protocol > association with a connection" events to be distinct. However, the PEP > currently doesn't explain *why* it's necessary to separate the two, > hence the confusion for at least Greg, Ben and myself.

So, your whole point here seems to be that you'd rather see the PEP specify that the sequence when a connection is made is protocol = protocol_factory(transport) rather than protocol = protocol_factory() protocol.connection_made(transport)

I looked in the Tulip code to see whether this would cause any problems. I think it could be done, but the solution would feel a little awkward to me, because currently the protocol's connection_made() method is not called directly by the transport: it is called indirectly via the event loop's call_soon() method. So using your approach the transport wouldn't have a protocol attribute until this callback is called -- or we'd have to change things to call it directly rather than via call_soon(). Now I'm pretty sure I can prove that nothing will be referencing the protocol *before* the connection_made() call is actually made, and also that directly calling it instead of using call_soon() is fine. But nevertheless the transport code would feel a little harder to reason about.

> Given that new protocol implementations should be significantly more > common than new transport implementations, there's a strong case to be > made for pushing any required complexity into the transports.

TBH I don't see the protocol implementation getting any simpler because of this. There is some protocol initialization code that doesn't depend on the transport, and some that does. Using your approach, these all go in __init__(). Using the PEP's current proposal, the latter go in a separate method, connection_made(). But using your approach, writing the lambda or partial function that calls the constructor with the right arguments (to be passed as protocol_factory) becomes a tad more complex, since now it must take a transport argument.
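[To make the two sequences concrete, a toy sketch; MyProtocol and FakeTransport are invented stand-ins here, not tulip code:

    import functools

    class MyProtocol:
        def __init__(self, greeting):
            self.greeting = greeting
            self.transport = None  # no connection yet

        def connection_made(self, transport):
            self.transport = transport
            transport.write(self.greeting)

    class FakeTransport:
        def write(self, data):
            print("wrote:", data)

    transport = FakeTransport()

    # Sequence currently in the PEP: extra constructor arguments are
    # bound with a zero-argument partial; the event loop associates
    # the transport afterwards.
    factory = functools.partial(MyProtocol, b"hello\r\n")
    protocol = factory()
    protocol.connection_made(transport)

    # The alternative: the factory itself receives the transport, so
    # the equivalent wrapper grows a transport parameter.
    def alt_factory(transport):
        proto = MyProtocol(b"hello\r\n")
        proto.connection_made(transport)  # folded into construction
        return proto

    protocol = alt_factory(transport)
]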
On the third hand, rigging things so that a pre-existing protocol instance can be reused becomes a little harder to figure out, since you have to write a helper method that takes a transport and returns the protocol (i.e., self). All in all I see it as six of one, half a dozen of the other, and I am happy with Glyph's testimony that the Twisted design works well in practice. -- --Guido van Rossum (python.org/~guido) From d.s at daniel.shahaf.name Sun Jan 20 05:43:08 2013 From: d.s at daniel.shahaf.name (Daniel Shahaf) Date: Sun, 20 Jan 2013 06:43:08 +0200 Subject: [Python-ideas] chdir context manager In-Reply-To: <50FAE6B5.3030507@mrabarnett.plus.com> References: <20130119101024.GB2969@lp-shahaf.local> <20130119150631.GF2969@lp-shahaf.local> <50FAE6B5.3030507@mrabarnett.plus.com> Message-ID: <20130120044308.GD2950@lp-shahaf.local> MRAB wrote on Sat, Jan 19, 2013 at 18:32:21 +0000: > On 2013-01-19 18:07, Terry Reedy wrote: >> On 1/19/2013 10:06 AM, Daniel Shahaf wrote: >>> Terry Reedy wrote on Sat, Jan 19, 2013 at 08:37:17 -0500: >>>> On 1/19/2013 5:10 AM, Daniel Shahaf wrote: >>>>> The following is a common pattern (used by, for example, >>>>> shutil.make_archive): >>>>> >>>>> save_cwd = os.getcwd() >>>>> try: >>>>> foo() >>>>> finally: >>>>> os.chdir(save_cwd) >>>>> >>>>> I suggest this deserves a context manager: >>>>> >>>>> with saved_cwd(): >>>>> foo() >>>> >>>> So to me, your proposal is only 1/2 or 2/3 of a context manager. (And >>>> 'returns an open file descriptor for the saved directory' seems backward >>>> or wrong for a context manager.) It does not actually make a new >>> >>> What should __enter__ return, then? >>> >>> It could return None, the to-be-restored directory's file descriptor, or >>> the newly-changed-to directory (once a "directory to chdir to" optional >>> argument is added). The latter could be either a pathname (string) or >>> a file descriptor (since it's just passed through to os.chdir). >>> >>> It seems to me returning the old dir's fd would be the most useful of >>> the three option, since the other two are things callers already have >>> --- None, which is global, and the argument to the context manager. >> >> make_archive would prefer the old dir pathname, as it wants that for the >> logging call. But I do not think that that should drive design. >> >>>> context. A proper temp_cwd context manager should have one parameter, >>>> the new working directory, with chdir(new_cwd) in the enter method. To >>>> allow for conditional switching, the two chdir system calls could be >>>> conditional on new_cwd (either None or '' would mean no chdir calls). >>>> >>> >>> I think making the new_cwd argument optional would be useful if the >>> context manager body does multiple chdir() calls: >>> >>> with saved_cwd(): >>> os.chdir('/foo') >>> do_something() >>> os.chdir('/bar') >>> do_something() >>> >>> I'm not sure if that's exactly what you suggest --- you seem to be >>> suggesting that saved_cwd(None) will avoid calling fchdir() from >>> __exit__()? >> >> I was, but that is a non-essential optimization. My idea is basically >> similar to Bueno's except for parameter absent versus None (and the two >> cases could be handled differently). >> >> I think this proposal suffers a bit from being both too specific and too >> general. Eli explained the 'too specific' part: there are many things >> that might be changed and changed back. The 'too general' part is that >> specific applications need different specific details. 
There are various >> possibilities of what to do in and return from __enter__. >> OK. >> However, given the strong -1 from at least three core developers and >> one other person, the detail seem moot. >> *nod*, I see. > FWIW, -1 from me too because, as has been said already, you shouldn't > really be using os.chdir; use absolute paths instead. I don't use chdir in library code or multithreaded code, but I do use it in __main__ of short scripts, where there's no "caller" or "other thread" to consider. Consider sys.argv. The language and stdlib don't prevent library code from accessing (or modifying) sys.argv, but well-behaved libraries neither read sys.argv nor modify it. The same is true of the cwd. From guido at python.org Sun Jan 20 05:37:34 2013 From: guido at python.org (Guido van Rossum) Date: Sat, 19 Jan 2013 20:37:34 -0800 Subject: [Python-ideas] PEP 3156: Clarifying the different components of the event loop API In-Reply-To: References: Message-ID: (I'm out of time to respond at length, but I think you have a good point here and I expect I will heed it. It may be a while before I have time for another sprint with the PEP and Tulip though.) On Sat, Jan 19, 2013 at 6:34 PM, Nick Coghlan wrote: > PEP 3156 currently lists *29* proposed methods for the event loop API. > These methods serve quite different purposes and I think a bit more > structure in the overall API could help clarify that. > > First proposal: clearly split the abstract EventLoop API from concrete > DescriptorEventLoop and IOCPEventLoop subclasses. > > The main benefit here is to help clarify that: > 1. the additional methods defined on DescriptorEventLoop and > IOCPEventLoop are not available on all event loop implementations, so > any code using them is necessarily event loop specific > 2. the goal of the transport abstraction is to mask the differences > between these low level platform specific APIs > 3. other event loops are free to use a completely different API > between their low level transports and the event loop > > Second proposal: better separate the "event loop management", "event > monitoring" and "do things" methods > > I don't have a clear idea of how to do this yet (beyond restructuring > the documentation of the event loop API in the PEP), but I can at > least describe the split I see (along with a few name changes that may > be worth considering). > > Event loop management: > - run_once() > - run() # Perhaps "run_until_idle()"? > - run_forever() # Perhaps "run_until_stop()"? > - run_until_complete() > - stop() > - close() > - set_default_executor() > > Event monitoring: > - add_signal_handler() > - remove_signal_handler() > - start_serving() # (The "stop serving" API is TBD in the PEP) > > Do things (fire and forget): > - call_soon() > - call_soon_threadsafe() > - call_later() > - call_repeatedly() > > Do things (and get the result with "yield from"): > - wrap_future() # Perhaps "wrap_executor_future"? > - run_in_executor() > - getaddrinfo() > - getnameinfo() > > Low level transport creation: > - create_connection() > - create_pipe() # Once it exists in the PEP > > Cheers, > Nick. > > P.S. Off-topic for the thread, but I think the existence of run_once > vs run (or run_until_idle) validates the decision to stick with only > running one generation of ready callbacks per iteration. I forgot > about it when we were discussing that question. 
> > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas -- --Guido van Rossum (python.org/~guido) From ncoghlan at gmail.com Sun Jan 20 07:13:39 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 20 Jan 2013 16:13:39 +1000 Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events In-Reply-To: References: <50F87DC8.1060000@canterbury.ac.nz> <50F8D695.3050002@canterbury.ac.nz> <50F9E1EA.4010305@canterbury.ac.nz> <1D762043-961F-4F1E-A0A6-0C8C78AFA59B@twistedmatrix.com> Message-ID: On Sun, Jan 20, 2013 at 2:35 PM, Guido van Rossum wrote: > TBH I don't see the protocol implementation getting any simpler > because of this. There is some protocol initialization code that > doesn't depend on the transport, and some that does. Using your > approach, these all go in __init__(). Using the PEP's current > proposal, the latter go in a separate method, connection_made(). When the two are separated without a clear definition of what else can happen in between, *every other method on the protocol* needs to cope with the fact that other calls to protocol methods may happen in between the call to __init__ and the call to connection_made - you simply can't write a protocol without dealing with that problem. As you correctly figured out, my specific proposal was to move from: protocol = protocol_factory() protocol.connection_made(transport) To a single event: protocol = protocol_factory(transport) The *reason* I wanted to do this is that I *don't understand* what may happen to my protocol implementation between construction and the call to make_connection. Your description of the current implementation actually worries me, as it suggests to me that when I get a (transport, protocol) pair back from a call to "create_connection", "connection_made" may *not* have been called yet - the protocol may be in exactly the state I am worried about, because the event loop is sending the notification in a fire-and-forget fashion, instead of waiting until the call is complete: protocol = protocol_factory() loop.call_soon(protocol.connection_made, transport) # The protocol isn't actually fully initialized here... However, that description also made me realise why two distinct operations are needed, so I'd like to change my suggestion to the following: protocol = factory() yield from protocol.connection_made(transport) # Or callback equivalent The protocol factory would still be used to create the protocol object. However, the PEP would be updated to make it clear that immediately after creation the *only* permitted method invocation on the result is "connection_made", which will complete the protocol initialization process. The connection_made event handler would be redefined to return a *Future* (or equivalent object) rather than completing synchronously. create_connection would then call connection_made and *wait for it to finish*, rather than using call_soon in a fire-and-forget fashion. 
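[A sketch of that ordering contract, written with today's async/await syntax for brevity (the 2013 code used yield from); make_transport and the stub classes are invented for illustration and are not the tulip implementation:

    import asyncio

    class StubTransport:
        def write(self, data):
            print("wrote:", data)

    async def make_transport(host, port):
        # Stand-in for the real platform-specific connection setup.
        return StubTransport()

    async def create_connection(protocol_factory, host, port):
        transport = await make_transport(host, port)
        protocol = protocol_factory()
        # Wait for connection_made to finish instead of scheduling it
        # with call_soon(); the returned protocol is fully initialised.
        await protocol.connection_made(transport)
        return transport, protocol

    class Greeter:
        async def connection_made(self, transport):
            transport.write(b"EHLO example\r\n")  # may itself await

    asyncio.run(create_connection(Greeter, "example.com", 25))
]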
The advantage of this is that the rationale for the various possible states become clear: - the protocol factory is invoked synchronously, and is thus not allowed to perform any blocking actions (but may trigger "fire-and-forget" operations) - connection_made is invoked asynchronously, and is thus able to wait for various operations - a protocol returned from create_connection is certain to have had connection_made already called, thus a protocol implementation may safely assume in other methods that both __init__ and connection_made will have been called during the initialization process. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Sun Jan 20 07:31:32 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 20 Jan 2013 16:31:32 +1000 Subject: [Python-ideas] PEP 3156: Clarifying the different components of the event loop API In-Reply-To: References: Message-ID: On Sun, Jan 20, 2013 at 12:34 PM, Nick Coghlan wrote: > Do things (and get the result with "yield from"): > - wrap_future() # Perhaps "wrap_executor_future"? > - run_in_executor() > - getaddrinfo() > - getnameinfo() > > Low level transport creation: > - create_connection() > - create_pipe() # Once it exists in the PEP Somewhere early in the PEP, there may need to be a concise description of the two APIs for waiting for an asynchronous Future: 1. "f.add_done_callback()" 2. "yield from f" in a coroutine (resumes the coroutine when the future completes, with either the result or exception as appropriate) At the moment, these are buried in amongst much larger APIs, yet they're key to understanding the way everything above the core event loop layer interacts. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From glyph at twistedmatrix.com Sun Jan 20 07:51:31 2013 From: glyph at twistedmatrix.com (Glyph) Date: Sat, 19 Jan 2013 22:51:31 -0800 Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events In-Reply-To: References: <50F87DC8.1060000@canterbury.ac.nz> <50F8D695.3050002@canterbury.ac.nz> <50F9E1EA.4010305@canterbury.ac.nz> <1D762043-961F-4F1E-A0A6-0C8C78AFA59B@twistedmatrix.com> Message-ID: <96E6B3B4-FC23-4AEE-AE8E-E16A5AA54B55@twistedmatrix.com> On Jan 19, 2013, at 10:13 PM, Nick Coghlan wrote: > When the two are separated without a clear definition of what else can > happen in between, *every other method on the protocol* needs to cope > with the fact that other calls to protocol methods may happen in > between the call to __init__ and the call to connection_made - you > simply can't write a protocol without dealing with that problem. Nope. You only have to deal with the methods that the transport will call on the protocol in that state, since nothing else has a reference to it yet. Except the transport won't call them in that state, so... still nope. Again: there's enormous corpus of Twisted code out there that is written this way, you can go look at that code to see how it deals with the problem you've imagined, which is to say: it doesn't. It doesn't need to. Now, if you make the change you're proposing, and tie together the protocol's construction with the transport's construction, so that you end up with protocol(transport(...)), this means that the protocol will immediately begin interacting with the transport in this vague, undefined, not quite connected state, because, since the protocol didn't even exist at the time of the transport's construction, the transport can't possibly have a reference to a protocol yet. 
And the side of the protocol that issues a greeting will necessarily need to do transport.write(), which may want to trigger a notification to the protocol of some kind (flow control?), and where will that notification go? It needs to be solved less often, but it's a much trickier problem to solve. There are also some potential edge-cases where the existing Twisted-style design might be nicer, like delivering explicit TLS handshake notifications to protocols which support them in the vague state between protocol construction and connection_made, but seeing as how I still haven't gotten around to implementing that properly in Twisted, I imagine it will be another 10 years before Tulip is practically concerned with it :). Finally, I should say that Guido's point about the transport constructor being private is actually somewhat important. We've been saying 'transport(...)' thus far, but in fact it's more like 'SocketTransport(loop, socket)'. Or perhaps in the case of a pipe, 'PipeTransport(loop, readfd, writefd)'. In the case of an actual outbound TCP connection with name resolution, it's 'yield from make_outgoing_tcp_transport(loop, hostname, port)'. Making these all methods that hide the details and timing of the transport's construction is a definite plus. -glyph -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Sun Jan 20 08:18:33 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 20 Jan 2013 17:18:33 +1000 Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events In-Reply-To: <96E6B3B4-FC23-4AEE-AE8E-E16A5AA54B55@twistedmatrix.com> References: <50F87DC8.1060000@canterbury.ac.nz> <50F8D695.3050002@canterbury.ac.nz> <50F9E1EA.4010305@canterbury.ac.nz> <1D762043-961F-4F1E-A0A6-0C8C78AFA59B@twistedmatrix.com> <96E6B3B4-FC23-4AEE-AE8E-E16A5AA54B55@twistedmatrix.com> Message-ID: On Sun, Jan 20, 2013 at 4:51 PM, Glyph wrote: > > On Jan 19, 2013, at 10:13 PM, Nick Coghlan wrote: > > When the two are separated without a clear definition of what else can > happen in between, *every other method on the protocol* needs to cope > with the fact that other calls to protocol methods may happen in > between the call to __init__ and the call to connection_made - you > simply can't write a protocol without dealing with that problem. > > > Nope. You only have to deal with the methods that the transport will call on > the protocol in that state, since nothing else has a reference to it yet. > > Except the transport won't call them in that state, so... still nope. Yes, after Guido explained how tulip was currently handling this, I realised that the problem was mostly one of documentation. However, I think there is one key bug in the current implementation, which is that create_connection is returning *before* the call to "connection_made" is completed, thus exposing the protocol in an incompletely initialised state. > Finally, I should say that Guido's point about the transport constructor > being private is actually somewhat important. We've been saying > 'transport(...)' thus far, but in fact it's more like 'SocketTransport(loop, > socket)'. Or perhaps in the case of a pipe, 'PipeTransport(loop, readfd, > writefd)'. In the case of an actual outbound TCP connection with name > resolution, it's 'yield from make_outgoing_tcp_transport(loop, hostname, > port)'. Making these all methods that hide the details and timing of the > transport's construction is a definite plus. 
Yes, I didn't have a problem with that part - it was just the lack of clear explanation of the different roles of the protocol constructor and the connection_made callback that I found problematic. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From glyph at twistedmatrix.com Sun Jan 20 09:22:11 2013 From: glyph at twistedmatrix.com (Glyph) Date: Sun, 20 Jan 2013 00:22:11 -0800 Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events In-Reply-To: References: <50F87DC8.1060000@canterbury.ac.nz> <50F8D695.3050002@canterbury.ac.nz> <50F9E1EA.4010305@canterbury.ac.nz> <1D762043-961F-4F1E-A0A6-0C8C78AFA59B@twistedmatrix.com> <96E6B3B4-FC23-4AEE-AE8E-E16A5AA54B55@twistedmatrix.com> Message-ID: On Jan 19, 2013, at 11:18 PM, Nick Coghlan wrote: > On Sun, Jan 20, 2013 at 4:51 PM, Glyph wrote: >> >> On Jan 19, 2013, at 10:13 PM, Nick Coghlan wrote: >> >> When the two are separated without a clear definition of what else can >> happen in between, *every other method on the protocol* needs to cope >> with the fact that other calls to protocol methods may happen in >> between the call to __init__ and the call to connection_made - you >> simply can't write a protocol without dealing with that problem. >> >> >> Nope. You only have to deal with the methods that the transport will call on >> the protocol in that state, since nothing else has a reference to it yet. >> >> Except the transport won't call them in that state, so... still nope. > > Yes, after Guido explained how tulip was currently handling this, I > realised that the problem was mostly one of documentation. However, I > think there is one key bug in the current implementation, which is > that create_connection is returning *before* the call to > "connection_made" is completed, thus exposing the protocol in an > incompletely initialised state. Aah. Yes, I think you're right about that being a bug. There are probably some docs in Twisted that could be improved to explain that this ordering is part of our analogous interface's contract... >> Finally, I should say that Guido's point about the transport constructor >> being private is actually somewhat important. We've been saying >> 'transport(...)' thus far, but in fact it's more like 'SocketTransport(loop, >> socket)'. Or perhaps in the case of a pipe, 'PipeTransport(loop, readfd, >> writefd)'. In the case of an actual outbound TCP connection with name >> resolution, it's 'yield from make_outgoing_tcp_transport(loop, hostname, >> port)'. Making these all methods that hide the details and timing of the >> transport's construction is a definite plus. > > Yes, I didn't have a problem with that part - it was just the lack of > clear explanation of the different roles of the protocol constructor > and the connection_made callback that I found problematic. I wasn't clear if you were arguing against it; I just wanted to make it clear :). -glyph From phd at phdru.name Sun Jan 20 11:13:39 2013 From: phd at phdru.name (Oleg Broytman) Date: Sun, 20 Jan 2013 14:13:39 +0400 Subject: [Python-ideas] Parametrized any() and all() ? 
In-Reply-To: <2e59f105-83fb-46b0-8e6f-e854a71ab08f@th3g2000pbc.googlegroups.com>
References: <50F6847D.2020404@ziade.org> <50F6B4D8.6070002@pearwood.info> <50F6BEA3.7090807@ziade.org> <20130116194756.2efe9afe@pitrou.net> <50F94057.9080005@ziade.org> <2e59f105-83fb-46b0-8e6f-e854a71ab08f@th3g2000pbc.googlegroups.com>
Message-ID: <20130120101339.GA7617@iskra.aviel.ru>

On Sat, Jan 19, 2013 at 06:02:32PM -0800, alex23 wrote: > On Jan 19, 4:36 am, Terry Reedy wrote: > > On 1/18/2013 10:54 AM, Eric Snow wrote: > > > It took me a sec. :) DSU == "Decorate-Sort-Undecorate". [1] > > > > No, no, no. It's Delaware State University in Dover, as opposed to > > University of Delaware (UD) in Newark ;-). > > > > In other words, it depends on the universe you live in. > > "Namespaces are one honking great idea" :)

"In 1989, a random of the journalistic persuasion asked hacker Paul Boutin "What do you think will be the biggest problem in computing in the 90s?" Paul's straight-faced response: "There are only 17,000 three-letter acronyms." (To be exact, there are 26^3 = 17,576.)"

Oleg. -- Oleg Broytman http://phdru.name/ phd at phdru.name Programmers don't die, they just GOSUB without RETURN.

From solipsis at pitrou.net Sun Jan 20 13:36:54 2013
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Sun, 20 Jan 2013 13:36:54 +0100
Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events
References: <50F87DC8.1060000@canterbury.ac.nz> <50F8D695.3050002@canterbury.ac.nz> <50F9E1EA.4010305@canterbury.ac.nz> <1D762043-961F-4F1E-A0A6-0C8C78AFA59B@twistedmatrix.com>
Message-ID: <20130120133654.60dbfdb2@pitrou.net>

On Sat, 19 Jan 2013 20:35:04 -0800 Guido van Rossum wrote: > IOW, even though a transport may "have" a protocol without a > connection, nobody should care about that state, and nobody should be > calling its methods (again, write() etc.) in that state. In fact, > nobody except event loop internal code should ever have a reference to > a transport in that state. This is just not true. When the connection breaks, the protocol still has a reference to the transport and may still be trying to do things with the transport (because connection_lost() has not been called yet).

Regards Antoine.

From solipsis at pitrou.net Sun Jan 20 13:38:35 2013
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Sun, 20 Jan 2013 13:38:35 +0100
Subject: [Python-ideas] PEP 3156: Clarifying the different components of the event loop API
References: Message-ID: <20130120133835.458f0c9e@pitrou.net>

On Sun, 20 Jan 2013 12:34:24 +1000 Nick Coghlan wrote: > > Low level transport creation: > - create_connection() > - create_pipe() # Once it exists in the PEP You need some kind of create_listener() too.

Regards Antoine.

From ncoghlan at gmail.com Sun Jan 20 14:10:56 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 20 Jan 2013 23:10:56 +1000
Subject: [Python-ideas] PEP 3156: Clarifying the different components of the event loop API
In-Reply-To: <20130120133835.458f0c9e@pitrou.net>
References: <20130120133835.458f0c9e@pitrou.net>
Message-ID:

On Jan 20, 2013 10:46 PM, "Antoine Pitrou" wrote: > > On Sun, 20 Jan 2013 12:34:24 +1000 > Nick Coghlan wrote: > > > > Low level transport creation: > > - create_connection() > > - create_pipe() # Once it exists in the PEP > > You need some kind of create_listener() too. That's actually the "start_serving" method up in the event monitoring section. While it does end up creating transports, the overall flow is rather different from the client side one.

Cheers, Nick.
-- Sent from my phone, thus the relative brevity :) -------------- next part -------------- An HTML attachment was scrubbed... URL:

From solipsis at pitrou.net Sun Jan 20 14:18:10 2013
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Sun, 20 Jan 2013 14:18:10 +0100
Subject: [Python-ideas] PEP 3156: Clarifying the different components of the event loop API
References: <20130120133835.458f0c9e@pitrou.net>
Message-ID: <20130120141810.3a162eb7@pitrou.net>

On Sun, 20 Jan 2013 23:10:56 +1000 Nick Coghlan wrote: > On Jan 20, 2013 10:46 PM, "Antoine Pitrou" wrote: > > > > On Sun, 20 Jan 2013 12:34:24 +1000 > > Nick Coghlan wrote: > > > > > > Low level transport creation: > > > - create_connection() > > > - create_pipe() # Once it exists in the PEP > > > > You need some kind of create_listener() too. > > That's actually the "start_serving" method up in the event monitoring > section. While it does end up creating transports, the overall flow is > rather different from the client side one. Ah, right. Well, in any case, the API is much too limited. It doesn't support SSL, and it doesn't support UDP. "TBD: Support SSL? I don't even know how to do that synchronously, and I suppose it needs a certificate." See http://docs.python.org/dev/library/ssl.html#server-side-operation (and the non-blocking handshake part also applies)

Regards Antoine.

From eliben at gmail.com Sun Jan 20 15:18:02 2013
From: eliben at gmail.com (Eli Bendersky)
Date: Sun, 20 Jan 2013 06:18:02 -0800
Subject: [Python-ideas] PEP 3156: Clarifying the different components of the event loop API
In-Reply-To: References: Message-ID:

On Sat, Jan 19, 2013 at 6:34 PM, Nick Coghlan wrote: > PEP 3156 currently lists *29* proposed methods for the event loop API. > These methods serve quite different purposes and I think a bit more > structure in the overall API could help clarify that. > > First proposal: clearly split the abstract EventLoop API from concrete > DescriptorEventLoop and IOCPEventLoop subclasses. > > The main benefit here is to help clarify that: > 1. the additional methods defined on DescriptorEventLoop and > IOCPEventLoop are not available on all event loop implementations, so > any code using them is necessarily event loop specific > 2. the goal of the transport abstraction is to mask the differences > between these low level platform specific APIs > 3. other event loops are free to use a completely different API > between their low level transports and the event loop > > I like the idea of splitting up the big interface, but could you clarify what would go into such subclasses? I.e. isn't the current EventLoop interface supposed to represent an interface all event loops will adhere to?

And sorry if this was discussed before and I'm missing the context, but what kinds of EventLoop implementations are we expecting to see eventually? Is it only a matter of implementing the API per platform (akin to the current tulip.unix_events.UnixEventLoop) or a broader expectation of frameworks like Twisted to plug into the API by providing their own implementation (PEP 3156 mentions this somewhere)?

> Second proposal: better separate the "event loop management", "event > monitoring" and "do things" methods > > > Do things (and get the result with "yield from"): > - wrap_future() # Perhaps "wrap_executor_future"?
> - run_in_executor() > - getaddrinfo() > - getnameinfo() > > Low level transport creation: > - create_connection() > - create_pipe() # Once it exists in the PEP > > +1 These certainly look somewhat out of place in the generic EventLoop API, but concretely - how do you propose to structure the split? Eli -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Sun Jan 20 15:49:59 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 21 Jan 2013 00:49:59 +1000 Subject: [Python-ideas] PEP 3156: Clarifying the different components of the event loop API In-Reply-To: References: Message-ID: The concrete event loop methods are already separated in the PEP - they're just flagged as optional methods rather than distinct subclasses. The rest I think actually do belong on the event loop, hence the suggestion to start just by rearranging them into those categories, without making the class hierarchy any more complicated. -- Sent from my phone, thus the relative brevity :) On Jan 21, 2013 12:18 AM, "Eli Bendersky" wrote: > On Sat, Jan 19, 2013 at 6:34 PM, Nick Coghlan wrote: > >> PEP 3156 currently lists *29* proposed methods for the event loop API. >> These methods serve quite different purposes and I think a bit more >> structure in the overall API could help clarify that. >> >> First proposal: clearly split the abstract EventLoop API from concrete >> DescriptorEventLoop and IOCPEventLoop subclasses. > > >> The main benefit here is to help clarify that: >> 1. the additional methods defined on DescriptorEventLoop and >> IOCPEventLoop are not available on all event loop implementations, so >> any code using them is necessarily event loop specific >> 2. the goal of the transport abstraction is to mask the differences >> between these low level platform specific APIs >> 3. other event loops are free to use a completely different API >> between their low level transports and the event loop >> >> > I like the idea of splitting up the big interface, but could you clarify > what would go into such subclasses? I.e. isn't the current EventLoop > interface supposed to represent an interface all event loops will adhere to? > > And sorry if this was discussed before and I'm missing the context, but > what kinds of EventLoop implementations are we expecting to see eventually? > Is it only a matter of implementing the API per platform (akin to the > current tulip.unix_events.UnixEventLoop) or a broader expectation of > frameworks like Twisted to plug into the API by providing their own > implementation (PEP 3156 mentions this somewhere). > > >> Second proposal: better separate the "event loop management", "event >> monitoring" and "do things" methods >> > > > >> >> Do things (and get the result with "yield from"): >> - wrap_future() # Perhaps "wrap_executor_future"? >> - run_in_executor() >> - getaddrinfo() >> - getnameinfo() >> >> Low level transport creation: >> - create_connection() >> - create_pipe() # Once it exists in the PEP >> >> > +1 These certainly look somewhat out of place in the generic EventLoop > API, but concretely - how do you propose to structure the split? > > Eli > > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From guido at python.org Sun Jan 20 20:03:22 2013 From: guido at python.org (Guido van Rossum) Date: Sun, 20 Jan 2013 11:03:22 -0800 Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events In-Reply-To: <20130120133654.60dbfdb2@pitrou.net> References: <50F87DC8.1060000@canterbury.ac.nz> <50F8D695.3050002@canterbury.ac.nz> <50F9E1EA.4010305@canterbury.ac.nz> <1D762043-961F-4F1E-A0A6-0C8C78AFA59B@twistedmatrix.com> <20130120133654.60dbfdb2@pitrou.net> Message-ID: On Sun, Jan 20, 2013 at 4:36 AM, Antoine Pitrou wrote: > On Sat, 19 Jan 2013 20:35:04 -0800 > Guido van Rossum wrote: >> IOW, even though a transport may "have" a protocol without a >> connection, nobody should care about that state, and nobody should be >> calling its methods (again, write() etc.) in that state. In fact, >> nobody except event loop internal code should ever have a reference to >> a transport in that state. > > This is just not true. When the connection breaks, the protocol still > has a reference to the transport and may still be trying to do things > with the transport (because connection_lost() has not been called yet). That's a different case though. There once *was* a connection. You are right that the transport needs to protect itself against the protocol making further calls to the transport API in this case. Anyway, I think Nick is okay with the separation between the protocol_factory() call and the connection_made() call, as long as the future returned by create_connection() isn't marked done until the connection_made() call returns. That's an easy fix in the current Tulip code. It's a little harder though to fix up the PEP to clarify all this... -- --Guido van Rossum (python.org/~guido) From barry at python.org Sun Jan 20 21:38:10 2013 From: barry at python.org (Barry Warsaw) Date: Sun, 20 Jan 2013 15:38:10 -0500 Subject: [Python-ideas] chdir context manager References: <20130119101024.GB2969@lp-shahaf.local> <50FABCD8.9080709@python.org> Message-ID: <20130120153810.331f685b@anarchist.wooz.org> On Jan 19, 2013, at 04:33 PM, Christian Heimes wrote: >chdir() is not a safe operation because if affects the whole process. >You can NOT make it work properly and safe in a multi-threaded >environment or from code like signal handlers. I've used a homebrewed chdir() context manager but only in certain limited and specific test cases. It's easy enough to write so it doesn't bother me that once in a while I have to. -Barry -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: not available URL: From andrew at bemusement.org Sun Jan 20 23:53:43 2013 From: andrew at bemusement.org (Andrew Bennetts) Date: Mon, 21 Jan 2013 09:53:43 +1100 Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events In-Reply-To: References: <50F87DC8.1060000@canterbury.ac.nz> <50F8D695.3050002@canterbury.ac.nz> Message-ID: <20130120225343.GA26816@flay.puzzling.org> Guido van Rossum wrote: [?] > I have a more-or-less working but probably incomplete version checked > into the tulip repo: > http://code.google.com/p/tulip/source/browse/tulip/subprocess_transport.py > > Note that this completely ignores stderr -- this makes the code > simpler while still useful (there's plenty of useful stuff you can do > without reading stderr), and avoids the questions Greg Ewing brought > up about needing two transports (one for stdout, another for stderr). 
Although 3 pipes to a subprocess (stdin, stdout, stderr) is the usual convention, it's not the only possibility, so that configuration shouldn't be hard-coded. On POSIX some programs can and do make use of the ability to have more pipes to a subprocess; e.g. the various *fd options of gnupg (--status-fd, --logger-fd, --command-fd, and so on). And some programs give the child process file descriptors that aren't pipes, like sockets (e.g. an inetd-like server that accepts a socket then spawns a subprocess to serve it). So I hope tulip will support these possibilities (although obviously the stdin/out/err style should be the convenient default). You will be unsurprised to hear that Twisted does :) (Please forgive me if this was already pointed out. It's hard keeping up with python-ideas.) -Andrew. From p.f.moore at gmail.com Mon Jan 21 00:25:12 2013 From: p.f.moore at gmail.com (Paul Moore) Date: Sun, 20 Jan 2013 23:25:12 +0000 Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events In-Reply-To: <20130120225343.GA26816@flay.puzzling.org> References: <50F87DC8.1060000@canterbury.ac.nz> <50F8D695.3050002@canterbury.ac.nz> <20130120225343.GA26816@flay.puzzling.org> Message-ID: On 20 January 2013 22:53, Andrew Bennetts wrote: > Guido van Rossum wrote: > [?] >> I have a more-or-less working but probably incomplete version checked >> into the tulip repo: >> http://code.google.com/p/tulip/source/browse/tulip/subprocess_transport.py >> >> Note that this completely ignores stderr -- this makes the code >> simpler while still useful (there's plenty of useful stuff you can do >> without reading stderr), and avoids the questions Greg Ewing brought >> up about needing two transports (one for stdout, another for stderr). > > Although 3 pipes to a subprocess (stdin, stdout, stderr) is the usual > convention, it's not the only possibility, so that configuration > shouldn't be hard-coded. On POSIX some programs can and do make use of > the ability to have more pipes to a subprocess; e.g. the various *fd > options of gnupg (--status-fd, --logger-fd, --command-fd, and so on). > And some programs give the child process file descriptors that aren't > pipes, like sockets (e.g. an inetd-like server that accepts a socket > then spawns a subprocess to serve it). > > So I hope tulip will support these possibilities (although obviously the > stdin/out/err style should be the convenient default). You will be > unsurprised to hear that Twisted does :) My plan is to modify Guido's current code to take a subprocess.Popen object when creating a connection to a subprocess. So you'd use the existing API to start the process, and then tulip to interact with it. Having said that, I have no idea if or how subprocess.Popen would support the extra fds you are talking about. If you can show me some sample code, I can see what would be needed to handle it. But as far as I know, subprocess.Popen objects only have the 3 standard handles exposed as attributes - stdin, stdout and stderr. If you have to create your own pipes and manage them yourself in "normal" code, then I would expect that you'd have to do the same with tulip. That may indicate a need for (yet another) event loop API to create a pipe which can then be used with subprocess. Or you could use the add_reader/add_writer interfaces, at the expense of portability. Paul PS The above is still my plan. 
But at the moment, every PC in my house seems to have decided to stop working, so I'm rebuilding PCs rather than doing anything useful :-( Normal service will be resumed in due course...

From eliben at gmail.com Mon Jan 21 00:29:50 2013
From: eliben at gmail.com (Eli Bendersky)
Date: Sun, 20 Jan 2013 15:29:50 -0800
Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events
In-Reply-To: References: <50F87DC8.1060000@canterbury.ac.nz> <50F8D695.3050002@canterbury.ac.nz> <20130120225343.GA26816@flay.puzzling.org>
Message-ID:

On Sun, Jan 20, 2013 at 3:25 PM, Paul Moore wrote: > On 20 January 2013 22:53, Andrew Bennetts wrote: > > Guido van Rossum wrote: > > [...] > >> I have a more-or-less working but probably incomplete version checked > >> into the tulip repo: > >> http://code.google.com/p/tulip/source/browse/tulip/subprocess_transport.py > >> > >> Note that this completely ignores stderr -- this makes the code > >> simpler while still useful (there's plenty of useful stuff you can do > >> without reading stderr), and avoids the questions Greg Ewing brought > >> up about needing two transports (one for stdout, another for stderr). > > > > Although 3 pipes to a subprocess (stdin, stdout, stderr) is the usual > > convention, it's not the only possibility, so that configuration > > shouldn't be hard-coded. On POSIX some programs can and do make use of > > the ability to have more pipes to a subprocess; e.g. the various *fd > > options of gnupg (--status-fd, --logger-fd, --command-fd, and so on). > > And some programs give the child process file descriptors that aren't > > pipes, like sockets (e.g. an inetd-like server that accepts a socket > > then spawns a subprocess to serve it). > > > > So I hope tulip will support these possibilities (although obviously the > > stdin/out/err style should be the convenient default). You will be > > unsurprised to hear that Twisted does :) > > My plan is to modify Guido's current code to take a subprocess.Popen > object when creating a connection to a subprocess. So you'd use the > existing API to start the process, and then tulip to interact with it. > Having said that, I have no idea if or how subprocess.Popen would > support the extra fds you are talking about. If you can show me some > sample code, I can see what would be needed to handle it. But as far > as I know, subprocess.Popen objects only have the 3 standard handles > exposed as attributes - stdin, stdout and stderr. > > subprocess.Popen has the pass_fds argument, documented as follows: *pass_fds* is an optional sequence of file descriptors to keep open between the parent and child. Providing any *pass_fds* forces *close_fds* to be True. (Unix only)

Eli -------------- next part -------------- An HTML attachment was scrubbed... URL:

From p.f.moore at gmail.com Mon Jan 21 00:40:34 2013
From: p.f.moore at gmail.com (Paul Moore)
Date: Sun, 20 Jan 2013 23:40:34 +0000
Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events
In-Reply-To: References: <50F87DC8.1060000@canterbury.ac.nz> <50F8D695.3050002@canterbury.ac.nz> <20130120225343.GA26816@flay.puzzling.org>
Message-ID:

On 20 January 2013 23:29, Eli Bendersky wrote: > subprocess.Popen has the pass_fds argument, documented as follows: > > pass_fds is an optional sequence of file descriptors to keep open > between the parent and child. Providing any pass_fds forces close_fds to be > True. (Unix only) I thought that was the case, but it seems like this is only really enabling you to manually manage the extra pipes as I was suggesting in my comment.
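[A minimal Unix-only sketch of that manual management; the child command here is just an illustration:

    import os
    import subprocess
    import sys

    # Create the extra pipe ourselves and tell Popen to keep the write
    # end open in the child; gnupg-style tools would be pointed at it
    # with an --fd-style option.
    rfd, wfd = os.pipe()
    proc = subprocess.Popen(
        [sys.executable, "-c",
         "import os, sys; os.write(int(sys.argv[1]), b'hi')", str(wfd)],
        pass_fds=(wfd,))
    os.close(wfd)              # parent keeps only the read end
    print(os.read(rfd, 1024))  # b'hi'
    os.close(rfd)
    proc.wait()
]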
My current expectation is that the API would be something like eventloop.connect_process(protocol_factory, popen_obj) and the protocol would have data_received and err_received methods called when the stdout or stderr fds have data, and the transport would have a write method to write to stdin. If anyone has a suggestion for an API that could be used for arbitrary FDs (which I presume could be either input or output) on top of this, I'd be happy to incorporate it - but personally, I can't think of anything that wouldn't be unusably complex :-( Paul From ncoghlan at gmail.com Mon Jan 21 08:52:38 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 21 Jan 2013 17:52:38 +1000 Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events In-Reply-To: References: <50F87DC8.1060000@canterbury.ac.nz> <50F8D695.3050002@canterbury.ac.nz> <50F9E1EA.4010305@canterbury.ac.nz> <1D762043-961F-4F1E-A0A6-0C8C78AFA59B@twistedmatrix.com> <20130120133654.60dbfdb2@pitrou.net> Message-ID: On Mon, Jan 21, 2013 at 5:03 AM, Guido van Rossum wrote: > Anyway, I > think Nick is okay with the separation between the protocol_factory() > call and the connection_made() call, as long as the future returned by > create_connection() isn't marked done until the connection_made() call > returns. That's an easy fix in the current Tulip code. It's a little > harder though to fix up the PEP to clarify all this... Right, I understand what the separate method enables now. I think one way to make it clearer in the PEP is to require that "connection_made" return a Future or coroutine, rather than being an ordinary method returning None. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From guido at python.org Mon Jan 21 17:13:45 2013 From: guido at python.org (Guido van Rossum) Date: Mon, 21 Jan 2013 08:13:45 -0800 Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events In-Reply-To: <20130120225343.GA26816@flay.puzzling.org> References: <50F87DC8.1060000@canterbury.ac.nz> <50F8D695.3050002@canterbury.ac.nz> <20130120225343.GA26816@flay.puzzling.org> Message-ID: On Sun, Jan 20, 2013 at 2:53 PM, Andrew Bennetts wrote: > Guido van Rossum wrote: > [?] >> I have a more-or-less working but probably incomplete version checked >> into the tulip repo: >> http://code.google.com/p/tulip/source/browse/tulip/subprocess_transport.py >> >> Note that this completely ignores stderr -- this makes the code >> simpler while still useful (there's plenty of useful stuff you can do >> without reading stderr), and avoids the questions Greg Ewing brought >> up about needing two transports (one for stdout, another for stderr). > > Although 3 pipes to a subprocess (stdin, stdout, stderr) is the usual > convention, it's not the only possibility, so that configuration > shouldn't be hard-coded. On POSIX some programs can and do make use of > the ability to have more pipes to a subprocess; e.g. the various *fd > options of gnupg (--status-fd, --logger-fd, --command-fd, and so on). > And some programs give the child process file descriptors that aren't > pipes, like sockets (e.g. an inetd-like server that accepts a socket > then spawns a subprocess to serve it). Hm. I agree that something to represent an arbitrary pipe or pair of pipes may be useful occasionally, and we need to have an implementation that can deal with stdout and stderr separately anyway, but I don't think such extended configurations are common enough that we need to completely generalize the API. 
I think it is fine to follow the example of subprocess.py, which allows but does not encourage extra pipes and treats stdin, stdout and stderr differently. > So I hope tulip will support these possibilities (although obviously the > stdin/out/err style should be the convenient default). You will be > unsurprised to hear that Twisted does :) > > (Please forgive me if this was already pointed out. It's hard keeping > up with python-ideas.) -- --Guido van Rossum (python.org/~guido) From guido at python.org Mon Jan 21 17:22:19 2013 From: guido at python.org (Guido van Rossum) Date: Mon, 21 Jan 2013 08:22:19 -0800 Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events In-Reply-To: References: <50F87DC8.1060000@canterbury.ac.nz> <50F8D695.3050002@canterbury.ac.nz> <50F9E1EA.4010305@canterbury.ac.nz> <1D762043-961F-4F1E-A0A6-0C8C78AFA59B@twistedmatrix.com> <20130120133654.60dbfdb2@pitrou.net> Message-ID: On Sun, Jan 20, 2013 at 11:52 PM, Nick Coghlan wrote: > On Mon, Jan 21, 2013 at 5:03 AM, Guido van Rossum wrote: >> Anyway, I >> think Nick is okay with the separation between the protocol_factory() >> call and the connection_made() call, as long as the future returned by >> create_connection() isn't marked done until the connection_made() call >> returns. That's an easy fix in the current Tulip code. It's a little >> harder though to fix up the PEP to clarify all this... > > Right, I understand what the separate method enables now. I think one > way to make it clearer in the PEP is to require that "connection_made" > return a Future or coroutine, rather than being an ordinary method > returning None. Hm. This would seem to introduce Futures / coroutines at the wrong level (I want to allow protocol implementers to use them, but not require them). If connection_made() wants to initiate some blocking I/O, it is free to do so, but it ought to wrap that in a Task. If the class needs completion of this task to be a prerequisite for handling data passed to a subsequent data_received() call, it will need to devise some buffering and/or locking scheme that's outside the scope of the PEP. Note that I am also hoping to produce a more coroutine-oriented style for writing protocols. The main piece of code for this already exists, the StreamReader class (http://code.google.com/p/tulip/source/browse/tulip/http_client.py?r=b1028ab02dc0f722d790aac4768663a972d9d555#37), but I need to think about how to hook it all together nicely (for writing, the transport's API is ready to be used by coroutines). -- --Guido van Rossum (python.org/~guido) From Steve.Dower at microsoft.com Mon Jan 21 17:50:12 2013 From: Steve.Dower at microsoft.com (Steve Dower) Date: Mon, 21 Jan 2013 16:50:12 +0000 Subject: [Python-ideas] chdir context manager In-Reply-To: <20130120153810.331f685b@anarchist.wooz.org> References: <20130119101024.GB2969@lp-shahaf.local> <50FABCD8.9080709@python.org> <20130120153810.331f685b@anarchist.wooz.org> Message-ID: FWIW, when Windows revised their API set, GetCurrentDirectory and SetCurrentDirectory were completely removed. This seems a pretty strong move away from these APIs. (This only applies to new-style Windows 8 apps; desktop apps can still call them, but the intent is clear.) 
Cheers, Steve

> -----Original Message----- > From: Python-ideas [mailto:python-ideas- > bounces+steve.dower=microsoft.com at python.org] On Behalf Of Barry > Warsaw > Sent: Sunday, January 20, 2013 12:38 PM > To: python-ideas at python.org > Subject: Re: [Python-ideas] chdir context manager > > On Jan 19, 2013, at 04:33 PM, Christian Heimes wrote: > > >chdir() is not a safe operation because it affects the whole process. > >You can NOT make it work properly and safe in a multi-threaded > >environment or from code like signal handlers. > > I've used a homebrewed chdir() context manager but only in certain limited > and specific test cases. It's easy enough to write so it doesn't bother me that > once in a while I have to. > > -Barry

From storchaka at gmail.com Mon Jan 21 20:20:08 2013
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Mon, 21 Jan 2013 21:20:08 +0200
Subject: [Python-ideas] More details in MemoryError
Message-ID:

I propose to add new optional attributes to MemoryError, which show how much memory was required by the failed allocation and how much memory was in use at that moment.

From ben at bendarnell.com Mon Jan 21 22:23:06 2013
From: ben at bendarnell.com (Ben Darnell)
Date: Mon, 21 Jan 2013 16:23:06 -0500
Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events
In-Reply-To: References: <50F8F725.20505@canterbury.ac.nz>
Message-ID:

On Fri, Jan 18, 2013 at 5:15 PM, Guido van Rossum wrote: > On Thu, Jan 17, 2013 at 11:17 PM, Greg Ewing > wrote: > > Paul Moore wrote: > >> > >> PS From the PEP, it seems that a protocol must implement the 4 methods > >> connection_made, data_received, eof_received and connection_lost. For > >> a process, which has 2 output streams involved, a single data_received > >> method isn't enough.
I think we should have a pipe-based Transport and the subprocess should just
contain several of these transports (depending on which fds the caller cares
about; in my experience I rarely have more than one pipe per subprocess, but
whether that pipe is stdout or stderr varies). The process object itself
should also be able to run a callback when the child exits; waiting for the
standard streams to close is sufficient in most cases but not always.

-Ben
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From benjamin at python.org  Mon Jan 21 23:12:10 2013
From: benjamin at python.org (Benjamin Peterson)
Date: Mon, 21 Jan 2013 22:12:10 +0000 (UTC)
Subject: [Python-ideas] More details in MemoryError
References: 
Message-ID: 

Serhiy Storchaka writes:
> 
> I propose to add new optional attributes to MemoryError, which show how
> much memory was required by the failed allocation and how much memory was
> in use at that moment.

What is this useful for?

From ben at bendarnell.com  Mon Jan 21 23:13:37 2013
From: ben at bendarnell.com (Ben Darnell)
Date: Mon, 21 Jan 2013 17:13:37 -0500
Subject: [Python-ideas] PEP 3156 EventLoop: hide details of iterations and
 idleness?
Message-ID: 

While working on proof-of-concept tornado/tulip integration
(https://gist.github.com/4582282), I found a few methods that could not
easily be implemented on top of the tornado IOLoop because they rely on
details that Tornado does not expose. While it wouldn't be hard to add
support for these methods to Tornado, I would argue that they are
unnecessary and expose implementation details, and so they are good
candidates for removal from this already very broad interface.

First, run_once and call_every_iteration both expose the event loop's
underlying iterations to the application. The trouble is that the duration
of one iteration is so widely variable that it's not a very useful concept
(and when implementing the EventLoop interface on top of some existing event
loop these methods may not be available). When is it better to use run_once
instead of just using call_later to schedule a stop after a short timeout,
or call_every_iteration instead of call_repeatedly?

Second, while run_until_idle is convenient (especially for tests), it's
kind of fragile and exposes you to implementation details in the libraries
you use. If anyone uses call_repeatedly, run_until_idle won't work unless
that callback is cancelled. As an example, I once had to introduce
Tornado's equivalent of call_repeatedly in a library to work around a bug
in libcurl. If I had been using run_until_idle in my tests, they'd have all
broken. I think we should either remove run_until_idle or add a "daemon"
flag to call_repeatedly (and call_later, and possibly others).

-Ben
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From guido at python.org  Tue Jan 22 03:31:41 2013
From: guido at python.org (Guido van Rossum)
Date: Mon, 21 Jan 2013 18:31:41 -0800
Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events
In-Reply-To: 
References: <50F8F725.20505@canterbury.ac.nz>
Message-ID: 

On Mon, Jan 21, 2013 at 1:23 PM, Ben Darnell wrote:
> On Fri, Jan 18, 2013 at 5:15 PM, Guido van Rossum wrote:
>> On Thu, Jan 17, 2013 at 11:17 PM, Greg Ewing
>> wrote:
>> > Paul Moore wrote:
>> >>
>> >> PS From the PEP, it seems that a protocol must implement the 4 methods
>> >> connection_made, data_received, eof_received and connection_lost. For
>> >> a process, which has 2 output streams involved, a single data_received
>> >> method isn't enough.
>>
>> > It looks like there would have to be at least two Transport instances
>> > involved, one for stdin/stdout and one for stderr.
>> >
>> > Connecting them both to a single Protocol object doesn't seem to be
>> > possible with the framework as defined. You would have to use a
>> > couple of adapter objects to translate the data_received calls into
>> > calls on different methods of another object.
>>
>> So far this makes sense.
>>
>> But for this specific case there's a simpler solution -- require the
>> protocol to support a few extra methods, in particular,
>> err_data_received() and err_eof_received(), which are to stderr what
>> data_received() and eof_received() are for stdout. (After all, the
>> point of a subprocess is that "normal" data goes to stdout.) There's
>> only one input stream to the subprocess, so there's no ambiguity for
>> write(), and neither is there a need for multiple
>> connection_made()/lost() methods. (However, we could argue endlessly
>> over whether connection_lost() should be called when the subprocess
>> exits, or when the other side of all three pipes is closed. :-)

> Using separate methods for stderr breaks compatibility with existing
> Protocols for no good reason (UDP needs a different protocol interface
> because individual datagrams can't be concatenated; that doesn't apply here
> since pipes are stream-oriented). We'll have intermediate Protocol classes
> like LineReceiver that work with sockets; why should they be reimplemented
> for stderr?

This is a good point.

> It's also likely that if I do care about both stdout and
> stderr, I'm going to take stdout as a blob and redirect it to a file, but
> I'll want to read stderr with a line-oriented protocol to get error
> messages, so I don't think we want to favor stdout over stderr in the
> interface.

That all depends rather on the application.

> I think we should have a pipe-based Transport and the subprocess should just
> contain several of these transports (depending on which fds the caller cares
> about; in my experience I rarely have more than one pipe per subprocess, but
> whether that pipe is stdout or stderr varies). The process object itself
> should also be able to run a callback when the child exits; waiting for the
> standard streams to close is sufficient in most cases but not always.

Unfortunately you'll also need a separate protocol for each transport,
since the transport calls methods with fixed names on the protocol
(and you've just argued that we should stick to that -- and I
agree :-). Note that since there's (normally) only one input file to
the subprocess, only one of these transports should have a write()
method -- but both of them have to call data_received() and
potentially eof_received() on different objects.

And in this case it doesn't seem easy to use the StreamReader class,
since you can't know which of the two (stdout or stderr) will have
data available first, and guessing wrong might cause a deadlock. (So,
yes, this is a case where coroutines are less convenient than
callbacks.)
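(To spell out the adapter approach Greg described, the two protocols can be
thin shims over one handler object -- a rough sketch, with names invented
purely for illustration:

    class _StdoutAdapter:
        def __init__(self, handler):
            self._handler = handler
        def connection_made(self, transport):
            self._handler.stdout_transport = transport
        def data_received(self, data):
            self._handler.data_received(data)
        def eof_received(self):
            self._handler.eof_received()
        def connection_lost(self, exc):
            pass

    class _StderrAdapter(_StdoutAdapter):
        def connection_made(self, transport):
            self._handler.stderr_transport = transport
        def data_received(self, data):
            self._handler.err_data_received(data)
        def eof_received(self):
            self._handler.err_eof_received()

so the fixed method names stay per-transport, while the handler sees the
err_* flavour of the calls for stderr.)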
-- --Guido van Rossum (python.org/~guido) From ben at bendarnell.com Tue Jan 22 04:29:57 2013 From: ben at bendarnell.com (Ben Darnell) Date: Mon, 21 Jan 2013 22:29:57 -0500 Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events In-Reply-To: References: <50F8F725.20505@canterbury.ac.nz> Message-ID: On Mon, Jan 21, 2013 at 9:31 PM, Guido van Rossum wrote: > On Mon, Jan 21, 2013 at 1:23 PM, Ben Darnell wrote: > > On Fri, Jan 18, 2013 at 5:15 PM, Guido van Rossum > wrote: > >> > >> On Thu, Jan 17, 2013 at 11:17 PM, Greg Ewing > >> wrote: > >> > Paul Moore wrote: > >> >> > >> >> PS From the PEP, it seems that a protocol must implement the 4 > methods > >> >> connection_made, data_received, eof_received and connection_lost. For > >> >> a process, which has 2 output streams involved, a single > data_received > >> >> method isn't enough. > >> > >> > It looks like there would have to be at least two Transport instances > >> > involved, one for stdin/stdout and one for stderr. > >> > > >> > Connecting them both to a single Protocol object doesn't seem to be > >> > possible with the framework as defined. You would have to use a > >> > couple of adapter objects to translate the data_received calls into > >> > calls on different methods of another object. > >> > >> So far this makes sense. > >> > >> But for this specific case there's a simpler solution -- require the > >> protocol to support a few extra methods, in particular, > >> err_data_received() and err_eof_received(), which are to stderr what > >> data_received() and eof_received() are for stdout. (After all, the > >> point of a subprocess is that "normal" data goes to stdout.) There's > >> only one input stream to the subprocess, so there's no ambiguity for > >> write(), and neither is there a need for multiple > >> connection_made()/lost() methods. (However, we could argue endlessly > >> over whether connection_lost() should be called when the subprocess > >> exits, or when the other side of all three pipes is closed. :-) > > > Using separate methods for stderr breaks compatibility with existing > > Protocols for no good reason (UDP needs a different protocol interface > > because individual datagrams can't be concatenated; that doesn't apply > here > > since pipes are stream-oriented). We'll have intermediate Protocol > classes > > like LineReceiver that work with sockets; why should they be > reimplemented > > for stderr? > > This is a good point. > > > It's also likely that if I do care about both stdout and > > stderr, I'm going to take stdout as a blob and redirect it to a file, but > > I'll want to read stderr with a line-oriented protocol to get error > > messages, so I don't think we want to favor stdout over stderr in the > > interface. > > That all depends rather on the application. > Exactly. > > > I think we should have a pipe-based Transport and the subprocess should > just > > contain several of these transports (depending on which fds the caller > cares > > about; in my experience I rarely have more than one pipe per subprocess, > but > > whether that pipe is stdout or stderr varies). The process object itself > > should also be able to run a callback when the child exits; waiting for > the > > standard streams to close is sufficient in most cases but not always. > > Unfortunately you'll also need a separate protocol for each transport, > since the transport calls methods with fixed names on the protocol > (and you've just argued that that we should stick to that -- and I > agree :-). 
Well, to be precise I was arguing that pipe transports should work the same way as socket transports. I'm still not a fan of the use of fixed method names. (As an alternative, what if protocols were just callables that took a Future argument? for data_received future.result() would return the data and for eof_received and connection_lost it would raise an appropriate exception type. That just leaves connection_made, which I was arguing in the other thread should be on the protocol factory instead of the protocol.) > Note that since there's (normally) only one input file to > the subprocess, only one of these transports should have a write() > method -- but both of them have to call data_received() and > potentially eof_received() on different objects. > I'd actually give stdin its own transport and protocol, distinct from stdout and stderr (remember that using all three pipes on the same process is relatively uncommon). It's a degenerate case since it will never call data_received, but it's analogous to the way that subprocess uses three read-only and write-only file objects instead of trying to glue stdin and stdout together. This is fairly new and little-tested, but it shows the interface I have in mind: http://tornado.readthedocs.org/en/latest/process.html#tornado.process.Subprocess > > And in this case it doesn't seem easy to use the StreamReader class, > since you can't know which of the two (stdout or stderr) will have > data available first, and guessing wrong might cause a deadlock. (So, > yes, this is a case where coroutines are less convenient than > callbacks.) > I'm not sure I follow. Couldn't you just attach a StreamReader to each stream and use as_completed to read from them both in parallel? You'd get in trouble if one of the streams has a line longer than the StreamReader's buffer size, but that sort of peril is everywhere if you're using both stdout and stderr, no matter what the interface is (unless you just use a large or unlimited buffer and hope you won't run out of memory, like subprocess.communicate). At least with "yield from stderr_stream.readline()" you're better off than with a synchronous subprocess since the StreamReader's buffer size is adjustable, unlike the pipe buffer size. -Ben > > -- > --Guido van Rossum (python.org/~guido) > -------------- next part -------------- An HTML attachment was scrubbed... URL: From greg.ewing at canterbury.ac.nz Tue Jan 22 05:13:50 2013 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 22 Jan 2013 17:13:50 +1300 Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events In-Reply-To: References: <50F8F725.20505@canterbury.ac.nz> Message-ID: <50FE11FE.80905@canterbury.ac.nz> Guido van Rossum wrote: > And in this case it doesn't seem easy to use the StreamReader class, > since you can't know which of the two (stdout or stderr) will have > data available first, and guessing wrong might cause a deadlock. I don't see the problem. You run two Tasks, one handling stdin/stdout and one handling stderr. (Or three tasks if stdin and stdout are not synchronised.) Seems like an ideal use case for coroutines to me. 
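Something along these lines (a rough sketch, assuming a StreamReader
attached to each output pipe and the @tasks.task decorator from Tulip;
the handle_* callables are placeholders for real processing):

    @tasks.task
    def drain_stdout(reader):
        while True:
            line = yield from reader.readline()
            if not line:
                break
            handle_output(line)

    @tasks.task
    def drain_stderr(reader):
        while True:
            line = yield from reader.readline()
            if not line:
                break
            handle_error(line)

    drain_stdout(stdout_reader)
    drain_stderr(stderr_reader)

Each task blocks only on its own pipe, so neither can starve the other.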
-- 
Greg

From greg.ewing at canterbury.ac.nz  Tue Jan 22 05:17:57 2013
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Tue, 22 Jan 2013 17:17:57 +1300
Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events
In-Reply-To: <96E6B3B4-FC23-4AEE-AE8E-E16A5AA54B55@twistedmatrix.com>
References: <50F8D695.3050002@canterbury.ac.nz>
 <50F9E1EA.4010305@canterbury.ac.nz>
 <1D762043-961F-4F1E-A0A6-0C8C78AFA59B@twistedmatrix.com>
 <96E6B3B4-FC23-4AEE-AE8E-E16A5AA54B55@twistedmatrix.com>
Message-ID: <50FE12F5.80108@canterbury.ac.nz>

Glyph wrote:
> 
> this means that the protocol will
> immediately begin interacting with the transport in this vague,
> undefined, not quite connected state,

You still haven't explained why the protocol can't simply refrain
from doing anything with the transport until its connection_made()
is called.

If a transport is always to be assumed ready-to-go as soon as it's
exposed to the outside world, what is the point of having
connection_made() at all?

-- 
Greg

From phd at phdru.name  Mon Jan 21 20:35:48 2013
From: phd at phdru.name (Oleg Broytman)
Date: Mon, 21 Jan 2013 23:35:48 +0400
Subject: [Python-ideas] More details in MemoryError
In-Reply-To: 
References: 
Message-ID: <20130121193548.GA20342@iskra.aviel.ru>

On Mon, Jan 21, 2013 at 09:20:08PM +0200, Serhiy Storchaka wrote:
> I propose to add new optional attributes to MemoryError, which show
> how much memory was required by the failed allocation and how much
> memory was in use at that moment.

I'd very much like to see a situation when a program can survive
MemoryError.

Oleg.
-- 
Oleg Broytman http://phdru.name/ phd at phdru.name
Programmers don't die, they just GOSUB without RETURN.

From cf.natali at gmail.com  Tue Jan 22 08:13:23 2013
From: cf.natali at gmail.com (=?ISO-8859-1?Q?Charles=2DFran=E7ois_Natali?=)
Date: Tue, 22 Jan 2013 08:13:23 +0100
Subject: [Python-ideas] More details in MemoryError
In-Reply-To: <20130121193548.GA20342@iskra.aviel.ru>
References: <20130121193548.GA20342@iskra.aviel.ru>
Message-ID: 

2013/1/21 Oleg Broytman :
> I'd very much like to see a situation when a program can survive
> MemoryError.

Let's say you're using an image processing program. You have several
images open on which you've been working for a couple minutes/hours.
You open a new one, and it's so large that it results in MemoryError:
instead of just losing all your current work (yeah, the program should
support auto-save anyway, but let's pretend it doesn't), the program
catches MemoryError, and displays a popup saying "Not enough memory to
process this image".

Now, sure, there are cases where an OOM condition will result in
thrashing to death, or simply because of overcommit malloc() will never
return NULL and you'll get nuked by the OOM killer, but depending on
your operating system and allocation pattern, there are times when you
can reasonably recover from a MemoryError.

Also, a memory allocation failure doesn't necessarily mean you're OOM,
it could be that you've exhausted your address space (on 32-bit), or hit
RLIMIT_VM/RLIMIT_DATA.

2013/1/21 Benjamin Peterson :
> What is this useful for?

Even if the exception isn't caught, if the extra information gets dumped
in the traceback, it can be used for post-mortem debugging (to help
distinguish between OOM, address space exhaustion, heap fragmentation,
overflow in computation of malloc() argument, etc).

So I think it could probably be useful, but I see two problems:
- right now, the amount of memory isn't tracked.
IIRC, Antoine added recently a counter for allocated blocks, not bytes
- the exception is raised at the calling site where the allocation
routine failed (this comes from Modules/_pickle.c):
"""
PyMemoTable *memo = PyMem_MALLOC(sizeof(PyMemoTable));
if (memo == NULL) {
    PyErr_NoMemory();
    return NULL;
}
"""
So we can't easily capture the current allocated memory and the
requested memory (the former could probably be retrieved in
PyErr_NoMemory(), but the latter would require modifying every call site
and repeating it).

From geertj at gmail.com  Tue Jan 22 09:04:30 2013
From: geertj at gmail.com (Geert Jansen)
Date: Tue, 22 Jan 2013 09:04:30 +0100
Subject: [Python-ideas] PEP 3156 EventLoop: hide details of iterations and
 idleness?
In-Reply-To: 
References: 
Message-ID: 

On Mon, Jan 21, 2013 at 11:13 PM, Ben Darnell wrote:

> While working on proof-of-concept tornado/tulip integration
> (https://gist.github.com/4582282), I found a few methods that could not
> easily be implemented on top of the tornado IOLoop because they rely on
> details that Tornado does not expose. While it wouldn't be hard to add
> support for these methods to Tornado, I would argue that they are
> unnecessary and expose implementation details, and so they are good
> candidates for removal from this already very broad interface.
>
> First, run_once and call_every_iteration both expose the event loop's
> underlying iterations to the application. The trouble is that the duration
> of one iteration is so widely variable that it's not a very useful concept
> (and when implementing the EventLoop interface on top of some existing event
> loop these methods may not be available). When is it better to use run_once
> instead of just using call_later to schedule a stop after a short timeout,
> or call_every_iteration instead of call_repeatedly?

- run_once() vs call_later(0) is probably the same thing and just a
matter of API design. If Tornado has call_later() it might be able to
emulate call_soon() as call_later(0), depending on how call_soon()
works. In Guido's latest code for example call_soon() callbacks, when
added inside a callback, will run in the *next* iteration. This makes
call_soon() and call_later(0) the same.

- call_every_iteration() vs call_repeatedly(): you really need both. I
did a small proof of concept to integrate libdbus with the tulip event
loop. I use call_every_iteration() to dispatch events every time after
IO has happened. The idea is that events will always originate from
IO, and therefore having a callback on every iteration is a convenient
way to check for events that need to be dispatched. Using
call_repeatedly() here is not right, because there may be times that
there are 100s of events per second, and times there are none. There
is no sensible fixed polling frequency.

If Tornado doesn't have infrastructure for call_every_iteration() you
could emulate it with a function that re-reschedules itself using
call_soon() just before calling the callback. (See my first point
about when call_soon() callbacks are scheduled.)

If you want to see how event loop adapters for libev and libuv look
like, you can check out my project here:
https://github.com/geertj/looping

Regards,
Geert

From ncoghlan at gmail.com  Tue Jan 22 09:53:49 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 22 Jan 2013 18:53:49 +1000
Subject: [Python-ideas] More details in MemoryError
In-Reply-To: 
References: <20130121193548.GA20342@iskra.aviel.ru>
Message-ID: 

There's a bigger reason memory error must be stateless: we preallocate
and reuse it.
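Illustratively (hypothetical attribute; a sketch of the hazard, not real
CPython code): with a single shared instance, per-failure detail from one
allocation failure would leak into the next.

    shared = MemoryError()            # imagine this preallocated at startup

    def oom(requested):
        shared.requested = requested  # hypothetical attribute
        raise shared                  # a later failure would overwrite it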
-- 
Sent from my phone, thus the relative brevity :)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From storchaka at gmail.com  Tue Jan 22 10:00:48 2013
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Tue, 22 Jan 2013 11:00:48 +0200
Subject: [Python-ideas] More details in MemoryError
In-Reply-To: 
References: 
Message-ID: 

On 22.01.13 00:12, Benjamin Peterson wrote:
> Serhiy Storchaka writes:
>> I propose to add new optional attributes to MemoryError, which show how
>> much memory was required by the failed allocation and how much memory
>> was in use at that moment.
>
> What is this useful for?

Bigmem testing.

From solipsis at pitrou.net  Tue Jan 22 10:12:21 2013
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Tue, 22 Jan 2013 10:12:21 +0100
Subject: [Python-ideas] More details in MemoryError
References: <20130121193548.GA20342@iskra.aviel.ru>
Message-ID: <20130122101221.3e7b44a6@pitrou.net>

Le Tue, 22 Jan 2013 18:53:49 +1000,
Nick Coghlan a écrit :
> There's a bigger reason memory error must be stateless: we
> preallocate and reuse it.

Not anymore, it's a freelist now:
http://hg.python.org/cpython/file/e8f40d4f497c/Objects/exceptions.c#l2123

The "stateless" part was bogus in Python 3, because of the embedded
traceback and context.

Regards

Antoine.

From solipsis at pitrou.net  Tue Jan 22 10:14:38 2013
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Tue, 22 Jan 2013 10:14:38 +0100
Subject: [Python-ideas] More details in MemoryError
References: 
Message-ID: <20130122101438.42a58bc0@pitrou.net>

Le Mon, 21 Jan 2013 21:20:08 +0200,
Serhiy Storchaka a écrit :
> I propose to add new optional attributes to MemoryError, which show
> how much memory was required by the failed allocation and how much
> memory was in use at that moment.

+1 on the principle. I hope you can devise an implementation :-)

Regards

Antoine.

From ncoghlan at gmail.com  Tue Jan 22 11:42:38 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 22 Jan 2013 20:42:38 +1000
Subject: [Python-ideas] More details in MemoryError
In-Reply-To: <20130122101221.3e7b44a6@pitrou.net>
References: <20130121193548.GA20342@iskra.aviel.ru>
 <20130122101221.3e7b44a6@pitrou.net>
Message-ID: 

On Tue, Jan 22, 2013 at 7:12 PM, Antoine Pitrou wrote:
> Le Tue, 22 Jan 2013 18:53:49 +1000,
> Nick Coghlan a
> écrit :
>> There's a bigger reason memory error must be stateless: we
>> preallocate and reuse it.
>
> Not anymore, it's a freelist now:
> http://hg.python.org/cpython/file/e8f40d4f497c/Objects/exceptions.c#l2123
>
> The "stateless" part was bogus in Python 3, because of the embedded
> traceback and context.

Oh cool, I forgot about that change.

In that case, +0 for at least reporting how much memory was being
requested for the call that failed, even if that only turns out to be
useful in our own test suite. -0 for the "currently allocated"
suggestion though, as I don't see how we can provide a meaningful
value for that (too much memory usage can be outside of the control
of the Python memory allocator, and we don't even track our own usage
all that closely in non-debug builds).

Cheers,
Nick.
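P.S. For concreteness, consumption might look like this (the attribute
name is purely hypothetical, nothing like it exists yet):

    try:
        table = [0] * n
    except MemoryError as err:
        size = getattr(err, 'requested_size', None)  # proposed, not real
        if size is not None:
            print('allocation of %d bytes failed' % size)
        raise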
-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From solipsis at pitrou.net  Tue Jan 22 11:50:28 2013
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Tue, 22 Jan 2013 11:50:28 +0100
Subject: [Python-ideas] More details in MemoryError
References: <20130121193548.GA20342@iskra.aviel.ru>
 <20130122101221.3e7b44a6@pitrou.net>
Message-ID: <20130122115028.537fb2fc@pitrou.net>

Le Tue, 22 Jan 2013 20:42:38 +1000,
Nick Coghlan a écrit :
> 
> In that case, +0 for at least reporting how much memory was being
> requested for the call that failed, even if that only turns out to be
> useful in our own test suite. -0 for the "currently allocated"
> suggestion though, as I don't see how we can provide a meaningful
> value for that (too much memory usage can be outside of the control
> of the Python memory allocator, and we don't even track our own usage
> all that closely in non-debug builds).

Windows makes it easy to retrieve the current process' memory
statistics:
http://hg.python.org/benchmarks/file/43f8a0f5edd3/perf.py#l240

As usual, though, POSIX platforms are stupidly painful to work with:
http://hg.python.org/benchmarks/file/43f8a0f5edd3/perf.py#l202

Regards

Antoine.

From p.f.moore at gmail.com  Tue Jan 22 13:03:43 2013
From: p.f.moore at gmail.com (Paul Moore)
Date: Tue, 22 Jan 2013 12:03:43 +0000
Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events
In-Reply-To: 
References: <50F87DC8.1060000@canterbury.ac.nz>
 <50F8D695.3050002@canterbury.ac.nz>
Message-ID: 

On 19 January 2013 12:12, Paul Moore wrote:
>> I would love for you to create that version. I only checked it in so I
>> could point to it -- I am not happy with either the implementation,
>> the API spec, or the unit test...
>
> May be a few days before I can get to it.

OK, I finally have a working VM.

The subprocess test code assumes that it can call transport.write_eof()
in the protocol connection_made() function. I'm not sure if that is
fundamental, or just an artifact of the current implementation.
Certainly if you have a stdin pipe open, you likely want to close it to
avoid deadlocks, but with the subprocess.Popen approach, it's entirely
possible to not open a pipe to stdin. In that case, writing to stdin is
neither possible nor necessary.

Clearly, writing data to stdin if you didn't open a pipe should be
flagged as an error. And my immediate thought is that write_eof should
also be an error. But I can imagine people wanting to write reusable
protocols that pre-emptively write EOF to the stdin pipe to avoid
deadlocks.

So, a question: If the user passed a popen object without a stdin
pipe, should write_eof be an error or should it just silently do
nothing?

Paul

From steve at pearwood.info  Tue Jan 22 13:42:05 2013
From: steve at pearwood.info (Steven D'Aprano)
Date: Tue, 22 Jan 2013 23:42:05 +1100
Subject: [Python-ideas] More details in MemoryError
In-Reply-To: 
References: 
Message-ID: <50FE891D.2080603@pearwood.info>

On 22/01/13 09:12, Benjamin Peterson wrote:
> Serhiy Storchaka writes:
>>
>> I propose to add new optional attributes to MemoryError, which show how
>> much memory was required by the failed allocation and how much memory
>> was in use at that moment.
>
> What is this useful for?

After locking up a production machine with a foolishly large list
multiplication (I left it thrashing overnight, and 16+ hours later gave
up and power-cycled the machine), I have come to appreciate ulimit on
Linux systems. That means I often see MemoryErrors while testing.
[steve at ando ~]$ ulimit -v 20000
[steve at ando ~]$ python3.3
Python 3.3.0rc3 (default, Sep 27 2012, 18:44:58)
[GCC 4.1.2 20080704 (Red Hat 4.1.2-52)] on linux
Type "help", "copyright", "credits" or "license" for more information.
=== startup script executed ===
py> x = [0]*1000000
py> x = [0]*123456789012  # oops what was I thinking?
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
MemoryError

For interactive use, it would be really useful in such a situation to
see how much memory was requested and how much was available. That would
allow me to roughly estimate (say) how big a list I could make in the
available memory, instead of tediously trying larger and smaller lists.

Something like this could be used to decide whether or not to flush
unimportant in-memory caches, compact data structures, etc., or just
give up and exit.

-- 
Steven

From rosuav at gmail.com  Tue Jan 22 14:04:15 2013
From: rosuav at gmail.com (Chris Angelico)
Date: Wed, 23 Jan 2013 00:04:15 +1100
Subject: [Python-ideas] More details in MemoryError
In-Reply-To: <50FE891D.2080603@pearwood.info>
References: <50FE891D.2080603@pearwood.info>
Message-ID: 

On Tue, Jan 22, 2013 at 11:42 PM, Steven D'Aprano wrote:
> Something like this could be used to decide whether or not to flush
> unimportant in-memory caches, compact data structures, etc., or just
> give up and exit.

That's a nice idea, but unless the requested allocation was fairly
large, there's a good chance you don't have room to allocate anything
more. That may make it a bit tricky to do a compaction operation. But
if there's some sort of "automatically freeable memory" (simple
example: exception-triggered stack unwinding results in a whole bunch
of locals disappearing), and you can stay within that, then you might
be able to recover. Would require some tightrope-walking in the
exception handler, but ought to be possible.

ChrisA

From ncoghlan at gmail.com  Tue Jan 22 14:34:48 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 22 Jan 2013 23:34:48 +1000
Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events
In-Reply-To: 
References: <50F87DC8.1060000@canterbury.ac.nz>
 <50F8D695.3050002@canterbury.ac.nz>
Message-ID: 

On Tue, Jan 22, 2013 at 10:03 PM, Paul Moore wrote:
> So, a question: If the user passed a popen object without a stdin
> pipe, should write_eof be an error or should it just silently do
> nothing?

It should be an error. The analogy is similar to calling flush() vs
close(). Calling flush() on an already closed file is an error, while
you can call close() as many times as you like. If you want to ensure a
pipe is closed gracefully, call close(), not write_eof(). (abort() is
the method for abrupt closure).

Also, I agree with the comment someone else made that attempting to
pair stdin with either stderr or stdout is a bad idea - better to
treat them as three independent transports (as the subprocess module
does), so that the close() semantics and error handling are clear.

Sockets are different, as those actually *are* bidirectional data
streams, whereas pipes are unidirectional.

I don't know whether it's worth defining separate SimplexTransmit
(e.g. stdin pipe in parent process, stdout, stderr pipes in child
process), SimplexReceive (stdout, stderr pipes in parent process,
stdin pipe in child process), HalfDuplex (e.g. some radio transmitters)
and FullDuplex (e.g. sockets) transport abstractions - I guess if
Twisted haven't needed them, it probably isn't worth bothering.
It's also fairly obvious how to implement the first three based on the
full duplex API currently described in the PEP just by raising the
appropriate exceptions.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

From solipsis at pitrou.net  Tue Jan 22 14:49:36 2013
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Tue, 22 Jan 2013 14:49:36 +0100
Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events
References: <50F87DC8.1060000@canterbury.ac.nz>
 <50F8D695.3050002@canterbury.ac.nz>
Message-ID: <20130122144936.3ead5006@pitrou.net>

Le Tue, 22 Jan 2013 23:34:48 +1000,
Nick Coghlan a écrit :
> 
> Also, I agree with the comment someone else made that attempting to
> pair stdin with either stderr or stdout is a bad idea - better to
> treat them as three independent transports (as the subprocess module
> does), so that the close() semantics and error handling are clear.
> 
> Sockets are different, as those actually *are* bidirectional data
> streams, whereas pipes are unidirectional.

+1

> I don't know whether it's worth defining separate SimplexTransmit
> (e.g. stdin pipe in parent process, stdout, stderr pipes in child
> process), SimplexReceive (stdout, stderr pipes in parent process,
> stdin pipe in child process), HalfDuplex (e.g. some radio transmitters)
> and FullDuplex (e.g. sockets) transport abstractions - I guess if
> Twisted haven't needed them, it probably isn't worth bothering.

It's an implementation detail, since the user should only see transport
instances, not transport classes.

(until the user tries to write their own transport, that is)

Regards

Antoine.

From ben at bendarnell.com  Tue Jan 22 16:31:22 2013
From: ben at bendarnell.com (Ben Darnell)
Date: Tue, 22 Jan 2013 10:31:22 -0500
Subject: [Python-ideas] PEP 3156 EventLoop: hide details of iterations and
 idleness?
In-Reply-To: 
References: 
Message-ID: 

On Tue, Jan 22, 2013 at 3:04 AM, Geert Jansen wrote:

> - call_every_iteration() vs call_repeatedly(): you really need both. I
> did a small proof of concept to integrate libdbus with the tulip event
> loop. I use call_every_iteration() to dispatch events every time after
> IO has happened. The idea is that events will always originate from
> IO, and therefore having a callback on every iteration is a convenient
> way to check for events that need to be dispatched. Using
> call_repeatedly() here is not right, because there may be times that
> there are 100s of events per second, and times there are none. There
> is no sensible fixed polling frequency.

I don't understand what you mean by "events will always originate from IO"
(I don't know anything about libdbus). If the events are coming from IO
that causes an event loop iteration, it must be from some tulip callback.
Why can't that callback be responsible for scheduling any further
dispatching that may be needed?

> If Tornado doesn't have infrastructure for call_every_iteration() you
> could emulate it with a function that re-reschedules itself using
> call_soon() just before calling the callback. (See my first point
> about when call_soon() callbacks are scheduled.)

No, because call_soon (and call_later(0)) cause the event loop to use a
timeout of zero on its next poll call, so a function that reschedules itself
with call_soon will be a busy loop. There is no good way to emulate
call_every_iteration from the other methods; you'll either busy loop with
call_soon or use a fixed timeout.
If you need it it's an easy thing to offer, but since neither tornado nor twisted have such a method I'm questioning the need. run_once() will run for an unpredictable amount of time (until the next IO or timeout); run_forever() with call_soon(stop) will handle events that are ready at that moment and then stop. -Ben > > If you want to see how event loop adapters for libev and libuv look > like, you can check out my project here: > https://github.com/geertj/looping > > Regards, > Geert > -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.f.moore at gmail.com Tue Jan 22 16:43:51 2013 From: p.f.moore at gmail.com (Paul Moore) Date: Tue, 22 Jan 2013 15:43:51 +0000 Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events In-Reply-To: References: <50F87DC8.1060000@canterbury.ac.nz> <50F8D695.3050002@canterbury.ac.nz> Message-ID: On 22 January 2013 13:34, Nick Coghlan wrote: > Also, I agree with the comment someone else made that attempting to > pair stdin with either stderr or stdout is a bad idea - better to > treat them as three independent transports (as the subprocess module > does), so that the close() semantics and error handling are clear. That was my original feeling - although I made my case badly by arguing in terms of portability rather than clearer design. But Guido argued for a higher-level portable subprocess transport that was implemented "under the hood" using the existing nonportable add_reader/add_writer methods on Unix, and an as-yet-unimplemented IOCP-based alternative on Windows. I still feel that a more general approach would be to have two methods on the event loop connect_input_pipe(protocol_factory, readable_pipe) and connect_output_pipe(protocol_factory, writeable_pipe) which use the standard transport/protocol methods as defined in the PEP. Then the subprocess transport can be layered on top of that as one possible example of a "higher layer" convenience transport. I know that twisted has a create_process event loop (reactor) method, but I suspect part of the reason for that is that it predates the subprocess module's unified interface. I'll try implementing the pipe transport approach and see how it looks in contrast. Paul. From p.f.moore at gmail.com Tue Jan 22 17:10:18 2013 From: p.f.moore at gmail.com (Paul Moore) Date: Tue, 22 Jan 2013 16:10:18 +0000 Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events In-Reply-To: References: <50F87DC8.1060000@canterbury.ac.nz> <50F8D695.3050002@canterbury.ac.nz> Message-ID: On 22 January 2013 15:43, Paul Moore wrote: > I'll try implementing the pipe transport approach and see how it looks > in contrast. Here's a quick proof of concept (for a read pipe): class UnixEventLoop(events.EventLoop): ... 
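    # connect_read_pipe wraps an *existing* readable pipe (e.g. a Popen's
    # stdout): it builds the protocol, hands the pipe to a read transport,
    # and completes once connection_made has been scheduled on the loop.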
    @tasks.task
    def connect_read_pipe(self, protocol_factory, rpipe):
        protocol = protocol_factory()
        waiter = futures.Future()
        transport = _UnixReadPipeTransport(self, rpipe, protocol, waiter)
        yield from waiter
        return transport, protocol

class _UnixReadPipeTransport(transports.Transport):
    def __init__(self, event_loop, rpipe, protocol, waiter=None):
        self._event_loop = event_loop
        self._pipe = rpipe.fileno()   # keep the raw fd
        self._protocol = protocol
        self._buffer = []
        self._event_loop.add_reader(self._pipe, self._read_ready)
        self._event_loop.call_soon(self._protocol.connection_made, self)
        if waiter is not None:
            self._event_loop.call_soon(waiter.set_result, None)

    def _read_ready(self):
        try:
            data = os.read(self._pipe, 16*1024)
        except BlockingIOError:
            return
        if data:
            self._event_loop.call_soon(self._protocol.data_received, data)
        else:
            self._event_loop.remove_reader(self._pipe)
            self._event_loop.call_soon(self._protocol.eof_received)

Using this to re-implement the subprocess test looks something like this
(the protocol is unchanged from the existing test):

    def testUnixSubprocessWithPipe(self):
        proc = subprocess.Popen(['/bin/ls', '-lR'], stdout=subprocess.PIPE)
        t, p = yield from self.event_loop.connect_read_pipe(MyProto,
                                                            proc.stdout)
        self.event_loop.run()

To be honest, this looks sufficiently straightforward that I don't see
the benefit in a less-general high-level transport type...

Paul

From geertj at gmail.com  Tue Jan 22 17:16:15 2013
From: geertj at gmail.com (Geert Jansen)
Date: Tue, 22 Jan 2013 17:16:15 +0100
Subject: [Python-ideas] PEP 3156 EventLoop: hide details of iterations and
 idleness?
In-Reply-To: 
References: 
Message-ID: 

On Tue, Jan 22, 2013 at 4:31 PM, Ben Darnell wrote:

> I don't understand what you mean by "events will always originate from IO"
> (I don't know anything about libdbus).

What I meant is that if there is something to dispatch, then this is
due to an inbound IO (or a timeout for that matter). Due to either
event, the loop will advance by one tick, and hit my
call_every_iteration() handler where I dispatch.

> If the events are coming from IO
> that causes an event loop iteration, it must be from some tulip callback.
> Why can't that callback be responsible for scheduling any further
> dispatching that may be needed?

Well, your original question was why not call_repeatedly() instead of
call_every_iteration(). I tried to answer that for my use case.

Indeed, call_soon() could be used to schedule a dispatch every time
when an IO is received. However, I preferred to have a fixed callback
that I do not need to allocate and register every time, for
efficiency.

>>> If Tornado doesn't have infrastructure for call_every_iteration() you
>>> could emulate it with a function that re-reschedules itself using
>>> call_soon() just before calling the callback. (See my first point
>>> about when call_soon() callbacks are scheduled.)
>>
>> No, because call_soon (and call_later(0)) cause the event loop to use a
>> timeout of zero on its next poll call, so a function that reschedules itself
>> with call_soon will be a busy loop. There is no good way to emulate
>> call_every_iteration from the other methods; you'll either busy loop with
>> call_soon or use a fixed timeout. If you need it it's an easy thing to
>> offer, but since neither tornado nor twisted have such a method I'm
>> questioning the need.

Yes, you're right. I was confusing things with libuv and libev. I may
have actually implemented call_soon() the wrong way there :)

Maybe I am abusing call_every_iteration() when I use it for dispatching.
If you look at the libuv and libev documentation, then they say that
their call_every_iteration() equivalents (Prepare and Check) are for
integrating with external event loops. So maybe that is the use case.
However, I've not looked into this in any detail.

If Tornado and Twisted cannot implement call_every_iteration(), then I
think that is a good reason to remove it.

Regards,
Geert

From guido at python.org  Tue Jan 22 17:33:28 2013
From: guido at python.org (Guido van Rossum)
Date: Tue, 22 Jan 2013 08:33:28 -0800
Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events
In-Reply-To: 
References: <50F87DC8.1060000@canterbury.ac.nz>
 <50F8D695.3050002@canterbury.ac.nz>
Message-ID: 

I am not actually very committed to a particular design for a subprocess
transport. I'll happily leave it up to others to come up with a design
and make it work on multiple platforms.

--Guido van Rossum (sent from Android phone)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From solipsis at pitrou.net  Tue Jan 22 19:27:10 2013
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Tue, 22 Jan 2013 19:27:10 +0100
Subject: [Python-ideas] More details in MemoryError
References: <50FE891D.2080603@pearwood.info>
Message-ID: <20130122192710.6f94d16f@pitrou.net>

On Wed, 23 Jan 2013 00:04:15 +1100
Chris Angelico wrote:
> On Tue, Jan 22, 2013 at 11:42 PM, Steven D'Aprano wrote:
> > Something like this could be used to decide whether or not to flush
> > unimportant in-memory caches, compact data structures, etc., or just
> > give up and exit.
> 
> That's a nice idea, but unless the requested allocation was fairly
> large, there's a good chance you don't have room to allocate anything
> more.

I wouldn't be surprised if most cases of MemoryErrors were on fairly
large allocation requests ;-)

Regards

Antoine.

From python at mrabarnett.plus.com  Tue Jan 22 19:36:07 2013
From: python at mrabarnett.plus.com (MRAB)
Date: Tue, 22 Jan 2013 18:36:07 +0000
Subject: [Python-ideas] More details in MemoryError
In-Reply-To: 
References: <50FE891D.2080603@pearwood.info>
Message-ID: <50FEDC17.7010608@mrabarnett.plus.com>

On 2013-01-22 13:04, Chris Angelico wrote:
> On Tue, Jan 22, 2013 at 11:42 PM, Steven D'Aprano wrote:
>> Something like this could be used to decide whether or not to flush
>> unimportant in-memory caches, compact data structures, etc., or just
>> give up and exit.
>
> That's a nice idea, but unless the requested allocation was fairly
> large, there's a good chance you don't have room to allocate anything
> more. That may make it a bit tricky to do a compaction operation. But
> if there's some sort of "automatically freeable memory" (simple
> example: exception-triggered stack unwinding results in a whole bunch
> of locals disappearing), and you can stay within that, then you might
> be able to recover. Would require some tightrope-walking in the
> exception handler, but ought to be possible.
>
FYI, allocating memory specially for such cases is sometimes called a
"memory parachute".

I wonder whether you could have a subclass of MemoryError called
LowMemoryError. If allocation fails and there's a parachute, it would
free the parachute and raise LowMemoryError. That would give you a
chance to tidy up before quitting or even, perhaps, free enough stuff
to make a new parachute and continue working. If allocation fails and
there's no parachute, it would raise MemoryError as at present.

With LowMemoryError as a subclass of MemoryError, existing code would
still work the same.
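A pure-Python approximation of the parachute idea, just to sketch it
(the size and the helper names are made up):

    _parachute = bytearray(1024 * 1024)   # reserve 1 MiB up front

    def _release_parachute():
        global _parachute
        _parachute = None                 # give recovery code room to run

    try:
        result = build_big_structure()    # hypothetical workload
    except MemoryError:
        _release_parachute()
        drop_caches()                     # hypothetical cleanup hook
        _parachute = bytearray(1024 * 1024)   # try to re-arm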
From guido at python.org Tue Jan 22 20:19:04 2013 From: guido at python.org (Guido van Rossum) Date: Tue, 22 Jan 2013 11:19:04 -0800 Subject: [Python-ideas] PEP 3156 EventLoop: hide details of iterations and idleness? In-Reply-To: References: Message-ID: On Tue, Jan 22, 2013 at 8:16 AM, Geert Jansen wrote: > On Tue, Jan 22, 2013 at 4:31 PM, Ben Darnell wrote: > >> I don't understand what you mean by "events will always originate from IO" >> (I don't know anything about libdbus). > > What I meant is that if there is something to dispatch, then this is > due to an inbound IO (or a timeout for that matter). Due either event, > the loop will advance by one tick, and hit my call_every_iteration() > handler where I dispatch. > >> If the events are coming from IO >> that causes an event loop iteration, it must be from some tulip callback. >> Why can't that callback be responsible for scheduling any further >> dispatching that may be needed? > > Well your original question was why not call_repeatedly() instead of > call_every_iteration(). I tried to answer that for my use case. > > Indeed, call_soon() could be used to schedule a dispatch every time > when an IO is received. However, I preferred to have a fixed callback > that I do not need to allocate and register every time, for > efficiency. > >>> If Tornado doesn't have infrastructure for call_every_iteration() you >>> could emulate it with a function that re-reschedules itself using >>> call_soon() just before calling the callback. (See my first point >>> about when call_soon() callbacks are scheduled.) >> >> >> No, because call_soon (and call_later(0)) cause the event loop to use a >> timeout of zero on its next poll call, so a function that reschedules itself >> with call_soon will be a busy loop. There is no good way to emulate >> call_every_iteration from the other methods; you'll either busy loop with >> call_soon or use a fixed timeout. If you need it it's an easy thing to >> offer, but since neither tornado nor twisted have such a method I'm >> questioning the need. > > Yes, you're right. I was confusing things with libuv and libev. I may > have actually implemented call_soon() the wrong way there :) > > Maybe I am abusing call_every_iteration() when I use it for > dispatching. If you look at the libuv and libev documentation, then > they say that their call_every_iteration() equivalents (Prepare and > Check) are for integrating with external event loops. So maybe that is > the use case. However, I've not looked into this in any detail. > > If Tornado and Twisted cannot implement call_every_iteration(), then I > think that is a good reason to remove it. Ok, I'll kill call_every_iteration(). I'll wait for more discussion on run_once() and run()'s until-idle behavior. -- --Guido van Rossum (python.org/~guido) From p.f.moore at gmail.com Tue Jan 22 21:36:47 2013 From: p.f.moore at gmail.com (Paul Moore) Date: Tue, 22 Jan 2013 20:36:47 +0000 Subject: [Python-ideas] Tulip / PEP 3156 - subprocess events In-Reply-To: References: <50F87DC8.1060000@canterbury.ac.nz> <50F8D695.3050002@canterbury.ac.nz> Message-ID: On 22 January 2013 16:33, Guido van Rossum wrote: > I am not actually very committed to a particular design for a subprocess > transport. I'll happily leave it up to others to come up with a design and > make it work on multiple platforms. OK. I've written a pipe transport (event_loop.connect_read_pipe and event_loop.connect_write_pipe) and modified the existing subprocess test to use it. I've also added a small read/write test. 
The code is in my bitbucket repository at
https://bitbucket.org/pmoore/tulip.

I'm not very happy with the call-back based style of the read/write
test. I'm sure it would be much better written in an async style, but I
don't know how to do so. If anyone who understands the async style
better than I do can offer a translation, I'd be very grateful - I'd
like to see if the resulting code looks sufficiently clear.

Here's the relevant code. The biggest ugliness is the need for the two
protocol classes, which basically do nothing but (1) collect data
received and (2) ignore unwanted callbacks.

    class DummyProto(protocols.Protocol):
        def __init__(self):
            pass
        def connection_made(self, transport):
            pass
        def data_received(self, data):
            pass
        def eof_received(self):
            pass
        def connection_lost(self, exc):
            pass

    class MyCollector(protocols.Protocol):
        def __init__(self):
            self.data = []
        def connection_made(self, transport):
            pass
        def data_received(self, data):
            self.data.append(data)
        def eof_received(self):
            pass
        def connection_lost(self, exc):
            pass
        def get_data(self):
            return b''.join(self.data)

    def testReadWrite(self):
        proc = Popen(['/bin/tr', 'a-z', 'A-Z'], stdin=PIPE, stdout=PIPE)
        rt, rp = yield from self.event_loop.connect_read_pipe(MyCollector,
                                                              proc.stdout)
        wt, wp = yield from self.event_loop.connect_write_pipe(DummyProto,
                                                               proc.stdin)
        def send_data():
            wt.write(b"hello, world")
            wt.write_eof()
        self.event_loop.call_soon(send_data)
        self.event_loop.run()
        self.assertEqual(rp.get_data(), b'HELLO, WORLD')

Paul

From rosuav at gmail.com  Tue Jan 22 22:32:20 2013
From: rosuav at gmail.com (Chris Angelico)
Date: Wed, 23 Jan 2013 08:32:20 +1100
Subject: [Python-ideas] More details in MemoryError
In-Reply-To: <20130122192710.6f94d16f@pitrou.net>
References: <50FE891D.2080603@pearwood.info>
 <20130122192710.6f94d16f@pitrou.net>
Message-ID: 

On Wed, Jan 23, 2013 at 5:27 AM, Antoine Pitrou wrote:
> On Wed, 23 Jan 2013 00:04:15 +1100
> Chris Angelico wrote:
>> On Tue, Jan 22, 2013 at 11:42 PM, Steven D'Aprano wrote:
>> > Something like this could be used to decide whether or not to flush
>> > unimportant in-memory caches, compact data structures, etc., or just
>> > give up and exit.
>>
>> That's a nice idea, but unless the requested allocation was fairly
>> large, there's a good chance you don't have room to allocate anything
>> more.
>
> I wouldn't be surprised if most cases of MemoryErrors were on fairly
> large allocation requests ;-)

Depends on the workflow. Something that allocates an immediate block of
memory, yes, but if you're progressively building a complex structure,
individual allocations mightn't themselves be significant.

ChrisA

From jcd at sdf.lonestar.org  Wed Jan 23 02:06:08 2013
From: jcd at sdf.lonestar.org (J. Cliff Dyer)
Date: Tue, 22 Jan 2013 20:06:08 -0500
Subject: [Python-ideas] csv.DictReader could handle headers more
 intelligently.
Message-ID: <1358903168.4767.4.camel@webb>

Idea folks,

I'm working with some poorly-formed CSV files, and I noticed that
DictReader always and only pulls headers off of the first row. But many
of the files I see have blank lines before the row of headers, sometimes
with commas to the appropriate field count, sometimes without. The
current implementation's behavior in this case is likely never correct,
and certainly always annoying.
Given the following file:

---Start File 1---
,,
A,B,C
1,2,3
2,4,6
---End File 1---

csv.DictReader yields the rows:

{'': 'C'}
{'': '3'}
{'': '6'}

And given a file starting with a zero-length line, like the following:

---Start File 2---

A,B,C
1,2,3
2,4,6
---End File 2---

It yields the following:

{None: ['A', 'B', 'C']}
{None: ['1', '2', '3']}
{None: ['2', '4', '6']}

I think that in both cases, the proper response would be to treat the
A,B,C line as the header line.

The change that makes this work is pretty simple. In the fieldnames
getter property, the "if not self._fieldnames:" conditional becomes
"while not self._fieldnames or not any(self._fieldnames):"

As a subclass:

    import csv

    class DictReader(csv.DictReader):
        @property
        def fieldnames(self):
            while self._fieldnames is None or not any(self._fieldnames):
                try:
                    self._fieldnames = next(self.reader)
                except StopIteration:
                    break
            self.line_num = self.reader.line_num
            return self._fieldnames

        # Same as the original setter, just rewritten to associate with
        # the new getter property
        @fieldnames.setter
        def fieldnames(self, value):
            self._fieldnames = value

There might be some issues with existing code that depends on the
{None: ['1','2','3']} construction, but I can't imagine a time when
programmers would want to see {'': '3'} with the 1 and 2 values getting
lost.

Thoughts? Do folks think this is worth adding to the csv library, or
should I just keep using my subclass?

Cheers,
Cliff

From wuwei23 at gmail.com  Wed Jan 23 02:51:38 2013
From: wuwei23 at gmail.com (alex23)
Date: Tue, 22 Jan 2013 17:51:38 -0800 (PST)
Subject: [Python-ideas] csv.DictReader could handle headers more
 intelligently.
In-Reply-To: <1358903168.4767.4.camel@webb>
References: <1358903168.4767.4.camel@webb>
Message-ID: <5d57225f-436a-49db-b882-68bbf80b9e71@t6g2000pba.googlegroups.com>

On Jan 23, 11:06 am, "J. Cliff Dyer" wrote:
> I'm working with some poorly-formed CSV files, and I noticed that
> DictReader always and only pulls headers off of the first row. But many
> of the files I see have blank lines before the row of headers, sometimes
> with commas to the appropriate field count, sometimes without. The
> current implementation's behavior in this case is likely never correct,
> and certainly always annoying.

I don't think we should start adding support for every malformed type
of csv file that exists. It's easy enough to remove the unnecessary
lines yourself before passing them to DictReader:

    from csv import DictReader

    with open('malformed.csv','rb') as csvfile:
        csvlines = list(l for l in csvfile if l.strip())
        csvreader = DictReader(csvlines)

Personally, if I was dealing with this as often as you are, I'd
probably make a custom context manager instead. The problem lies in
the files themselves, not in csv's response to them.

From cf.natali at gmail.com  Wed Jan 23 12:16:14 2013
From: cf.natali at gmail.com (=?ISO-8859-1?Q?Charles=2DFran=E7ois_Natali?=)
Date: Wed, 23 Jan 2013 12:16:14 +0100
Subject: [Python-ideas] reducing multiprocessing.Queue contention
Message-ID: 

Hello,

Currently, multiprocessing.Queue put() and get() methods hold locks
for the entire duration of the writing/reading to the backing
Connection (which can be a pipe, unix domain socket, or whatever it's
called on Windows).

For example, here's what the feeder thread does:
"""
else:
    wacquire()
    try:
        send(obj)
        # Delete references to object. See issue16284
        del obj
    finally:
        wrelease()
"""

Connection.send() and Connection.recv() have to serialize the data
using pickle before writing them to the underlying file descriptor.
While the locking is necessary to guarantee atomic read/write (well,
it's not necessary if you're writing to a pipe less than PIPE_BUF, and
writes seem atomic on Windows), the locks don't have to be held while
the data is serialized.

Although I didn't make any measurements, my gut feeling is that this
serializing can take a non-negligible part of the overall
sending/receiving time, for large data items. If that's the case, then
simply holding the lock for the duration of the read()/write() syscall
(and not during serialization) could reduce contention in case of
large data sending/receiving.

One way to do that would be to refactor the code a bit to provide
maybe a (private) AtomicConnection, which would encapsulate the
necessary locking: another advantage is that this would hide the
platform-dependent code inside Connection (right now, Queue only uses
a lock for sending on Unix platforms, since write is apparently atomic
on Windows).

Thoughts?

From shibturn at gmail.com  Wed Jan 23 13:09:42 2013
From: shibturn at gmail.com (Richard Oudkerk)
Date: Wed, 23 Jan 2013 12:09:42 +0000
Subject: [Python-ideas] reducing multiprocessing.Queue contention
In-Reply-To: 
References: 
Message-ID: 

On 23/01/2013 11:16am, Charles-François Natali wrote:
> Connection.send() and Connection.recv() have to serialize the data
> using pickle before writing them to the underlying file descriptor.
> While the locking is necessary to guarantee atomic read/write (well,
> it's not necessary if you're writing to a pipe less than PIPE_BUF, and
> writes seem atomic on Windows), the locks don't have to be held while
> the data is serialized.

But you can only rely on the atomicity of writing less than PIPE_BUF
bytes if you know that no other process is currently trying to send a
message longer than PIPE_BUF. Otherwise the short message could be
embedded in the long message (even if the process sending the long
message is holding the lock).

-- 
Richard

From cf.natali at gmail.com  Wed Jan 23 13:27:39 2013
From: cf.natali at gmail.com (=?ISO-8859-1?Q?Charles=2DFran=E7ois_Natali?=)
Date: Wed, 23 Jan 2013 13:27:39 +0100
Subject: [Python-ideas] reducing multiprocessing.Queue contention
In-Reply-To: 
References: 
Message-ID: 

> But you can only rely on the atomicity of writing less than PIPE_BUF bytes
> if you know that no other process is currently trying to send a message
> longer than PIPE_BUF. Otherwise the short message could be embedded in the
> long message (even if the process sending the long message is holding the
> lock).

Maybe I wasn't clear.
I'm not suggesting to not hold the lock when sending less than
PIPE_BUF, since it wouldn't work in the case you describe above.
I'm suggesting to serialize the data prior to acquiring the writer
lock, to reduce contention (and unserialize after releasing the
reading lock).
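Concretely, the feeder loop could become something like this (a sketch
only; send_bytes() stands for writing the already-pickled payload, as
Connection.send_bytes() does today):

    else:
        buf = pickle.dumps(obj)   # serialize outside the lock
        # Delete references to object. See issue16284
        del obj
        wacquire()
        try:
            send_bytes(buf)       # only the raw write is under the lock
        finally:
            wrelease()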
(I only mentioned PIPE_BUF because I was sad to see that Windows
supported atomic messages, and this comforted me a bit :-)

From solipsis at pitrou.net  Wed Jan 23 13:37:17 2013
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Wed, 23 Jan 2013 13:37:17 +0100
Subject: [Python-ideas] reducing multiprocessing.Queue contention
References: 
Message-ID: <20130123133717.4c3a7357@pitrou.net>

Le Wed, 23 Jan 2013 12:16:14 +0100,
Charles-François Natali a écrit :
> 
> One way to do that would be to refactor the code a bit to provide
> maybe a (private) AtomicConnection, which would encapsulate the
> necessary locking: another advantage is that this would hide the
> platform-dependent code inside Connection

Or perhaps simply some _send_with_lock and _recv_with_lock methods?
(it may also skip the lock for the Windows PipeConnection
implementation)

Regards

Antoine.

From shibturn at gmail.com  Wed Jan 23 14:13:30 2013
From: shibturn at gmail.com (Richard Oudkerk)
Date: Wed, 23 Jan 2013 13:13:30 +0000
Subject: [Python-ideas] reducing multiprocessing.Queue contention
In-Reply-To: 
References: 
Message-ID: 

On 23/01/2013 12:27pm, Charles-François Natali wrote:
> Maybe I wasn't clear.
> I'm not suggesting to not hold the lock when sending less than
> PIPE_BUF, since it wouldn't work in the case you describe above.
> I'm suggesting to serialize the data prior to acquiring the writer
> lock, to reduce contention (and unserialize after releasing the
> reading lock).

That is reasonable. In fact we should probably serialize when put() is
called to catch any pickling error early.

-- 
Richard

From eliben at gmail.com  Wed Jan 23 16:00:14 2013
From: eliben at gmail.com (Eli Bendersky)
Date: Wed, 23 Jan 2013 07:00:14 -0800
Subject: [Python-ideas] reducing multiprocessing.Queue contention
In-Reply-To: 
References: 
Message-ID: 

On Wed, Jan 23, 2013 at 3:16 AM, Charles-François Natali <
cf.natali at gmail.com> wrote:

> Hello,
>
> Currently, multiprocessing.Queue put() and get() methods hold locks
> for the entire duration of the writing/reading to the backing
> Connection (which can be a pipe, unix domain socket, or whatever it's
> called on Windows).
>
> For example, here's what the feeder thread does:
> """
> else:
>     wacquire()
>     try:
>         send(obj)
>         # Delete references to object. See issue16284
>         del obj
>     finally:
>         wrelease()
> """
>
> Connection.send() and Connection.recv() have to serialize the data
> using pickle before writing them to the underlying file descriptor.
> While the locking is necessary to guarantee atomic read/write (well,
> it's not necessary if you're writing to a pipe less than PIPE_BUF, and
> writes seem atomic on Windows), the locks don't have to be held while
> the data is serialized.
>
> Although I didn't make any measurements, my gut feeling is that this
> serializing can take a non-negligible part of the overall
> sending/receiving time, for large data items. If that's the case, then
> simply holding the lock for the duration of the read()/write() syscall
> (and not during serialization) could reduce contention in case of
> large data sending/receiving.
>
> One way to do that would be to refactor the code a bit to provide
> maybe a (private) AtomicConnection, which would encapsulate the
> necessary locking: another advantage is that this would hide the
> platform-dependent code inside Connection (right now, Queue only uses
> a lock for sending on Unix platforms, since write is apparently atomic
> on Windows).
>

In general, this sounds good.
There's indeed no reason to perform the serialization under a lock. It would be great to have some measurements to see just how much it takes, though. Eli -------------- next part -------------- An HTML attachment was scrubbed... URL: From jcd at sdf.lonestar.org Wed Jan 23 17:51:05 2013 From: jcd at sdf.lonestar.org (J. Cliff Dyer) Date: Wed, 23 Jan 2013 11:51:05 -0500 Subject: [Python-ideas] csv.DictReader could handle headers more intelligently. In-Reply-To: <5d57225f-436a-49db-b882-68bbf80b9e71@t6g2000pba.googlegroups.com> References: <1358903168.4767.4.camel@webb> <5d57225f-436a-49db-b882-68bbf80b9e71@t6g2000pba.googlegroups.com> Message-ID: <1358959865.5194.8.camel@gdoba.domain.local> On Tue, 2013-01-22 at 17:51 -0800, alex23 wrote: > I don't think we should start adding support for every malformed type > of csv file that exists. It's easy enough to remove the unnecessary > lines yourself before passing them to DictReader: > > from csv import DictReader > > with open('malformed.csv','rb') as csvfile: > csvlines = list(l for l in csvfile if l.strip()) > csvreader = DictReader(csvlines) > > Personally, if I was dealing with this as often as you are, I'd > probably make a custom context manager instead. The problem lies in > the files themselves, not in csv's response to them. > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > With all due respect, while you make a good point that we don't want to start special casing every malformed type of CSV, there is absolutely something wrong with DictReader's response to files that have duplicate headers. It throws away data silently. If you (and others on this list) aren't in favor of trying to find the right header row (which I can understand: "In the face of ambiguity, refuse the temptation to guess."), maybe a better solution would be to raise a (suppressible) exception if the headers aren't uniquely named. ("Errors should never pass silently. Unless explicitly silenced.") Cheers, Cliff From amauryfa at gmail.com Wed Jan 23 18:08:32 2013 From: amauryfa at gmail.com (Amaury Forgeot d'Arc) Date: Wed, 23 Jan 2013 18:08:32 +0100 Subject: [Python-ideas] csv.DictReader could handle headers more intelligently. In-Reply-To: <1358959865.5194.8.camel@gdoba.domain.local> References: <1358903168.4767.4.camel@webb> <5d57225f-436a-49db-b882-68bbf80b9e71@t6g2000pba.googlegroups.com> <1358959865.5194.8.camel@gdoba.domain.local> Message-ID: Hi, 2013/1/23 J. Cliff Dyer > On Tue, 2013-01-22 at 17:51 -0800, alex23 wrote: > > I don't think we should start adding support for every malformed type > > of csv file that exists. It's easy enough to remove the unnecessary > > lines yourself before passing them to DictReader: > > > > from csv import DictReader > > > > with open('malformed.csv','rb') as csvfile: > > csvlines = list(l for l in csvfile if l.strip()) > > csvreader = DictReader(csvlines) > > > > Personally, if I was dealing with this as often as you are, I'd > > probably make a custom context manager instead. The problem lies in > > the files themselves, not in csv's response to them. 
> > _______________________________________________ > > Python-ideas mailing list > > Python-ideas at python.org > > http://mail.python.org/mailman/listinfo/python-ideas > > > > With all due respect, while you make a good point that we don't want to > start special casing every malformed type of CSV, there is absolutely > something wrong with DictReader's response to files that have duplicate > headers. It throws away data silently. > That's how Python dictionaries work, by design: d = {'a': 1, 'a': 2} "silently" discards the first value. If you (and others on this list) aren't in favor of trying to find the > right header row (which I can understand: "In the face of ambiguity, > refuse the temptation to guess."), maybe a better solution would be to > raise a (suppressible) exception if the headers aren't uniquely named. > ("Errors should never pass silently. Unless explicitly silenced.") > What about a subclass then: class CarefulDictReader(csv.DictReader): def __init__(self, *args, **kwargs): super().__init__(*args, **kwargs) fieldnames = self.fieldnames if len(fieldnames) != len(set(fieldnames)): raise ValueError("Duplicate field names", fieldnames) -- Amaury Forgeot d'Arc -------------- next part -------------- An HTML attachment was scrubbed... URL: From solipsis at pitrou.net Wed Jan 23 18:15:51 2013 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 23 Jan 2013 18:15:51 +0100 Subject: [Python-ideas] csv.DictReader could handle headers more intelligently. References: <1358903168.4767.4.camel@webb> <5d57225f-436a-49db-b882-68bbf80b9e71@t6g2000pba.googlegroups.com> <1358959865.5194.8.camel@gdoba.domain.local> Message-ID: <20130123181551.44a6e0cb@pitrou.net> Le Wed, 23 Jan 2013 18:08:32 +0100, "Amaury Forgeot d'Arc" a ?crit : > Hi, > > 2013/1/23 J. Cliff Dyer > > > On Tue, 2013-01-22 at 17:51 -0800, alex23 wrote: > > > I don't think we should start adding support for every malformed > > > type of csv file that exists. It's easy enough to remove the > > > unnecessary lines yourself before passing them to DictReader: > > > > > > from csv import DictReader > > > > > > with open('malformed.csv','rb') as csvfile: > > > csvlines = list(l for l in csvfile if l.strip()) > > > csvreader = DictReader(csvlines) > > > > > > Personally, if I was dealing with this as often as you are, I'd > > > probably make a custom context manager instead. The problem lies > > > in the files themselves, not in csv's response to them. > > > _______________________________________________ > > > Python-ideas mailing list > > > Python-ideas at python.org > > > http://mail.python.org/mailman/listinfo/python-ideas > > > > > > > With all due respect, while you make a good point that we don't > > want to start special casing every malformed type of CSV, there is > > absolutely something wrong with DictReader's response to files that > > have duplicate headers. It throws away data silently. > > > > That's how Python dictionaries work, by design: > d = {'a': 1, 'a': 2} > "silently" discards the first value. It's still rather surprising (and, in many cases, undesired). I would suggest adding a parameter to DictReader to raise an exception when there are duplicate column headers. Regards Antoine. From ubershmekel at gmail.com Wed Jan 23 18:26:07 2013 From: ubershmekel at gmail.com (Yuval Greenfield) Date: Wed, 23 Jan 2013 19:26:07 +0200 Subject: [Python-ideas] csv.DictReader could handle headers more intelligently. 
In-Reply-To: <20130123181551.44a6e0cb@pitrou.net> References: <1358903168.4767.4.camel@webb> <5d57225f-436a-49db-b882-68bbf80b9e71@t6g2000pba.googlegroups.com> <1358959865.5194.8.camel@gdoba.domain.local> <20130123181551.44a6e0cb@pitrou.net> Message-ID: On Wed, Jan 23, 2013 at 7:15 PM, Antoine Pitrou wrote: > It's still rather surprising (and, in many cases, undesired). I would > suggest adding a parameter to DictReader to raise an exception when > there are duplicate column headers. > > Regards > > Antoine. > > Completely agree, it's a big surprise and a quiet bug. This is one of those changes we should remember for python 4.0. Until 4.0, give an option to raise an exception upon duplicates. After 4.0 throw an exception on duplicate headers by default with an option to ignore them. Yuval -------------- next part -------------- An HTML attachment was scrubbed... URL: From jcd at sdf.lonestar.org Wed Jan 23 18:37:01 2013 From: jcd at sdf.lonestar.org (J. Cliff Dyer) Date: Wed, 23 Jan 2013 12:37:01 -0500 Subject: [Python-ideas] csv.DictReader could handle headers more intelligently. In-Reply-To: References: <1358903168.4767.4.camel@webb> <5d57225f-436a-49db-b882-68bbf80b9e71@t6g2000pba.googlegroups.com> <1358959865.5194.8.camel@gdoba.domain.local> Message-ID: <1358962621.5194.18.camel@gdoba.domain.local> On Wed, 2013-01-23 at 18:08 +0100, Amaury Forgeot d'Arc wrote: > Hi, > > 2013/1/23 J. Cliff Dyer > On Tue, 2013-01-22 at 17:51 -0800, alex23 wrote: > > I don't think we should start adding support for every > malformed type > > of csv file that exists. It's easy enough to remove the > unnecessary > > lines yourself before passing them to DictReader: > > > > from csv import DictReader > > > > with open('malformed.csv','rb') as csvfile: > > csvlines = list(l for l in csvfile if l.strip()) > > csvreader = DictReader(csvlines) > > > > Personally, if I was dealing with this as often as you are, > I'd > > probably make a custom context manager instead. The problem > lies in > > the files themselves, not in csv's response to them. > > _______________________________________________ > > Python-ideas mailing list > > Python-ideas at python.org > > http://mail.python.org/mailman/listinfo/python-ideas > > > > > With all due respect, while you make a good point that we > don't want to > start special casing every malformed type of CSV, there is > absolutely > something wrong with DictReader's response to files that have > duplicate > headers. It throws away data silently. > > > That's how Python dictionaries work, by design: > d = {'a': 1, 'a': 2} > "silently" discards the first value. > > > If you (and others on this list) aren't in favor of trying to > find the > right header row (which I can understand: "In the face of > ambiguity, > refuse the temptation to guess."), maybe a better solution > would be to > raise a (suppressible) exception if the headers aren't > uniquely named. > ("Errors should never pass silently. Unless explicitly > silenced.") > > > What about a subclass then: > > > class CarefulDictReader(csv.DictReader): > def __init__(self, *args, **kwargs): > super().__init__(*args, **kwargs) > fieldnames = self.fieldnames > if len(fieldnames) != len(set(fieldnames)): > raise ValueError("Duplicate field names", fieldnames) > > > > > -- > Amaury Forgeot d'Arc Whether it's a subclass or a change to the existing class is worth having a discussion about. Obviously, the change could be made in a subclass. Currently, that's what I do. 
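(For illustration, here is how such a subclass behaves -- Python 3, an in-memory file, and the CarefulDictReader sketched in Amaury's message above:)

import csv
import io

# Assumes CarefulDictReader is defined as in the message above.
f = io.StringIO("A,B,A\n1,2,3\n")
try:
    reader = CarefulDictReader(f)
except ValueError as exc:
    print(exc)  # ('Duplicate field names', ['A', 'B', 'A'])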
The question at issue is whether it should be made in the original. My position is that something should change in the standard library, whether that is modifying the code in some way to handle edge cases more robustly, or updating the documentation to advise programmers on how to handle files that aren't perfectly formed. This might include documenting that self.reader is an available attribute (where the programmer could iterate to find the header row they're looking for, if needed, and then assign it to self.fieldnames). I do like the idea of assigning the fieldnames variable and then raising the ValueError, so if the user silences the exception, they still have access to the field names found. However, I think the behavior should be overridden on the fieldnames property, so as not to change the semantics of the DictReader. From bruce at leapyear.org Wed Jan 23 19:20:21 2013 From: bruce at leapyear.org (Bruce Leban) Date: Wed, 23 Jan 2013 10:20:21 -0800 Subject: [Python-ideas] csv.DictReader could handle headers more intelligently. In-Reply-To: <20130123181551.44a6e0cb@pitrou.net> References: <1358903168.4767.4.camel@webb> <5d57225f-436a-49db-b882-68bbf80b9e71@t6g2000pba.googlegroups.com> <1358959865.5194.8.camel@gdoba.domain.local> <20130123181551.44a6e0cb@pitrou.net> Message-ID: On Wed, Jan 23, 2013 at 9:15 AM, Antoine Pitrou wrote: > > That's how Python dictionaries work, by design: > > d = {'a': 1, 'a': 2} > > "silently" discards the first value. > > It's still rather surprising (and, in many cases, undesired). I would > suggest adding a parameter to DictReader to raise an exception when > there are duplicate column headers. > If there are duplicate column headers, they are probably there for a reason. I can't imagine a case where the desired result is to discard one of the columns. If DictReader is going to recognize this case, perhaps: A,B,A 1,2,3 4,5,6 would be better as {'A': [1,3], 'B': 2} {'A': [4,6], 'B': 5} I realize that sometimes getting a single value and sometimes an array is potentially messy, but bear in mind that in most cases the reader of the csv file has some idea of what they are reading. There could be an optional parameter multivalue="A" that lists the columns that are allowed to have multiple values and if not present it raises an exception. To allow any column to be multivalued, you could use multivalue=True. As to skipping over a leading blank line, this happened to me just yesterday. I was saving some data in csv files and all the files ended up with an extra blank line at the top. I'd be +1 for skipping over a blank line at the top, +0 for skipping over more than one blank line. --- Bruce Only 5 hours left! http://www.kickstarter.com/projects/royleban/unique-puzzles-for-a-yankee-echo-alfa-romeo -------------- next part -------------- An HTML attachment was scrubbed... URL: From mark.hackett at metoffice.gov.uk Wed Jan 23 19:24:07 2013 From: mark.hackett at metoffice.gov.uk (Mark Hackett) Date: Wed, 23 Jan 2013 18:24:07 +0000 Subject: [Python-ideas] csv.DictReader could handle headers more intelligently. In-Reply-To: <1358962621.5194.18.camel@gdoba.domain.local> References: <1358903168.4767.4.camel@webb> <1358962621.5194.18.camel@gdoba.domain.local> Message-ID: <201301231824.07965.mark.hackett@metoffice.gov.uk> On Wednesday 23 Jan 2013, J. Cliff Dyer wrote: > > Whether it's a subclass or a change to the existing class is worth > having a discussion about. Obviously, the change could be made in a > subclass. Currently, that's what I do. 
The question at issue is > whether it should be made in the original. My position is that > something should change in the standard library, whether that is > modifying the code in some way to handle edge cases more robustly, or > updating the documentation to advise programmers on how to handle files > that aren't perfectly formed. > It looks entirely like format checking on something that doesn't necessarily have a format. It therefore belongs in something else. I.e. you define your "csv schema", pass it on to something that creates a "lint check" on the entire bytestream and/or checks each input as read, and pass it in like any decoration on a base function in Python. CSV format checking isn't, IMO, any different from the socket service decorators that embed policy on the base function. From mark.hackett at metoffice.gov.uk Wed Jan 23 19:32:20 2013 From: mark.hackett at metoffice.gov.uk (Mark Hackett) Date: Wed, 23 Jan 2013 18:32:20 +0000 Subject: [Python-ideas] csv.DictReader could handle headers more intelligently. In-Reply-To: References: <1358903168.4767.4.camel@webb> <20130123181551.44a6e0cb@pitrou.net> Message-ID: <201301231832.20087.mark.hackett@metoffice.gov.uk> On Wednesday 23 Jan 2013, Bruce Leban wrote: > If there are duplicate column headers, they are probably there for a > reason. I can't imagine a case where the desired result is to discard one > of the columns. If DictReader is going to recognize this case, perhaps: > I can't see why there would be duplicate column headers for valid reason. Someone may have written their CSV export incorrectly, but that's not actually valid. It would therefore be arguable for the program to give at least a WARNING that it's throwing data away. However, since python is mechanising this as a dictionary and since in python setting A to 1 then setting A to 3 would throw away the earlier value for A and the import function working AS EXPECTED in Python. Hence a decorator to insist on some formatting issues (e.g. turning A into a list of values 1,3 rather than throwing away the 1 or the 3). To do otherwise would have someone in the official library have to write their own format conversion and shove it in the middle and telling people what they should be doing. From malaclypse2 at gmail.com Wed Jan 23 20:59:42 2013 From: malaclypse2 at gmail.com (Jerry Hill) Date: Wed, 23 Jan 2013 14:59:42 -0500 Subject: [Python-ideas] csv.DictReader could handle headers more intelligently. In-Reply-To: <201301231832.20087.mark.hackett@metoffice.gov.uk> References: <1358903168.4767.4.camel@webb> <20130123181551.44a6e0cb@pitrou.net> <201301231832.20087.mark.hackett@metoffice.gov.uk> Message-ID: On Wed, Jan 23, 2013 at 1:32 PM, Mark Hackett wrote: > I can't see why there would be duplicate column headers for valid reason. > > Someone may have written their CSV export incorrectly, but that's not actually > valid. Sure it is. Since there is no formal spec for .csv files, having a multiple columns with the same text in the header is a perfectly valid .csv file. For what it's worth, the informal spec for csv files seems to be "whatever Excel does" and Excel (and every other spreadsheet-oriented program) is happy to let you have duplicated headers too. > It would therefore be arguable for the program to give at least a WARNING that > it's throwing data away. I think the library should give the programmer some sort of indication that they are losing data.
Personally, I'd prefer an exception which can either be caught or not, depending on whether the program is designed to handle the situation or not. > However, since python is mechanising this as a dictionary and since in python > setting A to 1 then setting A to 3 would throw away the earlier value for A > and the import function working AS EXPECTED in Python. I'm not sure this behavior merits the all-caps "AS EXPECTED" label. It's not terribly surprising once you sit down and think about it, but it's certainly at least a little unexpected to me that data is being thrown away with no notice. It's unusual for errors to pass silently in python. -- Jerry From cf.natali at gmail.com Wed Jan 23 21:03:46 2013 From: cf.natali at gmail.com (=?ISO-8859-1?Q?Charles=2DFran=E7ois_Natali?=) Date: Wed, 23 Jan 2013 21:03:46 +0100 Subject: [Python-ideas] reducing multiprocessing.Queue contention In-Reply-To: References: Message-ID: > In general, this sounds good. There's indeed no reason to perform the > serialization under a lock. > > It would be great to have some measurements to see just how much it takes, > though. I was curious, so I wrote a quick and dirty patch (it doesn't support timed get()/put(), so I won't post it here). I used the attached script as a benchmark: basically, it just spawns a bunch of processes that put()/get() to a queue some data repeatedly (10000 times a list of 1024 ints), and returns when everything has been sent and received. The following tests have been made on an 8-core box, from 1 reader/1 writer up to 4 readers/4 writers (it would be interesting to see with only 1 writer and multiple readers, but readers would keep waiting for input so it requires another benchmark): Without patch: """ $ ./python /tmp/multi_queue.py took 0.7993290424346924 seconds with 1 workers took 1.8892168998718262 seconds with 2 workers took 3.075777053833008 seconds with 3 workers took 4.050479888916016 seconds with 4 workers """ With patch: """ $ ./python /tmp/multi_queue.py took 0.7730131149291992 seconds with 1 workers took 0.7471320629119873 seconds with 2 workers took 0.752316951751709 seconds with 3 workers took 0.8303961753845215 seconds with 4 workers """ -------------- next part -------------- A non-text attachment was scrubbed... Name: multi_queue.py Type: application/octet-stream Size: 1138 bytes Desc: not available URL: From jcd at sdf.lonestar.org Wed Jan 23 22:13:54 2013 From: jcd at sdf.lonestar.org (J. Cliff Dyer) Date: Wed, 23 Jan 2013 16:13:54 -0500 Subject: [Python-ideas] csv.DictReader could handle headers more intelligently. In-Reply-To: References: <1358903168.4767.4.camel@webb> <20130123181551.44a6e0cb@pitrou.net> <201301231832.20087.mark.hackett@metoffice.gov.uk> Message-ID: <1358975634.4866.0.camel@gdoba.domain.local> On Wed, 2013-01-23 at 14:59 -0500, Jerry Hill wrote: > > However, since python is mechanising this as a dictionary and since > in python > > setting A to 1 then setting A to 3 would throw away the earlier > value for A > > and the import function working AS EXPECTED in Python. > > I'm not sure this behavior merits the all-caps "AS EXPECTED" label. > It's not terribly surprising once you sit down and think about it, but > it's certainly at least a little unexpected to me that data is being > thrown away with no notice. It's unusual for errors to pass silently > in python. Moreover, I think while it might be expected for a dict to do this, it does not follow that a DictReader should be expected to silently throw away the user's data.
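To make the complaint concrete (Python 3, toy data):

import csv
import io

row = next(csv.DictReader(io.StringIO("A,B,A\n1,2,3\n")))
print(row)  # e.g. {'A': '3', 'B': '2'} -- the first 'A' value is silently gone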
Just because it uses the dict format for storage does not mean that it's okay to throw away user's data silently. Dicts need to be blazingly fast for a host of reasons. DictReaders do not. They're usually dealing with file input, so any slowness in the DictReader itself is going to be dwarfed by the file access. As such we can afford to be more programmer-friendly here. Cheers, Cliff From ubershmekel at gmail.com Wed Jan 23 22:54:05 2013 From: ubershmekel at gmail.com (Yuval Greenfield) Date: Wed, 23 Jan 2013 23:54:05 +0200 Subject: [Python-ideas] csv.DictReader could handle headers more intelligently. In-Reply-To: <1358975634.4866.0.camel@gdoba.domain.local> References: <1358903168.4767.4.camel@webb> <20130123181551.44a6e0cb@pitrou.net> <201301231832.20087.mark.hackett@metoffice.gov.uk> <1358975634.4866.0.camel@gdoba.domain.local> Message-ID: On Wed, Jan 23, 2013 at 11:13 PM, J. Cliff Dyer wrote: > > Moreover, I think while it might be expected for a dict to do this, it > does not follow that a DictReader should be expected to silently throw > away the user's data. Just because it uses the dict format for storage > does not mean that it's okay to throw away user's data silently. Dicts > need to be blazingly fast for a host of reasons. DictReaders do not. > They're usually dealing with file input, so any slowness in the > DictReader itself is going to be dwarfed by the file access. As such we > can afford to be more programmer-friendly here. > If it were a NamedTupleReader, this wouldn't be an issue. >>> from collections import namedtuple >>> namedtuple('x', 'a b a c') Traceback (most recent call last): File "", line 1, in namedtuple('x', 'a b a c') File "C:\Python27\lib\collections.py", line 288, in namedtuple raise ValueError('Encountered duplicate field name: %r' % name) ValueError: Encountered duplicate field name: 'a' -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Thu Jan 24 01:19:34 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Thu, 24 Jan 2013 11:19:34 +1100 Subject: [Python-ideas] csv.DictReader could handle headers more intelligently. In-Reply-To: References: <1358903168.4767.4.camel@webb> <5d57225f-436a-49db-b882-68bbf80b9e71@t6g2000pba.googlegroups.com> <1358959865.5194.8.camel@gdoba.domain.local> <20130123181551.44a6e0cb@pitrou.net> Message-ID: <51007E16.30805@pearwood.info> On 24/01/13 05:20, Bruce Leban wrote: > I realize that sometimes getting a single value and sometimes an array is > potentially messy, but bear in mind that in most cases the reader of the > csv file has some idea of what they are reading. There could be an optional > parameter multivalue="A" that lists the columns that are allowed to have > multiple values and if not present it raises an exception. To allow any > column to be multivalued, you could use multivalue=True. -1 to adding optional parameters that change the behaviour of a class. To deal with cases where you expect multiple columns with the same name, add a new reader class that treats all columns to be multi-valued. The standard DictReader class should continue to behave like a dict. Don't over-engineer this MultiDictReader -- it should stay simple and treat all column names as potentially multivalued. If the caller has some requirements for which names can have how many columns -- "there should be exactly three columns named X, and only one Y, and at least four Z" -- they can check the result and decide for themselves if there is a problem. 
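A minimal sketch of such a reader (untested, Python 3; the name and the details are of course up for debate):

import csv

class MultiDictReader:
    # Like DictReader, but every value is a list, so duplicate
    # column names accumulate instead of overwriting each other.
    def __init__(self, f, **kwds):
        self.reader = csv.reader(f, **kwds)
        self.fieldnames = next(self.reader)

    def __iter__(self):
        return self

    def __next__(self):
        row = next(self.reader)
        d = {}
        for name, value in zip(self.fieldnames, row):
            d.setdefault(name, []).append(value)
        return d

With the "A,B,A" header from earlier in the thread, the first data row comes back as {'A': ['1', '3'], 'B': ['2']}, so nothing is lost.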
> As to skipping over a leading blank line, this happened to me just > yesterday. I was saving some data in csv files and all the files ended up > with an extra blank line at the top. I'd be +1 for skipping over a blank > line at the top, +0 for skipping over more than one blank line. I don't see any reason not to skip blank lines at the top of the file. -- Steven From steve at pearwood.info Thu Jan 24 01:26:52 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Thu, 24 Jan 2013 11:26:52 +1100 Subject: [Python-ideas] csv.DictReader could handle headers more intelligently. In-Reply-To: References: <1358903168.4767.4.camel@webb> <20130123181551.44a6e0cb@pitrou.net> <201301231832.20087.mark.hackett@metoffice.gov.uk> Message-ID: <51007FCC.5090400@pearwood.info> On 24/01/13 06:59, Jerry Hill wrote: > On Wed, Jan 23, 2013 at 1:32 PM, Mark Hackett > wrote: >> I can't see why there would be duplicate column headers for valid reason. >> >> Someone may have written their CSV export incorrectly, but that's not actually >> valid. > > Sure it is. Since there is no formal spec for .csv files, having a > multiple columns with the same text in the header is a perfectly valid > .csv file. For what it's worth, the informal spec for csv files seems > to be "whatever Excel does" and Excel (and every other > spreadsheet-oriented program) is happy to let you have duplicated > headers too. +1 I think keeping DictReader as it is now is fine for backward compatibility. Or better, simply have DictReader raise an exception rather than silently eat data. I don't expect that anyone is relying on that behaviour, nor is it behaviour promised by the class. But we should add a MultiDictReader that supports the multiple columns with the same name. >> It would therefore be arguable for the program to give at least a WARNING that >> it's throwing data away. > > I think the library should give the programmer some sort of indication > that they are losing data. Personally, I'd prefer an exception which > can either be caught or not, depending on whether the program is > designed to handle the situation or not. > >> However, since python is mechanising this as a dictionary and since in python >> setting A to 1 then setting A to 3 would throw away the earlier value for A >> and the import function working AS EXPECTED in Python. > > I'm not sure this behavior merits the all-caps "AS EXPECTED" label. > It's not terribly surprising once you sit down and think about it, but > it's certainly at least a little unexpected to me that data is being > thrown away with no notice. It's unusual for errors to pass silently > in python. Yes, we should not forget that a CSV file is not a dict. Just because DictReader is implemented with a dict as the storage, doesn't mean that it should behave exactly like a dict in all things. Multiple columns with the same name are legal in CSV, so there should be a reader for that situation. -- Steven From dustin at v.igoro.us Thu Jan 24 02:57:17 2013 From: dustin at v.igoro.us (Dustin J. Mitchell) Date: Wed, 23 Jan 2013 20:57:17 -0500 Subject: [Python-ideas] PEP 3156 EventLoop: hide details of iterations and idleness? In-Reply-To: References: Message-ID: On Tue, Jan 22, 2013 at 2:19 PM, Guido van Rossum wrote: > Ok, I'll kill call_every_iteration(). I'll wait for more discussion on > run_once() and run()'s until-idle behavior. One of the things that's been difficult for some time in Twisted is writing clients in such a way that they reliably finish. 
It's easy for a simple client, but when the client involves several levels of libraries doing mysterious, asynchronous things, it can be hard to know when everything's really done. Add error conditions in, and you end up spending a lot of time thinking about something that, in a synchronous program, is pretty simple. One option, recently introduced to Twisted, is "react" - http://twistedmatrix.com/documents/12.3.0/api/twisted.internet.task.html#react The idea is to encapsulate the lifetime of a client in a single asynchronous operation; the synchronous parallel is libc calling `exit` for you when `main` returns. If all of your library code cooperates and reliably indicates when it's done with any background operations, then this is a good choice. In cases where your libraries are less than perfect (perhaps they sync to the cloud "in the background"), the run-until-idle behavior is useful. The client calls a function that triggers a cascade of events. When that cascade has exhausted itself, the process exits. Synchronous, threaded programs do this with non-daemon threads. I think that this option should be supported, if only for the parallelism with synchronous code. As for run-until-idle - I've used this sort of behavior occasionally in tests, where I want to carefully control the sequence of operations. For example, I may want to reliably test handling of race conditions: op = start_operation() while not in_critical_section(): run_once() generate_conflict() while in_critical_section(): run_once() assert something() Such a case would rely heavily on the details of the event loop. Depending on how closely I want to tie my tests to that implementation, that may or may not be OK. If a particular event loop implementation doesn't even *have* this model (as, it appears, Tornado does not), then I think it would be fine to simply not implement this operation. So perhaps run_once() should be described as optional in the PEP? Dustin From greg.ewing at canterbury.ac.nz Thu Jan 24 03:14:31 2013 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 24 Jan 2013 15:14:31 +1300 Subject: [Python-ideas] PEP 3156 - Coroutines are more better In-Reply-To: References: Message-ID: <51009907.8030404@canterbury.ac.nz> On 24/01/13 14:57, Dustin J. Mitchell wrote: > One of the things that's been difficult for some time in Twisted is > writing clients in such a way that they reliably finish. I think I'm going to wait and see what the coroutine-level features of tulip turn out to be like before saying much more. It seems to me that many of the problems we're arguing about here simply don't exist in coroutine-land. For example, if you can write something like yield from create_http(yield from create_tcp(host, port)) and creation of the transport fails and raises an exception, then create_http never gets called, so you won't waste any effort creating an unused protocol object. Likewise, if the main loop of your protocol consists of a Task that reads asynchronously from the transport, then (as long as you haven't done anything blatantly stupid) you know it will eventually return when the connection gets closed. If I were designing all this, I think I would have made coroutines the default way of dealing with everything above the event loop layer, and provide callback wrappers for those that like to do things that way. Building an entire callback-based protocol stack seems like going about it the hard way. 
-- Greg From guido at python.org Thu Jan 24 03:29:38 2013 From: guido at python.org (Guido van Rossum) Date: Wed, 23 Jan 2013 18:29:38 -0800 Subject: [Python-ideas] PEP 3156 EventLoop: hide details of iterations and idleness? In-Reply-To: References: Message-ID: On Wed, Jan 23, 2013 at 5:57 PM, Dustin J. Mitchell wrote: > On Tue, Jan 22, 2013 at 2:19 PM, Guido van Rossum wrote: >> Ok, I'll kill call_every_iteration(). I'll wait for more discussion on >> run_once() and run()'s until-idle behavior. > > One of the things that's been difficult for some time in Twisted is > writing clients in such a way that they reliably finish. It's easy > for a simple client, but when the client involves several levels of > libraries doing mysterious, asynchronous things, it can be hard to > know when everything's really done. Add error conditions in, and you > end up spending a lot of time thinking about something that, in a > synchronous program, is pretty simple. > > One option, recently introduced to Twisted, is "react" - > http://twistedmatrix.com/documents/12.3.0/api/twisted.internet.task.html#react > The idea is to encapsulate the lifetime of a client in a single > asynchronous operation; the synchronous parallel is libc calling > `exit` for you when `main` returns. If all of your library code > cooperates and reliably indicates when it's done with any background > operations, then this is a good choice. > > In cases where your libraries are less than perfect (perhaps they sync > to the cloud "in the background"), the run-until-idle behavior is > useful. The client calls a function that triggers a cascade of > events. When that cascade has exhausted itself, the process exits. > Synchronous, threaded programs do this with non-daemon threads. > > I think that this option should be supported, if only for the > parallelism with synchronous code. > > > As for run-until-idle - I've used this sort of behavior occasionally > in tests, where I want to carefully control the sequence of > operations. For example, I may want to reliably test handling of race > conditions: > > op = start_operation() > while not in_critical_section(): > run_once() > generate_conflict() > while in_critical_section(): > run_once() > assert something() > > Such a case would rely heavily on the details of the event loop. > Depending on how closely I want to tie my tests to that > implementation, that may or may not be OK. If a particular event loop > implementation doesn't even *have* this model (as, it appears, Tornado > does not), then I think it would be fine to simply not implement this > operation. So perhaps run_once() should be described as optional in > the PEP? Despite some earlier moves in that direction, I am not actually a fan of having optional parts in a spec. That way it's too easy for an app to claim compliance without actually running anywhere except on its "home" framework. I think that run_until_idle() can be safely replaced by run_until_complete(some_future). For run_once(), I expect that I will be able to concoct alternatives just fine as well. And, to Greg (who somehow replied in a separate thread), I am certainly not planning to write the entire stack with only callbacks! Much of the code will have Futures on the outside and coroutines on the inside.
-- --Guido van Rossum (python.org/~guido) From fafhrd91 at gmail.com Thu Jan 24 04:50:45 2013 From: fafhrd91 at gmail.com (Nikolay Kim) Date: Wed, 23 Jan 2013 19:50:45 -0800 Subject: [Python-ideas] PEP 3156 - gunicorn worker Message-ID: <36B67E59-ED7C-46E8-84DD-08E13C8CB5E0@gmail.com> Hello, To get feeling of tulip I wrote gunicorn worker and websocket server, it is possible to run wsgi app on top of it. maybe someone will be interested. gunicorn worker - https://github.com/fafhrd91/gtulip websocket server - https://github.com/fafhrd91/pyramid_sockjs2 From amauryfa at gmail.com Thu Jan 24 09:37:55 2013 From: amauryfa at gmail.com (Amaury Forgeot d'Arc) Date: Thu, 24 Jan 2013 09:37:55 +0100 Subject: [Python-ideas] csv.DictReader could handle headers more intelligently. In-Reply-To: <51007E16.30805@pearwood.info> References: <1358903168.4767.4.camel@webb> <5d57225f-436a-49db-b882-68bbf80b9e71@t6g2000pba.googlegroups.com> <1358959865.5194.8.camel@gdoba.domain.local> <20130123181551.44a6e0cb@pitrou.net> <51007E16.30805@pearwood.info> Message-ID: 2013/1/24 Steven D'Aprano > -1 to adding optional parameters that change the behaviour of a class. Unfortunately there is a precedent with csv.DictWriter: extrasaction='raise' or 'ignore'. And the feature is close to the one proposed here: how to deal with "invalid" data. -- Amaury Forgeot d'Arc -------------- next part -------------- An HTML attachment was scrubbed... URL: From mark.hackett at metoffice.gov.uk Thu Jan 24 11:33:01 2013 From: mark.hackett at metoffice.gov.uk (Mark Hackett) Date: Thu, 24 Jan 2013 10:33:01 +0000 Subject: [Python-ideas] csv.DictReader could handle headers more intelligently. In-Reply-To: References: <1358903168.4767.4.camel@webb> <51007E16.30805@pearwood.info> Message-ID: <201301241033.01555.mark.hackett@metoffice.gov.uk> On Thursday 24 Jan 2013, Amaury Forgeot d'Arc wrote: > 2013/1/24 Steven D'Aprano > > > -1 to adding optional parameters that change the behaviour of a class. > > Unfortunately there is a precedent with csv.DictWriter: > extrasaction='raise' or 'ignore'. > And the feature is close to the one proposed here: how to deal with > "invalid" data. > Just because you did wrong before doesn't mean you need to do it wrong again! From mark.hackett at metoffice.gov.uk Thu Jan 24 11:37:57 2013 From: mark.hackett at metoffice.gov.uk (Mark Hackett) Date: Thu, 24 Jan 2013 10:37:57 +0000 Subject: [Python-ideas] csv.DictReader could handle headers more intelligently. In-Reply-To: References: <1358903168.4767.4.camel@webb> <201301231832.20087.mark.hackett@metoffice.gov.uk> Message-ID: <201301241037.57101.mark.hackett@metoffice.gov.uk> On Wednesday 23 Jan 2013, Jerry Hill wrote: > On Wed, Jan 23, 2013 at 1:32 PM, Mark Hackett > > wrote: > > I can't see why there would be duplicate column headers for valid reason. > > > > Someone may have written their CSV export incorrectly, but that's not > > actually valid. > > Sure it is. Since there is no formal spec for .csv files, having a > multiple columns with the same text in the header is a perfectly valid > .csv file. For what it's worth, the informal spec for csv files seems Then you don't want it put in a dictionary, since a dictionary doesn't allow duplicate fields. > to be "whatever Excel does" and Excel (and every other > spreadsheet-oriented program) is happy to let you have duplicated > headers too. You don't, in Excel, use the name of the column in your calculation, you use the unique column ID (A, B, C..AA, AB, ...). 
> > > It would therefore be arguable for the program to give at least a WARNING > > that it's throwing data away. > > I think the library should give the programmer some sort of indication > that they are losing data. Personally, I'd prefer an exception which > can either be caught or not, depending on whether the program is > designed to handle the situation or not. > > > However, since python is mechanising this as a dictionary and since in > > python setting A to 1 then setting A to 3 would throw away the earlier > > value for A and the import function working AS EXPECTED in Python. > > I'm not sure this behavior merits the all-caps "AS EXPECTED" label. > It's not terribly surprising once you sit down and think about it, but > it's certainly at least a little unexpected to me that data is being > thrown away with no notice. It's unusual for errors to pass silently > in python. > Python doesn't warn about duplicate addition to keys, so as expected, it isn't warning about them now. Programming languages are hard enough to understand (why does everyone use a different way of stopping a loop???), so it's not a good idea to have little codas to the way things are done "oh, unless you're putting it into a dictionary via this call...". I can understand the library call doing so, mind, but I can also see the writer of the library going "You're putting it into a dictionary. Well, you know what happens when you put duplicate entries in them, right, else you wouldn't be using this routine that puts csv entries into a dictionary". From mark.hackett at metoffice.gov.uk Thu Jan 24 11:41:41 2013 From: mark.hackett at metoffice.gov.uk (Mark Hackett) Date: Thu, 24 Jan 2013 10:41:41 +0000 Subject: [Python-ideas] csv.DictReader could handle headers more intelligently. In-Reply-To: <1358975634.4866.0.camel@gdoba.domain.local> References: <1358903168.4767.4.camel@webb> <1358975634.4866.0.camel@gdoba.domain.local> Message-ID: <201301241041.41301.mark.hackett@metoffice.gov.uk> On Wednesday 23 Jan 2013, J. Cliff Dyer wrote: > > Moreover, I think while it might be expected for a dict to do this, it > does not follow that a DictReader should be expected to silently throw > away the user's data. > Cheers, > Cliff > > Cliff, the name of the routine is "DictReader". It is a very big hint. Like I said, the situation here is putting formatting expectations on the file being read in. It's pretty identical with sockets or threading libraries in python. If you want a specific action done that isn't "normal" for just "make one of them", you put policy on it as a decoration. But if you wanted some specific action and don't use the decorator to do so, you don't get an error, you get what you get without the decorator. > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > From mark.hackett at metoffice.gov.uk Thu Jan 24 11:47:17 2013 From: mark.hackett at metoffice.gov.uk (Mark Hackett) Date: Thu, 24 Jan 2013 10:47:17 +0000 Subject: [Python-ideas] csv.DictReader could handle headers more intelligently. In-Reply-To: <51007FCC.5090400@pearwood.info> References: <1358903168.4767.4.camel@webb> <51007FCC.5090400@pearwood.info> Message-ID: <201301241047.17391.mark.hackett@metoffice.gov.uk> On Thursday 24 Jan 2013, Steven D'Aprano wrote: > > I'm not sure this behavior merits the all-caps "AS EXPECTED" label. 
> > It's not terribly surprising once you sit down and think about it, but > > it's certainly at least a little unexpected to me that data is being > > thrown away with no notice. It's unusual for errors to pass silently > > in python. > > Yes, we should not forget that a CSV file is not a dict. Just because > DictReader is implemented with a dict as the storage, doesn't mean that it > should behave exactly like a dict in all things. Multiple columns with the > same name are legal in CSV, so there should be a reader for that > situation. > But just because it's reading a csv file, we shouldn't change how a dictionary works if you add the same key again. Duplicate headings in a csv file are as legal as using the same name for something else in a programming language. e.g. endvalue=a+b+c/5 ...code using that result... endvalue = os.printerr(file_descriptor) ...print out an error string... this is "legal" but really REALLY smelly. Similarly a multivalued csv file. Excel uses the column ID not the name on the first row, to identify the columns in its macro language. Because otherwise which "endvalue" column did you mean? From shane at umbrellacode.com Thu Jan 24 12:55:05 2013 From: shane at umbrellacode.com (Shane Green) Date: Thu, 24 Jan 2013 03:55:05 -0800 Subject: [Python-ideas] csv.DictReader could handle headers more intelligently. In-Reply-To: <201301241047.17391.mark.hackett@metoffice.gov.uk> References: <1358903168.4767.4.camel@webb> <51007FCC.5090400@pearwood.info> <201301241047.17391.mark.hackett@metoffice.gov.uk> Message-ID: Not sure if I'm reading the discussion correctly, but it sounds like there's discussion about whether swallowing CSV values when confronted with multiple columns with the same name is acceptable, which seems very incorrect if so. CSV doesn't even mandate column headers exist at all, as far as I know. If anything I would think mapping column positions to header values would make sense, such that header.items() -> [(0, header1), (1, header2), (2, header3), etc.], and header1 and header2 could be equal. To work with rows as dictionaries they can follow the FieldStorage model and have lists of values -- either when there's a collision, or always -- so all column values are contained. Shane Green www.umbrellacode.com 805-452-9666 | shane at umbrellacode.com On Jan 24, 2013, at 2:47 AM, Mark Hackett wrote: > On Thursday 24 Jan 2013, Steven D'Aprano wrote: > >>> I'm not sure this behavior merits the all-caps "AS EXPECTED" label. >>> It's not terribly surprising once you sit down and think about it, but >>> it's certainly at least a little unexpected to me that data is being >>> thrown away with no notice. It's unusual for errors to pass silently >>> in python. >> >> Yes, we should not forget that a CSV file is not a dict. Just because >> DictReader is implemented with a dict as the storage, doesn't mean that it >> should behave exactly like a dict in all things. Multiple columns with the >> same name are legal in CSV, so there should be a reader for that >> situation. >> > > But just because it's reading a csv file, we shouldn't change how a dictionary > works if you add the same key again. > > Duplicate headings in a csv file are as legal as using the same name for > something else in a programming language. > > e.g. > > endvalue=a+b+c/5 > ...code using that result... > endvalue = os.printerr(file_descriptor) > ...print out an error string... > > this is "legal" but really REALLY smelly. > > Similarly a multivalued csv file.
> > Excel uses the column ID not the name on the first row, to identify the columns > in its macro language. Because otherwise which "endvalue" column did you mean? > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Thu Jan 24 13:33:07 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 24 Jan 2013 22:33:07 +1000 Subject: [Python-ideas] csv.DictReader could handle headers more intelligently. In-Reply-To: References: <1358903168.4767.4.camel@webb> <51007FCC.5090400@pearwood.info> <201301241047.17391.mark.hackett@metoffice.gov.uk> Message-ID: On Thu, Jan 24, 2013 at 9:55 PM, Shane Green wrote: > Not sure if I'm reading the discussion correctly, but it sounds like there's > discussion about whether swallowing CSV values when confronted with multiple > columns with the same name is acceptable, which seems very incorrect if so. CSV doesn't > even mandate column headers exist at all, as far as I know. If anything I > would think mapping column positions to header values would make sense, such > that header.items() -> [(0, header1), (1, header2), (2, header3), etc.], and > header1 and header2 could be equal. To work with rows as dictionaries they > can follow the FieldStorage model and have lists of values -- either when > there's a collision, or always -- so all column values are contained. That's not quite the discussion. The discussion is specifically about *DictReader*, and whether it should: 1. Do any data conditioning by ignoring empty lines and lines of just field delimiters before the header row (consensus seems to be "no") 2. Give an error when encountering a duplicate field name (which will lead to data loss when reading from the file) (consensus seems to be "yes") The problem with the latter suggestion is that it's a backwards incompatible change - code where "use the last column with that name" is the correct behaviour currently works, but would be broken if that situation was declared an error. Rather than messing with DictReader, it seems more fruitful to further investigate the idea of a namedtuple based reader (http://bugs.python.org/issue1818). The "multiple columns with the same name" use case seems specialised enough that the standard readers can continue to ignore it (although, as noted earlier in this thread, a namedtuple based reader will correctly reject duplicate column names) Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From solipsis at pitrou.net Thu Jan 24 13:38:58 2013 From: solipsis at pitrou.net (Antoine Pitrou) Date: Thu, 24 Jan 2013 13:38:58 +0100 Subject: [Python-ideas] csv.DictReader could handle headers more intelligently. References: <1358903168.4767.4.camel@webb> <51007FCC.5090400@pearwood.info> <201301241047.17391.mark.hackett@metoffice.gov.uk> Message-ID: <20130124133858.32622f6e@pitrou.net> Le Thu, 24 Jan 2013 22:33:07 +1000, Nick Coghlan a écrit : > On Thu, Jan 24, 2013 at 9:55 PM, Shane Green > wrote: > > Not sure if I'm reading the discussion correctly, but it sounds > > like there's discussion about whether swallowing CSV values when > > confronted with multiple columns with the same name is acceptable, which seems very > > incorrect if so. CSV doesn't even mandate column headers exist at > > all, as far as I know.
If anything I would think mapping column > > positions to header values would make sense, such that > > header.items() -> [(0, header1), (1, header2), (2, header3), etc.], > > and header1 and header2 could be equal. To work with rows as > > dictionaries they can follow the FieldStorage model and have lists > > of values -- either when there's a collision, or always -- so all column > > values are contained. > > That's not quite the discussion. The discussion is specifically about > *DictReader*, and whether it should: > > 1. Do any data conditioning by ignoring empty lines and lines of just > field delimiters before the header row (consensus seems to be "no") > 2. Give an error when encountering a duplicate field name (which will > lead to data loss when reading from the file) (consensus seems to be > "yes") > > The problem with the latter suggestion is that it's a backwards > incompatible change - code where "use the last column with that name" > is the correct behaviour currently works, but would be broken if that > situation was declared an error. It's not really a problem if the new behaviour is conditioned by a constructor argument. Regards Antoine. From eliben at gmail.com Thu Jan 24 14:25:08 2013 From: eliben at gmail.com (Eli Bendersky) Date: Thu, 24 Jan 2013 05:25:08 -0800 Subject: [Python-ideas] reducing multiprocessing.Queue contention In-Reply-To: References: Message-ID: On Wed, Jan 23, 2013 at 12:03 PM, Charles-François Natali < cf.natali at gmail.com> wrote: > > In general, this sounds good. There's indeed no reason to perform the > > serialization under a lock. > > > > It would be great to have some measurements to see just how much it > takes, > > though. > > I was curious, so I wrote a quick and dirty patch (it doesn't > support timed get()/put(), so I won't post it here). > > I used the attached script as a benchmark: basically, it just spawns a > bunch of processes that put()/get() to a queue some data repeatedly > (10000 times a list of 1024 ints), and returns when everything has > been sent and received. > > The following tests have been made on an 8-core box, from 1 reader/1 > writer up to 4 readers/4 writers (it would be interesting to see with > only 1 writer and multiple readers, but readers would keep waiting for > input so it requires another benchmark): > > Without patch: > """ > $ ./python /tmp/multi_queue.py > took 0.7993290424346924 seconds with 1 workers > took 1.8892168998718262 seconds with 2 workers > took 3.075777053833008 seconds with 3 workers > took 4.050479888916016 seconds with 4 workers > """ > > With patch: > """ > $ ./python /tmp/multi_queue.py > took 0.7730131149291992 seconds with 1 workers > took 0.7471320629119873 seconds with 2 workers > took 0.752316951751709 seconds with 3 workers > took 0.8303961753845215 seconds with 4 workers > """ > Looks great, what are you waiting for :-) Eli -------------- next part -------------- An HTML attachment was scrubbed... URL: From benoitc at gunicorn.org Thu Jan 24 14:50:21 2013 From: benoitc at gunicorn.org (Benoit Chesneau) Date: Thu, 24 Jan 2013 14:50:21 +0100 Subject: [Python-ideas] PEP 3156 - gunicorn worker In-Reply-To: <36B67E59-ED7C-46E8-84DD-08E13C8CB5E0@gmail.com> References: <36B67E59-ED7C-46E8-84DD-08E13C8CB5E0@gmail.com> Message-ID: On Jan 24, 2013, at 4:50 AM, Nikolay Kim wrote: > Hello, > > To get feeling of tulip I wrote gunicorn worker and websocket server, it is possible to run > wsgi app on top of it. maybe someone will be interested.
> > gunicorn worker - https://github.com/fafhrd91/gtulip > websocket server - https://github.com/fafhrd91/pyramid_sockjs2 > > Just tested also against a pure wsgi app: $ gunicorn -w 3 -k gtulip.worker.TulipWorker test:app 2013-01-24 14:45:24 [55771] [INFO] Starting gunicorn 0.17.2 2013-01-24 14:45:24 [55771] [INFO] Listening at: http://127.0.0.1:8000 (55771) 2013-01-24 14:45:24 [55771] [INFO] Using worker: gtulip.worker.TulipWorker 2013-01-24 14:45:24 [55774] [INFO] Booting worker with pid: 55774 2013-01-24 14:45:24 [55775] [INFO] Booting worker with pid: 55775 2013-01-24 14:45:24 [55776] [INFO] Booting worker with pid: 55776 and it works great. Will do more test asap :) - benoît From jcd at sdf.lonestar.org Thu Jan 24 16:11:34 2013 From: jcd at sdf.lonestar.org (J. Cliff Dyer) Date: Thu, 24 Jan 2013 10:11:34 -0500 Subject: [Python-ideas] csv.DictReader could handle headers more intelligently. In-Reply-To: <20130124133858.32622f6e@pitrou.net> References: <1358903168.4767.4.camel@webb> <51007FCC.5090400@pearwood.info> <201301241047.17391.mark.hackett@metoffice.gov.uk> <20130124133858.32622f6e@pitrou.net> Message-ID: <1359040294.4802.29.camel@gdoba.domain.local> On Thu, 2013-01-24 at 13:38 +0100, Antoine Pitrou wrote: > > 1. Do any data conditioning by ignoring empty lines and lines of > > just field delimiters before the header row (consensus seems to be > > "no") Well, I wouldn't necessarily say we have a consensus on this one. This idea received a +1 from Bruce Leban and an "I don't see any reason not to" from Steven D'Aprano. Objections are: 1. It's a backwards-incompatible change. (This could be mitigated in a couple ways, as with the duplicate header problem, below). I don't think anyone has argued that programmers would ever actually want to use the blank line as the headers, only that they may be doing it now as a workaround, and breaking the workarounds is undesirable. 2. You should pre-process the CSV instead of adapting the reader to malformations. (In which case, I think the DictReader.reader attribute should be better documented, so programmers have some guidance how to do the pre-processing, as the current DictReader can cause data loss which would make it difficult to recover the real headers without using the underlying reader). > > 2. Give an error when encountering a duplicate field name (which > > will lead to data loss when reading from the file) (consensus seems > > to be "yes") Mostly, but with a strong objection from Mark Hackett, and hesitation about altering current behavior from Amaury Forgeot d'Arc. Proposals to solve this problem: 1. Raise an exception (After setting the fieldnames, I think, so if you wanted to catch and continue or catch and edit the conflicting fieldnames, you could do so). 2. Combine multiple fields with the same header into a list under the same key. 2a. Make lists when there are multiple fields, but otherwise, key to strings as is currently done 2b. For consistency, make all values lists, regardless of the number of columns. Proposals for implementation: 1. Create a new Reader class. Suggestions include "CarefulDictReader" (for the version that raises an exception) and "MultiDictReader" (for the versions that make lists of values). 2. Add an option to DictReader. The idea to add an option for a MultiDictReader-like behavior was objected to, but there were multiple suggestions to add an option for raising an exception, in one case with the idea that in the future ("Python 4") the option would be standard behavior.
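(To make option 2 concrete, a sketch -- the strict_fields name is invented here, and it is written as a subclass rather than as a patch to csv.py:)

import csv

class StrictFieldsDictReader(csv.DictReader):
    # Sketch of an opt-in duplicate-header check for DictReader.
    def __init__(self, f, *args, strict_fields=False, **kwds):
        super().__init__(f, *args, **kwds)
        if strict_fields:
            names = self.fieldnames  # triggers reading the header row
            if names and len(names) != len(set(names)):
                raise ValueError("Duplicate field names", names)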
Note: If we were to implement a CarefulDictReader, it could, without backward incompatibility, implement both skipping of blank header lines, and exception raising on duplicate headers. Cheers, Cliff From jcd at sdf.lonestar.org Thu Jan 24 16:23:24 2013 From: jcd at sdf.lonestar.org (J. Cliff Dyer) Date: Thu, 24 Jan 2013 10:23:24 -0500 Subject: [Python-ideas] csv.DictReader could handle headers more intelligently. In-Reply-To: References: <1358903168.4767.4.camel@webb> <51007FCC.5090400@pearwood.info> <201301241047.17391.mark.hackett@metoffice.gov.uk> Message-ID: <1359041004.4802.32.camel@gdoba.domain.local> On Thu, 2013-01-24 at 22:33 +1000, Nick Coghlan wrote: > The problem with the latter suggestion is that it's a backwards > incompatible change - code where "use the last column with that name" > is the correct behaviour currently works, but would be broken if that > situation was declared an error. One example where a programmer would legitimately want to ignore errors of this kind: A CSV file has a number of named columns, and a few unnamed ones, and the programmer doesn't care about data from the unnamed columns. The unnamed columns all have the same name (''), and would raise this exception. Hence the need to be able to suppress it somehow (e.g., by instantiation argument or by catching the exception) without losing the fieldnames. Cheers, Cliff From rosuav at gmail.com Thu Jan 24 16:24:23 2013 From: rosuav at gmail.com (Chris Angelico) Date: Fri, 25 Jan 2013 02:24:23 +1100 Subject: [Python-ideas] csv.DictReader could handle headers more intelligently. In-Reply-To: <1359040294.4802.29.camel@gdoba.domain.local> References: <1358903168.4767.4.camel@webb> <51007FCC.5090400@pearwood.info> <201301241047.17391.mark.hackett@metoffice.gov.uk> <20130124133858.32622f6e@pitrou.net> <1359040294.4802.29.camel@gdoba.domain.local> Message-ID: On Fri, Jan 25, 2013 at 2:11 AM, J. Cliff Dyer wrote: > On Thu, 2013-01-24 at 13:38 +0100, Antoine Pitrou wrote: >> > 1. Do any data conditioning by ignoring empty lines and lines of >> > just field delimiters before the header row (consensus seems to be >> > "no") > > Well, I wouldn't necessarily say we have a consensus on this one. This > idea received a +1 from Bruce Leban and an "I don't see any reason not > to" from Steven D'Aprano. I've been lurking this thread, but fwiw, I'd put +1 on ignoring empty lines/just delimiter lines. For a row of column headers, a completely blank line makes no sense. It's a backward-incompatible change, yes, but I can't imagine any code actively relying on this. ISTM this would probably be safe for a minor release (Python 3.4), though of course not for Python 3.3.1. ChrisA From shane at umbrellacode.com Thu Jan 24 16:28:49 2013 From: shane at umbrellacode.com (Shane Green) Date: Thu, 24 Jan 2013 07:28:49 -0800 Subject: [Python-ideas] csv.DictReader could handle headers more intelligently. In-Reply-To: <1359040294.4802.29.camel@gdoba.domain.local> References: <1358903168.4767.4.camel@webb> <51007FCC.5090400@pearwood.info> <201301241047.17391.mark.hackett@metoffice.gov.uk> <20130124133858.32622f6e@pitrou.net> <1359040294.4802.29.camel@gdoba.domain.local> Message-ID: <2E02A587-DC04-4385-8A36-C41160E33E98@umbrellacode.com> Since every form of CSV file counts EOL as a line terminator, I think discarding empty lines preceding the headers is arguably acceptable, but do not think discarding lines of just delimiters would be. 
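For what it's worth, the explicit version of the first behaviour is tiny (a sketch, Python 3; the helper name is made up):

import csv
import itertools

def dict_reader_skipping_blanks(f, **kwds):
    # Drop lines that are only an EOL before the header row is read;
    # delimiter-only lines such as ",,," are left alone.
    lines = itertools.dropwhile(lambda line: not line.strip(), f)
    return csv.DictReader(lines, **kwds)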
What about extending the DictReader API so it was easy to perform these actions explicitly, such as being able to discard() the field names to be re-evaluated on the next line? Shane Green www.umbrellacode.com 805-452-9666 | shane at umbrellacode.com On Jan 24, 2013, at 7:11 AM, "J. Cliff Dyer" wrote: > On Thu, 2013-01-24 at 13:38 +0100, Antoine Pitrou wrote: >>> 1. Do any data conditioning by ignoring empty lines and lines of >>> just field delimiters before the header row (consensus seems to be >>> "no") > > Well, I wouldn't necessarily say we have a consensus on this one. This > idea received a +1 from Bruce Leban and an "I don't see any reason not > to" from Steven D'Aprano. > > Objections are: > > 1. It's a backwards-incompatible change. (This could be mitigated in a > couple ways, as with the duplicate header problem, below). I don't think > anyone has argued that programmers would ever actually want to use the > blank line as the headers, only that they may be doing it now as a > workaround, and breaking the workarounds is undesirable. > > 2. You should pre-process the CSV instead of adapting the reader to > malformations. (In which case, I think the DictReader.reader attribute > should be better documented, so programmers have some guidance how to do > the pre-processing, as the current DictReader can cause data loss which > would make it difficult to recover the real headers without using the > underlying reader). > > >>> 2. Give an error when encountering a duplicate field name (which >>> will lead to data loss when reading from the file) (consensus seems >>> to be "yes") > > Mostly, but with a strong objection from Mark Hackett, and hesitation > about altering current behavior from Amaury Forgeot d'Arc. > > Proposals to solve this problem: > > 1. Raise an exception (After setting the fieldnames, I think, so if you > wanted to catch and continue or catch and edit the conflicting > fieldnames, you could do so). > > 2. Combine multiple fields with the same header into a list under the > same key. > > 2a. Make lists when there are multiple fields, but otherwise, key to > strings as is currently done > > 2b. For consistency, make all values lists, regardless of the number of > columns. > > Proposals for implementation: > > 1. Create a new Reader class. Suggestions include > "CarefulDictReader" (for the version that raises an exception) and > "MultiDictReader" (for the versions that make lists of values). > > 2. Add an option to DictReader. The idea to add an option for a > MultiDictReader-like behavior was objected to, but there were multiple > suggestions to add an option for raising an exception, in one case with > the idea that in the future ("Python 4") the option would be standard > behavior. > > > Note: If we were to implement a CarefulDictReader, it could, without > backward incompatibility, implement both skipping of blank header lines, > and exception raising on duplicate headers. > > Cheers, > Cliff > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas -------------- next part -------------- An HTML attachment was scrubbed... URL: From mark.hackett at metoffice.gov.uk Thu Jan 24 16:29:19 2013 From: mark.hackett at metoffice.gov.uk (Mark Hackett) Date: Thu, 24 Jan 2013 15:29:19 +0000 Subject: [Python-ideas] csv.DictReader could handle headers more intelligently. 
In-Reply-To: <1359040294.4802.29.camel@gdoba.domain.local>
References: <1358903168.4767.4.camel@webb> <20130124133858.32622f6e@pitrou.net> <1359040294.4802.29.camel@gdoba.domain.local>
Message-ID: <201301241529.19304.mark.hackett@metoffice.gov.uk>

On Thursday 24 Jan 2013, J. Cliff Dyer wrote:
> > > 2. Give an error when encountering a duplicate field name (which
> > > will lead to data loss when reading from the file) (consensus seems
> > > to be "yes")
>
> Mostly, but with a strong objection from Mark Hackett, and hesitation
> about altering current behavior from Amaury Forgeot d'Arc.
>

More along the lines of your earlier:

> 1. It's a backwards-incompatible change.

strong objection. :-)

Programs that had been working will stop. Programs that won't work because it doesn't throw an exception yet are no worse off.

When you change something, you'll hear almost entirely from those for whom the change will be useful. From those who will find it an obstacle, you don't hear -- until it's implemented. Requiring catching an exception means that until the code is changed, your working program no longer works.

And as you later point out, Cliff, empty and uninteresting field names may legitimately exist and WANT to be ignored.

So although I CAN see the reasoning for an exception, I do not see it as enough to put it in this version of the library. It's a learning process, and for the next version, which will need code changes to incorporate anyway, that knowledge can be used to make things better *next time*.

From jcd at sdf.lonestar.org Thu Jan 24 16:55:17 2013
From: jcd at sdf.lonestar.org (J. Cliff Dyer)
Date: Thu, 24 Jan 2013 10:55:17 -0500
Subject: [Python-ideas] csv.DictReader could handle headers more intelligently.
In-Reply-To: <201301241529.19304.mark.hackett@metoffice.gov.uk>
References: <1358903168.4767.4.camel@webb> <20130124133858.32622f6e@pitrou.net> <1359040294.4802.29.camel@gdoba.domain.local> <201301241529.19304.mark.hackett@metoffice.gov.uk>
Message-ID: <1359042917.4802.39.camel@gdoba.domain.local>

On Thu, 2013-01-24 at 15:29 +0000, Mark Hackett wrote:
> On Thursday 24 Jan 2013, J. Cliff Dyer wrote:
> > > > 2. Give an error when encountering a duplicate field name (which
> > > > will lead to data loss when reading from the file) (consensus seems
> > > > to be "yes")
> >
> > Mostly, but with a strong objection from Mark Hackett, and hesitation
> > about altering current behavior from Amaury Forgeot d'Arc.
>
> More along the lines of your earlier:
> > 1. It's a backwards-incompatible change.
>
> strong objection. :-)
>
> Programs that had been working will stop. Programs that won't work because it
> doesn't throw an exception yet are no worse off.
>

Noted. I will say that this doesn't seem any worse than other backwards-incompatible changes, which are sometimes allowed, so it should probably be considered by the same standard.

That said, what are your feelings on adding a CarefulDictReader?

From jcd at sdf.lonestar.org Thu Jan 24 17:08:16 2013
From: jcd at sdf.lonestar.org (J. Cliff Dyer)
Date: Thu, 24 Jan 2013 11:08:16 -0500
Subject: [Python-ideas] csv.DictReader could handle headers more intelligently.
In-Reply-To: <2E02A587-DC04-4385-8A36-C41160E33E98@umbrellacode.com>
References: <1358903168.4767.4.camel@webb> <51007FCC.5090400@pearwood.info> <201301241047.17391.mark.hackett@metoffice.gov.uk> <20130124133858.32622f6e@pitrou.net> <1359040294.4802.29.camel@gdoba.domain.local> <2E02A587-DC04-4385-8A36-C41160E33E98@umbrellacode.com>
Message-ID: <1359043696.4802.42.camel@gdoba.domain.local>

On Thu, 2013-01-24 at 07:28 -0800, Shane Green wrote:
> Since every form of CSV file counts EOL as a line terminator, I think
> discarding empty lines preceding the headers is arguably acceptable,
> but do not think discarding lines of just delimiters would be. What
> about extending the DictReader API so it was easy to perform these
> actions explicitly, such as being able to discard() the field names to
> be re-evaluated on the next line?

I think I like this idea. There's something a little distasteful about making the user manually delve into the underlying reader, but this makes it more user-friendly and more obvious how to proceed.

For clarity's sake, what is your objection to discarding lines of delimiters? The reason I suggest doing it is that it is a common output situation when exporting Excel files or LibreCalc files that have a blank row at the top.

Cheers,
Cliff

From ubershmekel at gmail.com Thu Jan 24 17:08:34 2013
From: ubershmekel at gmail.com (Yuval Greenfield)
Date: Thu, 24 Jan 2013 18:08:34 +0200
Subject: [Python-ideas] csv.DictReader could handle headers more intelligently.
In-Reply-To: <1359040294.4802.29.camel@gdoba.domain.local>
References: <1358903168.4767.4.camel@webb> <51007FCC.5090400@pearwood.info> <201301241047.17391.mark.hackett@metoffice.gov.uk> <20130124133858.32622f6e@pitrou.net> <1359040294.4802.29.camel@gdoba.domain.local>
Message-ID:

On Thu, Jan 24, 2013 at 5:11 PM, J. Cliff Dyer wrote:
> On Thu, 2013-01-24 at 13:38 +0100, Antoine Pitrou wrote:
> > > 1. Do any data conditioning by ignoring empty lines and lines of
> > > just field delimiters before the header row (consensus seems to be
> > > "no")
>
> Well, I wouldn't necessarily say we have a consensus on this one. This
> idea received a +1 from Bruce Leban and an "I don't see any reason not
> to" from Steven D'Aprano.
>

Count me in that list as well. If it were urllib handling a special case for a server you don't control, then fine. But it's a valid CSV file you can process yourself if you need more control. We should keep DictReader simple. This is also a reason against "CarefulDictReader". If you need to be more specific, then use csv.reader.

> > > 2. Give an error when encountering a duplicate field name (which
> > > will lead to data loss when reading from the file) (consensus seems
> > > to be "yes")
>
> Mostly, but with a strong objection from Mark Hackett, and hesitation
> about altering current behavior from Amaury Forgeot d'Arc.
>

In that one too. Maybe we should ask the people on this list: http://hg.python.org/cpython/log/5b02d622d625/Lib/csv.py

Yuval

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From ubershmekel at gmail.com Thu Jan 24 17:09:28 2013
From: ubershmekel at gmail.com (Yuval Greenfield)
Date: Thu, 24 Jan 2013 18:09:28 +0200
Subject: [Python-ideas] csv.DictReader could handle headers more intelligently.
In-Reply-To: References: <1358903168.4767.4.camel@webb> <51007FCC.5090400@pearwood.info> <201301241047.17391.mark.hackett@metoffice.gov.uk> <20130124133858.32622f6e@pitrou.net> <1359040294.4802.29.camel@gdoba.domain.local>
Message-ID:

To clarify - I agree with the aforementioned "consensus".

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From python at mrabarnett.plus.com Thu Jan 24 17:12:09 2013
From: python at mrabarnett.plus.com (MRAB)
Date: Thu, 24 Jan 2013 16:12:09 +0000
Subject: [Python-ideas] csv.DictReader could handle headers more intelligently.
In-Reply-To: References: <1358903168.4767.4.camel@webb> <51007FCC.5090400@pearwood.info> <201301241047.17391.mark.hackett@metoffice.gov.uk> <20130124133858.32622f6e@pitrou.net> <1359040294.4802.29.camel@gdoba.domain.local>
Message-ID: <51015D59.6030409@mrabarnett.plus.com>

On 2013-01-24 15:24, Chris Angelico wrote:
> On Fri, Jan 25, 2013 at 2:11 AM, J. Cliff Dyer wrote:
>> On Thu, 2013-01-24 at 13:38 +0100, Antoine Pitrou wrote:
>>> > 1. Do any data conditioning by ignoring empty lines and lines of
>>> > just field delimiters before the header row (consensus seems to be
>>> > "no")
>>
>> Well, I wouldn't necessarily say we have a consensus on this one. This
>> idea received a +1 from Bruce Leban and an "I don't see any reason not
>> to" from Steven D'Aprano.
>
> I've been lurking this thread, but fwiw, I'd put +1 on ignoring empty
> lines/just delimiter lines. For a row of column headers, a completely
> blank line makes no sense. It's a backward-incompatible change, yes,
> but I can't imagine any code actively relying on this. ISTM this would
> probably be safe for a minor release (Python 3.4), though of course
> not for Python 3.3.1.
>

Ignoring empty lines before a header seems OK to me, but ignoring just-delimiter lines doesn't. To me, a just-delimiter line where it's expecting a header would mean that all of the columns are unnamed, unless we insist that it's not a header unless at least one column is named, and I don't think that that should be the default behaviour.

As for duplicated column names, I think that it should probably raise an exception unless you've specified that duplicates should be put into a list.

From mark.hackett at metoffice.gov.uk Thu Jan 24 17:23:50 2013
From: mark.hackett at metoffice.gov.uk (Mark Hackett)
Date: Thu, 24 Jan 2013 16:23:50 +0000
Subject: [Python-ideas] csv.DictReader could handle headers more intelligently.
In-Reply-To: <1359043696.4802.42.camel@gdoba.domain.local>
References: <1358903168.4767.4.camel@webb> <2E02A587-DC04-4385-8A36-C41160E33E98@umbrellacode.com> <1359043696.4802.42.camel@gdoba.domain.local>
Message-ID: <201301241623.50279.mark.hackett@metoffice.gov.uk>

On Thursday 24 Jan 2013, J. Cliff Dyer wrote:
> For clarity's sake, what is your objection to discarding lines of
> delimiters? The reason I suggest doing it is that it is a common output
> situation when exporting Excel files or LibreCalc files that have a
> blank row at the top.
>
> Cheers,
> Cliff
>

I'm putting too many pennies in this pot, I feel, but... What was the purpose of those blank lines? Like duplicate column names at the first row, what you need to do with them depends on why they are there and what the program using the output wants to do.
If someone took the repository of macros from the spreadsheet which used column numbers and this was used to recreate EXACTLY whatever calculations were done without having to keep two copies of the same algorithm to account for the dropping of rows in the script, then dropping the rows would break this.

This really is policy (wrt the source of the CSV and the consumer of the dictionary). Make it a pre-processing step for the CSV, to be used and configured to fit what the CSV file output meant to the producing program and which bits of it make a difference to the consumer of the dictionary's contents.

From jcd at sdf.lonestar.org Thu Jan 24 17:40:09 2013
From: jcd at sdf.lonestar.org (J. Cliff Dyer)
Date: Thu, 24 Jan 2013 11:40:09 -0500
Subject: [Python-ideas] csv.DictReader could handle headers more intelligently.
In-Reply-To: <201301241617.58727.mark.hackett@metoffice.gov.uk>
References: <1358903168.4767.4.camel@webb> <201301241529.19304.mark.hackett@metoffice.gov.uk> <1359042917.4802.39.camel@gdoba.domain.local> <201301241617.58727.mark.hackett@metoffice.gov.uk>
Message-ID: <1359045609.4802.57.camel@gdoba.domain.local>

On Thu, 2013-01-24 at 16:17 +0000, Mark Hackett wrote:
> > That said, what are your feelings on adding a CarefulDictReader?
>
> It's as good a solution to me as any.
>
> However, I'm not that good a programmer, and therefore what *I'd* do isn't
> necessarily a good idea, it's just one of the better ones out of the limited
> toolbox I have available.
>
> I'd prefer (for aesthetic reasons) some sort of stream converter. Much like
> freeze/thaw serialisation of data, it'd be a step between the raw csv and the
> reader that reads it.
>

I think my reason for wanting to have a CarefulDictReader (or a careful DictReader), and why I think a stream converter isn't the best solution, is that CSVs are very commonly used by people just starting to get their feet wet with programming.

Consider the use case: I've got my excel file, and I'm just getting to the point where excel isn't cutting it anymore. I want to start manipulating my data with python, and everyone is telling me to use the csv library. DictReader sounds cool, because I don't want to have to remember column numbers, and this is going to make my code much more readable. But I can't make it read my headers simply because I put some blank space at the top of my excel file, above my headers.

A stream converter is another layer of complexity that keeps this potential new programmer from having a good experience with programming, for what gain? So that the csv library can "properly" (?) treat a line without data as a header? I think it would be fully reasonable (and add little to no complexity to the code) to have a DictReader that treats the first non-empty line as the header row. The csv module is one of the big gateways into python programming for a lot of people.

That's also one of the reasons I think the sockets library is a poor analogue here. A new programmer is unlikely to reach the sockets library until they've been through a few of the urllibs, the httplibs, requests, some part of http or an external web framework, smtplib, or some other higher-level networking-related libraries.

For the same reason, I think if the solution isn't something handled automatically by the library, it needs to be accompanied by improvements to the documentation. If we're going to provide a DictReader that is this easy to break, we need to answer the question: How do I fix it?
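(For the record, the kind of work-around an answer would describe is a sketch like this one -- Python 3, untested, with a hypothetical data.csv, and it assumes the junk above the header is genuinely blank rather than quoted data containing newlines:)

    import csv

    def skip_leading_blanks(lines):
        # Drop blank lines until the first non-blank line (the header),
        # then pass everything else through untouched.
        lines = iter(lines)
        for line in lines:
            if line.strip():
                yield line
                break
        yield from lines

    with open('data.csv', newline='') as f:
        for row in csv.DictReader(skip_leading_blanks(f)):
            print(row)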
Cheers,
Cliff

From jcd at sdf.lonestar.org Thu Jan 24 17:41:07 2013
From: jcd at sdf.lonestar.org (J. Cliff Dyer)
Date: Thu, 24 Jan 2013 11:41:07 -0500
Subject: [Python-ideas] csv.DictReader could handle headers more intelligently.
In-Reply-To: <201301241623.50279.mark.hackett@metoffice.gov.uk>
References: <1358903168.4767.4.camel@webb> <2E02A587-DC04-4385-8A36-C41160E33E98@umbrellacode.com> <1359043696.4802.42.camel@gdoba.domain.local> <201301241623.50279.mark.hackett@metoffice.gov.uk>
Message-ID: <1359045667.4802.58.camel@gdoba.domain.local>

On Thu, 2013-01-24 at 16:23 +0000, Mark Hackett wrote:
> If someone took the repository of macros from the spreadsheet which used
> column numbers and this was used to recreate EXACTLY whatever calculations
> were done without having to keep two copies of the same algorithm to account
> for the dropping of rows in the script, then dropping the rows would break
> this.
>

If that's the case, then why are you using a DictReader instead of a raw csv.reader? You're already losing the first row.

From shane at umbrellacode.com Thu Jan 24 17:41:40 2013
From: shane at umbrellacode.com (Shane Green)
Date: Thu, 24 Jan 2013 08:41:40 -0800
Subject: [Python-ideas] csv.DictReader could handle headers more intelligently.
In-Reply-To: <1359043696.4802.42.camel@gdoba.domain.local>
References: <1358903168.4767.4.camel@webb> <51007FCC.5090400@pearwood.info> <201301241047.17391.mark.hackett@metoffice.gov.uk> <20130124133858.32622f6e@pitrou.net> <1359040294.4802.29.camel@gdoba.domain.local> <2E02A587-DC04-4385-8A36-C41160E33E98@umbrellacode.com> <1359043696.4802.42.camel@gdoba.domain.local>
Message-ID:

Well, my objection to doing it automatically was based in part on not being familiar with the common scenarios you've brought up, but the other reasons I had in mind were that it seemed like the kind of thing that might also be indicative of an error: something wrong with the data someone might want to know was happening rather than have masked; and also because discarding such rows leaves a question about the delimiter: it's now known, but knowing it based on rows we've discarded seems unclean.

Shane Green
www.umbrellacode.com
805-452-9666 | shane at umbrellacode.com

On Jan 24, 2013, at 8:08 AM, "J. Cliff Dyer" wrote:
> On Thu, 2013-01-24 at 07:28 -0800, Shane Green wrote:
>> Since every form of CSV file counts EOL as a line terminator, I think
>> discarding empty lines preceding the headers is arguably acceptable,
>> but do not think discarding lines of just delimiters would be. What
>> about extending the DictReader API so it was easy to perform these
>> actions explicitly, such as being able to discard() the field names to
>> be re-evaluated on the next line?
>
> I think I like this idea. There's something a little distasteful about
> making the user manually delve into the underlying reader, but this
> makes it more user-friendly and more obvious how to proceed.
>
> For clarity's sake, what is your objection to discarding lines of
> delimiters? The reason I suggest doing it is that it is a common output
> situation when exporting Excel files or LibreCalc files that have a
> blank row at the top.
>
> Cheers,
> Cliff
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From guido at python.org Thu Jan 24 19:23:40 2013
From: guido at python.org (Guido van Rossum)
Date: Thu, 24 Jan 2013 10:23:40 -0800
Subject: [Python-ideas] PEP 3156: getting the socket or peer name from the transport
Message-ID:

A pragmatic question popped up: sometimes the protocol would like to know the name of the socket or its peer, i.e. call getsockname() or getpeername() on the underlying socket. (I can imagine wanting to log this, or do some kind of IP address blocking.)

What should the interface for this look like? I can think of several ways:

A) An API to return the underlying socket, if there is one. (In the case of a stack of transports and protocols there may not be one, so it may return None.) Downside is that it requires the transport to use sockets -- if it were to use some native Windows API there might not be a socket object even though there might be an IP connection with easily-accessible address and peer.

B) An API to get the address and peer address; e.g. transport.getsockname() and transport.getpeername(). These would call the corresponding call on the underlying socket, if there is one, or return None otherwise; IP transports that don't use sockets would be free to retrieve and return the requested information in a platform-specific way. Note that the address may take different forms; e.g. for AF_UNIX sockets it is a filename, so the protocol must be prepared for different formats.

C) Similar to (A) or (B), but putting the API in an abstract subclass of Transport (e.g. SocketTransport) so that a transport that doesn't have this doesn't need to implement dummy methods returning None -- it is now the protocol's responsibility to check for isinstance(transport, SocketTransport) before calling the method. I'm not so keen on this, Twisted has shown (IMO) that a deep hierarchy of interfaces or ABCs does not necessarily provide clarity.

Discussion?

--
--Guido van Rossum (python.org/~guido)

From fafhrd91 at gmail.com Thu Jan 24 19:41:51 2013
From: fafhrd91 at gmail.com (Nikolay Kim)
Date: Thu, 24 Jan 2013 10:41:51 -0800
Subject: [Python-ideas] PEP 3156: getting the socket or peer name from the transport
In-Reply-To: References: Message-ID: <40C339D1-1B7B-4D4B-978C-96D4571E2DFF@gmail.com>

On Jan 24, 2013, at 10:23 AM, Guido van Rossum wrote:
> C) Similar to (A) or (B), but putting the API in an abstract subclass
> of Transport (e.g. SocketTransport) so that a transport that doesn't
> have this doesn't need to implement dummy methods returning None -- it
> is now the protocol's responsibility to check for
> isinstance(transport, SocketTransport) before calling the method. I'm
> not so keen on this, Twisted has shown (IMO) that a deep hierarchy of
> interfaces or ABCs does not necessarily provide clarity.
>

SocketTransport could be abstract just like Transport class, just for description purpose.

Another question, should we expect ability to use protocols on top of different transports (i.e. HTTPProtocol and UnixSubprocessTransport) ?
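For reference, option (B) could look roughly like this sketch (hypothetical attribute and class names, not a draft of the actual PEP 3156 API; _sock in particular is made up):

    class Transport:
        # Sketch of option (B): address introspection on the base class.
        # A transport that isn't socket-based would override these to
        # return None, or an address obtained in a platform-specific way.
        def getsockname(self):
            sock = getattr(self, '_sock', None)
            return sock.getsockname() if sock is not None else None

        def getpeername(self):
            sock = getattr(self, '_sock', None)
            return sock.getpeername() if sock is not None else None

    class LoggingProtocol:
        # Protocol-side usage: the result may be None, or an
        # address-family-dependent value, so check before using it.
        def connection_made(self, transport):
            peer = transport.getpeername()
            if peer is not None:
                print('connection from', peer)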
From guido at python.org Thu Jan 24 19:44:48 2013 From: guido at python.org (Guido van Rossum) Date: Thu, 24 Jan 2013 10:44:48 -0800 Subject: [Python-ideas] PEP 3156: getting the socket or peer name from the transport In-Reply-To: <40C339D1-1B7B-4D4B-978C-96D4571E2DFF@gmail.com> References: <40C339D1-1B7B-4D4B-978C-96D4571E2DFF@gmail.com> Message-ID: On Thu, Jan 24, 2013 at 10:41 AM, Nikolay Kim wrote: > > On Jan 24, 2013, at 10:23 AM, Guido van Rossum wrote: > > >> C) Similar to (A) or (B), but putting the API in an abstract subclass >> of Transport (e.g. SocketTransport) so that a transport that doesn't >> have this doesn't need to implement dummy methods returning None -- it >> is now the protocol's responsibility to check for >> isinstance(transport, SocketTransport) before calling the method. I'm >> not so keen on this, Twisted has shown (IMO) that a deep hierarchy of >> interfaces or ABCs does not necessarily provide clarity. > SocketTransport could be abstract just like Transport class, just for description purpose. Yes, but I'm arguing against this. :-) > Another question, should we expect ability to use protocols on top of different > transports (i.e. HTTPProtocol and UnixSubprocessTransport) ? Yes, it should be possible, for example the subprocess might implement some kind of custom tunnel. If in this case there's no way to get the socket or peer name, or if the names aren't very useful, that's okay. -- --Guido van Rossum (python.org/~guido) From ubershmekel at gmail.com Thu Jan 24 19:45:17 2013 From: ubershmekel at gmail.com (Yuval Greenfield) Date: Thu, 24 Jan 2013 20:45:17 +0200 Subject: [Python-ideas] PEP 3156: getting the socket or peer name from the transport In-Reply-To: References: Message-ID: On Thu, Jan 24, 2013 at 8:23 PM, Guido van Rossum wrote: > A pragmatic question popped up: sometimes the protocol would like to > know the name of the socket or its peer, i.e. call getsockname() or > getpeername() on the underlying socket. (I can imagine wanting to log > this, or do some kind of IP address blocking.) > > What should the interface for this look like? I can think of several ways: > > A) An API to return the underlying socket, if there is one. (In the > case of a stack of transports and protocols there may not be one, so > it may return None.) Downside is that it requires the transport to use > sockets -- if it were to use some native Windows API there might not > be a socket object even though there might be an IP connection with > easily-accessible address and peer. > I feel (A) is the best option as it's the most flexible - underlying transports can have many different special methods. No? Yuval Greenfield -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Thu Jan 24 19:50:03 2013 From: guido at python.org (Guido van Rossum) Date: Thu, 24 Jan 2013 10:50:03 -0800 Subject: [Python-ideas] PEP 3156: getting the socket or peer name from the transport In-Reply-To: References: Message-ID: On Thu, Jan 24, 2013 at 10:45 AM, Yuval Greenfield wrote: > On Thu, Jan 24, 2013 at 8:23 PM, Guido van Rossum wrote: >> >> A pragmatic question popped up: sometimes the protocol would like to >> know the name of the socket or its peer, i.e. call getsockname() or >> getpeername() on the underlying socket. (I can imagine wanting to log >> this, or do some kind of IP address blocking.) >> >> What should the interface for this look like? I can think of several ways: >> >> A) An API to return the underlying socket, if there is one. 
(In the >> case of a stack of transports and protocols there may not be one, so >> it may return None.) Downside is that it requires the transport to use >> sockets -- if it were to use some native Windows API there might not >> be a socket object even though there might be an IP connection with >> easily-accessible address and peer. > > > I feel (A) is the best option as it's the most flexible - underlying > transports can have many different special methods. No? The whole idea of defining a transport API is that the protocol shouldn't care about what type of transport it is being used with. The example of using an http client protocol with a subprocess transport that invokes some kind of tunneling process might clarify this. So I would like the transport API to be both small and fixed, rather than having different transports have different extensions to the standard transport API. What other things might you want to do with the socket besides calling getpeername() or getsockname()? Would that be reasonable to expect from a protocol written to be independent of the specific transport type? -- --Guido van Rossum (python.org/~guido) From fafhrd91 at gmail.com Thu Jan 24 20:05:40 2013 From: fafhrd91 at gmail.com (Nikolay Kim) Date: Thu, 24 Jan 2013 11:05:40 -0800 Subject: [Python-ideas] PEP 3156: getting the socket or peer name from the transport In-Reply-To: References: Message-ID: <2BC319B1-026C-4255-B9D9-991A163CED7A@gmail.com> On Jan 24, 2013, at 10:50 AM, Guido van Rossum wrote: > On Thu, Jan 24, 2013 at 10:45 AM, Yuval Greenfield > wrote: >> On Thu, Jan 24, 2013 at 8:23 PM, Guido van Rossum wrote: >>> >>> A pragmatic question popped up: sometimes the protocol would like to >>> know the name of the socket or its peer, i.e. call getsockname() or >>> getpeername() on the underlying socket. (I can imagine wanting to log >>> this, or do some kind of IP address blocking.) >>> >>> What should the interface for this look like? I can think of several ways: >>> >>> A) An API to return the underlying socket, if there is one. (In the >>> case of a stack of transports and protocols there may not be one, so >>> it may return None.) Downside is that it requires the transport to use >>> sockets -- if it were to use some native Windows API there might not >>> be a socket object even though there might be an IP connection with >>> easily-accessible address and peer. >> >> >> I feel (A) is the best option as it's the most flexible - underlying >> transports can have many different special methods. No? > > The whole idea of defining a transport API is that the protocol > shouldn't care about what type of transport it is being used with. The > example of using an http client protocol with a subprocess transport > that invokes some kind of tunneling process might clarify this. So I > would like the transport API to be both small and fixed, rather than > having different transports have different extensions to the standard > transport API. > > What other things might you want to do with the socket besides calling > getpeername() or getsockname()? Would that be reasonable to expect > from a protocol written to be independent of the specific transport > type? transport could have dictionary attribute where it can store optional information like socket name, peer name or file path, etc. 
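Something like this sketch, perhaps (the attribute name, keys, and eager population here are all made up for illustration, not an agreed-upon interface):

    class SocketTransport:
        # Sketch of the extra-info dictionary idea: populate a plain
        # dict up front from the underlying socket.
        def __init__(self, sock):
            self._sock = sock
            self.extra = {
                'sockname': sock.getsockname(),
                'peername': sock.getpeername(),
            }

A transport backed by something other than a socket would fill in whatever keys make sense for it, and protocols would use extra.get('peername') and cope with missing keys.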
From guido at python.org Thu Jan 24 20:12:00 2013 From: guido at python.org (Guido van Rossum) Date: Thu, 24 Jan 2013 11:12:00 -0800 Subject: [Python-ideas] PEP 3156: getting the socket or peer name from the transport In-Reply-To: <2BC319B1-026C-4255-B9D9-991A163CED7A@gmail.com> References: <2BC319B1-026C-4255-B9D9-991A163CED7A@gmail.com> Message-ID: On Thu, Jan 24, 2013 at 11:05 AM, Nikolay Kim wrote: > > On Jan 24, 2013, at 10:50 AM, Guido van Rossum wrote: > >> On Thu, Jan 24, 2013 at 10:45 AM, Yuval Greenfield >> wrote: >>> On Thu, Jan 24, 2013 at 8:23 PM, Guido van Rossum wrote: >>>> >>>> A pragmatic question popped up: sometimes the protocol would like to >>>> know the name of the socket or its peer, i.e. call getsockname() or >>>> getpeername() on the underlying socket. (I can imagine wanting to log >>>> this, or do some kind of IP address blocking.) >>>> >>>> What should the interface for this look like? I can think of several ways: >>>> >>>> A) An API to return the underlying socket, if there is one. (In the >>>> case of a stack of transports and protocols there may not be one, so >>>> it may return None.) Downside is that it requires the transport to use >>>> sockets -- if it were to use some native Windows API there might not >>>> be a socket object even though there might be an IP connection with >>>> easily-accessible address and peer. >>> >>> >>> I feel (A) is the best option as it's the most flexible - underlying >>> transports can have many different special methods. No? >> >> The whole idea of defining a transport API is that the protocol >> shouldn't care about what type of transport it is being used with. The >> example of using an http client protocol with a subprocess transport >> that invokes some kind of tunneling process might clarify this. So I >> would like the transport API to be both small and fixed, rather than >> having different transports have different extensions to the standard >> transport API. >> >> What other things might you want to do with the socket besides calling >> getpeername() or getsockname()? Would that be reasonable to expect >> from a protocol written to be independent of the specific transport >> type? > > transport could have dictionary attribute where it can store optional information > like socket name, peer name or file path, etc. Aha, that makes some sense. Though maybe it shouldn't be a dict -- it may be expensive to populate some values in some cases, so maybe there should just be a method transport.get_extra_info('key') which computes and returns (and possibly caches) certain values but returns None if the info is not supported. E.g. get_extra_info('name'), get_extra_info('peer'). This API makes it pretty clear that the caller should check the value for None before using it. -- --Guido van Rossum (python.org/~guido) From ben at bendarnell.com Thu Jan 24 20:14:38 2013 From: ben at bendarnell.com (Ben Darnell) Date: Thu, 24 Jan 2013 14:14:38 -0500 Subject: [Python-ideas] PEP 3156: getting the socket or peer name from the transport In-Reply-To: References: Message-ID: On Tornado we basically do A (the IOStream's socket attribute was never really documented for public consumption but has become the de facto standard way to get this kind of information). As food for thought, consider extending this to include not just peer address but also SSL certificates. Tornado's SSL support uses the stdlib's ssl.SSLSocket, so the certificate is available from the socket object, but Twisted (I believe) uses pycrypto and things work differently there. 
To expose SSL certificates (and NPN, and other information that may or may not be there depending on SSL implementation) across both tornado- and twisted-based transports you'd need something like B or C. -Ben On Thu, Jan 24, 2013 at 1:23 PM, Guido van Rossum wrote: > A pragmatic question popped up: sometimes the protocol would like to > know the name of the socket or its peer, i.e. call getsockname() or > getpeername() on the underlying socket. (I can imagine wanting to log > this, or do some kind of IP address blocking.) > > What should the interface for this look like? I can think of several ways: > > A) An API to return the underlying socket, if there is one. (In the > case of a stack of transports and protocols there may not be one, so > it may return None.) Downside is that it requires the transport to use > sockets -- if it were to use some native Windows API there might not > be a socket object even though there might be an IP connection with > easily-accessible address and peer. > > B) An API to get the address and peer address; e.g. > transport.getsockname() and transport.getpeername(). These would call > the corresponding call on the underlying socket, if there is one, or > return None otherwise; IP transports that don't use sockets would be > free to retrieve and return the requested information in a > platform-specific way. Note that the address may take different forms; > e.g. for AF_UNIX sockets it is a filename, so the protocol must be > prepared for different formats. > > C) Similar to (A) or (B), but putting the API in an abstract subclass > of Transport (e.g. SocketTransport) so that a transport that doesn't > have this doesn't need to implement dummy methods returning None -- it > is now the protocol's responsibility to check for > isinstance(transport, SocketTransport) before calling the method. I'm > not so keen on this, Twisted has shown (IMO) that a deep hierarchy of > interfaces or ABCs does not necessarily provide clarity. > > Discussion? > > -- > --Guido van Rossum (python.org/~guido) > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Thu Jan 24 20:32:22 2013 From: guido at python.org (Guido van Rossum) Date: Thu, 24 Jan 2013 11:32:22 -0800 Subject: [Python-ideas] PEP 3156: getting the socket or peer name from the transport In-Reply-To: References: Message-ID: On Thu, Jan 24, 2013 at 11:14 AM, Ben Darnell wrote: > On Tornado we basically do A (the IOStream's socket attribute was never > really documented for public consumption but has become the de facto > standard way to get this kind of information). As food for thought, > consider extending this to include not just peer address but also SSL > certificates. Tornado's SSL support uses the stdlib's ssl.SSLSocket, so the > certificate is available from the socket object, but Twisted (I believe) > uses pycrypto and things work differently there. To expose SSL certificates > (and NPN, and other information that may or may not be there depending on > SSL implementation) across both tornado- and twisted-based transports you'd > need something like B or C. Excellent points all. I'll mull this over -- it's unfortunate that (A) is so easy to do and handles future needs as well, but may shut out alternate transport implementations... 
> -Ben > > On Thu, Jan 24, 2013 at 1:23 PM, Guido van Rossum wrote: >> >> A pragmatic question popped up: sometimes the protocol would like to >> know the name of the socket or its peer, i.e. call getsockname() or >> getpeername() on the underlying socket. (I can imagine wanting to log >> this, or do some kind of IP address blocking.) >> >> What should the interface for this look like? I can think of several ways: >> >> A) An API to return the underlying socket, if there is one. (In the >> case of a stack of transports and protocols there may not be one, so >> it may return None.) Downside is that it requires the transport to use >> sockets -- if it were to use some native Windows API there might not >> be a socket object even though there might be an IP connection with >> easily-accessible address and peer. >> >> B) An API to get the address and peer address; e.g. >> transport.getsockname() and transport.getpeername(). These would call >> the corresponding call on the underlying socket, if there is one, or >> return None otherwise; IP transports that don't use sockets would be >> free to retrieve and return the requested information in a >> platform-specific way. Note that the address may take different forms; >> e.g. for AF_UNIX sockets it is a filename, so the protocol must be >> prepared for different formats. >> >> C) Similar to (A) or (B), but putting the API in an abstract subclass >> of Transport (e.g. SocketTransport) so that a transport that doesn't >> have this doesn't need to implement dummy methods returning None -- it >> is now the protocol's responsibility to check for >> isinstance(transport, SocketTransport) before calling the method. I'm >> not so keen on this, Twisted has shown (IMO) that a deep hierarchy of >> interfaces or ABCs does not necessarily provide clarity. >> >> Discussion? >> >> -- >> --Guido van Rossum (python.org/~guido) >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> http://mail.python.org/mailman/listinfo/python-ideas > > -- --Guido van Rossum (python.org/~guido) From solipsis at pitrou.net Thu Jan 24 20:34:06 2013 From: solipsis at pitrou.net (Antoine Pitrou) Date: Thu, 24 Jan 2013 20:34:06 +0100 Subject: [Python-ideas] PEP 3156: getting the socket or peer name from the transport References: Message-ID: <20130124203406.0952fb00@pitrou.net> On Thu, 24 Jan 2013 10:23:40 -0800 Guido van Rossum wrote: > A pragmatic question popped up: sometimes the protocol would like to > know the name of the socket or its peer, i.e. call getsockname() or > getpeername() on the underlying socket. (I can imagine wanting to log > this, or do some kind of IP address blocking.) > > What should the interface for this look like? I can think of several ways: > > A) An API to return the underlying socket, if there is one. (In the > case of a stack of transports and protocols there may not be one, so > it may return None.) Downside is that it requires the transport to use > sockets -- if it were to use some native Windows API there might not > be a socket object even though there might be an IP connection with > easily-accessible address and peer. I don't understand why you say Windows doesn't use sockets for IP connections. AFAIK, sockets are the *only* way to do networking with the Windows API. See e.g. 
WSARecv, which supports synchronous and asynchronous operation: http://msdn.microsoft.com/en-us/library/windows/desktop/ms741688%28v=vs.85%29.aspx

(I also suppose you meant "TCP connection", not "IP connection" ;-))

That said, the problem with returning a socket is that it's quite low-level, and might return sockets with different characteristics depending on the backend. So, while it can be there, I think the preferred APIs for most uses should be B or C.

> C) Similar to (A) or (B), but putting the API in an abstract subclass
> of Transport (e.g. SocketTransport) so that a transport that doesn't
> have this doesn't need to implement dummy methods returning None -- it
> is now the protocol's responsibility to check for
> isinstance(transport, SocketTransport) before calling the method. I'm
> not so keen on this, Twisted has shown (IMO) that a deep hierarchy of
> interfaces or ABCs does not necessarily provide clarity.

IMO, Twisted mostly shows that zope.interface doesn't combine very well with automated doc generators such as epydoc (you have to look up the interface every time you want the documentation of one of the concrete classes). And as Ben says, I don't think you want to enumerate all possible introspection APIs (such as the various pieces of SSL-related information) on the base Transport class.

Regards

Antoine.

From shane at umbrellacode.com Thu Jan 24 20:37:25 2013
From: shane at umbrellacode.com (Shane Green)
Date: Thu, 24 Jan 2013 11:37:25 -0800
Subject: [Python-ideas] PEP 3156: getting the socket or peer name from the transport
In-Reply-To: References: Message-ID:

Starting to seem like the transport could almost be an entry in the dictionary rather than owning it, kind of like environ['wsgi.input'] in the WSGI spec. Not that I'm necessarily recommending this, but it seems like the details may outlive the transports, could potentially include information the transport itself considered input, and may be a useful place to store details such as SSL details that might be shared. A lot of these details could be initialized when the transport was created, and many would be based on whatever spawned it. For example, a transport spawned by an HTTPS server that accepted an incoming connection would inherit the SSL configuration, etc.

Shane Green
www.umbrellacode.com
805-452-9666 | shane at umbrellacode.com

On Jan 24, 2013, at 11:14 AM, Ben Darnell wrote:
> On Tornado we basically do A (the IOStream's socket attribute was never really documented for public consumption but has become the de facto standard way to get this kind of information). As food for thought, consider extending this to include not just peer address but also SSL certificates. Tornado's SSL support uses the stdlib's ssl.SSLSocket, so the certificate is available from the socket object, but Twisted (I believe) uses pycrypto and things work differently there. To expose SSL certificates (and NPN, and other information that may or may not be there depending on SSL implementation) across both tornado- and twisted-based transports you'd need something like B or C.
>
> -Ben
>
> On Thu, Jan 24, 2013 at 1:23 PM, Guido van Rossum wrote:
> A pragmatic question popped up: sometimes the protocol would like to
> know the name of the socket or its peer, i.e. call getsockname() or
> getpeername() on the underlying socket. (I can imagine wanting to log
> this, or do some kind of IP address blocking.)
>
> What should the interface for this look like? I can think of several ways:
>
> A) An API to return the underlying socket, if there is one.
(In the > case of a stack of transports and protocols there may not be one, so > it may return None.) Downside is that it requires the transport to use > sockets -- if it were to use some native Windows API there might not > be a socket object even though there might be an IP connection with > easily-accessible address and peer. > > B) An API to get the address and peer address; e.g. > transport.getsockname() and transport.getpeername(). These would call > the corresponding call on the underlying socket, if there is one, or > return None otherwise; IP transports that don't use sockets would be > free to retrieve and return the requested information in a > platform-specific way. Note that the address may take different forms; > e.g. for AF_UNIX sockets it is a filename, so the protocol must be > prepared for different formats. > > C) Similar to (A) or (B), but putting the API in an abstract subclass > of Transport (e.g. SocketTransport) so that a transport that doesn't > have this doesn't need to implement dummy methods returning None -- it > is now the protocol's responsibility to check for > isinstance(transport, SocketTransport) before calling the method. I'm > not so keen on this, Twisted has shown (IMO) that a deep hierarchy of > interfaces or ABCs does not necessarily provide clarity. > > Discussion? > > -- > --Guido van Rossum (python.org/~guido) > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas -------------- next part -------------- An HTML attachment was scrubbed... URL: From Steve.Dower at microsoft.com Thu Jan 24 21:16:58 2013 From: Steve.Dower at microsoft.com (Steve Dower) Date: Thu, 24 Jan 2013 20:16:58 +0000 Subject: [Python-ideas] PEP 3156: getting the socket or peer name from the transport In-Reply-To: <20130124203406.0952fb00@pitrou.net> References: <20130124203406.0952fb00@pitrou.net> Message-ID: Antoine Pitrou wrote: > On Thu, 24 Jan 2013 10:23:40 -0800 > Guido van Rossum wrote: > > A) An API to return the underlying socket, if there is one. (In the > > case of a stack of transports and protocols there may not be one, so > > it may return None.) Downside is that it requires the transport to use > > sockets -- if it were to use some native Windows API there might not > > be a socket object even though there might be an IP connection with > > easily-accessible address and peer. > > I don't understand why you say Windows doesn't use sockets for IP > connections. AFAIK, sockets are the *only* way to do networking with the > Windows API. See e.g. WSARecv, which supports synchronous and > asynchronous operation: > http://msdn.microsoft.com/en- > us/library/windows/desktop/ms741688%28v=vs.85%29.aspx There's also a whole selection of "Internet" APIs that could be used http://msdn.microsoft.com/en-us/library/hh309468.aspx and plenty (probably too many) other high level APIs. There's no expectation that every application has to deal solely in sockets. Cheers, Steve From storchaka at gmail.com Thu Jan 24 21:35:14 2013 From: storchaka at gmail.com (Serhiy Storchaka) Date: Thu, 24 Jan 2013 22:35:14 +0200 Subject: [Python-ideas] csv.DictReader could handle headers more intelligently. 
In-Reply-To: <5d57225f-436a-49db-b882-68bbf80b9e71@t6g2000pba.googlegroups.com>
References: <1358903168.4767.4.camel@webb> <5d57225f-436a-49db-b882-68bbf80b9e71@t6g2000pba.googlegroups.com>
Message-ID:

On 23.01.13 03:51, alex23 wrote:
> with open('malformed.csv','rb') as csvfile:
>     csvlines = list(l for l in csvfile if l.strip())
>     csvreader = DictReader(csvlines)

csvreader = DictReader(l for l in csvfile if l.strip())

From ncoghlan at gmail.com Thu Jan 24 21:51:40 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Fri, 25 Jan 2013 06:51:40 +1000
Subject: [Python-ideas] PEP 3156: getting the socket or peer name from the transport
In-Reply-To: References: <2BC319B1-026C-4255-B9D9-991A163CED7A@gmail.com>
Message-ID:

On Fri, Jan 25, 2013 at 5:12 AM, Guido van Rossum wrote:
> On Thu, Jan 24, 2013 at 11:05 AM, Nikolay Kim wrote:
>> transport could have dictionary attribute where it can store optional information
>> like socket name, peer name or file path, etc.
>
> Aha, that makes some sense. Though maybe it shouldn't be a dict -- it
> may be expensive to populate some values in some cases, so maybe there
> should just be a method transport.get_extra_info('key') which computes
> and returns (and possibly caches) certain values but returns None if
> the info is not supported. E.g. get_extra_info('name'),
> get_extra_info('peer'). This API makes it pretty clear that the caller
> should check the value for None before using it.

A "get_extra_info" API like that is also amenable to providing an explicit default for the "key not present" case, and makes it clearer that the calculations involved may not be cheap. You could even go so far as to have it return a Future, allowing it to be used for info that requires network activity.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From guido at python.org Thu Jan 24 22:50:35 2013
From: guido at python.org (Guido van Rossum)
Date: Thu, 24 Jan 2013 13:50:35 -0800
Subject: [Python-ideas] PEP 3156: getting the socket or peer name from the transport
In-Reply-To: References: <2BC319B1-026C-4255-B9D9-991A163CED7A@gmail.com>
Message-ID:

On Thu, Jan 24, 2013 at 12:51 PM, Nick Coghlan wrote:
> On Fri, Jan 25, 2013 at 5:12 AM, Guido van Rossum wrote:
>> On Thu, Jan 24, 2013 at 11:05 AM, Nikolay Kim wrote:
>>> transport could have dictionary attribute where it can store optional information
>>> like socket name, peer name or file path, etc.
>>
>> Aha, that makes some sense. Though maybe it shouldn't be a dict -- it
>> may be expensive to populate some values in some cases, so maybe there
>> should just be a method transport.get_extra_info('key') which computes
>> and returns (and possibly caches) certain values but returns None if
>> the info is not supported. E.g. get_extra_info('name'),
>> get_extra_info('peer'). This API makes it pretty clear that the caller
>> should check the value for None before using it.
>
> A "get_extra_info" API like that is also amenable to providing an
> explicit default for the "key not present" case, and makes it clearer
> that the calculations involved may not be cheap.

Yeah, the signature could be get_extra_info(key, default=None).

> You could even go so
> far as to have it return a Future, allowing it to be used for info
> that requires network activity.

I think that goes too far. It doesn't look like getpeername() goes out to the network -- what other use case did you have in mind?
(I suppose it could use a Future for some keys only -- but then the caller would still need to be aware that it could return None instead of a Future, so it would be somewhat awkward to use -- you couldn't write

    remote_user = yield from self.transport.get_extra_info("remote_user")

you'd have to write

    f = yield from self.transport.get_extra_info("remote_user")
    remote_user = (yield from f) if f else None

--
--Guido van Rossum (python.org/~guido)

From steve at pearwood.info Fri Jan 25 00:15:14 2013
From: steve at pearwood.info (Steven D'Aprano)
Date: Fri, 25 Jan 2013 10:15:14 +1100
Subject: [Python-ideas] csv.DictReader could handle headers more intelligently.
In-Reply-To: <1359043696.4802.42.camel@gdoba.domain.local>
References: <1358903168.4767.4.camel@webb> <51007FCC.5090400@pearwood.info> <201301241047.17391.mark.hackett@metoffice.gov.uk> <20130124133858.32622f6e@pitrou.net> <1359040294.4802.29.camel@gdoba.domain.local> <2E02A587-DC04-4385-8A36-C41160E33E98@umbrellacode.com> <1359043696.4802.42.camel@gdoba.domain.local>
Message-ID: <5101C082.9070702@pearwood.info>

On 25/01/13 03:08, J. Cliff Dyer wrote:
> On Thu, 2013-01-24 at 07:28 -0800, Shane Green wrote:
>> Since every form of CSV file counts EOL as a line terminator, I think
>> discarding empty lines preceding the headers is arguably acceptable,
>> but do not think discarding lines of just delimiters would be. What
>> about extending the DictReader API so it was easy to perform these
>> actions explicitly, such as being able to discard() the field names to
>> be re-evaluated on the next line?
>
> I think I like this idea. There's something a little distasteful about
> making the user manually delve into the underlying reader, but this
> makes it more user-friendly and more obvious how to proceed.

I couldn't disagree more. I think:

- it adds burden to the caller, since the caller is now expected to manually inspect the field names and decide whether some should be discarded;

- it is less obvious: *how* does the caller decide that there are too many field names?

- incomplete: if there is a discard(), where is the add()?

- completely irrelevant for the topic being discussed ("DictReader should ignore leading blank lines... I know, let's give the caller the ability to *discard* field names" -- but auto-detecting *too many* field names is not the problem);

- and being able to change the field names on the fly is so far beyond anything required for ordinary CSV that it doesn't belong in the CSV module.

> For clarity's sake, what is your objection to discarding lines of
> delimiters? The reason I suggest doing it is that it is a common output
> situation when exporting Excel files or LibreCalc files that have a
> blank row at the top.

A row of delimiters should be treated by the reader object as a row with explicitly empty fields. If the caller wishes to discard them, they can. But the reader object shouldn't make that decision.

An empty row, on the other hand, should be just ignored. DictReader *already* ignores empty rows, provided that they are not in the first row.

--
Steven

From steve at pearwood.info Fri Jan 25 00:53:51 2013
From: steve at pearwood.info (Steven D'Aprano)
Date: Fri, 25 Jan 2013 10:53:51 +1100
Subject: [Python-ideas] csv.DictReader could handle headers more intelligently.
In-Reply-To: <1359040294.4802.29.camel@gdoba.domain.local>
References: <1358903168.4767.4.camel@webb> <51007FCC.5090400@pearwood.info> <201301241047.17391.mark.hackett@metoffice.gov.uk> <20130124133858.32622f6e@pitrou.net> <1359040294.4802.29.camel@gdoba.domain.local>
Message-ID: <5101C98F.1000704@pearwood.info>

On 25/01/13 02:11, J. Cliff Dyer wrote:
> On Thu, 2013-01-24 at 13:38 +0100, Antoine Pitrou wrote:
>>> 1. Do any data conditioning by ignoring empty lines and lines of
>>> just field delimiters before the header row (consensus seems to be
>>> "no")
>
> Well, I wouldn't necessarily say we have a consensus on this one. This
> idea received a +1 from Bruce Leban and an "I don't see any reason not
> to" from Steven D'Aprano.
>
> Objections are:
>
> 1. It's a backwards-incompatible change.

All bug fixes are backwards-incompatible changes. The question is, is there anyone relying on this behaviour?

DictReader already ignores blank lines, *except for the very first line*. Using Python 3.3:

py> from io import StringIO
py> from csv import DictReader
py> data = StringIO('spam,ham,eggs\n\n\n\n1,2,3\n\n\n\n\n4,5,6\n')
py> x = DictReader(data)
py> next(x)
{'eggs': '3', 'ham': '2', 'spam': '1'}
py> next(x)
{'eggs': '6', 'ham': '5', 'spam': '4'}

I don't expect that there is anyone relying on a CSV file with a leading blank line to be treated as one having no columns at all:

py> data = StringIO('\n\n\n\nspam,ham,eggs\n1,2,3\n4,5,6\n')
py> x = DictReader(data)
py> next(x)
{None: ['spam', 'ham', 'eggs']}
py> x.fieldnames
[]

I expect that there is probably code that works around this issue, by skipping blank lines somehow, e.g.

    DictReader(row for row in data if row.strip())

These work-arounds may (or may not) be fragile or buggy, but they ought to continue working even if DictReader changes its header detection.

--
Steven

From shane at umbrellacode.com Fri Jan 25 01:05:43 2013
From: shane at umbrellacode.com (Shane Green)
Date: Thu, 24 Jan 2013 16:05:43 -0800
Subject: [Python-ideas] csv.DictReader could handle headers more intelligently.
In-Reply-To: <5101C082.9070702@pearwood.info>
References: <1358903168.4767.4.camel@webb> <51007FCC.5090400@pearwood.info> <201301241047.17391.mark.hackett@metoffice.gov.uk> <20130124133858.32622f6e@pitrou.net> <1359040294.4802.29.camel@gdoba.domain.local> <2E02A587-DC04-4385-8A36-C41160E33E98@umbrellacode.com> <1359043696.4802.42.camel@gdoba.domain.local> <5101C082.9070702@pearwood.info>
Message-ID:

If this is part of the same response?

> A row of delimiters should be treated by the reader object as a row with
> explicitly empty fields. If the caller wishes to discard them, they can.
> But the reader object shouldn't make that decision.
>
> An empty row, on the other hand, should be just ignored. DictReader *already*
> ignores empty rows, provided that they are not in the first row.

Then I think my description was unclear. I wasn't suggesting we add methods for manipulating individual headers, only for telling the DictReader to drop existing headers and re-evaluate them on the next row. To make it easy to do something like

    while not any(records.fieldnames):
        records.discard_fieldnames()  # or something to that effect?

without changing any existing behaviour.

Shane Green
www.umbrellacode.com
805-452-9666 | shane at umbrellacode.com

On Jan 24, 2013, at 3:15 PM, Steven D'Aprano wrote:
Cliff Dyer wrote: >> On Thu, 2013-01-24 at 07:28 -0800, Shane Green wrote: >>> Since every form of CSV file counts EOL as a line terminator, I think >>> discarding empty lines preceding the headers is arguably acceptable, >>> but do not think discarding lines of just delimiters would be. What >>> about extending the DictReader API so it was easy to perform these >>> actions explicitly, such as being able to discard() the field names to >>> be re-evaluated on the next line? >> >> I think I like this idea. There's something a little distasteful about >> making the user manually delve into the underlying reader, but this >> makes it more user-friendly and more obvious how to proceed. > > I couldn't disagree more. I think: > > - it adds burden to the caller, since the caller is now expected to manually > inspect the field names and decide whether some should be discarded; > > - it is less obvious: *how* does the caller decide that there are too many > field names? > > - incomplete: if there is a discard(), where is the add()? > > - completely irrelevant for the topic being discussed ("DictReader should > ignore leading blank lines... I know, let's give the caller the ability > to *discard* field names" -- but auto-detecting *too many* field names is > not the problem); > > - and being able to change the field names on the fly is so far beyond > anything required for ordinary CSV that it doesn't belong in the CSV > module. > > >> For clarity's sake, what is your objection to discarding lines of >> delimiters? The reason I suggest doing it is that it is a common output > >> situation when exporting Excel files or LibreCalc files that have a >> blank row at the top. > > > A row of delimiters should be treated by the reader object as a row with > explicitly empty fields. If the caller wishes to discard them, they can. > But the reader object shouldn't make that decision. > > An empty row, on the other hand, should be just ignored. DictReader *already* > ignores empty rows, provided that they are not in the first row. > > > > -- > Steven > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas -------------- next part -------------- An HTML attachment was scrubbed... URL: From wuwei23 at gmail.com Fri Jan 25 02:49:53 2013 From: wuwei23 at gmail.com (alex23) Date: Thu, 24 Jan 2013 17:49:53 -0800 (PST) Subject: [Python-ideas] csv.DictReader could handle headers more intelligently. In-Reply-To: References: <1358903168.4767.4.camel@webb> <5d57225f-436a-49db-b882-68bbf80b9e71@t6g2000pba.googlegroups.com> Message-ID: <25963d3e-a6fa-4737-bbd1-04ba0454f793@ro7g2000pbb.googlegroups.com> On 25 Jan, 06:35, Serhiy Storchaka wrote:
> On 23.01.13 03:51, alex23 wrote:
> >     with open('malformed.csv','rb') as csvfile:
> >         csvlines = list(l for l in csvfile if l.strip())
> >         csvreader = DictReader(csvlines)
>
> csvreader = DictReader(l for l in csvfile if l.strip())

Uh, thanks, although I'm not sure what you think you're showing me that I'm not already aware of. I spelled it out as two separate expressions for clarity, I didn't realise we were playing code golf in our examples. From stephen at xemacs.org Fri Jan 25 03:38:30 2013 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Fri, 25 Jan 2013 11:38:30 +0900 Subject: [Python-ideas] csv.DictReader could handle headers more intelligently.
In-Reply-To: <5101C082.9070702@pearwood.info> References: <1358903168.4767.4.camel@webb> <51007FCC.5090400@pearwood.info> <201301241047.17391.mark.hackett@metoffice.gov.uk> <20130124133858.32622f6e@pitrou.net> <1359040294.4802.29.camel@gdoba.domain.local> <2E02A587-DC04-4385-8A36-C41160E33E98@umbrellacode.com> <1359043696.4802.42.camel@gdoba.domain.local> <5101C082.9070702@pearwood.info> Message-ID: <877gn23s2h.fsf@uwakimon.sk.tsukuba.ac.jp> Steven D'Aprano writes: > - it adds burden to the caller, since the caller is now expected to > manually inspect the field names and decide whether some should > be discarded; It's a dirty job but somebody has to do it. And that ultimately has to be the *writer* of the CSV file, not the reader. Both csv.DictReader and the caller are merely guessing unless there's a private agreement with the writer. cvs.DictReader, as a stdlib module, can't know about that agreement. The caller can (although one obvious use case for csv.DictReader is that the caller doesn't and is hoping csv.DictReader can guess better, oops). Unless somebody has figured out how to give stdlib code "channeling" capability? From ethan at stoneleaf.us Fri Jan 25 04:20:23 2013 From: ethan at stoneleaf.us (Ethan Furman) Date: Thu, 24 Jan 2013 19:20:23 -0800 Subject: [Python-ideas] csv.DictReader could handle headers more intelligently. In-Reply-To: <201301241047.17391.mark.hackett@metoffice.gov.uk> References: <1358903168.4767.4.camel@webb> <51007FCC.5090400@pearwood.info> <201301241047.17391.mark.hackett@metoffice.gov.uk> Message-ID: <5101F9F7.1070301@stoneleaf.us> On 01/24/2013 02:47 AM, Mark Hackett wrote: > On Thursday 24 Jan 2013, Steven D'Aprano wrote: > >>> I'm not sure this behavior merits the all-caps "AS EXPECTED" label. >>> It's not terribly surprising once you sit down and think about it, but >>> it's certainly at least a little unexpected to me that data is being >>> thrown away with no notice. It's unusual for errors to pass silently >>> in python. >> >> Yes, we should not forget that a CSV file is not a dict. Just because >> DictReader is implemented with a dict as the storage, doesn't mean that it >> should behave exactly like a dict in all things. Multiple columns with the >> same name are legal in CSV, so there should be a reader for that >> situation. >> > > But just because it's reading a csv file, we shouldn't change how a dictionary > works if you add the same key again. The proposal is not to change how a dict works, but what the proper response is for DictReader when a duplicate key is found. ~Ethan~ From ethan at stoneleaf.us Fri Jan 25 04:25:38 2013 From: ethan at stoneleaf.us (Ethan Furman) Date: Thu, 24 Jan 2013 19:25:38 -0800 Subject: [Python-ideas] csv.DictReader could handle headers more intelligently. In-Reply-To: <1358903168.4767.4.camel@webb> References: <1358903168.4767.4.camel@webb> Message-ID: <5101FB32.7030306@stoneleaf.us> On 01/22/2013 05:06 PM, J. Cliff Dyer wrote: > Thoughts? Do folks think this is worth adding to the csv library, or > should I just keep using my subclass? +1 for ignoring blank lines (including delimiter-only lines) +1 for raising an exception on duplicate headers +1 for a flag to not raise on duplicate empty headers (but a completely empty header line is still ignored) ~Ethan~ From tjreedy at udel.edu Fri Jan 25 05:26:19 2013 From: tjreedy at udel.edu (Terry Reedy) Date: Thu, 24 Jan 2013 23:26:19 -0500 Subject: [Python-ideas] csv.DictReader could handle headers more intelligently. 
In-Reply-To: <5101C98F.1000704@pearwood.info> References: <1358903168.4767.4.camel@webb> <51007FCC.5090400@pearwood.info> <201301241047.17391.mark.hackett@metoffice.gov.uk> <20130124133858.32622f6e@pitrou.net> <1359040294.4802.29.camel@gdoba.domain.local> <5101C98F.1000704@pearwood.info> Message-ID: On 1/24/2013 6:53 PM, Steven D'Aprano wrote: > DictReader already ignores blank lines, *except for the very first line*. Interesting. A proper csv file does not contain blank lines. The csv doc is silent on what it does when they are present. (The word 'blank' does not appear.) Ignoring them seems reasonable, but then all should be ignored. And the doc should say so.

> Using Python 3.3:
>
> py> from io import StringIO
> py> from csv import DictReader
> py> data = StringIO('spam,ham,eggs\n\n\n\n1,2,3\n\n\n\n\n4,5,6\n')
> py> x = DictReader(data)
> py> next(x)
> {'eggs': '3', 'ham': '2', 'spam': '1'}
> py> next(x)
> {'eggs': '6', 'ham': '5', 'spam': '4'}
>
> I don't expect that there is anyone relying on a CSV file with a leading > blank line to be treated as one having no columns at all:
>
> py> data = StringIO('\n\n\n\nspam,ham,eggs\n1,2,3\n4,5,6\n')
> py> x = DictReader(data)
> py> next(x)
> {None: ['spam', 'ham', 'eggs']}
> py> x.fieldnames
> []
>
> I expect that there is probably code that works around this issue, by > skipping blank lines somehow, e.g.
>
> DictReader(row for row in data if row.strip())
>
> These work-arounds may (or not) be fragile or buggy, but they ought > to continue working even if DictReader changes its header detection.

-- Terry Jan Reedy

From storchaka at gmail.com Fri Jan 25 11:01:08 2013 From: storchaka at gmail.com (Serhiy Storchaka) Date: Fri, 25 Jan 2013 12:01:08 +0200 Subject: [Python-ideas] csv.DictReader could handle headers more intelligently. In-Reply-To: <25963d3e-a6fa-4737-bbd1-04ba0454f793@ro7g2000pbb.googlegroups.com> References: <1358903168.4767.4.camel@webb> <5d57225f-436a-49db-b882-68bbf80b9e71@t6g2000pba.googlegroups.com> <25963d3e-a6fa-4737-bbd1-04ba0454f793@ro7g2000pbb.googlegroups.com> Message-ID: On 25.01.13 03:49, alex23 wrote: > On 25 Jan, 06:35, Serhiy Storchaka wrote: >> csvreader = DictReader(l for l in csvfile if l.strip()) > > Uh, thanks, although I'm not sure what you think you're showing me > that I'm not already aware of. I spelled it out as two separate > expressions for clarity, I didn't realise we were playing code golf in > our examples. My point is that you don't need to read the whole file into memory. You can use an iterator and process it line by line. From mark.hackett at metoffice.gov.uk Fri Jan 25 11:58:28 2013 From: mark.hackett at metoffice.gov.uk (Mark Hackett) Date: Fri, 25 Jan 2013 10:58:28 +0000 Subject: [Python-ideas] csv.DictReader could handle headers more intelligently. In-Reply-To: <5101F9F7.1070301@stoneleaf.us> References: <1358903168.4767.4.camel@webb> <201301241047.17391.mark.hackett@metoffice.gov.uk> <5101F9F7.1070301@stoneleaf.us> Message-ID: <201301251058.28531.mark.hackett@metoffice.gov.uk> On Friday 25 Jan 2013, Ethan Furman wrote: > On 01/24/2013 02:47 AM, Mark Hackett wrote: > > On Thursday 24 Jan 2013, Steven D'Aprano wrote: > >>> I'm not sure this behavior merits the all-caps "AS EXPECTED" label. > >>> It's not terribly surprising once you sit down and think about it, but > >>> it's certainly at least a little unexpected to me that data is being > >>> thrown away with no notice. It's unusual for errors to pass silently > >>> in python. > >> > >> Yes, we should not forget that a CSV file is not a dict.
Just because > >> DictReader is implemented with a dict as the storage, doesn't mean > >> that it should behave exactly like a dict in all things. Multiple > >> columns with the same name are legal in CSV, so there should be a reader > >> for that situation. > > > > But just because it's reading a csv file, we shouldn't change how a > > dictionary works if you add the same key again. > > The proposal is not to change how a dict works, but what the proper > response is for DictReader when a duplicate key is found. >

Ethan, the proposal is predicated on the "silent abandonment" being unexpected (which isn't actually the case, any more than

    a=4
    a=9

silently abandons the 4). Except, just like the assignment in the aside above, this is entirely what IS expected if you're putting a CSV line into a dictionary with duplicate key names. If you don't want it to do what a dictionary does, then don't use DictReader, as Chris proposes. My only niggle with that idea is that you'd be writing a lot of "SumptyReader" classes, one for each case, which is redundant. But that may, in practice, be no problem at all. If you didn't want it to do what a dict does, don't use a dict. From mark.hackett at metoffice.gov.uk Fri Jan 25 12:00:31 2013 From: mark.hackett at metoffice.gov.uk (Mark Hackett) Date: Fri, 25 Jan 2013 11:00:31 +0000 Subject: [Python-ideas] csv.DictReader could handle headers more intelligently. In-Reply-To: <5101C082.9070702@pearwood.info> References: <1358903168.4767.4.camel@webb> <1359043696.4802.42.camel@gdoba.domain.local> <5101C082.9070702@pearwood.info> Message-ID: <201301251100.31153.mark.hackett@metoffice.gov.uk> On Thursday 24 Jan 2013, Steven D'Aprano wrote: > - it is less obvious: how does the caller decide that there are too many > field names? > Additionally, the user of the library now has to read much more about the library (either code or documentation, which has to track the code too), to decide what it is going to do. If you have to read the code, then it's not really OO, is it. It's light grey, not black box. From ncoghlan at gmail.com Fri Jan 25 12:09:43 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 25 Jan 2013 21:09:43 +1000 Subject: [Python-ideas] PEP 3156: getting the socket or peer name from the transport In-Reply-To: References: <2BC319B1-026C-4255-B9D9-991A163CED7A@gmail.com> Message-ID: On Fri, Jan 25, 2013 at 7:50 AM, Guido van Rossum wrote: > I think that goes too far. It doesn't look like getpeername() goes out > to the network -- what other use case did you have in mind? I don't have one, so YAGNI sounds like a good answer to me. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ronaldoussoren at mac.com Fri Jan 25 12:24:58 2013 From: ronaldoussoren at mac.com (Ronald Oussoren) Date: Fri, 25 Jan 2013 12:24:58 +0100 Subject: [Python-ideas] PEP 3156: getting the socket or peer name from the transport In-Reply-To: References: <2BC319B1-026C-4255-B9D9-991A163CED7A@gmail.com> Message-ID: <2E80A011-E5E2-4954-9FAC-B2B0E6CBA509@mac.com> On 24 Jan, 2013, at 22:50, Guido van Rossum wrote: > On Thu, Jan 24, 2013 at 12:51 PM, Nick Coghlan wrote: >> On Fri, Jan 25, 2013 at 5:12 AM, Guido van Rossum wrote: >>> On Thu, Jan 24, 2013 at 11:05 AM, Nikolay Kim wrote: >>>> transport could have dictionary attribute where it can store optional information >>>> like socket name, peer name or file path, etc. >>> >>> Aha, that makes some sense.
Though maybe it shouldn't be a dict -- it >>> may be expensive to populate some values in some cases, so maybe there >>> should just be a method transport.get_extra_info('key') which computes >>> and returns (and possibly caches) certain values but returns None if >>> the info is not supported. E.g. get_extra_info('name'), >>> get_extra_info('peer'). This API makes it pretty clear that the caller >>> should check the value for None before using it. >> >> A "get_extra_info" API like that is also amenable to providing an >> explicit default for the "key not present" case, and makes it clearer >> that the calculations involved may not be cheap. > > Yeah, the signature could be get_extra_info(key, default=None). > >> You could even go so >> far as to have it return a Future, allowing it to be used for info >> that requires network activity. > > I think that goes too far. It doesn't look like getpeername() goes out > to the network -- what other use case did you have in mind? (I suppose > it could use a Future for some keys only -- but then the caller would > still need to be aware that it could return None instead of a Future, > so it would be somewhat awkward to use -- you couldn't write A transport that tunnels traffic over a SOCKS or SSH tunnel might require network access to get the sockname or peername of the proxied connection. I don't know enough about either protocol to know for sure, and the information could also be fetched during connection setup and then cached. Ronald From ethan at stoneleaf.us Fri Jan 25 17:30:25 2013 From: ethan at stoneleaf.us (Ethan Furman) Date: Fri, 25 Jan 2013 08:30:25 -0800 Subject: [Python-ideas] csv.DictReader could handle headers more intelligently. In-Reply-To: <201301251100.31153.mark.hackett@metoffice.gov.uk> References: <1358903168.4767.4.camel@webb> <1359043696.4802.42.camel@gdoba.domain.local> <5101C082.9070702@pearwood.info> <201301251100.31153.mark.hackett@metoffice.gov.uk> Message-ID: <5102B321.8070706@stoneleaf.us> On 01/25/2013 03:00 AM, Mark Hackett wrote: > On Thursday 24 Jan 2013, Steven D'Aprano wrote: >> - it is less obvious: how does the caller decide that there are too many >> field names? >> > > Additionally, the user of the library now has to read much more about the > library (either code or documentation, which has to track the code too), to > decide what it is going to do. > > If you have to read the code, then it's not really OO, is it. It's light grey, > not black box. If you have to read the code, the documentation needs improvement. ~Ethan~ From mark.hackett at metoffice.gov.uk Fri Jan 25 17:53:46 2013 From: mark.hackett at metoffice.gov.uk (Mark Hackett) Date: Fri, 25 Jan 2013 16:53:46 +0000 Subject: [Python-ideas] csv.DictReader could handle headers more intelligently. In-Reply-To: <5102B321.8070706@stoneleaf.us> References: <1358903168.4767.4.camel@webb> <201301251100.31153.mark.hackett@metoffice.gov.uk> <5102B321.8070706@stoneleaf.us> Message-ID: <201301251653.46558.mark.hackett@metoffice.gov.uk> On Friday 25 Jan 2013, Ethan Furman wrote: > On 01/25/2013 03:00 AM, Mark Hackett wrote: > > On Thursday 24 Jan 2013, Steven D'Aprano wrote: > >> - it is less obvious: how does the caller decide that there are too many > >> field names? > > > > Additionally, the user of the library now has to read much more about the > > library (either code or documentation, which has to track the code too), > > to decide what it is going to do. > > > > If you have to read the code, then it's not really OO, is it. 
It's light > > grey, not black box. > > If you have to read the code, the documentation needs improvement. > And if you put your feet too close to the fire, your feet will burn. Neither have anything to do with the subject at hand, however. Which is if a dictionary acts a certain way and calling a routine that creates a dictionary AND WORKS DIFFERENTLY, then why did you use a routine that creates a dictionary? You see, the option here is to leave it operating as a dictionary operates. And in that case, you do not need to document anything. The documentation of how it works is already covered by the python basics: "How does a dictionary work in Python?". So don't change it, and you don't have to improve the documentation. From guido at python.org Fri Jan 25 18:47:35 2013 From: guido at python.org (Guido van Rossum) Date: Fri, 25 Jan 2013 09:47:35 -0800 Subject: [Python-ideas] PEP 3156: getting the socket or peer name from the transport In-Reply-To: <2E80A011-E5E2-4954-9FAC-B2B0E6CBA509@mac.com> References: <2BC319B1-026C-4255-B9D9-991A163CED7A@gmail.com> <2E80A011-E5E2-4954-9FAC-B2B0E6CBA509@mac.com> Message-ID: On Fri, Jan 25, 2013 at 3:24 AM, Ronald Oussoren wrote: > > On 24 Jan, 2013, at 22:50, Guido van Rossum wrote: > > > On Thu, Jan 24, 2013 at 12:51 PM, Nick Coghlan > wrote: > >> On Fri, Jan 25, 2013 at 5:12 AM, Guido van Rossum > wrote: > >>> On Thu, Jan 24, 2013 at 11:05 AM, Nikolay Kim > wrote: > >>>> transport could have dictionary attribute where it can store optional > information > >>>> like socket name, peer name or file path, etc. > >>> > >>> Aha, that makes some sense. Though maybe it shouldn't be a dict -- it > >>> may be expensive to populate some values in some cases, so maybe there > >>> should just be a method transport.get_extra_info('key') which computes > >>> and returns (and possibly caches) certain values but returns None if > >>> the info is not supported. E.g. get_extra_info('name'), > >>> get_extra_info('peer'). This API makes it pretty clear that the caller > >>> should check the value for None before using it. > >> > >> A "get_extra_info" API like that is also amenable to providing an > >> explicit default for the "key not present" case, and makes it clearer > >> that the calculations involved may not be cheap. > > > > Yeah, the signature could be get_extra_info(key, default=None). > > > >> You could even go so > >> far as to have it return a Future, allowing it to be used for info > >> that requires network activity. > > > > I think that goes too far. It doesn't look like getpeername() goes out > > to the network -- what other use case did you have in mind? (I suppose > > it could use a Future for some keys only -- but then the caller would > > still need to be aware that it could return None instead of a Future, > > so it would be somewhat awkward to use -- you couldn't write > > A transport that tunnels traffic over a SOCKS or SSH tunnel might require > network access to get the sockname or peername of the proxied connection. I > don't know enough about either protocol to know for sure, and the > information could also be fetched during connection setup and then cached. Sounds good (to fetch it proactively ahead of time, rather than inject a Future into the API). -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ethan at stoneleaf.us Fri Jan 25 17:48:43 2013 From: ethan at stoneleaf.us (Ethan Furman) Date: Fri, 25 Jan 2013 08:48:43 -0800 Subject: [Python-ideas] csv.DictReader could handle headers more intelligently. In-Reply-To: <201301251058.28531.mark.hackett@metoffice.gov.uk> References: <1358903168.4767.4.camel@webb> <201301241047.17391.mark.hackett@metoffice.gov.uk> <5101F9F7.1070301@stoneleaf.us> <201301251058.28531.mark.hackett@metoffice.gov.uk> Message-ID: <5102B76B.2080106@stoneleaf.us> On 01/25/2013 02:58 AM, Mark Hackett wrote: > On Friday 25 Jan 2013, Ethan Furman wrote: >> On 01/24/2013 02:47 AM, Mark Hackett wrote: >>> On Thursday 24 Jan 2013, Steven D'Aprano wrote: >>>>> I'm not sure this behavior merits the all-caps "AS EXPECTED" label. >>>>> It's not terribly surprising once you sit down and think about it, but >>>>> it's certainly at least a little unexpected to me that data is being >>>>> thrown away with no notice. It's unusual for errors to pass silently >>>>> in python. >>>> >>>> Yes, we should not forget that a CSV file is not a dict. Just because >>>> DictReader is implemented with a dict as the storage, doesn't mean >>>> that it should behave exactly like a dict in all things. Multiple >>>> columns with the same name are legal in CSV, so there should be a reader >>>> for that situation. >>> >>> But just because it's reading a csv file, we shouldn't change how a >>> dictionary works if you add the same key again. >> >> The proposal is not to change how a dict works, but what the proper >> response is for DictReader when a duplicate key is found. > > Ethan, the proposal is predicated on the "silent abandonment" (which isn't > actually the case any more than doing: > > a=4 > a=9 > > is abandoning silently the 4.) being unexpected. We're going to have to agree to disagree on this point -- I think there is a huge difference between reassigning a variable which is completely under your control from losing entire columns of data from a file which you may have never seen before. > Except, just like the assignment in the aside above, this is entirely what IS > expected if you're putting a CSV line into a dictionary with duplicate key > names. Expected by whom? The library writer? Sure. The application writer? Maybe. The person creating the spreadsheet that's going to be dumped to csv to be imported into the program that thought, "This field also needs an item number... I'll call it 'item_no', just like that other column" -- Nope. > If you don't want it to do what a dictionary does, then don't use DictReader, > as Chris proposes. DictReader puts a name on a column -- that's its primary use; I don't think the designers had the goal of dropping data when they implemented it -- I suspect it was just missed as a possibility (not being the "normal" type of csv file) or putting a warning in the docs was missed. ~Ethan~ From rurpy at yahoo.com Fri Jan 25 19:03:03 2013 From: rurpy at yahoo.com (rurpy at yahoo.com) Date: Fri, 25 Jan 2013 10:03:03 -0800 (PST) Subject: [Python-ideas] csv.DictReader could handle headers more intelligently. 
In-Reply-To: <201301251653.46558.mark.hackett@metoffice.gov.uk> References: <1358903168.4767.4.camel@webb> <201301251100.31153.mark.hackett@metoffice.gov.uk> <5102B321.8070706@stoneleaf.us> <201301251653.46558.mark.hackett@metoffice.gov.uk> Message-ID: <17bba319-ff53-41a6-8ada-3cd3ad036076@googlegroups.com> On 01/25/2013 09:53 AM, Mark Hackett wrote: > On Friday 25 Jan 2013, Ethan Furman wrote: >> On 01/25/2013 03:00 AM, Mark Hackett wrote: >> > On Thursday 24 Jan 2013, Steven D'Aprano wrote: >> >> - it is less obvious: how does the caller decide that there are too many >> >> field names? >> > >> > Additionally, the user of the library now has to read much more about the >> > library (either code or documentation, which has to track the code too), >> > to decide what it is going to do. >> > >> > If you have to read the code, then it's not really OO, is it. It's light >> > grey, not black box. >> >> If you have to read the code, the documentation needs improvement. >> > > And if you put your feet too close to the fire, your feet will burn. > > Neither have anything to do with the subject at hand, however. > > Which is if a dictionary acts a certain way and calling a routine that creates > a dictionary AND WORKS DIFFERENTLY, then why did you use a routine that > creates a dictionary? > > You see, the option here is to leave it operating as a dictionary operates. > And in that case, you do not need to document anything. The documentation of > how it works is already covered by the python basics: "How does a dictionary > work in Python?". The csv DictReader *uses* a dictionary for its output. That it does so imposes no requirements on how it should parse or otherwise handle the input that eventually goes into that dict. I can understand the appeal of keeping things simple and simply cramming whatever comes out of a simple parse of the header into the dict keys. Simplicity is good and that is a valid opinion. However it is not a-priori the obviously best one no matter how much hand-waving and foot stomping comes with it. I would prefer to see a suppressible exception when header keys are duplicated on the grounds that such a csv file is not in general an appropriate input for the DictReader. > So don't change it, and you don't have to improve the documentation. If it's not changed then documentation definitely should be fixed. The very fact that when the behaviour was pointed out here, the result was a long discussion rather than one or two responses that said, "of course it behaves that way" is the strongest evidence that the current description is inadequate. -------------- next part -------------- An HTML attachment was scrubbed... URL: From fafhrd91 at gmail.com Fri Jan 25 19:03:49 2013 From: fafhrd91 at gmail.com (Nikolay Kim) Date: Fri, 25 Jan 2013 10:03:49 -0800 Subject: [Python-ideas] PEP 3156: Transport.sendfile In-Reply-To: References: <2BC319B1-026C-4255-B9D9-991A163CED7A@gmail.com> <2E80A011-E5E2-4954-9FAC-B2B0E6CBA509@mac.com> Message-ID: I think Transport needs 'sendfile' api, something like:

    @tasks.coroutine
    def sendfile(self, fd, offset, nbytes):
        ...

otherwise it is impossible to implement sendfile without breaking transport encapsulation.
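Something along these lines could work, as a rough sketch only (it assumes the transport keeps its socket and event loop as _sock and _loop attributes, that the loop has add_writer()/remove_writer(), and that a plain futures.Future can be waited on with yield from; none of those names are settled API):

    import os

    @tasks.coroutine
    def sendfile(self, fd, offset, nbytes):
        # Push nbytes from fd, starting at offset, through the socket
        # with zero-copy os.sendfile(), yielding to the event loop
        # whenever the socket's buffer is full.
        total = 0
        while total < nbytes:
            try:
                sent = os.sendfile(self._sock.fileno(), fd,
                                   offset + total, nbytes - total)
            except BlockingIOError:
                # Socket not writable right now; wait until it is.
                waiter = futures.Future()
                self._loop.add_writer(self._sock.fileno(),
                                      waiter.set_result, None)
                try:
                    yield from waiter
                finally:
                    self._loop.remove_writer(self._sock.fileno())
                continue
            if sent == 0:
                break  # hit EOF before nbytes bytes were sent
            total += sent
        return total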
From guido at python.org Fri Jan 25 19:08:46 2013 From: guido at python.org (Guido van Rossum) Date: Fri, 25 Jan 2013 10:08:46 -0800 Subject: [Python-ideas] PEP 3156: Transport.sendfile In-Reply-To: References: <2BC319B1-026C-4255-B9D9-991A163CED7A@gmail.com> <2E80A011-E5E2-4954-9FAC-B2B0E6CBA509@mac.com> Message-ID: On Fri, Jan 25, 2013 at 10:03 AM, Nikolay Kim wrote:

> I think Transport needs 'sendfile' api, something like:
>
> @tasks.coroutine
> def sendfile(self, fd, offset, nbytes):
>     ...
>
> otherwise it is impossible to implement sendfile without breaking
> transport encapsulation

Really? Can't the user write this themselves? What's wrong with this:

    while True:
        data = os.read(fd, 16*1024)
        if not data: break
        transport.write(data)

(Perhaps augmented with a way to respond to pause() requests.) -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From fafhrd91 at gmail.com Fri Jan 25 19:11:37 2013 From: fafhrd91 at gmail.com (Nikolay Kim) Date: Fri, 25 Jan 2013 10:11:37 -0800 Subject: [Python-ideas] PEP 3156: Transport.sendfile In-Reply-To: References: <2BC319B1-026C-4255-B9D9-991A163CED7A@gmail.com> <2E80A011-E5E2-4954-9FAC-B2B0E6CBA509@mac.com> Message-ID: On Jan 25, 2013, at 10:08 AM, Guido van Rossum wrote:

> On Fri, Jan 25, 2013 at 10:03 AM, Nikolay Kim wrote:
>
> I think Transport needs 'sendfile' api, something like:
>
> @tasks.coroutine
> def sendfile(self, fd, offset, nbytes):
>     ...
>
> otherwise it is impossible to implement sendfile without breaking transport encapsulation
>
> Really? Can't the user write this themselves? What's wrong with this:
>
> while True:
>     data = os.read(fd, 16*1024)
>     if not data: break
>     transport.write(data)
>
> (Perhaps augmented with a way to respond to pause() requests.)

i mean 'os.sendfile()', zero-copy sendfile. -------------- next part -------------- An HTML attachment was scrubbed...
URL: From fafhrd91 at gmail.com Fri Jan 25 21:25:50 2013 From: fafhrd91 at gmail.com (Nikolay Kim) Date: Fri, 25 Jan 2013 12:25:50 -0800 Subject: [Python-ideas] PEP 3156: Transport.sendfile In-Reply-To: References: <2BC319B1-026C-4255-B9D9-991A163CED7A@gmail.com> <2E80A011-E5E2-4954-9FAC-B2B0E6CBA509@mac.com> Message-ID: On Jan 25, 2013, at 12:04 PM, Guido van Rossum wrote: > On Fri, Jan 25, 2013 at 10:11 AM, Nikolay Kim wrote: > > On Jan 25, 2013, at 10:08 AM, Guido van Rossum wrote: > >> On Fri, Jan 25, 2013 at 10:03 AM, Nikolay Kim wrote: >> >> I think Transport needs 'sendfile' api, something like: >> >> @tasks.coroutine >> def sendfile(self, fd, offset, nbytes): >> ?. >> >> otherwise it is impossible to implement sendfile without breaking transport encapsulation >> >> Really? Can't the user write this themselves? What's wrong with this: >> >> while True: >> data = os.read(fd, 16*1024) >> if not data: break >> transport.write(data) >> >> (Perhaps augmented with a way to respond to pause() requests.) > > i mean 'os.sendfile()', zero-copy sendfile. > > I see (http://docs.python.org/dev/library/os.html#os.sendfile). > > Hm, that function is so platform-specific that we might as well force users to do it this way: > > sock = transport.get_extra_info("socket") > if sock is not None: > os.sendfile(sock.fileno(), ......) > else: > there should some kind of way to flush write buffer or write callbacks. sock = transport.get_extra_info("socket") if sock is not None: os.sendfile(sock.fileno(), ......) else: yield from transport.write_buffer_flush() -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Fri Jan 25 21:28:28 2013 From: guido at python.org (Guido van Rossum) Date: Fri, 25 Jan 2013 12:28:28 -0800 Subject: [Python-ideas] PEP 3156: Transport.sendfile In-Reply-To: References: <2BC319B1-026C-4255-B9D9-991A163CED7A@gmail.com> <2E80A011-E5E2-4954-9FAC-B2B0E6CBA509@mac.com> Message-ID: On Fri, Jan 25, 2013 at 12:25 PM, Nikolay Kim wrote: > > On Jan 25, 2013, at 12:04 PM, Guido van Rossum wrote: > > On Fri, Jan 25, 2013 at 10:11 AM, Nikolay Kim wrote: > >> >> On Jan 25, 2013, at 10:08 AM, Guido van Rossum wrote: >> >> On Fri, Jan 25, 2013 at 10:03 AM, Nikolay Kim wrote: >> >>> >>> I think Transport needs 'sendfile' api, something like: >>> >>> @tasks.coroutine >>> def sendfile(self, fd, offset, nbytes): >>> ?. >>> >>> otherwise it is impossible to implement sendfile without breaking >>> transport encapsulation >> >> >> Really? Can't the user write this themselves? What's wrong with this: >> >> while True: >> data = os.read(fd, 16*1024) >> if not data: break >> transport.write(data) >> >> (Perhaps augmented with a way to respond to pause() requests.) >> >> >> i mean 'os.sendfile()', zero-copy sendfile. >> > > I see (http://docs.python.org/dev/library/os.html#os.sendfile). > > Hm, that function is so platform-specific that we might as well force > users to do it this way: > > sock = transport.get_extra_info("socket") > if sock is not None: > os.sendfile(sock.fileno(), ......) > else: > > > > there should some kind of way to flush write buffer or write callbacks. > > sock = transport.get_extra_info("socket") > if sock is not None: > os.sendfile(sock.fileno(), ......) > else: > yield from transport.write_buffer_flush() > > Oh, that's an interesting idea in its own right. But I'm not sure Twisted could implement this given that their flow control works differently. However, I think you've convinced me that offering sendfile() is actually better. 
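From the calling side it might look something like this rough sketch (serve_file() and the exact signature are made up here, and error handling is ignored; only os.fstat() and os.sendfile() are existing APIs):

    import os

    @tasks.coroutine
    def serve_file(transport, path):
        # Hypothetical usage: the transport does zero-copy os.sendfile()
        # when it wraps a real socket, and falls back to plain write()
        # calls otherwise.
        with open(path, 'rb') as f:
            size = os.fstat(f.fileno()).st_size
            yield from transport.sendfile(f.fileno(), 0, size)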
But should it take a file descriptor or a stream (file) object? -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From g.rodola at gmail.com Fri Jan 25 21:44:14 2013 From: g.rodola at gmail.com (=?ISO-8859-1?Q?Giampaolo_Rodol=E0?=) Date: Fri, 25 Jan 2013 21:44:14 +0100 Subject: [Python-ideas] PEP 3156: Transport.sendfile In-Reply-To: References: <2BC319B1-026C-4255-B9D9-991A163CED7A@gmail.com> <2E80A011-E5E2-4954-9FAC-B2B0E6CBA509@mac.com> Message-ID: In principle os.sendfile() is not too different than socket.send(): they share the same return value (no. of bytes sent) and errors, hence it's pretty straightforward to implement (the user could even just override Transport.write() him/herself). Nonetheless there are other subtle differences (e.g. it works with regular (mmap-like) files only) so that deciding whether to use send() or sendfile() behind the curtains is not a good idea. Transport class should probably provide a separate method (other than write()). Also, I think that *at this point* thinking about adding sendfile() into Tulip is probably premature. --- Giampaolo http://code.google.com/p/pyftpdlib/ http://code.google.com/p/psutil/ http://code.google.com/p/pysendfile/ 2013/1/25 Nikolay Kim : > > On Jan 25, 2013, at 12:04 PM, Guido van Rossum wrote: > > On Fri, Jan 25, 2013 at 10:11 AM, Nikolay Kim wrote: >> >> >> On Jan 25, 2013, at 10:08 AM, Guido van Rossum wrote: >> >> On Fri, Jan 25, 2013 at 10:03 AM, Nikolay Kim wrote: >>> >>> >>> I think Transport needs 'sendfile' api, something like: >>> >>> @tasks.coroutine >>> def sendfile(self, fd, offset, nbytes): >>> ?. >>> >>> otherwise it is impossible to implement sendfile without breaking >>> transport encapsulation >> >> >> Really? Can't the user write this themselves? What's wrong with this: >> >> while True: >> data = os.read(fd, 16*1024) >> if not data: break >> transport.write(data) >> >> (Perhaps augmented with a way to respond to pause() requests.) >> >> >> i mean 'os.sendfile()', zero-copy sendfile. > > > I see (http://docs.python.org/dev/library/os.html#os.sendfile). > > Hm, that function is so platform-specific that we might as well force users > to do it this way: > > sock = transport.get_extra_info("socket") > if sock is not None: > os.sendfile(sock.fileno(), ......) > else: > > > > there should some kind of way to flush write buffer or write callbacks. > > sock = transport.get_extra_info("socket") > if sock is not None: > os.sendfile(sock.fileno(), ......) > else: > yield from transport.write_buffer_flush() > > > > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > From shane at umbrellacode.com Sat Jan 26 12:55:48 2013 From: shane at umbrellacode.com (Shane Green) Date: Sat, 26 Jan 2013 03:55:48 -0800 Subject: [Python-ideas] csv.DictReader could handle headers more intelligently. In-Reply-To: <201301251653.46558.mark.hackett@metoffice.gov.uk> References: <1358903168.4767.4.camel@webb> <201301251100.31153.mark.hackett@metoffice.gov.uk> <5102B321.8070706@stoneleaf.us> <201301251653.46558.mark.hackett@metoffice.gov.uk> Message-ID: Sorry if this is a dupe?it went to the google groups address the first time around, and I think that's different? > I've been trying to avoid the wrath, but can't any longer. 
Let me start by clarifying that I know what a dictionary is, how it works, and what Python is, so we can bypass calling that into question. I also know what CSV is, and I've dealt with a lot of real-life examples of CSV data: not just exports from Excel, log data from the energy management space, sensor values, etc.; critical electrical fault data generated by very legacy, stupid equipment. And while it's true that a dictionary is a dictionary and it works the way it works, the real point that drives home is that it's an inappropriate mechanism for dealing with ordered rows of sequential values. Regardless of what choices were made for the implementation, if the module's name is csv, it should be able to do the things it says it does with any legal CSV content without losing information. Just because it's how a dictionary works doesn't mean column 3's value replacing column 1's value is something other than the loss of data. One CSV file I worked with had headers for five columns of information, then the header "VALUE" for every 5 minute period in an hour. Using this CSV parser would leave the client with one sample an hour: how dictionaries work isn't going to bring back 10 values, so information was lost.
>
> The final point is a simple one: while that CSV file format was stupid, it was perfectly legal. Something that deals with CSV content should not be losing any of its content. It also should [not] be barfing or throwing exceptions, by the way.
> And what about fixing it by replacing it with a class that does it correctly: one that maps values to column numbers and keeps values as lists, modeled after FieldStorage? Make iterating it work just like it does now by replacing the values with the last value in each list before returning it, and provide iterator methods for getting at the new functionality, which includes iterating items with repeating header names in order, etc.; and also iterrecords(), or something like that, to iterate the header: [value, ...] maps?
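Concretely, the idea would be something like this rough sketch (all names are provisional, and dialect options, restkey/restval handling, etc. are glossed over):

    import csv

    class MultiValueDictReader:
        """Sketch: like DictReader, but every header name maps to a
        *list* of values, so duplicate column names lose nothing."""

        def __init__(self, f, dialect='excel', **kwds):
            self.reader = csv.reader(f, dialect, **kwds)
            self.headers = next(self.reader)

        def iterrecords(self):
            # Full-fidelity view: one {header: [value, ...]} map per row.
            for row in self.reader:
                record = {}
                for name, value in zip(self.headers, row):
                    record.setdefault(name, []).append(value)
                yield record

        def __iter__(self):
            # Backwards-compatible view: the last value wins, matching
            # what DictReader silently does today.
            for record in self.iterrecords():
                yield dict((name, values[-1])
                           for name, values in record.items())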
> _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas -------------- next part -------------- An HTML attachment was scrubbed... URL: From stephen at xemacs.org Sat Jan 26 14:53:53 2013 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Sat, 26 Jan 2013 22:53:53 +0900 Subject: [Python-ideas] csv.DictReader could handle headers more intelligently. In-Reply-To: References: <1358903168.4767.4.camel@webb> <201301251100.31153.mark.hackett@metoffice.gov.uk> <5102B321.8070706@stoneleaf.us> <201301251653.46558.mark.hackett@metoffice.gov.uk> Message-ID: <87pq0s2gpa.fsf@uwakimon.sk.tsukuba.ac.jp> Shane Green writes: > And while it's true that a dictionary is a dictionary and it works > the way it works, the real point that drives home is that it's an > inappropriate mechanism for dealing ordered rows of sequential > values. Right! So use csv.reader, or csv.DictReader with an explicit fieldnames argument. The point of csv.DictReader with default fieldnames is to take a "well-behaved" table and turn it into a sequence of "poor-man's" objects. > The final point is a simple one: while that CSV file format was > stupid, it was perfectly legal. Something that deals with CSV > content should not be losing any of its content. That's a reasonable requirement. > It also should [not] be barfing or throwing exceptions, by the way. That's not. As long as the module provides classes capable of handling any CSV format (it does), it may also provide convenience classes for special purposes with restricted formats. Those classes may throw exceptions on input that doesn't satisfy the restrictions. > And what about fixing it by replacing implementing a class that > does it correctly, [...]? Doesn't help users who want automatically detected access-by-name. They must have unique field names. (I don't have a use case. I assume the implementer of csv.DictReader did.) From vito.detullio at gmail.com Sat Jan 26 13:01:11 2013 From: vito.detullio at gmail.com (Vito De Tullio) Date: Sat, 26 Jan 2013 13:01:11 +0100 Subject: [Python-ideas] complex number and fractional exponent Message-ID: Hi. with python3000 there was a lot of fuzz about pep238 (integer division with float result). Today I was stumble upon a similar behaviour, but I did not found a clear reference on the net, regarding an "extension" of the pep to others mathematical operations / data types >>> (-1)**.5 Traceback (most recent call last): File "", line 1, in ValueError: negative number cannot be raised to a fractional power >>> (-1+0j)**.5 (6.123031769111886e-17+1j) (with is "really close" to 1j) There is some ideas about extending the pow() / ** operator to return complex number when necessary? ATM I don't need to work with complex numbers, nor I have strong opinion on the choice, it's more that I'm curious on why was introduced a so big language difference on division and not extended to power exponentiation. thanks note: at the moment I don't have a python3 executable, but I guess this is applicable to it. -- ZeD From shane at umbrellacode.com Sat Jan 26 15:39:11 2013 From: shane at umbrellacode.com (Shane Green) Date: Sat, 26 Jan 2013 06:39:11 -0800 Subject: [Python-ideas] csv.DictReader could handle headers more intelligently. 
In-Reply-To: <87pq0s2gpa.fsf@uwakimon.sk.tsukuba.ac.jp> References: <1358903168.4767.4.camel@webb> <201301251100.31153.mark.hackett@metoffice.gov.uk> <5102B321.8070706@stoneleaf.us> <201301251653.46558.mark.hackett@metoffice.gov.uk> <87pq0s2gpa.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: Okay, I like your point about DictReader having a place with a subset of CSV tables, and agree that, given that definition, it should throw an exception when it's fed something that doesn't conform to this definition. I like that. One thing, though: the new version would let you access column data by name as well. Instead of

    row["timestamp"] == 1359210019.299478

it would be

    row["timestamp"] == [1359210019.299478]

and potentially

    row["timestamp"] == [1359210019.299478, 1359210019.299478]

It could also be accessed as:

    row.headers[0] == "timestamp"
    row.headers[1] == "timestamp"
    row.values[0] == 1359210019.299478
    row.values[1] == 1359210019.299478

Could still provide:

    for name, value in records.iterfirstitems():
        # get the first value for each column with a given name.

- or -

    for name, value in records.iterlastitems():
        # get the last value for each column with a given name.

And the exact functionality you have now:

    records.itervaluemaps()   # or something? just a map(dict(records.iterlastitems()))

Overkill, but really simple things to add? The only thing this really adds to the "convenience" of the current DictReader for well-behaved tables, is the ability to access values sequentially or by name; other than that, the only difference would be iterating on a generator method's output instead of the instance itself. Shane Green www.umbrellacode.com 408-692-4666 | shane at umbrellacode.com On Jan 26, 2013, at 5:53 AM, "Stephen J. Turnbull" wrote: > Shane Green writes: > >> And while it's true that a dictionary is a dictionary and it works >> the way it works, the real point that drives home is that it's an >> inappropriate mechanism for dealing with ordered rows of sequential >> values. > > Right! So use csv.reader, or csv.DictReader with an explicit > fieldnames argument. > > The point of csv.DictReader with default fieldnames is to take a > "well-behaved" table and turn it into a sequence of "poor-man's" > objects. > >> The final point is a simple one: while that CSV file format was >> stupid, it was perfectly legal. Something that deals with CSV >> content should not be losing any of its content. > > That's a reasonable requirement. > >> It also should [not] be barfing or throwing exceptions, by the way. > > That's not. As long as the module provides classes capable of > handling any CSV format (it does), it may also provide convenience > classes for special purposes with restricted formats. Those classes > may throw exceptions on input that doesn't satisfy the restrictions. > >> And what about fixing it by replacing it with a class that >> does it correctly, [...]? > > Doesn't help users who want automatically detected access-by-name. > They must have unique field names. (I don't have a use case. I > assume the implementer of csv.DictReader did.) > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Sat Jan 26 16:01:31 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 27 Jan 2013 01:01:31 +1000 Subject: [Python-ideas] complex number and fractional exponent In-Reply-To: References: Message-ID: On Sat, Jan 26, 2013 at 10:01 PM, Vito De Tullio wrote: > There is some ideas about extending the pow() / ** operator to return > complex number when necessary?
> > ATM I don't need to work with complex numbers, nor I have strong opinion on > the choice, it's more that I'm curious on why was introduced a so big > language difference on division and not extended to power exponentiation. Python 3.2.3 (default, Jun 8 2012, 05:36:09) [GCC 4.7.0 20120507 (Red Hat 4.7.0-5)] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> (-1) ** 0.5 (6.123031769111886e-17+1j) >>> pow(-1, 0.5) (6.123031769111886e-17+1j) The math module is still deliberately restricted to float results, though, as that module is intended to be a reasonably thin wrapper around the platform floating point support. The cmath module and the builtin pow are available if support for complex results is needed. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From storchaka at gmail.com Sat Jan 26 17:01:14 2013 From: storchaka at gmail.com (Serhiy Storchaka) Date: Sat, 26 Jan 2013 18:01:14 +0200 Subject: [Python-ideas] complex number and fractional exponent In-Reply-To: References: Message-ID: On 26.01.13 14:01, Vito De Tullio wrote: > note: at the moment I don't have a python3 executable, but I guess this is > applicable to it. No, it isn't. >>> (-1)**0.5 (6.123031769111886e-17+1j) From dustin at v.igoro.us Sat Jan 26 19:37:22 2013 From: dustin at v.igoro.us (Dustin J. Mitchell) Date: Sat, 26 Jan 2013 13:37:22 -0500 Subject: [Python-ideas] PEP 3156 - Coroutines are more better In-Reply-To: <51009907.8030404@canterbury.ac.nz> References: <51009907.8030404@canterbury.ac.nz> Message-ID: On Wed, Jan 23, 2013 at 9:14 PM, Greg Ewing wrote: > I think I'm going to wait and see what the coroutine-level features > of tulip turn out to be like before saying much more. I think this is pretty smart, actually. Deferreds, futures, promises, etc. give the programmer a lot of rope. They don't require classical models of control flow, in particular. That's cool, but tends to lead to code with subtle bugs. Coroutines re-introduce just enough structure to put programmers back in comfortable territory for verifying correctness. This ends up looking a bit like threads, but with less concern for synchronization primitives, and virtually-free cloning. Dustin From tjreedy at udel.edu Sat Jan 26 20:09:44 2013 From: tjreedy at udel.edu (Terry Reedy) Date: Sat, 26 Jan 2013 14:09:44 -0500 Subject: [Python-ideas] complex number and fractional exponent In-Reply-To: References: Message-ID: On 1/26/2013 7:01 AM, Vito De Tullio wrote: > There is some ideas about extending the pow() / ** operator to return > complex number when necessary? As other have noted, it was. But the analogy with division is not exact. The new // operator was added for when one wants floor(a/b). For sqrt, one also has a choice and always has since complexes were added. >>> import math, cmath >>> math.sqrt(-1) Traceback (most recent call last): File "", line 1, in math.sqrt(-1) ValueError: math domain error >>> cmath.sqrt(-1) 1j For many purposes, such as computing standard deviation in statistics, one wants the exception, as negative variance indicates a calculation error. -- Terry Jan Reedy From oscar.j.benjamin at gmail.com Sat Jan 26 22:22:22 2013 From: oscar.j.benjamin at gmail.com (Oscar Benjamin) Date: Sat, 26 Jan 2013 21:22:22 +0000 Subject: [Python-ideas] complex number and fractional exponent In-Reply-To: References: Message-ID: On 26 January 2013 19:09, Terry Reedy wrote: > > For sqrt, one also has a choice and always has since complexes were added. 
>>>> import math, cmath >>>> math.sqrt(-1) > > Traceback (most recent call last): > File "", line 1, in > math.sqrt(-1) > ValueError: math domain error >>>> cmath.sqrt(-1) > 1j Why does cmath.sqrt give a different value from the __pow__ version? ~$ python3 Python 3.2.3 (default, Oct 19 2012, 19:53:16) [GCC 4.7.2] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> import cmath >>> cmath.sqrt(-1) 1j >>> (-1) ** .5 (6.123031769111886e-17+1j) Oscar From nbvfour at gmail.com Sun Jan 27 00:27:39 2013 From: nbvfour at gmail.com (nbv4) Date: Sat, 26 Jan 2013 15:27:39 -0800 (PST) Subject: [Python-ideas] built-in argspec for function objects Message-ID: def my_function(a, b=c): pass >>> my_function.args ['a'] >>> my_function.kwargs {'b': c} >>> my_function.all_args ['a', 'b'] What do you all think? Argspec is kind of unwieldy, this I think looks a lot nicer. -------------- next part -------------- An HTML attachment was scrubbed... URL: From oscar.j.benjamin at gmail.com Sun Jan 27 01:09:15 2013 From: oscar.j.benjamin at gmail.com (Oscar Benjamin) Date: Sun, 27 Jan 2013 00:09:15 +0000 Subject: [Python-ideas] built-in argspec for function objects In-Reply-To: References: Message-ID: On 26 January 2013 23:27, nbv4 wrote: > def my_function(a, b=c): > pass > >>>> my_function.args > ['a'] I would have expected 'b' to be in that list. >>>> my_function.kwargs > {'b': c} I would have expected this to be a boolean indicating whether or not the function accepts **kwargs. >>>> my_function.all_args > ['a', 'b'] Rather than args and all_args, I would perhaps have used something more specific like required_positional_args and positional_args > What do you all think? Argspec is kind of unwieldy, this I think looks a lot > nicer. Are you aware of PEP-362? http://www.python.org/dev/peps/pep-0362/ I think that inspecting the arguments of a function is a relatively uncommon thing to do, so I don't really have any problem with the fact that all the code to do it is located in a special module (inspect). Oscar From tjreedy at udel.edu Sun Jan 27 02:51:39 2013 From: tjreedy at udel.edu (Terry Reedy) Date: Sat, 26 Jan 2013 20:51:39 -0500 Subject: [Python-ideas] built-in argspec for function objects In-Reply-To: References: Message-ID: On 1/26/2013 6:27 PM, nbv4 wrote: > def my_function(a, b=c): > pass > > >>> my_function.args > ['a'] > >>> my_function.kwargs > {'b': c} > >>> my_function.all_args > ['a', 'b'] > > What do you all think? Argspec is kind of unwieldy, this I think looks a > lot nicer. The argument specifications are attributes of code objects. The inspect functions, including the new signature function, are one way to pull them out. You can write you own if you wish. I do not think they should be duplicated on function objects. -- Terry Jan Reedy From python at mrabarnett.plus.com Sun Jan 27 04:03:49 2013 From: python at mrabarnett.plus.com (MRAB) Date: Sun, 27 Jan 2013 03:03:49 +0000 Subject: [Python-ideas] Interrupting threads Message-ID: <51049915.3060808@mrabarnett.plus.com> I know that this topic has been discussed before, but I've added a new twist... It's possible to interrupt the main thread using KeyboardInterrupt, so why shouldn't it be possible to do something similar to a thread? What I'm suggesting is that the Thread class could support an 'interrupt' method, which would raise a ThreadInterrupt exception similar to KeyboardInterrupt. Actually, there's more to it than that because sometimes you don't want a section of code to be interrupted. 
So here's what I'd like to suggest: 1. There's a private thread-specific flag called 'interrupt_occurred'. 2. There's a private thread-specific flag called 'heeding_interrupt'. 3. There's a context manager called 'heed_interrupt'. About the context manager: 1. It accepts a bool argument. 2. On entry, it saves 'heeding_interrupt' and sets it to the argument. 3. It catches ThreadInterrupt and sets 'interrupt_occurred' to True. 4. On exit, it restores 'heeding_interrupt'. 5. On restoring 'heeding_interrupt', if it's now True and 'interrupt_occurred' is also True, it raises (or re-raises) ThreadInterrupt. Here are some examples which I hope will make things bit clearer (although it's still somewhat involved!): Example 1: with heed_interrupt(False): # some code Behaviour: On entry, the context manager saves heeding_interrupt and sets it to False. If an interrupt occurs in "some code", interrupt_occurred will be set to True (because heeding_interrupt is currently False). On exit, the context manager restores heeding_interrupt. If heeding_interrupt is now True and interrupt_occurred is also True, it raises ThreadInterrupt. Example 2: with heed_interrupt(False): # some code 1 with heed_interrupt(True): # some code 2 # some code 3 Behaviour: On entry, the outer context manager saves heeding_interrupt and sets it to False. If an interrupt occurs in "some code 1", interrupt_occurred will be set to True (because heeding_interrupt is currently False). On entry, the inner context manager saves heeding_interrupt and sets it to True. If an interrupt occurs in "some code 2" , ThreadInterrupt will be raised (because heeding_interrupt is currently True). The exception will be propagated until it's caught by the inner context manager, which will then set interrupt_occurred to True. (It's also possible that interrupt_occurred will already be True on entry because an interrupt occurred in "some code 1"; in that case, it may short-circuit, skipping "some code 2" completely.) On exit, the inner contact manager restores heeding_interrupt and restores it to False. If an interrupt occurs in "some code 3", interrupt_occurred will be set to True (because heeding_interrupt is currently False). On exit, the outer context manager restores heeding_interrupt and restores it. If heeding_interrupt is now True, it will raise ThreadInterrupt. From andre.roberge at gmail.com Sun Jan 27 04:12:39 2013 From: andre.roberge at gmail.com (Andre Roberge) Date: Sat, 26 Jan 2013 23:12:39 -0400 Subject: [Python-ideas] Interrupting threads In-Reply-To: <51049915.3060808@mrabarnett.plus.com> References: <51049915.3060808@mrabarnett.plus.com> Message-ID: On Sat, Jan 26, 2013 at 11:03 PM, MRAB wrote: > I know that this topic has been discussed before, but I've added a new > twist... > > It's possible to interrupt the main thread using KeyboardInterrupt, so > why shouldn't it be possible to do something similar to a thread? > How about simply using http://code.activestate.com/recipes/496960/ ? Andr? > > What I'm suggesting is that the Thread class could support an > 'interrupt' method, which would raise a ThreadInterrupt exception > similar to KeyboardInterrupt. > > Actually, there's more to it than that because sometimes you don't want > a section of code to be interrupted. > > So here's what I'd like to suggest: > > 1. There's a private thread-specific flag called 'interrupt_occurred'. > > 2. There's a private thread-specific flag called 'heeding_interrupt'. > > 3. There's a context manager called 'heed_interrupt'. 
> > > About the context manager: > > 1. It accepts a bool argument. > > 2. On entry, it saves 'heeding_interrupt' and sets it to the argument. > > 3. It catches ThreadInterrupt and sets 'interrupt_occurred' to True. > > 4. On exit, it restores 'heeding_interrupt'. > > 5. On restoring 'heeding_interrupt', if it's now True and > 'interrupt_occurred' is also True, it raises (or re-raises) > ThreadInterrupt. > > > Here are some examples which I hope will make things bit clearer > (although it's still somewhat involved!): > > > Example 1: > > with heed_interrupt(False): > # some code > > Behaviour: > > On entry, the context manager saves heeding_interrupt and sets it to > False. > > If an interrupt occurs in "some code", interrupt_occurred will be set > to True (because heeding_interrupt is currently False). > > On exit, the context manager restores heeding_interrupt. If > heeding_interrupt is now True and interrupt_occurred is also True, it > raises ThreadInterrupt. > > > Example 2: > > with heed_interrupt(False): > # some code 1 > with heed_interrupt(True): > # some code 2 > # some code 3 > > Behaviour: > > On entry, the outer context manager saves heeding_interrupt and sets it > to False. > > If an interrupt occurs in "some code 1", interrupt_occurred will be set > to True (because heeding_interrupt is currently False). > > On entry, the inner context manager saves heeding_interrupt and sets it > to True. > > If an interrupt occurs in "some code 2" , ThreadInterrupt will be > raised (because heeding_interrupt is currently True). The exception > will be propagated until it's caught by the inner context manager, > which will then set interrupt_occurred to True. > > (It's also possible that interrupt_occurred will already be True on > entry because an interrupt occurred in "some code 1"; in that case, it > may short-circuit, skipping "some code 2" completely.) > > On exit, the inner contact manager restores heeding_interrupt and > restores it to False. > > If an interrupt occurs in "some code 3", interrupt_occurred will be set > to True (because heeding_interrupt is currently False). > > On exit, the outer context manager restores heeding_interrupt and > restores it. If heeding_interrupt is now True, it will raise > ThreadInterrupt. > ______________________________**_________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/**mailman/listinfo/python-ideas > -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Sun Jan 27 04:33:14 2013 From: guido at python.org (Guido van Rossum) Date: Sat, 26 Jan 2013 19:33:14 -0800 Subject: [Python-ideas] Interrupting threads In-Reply-To: <51049915.3060808@mrabarnett.plus.com> References: <51049915.3060808@mrabarnett.plus.com> Message-ID: This won't be very useful unless you also find a way to immediately stop threads that are blocked, (1) for I/O (which may never happen), or (2) for a lock. This most likely will mean mucking with signals. On Sat, Jan 26, 2013 at 7:03 PM, MRAB wrote: > I know that this topic has been discussed before, but I've added a new > twist... > > It's possible to interrupt the main thread using KeyboardInterrupt, so > why shouldn't it be possible to do something similar to a thread? > > What I'm suggesting is that the Thread class could support an > 'interrupt' method, which would raise a ThreadInterrupt exception > similar to KeyboardInterrupt. 
> > Actually, there's more to it than that because sometimes you don't want > a section of code to be interrupted. > > So here's what I'd like to suggest: > > 1. There's a private thread-specific flag called 'interrupt_occurred'. > > 2. There's a private thread-specific flag called 'heeding_interrupt'. > > 3. There's a context manager called 'heed_interrupt'. > > > About the context manager: > > 1. It accepts a bool argument. > > 2. On entry, it saves 'heeding_interrupt' and sets it to the argument. > > 3. It catches ThreadInterrupt and sets 'interrupt_occurred' to True. > > 4. On exit, it restores 'heeding_interrupt'. > > 5. On restoring 'heeding_interrupt', if it's now True and > 'interrupt_occurred' is also True, it raises (or re-raises) > ThreadInterrupt. > > > Here are some examples which I hope will make things bit clearer > (although it's still somewhat involved!): > > > Example 1: > > with heed_interrupt(False): > # some code > > Behaviour: > > On entry, the context manager saves heeding_interrupt and sets it to > False. > > If an interrupt occurs in "some code", interrupt_occurred will be set > to True (because heeding_interrupt is currently False). > > On exit, the context manager restores heeding_interrupt. If > heeding_interrupt is now True and interrupt_occurred is also True, it > raises ThreadInterrupt. > > > Example 2: > > with heed_interrupt(False): > # some code 1 > with heed_interrupt(True): > # some code 2 > # some code 3 > > Behaviour: > > On entry, the outer context manager saves heeding_interrupt and sets it > to False. > > If an interrupt occurs in "some code 1", interrupt_occurred will be set > to True (because heeding_interrupt is currently False). > > On entry, the inner context manager saves heeding_interrupt and sets it > to True. > > If an interrupt occurs in "some code 2" , ThreadInterrupt will be > raised (because heeding_interrupt is currently True). The exception > will be propagated until it's caught by the inner context manager, > which will then set interrupt_occurred to True. > > (It's also possible that interrupt_occurred will already be True on > entry because an interrupt occurred in "some code 1"; in that case, it > may short-circuit, skipping "some code 2" completely.) > > On exit, the inner contact manager restores heeding_interrupt and > restores it to False. > > If an interrupt occurs in "some code 3", interrupt_occurred will be set > to True (because heeding_interrupt is currently False). > > On exit, the outer context manager restores heeding_interrupt and > restores it. If heeding_interrupt is now True, it will raise > ThreadInterrupt. > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas -- --Guido van Rossum (python.org/~guido) From ncoghlan at gmail.com Sun Jan 27 06:26:30 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 27 Jan 2013 15:26:30 +1000 Subject: [Python-ideas] built-in argspec for function objects In-Reply-To: References: Message-ID: On Sun, Jan 27, 2013 at 3:24 PM, Nick Coghlan wrote: > On Sun, Jan 27, 2013 at 9:27 AM, nbv4 wrote: >> def my_function(a, b=c): >> pass >> >>>>> my_function.args >> ['a'] >>>>> my_function.kwargs >> {'b': c} >>>>> my_function.all_args >> ['a', 'b'] > > Those are parameters, not arguments (see > http://docs.python.org/3/faq/programming.html#faq-argument-vs-parameter) > >> What do you all think? Argspec is kind of unwieldy, this I think looks a lot >> nicer. 
> > Indeed, argspec is unwieldy, which is why Python 3.3 includes the new > inspect.signature function to calculate a richer signature > representation that is easier to process: > http://docs.python.org/3/library/inspect#introspecting-callables-with-the-signature-object > > Aaron Illes has backported this feature to earlier Python versions as > the "funcsigs" PyPI package: http://pypi.python.org/pypi/funcsigs > > Cheers, > Nick. > > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ubershmekel at gmail.com Sun Jan 27 09:16:14 2013 From: ubershmekel at gmail.com (Yuval Greenfield) Date: Sun, 27 Jan 2013 10:16:14 +0200 Subject: [Python-ideas] PEP 3156: getting the socket or peer name from the transport In-Reply-To: References: Message-ID: Sorry for the delay, it took me a while to read http://code.google.com/p/tulip/source/browse/ and wrap my head around it. On Thu, Jan 24, 2013 at 8:50 PM, Guido van Rossum wrote: > What other things might you want to do with the socket besides calling > getpeername() or getsockname()? >From http://en.wikipedia.org/wiki/Berkeley_sockets#Options_for_sockets > Options for sockets > > After creating a socket, it is possible to set options on it. Some of the more common options are: > > TCP_NODELAY disables the Nagle algorithm. > SO_KEEPALIVE enables periodic 'liveness' pings, if supported by the OS. Though these may not be the concern of a protocol as defined by PEP 3156. Would that be reasonable to expect > from a protocol written to be independent of the specific transport > type? > > Most protocols should be written independent of transport. But it seems to me that a user might write an entire app as a "protocol". Yuval Greenfield -------------- next part -------------- An HTML attachment was scrubbed... URL: From cf.natali at gmail.com Sun Jan 27 09:58:03 2013 From: cf.natali at gmail.com (=?ISO-8859-1?Q?Charles=2DFran=E7ois_Natali?=) Date: Sun, 27 Jan 2013 09:58:03 +0100 Subject: [Python-ideas] Interrupting threads In-Reply-To: <51049915.3060808@mrabarnett.plus.com> References: <51049915.3060808@mrabarnett.plus.com> Message-ID: > It's possible to interrupt the main thread using KeyboardInterrupt, so > why shouldn't it be possible to do something similar to a thread? Because it's unsafe. Allowing asynchronous interruptions at any point in the code is calling for trouble: in a multi-threaded program, if you interrupt a thread in the middle of a critical section, there's a high chance that the invariants protected in this critical section won't hold. So basically, the object/structure will be in an unusable state, which will lead to random failures at some point in the future. > Actually, there's more to it than that because sometimes you don't want > a section of code to be interrupted. Actually it's exactly the opposite: you only want to handle interruption at very specific points in the code, so that the rollback and interruption logic is tractable. Also, as noted by Guido, it's basically useless because neither sleep() nor lock acquisition can be interrupted - at least in the current implementation - and those are likely the calls you'd like to interrupt. FWIW, Java has a Thread.Stop() method that more or less does what you're suggesting. 
It was quickly deprecated because it's inherently unsafe: the right way to do it is through a cooperative form of interruption, with an interruption exception that can be thrown at specific points in the code (and a per-thread interrupt status flag that can be checked explicitly, and which is checked implicitly when entering an interruptible method). See the rationale here: http://docs.oracle.com/javase/1.5.0/docs/guide/misc/threadPrimitiveDeprecation.html > So here's what I'd like to suggest: > > 1. There's a private thread-specific flag called 'interrupt_occurred'. > > 2. There's a private thread-specific flag called 'heeding_interrupt'. > > 3. There's a context manager called 'heed_interrupt'. I'm not a native speaker, and I had never heard about the 'heed' verb before, had to look it up in the dictionary :-) From dickinsm at gmail.com Sun Jan 27 12:04:17 2013 From: dickinsm at gmail.com (Mark Dickinson) Date: Sun, 27 Jan 2013 11:04:17 +0000 Subject: [Python-ideas] complex number and fractional exponent In-Reply-To: References: Message-ID: On Sat, Jan 26, 2013 at 9:22 PM, Oscar Benjamin wrote: > Why does cmath.sqrt give a different value from the __pow__ version? > > ~$ python3 > Python 3.2.3 (default, Oct 19 2012, 19:53:16) > [GCC 4.7.2] on linux2 > Type "help", "copyright", "credits" or "license" for more information. >>>> import cmath >>>> cmath.sqrt(-1) > 1j >>>> (-1) ** .5 > (6.123031769111886e-17+1j) Because they use different algorithms. pow(x, y) essentially computes exp(y * log(x)). That involves a number of steps, any of which can introduce small errors. cmath.sqrt can use a more specific (and usually more accurate) algorithm. Moral: use cmath.sqrt and math.sqrt for computing square roots, rather than x ** 0.5. Mark From solipsis at pitrou.net Sun Jan 27 12:21:21 2013 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sun, 27 Jan 2013 12:21:21 +0100 Subject: [Python-ideas] PEP 3156: getting the socket or peer name from the transport References: Message-ID: <20130127122121.6b779ada@pitrou.net> On Sun, 27 Jan 2013 10:16:14 +0200 Yuval Greenfield wrote: > From http://en.wikipedia.org/wiki/Berkeley_sockets#Options_for_sockets > > > Options for sockets > > > > After creating a socket, it is possible to set options on it. Some of the > more common options are: > > > > TCP_NODELAY disables the Nagle algorithm. > > SO_KEEPALIVE enables periodic 'liveness' pings, if supported by the OS. > > Though these may not be the concern of a protocol as defined by PEP 3156. How about e.g. TCP_CORK? > > Would that be reasonable to expect > > from a protocol written to be independent of the specific transport > > type? > > > > > Most protocols should be written independent of transport. But it seems to > me that a user might write an entire app as a "protocol". Well, such an assumption can fall flat. For example, certificate checking in HTTPS expects that the transport is some version of TLS or SSL: http://tools.ietf.org/html/rfc2818.html#section-3.1 Regards Antoine. From solipsis at pitrou.net Sun Jan 27 13:16:37 2013 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sun, 27 Jan 2013 13:16:37 +0100 Subject: [Python-ideas] PEP 3156: getting the socket or peer name from the transport In-Reply-To: References: <20130127122121.6b779ada@pitrou.net> Message-ID: <1359288997.3488.2.camel@localhost.localdomain> Le dimanche 27 janvier 2013 à 14:12 +0200, Yuval Greenfield a écrit : > On Sun, Jan 27, 2013 at 1:21 PM, Antoine Pitrou > wrote: > > Most protocols should be written independent of transport. 
> But it seems to > me that a user might write an entire app as a "protocol". > > > Well, such an assumption can fall flat. For example, > certificate > checking in HTTPS expects that the transport is some version > of TLS or > SSL: http://tools.ietf.org/html/rfc2818.html#section-3.1 > > > I'm not sure I understood your reply. You'd be for an api that exposes > the underlying transport? I meant to say that "an entire app" entails > control over the subtleties of the underlying transport. What I meant is that the HTTP protocol needs to know that it is running over a secure transport, and it needs to fetch the server certificate from that transport (or, alternatively, it needs to have one of its callbacks called by the transport when the certificate is known). That's not entirely transport-agnostic. Regards Antoine. From shane at umbrellacode.com Sun Jan 27 15:10:49 2013 From: shane at umbrellacode.com (Shane Green) Date: Sun, 27 Jan 2013 06:10:49 -0800 Subject: [Python-ideas] Fwd: csv.DictReader could handle headers more intelligently. References: Message-ID: <6867B23C-4C94-4B64-B5C3-CC7AACF25A79@umbrellacode.com> Something as simple as this (straw man) demonstrates what I mean:

> from collections import defaultdict
>
> class Record(defaultdict):
>     def __init__(self, headers, fields):
>         super(Record, self).__init__(list)
>         self.headers = headers
>         self.fields = fields
>         # pair each header with its field; duplicate headers accumulate
>         for header, field in zip(headers, fields):
>             self.enter(header, field)
>     def valuemap(self, first=False):
>         # one value per header name: its first or last occurrence
>         index = 0 if first else -1
>         return dict([(key, values[index]) for key, values in self.items()])
>     def enter(self, header, *values):
>         if isinstance(header, int):
>             header = self.headers[header]
>         self[header].extend(values)
>     def itemseq(self):
>         return zip(self.headers, self.fields)
>     def __getitem__(self, spec):
>         # integer keys give positional access, names give grouped values
>         if isinstance(spec, int):
>             return self.fields[spec]
>         return super(Record, self).__getitem__(spec)
>     def __getslice__(self, *args):  # Python 2 only
>         return self.fields.__getslice__(*args)

This would let you access column values using header names, just like before. Each column's value(s) is now in a list, which would contain multiple values for any column repeated more than once in the header. Values can also be accessed sequentially using integer indexes, and valuemap() returns a standard dictionary that conforms to the previous standard exactly: there is a one-to-one mapping between column headers and values, with the last value associated with a given column name being the one kept. While I think the changes should be added without changing what exists for backward compatibility reasons, I've started to think the existing version should also be deprecated, rather than maintained as a special case. Even when the format is perfect for the existing code, I don't see any big advantages to using it over this approach. Keep in mind the example is just a quick straw man: performance is a big difference (and there are plenty of bugs), but that doesn't seem like the right thing to base the decision on, as performance can easily be enhanced later. In summary, given headers: A, B, C, D, E, B, G record.headers == ["A", "B", "C", "D", "E", "B", "G"] record.fields == [0, 1, 2, 3, 4, 5, 6] record["A"] == [0] record["B"] == [1, 5] # Note sequential access values are not in lists, and the second "B" column's value 5 is in its original position (index 5). record[0] == 0 record[1] == 1 record[2] == 2 record[3] == 3 record[4] == 4 record[5] == 5 record[6] == 6 record.items() == [("A", [0]), ("B", [1, 5]), ...] 
record.valuemap() == {"A": 0, "B": 5, ...} # This returns exactly what DictReader does today, a single value per named column, with the last value being the one used. Shane Green www.umbrellacode.com 408-692-4666 | shane at umbrellacode.com Begin forwarded message: > From: Shane Green > Subject: Re: [Python-ideas] csv.DictReader could handle headers more intelligently. > Date: January 26, 2013 6:39:11 AM PST > To: "Stephen J. Turnbull" > Cc: python-ideas at python.org > > Okay, I like your point about DictReader having a place with a subset of CSV tables, and agree that, given that definition, it should throw an exception when it's fed something that doesn't conform to this definition. I like that. > > One thing, though, the new version would let you access column data by name as well: > > Instead of > row["timestamp"] == 1359210019.299478 > > It would be > row["timestamp"] == [1359210019.299478] > > And potentially > row["timestamp"] == [1359210019.299478,1359210019.299478] > > It could also be accessed as: > row.headers[0] == "timestamp" > row.headers[1] == "timestamp" > row.values[0] == 1359210019.299478 > row.values[1] == 1359210019.299478 > > Could still provide: > for name,value in records.iterfirstitems(): # get the first value for each column with a given name. > - or - > for name,value in records.iterlastitems(): # get the last value for each column with a given name. > > And the exact functionality you have now: > records.itervaluemaps() # or something? just a map(dict(records.iterlastitems())) > > Overkill, but really simple things to add? > > The only thing this really adds to the "convenience" of the current DictReader for well-behaved tables, is the ability to access values sequentially or by name; other than that, the only difference would be iterating on a generator method's output instead of the instance itself. > > > > > Shane Green > www.umbrellacode.com > 408-692-4666 | shane at umbrellacode.com > > On Jan 26, 2013, at 5:53 AM, "Stephen J. Turnbull" wrote: > >> Shane Green writes: >> >>> And while it's true that a dictionary is a dictionary and it works >>> the way it works, the real point that drives home is that it's an >>> inappropriate mechanism for dealing with ordered rows of sequential >>> values. >> >> Right! So use csv.reader, or csv.DictReader with an explicit >> fieldnames argument. >> >> The point of csv.DictReader with default fieldnames is to take a >> "well-behaved" table and turn it into a sequence of "poor-man's" >> objects. >> >>> The final point is a simple one: while that CSV file format was >>> stupid, it was perfectly legal. Something that deals with CSV >>> content should not be losing any of its content. >> >> That's a reasonable requirement. >> >>> It also should [not] be barfing or throwing exceptions, by the way. >> >> That's not. As long as the module provides classes capable of >> handling any CSV format (it does), it may also provide convenience >> classes for special purposes with restricted formats. Those classes >> may throw exceptions on input that doesn't satisfy the restrictions. >> >>> And what about fixing it by implementing a class that >>> does it correctly, [...]? >> >> Doesn't help users who want automatically detected access-by-name. >> They must have unique field names. (I don't have a use case. I >> assume the implementer of csv.DictReader did.) >> > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From guido at python.org Sun Jan 27 17:28:50 2013 From: guido at python.org (Guido van Rossum) Date: Sun, 27 Jan 2013 08:28:50 -0800 Subject: [Python-ideas] PEP 3156: getting the socket or peer name from the transport In-Reply-To: <1359288997.3488.2.camel@localhost.localdomain> References: <20130127122121.6b779ada@pitrou.net> <1359288997.3488.2.camel@localhost.localdomain> Message-ID: On Sun, Jan 27, 2013 at 4:16 AM, Antoine Pitrou wrote: > Le dimanche 27 janvier 2013 ? 14:12 +0200, Yuval Greenfield a ?crit : >> On Sun, Jan 27, 2013 at 1:21 PM, Antoine Pitrou >> wrote: >> > Most protocols should be written independent of transport. >> But it seems to >> > me that a user might write an entire app as a "protocol". >> >> >> Well, such an assumption can fall flat. For example, >> certificate >> checking in HTTPS expects that the transport is some version >> of TLS or >> SSL: http://tools.ietf.org/html/rfc2818.html#section-3.1 >> >> >> I'm not sure I understood your reply. You'd be for an api that exposes >> the underlying transport? I meant to say that "an entire app" entails >> control over the subtleties of the underlying transport. > > What I meant is that the HTTP protocol needs to know that it is running > over a secure transport, and it needs to fetch the server certificate > from that transport (or, alternatively, it needs to have one of its > callbacks called by the transport when the certificate is known). That's > not entirely transport-agnostic. Yeah, it sounds like in the end having access to the socket itself (if there is one) may be necessary. I suppose there are a number of different ways to handle that specific use case, but it seems clear that we can't anticipate all use cases. I'd rather have a simpler abstraction with an escape hatch than attempting to codify more use cases into the abstraction. We can always iterate on the design after Python 3.4, if there's a useful generalization we didn't anticipate. -- --Guido van Rossum (python.org/~guido) From shane at umbrellacode.com Sun Jan 27 18:11:05 2013 From: shane at umbrellacode.com (Umbrella Code) Date: Sun, 27 Jan 2013 09:11:05 -0800 Subject: [Python-ideas] Fwd: PEP 3156: getting the socket or peer name from the transport References: Message-ID: <39BC9611-EB71-4749-AB8C-C6B64F7928D5@umbrellacode.com> It's been a few years so my memory must be rusty, but where is the https protocol dependent on the transport/SSL setup in that way? Sent from my iPad Begin forwarded message: > From: Umbrella Code > Date: January 27, 2013, 9:06:48 AM PST > To: Guido van Rossum > Subject: Re: [Python-ideas] PEP 3156: getting the socket or peer name from the transport > > It's been a few years so my memory must be rusty, but where is the https protocol dependent on the transport/SSL setup in that way? > > > Sent from my iPad > > On Jan 27, 2013, at 8:28 AM, Guido van Rossum wrote: > >> On Sun, Jan 27, 2013 at 4:16 AM, Antoine Pitrou wrote: >>> Le dimanche 27 janvier 2013 ? 14:12 +0200, Yuval Greenfield a ?crit : >>>> On Sun, Jan 27, 2013 at 1:21 PM, Antoine Pitrou >>>> wrote: >>>>> Most protocols should be written independent of transport. >>>> But it seems to >>>>> me that a user might write an entire app as a "protocol". >>>> >>>> >>>> Well, such an assumption can fall flat. For example, >>>> certificate >>>> checking in HTTPS expects that the transport is some version >>>> of TLS or >>>> SSL: http://tools.ietf.org/html/rfc2818.html#section-3.1 >>>> >>>> >>>> I'm not sure I understood your reply. 
You'd be for an api that exposes >>>> the underlying transport? I meant to say that "an entire app" entails >>>> control over the subtleties of the underlying transport. >>> What I meant is that the HTTP protocol needs to know that it is running >>> over a secure transport, and it needs to fetch the server certificate >>> from that transport (or, alternatively, it needs to have one of its >>> callbacks called by the transport when the certificate is known). That's >>> not entirely transport-agnostic. >> >> Yeah, it sounds like in the end having access to the socket itself (if >> there is one) may be necessary. I suppose there are a number of >> different ways to handle that specific use case, but it seems clear >> that we can't anticipate all use cases. I'd rather have a simpler >> abstraction with an escape hatch than attempting to codify more use >> cases into the abstraction. We can always iterate on the design after >> Python 3.4, if there's a useful generalization we didn't anticipate. >> >> -- >> --Guido van Rossum (python.org/~guido) >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> http://mail.python.org/mailman/listinfo/python-ideas -------------- next part -------------- An HTML attachment was scrubbed... URL: From ubershmekel at gmail.com Sun Jan 27 18:41:28 2013 From: ubershmekel at gmail.com (Yuval Greenfield) Date: Sun, 27 Jan 2013 19:41:28 +0200 Subject: [Python-ideas] Fwd: PEP 3156: getting the socket or peer name from the transport In-Reply-To: <39BC9611-EB71-4749-AB8C-C6B64F7928D5@umbrellacode.com> References: <39BC9611-EB71-4749-AB8C-C6B64F7928D5@umbrellacode.com> Message-ID: On Sun, Jan 27, 2013 at 7:11 PM, Umbrella Code wrote: > It's been a few years so my memory must be rusty, but where is the https > protocol dependent on the transport/SSL setup in that way? 
>> >> Sent from my iPad >> >> Begin forwarded message: > > I can't speak for Antoine but I'm guessing he's talking about SNI: > > * a VPS server hosts 2 sites with 2 certificates for "mysite.com" and "yoursite.com" > * the original TCP server has no idea which cert to use as both sites share the same IP address and port. > * the solution is the client sends the hostname in the TLS handshake. > > So the DNS or HTTP line "host: mysite.com" is also used in the TLS layer. This example agrees with Antoine but it's in the reverse direction, so maybe he has another one in mind. > > http://en.wikipedia.org/wiki/Transport_Layer_Security#Support_for_name-based_virtual_servers > http://en.wikipedia.org/wiki/HTTP_Secure#Limitations > http://en.wikipedia.org/wiki/Server_Name_Indication > > Yuval -------------- next part -------------- An HTML attachment was scrubbed... URL: From shane at umbrellacode.com Sun Jan 27 19:15:31 2013 From: shane at umbrellacode.com (Umbrella Code) Date: Sun, 27 Jan 2013 10:15:31 -0800 Subject: [Python-ideas] Fwd: PEP 3156: getting the socket or peer name from the transport In-Reply-To: References: <39BC9611-EB71-4749-AB8C-C6B64F7928D5@umbrellacode.com> Message-ID: <36DF0DF7-1F91-47EB-8C93-2F7A7DFD8EE0@umbrellacode.com> Could it be handled as a context given to the protocol, and maybe accommodate the other information we'd been discussing? Ultimately the socket could also be part of the context information available as the escape hatch, but generally pre-populated to buffer from hardware. It could include address information, SSL data assigned by the server, etc. Populating it at the right places could also be more efficient. Sent from my iPad On Jan 27, 2013, at 9:41 AM, Yuval Greenfield wrote: > On Sun, Jan 27, 2013 at 7:11 PM, Umbrella Code wrote: >> It's been a few years so my memory must be rusty, but where is the https protocol dependent on the transport/SSL setup in that way? >> >> Sent from my iPad >> >> Begin forwarded message: > > I can't speak for Antoine but I'm guessing he's talking about SNI: > > * a VPS server hosts 2 sites with 2 certificates for "mysite.com" and "yoursite.com" > * the original TCP server has no idea which cert to use as both sites share the same IP address and port. > * the solution is the client sends the hostname in the TLS handshake. > > So the DNS or HTTP line "host: mysite.com" is also used in the TLS layer. This example agrees with Antoine but it's in the reverse direction, so maybe he has another one in mind. > > http://en.wikipedia.org/wiki/Transport_Layer_Security#Support_for_name-based_virtual_servers > http://en.wikipedia.org/wiki/HTTP_Secure#Limitations > http://en.wikipedia.org/wiki/Server_Name_Indication > > Yuval -------------- next part -------------- An HTML attachment was scrubbed... URL: From cs at zip.com.au Sun Jan 27 22:04:15 2013 From: cs at zip.com.au (Cameron Simpson) Date: Mon, 28 Jan 2013 08:04:15 +1100 Subject: [Python-ideas] Interrupting threads In-Reply-To: References: Message-ID: <20130127210415.GA14691@cskk.homeip.net> On 27Jan2013 09:58, Charles-Fran?ois Natali wrote: | > It's possible to interrupt the main thread using KeyboardInterrupt, so | > why shouldn't it be possible to do something similar to a thread? | | Because it's unsafe. But the same can easily be true of a KeyboardInterrupt in the main thread in any multithreaded program. 
| Allowing asynchronous interruptions at any point in the code is | calling for trouble: in a multi-threaded program, if you interrupt a | thread in the middle of a critical section, there's a high chance that | the invariants protected in this critical section won't hold. So | basically, the object/structure will be in an unusable state, which | will lead to random failures at some point in the future. This is true if any other exception is raised also. MRAB's suggestion turns a thread interrupt into an exception, with some control for ignoring-but-detecting the exception around some places. | > Actually, there's more to it than that because sometimes you don't want | > a section of code to be interrupted. | | Actually it's exactly the opposite: you only want to handle | interruption at very specific points in the code, so that the rollback | and interruption logic is tractable. That would amount to running the whole thread inside his context manager and polling the interrupt_occurred flag regularly. | Also, as noted by Guido, it's basically useless because neither | sleep() nor lock acquisition can be interrupted - at least in the | current implementation - and those are likely the calls you'd like to | interrupt. Sure. | FWIW, Java has a Thread.Stop() method that more or less does what | you're suggesting. It was quickly depreciated because it's inherently | unsafe: the right way to do it is through a cooperative form of | interruption, with an interruption exception that can be thrown at | specific points in the code (and a per-thread interrupt status flag | that can be checked explicitly, and which is checked implicitly when | entering an interruptible method). | See the rationale here: | http://docs.oracle.com/javase/1.5.0/docs/guide/misc/threadPrimitiveDeprecation.html | | > So here's what I'd like to suggest: | > 1. There's a private thread-specific flag called 'interrupt_occurred'. | > 2. There's a private thread-specific flag called 'heeding_interrupt'. | > 3. There's a context manager called 'heed_interrupt'. On this basis, I'd be inclined to cast MRAB's suggestion as giving every Thread object a cancel() method. When heeding_interrupt is True, raise ThreadInterrupt. When heeding_interrupt is False, set interrupt_occurred. In fact, I'd change the word "Interrupt" to "Cancellation", and name the flags thread_cancelled, thread_heed_cancel, and name the exception "ThreadCancelled". This only slightly changes the semantics and makes more clear the notion that the cancellation may be deferred (eg when I/O blocked, etc). That lets threads poll the thread_cancelled flag for cooperative behaviour and still provides an exception based method for situations where it is suitable. | I'm not a native speaker, and I had never heard about the 'heed' verb | before, had to look it up in the dictionary :-) It's in common use, and not obscure. I am a native speaker, and see no problem with it. Long standing word with a well known and defined meaning. Cheers, -- Cameron Simpson Rule #1: Never sell a Ducati. Rule #2: Always obey Rule #1. 
- Godfrey DiGiorgi - ramarren at apple.com - DoD #0493 From scott+python-ideas at scottdial.com Sun Jan 27 23:17:14 2013 From: scott+python-ideas at scottdial.com (Scott Dial) Date: Sun, 27 Jan 2013 17:17:14 -0500 Subject: [Python-ideas] Interrupting threads In-Reply-To: <20130127210415.GA14691@cskk.homeip.net> References: <20130127210415.GA14691@cskk.homeip.net> Message-ID: <5105A76A.5020703@scottdial.com> On 1/27/2013 4:04 PM, Cameron Simpson wrote: > | I'm not a native speaker, and I had never heard about the 'heed' verb > | before, had to look it up in the dictionary :-) > > It's in common use, and not obscure. I am a native speaker, and see no > problem with it. Long standing word with a well known and defined > meaning. I disagree. I am a native speaker and am familiar with the word, but that word definitely falls into the category of words that I don't use except in idiomatic expressions (e.g., "pay heed ..." or "take heed ..."). Beyond that, why choose such an obscure word when simple words will do? 'interrupt_occurred' => 'interrupted' 'heeding_interrupt' => 'interruptible' with interruptible(False): ... with interruptible(True): ... -- Scott Dial scott at scottdial.com From cf.natali at gmail.com Mon Jan 28 00:59:12 2013 From: cf.natali at gmail.com (=?ISO-8859-1?Q?Charles=2DFran=E7ois_Natali?=) Date: Mon, 28 Jan 2013 00:59:12 +0100 Subject: [Python-ideas] Interrupting threads In-Reply-To: <20130127210415.GA14691@cskk.homeip.net> References: <20130127210415.GA14691@cskk.homeip.net> Message-ID: > | Because it's unsafe. > > But the same can easily be true of a KeyboardInterrupt in the main > thread in any multithreaded program. Yes, that's why I don't catch KeyboardInterrupt, and only use it to interrupt the execution of the program and let it exit, unless it's raised at specific places like reading from stdin... > This is true if any other exception is raised also. MRAB's suggestion > turns a thread interrupt into an exception, with some control for > ignoring-but-detecting the exception around some places. No, because properly written code is prepared to deal with exceptions that the code is susceptible to throw. This change would make it possible for an unrelated exception to be thrown *at any point in the code*. Try writing safe and readable code with this in mind: it's impossible (especially since the interruption might be raised in the middle of the exception handling routine). 
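(To make the cooperative alternative concrete, here is a minimal sketch using nothing but today's stdlib - a threading.Event checked at well-defined points; the Worker class and its sleep-based "work unit" are purely illustrative:)

import threading
import time

class Worker(threading.Thread):
    def __init__(self):
        super(Worker, self).__init__()
        self._cancelled = threading.Event()

    def cancel(self):
        # Request interruption; it is honoured only at safe points.
        self._cancelled.set()

    def run(self):
        # Invariants hold at each loop boundary, so a cancellation
        # request can never strike in the middle of a critical section.
        while not self._cancelled.is_set():
            time.sleep(0.1)  # stands in for one unit of real work

w = Worker()
w.start()
w.cancel()  # cooperative: the thread exits at its next check
w.join()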
From shane at umbrellacode.com Mon Jan 28 09:57:48 2013 From: shane at umbrellacode.com (Shane Green) Date: Mon, 28 Jan 2013 00:57:48 -0800 Subject: [Python-ideas] PEP 3156: getting the socket or peer name from the transport In-Reply-To: References: <20130127122121.6b779ada@pitrou.net> <1359288997.3488.2.camel@localhost.localdomain> Message-ID: On Jan 27, 2013, at 8:28 AM, Guido van Rossum wrote: > On Sun, Jan 27, 2013 at 4:16 AM, Antoine Pitrou wrote: >> Le dimanche 27 janvier 2013 ? 14:12 +0200, Yuval Greenfield a ?crit : >>> On Sun, Jan 27, 2013 at 1:21 PM, Antoine Pitrou >>> wrote: >>>> Most protocols should be written independent of transport. >>> But it seems to >>>> me that a user might write an entire app as a "protocol". >>> >>> >>> Well, such an assumption can fall flat. For example, >>> certificate >>> checking in HTTPS expects that the transport is some version >>> of TLS or >>> SSL: http://tools.ietf.org/html/rfc2818.html#section-3.1 >>> >>> >>> I'm not sure I understood your reply. You'd be for an api that exposes >>> the underlying transport? I meant to say that "an entire app" entails >>> control over the subtleties of the underlying transport. >> >> What I meant is that the HTTP protocol needs to know that it is running >> over a secure transport, and it needs to fetch the server certificate >> from that transport (or, alternatively, it needs to have one of its >> callbacks called by the transport when the certificate is known). That's >> not entirely transport-agnostic. > > Yeah, it sounds like in the end having access to the socket itself (if > there is one) may be necessary. I suppose there are a number of > different ways to handle that specific use case, but it seems clear > that we can't anticipate all use cases. I'd rather have a simpler > abstraction with an escape hatch than attempting to codify more use > cases into the abstraction. We can always iterate on the design after > Python 3.4, if there's a useful generalization we didn't anticipate. > > -- > --Guido van Rossum (python.org/~guido) > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas What about giving the protocol an environ info object that should have all information it needs already, which could (and probably should) include things like the SSL certificate information, and would probably also be where additional info that happened to be looked up, like host name details, was stored and accessed. Assuming the transports, etc., can define all the state information a protocol needs, can operate without hardware dependencies; in case that doesn't happen, though, the state dict will also have references to the socket, so the protocol could get to directly if needed. -------------- next part -------------- An HTML attachment was scrubbed... URL: From mark.hackett at metoffice.gov.uk Mon Jan 28 13:06:39 2013 From: mark.hackett at metoffice.gov.uk (Mark Hackett) Date: Mon, 28 Jan 2013 12:06:39 +0000 Subject: [Python-ideas] Fwd: csv.DictReader could handle headers more intelligently. 
In-Reply-To: <6867B23C-4C94-4B64-B5C3-CC7AACF25A79@umbrellacode.com> References: <6867B23C-4C94-4B64-B5C3-CC7AACF25A79@umbrellacode.com> Message-ID: <201301281206.40149.mark.hackett@metoffice.gov.uk> On Sunday 27 Jan 2013, Shane Green wrote: > While I think the changes should be added without changing what exists for > backward compatibility reasons, I've started to think the existing version > should also be deprecated, rather than maintained as a special case > That sounds effective. From mark.hackett at metoffice.gov.uk Mon Jan 28 13:13:45 2013 From: mark.hackett at metoffice.gov.uk (Mark Hackett) Date: Mon, 28 Jan 2013 12:13:45 +0000 Subject: [Python-ideas] =?iso-8859-1?q?csv=2EDictReader_could_handle_heade?= =?iso-8859-1?q?rs_more=09intelligently=2E?= In-Reply-To: <87pq0s2gpa.fsf@uwakimon.sk.tsukuba.ac.jp> References: <1358903168.4767.4.camel@webb> <87pq0s2gpa.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <201301281213.45810.mark.hackett@metoffice.gov.uk> On Saturday 26 Jan 2013, Stephen J. Turnbull wrote: > Shane Green writes: > > And while it's true that a dictionary is a dictionary and it works > > the way it works, the real point that drives home is that it's an > > inappropriate mechanism for dealing ordered rows of sequential > > values. > > Right! So use csv.reader, or csv.DictReader with an explicit > fieldnames argument. > > The point of csv.DictReader with default fieldnames is to take a > "well-behaved" table and turn it into a sequence of "poor-man's" > objects. > Well though there's another example out there of what do do next, I was thinking of being able to define the csv file format so that you could write it out correctly too. And to that end, some form of description of the csv file is needed. I was thinking something like this: A,B,C,A,D,E {(A:2,A:1),B,C,D,E} which would put columns 4 and 1 in the first entry (under the name A) as a list, in that order, followed by B, C, D and E all expected to be single unique names. This also allows the same definition to be used to write it out. Blank headers are denoted with: A,,,,,,B,C And headers not used in the dictionary (discarded) are handled by not being put in the "where do we put this" line: A,B,C,D {A,D} When writing out, you cannot have empty headers (since these values get dropped and the output format spec is now no longer suitable), and you must assign each header a dictionary (else again the dictionary doesn't contain all the data that was in the input). To write out these two types of input file, you need to create a new csv format spec which CAN be written out. Therefore you will have to deliberately define an output that loses data. From mark.hackett at metoffice.gov.uk Mon Jan 28 13:21:19 2013 From: mark.hackett at metoffice.gov.uk (Mark Hackett) Date: Mon, 28 Jan 2013 12:21:19 +0000 Subject: [Python-ideas] csv.DictReader could handle headers more intelligently. In-Reply-To: <17bba319-ff53-41a6-8ada-3cd3ad036076@googlegroups.com> References: <1358903168.4767.4.camel@webb> <201301251653.46558.mark.hackett@metoffice.gov.uk> <17bba319-ff53-41a6-8ada-3cd3ad036076@googlegroups.com> Message-ID: <201301281221.19978.mark.hackett@metoffice.gov.uk> On Friday 25 Jan 2013, rurpy at yahoo.com wrote: > > The csv DictReader *uses* a dictionary for its output. That > it does so imposes no requirements on how it should parse or > otherwise handle the input that eventually goes into that > dict. And that doesn't mean that writing dict[A]=1 dict[A]=9 results in dict[A] being a list containing 1 and 9. 
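(A minimal illustration of the point - with a plain dict the second write simply overwrites the first, and a list-valued entry is a different shape that consuming code must special-case; "A" is just a stand-in header name:)

row = {}
row["A"] = 1
row["A"] = 9
print(row["A"])    # 9 -- the later assignment wins, nothing accumulates

row["A"] = [1, 9]  # what a duplicate-collecting reader would store instead
print(row["A"])    # [1, 9] -- same key, but a different kind of value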
A program using a dictionary entry has to know whether the input has duplicate headers because in the case where only the first line is done, writing out the value of dict[A] gives you "1". Writing out dict[A] if it's a list gives you "[1,9]" which must be parsed differently. From mark.hackett at metoffice.gov.uk Mon Jan 28 13:21:58 2013 From: mark.hackett at metoffice.gov.uk (Mark Hackett) Date: Mon, 28 Jan 2013 12:21:58 +0000 Subject: [Python-ideas] csv.DictReader could handle headers more intelligently. In-Reply-To: <5102B76B.2080106@stoneleaf.us> References: <1358903168.4767.4.camel@webb> <201301251058.28531.mark.hackett@metoffice.gov.uk> <5102B76B.2080106@stoneleaf.us> Message-ID: <201301281221.58842.mark.hackett@metoffice.gov.uk> On Friday 25 Jan 2013, Ethan Furman wrote: > We're going to have to agree to disagree on this point -- I think there > is a huge difference between reassigning a variable which is completely > under your control from losing entire columns of data from a file which > you may have never seen before. > But if you've never seen it before, how do you know that you're going to get a LIST in one column? From wolfgang.maier at biologie.uni-freiburg.de Mon Jan 28 14:33:45 2013 From: wolfgang.maier at biologie.uni-freiburg.de (Wolfgang Maier) Date: Mon, 28 Jan 2013 14:33:45 +0100 Subject: [Python-ideas] while conditional in list comprehension ?? Message-ID: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de> Dear all, I guess this is so obvious that someone must have suggested it before: in list comprehensions you can currently exclude items based on the if conditional, e.g.: [n for n in range(1,1000) if n % 4 == 0] Why not extend this filtering by allowing a while statement in addition to if, as in: [n for n in range(1,1000) while n < 400] Trivial effect, I agree, in this example since you could achieve the same by using range(1,400), but I hope you get the point. This intuitively understandable extension would provide a big speed-up for sorted lists where processing all the input is unnecessary. Consider this: some_names=["Adam", "Andrew", "Arthur", "Bob", "Caroline","Lancelot"] # a sorted list of names [n for n in some_names if n.startswith("A")] # certainly gives a list of all names starting with A, but . [n for n in some_names while n.startswith("A")] # would have saved two comparisons Best, Wolfgang From rosuav at gmail.com Mon Jan 28 14:56:39 2013 From: rosuav at gmail.com (Chris Angelico) Date: Tue, 29 Jan 2013 00:56:39 +1100 Subject: [Python-ideas] while conditional in list comprehension ?? In-Reply-To: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de> References: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de> Message-ID: On Tue, Jan 29, 2013 at 12:33 AM, Wolfgang Maier wrote: > Why not extend this filtering by allowing a while statement in addition to > if, as in: > > [n for n in range(1,1000) while n < 400] The time machine strikes again! Check out itertools.takewhile - it can do pretty much that: import itertools [n for n in itertools.takewhile(lambda n: n<400, range(1,1000))] It's not quite list comp notation, but it works. 
>>> [n for n in itertools.takewhile(lambda n: n<40, range(1,100))] [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39] ChrisA From oscar.j.benjamin at gmail.com Mon Jan 28 14:59:40 2013 From: oscar.j.benjamin at gmail.com (Oscar Benjamin) Date: Mon, 28 Jan 2013 13:59:40 +0000 Subject: [Python-ideas] while conditional in list comprehension ?? In-Reply-To: References: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de> Message-ID: On 28 January 2013 13:56, Chris Angelico wrote: > On Tue, Jan 29, 2013 at 12:33 AM, Wolfgang Maier > wrote: >> Why not extend this filtering by allowing a while statement in addition to >> if, as in: >> >> [n for n in range(1,1000) while n < 400] > > The time machine strikes again! Check out itertools.takewhile - it can > do pretty much that: > > import itertools > [n for n in itertools.takewhile(lambda n: n<400, range(1,1000))] > > It's not quite list comp notation, but it works. > >>>> [n for n in itertools.takewhile(lambda n: n<40, range(1,100))] > [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, > 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, > 37, 38, 39] The while clause is a lot clearer/nicer than takewhile/lambda. Presumably it would be more efficient as well. Oscar From masklinn at masklinn.net Mon Jan 28 15:28:52 2013 From: masklinn at masklinn.net (Masklinn) Date: Mon, 28 Jan 2013 15:28:52 +0100 Subject: [Python-ideas] while conditional in list comprehension ?? In-Reply-To: References: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de> Message-ID: On 2013-01-28, at 14:59 , Oscar Benjamin wrote: > On 28 January 2013 13:56, Chris Angelico wrote: >> On Tue, Jan 29, 2013 at 12:33 AM, Wolfgang Maier >> wrote: >>> Why not extend this filtering by allowing a while statement in addition to >>> if, as in: >>> >>> [n for n in range(1,1000) while n < 400] >> >> The time machine strikes again! Check out itertools.takewhile - it can >> do pretty much that: >> >> import itertools >> [n for n in itertools.takewhile(lambda n: n<400, range(1,1000))] >> >> It's not quite list comp notation, but it works. >> >>>>> [n for n in itertools.takewhile(lambda n: n<40, range(1,100))] >> [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, >> 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, >> 37, 38, 39] > > The while clause is a lot clearer/nicer than takewhile/lambda. > Presumably it would be more efficient as well. Maybe, but it's a rather uncommon need and that way lies Common Lisp's `loop`. From shane at umbrellacode.com Mon Jan 28 15:32:21 2013 From: shane at umbrellacode.com (Shane Green) Date: Mon, 28 Jan 2013 06:32:21 -0800 Subject: [Python-ideas] while conditional in list comprehension ?? In-Reply-To: References: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de> Message-ID: <1CF441FF-8774-4687-A27E-2E563FCB7CA5@umbrellacode.com> Isn't "while" kind just the "if" of a looping construct? Would [n for n in range(1,1000) while n < 400] == [n for n in range(1,1000) if n < 400]? I guess your kind of looking for an "else break" feature to exit the list comprehension before evaluating all the input values. Wouldn't that complete the "while()" functionality? 
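(Spelled out with a plain loop, the "else break" semantics I have in mind would be equivalent to this sketch - "result" is just an illustrative name:)

result = []
for n in range(1, 1000):
    if n < 400:
        result.append(n)
    else:
        break  # stop consuming input at the first failing value
# result == [1, 2, ..., 399]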
Shane Green www.umbrellacode.com 408-692-4666 | shane at umbrellacode.com On Jan 28, 2013, at 5:59 AM, Oscar Benjamin wrote: > On 28 January 2013 13:56, Chris Angelico wrote: >> On Tue, Jan 29, 2013 at 12:33 AM, Wolfgang Maier >> wrote: >>> Why not extend this filtering by allowing a while statement in addition to >>> if, as in: >>> >>> [n for n in range(1,1000) while n < 400] >> >> The time machine strikes again! Check out itertools.takewhile - it can >> do pretty much that: >> >> import itertools >> [n for n in itertools.takewhile(lambda n: n<400, range(1,1000))] >> >> It's not quite list comp notation, but it works. >> >>>>> [n for n in itertools.takewhile(lambda n: n<40, range(1,100))] >> [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, >> 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, >> 37, 38, 39] > > The while clause is a lot clearer/nicer than takewhile/lambda. > Presumably it would be more efficient as well. > > > Oscar > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas -------------- next part -------------- An HTML attachment was scrubbed... URL: From graffatcolmingov at gmail.com Mon Jan 28 15:38:58 2013 From: graffatcolmingov at gmail.com (Ian Cordasco) Date: Mon, 28 Jan 2013 09:38:58 -0500 Subject: [Python-ideas] while conditional in list comprehension ?? In-Reply-To: References: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de> Message-ID: On Mon, Jan 28, 2013 at 8:59 AM, Oscar Benjamin wrote: > On 28 January 2013 13:56, Chris Angelico wrote: >> On Tue, Jan 29, 2013 at 12:33 AM, Wolfgang Maier >> wrote: >>> Why not extend this filtering by allowing a while statement in addition to >>> if, as in: >>> >>> [n for n in range(1,1000) while n < 400] >> >> The time machine strikes again! Check out itertools.takewhile - it can >> do pretty much that: >> >> import itertools >> [n for n in itertools.takewhile(lambda n: n<400, range(1,1000))] >> >> It's not quite list comp notation, but it works. >> >>>>> [n for n in itertools.takewhile(lambda n: n<40, range(1,100))] >> [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, >> 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, >> 37, 38, 39] > > The while clause is a lot clearer/nicer than takewhile/lambda. > Presumably it would be more efficient as well. The while syntax definitely reads better, and I would guess that dis could clarify how much more efficient using `if n < 400` would be compared to the lambda. Then again this is a rather uncommon situation and it could be handled with the if syntax. Also, if we recall the zen of python "There should be one-- and preferably only one --obvious way to do it." which is argument enough against the `while` syntax. From rosuav at gmail.com Mon Jan 28 15:43:39 2013 From: rosuav at gmail.com (Chris Angelico) Date: Tue, 29 Jan 2013 01:43:39 +1100 Subject: [Python-ideas] while conditional in list comprehension ?? In-Reply-To: <1CF441FF-8774-4687-A27E-2E563FCB7CA5@umbrellacode.com> References: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de> <1CF441FF-8774-4687-A27E-2E563FCB7CA5@umbrellacode.com> Message-ID: On Tue, Jan 29, 2013 at 1:32 AM, Shane Green wrote: > Isn't "while" kind just the "if" of a looping construct? > > Would [n for n in range(1,1000) while n < 400] == [n for n in range(1,1000) > if n < 400]? 
> > I guess your kind of looking for an "else break" feature to exit the list > comprehension before evaluating all the input values. Wouldn't that > complete the "while()" functionality? In the specific case given, they'll produce the same result, but there are two key differences: 1) If the condition becomes true again later in the original iterable, the 'if' will pick up those entries, but the 'while' won't; and 2) The 'while' version will not consume more than the one result that failed to pass the condition. I daresay it would be faster and maybe cleaner to implement this with a language feature rather than itertools.takewhile, but list comprehensions can get unwieldy too; is there sufficient call for this to justify the syntax? ChrisA From shane at umbrellacode.com Mon Jan 28 15:51:23 2013 From: shane at umbrellacode.com (Shane Green) Date: Mon, 28 Jan 2013 06:51:23 -0800 Subject: [Python-ideas] while conditional in list comprehension ?? In-Reply-To: References: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de> <1CF441FF-8774-4687-A27E-2E563FCB7CA5@umbrellacode.com> Message-ID: Yeah, I realized (1) after a minute and came up with "else break": if n < 400 else break. Could that be functionally equivalent, not based on a loop construct within an iterator? Shane Green www.umbrellacode.com 408-692-4666 | shane at umbrellacode.com On Jan 28, 2013, at 6:43 AM, Chris Angelico wrote: > On Tue, Jan 29, 2013 at 1:32 AM, Shane Green wrote: >> Isn't "while" kind just the "if" of a looping construct? >> >> Would [n for n in range(1,1000) while n < 400] == [n for n in range(1,1000) >> if n < 400]? >> >> I guess your kind of looking for an "else break" feature to exit the list >> comprehension before evaluating all the input values. Wouldn't that >> complete the "while()" functionality? > > In the specific case given, they'll produce the same result, but there > are two key differences: > > 1) If the condition becomes true again later in the original iterable, > the 'if' will pick up those entries, but the 'while' won't; and > 2) The 'while' version will not consume more than the one result that > failed to pass the condition. > > I daresay it would be faster and maybe cleaner to implement this with > a language feature rather than itertools.takewhile, but list > comprehensions can get unwieldy too; is there sufficient call for this > to justify the syntax? > > ChrisA > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas -------------- next part -------------- An HTML attachment was scrubbed... URL: From graffatcolmingov at gmail.com Mon Jan 28 16:17:49 2013 From: graffatcolmingov at gmail.com (Ian Cordasco) Date: Mon, 28 Jan 2013 10:17:49 -0500 Subject: [Python-ideas] while conditional in list comprehension ?? In-Reply-To: References: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de> <1CF441FF-8774-4687-A27E-2E563FCB7CA5@umbrellacode.com> Message-ID: On Mon, Jan 28, 2013 at 9:51 AM, Shane Green wrote: > Yeah, I realized (1) after a minute and came up with "else break": if n < > 400 else break. Could that be functionally equivalent, not based on a loop > construct within an iterator? > You mean: `[n for n in range(0, 400) if n < 100 else break]`? That is definitely more obvious (in my opinion) than using the while syntax, but what does `break` mean in the context of a list comprehension? I understand the point, but I dislike the execution. 
I guess coming from a background in pure mathematics, this just seems wrong for a list (or set) comprehension. From rosuav at gmail.com Mon Jan 28 16:24:56 2013 From: rosuav at gmail.com (Chris Angelico) Date: Tue, 29 Jan 2013 02:24:56 +1100 Subject: [Python-ideas] while conditional in list comprehension ?? In-Reply-To: References: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de> <1CF441FF-8774-4687-A27E-2E563FCB7CA5@umbrellacode.com> Message-ID: On Tue, Jan 29, 2013 at 2:17 AM, Ian Cordasco wrote: > You mean: `[n for n in range(0, 400) if n < 100 else break]`? That is > definitely more obvious (in my opinion) than using the while syntax, > but what does `break` mean in the context of a list comprehension? It's easy enough in the simple case. What would happen if you added an "else break" to this: [(x,y) for x in range(10) for y in range(2) if x<3] Of course, this would be better written with the if between the two fors, but the clarity isn't that big a problem when it's not going to change the result. Would it be obvious that the "else break" would only halt the "for y" loop? ChrisA From guido at python.org Mon Jan 28 16:45:55 2013 From: guido at python.org (Guido van Rossum) Date: Mon, 28 Jan 2013 07:45:55 -0800 Subject: [Python-ideas] PEP 3156: getting the socket or peer name from the transport In-Reply-To: References: <20130127122121.6b779ada@pitrou.net> <1359288997.3488.2.camel@localhost.localdomain> Message-ID: On Mon, Jan 28, 2013 at 12:57 AM, Shane Green wrote: > What about giving the protocol an environ info object that should have all > information it needs already, which could (and probably should) include > things like the SSL certificate information, and would probably also be > where additional info that happened to be looked up, like host name details, > was stored and accessed. Assuming the transports, etc., can define all the > state information a protocol needs, can operate without hardware > dependencies; in case that doesn't happen, though, the state dict will also > have references to the socket, so the protocol could get to it directly if > needed. Hm. I'm not keen on precomputing all of that, since most protocols won't need it, and the costs add up. This is not WSGI. The protocol has the transport object and can ask it specific questions -- albeit through a general API, like get_extra_info(key, [default]). -- --Guido van Rossum (python.org/~guido) From ethan at stoneleaf.us Mon Jan 28 16:50:09 2013 From: ethan at stoneleaf.us (Ethan Furman) Date: Mon, 28 Jan 2013 07:50:09 -0800 Subject: [Python-ideas] Interrupting threads In-Reply-To: References: <20130127210415.GA14691@cskk.homeip.net> Message-ID: <51069E31.6010909@stoneleaf.us> On 01/27/2013 03:59 PM, Charles-François Natali wrote: > On 01/27/2013 01:04 PM, Cameron Simpson wrote: >> It's in common use, and not obscure. I am a native speaker, and see no >> problem with it. Long-standing word with a well-known and defined >> meaning. > > Really? > I know about sigprocmask(), pthread_sigmask(), SIG_IGN and SIG_BLOCK, > interrupt masking... > I couldn't find a single occurrence of "heed" in the POSIX specification. Common use is not the same as common technical use. I don't recall ever seeing 'heed' or any of its derivatives in technical literature, but I am very familiar with the word and its meaning. It's a good choice. Having said that, I also agree with Cameron that 'canceled' would be a better word than 'interrupted'.
~Ethan~ From eliben at gmail.com Mon Jan 28 16:55:57 2013 From: eliben at gmail.com (Eli Bendersky) Date: Mon, 28 Jan 2013 07:55:57 -0800 Subject: [Python-ideas] while conditional in list comprehension ?? In-Reply-To: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de> References: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de> Message-ID: On Mon, Jan 28, 2013 at 5:33 AM, Wolfgang Maier < wolfgang.maier at biologie.uni-freiburg.de> wrote: > Dear all, > I guess this is so obvious that someone must have suggested it before: > in list comprehensions you can currently exclude items based on the if > conditional, e.g.: > > [n for n in range(1,1000) if n % 4 == 0] > > Why not extend this filtering by allowing a while statement in addition to > if, as in: > > [n for n in range(1,1000) while n < 400] > > Trivial effect, I agree, in this example since you could achieve the same > by > using range(1,400), but I hope you get the point. > This intuitively understandable extension would provide a big speed-up for > sorted lists where processing all the input is unnecessary. > > Consider this: > > some_names=["Adam", "Andrew", "Arthur", "Bob", "Caroline","Lancelot"] # > a sorted list of names > [n for n in some_names if n.startswith("A")] > # certainly gives a list of all names starting with A, but ... > [n for n in some_names while n.startswith("A")] > # would have saved two comparisons > -1 This isn't adding a feature that the language can't currently perform. It can, with itertools, with an explicit 'for' loop and probably other methods. List comprehensions are a useful shortcut that should be kept as simple as possible. The semantics of the proposed 'while' aren't immediately obvious, which makes it out of place in list comprehensions, IMO. Eli -------------- next part -------------- An HTML attachment was scrubbed... URL: From wolfgang.maier at biologie.uni-freiburg.de Mon Jan 28 17:19:23 2013 From: wolfgang.maier at biologie.uni-freiburg.de (Wolfgang Maier) Date: Mon, 28 Jan 2013 17:19:23 +0100 Subject: [Python-ideas] while conditional in list comprehension ?? In-Reply-To: References: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de> Message-ID: <00d901cdfd73$3c67e2c0$b537a840$@biologie.uni-freiburg.de> > -1 > This isn't adding a feature that the language can't currently perform. It can, with itertools, with an explicit 'for' loop and probably other methods. > List comprehensions are a useful shortcut that should be kept as simple as possible. The semantics of the proposed 'while' aren't immediately > obvious, which makes it out of place in list comprehensions, IMO. > > Eli I thought everything that can be done with a list comprehension can also be done with an explicit 'for' loop! So following your logic, one would have to remove comprehensions from the language altogether. In terms of semantics I do not really see what isn't immediately obvious about my proposal. Since the question of use cases was brought up: I am working as a scientist, and one of the uses I thought of when proposing this was that it could be used in combination with any kind of iterator that can yield an infinite number of elements, but you only want the first few elements up to a certain value (note: this is related to, but not the same as saying I want a certain number of elements from the iterator).
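To make this concrete, such an iterator could be as simple as this sketch (my own toy implementation; any equivalent iterable class would do):

def fib():
    # generator yielding the Fibonacci numbers in increasing order
    a = b = 1
    while True:
        yield a
        a, b = b, a + b

fibo = fib()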
Let's take the often used example of the Fibonacci iterator and assume you have an instance 'fibo' of its iterable class implementation, then: [n for n in fibo while n <10000] would return a list with all Fibonacci numbers that are smaller than 10000 (without having to know in advance how many such numbers there are). Likewise, with prime numbers and a 'prime' iterator: [n for n in prime while n<10000] and many other scientifically useful numeric sequences. I would appreciate such a feature, and, even though everything can be solved with itertools, I think it's too much typing and thinking for generating a list quickly. Best, Wolfgang From graffatcolmingov at gmail.com Mon Jan 28 17:33:43 2013 From: graffatcolmingov at gmail.com (Ian Cordasco) Date: Mon, 28 Jan 2013 11:33:43 -0500 Subject: [Python-ideas] while conditional in list comprehension ?? In-Reply-To: <00d901cdfd73$3c67e2c0$b537a840$@biologie.uni-freiburg.de> References: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de> <00d901cdfd73$3c67e2c0$b537a840$@biologie.uni-freiburg.de> Message-ID: On Mon, Jan 28, 2013 at 11:19 AM, Wolfgang Maier wrote: >> -1 >> This isn't adding a feature that the language can't currently perform. It > can, with itertools, with an explicit 'for' loop and probably other methods. >> List comprehensions are a useful shortcut that should be kept as simple as > possible. The semantics of the proposed 'while' aren't immediately >> obvious, which makes it out of place in list comprehensions, IMO. >> >> Eli > > I thought everything that can be done with a list comprehension can also be > done with an explicit 'for' loop! So following your logic, one would have to > remove comprehensions from the language altogether. In terms of semantics I > do not really see what isn't immediately obvious about my proposal. > Sarcasm will not help your argument. The difference (as I would expect you to know) between the performance of a list comprehension and an explicit `for` loop is significant and the comprehension is already a feature of the language. Removing it would be nonsensical. > Since the question of use cases was brought up: I am working as a scientist, > and one of the uses I thought of when proposing this was that it could be > used in combination with any kind of iterator that can yield an infinite > number of elements, but you only want the first few elements up to a certain > value (note: this is related to, but not the same as saying I want a certain > number of elements from the iterator). > > Let's take the often used example of the Fibonacci iterator and assume you > have an instance 'fibo' of its iterable class implementation, then: > > [n for n in fibo while n <10000] > > would return a list with all Fibonacci numbers that are smaller than 10000 > (without having to know in advance how many such numbers there are). > Likewise, with prime numbers and a 'prime' iterator: > > [n for n in prime while n<10000] > > and many other scientifically useful numeric sequences. > I would appreciate such a feature, and, even though everything can be solved > with itertools, I think it's too much typing and thinking for generating a > list quickly. > This is definitely a problematic use case for a simple list comprehension, but the takewhile solution works exactly as expected and even resembles your solution. It is in the standard library and its performance seems to be fast enough (to me at least, on a 10 year old laptop). And the key phrase here is "simple list comprehension".
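For what it's worth, the takewhile spelling of your Fibonacci example stays a one-liner (a sketch, reusing your 'fibo' instance from above):

import itertools
fib_below_10000 = list(itertools.takewhile(lambda n: n < 10000, fibo))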
Yours is in theory a simple list comprehension but is rather a slightly more complex case that can be handled in a barely more complex way. itertools is a part of the standard library that needs more affection, in my opinion, and really does its best to accommodate these more complex cases in sensible ways. I am still -1 on this. Cheers, Ian From wolfgang.maier at biologie.uni-freiburg.de Mon Jan 28 17:48:07 2013 From: wolfgang.maier at biologie.uni-freiburg.de (Wolfgang Maier) Date: Mon, 28 Jan 2013 17:48:07 +0100 Subject: [Python-ideas] while conditional in list comprehension ?? In-Reply-To: References: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de> <00d901cdfd73$3c67e2c0$b537a840$@biologie.uni-freiburg.de> Message-ID: <00db01cdfd77$3fc1e660$bf45b320$@biologie.uni-freiburg.de> > Sarcasm will not help your argument. The difference (as I would expect you to know) between the performance of a list comprehension and an > explicit `for` loop is significant and the comprehension is already a feature of the language. Removing it would be nonsensical. Ok, I am sorry for the sarcasm. Essentially this is exactly what I wanted to say with it. Because comprehensions are faster than for loops, I am using them, and this is why I'd like the while feature in them. I fully agree with everybody here that itertools provides a solution for it, but imagine for a moment that the if clause didn't exist and people pointed you to a similar itertools solution for it, e.g.: [n for n in itertools.takeif(lambda n: n % 4 == 0, range(1,1000))] What would you prefer? I think it is true that this is mostly about how often people would make use of the feature. And, yes, it was a mistake to disturb the ongoing voting with sarcasm. Best, Wolfgang > Since the question of use cases was brought up: I am working as a > scientist, and one of the uses I thought of when proposing this was > that it could be used in combination with any kind of iterator that > can yield an infinite number of elements, but you only want the first > few elements up to a certain value (note: this is related to, but not > the same as saying I want a certain number of elements from the iterator). > > Let's take the often used example of the Fibonacci iterator and assume > you have an instance 'fibo' of its iterable class implementation, then: > > [n for n in fibo while n <10000] > > would return a list with all Fibonacci numbers that are smaller than > 10000 (without having to know in advance how many such numbers there are). > Likewise, with prime numbers and a 'prime' iterator: > > [n for n in prime while n<10000] > > and many other scientifically useful numeric sequences. > I would appreciate such a feature, and, even though everything can be > solved with itertools, I think it's too much typing and thinking for > generating a list quickly. > This is definitely a problematic use case for a simple list comprehension, but the takewhile solution works exactly as expected and even resembles your solution. It is in the standard library and its performance seems to be fast enough (to me at least, on a 10 year old laptop). And the key phrase here is "simple list comprehension". Yours is in theory a simple list comprehension but is rather a slightly more complex case that can be handled in a barely more complex way. itertools is a part of the standard library that needs more affection, in my opinion, and really does its best to accommodate these more complex cases in sensible ways. I am still -1 on this.
Cheers, Ian From ethan at stoneleaf.us Mon Jan 28 16:53:44 2013 From: ethan at stoneleaf.us (Ethan Furman) Date: Mon, 28 Jan 2013 07:53:44 -0800 Subject: [Python-ideas] csv.DictReader could handle headers more intelligently. In-Reply-To: <201301281221.58842.mark.hackett@metoffice.gov.uk> References: <1358903168.4767.4.camel@webb> <201301251058.28531.mark.hackett@metoffice.gov.uk> <5102B76B.2080106@stoneleaf.us> <201301281221.58842.mark.hackett@metoffice.gov.uk> Message-ID: <51069F08.8070000@stoneleaf.us> On 01/28/2013 04:21 AM, Mark Hackett wrote: > On Friday 25 Jan 2013, Ethan Furman wrote: >> We're going to have to agree to disagree on this point -- I think there >> is a huge difference between reassigning a variable which is completely >> under your control from losing entire columns of data from a file which >> you may have never seen before. >> > > But if you've never seen it before, how do you know that you're going to get a > LIST in one column? I don't, which is why an exception should be raised. ~Ethan~ From mark.hackett at metoffice.gov.uk Mon Jan 28 18:13:52 2013 From: mark.hackett at metoffice.gov.uk (Mark Hackett) Date: Mon, 28 Jan 2013 17:13:52 +0000 Subject: [Python-ideas] csv.DictReader could handle headers more intelligently. In-Reply-To: <51069F08.8070000@stoneleaf.us> References: <1358903168.4767.4.camel@webb> <201301281221.58842.mark.hackett@metoffice.gov.uk> <51069F08.8070000@stoneleaf.us> Message-ID: <201301281713.52322.mark.hackett@metoffice.gov.uk> On Monday 28 Jan 2013, Ethan Furman wrote: > On 01/28/2013 04:21 AM, Mark Hackett wrote: > > On Friday 25 Jan 2013, Ethan Furman wrote: > >> We're going to have to agree to disagree on this point -- I think there > >> is a huge difference between reassigning a variable which is completely > >> under your control from losing entire columns of data from a file which > >> you may have never seen before. > > > > But if you've never seen it before, how do you know that you're going to > > get a LIST in one column? > > I don't, which is why an exception should be raised. > > ~Ethan~ And there's an argument for that that I've agreed to before. There's a counter-argument that this will cause programs that used to work to fail. Whether the pro outweighs the con or the other way round is what I question. You, however, seem to believe this is a foregone conclusion. And that's where I disagree. From python at mrabarnett.plus.com Mon Jan 28 18:20:50 2013 From: python at mrabarnett.plus.com (MRAB) Date: Mon, 28 Jan 2013 17:20:50 +0000 Subject: [Python-ideas] Interrupting threads In-Reply-To: <51049915.3060808@mrabarnett.plus.com> References: <51049915.3060808@mrabarnett.plus.com> Message-ID: <5106B372.5040803@mrabarnett.plus.com> The point has been made that you don't want an interruption in the middle of an exception handling routine. That's true. You also don't want an interruption in the middle of a 'finally' block. I think the problem here is that most of what I've been talking about regarding the context manager actually belongs to the 'try' statement; context managers are, after all, built on the 'try' statement. In the following the flags, the exception and the thread's method have been renamed. On entry to a 'try' statement, heed_thread_cancel is saved. When an exception is raised, heed_thread_cancel is set to True. This ensures that normal exceptions take priority. On entry to a 'finally' block, heed_thread_cancel is set to True. This ensures that the block will not be interrupted.
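To illustrate why a 'finally' block must be protected, consider the kind of 'try' statement a lock-based context manager expands to (only a sketch; 'lock' and 'do_work' are placeholder names):

lock.acquire()
try:
    do_work()
finally:
    # If an asynchronous ThreadCancelled could be delivered here, before
    # the release below runs, the lock would never be released.
    lock.release()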
Execution will leave the 'try' statement in one of two ways: 1. Normal exit (i.e. the next statement to be executed will be the one after the 'try' statement). heed_thread_cancel is restored. If heed_thread_cancel is True and thread_cancelled is also True, then heed_thread_cancel is set to False and ThreadCancelled is raised. 2. Exception propagation (i.e. the 'try' statement has not handled the exception). The saved heed_thread_cancel is discarded (heed_thread_cancel remains False) and the propagation continues. If the same logic applies to the keyboard interrupt (and the only real difference between ThreadCancelled and KeyboardInterrupt is that the former is triggered by the thread's 'cancel' method while the latter is triggered by the user pressing ^C or the equivalent), then the user pressing ^C will no longer be able to interrupt the code in 'finally' blocks and break the clean-up code in context managers. From python at mrabarnett.plus.com Mon Jan 28 18:26:31 2013 From: python at mrabarnett.plus.com (MRAB) Date: Mon, 28 Jan 2013 17:26:31 +0000 Subject: [Python-ideas] csv.DictReader could handle headers more intelligently. In-Reply-To: <51069F08.8070000@stoneleaf.us> References: <1358903168.4767.4.camel@webb> <201301251058.28531.mark.hackett@metoffice.gov.uk> <5102B76B.2080106@stoneleaf.us> <201301281221.58842.mark.hackett@metoffice.gov.uk> <51069F08.8070000@stoneleaf.us> Message-ID: <5106B4C7.3090803@mrabarnett.plus.com> On 2013-01-28 15:53, Ethan Furman wrote: > On 01/28/2013 04:21 AM, Mark Hackett wrote: >> On Friday 25 Jan 2013, Ethan Furman wrote: >>> We're going to have to agree to disagree on this point -- I think there >>> is a huge difference between reassigning a variable which is completely >>> under your control from losing entire columns of data from a file which >>> you may have never seen before. >>> >> >> But if you've never seen it before, how do you know that you're going to get a >> LIST in one column? > > I don't, which is why an exception should be raised. > +1 It shouldn't silently drop the columns, nor should it silently merge the columns into a list. It should complain, unless you state that it should merge if necessary because, presumably, you're prepared for such an eventuality. From mark.hackett at metoffice.gov.uk Mon Jan 28 18:45:16 2013 From: mark.hackett at metoffice.gov.uk (Mark Hackett) Date: Mon, 28 Jan 2013 17:45:16 +0000 Subject: [Python-ideas] csv.DictReader could handle headers more intelligently. In-Reply-To: <5106B4C7.3090803@mrabarnett.plus.com> References: <1358903168.4767.4.camel@webb> <51069F08.8070000@stoneleaf.us> <5106B4C7.3090803@mrabarnett.plus.com> Message-ID: <201301281745.16485.mark.hackett@metoffice.gov.uk> On Monday 28 Jan 2013, MRAB wrote: > It shouldn't silently drop the columns > Why not? It's adding to a dictionary, and adding a duplicate key replaces the earlier one. If it dropped the columns and shouldn't have, then the results will be seen to be wrong anyway, so there's not a huge amount of need for this. If it WANTED to keep both columns with the duplicate names, it won't work and needs abandoning. So no different from now. If it WANTED duplicate keys (e.g. blanks which aren't imported and aren't wanted), then you've just broken it. They can't necessarily change the csv file to put headers in. So now you've made the call useless for this case. And why, really, are there duplicate column names in there anyway? You can come up with the assertion that this might be wanted, but they're not normally what you see in a csv file.
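For the record, a quick sketch of today's behaviour with a duplicated header:

import csv, io
reader = csv.DictReader(io.StringIO("a,b,a\n1,2,3\n"))
row = next(reader)
print(row['a'], row['b'])  # prints "3 2" -- the later 'a' column wins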
I've never seen nor used a csv file that duplicated column names other than being blank. If it had been such a problem, the call would already have been abandoned. From ethan at stoneleaf.us Mon Jan 28 21:39:54 2013 From: ethan at stoneleaf.us (Ethan Furman) Date: Mon, 28 Jan 2013 12:39:54 -0800 Subject: [Python-ideas] while conditional in list comprehension ?? In-Reply-To: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de> References: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de> Message-ID: <5106E21A.1000507@stoneleaf.us> On 01/28/2013 05:33 AM, Wolfgang Maier wrote: > Consider this: > > some_names=["Adam", "Andrew", "Arthur", "Bob", "Caroline","Lancelot"] # > a sorted list of names > [n for n in some_names if n.startswith("A")] > # certainly gives a list of all names starting with A, but ... > [n for n in some_names while n.startswith("A")] > # would have saved two comparisons What happens when you want the names that start with 'B'? The advantage of 'if' is it processes the entire list so grabs all items that match, and the list does not have to be ordered. The disadvantage (can be) that it processes the entire list. Given that 'while' would only work on sorted lists, and could only start from the beginning, I think it may be too specialized. But I wouldn't groan if someone wanted to code it up. :) +0 ~Ethan~ From wolfgang.maier at biologie.uni-freiburg.de Mon Jan 28 22:22:09 2013 From: wolfgang.maier at biologie.uni-freiburg.de (Wolfgang Maier) Date: Mon, 28 Jan 2013 21:22:09 +0000 (UTC) Subject: [Python-ideas] while conditional in list comprehension ?? References: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de> <5106E21A.1000507@stoneleaf.us> Message-ID: >Ethan Furman writes: > > On 01/28/2013 05:33 AM, Wolfgang Maier wrote: > > Consider this: > > > > some_names=["Adam", "Andrew", "Arthur", "Bob", "Caroline","Lancelot"] # > > a sorted list of names > > [n for n in some_names if n.startswith("A")] > > # certainly gives a list of all names starting with A, but ... > > [n for n in some_names while n.startswith("A")] > > # would have saved two comparisons > > What happens when you want the names that start with 'B'? The advantage > of 'if' is it processes the entire list so grabs all items that match, > and the list does not have to be ordered. The disadvantage (can be) > that it processes the entire list. > > Given that 'while' would only work on sorted lists, and could only start > from the beginning, I think it may be too specialized. > > But I wouldn't groan if someone wanted to code it up. :) > > +0 > > ~Ethan~ > I thought about this question, and I agree this is not what the while clause would be best for. However, currently you could solve tasks like this with itertools.takewhile in the following (almost perl-like) way (I illustrate things with numbers to keep it simpler): l=[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23] # now retrieve all numbers from 10 to 19 (combining takewhile and slicing) [n for n in itertools.takewhile(lambda n:n<20,l[len([x for x in itertools.takewhile(lambda x:x<10,l)]):])] Nice, isn't it? If I am not mistaken, then with my suggestion this would at least simplify to: [n for n in l[len([x for x in l while x<10]):] while n<20] Not great either, I admit, but at least it's fun to play this mindgame. Best, Wolfgang From ethan at stoneleaf.us Mon Jan 28 22:43:33 2013 From: ethan at stoneleaf.us (Ethan Furman) Date: Mon, 28 Jan 2013 13:43:33 -0800 Subject: [Python-ideas] while conditional in list comprehension ??
In-Reply-To: References: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de> <5106E21A.1000507@stoneleaf.us> Message-ID: <5106F105.3060001@stoneleaf.us> On 01/28/2013 01:22 PM, Wolfgang Maier wrote: > However, currently you could solve tasks like this with itertools.takewhile in > the following (almost perl-like) way (I illustrate things with numbers to keep > it simpler): > > l=[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23] > # now retrieve all numbers from 10 to 19 (combining takewhile and slicing) > [n for n in itertools.takewhile(lambda n:n<20,l[len([x for x in > itertools.takewhile(lambda x:x<10,l)]):])] > > Nice, isn't it? > > If I am not mistaken, then with my suggestion this would at least simplify to: > > [n for n in l[len([x for x in l while x<10]):] while n<20] > > Not great either, I admit, but at least it's fun to play this mindgame. Well, as long as we're dreaming, how about [n for n in l while 10 <= n < 20] and somebody (else!) can code to skip until the first condition is met, then keep until the second condition is met, and then stop. :) ~Ethan~ From wolfgang.maier at biologie.uni-freiburg.de Mon Jan 28 23:01:40 2013 From: wolfgang.maier at biologie.uni-freiburg.de (Wolfgang Maier) Date: Mon, 28 Jan 2013 22:01:40 +0000 (UTC) Subject: [Python-ideas] while conditional in list comprehension ?? References: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de> <5106E21A.1000507@stoneleaf.us> <5106F105.3060001@stoneleaf.us> Message-ID: >Ethan Furman writes: > > Well, as long as we're dreaming, how about > > [n for n in l while 10 <= n < 20] > > and somebody (else!) can code to skip until the first condition is met, > then keep until the second condition is met, and then stop. > > :) Sounds great ;) Here it's 11 p.m. so dreaming sounds like a reasonable thing to do, Wolfgang From saghul at gmail.com Mon Jan 28 23:48:54 2013 From: saghul at gmail.com (Saúl Ibarra Corretgé) Date: Mon, 28 Jan 2013 23:48:54 +0100 Subject: [Python-ideas] libuv based eventloop for tulip experiment Message-ID: <51070056.8020006@gmail.com> Hi all! I haven't been able to keep up with all the tulip development on the mailing list (hopefully I will now!) so please excuse me if something I mention has already been discussed. For those who may not know it, libuv is the platform layer library for nodejs, which implements a uniform interface on top of epoll, kqueue, event ports and IOCP. I wrote Python bindings [1] for it a while ago, and I was very excited to see Tulip, so I thought I'd give this a try. Here [2] is the source code, along with some notes I took during the implementation. I know that the idea is not to re-implement the PEP itself but for people to create different EventLoop implementations. On rose I bundled tulip just to make a single package I could play with easily; once tulip makes it to the stdlib, only the EventLoop will remain. Here are some thoughts (in no particular order): - add_connector / remove_connector seem to be related to Windows, but being exposed like that feels a bit like leaking an implementation detail. I guess there was no way around it. - libuv implements a type of handle (Poll) which provides level-triggered file descriptor polling which also works on Windows, while being highly performant. It uses something called AFD Polling apparently, which is only available on Windows >= Vista, and a select thread on XP. I'm no Windows expert, but thanks to this the API is consistent across all platforms, which is nice.
Maybe it's worth investigating? [3] - The transport abstraction seems quite tied to socket objects. pyuv provides TCP and UDP handles, which provide a completion-style API and use a better approach than Poll handles. They should give better performance since EINTR is handled internally and there are fewer roundtrips between Python-land and C-land. Was it ever considered to provide some sort of abstraction so that transports can be used on top of something other than regular sockets? For example I see no way to get the remote party from the transport, without checking the underlying socket. Thanks for reading this far and keep up the good work. Regards, [1]: https://github.com/saghul/pyuv [2]: https://github.com/saghul/rose [3]: https://github.com/joyent/libuv/blob/master/src/win/poll.c -- Saúl Ibarra Corretgé http://saghul.net/blog | http://about.me/saghul From tjreedy at udel.edu Tue Jan 29 00:27:08 2013 From: tjreedy at udel.edu (Terry Reedy) Date: Mon, 28 Jan 2013 18:27:08 -0500 Subject: [Python-ideas] while conditional in list comprehension ?? In-Reply-To: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de> References: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de> Message-ID: On 1/28/2013 8:33 AM, Wolfgang Maier wrote: > Dear all, > I guess this is so obvious that someone must have suggested it before: No one who understands comprehensions would suggest this. > in list comprehensions you can currently exclude items based on the if > conditional, e.g.: > > [n for n in range(1,1000) if n % 4 == 0] > > Why not extend this filtering by allowing a while statement in addition to > if, as in: Why not? Because it is flat-out wrong. Or if you prefer, nonsensical. You want to break, not filter; and you are depending on the order of the items from the iterator. Comprehensions are a math-logic idea invented for (unordered) sets and borrowed by computer science and extended to sequences. However, sequences do not replace sets. https://en.wikipedia.org/wiki/Set-builder_notation https://en.wikipedia.org/wiki/List_comprehension Python has also extended the idea to dicts and iterators and uses almost exactly the same syntax for all 4 variations. > [n for n in range(1,1000) while n < 400] This would translate as def _temp(): res = [] for n in range(1, 1000): while n < 400: res.append(n) return res _temp() which makes an infinite loop, not a truncated loop. What you actually want is res = [] for n in range(1, 1000): if n >= 400: break res.append(n) which is not the form of a comprehension. -- Terry Jan Reedy From bruce at leapyear.org Tue Jan 29 01:01:56 2013 From: bruce at leapyear.org (Bruce Leban) Date: Mon, 28 Jan 2013 16:01:56 -0800 Subject: [Python-ideas] csv.DictReader could handle headers more intelligently. In-Reply-To: <201301281745.16485.mark.hackett@metoffice.gov.uk> References: <1358903168.4767.4.camel@webb> <51069F08.8070000@stoneleaf.us> <5106B4C7.3090803@mrabarnett.plus.com> <201301281745.16485.mark.hackett@metoffice.gov.uk> Message-ID: The reader could return a multidict. If you know it's a multidict you can access the 'discarded' values. Otherwise, it appears just like the dict that we have today. A middle ground between people who don't want the interface changed and those who want to get the multiple values. Personally, I prefer code that raises exceptions when it gets unreasonable input, and I think duplicate field names qualify. But if that's the general sentiment, then a multidict is a potential compromise.
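Something along these lines is what I have in mind (only a sketch; the class and method names are invented for illustration):

class MultiDict(dict):
    # Behaves like the dict DictReader returns today (last value wins),
    # but remembers every value seen for a duplicated key.
    def __init__(self):
        super().__init__()
        self._all = {}
    def __setitem__(self, key, value):
        self._all.setdefault(key, []).append(value)
        super().__setitem__(key, value)  # last assignment still wins
    def getall(self, key):
        return list(self._all.get(key, []))

A DictReader built on it would stay backward compatible: d['x'] behaves exactly as now, while d.getall('x') exposes the values that are currently discarded.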
--- Bruce Follow me: http://www.twitter.com/Vroo http://www.vroospeak.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From oscar.j.benjamin at gmail.com Tue Jan 29 02:02:31 2013 From: oscar.j.benjamin at gmail.com (Oscar Benjamin) Date: Tue, 29 Jan 2013 01:02:31 +0000 Subject: [Python-ideas] while conditional in list comprehension ?? In-Reply-To: References: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de> Message-ID: On 28 January 2013 23:27, Terry Reedy wrote: > On 1/28/2013 8:33 AM, Wolfgang Maier wrote: >> >> Dear all, >> I guess this is so obvious that someone must have suggested it before: > > No one who understands comprehensions would suggest this. That's a little strong. > >> in list comprehensions you can currently exclude items based on the if >> conditional, e.g.: >> >> [n for n in range(1,1000) if n % 4 == 0] >> >> Why not extend this filtering by allowing a while statement in addition to >> if, as in: > > > Why not? Because it is flat-out wrong. Or if you prefer, nonsensical. You > want to break, not filter; and you are depending on the order of the items > from the iterator. Comprehensions are a math-logic idea invented for > (unordered) sets and borrowed by computer science and extended to sequences. > However, sequences do not replace sets. Python's comprehensions are based on iterators that are inherently ordered (although in some cases the order is arbitrary). In the most common cases the comprehensions produce lists or generators that preserve the order of the underlying iterable. I find that the cases where the order of an iterable is relevant are very common in my own usage of iterables and of comprehensions. > > https://en.wikipedia.org/wiki/Set-builder_notation > https://en.wikipedia.org/wiki/List_comprehension > > Python has also extended the idea to dicts and iterators and uses almost > exactly the same syntax for all 4 variations. Although dicts and sets should be considered unordered they may still be constructed from a naturally ordered iterable. There are still cases where it makes sense to define the construction of such an object in terms of an order-dependent rule on the underlying iterator. > >> [n for n in range(1,1000) while n < 400] > > This would translate as > > def _temp(): > res = [] > for n in range(1, 1000): > while n < 400): > res.append(n) > return res > _temp() I guess this is what you mean by "No one who understands comprehensions would suggest this." Of course those are not the suggested semantics but I guess from this that you would object to a while clause that had a different meaning. > which makes an infinite loop, not a truncated loop. > What you actually want is > > res = [] > for n in range(1, 1000): > if >= 400): break > res.append(n) > > which is not the form of a comprehension. The form of a comprehension is not unchangeable. Oscar From graffatcolmingov at gmail.com Tue Jan 29 02:12:08 2013 From: graffatcolmingov at gmail.com (Ian Cordasco) Date: Mon, 28 Jan 2013 20:12:08 -0500 Subject: [Python-ideas] while conditional in list comprehension ?? In-Reply-To: References: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de> Message-ID: On Mon, Jan 28, 2013 at 8:02 PM, Oscar Benjamin wrote: > On 28 January 2013 23:27, Terry Reedy wrote: >> On 1/28/2013 8:33 AM, Wolfgang Maier wrote: >>> >>> Dear all, >>> I guess this is so obvious that someone must have suggested it before: >> >> No one who understands comprehensions would suggest this. > > That's a little strong. 
> >> >>> in list comprehensions you can currently exclude items based on the if >>> conditional, e.g.: >>> >>> [n for n in range(1,1000) if n % 4 == 0] >>> >>> Why not extend this filtering by allowing a while statement in addition to >>> if, as in: >> >> >> Why not? Because it is flat-out wrong. Or if you prefer, nonsensical. You >> want to break, not filter; and you are depending on the order of the items >> from the iterator. Comprehensions are a math-logic idea invented for >> (unordered) sets and borrowed by computer science and extended to sequences. >> However, sequences do not replace sets. > > Python's comprehensions are based on iterators that are inherently > ordered (although in some cases the order is arbitrary). In the most > common cases the comprehensions produce lists or generators that > preserve the order of the underlying iterable. I find that the cases > where the order of an iterable is relevant are very common in my own > usage of iterables and of comprehensions. > Technically they are not inherently ordered. You give the perfect example below. >> >> https://en.wikipedia.org/wiki/Set-builder_notation >> https://en.wikipedia.org/wiki/List_comprehension >> >> Python has also extended the idea to dicts and iterators and uses almost >> exactly the same syntax for all 4 variations. > > Although dicts and sets should be considered unordered they may still > be constructed from a naturally ordered iterable. There are still > cases where it makes sense to define the construction of such an > object in terms of an order-dependent rule on the underlying iterator. > They may be, but they may also be constructed from an unordered iterable. How so? Let `d` be a non-empty dictionary, and `f` a function that defines some mutation of it's input such that there doesn't exist x such that x = f(x). e = {k: f(v) for k, v in d.items()} You're taking an unordered object (a dictionary) and making a new one from it. An order dependent rule here would not make sense. Likewise, if we were to do: e = [(k, f(v)) for k, v in d.items()] We're creating order from an object in which there is none. How could the while statement be useful there? An if statement works fine. A `while` statement as suggested wouldn't. >> >>> [n for n in range(1,1000) while n < 400] >> >> This would translate as >> >> def _temp(): >> res = [] >> for n in range(1, 1000): >> while n < 400): >> res.append(n) >> return res >> _temp() > > I guess this is what you mean by "No one who understands > comprehensions would suggest this." Of course those are not the > suggested semantics but I guess from this that you would object to a > while clause that had a different meaning. > They are not the suggested semantics. You are correct. But based upon how list comprehensions are currently explained, one would be reasonable to expect a list comprehension with `while` to operate like this. >> which makes an infinite loop, not a truncated loop. >> What you actually want is >> >> res = [] >> for n in range(1, 1000): >> if >= 400): break >> res.append(n) >> >> which is not the form of a comprehension. > > The form of a comprehension is not unchangeable. > Agreed it is definitely mutable. I am just of the opinion that this is one of those instances where it shouldn't be changed. From alexandre.zani at gmail.com Tue Jan 29 02:15:22 2013 From: alexandre.zani at gmail.com (Alexandre Zani) Date: Mon, 28 Jan 2013 17:15:22 -0800 Subject: [Python-ideas] csv.DictReader could handle headers more intelligently. 
In-Reply-To: References: <1358903168.4767.4.camel@webb> <51069F08.8070000@stoneleaf.us> <5106B4C7.3090803@mrabarnett.plus.com> <201301281745.16485.mark.hackett@metoffice.gov.uk> Message-ID: I think raising an exception on duplicate headers is actually very likely to cause working code to break. Consider that all you need for that to happen is an extra couple of empty separators on the first line creating two "" headers. That seems like the sort of behavior that is easy to occur in spreadsheet programs. (Empty cells are usually not very well differentiated from non-existent cells in spreadsheet UIs IME) A StrictDictReader is better, but I think this is overkill. As for a MultiDictReader, I don't think this is superior to csv.reader. In both cases, you need to keep track of the column orders. And if you already know the column order, you might as well just manually specify the field names in DictReader. On Mon, Jan 28, 2013 at 4:01 PM, Bruce Leban wrote: > The reader could return a multidict. If you know it's a multidict you an > access the 'discarded' values. Otherwise, it appears just like the dict > that we have today. A middle ground between people that don't want the > interface changed and those who want to get the multiple values. > Personally, I prefer code that raises exceptions when it gets unreasonable > input, and I think duplicate field names qualifies. But if that's the the > general sentiment than a multidict is a potential compromise. > > --- Bruce > Follow me: http://www.twitter.com/Vroo http://www.vroospeak.com > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Tue Jan 29 02:30:56 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 29 Jan 2013 12:30:56 +1100 Subject: [Python-ideas] while conditional in list comprehension ?? In-Reply-To: References: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de> Message-ID: <51072650.5090808@pearwood.info> On 29/01/13 10:27, Terry Reedy wrote: >> [n for n in range(1,1000) while n < 400] > > This would translate as > > def _temp(): > res = [] > for n in range(1, 1000): > while n < 400): > res.append(n) > return res > _temp() > > which makes an infinite loop, not a truncated loop. Why would it translate that way? That would be a silly decision to make. Python can decide on the semantics of a while clause in a comprehension in whatever way makes the most sense, not necessarily according to some mechanical, nonsensical translation. We could easily decide that although [n for n in range(1,1000) if n < 400] has the semantics of: res = [] for n in range(1, 1000): if n < 400): res.append(n) [n for n in range(1,1000) while n < 400] could instead have the semantics of: res = [] for n in range(1, 1000): if not (n < 400): break res.append(n) If it were decided that reusing the while keyword in this way was too confusing (which doesn't seem likely, since it is a request that keeps coming up over and over again), we could use a different keyword: [n for n in range(1,1000) until n >= 400] > What you actually want is > > res = [] > for n in range(1, 1000): > if >= 400): break > res.append(n) > > which is not the form of a comprehension. Why not? Was the idea of a comprehension handed down from above by a deity, never to be changed? Or is it a tool to be changed if the change makes it more useful? 
Mathematical set builder notation has no notion of "break" because it is an abstraction. It takes exactly as much effort (time, memory, resources, whatever) to generate these two mathematical sets: {1} {x for all x in Reals if x == 1} (using a hybrid maths/python notation which I hope is clear enough). To put it another way, mathematically the list comp [p+1 for p in primes()] is expected to run infinitely fast. But clearly Python code is not a mathematical abstraction. So the fact that mathematical set builder notation does not include any way to break out of the loop is neither here nor there. Comprehensions are code, and need to be judged as code, not abstract mathematical identities. -- Steven From oscar.j.benjamin at gmail.com Tue Jan 29 02:34:46 2013 From: oscar.j.benjamin at gmail.com (Oscar Benjamin) Date: Tue, 29 Jan 2013 01:34:46 +0000 Subject: [Python-ideas] while conditional in list comprehension ?? In-Reply-To: References: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de> Message-ID: On 29 January 2013 01:12, Ian Cordasco wrote: > On Mon, Jan 28, 2013 at 8:02 PM, Oscar Benjamin > wrote: >> >> Although dicts and sets should be considered unordered they may still >> be constructed from a naturally ordered iterable. There are still >> cases where it makes sense to define the construction of such an >> object in terms of an order-dependent rule on the underlying iterator. > > They may be, but they may also be constructed from an unordered > iterable. How so? > Let `d` be a non-empty dictionary, and `f` a function that defines > some mutation of it's input such that there doesn't exist x such that > x = f(x). > > e = {k: f(v) for k, v in d.items()} > > You're taking an unordered object (a dictionary) and making a new one > from it. An order dependent rule here would not make sense. Likewise, > if we were to do: > > e = [(k, f(v)) for k, v in d.items()] > > We're creating order from an object in which there is none. How could > the while statement be useful there? An if statement works fine. A > `while` statement as suggested wouldn't. I was referring to the case of constructing an object that does not preserve order by iterating over an object that does. Clearly a while clause would be a lot less useful if you were iterating over an object whose order was arbitrary: so don't use it in that case. A (contrived) example - caching Fibonacci numbers: # Fibonacci number generator def fib(): a = b = 1 while True: yield a a, b = b, a+b # Cache the first N fibonacci numbers fib_cache = {n: x for n, x in zip(range(N), fib())} # Alternative fib_cache = {n: x for n, x in enumerate(fib()) while n < N} # Cache the Fibonacci numbers less than X fib_cache = {} for n, x in enumerate(fib()): if x > X: break fib_cache[n] = x # Alternative 1 fib_cache = {n: x for n, x in enumerate(takewhile(lambda x: x < X, fib()))} # Alternative 2 fib_cache = {n: x for n, x in enumerate(fib()) while x < X} Oscar From graffatcolmingov at gmail.com Tue Jan 29 02:43:22 2013 From: graffatcolmingov at gmail.com (Ian Cordasco) Date: Mon, 28 Jan 2013 20:43:22 -0500 Subject: [Python-ideas] while conditional in list comprehension ?? In-Reply-To: References: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de> Message-ID: On Mon, Jan 28, 2013 at 8:34 PM, Oscar Benjamin wrote: > I was referring to the case of constructing an object that does not > preserve order by iterating over an object that does. 
Clearly a while > clause would be a lot less useful if you were iterating over an object > whose order was arbitrary: so don't use it in that case. > Yeah, I'm not sure how well telling someone to use a construct of the language will go over. > A (contrived) example - caching Fibonacci numbers: > > # Fibonacci number generator > def fib(): > a = b = 1 > while True: > yield a > a, b = b, a+b > > # Cache the first N fibonacci numbers > fib_cache = {n: x for n, x in zip(range(N), fib())} > # Alternative > fib_cache = {n: x for n, x in enumerate(fib()) while n < N} > > # Cache the Fibonacci numbers less than X > fib_cache = {} > for n, x in enumerate(fib()): > if x > X: > break > fib_cache[n] = x > # Alternative 1 > fib_cache = {n: x for n, x in enumerate(takewhile(lambda x: x < X, fib()))} > # Alternative 2 > fib_cache = {n: x for n, x in enumerate(fib()) while x < X} > As contrived as it may be, it is a good example. Still, I dislike the use of `while` and would rather Steven's suggestion of `until` were this to be included. This would make `until` a super special case, but then again, this construct seems special enough that only a few examples of its usefulness can be constructed. I guess I'm more -0 with `until` than -1. Thanks for the extra example Oscar. It was helpful. Cheers, Ian From jsbueno at python.org.br Tue Jan 29 02:50:45 2013 From: jsbueno at python.org.br (Joao S. O. Bueno) Date: Mon, 28 Jan 2013 23:50:45 -0200 Subject: [Python-ideas] constant/enum type in stdlib Message-ID: This idea is not new - but it is stalled - Last I remember it came around in Python-devel in 2010, in this thread: http://mail.python.org/pipermail/python-dev/2010-November/thread.html#105967 There is an even older PEP (PEP 354) that was rejected just for not being enough interest at the time - And it was not dismissed at all - to the contrary the last e-mail in the thread is a message from the BDLF for it to **be** ! The discussion happened in a bad moment as Python was mostly freature froozen for 3.2 - and it did not show up again for Python 3.3; The reasoning for wanting enums/ constants has been debated already - but one of the main reasons that emerge from that thread are the ability to have named constants (just like we have "True" and "False". why do I think this is needed in the stdlib, and having itin a 3rd party module is not enough? because they are an interesting thing to have, not only on the stdlib, but on several widely used Python projects that don't have other dependencies. Having a feature like this into the stdlib allow these projects to make use of it, without needing other dependencies, and moreover, users which will benefit the most out of such constants will have a wll known "constant" type which won't come as a surprise in each package he is using interactively or debugging. Most of the discussion on the 2010 thread was summed up in a message by Michael Foord in this link http://mail.python.org/pipermail/python-dev/2010-November/106063.html with some follow up here: http://mail.python.org/pipermail/python-dev/2010-November/106065.html js -><- ---------- -- From shane at umbrellacode.com Tue Jan 29 06:24:06 2013 From: shane at umbrellacode.com (Shane Green) Date: Mon, 28 Jan 2013 21:24:06 -0800 Subject: [Python-ideas] csv.DictReader could handle headers more intelligently. 
In-Reply-To: <201301281745.16485.mark.hackett@metoffice.gov.uk> References: <1358903168.4767.4.camel@webb> <51069F08.8070000@stoneleaf.us> <5106B4C7.3090803@mrabarnett.plus.com> <201301281745.16485.mark.hackett@metoffice.gov.uk> Message-ID: > On Monday 28 Jan 2013, MRAB wrote: >> It shouldn't silently drop the columns >> > > Why not? > > It's adding to a dictionary and adding a duplicate key replaces the earlier > one. > > If it dropped the columns and shouldn't have, then the results will be seen to > be wrong anyway, so there's not a huge amount of need for this. > > If it WANTED to keep both columns with the duplicate names, it won't work and > needs abandoning. So no different from now. > > If it WANTED duplicate keys (e.g. blanks which aren't imported and aren't > wanted), then you've just broken it. They can't necessarily change the csv file > to put headers in. So now you've made the call useless for this case. > > And why, really, are there duplicate column names in there anyway? You can > come up with the assertion that this might be wanted, but they're not normally > what you see in a csv file. > > I've never seen nor used a csv file that duplicated column names other than > being blank. > > If it had been such a problem, the call would already have been abandoned. > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas Actually I've seen a many real life examples of CSV files with repeated column names, working with log data in the energy management space. CSV has been around for a very long time, and is used for a lot more than spreadsheets; there are a lot of funky formats out there. Things like, every "VALUE" column is a 15 minute reading. It seems like we're getting too hung up on dicts: all the information about a record is precisely stored by two sequences of values: the headers, and the field values. Those entires and their order can both be useful to a consumer of CSV records, and should be made available. The record also maps headers to corresponding value sequences for mapped access. -------------- next part -------------- An HTML attachment was scrubbed... URL: From cf.natali at gmail.com Tue Jan 29 08:23:33 2013 From: cf.natali at gmail.com (=?ISO-8859-1?Q?Charles=2DFran=E7ois_Natali?=) Date: Tue, 29 Jan 2013 08:23:33 +0100 Subject: [Python-ideas] Interrupting threads In-Reply-To: <5106B372.5040803@mrabarnett.plus.com> References: <51049915.3060808@mrabarnett.plus.com> <5106B372.5040803@mrabarnett.plus.com> Message-ID: > The point has been made that you don't want an interruption in the > middle of an exception handling routine. That's true. You also don't > want an interruption in the middle of a 'finally' block. That's a good start :-) > I think the problem here is that most of what I've been talking about > regarding the context manager actually belongs to the 'try' statement; > context managers are, after all, built on the 'try' statement. > [...] Several points: - I prefer the original "interruption" word to "cancellation": interruption is the mechanism by which a thread is notified of an asynchronous interruption/cancellation/whatever request. Cancellation is one of the potential outcomes of a thread interruption: the thread could ignore it, handle it in some specific way and continue its merry life, *or* cancel its activity and bail out. 
Also, "interruption" is already familiar to anybody knowing about hardware interrupts, and has precedent in other languages (e.g. Java, C#. pthread has cancellation points but those are really cancellation). - I don't understand what would happen by default, i.e. outside of any try/context manager: could an interruption exception be raised at any point? - I still don't see what this brings over a simple, explicit static Thread.interrupted() method. Interrupting a thread (through thread.interrupt()) would just set this flag (which would probably be an event/atomic read/write variable to assure memory visibility), and then a thread could just call Thread.interrupted() to check for pending interruption, and react accordingly. You don't have to mess with the 'heed' flag when an exception is raised, you're sure an asynchronous exception won't pop up at an arbitrary point in the code, it's simpler, and well, "explicit is better than implicit". The only usage I can see of an interruption exception is, as in Java, to interrupt a blocking call (which is currently not supported). - Really, "heed"? I've never had to look up a word in a dictionary while reading a technical book/presentation/discussion before. I may not be particularly good in English, but I'm positive this term will puzzle many non native speakers... From stephen at xemacs.org Tue Jan 29 09:17:46 2013 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Tue, 29 Jan 2013 17:17:46 +0900 Subject: [Python-ideas] csv.DictReader could handle headers more intelligently. In-Reply-To: References: <1358903168.4767.4.camel@webb> <51069F08.8070000@stoneleaf.us> <5106B4C7.3090803@mrabarnett.plus.com> <201301281745.16485.mark.hackett@metoffice.gov.uk> Message-ID: <87ip6g1jyt.fsf@uwakimon.sk.tsukuba.ac.jp> Shane Green writes: > Actually I've seen a many real life examples of CSV files with > repeated column names, Sure, but this really isn't the issue. If it were, "cvs.reader is your friend" would be all the answer that the issue deserves IMHO. > It seems like we're getting too hung up on dicts: Not at all. (For reasons I don't understand) Somebody has a use case where it's useful to have the field names stored in each record, rather than stored once and have both field names and field values accessed by position as needed. The point is to return a name-value *mapping object* for *each* row, and that may as well be a dict. The people who suggest a multidict or a list-valued dict are missing that point, AFAICS. Eg, in your "BLABLA", "VALUE", ..., "VALUE" example, position really is what matters, so a dict of any kind is inappropriate IMO. Again, it's arbitrary whether the list-valued dict does d["VALUE"].append(x) or d["VALUE"].insert(0,x), and it's hard for me to guess which it would do in practice: .append is easier to write, but .insert seems closer to the behavior of csv.reader (which is what we really want in your example IMO). From amauryfa at gmail.com Tue Jan 29 09:52:45 2013 From: amauryfa at gmail.com (Amaury Forgeot d'Arc) Date: Tue, 29 Jan 2013 09:52:45 +0100 Subject: [Python-ideas] Interrupting threads In-Reply-To: References: <51049915.3060808@mrabarnett.plus.com> <5106B372.5040803@mrabarnett.plus.com> Message-ID: 2013/1/29 Charles-Fran?ois Natali > > The point has been made that you don't want an interruption in the > > middle of an exception handling routine. That's true. You also don't > > want an interruption in the middle of a 'finally' block. > > That's a good start :-) But is it feasible? 
Is it possible to handle the case where a finally block calls another Python function? -- Amaury Forgeot d'Arc -------------- next part -------------- An HTML attachment was scrubbed... URL: From shane at umbrellacode.com Tue Jan 29 10:21:17 2013 From: shane at umbrellacode.com (Shane Green) Date: Tue, 29 Jan 2013 01:21:17 -0800 Subject: [Python-ideas] PEP 3156: getting the socket or peer name from the transport In-Reply-To: References: <20130127122121.6b779ada@pitrou.net> <1359288997.3488.2.camel@localhost.localdomain> Message-ID: <11D4B601-0234-41B0-8EA4-7078EFD5E30D@umbrellacode.com> Right. I was thinking about it from too high of a level, I think, and focused too much on a single example, HTTPS. To clarify a couple things, though, I actually didn't mean for transports to populate the state with superfluous information or things they didn't already know. Again, based on the single example I was considering, i was thinking they could intelligently populate it with state they know will be needed, and already have. Like the HTTPS server spawning a new HTTPS transport channel knows the channel will need its SSL information, and the transport can add it's own socket connection to the state in case the protocol needs it. I had also thought the state might somehow end up participating in get_extra_info(), so the expensive information returned was stored there; more importantly, though, I didn't mean for any such calls to be made preemptively in an attempt to populate state just in case a protocol did need it. HTTPS is a single, atypical example that's too high-level?and something similar to WSGI seemed like a reasonable approach ;-) Shane Green www.umbrellacode.com 408-692-4666 | shane at umbrellacode.com On Jan 28, 2013, at 7:45 AM, Guido van Rossum wrote: > On Mon, Jan 28, 2013 at 12:57 AM, Shane Green wrote: >> What about giving the protocol an environ info object that should have all >> information it needs already, which could (and probably should) include >> things like the SSL certificate information, and would probably also be >> where additional info that happened to be looked up, like host name details, >> was stored and accessed. Assuming the transports, etc., can define all the >> state information a protocol needs, can operate without hardware >> dependencies; in case that doesn't happen, though, the state dict will also >> have references to the socket, so the protocol could get to directly if >> needed. > > Hm. I'm not keen on precomputing all of that, since most protocols > won't need it, and the cost add up. This is not WSGI. The protocol has > the transport object and can ask it specific questions -- if through a > general API, like get_extra_info(key, [default]). > > -- > --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From solipsis at pitrou.net Tue Jan 29 10:54:43 2013 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 29 Jan 2013 10:54:43 +0100 Subject: [Python-ideas] Interrupting threads References: <51049915.3060808@mrabarnett.plus.com> <5106B372.5040803@mrabarnett.plus.com> Message-ID: <20130129105443.2804520b@pitrou.net> Le Tue, 29 Jan 2013 08:23:33 +0100, Charles-Fran?ois Natali a ?crit : > - Really, "heed"? I've never had to look up a word in a dictionary > while reading a technical book/presentation/discussion before. I may > not be particularly good in English, but I'm positive this term will > puzzle many non native speakers... Ditto here. 
Now it's not unusual to have to learn new vocabulary, but "heed" is obscure and makes an API difficult to understand for me. Of course, I sympathize with native English speakers who are annoyed by the prevalence of Globish over real English. That said, Python already mandates American English instead of British English. Regards Antoine. From shane at umbrellacode.com Tue Jan 29 11:18:21 2013 From: shane at umbrellacode.com (Shane Green) Date: Tue, 29 Jan 2013 02:18:21 -0800 Subject: [Python-ideas] csv.DictReader could handle headers more intelligently. In-Reply-To: <87ip6g1jyt.fsf@uwakimon.sk.tsukuba.ac.jp> References: <1358903168.4767.4.camel@webb> <51069F08.8070000@stoneleaf.us> <5106B4C7.3090803@mrabarnett.plus.com> <201301281745.16485.mark.hackett@metoffice.gov.uk> <87ip6g1jyt.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: So I wasn't really questioning the usefulness of the dictionary representation, but couldn't the returned object also let you access the header and value sequences, etc? I was also thinking the conversion to simple dict with single (non-list) values per column could be part of the API. Appending duplicate field values as they're read reflects the order the duplicate entries appear in the source (when I've encountered CSV that purposely used duplicate column headers, the sequence they appeared in was critical). The output from the current implementation should reflect the last duplicate value, as that always replaces previous ones in the dict, so my conversions returned the last value (-1), which should do the same - I think. It was a straw man ;-). I see your point about the point. I think it would be good to have an implementation that kept all the information but still put the most usable API on it possible, rather than saying you can't have dictionary access unless you want to lose duplicate values, for example. I mean, I've needed to consume CSV a lot, and that's what would have made the module useful to me, and the implementation that keeps all the information and lets it easily be trimmed as needed seems better than one that just wipes it out to start. Shane Green www.umbrellacode.com 408-692-4666 | shane at umbrellacode.com On Jan 29, 2013, at 12:17 AM, "Stephen J. Turnbull" wrote: > Shane Green writes: > >> Actually I've seen many real-life examples of CSV files with >> repeated column names, > > Sure, but this really isn't the issue. If it were, "csv.reader is > your friend" would be all the answer that the issue deserves IMHO. > >> It seems like we're getting too hung up on dicts: > > Not at all. (For reasons I don't understand) Somebody has a use case > where it's useful to have the field names stored in each record, > rather than stored once and have both field names and field values > accessed by position as needed. The point is to return a name-value > *mapping object* for *each* row, and that may as well be a dict. > > The people who suggest a multidict or a list-valued dict are missing > that point, AFAICS. E.g., in your "BLABLA", "VALUE", ..., "VALUE" > example, position really is what matters, so a dict of any kind is > inappropriate IMO. Again, it's arbitrary whether the list-valued dict > does d["VALUE"].append(x) or d["VALUE"].insert(0,x), and it's hard for > me to guess which it would do in practice: .append is easier to write, > but .insert seems closer to the behavior of csv.reader (which is what > we really want in your example IMO). > -------------- next part -------------- An HTML attachment was scrubbed...
URL: From ncoghlan at gmail.com Tue Jan 29 11:44:54 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 29 Jan 2013 20:44:54 +1000 Subject: [Python-ideas] while conditional in list comprehension ?? In-Reply-To: <51072650.5090808@pearwood.info> References: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de> <51072650.5090808@pearwood.info> Message-ID: On Tue, Jan 29, 2013 at 11:30 AM, Steven D'Aprano wrote: > Why would it translate that way? That would be a silly decision to make. > Python can decide on the semantics of a while clause in a comprehension in > whatever way makes the most sense, not necessarily according to some > mechanical, nonsensical translation. Terry is correct: comprehensions are deliberately designed to have the exact same looping semantics as the equivalent statements flattened out into a single line, with the innermost expression lifted out of the loop body and placed in front. This then works to arbitrarily deep nesting levels. The surrounding syntax (parentheses, brackets, braces, and whether or not there is a colon present in the main expression) then governs what kind of result you get (generator-iterator, list, set, dict). For example in: (x, y, z for x in a if x for y in b if y for z in c if z) [x, y, z for x in a if x for y in b if y for z in c if z] {x, y, z for x in a if x for y in b if y for z in c if z} {x: y, z for x in a if x for y in b if y for z in c if z} The looping semantics of these expressions are all completely defined by the equivalent statements: for x in a: if x: for y in b: if y: for z in c: if z: (modulo a few name lookup quirks if you're playing with class scopes) Any attempt to change that fundamental equivalence between comprehensions and the corresponding statements has basically zero chance of getting accepted through the PEP process. The only remotely plausible proposal I've seen in this thread is the "else break" on the filter conditions, because that *can* be mapped directly to the statement form in order to accurately describe the intended semantics. However, it would fail the "just use itertools.takewhile or a custom iterator, that use case isn't common enough to justify dedicated syntax". The conceptual basis of Python's comprehensions in mathematical set notation would likely also play a part in rejecting an addition that requires an inherently procedural interpretation. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From shane at umbrellacode.com Tue Jan 29 11:59:14 2013 From: shane at umbrellacode.com (Shane Green) Date: Tue, 29 Jan 2013 02:59:14 -0800 Subject: [Python-ideas] while conditional in list comprehension ?? In-Reply-To: References: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de> <51072650.5090808@pearwood.info> Message-ID: <97262C84-D345-44FC-9A10-BD3D07023D6F@umbrellacode.com> Unfortunately "else break" also kind of falls flat on its face when you consider it's being used in context of an expression. Shane Green www.umbrellacode.com 408-692-4666 | shane at umbrellacode.com On Jan 29, 2013, at 2:44 AM, Nick Coghlan wrote: > On Tue, Jan 29, 2013 at 11:30 AM, Steven D'Aprano wrote: >> Why would it translate that way? That would be a silly decision to make. >> Python can decide on the semantics of a while clause in a comprehension in >> whatever way makes the most sense, not necessarily according to some >> mechanical, nonsensical translation. 
> > Terry is correct: comprehensions are deliberately designed to have the > exact same looping semantics as the equivalent statements flattened > out into a single line, with the innermost expression lifted out of > the loop body and placed in front. This then works to arbitrarily deep > nesting levels. The surrounding syntax (parentheses, brackets, braces, > and whether or not there is a colon present in the main expression) > then governs what kind of result you get (generator-iterator, list, > set, dict). > > For example in: > > (x, y, z for x in a if x for y in b if y for z in c if z) > [x, y, z for x in a if x for y in b if y for z in c if z] > {x, y, z for x in a if x for y in b if y for z in c if z} > {x: y, z for x in a if x for y in b if y for z in c if z} > > The looping semantics of these expressions are all completely defined > by the equivalent statements: > > for x in a: > if x: > for y in b: > if y: > for z in c: > if z: > > (modulo a few name lookup quirks if you're playing with class scopes) > > Any attempt to change that fundamental equivalence between > comprehensions and the corresponding statements has basically zero > chance of getting accepted through the PEP process. > > The only remotely plausible proposal I've seen in this thread is the > "else break" on the filter conditions, because that *can* be mapped > directly to the statement form in order to accurately describe the > intended semantics. However, it would fail the "just use > itertools.takewhile or a custom iterator, that use case isn't common > enough to justify dedicated syntax". The conceptual basis of Python's > comprehensions in mathematical set notation would likely also play a > part in rejecting an addition that requires an inherently procedural > interpretation. > > Cheers, > Nick. > > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas -------------- next part -------------- An HTML attachment was scrubbed... URL: From oscar.j.benjamin at gmail.com Tue Jan 29 12:16:02 2013 From: oscar.j.benjamin at gmail.com (Oscar Benjamin) Date: Tue, 29 Jan 2013 11:16:02 +0000 Subject: [Python-ideas] csv.DictReader could handle headers more intelligently. In-Reply-To: References: <1358903168.4767.4.camel@webb> <51069F08.8070000@stoneleaf.us> <5106B4C7.3090803@mrabarnett.plus.com> <201301281745.16485.mark.hackett@metoffice.gov.uk> <87ip6g1jyt.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: On 29 January 2013 10:18, Shane Green wrote: > So I wasn't really questioning the usefulness of the dictionary > representation, but couldn't the returned object also let you access the > header and value sequences, etc? I was also thinking the conversion to > simple dict with single (non-list) values per column could be part of the > API. > > Appending duplicate field values as they're read reflects the order the > duplicate entries appear in the source (when I've encountered CSV that > purposely used duplicate column headers, the sequence they appeared in was > critical). The output from the current implementation should reflect the > last duplicate value, as that always replaces previous ones in the dict, so > my conversions returned the last value (-1), which should do the same - I > think. It was a straw man ;-). > > I see your point about the point. 
I think it would be good to have an > implementation that kept all the information but still put the most usable > API on it possible, rather than saying you can't have dictionary access > unless you want to lose duplicate values, for example. I mean, I've needed > to consume CSV a lot, and that's what would have made the module useful to > me, and the implementation that keeps all the information and lets it easily > be trimmed as needed seems better than one that just wipes it out to > start. This is exactly what the csv.reader objects do. While it is a problem that csv.DictReader silently discards data when that is very likely an error, there's no need to try and guess how people want to deal with duplicate column headers and invent a new class for it. It's easy enough to write your own wrapper that exactly performs whatever processing you happen to want:

from collections import defaultdict

def multireader(csvreader):
    try:
        headers = next(csvreader)
    except StopIteration:
        raise ValueError('No header')
    for row in csvreader:
        d = defaultdict(list)
        for h, v in zip(headers, row):
            d[h].append(v)
        yield d

Oscar

From shane at umbrellacode.com Tue Jan 29 12:33:05 2013 From: shane at umbrellacode.com (Shane Green) Date: Tue, 29 Jan 2013 03:33:05 -0800 Subject: [Python-ideas] csv.DictReader could handle headers more intelligently. In-Reply-To: References: <1358903168.4767.4.camel@webb> <51069F08.8070000@stoneleaf.us> <5106B4C7.3090803@mrabarnett.plus.com> <201301281745.16485.mark.hackett@metoffice.gov.uk> <87ip6g1jyt.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: Okay, sure, I guess the starting point of my argument is: DictReader is nice, so why not make one that supports duplicate columns and easily implements the other behaviors, whether that's discarding values from duplicate columns so there's a one-to-one mapping, or just raising an exception when a duplicate column is encountered in the first place, in terms of something that handles this superset of legal CSV formats that do in fact specify exactly what header names each of their values should be mapped to? Shane Green www.umbrellacode.com 408-692-4666 | shane at umbrellacode.com On Jan 29, 2013, at 3:16 AM, Oscar Benjamin wrote: > On 29 January 2013 10:18, Shane Green wrote: >> So I wasn't really questioning the usefulness of the dictionary >> representation, but couldn't the returned object also let you access the >> header and value sequences, etc? I was also thinking the conversion to >> simple dict with single (non-list) values per column could be part of the >> API. >> >> Appending duplicate field values as they're read reflects the order the >> duplicate entries appear in the source (when I've encountered CSV that >> purposely used duplicate column headers, the sequence they appeared in was >> critical). The output from the current implementation should reflect the >> last duplicate value, as that always replaces previous ones in the dict, so >> my conversions returned the last value (-1), which should do the same - I >> think. It was a straw man ;-). >> >> I see your point about the point. I think it would be good to have an >> implementation that kept all the information but still put the most usable >> API on it possible, rather than saying you can't have dictionary access >> unless you want to lose duplicate values, for example.
I mean, I've needed >> to consume CSV a lot, and that's what would have made the module useful to >> me, and the implementation that keeps all the information and lets it easily >> be trimmed as needed seems better than one that just wipes it out to >> start. > > This is exactly what the csv.reader objects do. > > While it is a problem that csv.DictReader silently discards data when > that is very likely an error, there's no need to try and guess how > people want to deal with duplicate column headers and invent a new > class for it. It's easy enough to write your own wrapper that exactly > performs whatever processing you happen to want: > > def multireader(csvreader): > try: > headers = next(csvreader) > except StopIteration: > raise ValueError('No header') > for row in csvreader: > d = defaultdict(list) > for h, v in zip(headers, row): > d[h].append(v) > yield d > > > Oscar -------------- next part -------------- An HTML attachment was scrubbed... URL: From mark.hackett at metoffice.gov.uk Tue Jan 29 12:39:28 2013 From: mark.hackett at metoffice.gov.uk (Mark Hackett) Date: Tue, 29 Jan 2013 11:39:28 +0000 Subject: [Python-ideas] csv.DictReader could handle headers more intelligently. In-Reply-To: References: <1358903168.4767.4.camel@webb> Message-ID: <201301291139.28128.mark.hackett@metoffice.gov.uk> On Tuesday 29 Jan 2013, Alexandre Zani wrote: > > As for a MultiDictReader, I don't think this is superior to csv.reader. In > both cases, you need to keep track of the column orders. And if you already > know the column order, you might as well just manually specify the field > names in DictReader. > But it would allow you to access the index by name.

value = csv_array[indices["Total Cost"]]

A little more verbose than

value = csv_dict["Total Cost"]

But it's easier to read what it's doing than

value = csv_array[3]

From ncoghlan at gmail.com Tue Jan 29 12:50:07 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 29 Jan 2013 21:50:07 +1000 Subject: [Python-ideas] constant/enum type in stdlib In-Reply-To: References: Message-ID: On Tue, Jan 29, 2013 at 11:50 AM, Joao S. O. Bueno wrote: > This idea is not new - but it is stalled - > Last I remember it came around in Python-devel in 2010, in this thread: > http://mail.python.org/pipermail/python-dev/2010-November/thread.html#105967 FWIW, since that last discussion, I've switched to using strings for my special constants, dumping them in a container if I need some kind of easy validity checking or iteration. That said, an enum type may still be useful for interoperability with other systems (databases, C APIs, etc). Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From yoavglazner at gmail.com Tue Jan 29 12:51:17 2013 From: yoavglazner at gmail.com (yoav glazner) Date: Tue, 29 Jan 2013 13:51:17 +0200 Subject: [Python-ideas] while conditional in list comprehension ?? 
In-Reply-To: <97262C84-D345-44FC-9A10-BD3D07023D6F@umbrellacode.com> References: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de> <51072650.5090808@pearwood.info> <97262C84-D345-44FC-9A10-BD3D07023D6F@umbrellacode.com> Message-ID: Here is very similar version that works (tested on python27) >>> def stop(): next(iter([])) >>> list((i if i<50 else stop()) for i in range(100)) [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49] On Tue, Jan 29, 2013 at 12:59 PM, Shane Green wrote: > Unfortunately "else break" also kind of falls flat on its face when you > consider it's being used in context of an expression. > > > > > > Shane Green > www.umbrellacode.com > 408-692-4666 | shane at umbrellacode.com > > On Jan 29, 2013, at 2:44 AM, Nick Coghlan wrote: > > On Tue, Jan 29, 2013 at 11:30 AM, Steven D'Aprano > wrote: > > Why would it translate that way? That would be a silly decision to make. > Python can decide on the semantics of a while clause in a comprehension in > whatever way makes the most sense, not necessarily according to some > mechanical, nonsensical translation. > > > Terry is correct: comprehensions are deliberately designed to have the > exact same looping semantics as the equivalent statements flattened > out into a single line, with the innermost expression lifted out of > the loop body and placed in front. This then works to arbitrarily deep > nesting levels. The surrounding syntax (parentheses, brackets, braces, > and whether or not there is a colon present in the main expression) > then governs what kind of result you get (generator-iterator, list, > set, dict). > > For example in: > > (x, y, z for x in a if x for y in b if y for z in c if z) > [x, y, z for x in a if x for y in b if y for z in c if z] > {x, y, z for x in a if x for y in b if y for z in c if z} > {x: y, z for x in a if x for y in b if y for z in c if z} > > The looping semantics of these expressions are all completely defined > by the equivalent statements: > > for x in a: > if x: > for y in b: > if y: > for z in c: > if z: > > (modulo a few name lookup quirks if you're playing with class scopes) > > Any attempt to change that fundamental equivalence between > comprehensions and the corresponding statements has basically zero > chance of getting accepted through the PEP process. > > The only remotely plausible proposal I've seen in this thread is the > "else break" on the filter conditions, because that *can* be mapped > directly to the statement form in order to accurately describe the > intended semantics. However, it would fail the "just use > itertools.takewhile or a custom iterator, that use case isn't common > enough to justify dedicated syntax". The conceptual basis of Python's > comprehensions in mathematical set notation would likely also play a > part in rejecting an addition that requires an inherently procedural > interpretation. > > Cheers, > Nick. > > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ncoghlan at gmail.com Tue Jan 29 12:53:00 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 29 Jan 2013 21:53:00 +1000 Subject: [Python-ideas] while conditional in list comprehension ?? In-Reply-To: <97262C84-D345-44FC-9A10-BD3D07023D6F@umbrellacode.com> References: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de> <51072650.5090808@pearwood.info> <97262C84-D345-44FC-9A10-BD3D07023D6F@umbrellacode.com> Message-ID: On Tue, Jan 29, 2013 at 8:59 PM, Shane Green wrote: > Unfortunately "else break" also kind of falls flat on its face when you > consider it's being used in context of an expression. Not really, since comprehensions are all about providing expression forms of the equivalent statements. I'm not saying "else break" would get approved (I actually don't think that's likely for other reasons), just that it isn't clearly dead in the water due to the inconsistency with the statement semantics (which is the core problem with the "while" suggestion). Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From shane at umbrellacode.com Tue Jan 29 12:54:09 2013 From: shane at umbrellacode.com (Shane Green) Date: Tue, 29 Jan 2013 03:54:09 -0800 Subject: [Python-ideas] csv.DictReader could handle headers more intelligently. In-Reply-To: <201301291139.28128.mark.hackett@metoffice.gov.uk> References: <1358903168.4767.4.camel@webb> <201301291139.28128.mark.hackett@metoffice.gov.uk> Message-ID: <4FE11280-A0C2-4485-82A5-C8057145B61B@umbrellacode.com> > And funky CSV formats don't make the current version not work for anyone. It > works for the people it's been working for all along. Why stop that? Agreed: I'm actually not for changing the existing stuff. I don't think something that used to return single values should start returning lists, and if it's going to start raising exceptions, I think that should be an option you enable explicitly. I think maybe this should be deprecated, in favor of something that implements what we're discussing. I'm also realizing that way of thinking means it's slightly off topic, and apologize for that ;-) Shane Green www.umbrellacode.com 408-692-4666 | shane at umbrellacode.com On Jan 29, 2013, at 3:39 AM, Mark Hackett wrote: > On Tuesday 29 Jan 2013, Alexandre Zani wrote: >> >> As for a MultiDictReader, I don't think this is superior to csv.reader. In >> both cases, you need to keep track of the column orders. And if you already >> know the column order, you might as well just manually specify the field >> names in DictReader. >> > > But it would allow you to access the index by name. > > value = csv_array[indices["Total Cost"]] > > A little more verbose than > > value = csv_dict["Total Cost"] > > But it's easier to read what it's doing than > > value = csv_array[3] > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas -------------- next part -------------- An HTML attachment was scrubbed... URL: From wolfgang.maier at biologie.uni-freiburg.de Tue Jan 29 13:03:49 2013 From: wolfgang.maier at biologie.uni-freiburg.de (Wolfgang Maier) Date: Tue, 29 Jan 2013 12:03:49 +0000 (UTC) Subject: [Python-ideas] while conditional in list comprehension ?? References: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de> <51072650.5090808@pearwood.info> Message-ID: > Nick Coghlan writes: > > > On Tue, Jan 29, 2013 at 11:30 AM, Steven D'Aprano wrote: > > Why would it translate that way? That would be a silly decision to make. 
> > Python can decide on the semantics of a while clause in a comprehension in > > whatever way makes the most sense, not necessarily according to some > > mechanical, nonsensical translation. > > Terry is correct: comprehensions are deliberately designed to have the > exact same looping semantics as the equivalent statements flattened > out into a single line, with the innermost expression lifted out of > the loop body and placed in front. This then works to arbitrarily deep > nesting levels. The surrounding syntax (parentheses, brackets, braces, > and whether or not there is a colon present in the main expression) > then governs what kind of result you get (generator-iterator, list, > set, dict). > > For example in: > > (x, y, z for x in a if x for y in b if y for z in c if z) > [x, y, z for x in a if x for y in b if y for z in c if z] > {x, y, z for x in a if x for y in b if y for z in c if z} > {x: y, z for x in a if x for y in b if y for z in c if z} > > The looping semantics of these expressions are all completely defined > by the equivalent statements: > > for x in a: > if x: > for y in b: > if y: > for z in c: > if z: > > (modulo a few name lookup quirks if you're playing with class scopes) > > Any attempt to change that fundamental equivalence between > comprehensions and the corresponding statements has basically zero > chance of getting accepted through the PEP process. > > The only remotely plausible proposal I've seen in this thread is the > "else break" on the filter conditions, because that *can* be mapped > directly to the statement form in order to accurately describe the > intended semantics. However, it would fail the "just use > itertools.takewhile or a custom iterator, that use case isn't common > enough to justify dedicated syntax". The conceptual basis of Python's > comprehensions in mathematical set notation would likely also play a > part in rejecting an addition that requires an inherently procedural > interpretation. > > Cheers, > Nick. > Thanks Nick, that is really helpful, as I can now see where the problem really lies for the developer team. I agree that under these circumstances my suggestion is unacceptable. You know, I am just a Python user, and I don't know about your development paradigms. Knowing about them, let me make a wild suggestion (and I am sure it has no chance of getting accepted either, it's more of a test to see if I understood the problem): You could introduce a new 'breakif <condition>' statement, which would be equivalent to 'if <condition>: break'. Its use as a standalone statement could be allowed (but since its equivalent is already very simple it would be a very minor change). In addition, however, the 'breakif' could be integrated into comprehensions just like 'if', and could be translated directly into loops of any nesting level without ambiguities. Another note: in light of your explanation, it looks like the earlier suggestion of 'else break' would also work without ambiguities since with the rigid logic applied, there would be no doubt which of several 'for' loops gets broken by the 'break'. Thanks for any comments on this (and please :), don't yell at me for asking for a new keyword to achieve something minor, I already understood that part). Best, Wolfgang From mark.hackett at metoffice.gov.uk Tue Jan 29 13:07:16 2013 From: mark.hackett at metoffice.gov.uk (Mark Hackett) Date: Tue, 29 Jan 2013 12:07:16 +0000 Subject: [Python-ideas] while conditional in list comprehension ?? 
In-Reply-To: References: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de> Message-ID: <201301291207.16626.mark.hackett@metoffice.gov.uk> On Tuesday 29 Jan 2013, Wolfgang Maier wrote: > > Another note: in light of your explanation, it looks like the earlier > suggestion of 'else break' would also work without ambiguities since with > the rigid logic applied, there would be no doubt which of several 'for' > loops gets broken by the 'break'. > Deeply nested loops that you want to break out of (more than just the current loop) are why goto is still warranted in a language. Rules are there to make you THINK before you break them! From steve at pearwood.info Tue Jan 29 13:09:47 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 29 Jan 2013 23:09:47 +1100 Subject: [Python-ideas] while conditional in list comprehension ?? In-Reply-To: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de> References: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de> Message-ID: <5107BC0B.8090808@pearwood.info> On 29/01/13 00:33, Wolfgang Maier wrote: > Dear all, > I guess this is so obvious that someone must have suggested it before: > in list comprehensions you can currently exclude items based on the if > conditional, e.g.: > > [n for n in range(1,1000) if n % 4 == 0] > > Why not extend this filtering by allowing a while statement in addition to > if, as in: > > [n for n in range(1,1000) while n< 400] Comprehensions in Clojure have this feature.

http://clojuredocs.org/clojure_core/clojure.core/for

;; :when continues through the collection even if some have the
;; condition evaluate to false, like filter
user=> (for [x (range 3 33 2) :when (prime? x)] x)
(3 5 7 11 13 17 19 23 29 31)

;; :while stops at the first collection element that evaluates to
;; false, like take-while
user=> (for [x (range 3 33 2) :while (prime? x)] x)
(3 5 7)

So there is precedent in at least one other language for this obvious and useful feature. -- Steven From ncoghlan at gmail.com Tue Jan 29 13:15:30 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 29 Jan 2013 22:15:30 +1000 Subject: [Python-ideas] while conditional in list comprehension ?? In-Reply-To: References: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de> <51072650.5090808@pearwood.info> Message-ID: On Tue, Jan 29, 2013 at 10:03 PM, Wolfgang Maier wrote: > Thanks for any comments on this (and please :), don't yell at me for asking for > a new keyword to achieve something minor, I already understood that part). I try not to do that - the judgement calls we have to make in designing the language don't always have obvious solutions, and part of the reason python-ideas exists is as a place for people to share ideas that turn out to be questionable, for the sake of uncovering those ideas that turn out to be worthwhile. I've had several proposals make their way into Python over the years, but they're still outnumbered by the ones which didn't make it (many because I decided not to propose them in the first place, but quite a few others because people on python-ideas and python-dev pointed out flaws, drawbacks and inconsistencies that I had missed). Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From steve at pearwood.info Tue Jan 29 13:26:13 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 29 Jan 2013 23:26:13 +1100 Subject: [Python-ideas] csv.DictReader could handle headers more intelligently. 
In-Reply-To: <201301281745.16485.mark.hackett@metoffice.gov.uk> References: <1358903168.4767.4.camel@webb> <51069F08.8070000@stoneleaf.us> <5106B4C7.3090803@mrabarnett.plus.com> <201301281745.16485.mark.hackett@metoffice.gov.uk> Message-ID: <5107BFE5.6010800@pearwood.info> On 29/01/13 04:45, Mark Hackett wrote: > On Monday 28 Jan 2013, MRAB wrote: >> It shouldn't silently drop the columns >> > > Why not? > > It's adding to a dictionary and adding a duplicate key replaces the earlier > one. Then adding to a dictionary was a mistake. The choice of a dict is *implementation*, not *interface*. The interface needed is to return a mapping of column names to values. The nature of that mapping is an implementation detail, and dict is only the simplest solution, not necessarily the correct solution. There is nothing about CSV files that implies that the right behaviour is to drop columns. The nature of CSV files is to allow duplicate column names, and so CSV readers should too. That implies that using a dict, which silently drops duplicate keys, was the wrong choice. We might argue that using duplicate column names is stupid, but CSV supports it, and so should CSV readers. > If it dropped the columns and shouldn't have, then the results will be seen to > be wrong anyway, so there's not a huge amount of need for this. You cannot assume that the caller knows that there are duplicated column names. That's why dropping columns is problematic: it *silently* drops them, giving the caller no idea that it has happened. Given that DictReader already exists, and that there probably is someone out there who is relying on it silently eating columns, I think that the only reasonable way forward is to add a new reader that supports multiple columns with the same name. The caller can then use whichever reader suits their use-case:

* I don't care about duplicate-name columns, just give me some arbitrary one;
  - use DictReader

* I want all of the duplicate-name columns;
  - use MultiDictReader

* I want some of the duplicate-name columns;
  - use MultiDictReader, and then filter the results as you get them

(When I put it like that, DictReader sounds even less useful. But as I said, I daresay *somebody* is relying on it right now, so we can't change it.) > And why, really, are there duplicate column names in there anyway? You can > come up with the assertion that this might be wanted, but they're not normally > what you see in a csv file. > > I've never seen nor used a csv file that duplicated column names other than > being blank. Well there you go. That is exactly one such example of duplicate column names. -- Steven From mark.hackett at metoffice.gov.uk Tue Jan 29 13:30:49 2013 From: mark.hackett at metoffice.gov.uk (Mark Hackett) Date: Tue, 29 Jan 2013 12:30:49 +0000 Subject: [Python-ideas] csv.DictReader could handle headers more intelligently. In-Reply-To: <5107BFE5.6010800@pearwood.info> References: <1358903168.4767.4.camel@webb> <201301281745.16485.mark.hackett@metoffice.gov.uk> <5107BFE5.6010800@pearwood.info> Message-ID: <201301291230.49247.mark.hackett@metoffice.gov.uk> On Tuesday 29 Jan 2013, Steven D'Aprano wrote: > On 29/01/13 23:35, Mark Hackett wrote: > > On Tuesday 29 Jan 2013, Steven D'Aprano wrote: > >>> If it dropped the columns and shouldn't have, then the results will be > >>> seen to be wrong anyway, so there's not a huge amount of need for this. > >> > >> You cannot assume that the caller knows that there are duplicated column > >> names > > > > You cannot assume they wanted them as a list. > > > > It's adding to a dictionary and adding a duplicate key replaces the > > earlier one. > > Then adding to a dictionary was a mistake. > I agree. So don't use DictReader in that case. 
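Rolling your own is only a few lines with plain csv.reader. A rough, untested sketch (the file name and the "Total Cost" column are made up for illustration; note that a duplicated header name simply keeps the last index here, mirroring the dict behaviour under discussion):

import csv

with open('data.csv', 'rb') as f:  # 'rb' for the Python 2 csv module
    reader = csv.reader(f)
    headers = next(reader)
    # Map header name -> column index; a duplicate name overwrites the earlier index.
    indices = dict((name, i) for i, name in enumerate(headers))
    for row in reader:
        value = row[indices["Total Cost"]]  # access a field by header name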
We have Oscar with the method to do your own (and it looked fairly simple and straightforward). Chris with carefuldictreader. Shane with his dual-retention object. From mark.hackett at metoffice.gov.uk Tue Jan 29 13:35:01 2013 From: mark.hackett at metoffice.gov.uk (Mark Hackett) Date: Tue, 29 Jan 2013 12:35:01 +0000 Subject: [Python-ideas] csv.DictReader could handle headers more intelligently. In-Reply-To: <5107BFE5.6010800@pearwood.info> References: <1358903168.4767.4.camel@webb> <201301281745.16485.mark.hackett@metoffice.gov.uk> <5107BFE5.6010800@pearwood.info> Message-ID: <201301291235.01513.mark.hackett@metoffice.gov.uk> On Tuesday 29 Jan 2013, Steven D'Aprano wrote: > > If it dropped the columns and shouldn't have, then the results will be > > seen to be wrong anyway, so there's not a huge amount of need for this. > > You cannot assume that the caller knows that there are duplicated column > names > You cannot assume they wanted them as a list. You cannot assume that duplicate replacement is what they want. If someone is using a csv file with header names they have never read, how are they going to use the data? They won't even know the name to access the value in the dictionary! So I discard the claim that the caller may not know the column names are duplicated. They have to know what the headers are to use DictReader. From jsbueno at python.org.br Tue Jan 29 13:35:22 2013 From: jsbueno at python.org.br (Joao S. O. Bueno) Date: Tue, 29 Jan 2013 10:35:22 -0200 Subject: [Python-ideas] while conditional in list comprehension ?? In-Reply-To: References: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de> <51072650.5090808@pearwood.info> <97262C84-D345-44FC-9A10-BD3D07023D6F@umbrellacode.com> Message-ID: On 29 January 2013 09:51, yoav glazner wrote: > Here is very similar version that works (tested on python27) >>>> def stop(): > next(iter([])) > >>>> list((i if i<50 else stop()) for i in range(100)) > [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, > 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, > 40, 41, 42, 43, 44, 45, 46, 47, 48, 49] Great. I think this nails it. It is exactly the intended behavior, and very readable under current language capabilities. One does not have to stop and go read what "itertools.takewhile" does, and mentally unfold the lambda guard expression - that is what makes this (and the O.P. request) more readable than using takewhile. Note: stop can also just explicitly raise StopIteration - or your next(iter([])) expression can be inlined within the generator. It works in Python 3 as well - though for those who did not test: it won't work for list, dict or set comprehensions - just for generator expressions. js -><- From steve at pearwood.info Tue Jan 29 14:08:03 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Wed, 30 Jan 2013 00:08:03 +1100 Subject: [Python-ideas] while conditional in list comprehension ?? In-Reply-To: References: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de> <51072650.5090808@pearwood.info> Message-ID: <5107C9B3.2010608@pearwood.info> On 29/01/13 21:44, Nick Coghlan wrote: > On Tue, Jan 29, 2013 at 11:30 AM, Steven D'Aprano wrote: >> Why would it translate that way? That would be a silly decision to make. >> Python can decide on the semantics of a while clause in a comprehension in >> whatever way makes the most sense, not necessarily according to some >> mechanical, nonsensical translation. 
> > Terry is correct: comprehensions are deliberately designed to have the > exact same looping semantics as the equivalent statements flattened > out into a single line, with the innermost expression lifted out of > the loop body and placed in front. You have inadvertently supported the point I am trying to make: what is *deliberately designed* by people one way can be deliberately designed another way instead. List comps have the form, and limitations, they have because of people's decisions. People could decide differently. A while clause in a comprehension can map to the same statement form as currently used. Just because the parser sees "while" inside a comprehension doesn't mean that the underlying implementation has to literally insert a while loop inside a for-loop. Terry is right about one thing: that would lead to an entirely pointless infinite loop. Where Terry gets it wrong is to suppose that the only *conceivable* way to handle syntax that looks like [x for x in seq while condition] is to insert a while loop inside a for loop. But "while" is just a convenient keyword that looks good, is readable, and has a natural interpretation as executable pseudo-code. We could invent a new keyword if we wished, say "jabberwock", and treat "jabberwock cond" inside a comprehension as equivalent to "if cond else break":

(x, y for x in a jabberwock x for y in b jabberwock y)

for x in a:
    if x:
        for y in b:
            if y:
                yield (x, y)
            else:
                break
    else:
        break

If you, as a core developer, tell me that in practice this would be exceedingly hard for the CPython implementation to do, I can only trust your opinion since I am not qualified to argue. But since you've already allowed that permitting "if cond else break" in comprehensions would be possible, I find it rather difficult to believe that spelling it "jabberwock cond" is not. > The only remotely plausible proposal I've seen in this thread is the > "else break" on the filter conditions, Which just begs for confusion and misunderstanding. Just wait until people start asking why they can't write "else some_expression", and we have to explain that inside a comprehension, the only thing allowed to follow "else" is "break". -- Steven From shane at umbrellacode.com Tue Jan 29 14:08:25 2013 From: shane at umbrellacode.com (Shane Green) Date: Tue, 29 Jan 2013 05:08:25 -0800 Subject: [Python-ideas] csv.DictReader could handle headers more intelligently. In-Reply-To: <201301291235.01513.mark.hackett@metoffice.gov.uk> References: <1358903168.4767.4.camel@webb> <201301281745.16485.mark.hackett@metoffice.gov.uk> <5107BFE5.6010800@pearwood.info> <201301291235.01513.mark.hackett@metoffice.gov.uk> Message-ID: Let's remove the assumptions about their information by retaining all of it, and make an assumption that everyone is capable of dealing with lists. Shane Green www.umbrellacode.com 408-692-4666 | shane at umbrellacode.com On Jan 29, 2013, at 4:35 AM, Mark Hackett wrote: > On Tuesday 29 Jan 2013, Steven D'Aprano wrote: >>> If it dropped the columns and shouldn't have, then the results will be >>> seen to be wrong anyway, so there's not a huge amount of need for this. >> >> You cannot assume that the caller knows that there are duplicated column >> names >> > > You cannot assume they wanted them as a list. > > You cannot assume that duplicate replacement is what they want. > > If someone is using a csv file with header names they have never read, how are > they going to use the data? They won't even know the name to access the value > in the dictionary! So I discard the claim that the caller may not know the > column names are duplicated. They have to know what the headers are to use > DictReader. > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas -------------- next part -------------- An HTML attachment was scrubbed... URL: From shibturn at gmail.com Tue Jan 29 14:18:44 2013 From: shibturn at gmail.com (Richard Oudkerk) Date: Tue, 29 Jan 2013 13:18:44 +0000 Subject: [Python-ideas] Interrupting threads In-Reply-To: <20130129105443.2804520b@pitrou.net> References: <51049915.3060808@mrabarnett.plus.com> <5106B372.5040803@mrabarnett.plus.com> <20130129105443.2804520b@pitrou.net> Message-ID: On 29/01/2013 9:54am, Antoine Pitrou wrote: > Of course, I sympathize with native English speakers who are annoyed > by the prevalence of Globish over real English. That said, Python > already mandates American English instead of British English. Is Future.cancelled() an acceptable American spelling? -- Richard From solipsis at pitrou.net Tue Jan 29 14:25:05 2013 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 29 Jan 2013 14:25:05 +0100 Subject: [Python-ideas] Interrupting threads References: <51049915.3060808@mrabarnett.plus.com> <5106B372.5040803@mrabarnett.plus.com> <20130129105443.2804520b@pitrou.net> Message-ID: <20130129142505.285bdc23@pitrou.net> Le Tue, 29 Jan 2013 13:18:44 +0000, Richard Oudkerk a écrit : > On 29/01/2013 9:54am, Antoine Pitrou wrote: > > Of course, I sympathize with native English speakers who are annoyed > > by the prevalence of Globish over real English. That said, Python > > already mandates American English instead of British English. > > Is Future.cancelled() an acceptable American spelling? You shouldn't ask me. The only thing I can tell you is that it's not acceptable French :-) Regards Antoine. From steve at pearwood.info Tue Jan 29 14:28:19 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Wed, 30 Jan 2013 00:28:19 +1100 Subject: [Python-ideas] csv.DictReader could handle headers more intelligently. In-Reply-To: <201301291235.01513.mark.hackett@metoffice.gov.uk> References: <1358903168.4767.4.camel@webb> <201301281745.16485.mark.hackett@metoffice.gov.uk> <5107BFE5.6010800@pearwood.info> <201301291235.01513.mark.hackett@metoffice.gov.uk> Message-ID: <5107CE73.8070209@pearwood.info> On 29/01/13 23:35, Mark Hackett wrote: > On Tuesday 29 Jan 2013, Steven D'Aprano wrote: >>> If it dropped the columns and shouldn't have, then the results will be >>> seen to be wrong anyway, so there's not a huge amount of need for this. >> >> You cannot assume that the caller knows that there are duplicated column >> names >> > > You cannot assume they wanted them as a list. I don't need to assume that. They can take the list and post-process it into any data type they want. A list is a natural fit for associating multiple values to a single key, because it doesn't lose data: it is variable-sized, so it can handle "no values" or "1000 values" equally easily; it is ordered, and it is iterable. If the caller wants something else, they can convert it. > You cannot assume that duplicate replacement is what they want. I don't think I ever suggested that it was. > If someone is using a csv file with header names they have never read, how are > they going to use the data? 
So I discard the claim that the caller may not know the > column names are duplicated. They have to know what the headers are to use > DictReader. > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas -------------- next part -------------- An HTML attachment was scrubbed... URL: From shibturn at gmail.com Tue Jan 29 14:18:44 2013 From: shibturn at gmail.com (Richard Oudkerk) Date: Tue, 29 Jan 2013 13:18:44 +0000 Subject: [Python-ideas] Interrupting threads In-Reply-To: <20130129105443.2804520b@pitrou.net> References: <51049915.3060808@mrabarnett.plus.com> <5106B372.5040803@mrabarnett.plus.com> <20130129105443.2804520b@pitrou.net> Message-ID: On 29/01/2013 9:54am, Antoine Pitrou wrote: > Of course, I sympathize with native English speakers who are annoyed > by the prevalence of Globish over real English. That said, Python > already mandates American English instead of British English. Is Future.cancelled() an acceptable American spelling? -- Richard From solipsis at pitrou.net Tue Jan 29 14:25:05 2013 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 29 Jan 2013 14:25:05 +0100 Subject: [Python-ideas] Interrupting threads References: <51049915.3060808@mrabarnett.plus.com> <5106B372.5040803@mrabarnett.plus.com> <20130129105443.2804520b@pitrou.net> Message-ID: <20130129142505.285bdc23@pitrou.net> Le Tue, 29 Jan 2013 13:18:44 +0000, Richard Oudkerk a ?crit : > On 29/01/2013 9:54am, Antoine Pitrou wrote: > > Of course, I sympathize with native English speakers who are annoyed > > by the prevalence of Globish over real English. That said, Python > > already mandates American English instead of British English. > > Is Future.cancelled() an acceptable American spelling? You shouldn't ask me. The only thing I can tell you is that it's not acceptable French :-) Regards Antoine. From steve at pearwood.info Tue Jan 29 14:28:19 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Wed, 30 Jan 2013 00:28:19 +1100 Subject: [Python-ideas] csv.DictReader could handle headers more intelligently. In-Reply-To: <201301291235.01513.mark.hackett@metoffice.gov.uk> References: <1358903168.4767.4.camel@webb> <201301281745.16485.mark.hackett@metoffice.gov.uk> <5107BFE5.6010800@pearwood.info> <201301291235.01513.mark.hackett@metoffice.gov.uk> Message-ID: <5107CE73.8070209@pearwood.info> On 29/01/13 23:35, Mark Hackett wrote: > On Tuesday 29 Jan 2013, Steven D'Aprano wrote: >>> If it dropped the columns and shouldn't have, then the results will be >>> seen to be wrong anyway, so there's not a huge amount of need for this. >> >> You cannot assume that the caller knows that there are duplicated column >> names >> > > You cannot assume they wanted them as a list. I don't need to assume that. They can take the list and post-process it into any data type they want. A list is a natural fit for associating multiple values to a single key, because it doesn't lose data: it is variable-sized, so it can handle "no values" or "1000 values" equally easily; it is ordered, and it is iterable. If the caller wants something else, they can convert it. > You cannot assume that duplicate replacement is what they want. I don't think I ever suggested that it was. > If someone is using a csv file with header names they have never read, how are > they going to use the data? 
reader = csv.DictReader(whatever) for mapping in reader: for key, value in mapping.items(): process(key, value) Or perhaps you only care about one column, and don't care about the other, unknown, columns: for mapping in reader: value = mapping.get('spam', 'some default') process(value) > They won't even know the name to access the value in the dictionary! Dealing with arbitrary field names in data you read from a file is not hard. -- Steven From ncoghlan at gmail.com Tue Jan 29 14:35:56 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 29 Jan 2013 23:35:56 +1000 Subject: [Python-ideas] while conditional in list comprehension ?? In-Reply-To: <5107C9B3.2010608@pearwood.info> References: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de> <51072650.5090808@pearwood.info> <5107C9B3.2010608@pearwood.info> Message-ID: On Tue, Jan 29, 2013 at 11:08 PM, Steven D'Aprano wrote: > On 29/01/13 21:44, Nick Coghlan wrote: >> >> On Tue, Jan 29, 2013 at 11:30 AM, Steven D'Aprano >> wrote: >>> >>> Why would it translate that way? That would be a silly decision to make. >>> Python can decide on the semantics of a while clause in a comprehension >>> in >>> whatever way makes the most sense, not necessarily according to some >>> mechanical, nonsensical translation. >> >> >> Terry is correct: comprehensions are deliberately designed to have the >> exact same looping semantics as the equivalent statements flattened >> out into a single line, with the innermost expression lifted out of >> the loop body and placed in front. > > > > You have inadvertently supported the point I am trying to make: what is > *deliberately designed* by people one way can be deliberately designed > another way instead. List comps have the form, and limitations, they > have because of people's decisions. People could decide differently. "People" could. I'm telling you *we* (as in python-dev) won't. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From mark.hackett at metoffice.gov.uk Tue Jan 29 14:44:35 2013 From: mark.hackett at metoffice.gov.uk (Mark Hackett) Date: Tue, 29 Jan 2013 13:44:35 +0000 Subject: [Python-ideas] csv.DictReader could handle headers more intelligently. In-Reply-To: <5107CE73.8070209@pearwood.info> References: <1358903168.4767.4.camel@webb> <201301291235.01513.mark.hackett@metoffice.gov.uk> <5107CE73.8070209@pearwood.info> Message-ID: <201301291344.35342.mark.hackett@metoffice.gov.uk> On Tuesday 29 Jan 2013, Steven D'Aprano wrote: > On 29/01/13 23:35, Mark Hackett wrote: > > On Tuesday 29 Jan 2013, Steven D'Aprano wrote: > >>> If it dropped the columns and shouldn't have, then the results will be > >>> seen to be wrong anyway, so there's not a huge amount of need for this. > >> > >> You cannot assume that the caller knows that there are duplicated column > >> names > > > > You cannot assume they wanted them as a list. > > I don't need to assume that. They can take the list and post-process it > into any data type they want. Yes you ARE assuming it. You want them to post process it. But if they don't know there are duplicates there and have found their script works for their needs and therefore never looked, they will now get the wrong answer. As Oscar says, they could process the csv file themselves by hand and code in EXACTLY what they want. They don't have to put it in a dictionary then. And you've already said > Then adding to a dictionary was a mistake. So they shouldn't be using DictReader. 
From shane at umbrellacode.com Tue Jan 29 14:45:25 2013 From: shane at umbrellacode.com (Shane Green) Date: Tue, 29 Jan 2013 05:45:25 -0800 Subject: [Python-ideas] csv.DictReader could handle headers more intelligently. In-Reply-To: <201301291310.48404.mark.hackett@metoffice.gov.uk> References: <1358903168.4767.4.camel@webb> <201301291235.01513.mark.hackett@metoffice.gov.uk> <201301291310.48404.mark.hackett@metoffice.gov.uk> Message-ID: <48511CC7-69B5-4FDE-98C9-07765FCEBAAE@umbrellacode.com> On Jan 29, 2013, at 5:10 AM, Mark Hackett wrote: > On Tuesday 29 Jan 2013, you wrote: >> Let's remove the assumptions about their information by retaining all of >> it, and make an assumption that everyone is capable of dealing with lists. >> > > Then lets not use a dictionary. And leave the DictReader alone. > Yes, I think a more useful CSV construct would map header names to lists of values, provide access to original header and value sequences, and methods for iterating sequential (header,value) items (with possibly repeating header values, and which could be fed to dict() to produce exactly what DictReader produces), As such, it would not be a DictReader because it would produce something that just extended the dictionary API. I would think something like CSVRecord, or just Record, would be more accurate. From bborcic at gmail.com Tue Jan 29 14:53:29 2013 From: bborcic at gmail.com (Boris Borcic) Date: Tue, 29 Jan 2013 14:53:29 +0100 Subject: [Python-ideas] while conditional in list comprehension ?? In-Reply-To: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de> References: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de> Message-ID: |>>> def notyet(cond) : if cond : raise StopIteration return True |>>> list(x for x in range(100) if notyet(x>10)) [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10] From shane at umbrellacode.com Tue Jan 29 15:09:12 2013 From: shane at umbrellacode.com (Shane Green) Date: Tue, 29 Jan 2013 06:09:12 -0800 Subject: [Python-ideas] csv.DictReader could handle headers more intelligently. In-Reply-To: <201301291344.35342.mark.hackett@metoffice.gov.uk> References: <1358903168.4767.4.camel@webb> <201301291235.01513.mark.hackett@metoffice.gov.uk> <5107CE73.8070209@pearwood.info> <201301291344.35342.mark.hackett@metoffice.gov.uk> Message-ID: I'm not sure this is constructive. I think it's safe to assume changing something in an API that used to return single values, into something that now returns lists of those values, will be a problem for folks. I also think it's safe to assume folks can design their applications for an API that returns lists of values. In support of this assumption, I will point out that's precisely what CGI's FieldStorage does to represent all HTML form values because some form values (radio buttons, checkboxes, etc.), can have more than one value associated with their name on submission. Finally, I would assert that the more legally formatted content your content reader accurately reads and handles, the better. Shane Green www.umbrellacode.com 408-692-4666 | shane at umbrellacode.com On Jan 29, 2013, at 5:44 AM, Mark Hackett wrote: > On Tuesday 29 Jan 2013, Steven D'Aprano wrote: >> On 29/01/13 23:35, Mark Hackett wrote: >>> On Tuesday 29 Jan 2013, Steven D'Aprano wrote: >>>>> If it dropped the columns and shouldn't have, then the results will be >>>>> seen to be wrong anyway, so there's not a huge amount of need for this. 
>>>> >>>> You cannot assume that the caller knows that there are duplicated column >>>> names >>> >>> You cannot assume they wanted them as a list. >> >> I don't need to assume that. They can take the list and post-process it >> into any data type they want. > > Yes you ARE assuming it. You want them to post-process it. But if they don't > know there are duplicates there and have found their script works for their > needs and therefore never looked, they will now get the wrong answer. > > As Oscar says, they could process the csv file themselves by hand and code in > EXACTLY what they want. They don't have to put it in a dictionary then. > > And you've already said > >> Then adding to a dictionary was a mistake. > > So they shouldn't be using DictReader. > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas -------------- next part -------------- An HTML attachment was scrubbed... URL: From shane at umbrellacode.com Tue Jan 29 15:13:22 2013 From: shane at umbrellacode.com (Shane Green) Date: Tue, 29 Jan 2013 06:13:22 -0800 Subject: [Python-ideas] while conditional in list comprehension ?? In-Reply-To: References: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de> Message-ID: <6B3751EA-339A-4A7F-88D9-545B56AE675A@umbrellacode.com> How funny? I tried a variation of that because one of my original thoughts had been "[... if x else raise StopIteration()]" may have also made some sense. But I tried it based on the example from earlier, and hadn't even considered it was even closer... Shane Green www.umbrellacode.com 408-692-4666 | shane at umbrellacode.com On Jan 29, 2013, at 5:53 AM, Boris Borcic wrote: > |>>> def notyet(cond) : > if cond : > raise StopIteration > return True > > |>>> list(x for x in range(100) if notyet(x>10)) > [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10] > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas -------------- next part -------------- An HTML attachment was scrubbed... URL: From shane at umbrellacode.com Tue Jan 29 15:16:01 2013 From: shane at umbrellacode.com (Shane Green) Date: Tue, 29 Jan 2013 06:16:01 -0800 Subject: [Python-ideas] while conditional in list comprehension ?? In-Reply-To: <6B3751EA-339A-4A7F-88D9-545B56AE675A@umbrellacode.com> References: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de> <6B3751EA-339A-4A7F-88D9-545B56AE675A@umbrellacode.com> Message-ID: <368FE5D3-4628-406F-9AA1-F93238AF1FF4@umbrellacode.com> And, stupidly, I didn't put it in a generator... doh! Shane Green www.umbrellacode.com 408-692-4666 | shane at umbrellacode.com On Jan 29, 2013, at 6:13 AM, Shane Green wrote: > How funny? I tried a variation of that because one of my original thoughts had been "[... if x else raise StopIteration()]" may have also made some sense. But I tried it based on the example from earlier, and hadn't even considered it was even closer... 
> > > > > Shane Green > www.umbrellacode.com > 408-692-4666 | shane at umbrellacode.com > > On Jan 29, 2013, at 5:53 AM, Boris Borcic wrote: > >> |>>> def notyet(cond) : >> if cond : >> raise StopIteration >> return True >> >> |>>> list(x for x in range(100) if notyet(x>10)) >> [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10] >> >> >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> http://mail.python.org/mailman/listinfo/python-ideas > -------------- next part -------------- An HTML attachment was scrubbed... URL: From wolfgang.maier at biologie.uni-freiburg.de Tue Jan 29 15:32:39 2013 From: wolfgang.maier at biologie.uni-freiburg.de (Wolfgang Maier) Date: Tue, 29 Jan 2013 14:32:39 +0000 (UTC) Subject: [Python-ideas] while conditional in list comprehension ?? References: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de> Message-ID: > Boris Borcic writes: > > > |>>> def notyet(cond) : > if cond : > raise StopIteration > return True > > |>>> list(x for x in range(100) if notyet(x>10)) > [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10] > Are you trying to say you entered that code and it ran? I would be very surprised: if you could simply 'raise StopIteration' within the 'if' clause then there would be no point to the discussion. But as it is, your StopIteration should not be caught by the 'for', but will be raised directly. Did you try running it? From wolfgang.maier at biologie.uni-freiburg.de Tue Jan 29 15:36:40 2013 From: wolfgang.maier at biologie.uni-freiburg.de (Wolfgang Maier) Date: Tue, 29 Jan 2013 14:36:40 +0000 (UTC) Subject: [Python-ideas] while conditional in list comprehension ?? References: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de> Message-ID: > Are you trying to say you entered that code and it ran? > I would be very surprised: if you could simply 'raise StopIteration' within the > 'if' clause then there would be no point to the discussion. > But as it is, your StopIteration should not be caught by the 'for', but will be > raised directly. Did you try running it? Sorry, I missed your enclosing list(), which explains things of course. Cheers, Wolfgang From shane at umbrellacode.com Tue Jan 29 15:45:14 2013 From: shane at umbrellacode.com (Shane Green) Date: Tue, 29 Jan 2013 06:45:14 -0800 Subject: [Python-ideas] while conditional in list comprehension ?? In-Reply-To: References: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de> Message-ID: Here's what I was doing, and it worked when I switched to the generator:

>>> def stop():
...     raise StopIteration()
>>> list(((x if x < 5 else stop()) for x in range(10)))
[0, 1, 2, 3, 4]

Shane Green www.umbrellacode.com 408-692-4666 | shane at umbrellacode.com On Jan 29, 2013, at 6:36 AM, Wolfgang Maier wrote: >> Are you trying to say you entered that code and it ran? >> I would be very surprised: if you could simply 'raise StopIteration' within the >> 'if' clause then there would be no point to the discussion. >> But as it is, your StopIteration should not be caught by the 'for', but will be >> raised directly. Did you try running it? > > Sorry, I missed your enclosing list(), which explains things of course. > Cheers, > Wolfgang > > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From rosuav at gmail.com Tue Jan 29 15:55:09 2013 From: rosuav at gmail.com (Chris Angelico) Date: Wed, 30 Jan 2013 01:55:09 +1100 Subject: [Python-ideas] csv.DictReader could handle headers more intelligently. In-Reply-To: References: <1358903168.4767.4.camel@webb> <201301291235.01513.mark.hackett@metoffice.gov.uk> <5107CE73.8070209@pearwood.info> <201301291344.35342.mark.hackett@metoffice.gov.uk> Message-ID: On Wed, Jan 30, 2013 at 1:09 AM, Shane Green wrote: > I think it's safe to assume changing something in an API that used to return > single values, into something that now returns lists of those values, will > be a problem for folks. > > I also think it's safe to assume folks can design their applications for an > API that returns lists of values. Agreed on both points. A new API that returns lists of everything would be a lot safer than fiddling with the current one. ChrisA From rob.cliffe at btinternet.com Tue Jan 29 16:02:40 2013 From: rob.cliffe at btinternet.com (Rob Cliffe) Date: Tue, 29 Jan 2013 15:02:40 +0000 Subject: [Python-ideas] while conditional in list comprehension ?? In-Reply-To: References: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de> <51072650.5090808@pearwood.info> Message-ID: <5107E490.9070501@btinternet.com> On 29/01/2013 10:44, Nick Coghlan wrote: > Terry is correct: comprehensions are deliberately designed to have the > exact same looping semantics as the equivalent statements flattened > out into a single line, with the innermost expression lifted out of > the loop body and placed in front. This then works to arbitrarily deep > nesting levels. The surrounding syntax (parentheses, brackets, braces, > and whether or not there is a colon present in the main expression) > then governs what kind of result you get (generator-iterator, list, > set, dict). > > For example in: > > (x, y, z for x in a if x for y in b if y for z in c if z) > [x, y, z for x in a if x for y in b if y for z in c if z] > {x, y, z for x in a if x for y in b if y for z in c if z} > {x: y, z for x in a if x for y in b if y for z in c if z} > > The looping semantics of these expressions are all completely defined > by the equivalent statements: > > for x in a: > if x: > for y in b: > if y: > for z in c: > if z: > > (modulo a few name lookup quirks if you're playing with class scopes) > Thanks for spelling this out so clearly. It helps me remember which order to place nested "for"s inside a list comprehension! :-) From wolfgang.maier at biologie.uni-freiburg.de Tue Jan 29 16:24:55 2013 From: wolfgang.maier at biologie.uni-freiburg.de (Wolfgang Maier) Date: Tue, 29 Jan 2013 15:24:55 +0000 (UTC) Subject: [Python-ideas] while conditional in list comprehension ?? References: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de> Message-ID: yoav glazner writes: > > Here is very similar version that works (tested on python27) > >>>> def stop(): > > next(iter([])) > > > >>>> list((i if i<50 else stop()) for i in range(100)) > > [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, > > 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, > > 40, 41, 42, 43, 44, 45, 46, 47, 48, 49] Joao S. O. Bueno writes: > Great. I think this nails it. It is exactly the intended behavior, > and very readable under current language capabilities. > > One does not have to stop and go read what "itertools.takewhile" does, > and mentally unfold the lambda guard expression - that is what makes > this (and the O.P. request) more readable than using takewhile. 
> > Note: stop can also just explictly raise StopIteration - > or your next(iter([])) expression can be inlined within the generator. > > It works in Python 3 as well - though for those who did not test: > it won't work for list, dicr or set comprehensions - just for > generator expressions. > Shane Green writes: > > Here's what I was doing, and worked when i switched to the generator: > > >>> def stop(): > ? raise StopIteration() > > > >>> list(((x if x < 5 else stop()) for x in range(10))) > [0, 1, 2, 3, 4] Wow, thanks to the three of you! I think it's still not as clear what the code does as it would be with my 'while' suggestion. Particularly, the fact that this is not a simple 'if'-or-not decision for individual elements of the list, but in fact terminates the list with the first non-matching element (the while-like property) can easily be overlooked. However, I find it much more appealing to use built-in python semantics than to resort to the also hard to read itertools.takewhile(). In addition, this is also the fastest solution that was brought up so far. In my hands, it runs about 2x as fast as the equivalent takewhile construct, which in turn is just marginally faster than Boris Borcic's suggestion: |>>> def notyet(cond) : if cond : raise StopIteration return True |>>> list(x for x in range(100) if notyet(x>10)) [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10] I guess, I'll use your solution in my code from now on. Best, Wolfgang From oscar.j.benjamin at gmail.com Tue Jan 29 16:25:40 2013 From: oscar.j.benjamin at gmail.com (Oscar Benjamin) Date: Tue, 29 Jan 2013 15:25:40 +0000 Subject: [Python-ideas] while conditional in list comprehension ?? In-Reply-To: References: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de> <51072650.5090808@pearwood.info> <97262C84-D345-44FC-9A10-BD3D07023D6F@umbrellacode.com> Message-ID: On 29 January 2013 11:51, yoav glazner wrote: > Here is very similar version that works (tested on python27) >>>> def stop(): > next(iter([])) > >>>> list((i if i<50 else stop()) for i in range(100)) > [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, > 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, > 40, 41, 42, 43, 44, 45, 46, 47, 48, 49] That's a great idea. You could also do: >>> list(i for i in range(100) if i<50 or stop()) It's a shame it doesn't work for list/set/dict comprehensions, though. Oscar From zachary.ware+pyideas at gmail.com Tue Jan 29 16:34:13 2013 From: zachary.ware+pyideas at gmail.com (Zachary Ware) Date: Tue, 29 Jan 2013 09:34:13 -0600 Subject: [Python-ideas] while conditional in list comprehension ?? In-Reply-To: References: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de> <51072650.5090808@pearwood.info> <97262C84-D345-44FC-9A10-BD3D07023D6F@umbrellacode.com> Message-ID: On Jan 29, 2013 9:26 AM, "Oscar Benjamin" wrote: > > On 29 January 2013 11:51, yoav glazner wrote: > > Here is very similar version that works (tested on python27) > >>>> def stop(): > > next(iter([])) > > > >>>> list((i if i<50 else stop()) for i in range(100)) > > [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, > > 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, > > 40, 41, 42, 43, 44, 45, 46, 47, 48, 49] > > That's a great idea. You could also do: > >>> list(i for i in range(100) if i<50 or stop()) > > It's a shame it doesn't work for list/set/dict comprehensions, though. 
>

I know I'm showing my ignorance here, but how are list/dict/set comprehensions and generator expressions implemented differently that one's for loop will catch a StopIteration and the others won't? Would it make sense to reimplement list/dict/set comprehensions as an equivalent generator expression passed to the appropriate constructor, and thereby allow the StopIteration trick to work for each of them as well?

Regards,

Zach Ware
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From wolfgang.maier at biologie.uni-freiburg.de Tue Jan 29 16:44:01 2013
From: wolfgang.maier at biologie.uni-freiburg.de (Wolfgang Maier)
Date: Tue, 29 Jan 2013 15:44:01 +0000 (UTC)
Subject: [Python-ideas] while conditional in list comprehension ??
References: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de> <51072650.5090808@pearwood.info> <97262C84-D345-44FC-9A10-BD3D07023D6F@umbrellacode.com>
Message-ID: 

Oscar Benjamin writes:
>
> On 29 January 2013 11:51, yoav glazner wrote:
> > Here is very similar version that works (tested on python27)
> >>>> def stop():
> >         next(iter([]))
> >
> >>>> list((i if i<50 else stop()) for i in range(100))
> > [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
> > 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39,
> > 40, 41, 42, 43, 44, 45, 46, 47, 48, 49]
>
> That's a great idea. You could also do:
> >>> list(i for i in range(100) if i<50 or stop())
>
> It's a shame it doesn't work for list/set/dict comprehensions, though.
>
> Oscar
> list(i for i in range(100) if i<50 or stop())

Really (!) nice (and 2x as fast as using itertools.takewhile())! With the somewhat simpler (suggested earlier by Shane)

def stop():
    raise StopIteration

this should become part of the python cookbook!!

Thanks a lot for working this out,
Wolfgang

From eliben at gmail.com Tue Jan 29 17:00:07 2013
From: eliben at gmail.com (Eli Bendersky)
Date: Tue, 29 Jan 2013 08:00:07 -0800
Subject: [Python-ideas] constant/enum type in stdlib
In-Reply-To: References: Message-ID: 

On Tue, Jan 29, 2013 at 3:50 AM, Nick Coghlan wrote:

> On Tue, Jan 29, 2013 at 11:50 AM, Joao S. O. Bueno wrote:
> > This idea is not new - but it is stalled -
> > Last I remember it came around in Python-devel in 2010, in this thread:
> > http://mail.python.org/pipermail/python-dev/2010-November/thread.html#105967
>
> FWIW, since that last discussion, I've switched to using strings for
> my special constants, dumping them in a container if I need some kind
> of easy validity checking or iteration.
>
> That said, an enum type may still be useful for interoperability with
> other systems (databases, C APIs, etc).

I really wish there were an enum type in Python that made sense. ISTM this has been raised numerous times, but no one has submitted a good-enough proposal.

Eli
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
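A minimal sketch of the strings-in-a-container style Nick describes above (the names here are purely illustrative, not from any concrete proposal):

    # Plain strings double as constants; a frozenset gives cheap
    # validity checking and iteration, which covers much of what a
    # dedicated enum type would do.
    COLORS = frozenset({'red', 'green', 'blue'})

    def paint(color):
        if color not in COLORS:
            raise ValueError('unknown color: %r' % (color,))
        print('painting in ' + color)

    paint('red')        # fine
    paint('magenta')    # raises ValueError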
From jsbueno at python.org.br Tue Jan 29 17:01:28 2013
From: jsbueno at python.org.br (Joao S. O. Bueno)
Date: Tue, 29 Jan 2013 14:01:28 -0200
Subject: [Python-ideas] while conditional in list comprehension ??
In-Reply-To: References: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de> <51072650.5090808@pearwood.info> <97262C84-D345-44FC-9A10-BD3D07023D6F@umbrellacode.com>
Message-ID: 

On 29 January 2013 13:34, Zachary Ware wrote:
>
> On Jan 29, 2013 9:26 AM, "Oscar Benjamin" wrote:
>>
>> On 29 January 2013 11:51, yoav glazner wrote:
>> > Here is very similar version that works (tested on python27)
>> >>>> def stop():
>> >         next(iter([]))
>> >
>> >>>> list((i if i<50 else stop()) for i in range(100))
>> > [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
>> > 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39,
>> > 40, 41, 42, 43, 44, 45, 46, 47, 48, 49]
>>
>> That's a great idea. You could also do:
>> >>> list(i for i in range(100) if i<50 or stop())
>>
>> It's a shame it doesn't work for list/set/dict comprehensions, though.
>
> I know I'm showing my ignorance here, but how are list/dict/set
> comprehensions and generator expressions implemented differently that one's
> for loop will catch a StopIteration and the others won't? Would it make
> sense to reimplement list/dict/set comprehensions as an equivalent generator
> expression passed to the appropriate constructor, and thereby allow the
> StopIteration trick to work for each of them as well?

That is because list/set/dict comprehensions are sort of "self contained": they expect the StopIteration to be raised by the iterator in the "for" part of the expression. The generator expression, on the other hand, is an iterator in itself, and it is expected to raise a StopIteration at some point. The code put around it to actually execute it will catch that StopIteration - and it won't care whether it was raised by the "for" iterator or by any other expression in the generator.

I mean - when you do

list(bla for bla in blargh)

the generator is exhausted inside the "list" call - and this generator exhaustion is signaled by the StopIteration exception in both cases.

> Regards,
>
> Zach Ware
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas
>
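A minimal sketch of the difference Joao describes, using the stop() helper from earlier in the thread (this reflects the semantics of the Python versions discussed here, where raising StopIteration inside a generator simply ends it):

    def stop():
        raise StopIteration

    # Inside list(...), the StopIteration raised by stop() surfaces from
    # next() and looks like normal exhaustion, so the result is truncated:
    print(list(x for x in range(10) if x < 5 or stop()))  # [0, 1, 2, 3, 4]

    # A list comprehension only catches StopIteration around its own
    # next() call on range(10), so the same exception escapes instead:
    try:
        [x for x in range(10) if x < 5 or stop()]
    except StopIteration:
        print('escaped the comprehension')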
From oscar.j.benjamin at gmail.com Tue Jan 29 17:02:35 2013
From: oscar.j.benjamin at gmail.com (Oscar Benjamin)
Date: Tue, 29 Jan 2013 16:02:35 +0000
Subject: [Python-ideas] while conditional in list comprehension ??
In-Reply-To: References: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de> <51072650.5090808@pearwood.info> <97262C84-D345-44FC-9A10-BD3D07023D6F@umbrellacode.com>
Message-ID: 

On 29 January 2013 15:34, Zachary Ware wrote:
> I know I'm showing my ignorance here, but how are list/dict/set
> comprehensions and generator expressions implemented differently that one's
> for loop will catch a StopIteration and the others won't? Would it make
> sense to reimplement list/dict/set comprehensions as an equivalent generator
> expression passed to the appropriate constructor, and thereby allow the
> StopIteration trick to work for each of them as well?

A for loop is like a while loop with a try/except handler for StopIteration. So the following are roughly equivalent:

# For loop
for x in iterable:
    func1(x)
else:
    func2()

# Equivalent loop
it = iter(iterable)
while True:
    try:
        x = next(it)
    except StopIteration:
        func2()
        break
    func1(x)

A list comprehension is just like an implicit for loop with limited functionality, so it looks like:

# List comp
results = [func1(x) for x in iterable if func2(x)]

# Equivalent loop
results = []
it = iter(iterable)
while True:
    try:
        x = next(it)
    except StopIteration:
        break
    # This part is outside the try/except
    if func2(x):
        results.append(func1(x))

The problem in the above is that we only catch StopIteration around the call to next(). So if either of func1 or func2 raises StopIteration the exception will propagate rather than terminate the loop. (This may mean that it terminates a for loop higher in the call stack - which can lead to confusing bugs - so it's important to always catch StopIteration anywhere it might get raised.)

The difference with the list(generator) version is that func1() and func2() are both called inside the call to next() from the perspective of the list() function. This means that if they raise StopIteration then the try/except handler in the enclosing list function will catch it and terminate its loop.

# list(generator)
results = list(func1(x) for x in iterable if func2(x))

# Equivalent loop:
def list(iterable):
    it = iter(iterable)
    results = []
    while True:
        try:
            # Now func1 and func2 are both called in next() here
            x = next(it)
        except StopIteration:
            break
        results.append(x)
    return results

results_gen = (func1(x) for x in iterable if func2(x))
results = list(results_gen)

Oscar

From jsbueno at python.org.br Tue Jan 29 17:09:20 2013
From: jsbueno at python.org.br (Joao S. O. Bueno)
Date: Tue, 29 Jan 2013 14:09:20 -0200
Subject: [Python-ideas] constant/enum type in stdlib
In-Reply-To: References: Message-ID: 

On 29 January 2013 14:00, Eli Bendersky wrote:
>
> On Tue, Jan 29, 2013 at 3:50 AM, Nick Coghlan wrote:
>>
>> On Tue, Jan 29, 2013 at 11:50 AM, Joao S. O. Bueno wrote:
>> > This idea is not new - but it is stalled -
>> > Last I remember it came around in Python-devel in 2010, in this thread:
>> > http://mail.python.org/pipermail/python-dev/2010-November/thread.html#105967
>>
>> FWIW, since that last discussion, I've switched to using strings for
>> my special constants, dumping them in a container if I need some kind
>> of easy validity checking or iteration.
>>
>> That said, an enum type may still be useful for interoperability with
>> other systems (databases, C APIs, etc).
>
> I really wish there were an enum type in Python that made sense. ISTM this
> has been raised numerous times, but no one has submitted a good-enough
> proposal.

As I pointed above, this last discussion was coming to a good conclusion. Bad timing, and no one clearly saying, with all the words, "Michael Foord, please make this into a PEP", made it fade away, I think.

js
-><-

> Eli

From zachary.ware+pyideas at gmail.com Tue Jan 29 17:23:33 2013
From: zachary.ware+pyideas at gmail.com (Zachary Ware)
Date: Tue, 29 Jan 2013 10:23:33 -0600
Subject: [Python-ideas] while conditional in list comprehension ??
In-Reply-To: References: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de> <51072650.5090808@pearwood.info> <97262C84-D345-44FC-9A10-BD3D07023D6F@umbrellacode.com> Message-ID: On Jan 29, 2013 10:02 AM, "Oscar Benjamin" wrote: > > On 29 January 2013 15:34, Zachary Ware wrote: > > > > On Jan 29, 2013 9:26 AM, "Oscar Benjamin" > > wrote: > >> > >> On 29 January 2013 11:51, yoav glazner wrote: > >> > Here is very similar version that works (tested on python27) > >> >>>> def stop(): > >> > next(iter([])) > >> > > >> >>>> list((i if i<50 else stop()) for i in range(100)) > >> > [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, > >> > 20, > >> > 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, > >> > 39, > >> > 40, 41, 42, 43, 44, 45, 46, 47, 48, 49] > >> > >> That's a great idea. You could also do: > >> >>> list(i for i in range(100) if i<50 or stop()) > >> > >> It's a shame it doesn't work for list/set/dict comprehensions, though. > >> > > > > I know I'm showing my ignorance here, but how are list/dict/set > > comprehensions and generator expressions implemented differently that one's > > for loop will catch a StopIteration and the others won't? Would it make > > sense to reimplement list/dict/set comprehensions as an equivalent generator > > expression passed to the appropriate constructor, and thereby allow the > > StopIteration trick to work for each of them as well? > > A for loop is like a while loop with a try/except handler for > StopIteration. So the following are roughly equivalent: > > # For loop > for x in iterable: > func1(x) > else: > func2() > > # Equivalent loop > it = iter(iterable) > while True: > try: > x = next(it) > except StopIteration: > func2() > break > func1(x) > > A list comprehension is just like an implicit for loop with limited > functionality so it looks like: > > # List comp > results = [func1(x) for x in iterable if func2(x)] > > # Equivalent loop > results = [] > it = iter(iterable) > while True: > try: > x = next(it) > except StopIteration: > break > # This part is outside the try/except > if func2(x): > results.append(func1(x)) > > The problem in the above is that we only catch StopIteration around > the call to next(). So if either of func1 or func2 raises > StopIteration the exception will propagate rather than terminate the > loop. (This may mean that it terminates a for loop higher in the call > stack - which can lead to confusing bugs - so it's important to always > catch StopIteration anywhere it might get raised.) > > The difference with the list(generator) version is that func1() and > func2() are both called inside the call to next() from the perspective > of the list() function. This means that if they raise StopIteration > then the try/except handler in the enclosing list function will catch > it and terminate its loop. > > # list(generator) > results = list(func1(x) for x in iterable if func2(c)) > > # Equivalent loop: > def list(iterable): > it = iter(iterable) > results = [] > while True: > try: > # Now func1 and func2 are both called in next() here > x = next(it) > except StopIteration: > break > results.append(x) > return results > > results_gen = (func1(x) for x in iterable if func2(x)) > results = list(results_gen) > That makes a lot of sense. Thank you, Oscar and Joao, for the explanations. I wasn't thinking in enough scopes :) Regards, Zach Ware -------------- next part -------------- An HTML attachment was scrubbed... 
URL: 

From tjreedy at udel.edu Tue Jan 29 17:41:09 2013
From: tjreedy at udel.edu (Terry Reedy)
Date: Tue, 29 Jan 2013 11:41:09 -0500
Subject: [Python-ideas] Interrupting threads
In-Reply-To: <20130129105443.2804520b@pitrou.net>
References: <51049915.3060808@mrabarnett.plus.com> <5106B372.5040803@mrabarnett.plus.com> <20130129105443.2804520b@pitrou.net>
Message-ID: 

On 1/29/2013 4:54 AM, Antoine Pitrou wrote:
> Le Tue, 29 Jan 2013 08:23:33 +0100,
> Charles-François Natali
> a écrit :
>> - Really, "heed"? I've never had to look up a word in a dictionary
>> while reading a technical book/presentation/discussion before. I may
>> not be particularly good in English, but I'm positive this term will
>> puzzle many non-native speakers...
>
> Ditto here. Now it's not unusual to have to learn new vocabulary, but
> "heed" is obscure and makes an API difficult to understand for me.

As a native American English speaker, 'heed' is not obscure. Heeding (paying close attention to) things such as the warnings in our fine manual may be out of style among gung-ho programmers, but I hope it is not archaic ;-). That said, I can imagine that 'heed' falls below the threshold of usage frequency for words taught abroad.

> Of course, I sympathize with native English speakers who are annoyed
> by the prevalence of Globish over real English. That said, Python
> already mandates American English instead of British English.

Heeding warnings may be 'old-fashioned', but that does not make it British.

--
Terry Jan Reedy

From tjreedy at udel.edu Tue Jan 29 18:28:25 2013
From: tjreedy at udel.edu (Terry Reedy)
Date: Tue, 29 Jan 2013 12:28:25 -0500
Subject: [Python-ideas] Canceled versus cancelled (was Re: Interrupting threads)
In-Reply-To: References: <51049915.3060808@mrabarnett.plus.com> <5106B372.5040803@mrabarnett.plus.com> <20130129105443.2804520b@pitrou.net>
Message-ID: 

On 1/29/2013 8:18 AM, Richard Oudkerk wrote:
> On 29/01/2013 9:54am, Antoine Pitrou wrote:
>> Of course, I sympathize with native English speakers who are annoyed
>> by the prevalence of Globish over real English. That said, Python
>> already mandates American English instead of British English.
>
> Is Future.cancelled() an acceptable American spelling?

Slightly controversial, but 'Yes'. My 1960s Dictionary of the American language gives 'canceled' and 'cancelled'. Ditto for travel. I see the same at modern web sites:
http://www.merriam-webster.com/dictionary/cancel
http://www.thefreedictionary.com/cancel

Both give the one-el version first, and that might indicate a preference. But I was actually taught in school (some decades ago) to double the els of travel and cancel, and have read the rule in various places. I suspect that is not done now. More discussion:

http://www.reference.com/motif/language/cancelled-vs-canceled
http://grammarist.com/spelling/cancel/

The latter has a Google ngram that shows 'canceled' has become more common in the U.S., but only in the last 30 years. It has even crept into British usage.

http://books.google.com/ngrams/graph?content=canceled%2Ccancelled&year_start=1800&year_end=2000&corpus=6&smoothing=3&share=

On the other hand, just about no one, even in the U.S., currently spells 'cancellation' as 'cancelation'. That was tried by a few writers 1910 to 1940, but never caught on. 
http://books.google.com/ngrams/graph?content=cancelation%2Ccancellation&year_start=1800&year_end=2000&corpus=17&smoothing=3&share= -- Terry Jan Reedy From breamoreboy at yahoo.co.uk Tue Jan 29 18:38:50 2013 From: breamoreboy at yahoo.co.uk (Mark Lawrence) Date: Tue, 29 Jan 2013 17:38:50 +0000 Subject: [Python-ideas] csv.DictReader could handle headers more intelligently. In-Reply-To: <201301291230.49247.mark.hackett@metoffice.gov.uk> References: <1358903168.4767.4.camel@webb> <201301281745.16485.mark.hackett@metoffice.gov.uk> <5107BFE5.6010800@pearwood.info> <201301291230.49247.mark.hackett@metoffice.gov.uk> Message-ID: On 29/01/2013 12:30, Mark Hackett wrote: > On Tuesday 29 Jan 2013, Steven D'Aprano wrote: >> On 29/01/13 04:45, Mark Hackett wrote: >>> On Monday 28 Jan 2013, MRAB wrote: >>>> It shouldn't silently drop the columns >>> >>> Why not? >>> >>> It's adding to a dictionary and adding a duplicate key replaces the >>> earlier one. >> >> Then adding to a dictionary was a mistake. >> > > I agree. > > So don't use DictReader in that case. > > We have Oscar with the method to do your own (and looked fairly simple and > straightforward). > Chris with carefuldictreader. > Shane with his dual-retention object. > Please can we also have a RemoveTheNullByteThatsPutAtheEndOfTheFileByBrainDeadMicrosoftMoney? :) -- Cheers. Mark Lawrence From guido at python.org Tue Jan 29 18:40:07 2013 From: guido at python.org (Guido van Rossum) Date: Tue, 29 Jan 2013 09:40:07 -0800 Subject: [Python-ideas] Canceled versus cancelled (was Re: Interrupting threads) In-Reply-To: References: <51049915.3060808@mrabarnett.plus.com> <5106B372.5040803@mrabarnett.plus.com> <20130129105443.2804520b@pitrou.net> Message-ID: This is all pretty pointless given that PEP 3148 uses cancelled() and concurrent.futures.Future has been released since Python 3.2. Introducing single-ell aliases is just going to confuse things more. On Tue, Jan 29, 2013 at 9:28 AM, Terry Reedy wrote: > On 1/29/2013 8:18 AM, Richard Oudkerk wrote: >> >> On 29/01/2013 9:54am, Antoine Pitrou wrote: >>> >>> Of course, I sympathize with native English speakers who are annoyed >>> by the prevalence of Globish over real English. That said, Python >>> already mandates American English instead of British English. >> >> >> Is Future.cancelled() an acceptable American spelling? > > > Slightly controversial, but 'Yes'. My 1960s Dictionary of the American > language gives 'canceled' and 'cancelled'. Ditto for travel. I see the same > at modern web sites: > http://www.merriam-webster.com/dictionary/cancel > http://www.thefreedictionary.com/cancel > > Both give the one el version first, and that might indicate a preference. > But I was actually taught in school (some decades ago) to double the els of > travel and cancel have have read the rule various places. I suspect that is > not done now. More discussion: > > http://www.reference.com/motif/language/cancelled-vs-canceled > http://grammarist.com/spelling/cancel/ > > The latter has a Google ngram that shows 'canceled' has become more common > in the U.S., but only in the last 30 years. It has even crept into British > usage. > > http://books.google.com/ngrams/graph?content=canceled%2Ccancelled&year_start=1800&year_end=2000&corpus=6&smoothing=3&share= > > On the other hand, just about no one, even in the U.S., currently spells > 'cancellation' as 'cancelation'. That was tried by a few writers 1910 to > 1940, but never caught on. 
> > http://books.google.com/ngrams/graph?content=cancelation%2Ccancellation&year_start=1800&year_end=2000&corpus=17&smoothing=3&share= > > -- > Terry Jan Reedy > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas -- --Guido van Rossum (python.org/~guido) From tjreedy at udel.edu Tue Jan 29 19:07:48 2013 From: tjreedy at udel.edu (Terry Reedy) Date: Tue, 29 Jan 2013 13:07:48 -0500 Subject: [Python-ideas] Interrupting threads In-Reply-To: References: <51049915.3060808@mrabarnett.plus.com> <5106B372.5040803@mrabarnett.plus.com> <20130129105443.2804520b@pitrou.net> Message-ID: On 1/29/2013 11:41 AM, Terry Reedy wrote: > On 1/29/2013 4:54 AM, Antoine Pitrou wrote: >> Ditto here. Now it's not unusual to have to learn new vocabulary, but >> "heed" is obscure and makes an API difficult to understand for me. > > As a native American English speaker, 'heed' is not obscure. As I believe you have often said, we need some benchmark numbers. According to Google's Ngram, 'heed' is still about 5 times more common in American books than 'annoy' and 'sympathize', which you use in your next sentence. >> Of course, I sympathize with native English speakers who are annoyed >> by the prevalence of Globish over real English. http://books.google.com/ngrams/graph?content=heed%2Ccancel%2Cannoy%2Csympathize&year_start=1800&year_end=2000&corpus=17&smoothing=3&share= Talking about obscure words, I have not seen 'Globish' before and I had to search to discover that it was not your idiosyncratic coinage. I was really surprised to find that there is even a Wikipedia entry. https://en.wikipedia.org/wiki/Globish -- Terry Jan Reedy From python at mrabarnett.plus.com Tue Jan 29 19:14:25 2013 From: python at mrabarnett.plus.com (MRAB) Date: Tue, 29 Jan 2013 18:14:25 +0000 Subject: [Python-ideas] Interrupting threads In-Reply-To: References: <51049915.3060808@mrabarnett.plus.com> <5106B372.5040803@mrabarnett.plus.com> Message-ID: <51081181.5010001@mrabarnett.plus.com> On 2013-01-29 08:52, Amaury Forgeot d'Arc wrote: > 2013/1/29 Charles-Fran?ois Natali > > > > The point has been made that you don't want an interruption in the > > middle of an exception handling routine. That's true. You also don't > > want an interruption in the middle of a 'finally' block. > > That's a good start :-) > > > But is it feasible? > Is it possible to handle the case where a finally block calls another > Python function? > On entry to a finally block, interruption/cancellation is disabled/suppressed, and it remains disabled until the try statement is exited normally, at which point the original state is restored. However, the code in the finally block or in a function called from the finally block could re-enable it for a section of code with the context manager, but the context manager would save the current (disabled) state on entry and restore it on exit. From guido at python.org Tue Jan 29 19:48:49 2013 From: guido at python.org (Guido van Rossum) Date: Tue, 29 Jan 2013 10:48:49 -0800 Subject: [Python-ideas] libuv based eventloop for tulip experiment In-Reply-To: <51070056.8020006@gmail.com> References: <51070056.8020006@gmail.com> Message-ID: On Mon, Jan 28, 2013 at 2:48 PM, Sa?l Ibarra Corretg? wrote: > Hi all! > > I haven't been able to keep up with all the tulip development on the mailing > list (hopefully I will now!) so please excuse me if something I mention has > already been discussed. Me neither! 
:-) Libuv has been brought up before, though I haven't looked at it in detail. I think you're bringing up good stuff. > For those who may not know it, libuv is the platform layer library for > nodejs, which implements a uniform interface on top of epoll, kqueue, event > ports and iocp. I wrote Python bindings [1] for it a while ago, and I was > very excited to see Tulip, so I thought I'd give this a try. Great to hear! > Here [2] is the source code, along with some notes I took during the > implementation. Hm... I see you just copied all of tulip and then hacked on it for a while. :-) I wonder if you could refactor things so that an app would be able to dynamically choose between tulip's and rose's event loop using tulip's EventLoopPolicy machinery? The app could just instantiate tulip.unix_eventloop._UnixEventLoop() (yes, this should really be renamed!) or rose.uv.EventLoop, but all its imports should come from tulip. Also, there's a refactoring of the event loop classes underway in tulip's iocp branch -- this adds IOCP support on Windows. > I know that the idea is not to re-implement the PEP itself but for people to > create different EventLoop implementations. On rose I bundled tulip just to > make a single package I could play with easily, once tulip makes it to the > stdlib only the EventLoop will remain. It will be a long time before tulip makes it into the stdlib -- but for easy experimentation it should be possible for apps to choose between tulip and rose without having to change all their tulip imports to rose imports. > Here are some thoughts (in no particular order): > > - add_connector / remove_connector seem to be related to Windows, but being > exposed like that feels a bit like leaking an implementation detail. I guess > there was no way around it. They would only be needed if we ever were to support WSAPoll() on Windows, but I'm pretty much decided against that (need to check with Richard Oudkerk once more). Then we can kill add_connector and remove_connector. > - libuv implements a type of handle (Poll) which provides level-triggered > file descriptor polling which also works on Windows, while being highly > performant. It uses something called AFD Polling apparently, which is only > available on Windows >= Vista, and a select thread on XP. I'm no Windows > expert, but thanks to this the API is consistent across all platforms, which > is nice. mAybe it's worth investigating? [3] Again that's probably for Richard to look into. I have no idea how it relates to IOCP. > - The transport abstraction seems quite tight to socket objects. I'm confused to hear you say this, since the APIs for transports and protocols are one of the few places of PEP 3156 where sockets are *not* explicitly mentioned. (Though they are used in the implementations, but I am envisioning alternate implementations that don't use sockets.) > pyuv > provides a TCP and UDP handles, which provide a completion-style API and use > a better approach than Poll handles. So it implements TCP and UDP without socket objects? I actually like this, because it validates my decision to keep socket objects out of the transport/protocol APIs. (Note that PEP 3156 and Tulip currently don't support UDP; it will require a somewhat different API between transports and protocols.) > They should give better performance > since EINTR in handled internally and there are less roundtrips between > Python-land and C-land. Why would EINTR handling be important? That should occur almost never. Or did you mean EAGAIN? 
> Was it ever considered to provide some sort of > abstraction so that transports can be used on top of something other than > regular sockets? For example I see no way to get the remote party from the > transport, without checking the underlying socket. This we are considering in another thread -- there are in fact two proposals on the table, one to add transport methods get_name() and get_peer(), which should return (host, port) pairs if possible, or None if the transport is not talking to an IP connection (or there are too many layers in between to dig out that information). The other proposal is a more generic API to get info out of the transport, e.g. get_extra_info("name") and get_extra_info("peer"), which can be more easily extended (without changing the PEP) to support other things, e.g. certificate info if the transport implements SSL. > Thanks for reading this far and keep up the good work. Thanks for looking at this and reimplementing PEP 3156 on top of libuv! This is exactly the kind of thing I am hoping for. > Regards, > > [1]: https://github.com/saghul/pyuv > [2]: https://github.com/saghul/rose > [3]: https://github.com/joyent/libuv/blob/master/src/win/poll.c > > -- > Sa?l Ibarra Corretg? > http://saghul.net/blog | http://about.me/saghul -- --Guido van Rossum (python.org/~guido) From tjreedy at udel.edu Tue Jan 29 19:53:59 2013 From: tjreedy at udel.edu (Terry Reedy) Date: Tue, 29 Jan 2013 13:53:59 -0500 Subject: [Python-ideas] while conditional in list comprehension ?? In-Reply-To: <5107E490.9070501@btinternet.com> References: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de> <51072650.5090808@pearwood.info> <5107E490.9070501@btinternet.com> Message-ID: On 1/29/2013 10:02 AM, Rob Cliffe wrote: > > On 29/01/2013 10:44, Nick Coghlan wrote: >> Terry is correct: comprehensions are deliberately designed to have the >> exact same looping semantics as the equivalent statements flattened >> out into a single line, with the innermost expression lifted out of >> the loop body and placed in front. This then works to arbitrarily deep >> nesting levels. The surrounding syntax (parentheses, brackets, braces, >> and whether or not there is a colon present in the main expression) >> then governs what kind of result you get (generator-iterator, list, >> set, dict). >> >> For example in: >> >> (x, y, z for x in a if x for y in b if y for z in c if z) >> [x, y, z for x in a if x for y in b if y for z in c if z] >> {x, y, z for x in a if x for y in b if y for z in c if z} >> {x: y, z for x in a if x for y in b if y for z in c if z} >> >> The looping semantics of these expressions are all completely defined >> by the equivalent statements: >> >> for x in a: >> if x: >> for y in b: >> if y: >> for z in c: >> if z: >> >> (modulo a few name lookup quirks if you're playing with class scopes) >> > Thanks for spelling this out so clearly. It helps me remember which > order to place nested "for"s inside a list comprehension! :-) The reference manual does spell it out: "In this case, the elements of the new container are those that would be produced by considering each of the for or if clauses a block, nesting from left to right, and evaluating the expression to produce an element each time the innermost block is reached." Perhaps a non-trivial concrete example (say 4 levels deep) would help people understand that better. -- Terry Jan Reedy From eric at trueblade.com Tue Jan 29 19:49:01 2013 From: eric at trueblade.com (Eric V. 
Smith) Date: Tue, 29 Jan 2013 13:49:01 -0500 Subject: [Python-ideas] csv.DictReader could handle headers more intelligently. In-Reply-To: <201301291235.01513.mark.hackett@metoffice.gov.uk> References: <1358903168.4767.4.camel@webb> <201301281745.16485.mark.hackett@metoffice.gov.uk> <5107BFE5.6010800@pearwood.info> <201301291235.01513.mark.hackett@metoffice.gov.uk> Message-ID: <5108199D.2000601@trueblade.com> On 01/29/2013 07:35 AM, Mark Hackett wrote: > On Tuesday 29 Jan 2013, Steven D'Aprano wrote: >>> If it dropped the columns and shouldn't have, then the results will be >>> seen to be wrong anyway, so there's not a huge amount of need for this. >> >> You cannot assume that the caller knows that there are duplicated column >> names >> > > You cannot assume they wanted them as a list. > > You cannot assume that duplicate replacement is what they want. > > If someone is using a csv file with header names they have never read, how are > they going to use the data? They won't even know the name to access the value > in the dictionary! So I discard the claim that the caller may not know the > column names are duplicated. They have to know what the headers are to use > DictReader. Not true: I process some csv files just to translate them into another format, say tab delimited. I don't care about the column names, but dropping columns would sure bother me. I don't think any of the files I've processed have duplicate columns, but I wouldn't swear to it. And if they did, that would be an error I'd like to know about. Eric. From tjreedy at udel.edu Tue Jan 29 20:11:22 2013 From: tjreedy at udel.edu (Terry Reedy) Date: Tue, 29 Jan 2013 14:11:22 -0500 Subject: [Python-ideas] Canceled versus cancelled (was Re: Interrupting threads) In-Reply-To: References: <51049915.3060808@mrabarnett.plus.com> <5106B372.5040803@mrabarnett.plus.com> <20130129105443.2804520b@pitrou.net> Message-ID: On 1/29/2013 12:40 PM, Guido van Rossum wrote: > This is all pretty pointless given that PEP 3148 uses cancelled() and > concurrent.futures.Future has been released since Python 3.2. I should have added that I considered 'cancelled' the right choice now for a global language. The case is quite different from 'color' versus 'colour'. > Introducing single-ell aliases is just going to confuse things more. So this need not be considered for perhaps 50 to 100 years ;-). -- Terry Jan Reedy From turnbull at sk.tsukuba.ac.jp Tue Jan 29 20:19:30 2013 From: turnbull at sk.tsukuba.ac.jp (Stephen J. Turnbull) Date: Wed, 30 Jan 2013 04:19:30 +0900 Subject: [Python-ideas] csv.DictReader could handle headers more intelligently. In-Reply-To: <5108199D.2000601@trueblade.com> References: <1358903168.4767.4.camel@webb> <201301281745.16485.mark.hackett@metoffice.gov.uk> <5107BFE5.6010800@pearwood.info> <201301291235.01513.mark.hackett@metoffice.gov.uk> <5108199D.2000601@trueblade.com> Message-ID: <87boc723wd.fsf@uwakimon.sk.tsukuba.ac.jp> Eric V. Smith writes: > Not true: I process some csv files just to translate them into another > format, say tab delimited. I don't care about the column names, Then you'd be nuts to use csv.DictReader! csv.reader does exactly what you want. DictReader is about transforming a data format from a sequence of rows of values accessed by position, one of which might be a header, to a headerless sequence of objects with values accessed by name. If your use case doesn't involve access by name, it is irrelevant. 
From rob.cliffe at btinternet.com Tue Jan 29 20:16:57 2013 From: rob.cliffe at btinternet.com (Rob Cliffe) Date: Tue, 29 Jan 2013 19:16:57 +0000 Subject: [Python-ideas] while conditional in list comprehension ?? In-Reply-To: References: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de> <51072650.5090808@pearwood.info> <5107E490.9070501@btinternet.com> Message-ID: <51082029.5000103@btinternet.com> On 29/01/2013 18:53, Terry Reedy wrote: > On 1/29/2013 10:02 AM, Rob Cliffe wrote: >> >> On 29/01/2013 10:44, Nick Coghlan wrote: >>> Terry is correct: comprehensions are deliberately designed to have the >>> exact same looping semantics as the equivalent statements flattened >>> out into a single line, with the innermost expression lifted out of >>> the loop body and placed in front. This then works to arbitrarily deep >>> nesting levels. The surrounding syntax (parentheses, brackets, braces, >>> and whether or not there is a colon present in the main expression) >>> then governs what kind of result you get (generator-iterator, list, >>> set, dict). >>> >>> For example in: >>> >>> (x, y, z for x in a if x for y in b if y for z in c if z) >>> [x, y, z for x in a if x for y in b if y for z in c if z] >>> {x, y, z for x in a if x for y in b if y for z in c if z} >>> {x: y, z for x in a if x for y in b if y for z in c if z} >>> >>> The looping semantics of these expressions are all completely defined >>> by the equivalent statements: >>> >>> for x in a: >>> if x: >>> for y in b: >>> if y: >>> for z in c: >>> if z: >>> >>> (modulo a few name lookup quirks if you're playing with class scopes) >>> >> Thanks for spelling this out so clearly. It helps me remember which >> order to place nested "for"s inside a list comprehension! :-) > > The reference manual does spell it out: "In this case, the elements of > the new container are those that would be produced by considering each > of the for or if clauses a block, nesting from left to right, and > evaluating the expression to produce an element each time the > innermost block is reached." Perhaps a non-trivial concrete example > (say 4 levels deep) would help people understand that better. > Definitely. +1. Though I think 3 levels is enough. From ethan at stoneleaf.us Tue Jan 29 20:14:48 2013 From: ethan at stoneleaf.us (Ethan Furman) Date: Tue, 29 Jan 2013 11:14:48 -0800 Subject: [Python-ideas] while conditional in list comprehension ?? In-Reply-To: References: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de> <51072650.5090808@pearwood.info> <5107E490.9070501@btinternet.com> Message-ID: <51081FA8.4010002@stoneleaf.us> On 01/29/2013 10:53 AM, Terry Reedy wrote: > On 1/29/2013 10:02 AM, Rob Cliffe wrote: >> >> On 29/01/2013 10:44, Nick Coghlan wrote: >>> Terry is correct: comprehensions are deliberately designed to have the >>> exact same looping semantics as the equivalent statements flattened >>> out into a single line, with the innermost expression lifted out of >>> the loop body and placed in front. This then works to arbitrarily deep >>> nesting levels. The surrounding syntax (parentheses, brackets, braces, >>> and whether or not there is a colon present in the main expression) >>> then governs what kind of result you get (generator-iterator, list, >>> set, dict). 
>>> >>> For example in: >>> >>> (x, y, z for x in a if x for y in b if y for z in c if z) >>> [x, y, z for x in a if x for y in b if y for z in c if z] >>> {x, y, z for x in a if x for y in b if y for z in c if z} >>> {x: y, z for x in a if x for y in b if y for z in c if z} >>> >>> The looping semantics of these expressions are all completely defined >>> by the equivalent statements: >>> >>> for x in a: >>> if x: >>> for y in b: >>> if y: >>> for z in c: >>> if z: >>> >>> (modulo a few name lookup quirks if you're playing with class scopes) >>> >> Thanks for spelling this out so clearly. It helps me remember which >> order to place nested "for"s inside a list comprehension! :-) > > The reference manual does spell it out: "In this case, the elements of > the new container are those that would be produced by considering each > of the for or if clauses a block, nesting from left to right, and > evaluating the expression to produce an element each time the innermost > block is reached." Perhaps a non-trivial concrete example (say 4 levels > deep) would help people understand that better. +1 The picture is much more enlightening (to me, anyway) than the words! ~Ethan~ From eric at trueblade.com Tue Jan 29 20:21:58 2013 From: eric at trueblade.com (Eric V. Smith) Date: Tue, 29 Jan 2013 14:21:58 -0500 Subject: [Python-ideas] csv.DictReader could handle headers more intelligently. In-Reply-To: <87boc723wd.fsf@uwakimon.sk.tsukuba.ac.jp> References: <1358903168.4767.4.camel@webb> <201301281745.16485.mark.hackett@metoffice.gov.uk> <5107BFE5.6010800@pearwood.info> <201301291235.01513.mark.hackett@metoffice.gov.uk> <5108199D.2000601@trueblade.com> <87boc723wd.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <51082156.40702@trueblade.com> On 01/29/2013 02:19 PM, Stephen J. Turnbull wrote: > Eric V. Smith writes: > > > Not true: I process some csv files just to translate them into another > > format, say tab delimited. I don't care about the column names, > > Then you'd be nuts to use csv.DictReader! csv.reader does exactly > what you want. > > DictReader is about transforming a data format from a sequence of rows > of values accessed by position, one of which might be a header, to a > headerless sequence of objects with values accessed by name. If your > use case doesn't involve access by name, it is irrelevant. True. But my point stands: it's possible to read the data (even with a DictReader), do something with the data, and not know the column names in advance. It's not an impossible use case. Eric. From saghul at gmail.com Tue Jan 29 21:08:33 2013 From: saghul at gmail.com (=?ISO-8859-1?Q?Sa=FAl_Ibarra_Corretg=E9?=) Date: Tue, 29 Jan 2013 21:08:33 +0100 Subject: [Python-ideas] libuv based eventloop for tulip experiment In-Reply-To: References: <51070056.8020006@gmail.com> Message-ID: <51082C41.2030508@gmail.com> Hi! [snip] > >> Here [2] is the source code, along with some notes I took during the >> implementation. > > Hm... I see you just copied all of tulip and then hacked on it for a > while. :-) I wonder if you could refactor things so that an app would > be able to dynamically choose between tulip's and rose's event loop > using tulip's EventLoopPolicy machinery? The app could just > instantiate tulip.unix_eventloop._UnixEventLoop() (yes, this should > really be renamed!) or rose.uv.EventLoop, but all its imports should > come from tulip. > > Also, there's a refactoring of the event loop classes underway in > tulip's iocp branch -- this adds IOCP support on Windows. 
> Sure, that's the idea, I just put everything together so that it would still run even if some API changes :-) Anyway, since I plan to follow this more closely I'll definitely go for that and rose will just create a new EventLoopPolicy which uses the uv event loop. >> I know that the idea is not to re-implement the PEP itself but for people to >> create different EventLoop implementations. On rose I bundled tulip just to >> make a single package I could play with easily, once tulip makes it to the >> stdlib only the EventLoop will remain. > > It will be a long time before tulip makes it into the stdlib -- but > for easy experimentation it should be possible for apps to choose > between tulip and rose without having to change all their tulip > imports to rose imports. > Agreed. >> Here are some thoughts (in no particular order): >> >> - add_connector / remove_connector seem to be related to Windows, but being >> exposed like that feels a bit like leaking an implementation detail. I guess >> there was no way around it. > > They would only be needed if we ever were to support WSAPoll() on > Windows, but I'm pretty much decided against that (need to check with > Richard Oudkerk once more). Then we can kill add_connector and > remove_connector. > Ok, good to hear :-) >> - libuv implements a type of handle (Poll) which provides level-triggered >> file descriptor polling which also works on Windows, while being highly >> performant. It uses something called AFD Polling apparently, which is only >> available on Windows>= Vista, and a select thread on XP. I'm no Windows >> expert, but thanks to this the API is consistent across all platforms, which >> is nice. mAybe it's worth investigating? [3] > > Again that's probably for Richard to look into. I have no idea how it > relates to IOCP. I'm no windows expert either :-) AFAIS, IOCP provides a completion-based interface, but many people/libraries are used to level-triggered readiness notifications. It's apparently not easy to have unix style file descriptor polling in Windows, but that AFD Poll stuff (fairy dust to me, to be honest) does the trick. It only works for sockets, but I guess that's ok. > >> - The transport abstraction seems quite tight to socket objects. > > I'm confused to hear you say this, since the APIs for transports and > protocols are one of the few places of PEP 3156 where sockets are > *not* explicitly mentioned. (Though they are used in the > implementations, but I am envisioning alternate implementations that > don't use sockets.) > Indeed I meant the implementation. For example right now start_serving returns a Python socket object maybe some sort of ServerHandler class could hide that and provide some some convenience methods such as getsockname. If the eventloop implementation uses Python sockets it could just call the function in the underlying sockets, but some other implementations may have other means so gather that information. >> pyuv >> provides a TCP and UDP handles, which provide a completion-style API and use >> a better approach than Poll handles. > > So it implements TCP and UDP without socket objects? I actually like > this, because it validates my decision to keep socket objects out of > the transport/protocol APIs. (Note that PEP 3156 and Tulip currently > don't support UDP; it will require a somewhat different API between > transports and protocols.) > Yes, the TCP and UDP handles from pyuv are wrappers to their corresponding types in libuv. 
They exist because JS doesn't have sockets, so they had to create them for nodejs. The API, however, is completion style; here is a simple example of how data is read from a TCP handle:

def on_data_received(handle, data, error):
    if error == pyuv.error.UV_EOF:
        # Remote closed the connection
        handle.close()
        return
    print(data)

tcp_handle.start_read(on_data_received)

This model actually fits pretty well in tulip's transport/protocol mechanism.

>> They should give better performance
>> since EINTR is handled internally and there are fewer roundtrips between
>> Python-land and C-land.
> Why would EINTR handling be important? That should occur almost never.
> Or did you mean EAGAIN?

Actually, both. If the process receives a signal, epoll_wait would be interrupted, and libuv takes care of rearming the file descriptor, which happens in C without the GIL. Same goes for EAGAIN: basically libuv tries to read 64k chunks when start_read is called, and it automatically retries on EAGAIN. I don't have numbers to back this up (yet) but conceptually it sounds pretty plausible.

>> Was it ever considered to provide some sort of
>> abstraction so that transports can be used on top of something other than
>> regular sockets? For example I see no way to get the remote party from
>> the transport, without checking the underlying socket.
> This we are considering in another thread -- there are in fact two
> proposals on the table, one to add transport methods get_name() and
> get_peer(), which should return (host, port) pairs if possible, or
> None if the transport is not talking to an IP connection (or there are
> too many layers in between to dig out that information). The other
> proposal is a more generic API to get info out of the transport, e.g.
> get_extra_info("name") and get_extra_info("peer"), which can be more
> easily extended (without changing the PEP) to support other things,
> e.g. certificate info if the transport implements SSL.

The second model seems more flexible indeed. I guess the SSL transport could be tricky, because while currently Tulip uses the ssl module, I have no TLS handle on pyuv, so I'd have to build one on top of a TCP handle with pyOpenSSL (I have a prototype here [1]); object types / APIs wouldn't match, unless Tulip provides some wrappers for SSL-related objects such as certificates...

>> Thanks for reading this far and keep up the good work.
> Thanks for looking at this and reimplementing PEP 3156 on top of
> libuv! This is exactly the kind of thing I am hoping for.

I'll follow the discussion more closely now :-)

[1]: https://gist.github.com/4599801#file-uvtls-py

Regards,

-- 
Saúl Ibarra Corretgé
http://saghul.net/blog | http://about.me/saghul

From guido at python.org Tue Jan 29 21:26:52 2013
From: guido at python.org (Guido van Rossum)
Date: Tue, 29 Jan 2013 12:26:52 -0800
Subject: [Python-ideas] libuv based eventloop for tulip experiment
In-Reply-To: <51082C41.2030508@gmail.com>
References: <51070056.8020006@gmail.com> <51082C41.2030508@gmail.com>
Message-ID: 

On Tue, Jan 29, 2013 at 12:08 PM, Saúl Ibarra Corretgé wrote:
> [snip]

[snip*2]

> I'm no windows expert either :-) AFAIS, IOCP provides a completion-based
> interface, but many people/libraries are used to level-triggered readiness
> notifications. It's apparently not easy to have unix style file descriptor
> polling in Windows, but that AFD Poll stuff (fairy dust to me, to be honest)
> does the trick. It only works for sockets, but I guess that's ok.

Yeah, so do the other polling things on Windows. (Well, mostly sockets. There are some other things supported like named pipes.)
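A side-by-side sketch of the two notification styles being contrasted in this exchange; add_reader() is the PEP 3156 readiness API and start_read() is the pyuv call shown above, while the callbacks and variables are illustrative assumptions only:

    # Readiness style (epoll/kqueue, tulip's add_reader): the loop reports
    # "this fd is readable now" and the callback performs the recv() itself.
    def on_readable(sock):
        data = sock.recv(65536)
        # ... feed data to the protocol ...

    # event_loop.add_reader(sock.fileno(), on_readable, sock)

    # Completion style (IOCP, pyuv handles): the read is requested up
    # front and the callback fires once the data has already been read.
    def on_read(handle, data, error):
        if error is not None:
            handle.close()
            return
        # ... feed data to the protocol ...

    # tcp_handle.start_read(on_read)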
I guess in order to support this we'd need some kind of abstraction away from socket objects and file descriptors, at least for event loop methods like sock_recv() and add_reader(). But those are mostly meant for transports to build upon, so I think that would be fine.

>>> - The transport abstraction seems quite tight to socket objects.
>> I'm confused to hear you say this, since the APIs for transports and
>> protocols are one of the few places of PEP 3156 where sockets are
>> *not* explicitly mentioned. (Though they are used in the
>> implementations, but I am envisioning alternate implementations that
>> don't use sockets.)
> Indeed I meant the implementation. For example right now start_serving
> returns a Python socket object; maybe some sort of ServerHandler class could
> hide that and provide some convenience methods such as getsockname. If
> the eventloop implementation uses Python sockets it could just call the
> function on the underlying sockets, but some other implementations may have
> other means to gather that information.

Ah, yes, the start_serving() API. It is far from ready. :-(

>>> pyuv
>>> provides TCP and UDP handles, which provide a completion-style API and use
>>> a better approach than Poll handles.
>> So it implements TCP and UDP without socket objects? I actually like
>> this, because it validates my decision to keep socket objects out of
>> the transport/protocol APIs. (Note that PEP 3156 and Tulip currently
>> don't support UDP; it will require a somewhat different API between
>> transports and protocols.)
> Yes, the TCP and UDP handles from pyuv are wrappers to their corresponding
> types in libuv. They exist because JS doesn't have sockets, so they had to
> create them for nodejs. The API, however, is completion style; here is a
> simple example of how data is read from a TCP handle:
>
> def on_data_received(handle, data, error):
>     if error == pyuv.error.UV_EOF:
>         # Remote closed the connection
>         handle.close()
>         return
>     print(data)
>
> tcp_handle.start_read(on_data_received)
>
> This model actually fits pretty well in tulip's transport/protocol
> mechanism.

Yeah, I see. If we squint and read "handle" instead of "socket" we could even make it so that loop.sock_recv() takes one of these -- it would return a Future and your callback would set the Future's result, or its exception if an error was set.

>>> They should give better performance
>>> since EINTR is handled internally and there are fewer roundtrips between
>>> Python-land and C-land.
>> Why would EINTR handling be important? That should occur almost never.
>> Or did you mean EAGAIN?
> Actually, both. If the process receives a signal, epoll_wait would be
> interrupted, and libuv takes care of rearming the file descriptor, which
> happens in C without the GIL. Same goes for EAGAIN: basically libuv tries to
> read 64k chunks when start_read is called, and it automatically retries on
> EAGAIN. I don't have numbers to back this up (yet) but conceptually it
> sounds pretty plausible.

Hm. Anything that uses signals for its normal operation sounds highly suspect to me. But it probably doesn't matter either way.

>>> Was it ever considered to provide some sort of
>>> abstraction so that transports can be used on top of something other than
>>> regular sockets? For example I see no way to get the remote party from
>>> the transport, without checking the underlying socket.
>> This we are considering in another thread -- there are in fact two >> proposals on the table, one to add transport methods get_name() and >> get_peer(), which should return (host, port) pairs if possible, or >> None if the transport is not talking to an IP connection (or there are >> too many layers in between to dig out that information). The other >> proposal is a more generic API to get info out of the transport, e.g. >> get_extra_info("name") and get_extra_info("peer"), which can be more >> easily extended (without changing the PEP) to support other things, >> e.g. certificate info if the transport implements SSL. > The second model seems more flexible indeed. I guess the SSL transport could > be tricky, because while currently Tulip uses the ssl module I have no TLS > handle on pyuv so I'd have to build one on top of a TCP handle with > pyOpenSSL (I have a prototype here [1]), so object types / APIs wouldn't > match, unless Tulip provides some wrappers for SSL related objects such as > certificates... Hm, I thought certificates were just blobs of data? We should probably come up with a standard way to represent these that isn't tied to the stdlib's ssl module. But I don't think this should be part of PEP 3156 -- it's too big already. > [1]: https://gist.github.com/4599801#file-uvtls-py > Sa?l Ibarra Corretg? > http://saghul.net/blog | http://about.me/saghul -- --Guido van Rossum (python.org/~guido) From stephen at xemacs.org Tue Jan 29 21:37:38 2013 From: stephen at xemacs.org (Stephen J. Turnbull) Date: Wed, 30 Jan 2013 05:37:38 +0900 Subject: [Python-ideas] csv.DictReader could handle headers more intelligently. In-Reply-To: <51082156.40702@trueblade.com> References: <1358903168.4767.4.camel@webb> <201301281745.16485.mark.hackett@metoffice.gov.uk> <5107BFE5.6010800@pearwood.info> <201301291235.01513.mark.hackett@metoffice.gov.uk> <5108199D.2000601@trueblade.com> <87boc723wd.fsf@uwakimon.sk.tsukuba.ac.jp> <51082156.40702@trueblade.com> Message-ID: <87a9rr20a5.fsf@uwakimon.sk.tsukuba.ac.jp> Eric V. Smith writes: > True. But my point stands: it's possible to read the data (even with a > DictReader), do something with the data, and not know the column names > in advance. It's not an impossible use case. But it is. Dicts don't guarantee iteration order, so you will most likely get an output file that not only has a different delimiter, but a different order of fields. The right use case here is duck-typing. Something like "I have a bunch of tables of data about car models from different manufacturers which have different sets of columns, and I know that all of them have a column labeled 'MSRP', but which column might vary across tables." Of course, I don't actually believe you'd get that lucky. From eric at trueblade.com Tue Jan 29 21:59:42 2013 From: eric at trueblade.com (Eric V. Smith) Date: Tue, 29 Jan 2013 15:59:42 -0500 Subject: [Python-ideas] csv.DictReader could handle headers more intelligently. In-Reply-To: <87a9rr20a5.fsf@uwakimon.sk.tsukuba.ac.jp> References: <1358903168.4767.4.camel@webb> <201301281745.16485.mark.hackett@metoffice.gov.uk> <5107BFE5.6010800@pearwood.info> <201301291235.01513.mark.hackett@metoffice.gov.uk> <5108199D.2000601@trueblade.com> <87boc723wd.fsf@uwakimon.sk.tsukuba.ac.jp> <51082156.40702@trueblade.com> <87a9rr20a5.fsf@uwakimon.sk.tsukuba.ac.jp> Message-ID: <5108383E.3020501@trueblade.com> On 1/29/2013 3:37 PM, Stephen J. Turnbull wrote: > Eric V. Smith writes: > > > True. 
But my point stands: it's possible to read the data (even with a
> > DictReader), do something with the data, and not know the column names
> > in advance. It's not an impossible use case.
>
> But it is. Dicts don't guarantee iteration order, so you will most
> likely get an output file that not only has a different delimiter, but
> a different order of fields.

We're going to have to agree to disagree. Order is not always important.

-- Eric.

From yorik.sar at gmail.com  Wed Jan 30 00:37:00 2013
From: yorik.sar at gmail.com (Yuriy Taraday)
Date: Wed, 30 Jan 2013 03:37:00 +0400
Subject: [Python-ideas] while conditional in list comprehension ??
In-Reply-To: 
References: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de>
 <51072650.5090808@pearwood.info>
 <97262C84-D345-44FC-9A10-BD3D07023D6F@umbrellacode.com>
Message-ID: 

On Tue, Jan 29, 2013 at 7:44 PM, Wolfgang Maier <
wolfgang.maier at biologie.uni-freiburg.de> wrote:

> list(i for i in range(100) if i<50 or stop())
> Really (!) nice (and 2x as fast as using itertools.takewhile())!
>

I couldn't believe it so I had to check it:

from __future__ import print_function
import functools, itertools, operator, timeit

def var1():
    def _gen():
        for i in range(100):
            if i > 50: break
            yield i
    return list(_gen())

def var2():
    def stop():
        raise StopIteration
    return list(i for i in range(100) if i <= 50 or stop())

def var3():
    return [i for i in itertools.takewhile(lambda n: n <= 50, range(100))]

def var4():
    return [i for i in itertools.takewhile(functools.partial(operator.lt, 50), range(100))]

if __name__ == '__main__':
    for f in (var1, var2, var3, var4):
        print(f.__name__, end=' ')
        print(timeit.timeit(f))

Results on my machine:

var1 20.4974410534
var2 23.6218020916
var3 32.1543409824
var4 4.90913701057

var1 might have become the fastest of the first 3 because it's a special and very simple case. Why should explicit loops be slower than generator expressions?
var3 is the slowest. I guess, because it has lambda in it.
But switching to Python and back can not be faster than the last option - sitting in the C code as much as we can.

-- 

Kind regards, Yuriy.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From shane at umbrellacode.com  Wed Jan 30 01:17:51 2013
From: shane at umbrellacode.com (Shane Green)
Date: Tue, 29 Jan 2013 16:17:51 -0800
Subject: [Python-ideas] while conditional in list comprehension ??
In-Reply-To: 
References: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de>
 <51072650.5090808@pearwood.info>
 <97262C84-D345-44FC-9A10-BD3D07023D6F@umbrellacode.com>
Message-ID: <36B85EEB-E336-4E68-BC82-763F4AA582F1@umbrellacode.com>

Haven't read back far enough to know whether this is as interesting as it looks to me, but..

>>> def until(items):
...     stop = None
...     counter = 0
...     items = iter(items)
...     while not stop:
...         stop = yield next(items)
...         if stop:
...             yield
...         counter += 1
...         print(counter)
... 
>>> gen = until(range(15)) >>> stop = lambda: gen.send(True) >>> [x for x in gen if x < 3 or stop()] 1 2 3 4 [0, 1, 2] >>> Shane Green www.umbrellacode.com 408-692-4666 | shane at umbrellacode.com On Jan 29, 2013, at 8:23 AM, Zachary Ware wrote: > > On Jan 29, 2013 10:02 AM, "Oscar Benjamin" wrote: > > > > On 29 January 2013 15:34, Zachary Ware wrote: > > > > > > On Jan 29, 2013 9:26 AM, "Oscar Benjamin" > > > wrote: > > >> > > >> On 29 January 2013 11:51, yoav glazner wrote: > > >> > Here is very similar version that works (tested on python27) > > >> >>>> def stop(): > > >> > next(iter([])) > > >> > > > >> >>>> list((i if i<50 else stop()) for i in range(100)) > > >> > [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, > > >> > 20, > > >> > 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, > > >> > 39, > > >> > 40, 41, 42, 43, 44, 45, 46, 47, 48, 49] > > >> > > >> That's a great idea. You could also do: > > >> >>> list(i for i in range(100) if i<50 or stop()) > > >> > > >> It's a shame it doesn't work for list/set/dict comprehensions, though. > > >> > > > > > > I know I'm showing my ignorance here, but how are list/dict/set > > > comprehensions and generator expressions implemented differently that one's > > > for loop will catch a StopIteration and the others won't? Would it make > > > sense to reimplement list/dict/set comprehensions as an equivalent generator > > > expression passed to the appropriate constructor, and thereby allow the > > > StopIteration trick to work for each of them as well? > > > > A for loop is like a while loop with a try/except handler for > > StopIteration. So the following are roughly equivalent: > > > > # For loop > > for x in iterable: > > func1(x) > > else: > > func2() > > > > # Equivalent loop > > it = iter(iterable) > > while True: > > try: > > x = next(it) > > except StopIteration: > > func2() > > break > > func1(x) > > > > A list comprehension is just like an implicit for loop with limited > > functionality so it looks like: > > > > # List comp > > results = [func1(x) for x in iterable if func2(x)] > > > > # Equivalent loop > > results = [] > > it = iter(iterable) > > while True: > > try: > > x = next(it) > > except StopIteration: > > break > > # This part is outside the try/except > > if func2(x): > > results.append(func1(x)) > > > > The problem in the above is that we only catch StopIteration around > > the call to next(). So if either of func1 or func2 raises > > StopIteration the exception will propagate rather than terminate the > > loop. (This may mean that it terminates a for loop higher in the call > > stack - which can lead to confusing bugs - so it's important to always > > catch StopIteration anywhere it might get raised.) > > > > The difference with the list(generator) version is that func1() and > > func2() are both called inside the call to next() from the perspective > > of the list() function. This means that if they raise StopIteration > > then the try/except handler in the enclosing list function will catch > > it and terminate its loop. 
> > > > # list(generator) > > results = list(func1(x) for x in iterable if func2(c)) > > > > # Equivalent loop: > > def list(iterable): > > it = iter(iterable) > > results = [] > > while True: > > try: > > # Now func1 and func2 are both called in next() here > > x = next(it) > > except StopIteration: > > break > > results.append(x) > > return results > > > > results_gen = (func1(x) for x in iterable if func2(x)) > > results = list(results_gen) > > > > That makes a lot of sense. Thank you, Oscar and Joao, for the explanations. I wasn't thinking in enough scopes :) > > Regards, > > Zach Ware > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Wed Jan 30 01:34:30 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Wed, 30 Jan 2013 11:34:30 +1100 Subject: [Python-ideas] while conditional in list comprehension ?? In-Reply-To: References: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de> <51072650.5090808@pearwood.info> <97262C84-D345-44FC-9A10-BD3D07023D6F@umbrellacode.com> Message-ID: <51086A96.9020300@pearwood.info> On 30/01/13 02:44, Wolfgang Maier wrote: > list(i for i in range(100) if i<50 or stop()) > Really (!) nice (and 2x as fast as using itertools.takewhile())! I think you are mistaken about the speed. The itertools iterators are highly optimized and do all their work in fast C code. If you are seeing takewhile as slow, you are probably doing something wrong: untrustworthy timing code, misinterpreting what you are seeing, or some other error. Here's a comparison done the naive or obvious way. Copy and paste it into an interactive Python session: from itertools import takewhile from timeit import Timer def stop(): raise StopIteration setup = 'from __main__ import stop, takewhile' t1 = Timer('list(i for i in xrange(1000) if i < 50 or stop())', setup) t2 = Timer('[i for i in takewhile(lambda x: x < 50, xrange(1000))]', setup) min(t1.repeat(number=100000, repeat=5)) min(t2.repeat(number=100000, repeat=5)) On my computer, t1 is about 1.5 times faster than t2. But this is misleading, because it's not takewhile that is slow. I am feeding something slow into takewhile. If I really need to run as fast as possible, I can optimize the function call inside takewhile: from operator import lt from functools import partial small_enough = partial(lt, 50) setup2 = 'from __main__ import takewhile, small_enough' t3 = Timer('[i for i in takewhile(small_enough, xrange(1000))]', setup2) min(t3.repeat(number=100000, repeat=5)) On my computer, t3 is nearly 13 times faster than t1, and 19 times faster than t2. Here are the actual times I get, using Python 2.7: py> min(t1.repeat(number=100000, repeat=5)) # using the StopIteration hack 1.2609241008758545 py> min(t2.repeat(number=100000, repeat=5)) # takewhile and lambda 1.85182785987854 py> min(t3.repeat(number=100000, repeat=5)) # optimized version 0.09847092628479004 -- Steven From larry at hastings.org Wed Jan 30 02:06:45 2013 From: larry at hastings.org (Larry Hastings) Date: Tue, 29 Jan 2013 17:06:45 -0800 Subject: [Python-ideas] Extend module objects to support properties Message-ID: <51087225.3040801@hastings.org> Properties are a wonderful facility. But they only work on conventional objects. Specifically, they *don't* work on module objects. It would be nice to extend module objects so properties worked there too. 
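The closest approximation today is the well-known trick of replacing the module in sys.modules with an instance of an ordinary class. A rough sketch -- the module name and the "threshold" property here are made up for illustration:

import sys

class _FakeModule(object):
    _threshold = 0

    @property
    def threshold(self):
        return self._threshold

    @threshold.setter
    def threshold(self, value):
        self._threshold = int(value)

sys.modules["fakemod"] = _FakeModule()

import fakemod
fakemod.threshold = 42      # goes through the property setter
print(fakemod.threshold)    # 42

That works, but it's a hack, and you lose the real module object.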
For example, Victor Stinner's currently proposed PEP 433 adds two new methods to the sys module: sys.getdefaultcloexc() and sys.setdefaultcloexc(). What are we, Java? Surely this would be much nicer as a property, sys.defaultcloexc. //arry/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From barry at python.org Wed Jan 30 02:27:30 2013 From: barry at python.org (Barry Warsaw) Date: Tue, 29 Jan 2013 20:27:30 -0500 Subject: [Python-ideas] constant/enum type in stdlib References: Message-ID: <20130129202730.6ea6d0d5@anarchist.wooz.org> On Jan 28, 2013, at 11:50 PM, Joao S. O. Bueno wrote: >And it was not dismissed at all - to the contrary the last e-mail in the >thread is a message from the BDLF for it to **be** ! The discussion happened >in a bad moment as Python was mostly freature froozen for 3.2 - and it did >not show up again for Python 3.3; I still offer up my own enum implementation, which I've used and has been available for years on PyPI, and hasn't had a new release in months because it hasn't needed one. :) It should be compatible with Pythons from 2.6 to 3.3. http://pypi.python.org/pypi/flufl.enum The one hang up about it the last time this came up was that my enum items are not ints and Guido though they should be. I actually tried at one point to make that so, but had some troublesome test failures that I didn't have time or motivation to fix, mostly because I don't particularly like those semantics. I don't remember the details. However, if someone *else* wanted to submit a branch/patch to have enum items inherit from ints, and that was all it took to have these adopted into the stdlib, I would be happy to take a look. Cheers, -Barry -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: not available URL: From shane at umbrellacode.com Wed Jan 30 02:27:52 2013 From: shane at umbrellacode.com (Shane Green) Date: Tue, 29 Jan 2013 17:27:52 -0800 Subject: [Python-ideas] while conditional in list comprehension ?? In-Reply-To: References: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de> <51072650.5090808@pearwood.info> <97262C84-D345-44FC-9A10-BD3D07023D6F@umbrellacode.com> Message-ID: Wait, it was much simpler than that? >>> def until(items): ... stops = [] ... def stop(): ... stops.append(1) ... yield stop ... items = iter(items) ... counter = 0 ... while not stops: ... yield next(items) ... print(counter) ... counter += 1 ... >>> >>> gen = until(range(15)) >>> stop = next(gen) >>> [x for x in gen if x < 3 or stop()] 0 1 2 3 [0, 1, 2] >>> I must have just been up for too long that this looks like something new to me. Shane Green www.umbrellacode.com 408-692-4666 | shane at umbrellacode.com On Jan 29, 2013, at 3:37 PM, Yuriy Taraday wrote: > On Tue, Jan 29, 2013 at 7:44 PM, Wolfgang Maier wrote: > list(i for i in range(100) if i<50 or stop()) > Really (!) nice (and 2x as fast as using itertools.takewhile())! 
> > I couldn't believe it so I had to check it: > > from __future__ import print_function > import functools, itertools, operator, timeit > > def var1(): > def _gen(): > for i in range(100): > if i > 50: break > yield i > return list(_gen()) > > def var2(): > def stop(): > raise StopIteration > return list(i for i in range(100) if i <= 50 or stop()) > > def var3(): > return [i for i in itertools.takewhile(lambda n: n <= 50, range(100))] > > def var4(): > return [i for i in itertools.takewhile(functools.partial(operator.lt, 50), range(100))] > > if __name__ == '__main__': > for f in (var1, var2, var3, var4): > print(f.__name__, end=' ') > print(timeit.timeit(f)) > > Results on my machine: > > var1 20.4974410534 > var2 23.6218020916 > var3 32.1543409824 > var4 4.90913701057 > > var1 might have became the fastest of the first 3 because it's a special and very simple case. Why should explicit loops be slower that generator expressions? > var3 is the slowest. I guess, because it has lambda in it. > But switching to Python and back can not be faster than the last option - sitting in the C code as much as we can. > > -- > > Kind regards, Yuriy. > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas -------------- next part -------------- An HTML attachment was scrubbed... URL: From shane at umbrellacode.com Wed Jan 30 02:56:36 2013 From: shane at umbrellacode.com (Shane Green) Date: Tue, 29 Jan 2013 17:56:36 -0800 Subject: [Python-ideas] while conditional in list comprehension ?? In-Reply-To: References: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de> <51072650.5090808@pearwood.info> <97262C84-D345-44FC-9A10-BD3D07023D6F@umbrellacode.com> Message-ID: <3B15C735-3030-4E1B-900E-BF2C7B1A2A92@umbrellacode.com> Ah, right, feeding it through an iterator gives you full control? Shane Green www.umbrellacode.com 408-692-4666 | shane at umbrellacode.com On Jan 29, 2013, at 5:27 PM, Shane Green wrote: > Wait, it was much simpler than that? > > >>> def until(items): > ... stops = [] > ... def stop(): > ... stops.append(1) > ... yield stop > ... items = iter(items) > ... counter = 0 > ... while not stops: > ... yield next(items) > ... print(counter) > ... counter += 1 > ... > >>> > >>> gen = until(range(15)) > >>> stop = next(gen) > >>> [x for x in gen if x < 3 or stop()] > 0 > 1 > 2 > 3 > [0, 1, 2] > >>> > > > I must have just been up for too long that this looks like something new to me. > > > > Shane Green > www.umbrellacode.com > 408-692-4666 | shane at umbrellacode.com > > On Jan 29, 2013, at 3:37 PM, Yuriy Taraday wrote: > >> On Tue, Jan 29, 2013 at 7:44 PM, Wolfgang Maier wrote: >> list(i for i in range(100) if i<50 or stop()) >> Really (!) nice (and 2x as fast as using itertools.takewhile())! 
>> >> I couldn't believe it so I had to check it: >> >> from __future__ import print_function >> import functools, itertools, operator, timeit >> >> def var1(): >> def _gen(): >> for i in range(100): >> if i > 50: break >> yield i >> return list(_gen()) >> >> def var2(): >> def stop(): >> raise StopIteration >> return list(i for i in range(100) if i <= 50 or stop()) >> >> def var3(): >> return [i for i in itertools.takewhile(lambda n: n <= 50, range(100))] >> >> def var4(): >> return [i for i in itertools.takewhile(functools.partial(operator.lt, 50), range(100))] >> >> if __name__ == '__main__': >> for f in (var1, var2, var3, var4): >> print(f.__name__, end=' ') >> print(timeit.timeit(f)) >> >> Results on my machine: >> >> var1 20.4974410534 >> var2 23.6218020916 >> var3 32.1543409824 >> var4 4.90913701057 >> >> var1 might have became the fastest of the first 3 because it's a special and very simple case. Why should explicit loops be slower that generator expressions? >> var3 is the slowest. I guess, because it has lambda in it. >> But switching to Python and back can not be faster than the last option - sitting in the C code as much as we can. >> >> -- >> >> Kind regards, Yuriy. >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> http://mail.python.org/mailman/listinfo/python-ideas > -------------- next part -------------- An HTML attachment was scrubbed... URL: From shane at umbrellacode.com Wed Jan 30 03:31:37 2013 From: shane at umbrellacode.com (Shane Green) Date: Tue, 29 Jan 2013 18:31:37 -0800 Subject: [Python-ideas] while conditional in list comprehension ?? In-Reply-To: <3B15C735-3030-4E1B-900E-BF2C7B1A2A92@umbrellacode.com> References: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de> <51072650.5090808@pearwood.info> <97262C84-D345-44FC-9A10-BD3D07023D6F@umbrellacode.com> <3B15C735-3030-4E1B-900E-BF2C7B1A2A92@umbrellacode.com> Message-ID: <864D6A71-6663-478A-B342-83F5634DF15C@umbrellacode.com> Although it's not always viable, given how easy it is to wrap an iterator, it seems like might come in handy for comprehensions. [x for x in items if x < 50 or items.close()] Shane Green www.umbrellacode.com 408-692-4666 | shane at umbrellacode.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From shane at umbrellacode.com Wed Jan 30 03:34:24 2013 From: shane at umbrellacode.com (Shane Green) Date: Tue, 29 Jan 2013 18:34:24 -0800 Subject: [Python-ideas] while conditional in list comprehension ?? In-Reply-To: <864D6A71-6663-478A-B342-83F5634DF15C@umbrellacode.com> References: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de> <51072650.5090808@pearwood.info> <97262C84-D345-44FC-9A10-BD3D07023D6F@umbrellacode.com> <3B15C735-3030-4E1B-900E-BF2C7B1A2A92@umbrellacode.com> <864D6A71-6663-478A-B342-83F5634DF15C@umbrellacode.com> Message-ID: <644AED9E-D6A3-45AC-B07B-57EF7A2B6442@umbrellacode.com> Sorry, that was phrased backwards: the ease of wrapping iterators increases the viability? Shane Green www.umbrellacode.com 408-692-4666 | shane at umbrellacode.com On Jan 29, 2013, at 6:31 PM, Shane Green wrote: > Although it's not always viable, given how easy it is to wrap an iterator, it seems like might come in handy for comprehensions. > > [x for x in items if x < 50 or items.close()] > > > > Shane Green > www.umbrellacode.com > 408-692-4666 | shane at umbrellacode.com > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From greg.ewing at canterbury.ac.nz Wed Jan 30 00:26:35 2013 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 30 Jan 2013 12:26:35 +1300 Subject: [Python-ideas] constant/enum type in stdlib In-Reply-To: References: Message-ID: <51085AAB.6090303@canterbury.ac.nz> Eli Bendersky wrote: > I really wish there would be an enum type in Python that would make > sense. ISTM this has been raised numerous times, but not one submitted a > good-enough proposal. I think the reason the discussion petered out last time is that everyone has a slightly different idea on what an enum type should be like. A number of proposals were made, but none of them stood out as being the obviously right one to put in the std lib. Also, so far nobody has come up with a really elegant solution to the DRY problem that inevitably arises in connection with enums. Ideally you want to be able to specify the names of the enums as identifiers, and not have to write them again as strings or otherwise provide explicit values for them. That seems to be very difficult to achieve cleanly with Python syntax as it stands. -- Greg From eliben at gmail.com Wed Jan 30 03:45:07 2013 From: eliben at gmail.com (Eli Bendersky) Date: Tue, 29 Jan 2013 18:45:07 -0800 Subject: [Python-ideas] constant/enum type in stdlib In-Reply-To: <51085AAB.6090303@canterbury.ac.nz> References: <51085AAB.6090303@canterbury.ac.nz> Message-ID: On Tue, Jan 29, 2013 at 3:26 PM, Greg Ewing wrote: > Eli Bendersky wrote: > >> I really wish there would be an enum type in Python that would make >> sense. ISTM this has been raised numerous times, but not one submitted a >> good-enough proposal. >> > > I think the reason the discussion petered out last time > is that everyone has a slightly different idea on what > an enum type should be like. A number of proposals were > made, but none of them stood out as being the obviously > right one to put in the std lib. > > Also, so far nobody has come up with a really elegant > solution to the DRY problem that inevitably arises in > connection with enums. Ideally you want to be able to > specify the names of the enums as identifiers, and not > have to write them again as strings or otherwise provide > explicit values for them. That seems to be very difficult > to achieve cleanly with Python syntax as it stands. Since we're discussing a new language feature, why do we have to be restricted by the existing Python syntax? We have plenty of time before 3.4 at this point. Eli -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Wed Jan 30 05:01:34 2013 From: guido at python.org (Guido van Rossum) Date: Tue, 29 Jan 2013 20:01:34 -0800 Subject: [Python-ideas] constant/enum type in stdlib In-Reply-To: References: <51085AAB.6090303@canterbury.ac.nz> Message-ID: On Tue, Jan 29, 2013 at 6:45 PM, Eli Bendersky wrote: > On Tue, Jan 29, 2013 at 3:26 PM, Greg Ewing > wrote: >> >> Eli Bendersky wrote: >>> >>> I really wish there would be an enum type in Python that would make >>> sense. ISTM this has been raised numerous times, but not one submitted a >>> good-enough proposal. >> >> I think the reason the discussion petered out last time >> is that everyone has a slightly different idea on what >> an enum type should be like. A number of proposals were >> made, but none of them stood out as being the obviously >> right one to put in the std lib. 
>> >> Also, so far nobody has come up with a really elegant >> solution to the DRY problem that inevitably arises in >> connection with enums. Ideally you want to be able to >> specify the names of the enums as identifiers, and not >> have to write them again as strings or otherwise provide >> explicit values for them. That seems to be very difficult >> to achieve cleanly with Python syntax as it stands. Hm, if people really want to write something like color = enum(RED, WHITE, BLUE) that might still be true, but given that it's likely going to look a little more like a class definition, this doesn't look so bad, and certainly doesn't violate DRY (though it's somewhat verbose): class color(enum): RED = value() WHITE = value() BLUE = value() The Python 3 metaclass can observe the order in which the values are defined easily by setting the class dict to an OrderdDict. > Since we're discussing a new language feature, why do we have to be > restricted by the existing Python syntax? We have plenty of time before 3.4 > at this point. Introducing new syntax requires orders of magnitude more convincing than a new library module or even a new builtin. -- --Guido van Rossum (python.org/~guido) From greg.ewing at canterbury.ac.nz Wed Jan 30 05:34:57 2013 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 30 Jan 2013 17:34:57 +1300 Subject: [Python-ideas] constant/enum type in stdlib In-Reply-To: References: <51085AAB.6090303@canterbury.ac.nz> Message-ID: <5108A2F1.5010006@canterbury.ac.nz> Guido van Rossum wrote: > this doesn't look so bad, and > certainly doesn't violate DRY (though it's somewhat verbose): > > class color(enum): > RED = value() > WHITE = value() > BLUE = value() The verbosity is what makes it fail the "truly elegant" test for me. And I would say that it does violate DRY in the sense that you have to write value() repeatedly for no good reason. Sure, it's not bad enough to make it unusable, but like all the other solutions, it leaves me feeling vaguely annoyed that there isn't a better way. And it *is* bad enough to make writing an enum definition into a dreary chore, rather than the pleasure it should be. -- Greg From greg.ewing at canterbury.ac.nz Wed Jan 30 05:58:37 2013 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Wed, 30 Jan 2013 17:58:37 +1300 Subject: [Python-ideas] constant/enum type in stdlib In-Reply-To: References: <51085AAB.6090303@canterbury.ac.nz> Message-ID: <5108A87D.9000207@canterbury.ac.nz> Guido van Rossum wrote: > class color(enum): > RED = value() > WHITE = value() > BLUE = value() We could do somewhat better than that: class Color(Enum): RED, WHITE, BLUE = range(3) However, it's still slightly annoying that you have to specify how many values there are in the range() call. It would be even nicer it we could just use an infinite iterator, such as class Color(Enum): RED, WHITE, BLUE = values() However, the problem here is that the unpacking bytecode anally insists on the iterator providing *exactly* the right number of items, and there is no way for values() to know when to stop producing items. So, suppose we use a slightly extended version of the iterator protocol for unpacking purposes. If the object being unpacked has an __endunpack__ method, we call it after unpacking the last value, and it is responsible for doing appopriate checking and raising an exception if necessary. Otherwise we do as we do now. The values() object can then have an __endunpack__ method that does nothing, allowing you to unpack any number of items from it. 
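To make the current failure mode concrete (a quick sketch, using itertools.count as a stand-in for the hypothetical values()):

import itertools

def values():
    # stand-in: an endless supply of auto-generated enum values
    return itertools.count()

try:
    RED, WHITE, BLUE = values()
except ValueError as e:
    # unpacking pulls one extra item to check that the iterator is
    # exhausted, and an infinite iterator never is
    print(e)    # "too many values to unpack"

With __endunpack__, values() could simply opt out of that final check.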
-- 
Greg

From eliben at gmail.com  Wed Jan 30 06:26:11 2013
From: eliben at gmail.com (Eli Bendersky)
Date: Tue, 29 Jan 2013 21:26:11 -0800
Subject: [Python-ideas] constant/enum type in stdlib
In-Reply-To: 
References: <51085AAB.6090303@canterbury.ac.nz>
Message-ID: 

> Hm, if people really want to write something like
>
> color = enum(RED, WHITE, BLUE)
>
> that might still be true, but given that it's likely going to look a
> little more like a class definition, this doesn't look so bad, and
> certainly doesn't violate DRY (though it's somewhat verbose):
>
> class color(enum):
>     RED = value()
>     WHITE = value()
>     BLUE = value()
>
> The Python 3 metaclass can observe the order in which the values are
> defined easily by setting the class dict to an OrderedDict.
>

Even though I agree that enums lend themselves nicely to "class"-y syntax, the example you provide shows exactly why sticking to existing syntax makes us bend over backwards. Because 'color' is really not a class. And I don't want to explicitly say it's both a class and it subclasses something called 'enum'. And I don't want to specify values when I don't need values. All I really want is:

enum color:
    RED
    WHITE
    BLUE

Or shorter:

enum color: RED, WHITE, BLUE

Would adding a new "enum" keyword in Python 3.4 *really* meet that much resistance? ISTM built-in, standard enums have been on the wishlist of Python developers for a long time.

Eli
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From solipsis at pitrou.net  Wed Jan 30 08:26:39 2013
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Wed, 30 Jan 2013 08:26:39 +0100
Subject: [Python-ideas] constant/enum type in stdlib
References: <51085AAB.6090303@canterbury.ac.nz>
 <5108A87D.9000207@canterbury.ac.nz>
Message-ID: <20130130082639.0b28d7eb@pitrou.net>

On Wed, 30 Jan 2013 17:58:37 +1300
Greg Ewing wrote:
> Guido van Rossum wrote:
>
> > class color(enum):
> >     RED = value()
> >     WHITE = value()
> >     BLUE = value()
>
> We could do somewhat better than that:
>
> class Color(Enum):
>     RED, WHITE, BLUE = range(3)
>
> However, it's still slightly annoying that you have to
> specify how many values there are in the range() call.
> It would be even nicer if we could just use an infinite
> iterator, such as
>
> class Color(Enum):
>     RED, WHITE, BLUE = values()

Well, how about:

class Color(Enum):
    values = ('RED', 'WHITE', 'BLUE')

? (replace values with __values__ if you prefer)

Regards

Antoine.

From storchaka at gmail.com  Wed Jan 30 08:49:43 2013
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Wed, 30 Jan 2013 09:49:43 +0200
Subject: [Python-ideas] Interrupting threads
In-Reply-To: <51049915.3060808@mrabarnett.plus.com>
References: <51049915.3060808@mrabarnett.plus.com>
Message-ID: 

On 27.01.13 05:03, MRAB wrote:
> I know that this topic has been discussed before, but I've added a new
> twist...

For previous discussion see topics "Thread stopping" [1] and "Protecting finally clauses of interruptions" [2]. See also PEP 419 [3], created as a result of the last discussion.

[1] http://comments.gmane.org/gmane.comp.python.ideas/14647
[2] http://comments.gmane.org/gmane.comp.python.ideas/14689
[3] http://www.python.org/dev/peps/pep-0419/

From mal at egenix.com  Wed Jan 30 09:37:02 2013
From: mal at egenix.com (M.-A.
Lemburg) Date: Wed, 30 Jan 2013 09:37:02 +0100 Subject: [Python-ideas] Extend module objects to support properties In-Reply-To: <51087225.3040801@hastings.org> References: <51087225.3040801@hastings.org> Message-ID: <5108DBAE.8030601@egenix.com> On 30.01.2013 02:06, Larry Hastings wrote: > > > Properties are a wonderful facility. But they only work on conventional objects. Specifically, > they *don't* work on module objects. It would be nice to extend module objects so properties worked > there too. > > For example, Victor Stinner's currently proposed PEP 433 adds two new methods to the sys module: > sys.getdefaultcloexc() and sys.setdefaultcloexc(). What are we, Java? Surely this would be much > nicer as a property, sys.defaultcloexc. Would be nice, but I'm not sure how you'd implement this, since module contents are accessed directly via the module dictionary, so the attribute lookup hook to add the property magic is missing. Overall, it would be great to have modules behave more like classes. This idea has been floating around for years, but hasn't gone far due to the above direct content dict access approach taken by the Python code. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Jan 30 2013) >>> Python Projects, Consulting and Support ... http://www.egenix.com/ >>> mxODBC.Zope/Plone.Database.Adapter ... http://zope.egenix.com/ >>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/ ________________________________________________________________________ ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From wolfgang.maier at biologie.uni-freiburg.de Wed Jan 30 10:46:31 2013 From: wolfgang.maier at biologie.uni-freiburg.de (Wolfgang Maier) Date: Wed, 30 Jan 2013 09:46:31 +0000 (UTC) Subject: [Python-ideas] while conditional in list comprehension ?? References: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de> <51072650.5090808@pearwood.info> <97262C84-D345-44FC-9A10-BD3D07023D6F@umbrellacode.com> <51086A96.9020300@pearwood.info> Message-ID: Yuriy Taraday writes: > > > On Tue, Jan 29, 2013 at 7:44 PM, Wolfgang Maier wrote: > list(i for i in range(100) if i<50 or stop()) > Really (!) nice (and 2x as fast as using itertools.takewhile())! > > > > I couldn't believe it so I had to check it: > > > from __future__ import print_function > > import functools, itertools, operator, timeit > > def var1(): > def _gen(): > for i in range(100): > if i > 50: break > yield i > > return list(_gen()) > > def var2(): > def stop(): > raise StopIteration > return list(i for i in range(100) if i <= 50 or stop()) > > > def var3(): > return [i for i in itertools.takewhile(lambda n: n <= 50, range(100))] > > def var4(): > return [i for i in itertools.takewhile(functools.partial(operator.lt, 50), range(100))] > > > if __name__ == '__main__': > for f in (var1, var2, var3, var4): > print(f.__name__, end=' ') > print(timeit.timeit(f)) > > > > Results on my machine: > > > var1 20.4974410534 > var2 23.6218020916 > var3 32.1543409824 > var4 4.90913701057 > > var1 might have became the fastest of the first 3 because it's a special and very simple case. Why should explicit loops be slower that generator expressions? > > var3 is the slowest. I guess, because it has lambda in it. 
> But switching to Python and back can not be faster than the last option - sitting in the C code as much as we can. > > > -- Kind regards, Yuriy. Steven D'Aprano writes: > > On 30/01/13 02:44, Wolfgang Maier wrote: > > > list(i for i in range(100) if i<50 or stop()) > > Really (!) nice (and 2x as fast as using itertools.takewhile())! > > I think you are mistaken about the speed. The itertools iterators are highly > optimized and do all their work in fast C code. If you are seeing takewhile > as slow, you are probably doing something wrong: untrustworthy timing code, > misinterpreting what you are seeing, or some other error. > > Here's a comparison done the naive or obvious way. Copy and paste it into an > interactive Python session: > > from itertools import takewhile > from timeit import Timer > > def stop(): raise StopIteration > > setup = 'from __main__ import stop, takewhile' > > t1 = Timer('list(i for i in xrange(1000) if i < 50 or stop())', setup) > t2 = Timer('[i for i in takewhile(lambda x: x < 50, xrange(1000))]', setup) > > min(t1.repeat(number=100000, repeat=5)) > min(t2.repeat(number=100000, repeat=5)) > > On my computer, t1 is about 1.5 times faster than t2. But this is misleading, > because it's not takewhile that is slow. I am feeding something slow into > takewhile. If I really need to run as fast as possible, I can optimize the > function call inside takewhile: > > from operator import lt > from functools import partial > > small_enough = partial(lt, 50) > setup2 = 'from __main__ import takewhile, small_enough' > > t3 = Timer('[i for i in takewhile(small_enough, xrange(1000))]', setup2) > > min(t3.repeat(number=100000, repeat=5)) > > On my computer, t3 is nearly 13 times faster than t1, and 19 times faster > than t2. Here are the actual times I get, using Python 2.7: > > py> min(t1.repeat(number=100000, repeat=5)) # using the StopIteration hack > 1.2609241008758545 > py> min(t2.repeat(number=100000, repeat=5)) # takewhile and lambda > 1.85182785987854 > py> min(t3.repeat(number=100000, repeat=5)) # optimized version > 0.09847092628479004 > Hi Yuriy and Steven, a) I had compared the originally proposed 'takewhile with lambda' version to the 'if cond or stop()' solution using 'timeit' just like you did. In principle, you find the same as I did, although I am a bit surprised that our differences are different. To be exact 'if cond or stop()' was 1.84 x faster in my hands than 'takewhile with lambda'. b) I have to say I was very impressed by the speed gains you report through the use of 'partial', which I had not thought of at all, I have to admit. However, I tested your suggestions and I think they both suffer from the same mistake: your condition is 'partial(lt,50)', but this is not met to begin with and results in an empty list at least for me. Have you two actually checked the output of the code or have you just timed it? I found that in order to make it work the comparison has to be made via 'partial(gt,50)'. With this modification the resulting list in your example would be [0,..,49] as it should be. And now the big surprise in terms of runtimes: partial(lt,50) variant: 1.17 (but incorrect results) partial(gt,50) variant: 13.95 if cond or stop() variant: 9.86 I guess python is just smart enough to recognize that it compares against a constant value all the time, and optimizes the code accordingly (after all the if clause is a pretty standard thing to use in a comprehension). 
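To see the two predicates side by side (a quick sketch; the printed results are written from memory):

from functools import partial
from itertools import takewhile
from operator import gt, lt

# partial(lt, 50)(x) evaluates lt(50, x), i.e. "50 < x" -- already False
# for x = 0, so takewhile stops immediately:
print(list(takewhile(partial(lt, 50), range(100))))    # []

# partial(gt, 50)(x) evaluates gt(50, x), i.e. "50 > x" -- True for 0..49:
print(list(takewhile(partial(gt, 50), range(100))))    # [0, 1, ..., 49]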
So the reason for your reported speed-gain is that you actually broke out of the comprehension at the very first element instead of going through the first 50! Please comment, if you get different results. Best, Wolfgang From ncoghlan at gmail.com Wed Jan 30 10:54:18 2013 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 30 Jan 2013 19:54:18 +1000 Subject: [Python-ideas] Extend module objects to support properties In-Reply-To: <51087225.3040801@hastings.org> References: <51087225.3040801@hastings.org> Message-ID: On Wed, Jan 30, 2013 at 11:06 AM, Larry Hastings wrote: > > > Properties are a wonderful facility. But they only work on conventional > objects. Specifically, they *don't* work on module objects. It would be > nice to extend module objects so properties worked there too. As MAL notes, the issues with such an approach are: - code executed at module scope - code in inner scopes that uses "global" - code that uses globals() - code that directly modifies a module's __dict__ There is too much code that expects to be able to modify a module's namespace directly without going through the attribute access machinery. However, a slightly more practical suggestion might be: 1. Officially bless the practice of placing class instances in sys.modules (currently this is tolerated, since it's the only way to manage things like lazy module loading, but not officially recommended as the way to achieve "module properties") 2. Change sys from a module object to an ordinary class instance Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From chris.jerdonek at gmail.com Wed Jan 30 10:54:24 2013 From: chris.jerdonek at gmail.com (Chris Jerdonek) Date: Wed, 30 Jan 2013 01:54:24 -0800 Subject: [Python-ideas] Canceled versus cancelled (was Re: Interrupting threads) In-Reply-To: References: <51049915.3060808@mrabarnett.plus.com> <5106B372.5040803@mrabarnett.plus.com> <20130129105443.2804520b@pitrou.net> Message-ID: On Tue, Jan 29, 2013 at 9:28 AM, Terry Reedy wrote: > On 1/29/2013 8:18 AM, Richard Oudkerk wrote: >> >> On 29/01/2013 9:54am, Antoine Pitrou wrote: >>> >>> Of course, I sympathize with native English speakers who are annoyed >>> by the prevalence of Globish over real English. That said, Python >>> already mandates American English instead of British English. >> >> >> Is Future.cancelled() an acceptable American spelling? > > > Slightly controversial, but 'Yes'. My 1960s Dictionary of the American > language gives 'canceled' and 'cancelled'. Ditto for travel. I see the same > at modern web sites: > http://www.merriam-webster.com/dictionary/cancel > http://www.thefreedictionary.com/cancel > > Both give the one el version first, and that might indicate a preference. > But I was actually taught in school (some decades ago) to double the els of > travel and cancel have have read the rule various places. I suspect that is > not done now. More discussion: FWIW, my high school grammar teacher (who himself wrote a grammar book) taught us a rule about this. I can't remember the rule in its entirety, but part of it involved the location of the accent. If the accent is on the last syllable, then the final consonant is doubled -- modulo the rest of the rule. :) For example, "referring" and "fathering." Of course, there are exceptions. --Chris > > http://www.reference.com/motif/language/cancelled-vs-canceled > http://grammarist.com/spelling/cancel/ > > The latter has a Google ngram that shows 'canceled' has become more common > in the U.S., but only in the last 30 years. 
It has even crept into British > usage. > > http://books.google.com/ngrams/graph?content=canceled%2Ccancelled&year_start=1800&year_end=2000&corpus=6&smoothing=3&share= > > On the other hand, just about no one, even in the U.S., currently spells > 'cancellation' as 'cancelation'. That was tried by a few writers 1910 to > 1940, but never caught on. > > http://books.google.com/ngrams/graph?content=cancelation%2Ccancellation&year_start=1800&year_end=2000&corpus=17&smoothing=3&share= > > -- > Terry Jan Reedy > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas From saghul at gmail.com Wed Jan 30 10:55:51 2013 From: saghul at gmail.com (=?ISO-8859-1?Q?Sa=FAl_Ibarra_Corretg=E9?=) Date: Wed, 30 Jan 2013 10:55:51 +0100 Subject: [Python-ideas] libuv based eventloop for tulip experiment In-Reply-To: References: <51070056.8020006@gmail.com> <51082C41.2030508@gmail.com> Message-ID: <5108EE27.1000102@gmail.com> > Yeah, so do the other polling things on Windows. (Well, mostly > sockets. There are some other things supported like named pipes.) > In pyuv there is a pecial handle for those (Pipe) which works on both unix and windows with the same interface. > I guess in order to support this we'd need some kind of abstraction > away from socket objects and file descriptors, at least for event loop > methods like sock_recv() and add_reader(). But those are mostly meant > for transports to build upon, so I think that would be fine. > I see, great! [snip] > > Yeah, I see. If we squint and read "handle" instead of "socket" we > could even make it so that loop.sock_recv() takes one of these -- it > would return a Future and your callback would set the Future's result, > or its exception if an error was set. > YEah, sounds like it could work :-) Anyway, I wouldn't be opposed to leaving to APIs just for Python sockets (which I can interact with using a Poll handle) if transports can be built on top other entities such as TCP handles. [snip] > >> The second model seems more flexible indeed. I guess the SSL transport could >> be tricky, because while currently Tulip uses the ssl module I have no TLS >> handle on pyuv so I'd have to build one on top of a TCP handle with >> pyOpenSSL (I have a prototype here [1]), so object types / APIs wouldn't >> match, unless Tulip provides some wrappers for SSL related objects such as >> certificates... > > Hm, I thought certificates were just blobs of data? We should probably > come up with a standard way to represent these that isn't tied to the > stdlib's ssl module. But I don't think this should be part of PEP 3156 > -- it's too big already. > Yes, they are blobs, I meant the objects that wrap those blobs and provide verification functions and such. But that can indeed be left out and have implementation deal with it, having tulip just hand over the blobs. Regards, -- Sa?l Ibarra Corretg? http://saghul.net/blog | http://about.me/saghul From mark.hackett at metoffice.gov.uk Wed Jan 30 11:32:54 2013 From: mark.hackett at metoffice.gov.uk (Mark Hackett) Date: Wed, 30 Jan 2013 10:32:54 +0000 Subject: [Python-ideas] csv.DictReader could handle headers more intelligently. In-Reply-To: <5108383E.3020501@trueblade.com> References: <1358903168.4767.4.camel@webb> <87a9rr20a5.fsf@uwakimon.sk.tsukuba.ac.jp> <5108383E.3020501@trueblade.com> Message-ID: <201301301032.54211.mark.hackett@metoffice.gov.uk> On Tuesday 29 Jan 2013, Eric V. Smith wrote: > On 1/29/2013 3:37 PM, Stephen J. 
Turnbull wrote: > > Eric V. Smith writes: > > > True. But my point stands: it's possible to read the data (even with a > > > DictReader), do something with the data, and not know the column names > > > in advance. It's not an impossible use case. > > > > But it is. Dicts don't guarantee iteration order, so you will most > > likely get an output file that not only has a different delimiter, but > > a different order of fields. > > We're going to have to agree to disagree. Order is not always important. > It's not impossible that we're living in a simulated world. If you don't know what's in the csv file at all, then how do you know what you're supposed to do with it. Reading into a list will ensure order, so that is usable if order is important. If the names aren't important at all, then you should drop the first line and read it into a list again. If the names are important, you'd better know what names the headers are using. From stefan_ml at behnel.de Wed Jan 30 11:34:31 2013 From: stefan_ml at behnel.de (Stefan Behnel) Date: Wed, 30 Jan 2013 11:34:31 +0100 Subject: [Python-ideas] Extend module objects to support properties In-Reply-To: References: <51087225.3040801@hastings.org> Message-ID: Nick Coghlan, 30.01.2013 10:54: > On Wed, Jan 30, 2013 at 11:06 AM, Larry Hastings wrote: >> Properties are a wonderful facility. But they only work on conventional >> objects. Specifically, they *don't* work on module objects. It would be >> nice to extend module objects so properties worked there too. > > As MAL notes, the issues with such an approach are: > > - code executed at module scope > - code in inner scopes that uses "global" > - code that uses globals() > - code that directly modifies a module's __dict__ > > There is too much code that expects to be able to modify a module's > namespace directly without going through the attribute access > machinery. The Cython project has been wanting this feature for years. We even considered writing our own Module (sub-)type for this, but didn't get ourselves convinced that all of the involved hassle was really worth it. The main drive behind it is full control over setters to allow for safe and fast internal C level access to module globals (which usually don't change from the outside but may...). Currently, users help themselves by explicitly declaring globals as static internal names that are invisible to external Python code. Allowing regular objects in sys.modules would be one way to do it, but these things are a lot more involved at the C level than at the Python level due to the C level module setup procedure. I wouldn't mind letting such a feature appear at the C level first, even though the Python syntax would be pretty obvious anyway. It's not like people would commonly mess around with sys.__dict__. (Although, many C modules have a Python module wrapper these days, not sure if and how this should get passed through.) Stefan From steve at pearwood.info Wed Jan 30 13:09:20 2013 From: steve at pearwood.info (Steven D'Aprano) Date: Wed, 30 Jan 2013 23:09:20 +1100 Subject: [Python-ideas] csv.DictReader could handle headers more intelligently. 
In-Reply-To: <201301301032.54211.mark.hackett@metoffice.gov.uk>
References: <1358903168.4767.4.camel@webb>
 <87a9rr20a5.fsf@uwakimon.sk.tsukuba.ac.jp>
 <5108383E.3020501@trueblade.com>
 <201301301032.54211.mark.hackett@metoffice.gov.uk>
Message-ID: <51090D70.2050102@pearwood.info>

On 30/01/13 21:32, Mark Hackett wrote:
> If you don't know what's in the csv file at all, then how do you know what
> you're supposed to do with it.

Maybe you're processing the file without caring what the column names are, but you still need to map column name to column contents. This is no more unusual than processing a dict where you don't know the keys: you just iterate over them.

Or maybe you're scanning the file for one specific column name, and you don't care what the other names are.

Or, most likely, you know what you are *expecting* in the CSV file, but because data files don't always contain what you expect, you want to be notified if there is something unexpected rather than just have it silently do the wrong thing.
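To make the duplicate-header case concrete (a quick sketch; DictReader effectively zips the header names with each row, so a repeated name silently keeps only the last column):

import csv

rows = csv.DictReader(["name,tag,tag", "widget,a,b"])
for row in rows:
    print(row["tag"])    # prints "b" -- the first 'tag' value is gone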
-- 
Steven

From mark.hackett at metoffice.gov.uk  Wed Jan 30 13:14:09 2013
From: mark.hackett at metoffice.gov.uk (Mark Hackett)
Date: Wed, 30 Jan 2013 12:14:09 +0000
Subject: [Python-ideas] csv.DictReader could handle headers more intelligently.
In-Reply-To: <51090D70.2050102@pearwood.info>
References: <1358903168.4767.4.camel@webb>
 <201301301032.54211.mark.hackett@metoffice.gov.uk>
 <51090D70.2050102@pearwood.info>
Message-ID: <201301301214.09203.mark.hackett@metoffice.gov.uk>

On Wednesday 30 Jan 2013, Steven D'Aprano wrote:
> On 30/01/13 21:32, Mark Hackett wrote:
> > If you don't know what's in the csv file at all, then how do you know
> > what you're supposed to do with it.
>
> Maybe you're processing the file without caring what the column names are,

If you don't care, then you shouldn't be using a dictionary because you have to know to say what one you want.

> but you still need to map column name to column contents.

Why? You said this hypothetical reckless person doesn't care.

> This is no more
> unusual than processing a dict where you don't know the keys: you just
> iterate over them.

Which is only used for printing the info out. There's a much easier way to do that: "cat file.csv"

> Or maybe you're scanning the file for one specific column name, and you
> don't care what the other names are.

Then you'll know if it's duplicated or not.

> Or, most likely, you know what you are *expecting* in the CSV file, but
> because data files don't always contain what you expect, you want to be
> notified if there is something unexpected rather than just have it
> silently do the wrong thing.

There's a way to do that: "head -n1 file.csv". You know, have a look.

From shane at umbrellacode.com  Wed Jan 30 13:24:53 2013
From: shane at umbrellacode.com (Shane Green)
Date: Wed, 30 Jan 2013 04:24:53 -0800
Subject: [Python-ideas] csv.DictReader could handle headers more intelligently.
In-Reply-To: <201301301032.54211.mark.hackett@metoffice.gov.uk>
References: <1358903168.4767.4.camel@webb>
 <87a9rr20a5.fsf@uwakimon.sk.tsukuba.ac.jp>
 <5108383E.3020501@trueblade.com>
 <201301301032.54211.mark.hackett@metoffice.gov.uk>
Message-ID: <0DE28815-D265-44D5-AC17-0A7524C6DF5D@umbrellacode.com>

So I've done some thinking on it, a bit of research, etc., and have worked with a lot of different CSV content. There are a lot of parallels between the name/value pairs of an HTML form submission, and our use case.

Namely:
- There's typically only one value per name, but it's perfectly legal to have multiple values assigned to a name.
- When there are multiple values assigned to a name, order can be very important.
- They made the mistake of mapping field names to singular values when there was only one value, and to lists of values where there were multiple values.
- Each of these has been deprecated, and their FieldStorage now always maps field names to lists of values.

I've implemented a Record class I'm going to pitch for feedback. Although I followed the FieldStorage API for a couple of methods, it didn't translate very well because their values are complex objects. This Record class is a dictionary type that maps header names to the values from columns labeled by that same header. Most lists have a single field because usually headers aren't duplicated. When multiple values are in a field, they are listed in the order they were read from the CSV file. The API provides convenience methods for getting the first or last value listed for a given column name, making it very easy to work with singular values when desired. The dictionary API will likely be the primary mechanism for interacting with it; the record, however, knows the header and row sequences it was built from, and provides sequential access to them as well. In addition to working with non-standard CSV, performing transformations, etc., this information makes it possible to reproduce correctly ordered CSV.

While I don't really know yet whether it would make sense to support any kind of manipulation of values on the record instances themselves, versus a more copy()/update() approach to defining modified records, I did decide to wrap the row values in a tuple, making them read-only. This was for several reasons. One was to address a potential inconsistency that might arise should we decide to support editing, and the other is because the record is the representation of that row read from the source file, and so it should always accurately reflect that content.

About the code: I wrote it tonight, tested it for an hour, so it's not meant to be perfect or final, but it should stir up a very concrete discussion about the API, if nothing else ;-) I included a generator that seemed to work on some test files. It most definitely is not meant to be critiqued or a distraction, but I've included it in case anyone ends up wanting to investigate the things further. Although the iterator function provides a slightly different signature than DictReader, that's not because I'm trying to change anything; please keep in mind the generator was just a test. Also, I'd like to mention one last time that I don't think we should change what exists to reflect any of these changes: I was thinking it would be a new set of classes and functions that would become the preferred implementation in the future.

> class Record(dict):
>     def __init__(self, headers, fields):
>         if len(headers) != len(fields):
>             # I don't make decisions about how gaps should be filled.
>             raise ValueError("header/field size mismatch")
>         self._headers = headers
>         self._fields = tuple(fields)
>         [self.setdefault(h, []).append(v) for h, v in self.fielditems()]
>         super(Record, self).__init__()
>     def fielditems(self):
>         """
>         Get header,value sequence that reflects CSV source.
>         """
>         return zip(self.headers(), self.fields())
>     def headers(self):
>         """
>         Get ordered sequence of headers reflecting CSV source.
>         """
>         return self._headers
>     def fields(self):
>         """
>         Get ordered sequence of values reflecting CSV row source.
>         """
>         return self._fields
>     def getfirst(self, name, default=None):
>         """
>         Get value of first field associated with header named
>         'name'; return 'default' if no such value exists.
>         """
>         return self[name][0] if name in self else default
>     def getlast(self, name, default=None):
>         """
>         Get value of last field associated with header named
>         'name'; return 'default' if no such value exists.
>         """
>         return self[name][-1] if name in self else default
>     def getlist(self, name):
>         """
>         Get values of all fields associated with header named 'name'.
>         """
>         return self.get(name, [])
>     def pretty(self, header=True):
>         lines = []
>         if header:
>             lines.append(
>                 ["%s".ljust(10).rjust(20) % h for h in self.headers()])
>         lines.append(
>             ["%s".ljust(10).rjust(20) % v for v in self.fields()])
>         return "\n\n".join(["|".join(line).strip() for line in lines])
>     def __getslice__(self, start=0, stop=None):
>         return self.fields()[start: stop]
>
>
> import itertools
>
> Undefined = object()
> def iterrecords(f, headers=None, bucketheader=Undefined,
>                 missingfieldsok=False, dialect="excel", *args, **kw):
>     rows = reader(f, dialect, *args, **kw)
>     for row in itertools.ifilter(None, rows):
>         if not headers:
>             headers = row
>             headcount = len(headers)
>             print headers
>             continue
>         rowcount = len(row)
>         rowheaders = headers
>         if rowcount < headcount:
>             if not missingfieldsok:
>                 raise KeyError("row has fewer values than headers")
>         elif rowcount > headcount:
>             if bucketheader is Undefined:
>                 raise KeyError("row has more values than headers")
>             rowheaders = headers + [bucketheader] * (rowcount - headcount)
>         record = Record(rowheaders, row)
>         yield record
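A quick usage sketch of the Record part (typed up from the code above, so treat the exact output as illustrative):

>>> r = Record(["name", "tag", "tag"], ["widget", "a", "b"])
>>> r["name"]
['widget']
>>> r.getlist("tag")
['a', 'b']
>>> r.getfirst("tag"), r.getlast("tag")
('a', 'b')
>>> r.fields()
('widget', 'a', 'b')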
> """ > return self._headers > def fields(self): > """ > Get ordered sequence of values reflecting CSV row source. > """ > return self._fields > def getfirst(self, name, default=None): > """ > Get value of last field associated with header named > 'name'; return 'default' if no such value exists. > """ > return self[name][0] if name in self else default > def getlast(self, name, default=None): > """ > Get value of last field associated with header named > 'name'; return 'default' if no such value exists. > """ > return self[name][-1] if name in self else default > def getlist(self, name): > """ > Get values of all fields associated with header named 'name'. > """ > return self.get(name, []) > def pretty(self, header=True): > lines = [] > if header: > lines.append( > ["%s".ljust(10).rjust(20) % h for h in self.headers()]) > lines.append( > ["%s".ljust(10).rjust(20) % v for v in self.fields()]) > return "\n\n".join(["|".join(line).strip() for line in lines]) > def __getslice__(self, start=0, stop=None): > return self.fields()[start: stop] > > > import itertools > > Undefined = object() > def iterrecords(f, headers=None, bucketheader=Undefined, > missingfieldsok=False, dialect="excel", *args, **kw): > rows = reader(f, dialect, *args, **kw) > for row in itertools.ifilter(None, rows): > if not headers: > headers = row > headcount = len(headers) > print headers > continue > rowcount = len(row) > rowheaders = headers > if rowcount < headcount: > if not missingfieldsok: > raise KeyError("row has more values than headers") > elif rowcount > headcount: > if bucketheader is Undefined: > raise KeyError("row has more values than headers") > rowheaders += [bucketheader] * (rowcount - headcount) > record = Record(rowheaders, row) > yield record # That's run within the context of the "csv" module to work? maybe. Shane Green www.umbrellacode.com 408-692-4666 | shane at umbrellacode.com On Jan 30, 2013, at 2:32 AM, Mark Hackett wrote: > On Tuesday 29 Jan 2013, Eric V. Smith wrote: >> On 1/29/2013 3:37 PM, Stephen J. Turnbull wrote: >>> Eric V. Smith writes: >>>> True. But my point stands: it's possible to read the data (even with a >>>> DictReader), do something with the data, and not know the column names >>>> in advance. It's not an impossible use case. >>> >>> But it is. Dicts don't guarantee iteration order, so you will most >>> likely get an output file that not only has a different delimiter, but >>> a different order of fields. >> >> We're going to have to agree to disagree. Order is not always important. >> > > It's not impossible that we're living in a simulated world. > > If you don't know what's in the csv file at all, then how do you know what > you're supposed to do with it. > > Reading into a list will ensure order, so that is usable if order is > important. If the names aren't important at all, then you should drop the first > line and read it into a list again. If the names are important, you'd better > know what names the headers are using. > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas -------------- next part -------------- An HTML attachment was scrubbed... URL: From shane at umbrellacode.com Wed Jan 30 13:59:17 2013 From: shane at umbrellacode.com (Shane Green) Date: Wed, 30 Jan 2013 04:59:17 -0800 Subject: [Python-ideas] csv.DictReader could handle headers more intelligently. 
Shane Green
www.umbrellacode.com
408-692-4666 | shane at umbrellacode.com

On Jan 30, 2013, at 2:32 AM, Mark Hackett wrote:

> On Tuesday 29 Jan 2013, Eric V. Smith wrote:
>> On 1/29/2013 3:37 PM, Stephen J. Turnbull wrote:
>>> Eric V. Smith writes:
>>>> True. But my point stands: it's possible to read the data (even with a
>>>> DictReader), do something with the data, and not know the column names
>>>> in advance. It's not an impossible use case.
>>>
>>> But it is. Dicts don't guarantee iteration order, so you will most
>>> likely get an output file that not only has a different delimiter, but
>>> a different order of fields.
>>
>> We're going to have to agree to disagree. Order is not always important.
>>
>
> It's not impossible that we're living in a simulated world.
>
> If you don't know what's in the csv file at all, then how do you know what
> you're supposed to do with it?
>
> Reading into a list will ensure order, so that is usable if order is
> important. If the names aren't important at all, then you should drop the
> first line and read it into a list again. If the names are important, you'd
> better know what names the headers are using.
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From shane at umbrellacode.com Wed Jan 30 13:59:17 2013
From: shane at umbrellacode.com (Shane Green)
Date: Wed, 30 Jan 2013 04:59:17 -0800
Subject: [Python-ideas] csv.DictReader could handle headers more intelligently.
In-Reply-To: <0DE28815-D265-44D5-AC17-0A7524C6DF5D@umbrellacode.com>
References: <1358903168.4767.4.camel@webb>
 <87a9rr20a5.fsf@uwakimon.sk.tsukuba.ac.jp> <5108383E.3020501@trueblade.com>
 <201301301032.54211.mark.hackett@metoffice.gov.uk>
 <0DE28815-D265-44D5-AC17-0A7524C6DF5D@umbrellacode.com>
Message-ID: 

> [Shane's Record class write-up and code, quoted in full from the previous
> message - snipped]
I should probably also have noted the dictionary API behaviour, since it's not explicitly documented:

keys() -> list of unique header names.
values() -> list of field-value lists.
items() -> [(header, field-list), ...] pairs.

And then of course dictionary lookup. One thing that comes to mind is that there's really no value to the unordered sequence of value lists; there could be some value in extending an OrderedDict instead, making all the iteration methods consistent and therefore something that could be used to do things like write values back out, etc.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From jeff at jeffreyjenkins.ca Wed Jan 30 15:04:47 2013
From: jeff at jeffreyjenkins.ca (Jeff Jenkins)
Date: Wed, 30 Jan 2013 09:04:47 -0500
Subject: [Python-ideas] csv.DictReader could handle headers more intelligently.
In-Reply-To: 
References: <1358903168.4767.4.camel@webb>
 <87a9rr20a5.fsf@uwakimon.sk.tsukuba.ac.jp> <5108383E.3020501@trueblade.com>
 <201301301032.54211.mark.hackett@metoffice.gov.uk>
 <0DE28815-D265-44D5-AC17-0A7524C6DF5D@umbrellacode.com>
Message-ID: 

I think this may have been lost somewhere in the last 90 messages, but
adding a warning to DictReader in the docs seems like it solves almost the
entire problem. New csv.DictReader users are informed, no one's old code
breaks, and a separate discussion can be had about whether it's worth
adding a csv.MultiDictReader which uses lists.

On Wed, Jan 30, 2013 at 7:59 AM, Shane Green wrote:

> [Shane's Record class write-up, code, and follow-up note, quoted in
> full - snipped]
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From dreis.pt at hotmail.com Wed Jan 30 15:22:37 2013
From: dreis.pt at hotmail.com (Daniel Reis)
Date: Wed, 30 Jan 2013 14:22:37 +0000
Subject: [Python-ideas] Standard library high level support for email messages
In-Reply-To: 
References: 
Message-ID: 

Hello all,

Python, as a "batteries included" language, strives to provide out-of-the-box solutions for most common programming tasks. Composing and sending email messages is a common task, supported by the `email` and `smtplib` modules.

However, a programmer not familiar with MIME won't be able to create non-trivial email messages. Actually, this proposal idea comes from the frustration of quickly learning about MIME to get the job done, only to learn later that some people's email clients couldn't properly display the messages, because I tripped over some details of multipart messages with Text+HTML and attachments.

You can call me a bad programmer, but couldn't / shouldn't this be easier? Should a programmer be required to know about MIME in order to send a decently composed email with images or attachments?

The hardest part is already built in. Why not go that one step further and add to the email standard library a wrapper to handle common email composition without exposing the MIME details?

Something similar to http://code.activestate.com/recipes/576858-send-html-or-text-email-with-or-without-attachment, or perhaps including a lib such as pyzlib.
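To make the pain concrete, building even a "simple" text+HTML message with one attachment takes roughly the following today (an illustrative, untested sketch; addresses and filename are made up):

    # multipart/mixed( multipart/alternative(text, html), attachment )
    from email.mime.multipart import MIMEMultipart
    from email.mime.text import MIMEText
    from email.mime.application import MIMEApplication

    msg = MIMEMultipart('mixed')          # top level: body + attachments
    msg['Subject'] = 'Report'
    msg['From'] = 'me at example.com'
    msg['To'] = 'you at example.com'

    body = MIMEMultipart('alternative')   # plain-text and HTML variants
    body.attach(MIMEText('Hello, plain text.', 'plain'))
    body.attach(MIMEText('<p>Hello, <b>HTML</b>.</p>', 'html'))
    msg.attach(body)

    pdf = MIMEApplication(open('report.pdf', 'rb').read())
    pdf.add_header('Content-Disposition', 'attachment', filename='report.pdf')
    msg.attach(pdf)

Get the nesting wrong and some clients silently show the wrong part.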
Regards
Daniel Reis
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From shane at umbrellacode.com Wed Jan 30 15:44:26 2013
From: shane at umbrellacode.com (Shane Green)
Date: Wed, 30 Jan 2013 06:44:26 -0800
Subject: [Python-ideas] csv.DictReader could handle headers more intelligently.
In-Reply-To: 
References: <1358903168.4767.4.camel@webb>
 <87a9rr20a5.fsf@uwakimon.sk.tsukuba.ac.jp> <5108383E.3020501@trueblade.com>
 <201301301032.54211.mark.hackett@metoffice.gov.uk>
 <0DE28815-D265-44D5-AC17-0A7524C6DF5D@umbrellacode.com>
Message-ID: <8A15CA39-99E1-4E57-8541-FE39B53323DD@umbrellacode.com>

"""Also, I'd like to mention one last time that I don't think we should change what exists to reflect any of these changes: I was thinking it would be a new set of classes and functions that would become the preferred implementation in the future."""

This is kind of that new discussion. I agree...

Shane Green
www.umbrellacode.com
408-692-4666 | shane at umbrellacode.com

On Jan 30, 2013, at 6:04 AM, Jeff Jenkins wrote:

> I think this may have been lost somewhere in the last 90 messages, but adding a warning to DictReader in the docs seems like it solves almost the entire problem. New csv.DictReader users are informed, no one's old code breaks, and a separate discussion can be had about whether it's worth adding a csv.MultiDictReader which uses lists.
>
> [Jeff's full quote of Shane's Record class write-up, code, and follow-up
> note - snipped]
>>> """ >>> return self._headers >>> def fields(self): >>> """ >>> Get ordered sequence of values reflecting CSV row source. >>> """ >>> return self._fields >>> def getfirst(self, name, default=None): >>> """ >>> Get value of last field associated with header named >>> 'name'; return 'default' if no such value exists. >>> """ >>> return self[name][0] if name in self else default >>> def getlast(self, name, default=None): >>> """ >>> Get value of last field associated with header named >>> 'name'; return 'default' if no such value exists. >>> """ >>> return self[name][-1] if name in self else default >>> def getlist(self, name): >>> """ >>> Get values of all fields associated with header named 'name'. >>> """ >>> return self.get(name, []) >>> def pretty(self, header=True): >>> lines = [] >>> if header: >>> lines.append( >>> ["%s".ljust(10).rjust(20) % h for h in self.headers()]) >>> lines.append( >>> ["%s".ljust(10).rjust(20) % v for v in self.fields()]) >>> return "\n\n".join(["|".join(line).strip() for line in lines]) >>> def __getslice__(self, start=0, stop=None): >>> return self.fields()[start: stop] >>> >>> >>> import itertools >>> >>> Undefined = object() >>> def iterrecords(f, headers=None, bucketheader=Undefined, >>> missingfieldsok=False, dialect="excel", *args, **kw): >>> rows = reader(f, dialect, *args, **kw) >>> for row in itertools.ifilter(None, rows): >>> if not headers: >>> headers = row >>> headcount = len(headers) >>> print headers >>> continue >>> rowcount = len(row) >>> rowheaders = headers >>> if rowcount < headcount: >>> if not missingfieldsok: >>> raise KeyError("row has more values than headers") >>> elif rowcount > headcount: >>> if bucketheader is Undefined: >>> raise KeyError("row has more values than headers") >>> rowheaders += [bucketheader] * (rowcount - headcount) >>> record = Record(rowheaders, row) >>> yield record >> > > > I should probably also have noted the dictionary API behaviour since it's not explicitly: > keys() -> list of unique() header names. > values() -> list of field values lists. > items() -> [(header, field-list),] pairs. > > And then of course dictionary lookup. One thing that comes to mind is that there's really no value to the unordered sequence of value lists; there could be some value in extending an OrderedDict, making all the iteration methods consistent and therefore something that could be used to do something like write values, etc?. > > > > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From phd at phdru.name Wed Jan 30 15:54:10 2013 From: phd at phdru.name (Oleg Broytman) Date: Wed, 30 Jan 2013 18:54:10 +0400 Subject: [Python-ideas] Standard library high level support for email messages In-Reply-To: References: Message-ID: <20130130145410.GA30635@iskra.aviel.ru> Hi! On Wed, Jan 30, 2013 at 02:22:37PM +0000, Daniel Reis wrote: > Python, as a "batteries included" language, strives to provide out of the box solution for most common programming tasks. > Composing and sending email messages is a common task, supported by `email` and `smtplib` modules. > > However, a programmer not familiar with MIME won't be able to create non-trivial email messages. 
The Law of Leaky Abstractions. If you are going to use a protocol or a data format, you have to learn all the basic details and deep internals. Yes, it's inevitable: if something goes wrong (and sooner or later it will), how do you debug your code without a deep understanding of what's going on?

One of the most painful experiences with email in Russia is when some server (forum software, e.g.) running on Linux and using the koi8-r charset sends mail messages with unencoded headers to Windows users who use the cp1251 encoding. This happens because server software is often written by people who never use anything besides pure ascii, so they write code like:

    print "Subject: " + subject

How do you debug bug reports without understanding why and how you have to encode mail headers?
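(For reference, correctly encoding such a header looks something like this - a rough sketch:)

    # RFC 2047 encoding via the stdlib, instead of concatenating raw bytes.
    from email.header import Header
    subject = u'\u041f\u0440\u0438\u0432\u0435\u0442'  # Russian "hello"
    print "Subject: " + Header(subject, 'koi8-r').encode()
    # prints something like: Subject: =?koi8-r?b?8NLJ18XU?=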
It was just an example, but I think it shows an important point. On the other hand, actually writing software shouldn't be hard, I agree.

The way to extend the standard library is: write a module, publish it on PyPI, make it popular, then apply for inclusion of the module.

Oleg.
-- 
 Oleg Broytman http://phdru.name/ phd at phdru.name
 Programmers don't die, they just GOSUB without RETURN.

From mark.hackett at metoffice.gov.uk Wed Jan 30 16:16:37 2013
From: mark.hackett at metoffice.gov.uk (Mark Hackett)
Date: Wed, 30 Jan 2013 15:16:37 +0000
Subject: [Python-ideas] csv.DictReader could handle headers more intelligently.
In-Reply-To: 
References: <1358903168.4767.4.camel@webb>
Message-ID: <201301301516.37499.mark.hackett@metoffice.gov.uk>

On Wednesday 30 Jan 2013, Jeff Jenkins wrote:
> I think this may have been lost somewhere in the last 90 messages, but
> adding a warning to DictReader in the docs seems like it solves almost the
> entire problem.

Jeff, it breaks code that works now, because duplicates aren't cared about.

Shane is putting up code for a NEW call that you can use if you're worried about how the current one works, and consideration for this issue is being included in the design of a new library for the next (and therefore allowed to be incompatible) Python library version.

From fuzzyman at gmail.com Wed Jan 30 16:16:49 2013
From: fuzzyman at gmail.com (Michael Foord)
Date: Wed, 30 Jan 2013 15:16:49 +0000
Subject: [Python-ideas] constant/enum type in stdlib
In-Reply-To: <20130129202730.6ea6d0d5@anarchist.wooz.org>
References: <20130129202730.6ea6d0d5@anarchist.wooz.org>
Message-ID: 

On 30 January 2013 01:27, Barry Warsaw wrote:

> On Jan 28, 2013, at 11:50 PM, Joao S. O. Bueno wrote:
>
> >And it was not dismissed at all - to the contrary, the last e-mail in the
> >thread is a message from the BDFL for it to **be**! The discussion happened
> >in a bad moment, as Python was mostly feature-frozen for 3.2, and it did
> >not show up again for Python 3.3.
>
> I still offer up my own enum implementation, which I've used and has been
> available for years on PyPI, and hasn't had a new release in months because
> it hasn't needed one. :) It should be compatible with Pythons from 2.6 to
> 3.3.
>
> http://pypi.python.org/pypi/flufl.enum
>
> The one hang-up about it the last time this came up was that my enum items
> are not ints and Guido thought they should be. I actually tried at one point
> to make that so, but had some troublesome test failures that I didn't have
> time or motivation to fix, mostly because I don't particularly like those
> semantics. I don't remember the details.
>
> However, if someone *else* wanted to submit a branch/patch to have enum
> items inherit from ints, and that was all it took to have these adopted into
> the stdlib, I would be happy to take a look.

Being an int subclass (and possibly optionally a str subclass) is a requirement if any adopted Enum is to be used *within* the standard library in places where integers are currently used as "poor man's enums". I also don't *think* flufl.enum supports flag enums (ones that can be OR'd together), right?

Michael

> Cheers,
> -Barry
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas
>

-- 
http://www.voidspace.org.uk/

May you do good and not evil
May you find forgiveness for yourself and forgive others

May you share freely, never taking more than you give.
-- the sqlite blessing http://www.sqlite.org/different.html
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From stefan_ml at behnel.de Wed Jan 30 16:25:36 2013
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Wed, 30 Jan 2013 16:25:36 +0100
Subject: [Python-ideas] constant/enum type in stdlib
In-Reply-To: 
References: <51085AAB.6090303@canterbury.ac.nz>
Message-ID: 

Eli Bendersky, 30.01.2013 06:26:
> enum color:
>     RED, WHITE, BLUE
>
> Would adding a new "enum" keyword in Python 3.4 *really* meet that much
> resistance? ISTM built-in, standard, enums have been on the wishlist of
> Python developers for a long time.

Special cases aren't special enough to break the rules (or even existing
code!).

Stefan

From fuzzyman at gmail.com Wed Jan 30 16:22:06 2013
From: fuzzyman at gmail.com (Michael Foord)
Date: Wed, 30 Jan 2013 15:22:06 +0000
Subject: [Python-ideas] constant/enum type in stdlib
In-Reply-To: <20130130082639.0b28d7eb@pitrou.net>
References: <51085AAB.6090303@canterbury.ac.nz> <5108A87D.9000207@canterbury.ac.nz>
 <20130130082639.0b28d7eb@pitrou.net>
Message-ID: 

On 30 January 2013 07:26, Antoine Pitrou wrote:

> On Wed, 30 Jan 2013 17:58:37 +1300
> Greg Ewing wrote:
> > Guido van Rossum wrote:
> >
> > > class color(enum):
> > >     RED = value()
> > >     WHITE = value()
> > >     BLUE = value()
> >
> > We could do somewhat better than that:
> >
> > class Color(Enum):
> >     RED, WHITE, BLUE = range(3)

With a Python 3 metaclass that provides default values for *looked up* entries you could have this:

class Color(Enum):
    RED, WHITE, BLUE

The lookup would create the member - with the appropriate value.

Michael
> > It would be even nicer it we could just use an infinite > > iterator, such as > > > > class Color(Enum): > > RED, WHITE, BLUE = values() > > Well, how about: > > class Color(Enum): > values = ('RED', 'WHITE', 'BLUE') > > ? > > (replace values with __values__ if you prefer) > > Regards > > Antoine. > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -- http://www.voidspace.org.uk/ May you do good and not evil May you find forgiveness for yourself and forgive others May you share freely, never taking more than you give. -- the sqlite blessing http://www.sqlite.org/different.html -------------- next part -------------- An HTML attachment was scrubbed... URL: From fuzzyman at gmail.com Wed Jan 30 16:30:48 2013 From: fuzzyman at gmail.com (Michael Foord) Date: Wed, 30 Jan 2013 15:30:48 +0000 Subject: [Python-ideas] constant/enum type in stdlib In-Reply-To: References: <51085AAB.6090303@canterbury.ac.nz> <5108A87D.9000207@canterbury.ac.nz> <20130130082639.0b28d7eb@pitrou.net> Message-ID: On 30 January 2013 15:22, Michael Foord wrote: > > > On 30 January 2013 07:26, Antoine Pitrou wrote: > >> On Wed, 30 Jan 2013 17:58:37 +1300 >> Greg Ewing wrote: >> > Guido van Rossum wrote: >> > >> > > class color(enum): >> > > RED = value() >> > > WHITE = value() >> > > BLUE = value() >> > >> > We could do somewhat better than that: >> > >> > class Color(Enum): >> > RED, WHITE, BLUE = range(3) >> > > > > With a Python 3 metaclass that provides default values for *looked up* > entries you could have this: > > class Color(Enum): > RED, WHITE, BLUE > > The lookup would create the member - with the appropriate value. > > class values(dict): def __init__(self): self.value = 0 def __getitem__(self, key): try: return dict.__getitem__(self, key) except KeyError: value = self[key] = self.value self.value += 1 return value class EnumMeta(type): @classmethod def __prepare__(metacls, name, bases): return values() def __new__(cls, name, bases, classdict): result = type.__new__(cls, name, bases, dict(classdict)) return result class Enum(metaclass=EnumMeta): pass class Color(Enum): RED, WHITE, BLUE > Michael > > > > > >> > >> > However, it's still slightly annoying that you have to >> > specify how many values there are in the range() call. >> > It would be even nicer it we could just use an infinite >> > iterator, such as >> > >> > class Color(Enum): >> > RED, WHITE, BLUE = values() >> >> Well, how about: >> >> class Color(Enum): >> values = ('RED', 'WHITE', 'BLUE') >> >> ? >> >> (replace values with __values__ if you prefer) >> >> Regards >> >> Antoine. >> >> >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> http://mail.python.org/mailman/listinfo/python-ideas >> > > > > -- > > http://www.voidspace.org.uk/ > > May you do good and not evil > May you find forgiveness for yourself and forgive others > > May you share freely, never taking more than you give. > -- the sqlite blessing http://www.sqlite.org/different.html > > -- http://www.voidspace.org.uk/ May you do good and not evil May you find forgiveness for yourself and forgive others May you share freely, never taking more than you give. -- the sqlite blessing http://www.sqlite.org/different.html -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From barry at python.org Wed Jan 30 16:35:48 2013 From: barry at python.org (Barry Warsaw) Date: Wed, 30 Jan 2013 10:35:48 -0500 Subject: [Python-ideas] constant/enum type in stdlib References: <20130129202730.6ea6d0d5@anarchist.wooz.org> Message-ID: <20130130103548.12bce67d@anarchist.wooz.org> On Jan 30, 2013, at 03:16 PM, Michael Foord wrote: >Being an int subclass (and possibly optionally a strs subclass) is a >requirement if any adopted Enum is to be used *within* the standard library >in places where integers are currently used as "poor man's enums". I also >don't *think* flufl.enum supports flag enums (ones that can be OR'd >together), right? Sure, it does because you have to be explicit about the enum int value to assign the item. This doesn't bother me because the syntax is clear, I almost always want an explicit int value anyway, inheritance is supported, and as you comment, flag values are (mostly) easy to support. class Colors(Enum): red = 1 green = 2 blue = 3 class MoreColors(Colors): cyan = 4 magenta = 5 # chartreuse = 2 would be an error class Flags(Enum): beautiful = 1 fast = 2 elegant = 4 wonderful = 8 Now, it's true that because Flags.fast is not an int, it must be explicitly converted to an int, e.g. `int(Flags.fast)`. That doesn't bother me. What does bother me is that Enum doesn't support automatic conversion to int for OR and AND, so you have to do this: >>> int(Flags.fast) | int(Flags.elegant) 6 That should be easy enough to fix by adding the appropriate operators so that you could do: >>> Flags.fast | Flags.elegant 6 Returning an int from such operations is the only sensible interpretation. https://bugs.launchpad.net/flufl.enum/+bug/1110501 As far as autonumbering goes, I think we could support that in Python 3.3+, though I don't have any brilliant ideas on syntax. A couple of suggestions are in this bug: https://bugs.launchpad.net/flufl.enum/+bug/1110507 e.g class Colors(Enum): red = None green = None blue = None or from flufl.enum import Enum, auto class Colors(Enum): red = auto green = auto blue = auto I'm definitely open to suggestions here! Cheers, -Barry -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: not available URL: From guido at python.org Wed Jan 30 16:45:12 2013 From: guido at python.org (Guido van Rossum) Date: Wed, 30 Jan 2013 07:45:12 -0800 Subject: [Python-ideas] libuv based eventloop for tulip experiment In-Reply-To: <5108EE27.1000102@gmail.com> References: <51070056.8020006@gmail.com> <51082C41.2030508@gmail.com> <5108EE27.1000102@gmail.com> Message-ID: On Wed, Jan 30, 2013 at 1:55 AM, Sa?l Ibarra Corretg? wrote: > >> Yeah, so do the other polling things on Windows. (Well, mostly >> sockets. There are some other things supported like named pipes.) >> > > In pyuv there is a pecial handle for those (Pipe) which works on both unix > and windows with the same interface. PEP 3156 should add a new API for adding a pipe (either the read or write end). Someone worked on that for a bit, search last week's python-ideas archives. >> I guess in order to support this we'd need some kind of abstraction >> away from socket objects and file descriptors, at least for event loop >> methods like sock_recv() and add_reader(). But those are mostly meant >> for transports to build upon, so I think that would be fine. >> > > I see, great! The iocp branch now has all these refactorings. >> Hm, I thought certificates were just blobs of data? 
We should probably >> come up with a standard way to represent these that isn't tied to the >> stdlib's ssl module. But I don't think this should be part of PEP 3156 >> -- it's too big already. >> > > Yes, they are blobs, I meant the objects that wrap those blobs and provide > verification functions and such. But that can indeed be left out and have > implementation deal with it, having tulip just hand over the blobs. Do you know how to write code like that? It would be illustrative to take the curl.py and crawl.py examples and adjust them so that if the protocol is https, the server's authenticity is checked and reported. I've never dealt with this myself so I would probably do it wrong... :-( -- --Guido van Rossum (python.org/~guido) From barry at python.org Wed Jan 30 16:46:23 2013 From: barry at python.org (Barry Warsaw) Date: Wed, 30 Jan 2013 10:46:23 -0500 Subject: [Python-ideas] Standard library high level support for email messages References: Message-ID: <20130130104623.4fb79da2@anarchist.wooz.org> On Jan 30, 2013, at 02:22 PM, Daniel Reis wrote: >The hardest part is already built in. Why not go that one step further and >add to the email standard library an wrapper to handle common email >composition without exposing the MIME details. Please discuss this on the email-sig mailing list. Cheers, -Barry -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: not available URL: From eliben at gmail.com Wed Jan 30 17:17:10 2013 From: eliben at gmail.com (Eli Bendersky) Date: Wed, 30 Jan 2013 08:17:10 -0800 Subject: [Python-ideas] constant/enum type in stdlib In-Reply-To: <20130130103548.12bce67d@anarchist.wooz.org> References: <20130129202730.6ea6d0d5@anarchist.wooz.org> <20130130103548.12bce67d@anarchist.wooz.org> Message-ID: On Wed, Jan 30, 2013 at 7:35 AM, Barry Warsaw wrote: > On Jan 30, 2013, at 03:16 PM, Michael Foord wrote: > > >Being an int subclass (and possibly optionally a strs subclass) is a > >requirement if any adopted Enum is to be used *within* the standard > library > >in places where integers are currently used as "poor man's enums". I also > >don't *think* flufl.enum supports flag enums (ones that can be OR'd > >together), right? > > Sure, it does because you have to be explicit about the enum int value to > assign the item. This doesn't bother me because the syntax is clear, I > almost > always want an explicit int value anyway, inheritance is supported, and as > you > comment, flag values are (mostly) easy to support. > > class Colors(Enum): > red = 1 > green = 2 > blue = 3 > > class MoreColors(Colors): > cyan = 4 > magenta = 5 > # chartreuse = 2 would be an error > > class Flags(Enum): > beautiful = 1 > fast = 2 > elegant = 4 > wonderful = 8 > > > Now, it's true that because Flags.fast is not an int, it must be explicitly > converted to an int, e.g. `int(Flags.fast)`. That doesn't bother me. > > What does bother me is that Enum doesn't support automatic conversion to > int > for OR and AND, so you have to do this: > > >>> int(Flags.fast) | int(Flags.elegant) > 6 > > That should be easy enough to fix by adding the appropriate operators so > that > you could do: > > >>> Flags.fast | Flags.elegant > 6 > > Returning an int from such operations is the only sensible interpretation. > > https://bugs.launchpad.net/flufl.enum/+bug/1110501 > > As far as autonumbering goes, I think we could support that in Python 3.3+, > though I don't have any brilliant ideas on syntax. 
A couple of suggestions > are in this bug: > > https://bugs.launchpad.net/flufl.enum/+bug/1110507 > > e.g > > class Colors(Enum): > red = None > green = None > blue = None > > or > > from flufl.enum import Enum, auto > class Colors(Enum): > red = auto > green = auto > blue = auto > > I'm definitely open to suggestions here! > Barry, since you've obviously given this issue a lot of thought, maybe you could summarize it in a PEP so we have a clear way of moving forward for 3.4 ? Eli -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.f.moore at gmail.com Wed Jan 30 17:21:41 2013 From: p.f.moore at gmail.com (Paul Moore) Date: Wed, 30 Jan 2013 16:21:41 +0000 Subject: [Python-ideas] libuv based eventloop for tulip experiment In-Reply-To: References: <51070056.8020006@gmail.com> <51082C41.2030508@gmail.com> <5108EE27.1000102@gmail.com> Message-ID: On 30 January 2013 15:45, Guido van Rossum wrote: >> In pyuv there is a pecial handle for those (Pipe) which works on both unix >> and windows with the same interface. > > PEP 3156 should add a new API for adding a pipe (either the read or > write end). Someone worked on that for a bit, search last week's > python-ideas archives. That was me. There's a patched version of tulip with pipe connector methods and a subprocess transport using them in my bitbucket repository: https://bitbucket.org/pmoore/tulip Paul From solipsis at pitrou.net Wed Jan 30 17:23:10 2013 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 30 Jan 2013 17:23:10 +0100 Subject: [Python-ideas] constant/enum type in stdlib References: <20130129202730.6ea6d0d5@anarchist.wooz.org> Message-ID: <20130130172310.60b49ef4@pitrou.net> Le Wed, 30 Jan 2013 15:16:49 +0000, Michael Foord a ?crit : > > Being an int subclass (and possibly optionally a strs subclass) is a > requirement if any adopted Enum is to be used *within* the standard > library in places where integers are currently used as "poor man's > enums". I also don't *think* flufl.enum supports flag enums (ones > that can be OR'd together), right? If a flexible solution is desired (with either int or str subclassing, various numbering schemes), may I suggest another kind of syntax: class ErrorFlag(Enum): type = 'symbolic' names = ('strict', 'ignore', 'replace') class SeekFlag(Enum): type = 'sequential' names = ('SET', 'CUR', 'END') class TypeFlag(Enum): type = 'bitmask' names = ('HEAPTYPE', 'HAS_GC', 'INT_SUBCLASS') >>> ErrorFlag.ignore ErrorFlag.ignore >>> ErrorFlag.ignore == 'ignore' True >>> ErrorFlag('ignore') ErrorFlag.ignore >>> isinstance(ErrorFlag.ignore, str) True >>> isinstance(ErrorFlag.ignore, int) False >>> ErrorFlag(0) [...] ValueError: invalid value for : 0 >>> SeekFlag('SET') SeekFlag.SET >>> SeekFlag('SET') + 0 0 >>> SeekFlag(0) SeekFlag.SET >>> isinstance(SeekFlag.CUR, int) True >>> isinstance(SeekFlag.CUR, str) False >>> TypeFlag(1) TypeFlag.HEAPTYPE >>> TypeFlag(2) TypeFlag.HAS_GC >>> TypeFlag.HAS_GC | TypeFlag.INT_SUBCLASS 6 Regards Antoine. 
From barry at python.org Wed Jan 30 17:27:07 2013 From: barry at python.org (Barry Warsaw) Date: Wed, 30 Jan 2013 11:27:07 -0500 Subject: [Python-ideas] constant/enum type in stdlib In-Reply-To: References: <20130129202730.6ea6d0d5@anarchist.wooz.org> <20130130103548.12bce67d@anarchist.wooz.org> Message-ID: <20130130112707.5cf60dfc@anarchist.wooz.org> On Jan 30, 2013, at 08:17 AM, Eli Bendersky wrote: >Barry, since you've obviously given this issue a lot of thought, maybe you >could summarize it in a PEP so we have a clear way of moving forward for >3.4 ? I'm happy to do so if there's a realistic chance of it being accepted. We already have one rejected enum PEP (354) and we probably don't need two. ;) Cheers, -Barry -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: not available URL: From solipsis at pitrou.net Wed Jan 30 17:26:27 2013 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 30 Jan 2013 17:26:27 +0100 Subject: [Python-ideas] constant/enum type in stdlib References: <51085AAB.6090303@canterbury.ac.nz> <5108A87D.9000207@canterbury.ac.nz> <20130130082639.0b28d7eb@pitrou.net> Message-ID: <20130130172627.32f64e71@pitrou.net> Le Wed, 30 Jan 2013 15:22:06 +0000, Michael Foord a ?crit : > On 30 January 2013 07:26, Antoine Pitrou > wrote: > > > On Wed, 30 Jan 2013 17:58:37 +1300 > > Greg Ewing > > wrote: > > > Guido van Rossum wrote: > > > > > > > class color(enum): > > > > RED = value() > > > > WHITE = value() > > > > BLUE = value() > > > > > > We could do somewhat better than that: > > > > > > class Color(Enum): > > > RED, WHITE, BLUE = range(3) > > > > > > With a Python 3 metaclass that provides default values for *looked up* > entries you could have this: > > class Color(Enum): > RED, WHITE, BLUE This relies on tuple evaluation order, and would also evaluate any other symbol looked up from inside the class body (which means I cannot add anything else than enum symbols to the class). In other words, I'm afraid it would be somewhat fragile ;) Regards Antoine. From eliben at gmail.com Wed Jan 30 17:33:35 2013 From: eliben at gmail.com (Eli Bendersky) Date: Wed, 30 Jan 2013 08:33:35 -0800 Subject: [Python-ideas] constant/enum type in stdlib In-Reply-To: <20130130112707.5cf60dfc@anarchist.wooz.org> References: <20130129202730.6ea6d0d5@anarchist.wooz.org> <20130130103548.12bce67d@anarchist.wooz.org> <20130130112707.5cf60dfc@anarchist.wooz.org> Message-ID: On Wed, Jan 30, 2013 at 8:27 AM, Barry Warsaw wrote: > On Jan 30, 2013, at 08:17 AM, Eli Bendersky wrote: > > >Barry, since you've obviously given this issue a lot of thought, maybe you > >could summarize it in a PEP so we have a clear way of moving forward for > >3.4 ? > > I'm happy to do so if there's a realistic chance of it being accepted. We > already have one rejected enum PEP (354) and we probably don't need two. ;) > > Reading this thread it seems that many core devs are interested in the feature and the discussion is mainly deciding on the exact semantics and implementation. Even Guido didn't really speak against it (only somewhat against adding new syntax). Eli -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From fuzzyman at gmail.com Wed Jan 30 17:35:25 2013 From: fuzzyman at gmail.com (Michael Foord) Date: Wed, 30 Jan 2013 16:35:25 +0000 Subject: [Python-ideas] constant/enum type in stdlib In-Reply-To: <20130130172627.32f64e71@pitrou.net> References: <51085AAB.6090303@canterbury.ac.nz> <5108A87D.9000207@canterbury.ac.nz> <20130130082639.0b28d7eb@pitrou.net> <20130130172627.32f64e71@pitrou.net> Message-ID: On 30 January 2013 16:26, Antoine Pitrou wrote: > Le Wed, 30 Jan 2013 15:22:06 +0000, > Michael Foord a > ?crit : > > On 30 January 2013 07:26, Antoine Pitrou > > wrote: > > > > > On Wed, 30 Jan 2013 17:58:37 +1300 > > > Greg Ewing > > > wrote: > > > > Guido van Rossum wrote: > > > > > > > > > class color(enum): > > > > > RED = value() > > > > > WHITE = value() > > > > > BLUE = value() > > > > > > > > We could do somewhat better than that: > > > > > > > > class Color(Enum): > > > > RED, WHITE, BLUE = range(3) > > > > > > > > > > > With a Python 3 metaclass that provides default values for *looked up* > > entries you could have this: > > > > class Color(Enum): > > RED, WHITE, BLUE > > This relies on tuple evaluation order, It does if you do them as a tuple. > and would also evaluate any > other symbol looked up from inside the class body Only if they aren't actually defined. > (which means I > cannot add anything else than enum symbols to the class). > > So not true - it is only *undefined* symbols that are added as enum values. > In other words, I'm afraid it would be somewhat fragile ;) > Well, within specific parameters... Michael > > Regards > > Antoine. > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas > -- http://www.voidspace.org.uk/ May you do good and not evil May you find forgiveness for yourself and forgive others May you share freely, never taking more than you give. -- the sqlite blessing http://www.sqlite.org/different.html -------------- next part -------------- An HTML attachment was scrubbed... URL: From larry at hastings.org Wed Jan 30 17:42:53 2013 From: larry at hastings.org (Larry Hastings) Date: Wed, 30 Jan 2013 08:42:53 -0800 Subject: [Python-ideas] Extend module objects to support properties In-Reply-To: References: <51087225.3040801@hastings.org> Message-ID: <51094D8D.606@hastings.org> On 01/30/2013 01:54 AM, Nick Coghlan wrote: > On Wed, Jan 30, 2013 at 11:06 AM, Larry Hastings wrote: >> Properties are a wonderful facility. But they only work on conventional >> objects. Specifically, they *don't* work on module objects. It would be >> nice to extend module objects so properties worked there too. > As MAL notes, the issues with such an approach are: > > - code executed at module scope > - code in inner scopes that uses "global" > - code that uses globals() > - code that directly modifies a module's __dict__ > > There is too much code that expects to be able to modify a module's > namespace directly without going through the attribute access > machinery. Of those four issues, the latter two are wontfix. Code that futzes with an object's __dict__ bypasses the property machinery but this is already viewed as acceptable. Obviously the point of the proposal is to change the behavior of the first two. Whether this is manageable additional complexity, or fast enough, remains to be seen--which is why this is in ideas not dev. Also, I'm not sure there are any existing globals that we'd want to convert into properties. 
Assuming this is only used for new globals, this change hopefully wouldn't break existing code. (Fingers crossed.) //arry/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From storchaka at gmail.com Wed Jan 30 17:56:56 2013 From: storchaka at gmail.com (Serhiy Storchaka) Date: Wed, 30 Jan 2013 18:56:56 +0200 Subject: [Python-ideas] constant/enum type in stdlib In-Reply-To: <20130130172310.60b49ef4@pitrou.net> References: <20130129202730.6ea6d0d5@anarchist.wooz.org> <20130130172310.60b49ef4@pitrou.net> Message-ID: On 30.01.13 18:23, Antoine Pitrou wrote: >>>> TypeFlag.HAS_GC | TypeFlag.INT_SUBCLASS > 6 I prefer something like >>> TypeFlag.HAS_GC | TypeFlag.INT_SUBCLASS TypeFlag.HAS_GC|INT_SUBCLASS From fuzzyman at gmail.com Wed Jan 30 18:08:36 2013 From: fuzzyman at gmail.com (Michael Foord) Date: Wed, 30 Jan 2013 17:08:36 +0000 Subject: [Python-ideas] constant/enum type in stdlib In-Reply-To: References: <20130129202730.6ea6d0d5@anarchist.wooz.org> <20130130172310.60b49ef4@pitrou.net> Message-ID: On 30 January 2013 16:56, Serhiy Storchaka wrote: > On 30.01.13 18:23, Antoine Pitrou wrote: > >> TypeFlag.HAS_GC | TypeFlag.INT_SUBCLASS >>>>> >>>> 6 >> > > I prefer something like > > >>> TypeFlag.HAS_GC | TypeFlag.INT_SUBCLASS > TypeFlag.HAS_GC|INT_SUBCLASS > > Indeed - the whole benefit (pretty much) of using an Enum class is that you're no longer dealing with raw ints. Michael > > > ______________________________**_________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/**mailman/listinfo/python-ideas > -- http://www.voidspace.org.uk/ May you do good and not evil May you find forgiveness for yourself and forgive others May you share freely, never taking more than you give. -- the sqlite blessing http://www.sqlite.org/different.html -------------- next part -------------- An HTML attachment was scrubbed... URL: From ethan at stoneleaf.us Wed Jan 30 18:19:36 2013 From: ethan at stoneleaf.us (Ethan Furman) Date: Wed, 30 Jan 2013 09:19:36 -0800 Subject: [Python-ideas] constant/enum type in stdlib In-Reply-To: <20130130172627.32f64e71@pitrou.net> References: <51085AAB.6090303@canterbury.ac.nz> <5108A87D.9000207@canterbury.ac.nz> <20130130082639.0b28d7eb@pitrou.net> <20130130172627.32f64e71@pitrou.net> Message-ID: <51095628.1080406@stoneleaf.us> On 01/30/2013 08:26 AM, Antoine Pitrou wrote: > Le Wed, Michael Foord a ?crit : >> With a Python 3 metaclass that provides default values for *looked up* >> entries you could have this: >> >> class Color(Enum): >> RED, WHITE, BLUE > > This relies on tuple evaluation order, and would also evaluate any > other symbol looked up from inside the class body (which means I > cannot add anything else than enum symbols to the class). Probably a dumb question, but why would you want to add non-enum to an enum class? 
~Ethan~

From ethan at stoneleaf.us Wed Jan 30 18:28:35 2013
From: ethan at stoneleaf.us (Ethan Furman)
Date: Wed, 30 Jan 2013 09:28:35 -0800
Subject: [Python-ideas] constant/enum type in stdlib
In-Reply-To: <20130130172310.60b49ef4@pitrou.net>
References: <20130129202730.6ea6d0d5@anarchist.wooz.org>
 <20130130172310.60b49ef4@pitrou.net>
Message-ID: <51095843.3070503@stoneleaf.us>

On 01/30/2013 08:23 AM, Antoine Pitrou wrote:
> If a flexible solution is desired (with either int or str subclassing,
> various numbering schemes), may I suggest another kind of syntax:
>
> class ErrorFlag(Enum):
>     type = 'symbolic'
>     names = ('strict', 'ignore', 'replace')
>
> class SeekFlag(Enum):
>     type = 'sequential'
>     names = ('SET', 'CUR', 'END')
>
> class TypeFlag(Enum):
>     type = 'bitmask'
>     names = ('HEAPTYPE', 'HAS_GC', 'INT_SUBCLASS')

This I like.

> [interactive transcript quoted - snipped]
>
>>>> TypeFlag.HAS_GC | TypeFlag.INT_SUBCLASS
> 6

This should be `TypeFlag.HAS_GC|INT_SUBCLASS`

+1

~Ethan~

From bruce at leapyear.org Wed Jan 30 18:38:10 2013
From: bruce at leapyear.org (Bruce Leban)
Date: Wed, 30 Jan 2013 09:38:10 -0800
Subject: [Python-ideas] constant/enum type in stdlib
In-Reply-To: <51095628.1080406@stoneleaf.us>
References: <51085AAB.6090303@canterbury.ac.nz> <5108A87D.9000207@canterbury.ac.nz>
 <20130130082639.0b28d7eb@pitrou.net> <20130130172627.32f64e71@pitrou.net>
 <51095628.1080406@stoneleaf.us>
Message-ID: 

On Wed, Jan 30, 2013 at 9:19 AM, Ethan Furman wrote:

> Probably a dumb question, but why would you want to add non-enum members
> to an enum class?

class Color(Enum):
    RED, WHITE, BLUE

    def translate(language):
        """Get the name of an enum in the specified language."""
        pass

--- Bruce
Follow me: http://www.twitter.com/Vroo http://www.vroospeak.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From yorik.sar at gmail.com Wed Jan 30 18:56:51 2013
From: yorik.sar at gmail.com (Yuriy Taraday)
Date: Wed, 30 Jan 2013 21:56:51 +0400
Subject: [Python-ideas] while conditional in list comprehension ??
In-Reply-To: 
References: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de>
 <51072650.5090808@pearwood.info> <97262C84-D345-44FC-9A10-BD3D07023D6F@umbrellacode.com>
 <51086A96.9020300@pearwood.info>
Message-ID: 

On Wed, Jan 30, 2013 at 1:46 PM, Wolfgang Maier
<wolfgang.maier at biologie.uni-freiburg.de> wrote:

> your condition is 'partial(lt,50)', but this is not met to begin with and
> results in an empty list at least for me. Have you two actually checked
> the output of the code or have you just timed it?

Yeah. Shame on me. You're right. My belief in partial and the operator
module has been shaken.

-- 
Kind regards, Yuriy.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From oscar.j.benjamin at gmail.com Wed Jan 30 19:05:44 2013
From: oscar.j.benjamin at gmail.com (Oscar Benjamin)
Date: Wed, 30 Jan 2013 18:05:44 +0000
Subject: [Python-ideas] while conditional in list comprehension ??
In-Reply-To:
References: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de> <51072650.5090808@pearwood.info> <97262C84-D345-44FC-9A10-BD3D07023D6F@umbrellacode.com> <51086A96.9020300@pearwood.info>
Message-ID:

On 30 January 2013 17:56, Yuriy Taraday wrote:
>
> On Wed, Jan 30, 2013 at 1:46 PM, Wolfgang Maier
> wrote:
>>
>> your condition is 'partial(lt,50)', but this is not met to begin with and
>> results in an empty list at least for me. Have you two actually checked the
>> output of the code or have you just timed it?
>
> Yeah. Shame on me. You're right. My belief in partial and operator module
> has been shaken.

This is why I prefer this stop() idea to any of the takewhile() versions: regardless of performance it leads to clearer code, that can be understood more easily.

Oscar

From shane at umbrellacode.com Wed Jan 30 20:02:51 2013
From: shane at umbrellacode.com (Shane Green)
Date: Wed, 30 Jan 2013 11:02:51 -0800
Subject: [Python-ideas] while conditional in list comprehension ??
In-Reply-To: References: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de> <51072650.5090808@pearwood.info> <97262C84-D345-44FC-9A10-BD3D07023D6F@umbrellacode.com> <51086A96.9020300@pearwood.info>
Message-ID: <0D930FE7-D150-4DA5-90AB-F3EDAFB00E63@umbrellacode.com>

Although it's a bit of a cheat, if you create a wrapper of the thing you're iterating, or don't mind closing it (it's probably best to wrap it unless you know what it is), both generators and list comprehensions can be "while iterated" using this approach:

    [item for item in items if condition or items.close()]

When I tested it earlier with 1000 entries 5 times and had forgotten the parens on close(), it made it really obvious there would be times when the wrapping overhead wasn't a problem:

On Jan 30, 2013, at 9:02 AM, Shane Green wrote:

> Nice catch. New times,
>
> >>> timeit.timeit(var1)
> 8.533167123794556
> >>> timeit.timeit(var2)
> 9.067211151123047
> >>> timeit.timeit(var3)
> 12.966150999069214
> >>> timeit.timeit(var4)
>
> And I accidentally ran this (without parens), so it was a regular comprehension:
> def var5(count=1000):
>     seq = (i for i in xrange(count))
>     return [i for i in seq if i < 50 or seq.close]
>
> >>> timeit.timeit(var5)
> 212.26763486862183
>
> Then fixed it:
> >>> timeit.timeit(var5)
> 10.280441045761108
> >>>
>
> Shane Green
> 805-452-9666 | shane.green at me.com
>
> Begin forwarded message:
>
>> From: Wolfgang Maier
>> Subject: RE: [Python-ideas] while conditional in list comprehension ??
>> Date: January 30, 2013 8:40:51 AM PST
>> To: 'Shane Green'
>>
>> Careful! You're using range() in the slow ones, but xrange() in the fast ones.
>> With the input seq being much longer than the output, differences in the time
>> it takes to produce the range object may be important.
>>
>> From: Shane Green [mailto:shane.green at me.com]
>> Sent: Wednesday, January 30, 2013 5:37 PM
>> To: Wolfgang Maier
>> Subject: Re: [Python-ideas] while conditional in list comprehension ??
>>
>> >>> def var1(count=1000):
>> ...     def _gen():
>> ...         for i in range(count):
>> ...             if i > 50: break
>> ...             yield i
>> ...     return list(_gen())
>> ...
>> >>> def var2(count=1000):
>> ...     def stop():
>> ...         raise StopIteration
>> ...     return list(i for i in range(count) if i <= 50 or stop())
>> ...
>> >>> def var3(count=1000):
>> ...     return [i for i in itertools.takewhile(lambda n: n <= 50, range(count))]
>> ...
>> >>> def var4(count=1000):
>> ...     return [i for i in itertools.takewhile(functools.partial(operator.lt, 50)
>> ...
>> >>> def var5(count=1000):
>> ...     seq = (i for i in xrange(count))
>> ...     return [i for i in seq if i < 50 or seq.close()]
>>
>> >>> timeit.timeit(var1)
>> 19.118155002593994
>> >>> timeit.timeit(var2)
>> 19.217869997024536
>>
>> >>> timeit.timeit(var5)
>> 10.251838207244873
>> >>>
>>
>> Shane Green
>> 805-452-9666 | shane.green at me.com
>>
>> On Jan 30, 2013, at 8:17 AM, Wolfgang Maier wrote:
>>
>> list(i for i in a if i < 5000 or a.close())

Shane Green
www.umbrellacode.com
408-692-4666 | shane at umbrellacode.com

On Jan 30, 2013, at 10:05 AM, Oscar Benjamin wrote:

> On 30 January 2013 17:56, Yuriy Taraday wrote:
>>
>> On Wed, Jan 30, 2013 at 1:46 PM, Wolfgang Maier
>> wrote:
>>>
>>> your condition is 'partial(lt,50)', but this is not met to begin with and
>>> results in an empty list at least for me. Have you two actually checked
>>> the
>>> output of the code or have you just timed it?
>>
>> Yeah. Shame on me. You're right. My belief in partial and operator module
>> has been shaken.
>
> This is why I prefer this stop() idea to any of the takewhile()
> versions: regardless of performance it leads to clearer code, that can
> be understood more easily.
>
> Oscar
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From g.rodola at gmail.com Wed Jan 30 21:13:58 2013
From: g.rodola at gmail.com (=?ISO-8859-1?Q?Giampaolo_Rodol=E0?=)
Date: Wed, 30 Jan 2013 21:13:58 +0100
Subject: [Python-ideas] constant/enum type in stdlib
In-Reply-To: References: <20130129202730.6ea6d0d5@anarchist.wooz.org> <20130130103548.12bce67d@anarchist.wooz.org> <20130130112707.5cf60dfc@anarchist.wooz.org>
Message-ID:

2013/1/30 Eli Bendersky :
>
> On Wed, Jan 30, 2013 at 8:27 AM, Barry Warsaw wrote:
>>
>> On Jan 30, 2013, at 08:17 AM, Eli Bendersky wrote:
>>
>> >Barry, since you've obviously given this issue a lot of thought, maybe you
>> >could summarize it in a PEP so we have a clear way of moving forward for
>> >3.4 ?
>>
>> I'm happy to do so if there's a realistic chance of it being accepted. We
>> already have one rejected enum PEP (354) and we probably don't need two. ;)
>
> Reading this thread it seems that many core devs are interested in the
> feature and the discussion is mainly deciding on the exact semantics and
> implementation. Even Guido didn't really speak against it (only somewhat
> against adding new syntax).
>
> Eli

Personally I'm -1 for a variety of reasons.

1) a const/enum type looks like something which is subject to personal taste to me. I personally don't like, for example, how flufl requires you to define constants by using a class.
It's just a matter of taste, but to me module.FOO looks more "right" than module.Bar.FOO.
Also "Colors.red < Colors.blue" raising an exception is something subject to personal taste.

2) introducing something like that (class-based) wouldn't help migrating the existent module-level constants we have in the stdlib.
Only new projects or new stdlib modules would benefit from it.

3) other than being subject to personal taste, a const/enum type is also pretty easy to implement.
For example, I came up with this:
http://code.google.com/p/psutil/source/browse/trunk/psutil/_common.py?spec=svn1562&r=1524#33
...which is sufficient for my needs.
Users having different needs can do a similar thing pretty easily.
4) I'm getting the impression that the language is growing too big. To me, this looks like yet another thing that infrequent users have to learn before being able to read and understand Python code.
Also consider that people lived without const/enum for 2 decades now.

--- Giampaolo
http://code.google.com/p/pyftpdlib/
http://code.google.com/p/psutil/
http://code.google.com/p/pysendfile/

From steve at pearwood.info Wed Jan 30 21:52:15 2013
From: steve at pearwood.info (Steven D'Aprano)
Date: Thu, 31 Jan 2013 07:52:15 +1100
Subject: [Python-ideas] while conditional in list comprehension ??
In-Reply-To: References: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de> <51072650.5090808@pearwood.info> <97262C84-D345-44FC-9A10-BD3D07023D6F@umbrellacode.com> <51086A96.9020300@pearwood.info>
Message-ID: <510987FF.9010808@pearwood.info>

On 30/01/13 20:46, Wolfgang Maier wrote:
> b) I have to say I was very impressed by the speed gains you report through the
> use of 'partial', which I had not thought of at all, I have to admit.
> However, I tested your suggestions and I think they both suffer from the same
> mistake:
> your condition is 'partial(lt,50)', but this is not met to begin with and
> results in an empty list at least for me. Have you two actually checked the
> output of the code or have you just timed it? I found that in order to make it
> work the comparison has to be made via 'partial(gt,50)'.

Yes, you are absolutely correct. I screwed that up badly. I can only take comfort that apparently so did Yuriy. I don't often paste code in public without testing it, but when I do, it invariably turns out to be wrong.

> With this modification
> the resulting list in your example would be [0,..,49] as it should be.
>
> And now the big surprise in terms of runtimes:
> partial(lt,50) variant: 1.17 (but incorrect results)
> partial(gt,50) variant: 13.95
> if cond or stop() variant: 9.86

I do not get such large differences. I get these:

py> min(t1.repeat(number=100000, repeat=5))  # cond or stop()
1.2582030296325684
py> min(t2.repeat(number=100000, repeat=5))  # takewhile and lambda
1.9907748699188232
py> min(t3.repeat(number=100000, repeat=5))  # takewhile and partial
1.8741891384124756

with the timers t1, t2, t3 as per my previous email.

> I guess python is just smart enough to recognize that it compares against a
> constant value all the time, and optimizes the code accordingly (after all the
> if clause is a pretty standard thing to use in a comprehension).

No, it is much simpler than that. partial(lt, 50) is equivalent to:

    lambda x: lt(50, x)

which is equivalent to 50 < x, *not* x < 50 like I expected. So the function tests 50 < 0 on the first iteration, which is False, and takewhile immediately returns, giving you an empty list.

I was surprised that partial was *so much faster* than a regular function. But it showed me what I expected/wanted to see, and so I didn't question it. A lesson for us all.

> So the reason for your reported speed-gain is that you actually broke out of the
> comprehension at the very first element instead of going through the first 50!

Correct.
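To make the argument binding explicit, here is a quick snippet (an added illustration, not from the original exchange) that can be pasted into any interpreter:

    from functools import partial
    from operator import lt, gt

    # partial() binds arguments from the left, so partial(lt, 50)(x)
    # calls lt(50, x), i.e. "50 < x" -- not the "x < 50" that a
    # takewhile predicate needs.
    is_gt_50 = partial(lt, 50)           # True when x is greater than 50
    print(is_gt_50(0), is_gt_50(100))    # False True

    is_lt_50 = partial(gt, 50)           # gt(50, x) is "50 > x", i.e. "x < 50"
    print(is_lt_50(0), is_lt_50(100))    # True False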
-- Steven

From eliben at gmail.com Wed Jan 30 21:59:27 2013
From: eliben at gmail.com (Eli Bendersky)
Date: Wed, 30 Jan 2013 12:59:27 -0800
Subject: [Python-ideas] constant/enum type in stdlib
In-Reply-To: References: <20130129202730.6ea6d0d5@anarchist.wooz.org> <20130130103548.12bce67d@anarchist.wooz.org> <20130130112707.5cf60dfc@anarchist.wooz.org>
Message-ID:

> > Reading this thread it seems that many core devs are interested in the
> > feature and the discussion is mainly deciding on the exact semantics and
> > implementation. Even Guido didn't really speak against it (only somewhat
> > against adding new syntax).
> >
> > Eli
>
> Personally I'm -1 for a variety of reasons.
>
> 1) a const/enum type looks like something which is subject to personal
> taste to me. I personally don't like, for example, how flufl requires
> you to define constants by using a class.
> It's just a matter of taste, but to me module.FOO looks more "right"
> than module.Bar.FOO.
> Also "Colors.red < Colors.blue" raising an exception is something
> subject to personal taste.
>
> 2) introducing something like that (class-based) wouldn't help
> migrating the existent module-level constants we have in the stdlib.
> Only new projects or new stdlib modules would benefit from it.

These are more in the domain of implementation details, though, not criticizing the concept?

> 3) other than being subject to personal taste, a const/enum type is
> also pretty easy to implement.
> For example, I came up with this:
> http://code.google.com/p/psutil/source/browse/trunk/psutil/_common.py?spec=svn1562&r=1524#33
> ...which is sufficient for my needs.
> Users having different needs can do a similar thing pretty easily.

It is precisely *because* every library defines its own way to create enums that IMHO we should have them in the language (or in the standard library, at the least).

> 4) I'm getting the impression that the language is growing too big. To
> me, this looks like yet another thing that infrequent users have to
> learn before being able to read and understand Python code.
> Also consider that people lived without const/enum for 2 decades now.

I respectfully disagree. Most folks seem to favor a library solution (i.e. no new syntax, just a new metaclass+class to use). The stdlib has tools for very obscure things. In comparison, enum is something almost every non-trivial program needs to use at some stage or another.

Eli
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From tjreedy at udel.edu Wed Jan 30 22:09:31 2013
From: tjreedy at udel.edu (Terry Reedy)
Date: Wed, 30 Jan 2013 16:09:31 -0500
Subject: [Python-ideas] constant/enum type in stdlib
In-Reply-To: <20130130082639.0b28d7eb@pitrou.net>
References: <51085AAB.6090303@canterbury.ac.nz> <5108A87D.9000207@canterbury.ac.nz> <20130130082639.0b28d7eb@pitrou.net>
Message-ID:

On 1/30/2013 2:26 AM, Antoine Pitrou wrote:
> On Wed, 30 Jan 2013 17:58:37 +1300
> Greg Ewing wrote:
>> Guido van Rossum wrote:
>>
>>> class color(enum):
>>>     RED = value()
>>>     WHITE = value()
>>>     BLUE = value()
>>
>> We could do somewhat better than that:
>>
>> class Color(Enum):
>>     RED, WHITE, BLUE = range(3)
>>
>> However, it's still slightly annoying that you have to
>> specify how many values there are in the range() call.

For small enumerations, not much of a problem. Or, if one does not want to take the time to count, allow

    RED, WHITE, BLUE, *_extras = range(12)  # any number >= n

and have a metaclass delete _extras.
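A minimal sketch of that _extras idea (an added illustration, untested in the original message; note the star, which extended unpacking needs in order to absorb the unused values):

    # The metaclass drops the catch-all name so only the real members
    # remain on the class.
    class EnumMeta(type):
        def __new__(mcls, clsname, bases, namespace):
            namespace = dict(namespace)
            namespace.pop('_extras', None)  # discard the padding values
            return super().__new__(mcls, clsname, bases, namespace)

    class Enum(metaclass=EnumMeta):
        pass

    class Color(Enum):
        RED, WHITE, BLUE, *_extras = range(12)  # any number >= the member count

    print(Color.RED, Color.WHITE, Color.BLUE)  # 0 1 2
    print(hasattr(Color, '_extras'))           # False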
> Well, how about:
>
> class Color(Enum):
>     values = ('RED', 'WHITE', 'BLUE')
> ?
> (replace values with __values__ if you prefer)

I had the same idea, and having never written a metaclass that I can remember, decided to try it.

class EnumMeta(type):
    def __new__(cls, name, bases, dic):
        # use 'key' as the loop variable so the class's own 'name'
        # parameter is not clobbered by the last member name
        for i, key in enumerate(dic['_values']):
            dic[key] = i
        del dic['_values']
        return type.__new__(cls, name, bases, dic)

class Enum(metaclass=EnumMeta):
    _values = ()

class Color(Enum):
    _values = 'RED', 'GREEN', 'BLUE'

print(Color.RED, Color.GREEN, Color.BLUE)
>>> 0 1 2

So this syntax is at least feasible -- today.

-- Terry Jan Reedy

From tjreedy at udel.edu Wed Jan 30 22:32:12 2013
From: tjreedy at udel.edu (Terry Reedy)
Date: Wed, 30 Jan 2013 16:32:12 -0500
Subject: [Python-ideas] constant/enum type in stdlib
In-Reply-To: References: <51085AAB.6090303@canterbury.ac.nz> <5108A87D.9000207@canterbury.ac.nz> <20130130082639.0b28d7eb@pitrou.net>
Message-ID:

On 1/30/2013 10:30 AM, Michael Foord wrote:
> On 30 January 2013 15:22, Michael Foord
>
>     With a Python 3 metaclass that provides default values for *looked
>     up* entries you could have this:
>
>     class Color(Enum):
>         RED, WHITE, BLUE
>
>     The lookup would create the member - with the appropriate value.
>
>     class values(dict):
>         def __init__(self):
>             self.value = 0
>         def __getitem__(self, key):

Adding 'print(self.value, key)' here prints

0 __name__
0 __name__
1 RED
2 WHITE
3 BLUE

(I do not understand why it is the second and not first lookup of __name__ that increments the counter, but...)

>             try:
>                 return dict.__getitem__(self, key)
>             except KeyError:
>                 value = self[key] = self.value
>                 self.value += 1
>                 return value
>
>     class EnumMeta(type):
>
>         @classmethod
>         def __prepare__(metacls, name, bases):
>             return values()
>
>         def __new__(cls, name, bases, classdict):
>             result = type.__new__(cls, name, bases, dict(classdict))
>             return result
>
>     class Enum(metaclass=EnumMeta):
>         pass
>
>     class Color(Enum):
>         RED, WHITE, BLUE

So RED, WHITE, BLUE are 1, 2, 3; not 0, 1, 2 as I and many readers might expect. That aside (which can be fixed), this is very nice.

-- Terry Jan Reedy

From g.rodola at gmail.com Wed Jan 30 22:52:53 2013
From: g.rodola at gmail.com (=?ISO-8859-1?Q?Giampaolo_Rodol=E0?=)
Date: Wed, 30 Jan 2013 22:52:53 +0100
Subject: [Python-ideas] constant/enum type in stdlib
In-Reply-To: References: <20130129202730.6ea6d0d5@anarchist.wooz.org> <20130130103548.12bce67d@anarchist.wooz.org> <20130130112707.5cf60dfc@anarchist.wooz.org>
Message-ID:

2013/1/30 Eli Bendersky :
>> > Reading this thread it seems that many core devs are interested in the
>> > feature and the discussion is mainly deciding on the exact semantics and
>> > implementation. Even Guido didn't really speak against it (only somewhat
>> > against adding new syntax).
>> >
>> > Eli
>>
>> Personally I'm -1 for a variety of reasons.
>>
>> 1) a const/enum type looks like something which is subject to personal
>> taste to me. I personally don't like, for example, how flufl requires
>> you to define constants by using a class.
>> It's just a matter of taste, but to me module.FOO looks more "right"
>> than module.Bar.FOO.
>> Also "Colors.red < Colors.blue" raising an exception is something
>> subject to personal taste.
>>
>> 2) introducing something like that (class-based) wouldn't help
>> migrating the existent module-level constants we have in the stdlib.
>> Only new projects or new stdlib modules would benefit from it.
> > These are more in the domain of implementation details, though, not
> > criticizing the concept?

Personally I'd be +0 for a constant type and -1 for an enum type, which I consider just useless.
If a 'constant' type has to be added though, I'd prefer it to be as simple as possible and close to what we've been used to thus far, meaning accessing it as "foo.BAR".
In everybody's mind it is clear that "foo.BAR" is a constant, and that should be preserved.
Something along these lines:

>>> from collections import constant
>>> STATUS_IDLE = constant(0, 'idle', doc='refers to the idle state')
>>> STATUS_IDLE
0
>>> str(STATUS_IDLE)
'idle'

---- Giampaolo
http://code.google.com/p/pyftpdlib/
http://code.google.com/p/psutil/
http://code.google.com/p/pysendfile/

From cs at zip.com.au Wed Jan 30 23:19:26 2013
From: cs at zip.com.au (Cameron Simpson)
Date: Thu, 31 Jan 2013 09:19:26 +1100
Subject: [Python-ideas] constant/enum type in stdlib
In-Reply-To: <5108A2F1.5010006@canterbury.ac.nz>
References: <5108A2F1.5010006@canterbury.ac.nz>
Message-ID: <20130130221926.GA20372@cskk.homeip.net>

On 30Jan2013 17:34, Greg Ewing wrote:
| Guido van Rossum wrote:
| > this doesn't look so bad, and
| > certainly doesn't violate DRY (though it's somewhat verbose):
| >
| > class color(enum):
| >     RED = value()
| >     WHITE = value()
| >     BLUE = value()
|
| The verbosity is what makes it fail the "truly elegant"
| test for me. And I would say that it does violate DRY
| in the sense that you have to write value() repeatedly
| for no good reason.
|
| Sure, it's not bad enough to make it unusable, but like
| all the other solutions, it leaves me feeling vaguely
| annoyed that there isn't a better way.

How about this:

    Color = enum(RED=None, WHITE=None, BLUE=None, yellow=9)

where None means "pick the next natural choice". The __init__ method goes something like this:

    def __init__(self, style=None, **kw):
        self._names = {}
        self._taken = set()
        seq = 0                              # next candidate default value
        for name, value in kw.items():      # kw.items(), with the call parens
            if name in self._names:
                raise ValueError("name already taken: " + name)
            if value is None:
                while seq in self._taken:
                    seq += 1
                value = seq
            elif value in self._taken:
                raise ValueError("\"%s\": value already taken: %s" % (name, value))
            self._names[name] = value
            self._taken.add(value)

Obviously this needs a little work:
  - you'd allocate the explicit values first and go after the Nones later
    so that you don't accidentally take an explicit value
  - you'd support (pluggable?) styles, starting with sequential, allocating
    0, 1, 2, ... and bitmask allocating 1, 2, 4, ...
but it lets you enumerate the names without quoting and specify explicit values and let the class pick default values.

Cheers,
--
Cameron Simpson

ERROR 155 - You can't do that. - Data General S200 Fortran error code list

From ethan at stoneleaf.us Wed Jan 30 23:26:40 2013
From: ethan at stoneleaf.us (Ethan Furman)
Date: Wed, 30 Jan 2013 14:26:40 -0800
Subject: [Python-ideas] constant/enum type in stdlib
In-Reply-To: References: <20130129202730.6ea6d0d5@anarchist.wooz.org> <20130130103548.12bce67d@anarchist.wooz.org> <20130130112707.5cf60dfc@anarchist.wooz.org>
Message-ID: <51099E20.8060200@stoneleaf.us>

On 01/30/2013 01:52 PM, Giampaolo Rodolà wrote:
> 2013/1/30 Eli Bendersky :
>> These are more in the domain of implementation details, though, not
>> criticizing the concept?
>
> Personally I'd be +0 for a constant type and -1 for an enum type,
> which I consider just useless.
> If a 'constant' type has to be added though, I'd prefer it to be as
> simple as possible and close to what we've been used to thus far, meaning
> accessing it as "foo.BAR".
> In everybody's mind it is clear that "foo.BAR" is a constant, and that
> should be preserved.
> Something along these lines:
>
> >>> from collections import constant
> >>> STATUS_IDLE = constant(0, 'idle', doc='refers to the idle state')
> >>> STATUS_IDLE
> 0
> >>> str(STATUS_IDLE)
> 'idle'

So you'd have something like:

--> from collections import constant
--> STATUS_IDLE = constant(0, 'idle', doc='refers to the idle state')
--> STATUS_PAUSE = constant(1, 'pause', doc='refers to the pause state')
--> STATUS_RUN = constant(2, 'run', doc='refers to the run state')

?

Absolutely -1 on this. (Although you can certainly implement it now.)

~Ethan~

From steve at pearwood.info Wed Jan 30 23:56:14 2013
From: steve at pearwood.info (Steven D'Aprano)
Date: Thu, 31 Jan 2013 09:56:14 +1100
Subject: [Python-ideas] while conditional in list comprehension ??
In-Reply-To: References: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de> <51072650.5090808@pearwood.info> <97262C84-D345-44FC-9A10-BD3D07023D6F@umbrellacode.com> <51086A96.9020300@pearwood.info>
Message-ID: <5109A50E.8070308@pearwood.info>

On 31/01/13 05:05, Oscar Benjamin wrote:
> On 30 January 2013 17:56, Yuriy Taraday wrote:
>>
>> On Wed, Jan 30, 2013 at 1:46 PM, Wolfgang Maier
>> wrote:
>>>
>>> your condition is 'partial(lt,50)', but this is not met to begin with and
>>> results in an empty list at least for me. Have you two actually checked
>>> the
>>> output of the code or have you just timed it?
>>
>> Yeah. Shame on me. You're right. My belief in partial and operator module
>> has been shaken.
>
> This is why I prefer this stop() idea to any of the takewhile()
> versions: regardless of performance it leads to clearer code, that can
> be understood more easily.

Funny you say that, clarity of code and ease of understanding is exactly why I dislike this stop() idea.

1) It does not work with list, dict or set comprehensions, only with generator expressions. So if you need a list, dict or set, you have to avoid the obvious list/dict/set comprehension.

2) It is fragile: it is easy enough to come up with examples of the above that *appear* to work:

    [i for i in range(20) if i < 50 or stop()]  # appears to work fine
    [i for i in range(20) if i < 10 or stop()]  # breaks

3) It reads wrong for a Python boolean expression. Given an if clause:

    if cond1() or cond2()

you should expect that an element is generated if either cond1 or cond2 are true. When I see "if cond1() or stop()" I don't read it as "stop if not cond1()" but as a Python bool expression, "generate an element if cond1() gives a truthy value or if stop() gives a truthy value".

This "if cond or stop()" is a neat hack, but it's still a hack, and less readable and understandable than I expect from Python code.

-- Steven

From oscar.j.benjamin at gmail.com Thu Jan 31 01:37:23 2013
From: oscar.j.benjamin at gmail.com (Oscar Benjamin)
Date: Thu, 31 Jan 2013 00:37:23 +0000
Subject: [Python-ideas] while conditional in list comprehension ??
In-Reply-To: <5109A50E.8070308@pearwood.info>
References: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de> <51072650.5090808@pearwood.info> <97262C84-D345-44FC-9A10-BD3D07023D6F@umbrellacode.com> <51086A96.9020300@pearwood.info> <5109A50E.8070308@pearwood.info>
Message-ID:

On 30 January 2013 22:56, Steven D'Aprano wrote:
> On 31/01/13 05:05, Oscar Benjamin wrote:
>>
>> On 30 January 2013 17:56, Yuriy Taraday wrote:
>>>
>>> On Wed, Jan 30, 2013 at 1:46 PM, Wolfgang Maier
>>> wrote:
>>>>
>>>> your condition is 'partial(lt,50)', but this is not met to begin with
>>>> and
>>>> results in an empty list at least for me. Have you two actually checked
>>>> the
>>>> output of the code or have you just timed it?
>>>
>>> Yeah. Shame on me. You're right. My belief in partial and operator module
>>> has been shaken.
>>
>> This is why I prefer this stop() idea to any of the takewhile()
>> versions: regardless of performance it leads to clearer code, that can
>> be understood more easily.
>
> Funny you say that, clarity of code and ease of understanding is exactly why
> I dislike this stop() idea.
>
> 1) It does not work with list, dict or set comprehensions, only with
> generator expressions. So if you need a list, dict or set, you have to
> avoid the obvious list/dict/set comprehension.

That's true. I would prefer it if a similar effect were achievable in these cases.

> 2) It is fragile: it is easy enough to come up with examples of the above
> that *appear* to work:
>
> [i for i in range(20) if i < 50 or stop()]  # appears to work fine
> [i for i in range(20) if i < 10 or stop()]  # breaks

As I said I would prefer a solution that would work for list comprehensions, but there isn't one, so the stop() method has to come with the caveat that it can only be used in that way. That said, I have become used to using a generator inside a call to dict() or set() (since the comprehensions for those cases were only recently added) so it doesn't seem a big problem to rewrite the above with calls to list().

You are right, though, that a bug like this would be problematic. If the StopIteration leaks up the call stack into a generator that is being for-looped then it creates a confusing debug problem (at least it did the first time I encountered it).

> 3) It reads wrong for a Python boolean expression. Given an if clause:
>
> if cond1() or cond2()
>
> you should expect that an element is generated if either cond1 or cond2
> are true. When I see "if cond1() or stop()" I don't read it as "stop if
> not cond1()" but as a Python bool expression, "generate an element if
> cond1() gives a truthy value or if stop() gives a truthy value".

Again I would have preferred 'else break' or something clearer but this seems the best available (I'm open to suggestions).

> This "if cond or stop()" is a neat hack, but it's still a hack, and less
> readable and understandable than I expect from Python code.

It is a hack (and I would prefer a supported method) but my point was that both you and Yuriy wrote the wrong code without noticing it. You both posted it to a mailing list where no one else noticed until someone actually tried running the code. In other words it wasn't obvious that the code was incorrect just from looking at it.
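(Returning to the leak mentioned above: it is easy to demonstrate with an added sketch like the following, which assumes the pre-PEP 479 semantics current at the time; since Python 3.7 the leaked StopIteration is converted into a RuntimeError instead of silently ending the generator.)

    # A StopIteration raised deep inside a generator's body is
    # indistinguishable from normal exhaustion, so the outer for-loop
    # just ends early with no traceback.
    def helper(x):
        if x > 2:
            raise StopIteration   # buggy helper, e.g. next() on an empty iterator
        return x

    def gen():
        for i in range(10):
            yield helper(i)       # the leak happens here

    for value in gen():
        print(value)              # prints 0 1 2 and stops, no error (pre-3.7)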
This one looks strange but if you knew what stop() was then you would understand it:

    list(x for x in range(100) if x < 50 or stop())

This one is difficult to mentally parse even if you understand all of the constituent parts:

    [x for x in takewhile(partial(lt, 50), range(100))]

Oscar

From steve at pearwood.info Thu Jan 31 01:45:56 2013
From: steve at pearwood.info (Steven D'Aprano)
Date: Thu, 31 Jan 2013 11:45:56 +1100
Subject: [Python-ideas] Extend module objects to support properties
In-Reply-To: <51094D8D.606@hastings.org>
References: <51087225.3040801@hastings.org> <51094D8D.606@hastings.org>
Message-ID: <5109BEC4.4050604@pearwood.info>

On 31/01/13 03:42, Larry Hastings wrote:

> Also, I'm not sure there are any existing globals that we'd want to convert into properties.

How about this?

    math.pi = 3

which really should give an exception.

(I'm sure there are many others.)

-- Steven

From timothy.c.delaney at gmail.com Thu Jan 31 02:27:24 2013
From: timothy.c.delaney at gmail.com (Tim Delaney)
Date: Thu, 31 Jan 2013 12:27:24 +1100
Subject: [Python-ideas] constant/enum type in stdlib
In-Reply-To: References: <51085AAB.6090303@canterbury.ac.nz> <5108A87D.9000207@canterbury.ac.nz> <20130130082639.0b28d7eb@pitrou.net>
Message-ID:

On 31 January 2013 08:32, Terry Reedy wrote:

> On 1/30/2013 10:30 AM, Michael Foord wrote:
>
>> On 30 January 2013 15:22, Michael Foord
>>
>>     With a Python 3 metaclass that provides default values for *looked
>>     up* entries you could have this:
>>
>>     class Color(Enum):
>>         RED, WHITE, BLUE
>>
>>     The lookup would create the member - with the appropriate value.
>>
>>     class values(dict):
>>         def __init__(self):
>>             self.value = 0
>>         def __getitem__(self, key):
>
> So RED, WHITE, BLUE are 1, 2, 3; not 0, 1, 2 as I and many readers might
> expect. That aside (which can be fixed), this is very nice.

Here is a version that I think creates an enum with most of the features of traditional and modern enums.

- Enum values are subclasses of int;
- Only need to declare the enum key name;
- Starts at zero by default;
- Can change the start value;
- Can have discontiguous values (e.g. 0, 1, 5, 6);
- Can have other types of class attributes;
- Ensures that there is a 1:1 mapping between key:value (throws an
  exception if either of these is violated);
- Able to obtain the keys, values and items as per the mapping interface
  (sorted by value);
- Lookup an enum by key or value;

One thing to note is that *any* class attribute assigned a value which implements __index__ will be considered an enum value assignment.

I've done some funky stuff to ensure that you can access all the above either via the enum class, or by an instance of the enum class. Most of the time you would just use the Enum subclass directly (i.e. it's a namespace) but there may be use cases for having instances of the Enum classes.
import collections
import operator

class EnumValue(int):
    def __new__(cls, key, value):
        e = super().__new__(cls, value)
        super().__setattr__(e, 'key', key)
        return e

    def __setattr__(self, key, value):
        raise TypeError("Cannot set attribute of type %r" % (type(self),))

    def __repr__(self):
        # type(self).__qualname__: the qualified name lives on the class,
        # not on int instances
        return "<%s '%s': %d>" % (type(self).__qualname__, self.key, self)

class EnumValues(collections.OrderedDict):
    def __init__(self):
        super().__init__()
        self.value = 0
        self.sealed = False

    def __getitem__(self, key):
        try:
            obj = super().__getitem__(key)
            if not self.sealed and isinstance(obj, EnumValue):
                raise TypeError("Duplicate enum key '%s' with values: %d and %d" % (obj.key, obj, self.value))
            return obj
        except KeyError:
            if key[:2] == '__' and key[-2:] == '__':
                raise
            value = self.value
            super().__setitem__(key, EnumValue(key, value))
            self.value += 1
            return value

    def __setitem__(self, key, value):
        if key[:2] == '__' and key[-2:] == '__':
            return super().__setitem__(key, value)
        try:
            if isinstance(value, EnumValue):
                assert value.key == key
            else:
                value = operator.index(value)
        except TypeError:
            return super().__setitem__(key, value)
        try:
            o = super().__getitem__(key)
            if isinstance(o, EnumValue):
                raise TypeError("Duplicate enum key '%s' with values: %d and %d" % (o.key, o, value))
        except KeyError:
            self.value = value + 1
            if not isinstance(value, EnumValue):
                value = EnumValue(key, value)
            super().__setitem__(value.key, value)

class EnumMeta(type):
    @classmethod
    def __prepare__(metacls, name, bases):
        return EnumValues()

    def __new__(cls, name, bases, classdict):
        classdict.sealed = True
        result = type.__new__(cls, name, bases, dict(classdict))
        enum = []
        for v in classdict.values():
            if isinstance(v, EnumValue):
                enum.append(v)
        enum.sort()
        result._key_to_enum = collections.OrderedDict()
        result._value_to_enum = collections.OrderedDict()
        for e in enum:
            if e in result._value_to_enum:
                raise TypeError("Duplicate enum value %d for keys: '%s' and '%s'" % (e, result._value_to_enum[e].key, e.key))
            if e.key in result._key_to_enum:
                raise TypeError("Duplicate enum key '%s' with values: %d and %d" % (e.key, result._key_to_enum[e.key], e))
            result._key_to_enum[e.key] = e
            result._value_to_enum[e] = e
        return result

    def __getitem__(self, key):
        try:
            key = operator.index(key)
        except TypeError:
            return self._key_to_enum[key]
        else:
            return self._value_to_enum[key]

    def _items(self):
        return self._key_to_enum.items()

    def _keys(self):
        return self._key_to_enum.keys()

    def _values(self):
        return self._key_to_enum.values()

    def items(self):
        return self._items()

    def keys(self):
        return self._keys()

    def values(self):
        return self._values()

class Enum(metaclass=EnumMeta):
    def __getitem__(self, key):
        cls = type(self)
        return type(cls).__getitem__(cls, key)

    def items(cls):
        return cls._items()

    def keys(cls):
        return cls._keys()

    def values(cls):
        return cls._values()

Enum.items = classmethod(Enum.items)
Enum.keys = classmethod(Enum.keys)
Enum.values = classmethod(Enum.values)

class Color(Enum):
    RED, WHITE, BLUE
    GREEN = 4
    YELLOW
    ORANGE = 'orange'
    BLACK

    def dump(self):
        print(self.RED, self.WHITE, self.BLUE, self.GREEN, self.YELLOW, self.BLACK, self.ORANGE, self.dump)

print(Color.RED, Color.WHITE, Color.BLUE, Color.GREEN, Color.YELLOW, Color.BLACK, Color.ORANGE, Color.dump)
Color().dump()

print(repr(Color.RED))
print(repr(Color['RED']))
print(repr(Color().RED))
print(repr(Color()['RED']))
print(repr(Color[0]))
print(repr(Color()[0]))

print(*Color.items())
print(*Color().items())
print(*Color.keys())
print(*Color().keys())
print(*Color.values())
print(*Color().values())
Tim Delaney
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From larry at hastings.org Thu Jan 31 02:53:25 2013
From: larry at hastings.org (Larry Hastings)
Date: Wed, 30 Jan 2013 17:53:25 -0800
Subject: [Python-ideas] Extend module objects to support properties
In-Reply-To: <5109BEC4.4050604@pearwood.info>
References: <51087225.3040801@hastings.org> <51094D8D.606@hastings.org> <5109BEC4.4050604@pearwood.info>
Message-ID: <5109CE95.7060104@hastings.org>

On 01/30/2013 04:45 PM, Steven D'Aprano wrote:
> On 31/01/13 03:42, Larry Hastings wrote:
>
>> Also, I'm not sure there are any existing globals that we'd want to
>> convert into properties.
>
> How about this?
>
>     math.pi = 3
>
> which really should give an exception.
>
> (I'm sure there are many others.)

Well, hmm. The thing is, properties--at least the existing implementation with classes--don't mesh well with direct access via the dict. So, right now,

    >>> math.__dict__['pi']
    3.141592653589793

If we change math.pi to be a property it wouldn't be in the dict anymore. So that has the possibility of breaking code. We could ameliorate it with

    >>> math.__dict__['pi'] = math.pi

But if the user assigns a different value to math.__dict__['pi'], math.pi will diverge, which again could break code. (Who might try to assign a different value to pi? The 1897 House of Representatives of Indiana, for one!)

More generally, it's often useful to monkeypatch "constants" at runtime, for testing purposes (and for less justifiable purposes). Why prevent that? I cite the Consenting Adults rule.

//arry/
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From ethan at stoneleaf.us Thu Jan 31 05:22:50 2013
From: ethan at stoneleaf.us (Ethan Furman)
Date: Wed, 30 Jan 2013 20:22:50 -0800
Subject: [Python-ideas] Extend module objects to support properties
In-Reply-To: <5109CE95.7060104@hastings.org>
References: <51087225.3040801@hastings.org> <51094D8D.606@hastings.org> <5109BEC4.4050604@pearwood.info> <5109CE95.7060104@hastings.org>
Message-ID: <5109F19A.3060902@stoneleaf.us>

On 01/30/2013 05:53 PM, Larry Hastings wrote:
>
> On 01/30/2013 04:45 PM, Steven D'Aprano wrote:
>> On 31/01/13 03:42, Larry Hastings wrote:
>>
>>> Also, I'm not sure there are any existing globals that we'd want to
>>> convert into properties.
>>
>> How about this?
>>
>>     math.pi = 3
>>
>> which really should give an exception.
>>
>> (I'm sure there are many others.)
>
> Well, hmm. The thing is, properties--at least the existing
> implementation with classes--don't mesh well with direct access via
> the dict. So, right now,
>
>     >>> math.__dict__['pi']
>     3.141592653589793
>
> If we change math.pi to be a property it wouldn't be in the dict
> anymore. So that has the possibility of breaking code.

So make the property access the __dict__:

--> class Test(object):
...     @property
...     def pi(self):
...         return self.__dict__['pi']
...     @pi.setter
...     def pi(self, new_value):
...         self.__dict__['pi'] = new_value
...
--> t = Test()
--> t
<__main__.Test object at 0x7f165d689850>
--> t.pi = 3.141596
--> t.pi
3.141596
--> t.__dict__['pi'] = 3
--> t.pi
3

~Ethan~

From larry at hastings.org Thu Jan 31 06:04:29 2013
From: larry at hastings.org (Larry Hastings)
Date: Wed, 30 Jan 2013 21:04:29 -0800
Subject: [Python-ideas] Extend module objects to support properties
In-Reply-To: <5109F19A.3060902@stoneleaf.us>
References: <51087225.3040801@hastings.org> <51094D8D.606@hastings.org> <5109BEC4.4050604@pearwood.info> <5109CE95.7060104@hastings.org> <5109F19A.3060902@stoneleaf.us>
Message-ID: <5109FB5D.2090109@hastings.org>

On 01/30/2013 08:22 PM, Ethan Furman wrote:
> On 01/30/2013 05:53 PM, Larry Hastings wrote:
>> If we change math.pi to be a property it wouldn't be in the dict
>> anymore. So that has the possibility of breaking code.
>
> So make the property access the __dict__:

In which case, it behaves exactly like it does today without a property. Okay... so why bother? If your answer is "so it can have code behind it", maybe you could find a better example than math.pi, which will never need code behind it.

In general, I was proposing we add property support to modules mostly so that new globals could be properties, saving us from adding more accessors to the language. Otherwise I'm gonna have to switch to Eclipse.

//arry/
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From ubershmekel at gmail.com Thu Jan 31 08:28:15 2013
From: ubershmekel at gmail.com (Yuval Greenfield)
Date: Thu, 31 Jan 2013 09:28:15 +0200
Subject: [Python-ideas] Extend module objects to support properties
In-Reply-To: <5109FB5D.2090109@hastings.org>
References: <51087225.3040801@hastings.org> <51094D8D.606@hastings.org> <5109BEC4.4050604@pearwood.info> <5109CE95.7060104@hastings.org> <5109F19A.3060902@stoneleaf.us> <5109FB5D.2090109@hastings.org>
Message-ID:

On Thu, Jan 31, 2013 at 7:04 AM, Larry Hastings wrote:

> On 01/30/2013 08:22 PM, Ethan Furman wrote:
>
> On 01/30/2013 05:53 PM, Larry Hastings wrote:
>
> If we change math.pi to be a property it wouldn't be in the dict
> anymore. So that has the possibility of breaking code.
>
> So make the property access the __dict__:
>
> In which case, it behaves exactly like it does today without a property.
> Okay... so why bother? If your answer is "so it can have code behind it",
> maybe you could find a better example than math.pi, which will never need
> code behind it.
>
> In general, I was proposing we add property support to modules mostly so
> that new globals could be properties, saving us from adding more accessors
> to the language. Otherwise I'm gonna have to switch to Eclipse.
>
> */arry*
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> http://mail.python.org/mailman/listinfo/python-ideas

I'm just gonna write "Python 4" for searching later.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From greg.ewing at canterbury.ac.nz Thu Jan 31 09:17:35 2013
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Thu, 31 Jan 2013 21:17:35 +1300
Subject: [Python-ideas] constant/enum type in stdlib
In-Reply-To: <20130130221926.GA20372@cskk.homeip.net>
References: <5108A2F1.5010006@canterbury.ac.nz> <20130130221926.GA20372@cskk.homeip.net>
Message-ID: <510A289F.4090904@canterbury.ac.nz>

Cameron Simpson wrote:
> How about this:
>
> Color = enum(RED=None, WHITE=None, BLUE=None, yellow=9)

You see, this is the problem -- there are quite a number of these solutions, all about as good as each other, with none of them standing out as obviously the right choice for stdlib inclusion.

Michael Foord's solution has promise, though, as it manages to eliminate *all* of the extraneous cruft and look almost like it's built into the language.

Plus it has the bonus of making you go "...??? How the blazes does *that* work?" the first time you see it. :-)

-- Greg

From ncoghlan at gmail.com Thu Jan 31 09:32:39 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 31 Jan 2013 18:32:39 +1000
Subject: [Python-ideas] while conditional in list comprehension ??
In-Reply-To: References: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de> <51072650.5090808@pearwood.info> <97262C84-D345-44FC-9A10-BD3D07023D6F@umbrellacode.com>
Message-ID:

On Tue, Jan 29, 2013 at 10:35 PM, Joao S. O. Bueno wrote:
> On 29 January 2013 09:51, yoav glazner wrote:
>> Here is very similar version that works (tested on python27)
>> >>> def stop():
>>     next(iter([]))
>>
>> >>> list((i if i<50 else stop()) for i in range(100))
>> [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
>> 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39,
>> 40, 41, 42, 43, 44, 45, 46, 47, 48, 49]
>
> Great. I think this nails it. It is exactly the intended behavior,
> and very readable under current language capabilities.
>
> One does not have to stop and go read what "itertools.takewhile" does,
> and mentally unfold the lambda guard expression - that is what makes
> this (and the O.P. request) more readable than using takewhile.
>
> Note: stop can also just explicitly raise StopIteration -
> or your next(iter([])) expression can be inlined within the generator.
>
> It works in Python 3 as well - though for those who did not test:
> it won't work for list, dict or set comprehensions - just for
> generator expressions.

This actually prompted an interesting thought for me. The statement-as-expression syntactic equivalent of the "else stop()" construct would actually be "else return", rather than "else break", since the goal is to say "we're done", regardless of the level of loop nesting.

It just so happens that, inside a generator (or generator expression) raising StopIteration and returning from the generator are very close to being equivalent operations, which is why the "else stop()" trick works. In a 3.x container comprehension, the inner scope is an ordinary function, so the equivalence between returning from the function and raising StopIteration is lost.

Cheers, Nick.
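A quick illustration of that distinction (an added sketch; note that PEP 479, adopted in Python 3.7, later changed the generator-expression behaviour so the leaked StopIteration becomes a RuntimeError):

    def stop():
        raise StopIteration

    # Generator expression: pre-PEP 479 (i.e. at the time of this thread)
    # the StopIteration looks like normal exhaustion and silently truncates
    # the output to [0, 1, 2, 3, 4]; since Python 3.7 it is re-raised as a
    # RuntimeError instead.
    try:
        print(list(i for i in range(100) if i < 5 or stop()))
    except RuntimeError as e:
        print("PEP 479 semantics:", e)

    # List comprehension: the implicit scope is an ordinary function, not
    # a generator, so the StopIteration simply escapes on every 3.x version.
    try:
        [i for i in range(100) if i < 5 or stop()]
    except StopIteration:
        print("StopIteration escaped the list comprehension")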
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From ubershmekel at gmail.com Thu Jan 31 09:51:14 2013
From: ubershmekel at gmail.com (Yuval Greenfield)
Date: Thu, 31 Jan 2013 10:51:14 +0200
Subject: [Python-ideas] PEP 3156: getting the socket or peer name from the transport
In-Reply-To: References: <20130127122121.6b779ada@pitrou.net> <1359288997.3488.2.camel@localhost.localdomain>
Message-ID:

On Mon, Jan 28, 2013 at 5:45 PM, Guido van Rossum wrote:

> Hm. I'm not keen on precomputing all of that, since most protocols
> won't need it, and the costs add up. This is not WSGI. The protocol has
> the transport object and can ask it specific questions -- if through a
> general API, like get_extra_info(key, [default]).

I forgot to ask before, but why is get_extra_info better than normal attributes and methods?

    val = transport.get_extra_info(key, None)
    if val is None:
        pass

    # vs

    if hasattr(transport, key):
        val = transport.key
    else:
        pass

Yuval
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From ubershmekel at gmail.com Thu Jan 31 09:52:09 2013
From: ubershmekel at gmail.com (Yuval Greenfield)
Date: Thu, 31 Jan 2013 10:52:09 +0200
Subject: [Python-ideas] PEP 3156: getting the socket or peer name from the transport
In-Reply-To: References: <20130127122121.6b779ada@pitrou.net> <1359288997.3488.2.camel@localhost.localdomain>
Message-ID:

>
>     val = getattr(transport, key)
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From ncoghlan at gmail.com Thu Jan 31 09:56:16 2013
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 31 Jan 2013 18:56:16 +1000
Subject: [Python-ideas] Extend module objects to support properties
In-Reply-To: <51094D8D.606@hastings.org>
References: <51087225.3040801@hastings.org> <51094D8D.606@hastings.org>
Message-ID:

On Thu, Jan 31, 2013 at 2:42 AM, Larry Hastings wrote:
> Of those four issues, the latter two are wontfix. Code that futzes with an
> object's __dict__ bypasses the property machinery but this is already viewed
> as acceptable.
>
> Obviously the point of the proposal is to change the behavior of the first
> two. Whether this is manageable additional complexity, or fast enough,
> remains to be seen--which is why this is in ideas not dev.

Looking at the problem from a different direction:

Currently, modules are *instances* of a normal type (types.ModuleType). Thus, anything stored in their global namespace is like anything else stored in a normal instance dictionary: no descriptor behaviour.

The request in this thread is basically for a way to:
1. Define a custom type
2. Put an instance of that type in sys.modules instead of the ordinary module object

Now here's the thing: we already support this, because the import system is designed to cope with modules replacing "sys.modules[__name__]" while they're being loaded.

The way this happens is that, after we finish loading a module, we usually don't trust what the loader gave us. Instead, we go look at what's in sys.modules under the name being loaded.
So if, in your module code, you do this:

    import sys, types

    class MyPropertyUsingModule(types.ModuleType):
        def __init__(self, original):
            # Keep a reference to the original module to avoid the
            # destructive cleanup of the global namespace
            self._original = original

        @property
        def myglobal(self):
            return theglobal

        @myglobal.setter
        def myglobal(self, value):
            global theglobal
            theglobal = value

    sys.modules[__name__] = MyPropertyUsingModule(sys.modules[__name__])

Then what you end up with in sys.modules is a module with a global property, "myglobal".

I'd prefer to upgrade this from "begrudged backwards compatibility hack" to "supported feature", rather than doing anything more complicated.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia

From shane at umbrellacode.com Thu Jan 31 11:05:53 2013
From: shane at umbrellacode.com (Shane Green)
Date: Thu, 31 Jan 2013 02:05:53 -0800
Subject: [Python-ideas] csv.DictReader could handle headers more intelligently.
In-Reply-To: <201301301516.37499.mark.hackett@metoffice.gov.uk>
References: <1358903168.4767.4.camel@webb> <201301301516.37499.mark.hackett@metoffice.gov.uk>
Message-ID: <510A4201.60504@umbrellacode.com>

It's important to note, though, that I'm not proposing a change for DictReader. We defined the DictReader API a long time ago, and that API returns a single value for each column header; if a DictReader began returning dicts with lists of values instead of single values, it would be a bug that violated the API we've defined. As fun as it would be to explain to people that what they now consider a bug in an application that's run "for like 10 years" exists because, if we hadn't fixed it for them and they had begun using a different file format, there was a chance the old version wouldn't have read the new content properly, the truth is I do not want to replace DictReader behaviour with what's described below.

I would like thumbs +/-, and feedback on the idea of adding CsvRecordReader() (or something that mirrors DictReader but produces...) CSVRecord instances, for which I've suggested the API below as the starting point. It might be good to change the subject or something, but I'll leave that to someone else because I'm infamous for doing the wrong thing in mailing lists...
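A rough sketch of the kind of reader being floated here (purely illustrative; the class name and behaviour are assumptions, since only the rough shape of the API has been proposed):

    import csv

    # A DictReader-alike that keeps *every* value for a duplicated
    # header instead of silently keeping only the last one.
    class CSVRecordReader:
        def __init__(self, f, **kwds):
            self.reader = csv.reader(f, **kwds)
            self.fieldnames = next(self.reader)

        def __iter__(self):
            return self

        def __next__(self):
            row = next(self.reader)
            record = {}
            for header, value in zip(self.fieldnames, row):
                record.setdefault(header, []).append(value)
            return record

    # With a header row "A,B,A" and a data row "1,2,3", this would yield
    # {'A': ['1', '3'], 'B': ['2']} rather than DictReader's
    # {'A': '3', 'B': '2'}.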
> > And then of course dictionary lookup. One thing that comes to mind is > that there's really no value to the unordered sequence of value lists; > there could be some value in extending an OrderedDict, making all the > iteration methods consistent and therefore something that could be > used to do something like write values, etc.... > > > > > J. Cliff Dyer > January 22, 2013 5:06 PM > Idea folks, > > I'm working with some poorly-formed CSV files, and I noticed that > DictReader always and only pulls headers off of the first row. But many > of the files I see have blank lines before the row of headers, sometimes > with commas to the appropriate field count, sometimes without. The > current implementation's behavior in this case is likely never correct, > and certainly always annoying. Given the following file: > > ---Start File 1--- > ,, > A,B,C > 1,2,3 > 2,4,6 > ---End File 1--- > > csv.DictReader yields the rows: > > {'': 'C'} > {'': '3'} > {'': '6'} > > > And given a file starting with a zero-length line, like the following: > > ---Start File 2--- > > A,B,C > 1,2,3 > 2,4,6 > ---End File 2--- > > It yields the following: > > {None: ['A', 'B', 'C']} > {None: ['1', '2', '3']} > {None: ['2', '4', '6']} > > I think that in both cases, the proper response would be treat the A,B,C > line as the header line. The change that makes this work is pretty > simple. In the fieldnames getter property, the "if not > self._fieldnames:" conditional becomes "while not self._fieldnames or > not any(self._fieldnames):" As a subclass: > > import csv > > > class DictReader(csv.DictReader): > @property > def fieldnames(self): > while self._fieldnames is None or not any(self._fieldnames): > try: > self._fieldnames = next(self.reader) > except StopIteration: > break > return self._fieldnames > self.line_num = self.reader.line_num > > #Same as the original setter, just rewritten to associate with the > new getter propery > @fieldnames.setter > def fieldnames(self, value): > self._fieldnames = value > > There might be some issues with existing code that depends on the {None: > ['1','2','3']} construction, but I can't imagine a time when programmers > would want to see {'': '3'} with the 1 and 2 values getting lost. > > Thoughts? Do folks think this is worth adding to the csv library, or > should I just keep using my subclass? > > Cheers, > Cliff > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: compose-unknown-contact.jpg Type: image/jpeg Size: 770 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: postbox-contact.jpg Type: image/jpeg Size: 1041 bytes Desc: not available URL: From drekin at gmail.com Thu Jan 31 11:38:29 2013 From: drekin at gmail.com (drekin at gmail.com) Date: Thu, 31 Jan 2013 02:38:29 -0800 (PST) Subject: [Python-ideas] constant/enum type in stdlib In-Reply-To: <20130130172310.60b49ef4@pitrou.net> Message-ID: <510a49a5.49d80e0a.1489.ffffebe2@mx.google.com> Hello. It should be also possible to specify the values of enum constants explicitly. For 'bitmask' type only powers of 2 should be allowed or maybe the values could be the exponents (as your TypeFlag example indicates). 
The same way the 'symbolic' type acts as a str and the 'sequential' type acts as an int, the 'bitmask' type could act both as an int and as a set (or frozenset), since its semantics is that of a set. The enum value object could represent both the int value and the corresponding singleton set. OR-ing would produce the corresponding multi-value set.

>>> isinstance(TypeFlag.HEAPTYPE, int)
True
>>> isinstance(TypeFlag.HEAPTYPE, set)
True
>>> TypeFlag.HAS_GC | TypeFlag.INT_SUBCLASS
TypeFlag.HAS_GC|INT_SUBCLASS  # or some similar repr
>>> TypeFlag.HEAPTYPE in (TypeFlag.HEAPTYPE | TypeFlag.HAS_GC)
True
>>> TypeFlag.HEAPTYPE in TypeFlag.HEAPTYPE
True
>>> TypeFlag(1)
TypeFlag.HEAPTYPE
>>> TypeFlag(2)
TypeFlag.HAS_GC
>>> set(TypeFlag.HEAPTYPE)
{1}
>>> set(TypeFlag.HEAPTYPE | TypeFlag.HAS_GC)
{1, 2}
>>> int(TypeFlag.HEAPTYPE)
2
>>> int(TypeFlag.HEAPTYPE | TypeFlag.HAS_GC)
6

Note the difference between the n and 2 ** n semantics. So there should be something like

>>> TypeFlag.decompose(2)
TypeFlag.HEAPTYPE
>>> TypeFlag.decompose(6)
TypeFlag.HEAPTYPE|HAS_GC

Regards, Drekin

From oscar.j.benjamin at gmail.com Thu Jan 31 12:08:58 2013
From: oscar.j.benjamin at gmail.com (Oscar Benjamin)
Date: Thu, 31 Jan 2013 11:08:58 +0000
Subject: [Python-ideas] while conditional in list comprehension ??
In-Reply-To: References: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de> <51072650.5090808@pearwood.info> <97262C84-D345-44FC-9A10-BD3D07023D6F@umbrellacode.com>
Message-ID:

On 31 January 2013 08:32, Nick Coghlan wrote:
> On Tue, Jan 29, 2013 at 10:35 PM, Joao S. O. Bueno
> wrote:
>> On 29 January 2013 09:51, yoav glazner wrote:
>>> Here is very similar version that works (tested on python27)
>>> >>> def stop():
>>>     next(iter([]))
>>>
>>> >>> list((i if i<50 else stop()) for i in range(100))
>>> [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
>>> 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39,
>>> 40, 41, 42, 43, 44, 45, 46, 47, 48, 49]
>
> This actually prompted an interesting thought for me. The
> statement-as-expression syntactic equivalent of the "else stop()"
> construct would actually be "else return", rather than "else break",
> since the goal is to say "we're done", regardless of the level of loop
> nesting.

I'm not sure if it is the goal to be able to break out of any level of nesting or at least that's not how I interpreted the original proposal. It is what happens for this stop() function but only because there's no other way. Personally I don't mind as I generally avoid multiple-for comprehensions; by the time I've written one out I usually decide that it would be more readable as ordinary for loops or with a separate function.

> It just so happens that, inside a generator (or generator expression)
> raising StopIteration and returning from the generator are very close
> to being equivalent operations, which is why the "else stop()" trick
> works. In a 3.x container comprehension, the inner scope is an
> ordinary function, so the equivalence between returning from the
> function and raising StopIteration is lost.

I don't really understand what you mean here. What is the difference between comprehensions in 2.x and 3.x?
Oscar

From jimjjewett at gmail.com Thu Jan 31 17:10:33 2013
From: jimjjewett at gmail.com (Jim Jewett)
Date: Thu, 31 Jan 2013 11:10:33 -0500
Subject: [Python-ideas] constant/enum type in stdlib
In-Reply-To:
References:
Message-ID:

On Tue, Jan 29, 2013 at 6:50 AM, Nick Coghlan wrote:
> FWIW, since that last discussion, I've switched to using strings for
> my special constants, dumping them in a container if I need some kind
> of easy validity checking or iteration.

Unfortunately, some of the problems with that involve unicode normalization, and won't show up in English. Python has defined a normalization for identifiers; this normalization does not apply to quoted strings. Essentially, this is the same problem string exceptions caused, except that it (sometimes) applies to '==' as well as to 'is'.

Essentially, we want the simplicity of:

    color = enum(red, green, blue)

except that we *also* want to be able to compare the symbols to (int or str) constants, and to decide when they will be equal. I don't see any good way to support:

    color = enum(red=15, green, blue)

without requiring either that strings be used instead of symbols, or that later entries be explicitly initialized.

-jJ

From jasonkeene at gmail.com Thu Jan 31 17:35:28 2013
From: jasonkeene at gmail.com (Jason Keene)
Date: Thu, 31 Jan 2013 11:35:28 -0500
Subject: [Python-ideas] Definition Symmetry
Message-ID:

Why do function definitions require parens?

>>> class MyClass:
...     pass
...
>>> def my_func:
  File "<stdin>", line 1
    def my_func:
               ^
SyntaxError: invalid syntax

This seems to me to break a symmetry with class definitions. I assume this is just a holdover from C; perhaps there is a non-historical reason tho.

I believe in the past we've forced parens in list comprehensions to create a symmetry between comprehensions/generator expressions. Why not for this?
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From jasonkeene at gmail.com Thu Jan 31 17:43:19 2013
From: jasonkeene at gmail.com (Jason Keene)
Date: Thu, 31 Jan 2013 11:43:19 -0500
Subject: [Python-ideas] Definition Symmetry
In-Reply-To:
References:
Message-ID:

Just to be clear, I wasn't suggesting forcing parens for class definitions. Rather, make them optional for functions!

On Thu, Jan 31, 2013 at 11:35 AM, Jason Keene wrote:

> Why do function definitions require parens?
>
> >>> class MyClass:
> ...     pass
> ...
> >>> def my_func:
>   File "<stdin>", line 1
>     def my_func:
>                ^
> SyntaxError: invalid syntax
>
> This seems to me to break a symmetry with class definitions. I assume
> this is just a holdover from C; perhaps there is a non-historical reason
> tho.
>
> I believe in the past we've forced parens in list comprehensions to create
> a symmetry between comprehensions/generator expressions. Why not for this?
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From python at mrabarnett.plus.com Thu Jan 31 18:15:09 2013
From: python at mrabarnett.plus.com (MRAB)
Date: Thu, 31 Jan 2013 17:15:09 +0000
Subject: [Python-ideas] Definition Symmetry
In-Reply-To:
References:
Message-ID: <510AA69D.1060300@mrabarnett.plus.com>

On 2013-01-31 16:35, Jason Keene wrote:
> Why do function definitions require parens?
>
> >>> class MyClass:
> ...     pass
> ...
> >>> def my_func:
>   File "<stdin>", line 1
>     def my_func:
>                ^
> SyntaxError: invalid syntax
>
> This seems to me to break a symmetry with class definitions. I assume
> this is just a holdover from C; perhaps there is a non-historical reason
> tho.
-jJ

From jasonkeene at gmail.com Thu Jan 31 17:35:28 2013 From: jasonkeene at gmail.com (Jason Keene) Date: Thu, 31 Jan 2013 11:35:28 -0500 Subject: [Python-ideas] Definition Symmetry Message-ID:

Why do function definitions require parens?

>>> class MyClass:
...     pass
...
>>> def my_func:
  File "<stdin>", line 1
    def my_func:
               ^
SyntaxError: invalid syntax

This seems to me to break a symmetry with class definitions. I assume this is just a holdover from C; perhaps there is a non-historical reason, though.

I believe in the past we've forced parens in list comprehensions to create a symmetry between comprehensions/generator expressions. Why not for this?
-------------- next part -------------- An HTML attachment was scrubbed... URL:

From jasonkeene at gmail.com Thu Jan 31 17:43:19 2013 From: jasonkeene at gmail.com (Jason Keene) Date: Thu, 31 Jan 2013 11:43:19 -0500 Subject: [Python-ideas] Definition Symmetry In-Reply-To: References: Message-ID:

Just to be clear, I wasn't suggesting forcing parens for class definitions. Rather, make them optional for functions!

On Thu, Jan 31, 2013 at 11:35 AM, Jason Keene wrote: > Why do function definitions require parens?
>
> >>> class MyClass:
> ...     pass
> ...
> >>> def my_func:
>   File "<stdin>", line 1
>     def my_func:
>                ^
> SyntaxError: invalid syntax
>
> This seems to me to break a symmetry with class definitions. I assume > this is just a holdover from C; perhaps there is a non-historical reason, > though. > > I believe in the past we've forced parens in list comprehensions to create > a symmetry between comprehensions/generator expressions. Why not for this?
-------------- next part -------------- An HTML attachment was scrubbed... URL:

From python at mrabarnett.plus.com Thu Jan 31 18:15:09 2013 From: python at mrabarnett.plus.com (MRAB) Date: Thu, 31 Jan 2013 17:15:09 +0000 Subject: [Python-ideas] Definition Symmetry In-Reply-To: References: Message-ID: <510AA69D.1060300@mrabarnett.plus.com>

On 2013-01-31 16:35, Jason Keene wrote: > Why do function definitions require parens?
>
> >>> class MyClass:
> ...     pass
> ...
> >>> def my_func:
>   File "<stdin>", line 1
>     def my_func:
>                ^
> SyntaxError: invalid syntax
>
> This seems to me to break a symmetry with class definitions. I assume > this is just a holdover from C; perhaps there is a non-historical reason, > though. > > I believe in the past we've forced parens in list comprehensions to > create a symmetry between comprehensions/generator expressions. Why not > for this? >

The parentheses are always required when calling the function, so it makes sense to always require them when defining the function.

The case with class definitions is different; they are used in the definition only when you want to specify the superclass. They are always required when creating an instance of the class and in method definitions.

From ethan at stoneleaf.us Thu Jan 31 18:26:08 2013 From: ethan at stoneleaf.us (Ethan Furman) Date: Thu, 31 Jan 2013 09:26:08 -0800 Subject: [Python-ideas] Definition Symmetry In-Reply-To: <510AA69D.1060300@mrabarnett.plus.com> References: <510AA69D.1060300@mrabarnett.plus.com> Message-ID: <510AA930.9020704@stoneleaf.us>

On 01/31/2013 09:15 AM, MRAB wrote: > On 2013-01-31 16:35, Jason Keene wrote: >> Why do function definitions require parens?
>>
>> >>> class MyClass:
>> ...     pass
>> ...
>> >>> def my_func:
>>   File "<stdin>", line 1
>>     def my_func:
>>                ^
>> SyntaxError: invalid syntax
>>
>> This seems to me to break a symmetry with class definitions. I assume >> this is just a holdover from C; perhaps there is a non-historical reason, >> though. >>
> The parentheses are always required when calling the function, so it > makes sense to always require them when defining the function. > > The case with class definitions is different; they are used in the > definition only when you want to specify the superclass.

... they are required in the definition when you want to specify the superclass, and optional otherwise.

~Ethan~

From ned at nedbatchelder.com Thu Jan 31 19:11:55 2013 From: ned at nedbatchelder.com (Ned Batchelder) Date: Thu, 31 Jan 2013 13:11:55 -0500 Subject: [Python-ideas] Definition Symmetry In-Reply-To: <510AA69D.1060300@mrabarnett.plus.com> References: <510AA69D.1060300@mrabarnett.plus.com> Message-ID: <510AB3EB.9020806@nedbatchelder.com>

On 1/31/2013 12:15 PM, MRAB wrote: > On 2013-01-31 16:35, Jason Keene wrote: >> Why do function definitions require parens?
>>
>> >>> class MyClass:
>> ...     pass
>> ...
>> >>> def my_func:
>>   File "<stdin>", line 1
>>     def my_func:
>>                ^
>> SyntaxError: invalid syntax
>>
>> This seems to me to break a symmetry with class definitions. I assume >> this is just a holdover from C; perhaps there is a non-historical reason, >> though. >> >> I believe in the past we've forced parens in list comprehensions to >> create a symmetry between comprehensions/generator expressions. Why not >> for this? >>
> The parentheses are always required when calling the function, so it > makes sense to always require them when defining the function. > > The case with class definitions is different; they are used in the > definition only when you want to specify the superclass. >

I think parens for the superclass are an unfortunate syntax, since it looks just like arguments to the class and is confusing for some beginners:

def function(arg):
    ...
function(10)        # Similar syntax: 10 corresponds to arg

class Thing(Something):
    ...
thing = Thing(10)   # How does 10 relate to Something? It doesn't.

A better syntax (which I AM NOT PROPOSING) would be:

class Thing from Something:

--Ned.

> They are always required when creating an instance of the class and in > method definitions.
> _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas >

From tjreedy at udel.edu Thu Jan 31 20:00:35 2013 From: tjreedy at udel.edu (Terry Reedy) Date: Thu, 31 Jan 2013 14:00:35 -0500 Subject: [Python-ideas] while conditional in list comprehension ?? In-Reply-To: References: <00b701cdfd5c$18d01100$4a703300$@biologie.uni-freiburg.de> <51072650.5090808@pearwood.info> <97262C84-D345-44FC-9A10-BD3D07023D6F@umbrellacode.com> Message-ID:

On 1/31/2013 6:08 AM, Oscar Benjamin wrote: > On 31 January 2013 08:32, Nick Coghlan wrote: >> It just so happens that, inside a generator (or generator expression) >> raising StopIteration and returning from the generator are very close >> to being equivalent operations, which is why the "else stop()" trick >> works. In a 3.x container comprehension, the inner scope is an >> ordinary function, so the equivalence between returning from the >> function and raising StopIteration is lost. > > I don't really understand what you mean here. What is the difference > between comprehensions in 2.x and 3.x?

In 2.x, (list) comprehensions are translated to the equivalent nested for and if statements and compiled and executed in place. In 3.x, the translation is wrapped in a temporary function that is called and then discarded. The main effect is to localize the loop names, the 'i' in '[i*2 for i in iterable]', for instance.
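A sketch of what that wrapping means in practice (approximate; the real 3.x translation happens at compile time, not via a visible function):

items = [1, 2, 3]
result = [i * 2 for i in items]

# 2.x behaved like this, so 'i' leaked into the enclosing scope
# (i would be 3 after the loop):
result = []
for i in items:
    result.append(i * 2)

# 3.x behaves roughly like this instead, so 'i' stays local to the
# hidden function and is gone once the comprehension finishes:
def _listcomp(iterable):
    result = []
    for i in iterable:
        result.append(i * 2)
    return result
result = _listcomp(iter(items))

That hidden function is also why the stop() trick above only truncates output when written as a generator expression handed to list(): in a 3.x list comprehension, the StopIteration simply propagates out of the hidden function as an ordinary exception.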
-- Terry Jan Reedy

From tjreedy at udel.edu Thu Jan 31 20:04:52 2013 From: tjreedy at udel.edu (Terry Reedy) Date: Thu, 31 Jan 2013 14:04:52 -0500 Subject: [Python-ideas] constant/enum type in stdlib In-Reply-To: <510A289F.4090904@canterbury.ac.nz> References: <5108A2F1.5010006@canterbury.ac.nz> <20130130221926.GA20372@cskk.homeip.net> <510A289F.4090904@canterbury.ac.nz> Message-ID:

On 1/31/2013 3:17 AM, Greg Ewing wrote: > Cameron Simpson wrote: >> How about this: >>
>> Color = enum(RED=None, WHITE=None, BLUE=None, yellow=9)
>
> You see, this is the problem -- there are quite a number > of these solutions, all about as good as each other, with > none of them standing out as obviously the right choice > for stdlib inclusion. > > Michael Foord's solution has promise, though, as it manages > to eliminate *all* of the extraneous cruft and look almost > like it's built into the language. > > Plus it has the bonus of making you go "...??? How the > blazes does *that* work?" the first time you see it. :-)

Yeah, I was thinking that if it were added to the stdlib, the current metaclass discussion in the reference should be augmented by referring to it as a non-toy example of metaclasses at work.

-- Terry Jan Reedy

From andrew at ei-grad.ru Thu Jan 31 20:33:05 2013 From: andrew at ei-grad.ru (Andrew Grigorev) Date: Thu, 31 Jan 2013 23:33:05 +0400 Subject: [Python-ideas] Definition Symmetry In-Reply-To: References: Message-ID: <510AC6F1.1060503@ei-grad.ru>

Another strange thing is that the `raise` statement doesn't require you to instantiate an Exception object; it allows you to pass an Exception class to it.

raise NotImplementedError
raise NotImplementedError()

Is there any difference between these two lines of code? And there is nothing about that fact in the Python docs (or have I just not found it?).

-- Andrew

31.01.2013 20:35, Jason Keene wrote: > Why do function definitions require parens?
>
> >>> class MyClass:
> ...     pass
> ...
> >>> def my_func:
>   File "<stdin>", line 1
>     def my_func:
>                ^
> SyntaxError: invalid syntax
>
> This seems to me to break a symmetry with class definitions. I assume > this is just a holdover from C; perhaps there is a non-historical > reason, though. > > I believe in the past we've forced parens in list comprehensions to > create a symmetry between comprehensions/generator expressions. Why > not for this? > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > http://mail.python.org/mailman/listinfo/python-ideas
-------------- next part -------------- An HTML attachment was scrubbed... URL:

From ericsnowcurrently at gmail.com Thu Jan 31 20:56:04 2013 From: ericsnowcurrently at gmail.com (Eric Snow) Date: Thu, 31 Jan 2013 12:56:04 -0700 Subject: [Python-ideas] Extend module objects to support properties In-Reply-To: References: <51087225.3040801@hastings.org> <51094D8D.606@hastings.org> Message-ID:

On Thu, Jan 31, 2013 at 1:56 AM, Nick Coghlan wrote: > Looking at the problem from a different direction: > > Currently, modules are *instances* of a normal type > (types.ModuleType). Thus, anything stored in their global namespace is > like anything else stored in a normal instance dictionary: no > descriptor behaviour. > > The request in this thread is basically for a way to: > > 1. Define a custom type > 2. Put an instance of that type in sys.modules instead of the ordinary > module object > > Now here's the thing: we already support this, because the import > system is designed to cope with modules replacing > "sys.modules[__name__]" while they're being loaded. The way this > happens is that, after we finish loading a module, we usually don't > trust what the loader gave us. Instead, we go look at what's in > sys.modules under the name being loaded. > > So if, in your module code, you do this:
>
> import sys, types
> class MyPropertyUsingModule(types.ModuleType):
>     def __init__(self, original):
>         # Keep a reference to the original module to avoid the
>         # destructive cleanup of the global namespace
>         self._original = original
>
>     @property
>     def myglobal(self):
>         return theglobal
>
>     @myglobal.setter
>     def myglobal(self, value):
>         global theglobal
>         theglobal = value
>
> sys.modules[__name__] = MyPropertyUsingModule(sys.modules[__name__])
>
> Then what you end up with in sys.modules is a module with a global > property, "myglobal". > > I'd prefer to upgrade this from "begrudged backwards compatibility > hack" to "supported feature", rather than doing anything more > complicated.

+1

At this point I don't see this behavior of the import system changing, even for Python 4. Making it part of the spec is the best fit for this class of problem (not-terribly-sophisticated solution for a relatively uncommon case). Otherwise we'd need a way to allow a module definition (.py, etc.) to dictate which class to use, which seems unnecessary and even overly complicated given the scale of the target audience.

That said, Larry's original proposal relates to sys, a built-in module written in C (in CPython of course). In that case the solution is not quite the same, since module initialization interacts with sys.modules differently. [1][2] Accommodating the original request would require more work, whether to muck with the import C-API or making sys an instance of another type, as someone suggested.
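A self-contained sketch of why the trick works (the module name 'demo' is made up): since import consults sys.modules first, whatever object is stored there is what importers get, properties and all.

import sys, types

class Demo(types.ModuleType):
    _value = 0

    @property
    def myglobal(self):
        return type(self)._value

    @myglobal.setter
    def myglobal(self, value):
        type(self)._value = value

# What the trick does on the module's behalf at the end of demo.py:
sys.modules['demo'] = Demo('demo')

import demo                  # found in sys.modules, no file needed
demo.myglobal = 42           # runs the setter
assert demo.myglobal == 42   # runs the getter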
-eric

[1] See http://mail.python.org/pipermail/python-dev/2012-November/122599.html
[2] http://bugs.python.org/msg174704

From timothy.c.delaney at gmail.com Thu Jan 31 21:19:55 2013 From: timothy.c.delaney at gmail.com (Tim Delaney) Date: Fri, 1 Feb 2013 07:19:55 +1100 Subject: [Python-ideas] constant/enum type in stdlib In-Reply-To: References: <51085AAB.6090303@canterbury.ac.nz> <5108A87D.9000207@canterbury.ac.nz> <20130130082639.0b28d7eb@pitrou.net> Message-ID:

On 31 January 2013 12:27, Tim Delaney wrote: > On 31 January 2013 08:32, Terry Reedy wrote: >> On 1/30/2013 10:30 AM, Michael Foord wrote: >>> On 30 January 2013 15:22, Michael Foord wrote: >>> With a Python 3 metaclass that provides default values for *looked >>> up* entries you could have this:
>>>
>>> class Color(Enum):
>>>     RED, WHITE, BLUE
>>>
>>> The lookup would create the member - with the appropriate value.
>>>
>>> class values(dict):
>>>     def __init__(self):
>>>         self.value = 0
>>>     def __getitem__(self, key):
>>>
>> So RED, WHITE, BLUE are 1, 2, 3; not 0, 1, 2 as I and many readers might >> expect. That aside (which can be fixed), this is very nice. >
> Here is a version that I think creates an enum with most of the features > of traditional and modern enums.
>
> - Enum values are subclasses of int;
> - Only need to declare the enum key name;
> - Starts at zero by default;
> - Can change the start value;
> - Can have discontiguous values (e.g. 0, 1, 5, 6);
> - Can have other types of class attributes;
> - Ensures that there is a 1:1 mapping between key:value (throws an
>   exception if either of these is violated);
> - Able to obtain the keys, values and items as per the mapping interface
>   (sorted by value);
> - Lookup an enum by key or value;
>
> One thing to note is that *any* class attribute assigned a value which > implements __index__ will be considered an enum value assignment.

Forgot about making it iterable - an easy-to-add feature. Obviously it would iterate over the EnumValue instances. Thought I'd better make it explicit as well that this was based on Michael Foord's brilliant work.
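For anyone wondering how a mere lookup can create the member, here is a stripped-down sketch of the mechanism (my reconstruction for illustration, not Michael's actual code): __prepare__ hands the class body a namespace whose failed lookups invent new members.

class EnumDict(dict):
    def __init__(self):
        super().__init__()
        self._next = 0

    def __missing__(self, key):
        # Called for names the class body looks up but never assigned.
        # Treat FOO-style names as members; anything else is a normal
        # failed lookup that falls through to globals/builtins.
        if key.isupper():
            value = self._next
            self._next += 1
            self[key] = value
            return value
        raise KeyError(key)

class EnumMeta(type):
    @classmethod
    def __prepare__(mcls, name, bases):
        return EnumDict()

    def __new__(mcls, name, bases, namespace):
        return super().__new__(mcls, name, bases, dict(namespace))

class Enum(metaclass=EnumMeta):
    pass

class Color(Enum):
    RED, WHITE, BLUE    # bare lookups, so the members get created

assert (Color.RED, Color.WHITE, Color.BLUE) == (0, 1, 2)

Because EnumDict is a dict subclass rather than an exact dict, the class body's name lookups go through __getitem__, which is what lets __missing__ fire.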
Tim Delaney
-------------- next part -------------- An HTML attachment was scrubbed... URL:

From barry at python.org Thu Jan 31 21:46:30 2013 From: barry at python.org (Barry Warsaw) Date: Thu, 31 Jan 2013 15:46:30 -0500 Subject: [Python-ideas] constant/enum type in stdlib References: <5108A2F1.5010006@canterbury.ac.nz> <20130130221926.GA20372@cskk.homeip.net> Message-ID: <20130131154630.23903b07@anarchist.wooz.org>

On Jan 31, 2013, at 09:19 AM, Cameron Simpson wrote:

> Color = enum(RED=None, WHITE=None, BLUE=None, yellow=9)

Oh, I forgot to mention that flufl.enum has an alternative API that's fairly close to this, although it does not completely eliminate DRY[1]:

>>> from flufl.enum import make
>>> make('Animals', ('ant', 'bee', 'cat', 'dog'))

You can also supply the elements as 2-tuples if you want to specify the values. An example from the docs providing bit flags:

>>> def enumiter():
...     start = 1
...     while True:
...         yield start
...         start <<= 1
>>> make('Flags', zip(list('abcdefg'), enumiter()))

Cheers, -Barry

[1] The first argument is currently necessary in order to give the right printed representation of the enum.
-------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: not available URL:

From barry at python.org Thu Jan 31 22:00:32 2013 From: barry at python.org (Barry Warsaw) Date: Thu, 31 Jan 2013 16:00:32 -0500 Subject: [Python-ideas] constant/enum type in stdlib References: <20130129202730.6ea6d0d5@anarchist.wooz.org> <20130130103548.12bce67d@anarchist.wooz.org> <20130130112707.5cf60dfc@anarchist.wooz.org> Message-ID: <20130131160032.63baef0a@anarchist.wooz.org>

I'll agree that enums are subject to personal taste, and I am opinionated about the syntax and semantics, as should be evident in my library :).

On Jan 30, 2013, at 09:13 PM, Giampaolo Rodolà wrote: >1) a const/enum type looks like something which is subject to personal >taste to me. I personally don't like, for example, how flufl requires >to define constants by using a class.

In practice, I find this quite nice. In my larger projects, I define the enum class in the interface module and often intersperse comments among the enum values so that more documentation is provided to the reader.

>It's just a matter of taste but to me module.FOO looks more "right" >than module.Bar.FOO.

I almost always 'from module import MyEnum' so typical use looks something like:

if thing.color is Color.red:
    ...
elif thing.color is Color.blue:
    ...

Again, in practice, I find it quite readable and just the right level of verbosity.

>Also "Colors.red < Colors.blue" raising an exception is something >subject to personal taste.

I guess, if you like blue more than red, but what if you like red more than blue? :) Ordered enums just don't usually make sense, and if they really did, you can coerce to int to do the comparison (but again, I've never needed it, so YAGNI).

>2) introducing something like that (class-based) wouldn't help >migrating the existent module-level constants we have in the stdlib. >Only new projects or new stdlib modules would benefit from it.

Sure, but I don't think this is necessarily about converting the stdlib. We rarely do such mass conversions anyway.

>3) other than being subject to personal taste, a const/enum type is >also pretty easy to implement.

True, depending on the semantics, syntax, and features you want.

>4) I'm getting the impression that the language is growing too big. To >me, this looks like yet another thing that infrequent users have to >learn before being able to read and understand Python code. >Also consider that people lived without const/enum for 2 decades now.

Well, I would agree that the *language* doesn't need them, but that's different than the stdlib. Maybe the stdlib still doesn't need them either. I don't personally care either way except to save me the trouble of writing up another PEP. :)

As for the language growing too big, maybe PyCon 2013 is time for another one of Guido's infamous polls!

Cheers, -Barry
-------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: not available URL:

From barry at python.org Thu Jan 31 22:06:57 2013 From: barry at python.org (Barry Warsaw) Date: Thu, 31 Jan 2013 16:06:57 -0500 Subject: [Python-ideas] Definition Symmetry References: <510AC6F1.1060503@ei-grad.ru> Message-ID: <20130131160657.4620f918@anarchist.wooz.org>

On Jan 31, 2013, at 11:33 PM, Andrew Grigorev wrote:

>Another strange thing is that the `raise` statement doesn't require you to >instantiate an Exception object; it allows you to pass an Exception class to it. >
>raise NotImplementedError
>raise NotImplementedError()
>
>Is there any difference between these two lines of code?

The main difference (I *think* this is still true) is that in the first example, if the exception is caught in C it can avoid instantiation.
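At the Python level the two spellings come out the same, which is easy to check (a quick sketch):

# When a class is raised, Python instantiates it with no arguments,
# so the except clause always sees an instance:
for exc in (NotImplementedError, NotImplementedError()):
    try:
        raise exc
    except NotImplementedError as e:
        assert isinstance(e, NotImplementedError)
        assert e.args == ()

# The class form cannot carry arguments, though;
# raise NotImplementedError('message') needs the call.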
Cheers, -Barry
-------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 836 bytes Desc: not available URL:

From greg.ewing at canterbury.ac.nz Thu Jan 31 22:31:32 2013 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 01 Feb 2013 10:31:32 +1300 Subject: [Python-ideas] Definition Symmetry In-Reply-To: References: Message-ID: <510AE2B4.8070202@canterbury.ac.nz>

Jason Keene wrote: > Just to be clear, I wasn't suggesting forcing parens for class > definitions. Rather, make them optional for functions!

That would introduce an asymmetry between function definitions and function calls -- parens would be required in the call but not the definition.

And before you say that this asymmetry currently exists between class definitions and class instantiations, it's not the same situation. What goes between the parens in a class definition is the base classes, not the arguments to the constructor.

-- Greg

From rosuav at gmail.com Thu Jan 31 22:43:51 2013 From: rosuav at gmail.com (Chris Angelico) Date: Fri, 1 Feb 2013 08:43:51 +1100 Subject: [Python-ideas] Definition Symmetry In-Reply-To: <510AB3EB.9020806@nedbatchelder.com> References: <510AA69D.1060300@mrabarnett.plus.com> <510AB3EB.9020806@nedbatchelder.com> Message-ID:

On Fri, Feb 1, 2013 at 5:11 AM, Ned Batchelder wrote: > I think parens for the superclass are an unfortunate syntax, since it looks > just like arguments to the class and is confusing for some beginners:
>
> def function(arg):
>     ...
> function(10)        # Similar syntax: 10 corresponds to arg
>
> class Thing(Something):
>     ...
> thing = Thing(10)   # How does 10 relate to Something? It doesn't.
>
> A better syntax (which I AM NOT PROPOSING) would be:
>
> class Thing from Something:

What about

class Thing = Something:
    pass

I am not proposing this either, but it would emphasize the difference between superclasses and __init__ args. But really, parens are used in many different ways. There doesn't need to be a logical parallel between generator expressions and function calls, for instance.

ChrisA

From rosuav at gmail.com Thu Jan 31 23:00:14 2013 From: rosuav at gmail.com (Chris Angelico) Date: Fri, 1 Feb 2013 09:00:14 +1100 Subject: [Python-ideas] Extend module objects to support properties In-Reply-To: <5109CE95.7060104@hastings.org> References: <51087225.3040801@hastings.org> <51094D8D.606@hastings.org> <5109BEC4.4050604@pearwood.info> <5109CE95.7060104@hastings.org> Message-ID:

On Thu, Jan 31, 2013 at 12:53 PM, Larry Hastings wrote: > But if the user assigns a different value to math.__dict__['pi'], math.pi > will diverge, which again could break code. (Who might try to assign a > different value to pi? The 1897 House Of Representatives of Indiana for > one!) > > More generally, it's often useful to monkeypatch "constants" at runtime, for > testing purposes (and for less justifiable purposes). Why prevent that? I > cite the Consenting Adults rule.

I've never actually been in the situation of doing it, but wouldn't it be reasonable to switch out math.pi to be (say) a decimal.Decimal rather than a float?
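For instance, something along these lines in a test (a sketch; whether the code under test copes with a Decimal pi is its own problem):

import math
from decimal import Decimal

original_pi = math.pi
math.pi = Decimal('3.14159265358979323846')
try:
    # code under test sees the swapped-out "constant"
    assert math.pi * 2 == Decimal('6.28318530717958647692')
finally:
    math.pi = original_pi    # always restore the module state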
ChrisA From ethan at stoneleaf.us Thu Jan 31 23:13:38 2013 From: ethan at stoneleaf.us (Ethan Furman) Date: Thu, 31 Jan 2013 14:13:38 -0800 Subject: [Python-ideas] Extend module objects to support properties In-Reply-To: <5109FB5D.2090109@hastings.org> References: <51087225.3040801@hastings.org> <51094D8D.606@hastings.org> <5109BEC4.4050604@pearwood.info> <5109CE95.7060104@hastings.org> <5109F19A.3060902@stoneleaf.us> <5109FB5D.2090109@hastings.org> Message-ID: <510AEC92.1060809@stoneleaf.us> On 01/30/2013 09:04 PM, Larry Hastings wrote: > On 01/30/2013 08:22 PM, Ethan Furman wrote: >> On 01/30/2013 05:53 PM, Larry Hastings wrote: >>> If we change math.pi to be a property it wouldn't be in the dict >>> anymore. So that has the possibility of breaking code. >> So make the property access the __dict__: > > In which case, it behaves exactly like it does today without a > property. Okay... so why bother? If your answer is "so it can have > code behind it", maybe you find a better example than math.pi, which > will never need code behind it. math.pi wasn't my example, I was just showing how you could use the __dict__ as well. Why bother? Backwards compatibility. I think I missed your main point of __dict__ access, though -- if it is set directly then the property doesn't get the chance to update whatever is supposed to update at the right moment, leading to weird (and most likely buggy) behavior. ~Ethan~