From solipsis at pitrou.net Sun Oct 1 08:19:24 2017 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sun, 1 Oct 2017 14:19:24 +0200 Subject: [Python-ideas] Changes to the existing optimization levels References: <20170928210248.65a06e4b@fsol> Message-ID: <20171001141924.33263d61@fsol> On Fri, 29 Sep 2017 17:33:11 +1000 Nick Coghlan wrote: > > That said, we may also want to consider a couple of other options > related to changing the meaning of *existing* parameters to these > APIs: > > 1. We have the PyCompilerFlags struct that's currently only used to > pass around feature flags for the __future__ module. It could gain a > second bitfield for optimisation options Not sure about that. PyCompilerFlags describes options that should be common to all implementations (since __future__ is part of the language spec). > 2. We could reinterpret "optimize" as a bitfield instead of a regular > integer, special casing the already defined values: > > - all zero: no optimizations > - sign bit set: negative -> use global settings > - 0x0001: nodebug+noassert > - 0x0002: nodebug+noassert+nodocstrings > - 0x0004: nodebug > - 0x0008: noassert > - 0x0010: nodocstrings Well, this is not really a bitfield, but a bitfield plus some irregular hardcoded values. Therefore I don't think it brings much in the way of discoverability / understandability. That said, perhaps it makes implementation easier on the C side... Regards Antoine. From ncoghlan at gmail.com Mon Oct 2 03:51:56 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 2 Oct 2017 17:51:56 +1000 Subject: [Python-ideas] Changes to the existing optimization levels In-Reply-To: <20171001141924.33263d61@fsol> References: <20170928210248.65a06e4b@fsol> <20171001141924.33263d61@fsol> Message-ID: On 1 October 2017 at 22:19, Antoine Pitrou wrote: >> 2. We could reinterpret "optimize" as a bitfield instead of a regular >> integer, special casing the already defined values: >> >> - all zero: no optimizations >> - sign bit set: negative -> use global settings >> - 0x0001: nodebug+noassert >> - 0x0002: nodebug+noassert+nodocstrings >> - 0x0004: nodebug >> - 0x0008: noassert >> - 0x0010: nodocstrings > > Well, this is not really a bitfield, but a bitfield plus some irregular > hardcoded values. Therefore I don't think it brings much in the way of > discoverability / understandability. That's why the 2-field struct for compiler flags was my first idea. > That said, perhaps it makes implementation easier on the C side... Yep, the fact it would avoid requiring any ABI changes for the C API is the main reason I think redefining the semantics of the existing int parameter is worth considering. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From w00t at fb.com Mon Oct 2 12:08:56 2017 From: w00t at fb.com (Wren Turkal) Date: Mon, 2 Oct 2017 16:08:56 +0000 Subject: [Python-ideas] allow overriding files used for the input builtin In-Reply-To: <> Message-ID: <5EBC3BD3-18BA-4A68-A4EC-A7AA40CE3B68@contoso.com> Adam, I just realized that I had not responded to you. I?m sorry for not replying to the message as this was one of the messages that came in digest form. I am referring to your message here: https://mail.python.org/pipermail/python-ideas/2017-September/047238.html I looked over your comment about how input() should be like print(), but there is an issue that I see with it. It seems print() allows overriding the file used for the output. It seems like input() should allow that for symmetry. 
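For concreteness, a rough sketch of the asymmetry being described: print() already accepts an explicit file argument, while redirecting input() today means temporarily swapping sys.stdin as a side effect (the redirect_stdin helper below is only an illustration -- there is no such context manager in contextlib, and an explicit-file form of input() is only the hypothetical symmetry being argued for):

    import contextlib
    import io
    import sys

    # print() can be pointed at any writable file object directly.
    buf = io.StringIO()
    print("hello", file=buf)
    assert buf.getvalue() == "hello\n"

    # input() offers no such argument; the usual workaround is a global swap.
    @contextlib.contextmanager
    def redirect_stdin(new_stdin):
        saved = sys.stdin
        sys.stdin = new_stdin
        try:
            yield
        finally:
            sys.stdin = saved

    with redirect_stdin(io.StringIO("a typed line\n")):
        line = input()          # reads from the StringIO, not the terminal
    assert line == "a typed line"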
FWIW, I think your idea about making some file ?readline? method more like input() is intriguing. I think I like the modification to input() better as it makes more sense to me as I can already change the file used by input, but it?s done through side effects now (i.e. setting sys.stdin to a different value). I think that removing the need for the side effect would be a good outcome. I feel like it?d at lease be more obvious what someone is doing than when I look at current use cases that redirect the stdin. wt -------------- next part -------------- An HTML attachment was scrubbed... URL: From w00t at fb.com Mon Oct 2 12:12:48 2017 From: w00t at fb.com (Wren Turkal) Date: Mon, 2 Oct 2017 16:12:48 +0000 Subject: [Python-ideas] allow overriding files used for the input builtin In-Reply-To: <> Message-ID: <90C86F21-6940-44F1-9093-7C78BDC7637A@contoso.com> Paul, Sorry for not replying to your message yet, I lost it in a well-intentioned try at digest delivery. Sorry about that, I am replying to this message: https://mail.python.org/pipermail/python-ideas/2017-September/047236.html I would guess that the context manager to override stdout does it globally. One of the reasons I like the params for input is that is doesn?t have to override the value globally by setting the module variable, which would affect other concurrent uses of input(). Thoughts? Thanks, wt -------------- next part -------------- An HTML attachment was scrubbed... URL: From w00t at fb.com Mon Oct 2 12:21:51 2017 From: w00t at fb.com (Wren Turkal) Date: Mon, 2 Oct 2017 16:21:51 +0000 Subject: [Python-ideas] allow overriding files used for the input builtin In-Reply-To: <> Message-ID: <22B40DBF-0064-4406-8643-08547BCB1B66@contoso.com> Steven, Sorry for not replying to your message due to my mailing list incompetence (configured for digest at first). I am replying to your message here: https://mail.python.org/pipermail/python-ideas/2017-September/047239.html I have not figured out the issue with readline/arrow keys on the overridden files. Consider this current iteration an RFC on the concept. I?m sure there are bugs in there. ? wt -------------- next part -------------- An HTML attachment was scrubbed... URL: From desmoulinmichel at gmail.com Tue Oct 3 04:18:13 2017 From: desmoulinmichel at gmail.com (Michel Desmoulin) Date: Tue, 3 Oct 2017 10:18:13 +0200 Subject: [Python-ideas] Add pop_callback to deque signature Message-ID: <17bc5c01-c434-a1d9-cbb9-65daa817ca1f@gmail.com> My initial proposal would have been to add a data structure to allow something like deque, but allowing cheaper random accesses. However I realized it was a use case more suitable for an external lib. Thinking about how I would implement such a new data structure, I imagined several possibilities, one of them would be to couple a deque with a dict or a list. However, sharing data between a deque and another data structure is hard because, while you can easily hook on the element going in (since you put them in yourself), there is no efficient way to get back an element on its way out. On lists or dicts, if you remove an element, you can pop() it and you get back the removed element. On deque, if you set a maxlen of 5 and add a 6th element, if you want to get the element that has been removed, you need to check if the maxlen has been reached, and if yes, get a reference to the first element, then add the new one. It's inconvenient and of course slower than it needs to be given that deque are quite fast. 
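To make that concrete, a small sketch of the workaround in question using only the current deque API (nothing here is proposed API, it is just the dance described above):

    from collections import deque

    def append_and_capture(dq, item):
        """Append item, returning whatever the bounded deque is about to evict."""
        evicted = None
        if dq.maxlen is not None and len(dq) == dq.maxlen:
            evicted = dq[0]              # this element will fall off the left end
        dq.append(item)
        return evicted

    d = deque(range(5), maxlen=5)        # full: 0, 1, 2, 3, 4
    assert append_and_capture(d, 5) == 0
    assert list(d) == [1, 2, 3, 4, 5]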
So my more modest and realistic proposal would be to add a callback on deque signature: collections.deque(iterable, maxlen, pop_callback) pop_callback would accept any callable, and call it everytime an element is removed from the deque, allowing third party libraries to then do whatever they need with it. From steve at pearwood.info Tue Oct 3 07:04:46 2017 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 3 Oct 2017 22:04:46 +1100 Subject: [Python-ideas] Add pop_callback to deque signature In-Reply-To: <17bc5c01-c434-a1d9-cbb9-65daa817ca1f@gmail.com> References: <17bc5c01-c434-a1d9-cbb9-65daa817ca1f@gmail.com> Message-ID: <20171003110446.GC13110@ando.pearwood.info> On Tue, Oct 03, 2017 at 10:18:13AM +0200, Michel Desmoulin wrote: > My initial proposal would have been to add a data structure to allow > something like deque, but allowing cheaper random accesses. That would be a list :-) Most data structures trade off performance in one area for performance in another. Lists have good random access, but slow insertion and deletion at the start. Deques have fast insertion and deletion at both ends, but slower random access. If it were possible to have a lightweight sequence data structure with fast O(1) random access AND insertion/deletion at the same time, that would be a candidate to replace both lists and deques in the stdlib. If you have such a data structure in mind, you should propose it. [...] > So my more modest and realistic proposal would be to add a callback on > deque signature: > > collections.deque(iterable, maxlen, pop_callback) > > pop_callback would accept any callable, and call it everytime an element > is removed from the deque, allowing third party libraries to then do > whatever they need with it. Adding callbacks to basic data structures like this seems like a hypergeneralisation. How often are they going to be used? Not often enough to make it worthwhile, I think. But if we did something like that, I think that following the design of dicts would be a closer match to Pythonic style than a callback. Dicts support a special __missing__ method in subclasses. Perhaps deque could do something similar? Subclass deque and give it a __dropped__ method, and whenever an item is dropped from the deque the method will be called with the value dropped (and perhaps a flag telling you which end it was dropped from). To me, that sounds like a more Pythonic interface than a callback. -- Steve From solipsis at pitrou.net Tue Oct 3 07:22:10 2017 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 3 Oct 2017 13:22:10 +0200 Subject: [Python-ideas] Add pop_callback to deque signature References: <17bc5c01-c434-a1d9-cbb9-65daa817ca1f@gmail.com> <20171003110446.GC13110@ando.pearwood.info> Message-ID: <20171003132210.02eca679@fsol> On Tue, 3 Oct 2017 22:04:46 +1100 Steven D'Aprano wrote: > > If it were possible to have a lightweight sequence data structure with > fast O(1) random access AND insertion/deletion at the same time, that > would be a candidate to replace both lists and deques in the stdlib. Daniel Stutzbach's blist is well-known at this point: http://stutzbachenterprises.com/performance-blist See http://legacy.python.org/dev/peps/pep-3128/ (rejected) and https://mail.python.org/pipermail/python-ideas/2014-September/029434.html (inconclusive) Regards Antoine. 
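(For reference, the dict hook that Steven's subclassing suggestion above is modelled on already exists; a deque equivalent would be the new, currently hypothetical part. A minimal example of the existing dict behaviour:)

    class CountingDict(dict):
        def __missing__(self, key):
            # called by dict.__getitem__ only when the key is absent
            self[key] = 0
            return 0

    c = CountingDict()
    c["spam"] += 1          # no KeyError: __missing__ supplies the default
    assert c["spam"] == 1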
From diana.joan.clarke at gmail.com Tue Oct 3 11:42:40 2017 From: diana.joan.clarke at gmail.com (Diana Clarke) Date: Tue, 3 Oct 2017 09:42:40 -0600 Subject: [Python-ideas] Changes to the existing optimization levels In-Reply-To: <20171001141924.33263d61@fsol> References: <20170928210248.65a06e4b@fsol> <20171001141924.33263d61@fsol> Message-ID: On Sun, Oct 1, 2017 at 6:19 AM, Antoine Pitrou wrote: > Well, this is not really a bitfield, but a bitfield plus some irregular > hardcoded values. Therefore I don't think it brings much in the way of > discoverability / understandability. > > That said, perhaps it makes implementation easier on the C side... I think I'm coming to the same conclusion: using bitwise operations for the optimization levels seems to just boil down to a more cryptic version of the simple "level 3" solution, with public-facing impacts to the pycache and existing interfaces etc that I don't think are worth it in this case. My only other thought at the moment, would be to use the existing -X option to achieve something similar to what I did with the new -N option, but then just quickly map that back to an integer under the hood. That is, "-X opt-nodebug -X opt-noassert" would just become "level 3" internally so that the various interfaces wouldn't have to change. But there are lots of downsides to that solution too: - having to hardcode the various possible combinations of string options to an integer value - inelegant lookups like: if flag is greater than 2 but not 10 or 15, etc - un-zen: yet even more ways to set that integer flag (PYTHONOPTIMIZE, -OOO, "-X opt-nodebug -X opt-noassert") - mixed-bag -X options are less discoverable than just adding a new command line option (like -N or -OOO) - other downsides, I'm sure Hmmm.... stalled again, I think. --diana From solipsis at pitrou.net Tue Oct 3 11:47:48 2017 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 3 Oct 2017 17:47:48 +0200 Subject: [Python-ideas] Changes to the existing optimization levels References: <20170928210248.65a06e4b@fsol> <20171001141924.33263d61@fsol> Message-ID: <20171003174748.2f33a4ad@fsol> On Tue, 3 Oct 2017 09:42:40 -0600 Diana Clarke wrote: > - mixed-bag -X options are less discoverable than just adding a > new command line option (like -N or -OOO) For such a feature, I think being less discoverable is not really a problem. I don't think many people use the -O flags currently, and among those that do I'm curious how many really benefit (as opposed to seeing that Python has an "optimization" flag and thinking "great, I'm gonna use that to make my code faster" without ever measuring the difference). Regards Antoine. From chris.barker at noaa.gov Tue Oct 3 12:13:53 2017 From: chris.barker at noaa.gov (Chris Barker) Date: Tue, 3 Oct 2017 10:13:53 -0600 Subject: [Python-ideas] Add pop_callback to deque signature In-Reply-To: <20171003132210.02eca679@fsol> References: <17bc5c01-c434-a1d9-cbb9-65daa817ca1f@gmail.com> <20171003110446.GC13110@ando.pearwood.info> <20171003132210.02eca679@fsol> Message-ID: On Tue, Oct 3, 2017 at 5:22 AM, Antoine Pitrou wrote: > Daniel Stutzbach's blist is well-known at this point: > http://stutzbachenterprises.com/performance-blist > > https://mail.python.org/pipermail/python-ideas/2014-September/029434.html > (inconclusive) > that was some years ago -- I wonder how much use it's seen? but I note: "I'm really reluctant to include `blist` as a dependency, given that it would basically mean my package wouldn't be pip-installable on Windows machines. 
" With binary wheels and all, this isn't the same issue it used to be -- so a third party package is more viable. Though on PyPi it hasn't been touched for 4 years -- so maybe didn't turn out to be that useful (Or jsut not that well known) -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From solipsis at pitrou.net Tue Oct 3 12:38:57 2017 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 3 Oct 2017 18:38:57 +0200 Subject: [Python-ideas] Add pop_callback to deque signature References: <17bc5c01-c434-a1d9-cbb9-65daa817ca1f@gmail.com> <20171003110446.GC13110@ando.pearwood.info> <20171003132210.02eca679@fsol> Message-ID: <20171003183857.1afa831e@fsol> On Tue, 3 Oct 2017 10:13:53 -0600 Chris Barker wrote: > On Tue, Oct 3, 2017 at 5:22 AM, Antoine Pitrou wrote: > > > Daniel Stutzbach's blist is well-known at this point: > > http://stutzbachenterprises.com/performance-blist > > > > > > > https://mail.python.org/pipermail/python-ideas/2014-September/029434.html > > (inconclusive) > > > > that was some years ago -- I wonder how much use it's seen? You can go a surprisingly long way with Python's built-in and stdlib containers, so I'm not surprised it's not very widely used. > but I note: > > "I'm really reluctant to include `blist` as a dependency, given that it > would basically mean my package wouldn't be pip-installable on Windows > machines. " > > With binary wheels and all, this isn't the same issue it used to be -- so a > third party package is more viable. I don't know what wheels are supposed to change here. You could already build binary packages for Windows before wheels existed. The problem as I understand it is that you need a Windows machine (or VM) together with the required set of compilers, and have to take the time to run the builds. With conda-forge though, one could simply submit a recipe and have all builds done automatically in the cloud. Your users then have to use conda (rather than pip and virtualenv). Regards Antoine. From chris.barker at noaa.gov Tue Oct 3 14:24:30 2017 From: chris.barker at noaa.gov (Chris Barker) Date: Tue, 3 Oct 2017 12:24:30 -0600 Subject: [Python-ideas] Add pop_callback to deque signature In-Reply-To: <20171003183857.1afa831e@fsol> References: <17bc5c01-c434-a1d9-cbb9-65daa817ca1f@gmail.com> <20171003110446.GC13110@ando.pearwood.info> <20171003132210.02eca679@fsol> <20171003183857.1afa831e@fsol> Message-ID: > > > > that was some years ago -- I wonder how much use it's seen? > > You can go a surprisingly long way with Python's built-in and stdlib > containers, so I'm not surprised it's not very widely used. Exactly ? what are the odds that list or deque performance is your bottleneck? However, the barrier to entry for a third party package is still quite a bit higher than the Stdlib. So if a third party package that provides nothing but a performance boost isn?t used much ? that doesn?t mean it wouldn?t be well-used if it were in the stdlib. Note: I am not advocating anything? I haven?t looked at blist at all. I don't know what wheels are supposed to change here. You could already > build binary packages for Windows before wheels existed. Yes, but with a different ecosystem an no virtual environment support. Being able to pip install binary wheels does make things easier for Windows users. 
The problem as > I understand it is that you need a Windows machine (or VM) together with > the required set of compilers, and have to take the time to run the > builds. Yup ? still the case. Also with Mac and OS-X. Distributing a package with a compiled component is still a lot more work. With conda-forge though, one could simply submit a recipe and have all > builds done automatically in the cloud. Your users then have to use > conda (rather than pip and I?m a big fan of conda and conda-forge. But a similar auto-build system could support binary wheels and pypi as well. And indeed, the scipy folks are doing just that. My point was that the infrastructure for delivering complied packaged is much better than it was even a few years ago. -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Wed Oct 4 00:34:24 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 4 Oct 2017 14:34:24 +1000 Subject: [Python-ideas] Changes to the existing optimization levels In-Reply-To: References: <20170928210248.65a06e4b@fsol> <20171001141924.33263d61@fsol> Message-ID: On 4 October 2017 at 01:42, Diana Clarke wrote: > On Sun, Oct 1, 2017 at 6:19 AM, Antoine Pitrou wrote: >> Well, this is not really a bitfield, but a bitfield plus some irregular >> hardcoded values. Therefore I don't think it brings much in the way of >> discoverability / understandability. >> >> That said, perhaps it makes implementation easier on the C side... > > I think I'm coming to the same conclusion: using bitwise operations > for the optimization levels seems to just boil down to a more cryptic > version of the simple "level 3" solution, with public-facing impacts > to the pycache and existing interfaces etc that I don't think are > worth it in this case. > > My only other thought at the moment, would be to use the existing -X > option to achieve something similar to what I did with the new -N > option, but then just quickly map that back to an integer under the > hood. That is, "-X opt-nodebug -X opt-noassert" would just become > "level 3" internally so that the various interfaces wouldn't have to > change. Sorry, I don't think I was entirely clear as to what my suggestion actually was: * Switch to your suggested "set-of-strings" API at the Python level, with the Python level integer interface retained only for backwards compatibility * Keep the current integer-based *C* optimization API, but redefine the way that value is interpreted, rather than passing Python sets around The Python APIs would then convert the Python level sets to the bitfield representation almost immediately for internal use, but you wouldn't need to mess about with the bitfield yourself when calling the Python APIs. The difference I see relates to the fact that in Python: * sets of strings are easier to work with than integer bitfields * adding a new keyword-only argument to existing APIs is straightforward While in C: * integer bitfields are easier to work with than Python sets of Python strings * supporting a new argument would mean defining a whole new parallel set of APIs Cheers, Nick. 
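A rough pure-Python sketch of the set-to-bitfield conversion step described above, reusing the illustrative flag names and bit values from earlier in this thread (none of this is an existing CPython API):

    _OPT_BITS = {
        "nodebug":      0x0004,
        "noassert":     0x0008,
        "nodocstrings": 0x0010,
    }

    def optimizations_to_int(optimizations):
        """Collapse a set of option names into the internal bitfield."""
        flags = 0
        for name in optimizations:
            try:
                flags |= _OPT_BITS[name]
            except KeyError:
                raise ValueError("unknown optimization: %r" % (name,))
        return flags

    assert optimizations_to_int({"nodebug", "noassert"}) == 0x000C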
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From ncoghlan at gmail.com Wed Oct 4 00:58:57 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 4 Oct 2017 14:58:57 +1000 Subject: [Python-ideas] Add pop_callback to deque signature In-Reply-To: References: <17bc5c01-c434-a1d9-cbb9-65daa817ca1f@gmail.com> <20171003110446.GC13110@ando.pearwood.info> <20171003132210.02eca679@fsol> <20171003183857.1afa831e@fsol> Message-ID: On 4 October 2017 at 04:24, Chris Barker wrote: > [Antoine] >> The problem as >> I understand it is that you need a Windows machine (or VM) together with >> the required set of compilers, and have to take the time to run the >> builds. > > Yup ? still the case. Also with Mac and OS-X. Distributing a package with a > compiled component is still a lot more work. The broad availability & popularity of AppVeyor's free tier is another relevant change compared to a few years ago, and https://github.com/joerick/cibuildwheel is designed to work with that to help projects automate their artifact builds without need to maintain their own cross-platform build infrastructure. So yeah, we're definitely to a point where adding new data structures to the standard library, or new features to existing data structures, is mainly going to be driven by standard library use cases that benefit from them, rather than "binary dependencies are still too hard to publish & manage". (For example, __missing__ made defaultdict easy to implement). For deque specifically, I like Steven D'Aprano's suggestion of a "__dropped__" or "__discard__" subclassing API that makes it straightforward to change the way that queue overruns are handled (especially if raising an exception from the new subclass method can prevent the collection modification entirely - that way you could readily change the deque semantics in a subclass such that if the queue fills up, submitters start getting errors instead of silently discarding older messages, allowing backpressure to be more easily propagated through a system of queues). Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From storchaka at gmail.com Wed Oct 4 01:47:46 2017 From: storchaka at gmail.com (Serhiy Storchaka) Date: Wed, 4 Oct 2017 08:47:46 +0300 Subject: [Python-ideas] Add pop_callback to deque signature In-Reply-To: References: <17bc5c01-c434-a1d9-cbb9-65daa817ca1f@gmail.com> <20171003110446.GC13110@ando.pearwood.info> <20171003132210.02eca679@fsol> <20171003183857.1afa831e@fsol> Message-ID: 04.10.17 07:58, Nick Coghlan ????: > For deque specifically, I like Steven D'Aprano's suggestion of a > "__dropped__" or "__discard__" subclassing API that makes it > straightforward to change the way that queue overruns are handled > (especially if raising an exception from the new subclass method can > prevent the collection modification entirely - that way you could > readily change the deque semantics in a subclass such that if the > queue fills up, submitters start getting errors instead of silently > discarding older messages, allowing backpressure to be more easily > propagated through a system of queues). Wouldn't this harm performance? Looking up the attribute of the type is more costly than pushing/popping the item in the deque. 
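For what it's worth, the semantics being discussed can already be emulated in pure Python, which may help pin down the intended behaviour and measure the subclassing overhead Serhiy mentions (the __dropped__ name is only the hypothetical hook from this thread, only append() is covered, and none of this is existing deque API):

    from collections import deque

    class DropAwareDeque(deque):
        """Rough emulation of the proposed drop hook for bounded deques."""

        def __init__(self, *args, **kwargs):
            super().__init__(*args, **kwargs)
            self.dropped = []

        def __dropped__(self, item):
            # subclass hook sketch: here we simply record the evicted item
            self.dropped.append(item)

        def append(self, item):
            if self.maxlen is not None and len(self) == self.maxlen:
                self.__dropped__(self[0])    # leftmost element is about to go
            super().append(item)

    q = DropAwareDeque([1, 2, 3], maxlen=3)
    q.append(4)
    assert list(q) == [2, 3, 4] and q.dropped == [1]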
From solipsis at pitrou.net Wed Oct 4 08:38:31 2017 From: solipsis at pitrou.net (Antoine Pitrou) Date: Wed, 4 Oct 2017 14:38:31 +0200 Subject: [Python-ideas] Add pop_callback to deque signature References: <17bc5c01-c434-a1d9-cbb9-65daa817ca1f@gmail.com> <20171003110446.GC13110@ando.pearwood.info> <20171003132210.02eca679@fsol> <20171003183857.1afa831e@fsol> Message-ID: <20171004143831.55f700cd@fsol> On Wed, 4 Oct 2017 08:47:46 +0300 Serhiy Storchaka wrote: > 04.10.17 07:58, Nick Coghlan ????: > > For deque specifically, I like Steven D'Aprano's suggestion of a > > "__dropped__" or "__discard__" subclassing API that makes it > > straightforward to change the way that queue overruns are handled > > (especially if raising an exception from the new subclass method can > > prevent the collection modification entirely - that way you could > > readily change the deque semantics in a subclass such that if the > > queue fills up, submitters start getting errors instead of silently > > discarding older messages, allowing backpressure to be more easily > > propagated through a system of queues). > > Wouldn't this harm performance? Looking up the attribute of the type is > more costly than pushing/popping the item in the deque. You would only do that for subtypes, so when Py_TYPE(self) is different from the base type. This is a simple pointer comparison. Nick's idea sounds nice to me, as long as there's an actual use case :-) Regards Antoine. From ncoghlan at gmail.com Wed Oct 4 08:48:19 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 4 Oct 2017 22:48:19 +1000 Subject: [Python-ideas] Add pop_callback to deque signature In-Reply-To: References: <17bc5c01-c434-a1d9-cbb9-65daa817ca1f@gmail.com> <20171003110446.GC13110@ando.pearwood.info> <20171003132210.02eca679@fsol> <20171003183857.1afa831e@fsol> Message-ID: On 4 October 2017 at 15:47, Serhiy Storchaka wrote: > 04.10.17 07:58, Nick Coghlan ????: >> >> For deque specifically, I like Steven D'Aprano's suggestion of a >> "__dropped__" or "__discard__" subclassing API that makes it >> straightforward to change the way that queue overruns are handled >> (especially if raising an exception from the new subclass method can >> prevent the collection modification entirely - that way you could >> readily change the deque semantics in a subclass such that if the >> queue fills up, submitters start getting errors instead of silently >> discarding older messages, allowing backpressure to be more easily >> propagated through a system of queues). > > Wouldn't this harm performance? Looking up the attribute of the type is more > costly than pushing/popping the item in the deque. Aye, that would need to be considered, and may push the API towards callback registration on the instance over using a subclassing API. The performance considerations in the dict case are different, since the default behaviour is to raise KeyError, where the cost of checking for the method doesn't matter much either because the command is about to terminate, or else because it is still much faster than instantiating the caught exception. However, most of the performance impact could also be avoided through a PyCheck_Exact that only checks for the method for subclasses, and not for regular deque instances. Cheers, Nick. 
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia From diana.joan.clarke at gmail.com Thu Oct 5 10:33:53 2017 From: diana.joan.clarke at gmail.com (Diana Clarke) Date: Thu, 5 Oct 2017 08:33:53 -0600 Subject: [Python-ideas] Changes to the existing optimization levels In-Reply-To: References: <20170928210248.65a06e4b@fsol> <20171001141924.33263d61@fsol> Message-ID: Thanks, Nick! I'll let this sink in today and give it a shot tomorrow. Have a great weekend, --diana > * Switch to your suggested "set-of-strings" API at the Python level, > with the Python level integer interface retained only for backwards > compatibility > * Keep the current integer-based *C* optimization API, but redefine > the way that value is interpreted, rather than passing Python sets > around > > The Python APIs would then convert the Python level sets to the > bitfield representation almost immediately for internal use, but you > wouldn't need to mess about with the bitfield yourself when calling > the Python APIs. > > The difference I see relates to the fact that in Python: > > * sets of strings are easier to work with than integer bitfields > * adding a new keyword-only argument to existing APIs is straightforward > > While in C: > > * integer bitfields are easier to work with than Python sets of Python strings > * supporting a new argument would mean defining a whole new parallel set of APIs From jhihn at gmx.com Thu Oct 5 11:40:32 2017 From: jhihn at gmx.com (Jason H) Date: Thu, 5 Oct 2017 17:40:32 +0200 Subject: [Python-ideas] Ternary operators in list comprehensions Message-ID: >>> a = [1,2,3] >>> [ x for x in a if x & 1] [1, 3] >>> [ x for x in a if x & 1 else 'even'] File "", line 1 [ x for x in a if x & 1 else 'even'] ^ SyntaxError: invalid syntax I expected [1, 'even', 3] I would expect that the if expression would be able to provide alternative values through else. The work around blows it out to: l = [] for x in a: if x&1: l.append(x) else: l.append('even') Unless there is a better way? From jelle.zijlstra at gmail.com Thu Oct 5 11:44:47 2017 From: jelle.zijlstra at gmail.com (Jelle Zijlstra) Date: Thu, 5 Oct 2017 08:44:47 -0700 Subject: [Python-ideas] Ternary operators in list comprehensions In-Reply-To: References: Message-ID: [x if x & 1 else 'even' for x in a] An `if` at the end of the comprehension means a condition on whether to include the value. Also, this question would have been better asked on python-list. 2017-10-05 8:40 GMT-07:00 Jason H : > >>> a = [1,2,3] > >>> [ x for x in a if x & 1] > [1, 3] > >>> [ x for x in a if x & 1 else 'even'] > File "", line 1 > [ x for x in a if x & 1 else 'even'] > ^ > SyntaxError: invalid syntax > > I expected [1, 'even', 3] > > I would expect that the if expression would be able to provide alternative > values through else. > > The work around blows it out to: > l = [] > for x in a: > if x&1: > l.append(x) > else: > l.append('even') > > > Unless there is a better way? > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From p.f.moore at gmail.com Thu Oct 5 11:45:29 2017 From: p.f.moore at gmail.com (Paul Moore) Date: Thu, 5 Oct 2017 16:45:29 +0100 Subject: [Python-ideas] Ternary operators in list comprehensions In-Reply-To: References: Message-ID: >>> a = [1,2,3] >>> [x if x & 1 else 'even' for x in a] [1, 'even', 3] You're mixing the if clause of the list comprehension up with a ternary expresssion. There's no "else" in the list comprehension if clause. Paul On 5 October 2017 at 16:40, Jason H wrote: >>>> a = [1,2,3] >>>> [ x for x in a if x & 1] > [1, 3] >>>> [ x for x in a if x & 1 else 'even'] > File "", line 1 > [ x for x in a if x & 1 else 'even'] > ^ > SyntaxError: invalid syntax > > I expected [1, 'even', 3] > > I would expect that the if expression would be able to provide alternative values through else. > > The work around blows it out to: > l = [] > for x in a: > if x&1: > l.append(x) > else: > l.append('even') > > > Unless there is a better way? > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From rob.cliffe at btinternet.com Thu Oct 5 12:24:42 2017 From: rob.cliffe at btinternet.com (Rob Cliffe) Date: Thu, 5 Oct 2017 17:24:42 +0100 Subject: [Python-ideas] Ternary operators in list comprehensions In-Reply-To: References: Message-ID: <9313b8bd-773f-13d5-d18d-0c156888f5b1@btinternet.com> Putting it another way, your example doesn't make sense.? How would you parenthesise it to make it clearer? ??? [ (x for x? in a if x & 1) else 'even'] ??? You have an "else" without an "if". ??? [ x for x? in (a if x & 1 else 'even')] ??? Using x before it has been defined, at least in this line of code. ??? [ (x for x? in a) if x & 1 else 'even']???? Ditto Other variants may be possible.? Whereas Jelle's correct version can be written as ??? [(x if x & 1 else 'even') for x in a]??? ??? [1, 'even', 3] Rob Cliffe On 05/10/2017 16:44, Jelle Zijlstra wrote: > [x if x & 1 else 'even' for x in a] > > An `if` at the end of the comprehension means a condition on whether > to include the value. > > Also, this question would have been better asked on python-list. > > 2017-10-05 8:40 GMT-07:00 Jason H >: > > >>> a = [1,2,3] > >>> [ x for x? in a if x & 1] > [1, 3] > >>> [ x for x? in a if x & 1 else 'even'] > ? File "", line 1 > ? ? [ x for x? in a if x & 1 else 'even'] > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ^ > SyntaxError: invalid syntax > > I expected [1, 'even', 3] > > I would expect that the if expression would be able to provide > alternative values through else. > > The work around blows it out to: > l = [] > for x in a: > ? if x&1: > ? ? l.append(x) > ? else: > ? ? l.append('even') > > > Unless there is a better way? > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > > Code of Conduct: http://python.org/psf/codeofconduct/ > > > > > > Virus-free. www.avg.com > > > > <#DAB4FAD8-2DD7-40BB-A1B8-4E2AA1F9FDF2> > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From steve at pearwood.info Thu Oct 5 13:00:22 2017 From: steve at pearwood.info (Steven D'Aprano) Date: Fri, 6 Oct 2017 04:00:22 +1100 Subject: [Python-ideas] Ternary operators in list comprehensions In-Reply-To: References: Message-ID: <20171005170022.GC9068@ando.pearwood.info> On Thu, Oct 05, 2017 at 05:40:32PM +0200, Jason H wrote: > >>> a = [1,2,3] > >>> [ x for x in a if x & 1] > [1, 3] > >>> [ x for x in a if x & 1 else 'even'] > File "", line 1 > [ x for x in a if x & 1 else 'even'] > ^ > SyntaxError: invalid syntax [(x if x & 1 else 'even') for x in a] The if clause in the list comprehension determines which items are included, and there is no "else" allowed. You don't want to skip any values: the list comp should have the same number of items as "a" (namely, 3) the comprehension if clause is inappropriate. Instead, you want the value in the comprehension to conditionally depend on x. By the way, this list is for suggesting improvements and new functionality to the language, not for asking for help. You should consider asking your questions on python-list or tutor mailing lists instead. -- Steve From ethan at ethanhs.me Fri Oct 6 16:00:40 2017 From: ethan at ethanhs.me (Ethan Smith) Date: Fri, 6 Oct 2017 13:00:40 -0700 Subject: [Python-ideas] PEP 561: Distributing Type Information V3 Message-ID: Hello, I have made some changes to my PEP on distributing type information. A summary of the changes: - Move to adding a new metadata specifier so that more packaging tools can participate - Clarify version matching between third party stubs and runtime packages. - various other fixes for clarity, readability, and removal of repetition As usual I have replicated a copy below. Cheers, Ethan PEP: 561 Title: Distributing and Packaging Type Information Author: Ethan Smith Status: Draft Type: Standards Track Content-Type: text/x-rst Created: 09-Sep-2017 Python-Version: 3.7 Post-History: Abstract ======== PEP 484 introduced type hinting to Python, with goals of making typing gradual and easy to adopt. Currently, typing information must be distributed manually. This PEP provides a standardized means to package and distribute type information and an ordering for type checkers to resolve modules and collect this information for type checking using existing packaging architecture. Rationale ========= Currently, package authors wish to distribute code that has inline type information. However, there is no standard method to distribute packages with inline type annotations or syntax that can simultaneously be used at runtime and in type checking. Additionally, if one wished to ship typing information privately the only method would be via setting ``MYPYPATH`` or the equivalent to manually point to stubs. If the package can be released publicly, it can be added to typeshed [1]_. However, this does not scale and becomes a burden on the maintainers of typeshed. Additionally, it ties bugfixes to releases of the tool using typeshed. PEP 484 has a brief section on distributing typing information. In this section [2]_ the PEP recommends using ``shared/typehints/pythonX.Y/`` for shipping stub files. However, manually adding a path to stub files for each third party library does not scale. The simplest approach people have taken is to add ``site-packages`` to their ``MYPYPATH``, but this causes type checkers to fail on packages that are highly dynamic (e.g. sqlalchemy and Django). Specification ============= There are several motivations and methods of supporting typing in a package. 
This PEP recognizes three (3) types of packages that may be created: 1. The package maintainer would like to add type information inline. 2. The package maintainer would like to add type information via stubs. 3. A third party would like to share stub files for a package, but the maintainer does not want to include them in the source of the package. This PEP aims to support these scenarios and make them simple to add to packaging and deployment. The two major parts of this specification are the packaging specifications and the resolution order for resolving module type information. The packaging spec is based on and extends PEP 345 metadata. The type checking spec is meant to replace the ``shared/typehints/pythonX.Y/`` spec of PEP 484 [2]_. New third party stub libraries are encouraged to distribute stubs via the third party packaging proposed in this PEP in place of being added to typeshed. Typeshed will remain in use, but if maintainers are found, third party stubs in typeshed are encouraged to be split into their own package. Packaging Type Information -------------------------- In order to make packaging and distributing type information as simple and easy as possible, the distribution of type information, and typed Python code is done through existing packaging frameworks. This PEP adds a new item to the ``*.distinfo/METADATA`` file to contain metadata about a package's support for typing. The new item is optional, but must have a name of ``Typed`` and have a value of either ``inline`` or ``stubs``, if present. Metadata Examples:: Typed: inline Typed: stubs Stub Only Packages '''''''''''''''''' For package maintainers wishing to ship stub files containing all of their type information, it is prefered that the ``*.pyi`` stubs are alongside the corresponding ``*.py`` files. However, the stubs may be put in a sub-folder of the Python sources, with the same name the ``*.py`` files are in. For example, the ``flyingcircus`` package would have its stubs in the folder ``flyingcircus/flyingcircus/``. This path is chosen so that if stubs are not found in ``flyingcircus/`` the type checker may treat the subdirectory as a normal package. The normal resolution order of checking ``*.pyi`` before ``*.py`` will be maintained. Third Party Stub Packages ''''''''''''''''''''''''' Third parties seeking to distribute stub files are encouraged to contact the maintainer of the package about distribution alongside the package. If the maintainer does not wish to maintain or package stub files or type information inline, then a "third party stub package" should be created. The structure is similar, but slightly different from that of stub only packages. If the stubs are for the library ``flyingcircus`` then the package should be named ``flyingcircus-stubs`` and the stub files should be put in a sub-directory named ``flyingcircus``. This allows the stubs to be checked as if they were in a regular package. In addition, the third party stub package should indicate which version(s) of the runtime package are supported by indicating the runtime package's version(s) through the normal dependency data. For example, if there was a stub package ``flyingcircus-stubs``, it can indicate the versions of the runtime ``flyingcircus`` package supported through ``install_requires`` in distutils based tools, or the equivalent in other packaging tools. 
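As a rough illustration of the layout described above (the ``setup()``
arguments below are only a sketch of one way to package such stubs with
setuptools, not part of this specification)::

    flyingcircus-stubs/
        setup.py
        flyingcircus/
            __init__.pyi

    # setup.py for the hypothetical flyingcircus-stubs distribution
    from setuptools import setup

    setup(
        name="flyingcircus-stubs",
        version="1.0",
        packages=["flyingcircus"],
        package_data={"flyingcircus": ["*.pyi"]},
        # constrain to the runtime versions these stubs describe
        install_requires=["flyingcircus>=1.0,<2.0"],
    )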
Type Checker Module Resolution Order ------------------------------------ The following is the order that type checkers supporting this PEP should resolve modules containing type information: 1. User code - the files the type checker is running on. 2. Stubs or Python source manually put in the beginning of the path. Type checkers should provide this to allow the user complete control of which stubs to use, and patch broken stubs/inline types from packages. 3. Third party stub packages - these packages can supersede the installed untyped packages. They can be found at ``pkg-stubs`` for package ``pkg``, however it is encouraged to check the package's metadata using packaging query APIs such as ``pkg_resources`` to assure that the package is meant for type checking, and is compatible with the installed version. 4. Inline packages - finally, if there is nothing overriding the installed package, and it opts into type checking. 5. Typeshed (if used) - Provides the stdlib types and several third party libraries Type checkers that check a different Python version than the version they run on must find the type information in the ``site-packages``/``dist-packages`` of that Python version. This can be queried e.g. ``pythonX.Y -c 'import site; print(site.getsitepackages())'``. It is also recommended that the type checker allow for the user to point to a particular Python binary, in case it is not in the path. To check if a package has opted into type checking, type checkers are recommended to use the ``pkg_resources`` module to query the package metadata. If the ``typed`` package metadata has ``None`` as its value, the package has not opted into type checking, and the type checker should skip that package. References ========== .. [1] Typeshed (https://github.com/python/typeshed) .. [2] PEP 484, Storing and Distributing Stub Files (https://www.python.org/dev/peps/pep-0484/#storing-and-distributing-stub-files) Copyright ========= This document has been placed in the public domain. .. Local Variables: mode: indented-text indent-tabs-mode: nil sentence-end-double-space: t fill-column: 70 coding: utf-8 End: -------------- next part -------------- An HTML attachment was scrubbed... URL: From k7hoven at gmail.com Sat Oct 7 15:20:02 2017 From: k7hoven at gmail.com (Koos Zevenhoven) Date: Sat, 7 Oct 2017 22:20:02 +0300 Subject: [Python-ideas] PEP draft: context variables In-Reply-To: References: Message-ID: Hi all, Thank you for the feedback so far. FYI, or as a reminder, this is now PEP 555, but the web version is still the same draft that I posted here. ?The discussion of this was paused as there was a lot going on at that moment, but I'm now getting ready to make a next version of the draft.? Below, I'll draft some changes I intend to make so they can already be discussed. First of all, I'm considering calling the concept "context arguments" instead of "context variables", because that describes the concept better. But see below for some more. On Tue, Sep 5, 2017 at 12:50 AM, Koos Zevenhoven wrote: > Hi all, > > as promised, here is a draft PEP for context variable semantics and > implementation. Apologies for the slight delay; I had a not-so-minor > autosave accident and had to retype the majority of this first draft. > > During the past years, there has been growing interest in something like > task-local storage or async-local storage. This PEP proposes an alternative > approach to solving the problems that are typically stated as motivation > for such concepts. 
> > This proposal is based on sketches of solutions since spring 2015, with > some minor influences from the recent discussion related to PEP 550. I can > also see some potential implementation synergy between this PEP and PEP > 550, even if the proposed semantics are quite different. > > So, here it is. This is the first draft and some things are still missing, > but the essential things should be there. > > -- Koos > > |||||||||||||||||||||||||||||||||||||||||||||||||||||||||| > > PEP: 999 > Title: Context-local variables (contextvars) > Version: $Revision$ > Last-Modified: $Date$ > Author: Koos Zevenhoven > Status: Draft > Type: Standards Track > Content-Type: text/x-rst > Created: DD-Mmm-YYYY > Post-History: DD-Mmm-YYYY > > > Abstract > ======== > > Sometimes, in special cases, it is desired that code can pass information > down the function call chain to the callees without having to explicitly > pass the information as arguments to each function in the call chain. This > proposal describes a construct which allows code to explicitly switch in > and out of a context where a certain context variable has a given value > assigned to it. This is a modern alternative to some uses of things like > global variables in traditional single-threaded (or thread-unsafe) code and > of thread-local storage in traditional *concurrency-unsafe* code (single- > or multi-threaded). In particular, the proposed mechanism can also be used > with more modern concurrent execution mechanisms such as asynchronously > executed coroutines, without the concurrently executed call chains > interfering with each other's contexts. > > The "call chain" can consist of normal functions, awaited coroutines, or > generators. The semantics of context variable scope are equivalent in all > cases, allowing code to be refactored freely into *subroutines* (which here > refers to functions, sub-generators or sub-coroutines) without affecting > the semantics of context variables. Regarding implementation, this proposal > aims at simplicity and minimum changes to the CPython interpreter and to > other Python interpreters. > > Rationale > ========= > > Consider a modern Python *call chain* (or call tree), which in this > proposal refers to any chained (nested) execution of *subroutines*, using > any possible combinations of normal function calls, or expressions using > ``await`` or ``yield from``. In some cases, passing necessary *information* > down the call chain as arguments can substantially complicate the required > function signatures, or it can even be impossible to achieve in practice. > In these cases, one may search for another place to store this information. > Let us look at some historical examples. > > The most naive option is to assign the value to a global variable or > similar, where the code down the call chain can access it. However, this > immediately makes the code thread-unsafe, because with multiple threads, > all threads assign to the same global variable, and another thread can > interfere at any point in the call chain. > > A somewhat less naive option is to store the information as per-thread > information in thread-local storage, where each thread has its own "copy" > of the variable which other threads cannot interfere with. Although > non-ideal, this has been the best solution in many cases. However, thanks > to generators and coroutines, the execution of the call chain can be > suspended and resumed, allowing code in other contexts to run concurrently. 
> Therefore, using thread-local storage is *concurrency-unsafe*, because > other call chains in other contexts may interfere with the thread-local > variable. > > Note that in the above two historical approaches, the stored information > has the *widest* available scope without causing problems. For a third > solution along the same path, one would first define an equivalent of a > "thread" for asynchronous execution and concurrency. This could be seen as > the largest amount of code and nested calls that is guaranteed to be > executed sequentially without ambiguity in execution order. This might be > referred to as concurrency-local or task-local storage. In this meaning of > "task", there is no ambiguity in the order of execution of the code within > one task. (This concept of a task is close to equivalent to a ``Task`` in > ``asyncio``, but not exactly.) In such concurrency-locals, it is possible > to pass information down the call chain to callees without another code > path interfering with the value in the background. > > Common to the above approaches is that they indeed use variables with a > wide but just-narrow-enough scope. Thread-locals could also be called > thread-wide globals---in single-threaded code, they are indeed truly > global. And task-locals could be called task-wide globals, because tasks > can be very big. > > The issue here is that neither global variables, thread-locals nor > task-locals are really meant to be used for this purpose of passing > information of the execution context down the call chain. Instead of the > widest possible variable scope, the scope of the variables should be > controlled by the programmer, typically of a library, to have the desired > scope---not wider. In other words, task-local variables (and globals and > thread-locals) have nothing to do with the kind of context-bound > information passing that this proposal intends to enable, even if > task-locals can be used to emulate the desired semantics. Therefore, in the > following, this proposal describes the semantics and the outlines of an > implementation for *context-local variables* (or context variables, > contextvars). In fact, as a side effect of this PEP, an async framework can > use the proposed feature to implement task-local variables. > > Proposal > ======== > > Because the proposed semantics are not a direct extension to anything > already available in Python, this proposal is first described in terms of > semantics and API at a fairly high level. In particular, Python ``with`` > statements are heavily used in the description, as they are a good match > with the proposed semantics. However, the underlying ``__enter__`` and > ``__exit__`` methods correspond to functions in the lower-level > speed-optimized (C) API. For clarity of this document, the lower-level > functions are not explicitly named in the definition of the semantics. > After describing the semantics and high-level API, the implementation is > described, going to a lower level. > > Semantics and higher-level API > ------------------------------ > > Core concept > '''''''''''' > > A context-local variable is represented by a single instance of > ``contextvars.Var``, say ``cvar``. Any code that has access to the ``cvar`` > object can ask for its value with respect to the current context. In the > high-level API, this value is given by the ``cvar.value`` property:: > > cvar = contextvars.Var(default="the default value", > description="example context variable") > > ?Some points related to the arguments and naming:? 
Indeed, this might change to contextvars.Arg. After all, these are more like arguments than variables. But just like with function arguments, you can use a mutable value, which allows more variable-like semantics. That is, however, not the primarily intended use. It may also cause more problems at inter-process or inter-interpreter boundaries etc., where direct mutation of objects may not be possible.? ?I might have to remove the ``default`` argument, at least in this form. If there is a default, it should be more explicit what the scope of the default is. There could be thread-wide defaults or interpreter-wide defaults and so on.? It is not completely clear what a truly global default would mean. One way to deal with this would be to always pass the context on to other threads and processes etc when they are created. But there are some ambiguities here too, so the safest way might be to let the user implement the desired semantics regarding defaults and thread boundaries etc. > assert cvar.value == "the default value" # default still applies > > # In code examples, all ``assert`` statements should > # succeed according to the proposed semantics. > > > No assignments to ``cvar`` have been applied for this context, so > ``cvar.value`` gives the default value. Assigning new values to contextvars > is done in a highly scope-aware manner:: > > with cvar.assign(new_value): > assert cvar.value is new_value > # Any code here, or down the call chain from here, sees: > # cvar.value is new_value > # unless another value has been assigned in a > # nested context > assert cvar.value is new_value > # the assignment of ``cvar`` to ``new_value`` is no longer visible > assert cvar.value == "the default value" > > > Here, ``cvar.assign(value)`` returns another object, namely > ``contextvars.Assignment(cvar, new_value)``. The essential part here is > that applying a context variable assignment (``Assignment.__enter__``) is > paired with a de-assignment (``Assignment.__exit__``). These operations set > the bounds for the scope of the assigned value. > > Assignments to the same context variable can be nested to override the > outer assignment in a narrower context:: > > assert cvar.value == "the default value" > with cvar.assign("outer"): > assert cvar.value == "outer" > with cvar.assign("inner"): > assert cvar.value == "inner" > assert cvar.value == "outer" > assert cvar.value == "the default value" > > > Also multiple variables can be assigned to in a nested manner without > affecting each other:: > > cvar1 = contextvars.Var() > cvar2 = contextvars.Var() > > assert cvar1.value is None # default is None by default > assert cvar2.value is None > > with cvar1.assign(value1): > assert cvar1.value is value1 > assert cvar2.value is None > with cvar2.assign(value2): > assert cvar1.value is value1 > assert cvar2.value is value2 > assert cvar1.value is value1 > assert cvar2.value is None > assert cvar1.value is None > assert cvar2.value is None > > Or with more convenient Python syntax:: > > with cvar1.assign(value1), cvar2.assign(value2): > assert cvar1.value is value1 > assert cvar2.value is value2 > > In another *context*, in another thread or otherwise concurrently executed > task or code path, the context variables can have a completely different > state. The programmer thus only needs to worry about the context at hand. > > Refactoring into subroutines > '''''''''''''''''''''''''''' > > Code using contextvars can be refactored into subroutines without > affecting the semantics. 
For instance:: > > assi = cvar.assign(new_value) > def apply(): > assi.__enter__() > assert cvar.value == "the default value" > apply() > assert cvar.value is new_value > assi.__exit__() > assert cvar.value == "the default value" > > > Or similarly in an asynchronous context where ``await`` expressions are > used. The subroutine can now be a coroutine:: > > assi = cvar.assign(new_value) > async def apply(): > assi.__enter__() > assert cvar.value == "the default value" > await apply() > assert cvar.value is new_value > assi.__exit__() > assert cvar.value == "the default value" > > > Or when the subroutine is a generator:: > > def apply(): > yield > assi.__enter__() > > > which is called using ``yield from apply()`` or with calls to ``next`` or > ``.send``. This is discussed further in later sections. > > Semantics for generators and generator-based coroutines > ''''''''''''''''''''''''''''''''''''''''''''''''''''''' > > Generators, coroutines and async generators act as subroutines in much the > same way that normal functions do. However, they have the additional > possibility of being suspended by ``yield`` expressions. Assignment > contexts entered inside a generator are normally preserved across yields:: > > def genfunc(): > with cvar.assign(new_value): > assert cvar.value is new_value > yield > assert cvar.value is new_value > g = genfunc() > next(g) > assert cvar.value == "the default value" > with cvar.assign(another_value): > next(g) > > > However, the outer context visible to the generator may change state > across yields:: > > def genfunc(): > assert cvar.value is value2 > yield > assert cvar.value is value1 > yield > with cvar.assign(value3): > assert cvar.value is value3 > > with cvar.assign(value1): > g = genfunc() > with cvar.assign(value2): > next(g) > next(g) > next(g) > assert cvar.value is value1 > > > Similar semantics apply to async generators defined by ``async def ... > yield ...`` ). > > By default, values assigned inside a generator do not leak through yields > to the code that drives the generator. However, the assignment contexts > entered and left open inside the generator *do* become visible outside the > generator after the generator has finished with a ``StopIteration`` or > another exception:: > > assi = cvar.assign(new_value) > def genfunc(): > yield > assi.__enter__(): > yield > > g = genfunc() > assert cvar.value == "the default value" > next(g) > assert cvar.value == "the default value" > next(g) # assi.__enter__() is called here > assert cvar.value == "the default value" > next(g) > assert cvar.value is new_value > assi.__exit__() > > > > Special functionality for framework authors > ------------------------------------------- > > Frameworks, such as ``asyncio`` or third-party libraries, can use > additional functionality in ``contextvars`` to achieve the desired > semantics in cases which are not determined by the Python interpreter. Some > of the semantics described in this section are also afterwards used to > describe the internal implementation. 
> > Leaking yields > '''''''''''''' > > Using the ``contextvars.leaking_yields`` decorator, one can choose to leak > the context through ``yield`` expressions into the outer context that > drives the generator:: > > @contextvars.leaking_yields > def genfunc(): > assert cvar.value == "outer" > with cvar.assign("inner"): > yield > assert cvar.value == "inner" > assert cvar.value == "outer" > > g = genfunc(): > with cvar.assign("outer"): > assert cvar.value == "outer" > next(g) > assert cvar.value == "inner" > next(g) > assert cvar.value == "outer" > > > ?Unfortunately, we actually need a third kind of generator semantics, something like this: @?contextvars.caller_context def genfunc(): assert cvar.value is the_value yield assert cvar.value is the_value with cvar.assign(the_value): gen = genfunc() next(gen) with cvar.assign(1234567890): try: next(gen) except StopIteration: pass Nick, Yury and I (and Nathaniel, Guido, Jim, ...?) somehow just narrowly missed the reasons for this in discussions related to PEP 550. Perhaps because we had mostly been looking at it from an async angle. [In addition to this, all context changes (Assignment __enter__ or __exit__) would be leaked out when the generator finishes iff there are no outer context changes. If there are outer context changes, an attempt to leak changes will fail. (I will probably need to explain this better).] ? > Capturing contextvar assignments > '''''''''''''''''''''''''''''''' > > Using ``contextvars.capture()``, one can capture the assignment contexts > that are entered by a block of code. The changes applied by the block of > code can then be reverted and subsequently reapplied, even in another > context:: > > assert cvar1.value is None # default > assert cvar2.value is None # default > assi1 = cvar1.assign(value1) > assi2 = cvar1.assign(value2) > with contextvars.capture() as delta: > assi1.__enter__() > with cvar2.assign("not captured"): > assert cvar2.value is "not captured" > assi2.__enter__() > assert cvar1.value is value2 > delta.revert() > assert cvar1.value is None > assert cvar2.value is None > ... > with cvar1.assign(1), cvar2.assign(2): > delta.reapply() > assert cvar1.value is value2 > assert cvar2.value == 2 > > > However, reapplying the "delta" if its net contents include deassignments > may not be possible (see also Implementation and Open Issues). > > > Getting a snapshot of context state > ''''''''''''''''''''''''''''''''''' > > The function ``contextvars.get_local_state()`` returns an object > representing the applied assignments to all context-local variables in the > context where the function is called. This can be seen as equivalent to > using ``contextvars.capture()`` to capture all context changes from the > beginning of execution. The returned object supports methods ``.revert()`` > and ``reapply()`` as above. > > ?We will probably need also a ``use()`` method (or another name) here. That would return a context manager that applies the full context on __enter__ and reapplies the previous one on __exit__. > > Running code in a clean state > ''''''''''''''''''''''''''''' > > Although it is possible to revert all applied context changes using the > above primitives, a more convenient way to run a block of code in a clean > context is provided:: > > with context_vars.clean_context(): > # here, all context vars start off with their default values > # here, the state is back to what it was before the with block. > > > ? 
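For the ``capture()`` / ``revert()`` / ``reapply()`` idea above, a very rough, non-thread-safe sketch on top of a single global assignment stack could look like the following. All names and details here are invented for illustration; a real implementation would also have to deal with threads, out-of-order exits and the generator semantics described earlier.

    _stack = []   # global assignment stack of (var, value) pairs (sketch only)

    def lookup(var, default=None):
        # the topmost assignment of `var` wins
        for v, value in reversed(_stack):
            if v is var:
                return value
        return default

    class capture:
        """Record the assignments pushed while the with-block is active."""

        def __enter__(self):
            self._start = len(_stack)
            return self

        def __exit__(self, *exc_info):
            self._delta = _stack[self._start:]
            return False

        def revert(self):
            del _stack[len(_stack) - len(self._delta):]

        def reapply(self):
            _stack.extend(self._delta)

    cvar1 = object()
    with capture() as delta:
        _stack.append((cvar1, "value1"))   # like cvar1.assign(value1).__enter__()
    assert lookup(cvar1) == "value1"
    delta.revert()
    assert lookup(cvar1) is None
    delta.reapply()
    assert lookup(cvar1) == "value1"
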
> ?As an additional tool, there could be contextvars.callback: @contextvars.callback def some_callback(): # do stuff This would provide some of the functionality of this PEP if callbacks are used, so that the callback would be run with the same context as the code that creates the callback. The implementation of this would be essentially: def callback(func): context = contextvars.get_local_context(): def wrapped(*args, **kwargs): with context.use(): func(*args, **kwargs return wrapped With some trickery this might allow an async framework based on callbacks instead of coroutines to use context arguments. But using this m?ight be a bit awkward sometimes. A contextlib.ExitStack might help here. ? > Implementation > -------------- > > This section describes to a variable level of detail how the described > semantics can be implemented. At present, an implementation aimed at > simplicity but sufficient features is described. More details will be added > later. > > Alternatively, a somewhat more complicated implementation offers minor > additional features while adding some performance overhead and requiring > more code in the implementation. > > Data structures and implementation of the core concept > '''''''''''''''''''''''''''''''''''''''''''''''''''''' > > Each thread of the Python interpreter keeps its on stack of > ``contextvars.Assignment`` objects, each having a pointer to the previous > (outer) assignment like in a linked list. The local state (also returned by > ``contextvars.get_local_state()``) then consists of a reference to the > top of the stack and a pointer/weak reference to the bottom of the stack. > This allows efficient stack manipulations. An object produced by > ``contextvars.capture()`` is similar, but refers to only a part of the > stack with the bottom reference pointing to the top of the stack as it was > in the beginning of the capture block. > > Now, the stack evolves according to the assignment ``__enter__`` and > ``__exit__`` methods. For example:: > > cvar1 = contextvars.Var() > cvar2 = contextvars.Var() > # stack: [] > assert cvar1.value is None > assert cvar2.value is None > > with cvar1.assign("outer"): > # stack: [Assignment(cvar1, "outer")] > assert cvar1.value == "outer" > > with cvar1.assign("inner"): > # stack: [Assignment(cvar1, "outer"), > # Assignment(cvar1, "inner")] > assert cvar1.value == "inner" > > with cvar2.assign("hello"): > # stack: [Assignment(cvar1, "outer"), > # Assignment(cvar1, "inner"), > # Assignment(cvar2, "hello")] > assert cvar2.value == "hello" > > # stack: [Assignment(cvar1, "outer"), > # Assignment(cvar1, "inner")] > assert cvar1.value == "inner" > assert cvar2.value is None > > # stack: [Assignment(cvar1, "outer")] > assert cvar1.value == "outer" > > # stack: [] > assert cvar1.value is None > assert cvar2.value is None > > > Getting a value from the context using ``cvar1.value`` can be implemented > as finding the topmost occurrence of a ``cvar1`` assignment on the stack > and returning the value there, or the default value if no assignment is > found on the stack. However, this can be optimized to instead be an O(1) > operation in most cases. Still, even searching through the stack may be > reasonably fast since these stacks are not intended to grow very large. > ?I will still need to explain the O(1) algorithm, but one nice thing is that an implementation like micropython does not necessarily need to include that optimization.? > > The above description is already sufficient for implementing the core > concept. 
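As a concrete model of the data structure described above, the per-thread stack of assignments can be pictured as a linked list of nodes, with value lookup walking from the top of the stack towards the bottom. This is only a simplified, single-threaded illustration with invented names; it leaves out the O(1) caching and is not the proposed implementation itself.

    class AssignmentNode:
        def __init__(self, var, value, previous):
            self.var = var
            self.value = value
            self.previous = previous      # next outer assignment, or None


    class ThreadLocalState:
        def __init__(self):
            self.top = None               # top of the assignment stack

        def push(self, var, value):       # what Assignment.__enter__ would do
            self.top = AssignmentNode(var, value, self.top)

        def pop(self):                    # what Assignment.__exit__ would do
            self.top = self.top.previous

        def lookup(self, var, default=None):
            node = self.top
            while node is not None:       # topmost occurrence wins
                if node.var is var:
                    return node.value
                node = node.previous
            return default


    state = ThreadLocalState()
    cvar1, cvar2 = object(), object()

    state.push(cvar1, "outer")
    state.push(cvar1, "inner")
    state.push(cvar2, "hello")
    assert state.lookup(cvar1) == "inner"
    assert state.lookup(cvar2) == "hello"
    state.pop()                           # leave the cvar2 assignment
    state.pop()                           # leave the inner cvar1 assignment
    assert state.lookup(cvar1) == "outer"
    assert state.lookup(cvar2) is None
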
Suspendable frames require some additional attention, as explained > in the following. > > Implementation of generator and coroutine semantics > ''''''''''''''''''''''''''''''''''''''''''''''''''' > > Within generators, coroutines and async generators, assignments and > deassignments are handled in exactly the same way as anywhere else. > However, some changes are needed in the builtin generator methods ``send``, > ``__next__``, ``throw`` and ``close``. Here is the Python equivalent of the > changes needed in ``send`` for a generator (here ``_old_send`` refers to > the behavior in Python 3.6):: > > def send(self, value): > # if decorated with contextvars.leaking_yields > if self.gi_contextvars is LEAK: > # nothing needs to be done to leak context through yields :) > return self._old_send(value) > try: > with contextvars.capture() as delta: > if self.gi_contextvars: > # non-zero captured content from previous iteration > self.gi_contextvars.reapply() > ret = self._old_send(value) > except Exception: > raise > else: > # suspending, revert context changes but > delta.revert() > self.gi_contextvars = delta > return ret > > > The corresponding modifications to the other methods is essentially > identical. The same applies to coroutines and async generators. > > For code that does not use ``contextvars``, the additions are O(1) and > essentially reduce to a couple of pointer comparisons. For code that does > use ``contextvars``, the additions are still O(1) in most cases. > > More on implementation > '''''''''''''''''''''' > > The rest of the functionality, including ``contextvars.leaking_yields``, > contextvars.capture()``, ``contextvars.get_local_state()`` and > ``contextvars.clean_context()`` are in fact quite straightforward to > implement, but their implementation will be discussed further in later > versions of this proposal. Caching of assigned values is somewhat more > complicated, and will be discussed later, but it seems that most cases > should achieve O(1) complexity. > > Backwards compatibility > ======================= > > There are no *direct* backwards-compatibility concerns, since a completely > new feature is proposed. > > However, various traditional uses of thread-local storage may need a > smooth transition to ``contextvars`` so they can be concurrency-safe. There > are several approaches to this, including emulating task-local storage with > a little bit of help from async frameworks. A fully general implementation > cannot be provided, because the desired semantics may depend on the design > of the framework. > > ?I have a preliminary design for this, but probably doesn't need to be in this PEP.? > Another way to deal with the transition is for code to first look for a > context created using ``contextvars``. If that fails because a new-style > context has not been set or because the code runs on an older Python > version, a fallback to thread-local storage is used. > > ?If context variables are renamed context arguments, then there could be a settable variant called a context variable (could also be a third-party thing on top of context arguments, depending on what is done with decimal contexts).? > > Open Issues > =========== > > Out-of-order de-assignments > --------------------------- > > In this proposal, all variable deassignments are made in the opposite > order compared to the preceding assignments. 
This has two useful > properties: it encourages using ``with`` statements to define assignment > scope and has a tendency to catch errors early (forgetting a > ``.__exit__()`` call often results in a meaningful error. To have this as a > requirement requirement is beneficial also in terms of implementation > simplicity and performance. Nevertheless, allowing out-of-order context > exits is not completely out of the question, and reasonable implementation > strategies for that do exist. > > ? > Rejected Ideas > ============== > > Dynamic scoping linked to subroutine scopes > ------------------------------------------- > > The scope of value visibility should not be determined by the way the code > is refactored into subroutines. It is necessary to have per-variable > control of the assignment scope. > > ?In fact, in early sketches, my approach was closer to this. The context variables (or async variables) were stored in frame locals in a namespace called `__async__` and they were propagated through subroutine calls to callees. But this introduces problems when new scope layers are added, and ended up being more complicated (and slightly similar to PEP 550). Anyway, for starters, this was a glimpse of the changes I have planned, and open for discussion. -- Koos ?-- + Koos Zevenhoven + http://twitter.com/k7hoven + -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Sat Oct 7 17:16:48 2017 From: njs at pobox.com (Nathaniel Smith) Date: Sat, 7 Oct 2017 14:16:48 -0700 Subject: [Python-ideas] PEP draft: context variables In-Reply-To: References: Message-ID: On Oct 7, 2017 12:20, "Koos Zevenhoven" wrote: ?Unfortunately, we actually need a third kind of generator semantics, something like this: @?contextvars.caller_context def genfunc(): assert cvar.value is the_value yield assert cvar.value is the_value with cvar.assign(the_value): gen = genfunc() next(gen) with cvar.assign(1234567890): try: next(gen) except StopIteration: pass Nick, Yury and I (and Nathaniel, Guido, Jim, ...?) somehow just narrowly missed the reasons for this in discussions related to PEP 550. Perhaps because we had mostly been looking at it from an async angle. That's certainly a semantics that one can write down (and it's what the very first version of PEP 550 did), but why do you say it's needed? What are these reasons that were missed? Do you have a use case? -n -------------- next part -------------- An HTML attachment was scrubbed... URL: From k7hoven at gmail.com Sat Oct 7 18:40:22 2017 From: k7hoven at gmail.com (Koos Zevenhoven) Date: Sun, 8 Oct 2017 01:40:22 +0300 Subject: [Python-ideas] PEP draft: context variables In-Reply-To: References: Message-ID: On Sun, Oct 8, 2017 at 12:16 AM, Nathaniel Smith wrote: > On Oct 7, 2017 12:20, "Koos Zevenhoven" wrote: > > > ?Unfortunately, we actually need a third kind of generator semantics, > something like this: > > @?contextvars.caller_context > def genfunc(): > assert cvar.value is the_value > yield > assert cvar.value is the_value > > with cvar.assign(the_value): > gen = genfunc() > > next(gen) > > with cvar.assign(1234567890): > try: > next(gen) > except StopIteration: > pass > > Nick, Yury and I (and Nathaniel, Guido, Jim, ...?) somehow just narrowly > missed the reasons for this in discussions related to PEP 550. Perhaps > because we had mostly been looking at it from an async angle. 
> > > That's certainly a semantics that one can write down (and it's what the > very first version of PEP 550 did), > ??I do remember Yury mentioning that the first draft of PEP 550 captured something when the generator function was called. I think I started reading the discussions after that had already been removed, so I don't know exactly what it was. But I doubt that it was *exactly* the above, because PEP 550 uses set and get operations instead of "assignment contexts" like PEP 555 (this one) does. ?? > but why do you say it's needed? What are these reasons that were missed? > Do you have a use case? > > ?Yes, there's a type of use case. When you think of a generator function as a function that returns an iterable of values and you don't care about whether the values are computed lazily or not. In that case, you don't want next() or .send() to affect the context inside the generator. ?In terms of code, we might want this: def values(): # compute some values using cvar.value and return a_list_of_values with cvar.assign(something): data = values() datalist = list(data) ...to be equivalent to: def values(): # compute some values using cvar.value and # yield them one by one ?with cvar.assign(something): data = values() datalist = list(data) So we don't want the "lazy evaluation" of generators to affect the values "in" the iterable. But I think we had our minds too deep in event loops and chains of coroutines and async generators to realize this. Initially, this seems to do the wrong thing in many other cases, but in fact, with the right extension to this behavior, we get the right thing in almost all situtations. We still do need the other generator behaviors described in PEP 555, for async and other uses, but I would probably go as far as making this new one the default. But I kept the decorator syntax for now. -- Koos -- + Koos Zevenhoven + http://twitter.com/k7hoven + -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Mon Oct 9 02:46:13 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 9 Oct 2017 16:46:13 +1000 Subject: [Python-ideas] PEP draft: context variables In-Reply-To: References: Message-ID: On 8 October 2017 at 08:40, Koos Zevenhoven wrote: > On Sun, Oct 8, 2017 at 12:16 AM, Nathaniel Smith wrote: > >> On Oct 7, 2017 12:20, "Koos Zevenhoven" wrote: >> >> >> ?Unfortunately, we actually need a third kind of generator semantics, >> something like this: >> >> @?contextvars.caller_context >> def genfunc(): >> assert cvar.value is the_value >> yield >> assert cvar.value is the_value >> >> with cvar.assign(the_value): >> gen = genfunc() >> >> next(gen) >> >> with cvar.assign(1234567890): >> try: >> next(gen) >> except StopIteration: >> pass >> >> Nick, Yury and I (and Nathaniel, Guido, Jim, ...?) somehow just narrowly >> missed the reasons for this in discussions related to PEP 550. Perhaps >> because we had mostly been looking at it from an async angle. >> >> >> That's certainly a semantics that one can write down (and it's what the >> very first version of PEP 550 did), >> > > ??I do remember Yury mentioning that the first draft of PEP 550 captured > something when the generator function was called. I think I started reading > the discussions after that had already been removed, so I don't know > exactly what it was. But I doubt that it was *exactly* the above, because > PEP 550 uses set and get operations instead of "assignment contexts" like > PEP 555 (this one) does. ?? 
> We didn't forget it, we just don't think it's very useful. However, if you really want those semantics under PEP 550, you can do something like this: def use_creation_context(g): @functools.wraps(g) def make_generator_wrapper(*args, **kwds): gi = g(*args, **kwds) return _GeneratorWithCapturedEC(gi) return make_generator_wrapper class _GeneratorWithCapturedEC: def __init__(self, gi): self._gi = gi self._ec = contextvars.get_execution_context() def __next__(self): return self.send(None) def send(self, value): return contextvars.run_with_execution_context(self.ec, self._gi.send, value) def throw(self, *exc_details): return contextvars.run_with_execution_context(self.ec, self._gi.throw, *exc_details) def close(self): return self.throw(GeneratorExit) Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia -------------- next part -------------- An HTML attachment was scrubbed... URL: From alexandre.galode at gmail.com Mon Oct 9 06:23:02 2017 From: alexandre.galode at gmail.com (alexandre.galode at gmail.com) Date: Mon, 9 Oct 2017 03:23:02 -0700 (PDT) Subject: [Python-ideas] Fwd: Fwd: A PEP to define basical metric which allows to guarantee minimal code quality In-Reply-To: References: <4c594053-c431-4b29-a8ac-d5c422e6a06f@googlegroups.com> <20170915101954.GF13110@ando.pearwood.info> <-1213469140971527789@unknownmsgid> Message-ID: Hi, After some reflexion on this full thread, with all your arguments and discussion with my team, i have finally a better understanding on PEP finality. I saw that PEP 8 & 20 i used as example are "specials" PEP. So i let my idea here, and eventually, as previously suggested, i'll contact PYCQA. Thank you very much everybody for your help and your attention :) Le mardi 26 septembre 2017 04:54:45 UTC+2, Nick Coghlan a ?crit : > > Forwarding my reply, since Google Groups still can't get the Reply-To > headers for the mailing list right, and we still don't know how to > categorically prohibit posting from there. > > ---------- Forwarded message ---------- > From: Nick Coghlan > > Date: 26 September 2017 at 12:51 > Subject: Re: [Python-ideas] Fwd: A PEP to define basical metric which > allows to guarantee minimal code quality > To: Alexandre GALODE > > Cc: python-ideas > > > > On 25 September 2017 at 21:49, > > wrote: > > Hi, > > > > Sorry from being late, i was in professional trip to Pycon FR. > > > > I see that the subject is divising advises. > > > > Reading responses, i have impression that my proposal has been saw as > > mandatory, that i don't want of course. As previously said, i see this > "PEP" > > as an informational PEP. So it's a guideline, not a mandatory. Each > > developer will have right to ignore it, as each developer can choose to > > ignore PEP8 or PEP20. > > > > Perfect solution does not exist, i know it, but i think this "PEP" > could, > > partially, be a good guideline. > > Your question is essentially "Are python-dev prepared to offer generic > code quality assessment advice to Python developers?" > > The answer is "No, we're not". It's not our role, and it's not a role > we're the least bit interested in taking on. 
Just because we're the > ones making the software equivalent of hammers and saws doesn't mean > we're also the ones that should be drafting or signing off on people's > building codes :) > > Python's use cases are too broad, and what's appropriate for my ad hoc > script to download desktop wallpaper backgrounds, isn't going to be > what's appropriate for writing an Ansible module, which in turn isn't > going to be the same as what's appropriate for writing a highly > scalable web service or a complex data analysis job. > > So the question of "What does 'good enough for my purposes' actually > mean?" is something for end users to tackle for themselves, either > individually or collaboratively, without seeking specific language > designer endorsement of their chosen criteria. > > However, as mentioned earlier in the thread, it would be *entirely* > appropriate for the folks participating in PyCQA to decide to either > take on this work themselves, or else endorse somebody else taking it > on. I'd see such an effort as being similar to the way that > packaging.python.org originally started as an independent PyPA project > hosted at python-packaging-user-guide.readthedocs.io, with a fair bit > of content already being added before we later requested and received > the python.org subdomain. > > Cheers, > Nick. > > -- > Nick Coghlan | ncog... at gmail.com | Brisbane, > Australia > > > -- > Nick Coghlan | ncog... at gmail.com | Brisbane, > Australia > _______________________________________________ > Python-ideas mailing list > Python... at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Mon Oct 9 11:24:50 2017 From: guido at python.org (Guido van Rossum) Date: Mon, 9 Oct 2017 08:24:50 -0700 Subject: [Python-ideas] PEP draft: context variables In-Reply-To: References: Message-ID: On Sun, Oct 8, 2017 at 11:46 PM, Nick Coghlan wrote: > On 8 October 2017 at 08:40, Koos Zevenhoven wrote: > >> On Sun, Oct 8, 2017 at 12:16 AM, Nathaniel Smith wrote: >> >>> On Oct 7, 2017 12:20, "Koos Zevenhoven" wrote: >>> >>> >>> ?Unfortunately, we actually need a third kind of generator semantics, >>> something like this: >>> >>> @?contextvars.caller_context >>> def genfunc(): >>> assert cvar.value is the_value >>> yield >>> assert cvar.value is the_value >>> >>> with cvar.assign(the_value): >>> gen = genfunc() >>> >>> next(gen) >>> >>> with cvar.assign(1234567890): >>> try: >>> next(gen) >>> except StopIteration: >>> pass >>> >>> Nick, Yury and I (and Nathaniel, Guido, Jim, ...?) somehow just narrowly >>> missed the reasons for this in discussions related to PEP 550. Perhaps >>> because we had mostly been looking at it from an async angle. >>> >>> >>> That's certainly a semantics that one can write down (and it's what the >>> very first version of PEP 550 did), >>> >> >> ??I do remember Yury mentioning that the first draft of PEP 550 captured >> something when the generator function was called. I think I started reading >> the discussions after that had already been removed, so I don't know >> exactly what it was. But I doubt that it was *exactly* the above, because >> PEP 550 uses set and get operations instead of "assignment contexts" like >> PEP 555 (this one) does. ?? >> > > We didn't forget it, we just don't think it's very useful. > I'm not sure I agree on the usefulness. 
Certainly a lot of the complexity of PEP 550 exists just to cater to Nathaniel's desire to influence what a generator sees via the context of the send()/next() call. I'm still not sure that's worth it. In 550 v1 there's no need for chained lookups. -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From k7hoven at gmail.com Mon Oct 9 11:37:25 2017 From: k7hoven at gmail.com (Koos Zevenhoven) Date: Mon, 9 Oct 2017 18:37:25 +0300 Subject: [Python-ideas] PEP draft: context variables In-Reply-To: References: Message-ID: On Mon, Oct 9, 2017 at 9:46 AM, Nick Coghlan wrote: > On 8 October 2017 at 08:40, Koos Zevenhoven wrote: > >> On Sun, Oct 8, 2017 at 12:16 AM, Nathaniel Smith wrote: >> >>> On Oct 7, 2017 12:20, "Koos Zevenhoven" wrote: >>> >>> >>> ?Unfortunately, we actually need a third kind of generator semantics, >>> something like this: >>> >>> @?contextvars.caller_context >>> def genfunc(): >>> assert cvar.value is the_value >>> yield >>> assert cvar.value is the_value >>> >>> with cvar.assign(the_value): >>> gen = genfunc() >>> >>> next(gen) >>> >>> with cvar.assign(1234567890): >>> try: >>> next(gen) >>> except StopIteration: >>> pass >>> >>> Nick, Yury and I (and Nathaniel, Guido, Jim, ...?) somehow just narrowly >>> missed the reasons for this in discussions related to PEP 550. Perhaps >>> because we had mostly been looking at it from an async angle. >>> >>> >>> That's certainly a semantics that one can write down (and it's what the >>> very first version of PEP 550 did), >>> >> >> ??I do remember Yury mentioning that the first draft of PEP 550 captured >> something when the generator function was called. I think I started reading >> the discussions after that had already been removed, so I don't know >> exactly what it was. But I doubt that it was *exactly* the above, because >> PEP 550 uses set and get operations instead of "assignment contexts" like >> PEP 555 (this one) does. ?? >> > > We didn't forget it, we just don't think it's very useful. > Yeah, ?I'm not surprised you remember that? :). But while none of us saw a good enough reason for it at that moment, I have come to think we absolutely need it. We need both the forest and the trees. Sure, if you think of next() as being a simple function call that does something that involves state, then you might want the other semantics (with PEP 555, that situation would look like): def do_stuff_with_side_effects(): with cvar.assign(value): return next(global_gen_containing_state) Now stuff happens within next(..), and whatever happens in next(..) is expected to see the cvar assignment. However, probably much more often, one just thinks of next(..) as "getting the next value", although some computations happen under the hood that one doesn't need to care about. As we all know, in the real world, the use case is usually just to generate the Fibonacci sequence ;). And when you call fibonacci(), the whole sequence should already be determined. You just evaluate the sequence lazily by calling next() each time you want a new number. 
It may not even be obvious when the computations are made:

import itertools

fib_cache = [0, 1]

def fibonacci():
    for i in itertools.count():
        if i < len(fib_cache):
            yield fib_cache[i]
        else:
            # not calculated before
            new = sum(fib_cache[-2:])
            fib_cache.append(new)
            yield new

# (function above is thread-unsafe, for clarity)

(Important:) So in *any* situation where you want the outer context to affect the stuff inside the generator through next(), like in the `do_stuff_with_side_effects` example, *the author of the generator function needs to know about it*. And then it is ok to require that the author uses a decorator on the generator function. But when you just generate a pre-determined set of numbers (like fibonacci), the implementation of the generator function should not matter; yet if the outer context leaks in through next(..), the internal implementation does matter, and the abstraction is leaky. I don't want the leaky behavior to be the default.

-- Koos

--
+ Koos Zevenhoven + http://twitter.com/k7hoven +
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From k7hoven at gmail.com  Mon Oct 9 16:39:49 2017
From: k7hoven at gmail.com (Koos Zevenhoven)
Date: Mon, 9 Oct 2017 23:39:49 +0300
Subject: [Python-ideas] PEP draft: context variables
In-Reply-To: 
References: 
Message-ID: 

On Mon, Oct 9, 2017 at 6:24 PM, Guido van Rossum wrote:

> On Sun, Oct 8, 2017 at 11:46 PM, Nick Coghlan wrote:
>
>> On 8 October 2017 at 08:40, Koos Zevenhoven wrote:
>>
>>> On Sun, Oct 8, 2017 at 12:16 AM, Nathaniel Smith wrote:
>>>
>>>> On Oct 7, 2017 12:20, "Koos Zevenhoven" wrote:
>>>>
>>>> Unfortunately, we actually need a third kind of generator semantics,
>>>> something like this:
>>>>
>>>> @contextvars.caller_context
>>>> def genfunc():
>>>>     assert cvar.value is the_value
>>>>     yield
>>>>     assert cvar.value is the_value
>>>>
>>>> with cvar.assign(the_value):
>>>>     gen = genfunc()
>>>>
>>>> next(gen)
>>>>
>>>> with cvar.assign(1234567890):
>>>>     try:
>>>>         next(gen)
>>>>     except StopIteration:
>>>>         pass
>>>>
>>>> Nick, Yury and I (and Nathaniel, Guido, Jim, ...?) somehow just
>>>> narrowly missed the reasons for this in discussions related to PEP
>>>> 550. Perhaps because we had mostly been looking at it from an async angle.
>>>>
>>>> That's certainly a semantics that one can write down (and it's what the
>>>> very first version of PEP 550 did),
>>>>
>>>
>>> I do remember Yury mentioning that the first draft of PEP 550 captured
>>> something when the generator function was called. I think I started reading
>>> the discussions after that had already been removed, so I don't know
>>> exactly what it was. But I doubt that it was *exactly* the above, because
>>> PEP 550 uses set and get operations instead of "assignment contexts" like
>>> PEP 555 (this one) does.
>>>
>>
>> We didn't forget it, we just don't think it's very useful.
>>
>
> I'm not sure I agree on the usefulness. Certainly a lot of the complexity
> of PEP 550 exists just to cater to Nathaniel's desire to influence what a
> generator sees via the context of the send()/next() call. I'm still not
> sure that's worth it. In 550 v1 there's no need for chained lookups.
>

We do need some sort of chained lookups, though, at least in terms of semantics. But it is possible to optimize that away in PEP 555.
Some kind of chained-lookup-like thing is inevitable if you want the state not to leak though yields out of the generator: with cvar.assign(a_value): # don't leak `a_value` to outer context yield some_stuff() ??Koos -- + Koos Zevenhoven + http://twitter.com/k7hoven + -------------- next part -------------- An HTML attachment was scrubbed... URL: From yselivanov.ml at gmail.com Mon Oct 9 18:55:52 2017 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Mon, 9 Oct 2017 18:55:52 -0400 Subject: [Python-ideas] PEP draft: context variables In-Reply-To: References: Message-ID: On Mon, Oct 9, 2017 at 4:39 PM, Koos Zevenhoven wrote: > On Mon, Oct 9, 2017 at 6:24 PM, Guido van Rossum wrote: [..] >> I'm not sure I agree on the usefulness. Certainly a lot of the complexity >> of PEP 550 exists just to cater to Nathaniel's desire to influence what a >> generator sees via the context of the send()/next() call. I'm still not sure >> that's worth it. In 550 v1 there's no need for chained lookups. > > > We do need some sort of chained lookups, though, at least in terms of > semantics. But it is possible to optimize that away in PEP 555. You keep using the "optimize away" terminology. I assume that you mean that ContextVar.get() will have a cache (so it does in PEP 550 btw). What else do you plan to "optimize away"? Where's a detailed implementation spec? What you have in the PEP is still vague and leaves many important implementation details to the imagination of the reader. The fact is that the datastructure choice in PEP 555 is plain weird. You want to use a sequence of values to represent a mapping. And then you hand-waved all questions about what will happen in pathological cases, saying that "we'll have a cache and applications won't have to many context values anyways". But your design means that in the worst case, the uncached path requires you to potentially traverse all values in the context. Another thing: suppose someone calls 'context_var.assign().__enter__()' manually, without calling '__exit__()'. You will have unbound growth of the context values stack. You'll say that it's not how the API is supposed to be used, and we say that we want to convert things like decimal and numpy to use the new mechanism. That question was also hand-waved by you: numpy and decimal will have to come up with new/better APIs to use PEP 555. Well, that's just not good enough. And the key problem is that you still haven't directly highlighted differences in semantics between PEP 550 and PEP 555. This is the most annoying part, because almost no one (including me) knows the complete answer here. Maybe you know, but you refuse to include that in the PEP for some reason. > Some kind of > chained-lookup-like thing is inevitable if you want the state not to leak > though yields out of the generator: No, it's not "inevitable". In PEP 550 v1, generators captured the context when they are created and there was always only one level of context. This means that: 1. Context changes in generators aren't visible to the outside world. 2. Changes to the context in the outside world are not visible to running generators. PEP 550 v1 was the simplest thing possible with a very efficient implementation. It had the following "issues" that led us to v2+ semantics of chained lookup: 1. Refactoring. with some_context(): for i in gen(): pass would not be equivalent to: g = gen() with some_context(): for i in g: pass 2. Restricting generators to only see context at the point of their creation feels artificial. 
We know there are better solutions here (albeit more complex) and we try to see if they are worth it. 3. Nathaniel't use case in Trio. Yury From k7hoven at gmail.com Mon Oct 9 20:37:51 2017 From: k7hoven at gmail.com (Koos Zevenhoven) Date: Tue, 10 Oct 2017 03:37:51 +0300 Subject: [Python-ideas] PEP draft: context variables In-Reply-To: References: Message-ID: On Tue, Oct 10, 2017 at 1:55 AM, Yury Selivanov wrote: > On Mon, Oct 9, 2017 at 4:39 PM, Koos Zevenhoven wrote: > > On Mon, Oct 9, 2017 at 6:24 PM, Guido van Rossum > wrote: > [..] > >> I'm not sure I agree on the usefulness. Certainly a lot of the > complexity > >> of PEP 550 exists just to cater to Nathaniel's desire to influence what > a > >> generator sees via the context of the send()/next() call. I'm still not > sure > >> that's worth it. In 550 v1 there's no need for chained lookups. > > > > > > We do need some sort of chained lookups, though, at least in terms of > > semantics. But it is possible to optimize that away in PEP 555. > > You keep using the "optimize away" terminology. I assume that you > mean that ContextVar.get() will have a cache (so it does in PEP 550 > btw). What else do you plan to "optimize away"? Where's a detailed > implementation spec? What you have in the PEP is still vague and > leaves many important implementation details to the imagination of the > reader. > > ?I'm hesitant to call it a cache, because the "cache" sort of automatically builds itself. I think I'll need to draw a diagram to explain it. The implementation is somewhat simpler than its explanation. I can go more into detail regarding the implementation, but I feel that semantics is more important at this point. > The fact is that the datastructure choice in PEP 555 is plain weird. > You want to use a sequence of values to represent a mapping. And then > you hand-waved all questions about what will happen in pathological > cases, saying that "we'll have a cache and applications won't have to > many context values anyways". ?I don't think I've heard of any pathological cases... What do you mean?? But your design means that in the worst > case, the uncached path requires you to potentially traverse all > values in the context. ?It is in fact possible to implement it in a way that this never happens, but the best thing might actually be an implementation, where this *almost* never happens. (Off-topic: It would be kind of cool, if you could do the same thing with MROs, so OOP method access will speed up?. But that might be somewhat more difficult to implement, because there are more moving parts there.) > Another thing: suppose someone calls > 'context_var.assign().__enter__()' manually, without calling > '__exit__()'. You will have unbound growth of the context values > stack. ?You can cause unbound growth in PEP 550 too. All you have to do is nest an unbounded number of generators. In PEP 555, nesting generators doesn't do anything really, unless you actually assign to context arguments in the generators.? Only those who use it will pay. But seriously, you will always end up in a weird situation if you call an unbounded number of contextmanager.__enter__() methods without calling __exit__(). Nothing new about that. But entering a handful of assignment contexts and leaving them open until a script ends is not the end of the world. I don't think anyone should do that though. > You'll say that it's not how the API is supposed to be used, > and we say that we want to convert things like decimal and numpy to > use the new mechanism. 
That question was also hand-waved by you: > numpy and decimal will have to come up with new/better APIs to use PEP > 555. Well, that's just not good enough. > ?What part of my explanation of this are you unhappy with? For instance, the 12th (I think) email in this thread, which is my response to Nathaniel. Could you reply to that and tell us your concern?? > > And the key problem is that you still haven't directly highlighted > differences in semantics between PEP 550 and PEP 555. This is the > most annoying part, because almost no one (including me) knows the > complete answer here. Maybe you know, but you refuse to include that > in the PEP for some reason. > I don't refuse to ?. I just haven't prioritized it. But I've probably made the mistake of mentioning *similarities* between 550 and 555? ?. ? ?One major difference is that there is no .set(value) in PEP 555, so one shouldn't try to map PEP 550 uses directly to PEP 555.? > Some kind of > > chained-lookup-like thing is inevitable if you want the state not to leak > > though yields out of the generator: > > No, it's not "inevitable". In PEP 550 v1, generators captured the > context when they are created and there was always only one level of > context. This means that: > > 1. Context changes in generators aren't visible to the outside world. > 2. Changes to the context in the outside world are not visible to > running generators. > ?Sure, if you make generators completely isolated from the outside world, then you can avoid chaining-like things too. But that would just sweep it under the carpet. > PEP 550 v1 was the simplest thing possible with a very efficient > implementation. It had the following "issues" that led us to v2+ > semantics of chained lookup: > > 1. Refactoring. > > with some_context(): > for i in gen(): > pass > > would not be equivalent to: > > g = gen() > with some_context(): > for i in g: > pass > ? ? > > ?What's the point of this? Moving stuff out of a with statement should not matter? The whole point of with statements is that it matters whether you do something inside it or outside it.? ?-- Koos? > 2. Restricting generators to only see context at the point of their > creation feels artificial. We know there are better solutions here > (albeit more complex) and we try to see if they are worth it. > > 3. Nathaniel't use case in Trio. > ? -- + Koos Zevenhoven + http://twitter.com/k7hoven + -------------- next part -------------- An HTML attachment was scrubbed... URL: From yselivanov.ml at gmail.com Mon Oct 9 21:22:38 2017 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Mon, 9 Oct 2017 21:22:38 -0400 Subject: [Python-ideas] PEP draft: context variables In-Reply-To: References: Message-ID: On Mon, Oct 9, 2017 at 8:37 PM, Koos Zevenhoven wrote: [..] >> Another thing: suppose someone calls >> 'context_var.assign().__enter__()' manually, without calling >> '__exit__()'. You will have unbound growth of the context values >> stack. > > > You can cause unbound growth in PEP 550 too. All you have to do is nest an > unbounded number of generators. You can only nest up to 'sys.get_recursion_limit()' number of generators. With PEP 555 you can do: while True: context_var.assign(42).__enter__() > In PEP 555, nesting generators doesn't do > anything really, unless you actually assign to context arguments in the > generators. Only those who use it will pay. Same for 550. If a generator doesn't set context variables, its LC will be an empty mapping (or NULL if you want to micro-optimize things). 
Nodes for the chain will come from a freelist. The effective overhead for generators is a couple operations on pointers, and thus visible only in microbenchmarks. > > But seriously, you will always end up in a weird situation if you call an > unbounded number of contextmanager.__enter__() methods without calling > __exit__(). Nothing new about that. But entering a handful of assignment > contexts and leaving them open until a script ends is not the end of the > world. I don't think anyone should do that though. > > >> >> You'll say that it's not how the API is supposed to be used, >> and we say that we want to convert things like decimal and numpy to >> use the new mechanism. That question was also hand-waved by you: >> numpy and decimal will have to come up with new/better APIs to use PEP >> 555. Well, that's just not good enough. > > > What part of my explanation of this are you unhappy with? For instance, the > 12th (I think) email in this thread, which is my response to Nathaniel. > Could you reply to that and tell us your concern? I'm sorry, I'm not going to find some 12th email in some thread. I stated in this thread the following: not being able to use PEP 555 to fix *existing* decimal & numpy APIs is not good enough. And decimal & numpy is only one example, there's tons of code out there that can benefit from its APIs to be fixed to support for async code in Python 3.7. > >> >> >> And the key problem is that you still haven't directly highlighted >> differences in semantics between PEP 550 and PEP 555. This is the >> most annoying part, because almost no one (including me) knows the >> complete answer here. Maybe you know, but you refuse to include that >> in the PEP for some reason. > > > I don't refuse to > . I just haven't prioritized it. But I've probably made the mistake of > mentioning *similarities* between 550 and 555 > . > One major difference is that there is no .set(value) in PEP 555, so one > shouldn't try to map PEP 550 uses directly to PEP 555. This is not a "major difference". You might feel that it is, but it is a simple API design choice. As I illustrated in a few emails before, as long as users can call 'context_var.assign(..).__enter__()' manually, your PEP *does* allow to effectively do ".set(value)". If using a context manager instead of 'set' method is the only difference you can highlight, then why bother writing a PEP? The idea of using context managers to set values is very straightforward and is easy to be incorporated to PEP 550. In fact, it will be added to the next version of the PEP (after discussing it with Guido on the lang summit). > >> > Some kind of >> > chained-lookup-like thing is inevitable if you want the state not to >> > leak >> > though yields out of the generator: >> >> No, it's not "inevitable". In PEP 550 v1, generators captured the >> context when they are created and there was always only one level of >> context. This means that: >> >> 1. Context changes in generators aren't visible to the outside world. >> 2. Changes to the context in the outside world are not visible to >> running generators. > > > Sure, if you make generators completely isolated from the outside world, > then you can avoid chaining-like things too. But that would just sweep it > under the carpet. What do you mean by "just sweep it under the carpet"? Capturing the context at the moment of generators creation is a design choice with some consequences (that I illustrated in my previous email). There are cons and pros of doing that. 
Yury From levkivskyi at gmail.com Tue Oct 10 07:34:49 2017 From: levkivskyi at gmail.com (Ivan Levkivskyi) Date: Tue, 10 Oct 2017 13:34:49 +0200 Subject: [Python-ideas] PEP 561: Distributing Type Information V3 In-Reply-To: References: Message-ID: Thanks Ethan! The PEP draft now looks good to me. I think it makes sense to make a PoC implementation of the PEP at this point to see if everything works smoothly in practice. (You could also link few examples with your PoC implementation in the PEP) -- Ivan On 6 October 2017 at 22:00, Ethan Smith wrote: > Hello, > > I have made some changes to my PEP on distributing type information. A > summary of the changes: > > - Move to adding a new metadata specifier so that more packaging tools > can participate > - Clarify version matching between third party stubs and runtime > packages. > - various other fixes for clarity, readability, and removal of > repetition > > As usual I have replicated a copy below. > > Cheers, > Ethan > > > PEP: 561 > Title: Distributing and Packaging Type Information > Author: Ethan Smith > Status: Draft > Type: Standards Track > Content-Type: text/x-rst > Created: 09-Sep-2017 > Python-Version: 3.7 > Post-History: > > > Abstract > ======== > > PEP 484 introduced type hinting to Python, with goals of making typing > gradual and easy to adopt. Currently, typing information must be distributed > manually. This PEP provides a standardized means to package and distribute > type information and an ordering for type checkers to resolve modules and > collect this information for type checking using existing packaging > architecture. > > > Rationale > ========= > > Currently, package authors wish to distribute code that has > inline type information. However, there is no standard method to distribute > packages with inline type annotations or syntax that can simultaneously > be used at runtime and in type checking. Additionally, if one wished to > ship typing information privately the only method would be via setting > ``MYPYPATH`` or the equivalent to manually point to stubs. If the package > can be released publicly, it can be added to typeshed [1]_. However, this > does not scale and becomes a burden on the maintainers of typeshed. > Additionally, it ties bugfixes to releases of the tool using typeshed. > > PEP 484 has a brief section on distributing typing information. In this > section [2]_ the PEP recommends using ``shared/typehints/pythonX.Y/`` for > shipping stub files. However, manually adding a path to stub files for each > third party library does not scale. The simplest approach people have taken > is to add ``site-packages`` to their ``MYPYPATH``, but this causes type > checkers to fail on packages that are highly dynamic (e.g. sqlalchemy > and Django). > > > Specification > ============= > > There are several motivations and methods of supporting typing in a package. > This PEP recognizes three (3) types of packages that may be created: > > 1. The package maintainer would like to add type information inline. > > 2. The package maintainer would like to add type information via stubs. > > 3. A third party would like to share stub files for a package, but the > maintainer does not want to include them in the source of the package. > > This PEP aims to support these scenarios and make them simple to add to > packaging and deployment. > > The two major parts of this specification are the packaging specifications > and the resolution order for resolving module type information. 
The packaging > spec is based on and extends PEP 345 metadata. The type checking spec is > meant to replace the ``shared/typehints/pythonX.Y/`` spec of PEP 484 [2]_. > > New third party stub libraries are encouraged to distribute stubs via the > third party packaging proposed in this PEP in place of being added to > typeshed. Typeshed will remain in use, but if maintainers are found, third > party stubs in typeshed are encouraged to be split into their own package. > > Packaging Type Information > -------------------------- > In order to make packaging and distributing type information as simple and > easy as possible, the distribution of type information, and typed Python code > is done through existing packaging frameworks. This PEP adds a new item to the > ``*.distinfo/METADATA`` file to contain metadata about a package's support for > typing. The new item is optional, but must have a name of ``Typed`` and have a > value of either ``inline`` or ``stubs``, if present. > > Metadata Examples:: > > Typed: inline > Typed: stubs > > > Stub Only Packages > '''''''''''''''''' > > For package maintainers wishing to ship stub files containing all of their > type information, it is prefered that the ``*.pyi`` stubs are alongside the > corresponding ``*.py`` files. However, the stubs may be put in a sub-folder > of the Python sources, with the same name the ``*.py`` files are in. For > example, the ``flyingcircus`` package would have its stubs in the folder > ``flyingcircus/flyingcircus/``. This path is chosen so that if stubs are > not found in ``flyingcircus/`` the type checker may treat the subdirectory as > a normal package. The normal resolution order of checking ``*.pyi`` before > ``*.py`` will be maintained. > > Third Party Stub Packages > ''''''''''''''''''''''''' > > Third parties seeking to distribute stub files are encouraged to contact the > maintainer of the package about distribution alongside the package. If the > maintainer does not wish to maintain or package stub files or type information > inline, then a "third party stub package" should be created. The structure is > similar, but slightly different from that of stub only packages. If the stubs > are for the library ``flyingcircus`` then the package should be named > ``flyingcircus-stubs`` and the stub files should be put in a sub-directory > named ``flyingcircus``. This allows the stubs to be checked as if they were in > a regular package. > > In addition, the third party stub package should indicate which version(s) > of the runtime package are supported by indicating the runtime package's > version(s) through the normal dependency data. For example, if there was a > stub package ``flyingcircus-stubs``, it can indicate the versions of the > runtime ``flyingcircus`` package supported through ``install_requires`` > in distutils based tools, or the equivalent in other packaging tools. > > Type Checker Module Resolution Order > ------------------------------------ > > The following is the order that type checkers supporting this PEP should > resolve modules containing type information: > > 1. User code - the files the type checker is running on. > > 2. Stubs or Python source manually put in the beginning of the path. Type > checkers should provide this to allow the user complete control of which > stubs to use, and patch broken stubs/inline types from packages. > > 3. Third party stub packages - these packages can supersede the installed > untyped packages. 
They can be found at ``pkg-stubs`` for package ``pkg``, > however it is encouraged to check the package's metadata using packaging > query APIs such as ``pkg_resources`` to assure that the package is meant > for type checking, and is compatible with the installed version. > > 4. Inline packages - finally, if there is nothing overriding the installed > package, and it opts into type checking. > > 5. Typeshed (if used) - Provides the stdlib types and several third party > libraries > > Type checkers that check a different Python version than the version they run > on must find the type information in the ``site-packages``/``dist-packages`` > of that Python version. This can be queried e.g. > ``pythonX.Y -c 'import site; print(site.getsitepackages())'``. It is also recommended > that the type checker allow for the user to point to a particular Python > binary, in case it is not in the path. > > To check if a package has opted into type checking, type checkers are > recommended to use the ``pkg_resources`` module to query the package > metadata. If the ``typed`` package metadata has ``None`` as its value, the > package has not opted into type checking, and the type checker should skip > that package. > > > References > ========== > .. [1] Typeshed (https://github.com/python/typeshed) > > .. [2] PEP 484, Storing and Distributing Stub Files > (https://www.python.org/dev/peps/pep-0484/#storing-and-distributing-stub-files) > > Copyright > ========= > > This document has been placed in the public domain. > > > > .. > Local Variables: > mode: indented-text > indent-tabs-mode: nil > sentence-end-double-space: t > fill-column: 70 > coding: utf-8 > End: > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Tue Oct 10 08:34:14 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 10 Oct 2017 22:34:14 +1000 Subject: [Python-ideas] PEP draft: context variables In-Reply-To: References: Message-ID: On 10 October 2017 at 01:24, Guido van Rossum wrote: > On Sun, Oct 8, 2017 at 11:46 PM, Nick Coghlan wrote: > >> On 8 October 2017 at 08:40, Koos Zevenhoven wrote: >> >>> ??I do remember Yury mentioning that the first draft of PEP 550 captured >>> something when the generator function was called. I think I started reading >>> the discussions after that had already been removed, so I don't know >>> exactly what it was. But I doubt that it was *exactly* the above, because >>> PEP 550 uses set and get operations instead of "assignment contexts" like >>> PEP 555 (this one) does. ?? >>> >> >> We didn't forget it, we just don't think it's very useful. >> > > I'm not sure I agree on the usefulness. Certainly a lot of the complexity > of PEP 550 exists just to cater to Nathaniel's desire to influence what a > generator sees via the context of the send()/next() call. I'm still not > sure that's worth it. In 550 v1 there's no need for chained lookups. > The compatibility concern is that we want developers of existing libraries to be able to transparently switch from using thread local storage to context local storage, and the way thread locals interact with generators means that decimal (et al) currently use the thread local state at the time when next() is called, *not* when the generator is created. 
I like Yury's example for this, which is that the following two examples are currently semantically equivalent, and we want to preserve that equivalence:

    with decimal.localcontext() as ctx:
        ctx.prec = 30
        for i in gen():
            pass

    g = gen()
    with decimal.localcontext() as ctx:
        ctx.prec = 30
        for i in g:
            pass

The easiest way to maintain that equivalence is to say that even though preventing state changes leaking *out* of generators is considered a desirable change, we see preventing them leaking *in* as a gratuitous backwards compatibility break.

This does mean that *neither* form is semantically equivalent to eager extraction of the generator values before the decimal context is changed, but that's the status quo, and we don't have a compelling justification for changing it.

If folks subsequently decide that they *do* want "capture on creation" or "capture on first iteration" semantics for their generators, those are easy enough to add as wrappers on top of the initial thread-local-compatible base by using the same building blocks as are being added to help event loops manage context snapshots for coroutine execution.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From k7hoven at gmail.com  Tue Oct 10 08:34:09 2017
From: k7hoven at gmail.com (Koos Zevenhoven)
Date: Tue, 10 Oct 2017 15:34:09 +0300
Subject: [Python-ideas] PEP draft: context variables
In-Reply-To: 
References: 
Message-ID: 

On Tue, Oct 10, 2017 at 4:22 AM, Yury Selivanov wrote:

> On Mon, Oct 9, 2017 at 8:37 PM, Koos Zevenhoven wrote:
> > You can cause unbound growth in PEP 550 too. All you have to do is nest an
> > unbounded number of generators.
>
> You can only nest up to 'sys.getrecursionlimit()' number of generators.
>
> With PEP 555 you can do:
>
> while True:
>     context_var.assign(42).__enter__()
>

Well, in PEP 550, you can explicitly stack an unbounded number of LogicalContexts in a while True loop. Or you can run out of memory using plain lists even faster:

l = [42]

while True:
    l *= 2  # ensure exponential blow-up

I don't see why your example with context_var.assign(42).__enter__() would be any more likely.

Sure, we could limit the number of allowed nested contexts in PEP 555. I don't really care. Just don't enter an unbounded number of context managers without exiting them.

Really, it was my mistake to ever make you think that context_var.assign(42).__enter__() can be compared to .set(42) in PEP 550. I'll say it once more: PEP 555 context arguments have no equivalent of the PEP-550 .set(..).

> > In PEP 555, nesting generators doesn't do
> > anything really, unless you actually assign to context arguments in the
> > generators. Only those who use it will pay.
>
> Same for 550. If a generator doesn't set context variables, its LC
> will be an empty mapping (or NULL if you want to micro-optimize
> things). Nodes for the chain will come from a freelist. The effective
> overhead for generators is a couple operations on pointers, and thus
> visible only in microbenchmarks.
>

Sure, you can implement push and pop and maintain a freelist by just doing operations on pointers. But that would be a handful of operations. Maybe you'd even manage to avoid INCREFs and DECREFs by not exposing things as Python objects. But I guarantee you, PEP 555 is simpler in this regard. In (pseudo?)
C, the per-generator and per-send overhead would come from something like: /* On generator creation */ stack = PyThreadState_Get()->carg_stack; Py_INCREF(stack); self->carg_stack = stack; ---------- /* On each next / send */ stack_ptr = &PyThreadState_Get()->carg_stack; if (*stack_ptr == self->carg_stack) { /* no assignments made => do nothing */ } /* ... then after next yield */ if (*stack_ptr == self->carg_stack) { /* once more, do nothing */ } And there will of course be a PyDECREF after the generator has finished or when it is deallocated. If the generators *do* use context argument assignments, then some stuff would happen in the else clauses of the if statements above. (Or actually, using != instead of ==). > But seriously, you will always end up in a weird situation if you call an > > unbounded number of contextmanager.__enter__() methods without calling > > __exit__(). Nothing new about that. But entering a handful of assignment > > contexts and leaving them open until a script ends is not the end of the > > world. I don't think anyone should do that though. > > > > > >> > >> You'll say that it's not how the API is supposed to be used, > >> and we say that we want to convert things like decimal and numpy to > >> use the new mechanism. That question was also hand-waved by you: > >> numpy and decimal will have to come up with new/better APIs to use PEP > >> 555. Well, that's just not good enough. > > > > > > What part of my explanation of this are you unhappy with? For instance, > the > > 12th (I think) email in this thread, which is my response to Nathaniel. > > Could you reply to that and tell us your concern? > > I'm sorry, I'm not going to find some 12th email in some thread. I > stated in this thread the following: not being able to use PEP 555 to > fix *existing* decimal & numpy APIs is not good enough. And decimal & > numpy is only one example, there's tons of code out there that can > benefit from its APIs to be fixed to support for async code in Python > 3.7. > > Well, anyone interested can read that 12th email in this thread. In short, my recommendation for libraries would be as follows: * If the library does not provide a context manager yet, ?they should add one, using PEP 555. That will then work nicely in coroutines and generators. * If the library does have a context manager, implement it using PEP 555. Or to be safe, add a new API function, so behavior in existing async code won't change. * If the library needs to support some kind of set_state(..) operation, implement it by getting the state using a PEP 555 context argument and mutating its contents. ?* Fall back to thread-local storage if no ?context argument is present or if the Python version does not support context arguments. ?[...] > >> > Some kind of > >> > chained-lookup-like thing is inevitable if you want the state not to > >> > leak > >> > though yields out of the generator: > >> > >> No, it's not "inevitable". In PEP 550 v1, generators captured the > >> context when they are created and there was always only one level of > >> context. This means that: > >> > >> 1. Context changes in generators aren't visible to the outside world. > >> 2. Changes to the context in the outside world are not visible to > >> running generators. > > > > > > Sure, if you make generators completely isolated from the outside world, > > then you can avoid chaining-like things too. But that would just sweep it > > under the carpet. > > What do you mean by "just sweep it under the carpet"? 
Capturing the > context at the moment of generators creation is a design choice with > some consequences (that I illustrated in my previous email). There > are cons and pros of doing that. > > "Capturing the context at generator creation" and "isolating generators completely" are two different things. I've described pros of the former. The latter has no pros that I'm aware of, except if sweeping things under the carpet is considered as one. Yes, the latter works in some use cases, but in others it does not. For instance, if an async framework wants to make some information available throughout the async task. If you isolate generators, then async programmers will have to avoid generators, because they don't have access to the information the framework is trying to provide. Also, if you refactor your generator into subgenerators using `yield from`, the subgenerators will not see the context set by the outer generator. ??Koos -- + Koos Zevenhoven + http://twitter.com/k7hoven + -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Tue Oct 10 08:42:21 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 10 Oct 2017 22:42:21 +1000 Subject: [Python-ideas] PEP draft: context variables In-Reply-To: References: Message-ID: On 10 October 2017 at 22:34, Koos Zevenhoven wrote: > Really, it was my mistake to ever make you think that > context_var.assign(42).__enter__() can be compared to .set(42) in PEP > 550. I'll say it once more: PEP 555 context arguments have no equivalent of > the PEP-550 .set(..). > Then your alternate PEP can't work, since it won't be useful to extension modules. Context managers are merely syntactic sugar for try/finally statements, so you can't wave your hands and say a context manager is the only supported API: you *have* to break the semantics down and explain what the try/finally equivalent looks like. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia -------------- next part -------------- An HTML attachment was scrubbed... URL: From k7hoven at gmail.com Tue Oct 10 08:51:52 2017 From: k7hoven at gmail.com (Koos Zevenhoven) Date: Tue, 10 Oct 2017 15:51:52 +0300 Subject: [Python-ideas] PEP draft: context variables In-Reply-To: References: Message-ID: On Tue, Oct 10, 2017 at 3:34 PM, Nick Coghlan wrote: > On 10 October 2017 at 01:24, Guido van Rossum wrote: > >> On Sun, Oct 8, 2017 at 11:46 PM, Nick Coghlan wrote: >> >>> On 8 October 2017 at 08:40, Koos Zevenhoven wrote: >>> >>>> ??I do remember Yury mentioning that the first draft of PEP 550 >>>> captured something when the generator function was called. I think I >>>> started reading the discussions after that had already been removed, so I >>>> don't know exactly what it was. But I doubt that it was *exactly* the >>>> above, because PEP 550 uses set and get operations instead of "assignment >>>> contexts" like PEP 555 (this one) does. ?? >>>> >>> >>> We didn't forget it, we just don't think it's very useful. >>> >> >> I'm not sure I agree on the usefulness. Certainly a lot of the complexity >> of PEP 550 exists just to cater to Nathaniel's desire to influence what a >> generator sees via the context of the send()/next() call. I'm still not >> sure that's worth it. In 550 v1 there's no need for chained lookups. 
>> > > The compatibility concern is that we want developers of existing libraries > to be able to transparently switch from using thread local storage to > context local storage, and the way thread locals interact with generators > means that decimal (et al) currently use the thread local state at the time > when next() is called, *not* when the generator is created. > ?If you want to keep those semantics in decimal, then you're already done.? > I like Yury's example for this, which is that the following two examples > are currently semantically equivalent, and we want to preserve that > equivalence: > > with decimal.localcontext() as ctx: > ctc.prex = 30 > for i in gen(): > pass > > g = gen() > with decimal.localcontext() as ctx: > ctc.prex = 30 > for i in g: > pass > > > ?Generator functions aren't usually called `gen`. Change that to?: with decimal.localcontext() as ctx: ctc.prex = 30 for val in values(): do_stuff_with(val) ?# and? ?vals = values()? ?with decimal.localcontext() as ctx: ctc.prex = 30 for val in vals: do_stuff_with(val) ? ?I see no reason why these two should be equivalent. ??Koos -- + Koos Zevenhoven + http://twitter.com/k7hoven + -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Tue Oct 10 10:01:18 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 11 Oct 2017 00:01:18 +1000 Subject: [Python-ideas] PEP draft: context variables In-Reply-To: References: Message-ID: On 10 October 2017 at 22:51, Koos Zevenhoven wrote: > ?I see no reason why these two should be equivalent. > There is no "should" about it: it's a brute fact that the two forms *are* currently equivalent for lazy iterators (including generators), and both different from the form that uses eager evaluation of the values before the context change. Where should enters into the picture is by way of PEP 550 saying that they should *remain* equivalent because we don't have an adequately compelling justification for changing the runtime semantics. That is, given the following code: itr = make_iter() with decimal.localcontext() as ctx: ctc.prex = 30 for i in itr: pass Right now, today, in 3.6. the calculations in the iterator will use the modified decimal context, *not* the context that applied when the iterator was created. If you want to ensure that isn't the case, you have to force eager evaluation before the context change. What PEP 550 is proposing is that, by default, *nothing changes*: the lazy iteration in the above will continue to use the updated decimal context by default. However, people *will* gain a new option for avoiding that: instead of forcing eager evaluation, they'll be able to capture the creation context instead, and switching back to that each time the iterator needs to calculate a new value. If PEP 555 proposes that we should instead make lazy iteration match eager evaluation semantics by *default*, then that's going to be a much harder case to make because it's a gratuitous compatibility break - code that currently works one way will suddenly start doing something different, and end users will have difficulty getting it to behave the same way on 3.7 as it does on earlier versions. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From k7hoven at gmail.com Tue Oct 10 10:22:56 2017 From: k7hoven at gmail.com (Koos Zevenhoven) Date: Tue, 10 Oct 2017 17:22:56 +0300 Subject: [Python-ideas] PEP draft: context variables In-Reply-To: References: Message-ID: On Tue, Oct 10, 2017 at 5:01 PM, Nick Coghlan wrote: > On 10 October 2017 at 22:51, Koos Zevenhoven wrote: > >> ?I see no reason why these two should be equivalent. >> > > There is no "should" about it: it's a brute fact that the two forms *are* > currently equivalent for lazy iterators (including generators), and both > different from the form that uses eager evaluation of the values before the > context change. > > Where should enters into the picture is by way of PEP 550 saying that they > should *remain* equivalent because we don't have an adequately compelling > justification for changing the runtime semantics. > > That is, given the following code: > > itr = make_iter() > with decimal.localcontext() as ctx: > ctc.prex = 30 > for i in itr: > pass > > Right now, today, in 3.6. the calculations in the iterator will use the > modified decimal context, *not* the context that applied when the iterator > was created. If you want to ensure that isn't the case, you have to force > eager evaluation before the context change. > > What PEP 550 is proposing is that, by default, *nothing changes*: the lazy > iteration in the above will continue to use the updated decimal context by > default. > ?That's just an arbitrary example. There are many things that *would* change if decimal contexts simply switched from using thread-local storage to using PEP 550. It's not at all obvious which of the changes would be most likely to cause problems. If I were to choose, I would probably introduce a new context manager which works with PEP 555 semantics, because that's the only way to ensure full backwards compatibility, regardless of whether PEP 555 or PEP 550 is used. But I'm sure one could decide otherwise. ???Koos? -- + Koos Zevenhoven + http://twitter.com/k7hoven + -------------- next part -------------- An HTML attachment was scrubbed... URL: From yselivanov.ml at gmail.com Tue Oct 10 10:40:13 2017 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Tue, 10 Oct 2017 10:40:13 -0400 Subject: [Python-ideas] PEP draft: context variables In-Reply-To: References: Message-ID: On Tue, Oct 10, 2017 at 8:34 AM, Koos Zevenhoven wrote: > On Tue, Oct 10, 2017 at 4:22 AM, Yury Selivanov > wrote: >> >> On Mon, Oct 9, 2017 at 8:37 PM, Koos Zevenhoven wrote: >> > You can cause unbound growth in PEP 550 too. All you have to do is nest >> > an >> > unbounded number of generators. >> >> You can only nest up to 'sys.get_recursion_limit()' number of generators. >> >> With PEP 555 you can do: >> >> while True: >> context_var.assign(42).__enter__() >> > > Well, in PEP 550, you can explicitly stack an unbounded number of > LogicalContexts in a while True loop. No, you can't. PEP 550 doesn't have APIs to "stack ... LogicalContexts". > Or you can run out of memory using > plain lists even faster: > > l = [42] > > while True: > l *= 2 # ensure exponential blow-up > > I don't see why your example with context_var.assign(42).__enter__() would > be any more likely. Of course you can write broken code. The point is that contexts work like scopes/mappings, and it's counter-intuitive that setting a variable with 'cv.assign(..).__enter__()' will break the world. If a naive user tries to convert their existing decimal-like API to use your PEP, everything would work initially, but then blow up in production. 
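To make that failure mode concrete, here is a toy model (a sketch only -- the Assignment class and setvalue() are hypothetical stand-ins, not the actual PEP 555 API) of what happens when assignment objects are entered but never exited:

    stack = []

    class Assignment:
        # stand-in for a PEP-555-style assignment object (hypothetical)
        def __init__(self, value):
            self.value = value
        def __enter__(self):
            stack.append(self.value)
            return self
        def __exit__(self, *exc):
            stack.pop()

    def setvalue(value):
        # naive set()-style wrapper: it enters, but there is nowhere to exit
        Assignment(value).__enter__()

    for _ in range(5):
        setvalue(42)

    print(len(stack))   # 5 -- every call leaves an entry behind

A set()-style API written that way keeps growing the stack for as long as the program keeps calling it.
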
[..] > Really, it was my mistake to ever make you think that > context_var.assign(42).__enter__() can be compared to .set(42) in PEP 550. > I'll say it once more: PEP 555 context arguments have no equivalent of the > PEP-550 .set(..). Any API exposing a context manager should have an alternative try..finally API. In your case it's 'context_var.assign(42).__enter__()'. 'With' statements are sugar in Python. It's unprecedented to design API solely around them. > >> >> > In PEP 555, nesting generators doesn't do >> > anything really, unless you actually assign to context arguments in the >> > generators. Only those who use it will pay. >> >> Same for 550. If a generator doesn't set context variables, its LC >> will be an empty mapping (or NULL if you want to micro-optimize >> things). Nodes for the chain will come from a freelist. The effective >> overhead for generators is a couple operations on pointers, and thus >> visible only in microbenchmarks. > > > Sure, you can implement push and pop and maintain a freelist by just doing > operations on pointers. But that would be a handful of operations. Maybe > you'd even manage to avoid INCREFs and DECREFs by not exposing things as > Python objects. > > But I guarantee you, PEP 555 is simpler in this regard. [..] I wrote several implementations of PEP 550 so far. No matter what you put in genobject.send(): one pointer op or two, the results are the same: in microbenchmarks generators become 1-2% slower. In macrobenchmarks of generators you can't observe any slowdown. And if we want the fastest possible context implementation, we can chose PEP 550 v1, which is the simplest solution. In any case, the performance argument is invalid, please stop using it. >> > But seriously, you will always end up in a weird situation if you call >> > an >> > unbounded number of contextmanager.__enter__() methods without calling >> > __exit__(). Nothing new about that. But entering a handful of assignment >> > contexts and leaving them open until a script ends is not the end of the >> > world. I don't think anyone should do that though. >> > >> > >> >> >> >> You'll say that it's not how the API is supposed to be used, >> >> and we say that we want to convert things like decimal and numpy to >> >> use the new mechanism. That question was also hand-waved by you: >> >> numpy and decimal will have to come up with new/better APIs to use PEP >> >> 555. Well, that's just not good enough. >> > >> > >> > What part of my explanation of this are you unhappy with? For instance, >> > the >> > 12th (I think) email in this thread, which is my response to Nathaniel. >> > Could you reply to that and tell us your concern? >> >> I'm sorry, I'm not going to find some 12th email in some thread. I >> stated in this thread the following: not being able to use PEP 555 to >> fix *existing* decimal & numpy APIs is not good enough. And decimal & >> numpy is only one example, there's tons of code out there that can >> benefit from its APIs to be fixed to support for async code in Python >> 3.7. >> > > Well, anyone interested can read that 12th email in this thread. In short, > my recommendation for libraries would be as follows: > > * If the library does not provide a context manager yet, they should add > one, using PEP 555. That will then work nicely in coroutines and generators. > > * If the library does have a context manager, implement it using PEP 555. Or > to be safe, add a new API function, so behavior in existing async code won't > change. 
> > * If the library needs to support some kind of set_state(..) operation, > implement it by getting the state using a PEP 555 context argument and > mutating its contents. > > * Fall back to thread-local storage if no context argument is present or if > the Python version does not support context arguments. The last bullet point is the problem. Everybody is saying to you that it's not acceptable. It's your choice to ignore that. [..] >> What do you mean by "just sweep it under the carpet"? Capturing the >> context at the moment of generators creation is a design choice with >> some consequences (that I illustrated in my previous email). There >> are cons and pros of doing that. >> > > "Capturing the context at generator creation" and "isolating generators > completely" are two different things. > > I've described pros of the former. The latter has no pros that I'm aware of, > except if sweeping things under the carpet is considered as one. > > Yes, the latter works in some use cases, but in others it does not. For > instance, if an async framework wants to make some information available > throughout the async task. If you isolate generators, then async programmers > will have to avoid generators, because they don't have access to the > information the framework is trying to provide. This is plain incorrect. Please read PEP 550v1 before continuing the discussion about it. > Also, if you refactor your > generator into subgenerators using `yield from`, the subgenerators will not > see the context set by the outer generator. Subgenerators see the context changes in the outer generator in all versions of PEP 550. The point you didn't like is that in all versions of PEP 550 subgenerators could not leak any context to the outer generator. Please don't confuse these two. Yury From yselivanov.ml at gmail.com Tue Oct 10 10:46:39 2017 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Tue, 10 Oct 2017 10:46:39 -0400 Subject: [Python-ideas] PEP draft: context variables In-Reply-To: References: Message-ID: On Tue, Oct 10, 2017 at 10:22 AM, Koos Zevenhoven wrote: > On Tue, Oct 10, 2017 at 5:01 PM, Nick Coghlan wrote: >> >> On 10 October 2017 at 22:51, Koos Zevenhoven wrote: >>> >>> I see no reason why these two should be equivalent. >> >> >> There is no "should" about it: it's a brute fact that the two forms *are* >> currently equivalent for lazy iterators (including generators), and both >> different from the form that uses eager evaluation of the values before the >> context change. >> >> Where should enters into the picture is by way of PEP 550 saying that they >> should *remain* equivalent because we don't have an adequately compelling >> justification for changing the runtime semantics. >> >> That is, given the following code: >> >> itr = make_iter() >> with decimal.localcontext() as ctx: >> ctc.prex = 30 >> for i in itr: >> pass >> >> Right now, today, in 3.6. the calculations in the iterator will use the >> modified decimal context, *not* the context that applied when the iterator >> was created. If you want to ensure that isn't the case, you have to force >> eager evaluation before the context change. >> >> What PEP 550 is proposing is that, by default, *nothing changes*: the lazy >> iteration in the above will continue to use the updated decimal context by >> default. > > > That's just an arbitrary example. There are many things that *would* change > if decimal contexts simply switched from using thread-local storage to using > PEP 550. 
It's not at all obvious which of the changes would be most likely > to cause problems. If I were to choose, I would probably introduce a new > context manager which works with PEP 555 semantics, because that's the only > way to ensure full backwards compatibility, regardless of whether PEP 555 or > PEP 550 is used. But I'm sure one could decide otherwise. Please stop using "many things .. would", "most likely" etc. We have a very focused discussion here. If you know of any particular issue, please demonstrate it with a realistic example. Otherwise, we only increase the number of emails and make things harder to track for everybody. If decimal switches to use PEP 550, there will be no "many things that *would* change". The only thing that will change is this: def g(): with decimal_context(...): yield next(g()) # this will no longer leak decimal context to the outer world I consider the above a bug fix, because nobody in their right mind relies on partial iteration of generator expecting that some of it's internal code would affect your code indirectly. The only such case is contextlib.contextmanager, and PEP 550 provides mechanisms to make generators "leaky" explicitly. Yury From k7hoven at gmail.com Tue Oct 10 11:26:00 2017 From: k7hoven at gmail.com (Koos Zevenhoven) Date: Tue, 10 Oct 2017 18:26:00 +0300 Subject: [Python-ideas] PEP draft: context variables In-Reply-To: References: Message-ID: On Tue, Oct 10, 2017 at 5:40 PM, Yury Selivanov wrote: > On Tue, Oct 10, 2017 at 8:34 AM, Koos Zevenhoven > wrote: > > On Tue, Oct 10, 2017 at 4:22 AM, Yury Selivanov > > > wrote: > >> > >> On Mon, Oct 9, 2017 at 8:37 PM, Koos Zevenhoven > wrote: > >> > You can cause unbound growth in PEP 550 too. All you have to do is > nest > >> > an > >> > unbounded number of generators. > >> > >> You can only nest up to 'sys.get_recursion_limit()' number of > generators. > >> > >> With PEP 555 you can do: > >> > >> while True: > >> context_var.assign(42).__enter__() > >> > > > > Well, in PEP 550, you can explicitly stack an unbounded number of > > LogicalContexts in a while True loop. > > No, you can't. PEP 550 doesn't have APIs to "stack ... LogicalContexts". > > ?? ?That's ridiculous. Quoting PEP 550: "? The contextvars.run_with_logical_context(lc: LogicalContext, func, *args, **kwargs) function, which runs func with the provided logical context on top of the current execution context. ?"? > > Or you can run out of memory using > > plain lists even faster: > > > > l = [42] > > > > while True: > > l *= 2 # ensure exponential blow-up > > > > I don't see why your example with context_var.assign(42).__enter__() > would > > be any more likely. > > Of course you can write broken code. The point is that contexts work > like scopes/mappings, and it's counter-intuitive that setting a > variable with 'cv.assign(..).__enter__()' will break the world. If a > naive user tries to convert their existing decimal-like API to use > your PEP, everything would work initially, but then blow up in > production. > > ?The docs will tell them what to do. You can pass a context argument down the call chain. You don't "set" context arguments!? That's why I'm changing to "context argument", and I've said this many times now. > [..] > > Really, it was my mistake to ever make you think that > > context_var.assign(42).__enter__() can be compared to .set(42) in PEP > 550. > > I'll say it once more: PEP 555 context arguments have no equivalent of > the > > PEP-550 .set(..). 
> > Any API exposing a context manager should have an alternative > try..finally API. In your case it's > 'context_var.assign(42).__enter__()'. 'With' statements are sugar in > Python. It's unprecedented to design API solely around them. > > ?[..] > ?Yury writes:? >> >> That question was also hand-waved by you: > >> >> numpy and decimal will have to come up with new/better APIs to use > PEP > >> >> 555. Well, that's just not good enough. > >> > > >> > > ?Koos writes:? >> > What part of my explanation of this are you unhappy with? For instance, > >> > the > >> > 12th (I think) email in this thread, which is my response to > Nathaniel. > >> > Could you reply to that and tell us your concern? > >> > >> I'm sorry, I'm not going to find some 12th email in some thread. I > >> stated in this thread the following: not being able to use PEP 555 to > >> fix *existing* decimal & numpy APIs is not good enough. And decimal & > >> numpy is only one example, there's tons of code out there that can > >> benefit from its APIs to be fixed to support for async code in Python > >> 3.7. > >> > > > > Well, anyone interested can read that 12th email in this thread. In > short, > > my recommendation for libraries would be as follows: > > > > * If the library does not provide a context manager yet, they should add > > one, using PEP 555. That will then work nicely in coroutines and > generators. > > > > * If the library does have a context manager, implement it using PEP > 555. Or > > to be safe, add a new API function, so behavior in existing async code > won't > > change. > > > > * If the library needs to support some kind of set_state(..) operation, > > implement it by getting the state using a PEP 555 context argument and > > mutating its contents. > > > > * Fall back to thread-local storage if no context argument is present or > if > > the Python version does not support context arguments. > > The last bullet point is the problem. Everybody is saying to you that > it's not acceptable. It's your choice to ignore that. > > ?Never has anyone told me that that is not acceptable. Please stop that. [..] > >> What do you mean by "just sweep it under the carpet"? Capturing the > >> context at the moment of generators creation is a design choice with > >> some consequences (that I illustrated in my previous email). There > >> are cons and pros of doing that. > >> > > > > "Capturing the context at generator creation" and "isolating generators > > completely" are two different things. > > > > I've described pros of the former. The latter has no pros that I'm aware > of, > > except if sweeping things under the carpet is considered as one. > > > > Yes, the latter works in some use cases, but in others it does not. For > > instance, if an async framework wants to make some information available > > throughout the async task. If you isolate generators, then async > programmers > > will have to avoid generators, because they don't have access to the > > information the framework is trying to provide. > > This is plain incorrect. Please read PEP 550v1 before continuing the > discussion about it. > > I thought you wrote that they are isolated both ways. Maybe there's a misunderstanding. I found your "New PEP 550" email in the archives in some thread. That might be v1, but the figure supposedly explaining this part is missing. Whatever. This is not about PEP 550v1 anyway. 
> > Also, if you refactor your > > generator into subgenerators using `yield from`, the subgenerators will > not > > see the context set by the outer generator. > > Subgenerators see the context changes in the outer generator in all > versions of PEP 550. > > The point you didn't like is that in all versions of PEP 550 > subgenerators could not leak any context to the outer generator. > Please don't confuse these two. > ?That's a different thing. But it's not exactly right: I didn't like the fact that ?some subroutines (functions, coroutines, (async) generators) leak context and some don't. ? ??Koos -- + Koos Zevenhoven + http://twitter.com/k7hoven + -------------- next part -------------- An HTML attachment was scrubbed... URL: From k7hoven at gmail.com Tue Oct 10 12:21:48 2017 From: k7hoven at gmail.com (Koos Zevenhoven) Date: Tue, 10 Oct 2017 19:21:48 +0300 Subject: [Python-ideas] PEP draft: context variables In-Reply-To: References: Message-ID: On Tue, Oct 10, 2017 at 5:46 PM, Yury Selivanov wrote: > On Tue, Oct 10, 2017 at 10:22 AM, Koos Zevenhoven > wrote: > > On Tue, Oct 10, 2017 at 5:01 PM, Nick Coghlan > wrote: > >> > >> On 10 October 2017 at 22:51, Koos Zevenhoven wrote: > >>> > >>> I see no reason why these two should be equivalent. > >> > >> > >> There is no "should" about it: it's a brute fact that the two forms > *are* > >> currently equivalent for lazy iterators (including generators), and both > >> different from the form that uses eager evaluation of the values before > the > >> context change. > >> > >> Where should enters into the picture is by way of PEP 550 saying that > they > >> should *remain* equivalent because we don't have an adequately > compelling > >> justification for changing the runtime semantics. > >> > >> That is, given the following code: > >> > >> itr = make_iter() > >> with decimal.localcontext() as ctx: > >> ctc.prex = 30 > >> for i in itr: > >> pass > >> > >> Right now, today, in 3.6. the calculations in the iterator will use the > >> modified decimal context, *not* the context that applied when the > iterator > >> was created. If you want to ensure that isn't the case, you have to > force > >> eager evaluation before the context change. > >> > ?It is not obvious to me if changing the semantics of this is breakage or a bug fix (as you put it below).? > >> What PEP 550 is proposing is that, by default, *nothing changes*: the > lazy > >> iteration in the above will continue to use the updated decimal context > by > >> default. > > > > > > That's just an arbitrary example. There are many things that *would* > change > > if decimal contexts simply switched from using thread-local storage to > using > > PEP 550. It's not at all obvious which of the changes would be most > likely > > to cause problems. If I were to choose, I would probably introduce a new > > context manager which works with PEP 555 semantics, because that's the > only > > way to ensure full backwards compatibility, regardless of whether PEP > 555 or > > PEP 550 is used. But I'm sure one could decide otherwise. > > Please stop using "many things .. would", "most likely" etc. ?I can't explain everything, especially not in a single email.? I will use whatever English words I need. You can also think for yourself??or ask a question. > We have > a very focused discussion here. If you know of any particular issue, > please demonstrate it with a realistic example. Otherwise, we only > increase the number of emails and make things harder to track for > everybody. 
> > I'm not going to (and won't be able to) list all those many use cases. I'd like to keep this more focused too. I'm sure you are well aware of those differences. It's not up to me to decide what `decimal` should do. I'll give you some examples below, if that helps. > If decimal switches to use PEP 550, there will be no "many things that > *would* change". The only thing that will change is this: > > def g(): > with decimal_context(...): > yield > > next(g()) # this will no longer leak decimal context to the outer world > > You forgot `yield from g()?`. See also below. > I consider the above a bug fix, because nobody in their right mind > relies on partial iteration of generator expecting that some of it's > internal code would affect your code indirectly. ?People use generators for all kinds of things.? See below. > The only such case > is contextlib.contextmanager, and PEP 550 provides mechanisms to make > generators "leaky" explicitly. > > ?That's not the only one. ?Here's another example:? ?def context_switcher(): for c in contexts: decimal.setcontext(c) yield ? ctx_switcher = context_switcher() def next_context(): next(ctx_switcher) And one more example: def make_things(): old_ctx = None def first_things_first(): first = compute_first_value() yield first ctx = figure_out_context(first) nonlocal old_ctx old_ctx = decimal.getcontext() decimal.setcontext(ctx) yield get_second_value() def the_bulk_of_things(): return get_bulk() def last_but_not_least(): decimal.set_context(old_ctx) yield "LAST" yield from first_things_first() yield from the_bulk_of_things() yield from last_but_not_least() all_things = list(make_things()) ???Koos? -- + Koos Zevenhoven + http://twitter.com/k7hoven + -------------- next part -------------- An HTML attachment was scrubbed... URL: From yselivanov.ml at gmail.com Tue Oct 10 12:22:51 2017 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Tue, 10 Oct 2017 12:22:51 -0400 Subject: [Python-ideas] PEP draft: context variables In-Reply-To: References: Message-ID: On Tue, Oct 10, 2017 at 11:26 AM, Koos Zevenhoven wrote: > On Tue, Oct 10, 2017 at 5:40 PM, Yury Selivanov > wrote: >> >> On Tue, Oct 10, 2017 at 8:34 AM, Koos Zevenhoven >> wrote: >> > On Tue, Oct 10, 2017 at 4:22 AM, Yury Selivanov >> > >> > wrote: >> >> >> >> On Mon, Oct 9, 2017 at 8:37 PM, Koos Zevenhoven >> >> wrote: >> >> > You can cause unbound growth in PEP 550 too. All you have to do is >> >> > nest >> >> > an >> >> > unbounded number of generators. >> >> >> >> You can only nest up to 'sys.get_recursion_limit()' number of >> >> generators. >> >> >> >> With PEP 555 you can do: >> >> >> >> while True: >> >> context_var.assign(42).__enter__() >> >> >> > >> > Well, in PEP 550, you can explicitly stack an unbounded number of >> > LogicalContexts in a while True loop. >> >> No, you can't. PEP 550 doesn't have APIs to "stack ... LogicalContexts". >> > > That's ridiculous. Quoting PEP 550: " > The contextvars.run_with_logical_context(lc: LogicalContext, func, *args, > **kwargs) function, which runs func with the provided logical context on top > of the current execution context. Note that 'run_with_logical_context()' doesn't accept the EC. It gets it using the 'get_execution_context()' function, which will squash LCs if needed. I say it again: *by design*, PEP 550 APIs do not allow to manually stack LCs in such a way that an unbound growth of the stack is possible. 
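(For anyone following along, a plain-Python toy model of what "squash" means here -- capture() is a made-up stand-in, not the draft PEP 550 API: when the current execution context is captured, the per-frame mappings are merged down, so the captured structure stays bounded no matter how deep the chain was:

    def capture(chain):
        squashed = {}
        for lc in chain:              # oldest first
            squashed.update(lc)
        return [squashed]             # always a single merged mapping

    ec = [{'a': 1}, {'b': 2}, {'a': 3}]
    print(capture(ec))                # [{'a': 3, 'b': 2}]

)
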
> " > >> >> > Or you can run out of memory using >> > plain lists even faster: >> > >> > l = [42] >> > >> > while True: >> > l *= 2 # ensure exponential blow-up >> > >> > I don't see why your example with context_var.assign(42).__enter__() >> > would >> > be any more likely. >> >> Of course you can write broken code. The point is that contexts work >> like scopes/mappings, and it's counter-intuitive that setting a >> variable with 'cv.assign(..).__enter__()' will break the world. If a >> naive user tries to convert their existing decimal-like API to use >> your PEP, everything would work initially, but then blow up in >> production. >> > > The docs will tell them what to do. You can pass a context argument down the > call chain. You don't "set" context arguments! That's why I'm changing to > "context argument", and I've said this many times now. I'm saying this the last time: In Python, any context manager should have an equivalent try..finally form. Please give us an example, how we can use PEP 555 APIs with a try..finally block. By the way, PEP 555 has this, quote: """ By default, values assigned inside a generator do not leak through yields to the code that drives the generator. However, the assignment contexts entered and left open inside the generator do become visible outside the generator after the generator has finished with a StopIteration or another exception: assi = cvar.assign(new_value) def genfunc(): yield assi.__enter__(): yield """ Why do you call __enter__() manually in this example? I thought it's a strictly prohibited thing in your PEP -- it's unsafe to use it this way. Is it only for illustration purposes? If so, then how "the assignment contexts entered and left open inside the generator" can even be a thing in your design? [..] >> > * Fall back to thread-local storage if no context argument is present or >> > if >> > the Python version does not support context arguments. >> >> The last bullet point is the problem. Everybody is saying to you that >> it's not acceptable. It's your choice to ignore that. >> > > Never has anyone told me that that is not acceptable. Please stop that. The whole idea of PEP 550 was to provide a working alternative to TLS. So this is clearly not acceptable for PEP 550. PEP 555 may hand-wave this requirement, but it simply limits the scope of where it can be useful. Which in my opinion means that it provides strictly *less* functionality than PEP 550. [..] >> This is plain incorrect. Please read PEP 550v1 before continuing the >> discussion about it. >> > > I thought you wrote that they are isolated both ways. Maybe there's a > misunderstanding. I found your "New PEP 550" email in the archives in some > thread. PEP 550 has links to all versions of it. You can simply read it there. > That might be v1, but the figure supposedly explaining this part is > missing. Whatever. This is not about PEP 550v1 anyway. This is about you spreading wrong information about PEP 550 (all of its versions in this case). Again, in PEP 550: 1. Changes to contexts made in async generators and sync generators do not leak to the caller. Changes made in a caller are visible to the generator. 2. Changes to contexts made in async tasks do not leak to the outer code or other tasks. That's assuming async tasks implementation is tweaked to use 'run_with_execution_context'. Otherwise, coroutines work with EC just like functions. 3. Changes to contexts made in OS threads do not leak to other threads. How's PEP 555 different besides requiring to use a context manager? 
> >> >> > Also, if you refactor your >> > generator into subgenerators using `yield from`, the subgenerators will >> > not >> > see the context set by the outer generator. >> >> Subgenerators see the context changes in the outer generator in all >> versions of PEP 550. >> >> The point you didn't like is that in all versions of PEP 550 >> subgenerators could not leak any context to the outer generator. >> Please don't confuse these two. > > > That's a different thing. But it's not exactly right: I didn't like the fact > that some subroutines (functions, coroutines, (async) generators) leak > context and some don't. So your PEP is "solving" this by disallowing to simply "set" a variable without a context manager. Is this the only difference? Look, Koos, until you give me a full list of *semantical* differences between PEP 555 and PEP 550, I'm not going to waste my time on discussions here. And I encourage Guido, Nick, and Nathaniel to do the same. Yury From yselivanov.ml at gmail.com Tue Oct 10 12:29:54 2017 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Tue, 10 Oct 2017 12:29:54 -0400 Subject: [Python-ideas] PEP draft: context variables In-Reply-To: References: Message-ID: On Tue, Oct 10, 2017 at 12:21 PM, Koos Zevenhoven wrote: [..] >> Please stop using "many things .. would", "most likely" etc. > > > I can't explain everything, especially not in a single email. I will use > whatever English words I need. You can also think for yourself??or ask a > question. I can't assign meaning to your examples formulated in "many things" and "most likely". I can reason about concrete words and code examples. You essentially asking us to *trust you* that you know of some examples and they exist. It's not going to happen. > > >> >> We have >> a very focused discussion here. If you know of any particular issue, >> please demonstrate it with a realistic example. Otherwise, we only >> increase the number of emails and make things harder to track for >> everybody. >> > > I'm not going to (and won't be able to) list all those many use cases. Then why are you working on a PEP? :) [..] >> The only such case >> is contextlib.contextmanager, and PEP 550 provides mechanisms to make >> generators "leaky" explicitly. >> > > That's not the only one. > > Here's another example: > > def context_switcher(): > for c in contexts: > decimal.setcontext(c) > yield > ctx_switcher = context_switcher() > > def next_context(): > next(ctx_switcher) In 10 years of me professionally writing Python code, I've never seen this pattern in any code base. But even if such pattern exists, you can simply decorate "context_switcher" generator to set it's __logical_context__ to None. And it will start to leak things. BTW, how does PEP 555 handle your own example? I thought it's not possible to implement "decimal.setcontext" with PEP 555 at all! 
> > > > And one more example: > > > def make_things(): > old_ctx = None > def first_things_first(): > first = compute_first_value() > yield first > > ctx = figure_out_context(first) > nonlocal old_ctx > old_ctx = decimal.getcontext() > decimal.setcontext(ctx) > > yield get_second_value() > > def the_bulk_of_things(): > return get_bulk() > > def last_but_not_least(): > decimal.set_context(old_ctx) > yield "LAST" > > > yield from first_things_first() > yield from the_bulk_of_things() > yield from last_but_not_least() > > all_things = list(make_things()) I can only say that this one wouldn't pass my code review :) This isn't a real example, this is something that you clearly just a piece of tangled convoluted code that you just invented. Yury From k7hoven at gmail.com Tue Oct 10 12:34:44 2017 From: k7hoven at gmail.com (Koos Zevenhoven) Date: Tue, 10 Oct 2017 19:34:44 +0300 Subject: [Python-ideas] PEP draft: context variables In-Reply-To: References: Message-ID: On Tue, Oct 10, 2017 at 3:42 PM, Nick Coghlan wrote: > On 10 October 2017 at 22:34, Koos Zevenhoven wrote: > >> Really, it was my mistake to ever make you think that >> context_var.assign(42).__enter__() can be compared to .set(42) in PEP >> 550. I'll say it once more: PEP 555 context arguments have no equivalent of >> the PEP-550 .set(..). >> > > Then your alternate PEP can't work, since it won't be useful to extension > modules. > > ?Maybe this helps: * PEP 550 is based on var.set(..), but you will then implement context managers on top of that. * PEP 555 is based context managers, but you can implement a var.set(..)? on top of that if you really need it. > Context managers are merely syntactic sugar for try/finally statements, so > you can't wave your hands and say a context manager is the only supported > API: you *have* to break the semantics down and explain what the > try/finally equivalent looks like. > > > Is this what you're asking?? ?assi = cvar.assign(value) assi.__enter__() try: # do stuff involving cvar.value finally: assi.__exit__() As written in the PEP, these functions would have C equivalents. But most C extensions will probably only need cvar.value, and the assignment contexts will be entered from Python. ???Koos? -- + Koos Zevenhoven + http://twitter.com/k7hoven + -------------- next part -------------- An HTML attachment was scrubbed... URL: From yselivanov.ml at gmail.com Tue Oct 10 12:40:41 2017 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Tue, 10 Oct 2017 12:40:41 -0400 Subject: [Python-ideas] PEP draft: context variables In-Reply-To: References: Message-ID: On Tue, Oct 10, 2017 at 12:34 PM, Koos Zevenhoven wrote: > On Tue, Oct 10, 2017 at 3:42 PM, Nick Coghlan wrote: [..] >> Context managers are merely syntactic sugar for try/finally statements, so >> you can't wave your hands and say a context manager is the only supported >> API: you *have* to break the semantics down and explain what the try/finally >> equivalent looks like. >> >> > > Is this what you're asking? > > assi = cvar.assign(value) > assi.__enter__() > try: > # do stuff involving cvar.value > finally: > assi.__exit__() But then you *are* allowing users to use "__enter__()" and "__exit__()" directly. Which means that some users *can* experience an unbound growth of context values stack that will make their code run out of memory. This is not similar to appending something to a list -- people are aware that lists can't grow infinitely. But it's not obvious that you can't call "cvar.assign(value).__enter__()" many times. 
The problem with memory leaks like this is that you can easily write some code and ship it. And only after a while you start experiencing problems in production that are extremely hard to track. Yury From guido at python.org Tue Oct 10 12:52:39 2017 From: guido at python.org (Guido van Rossum) Date: Tue, 10 Oct 2017 09:52:39 -0700 Subject: [Python-ideas] PEP draft: context variables In-Reply-To: References: Message-ID: On Tue, Oct 10, 2017 at 5:34 AM, Nick Coghlan wrote: > On 10 October 2017 at 01:24, Guido van Rossum wrote: > >> On Sun, Oct 8, 2017 at 11:46 PM, Nick Coghlan wrote: >> >>> On 8 October 2017 at 08:40, Koos Zevenhoven wrote: >>> >>>> ??I do remember Yury mentioning that the first draft of PEP 550 >>>> captured something when the generator function was called. I think I >>>> started reading the discussions after that had already been removed, so I >>>> don't know exactly what it was. But I doubt that it was *exactly* the >>>> above, because PEP 550 uses set and get operations instead of "assignment >>>> contexts" like PEP 555 (this one) does. ?? >>>> >>> >>> We didn't forget it, we just don't think it's very useful. >>> >> >> I'm not sure I agree on the usefulness. Certainly a lot of the complexity >> of PEP 550 exists just to cater to Nathaniel's desire to influence what a >> generator sees via the context of the send()/next() call. I'm still not >> sure that's worth it. In 550 v1 there's no need for chained lookups. >> > > The compatibility concern is that we want developers of existing libraries > to be able to transparently switch from using thread local storage to > context local storage, and the way thread locals interact with generators > means that decimal (et al) currently use the thread local state at the time > when next() is called, *not* when the generator is created. > Apart from the example in PEP 550, is that really a known idiom? > I like Yury's example for this, which is that the following two examples > are currently semantically equivalent, and we want to preserve that > equivalence: > > with decimal.localcontext() as ctx: > ctc.prex = 30 > for i in gen(): > pass > > g = gen() > with decimal.localcontext() as ctx: > ctc.prex = 30 > for i in g: > pass > Do we really want that equivalence? It goes against the equivalence from Koos' example. > The easiest way to maintain that equivalence is to say that even though > preventing state changes leaking *out* of generators is considered a > desirable change, we see preventing them leaking *in* as a gratuitous > backwards compatibility break. > I dunno, I think them leaking in in the first place is a dubious feature, and I'm not too excited that the design of the way forward should bend over backwards to be compatible here. The only real use case I've seen so far (not counting examples that just show how it works) is Nathaniel's timeout example (see point 9 in Nathaniel?s message ), and I'm still not convinced that that example is important enough to support either. It would all be easier to decide if there were use cases that were less far-fetched, or if the far-fetched use cases would be supportable with a small tweak. As it is, it seems that we could live in a simpler, happier world if we gave up on context values leaking in via next() etc. (I still claim that in that case we wouldn't need chained lookup in the exposed semantics, just fast copying of contexts.) 
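(Rough sketch of the "copy instead of chain" idea -- mine rather than anything from the PEPs, and spawn_context() is a made-up helper: a generator or task would take one flat snapshot of its parent's context, so every later lookup is a single mapping access and nothing needs to chain:

    parent = {'decimal_prec': 28, 'locale': 'en_US'}

    def spawn_context(parent):
        return dict(parent)           # one O(n) copy at creation time

    child = spawn_context(parent)
    child['decimal_prec'] = 30        # visible only in the child
    assert parent['decimal_prec'] == 28

Whether that copy is cheap enough in practice is exactly the part that would need measuring.)
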
> This does mean that *neither* form is semantically equivalent to eager > extraction of the generator values before the decimal context is changed, > but that's the status quo, and we don't have a compelling justification for > changing it. > I think the justification is that we could have a *significantly* simpler semantics and implementation. > If folks subsequently decide that they *do* want "capture on creation" or > "capture on first iteration" semantics for their generators, those are easy > enough to add as wrappers on top of the initial thread-local-compatible > base by using the same building blocks as are being added to help event > loops manage context snapshots for coroutine execution. > (BTW Capture on first iteration sounds just awful.) I think we really need to do more soul-searching before we decide that a much more complex semantics and implementation is worth it to maintain backwards compatibility for leaking in via next(). -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve.dower at python.org Wed Oct 11 00:46:35 2017 From: steve.dower at python.org (Steve Dower) Date: Tue, 10 Oct 2017 21:46:35 -0700 Subject: [Python-ideas] PEP draft: context variables In-Reply-To: References: Message-ID: Nick: ?I like Yury's example for this, which is that the following two examples are currently semantically equivalent, and we want to preserve that equivalence: ??? with decimal.localcontext() as ctx: ? ????? ctc.prex = 30 ? ? ? ? for i in gen(): ? ? ? ? ?? pass ??? g = gen() ??? with decimal.localcontext() as ctx: ? ????? ctc.prex = 30 ? ? ? ? for i in g: ? ??? ? ? pass? I?m following this discussion from a distance, but cared enough about this point to chime in without even reading what comes later in the thread. (Hopefully it?s not twenty people making the same point?) I HATE this example! Looking solely at the code we can see, you are refactoring a function call from inside an *explicit* context manager to outside of it, and assuming the behavior will not change. There?s *absolutely no* logical or semantic reason that these should be equivalent, especially given the obvious alternative of leaving the call within the explicit context. Even moving the function call before the setattr can?t be assumed to not change its behavior ? how is moving it outside a with block ever supposed to be safe? I appreciate the desire to be able to take currently working code using one construct and have it continue working with a different construct, but the burden should be on that library and not the runtime. By that I mean that the parts of decimal that set and read the context should do the extra work to maintain compatibility (e.g. through a globally mutable structure using context variables as a slightly more fine-grained key than thread ID) rather than forcing an otherwise straightforward core runtime feature to jump through hoops to accommodate it. New users of this functionality very likely won?t assume that TLS is the semantic equivalent, especially when all the examples and naming make it sound like context managers are more related. (I predict people will expect this to behave more like unstated/implicit function arguments and be captured at the same time as other arguments are, but can?t really back that up except with gut-feel. It's certainly a feature that I want for myself more than I want another spelling for TLS?) 
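A rough illustration of that gut feeling -- my sketch, nothing from either PEP, using only the stdlib -- is how functools.partial captures its arguments when the callable is built, not when it is eventually called:

    import decimal
    import functools

    ctx = decimal.Context(prec=5)
    compute = functools.partial(ctx.divide, decimal.Decimal(1), decimal.Decimal(3))

    # ...much later, under whatever ambient context happens to be active:
    print(compute())   # 0.33333 -- uses the context captured when it was built

The expectation is that an implicit context argument would travel with the deferred call in the same way.
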
Top-posted from my Windows phone From: Nick Coghlan Sent: Tuesday, October 10, 2017 5:35 To: Guido van Rossum Cc: Python-Ideas Subject: Re: [Python-ideas] PEP draft: context variables On 10 October 2017 at 01:24, Guido van Rossum wrote: On Sun, Oct 8, 2017 at 11:46 PM, Nick Coghlan wrote: On 8 October 2017 at 08:40, Koos Zevenhoven wrote: ??I do remember Yury mentioning that the first draft of PEP 550 captured something when the generator function was called. I think I started reading the discussions after that had already been removed, so I don't know exactly what it was. But I doubt that it was *exactly* the above, because PEP 550 uses set and get operations instead of "assignment contexts" like PEP 555 (this one) does. ?? We didn't forget it, we just don't think it's very useful. I'm not sure I agree on the usefulness. Certainly a lot of the complexity of PEP 550 exists just to cater to Nathaniel's desire to influence what a generator sees via the context of the send()/next() call. I'm still not sure that's worth it. In 550 v1 there's no need for chained lookups. The compatibility concern is that we want developers of existing libraries to be able to transparently switch from using thread local storage to context local storage, and the way thread locals interact with generators means that decimal (et al) currently use the thread local state at the time when next() is called, *not* when the generator is created. I like Yury's example for this, which is that the following two examples are currently semantically equivalent, and we want to preserve that equivalence: ??? with decimal.localcontext() as ctx: ? ????? ctc.prex = 30 ? ? ? ? for i in gen(): ? ? ? ? ?? pass ??? g = gen() ??? with decimal.localcontext() as ctx: ? ????? ctc.prex = 30 ? ? ? ? for i in g: ? ??? ? ? pass The easiest way to maintain that equivalence is to say that even though preventing state changes leaking *out* of generators is considered a desirable change, we see preventing them leaking *in* as a gratuitous backwards compatibility break. This does mean that *neither* form is semantically equivalent to eager extraction of the generator values before the decimal context is changed, but that's the status quo, and we don't have a compelling justification for changing it. If folks subsequently decide that they *do* want "capture on creation" or "capture on first iteration" semantics for their generators, those are easy enough to add as wrappers on top of the initial thread-local-compatible base by using the same building blocks as are being added to help event loops manage context snapshots for coroutine execution. Cheers, Nick. -- Nick Coghlan?? |?? ncoghlan at gmail.com?? |?? Brisbane, Australia -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Wed Oct 11 04:28:37 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 11 Oct 2017 18:28:37 +1000 Subject: [Python-ideas] PEP draft: context variables In-Reply-To: References: Message-ID: On 11 October 2017 at 02:52, Guido van Rossum wrote: > I think we really need to do more soul-searching before we decide that a > much more complex semantics and implementation is worth it to maintain > backwards compatibility for leaking in via next(). > As a less-contrived example, consider context managers implemented as generators. 
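A minimal sketch of the kind of thing I mean, using today's contextlib and decimal (precision() is just an illustrative helper):

    import contextlib
    import decimal

    @contextlib.contextmanager
    def precision(prec):
        ctx = decimal.getcontext()
        old = ctx.prec
        ctx.prec = prec
        try:
            yield
        finally:
            ctx.prec = old

    cm = precision(5)      # creating the manager here should not matter...
    with cm:               # ...only entering it here should
        print(decimal.Decimal(1) / 3)
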
We want those to run with the execution context that's active when they're used in a with statement, not the one that's active when they're created (the fact that generator-based context managers can only be used once mitigates the risk of creation time context capture causing problems, but the implications would still be weird enough to be worth avoiding). For native coroutines, we want them to run with the execution context that's active when they're awaited or when they're prepared for submission to an event loop, not the one that's active when they're created. For generators-as-coroutines, we want them to be like their native coroutine counterparts: run with the execution context that's active when they're passed to "yield from" or prepared for submission to an event loop. It's only for generators-as-iterators that the question of what behaviour we want even really arises, as it's less clear cut whether we'd be better off overall if they behaved more like an eagerly populated container (and hence always ran with the execution context that's active when they're created), or more like the way they do now (where retrieval of the next value from a generator is treated like any other method call). That combination of use cases across context managers, native coroutines, top level event loop tasks, and generator-based coroutines mean we already need to support both execution models regardless, so the choice of default behaviour for generator-iterators won't make much difference to the overall complexity of the PEP. However, having generator-iterators default to *not* capturing their creation context makes them more consistent with the other lazy evaluation constructs, and also makes the default ContextVar semantics more consistent with thread local storage semantics. The flipside of that argument would be: * the choice doesn't matter if there aren't any context changes between creation & use * having generators capture their context by default may ease future migrations from eager container creation to generators in cases that involve context-dependent calculations * decorators clearing the implicitly captured context from the generator-iterator when appropriate is simpler than writing a custom iterator wrapper to handle the capturing I just don't find that counterargument compelling when we have specific use cases that definitely benefit from the proposed default behaviour (contextlib.contextmanager, asyncio.coroutine), but no concrete use cases for the proposed alternative that couldn't be addressed by a combination of map(), functools.partial(), and contextvars.run_in_execution_context(). Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia -------------- next part -------------- An HTML attachment was scrubbed... URL: From k7hoven at gmail.com Wed Oct 11 07:58:31 2017 From: k7hoven at gmail.com (Koos Zevenhoven) Date: Wed, 11 Oct 2017 14:58:31 +0300 Subject: [Python-ideas] PEP draft: context variables In-Reply-To: References: Message-ID: On Wed, Oct 11, 2017 at 7:46 AM, Steve Dower wrote: > Nick: ?I like Yury's example for this, which is that the following two > examples are currently semantically equivalent, and we want to preserve > that equivalence: > > > > with decimal.localcontext() as ctx: > > ctc.prex = 30 > > for i in gen(): > pass > > g = gen() > > with decimal.localcontext() as ctx: > > ctc.prex = 30 > > for i in g: > pass? 
> > > > I?m following this discussion from a distance, but cared enough about this > point to chime in without even reading what comes later in the thread. > (Hopefully it?s not twenty people making the same point?) > > > > I HATE this example! Looking solely at the code we can see, you are > refactoring a function call from inside an **explicit** context manager > to outside of it, and assuming the behavior will not change. There?s **absolutely > no** logical or semantic reason that these should be equivalent, > especially given the obvious alternative of leaving the call within the > explicit context. Even moving the function call before the setattr can?t be > assumed to not change its behavior ? how is moving it outside a with block > ever supposed to be safe? > > > ?Exactly. You did say it less politely than I did, but this is exactly how I thought about it. And I'm not sure people got it the first time. > I appreciate the desire to be able to take currently working code using > one construct and have it continue working with a different construct, but > the burden should be on that library and not the runtime. By that I mean > that the parts of decimal that set and read the context should do the extra > work to maintain compatibility (e.g. through a globally mutable structure > using context variables as a slightly more fine-grained key than thread ID) > rather than forcing an otherwise straightforward core runtime feature to > jump through hoops to accommodate it. > > > ?In fact, one might then use the kind of enhanced context-local storage ?that I've been planning on top of PEP 555, as also mentioned in the PEP. It would not be the recommended way, but people might benefit from it in some cases, such as for a more backwards-compatible PEP-555 "feature enablement" or other special purposes like `trio`s timeouts. (However, these needs can be satisfied with an even simpler approach in PEP 555, if that's where we want to go.) I want PEP 555 to be how things *should be*, not how things are. After all, context arguments are a new feature. But plenty of effort in the design still goes into giving people ways to tweak things to their special needs and for compatibility issues. > New users of this functionality very likely won?t assume that TLS is the > semantic equivalent, especially when all the examples and naming make it > sound like context managers are more related. (I predict people will expect > this to behave more like unstated/implicit function arguments and be > captured at the same time as other arguments are, but can?t really back > that up except with gut-feel. It's certainly a feature that I want for > myself more than I want another spelling for TLS?) > I assume you like my decision to rename the concept to "context arguments" :). And indeed, new use cases would be more interesting than existing ones. Surely we don't want new use cases to copy the semantics from the old ones which currently have issues (because they were originally designed to work with traditional function and method calls, and using then-available techniques). ??Koos -- + Koos Zevenhoven + http://twitter.com/k7hoven + -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From steve.dower at python.org Wed Oct 11 14:54:00 2017 From: steve.dower at python.org (Steve Dower) Date: Wed, 11 Oct 2017 11:54:00 -0700 Subject: [Python-ideas] PEP draft: context variables In-Reply-To: References: Message-ID: <0f68e33b-0e7a-c3b3-08fe-e0322a9ae91d@python.org> On 11Oct2017 0458, Koos Zevenhoven wrote: > ?Exactly. You did say it less politely than I did, but this is exactly > how I thought about it. And I'm not sure people got it the first time. Yes, perhaps a little harsh. However, if I released a refactoring tool that moved function calls that far, people would file bugs against it for breaking their code (and in my experience of people filing bugs against tools that break their code, they can also be a little harsh). > I want PEP 555 to be how things *should be*, not how things are. Agreed. Start with the ideal target and backpedal when a sufficient case has been made to justify it. That's how Yury's PEP has travelled, but I disagree that this example is a compelling case for the amount of bending that is being done. >> New users of this functionality very likely won?t assume that TLS is >> the semantic equivalent, especially when all the examples and naming >> make it sound like context managers are more related. (I predict >> people will expect this to behave more like unstated/implicit >> function arguments and be captured at the same time as other >> arguments are, but can?t really back that up except with gut-feel. >> It's certainly a feature that I want for myself more than I want >> another spelling for TLS?) > > I assume you like my decision to rename the concept to "context > arguments" :). And indeed, new use cases would be more interesting than > existing ones. Surely we don't want new use cases to copy the semantics > from the old ones which currently have issues (because they were > originally designed to work with traditional function and method calls, > and using then-available techniques). I don't really care about names, as long as it's easy to use them to research the underlying concept or intended functionality. And I'm not particularly supportive of this concept as a whole anyway - EIBTI and all. But since it does address a fairly significant shortcoming in existing code, we're going to end up with something. If it's a new runtime feature then I'd like it to be an easy concept to grasp with clever hacks for the compatibility cases (and I do believe there are clever hacks available for getting "inject into my deferred function call" semantics), rather than the whole thing being a complicated edge-case. Cheers, Steve From ncoghlan at gmail.com Wed Oct 11 23:54:18 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 12 Oct 2017 13:54:18 +1000 Subject: [Python-ideas] PEP draft: context variables In-Reply-To: References: Message-ID: On 11 October 2017 at 21:58, Koos Zevenhoven wrote: > On Wed, Oct 11, 2017 at 7:46 AM, Steve Dower > wrote: > >> Nick: ?I like Yury's example for this, which is that the following two >> examples are currently semantically equivalent, and we want to preserve >> that equivalence: >> >> >> >> with decimal.localcontext() as ctx: >> >> ctc.prex = 30 >> >> for i in gen(): >> pass >> >> g = gen() >> >> with decimal.localcontext() as ctx: >> >> ctc.prex = 30 >> >> for i in g: >> pass? >> >> >> >> I?m following this discussion from a distance, but cared enough about >> this point to chime in without even reading what comes later in the thread. >> (Hopefully it?s not twenty people making the same point?) 
>> >> >> >> I HATE this example! Looking solely at the code we can see, you are >> refactoring a function call from inside an **explicit** context manager >> to outside of it, and assuming the behavior will not change. There?s **absolutely >> no** logical or semantic reason that these should be equivalent, >> especially given the obvious alternative of leaving the call within the >> explicit context. Even moving the function call before the setattr can?t be >> assumed to not change its behavior ? how is moving it outside a with block >> ever supposed to be safe? >> >> >> > > ?Exactly. You did say it less politely than I did, but this is exactly how > I thought about it. And I'm not sure people got it the first time. > Refactoring isn't why I like the example, as I agree there's no logical reason why the two forms should be semantically equivalent in a greenfield context management design. The reason I like the example is because, in current Python, with the way generators and decimal contexts currently work, it *doesn't matter* which of these two forms you use - they'll both behave the same way, since no actual code execution takes place in the generator iterator at the time the generator is created. That means we have a choice to make, and that choice will affect how risky it is for a library like decimal to switch from using thread local storage to context local storage: is switching from thread locals to context variables in a synchronous context manager going to be a compatibility break for end user code that uses the second form, where generator creation happens outside a with statement, but use happens inside it? Personally, I want folks maintaining context managers to feel comfortable switching from thread local storage to context variables (when the latter are available), and in particular, I want the decimal module to be able to make such a switch and have it be an entirely backwards compatible change for synchronous single-threaded code. That means it doesn't matter to me whether we see separating generator (or context manager) creation from subsequent use is good style or not, what matters is that decimal contexts work a certain way today and hence we're faced with a choice between: 1. Preserve the current behaviour, since we don't have a compelling reason to change its semantics 2. Change the behaviour, in order to gain "I think it's more correct, but don't have any specific examples where the status quo subtly does the wrong thing" isn't an end user benefit, as: - of necessity, any existing tested code won't be written that way (since it would be doing the wrong thing, and will hence have been changed) - future code that does want creation time context capture can be handled via an explicit wrapper (as is proposed for coroutines, with event loops supplying the wrapper in that case) "It will be easier to implement & maintain" isn't an end user benefit either, but still a consideration that carries weight when true. In this case though, it's pretty much a wash - whichever form we make the default, we'll need to provide some way of switching to the other behaviour, since we need both behavioural variants ourselves to handle different use cases. That puts the burden squarely on the folks arguing for a semantic change: "We should break currently working code because ...". 
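To make the "explicit wrapper" option above concrete, here is a hedged sketch that captures the context once, when the generator is wrapped, and re-enters that snapshot on every resumption. It is spelled with the contextvars API (copy_context() and Context.run()) that this discussion eventually produced; the helper name is illustrative, and send()/throw() forwarding is omitted for brevity:

    import contextvars

    def capture_creation_context(gen):
        snapshot = contextvars.copy_context()   # taken at wrapping time
        def driver():
            while True:
                try:
                    value = snapshot.run(next, gen)   # resume inside the snapshot
                except StopIteration:
                    return
                yield value
        return driver()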
PEP 479 (the change to StopIteration semantics) is an example of doing that well, and so is the proposal in PEP 550 to keep context changes from implicitly leaking *out* of generators when yield or await is used in a with statement body. The challenge for folks arguing for generators capturing their creation context is to explain the pay-off that end users will gain from our implicitly changing the behaviour of code like the following: >>> data = [sum(Decimal(10)**-r for r in range(max_r+1)) for max_r in range(5)] >>> data [Decimal('1'), Decimal('1.1'), Decimal('1.11'), Decimal('1.111'), Decimal('1.1111')] >>> def lazily_round_to_current_context(data): ... for d in data: yield +d ... >>> g = lazily_round_to_current_context(data) >>> with decimal.localcontext() as ctx: ... ctx.prec = 2 ... rounded_data = list(g) ... >>> rounded_data [Decimal('1'), Decimal('1.1'), Decimal('1.1'), Decimal('1.1'), Decimal('1.1')] Yes, it's a contrived example, but it's also code that will work all the way back to when the decimal module was first introduced. Because of the way I've named the rounding generator, it's also clear to readers that the code is aware of the existing semantics, and is intentionally relying on them. The current version of PEP 550 means that the decimal module can switch to using context variables instead of thread local storage, and the above code won't even notice the difference. However, if generators were to start implicitly capturing their creation context, then the above code would break, since the rounding would start using a decimal context other than the one that's in effect in the current thread when the rounding takes place - the generator would implicitly reset it back to an earlier state. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia -------------- next part -------------- An HTML attachment was scrubbed... URL: From k7hoven at gmail.com Thu Oct 12 08:59:31 2017 From: k7hoven at gmail.com (Koos Zevenhoven) Date: Thu, 12 Oct 2017 15:59:31 +0300 Subject: [Python-ideas] PEP draft: context variables In-Reply-To: References: Message-ID: On Thu, Oct 12, 2017 at 6:54 AM, Nick Coghlan wrote: > On 11 October 2017 at 21:58, Koos Zevenhoven wrote: > >> On Wed, Oct 11, 2017 at 7:46 AM, Steve Dower >> wrote: >> >>> Nick: ?I like Yury's example for this, which is that the following two >>> examples are currently semantically equivalent, and we want to preserve >>> that equivalence: >>> >>> >>> >>> with decimal.localcontext() as ctx: >>> >>> ctc.prex = 30 >>> >>> for i in gen(): >>> pass >>> >>> g = gen() >>> >>> with decimal.localcontext() as ctx: >>> >>> ctc.prex = 30 >>> >>> for i in g: >>> pass? >>> >>> >>> >>> I?m following this discussion from a distance, but cared enough about >>> this point to chime in without even reading what comes later in the thread. >>> (Hopefully it?s not twenty people making the same point?) >>> >>> >>> >>> I HATE this example! Looking solely at the code we can see, you are >>> refactoring a function call from inside an **explicit** context manager >>> to outside of it, and assuming the behavior will not change. There?s **absolutely >>> no** logical or semantic reason that these should be equivalent, >>> especially given the obvious alternative of leaving the call within the >>> explicit context. Even moving the function call before the setattr can?t be >>> assumed to not change its behavior ? how is moving it outside a with block >>> ever supposed to be safe? >>> >>> >>> >> >> ?Exactly. 
You did say it less politely than I did, but this is exactly >> how I thought about it. And I'm not sure people got it the first time. >> > > Refactoring isn't why I like the example, as I agree there's no logical > reason why the two forms should be semantically equivalent in a greenfield > context management design. > > The reason I like the example is because, in current Python, with the way > generators and decimal contexts currently work, it *doesn't matter* which > of these two forms you use - they'll both behave the same way, since no > actual code execution takes place in the generator iterator at the time the > generator is created. > > The latter version does not feel like a good way to write the code. People will hate it, because they can't tell what happens by looking at the code locally. What I think is that the current behavior of decimal contexts only satisfies some contrived examples. IMO, everything about decimal contexts together with generators is currently a misfeature. Of course, you can also make use of a misfeature, like in the above example, where the subtleties of decimal rounding are hidden underneath the iterator protocol and a `for` statement.? That means we have a choice to make, and that choice will affect how risky > it is for a library like decimal to switch from using thread local storage > to context local storage: is switching from thread locals to context > variables in a synchronous context manager going to be a compatibility > break for end user code that uses the second form, where generator creation > happens outside a with statement, but use happens inside it? > > ?AFAICT, the number of users of `decimal` could be anywhere between 3 and 3**19. ?Anything you do might break someone's code. Personally, I think the current behavior, which you explain using the example above, is counter-intuitive. But I can't tell you how much code would break by fixing it with direct PEP 555 semantics. I also can't tell how much would break when using PEP 550 to "fix" it, but I don't even like the semantics that that would give. I strongly believe that the most "normal" use case for a generator function is that it's a function that returns an iterable of values. Sadly, decimal contexts don't currently work correctly for this use case. Indeed, I would introduce a new context manager that behaves intuitively and then slowly deprecate the old one. [*] Personally, I want folks maintaining context managers to feel comfortable > switching from thread local storage to context variables (when the latter > are available), and in particular, I want the decimal module to be able to > make such a switch and have it be an entirely backwards compatible change > for synchronous single-threaded code. > ?Sure. But PEP 550 won't give you that, though. Being inside a generator affects the scope of changing the decimal context. Yes, as a side effect, the behavior of a decimal context manager inside a generator becomes more intuitive. But it's still a breaking change, even for synchronous code. That means it doesn't matter to me whether we see separating generator (or > context manager) creation from subsequent use is good style or not, what > matters is that decimal contexts work a certain way today and hence we're > faced with a choice between: > > 1. Preserve the current behaviour, since we don't have a compelling reason > to change its semantics > 2. Change the behaviour, in order to gain > > ?3. Introduce? a new context manager that behaves intuitively. 
?My guess is that the two context managers could even be made to interact with each other in a fairly reasonable manner, even if you nest them in different orders. I'm not sure how necessary that is. > "I think it's more correct, but don't have any specific examples where the > status quo subtly does the wrong thing" isn't an end user benefit, as: > - of necessity, any existing tested code won't be written that way (since > it would be doing the wrong thing, and will hence have been changed) > ?Then it's actually better to not change the semantics of the existing functionality, but add new ones instead.? > - future code that does want creation time context capture can be handled > via an explicit wrapper (as is proposed for coroutines, with event loops > supplying the wrapper in that case) > > Handling the normal case with wrappers (that might even harm performance) ? just because decimal does not handle the normal case? ? > "It will be easier to implement & maintain" isn't an end user benefit > either, but still a consideration that carries weight when true. In this > case though, it's pretty much a wash - whichever form we make the default, > we'll need to provide some way of switching to the other behaviour, since > we need both behavioural variants ourselves to handle different use cases. > True, in PEP 555 there is not really much difference in complexity regarding leaking in from the side (next/send) and leaking in from the top (genfunc() call). Just a matter of some if statements. > > That puts the burden squarely on the folks arguing for a semantic change: > "We should break currently working code because ...". > > ? And on the folks that end up having to argue against it, or to come up with a better solution. And those that feel that it's a distraction from the discussion. > PEP 479 (the change to StopIteration semantics) is an example of doing > that well, and so is the proposal in PEP 550 to keep context changes from > implicitly leaking *out* of generators when yield or await is used in a > with statement body. > The former is a great example. The latter has good parts but is complicated and didn't end up getting all the way there. The challenge for folks arguing for generators capturing their creation > context is to explain the pay-off that end users will gain from our > implicitly changing the behaviour of code like the following: > > >>> data = [sum(Decimal(10)**-r for r in range(max_r+1)) for max_r in > range(5)] > >>> data > [Decimal('1'), Decimal('1.1'), Decimal('1.11'), Decimal('1.111'), > Decimal('1.1111')] > >>> def lazily_round_to_current_context(data): > ... for d in data: yield +d > ... > >>> g = lazily_round_to_current_context(data) > >>> with decimal.localcontext() as ctx: > ... ctx.prec = 2 > ... rounded_data = list(g) > ... > >>> rounded_data > [Decimal('1'), Decimal('1.1'), Decimal('1.1'), Decimal('1.1'), > Decimal('1.1')] > > Yes, it's a contrived example, but it's also code that will work all the > way back to when the decimal module was first introduced. Because of the > way I've named the rounding generator, it's also clear to readers that the > code is aware of the existing semantics, and is intentionally relying on > them. > > ?The way you've named the function (lazily_round_to_current_context) does not correspond to the behavior in the code example. "Current" means "current", not "the context of the caller of next at lazy evaluation time". 
Maybe you could make it: g = rounded_according_to_decimal_context_of_whoever_calls_next(data) Really, I think that, to get this behavior, the function should be defined with a decorator to mark that context should leak in through next(). But probably the programmer will realize??there must be a better way: with decimal.localcontext() as ctx: ctx.prec = 2 rounded_data = [round_in_context(d) for d in data] That one would already work and be equivalent in any of the proposed semantics. But there could be more improvements, perhaps: with decimal.context(prec=2): rounded_data = [round_in_context(d) for d in data] ??Koos? ??[*] Maybe somehow make the existing functionality a phantom easter egg??a blast from the past which you can import and use, but which is otherwise invisible :-). Then later give warnings and finally remove it completely. But we need better smooth upgrade paths anyway, maybe something like: from __compat__ import unintuitive_decimal_contexts with unintuitive_decimal_contexts: do_stuff() ?Now code bases can more quickly switch to new python versions and make the occasional compatibility adjustments more lazily, while already benefiting from other new language features. ??Koos? -- + Koos Zevenhoven + http://twitter.com/k7hoven + -------------- next part -------------- An HTML attachment was scrubbed... URL: From yselivanov.ml at gmail.com Thu Oct 12 14:02:58 2017 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Thu, 12 Oct 2017 14:02:58 -0400 Subject: [Python-ideas] PEP draft: context variables In-Reply-To: References: Message-ID: On Thu, Oct 12, 2017 at 8:59 AM, Koos Zevenhoven wrote: > On Thu, Oct 12, 2017 at 6:54 AM, Nick Coghlan wrote: [..] >> 1. Preserve the current behaviour, since we don't have a compelling reason >> to change its semantics >> 2. Change the behaviour, in order to gain >> > > 3. Introduce a new context manager that behaves intuitively. My guess is > that the two context managers could even be made to interact with each other > in a fairly reasonable manner, even if you nest them in different orders. > I'm not sure how necessary that is. Note that this is an independent argument w.r.t. both PEPs. PEP 550 does not propose to change existing decimal APIs. It merely uses decimal to illustrate the problem, and suggests a fix using the new APIs. Although it is true that I plan to propose to use PEP 550 to reimplement decimal APIs on top of it, and so far I haven't seen any real-world examples of code that will be broken because of that. As far as I know?and I've done some research?nobody uses decimal contexts and generators because of the associated problems. It's a chicken and egg problem. Yury From k7hoven at gmail.com Thu Oct 12 17:48:20 2017 From: k7hoven at gmail.com (Koos Zevenhoven) Date: Fri, 13 Oct 2017 00:48:20 +0300 Subject: [Python-ideas] PEP draft: context variables In-Reply-To: References: Message-ID: On Oct 12, 2017 9:03 PM, "Yury Selivanov" wrote: On Thu, Oct 12, 2017 at 8:59 AM, Koos Zevenhoven wrote: > On Thu, Oct 12, 2017 at 6:54 AM, Nick Coghlan wrote: [..] >> 1. Preserve the current behaviour, since we don't have a compelling reason >> to change its semantics >> 2. Change the behaviour, in order to gain >> > > 3. Introduce a new context manager that behaves intuitively. My guess is > that the two context managers could even be made to interact with each other > in a fairly reasonable manner, even if you nest them in different orders. > I'm not sure how necessary that is. Note that this is an independent argument w.r.t. both PEPs. 
PEP 550 does not propose to change existing decimal APIs. It merely uses decimal to illustrate the problem, and suggests a fix using the new APIs. Of course this particular point is independent. But not all the other points are. Although it is true that I plan to propose to use PEP 550 to reimplement decimal APIs on top of it, and so far I haven't seen any real-world examples of code that will be broken because of that. As far as I know?and I've done some research?nobody uses decimal contexts and generators because of the associated problems. It's a chicken and egg problem. I've been inclined to think so too. But that kind of research would be useful for decimal if?and only if?you share your methodology. It's not at all clear how one would do research to arrive at such a conclusion. ?Koos (mobile) -------------- next part -------------- An HTML attachment was scrubbed... URL: From yselivanov.ml at gmail.com Thu Oct 12 18:13:19 2017 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Thu, 12 Oct 2017 18:13:19 -0400 Subject: [Python-ideas] PEP draft: context variables In-Reply-To: References: Message-ID: > Although it is true that I plan to propose to use PEP 550 to > reimplement decimal APIs on top of it, and so far I haven't seen any > real-world examples of code that will be broken because of that. As > far as I know?and I've done some research?nobody uses decimal contexts > > and generators because of the associated problems. It's a chicken and > egg problem. > > > I've been inclined to think so too. But that kind of research would be > useful for decimal if?and only if?you share your methodology. It's not at > all clear how one would do research to arrive at such a conclusion. Specifically for decimal: I tried to find bug reports on bugs.python.org (found not even one), questions on stackoverflow (IIRC I didn't find anything), and used github code search and google (again, nothing directly relevant). Yury From stefan at bytereef.org Thu Oct 12 19:07:28 2017 From: stefan at bytereef.org (Stefan Krah) Date: Fri, 13 Oct 2017 01:07:28 +0200 Subject: [Python-ideas] PEP draft: context variables Message-ID: <20171012230728.GA3613@bytereef.org> Yury Selivanov wrote: [generators and decimal] > Specifically for decimal: I tried to find bug reports on > bugs.python.org (found not even one), questions on stackoverflow (IIRC > I didn't find anything), and used github code search and google > (again, nothing directly relevant). I also don't recall any single feature request to change the tls-context either for generators or for coroutines. Generators can quickly turn into a labyrinth, and that's not good for numerical code, which already has enough of its own complexities. So the decimal examples can be helpful for understanding, but (except for the performance issues) shouldn't be the centerpiece of the discussion. Speaking of performance, I have seen that adressed in Koos' PEP at all. Perhaps I missed something. Stefan Krah From guido at python.org Thu Oct 12 20:56:24 2017 From: guido at python.org (Guido van Rossum) Date: Thu, 12 Oct 2017 17:56:24 -0700 Subject: [Python-ideas] PEP draft: context variables In-Reply-To: References: Message-ID: I'm out of energy to debate every point (Steve said it well -- that decimal/generator example is too contrived), but I found one nit in Nick's email that I wish to correct. On Wed, Oct 11, 2017 at 1:28 AM, Nick Coghlan wrote: > > As a less-contrived example, consider context managers implemented as > generators. 
> > We want those to run with the execution context that's active when they're > used in a with statement, not the one that's active when they're created > (the fact that generator-based context managers can only be used once > mitigates the risk of creation time context capture causing problems, but > the implications would still be weird enough to be worth avoiding). > Here I think we're in agreement about the desired semantics, but IMO all this requires is some special casing for @contextlib.contextmanager. To me this is the exception, not the rule -- in most *other* places I would want the yield to switch away from the caller's context. > For native coroutines, we want them to run with the execution context > that's active when they're awaited or when they're prepared for submission > to an event loop, not the one that's active when they're created. > This caught my eye as wrong. Considering that asyncio's tasks (as well as curio's and trio's) *are* native coroutines, we want complete isolation between the context active when `await` is called and the context active inside the `async def` function. -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Fri Oct 13 03:25:55 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 13 Oct 2017 17:25:55 +1000 Subject: [Python-ideas] PEP draft: context variables In-Reply-To: References: Message-ID: On 13 October 2017 at 10:56, Guido van Rossum wrote: > I'm out of energy to debate every point (Steve said it well -- that > decimal/generator example is too contrived), but I found one nit in Nick's > email that I wish to correct. > > On Wed, Oct 11, 2017 at 1:28 AM, Nick Coghlan wrote: >> >> As a less-contrived example, consider context managers implemented as >> generators. >> >> We want those to run with the execution context that's active when >> they're used in a with statement, not the one that's active when they're >> created (the fact that generator-based context managers can only be used >> once mitigates the risk of creation time context capture causing problems, >> but the implications would still be weird enough to be worth avoiding). >> > > Here I think we're in agreement about the desired semantics, but IMO all > this requires is some special casing for @contextlib.contextmanager. To me > this is the exception, not the rule -- in most *other* places I would want > the yield to switch away from the caller's context. > > >> For native coroutines, we want them to run with the execution context >> that's active when they're awaited or when they're prepared for submission >> to an event loop, not the one that's active when they're created. >> > > This caught my eye as wrong. Considering that asyncio's tasks (as well as > curio's and trio's) *are* native coroutines, we want complete isolation > between the context active when `await` is called and the context active > inside the `async def` function. 
> The rationale for this behaviour *does* arise from a refactoring argument: async def original_async_function(): with some_context(): do_some_setup() raw_data = await some_operation() data = do_some_postprocessing(raw_data) Refactored: async def async_helper_function(): do_some_setup() raw_data = await some_operation() return do_some_postprocessing(raw_data) async def refactored_async_function(): with some_context(): data = await async_helper_function() However, considering that coroutines are almost always instantiated at the point where they're awaited, I do concede that creation time context capture would likely also work out OK for the coroutine case, which would leave contextlib.contextmanager as the only special case (and it would turn off both creation-time context capture *and* context isolation). Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia -------------- next part -------------- An HTML attachment was scrubbed... URL: From victor.stinner at gmail.com Fri Oct 13 10:12:39 2017 From: victor.stinner at gmail.com (Victor Stinner) Date: Fri, 13 Oct 2017 16:12:39 +0200 Subject: [Python-ideas] Add time.time_ns(): system clock with nanosecond resolution Message-ID: Hi, I would like to add new functions to return time as a number of nanosecond (Python int), especially time.time_ns(). It would enhance the time.time() clock resolution. In my experience, it decreases the minimum non-zero delta between two clock by 3 times, new "ns" clock versus current clock: 84 ns (2.8x better) vs 239 ns on Linux, and 318 us (2.8x better) vs 894 us on Windows, measured in Python. The question of this long email is if it's worth it to add more "_ns" time functions than just time.time_ns()? I would like to add: * time.time_ns() * time.monotonic_ns() * time.perf_counter_ns() * time.clock_gettime_ns() * time.clock_settime_ns() time(), monotonic() and perf_counter() clocks are the 3 most common clocks and users use them to get the best available clock resolution. clock_gettime/settime() are the generic UNIX API to access these clocks and so should also be enhanced to get nanosecond resolution. == Nanosecond resolution == More and more clocks have a frequency in MHz, up to GHz for the "TSC" CPU clock, and so the clocks resolution is getting closer to 1 nanosecond (or even better than 1 ns for the TSC clock!). The problem is that Python returns time as a floatting point number which is usually a 64-bit binary floatting number (in the IEEE 754 format). This type starts to loose nanoseconds after 104 days. Conversion from nanoseconds (int) to seconds (float) and then back to nanoseconds (int) to check if conversions loose precision: # no precision loss >>> x=2**52+1; int(float(x * 1e-9) * 1e9) - x 0 # precision loss! (1 nanosecond) >>> x=2**53+1; int(float(x * 1e-9) * 1e9) - x -1 >>> print(datetime.timedelta(seconds=2**53 / 1e9)) 104 days, 5:59:59.254741 While a system administrator can be proud to have an uptime longer than 104 days, the problem also exists for the time.time() clock which returns the number of seconds since the UNIX epoch (1970-01-01). 
This clock has been losing nanoseconds since mid-May 1970 (47 years ago):

>>> import datetime
>>> print(datetime.datetime(1970, 1, 1) + datetime.timedelta(seconds=2**53 / 1e9))
1970-04-15 05:59:59.254741

== PEP 410 ==

Five years ago, I proposed a large and complex change in all Python functions returning time to support nanosecond resolution using the decimal.Decimal type:

   https://www.python.org/dev/peps/pep-0410/

The PEP was rejected for different reasons:

* it wasn't clear if hardware clocks really had a resolution of 1 nanosecond, especially when the clock is read from Python, since reading a clock in Python also takes time...

* Guido van Rossum rejected the idea of adding a new optional parameter to change the result type: it's an uncommon programming practice (bad design in Python)

* decimal.Decimal is not widely used, so it would be surprising to get such a type

== CPython enhancements of the last 5 years ==

Since this PEP was rejected:

* the os.stat_result got 3 fields for timestamps as nanoseconds (Python int): st_atime_ns, st_ctime_ns, st_mtime_ns

* Python 3.3 got 3 new clocks: time.monotonic(), time.perf_counter() and time.process_time()

* I enhanced the private C API of Python handling time (API called "pytime") to store all timings as the new _PyTime_t type, which is a simple 64-bit signed integer. The unit of _PyTime_t is not part of the API, it's an implementation detail. The unit is currently 1 nanosecond.

This week, I converted one of the last clocks to the new _PyTime_t format: time.perf_counter() now internally has a resolution of 1 nanosecond, instead of using the C double type.

XXX technically https://github.com/python/cpython/pull/3983 is not merged yet :-)

== Clocks resolution in Python ==

I implemented time.time_ns(), time.monotonic_ns() and time.perf_counter_ns(), which are similar to the functions without the "_ns" suffix, but return time as nanoseconds (Python int).

I computed the smallest difference between two clock reads (ignoring differences of zero):

Linux:

* time_ns(): 84 ns <=== !!!
* time(): 239 ns <=== !!!
* perf_counter_ns(): 84 ns
* perf_counter(): 82 ns
* monotonic_ns(): 84 ns
* monotonic(): 81 ns

Windows:

* time_ns(): 318000 ns <=== !!!
* time(): 894070 ns <=== !!!
* perf_counter_ns(): 100 ns
* perf_counter(): 100 ns
* monotonic_ns(): 15000000 ns
* monotonic(): 15000000 ns

The difference on time.time() is significant: 84 ns (2.8x better) vs 239 ns on Linux and 318 us (2.8x better) vs 894 us on Windows. The difference will only grow in the coming years, since every day adds 86,400,000,000,000 nanoseconds to the system clock :-) (please don't bug me with leap seconds! you got my point)

The differences on the perf_counter and monotonic clocks are not visible in this quick script since my script runs for less than 1 minute, my computer uptime is shorter than 1 week, ... and Python internally starts these clocks at zero *to reduce the precision loss*! Using an uptime larger than 104 days, you would probably see a significant difference (at least +/- 1 nanosecond) between the regular (seconds as double) and the "_ns" (nanoseconds as int) clocks.

== How many new nanosecond clocks?
== The PEP 410 proposed to modify the following functions: * os module: fstat(), fstatat(), lstat(), stat() (st_atime, st_ctime and st_mtime fields of the stat structure), sched_rr_get_interval(), times(), wait3() and wait4() * resource module: ru_utime and ru_stime fields of getrusage() * signal module: getitimer(), setitimer() * time module: clock(), clock_gettime(), clock_getres(), monotonic(), time() and wallclock() ("wallclock()" was finally called "monotonic", see PEP 418) According to my tests of the previous section, the precision loss starts after 104 days (stored in nanoseconds). I don't know if it's worth it to modify functions which return "CPU time" or "process time" of processes, since most processes live shorter than 104 days. Do you care of a resolution of 1 nanosecond for the CPU and process time? Maybe we need 1 nanosecond resolution for profiling and benchmarks. But in that case, you might want to implement your profiler in C rather in Python, like the hotshot module, no? The "pytime" private API of CPython gives you clocks with a resolution of 1 nanosecond. == Annex: clock performance == To have an idea of the cost of reading the clock on the clock resolution in Python, I also ran a microbenchmark on *reading* a clock. Example: $ ./python -m perf timeit --duplicate 1024 -s 'import time; t=time.time' 't()' Linux (Mean +- std dev): * time.time(): 45.4 ns +- 0.5 ns * time.time_ns(): 47.8 ns +- 0.8 ns * time.perf_counter(): 46.8 ns +- 0.7 ns * time.perf_counter_ns(): 46.0 ns +- 0.6 ns Windows (Mean +- std dev): * time.time(): 42.2 ns +- 0.8 ns * time.time_ns(): 49.3 ns +- 0.8 ns * time.perf_counter(): 136 ns +- 2 ns <=== * time.perf_counter_ns(): 143 ns +- 4 ns <=== * time.monotonic(): 38.3 ns +- 0.9 ns * time.monotonic_ns(): 48.8 ns +- 1.2 ns Most clocks have the same performance except of perf_counter on Windows: around 140 ns whereas other clocks are around 45 ns (on Linux and Windows): 3x slower. Maybe the "bad" perf_counter performance can be explained by the fact that I'm running Windows in a VM, which is not ideal for benchmarking. Or maybe my C implementation of time.perf_counter() is slow? Note: I expect that a significant part of the numbers are the cost of Python function calls. Reading these clocks using the Python C functions are likely faster. Victor From stefan_ml at behnel.de Fri Oct 13 10:57:28 2017 From: stefan_ml at behnel.de (Stefan Behnel) Date: Fri, 13 Oct 2017 16:57:28 +0200 Subject: [Python-ideas] Add time.time_ns(): system clock with nanosecond resolution In-Reply-To: References: Message-ID: Victor Stinner schrieb am 13.10.2017 um 16:12: > I would like to add new functions to return time as a number of > nanosecond (Python int), especially time.time_ns(). I might have missed it while skipping through your post, but could you quickly explain why improving the precision of time.time() itself wouldn't help already? Would double FP precision not be accurate enough here? Stefan From victor.stinner at gmail.com Fri Oct 13 11:02:13 2017 From: victor.stinner at gmail.com (Victor Stinner) Date: Fri, 13 Oct 2017 17:02:13 +0200 Subject: [Python-ideas] Add time.time_ns(): system clock with nanosecond resolution In-Reply-To: References: Message-ID: 2017-10-13 16:57 GMT+02:00 Stefan Behnel : > I might have missed it while skipping through your post, but could you > quickly explain why improving the precision of time.time() itself wouldn't > help already? Would double FP precision not be accurate enough here? 80-bit binary float ("long double") is not portable. 
Since SSE, Intel CPU don't use them anymore, no? Modifying the Python float type would be a large change. Victor From solipsis at pitrou.net Fri Oct 13 11:03:40 2017 From: solipsis at pitrou.net (Antoine Pitrou) Date: Fri, 13 Oct 2017 17:03:40 +0200 Subject: [Python-ideas] Add time.time_ns(): system clock with nanosecond resolution References: Message-ID: <20171013170340.0ec38a32@fsol> On Fri, 13 Oct 2017 16:57:28 +0200 Stefan Behnel wrote: > Victor Stinner schrieb am 13.10.2017 um 16:12: > > I would like to add new functions to return time as a number of > > nanosecond (Python int), especially time.time_ns(). > > I might have missed it while skipping through your post, but could you > quickly explain why improving the precision of time.time() itself wouldn't > help already? Would double FP precision not be accurate enough here? To quote Victor's message: ? The problem is that Python returns time as a floatting point number which is usually a 64-bit binary floatting number (in the IEEE 754 format). This type starts to loose nanoseconds after 104 days. ? Regards Antoine. From yselivanov.ml at gmail.com Fri Oct 13 11:41:28 2017 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Fri, 13 Oct 2017 11:41:28 -0400 Subject: [Python-ideas] PEP draft: context variables In-Reply-To: References: Message-ID: On Fri, Oct 13, 2017 at 3:25 AM, Nick Coghlan wrote: [..] > The rationale for this behaviour *does* arise from a refactoring argument: > > async def original_async_function(): > with some_context(): > do_some_setup() > raw_data = await some_operation() > data = do_some_postprocessing(raw_data) > > Refactored: > > async def async_helper_function(): > do_some_setup() > raw_data = await some_operation() > return do_some_postprocessing(raw_data) > > async def refactored_async_function(): > with some_context(): > data = await async_helper_function() > > However, considering that coroutines are almost always instantiated at the > point where they're awaited, "almost always" is an incorrect assumption. "usually" would be the correct one. > I do concede that creation time context capture > would likely also work out OK for the coroutine case, which would leave > contextlib.contextmanager as the only special case (and it would turn off > both creation-time context capture *and* context isolation). I still believe that both versions of PEP 550 (v1 & latest) got this right: * Coroutines on their own don't capture context; * Tasks manage context for coroutines they wrap. Yury From k7hoven at gmail.com Fri Oct 13 11:49:59 2017 From: k7hoven at gmail.com (Koos Zevenhoven) Date: Fri, 13 Oct 2017 18:49:59 +0300 Subject: [Python-ideas] (PEP 555 subtopic) Propagation of context in async code Message-ID: This is a continuation of the PEP 555 discussion in https://mail.python.org/pipermail/python-ideas/2017-September/046916.html And this month in https://mail.python.org/pipermail/python-ideas/2017-October/047279.html If you are new to the discussion, the best point to start reading this might be at my second full paragraph below ("The status quo..."). On Fri, Oct 13, 2017 at 10:25 AM, Nick Coghlan wrote: > On 13 October 2017 at 10:56, Guido van Rossum wrote: > >> I'm out of energy to debate every point (Steve said it well -- that >> decimal/generator example is too contrived), but I found one nit in Nick's >> email that I wish to correct. >> >> On Wed, Oct 11, 2017 at 1:28 AM, Nick Coghlan wrote: >>> >>> As a less-contrived example, consider context managers implemented as >>> generators. 
>>> >>> We want those to run with the execution context that's active when >>> they're used in a with statement, not the one that's active when they're >>> created (the fact that generator-based context managers can only be used >>> once mitigates the risk of creation time context capture causing problems, >>> but the implications would still be weird enough to be worth avoiding). >>> >> >> Here I think we're in agreement about the desired semantics, but IMO all >> this requires is some special casing for @contextlib.contextmanager. To me >> this is the exception, not the rule -- in most *other* places I would want >> the yield to switch away from the caller's context. >> >> >>> For native coroutines, we want them to run with the execution context >>> that's active when they're awaited or when they're prepared for submission >>> to an event loop, not the one that's active when they're created. >>> >> >> This caught my eye as wrong. Considering that asyncio's tasks (as well as >> curio's and trio's) *are* native coroutines, we want complete isolation >> between the context active when `await` is called and the context active >> inside the `async def` function. >> > > The rationale for this behaviour *does* arise from a refactoring argument: > > async def original_async_function(): > with some_context(): > do_some_setup() > raw_data = await some_operation() > data = do_some_postprocessing(raw_data) > > Refactored: > > async def async_helper_function(): > do_some_setup() > raw_data = await some_operation() > return do_some_postprocessing(raw_data) > > async def refactored_async_function(): > with some_context(): > data = await async_helper_function() > > ?*This* type of refactoring argument I *do* subscribe to.? > However, considering that coroutines are almost always instantiated at the > point where they're awaited, I do concede that creation time context > capture would likely also work out OK for the coroutine case, which would > leave contextlib.contextmanager as the only special case (and it would turn > off both creation-time context capture *and* context isolation). > ?The difference between context propagation through coroutine function calls and awaits comes up when you need help from "the" event loop, which means things like creating new tasks from coroutines. However, we cannot even assume that the loop is the only one. So far, it makes no difference where you call the coroutine function. It is only when you await it or schedule it for execution in a loop when something can actually happen. The status quo is that there's nothing that prevents you from calling a coroutine function from within one event loop and then awaiting it in another. So if we want an event loop to be able to pass information down the call chain in such a way that the information is available *throughout the whole task that it is driving*, then the contexts needs to a least propagate through `await`s. This was my starting point 2.5 years ago, when Yury was drafting this status quo (PEP 492). It looked a lot of PEP 492 was inevitable, but that there will be a problem, where each API that uses "blocking IO" somewhere under the hood would need a duplicate version for asyncio (and one for each third-party async framework!). 
I felt it was necessary to think about a solution before PEP 492 is accepted, and this became a fairly short-lived thread here on python-ideas: https://mail.python.org/pipermail/python-ideas/2015-May/033267.html ?This year, the discussion on Yury's PEP 550 somehow ended up with a very similar need before I got involved, apparently for independent reasons. A design for solving this need (and others) is also found in my first draft of PEP 555, found at https://mail.python.org/pipermail/python-ideas/2017-September/046916.html Essentially, it's a way of *passing information down the call chain* when it's inconvenient or impossible to pass the information as normal function arguments. I now call the concept "context arguments". ?More recently, I put some focus on the direct needs of normal users (as opposed direct needs of async framework authors). Those thoughts are most "easily" discussed in terms of generator functions, which are very similar to coroutine functions: A generator function is often thought of as a function that returns an iterable of lazily evaluated values. In this type of usage, the relevant "function call" happens when calling the generator function. The subsequent calls to next() (or a yield from) are thought of as merely getting the items in the iterable, even if they do actually run code in the generator's frame. The discussion on this is found starting from this email: https://mail.python.org/pipermail/python-ideas/2017-October/047279.html However, also coroutines are evaluated lazily. The question is, when should we consider the "subroutine call" to happen: when the coroutine function is called, or when the resulting object is awaited. Often these two are indeed on the same line of code, so it does not matter. But as I discuss above, there are definitely cases where it matters. This has mostly to do with the interactions of different tasks within one event loop, or code where multiple event loops interact. As mentioned above, there are cases where propagating the context through next() and await is desirable. However, there are also cases where the coroutine call is important. This comes up in the case of multiple interacting tasks or multiple event loops. To start with, probably a more example-friendly case, however, is running an event loop and a coroutine from synchronous code: import asyncio async def do_something_context_aware(): do_something_that_depends_on(current_context()) loop = asyncio.get_event_loop() with some_context(): coro = do_something_context_aware() loop.run_until_complete(coro) ? ? ? ? Now, if the coroutine function call `do_something_context_aware()` does not save the current context on `coro`, then there is no way some_context() can affect the code that will run inside the coroutine, even if that is what we are explicitly trying to do here. ?The easy solution is to delegate the context transfer to the scheduling function (run_until_complete), and require that the context is passed to that function: with some_context?(): ? coro = do_something_context_aware() ? loop.run_until_complete(coro)? ?This gives the async framework (here asyncio) a chance to make sure the context propagates as expected. In general, I'm in favor of giving async frameworks some freedom in how this is implemented. However, to give the framework even more freedom, the coroutine call, do_something_context_aware(), could save the current context branch on `coro`, which run_until_complete can attach to the Task that gets created. 
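A hedged sketch of the simpler of those two options, delegating the transfer to the scheduling call: snapshot the caller's context and resume every step of the coroutine inside that snapshot. It is spelled with the contextvars API (copy_context() and Context.run()) that was eventually standardized; the helper name is illustrative rather than an asyncio API, and this toy driver assumes the coroutine never actually suspends, which holds for do_something_context_aware() above:

    import contextvars

    def run_in_scheduling_context(coro):
        snapshot = contextvars.copy_context()   # captured where scheduling happens
        try:
            while True:
                snapshot.run(coro.send, None)   # each step sees the snapshot
        except StopIteration as exc:
            return exc.value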
The bigger question is, what should happen when a coroutine awaits on another coroutine directly, without giving the framework a change to interfere: async def inner(): do_context_aware_stuff() async def outer(): with first_context(): coro = inner() with second_context(): await coro The big question is: ?In the above, which context should the coroutine be run in? "The" event loop does not have a chance to interfere, so we cannot delegate the decision. ?We need both versions: the one that propagates first_context() into the coroutine, and the one that propagates second_context() into it. Or, using my metaphor from the other thread, we need "both the forest and the trees". ? ?A solution to this would be to have two types of context arguments: 1. (calling) context arguments? and 2. execution context arguments ?Both of these would have their own? stack of (argument, value) assignment pairs, explained in the implementation part of the first PEP 555 draft. While this is a complication, the performance overhead of these is so small, that doubling the overhead should not be a performance concern. The question is, do we want these two types of stacks, or do we want to work around it somehow, for instance using context-local storage, implemented on top of the first kind, to implement something like the second kind. However, that again raises some issues of how to propagate the context-local storage down the ambiguous call chain. ???Koos? -- + Koos Zevenhoven + http://twitter.com/k7hoven + -------------- next part -------------- An HTML attachment was scrubbed... URL: From yselivanov.ml at gmail.com Fri Oct 13 12:38:41 2017 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Fri, 13 Oct 2017 12:38:41 -0400 Subject: [Python-ideas] (PEP 555 subtopic) Propagation of context in async code In-Reply-To: References: Message-ID: On Fri, Oct 13, 2017 at 11:49 AM, Koos Zevenhoven wrote: [..] > This was my starting point 2.5 years ago, when Yury was drafting this status > quo (PEP 492). It looked a lot of PEP 492 was inevitable, but that there > will be a problem, where each API that uses "blocking IO" somewhere under > the hood would need a duplicate version for asyncio (and one for each > third-party async framework!). I felt it was necessary to think about a > solution before PEP 492 is accepted, and this became a fairly short-lived > thread here on python-ideas: Well, it's obvious why the thread was "short-lived". Don't mix non-blocking and blocking code and don't nest asyncio loops. But I believe this new subtopic is a distraction. You should start a new thread on Python-ideas if you want to discuss the acceptance of PEP 492 2.5 years ago. [..] > The bigger question is, what should happen when a coroutine awaits on > another coroutine directly, without giving the framework a change to > interfere: > > > async def inner(): > do_context_aware_stuff() > > async def outer(): > with first_context(): > coro = inner() > > with second_context(): > await coro > > The big question is: In the above, which context should the coroutine be run > in? The real big question is how people usually write code. And the answer is that they *don't write it like that* at all. Many context managers in many frameworks (aiohttp, tornado, and even asyncio) require you to wrap your await expressions in them. Not coroutine instantiation. A more important point is that existing context solutions for async frameworks can only support a with statement around an await expression. 
And people that use such solutions know that 'with ...: coro = inner()' isn't going to work at all. Therefore wrapping coroutine instantiation in a 'with' statement is not a pattern. It can only become a pattern, if whatever execution context PEP accepted in Python 3.7 encouraged people to use it. [..] > Both of these would have their own stack of (argument, value) assignment > pairs, explained in the implementation part of the first PEP 555 draft. > While this is a complication, the performance overhead of these is so small, > that doubling the overhead should not be a performance concern. Please stop handwaving performance. Using big O notation: PEP 555, worst complexity for uncached lookup: O(N), where 'N' is the total number of all context values for all context keys for the current frame stack. For a recursive function you can easily have a situation where cache is invalidated often, and code starts to run slower and slower. PEP 550 v1, worst complexity for uncached lookup: O(1), see [1]. PEP 550 v2+, worst complexity for uncached lookup: O(k), where 'k' is the number of nested generators for the current frame. Usually k=1. While caching will mitigate PEP 555' bad performance characteristics in *tight loops*, the performance of uncached path must not be ignored. Yury [1] https://www.python.org/dev/peps/pep-0550/#appendix-hamt-performance-analysis From yselivanov.ml at gmail.com Fri Oct 13 12:41:47 2017 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Fri, 13 Oct 2017 12:41:47 -0400 Subject: [Python-ideas] PEP draft: context variables In-Reply-To: References: Message-ID: On Fri, Oct 13, 2017 at 3:25 AM, Nick Coghlan wrote: [..] > However, considering that coroutines are almost always instantiated at the > point where they're awaited, I do concede that creation time context capture > would likely also work out OK for the coroutine case, which would leave > contextlib.contextmanager as the only special case (and it would turn off > both creation-time context capture *and* context isolation). Actually, capturing context at the moment of coroutine creation (in PEP 550 v1 semantics) will not work at all. Async context managers will break. class AC: async def __aenter__(self): pass ^ If the context is captured when coroutines are instantiated, __aenter__ won't be able to set context variables and thus affect the code it wraps. That's why coroutines shouldn't capture context when created, nor they should isolate context. It's a job of async Task. Yury From steve.dower at python.org Fri Oct 13 12:48:55 2017 From: steve.dower at python.org (Steve Dower) Date: Fri, 13 Oct 2017 09:48:55 -0700 Subject: [Python-ideas] PEP draft: context variables In-Reply-To: References: Message-ID: On 13Oct2017 0941, Yury Selivanov wrote: > On Fri, Oct 13, 2017 at 3:25 AM, Nick Coghlan wrote: > [..] >> However, considering that coroutines are almost always instantiated at the >> point where they're awaited, I do concede that creation time context capture >> would likely also work out OK for the coroutine case, which would leave >> contextlib.contextmanager as the only special case (and it would turn off >> both creation-time context capture *and* context isolation). > > Actually, capturing context at the moment of coroutine creation (in > PEP 550 v1 semantics) will not work at all. Async context managers > will break. 
> > class AC: > async def __aenter__(self): > pass > > ^ If the context is captured when coroutines are instantiated, > __aenter__ won't be able to set context variables and thus affect the > code it wraps. That's why coroutines shouldn't capture context when > created, nor they should isolate context. It's a job of async Task. Then make __aenter__/__aexit__ when called by "async with" an exception to the normal semantics? It seems simpler to have one specially named and specially called function be special, rather than make the semantics more complicated for all functions. Cheers, Steve From k7hoven at gmail.com Fri Oct 13 13:08:12 2017 From: k7hoven at gmail.com (Koos Zevenhoven) Date: Fri, 13 Oct 2017 20:08:12 +0300 Subject: [Python-ideas] PEP draft: context variables In-Reply-To: <20171012230728.GA3613@bytereef.org> References: <20171012230728.GA3613@bytereef.org> Message-ID: On Fri, Oct 13, 2017 at 2:07 AM, Stefan Krah wrote: [..] > So the decimal examples can be helpful for understanding, but (except > for the performance issues) shouldn't be the centerpiece of the > discussion. > > > Speaking of performance, I have seen that adressed in Koos' PEP at all. > Perhaps I missed something. > ?Indeed I do mention performance here and there in the PEP 555 draft. Lookups can be made fast and O(1) in most cases. Even with the simplest unoptimized implementation, the worst-case lookup complexity would be O(n), where n is the number of assignment contexts entered after the one which is being looked up from (or in other words, nested inside the one that is being looked up from). This means that for use cases where the relevant context is entered as the innermost context level, the lookups are O(1) even without any optimizations. It is perfectly reasonable to make an implementation where lookups are *always* O(1). Still, it might make more sense to implement a half-way solution with "often O(1)", because that has somewhat less overhead in case the callees end up not doing any lookups. For synchronous code that does not use context arguments and that does not involve generators, there is absolutely *zero* overhead. For code that uses generators, but does not use context arguments, there is virtually no overhead either. I explain this in terms of C code in https://mail.python.org/pipermail/python-ideas/2017-October/047292.html In fact, I might want to add a another Py_INCREF and Py_DECREF per each call to next/send, because the hack to defer (and often avoid) the Py_INCREF of the outer stack would not be worth the performance gain. But that's it. ??Koos -- + Koos Zevenhoven + http://twitter.com/k7hoven + -------------- next part -------------- An HTML attachment was scrubbed... URL: From ethan at stoneleaf.us Fri Oct 13 13:45:35 2017 From: ethan at stoneleaf.us (Ethan Furman) Date: Fri, 13 Oct 2017 10:45:35 -0700 Subject: [Python-ideas] PEP draft: context variables In-Reply-To: References: Message-ID: <59E0FBBF.3010402@stoneleaf.us> On 10/13/2017 09:48 AM, Steve Dower wrote: > On 13Oct2017 0941, Yury Selivanov wrote: >> Actually, capturing context at the moment of coroutine creation (in >> PEP 550 v1 semantics) will not work at all. Async context managers >> will break. >> >> class AC: >> async def __aenter__(self): >> pass >> >> ^ If the context is captured when coroutines are instantiated, >> __aenter__ won't be able to set context variables and thus affect the >> code it wraps. That's why coroutines shouldn't capture context when >> created, nor they should isolate context. 
It's a job of async Task. > > Then make __aenter__/__aexit__ when called by "async with" an exception to the normal semantics? > > It seems simpler to have one specially named and specially called function be special, rather than make the semantics > more complicated for all functions. +1. I think that would make it much more usable by those of us who are not experts. -- ~Ethan~ From k7hoven at gmail.com Fri Oct 13 13:46:34 2017 From: k7hoven at gmail.com (Koos Zevenhoven) Date: Fri, 13 Oct 2017 20:46:34 +0300 Subject: [Python-ideas] (PEP 555 subtopic) Propagation of context in async code In-Reply-To: References: Message-ID: On Fri, Oct 13, 2017 at 7:38 PM, Yury Selivanov wrote: > On Fri, Oct 13, 2017 at 11:49 AM, Koos Zevenhoven > wrote: > [..] > > This was my starting point 2.5 years ago, when Yury was drafting this > status > > quo (PEP 492). It looked a lot of PEP 492 was inevitable, but that there > > will be a problem, where each API that uses "blocking IO" somewhere under > > the hood would need a duplicate version for asyncio (and one for each > > third-party async framework!). I felt it was necessary to think about a > > solution before PEP 492 is accepted, and this became a fairly short-lived > > thread here on python-ideas: > > Well, it's obvious why the thread was "short-lived". Don't mix > non-blocking and blocking code and don't nest asyncio loops. But I > believe this new subtopic is a distraction. ?Nesting is not the only way to have interaction between two event loops.? ? But whenever anyone *does* want to nest two loops, they are perhaps more likely to be loops of different frameworks.? ?You believe that the semantics in async code is a distraction? > You should start a new > thread on Python-ideas if you want to discuss the acceptance of PEP > 492 2.5 years ago. > I ?'m definitely not interested in discussing the acceptance of PEP 492. ? > > [..] > > The bigger question is, what should happen when a coroutine awaits on > > another coroutine directly, without giving the framework a change to > > interfere: > > > > > > async def inner(): > > do_context_aware_stuff() > > > > async def outer(): > > with first_context(): > > coro = inner() > > > > with second_context(): > > await coro > > > > The big question is: In the above, which context should the coroutine be > run > > in? > > The real big question is how people usually write code. And the > answer is that they *don't write it like that* at all. Many context > managers in many frameworks (aiohttp, tornado, and even asyncio) > require you to wrap your await expressions in them. Not coroutine > instantiation. > ?You know very well that I've been talking about how people usually write code etc. But we still need to handle the corner cases too.? > > A more important point is that existing context solutions for async > frameworks can only support a with statement around an await > expression. And people that use such solutions know that 'with ...: > coro = inner()' isn't going to work at all. > > Therefore wrapping coroutine instantiation in a 'with' statement is > not a pattern. It can only become a pattern, if whatever execution > context PEP accepted in Python 3.7 encouraged people to use it. > > ?The code is to illustrate semantics, not an example of real code. The point is to highlight that the context has changed between when the coroutine function was called and when it is awaited. That's certainly a thing that can happen in real code, even if it is not the most typical case. I do mention this in my previous email. 
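One way to see the same question with code that runs today is to use decimal's arithmetic context, which is looked up dynamically when an operation actually runs, and a plain generator in place of the coroutine; the names are only for illustration:

import decimal

def divide():
    # The division runs when next() is called, not when divide() is called,
    # so it sees whatever decimal context is active at that point.
    yield decimal.Decimal(1) / decimal.Decimal(7)

with decimal.localcontext() as ctx:
    ctx.prec = 5
    gen = divide()       # "first context": the generator object is created here

with decimal.localcontext() as ctx:
    ctx.prec = 30
    print(next(gen))     # "second context": prints 30 significant digits

The open question for coroutines is which of those two moments await should honour, and whether the answer ought to differ from the generator case.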
> [..] > > Both of these would have their own stack of (argument, value) assignment > > pairs, explained in the implementation part of the first PEP 555 draft. > > While this is a complication, the performance overhead of these is so > small, > > that doubling the overhead should not be a performance concern. > > Please stop handwaving performance. Using big O notation: > > ?There is discussion on perfomance elsewhere, now also in this other subthread: https://mail.python.org/pipermail/python-ideas/2017-October/047327.html PEP 555, worst complexity for uncached lookup: O(N), where 'N' is the > total number of all context values for all context keys for the > current frame stack. ?Not true. See the above link. Lookups are fast (*and* O(1), if we want them to be). ?PEP 555 stacks are independent of frames, BTW.? > For a recursive function you can easily have a > situation where cache is invalidated often, and code starts to run > slower and slower. > ?Not true either. The lookups are O(1) in a recursive function, with and without nested contexts.? ?I started this thread for discussion about semantics in an async context. Stefan asked about performance in the other thread, so I posted there. ??Koos ? > PEP 550 v1, worst complexity for uncached lookup: O(1), see [1]. > > PEP 550 v2+, worst complexity for uncached lookup: O(k), where 'k' is > the number of nested generators for the current frame. Usually k=1. > > While caching will mitigate PEP 555' bad performance characteristics > in *tight loops*, the performance of uncached path must not be > ignored. > > Yury > > [1] https://www.python.org/dev/peps/pep-0550/#appendix-hamt- > performance-analysis > -- + Koos Zevenhoven + http://twitter.com/k7hoven + -------------- next part -------------- An HTML attachment was scrubbed... URL: From k7hoven at gmail.com Fri Oct 13 13:55:51 2017 From: k7hoven at gmail.com (Koos Zevenhoven) Date: Fri, 13 Oct 2017 20:55:51 +0300 Subject: [Python-ideas] PEP draft: context variables In-Reply-To: <59E0FBBF.3010402@stoneleaf.us> References: <59E0FBBF.3010402@stoneleaf.us> Message-ID: On Fri, Oct 13, 2017 at 8:45 PM, Ethan Furman wrote: > On 10/13/2017 09:48 AM, Steve Dower wrote: > >> On 13Oct2017 0941, Yury Selivanov wrote: >> > > Actually, capturing context at the moment of coroutine creation (in >>> PEP 550 v1 semantics) will not work at all. Async context managers >>> will break. >>> >>> class AC: >>> async def __aenter__(self): >>> pass >>> >>> ^ If the context is captured when coroutines are instantiated, >>> __aenter__ won't be able to set context variables and thus affect the >>> code it wraps. That's why coroutines shouldn't capture context when >>> created, nor they should isolate context. It's a job of async Task. >>> >> >> Then make __aenter__/__aexit__ when called by "async with" an exception >> to the normal semantics? >> >> It seems simpler to have one specially named and specially called >> function be special, rather than make the semantics >> more complicated for all functions. >> > > +1. I think that would make it much more usable by those of us who are > not experts. > ?The semantics is not really dependent on __aenter__ and __aexit__. They can be used together with both semantic variants that I'm describing for PEP 555, and without any special casing. IOW, this is independent of any remaining concerns in PEP 555. ??Koos -- + Koos Zevenhoven + http://twitter.com/k7hoven + -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From antoine.rozo at gmail.com Fri Oct 13 14:16:23 2017 From: antoine.rozo at gmail.com (Antoine Rozo) Date: Fri, 13 Oct 2017 20:16:23 +0200 Subject: [Python-ideas] Add a module itertools.recipes Message-ID: Hello, A very useful part of the itertools module's documentation is the section "Recipes", giving utility functions that use itertools iterators. But when you want to use one of theese functions, you have to copy it in your source code (or use external PyPI modules like iterutils). Can we consider making itertools a package and adding a module itertools.recipes that implements all these utilility functions? Regards. -- Antoine Rozo -------------- next part -------------- An HTML attachment was scrubbed... URL: From yselivanov.ml at gmail.com Fri Oct 13 14:32:10 2017 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Fri, 13 Oct 2017 14:32:10 -0400 Subject: [Python-ideas] PEP draft: context variables In-Reply-To: <59E0FBBF.3010402@stoneleaf.us> References: <59E0FBBF.3010402@stoneleaf.us> Message-ID: On Fri, Oct 13, 2017 at 1:45 PM, Ethan Furman wrote: > On 10/13/2017 09:48 AM, Steve Dower wrote: >> >> On 13Oct2017 0941, Yury Selivanov wrote: > > >>> Actually, capturing context at the moment of coroutine creation (in >>> PEP 550 v1 semantics) will not work at all. Async context managers >>> will break. >>> >>> class AC: >>> async def __aenter__(self): >>> pass >>> >>> ^ If the context is captured when coroutines are instantiated, >>> __aenter__ won't be able to set context variables and thus affect the >>> code it wraps. That's why coroutines shouldn't capture context when >>> created, nor they should isolate context. It's a job of async Task. >> >> >> Then make __aenter__/__aexit__ when called by "async with" an exception to >> the normal semantics? >> >> It seems simpler to have one specially named and specially called function >> be special, rather than make the semantics >> more complicated for all functions. > It's not possible to special case __aenter__ and __aexit__ reliably (supporting wrappers, decorators, and possible side effects). > +1. I think that would make it much more usable by those of us who are not > experts. I still don't understand what Steve means by "more usable", to be honest. Yury From rosuav at gmail.com Fri Oct 13 14:35:08 2017 From: rosuav at gmail.com (Chris Angelico) Date: Sat, 14 Oct 2017 05:35:08 +1100 Subject: [Python-ideas] Add a module itertools.recipes In-Reply-To: References: Message-ID: On Sat, Oct 14, 2017 at 5:16 AM, Antoine Rozo wrote: > Hello, > > A very useful part of the itertools module's documentation is the section > "Recipes", giving utility functions that use itertools iterators. > But when you want to use one of theese functions, you have to copy it in > your source code (or use external PyPI modules like iterutils). > > Can we consider making itertools a package and adding a module > itertools.recipes that implements all these utilility functions? Check out more-itertools on PyPI - maybe that's what you want? ChrisA From yselivanov.ml at gmail.com Fri Oct 13 15:08:06 2017 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Fri, 13 Oct 2017 15:08:06 -0400 Subject: [Python-ideas] (PEP 555 subtopic) Propagation of context in async code In-Reply-To: References: Message-ID: On Fri, Oct 13, 2017 at 1:46 PM, Koos Zevenhoven wrote: > On Fri, Oct 13, 2017 at 7:38 PM, Yury Selivanov > wrote: >> >> On Fri, Oct 13, 2017 at 11:49 AM, Koos Zevenhoven >> wrote: >> [..] 
>> > This was my starting point 2.5 years ago, when Yury was drafting this >> > status >> > quo (PEP 492). It looked a lot of PEP 492 was inevitable, but that there >> > will be a problem, where each API that uses "blocking IO" somewhere >> > under >> > the hood would need a duplicate version for asyncio (and one for each >> > third-party async framework!). I felt it was necessary to think about a >> > solution before PEP 492 is accepted, and this became a fairly >> > short-lived >> > thread here on python-ideas: >> >> Well, it's obvious why the thread was "short-lived". Don't mix >> non-blocking and blocking code and don't nest asyncio loops. But I >> believe this new subtopic is a distraction. > > > Nesting is not the only way to have interaction between two event loops. > But whenever anyone *does* want to nest two loops, they are perhaps more > likely to be loops of different frameworks. > > You believe that the semantics in async code is a distraction? Discussing blocking calls and/or nested event loops in async code is certainly a distraction :) [..] >> The real big question is how people usually write code. And the >> answer is that they *don't write it like that* at all. Many context >> managers in many frameworks (aiohttp, tornado, and even asyncio) >> require you to wrap your await expressions in them. Not coroutine >> instantiation. > > > You know very well that I've been talking about how people usually write > code etc. But we still need to handle the corner cases too. [..] > The code is to illustrate semantics, not an example of real code. The point > is to highlight that the context has changed between when the coroutine > function was called and when it is awaited. That's certainly a thing that > can happen in real code, even if it is not the most typical case. I do > mention this in my previous email. I understand the point and what you're trying to illustrate. I'm saying that people don't write 'with smth: c = coro()' because it's currently pointless. And unless you tell them they should, they won't. > >> >> [..] >> > Both of these would have their own stack of (argument, value) assignment >> > pairs, explained in the implementation part of the first PEP 555 draft. >> > While this is a complication, the performance overhead of these is so >> > small, >> > that doubling the overhead should not be a performance concern. >> >> Please stop handwaving performance. Using big O notation: >> > > There is discussion on perfomance elsewhere, now also in this other > subthread: > > https://mail.python.org/pipermail/python-ideas/2017-October/047327.html > >> PEP 555, worst complexity for uncached lookup: O(N), where 'N' is the >> total number of all context values for all context keys for the >> current frame stack. Quoting you from that link: "Indeed I do mention performance here and there in the PEP 555 draft. Lookups can be made fast and O(1) in most cases. Even with the simplest unoptimized implementation, the worst-case lookup complexity would be O(n), where n is the number of assignment contexts entered after the one which is being looked up from (or in other words, nested inside the one that is being looked up from). This means that for use cases where the relevant context is entered as the innermost context level, the lookups are O(1) even without any optimizations. It is perfectly reasonable to make an implementation where lookups are *always* O(1). 
Still, it might make more sense to implement a half-way solution with "often O(1)", because that has somewhat less overhead in case the callees end up not doing any lookups." So where's the actual explanation of how you can make *uncached* lookups O(1) in your best implementation? I only see you claiming that you know how to do that. And since you're using a stack of values instead of hash tables, your explanation can make a big impact on the CS field :) It's perfectly reasonable to say that "cached lookups in my optimization is O(1)". Saying that "in most cases it's O(1)" isn't how the big O notation should be used. BTW, what's the big O for capturing the entire context in PEP 555 (get_execution_context() in PEP 550)? How will that operation be implemented? A shallow copy of the stack? Also, if I had this: with c.assign(o1): with c.assign(o2): with c.assign(o3): ctx = capture_context() will ctx have references to o1, o2, and o3? > Not true. See the above link. Lookups are fast (*and* O(1), if we want them > to be). > > PEP 555 stacks are independent of frames, BTW. > > >> >> For a recursive function you can easily have a >> situation where cache is invalidated often, and code starts to run >> slower and slower. > > > Not true either. The lookups are O(1) in a recursive function, with and > without nested contexts. See the above. I claim that you can't say that *uncached* lookups can be O(1) in PEP 555 with the current choice of datastructures. Yury From p.f.moore at gmail.com Fri Oct 13 16:29:22 2017 From: p.f.moore at gmail.com (Paul Moore) Date: Fri, 13 Oct 2017 21:29:22 +0100 Subject: [Python-ideas] PEP draft: context variables In-Reply-To: References: <59E0FBBF.3010402@stoneleaf.us> Message-ID: On 13 October 2017 at 19:32, Yury Selivanov wrote: >>> It seems simpler to have one specially named and specially called function >>> be special, rather than make the semantics >>> more complicated for all functions. >> > > It's not possible to special case __aenter__ and __aexit__ reliably > (supporting wrappers, decorators, and possible side effects). > >> +1. I think that would make it much more usable by those of us who are not >> experts. > > I still don't understand what Steve means by "more usable", to be honest. I'd consider myself a "non-expert" in async. Essentially, I ignore it - I don't write the sort of applications that would benefit significantly from it, and I don't see any way to just do "a little bit" of async, so I never use it. But I *do* see value in the context variable proposals here - if only in terms of them being a way to write my code to respond to external settings in an async-friendly way. I don't follow the underlying justification (which is based in "we need this to let things work with async/coroutines) at all, but I'm completely OK with the basic idea (if I want to have a setting that behaves "naturally", like I'd expect decimal contexts to do, it needs a certain amount of language support, so the proposal is to add that). I'd expect to be able to write context variables that my code could respond to using a relatively simple pattern, and have things "just work". Much like I can write a context manager using @contextmanager and yield, and not need to understand all the intricacies of __enter__ and __exit__. 
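For concreteness, the kind of "relatively simple pattern" being alluded to is something like this sketch of a setting managed with @contextmanager (all names made up):

from contextlib import contextmanager

_settings = {"precision": 10}   # an illustrative module-level setting

@contextmanager
def precision(value):
    # Swap the setting in, hand control back to the caller, then restore it,
    # without writing __enter__/__exit__ by hand.
    old = _settings["precision"]
    _settings["precision"] = value
    try:
        yield
    finally:
        _settings["precision"] = old

with precision(30):
    assert _settings["precision"] == 30
assert _settings["precision"] == 10

The question in this thread is what the equally simple pattern for a context *variable* should look like, and how it should behave around generators and await.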
(BTW, apologies if I'm mangling the terminology here - write it off as part of me being "not an expert" :-)) What I'm getting from this discussion is that even if I *do* have a simple way of writing context variables, they'll still behave in ways that seem mildly weird to me (as a non-async user). Specifically, my head hurts when I try to understand what that decimal context example "should do". My instincts say that the current behaviour is wrong - but I'm not sure I can explain why. So on that example, I'd ask the following of any proposal: 1. Users trying to write a context variable[1] shouldn't have to jump through hoops to get "natural" behaviour. That means that suggestions that the complexity be pushed onto decimal.context aren't OK unless it's also accepted that the current behaviour is wrong, and the only reason decimal.context needs to replicated is for backward compatibility (and new code can ignore the problem). 2. The proposal should clearly establish what it views as "natural" behaviour, and why. I'm not happy with "it's how decimal.context has always behaved" as an explanation. Sure, people asking to break backward compatibility should have a good justification, but equally, people arguing to *preserve* an unintuitive current behaviour in new code should be prepared to explain why it's not a bug. To put it another way, context variables aren't required to be bug-compatible with thread local storage. [1] I'm assuming here that "settings that affect how a library behave" is a common requirement, and the PEP is intended as the "one obvious way" to implement them. Nick's other async refactoring example is different. If the two forms he showed don't behave identically in all contexts, then I'd consider that to be a major problem. Saying that "coroutines are special" just reads to me as "coroutines/async are sufficiently weird that I can't expect my normal patterns of reasoning to work with them". (Apologies if I'm conflating coroutines and async incorrectly - as a non-expert, they are essentially indistinguishable to me). I sincerely hope that isn't the message I should be getting - async is already more inaccessible than I'd like for the average user. The fact that Nick's async example immediately devolved into a discussion that I can't follow at all is fine - to an extent. I don't mind the experts debating implementation details that I don't need to know about. But if you make writing context variables harder, just to fix Nick's example, or if you make *using* async code like (either of) Nick's forms harder, then I do object, because that's affecting the end user experience. In that context, I take Steve's comment as meaning "fiddling about with how __aenter__ and __aexit__ work is fine, as that's internals that non-experts like me don't care about - but making context variables behave oddly because of this is *not* fine". Apologies if the above is unhelpful. I've been lurking but not commenting here, precisely because I *am* a non-expert, and I trust the experts to build something that works. But when non-experts were explicitly mentioned, I thought my input might be useful. The following quote from the Zen seems particularly relevant here: If the implementation is hard to explain, it's a bad idea. 
(although the one about needing to be Dutch to understand why something is obvious might well trump it ;-)) Paul From yselivanov.ml at gmail.com Fri Oct 13 18:30:14 2017 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Fri, 13 Oct 2017 18:30:14 -0400 Subject: [Python-ideas] PEP draft: context variables In-Reply-To: References: <59E0FBBF.3010402@stoneleaf.us> Message-ID: On Fri, Oct 13, 2017 at 4:29 PM, Paul Moore wrote: [..] > Nick's other async refactoring example is different. If the two forms > he showed don't behave identically in all contexts, then I'd consider > that to be a major problem. Saying that "coroutines are special" just > reads to me as "coroutines/async are sufficiently weird that I can't > expect my normal patterns of reasoning to work with them". (Apologies > if I'm conflating coroutines and async incorrectly - as a non-expert, > they are essentially indistinguishable to me). I sincerely hope that > isn't the message I should be getting - async is already more > inaccessible than I'd like for the average user. Nick's idea that coroutines can isolate context was actually explored before in PEP 550 v3, and then, rather quickly, it became apparent that it wouldn't work. Steve's comments were about a specific example about generators, not coroutines. We can't special case __aenter__, we simply can not. __aenter__ can be a chain of coroutines -- its own separate call stack, we can't say that this whole call stack is behaving differently from all other code with respect to execution context. At this time, we have so many conflicted examples and tangled discussions on these topics, that I myself just lost what everybody is implying by "this semantics isn't obvious to *me*". Which semantics? It's hard to tell. At this point of time, there's just one place which describes one well defined semantics: PEP 550 latest version. Paul, if you have time/interest, please take a look at it, and say what's confusing there. Yury From steve.dower at python.org Fri Oct 13 18:44:02 2017 From: steve.dower at python.org (Steve Dower) Date: Fri, 13 Oct 2017 15:44:02 -0700 Subject: [Python-ideas] PEP draft: context variables In-Reply-To: References: <59E0FBBF.3010402@stoneleaf.us> Message-ID: On 13Oct2017 1132, Yury Selivanov wrote: > On Fri, Oct 13, 2017 at 1:45 PM, Ethan Furman wrote: >> On 10/13/2017 09:48 AM, Steve Dower wrote: >>> >>> On 13Oct2017 0941, Yury Selivanov wrote: >> >> >>>> Actually, capturing context at the moment of coroutine creation (in >>>> PEP 550 v1 semantics) will not work at all. Async context managers >>>> will break. >>>> >>>> class AC: >>>> async def __aenter__(self): >>>> pass >>>> >>>> ^ If the context is captured when coroutines are instantiated, >>>> __aenter__ won't be able to set context variables and thus affect the >>>> code it wraps. That's why coroutines shouldn't capture context when >>>> created, nor they should isolate context. It's a job of async Task. >>> >>> >>> Then make __aenter__/__aexit__ when called by "async with" an exception to >>> the normal semantics? >>> >>> It seems simpler to have one specially named and specially called function >>> be special, rather than make the semantics >>> more complicated for all functions. >> > > It's not possible to special case __aenter__ and __aexit__ reliably > (supporting wrappers, decorators, and possible side effects). Why not? 
Can you not add a decorator that sets a flag on the code object that means "do not create a new context when called", and then it doesn't matter where the call comes from - these functions will always read and write to the caller's context. That seems generally useful anyway, and then you just say that __aenter__ and __aexit__ are special and always have that flag set. >> +1. I think that would make it much more usable by those of us who are not >> experts. > > I still don't understand what Steve means by "more usable", to be honest. I don't know that I said "more usable", but it would certainly be easier to explain. The Zen has something to say about that... Cheers, Steve From lucas.wiman at gmail.com Fri Oct 13 20:26:09 2017 From: lucas.wiman at gmail.com (Lucas Wiman) Date: Fri, 13 Oct 2017 17:26:09 -0700 Subject: [Python-ideas] Add a module itertools.recipes In-Reply-To: References: Message-ID: On Fri, Oct 13, 2017 at 11:35 AM, Chris Angelico wrote: > On Sat, Oct 14, 2017 at 5:16 AM, Antoine Rozo > wrote: > [...] > Can we consider making itertools a package and adding a module > > itertools.recipes that implements all these utilility functions? > > Check out more-itertools on PyPI - maybe that's what you want? > toolz is another good collection of itertools-related recipes: http://toolz.readthedocs.io/en/latest/api.html - Lucas -------------- next part -------------- An HTML attachment was scrubbed... URL: From ethan at stoneleaf.us Fri Oct 13 21:07:26 2017 From: ethan at stoneleaf.us (Ethan Furman) Date: Fri, 13 Oct 2017 18:07:26 -0700 Subject: [Python-ideas] PEP draft: context variables In-Reply-To: References: <59E0FBBF.3010402@stoneleaf.us> Message-ID: <59E1634E.4070905@stoneleaf.us> On 10/13/2017 03:30 PM, Yury Selivanov wrote: > At this time, we have so many conflicted examples and tangled > discussions on these topics, that I myself just lost what everybody is > implying by "this semantics isn't obvious to *me*". Which semantics? > It's hard to tell. For me, it's not apparent why __aenter__ and __aexit__ cannot be special-cased. I would be grateful for a small code-snippet illustrating the danger. -- ~Ethan~ From amit.mixie at gmail.com Fri Oct 13 23:53:39 2017 From: amit.mixie at gmail.com (Amit Green) Date: Fri, 13 Oct 2017 23:53:39 -0400 Subject: [Python-ideas] PEP draft: context variables In-Reply-To: References: <59E0FBBF.3010402@stoneleaf.us> Message-ID: really like what Paul Moore wrote here as it matches a *LOT* of what I have been feeling as I have been reading this whole discussion; specifically: - I find the example, and discussion, really hard to follow. - I also, don't understand async, but I do understand generators very well (like Paul Moore) - A lot of this doesn't seem natural (generators & context variable syntax) - And particular: " If the implementation is hard to explain, it's a bad idea." I've spend a lot of time thinking about this, and what the issues are. I think they are multi-fold: - I really use Generators a lot -- and find them wonderful & are one of the joy's of python. They are super useful. However, as I am going to, hopefully, demonstrate here, they are not initially intuitive (to a beginner). - Generators are not really functions; but they appear to be functions, this was very confusing to me when I started working with generators. - Now, I'm used to it -- BUT, we really need to consider new people - and I suggest making this easier. - I find the proposed context syntax very confusing (and slow). 
I think contexts are super-important & instead need to be better integrated into the language (like nonlocal is) - People keep writing they want a real example -- so this is a very real example from real code I am writing (a python parser) and how I use contexts (obviously they are not part of the language yet, so I have emulated them) & how they interact with generators. The full example, which took me a few hours to write is available here (its a very very reduced example from a real parser of the python language written in python): - https://github.com/AmitGreen/Gem/blob/emerald_6/work/demo.py Here is the result of running the code -- which reads & executes demo1.py (importing & executing demo2.py twice): [Not by executing, I mean the code is running its own parser to execute it & its own code to emulate an 'import' -- thus showing nested contexts): It creates two input files for testing -- demo1.py: print 1 print 8 - 2 * 3 import demo2 print 9 - sqrt(16) print 10 / (8 - 2 * 3) import demo2 print 2 * 2 * 2 + 3 - 4 And it also creates demo2.py: print 3 * (2 - 1) error print 4 There are two syntax errors (on purpose) in the files, but since demo2.py is imported twice, this will show three syntax errors. Running the code produces the following: demo1.py#1: expression '1' evaluates to 1 demo1.py#2: expression '8 - 2 * 3' evaluates to 2 demo1.py#3: importing module demo2 demo2.py#1: expression '3 * (3 - 2)' evaluates to 3 demo2.py#2: UNKNOWN STATEMENT: 'error' demo2.py#3: expression '4' evaluates to 4 demo1.py#4: UNKNOWN ATOM: ' sqrt(16)' demo1.py#5: expression '10 / (8 - 2 * 3)' evaluates to 5 demo1.py#6: importing module demo2 demo2.py#1: expression '3 * (3 - 2)' evaluates to 3 demo2.py#2: UNKNOWN STATEMENT: 'error' demo2.py#3: expression '4' evaluates to 4 demo1.py#7: expression '2 * 2 * 2 + 3 - 4' evaluates to 7 This code demonstrates all of the following: - Nested contexts - Using contexts 'naturally' -- i.e.: directly as variables; without a 'context.' prefix -- which I would find too harder to read & also slower. - Using a generator that is deliberately broken up into three parts, start, next & stop. - Handling errors & how it interacts with both the generator & 'context' - Actually parsing the input -- which creates a deeply nested stack (due to recursive calls during expression parsing) -- thus a perfect example for contexts. So given all of the above, I'd first like to focus on the generator: - Currently we can write generators as either: (1) functions; or (2) classes with a __next__ method. However this is very confusing to a beginner. - Given a generator like the following (actually in the code): def __iter__(self): while not self.finished: self.loop += 1 yield self - What I found so surprising when I started working with generator, is that calling the generator does *NOT* actually start the function. - Instead, the actual code does not actually get called until the first __next__ method is called. - This is quite counter-intuitive. I therefore suggest the following: - Give generators their own first class language syntax. - This syntax, would have good entry point's, to allow their interaction with context variables. Here is the generator in my code sample: # # Here is our generator to walk over a file. # # This generator has three sections: # # generator_start - Always run when the generator is started. # This opens the file & reads it. # # generator_next - Run each time the generator needs to retrieve # The next value. 
# # generator_stop - Called when the generator is going to stop. # def iterate_lines(path): data_lines = None def generator_startup(path): nonlocal current_path, data_lines with open(path) as f: current_path = path data = f.read() data_lines = tuple(data.splitlines()) def generator_next(): nonlocal current_line, line_number for current_line in data_lines: line_number += 1 line_position = 0 yield current_line generator_stop() def generator_stop(): current_path = None line_number = 0 line_position = 0 generator_startup(path) return generator_next() This generator demonstrates the following: - It immediately starts up when called (and in fact opens the file when called -- so if the file doesn't exist, an exception is thrown then, not later when the __next__ method is first called) - It's half way between a function generator & a class generator; thus (1) efficient; and (2) more understandable than a class generator. Here is (a first draft) proposal and how I would like to re-write the above generator, so it would have its own first class syntax: generator iterate_lines(path): local data_lines = None context current_path, current_line, line_number, line_position start: with open(path) as f: current_path = path data = f.read() data_lines = tuple(data.splitlines()) next: for current_line in data_lines: line_number += 1 line_position = 0 yield current_line stop: current_path = None line_number = 0 line_position = 0 This: 1. Adds a keyword 'generator' so its obvious this is a generator not a function. 2. Declares it variables (data_lines) 3. Declares which context variables it wants to use (current_path, currentline, line_number, & line_position) 4. Has a start section that immediately gets executed. 5. Has a next section that executes on each call to __next__ (and this is where the yield keyword must appear) 6. Has a stop section that executes when the generator receives a StopIteration. 7. The compiler could generate equally efficient code for generators as it does for current generators; while making the syntax clearer to the user. 8. The syntax is chosen so the user can edit it & convert it to a class generator. Given the above: - I could now add special code to either the 'start' or 'next' section, saying which context I wanted to use (once we have that syntax implemented). The reason for its own syntax is to allow us to think more clearly about the different parts of a generator & then makes it easier for the user to choose which part of the generator interacts with contexts & which context. In particular the user could interact with multiple contexts (one in the start section & a different one in the next section). [Also for other generators I think the syntax needs to be extended, to something like: next(context): use context: .... Allowing two new features --- requesting that the __next__ receive the context of the caller & secondly being able to use that context itself. Next, moving on to contexts: - I love how non-local works & how you can access variables declared in your surrounding function. - I really think that contexts should work the same way - You would simply declare 'context' (like non-local) & just be able to use the variables directly. - Way easier to understand & use. The sample code I have actually emulates contexts using non-local, so as to demonstrate the idea I am explaining. Thanks, Amit P.S.: As I'm very new to python ideas, I'm not sure if I should start a separate thread to discuss this or use the current thread. 
Also I'm not sure if I should attached the sample code here or not ... So I just provided the link above. On Fri, Oct 13, 2017 at 4:29 PM, Paul Moore wrote: > On 13 October 2017 at 19:32, Yury Selivanov > wrote: > >>> It seems simpler to have one specially named and specially called > function > >>> be special, rather than make the semantics > >>> more complicated for all functions. > >> > > > > It's not possible to special case __aenter__ and __aexit__ reliably > > (supporting wrappers, decorators, and possible side effects). > > > >> +1. I think that would make it much more usable by those of us who are > not > >> experts. > > > > I still don't understand what Steve means by "more usable", to be honest. > > I'd consider myself a "non-expert" in async. Essentially, I ignore it > - I don't write the sort of applications that would benefit > significantly from it, and I don't see any way to just do "a little > bit" of async, so I never use it. > > But I *do* see value in the context variable proposals here - if only > in terms of them being a way to write my code to respond to external > settings in an async-friendly way. I don't follow the underlying > justification (which is based in "we need this to let things work with > async/coroutines) at all, but I'm completely OK with the basic idea > (if I want to have a setting that behaves "naturally", like I'd expect > decimal contexts to do, it needs a certain amount of language support, > so the proposal is to add that). I'd expect to be able to write > context variables that my code could respond to using a relatively > simple pattern, and have things "just work". Much like I can write a > context manager using @contextmanager and yield, and not need to > understand all the intricacies of __enter__ and __exit__. (BTW, > apologies if I'm mangling the terminology here - write it off as part > of me being "not an expert" :-)) > > What I'm getting from this discussion is that even if I *do* have a > simple way of writing context variables, they'll still behave in ways > that seem mildly weird to me (as a non-async user). Specifically, my > head hurts when I try to understand what that decimal context example > "should do". My instincts say that the current behaviour is wrong - > but I'm not sure I can explain why. So on that example, I'd ask the > following of any proposal: > > 1. Users trying to write a context variable[1] shouldn't have to jump > through hoops to get "natural" behaviour. That means that suggestions > that the complexity be pushed onto decimal.context aren't OK unless > it's also accepted that the current behaviour is wrong, and the only > reason decimal.context needs to replicated is for backward > compatibility (and new code can ignore the problem). > 2. The proposal should clearly establish what it views as "natural" > behaviour, and why. I'm not happy with "it's how decimal.context has > always behaved" as an explanation. Sure, people asking to break > backward compatibility should have a good justification, but equally, > people arguing to *preserve* an unintuitive current behaviour in new > code should be prepared to explain why it's not a bug. To put it > another way, context variables aren't required to be bug-compatible > with thread local storage. > > [1] I'm assuming here that "settings that affect how a library behave" > is a common requirement, and the PEP is intended as the "one obvious > way" to implement them. > > Nick's other async refactoring example is different. 
If the two forms > he showed don't behave identically in all contexts, then I'd consider > that to be a major problem. Saying that "coroutines are special" just > reads to me as "coroutines/async are sufficiently weird that I can't > expect my normal patterns of reasoning to work with them". (Apologies > if I'm conflating coroutines and async incorrectly - as a non-expert, > they are essentially indistinguishable to me). I sincerely hope that > isn't the message I should be getting - async is already more > inaccessible than I'd like for the average user. > > The fact that Nick's async example immediately devolved into a > discussion that I can't follow at all is fine - to an extent. I don't > mind the experts debating implementation details that I don't need to > know about. But if you make writing context variables harder, just to > fix Nick's example, or if you make *using* async code like (either of) > Nick's forms harder, then I do object, because that's affecting the > end user experience. > > In that context, I take Steve's comment as meaning "fiddling about > with how __aenter__ and __aexit__ work is fine, as that's internals > that non-experts like me don't care about - but making context > variables behave oddly because of this is *not* fine". > > Apologies if the above is unhelpful. I've been lurking but not > commenting here, precisely because I *am* a non-expert, and I trust > the experts to build something that works. But when non-experts were > explicitly mentioned, I thought my input might be useful. > > The following quote from the Zen seems particularly relevant here: > > If the implementation is hard to explain, it's a bad idea. > > (although the one about needing to be Dutch to understand why > something is obvious might well trump it ;-)) > > Paul > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From antoine.rozo at gmail.com Sat Oct 14 02:06:54 2017 From: antoine.rozo at gmail.com (Antoine Rozo) Date: Sat, 14 Oct 2017 08:06:54 +0200 Subject: [Python-ideas] Add a module itertools.recipes In-Reply-To: References: Message-ID: I am not searching for an external library (as I pointed, there are some on PyPI like iterutils or more-itertools). My point was that recipes are documented in itertools module, but not implemented in standard library, and it would be useful to have them available. 2017-10-13 20:35 GMT+02:00 Chris Angelico : > On Sat, Oct 14, 2017 at 5:16 AM, Antoine Rozo > wrote: > > Hello, > > > > A very useful part of the itertools module's documentation is the section > > "Recipes", giving utility functions that use itertools iterators. > > But when you want to use one of theese functions, you have to copy it in > > your source code (or use external PyPI modules like iterutils). > > > > Can we consider making itertools a package and adding a module > > itertools.recipes that implements all these utilility functions? > > Check out more-itertools on PyPI - maybe that's what you want? > > ChrisA > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -- Antoine Rozo -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ncoghlan at gmail.com Sat Oct 14 03:09:07 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 14 Oct 2017 17:09:07 +1000 Subject: [Python-ideas] PEP draft: context variables In-Reply-To: References: <59E0FBBF.3010402@stoneleaf.us> Message-ID: On 14 October 2017 at 08:44, Steve Dower wrote: > > It's not possible to special case __aenter__ and __aexit__ reliably >> (supporting wrappers, decorators, and possible side effects). >> > > Why not? Can you not add a decorator that sets a flag on the code object > that means "do not create a new context when called", and then it doesn't > matter where the call comes from - these functions will always read and > write to the caller's context. That seems generally useful anyway, and then > you just say that __aenter__ and __aexit__ are special and always have that > flag set. > One example where giving function names implicit semantic significance becomes problematic: async def start_transaction(self): ... async def end_transaction(self, *exc_details): ... __aenter__ = start_transaction __aexit__ = end_transaction There *are* ways around that (e.g. type.__new__ implicitly wraps __init_subclass__ with classmethod since it makes no sense as a regular instance method), but then you still run into problems like this: async def __aenter__(self): return await self.start_transaction() async def __aexit__(self, *exc_details): return await self.end_transaction(*exc_details) If coroutines were isolated from their parents by default, then the above method implementations would be broken, even though the exact same invocation pattern works fine for synchronous function calls. To try and bring this back to synchronous examples that folks may find more intuitive, I figure it's worth framing the question this way: do we want people to reason about context variables like the active context is implicitly linked to the synchronous call stack, or do we want to encourage them to learn to reason about them more like they're a new kind of closure? The reason I ask that is because there are three "interesting" times in the life of a coroutine or generator: - definition time (when the def statement runs - this determines the lexical closure) - instance creation time (when the generator-iterator or coroutine is instantiated) - execution time (when the frame actually starts running - this determines the runtime call stack) For synchronous functions, instance creation time and execution time are intrinsically linked, since the execution frame is allocated and executed directly as part of calling the function. For asynchronous operations, there's more of a question, since actual execution is deferred until you call await or next() - the original synchronous call to the factory function instantiates an object, it doesn't actually *do* anything. The current position of PEP 550 (which I agree with) is that context variables should default to being closely associated with the active call stack (regardless of whether those calls are regular synchronous ones, or asynchronous ones with await), as this keeps the synchronous and asynchronous semantics of context variables as close to each other as we can feasibly make them. When implicit isolation takes place, it's either to keep concurrently active logical call stacks isolated from each other (the event loop case), and else to keep context changes from implicitly leaking *up* a stack (the generator case), not to keep context changes from propagating *down* a call stack. 
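The generator case can be seen with decimal as it behaves today, where there is no isolation and the change does leak upward (a sketch; under the isolation described above, the caller would keep seeing the old precision):

import decimal

def change_context():
    # A context change made inside a generator body.
    decimal.setcontext(decimal.Context(prec=5))
    yield

print(decimal.getcontext().prec)   # 28, the default
g = change_context()
next(g)
print(decimal.getcontext().prec)   # 5 today: the change has leaked up to the caller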
When we do want to prevent downward propagation for some reason, then that's what "run_in_execution_context" is for: deliberate creation of a new concurrently active call stack (similar to running something in another thread to isolate the synchronous call stack). Don't get me wrong, I'm not opposed to the idea of making it trivial to define "micro tasks" (iterables that perform a context switch to a specified execution context every time they retrieve a new value) that can provide easy execution context isolation without an event loop to manage it, I just think that would be more appropriate as a wrapper API that can be placed around any iterable, rather than being baked in as an intrinsic property of generators. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Sat Oct 14 03:20:17 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 14 Oct 2017 17:20:17 +1000 Subject: [Python-ideas] Add a module itertools.recipes In-Reply-To: References: Message-ID: On 14 October 2017 at 16:06, Antoine Rozo wrote: > I am not searching for an external library (as I pointed, there are some > on PyPI like iterutils or more-itertools). > My point was that recipes are documented in itertools module, but not > implemented in standard library, and it would be useful to have them > available. > Not providing the recipes as an importable API is a deliberate design decision, as what folks often need is code that is similar-to-but-not-exactly-the-same-as the code in the recipe. If they've copied the code into their own utility library, then that's not a problem - they can just edit their version to have the exact semantics they need. Individual recipes may occasionally get promoted to be part of the module API, but that only happens on a case by case basis, and requires a compelling justification for the change ("It's sometimes useful" isn't compelling enough - we know it's sometimes useful, that's why it's listed as an example recipe). Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia -------------- next part -------------- An HTML attachment was scrubbed... URL: From storchaka at gmail.com Sat Oct 14 03:49:11 2017 From: storchaka at gmail.com (Serhiy Storchaka) Date: Sat, 14 Oct 2017 10:49:11 +0300 Subject: [Python-ideas] Add time.time_ns(): system clock with nanosecond resolution In-Reply-To: References: Message-ID: 13.10.17 17:12, Victor Stinner ????: > I would like to add new functions to return time as a number of > nanosecond (Python int), especially time.time_ns(). > > It would enhance the time.time() clock resolution. In my experience, > it decreases the minimum non-zero delta between two clock by 3 times, > new "ns" clock versus current clock: 84 ns (2.8x better) vs 239 ns on > Linux, and 318 us (2.8x better) vs 894 us on Windows, measured in > Python. > > The question of this long email is if it's worth it to add more "_ns" > time functions than just time.time_ns()? > > I would like to add: > > * time.time_ns() > * time.monotonic_ns() > * time.perf_counter_ns() > * time.clock_gettime_ns() > * time.clock_settime_ns() > > time(), monotonic() and perf_counter() clocks are the 3 most common > clocks and users use them to get the best available clock resolution. > clock_gettime/settime() are the generic UNIX API to access these > clocks and so should also be enhanced to get nanosecond resolution. 
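As a rough cross-check on the resolution figures quoted above (a sketch, not part of the original measurements):

import time

t = time.time()                      # ~1.5e9 seconds since the epoch in 2017
# An IEEE-754 double carries 53 significant bits, so the gap between two
# representable float timestamps near t is roughly:
gap = 2.0 ** (int(t).bit_length() - 53)
print(gap)                           # ~2.4e-07 s, i.e. roughly 240 ns

which is consistent with the ~239 ns minimum delta reported above, and is the rounding that an integer-nanosecond API would avoid.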
I don't like the idea of adding a parallel set of functions. In the list of alternatives in PEP 410 there is no an idea about fixed precision float type with nanoseconds precision. It can be implemented internally as a 64-bit integer, but provide all methods required for float-compatible number. It would be simpler and faster than general Decimal. From solipsis at pitrou.net Sat Oct 14 04:21:22 2017 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sat, 14 Oct 2017 10:21:22 +0200 Subject: [Python-ideas] Add time.time_ns(): system clock with nanosecond resolution References: Message-ID: <20171014102122.7d718d93@fsol> On Sat, 14 Oct 2017 10:49:11 +0300 Serhiy Storchaka wrote: > 13.10.17 17:12, Victor Stinner ????: > > I would like to add new functions to return time as a number of > > nanosecond (Python int), especially time.time_ns(). > > > > It would enhance the time.time() clock resolution. In my experience, > > it decreases the minimum non-zero delta between two clock by 3 times, > > new "ns" clock versus current clock: 84 ns (2.8x better) vs 239 ns on > > Linux, and 318 us (2.8x better) vs 894 us on Windows, measured in > > Python. > > > > The question of this long email is if it's worth it to add more "_ns" > > time functions than just time.time_ns()? > > > > I would like to add: > > > > * time.time_ns() > > * time.monotonic_ns() > > * time.perf_counter_ns() > > * time.clock_gettime_ns() > > * time.clock_settime_ns() > > > > time(), monotonic() and perf_counter() clocks are the 3 most common > > clocks and users use them to get the best available clock resolution. > > clock_gettime/settime() are the generic UNIX API to access these > > clocks and so should also be enhanced to get nanosecond resolution. > > I don't like the idea of adding a parallel set of functions. > > In the list of alternatives in PEP 410 there is no an idea about fixed > precision float type with nanoseconds precision. It can be implemented > internally as a 64-bit integer, but provide all methods required for > float-compatible number. It would be simpler and faster than general > Decimal. I agree a parallel set of functions is not ideal, but I think a parallel set of functions is still more appropriate than a new number type specific to the time module. Also, if you change existing functions to return a new type, you risk breaking compatibility even if you are very careful about designing the new type. Regards Antoine. From p.f.moore at gmail.com Sat Oct 14 07:56:44 2017 From: p.f.moore at gmail.com (Paul Moore) Date: Sat, 14 Oct 2017 12:56:44 +0100 Subject: [Python-ideas] PEP draft: context variables In-Reply-To: References: <59E0FBBF.3010402@stoneleaf.us> Message-ID: On 14 October 2017 at 08:09, Nick Coghlan wrote: > To try and bring this back to synchronous examples that folks may find more > intuitive, I figure it's worth framing the question this way: do we want > people to reason about context variables like the active context is > implicitly linked to the synchronous call stack, or do we want to encourage > them to learn to reason about them more like they're a new kind of closure? I'm really struggling to keep up here. I need to go and fully read the PEP as Yury suggested, and focus on what's in there. But I'll try to answer this comment. I will ask one question, though, based on Yury's point "the PEP is where you should look for the actual semantics" - can you state where in the PEP is affected by the answer to this question? 
I want to make sure that when I read the PEP, I don't miss the place that this whole discussion thread is about... I don't think of contexts in terms of *either* the "synchronous call stack" (which, by the way, is much too technical a term to make sense to the "non-expert" people around here like me - I know what the term means, but only in a way that's far to low-level to give me an intuitive sense of what contexts are) or closures. At the risk of using another analogy that's unfamiliar to a lot of people, I think of them in terms of Lisp's dynamic variables. Code that needs a context variable, gets the value that's current *at that time*. I don't want to have to think lower level than that - if I have to, then in my view there's a problem with a *different* abstraction (specifically async ;-)) To give an example: async def get_webpage(id): url = f"https://{server}/{app}/items?id={id}" # 1 encoding, content = await url_get(url) #2 return content.decode(encoding) I would expect that, if I set a context variable at #1, and read it at #2, then: 1. code run as part of url_get would see the value set at 1 2. code run as part of url_get could set the value, and I'd see the new value at 2 It doesn't matter what form the lines in the function take (loops, with statements, conditionals, ...) as long as they are run immediately (class and function definitions should be ignored - there's no lexical capture of context variables). That probably means "synchronous call stack" in your terms, but please don't assume that any implications of that term which aren't covered by the above example are obvious to me. To use the decimal context example: > with decimal.localcontext() as ctx: > ctx.prec = 30 > for i in gen(): > pass There's only one setting of a context here, so it's obvious - values returned from gen have precision 30. > g = gen() > with decimal.localcontext() as ctx: > ctx.prec = 30 > for i in g: > pass "for i in g" is getting values from the generator, at a time when the precision is 30, so those values should have precision 30. There's no confusion here to me. If that's not what decimal currently does, I'd happily report that as a bug. The refactoring case is similarly obvious to me: > async def original_async_function(): > with some_context(): > do_some_setup() > raw_data = await some_operation() > data = do_some_postprocessing(raw_data) > > Refactored: > > async def async_helper_function(): > do_some_setup() > raw_data = await some_operation() > return do_some_postprocessing(raw_data) > > async def refactored_async_function(): > with some_context(): > data = await async_helper_function() > All we've done here is take some code out of the with block and write it as a helper. There should be no change of semantics when doing so. That's a fundamental principle to me, and honestly I don't see it as credible for anyone to say otherwise. (Anyone who suggests that is basically saying "if you use async, common sense goes out of the window" as far as I'm concerned). > The reason I ask that is because there are three "interesting" times in the > life of a coroutine or generator: > > - definition time (when the def statement runs - this determines the lexical > closure) > - instance creation time (when the generator-iterator or coroutine is > instantiated) > - execution time (when the frame actually starts running - this determines > the runtime call stack) OK. 
They aren't *really* interesting to me (they are a low-level detail, but they should work to support intuitive semantics, not to define what my intuition should be) but I'd say that my expectation is that the *execution time* value of the context variable is what I'd expect to get and set. > For synchronous functions, instance creation time and execution time are > intrinsically linked, since the execution frame is allocated and executed > directly as part of calling the function. > > For asynchronous operations, there's more of a question, since actual > execution is deferred until you call await or next() - the original > synchronous call to the factory function instantiates an object, it doesn't > actually *do* anything. This isn't particularly a question for me: g = gen() creates an object. next(g) - or more likely "for o in g" - runs it, and that's when the context matters. I struggle to understand why anyone would think otherwise. > The current position of PEP 550 (which I agree with) is that context > variables should default to being closely associated with the active call > stack (regardless of whether those calls are regular synchronous ones, or > asynchronous ones with await), as this keeps the synchronous and > asynchronous semantics of context variables as close to each other as we can > feasibly make them. At the high level we're talking here, I agree with this. > When implicit isolation takes place, it's either to keep concurrently active > logical call stacks isolated from each other (the event loop case), and else > to keep context changes from implicitly leaking *up* a stack (the generator > case), not to keep context changes from propagating *down* a call stack. I don't understand this. If it matters, in terms of explaining corner cases of the semantics, then it needs to be explained in more intuitive terms. If it's an implementation detail of *how* the PEP ensures it acts intuitively, then I'm fine with not needing to care. > When we do want to prevent downward propagation for some reason, then that's > what "run_in_execution_context" is for: deliberate creation of a new > concurrently active call stack (similar to running something in another > thread to isolate the synchronous call stack). I read that as "run_in_execution_context is a specialised thing that you'll never need to use, because you don't understand its purpose - so just hope that in your code, everything will just work as you expect without it". The obvious omission here is an explanation of precisely who my interpretation *doesn't* apply for. Who are the audience for run_in_execution_context? If it's "people who write context managers that use context variables" then I'd say that's a problem, because I'd hope a lot of people would find use for this, and I wouldn't want them to have to understand the internals to this level. If it's something like "people who write async context managers using raw __aenter__ and __aexit__ functions, as opposed to the async version of @contextmanager", then that's probably fine. > Don't get me wrong, I'm not opposed to the idea of making it trivial to > define "micro tasks" (iterables that perform a context switch to a specified > execution context every time they retrieve a new value) that can provide > easy execution context isolation without an event loop to manage it, I just > think that would be more appropriate as a wrapper API that can be placed > around any iterable, rather than being baked in as an intrinsic property of > generators. 
I don't think it matters whether it's trivial to write "micro tasks" if non-experts don't know what they are ;-) I *do* think it matters if "micro tasks" are something non-experts might need to write, but not realise they are straying into deep waters. But I've no way of knowing how likely that is. One final point, this is all pretty deeply intertwined with the comprehensibility of async as a whole. At the moment, as I said before, async is a specialised area that's largely only used in projects that centre around it. In the same way that Twisted is its own realm - people write network applications without Twisted, or they write them using Twisted. Nobody uses Twisted in the middle of some normal non-async application like pip to handle grabbing a webpage. I'm uncertain whether the intent is for the core async features to follow this model, or whether we'd expect in the longer term for "utility adoption" of async to happen (tactical use of async for something like web crawling or collecting subprocess output in a largely non-async app). If that *does* happen, then async needs to be much more widely understandable - maintenance programmers who have never used async will start encountering it in corners of their non-async applications, or find it used under the hood in libraries that they use. This discussion is a good example of the implications of that - async quirks leaking out into the "normal" world (decimal contexts) and as a result the async experts needing to be able to communicate their concerns and issues to non-experts. Hopefully some of this helps, Paul From ncoghlan at gmail.com Sat Oct 14 11:46:50 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 15 Oct 2017 01:46:50 +1000 Subject: [Python-ideas] Add time.time_ns(): system clock with nanosecond resolution In-Reply-To: <20171014102122.7d718d93@fsol> References: <20171014102122.7d718d93@fsol> Message-ID: On 14 October 2017 at 18:21, Antoine Pitrou wrote: > On Sat, 14 Oct 2017 10:49:11 +0300 > Serhiy Storchaka > wrote: > > I don't like the idea of adding a parallel set of functions. > > > > In the list of alternatives in PEP 410 there is no an idea about fixed > > precision float type with nanoseconds precision. It can be implemented > > internally as a 64-bit integer, but provide all methods required for > > float-compatible number. It would be simpler and faster than general > > Decimal. > > I agree a parallel set of functions is not ideal, but I think a parallel > set of functions is still more appropriate than a new number type > specific to the time module. > > Also, if you change existing functions to return a new type, you risk > breaking compatibility even if you are very careful about designing the > new type. > Might it make more sense to have a parallel *module* that works with a different base data type rather than parallel functions within the existing API? That is, if folks wanted to switch to 64-bit nanosecond time, they would use: * time_ns.time() * time_ns.monotonic() * time_ns.perf_counter() * time_ns.clock_gettime() * time_ns.clock_settime() The idea here would be akin to the fact we have both math and cmath as modules, where the common APIs conceptually implement the same algorithms, they just work with a different numeric type (floats vs complex numbers). Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From solipsis at pitrou.net Sat Oct 14 11:56:36 2017 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sat, 14 Oct 2017 17:56:36 +0200 Subject: [Python-ideas] Add time.time_ns(): system clock with nanosecond resolution References: <20171014102122.7d718d93@fsol> Message-ID: <20171014175636.23959d78@fsol> On Sun, 15 Oct 2017 01:46:50 +1000 Nick Coghlan wrote: > > Might it make more sense to have a parallel *module* that works with a > different base data type rather than parallel functions within the existing > API? > > That is, if folks wanted to switch to 64-bit nanosecond time, they would > use: > > * time_ns.time() > * time_ns.monotonic() > * time_ns.perf_counter() > * time_ns.clock_gettime() > * time_ns.clock_settime() > > The idea here would be akin to the fact we have both math and cmath as > modules, where the common APIs conceptually implement the same algorithms, > they just work with a different numeric type (floats vs complex numbers). -1 from me. The math/cmath separation isn't even very well grounded, it just mirrors the C API that those two modules reflect. But regardless, the *operations* in math and cmath are different and operate in different domains (try e.g. ``sqrt(-1)``), which is not the case here. Regards Antoine. From ncoghlan at gmail.com Sat Oct 14 12:50:27 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 15 Oct 2017 02:50:27 +1000 Subject: [Python-ideas] PEP draft: context variables In-Reply-To: References: <59E0FBBF.3010402@stoneleaf.us> Message-ID: On 14 October 2017 at 21:56, Paul Moore wrote: TL;DR of below: PEP 550 currently gives you what you're after, so your perspective counts as a preference for "please don't eagerly capture the creation time context in generators or coroutines". To give an example: > > async def get_webpage(id): > url = f"https://{server}/{app}/items?id={id}" > # 1 > encoding, content = await url_get(url) > #2 > return content.decode(encoding) > > I would expect that, if I set a context variable at #1, and read it at #2, > then: > > 1. code run as part of url_get would see the value set at 1 > 2. code run as part of url_get could set the value, and I'd see > the new value at 2 > This is consistent with what PEP 550 currently proposes, because you're creating the coroutine and calling it in the same expression: "await url_get(url)". That's the same as what happens for synchronous function calls, which is why we think it's also the right way for coroutines to behave. The slightly more open-to-challenge case is this one: # Point 1 (pre-create) cr = url_get(url) # Point 2 (post-create, pre-call) encoding, content = await cr # Point 3 (post-call) PEP 550 currently says that it doesn't matter whether you change the context at point 1 or point 2, as "get_url" will see the context as it is at the await call (i.e. when it actually gets executed), *not* as it is when the coroutine is created. The suggestion has been made that we should instead be capturing the active context when "url_get(url)" is called, and implicitly switching back to that at the point where await is called. It doesn't seem like a good idea to me, as it breaks the "top to bottom" mental model of code execution (since the "await cr" expression would briefly switch the context back to the one that was in effect on the "cr = url_get(url)" line without even a nested suite to indicate that we may be adjusting the order of code execution). 
It would also cause problems with allowing context changes to propagate out of the "await cr" call, since capturing a context implies forking it, and hence any changes would somehow need to be transplanted back to a potentially divergent context history (if a context change *did* happen at point 2 in the split example). It doesn't matter what form the lines in the function take (loops, > with statements, conditionals, ...) as long as they are run > immediately (class and function definitions should be ignored - > there's no lexical capture of context variables). That probably means > "synchronous call stack" in your terms, but please don't assume that > any implications of that term which aren't covered by the above > example are obvious to me. > I think you got everything, as I really do just mean the stack of frames in the current thread that will show up in a traceback. We normally just call it "the call stack", but that's ambiguous whenever we're also talking about coroutines (since each await chain creates its own distinct asynchronous call stack). > > g = gen() > > with decimal.localcontext() as ctx: > > ctx.prec = 30 > > for i in g: > > pass > > "for i in g" is getting values from the generator, at a time when the > precision is 30, so those values should have precision 30. > > There's no confusion here to me. If that's not what decimal currently > does, I'd happily report that as a bug. > This is the existing behaviour that PEP 550 is recommending we preserve as the default generator semantics, even if decimal (or a comparable context manager) switches to using context vars instead of thread locals. As with coroutines, the question has been asked whether or not the "g = gen()" line should be implicitly capturing the active execution context at that point, and then switching backing it for each iteration of "for i in g:". > The reason I ask that is because there are three "interesting" times in > the > > life of a coroutine or generator: > > > > - definition time (when the def statement runs - this determines the > lexical > > closure) > > - instance creation time (when the generator-iterator or coroutine is > > instantiated) > > - execution time (when the frame actually starts running - this > determines > > the runtime call stack) > > OK. They aren't *really* interesting to me (they are a low-level > detail, but they should work to support intuitive semantics, not to > define what my intuition should be) but I'd say that my expectation is > that the *execution time* value of the context variable is what I'd > expect to get and set. > That's the view PEP 550 currently takes as well. > > For asynchronous operations, there's more of a question, since actual > > execution is deferred until you call await or next() - the original > > synchronous call to the factory function instantiates an object, it > doesn't > > actually *do* anything. > > This isn't particularly a question for me: g = gen() creates an > object. next(g) - or more likely "for o in g" - runs it, and that's > when the context matters. I struggle to understand why anyone would > think otherwise. > If you capture the context eagerly, then there are fewer opportunities to get materially different values from "data = list(iterable)" and "data = iter(context_capturing_iterable)". 
While that's a valid intent for folks to want to be able to express, I personally think it would be more clearly requested via an expression like "data = iter_in_context(iterable)" rather than having it be implicit in the way generators work (especially since having eager context capture be generator-only behaviour would create an odd discrepancy between generators and other iterators like those in itertools). > > When implicit isolation takes place, it's either to keep concurrently > active > > logical call stacks isolated from each other (the event loop case), and > else > > to keep context changes from implicitly leaking *up* a stack (the > generator > > case), not to keep context changes from propagating *down* a call stack. > > I don't understand this. If it matters, in terms of explaining corner > cases of the semantics, then it needs to be explained in more > intuitive terms. If it's an implementation detail of *how* the PEP > ensures it acts intuitively, then I'm fine with not needing to care. > Cases where we expect context changes to be able to propagate into or out of a frame: - when you call something, it can see your context - when something you called returns, you can see changes it made to your context - when a generator-based context manager is suspended Call in the above deliberately covers both sychronous calls (with regular call syntax) and asynchronous calls (with await or yield from). Cases where we *don't* expect context changes to propagate out of a frame: - when you spun up a separate logical thread of execution (either an actual OS thread, or an event loop task) - when a generator-based iterator is suspended > > When we do want to prevent downward propagation for some reason, then > that's > > what "run_in_execution_context" is for: deliberate creation of a new > > concurrently active call stack (similar to running something in another > > thread to isolate the synchronous call stack). > > I read that as "run_in_execution_context is a specialised thing that > you'll never need to use, because you don't understand its purpose - > so just hope that in your code, everything will just work as you > expect without it". The obvious omission here is an explanation of > precisely who my interpretation *doesn't* apply for. Who are the > audience for run_in_execution_context? If it's "people who write > context managers that use context variables" then I'd say that's a > problem, because I'd hope a lot of people would find use for this, and > I wouldn't want them to have to understand the internals to this > level. If it's something like "people who write async context managers > using raw __aenter__ and __aexit__ functions, as opposed to the async > version of @contextmanager", then that's probably fine. > Context managers would be fine (the defaults are deliberately set up to make those "just work", either globally, or in the relevant decorators). However, people who write event loops will need to care about it, as would anyone writing an "iter_in_context" helper function. Folks trying to strictly emulate generator semantics in their own iterators would also need to worry about it, but "revert any context changes before returning from __next__" is a simpler alternative to actually doing that. 
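For concreteness, here is a minimal sketch of what an "iter_in_context" helper could look like. The snapshot/run primitives are spelled here as copy_context() and Context.run(); treat those names as stand-ins for whatever capture-and-run API the PEP ultimately exposes:

    import contextvars

    def iter_in_context(iterable):
        # Snapshot the execution context that is active on this line.
        snapshot = contextvars.copy_context()
        it = iter(iterable)
        while True:
            try:
                # Produce each item with the snapshot active, so any context
                # changes made while producing an item stay confined to the
                # snapshot instead of leaking into the caller's context.
                yield snapshot.run(next, it)
            except StopIteration:
                return

Wrapping an iterator this way ("data = iter_in_context(iterable)") means each next(data) call runs the underlying iterator's step against the captured context, rather than against whatever context happens to be active at iteration time.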
> > Don't get me wrong, I'm not opposed to the idea of making it trivial to > > define "micro tasks" (iterables that perform a context switch to a > specified > > execution context every time they retrieve a new value) that can provide > > easy execution context isolation without an event loop to manage it, I > just > > think that would be more appropriate as a wrapper API that can be placed > > around any iterable, rather than being baked in as an intrinsic property > of > > generators. > > I don't think it matters whether it's trivial to write "micro tasks" > if non-experts don't know what they are ;-) I *do* think it matters if > "micro tasks" are something non-experts might need to write, but not > realise they are straying into deep waters. But I've no way of knowing > how likely that is. > A micro-task is just a fancier name for the "iter_in_context" idea above (save the current context when the iterator is created, switch back to that context every time you're asked for a new value). > One final point, this is all pretty deeply intertwined with the > comprehensibility of async as a whole. At the moment, as I said > before, async is a specialised area that's largely only used in > projects that centre around it. In the same way that Twisted is its > own realm - people write network applications without Twisted, or they > write them using Twisted. Nobody uses Twisted in the middle of some > normal non-async application like pip to handle grabbing a webpage. > I'm uncertain whether the intent is for the core async features to > follow this model, or whether we'd expect in the longer term for > "utility adoption" of async to happen (tactical use of async for > something like web crawling or collecting subprocess output in a > largely non-async app). If that *does* happen, then async needs to be > much more widely understandable - maintenance programmers who have > never used async will start encountering it in corners of their > non-async applications, or find it used under the hood in libraries > that they use. This discussion is a good example of the implications > of that - async quirks leaking out into the "normal" world (decimal > contexts) and as a result the async experts needing to be able to > communicate their concerns and issues to non-experts. > Aye, this is why I'd like the semantics of context variables to be almost indistinguishable from those of thread local variables for synchronous code (aside from avoiding context changes leaking out of generator-iterators when they yield from inside a with statement). PEP 550 currently does a good job of ensuring that, but we'd break that near equivalence if generators were to implicitly capture their creation context. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.f.moore at gmail.com Sat Oct 14 15:47:42 2017 From: p.f.moore at gmail.com (Paul Moore) Date: Sat, 14 Oct 2017 20:47:42 +0100 Subject: [Python-ideas] PEP draft: context variables In-Reply-To: References: <59E0FBBF.3010402@stoneleaf.us> Message-ID: On 14 October 2017 at 17:50, Nick Coghlan wrote: > On 14 October 2017 at 21:56, Paul Moore wrote: > > TL;DR of below: PEP 550 currently gives you what you're after, so your > perspective counts as a preference for "please don't eagerly capture the > creation time context in generators or coroutines". Thank you. That satisfies my concerns pretty well. 
> The suggestion has been made that we should instead be capturing the active > context when "url_get(url)" is called, and implicitly switching back to that > at the point where await is called. It doesn't seem like a good idea to me, > as it breaks the "top to bottom" mental model of code execution (since the > "await cr" expression would briefly switch the context back to the one that > was in effect on the "cr = url_get(url)" line without even a nested suite to > indicate that we may be adjusting the order of code execution). OK. Then I think that's a bad idea - and anyone proposing it probably needs to explain much more clearly why it might be a good idea to jump around in the timeline like that. > If you capture the context eagerly, then there are fewer opportunities to > get materially different values from "data = list(iterable)" and "data = > iter(context_capturing_iterable)". > > While that's a valid intent for folks to want to be able to express, I > personally think it would be more clearly requested via an expression like > "data = iter_in_context(iterable)" rather than having it be implicit in the > way generators work (especially since having eager context capture be > generator-only behaviour would create an odd discrepancy between generators > and other iterators like those in itertools). OK. I understand the point here - but I'm not sure I see the practical use case for iter_in_context. When would something like that be used? Paul From ncoghlan at gmail.com Sun Oct 15 00:39:24 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 15 Oct 2017 14:39:24 +1000 Subject: [Python-ideas] PEP draft: context variables In-Reply-To: References: <59E0FBBF.3010402@stoneleaf.us> Message-ID: On 15 October 2017 at 05:47, Paul Moore wrote: > On 14 October 2017 at 17:50, Nick Coghlan wrote: > > If you capture the context eagerly, then there are fewer opportunities to > > get materially different values from "data = list(iterable)" and "data = > > iter(context_capturing_iterable)". > > > > While that's a valid intent for folks to want to be able to express, I > > personally think it would be more clearly requested via an expression > like > > "data = iter_in_context(iterable)" rather than having it be implicit in > the > > way generators work (especially since having eager context capture be > > generator-only behaviour would create an odd discrepancy between > generators > > and other iterators like those in itertools). > > OK. I understand the point here - but I'm not sure I see the practical > use case for iter_in_context. When would something like that be used? > Suppose you have some existing code that looks like this: results = [calculate_result(a, b) for a, b in data] If calculate_result is context dependent in some way (e.g. a & b might be decimal values), then eager evaluation of "calculate_result(a, b)" will use the context that's in effect on this line for every result. Now, suppose you want to change the code to use lazy evaluation, so that you don't need to bother calculating any results you don't actually use: results = (calculate_result(a, b) for a, b in data) In a PEP 550 world, this refactoring now has a side-effect that goes beyond simply delaying the calculation: since "calculate_result(a, b)" is no longer executed immediately, it will default to using whatever execution context is in effect when it actually does get executed, *not* the one that's in effect on this line. 
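To spell that side-effect out with decimal as the context-dependent piece (a sketch only - calculate_result here is just an assumed stand-in for any function that consults the current decimal context):

    import decimal

    def calculate_result(a, b):
        return decimal.Decimal(a) / decimal.Decimal(b)

    data = [(1, 3), (2, 3)]

    with decimal.localcontext() as ctx:
        ctx.prec = 3
        eager = [calculate_result(a, b) for a, b in data]  # computed now, at prec=3
        lazy = (calculate_result(a, b) for a, b in data)   # nothing computed yet

    print(eager)       # 3-digit results
    print(list(lazy))  # computed here, under the default 28-digit context

This is already how decimal's thread-local context behaves within a single thread today, and the execution-time semantics described above keep that behaviour when the state moves to context variables.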
A context capturing helper for iterators would let you decide whether or not that's what you actually wanted by instead writing: results = iter_in_context(calculate_result(a, b) for a, b in data) Here, "iter_in_context" would indicate explicitly to the reader that whenever another item is taken from this iterator, the execution context is going to be temporarily reset back to the way it was on this line. And since it would be a protocol based iterator-in-iterator-out function, you could wrap it around *any* iterator, not just generator-iterator objects. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Sun Oct 15 01:05:13 2017 From: guido at python.org (Guido van Rossum) Date: Sat, 14 Oct 2017 22:05:13 -0700 Subject: [Python-ideas] PEP draft: context variables In-Reply-To: References: <59E0FBBF.3010402@stoneleaf.us> Message-ID: I would like to reboot this discussion (again). It feels to me we're getting farther and farther from solving any of the problems we might solve. I think we need to give up on doing anything about generators; the use cases point in too many conflicting directions. So we should keep the semantics there, and if you don't want your numeric or decimal context to leak out of a generator, don't put `yield` inside `with`. (Yury and Stefan have both remarked that this is not a problem in practice, given that there are no bug reports or StackOverflow questions about this topic.) Nobody understands async generators, so let's not worry about them. That leaves coroutines (`async def` and `await`). It looks like we don't want to change the original semantics here either, *except* when a framework like asyncio or Twisted has some kind of abstraction for a "task". (I intentionally don't define tasks, but task switches should be explicit, e.g. via `await` or some API -- note that even gevent qualifies, since it only switches when you make a blocking call.) The key things we want then are (a) an interface to get and set context variables whose API is independent from the framework in use (if any), and (b) a way for a framework to decide when context variables are copied, shared or reinitialized. For (a) I like the API from PEP 550: var = contextvars.ContextVar('description') value = var.get() var.set(value) It should be easy to adopt this e.g. in the decimal module instead of the current approach based on thread-local state. For (b) I am leaning towards something simple that emulates thread-local state. Let's define "context" as a mutable mapping whose keys are ContextVar objects, tied to the current thread (each Python thread knows about exactly one context, which is deemed the current context). A framework can decide to clone the current context and assign it to a new task, or initialize a fresh context, etc. The one key feature we want here is that the right thing happens when we switch tasks via `await`, just as the right thing happens when we switch threads. (When a framework uses some other API to switch tasks, the framework do what it pleases.) I don't have a complete design, but I don't want chained lookups, and I don't want to obsess over performance. (I would be fine with some kind of copy-on-write implementation, and switching out the current context should be fast.) I also don't want to obsess over API abstraction. Finally I don't want the design to be closely tied to `with`. Maybe I need to write my own PEP? 
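As a rough illustration of (a) and (b) together (a sketch only - copy_context() and Context.run() are placeholder names for the cloning and task-running hooks, not a settled design):

    import contextvars

    # (a) library code declares and uses a context variable
    request_id = contextvars.ContextVar('request_id', default=None)

    def handler():
        return f"handling request {request_id.get()}"

    # (b) framework code decides when the current context gets cloned for a
    # new task; changes made while the task runs stay in the clone
    def run_task(func):
        ctx = contextvars.copy_context()  # clone the current context
        return ctx.run(func)              # run the task against the clone

    request_id.set(42)
    print(run_task(handler))  # -> handling request 42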
-- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Sun Oct 15 01:13:05 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 15 Oct 2017 15:13:05 +1000 Subject: [Python-ideas] PEP draft: context variables In-Reply-To: References: <59E0FBBF.3010402@stoneleaf.us> Message-ID: On 15 October 2017 at 14:53, M.-A. Lemburg wrote: > On 15.10.2017 06:39, Nick Coghlan wrote: > > On 15 October 2017 at 05:47, Paul Moore > > wrote: > > > > On 14 October 2017 at 17:50, Nick Coghlan > > wrote: > > > If you capture the context eagerly, then there are fewer > opportunities to > > > get materially different values from "data = list(iterable)" and > "data = > > > iter(context_capturing_iterable)". > > > > > > While that's a valid intent for folks to want to be able to > express, I > > > personally think it would be more clearly requested via an > expression like > > > "data = iter_in_context(iterable)" rather than having it be > implicit in the > > > way generators work (especially since having eager context capture > be > > > generator-only behaviour would create an odd discrepancy between > generators > > > and other iterators like those in itertools). > > > > OK. I understand the point here - but I'm not sure I see the > practical > > use case for iter_in_context. When would something like that be used? > > > > > > Suppose you have some existing code that looks like this: > > > > results = [calculate_result(a, b) for a, b in data] > > > > If calculate_result is context dependent in some way (e.g. a & b might > > be decimal values), then eager evaluation of "calculate_result(a, b)" > > will use the context that's in effect on this line for every result. > > > > Now, suppose you want to change the code to use lazy evaluation, so that > > you don't need to bother calculating any results you don't actually use: > > > > results = (calculate_result(a, b) for a, b in data) > > > > In a PEP 550 world, this refactoring now has a side-effect that goes > > beyond simply delaying the calculation: since "calculate_result(a, b)" > > is no longer executed immediately, it will default to using whatever > > execution context is in effect when it actually does get executed, *not* > > the one that's in effect on this line. > > > > A context capturing helper for iterators would let you decide whether or > > not that's what you actually wanted by instead writing: > > > > results = iter_in_context(calculate_result(a, b) for a, b in data) > > > > Here, "iter_in_context" would indicate explicitly to the reader that > > whenever another item is taken from this iterator, the execution context > > is going to be temporarily reset back to the way it was on this line. > > And since it would be a protocol based iterator-in-iterator-out > > function, you could wrap it around *any* iterator, not just > > generator-iterator objects. > > I have a hard time seeing the advantage of having a default > where the context at the time of execution is dependent on > where it happens rather than where it's defined. 
> The underlying rationale is that the generator form should continue to be as close as we can reasonably make it to being pure syntactic sugar for the iterator form: class ResultsIterator: def __init__(self, data): self._itr = iter(data) def __next__(self): return calculate_result(next(self._itr)) results = _ResultsIterator(data) The logical context adjustments in PEP 550 then serve to make using a with statement around a yield expression in a generator closer in meaning to using one around a return statement in a __next__ method implementation. > IMO, the default should be to use the context where the line > was defined in the code, since that matches the intuitive > way of writing and defining code. > This would introduce a major behavioural discrepancy between generators and iterators. > The behavior of also deferring the context to time of > execution should be the non-standard form to not break > this intuition, otherwise debugging will be a pain and > writing fully working code would be really hard in the > face of changing contexts (e.g. say decimal rounding > changes in different parts of the code). > No, it really wouldn't, since "the execution context is the context that's active when the code is executed" is relatively easy to understand based entirely on the way functions, methods, and other forms of delayed execution work (including iterators). "The execution context is the context that's active when the code is executed, *unless* the code is in a generator, in which case, it's the context that was active when the generator-iterator was instantiated" is harder to follow. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia -------------- next part -------------- An HTML attachment was scrubbed... URL: From mal at egenix.com Sun Oct 15 00:53:58 2017 From: mal at egenix.com (M.-A. Lemburg) Date: Sun, 15 Oct 2017 06:53:58 +0200 Subject: [Python-ideas] PEP draft: context variables In-Reply-To: References: <59E0FBBF.3010402@stoneleaf.us> Message-ID: On 15.10.2017 06:39, Nick Coghlan wrote: > On 15 October 2017 at 05:47, Paul Moore > wrote: > > On 14 October 2017 at 17:50, Nick Coghlan > wrote: > > If you capture the context eagerly, then there are fewer opportunities to > > get materially different values from "data = list(iterable)" and "data = > > iter(context_capturing_iterable)". > > > > While that's a valid intent for folks to want to be able to express, I > > personally think it would be more clearly requested via an expression like > > "data = iter_in_context(iterable)" rather than having it be implicit in the > > way generators work (especially since having eager context capture be > > generator-only behaviour would create an odd discrepancy between generators > > and other iterators like those in itertools). > > OK. I understand the point here - but I'm not sure I see the practical > use case for iter_in_context. When would something like that be used? > > > Suppose you have some existing code that looks like this: > > results = [calculate_result(a, b) for a, b in data] > > If calculate_result is context dependent in some way (e.g. a & b might > be decimal values), then eager evaluation of "calculate_result(a, b)" > will use the context that's in effect on this line for every result. 
> > Now, suppose you want to change the code to use lazy evaluation, so that > you don't need to bother calculating any results you don't actually use: > > results = (calculate_result(a, b) for a, b in data) > > In a PEP 550 world, this refactoring now has a side-effect that goes > beyond simply delaying the calculation: since "calculate_result(a, b)" > is no longer executed immediately, it will default to using whatever > execution context is in effect when it actually does get executed, *not* > the one that's in effect on this line. > > A context capturing helper for iterators would let you decide whether or > not that's what you actually wanted by instead writing: > > results = iter_in_context(calculate_result(a, b) for a, b in data) > > Here, "iter_in_context" would indicate explicitly to the reader that > whenever another item is taken from this iterator, the execution context > is going to be temporarily reset back to the way it was on this line. > And since it would be a protocol based iterator-in-iterator-out > function, you could wrap it around *any* iterator, not just > generator-iterator objects. I have a hard time seeing the advantage of having a default where the context at the time of execution is dependent on where it happens rather than where it's defined. IMO, the default should be to use the context where the line was defined in the code, since that matches the intuitive way of writing and defining code. The behavior of also deferring the context to time of execution should be the non-standard form to not break this intuition, otherwise debugging will be a pain and writing fully working code would be really hard in the face of changing contexts (e.g. say decimal rounding changes in different parts of the code). -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Experts (#1, Oct 15 2017) >>> Python Projects, Coaching and Consulting ... http://www.egenix.com/ >>> Python Database Interfaces ... http://products.egenix.com/ >>> Plone/Zope Database Interfaces ... http://zope.egenix.com/ ________________________________________________________________________ ::: We implement business ideas - efficiently in both time and costs ::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ http://www.malemburg.com/ From ncoghlan at gmail.com Sun Oct 15 01:43:00 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 15 Oct 2017 15:43:00 +1000 Subject: [Python-ideas] PEP draft: context variables In-Reply-To: References: <59E0FBBF.3010402@stoneleaf.us> Message-ID: On 15 October 2017 at 15:05, Guido van Rossum wrote: > I would like to reboot this discussion (again). It feels to me we're > getting farther and farther from solving any of the problems we might solve. > > I think we need to give up on doing anything about generators; the use > cases point in too many conflicting directions. So we should keep the > semantics there, and if you don't want your numeric or decimal context to > leak out of a generator, don't put `yield` inside `with`. (Yury and Stefan > have both remarked that this is not a problem in practice, given that there > are no bug reports or StackOverflow questions about this topic.) > Let me have another go at building up the PEP 550 generator argument from first principles. 
The behaviour that PEP 550 says *shouldn't* change is the semantic equivalence of the following code:

    # Iterator form
    class ResultsIterator:
        def __init__(self, data):
            self._itr = iter(data)
        def __next__(self):
            return calculate_result(next(self._itr))

    results = ResultsIterator(data)

    # Generator form
    def _results_gen(data):
        for item in data:
            yield calculate_result(item)

    results = _results_gen(data)

This *had* been non-controversial until recently, and I still don't understand why folks suddenly decided we should bring it into question by proposing that generators should start implicitly capturing state at creation time just because it's technically possible for them to do so (yes we can implicitly change the way all generators work, but no, we can't implicitly change the way all *iterators* work).

The behaviour that PEP 550 thinks *should* change is for the following code to become roughly semantically equivalent, given the constraint that the context manager involved either doesn't manipulate any shared state at all (already supported), or else only manipulates context variables (the new part that PEP 550 adds):

    # Iterator form
    class ResultsIterator:
        def __init__(self, data):
            self._itr = iter(data)
        def __next__(self):
            with adjusted_context():
                return calculate_result(next(self._itr))

    results = ResultsIterator(data)

    # Generator form
    def _results_gen(data):
        for item in data:
            with adjusted_context():
                yield calculate_result(item)

    results = _results_gen(data)

Today, while these two forms look like they *should* be comparable, they're not especially close to being semantically equivalent, as there's no mechanism that allows for implicit context reversion at the yield point in the generator form.

While I think PEP 550 would still be usable without fixing this discrepancy, I'd be thoroughly disappointed if the only reason we decided not to do it was because we couldn't clearly articulate the difference in reasoning between:

* "Generators currently have no way to reasonably express the equivalent of having a context-dependent return statement inside a with statement in a __next__ method implementation, so let's define one" (aka "context variable changes shouldn't leak out of generators, as that will make them *more* like explicit iterator __next__ methods"); and

* "Generator functions should otherwise continue to be unsurprising syntactic sugar for objects that implement the regular iterator protocol" (aka "generators shouldn't implicitly capture their creation context, as that would make them *less* like explicit iterator __init__ methods").

Cheers, Nick.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From njs at pobox.com Sun Oct 15 01:49:10 2017
From: njs at pobox.com (Nathaniel Smith)
Date: Sat, 14 Oct 2017 22:49:10 -0700
Subject: [Python-ideas] PEP draft: context variables
In-Reply-To:
References: <59E0FBBF.3010402@stoneleaf.us>
Message-ID:

On Sat, Oct 14, 2017 at 9:53 PM, M.-A. Lemburg wrote:
> I have a hard time seeing the advantage of having a default
> where the context at the time of execution is dependent on
> where it happens rather than where it's defined.
>
> IMO, the default should be to use the context where the line
> was defined in the code, since that matches the intuitive
> way of writing and defining code.

Of course, that's already the default: it's how regular variables and function arguments work.
The reason we have forms like 'with decimal.localcontext', 'with numpy.errstate' is to handle the case where you want the context value to be determined by the runtime context when it's accessed rather than the static context where it's accessed. That's literally the whole point. It's not like this is a new and weird concept in Python either -- e.g. when you raise an exception, the relevant 'except' block is determined based on where the 'raise' happens (the runtime stack), not where the 'raise' was written: try: def foo(): raise RuntimeError except RuntimeError: print("this is not going to execute, because Python doesn't work that way") foo() -n -- Nathaniel J. Smith -- https://vorpus.org From ncoghlan at gmail.com Sun Oct 15 03:29:34 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 15 Oct 2017 17:29:34 +1000 Subject: [Python-ideas] PEP draft: context variables In-Reply-To: References: <59E0FBBF.3010402@stoneleaf.us> Message-ID: On 15 October 2017 at 15:49, Nathaniel Smith wrote: > It's not like this is a new and weird concept in Python either -- e.g. > when you raise an exception, the relevant 'except' block is determined > based on where the 'raise' happens (the runtime stack), not where the > 'raise' was written: > > try: > def foo(): > raise RuntimeError > except RuntimeError: > print("this is not going to execute, because Python doesn't work that > way") > foo() > Exactly - this is a better formulation of what I was trying to get at when I said that we want the semantics of context variables in synchronous code to reliably align with the semantics of the synchronous call stack as it appears in an exception traceback. Attempting a pithy summary of PEP 550's related semantics for use in explanations to folks that don't care about all the fine details: The currently active execution context aligns with the expected flow of exception handling for any exceptions raised in the code being executed. And with a bit more detail: * If the code in question will see the exceptions your code raises, then your code will also be able to see the context variables that it defined or set * By default, this relationship is symmetrical, such that if your code will see the exceptions that other code raises as a regular Python exception, then you will also see the context changes that that code makes. * However, APIs and language features that enable concurrent code execution within a single operating system level thread (like event loops, coroutines and generators) may break that symmetry to avoid context variable management conflicts between concurrently executing code. This is the key behavioural difference between context variables (which enable this by design) and thread local variables (which don't). * Pretty much everything else in the PEP 550 API design is a lower level performance optimisation detail to make management of this dynamic state sharing efficient in event-driven code Even PEP 550's proposal for how yield would work aligns with that "the currently active execution context is the inverse of how exceptions will flow" notion: the idea there is that if a context manager's __exit__ method wouldn't see an exception raised by a piece of code, then that piece of code also shouldn't be able to see any context variable changes made by that context manager's __enter__ method (since the changes may not get reverted correctly on failure in that case). 
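How the exceptions actually flow in the two generator cases can be checked directly today, without any PEP 550 machinery involved (a sketch; decimal.localcontext() is used purely as a familiar stand-in for a context manager that adjusts implicit state):

    import contextlib
    import decimal

    def values():
        with decimal.localcontext() as ctx:  # a with statement around a yield
            ctx.prec = 5
            yield decimal.Decimal(1) / 3

    for value in values():
        raise RuntimeError("never thrown back into the with block in values()")

The RuntimeError raised in the loop body propagates in the caller's frame; the suspended generator is simply closed later, so the with block inside values() never sees it. A generator used through contextlib.contextmanager behaves differently:

    @contextlib.contextmanager
    def local_precision(prec):
        with decimal.localcontext() as ctx:
            ctx.prec = prec
            yield

    with local_precision(5):
        raise RuntimeError("this one *is* thrown back in at the yield point")

There the with statement's __exit__ throws the exception back into the generator at its yield point, which is why the two cases get the different treatment described below.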
Exceptions raised in a for loop body *don't* typically get thrown back into the body of the generator-iterator, so generator-iterators' context variable changes should be reverted at their yield points. By contrast, exceptions raised in a with statement body *do* get thrown back into the body of a generator decorated with contextlib.contextmanager, so those context variable changes should *not* be reverted at yield points, and instead left for __exit__ to handle. Similarly, coroutines are in the exception handling path for the other coroutines they call (just like regular functions), so those coroutines should share an execution context rather than each having their own. All of that leads to it being specifically APIs that already need to do special things to account for exception handling flows within a single thread (e.g. asyncio.gather, asyncio.ensure_future, contextlib.contextmanager) that are likely to have to put some thought into how they will impact the active execution context. Code for which the existing language level exception handling semantics already work just fine should then also be able to rely on the default execution context management semantics. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefan_ml at behnel.de Sun Oct 15 04:29:15 2017 From: stefan_ml at behnel.de (Stefan Behnel) Date: Sun, 15 Oct 2017 10:29:15 +0200 Subject: [Python-ideas] Add time.time_ns(): system clock with nanosecond resolution In-Reply-To: References: <20171014102122.7d718d93@fsol> Message-ID: Nick Coghlan schrieb am 14.10.2017 um 17:46: > On 14 October 2017 at 18:21, Antoine Pitrou wrote: >> On Sat, 14 Oct 2017 10:49:11 +0300 >> Serhiy Storchaka wrote: >>> I don't like the idea of adding a parallel set of functions. >>> >>> In the list of alternatives in PEP 410 there is no an idea about fixed >>> precision float type with nanoseconds precision. It can be implemented >>> internally as a 64-bit integer, but provide all methods required for >>> float-compatible number. It would be simpler and faster than general >>> Decimal. >> >> I agree a parallel set of functions is not ideal, but I think a parallel >> set of functions is still more appropriate than a new number type >> specific to the time module. >> >> Also, if you change existing functions to return a new type, you risk >> breaking compatibility even if you are very careful about designing the >> new type. >> > > Might it make more sense to have a parallel *module* that works with a > different base data type rather than parallel functions within the existing > API? > > That is, if folks wanted to switch to 64-bit nanosecond time, they would > use: > > * time_ns.time() > * time_ns.monotonic() > * time_ns.perf_counter() > * time_ns.clock_gettime() > * time_ns.clock_settime() > > The idea here would be akin to the fact we have both math and cmath as > modules, where the common APIs conceptually implement the same algorithms, > they just work with a different numeric type (floats vs complex numbers). I thought of that, too. People are used to rename things on import, so this would provide a very easy way for them to switch. OTOH, I would guess that "from time import time" covers more than 90% of the use cases of the time module and it doesn't really matter if we make people change the first or the second part of that import statement... 
But the real point here is that the data type which the current time module deals with is really (semantically) different from what is proposed now. All functionality in the time module assumes to work with "seconds", and accepts fractions of seconds for better precision. But the common semantic ground is "seconds". That suggests to me that "nanoseconds" really fits best into a new module which clearly separates the semantics of the two data types. (That being said, I'm a big fan of fractions, so I wonder if a Fraction with a constant nano denominator wouldn't fit in here...) Stefan From victor.stinner at gmail.com Sun Oct 15 05:45:31 2017 From: victor.stinner at gmail.com (Victor Stinner) Date: Sun, 15 Oct 2017 11:45:31 +0200 Subject: [Python-ideas] Add time.time_ns(): system clock with nanosecond resolution In-Reply-To: References: <20171014102122.7d718d93@fsol> Message-ID: > (That being said, I'm a big fan of fractions, so I wonder if a Fraction with a constant nano denominator wouldn't fit in here...) It was discussed in depth in PEP 410, and the PEP was rejected. Guido voted for nanoseconds as int, when os.stat_result.st_mtime_ns was added. Victor -------------- next part -------------- An HTML attachment was scrubbed... URL: From victor.stinner at gmail.com Sun Oct 15 05:59:57 2017 From: victor.stinner at gmail.com (Victor Stinner) Date: Sun, 15 Oct 2017 11:59:57 +0200 Subject: [Python-ideas] Add time.time_ns(): system clock with nanosecond resolution In-Reply-To: References: <20171014102122.7d718d93@fsol> Message-ID: > Might it make more sense to have a parallel *module* that works with a different base data type rather than parallel functions within the existing API? I asked about adding new functions to 4 different modules: os, resource, signal, time. For example, I dislike the idea of having os and os_ns modules. We already have os.stat() which returns time as seconds and nanoseconds (both at the same time). There is also os.utime() which accepts time as seconds *or* nanoseconds: os.utime (path, times=seconds) or os.utime(path, ns=nanoseconds). If we had a time_ns module, would it only contain 4 clocks or does it have to duplicate the full API? If yes, it is likely to be a mess to maintain them. How will user choose between time and time_ns? What if tomorrow clocks get picosecond resolution? (CPU TSC also has sub-nanosecond resolution, but OS API uses timespec, 1 ns res.) Add a third module? I prefer to leave all "time functions" in the "time module". For example, I don't think that we need to add time.nanosleep() or time.sleep_ns(), since the precision loss starts after a sleep of 104 days. Who cares of 1 nanosecond after a sleep of 104 days? -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.f.moore at gmail.com Sun Oct 15 06:31:29 2017 From: p.f.moore at gmail.com (Paul Moore) Date: Sun, 15 Oct 2017 11:31:29 +0100 Subject: [Python-ideas] PEP draft: context variables In-Reply-To: References: <59E0FBBF.3010402@stoneleaf.us> Message-ID: On 15 October 2017 at 05:39, Nick Coghlan wrote: > On 15 October 2017 at 05:47, Paul Moore wrote: >> >> On 14 October 2017 at 17:50, Nick Coghlan wrote: >> > If you capture the context eagerly, then there are fewer opportunities >> > to >> > get materially different values from "data = list(iterable)" and "data = >> > iter(context_capturing_iterable)". 
>> > >> > While that's a valid intent for folks to want to be able to express, I >> > personally think it would be more clearly requested via an expression >> > like >> > "data = iter_in_context(iterable)" rather than having it be implicit in >> > the >> > way generators work (especially since having eager context capture be >> > generator-only behaviour would create an odd discrepancy between >> > generators >> > and other iterators like those in itertools). >> >> OK. I understand the point here - but I'm not sure I see the practical >> use case for iter_in_context. When would something like that be used? > > > Suppose you have some existing code that looks like this: > > results = [calculate_result(a, b) for a, b in data] > > If calculate_result is context dependent in some way (e.g. a & b might be > decimal values), then eager evaluation of "calculate_result(a, b)" will use > the context that's in effect on this line for every result. > > Now, suppose you want to change the code to use lazy evaluation, so that you > don't need to bother calculating any results you don't actually use: > > results = (calculate_result(a, b) for a, b in data) > > In a PEP 550 world, this refactoring now has a side-effect that goes beyond > simply delaying the calculation: since "calculate_result(a, b)" is no longer > executed immediately, it will default to using whatever execution context is > in effect when it actually does get executed, *not* the one that's in effect > on this line. > > A context capturing helper for iterators would let you decide whether or not > that's what you actually wanted by instead writing: > > results = iter_in_context(calculate_result(a, b) for a, b in data) > > Here, "iter_in_context" would indicate explicitly to the reader that > whenever another item is taken from this iterator, the execution context is > going to be temporarily reset back to the way it was on this line. And since > it would be a protocol based iterator-in-iterator-out function, you could > wrap it around *any* iterator, not just generator-iterator objects. OK, got it. That sounds to me like a candidate for a stdlib function (either because it's seen as a common requirement, or because it's tricky to get right - or both). The PEP doesn't include it, as far as I can see, though. But I do agree with MAL, it seems wrong to need a helper for this, even though it's a logical consequence of the other semantics I described as intuitive :-( Paul From p.f.moore at gmail.com Sun Oct 15 06:45:55 2017 From: p.f.moore at gmail.com (Paul Moore) Date: Sun, 15 Oct 2017 11:45:55 +0100 Subject: [Python-ideas] PEP draft: context variables In-Reply-To: References: <59E0FBBF.3010402@stoneleaf.us> Message-ID: On 15 October 2017 at 06:43, Nick Coghlan wrote: > On 15 October 2017 at 15:05, Guido van Rossum wrote: >> >> I would like to reboot this discussion (again). It feels to me we're >> getting farther and farther from solving any of the problems we might solve. >> >> I think we need to give up on doing anything about generators; the use >> cases point in too many conflicting directions. So we should keep the >> semantics there, and if you don't want your numeric or decimal context to >> leak out of a generator, don't put `yield` inside `with`. (Yury and Stefan >> have both remarked that this is not a problem in practice, given that there >> are no bug reports or StackOverflow questions about this topic.) > > > Let me have another go at building up the PEP 550 generator argument from > first principles. 
> > The behaviour that PEP 550 says *shouldn't* change is the semantic > equivalence of the following code: > > # Iterator form > class ResultsIterator: > def __init__(self, data): > self._itr = iter(data) > def __next__(self): > return calculate_result(next(self._itr)) > > results = _ResultsIterator(data) > > # Generator form > def _results_gen(data): > for item in data: > yield calculate_result(item) > > results = _results_gen(data) > > This *had* been non-controversial until recently, and I still don't > understand why folks suddenly decided we should bring it into question by > proposing that generators should start implicitly capturing state at > creation time just because it's technically possible for them to do so (yes > we can implicitly change the way all generators work, but no, we can't > implicitly change the way all *iterators* work). This is non-controversial to me. > The behaviour that PEP 550 thinks *should* change is for the following code > to become roughly semantically equivalent, given the constraint that the > context manager involved either doesn't manipulate any shared state at all > (already supported), or else only manipulates context variables (the new > part that PEP 550 adds): > > # Iterator form > class ResultsIterator: > def __init__(self, data): > self._itr = iter(data) > def __next__(self): > with adjusted_context(): > return calculate_result(next(self._itr)) > > results = _ResultsIterator(data) > > # Generator form > def _results_gen(data): > for item in data: > with adjusted_context(): > yield calculate_result(item) > > results = _results_gen(data) > > Today, while these two forms look like they *should* be comparable, they're > not especially close to being semantically equivalent, as there's no > mechanism that allows for implicit context reversion at the yield point in > the generator form. I'll have to take your word for this, as I can't think of an actual example that follows the pattern of your abstract description, for which I can immediately see the difference. In the absence of being able to understand why the difference matters in current code, I have no view on whether PEP 550 needs to "fix" this issue. > While I think PEP 550 would still be usable without fixing this discrepancy, > I'd be thoroughly disappointed if the only reason we decided not to do it > was because we couldn't clearly articulate the difference in reasoning > between: > > * "Generators currently have no way to reasonably express the equivalent of > having a context-dependent return statement inside a with statement in a > __next__ method implementation, so let's define one" (aka "context variable > changes shouldn't leak out of generators, as that will make them *more* like > explicit iterator __next__ methods"); and > * "Generator functions should otherwise continue to be unsurprising > syntactic sugar for objects that implement the regular iterator protocol" > (aka "generators shouldn't implicitly capture their creation context, as > that would make them *less* like explicit iterator __init__ methods"). I think that if we can't describe the problem that makes it obvious to the average Python user, then that implies it's a corner case that's irrelevant to said average Python user - and so I'd consider fixing it to be low priority. Specifically, a lot lower priority than providing a context variable facility - which while still not a *common* need, at least resonates with the average user in the sense of "I can imagine writing code that needed context like Decimal does". 
(And apologies for presenting an imagined viewpoint as what "the average user" might think...) Paul From stefan at bytereef.org Sun Oct 15 07:18:44 2017 From: stefan at bytereef.org (Stefan Krah) Date: Sun, 15 Oct 2017 13:18:44 +0200 Subject: [Python-ideas] PEP draft: context variables In-Reply-To: References: <59E0FBBF.3010402@stoneleaf.us> Message-ID: <20171015111844.GA2730@bytereef.org> On Sun, Oct 15, 2017 at 06:53:58AM +0200, M.-A. Lemburg wrote: > I have a hard time seeing the advantage of having a default > where the context at the time of execution is dependent on > where it happens rather than where it's defined. > > IMO, the default should be to use the context where the line > was defined in the code, since that matches the intuitive > way of writing and defining code. > > The behavior of also deferring the context to time of > execution should be the non-standard form to not break > this intuition, otherwise debugging will be a pain and > writing fully working code would be really hard in the > face of changing contexts (e.g. say decimal rounding > changes in different parts of the code). It would be a major change, but I also think lexical scoping would work best for the decimal context. Isolation of modules and libraries (which IMO is a bigger problem than the often cited generator issues) would be solved. It would probably not work best (or even at all) for the async call chain use case. Stefan Krah From p.f.moore at gmail.com Sun Oct 15 08:15:34 2017 From: p.f.moore at gmail.com (Paul Moore) Date: Sun, 15 Oct 2017 13:15:34 +0100 Subject: [Python-ideas] PEP draft: context variables In-Reply-To: References: <59E0FBBF.3010402@stoneleaf.us> Message-ID: On 13 October 2017 at 23:30, Yury Selivanov wrote: > At this point of time, there's just one place which describes one well > defined semantics: PEP 550 latest version. Paul, if you have > time/interest, please take a look at it, and say what's confusing > there. Hi Yury, The following is my impressions from a read-through of the initial part of the PEP. tl; dr - you say "concurrent" too much and it makes my head hurt :-) 1. The abstract feels like it's talking about async. The phrase "consistent access to non-local state in the context of out-of-order execution, such as in Python generators and coroutines" said async to me, even though it mentioned generators. Probably because any time I see generators mentioned alongside coroutines (a term I really don't grasp yet in the context of Python) I immediately assume the reference is to the weird extensions of generators when send() and yield expressions are used. It quite genuinely took me two or three attempts to get past the abstract and actually read the next section, because the "this is async" idea came across so strongly. 2. The rationale says that "Prior to the advent of asynchronous programming in Python" threads and TLS were used - and it implies this was fine. But the section goes on to say "TLS does not work well for programs which execute concurrently in a single thread". But it uses a *generator* as the example. I'm sorry, but to me a generator is pure and simple standard Python, and definitely not "executing concurrently in a single thread" (see below). So again, the clash between what the description said and the actual example left me confused (and confused enough to equate all of this in my mind with "all that async stuff I don't follow"). 3. 
"This is because implicit Decimal context is stored as a thread-local, so concurrent iteration of the fractions() generator would corrupt the state." This makes no sense to me. The example isn't concurrent. There's only one thread, and no async. So no concurrency. It's interleaved iteration through two generators, which I understand is *technically* considered concurrency in the async sense, but doesn't *feel* like concurrency. At its core, this is the problem I'm hitting throughout the whole document - the conceptual clash between examples that don't feel concurrent, and discussions that talk almost totally in terms of concurrency, means that understanding every section is a significant mental effort. 4. By the end of the rationale, what I'd got from the document was: "There's a bug in decimal.context, caused by the fact that it uses TLS. It's basically a limitation of TLS. To fix it they need a new mechanism, which this PEP provides." So unless I'm using (or would expect to use) TLS in my own code, this doesn't affect me. Which really isn't the point (if I now understand correctly) - the PEP is actually providing a safe (and hopefully easy to use/understand!) mechanism for handling a specific class of programming problem, maintaining dynamic state that follows the execution order of the code, rather than the lexical structure. (I didn't state that well - but I hope I got the idea across) Basically, the problem that Lisp dynamic variables are designed to solve (although I don't think that describing the feature in terms of Lisp is a good idea either). 4a. I'd much prefer this part of the PEP to be structured as follows: * There's a class of programming problems that need to allow code to access "state" in a way that follows the runtime path the code takes. Prior art in this area include Lisp's dynamic scope, ... (more examples would be good - IIRC, Perl has this type of variable too). * Normal local variables can't do this as they are lexically scoped. Global variables can be used, but they don't work in the presence of threads. * TLS work for threads, but hit problems when code execution paths aren't nested subroutine-style. Examples where this happens are generators (which suspend execution and yield back to their parent), and async (which simulates multiple threads by interleaving execution of generators). [Note - I know this explanation is probably inaccurate] * This PEP proposes a general mechanism that will allow programmers to simply write code that manages state like this, which will work in all of the above cases. That's it. Barely any mention of async, no need to focus on the Decimal bug except as a motivating example of why TLS isn't sufficient, and so no risk that people think "why not just fix decimal.context" - so no need to go into detail as to why you can't "just fix it". And it frames the PEP as providing a new piece of functionality that *anyone* might find a use for, rather than as a fix for a corner case of async/TLS interaction. 5. The "Goals" section says "provide a more reliable threading.local() alternative" which is fine. But the bullet points do exactly the same as before, using terms that I associate with async to describe the benefits, and so they aren't compelling to me. 
I'd say something like: * Is a reliable replacement for TLS that doesn't have the issue that was described in the rationale * Is closely modeled on the TLS API, to minimise the impact of switching on code that currently uses TLS * Performance yada yada yada - I don't think this is important, there's been no mention yet that any of this is performance critical (but see 4a above, this could probably be improved further if the rationale were structured the way I propose there). 6. The high level spec, under generators, says: """ Unlike regular function calls, generators can cooperatively yield their control of execution to the caller. Furthermore, a generator does not control where the execution would continue after it yields. It may be resumed from an arbitrary code location. """ That's not how I understand generators. To me, a generator can *suspend its execution* to be resumed later. On suspension, control *always* returns to the caller. Generators can be resumed from anywhere, although the most common pattern is to resume them repeatedly in a loop. To me, this implies that context variables should follow that execution path. If the caller sets a value, the generator sees it. If the generator sets a value then yields, the caller will see that. If code changes the value between two resumptions of the generator, the generator will see those changes. The PEP at this point, though, states the behaviour of context variables in a way that I simply don't follow - it's using the idea of an "outer context" - which as far as I can see, has never been defined at this point (and doesn't have any obvious meaning in terms of the execution flow, which is not nested in any obvious sense - that's the *point* of generators, to not have a purely nested execution path). The problem with the decimal context isn't about any of that - it's about how "yield" interacts with "with", and specifically that yielding out of the with *doesn't* run the exit part of the context manager, as the code inside the with statement hasn't finished running yet. Having stated the problem like that, I'm wondering why the solution isn't to add some sort of "suspend/resume" mechanism to the context manager protocol, rather than introducing context variables? That may be worth adding to the "Rejected ideas" section if it's not a viable solution. The next section of the high level spec is coroutines and async, which I'll skip, as I firmly believe that as I don't use them, if there's anything of relevance to me in that section, it should be moved to somewhere that isn't about async. I'm not going to comment on anything further. At this point, I'm far too overwhelmed with concepts and ideas that are at odds with my understanding of the problem to really take in detail-level information. I'd assume that the detail is about how the overall functionality as described is implemented, but as I don't really have a working mental model of the high-level functionality, I doubt I'd get much from the detail. I hope this is of some use. I appreciate I'm talking about a pretty wholesale rewrite, and it's quite far down the line to be suggesting such a thing. I'll understand if you don't feel it's worthwhile to take that route. Paul From amit.mixie at gmail.com Sun Oct 15 08:51:56 2017 From: amit.mixie at gmail.com (Amit Green) Date: Sun, 15 Oct 2017 08:51:56 -0400 Subject: [Python-ideas] PEP draft: context variables In-Reply-To: References: <59E0FBBF.3010402@stoneleaf.us> Message-ID: Once again, I think Paul Moore gets to the heart of the issue. 
Generators are simply confusing & async even more so. Per my earlier email, the fact that generators look like functions, but are not functions, is at the root of the confusion. This next part really gets to the heart of the matter: On Sun, Oct 15, 2017 at 8:15 AM, Paul Moore wrote: > To me, this implies that context variables should follow that > execution path. If the caller sets a value, the generator sees it. If > the generator sets a value then yields, the caller will see that. If > code changes the value between two resumptions of the generator, the > generator will see those changes. The PEP at this point, though, > states the behaviour of context variables in a way that I simply don't > follow - it's using the idea of an "outer context" - which as far as I > can see, has never been defined at this point > This is the totally natural way to think of generators -- and exactly how I thought about them when I started -- and how I suspect 99% of beginners think of them: - And exactly what you expect since generators appear to be functions (since they use 'def' to create them). Now, as I understand it, its not what really happens, in fact, they can have their own context, which the major discussion here is all about: 1. Do we "bind" the context at the time you create the generator (which appears to call the generator, but really doesn't)?; OR 2. Do we "bind" the context at the time the first .__next__ method is called? And, as far as I can figure out, people are strongly arguing for #2 so it doesn't break backwards compatibility: - And then discussion of using wrappers to request #1 instead of #2 My answer is, you can't do #1 or #2 -- you need to do #3, as the default, -- what Paul write above -- anything else is "unnatural" and "non-intuitive". Now, I fully understand we *actually* want the unnatural behavior of #1 & #2 in real code (and #3 is not really that useful in real code). However #3 is the natural understanding of what it does ... so that what I am arguing needs to be implemented (as the default). Then when we want either #1 or #2, when we are writing real code, -- there should be special syntax to indicate that is what we want (and yes, we'll have to use the special syntax 95%+ of the time since we really want #1 or #2 95%+ of the time; and don't want #3). But that is the whole point, we *should* use special syntax to indicate we are doing something that is non-intuitive. This special syntax helps beginners understand better & will help them think about the concepts more clearly (See previous post be me on adding first class language to defining generators, so its a lot clearer what is going on with then). My argument, is a major strength of python, is how the syntax helps teach you concepts & how easy the language is to pick up. I've talked to so many people who have said the same thing about the language when they started. Generators (and this whole discussion of context variables) is not properly following that paradigm; and I believe it should. It would make python even stronger as a programming language of choice, that not only is easy to use, but easy to learn from as you start programming. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From p.f.moore at gmail.com Sun Oct 15 09:33:58 2017 From: p.f.moore at gmail.com (Paul Moore) Date: Sun, 15 Oct 2017 14:33:58 +0100 Subject: [Python-ideas] PEP draft: context variables In-Reply-To: References: <59E0FBBF.3010402@stoneleaf.us> Message-ID: On 15 October 2017 at 13:51, Amit Green wrote: > Once again, I think Paul Moore gets to the heart of the issue. > > Generators are simply confusing & async even more so. > > Per my earlier email, the fact that generators look like functions, but are > not functions, is at the root of the confusion. I don't agree. I don't find generators *at all* confusing. They are a very natural way of expressing things, as has been proven by how popular they are in the Python community. I don't *personally* understand async, but I'm currently willing to reserve judgement until I've been in a situation where it would be useful, and therefore needed to learn it. Paul From k7hoven at gmail.com Sun Oct 15 09:44:52 2017 From: k7hoven at gmail.com (Koos Zevenhoven) Date: Sun, 15 Oct 2017 16:44:52 +0300 Subject: [Python-ideas] (PEP 555 subtopic) Propagation of context in async code In-Reply-To: References: Message-ID: Let me respond to my own email. I'm sorry I wrote a long email, but I figured I'll have to take the time to write this down carefully (and even in a new thread with a clear title) so that people would know what the discussion was about. Probably I could have done better structuring that email, but I seriously ran out of time. This is directly related to how "a normal user" writing async code would be affected by the semantics of this (context arguments/variables). It's also related to the semantics of contexts combined with normal generator functions, partly because the situation is somewhat similar, and partly because we might want the same basic rules to apply in both situations. (Side note: This also has to do with more obscure cases like multiple different async frameworks in the same process (or in the same program, or perhaps the same server, or even larger ? ? whatever the constraints are). Any of the context propagation and isolation/leakage semantics I have described (or that I recall anyone else describing) could be implemented in the PEP 555 approach without any problems. The difference is just an if statement branch or two in C code. So, see below for some more discussion between (it would be useful if some people could reply to this email and say if and why they agree or disagree with something below -- also non-experts that roughly understand what I'm talking about): On Fri, Oct 13, 2017 at 6:49 PM, Koos Zevenhoven wrote: > This is a continuation of the PEP 555 discussion in > > https://mail.python.org/pipermail/python-ideas/2017-September/046916.html > > And this month in > > https://mail.python.org/pipermail/python-ideas/2017-October/047279.html > > If you are new to the discussion, the best point to start reading this > might be at my second full paragraph below ("The status quo..."). > > On Fri, Oct 13, 2017 at 10:25 AM, Nick Coghlan wrote: > >> On 13 October 2017 at 10:56, Guido van Rossum wrote: >> >>> I'm out of energy to debate every point (Steve said it well -- that >>> decimal/generator example is too contrived), but I found one nit in Nick's >>> email that I wish to correct. >>> >>> On Wed, Oct 11, 2017 at 1:28 AM, Nick Coghlan >>> wrote: >>>> >>>> As a less-contrived example, consider context managers implemented as >>>> generators. 
>>>> >>>> We want those to run with the execution context that's active when >>>> they're used in a with statement, not the one that's active when they're >>>> created (the fact that generator-based context managers can only be used >>>> once mitigates the risk of creation time context capture causing problems, >>>> but the implications would still be weird enough to be worth avoiding). >>>> >>> >>> Here I think we're in agreement about the desired semantics, but IMO all >>> this requires is some special casing for @contextlib.contextmanager. To me >>> this is the exception, not the rule -- in most *other* places I would want >>> the yield to switch away from the caller's context. >>> >>> >>>> For native coroutines, we want them to run with the execution context >>>> that's active when they're awaited or when they're prepared for submission >>>> to an event loop, not the one that's active when they're created. >>>> >>> >>> This caught my eye as wrong. Considering that asyncio's tasks (as well >>> as curio's and trio's) *are* native coroutines, we want complete isolation >>> between the context active when `await` is called and the context active >>> inside the `async def` function. >>> >> >> The rationale for this behaviour *does* arise from a refactoring argument: >> >> async def original_async_function(): >> with some_context(): >> do_some_setup() >> raw_data = await some_operation() >> data = do_some_postprocessing(raw_data) >> >> Refactored: >> >> async def async_helper_function(): >> do_some_setup() >> raw_data = await some_operation() >> return do_some_postprocessing(raw_data) >> >> async def refactored_async_function(): >> with some_context(): >> data = await async_helper_function() >> >> > ?*This* type of refactoring argument I *do* subscribe to.? > > >> However, considering that coroutines are almost always instantiated at >> the point where they're awaited, I do concede that creation time context >> capture would likely also work out OK for the coroutine case, which would >> leave contextlib.contextmanager as the only special case (and it would turn >> off both creation-time context capture *and* context isolation). >> > > ?The difference between context propagation through coroutine function > calls and awaits comes up when you need help from "the" event loop, which > means things like creating new tasks from coroutines. However, we cannot > even assume that the loop is the only one. So far, it makes no difference > where you call the coroutine function. It is only when you await it or > schedule it for execution in a loop when something can actually happen. > > The status quo is that there's nothing that prevents you from calling a > coroutine function from within one event loop and then awaiting it in > another. So if we want an event loop to be able to pass information down > the call chain in such a way that the information is available *throughout > the whole task that it is driving*, then the contexts needs to a least > propagate through `await`s. > > This was my starting point 2.5 years ago, when Yury was drafting this > status quo (PEP 492). It looked a lot of PEP 492 was inevitable, but that > there will be a problem, where each API that uses "blocking IO" somewhere > under the hood would need a duplicate version for asyncio (and one for each > third-party async framework!). 
I felt it was necessary to think about a > solution before PEP 492 is accepted, and this became a fairly short-lived > thread here on python-ideas: > > https://mail.python.org/pipermail/python-ideas/2015-May/033267.html > > ?This year, the discussion on Yury's PEP 550 somehow ended up with a very > similar need before I got involved, apparently for independent reasons. > > A design for solving this need (and others) is also found in my first > draft of PEP 555, found at > > https://mail.python.org/pipermail/python-ideas/2017-September/046916.html > > Essentially, it's a way of *passing information down the call chain* when > it's inconvenient or impossible to pass the information as normal function > arguments. I now call the concept "context arguments". > > ?More recently, I put some focus on the direct needs of normal users (as > opposed direct needs of async framework authors). > > Those thoughts are most "easily" discussed in terms of generator > functions, which are very similar to coroutine functions: A generator > function is often thought of as a function that returns an iterable of > lazily evaluated values. In this type of usage, the relevant "function > call" happens when calling the generator function. The subsequent calls to > next() (or a yield from) are thought of as merely getting the items in the > iterable, even if they do actually run code in the generator's frame. The > discussion on this is found starting from this email: > > https://mail.python.org/pipermail/python-ideas/2017-October/047279.html > > However, also coroutines are evaluated lazily. The question is, when > should we consider the "subroutine call" to happen: when the coroutine function > is called, or when the resulting object is awaited. Often these two are > indeed on the same line of code, so it does not matter. But as I discuss > above, there are definitely cases where it matters. This has mostly to do > with the interactions of different tasks within one event loop, or code > where multiple event loops interact. > > As mentioned above, there are cases where propagating the context through > next() and await is desirable. However, there are also cases where the > coroutine call is important. This comes up in the case of multiple > interacting tasks or multiple event loops. > > To start with, probably a more example-friendly case, however, is running > an event loop and a coroutine from synchronous code: > > import asyncio > > async def do_something_context_aware(): > do_something_that_depends_on(current_context()) > > loop = asyncio.get_event_loop() > > with some_context(): > coro = do_something_context_aware() > > loop.run_until_complete(coro) > > ? > ? > ? > ? > Now, if the coroutine function call `do_something_context_aware()` does > not save the current context on `coro`, then there is no way some_context() > can affect the code that will run inside the coroutine, even if that is > what we are explicitly trying to do here. > > ?The easy solution is to delegate the context transfer to the scheduling > function (run_until_complete), and require that the context is passed to > that function: > > with some_context?(): > ? coro = do_something_context_aware() > ? loop.run_until_complete(coro)? > > ?This gives the async framework (here asyncio) a chance to make sure the > context propagates as expected. In general, I'm in favor of giving async > frameworks some freedom in how this is implemented. 
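A sketch of that first, simpler arrangement, where the scheduling call takes responsibility for the context transfer. Here contextvars (which later shipped in Python 3.7) stands in for the PEP 555 machinery under discussion, and run_with_callers_context is an invented name, not an asyncio API:

    import asyncio
    import contextvars

    def run_with_callers_context(loop, coro):
        ctx = contextvars.copy_context()        # capture the context active at the call site
        task = ctx.run(loop.create_task, coro)  # the task runs in a copy of that context
        return loop.run_until_complete(task)

With a helper like this, wrapping the run call in "with some_context():" behaves the way the example above intends, without the coroutine call itself having to capture anything.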
However, to give the > framework even more freedom, the coroutine call, > do_something_context_aware(), could save the current context branch on > `coro`, which run_until_complete can attach to the Task that gets created. > > The bigger question is, what should happen when a coroutine awaits on > another coroutine directly, without giving the framework a change to > interfere: > > > async def inner(): > do_context_aware_stuff() > > async def outer(): > with first_context(): > coro = inner() > > with second_context(): > await coro > > The big question is: ?In the above, which context should the coroutine be > run in? > > "The" event loop does not have a chance to interfere, so we cannot > delegate the decision. > > ? Note that I did not write the above as what real code is expected to look like. It's just to underline the semantic difference that the context can change between the call and the await. Indeed, one might say that people don't write code like that. And maybe async/await is still sufficiently young that one can sort of say "this is how we showed people how to do it, so that's how they'll do it" [*]. But let's make the above code just a little bit more complicated, so that it becomes easier to believe that the semantic difference here really matters, and cannot be hand-waved away: async def outer(): with some_context(): a = inner() with other_context(): b = inner() await gather(a, b) # execute coroutines a and b concurrently ? If the coroutine function call, inner(), does not save a pointer to the current context at that point, then the code would just ignore the with statements completely and run both coroutines in the outer context, which is clearly not what an author of such code would want the code to do. It is certainly possible to fix the problem with requiring wrapping the coroutines with stuff, but that would lead to nobody ever knowing what the semantics will be without checking if the the coroutine has been wrapped or not. On the other hand, we could make the code *just work*, and that would be completely in line with what I've been promoting also as the semantics for generator functions in this thread: https://mail.python.org/pipermail/python-ideas/2017-October/047279.html I am definitely *not* talking about this kind of semantics because of something *I personally* need: In fact, I arrived at these thoughts because my designs for solving "my" original problem had turned into a more general-purpose mechanism (PEP 555) that would eventually also affect how code written by completely normal users of with statements and generator functions would behave. And certainly the situation is *very* similar to the case of coroutine functions, as (only?) Guido seems to acknowledge. But then how to address "my" original problem where the context would propagate through awaits, and next/send? From what others have written, it seems there are also other situations where that is desired. There are several ways to solve the problem as an extension to PEP 555, but below is one: > ?We need both versions: the one that propagates first_context() into the > coroutine, and the one that propagates second_context() into it. Or, using > my metaphor from the other thread, we need "both the forest and the trees". > ? > > ?A solution to this would be to have two types of context arguments: > > 1. (calling) context arguments? > > and > > 2. execution context arguments > > > So yes, I'm actually serious about this possibility. 
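To see how real this is, the contextvars module that later shipped in Python 3.7 behaves exactly as described for the gather example above: nothing is captured when the coroutine functions are called, so the surrounding with blocks are silently ignored and both coroutines run in whatever context is current when gather() wraps them into tasks. A runnable demonstration (the variable and helper names are invented for the illustration):

    import asyncio
    from contextlib import contextmanager
    from contextvars import ContextVar

    precision = ContextVar('precision', default=None)

    @contextmanager
    def use_precision(value):
        token = precision.set(value)
        try:
            yield
        finally:
            precision.reset(token)

    async def inner(label):
        # Sees the context captured at task creation, not at the call.
        print(label, precision.get())

    async def outer():
        with use_precision(3):
            a = inner('a:')
        with use_precision(7):
            b = inner('b:')
        await asyncio.gather(a, b)   # prints "a: None" and "b: None"

    asyncio.run(outer())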
Now it would be up to library and framework authors to pick the right variant of the two. And this is definitely something that could be documented very clearly.? > ?Both of these would have their own? stack of (argument, value) assignment > pairs, explained in the implementation part of the first PEP 555 draft. > While this is a complication, the performance overhead of these is so > small, that doubling the overhead should not be a performance concern. The > question is, do we want these two types of stacks, or do we want to work > around it somehow, for instance using context-local storage, implemented on > top of the first kind, to implement something like the second kind. > However, that again raises some issues of how to propagate the > context-local storage down the ambiguous call chain. > > ?This would also reduce the need to decorate and wrap generators and decorator functions, although in some cases that would still be needed. If something was not clear, but seems relevant to what I'm trying to discuss here, please ask :) ? ?? ?? ? Koos? ?[*] Maybe it would not even be too late to make minor changes in the PEP 492 semantics of coroutine functions at this point if there was a good enough reason. But in fact, I think the current semantics might be perfectly fine, and I'm definitely not suggesting any changes to existing semantics here. Only extensions to the existing semantics. -- + Koos Zevenhoven + http://twitter.com/k7hoven + -------------- next part -------------- An HTML attachment was scrubbed... URL: From amit.mixie at gmail.com Sun Oct 15 10:06:41 2017 From: amit.mixie at gmail.com (Amit Green) Date: Sun, 15 Oct 2017 10:06:41 -0400 Subject: [Python-ideas] PEP draft: context variables In-Reply-To: References: <59E0FBBF.3010402@stoneleaf.us> Message-ID: Generators are a wonderful feature of the python language, and one of its best idea. They are initially very intuitive to understand & easy to use. However, once you get beyond that; they are actually quite confusing because their behavior is not natural. Thus they have a initial easy learning & acceptance curve; and then as you go from initial use to more advanced use, there is a sudden "bump" in the learning curve, which is not as smooth as it could be. Specifically, the fact that when you call them, they do not actually call your code (but instead call a wrapper) is quite confusing. Example: import __main__ def read_and_prefix_each_line(path): with open(path) as f: data = f.read() for s in data.splitlines(): yield '!' + s def print_prefixed_file(path): reader = read_and_prefix_each_line(path) #LINE 12 print('Here is how %r looks prefixed' % path) for s in reader: #LINE 16 print(s) print_prefixed_file(__main__.__file__) print_prefixed_file('nonexistent') Will produce the following: Traceback (most recent call last): File "x.py", line 20, in print_prefixed_file('nonexistent') File "x.py", line 16, in print_prefixed_file for s in reader: File "x.py", line 5, in read_and_prefix_each_line with open(path) as f: IOError: [Errno 2] No such file or directory: 'nonexistent' This is quite confusing to a person who has been using generators for a month, and thinks they understand them. WHY is the traceback happening at line 16 instead of at line 12, when the function is called? It is much more intuitive, and natural [to a beginner], to expect the failure to open the file "nonexistent" to happen at line 12, not line 16. 
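A common way to get the failure to surface at line 12 rather than line 16 is to split the eager, failure-prone work from the lazy iteration, so that open() runs at call time and only the loop is deferred. A sketch along those lines, keeping the same example:

    def read_and_prefix_each_line(path):
        f = open(path)                 # fails here, i.e. during the call on line 12
        def generate():
            with f:
                for s in f.read().splitlines():
                    yield '!' + s
        return generate()

With that version, print_prefixed_file('nonexistent') raises at the "reader = ..." line, which matches the intuition described above -- but the fact that such a workaround is needed at all is part of the point being made here.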
So, now, the user, while trying to figure out a bug, has to learn that: - NO, calling a generator (which looks like a function) does not actually call the body of the function (as the user defined it) - Instead it calls some generator wrapper. - And finally the first time the __next__ method of the wrapper is called, the body of the function (as the user defined it) gets called. And this learning curve is quite steep. It is made even harder by the following: >>> def f(): yield 1 ... >>> f So the user is now convinced that 'f' really is a function. Further investigation makes it even more confusing: >>> f() At this point, the user starts to suspect that something is kind of unusual about 'yield' keyword. Eventually, after a while, the user starts to figure this out: >>> def f(): print('f started'); yield 1 ... >>> f() >>> f().__next__() f started 1 >>> And eventually after reading https://docs.python.org/3/reference/datamodel.html the following sentence: "The following flag bits are defined for co_flags: bit 0x04 is set if the function uses the *arguments syntax to accept an arbitrary number of positional arguments; bit 0x08 is set if the function uses the **keywords syntax to accept arbitrary keyword arguments; bit 0x20 is set if the function is a generator." Finally figures it out: >>> def f(): yield 1 ... >>> f >>> f.__code__.co_flags & 0x20 32 My point is ... this learning process is overly confusing to a new user & not a smooth learning curve. Here is, just a quick initial proposal on how to fix it: >>> def f(): yield 1 >>> ... Syntax Error: the 'yield' keyword can only be used in a generator; please be sure to use @generator before the definition of the generator >>> @generator ... def f(): yield 1 ... >>> f Just the fact it says ' wrote: > On 15 October 2017 at 13:51, Amit Green wrote: > > Once again, I think Paul Moore gets to the heart of the issue. > > > > Generators are simply confusing & async even more so. > > > > Per my earlier email, the fact that generators look like functions, but > are > > not functions, is at the root of the confusion. > > I don't agree. I don't find generators *at all* confusing. They are a > very natural way of expressing things, as has been proven by how > popular they are in the Python community. > > I don't *personally* understand async, but I'm currently willing to > reserve judgement until I've been in a situation where it would be > useful, and therefore needed to learn it. > > Paul > -------------- next part -------------- An HTML attachment was scrubbed... URL: From amit.mixie at gmail.com Sun Oct 15 10:34:01 2017 From: amit.mixie at gmail.com (Amit Green) Date: Sun, 15 Oct 2017 10:34:01 -0400 Subject: [Python-ideas] (PEP 555 subtopic) Propagation of context in async code In-Reply-To: References: Message-ID: On Sun, Oct 15, 2017 at 9:44 AM, Koos Zevenhoven wrote: > So, see below for some more discussion between (it would be useful if some > people could reply to this email and say if and why they agree or disagree > with something below -- also non-experts that roughly understand what I'm > talking about): > Yes, I understand what you are roughly talking about. Also, yes, generators are co-routines [though when starting to work with generators, people don't fully realize this]. But then how to address "my" original problem where the context would > propagate through awaits, and next/send? From what others have written, it > seems there are also other situations where that is desired. 
There are > several ways to solve the problem as an extension to PEP 555, but below is > one: > > > >> ?We need both versions: the one that propagates first_context() into the >> coroutine, and the one that propagates second_context() into it. Or, using >> my metaphor from the other thread, we need "both the forest and the trees". >> ? >> >> ?A solution to this would be to have two types of context arguments: >> >> 1. (calling) context arguments? >> >> and >> >> 2. execution context arguments >> >> >> > So yes, I'm actually serious about this possibility. Now it would be up to > library and framework authors to pick the right variant of the two. And > this is definitely something that could be documented very clearly.? > > This is an interesting idea. I would add you also need: 3. Shared context, the generator shares the context with it's caller which means: - If the caller changes the context, the generator, see the changed context next time it's __next__ function is called - If the generator changes the context, the caller sees the changed context. - [This clearly make changing the context using 'with' totally unusable in both the caller & the generator -- unless we add even odder semantics, that the generator restores the original context when it exists???] - (As per previous email by me, I claim this is the most natural way beginners are going to think it works; and needs to be supported; also in real code this is not often useful] - I'm not sure if this would even work with async or not -- *IF* not, I would still have a syntax for the user to attempt this -- and throw a Syntax Error when they do, with a good explanation of why this combination doesn't work for async. I believe good explanations are a great way for people to learn which features can't be combined together & why. > If something was not clear, but seems relevant to what I'm trying to > discuss here, please ask :) > I looked for you quote "we need both the forest & the trees", but didn't find it here. I quite strongly agree we need both; in fact need also the third case I highlighted above. As for what Guido wrote, that we might be trying to solve too many problems -- probably. However, these are real issues with context's, not edge cases. Thus Guido writing we don't want to allow yield within a 'with' clause (as it leaks context) .. I would argue two things: - There are use cases where we *DO* want this -- rare -- true -- but they exist (i.e.: my #3 above) - IF, for simplicity, sake, it is decided not to handle this case now; then make it a syntax error in the language; i.e.: def f(): with context() as x: yield 1 Syntax error: 'yield' may not be used inside a 'with' clause. This would really help new users not to make a mistake that takes hours to debug; & help correct their [initial mistaken] thinking on how contexts & generators interact. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From k7hoven at gmail.com Sun Oct 15 11:11:56 2017 From: k7hoven at gmail.com (Koos Zevenhoven) Date: Sun, 15 Oct 2017 18:11:56 +0300 Subject: [Python-ideas] (PEP 555 subtopic) Propagation of context in async code In-Reply-To: References: Message-ID: On Sun, Oct 15, 2017 at 5:34 PM, Amit Green wrote: > On Sun, Oct 15, 2017 at 9:44 AM, Koos Zevenhoven > wrote: > >> So, see below for some more discussion between (it would be useful if >> some people could reply to this email and say if and why they agree or >> disagree with something below -- also non-experts that roughly understand >> what I'm talking about): >> > > Yes, I understand what you are roughly talking about. > > Also, yes, generators are co-routines [though when starting to work with > generators, people don't fully realize this]. > > But then how to address "my" original problem where the context would >> propagate through awaits, and next/send? From what others have written, it >> seems there are also other situations where that is desired. There are >> several ways to solve the problem as an extension to PEP 555, but below is >> one: >> >> >> >>> ?We need both versions: the one that propagates first_context() into the >>> coroutine, and the one that propagates second_context() into it. Or, using >>> my metaphor from the other thread, we need "both the forest and the trees". >>> ? >>> >>> ?A solution to this would be to have two types of context arguments: >>> >>> 1. (calling) context arguments? >>> >>> and >>> >>> 2. execution context arguments >>> >>> >>> >> So yes, I'm actually serious about this possibility. Now it would be up >> to library and framework authors to pick the right variant of the two. And >> this is definitely something that could be documented very clearly.? >> >> > This is an interesting idea. I would add you also need: > > 3. Shared context, the generator shares the context with it's caller > which means: > > - If the caller changes the context, the generator, see the changed > context next time it's __next__ function is called > - If the generator changes the context, the caller sees the changed > context. > - [This clearly make changing the context using 'with' totally > unusable in both the caller & the generator -- unless we add even odder > semantics, that the generator restores the original context when it > exists???] > - (As per previous email by me, I claim this is the most natural way > beginners are going to think it works; and needs to be supported; also in > real code this is not often useful] > - I'm not sure if this would even work with async or not -- *IF* not, > I would still have a syntax for the user to attempt this -- and throw a > Syntax Error when they do, with a good explanation of why this combination > doesn't work for async. I believe good explanations are a great way for > people to learn which features can't be combined together & why. > > Just as a quick note, after skimming through your bullet points: ?All of this is indeed covered with decorators and ?other explicit mechanisms in the PEP 555 approach. I don't think we need syntax errors, though. > >> If something was not clear, but seems relevant to what I'm trying to >> discuss here, please ask :) >> > > > I looked for you quote "we need both the forest & the trees", but didn't > find it here. I quite strongly agree we need both; in fact need also the > third case I highlighted above. > > The ordering of the archive was indeed thoroughly destroyed. Ordering by date might help. 
?But the quote you ask for is here: https://mail.python.org/pipermail/python-ideas/2017-October/047285.html ?? ?-Koos? > As for what Guido wrote, that we might be trying to solve too many > problems -- probably. However, these are real issues with context's, not > edge cases. > > Thus Guido writing we don't want to allow yield within a 'with' clause (as > it leaks context) .. I would argue two things: > > - There are use cases where we *DO* want this -- rare -- true -- but > they exist (i.e.: my #3 above) > > - IF, for simplicity, sake, it is decided not to handle this case now; > then make it a syntax error in the language; i.e.: > > def f(): > with context() as x: > yield 1 > > Syntax error: 'yield' may not be used inside a 'with' clause. > > This would really help new users not to make a mistake that takes hours to > debug; & help correct their [initial mistaken] thinking on how contexts & > generators interact. > > -- + Koos Zevenhoven + http://twitter.com/k7hoven + -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Sun Oct 15 11:58:28 2017 From: guido at python.org (Guido van Rossum) Date: Sun, 15 Oct 2017 08:58:28 -0700 Subject: [Python-ideas] Add time.time_ns(): system clock with nanosecond resolution In-Reply-To: References: <20171014102122.7d718d93@fsol> Message-ID: I'd like to have time.time_ns() -- this is most parallel to st_mtime_ns. On Sun, Oct 15, 2017 at 2:59 AM, Victor Stinner wrote: > > Might it make more sense to have a parallel *module* that works with a > different base data type rather than parallel functions within the existing > API? > > I asked about adding new functions to 4 different modules: os, resource, > signal, time. > > For example, I dislike the idea of having os and os_ns modules. We already > have os.stat() which returns time as seconds and nanoseconds (both at the > same time). There is also os.utime() which accepts time as seconds *or* > nanoseconds: os.utime (path, times=seconds) or os.utime(path, > ns=nanoseconds). > > If we had a time_ns module, would it only contain 4 clocks or does it have > to duplicate the full API? If yes, it is likely to be a mess to maintain > them. How will user choose between time and time_ns? What if tomorrow > clocks get picosecond resolution? (CPU TSC also has sub-nanosecond > resolution, but OS API uses timespec, 1 ns res.) Add a third module? > > I prefer to leave all "time functions" in the "time module". > > For example, I don't think that we need to add time.nanosleep() or > time.sleep_ns(), since the precision loss starts after a sleep of 104 days. > Who cares of 1 nanosecond after a sleep of 104 days? > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From k7hoven at gmail.com Sun Oct 15 13:04:25 2017 From: k7hoven at gmail.com (Koos Zevenhoven) Date: Sun, 15 Oct 2017 20:04:25 +0300 Subject: [Python-ideas] Add time.time_ns(): system clock with nanosecond resolution In-Reply-To: References: <20171014102122.7d718d93@fsol> Message-ID: On Sun, Oct 15, 2017 at 6:58 PM, Guido van Rossum wrote: > I'd like to have time.time_ns() -- this is most parallel to st_mtime_ns. > > ? Welcome to the list Guido! You sound like a C programmer. 
For many people, that was the best language they knew of when they learned to program. But have you ever tried Python? You should give it a try! -- Koos P.S. ?Sorry, couldn't resist :-) I guess having two versions of one function would not be that bad. I will probably never use the ns version anyway. But I'd like a more general solution to such problems in the long run. -- + Koos Zevenhoven + http://twitter.com/k7hoven + -------------- next part -------------- An HTML attachment was scrubbed... URL: From solipsis at pitrou.net Sun Oct 15 13:17:16 2017 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sun, 15 Oct 2017 19:17:16 +0200 Subject: [Python-ideas] Why not picoseconds? References: Message-ID: <20171015191716.371fba63@fsol> Since new APIs are expensive and we'd like to be future-proof, why not move to picoseconds? That would be safe until clocks reach the THz barrier, which is quite far away from us. Regards Antoine. On Fri, 13 Oct 2017 16:12:39 +0200 Victor Stinner wrote: > Hi, > > I would like to add new functions to return time as a number of > nanosecond (Python int), especially time.time_ns(). > > It would enhance the time.time() clock resolution. In my experience, > it decreases the minimum non-zero delta between two clock by 3 times, > new "ns" clock versus current clock: 84 ns (2.8x better) vs 239 ns on > Linux, and 318 us (2.8x better) vs 894 us on Windows, measured in > Python. > > The question of this long email is if it's worth it to add more "_ns" > time functions than just time.time_ns()? > > I would like to add: > > * time.time_ns() > * time.monotonic_ns() > * time.perf_counter_ns() > * time.clock_gettime_ns() > * time.clock_settime_ns() > > time(), monotonic() and perf_counter() clocks are the 3 most common > clocks and users use them to get the best available clock resolution. > clock_gettime/settime() are the generic UNIX API to access these > clocks and so should also be enhanced to get nanosecond resolution. > > > == Nanosecond resolution == > > More and more clocks have a frequency in MHz, up to GHz for the "TSC" > CPU clock, and so the clocks resolution is getting closer to 1 > nanosecond (or even better than 1 ns for the TSC clock!). > > The problem is that Python returns time as a floatting point number > which is usually a 64-bit binary floatting number (in the IEEE 754 > format). This type starts to loose nanoseconds after 104 days. > Conversion from nanoseconds (int) to seconds (float) and then back to > nanoseconds (int) to check if conversions loose precision: > > # no precision loss > >>> x=2**52+1; int(float(x * 1e-9) * 1e9) - x > 0 > # precision loss! (1 nanosecond) > >>> x=2**53+1; int(float(x * 1e-9) * 1e9) - x > -1 > >>> print(datetime.timedelta(seconds=2**53 / 1e9)) > 104 days, 5:59:59.254741 > > While a system administrator can be proud to have an uptime longer > than 104 days, the problem also exists for the time.time() clock which > returns the number of seconds since the UNIX epoch (1970-01-01). 
This > clock started to loose nanoseconds since mid-May 1970 (47 years ago): > > >>> import datetime > >>> print(datetime.datetime(1970, 1, 1) + datetime.timedelta(seconds=2**53 / 1e9)) > 1970-04-15 05:59:59.254741 > > > == PEP 410 == > > Five years ago, I proposed a large and complex change in all Python > functions returning time to support nanosecond resolution using the > decimal.Decimal type: > > https://www.python.org/dev/peps/pep-0410/ > > The PEP was rejected for different reasons: > > * it wasn't clear if hardware clocks really had a resolution of 1 > nanosecond, especially when the clock is read from Python, since > reading a clock in Python also takes time... > > * Guido van Rossum rejected the idea of adding a new optional > parameter to change the result type: it's an uncommon programming > practice (bad design in Python) > > * decimal.Decimal is not widely used, it might be surprised to get such type > > > == CPython enhancements of the last 5 years == > > Since this PEP was rejected: > > * the os.stat_result got 3 fields for timestamps as nanoseconds > (Python int): st_atime_ns, st_ctime_ns, st_mtime_ns > > * Python 3.3 got 3 new clocks: time.monotonic(), time.perf_counter() > and time.process_time() > > * I enhanced the private C API of Python handling time (API called > "pytime") to store all timings as the new _PyTime_t type which is a > simple 64-bit signed integer. The unit of _PyTime_t is not part of the > API, it's an implementation detail. The unit is currently 1 > nanosecond. > > > This week, I converted one of the last clock to new _PyTime_t format: > time.perf_counter() now has internally a resolution of 1 nanosecond, > instead of using the C double type. > > XXX technically https://github.com/python/cpython/pull/3983 is not > merged yet :-) > > > > == Clocks resolution in Python == > > I implemented time.time_ns(), time.monotonic_ns() and > time.perf_counter_ns() which are similar of the functions without the > "_ns" suffix, but return time as nanoseconds (Python int). > > I computed the smallest difference between two clock reads (ignoring a > differences of zero): > > Linux: > > * time_ns(): 84 ns <=== !!! > * time(): 239 ns <=== !!! > * perf_counter_ns(): 84 ns > * perf_counter(): 82 ns > * monotonic_ns(): 84 ns > * monotonic(): 81 ns > > Windows: > > * time_ns(): 318000 ns <=== !!! > * time(): 894070 ns <=== !!! > * perf_counter_ns(): 100 ns > * perf_counter(): 100 ns > * monotonic_ns(): 15000000 ns > * monotonic(): 15000000 ns > > The difference on time.time() is significant: 84 ns (2.8x better) vs > 239 ns on Linux and 318 us (2.8x better) vs 894 us on Windows. The > difference will be larger next years since every day adds > 864,00,000,000,000 nanoseconds to the system clock :-) (please don't > bug me with leap seconds! you got my point) > > The difference on perf_counter and monotonic clocks are not visible in > this quick script since my script runs less than 1 minute, my computer > uptime is smaller than 1 weak, ... and Python internally starts these > clocks at zero *to reduce the precision loss*! Using an uptime larger > than 104 days, you would probably see a significant difference (at > least +/- 1 nanosecond) between the regular (seconds as double) and > the "_ns" (nanoseconds as int) clocks. > > > > == How many new nanosecond clocks? 
== > > The PEP 410 proposed to modify the following functions: > > * os module: fstat(), fstatat(), lstat(), stat() (st_atime, st_ctime > and st_mtime fields of the stat structure), sched_rr_get_interval(), > times(), wait3() and wait4() > > * resource module: ru_utime and ru_stime fields of getrusage() > > * signal module: getitimer(), setitimer() > > * time module: clock(), clock_gettime(), clock_getres(), monotonic(), > time() and wallclock() ("wallclock()" was finally called "monotonic", > see PEP 418) > > > According to my tests of the previous section, the precision loss > starts after 104 days (stored in nanoseconds). I don't know if it's > worth it to modify functions which return "CPU time" or "process time" > of processes, since most processes live shorter than 104 days. Do you > care of a resolution of 1 nanosecond for the CPU and process time? > > Maybe we need 1 nanosecond resolution for profiling and benchmarks. > But in that case, you might want to implement your profiler in C > rather in Python, like the hotshot module, no? The "pytime" private > API of CPython gives you clocks with a resolution of 1 nanosecond. > > > == Annex: clock performance == > > To have an idea of the cost of reading the clock on the clock > resolution in Python, I also ran a microbenchmark on *reading* a > clock. Example: > > $ ./python -m perf timeit --duplicate 1024 -s 'import time; t=time.time' 't()' > > Linux (Mean +- std dev): > > * time.time(): 45.4 ns +- 0.5 ns > * time.time_ns(): 47.8 ns +- 0.8 ns > * time.perf_counter(): 46.8 ns +- 0.7 ns > * time.perf_counter_ns(): 46.0 ns +- 0.6 ns > > Windows (Mean +- std dev): > > * time.time(): 42.2 ns +- 0.8 ns > * time.time_ns(): 49.3 ns +- 0.8 ns > * time.perf_counter(): 136 ns +- 2 ns <=== > * time.perf_counter_ns(): 143 ns +- 4 ns <=== > * time.monotonic(): 38.3 ns +- 0.9 ns > * time.monotonic_ns(): 48.8 ns +- 1.2 ns > > Most clocks have the same performance except of perf_counter on > Windows: around 140 ns whereas other clocks are around 45 ns (on Linux > and Windows): 3x slower. Maybe the "bad" perf_counter performance can > be explained by the fact that I'm running Windows in a VM, which is > not ideal for benchmarking. Or maybe my C implementation of > time.perf_counter() is slow? > > Note: I expect that a significant part of the numbers are the cost of > Python function calls. Reading these clocks using the Python C > functions are likely faster. > > > Victor > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > From mal at egenix.com Sun Oct 15 13:17:57 2017 From: mal at egenix.com (M.-A. Lemburg) Date: Sun, 15 Oct 2017 19:17:57 +0200 Subject: [Python-ideas] PEP draft: context variables In-Reply-To: References: <59E0FBBF.3010402@stoneleaf.us> Message-ID: <248ab692-87f7-8900-d4a0-c4ed3cb28112@egenix.com> On 15.10.2017 07:13, Nick Coghlan wrote: > On 15 October 2017 at 14:53, M.-A. Lemburg wrote: >> The behavior of also deferring the context to time of >> execution should be the non-standard form to not break >> this intuition, otherwise debugging will be a pain and >> writing fully working code would be really hard in the >> face of changing contexts (e.g. say decimal rounding >> changes in different parts of the code). 
>> > > No, it really wouldn't, since "the execution context is the context that's > active when the code is executed" is relatively easy to understand based > entirely on the way functions, methods, and other forms of delayed > execution work (including iterators). > > "The execution context is the context that's active when the code is > executed, *unless* the code is in a generator, in which case, it's the > context that was active when the generator-iterator was instantiated" is > harder to follow. I think you're mixing two concepts here: the context defines a way code is supposed to be interpreted at runtime. This doesn't have anything to do with when the code is actually run. Just think what happens if you write code using a specific context (let's say rounding to two decimal places), which then get executed deferred within another context (say rounding to three decimal places) for part of the generator run and yet another context (say rounding to whole integers) for the remainder of the generator. I can't think of a good use case where such behavior would be intuitive, expected or even reasonable ;-) The context should be inherited by the generator when instantiated and not change after that, so that the context defining the generator takes precedent over any later context in which the generator is later run. Note that the above is not the same as raising an exception and catching it somewhere else (as Nathaniel brought up). The context actually changes semantics of code, whereas exceptions only flag a special state and let other code decide what to do with it (defined where the exception handling is happening, not where the raise is caused). Just for clarification: I haven't followed the thread, just saw your posting and found the argument you put forward a bit hard to follow. I may well be missing some context or evaluating the argument in a different one as the one where it was defined ;-) -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Experts >>> Python Projects, Coaching and Consulting ... http://www.egenix.com/ >>> Python Database Interfaces ... http://products.egenix.com/ >>> Plone/Zope Database Interfaces ... http://zope.egenix.com/ ________________________________________________________________________ ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ From k7hoven at gmail.com Sun Oct 15 13:40:00 2017 From: k7hoven at gmail.com (Koos Zevenhoven) Date: Sun, 15 Oct 2017 20:40:00 +0300 Subject: [Python-ideas] Add time.time_ns(): system clock with nanosecond resolution In-Reply-To: References: <20171014102122.7d718d93@fsol> Message-ID: All joking aside, I actually like it that Python also allows one to interact with lower-level concepts when needed. Maybe there could be even more of this? -- Koos On Sun, Oct 15, 2017 at 8:04 PM, Koos Zevenhoven wrote: > On Sun, Oct 15, 2017 at 6:58 PM, Guido van Rossum > wrote: > >> I'd like to have time.time_ns() -- this is most parallel to st_mtime_ns. >> >> ? > Welcome to the list Guido! You sound like a C programmer. For many people, > that was the best language they knew of when they learned to program. But > have you ever tried Python? You should give it a try! > > > -- Koos > > P.S. 
?Sorry, couldn't resist :-) I guess having two versions of one > function would not be that bad. I will probably never use the ns version > anyway. But I'd like a more general solution to such problems in the long > run. > > > -- > + Koos Zevenhoven + http://twitter.com/k7hoven + > -- + Koos Zevenhoven + http://twitter.com/k7hoven + -------------- next part -------------- An HTML attachment was scrubbed... URL: From k7hoven at gmail.com Sun Oct 15 14:02:49 2017 From: k7hoven at gmail.com (Koos Zevenhoven) Date: Sun, 15 Oct 2017 21:02:49 +0300 Subject: [Python-ideas] Why not picoseconds? In-Reply-To: <20171015191716.371fba63@fsol> References: <20171015191716.371fba63@fsol> Message-ID: On Sun, Oct 15, 2017 at 8:17 PM, Antoine Pitrou wrote: > > Since new APIs are expensive and we'd like to be future-proof, why not > move to picoseconds? That would be safe until clocks reach the THz > barrier, which is quite far away from us. > > ?I somewhat like the thought, but would everyone then end up thinking about what power of 1000 they need to multiply with? -- Koos? > Regards > > Antoine. > > > On Fri, 13 Oct 2017 16:12:39 +0200 > Victor Stinner > wrote: > > Hi, > > > > I would like to add new functions to return time as a number of > > nanosecond (Python int), especially time.time_ns(). > > > > It would enhance the time.time() clock resolution. In my experience, > > it decreases the minimum non-zero delta between two clock by 3 times, > > new "ns" clock versus current clock: 84 ns (2.8x better) vs 239 ns on > > Linux, and 318 us (2.8x better) vs 894 us on Windows, measured in > > Python. > > > > The question of this long email is if it's worth it to add more "_ns" > > time functions than just time.time_ns()? > > > > I would like to add: > > > > * time.time_ns() > > * time.monotonic_ns() > > * time.perf_counter_ns() > > * time.clock_gettime_ns() > > * time.clock_settime_ns() > > > > time(), monotonic() and perf_counter() clocks are the 3 most common > > clocks and users use them to get the best available clock resolution. > > clock_gettime/settime() are the generic UNIX API to access these > > clocks and so should also be enhanced to get nanosecond resolution. > > > > > > == Nanosecond resolution == > > > > More and more clocks have a frequency in MHz, up to GHz for the "TSC" > > CPU clock, and so the clocks resolution is getting closer to 1 > > nanosecond (or even better than 1 ns for the TSC clock!). > > > > The problem is that Python returns time as a floatting point number > > which is usually a 64-bit binary floatting number (in the IEEE 754 > > format). This type starts to loose nanoseconds after 104 days. > > Conversion from nanoseconds (int) to seconds (float) and then back to > > nanoseconds (int) to check if conversions loose precision: > > > > # no precision loss > > >>> x=2**52+1; int(float(x * 1e-9) * 1e9) - x > > 0 > > # precision loss! (1 nanosecond) > > >>> x=2**53+1; int(float(x * 1e-9) * 1e9) - x > > -1 > > >>> print(datetime.timedelta(seconds=2**53 / 1e9)) > > 104 days, 5:59:59.254741 > > > > While a system administrator can be proud to have an uptime longer > > than 104 days, the problem also exists for the time.time() clock which > > returns the number of seconds since the UNIX epoch (1970-01-01). 
This > > clock started to loose nanoseconds since mid-May 1970 (47 years ago): > > > > >>> import datetime > > >>> print(datetime.datetime(1970, 1, 1) + datetime.timedelta(seconds=2**53 > / 1e9)) > > 1970-04-15 05:59:59.254741 > > > > > > == PEP 410 == > > > > Five years ago, I proposed a large and complex change in all Python > > functions returning time to support nanosecond resolution using the > > decimal.Decimal type: > > > > https://www.python.org/dev/peps/pep-0410/ > > > > The PEP was rejected for different reasons: > > > > * it wasn't clear if hardware clocks really had a resolution of 1 > > nanosecond, especially when the clock is read from Python, since > > reading a clock in Python also takes time... > > > > * Guido van Rossum rejected the idea of adding a new optional > > parameter to change the result type: it's an uncommon programming > > practice (bad design in Python) > > > > * decimal.Decimal is not widely used, it might be surprised to get such > type > > > > > > == CPython enhancements of the last 5 years == > > > > Since this PEP was rejected: > > > > * the os.stat_result got 3 fields for timestamps as nanoseconds > > (Python int): st_atime_ns, st_ctime_ns, st_mtime_ns > > > > * Python 3.3 got 3 new clocks: time.monotonic(), time.perf_counter() > > and time.process_time() > > > > * I enhanced the private C API of Python handling time (API called > > "pytime") to store all timings as the new _PyTime_t type which is a > > simple 64-bit signed integer. The unit of _PyTime_t is not part of the > > API, it's an implementation detail. The unit is currently 1 > > nanosecond. > > > > > > This week, I converted one of the last clock to new _PyTime_t format: > > time.perf_counter() now has internally a resolution of 1 nanosecond, > > instead of using the C double type. > > > > XXX technically https://github.com/python/cpython/pull/3983 is not > > merged yet :-) > > > > > > > > == Clocks resolution in Python == > > > > I implemented time.time_ns(), time.monotonic_ns() and > > time.perf_counter_ns() which are similar of the functions without the > > "_ns" suffix, but return time as nanoseconds (Python int). > > > > I computed the smallest difference between two clock reads (ignoring a > > differences of zero): > > > > Linux: > > > > * time_ns(): 84 ns <=== !!! > > * time(): 239 ns <=== !!! > > * perf_counter_ns(): 84 ns > > * perf_counter(): 82 ns > > * monotonic_ns(): 84 ns > > * monotonic(): 81 ns > > > > Windows: > > > > * time_ns(): 318000 ns <=== !!! > > * time(): 894070 ns <=== !!! > > * perf_counter_ns(): 100 ns > > * perf_counter(): 100 ns > > * monotonic_ns(): 15000000 ns > > * monotonic(): 15000000 ns > > > > The difference on time.time() is significant: 84 ns (2.8x better) vs > > 239 ns on Linux and 318 us (2.8x better) vs 894 us on Windows. The > > difference will be larger next years since every day adds > > 864,00,000,000,000 nanoseconds to the system clock :-) (please don't > > bug me with leap seconds! you got my point) > > > > The difference on perf_counter and monotonic clocks are not visible in > > this quick script since my script runs less than 1 minute, my computer > > uptime is smaller than 1 weak, ... and Python internally starts these > > clocks at zero *to reduce the precision loss*! Using an uptime larger > > than 104 days, you would probably see a significant difference (at > > least +/- 1 nanosecond) between the regular (seconds as double) and > > the "_ns" (nanoseconds as int) clocks. > > > > > > > > == How many new nanosecond clocks? 
== > > > > The PEP 410 proposed to modify the following functions: > > > > * os module: fstat(), fstatat(), lstat(), stat() (st_atime, st_ctime > > and st_mtime fields of the stat structure), sched_rr_get_interval(), > > times(), wait3() and wait4() > > > > * resource module: ru_utime and ru_stime fields of getrusage() > > > > * signal module: getitimer(), setitimer() > > > > * time module: clock(), clock_gettime(), clock_getres(), monotonic(), > > time() and wallclock() ("wallclock()" was finally called "monotonic", > > see PEP 418) > > > > > > According to my tests of the previous section, the precision loss > > starts after 104 days (stored in nanoseconds). I don't know if it's > > worth it to modify functions which return "CPU time" or "process time" > > of processes, since most processes live shorter than 104 days. Do you > > care of a resolution of 1 nanosecond for the CPU and process time? > > > > Maybe we need 1 nanosecond resolution for profiling and benchmarks. > > But in that case, you might want to implement your profiler in C > > rather in Python, like the hotshot module, no? The "pytime" private > > API of CPython gives you clocks with a resolution of 1 nanosecond. > > > > > > == Annex: clock performance == > > > > To have an idea of the cost of reading the clock on the clock > > resolution in Python, I also ran a microbenchmark on *reading* a > > clock. Example: > > > > $ ./python -m perf timeit --duplicate 1024 -s 'import time; t=time.time' > 't()' > > > > Linux (Mean +- std dev): > > > > * time.time(): 45.4 ns +- 0.5 ns > > * time.time_ns(): 47.8 ns +- 0.8 ns > > * time.perf_counter(): 46.8 ns +- 0.7 ns > > * time.perf_counter_ns(): 46.0 ns +- 0.6 ns > > > > Windows (Mean +- std dev): > > > > * time.time(): 42.2 ns +- 0.8 ns > > * time.time_ns(): 49.3 ns +- 0.8 ns > > * time.perf_counter(): 136 ns +- 2 ns <=== > > * time.perf_counter_ns(): 143 ns +- 4 ns <=== > > * time.monotonic(): 38.3 ns +- 0.9 ns > > * time.monotonic_ns(): 48.8 ns +- 1.2 ns > > > > Most clocks have the same performance except of perf_counter on > > Windows: around 140 ns whereas other clocks are around 45 ns (on Linux > > and Windows): 3x slower. Maybe the "bad" perf_counter performance can > > be explained by the fact that I'm running Windows in a VM, which is > > not ideal for benchmarking. Or maybe my C implementation of > > time.perf_counter() is slow? > > > > Note: I expect that a significant part of the numbers are the cost of > > Python function calls. Reading these clocks using the Python C > > functions are likely faster. > > > > > > Victor > > _______________________________________________ > > Python-ideas mailing list > > Python-ideas at python.org > > https://mail.python.org/mailman/listinfo/python-ideas > > Code of Conduct: http://python.org/psf/codeofconduct/ > > > > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -- + Koos Zevenhoven + http://twitter.com/k7hoven + -------------- next part -------------- An HTML attachment was scrubbed... URL: From python at mrabarnett.plus.com Sun Oct 15 14:15:53 2017 From: python at mrabarnett.plus.com (MRAB) Date: Sun, 15 Oct 2017 19:15:53 +0100 Subject: [Python-ideas] Why not picoseconds? 
In-Reply-To: References: <20171015191716.371fba63@fsol> Message-ID: On 2017-10-15 19:02, Koos Zevenhoven wrote: > On Sun, Oct 15, 2017 at 8:17 PM, Antoine Pitrou >wrote: > > > Since new APIs are expensive and we'd like to be future-proof, why not > move to picoseconds?? That would be safe until clocks reach the THz > barrier, which is quite far away from us. > > > ?I somewhat like the thought, but would everyone then end up thinking > about what power of 1000 they need to multiply with? > A simple solution to that would be to provide the multiplier as a named constant. [snip] From victor.stinner at gmail.com Sun Oct 15 14:28:40 2017 From: victor.stinner at gmail.com (Victor Stinner) Date: Sun, 15 Oct 2017 20:28:40 +0200 Subject: [Python-ideas] Why not picoseconds? In-Reply-To: <20171015191716.371fba63@fsol> References: <20171015191716.371fba63@fsol> Message-ID: I proposed to use nanoseconds because UNIX has 1 ns resolution in timespec, the most recent API, and Windows has 100 ns. Using picoseconds would confuse users who may expect sub-nanosecond resolution, whereas no OS support them currently. Moreover, nanoseconds as int already landed in os.stat and os.utime. Last but not least, I already strugle in pytime.c to prevent integer overflow with 1 ns resolution. It can quickly become much more complex if there is no native C int type supporting a range large enough to more 1 picosecond resolution usable. I really like using int64_t for _PyTime_t, it's well supported, very easy to use (ex: "t = t2 - t1"). 64-bit int supports year after 2200 for delta since 1970. Note: I only know Ruby which chose picoseconds. Victor Le 15 oct. 2017 19:18, "Antoine Pitrou" a ?crit : > > Since new APIs are expensive and we'd like to be future-proof, why not > move to picoseconds? That would be safe until clocks reach the THz > barrier, which is quite far away from us. > > Regards > > Antoine. > > > On Fri, 13 Oct 2017 16:12:39 +0200 > Victor Stinner > wrote: > > Hi, > > > > I would like to add new functions to return time as a number of > > nanosecond (Python int), especially time.time_ns(). > > > > It would enhance the time.time() clock resolution. In my experience, > > it decreases the minimum non-zero delta between two clock by 3 times, > > new "ns" clock versus current clock: 84 ns (2.8x better) vs 239 ns on > > Linux, and 318 us (2.8x better) vs 894 us on Windows, measured in > > Python. > > > > The question of this long email is if it's worth it to add more "_ns" > > time functions than just time.time_ns()? > > > > I would like to add: > > > > * time.time_ns() > > * time.monotonic_ns() > > * time.perf_counter_ns() > > * time.clock_gettime_ns() > > * time.clock_settime_ns() > > > > time(), monotonic() and perf_counter() clocks are the 3 most common > > clocks and users use them to get the best available clock resolution. > > clock_gettime/settime() are the generic UNIX API to access these > > clocks and so should also be enhanced to get nanosecond resolution. > > > > > > == Nanosecond resolution == > > > > More and more clocks have a frequency in MHz, up to GHz for the "TSC" > > CPU clock, and so the clocks resolution is getting closer to 1 > > nanosecond (or even better than 1 ns for the TSC clock!). > > > > The problem is that Python returns time as a floatting point number > > which is usually a 64-bit binary floatting number (in the IEEE 754 > > format). This type starts to loose nanoseconds after 104 days. 
> > Conversion from nanoseconds (int) to seconds (float) and then back to > > nanoseconds (int) to check if conversions loose precision: > > > > # no precision loss > > >>> x=2**52+1; int(float(x * 1e-9) * 1e9) - x > > 0 > > # precision loss! (1 nanosecond) > > >>> x=2**53+1; int(float(x * 1e-9) * 1e9) - x > > -1 > > >>> print(datetime.timedelta(seconds=2**53 / 1e9)) > > 104 days, 5:59:59.254741 > > > > While a system administrator can be proud to have an uptime longer > > than 104 days, the problem also exists for the time.time() clock which > > returns the number of seconds since the UNIX epoch (1970-01-01). This > > clock started to loose nanoseconds since mid-May 1970 (47 years ago): > > > > >>> import datetime > > >>> print(datetime.datetime(1970, 1, 1) + datetime.timedelta(seconds=2**53 > / 1e9)) > > 1970-04-15 05:59:59.254741 > > > > > > == PEP 410 == > > > > Five years ago, I proposed a large and complex change in all Python > > functions returning time to support nanosecond resolution using the > > decimal.Decimal type: > > > > https://www.python.org/dev/peps/pep-0410/ > > > > The PEP was rejected for different reasons: > > > > * it wasn't clear if hardware clocks really had a resolution of 1 > > nanosecond, especially when the clock is read from Python, since > > reading a clock in Python also takes time... > > > > * Guido van Rossum rejected the idea of adding a new optional > > parameter to change the result type: it's an uncommon programming > > practice (bad design in Python) > > > > * decimal.Decimal is not widely used, it might be surprised to get such > type > > > > > > == CPython enhancements of the last 5 years == > > > > Since this PEP was rejected: > > > > * the os.stat_result got 3 fields for timestamps as nanoseconds > > (Python int): st_atime_ns, st_ctime_ns, st_mtime_ns > > > > * Python 3.3 got 3 new clocks: time.monotonic(), time.perf_counter() > > and time.process_time() > > > > * I enhanced the private C API of Python handling time (API called > > "pytime") to store all timings as the new _PyTime_t type which is a > > simple 64-bit signed integer. The unit of _PyTime_t is not part of the > > API, it's an implementation detail. The unit is currently 1 > > nanosecond. > > > > > > This week, I converted one of the last clock to new _PyTime_t format: > > time.perf_counter() now has internally a resolution of 1 nanosecond, > > instead of using the C double type. > > > > XXX technically https://github.com/python/cpython/pull/3983 is not > > merged yet :-) > > > > > > > > == Clocks resolution in Python == > > > > I implemented time.time_ns(), time.monotonic_ns() and > > time.perf_counter_ns() which are similar of the functions without the > > "_ns" suffix, but return time as nanoseconds (Python int). > > > > I computed the smallest difference between two clock reads (ignoring a > > differences of zero): > > > > Linux: > > > > * time_ns(): 84 ns <=== !!! > > * time(): 239 ns <=== !!! > > * perf_counter_ns(): 84 ns > > * perf_counter(): 82 ns > > * monotonic_ns(): 84 ns > > * monotonic(): 81 ns > > > > Windows: > > > > * time_ns(): 318000 ns <=== !!! > > * time(): 894070 ns <=== !!! > > * perf_counter_ns(): 100 ns > > * perf_counter(): 100 ns > > * monotonic_ns(): 15000000 ns > > * monotonic(): 15000000 ns > > > > The difference on time.time() is significant: 84 ns (2.8x better) vs > > 239 ns on Linux and 318 us (2.8x better) vs 894 us on Windows. 
The > > difference will be larger next years since every day adds > > 864,00,000,000,000 nanoseconds to the system clock :-) (please don't > > bug me with leap seconds! you got my point) > > > > The difference on perf_counter and monotonic clocks are not visible in > > this quick script since my script runs less than 1 minute, my computer > > uptime is smaller than 1 weak, ... and Python internally starts these > > clocks at zero *to reduce the precision loss*! Using an uptime larger > > than 104 days, you would probably see a significant difference (at > > least +/- 1 nanosecond) between the regular (seconds as double) and > > the "_ns" (nanoseconds as int) clocks. > > > > > > > > == How many new nanosecond clocks? == > > > > The PEP 410 proposed to modify the following functions: > > > > * os module: fstat(), fstatat(), lstat(), stat() (st_atime, st_ctime > > and st_mtime fields of the stat structure), sched_rr_get_interval(), > > times(), wait3() and wait4() > > > > * resource module: ru_utime and ru_stime fields of getrusage() > > > > * signal module: getitimer(), setitimer() > > > > * time module: clock(), clock_gettime(), clock_getres(), monotonic(), > > time() and wallclock() ("wallclock()" was finally called "monotonic", > > see PEP 418) > > > > > > According to my tests of the previous section, the precision loss > > starts after 104 days (stored in nanoseconds). I don't know if it's > > worth it to modify functions which return "CPU time" or "process time" > > of processes, since most processes live shorter than 104 days. Do you > > care of a resolution of 1 nanosecond for the CPU and process time? > > > > Maybe we need 1 nanosecond resolution for profiling and benchmarks. > > But in that case, you might want to implement your profiler in C > > rather in Python, like the hotshot module, no? The "pytime" private > > API of CPython gives you clocks with a resolution of 1 nanosecond. > > > > > > == Annex: clock performance == > > > > To have an idea of the cost of reading the clock on the clock > > resolution in Python, I also ran a microbenchmark on *reading* a > > clock. Example: > > > > $ ./python -m perf timeit --duplicate 1024 -s 'import time; t=time.time' > 't()' > > > > Linux (Mean +- std dev): > > > > * time.time(): 45.4 ns +- 0.5 ns > > * time.time_ns(): 47.8 ns +- 0.8 ns > > * time.perf_counter(): 46.8 ns +- 0.7 ns > > * time.perf_counter_ns(): 46.0 ns +- 0.6 ns > > > > Windows (Mean +- std dev): > > > > * time.time(): 42.2 ns +- 0.8 ns > > * time.time_ns(): 49.3 ns +- 0.8 ns > > * time.perf_counter(): 136 ns +- 2 ns <=== > > * time.perf_counter_ns(): 143 ns +- 4 ns <=== > > * time.monotonic(): 38.3 ns +- 0.9 ns > > * time.monotonic_ns(): 48.8 ns +- 1.2 ns > > > > Most clocks have the same performance except of perf_counter on > > Windows: around 140 ns whereas other clocks are around 45 ns (on Linux > > and Windows): 3x slower. Maybe the "bad" perf_counter performance can > > be explained by the fact that I'm running Windows in a VM, which is > > not ideal for benchmarking. Or maybe my C implementation of > > time.perf_counter() is slow? > > > > Note: I expect that a significant part of the numbers are the cost of > > Python function calls. Reading these clocks using the Python C > > functions are likely faster. 
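A minimal sketch of the resolution measurement described above (the smallest
non-zero delta between two consecutive reads of a clock), assuming the
proposed *_ns() functions are available; the helper name min_nonzero_delta is
only for illustration:

import time

def min_nonzero_delta(clock, samples=100_000):
    # Smallest non-zero difference between two consecutive clock reads,
    # i.e. the effective resolution as seen from Python.
    best = None
    prev = clock()
    for _ in range(samples):
        now = clock()
        delta = now - prev
        if delta > 0 and (best is None or delta < best):
            best = delta
        prev = now
    return best

print(min_nonzero_delta(time.time))      # float seconds, e.g. ~2.4e-07 per the figures above
print(min_nonzero_delta(time.time_ns))   # int nanoseconds, e.g. ~84 per the figures above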
> > > > > > Victor > > _______________________________________________ > > Python-ideas mailing list > > Python-ideas at python.org > > https://mail.python.org/mailman/listinfo/python-ideas > > Code of Conduct: http://python.org/psf/codeofconduct/ > > > > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From stephanh42 at gmail.com Sun Oct 15 15:13:27 2017 From: stephanh42 at gmail.com (Stephan Houben) Date: Sun, 15 Oct 2017 21:13:27 +0200 Subject: [Python-ideas] Why not picoseconds? In-Reply-To: References: <20171015191716.371fba63@fsol> Message-ID: Hi all, I propose multiples of the Planck time, 5.39 ? 10 ?44 s. Unlikely computers can be more accurate than that anytime soon. On a more serious note, I am not sure what problem was solved by moving from double to a fixed-precision format. I do know that it now introduced the issue of finding the correct base unit of the fixed-precision format. Stephan 2017-10-15 20:28 GMT+02:00 Victor Stinner : > I proposed to use nanoseconds because UNIX has 1 ns resolution in > timespec, the most recent API, and Windows has 100 ns. > > Using picoseconds would confuse users who may expect sub-nanosecond > resolution, whereas no OS support them currently. > > Moreover, nanoseconds as int already landed in os.stat and os.utime. > > Last but not least, I already strugle in pytime.c to prevent integer > overflow with 1 ns resolution. It can quickly become much more complex if > there is no native C int type supporting a range large enough to more 1 > picosecond resolution usable. I really like using int64_t for _PyTime_t, > it's well supported, very easy to use (ex: "t = t2 - t1"). 64-bit int > supports year after 2200 for delta since 1970. > > Note: I only know Ruby which chose picoseconds. > > Victor > > Le 15 oct. 2017 19:18, "Antoine Pitrou" a ?crit : > >> >> Since new APIs are expensive and we'd like to be future-proof, why not >> move to picoseconds? That would be safe until clocks reach the THz >> barrier, which is quite far away from us. >> >> Regards >> >> Antoine. >> >> >> On Fri, 13 Oct 2017 16:12:39 +0200 >> Victor Stinner >> wrote: >> > Hi, >> > >> > I would like to add new functions to return time as a number of >> > nanosecond (Python int), especially time.time_ns(). >> > >> > It would enhance the time.time() clock resolution. In my experience, >> > it decreases the minimum non-zero delta between two clock by 3 times, >> > new "ns" clock versus current clock: 84 ns (2.8x better) vs 239 ns on >> > Linux, and 318 us (2.8x better) vs 894 us on Windows, measured in >> > Python. >> > >> > The question of this long email is if it's worth it to add more "_ns" >> > time functions than just time.time_ns()? >> > >> > I would like to add: >> > >> > * time.time_ns() >> > * time.monotonic_ns() >> > * time.perf_counter_ns() >> > * time.clock_gettime_ns() >> > * time.clock_settime_ns() >> > >> > time(), monotonic() and perf_counter() clocks are the 3 most common >> > clocks and users use them to get the best available clock resolution. >> > clock_gettime/settime() are the generic UNIX API to access these >> > clocks and so should also be enhanced to get nanosecond resolution. 
>> > [snip]
>> >
>> > Most clocks have the same performance except of perf_counter on
>> > Windows: around 140 ns whereas other clocks are around 45 ns (on Linux
>> > and Windows): 3x slower.
Maybe the "bad" perf_counter performance can >> > be explained by the fact that I'm running Windows in a VM, which is >> > not ideal for benchmarking. Or maybe my C implementation of >> > time.perf_counter() is slow? >> > >> > Note: I expect that a significant part of the numbers are the cost of >> > Python function calls. Reading these clocks using the Python C >> > functions are likely faster. >> > >> > >> > Victor >> > _______________________________________________ >> > Python-ideas mailing list >> > Python-ideas at python.org >> > https://mail.python.org/mailman/listinfo/python-ideas >> > Code of Conduct: http://python.org/psf/codeofconduct/ >> > >> >> >> >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ >> > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From eric at trueblade.com Sun Oct 15 16:08:35 2017 From: eric at trueblade.com (Eric V. Smith) Date: Sun, 15 Oct 2017 16:08:35 -0400 Subject: [Python-ideas] Why not picoseconds? In-Reply-To: References: <20171015191716.371fba63@fsol> Message-ID: <10deafef-0a2d-cbd2-121f-f5dfe72d6380@trueblade.com> On 10/15/2017 3:13 PM, Stephan Houben wrote: > Hi all, > > I propose multiples of the Planck time, 5.39 ? 10 ^?44 s. > Unlikely computers can be more accurate than that anytime soon. > > On a more serious note, I am not sure what problem was solved by > moving from > double to a fixed-precision format. > > I do know that it now introduced the issue of finding > the correct base unit of the fixed-precision format. From Victor's original message, describing the current functions using 64-bit binary floating point numbers (aka double). They lose precision: "The problem is that Python returns time as a floatting point number which is usually a 64-bit binary floatting number (in the IEEE 754 format). This type starts to loose nanoseconds after 104 days." Eric. -------------- next part -------------- An HTML attachment was scrubbed... URL: From toddrjen at gmail.com Sun Oct 15 16:48:41 2017 From: toddrjen at gmail.com (Todd) Date: Sun, 15 Oct 2017 16:48:41 -0400 Subject: [Python-ideas] Add time.time_ns(): system clock with nanosecond resolution In-Reply-To: References: <20171014102122.7d718d93@fsol> Message-ID: On Sat, Oct 14, 2017 at 11:46 AM, Nick Coghlan wrote: > > On 14 October 2017 at 18:21, Antoine Pitrou wrote: > >> On Sat, 14 Oct 2017 10:49:11 +0300 >> Serhiy Storchaka >> wrote: >> > I don't like the idea of adding a parallel set of functions. >> > >> > In the list of alternatives in PEP 410 there is no an idea about fixed >> > precision float type with nanoseconds precision. It can be implemented >> > internally as a 64-bit integer, but provide all methods required for >> > float-compatible number. It would be simpler and faster than general >> > Decimal. >> >> I agree a parallel set of functions is not ideal, but I think a parallel >> set of functions is still more appropriate than a new number type >> specific to the time module. >> >> Also, if you change existing functions to return a new type, you risk >> breaking compatibility even if you are very careful about designing the >> new type. 
>> > > Might it make more sense to have a parallel *module* that works with a > different base data type rather than parallel functions within the existing > API? > > That is, if folks wanted to switch to 64-bit nanosecond time, they would > use: > > * time_ns.time() > * time_ns.monotonic() > * time_ns.perf_counter() > * time_ns.clock_gettime() > * time_ns.clock_settime() > > The idea here would be akin to the fact we have both math and cmath as > modules, where the common APIs conceptually implement the same algorithms, > they just work with a different numeric type (floats vs complex numbers). > > Cheers, > Nick. > What if we had a class, say time.time_base. The user could specify the base units (such as "s", "ns", 1e-7, etc) and the data type ('float', 'int', 'decimal', etc.) when the class is initialized. It would then present as methods the entire time API using that precision and data type. Then the existing functions could internally wrap an instance of the class where the base units are "1" and the data type is "float". That way the user could pick the representation most appropriate for their use-case, rather than python needing who knows how many different time formats for . The other advantage is that third-party module could potentially subclass this with additional options, such as an astronomy module providing an option to choose between sidereal time vs. solar time, without having to duplicate the entire API. -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Sun Oct 15 16:54:18 2017 From: guido at python.org (Guido van Rossum) Date: Sun, 15 Oct 2017 13:54:18 -0700 Subject: [Python-ideas] PEP draft: context variables In-Reply-To: References: <59E0FBBF.3010402@stoneleaf.us> Message-ID: All this arguing based on "equivalence" between different code fragments is nuts. The equivalences were never meant to be exact, and people don't typically understand code using generators using these equivalencies. The key problem we're trying to address is creating a "context" abstraction that can be controlled by async task frameworks without making the *access* API specific to the framework. Because coroutines and generators are similar under the covers, Yury demonstrated the issue with generators instead of coroutines (which are unfamiliar to many people). And then somehow we got hung up about fixing the problem in the example. I want to back out of this entirely, because the problem (while it can be demonstrated) is entirely theoretical, and the proposed solution is made too complicated by attempting to solve the problem for generators as well as for tasks. --Guido On Sat, Oct 14, 2017 at 10:43 PM, Nick Coghlan wrote: > On 15 October 2017 at 15:05, Guido van Rossum wrote: > >> I would like to reboot this discussion (again). It feels to me we're >> getting farther and farther from solving any of the problems we might solve. >> >> I think we need to give up on doing anything about generators; the use >> cases point in too many conflicting directions. So we should keep the >> semantics there, and if you don't want your numeric or decimal context to >> leak out of a generator, don't put `yield` inside `with`. (Yury and Stefan >> have both remarked that this is not a problem in practice, given that there >> are no bug reports or StackOverflow questions about this topic.) >> > > Let me have another go at building up the PEP 550 generator argument from > first principles. 
> > The behaviour that PEP 550 says *shouldn't* change is the semantic > equivalence of the following code: > > # Iterator form > class ResultsIterator: > def __init__(self, data): > self._itr = iter(data) > def __next__(self): > return calculate_result(next(self._itr)) > > results = _ResultsIterator(data) > > # Generator form > def _results_gen(data): > for item in data: > yield calculate_result(item) > > results = _results_gen(data) > > This *had* been non-controversial until recently, and I still don't > understand why folks suddenly decided we should bring it into question by > proposing that generators should start implicitly capturing state at > creation time just because it's technically possible for them to do so (yes > we can implicitly change the way all generators work, but no, we can't > implicitly change the way all *iterators* work). > > The behaviour that PEP 550 thinks *should* change is for the following > code to become roughly semantically equivalent, given the constraint that > the context manager involved either doesn't manipulate any shared state at > all (already supported), or else only manipulates context variables (the > new part that PEP 550 adds): > > # Iterator form > class ResultsIterator: > def __init__(self, data): > self._itr = iter(data) > def __next__(self): > with adjusted_context(): > return calculate_result(next(self._itr)) > > results = _ResultsIterator(data) > > # Generator form > def _results_gen(data): > for item in data: > with adjusted_context(): > yield calculate_result(item) > > results = _results_gen(data) > > Today, while these two forms look like they *should* be comparable, > they're not especially close to being semantically equivalent, as there's > no mechanism that allows for implicit context reversion at the yield point > in the generator form. > > While I think PEP 550 would still be usable without fixing this > discrepancy, I'd be thoroughly disappointed if the only reason we decided > not to do it was because we couldn't clearly articulate the difference in > reasoning between: > > * "Generators currently have no way to reasonably express the equivalent > of having a context-dependent return statement inside a with statement in a > __next__ method implementation, so let's define one" (aka "context variable > changes shouldn't leak out of generators, as that will make them *more* > like explicit iterator __next__ methods"); and > * "Generator functions should otherwise continue to be unsurprising > syntactic sugar for objects that implement the regular iterator protocol" > (aka "generators shouldn't implicitly capture their creation context, as > that would make them *less* like explicit iterator __init__ methods"). > > Cheers, > Nick. > > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Sun Oct 15 18:16:17 2017 From: steve at pearwood.info (Steven D'Aprano) Date: Mon, 16 Oct 2017 09:16:17 +1100 Subject: [Python-ideas] Add time.time_ns(): system clock with nanosecond resolution In-Reply-To: References: <20171014102122.7d718d93@fsol> Message-ID: <20171015221616.GY9068@ando.pearwood.info> On Sun, Oct 15, 2017 at 08:04:25PM +0300, Koos Zevenhoven wrote: > On Sun, Oct 15, 2017 at 6:58 PM, Guido van Rossum wrote: > > > I'd like to have time.time_ns() -- this is most parallel to st_mtime_ns. > > > > ? > Welcome to the list Guido! You sound like a C programmer. 
For many people, > that was the best language they knew of when they learned to program. But > have you ever tried Python? You should give it a try! You seem to be making a joke, but I have *no idea* what the joke is. Are you making fun of Guido's choice of preferred name? "time_ns" instead of "time_nanoseconds" perhaps? Other than that, I cannot imagine what the joke is about. Sorry for being so slow. -- Steve From guido at python.org Sun Oct 15 19:44:30 2017 From: guido at python.org (Guido van Rossum) Date: Sun, 15 Oct 2017 16:44:30 -0700 Subject: [Python-ideas] Add time.time_ns(): system clock with nanosecond resolution In-Reply-To: <20171015221616.GY9068@ando.pearwood.info> References: <20171014102122.7d718d93@fsol> <20171015221616.GY9068@ando.pearwood.info> Message-ID: Sorry, that's an in-joke. Koos is expressing his disappointment in the rejection of PEP 555 in a way that's only obvious if you're Dutch. On Sun, Oct 15, 2017 at 3:16 PM, Steven D'Aprano wrote: > On Sun, Oct 15, 2017 at 08:04:25PM +0300, Koos Zevenhoven wrote: > > On Sun, Oct 15, 2017 at 6:58 PM, Guido van Rossum > wrote: > > > > > I'd like to have time.time_ns() -- this is most parallel to > st_mtime_ns. > > > > > > ? > > Welcome to the list Guido! You sound like a C programmer. For many > people, > > that was the best language they knew of when they learned to program. But > > have you ever tried Python? You should give it a try! > > You seem to be making a joke, but I have *no idea* what the joke is. > > Are you making fun of Guido's choice of preferred name? "time_ns" > instead of "time_nanoseconds" perhaps? > > Other than that, I cannot imagine what the joke is about. Sorry for > being so slow. > > > -- > Steve > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From apalala at gmail.com Sun Oct 15 21:19:34 2017 From: apalala at gmail.com (=?UTF-8?Q?Juancarlo_A=C3=B1ez?=) Date: Sun, 15 Oct 2017 21:19:34 -0400 Subject: [Python-ideas] Add time.time_ns(): system clock with nanosecond resolution In-Reply-To: References: <20171014102122.7d718d93@fsol> <20171015221616.GY9068@ando.pearwood.info> Message-ID: It could be: time.time(ns=True) On Sun, Oct 15, 2017 at 7:44 PM, Guido van Rossum wrote: > Sorry, that's an in-joke. Koos is expressing his disappointment in the > rejection of PEP 555 in a way that's only obvious if you're Dutch. > > On Sun, Oct 15, 2017 at 3:16 PM, Steven D'Aprano > wrote: > >> On Sun, Oct 15, 2017 at 08:04:25PM +0300, Koos Zevenhoven wrote: >> > On Sun, Oct 15, 2017 at 6:58 PM, Guido van Rossum >> wrote: >> > >> > > I'd like to have time.time_ns() -- this is most parallel to >> st_mtime_ns. >> > > >> > > ? >> > Welcome to the list Guido! You sound like a C programmer. For many >> people, >> > that was the best language they knew of when they learned to program. >> But >> > have you ever tried Python? You should give it a try! >> >> You seem to be making a joke, but I have *no idea* what the joke is. >> >> Are you making fun of Guido's choice of preferred name? "time_ns" >> instead of "time_nanoseconds" perhaps? >> >> Other than that, I cannot imagine what the joke is about. Sorry for >> being so slow. 
>> >> >> -- >> Steve >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ >> > > > > -- > --Guido van Rossum (python.org/~guido) > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > > -- Juancarlo *A?ez* -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Sun Oct 15 23:40:36 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 16 Oct 2017 13:40:36 +1000 Subject: [Python-ideas] Why not picoseconds? In-Reply-To: References: <20171015191716.371fba63@fsol> Message-ID: On 16 October 2017 at 04:28, Victor Stinner wrote: > I proposed to use nanoseconds because UNIX has 1 ns resolution in > timespec, the most recent API, and Windows has 100 ns. > > Using picoseconds would confuse users who may expect sub-nanosecond > resolution, whereas no OS support them currently. > > Moreover, nanoseconds as int already landed in os.stat and os.utime. > And this precedent also makes sense to me as the rationale for using an "_ns" API suffix within the existing module rather than introducing a new module. > Last but not least, I already strugle in pytime.c to prevent integer > overflow with 1 ns resolution. It can quickly become much more complex if > there is no native C int type supporting a range large enough to more 1 > picosecond resolution usable. I really like using int64_t for _PyTime_t, > it's well supported, very easy to use (ex: "t = t2 - t1"). 64-bit int > supports year after 2200 for delta since 1970. > Hopefully by the time we decide it's worth worrying about picoseconds in "regular" code, compiler support for decimal128 will be sufficiently ubiquitous that we'll be able to rely on that as our 3rd generation time representation (where the first gen is seconds as a 64 bit binary float and the second gen is nanoseconds as a 64 bit integer). Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Mon Oct 16 00:15:41 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 16 Oct 2017 14:15:41 +1000 Subject: [Python-ideas] PEP draft: context variables In-Reply-To: References: <59E0FBBF.3010402@stoneleaf.us> Message-ID: On 15 October 2017 at 20:45, Paul Moore wrote: > On 15 October 2017 at 06:43, Nick Coghlan wrote: > > > # Generator form > > def _results_gen(data): > > for item in data: > > with adjusted_context(): > > yield calculate_result(item) > > > > results = _results_gen(data) > > > > Today, while these two forms look like they *should* be comparable, > they're > > not especially close to being semantically equivalent, as there's no > > mechanism that allows for implicit context reversion at the yield point > in > > the generator form. > > I'll have to take your word for this, as I can't think of an actual > example that follows the pattern of your abstract description, for > which I can immediately see the difference. 
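One concrete instance of the pattern is easy to build with the decimal
module; the sketch below assumes the thread-local decimal context semantics
in place at the time of this discussion, and shows the adjustment made around
the yield leaking into the caller while the generator is suspended:

import decimal

def high_precision_results(data):
    for item in data:
        with decimal.localcontext() as ctx:
            ctx.prec = 50              # the "adjusted context"
            yield item.sqrt()          # generator is suspended inside the with block

gen = high_precision_results([decimal.Decimal(2), decimal.Decimal(3)])
print(decimal.getcontext().prec)   # 28: the generator body has not started yet
value = next(gen)                  # runs up to the yield; the with block is still active
print(decimal.getcontext().prec)   # 50: the adjusted context has leaked into the caller

In the explicit iterator form, where the with statement wraps the return
inside __next__, the saved context is restored before control gets back to
the caller; that is exactly the discrepancy being described.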
> Interestingly, thinking about the problem in terms of exception handling flow reminded me of the fact that having a generator-iterator yield while inside a with statement or try/except block is already considered an anti-pattern in many situations, precisely because it means that any exceptions that get thrown in (including GeneratorExit) will be intercepted when that may not be what the author really intended. Accordingly, the canonical guaranteed-to-be-consistent-with-the-previous-behaviour iterator -> generator transformation already involves the use of a temporary variable to move the yield outside any exception handling constructs and ensure that the exception handling only covers the code that it was originally written to cover: def _results_gen(data): for item in data: with adjusted_context(): result_for_item = calculate_result(item) yield result_for_item results = _results_gen(data) The exception handling expectations with coroutines are different, since an "await cr" expression explicitly indicates that any exceptions "cr" fails to handle *will* propagate back through to where the await appears, just as "f()" indicates that unhandled exceptions in "f" will be seen by the current frame. And even if as a new API context variables were to be defined in a yield-tolerant way, a lot of existing context managers still wouldn't be "yield safe", since they may be manipulating thread local or process global state, rather than context variables or a particular object instance. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Mon Oct 16 01:00:10 2017 From: guido at python.org (Guido van Rossum) Date: Sun, 15 Oct 2017 22:00:10 -0700 Subject: [Python-ideas] Why not picoseconds? In-Reply-To: References: <20171015191716.371fba63@fsol> Message-ID: On Sun, Oct 15, 2017 at 8:40 PM, Nick Coghlan wrote: > Hopefully by the time we decide it's worth worrying about picoseconds in > "regular" code, compiler support for decimal128 will be sufficiently > ubiquitous that we'll be able to rely on that as our 3rd generation time > representation (where the first gen is seconds as a 64 bit binary float and > the second gen is nanoseconds as a 64 bit integer). > I hope we'll never see time_ns() and friends as the second generation -- it's a hack that hopefully we can retire in those glorious days of hardware decimal128 support. -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From stephanh42 at gmail.com Mon Oct 16 02:40:33 2017 From: stephanh42 at gmail.com (Stephan Houben) Date: Mon, 16 Oct 2017 08:40:33 +0200 Subject: [Python-ideas] Why not picoseconds? In-Reply-To: <10deafef-0a2d-cbd2-121f-f5dfe72d6380@trueblade.com> References: <20171015191716.371fba63@fsol> <10deafef-0a2d-cbd2-121f-f5dfe72d6380@trueblade.com> Message-ID: 2017-10-15 22:08 GMT+02:00 Eric V. Smith : >From Victor's original message, describing the current functions using 64-bit binary floating point numbers (aka double). They lose precision: "The problem is that Python returns time as a floatting point number > which is usually a 64-bit binary floatting number (in the IEEE 754 > format). This type starts to loose nanoseconds after 104 days." > > Do we realize that at this level of accuracy, relativistic time dilatation due to continental drift starts to matter? Stephan > Eric. 
> > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefan at bytereef.org Mon Oct 16 03:03:14 2017 From: stefan at bytereef.org (Stefan Krah) Date: Mon, 16 Oct 2017 09:03:14 +0200 Subject: [Python-ideas] Why not picoseconds? In-Reply-To: References: <20171015191716.371fba63@fsol> <10deafef-0a2d-cbd2-121f-f5dfe72d6380@trueblade.com> Message-ID: <20171016070314.GA2706@bytereef.org> On Mon, Oct 16, 2017 at 08:40:33AM +0200, Stephan Houben wrote: > "The problem is that Python returns time as a floatting point number > > which is usually a 64-bit binary floatting number (in the IEEE 754 > > format). This type starts to loose nanoseconds after 104 days." > > > > > Do we realize that at this level of accuracy, relativistic time dilatation > due > to continental drift starts to matter? tai64na has supported attoseconds for quite some time: https://cr.yp.to/libtai/tai64.html The relativity issue is declared to be out of scope for the document. :-) Stefan Krah From stephanh42 at gmail.com Mon Oct 16 03:46:00 2017 From: stephanh42 at gmail.com (Stephan Houben) Date: Mon, 16 Oct 2017 09:46:00 +0200 Subject: [Python-ideas] Why not picoseconds? In-Reply-To: <20171016070314.GA2706@bytereef.org> References: <20171015191716.371fba63@fsol> <10deafef-0a2d-cbd2-121f-f5dfe72d6380@trueblade.com> <20171016070314.GA2706@bytereef.org> Message-ID: Hi all, I realize this is a bit of a pet peeve of me, since in my day job I sometimes get people complaining that numerical data is "off" in the sixteenth significant digit (i.e. it was stored as a double). I have a bunch of favorite comparisons to make this clear how accurate a "double" really is: you can measure the distance to Neptune down to a millimeter with it, or the time from the extinction of the dinosaurs until now down to half-second resolution. Unfortunately for my argument, measuring time is one of the most accurate physical measurements we can make, and the best of the best exceed double-precision accuracy. For example, the best atomic clock https://www.sciencealert.com/physicists-have-broken-the-record-for-the-most-accurate-clock-ever-built achieves a measurement uncertainty of 3e-18, which is about two order of magnitudes more accurate than what double-precision gives you; the latter runs out of steam at 2.2e-16. So I realize there is obvious a strong need for (the illusion of) such precise time keeping in the Python API . Interestingly, that 2.2e-16 pretty much aligns with the accuracy of the cesium atomic clocks which are currently used to *define* the second. So we move to this new API, we should provide our own definition of the second, since those rough SI seconds are just too imprecise for that. Stephan 2017-10-16 9:03 GMT+02:00 Stefan Krah : > On Mon, Oct 16, 2017 at 08:40:33AM +0200, Stephan Houben wrote: > > "The problem is that Python returns time as a floatting point number > > > which is usually a 64-bit binary floatting number (in the IEEE 754 > > > format). This type starts to loose nanoseconds after 104 days." > > > > > > > > Do we realize that at this level of accuracy, relativistic time > dilatation > > due > > to continental drift starts to matter? 
> > tai64na has supported attoseconds for quite some time: > > https://cr.yp.to/libtai/tai64.html > > The relativity issue is declared to be out of scope for the document. :-) > > > Stefan Krah > > > > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From victor.stinner at gmail.com Mon Oct 16 03:54:57 2017 From: victor.stinner at gmail.com (Victor Stinner) Date: Mon, 16 Oct 2017 09:54:57 +0200 Subject: [Python-ideas] Add time.time_ns(): system clock with nanosecond resolution In-Reply-To: References: <20171014102122.7d718d93@fsol> <20171015221616.GY9068@ando.pearwood.info> Message-ID: 2017-10-16 3:19 GMT+02:00 Juancarlo A?ez : > It could be: time.time(ns=True) Please read my initial message: """ [PEP 410] was rejected for different reasons: (...) * Guido van Rossum rejected the idea of adding a new optional parameter to change the result type: it's an uncommon programming practice (bad design in Python) """ Victor From victor.stinner at gmail.com Mon Oct 16 03:58:55 2017 From: victor.stinner at gmail.com (Victor Stinner) Date: Mon, 16 Oct 2017 09:58:55 +0200 Subject: [Python-ideas] Add time.time_ns(): system clock with nanosecond resolution In-Reply-To: References: <20171014102122.7d718d93@fsol> Message-ID: Hi, > What if we had a class, say time.time_base. The user could specify the base > units (such as "s", "ns", 1e-7, etc) and the data type ('float', 'int', > 'decimal', etc.) when the class is initialized. (...) It's easy to invent various funny new types for arbitrary precision. But I prefer reusing the standard Python int type since it's very well known and very easy to manipulate. There is not need to modify the whole stdlib to support a new type. Moreover, we don't have to "implement a new type". > The other advantage is that third-party module could potentially subclass > this with additional options, such as an astronomy module providing an > option to choose between sidereal time vs. solar time, without having to > duplicate the entire API. Using nanoseconds as int doesn't prevent you to convert it to your own nice class. Victor From solipsis at pitrou.net Mon Oct 16 04:20:23 2017 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 16 Oct 2017 10:20:23 +0200 Subject: [Python-ideas] Why not picoseconds? References: <20171015191716.371fba63@fsol> Message-ID: <20171016102023.3e0c5385@fsol> On Sun, 15 Oct 2017 22:00:10 -0700 Guido van Rossum wrote: > On Sun, Oct 15, 2017 at 8:40 PM, Nick Coghlan wrote: > > > Hopefully by the time we decide it's worth worrying about picoseconds in > > "regular" code, compiler support for decimal128 will be sufficiently > > ubiquitous that we'll be able to rely on that as our 3rd generation time > > representation (where the first gen is seconds as a 64 bit binary float and > > the second gen is nanoseconds as a 64 bit integer). > > > > I hope we'll never see time_ns() and friends as the second generation -- > it's a hack that hopefully we can retire in those glorious days of hardware > decimal128 support. Given the implementation costs, hardware decimal128 will only become mainstream if there's a strong incentive for it, which I'm not sure exists or will ever exist ;-) Regards Antoine. 
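The point that a plain int of nanoseconds composes well with richer types can
be shown with a small sketch; it assumes the proposed time.time_ns() (which
eventually landed in Python 3.7), and the Decimal/Fraction wrappers are
merely examples, not a proposed API:

import time
from decimal import Decimal
from fractions import Fraction

t_ns = time.time_ns()                     # int: nanoseconds since the Unix epoch

lossy   = t_ns / 10**9                    # float seconds: no longer nanosecond-exact
exact_d = Decimal(t_ns) / Decimal(10**9)  # decimal seconds (exact at the default 28-digit precision)
exact_f = Fraction(t_ns, 10**9)           # exact rational seconds

# A signed 64-bit integer of nanoseconds spans roughly +/- 292 years, which is
# why int64_t comfortably covers deltas relative to 1970:
print(2**63 / 10**9 / 86400 / 365.25)     # ~292.3 years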
From songofacandy at gmail.com Mon Oct 16 04:50:28 2017 From: songofacandy at gmail.com (INADA Naoki) Date: Mon, 16 Oct 2017 17:50:28 +0900 Subject: [Python-ideas] Add time.time_ns(): system clock with nanosecond resolution In-Reply-To: References: Message-ID: I'm completely +1 to Victor. nanosecond integer timestamp is used these days. And no complex newtype or newmodule is not needed for supporting it. INADA Naoki On Fri, Oct 13, 2017 at 11:12 PM, Victor Stinner wrote: > Hi, > > I would like to add new functions to return time as a number of > nanosecond (Python int), especially time.time_ns(). > > It would enhance the time.time() clock resolution. In my experience, > it decreases the minimum non-zero delta between two clock by 3 times, > new "ns" clock versus current clock: 84 ns (2.8x better) vs 239 ns on > Linux, and 318 us (2.8x better) vs 894 us on Windows, measured in > Python. > > The question of this long email is if it's worth it to add more "_ns" > time functions than just time.time_ns()? > > I would like to add: > > * time.time_ns() > * time.monotonic_ns() > * time.perf_counter_ns() > * time.clock_gettime_ns() > * time.clock_settime_ns() > > time(), monotonic() and perf_counter() clocks are the 3 most common > clocks and users use them to get the best available clock resolution. > clock_gettime/settime() are the generic UNIX API to access these > clocks and so should also be enhanced to get nanosecond resolution. > > > == Nanosecond resolution == > > More and more clocks have a frequency in MHz, up to GHz for the "TSC" > CPU clock, and so the clocks resolution is getting closer to 1 > nanosecond (or even better than 1 ns for the TSC clock!). > > The problem is that Python returns time as a floatting point number > which is usually a 64-bit binary floatting number (in the IEEE 754 > format). This type starts to loose nanoseconds after 104 days. > Conversion from nanoseconds (int) to seconds (float) and then back to > nanoseconds (int) to check if conversions loose precision: > > # no precision loss >>>> x=2**52+1; int(float(x * 1e-9) * 1e9) - x > 0 > # precision loss! (1 nanosecond) >>>> x=2**53+1; int(float(x * 1e-9) * 1e9) - x > -1 >>>> print(datetime.timedelta(seconds=2**53 / 1e9)) > 104 days, 5:59:59.254741 > > While a system administrator can be proud to have an uptime longer > than 104 days, the problem also exists for the time.time() clock which > returns the number of seconds since the UNIX epoch (1970-01-01). This > clock started to loose nanoseconds since mid-May 1970 (47 years ago): > >>>> import datetime >>>> print(datetime.datetime(1970, 1, 1) + datetime.timedelta(seconds=2**53 / 1e9)) > 1970-04-15 05:59:59.254741 > > > == PEP 410 == > > Five years ago, I proposed a large and complex change in all Python > functions returning time to support nanosecond resolution using the > decimal.Decimal type: > > https://www.python.org/dev/peps/pep-0410/ > > The PEP was rejected for different reasons: > > * it wasn't clear if hardware clocks really had a resolution of 1 > nanosecond, especially when the clock is read from Python, since > reading a clock in Python also takes time... 
> > * Guido van Rossum rejected the idea of adding a new optional > parameter to change the result type: it's an uncommon programming > practice (bad design in Python) > > * decimal.Decimal is not widely used, it might be surprised to get such type > > > == CPython enhancements of the last 5 years == > > Since this PEP was rejected: > > * the os.stat_result got 3 fields for timestamps as nanoseconds > (Python int): st_atime_ns, st_ctime_ns, st_mtime_ns > > * Python 3.3 got 3 new clocks: time.monotonic(), time.perf_counter() > and time.process_time() > > * I enhanced the private C API of Python handling time (API called > "pytime") to store all timings as the new _PyTime_t type which is a > simple 64-bit signed integer. The unit of _PyTime_t is not part of the > API, it's an implementation detail. The unit is currently 1 > nanosecond. > > > This week, I converted one of the last clock to new _PyTime_t format: > time.perf_counter() now has internally a resolution of 1 nanosecond, > instead of using the C double type. > > XXX technically https://github.com/python/cpython/pull/3983 is not > merged yet :-) > > > > == Clocks resolution in Python == > > I implemented time.time_ns(), time.monotonic_ns() and > time.perf_counter_ns() which are similar of the functions without the > "_ns" suffix, but return time as nanoseconds (Python int). > > I computed the smallest difference between two clock reads (ignoring a > differences of zero): > > Linux: > > * time_ns(): 84 ns <=== !!! > * time(): 239 ns <=== !!! > * perf_counter_ns(): 84 ns > * perf_counter(): 82 ns > * monotonic_ns(): 84 ns > * monotonic(): 81 ns > > Windows: > > * time_ns(): 318000 ns <=== !!! > * time(): 894070 ns <=== !!! > * perf_counter_ns(): 100 ns > * perf_counter(): 100 ns > * monotonic_ns(): 15000000 ns > * monotonic(): 15000000 ns > > The difference on time.time() is significant: 84 ns (2.8x better) vs > 239 ns on Linux and 318 us (2.8x better) vs 894 us on Windows. The > difference will be larger next years since every day adds > 864,00,000,000,000 nanoseconds to the system clock :-) (please don't > bug me with leap seconds! you got my point) > > The difference on perf_counter and monotonic clocks are not visible in > this quick script since my script runs less than 1 minute, my computer > uptime is smaller than 1 weak, ... and Python internally starts these > clocks at zero *to reduce the precision loss*! Using an uptime larger > than 104 days, you would probably see a significant difference (at > least +/- 1 nanosecond) between the regular (seconds as double) and > the "_ns" (nanoseconds as int) clocks. > > > > == How many new nanosecond clocks? == > > The PEP 410 proposed to modify the following functions: > > * os module: fstat(), fstatat(), lstat(), stat() (st_atime, st_ctime > and st_mtime fields of the stat structure), sched_rr_get_interval(), > times(), wait3() and wait4() > > * resource module: ru_utime and ru_stime fields of getrusage() > > * signal module: getitimer(), setitimer() > > * time module: clock(), clock_gettime(), clock_getres(), monotonic(), > time() and wallclock() ("wallclock()" was finally called "monotonic", > see PEP 418) > > > According to my tests of the previous section, the precision loss > starts after 104 days (stored in nanoseconds). I don't know if it's > worth it to modify functions which return "CPU time" or "process time" > of processes, since most processes live shorter than 104 days. Do you > care of a resolution of 1 nanosecond for the CPU and process time? 
> > Maybe we need 1 nanosecond resolution for profiling and benchmarks. > But in that case, you might want to implement your profiler in C > rather in Python, like the hotshot module, no? The "pytime" private > API of CPython gives you clocks with a resolution of 1 nanosecond. > > > == Annex: clock performance == > > To have an idea of the cost of reading the clock on the clock > resolution in Python, I also ran a microbenchmark on *reading* a > clock. Example: > > $ ./python -m perf timeit --duplicate 1024 -s 'import time; t=time.time' 't()' > > Linux (Mean +- std dev): > > * time.time(): 45.4 ns +- 0.5 ns > * time.time_ns(): 47.8 ns +- 0.8 ns > * time.perf_counter(): 46.8 ns +- 0.7 ns > * time.perf_counter_ns(): 46.0 ns +- 0.6 ns > > Windows (Mean +- std dev): > > * time.time(): 42.2 ns +- 0.8 ns > * time.time_ns(): 49.3 ns +- 0.8 ns > * time.perf_counter(): 136 ns +- 2 ns <=== > * time.perf_counter_ns(): 143 ns +- 4 ns <=== > * time.monotonic(): 38.3 ns +- 0.9 ns > * time.monotonic_ns(): 48.8 ns +- 1.2 ns > > Most clocks have the same performance except of perf_counter on > Windows: around 140 ns whereas other clocks are around 45 ns (on Linux > and Windows): 3x slower. Maybe the "bad" perf_counter performance can > be explained by the fact that I'm running Windows in a VM, which is > not ideal for benchmarking. Or maybe my C implementation of > time.perf_counter() is slow? > > Note: I expect that a significant part of the numbers are the cost of > Python function calls. Reading these clocks using the Python C > functions are likely faster. > > > Victor > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From k7hoven at gmail.com Mon Oct 16 04:56:26 2017 From: k7hoven at gmail.com (Koos Zevenhoven) Date: Mon, 16 Oct 2017 11:56:26 +0300 Subject: [Python-ideas] Add time.time_ns(): system clock with nanosecond resolution In-Reply-To: References: <20171014102122.7d718d93@fsol> <20171015221616.GY9068@ando.pearwood.info> Message-ID: On Mon, Oct 16, 2017 at 2:44 AM, Guido van Rossum wrote: > Sorry, that's an in-joke. Koos is expressing his disappointment in the > rejection of PEP 555 in a way that's only obvious if you're Dutch. > > Hmm, or more accurately, it has to do with me going crazy because of the frustration of how the PEP 555 discussions went. We could have arrived at the same conclusion [*] in much less time and with less pulling one's hair out. But this discrepancy probably comes from the fact that we're not dealing with the most pure kind of being Dutch here. ?Koos ?[*] Although right now different people still have slightly different ideas about what that conclusion is. ? > On Sun, Oct 15, 2017 at 3:16 PM, Steven D'Aprano > wrote: > >> On Sun, Oct 15, 2017 at 08:04:25PM +0300, Koos Zevenhoven wrote: >> > On Sun, Oct 15, 2017 at 6:58 PM, Guido van Rossum >> wrote: >> > >> > > I'd like to have time.time_ns() -- this is most parallel to >> st_mtime_ns. >> > > >> > > ? >> > Welcome to the list Guido! You sound like a C programmer. For many >> people, >> > that was the best language they knew of when they learned to program. >> But >> > have you ever tried Python? You should give it a try! >> >> You seem to be making a joke, but I have *no idea* what the joke is. >> >> Are you making fun of Guido's choice of preferred name? "time_ns" >> instead of "time_nanoseconds" perhaps? 
>> >> Other than that, I cannot imagine what the joke is about. Sorry for >> being so slow. >> >> >> -- >> Steve >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ >> > > > > -- > --Guido van Rossum (python.org/~guido) > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > > -- + Koos Zevenhoven + http://twitter.com/k7hoven + -------------- next part -------------- An HTML attachment was scrubbed... URL: From greg.ewing at canterbury.ac.nz Mon Oct 16 03:53:06 2017 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Mon, 16 Oct 2017 20:53:06 +1300 Subject: [Python-ideas] Why not picoseconds? In-Reply-To: References: <20171015191716.371fba63@fsol> <10deafef-0a2d-cbd2-121f-f5dfe72d6380@trueblade.com> Message-ID: <59E46562.5030705@canterbury.ac.nz> Stephan Houben wrote: > Do we realize that at this level of accuracy, relativistic time > dilatation due to continental drift starts to matter? Probably also want an accurate GPS position for your computer so that variations in the local gravitational field can be taken into account. -- Greg From python at 2sn.net Mon Oct 16 07:03:18 2017 From: python at 2sn.net (Alexander Heger) Date: Mon, 16 Oct 2017 22:03:18 +1100 Subject: [Python-ideas] Why not picoseconds? In-Reply-To: <59E46562.5030705@canterbury.ac.nz> References: <20171015191716.371fba63@fsol> <10deafef-0a2d-cbd2-121f-f5dfe72d6380@trueblade.com> <59E46562.5030705@canterbury.ac.nz> Message-ID: w/r relativistic effects and continental drift - not really. The speed is about 1cm/yr or v = 1e-18 c. Relativistic effect would go like 0.5 * (v/c)**2, so more like 5E-37 in relative rate of proper time. You can just barely capture a few minutes of that even with int128 resolution. As for financial incentive for int128, considering the super-rich will get exponentially more rich while inflation devalues the possession of the rest, the super-rich will eventually have demand for int128 to be able to count their wealth down to the last ct. It's a lot for money, though. A lot more than the net value of all real things on the planet. But the net value of assets on people's bank accounts already exceeds that number by some factor 50-100, and this factor will grow exponentially as more financial "products" are being created. When I was a student (~1990) we had to create new bookkeeping software using int64 ("comp") because the investment company could not deal with billions of dollars worth of Italian Lira down to centimos otherwise. Accounting needs will get us int128, sooner than you think, don't worry. Yes, it would be good to create routines than can handle time in units provided, maybe as a string. Or you could just create a time object that handles this internally, e.g., algebraic operations, similar to numpy, at the accuracy available. You can access its value by providing the desired divisor, and you may inquires the available precision. If you want to handle this properly, it may get you back to interval arithmetics. 
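A rough sketch of what such a time object could look like (the names here are invented for illustration; it just stores an exact integer count of nanoseconds and reports its value in whatever unit you ask for):

class Timestamp:
    resolution = 10**-9            # precision actually available, in seconds

    def __init__(self, nanoseconds):
        self._ns = nanoseconds     # exact integer count of nanoseconds

    def __sub__(self, other):
        # algebraic operations stay exact because they work on the integers
        return Timestamp(self._ns - other._ns)

    def value(self, divisor=1):
        # the time expressed in units of `divisor` seconds,
        # e.g. value(1e-6) gives microseconds
        return self._ns / (divisor * 10**9)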
Regarding concerns about the effect of gravity on your computer, there will be a press release in a few h that may be of interest to some https://www.ligo.caltech.edu/news/ligo20171011 -Alexander On 16 October 2017 at 18:53, Greg Ewing wrote: > Stephan Houben wrote: > >> Do we realize that at this level of accuracy, relativistic time >> dilatation due to continental drift starts to matter? >> > > Probably also want an accurate GPS position for your computer > so that variations in the local gravitational field can be > taken into account. > > -- > Greg > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From apalala at gmail.com Mon Oct 16 07:08:40 2017 From: apalala at gmail.com (=?UTF-8?Q?Juancarlo_A=C3=B1ez?=) Date: Mon, 16 Oct 2017 07:08:40 -0400 Subject: [Python-ideas] PEP draft: context variables In-Reply-To: References: <59E0FBBF.3010402@stoneleaf.us> Message-ID: > Interestingly, thinking about the problem in terms of exception handling > flow reminded me of the fact that having a generator-iterator yield while > inside a with statement or try/except block is already considered an > anti-pattern in many situations, precisely because it means that any > exceptions that get thrown in (including GeneratorExit) will be intercepted > when that may not be what the author really intended. > > It all works fine now: https://github.com/neogeny/TatSu/blob/master/tatsu/contexts.py So, I have a strong requirement: whatever is decided on this PEP... Please don't break it? (or make it illegal) -- Juancarlo *A?ez* -------------- next part -------------- An HTML attachment was scrubbed... URL: From victor.stinner at gmail.com Mon Oct 16 08:05:57 2017 From: victor.stinner at gmail.com (Victor Stinner) Date: Mon, 16 Oct 2017 14:05:57 +0200 Subject: [Python-ideas] Why not picoseconds? In-Reply-To: <20171015191716.371fba63@fsol> References: <20171015191716.371fba63@fsol> Message-ID: Hi, FYI I proposed the PEP 564 directly on python-dev. The paragraph about "picosecond": https://www.python.org/dev/peps/pep-0564/#sub-nanosecond-resolution Let's move the discussion on python-dev ;-) Victor From ncoghlan at gmail.com Mon Oct 16 08:05:51 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 16 Oct 2017 22:05:51 +1000 Subject: [Python-ideas] PEP draft: context variables In-Reply-To: References: <59E0FBBF.3010402@stoneleaf.us> Message-ID: On 16 October 2017 at 21:08, Juancarlo A?ez wrote: > > Interestingly, thinking about the problem in terms of exception handling >> flow reminded me of the fact that having a generator-iterator yield while >> inside a with statement or try/except block is already considered an >> anti-pattern in many situations, precisely because it means that any >> exceptions that get thrown in (including GeneratorExit) will be intercepted >> when that may not be what the author really intended. >> >> > It all works fine now: > > https://github.com/neogeny/TatSu/blob/master/tatsu/contexts.py > > > So, I have a strong requirement: whatever is decided on this PEP... > > Please don't break it? (or make it illegal) > The "anti-pattern in many situations" qualifier was there because there are cases where it's explicitly expected to work, and isn't an anti-pattern at all (e.g. 
when the generator is decorated with contextlib.contextmanager, or when you're using a context manager to hold open an external resource like a file until the generator is closed). So this wasn't intended as an argument for changing anything - rather, it's about my changing my perspective on how beneficial it would be to have generators default to maintaining their own distinct logical context (which then becomes an argument for *not* changing anything). Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia -------------- next part -------------- An HTML attachment was scrubbed... URL: From greg.ewing at canterbury.ac.nz Mon Oct 16 08:30:41 2017 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 17 Oct 2017 01:30:41 +1300 Subject: [Python-ideas] Why not picoseconds? In-Reply-To: References: <20171015191716.371fba63@fsol> <10deafef-0a2d-cbd2-121f-f5dfe72d6380@trueblade.com> <20171016070314.GA2706@bytereef.org> Message-ID: <59E4A671.2020207@canterbury.ac.nz> Stephan Houben wrote: > Interestingly, that 2.2e-16 pretty much aligns with the accuracy of the > cesium atomic clocks which are currently used to *define* the second. > So we move to this new API, we should provide our own definition > of the second, since those rough SI seconds are just too imprecise for that. The Python second: 1.1544865564196655e-06 of the time taken for an unladen swallow to fly from Cape Town to London. -- Greg From victor.stinner at gmail.com Mon Oct 16 09:10:38 2017 From: victor.stinner at gmail.com (Victor Stinner) Date: Mon, 16 Oct 2017 15:10:38 +0200 Subject: [Python-ideas] Why not picoseconds? In-Reply-To: References: <20171015191716.371fba63@fsol> <10deafef-0a2d-cbd2-121f-f5dfe72d6380@trueblade.com> <20171016070314.GA2706@bytereef.org> Message-ID: 2017-10-16 9:46 GMT+02:00 Stephan Houben : > Hi all, > > I realize this is a bit of a pet peeve of me, since > in my day job I sometimes get people complaining that > numerical data is "off" in the sixteenth significant digit > (i.e. it was stored as a double). > (...) Oh. As usual, I suck at explaining the rationale. I'm sorry about that. The problem is not to know the number of nanoseconds since 1970. The problem is that you lose precision even on short durations, say less than 1 minute. The precision loss depends on the reference of the clock, which can be the UNIX epoch, a reference based on the computer boot time, a reference based on the process startup time, etc. Let's say that your web server runs since 105 days, now you want to measure how much time it took to render a HTML template and this time is smaller than 1 ms. In such case, the benchmark lose precision just because of the float type, not because of the clock resolution. That's one use case. Another use case is when you have two applications storing time. A program A writes a timestamp, another program B compares the timestamp to the current time. To explain the issue, let's say that the format used to store the timestamp and clock used by program A have a resolution of 1 nanosecond, whereas the clock used by program B has a resolution of 1 second. In that case, there is a window of 1 second where the time is seen as "created in the future". For example, the GNU tar program emits a warning in that case ("file created in the future", or something like that). More generally, if you store time with a resolution N and your clock has resolution P, it's better to have N >= P to prevent bad surprises. 
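To make the size of that loss concrete, here is a tiny example (the timestamp is an arbitrary 2017-era number of seconds since the epoch; any value of that magnitude behaves the same):

>>> t = 1508151763.0   # roughly October 2017, as seconds since 1970
>>> t + 1e-7 == t      # a 100 ns step disappears completely
True
>>> t + 3e-7 == t      # only steps larger than ~119 ns survive the rounding
False

The spacing between adjacent floats at this magnitude is 2**-22 seconds, about 238 ns, which matches the ~239 ns minimum difference measured for time.time() above.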
More and more databases and filesystems support storing time with nanosecond resolution. Victor From victor.stinner at gmail.com Mon Oct 16 08:06:43 2017 From: victor.stinner at gmail.com (Victor Stinner) Date: Mon, 16 Oct 2017 14:06:43 +0200 Subject: [Python-ideas] Add time.time_ns(): system clock with nanosecond resolution In-Reply-To: References: Message-ID: Hi, FYI I proposed the PEP 564 on python-dev. "PEP 564 -- Add new time functions with nanosecond resolution" https://www.python.org/dev/peps/pep-0564/ Victor From python at mrabarnett.plus.com Mon Oct 16 10:42:11 2017 From: python at mrabarnett.plus.com (MRAB) Date: Mon, 16 Oct 2017 15:42:11 +0100 Subject: [Python-ideas] Why not picoseconds? In-Reply-To: <59E4A671.2020207@canterbury.ac.nz> References: <20171015191716.371fba63@fsol> <10deafef-0a2d-cbd2-121f-f5dfe72d6380@trueblade.com> <20171016070314.GA2706@bytereef.org> <59E4A671.2020207@canterbury.ac.nz> Message-ID: <19a71ce3-f917-1d13-6247-52b78d310b27@mrabarnett.plus.com> On 2017-10-16 13:30, Greg Ewing wrote: > Stephan Houben wrote: > >> Interestingly, that 2.2e-16 pretty much aligns with the accuracy of the >> cesium atomic clocks which are currently used to *define* the second. >> So we move to this new API, we should provide our own definition >> of the second, since those rough SI seconds are just too imprecise for that. > > The Python second: 1.1544865564196655e-06 of the time > taken for an unladen swallow to fly from Cape Town > to London. > Is that an African or a European swallow? From k7hoven at gmail.com Mon Oct 16 10:42:42 2017 From: k7hoven at gmail.com (Koos Zevenhoven) Date: Mon, 16 Oct 2017 17:42:42 +0300 Subject: [Python-ideas] Why not picoseconds? In-Reply-To: References: <20171015191716.371fba63@fsol> <10deafef-0a2d-cbd2-121f-f5dfe72d6380@trueblade.com> <20171016070314.GA2706@bytereef.org> Message-ID: On Mon, Oct 16, 2017 at 4:10 PM, Victor Stinner wrote: > 2017-10-16 9:46 GMT+02:00 Stephan Houben : > > Hi all, > > > > I realize this is a bit of a pet peeve of me, since > > in my day job I sometimes get people complaining that > > numerical data is "off" in the sixteenth significant digit > > (i.e. it was stored as a double). > > (...) > > Oh. As usual, I suck at explaining the rationale. I'm sorry about that. > > The problem is not to know the number of nanoseconds since 1970. The > problem is that you lose precision even on short durations, say less > than 1 minute. The precision loss depends on the reference of the > clock, which can be the UNIX epoch, a reference based on the computer > boot time, a reference based on the process startup time, etc. > > Let's say that your web server runs since 105 days, now you want to > measure how much time it took to render a HTML template and this time > is smaller than 1 ms. In such case, the benchmark lose precision just > because of the float type, not because of the clock resolution. > > That's one use case. > > Another use case is when you have two applications storing time. A > program A writes a timestamp, another program B compares the timestamp > to the current time. To explain the issue, let's say that the format > used to store the timestamp and clock used by program A have a > resolution of 1 nanosecond, whereas the clock used by program B has a > resolution of 1 second. In that case, there is a window of 1 second > where the time is seen as "created in the future". For example, the > GNU tar program emits a warning in that case ("file created in the > future", or something like that). 
> > More generally, if you store time with a resolution N and your clock > has resolution P, it's better to have N >= P to prevent bad surprises. > More and more databases and filesystems support storing time with > nanosecond resolution. > ?Indeed. And some more on where the precision loss comes from: When you measure time starting from one point, like 1970, the timer reaches large numbers today, like 10**9 seconds. Tiny fractions of a second are especially tiny when compared to a number like that. You then need log2(10**9) ~ 30 bits of precision just to get a one-second resolution in your timer. A double-precision (64bit) floating point number has 53 bits of precision in the mantissa, so you end up with 23 bits of precision left for fractions of a second, which means you get a resolution of 1 / 2**23 seconds, which is about 100 ns, which is well in line with the data that Victor provided (~100 ns + overhead = ~200 ns). ?Koos -- + Koos Zevenhoven + http://twitter.com/k7hoven + -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Mon Oct 16 11:12:14 2017 From: guido at python.org (Guido van Rossum) Date: Mon, 16 Oct 2017 08:12:14 -0700 Subject: [Python-ideas] Add time.time_ns(): system clock with nanosecond resolution In-Reply-To: References: <20171014102122.7d718d93@fsol> <20171015221616.GY9068@ando.pearwood.info> Message-ID: Sorry about your frustration. I should not have given you hope about PEP 555 -- it was never going to make it, so I should just have spared you the effort. Do you want to withdraw it or do I have to actually reject it? On Mon, Oct 16, 2017 at 1:56 AM, Koos Zevenhoven wrote: > On Mon, Oct 16, 2017 at 2:44 AM, Guido van Rossum > wrote: > >> Sorry, that's an in-joke. Koos is expressing his disappointment in the >> rejection of PEP 555 in a way that's only obvious if you're Dutch. >> >> > Hmm, or more accurately, it has to do with me going crazy because of the > frustration of how the PEP 555 discussions went. We could have arrived at > the same conclusion [*] in much less time and with less pulling one's hair > out. > > But this discrepancy probably comes from the fact that we're not dealing > with the most pure kind of being Dutch here. > > ?Koos > > ?[*] Although right now different people still have slightly different > ideas about what that conclusion is. ? > > > >> On Sun, Oct 15, 2017 at 3:16 PM, Steven D'Aprano >> wrote: >> >>> On Sun, Oct 15, 2017 at 08:04:25PM +0300, Koos Zevenhoven wrote: >>> > On Sun, Oct 15, 2017 at 6:58 PM, Guido van Rossum >>> wrote: >>> > >>> > > I'd like to have time.time_ns() -- this is most parallel to >>> st_mtime_ns. >>> > > >>> > > ? >>> > Welcome to the list Guido! You sound like a C programmer. For many >>> people, >>> > that was the best language they knew of when they learned to program. >>> But >>> > have you ever tried Python? You should give it a try! >>> >>> You seem to be making a joke, but I have *no idea* what the joke is. >>> >>> Are you making fun of Guido's choice of preferred name? "time_ns" >>> instead of "time_nanoseconds" perhaps? >>> >>> Other than that, I cannot imagine what the joke is about. Sorry for >>> being so slow. 
>>> >>> >>> -- >>> Steve >>> _______________________________________________ >>> Python-ideas mailing list >>> Python-ideas at python.org >>> https://mail.python.org/mailman/listinfo/python-ideas >>> Code of Conduct: http://python.org/psf/codeofconduct/ >>> >> >> >> >> -- >> --Guido van Rossum (python.org/~guido) >> >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ >> >> > > > -- > + Koos Zevenhoven + http://twitter.com/k7hoven + > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From toddrjen at gmail.com Mon Oct 16 12:13:10 2017 From: toddrjen at gmail.com (Todd) Date: Mon, 16 Oct 2017 12:13:10 -0400 Subject: [Python-ideas] Add time.time_ns(): system clock with nanosecond resolution In-Reply-To: References: <20171014102122.7d718d93@fsol> Message-ID: On Mon, Oct 16, 2017 at 3:58 AM, Victor Stinner wrote: > Hi, > > > What if we had a class, say time.time_base. The user could specify the > base > > units (such as "s", "ns", 1e-7, etc) and the data type ('float', 'int', > > 'decimal', etc.) when the class is initialized. (...) > > It's easy to invent various funny new types for arbitrary precision. > > But I prefer reusing the standard Python int type since it's very well > known and very easy to manipulate. There is not need to modify the > whole stdlib to support a new type. Moreover, we don't have to > "implement a new type". > > I guess I wasn't clear. I am not suggesting implementing a new numeric data type. People wouldn't use the class directly like they would an int or float, they would simply use it to define the the precision and numeric type (float, int, decimal). Then they would have access to the entire "time" API as methods. So for example you could do something like: >>> import time >>> >>> ns_time_int = time.time_base(units='ns', type=int) >>> ms_time_dec = time.time_base(units=1e-6, type='Decimal') >>> s_time_float= time.time_base(units=1, type=float) # This is identical to the existing "time" functions >>> >>> ns_time_int.clock() 4978480000 >>> ms_time_dec.clock() Decimal('5174165.999999999') >>> s_time_float.clock() 5.276855 >>> >>> ns_time_int.perf_counter() 243163378188085 >>> ms_time_dec.perf_counter() Decimal('243171477786.264') >>> s_time_float.perf_counter() 243186.530955074 -------------- next part -------------- An HTML attachment was scrubbed... URL: From tjol at tjol.eu Mon Oct 16 11:46:53 2017 From: tjol at tjol.eu (Thomas Jollans) Date: Mon, 16 Oct 2017 17:46:53 +0200 Subject: [Python-ideas] Why not picoseconds? In-Reply-To: <19a71ce3-f917-1d13-6247-52b78d310b27@mrabarnett.plus.com> References: <20171015191716.371fba63@fsol> <10deafef-0a2d-cbd2-121f-f5dfe72d6380@trueblade.com> <20171016070314.GA2706@bytereef.org> <59E4A671.2020207@canterbury.ac.nz> <19a71ce3-f917-1d13-6247-52b78d310b27@mrabarnett.plus.com> Message-ID: On 2017-10-16 16:42, MRAB wrote: > On 2017-10-16 13:30, Greg Ewing wrote: >> Stephan Houben wrote: >> >>> Interestingly, that 2.2e-16 pretty much aligns with the accuracy of the >>> cesium atomic clocks which are currently used to *define* the second. >>> So we move to this new API, we should provide our own definition >>> of the second, since those rough SI seconds are just too imprecise >>> for that. 
>> >> The Python second: 1.1544865564196655e-06 of the time >> taken for an unladen swallow to fly from Cape Town >> to London. >> > Is that an African or a European swallow? It starts African, but it flips and can be observed as either African or European in London. It's a neutrino swallow. -- Thomas Jollans From victor.stinner at gmail.com Mon Oct 16 12:49:18 2017 From: victor.stinner at gmail.com (Victor Stinner) Date: Mon, 16 Oct 2017 18:49:18 +0200 Subject: [Python-ideas] Add time.time_ns(): system clock with nanosecond resolution In-Reply-To: References: <20171014102122.7d718d93@fsol> Message-ID: 2017-10-16 18:13 GMT+02:00 Todd : > I am not suggesting implementing a new numeric data type. People wouldn't > use the class directly like they would an int or float, they would simply > use it to define the the precision and numeric type (float, int, decimal). > Then they would have access to the entire "time" API as methods. So for > example you could do something like: I tried to include your idea in a more general description of "different API": https://www.python.org/dev/peps/pep-0564/#different-api > >>> import time > >>> > >>> ns_time_int = time.time_base(units='ns', type=int) > >>> ms_time_dec = time.time_base(units=1e-6, type='Decimal') > >>> s_time_float= time.time_base(units=1, type=float) # This is > identical to the existing "time" functions > >>> > >>> ns_time_int.clock() > 4978480000 > >>> ms_time_dec.clock() > Decimal('5174165.999999999') > >>> s_time_float.clock() > 5.276855 *I* dislike this API. IMHO it's overcomplicated just to get the current time :-( For example, I wouldn't like to have to teach to a newbie "how to get the current time" with such API. I also expect that the implementation will be quite complicated. Somewhere, you will need an internal "standard" type to store time, a "master type" used to build all other types. Without that, you would have to duplicate code and it would be a mess. You have many options for such master time. For the private C API of CPython, I already "implemented" such "master type": I decided to use C int64_t type: it's a 64-bit signed integer. There is an API on top of it which is unit agnostic, while it uses nanoseconds in practice. The private "pytime" API supports many timestamps conversions to adapt to all funny operating system functions: * from seconds: C int * from nanoseconds: C long long * from seconds: Python float or int * from milliseconds: Python float or int * to seconds: C double * to milliseconds: _PyTime_t * to microseconds: _PyTime_t * to nanoseconds: Python int * to timeval (struct timeval) * to timeval (time_t seconds, int us) * to timespec (struct timespec) At the end, I think that it's better to only provide the "master type" at the Python level, so nanoseconds as Python int. You can *easily* implement your API on top of the PEP 564. You will be limited to nanosecond resolution, but in practice, all clocks are already limited to this resolution through operating system APIs: https://www.python.org/dev/peps/pep-0564/#sub-nanosecond-resolution Victor From guido at python.org Mon Oct 16 13:07:35 2017 From: guido at python.org (Guido van Rossum) Date: Mon, 16 Oct 2017 10:07:35 -0700 Subject: [Python-ideas] Add time.time_ns(): system clock with nanosecond resolution In-Reply-To: References: <20171014102122.7d718d93@fsol> Message-ID: I agree, we shouldn't pursue Todd's proposal. It's too complicated, for too little benefit. 
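The kind of thin wrapper Victor describes is easy to keep in user code, though. A rough sketch (assuming the time.time_ns() function proposed in PEP 564; the helper name and signature are made up):

import time
from decimal import Decimal

def time_as(unit=1, numtype=Decimal):
    # current time as `numtype`, expressed in multiples of `unit` seconds,
    # all derived from a single integer-nanosecond clock
    ns = time.time_ns()
    return numtype(ns) / numtype(round(unit * 10**9))

# e.g. time_as(1e-6, Decimal) gives microseconds as a Decimal,
# while time_as(1, float) gives roughly what time.time() returns today.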
On Mon, Oct 16, 2017 at 9:49 AM, Victor Stinner wrote: > 2017-10-16 18:13 GMT+02:00 Todd : > > I am not suggesting implementing a new numeric data type. People > wouldn't > > use the class directly like they would an int or float, they would simply > > use it to define the the precision and numeric type (float, int, > decimal). > > Then they would have access to the entire "time" API as methods. So for > > example you could do something like: > > I tried to include your idea in a more general description of "different > API": > https://www.python.org/dev/peps/pep-0564/#different-api > > > >>> import time > > >>> > > >>> ns_time_int = time.time_base(units='ns', type=int) > > >>> ms_time_dec = time.time_base(units=1e-6, type='Decimal') > > >>> s_time_float= time.time_base(units=1, type=float) # This is > > identical to the existing "time" functions > > >>> > > >>> ns_time_int.clock() > > 4978480000 > > >>> ms_time_dec.clock() > > Decimal('5174165.999999999') > > >>> s_time_float.clock() > > 5.276855 > > *I* dislike this API. IMHO it's overcomplicated just to get the > current time :-( For example, I wouldn't like to have to teach to a > newbie "how to get the current time" with such API. > > I also expect that the implementation will be quite complicated. > Somewhere, you will need an internal "standard" type to store time, a > "master type" used to build all other types. Without that, you would > have to duplicate code and it would be a mess. You have many options > for such master time. > > For the private C API of CPython, I already "implemented" such "master > type": I decided to use C int64_t type: it's a 64-bit signed integer. > There is an API on top of it which is unit agnostic, while it uses > nanoseconds in practice. The private "pytime" API supports many > timestamps conversions to adapt to all funny operating system > functions: > > * from seconds: C int > * from nanoseconds: C long long > * from seconds: Python float or int > * from milliseconds: Python float or int > * to seconds: C double > * to milliseconds: _PyTime_t > * to microseconds: _PyTime_t > * to nanoseconds: Python int > * to timeval (struct timeval) > * to timeval (time_t seconds, int us) > * to timespec (struct timespec) > > At the end, I think that it's better to only provide the "master type" > at the Python level, so nanoseconds as Python int. You can *easily* > implement your API on top of the PEP 564. You will be limited to > nanosecond resolution, but in practice, all clocks are already limited > to this resolution through operating system APIs: > https://www.python.org/dev/peps/pep-0564/#sub-nanosecond-resolution > > Victor > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From j_campbell7 at hotmail.com Sun Oct 15 21:12:08 2017 From: j_campbell7 at hotmail.com (Jason Campbell) Date: Mon, 16 Oct 2017 01:12:08 +0000 Subject: [Python-ideas] Membership of infinite iterators Message-ID: I recently came across a bug where checking negative membership (__contains__ returns False) of an infinite iterator will freeze the program. It may seem a bit obvious, but you can check membership in range for example without iterating over the entire range. 
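That works because range objects implement __contains__ and compute the answer arithmetically from start, stop and step, whereas an object that lacks __contains__ makes the "in" operator fall back to iterating it. A quick way to see the difference:

>>> 10**17 in range(10**18)          # computed, returns immediately
True
>>> # 10**17 in iter(range(10**18))  # no __contains__: would walk element by element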
`int(1e100) in range(int(1e101))` on my machine takes about 1.5us `int(1e7) in itertools.count()` on my machine takes about 250ms (and gets worse quite quickly). Any membership test on the infinite iterators that is negative will freeze the program as stated earlier, which is odd to me. itertools.count can use the same math as range itertools.cycle could use membership from the underlying iterable itertools.repeat is even easier, just compare to the repeatable element Does anyone else think this would be useful? -------------- next part -------------- An HTML attachment was scrubbed... URL: From k7hoven at gmail.com Mon Oct 16 16:53:09 2017 From: k7hoven at gmail.com (Koos Zevenhoven) Date: Mon, 16 Oct 2017 23:53:09 +0300 Subject: [Python-ideas] Add time.time_ns(): system clock with nanosecond resolution In-Reply-To: References: <20171014102122.7d718d93@fsol> <20171015221616.GY9068@ando.pearwood.info> Message-ID: On Mon, Oct 16, 2017 at 6:12 PM, Guido van Rossum wrote: > Sorry about your frustration. I should not have given you hope about PEP > 555 -- it was never going to make it, so I should just have spared you the > effort. Do you want to withdraw it or do I have to actually reject it? > > ?That's definitely not the reason for the frustration. You already told me earlier that you did not really like it. I think I know why you didn't, but we now have more reasons than we had then. We're definitely in a different place from where we started. And I still succeeded in the most important thing that needed to be done. The only thing is that, at least for now, nobody *really* needed the awesome, clean, flexible and super performant solution that I had to come up with to solve a wide class of problems which I thought many people would actually have. But that's not where the frustration came from either. I'll finish the PEP for archiving, and for throwing it on the pile of other PEPs that have tried to tackle these issues. -- Koos P.S. Of course this discussion really does not belong here, but at least the time_ns discussion is now on python-dev anyway. > On Mon, Oct 16, 2017 at 1:56 AM, Koos Zevenhoven > wrote: > >> On Mon, Oct 16, 2017 at 2:44 AM, Guido van Rossum >> wrote: >> >>> Sorry, that's an in-joke. Koos is expressing his disappointment in the >>> rejection of PEP 555 in a way that's only obvious if you're Dutch. >>> >>> >> Hmm, or more accurately, it has to do with me going crazy because of the >> frustration of how the PEP 555 discussions went. We could have arrived at >> the same conclusion [*] in much less time and with less pulling one's hair >> out. >> >> But this discrepancy probably comes from the fact that we're not dealing >> with the most pure kind of being Dutch here. >> >> ?Koos >> >> ?[*] Although right now different people still have slightly different >> ideas about what that conclusion is. ? >> >> >> >>> On Sun, Oct 15, 2017 at 3:16 PM, Steven D'Aprano >>> wrote: >>> >>>> On Sun, Oct 15, 2017 at 08:04:25PM +0300, Koos Zevenhoven wrote: >>>> > On Sun, Oct 15, 2017 at 6:58 PM, Guido van Rossum >>>> wrote: >>>> > >>>> > > I'd like to have time.time_ns() -- this is most parallel to >>>> st_mtime_ns. >>>> > > >>>> > > ? >>>> > Welcome to the list Guido! You sound like a C programmer. For many >>>> people, >>>> > that was the best language they knew of when they learned to program. >>>> But >>>> > have you ever tried Python? You should give it a try! >>>> >>>> You seem to be making a joke, but I have *no idea* what the joke is. 
>>>> >>>> Are you making fun of Guido's choice of preferred name? "time_ns" >>>> instead of "time_nanoseconds" perhaps? >>>> >>>> Other than that, I cannot imagine what the joke is about. Sorry for >>>> being so slow. >>>> >>>> >>>> -- >>>> Steve >>>> _______________________________________________ >>>> Python-ideas mailing list >>>> Python-ideas at python.org >>>> https://mail.python.org/mailman/listinfo/python-ideas >>>> Code of Conduct: http://python.org/psf/codeofconduct/ >>>> >>> >>> >>> >>> -- >>> --Guido van Rossum (python.org/~guido) >>> >>> _______________________________________________ >>> Python-ideas mailing list >>> Python-ideas at python.org >>> https://mail.python.org/mailman/listinfo/python-ideas >>> Code of Conduct: http://python.org/psf/codeofconduct/ >>> >>> >> >> >> -- >> + Koos Zevenhoven + http://twitter.com/k7hoven + >> > > > > -- > --Guido van Rossum (python.org/~guido) > -- + Koos Zevenhoven + http://twitter.com/k7hoven + -------------- next part -------------- An HTML attachment was scrubbed... URL: From tjreedy at udel.edu Mon Oct 16 17:12:15 2017 From: tjreedy at udel.edu (Terry Reedy) Date: Mon, 16 Oct 2017 17:12:15 -0400 Subject: [Python-ideas] Membership of infinite iterators In-Reply-To: References: Message-ID: On 10/15/2017 9:12 PM, Jason Campbell wrote: > I recently came across a bug where checking negative membership > (__contains__ returns False) of an infinite iterator will freeze the > program. > > > It may seem a bit obvious, but you can check membership in range for > example without iterating over the entire range. A range is a finite (non-iterator) *iterable*, not an *iterator*. In particular, a range is a finite sequence. In Python 2, ranges were implemented as lists. Python 3 now uses a more compact implementation. Either way, a ranges have a .__contains__ method. (In Py3 it is O(1) instead of O(n)). > `int(1e100) in range(int(1e101))` on my machine takes about 1.5us > `int(1e7) in itertools.count()` on my machine takes about 250ms (and > gets worse quite quickly). Non-iterator iterables and iterators are different things. It you compared iter(range(n)) with itertools.count(n), you would get similar results. > Any membership test on the infinite iterators that is negative will > freeze the program as stated earlier, which is odd to me. Membership testing with iterators is generally a bad idea. In general, iterators do not have a .__contains__ method, so at best, one exhausts the iterator, making it useless. In particular, generators do not have any non-iterator, non-generator methods. The itertools module has iterator classes rather than generator functions because it is coded in C, but they are closely equivalent in behavior to the generator functions given in the doc. > itertools.count can use the same math as range You could use range(start, 10000000000000000000[, step]) instead of count(start[, step]), or wrap count in an infinite sequence Count class that has all the methods and arithmetic of range. The __iter__ method would return a count iterator. > itertools.cycle could use membership from the underlying iterable If the underlying iterable is an iterator, this does not work. You could define a Cycle class that requires that the input have .__contain__. > itertools.repeat is even easier, just compare to the repeatable element Again, create a Repeat(ob) class whose .__iter__ returns repeat(ob) and that has .__contains__(x) returning 'x == ob'. > Does anyone else think this would be useful? 
itertools is not going to switch from iterators to non-iterator iterables. -- Terry Jan Reedy From elazarg at gmail.com Mon Oct 16 23:27:29 2017 From: elazarg at gmail.com (=?UTF-8?B?15DXnNei15bXqA==?=) Date: Tue, 17 Oct 2017 03:27:29 +0000 Subject: [Python-ideas] Membership of infinite iterators In-Reply-To: References: Message-ID: On Tue, Oct 17, 2017, 00:13, Terry Reedy wrote: > On 10/15/2017 9:12 PM, Jason Campbell wrote: > ... > > itertools.cycle could use membership from the underlying iterable > > If the underlying iterable is an iterator, this does not work. You > could define a Cycle class that requires that the input have .__contain__. > > > itertools.repeat is even easier, just compare to the repeatable element > > Again, create a Repeat(ob) class whose .__iter__ returns repeat(ob) and > that has .__contains__(x) returning 'x == ob'. > > > Does anyone else think this would be useful? > > itertools is not going to switch from iterators to non-iterator iterables. > It doesn't have to switch, and does not have to _require_ the input to define a __contains__ method for the proposal to be meaningful. It can work with the same kind of iterables as its inputs, delegating whenever possible. I'm sure there are good reasons not to do it, but that's not an either/or decision. Elazar -------------- next part -------------- An HTML attachment was scrubbed... URL: From turnbull.stephen.fw at u.tsukuba.ac.jp Mon Oct 16 23:33:35 2017 From: turnbull.stephen.fw at u.tsukuba.ac.jp (Stephen J. Turnbull) Date: Tue, 17 Oct 2017 12:33:35 +0900 Subject: [Python-ideas] PEP draft: context variables In-Reply-To: References: <59E0FBBF.3010402@stoneleaf.us> Message-ID: <23013.31247.42873.395596@turnbull.sk.tsukuba.ac.jp> Paul Moore writes: > But I do agree with MAL, it seems wrong to need a helper for this, > even though it's a logical consequence of the other semantics I > described as intuitive :-( It seems to me this is an argument for using Haskell if you want life to be simple. :-) Or, in the spirit of the Zen: "Side effects induce complexity." Steve From desmoulinmichel at gmail.com Tue Oct 17 01:52:02 2017 From: desmoulinmichel at gmail.com (Michel Desmoulin) Date: Tue, 17 Oct 2017 07:52:02 +0200 Subject: [Python-ideas] Membership of infinite iterators In-Reply-To: References: Message-ID: <3a4f560a-5e78-6d34-c81a-23caac149cfd@gmail.com> Given that: - it doesn't break anything; - the behavior makes sense; - it can avoid the machine freezing when one is not careful; It can be a good idea. Those generators could have iterators that are not themselves and have a __contains__ method behaving accordingly. I only made the mistake of using "in" on those once, but it did happen. I can imagine everybody doing the same at least once. It's not a huge help, but the joy of using Python is a collection of small things after all. On 17/10/2017 at 05:27, Elazar wrote:
> > itertools is not going to switch from iterators to non-iterator > iterables. > > > It doesn't have to switch, and?does not have to _require_ the input to > define __contains__ method for the proposal to be meaningful. It can > work with the same kind of iterables as its inputs, delegating whenever > possible. I'm sure there are good reasons not to do it, but that's not > an either/or decision.?? > > Elazar? > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > From ncoghlan at gmail.com Tue Oct 17 02:32:05 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 17 Oct 2017 16:32:05 +1000 Subject: [Python-ideas] Membership of infinite iterators In-Reply-To: References: Message-ID: On 16 October 2017 at 11:12, Jason Campbell wrote: > I recently came across a bug where checking negative membership > (__contains__ returns False) of an infinite iterator will freeze the > program. > > It may seem a bit obvious, but you can check membership in range for > example without iterating over the entire range. > As Terry noted, this is due to a semantic difference between range (compact representation of a tuple of integers) and arbitrary iterators. However, if we can avoid seemingly innocent code triggering an infinite loop, that's probably a good thing. > `int(1e100) in range(int(1e101))` on my machine takes about 1.5us > `int(1e7) in itertools.count()` on my machine takes about 250ms (and gets > worse quite quickly). > > Any membership test on the infinite iterators that is negative will freeze > the program as stated earlier, which is odd to me. > The other relevant behavioural difference is that even a successful membership test will consume the iterator up to that point. > itertools.count can use the same math as range > +1, with the caveat that a successful containment test should still advance the iterator to the point immediately after the requested element, just as it does today (if people don't want that behaviour, they should be defining a suitable finite range instead). > itertools.cycle could use membership from the underlying iterable > Sort of - you'll still need to iterate through the underlying iterator at least once in order to see all of the candidate elements. However, the containment test could still be defined as running through the full cycle at most once, such that you end up either at the point immediately after the item or else back where you started (if the item wasn't found). > itertools.repeat is even easier, just compare to the repeatable element > +1 So this sounds like a reasonable API UX improvement to me, but you'd need to ensure that you don't inadvertently change the external behaviour of *successful* containment tests. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Tue Oct 17 02:42:35 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 17 Oct 2017 16:42:35 +1000 Subject: [Python-ideas] Membership of infinite iterators In-Reply-To: References: Message-ID: On 17 October 2017 at 16:32, Nick Coghlan wrote: > So this sounds like a reasonable API UX improvement to me, but you'd need > to ensure that you don't inadvertently change the external behaviour of > *successful* containment tests. 
> I should also note that there's another option here beyond just returning "False": it would also be reasonable to raise an exception like "RuntimeError('Attempted negative containment check on infinite iterator')". That way currently working code would be semantically unchanged, but code that otherwise results in an infinite loop would turn into an immediate exception instead. The rationale for this approach would be "What you are trying to do doesn't really make sense, so we're going to complain about it, rather than just giving you an answer". The rationale against the error is that "If this item is present, advance past it, otherwise don't do anything" would be an at least arguably reasonable operation to request. Whether "x in itr" is a reasonable *spelling* of that operation would then be a different question that still favoured the "raise an exception" approach. Cheers, NIck. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Tue Oct 17 03:44:34 2017 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 17 Oct 2017 18:44:34 +1100 Subject: [Python-ideas] Membership of infinite iterators In-Reply-To: References: Message-ID: <20171017074433.GC9068@ando.pearwood.info> On Tue, Oct 17, 2017 at 04:42:35PM +1000, Nick Coghlan wrote: > I should also note that there's another option here beyond just returning > "False": it would also be reasonable to raise an exception like > "RuntimeError('Attempted negative containment check on infinite iterator')". I don't think that works, even if we limit discussion to just itertools.count() rather than arbitrary iterators. Obviously we cannot wait until the entire infinite iterator is checked (that might take longer than is acceptible...) but if you only check a *finite* number before giving up, you lead to false-negatives: # say we only check 100 values before raising 0 in itertools.count(1) # correctly raises 101 in itertools.count(1) # wrongly raises If we do a computed membership test, then why raise at all? We quickly know whether or not the value is in the sequence, so there's no error to report. Personally, I think a better approach is to give the specialist itertools iterator types a __contains__ method which unconditionally raises a warning regardless of whether the containment test returns true, false or doesn't return at all. Perhaps with a flag (module-wide?) to disable the warning, or turn it into an error. I think a warning (by default) is better than an error because we don't really know for sure that it is an error: n in itertools.count() is, on the face of it, no more than an error than any other potentially infinite loop: while condition(n): ... and like any so-called infinite loop, we can never be sure when to give up and raise. A thousand loops? A million? A millisecond? An hour? Whatever we pick, it will be a case of one-size fits none. I appreciate that, in practice it is easier to mess up a containment test using one of the itertools iterators than to accidentally write an infinite loop using while, and my concession to that is to raise a warning, and let the programmer decide whether to ignore it or turn it into an error. -- Steve From k7hoven at gmail.com Tue Oct 17 04:48:24 2017 From: k7hoven at gmail.com (Koos Zevenhoven) Date: Tue, 17 Oct 2017 11:48:24 +0300 Subject: [Python-ideas] Why not picoseconds? 
In-Reply-To: References: <20171015191716.371fba63@fsol> <10deafef-0a2d-cbd2-121f-f5dfe72d6380@trueblade.com> <20171016070314.GA2706@bytereef.org> Message-ID: Replying to myself again here, as nobody else said anything: On Mon, Oct 16, 2017 at 5:42 PM, Koos Zevenhoven wrote: > > > Indeed. And some more on where the precision loss comes from: > > When you measure time starting from one point, like 1970, the timer > reaches large numbers today, like 10**9 seconds. Tiny fractions of a second > are especially tiny when compared to a number like that. > > You then need log2(10**9) ~ 30 bits of precision just to get a one-second > resolution in your timer. A double-precision (64bit) floating point number > has 53 bits of precision in the mantissa, so you end up with 23 bits of > precision left for fractions of a second, which means you get a resolution > of 1 / 2**23 seconds, which is about 100 ns, which is well in line with the > data that Victor provided (~100 ns + overhead = ~200 ns). > > My calculation is indeed *approximately* correct, but the problem is that I made a bunch of decimal rounding errors while doing it, which was not really desirable here. The exact expression for the resolution of time.time() today is: >>> import math, time >>> 1 / 2**(53 - math.ceil(math.log2(time.time()))) 2.384185791015625e-07 So this is in fact a little over 238 ns. Victor got 239 ns experimentally. So actually the resolution is coarse enough to completely drown the effects of overhead in Victor's tests, and now that the theory is done correctly, it is completely in line with practice. Koos -- + Koos Zevenhoven + http://twitter.com/k7hoven + -------------- next part -------------- An HTML attachment was scrubbed... URL: From storchaka at gmail.com Tue Oct 17 05:19:26 2017 From: storchaka at gmail.com (Serhiy Storchaka) Date: Tue, 17 Oct 2017 12:19:26 +0300 Subject: [Python-ideas] Membership of infinite iterators In-Reply-To: References: Message-ID: 17.10.17 09:42, Nick Coghlan wrote: > On 17 October 2017 at 16:32, Nick Coghlan > > wrote: > > So this sounds like a reasonable API UX improvement to me, but you'd > need to ensure that you don't inadvertently change the external > behaviour of *successful* containment tests. > > > I should also note that there's another option here beyond just > returning "False": it would also be reasonable to raise an exception > like "RuntimeError('Attempted negative containment check on infinite > iterator')". What about other operations with infinite iterators? min(count()), max(count()), all(count(1))? Do you want to implement special cases for all of them? From ncoghlan at gmail.com Tue Oct 17 06:51:35 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 17 Oct 2017 20:51:35 +1000 Subject: [Python-ideas] Membership of infinite iterators In-Reply-To: <20171017074433.GC9068@ando.pearwood.info> References: <20171017074433.GC9068@ando.pearwood.info> Message-ID: On 17 October 2017 at 17:44, Steven D'Aprano wrote: > On Tue, Oct 17, 2017 at 04:42:35PM +1000, Nick Coghlan wrote: > > > I should also note that there's another option here beyond just returning > > "False": it would also be reasonable to raise an exception like > > "RuntimeError('Attempted negative containment check on infinite > iterator')". > > I don't think that works, even if we limit discussion to just > itertools.count() rather than arbitrary iterators. Obviously we > cannot wait until the entire infinite iterator is checked (that > might take longer than is acceptible...)
but if you only check a > *finite* number before giving up, you lead to false-negatives: > > # say we only check 100 values before raising > 0 in itertools.count(1) # correctly raises > 101 in itertools.count(1) # wrongly raises > Nobody suggested that, as it's obviously wrong. This discussion is solely about infinite iterators that have closed form containment tests, either because they're computed (itertools.count()), or because they're based on an underlying finite sequence of values (cycle(), repeat()). > If we do a computed membership test, then why raise at all? We quickly > know whether or not the value is in the sequence, so there's no error to > report. > Because we should probably always be raising for these particular containment checks, and it's safe to start doing so in the negative case, since that's currently a guaranteed infinite loop. And unlike a "while True" loop (which has many real world applications), none of these implicit infinite loops allow for any kind of useful work on each iteration, they just end up in a tight loop deep inside the interpreter internals, doing absolutely nothing. They won't even check for signals or release the GIL, so you'll need to ask the operating system to clobber the entire process to break out of it - Ctrl-C will be ignored. I'd also have no major objection to deprecating containment tests on these iterators entirely, but that doesn't offer the same kind of UX benefit that replacing an infinite loop with an immediate exception does, so I think the two questions should be considered separately. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Tue Oct 17 07:10:59 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 17 Oct 2017 21:10:59 +1000 Subject: [Python-ideas] Membership of infinite iterators In-Reply-To: References: Message-ID: On 17 October 2017 at 19:19, Serhiy Storchaka wrote: > 17.10.17 09:42, Nick Coghlan ????: > >> On 17 October 2017 at 16:32, Nick Coghlan > ncoghlan at gmail.com>> wrote: >> >> So this sounds like a reasonable API UX improvement to me, but you'd >> need to ensure that you don't inadvertently change the external >> behaviour of *successful* containment tests. >> >> >> I should also note that there's another option here beyond just returning >> "False": it would also be reasonable to raise an exception like >> "RuntimeError('Attempted negative containment check on infinite iterator')". >> > > What about other operations with infinite iterators? min(count()), > max(count()), all(count(1))? Do you want to implement special cases for all > of them? No, as folks expect those to iterate without the opportunity to break out, and are hence more careful with them when infinite iterators are part of their application. We also don't have any existing protocols we could use to intercept them, even if we decided we *did* want to do so. The distinction I see with "x in y" is: 1. It's pretty easy to write "for x in y in y" when you really meant to write "for x in y", and if "y" is an infinite iterator, the "y in y" part will become an unbreakable infinite loop when executed instead of the breakable one you intended (especially annoying if it means you have to discard and restart a REPL session due to it, and that's exactly where that kind of typo is going to be easiest to make) 2. 
Containment testing already has a dedicated protocol so containers can implement optimised containment tests, which means it's also trivial for an infinite iterator to intercept and explicitly disallow containment checks if it chooses to do so So the problem is more likely to be encountered due to "x in y" appearing in both the containment test syntax and as part of the iteration syntax, *and* it's straightforward to do something about it because the __contains__ hook already exists. Those two points together are enough for me to say "Sure, it makes sense to replace the current behaviour with something more user friendly". If either of them was false, then I'd say "No, that's not worth the hassle of changing anything". Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia -------------- next part -------------- An HTML attachment was scrubbed... URL: From storchaka at gmail.com Tue Oct 17 07:46:38 2017 From: storchaka at gmail.com (Serhiy Storchaka) Date: Tue, 17 Oct 2017 14:46:38 +0300 Subject: [Python-ideas] Membership of infinite iterators In-Reply-To: References: Message-ID: 17.10.17 14:10, Nick Coghlan ????: > 1. It's pretty easy to write "for x in y in y" when you really meant to > write "for x in y", and if "y" is an infinite iterator, the "y in y" > part will become an unbreakable infinite loop when executed instead of > the breakable one you intended (especially annoying if it means you have > to discard and restart a REPL session due to it, and that's exactly > where that kind of typo is going to be easiest to make) I think it is better to left this on linters. I never encountered this mistake and doubt it is common. In any case the first execution of this code will expose the mistake. > 2. Containment testing already has a dedicated protocol so containers > can implement optimised containment tests, which means it's also trivial > for an infinite iterator to intercept and explicitly disallow > containment checks if it chooses to do so But this has non-zero maintaining cost. As the one who made many changes in itertools.c I don't like the idea of increasing its complexity for optimizing a pretty rare case. And note that the comparison can have side effect. You can implement the optimization of `x in count()` only for the limited set of builtin types. For example `x in range()` is optimized only for exact int and bool. You can't guarantee the finite time for cycle() and repeat() either since they can emit values of arbitrary types, with arbitrary __eq__. From k7hoven at gmail.com Tue Oct 17 09:17:45 2017 From: k7hoven at gmail.com (Koos Zevenhoven) Date: Tue, 17 Oct 2017 16:17:45 +0300 Subject: [Python-ideas] Membership of infinite iterators In-Reply-To: References: Message-ID: On Tue, Oct 17, 2017 at 2:46 PM, Serhiy Storchaka wrote: > 17.10.17 14:10, Nick Coghlan ????: > >> 1. It's pretty easy to write "for x in y in y" when you really meant to >> write "for x in y", and if "y" is an infinite iterator, the "y in y" part >> will become an unbreakable infinite loop when executed instead of the >> breakable one you intended (especially annoying if it means you have to >> discard and restart a REPL session due to it, and that's exactly where that >> kind of typo is going to be easiest to make) >> > > I think it is better to left this on linters. ?Just to note that there is currently nothing that would prevent making `for x in y in z`? a syntax error. 
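(For reference, that spelling does parse today. A quick, purely illustrative
check, which only demonstrates the grammar and says nothing about any
proposed behaviour:

>>> import ast
>>> tree = ast.parse("for x in y in z: pass")
>>> type(tree.body[0].iter).__name__
'Compare'

So the comparison y in z ends up as the thing the for loop will try to
iterate over.)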
There is nothing meaningful that it could do, really, because y in z can only return True or False (or raise an Exception or loop infinitely). But for an infinite iterable, the right answer may be Maybe ;) ???Koos? > I never encountered this mistake and doubt it is common. In any case the > first execution of this code will expose the mistake. > > 2. Containment testing already has a dedicated protocol so containers can >> implement optimised containment tests, which means it's also trivial for an >> infinite iterator to intercept and explicitly disallow containment checks >> if it chooses to do so >> > > But this has non-zero maintaining cost. As the one who made many changes > in itertools.c I don't like the idea of increasing its complexity for > optimizing a pretty rare case. > > And note that the comparison can have side effect. You can implement the > optimization of `x in count()` only for the limited set of builtin types. > For example `x in range()` is optimized only for exact int and bool. You > can't guarantee the finite time for cycle() and repeat() either since they > can emit values of arbitrary types, with arbitrary __eq__. > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -- + Koos Zevenhoven + http://twitter.com/k7hoven + -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Tue Oct 17 10:06:11 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 18 Oct 2017 00:06:11 +1000 Subject: [Python-ideas] Membership of infinite iterators In-Reply-To: References: Message-ID: On 17 October 2017 at 21:46, Serhiy Storchaka wrote: > 17.10.17 14:10, Nick Coghlan ????: > >> 1. It's pretty easy to write "for x in y in y" when you really meant to >> write "for x in y", and if "y" is an infinite iterator, the "y in y" part >> will become an unbreakable infinite loop when executed instead of the >> breakable one you intended (especially annoying if it means you have to >> discard and restart a REPL session due to it, and that's exactly where that >> kind of typo is going to be easiest to make) >> > > I think it is better to left this on linters. I never encountered this > mistake and doubt it is common. In any case the first execution of this > code will expose the mistake. > People don't run linters at the REPL, and it's at the REPL where accidentally getting an unbreakable infinite loop is most annoying. Keep in mind we're not talking about a regular loop you can break out of with Ctrl-C here - we're talking about a tight loop inside the interpreter internals that leads to having to kill the whole host process just to get out of it. > 2. Containment testing already has a dedicated protocol so containers can >> implement optimised containment tests, which means it's also trivial for an >> infinite iterator to intercept and explicitly disallow containment checks >> if it chooses to do so >> > > But this has non-zero maintaining cost. As the one who made many changes > in itertools.c I don't like the idea of increasing its complexity for > optimizing a pretty rare case. > It's not an optimisation, it's a UX improvement for the interactive prompt. The maintenance burden should be low, as it's highly unlikely we'd ever need to change this behaviour again in the future (I do think deprecating the success case would be more trouble than it would be worth though). 
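(To make that kind of change a bit more concrete, here is a rough
pure-Python sketch of a count()-like iterable with a closed-form containment
test. The class name, the integer-only guard and the positive-step
assumption are all just for illustration; this is not the actual itertools
implementation:

import itertools

class CountLike:
    # Illustrative stand-in for count(start, step) with a positive int step.
    def __init__(self, start=0, step=1):
        self.start = start
        self.step = step
    def __iter__(self):
        return itertools.count(self.start, self.step)
    def __contains__(self, value):
        # Closed-form membership test: answer without iterating at all.
        if not isinstance(value, int):
            raise RuntimeError(
                "Attempted containment check on infinite iterator")
        steps, remainder = divmod(value - self.start, self.step)
        return remainder == 0 and steps >= 0

With that, 10 in CountLike(1, 3) is True and 11 in CountLike(1, 3) is False,
both computed immediately, while "a" in CountLike() raises instead of
looping forever.)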
> And note that the comparison can have side effect. You can implement the
> optimization of `x in count()` only for the limited set of builtin types.
> For example `x in range()` is optimized only for exact int and bool. You
> can't guarantee the finite time for cycle() and repeat() either since they
> can emit values of arbitrary types, with arbitrary __eq__.

We're not trying to guarantee finite execution time in general, we're just
making it more likely that either Ctrl-C works, or else you don't get stuck
in an infinite loop in the first place.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From ncoghlan at gmail.com  Tue Oct 17 10:09:26 2017
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 18 Oct 2017 00:09:26 +1000
Subject: [Python-ideas] Membership of infinite iterators
In-Reply-To:
References:
Message-ID:

On 17 October 2017 at 23:17, Koos Zevenhoven wrote:

> On Tue, Oct 17, 2017 at 2:46 PM, Serhiy Storchaka
> wrote:
>
>> 17.10.17 14:10, Nick Coghlan wrote:
>>
>>> 1. It's pretty easy to write "for x in y in y" when you really meant to
>>> write "for x in y", and if "y" is an infinite iterator, the "y in y" part
>>> will become an unbreakable infinite loop when executed instead of the
>>> breakable one you intended (especially annoying if it means you have to
>>> discard and restart a REPL session due to it, and that's exactly where that
>>> kind of typo is going to be easiest to make)
>>>
>>
>> I think it is better to left this on linters.
>
>
> Just to note that there is currently nothing that would prevent making
> `for x in y in z` a syntax error.
>

That was just an example of one of the ways we can accidentally end up
writing "x in y" at the REPL, where "y" is an infinite iterator, since it's
the kind that's specific to "x in y", whereas other forms (like
accidentally using the wrong variable name) also apply to other iterator
consuming APIs (like the ones Serhiy mentioned).

Cheers,
Nick.

--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From storchaka at gmail.com  Tue Oct 17 10:26:44 2017
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Tue, 17 Oct 2017 17:26:44 +0300
Subject: [Python-ideas] Membership of infinite iterators
In-Reply-To:
References:
Message-ID:

17.10.17 17:06, Nick Coghlan wrote:
> Keep in mind we're not talking about a regular loop you can break out of
> with Ctrl-C here - we're talking about a tight loop inside the
> interpreter internals that leads to having to kill the whole host
> process just to get out of it.

And this is the root of the issue. Just let more tight loops be
interruptible with Ctrl-C, and this will fix the more general issue.

From k7hoven at gmail.com  Tue Oct 17 13:39:14 2017
From: k7hoven at gmail.com (Koos Zevenhoven)
Date: Tue, 17 Oct 2017 20:39:14 +0300
Subject: [Python-ideas] Membership of infinite iterators
In-Reply-To:
References:
Message-ID:

On Tue, Oct 17, 2017 at 5:26 PM, Serhiy Storchaka wrote:

> 17.10.17 17:06, Nick Coghlan wrote:
>> Keep in mind we're not talking about a regular loop you can break out of
>> with Ctrl-C here - we're talking about a tight loop inside the interpreter
>> internals that leads to having to kill the whole host process just to get
>> out of it.
>> > > And this is the root of the issue. Just let more tight loops be > interruptible with Ctrl-C, and this will fix the more general issue. > > ?Not being able to interrupt something with Ctrl-C in the repl or with the interrupt command in Jupyter notebooks is definitely a thing I sometimes encounter. A pity I don't remember when it happens, because I usually forget it very soon after I've restarted the kernel and continued working. But my guess is it's usually not because of an infinite iterator. Regarding what the OP might have been after, and just for some wild brainstorming based on true stories: In some sense, x in y should always have an answer, even if it may be expensive to compute. Currently, it's possible to implement "lazy truth values" which compute the bool value lazily when .__bool__() is called. Until you call bool(..) on it, it would just be Maybe, and then after the call, you'd actually have True or False. In many cases it can even be enough to know if something is Maybe true. Also, if you do something like any(*truth_values), then you could skip the Maybe ones on the first pass, because if you find one that's plain True, you already have the answer. Regarding `x in y`, where y is an infinite iterable without well defined contents, that would return an instance of MaybeType, but .__bool__() would raise an exception. ??Koos -- + Koos Zevenhoven + http://twitter.com/k7hoven + -------------- next part -------------- An HTML attachment was scrubbed... URL: From brenbarn at brenbarn.net Tue Oct 17 14:44:49 2017 From: brenbarn at brenbarn.net (Brendan Barnwell) Date: Tue, 17 Oct 2017 11:44:49 -0700 Subject: [Python-ideas] Membership of infinite iterators In-Reply-To: References: Message-ID: <59E64FA1.1020307@brenbarn.net> On 2017-10-17 07:26, Serhiy Storchaka wrote: > 17.10.17 17:06, Nick Coghlan ????: >> >Keep in mind we're not talking about a regular loop you can break out of >> >with Ctrl-C here - we're talking about a tight loop inside the >> >interpreter internals that leads to having to kill the whole host >> >process just to get out of it. > And this is the root of the issue. Just let more tight loops be > interruptible with Ctrl-C, and this will fix the more general issue. I was just thinking the same thing. I think in general it's always bad for code to be uninterruptible with Ctrl-C. If these infinite iterators were fixed so they could be interrupted, this containment problem would be much less painful. -- Brendan Barnwell "Do not follow where the path may lead. Go, instead, where there is no path, and leave a trail." --author unknown From k7hoven at gmail.com Wed Oct 18 05:56:23 2017 From: k7hoven at gmail.com (Koos Zevenhoven) Date: Wed, 18 Oct 2017 12:56:23 +0300 Subject: [Python-ideas] Membership of infinite iterators In-Reply-To: <59E64FA1.1020307@brenbarn.net> References: <59E64FA1.1020307@brenbarn.net> Message-ID: On Tue, Oct 17, 2017 at 9:44 PM, Brendan Barnwell wrote: > On 2017-10-17 07:26, Serhiy Storchaka wrote: > >> 17.10.17 17:06, Nick Coghlan ????: >> >>> >Keep in mind we're not talking about a regular loop you can break out of >>> >with Ctrl-C here - we're talking about a tight loop inside the >>> >interpreter internals that leads to having to kill the whole host >>> >process just to get out of it. >>> >> And this is the root of the issue. Just let more tight loops be >> interruptible with Ctrl-C, and this will fix the more general issue. >> > > I was just thinking the same thing. 
I think in general it's > always bad for code to be uninterruptible with Ctrl-C. Indeed I agree about this. > If these infinite iterators were fixed so they could be interrupted, this > containment problem would be much less painful. > > ? I'm unable to reproduce the "uninterruptible with Ctrl-C"? problem with infinite iterators. At least itertools doesn't seem to have it: >>> import itertools >>> for i in itertools.count(): ... pass ... ^CTraceback (most recent call last): File "", line 1, in KeyboardInterrupt >>> for i in itertools.repeat(1): ... pass ... ^CTraceback (most recent call last): File "", line 1, in KeyboardInterrupt >>> for i in itertools.cycle((1,)): ... pass ... ^CTraceback (most recent call last): File "", line 1, in KeyboardInterrupt >>> Same thing on both Windows and Linux, Python 3.6. ??Koos -- + Koos Zevenhoven + http://twitter.com/k7hoven + -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Wed Oct 18 06:22:47 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 18 Oct 2017 20:22:47 +1000 Subject: [Python-ideas] Membership of infinite iterators In-Reply-To: References: Message-ID: On 18 October 2017 at 03:39, Koos Zevenhoven wrote: > On Tue, Oct 17, 2017 at 5:26 PM, Serhiy Storchaka > wrote: > >> 17.10.17 17:06, Nick Coghlan ????: >> >>> Keep in mind we're not talking about a regular loop you can break out of >>> with Ctrl-C here - we're talking about a tight loop inside the interpreter >>> internals that leads to having to kill the whole host process just to get >>> out of it. >>> >> >> And this is the root of the issue. Just let more tight loops be >> interruptible with Ctrl-C, and this will fix the more general issue. >> >> > ?Not being able to interrupt something with Ctrl-C in the repl or with the > interrupt command in Jupyter notebooks is definitely a thing I sometimes > encounter. A pity I don't remember when it happens, because I usually > forget it very soon after I've restarted the kernel and continued working. > But my guess is it's usually not because of an infinite iterator. > Fixing the general case is hard, because the assumption that signals are only checked between interpreter opcodes is a pervasive one throughout the interpreter internals. We certainly *could* redefine affected C APIs as potentially raising KeyboardInterrupt (adjusting the signal management infrastructure accordingly), and if someone actually follows through and implements that some day, then the argument could then be made that given such change, it might be reasonable to drop any a priori guards that we have put in place for particular *detectable* uninterruptible infinite loops. However, that's not the design question being discussed in this thread. The design question here is "We have 3 known uninterruptible infinite loops that are easy to detect and prevent. Should we detect and prevent them?". "We shouldn't allow anyone to do this easy thing, because it would be preferable for someone to instead do this hard and complicated thing that nobody is offering to do" isn't a valid design argument in that situation. And I have a four step check for that which prompts me to say "Yes, we should detect and prevent them": 1. Uninterruptible loops are bad, so having fewer of them is better 2. These particular cases can be addressed locally using existing protocols, so the chances of negative side effects are low 3. 
The total amount of code involved is likely to be small (a dozen or so lines of C, a similar number of lines of Python in the tests) in well-isolated protocol functions, so the chances of introducing future maintainability problems are low 4. We have a potential contributor who is presumably offering to do the work (if that's not the case, then the question is moot anyway until a sufficiently interested volunteer turns up) As an alternative implementation approach, the case could also be made that these iterators should be raising TypeError in __length_hint__, as that protocol method is explicitly designed to be used for finite container pre-allocation. That way things like "list(itertools.count())" would fail immediately (similar to the way "list(range(10**100))" already does) rather than attempting to consume all available memory before (hopefully) finally failing with MemoryError. If we were to do that, then we *could* make the solution to the reported problem more general by having all builtin and standard library operations that expect to be working with finite iterators (the containment testing fallback, min, max, sum, any, all, functools.reduce, etc) check for a length hint, even if they aren't actually pre-allocating any memory. Then the general purpose marker for "infinite iterator" would be "Explicitly defines __length_hint__ to raise TypeError", and it would prevent a priori all operations that attempted to fully consume the iterator. That more general approach would cause some currently "working" code (like "any(itertools.count())" and "all(itertools.count())", both of which consume at most 2 items from the iterator) to raise an exception instead, and hence would require the introduction of a DeprecationWarning in 3.7 (where the affected APIs would start calling length hint, but suppress any exceptions from it), before allowing the exception to propagate in 3.8+. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.f.moore at gmail.com Wed Oct 18 06:26:52 2017 From: p.f.moore at gmail.com (Paul Moore) Date: Wed, 18 Oct 2017 11:26:52 +0100 Subject: [Python-ideas] Membership of infinite iterators In-Reply-To: References: <59E64FA1.1020307@brenbarn.net> Message-ID: On 18 October 2017 at 10:56, Koos Zevenhoven wrote: > I'm unable to reproduce the "uninterruptible with Ctrl-C" problem with > infinite iterators. At least itertools doesn't seem to have it: > >>>> import itertools >>>> for i in itertools.count(): > ... pass > ... > ^CTraceback (most recent call last): > File "", line 1, in > KeyboardInterrupt That's not the issue here, as the CPython interpreter implements this with multiple opcodes, and checks between opcodes for Ctrl-C. The demonstration is: >>> import itertools >>> 'x' in itertools.count() ... only way to break out is to kill the process. Paul From ncoghlan at gmail.com Wed Oct 18 06:28:59 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 18 Oct 2017 20:28:59 +1000 Subject: [Python-ideas] Membership of infinite iterators In-Reply-To: References: <59E64FA1.1020307@brenbarn.net> Message-ID: On 18 October 2017 at 19:56, Koos Zevenhoven wrote: > I'm unable to reproduce the "uninterruptible with Ctrl-C"? problem with > infinite iterators. At least itertools doesn't seem to have it: > > >>> import itertools > >>> for i in itertools.count(): > ... pass > ... > That's interrupting the for loop, not the iterator. 
This is the test case you want for the problem Jason raised: >>> "a" in itertools.count() Be prepared to suspend and terminate the affected process, because Ctrl-C isn't going to help :) Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia -------------- next part -------------- An HTML attachment was scrubbed... URL: From k7hoven at gmail.com Wed Oct 18 06:39:28 2017 From: k7hoven at gmail.com (Koos Zevenhoven) Date: Wed, 18 Oct 2017 13:39:28 +0300 Subject: [Python-ideas] Membership of infinite iterators In-Reply-To: References: <59E64FA1.1020307@brenbarn.net> Message-ID: On Oct 18, 2017 13:29, "Nick Coghlan" wrote: On 18 October 2017 at 19:56, Koos Zevenhoven wrote: > I'm unable to reproduce the "uninterruptible with Ctrl-C"? problem with > infinite iterators. At least itertools doesn't seem to have it: > > >>> import itertools > >>> for i in itertools.count(): > ... pass > ... > That's interrupting the for loop, not the iterator. This is the test case you want for the problem Jason raised: >>> "a" in itertools.count() Be prepared to suspend and terminate the affected process, because Ctrl-C isn't going to help :) I'm writing from my phone now, cause I was dumb enough to try list(count()) But should it be fixed in list or in count? -- Koos PS. Nick, sorry about the duplicate email. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Wed Oct 18 07:08:04 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 18 Oct 2017 21:08:04 +1000 Subject: [Python-ideas] Membership of infinite iterators In-Reply-To: References: <59E64FA1.1020307@brenbarn.net> Message-ID: On 18 October 2017 at 20:39, Koos Zevenhoven wrote: > On Oct 18, 2017 13:29, "Nick Coghlan" wrote: > > On 18 October 2017 at 19:56, Koos Zevenhoven wrote: > >> I'm unable to reproduce the "uninterruptible with Ctrl-C"? problem with >> infinite iterators. At least itertools doesn't seem to have it: >> >> >>> import itertools >> >>> for i in itertools.count(): >> ... pass >> ... >> > > That's interrupting the for loop, not the iterator. This is the test case > you want for the problem Jason raised: > > >>> "a" in itertools.count() > > Be prepared to suspend and terminate the affected process, because Ctrl-C > isn't going to help :) > > > I'm writing from my phone now, cause I was dumb enough to try list(count()) > Yeah, that's pretty much the worst case example, since the machine starts thrashing memory long before it actually gives up and starts denying the allocation requests :( > But should it be fixed in list or in count? > That one can only be fixed in count() - list already checks operator.length_hint(), so implementing itertools.count.__length_hint__() to always raise an exception would be enough to handle the container constructor case. 
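(As a rough pure-Python sketch of the hook being discussed, with a made-up
wrapper class purely for illustration rather than the real itertools type -
and note that, if I'm reading PEP 424 correctly, a TypeError raised by
__length_hint__ is treated as "no hint available" and swallowed, which is
why the sketch raises RuntimeError to make the failure visible:

>>> import itertools, operator
>>> class HintlessCount:
...     # Made-up wrapper around count(); not the actual implementation.
...     def __init__(self, start=0, step=1):
...         self._it = itertools.count(start, step)
...     def __iter__(self):
...         return self
...     def __next__(self):
...         return next(self._it)
...     def __length_hint__(self):
...         raise RuntimeError("cannot pre-size an infinite iterator")
...
>>> operator.length_hint(HintlessCount(), 8)
Traceback (most recent call last):
  ...
RuntimeError: cannot pre-size an infinite iterator

list(HintlessCount()) should fail the same way, since list() asks for the
hint before it starts iterating, so the container constructor path at least
would fail immediately instead of eating all available memory.)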
The open question would then be the cases that don't pre-allocate memory, but still always attempt to consume the entire iterator: min(itr) max(itr) sum(itr) functools.reduce(op, itr) "".join(itr) And those which *may* attempt to consume the entire iterator, but won't necessarily do so: x in itr any(itr) all(itr) The items in the first category could likely be updated to check length_hint and propagate any errors immediately, since they don't provide any short circuiting behaviour - feeding them an infinite iterator is a guaranteed uninterruptible infinite loop, so checking for a length hint won't break any currently working code (operator.length_hint defaults to returning zero if a type doesn't implement __length_hint__). I'm tempted to say the same for the APIs in the latter category as well, but their short-circuiting semantics mean those can technically have well-defined behaviour, even when given an infinite iterator: >>> any(itertools.count()) True >>> all(itertools.count()) False >>> 1 in itertools.count() True It's only the "never short-circuits" branch that is ill-defined for non-terminating input. So for these, the safer path would be to emit DeprecationWarning if length_hint fails in 3.7, and then pass the exception through in 3.8+. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Wed Oct 18 07:38:17 2017 From: steve at pearwood.info (Steven D'Aprano) Date: Wed, 18 Oct 2017 22:38:17 +1100 Subject: [Python-ideas] Memory limits [was Re: Membership of infinite iterators] In-Reply-To: References: <59E64FA1.1020307@brenbarn.net> Message-ID: <20171018113817.GE9068@ando.pearwood.info> On Wed, Oct 18, 2017 at 01:39:28PM +0300, Koos Zevenhoven wrote: > I'm writing from my phone now, cause I was dumb enough to try list(count()) You have my sympathies -- I once, due to typo, accidentally ran something like range(10**100) in Python 2. > But should it be fixed in list or in count? Neither. There are too many other places this can break for it to be effective to try to fix each one in place. e.g. set(xrange(2**64)), or tuple(itertools.repeat([1])) Rather, I think we should set a memory limit that applies to the whole process. Once you try to allocate more memory, you get an MemoryError exception rather than have the OS thrash forever trying to allocate a terabyte on a 4GB machine. (I don't actually understand why the OS can't fix this.) Being able to limit memory use is a fairly common request, e.g. on Stackoverflow: https://stackoverflow.com/questions/30269238/limit-memory-usage https://stackoverflow.com/questions/2308091/how-to-limit-the-heap-size https://community.webfaction.com/questions/15367/setting-max-memory-for-python-script And Java apparently has a commandline switch to manage memory: https://stackoverflow.com/questions/22887400/is-there-an-equivalent-to-java-xmx-for-python The problems with the resources module are that its effectively an interface to ulimit, which makes it confusing and platform-specific; it is no help to Windows users; it isn't used by default; and not many people know about it. (I know I didn't until tonight.) So here is my suggestion: 1. Let's add a function in sys to set the "maximum memory" available, for some definition of memory that makes the most sense on your platform. Ordinary Python programmers shouldn't have to try to decipher the ulimit interface. 2. 
Have a command line switch to set that value, e.g.: python3 -L 1073741824 # 1 GiB python3 -L 0 # machine-dependent limit python3 -L -1 # unlimited where the machine-dependent limit is set by the interpreter, depending on the amount of memory it thinks it has available. 3. For the moment, stick to defaulting to -L -1 "unlimited", but with the intention to change to -L 0 "let the interpreter decide" in some future release, after an appropriate transition period. On Linux, we can always run ulimit XXXX python3 but honestly I never know which option to give (maximum stack size? maximum virtual memory? why is there no setting for maximum real memory?) and that doesn't help Windows users. Thoughts? -- Steve From k7hoven at gmail.com Wed Oct 18 08:36:15 2017 From: k7hoven at gmail.com (Koos Zevenhoven) Date: Wed, 18 Oct 2017 15:36:15 +0300 Subject: [Python-ideas] Membership of infinite iterators In-Reply-To: References: <59E64FA1.1020307@brenbarn.net> Message-ID: On Wed, Oct 18, 2017 at 2:08 PM, Nick Coghlan wrote: > On 18 October 2017 at 20:39, Koos Zevenhoven wrote: > >> On Oct 18, 2017 13:29, "Nick Coghlan" wrote: >> >> On 18 October 2017 at 19:56, Koos Zevenhoven wrote: >> >>> I'm unable to reproduce the "uninterruptible with Ctrl-C"? problem with >>> infinite iterators. At least itertools doesn't seem to have it: >>> >>> >>> import itertools >>> >>> for i in itertools.count(): >>> ... pass >>> ... >>> >> >> That's interrupting the for loop, not the iterator. This is the test case >> you want for the problem Jason raised: >> >> >>> "a" in itertools.count() >> >> Be prepared to suspend and terminate the affected process, because Ctrl-C >> isn't going to help :) >> >> >> I'm writing from my phone now, cause I was dumb enough to try >> list(count()) >> > > Yeah, that's pretty much the worst case example, since the machine starts > thrashing memory long before it actually gives up and starts denying the > allocation requests :( > > >> But should it be fixed in list or in count? >> > > That one can only be fixed in count() - list already checks > operator.length_hint(), so implementing itertools.count.__length_hint__() > to always raise an exception would be enough to handle the container > constructor case. > > While that may be a convenient hack to solve some of the cases, maybe it's possible for list(..) etc. to give Ctrl-C a chance every now and then? (Without a noticeable performance penalty, that is.) That would also help with *finite* C-implemented iterables that are just slow to turn into a list. If I'm not mistaken, we're talking about C-implemented functions that iterate over C-implemented iterators. It's not at all obvious to me that it's the iterator that should handle Ctrl-C. ??Koos -- + Koos Zevenhoven + http://twitter.com/k7hoven + -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Wed Oct 18 08:43:57 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 18 Oct 2017 22:43:57 +1000 Subject: [Python-ideas] Memory limits [was Re: Membership of infinite iterators] In-Reply-To: <20171018113817.GE9068@ando.pearwood.info> References: <59E64FA1.1020307@brenbarn.net> <20171018113817.GE9068@ando.pearwood.info> Message-ID: On 18 October 2017 at 21:38, Steven D'Aprano wrote: > > But should it be fixed in list or in count? > > Neither. There are too many other places this can break for it to be > effective to try to fix each one in place. > > e.g. 
set(xrange(2**64)), or tuple(itertools.repeat([1])) > A great many of these call operator.length_hint() these days in order to make a better guess as to how much memory to pre-allocate, so while that still wouldn't intercept everything, it would catch a lot of them. > Rather, I think we should set a memory limit that applies to the whole > process. Once you try to allocate more memory, you get an MemoryError > exception rather than have the OS thrash forever trying to allocate a > terabyte on a 4GB machine. > > (I don't actually understand why the OS can't fix this.) > Trying to allocate enormous amounts of memory all at once isn't the problem, as that just fails outright with "Not enough memory": >>> data = bytes(2**62) Traceback (most recent call last): File "", line 1, in MemoryError The machine-killing case is repeated allocation requests that the operating system *can* satisfy, but require paging almost everything else out of RAM. And that's exactly what "list(infinite_iterator)" entails, since the interpreter will make an initial guess as to the correct size, and then keep resizing the allocation to 125% of its previous size each time it fills up (or so - I didn't check the current overallocation factor) . Per-process memory quotas *can* help avoid this, but enforcing them requires that every process run in a resource controlled sandbox. Hence, it's not a coincidence that mobile operating systems and container-based server environments already work that way, and the improved ability to cope with misbehaving applications is part of why desktop operating systems would like to follow the lead of their mobile and server counterparts :) So here is my suggestion: > > 1. Let's add a function in sys to set the "maximum memory" available, > for some definition of memory that makes the most sense on your > platform. Ordinary Python programmers shouldn't have to try to decipher > the ulimit interface. > Historically, one key reason we didn't do that was because the `PyMem_*` APIs bypassed CPython's memory allocator, so such a limit wouldn't have been particularly effective. As of 3.6 though, even bulk memory allocations pass through pymalloc, making a Python level memory allocation limit potentially more viable (since it would pick up almost all of the interpeter's own allocations, even if it missed those in extension modules): https://docs.python.org/dev/whatsnew/3.6.html#optimizations Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefan at bytereef.org Wed Oct 18 08:51:37 2017 From: stefan at bytereef.org (Stefan Krah) Date: Wed, 18 Oct 2017 14:51:37 +0200 Subject: [Python-ideas] Memory limits [was Re: Membership of infinite iterators] In-Reply-To: References: <59E64FA1.1020307@brenbarn.net> <20171018113817.GE9068@ando.pearwood.info> Message-ID: <20171018125137.GA16840@bytereef.org> On Wed, Oct 18, 2017 at 10:43:57PM +1000, Nick Coghlan wrote: > Per-process memory quotas *can* help avoid this, but enforcing them > requires that every process run in a resource controlled sandbox. Hence, > it's not a coincidence that mobile operating systems and container-based > server environments already work that way, and the improved ability to cope > with misbehaving applications is part of why desktop operating systems > would like to follow the lead of their mobile and server counterparts :) Does this also fall under the sandbox definition? 
$ softlimit -m 1000000000 python3 Python 3.7.0a1+ (heads/master:bdaeb7d237, Oct 16 2017, 18:54:55) [GCC 4.8.5] on linux Type "help", "copyright", "credits" or "license" for more information. >>> [0] * 10000000000000 Traceback (most recent call last): File "", line 1, in MemoryError People who are worried could make a python3 alias or use Ctrl-\. Stefan Krah From storchaka at gmail.com Wed Oct 18 08:53:19 2017 From: storchaka at gmail.com (Serhiy Storchaka) Date: Wed, 18 Oct 2017 15:53:19 +0300 Subject: [Python-ideas] Membership of infinite iterators In-Reply-To: References: Message-ID: 18.10.17 13:22, Nick Coghlan ????: > 2.. These particular cases can be addressed locally using existing > protocols, so the chances of negative side effects are low Only the particular case `count() in count()` can be addressed without breaking the following examples: >>> class C: ... def __init__(self, count): ... self.count = count ... def __eq__(self, other): ... print(self.count, other) ... if not self.count: ... return True ... self.count -= 1 ... return False ... >>> import itertools >>> C(5) in itertools.count() 5 0 4 1 3 2 2 3 1 4 0 5 True >>> it = itertools.cycle([C(5)]); it in it 5 4 3 2 1 0 True >>> it = itertools.repeat(C(5)); it in it 5 repeat(<__main__.C object at 0x7f65512c5dc0>) 4 repeat(<__main__.C object at 0x7f65512c5dc0>) 3 repeat(<__main__.C object at 0x7f65512c5dc0>) 2 repeat(<__main__.C object at 0x7f65512c5dc0>) 1 repeat(<__main__.C object at 0x7f65512c5dc0>) 0 repeat(<__main__.C object at 0x7f65512c5dc0>) True > 3. The total amount of code involved is likely to be small (a dozen or > so lines of C, a similar number of lines of Python in the tests) in > well-isolated protocol functions, so the chances of introducing future > maintainability problems are low It depends on what you want to achieve. Just prevent an infinity loop in `count() in count()`, or optimize `int in count()`, or optimize more special cases. > 4. We have a potential contributor who is presumably offering to do the > work (if that's not the case, then the question is moot anyway until a > sufficiently interested volunteer turns up) Maintaining is more than writing an initial code. > If we were to do that, then we *could* make the solution to the reported > problem more general by having all builtin and standard library > operations that expect to be working with finite iterators (the > containment testing fallback, min, max, sum, any, all, functools.reduce, > etc) check for a length hint, even if they aren't actually > pre-allocating any memory. This will add a significant overhead for relatively short (hundreds of items) sequences. I already did benchmarking for similar cases in the past. From k7hoven at gmail.com Wed Oct 18 09:40:28 2017 From: k7hoven at gmail.com (Koos Zevenhoven) Date: Wed, 18 Oct 2017 16:40:28 +0300 Subject: [Python-ideas] Memory limits [was Re: Membership of infinite iterators] In-Reply-To: <20171018113817.GE9068@ando.pearwood.info> References: <59E64FA1.1020307@brenbarn.net> <20171018113817.GE9068@ando.pearwood.info> Message-ID: On Wed, Oct 18, 2017 at 2:38 PM, Steven D'Aprano wrote: > On Wed, Oct 18, 2017 at 01:39:28PM +0300, Koos Zevenhoven wrote: > > > I'm writing from my phone now, cause I was dumb enough to try > list(count()) > > You have my sympathies -- I once, due to typo, accidentally ran > something like range(10**100) in Python 2. > > ?Oh, I think I've done something like that too, and there are definitely still opportunities in Python 3 to ask for the impossible. 
But what I did now, I did "on purpose".? For a split second, I really wanted to know how bad it would be. But a few minutes later I had little interest left in that ;). Rebooting a computer definitely takes longer than restarting a Python process. > > > But should it be fixed in list or in count? > > Neither. There are too many other places this can break for it to be > effective to try to fix each one in place. > > ?To clarify, I was talking about allowing Ctrl-C to break it, which somebody had suggested. That would also help if the C-implemented iterable just takes a lot of time to generate the items. ?And for the record, I just tried >>> sum(itertools.count()) And as we could expect, it does not respect Ctrl-C either. ??Koos -- + Koos Zevenhoven + http://twitter.com/k7hoven + -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Wed Oct 18 09:43:12 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 18 Oct 2017 23:43:12 +1000 Subject: [Python-ideas] Memory limits [was Re: Membership of infinite iterators] In-Reply-To: <20171018125137.GA16840@bytereef.org> References: <59E64FA1.1020307@brenbarn.net> <20171018113817.GE9068@ando.pearwood.info> <20171018125137.GA16840@bytereef.org> Message-ID: On 18 October 2017 at 22:51, Stefan Krah wrote: > On Wed, Oct 18, 2017 at 10:43:57PM +1000, Nick Coghlan wrote: > > Per-process memory quotas *can* help avoid this, but enforcing them > > requires that every process run in a resource controlled sandbox. Hence, > > it's not a coincidence that mobile operating systems and container-based > > server environments already work that way, and the improved ability to > cope > > with misbehaving applications is part of why desktop operating systems > > would like to follow the lead of their mobile and server counterparts :) > > Does this also fall under the sandbox definition? > > $ softlimit -m 1000000000 python3 > Yeah, Linux offers good opt-in tools for this kind of thing, and the combination of Android and containerised server environments means they're only getting better. But we're still some time away from it being routine for your desktop to be well protected from memory management misbehaviour in arbitrary console or GUI applications. The resource module (which Steven mentioned in passing) already provides opt-in access to some of those features from within the program itself: https://docs.python.org/3/library/resource.html For example: >>> import sys, resource >>> data = bytes(2**32) >>> resource.setrlimit(resource.RLIMIT_DATA, (2**31, sys.maxsize)) >>> data = bytes(2**32) Traceback (most recent call last): File "", line 1, in MemoryError >>> resource.setrlimit(resource.RLIMIT_DATA, (sys.maxsize, sys.maxsize)) >>> data = bytes(2**32) (Bulk memory allocations start failing on my machine somewhere between 2**33 and 2**34, which is about what I'd expect, since it has 8 GiB of physical RAM installed) Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Wed Oct 18 10:29:47 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 19 Oct 2017 00:29:47 +1000 Subject: [Python-ideas] Membership of infinite iterators In-Reply-To: References: Message-ID: On 18 October 2017 at 22:53, Serhiy Storchaka wrote: > 18.10.17 13:22, Nick Coghlan ????: > >> 2.. 
These particular cases can be addressed locally using existing >> protocols, so the chances of negative side effects are low >> > > Only the particular case `count() in count()` can be addressed without > breaking the following examples: > You're right, the potential impact on objects with weird __eq__ implementations would mean that even the `__contains__` approach would require deprecation warnings for the APIs that allow short-circuiting. So we can discard it in favour of exploring the "Can we make a beneficial improvement via __length_hint__?" question. > 3. The total amount of code involved is likely to be small (a dozen or so >> lines of C, a similar number of lines of Python in the tests) in >> well-isolated protocol functions, so the chances of introducing future >> maintainability problems are low >> > > It depends on what you want to achieve. Just prevent an infinity loop in > `count() in count()`, or optimize `int in count()`, or optimize more > special cases. > My interest lies specifically in reducing the number of innocent looking ways we offer to provoke uninterruptible infinite loops or unbounded memory consumption. > 4. We have a potential contributor who is presumably offering to do the >> work (if that's not the case, then the question is moot anyway until a >> sufficiently interested volunteer turns up) >> > > Maintaining is more than writing an initial code. > Aye, that's why the preceding point was to ask how large a change we'd be offering to maintain indefinitely, and how well isolated that change would be. > If we were to do that, then we *could* make the solution to the reported >> problem more general by having all builtin and standard library operations >> that expect to be working with finite iterators (the containment testing >> fallback, min, max, sum, any, all, functools.reduce, etc) check for a >> length hint, even if they aren't actually pre-allocating any memory. >> > > This will add a significant overhead for relatively short (hundreds of > items) sequences. I already did benchmarking for similar cases in the past. I did wonder about that, so I guess the baseline zero-risk enhancement idea would be to only prevent the infinite loop in cases that already request a length hint as a memory pre-allocation check. That would reduce the likelihood of the most painful case (grinding the machine to a halt), without worrying about the less painful cases (which will break the current process, but the rest of the machine will be fine). Given that, adding TypeError raising __length_hint__ implementations to itertools.count(), itertools.cycle(), and itertools.repeat() would make sense as an independent RFE, without worrying about any APIs that don't already check for a length hint. A more intrusive option would then be to look at breaking the other tight iteration loops into two phases, such that checking for potentially infinite iterators could be delayed until after the first thousand iterations or so. That option is potentially worth exploring anyway, since breaking up the current single level loops as nested loops would be a pre-requisite for allowing these APIs to check for signals while they're running while keeping the per-iteration overhead low (only one pre-requisite of many though, and probably one of the easier ones). Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From stephanh42 at gmail.com Wed Oct 18 10:44:57 2017 From: stephanh42 at gmail.com (Stephan Houben) Date: Wed, 18 Oct 2017 16:44:57 +0200 Subject: [Python-ideas] Membership of infinite iterators In-Reply-To: <59E64FA1.1020307@brenbarn.net> References: <59E64FA1.1020307@brenbarn.net> Message-ID: Hi all, FWIW, I just tried the list(count()) experiment on my phone (Termux Python interpreter under Android). Python 3.6.2 (default, Sep 16 2017, 23:55:07) [GCC 4.2.1 Compatible Android Clang 5.0.300080 ] on linux Type "help", "copyright", "credits" or "license" for more information. >>> import itertools >>> list(itertools.count()) Killed Interestingly even the Termux app stays alive and otherwise the phone remains responsive and doesn't get hot. I am now sending this mail from that very phone. So this issue is not an issue on the world's most popular OS ? Stephan Op 18 okt. 2017 08:46 schreef "Brendan Barnwell" : > On 2017-10-17 07:26, Serhiy Storchaka wrote: > >> 17.10.17 17:06, Nick Coghlan ????: >> >>> >Keep in mind we're not talking about a regular loop you can break out of >>> >with Ctrl-C here - we're talking about a tight loop inside the >>> >interpreter internals that leads to having to kill the whole host >>> >process just to get out of it. >>> >> And this is the root of the issue. Just let more tight loops be >> interruptible with Ctrl-C, and this will fix the more general issue. >> > > I was just thinking the same thing. I think in general it's > always bad for code to be uninterruptible with Ctrl-C. If these infinite > iterators were fixed so they could be interrupted, this containment problem > would be much less painful. > > -- > Brendan Barnwell > "Do not follow where the path may lead. Go, instead, where there is no > path, and leave a trail." > --author unknown > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Wed Oct 18 10:48:59 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 19 Oct 2017 00:48:59 +1000 Subject: [Python-ideas] Membership of infinite iterators In-Reply-To: References: <59E64FA1.1020307@brenbarn.net> Message-ID: On 18 October 2017 at 22:36, Koos Zevenhoven wrote: > On Wed, Oct 18, 2017 at 2:08 PM, Nick Coghlan wrote: > >> That one can only be fixed in count() - list already checks >> operator.length_hint(), so implementing itertools.count.__length_hint__() >> to always raise an exception would be enough to handle the container >> constructor case. >> > > While that may be a convenient hack to solve some of the cases, maybe it's > possible for list(..) etc. to give Ctrl-C a chance every now and then? > (Without a noticeable performance penalty, that is.) That would also help > with *finite* C-implemented iterables that are just slow to turn into a > list. > > If I'm not mistaken, we're talking about C-implemented functions that > iterate over C-implemented iterators. It's not at all obvious to me that > it's the iterator that should handle Ctrl-C. > It isn't, it's the loop's responsibility. The problem is that one of the core design assumptions in the CPython interpreter implementation is that signals from the operating system get handled by the opcode eval loop in the main thread, and Ctrl-C is one of those signals. 
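(As an aside, the same point suggests a purely user-level workaround,
sketched here with a made-up helper name: route the items through a
Python-level generator, so that every step executes some bytecode in a
frame the eval loop controls:

import itertools

def pass_through(iterable):
    # Each item now flows through Python bytecode, so pending signals
    # (including Ctrl-C) get checked between iterations.
    for item in iterable:
        yield item

# sum(itertools.count()) ignores Ctrl-C, but
# sum(pass_through(itertools.count())) raises KeyboardInterrupt on Ctrl-C,
# at the cost of the extra Python-level indirection.

That doesn't change anything in the interpreter itself, it just puts the
opcode eval loop back between sum() and the underlying iterator.)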
This is why "for x in itertools.cycle(): pass" can be interrupted, while "sum(itertools.cycle())" can't: in the latter case, the opcode eval loop isn't running, as we're inside a tight loop inside the sum() implementation. It's easy to say "Well those loops should all be checking for signals then", but I expect folks wouldn't actually like the consequences of doing something about it, as: 1. It will make those loops slower, due to the extra overhead of checking for signals (even the opcode eval loop includes all sorts of tricks to avoid actually checking for new signals, since doing so is relatively slow) 2. It will make those loops harder to maintain, since the high cost of checking for signals means the existing flat loops will need to be replaced with nested ones to reduce the per-iteration cost of the more expensive checks 3. It means making the signal checking even harder to reason about than it already is, since even C implemented methods that avoid invoking arbitrary Python code could now still end up checking for signals It's far from being clear to me that making such a change would actually be a net improvement, especially when there's an opportunity to mitigate the problem by having known-infinite iterators report themselves as such. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia -------------- next part -------------- An HTML attachment was scrubbed... URL: From k7hoven at gmail.com Wed Oct 18 11:27:56 2017 From: k7hoven at gmail.com (Koos Zevenhoven) Date: Wed, 18 Oct 2017 18:27:56 +0300 Subject: [Python-ideas] Membership of infinite iterators In-Reply-To: References: <59E64FA1.1020307@brenbarn.net> Message-ID: On Wed, Oct 18, 2017 at 5:48 PM, Nick Coghlan wrote: > On 18 October 2017 at 22:36, Koos Zevenhoven wrote: > >> On Wed, Oct 18, 2017 at 2:08 PM, Nick Coghlan wrote: >> >>> That one can only be fixed in count() - list already checks >>> operator.length_hint(), so implementing itertools.count.__length_hint__() >>> to always raise an exception would be enough to handle the container >>> constructor case. >>> >> >> While that may be a convenient hack to solve some of the cases, maybe >> it's possible for list(..) etc. to give Ctrl-C a chance every now and then? >> (Without a noticeable performance penalty, that is.) That would also help >> with *finite* C-implemented iterables that are just slow to turn into a >> list. >> >> If I'm not mistaken, we're talking about C-implemented functions that >> iterate over C-implemented iterators. It's not at all obvious to me that >> it's the iterator that should handle Ctrl-C. >> > > It isn't, it's the loop's responsibility. The problem is that one of the > core design assumptions in the CPython interpreter implementation is that > signals from the operating system get handled by the opcode eval loop in > the main thread, and Ctrl-C is one of those signals. > > This is why "for x in itertools.cycle(): pass" can be interrupted, while > "sum(itertools.cycle())" can't: in the latter case, the opcode eval loop > isn't running, as we're inside a tight loop inside the sum() implementation. > > It's easy to say "Well those loops should all be checking for signals > then", but I expect folks wouldn't actually like the consequences of doing > something about it, as: > > 1. It will make those loops slower, due to the extra overhead of checking > for signals (even the opcode eval loop includes all sorts of tricks to > avoid actually checking for new signals, since doing so is relatively slow) > 2. 
It will make those loops harder to maintain, since the high cost of > checking for signals means the existing flat loops will need to be replaced > with nested ones to reduce the per-iteration cost of the more expensive > checks > Combining points 1 and 2, I don't believe nesting loops is strictly a requirement. > 3. It means making the signal checking even harder to reason about than it > already is, since even C implemented methods that avoid invoking arbitrary > Python code could now still end up checking for signals > So you're talking about code that would make a C-implemented Python iterable of strictly C-implemented Python objects and then pass this to something C-implemented like list(..) or sum(..), while expecting no Python code to be run or signals to be checked anywhere while doing it. I'm not really convinced that such code exists.? But if such code does exist, it sounds like the code is heavily dependent on implementation details. ? > > It's far from being clear to me that making such a change would actually > be a net improvement, especially when there's an opportunity to mitigate > the problem by having known-infinite iterators report themselves as such. > > ?I'm not against that, per se. I just don't think that solves the quite typical case of *finite* but very tedious or memory-consuming loops that one might want to break out of. And raising an exception from .__length_hint__() might ?also break some obscure, but completely valid, operations on *infinite* iterables. ???Koos??? > Cheers, > Nick. > > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > -- + Koos Zevenhoven + http://twitter.com/k7hoven + -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.f.moore at gmail.com Wed Oct 18 11:36:39 2017 From: p.f.moore at gmail.com (Paul Moore) Date: Wed, 18 Oct 2017 16:36:39 +0100 Subject: [Python-ideas] Membership of infinite iterators In-Reply-To: References: <59E64FA1.1020307@brenbarn.net> Message-ID: On 18 October 2017 at 16:27, Koos Zevenhoven wrote: > So you're talking about code that would make a C-implemented Python iterable > of strictly C-implemented Python objects and then pass this to something > C-implemented like list(..) or sum(..), while expecting no Python code to be > run or signals to be checked anywhere while doing it. I'm not really > convinced that such code exists. But if such code does exist, it sounds like > the code is heavily dependent on implementation details. Well, the OP specifically noted that he had recently encountered precisely that situation: """ I recently came across a bug where checking negative membership (__contains__ returns False) of an infinite iterator will freeze the program. """ Paul From k7hoven at gmail.com Wed Oct 18 11:40:12 2017 From: k7hoven at gmail.com (Koos Zevenhoven) Date: Wed, 18 Oct 2017 18:40:12 +0300 Subject: [Python-ideas] Membership of infinite iterators In-Reply-To: References: <59E64FA1.1020307@brenbarn.net> Message-ID: On Wed, Oct 18, 2017 at 6:36 PM, Paul Moore wrote: > On 18 October 2017 at 16:27, Koos Zevenhoven wrote: > > So you're talking about code that would make a C-implemented Python > iterable > > of strictly C-implemented Python objects and then pass this to something > > C-implemented like list(..) or sum(..), while expecting no Python code > to be > > run or signals to be checked anywhere while doing it. I'm not really > > convinced that such code exists. 
But if such code does exist, it sounds > like > > the code is heavily dependent on implementation details. > > Well, the OP specifically noted that he had recently encountered > precisely that situation: > > """ > I recently came across a bug where checking negative membership > (__contains__ returns False) of an infinite iterator will freeze the > program. > """ > > ?No, __contains__ does not expect no python code to be run, because Python code *can* run, as Serhiy in fact already demonstrated for another purpose: ? On Wed, Oct 18, 2017 at 3:53 PM, Serhiy Storchaka wrote: > 18.10.17 13:22, Nick Coghlan ????: > >> 2.. These particular cases can be addressed locally using existing >> protocols, so the chances of negative side effects are low >> > > Only the particular case `count() in count()` can be addressed without > breaking the following examples: > > >>> class C: > ... def __init__(self, count): > ... self.count = count > ... def __eq__(self, other): > ... print(self.count, other) > ... if not self.count: > ... return True > ... self.count -= 1 > ... return False > ... > >>> import itertools > >>> C(5) in itertools.count() > 5 0 > 4 1 > 3 2 > 2 3 > 1 4 > 0 5 > True Clearly, Python code *does* run from within itertools.count.__contains__(..) ??Koos? -- + Koos Zevenhoven + http://twitter.com/k7hoven + -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.f.moore at gmail.com Wed Oct 18 11:56:32 2017 From: p.f.moore at gmail.com (Paul Moore) Date: Wed, 18 Oct 2017 16:56:32 +0100 Subject: [Python-ideas] Membership of infinite iterators In-Reply-To: References: <59E64FA1.1020307@brenbarn.net> Message-ID: OK, looks like I've lost track of what this thread is about then. Sorry for the noise. Paul On 18 October 2017 at 16:40, Koos Zevenhoven wrote: > On Wed, Oct 18, 2017 at 6:36 PM, Paul Moore wrote: >> >> On 18 October 2017 at 16:27, Koos Zevenhoven wrote: >> > So you're talking about code that would make a C-implemented Python >> > iterable >> > of strictly C-implemented Python objects and then pass this to something >> > C-implemented like list(..) or sum(..), while expecting no Python code >> > to be >> > run or signals to be checked anywhere while doing it. I'm not really >> > convinced that such code exists. But if such code does exist, it sounds >> > like >> > the code is heavily dependent on implementation details. >> >> Well, the OP specifically noted that he had recently encountered >> precisely that situation: >> >> """ >> I recently came across a bug where checking negative membership >> (__contains__ returns False) of an infinite iterator will freeze the >> program. >> """ >> > > No, __contains__ does not expect no python code to be run, because Python > code *can* run, as Serhiy in fact already demonstrated for another purpose: > > On Wed, Oct 18, 2017 at 3:53 PM, Serhiy Storchaka > wrote: >> >> 18.10.17 13:22, Nick Coghlan ????: >>> >>> 2.. These particular cases can be addressed locally using existing >>> protocols, so the chances of negative side effects are low >> >> >> Only the particular case `count() in count()` can be addressed without >> breaking the following examples: >> >> >>> class C: >> ... def __init__(self, count): >> ... self.count = count >> ... def __eq__(self, other): >> ... print(self.count, other) >> ... if not self.count: >> ... return True >> ... self.count -= 1 >> ... return False >> ... 
>> >>> import itertools >> >>> C(5) in itertools.count() >> 5 0 >> 4 1 >> 3 2 >> 2 3 >> 1 4 >> 0 5 >> True > > > > Clearly, Python code *does* run from within itertools.count.__contains__(..) > > > ??Koos > > > -- > + Koos Zevenhoven + http://twitter.com/k7hoven + From k7hoven at gmail.com Wed Oct 18 12:06:04 2017 From: k7hoven at gmail.com (Koos Zevenhoven) Date: Wed, 18 Oct 2017 19:06:04 +0300 Subject: [Python-ideas] Membership of infinite iterators In-Reply-To: References: <59E64FA1.1020307@brenbarn.net> Message-ID: On Wed, Oct 18, 2017 at 6:56 PM, Paul Moore wrote: > OK, looks like I've lost track of what this thread is about then. > Sorry for the noise. > Paul > > ?No worries. I'm not sure I can tell what this thread is about either. Different people seem to have different ideas about that. My most recent point was that __contains__ already has to allow Python code to run on each iteration, so it is not the kind of code that Nick was referring to, and which I'm not convinced even exists. ??Koos > On 18 October 2017 at 16:40, Koos Zevenhoven wrote: > > On Wed, Oct 18, 2017 at 6:36 PM, Paul Moore wrote: > >> > >> On 18 October 2017 at 16:27, Koos Zevenhoven wrote: > >> > So you're talking about code that would make a C-implemented Python > >> > iterable > >> > of strictly C-implemented Python objects and then pass this to > something > >> > C-implemented like list(..) or sum(..), while expecting no Python code > >> > to be > >> > run or signals to be checked anywhere while doing it. I'm not really > >> > convinced that such code exists. But if such code does exist, it > sounds > >> > like > >> > the code is heavily dependent on implementation details. > >> > >> Well, the OP specifically noted that he had recently encountered > >> precisely that situation: > >> > >> """ > >> I recently came across a bug where checking negative membership > >> (__contains__ returns False) of an infinite iterator will freeze the > >> program. > >> """ > >> > > > > No, __contains__ does not expect no python code to be run, because Python > > code *can* run, as Serhiy in fact already demonstrated for another > purpose: > > > > On Wed, Oct 18, 2017 at 3:53 PM, Serhiy Storchaka > > wrote: > >> > >> 18.10.17 13:22, Nick Coghlan ????: > >>> > >>> 2.. These particular cases can be addressed locally using existing > >>> protocols, so the chances of negative side effects are low > >> > >> > >> Only the particular case `count() in count()` can be addressed without > >> breaking the following examples: > >> > >> >>> class C: > >> ... def __init__(self, count): > >> ... self.count = count > >> ... def __eq__(self, other): > >> ... print(self.count, other) > >> ... if not self.count: > >> ... return True > >> ... self.count -= 1 > >> ... return False > >> ... > >> >>> import itertools > >> >>> C(5) in itertools.count() > >> 5 0 > >> 4 1 > >> 3 2 > >> 2 3 > >> 1 4 > >> 0 5 > >> True > > > > > > > > Clearly, Python code *does* run from within > itertools.count.__contains__(..) > > > > > > ??Koos > > > > > > -- > > + Koos Zevenhoven + http://twitter.com/k7hoven + > -- + Koos Zevenhoven + http://twitter.com/k7hoven + -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From python at mrabarnett.plus.com Wed Oct 18 14:24:42 2017 From: python at mrabarnett.plus.com (MRAB) Date: Wed, 18 Oct 2017 19:24:42 +0100 Subject: [Python-ideas] Membership of infinite iterators In-Reply-To: References: <59E64FA1.1020307@brenbarn.net> Message-ID: On 2017-10-18 15:48, Nick Coghlan wrote: > On 18 October 2017 at 22:36, Koos Zevenhoven > wrote: > > On Wed, Oct 18, 2017 at 2:08 PM, Nick Coghlan > wrote: > > That one can only be fixed in count() - list already checks > operator.length_hint(), so implementing > itertools.count.__length_hint__() to always raise an exception > would be enough to handle the container constructor case. > > > While that may be a convenient hack to solve some of the cases, > maybe it's possible for list(..) etc. to give Ctrl-C a chance every > now and then? (Without a noticeable performance penalty, that is.) > That would also help with *finite* C-implemented iterables that are > just slow to turn into a list. > > If I'm not mistaken, we're talking about C-implemented functions > that iterate over C-implemented iterators. It's not at all obvious > to me that it's the iterator that should handle Ctrl-C. > > > It isn't, it's the loop's responsibility. The problem is that one of the > core design assumptions in the CPython interpreter implementation is > that signals from the operating system get handled by the opcode eval > loop in the main thread, and Ctrl-C is one of those signals. > > This is why "for x in itertools.cycle(): pass" can be interrupted, while > "sum(itertools.cycle())" can't: in the latter case, the opcode eval loop > isn't running, as we're inside a tight loop inside the sum() implementation. > > It's easy to say "Well those loops should all be checking for signals > then", but I expect folks wouldn't actually like the consequences of > doing something about it, as: > > 1. It will make those loops slower, due to the extra overhead of > checking for signals (even the opcode eval loop includes all sorts of > tricks to avoid actually checking for new signals, since doing so is > relatively slow) > 2. It will make those loops harder to maintain, since the high cost of > checking for signals means the existing flat loops will need to be > replaced with nested ones to reduce the per-iteration cost of the more > expensive checks The re module increments a counter on each iteration and checks for signals when the bottom 12 bits are 0. The regex module increments a 16-bit counter on each iteration and checks for signals when it wraps around to 0. > 3. It means making the signal checking even harder to reason about than > it already is, since even C implemented methods that avoid invoking > arbitrary Python code could now still end up checking for signals > > It's far from being clear to me that making such a change would actually > be a net improvement, especially when there's an opportunity to mitigate > the problem by having known-infinite iterators report themselves as such. > From k7hoven at gmail.com Wed Oct 18 15:09:09 2017 From: k7hoven at gmail.com (Koos Zevenhoven) Date: Wed, 18 Oct 2017 22:09:09 +0300 Subject: [Python-ideas] Membership of infinite iterators In-Reply-To: References: <59E64FA1.1020307@brenbarn.net> Message-ID: On Wed, Oct 18, 2017 at 9:24 PM, MRAB wrote: > > The re module increments a counter on each iteration and checks for > signals when the bottom 12 bits are 0. > > The regex module increments a 16-bit counter on each iteration and checks > for signals when it wraps around to 0. 
>

Then I'd say that's a great solution, except that `regex` probably
over-exaggerates the overhead of checking for signals, and that `re` module
for some strange reason wants to make an additional bitwise and.

-- Koos

--
+ Koos Zevenhoven + http://twitter.com/k7hoven +
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From storchaka at gmail.com Wed Oct 18 15:13:01 2017
From: storchaka at gmail.com (Serhiy Storchaka)
Date: Wed, 18 Oct 2017 22:13:01 +0300
Subject: [Python-ideas] Membership of infinite iterators
In-Reply-To:
References: <59E64FA1.1020307@brenbarn.net>
Message-ID:

18.10.17 17:48, Nick Coghlan wrote:
> 1. It will make those loops slower, due to the extra overhead of
> checking for signals (even the opcode eval loop includes all sorts of
> tricks to avoid actually checking for new signals, since doing so is
> relatively slow)
> 2. It will make those loops harder to maintain, since the high cost of
> checking for signals means the existing flat loops will need to be
> replaced with nested ones to reduce the per-iteration cost of the more
> expensive checks
> 3. It means making the signal checking even harder to reason about than
> it already is, since even C implemented methods that avoid invoking
> arbitrary Python code could now still end up checking for signals

I have implemented signals checking for itertools iterators. [1] The
overhead is insignificant because signals are checked only for every
0x10000-th item (100-4000 times/sec). The consuming loops are not changed
because signals are checked on the producer's side.

[1] https://bugs.python.org/issue31815

From k7hoven at gmail.com Wed Oct 18 15:21:32 2017
From: k7hoven at gmail.com (Koos Zevenhoven)
Date: Wed, 18 Oct 2017 22:21:32 +0300
Subject: [Python-ideas] Membership of infinite iterators
In-Reply-To:
References: <59E64FA1.1020307@brenbarn.net>
Message-ID:

On Wed, Oct 18, 2017 at 10:13 PM, Serhiy Storchaka wrote:

> 18.10.17 17:48, Nick Coghlan wrote:
>
>> 1. It will make those loops slower, due to the extra overhead of checking
>> for signals (even the opcode eval loop includes all sorts of tricks to
>> avoid actually checking for new signals, since doing so is relatively slow)
>> 2. It will make those loops harder to maintain, since the high cost of
>> checking for signals means the existing flat loops will need to be replaced
>> with nested ones to reduce the per-iteration cost of the more expensive
>> checks
>> 3. It means making the signal checking even harder to reason about than
>> it already is, since even C implemented methods that avoid invoking
>> arbitrary Python code could now still end up checking for signals
>>
>
> I have implemented signals checking for itertools iterators. [1] The
> overhead is insignificant because signals are checked only for every
> 0x10000-th item (100-4000 times/sec). The consuming loops are not changed
> because signals are checked on the producer's side.
>
> [1] https://bugs.python.org/issue31815
>
>
Nice! Though I'd really like a general solution that other code can
easily adopt, even third-party extension libraries.

-- Koos

--
+ Koos Zevenhoven + http://twitter.com/k7hoven +
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From storchaka at gmail.com Wed Oct 18 15:30:43 2017 From: storchaka at gmail.com (Serhiy Storchaka) Date: Wed, 18 Oct 2017 22:30:43 +0300 Subject: [Python-ideas] Membership of infinite iterators In-Reply-To: References: <59E64FA1.1020307@brenbarn.net> Message-ID: 18.10.17 22:21, Koos Zevenhoven ????: > ?Nice! Though I'd really like a general ?solution that other code can > easily adopt, even third-party extension libraries. What is the more general solution? For interrupting C code you need to check signals manually, either in every loop, or in every iterator. It seems to me that the number of loops is larger than the number of iterators. From k7hoven at gmail.com Wed Oct 18 15:43:21 2017 From: k7hoven at gmail.com (Koos Zevenhoven) Date: Wed, 18 Oct 2017 22:43:21 +0300 Subject: [Python-ideas] Membership of infinite iterators In-Reply-To: References: <59E64FA1.1020307@brenbarn.net> Message-ID: On Wed, Oct 18, 2017 at 10:21 PM, Koos Zevenhoven wrote: > On Wed, Oct 18, 2017 at 10:13 PM, Serhiy Storchaka > wrote: > >> 18.10.17 17:48, Nick Coghlan ????: >> >>> 1. It will make those loops slower, due to the extra overhead of >>> checking for signals (even the opcode eval loop includes all sorts of >>> tricks to avoid actually checking for new signals, since doing so is >>> relatively slow) >>> 2. It will make those loops harder to maintain, since the high cost of >>> checking for signals means the existing flat loops will need to be replaced >>> with nested ones to reduce the per-iteration cost of the more expensive >>> checks >>> 3. It means making the signal checking even harder to reason about than >>> it already is, since even C implemented methods that avoid invoking >>> arbitrary Python code could now still end up checking for signals >>> >> >> I have implemented signals checking for itertools iterators. [1] The >> overhead is insignificant because signals are checked only for every >> 0x10000-th item (100-4000 times/sec). The consuming loops are not changed >> because signals are checked on the producer's side. >> >> [1] https://bugs.python.org/issue31815 >> >> > ?Nice! Though I'd really like a general ?solution that other code can > easily adopt, even third-party extension libraries. > > ?By the way, now that I actually read the BPO issue?, it looks like the benchmarks were for 0x1000 (15 bits)? And why is everyone doing powers of two anyway? Anyway, I still don't think infinite iterables are the most common case where this problem occurs. Solving this in the most common consuming loops would allow breaking out of a lot of long loops regardless of which iterable type (if any) is being used. So I'm still asking which one should solve the problem. ?-- Koos? -- + Koos Zevenhoven + http://twitter.com/k7hoven + -------------- next part -------------- An HTML attachment was scrubbed... URL: From k7hoven at gmail.com Wed Oct 18 17:37:03 2017 From: k7hoven at gmail.com (Koos Zevenhoven) Date: Thu, 19 Oct 2017 00:37:03 +0300 Subject: [Python-ideas] Membership of infinite iterators In-Reply-To: References: <59E64FA1.1020307@brenbarn.net> Message-ID: On Wed, Oct 18, 2017 at 10:30 PM, Serhiy Storchaka wrote: > 18.10.17 22:21, Koos Zevenhoven ????: > >> ?Nice! Though I'd really like a general ?solution that other code can >> easily adopt, even third-party extension libraries. >> > > What is the more general solution? For interrupting C code you need to > check signals manually, either in every loop, or in every iterator. 
It > seems to me that the number of loops is larger than the number of iterators. > > ?Sorry, I missed this email earlier. Maybe a macro like Py_MAKE_THIS_LOOP_BREAKABLE_FOR_ME_PLEASE that you could insert wherever you think the code might be spending some time without calling any Python code. One could use it rather carelessly, at least more so than refcounts. Something like the macro you wrote, except that it would take care of the whole thing and not just the counting. -- Koos -- + Koos Zevenhoven + http://twitter.com/k7hoven + -------------- next part -------------- An HTML attachment was scrubbed... URL: From yselivanov.ml at gmail.com Wed Oct 18 17:44:11 2017 From: yselivanov.ml at gmail.com (Yury Selivanov) Date: Wed, 18 Oct 2017 17:44:11 -0400 Subject: [Python-ideas] PEP draft: context variables In-Reply-To: References: <59E0FBBF.3010402@stoneleaf.us> Message-ID: On Sun, Oct 15, 2017 at 8:15 AM, Paul Moore wrote: > On 13 October 2017 at 23:30, Yury Selivanov wrote: >> At this point of time, there's just one place which describes one well >> defined semantics: PEP 550 latest version. Paul, if you have >> time/interest, please take a look at it, and say what's confusing >> there. > > Hi Yury, > The following is my impressions from a read-through of the initial > part of the PEP. tl; dr - you say "concurrent" too much and it makes > my head hurt :-) [..] > I hope this is of some use. I appreciate I'm talking about a pretty > wholesale rewrite, and it's quite far down the line to be suggesting > such a thing. I'll understand if you don't feel it's worthwhile to > take that route. Hi Paul, Thanks *a lot* for this detailed analysis. Even though PEP 550 isn't going to make it to 3.7 and I'm not going to edit/rewrite it anymore, I'll try to incorporate some of your feedback into the new PEP. Thanks, Yury From greg.ewing at canterbury.ac.nz Wed Oct 18 18:34:20 2017 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 19 Oct 2017 11:34:20 +1300 Subject: [Python-ideas] Membership of infinite iterators In-Reply-To: References: Message-ID: <59E7D6EC.6090708@canterbury.ac.nz> Nick Coghlan wrote: > since breaking up the current single level loops as nested loops > would be a pre-requisite for allowing these APIs to check for signals > while they're running while keeping the per-iteration overhead low Is there really much overhead? Isn't it just checking a flag? -- Greg From steve at pearwood.info Wed Oct 18 19:59:07 2017 From: steve at pearwood.info (Steven D'Aprano) Date: Thu, 19 Oct 2017 10:59:07 +1100 Subject: [Python-ideas] Memory limits [was Re: Membership of infinite iterators] In-Reply-To: <20171018125137.GA16840@bytereef.org> References: <59E64FA1.1020307@brenbarn.net> <20171018113817.GE9068@ando.pearwood.info> <20171018125137.GA16840@bytereef.org> Message-ID: <20171018235907.GF9068@ando.pearwood.info> On Wed, Oct 18, 2017 at 02:51:37PM +0200, Stefan Krah wrote: > $ softlimit -m 1000000000 python3 [...] > MemoryError > > > People who are worried could make a python3 alias or use Ctrl-\. I just tried that on two different Linux computers I have, and neither have softlimit. Nor (presumably) would this help Windows users. 
-- Steve From ncoghlan at gmail.com Wed Oct 18 20:42:29 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 19 Oct 2017 10:42:29 +1000 Subject: [Python-ideas] Membership of infinite iterators In-Reply-To: <59E7D6EC.6090708@canterbury.ac.nz> References: <59E7D6EC.6090708@canterbury.ac.nz> Message-ID: On 19 October 2017 at 08:34, Greg Ewing wrote: > Nick Coghlan wrote: > >> since breaking up the current single level loops as nested loops would be >> a pre-requisite for allowing these APIs to check for signals while they're >> running while keeping the per-iteration overhead low >> > > Is there really much overhead? Isn't it just checking a flag? > It's checking an atomically updated flag, so it forces CPU cache synchronisation, which means you don't want to be doing it on every iteration of a low level loop. However, reviewing Serhiy's PR reminded me that PyErr_CheckSignals() already encapsulates the "Should this thread even be checking for signals in the first place?" logic, which means the code change to make the itertools iterators inherently interruptible with Ctrl-C is much smaller than I thought it would be. That approach is also clearly safe from an exception handling point of view, since all consumer loops already need to cope with the fact that itr.__next__() may raise arbitrary exceptions (including KeyboardInterrupt). So that change alone already offers a notable improvement, and combining it with a __length_hint__() implementation that keeps container constructors from even starting to iterate would go even further towards making the infinite iterators more user friendly. Similar signal checking changes to the consumer loops would also be possible, but I don't think that's an either/or decision: changing the iterators means they'll be interruptible for any consumer, while changing the consumers would make them interruptible for any iterator, and having checks in both the producer & the consumer merely means that you'll be checking for signals twice every 65k iterations, rather than once. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia -------------- next part -------------- An HTML attachment was scrubbed... URL: From greg.ewing at canterbury.ac.nz Thu Oct 19 00:53:36 2017 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 19 Oct 2017 17:53:36 +1300 Subject: [Python-ideas] Membership of infinite iterators In-Reply-To: References: <59E7D6EC.6090708@canterbury.ac.nz> Message-ID: <59E82FD0.20805@canterbury.ac.nz> Nick Coghlan wrote: > having checks in both the producer & the consumer merely means that > you'll be checking for signals twice every 65k iterations, rather than once. Here's a possible reason for wanting checks in the producers: If your producer happens to take a long time per iteration, and the consumer only checks every 65k iterations, it might be a while before a Ctrl-C takes effect. If the producer is checking, it is likely to have a better idea of what an appropriate checking interval might be. 
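To sketch that in Python terms (purely illustrative: the real producers at
issue are C code, where the check would be PyErr_CheckSignals(), and the
names below are made up):

def producer(source, check, stride=0x10000):
    # Yield items from *source*, but poll *check* only once per *stride*
    # items, so the per-item overhead stays negligible.
    for i, item in enumerate(source):
        if i % stride == 0:
            check()
        yield item

# A producer that knows each of its items is slow to compute can simply
# pick a much smaller stride:
print(list(producer(range(10), check=lambda: None, stride=8)))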
-- Greg

From k7hoven at gmail.com Thu Oct 19 03:54:36 2017
From: k7hoven at gmail.com (Koos Zevenhoven)
Date: Thu, 19 Oct 2017 10:54:36 +0300
Subject: [Python-ideas] Membership of infinite iterators
In-Reply-To:
References: <59E7D6EC.6090708@canterbury.ac.nz>
Message-ID:

On Thu, Oct 19, 2017 at 3:42 AM, Nick Coghlan wrote:

> On 19 October 2017 at 08:34, Greg Ewing
> wrote:
>
>> Nick Coghlan wrote:
>>
>>> since breaking up the current single level loops as nested loops would
>>> be a pre-requisite for allowing these APIs to check for signals while
>>> they're running while keeping the per-iteration overhead low
>>>
>>
>> Is there really much overhead? Isn't it just checking a flag?
>>
>
> It's checking an atomically updated flag, so it forces CPU cache
> synchronisation, which means you don't want to be doing it on every
> iteration of a low level loop.
>
>
Even just that it's a C function call makes me not want to recommend doing
it in a lot of tight loops. Who knows what the function does anyway, let
alone what it might or might not do in the future.


> However, reviewing Serhiy's PR reminded me that PyErr_CheckSignals()
> already encapsulates the "Should this thread even be checking for signals
> in the first place?" logic, which means the code change to make the
> itertools iterators inherently interruptible with Ctrl-C is much smaller
> than I thought it would be.
>

And if it didn't encapsulate that, you would probably have written a
wrapper that does. Good thing it's the wrapper that's exposed in the API.


> That approach is also clearly safe from an exception handling point of
> view, since all consumer loops already need to cope with the fact that
> itr.__next__() may raise arbitrary exceptions (including KeyboardInterrupt).
>
> So that change alone already offers a notable improvement, and combining it
> with a __length_hint__() implementation that keeps container constructors
> from even starting to iterate would go even further towards making the
> infinite iterators more user friendly.
>
> Similar signal checking changes to the consumer loops would also be
> possible, but I don't think that's an either/or decision: changing the
> iterators means they'll be interruptible for any consumer, while changing
> the consumers would make them interruptible for any iterator, and having
> checks in both the producer & the consumer merely means that you'll be
> checking for signals twice every 65k iterations, rather than once.
>
>
Indeed it's not strictly an either/or decision, but more about where we
might spend time executing C code. But I'm leaning a bit towards doing it
on the consumer side, because there it's more obvious that the code might
take some time to run. If the consumer ends up iterating over pure-Python
objects, there are no concerns about the overhead. But if it *does* call a
C-implemented __next__, then that's the case where we actually need the
whole thing. Adding the check in both places would double the (small)
overhead. And nested (wrapped) iterators are also a thing.

-- Koos

--
+ Koos Zevenhoven + http://twitter.com/k7hoven +
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From stephanh42 at gmail.com Thu Oct 19 04:05:58 2017 From: stephanh42 at gmail.com (Stephan Houben) Date: Thu, 19 Oct 2017 10:05:58 +0200 Subject: [Python-ideas] Memory limits [was Re: Membership of infinite iterators] In-Reply-To: <20171018235907.GF9068@ando.pearwood.info> References: <59E64FA1.1020307@brenbarn.net> <20171018113817.GE9068@ando.pearwood.info> <20171018125137.GA16840@bytereef.org> <20171018235907.GF9068@ando.pearwood.info> Message-ID: Hi Steve, 2017-10-19 1:59 GMT+02:00 Steven D'Aprano : > On Wed, Oct 18, 2017 at 02:51:37PM +0200, Stefan Krah wrote: > > > $ softlimit -m 1000000000 python3 > [...] > > MemoryError > > > > > > People who are worried could make a python3 alias or use Ctrl-\. > > I just tried that on two different Linux computers I have, and neither > have softlimit. > > Yeah, not sure what "softlimit" is either. I'd suggest sticking to POSIX-standard ulimit or just stick something like this in the .pythonrc.py: import resource resource.setrlimit(resource.RLIMIT_DATA, (2 * 1024**3, 2 * 1024**3)) Nor (presumably) would this help Windows users. > I (quickly) tried to get something to work using the win32 package, in particular the win32job functions. However, it seems setting "ProcessMemoryLimit" using win32job.SetInformationJobObject had no effect (i.e. a subsequent win32job.QueryInformationJobObject still showed the limit as 0)? People with stronger Windows-fu may be aware what is going on here... Stephan > > > -- > Steve > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefan at bytereef.org Thu Oct 19 04:49:03 2017 From: stefan at bytereef.org (Stefan Krah) Date: Thu, 19 Oct 2017 10:49:03 +0200 Subject: [Python-ideas] Memory limits [was Re: Membership of infinite iterators] In-Reply-To: References: <59E64FA1.1020307@brenbarn.net> <20171018113817.GE9068@ando.pearwood.info> <20171018125137.GA16840@bytereef.org> <20171018235907.GF9068@ando.pearwood.info> Message-ID: <20171019084903.GA3085@bytereef.org> On Thu, Oct 19, 2017 at 10:05:58AM +0200, Stephan Houben wrote: > > 2017-10-19 1:59 GMT+02:00 Steven D'Aprano : > > > On Wed, Oct 18, 2017 at 02:51:37PM +0200, Stefan Krah wrote: > > > > > $ softlimit -m 1000000000 python3 > > [...] > > > MemoryError > > > > > > > > > People who are worried could make a python3 alias or use Ctrl-\. > > > > I just tried that on two different Linux computers I have, and neither > > have softlimit. > > > > > Yeah, not sure what "softlimit" is either. Part of daemontools: https://cr.yp.to/daemontools/softlimit.html Stefan Krah From g.rodola at gmail.com Thu Oct 19 16:52:42 2017 From: g.rodola at gmail.com (Giampaolo Rodola') Date: Thu, 19 Oct 2017 22:52:42 +0200 Subject: [Python-ideas] Memory limits [was Re: Membership of infinite iterators] In-Reply-To: References: <59E64FA1.1020307@brenbarn.net> <20171018113817.GE9068@ando.pearwood.info> <20171018125137.GA16840@bytereef.org> <20171018235907.GF9068@ando.pearwood.info> Message-ID: On Thu, Oct 19, 2017 at 10:05 AM, Stephan Houben wrote: > Hi Steve, > > 2017-10-19 1:59 GMT+02:00 Steven D'Aprano : > >> On Wed, Oct 18, 2017 at 02:51:37PM +0200, Stefan Krah wrote: >> >> > $ softlimit -m 1000000000 python3 >> [...] >> > MemoryError >> > >> > >> > People who are worried could make a python3 alias or use Ctrl-\. 
>> >> I just tried that on two different Linux computers I have, and neither >> have softlimit. >> >> > Yeah, not sure what "softlimit" is either. > I'd suggest sticking to POSIX-standard ulimit or just stick > something like this in the .pythonrc.py: > > import resource > resource.setrlimit(resource.RLIMIT_DATA, (2 * 1024**3, 2 * 1024**3)) > > Nor (presumably) would this help Windows users. >> > > I (quickly) tried to get something to work using the win32 package, > in particular the win32job functions. > However, it seems setting > "ProcessMemoryLimit" using win32job.SetInformationJobObject > had no effect > (i.e. a subsequent win32job.QueryInformationJobObject > still showed the limit as 0)? > > People with stronger Windows-fu may be aware what is going on here... > > Stephan > I wasn't aware Windows was capable of setting such limits in a per-process fashion. You gave me a good idea for psutil: https://github.com/giampaolo/psutil/issues/1149 According to this cmdline tool: https://cr.yp.to/daemontools/softlimit.html ....the limit should kick in only when the system memory is full, whatever that means: <<-r n: Limit the resident set size to n bytes. This limit is not enforced unless physical memory is full.>> ...so that would explain why it had no effect. -- Giampaolo - http://grodola.blogspot.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefan_ml at behnel.de Fri Oct 20 06:50:40 2017 From: stefan_ml at behnel.de (Stefan Behnel) Date: Fri, 20 Oct 2017 12:50:40 +0200 Subject: [Python-ideas] Why not picoseconds? In-Reply-To: <20171016102023.3e0c5385@fsol> References: <20171015191716.371fba63@fsol> <20171016102023.3e0c5385@fsol> Message-ID: Antoine Pitrou schrieb am 16.10.2017 um 10:20: > On Sun, 15 Oct 2017 22:00:10 -0700 > Guido van Rossum wrote: >> On Sun, Oct 15, 2017 at 8:40 PM, Nick Coghlan wrote: >> >>> Hopefully by the time we decide it's worth worrying about picoseconds in >>> "regular" code, compiler support for decimal128 will be sufficiently >>> ubiquitous that we'll be able to rely on that as our 3rd generation time >>> representation (where the first gen is seconds as a 64 bit binary float and >>> the second gen is nanoseconds as a 64 bit integer). >> >> I hope we'll never see time_ns() and friends as the second generation -- >> it's a hack that hopefully we can retire in those glorious days of hardware >> decimal128 support. > > Given the implementation costs, hardware decimal128 will only become > mainstream if there's a strong incentive for it, which I'm not sure > exists or will ever exist ;-) Then we shouldn't implement the new nanosecond API at all, in order to keep pressure on the hardware developers. Stefan :o) From victor.stinner at gmail.com Fri Oct 20 09:12:22 2017 From: victor.stinner at gmail.com (Victor Stinner) Date: Fri, 20 Oct 2017 15:12:22 +0200 Subject: [Python-ideas] Why not picoseconds? In-Reply-To: References: <20171015191716.371fba63@fsol> <20171016102023.3e0c5385@fsol> Message-ID: Antoine Pitrou: > Given the implementation costs, hardware decimal128 will only become > mainstream if there's a strong incentive for it, which I'm not sure > exists or will ever exist ;-) Stefan Behnel: > Then we shouldn't implement the new nanosecond API at all, in order to keep > pressure on the hardware developers. POWER6 is available for ten years and has hardware support for decimal128: "IBM's POWER6 (2007) and System z10 (2008) processors both implement IEEE 754-2008 fully in hardware and in every core." 
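As an aside, a software implementation is already bundled with CPython: the
decimal module can be configured with the IEEE 754-2008 decimal128
parameters (shown here only to illustrate the format, not as a statement
about its performance):

import decimal

# decimal128: 34 significant digits, exponent range [-6143, 6144]
ctx = decimal.Context(prec=34, Emin=-6143, Emax=6144)
print(ctx.add(decimal.Decimal("0.1"), decimal.Decimal("0.2")))  # Decimal('0.3')

The question in this thread is only whether that ever gets done in hardware.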
I guess that POWER6 is not part of "mainstream" :-)

I'm not aware of any hardware implementation of the decimal floating
point (DFP) for Intel CPU, ARM CPU, or GPU (nothing in OpenCL nor CUDA).

At least, it seems like Intel knows that DFP exists since they provide a
software implementation :-)
https://software.intel.com/en-us/articles/intel-decimal-floating-point-math-library

Maybe things will move quicker than expected, and we will get DFP even in
microcontrollers!? Who knows? ;-)

Victor

From stephanh42 at gmail.com Fri Oct 20 10:42:39 2017
From: stephanh42 at gmail.com (Stephan Houben)
Date: Fri, 20 Oct 2017 16:42:39 +0200
Subject: [Python-ideas] Why not picoseconds?
In-Reply-To:
References: <20171015191716.371fba63@fsol> <20171016102023.3e0c5385@fsol>
Message-ID:

Hi all,

Please excuse me for getting a bit off-topic, but I would like to point
out that except for bean-counters who need to be bug-compatible with
accounting standards, decimal floating point is generally a bad idea.

That is because the worst-case bound on the rounding error grows linearly
with the base size. So you really want to choose a base size as small as
possible, i.e., 2. This is not really related to the fact that computers
use base-2 arithmetic, that is just a happy coincidence. If we used
ternary logic for our computers, FP should still be based on base-2, and
computer architects would complain about the costly multiplication and
division with powers of two (just as they have historically complained
about the costly implementation of denormals, but we still got that,
mostly thanks to prof. Kahan convincing Intel).

Worse, a base other than 2 also increases the spread in the average
rounding error. This phenomenon is called "wobble" and adds additional
noise into calculations.

The ultimate problem is that the real number line contains quite a few
more elements than our puny computers can handle. There is no 100%
solution for this, but of all the possible compromises, floating-point
forms a fairly optimum point in the design space for a wide range of
applications.

Stephan

On 20 Oct 2017 3:13 p.m., "Victor Stinner" wrote:

Python-ideas at python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From francismb at email.de Fri Oct 20 12:36:15 2017
From: francismb at email.de (francismb)
Date: Fri, 20 Oct 2017 18:36:15 +0200
Subject: [Python-ideas] Add time.time_ns(): system clock with nanosecond resolution
In-Reply-To:
References:
Message-ID:

Hi Victor,

On 10/13/2017 04:12 PM, Victor Stinner wrote:
> I would like to add:
>
> * time.time_ns()
> * time.monotonic_ns()
> * time.perf_counter_ns()
> * time.clock_gettime_ns()
> * time.clock_settime_ns()

Given nano vs. pico vs. ..., why not something like (please change '_in'
for whatever you like):

time.time_in(precision)
time.monotonic_in(precision)

where precision is an enumeration for 'nano', 'pico' ...

Thanks in advance!
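A rough sketch of the idea (names and values are invented here just to
illustrate, not an existing or proposed API):

import enum
import time

class Precision(enum.Enum):
    nano = 10**9
    pico = 10**12

def time_in(precision):
    # Illustration only: whatever unit is requested, the result is still
    # limited by the resolution of the underlying OS clock.
    return int(time.time() * precision.value)

print(time_in(Precision.nano))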
--francis

From eryksun at gmail.com Fri Oct 20 13:01:14 2017
From: eryksun at gmail.com (eryk sun)
Date: Fri, 20 Oct 2017 18:01:14 +0100
Subject: [Python-ideas] Memory limits [was Re: Membership of infinite iterators]
In-Reply-To:
References: <59E64FA1.1020307@brenbarn.net> <20171018113817.GE9068@ando.pearwood.info> <20171018125137.GA16840@bytereef.org> <20171018235907.GF9068@ando.pearwood.info>
Message-ID:

On Thu, Oct 19, 2017 at 9:05 AM, Stephan Houben wrote:
>
> I (quickly) tried to get something to work using the win32 package,
> in particular the win32job functions.
> However, it seems setting
> "ProcessMemoryLimit" using win32job.SetInformationJobObject
> had no effect
> (i.e. a subsequent win32job.QueryInformationJobObject
> still showed the limit as 0)?

Probably you didn't set the JOB_OBJECT_LIMIT_PROCESS_MEMORY flag. Here's
an example that tests the process memory limit using ctypes to call
VirtualAlloc, before and after assigning the current process to the Job.

Note that the py.exe launcher runs python.exe in an anonymous Job that's
configured to kill on close (i.e. python.exe is killed when py.exe exits)
and for silent breakaway of child processes. In this case, prior to
Windows 8 (the first version to support nested Job objects), assigning
the current process to a new Job will fail, so you'll have to run
python.exe directly, or use a child process via subprocess. I prefer the
former, since a child process won't be tethered to the launcher, which
could get ugly for console applications.

import ctypes
import winerror, win32api, win32job

kernel32 = ctypes.WinDLL('kernel32', use_last_error=True)

MEM_COMMIT = 0x1000
MEM_RELEASE = 0x8000
PAGE_READWRITE = 4

kernel32.VirtualAlloc.restype = ctypes.c_void_p
kernel32.VirtualAlloc.argtypes = (ctypes.c_void_p, ctypes.c_size_t,
                                  ctypes.c_ulong, ctypes.c_ulong)
kernel32.VirtualFree.argtypes = (ctypes.c_void_p, ctypes.c_size_t,
                                 ctypes.c_ulong)

hjob = win32job.CreateJobObject(None, "")
limits = win32job.QueryInformationJobObject(hjob,
    win32job.JobObjectExtendedLimitInformation)
limits['BasicLimitInformation']['LimitFlags'] |= (
    win32job.JOB_OBJECT_LIMIT_PROCESS_MEMORY)
limits['ProcessMemoryLimit'] = 2**31
win32job.SetInformationJobObject(hjob,
    win32job.JobObjectExtendedLimitInformation, limits)

addr0 = kernel32.VirtualAlloc(None, 2**31, MEM_COMMIT, PAGE_READWRITE)
if addr0:
    mem0_released = kernel32.VirtualFree(addr0, 0, MEM_RELEASE)

win32job.AssignProcessToJobObject(hjob, win32api.GetCurrentProcess())

addr1 = kernel32.VirtualAlloc(None, 2**31, MEM_COMMIT, PAGE_READWRITE)

Result:

>>> addr0
2508252315648
>>> mem0_released
1
>>> addr1 is None
True
>>> ctypes.get_last_error() == winerror.ERROR_COMMITMENT_LIMIT
True

From victor.stinner at gmail.com Fri Oct 20 16:29:14 2017
From: victor.stinner at gmail.com (Victor Stinner)
Date: Fri, 20 Oct 2017 22:29:14 +0200
Subject: [Python-ideas] Add time.time_ns(): system clock with nanosecond resolution
In-Reply-To:
References:
Message-ID:

Hi francis,

On 20 Oct 2017 18:42, "francismb" wrote:

Hi Victor,

On 10/13/2017 04:12 PM, Victor Stinner wrote:
> I would like to add:
>
> * time.time_ns()
> * time.monotonic_ns()
> * time.perf_counter_ns()
> * time.clock_gettime_ns()
> * time.clock_settime_ns()

Given nano vs. pico vs. ..., why not something like (please change '_in'
for whatever you like):

time.time_in(precision)
time.monotonic_in(precision)


If you are not aware yet, I wrote a full PEP: PEP 564. The two discussed
ideas are already listed in the PEP: configurable precision and
sub-nanosecond resolution.
I tried to explain why I proposed time.time_ns() in detail in the PEP: https://www.python.org/dev/peps/pep-0564/#sub-nanosecond-resolution You may want to join the discussion on python-dev. Victor -------------- next part -------------- An HTML attachment was scrubbed... URL: From yanp.bugz at gmail.com Thu Oct 26 07:06:45 2017 From: yanp.bugz at gmail.com (Yan Pas) Date: Thu, 26 Oct 2017 14:06:45 +0300 Subject: [Python-ideas] Dollar operator suggestion Message-ID: I've looked up this feature in haskell. Dollar sign operator is used to avoid parentheses. Rationalle: Python tends to use functions instead of methods ( e.g. len([1,2,3]) instead of [1,2,3].len() ). Sometimes the expression inside parentheses may become big and using a lot of parentheses may tend to bad readability. I suggest the following syntax: len $ [1,2,3] Functions map be also chained: len $ list $ map(...) This operator may be used for function composition too: foo = len $ set $ in the same as foo = lambda *as,**kas : len(set(*as, **kas)) in current syntax Regards, Yan -------------- next part -------------- An HTML attachment was scrubbed... URL: From stephanh42 at gmail.com Thu Oct 26 07:45:50 2017 From: stephanh42 at gmail.com (Stephan Houben) Date: Thu, 26 Oct 2017 13:45:50 +0200 Subject: [Python-ideas] Dollar operator suggestion In-Reply-To: References: Message-ID: Why not a functional syntax, i.e. compose(f, g, h) rather than f $ g $ h Advantage: you can do it today. Without need to convince Guido to add more line noise to the language. https://gist.github.com/stephanh42/6c9158c2470832a675fad7658048be9d Stephan 2017-10-26 13:06 GMT+02:00 Yan Pas : > I've looked up this feature in haskell. Dollar sign operator is used to > avoid parentheses. > > Rationalle: > Python tends to use functions instead of methods ( e.g. len([1,2,3]) > instead of [1,2,3].len() ). Sometimes the expression inside parentheses > may become big and using a lot of parentheses may tend to bad readability. > I suggest the following syntax: > > len $ [1,2,3] > > Functions map be also chained: > > len $ list $ map(...) > > This operator may be used for function composition too: > > foo = len $ set $ > in the same as > foo = lambda *as,**kas : len(set(*as, **kas)) > in current syntax > > Regards, > Yan > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rhodri at kynesim.co.uk Thu Oct 26 07:47:44 2017 From: rhodri at kynesim.co.uk (Rhodri James) Date: Thu, 26 Oct 2017 12:47:44 +0100 Subject: [Python-ideas] Dollar operator suggestion In-Reply-To: References: Message-ID: On 26/10/17 12:06, Yan Pas wrote: > I've looked up this feature in haskell. Dollar sign operator is used to > avoid parentheses. If I understand your example correctly, it does no such thing. "A $ B" appears to mean "apply callable A to object B", at least the way you portray it below. I don't speak Haskell so I can't comment on the original. > Rationalle: > Python tends to use functions instead of methods ( e.g. len([1,2,3]) > instead of [1,2,3].len() ). Sometimes the expression inside parentheses may > become big and using a lot of parentheses may tend to bad readability. If you have that sort of legibility problem, it suggests that you are trying to do far too much on a single line. 
New syntax won't help with that (in fact it will make it worse IMHO). > I suggest the following syntax: > > len $ [1,2,3] How is this better or easier to read than "len([1,2,3])" ? What do you do for functions with two or more arguments? The obvious thing would be to make the right-hand side of the $ operator a tuple, and whoops, there are your parentheses again. I don't think this proposal achieves your aim, and I dislike it for a lot of other reasons. -- Rhodri James *-* Kynesim Ltd From tjol at tjol.eu Thu Oct 26 07:55:35 2017 From: tjol at tjol.eu (Thomas Jollans) Date: Thu, 26 Oct 2017 13:55:35 +0200 Subject: [Python-ideas] Dollar operator suggestion In-Reply-To: References: Message-ID: On 2017-10-26 13:06, Yan Pas wrote: > I've looked up this feature in haskell. Dollar sign operator is used to > avoid parentheses. > > Rationalle: > Python tends to use functions instead of methods ( e.g.len([1,2,3]) > instead of [1,2,3].len() ). Sometimes the expression inside parentheses > may become big? and using a lot of parentheses may tend to bad > readability. I suggest the following syntax: > > len $ [1,2,3] I see absolutely no benefit adding this syntax when we already have a perfectly good function calling syntax. > > Functions map be also? chained: > > len $ list $ map(...) Function composition has been discussed at length in the past, e.g. https://mail.python.org/pipermail/python-ideas/2015-May/thread.html#33287 I'd like to highlight one key message: https://mail.python.org/pipermail/python-ideas/2015-May/033491.html Guido van Rossum wrote (May 11, 2015): > As long as I'm "in charge" the chances of this (or anything like it) being > accepted into Python are zero. I get a headache when I try to understand > code that uses function composition, and I end up having to laboriously > rewrite it using more traditional call notation before I move on to > understanding what it actually does. Python is not Haskell, and perhaps > more importantly, Python users are not like Haskel users. Either way, what > may work out beautifully in Haskell will be like a fish out of water in > Python. > > I understand that it's fun to try to sole this puzzle, but evolving Python > is more than solving puzzles. Enjoy debating the puzzle, but in the end > Python will survive without the solution. That old python-ideas thread could be of interest to you. Best Thomas From dmoisset at machinalis.com Thu Oct 26 08:53:06 2017 From: dmoisset at machinalis.com (Daniel Moisset) Date: Thu, 26 Oct 2017 13:53:06 +0100 Subject: [Python-ideas] Dollar operator suggestion In-Reply-To: References: Message-ID: Expanding on the comments of the OP (to give more information, not necessarily to support or be against it): The "$" operator in Haskell is not a composition operator, it's essentially the same as python's apply builtin (the python2 builtin, removed for python 3), but with an operator syntax; the main trick behind it is that it's right associative, so you get that: len $ set $ str $ foo => len $ (set $ (str $ foo)) -> len(set(str(foo))) It looks a bit funky, and it works only reasonably with single-argument functions (which in haskell doesn't matter given that it uses currying that makes all function one argument functions with partial application). 
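To make the grouping concrete, here is a toy Python approximation (not the
gist linked further down, just an illustration that abuses ** because it is
Python's only right-associative binary operator):

class Fn:
    # Wrap a callable so that ``Fn(f) ** x`` means ``f(x)``.
    def __init__(self, func):
        self.func = func
    def __pow__(self, arg):
        return self.func(arg)

# ** groups right to left, so this evaluates as len(set(str(12345))):
print(Fn(len) ** Fn(set) ** Fn(str) ** 12345)   # -> 5

With real syntax support the wrapping would of course go away; the point is
only where the implicit parentheses end up.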
The best way to think about it for non Haskellers is that it's exactly like the "|" operators un UNIX-like shells, but with the reverse order; in UNIX, run foo, filter the output through str, then set, then len would read like: foo | str | set | len, which is the same as above but right to left. This is to clarify that this si NOT about function composition, just an alternate application syntax The second part of the example in the post, where composition is discussed actually relies in a completely orthogonal feature of Haskell that allows to define partial operator applications as functions, for example you can define: half = (/ 2) -- same as lambda x: x/2, so half 4 => 2 next = (+ 1) -- same as lambda x: x + 1, so next 7 => 8 invert = (1 /) -- same as lambda x: 1 / x, so invert 4 => 0.25 So this implies a new way of writing anonymous functions besides lambdas. To make the second part of the proposal work, both features should be present Now going into the discussion itself, the second feature is much more invasive on the syntax+implementation (and also backwards comptibility, given that stuff like "(+ 1)" already mean something different in python). The first feature by itself shouldn't break stuff, and even if different to what we're used to is not very unidiomatic (it leads to cleaner code, although its meaning is definitely not very discoverable). To get a rough idea on how that could work, take a look at https://gist.github.com/dmoisset/bd43b8c0ce26c9cff0ad297b7e1ba5f9 ; I just used python ** operator because that's the only right associative one. Something similar provided in the default function type (and at a low level) could work. I'd probably would like to see some code samples where this is applied to check that it's worth the trouble. D. On 26 October 2017 at 12:06, Yan Pas wrote: > I've looked up this feature in haskell. Dollar sign operator is used to > avoid parentheses. > > Rationalle: > Python tends to use functions instead of methods ( e.g. len([1,2,3]) > instead of [1,2,3].len() ). Sometimes the expression inside parentheses > may become big and using a lot of parentheses may tend to bad readability. > I suggest the following syntax: > > len $ [1,2,3] > > Functions map be also chained: > > len $ list $ map(...) > > This operator may be used for function composition too: > > foo = len $ set $ > in the same as > foo = lambda *as,**kas : len(set(*as, **kas)) > in current syntax > > Regards, > Yan > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > > -- Daniel F. Moisset - UK Country Manager - Machinalis Limited www.machinalis.co.uk Skype: @dmoisset T: + 44 7398 827139 1 Fore St, London, EC2Y 9DT Machinalis Limited is a company registered in England and Wales. Registered number: 10574987. -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.f.moore at gmail.com Thu Oct 26 09:32:51 2017 From: p.f.moore at gmail.com (Paul Moore) Date: Thu, 26 Oct 2017 14:32:51 +0100 Subject: [Python-ideas] Dollar operator suggestion In-Reply-To: References: Message-ID: On 26 October 2017 at 13:53, Daniel Moisset wrote: > This is to clarify that this si NOT about function composition, just an alternate > application syntax The idea is already dead, based on the quote from Guido, but this makes it even more clear that it's inappropriate for Python. 
As you said (in part that I trimmed) Haskell uses single-argument functions plus currying to implement function calls. This is extremely common for functional languages, as it matches the theoretical basis much better. As you point out, the shell pipeline model is actually quite similar (a single input, chain of processing model). Procedural languages, and Python in particular, simply don't work like that. Functions have arbitrary numbers of arguments, currying is not built in, composition is not a fundamental operation in the same way. Although it's possible to explain how a `$` style syntax would work, it doesn't fit naturally into the language - certainly not naturally enough to warrant being part of the language syntax. Paul From chris.barker at noaa.gov Thu Oct 26 12:23:05 2017 From: chris.barker at noaa.gov (Chris Barker) Date: Thu, 26 Oct 2017 09:23:05 -0700 Subject: [Python-ideas] Dollar operator suggestion In-Reply-To: References: Message-ID: On Thu, Oct 26, 2017 at 6:32 AM, Paul Moore wrote: > > Procedural languages, and Python in particular, simply don't work like > that. Functions have arbitrary numbers of arguments, And can return an arbitrary number of values. OK, technically a single tuple of values, but that does complicate the whole simple chaining thing. In short -- Python is not a functional language, even though is supports a number of functional idioms. -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Fri Oct 27 00:51:42 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Fri, 27 Oct 2017 14:51:42 +1000 Subject: [Python-ideas] Dollar operator suggestion In-Reply-To: References: Message-ID: On 27 October 2017 at 02:23, Chris Barker wrote: > > > On Thu, Oct 26, 2017 at 6:32 AM, Paul Moore wrote: > >> >> Procedural languages, and Python in particular, simply don't work like >> that. Functions have arbitrary numbers of arguments, > > > And can return an arbitrary number of values. OK, technically a single > tuple of values, but that does complicate the whole simple chaining thing. > > In short -- Python is not a functional language, even though is supports a > number of functional idioms. > https://bugs.python.org/issue1506122 has a brief discussion of the non-syntactic variant of this proposal: functools.compose(len, set, str)(foo) => -> len(set(str(foo))) The main concerns that resulted in the suggestion being rejected are: * it isn't clear to folks that aren't already familiar with FP why the call order for the composed functions should be right to left * it isn't clear why every function other than the rightmost one must accept a single positional arg * it isn't clear why every function other than the leftmost one must return a single result And it doesn't make sense to provide something more general, because if you're writing genuinely functional code, you do tend to abide by those restrictions. So given that our position is "We don't even want to add this to the standard library, because the learning curve for using it successfully is too steep", it's even less likely we'd be willing to add syntax for the operation. 
By contrast, "FP-for-Python" libraries like toolz [1] can make familiarity with those kinds of concepts and a willingness to abide by the related conventions a pre-requisite for using them. It's just opt-in, the same way that learning to define your own classes (rather than importing existing ones defined elsewhere) is opt-in. Cheers, Nick. [1] https://toolz.readthedocs.io/en/latest/ -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia -------------- next part -------------- An HTML attachment was scrubbed... URL: From mmueller at python-academy.de Fri Oct 27 01:46:28 2017 From: mmueller at python-academy.de (=?UTF-8?Q?Mike_M=c3=bcller?=) Date: Fri, 27 Oct 2017 07:46:28 +0200 Subject: [Python-ideas] Dollar operator suggestion In-Reply-To: References: Message-ID: <56ffad60-947f-bb2d-abe1-47126e656c68@python-academy.de> This already exists in Coconut: http://coconut.readthedocs.io/en/master/HELP.html#function-composition >From http://coconut-lang.org/: > Coconut is a functional programming language that compiles to Python. > Since all valid Python is valid Coconut, using Coconut will only extend > and enhance what you're already capable of in Python. Mike Am 26.10.17 um 13:06 schrieb Yan Pas: > I've looked up this feature in haskell. Dollar sign operator is used to avoid > parentheses. > > Rationalle: > Python tends to use functions instead of methods ( e.g.len([1,2,3]) instead of > [1,2,3].len() ). Sometimes the expression inside parentheses may become big? > and using a lot of parentheses may tend to bad readability. I suggest the > following syntax: > > len $ [1,2,3] > > Functions map be also? chained: > > len $ list $ map(...) > > This operator may be used for function composition too: > > foo = len $ set $ > in the same as > foo = lambda *as,**kas : len(set(*as, **kas)) > in current syntax > > Regards, > Yan > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > From barry at python.org Fri Oct 27 15:36:07 2017 From: barry at python.org (Barry Warsaw) Date: Fri, 27 Oct 2017 15:36:07 -0400 Subject: [Python-ideas] PEP Post-History Message-ID: <61E959D3-ECEB-4897-B3F6-3D04D8F52E90@python.org> We?ve made a small change to the PEP process which may affect readers of python-list and python-ideas, so I?d like to inform you of it. This change was made to PEP 1 and PEP 12. PEPs must have a Post-History header which records the dates at which the PEP is posted to mailing lists, in order to keep the general Python community in the loop as a PEP moves to final status. Until now, PEPs in development were supposed to be posted at least to python-dev and optionally to python-list[1]. This guideline predated the creation of the python-ideas mailing list. We?ve now changed this guideline so that Post-History will record the dates at which the PEP is posted to python-dev and optionally python-ideas. python-list is dropped from this requirement. python-dev is always the primary mailing list of record for Python development, and PEPs under development should be posted to python-dev as appropriate. python-ideas is the list for discussion of more speculative changes to Python, and it?s often where more complex PEPs, and even proto-PEPs are first raised and their concepts are hashed out. As such, it makes more sense to change the guideline to include python-ideas and/or python-dev. 
In the effort to keep the forums of record to a manageable number, python-list is dropped. If you have been watching for new PEPs to be posted to python-list, you are invited to follow either python-dev or python-ideas. Cheers, -Barry (on behalf of the Python development community) https://mail.python.org/mailman/listinfo/python-dev https://mail.python.org/mailman/listinfo/python-ideas Both python-dev and python-ideas are available via Gmane. [1] PEPs may have a Discussions-To header which changes the list of forums where the PEP is discussed. In that case, Post-History records the dates that the PEP is posted to those forums. See PEP 1 for details. -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: Message signed with OpenPGP URL: From kulakov.ilya at gmail.com Fri Oct 27 16:59:01 2017 From: kulakov.ilya at gmail.com (Ilya Kulakov) Date: Fri, 27 Oct 2017 13:59:01 -0700 Subject: [Python-ideas] Thread.__init__ should call super() Message-ID: <290F93FE-4E4D-463C-9AB3-F42EB9874EF7@gmail.com> Since one of the legit use-cases of using the Thread class is subclassing, I think it's __init__ should call super() to support cooperative inheritance. Or perhaps there is a good reason for not doing so? Best Regards, Ilya Kulakov -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Fri Oct 27 19:27:55 2017 From: guido at python.org (Guido van Rossum) Date: Fri, 27 Oct 2017 16:27:55 -0700 Subject: [Python-ideas] Thread.__init__ should call super() In-Reply-To: <290F93FE-4E4D-463C-9AB3-F42EB9874EF7@gmail.com> References: <290F93FE-4E4D-463C-9AB3-F42EB9874EF7@gmail.com> Message-ID: You can subclass Thread just fine, you just can't have it in a multiple inheritance hierarchy except at the end of the MRO (before object). That shouldn't stop you from doing anything you want though -- you can define e.g. class MyThread(Thread): def __init__(self, *args, **kwds): Thread.__init__(self, *args, **kwds) super(Thread, self).__init__(*args, **kwds) and use this class instead of Thread everywhere. (You'll have to decide which arguments to pass on and which ones to ignore, but that's not specific to the issue of Thread.) Of course you're probably better off not trying to be so clever. On Fri, Oct 27, 2017 at 1:59 PM, Ilya Kulakov wrote: > Since one of the legit use-cases of using the Thread class is subclassing, > I think it's __init__ should call super() to support cooperative > inheritance. > > Or perhaps there is a good reason for not doing so? > > Best Regards, > Ilya Kulakov > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From steve at pearwood.info Fri Oct 27 19:21:53 2017 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 28 Oct 2017 10:21:53 +1100 Subject: [Python-ideas] Thread.__init__ should call super() In-Reply-To: <290F93FE-4E4D-463C-9AB3-F42EB9874EF7@gmail.com> References: <290F93FE-4E4D-463C-9AB3-F42EB9874EF7@gmail.com> Message-ID: <20171027232152.GR9068@ando.pearwood.info> On Fri, Oct 27, 2017 at 01:59:01PM -0700, Ilya Kulakov wrote: > Since one of the legit use-cases of using the Thread class is subclassing, > I think it's __init__ should call super() to support cooperative inheritance. > > Or perhaps there is a good reason for not doing so? Are you talking about threading.Thread or some other Thread? If you are talking about threading.Thread, its only superclass is object, so why bother calling super().__init__? To be successful, it would need to strip out all the parameters and just call: super().__init__() with no args, as object.__init__() takes no parameters. And that does nothing, so what's the point? I'm afraid I don't see why you think that threading.Thread needs to call super. Can you explain? -- Steve From mistersheik at gmail.com Sat Oct 28 03:21:53 2017 From: mistersheik at gmail.com (Neil Girdhar) Date: Sat, 28 Oct 2017 00:21:53 -0700 (PDT) Subject: [Python-ideas] Thread.__init__ should call super() In-Reply-To: References: <290F93FE-4E4D-463C-9AB3-F42EB9874EF7@gmail.com> Message-ID: Out of curiosity, what is the benefit of not calling super from Thread.__init__? On Friday, October 27, 2017 at 7:29:17 PM UTC-4, Guido van Rossum wrote: > > You can subclass Thread just fine, you just can't have it in a multiple > inheritance hierarchy except at the end of the MRO (before object). That > shouldn't stop you from doing anything you want though -- you can define > e.g. > > class MyThread(Thread): > def __init__(self, *args, **kwds): > Thread.__init__(self, *args, **kwds) > super(Thread, self).__init__(*args, **kwds) > > and use this class instead of Thread everywhere. (You'll have to decide > which arguments to pass on and which ones to ignore, but that's not > specific to the issue of Thread.) > > Of course you're probably better off not trying to be so clever. > > On Fri, Oct 27, 2017 at 1:59 PM, Ilya Kulakov > wrote: > >> Since one of the legit use-cases of using the Thread class is subclassing, >> I think it's __init__ should call super() to support cooperative >> inheritance. >> >> Or perhaps there is a good reason for not doing so? >> >> Best Regards, >> Ilya Kulakov >> >> >> _______________________________________________ >> Python-ideas mailing list >> Python... at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ >> >> > > > -- > --Guido van Rossum (python.org/~guido) > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From mistersheik at gmail.com Sat Oct 28 03:14:31 2017 From: mistersheik at gmail.com (Neil Girdhar) Date: Sat, 28 Oct 2017 00:14:31 -0700 (PDT) Subject: [Python-ideas] Thread.__init__ should call super() In-Reply-To: <20171027232152.GR9068@ando.pearwood.info> References: <290F93FE-4E4D-463C-9AB3-F42EB9874EF7@gmail.com> <20171027232152.GR9068@ando.pearwood.info> Message-ID: On Friday, October 27, 2017 at 8:05:17 PM UTC-4, Steven D'Aprano wrote: > > On Fri, Oct 27, 2017 at 01:59:01PM -0700, Ilya Kulakov wrote: > > > Since one of the legit use-cases of using the Thread class is > subclassing, > > I think it's __init__ should call super() to support cooperative > inheritance. > > > > Or perhaps there is a good reason for not doing so? > > Are you talking about threading.Thread or some other Thread? > > If you are talking about threading.Thread, its only superclass is > object, so why bother calling super().__init__? > The way cooperative multiple inheritance works is that if someone defines class SomeClass(Thread): def __init__(self, **kwargs): super().__init() they expect this will initialize the base class Thread as desired. Now, if they add another base class: class SomeBase: def __init__(self, base_x): self.base_x = base_x then they need to pass up the arguments: class SomeClass(SomeBase, Thread): def __init__(self, **kwargs): super().__init(**kwargs) Unfortunately, if the order of base classes is reversed, this no longer works because Thread doesn't call super: class SomeClass(Thread, SomeBase): def __init__(self, **kwargs): super().__init(**kwargs) # SomeBase is not initialized! As things get more complicated it's not always possible to ensure that Thread is the last class in the inheritance, e.g., if there are two classes like Thread that don't call super. > To be successful, it would need to strip out all the parameters and just > call: > > super().__init__() > > with no args, as object.__init__() takes no parameters. And that does > nothing, so what's the point? > > I'm afraid I don't see why you think that threading.Thread needs to call > super. Can you explain? > > > -- > Steve > _______________________________________________ > Python-ideas mailing list > Python... at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mistersheik at gmail.com Sat Oct 28 03:20:24 2017 From: mistersheik at gmail.com (Neil Girdhar) Date: Sat, 28 Oct 2017 00:20:24 -0700 (PDT) Subject: [Python-ideas] Thread.__init__ should call super() In-Reply-To: References: <290F93FE-4E4D-463C-9AB3-F42EB9874EF7@gmail.com> <20171027232152.GR9068@ando.pearwood.info> Message-ID: <8ec9c185-c1e7-400c-bbca-397eed43d20d@googlegroups.com> I meant: class SomeBase: def __init__(self, base_x, **kwargs): super().__init__(**kwargs) self.base_x = base_x On Saturday, October 28, 2017 at 3:14:31 AM UTC-4, Neil Girdhar wrote: > > > > On Friday, October 27, 2017 at 8:05:17 PM UTC-4, Steven D'Aprano wrote: >> >> On Fri, Oct 27, 2017 at 01:59:01PM -0700, Ilya Kulakov wrote: >> >> > Since one of the legit use-cases of using the Thread class is >> subclassing, >> > I think it's __init__ should call super() to support cooperative >> inheritance. >> > >> > Or perhaps there is a good reason for not doing so? >> >> Are you talking about threading.Thread or some other Thread? 
>> >> If you are talking about threading.Thread, its only superclass is >> object, so why bother calling super().__init__? >> > > The way cooperative multiple inheritance works is that if someone defines > > class SomeClass(Thread): > > def __init__(self, **kwargs): > super().__init() > > they expect this will initialize the base class Thread as desired. > > Now, if they add another base class: > > class SomeBase: > > def __init__(self, base_x): > self.base_x = base_x > > then they need to pass up the arguments: > > class SomeClass(SomeBase, Thread): > > def __init__(self, **kwargs): > super().__init(**kwargs) > > Unfortunately, if the order of base classes is reversed, this no longer > works because Thread doesn't call super: > > class SomeClass(Thread, SomeBase): > > def __init__(self, **kwargs): > super().__init(**kwargs) # SomeBase is not initialized! > > As things get more complicated it's not always possible to ensure that > Thread is the last class in the inheritance, e.g., if there are two classes > like Thread that don't call super. > > >> To be successful, it would need to strip out all the parameters and just >> call: >> >> super().__init__() >> >> with no args, as object.__init__() takes no parameters. And that does >> nothing, so what's the point? >> >> I'm afraid I don't see why you think that threading.Thread needs to call >> super. Can you explain? >> >> >> -- >> Steve >> _______________________________________________ >> Python-ideas mailing list >> Python... at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From fakedme+py at gmail.com Sat Oct 28 07:09:30 2017 From: fakedme+py at gmail.com (Soni L.) Date: Sat, 28 Oct 2017 09:09:30 -0200 Subject: [Python-ideas] Composition over Inheritance Message-ID: <98d527b2-6263-feae-05b4-67cc93bfa360@gmail.com> As recent threads indicate, composition may sometimes be better than inheritance. And so I'd like to propose composition as a built-in feature. My idea is syntax of the form o.[c].m(), where o is an object, c is a component, m is a method. I am not sure how you'd set components, or test for components, but I don't think it makes sense to be able to do o.[c][x] or x=o.[c], because those break the idea of automatically passing the object as an argument (unless we create whole wrapper objects every time the syntax is used, and that's a bit ew. also how would you handle o.[c1].[c2] ?). Thus I think o.[c].m() should be syntax sugar for o[c].m(o), with o being evaluated only once, as that solves a lot of current issues relating to inheritance while introducing very few issues relating to python's "everything is separate" (e.g. __get__ vs __getattr__) policy.This also makes setting components and testing for components fairly trivial, and completely avoids the issues mentioned above by making their syntax illegal. (Disclaimer: This was inspired by my own programming language, Cratera[1], so I'm a bit biased here. Cratera was, in turn, inspired by Rust[2] traits. Note however that the original plans for Cratera were far more flexible, including allowing the "problematic" o.[c1].[c2].m() and o.[c][x].m(). I can go into more detail on how those should work, if wanted, but implementation-wise it's not looking good.) 
[1] https://bitbucket.org/TeamSoni/cratera [2] https://www.rust-lang.org/ From steve at pearwood.info Sat Oct 28 07:14:13 2017 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 28 Oct 2017 22:14:13 +1100 Subject: [Python-ideas] Thread.__init__ should call super() In-Reply-To: References: <290F93FE-4E4D-463C-9AB3-F42EB9874EF7@gmail.com> <20171027232152.GR9068@ando.pearwood.info> Message-ID: <20171028111413.GT9068@ando.pearwood.info> On Sat, Oct 28, 2017 at 12:14:31AM -0700, Neil Girdhar wrote: > > > On Friday, October 27, 2017 at 8:05:17 PM UTC-4, Steven D'Aprano wrote: > > > > On Fri, Oct 27, 2017 at 01:59:01PM -0700, Ilya Kulakov wrote: > > > > > Since one of the legit use-cases of using the Thread class is > > subclassing, > > > I think it's __init__ should call super() to support cooperative > > inheritance. > > > > > > Or perhaps there is a good reason for not doing so? > > > > Are you talking about threading.Thread or some other Thread? > > > > If you are talking about threading.Thread, its only superclass is > > object, so why bother calling super().__init__? > > > > The way cooperative multiple inheritance works is that if someone defines I didn't realise that Ilya was talking about *multiple* inheritance, since he didn't use the word, only "cooperative". But since you are talking about multiple inheritence: > class SomeClass(Thread): > def __init__(self, **kwargs): > super().__init() > > they expect this will initialize the base class Thread as desired. That won't work, since you misspelled __init__ and neglected to pass any arguments :-) You need: super().__init__(**kwargs) otherwise Thread will not be initialised. > Now, if they add another base class: > > class SomeBase: > def __init__(self, base_x): > self.base_x = base_x > > then they need to pass up the arguments: > > class SomeClass(SomeBase, Thread): > def __init__(self, **kwargs): > super().__init(**kwargs) That's not going to work either, because you're passing arguments to SomeBase that it doesn't understand: all the args that Thread expects. And of course its going to doubly not work, since SomeBase fails to call super, so Thread.__init__ still doesn't get called. If you fix all those problems, you still have another problem: in Thread, if you call super().__init__(**kwargs) then object.__init__ will fail, as I mentioned in my earlier post; but if you call: super().__init__() then object is satisfied, but SomeBase.__init__ gets called with no arguments. You can't satisfy both at the same time with a single call to super. > Unfortunately, if the order of base classes is reversed, this no longer > works because Thread doesn't call super: > > class SomeClass(Thread, SomeBase): > def __init__(self, **kwargs): > super().__init(**kwargs) # SomeBase is not initialized! > > As things get more complicated it's not always possible to ensure that > Thread is the last class in the inheritance, e.g., if there are two classes > like Thread that don't call super. You can't just add classes willy-nilly into the superclass list. That's why it is called *cooperative* multiple inheritence: the superclasses all have to be designed to work together. You *can't* have two classes that don't call super -- and you must have one class just ahead of object, to prevent object from receiving args it can't do anything with. And that class might as well be Thread. At least, I can't think of any reason why it shouldn't be Thread. 
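For concreteness, here is a minimal sketch of what "designed to work together" means in practice. SpamBase and EggBase are made-up names, and this is just the standard cooperative pattern, nothing specific to threading.Thread:

    class SpamBase:
        def __init__(self, spam, **kwargs):
            super().__init__(**kwargs)   # pass everything we don't use along the MRO
            self.spam = spam

    class EggBase:
        def __init__(self, eggs, **kwargs):
            super().__init__(**kwargs)   # by the time object.__init__ runs, kwargs is empty
            self.eggs = eggs

    class Combined(SpamBase, EggBase):
        pass

    c = Combined(spam=1, eggs=2)   # both __init__ methods run, in MRO order
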
-- Steve From steve at pearwood.info Sat Oct 28 07:51:14 2017 From: steve at pearwood.info (Steven D'Aprano) Date: Sat, 28 Oct 2017 22:51:14 +1100 Subject: [Python-ideas] Composition over Inheritance In-Reply-To: <98d527b2-6263-feae-05b4-67cc93bfa360@gmail.com> References: <98d527b2-6263-feae-05b4-67cc93bfa360@gmail.com> Message-ID: <20171028115111.GU9068@ando.pearwood.info> On Sat, Oct 28, 2017 at 09:09:30AM -0200, Soni L. wrote: > As recent threads indicate, composition may sometimes be better than > inheritance. And so I'd like to propose composition as a built-in feature. > > My idea is syntax of the form o.[c].m(), where o is an object, c is a > component, m is a method. How is that different from o.c.m() which works today? My understanding of composition is simply setting an attribute of your instance to the component, then calling methods on the attribute. How does that differ from what you are describing? Instead of the classic multiple-inheritence: class Car(Engine, AcceleratorPedal, GasTank, ...): pass which requires each superclass to be designed to work with each other (e.g. you can't have both EntertainmentSystem.start() and Ignition.start(), unless you want the ignition to automatically turn on when the entertainment system does) we can instead use composition and delegation: class Car: def __init__(self): self.engine = Engine() self.accelerator = AcceleratorPedal() ... def start(self): # Delegate to the ignition component. self.ignition.start() etc. Obviously this is just a very loose sketch, don't take it too literally. Is this the sort of thing you are talking about? > I am not sure how you'd set components, or test for components, If you don't know how to set components, or test for them, what do you know how to do with components? And how are components different from attributes? > but I don't think it makes sense to be able to do o.[c][x] or x=o.[c], because > those break the idea of automatically passing the object as an argument > (unless we create whole wrapper objects every time the syntax is used, > and that's a bit ew. also how would you handle o.[c1].[c2] ?). I'm afraid I do not understand what you are talking about here. If might help if you give a concrete example, with meaningful names. It would help even better if you can contrast the way we do composition now with the way you think we should do it. I'm afraid that at the moment I'm parsing your post as: "composition is cool, we should use it; and o.[c].m() is cool syntax, we should use it for composition; I'll leave the details to others". > Thus I think o.[c].m() should be syntax sugar for o[c].m(o), with o > being evaluated only once, I don't see why you're using __getitem__ instead of attribute access; nor do I understand why m gets o as argument instead of c. Wait... is this something to do with Lieberman-style delegation? http://web.media.mit.edu/~lieber/Lieberary/OOP/Delegation/Delegation.html http://code.activestate.com/recipes/519639-true-lieberman-style-delegation-in-python/ > as that solves a lot of current issues > relating to inheritance while introducing very few issues relating to > python's "everything is separate" (e.g. __get__ vs __getattr__) > policy.This also makes setting components and testing for components > fairly trivial, and completely avoids the issues mentioned above by > making their syntax illegal. Above you said that you don't know how to set and test for components, now you say that doing so is trivial. Which is it? 
> (Disclaimer: This was inspired by my own programming language, > Cratera[1], so I'm a bit biased here. Cratera was, in turn, inspired by > Rust[2] traits. Traits are normally considered to be a more restricted, safer form of multiple inheritence, similar to mixins but even more restrictive. http://www.artima.com/weblogs/viewpost.jsp?thread=246488 > [1] https://bitbucket.org/TeamSoni/cratera > [2] https://www.rust-lang.org/ -- Steve From mistersheik at gmail.com Sat Oct 28 07:56:28 2017 From: mistersheik at gmail.com (Neil Girdhar) Date: Sat, 28 Oct 2017 11:56:28 +0000 Subject: [Python-ideas] Thread.__init__ should call super() In-Reply-To: <20171028111413.GT9068@ando.pearwood.info> References: <290F93FE-4E4D-463C-9AB3-F42EB9874EF7@gmail.com> <20171027232152.GR9068@ando.pearwood.info> <20171028111413.GT9068@ando.pearwood.info> Message-ID: On Sat, Oct 28, 2017 at 7:15 AM Steven D'Aprano wrote: > On Sat, Oct 28, 2017 at 12:14:31AM -0700, Neil Girdhar wrote: > > > > > > On Friday, October 27, 2017 at 8:05:17 PM UTC-4, Steven D'Aprano wrote: > > > > > > On Fri, Oct 27, 2017 at 01:59:01PM -0700, Ilya Kulakov wrote: > > > > > > > Since one of the legit use-cases of using the Thread class is > > > subclassing, > > > > I think it's __init__ should call super() to support cooperative > > > inheritance. > > > > > > > > Or perhaps there is a good reason for not doing so? > > > > > > Are you talking about threading.Thread or some other Thread? > > > > > > If you are talking about threading.Thread, its only superclass is > > > object, so why bother calling super().__init__? > > > > > > > The way cooperative multiple inheritance works is that if someone defines > > I didn't realise that Ilya was talking about *multiple* inheritance, > since he didn't use the word, only "cooperative". But since you > are talking about multiple inheritence: > > > > class SomeClass(Thread): > > def __init__(self, **kwargs): > > super().__init() > > > > they expect this will initialize the base class Thread as desired. > > That won't work, since you misspelled __init__ and neglected to pass any > arguments :-) You need: > > super().__init__(**kwargs) > > otherwise Thread will not be initialised. > > (I corrected myself right after.) > > > Now, if they add another base class: > > > > class SomeBase: > > def __init__(self, base_x): > > self.base_x = base_x > > > > then they need to pass up the arguments: > > > > class SomeClass(SomeBase, Thread): > > def __init__(self, **kwargs): > > super().__init(**kwargs) > > That's not going to work either, because you're passing arguments to > SomeBase that it doesn't understand: all the args that Thread expects. > > That's totally fine. That's how cooperative multiple inheritance works in Python. SomeBase is supposed to pass along everything to its superclass init through kwargs just like I illustrated in my corrected code. > And of course its going to doubly not work, since SomeBase fails to call > super, so Thread.__init__ still doesn't get called. > > If you fix all those problems, you still have another problem: in > Thread, if you call > > super().__init__(**kwargs) > > then object.__init__ will fail, as I mentioned in my earlier post; but > if you call: > super().__init__() > > then object is satisfied, but SomeBase.__init__ gets called with no > arguments. You can't satisfy both at the same time with a single call > to super. > No, this works fine. The idea is that each class consumes the keyword arguments it wants and passes along the rest. 
By the time you get to object, there are none left and object.__init__ doesn't complain. If there are extra arguments, then object.__init__ raises. > > > Unfortunately, if the order of base classes is reversed, this no longer > > works because Thread doesn't call super: > > > > class SomeClass(Thread, SomeBase): > > def __init__(self, **kwargs): > > super().__init(**kwargs) # SomeBase is not initialized! > > > > As things get more complicated it's not always possible to ensure that > > Thread is the last class in the inheritance, e.g., if there are two > classes > > like Thread that don't call super. > > You can't just add classes willy-nilly into the superclass list. That's > why it is called *cooperative* multiple inheritence: the superclasses > all have to be designed to work together. You *can't* have two classes > that don't call super -- and you must have one class just ahead of > object, to prevent object from receiving args it can't do anything with. You don't need "one class just ahead of object". Every class calls super passing along its arguments via kwargs. And it's very hard when combining various mixins to ensure that a given order is maintained. > And that class might as well be Thread. At least, I can't think of any > reason why it shouldn't be Thread. > I'm sorry, but I don't agree with this. Unfortunately, there are some various oversights in Python when it comes to cooperative multiple inheritance. This is only one of them that I've also run into. > > > -- > Steve > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > > -- > > --- > You received this message because you are subscribed to a topic in the > Google Groups "python-ideas" group. > To unsubscribe from this topic, visit > https://groups.google.com/d/topic/python-ideas/mgHYhQKAbdo/unsubscribe. > To unsubscribe from this group and all its topics, send an email to > python-ideas+unsubscribe at googlegroups.com. > For more options, visit https://groups.google.com/d/optout. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From fakedme+py at gmail.com Sat Oct 28 08:19:09 2017 From: fakedme+py at gmail.com (Soni L.) Date: Sat, 28 Oct 2017 10:19:09 -0200 Subject: [Python-ideas] Composition over Inheritance In-Reply-To: <20171028115111.GU9068@ando.pearwood.info> References: <98d527b2-6263-feae-05b4-67cc93bfa360@gmail.com> <20171028115111.GU9068@ando.pearwood.info> Message-ID: <26abc818-bd7d-8932-8b85-d8f879c9a1e0@gmail.com> On 2017-10-28 09:51 AM, Steven D'Aprano wrote: > On Sat, Oct 28, 2017 at 09:09:30AM -0200, Soni L. wrote: >> As recent threads indicate, composition may sometimes be better than >> inheritance. And so I'd like to propose composition as a built-in feature. >> >> My idea is syntax of the form o.[c].m(), where o is an object, c is a >> component, m is a method. > How is that different from o.c.m() which works today? > > My understanding of composition is simply setting an attribute of your > instance to the component, then calling methods on the attribute. How > does that differ from what you are describing? > > Instead of the classic multiple-inheritence: > > > class Car(Engine, AcceleratorPedal, GasTank, ...): > pass > > which requires each superclass to be designed to work with each other > > (e.g. 
you can't have both EntertainmentSystem.start() and > Ignition.start(), unless you want the ignition to automatically turn on > when the entertainment system does) > > we can instead use composition and delegation: > > class Car: > def __init__(self): > self.engine = Engine() > self.accelerator = AcceleratorPedal() > ... > > def start(self): > # Delegate to the ignition component. > self.ignition.start() > > > etc. Obviously this is just a very loose sketch, don't take it too > literally. Is this the sort of thing you are talking about? So how do you call car.ignition.start() from car.key.turn()? > > >> I am not sure how you'd set components, or test for components, > If you don't know how to set components, or test for them, what do you > know how to do with components? > > And how are components different from attributes? They're more like conflict-free interfaces, and in this specific case they're designed with duck typing in mind. (You can dynamically add and remove components, and use whatever you want as the component. You cannot do that with inheritance.) > > >> but I don't think it makes sense to be able to do o.[c][x] or x=o.[c], because >> those break the idea of automatically passing the object as an argument >> (unless we create whole wrapper objects every time the syntax is used, >> and that's a bit ew. also how would you handle o.[c1].[c2] ?). > I'm afraid I do not understand what you are talking about here. > > If might help if you give a concrete example, with meaningful names. It > would help even better if you can contrast the way we do composition now > with the way you think we should do it. > > I'm afraid that at the moment I'm parsing your post as: > > "composition is cool, we should use it; and o.[c].m() is cool syntax, we > should use it for composition; I'll leave the details to others". Again, how do you call car.ignition.start() from car.key.turn()? > > >> Thus I think o.[c].m() should be syntax sugar for o[c].m(o), with o >> being evaluated only once, > I don't see why you're using __getitem__ instead of attribute access; > nor do I understand why m gets o as argument instead of c. > > Wait... is this something to do with Lieberman-style delegation? > > http://web.media.mit.edu/~lieber/Lieberary/OOP/Delegation/Delegation.html > > http://code.activestate.com/recipes/519639-true-lieberman-style-delegation-in-python/ > TL;DR. But no, it's not some form of delegation. It still gets `self` (which is whatever is in o[c] - which may be c itself, or an arbitrary object that fulfills the contract defined by c), but also gets `o` in addition to `self`. (Unless it's a plain function, in which case it gets no `self`.) >> as that solves a lot of current issues >> relating to inheritance while introducing very few issues relating to >> python's "everything is separate" (e.g. __get__ vs __getattr__) >> policy.This also makes setting components and testing for components >> fairly trivial, and completely avoids the issues mentioned above by >> making their syntax illegal. > Above you said that you don't know how to set and test for components, > now you say that doing so is trivial. Which is it? If you pay closer attention, you'll notice the two different paragraphs talk about two different syntaxes. - o.[c] as a standalone syntax element, allowing things like x=o.[c1].[c2]; and x=o.[c1][c2];. - o.[c].m() as a standalone syntax element, *disallowing* the above. > > >> (Disclaimer: This was inspired by my own programming language, >> Cratera[1], so I'm a bit biased here. 
Cratera was, in turn, inspired by >> Rust[2] traits. > Traits are normally considered to be a more restricted, safer form of > multiple inheritence, similar to mixins but even more restrictive. What do you mean more restricted? They let you have the same method in multiple components/traits and not have them conflict, among other things. My variant also makes them dynamic and ducky, making them even more relaxed. Definitely (still) safer tho. > > http://www.artima.com/weblogs/viewpost.jsp?thread=246488 > >> [1] https://bitbucket.org/TeamSoni/cratera >> [2] https://www.rust-lang.org/ > From tjol at tjol.eu Sat Oct 28 08:50:43 2017 From: tjol at tjol.eu (Thomas Jollans) Date: Sat, 28 Oct 2017 14:50:43 +0200 Subject: [Python-ideas] Composition over Inheritance In-Reply-To: <26abc818-bd7d-8932-8b85-d8f879c9a1e0@gmail.com> References: <98d527b2-6263-feae-05b4-67cc93bfa360@gmail.com> <20171028115111.GU9068@ando.pearwood.info> <26abc818-bd7d-8932-8b85-d8f879c9a1e0@gmail.com> Message-ID: <4567b19c-c066-6a1c-9925-b0bc1cb0fa66@tjol.eu> On 28/10/17 14:19, Soni L. wrote: > > > On 2017-10-28 09:51 AM, Steven D'Aprano wrote: >> On Sat, Oct 28, 2017 at 09:09:30AM -0200, Soni L. wrote: >>> As recent threads indicate, composition may sometimes be better than >>> inheritance. And so I'd like to propose composition as a built-in >>> feature. >>> >>> My idea is syntax of the form o.[c].m(), where o is an object, c is a >>> component, m is a method. >> How is that different from o.c.m() which works today? >> >> My understanding of composition is simply setting an attribute of your >> instance to the component, then calling methods on the attribute. How >> does that differ from what you are describing? >> >> Instead of the classic multiple-inheritence: >> >> >> class Car(Engine, AcceleratorPedal, GasTank, ...): >> pass >> >> which requires each superclass to be designed to work with each other >> >> (e.g. you can't have both EntertainmentSystem.start() and >> Ignition.start(), unless you want the ignition to automatically turn on >> when the entertainment system does) >> >> we can instead use composition and delegation: >> >> class Car: >> def __init__(self): >> self.engine = Engine() >> self.accelerator = AcceleratorPedal() >> ... >> >> def start(self): >> # Delegate to the ignition component. >> self.ignition.start() >> >> >> etc. Obviously this is just a very loose sketch, don't take it too >> literally. Is this the sort of thing you are talking about? > > So how do you call car.ignition.start() from car.key.turn()? self.car.ignition.start() of course. If the key has to do something involving the car, it has to know about the car, so tell it about the car: class Car: def __init__(self): self.engine = Engine() self.accelerator = AcceleratorPedal(self.engine) self.ignition = Ignition(self) self.key = Key(self) # and so on. FWIW I haven't the faintest idea what you're talking about. Please provide an example that shows how you might create a "component" and use it. Ideally, comparing it with an example of how you would currently so the same thing in Python. > >> >> >>> I am not sure how you'd set components, or test for components, >> If you don't know how to set components, or test for them, what do you >> know how to do with components? >> >> And how are components different from attributes? > > They're more like conflict-free interfaces, and in this specific case > they're designed with duck typing in mind. (You can dynamically add and > remove components, and use whatever you want as the component. 
You > cannot do that with inheritance.) > >> >> >>> but I don't think it makes sense to be able to do o.[c][x] or >>> x=o.[c], because >>> those break the idea of automatically passing the object as an argument >>> (unless we create whole wrapper objects every time the syntax is used, >>> and that's a bit ew. also how would you handle o.[c1].[c2] ?). >> I'm afraid I do not understand what you are talking about here. >> >> If might help if you give a concrete example, with meaningful names. It >> would help even better if you can contrast the way we do composition now >> with the way you think we should do it. >> >> I'm afraid that at the moment I'm parsing your post as: >> >> "composition is cool, we should use it; and o.[c].m() is cool syntax, we >> should use it for composition; I'll leave the details to others". > > Again, how do you call car.ignition.start() from car.key.turn()? > >> >> >>> Thus I think o.[c].m() should be syntax sugar for o[c].m(o), with o >>> being evaluated only once, >> I don't see why you're using __getitem__ instead of attribute access; >> nor do I understand why m gets o as argument instead of c. >> >> Wait... is this something to do with Lieberman-style delegation? >> >> http://web.media.mit.edu/~lieber/Lieberary/OOP/Delegation/Delegation.html >> >> http://code.activestate.com/recipes/519639-true-lieberman-style-delegation-in-python/ >> >> > > TL;DR. But no, it's not some form of delegation. > > It still gets `self` (which is whatever is in o[c] - which may be c > itself, or an arbitrary object that fulfills the contract defined by c), > but also gets `o` in addition to `self`. (Unless it's a plain function, > in which case it gets no `self`.) > >>> as that solves a lot of current issues >>> relating to inheritance while introducing very few issues relating to >>> python's "everything is separate" (e.g. __get__ vs __getattr__) >>> policy.This also makes setting components and testing for components >>> fairly trivial, and completely avoids the issues mentioned above by >>> making their syntax illegal. >> Above you said that you don't know how to set and test for components, >> now you say that doing so is trivial. Which is it? > > If you pay closer attention, you'll notice the two different paragraphs > talk about two different syntaxes. > > - o.[c] as a standalone syntax element, allowing things like > x=o.[c1].[c2]; and x=o.[c1][c2];. > - o.[c].m() as a standalone syntax element, *disallowing* the above. > >> >> >>> (Disclaimer: This was inspired by my own programming language, >>> Cratera[1], so I'm a bit biased here. Cratera was, in turn, inspired by >>> Rust[2] traits. >> Traits are normally considered to be a more restricted, safer form of >> multiple inheritence, similar to mixins but even more restrictive. > > What do you mean more restricted? They let you have the same method in > multiple components/traits and not have them conflict, among other > things. My variant also makes them dynamic and ducky, making them even > more relaxed. Definitely (still) safer tho. 
> >> >> http://www.artima.com/weblogs/viewpost.jsp?thread=246488 >> >>> [1] https://bitbucket.org/TeamSoni/cratera >>> [2] https://www.rust-lang.org/ >> > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From steve at pearwood.info Sat Oct 28 12:51:37 2017 From: steve at pearwood.info (Steven D'Aprano) Date: Sun, 29 Oct 2017 03:51:37 +1100 Subject: [Python-ideas] Composition over Inheritance In-Reply-To: <26abc818-bd7d-8932-8b85-d8f879c9a1e0@gmail.com> References: <98d527b2-6263-feae-05b4-67cc93bfa360@gmail.com> <20171028115111.GU9068@ando.pearwood.info> <26abc818-bd7d-8932-8b85-d8f879c9a1e0@gmail.com> Message-ID: <20171028165135.GV9068@ando.pearwood.info> On Sat, Oct 28, 2017 at 10:19:09AM -0200, Soni L. wrote: > >class Car: > > def __init__(self): > > self.engine = Engine() > > self.accelerator = AcceleratorPedal() > > ... > > > > def start(self): > > # Delegate to the ignition component. > > self.ignition.start() > > > > > >etc. Obviously this is just a very loose sketch, don't take it too > >literally. Is this the sort of thing you are talking about? > > So how do you call car.ignition.start() from car.key.turn()? You don't -- the key is not a component of the car, its an argument of ignition.start. If the key doesn't fit, ignition.start() raises an exception and the car doesn't start. I'm not really interested in getting into a tedious debate over the best way to design a Car object. As I said, the above is just a loose sketch illustrating composition, not a carefully planned and debugged object. The aim here is to understand your proposal: what exactly do you mean for Python to support composition over inheritence, and how does it differ from Python's existing support for composition? You ignored my question: Is that the sort of thing you mean by composition? If not, then what do you mean by it? This is not a rhetorical question: I'm having difficulty understanding your proposal. It is too vague, and you are using terminology in ways I don't understand. Maybe that's my ignorance, or maybe you're using non-standard terminology. Either way, if I'm having trouble, probably others are too. Help us understand your proposal. > >>I am not sure how you'd set components, or test for components, > >If you don't know how to set components, or test for them, what do you > >know how to do with components? > > > >And how are components different from attributes? > > They're more like conflict-free interfaces, and in this specific case > they're designed with duck typing in mind. (You can dynamically add and > remove components, and use whatever you want as the component. You > cannot do that with inheritance.) What do you mean by "conflict-free interfaces"? I can only repeat my earlier request: > >If might help if you give a concrete example, with meaningful names. It > >would help even better if you can contrast the way we do composition now > >with the way you think we should do it. > >I'm afraid that at the moment I'm parsing your post as: > > > >"composition is cool, we should use it; and o.[c].m() is cool syntax, we > >should use it for composition; I'll leave the details to others". > > Again, how do you call car.ignition.start() from car.key.turn()? Maybe you can't. 
Maybe this is a crippling example of why composition isn't as good as inheritence and the OOP community is right that inheritence is the best thing since sliced bread. Maybe my design of the Car object sucks. But who cares? None of this comes any closer to explaining your proposal. > >>Thus I think o.[c].m() should be syntax sugar for o[c].m(o), with o > >>being evaluated only once, > >I don't see why you're using __getitem__ instead of attribute access; > >nor do I understand why m gets o as argument instead of c. > > > >Wait... is this something to do with Lieberman-style delegation? > > > >http://web.media.mit.edu/~lieber/Lieberary/OOP/Delegation/Delegation.html > > > >http://code.activestate.com/recipes/519639-true-lieberman-style-delegation-in-python/ > > > > TL;DR. But no, it's not some form of delegation. One of us is using non-standard terminology, and I don't think it is me. (Happy to be corrected if I'm wrong.) I understand that composition and delegation go hand in hand: you can't have one without the other. Composition refers to the arrangement of an object that is composed of other objects. Delegation refers to the way that the compound object calls methods on the component objects. The point (as I understand it) of composition is that a Car doesn't just have an Engine, it delegates functionality to the Engine: the Car object derives functionality by calling Engine methods directly, rather than inheriting them. Car.forward() delegates to Engine.forward(). The point is that the implementation of Car.forward is found in the self.engine object, rather than being inherited from an Engine class. Without delegation, the components aren't components at all, merely data attributes: Car.colour = 'red'. Does this match your proposal? If not, how is your proposal different? > It still gets `self` (which is whatever is in o[c] - which may be c > itself, or an arbitrary object that fulfills the contract defined by c), > but also gets `o` in addition to `self`. (Unless it's a plain function, > in which case it gets no `self`.) That sounds like a hybrid of Lieberman-style delegation and the more common form. At first glance, that seems to add complexity without giving the advantages of either form of delegation. > >>as that solves a lot of current issues > >>relating to inheritance while introducing very few issues relating to > >>python's "everything is separate" (e.g. __get__ vs __getattr__) > >>policy.This also makes setting components and testing for components > >>fairly trivial, and completely avoids the issues mentioned above by > >>making their syntax illegal. > >Above you said that you don't know how to set and test for components, > >now you say that doing so is trivial. Which is it? > > If you pay closer attention, you'll notice the two different paragraphs > talk about two different syntaxes. I don't care about syntax yet. I'm still trying to understand the semantics of your proposal. Whether you spell this thing instance.[component] get_component(instance, 'component') instance!component is less important than understand what it *does*. > - o.[c] as a standalone syntax element, allowing things like > x=o.[c1].[c2]; and x=o.[c1][c2];. > - o.[c].m() as a standalone syntax element, *disallowing* the above. That makes no sense to me. I cannot make head or tail of what that is supposed to mean. > >>(Disclaimer: This was inspired by my own programming language, > >>Cratera[1], so I'm a bit biased here. Cratera was, in turn, inspired by > >>Rust[2] traits. 
> >Traits are normally considered to be a more restricted, safer form of > >multiple inheritence, similar to mixins but even more restrictive. > > What do you mean more restricted? I mean that if you have two traits with the same method: class SpamTrait: def foo(self): ... class EggTrait: def foo(self): ... then you cannot use them both in a single class: class MyClass(SpamTrait, EggTrait): ... since the foo method clashes, unless MyClass explicitly specifies which foo method to use. Mixins and regular multiple inheritence do not have that restriction. If you expect both foo methods to be called, that's just regular multiple inheritence, with all its complexity and disadvantages. (See Michele Simionato numerous posts on Artima about super, multiple inheritence, mixins and his own traits implementation.) The point of traits is to prevent the existence of such conflicts: either by prohibiting the use of both SpamTrait and EggTrait at the same time, or by forcing MyClass to explicitly choose which foo method gets used. That's safer than unrestricted mixins and multiple inheritence, since it reduces the complexity of the inheritence heirarchy. > They let you have the same method in multiple components/traits and > not have them conflict, among other things. I think we are in agreement here. But in any case... traits are a form of inheritence, not composition. You said this proposal is inspired by Rust traits. Can you explain the connection between inheritence of traits and composition? -- Steve From solipsis at pitrou.net Sat Oct 28 12:41:54 2017 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sat, 28 Oct 2017 18:41:54 +0200 Subject: [Python-ideas] Thread.__init__ should call super() References: <290F93FE-4E4D-463C-9AB3-F42EB9874EF7@gmail.com> Message-ID: <20171028184154.3e5db263@fsol> On Fri, 27 Oct 2017 13:59:01 -0700 Ilya Kulakov wrote: > Since one of the legit use-cases of using the Thread class is subclassing, > I think it's __init__ should call super() to support cooperative inheritance. Not to derail this thread, but I find it much clearer to use the functional form of the Thread class, i.e. to pass the `target` and `args` constructor parameters. Regards Antoine. From fakedme+py at gmail.com Sat Oct 28 16:24:43 2017 From: fakedme+py at gmail.com (Soni L.) Date: Sat, 28 Oct 2017 18:24:43 -0200 Subject: [Python-ideas] Composition over Inheritance In-Reply-To: <20171028165135.GV9068@ando.pearwood.info> References: <98d527b2-6263-feae-05b4-67cc93bfa360@gmail.com> <20171028115111.GU9068@ando.pearwood.info> <26abc818-bd7d-8932-8b85-d8f879c9a1e0@gmail.com> <20171028165135.GV9068@ando.pearwood.info> Message-ID: On 2017-10-28 02:51 PM, Steven D'Aprano wrote: > On Sat, Oct 28, 2017 at 10:19:09AM -0200, Soni L. wrote: > >>> class Car: >>> def __init__(self): >>> self.engine = Engine() >>> self.accelerator = AcceleratorPedal() >>> ... >>> >>> def start(self): >>> # Delegate to the ignition component. >>> self.ignition.start() >>> >>> >>> etc. Obviously this is just a very loose sketch, don't take it too >>> literally. Is this the sort of thing you are talking about? >> So how do you call car.ignition.start() from car.key.turn()? > You don't -- the key is not a component of the car, its an argument of > ignition.start. If the key doesn't fit, ignition.start() raises an > exception and the car doesn't start. > > I'm not really interested in getting into a tedious debate over the best > way to design a Car object. 
As I said, the above is just a loose sketch > illustrating composition, not a carefully planned and debugged object. > The aim here is to understand your proposal: what exactly do you mean > for Python to support composition over inheritence, and how does it > differ from Python's existing support for composition? > > You ignored my question: Is that the sort of thing you mean by > composition? If not, then what do you mean by it? This is not a > rhetorical question: I'm having difficulty understanding your proposal. > It is too vague, and you are using terminology in ways I don't > understand. > > Maybe that's my ignorance, or maybe you're using non-standard > terminology. Either way, if I'm having trouble, probably others are too. > Help us understand your proposal. With composition, you can have car.key.turn() call car.ignition.start(), without having to add car to key or ignition to key. You just have to put both in a car and they can then see eachother! > > >>>> I am not sure how you'd set components, or test for components, >>> If you don't know how to set components, or test for them, what do you >>> know how to do with components? >>> >>> And how are components different from attributes? >> They're more like conflict-free interfaces, and in this specific case >> they're designed with duck typing in mind. (You can dynamically add and >> remove components, and use whatever you want as the component. You >> cannot do that with inheritance.) > What do you mean by "conflict-free interfaces"? > > I can only repeat my earlier request: > >>> If might help if you give a concrete example, with meaningful names. It >>> would help even better if you can contrast the way we do composition now >>> with the way you think we should do it. In Rust, you can put as many "conflicting" traits as you want on the same object, and it'll still work! It'll compile, it'll run, you'll be able to use the object in existing code that expects a specific trait, and code that operates on the object itself is fairly easy to write. > >>> I'm afraid that at the moment I'm parsing your post as: >>> >>> "composition is cool, we should use it; and o.[c].m() is cool syntax, we >>> should use it for composition; I'll leave the details to others". >> Again, how do you call car.ignition.start() from car.key.turn()? > Maybe you can't. Maybe this is a crippling example of why composition > isn't as good as inheritence and the OOP community is right that > inheritence is the best thing since sliced bread. Maybe my design of the > Car object sucks. > > But who cares? None of this comes any closer to explaining your > proposal. Composition works fine, if you do it like Rust. > > >>>> Thus I think o.[c].m() should be syntax sugar for o[c].m(o), with o >>>> being evaluated only once, >>> I don't see why you're using __getitem__ instead of attribute access; >>> nor do I understand why m gets o as argument instead of c. >>> >>> Wait... is this something to do with Lieberman-style delegation? >>> >>> http://web.media.mit.edu/~lieber/Lieberary/OOP/Delegation/Delegation.html >>> >>> http://code.activestate.com/recipes/519639-true-lieberman-style-delegation-in-python/ >>> >> TL;DR. But no, it's not some form of delegation. > One of us is using non-standard terminology, and I don't think it is me. > (Happy to be corrected if I'm wrong.) > > I understand that composition and delegation go hand in hand: you can't > have one without the other. Composition refers to the arrangement of an > object that is composed of other objects. 
Delegation refers to the way > that the compound object calls methods on the component objects. > > The point (as I understand it) of composition is that a Car doesn't just > have an Engine, it delegates functionality to the Engine: the Car object > derives functionality by calling Engine methods directly, rather than > inheriting them. Car.forward() delegates to Engine.forward(). > > The point is that the implementation of Car.forward is found in the > self.engine object, rather than being inherited from an Engine class. > > Without delegation, the components aren't components at all, merely data > attributes: Car.colour = 'red'. > > Does this match your proposal? If not, how is your proposal different? Meet ECS: https://en.wikipedia.org/wiki/Entity_component_system Now, keep in mind, the usual ECS is crap. Why's it crap? Because there's no reasonable call convention that's also performant! System.action(entity) # crap. can't override. entity.action() # crap. conflicts easily. either not very dynamic or awful semantics. entity.get(System).action(entity) # crap. while you can override this one, you get to evaluate entity twice! entity.get(System).action() # crap. creates an object every time it's used (or hogs some RAM to cache the object). I could keep going but the point is that Rust traits solve all these problems if you let them. > > >> It still gets `self` (which is whatever is in o[c] - which may be c >> itself, or an arbitrary object that fulfills the contract defined by c), >> but also gets `o` in addition to `self`. (Unless it's a plain function, >> in which case it gets no `self`.) > That sounds like a hybrid of Lieberman-style delegation and the more > common form. At first glance, that seems to add complexity without > giving the advantages of either form of delegation. It's not delegation. > > >>>> as that solves a lot of current issues >>>> relating to inheritance while introducing very few issues relating to >>>> python's "everything is separate" (e.g. __get__ vs __getattr__) >>>> policy.This also makes setting components and testing for components >>>> fairly trivial, and completely avoids the issues mentioned above by >>>> making their syntax illegal. >>> Above you said that you don't know how to set and test for components, >>> now you say that doing so is trivial. Which is it? >> If you pay closer attention, you'll notice the two different paragraphs >> talk about two different syntaxes. > I don't care about syntax yet. I'm still trying to understand the > semantics of your proposal. Whether you spell this thing > > instance.[component] > > get_component(instance, 'component') > > instance!component > > is less important than understand what it *does*. Semantics is Rust traits at runtime, without the delegation (this is by design - with delegation, libraries could make assumptions, and adding new traits to an object could break them). > > >> - o.[c] as a standalone syntax element, allowing things like >> x=o.[c1].[c2]; and x=o.[c1][c2];. >> - o.[c].m() as a standalone syntax element, *disallowing* the above. > That makes no sense to me. I cannot make head or tail of what that is > supposed to mean. It means whether your parser looks for "o.[c]" and emits an opcode for it, or it looks for "o.[c].m()" and emits an opcode for that instead. > > >>>> (Disclaimer: This was inspired by my own programming language, >>>> Cratera[1], so I'm a bit biased here. Cratera was, in turn, inspired by >>>> Rust[2] traits. 
>>> Traits are normally considered to be a more restricted, safer form of >>> multiple inheritence, similar to mixins but even more restrictive. >> What do you mean more restricted? > I mean that if you have two traits with the same method: > > class SpamTrait: > def foo(self): ... > > class EggTrait: > def foo(self): ... > > > then you cannot use them both in a single class: > > class MyClass(SpamTrait, EggTrait): > ... > > since the foo method clashes, unless MyClass explicitly specifies which > foo method to use. Mixins and regular multiple inheritence do not have > that restriction. > > If you expect both foo methods to be called, that's just regular > multiple inheritence, with all its complexity and disadvantages. > (See Michele Simionato numerous posts on Artima about super, multiple > inheritence, mixins and his own traits implementation.) > > The point of traits is to prevent the existence of such conflicts: either by > prohibiting the use of both SpamTrait and EggTrait at the same time, or > by forcing MyClass to explicitly choose which foo method gets used. > That's safer than unrestricted mixins and multiple inheritence, since it > reduces the complexity of the inheritence heirarchy. > > >> They let you have the same method in multiple components/traits and >> not have them conflict, among other things. > I think we are in agreement here. > > But in any case... traits are a form of inheritence, not composition. > You said this proposal is inspired by Rust traits. Can you explain the > connection between inheritence of traits and composition? > Rust traits are the best traits. You can have a Rust struct implement 2 traits with the same method. And it just works. You can call each trait method separately - disambiguation is done at the call site. This doesn't translate well to a dynamic language, so, for python, it's best to always specify the trait (no delegation - o.m() is always o.m() and if you want o.[c].m() you need to be explicit about it). Otherwise, it's the same as always: called method gets main object, etc. (Disclaimer: I wrote this message starting with the last question. It should still make sense tho. PS: You should go play with some Rust. As far as compiled languages go, Rust is pretty much the best.) > From k7hoven at gmail.com Sat Oct 28 21:31:25 2017 From: k7hoven at gmail.com (Koos Zevenhoven) Date: Sun, 29 Oct 2017 03:31:25 +0200 Subject: [Python-ideas] Composition over Inheritance In-Reply-To: References: <98d527b2-6263-feae-05b4-67cc93bfa360@gmail.com> <20171028115111.GU9068@ando.pearwood.info> <26abc818-bd7d-8932-8b85-d8f879c9a1e0@gmail.com> <20171028165135.GV9068@ando.pearwood.info> Message-ID: On Sat, Oct 28, 2017 at 11:24 PM, Soni L. wrote: > > On 2017-10-28 02:51 PM, Steven D'Aprano wrote: > >> >> You ignored my question: Is that the sort of thing you mean by >> composition? If not, then what do you mean by it? This is not a >> rhetorical question: I'm having difficulty understanding your proposal. >> It is too vague, and you are using terminology in ways I don't >> understand. >> >> Maybe that's my ignorance, or maybe you're using non-standard >> terminology. Either way, if I'm having trouble, probably others are too. >> Help us understand your proposal. >> > > I have to say I'm almost impressed by the constructiveness of the discussion, even though? I still don't understand the point of all the square brackets in the proposal. 
> With composition, you can have car.key.turn() call car.ignition.start(),
> without having to add car to key or ignition to key. You just have to put
> both in a car and they can then see each other!

Here it's a bit confusing that the key is thought of as part of the car. It's easy to imagine that an owner of two cars would want the same key to work for both of them. Or that one car could have multiple users with non-identical keys. I'm not sure if these things already exist in real life, but if not, it's probably just a matter of time.

But let's ignore this confusion for a moment, and imagine that the example makes perfect sense. Now, it sounds like you want something like namespacing for methods and attributes within a complicated class. Maybe you could implement it using nested classes and decorators to make sure 'self' is passed to the methods in the way you want. The usage might look roughly like:

@namespacedclass
class Car:
    @classnamespace
    class ignition:
        def start(self):
            ...

    @classnamespace
    class key:
        def turn(self):
            self.ignition.start()

Another concern regarding the example, however, is that this seems to make it unclear what the public API of the car is. It looks like you can just as easily drive the car without having the key: just call car.ignition.start().

-- Koos

--
+ Koos Zevenhoven + http://twitter.com/k7hoven +
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From fakedme+py at gmail.com  Sat Oct 28 21:46:37 2017
From: fakedme+py at gmail.com (Soni L.)
Date: Sat, 28 Oct 2017 23:46:37 -0200
Subject: [Python-ideas] Composition over Inheritance
In-Reply-To: 
References: <98d527b2-6263-feae-05b4-67cc93bfa360@gmail.com> <20171028115111.GU9068@ando.pearwood.info> <26abc818-bd7d-8932-8b85-d8f879c9a1e0@gmail.com> <20171028165135.GV9068@ando.pearwood.info>
Message-ID: <8ece59fa-de24-a0e8-c373-1182d9b7e6b2@gmail.com>

On 2017-10-28 11:31 PM, Koos Zevenhoven wrote:
> On Sat, Oct 28, 2017 at 11:24 PM, Soni L. wrote:
> > On 2017-10-28 02:51 PM, Steven D'Aprano wrote:
> > > You ignored my question: Is that the sort of thing you mean by
> > > composition? If not, then what do you mean by it? This is not a
> > > rhetorical question: I'm having difficulty understanding your proposal.
> > > It is too vague, and you are using terminology in ways I don't understand.
> > >
> > > Maybe that's my ignorance, or maybe you're using non-standard
> > > terminology. Either way, if I'm having trouble, probably others are too.
> > > Help us understand your proposal.
>
> I have to say I'm almost impressed by the constructiveness of the
> discussion, even though I still don't understand the point of all the
> square brackets in the proposal.
>
> With composition, you can have car.key.turn() call
> car.ignition.start(), without having to add car to key or ignition
> to key. You just have to put both in a car and they can then see
> each other!
>
> Here it's a bit confusing that the key is thought of as part of the
> car. It's easy to imagine that an owner of two cars would want the
> same key to work for both of them. Or that one car could have multiple
> users with non-identical keys. I'm not sure if these things already
> exist in real life, but if not, it's probably just a matter of time.
>
> But let's ignore this confusion for a moment, and imagine that the
> example makes perfect sense. Now, it sounds like you want something
> like namespacing for methods and attributes within a complicated
> class. 
Maybe you could implement it using nested classes and > decorators to make sure 'self' is passed to to the methods in the way > you want. The usage might look roughly like: > > @namespacedclass > class Car: > @classnamespace > ? ? class ignition: > ? ? ? ? def start(self): > ? ? ? ? ? ? ?... > > @classnamespace > ? ? class key: > ? ? ? ? def turn(self): > self.ignition.start() > > > Another concern regarding the example, however, is that this seems to > make it unclear what the public API of the car is. It looks like you > can just as easily drive the car without having the key: just call > car.ignition.start(). It's a crap example, yes. But you should be able to do things like car = object() car[Engine] = SimpleEngine() car.[Engine].kickstart() # calls kickstart method with an instance of SimpleEngine as `self`/first argument and `car` as second argument. # etc Which your decorator-based approach quite obviously doesn't let you. > ? > > -- Koos > > > -- > + Koos Zevenhoven + http://twitter.com/k7hoven + -------------- next part -------------- An HTML attachment was scrubbed... URL: From rosuav at gmail.com Sat Oct 28 21:57:13 2017 From: rosuav at gmail.com (Chris Angelico) Date: Sun, 29 Oct 2017 12:57:13 +1100 Subject: [Python-ideas] Composition over Inheritance In-Reply-To: <8ece59fa-de24-a0e8-c373-1182d9b7e6b2@gmail.com> References: <98d527b2-6263-feae-05b4-67cc93bfa360@gmail.com> <20171028115111.GU9068@ando.pearwood.info> <26abc818-bd7d-8932-8b85-d8f879c9a1e0@gmail.com> <20171028165135.GV9068@ando.pearwood.info> <8ece59fa-de24-a0e8-c373-1182d9b7e6b2@gmail.com> Message-ID: On Sun, Oct 29, 2017 at 12:46 PM, Soni L. wrote: > But you should be able to do things like > > car = object() > car[Engine] = SimpleEngine() > car.[Engine].kickstart() # calls kickstart method with an instance of > SimpleEngine as `self`/first argument and `car` as second argument. > # etc > > Which your decorator-based approach quite obviously doesn't let you. I think I follow what you're trying to do here. You want to have a way to refer to a subobject while letting it know about the parent. We already have something like that: when you call a function that was attached to a class, it gets to know which instance of that class you used to locate that function. Maybe there's a way to use descriptor protocol for this too? class SimpleEngine: def kickstart(self, caller): """*boot* Engine starts""" class Car: Engine = magic(SimpleEngine) car = Car() When you look up car.Engine, it remembers a reference to car. Then you look up any callable from there, and it automatically provides an additional parameter. I'm not sure how the details would work, but in theory, this should be possible, right? ChrisA From fakedme+py at gmail.com Sat Oct 28 22:05:59 2017 From: fakedme+py at gmail.com (Soni L.) Date: Sun, 29 Oct 2017 00:05:59 -0200 Subject: [Python-ideas] Composition over Inheritance In-Reply-To: References: <98d527b2-6263-feae-05b4-67cc93bfa360@gmail.com> <20171028115111.GU9068@ando.pearwood.info> <26abc818-bd7d-8932-8b85-d8f879c9a1e0@gmail.com> <20171028165135.GV9068@ando.pearwood.info> <8ece59fa-de24-a0e8-c373-1182d9b7e6b2@gmail.com> Message-ID: On 2017-10-28 11:57 PM, Chris Angelico wrote: > On Sun, Oct 29, 2017 at 12:46 PM, Soni L. wrote: >> But you should be able to do things like >> >> car = object() >> car[Engine] = SimpleEngine() >> car.[Engine].kickstart() # calls kickstart method with an instance of >> SimpleEngine as `self`/first argument and `car` as second argument. 
>> # etc >> >> Which your decorator-based approach quite obviously doesn't let you. > I think I follow what you're trying to do here. You want to have a way > to refer to a subobject while letting it know about the parent. We > already have something like that: when you call a function that was > attached to a class, it gets to know which instance of that class you > used to locate that function. Maybe there's a way to use descriptor > protocol for this too? > > class SimpleEngine: > def kickstart(self, caller): > """*boot* Engine starts""" > > class Car: > Engine = magic(SimpleEngine) > > car = Car() > > When you look up car.Engine, it remembers a reference to car. Then you > look up any callable from there, and it automatically provides an > additional parameter. I'm not sure how the details would work, but in > theory, this should be possible, right? And how do you make the object creation so cheap to the point where it's actually practical? (quick question: does Python use a single opcode and an optimized codepath for method calls, or does it always create a method wrapper, even for immediate o.m() calls? If it's the latter then yeah I guess there's no reason for new syntax because it's not gonna be significantly slower than what we currently have...) > > ChrisA > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From fakedme+py at gmail.com Sat Oct 28 22:13:15 2017 From: fakedme+py at gmail.com (Soni L.) Date: Sun, 29 Oct 2017 00:13:15 -0200 Subject: [Python-ideas] Composition over Inheritance In-Reply-To: References: <98d527b2-6263-feae-05b4-67cc93bfa360@gmail.com> <20171028115111.GU9068@ando.pearwood.info> <26abc818-bd7d-8932-8b85-d8f879c9a1e0@gmail.com> <20171028165135.GV9068@ando.pearwood.info> <8ece59fa-de24-a0e8-c373-1182d9b7e6b2@gmail.com> Message-ID: On 2017-10-29 12:05 AM, Soni L. wrote: > > > On 2017-10-28 11:57 PM, Chris Angelico wrote: >> On Sun, Oct 29, 2017 at 12:46 PM, Soni L. wrote: >>> But you should be able to do things like >>> >>> car = object() >>> car[Engine] = SimpleEngine() >>> car.[Engine].kickstart() # calls kickstart method with an instance of >>> SimpleEngine as `self`/first argument and `car` as second argument. >>> # etc >>> >>> Which your decorator-based approach quite obviously doesn't let you. >> I think I follow what you're trying to do here. You want to have a way >> to refer to a subobject while letting it know about the parent. We >> already have something like that: when you call a function that was >> attached to a class, it gets to know which instance of that class you >> used to locate that function. Maybe there's a way to use descriptor >> protocol for this too? >> >> class SimpleEngine: >> ???? def kickstart(self, caller): >> ???????? """*boot* Engine starts""" >> >> class Car: >> ???? Engine = magic(SimpleEngine) >> >> car = Car() >> >> When you look up car.Engine, it remembers a reference to car. Then you >> look up any callable from there, and it automatically provides an >> additional parameter. I'm not sure how the details would work, but in >> theory, this should be possible, right? > > And how do you make the object creation so cheap to the point where > it's actually practical? (quick question: does Python use a single > opcode and an optimized codepath for method calls, or does it always > create a method wrapper, even for immediate o.m() calls? 
If it's the > latter then yeah I guess there's no reason for new syntax because it's > not gonna be significantly slower than what we currently have...) Hmm thinking about it some more, this whole "magic()" thing is still bad. Replace class Car with: class Car: ? pass # or something like that and use it as: car = Car() car[Engine] = GasEngine() # please use the actual type instead of a stringy type for this. car[Engine].kickstart() # kickstart gets the car as second argument. And to have all cars have engines, you'd do: class Car: ? def __init__(self, ???): ??? self[Engine] = GasEngine() car = Car() car[Engine].kickstart() # kickstart gets the car as second argument. And if you can't do that, then you can't yet do what I'm proposing, and thus the proposal makes sense, even if it still needs some refining... > >> >> ChrisA >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ > From rosuav at gmail.com Sat Oct 28 22:13:29 2017 From: rosuav at gmail.com (Chris Angelico) Date: Sun, 29 Oct 2017 13:13:29 +1100 Subject: [Python-ideas] Composition over Inheritance In-Reply-To: References: <98d527b2-6263-feae-05b4-67cc93bfa360@gmail.com> <20171028115111.GU9068@ando.pearwood.info> <26abc818-bd7d-8932-8b85-d8f879c9a1e0@gmail.com> <20171028165135.GV9068@ando.pearwood.info> <8ece59fa-de24-a0e8-c373-1182d9b7e6b2@gmail.com> Message-ID: On Sun, Oct 29, 2017 at 1:05 PM, Soni L. wrote: > And how do you make the object creation so cheap to the point where it's > actually practical? (quick question: does Python use a single opcode and an > optimized codepath for method calls, or does it always create a method > wrapper, even for immediate o.m() calls? If it's the latter then yeah I > guess there's no reason for new syntax because it's not gonna be > significantly slower than what we currently have...) Python-the-language simply specifies the semantics. Different implementations do different things. AIUI CPython 3.7 always creates a method wrapper (using a free-list to minimize memory allocations), but a future version might not; and PyPy's current versions have a special opcode. ChrisA From brenbarn at brenbarn.net Sat Oct 28 22:25:12 2017 From: brenbarn at brenbarn.net (Brendan Barnwell) Date: Sat, 28 Oct 2017 19:25:12 -0700 Subject: [Python-ideas] Composition over Inheritance In-Reply-To: References: <98d527b2-6263-feae-05b4-67cc93bfa360@gmail.com> <20171028115111.GU9068@ando.pearwood.info> <26abc818-bd7d-8932-8b85-d8f879c9a1e0@gmail.com> <20171028165135.GV9068@ando.pearwood.info> <8ece59fa-de24-a0e8-c373-1182d9b7e6b2@gmail.com> Message-ID: <59F53C08.6090703@brenbarn.net> On 2017-10-28 19:13, Soni L. wrote: > Hmm thinking about it some more, this whole "magic()" thing is still bad. > > Replace class Car with: > > class Car: > pass > # or something like that > > and use it as: > > car = Car() > car[Engine] = GasEngine() # please use the actual type instead of a > stringy type for this. > car[Engine].kickstart() # kickstart gets the car as second argument. > > And to have all cars have engines, you'd do: > > class Car: > def __init__(self, ???): > self[Engine] = GasEngine() > > car = Car() > car[Engine].kickstart() # kickstart gets the car as second argument. > > And if you can't do that, then you can't yet do what I'm proposing, and > thus the proposal makes sense, even if it still needs some refining... 
As near as I can tell you can indeed do that, although it's still not clear to me why you'd want to. You can give Car a __getitem__ that on-the-fly generates an Engine object that knows which Car it is attached to, and then you can make Engine.kickstart a descriptor that knows which Engine it is attached to, and from that can figure out which Car it is attached to. But why? Why do you want to use this particular syntax to do these things? You haven't explained what semantics you're attaching to the [brackets] as opposed to the .attribute notation. Also, why do you want the car to be the second argument? What if the components are more deeply nested? Are you going to pass every parent component as a separate argument? Why not just have the call be made on an object that has the car available as an attribute (so that inside kickstart() you can access self.car or something)? The pandas library, for instance, makes heavy use of descriptors along these lines. A DataFrame object has attributes called .loc and .iloc that you use to do different kinds of indexing with things like my_data_frame.loc['this':'that'] . This appears conceptually similar to what you're describing here. But it's hard to tell, because you still haven't given a clear, direct example of how you would use what you're proposed to actually accomplish some task. -- Brendan Barnwell "Do not follow where the path may lead. Go, instead, where there is no path, and leave a trail." --author unknown From k7hoven at gmail.com Sat Oct 28 22:57:41 2017 From: k7hoven at gmail.com (Koos Zevenhoven) Date: Sun, 29 Oct 2017 04:57:41 +0200 Subject: [Python-ideas] Dollar operator suggestion In-Reply-To: <56ffad60-947f-bb2d-abe1-47126e656c68@python-academy.de> References: <56ffad60-947f-bb2d-abe1-47126e656c68@python-academy.de> Message-ID: On Fri, Oct 27, 2017 at 8:46 AM, Mike M?ller wrote: > This already exists in Coconut: > http://coconut.readthedocs.io/en/master/HELP.html#function-composition > > Quite funny to read that. It seems like they have made something like what I proposed in the 2015 function composition threads, but without what I considered improvements to it. I summarize the proposal below, but here's also a link to it: https://mail.python.org/pipermail/python-ideas/2015-May/033482.html Indeed, a discussion was going on where many people were interested in function composition syntax. Different variants had been proposed, and IMO one of the most notable suggestions had been using the new matrix multiplication operator. For example: from numpy import sqrt, mean, square rms = sqrt @ mean @ square ? rms(values) # == sqrt(mean(square(values)))? And this can of course already be implemented for custom callables which implement __matmul__. But my email (linked above, and which might be a bit hard to read because of the interleaved references to my previous proposal) was essentially about something like this: Assuming bar is a function that accepts at least one argument, make foo..bar equivalent to types.MethodType(bar, foo). In other words, it would create a bound method out of bar, with foo as self. Not only would it allow calling a method which is not defined within the class: values..square() but it would also directly allow this: values..square()..mean(axis=2)..sqrt() And even when the left-hand expression becomes the second argument for a function/method: car_door..car_key.turn() # equivalent to CarKey.turn(car_key, car_door) -- Koos PS. 
As you can see in the email linked above, I was already prepared to just abandon the idea, because most likely it would be rejected. But what I was not prepared for was the amount of *nonsense* arguments that were thrown against it in that thread and elsewhere, and the whole thing got turned into some kind of weird puzzle. > From http://coconut-lang.org/: > > Coconut is a functional programming language that compiles to Python. > > Since all valid Python is valid Coconut, using Coconut will only extend > > and enhance what you're already capable of in Python. > > Mike > > Am 26.10.17 um 13:06 schrieb Yan Pas: > > I've looked up this feature in haskell. Dollar sign operator is used to > avoid > > parentheses. > > > > Rationalle: > > Python tends to use functions instead of methods ( e.g.len([1,2,3]) > instead of > > [1,2,3].len() ). Sometimes the expression inside parentheses may become > big > > and using a lot of parentheses may tend to bad readability. I suggest the > > following syntax: > > > > len $ [1,2,3] > > > > Functions map be also chained: > > > > len $ list $ map(...) > > > > This operator may be used for function composition too: > > > > foo = len $ set $ > > in the same as > > foo = lambda *as,**kas : len(set(*as, **kas)) > > in current syntax > > > > Regards, > > Yan > > > -- + Koos Zevenhoven + http://twitter.com/k7hoven + -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Sat Oct 28 23:57:01 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 29 Oct 2017 13:57:01 +1000 Subject: [Python-ideas] Defining an easily installable "Recommended baseline package set" Message-ID: Over on python-dev, the question of recommending MRAB's "regex" module over the standard library's "re" module for more advanced regular expressions recently came up again. Because of various logistical issues and backwards compatibility risks, it's highly unlikely that we'll ever be able to swap out the current _sre based re module implementation in the standard library for an implementation based on the regex module. At the same time, it would be beneficial to have a way to offer an even stronger recommendation to redistributors that we think full-featured general purpose Python scripting environments should offer the regex module as an opt-in alternative to the baseline re feature set, since that would also help with other currently difficult cases like the requests module. What I'm thinking is that we could make some relatively simple additions to the `ensurepip` and `venv` modules to help with this: 1. Add a ensurepip.RECOMMENDED_PACKAGES mapping keyed by standard library module names containing dependency specifiers for recommended third party packages for particular tasks (e.g. "regex" as an enhanced alternative to "re", "requests" as an enhanced HTTPS-centric alternative to "urllib.request") 2. Add a new `install_recommended` boolean flag to ensurepip.bootstrap 3. Add a corresponding `--install-recommended flag to the `python -m ensurepip` CLI 4. 
Add a corresponding `--install-recommended flag to the `python -m venv` CLI (when combined with `--without-pip`, this would run pip directly from the bundled wheel file to do the installations) We'd also need either a new informational PEP or else a section in the developer guide to say that the contents of `ensurepip.RECOMMENDED_PACKAGES` are up to the individual module maintainers (hence keying the mapping by standard library module name, rather than having a single flat list for the entire standard library). For redistributors with weak dependency support, these reference interpreter level recommendations could become redistributor level recommendations. Redistributors without weak dependency support could still make a distinction between "default" installations (which would include them) and "minimal" installations (which would exclude them). Folks writing scripts and example code for independent distribution (i.e. no explicitly declared dependencies) could then choose between relying on just the standard library (as now), or on the standard library plus independently versioned recommended packages. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Sun Oct 29 00:28:01 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 29 Oct 2017 14:28:01 +1000 Subject: [Python-ideas] Composition over Inheritance In-Reply-To: <59F53C08.6090703@brenbarn.net> References: <98d527b2-6263-feae-05b4-67cc93bfa360@gmail.com> <20171028115111.GU9068@ando.pearwood.info> <26abc818-bd7d-8932-8b85-d8f879c9a1e0@gmail.com> <20171028165135.GV9068@ando.pearwood.info> <8ece59fa-de24-a0e8-c373-1182d9b7e6b2@gmail.com> <59F53C08.6090703@brenbarn.net> Message-ID: On 29 October 2017 at 12:25, Brendan Barnwell wrote: > On 2017-10-28 19:13, Soni L. wrote: > >> And to have all cars have engines, you'd do: >> >> class Car: >> def __init__(self, ???): >> self[Engine] = GasEngine() >> >> car = Car() >> car[Engine].kickstart() # kickstart gets the car as second argument. >> >> And if you can't do that, then you can't yet do what I'm proposing, and >> thus the proposal makes sense, even if it still needs some refining... >> > > As near as I can tell you can indeed do that, although it's still > not clear to me why you'd want to. You can give Car a __getitem__ that > on-the-fly generates an Engine object that knows which Car it is attached > to, and then you can make Engine.kickstart a descriptor that knows which > Engine it is attached to, and from that can figure out which Car it is > attached to. > Right, I think a few different things are getting confused here related to how different folks use composition. For most data modeling use cases, the composition model you want is either a tree or an acyclic graph, where the subcomponents don't know anything about the whole that they're a part of. This gives you good component isolation, and avoids circular dependencies. However, for other cases, you *do* want the child object to be aware of the parent - XML etrees are a classic example of this, where we want to allow navigation back up the tree, so each node gains a reference to its parent node. This often takes the form of a combination of delegation (parent->child references) and dependency inversion (child->parent reference). 
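A minimal sketch of that parent-aware pattern, using an invented Node class rather than the real etree API: each child keeps a reference back to its parent, which gives the delegation plus dependency-inversion combination in a few lines.

class Node:
    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent      # child -> parent (dependency inversion)
        self.children = []        # parent -> child (delegation)

    def append(self, tag):
        child = Node(tag, parent=self)
        self.children.append(child)
        return child

root = Node("root")
leaf = root.append("branch").append("leaf")
assert leaf.parent.parent is root  # navigate back up the tree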
For the car/engine example, this relates to explicitly modeling the relationship whereby a car can have one or more engines (but the engine may not currently be installed), while an engine can be installed in at most one car at any given point in time. You don't even need the descriptor protocol for that though, you just need the subcomponent to accept the parent reference as a constructor parameter: class Car: def __init__(self, engine_type): self.engine = engine_type(self) However, this form of explicit dependency inversion wouldn't work as well if you want to be able to explicitly create an "uninstalled engine" instance, and then pass the engine in as a parameter to the class constructor: class Car: def __init__(self, engine): self.engine = engine # How would we ensure the engine is marked as installed here? As it turns out, Python doesn't need new syntax for this either, as it's all already baked into the regular attribute access syntax, whereby descriptor methods get passed a reference not only to the descriptor, but *also* to the object being accessed: https://docs.python.org/3/howto/descriptor.html#descriptor-protocol And then the property builtin lets you ignore the existence of the descriptor object entirely, and only care about the original object, allowing the above example to be written as: class Car: def __init__(self, engine): self.engine = engine # This implicitly marks the engine as installed @property def engine(self): return self._engine @engine.setter def engine(self, engine): if engine is not None: if self._engine is not None: raise RuntimeError("Car already has an engine installed") if engine._car is not None: raise RuntimeError("Engine is already installed in another car") engine._car = self self._engine = engine car = Car(GasEngine()) ORMs use this kind of descriptor based composition management extensively in order to reliably model database foreign key relationships in a way that's mostly transparent to users of the ORM classes. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Sun Oct 29 01:16:39 2017 From: guido at python.org (Guido van Rossum) Date: Sat, 28 Oct 2017 22:16:39 -0700 Subject: [Python-ideas] Defining an easily installable "Recommended baseline package set" In-Reply-To: References: Message-ID: Why? What's wrong with pip install? Why complicate things? Your motivation is really weak here. "beneficial"? "difficult cases"? On Sat, Oct 28, 2017 at 8:57 PM, Nick Coghlan wrote: > Over on python-dev, the question of recommending MRAB's "regex" module > over the standard library's "re" module for more advanced regular > expressions recently came up again. > > Because of various logistical issues and backwards compatibility risks, > it's highly unlikely that we'll ever be able to swap out the current _sre > based re module implementation in the standard library for an > implementation based on the regex module. > > At the same time, it would be beneficial to have a way to offer an even > stronger recommendation to redistributors that we think full-featured > general purpose Python scripting environments should offer the regex module > as an opt-in alternative to the baseline re feature set, since that would > also help with other currently difficult cases like the requests module. > > What I'm thinking is that we could make some relatively simple additions > to the `ensurepip` and `venv` modules to help with this: > > 1. 
Add a ensurepip.RECOMMENDED_PACKAGES mapping keyed by standard library > module names containing dependency specifiers for recommended third party > packages for particular tasks (e.g. "regex" as an enhanced alternative to > "re", "requests" as an enhanced HTTPS-centric alternative to > "urllib.request") > > 2. Add a new `install_recommended` boolean flag to ensurepip.bootstrap > > 3. Add a corresponding `--install-recommended flag to the `python -m > ensurepip` CLI > > 4. Add a corresponding `--install-recommended flag to the `python -m venv` > CLI (when combined with `--without-pip`, this would run pip directly from > the bundled wheel file to do the installations) > > We'd also need either a new informational PEP or else a section in the > developer guide to say that the contents of `ensurepip.RECOMMENDED_PACKAGES` > are up to the individual module maintainers (hence keying the mapping by > standard library module name, rather than having a single flat list for the > entire standard library). > > For redistributors with weak dependency support, these reference > interpreter level recommendations could become redistributor level > recommendations. Redistributors without weak dependency support could still > make a distinction between "default" installations (which would include > them) and "minimal" installations (which would exclude them). > > Folks writing scripts and example code for independent distribution (i.e. > no explicitly declared dependencies) could then choose between relying on > just the standard library (as now), or on the standard library plus > independently versioned recommended packages. > > Cheers, > Nick. > > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From greg.ewing at canterbury.ac.nz Sun Oct 29 01:41:54 2017 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sun, 29 Oct 2017 18:41:54 +1300 Subject: [Python-ideas] Composition over Inheritance In-Reply-To: <59F53C08.6090703@brenbarn.net> References: <98d527b2-6263-feae-05b4-67cc93bfa360@gmail.com> <20171028115111.GU9068@ando.pearwood.info> <26abc818-bd7d-8932-8b85-d8f879c9a1e0@gmail.com> <20171028165135.GV9068@ando.pearwood.info> <8ece59fa-de24-a0e8-c373-1182d9b7e6b2@gmail.com> <59F53C08.6090703@brenbarn.net> Message-ID: <59F56A22.60900@canterbury.ac.nz> My take on all this is that it's much simpler just to give the engine an attribute that refers back to the car. The only downside is that it creates a circular reference, but in these days of cyclic gc, that's not much of an issue. -- Greg From tritium-list at sdamon.com Sun Oct 29 01:46:33 2017 From: tritium-list at sdamon.com (Alex Walters) Date: Sun, 29 Oct 2017 01:46:33 -0400 Subject: [Python-ideas] Defining an easily installable "Recommended baseline package set" In-Reply-To: References: Message-ID: <012f01d35079$47b38a90$d71a9fb0$@sdamon.com> At this point, I would punt to distutils-sig about curated packages, and pip tooling to support that, but they are bogged down as it stands with just getting warehouse up and running. I don?t support putting specialized tooling in python itself to install curated packages, because that curation would be on the same release schedule as python itself. 
Pip is an important special case, but it?s a special case to avoid further special cases. If there was excess manpower on the packaging side, that?s where it should be developed. From: Python-ideas [mailto:python-ideas-bounces+tritium-list=sdamon.com at python.org] On Behalf Of Guido van Rossum Sent: Sunday, October 29, 2017 1:17 AM To: Nick Coghlan Cc: python-ideas at python.org Subject: Re: [Python-ideas] Defining an easily installable "Recommended baseline package set" Why? What's wrong with pip install? Why complicate things? Your motivation is really weak here. "beneficial"? "difficult cases"? On Sat, Oct 28, 2017 at 8:57 PM, Nick Coghlan > wrote: Over on python-dev, the question of recommending MRAB's "regex" module over the standard library's "re" module for more advanced regular expressions recently came up again. Because of various logistical issues and backwards compatibility risks, it's highly unlikely that we'll ever be able to swap out the current _sre based re module implementation in the standard library for an implementation based on the regex module. At the same time, it would be beneficial to have a way to offer an even stronger recommendation to redistributors that we think full-featured general purpose Python scripting environments should offer the regex module as an opt-in alternative to the baseline re feature set, since that would also help with other currently difficult cases like the requests module. What I'm thinking is that we could make some relatively simple additions to the `ensurepip` and `venv` modules to help with this: 1. Add a ensurepip.RECOMMENDED_PACKAGES mapping keyed by standard library module names containing dependency specifiers for recommended third party packages for particular tasks (e.g. "regex" as an enhanced alternative to "re", "requests" as an enhanced HTTPS-centric alternative to "urllib.request") 2. Add a new `install_recommended` boolean flag to ensurepip.bootstrap 3. Add a corresponding `--install-recommended flag to the `python -m ensurepip` CLI 4. Add a corresponding `--install-recommended flag to the `python -m venv` CLI (when combined with `--without-pip`, this would run pip directly from the bundled wheel file to do the installations) We'd also need either a new informational PEP or else a section in the developer guide to say that the contents of `ensurepip.RECOMMENDED_PACKAGES` are up to the individual module maintainers (hence keying the mapping by standard library module name, rather than having a single flat list for the entire standard library). For redistributors with weak dependency support, these reference interpreter level recommendations could become redistributor level recommendations. Redistributors without weak dependency support could still make a distinction between "default" installations (which would include them) and "minimal" installations (which would exclude them). Folks writing scripts and example code for independent distribution (i.e. no explicitly declared dependencies) could then choose between relying on just the standard library (as now), or on the standard library plus independently versioned recommended packages. Cheers, Nick. 
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia _______________________________________________ Python-ideas mailing list Python-ideas at python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/ -- --Guido van Rossum (python.org/~guido ) -------------- next part -------------- An HTML attachment was scrubbed... URL: From stephanh42 at gmail.com Sun Oct 29 02:31:58 2017 From: stephanh42 at gmail.com (Stephan Houben) Date: Sun, 29 Oct 2017 07:31:58 +0100 Subject: [Python-ideas] Defining an easily installable "Recommended baseline package set" In-Reply-To: <012f01d35079$47b38a90$d71a9fb0$@sdamon.com> References: <012f01d35079$47b38a90$d71a9fb0$@sdamon.com> Message-ID: There are already curated lists of Python packages, such as: https://github.com/vinta/awesome-python Even better, if you don't agree, you can just clone it and create your own ;-) Stephan 2017-10-29 6:46 GMT+01:00 Alex Walters : > At this point, I would punt to distutils-sig about curated packages, and > pip tooling to support that, but they are bogged down as it stands with > just getting warehouse up and running. I don?t support putting specialized > tooling in python itself to install curated packages, because that curation > would be on the same release schedule as python itself. Pip is an > important special case, but it?s a special case to avoid further special > cases. If there was excess manpower on the packaging side, that?s where it > should be developed. > > > > *From:* Python-ideas [mailto:python-ideas-bounces+tritium-list=sdamon.com@ > python.org] *On Behalf Of *Guido van Rossum > *Sent:* Sunday, October 29, 2017 1:17 AM > *To:* Nick Coghlan > *Cc:* python-ideas at python.org > *Subject:* Re: [Python-ideas] Defining an easily installable "Recommended > baseline package set" > > > > Why? What's wrong with pip install? Why complicate things? Your motivation > is really weak here. "beneficial"? "difficult cases"? > > > > On Sat, Oct 28, 2017 at 8:57 PM, Nick Coghlan wrote: > > Over on python-dev, the question of recommending MRAB's "regex" module > over the standard library's "re" module for more advanced regular > expressions recently came up again. > > Because of various logistical issues and backwards compatibility risks, > it's highly unlikely that we'll ever be able to swap out the current _sre > based re module implementation in the standard library for an > implementation based on the regex module. > > At the same time, it would be beneficial to have a way to offer an even > stronger recommendation to redistributors that we think full-featured > general purpose Python scripting environments should offer the regex module > as an opt-in alternative to the baseline re feature set, since that would > also help with other currently difficult cases like the requests module. > > What I'm thinking is that we could make some relatively simple additions > to the `ensurepip` and `venv` modules to help with this: > > > > 1. Add a ensurepip.RECOMMENDED_PACKAGES mapping keyed by standard library > module names containing dependency specifiers for recommended third party > packages for particular tasks (e.g. "regex" as an enhanced alternative to > "re", "requests" as an enhanced HTTPS-centric alternative to > "urllib.request") > > 2. Add a new `install_recommended` boolean flag to ensurepip.bootstrap > > 3. Add a corresponding `--install-recommended flag to the `python -m > ensurepip` CLI > > 4. 
Add a corresponding `--install-recommended flag to the `python -m venv` > CLI (when combined with `--without-pip`, this would run pip directly from > the bundled wheel file to do the installations) > > We'd also need either a new informational PEP or else a section in the > developer guide to say that the contents of `ensurepip.RECOMMENDED_PACKAGES` > are up to the individual module maintainers (hence keying the mapping by > standard library module name, rather than having a single flat list for the > entire standard library). > > For redistributors with weak dependency support, these reference > interpreter level recommendations could become redistributor level > recommendations. Redistributors without weak dependency support could still > make a distinction between "default" installations (which would include > them) and "minimal" installations (which would exclude them). > > Folks writing scripts and example code for independent distribution (i.e. > no explicitly declared dependencies) could then choose between relying on > just the standard library (as now), or on the standard library plus > independently versioned recommended packages. > > > > Cheers, > > Nick. > > > -- > > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > > > > > -- > > --Guido van Rossum (python.org/~guido ) > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Sun Oct 29 03:54:22 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 29 Oct 2017 17:54:22 +1000 Subject: [Python-ideas] Defining an easily installable "Recommended baseline package set" In-Reply-To: References: Message-ID: On 29 October 2017 at 15:16, Guido van Rossum wrote: > Why? What's wrong with pip install? > At a technical level, this would just be a really thin wrapper around 'pip install' (even thinner than ensurepip in general, since these libraries *wouldn't* be bundled for offline installation, only listed by name). > Why complicate things? Your motivation is really weak here. "beneficial"? > "difficult cases"? > The main recurring problems with "pip install" are a lack of discoverability and a potential lack of availability (depending on the environment). This then causes a couple of key undesirable outcomes: - folks using Python as a teaching language have to choose between teaching with just the standard library APIs, requiring that learners restrict themselves to a particular preconfigured learning environment, or make a detour into package management tools in order to ensure learners have access to the APIs they actually want to use (this isn't hypothetical - I was a technical reviewer for a book that justified teaching XML-RPC over HTTPS+JSON on the basis that xmlrpc was in the standard library, and requests wasn't) - folks using Python purely as a scripting language (i.e without app level dependency management) may end up having to restrict themselves to the standard library API, even when there's a well-established frequently preferred alternative for what they're doing (e.g. 
requests for API management, regex for enhanced regular expressions) The underlying problem is that our reasons for omitting these particular libraries from the standard library relate mainly to publisher side concerns like the logistics of ongoing bug fixing and support, *not* end user concerns like software reliability or API usability. This means that if educators aren't teaching them, or redistributors aren't providing them, then they're actively doing their users a disservice (as opposed to other cases like web frameworks and similar, where there are multiple competing options, you're only going to want one of them in any given application, and the relevant trade-offs between the available options depend greatly on exactly what you're doing) Now, the Python-for-data-science community have taken a particular direction around handling this, and there's an additional library set beyond the standard library that's pretty much taken for granted in a data science context. While conda has been the focal point for those efforts more recently, it started a long time ago with initiatives like Python(x, y) and the Enthought Python Distribution. Similarly, initiatives like Raspberry Pi are able to assume a particular learning environment (Raspbian in the Pi's case), rather than coping with arbitrary starting points. Curated lists like the "awesome-python" one that Stephan linked don't really help that much with the discoverability problem, since they become just another thing for people to learn: How do they find out such lists exist in the first place? Given such a list, how do they determine if the recommendations it offers are actually relevant to their needs? Since assessing a published package API against your needs as a user is a skill that has to be learned like any other, it can be a lot easier to get started in a more prescriptive environment that says "This is what you have to work with for now, we'll explain more about your options for branching out later". The proposal in this thread thus stems from asking the question "Who is going to be best positioned to offer authoritative advice on which third party modules may be preferable to their standard library counterparts for end users of Python?" and answering it with "The standard library module maintainers that are already responsible for deciding whether or not to place appropriate See Also links in the module documentation". All the proposal does is to suggest taking those existing recommendations from the documentation and converting them into a more readibly executable form. I'm not particularly wedded to any particular approach to making the recommendations available in a more machine-friendly form, though - it's just the "offer something more machine friendly than scraping the docs for recommendation links" aspect that I'm interested in. For example, we could skip touching ensurepip or venv at all, and instead limit this to a documentation proposal to collect these recommendations from the documentation, and publish them within the `venv` module docs as a "recommended-libraries.txt" file (using pip's requirements.txt format). That would be sufficient to allow straightforward 3rd party automation, without necessarily committing to providing such automation ourselves. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From solipsis at pitrou.net Sun Oct 29 05:51:50 2017 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sun, 29 Oct 2017 10:51:50 +0100 Subject: [Python-ideas] Defining an easily installable "Recommended baseline package set" References: Message-ID: <20171029105150.09e15bc4@fsol> On Sun, 29 Oct 2017 17:54:22 +1000 Nick Coghlan wrote: > > The underlying problem is that our reasons for omitting these particular > libraries from the standard library relate mainly to publisher side > concerns like the logistics of ongoing bug fixing and support, *not* end > user concerns like software reliability or API usability. They're both really. One important consequence of a library being in the stdlib is to tie it to the stdlib's release cycle, QA infrastructure and compatibility requirements -- which more or less solves many dependency and/or version pinning headaches. > This means that > if educators aren't teaching them, or redistributors aren't providing them, > then they're actively doing their users a disservice Which redistributors do not provide the requests library, for example? regex is probably not as popular (mostly because re is good enough for most purposes), but it still appears to be available from Ubuntu and Anaconda. > All the proposal does is to suggest taking those existing recommendations > from the documentation and converting them into a more readibly executable > form. I'm curious what such a list looks like :-) Regards Antoine. From stephanh42 at gmail.com Sun Oct 29 06:40:22 2017 From: stephanh42 at gmail.com (Stephan Houben) Date: Sun, 29 Oct 2017 11:40:22 +0100 Subject: [Python-ideas] Defining an easily installable "Recommended baseline package set" In-Reply-To: References: Message-ID: Perhaps slightly off-topic, but I have sometimes wondered if pip could not be made somewhat friendlier for the absolute newbie and the classroom context. Some concrete proposals. 1. Add a function `pip` to the interactive interpreter (similar to how `help` is available). def pip(args): import sys import subprocess subprocess.check_call([sys.executable, "-m", "pip"] + args.split()) This allows people to install something using pip as long as they have a Python prompt open, and avoids instructors to have to deal with platform-specific instructions for various shells. Also avoids confusion when multiple Python interpreters are available (it operates in the context of the current interpreter.) 2. Add to Pypi package webpages a line like: To install, execute this line in your Python interpreter: pip("install my-package --user") Make this copyable with a button (like Github allows you to copy the repo name). 3. Add the notion of a "channel" to which one can "subscribe". This is actually nothing new, just a Pypi package with no code and just a bunch of dependencies. But it allows me to say: Just enter: pip("install stephans-awesome-stuff --user") in your Python and you get a bunch of useful stuff. Stephan 2017-10-29 8:54 GMT+01:00 Nick Coghlan : > On 29 October 2017 at 15:16, Guido van Rossum wrote: > >> Why? What's wrong with pip install? >> > > At a technical level, this would just be a really thin wrapper around 'pip > install' (even thinner than ensurepip in general, since these libraries > *wouldn't* be bundled for offline installation, only listed by name). > > >> Why complicate things? Your motivation is really weak here. "beneficial"? >> "difficult cases"? 
>> > > The main recurring problems with "pip install" are a lack of > discoverability and a potential lack of availability (depending on the > environment). > > This then causes a couple of key undesirable outcomes: > > - folks using Python as a teaching language have to choose between > teaching with just the standard library APIs, requiring that learners > restrict themselves to a particular preconfigured learning environment, or > make a detour into package management tools in order to ensure learners > have access to the APIs they actually want to use (this isn't hypothetical > - I was a technical reviewer for a book that justified teaching XML-RPC > over HTTPS+JSON on the basis that xmlrpc was in the standard library, and > requests wasn't) > - folks using Python purely as a scripting language (i.e without app level > dependency management) may end up having to restrict themselves to the > standard library API, even when there's a well-established frequently > preferred alternative for what they're doing (e.g. requests for API > management, regex for enhanced regular expressions) > > The underlying problem is that our reasons for omitting these particular > libraries from the standard library relate mainly to publisher side > concerns like the logistics of ongoing bug fixing and support, *not* end > user concerns like software reliability or API usability. This means that > if educators aren't teaching them, or redistributors aren't providing them, > then they're actively doing their users a disservice (as opposed to other > cases like web frameworks and similar, where there are multiple competing > options, you're only going to want one of them in any given application, > and the relevant trade-offs between the available options depend greatly on > exactly what you're doing) > > Now, the Python-for-data-science community have taken a particular > direction around handling this, and there's an additional library set > beyond the standard library that's pretty much taken for granted in a data > science context. While conda has been the focal point for those efforts > more recently, it started a long time ago with initiatives like Python(x, > y) and the Enthought Python Distribution. > > Similarly, initiatives like Raspberry Pi are able to assume a particular > learning environment (Raspbian in the Pi's case), rather than coping with > arbitrary starting points. > > Curated lists like the "awesome-python" one that Stephan linked don't > really help that much with the discoverability problem, since they become > just another thing for people to learn: How do they find out such lists > exist in the first place? Given such a list, how do they determine if the > recommendations it offers are actually relevant to their needs? Since > assessing a published package API against your needs as a user is a skill > that has to be learned like any other, it can be a lot easier to get > started in a more prescriptive environment that says "This is what you have > to work with for now, we'll explain more about your options for branching > out later". > > The proposal in this thread thus stems from asking the question "Who is > going to be best positioned to offer authoritative advice on which third > party modules may be preferable to their standard library counterparts for > end users of Python?" and answering it with "The standard library module > maintainers that are already responsible for deciding whether or not to > place appropriate See Also links in the module documentation". 
> > All the proposal does is to suggest taking those existing recommendations > from the documentation and converting them into a more readibly executable > form. > > I'm not particularly wedded to any particular approach to making the > recommendations available in a more machine-friendly form, though - it's > just the "offer something more machine friendly than scraping the docs for > recommendation links" aspect that I'm interested in. For example, we could > skip touching ensurepip or venv at all, and instead limit this to a > documentation proposal to collect these recommendations from the > documentation, and publish them within the `venv` module docs as a > "recommended-libraries.txt" file (using pip's requirements.txt format). > That would be sufficient to allow straightforward 3rd party automation, > without necessarily committing to providing such automation ourselves. > > Cheers, > Nick. > > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.f.moore at gmail.com Sun Oct 29 07:35:53 2017 From: p.f.moore at gmail.com (Paul Moore) Date: Sun, 29 Oct 2017 11:35:53 +0000 Subject: [Python-ideas] Defining an easily installable "Recommended baseline package set" In-Reply-To: References: Message-ID: On 29 October 2017 at 07:54, Nick Coghlan wrote: > On 29 October 2017 at 15:16, Guido van Rossum wrote: >> >> Why? What's wrong with pip install? > > At a technical level, this would just be a really thin wrapper around 'pip > install' (even thinner than ensurepip in general, since these libraries > *wouldn't* be bundled for offline installation, only listed by name). I agree with Guido, this doesn't seem to add much as stated. (From an earlier message of Nick's) > At the same time, it would be beneficial to have a way to offer an even > stronger recommendation to redistributors that we think full-featured > general purpose Python scripting environments should offer the regex > module as an opt-in alternative to the baseline re feature set, since that > would also help with other currently difficult cases like the requests module. This is key to me. The target area is *scripting*. Library and application developers have their own ways of managing the dependency handling problem, and they generally work fine. The people who don't currently have a good solution are those who just use the system install - i.e., people writing adhoc scripts, people writing code that they want to share with other members of their organisation, via "here's this file, just run it", and not as a full application. For those people "not in a standard install" is a significantly higher barrier to usage than elsewhere. "Here's a Python file, just install Python and double click on the file" is a reasonable request. Running pip may be beyond the capabilities of the recipient. In my view, the key problems in this area are: 1. What users writing one-off scripts can expect to be available. 2. Corporate environments where "adding extra software" is a major hurdle. 3. Offline environments, where PyPI isn't available. The solutions to these issues aren't so much around how we let people know that they should do "pip install" (or "--install-recommended") but rather around initial installs. 
To that end I'd suggest: 1. The python.org installers gain an "install recommended 3rd party packages" option, that is true by default, which does "pip install ". This covers (1) and (2) above, as it makes the recommended package set the norm, and ensures that they are covered by an approval to "install Python". 2. For offline environments, we need to do a little more, but I'd imagine having the installer look for wheels in the same directory as the installer executable, and leave it to the user to put them there. If wheels aren't available the installer should warn but continue. For 3rd party distributions (Linux, Homebrew, Conda, ...) this gives a clear message that Python users are entitled to expect these modules to be available. Handling that expectation is down to those distributions. Things I haven't thought through: 1. System vs user site-packages. If we install (say) requests into the system site-packages, the user could potentially need admin rights to upgrade it. Or get into a situation where they have an older version in site-packages and a newer one in user-site (which I don't believe is a well-tested scenario), 2. Virtual environments. Should venv/virtualenv include the recommended packages by default? Probably not, but that does make the "in a virtual environment" experience different from the system Python experience. 3. The suggestion above for offline environments isn't perfect, by any means. Better solutions would be good here (but I don't think we want to go down the bundling route like we did with pip...?). Paul From p.f.moore at gmail.com Sun Oct 29 07:44:52 2017 From: p.f.moore at gmail.com (Paul Moore) Date: Sun, 29 Oct 2017 11:44:52 +0000 Subject: [Python-ideas] Defining an easily installable "Recommended baseline package set" In-Reply-To: <20171029105150.09e15bc4@fsol> References: <20171029105150.09e15bc4@fsol> Message-ID: On 29 October 2017 at 09:51, Antoine Pitrou wrote: > On Sun, 29 Oct 2017 17:54:22 +1000 > Nick Coghlan wrote: >> This means that >> if educators aren't teaching them, or redistributors aren't providing them, >> then they're actively doing their users a disservice > > Which redistributors do not provide the requests library, for example? > regex is probably not as popular (mostly because re is good enough for > most purposes), but it still appears to be available from Ubuntu and > Anaconda. I know it's not what you meant, but "the python.org installers" is the obvious answer here. On Windows, if you say to someone "install Python", they get the python.org distribution. Explicitly directing them to Anaconda is an option, but that gives them a distinctly different experience than "standard Python plus some best of breed packages like requests" does. >> All the proposal does is to suggest taking those existing recommendations >> from the documentation and converting them into a more readibly executable >> form. > > I'm curious what such a list looks like :-) I am also. I'd put requests on it immediately, but that's the only thing I consider obvious. regex is what triggered this, but I'm not sure it adds *that* much - it's a trade off between people who need the extra features and people confused over why we have two regex libraries available. After that, you're almost immediately into domain-specific answers, and it becomes tricky fast. Paul From fakedme+py at gmail.com Sun Oct 29 07:44:53 2017 From: fakedme+py at gmail.com (Soni L.) 
Date: Sun, 29 Oct 2017 09:44:53 -0200 Subject: [Python-ideas] Composition over Inheritance In-Reply-To: References: <98d527b2-6263-feae-05b4-67cc93bfa360@gmail.com> <20171028115111.GU9068@ando.pearwood.info> <26abc818-bd7d-8932-8b85-d8f879c9a1e0@gmail.com> <20171028165135.GV9068@ando.pearwood.info> <8ece59fa-de24-a0e8-c373-1182d9b7e6b2@gmail.com> <59F53C08.6090703@brenbarn.net> Message-ID: <947ecad8-6594-081a-e15d-a01d8fe67e76@gmail.com> On 2017-10-29 02:28 AM, Nick Coghlan wrote: > On 29 October 2017 at 12:25, Brendan Barnwell > wrote: > > On 2017-10-28 19:13, Soni L. wrote: > > And to have all cars have engines, you'd do:
>
> > class Car:
> >     def __init__(self, ???):
> >         self[Engine] = GasEngine()
> >
> > car = Car()
> > car[Engine].kickstart()  # kickstart gets the car as second argument.
>
> > And if you can't do that, then you can't yet do what I'm > proposing, and > thus the proposal makes sense, even if it still needs some > refining... > > > As near as I can tell you can indeed do that, although > it's still not clear to me why you'd want to. You can give Car a > __getitem__ that on-the-fly generates an Engine object that knows > which Car it is attached to, and then you can make > Engine.kickstart a descriptor that knows which Engine it is > attached to, and from that can figure out which Car it is attached to. > > > Right, I think a few different things are getting confused here > related to how different folks use composition. > > For most data modeling use cases, the composition model you want is > either a tree or an acyclic graph, where the subcomponents don't know > anything about the whole that they're a part of. This gives you good > component isolation, and avoids circular dependencies. > > However, for other cases, you *do* want the child object to be aware > of the parent - XML etrees are a classic example of this, where we > want to allow navigation back up the tree, so each node gains a > reference to its parent node. This often takes the form of a > combination of delegation (parent->child references) and dependency > inversion (child->parent reference). > > For the car/engine example, this relates to explicitly modeling the > relationship whereby a car can have one or more engines (but the > engine may not currently be installed), while an engine can be > installed in at most one car at any given point in time. > > You don't even need the descriptor protocol for that though, you just > need the subcomponent to accept the parent reference as a constructor > parameter:
>
>     class Car:
>         def __init__(self, engine_type):
>             self.engine = engine_type(self)
>
> However, this form of explicit dependency inversion wouldn't work as > well if you want to be able to explicitly create an "uninstalled > engine" instance, and then pass the engine in as a parameter to the > class constructor:
>
>     class Car:
>         def __init__(self, engine):
>             self.engine = engine  # How would we ensure the engine is marked as installed here?
>
> As it turns out, Python doesn't need new syntax for this either, as > it's all already baked into the regular attribute access syntax, > whereby descriptor methods get passed a reference not only to the > descriptor, but *also* to the object being accessed: > https://docs.python.org/3/howto/descriptor.html#descriptor-protocol > > And then the property builtin lets you ignore the existence of the > descriptor object entirely, and only care about the original object, > allowing the above example to be written as:
>
>     class Car:
>         def __init__(self, engine):
>             self.engine = engine  # This implicitly marks the engine as installed
>
>         @property
>         def engine(self):
>             return self._engine
>
>         @engine.setter
>         def engine(self, engine):
>             if engine is not None:
>                 if self._engine is not None:
>                     raise RuntimeError("Car already has an engine installed")
>                 if engine._car is not None:
>                     raise RuntimeError("Engine is already installed in another car")
>                 engine._car = self
>             self._engine = engine
>
> car = Car(GasEngine())
>
> ORMs use this kind of descriptor based composition management > extensively in order to reliably model database foreign key > relationships in a way that's mostly transparent to users of the ORM > classes.

And this is how you miss the whole point of being able to dynamically add/remove arbitrary components on objects you didn't create, at runtime.

Someone gave me this code and told me it explains what I'm trying to do: https://repl.it/NYCF/3

class T:
    pass

class C:
    pass

c = C()

#c.[T] = 1
c.__dict__[T] = 1

I'd also like to add:

def someone_elses_lib_function(arbitrary_object):
    #arbitrary_object.[T] = object()
    arbitrary_object.__dict__[T] = object()

and

def another_ones_lib_function(arbitrary_object):
    #if arbitrary_object.[T]:
    if arbitrary_object.__dict__[T]:
        #arbitrary_object.[T].thing()
        arbitrary_object.__dict__[T].thing(arbitrary_object)

> Cheers,
> Nick.
>
> --
> Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
-------------- next part -------------- An HTML attachment was scrubbed... URL: From p.f.moore at gmail.com Sun Oct 29 07:50:26 2017 From: p.f.moore at gmail.com (Paul Moore) Date: Sun, 29 Oct 2017 11:50:26 +0000 Subject: [Python-ideas] Defining an easily installable "Recommended baseline package set" In-Reply-To: References: Message-ID: On 29 October 2017 at 10:40, Stephan Houben wrote: > Perhaps slightly off-topic, but I have sometimes wondered if > pip could not be made somewhat friendlier for the absolute newbie > and the classroom context. > > Some concrete proposals. > > 1. Add a function `pip` to the interactive interpreter > (similar to how `help` is available).
>
> def pip(args):
>     import sys
>     import subprocess
>     subprocess.check_call([sys.executable, "-m", "pip"] + args.split())
>
> This allows people to install something using pip as long as they have a > Python prompt open, and avoids instructors to have to deal with > platform-specific > instructions for various shells. Also avoids confusion when multiple > Python interpreters > are available (it operates in the context of the current interpreter.) There are subtle issues around whether newly installed/upgraded packages are visible in a running Python interpreter. It's possible that this would result in *more* confusion than the current situation. I can see the appeal of something like this, but it's not as simple as it looks. If you want to discuss this further, I'd definitely suggest making it a thread of its own. Personally, as a pip maintainer, I'm -0 on this (possibly even -1).
Paul From k7hoven at gmail.com Sun Oct 29 08:57:59 2017 From: k7hoven at gmail.com (Koos Zevenhoven) Date: Sun, 29 Oct 2017 14:57:59 +0200 Subject: [Python-ideas] Composition over Inheritance In-Reply-To: <59F56B61.2020308@canterbury.ac.nz> References: <98d527b2-6263-feae-05b4-67cc93bfa360@gmail.com> <20171028115111.GU9068@ando.pearwood.info> <26abc818-bd7d-8932-8b85-d8f879c9a1e0@gmail.com> <20171028165135.GV9068@ando.pearwood.info> <59F56B61.2020308@canterbury.ac.nz> Message-ID: (looks like you forgot to reply to the list) On Sun, Oct 29, 2017 at 7:47 AM, Greg Ewing wrote: > Koos Zevenhoven wrote: > >> It looks like you can just as easily drive the car without having the key >> > > That's been found to be a problem in real life, too. > More than one Python-programming car thief has been > caught with this function in their personal library: > > def hotwire(car): > car.engine.start() ?Yes, but my point was still about what the public API is. Starting the car with the key should be easier than starting it without the key. Surely you should be able to start the engine without the key if you look under the hood and figure it out. And maybe you'd need a tool for that to be kept in your garage for debugging, but not as a button attached to the outside of the car or on the dashboard :-). The problem becomes even worse if you instead make Car inherit from its parts (including Engine), because the relationship of a car and and engine is not well described by inheritance. A car is not a kind of engine, nor should it pretend to be an implementation of an engine. A car *comprises* (or contains) an engine. And who knows, maybe it has two engines! -- Koos -- + Koos Zevenhoven + http://twitter.com/k7hoven + -------------- next part -------------- An HTML attachment was scrubbed... URL: From wes.turner at gmail.com Sun Oct 29 09:06:42 2017 From: wes.turner at gmail.com (Wes Turner) Date: Sun, 29 Oct 2017 09:06:42 -0400 Subject: [Python-ideas] Defining an easily installable "Recommended baseline package set" In-Reply-To: References: Message-ID: On Sunday, October 29, 2017, Paul Moore wrote: > On 29 October 2017 at 10:40, Stephan Houben > wrote: > > Perhaps slightly off-topic, but I have sometimes wondered if > > pip could not be made somewhat friendlier for the absolute newbie > > and the classroom context. > > > > Some concrete proposals. > > > > 1. Add a function `pip` to the interactive interpreter > > (similar to how `help` is available). > > > > def pip(args): > > import sys > > import subprocess > > subprocess.check_call([sys.executable, "-m", "pip"] + > args.split()) > > > > This allows people to install something using pip as long as they > have a > > Python prompt open, and avoids instructors to have to deal with > > platform-specific > > instructions for various shells. Also avoids confusion when multiple > > Python interpreters > > are available (it operates in the context of the current interpreter.) > > There are subtle issues around whether newly installed/upgraded > packages are visible in a running Python interpreter. If the package is already imported, reload() isn't sufficient; but deepreload from IPython may be. IPython also supports shell commands prefixed with '!': ! pip install -U requests regex IPython ! pip install -U --user psfblessedset1 %run pip install -U --user psfblessedset1 %run? % autoreload? # http://ipython.readthedocs.io/en/stable/config/extensions/autoreload.html This: > It's possible > that this would result in *more* confusion than the current situation. 
Why isn't it upgrading over top of the preinstalled set? At least with bash, the shell command history is logged to .bash_history by default. IMO, it's easier to assume that environment-modifying commands are logged in the shell log; and that running a ``%logstart -o logged-cmds.py`` won't change the versions of packages it builds upon. Is it bad form to put ``! pip install -U requests `` at the top of every jupyter notebook? ``! pip install requests==version `` is definitely more reproducible. (seeAlso: binder, jupyter/docker-stacks) I'll just include these here unnecessarily from the other curated package set discussions: https://github.com/jupyter/docker-stacks https://github.com/Kaggle/docker-python/blob/master/Dockerfile https://binderhub.readthedocs.io/en/latest/ I can see the appeal of something like this, but it's not as simple as > it looks. If you want to discuss this further, I'd definitely suggest > making it a thread of its own. > Personally, as a pip maintainer, I'm -0 > on this (possibly even -1). Same. -------------- next part -------------- An HTML attachment was scrubbed... URL: From k7hoven at gmail.com Sun Oct 29 11:15:06 2017 From: k7hoven at gmail.com (Koos Zevenhoven) Date: Sun, 29 Oct 2017 17:15:06 +0200 Subject: [Python-ideas] Defining an easily installable "Recommended baseline package set" In-Reply-To: References: <20171029105150.09e15bc4@fsol> Message-ID: On Sun, Oct 29, 2017 at 1:44 PM, Paul Moore wrote: > On 29 October 2017 at 09:51, Antoine Pitrou wrote: > > On Sun, 29 Oct 2017 17:54:22 +1000 > > Nick Coghlan wrote: > >> All the proposal does is to suggest taking those existing > recommendations > >> from the documentation and converting them into a more readibly > executable > >> form. > > > > I'm curious what such a list looks like :-) > > I am also. I'd put requests on it immediately, but that's the only > thing I consider obvious. regex is what triggered this, but I'm not > sure it adds *that* much - it's a trade off between people who need > the extra features and people confused over why we have two regex > libraries available. After that, you're almost immediately into > domain-specific answers, and it becomes tricky fast. > If the list would otherwise not be long enough, maybe it will contain something that is now in the stdlib ;-) -- Koos -- + Koos Zevenhoven + http://twitter.com/k7hoven + -------------- next part -------------- An HTML attachment was scrubbed... URL: From brenbarn at brenbarn.net Sun Oct 29 12:57:53 2017 From: brenbarn at brenbarn.net (Brendan Barnwell) Date: Sun, 29 Oct 2017 09:57:53 -0700 Subject: [Python-ideas] Composition over Inheritance In-Reply-To: <947ecad8-6594-081a-e15d-a01d8fe67e76@gmail.com> References: <98d527b2-6263-feae-05b4-67cc93bfa360@gmail.com> <20171028115111.GU9068@ando.pearwood.info> <26abc818-bd7d-8932-8b85-d8f879c9a1e0@gmail.com> <20171028165135.GV9068@ando.pearwood.info> <8ece59fa-de24-a0e8-c373-1182d9b7e6b2@gmail.com> <59F53C08.6090703@brenbarn.net> <947ecad8-6594-081a-e15d-a01d8fe67e76@gmail.com> Message-ID: <59F60891.8020704@brenbarn.net> On 2017-10-29 04:44, Soni L. wrote: > And this is how you miss the whole point of being able to dynamically > add/remove arbitrary components on objects you didn't create, at runtime. > > Someone gave me this code and told me it explains what I'm trying to do: > https://repl.it/NYCF/3
>
> class T:
>     pass
>
> class C:
>     pass
>
> c = C()
>
> #c.[T] = 1
> c.__dict__[T] = 1

Again, can you please explain why you want to write c.[T]? What do you intend that to *do*?
Your commented line seems to indicate you want it to do what `c.__dict__[T]` does, but you can already do that with `setattr(c, T, 1)`. Or you can just give c an attribute that's a dict, but has an easier-to-type name than __dict__, so you can do `c.mydict[T]`. What is the specific advantage of `c.[T]` over these existing solutions? -- Brendan Barnwell "Do not follow where the path may lead. Go, instead, where there is no path, and leave a trail." --author unknown From brenbarn at brenbarn.net Sun Oct 29 13:10:21 2017 From: brenbarn at brenbarn.net (Brendan Barnwell) Date: Sun, 29 Oct 2017 10:10:21 -0700 Subject: [Python-ideas] Defining an easily installable "Recommended baseline package set" In-Reply-To: References: Message-ID: <59F60B7D.1070200@brenbarn.net> On 2017-10-29 00:54, Nick Coghlan wrote: > The proposal in this thread thus stems from asking the question "Who is > going to be best positioned to offer authoritative advice on which third > party modules may be preferable to their standard library counterparts > for end users of Python?" and answering it with "The standard library > module maintainers that are already responsible for deciding whether or > not to place appropriate See Also links in the module documentation". > > All the proposal does is to suggest taking those existing > recommendations from the documentation and converting them into a more > readibly executable form. So it sounds like you're just saying that there should be one more "awesome-python" style list, except it should be official, with the seal of approval of Python itself. The details of exactly how that would displayed to users are not so important as that it be easily accessible from python.org and clearly shown to be endorsed by the Python developers. I think that is a good idea. The current situation is quite confusing in many cases. You see a lot of questions on StackOverflow where someone is tearing their hair out trying to use urllib to do heavy lifting and the answer is "use requests instead". Likewise people trying to implement matrix multiplication who should probably be using numpy, As for regex, to be honest, I have yet to find any way in which it is not completely superior to the existing re library, and I find it somewhat frustrating that the "publisher-side" concerns you mention continue to hold it back from replacing re. The only problem I see with this sort of "Python seal of approval" library list is that it carries some of the same risks as incorporating such libraries into the stdlib. Not all, but some. For one thing, if people go to python.org and see a thing that says "You may also want to use these libraries that bear the Python Seal of Approval. . .", and then they download them, and something goes wrong (e.g., there is some kind of version conflict with another package), they will likely blame python.org (at least partially). Only now, since the libraries aren't in the stdlib, the Python devs won't really be able to do anything to fix that; all they could do is remove the offending package from the approved list. In practice I think this is unlikely to happen, since the idea would be that Python devs would be judicious in awarding the seal of approval only to projects that are robust and not prone to breakage. 
But it does change the nature of the approval from "we approve this *code* and we will fix it if it breaks" (which is how the existing stdlib works) to "we approve these *people* (the people working on requests or regex or whatever) and we will cease to do if they break their code". -- Brendan Barnwell "Do not follow where the path may lead. Go, instead, where there is no path, and leave a trail." --author unknown From stefan at bytereef.org Sun Oct 29 13:28:28 2017 From: stefan at bytereef.org (Stefan Krah) Date: Sun, 29 Oct 2017 18:28:28 +0100 Subject: [Python-ideas] Defining an easily installable "Recommended baseline package set" In-Reply-To: <59F60B7D.1070200@brenbarn.net> References: <59F60B7D.1070200@brenbarn.net> Message-ID: <20171029172828.GA19595@bytereef.org> On Sun, Oct 29, 2017 at 10:10:21AM -0700, Brendan Barnwell wrote: > Only now, since the libraries aren't in the stdlib, the Python devs > won't really be able to do anything to fix that; all they could do > is remove the offending package from the approved list. In practice > I think this is unlikely to happen, since the idea would be that > Python devs would be judicious in awarding the seal of approval only > to projects that are robust and not prone to breakage. But it does > change the nature of the approval from "we approve this *code* and > we will fix it if it breaks" (which is how the existing stdlib > works) to "we approve these *people* (the people working on requests > or regex or whatever) and we will cease to do if they break their > code". Let's be realistic: If MRAB were to include regex in the stdlib, *he* would be the one fixing things anyway. And he'd get to fight feature requests and stylistic rewrites. :-) Stefan Krah From fakedme+py at gmail.com Sun Oct 29 14:25:12 2017 From: fakedme+py at gmail.com (Soni L.) Date: Sun, 29 Oct 2017 16:25:12 -0200 Subject: [Python-ideas] Composition over Inheritance In-Reply-To: <59F60891.8020704@brenbarn.net> References: <98d527b2-6263-feae-05b4-67cc93bfa360@gmail.com> <20171028115111.GU9068@ando.pearwood.info> <26abc818-bd7d-8932-8b85-d8f879c9a1e0@gmail.com> <20171028165135.GV9068@ando.pearwood.info> <8ece59fa-de24-a0e8-c373-1182d9b7e6b2@gmail.com> <59F53C08.6090703@brenbarn.net> <947ecad8-6594-081a-e15d-a01d8fe67e76@gmail.com> <59F60891.8020704@brenbarn.net> Message-ID: <7b5a08ad-a690-5e7f-689f-11deb36ffc76@gmail.com> On 2017-10-29 02:57 PM, Brendan Barnwell wrote: > On 2017-10-29 04:44, Soni L. wrote: >> And this is how you miss the whole point of being able to dynamically >> add/remove arbitrary components on objects you didn't create, at >> runtime. >> >> Someone gave me this code and told me it explains what I'm trying to do: >> https://repl.it/NYCF/3 >> >> class T: >> ???? pass >> >> class C: >> ???? pass >> >> c = C() >> >> #c.[T] = 1 >> c.__dict__[T] = 1 > > ????Again, can you please explain why you want to write c.[T]? What do > you intend that to *do*?? Your commented line seems to indicate you > want it to do what `c.__dict__[T]` does, but you can already do that > with `setattr(c, T, 1)`.? Or you can just give c an attribute that's a > dict, but has an easier-to-type name than __dict__, so you can do > `c.mydict[T]`.? What is the specific advantage of `c.[T]` over these > existing solutions? > Hmm... Why can't we just allow empty identifiers, and set a default handler for empty identifiers that implements the proposed ECS? 
But the basic idea is to indicate something at the call site, namely that T is a contract and the object returned should respect that contract and any function calls should pass the original object as an argument. (I personally don't like how Python treats o.m() (has self) the same as o.f() (no self) syntax-wise, either.) From guido at python.org Sun Oct 29 14:56:51 2017 From: guido at python.org (Guido van Rossum) Date: Sun, 29 Oct 2017 11:56:51 -0700 Subject: [Python-ideas] Defining an easily installable "Recommended baseline package set" In-Reply-To: References: Message-ID: The two use cases you describe (scripters and teachers) leave me luke-warm -- scripters live in the wild west and can just pip install whatever (that's what it means to be scripting) and teachers tend to want a customized bundle anyway -- let the edu world get together and create their own recommended bundle. As long as it's not going to be bundled, i.e. there's just going to be some list of packages that we recommend to 3rd party repackagers, then I'm fine with it. But they must remain clearly marked as 3rd party packages in whatever docs we provide, and live in site-packages. I would really like to see what you'd add to the list besides requests -- I really don't see why the teaching use case would need the regex module (unless it's a class in regular expressions). --Guido On Sun, Oct 29, 2017 at 12:54 AM, Nick Coghlan wrote: > On 29 October 2017 at 15:16, Guido van Rossum wrote: > >> Why? What's wrong with pip install? >> > > At a technical level, this would just be a really thin wrapper around 'pip > install' (even thinner than ensurepip in general, since these libraries > *wouldn't* be bundled for offline installation, only listed by name). > > >> Why complicate things? Your motivation is really weak here. "beneficial"? >> "difficult cases"? >> > > The main recurring problems with "pip install" are a lack of > discoverability and a potential lack of availability (depending on the > environment). > > This then causes a couple of key undesirable outcomes: > > - folks using Python as a teaching language have to choose between > teaching with just the standard library APIs, requiring that learners > restrict themselves to a particular preconfigured learning environment, or > make a detour into package management tools in order to ensure learners > have access to the APIs they actually want to use (this isn't hypothetical > - I was a technical reviewer for a book that justified teaching XML-RPC > over HTTPS+JSON on the basis that xmlrpc was in the standard library, and > requests wasn't) > - folks using Python purely as a scripting language (i.e without app level > dependency management) may end up having to restrict themselves to the > standard library API, even when there's a well-established frequently > preferred alternative for what they're doing (e.g. requests for API > management, regex for enhanced regular expressions) > > The underlying problem is that our reasons for omitting these particular > libraries from the standard library relate mainly to publisher side > concerns like the logistics of ongoing bug fixing and support, *not* end > user concerns like software reliability or API usability. 
This means that > if educators aren't teaching them, or redistributors aren't providing them, > then they're actively doing their users a disservice (as opposed to other > cases like web frameworks and similar, where there are multiple competing > options, you're only going to want one of them in any given application, > and the relevant trade-offs between the available options depend greatly on > exactly what you're doing) > > Now, the Python-for-data-science community have taken a particular > direction around handling this, and there's an additional library set > beyond the standard library that's pretty much taken for granted in a data > science context. While conda has been the focal point for those efforts > more recently, it started a long time ago with initiatives like Python(x, > y) and the Enthought Python Distribution. > > Similarly, initiatives like Raspberry Pi are able to assume a particular > learning environment (Raspbian in the Pi's case), rather than coping with > arbitrary starting points. > > Curated lists like the "awesome-python" one that Stephan linked don't > really help that much with the discoverability problem, since they become > just another thing for people to learn: How do they find out such lists > exist in the first place? Given such a list, how do they determine if the > recommendations it offers are actually relevant to their needs? Since > assessing a published package API against your needs as a user is a skill > that has to be learned like any other, it can be a lot easier to get > started in a more prescriptive environment that says "This is what you have > to work with for now, we'll explain more about your options for branching > out later". > > The proposal in this thread thus stems from asking the question "Who is > going to be best positioned to offer authoritative advice on which third > party modules may be preferable to their standard library counterparts for > end users of Python?" and answering it with "The standard library module > maintainers that are already responsible for deciding whether or not to > place appropriate See Also links in the module documentation". > > All the proposal does is to suggest taking those existing recommendations > from the documentation and converting them into a more readibly executable > form. > > I'm not particularly wedded to any particular approach to making the > recommendations available in a more machine-friendly form, though - it's > just the "offer something more machine friendly than scraping the docs for > recommendation links" aspect that I'm interested in. For example, we could > skip touching ensurepip or venv at all, and instead limit this to a > documentation proposal to collect these recommendations from the > documentation, and publish them within the `venv` module docs as a > "recommended-libraries.txt" file (using pip's requirements.txt format). > That would be sufficient to allow straightforward 3rd party automation, > without necessarily committing to providing such automation ourselves. > > Cheers, > Nick. > > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From solipsis at pitrou.net Sun Oct 29 15:05:10 2017 From: solipsis at pitrou.net (Antoine Pitrou) Date: Sun, 29 Oct 2017 20:05:10 +0100 Subject: [Python-ideas] Defining an easily installable "Recommended baseline package set" References: <20171029105150.09e15bc4@fsol> Message-ID: <20171029200510.1ff3cfaa@fsol> On Sun, 29 Oct 2017 11:44:52 +0000 Paul Moore wrote: > On 29 October 2017 at 09:51, Antoine Pitrou wrote: > > On Sun, 29 Oct 2017 17:54:22 +1000 > > Nick Coghlan wrote: > >> This means that > >> if educators aren't teaching them, or redistributors aren't providing them, > >> then they're actively doing their users a disservice > > > > Which redistributors do not provide the requests library, for example? > > regex is probably not as popular (mostly because re is good enough for > > most purposes), but it still appears to be available from Ubuntu and > > Anaconda. > > I know it's not what you meant, but "the python.org installers" is the > obvious answer here. Well, I'm not sure what Nick meant by the word exactly, but "to redistribute" means "to distribute again", so the word looked aimed at third-party vendors of Python rather than python.org :-) Regards Antoine. From stephanh42 at gmail.com Sun Oct 29 15:19:04 2017 From: stephanh42 at gmail.com (Stephan Houben) Date: Sun, 29 Oct 2017 20:19:04 +0100 Subject: [Python-ideas] install pip packages from Python prompt Message-ID: Hi all, Here is in somewhat more detail my earlier proposal for having in the interactive Python interpreter a `pip` function to install packages from Pypi. Motivation: it appears to me that there is a category of newbies for which "open a shell and do `pip whatever`" is a bit too much. It would, in my opinion, simplify things a bit if they could just copy-and-paste some text into the Python interpreter and have some packages from pip installed. That would simplify instructions on how to install package xyz, without going into the vagaries of how to open a shell on various platforms, and how to get to the right pip executable. I think this could be as simple as: def pip(args): import sys import subprocess subprocess.check_call([sys.executable, "-m", "pip"] + args.split()) print("Please re-start Python now to use installed or upgraded packages.") Note that I added the final message about restarting the interpreter as a low-tech solution to the problem of packages being already imported in the current Python session. I would imagine that the author of package xyz would then put on their webpage something like: To use, enter in your Python interpreter: pip("install xyz --user") As another example, consider prof. Baldwin from Woolamaloo university who teaches a course "Introductory Python programming for Sheep Shavers". In his course material, he instructs his students to execute the following line in their Python interpreter. pip("install woolamaloo-sheepshavers-goodies --user") which will install a package which will in turn, as dependencies, pull in a number of packages which are relevant for sheep shaving but which have nevertheless irresponsibly been left outside the stdlib. Stephan -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From tritium-list at sdamon.com Sun Oct 29 15:26:06 2017 From: tritium-list at sdamon.com (Alex Walters) Date: Sun, 29 Oct 2017 15:26:06 -0400 Subject: [Python-ideas] install pip packages from Python prompt In-Reply-To: References: Message-ID: <01f701d350eb$c4ee6210$4ecb2630$@sdamon.com> I have a somewhat better, imo, implementation of a pip object to be loaded into the repl. class pip: def __call__(self, *a, **kw): sys.stderr.write(str(self)) def __repr__(self): return str(self) def __str__(self): return ?Please run pip from your system command prompt? From: Python-ideas [mailto:python-ideas-bounces+tritium-list=sdamon.com at python.org] On Behalf Of Stephan Houben Sent: Sunday, October 29, 2017 3:19 PM To: Python-Ideas Subject: [Python-ideas] install pip packages from Python prompt Hi all, Here is in somewhat more detail my earlier proposal for having in the interactive Python interpreter a `pip` function to install packages from Pypi. Motivation: it appears to me that there is a category of newbies for which "open a shell and do `pip whatever`" is a bit too much. It would, in my opinion, simplify things a bit if they could just copy-and-paste some text into the Python interpreter and have some packages from pip installed. That would simplify instructions on how to install package xyz, without going into the vagaries of how to open a shell on various platforms, and how to get to the right pip executable. I think this could be as simple as: def pip(args): import sys import subprocess subprocess.check_call([sys.executable, "-m", "pip"] + args.split()) print("Please re-start Python now to use installed or upgraded packages.") Note that I added the final message about restarting the interpreter as a low-tech solution to the problem of packages being already imported in the current Python session. I would imagine that the author of package xyz would then put on their webpage something like: To use, enter in your Python interpreter: pip("install xyz --user") As another example, consider prof. Baldwin from Woolamaloo university who teaches a course "Introductory Python programming for Sheep Shavers". In his course material, he instructs his students to execute the following line in their Python interpreter. pip("install woolamaloo-sheepshavers-goodies --user") which will install a package which will in turn, as dependencies, pull in a number of packages which are relevant for sheep shaving but which have nevertheless irresponsibly been left outside the stdlib. Stephan -------------- next part -------------- An HTML attachment was scrubbed... URL: From antoine.rozo at gmail.com Sun Oct 29 15:31:33 2017 From: antoine.rozo at gmail.com (Antoine Rozo) Date: Sun, 29 Oct 2017 20:31:33 +0100 Subject: [Python-ideas] install pip packages from Python prompt In-Reply-To: <01f701d350eb$c4ee6210$4ecb2630$@sdamon.com> References: <01f701d350eb$c4ee6210$4ecb2630$@sdamon.com> Message-ID: Hi, What would be the difference with current pip module? pip.main(['install', 'some_package']) 2017-10-29 20:26 GMT+01:00 Alex Walters : > I have a somewhat better, imo, implementation of a pip object to be loaded > into the repl. > > > > class pip: > > def __call__(self, *a, **kw): > > sys.stderr.write(str(self)) > > > > def __repr__(self): > > return str(self) > > > > def __str__(self): > > return ?Please run pip from your system command prompt? 
> > > > > > > > *From:* Python-ideas [mailto:python-ideas-bounces+tritium-list=sdamon.com@ > python.org] *On Behalf Of *Stephan Houben > *Sent:* Sunday, October 29, 2017 3:19 PM > *To:* Python-Ideas > *Subject:* [Python-ideas] install pip packages from Python prompt > > > > Hi all, > > Here is in somewhat more detail my earlier proposal for > > having in the interactive Python interpreter a `pip` function to > > install packages from Pypi. > > Motivation: it appears to me that there is a category of newbies > > for which "open a shell and do `pip whatever`" is a bit too much. > > It would, in my opinion, simplify things a bit if they could just > > copy-and-paste some text into the Python interpreter and have > > some packages from pip installed. > > That would simplify instructions on how to install package xyz, > > without going into the vagaries of how to open a shell on various > > platforms, and how to get to the right pip executable. > > I think this could be as simple as: > > def pip(args): > import sys > import subprocess > subprocess.check_call([sys.executable, "-m", "pip"] + args.split()) > > print("Please re-start Python now to use installed or upgraded > packages.") > > Note that I added the final message about restarting the interpreter > > as a low-tech solution to the problem of packages being already > > imported in the current Python session. > > I would imagine that the author of package xyz would then put on > > their webpage something like: > > To use, enter in your Python interpreter: > > pip("install xyz --user") > > As another example, consider prof. Baldwin from Woolamaloo university > > who teaches a course "Introductory Python programming for Sheep Shavers". > > In his course material, he instructs his students to execute the > > following line in their Python interpreter. > > pip("install woolamaloo-sheepshavers-goodies --user") > > which will install a package which will in turn, as dependencies, > > pull in a number of packages which are relevant for sheep shaving but > > which have nevertheless irresponsibly been left outside the stdlib. > > Stephan > > > > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > > -- Antoine Rozo -------------- next part -------------- An HTML attachment was scrubbed... URL: From stephanh42 at gmail.com Sun Oct 29 15:40:39 2017 From: stephanh42 at gmail.com (Stephan Houben) Date: Sun, 29 Oct 2017 20:40:39 +0100 Subject: [Python-ideas] install pip packages from Python prompt In-Reply-To: References: <01f701d350eb$c4ee6210$4ecb2630$@sdamon.com> Message-ID: Hi Antoine, 2017-10-29 20:31 GMT+01:00 Antoine Rozo : > Hi, > > What would be the difference with current pip module? > pip.main(['install', 'some_package']) > My understanding is that direct use of the `pip` module is explicitly not recommended. Stephan > > 2017-10-29 20:26 GMT+01:00 Alex Walters : > >> I have a somewhat better, imo, implementation of a pip object to be >> loaded into the repl. >> >> >> >> class pip: >> >> def __call__(self, *a, **kw): >> >> sys.stderr.write(str(self)) >> >> >> >> def __repr__(self): >> >> return str(self) >> >> >> >> def __str__(self): >> >> return ?Please run pip from your system command prompt? 
>> >> >> >> >> >> >> >> *From:* Python-ideas [mailto:python-ideas-bounces+tritium-list= >> sdamon.com at python.org] *On Behalf Of *Stephan Houben >> *Sent:* Sunday, October 29, 2017 3:19 PM >> *To:* Python-Ideas >> *Subject:* [Python-ideas] install pip packages from Python prompt >> >> >> >> Hi all, >> >> Here is in somewhat more detail my earlier proposal for >> >> having in the interactive Python interpreter a `pip` function to >> >> install packages from Pypi. >> >> Motivation: it appears to me that there is a category of newbies >> >> for which "open a shell and do `pip whatever`" is a bit too much. >> >> It would, in my opinion, simplify things a bit if they could just >> >> copy-and-paste some text into the Python interpreter and have >> >> some packages from pip installed. >> >> That would simplify instructions on how to install package xyz, >> >> without going into the vagaries of how to open a shell on various >> >> platforms, and how to get to the right pip executable. >> >> I think this could be as simple as: >> >> def pip(args): >> import sys >> import subprocess >> subprocess.check_call([sys.executable, "-m", "pip"] + args.split()) >> >> print("Please re-start Python now to use installed or upgraded >> packages.") >> >> Note that I added the final message about restarting the interpreter >> >> as a low-tech solution to the problem of packages being already >> >> imported in the current Python session. >> >> I would imagine that the author of package xyz would then put on >> >> their webpage something like: >> >> To use, enter in your Python interpreter: >> >> pip("install xyz --user") >> >> As another example, consider prof. Baldwin from Woolamaloo university >> >> who teaches a course "Introductory Python programming for Sheep Shavers". >> >> In his course material, he instructs his students to execute the >> >> following line in their Python interpreter. >> >> pip("install woolamaloo-sheepshavers-goodies --user") >> >> which will install a package which will in turn, as dependencies, >> >> pull in a number of packages which are relevant for sheep shaving but >> >> which have nevertheless irresponsibly been left outside the stdlib. >> >> Stephan >> >> >> >> >> >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ >> >> > > > -- > Antoine Rozo > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From tritium-list at sdamon.com Sun Oct 29 15:42:06 2017 From: tritium-list at sdamon.com (Alex Walters) Date: Sun, 29 Oct 2017 15:42:06 -0400 Subject: [Python-ideas] install pip packages from Python prompt In-Reply-To: References: <01f701d350eb$c4ee6210$4ecb2630$@sdamon.com> Message-ID: <020f01d350ee$010faf40$032f0dc0$@sdamon.com> If you are calling pip.main, you know what you are doing, and you know when you have to restart the interpreter or not. An object that displays an error any time you try to use it, instructing a user of the proper way to use pip outside of the interpreter, is intended to catch the newbie mistake of trying to run pip from inside the repl, without encouraging potentially problematic behavior related to calling pip from inside the interpreter. 
Don?t encourage the wrong way because it?s a common newbie mistake, give them better documentation and error messages. Pip is a tool to be run outside of the repl, let?s keep it that way. My method avoids the entire problem by not attempting to run pip, while still providing a meaningful and useful error message to new users. If there is to be a solution other than SyntaxError or NameError (my solution only solves NameError), it should be to tell users how to use pip properly, not support improper use. From: Python-ideas [mailto:python-ideas-bounces+tritium-list=sdamon.com at python.org] On Behalf Of Antoine Rozo Sent: Sunday, October 29, 2017 3:32 PM To: Python-Ideas Subject: Re: [Python-ideas] install pip packages from Python prompt Hi, What would be the difference with current pip module? pip.main(['install', 'some_package']) 2017-10-29 20:26 GMT+01:00 Alex Walters >: I have a somewhat better, imo, implementation of a pip object to be loaded into the repl. class pip: def __call__(self, *a, **kw): sys.stderr.write(str(self)) def __repr__(self): return str(self) def __str__(self): return ?Please run pip from your system command prompt? From: Python-ideas [mailto:python-ideas-bounces+tritium-list =sdamon.com at python.org ] On Behalf Of Stephan Houben Sent: Sunday, October 29, 2017 3:19 PM To: Python-Ideas > Subject: [Python-ideas] install pip packages from Python prompt Hi all, Here is in somewhat more detail my earlier proposal for having in the interactive Python interpreter a `pip` function to install packages from Pypi. Motivation: it appears to me that there is a category of newbies for which "open a shell and do `pip whatever`" is a bit too much. It would, in my opinion, simplify things a bit if they could just copy-and-paste some text into the Python interpreter and have some packages from pip installed. That would simplify instructions on how to install package xyz, without going into the vagaries of how to open a shell on various platforms, and how to get to the right pip executable. I think this could be as simple as: def pip(args): import sys import subprocess subprocess.check_call([sys.executable, "-m", "pip"] + args.split()) print("Please re-start Python now to use installed or upgraded packages.") Note that I added the final message about restarting the interpreter as a low-tech solution to the problem of packages being already imported in the current Python session. I would imagine that the author of package xyz would then put on their webpage something like: To use, enter in your Python interpreter: pip("install xyz --user") As another example, consider prof. Baldwin from Woolamaloo university who teaches a course "Introductory Python programming for Sheep Shavers". In his course material, he instructs his students to execute the following line in their Python interpreter. pip("install woolamaloo-sheepshavers-goodies --user") which will install a package which will in turn, as dependencies, pull in a number of packages which are relevant for sheep shaving but which have nevertheless irresponsibly been left outside the stdlib. Stephan _______________________________________________ Python-ideas mailing list Python-ideas at python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/ -- Antoine Rozo -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From stephanh42 at gmail.com Sun Oct 29 15:42:44 2017 From: stephanh42 at gmail.com (Stephan Houben) Date: Sun, 29 Oct 2017 20:42:44 +0100 Subject: [Python-ideas] install pip packages from Python prompt In-Reply-To: <01f701d350eb$c4ee6210$4ecb2630$@sdamon.com> References: <01f701d350eb$c4ee6210$4ecb2630$@sdamon.com> Message-ID: Hi Alex, 2017-10-29 20:26 GMT+01:00 Alex Walters : > return ?Please run pip from your system command prompt? > > > The target audience for my proposal are people who do not know which part of the sheep the "system command prompt" is. Stephan > > > > > *From:* Python-ideas [mailto:python-ideas-bounces+tritium-list=sdamon.com@ > python.org] *On Behalf Of *Stephan Houben > *Sent:* Sunday, October 29, 2017 3:19 PM > *To:* Python-Ideas > *Subject:* [Python-ideas] install pip packages from Python prompt > > > > Hi all, > > Here is in somewhat more detail my earlier proposal for > > having in the interactive Python interpreter a `pip` function to > > install packages from Pypi. > > Motivation: it appears to me that there is a category of newbies > > for which "open a shell and do `pip whatever`" is a bit too much. > > It would, in my opinion, simplify things a bit if they could just > > copy-and-paste some text into the Python interpreter and have > > some packages from pip installed. > > That would simplify instructions on how to install package xyz, > > without going into the vagaries of how to open a shell on various > > platforms, and how to get to the right pip executable. > > I think this could be as simple as: > > def pip(args): > import sys > import subprocess > subprocess.check_call([sys.executable, "-m", "pip"] + args.split()) > > print("Please re-start Python now to use installed or upgraded > packages.") > > Note that I added the final message about restarting the interpreter > > as a low-tech solution to the problem of packages being already > > imported in the current Python session. > > I would imagine that the author of package xyz would then put on > > their webpage something like: > > To use, enter in your Python interpreter: > > pip("install xyz --user") > > As another example, consider prof. Baldwin from Woolamaloo university > > who teaches a course "Introductory Python programming for Sheep Shavers". > > In his course material, he instructs his students to execute the > > following line in their Python interpreter. > > pip("install woolamaloo-sheepshavers-goodies --user") > > which will install a package which will in turn, as dependencies, > > pull in a number of packages which are relevant for sheep shaving but > > which have nevertheless irresponsibly been left outside the stdlib. > > Stephan > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From tritium-list at sdamon.com Sun Oct 29 15:45:54 2017 From: tritium-list at sdamon.com (Alex Walters) Date: Sun, 29 Oct 2017 15:45:54 -0400 Subject: [Python-ideas] install pip packages from Python prompt In-Reply-To: References: <01f701d350eb$c4ee6210$4ecb2630$@sdamon.com> Message-ID: <021601d350ee$88f236d0$9ad6a470$@sdamon.com> Then those users have more fundamental problems. There is a minimum level of computer knowledge needed to be successful in programming. Insulating users from the reality of the situation is not preparing them to be successful. Pretending that there is no system command prompt, or shell, or whatever platform specific term applies, only hurts new programmers. 
Give users an error message they can google, and they will be better off in the long run than they would be if we just ran pip for them. From: Stephan Houben [mailto:stephanh42 at gmail.com] Sent: Sunday, October 29, 2017 3:43 PM To: Alex Walters Cc: Python-Ideas Subject: Re: [Python-ideas] install pip packages from Python prompt Hi Alex, 2017-10-29 20:26 GMT+01:00 Alex Walters >: return ?Please run pip from your system command prompt? The target audience for my proposal are people who do not know which part of the sheep the "system command prompt" is. Stephan From: Python-ideas [mailto:python-ideas-bounces+tritium-list =sdamon.com at python.org ] On Behalf Of Stephan Houben Sent: Sunday, October 29, 2017 3:19 PM To: Python-Ideas > Subject: [Python-ideas] install pip packages from Python prompt Hi all, Here is in somewhat more detail my earlier proposal for having in the interactive Python interpreter a `pip` function to install packages from Pypi. Motivation: it appears to me that there is a category of newbies for which "open a shell and do `pip whatever`" is a bit too much. It would, in my opinion, simplify things a bit if they could just copy-and-paste some text into the Python interpreter and have some packages from pip installed. That would simplify instructions on how to install package xyz, without going into the vagaries of how to open a shell on various platforms, and how to get to the right pip executable. I think this could be as simple as: def pip(args): import sys import subprocess subprocess.check_call([sys.executable, "-m", "pip"] + args.split()) print("Please re-start Python now to use installed or upgraded packages.") Note that I added the final message about restarting the interpreter as a low-tech solution to the problem of packages being already imported in the current Python session. I would imagine that the author of package xyz would then put on their webpage something like: To use, enter in your Python interpreter: pip("install xyz --user") As another example, consider prof. Baldwin from Woolamaloo university who teaches a course "Introductory Python programming for Sheep Shavers". In his course material, he instructs his students to execute the following line in their Python interpreter. pip("install woolamaloo-sheepshavers-goodies --user") which will install a package which will in turn, as dependencies, pull in a number of packages which are relevant for sheep shaving but which have nevertheless irresponsibly been left outside the stdlib. Stephan -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.f.moore at gmail.com Sun Oct 29 16:26:23 2017 From: p.f.moore at gmail.com (Paul Moore) Date: Sun, 29 Oct 2017 20:26:23 +0000 Subject: [Python-ideas] Defining an easily installable "Recommended baseline package set" In-Reply-To: References: Message-ID: On 29 October 2017 at 18:56, Guido van Rossum wrote: > The two use cases you describe (scripters and teachers) leave me luke-warm > -- scripters live in the wild west and can just pip install whatever (that's > what it means to be scripting) In my experience, "scripting" *does* include people for whom "being in a standard Python distribution" is a requirement. Situations I have encountered: 1. Servers where developers need to write administrative or monitoring scripts, but they don't control what's on that server. The Python installation that comes by default is all that's available. 2. Developers working in environments with limited internet access. 
For example, my employer has a Windows (NTLM) proxy that pip can't work with. Getting internet access for pip involves installing a separate application, and that's not always possible/desirable. 3. Developers writing scripts to be shared with non-developers. On Unix, this means "must work with the system Python", and on Windows "download and install Python from python.org" is typically all you can expect (although in that case "install Anaconda" is a possible alternative, although not one I've tried telling people to do myself). Nick's proposal doesn't actually help for (1) or (2), as the problem there is that "pip install" won't work. And bundling a script with its (pure Python) dependencies, for example as a zipapp, is always a solution - although it's nowhere near as easy as simply copying a single-file script to the destination where it's to be run. So these situations don't actually matter in terms of the value of the proposals being discussed here. But I did want to dispute the idea that "scripters can just pip install whatever" is inherent to the idea of being a scripter - my experience is the opposite, that scripters are the people *least* able to simply pip install things. Paul From p.f.moore at gmail.com Sun Oct 29 16:42:46 2017 From: p.f.moore at gmail.com (Paul Moore) Date: Sun, 29 Oct 2017 20:42:46 +0000 Subject: [Python-ideas] install pip packages from Python prompt In-Reply-To: References: <01f701d350eb$c4ee6210$4ecb2630$@sdamon.com> Message-ID: On 29 October 2017 at 19:40, Stephan Houben wrote: > Hi Antoine, > > 2017-10-29 20:31 GMT+01:00 Antoine Rozo : >> >> Hi, >> >> What would be the difference with current pip module? >> pip.main(['install', 'some_package']) > > > > My understanding is that direct use of the `pip` module is explicitly not > recommended. Not only not recommended, but explicitly not supported. And it won't be available at all in pip 10. Having said that, I'm -1 on this proposal. Installing new modules (or upgrading existing ones) in a running Python process has all sorts of subtle issues - the need to reload already-loaded modules, the fact that failed imports may have resulted in different code paths being taken ("except ImportError"), etc. Exiting the Python process to run pip avoids all of these. For someone who doesn't understand the difference between the Python REPL and the command prompt, offering an interface that exposes them to these sorts of potential issues won't actually help them. Better to teach them the basics of what a command line is, and what an interactive Python prompt is, before exposing them to subtleties like this. Paul From tritium-list at sdamon.com Sun Oct 29 16:44:11 2017 From: tritium-list at sdamon.com (Alex Walters) Date: Sun, 29 Oct 2017 16:44:11 -0400 Subject: [Python-ideas] Defining an easily installable "Recommended baseline package set" In-Reply-To: References: Message-ID: <026701d350f6$ad0c55c0$07250140$@sdamon.com> Writing scripts for non-developers, in an unmanaged environment (IT can't push a python install to the system) on windows means running pyinstaller et al. on your script, if it has dependencies or not. It's not worth it to walk someone through a python install to run a script, let alone installing optional dependencies. For linux, you are not limited to the standard library, but what the distros have packaged - it's easy enough to tell a typical non-developer linux user "apt install python python-twisted, then run the script".
For deployment to restricted environments, optional dependencies suggested by python-dev is unlikely to be installed either. Not that this matters, or would matter for years, because IME those environments have ancient versions of python that the developer is restricted to anyways. And environments with limited internet access ... Bandersnatch on a portable hard drive. Not to sound too crass, but there are only so many edge cases that a third party platform developer can be expected to care about (enough to bend over backwards to support). Putting a curated list of "These are good packages that solve a lot of problems that the stdlib doesn't or makes complicated" on python.org is a great idea. Building tooling to make those part of a default installation is not. > -----Original Message----- > From: Python-ideas [mailto:python-ideas-bounces+tritium- > list=sdamon.com at python.org] On Behalf Of Paul Moore > Sent: Sunday, October 29, 2017 4:26 PM > To: Guido van Rossum > Cc: python-ideas at python.org > Subject: Re: [Python-ideas] Defining an easily installable "Recommended > baseline package set" > > On 29 October 2017 at 18:56, Guido van Rossum wrote: > > The two use cases you describe (scripters and teachers) leave me luke- > warm > > -- scripters live in the wild west and can just pip install whatever (that's > > what it means to be scripting) > > In my experience, "scripting" *does* include people for whom "being in > a standard Python distribution" is a requirement. > > Situations I have encountered: > > 1. Servers where developers need to write administrative or monitoring > scripts, but they don't control what's on that server. The Python > installation that comes by default is all that's available. > 2. Developers working in environments with limited internet access. > For example, my employer has a Windows (NTLM) proxy that pip can't > work with. Getting internet access for pip involves installing a > separate application, and that's not always possible/desirable. > 3. Developers writing scripts to be shared with non-developers. On > Unix, this means "must work with the system Python", and on Windows > "download and install Python from python.org" is typically all you can > expect (although in that case "install Anaconda" is a possible > alternative, although not one I've tried telling people to do myself). > > Nick's proposal doesn't actually help for (1) or (2), as the problem > there is that "pip install" won't work. And bundling a script with its > (pure Python) dependencies, for example as a zipapp, is always a > solution - although it's nowhere near as easy as simply copying a > single-file script to the destination where it's to be run. So these > situations don't actually matter in terms of the value of the > proposals being discussed here. But I did want to dispute the idea > that "scripters can just pip install whatever" is inherent to the idea > of being a scripter - my experience is the opposite, that scripters > are the people *least* able to simply pip install things. 
> > Paul > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From p.f.moore at gmail.com Sun Oct 29 16:56:19 2017 From: p.f.moore at gmail.com (Paul Moore) Date: Sun, 29 Oct 2017 20:56:19 +0000 Subject: [Python-ideas] Defining an easily installable "Recommended baseline package set" In-Reply-To: <026701d350f6$ad0c55c0$07250140$@sdamon.com> References: <026701d350f6$ad0c55c0$07250140$@sdamon.com> Message-ID: On 29 October 2017 at 20:44, Alex Walters wrote: > Writing scripts for non-developers, in an unmanaged environment (IT cant > push a python install to the system) on windows means running pyinstaller > et. al., on your script, if it has dependencies or not. Its not worth it to > walk someone through a python install to run a script, let alone installing > optional dependencies. Let's just say "not in the environments I work in", and leave it at that. > Not to sound too crass, but there are only so many edge cases > that a third party platform developer can be expected to care about (enough > to bend over backwards to support). I never suggested otherwise. I just wanted to point out that "scripting in Python" covers a wider range of use cases than "can use pip install". I'm not asking python-dev to support those use cases (heck, I'm part of python-dev and *I* don't want to bend over backwards to support them), but I do ask that people be careful not to dismiss a group of users who are commonly under-represented in open source mailing lists and communities. Paul From ncoghlan at gmail.com Mon Oct 30 00:01:48 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 30 Oct 2017 14:01:48 +1000 Subject: [Python-ideas] Composition over Inheritance In-Reply-To: <947ecad8-6594-081a-e15d-a01d8fe67e76@gmail.com> References: <98d527b2-6263-feae-05b4-67cc93bfa360@gmail.com> <20171028115111.GU9068@ando.pearwood.info> <26abc818-bd7d-8932-8b85-d8f879c9a1e0@gmail.com> <20171028165135.GV9068@ando.pearwood.info> <8ece59fa-de24-a0e8-c373-1182d9b7e6b2@gmail.com> <59F53C08.6090703@brenbarn.net> <947ecad8-6594-081a-e15d-a01d8fe67e76@gmail.com> Message-ID: On 29 October 2017 at 21:44, Soni L. wrote: > ORMs use this kind of descriptor based composition management extensively > in order to reliably model database foreign key relationships in a way > that's mostly transparent to users of the ORM classes. > > > And this is how you miss the whole point of being able to dynamically > add/remove arbitrary components on objects you didn't create, at runtime. > You can already do that by adding new properties to classes post-definition, or by changing __class__ to refer to a different type, or by wrapping objects in transparent proxy types the way wrapt does. We *allow* that kind of thing, because it's sometimes beneficial in order to get two libraries to play nicely together at runtime without having to patch one or the other. However, it's a last resort option that you use when you've exhausted the other more maintainable alternatives, not something we actually want to encourage. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From vano at mail.mipt.ru Mon Oct 30 00:14:10 2017 From: vano at mail.mipt.ru (Ivan Pozdeev) Date: Mon, 30 Oct 2017 07:14:10 +0300 Subject: [Python-ideas] Add single() to itertools Message-ID: The eponymous C#'s LINQ method, I found very useful in the following, quite recurring use-case: I need to get a specific element from a data structure that only supports search semantics (i.e. returns a sequence/iterator of results). For that, I specify very precise search criteria, so only that one item is supposed to be found. But I'd rather verify that just in case. "A data structure that only supports search semantics" is a recurring phenomenon due to this: I make a special-purpose data structure (usually for some domain-specific data like task specification or data directory) using a combination of existing and/or new containers. Now, these types do not enforce all the integrity constraints of my data. And I don't wish to construct a special-purpose class, complete with validation procedures, and pass records into it one by one etc -- when I can just write an initializer, loading all the data at once in a nicely readable construct, and call it a day. So, when querying this structure, I "know" that there should only be one item satisfying a certain criteria - if there's more, or less, something is wrong with the data. https://stackoverflow.com/questions/46009985/get-contentcontrol-by-title-or-tag is the most recent occasion where I had this use case (that is not Python but the concept is language-agnostic). -- Regards, Ivan From ncoghlan at gmail.com Mon Oct 30 00:47:59 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 30 Oct 2017 14:47:59 +1000 Subject: [Python-ideas] Defining an easily installable "Recommended baseline package set" In-Reply-To: <20171029105150.09e15bc4@fsol> References: <20171029105150.09e15bc4@fsol> Message-ID: On 29 October 2017 at 19:51, Antoine Pitrou wrote: > On Sun, 29 Oct 2017 17:54:22 +1000 > Nick Coghlan wrote: > > > > The underlying problem is that our reasons for omitting these particular > > libraries from the standard library relate mainly to publisher side > > concerns like the logistics of ongoing bug fixing and support, *not* end > > user concerns like software reliability or API usability. > > They're both really. One important consequence of a library being in > the stdlib is to tie it to the stdlib's release cycle, QA > infrastructure and compatibility requirements -- which more or less > solves many dependency and/or version pinning headaches. > For the QA & platform compatibility aspects, one of the things actually defining a specific extended package set would let us do is to amend the test suite with a new "third-party" resource, whereby we: 1. Named specific known-working versions in the recommended-packages.txt file (updating them for each maintenance release) 2. Added a new test case that installed and ran the test suites for these projects in a venv (guarded by "-uthird-party") > > This means that > > if educators aren't teaching them, or redistributors aren't providing > them, > > then they're actively doing their users a disservice > > Which redistributors do not provide the requests library, for example? > regex is probably not as popular (mostly because re is good enough for > most purposes), but it still appears to be available from Ubuntu and > Anaconda. > The existing commercial redistributors have been doing this long enough now that they offer the most popular third party packages. 
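Coming back to the "-uthird-party" test resource idea sketched above: purely as an illustrative, untested sketch -- the 'third-party' resource name and the recommended-packages.txt layout are assumptions made up for this example, while test.support.requires() and venv.create() are existing APIs -- such a guarded regression test could look roughly like this:

import os
import subprocess
import unittest
import venv

from test import support

class RecommendedPackagesTests(unittest.TestCase):

    def test_recommended_packages_install_cleanly(self):
        # Hypothetical new resource: only runs under "regrtest -u third-party".
        support.requires('third-party')
        with support.temp_dir() as env_dir:
            # Build a throwaway venv with pip available.
            venv.create(env_dir, with_pip=True)
            bindir = 'Scripts' if os.name == 'nt' else 'bin'
            python = os.path.join(env_dir, bindir, 'python')
            # recommended-packages.txt is assumed to pin known-good versions.
            with open('recommended-packages.txt') as f:
                pinned = [line.strip() for line in f if line.strip()]
            subprocess.check_call([python, '-m', 'pip', 'install'] + pinned)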
Where folks can get into trouble is when they're putting their own bespoke environments together based directly on the python.org installers, and that quite often includes folks teaching themselves from a book or online tutorial (since book and tutorial authors are frequently reluctant to favour particular commercial vendors, and will hence direct readers to the python.org downloads instead).
> > All the proposal does is to suggest taking those existing recommendations > > from the documentation and converting them into a more readily > executable > > form. > > I'm curious what such a list looks like :-) >
regex and requests are the two cases I'm personally aware of that already have "You'll probably want to look at this 3rd party option instead" links at the beginning of the related standard library module documentation. ctypes should probably have a similar "Consider this alternative instead" pointer to cffi, but doesn't currently have one. All three of those (regex, requests, cffi) have received "in principle" approval for standard library inclusion at one point or another, but we haven't been able to devise a way to make the resulting maintenance logistics work (e.g. bringing in cffi would mean bringing in pycparser, which would mean bringing in PLY...)
six should be on the recommended packages list for as long as 2.7 is still supported (and potentially for a while after)
setuptools is currently brought in implicitly as a dependency of pip, but would also belong on a recommended packages list in its own right as an enhanced alternative to distutils (https://docs.python.org/3/distutils/ indirectly recommends that by pointing to the https://packaging.python.org/guides/tool-recommendations/ page)
Beyond those, I think things get significantly more debatable (for example, while datetime doesn't currently reference pytz as a regularly updated timezone database, most ad hoc scripts can also get away with working in either UTC or local time without worrying about arbitrary timezones, which means that use case is at least arguably already adequately covered by explicit dependency management).
Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia
-------------- next part -------------- An HTML attachment was scrubbed... URL:
From steve at pearwood.info Mon Oct 30 00:51:38 2017 From: steve at pearwood.info (Steven D'Aprano) Date: Mon, 30 Oct 2017 15:51:38 +1100 Subject: [Python-ideas] Add single() to itertools In-Reply-To: References: Message-ID: <20171030045137.GY9068@ando.pearwood.info>
On Mon, Oct 30, 2017 at 07:14:10AM +0300, Ivan Pozdeev via Python-ideas wrote: > The eponymous C#'s LINQ method, I found very useful in the following, > quite recurring use-case:
If I have understood your use-case, you have a function that returns a list of results (or possibly an iterator, or a tuple, or some other sequence): print(search(haystack, needle)) # prints ['bronze needle', 'gold needle', 'silver needle']
There are times you expect there to be a single result, and if there are multiple results, that is considered an error. Am I correct so far?
If so, then sequence unpacking is your friend: result, = search(haystack, needle)
Note the comma after the variable name on the left-hand side of the assignment. That's a special case of Python's more general sequence unpacking: a, b, c = [100, 200, 300] assigns a = 100, b = 200, c = 300. 
Using a single item is valid: py> result, = [100] py> print(result) 100 but if the right-hand side has more than one item, you get an exception: py> result, = [100, 200, 300] Traceback (most recent call last): File "", line 1, in ValueError: too many values to unpack (expected 1) I *think* this will solve your problem. If not, can you please explain what "single()" is supposed to do, why it belongs in itertools, and show an example of how it will work. -- Steve From ncoghlan at gmail.com Mon Oct 30 01:24:51 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Mon, 30 Oct 2017 15:24:51 +1000 Subject: [Python-ideas] Defining an easily installable "Recommended baseline package set" In-Reply-To: References: Message-ID: On 30 October 2017 at 04:56, Guido van Rossum wrote: > The two use cases you describe (scripters and teachers) leave me luke-warm > -- scripters live in the wild west and can just pip install whatever > (that's what it means to be scripting) > For personal scripting, we can install whatever, but "institutional scripting" isn't the same thing - there we're scripting predefined "Standard Operating Environments", like those Mahmoud Hashemi describes for PayPal at the start of https://www.paypal-engineering.com/2016/09/07/python-packaging-at-paypal/ "Just use PyInstaller" isn't an adequate answer in large-scale environments because of the "zlib problem": you don't want to have to rebuild and redeploy the world to handle a security update in a low level frequently used component, you want to just update that component and have everything else pick it up dynamically. While "Just use conda" is excellent advice nowadays for any organisation contemplating defining their own bespoke Python SOE (hence Mahmoud's post), if that isn't being driven by the folks that already maintain the SOE (as happened in PayPal's case) convincing an org to add a handful of python-dev endorsed libraries to an established SOE is going to be easier than rebasing their entire Python SOE on conda. > and teachers tend to want a customized bundle anyway -- let the edu world > get together and create their own recommended bundle. > > As long as it's not going to be bundled, i.e. there's just going to be > some list of packages that we recommend to 3rd party repackagers, then I'm > fine with it. But they must remain clearly marked as 3rd party packages in > whatever docs we provide, and live in site-packages. > Yep, that was my intent, although it may not have been clear in my initial proposal. I've filed two separate RFEs in relation to that: * Documentation only: https://bugs.python.org/issue31898 * Regression testing resource: https://bugs.python.org/issue31899 Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia -------------- next part -------------- An HTML attachment was scrubbed... URL: From wes.turner at gmail.com Mon Oct 30 02:29:55 2017 From: wes.turner at gmail.com (Wes Turner) Date: Mon, 30 Oct 2017 02:29:55 -0400 Subject: [Python-ideas] Composition over Inheritance In-Reply-To: References: <98d527b2-6263-feae-05b4-67cc93bfa360@gmail.com> <20171028115111.GU9068@ando.pearwood.info> <26abc818-bd7d-8932-8b85-d8f879c9a1e0@gmail.com> <20171028165135.GV9068@ando.pearwood.info> <8ece59fa-de24-a0e8-c373-1182d9b7e6b2@gmail.com> <59F53C08.6090703@brenbarn.net> Message-ID: On Sunday, October 29, 2017, Nick Coghlan wrote: > On 29 October 2017 at 12:25, Brendan Barnwell > wrote: > >> On 2017-10-28 19:13, Soni L. 
wrote: >> >>> And to have all cars have engines, you'd do: >>> >>> class Car: >>> def __init__(self, ???): >>> self[Engine] = GasEngine() >>> >>> car = Car() >>> car[Engine].kickstart() # kickstart gets the car as second argument. >>> >>> And if you can't do that, then you can't yet do what I'm proposing, and >>> thus the proposal makes sense, even if it still needs some refining... >>> >> >> As near as I can tell you can indeed do that, although it's still >> not clear to me why you'd want to. You can give Car a __getitem__ that >> on-the-fly generates an Engine object that knows which Car it is attached >> to, and then you can make Engine.kickstart a descriptor that knows which >> Engine it is attached to, and from that can figure out which Car it is >> attached to. >> > > Right, I think a few different things are getting confused here related to > how different folks use composition. > > For most data modeling use cases, the composition model you want is either > a tree or an acyclic graph, where the subcomponents don't know anything > about the whole that they're a part of. This gives you good component > isolation, and avoids circular dependencies. > > However, for other cases, you *do* want the child object to be aware of > the parent - XML etrees are a classic example of this, where we want to > allow navigation back up the tree, so each node gains a reference to its > parent node. This often takes the form of a combination of delegation > (parent->child references) and dependency inversion (child->parent > reference). > This is Java-y and maybe not opcode optimizable, but maybe there's a case for defining __setattribute__ so that square brackets denote Rust-like traits: https://docs.spring.io/spring-python/1.2.x/sphinx/html/objects-pythonconfig.html#object-definition-inheritance @Object(parent="request") def request_dev(self, req=None): > Observe that in the following example the child definitions must define an optional ?req? argument; in runtime they will be passed its value basing on what their parent object will return. It's testable, but confusing to Java programmers who aren't familiar with why Guice forces the patterns that it does: https://docs.spring.io/spring-python/1.2.x/sphinx/html/objects-more.html#testable-code https://github.com/google/guice/wiki/Motivation#dependency-injection > Like the factory, dependency injection is just a design pattern. The core principle is to separate behaviour from dependency resolution. In our example, the RealBillingService is not responsible for looking up the TransactionLog and CreditCardProcessor. Instead, they're passed in as constructor parameters: When these are constructor parameters, we don't need to monkeypatch attrs in order to write tests; which, IIUC, is also partly why you'd want traits/mixins with the proposed special Rust-like syntax: https://docs.pytest.org/en/latest/monkeypatch.html https://docs.pytest.org/en/latest/fixture.html#modularity-using-fixtures-from-a-fixture-function (this is too magic(), too) But you want dynamic mixins that have an upward reference and Rust-like syntax (and no factories). > For the car/engine example, this relates to explicitly modeling the > relationship whereby a car can have one or more engines > class MultiEngine(): zope.interface.implements(IEngine) https://zopeinterface.readthedocs.io/en/latest/README.html#declaring-implemented-interfaces But interfaces aren't yet justified because it's only a few lines and those are just documentation or a too-complex adapter registry dict, anyway. 
> (but the engine may not currently be installed), > So it should default to a MockEngine which also implements(IEngine) and raises NotImplementedError > > while an engine can be installed in at most one car at any given point in > time. > But the refcounts would be too difficult This: > > You don't even need the descriptor protocol for that though, you just need > the subcomponent to accept the parent reference as a constructor parameter: > > class Car: > def __init__(self, engine_type): > self.engine = engine_type(self) > > However, this form of explicit dependency inversion wouldn't work as well > if you want to be able to explicitly create an "uninstalled engine" > instance, and then pass the engine in as a parameter to the class > constructor: > > class Car: > def __init__(self, engine): > self.engine = engine # How would we ensure the engine is marked as > installed here? > > As it turns out, Python doesn't need new syntax for this either, as it's > all already baked into the regular attribute access syntax, whereby > descriptor methods get passed a reference not only to the descriptor, but > *also* to the object being accessed: https://docs.python.org/3/ > howto/descriptor.html#descriptor-protocol > > > And then the property builtin lets you ignore the existence of the > descriptor object entirely, and only care about the original object, > allowing the above example to be written as: > > class Car: > def __init__(self, engine): > self.engine = engine # This implicitly marks the engine as > installed > > @property > def engine(self): > return self._engine > > @engine.setter > def engine(self, engine): > if engine is not None: > if self._engine is not None: > raise RuntimeError("Car already has an engine installed") > if engine._car is not None: > raise RuntimeError("Engine is already installed in > another car") > engine._car = self > self._engine = engine > > car = Car(GasEngine()) > This could be less verbose. And less likely to raise a RuntimeError. > > ORMs use this kind of descriptor based composition management extensively > in order to reliably model database foreign key relationships in a way > that's mostly transparent to users of the ORM classes. > So there's a 'factory' which passes the ref as a constructor parameter for such ORM instances; but they generally don't need to be dynamically modified at runtime because traits. > > Cheers, > Nick. > > -- > Nick Coghlan | ncoghlan at gmail.com > | Brisbane, > Australia > -------------- next part -------------- An HTML attachment was scrubbed... URL: From wes.turner at gmail.com Mon Oct 30 02:40:31 2017 From: wes.turner at gmail.com (Wes Turner) Date: Mon, 30 Oct 2017 02:40:31 -0400 Subject: [Python-ideas] Composition over Inheritance In-Reply-To: References: <98d527b2-6263-feae-05b4-67cc93bfa360@gmail.com> <20171028115111.GU9068@ando.pearwood.info> <26abc818-bd7d-8932-8b85-d8f879c9a1e0@gmail.com> <20171028165135.GV9068@ando.pearwood.info> <8ece59fa-de24-a0e8-c373-1182d9b7e6b2@gmail.com> <59F53C08.6090703@brenbarn.net> Message-ID: ... But interfaces are clunky and traits are lightweight, and this isn't Go, so we can't just create a class as a namespace full of @staticmethods which accept the relevant object references. * __setattribute__ -> __getitem__, __setitem__ On Monday, October 30, 2017, Wes Turner wrote: > > > On Sunday, October 29, 2017, Nick Coghlan > wrote: > >> On 29 October 2017 at 12:25, Brendan Barnwell >> wrote: >> >>> On 2017-10-28 19:13, Soni L. 
wrote: >>> >>>> And to have all cars have engines, you'd do: >>>> >>>> class Car: >>>> def __init__(self, ???): >>>> self[Engine] = GasEngine() >>>> >>>> car = Car() >>>> car[Engine].kickstart() # kickstart gets the car as second argument. >>>> >>>> And if you can't do that, then you can't yet do what I'm proposing, and >>>> thus the proposal makes sense, even if it still needs some refining... >>>> >>> >>> As near as I can tell you can indeed do that, although it's >>> still not clear to me why you'd want to. You can give Car a __getitem__ >>> that on-the-fly generates an Engine object that knows which Car it is >>> attached to, and then you can make Engine.kickstart a descriptor that knows >>> which Engine it is attached to, and from that can figure out which Car it >>> is attached to. >>> >> >> Right, I think a few different things are getting confused here related >> to how different folks use composition. >> >> For most data modeling use cases, the composition model you want is >> either a tree or an acyclic graph, where the subcomponents don't know >> anything about the whole that they're a part of. This gives you good >> component isolation, and avoids circular dependencies. >> >> However, for other cases, you *do* want the child object to be aware of >> the parent - XML etrees are a classic example of this, where we want to >> allow navigation back up the tree, so each node gains a reference to its >> parent node. This often takes the form of a combination of delegation >> (parent->child references) and dependency inversion (child->parent >> reference). >> > > This is Java-y and maybe not opcode optimizable, but maybe there's a case > for defining __setattribute__ so that square brackets denote Rust-like > traits: > > https://docs.spring.io/spring-python/1.2.x/sphinx/html/ > objects-pythonconfig.html#object-definition-inheritance > > @Object(parent="request") > def request_dev(self, req=None): > > > Observe that in the following example the child definitions must > define an optional ?req? argument; in runtime they will be passed its value > basing on what their parent object will return. > > It's testable, but confusing to Java programmers who aren't familiar with > why Guice forces the patterns that it does: > > https://docs.spring.io/spring-python/1.2.x/sphinx/html/ > objects-more.html#testable-code > > https://github.com/google/guice/wiki/Motivation#dependency-injection > > > Like the factory, dependency injection is just a design pattern. The > core principle is to separate behaviour from dependency resolution. In our > example, the RealBillingService is not responsible for looking up the > TransactionLog and CreditCardProcessor. Instead, they're passed in as > constructor parameters: > > When these are constructor parameters, we don't need to monkeypatch attrs > in order to write tests; which, IIUC, is also partly why you'd want > traits/mixins with the proposed special Rust-like syntax: > > https://docs.pytest.org/en/latest/monkeypatch.html > > https://docs.pytest.org/en/latest/fixture.html#modularity-using-fixtures- > from-a-fixture-function (this is too magic(), too) > > But you want dynamic mixins that have an upward reference and Rust-like > syntax (and no factories). 
> > >> For the car/engine example, this relates to explicitly modeling the >> relationship whereby a car can have one or more engines >> > > class MultiEngine(): > zope.interface.implements(IEngine) > > https://zopeinterface.readthedocs.io/en/latest/README.html#declaring- > implemented-interfaces > > But interfaces aren't yet justified because it's only a few lines and > those are just documentation or a too-complex adapter registry dict, anyway. > > >> (but the engine may not currently be installed), >> > > So it should default to a MockEngine which also implements(IEngine) and > raises NotImplementedError > > >> >> while an engine can be installed in at most one car at any given point >> in time. >> > > But the refcounts would be too difficult > > This: > > >> >> You don't even need the descriptor protocol for that though, you just >> need the subcomponent to accept the parent reference as a constructor >> parameter: >> >> class Car: >> def __init__(self, engine_type): >> self.engine = engine_type(self) >> >> However, this form of explicit dependency inversion wouldn't work as well >> if you want to be able to explicitly create an "uninstalled engine" >> instance, and then pass the engine in as a parameter to the class >> constructor: >> >> class Car: >> def __init__(self, engine): >> self.engine = engine # How would we ensure the engine is marked >> as installed here? >> >> As it turns out, Python doesn't need new syntax for this either, as it's >> all already baked into the regular attribute access syntax, whereby >> descriptor methods get passed a reference not only to the descriptor, but >> *also* to the object being accessed: https://docs.python.org/3/howt >> o/descriptor.html#descriptor-protocol >> > > >> >> And then the property builtin lets you ignore the existence of the >> descriptor object entirely, and only care about the original object, >> allowing the above example to be written as: >> >> class Car: >> def __init__(self, engine): >> self.engine = engine # This implicitly marks the engine as >> installed >> >> @property >> def engine(self): >> return self._engine >> >> @engine.setter >> def engine(self, engine): >> if engine is not None: >> if self._engine is not None: >> raise RuntimeError("Car already has an engine installed") >> if engine._car is not None: >> raise RuntimeError("Engine is already installed in >> another car") >> engine._car = self >> self._engine = engine >> >> car = Car(GasEngine()) >> > > This could be less verbose. And less likely to raise a RuntimeError. > > >> >> ORMs use this kind of descriptor based composition management extensively >> in order to reliably model database foreign key relationships in a way >> that's mostly transparent to users of the ORM classes. >> > > So there's a 'factory' which passes the ref as a constructor parameter for > such ORM instances; but they generally don't need to be dynamically > modified at runtime because traits. > > >> >> Cheers, >> Nick. >> >> -- >> Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia >> > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From wes.turner at gmail.com Mon Oct 30 03:12:58 2017 From: wes.turner at gmail.com (Wes Turner) Date: Mon, 30 Oct 2017 03:12:58 -0400 Subject: [Python-ideas] Composition over Inheritance In-Reply-To: References: <98d527b2-6263-feae-05b4-67cc93bfa360@gmail.com> <20171028115111.GU9068@ando.pearwood.info> <26abc818-bd7d-8932-8b85-d8f879c9a1e0@gmail.com> <20171028165135.GV9068@ando.pearwood.info> <8ece59fa-de24-a0e8-c373-1182d9b7e6b2@gmail.com> <59F53C08.6090703@brenbarn.net> Message-ID: On Monday, October 30, 2017, Wes Turner wrote: > ... But interfaces are clunky and traits are lightweight, and this isn't > Go, so we can't just create a class as a namespace full of @staticmethods > which accept the relevant object references. > > * __setattribute__ -> __getitem__, __setitem__ > > On Monday, October 30, 2017, Wes Turner > wrote: > >> >> >> On Sunday, October 29, 2017, Nick Coghlan wrote: >> >>> On 29 October 2017 at 12:25, Brendan Barnwell >>> wrote: >>> >>>> On 2017-10-28 19:13, Soni L. wrote: >>>> >>>>> And to have all cars have engines, you'd do: >>>>> >>>>> class Car: >>>>> def __init__(self, ???): >>>>> self[Engine] = GasEngine() >>>>> >>>>> car = Car() >>>>> car[Engine].kickstart() # kickstart gets the car as second argument. >>>>> >>>>> And if you can't do that, then you can't yet do what I'm proposing, and >>>>> thus the proposal makes sense, even if it still needs some refining... >>>>> >>>> >>>> As near as I can tell you can indeed do that, although it's >>>> still not clear to me why you'd want to. You can give Car a __getitem__ >>>> that on-the-fly generates an Engine object that knows which Car it is >>>> attached to, and then you can make Engine.kickstart a descriptor that knows >>>> which Engine it is attached to, and from that can figure out which Car it >>>> is attached to. >>>> >>> >>> Right, I think a few different things are getting confused here related >>> to how different folks use composition. >>> >>> For most data modeling use cases, the composition model you want is >>> either a tree or an acyclic graph, where the subcomponents don't know >>> anything about the whole that they're a part of. This gives you good >>> component isolation, and avoids circular dependencies. >>> >>> However, for other cases, you *do* want the child object to be aware of >>> the parent - XML etrees are a classic example of this, where we want to >>> allow navigation back up the tree, so each node gains a reference to its >>> parent node. This often takes the form of a combination of delegation >>> (parent->child references) and dependency inversion (child->parent >>> reference). >>> >> >> This is Java-y and maybe not opcode optimizable, but maybe there's a case >> for defining __setattribute__ so that square brackets denote Rust-like >> traits: >> >> https://docs.spring.io/spring-python/1.2.x/sphinx/html/objec >> ts-pythonconfig.html#object-definition-inheritance >> >> @Object(parent="request") >> def request_dev(self, req=None): >> >> > Observe that in the following example the child definitions must >> define an optional ?req? argument; in runtime they will be passed its value >> basing on what their parent object will return. 
>> >> It's testable, but confusing to Java programmers who aren't familiar with >> why Guice forces the patterns that it does: >> >> https://docs.spring.io/spring-python/1.2.x/sphinx/html/objec >> ts-more.html#testable-code >> >> https://github.com/google/guice/wiki/Motivation#dependency-injection >> >> > Like the factory, dependency injection is just a design pattern. The >> core principle is to separate behaviour from dependency resolution. In our >> example, the RealBillingService is not responsible for looking up the >> TransactionLog and CreditCardProcessor. Instead, they're passed in as >> constructor parameters: >> >> When these are constructor parameters, we don't need to monkeypatch attrs >> in order to write tests; which, IIUC, is also partly why you'd want >> traits/mixins with the proposed special Rust-like syntax: >> >> https://docs.pytest.org/en/latest/monkeypatch.html >> >> https://docs.pytest.org/en/latest/fixture.html#modularity- >> using-fixtures-from-a-fixture-function (this is too magic(), too) >> >> But you want dynamic mixins that have an upward reference and Rust-like >> syntax (and no factories). >> >> >>> For the car/engine example, this relates to explicitly modeling the >>> relationship whereby a car can have one or more engines >>> >> >> class MultiEngine(): >> zope.interface.implements(IEngine) >> >> https://zopeinterface.readthedocs.io/en/latest/README.html# >> declaring-implemented-interfaces >> >> But interfaces aren't yet justified because it's only a few lines and >> those are just documentation or a too-complex adapter registry dict, anyway. >> >> >>> (but the engine may not currently be installed), >>> >> >> So it should default to a MockEngine which also implements(IEngine) and >> raises NotImplementedError >> >> >>> >>> while an engine can be installed in at most one car at any given point >>> in time. >>> >> >> But the refcounts would be too difficult >> >> This: >> >> >>> >>> You don't even need the descriptor protocol for that though, you just >>> need the subcomponent to accept the parent reference as a constructor >>> parameter: >>> >>> class Car: >>> def __init__(self, engine_type): >>> self.engine = engine_type(self) >>> >>> However, this form of explicit dependency inversion wouldn't work as >>> well if you want to be able to explicitly create an "uninstalled engine" >>> instance, and then pass the engine in as a parameter to the class >>> constructor: >>> >>> class Car: >>> def __init__(self, engine): >>> self.engine = engine # How would we ensure the engine is marked >>> as installed here? >>> >> > >>> As it turns out, Python doesn't need new syntax for this either, as it's >>> all already baked into the regular attribute access syntax, whereby >>> descriptor methods get passed a reference not only to the descriptor, but >>> *also* to the object being accessed: https://docs.python.org/3/howt >>> o/descriptor.html#descriptor-protocol >>> >> https://docs.python.org/3/howto/descriptor.html ... http://python-reference.readthedocs.io/en/latest/docs/dunderdsc/#descriptor-protocol ""' Instance BindingIf binding to a new-style object instance, a.x is transformed into the call: type(a).__dict__[?x?].__get__(a, type(a)). 
""" > >> >>> >>> And then the property builtin lets you ignore the existence of the >>> descriptor object entirely, and only care about the original object, >>> allowing the above example to be written as: >>> >>> class Car: >>> def __init__(self, engine): >>> self.engine = engine # This implicitly marks the engine as >>> installed >>> >>> @property >>> def engine(self): >>> return self._engine >>> >>> @engine.setter >>> def engine(self, engine): >>> if engine is not None: >>> if self._engine is not None: >>> raise RuntimeError("Car already has an engine >>> installed") >>> if engine._car is not None: >>> raise RuntimeError("Engine is already installed in >>> another car") >>> engine._car = self >>> self._engine = engine >>> >>> car = Car(GasEngine()) >>> >> >> This could be less verbose. And less likely to raise a RuntimeError. >> >> >>> >>> ORMs use this kind of descriptor based composition management >>> extensively in order to reliably model database foreign key relationships >>> in a way that's mostly transparent to users of the ORM classes. >>> >> >> So there's a 'factory' which passes the ref as a constructor parameter >> for such ORM instances; but they generally don't need to be dynamically >> modified at runtime because traits. >> >> >>> >>> Cheers, >>> Nick. >>> >>> -- >>> Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia >>> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From tjol at tjol.eu Mon Oct 30 04:38:16 2017 From: tjol at tjol.eu (Thomas Jollans) Date: Mon, 30 Oct 2017 09:38:16 +0100 Subject: [Python-ideas] Composition over Inheritance In-Reply-To: <7b5a08ad-a690-5e7f-689f-11deb36ffc76@gmail.com> References: <98d527b2-6263-feae-05b4-67cc93bfa360@gmail.com> <20171028115111.GU9068@ando.pearwood.info> <26abc818-bd7d-8932-8b85-d8f879c9a1e0@gmail.com> <20171028165135.GV9068@ando.pearwood.info> <8ece59fa-de24-a0e8-c373-1182d9b7e6b2@gmail.com> <59F53C08.6090703@brenbarn.net> <947ecad8-6594-081a-e15d-a01d8fe67e76@gmail.com> <59F60891.8020704@brenbarn.net> <7b5a08ad-a690-5e7f-689f-11deb36ffc76@gmail.com> Message-ID: <8cb38da3-f488-e6a4-0a7e-f31d085beb1f@tjol.eu> On 29/10/17 19:25, Soni L. wrote: > > > On 2017-10-29 02:57 PM, Brendan Barnwell wrote: >> On 2017-10-29 04:44, Soni L. wrote: >>> And this is how you miss the whole point of being able to dynamically >>> add/remove arbitrary components on objects you didn't create, at >>> runtime. >>> >>> Someone gave me this code and told me it explains what I'm trying to do: >>> https://repl.it/NYCF/3 >>> >>> class T: >>> pass >>> >>> class C: >>> pass >>> >>> c = C() >>> >>> #c.[T] = 1 >>> c.__dict__[T] = 1 >> >> Again, can you please explain why you want to write c.[T]? What do >> you intend that to *do*? Your commented line seems to indicate you >> want it to do what `c.__dict__[T]` does, but you can already do that >> with `setattr(c, T, 1)`. Or you can just give c an attribute that's a >> dict, but has an easier-to-type name than __dict__, so you can do >> `c.mydict[T]`. What is the specific advantage of `c.[T]` over these >> existing solutions? >> > > Hmm... Why can't we just allow empty identifiers, and set a default > handler for empty identifiers that implements the proposed ECS? It sounds to me like the general shape of what you're proposing is already entirely possible, just without the idiosyncratic syntax. You could write a library that adds support for your components using some other syntax without any additional language support. 
I don't know, something like this should be doable: @componentlib.has_components class Car: # ... class Engine(componentlib.Component): # might have to be a metaclass? def stall(self, car): raise UnpleasantNoise() # ... car = Car() car.engine = Engine() car.engine.stall() # and so on. If you prefer square brackets, you can implement it with __getitem__ syntax instead of __getattr__ syntax. The point is: not only does your proposal not *need* additional language support; I'm not at all convinced that it would benefit from additional language support. > > But the basic idea is to indicate something at the call site, namely > that T is a contract and the object returned should respect that > contract and any function calls should pass the original object as an > argument. (I personally don't like how Python treats o.m() (has self) > the same as o.f() (no self) syntax-wise, either.) > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From sf at fermigier.com Mon Oct 30 04:59:53 2017 From: sf at fermigier.com (=?UTF-8?Q?St=C3=A9fane_Fermigier?=) Date: Mon, 30 Oct 2017 09:59:53 +0100 Subject: [Python-ideas] Add single() to itertools In-Reply-To: <20171030045137.GY9068@ando.pearwood.info> References: <20171030045137.GY9068@ando.pearwood.info> Message-ID: IIUC, this would be similar to "first" ( https://pypi.python.org/pypi/first/ ) but would raise exception in case the iterable returns more than one (or less than one) element. Would also be similar to one() in SQLAlchemy queries ( http://docs.sqlalchemy.org/en/latest/orm/query.html#sqlalchemy.orm.query.Query.one ). Regards, S. On Mon, Oct 30, 2017 at 5:51 AM, Steven D'Aprano wrote: > On Mon, Oct 30, 2017 at 07:14:10AM +0300, Ivan Pozdeev via Python-ideas > wrote: > > > The eponymous C#'s LINQ method, I found very useful in the following, > > quite recurring use-case: > > If I have understood your use-case, you have a function that returns a > list of results (or possibly an iterator, or a tuple, or some other > sequence): > > print(search(haystack, needle)) > # prints ['bronze needle', 'gold needle', 'silver needle'] > > There are times you expect there to be a single result, and if there are > multiple results, that is considered an error. Am I correct so far? > > If so, then sequence unpacking is your friend: > > result, = search(haystack, needle) > > Note the comma after the variable name on the left-hand side of the > assignment. That's a special case of Python's more general sequence > unpacking: > > a, b, c = [100, 200, 300] > > assigns a = 100, b = 200, c == 300. Using a single item is valid: > > py> result, = [100] > py> print(result) > 100 > > > but if the right-hand side has more than one item, you get an exception: > > py> result, = [100, 200, 300] > Traceback (most recent call last): > File "", line 1, in > ValueError: too many values to unpack (expected 1) > > > I *think* this will solve your problem. > > If not, can you please explain what "single()" is supposed to do, why it > belongs in itertools, and show an example of how it will work. 
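To make that concrete, a minimal sketch of the kind of single()/one() helper being discussed -- untested, with the exception type and messages being just one possible choice, loosely following SQLAlchemy's Query.one() -- could be:

_MISSING = object()

def single(iterable):
    """Return the only item of *iterable*; raise ValueError otherwise."""
    it = iter(iterable)
    first = next(it, _MISSING)
    if first is _MISSING:
        raise ValueError("single() got an empty iterable")
    if next(it, _MISSING) is not _MISSING:
        raise ValueError("single() got more than one item")
    return first

# e.g.:  result = single(search(haystack, needle))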
> > > -- > Steve > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ >
-- Stefane Fermigier - http://fermigier.com/ - http://twitter.com/sfermigier - http://linkedin.com/in/sfermigier Founder & CEO, Abilian - Enterprise Social Software - http://www.abilian.com/ Chairman, Free&OSS Group / Systematic Cluster - http://www.gt-logiciel-libre.org/ Co-Chairman, National Council for Free & Open Source Software (CNLL) - http://cnll.fr/ Founder & Organiser, PyData Paris - http://pydata.fr/ --- "You never change things by fighting the existing reality. To change something, build a new model that makes the existing model obsolete." -- R. Buckminster Fuller
-------------- next part -------------- An HTML attachment was scrubbed... URL:
From vano at mail.mipt.ru Mon Oct 30 05:49:09 2017 From: vano at mail.mipt.ru (Ivan Pozdeev) Date: Mon, 30 Oct 2017 12:49:09 +0300 Subject: [Python-ideas] Add single() to itertools In-Reply-To: References: Message-ID: <2e42329e-294c-0143-de80-43617d658866@mail.mipt.ru>
On 30.10.2017 9:29, python-ideas-request at python.org wrote: > If I have understood your use-case, you have a function that returns a > list of results (or possibly an iterator, or a tuple, or some other > sequence): > > print(search(haystack, needle)) > # prints ['bronze needle', 'gold needle', 'silver needle'] > > There are times you expect there to be a single result, and if there are > multiple results, that is considered an error. Am I correct so far?
Correct.
> If so, then sequence unpacking is your friend: > > result, = search(haystack, needle) > > <...> > > I *think* this will solve your problem. > > If not, can you please explain what "single()" is supposed to do, why it > belongs in itertools, and show an example of how it will work.
That works. Too arcane in my book though (and others' too according to https://stackoverflow.com/a/473337/648265), and the error messages are cryptic in this use case. It also cannot be a part of an expression, unlike next().
The initial post on the above link summarizes the suggested implementation pretty well.
-- Regards, Ivan
From erik.m.bray at gmail.com Mon Oct 30 06:27:44 2017 From: erik.m.bray at gmail.com (Erik Bray) Date: Mon, 30 Oct 2017 11:27:44 +0100 Subject: [Python-ideas] install pip packages from Python prompt In-Reply-To: <021601d350ee$88f236d0$9ad6a470$@sdamon.com> References: <01f701d350eb$c4ee6210$4ecb2630$@sdamon.com> <021601d350ee$88f236d0$9ad6a470$@sdamon.com> Message-ID:
On Sun, Oct 29, 2017 at 8:45 PM, Alex Walters wrote: > Then those users have more fundamental problems. There is a minimum level > of computer knowledge needed to be successful in programming. Insulating > users from the reality of the situation is not preparing them to be > successful. Pretending that there is no system command prompt, or shell, or > whatever platform specific term applies, only hurts new programmers. Give > users an error message they can google, and they will be better off in the > long run than they would be if we just ran pip for them.
While I completely agree with this in principle, I think you overestimate the average beginner. Many beginners I've taught or helped, even if they can manage to get to the correct command prompt, often don't even know how to run the correct Python. 
They might often have multiple Pythons installed on their system--maybe they have Anaconda, maybe Python installed by homebrew, or a Python that came with an IDE like Spyder. If they're on OSX often running "python" from the command prompt gives the system's crippled Python 2.6 and they don't know the difference. One thing that has been a step in the right direction is moving more documentation toward preferring running `python -m pip` over just `pip`, since this often has a better guarantee of running `pip` in the Python interpreter you intended. But that still requires one to know how to run the correct Python interpreter from the command-line (which the newbie double-clicking on IDLE may not even have a concept of...). While I agree this is something that is important for beginners to learn (e.g. print(sys.executable) if in doubt), it *is* a high bar for many newbies just to install one or two packages from pip, which they often might need/want to do for whatever educational pursuit they're following (heck, it's pretty common even just to want to install the `requests` module, as I would never throw `urllib` at a beginner). So while I don't think anything proposed here will work technically, I am in favor of an in-interpreter pip install functionality. Perhaps it could work something like this: a) Allow it *only* in interactive mode: running `pip(...)` (or whatever this looks like) outside of interactive mode raises a `RuntimeError` with the appropriate documentation b) When running `pip(...)` the user is supplied with an interactive prompt explaining that since installing packages with `pip()` can result in changes to the interpreter, it is necessary to restart the interpreter after installation--give them an opportunity to cancel the action in case they have any work they need to save. If they proceed, install the new package then restart the interpreter for them. This avoids any ambiguity as to states of loaded modules before/after pip install. > From: Stephan Houben [mailto:stephanh42 at gmail.com] > Sent: Sunday, October 29, 2017 3:43 PM > To: Alex Walters > Cc: Python-Ideas > Subject: Re: [Python-ideas] install pip packages from Python prompt > > > > Hi Alex, > > > > 2017-10-29 20:26 GMT+01:00 Alex Walters : > > return ?Please run pip from your system command prompt? > > > > > > The target audience for my proposal are people who do not know > > which part of the sheep the "system command prompt" is. > > Stephan > > > > > > From: Python-ideas > [mailto:python-ideas-bounces+tritium-list=sdamon.com at python.org] On Behalf > Of Stephan Houben > Sent: Sunday, October 29, 2017 3:19 PM > To: Python-Ideas > Subject: [Python-ideas] install pip packages from Python prompt > > > > Hi all, > > Here is in somewhat more detail my earlier proposal for > > having in the interactive Python interpreter a `pip` function to > > install packages from Pypi. > > Motivation: it appears to me that there is a category of newbies > > for which "open a shell and do `pip whatever`" is a bit too much. > > It would, in my opinion, simplify things a bit if they could just > > copy-and-paste some text into the Python interpreter and have > > some packages from pip installed. > > That would simplify instructions on how to install package xyz, > > without going into the vagaries of how to open a shell on various > > platforms, and how to get to the right pip executable. 
> > I think this could be as simple as: > > def pip(args): > import sys > import subprocess > subprocess.check_call([sys.executable, "-m", "pip"] + args.split()) > > print("Please re-start Python now to use installed or upgraded > packages.") > > Note that I added the final message about restarting the interpreter > > as a low-tech solution to the problem of packages being already > > imported in the current Python session. > > I would imagine that the author of package xyz would then put on > > their webpage something like: > > To use, enter in your Python interpreter: > > pip("install xyz --user") > > As another example, consider prof. Baldwin from Woolamaloo university > > who teaches a course "Introductory Python programming for Sheep Shavers". > > In his course material, he instructs his students to execute the > > following line in their Python interpreter. > > pip("install woolamaloo-sheepshavers-goodies --user") > > which will install a package which will in turn, as dependencies, > > pull in a number of packages which are relevant for sheep shaving but > > which have nevertheless irresponsibly been left outside the stdlib. > > Stephan > > > > > > > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > From erik.m.bray at gmail.com Mon Oct 30 06:35:15 2017 From: erik.m.bray at gmail.com (Erik Bray) Date: Mon, 30 Oct 2017 11:35:15 +0100 Subject: [Python-ideas] install pip packages from Python prompt In-Reply-To: References: <01f701d350eb$c4ee6210$4ecb2630$@sdamon.com> <021601d350ee$88f236d0$9ad6a470$@sdamon.com> Message-ID: On Mon, Oct 30, 2017 at 11:27 AM, Erik Bray wrote: > On Sun, Oct 29, 2017 at 8:45 PM, Alex Walters wrote: >> Then those users have more fundamental problems. There is a minimum level >> of computer knowledge needed to be successful in programming. Insulating >> users from the reality of the situation is not preparing them to be >> successful. Pretending that there is no system command prompt, or shell, or >> whatever platform specific term applies, only hurts new programmers. Give >> users an error message they can google, and they will be better off in the >> long run than they would be if we just ran pip for them. > > While I completely agree with this in principle, I think you > overestimate the average beginner. Many beginners I've taught or > helped, even if they can manage to get to the correct command prompt, > often don't even know how to run the correct Python. They might often > have multiple Pythons installed on their system--maybe they have > Anaconda, maybe Python installed by homebrew, or a Python that came > with an IDE like Spyder. If they're on OSX often running "python" > from the command prompt gives the system's crippled Python 2.6 and > they don't know the difference. I should add--another case that is becoming extremely common is beginners learning Python for the first time inside the Jupyter/IPython Notebook. And in my experience it can be very difficult for beginners to understand the connection between what's happening in the notebook ("it's in the web-browser--what does that have to do with anything on my computer??") and the underlying Python interpreter, file system, etc. Being able to pip install from within the Notebook would be a big win. 
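For what it's worth, a rough and untested sketch of the restart-based pip() helper described in points (a) and (b) above -- using hasattr(sys, 'ps1') as an approximation of "running interactively" and os.execv for the restart, both of which are simplifications -- might look like:

import os
import subprocess
import sys

def pip(args):
    if not hasattr(sys, 'ps1'):
        raise RuntimeError(
            "pip() is only meant for interactive sessions; "
            "run 'python -m pip' from your system shell instead.")
    answer = input(
        "Installing may affect already-imported modules, and the "
        "interpreter will be restarted afterwards. Continue? [y/N] ")
    if answer.strip().lower() != 'y':
        return
    subprocess.check_call([sys.executable, '-m', 'pip'] + args.split())
    # Restart a plain interactive interpreter so the newly installed
    # packages are picked up from a clean state.
    os.execv(sys.executable, [sys.executable])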
This is already possible since IPython allows running system commands and it is possible to run the pip executable from the notebook, then manually restart the Jupyter kernel. It's not 100% clear to me how my proposal below would work within a Jupyter Notebook, so that would also be an angle worth looking into. Best, Erik > One thing that has been a step in the right direction is moving more > documentation toward preferring running `python -m pip` over just > `pip`, since this often has a better guarantee of running `pip` in the > Python interpreter you intended. But that still requires one to know > how to run the correct Python interpreter from the command-line (which > the newbie double-clicking on IDLE may not even have a concept of...). > > While I agree this is something that is important for beginners to > learn (e.g. print(sys.executable) if in doubt), it *is* a high bar for > many newbies just to install one or two packages from pip, which they > often might need/want to do for whatever educational pursuit they're > following (heck, it's pretty common even just to want to install the > `requests` module, as I would never throw `urllib` at a beginner). > > So while I don't think anything proposed here will work technically, I > am in favor of an in-interpreter pip install functionality. Perhaps > it could work something like this: > > a) Allow it *only* in interactive mode: running `pip(...)` (or > whatever this looks like) outside of interactive mode raises a > `RuntimeError` with the appropriate documentation > b) When running `pip(...)` the user is supplied with an interactive > prompt explaining that since installing packages with `pip()` can > result in changes to the interpreter, it is necessary to restart the > interpreter after installation--give them an opportunity to cancel the > action in case they have any work they need to save. If they proceed, > install the new package then restart the interpreter for them. This > avoids any ambiguity as to states of loaded modules before/after pip > install. > > > >> From: Stephan Houben [mailto:stephanh42 at gmail.com] >> Sent: Sunday, October 29, 2017 3:43 PM >> To: Alex Walters >> Cc: Python-Ideas >> Subject: Re: [Python-ideas] install pip packages from Python prompt >> >> >> >> Hi Alex, >> >> >> >> 2017-10-29 20:26 GMT+01:00 Alex Walters : >> >> return ?Please run pip from your system command prompt? >> >> >> >> >> >> The target audience for my proposal are people who do not know >> >> which part of the sheep the "system command prompt" is. >> >> Stephan >> >> >> >> >> >> From: Python-ideas >> [mailto:python-ideas-bounces+tritium-list=sdamon.com at python.org] On Behalf >> Of Stephan Houben >> Sent: Sunday, October 29, 2017 3:19 PM >> To: Python-Ideas >> Subject: [Python-ideas] install pip packages from Python prompt >> >> >> >> Hi all, >> >> Here is in somewhat more detail my earlier proposal for >> >> having in the interactive Python interpreter a `pip` function to >> >> install packages from Pypi. >> >> Motivation: it appears to me that there is a category of newbies >> >> for which "open a shell and do `pip whatever`" is a bit too much. >> >> It would, in my opinion, simplify things a bit if they could just >> >> copy-and-paste some text into the Python interpreter and have >> >> some packages from pip installed. >> >> That would simplify instructions on how to install package xyz, >> >> without going into the vagaries of how to open a shell on various >> >> platforms, and how to get to the right pip executable. 
>> >> I think this could be as simple as: >> >> def pip(args): >> import sys >> import subprocess >> subprocess.check_call([sys.executable, "-m", "pip"] + args.split()) >> >> print("Please re-start Python now to use installed or upgraded >> packages.") >> >> Note that I added the final message about restarting the interpreter >> >> as a low-tech solution to the problem of packages being already >> >> imported in the current Python session. >> >> I would imagine that the author of package xyz would then put on >> >> their webpage something like: >> >> To use, enter in your Python interpreter: >> >> pip("install xyz --user") >> >> As another example, consider prof. Baldwin from Woolamaloo university >> >> who teaches a course "Introductory Python programming for Sheep Shavers". >> >> In his course material, he instructs his students to execute the >> >> following line in their Python interpreter. >> >> pip("install woolamaloo-sheepshavers-goodies --user") >> >> which will install a package which will in turn, as dependencies, >> >> pull in a number of packages which are relevant for sheep shaving but >> >> which have nevertheless irresponsibly been left outside the stdlib. >> >> Stephan >> >> >> >> >> >> >> >> >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ >> From vano at mail.mipt.ru Mon Oct 30 08:45:57 2017 From: vano at mail.mipt.ru (Ivan Pozdeev) Date: Mon, 30 Oct 2017 15:45:57 +0300 Subject: [Python-ideas] Add processor generation to wheel metadata Message-ID: <1a58d84a-23e0-3533-417e-b2f27090ddc1@mail.mipt.ru> Generally, packages are compiled for the same processor generation as the corresponding Python. But not always -- e.g. NumPy opted for SSE2 even for Py2 to work around some compiler bug (https://github.com/numpy/numpy/issues/6428). I was bitten by that at an old machine once and found out that there is no way for `pip' to have checked for that. Besides, performance-oriented packages like the one mentioned could probably benefit from newer instructions. Regarding identifiers: gcc, cl and clang all have their private directories of generation identifiers: https://gcc.gnu.org/onlinedocs/gcc-4.7.1/gcc/i386-and-x86_002d64-Options.html https://msdn.microsoft.com/en-us/library/7t5yh4fd.aspx https://clang.llvm.org/doxygen/Driver_2ToolChains_2Arch_2X86_8cpp_source.html Linux packages typically use gcc's ones. Clang generally follows in gcc's footsteps and accepts cl's IDs, too, as aliases. So, using the IDs of whatever compiler is used to build the package (i.e. most likely the canonical compiler for CPython for that platform) looks like the simple&stupid(r) way - we can just take the value of the "march" argument. The tricky part is mapping the system's processor to an ID when checking compatibility: the logic will have to keep a directory (that's the job of `wheel' package maintainers though, I guess). I also guess that there are cases where there's no such thing as _the_ system's processor. -- Regards, Ivan From guido at python.org Mon Oct 30 10:28:55 2017 From: guido at python.org (Guido van Rossum) Date: Mon, 30 Oct 2017 07:28:55 -0700 Subject: [Python-ideas] Defining an easily installable "Recommended baseline package set" In-Reply-To: References: Message-ID: I just feel that when you're talking about an org like PayPal they can take care of themselves and don't need our help. 
They will likely have packages they want installed everywhere that would never make in on your list. So it feels this whole discussion is a distraction and a waste of time (yours, too). On Sun, Oct 29, 2017 at 10:24 PM, Nick Coghlan wrote: > On 30 October 2017 at 04:56, Guido van Rossum wrote: > >> The two use cases you describe (scripters and teachers) leave me >> luke-warm -- scripters live in the wild west and can just pip install >> whatever (that's what it means to be scripting) >> > > For personal scripting, we can install whatever, but "institutional > scripting" isn't the same thing - there we're scripting predefined > "Standard Operating Environments", like those Mahmoud Hashemi describes for > PayPal at the start of https://www.paypal-engineering.com/2016/09/07/ > python-packaging-at-paypal/ > > "Just use PyInstaller" isn't an adequate answer in large-scale > environments because of the "zlib problem": you don't want to have to > rebuild and redeploy the world to handle a security update in a low level > frequently used component, you want to just update that component and have > everything else pick it up dynamically. > > While "Just use conda" is excellent advice nowadays for any organisation > contemplating defining their own bespoke Python SOE (hence Mahmoud's post), > if that isn't being driven by the folks that already maintain the SOE (as > happened in PayPal's case) convincing an org to add a handful of python-dev > endorsed libraries to an established SOE is going to be easier than > rebasing their entire Python SOE on conda. > > >> and teachers tend to want a customized bundle anyway -- let the edu world >> get together and create their own recommended bundle. >> >> As long as it's not going to be bundled, i.e. there's just going to be >> some list of packages that we recommend to 3rd party repackagers, then I'm >> fine with it. But they must remain clearly marked as 3rd party packages in >> whatever docs we provide, and live in site-packages. >> > > Yep, that was my intent, although it may not have been clear in my initial > proposal. I've filed two separate RFEs in relation to that: > > * Documentation only: https://bugs.python.org/issue31898 > * Regression testing resource: https://bugs.python.org/issue31899 > > Cheers, > Nick. > > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Mon Oct 30 10:32:33 2017 From: guido at python.org (Guido van Rossum) Date: Mon, 30 Oct 2017 07:32:33 -0700 Subject: [Python-ideas] Add single() to itertools In-Reply-To: <2e42329e-294c-0143-de80-43617d658866@mail.mipt.ru> References: <2e42329e-294c-0143-de80-43617d658866@mail.mipt.ru> Message-ID: This is a key example of a case where code speaks. Can you write an implementation of how you would want single() to work in Python code? On Mon, Oct 30, 2017 at 2:49 AM, Ivan Pozdeev via Python-ideas < python-ideas at python.org> wrote: > > > On 30.10.2017 9:29, python-ideas-request at python.org wrote: > >> If I have understood your use-case, you have a function that returns a >> list of results (or possibly an iterator, or a tuple, or some other >> sequence): >> >> print(search(haystack, needle)) >> # prints ['bronze needle', 'gold needle', 'silver needle'] >> >> There are times you expect there to be a single result, and if there are >> multiple results, that is considered an error. Am I correct so far? >> > Correct. 
> >> If so, then sequence unpacking is your friend: >> >> result, = search(haystack, needle) >> >> <...> >> >> I *think* this will solve your problem. >> >> If not, can you please explain what "single()" is supposed to do, why it >> belongs in itertools, and show an example of how it will work. >> > That works. Too arcane in my book though (and others' too according to > https://stackoverflow.com/a/473337/648265), and the error messages are > cryptic in this use case. > It also cannot be a part of an expression, unlike next(). > > The initial post on the above link summarizes the suggested implementation > pretty well. > > -- > Regards, > Ivan > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From ncoghlan at gmail.com Mon Oct 30 11:44:10 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 31 Oct 2017 01:44:10 +1000 Subject: [Python-ideas] install pip packages from Python prompt In-Reply-To: References: <01f701d350eb$c4ee6210$4ecb2630$@sdamon.com> <021601d350ee$88f236d0$9ad6a470$@sdamon.com> Message-ID: On 30 October 2017 at 20:35, Erik Bray wrote: > I should add--another case that is becoming extremely common is > beginners learning Python for the first time inside the > Jupyter/IPython Notebook. And in my experience it can be very > difficult for beginners to understand the connection between what's > happening in the notebook ("it's in the web-browser--what does that > have to do with anything on my computer??") and the underlying Python > interpreter, file system, etc. Being able to pip install from within > the Notebook would be a big win. This is already possible since > IPython allows running system commands and it is possible to run the > pip executable from the notebook, then manually restart the Jupyter > kernel. > > It's not 100% clear to me how my proposal below would work within a > Jupyter Notebook, so that would also be an angle worth looking into. > A few specific notes here: 1. As you say, this sort of already works in notebooks, since instructors can say to run "!pip install requests" and then restart the language kernel. 2. We could probably replicate that style in IDLE, since that runs user code in a subprocess, similar to the way Jupyter language kernels are separate from the frontend client 3. We can't replicate it as readily in the regular REPL, since that runs Python code directly in the current process, but even there I believe we could potentially trigger a full process restart via execve (or the C++ style _execve on Windows) (We'd want a real process restart, rather than emulating it by calling Py_Initialize & Py_Finalize multiple times, as not every module properly supports multiple initialise/finalise cycles within a single process, and module-specific quirks are exactly what we'd be trying to avoid by forcing an interpreter restart) So the main missing piece if we went down that path would be to offer a way to say from within the interpreter itself "Restart the current interactive session". 
One possible approach to that would be to define a RestartInterpreter subclass of SystemExit, which the interpreter would intercept at around the same point where it checks for the PYTHONINSPECT flag, and then initiate a graceful process shutdown and restart, rather than a normal exit. We'd probably want that capability to be off by default and enable it explicitly from the CPython CLI though, as otherwise it could have some really annoying side effects in runtime embedding use cases. I'm sure there'd be some thorny edge cases that would arise in trying to make this work in practice, but at first glance, the general idea sounds potentially feasible to me. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia -------------- next part -------------- An HTML attachment was scrubbed... URL: From solipsis at pitrou.net Mon Oct 30 11:53:11 2017 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 30 Oct 2017 16:53:11 +0100 Subject: [Python-ideas] install pip packages from Python prompt References: <01f701d350eb$c4ee6210$4ecb2630$@sdamon.com> <021601d350ee$88f236d0$9ad6a470$@sdamon.com> Message-ID: <20171030165311.468cdb62@fsol> On Tue, 31 Oct 2017 01:44:10 +1000 Nick Coghlan wrote: > > A few specific notes here: > > 1. As you say, this sort of already works in notebooks, since instructors > can say to run "!pip install requests" and then restart the language kernel. > 2. We could probably replicate that style in IDLE, since that runs user > code in a subprocess, similar to the way Jupyter language kernels are > separate from the frontend client > 3. We can't replicate it as readily in the regular REPL, since that runs > Python code directly in the current process, but even there I believe we > could potentially trigger a full process restart via execve (or the C++ > style _execve on Windows) > > (We'd want a real process restart, rather than emulating it by calling > Py_Initialize & Py_Finalize multiple times, as not every module properly > supports multiple initialise/finalise cycles within a single process, and > module-specific quirks are exactly what we'd be trying to avoid by forcing > an interpreter restart) The main difference, though, is that a notebook will reload and replay all your session, while restarting the regular REPL will simply lose all current work. I think that makes the idea much less appealing. Regards Antoine. From p.f.moore at gmail.com Mon Oct 30 12:06:14 2017 From: p.f.moore at gmail.com (Paul Moore) Date: Mon, 30 Oct 2017 16:06:14 +0000 Subject: [Python-ideas] install pip packages from Python prompt In-Reply-To: <20171030165311.468cdb62@fsol> References: <01f701d350eb$c4ee6210$4ecb2630$@sdamon.com> <021601d350ee$88f236d0$9ad6a470$@sdamon.com> <20171030165311.468cdb62@fsol> Message-ID: On 30 October 2017 at 15:53, Antoine Pitrou wrote: > On Tue, 31 Oct 2017 01:44:10 +1000 > Nick Coghlan wrote: >> >> A few specific notes here: >> >> 1. As you say, this sort of already works in notebooks, since instructors >> can say to run "!pip install requests" and then restart the language kernel. >> 2. We could probably replicate that style in IDLE, since that runs user >> code in a subprocess, similar to the way Jupyter language kernels are >> separate from the frontend client >> 3. 
We can't replicate it as readily in the regular REPL, since that runs >> Python code directly in the current process, but even there I believe we >> could potentially trigger a full process restart via execve (or the C++ >> style _execve on Windows) >> >> (We'd want a real process restart, rather than emulating it by calling >> Py_Initialize & Py_Finalize multiple times, as not every module properly >> supports multiple initialise/finalise cycles within a single process, and >> module-specific quirks are exactly what we'd be trying to avoid by forcing >> an interpreter restart) > > The main difference, though, is that a notebook will reload and > replay all your session, while restarting the regular REPL will simply > lose all current work. I think that makes the idea much less > appealing. Also, on Windows, I believe that any emulation of execve either leaves the original process in memory, or has problems getting console inheritance right. It's been a long time since I worked at that level, and things may be better now, but getting a robust "restart this process" interface in Windows would need some care (that's one of the reasons the py launcher runs Python as a subprocess rather than doing any sort of exec equivalent). Paul From ncoghlan at gmail.com Mon Oct 30 12:08:59 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 31 Oct 2017 02:08:59 +1000 Subject: [Python-ideas] Defining an easily installable "Recommended baseline package set" In-Reply-To: References: Message-ID: On 31 October 2017 at 00:28, Guido van Rossum wrote: > I just feel that when you're talking about an org like PayPal they can > take care of themselves and don't need our help. They will likely have > packages they want installed everywhere that would never make in on your > list. So it feels this whole discussion is a distraction and a waste of > time (yours, too). > Just because companies are big doesn't mean they necessarily have anyone internally that's already up to speed on the specifics of recommended practices in a sprawling open source community like Python's. The genesis of this idea is that I think we can offer a more consistent initial experience for those folks than "Here's PyPI and Google, y'all have fun now" (and in so doing, help folks writing books and online tutorials to feel more comfortable with the idea of assuming that libraries like requests will be available in even the most restrictive institutional environments that still allow the use of Python). One specific situation this idea is designed to help with is the one where: - there's a centrally managed Standard Operating Environment that dictates what gets installed - they've approved the python.org installers - they *haven't* approved anything else yet Now, a lot of large orgs simply won't get into that situation in the first place, since their own supplier management rules will push them towards a commercial redistributor, in which case they'll get their chosen redistributor's preferred package set, which will then typically cover at least a few hundred of the most popular PyPI packages. 
But some of them will start from the narrower "standard library only" baseline, and I spent enough time back at Boeing arguing for libraries to be added to our approved component list to appreciate the benefits of transitive declarations of trust ("we trust supplier X, they unambiguously state their trust in supplier Y, so that's an additional point in favour of our also trusting supplier Y") when it comes time to make your case to your supplier management organisation. Such declarations still aren't always sufficient, but they definitely don't hurt, and they sometimes help. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia -------------- next part -------------- An HTML attachment was scrubbed...
URL: From solipsis at pitrou.net Mon Oct 30 12:29:01 2017 From: solipsis at pitrou.net (Antoine Pitrou) Date: Mon, 30 Oct 2017 17:29:01 +0100 Subject: [Python-ideas] install pip packages from Python prompt References: <01f701d350eb$c4ee6210$4ecb2630$@sdamon.com> <021601d350ee$88f236d0$9ad6a470$@sdamon.com> <20171030165311.468cdb62@fsol> Message-ID: <20171030172901.7d39c025@fsol> On Tue, 31 Oct 2017 02:22:50 +1000 Nick Coghlan wrote: > On 31 October 2017 at 02:06, Paul Moore wrote: > > > On 30 October 2017 at 15:53, Antoine Pitrou wrote: > > > On Tue, 31 Oct 2017 01:44:10 +1000 > > > Nick Coghlan wrote: > > >> (We'd want a real process restart, rather than emulating it by calling > > >> Py_Initialize & Py_Finalize multiple times, as not every module properly > > >> supports multiple initialise/finalise cycles within a single process, > > and > > >> module-specific quirks are exactly what we'd be trying to avoid by > > forcing > > >> an interpreter restart) > > > > > > The main difference, though, is that a notebook will reload and > > > replay all your session, while restarting the regular REPL will simply > > > lose all current work. I think that makes the idea much less > > > appealing. > > > > Right, but if you want an installation to work reliably, you're going to > lose that state anyway. You're going to lose the concrete state, but not the sequence of prompt commands and expressions which led to that state, and which a notebook makes trivial to replay (it's a bit like statement-based replication on a database). The regular REPL would lose both. Regards Antoine. From guido at python.org Mon Oct 30 12:29:11 2017 From: guido at python.org (Guido van Rossum) Date: Mon, 30 Oct 2017 09:29:11 -0700 Subject: [Python-ideas] Defining an easily installable "Recommended baseline package set" In-Reply-To: References: Message-ID: What's your proposed process to arrive at the list of recommended packages? And is it really just going to be a list of names, or is there going to be some documentation (about the vetting, not about the contents of the packages) for each name? On Mon, Oct 30, 2017 at 9:08 AM, Nick Coghlan wrote: > On 31 October 2017 at 00:28, Guido van Rossum wrote: > >> I just feel that when you're talking about an org like PayPal they can >> take care of themselves and don't need our help. They will likely have >> packages they want installed everywhere that would never make in on your >> list. So it feels this whole discussion is a distraction and a waste of >> time (yours, too). >> > > Just because companies are big doesn't mean they necessarily have anyone > internally that's already up to speed on the specifics of recommended > practices in a sprawling open source community like Python's. The genesis > of this idea is that I think we can offer a more consistent initial > experience for those folks than "Here's PyPI and Google, y'all have fun > now" (and in so doing, help folks writing books and online tutorials to > feel more comfortable with the idea of assuming that libraries like > requests will be available in even the most restrictive institutional > environments that still allow the use of Python). 
> > One specific situation this idea is designed to help with is the one where: > > - there's a centrally managed Standard Operating Environment that dictates > what gets installed > - they've approved the python.org installers > - they *haven't* approved anything else yet > > Now, a lot of large orgs simply won't get into that situation in the first > place, since their own supplier management rules will push them towards a > commercial redistributor, in which case they'll get their chosen > redistributor's preferred package set, which will then typically cover at > least a few hundred of the most popular PyPI packages. > > But some of them will start from the narrower "standard library only" > baseline, and I spent enough time back at Boeing arguing for libraries to > be added to our approved component list to appreciate the benefits of > transitive declarations of trust ("we trust supplier X, they unambigously > state their trust in supplier Y, so that's an additional point in favour of > our also trusting supplier Y") when it comes time to make your case to your > supplier management organisation. Such declarations still aren'y always > sufficient, but they definitely don't hurt, and they sometimes help. > > Cheers, > Nick. > > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.f.moore at gmail.com Mon Oct 30 12:33:36 2017 From: p.f.moore at gmail.com (Paul Moore) Date: Mon, 30 Oct 2017 16:33:36 +0000 Subject: [Python-ideas] install pip packages from Python prompt In-Reply-To: References: <01f701d350eb$c4ee6210$4ecb2630$@sdamon.com> <021601d350ee$88f236d0$9ad6a470$@sdamon.com> <20171030165311.468cdb62@fsol> Message-ID: On 30 October 2017 at 16:22, Nick Coghlan wrote: >> Also, on Windows, I believe that any emulation of execve either leaves >> the original process in memory, or has problems getting console >> inheritance right. It's been a long time since I worked at that level, >> and things may be better now, but getting a robust "restart this >> process" interface in Windows would need some care (that's one of the >> reasons the py launcher runs Python as a subprocess rather than doing >> any sort of exec equivalent). > > As long as the standard streams are passed along correctly, whatever the py > launcher does would presumably be adequate for a REPL restart as well, > assuming we decided to go down that path. The py launcher starts a subprocess for python.exe and waits on it. I wouldn't have thought that's going to work for installing mods in a REPL - imagine a long working session where I install 10 mods as I explore options for a particular problem (I don't know how likely that is in practice...) - there'd be a chain of 10+ Python processes, only the last of which is still useful. It's probably not a massive problem (I assume everything but the last process is paged out) but it's not exactly friendly. OTOH, if you lose the command history and the interpreter state after each install, you'll probably get fed up long before the number of processes is an issue... > It would also be reasonable to say that the regular REPL just issues a > warning that a restart might be needed, and it's only REPLs with a separate > client process that offer a way to restart the subprocess where code > actually executes. This feels awfully like the traditional Windows "your mouse has moved - please reboot to have your changes take effect" behaviour. 
I don't think we're going to impress many people emulating that :-( Paul From dwarwick96 at gmail.com Mon Oct 30 13:20:35 2017 From: dwarwick96 at gmail.com (Drew) Date: Mon, 30 Oct 2017 13:20:35 -0400 Subject: [Python-ideas] IMAP4.__exit__ counterintuitive for with blocks Message-ID: <0e71a8a9-c4b4-488a-a152-16c7911f27e4@gmail.com> IMAP4.close closes the selected inbox and commits changes such as deletions. It is not called on IMAP4.__exit__ (only logout is, which doesn't call close in its call stack) however, so: with imaplib.IMAP4_SSL(...) as i: ??? ... would fail to commit those changes. close must be explicitly invoked i.e. with imaplib.IMAP4_SSL(...) as i: ??? ... ??? i.close() This is counterintuitive however, as the with statement is meant to automatically clean up. Another programmer might come along and delete i.close() because it seems unnecessary. Now changes aren't being committed and the programmer doesn't realize it's because of this weird Python idiosyncracy. Python IO such as open commits changes automatically, so I'm not sure why IMAP4 doesn't and only logs out. -------------- next part -------------- An HTML attachment was scrubbed... URL: From alexander.belopolsky at gmail.com Mon Oct 30 13:25:53 2017 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Mon, 30 Oct 2017 13:25:53 -0400 Subject: [Python-ideas] install pip packages from Python prompt In-Reply-To: References: <01f701d350eb$c4ee6210$4ecb2630$@sdamon.com> <021601d350ee$88f236d0$9ad6a470$@sdamon.com> Message-ID: On Mon, Oct 30, 2017 at 11:44 AM, Nick Coghlan wrote: .. > 3. We can't replicate it as readily in the regular REPL, since that runs > Python code directly in the current process, but even there I believe we > could potentially trigger a full process restart via execve (or the C++ > style _execve on Windows) This exact problem is solved rather elegantly in Julia. When you upgrade a package that is already loaded in the REPL, it prints a warning: "The following packages have been updated but were already imported: ... Restart Julia to use the updated versions." listing the affected packages. See . From guido at python.org Mon Oct 30 13:32:23 2017 From: guido at python.org (Guido van Rossum) Date: Mon, 30 Oct 2017 10:32:23 -0700 Subject: [Python-ideas] IMAP4.__exit__ counterintuitive for with blocks In-Reply-To: <0e71a8a9-c4b4-488a-a152-16c7911f27e4@gmail.com> References: <0e71a8a9-c4b4-488a-a152-16c7911f27e4@gmail.com> Message-ID: But maybe if __exit__ is called with an exception it should roll back. In any case it looks like your proposal could break existing code (e.g. code that uses `with` depending on its current behavior, using some other logic to decide whether to commit or not) and that feels difficult to overcome. Perhaps a new method or flag argument can be added to request that the transactions be automatically committed? On Mon, Oct 30, 2017 at 10:20 AM, Drew wrote: > IMAP4.close closes the selected inbox and commits changes such as > deletions. It is not called on IMAP4.__exit__ (only logout is, which > doesn't call close in its call stack) however, so: > > with imaplib.IMAP4_SSL(...) as i: > ... > > would fail to commit those changes. close must be explicitly invoked i.e. > > with imaplib.IMAP4_SSL(...) as i: > ... > i.close() > > This is counterintuitive however, as the with statement is meant to > automatically clean up. Another programmer might come along and delete > i.close() because it seems unnecessary. 
Now changes aren't being committed > and the programmer doesn't realize it's because of this weird Python > idiosyncracy. > > Python IO such as open commits changes automatically, so I'm not sure why > IMAP4 doesn't and only logs out. > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From stephanh42 at gmail.com Mon Oct 30 13:52:12 2017 From: stephanh42 at gmail.com (Stephan Houben) Date: Mon, 30 Oct 2017 18:52:12 +0100 Subject: [Python-ideas] install pip packages from Python prompt In-Reply-To: References: <01f701d350eb$c4ee6210$4ecb2630$@sdamon.com> <021601d350ee$88f236d0$9ad6a470$@sdamon.com> <20171030165311.468cdb62@fsol> Message-ID: What about something like the following to simulate a "restart", portably. def restart(): import sys import os import subprocess if os.getenv("PYTHON_EXIT_ON_RESTART") == "1": sys.exit(42) else: env = os.environ.copy() env["PYTHON_EXIT_ON_RESTART"] = "1" while True: sp = subprocess.run([sys.executable], env=env) if sp.returncode != 42: sys.exit(sp.returncode) Stephan 2017-10-30 17:33 GMT+01:00 Paul Moore : > On 30 October 2017 at 16:22, Nick Coghlan wrote: > >> Also, on Windows, I believe that any emulation of execve either leaves > >> the original process in memory, or has problems getting console > >> inheritance right. It's been a long time since I worked at that level, > >> and things may be better now, but getting a robust "restart this > >> process" interface in Windows would need some care (that's one of the > >> reasons the py launcher runs Python as a subprocess rather than doing > >> any sort of exec equivalent). > > > > As long as the standard streams are passed along correctly, whatever the > py > > launcher does would presumably be adequate for a REPL restart as well, > > assuming we decided to go down that path. > > The py launcher starts a subprocess for python.exe and waits on it. I > wouldn't have thought that's going to work for installing mods in a > REPL - imagine a long working session where I install 10 mods as I > explore options for a particular problem (I don't know how likely that > is in practice...) - there'd be a chain of 10+ Python processes, only > the last of which is still useful. It's probably not a massive problem > (I assume everything but the last process is paged out) but it's not > exactly friendly. > > OTOH, if you lose the command history and the interpreter state after > each install, you'll probably get fed up long before the number of > processes is an issue... > > > It would also be reasonable to say that the regular REPL just issues a > > warning that a restart might be needed, and it's only REPLs with a > separate > > client process that offer a way to restart the subprocess where code > > actually executes. > > This feels awfully like the traditional Windows "your mouse has moved > - please reboot to have your changes take effect" behaviour. I don't > think we're going to impress many people emulating that :-( > > Paul > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From dwarwick96 at gmail.com Mon Oct 30 14:30:15 2017 From: dwarwick96 at gmail.com (Drew) Date: Mon, 30 Oct 2017 14:30:15 -0400 Subject: [Python-ideas] IMAP4.__exit__ counterintuitive for with blocks In-Reply-To: References: <0e71a8a9-c4b4-488a-a152-16c7911f27e4@gmail.com> Message-ID: <1bff6ba3-038f-489e-8bf7-c2110ed625aa@gmail.com> Yeah, a flag for IMAP4 objects sounds like it'd solve backwards-compatibility problems. E.g. imaplib.IMAP4_SSL(..., commit=manual/auto) As for handling errors with automatic commits, we should probably just defer to whatever Python IO does to remain consistent. ? On Oct 30, 2017, 1:33 PM, at 1:33 PM, Guido van Rossum wrote: >But maybe if __exit__ is called with an exception it should roll back. > >In any case it looks like your proposal could break existing code (e.g. >code that uses `with` depending on its current behavior, using some >other >logic to decide whether to commit or not) and that feels difficult to >overcome. Perhaps a new method or flag argument can be added to request >that the transactions be automatically committed? > >On Mon, Oct 30, 2017 at 10:20 AM, Drew wrote: > >> IMAP4.close closes the selected inbox and commits changes such as >> deletions. It is not called on IMAP4.__exit__ (only logout is, which >> doesn't call close in its call stack) however, so: >> >> with imaplib.IMAP4_SSL(...) as i: >> ... >> >> would fail to commit those changes. close must be explicitly invoked >i.e. >> >> with imaplib.IMAP4_SSL(...) as i: >> ... >> i.close() >> >> This is counterintuitive however, as the with statement is meant to >> automatically clean up. Another programmer might come along and >delete >> i.close() because it seems unnecessary. Now changes aren't being >committed >> and the programmer doesn't realize it's because of this weird Python >> idiosyncracy. >> >> Python IO such as open commits changes automatically, so I'm not sure >why >> IMAP4 doesn't and only logs out. >> >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ >> >> > > >-- >--Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From apalala at gmail.com Mon Oct 30 15:08:13 2017 From: apalala at gmail.com (=?UTF-8?Q?Juancarlo_A=C3=B1ez?=) Date: Mon, 30 Oct 2017 15:08:13 -0400 Subject: [Python-ideas] Defining an easily installable "Recommended baseline package set" In-Reply-To: References: Message-ID: On Mon, Oct 30, 2017 at 12:29 PM, Guido van Rossum wrote: > What's your proposed process to arrive at the list of recommended > packages? And is it really just going to be a list of names, or is there > going to be some documentation (about the vetting, not about the contents > of the packages) for each name? > As I see it, the bootstrap would be a requirements.txt (requirements.pip) with the packages that more experienced developers prefer to use over stdlib, pinned to package versions known to work well with the CPython release. I think this is a great idea, specially if it's an easy opt-in. For example, sometimes I go into a programming niche for a few years, and I'd very much appreciate a curated list of packages when I move to another niche. A pay-forward kind of thing. The downside to the idea is that the list of packages will need a curator (which could be python.org). -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From kulakov.ilya at gmail.com Mon Oct 30 15:47:30 2017 From: kulakov.ilya at gmail.com (Ilya Kulakov) Date: Mon, 30 Oct 2017 12:47:30 -0700 Subject: [Python-ideas] Thread.__init__ should call super() In-Reply-To: References: <290F93FE-4E4D-463C-9AB3-F42EB9874EF7@gmail.com> <20171027232152.GR9068@ando.pearwood.info> <20171028111413.GT9068@ando.pearwood.info> Message-ID: <60F1BB07-3C39-4CB9-9E0A-5BD5AA812955@gmail.com> Neil, thank you for doing much better job explaining the problem. Generally, I'm cool with Python's standard library classes not calling super(), as many of them are not designed for subclassing. But those which are should do that. E.g. take a look at more recent asyncio's Protocol and Transport classes: they all properly call super(). One potential problem is that it will break existing code: class X(Thread, SomethingElse): def __init__(self): Thread.__init__(self) SomethingElse.__init__(self) SomethingElse.__init__ will be called twice. Is it a good reason for "old" classes to lag behind? I don't know. Perhaps some mechanism (invisible to a user) can be designed to avoid that. E.g. super() may leave a flag which should signal interpreter to "skip" all direct calls of a function and warn about it (DeprecationWarning?). Best Regards, Ilya Kulakov From tritium-list at sdamon.com Mon Oct 30 15:57:34 2017 From: tritium-list at sdamon.com (Alex Walters) Date: Mon, 30 Oct 2017 15:57:34 -0400 Subject: [Python-ideas] install pip packages from Python prompt In-Reply-To: References: <01f701d350eb$c4ee6210$4ecb2630$@sdamon.com> <021601d350ee$88f236d0$9ad6a470$@sdamon.com> Message-ID: <045101d351b9$54a25240$fde6f6c0$@sdamon.com> > -----Original Message----- > From: Python-ideas [mailto:python-ideas-bounces+tritium- > list=sdamon.com at python.org] On Behalf Of Erik Bray > Sent: Monday, October 30, 2017 6:28 AM > To: Python-Ideas > Subject: Re: [Python-ideas] install pip packages from Python prompt > > On Sun, Oct 29, 2017 at 8:45 PM, Alex Walters > wrote: > > Then those users have more fundamental problems. There is a minimum > level > > of computer knowledge needed to be successful in programming. > Insulating > > users from the reality of the situation is not preparing them to be > > successful. Pretending that there is no system command prompt, or shell, > or > > whatever platform specific term applies, only hurts new programmers. > Give > > users an error message they can google, and they will be better off in the > > long run than they would be if we just ran pip for them. > > While I completely agree with this in principle, I think you > overestimate the average beginner. Nope. I totally get that they don?t know what a shell or command prompt is. THEY. NEED. TO. LEARN. Hiding it is not a good idea for anyone. If this is an insurmountable problem for the newbie, maybe they really shouldn?t be attempting to program. This field is not for everyone. > Many beginners I've taught or > helped, even if they can manage to get to the correct command prompt, > often don't even know how to run the correct Python. They might often > have multiple Pythons installed on their system--maybe they have > Anaconda, maybe Python installed by homebrew, or a Python that came > with an IDE like Spyder. If they're on OSX often running "python" > from the command prompt gives the system's crippled Python 2.6 and > they don't know the difference. 
> > One thing that has been a step in the right direction is moving more > documentation toward preferring running `python -m pip` over just > `pip`, since this often has a better guarantee of running `pip` in the > Python interpreter you intended. But that still requires one to know > how to run the correct Python interpreter from the command-line (which > the newbie double-clicking on IDLE may not even have a concept of...). > > While I agree this is something that is important for beginners to > learn (e.g. print(sys.executable) if in doubt), it *is* a high bar for > many newbies just to install one or two packages from pip, which they > often might need/want to do for whatever educational pursuit they're > following (heck, it's pretty common even just to want to install the > `requests` module, as I would never throw `urllib` at a beginner). > > So while I don't think anything proposed here will work technically, I > am in favor of an in-interpreter pip install functionality. Perhaps > it could work something like this: > > a) Allow it *only* in interactive mode: running `pip(...)` (or > whatever this looks like) outside of interactive mode raises a > `RuntimeError` with the appropriate documentation > b) When running `pip(...)` the user is supplied with an interactive > prompt explaining that since installing packages with `pip()` can > result in changes to the interpreter, it is necessary to restart the > interpreter after installation--give them an opportunity to cancel the > action in case they have any work they need to save. If they proceed, > install the new package then restart the interpreter for them. This > avoids any ambiguity as to states of loaded modules before/after pip > install. > > > > > From: Stephan Houben [mailto:stephanh42 at gmail.com] > > Sent: Sunday, October 29, 2017 3:43 PM > > To: Alex Walters > > Cc: Python-Ideas > > Subject: Re: [Python-ideas] install pip packages from Python prompt > > > > > > > > Hi Alex, > > > > > > > > 2017-10-29 20:26 GMT+01:00 Alex Walters : > > > > return ?Please run pip from your system command prompt? > > > > > > > > > > > > The target audience for my proposal are people who do not know > > > > which part of the sheep the "system command prompt" is. > > > > Stephan > > > > > > > > > > > > From: Python-ideas > > [mailto:python-ideas-bounces+tritium-list=sdamon.com at python.org] On > Behalf > > Of Stephan Houben > > Sent: Sunday, October 29, 2017 3:19 PM > > To: Python-Ideas > > Subject: [Python-ideas] install pip packages from Python prompt > > > > > > > > Hi all, > > > > Here is in somewhat more detail my earlier proposal for > > > > having in the interactive Python interpreter a `pip` function to > > > > install packages from Pypi. > > > > Motivation: it appears to me that there is a category of newbies > > > > for which "open a shell and do `pip whatever`" is a bit too much. > > > > It would, in my opinion, simplify things a bit if they could just > > > > copy-and-paste some text into the Python interpreter and have > > > > some packages from pip installed. > > > > That would simplify instructions on how to install package xyz, > > > > without going into the vagaries of how to open a shell on various > > > > platforms, and how to get to the right pip executable. 
> > > > I think this could be as simple as: > > > > def pip(args): > > import sys > > import subprocess > > subprocess.check_call([sys.executable, "-m", "pip"] + args.split()) > > > > print("Please re-start Python now to use installed or upgraded > > packages.") > > > > Note that I added the final message about restarting the interpreter > > > > as a low-tech solution to the problem of packages being already > > > > imported in the current Python session. > > > > I would imagine that the author of package xyz would then put on > > > > their webpage something like: > > > > To use, enter in your Python interpreter: > > > > pip("install xyz --user") > > > > As another example, consider prof. Baldwin from Woolamaloo university > > > > who teaches a course "Introductory Python programming for Sheep > Shavers". > > > > In his course material, he instructs his students to execute the > > > > following line in their Python interpreter. > > > > pip("install woolamaloo-sheepshavers-goodies --user") > > > > which will install a package which will in turn, as dependencies, > > > > pull in a number of packages which are relevant for sheep shaving but > > > > which have nevertheless irresponsibly been left outside the stdlib. > > > > Stephan > > > > > > > > > > > > > > > > > > _______________________________________________ > > Python-ideas mailing list > > Python-ideas at python.org > > https://mail.python.org/mailman/listinfo/python-ideas > > Code of Conduct: http://python.org/psf/codeofconduct/ > > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From k7hoven at gmail.com Mon Oct 30 16:02:34 2017 From: k7hoven at gmail.com (Koos Zevenhoven) Date: Mon, 30 Oct 2017 22:02:34 +0200 Subject: [Python-ideas] IMAP4.__exit__ counterintuitive for with blocks In-Reply-To: <1bff6ba3-038f-489e-8bf7-c2110ed625aa@gmail.com> References: <0e71a8a9-c4b4-488a-a152-16c7911f27e4@gmail.com> <1bff6ba3-038f-489e-8bf7-c2110ed625aa@gmail.com> Message-ID: On Mon, Oct 30, 2017 at 8:30 PM, Drew wrote: > Yeah, a flag for IMAP4 objects sounds like it'd solve > backwards-compatibility problems. E.g. > > imaplib.IMAP4_SSL(..., commit=manual/auto) > > As for handling errors with automatic commits, we should probably just > defer to whatever Python IO does to remain consistent. > > ?Yeah, at least if it makes sense here too. BTW, it so happens that an old contextlib feature strikes again and allows you to write this in plain English, with punctuation and all: with imaplib.IMAP4_SSL(..) as i, closing(i): ... This one does call .close() regardless of errors, though, and possibly the kind of error may affect whether the commit succeeds or not. Not sure if that matters. -- Koos > On Oct 30, 2017, at 1:33 PM, Guido van Rossum wrote: >> >> But maybe if __exit__ is called with an exception it should roll back. >> >> In any case it looks like your proposal could break existing code (e.g. >> code that uses `with` depending on its current behavior, using some other >> logic to decide whether to commit or not) and that feels difficult to >> overcome. Perhaps a new method or flag argument can be added to request >> that the transactions be automatically committed? >> >> On Mon, Oct 30, 2017 at 10:20 AM, Drew wrote: >> >>> IMAP4.close closes the selected inbox and commits changes such as >>> deletions. 
It is not called on IMAP4.__exit__ (only logout is, which >>> doesn't call close in its call stack) however, so: >>> >>> with imaplib.IMAP4_SSL(...) as i: >>> ... >>> >>> would fail to commit those changes. close must be explicitly invoked >>> i.e. >>> >>> with imaplib.IMAP4_SSL(...) as i: >>> ... >>> i.close() >>> >>> This is counterintuitive however, as the with statement is meant to >>> automatically clean up. Another programmer might come along and delete >>> i.close() because it seems unnecessary. Now changes aren't being committed >>> and the programmer doesn't realize it's because of this weird Python >>> idiosyncracy. >>> >>> Python IO such as open commits changes automatically, so I'm not sure >>> why IMAP4 doesn't and only logs out. >>> >>> ______________________________ _________________ >>> Python-ideas mailing list >>> Python-ideas at python.org >>> https://mail.python.org/mailman/listinfo/python-ideas >>> Code of Conduct: http://python.org/psf/codeofconduct/ >>> >>> >> >> >> -- >> --Guido van Rossum ( python.org/~guido) >> > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > > -- + Koos Zevenhoven + http://twitter.com/k7hoven + -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Mon Oct 30 16:32:26 2017 From: guido at python.org (Guido van Rossum) Date: Mon, 30 Oct 2017 13:32:26 -0700 Subject: [Python-ideas] Defining an easily installable "Recommended baseline package set" In-Reply-To: References: Message-ID: So what's your list? What would you put in requirements.txt? I'm fine with doing it in the form of requirements.txt -- I am worried that when push comes to shove, we won't be able to agree on what list of package names should go into that file. On Mon, Oct 30, 2017 at 12:08 PM, Juancarlo A?ez wrote: > > > On Mon, Oct 30, 2017 at 12:29 PM, Guido van Rossum > wrote: > >> What's your proposed process to arrive at the list of recommended >> packages? And is it really just going to be a list of names, or is there >> going to be some documentation (about the vetting, not about the contents >> of the packages) for each name? >> > > As I see it, the bootstrap would be a requirements.txt (requirements.pip) > with the packages that more experienced developers prefer to use over > stdlib, pinned to package versions known to work well with the CPython > release. > > I think this is a great idea, specially if it's an easy opt-in. > > For example, sometimes I go into a programming niche for a few years, and > I'd very much appreciate a curated list of packages when I move to another > niche. A pay-forward kind of thing. > > The downside to the idea is that the list of packages will need a curator > (which could be python.org). > > > > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From brett at python.org Mon Oct 30 16:40:42 2017 From: brett at python.org (Brett Cannon) Date: Mon, 30 Oct 2017 20:40:42 +0000 Subject: [Python-ideas] install pip packages from Python prompt In-Reply-To: References: <01f701d350eb$c4ee6210$4ecb2630$@sdamon.com> <021601d350ee$88f236d0$9ad6a470$@sdamon.com> Message-ID: On Mon, 30 Oct 2017 at 03:36 Erik Bray wrote: > On Mon, Oct 30, 2017 at 11:27 AM, Erik Bray wrote: > > On Sun, Oct 29, 2017 at 8:45 PM, Alex Walters > wrote: > >> Then those users have more fundamental problems. There is a minimum > level > >> of computer knowledge needed to be successful in programming. > Insulating > >> users from the reality of the situation is not preparing them to be > >> successful. Pretending that there is no system command prompt, or > shell, or > >> whatever platform specific term applies, only hurts new programmers. > Give > >> users an error message they can google, and they will be better off in > the > >> long run than they would be if we just ran pip for them. > > > > While I completely agree with this in principle, I think you > > overestimate the average beginner. Many beginners I've taught or > > helped, even if they can manage to get to the correct command prompt, > > often don't even know how to run the correct Python. They might often > > have multiple Pythons installed on their system--maybe they have > > Anaconda, maybe Python installed by homebrew, or a Python that came > > with an IDE like Spyder. If they're on OSX often running "python" > > from the command prompt gives the system's crippled Python 2.6 and > > they don't know the difference. > > > I should add--another case that is becoming extremely common is > beginners learning Python for the first time inside the > Jupyter/IPython Notebook. And in my experience it can be very > difficult for beginners to understand the connection between what's > happening in the notebook ("it's in the web-browser--what does that > have to do with anything on my computer??") and the underlying Python > interpreter, file system, etc. Being able to pip install from within > the Notebook would be a big win. This is already possible since > IPython allows running system commands and it is possible to run the > pip executable from the notebook, then manually restart the Jupyter > kernel. > > It's not 100% clear to me how my proposal below would work within a > Jupyter Notebook, so that would also be an angle worth looking into. > I'm -1 on this as I view it as a tooling issue, not a language issue. If you're teaching in an environment where you don't want to instruct on the differences between the REPL and a command prompt, then that suggests to me that the environment you're presenting students needs to be different rather than asking Python to work around your teaching environment (remember, any time you ask for a change in Python you're asking the change to be applied to literally millions of developers). I also wouldn't want to tie Python-the-language to pip-the-tool so tightly. While we may make sure that pip is available for convenience, Python as a language is not dependent on pip being installed in order to function. Adding something that shells out to pip wouldn't suddenly change that relationship between language and tool. 
-Brett > > Best, > Erik > > > > One thing that has been a step in the right direction is moving more > > documentation toward preferring running `python -m pip` over just > > `pip`, since this often has a better guarantee of running `pip` in the > > Python interpreter you intended. But that still requires one to know > > how to run the correct Python interpreter from the command-line (which > > the newbie double-clicking on IDLE may not even have a concept of...). > > > > While I agree this is something that is important for beginners to > > learn (e.g. print(sys.executable) if in doubt), it *is* a high bar for > > many newbies just to install one or two packages from pip, which they > > often might need/want to do for whatever educational pursuit they're > > following (heck, it's pretty common even just to want to install the > > `requests` module, as I would never throw `urllib` at a beginner). > > > > So while I don't think anything proposed here will work technically, I > > am in favor of an in-interpreter pip install functionality. Perhaps > > it could work something like this: > > > > a) Allow it *only* in interactive mode: running `pip(...)` (or > > whatever this looks like) outside of interactive mode raises a > > `RuntimeError` with the appropriate documentation > > b) When running `pip(...)` the user is supplied with an interactive > > prompt explaining that since installing packages with `pip()` can > > result in changes to the interpreter, it is necessary to restart the > > interpreter after installation--give them an opportunity to cancel the > > action in case they have any work they need to save. If they proceed, > > install the new package then restart the interpreter for them. This > > avoids any ambiguity as to states of loaded modules before/after pip > > install. > > > > > > > >> From: Stephan Houben [mailto:stephanh42 at gmail.com] > >> Sent: Sunday, October 29, 2017 3:43 PM > >> To: Alex Walters > >> Cc: Python-Ideas > >> Subject: Re: [Python-ideas] install pip packages from Python prompt > >> > >> > >> > >> Hi Alex, > >> > >> > >> > >> 2017-10-29 20:26 GMT+01:00 Alex Walters : > >> > >> return ?Please run pip from your system command prompt? > >> > >> > >> > >> > >> > >> The target audience for my proposal are people who do not know > >> > >> which part of the sheep the "system command prompt" is. > >> > >> Stephan > >> > >> > >> > >> > >> > >> From: Python-ideas > >> [mailto:python-ideas-bounces+tritium-list=sdamon.com at python.org] On > Behalf > >> Of Stephan Houben > >> Sent: Sunday, October 29, 2017 3:19 PM > >> To: Python-Ideas > >> Subject: [Python-ideas] install pip packages from Python prompt > >> > >> > >> > >> Hi all, > >> > >> Here is in somewhat more detail my earlier proposal for > >> > >> having in the interactive Python interpreter a `pip` function to > >> > >> install packages from Pypi. > >> > >> Motivation: it appears to me that there is a category of newbies > >> > >> for which "open a shell and do `pip whatever`" is a bit too much. > >> > >> It would, in my opinion, simplify things a bit if they could just > >> > >> copy-and-paste some text into the Python interpreter and have > >> > >> some packages from pip installed. > >> > >> That would simplify instructions on how to install package xyz, > >> > >> without going into the vagaries of how to open a shell on various > >> > >> platforms, and how to get to the right pip executable. 
> >> > >> I think this could be as simple as: > >> > >> def pip(args): > >> import sys > >> import subprocess > >> subprocess.check_call([sys.executable, "-m", "pip"] + > args.split()) > >> > >> print("Please re-start Python now to use installed or upgraded > >> packages.") > >> > >> Note that I added the final message about restarting the interpreter > >> > >> as a low-tech solution to the problem of packages being already > >> > >> imported in the current Python session. > >> > >> I would imagine that the author of package xyz would then put on > >> > >> their webpage something like: > >> > >> To use, enter in your Python interpreter: > >> > >> pip("install xyz --user") > >> > >> As another example, consider prof. Baldwin from Woolamaloo university > >> > >> who teaches a course "Introductory Python programming for Sheep > Shavers". > >> > >> In his course material, he instructs his students to execute the > >> > >> following line in their Python interpreter. > >> > >> pip("install woolamaloo-sheepshavers-goodies --user") > >> > >> which will install a package which will in turn, as dependencies, > >> > >> pull in a number of packages which are relevant for sheep shaving but > >> > >> which have nevertheless irresponsibly been left outside the stdlib. > >> > >> Stephan > >> > >> > >> > >> > >> > >> > >> > >> > >> _______________________________________________ > >> Python-ideas mailing list > >> Python-ideas at python.org > >> https://mail.python.org/mailman/listinfo/python-ideas > >> Code of Conduct: http://python.org/psf/codeofconduct/ > >> > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From cs at cskk.id.au Mon Oct 30 16:51:02 2017 From: cs at cskk.id.au (Cameron Simpson) Date: Tue, 31 Oct 2017 07:51:02 +1100 Subject: [Python-ideas] Add single() to itertools In-Reply-To: References: Message-ID: <20171030205102.GA8894@cskk.homeip.net> On 30Oct2017 07:32, Guido van Rossum wrote: >This is a key example of a case where code speaks. Can you write an >implementation of how you would want single() to work in Python code? Myself, I'm not advocating for putting such a thing in itertools. However, I do have an equivalent utility function of my own that makes for more readable code, named "the". It sees far less use than I'd imagined it would, but it does read nicely to my eye when used. I have a few select-something from data where there _can_ be multiple hits i.e. the data format/structure support multiple matching results, but the caller's use case requires just one hit or failure. So I have some fuzzy-db-lookup functions which end with: return the(rows) or include: row = the(rows) and some HTML find-this-DOM-node code which ends with: return the(nodes) It's this kind of thing that expresses my intent better than the: node, = nodes return node idiom. And as remarked, you can embed the() in an expression. I don't think it ranks in "belongs in the stdlib". I do keep it about in a module for ready use though. If nothing else, it raises IndexErrors with distinct text for 0 and >1 failure. 
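Roughly, the helper looks something like this (a simplified sketch rather than my exact code; the error messages below are only illustrative):

    def the(iterable):
        # Expect exactly one item; complain distinctly about 0 or >1.
        it = iter(iterable)
        try:
            item = next(it)
        except StopIteration:
            raise IndexError("the(): got 0 items, expected exactly 1") from None
        try:
            next(it)
        except StopIteration:
            return item
        raise IndexError("the(): got more than 1 item, expected exactly 1")

so `row = the(rows)` either hands back the single row or fails with a message that says which way it went wrong.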
Cheers, Cameron Simpson (formerly cs at zip.com.au)
From steve at pearwood.info Mon Oct 30 21:09:09 2017 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 31 Oct 2017 12:09:09 +1100 Subject: [Python-ideas] Add single() to itertools In-Reply-To: <20171030205102.GA8894@cskk.homeip.net> References: <20171030205102.GA8894@cskk.homeip.net> Message-ID: <20171031010909.GZ9068@ando.pearwood.info> On Tue, Oct 31, 2017 at 07:51:02AM +1100, Cameron Simpson wrote: > return the(nodes) > > It's this kind of thing that expresses my intent better than the: > > node, = nodes > return node > > idiom. If the intent is to indicate that there is only one node, then "the(nodes)" fails completely. "The" can refer to plurals as easily as singular: "Wash the dirty clothes." (Later) "Why did you only wash one sock?" The simplest implementation of this "single()" function I can think of would be: def single(iterable): result, = iterable return result That raises ValueError if iterable has too few or too many items, which I believe is the right exception to use. Conceptually, there's no indexing involved, so IndexError would be the wrong exception to use. We're expecting a compound value (an iterable) with exactly one item. If there's not exactly one item, that's a ValueError. -- Steve
From chris.barker at noaa.gov Mon Oct 30 21:12:31 2017 From: chris.barker at noaa.gov (Chris Barker - NOAA Federal) Date: Mon, 30 Oct 2017 18:12:31 -0700 Subject: [Python-ideas] install pip packages from Python prompt In-Reply-To: References: <01f701d350eb$c4ee6210$4ecb2630$@sdamon.com> <021601d350ee$88f236d0$9ad6a470$@sdamon.com> Message-ID: <-738935701966126035@unknownmsgid> It's not 100% clear to me how my proposal below would work within a > Jupyter Notebook, so that would also be an angle worth looking into. > I'm -1 on this as I view it as a tooling issue, not a language issue. Agreed. And for the tool at hand, the notebook already controls the python interpreter -- it could have a "package installer" that you could run from the notebook, but NOT with python code in the notebook. If the Jupyter developers thought it was a good idea. I also wouldn't want to tie Python-the-language to pip-the-tool so tightly. Exactly, I really wouldn't want my students pip installing stuff from inside the REPL in the conda-based environment I've set up for them.... -CHB -------------- next part -------------- An HTML attachment was scrubbed... URL:
From vano at mail.mipt.ru Tue Oct 31 00:50:33 2017 From: vano at mail.mipt.ru (Ivan Pozdeev) Date: Tue, 31 Oct 2017 07:50:33 +0300 Subject: [Python-ideas] Add single() to itertools In-Reply-To: References: <2e42329e-294c-0143-de80-43617d658866@mail.mipt.ru> Message-ID: <91c972d3-486b-60fd-69a3-c146ae634fa8@mail.mipt.ru> On 30.10.2017 17:32, Guido van Rossum wrote: > This is a key example of a case where code speaks. Can you write an > implementation of how you would want single() to work in Python code? > > On Mon, Oct 30, 2017 at 2:49 AM, Ivan Pozdeev via Python-ideas > > wrote: > > The initial post on the above link summarizes the suggested > implementation pretty well.
> |defsingle(i): try: ||v =i.next() |||exceptStopIteration:||||raiseException('No values')|||try: ||i.next() ||exceptStopIteration: ||returnv||else: ||raiseException('Too many values')| ||printsingle(name forname in('bob','fred')ifname=='bob')||| | || -- Regards, Ivan From rosuav at gmail.com Tue Oct 31 00:56:10 2017 From: rosuav at gmail.com (Chris Angelico) Date: Tue, 31 Oct 2017 15:56:10 +1100 Subject: [Python-ideas] Add single() to itertools In-Reply-To: <91c972d3-486b-60fd-69a3-c146ae634fa8@mail.mipt.ru> References: <2e42329e-294c-0143-de80-43617d658866@mail.mipt.ru> <91c972d3-486b-60fd-69a3-c146ae634fa8@mail.mipt.ru> Message-ID: On Tue, Oct 31, 2017 at 3:50 PM, Ivan Pozdeev via Python-ideas wrote: > On 30.10.2017 17:32, Guido van Rossum wrote: >> >> This is a key example of a case where code speaks. Can you write an >> implementation of how you would want single() to work in Python code? >> >> On Mon, Oct 30, 2017 at 2:49 AM, Ivan Pozdeev via Python-ideas >> > wrote: >> >> The initial post on the above link summarizes the suggested >> implementation pretty well. >> > |defsingle(i): try: ||v =i.next() > |||exceptStopIteration:||||raiseException('No values')|||try: ||i.next() > ||exceptStopIteration: ||returnv||else: ||raiseException('Too many values')| > ||printsingle(name forname in('bob','fred')ifname=='bob')||| | > > || raise WhitespaceDamagedException from None ChrisA From njs at pobox.com Tue Oct 31 01:37:47 2017 From: njs at pobox.com (Nathaniel Smith) Date: Mon, 30 Oct 2017 22:37:47 -0700 Subject: [Python-ideas] install pip packages from Python prompt In-Reply-To: References: <01f701d350eb$c4ee6210$4ecb2630$@sdamon.com> <021601d350ee$88f236d0$9ad6a470$@sdamon.com> Message-ID: On Mon, Oct 30, 2017 at 10:25 AM, Alexander Belopolsky wrote: > On Mon, Oct 30, 2017 at 11:44 AM, Nick Coghlan wrote: > .. >> 3. We can't replicate it as readily in the regular REPL, since that runs >> Python code directly in the current process, but even there I believe we >> could potentially trigger a full process restart via execve (or the C++ >> style _execve on Windows) > > This exact problem is solved rather elegantly in Julia. When you > upgrade a package that is already loaded in the REPL, it prints a > warning: > > "The following packages have been updated but were already imported: > ... Restart Julia to use the updated versions." > > listing the affected packages. > > See . This seems like the obvious solution to me too. Pip knows exactly which files it modified. The interpreter knows which packages have been imported. Having the REPL provide a friendly interface that ran pip and then compared the lists would need some coordination between the projects but wouldn't be rocket science, and would be *much* more new-user-friendly than the current system. (Also, I'm kind of grossed out by the attitude that it's a good thing to drive people away by giving a bad first impression. Sure the shell is worth learning, but it can wait until you actually need it. If you make people fail for opaque reasons on basic tasks then the lesson they learn isn't "oh I need to learn the shell", it's "oh I must be stupid / maybe girls really can't do programming / I should give up".) If you want to support conda too then cool, conda can install a site.py that provides a conda() builtin that uses the same machinery. -n -- Nathaniel J. 
Smith -- https://vorpus.org From njs at pobox.com Tue Oct 31 02:02:08 2017 From: njs at pobox.com (Nathaniel Smith) Date: Mon, 30 Oct 2017 23:02:08 -0700 Subject: [Python-ideas] Add processor generation to wheel metadata In-Reply-To: <1a58d84a-23e0-3533-417e-b2f27090ddc1@mail.mipt.ru> References: <1a58d84a-23e0-3533-417e-b2f27090ddc1@mail.mipt.ru> Message-ID: On Mon, Oct 30, 2017 at 5:45 AM, Ivan Pozdeev via Python-ideas wrote: > Generally, packages are compiled for the same processor generation as the > corresponding Python. > But not always -- e.g. NumPy opted for SSE2 even for Py2 to work around some > compiler bug > (https://github.com/numpy/numpy/issues/6428). > I was bitten by that at an old machine once and found out that there is no > way for `pip' to have checked for that. > Besides, performance-oriented packages like the one mentioned could probably > benefit from newer instructions. You should probably resend this to distutils-sig instead of python-ideas -- that's where discussions about python packaging happen. (Python-ideas is more for discussions about the language itself.) -n -- Nathaniel J. Smith -- https://vorpus.org From vano at mail.mipt.ru Tue Oct 31 02:18:48 2017 From: vano at mail.mipt.ru (Ivan Pozdeev) Date: Tue, 31 Oct 2017 09:18:48 +0300 Subject: [Python-ideas] Python-ideas Digest, Vol 131, Issue 106 In-Reply-To: References: Message-ID: <3d5bb8b1-246c-74d0-2944-9ced462ee6bb@mail.mipt.ru> On 31.10.2017 8:37, python-ideas-request at python.org wrote: > On Tue, Oct 31, 2017 at 3:50 PM, Ivan Pozdeev via Python-ideas > wrote: >> On 30.10.2017 17:32, Guido van Rossum wrote: >>> This is a key example of a case where code speaks. Can you write an >>> implementation of how you would want single() to work in Python code? >>> >>> On Mon, Oct 30, 2017 at 2:49 AM, Ivan Pozdeev via Python-ideas >>> > wrote: >>> >>> The initial post on the above link summarizes the suggested >>> implementation pretty well. >>> >> |defsingle(i): try: ||v =i.next() >> |||exceptStopIteration:||||raiseException('No values')|||try: ||i.next() >> ||exceptStopIteration: ||returnv||else: ||raiseException('Too many values')| >> ||printsingle(name forname in('bob','fred')ifname=='bob')||| | >> >> || > raise WhitespaceDamagedException from None > > ChrisA Thunderbird jerked on me big time. It never did anything like this before! Switched off Digest mode, individual messages aren't so complicated. def single(i): ??? try: ??????? v =i.next() ??? except StopIteration: ??????? raise ValueError('No items') ??? try: ??????? i.next() ??? except StopIteration: ??????? return v ??? else: ??????? raise ValueError('More than one item') print single(name for name in('bob','fred') if name=='bob') -- Regards, Ivan -------------- next part -------------- An HTML attachment was scrubbed... URL: From rosuav at gmail.com Tue Oct 31 03:02:34 2017 From: rosuav at gmail.com (Chris Angelico) Date: Tue, 31 Oct 2017 18:02:34 +1100 Subject: [Python-ideas] Python-ideas Digest, Vol 131, Issue 106 In-Reply-To: <3d5bb8b1-246c-74d0-2944-9ced462ee6bb@mail.mipt.ru> References: <3d5bb8b1-246c-74d0-2944-9ced462ee6bb@mail.mipt.ru> Message-ID: On Tue, Oct 31, 2017 at 5:18 PM, Ivan Pozdeev via Python-ideas wrote: >> raise WhitespaceDamagedException from None > > > Thunderbird jerked on me big time. It never did anything like this before! > Switched off Digest mode, individual messages aren't so complicated. 
> > def single(i): > try: > v =i.next() > except StopIteration: > raise ValueError('No items') > try: > i.next() > except StopIteration: > return v > else: > raise ValueError('More than one item') > > print single(name for name in('bob','fred') if name=='bob') > Thanks :) One small change: If you use next(i) instead of i.next(), your code should work on both Py2 and Py3. But other than that, I think it's exactly the same as most people would expect of this function. ChrisA From steve at pearwood.info Tue Oct 31 03:46:48 2017 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 31 Oct 2017 18:46:48 +1100 Subject: [Python-ideas] Python-ideas Digest, Vol 131, Issue 106 In-Reply-To: References: <3d5bb8b1-246c-74d0-2944-9ced462ee6bb@mail.mipt.ru> Message-ID: <20171031074647.GB9068@ando.pearwood.info> On Tue, Oct 31, 2017 at 06:02:34PM +1100, Chris Angelico wrote: > > def single(i): > > try: > > v =i.next() > > except StopIteration: > > raise ValueError('No items') > > try: > > i.next() > > except StopIteration: > > return v > > else: > > raise ValueError('More than one item') > > > > print single(name for name in('bob','fred') if name=='bob') Seems like an awfully complicated way to do by hand what Python already does for you with sequence unpacking. Why re-invent the wheel? > Thanks :) > > One small change: If you use next(i) instead of i.next(), your code > should work on both Py2 and Py3. But other than that, I think it's > exactly the same as most people would expect of this function. Not me. As far as I can tell, that's semantically equivalent to: def single(i): result, = i return result apart from slightly different error messages. -- Steve From k7hoven at gmail.com Tue Oct 31 03:53:54 2017 From: k7hoven at gmail.com (Koos Zevenhoven) Date: Tue, 31 Oct 2017 09:53:54 +0200 Subject: [Python-ideas] Add single() to itertools In-Reply-To: <91c972d3-486b-60fd-69a3-c146ae634fa8@mail.mipt.ru> References: <2e42329e-294c-0143-de80-43617d658866@mail.mipt.ru> <91c972d3-486b-60fd-69a3-c146ae634fa8@mail.mipt.ru> Message-ID: > > > |defsingle(i): try: ||v =i.next() |||exceptStopIteration:||||raiseException('No > values')|||try: ||i.next() ||exceptStopIteration: ||returnv||else: > ||raiseException('Too many values')| > ||printsingle(name forname in('bob','fred')ifname=='bob')||| | > > ?Now that looks seriously weird. Oh wait, I know, it must be a regular expression! Perhaps mixed with Perl? To figure out what it does, we could try compiling it and throwing input at it, or perhaps more simply by just reverse engineering the implementation. ??Koos? ?? -- + Koos Zevenhoven + http://twitter.com/k7hoven + -------------- next part -------------- An HTML attachment was scrubbed... URL: From rosuav at gmail.com Tue Oct 31 04:01:34 2017 From: rosuav at gmail.com (Chris Angelico) Date: Tue, 31 Oct 2017 19:01:34 +1100 Subject: [Python-ideas] Python-ideas Digest, Vol 131, Issue 106 In-Reply-To: <20171031074647.GB9068@ando.pearwood.info> References: <3d5bb8b1-246c-74d0-2944-9ced462ee6bb@mail.mipt.ru> <20171031074647.GB9068@ando.pearwood.info> Message-ID: On Tue, Oct 31, 2017 at 6:46 PM, Steven D'Aprano wrote: > On Tue, Oct 31, 2017 at 06:02:34PM +1100, Chris Angelico wrote: >> One small change: If you use next(i) instead of i.next(), your code >> should work on both Py2 and Py3. But other than that, I think it's >> exactly the same as most people would expect of this function. > > Not me. 
As far as I can tell, that's semantically equivalent to: > > def single(i): > result, = i > return result > > apart from slightly different error messages. I saw the original code as being like the itertools explanatory functions - you wouldn't actually USE those functions, but they tell you what's going on when you use the simpler, faster, more compact form. ChrisA
From k7hoven at gmail.com Tue Oct 31 04:54:41 2017 From: k7hoven at gmail.com (Koos Zevenhoven) Date: Tue, 31 Oct 2017 10:54:41 +0200 Subject: [Python-ideas] Python-ideas Digest, Vol 131, Issue 106 In-Reply-To: References: <3d5bb8b1-246c-74d0-2944-9ced462ee6bb@mail.mipt.ru> <20171031074647.GB9068@ando.pearwood.info> Message-ID: On Tue, Oct 31, 2017 at 10:01 AM, Chris Angelico wrote: > On Tue, Oct 31, 2017 at 6:46 PM, Steven D'Aprano > wrote: > > On Tue, Oct 31, 2017 at 06:02:34PM +1100, Chris Angelico wrote: > >> One small change: If you use next(i) instead of i.next(), your code > >> should work on both Py2 and Py3. But other than that, I think it's > >> exactly the same as most people would expect of this function. > > > > Not me. As far as I can tell, that's semantically equivalent to: > > > > def single(i): > > result, = i > > return result > > > > apart from slightly different error messages. > > I saw the original code as being like the itertools explanatory > functions - you wouldn't actually USE those functions, but they tell > you what's going on when you use the simpler, faster, more compact > form. > I wonder if that's more easily understood if you write it along these line(s): (the_bob,) = (name for name in ('bob','fred') if name=='bob') People need to learn about how to make a 1-tuple quite early on anyway, and omitting the parentheses doesn't really help there, AFAICT. Then again, the idiom looks even better when doing a, b = find_complex_roots(polynomial_of_second_order) Except of course that I couldn't really come up with a good example of something that is expected to find exactly two values from a larger collection, and the students are already coming into the lecture hall. Or should it be (a, b,) = find_complex_roots(polynomial_of_second_order) ? -Koos -- + Koos Zevenhoven + http://twitter.com/k7hoven + -------------- next part -------------- An HTML attachment was scrubbed... URL:
From encukou at gmail.com Tue Oct 31 05:24:41 2017 From: encukou at gmail.com (Petr Viktorin) Date: Tue, 31 Oct 2017 10:24:41 +0100 Subject: [Python-ideas] Python-ideas Digest, Vol 131, Issue 106 In-Reply-To: References: <3d5bb8b1-246c-74d0-2944-9ced462ee6bb@mail.mipt.ru> <20171031074647.GB9068@ando.pearwood.info> Message-ID: On 10/31/2017 09:54 AM, Koos Zevenhoven wrote: > On Tue, Oct 31, 2017 at 10:01 AM, Chris Angelico > wrote: > > On Tue, Oct 31, 2017 at 6:46 PM, Steven D'Aprano > > wrote: > > On Tue, Oct 31, 2017 at 06:02:34PM +1100, Chris Angelico wrote: > >> One small change: If you use next(i) instead of i.next(), your code > >> should work on both Py2 and Py3. But other than that, I think it's > >> exactly the same as most people would expect of this function. > > > > Not me. As far as I can tell, that's semantically equivalent to: > > > > def single(i): > >     result, = i > >     return result > > > > apart from slightly different error messages. > > I saw the original code as being like the itertools explanatory > functions - you wouldn't actually USE those functions, but they tell > you what's going on when you use the simpler, faster, more compact > form.
> > > I wonder if that's more easily understood if you write it along these > line(s): > > (the_bob,) = (name for name in ('bob','fred') if name=='bob') There are (unfortunately) several ways to do it. I prefer one that avoids a trailing comma: [the_bob] = (name for name in ('bob','fred') if name=='bob')
From greg.ewing at canterbury.ac.nz Tue Oct 31 05:50:59 2017 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 31 Oct 2017 22:50:59 +1300 Subject: [Python-ideas] Add single() to itertools In-Reply-To: References: <2e42329e-294c-0143-de80-43617d658866@mail.mipt.ru> <91c972d3-486b-60fd-69a3-c146ae634fa8@mail.mipt.ru> <59F830C9.1000202@canterbury.ac.nz> Message-ID: <59F84783.1020101@canterbury.ac.nz> An HTML attachment was scrubbed... URL:
From k7hoven at gmail.com Tue Oct 31 06:17:02 2017 From: k7hoven at gmail.com (Koos Zevenhoven) Date: Tue, 31 Oct 2017 12:17:02 +0200 Subject: [Python-ideas] Python-ideas Digest, Vol 131, Issue 106 In-Reply-To: References: <3d5bb8b1-246c-74d0-2944-9ced462ee6bb@mail.mipt.ru> <20171031074647.GB9068@ando.pearwood.info> Message-ID: On Tue, Oct 31, 2017 at 11:24 AM, Petr Viktorin wrote: > On 10/31/2017 09:54 AM, Koos Zevenhoven wrote: >> >> >> I wonder if that's more easily understood if you write it along these >> line(s): >> >> (the_bob,) = (name for name in ('bob','fred') if name=='bob') >> > > There are (unfortunately) several ways to do it. I prefer one that avoids > a trailing comma: > > [the_bob] = (name for name in ('bob','fred') if name=='bob') > > Maybe it's just me, but somehow that list-like syntax as an assignment target feels wrong in somewhat the same way that (1, 2).append(3) does. -Koos PS. In your previous email, something (your email client?) removed the vertical line from the quoted Chris's email, so it looks like just an indented block. I wonder if a setting could fix that. -- + Koos Zevenhoven + http://twitter.com/k7hoven + -------------- next part -------------- An HTML attachment was scrubbed... URL:
From elazarg at gmail.com Tue Oct 31 06:31:50 2017 From: elazarg at gmail.com (Elazar) Date: Tue, 31 Oct 2017 10:31:50 +0000 Subject: [Python-ideas] Python-ideas Digest, Vol 131, Issue 106 In-Reply-To: References: <3d5bb8b1-246c-74d0-2944-9ced462ee6bb@mail.mipt.ru> <20171031074647.GB9068@ando.pearwood.info> Message-ID: On Tue, Oct 31, 2017 at 12:18 PM Koos Zevenhoven wrote: > On Tue, Oct 31, 2017 at 11:24 AM, Petr Viktorin wrote: > >> On 10/31/2017 09:54 AM, Koos Zevenhoven wrote: >>> >>> >>> I wonder if that's more easily understood if you write it along these >>> line(s): >>> >>> (the_bob,) = (name for name in ('bob','fred') if name=='bob') >>> >> >> There are (unfortunately) several ways to do it. I prefer one that avoids >> a trailing comma: >> >> [the_bob] = (name for name in ('bob','fred') if name=='bob') >> >> > Maybe it's just me, but somehow that list-like syntax as an assignment > target feels wrong in somewhat the same way that (1, 2).append(3) does. > > Off topic: why can't we simply allow something like this: (the_bob) = (name for name in ('bob','fred') if name=='bob') Why does Python treat the parenthesis at the LHS as grouping parens? operators are not allowed anyway; (a + (b + c)) = [1] is syntax error. Currently (x) = 1 works, but I can't see why should it. Elazar -------------- next part -------------- An HTML attachment was scrubbed... 
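To make the comparison concrete, this is how the competing spellings behave in current Python; the snippet is only an illustration of the semantics being discussed, reusing the names from the thread:

    names = ('bob', 'fred')

    # All three of these are single-element unpacking: they require the
    # right-hand side to produce exactly one item and raise ValueError
    # otherwise.
    (the_bob,) = (name for name in names if name == 'bob')
    [the_bob] = (name for name in names if name == 'bob')
    the_bob, = (name for name in names if name == 'bob')

    # Parentheses alone do not make a tuple target, so this is an ordinary
    # assignment: it binds the generator object itself, with no unpacking
    # and no error, which is the behaviour Elazar is asking about.
    (not_a_bob) = (name for name in names if name == 'bob')

    print(the_bob)      # bob
    print(not_a_bob)    # <generator object <genexpr> at 0x...>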
URL: From ncoghlan at gmail.com Tue Oct 31 07:42:24 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 31 Oct 2017 21:42:24 +1000 Subject: [Python-ideas] Defining an easily installable "Recommended baseline package set" In-Reply-To: References: Message-ID: On 31 October 2017 at 02:29, Guido van Rossum wrote: > What's your proposed process to arrive at the list of recommended packages? > I'm thinking it makes the most sense to treat inclusion in the recommended packages list as a possible outcome of proposals for standard library inclusion, rather than being something we'd provide a way to propose specifically. We'd only use it in cases where a proposal would otherwise meet the criteria for stdlib inclusion, but the logistics of actually doing so don't work for some reason. Running the initial 5 proposals through that filter: * six: a cross-version compatibility layer clearly needs to be outside the standard library * setuptools: we want to update this in line with the PyPA interop specs, not the Python language version * cffi: updates may be needed for PyPA interop specs, Python implementation updates or C language definition updates * requests: updates are more likely to be driven by changes in network protocols and client platform APIs than Python language changes * regex: we don't want two regex engines in the stdlib, transparently replacing _sre would be difficult, and _sre is still good enough for most purposes Of the 5, I'd suggest that regex is the only one that could potentially still make its way into the standard library some day - it would just require someone with both the time and inclination to create a CPython variant that used _regex instead of _sre as the default regex engine, and then gathered evidence to show that it was "compatible enough" with _sre to serve as the default engine for CPython. For the first four, there are compelling arguments that their drivers for new feature additions are such that their release cycles shouldn't ever be tied to the rate at which we update the Python language definition. > And is it really just going to be a list of names, or is there going to be > some documentation (about the vetting, not about the contents of the > packages) for each name? > I'm thinking a new subsection in https://docs.python.org/devguide/stdlibchanges.html for "Recommended Third Party Packages" would make sense, covering what I wrote above. It also occurred to me that since the recommendations are independent of the Python version, they don't really belong in the version specific documentation. While the Developer's Guide isn't really the right place for the list either (except as an easier way to answer "Why isn't in the standard library?" questions), it could be a good interim option until I get around to actually writing a first draft of https://github.com/python/redistributor-guide/ (which I was talking to Barry about at the dev sprint, but didn't end up actually creating any content for since I went down a signal handling rabbit hole instead). Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia -------------- next part -------------- An HTML attachment was scrubbed... URL: From jsbueno at python.org.br Tue Oct 31 08:42:23 2017 From: jsbueno at python.org.br (Joao S. O. 
Bueno) Date: Tue, 31 Oct 2017 10:42:23 -0200 Subject: [Python-ideas] Add single() to itertools In-Reply-To: <20171031010909.GZ9068@ando.pearwood.info> References: <20171030205102.GA8894@cskk.homeip.net> <20171031010909.GZ9068@ando.pearwood.info> Message-ID: When I need something like this, I usually drop a line on the module namespace that goes like: first = lambda x: next(iter(x)) On 30 October 2017 at 23:09, Steven D'Aprano wrote: > On Tue, Oct 31, 2017 at 07:51:02AM +1100, Cameron Simpson wrote: > >> return the(nodes) >> >> It's this kind of thing that expresses my intent better than the: >> >> node, = nodes >> return node >> >> idiom. > > If the intent is to indicate that there is only one node, then > "the(nodes)" fails completely. "The" can refer to plurals as easily as > singular: > > "Wash the dirty clothes." > (Later) "Why did you only wash one sock?" > > > The simplest implementation of this "single()" function I can think of > would be: > > def single(iterable): > result, = iterable > return result > > > That raises ValueError if iterable has too few or too many items, which > I believe is the right exception to use. Conceptually, there's no > indexing involved, so IndexError would be the wrong exception to use. > We're expecting a compound value (an iterable) with exactly one item. If > there's not exactly one item, that's a ValueError. > > > > -- > Steve > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/
From steve at pearwood.info Tue Oct 31 08:52:31 2017 From: steve at pearwood.info (Steven D'Aprano) Date: Tue, 31 Oct 2017 23:52:31 +1100 Subject: [Python-ideas] Add single() to itertools In-Reply-To: References: <20171030205102.GA8894@cskk.homeip.net> <20171031010909.GZ9068@ando.pearwood.info> Message-ID: <20171031125227.GD9068@ando.pearwood.info> On Tue, Oct 31, 2017 at 10:42:23AM -0200, Joao S. O. Bueno wrote: > When I need something like this, I usually drop a line on the module > namespace that goes like: > > first = lambda x: next(iter(x)) That doesn't meet the requirement that x has ONLY one item. And using lambda like that is bad style. This would be better: def first(x): return next(iter(x)) and now first has a proper __name__. -- Steve
From jsbueno at python.org.br Tue Oct 31 08:58:30 2017 From: jsbueno at python.org.br (Joao S. O. Bueno) Date: Tue, 31 Oct 2017 10:58:30 -0200 Subject: [Python-ideas] Add single() to itertools In-Reply-To: <20171031125227.GD9068@ando.pearwood.info> References: <20171030205102.GA8894@cskk.homeip.net> <20171031010909.GZ9068@ando.pearwood.info> <20171031125227.GD9068@ando.pearwood.info> Message-ID: On 31 October 2017 at 10:52, Steven D'Aprano wrote: > On Tue, Oct 31, 2017 at 10:42:23AM -0200, Joao S. O. Bueno wrote: >> When I need something like this, I usually drop a line on the module >> namespace that goes like: >> >> first = lambda x: next(iter(x)) > > That doesn't meet the requirement that x has ONLY one item. > > And using lambda like that is bad style. This would be better: > > def first(x): return next(iter(x)) > > and now first has a proper __name__. I know that. But then, I'd rather write it as 3-4 lines in some utils module. So, although I was initially -1 to -0 on this suggestion, maybe it has a point.
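The distinction Steven is drawing can be shown in a few lines; this is an illustrative sketch rather than anyone's proposed API:

    def first(x):
        # Takes the first item and silently ignores any extras.
        return next(iter(x))

    def single(x):
        # Demands exactly one item, via the unpacking idiom above.
        result, = x
        return result

    print(first([1, 2, 3]))   # 1 -- the extra items go unnoticed
    print(single([1]))        # 1
    # single([1, 2, 3]) -> ValueError: too many values to unpack (expected 1)
    # single([])        -> ValueError: not enough values to unpack (expected 1, got 0)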
> > > > -- > Steve > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From fakedme+py at gmail.com Tue Oct 31 09:03:23 2017 From: fakedme+py at gmail.com (Soni L.) Date: Tue, 31 Oct 2017 11:03:23 -0200 Subject: [Python-ideas] Add single() to itertools In-Reply-To: References: <20171030205102.GA8894@cskk.homeip.net> <20171031010909.GZ9068@ando.pearwood.info> <20171031125227.GD9068@ando.pearwood.info> Message-ID: On 2017-10-31 10:58 AM, Joao S. O. Bueno wrote: > On 31 October 2017 at 10:52, Steven D'Aprano wrote: >> On Tue, Oct 31, 2017 at 10:42:23AM -0200, Joao S. O. Bueno wrote: >>> When I need something like this, I usually rop a line on the module >>> namespace that goes like: >>> >>> first = lambda x: next(iter(x)) >> That doesn't meet the requirement that x has ONLY one item. >> >> And using lambda like that is bad style. This would be better: >> >> def first(x): return next(iter(x)) >> >> and now first has a proper __name__. > > I know that. But then, I'd rather write it as 3-4 lines in some utils module. > > So, although I was initially -1 to -0 on this suggestion, maybe it has a point. Plop this one-liner somewhere: exec('def single(x):\n [v] = x\n return v') > >> >> >> -- >> Steve >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ From ncoghlan at gmail.com Tue Oct 31 10:41:10 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 1 Nov 2017 00:41:10 +1000 Subject: [Python-ideas] install pip packages from Python prompt In-Reply-To: <045101d351b9$54a25240$fde6f6c0$@sdamon.com> References: <01f701d350eb$c4ee6210$4ecb2630$@sdamon.com> <021601d350ee$88f236d0$9ad6a470$@sdamon.com> <045101d351b9$54a25240$fde6f6c0$@sdamon.com> Message-ID: On 31 October 2017 at 05:57, Alex Walters wrote: > > While I completely agree with this in principle, I think you > > overestimate the average beginner. > > Nope. I totally get that they don?t know what a shell or command prompt > is. THEY. NEED. TO. LEARN. Hiding it is not a good idea for anyone. If > this is an insurmountable problem for the newbie, maybe they really > shouldn?t be attempting to program. This field is not for everyone. > We're not in the business of making judgements about who should and shouldn't become Python programmers - we're in the business of making sure that Python is accessible to as many people as possible by removing irrelevant barriers to adoption, whether that's translating documentation so folks can start learning with instructions in their native language, or making it possible for them to defer learning the idiosyncrasies of the Windows, Linux, and Mac OS X command line environments. 
On the latter front, the details of the development interfaces offered by traditional desktop operating systems may *never* be relevant to the new generation of folks coming through that are learning to program by manipulating remote coding environments on tablets and other app-centric devices, just as most developers nowadays are able to get by without even learning C, let alone any flavour of assembly language. Our role in this process isn't to create future generations that think and work in exactly the same ways we do, it's to enable them to discover new ways of working that build on top of whatever we create. Jupyter notebooks are a decent example of this, where the difference between a Python statement and a "command line statement" is just an exclamation mark at the beginning of the line - exactly where the backing environment lives is mostly a hidden implementation detail from the user's perspective. Eclipse Che and other online coding environments are another case - there, the "command line" is a different window inside the editor app (this is also going to be a familiar option for heavy IDE users on traditional desktop operating systems). And putting it in those terms makes me think that we should explicitly exclude the default REPL from consideration here, as we've long taken the position that that *isn't* a good teaching environment, and you certainly can't access it remotely without some kind of other service in front of it to manage the network connection (even if that service is just ssh). That means I now see a few potential RFEs from this thread: 1. An import system feature that allows a running Python program to report a timestamp (with the same granularity as pyc file timestamp headers) for *when* the currently loaded modules were last modified. This could be as simple as a new `__mtime__` attribute in each module to store that number. 2. A new importlib.util API to check for potentially out of date modules in sys.modules (those where a freshly calculated module mtime doesn't match the stored __mtime__ attribute) 3. Support in IDLE for Jupyter-style "!" commands 4. Having IDLE call that importlib API and warn about any stale modules after each command line operation The first two features would be about enabling learning environments to more easily detect when the currently loaded modules may not match what's actually on disk (hot reloaders already do this by watching for filesystem changes, but we're currently missing a simpler polling based alternative that will also pick up package updates). The second two would be about enhancing IDLE's capabilities in this area, as we *do* suggest that as a reasonable initial learning environment, even though there are also plenty of alternatives out there now. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia -------------- next part -------------- An HTML attachment was scrubbed... URL: From guido at python.org Tue Oct 31 10:53:08 2017 From: guido at python.org (Guido van Rossum) Date: Tue, 31 Oct 2017 07:53:08 -0700 Subject: [Python-ideas] Defining an easily installable "Recommended baseline package set" In-Reply-To: References: Message-ID: On Tue, Oct 31, 2017 at 4:42 AM, Nick Coghlan wrote: > On 31 October 2017 at 02:29, Guido van Rossum wrote: > >> What's your proposed process to arrive at the list of recommended >> packages? 
>> > > I'm thinking it makes the most sense to treat inclusion in the recommended > packages list as a possible outcome of proposals for standard library > inclusion, rather than being something we'd provide a way to propose > specifically. > I don't think that gets you off the hook for a process proposal. We need some criteria to explain why a module should be on the recommended list -- not just a ruling as to why it shouldn't be in the stdlib. > We'd only use it in cases where a proposal would otherwise meet the > criteria for stdlib inclusion, but the logistics of actually doing so don't > work for some reason. > But that would exclude most of the modules you mention below, since one of the criteria is that their development speed be matched with Python's release cycle. I think there must be some form of "popularity" combined with "best of breed". In particular I'd like to have a rule that explains why flask and Django would never make the list. (I don't know what that rule is, or I would tell you -- my gut tells me it's something to do with having their own community *and* competing for the same spot.) Running the initial 5 proposals through that filter: > > * six: a cross-version compatibility layer clearly needs to be outside the > standard library > Hm... Does six still change regularly? If not I think it *would* be a candidate for actual stdlib inclusion. Just like we added u"..." literals to Python 3.4. > * setuptools: we want to update this in line with the PyPA interop specs, > not the Python language version > But does that exclude stdlib inclusion? Why would those specs change, and why couldn't they wait for a new Python release? > * cffi: updates may be needed for PyPA interop specs, Python > implementation updates or C language definition updates > Hm, again, I don't recall that this was debated -- I think it's a failure that it's not in the stdlib. > * requests: updates are more likely to be driven by changes in network > protocols and client platform APIs than Python language changes > Here I agree. There's no alternative (except aiohttp, but that's asyncio-based) and it can't be in the stdlib because it's actively being developed. > * regex: we don't want two regex engines in the stdlib, transparently > replacing _sre would be difficult, and _sre is still good enough for most > purposes > I think this needn't be recommended at all. For 99.9% of regular expression uses, re is just fine. Let's just work on a strategy for introducing regex into the stdlib. > Of the 5, I'd suggest that regex is the only one that could potentially > still make its way into the standard library some day - it would just > require someone with both the time and inclination to create a CPython > variant that used _regex instead of _sre as the default regex engine, and > then gathered evidence to show that it was "compatible enough" with _sre to > serve as the default engine for CPython. > > For the first four, there are compelling arguments that their drivers for > new feature additions are such that their release cycles shouldn't ever be > tied to the rate at which we update the Python language definition. > As you can tell from my arguing, the reasons need to be written up in more detail. > And is it really just going to be a list of names, or is there going to be >> some documentation (about the vetting, not about the contents of the >> packages) for each name? 
>> > > I'm thinking a new subsection in https://docs.python.org/devguide/stdlibchanges.html for "Recommended Third Party Packages" would > make sense, covering what I wrote above. > That's too well hidden for my taste. > It also occurred to me that since the recommendations are independent of > the Python version, they don't really belong in the version specific > documentation. > But that doesn't mean they can't (also) be listed there. (And each probably has its version dependencies.) > While the Developer's Guide isn't really the right place for the list > either (except as an easier way to answer "Why isn't in the standard > library?" questions), it could be a good interim option until I get around > to actually writing a first draft of https://github.com/python/redistributor-guide/ (which I was talking to Barry about at the dev sprint, > but didn't end up actually creating any content for since I went down a > signal handling rabbit hole instead). > Hm, let's not put more arbitrary check boxes in the way of progress. Maybe it can be an informational PEP that's occasionally updated? -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL:
From chris.barker at noaa.gov Tue Oct 31 11:41:46 2017 From: chris.barker at noaa.gov (Chris Barker - NOAA Federal) Date: Tue, 31 Oct 2017 08:41:46 -0700 Subject: [Python-ideas] install pip packages from Python prompt In-Reply-To: References: <01f701d350eb$c4ee6210$4ecb2630$@sdamon.com> <021601d350ee$88f236d0$9ad6a470$@sdamon.com> <045101d351b9$54a25240$fde6f6c0$@sdamon.com> Message-ID: <4409775187778125080@unknownmsgid> > Nope. I totally get that they don't know what a shell or command prompt > is. THEY. NEED. TO. LEARN. Hiding it is not a good idea for anyone. I actually take this approach myself in my classes. However, I also have as prerequisites for my classes: Some Experience in some programming language And Basic familiarity with the command line. I then let them use whatever dev environment they want, while supporting and recommending a good editor and the command line. However, if people want to learn python that don't have those prerequisites, then we suggest a different class designed for total newbies. In THAT class, we use a more prescribed dev environment so that everyone is doing the same thing in the same way. It was IDLE, and has lately been PyCharm. And the intro to data analytics class uses Anaconda and the Jupyter notebook. My point? We're not in the business of making judgements about who should and shouldn't become Python programmers - we're in the business of making sure that Python is accessible to as many people as possible by removing irrelevant barriers to adoption, Sure, but who is "we"? I think "we" is the python community, not the cPython developers. So providing an environment that makes it easy and obvious to install packages is a great idea, but I think it's the job of IDEs and other higher level tools, not the REPL. If we need to add a feature to Python itself to make it easier for IDEs and the like to implement dynamic package adding, then by all means, let's do it. Final note: I DO see a lot of questions (on mailing lists, etc.) from folks that try to type "pip install something" at the python command line. whether that's translating documentation so folks can start learning with instructions in their native language, or making it possible for them to defer learning the idiosyncrasies of the Windows, Linux, and Mac OS X command line environments.
On the latter front, the details of the development interfaces offered by traditional desktop operating systems may *never* be relevant to the new generation of folks coming through that are learning to program by manipulating remote coding environments on tablets and other app-centric devices, just as most developers nowadays are able to get by without even learning C, let alone any flavour of assembly language. Our role in this process isn't to create future generations that think and work in exactly the same ways we do, it's to enable them to discover new ways of working that build on top of whatever we create. Jupyter notebooks are a decent example of this, where the difference between a Python statement and a "command line statement" is just an exclamation mark at the beginning of the line - exactly where the backing environment lives is mostly a hidden implementation detail from the user's perspective. Eclipse Che and other online coding environments are another case - there, the "command line" is a different window inside the editor app (this is also going to be a familiar option for heavy IDE users on traditional desktop operating systems). And putting it in those terms makes me think that we should explicitly exclude the default REPL from consideration here, as we've long taken the position that that *isn't* a good teaching environment, and you certainly can't access it remotely without some kind of other service in front of it to manage the network connection (even if that service is just ssh). That means I now see a few potential RFEs from this thread: 1. An import system feature that allows a running Python program to report a timestamp (with the same granularity as pyc file timestamp headers) for *when* the currently loaded modules were last modified. This could be as simple as a new `__mtime__` attribute in each module to store that number. 2. A new importlib.util API to check for potentially out of date modules in sys.modules (those where a freshly calculated module mtime doesn't match the stored __mtime__ attribute) 3. Support in IDLE for Jupyter-style "!" commands 4. Having IDLE call that importlib API and warn about any stale modules after each command line operation The first two features would be about enabling learning environments to more easily detect when the currently loaded modules may not match what's actually on disk (hot reloaders already do this by watching for filesystem changes, but we're currently missing a simpler polling based alternative that will also pick up package updates). The second two would be about enhancing IDLE's capabilities in this area, as we *do* suggest that as a reasonable initial learning environment, even though there are also plenty of alternatives out there now. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia _______________________________________________ Python-ideas mailing list Python-ideas at python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/ -------------- next part -------------- An HTML attachment was scrubbed... 
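A rough sketch of the stale-module check behind RFEs 1 and 2 above, written against today's Python since no per-module __mtime__ attribute exists yet; the function names are invented for the example, and modules without a real file on disk are simply skipped:

    import os
    import sys

    def _module_mtime(module):
        # Best-effort: use the mtime of the module's source or extension
        # file; builtins, frozen modules and namespace packages give None.
        origin = getattr(getattr(module, "__spec__", None), "origin", None)
        if origin and os.path.exists(origin):
            return os.path.getmtime(origin)
        return None

    def snapshot_module_mtimes():
        # Stand-in for the proposed per-module __mtime__ attribute.
        return {name: _module_mtime(mod)
                for name, mod in list(sys.modules.items())}

    def stale_modules(snapshot):
        # Names whose file on disk changed since the snapshot was taken,
        # e.g. because a "pip install --upgrade" replaced them.
        stale = []
        for name, old_mtime in snapshot.items():
            module = sys.modules.get(name)
            if module is None or old_mtime is None:
                continue
            if _module_mtime(module) not in (None, old_mtime):
                stale.append(name)
        return stale

A learning environment could take a snapshot before running the installer and warn that a restart is needed for anything reported by stale_modules() afterwards.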
URL: From chris.barker at noaa.gov Tue Oct 31 11:49:12 2017 From: chris.barker at noaa.gov (Chris Barker) Date: Tue, 31 Oct 2017 08:49:12 -0700 Subject: [Python-ideas] install pip packages from Python prompt In-Reply-To: <4409775187778125080@unknownmsgid> References: <01f701d350eb$c4ee6210$4ecb2630$@sdamon.com> <021601d350ee$88f236d0$9ad6a470$@sdamon.com> <045101d351b9$54a25240$fde6f6c0$@sdamon.com> <4409775187778125080@unknownmsgid> Message-ID: oops, hit the send button too soon... here's some more: On Tue, Oct 31, 2017 at 8:41 AM, Chris Barker - NOAA Federal < chris.barker at noaa.gov> wrote: > > >> Nope. I totally get that they don?t know what a shell or command prompt >> is. THEY. NEED. TO. LEARN. Hiding it is not a good idea for anyone. > > > I actually take this approach myself in my classes. However, I also have > as prerequisites for my classes: > > Some Experience in some programming language > > And > > Basic familiarity with the command line. > > I then let them use whatever dev. Environment they want, while supporting > and recommending a good editor and the command line. > > However, If people want to learn python that don?t have those > prerequisites, then we suggest a different class designed for total newbies. > > In THAT class, we use a more proscribed dev environment so that everyone > is doing the same thing in the same way. It was IDLE, and has lately been > PyCharm. > > And the intro to data analytics class uses Anaconda and the Jupyter > notebook. > > My point? > > We're not in the business of making judgements about who should and > shouldn't become Python programmers - we're in the business of making sure > that Python is accessible to as many people as possible by removing > irrelevant barriers to adoption, > > > Sure, but who is ?we?? I think ?we? is the python community, not the > cPython developers. > > So providing an environment that makes it easy and obvious to install > packages is a great idea, but I think it?s the job of IDEs and other higher > level tools, not the REPL. > > If we need to add a feature to Python itself to make it easier for IDEs > and the like to implement dynamic package adding, then by all means, let?s > do it. > > Final note: > > I DO see a lot of questions ( on mailing lists, etc) from folks that try > to type ?pip install something? at the python command line. > And sure, some of those are completly clueless about what a command line is and how to use it, but others DO have an idea about the command line, but dont know that: >>> pip install something File "", line 1 pip install something ^ SyntaxError: invalid syntax means: "this was supposed to be run at the command prompt" So I think defining a "pip" builtin that simply gave a helpful message would be a good start. (hmm, it's a syntax error, so not as simple as a builtin -- but it could be caught somehow to give a better message) At the end of the day, python is an open source programming language -- it simply is NOT ever going to provide one complete well integrated environment --we'll just have to live with that. -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... 
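One low-tech way to provide that friendlier message is sketched below, as something that could be dropped into a PYTHONSTARTUP file for the classic REPL. The wording and the excepthook-based approach are assumptions for illustration, not an existing CPython feature:

    import sys

    _default_excepthook = sys.excepthook

    def _pip_hint_excepthook(exc_type, exc, tb):
        # The classic REPL reports errors through sys.excepthook, so peek at
        # SyntaxErrors whose offending source line starts with "pip install".
        line = (getattr(exc, "text", "") or "").lstrip()
        if issubclass(exc_type, SyntaxError) and line.startswith(("pip install", "pip3 install")):
            print('"pip" is a separate program, not Python code.\n'
                  'Exit the interpreter and run it from your system shell, e.g.:\n'
                  '    python -m pip install <package>', file=sys.stderr)
        else:
            _default_excepthook(exc_type, exc, tb)

    sys.excepthook = _pip_hint_excepthook

IDLE and other shells that handle errors themselves would need their own hook, which is part of why this is only a sketch.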
URL: From ncoghlan at gmail.com Tue Oct 31 12:19:46 2017 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 1 Nov 2017 02:19:46 +1000 Subject: [Python-ideas] Defining an easily installable "Recommended baseline package set" In-Reply-To: References: Message-ID: On 1 November 2017 at 00:53, Guido van Rossum wrote: > On Tue, Oct 31, 2017 at 4:42 AM, Nick Coghlan wrote: > >> On 31 October 2017 at 02:29, Guido van Rossum wrote: >> >>> What's your proposed process to arrive at the list of recommended >>> packages? >>> >> >> I'm thinking it makes the most sense to treat inclusion in the >> recommended packages list as a possible outcome of proposals for standard >> library inclusion, rather than being something we'd provide a way to >> propose specifically. >> > > I don't think that gets you off the hook for a process proposal. We need > some criteria to explain why a module should be on the recommended list -- > not just a ruling as to why it shouldn't be in the stdlib. > The developer guide already has couple of sections on this aspect: * https://devguide.python.org/stdlibchanges/#acceptable-types-of-modules * https://devguide.python.org/stdlibchanges/#requirements I don't think either of those sections is actually quite right (since we've approved new modules that wouldn't meet them), but they're not terrible as a starting point in general, and they're accurate for the recommended packages use case. > We'd only use it in cases where a proposal would otherwise meet the >> criteria for stdlib inclusion, but the logistics of actually doing so don't >> work for some reason. >> > > But that would exclude most of the modules you mention below, since one of > the criteria is that their development speed be matched with Python's > release cycle. I think there must be some form of "popularity" combined > with "best of breed". In particular I'd like to have a rule that explains > why flask and Django would never make the list. (I don't know what that > rule is, or I would tell you -- my gut tells me it's something to do with > having their own community *and* competing for the same spot.) > The developer guide words this as "The module needs to be considered best-of-breed.". In some categories (like WSGI frameworks), there are inherent trade-offs that mean there will *never* be a single best-of-breed solution, since projects like Django, Flask, and Pyramid occupy deliberately different points in the space of available design decisions. Running the initial 5 proposals through that filter: >> >> * six: a cross-version compatibility layer clearly needs to be outside >> the standard library >> > > Hm... Does six still change regularly? If not I think it *would* be a > candidate for actual stdlib inclusion. Just like we added u"..." literals > to Python 3.4. > It still changes as folks porting new projects discover additional discrepancies between the 2.x and 3.x standard library layouts and behaviour (e.g. we found recently that the 3.x subprocess module's emulation of the old commands module APIs actually bit shifts the status codes relative to the 2.7 versions). The rate of change has definitely slowed down a lot since the early days, but it isn't zero. In addition, the only folks that need it are those that already care about older versions of Python - if you can start with whatever the latest version of Python is, and don't have any reason to support users still running older version, you can happily pretend those older versions don't exist, and hence don't need a compatibility library like six. 
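For reference, the bit-shift discrepancy mentioned above can be seen with a snippet like the following on a POSIX system (the exact return values are stated from memory and worth double-checking):

    import subprocess

    # Python 3 returns the plain exit code:
    print(subprocess.getstatusoutput("exit 3"))    # (3, '')

    # Python 2.7's commands.getstatusoutput("exit 3") returned the raw
    # wait()-style status instead, i.e. the exit code shifted left by
    # 8 bits: (768, '')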
As a separate library, it can just gracefully fade away as folks stop depending on it as they drop Python 2.7 support. By contrast, if we were to bring it into the 3.x standard library, then we'd eventually have to figure out when we could deprecate and remove it again. > * setuptools: we want to update this in line with the PyPA interop specs, >> not the Python language version >> > > But does that exclude stdlib inclusion? Why would those specs change, and > why couldn't they wait for a new Python release? > The specs mainly change when we want to offer publishers new capabilities while still maintaining compatibility with older installation clients (and vice-versa: we want folks still running Python 2.7 to be able to publish wheel files and use recently added metadata fields like Description-Content-Type). The reason we can't wait for new Python releases is because when we add such things, we need them to work on *all* supported Python releases (including 2.7 and security-release-only 3.x versions). There are also other drivers for setuptools updates, which include: - operating system build toolchain changes (e.g. finding new versions of Visual Studio or XCode) - changes to PyPI's operations (e.g. the legacy upload API getting turned off due to persistent service stability problems, switching to HTTPS only access) With setuptools as a separate project, a whole lot of package publication problems can be solved via "pip install --upgrade setuptools wheel" in a virtual environment, which is a luxury we don't have with plain distutils. > * cffi: updates may be needed for PyPA interop specs, Python > implementation updates or C language definition updates > > Hm, again, I don't recall that this was debated -- I think it's a failure > that it's not in the stdlib. > A couple of years ago I would have agreed with you, but I've spent enough time on packaging problems now to realise that cffi actually qualifies as a build tool due to the way it generates extension module wrappers when used in "out-of-line" mode. Being outside the standard library means that cffi still has significant flexibility to evolve how it separates its buildtime functionality from its runtime functionality, and may eventually be adjusted so that only "_cffi_backend" needs to be installed at runtime for the out-of-line compilation mode, without the full C header parsing and inline extension module compilation capabilities of CFFI itself (see https://cffi.readthedocs.io/en/latest/cdef.html#preparing-and-distributing-modules for the details). Being separate also means that cffi can be updated to generate more efficient code even for existing Python versions. > * requests: updates are more likely to be driven by changes in network >> protocols and client platform APIs than Python language changes >> > > Here I agree. There's no alternative (except aiohttp, but that's > asyncio-based) and it can't be in the stdlib because it's actively being > developed. > > >> * regex: we don't want two regex engines in the stdlib, transparently >> replacing _sre would be difficult, and _sre is still good enough for most >> purposes >> > > I think this needn't be recommended at all. For 99.9% of regular > expression uses, re is just fine. Let's just work on a strategy for > introducing regex into the stdlib. > Given your informational PEP suggestion below, I'd probably still include it, but in a separate section from the others (e.g. 
the others might be listed as "Misaligned Feature Release Cycles", which is an inherent logistical problem that no amount of coding can fix, while regex would instead be categorised as "Technical Challenges"). > Of the 5, I'd suggest that regex is the only one that could potentially >> still make its way into the standard library some day - it would just >> require someone with both the time and inclination to create a CPython >> variant that used _regex instead of _sre as the default regex engine, and >> then gathered evidence to show that it was "compatible enough" with _sre to >> serve as the default engine for CPython. >> >> For the first four, there are compelling arguments that their drivers for >> new feature additions are such that their release cycles shouldn't ever be >> tied to the rate at which we update the Python language definition. >> > > As you can tell from my arguing, the reasons need to be written up in more > detail. > > >> And is it really just going to be a list of names, or is there going to >>> be some documentation (about the vetting, not about the contents of the >>> packages) for each name? >>> >> >> I'm thinking a new subsection in https://docs.python.org/devgui >> de/stdlibchanges.html for "Recommended Third Party Packages" would make >> sense, covering what I wrote above. >> > > That's too well hidden for my taste. > > >> It also occurred to me that since the recommendations are independent of >> the Python version, they don't really belong in the version specific >> documentation. >> > > But that doesn't mean they can't (also) be listed there. (And each > probably has its version dependencies.) > > >> While the Developer's Guide isn't really the right place for the list >> either (except as an easier way to answer "Why isn't in the standard >> library?" questions), it could be a good interim option until I get around >> to actually writing a first draft of https://github.com/python/redi >> stributor-guide/ (which I was talking to Barry about at the dev sprint, >> but didn't end up actually creating any content for since I went down a >> signal handling rabbit hole instead). >> > > Hm, let's not put more arbitrary check boxes in the way of progress. Maybe > it can be an informational PEP that's occasionally updated? > If I'm correctly reading that as "Could the list of Recommended Third Party Packages be an informational PEP?", then I agree that's probably a good way to tackle it, since it will cover both the developer-centric "Why isn't this in the standard library yet?" aspect *and* the "Redistributors should probably provide this" aspect. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia -------------- next part -------------- An HTML attachment was scrubbed... URL: From levkivskyi at gmail.com Tue Oct 31 12:21:27 2017 From: levkivskyi at gmail.com (Ivan Levkivskyi) Date: Tue, 31 Oct 2017 17:21:27 +0100 Subject: [Python-ideas] install pip packages from Python prompt In-Reply-To: References: <01f701d350eb$c4ee6210$4ecb2630$@sdamon.com> <021601d350ee$88f236d0$9ad6a470$@sdamon.com> <045101d351b9$54a25240$fde6f6c0$@sdamon.com> <4409775187778125080@unknownmsgid> Message-ID: I think it was proposed several times before, but I just wanted to revive the idea that we could add a GUI interface to install/update packages from IDLE (maybe even with some package browser). -- Ivan -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From guido at python.org Tue Oct 31 12:26:27 2017 From: guido at python.org (Guido van Rossum) Date: Tue, 31 Oct 2017 09:26:27 -0700 Subject: [Python-ideas] Defining an easily installable "Recommended baseline package set" In-Reply-To: References: Message-ID: OK, go ahead and write the PEP! I'm actually happy with the responses you gave, so your last email will make a good start for some of the contents of the PEP. On Tue, Oct 31, 2017 at 9:19 AM, Nick Coghlan wrote: > On 1 November 2017 at 00:53, Guido van Rossum wrote: > >> On Tue, Oct 31, 2017 at 4:42 AM, Nick Coghlan wrote: >> >>> On 31 October 2017 at 02:29, Guido van Rossum wrote: >>> >>>> What's your proposed process to arrive at the list of recommended >>>> packages? >>>> >>> >>> I'm thinking it makes the most sense to treat inclusion in the >>> recommended packages list as a possible outcome of proposals for standard >>> library inclusion, rather than being something we'd provide a way to >>> propose specifically. >>> >> >> I don't think that gets you off the hook for a process proposal. We need >> some criteria to explain why a module should be on the recommended list -- >> not just a ruling as to why it shouldn't be in the stdlib. >> > > The developer guide already has couple of sections on this aspect: > > * https://devguide.python.org/stdlibchanges/#acceptable-types-of-modules > * https://devguide.python.org/stdlibchanges/#requirements > > I don't think either of those sections is actually quite right (since > we've approved new modules that wouldn't meet them), but they're not > terrible as a starting point in general, and they're accurate for the > recommended packages use case. > > >> We'd only use it in cases where a proposal would otherwise meet the >>> criteria for stdlib inclusion, but the logistics of actually doing so don't >>> work for some reason. >>> >> >> But that would exclude most of the modules you mention below, since one >> of the criteria is that their development speed be matched with Python's >> release cycle. I think there must be some form of "popularity" combined >> with "best of breed". In particular I'd like to have a rule that explains >> why flask and Django would never make the list. (I don't know what that >> rule is, or I would tell you -- my gut tells me it's something to do with >> having their own community *and* competing for the same spot.) >> > > The developer guide words this as "The module needs to be considered > best-of-breed.". In some categories (like WSGI frameworks), there are > inherent trade-offs that mean there will *never* be a single best-of-breed > solution, since projects like Django, Flask, and Pyramid occupy > deliberately different points in the space of available design decisions. > > Running the initial 5 proposals through that filter: >>> >>> * six: a cross-version compatibility layer clearly needs to be outside >>> the standard library >>> >> >> Hm... Does six still change regularly? If not I think it *would* be a >> candidate for actual stdlib inclusion. Just like we added u"..." literals >> to Python 3.4. >> > > It still changes as folks porting new projects discover additional > discrepancies between the 2.x and 3.x standard library layouts and > behaviour (e.g. we found recently that the 3.x subprocess module's > emulation of the old commands module APIs actually bit shifts the status > codes relative to the 2.7 versions). The rate of change has definitely > slowed down a lot since the early days, but it isn't zero. 
> > In addition, the only folks that need it are those that already care about > older versions of Python - if you can start with whatever the latest > version of Python is, and don't have any reason to support users still > running older version, you can happily pretend those older versions don't > exist, and hence don't need a compatibility library like six. As a separate > library, it can just gracefully fade away as folks stop depending on it as > they drop Python 2.7 support. By contrast, if we were to bring it into the > 3.x standard library, then we'd eventually have to figure out when we could > deprecate and remove it again. > > >> * setuptools: we want to update this in line with the PyPA interop specs, >>> not the Python language version >>> >> >> But does that exclude stdlib inclusion? Why would those specs change, and >> why couldn't they wait for a new Python release? >> > > The specs mainly change when we want to offer publishers new capabilities > while still maintaining compatibility with older installation clients (and > vice-versa: we want folks still running Python 2.7 to be able to publish > wheel files and use recently added metadata fields like > Description-Content-Type). > > The reason we can't wait for new Python releases is because when we add > such things, we need them to work on *all* supported Python releases > (including 2.7 and security-release-only 3.x versions). > > There are also other drivers for setuptools updates, which include: > > - operating system build toolchain changes (e.g. finding new versions of > Visual Studio or XCode) > - changes to PyPI's operations (e.g. the legacy upload API getting turned > off due to persistent service stability problems, switching to HTTPS only > access) > > With setuptools as a separate project, a whole lot of package publication > problems can be solved via "pip install --upgrade setuptools wheel" in a > virtual environment, which is a luxury we don't have with plain distutils. > > >> * cffi: updates may be needed for PyPA interop specs, Python >> implementation updates or C language definition updates >> >> Hm, again, I don't recall that this was debated -- I think it's a failure >> that it's not in the stdlib. >> > > A couple of years ago I would have agreed with you, but I've spent enough > time on packaging problems now to realise that cffi actually qualifies as a > build tool due to the way it generates extension module wrappers when used > in "out-of-line" mode. > > Being outside the standard library means that cffi still has significant > flexibility to evolve how it separates its buildtime functionality from its > runtime functionality, and may eventually be adjusted so that only > "_cffi_backend" needs to be installed at runtime for the out-of-line > compilation mode, without the full C header parsing and inline extension > module compilation capabilities of CFFI itself (see > https://cffi.readthedocs.io/en/latest/cdef.html#preparing- > and-distributing-modules for the details). Being separate also means that > cffi can be updated to generate more efficient code even for existing > Python versions. > > >> * requests: updates are more likely to be driven by changes in network >>> protocols and client platform APIs than Python language changes >>> >> >> Here I agree. There's no alternative (except aiohttp, but that's >> asyncio-based) and it can't be in the stdlib because it's actively being >> developed. 
>> >> >>> * regex: we don't want two regex engines in the stdlib, transparently >>> replacing _sre would be difficult, and _sre is still good enough for most >>> purposes >>> >> >> I think this needn't be recommended at all. For 99.9% of regular >> expression uses, re is just fine. Let's just work on a strategy for >> introducing regex into the stdlib. >> > > Given your informational PEP suggestion below, I'd probably still include > it, but in a separate section from the others (e.g. the others might be > listed as "Misaligned Feature Release Cycles", which is an inherent > logistical problem that no amount of coding can fix, while regex would > instead be categorised as "Technical Challenges"). > > >> Of the 5, I'd suggest that regex is the only one that could potentially >>> still make its way into the standard library some day - it would just >>> require someone with both the time and inclination to create a CPython >>> variant that used _regex instead of _sre as the default regex engine, and >>> then gathered evidence to show that it was "compatible enough" with _sre to >>> serve as the default engine for CPython. >>> >>> For the first four, there are compelling arguments that their drivers >>> for new feature additions are such that their release cycles shouldn't ever >>> be tied to the rate at which we update the Python language definition. >>> >> >> As you can tell from my arguing, the reasons need to be written up in >> more detail. >> >> >>> And is it really just going to be a list of names, or is there going to >>>> be some documentation (about the vetting, not about the contents of the >>>> packages) for each name? >>>> >>> >>> I'm thinking a new subsection in https://docs.python.org/devgui >>> de/stdlibchanges.html for "Recommended Third Party Packages" would make >>> sense, covering what I wrote above. >>> >> >> That's too well hidden for my taste. >> >> >>> It also occurred to me that since the recommendations are independent of >>> the Python version, they don't really belong in the version specific >>> documentation. >>> >> >> But that doesn't mean they can't (also) be listed there. (And each >> probably has its version dependencies.) >> >> >>> While the Developer's Guide isn't really the right place for the list >>> either (except as an easier way to answer "Why isn't in the standard >>> library?" questions), it could be a good interim option until I get around >>> to actually writing a first draft of https://github.com/python/redi >>> stributor-guide/ (which I was talking to Barry about at the dev sprint, >>> but didn't end up actually creating any content for since I went down a >>> signal handling rabbit hole instead). >>> >> >> Hm, let's not put more arbitrary check boxes in the way of progress. >> Maybe it can be an informational PEP that's occasionally updated? >> > > If I'm correctly reading that as "Could the list of Recommended Third > Party Packages be an informational PEP?", then I agree that's probably a > good way to tackle it, since it will cover both the developer-centric "Why > isn't this in the standard library yet?" aspect *and* the "Redistributors > should probably provide this" aspect. > > Cheers, > Nick. > > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From wes.turner at gmail.com Tue Oct 31 12:29:14 2017 From: wes.turner at gmail.com (Wes Turner) Date: Tue, 31 Oct 2017 12:29:14 -0400 Subject: [Python-ideas] Defining an easily installable "Recommended baseline package set" In-Reply-To: References: Message-ID: On Tuesday, October 31, 2017, Guido van Rossum wrote: > On Tue, Oct 31, 2017 at 4:42 AM, Nick Coghlan > wrote: > >> On 31 October 2017 at 02:29, Guido van Rossum > > wrote: >> >>> What's your proposed process to arrive at the list of recommended >>> packages? >>> >> >> I'm thinking it makes the most sense to treat inclusion in the >> recommended packages list as a possible outcome of proposals for standard >> library inclusion, rather than being something we'd provide a way to >> propose specifically. >> > > I don't think that gets you off the hook for a process proposal. We need > some criteria to explain why a module should be on the recommended list -- > not just a ruling as to why it shouldn't be in the stdlib. > > >> We'd only use it in cases where a proposal would otherwise meet the >> criteria for stdlib inclusion, but the logistics of actually doing so don't >> work for some reason. >> > > But that would exclude most of the modules you mention below, since one of > the criteria is that their development speed be matched with Python's > release cycle. I think there must be some form of "popularity" combined > with "best of breed". In particular I'd like to have a rule that explains > why flask and Django would never make the list. (I don't know what that > rule is, or I would tell you -- my gut tells me it's something to do with > having their own community *and* competing for the same spot.) > > Running the initial 5 proposals through that filter: >> >> * six: a cross-version compatibility layer clearly needs to be outside >> the standard library >> > > Hm... Does six still change regularly? If not I think it *would* be a > candidate for actual stdlib inclusion. Just like we added u"..." literals > to Python 3.4. > > >> * setuptools: we want to update this in line with the PyPA interop specs, >> not the Python language version >> > > But does that exclude stdlib inclusion? Why would those specs change, and > why couldn't they wait for a new Python release? > > >> * cffi: updates may be needed for PyPA interop specs, Python >> implementation updates or C language definition updates >> > > Hm, again, I don't recall that this was debated -- I think it's a failure > that it's not in the stdlib. > > >> * requests: updates are more likely to be driven by changes in network >> protocols and client platform APIs than Python language changes >> > > Here I agree. There's no alternative (except aiohttp, but that's > asyncio-based) and it can't be in the stdlib because it's actively being > developed. > What about certifi (SSL bundles (from requests (?)) on PyPi) https://pypi.org/project/certifi/ ? > > >> * regex: we don't want two regex engines in the stdlib, transparently >> replacing _sre would be difficult, and _sre is still good enough for most >> purposes >> > > I think this needn't be recommended at all. For 99.9% of regular > expression uses, re is just fine. Let's just work on a strategy for > introducing regex into the stdlib. 
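(For concreteness, the sort of capability gap being discussed: the third-party regex engine understands Unicode property classes that the stdlib re module does not. A minimal sketch, assuming the regex package has been installed from PyPI:)

```python
import regex  # the third-party engine under discussion, not the stdlib re module

# \p{...} Unicode property classes are one feature the stdlib _sre engine lacks.
print(regex.findall(r'\p{Greek}+', 'alpha α and βήτα beta'))
# -> ['α', 'βήτα']
```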
> > >> Of the 5, I'd suggest that regex is the only one that could potentially >> still make its way into the standard library some day - it would just >> require someone with both the time and inclination to create a CPython >> variant that used _regex instead of _sre as the default regex engine, and >> then gathered evidence to show that it was "compatible enough" with _sre to >> serve as the default engine for CPython. >> >> For the first four, there are compelling arguments that their drivers for >> new feature additions are such that their release cycles shouldn't ever be >> tied to the rate at which we update the Python language definition. >> > > As you can tell from my arguing, the reasons need to be written up in more > detail. > > >> And is it really just going to be a list of names, or is there going to >>> be some documentation (about the vetting, not about the contents of the >>> packages) for each name? >>> >> >> I'm thinking a new subsection in https://docs.python.org/devgui >> de/stdlibchanges.html for "Recommended Third Party Packages" would make >> sense, covering what I wrote above. >> > > That's too well hidden for my taste. > > >> It also occurred to me that since the recommendations are independent of >> the Python version, they don't really belong in the version specific >> documentation. >> > > But that doesn't mean they can't (also) be listed there. (And each > probably has its version dependencies.) > > >> While the Developer's Guide isn't really the right place for the list >> either (except as an easier way to answer "Why isn't in the standard >> library?" questions), it could be a good interim option until I get around >> to actually writing a first draft of https://github.com/python/redi >> stributor-guide/ (which I was talking to Barry about at the dev sprint, >> but didn't end up actually creating any content for since I went down a >> signal handling rabbit hole instead). >> > > Hm, let's not put more arbitrary check boxes in the way of progress. Maybe > it can be an informational PEP that's occasionally updated? > > -- > --Guido van Rossum (python.org/~guido) > -------------- next part -------------- An HTML attachment was scrubbed... URL: From steve at pearwood.info Tue Oct 31 13:31:07 2017 From: steve at pearwood.info (Steven D'Aprano) Date: Wed, 1 Nov 2017 04:31:07 +1100 Subject: [Python-ideas] Python-ideas Digest, Vol 131, Issue 106 In-Reply-To: References: <3d5bb8b1-246c-74d0-2944-9ced462ee6bb@mail.mipt.ru> <20171031074647.GB9068@ando.pearwood.info> Message-ID: <20171031173107.GE9068@ando.pearwood.info> On Tue, Oct 31, 2017 at 10:31:50AM +0000, ????? wrote: > Off topic: why can't we simply allow something like this: > > (the_bob) = (name for name in ('bob','fred') if name=='bob') Parens don't make a tuple. They are just for grouping. If you want a tuple, you need a comma: the_bob, = ... with or without the parens. It would be terribly surprising if (x) was a sequence on the left hand side but not on the right hand side of an assignment. > Why does Python treat the parenthesis at the LHS as grouping parens? > operators are not allowed anyway; (a + (b + c)) = [1] is syntax error. a, (b, c), d = [1, "xy", 2] > Currently > > (x) = 1 > > works, but I can't see why should it. Why shouldn't it? 
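To see the difference concretely, here is an illustrative snippet using the names from the example above:

```python
gen = (name for name in ('bob', 'fred') if name == 'bob')

# Parentheses are only grouping, so this binds the generator object itself,
# exactly as `the_bob = gen` would.
(the_bob) = gen

# The trailing comma is what makes it unpacking: this pulls out the single
# yielded value, and raises ValueError for zero or for two-plus values.
the_bob, = (name for name in ('bob', 'fred') if name == 'bob')
print(the_bob)  # prints: bob
```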
It's just a trivial case of the fact that the left hand side can be certain kinds of expressions, some of which require parens: (spam or ham)[x] = value There are lots of possible expressions allowed on the LHS, and no good reason to prohibit (x) even though it's pointless. -- Steve From rosuav at gmail.com Tue Oct 31 14:05:29 2017 From: rosuav at gmail.com (Chris Angelico) Date: Wed, 1 Nov 2017 05:05:29 +1100 Subject: [Python-ideas] install pip packages from Python prompt In-Reply-To: References: <01f701d350eb$c4ee6210$4ecb2630$@sdamon.com> <021601d350ee$88f236d0$9ad6a470$@sdamon.com> <045101d351b9$54a25240$fde6f6c0$@sdamon.com> <4409775187778125080@unknownmsgid> Message-ID: On Wed, Nov 1, 2017 at 2:49 AM, Chris Barker wrote: > And sure, some of those are completely clueless about what a command line is > and how to use it, but others DO have an idea about the command line, but > don't know that: > >>>> pip install something > File "<stdin>", line 1 > pip install something > ^ > SyntaxError: invalid syntax > > means: "this was supposed to be run at the command prompt" > > So I think defining a "pip" builtin that simply gave a helpful message would > be a good start. > > (hmm, it's a syntax error, so not as simple as a builtin -- but it could be > caught somehow to give a better message) This sounds like the job for an enhanced REPL like Jupyter/ipython. In fact, it might already exist - I haven't looked. ChrisA From python at mrabarnett.plus.com Tue Oct 31 14:41:50 2017 From: python at mrabarnett.plus.com (MRAB) Date: Tue, 31 Oct 2017 18:41:50 +0000 Subject: [Python-ideas] Defining an easily installable "Recommended baseline package set" In-Reply-To: References: Message-ID: <13e5b1bc-7f93-c530-35a8-cbe4651c5975@mrabarnett.plus.com> On 2017-10-31 11:42, Nick Coghlan wrote: > On 31 October 2017 at 02:29, Guido van Rossum > wrote: > > What's your proposed process to arrive at the list of recommended > packages? > > > I'm thinking it makes the most sense to treat inclusion in the > recommended packages list as a possible outcome of proposals for > standard library inclusion, rather than being something we'd provide a > way to propose specifically. > > We'd only use it in cases where a proposal would otherwise meet the > criteria for stdlib inclusion, but the logistics of actually doing so > don't work for some reason. > > Running the initial 5 proposals through that filter: > > * six: a cross-version compatibility layer clearly needs to be outside > the standard library > * setuptools: we want to update this in line with the PyPA interop > specs, not the Python language version > * cffi: updates may be needed for PyPA interop specs, Python > implementation updates or C language definition updates > * requests: updates are more likely to be driven by changes in network > protocols and client platform APIs than Python language changes > * regex: we don't want two regex engines in the stdlib, transparently > replacing _sre would be difficult, and _sre is still good enough for > most purposes > regex gets updated when the Unicode Consortium releases an update.
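(A quick way to see the version skew being described here -- the Unicode database bundled with the standard library is frozen per CPython feature release, so, as a minimal sketch:)

```python
import sys
import unicodedata

# The stdlib Unicode tables only change with a new CPython feature release
# (Python 3.6 ships Unicode 9.0.0 data, for example), whereas a separately
# distributed engine such as regex can track new Unicode versions on its
# own release schedule.
print(sys.version.split()[0], unicodedata.unidata_version)
```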
[snip] From guido at python.org Tue Oct 31 14:44:45 2017 From: guido at python.org (Guido van Rossum) Date: Tue, 31 Oct 2017 11:44:45 -0700 Subject: [Python-ideas] Defining an easily installable "Recommended baseline package set" In-Reply-To: <13e5b1bc-7f93-c530-35a8-cbe4651c5975@mrabarnett.plus.com> References: <13e5b1bc-7f93-c530-35a8-cbe4651c5975@mrabarnett.plus.com> Message-ID: On Tue, Oct 31, 2017 at 11:41 AM, MRAB wrote: > regex gets updated when the Unicode Consortium releases an update. > Is it a feature that that is more frequently than Python releases? There are other things in Python that must be updated whenever the UC releases an update, and they get treated as features (or perhaps as bugfixes, I'm not sure) but this means they generally don't get backported. -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From python at mrabarnett.plus.com Tue Oct 31 15:24:35 2017 From: python at mrabarnett.plus.com (MRAB) Date: Tue, 31 Oct 2017 19:24:35 +0000 Subject: [Python-ideas] Defining an easily installable "Recommended baseline package set" In-Reply-To: References: <13e5b1bc-7f93-c530-35a8-cbe4651c5975@mrabarnett.plus.com> Message-ID: <5d0d0c90-0919-f248-60eb-2be208514d99@mrabarnett.plus.com> On 2017-10-31 18:44, Guido van Rossum wrote: > On Tue, Oct 31, 2017 at 11:41 AM, MRAB > wrote: > > regex gets updated when the Unicode Consortium releases an update. > > > Is it a feature that that is more frequently than Python releases? > There are other things in Python that must be updated whenever the UC > releases an update, and they get treated as features (or perhaps as > bugfixes, I'm not sure) but this means they generally don't get > backported. > Here's a list of the updates to Unicode: https://www.unicode.org/versions/enumeratedversions.html Roughly yearly. Those still on Python 2.7, for example, are now 8 years behind re Unicode. At least Python 3.6 is only 1 year/release behind, which is fine! From gadgetsteve at live.co.uk Tue Oct 31 15:31:56 2017 From: gadgetsteve at live.co.uk (Steve Barnes) Date: Tue, 31 Oct 2017 19:31:56 +0000 Subject: [Python-ideas] install pip packages from Python prompt In-Reply-To: References: <01f701d350eb$c4ee6210$4ecb2630$@sdamon.com> <021601d350ee$88f236d0$9ad6a470$@sdamon.com> <045101d351b9$54a25240$fde6f6c0$@sdamon.com> <4409775187778125080@unknownmsgid> Message-ID: On 31/10/2017 18:05, Chris Angelico wrote: > On Wed, Nov 1, 2017 at 2:49 AM, Chris Barker wrote: >> And sure, some of those are completly clueless about what a command line is >> and how to use it, but others DO have an idea about the command line, but >> dont know that: >> >>>>> pip install something >> File "", line 1 >> pip install something >> ^ >> SyntaxError: invalid syntax >> >> means: "this was supposed to be run at the command prompt" >> >> So I think defining a "pip" builtin that simply gave a helpful message would >> be a good start. >> >> (hmm, it's a syntax error, so not as simple as a builtin -- but it could be >> caught somehow to give a better message) > > This sounds like the job for an enhanced REPL like Jupyter/ipython. In > fact, it might already exist - I haven't looked. > > ChrisA Of course those in environments where they need to avoid the use of the shell could use ipython and simply start the session with: !pip install -U needed_package before doing any imports. -- Steve (Gadget) Barnes Any opinions in this message are my personal opinions and do not reflect those of my employer. 
--- This email has been checked for viruses by AVG. http://www.avg.com From guido at python.org Tue Oct 31 15:56:22 2017 From: guido at python.org (Guido van Rossum) Date: Tue, 31 Oct 2017 12:56:22 -0700 Subject: [Python-ideas] Defining an easily installable "Recommended baseline package set" In-Reply-To: <5d0d0c90-0919-f248-60eb-2be208514d99@mrabarnett.plus.com> References: <13e5b1bc-7f93-c530-35a8-cbe4651c5975@mrabarnett.plus.com> <5d0d0c90-0919-f248-60eb-2be208514d99@mrabarnett.plus.com> Message-ID: On Tue, Oct 31, 2017 at 12:24 PM, MRAB wrote: > On 2017-10-31 18:44, Guido van Rossum wrote: > >> On Tue, Oct 31, 2017 at 11:41 AM, MRAB > > wrote: >> >> regex gets updated when the Unicode Consortium releases an update. >> >> >> Is it a feature that that is more frequently than Python releases? There >> are other things in Python that must be updated whenever the UC releases an >> update, and they get treated as features (or perhaps as bugfixes, I'm not >> sure) but this means they generally don't get backported. >> >> Here's a list of the updates to Unicode: > > https://www.unicode.org/versions/enumeratedversions.html > > Roughly yearly. > > Those still on Python 2.7, for example, are now 8 years behind re Unicode. > Frankly I consider that a feature of the 2.7 end-of-life planning. :-) > At least Python 3.6 is only 1 year/release behind, which is fine! > OK, so presumably that argument doesn't preclude inclusion in the 3.7 (or later) stdlib. I'm beginning to warm up to the idea again... Maybe we should just bite the bullet. Nick, what do you think? Is it worth a small PEP? -- --Guido van Rossum (python.org/~guido) -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.barker at noaa.gov Tue Oct 31 16:31:48 2017 From: chris.barker at noaa.gov (Chris Barker) Date: Tue, 31 Oct 2017 13:31:48 -0700 Subject: [Python-ideas] install pip packages from Python prompt In-Reply-To: References: <01f701d350eb$c4ee6210$4ecb2630$@sdamon.com> <021601d350ee$88f236d0$9ad6a470$@sdamon.com> <045101d351b9$54a25240$fde6f6c0$@sdamon.com> <4409775187778125080@unknownmsgid> Message-ID: On Tue, Oct 31, 2017 at 11:05 AM, Chris Angelico wrote: > > So I think defining a "pip" builtin that simply gave a helpful message > would > > be a good start. > > > > (hmm, it's a syntax error, so not as simple as a builtin -- but it could > be > > caught somehow to give a better message) > > This sounds like the job for an enhanced REPL like Jupyter/ipython. In > fact, it might already exist - I haven't looked. > I jsu tlooked, no it doesn't -- but yes, a good idea for a feature request. However, no matter how you slice it, some folks will be trying to to run pip via the usual REPL, if we could get them a nice error, it would save a lot of headaches and useless messages to mailing lists... -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From cs at cskk.id.au Tue Oct 31 18:13:00 2017 From: cs at cskk.id.au (Cameron Simpson) Date: Wed, 1 Nov 2017 09:13:00 +1100 Subject: [Python-ideas] Add single() to itertools In-Reply-To: <59F84783.1020101@canterbury.ac.nz> References: <59F84783.1020101@canterbury.ac.nz> Message-ID: <20171031221300.GA95209@cskk.homeip.net> On 31Oct2017 22:50, Greg Ewing wrote: >Koos Zevenhoven wrote: > > |defsingle(i): try: ||v =i.next() |||exceptStopIteration:||||raiseException('No values')|||try: ||i.next() ||exceptStopIteration: ||returnv||else: ||raiseException('Too many > values')|||printsingle(name forname in('bob','fred')ifname=='bob')||| | > >Looks like a clever method of whitespace compression to me. Those narrow >vertical bars take up far less room that spaces! And so convenient! No dependency on some parochial indentation size policy. - Cameron From tjreedy at udel.edu Tue Oct 31 18:50:10 2017 From: tjreedy at udel.edu (Terry Reedy) Date: Tue, 31 Oct 2017 18:50:10 -0400 Subject: [Python-ideas] install pip packages from Python prompt In-Reply-To: References: <01f701d350eb$c4ee6210$4ecb2630$@sdamon.com> <021601d350ee$88f236d0$9ad6a470$@sdamon.com> <045101d351b9$54a25240$fde6f6c0$@sdamon.com> <4409775187778125080@unknownmsgid> Message-ID: On 10/31/2017 12:21 PM, Ivan Levkivskyi wrote: > I think it was proposed several times before, but I just wanted to > revive the idea that we could add > a GUI interface to install/update packages from IDLE (maybe even with > some package browser). https://bugs.python.org/issue23551. I agreed with and still agree with Raymond's opening message in Feb 2015: "In teaching Python, I find that many Windows users are command-line challenged and have difficulties using and accessing PIP. ... I would love to be able to start a class with a fresh Python download from python.org and effortlessly install requests and other tools without users having to fire-up a terminal window and wrestle with the various parts." The one change I made in Raymond's proposal is that instead of having multiple IDLE menu entries tied to multiple IDLE functions invoking multiple pip functions, there would be one IDLE menu entry, perhaps 'Help => Install packages' (plural intentional), that would invoke a standalone tkinter based gui front-end to pip. 'Standalone' means no dependency on IDLE code. I don't think every IDE or app should *have to* write its own gui. Plus, a standalone tkinter module could be invoked from a command line with 'python -m pipgui' or invoked from interactive python with 'import pipgui; pipgui.main()'. In April 2016, after posting the idea to pydev list and getting 'go ahead's from Nick Coughlin and someone else, with no negatives, I approved Upendra Kumar's GSOC proposal to write a pip gui. This was https://bugs.python.org/issue27051. On June 20, Ned Deily and Nick Coughlin vetoed adding a pip gui anywhere in the stdlib since it depended on something not in the stdlib, and perhaps for other reasons I don't fully understand. Looking back, I can see that I made two mistakes. The first was proposing to use the public-looking pip.main after importing pip. It is actually intended to be private (and should have been named '_main' to make that clearer). As it turns out, the extra work of accessing pip through the intended command line interface (via subprocess) is necessary anyway since running pip makes changes to the in-memory modules that are not reset when .main is called again. So it might as well be used for every access. 
The second was not requiring an approved PEP before proceeding to actual coding. -- Terry Jan Reedy From wes.turner at gmail.com Tue Oct 31 19:03:03 2017 From: wes.turner at gmail.com (Wes Turner) Date: Tue, 31 Oct 2017 19:03:03 -0400 Subject: [Python-ideas] install pip packages from Python prompt In-Reply-To: References: <01f701d350eb$c4ee6210$4ecb2630$@sdamon.com> <021601d350ee$88f236d0$9ad6a470$@sdamon.com> <045101d351b9$54a25240$fde6f6c0$@sdamon.com> <4409775187778125080@unknownmsgid> Message-ID: You could teach them subprocess and os command injection safety from the start: ```python import subprocess import sys cmd = [sys.executable, -m', 'pip', 'install', '-r', 'psfblessed-requirements.txt']) retcode = subprocess.check_call(cmd) assert retcode == 0 ``` (Because shell=True is dangerous) On Tuesday, October 31, 2017, Terry Reedy wrote: > On 10/31/2017 12:21 PM, Ivan Levkivskyi wrote: > >> I think it was proposed several times before, but I just wanted to revive >> the idea that we could add >> a GUI interface to install/update packages from IDLE (maybe even with >> some package browser). >> > > https://bugs.python.org/issue23551. I agreed with and still agree with > Raymond's opening message in Feb 2015: > "In teaching Python, I find that many Windows users are command-line > challenged and have difficulties using and accessing PIP. ... I would love > to be able to start a class with a fresh Python download from python.org > and effortlessly install requests and other tools without users having to > fire-up a terminal window and wrestle with the various parts." > > The one change I made in Raymond's proposal is that instead of having > multiple IDLE menu entries tied to multiple IDLE functions invoking > multiple pip functions, there would be one IDLE menu entry, perhaps 'Help > => Install packages' (plural intentional), that would invoke a standalone > tkinter based gui front-end to pip. 'Standalone' means no dependency on > IDLE code. I don't think every IDE or app should *have to* write its own > gui. Plus, a standalone tkinter module could be invoked from a command > line with 'python -m pipgui' or invoked from interactive python with > 'import pipgui; pipgui.main()'. > > In April 2016, after posting the idea to pydev list and getting 'go > ahead's from Nick Coughlin and someone else, with no negatives, I approved > Upendra Kumar's GSOC proposal to write a pip gui. This was > https://bugs.python.org/issue27051. On June 20, Ned Deily and Nick > Coughlin vetoed adding a pip gui anywhere in the stdlib since it depended > on something not in the stdlib, and perhaps for other reasons I don't fully > understand. > > Looking back, I can see that I made two mistakes. > > The first was proposing to use the public-looking pip.main after importing > pip. It is actually intended to be private (and should have been named > '_main' to make that clearer). As it turns out, the extra work of > accessing pip through the intended command line interface (via > subprocess) is necessary anyway since running pip makes changes to the > in-memory modules that are not reset when .main is called again. So it > might as well be used for every access. > > The second was not requiring an approved PEP before proceeding to actual > coding. 
> > -- > Terry Jan Reedy > > _______________________________________________ > Python-ideas mailing list > Python-ideas at python.org > https://mail.python.org/mailman/listinfo/python-ideas > Code of Conduct: http://python.org/psf/codeofconduct/ > -------------- next part -------------- An HTML attachment was scrubbed... URL: From wes.turner at gmail.com Tue Oct 31 19:38:36 2017 From: wes.turner at gmail.com (Wes Turner) Date: Tue, 31 Oct 2017 19:38:36 -0400 Subject: [Python-ideas] install pip packages from Python prompt In-Reply-To: References: <01f701d350eb$c4ee6210$4ecb2630$@sdamon.com> <021601d350ee$88f236d0$9ad6a470$@sdamon.com> <045101d351b9$54a25240$fde6f6c0$@sdamon.com> <4409775187778125080@unknownmsgid> Message-ID: You could teach them subprocess and OS command injection safety from the start:

```python
import subprocess
import sys

# Run pip as a child process of the current interpreter; no shell involved.
cmd = [sys.executable, '-m', 'pip', 'install', '-r', 'psfblessed-requirements.txt']
retcode = subprocess.check_call(cmd)  # raises CalledProcessError on a non-zero exit
assert retcode == 0
```

(Because shell=True is dangerous and str.split is dangerous):

```python
import shlex

filename = "'/etc/ passwd' ; shutdown -r now"
cmd = ("cat '%s'" % filename)
cmd
# "cat ''/etc/ passwd' ; shutdown -r now'"
cmd.split()
# ['cat', "''/etc/", "passwd'", ';', 'shutdown', '-r', "now'"]
shlex.split(cmd)
# ['cat', '/etc/', 'passwd ; shutdown -r now']
```

(Sarge solves for a number of additional cases beyond shlex.split (empty string should be '', 'already-quoted commands')) https://sarge.readthedocs.io/en/latest/overview.html#why-not-just-use-subprocess https://en.wikipedia.org/wiki/Code_injection#Shell_injection https://sarge.readthedocs.io/en/latest/internals.html#how-shell-quoting-works Of course, we're programmers and our input is not untrusted, so shell=True without any string operations is not as dangerous. On Tuesday, October 31, 2017, Wes Turner wrote: > You could teach them subprocess and OS command injection safety from the > start: > > ```python > import subprocess > import sys > cmd = [sys.executable, '-m', 'pip', 'install', '-r', 'psfblessed-requirements.txt'] > retcode = subprocess.check_call(cmd)  # raises CalledProcessError on a non-zero exit > assert retcode == 0 > ``` > > (Because shell=True is dangerous) > > On Tuesday, October 31, 2017, Terry Reedy > wrote: > >> On 10/31/2017 12:21 PM, Ivan Levkivskyi wrote: >> >>> I think it was proposed several times before, but I just wanted to >>> revive the idea that we could add >>> a GUI interface to install/update packages from IDLE (maybe even with >>> some package browser). >>> >> >> https://bugs.python.org/issue23551. I agreed with and still agree with >> Raymond's opening message in Feb 2015: >> "In teaching Python, I find that many Windows users are command-line >> challenged and have difficulties using and accessing PIP. ... I would love >> to be able to start a class with a fresh Python download from python.org >> and effortlessly install requests and other tools without users having to >> fire-up a terminal window and wrestle with the various parts." >> >> The one change I made in Raymond's proposal is that instead of having >> multiple IDLE menu entries tied to multiple IDLE functions invoking >> multiple pip functions, there would be one IDLE menu entry, perhaps 'Help >> => Install packages' (plural intentional), that would invoke a standalone >> tkinter based gui front-end to pip. 'Standalone' means no dependency on >> IDLE code. I don't think every IDE or app should *have to* write its own >> gui.
Plus, a standalone tkinter module could be invoked from a command >> line with 'python -m pipgui' or invoked from interactive python with >> 'import pipgui; pipgui.main()'. >> >> In April 2016, after posting the idea to pydev list and getting 'go >> ahead's from Nick Coghlan and someone else, with no negatives, I approved >> Upendra Kumar's GSOC proposal to write a pip gui. This was >> https://bugs.python.org/issue27051. On June 20, Ned Deily and Nick >> Coghlan vetoed adding a pip gui anywhere in the stdlib since it depended >> on something not in the stdlib, and perhaps for other reasons I don't fully >> understand. >> >> Looking back, I can see that I made two mistakes. >> >> The first was proposing to use the public-looking pip.main after >> importing pip. It is actually intended to be private (and should have been >> named '_main' to make that clearer). As it turns out, the extra work of >> accessing pip through the intended command line interface (via >> subprocess) is necessary anyway since running pip makes changes to the >> in-memory modules that are not reset when .main is called again. So it >> might as well be used for every access. >> >> The second was not requiring an approved PEP before proceeding to actual >> coding. >> >> -- >> Terry Jan Reedy >> >> _______________________________________________ >> Python-ideas mailing list >> Python-ideas at python.org >> https://mail.python.org/mailman/listinfo/python-ideas >> Code of Conduct: http://python.org/psf/codeofconduct/ >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: